Ep. 104 Scaling Iron Mountain with MongoDB

Happy World Backup Day!

Consider your cloud backup strategy (https://www.mongodb.com/blog/post/consider-your-cloud-backup-strategy-world-backup-day) on World Backup Day, March 31.

Iron Mountain (https://www.ironmountain.com/) processes millions of documents for many of the top Fortune 500 companies, across five major markets globally. They digitize and ingest these documents, then process them to classify, enrich, and extract metadata for their customers. In 2021, they digitized over 870 million pages of documents, which is enough to stretch from Atlanta, Georgia to Albuquerque, New Mexico. Today on the show, Adam Williams (https://www.linkedin.com/in/adamdwilliams/), Sr. Director of Platform Engineering at Iron Mountain, joins Michael to talk about how Iron Mountain is leveraging MongoDB to achieve massive scale efficiently and securely.

Iron Mountain is hiring! To find out more, visit https://ironmountain.jobs/
Episode overview
00:50 MIN
Adam talks about his background, and Iron Mountain's digitization business
02:10 MIN
How Iron Mountain benefits from MongoDB's flexible document structure
01:32 MIN
Scaling their service with MongoDB, and the development of an insights platform
05:08 MIN
Surprises from launching insights in the digitization industry regarding compliance and security
01:25 MIN
Other capabilities Iron Mountain makes use of in the Atlas platform
01:35 MIN
Discussing timeframes a customer can expect for their digitized documents
02:44 MIN
Adam talks about the scope of the Insights project
01:37 MIN
Opportunities at Iron Mountain
00:38 MIN
Mike asks Adam to explain more about the stack, platform, and how Iron Mountain builds schemas
02:30 MIN
What's on the roadmap for the Insights platform
01:38 MIN
Adam elaborates on the invoice capability and being able to take customer's data needs from beginning to end
02:17 MIN

Michael Lynn: Welcome back to the show. My name is Michael Lynn, your host. This is the MongoDB podcast. Before we get to the show today, I'd like to ask a really quick favor. Hey, if you're listening and you're enjoying what you hear, leave me a comment. Can I ask for that? Can I ask for a comment on Apple Podcasts or Spotify, a brand new platform that gives you the ability to leave comments and ratings? We would love to hear your thoughts on what you're enjoying and what you're not. Obviously I want to get some good positive feedback, but I also want to improve the show. It's important to me that what we're putting together here at MongoDB makes sense and is valuable to you, and leaving a comment and a rating is going to help me do that. It's going to help me improve the show. So enough of the favors, let's get to the show. Today, I'm really excited to present to you a conversation I had with Adam Williams. Adam Williams is the senior director of platform engineering at Iron Mountain. Iron Mountain is one of those companies that have been around for a really long time, but they have not stopped growing and innovating. They have developed a very interesting platform of applications called Insight. Now, Iron Mountain is in the document digitization and asset management space. They're helping many companies in the Fortune 500 and Fortune 1000, all around the globe, manage their digital assets. We're going to hear all about how they're leveraging MongoDB to do that in a way that is secure and that enables them to scale massively. Stay tuned. MongoDB World returns to New York City. MongoDB World 2022: the future runs on MongoDB. This is a conference for creators, disruptors, and transformers of tomorrow. You can register right now. Join us June 7th to 9th, 2022, for three days of announcement-packed keynotes, hands-on workshops, and deep-dive technical sessions. Use the code podcast to get 25% off the already discounted price.
Visit mongodb.com/world-2022 to register today.

Adam Williams: So I'm Adam Williams, I'm the senior director of platform engineering at Iron Mountain, and we're going to talk about our use of MongoDB with our Insight platform. We process millions of records for many of the top Fortune 500 companies, across five major markets globally. We digitize and ingest documents, then we process them to classify, enrich, and extract metadata for our customers. To give you a sense of the scale, in 2021 we digitized over 870 million pages of documents, which is enough to stretch from Atlanta, Georgia to Albuquerque, New Mexico.

Michael Lynn: Now, I remember Iron Mountain did some work with a financial services company and it was like offsite tape storage. And I think that's how Iron Mountain got its start, but as a company you have just expanded and really gone into the digital space. You want to talk a little bit about how long you've been at Iron Mountain and what that transition has been like?

Adam Williams: Yeah, just in the 10 years I've been with Iron Mountain, I've seen a great transformation. Coming from our shredding services and our traditional physical asset storage, we've evolved into working with our customers to solve and unlock some of the problems they have with the data that they store. With the data that we store, customers are having to ask and answer many questions around legal claims; they've had to be able to go and research different document sets that they're storing with us. And they simply don't have the metadata on all of these physical assets to be able to research and find what they need. So that's where digitization becomes so key. In the last couple years, we've developed the Insight platform, which provides all of the capabilities that allow our customers to unlock the data they're storing and then to build additional collections of data so that they can continue to evolve and work with their business processes.

Michael Lynn: What a great use case. So it all begins with the physical medium and the physical medium is transferred to Iron Mountain where it's digitized. And I can't think of a better use case for MongoDB's flexible document structure. I mean, you don't know what type of data you're going to be getting. Is that one of the primary motivations for choosing MongoDB as a database?

Adam Williams: Yeah, so we have structured and unstructured data that we're working with and to give you a sense, some of that data can be on microfilm reels, it can be in different video archives. So MongoDB gives us the flexibility to start with a really small implementation and then to scale it as we go. A lot of times we're working with customers just to understand their data needs. And at that point we just need a small installation. We just need a small set of components that we can work with while we build out our additional capabilities with the customer. The other thing that's really important for us is the data residency capability. A lot of our customers have requirements where they have to run the data within a certain country or region. And we're able to meet that with the cloud, by being able to stand up instances within the different AWS and GCP data centers.
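To make the data residency point concrete, here is a minimal sketch of routing each customer to a cluster in their required region. Everything in it is a hypothetical illustration, not Iron Mountain's implementation: the `REGION_CLUSTERS` table, the region codes, the provider assignments, the placeholder URIs, and the `Customer` and `cluster_for` names are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical mapping from a required data-residency region to the cluster
# provisioned in that region. Providers, regions, and URIs are placeholders.
REGION_CLUSTERS = {
    "US": ("AWS", "mongodb+srv://us-cluster.example.net"),
    "EU": ("GCP", "mongodb+srv://eu-cluster.example.net"),
    "CA": ("AWS", "mongodb+srv://ca-cluster.example.net"),
}


@dataclass(frozen=True)
class Customer:
    name: str
    residency: str  # region in which this customer's data must stay


def cluster_for(customer: Customer) -> str:
    """Return the connection URI of the cluster in the customer's required region.

    There is deliberately no fallback region: if no cluster exists in the
    required region, this raises instead of writing data somewhere else.
    """
    if customer.residency not in REGION_CLUSTERS:
        raise KeyError(f"no cluster provisioned in region {customer.residency!r}")
    _provider, uri = REGION_CLUSTERS[customer.residency]
    return uri


if __name__ == "__main__":
    print(cluster_for(Customer("Example Corp", "EU")))
```

Failing loudly on an unknown region, rather than falling back to a default cluster, is the design choice that matters here: a silent fallback would quietly break the residency requirement.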

Michael Lynn: Now, are you doing anything in the multi-cloud space?

Adam Williams: We are. Currently we run on GCP and AWS with our full-featured capabilities. We make use of AI and ML capabilities that are within the different cloud platforms, and we also deliver our own custom AI/ML solutions on the platform.

Michael Lynn: So you talked briefly about scale, remind me how many, I guess, customers and documents are you managing today?

Adam Williams: Yeah, well, we have well over 100,000 customers at Iron Mountain, and many of them are among the largest Fortune 500 companies. These companies have vast amounts of data. It's not uncommon to look at a customer and be like, wow, they have 68 million documents, right? Or they have 10 million boxes. So helping those customers sort through all of the data that they store with us is really what we do.

Michael Lynn: And how about scalability? Maybe let's back it up a bit and talk about the Insight platform and the stack. Talk a little bit about how the Insight platform was built and what components and frameworks you're using.

Adam Williams: Yeah, so we're a cloud-native solution. We run with modern technologies. We run mostly in Kubernetes for scalability; that allows us to scale up many different parts at once, to scale up for the needs of the customer, but also to scale down. A lot of our workloads have spikes in them, where we might be processing a large number of documents one week, but the next week we might have a smaller volume. So being able to scale up and back down is very important for us. That's why we use Kubernetes. Then we bring in MongoDB Atlas and Atlas Search to allow customers to search through and find their data in a NoSQL instance. We also use Elasticsearch, and at times we bring in Postgres, and we have different custom software components that we've built, as well as vendor software that we bring in to do the digitization and the scanning of our documents.

Michael Lynn: Now, MongoDB offers several types of scalability. Obviously you can scale manually: you can choose to scale up with zero downtime. But we also offer auto-scaling. I'm curious if you're leveraging that.

Adam Williams: We are. We take advantage of auto-scaling, which really allows us, like I said, to scale up for those big peak periods that we have. In the traditional sense, just a couple years ago we were running more traditional databases, like Oracle or Postgres, where you had to have a set of DBAs and system administrators closely monitoring the system for availability. They were monitoring it to make sure it was meeting the scaling needs, and they were adding additional CPUs and memory to it all the time, which creates downtime. In the new world, the new paradigm that we're living in, we have really, really great availability with MongoDB Atlas and the ability to scale. But there's one other thing in there that's really important that I want to make note of, and that's our ability to stay compliant with the different security patches and upgrades that are out there. A lot of our engineering teams' time gets sucked away putting in upgrades and patches to make sure we have the most recent versions of software running. By using PaaS solutions and managed services, we're able to give the team back more time to go work with our customers. We don't have to constantly be patching systems and making sure the latest service packs have been installed; we have the ability to do more of that automatically.

Michael Lynn: So how long has the Insight development project been underway?

Adam Williams: Yeah, it's been about three years. I'd say in the last year, we've made a lot of progress in choosing the right tools and technologies. In the first year, we started out trying to understand the space we were in. What were the problems the customers were facing? What did we need to solve those problems? In our second year of the journey, we started building the platform and we settled on cloud-based technologies that we could use. And then we learned a lot from different customer challenges, proofs of concept, and contracts that we did. Now in our third year, we're really gaining a lot of momentum. We have a lot of customers coming in. I used to be able to name all the customers off the top of my head. I can't anymore. We have professional services teams, sales teams, and delivery teams that are all mobilized to support our customer needs.

Michael Lynn: So the Insight platform itself, is this an additional offering or does it just enhance the value that you're delivering to your customers already?

Adam Williams: It enhances the value for the customers that we have already, and it can also be an additional offering. This isn't just a platform for our existing customers in some elite club. We have the ability to bring in new customers, and we've been working with new customers on some major challenges. A lot of that's been in the government space, working with customers on some of their really big digitization projects, where they're now able to turn to Iron Mountain because we can work through the volumes of data that they're providing to us.

Michael Lynn: And so talk a little bit about that space. I mean, obviously digitization and compliance and security concerns are massive in the government space. What are some of the things that may have surprised you about your implementation and launching Insight in that space?

Adam Williams: I think the patience to work through those security challenges or compliance programs. We started our US FedRAMP process about two years ago, and when I first started, I had to Google FedRAMP. I had some background in the traditional government DIACAP and DITSCAP processes, but we had to come up to speed on the security compliance packages that are needed for the government. Going through FedRAMP was a real journey for us. There are over 300 security controls that you have to meet, and doing that in the cloud is somewhat uncharted territory for a lot of folks. So you're going through and understanding the security profile and the posture that you have to build, and all of the documentation. So I think it's really the endurance and the ability to have patience as you work through that. It's hard to say that you're going to be security compliant on a certain date. You just have to go through and line up all the different capabilities that you need to build and all the documentation that you need to provide, and then the audits that you have to go through to show and demonstrate that you have all of those capabilities in place.

Michael Lynn: So what other features in the Atlas platform are you making use of today?

Adam Williams: Some of the capabilities that we really see great advantages in include the ability to index a lot of data quickly. So the ability to connect to pub/sub systems such as Kafka, and to stream the data into the database and into the search engine, is really important to us. Oftentimes we would spend time building large processes that take data from different sources and index it. Now we have the ability to put the data right onto a message bus and have it go straight into the data stores. That really saves our development team, again, more time: instead of building complex indexing processes, they just stream the data from multiple sources right into the data stores so that it can be accessed and searched by the customers. The other area that we're starting to explore is the use of MongoDB for reporting and analytics. We have a lot of challenges, or opportunities, with customers who have deep analytics and reporting they need to pull back. Customers want to be able to load that report for a million documents on the screen in just a couple seconds, right? With more traditional software engineering approaches, we would have back office programs that run and then email that report to the customer some hours or days later. But now we have the ability to generate on-demand reporting and analytics for our customers without having a large, large architecture to support it.
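On-demand reporting of the kind Adam describes is commonly built on MongoDB's aggregation pipeline, where the server filters and groups the documents and returns only summary rows. The sketch below just builds such a pipeline; the field names (`customer_id`, `doc_type`, `page_count`, `ingested_at`) and the function name are hypothetical assumptions, not Iron Mountain's schema.

```python
from datetime import datetime


def document_report_pipeline(customer_id: str, since: datetime) -> list:
    """Build a MongoDB aggregation pipeline summarizing a customer's documents.

    The field names (customer_id, doc_type, page_count, ingested_at) are
    hypothetical; a real collection would use its own schema.
    """
    return [
        # Keep only this customer's documents ingested on or after the cutoff.
        {"$match": {"customer_id": customer_id, "ingested_at": {"$gte": since}}},
        # One row per document type: how many documents and how many pages.
        {
            "$group": {
                "_id": "$doc_type",
                "documents": {"$sum": 1},
                "pages": {"$sum": "$page_count"},
            }
        },
        # Most common document types first.
        {"$sort": {"documents": -1}},
    ]


if __name__ == "__main__":
    for stage in document_report_pipeline("cust-123", datetime(2022, 1, 1)):
        print(stage)
```

With PyMongo, the pipeline would be passed to `collection.aggregate(...)`. Because the grouping runs on the server, only one row per document type comes back, however many documents matched. A compound index on `customer_id` and `ingested_at` would let the `$match` stage avoid a full collection scan.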

Michael Lynn: So I'm thinking about the use case, and it almost feels like an Amazon in reverse. You've got customers with goods, the digital assets and the paper assets, and they need to get them to a central source. What's the general timeframe that a customer can expect, if I've got some things that need to be digitized.

Adam Williams: It can really vary, from same day to same week to a large, massive project that can take several months to mobilize. What's really unique about our platform is that we have the three major pillars we've been talking about: content management, physical asset management, and intelligent document processing capabilities. Taking all three of those and putting them together allows us to deliver more quickly. For instance, we provide a single pane of glass that allows customers to log in and see all of the physical and digital assets they have on one screen. So if you're a lawyer trying to find supporting documentation for a claim you're reviewing, you have the ability to go in and find either that document or the box or file that might have that document in it. You can request to have it digitized, and we provide same-day, next-day, or within-a-couple-days turnaround time. Or, a lot of times, attorneys may look at the vast number of boxes we have and know that the document they need is somewhere in this group. So they do a larger scanning project with us to take all of that inventory, scan it through, and provide it in a digital format with full-text search.

Michael Lynn: I love that capability. I'm thinking about the IoT use cases that I've worked with. I'm wondering: do you have devices that customers can install locally to speed up the process of getting those paper assets scanned in and digitized?

Adam Williams: Well, a lot of the physical boxes that we store have RFID tags associated with them that allow us to quickly find those boxes within the records center. And then we have a vast array of different software that allows us to digitize these quickly. We have one customer who found that a lot of the metadata for the data we're storing for them was written on manifests associated with that data. We have the ability to quickly image those manifests, which allows them to more accurately determine where their data is. I think of it a lot like a library with that 1980s or 1990s card catalog: we're able to digitize that card catalog with different devices, allowing them to quickly search and find the information they're looking for.

Michael Lynn: When the Insights project launched, and you mentioned traditional databases, was there a migration? I guess I'm wondering what kind of scope this entire project was, and how you were able to take folks who were probably working as DBAs with legacy relational databases and translate those skills to MongoDB.

Adam Williams: Yeah. It's been quite the journey. At Iron Mountain, we've always had a large IT infrastructure, and we've really invested in our personnel who've been working with those systems for a long time. We went through many different cloud boot camps. We learned Kubernetes, we learned how to build with Docker containers. And I still remember a pivotal moment when we brought our entire DevOps team together and did several days of training. We just unplugged from the real world and went off and trained for a couple days. And it was amazing. You had everyone in the room, for the first time, building Docker containers and learning how to deploy them. We quickly got into Terraform and Helm charts, and that became the backbone for the system that we have today. We found that investing in the folks we already have was very important. We've also brought in a lot of new talent from the industry with different specialties, who are able to assist us with a great understanding of the cloud platforms. So it's been a journey to build that team, and it's been a big investment by Iron Mountain to build that team from Iron Mountain and contract personnel who bring all the expertise that we need.

Michael Lynn: And I have to ask the question, since folks out there might be looking for opportunities: is Iron Mountain hiring?

Adam Williams: We are. We hired a lot of people last month, and we're looking to hire more. Specifically, we're looking for folks with great software development skills. We're looking for really motivated self-starters who are interested in building cutting-edge architectures and then working with our customers to solve problems. Our customers and our engineers are often working together to solve search challenges and AI/ML metadata extraction challenges. It's a great opportunity here at Iron Mountain.

Michael Lynn: I asked you about the stack, and I'm not sure I got it. What is the stack written in? What is Insights as a platform written in?

Adam Williams: We're mostly in Java and Python, but we bring in different technologies based on the different use cases. The AI/ML capabilities that we run use Python libraries, and then we use Java services and REST APIs written in Java to deliver and integrate our solutions.

Michael Lynn: I want to double-click on the database aspect. You mentioned that there's metadata, and I'm just trying to envision what that schema might look like for a document collection. So I've got physical assets that need to be scanned in. Perhaps you're storing an image, maybe in an S3 bucket, and associating the scan or OCR data from that document with an actual MongoDB document. Is that correct?

Adam Williams: Yeah. Are you looking for a job? So basically we take the metadata and we build the schema, but we've gotten smart about that, because we found we were building the same schema over and over again. So we've taken a master data management, or MDM, approach, where we build collections, libraries, metadata, schemas, and document types around our common solutions. Two solutions that we're working on as we speak are our mail room offering and our invoice offering. We have a common schema for mail and for invoices that allows us to put those to work with customers, and then we're able to tailor them slightly for their needs or for specific challenges they're having that we need to address. So we found that having that standard approach to data, document types, and schemas was really valuable in giving us a quick start with customers.

Michael Lynn: Yeah. So, kind of a template approach. I'm curious: as you know, MongoDB supports a flexible structure, a flexible schema. Is there much variability from document to document?

Adam Williams: Oh yeah. I mean, when you look at customers, a lot of times they'll start with, "I have one or two document types," but then you really start looking and you're like, wow, I didn't realize this could turn into 100 document types. So our ability to go through, determine the different document types they have, and adjust is key. What's so important for us is that once we go live with a solution, it becomes more of a living solution, where we can go in and add additional metadata fields and grow and change our schema over time.
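The template approach and the "living" schema can be sketched in a few lines. This assumes, purely for illustration, that a template is a dict of metadata field names and types. `INVOICE_TEMPLATE`, `tailor`, and `missing_fields` are hypothetical names, not Iron Mountain's MDM implementation.

```python
from copy import deepcopy

# Hypothetical shared template for an invoice solution: the common metadata
# fields every customer starts with (field name -> expected type).
INVOICE_TEMPLATE = {
    "doc_type": "invoice",
    "fields": {
        "vendor_name": "string",
        "invoice_number": "string",
        "invoice_date": "date",
        "total_amount": "decimal",
    },
}


def tailor(template: dict, extra_fields: dict) -> dict:
    """Return a copy of a template with extra, customer-specific fields.

    The template is deep-copied, so tailoring it for one customer never
    changes the shared template or another customer's schema.
    """
    schema = deepcopy(template)
    schema["fields"].update(extra_fields)
    return schema


def missing_fields(doc: dict, schema: dict) -> list:
    """List schema fields that a stored document does not have yet."""
    return [name for name in schema["fields"] if name not in doc]


if __name__ == "__main__":
    customer_schema = tailor(INVOICE_TEMPLATE, {"po_number": "string"})
    # Later, the live solution grows a new metadata field.
    grown_schema = tailor(customer_schema, {"cost_center": "string"})
    old_invoice = {"vendor_name": "Example Vendor", "invoice_number": "1001"}
    print(missing_fields(old_invoice, grown_schema))
```

In a flexible-schema collection, invoices stored before a field such as `cost_center` was added can sit alongside newer ones that have it. A helper like `missing_fields` can then identify which older documents might be backfilled, without a migration that blocks the live solution.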

Michael Lynn: So what's on the roadmap? What's in the future for the Insights platform?

Adam Williams: Well, there's a lot. I'll start with the two solutions I just mentioned, the mail room and invoice. Iron Mountain is really situated for a very unique use case around the mail room. There are a lot of customers who get a lot of mail, but they're not at their office anymore. If you're working from home like we are and you have mail being delivered to the office, there's really not much structure in many organizations to get that mail delivered to you. So we now have the ability to deliver mail right to your inbox. We can take the full Iron Mountain capabilities that we have, where we drive to the different post offices every single day, pick up the mail, digitize it, and then provide it right in your inbox. But it doesn't just go to any inbox. It goes to your inbox, or your department's or group's inbox, so that folks are able to get their traditional mail. And then that scales into invoices and other mail that people are getting through a delivery service.

Michael Lynn: Well, obviously there's a difference between me sending a piece of physical mail to Adam Williams at your home address and getting it to your email. How do you do that?

Adam Williams: So we provide the content services platform capability, which allows you to go in and view a dashboard that has your unread mail in different areas, so you're able to manage that mail. You get a notification that you have new mail, and then you're able to go and view it. Then you can share it with other people in your organization or within your group.

Michael Lynn: That's fantastic. What a great use case and congratulations on a really fantastic use of MongoDB. It just seems like you're leveraging all of the capabilities. Adam, is there anything else you'd like to tell the folks about Iron Mountain or the Insights platform?

Adam Williams: I think I can expand a little bit on the invoice capability. We have customers who have invoices coming in from different vendors, both internal and external invoices. We're integrating with ERP systems such as SAP and Oracle to allow our customers to get those documents digitized, extract the key metadata, and then integrate it with their business processes so that they can approve and pay invoices. That's a really important capability. Coming back to some of the different capabilities we're working on this year, we're working on a very large implementation for a US government customer, digitizing over 60 million images, or rather sets of microfilm, that they have. That's a really big project. In that project, we've expanded our use of Apache Spark, which has allowed us to process millions of documents per day and to really scale. We have some other customers we're working with around their ability to search and discover the different data they're looking for. And then we're applying AI/ML models that allow us to automate that more fully, to give them more metadata they can search by, and to provide even more structure to the data that we store for them.

Michael Lynn: The future is bright. Anything else you want to share with the audience?

Adam Williams: I think that's it. I'm definitely open to discussing more about the use of MongoDB and our overall offerings and capabilities, so feel free to reach out; we're here. And if any customers that you know of have specific needs, we're the new Iron Mountain. We have the new technologies, new capabilities, and new service offerings to really take a customer's data needs from beginning to end. Some of our folks call it cradle to grave. From the moment the document's been created to the time it needs to be digitized, archived, or destroyed, we're now providing that end-to-end capability.

Michael Lynn: Yeah. Full life cycle. So Adam, how can folks get in touch with you?

Adam Williams: Well, we have a website out there, ironmountain.com, and you can reach out to our teams there. I'll even put it out there: my email is adam.williams@ironmountain.com. Hopefully I don't regret that later, but anyone who has questions about our use of technology is certainly welcome to reach out. I found early on in working with Iron Mountain that building those partnerships with other technology companies and with others is really important, because we learn from each other and we're able to share the capabilities that we're building so that we can provide even more value to our customers.

Michael Lynn: Well, Adam, I want to thank you for spending time with me and sharing details about Iron Mountain, about the Insights platform. Thanks very much.

Adam Williams: All right. Thank you.

Michael Lynn: Thanks so much to Adam for joining us today, and thanks to you, the listeners. Hey, if you want to help us out, leave a comment and a rating on Apple Podcasts or Spotify. We would love to hear your feedback. Let us know how you think we're doing and what you'd like to hear. And for more information about MongoDB World: it's June 7th through the 9th, 2022, and it's coming to New York City. You can find out more at mongodb.com/world-2022. If you're going to register, use the code podcast for 25% off. That's mongodb.com/world-2022. I hope to see you there.

