Ep 142 MongoDB Origin Story with Dwight Merriman and Lena Smart
Michael Lynn: Welcome to the show. My name is Michael Lynn and this is the MongoDB Podcast. Thanks for joining us. Today on the show, Lena Smart, Chief Security Officer of MongoDB, and I team up to interview Dwight Merriman, co-founder and key contributor to MongoDB. Dwight Merriman is a true tech legend. In addition to co-founding and co-creating the MongoDB database and 10gen, the company now called MongoDB, he also co-founded and led several other well-known successful companies including Business Insider, DoubleClick and Gilt Groupe. In today's interview, Dwight shares openly and honestly about the motivations behind creating the database, which now actually claims nearly half of the entire NoSQL market. He talks about the decision to build the database rather than use something that existed at the time. Dwight's friendly, easy to talk to, knowledgeable, and probably one of the smartest individuals that I've had the pleasure of chatting with. Without further ado, let's get to the interview. If you enjoy the content, please consider visiting Apple Podcasts or Spotify. Leave a rating and a comment if you're able, let us know what you think. Stay tuned. Hey, did you know that MongoDB University has been completely redesigned? That's right. Hands-on labs, quizzes, study guides and materials, bite-sized video lectures, programming language specific courses. You can learn MongoDB in the programming language of your choice, Node.js, Python, C#, Java, so many more. You can earn that MongoDB certification by validating your skills and leveling up your career. Visit learn.mongodb.com today.
Lena Smart: So it is my absolute pleasure, and I'm so glad that you could make it in person today, to introduce Dwight Merriman. He is the first CEO of MongoDB, and you were still coding, I understand. You're also co-founder and director of MongoDB as of today. Are you still coding?
Dwight Merriman: I'm still coding or tinkering a bit myself, but not on the database anymore. I think there's, to really dive in and work on it, there's a certain minimum number of hours a week you have to work on it, just to keep up with the code base and the state of everything, because it's not short, it's not a small program anymore.
Lena Smart: Amazing. And also in the room we have Mike Lynn, who's our developer advocate, and I know that you'll likely have some questions.
Michael Lynn: Yeah, for sure.
Lena Smart: And just fire ahead, because probably this will be the most interesting person I'll speak to in a inaudible too.
Michael Lynn: Well I'm fascinated already and I've got so many questions for Dwight, but I'm going to let you go ahead and ask away.
Lena Smart: Cool. So the first question I have, and this has been a burning question of mine since I joined three and a half years ago, is how did you start the company? How did you start MongoDB?
Dwight Merriman: Right, so when we started, actually the name of the company was 10gen, and this was around 2008, or I forget the date, maybe two months before that, I can't remember. The original, what we were really looking at, at the time, is as myself and our other co-founders like Elliot and Kevin, we've been working on various entrepreneurial projects, and we were seeing this repeated pattern over and over. New product idea, you start building the system. At this point, I've been doing that for quite a long time. So I knew what the best practices were at the time. But it was always around that timeframe, January 2008, whenever it was, it just seemed like it was always a bit awkward. It was awkward and unaesthetic, and it just seemed like there was a lot of duct tape and rubber bands. And even though those were best practices. You would talk to CTOs at the time, and they would say things like, "Putting memcached in front of databases is okay, and roll your own sharding in front of MySQL or Postgres is okay," but it isn't. It was because there wasn't a better way at the time. And everything, that was really when the cloud computing EC2 was really taking off. So it was very clear to us that cloud computing was the future, and a lot of the traditional products weren't very cloud-friendly. So if you have a database that scales vertically, so I can make it bigger, but then it's a mainframe, or a Sun 6500 or something like that, that's the opposite of a cloud principle, which is horizontal scalability and elasticity. And then if you tried to do it the other way, horizontally, it was usually rolling your own when it came to operational databases. And a lot of other things, but also just agile development was the way to go then, all iterative development. But a lot of the old tools, and this isn't just databases, but languages, everything, weren't really designed for that, because they were invented earlier. So it's not their fault. 
So we were just saying, " Gee, there's got to be a better way to develop applications," and this is both on the how to develop them, how to code them, and also on how to scale them, and how to run them in the cloud painlessly. So our first concept was just we were going to do platform as a service. So we were going to try to do a fresh take on the developer stack, versus LAMP and whatever else was common then. And see what we could come up with. So we started building a platform as a service system. It was open source and this was very early. So I think when we went to beta, it was almost exactly the same time that Google's, was it Google App Engine?
Lena Smart: Yeah.
Dwight Merriman: It's the same time it came out to beta. So our timing was, it was like when they came out with it. And I was like, "Oh, okay, somebody there's thinking similar thoughts." And so that was fine. But a few months later, as we got a little further into it, I was thinking about it and I was like, I'm looking at things like AWS, where they have all these microservices. And they're like, "I'm not going to give you a full cloud platform. I'm going to give you some building blocks for your toolbox, and over time I'll give you more." Because the scope is large, so today they have a lot of services, but this, we're 15 years later-ish. So if I give you a platform though, to give you everything you need really, it's a big scope, and it's going to take quite a while to build it. So I think platform as a service makes sense, but we got further into it, and we had something working analogous to Google App Engine, or I guess, Heroku was around back then. It just felt like, "Boy, to get this true maturity, there's so many pieces that you would want in it. It's going to take a long time. This is, it's going to take a decade or something." And for a startup you only have so much runway. And it's now even today platform as a service, I think, is a valid notion and concept, but it's certainly not mature yet. The more AWS style or microservices-style approach, which you could do on all the big cloud platforms today, I just, I say AWS because I'm just contrasting it with the PaaS vendors back in the day, approach is still the dominant approach. So we've been building this, and really what were we building? So we're trying to build something where you'd write some code, you put it in inaudible, then you would just click Deploy. And it would deploy your app into our system in the cloud, try to handle scaling for you, including things like app server layer, app tier, how many app servers should there be, and load balancing for that. All this is just happening automatically. 
You don't have to think about it at all. So it's really trying to eliminate a lot of the operational overhead. It's just, give you a platform. It's like, "Here's my app, I've written all the code, deploy it." And it just happens, and you don't think about machines at all. So this is an aspiration. Obviously what we built, there's a little bit about machines, if we look at today with MongoDB and sharding, and things like that. I mean we do have things like Serverless, but we also have things like sharding where, as the person developing a system, how many shards you have, you can change it, but it's not like it's just completely opaque in that sense. And likewise in your replica sets, you have control over how many copies of things there are. But conceptually, that was the path. We were looking at completely elastic, serverless too. But as we looked at it, we also were thinking about what would we want if we were building a new app or system. And there's certain features I wanted from the data layer, and if you really went to something that was just 100% inaudible, infinitely scalable and so forth, you're getting into things that were more like the early Amazon Dynamo stuff, where they're more, at least back then, it was just more a key value store, key document store, if you will. You didn't have the rich database functionality. So we didn't want to throw out tons and tons of data layer functionality. So our approach was, it had some traditional elements to it, but then we tried to innovate on those. And it's like, yes, it's sharded, but it's auto-sharded. You can, it'll do it, you don't have to write it yourself. And the replication, it's still replication, but it's a lot more sophisticated than the traditional just primary-secondary model, and push button on a lot of these things. So we've been building this platform, we had the app layer, data layer, and then it's just like, "Gee, this is such a large scope for a startup." 
We didn't have many people at the time, and it was maybe I feel like we should just do one or the other. We should do this, the app layer of the platform, or the data layer. So if we look back at Heroku, their data layer was Postgres, right? That's how they reduced the scope. And then in the end we decided to focus on data layer, because we were in beta with the platform.
Michael Lynn: What was the platform called by the way?
Dwight Merriman: 10gen.
Michael Lynn: 10gen? Okay.
Dwight Merriman: And then we called the data layer MongoDB. And since it was sort of a module or a component, we didn't mind using a slightly cheeky name, because it wasn't the name of the whole product at the time. But actually the background on the name, is that the concept of the Mongo is it's the middle of the word, "Humongous," and half of the point was the horizontal scalability, or easy scalability of the product. And then the other half is of developer productivity and agility. That's where the name came from. So it was the name of the subsystem. And then it's like, "Okay, that's all we're going to do now, instead of the whole platform." So there was a pivot if you will, which we did very early. Things were going fine, but we were getting very good feedback on the beta of the platform. But I was just thinking ahead in how this plays out. And it was like, "This is a lot to do." And also the rate of the adoption of that model. But then thinking about, "Well, do we do the app layer or the data layer to cut the scope?" We were getting really good feedback on the data layer of the platform from the beta testers. So they were like, "Hey, I really like this." So that helped us feel like, "Okay, maybe let's just take the data layer, let's unbundle it from this platform-as-a-service thing and just make it a database, open source database, you could run anywhere." And so we just pulled it out of the code base so it was its own thing. And then it's like, "Well, I guess we need to write some drivers." So we spent a month or two writing drivers, and then we released version 0.9. And then it was just all we were working on, was MongoDB, and that was the company.
Michael Lynn: What drove the decision to go open source?
Lena Smart: Mm-hmm. That was going to be my question. Thank you.
Michael Lynn: Sorry.
Dwight Merriman: It seemed pretty clear to us that the traditional enterprise model was changing. And obviously there was a lot of things that were open source at the time. There's a lot of things that were SaaS, and then there's some things that were freemium, that seemed like the options that people were doing for new stuff, were those three. They weren't the classic enterprise software. They were maybe free. For example, I hope, I don't get this wrong, but I think Splunk, it was free for a small amount of data, and then it turned into more enterprise software. And then of course you had any things that are SaaS, or maybe you call it infrastructure as a service, you pay for what you use, and then there's just the open source stuff. So we felt like, " Okay, we are a startup, how do we get awareness, branding, adoption?" People that try it as a startup, they're very big companies. Some of the biggest companies in the world have databases, and how do we compete with them? How do we compete with Oracle, how do we compete with Amazon? Things like this. And it seems like the open source is the asymmetry there that lets you compete with them. At the same time, it was clear that things were moving into the cloud. So when we're thinking about open source licenses, obviously you could go all the way down to BSD license, it's just free, and that's great if you're, especially for a community project. But we had investors and things like that. So we need a way to have revenue eventually, we wanted a license with more like a copyleft. It's like GPL. But with everything moving into the cloud, the traditional GPL copyleft doesn't really work. So this was clear enough to us even in 2008. So we made the license AGPL. I think, it was one of the first projects that was AGPL, and it seemed like that was the right way to go at the time. And I felt like, I was CEO at the time, so I was pretty involved in the decision. 
So it seemed like, " Well, if it's a problem, we can always just dual license it and with another license that's more flexible." You can't go from a very-
Michael Lynn: Permissive?
Dwight Merriman: Yeah, permissive license to a less permissive license. But you can go the other way, because you could still keep the other license available if you liked it, and you don't want to even go read the new one. But then you could dual license and have something more permissive. So I thought we can always go more permissive, we can't go less permissive really. And then three years ago, we actually switched the license from AGPL to this new license called SSPL, Server Side Public License, which is, it's super similar to AGPL, but if you did a inaudible on it, it's only a couple sentences are different I think. But this was a big decision we didn't take lightly, because obviously all the old releases are still available on AGPL. So it was just on a forward basis, it's like, "Let's use this SSPL thing we came up with." Which is just basically saying if what you're building is just purely a database, like a general purpose database, then you're subject to the copyleft. And this was coming out of some analysis of AGPL, and it was not totally clear that it did what the original intent was, that it totally worked legally. So we thought we needed to do that. That did push the product and the license into a slightly gray area, where there's a classic definition of open source software. Which is, there's no restrictions on how you can use it. So with GPL, you triggered a copyleft by distribution. It's not how you're using it in your application. With this, it's actually, well it sort of triggers on how you use it. So if you're doing something like Amazon RDS with the MongoDB source code, it would trigger.
Michael Lynn: So it's offering it, offering your software as a service?
Dwight Merriman: Yeah. Basically Mongo as a service, and if you offer that, you can do it with SSPL, but then you trigger the copyleft, and you have to release your code just like you did with GPL. So you could still do something like inaudible version of Mongo if you wanted it as a service. So it was really a response to things, where the cloud providers, not just Amazon, I'm not trying to pick on them, but with RDS, they're just taking every open source database, and they're making a nice wrapped management layer on it. But then it's like, no, we don't have any direct customers anymore. And they wouldn't be paying us, I think. So that was the notion. So it gets gray then, and a purist might say, "Well, that's not open source." But I think in practice it's completely practical. If you're doing applications, you can definitely use it for free and without any encumbrances. So I think the whole notion of how we define open source, and the licenses inaudible, and the definition thereof, I think is, right now, it's in a transitional stage, where it needs to be iterated on. Because I love open source, but given these cloud models, and if you wanted to do anything that had a copyleft, it just doesn't, the old ones don't work anymore. So now we've seen, since we did that, many other projects have done similar things. And I think from some of the standards bodies, I would predict we're going to see some new things that are in the spirit of that. But they were definitely not available when we thought we needed it, because we talked to them, and the speed of motion wasn't working for us. So I think in practice, basically nothing changes. You're making an app, you want to use MongoDB, you know you can use it for free. Your code is your code, you don't have to release it, or anything. You haven't triggered a copyleft there. In practice, I think it works great. But if you're an open source specialist, theorist, you write licenses and stuff, you might quibble.
Lena Smart: That was fascinating.
Michael Lynn: It was.
Lena Smart: Thank you. So I'm now going to put my CISO hat on. I guess I should have introduced myself at the beginning.
Michael Lynn: Yeah.
Lena Smart: Maybe we can cut that in. So yeah, so CISO hat. With your experience as a developer and an entrepreneur, how do you see security has changed? So you've just given us an amazing history of how this company has grown from a few people as a startup, to almost 5,000 employees. Obviously, I've worked in security for many years, and I've seen a huge number of changes. I've worked in finance, I worked in the power industry, which is highly regulated. We're moving towards FedRAMP here, which is also very highly regulated with the federal government. So where do you see, and how have you seen security change in your journey in technology?
Dwight Merriman: Yes. So I mean by way of background, coming out of college, I was a CS major. First job was just software developer full-time, right? So that was my, where it was coming from. But then over time, ended up being a bit of an entrepreneur, was involved in about half a dozen different startups. And the last one I was involved with full-time is MongoDB. Still involved, just not full-time. So the point of that, is that on the entrepreneurial side, you come up with ideas for new products, or maybe new startups. So you're trying to think ahead and think about the future. You're looking at, "Okay, what's coming? What are some trends in terms of technology? What are some trends in what users need?" And then you try to, my approach is you try to intersect those, right? So if you go back to 1995, the internet was a big trend that was just on the cusp. So we might look at well what people need then, and then try to figure out some ideas for products. And of course there's millions of them. And then every new trend after that. We could even do trends that were before that, like Local Area Networks, that didn't used to exist at one point. So when those come out, there's opportunities to create companies like inaudible or something. But then there's just so many, whether it's smartphones, social media, so many. But I feel like right now, there's an anti-trend. Which is there's some big trends right now, they're pretty clear that they're a big deal, like AI, is a big trend for the next decade. So there's going to be some startups that become big, giant companies like maybe a Microsoft or something that come out of that space. I would assume. Of course, I'm like everybody else, I'm wrong half the time.
Michael Lynn: But inaudible-
Lena Smart: But then you're right half the time, so?
Michael Lynn: Right.
Dwight Merriman: So there's some big trends right now. That's an example. But I also feel like right now, there's an anti-trend, and it's security. And by that I mean the trend is towards massive problems, because the problem is getting just harder every day, right? So information security has always, since computers have existed, been an issue. But every year it gets harder. So pre-internet, it was a bit easier, when you're not plugged into the entire planet, your systems. And then pre-people having computers at home, or their own phones, and accessing your systems from that, it was a bit easier. And just the inherent complexity of modern software, and just the amazing amount of things you can do with it easily now. Just the more complex it is, it's just likely there's more attack vectors. And then your job as the CISO, or just the security person, or the developer thinking about security, is it's just getting harder every day. It's crazy. And then you look at things like, what were those attacks? I think they were demonstrated on Intel. Spectre and what's the other one?
Lena Smart: Oh, Heartbleed?
Dwight Merriman: No, not the one I'm thinking of, look that up in a minute, but-
Lena Smart: Oh, the two NSA ones. Yeah, I know what you're talking about.
Dwight Merriman: Yeah, where you could look at timing of things, just-
Lena Smart: Meltdown. Spectre and Meltdown.
Dwight Merriman: You could figure out what's happening in the computer, and this, it's just super clever.
Lena Smart: That was in the microprocessor space.
Michael Lynn: Yes.
Dwight Merriman: Sophisticated kind of a hack. It's like, " Wow."
Lena Smart: Well then we had, the thing that I was dealing with when I was in the power industry of course, was inaudible, which was terrifying.
Dwight Merriman: Yeah. So just some of these things are crazy when you're dealing with attacks from... So just the inherent complexity, but then the sophistication of the attacks. So you got everything from the kid in their basement hacking around, to more sophisticated attacks from organized crime, let's say, or semi-organized crime, whatever you want to call it. And then you got nation-state-level attacks, which is going to be, "Well, how do you defend against that as the company when you have orders of magnitude less resources?" And they've just got a bunch of hundreds of PhDs, mathematicians, computer scientists trying to figure out how to-
Lena Smart: It used to be you could just-
Lena Smart: Yeah, you could unplug yourself from the internet, you can't do that anymore. It's impossible.
Dwight Merriman: Right. And then of course it's like, well, what part of our world then uses these systems, computers and software? What runs on it? So it's a higher percentage all the time, right? So if you can break in, but nothing runs on computers, or very little, then the scope of damage you could do is much smaller than if everything runs on computers. And you can break every household appliance in the entire planet, just as an example, or every car, because they're set up to take over-the-air updates and things like that. Break every car in the world, it's a big deal. So in the old days you couldn't connect to your car over the internet, so now you can. So the stakes are higher even if the problem hadn't gotten harder. But it has gotten harder. So it's just such a big deal. And by trend I'm saying it's going to get harder every year for the next 10 years. And the stakes are going to get higher every year for the next 10 years. We've seen, there's plenty of examples in the news and so forth of things that have happened. So I would predict it's going to get worse. So you cannot be too paranoid.
Lena Smart: Thank you for saying that.
Michael Lynn: And-
Dwight Merriman: And now of course we still need to get work done. So I'm a big proponent of you can't create too much friction. And I think a lot of the classic things to do around security, you could do them all, and you could still have holes.
Lena Smart: Well, we try and prepare. I mean you've seen, I think you've been privy to some of the things that we discussed at the board meetings specifically. We're very lucky to have Dwight and some other members of the board, who actually meet off cadence to talk just about security, which is rare, and I appreciate that time. And one of the things that we do as a team, as a security team, with development, with other groups within the company, is we run mock scenarios, tabletop exercises. And just happenstance had that we ran a tabletop exercise for a pandemic literally two weeks before we were hit with the pandemic that we got hit with. So we had tested a lot of things. The pivot was large, it wasn't small, but it wasn't like, "Oh my gosh, how do we deal with this?" It wasn't like the headless chicken routine. And I don't know whether to cry and go home, or embrace your prediction. I agree with you. I'd had the pleasure of meeting Vint Cerf recently, the godfather of the internet. And basically we were talking about hearing aids of all things, because we both wear the same hearing aid. But then we got chatting about the internet. And I was like, "Did you ever think it would be like this?" And he just shook his head and said no.
Dwight Merriman: Yeah, you could even-
Lena Smart: Because what else can you say?
Dwight Merriman: Your hearing aid could be hacked, I mean at some point, right?
Lena Smart: Well, that's where we were.
Dwight Merriman: Bluetooth and so forth.
Lena Smart: Basically it was, I could see his and he could see inaudible-
Dwight Merriman: Inaudible I mean I don't know exactly what you do then.
Lena Smart: Go deaf.
Dwight Merriman: But the point is everything is a computer now, your watch.
Michael Lynn: Connection is vulnerability.
Dwight Merriman: Yeah. Anyway, but from the Mongo, then just taking that thesis or hypothesis about security, and then applying it to MongoDB. So our goal is just, you can never be 100% sure you're secure, but is just to be extremely paranoid and vigilant about it, and do what we can. And that's why we're adding features to the product now that are about security that are I think fairly innovative. Like the queryable encryption is something that is, it might be the first database to have that feature.
Lena Smart: inaudible
Michael Lynn: Can I ask you to just, for the listeners, explain at a high level what queryable encryption is?
Dwight Merriman: Yes. So obviously, Zero Trust is a big term these days. And if you use a third party service like MongoDB Atlas where it's in the cloud and we just make it work for you, and you don't have to manage it yourself, and then that's very helpful, I think. But then you're like, " Well am I trusting this third party?" So I'd like to trust them as little as possible. But you can also say the same thing about your own internal organization. It doesn't really change when you do it yourself. It's like, " Okay, if there's a database team in my big company, and they run all the databases, I'm trusting them with my data if I am the department, and I have an app, and I have data." It's like, " Okay, are you guys secure?" And so forth and so on. And the problem doesn't go away just by being not in the cloud, for example, or on a service like that. But you do have to vet it. Actually, you have to vet your internal people or systems and processes if you do it yourself. And likewise you should vet the vendors. So we talk about supply chain, but part of your supply chain is your internal supply chain. I mean in particular in large companies, like a Fortune 500 company, where it's so big, you probably got these organizations which are servicing internal groups. They might as well be separate companies, because it's such a big company. So if you're in a company like Fortune 100, 500, you could imagine whatever you think about when you think about security and supply chain, do that internally too. Just think of each department as a supply chain thing, if it is a supplier for you. So I think the best thing to do, is to assume, well, the hardening of the security, the conservative thing to do would be just assume it's not perfect. And then what can we do? Well, the best thing to do then, would be zero trust, or as little trust as possible. 
So one thing we can do, is obviously we want to store data in databases, is we could store the data in a database in an encrypted format. Where the, let's say, we have a service that has the data and does something with it, confidential and important or something, and I want to store it in a database. But that service could encrypt it, and then send it over the wire to the database, and it could be stored in the database on disk encrypted. And it's stored in storage, encrypted. In the database program, or machine, which, theoretically, could be attacked, it's encrypted there. And then it was also encrypted on the wire all the way there. So this is fairly ideal. So it would be, "Gee, this would be nice, it's just everything in the database is encrypted." But then it's like, well, it's not really a database anymore. Because now it doesn't do anything except storage. Now you just have maybe a key value store.
Michael Lynn: Because how could you search?
Dwight Merriman: And how can you query?
Michael Lynn: Yeah, how can you query?
Dwight Merriman: So other than it would be fairly easy to do identity queries, like where you say, "Oh give me all the fields where X equals three." But I've encrypted the three, and it's some long encrypted thing, bitstream. And so it's where X equals this long encrypted bitstream. Database doesn't know it's three, it just has this, the encrypted form. So it's fairly straightforward to just do the basic queries of equality queries, let's call it, where X equals Y, right? So you could do that without a lot of fancy technologies. And I think we've had that a few versions back. And we called, that was the original MongoDB field level encryption, which we did, because we definitely thought this was important. That was not theoretically a super hard problem. It's stored in a database, encrypted, it comes over the wire encrypted, but the only query you can do is equality, or maybe, "Not equal to," also. And so the goal now is like, "Well, I mean, this is a database, I'd like to be able to do queries, not just identity. So can we do more than that?" And so researchers are doing a lot of work on this, including some that we are working with directly, who work either full-time or part-time at MongoDB.
Lena Smart: Yeah, we have four full-time cryptographers now.
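To make the equality-only idea above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not MongoDB's field level encryption: it uses a keyed hash (HMAC-SHA256) as a stand-in for deterministic encryption, and the names `SECRET_KEY`, `equality_token`, and `OpaqueStore` are hypothetical. The point it shows is only that equal plaintexts produce equal opaque tokens, so a store that never sees the value 3 can still answer "where X equals three".

```python
import hashlib
import hmac

# Hypothetical client-side secret; the store below never sees it.
SECRET_KEY = b"demo-key-held-only-by-the-client"


def equality_token(value: str) -> str:
    """Deterministic keyed token: equal plaintexts yield equal tokens."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


class OpaqueStore:
    """Stand-in for a database that only ever holds opaque tokens."""

    def __init__(self):
        self.rows = []  # list of (row_id, token) pairs

    def insert(self, row_id, token):
        self.rows.append((row_id, token))

    def find_equal(self, token):
        # The store compares byte strings; it has no idea what they mean.
        return [row_id for row_id, stored in self.rows if stored == token]


store = OpaqueStore()
store.insert("a", equality_token("3"))
store.insert("b", equality_token("7"))
store.insert("c", equality_token("3"))

# The client tokenizes the query value the same way before sending it.
matches = store.find_equal(equality_token("3"))
print(matches)
```

A keyed hash is one-way, so this sketch cannot recover the original value; a real system would store decryptable ciphertext alongside or instead of such a token. It also shows why only equality works here: the tokens for "3" and "7" carry no ordering or substring information.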
Dwight Merriman: Yeah, and then some of the researchers, I think, from Brown or-
Lena Smart: The company called inaudible, which we bought. Yep.
Dwight Merriman: inaudible, which we acquired. And then there's some folks who are, they're still at Brown as CS professors and doing security research, but they're also consultants for us. So the research that went into inaudible, the startup, which we acquired, and you can go read those papers, it's based on published peer-reviewed papers in terms of some of the things they did around queryable encryption. So basically we're building that technology into the database. And I think we're in, I don't know what we call it, a beta or version of it now.
Lena Smart: Pre-GA?
Dwight Merriman: Yeah, it's in the production release right now.
Lena Smart: It's in the pipeline.
Dwight Merriman: But that feature is, I would call it, it's for pre-production use. So you can start writing your code that makes use of it now, and it'll be stamped as production-ready soon. So the goal would be to let you put things in the database that they're encrypted before they get to the database. But certain query operations are still possible, beyond the trivial ones like identity. So for example, off the top of my head, I can't remember what else is possible so far. But I believe you can do prefix and suffix queries, which is pretty useful. And there are some other ones. I don't have the list off the top of my head. And beyond what we have so far, I think there'll be more in the future, although that might involve new research and inventions by either us or others. So prefix, suffix, it's a little bit analogous to greater than, less than, but a little different. It just turns out if it's actually prefixed rather than greater than, you can do it, and still be secure with reasonable performance.
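As a rough intuition for how a prefix query over opaque values could work at all, the following Python sketch stores one keyed token per prefix of each value. This is a deliberately naive scheme chosen for illustration, not the peer-reviewed construction discussed in the interview and not how MongoDB implements queryable encryption; the names `KEY`, `token`, `prefix_tokens`, and `find_prefix` are hypothetical. It leaks more than a real scheme should (for example, the number of tokens reveals the value's length), which is the kind of hole that makes careful research necessary.

```python
import hashlib
import hmac

KEY = b"demo-key-held-only-by-the-client"  # hypothetical client-side secret


def token(text: str) -> str:
    """Keyed, deterministic token for a piece of text."""
    return hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()


def prefix_tokens(value: str) -> set:
    """One token per non-empty prefix of the value."""
    return {token(value[:i]) for i in range(1, len(value) + 1)}


# What the server would hold: row id -> set of opaque prefix tokens.
index = {
    "r1": prefix_tokens("merriman"),
    "r2": prefix_tokens("mercer"),
    "r3": prefix_tokens("smart"),
}


def find_prefix(prefix: str) -> list:
    """Client tokenizes the prefix; server only tests set membership."""
    wanted = token(prefix)
    return sorted(row for row, tokens in index.items() if wanted in tokens)


print(find_prefix("mer"))
```

A suffix query could be sketched the same way by tokenizing suffixes instead of prefixes. A general greater-than comparison cannot, because these tokens preserve no ordering.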
Michael Lynn: So does it have to do with key distribution? Is there a key exchange?
Lena Smart: There's key management.
Michael Lynn: And do you think there'll ever be a time when there's zero lost functionality between secured fields?
Lena Smart: You mean inaudible overhead?
Michael Lynn: Yeah.
Lena Smart: inaudible.
Michael Lynn: Well, our overhead is one thing, right? Because the data is encrypted, so there's an encryption-decryption, but today there's prefix and suffix searching. Search is a phenomenally valuable space, and without the search, we've got some incredible capability there. Do you ever see a point where there'll be zero difference between encrypted field capabilities in MongoDB?
Dwight Merriman: I think that's a hard question. Can you do everything with encrypted that you would want to do? I actually haven't thought about it enough, that maybe there's a clear answer, but I haven't thought of that. But so I don't know. I'm going to assume for now that maybe not. But maybe that's okay, because inaudible stores some data in the database, not every field is equally important. Social security number, pretty important. Certain healthcare values in a healthcare database, pretty important. Some other ones, a little less important. So at this point it's like can we just get the really critical ones, which is probably a minority of the fields, into this, and still have some capabilities in the database, like with query building and things like this. Over time, we're going to do as much as we can that makes sense. I mean currently there is some overhead to it, it's explained in the documentation. But if you're picking these critical fields, rather than just everything is encrypted, it's okay, I would say. And we'll see, as it's an area of current research, so we'll see what happens over time, and how efficient we can make these things. And there's probably some optimizations to be done, just in the code. That's more, it's not a research problem, it's more an engineering problem that we can do too, improving the performance. The other thing is you've got to be really careful, because you might think, "I could probably make up some way to do what we were talking about in half an hour," but maybe it has holes in it, and there's certain attacks against it. And vulnerabilities are, maybe it's not just completely broken, but maybe you can figure out certain things that you shouldn't around the edge. Or maybe it's just flat-out wrong. So you really need to think very carefully about this as you're doing the research. And having this stuff that's based on published peer-reviewed research, is good, because the peer review is pretty darn important.
Because somebody might read it, the peer, and they're like, "Well what if this attacker does X?" And then you're like, "Oh." And then you're like, "I'll get back to you," have to go fly with this. So that's really helpful because it's really not that unusual for there to be a security mechanism algorithm that has an issue. I mean just look at encryption algorithms. A lot of them have had flaws in them. So just flat, traditional secret key encryption. They could have weaknesses, where they're not as, they're easier to crack than you thought. In terms of say time complexity of some kind of an attack on it, brute force attack or something. So also it's just things like imagine, we have some value like three we want to, going back to the field level encryption, we want to store in a database. You read it back, and now imagine that that field gets updated later, and it gets written again, and the unencrypted value is three again. So in the perfect world, when it gets set the second time, the encrypted form would be different. Because if I could watch the network or something, let's say, of course you can use encryption over the network too, but actually once it's in the RAM of the database server, it's not encrypted. Or the packet data isn't encrypted. So you just got whatever it was encrypted as from the source. So if somebody had breached that, and they were seeing the encrypted form of three before, and the encrypted form of three later when it was set later, if those encrypted forms are the same, they don't know what the value is, but they know it didn't change. So they learn something, and we don't really want them to learn something. So ideally, even if you set the same value a second time.
Lena Smart: Should be a different encrypted value.
Dwight Merriman: It should be different value.
Lena Smart: Yeah.
Dwight Merriman: And you could also imagine if you have a bunch of documents about people, and there's a field that has a certain value, like true or false for you have this disease, you wouldn't want the encrypted form to be the same for all the users. Because maybe I'm a user in there, and I know I have that disease. So then if I saw that, "Okay the encrypted value is blah blah blah," for my document-
Lena Smart: Then you know who else inaudible-
Dwight Merriman: I know it's a true for me. So then one attack is then you just look for everyone else with the same encrypted value.
Michael Lynn: Same encrypted value.
Dwight Merriman: So we're not doing that with the new queryable encryption. It is a little smarter than that. So there's a lot of things to think about like that. So it's not 100% about query-ability, it's also about just generic robustness of doing field-level encryption. And that's an example of a fairly simplistic attack you can imagine if you just do very, encrypt the field on the client, and shoot it over, the server doesn't know what it is.
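The "same encrypted value" attack discussed above can be illustrated with a short sketch. It is a toy, not MongoDB's queryable encryption: HMAC-SHA256 stands in for deterministic encryption, and the randomized variant simply mixes in a fresh nonce (which also means it can no longer be matched by equality). All names and the sample data are hypothetical.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)  # hypothetical client-side key

def deterministic(value: str) -> str:
    """Same plaintext always yields the same stored form (leaks equality)."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

def randomized(value: str) -> str:
    """A fresh random nonce makes every stored form different, even for equal plaintexts."""
    nonce = secrets.token_bytes(16)
    tag = hmac.new(KEY, nonce + value.encode(), hashlib.sha256).hexdigest()
    return nonce.hex() + ":" + tag

# Hypothetical "has this disease" field for four users.
patients = {"alice": "true", "bob": "false", "carol": "true", "dave": "false"}

det_store = {name: deterministic(v) for name, v in patients.items()}
rnd_store = {name: randomized(v) for name, v in patients.items()}

def same_as(store: dict, known_user: str) -> list:
    """Attacker who knows one user's plaintext looks for identical stored values."""
    target = store[known_user]
    return sorted(n for n, c in store.items() if c == target and n != known_user)

print(same_as(det_store, "alice"))  # ['carol']: the attacker learns carol's value too
print(same_as(rnd_store, "alice"))  # []: nothing matches
```

The same comparison also covers the earlier point about rewriting a field: with the deterministic form, writing three twice produces identical stored values, so an observer learns the value did not change.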
Lena Smart: But to your point earlier when you said, and I'm cognizant of time, what time do you have until?
Dwight Merriman: I'm pretty good.
Lena Smart: Because you mentioned earlier things are going to get worse before they get better. I'm not even sure if they'll ever get better, but we can live in hope. So with something like queryable encryption, if I'm the bad-guy hacker, data is gold. So I'm going after data, I'm going after information, I'm going after your bank account details so I can empty it. If it's encrypted end to end, I can't make head or tail of it, I'm going to go to the next place that's not encrypted. Do you see something like this queryable encryption being a game-changer for the security world? Because that's where I'm coming at it from. What can I do to make our customers more secure in what we are doing with their data? And also on a personal level, what's my bank doing with my data to secure it from the bad guys?
Dwight Merriman: Yeah it is true that, it's like if it looks like your front door has a better lock than your neighbor's front door, then maybe the robber goes to your neighbor's house.
Lena Smart: Inaudible with the-
Dwight Merriman: I mean, so for an individual that works, but for society it doesn't really change anything, maybe, overall.
Lena Smart: But if it was that one individual who was going to rob my house, I'm quite happy if it works.
Dwight Merriman: So yeah, and it makes sense as an attacker to, the thing, you just do these very broad sometimes attacks, and you just see where you get a success. So if there's some vulnerability and it's one computer out of a thousand, but if I go hit a thousand of them, or a million of them, I find a thousand that are vulnerable.
Lena Smart: But then if you put your criminal hat on, you're walking down the street, and you see the label that says, "Guardian Security," it's like, "Well, there must be something worth stealing there. So I'll wait until they're out, and go back, and then go take a peek." Sometimes, I mean you can flip it on its head.
Dwight Merriman: I mean, there's no silver bullet. So the latest thing from us, we're doing lots of things around security. But the latest, maybe the most innovative at the moment thing, would be the queryable encryption. But there's no silver bullet, there's no one thing you can do that solves all problems. So I think from our point of view, it's a never-ending effort to make it better. So we'll try to come up with some new innovative things around the security. And then we'll try to just do more and more about all the classic things you do. I mean there's a lot of different attacks. So having the data encrypted is good, but ransomware? Like, "Okay great, it's encrypted but I still, you don't have it anymore."
Lena Smart: Right.
Dwight Merriman: Okay. I didn't solve that problem for you with queryable encryption.
Lena Smart: True.
Dwight Merriman: Right. Now-
Lena Smart: But the thing they have ransomed, inaudible hopefully, still.
Dwight Merriman: But it's, " Do you need the info?"
Lena Smart: Exactly.
Dwight Merriman: "I can't read it, but what's it worth to you?"
Lena Smart: You want to read it? Yes.
Dwight Merriman: Would you like it back with me never having seen it?
Lena Smart: Yes.
Dwight Merriman: And would you like to pay me? So things like that are things we think about.
Lena Smart: Is that where something like multi-cloud could come in then, having multiple copies in multiple clouds, and making it just more difficult to find things?
Dwight Merriman: Yeah. MongoDB Atlas supports multi-cloud. So right where you can have replicas that are on different clouds. So if somebody was able to get in on one cloud provider, either our account through some attack on us, or some attack on the cloud provider, that doesn't affect getting to the other cloud provider. So that heterogeneity does give you some safety in terms of not losing the data completely.
Lena Smart: You're right, there's no silver bullet, but if you've got zero trust and queryable encryption or field level encryption and multi-cloud, it's like you're wrapping and wrapping and wrapping.
Dwight Merriman: Maybe, I mean the more cloud providers I add, maybe the more I care the data's encrypted, right? Because obviously you might put your data on two or three. But if you just take the thought experiment, if you had it, if you were running on 50 cloud providers with a replica of the data, you would be like, "Well, I think one out of 50 is going to get hacked." And then it's like, "Well, they can't destroy the data because I've got 49 other copies they can't get into."
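The 50-provider thought experiment can be put into rough numbers. The sketch below assumes, purely for illustration, that each provider is breached independently with the same probability `p`; the 2% figure in the usage lines is a made-up placeholder, not a measured rate, and the function names are hypothetical.

```python
def p_any_breached(p: float, n: int) -> float:
    """Chance that at least one of n independent replicas is breached (confidentiality risk)."""
    return 1 - (1 - p) ** n

def p_all_breached(p: float, n: int) -> float:
    """Chance that every one of n independent replicas is breached (risk of losing all copies)."""
    return p ** n

# Hypothetical 2% per-provider breach probability.
for n in (1, 3, 50):
    print(n, round(p_any_breached(0.02, n), 4), p_all_breached(0.02, n))
```

Under these assumptions, adding replicas makes losing every copy rapidly less likely while making a breach of some copy more likely, which matches the remark that more providers give more reason to care that the data is encrypted.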
Lena Smart: If I'm ransomwared inaudible-
Michael Lynn: Unless you're partitioning and multi-cloud sharding-
Dwight Merriman: Yes, which is a different topic. But the basic idea here was just that there's redundant copies, replicas on the different providers. So there's a lot to think about. So all of these things we're thinking about, and trying to be paranoid about, and do the right things. And there's a million things, just all the classic stuff, physical security, social engineering, things that are completely different than the topic we were talking about. But you'd like to make everything as good as you can, and then if anything does go wrong, minimize the damage. So if something did go wrong, somebody did get the document, all the super confidential fields are encrypted. Okay, the damage is minimized at least, and hopefully they never get in at all.
Lena Smart: Well, I could keep you here talking all day and I know you don't have all day, so I really appreciate it. Do you have one last piece of advice for me as CISO, apart from, " Look for another job, maybe serving coffee somewhere?"
Dwight Merriman: One thing I think is important, and I think maybe this is obvious to you, and you already know this, is just I think in an organization is, don't have one security policy for everything. Because usually when you add procedures and processes and so forth, there's overhead. And you're slowing things down, you're making your organization less agile. So don't apply the same security rules for the organization for your most critical systems and your least critical systems. So if you're a hospital, the patient health records, there should be a lot of rules around that, or processes. But the expense reporting system at the hospital doesn't need to be as hardened. And if you take what you did for the health records on the expense report system, maybe it takes twice as long to deploy that, and twice as much work and cost. And it's like, "Well, why did you do that?"
Lena Smart: So know your crown jewels and protect them according to-
Dwight Merriman: Yes, you need to have two or three, a few buckets. It doesn't have to be that many, and treat them differently. So you just don't lose all agility and productivity, right? Because it's a challenge, because it's you want to be secure, but you need to be agile, you need to move faster than your competitors, or at least as fast. So how do you do that? Well, I mean maybe if their security is bad, you can't. Yours is good, but you can really make yourself slow, is just to have this super high level of rules, restrictions, policies, procedures on things-
Lena Smart: Inaudible, yeah.
Dwight Merriman: ... that are not, yeah, I don't want to lose my expense report data but-
Lena Smart: The company will still keep going if they lose it.
Dwight Merriman: Lose it, but we probably won't go out of business. So I think that's important, and it's pretty easy if your job is all about security to just think of all the things you need to do, and just want them down on everything. But maybe not?
Lena Smart: You were the first CEO at this company, you were one of the co-founders. I'm the first CISO at the company. When you founded this company, when it first started, what state was security in?
Dwight Merriman: So we started working on the MongoDB, we started from a blank sheet of paper. So we're going back to 2008, let's say. And your first goal is to have something, it's a prototype, proof of concept. Like, "Is this actually useful?" So when there's no users, security isn't the biggest problem, because there's nothing to lose. Now we have people, we have banks and hospitals and things doing super mission critical things with MongoDB, and with Atlas. So it's a totally different situation. And the problem is much worse now, in the world of information security than 15 years ago. So when we started, we were, one thought was, when I would look at the way security worked in traditional databases, accounts, access control, things like that, other things, it just seemed, I mean some of it seemed a little off. Like, you can create in a lot of old relational databases, you can create users, and you can give them passwords, and things like that. But it's like, "Well, I need the concept." A user might be a system, or service, it might not be a person, but it's like, "Well, I need that user to be the service, or the person not just in the database." So I really want some holistic, for-my-whole-system concept of identity, and I don't want it just in the database. So we want to use something more like LDAP or whatever comes after that. And the inaudible systems that we can use for everything we're building. So hopefully it works with the database, and it also works if I'm doing service-oriented architecture in the services, and it works for my ops people. And does all the right things, and there's tons of products for that now. So back then, I mean, there was much less stuff. But it seemed to me that it was clear that this shouldn't just be in the database. But at the same time I wanted to see how this was going to shake out. So we started out just with basically no security.
So the concept was just run this in a trusted environment, and we won't give you any security features, because it's like, "Well, which ones do you want?" And then without getting into all these problems I was talking about of, "Okay I got to get, in the database I'm creating a user with these privileges, and then over here in this product, I'm creating the same user with these privileges, and in the app there is a notion of that role with these privileges," just didn't seem to make a lot of sense. So we started out, and I was thinking like, "Well, let's consider like Memcached." So in Memcached, the notion was, it just runs in a secure environment. And a lot of companies use Memcached, and it's just in the inaudible, it needs to be in its own fenced environment, inaudible or something. And it's like, "Well, let's just start with that, and then over time, we'll add security features and we'll ask our users what they want." And now today there's tons of stuff, but that was the very beginning. And it was at the beginning, you could critique it, and you could say, "Well this is crazy, there's no security in here." But to some degree that's fine, that's not a completely invalid critique. But it was also like, "Well, the old way of doing it is going away anyway." So I don't quite want to do that. I want to do the new way, and we'll do it as we can. And we have now. I think, so it's interesting, because we've gone over the 15 years from one extreme end of the spectrum, which is, "You make sure, you the deployer, the ops team, just make sure it's in the secure environment, and there's no security, and it's very simple. There's nothing to get confused by, and mess up also." But now we're at something that's more at the opposite end of the spectrum, and also just because the problem is much worse now.
Lena Smart: Yeah, I think as well, I was interviewed recently by a magazine, inaudible, and one of the things that they loved, was that we hadn't monetized security. Because we were talking about Oracle and how they had basically monetized, and I don't know if we want to bash Oracle. But how they had basically monetized. When you buy an Oracle database, you have to buy security on top of that, and it's usually at 50% of what you just paid just for the security module.
Dwight Merriman: Wow. I didn't know that.
Lena Smart: Yeah.
Michael Lynn: Did you, if you have time, I'm just curious, in your wildest dreams, did you think about a MongoDB that exists today? Did some of the things that you thought about back then, come to fruition? And how far beyond what you were originally thinking has the company gone?
Dwight Merriman: Well, once we switch to just solely working on a database, I think the original thesis has held up quite well. And there hasn't been any shocking surprises or changes. So we create this application, data platform, a way to build modern applications using modern software engineering methodologies that involve a lot of iteration, and a lot of data that has complex structures. Maybe it's polymorphic, things like that, make that all very easy. So in relational, it's... Imagine if you're trying to be agile, and you have a system, you want to do a release every day, right? How are you iterating? Well, do you want to do a schema migration script every day? No. Okay, so it's like, "Well, I need a solution for that." And I'm not blaming relational at all for that, because it was invented decades before agile development and methodology was invented. So nobody was asking them for that back then, but when we started on MongoDB, that did exist. So it's like, well, if it's possible to add a new field to a document in the collection, and not every other document in the collection already has it. Or the other documents in the collection inaudible field that's a different data type than this new one, where I need it to be an array now instead of a singleton value, there's no schema migration. So it's just things like that fit well with the development methodology of today. And so that was very much on our mind. What would allow me to be productive and fast, but in a rigorous way, not in a way that's full of bugs? And even aesthetic, from an engineer perspective, for data layer, for an operational database. And we were developers, and so we just tried to build what we wish we had. So we had two real things we wanted, which is, one, is that agility side, where I can deal with modern data shapes and things like the polymorphism and the data, the schema, let's call it schema evolution that occurs in an iterative, agile development situation.
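The schema evolution point above (a field that is missing in some documents, a single value in others, and an array in newer ones) can be sketched with plain Python dictionaries standing in for documents. This is only an illustration of the data shapes, with no database involved; the collection contents and the `phones` helper are hypothetical.

```python
# A "collection" modelled as a plain list of dicts; no database is involved.
collection = [
    {"_id": 1, "name": "Ada", "phone": "555-0100"},                  # older shape: single value
    {"_id": 2, "name": "Grace", "phone": ["555-0101", "555-0102"]},  # newer shape: array
    {"_id": 3, "name": "Linus"},                                     # field not present at all
]

def phones(doc: dict) -> list:
    """Read the field tolerantly, so old and new document shapes can coexist."""
    value = doc.get("phone")
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]

for doc in collection:
    print(doc["name"], phones(doc))
```

The design choice being illustrated is that the application absorbs the shape change in its read path, so a release that starts writing arrays does not need a migration script to rewrite the older documents first.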
We wanted that to all be easy. I mean, obviously, objects are a thing. So we, if I have object style data, it shouldn't be hard to store it. And so we call that documents, because we're not storing the code, we're just storing the data. So we call it documents, not objects, which I think makes sense. I think one of the big ideas from databases, is separating data from the code. We go back far enough in time, you're doing things like maybe have a inaudible library, and you're linked into your program. Or you're using VSAM and it's not really a separate database, or whatever, but that's going back a long ways. So separating the data from the code was important. But yeah, that concept of schema evolution I think was important for us: that should be easy, because we are iterating. So today's schema is not the schema six months from now. And how do you deal with that? So sometimes people will say, "MongoDB is schema-less." That's not true. That's probably, it's a little bit of a misnomer. There's always an implicit schema. You should always design a schema. And we do have features now where you can totally enforce the schema just like you could in a traditional database. But you don't have to. And by default, it doesn't ask you to, because it's very easy to put a bunch of junk in the database without violating the schema rules, right? So that was the thesis. So we wanted that, and then we wanted horizontal scalability. And if we can scale horizontally on commodity hardware, like commodity servers, then it should fit well with cloud computing. So that was part of our goal. That it might be a inaudible from just the concept of horizontal scalability, that without the user having to build it themselves. And things like just fault tolerance and failover, I don't want to have to build that myself. If you had an old database, master-slave replication. But how does a failover work?
And then if you do fail over to the secondary from the primary, how do you get the primary back in sync later? Well that's all pretty much automatic with MongoDB. So that was part of our goal too. So it's just like, "What do we want in 2008 to build modern applications as an application data platform?" And so then we started, we built that. One thing we wanted is we wanted a fairly high degree of functionality. We wanted to be able to do ad hoc queries. So we had a query optimizer. We wanted to be able to do sorting, and we didn't have transactions in the early days, but we do now, because with sharding, that's non-trivial. But basically that's a long answer, saying that was a thesis, and I think it really hasn't changed. And just really focusing on the database, mostly a database behind an application. So it's an online database, or an operational database, or an inaudible database. Pick your term. And so the original idea has held up very well, and there hasn't been a lot of searching for the right product concept. And then it's just, so we really like, this is the idea. And then in our perfect world, huge numbers of developers and applications are using this as their data layer.
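The remark earlier in this answer that a schema can be enforced, but does not have to be, corresponds to MongoDB's schema validation with the `$jsonSchema` operator. The sketch below builds such a validator as a Python dictionary; the collection name, field names, and the local `missing_required` helper are hypothetical, and the PyMongo call is shown only as a comment because it needs a running server.

```python
# A $jsonSchema validator document; the field names are illustrative assumptions.
person_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "ssn"],
        "properties": {
            "name": {"bsonType": "string"},
            "ssn": {"bsonType": "string"},
            "phones": {"bsonType": "array", "items": {"bsonType": "string"}},
        },
    }
}

def missing_required(doc: dict, validator: dict = person_validator) -> list:
    """Local illustration only: list required fields the document lacks."""
    required = validator["$jsonSchema"]["required"]
    return [field for field in required if field not in doc]

print(missing_required({"name": "Ada"}))  # ['ssn']

# With PyMongo installed and a server running, the validator could be attached
# when the collection is created, roughly like this ("demo" and "people" are
# hypothetical names):
#   from pymongo import MongoClient
#   db = MongoClient()["demo"]
#   db.create_collection("people", validator=person_validator)
```

Without a validator the server accepts any document shape, which is the default behavior described in the conversation.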
Lena Smart: inaudible.
Dwight Merriman: That's our goal. And we're still on the path, the journey, but we've made it pretty far.
Michael Lynn: So one last question, and I don't know if it's really a question, or just asking to illuminate. This is, MongoDB's not your first success, first wild success. What were you successful with prior to MongoDB?
Dwight Merriman: Yeah, so I was one of the co-founders of a few startups. I probably tried a couple things that didn't work quite as well, but the ones that worked well before MongoDB, were DoubleClick, Business Insider, and Gilt Groupe. And also, so just seeing what was done at all of these companies, building systems, and also just having friends in the industry who were CTOs, building new, either in enterprises or in startups, building new things, and seeing how we're all doing things. And all the problems which occur, including things like scaling the data layer. And then seeing things like, "Okay, we think about things like ACID properties and transactions as being important, and so forth. And then we stick a Memcached farm in front of the database." Okay, so the data in the cache is stale, right? So these various guarantees you had in the database on the data being current and so forth, aren't there anymore. But everyone stuck Memcached in front of databases, because they were not fast enough. So we were looking at all of this, and it's like, "There's got to be a better way to do this." So between all these startups and friends doing startups, it's just like, "Okay, I know if you're using Memcached in front of a database, people aren't going to tell you, 'Well, that's just fundamentally wrong, and you don't know what you're doing,' at that point in time." But it's like, "Well, it's sort of wrong though. It's just we don't have a better solution right now." So that was part of the catalyst for trying to build something new.
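The stale-cache problem described here can be shown with two dictionaries, one standing in for the database and one for a Memcached-style cache. This is a minimal sketch of a cache placed in front of a store with no invalidation; the names `database`, `cache`, `read_through_cache`, and `write_database_only` are hypothetical, and no real Memcached client is used.

```python
# Two plain dicts stand in for the database and the cache in front of it.
database = {"balance": 100}
cache = {}

def read_through_cache(key):
    """Serve from the cache when possible; fall back to the database on a miss."""
    if key not in cache:
        cache[key] = database[key]
    return cache[key]

def write_database_only(key, value):
    """A writer that updates the database but never invalidates the cache."""
    database[key] = value

first = read_through_cache("balance")   # cache miss, loads 100 from the database
write_database_only("balance", 40)      # the database now says 40
stale = read_through_cache("balance")   # cache hit, still returns the old value
fresh = database["balance"]

print(first, stale, fresh)  # 100 100 40
```

Whatever guarantees the database gives about current data stop at the database; a reader going through the cache sees the old value until the entry is evicted or invalidated.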
Lena Smart: It was fascinating.
Michael Lynn: Yeah.
Lena Smart: Thank you so much.
Dwight Merriman: You're welcome.
Michael Lynn: Thanks so much for spending time.
Dwight Merriman: No problem.
Michael Lynn: inaudible. Thanks so much to Dwight for joining us today. And thanks to Lena for leading the discussion. Check the show notes for links to some of the things we discussed. Once again, MongoDB University, it's been completely redesigned. And one of the things I love about it, is the in-browser experience. You don't even have to install MongoDB to go through the exercises. Check that out at learn.mongodb.com. Thanks everybody. Have a great day.