Ep. 178 Life at MongoDB: Building the Future of Search

This is a podcast episode titled, Ep. 178 Life at MongoDB: Building the Future of Search. The summary for this episode is: Tune in to the MongoDB Podcast this week for a live chat with Doug Tarr, VP of Engineering. Doug leads the Search team at MongoDB and is looking to expand since the announcement of the newest addition to our platform: Vector Search.

Listen for:
  • [01:09 - 03:17] Why is search separate from database?
  • [03:51 - 06:53] The functionality of Atlas Search and how it's differentiated from the database
  • [12:30 - 15:15] How Doug ended up at MongoDB
  • [16:48 - 19:51] What Vector Search is
  • [23:30 - 27:45] What developers can do with Vector Search
  • [29:32 - 32:40] The architecture Vector Search is in
  • [40:10 - 41:13] Important skill/quality to have for a successful career

Links:
  • https://mdb.link/engineering-careers
  • https://mdb.link/atlas-vector-search
Why is search separate from database?
02:07 MIN
The functionality of Atlas Search and how it's differentiated from the database
03:01 MIN
How Doug ended up at MongoDB
02:45 MIN
What Vector Search is
03:02 MIN
What developers can do with Vector Search
04:15 MIN
The architecture Vector Search is in
03:07 MIN
Important skill/quality to have for a successful career
01:02 MIN

Doug Tarr: Hey, everybody, this is Doug Tarr. I'm vice president of engineering at MongoDB, and this is the MongoDB Podcast.

Michael: Well, welcome, Doug. It's great to have you on the podcast. How are you doing today?

Doug Tarr: I'm doing good. Doing good.

Michael: Fantastic. Today, I'm really delighted to have you on the show. We've been talking about doing this episode for quite some time, and we're going to focus on some of the technologies that fall into your responsibilities at MongoDB. But for the listeners that may not know, introduce yourself. Let folks know who you are and what you do.

Doug Tarr: I am VP of engineering at MongoDB responsible for Atlas Search and now Vector Search. What that means is I manage multiple teams who are building the Atlas Search and Vector Search products here. We've been building Atlas Search for about four and a half years now, and we have three teams. One team is responsible for our core engine, one team does distributed systems, and one team does our web platform and developer experience.

Michael: Talk a little bit about search. Why is search separate from the database?

Doug Tarr: Yeah, that's a good question. MongoDB itself is such a core database company. Everybody sees the world through the database and frankly, our customers do too. Search is so closely connected to what people see as a database that they often don't even understand how it's different, but you could probably describe it as the difference between having to recall all of your query results and just finding the best ones. When you run a database query, if you don't get all the results back, it's usually considered a big problem, an error. We throw an exception, customers are upset, and it doesn't do what you want. Whereas with search, what you're really trying to do is provide the most relevant results for people, and you're trying to do it quickly. If you think of any search engine you've used, Google or whatnot, they don't give you every result. There's no way you can actually search through all that stuff. What you do is you look at the top results and you make sure that those results are relevant to you. That's really the core fundamental difference. Everything else about the technology, the architecture, and the infrastructure kind of flows from there.

Michael: The functionality of searching a database is found in MongoDB's query language. Folks will know that as part of the MongoDB Query Language, we have operators like find and findOne. Where do the conventional find capability and search begin to differentiate?

Doug Tarr: Most people when they're building an application, that's how they start. They have find. They insert. They have a table of results. Where search comes in is when you usually start with a natural language query. You might just type into a search box, for example. If you tried to do that with find, you would probably not get very good results. You basically would get the exact match or something like it or just a very simple pattern match. But that's really not what people want. They expect you to understand the language and the grammar associated with that query and break it up and then build a much more sophisticated query that lets you rank those results according to your application's needs.
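
To make that difference concrete, here is a minimal sketch in Python with pymongo. The connection string, database, collection, and index name are hypothetical stand-ins rather than anything from the episode: find() gives you exact or simple pattern matches, while Atlas Search's $search aggregation stage analyzes the query and ranks results by relevance.

    from pymongo import MongoClient

    # Hypothetical connection and collection; substitute your own Atlas cluster.
    client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net")
    products = client["shop"]["products"]

    # A plain find() only does exact or simple pattern matching:
    exact = products.find({"title": "dog leash"})
    pattern = products.find({"title": {"$regex": "dog leash", "$options": "i"}})

    # Atlas Search tokenizes and analyzes the query, then ranks by relevance.
    # "default" is the name of an Atlas Search index created beforehand.
    ranked = products.aggregate([
        {"$search": {
            "index": "default",
            "text": {"query": "dog leash", "path": "title"},
        }},
        {"$limit": 10},  # the best results, not all of them
    ])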

Michael: Yeah, I think that's the key. I think a lot of people start down that path with find and then end up trying to implement on the front end maybe a regular expression or some type of way to expand just searching for the specific text that the user of the application is trying to find. You mentioned it's Atlas Search and that's the search product that can be found in the MongoDB Atlas platform, which is our cloud database service. Talk a little bit about the functionality of Atlas Search and how it's differentiated from the database.

Doug Tarr: When we started building Atlas Search, we looked at a lot of technologies to figure out where to start, and we settled on Apache Lucene as the core library that powers it. If you start from there, you realize now you've got this entire other system. MongoDB as a database is written in C++. It's kind of a monolithic code base. There are definitely different components to it, but it's all the same. You compile out all the same binaries. Whereas Atlas Search is a different system altogether. Apache Lucene runs on Java. We created an entirely separate process to run that. Part of the reason it's in Atlas is because managing multiple systems together is really difficult. One of the reasons people like Atlas is we do that for them and take care of a lot of the details. At its core, what Atlas Search is is a Java process, we call it MongoT, that runs next to the database process, your MongoD and your MongoS. Those two are tightly integrated at multiple levels, in the query layer, in the networking layer, all throughout the system, to provide an integrated experience.

Michael: I think that's the key, reducing the management overhead for the developer, for the person running the service. I suppose you could set up your own Lucene instance and then manage the indexing and sharing of the information between the database and Lucene, but that's all done for you in Atlas Search, right?

Doug Tarr: Yeah. In fact, before Atlas Search came around, I think that that is the way people did it. They had their own search engine, and there are lots of options in the market, but really where we see one of our core differentiators is being able to be super integrated and have a great experience for a MongoDB developer.

Michael: No transferring of data. No ETLs. It all happens behind the scenes with a nice management interface built right into the administrative GUI for Atlas. Why Lucene?

Doug Tarr: I've been in this industry for, I don't even know, 25 years or so. Lucene came out in the early 2000s and has really been the most rock-solid, market-tested solution out there. A lot of people look at Java and they're like, "Oh, it can't be fast. It's Java." But there's so much work that the open source community around Lucene has developed that it is really competitive with any solution. It doesn't matter if it's written in Java, C++, Rust, what have you. Lucene is really competitive, offers a lot of features, and really gave us a boost right out of the gates in terms of being able to offer a competitive solution for people running their production applications.

Michael: It's sort of the standard, the best search standard in the open source community. I think one of the great things that I like about our investment and use of Lucene is that we're helping Lucene to continue to be viable. We're helping the project continue to exist. Is that right?

Doug Tarr: Yeah. I think that we have a big team of engineers who are working on MongoT and Lucene. Fortunately for us, when you come into a library that's been around this long, the kinds of improvements you need to make to it to make it work for you are pretty specialized. For us, we've been able to just really leverage it and use a lot of those features, but we're really excited about the future of Lucene and we're excited to contribute more to it as we move into areas of Lucene that are less mature. Anybody can build on Lucene today. I think you can learn it, you can build it, and you can build a pretty fast, reliable search engine. But once you've started to move into really deep areas where people are really pressure testing it, that's when I think we can provide value, because a lot of our customers are doing a lot of really interesting things with Atlas Search. Until you see those things, until you hit the limits of what a product can do, you're just going to use it as an engineer. Why would you do more? But now that we have a lot of talented engineers who are really familiar with it, yeah, I definitely see us contributing more back.

Michael: I want to talk about that team of engineers. It's an interesting time where we're continuing to grow. One of the things that's great is that you're adding to your team. Can you talk a little bit about what specific talents you look for in an engineer?

Doug Tarr: Yeah, it's a good question. We actually have, I think, about seven roles open on our team. We've been growing very rapidly, just a lot of demand for the product and a lot of interest in helping us improve it. We have two main categories of engineers, though as we've been specializing, that's growing. Really we look for people with a distributed systems background, who have a good systems engineering background, can diagnose performance issues, and are just really strong engineers. We don't necessarily care if you've used Lucene. We figure you can pick most of that stuff up. You go to a new job, you learn whatever platform they use. What we look for are just the core fundamentals that make a good backend engineer. That's for the systems and the query team. The web platform team, which is a really interesting team that we just started about a year ago, is a very different persona, but it does require a lot of knowledge of that same skillset. These are people who have more of an empathy for a developer, who are familiar with what's in the market and can propose interesting ways to help developers really learn how to use our product. They tend to be people from a developer experience background who can help users do what they need.

Michael: Are there specific frameworks, languages, skill sets that you're looking for?

Doug Tarr: I mean, we do use Java for a lot of... MongoDB Atlas in general uses a lot of Java, so that's nice to have. During an interview, if you don't know Java, if you don't know any of our programming languages, that's fine. We do look for people who understand databases. I think that's important. We're a database company at our core still. The Atlas Search team actually works very closely with the core server teams to develop new features. You do have to understand databases, data structures, just the core fundamentals there. And then the other thing that I always look for, and it's really not a programming skillset, is people who are good communicators. MongoDB is a big company at this point. We do a lot of planning, a lot of designing. You might have a great idea, but if you cannot communicate that idea clearly and respond to feedback, it will be very difficult to be successful building at the scale we're trying to build here. We look for people who are just clear communicators, good listeners, that kind of thing.

Michael: Do you find that challenging to find someone with a good mix of communication skills, highly communicative and analytical and a good engineer?

Doug Tarr: Yes. I do think that we've been very successful at hiring. I'm very excited by the team we've built. You don't need to have everything perfect to join the team. I do think we have a lot of complex technology decisions to make, so we do look for people who can catch up pretty quickly and seem like they have a good ability to learn. Communication, I think that's a skill that engineers don't think about. I wasn't a computer science major, actually. I was a math major when I was in college, but I don't know, I haven't seen too much communications training going on in the industry and it tends to just be something people learn or know or adopt themselves.

Michael: You've got an interesting background. I've checked out your blog and some of your background. How did you end up at MongoDB?

Doug Tarr: Before MongoDB, I was at a company called mLab. mLab was actually a competitor to Atlas, and I was there for about a year as VP of engineering. We had a lot of really interesting customers who were pushing our limits. About a year after I joined, MongoDB acquired mLab, and that's how we came in. Most of mLab was engineers. We had a couple other folks, but the mLab team went into various orgs within MongoDB and it was a great cultural fit for us and for MongoDB. I really think most of the team that was acquired four and a half years ago is still at MongoDB today. It was a great experience for all of us to be involved in that. When I joined, basically what happened was our CTO at the time was like, "Hey, we have this new product we'd like to build." I had a background in search, so it really worked out. Elliott, who was our CTO, and Kaylin and a few other folks on our leadership team asked me to build a small team to build a beta of Atlas Search. We had nine months to get it shipped for the next MongoDB World. That was our initial introduction to MongoDB, but everything worked out from there. That's how we got to the MongoDB side of things.

Michael: And then prior to mLab?

Doug Tarr: Prior to mLab, there's definitely a few years back there. Before mLab, I was actually running a program for kids teaching them to code.

Michael: Oh wow.

Doug Tarr: The context there was I've actually been involved in startups my whole life, and I had a previous startup that got acquired in the early 2010s. I had some time to figure out what I wanted to do. I had two small kids at the time who really loved coding, and this was around 2013 or '14. Back then, learning to code was the hottest thing. Barack Obama was talking about it. Everybody was talking about it. My kids really wanted to learn to code, and so I just grabbed a few of them and we started teaching them to code around our dining room table. Everyone heard about it and it blew up. I was not expecting to do this. It was really just a thing I wanted to do to teach my kids, and everybody else's kids apparently wanted the same thing. But it kept going and I kept growing it. I spent about 2013 to 2019 doing that and built a program for teaching kids to code.

Michael: That's phenomenal. I love the focus on that, on STEM in general. What languages did you teach the kids?

Doug Tarr: We started with little kids. They were like six, seven years old. We would use Scratch, which is a visual programming language developed by MIT and a fantastic language if you have little kids. Highly recommend Scratch. It still stands the test of time, I don't know, 10 or 15 years after it was developed. It's just a great tool for kids to learn with. We'd start them there. And as they grew, Minecraft was really big. We'd teach them to make Minecraft plugins, which honestly was hilarious because it was in Java. We were teaching these kids who were like 10 years old how to use Java. They loved it so much, they didn't care how hard or weird it was for them to learn.

Michael: The thing I love about Scratch, and my nephews used it extensively early on and wrote some really amazing things in it, is that it's a good mix. You can wade in using the visual drag-and-drop tools, but during the process you start to understand logic flow. That's a really key part of learning to code. That's great. That's awesome. I love that you did that. Let's go back to the search topic. Search is an umbrella at MongoDB. It's a lot of things. One of the newest and most exciting facets under that umbrella is Vector Search. For the folks listening that may not understand what Vector Search is, can you give an overview?

Doug Tarr: It's a new topic and we're all learning right now, but modeling language is a really difficult computer science task. Language is just a very complex thing. If you tried to make the rules for how the English language works, you would find it to be frankly impossible, because there are no rules and it's mostly a bunch of patterns. For years, the way that people modeled those patterns was they'd write these little rules. They'd say, "Here's what a plural in English looks like. Here are words that are prepositions and they don't matter that much." You'd make a bunch of rules. Lucene itself was a conglomeration of the knowledge of these rules over decades, but machine learning was just a completely different way to go at that and learn how to model the English language. We've come up with this concept that I'm sure everyone's heard of called large language models. What those do is they map the language into a vector space. What I mean by vector space is an n-dimensional space. If you think of a graph, like a two-dimensional graph with an x and a y-axis, and you just start putting words next to each other on the graph, that's what vectors are. If you were playing a party game and I just put down the word dog and I asked you, "Michael, what word would you put next to the word dog," you might say what?

Michael: Leash?

Doug Tarr: Leash. You might say leash. You might say something else. You might say hot, like hot dog. But they're all words that cluster around each other. You can't really quite explain why they cluster around each other, but you know that they do. You know that the word leash is near the word dog in your mind. What large language models have done is figure out a way mathematically to create those clusters. They take these words and they turn them into numbers. They create these coordinate systems, and they put things that are near each other in this space. Once you have that mapping of language into mathematical space, you can just find stuff that's near each other. And that's what Vector Search does: it finds stuff that's near other stuff. You have to go back to your math class. If you had two points on a graph, how do you find the distance between those two things? It turns out the math for that in an n-dimensional space is about the same as in a two-dimensional space. If you took a geometry or algebra class, you could have done it for Vector Search. Essentially you get this distance function, and it says dog is pretty close to leash, but it's far from computer. You give Vector Search the word dog and say, "Find me all the stuff near dog," and it uses a data structure that allows you to do that.
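
As an illustrative sketch, here are made-up two-dimensional "embeddings" for a few words; a real model emits hundreds or thousands of dimensions, but the distance math Doug describes is exactly the geometry-class formula and works unchanged in either case.

    import math

    # Toy 2-D vectors invented for illustration only; a real embedding
    # model would place these words in a much higher-dimensional space.
    vectors = {
        "dog":      [0.90, 0.80],
        "leash":    [0.85, 0.75],
        "computer": [0.10, 0.20],
    }

    def euclidean(a, b):
        # The same formula works for 2 dimensions or 1,536 dimensions.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    print(euclidean(vectors["dog"], vectors["leash"]))     # ~0.07: near
    print(euclidean(vectors["dog"], vectors["computer"]))  # ~1.00: far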

Michael: It's a fascinating technology. It's all about proximity. It has me thinking about the limits of large language models. I mean, how static are these mappings? Do they change?

Doug Tarr: They do. They change. Probably the best-known large language model is OpenAI's ChatGPT model, and they're changing it all the time. But yes, these vectors can move around. There's no doubt they can move around. I think it's up to people to understand how they move. If you use the same model, it's not going to change. There is a deterministic nature to it, but people are constantly improving these things and coming up with new techniques to make them more accurate. It may turn out that your model is making some mistakes and you might want to tune it to be more accurate for your use case. And then all of a sudden, all your vectors are in different spots.

Michael: I may be going down a rabbit hole, but I'm curious. There are several functions that you think about in a vector database and you've described one really well, and that's proximity between words, nearness of words. If I had said, I don't know, the bounty hunter, the distance between dog and leash and the distance between dog and the bounty hunter may be actually very close, but they're in different clusters. There's locating the distance between them to determine context. But then the use of Vector Search for an application has this other feature where now the engine needs to come back with a response. If I'm asking a question about a dog and a leash, there's determination of the context and then there's a response. Is that using the same technology?

Doug Tarr: Yeah, it's a good question. There are two components in the system. There's a large language model whose job it is to create these vectors, also called embeddings, and then there's the Vector Search engine, which is a different system. It takes those embeddings, stores them in a data structure, and uses an algorithm generally called nearest neighbor search. And like you're saying, the data's clustered. Most people don't do an exact search because it's too expensive. It takes too long. If you have millions of data points, you would basically have to scan them all, figure out the distance between your query and every single term, sort them, and then return the top n results. That's just not going to be feasible. The library we're using, Lucene, uses an algorithm called HNSW, and it does this clustering as you've described, and it's an approximate solution. If the data isn't in the cluster you've looked at, then you won't return it. Most Vector Search isn't going to be exact like other search. It's not exact, but it's inexact in a very different way than classic Lucene is. There are some knobs you can turn to control how far you look. That is definitely one of the core considerations when you're building a Vector Search solution: how much of the space do you look at, and how much recall versus latency are you willing to trade to get those results?
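
Here is the exhaustive approach Doug describes, as a naive Python sketch with made-up data, just to make the trade-off concrete; HNSW's approximate graph traversal exists precisely to avoid this full scan.

    import math

    def exact_nearest_neighbors(query, corpus, k=5):
        # Score every vector against the query, sort, return the top k.
        # With millions of vectors this O(n) scan per query is infeasible,
        # which is why approximate algorithms trade recall for latency.
        def distance(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        ranked = sorted(corpus.items(), key=lambda item: distance(query, item[1]))
        return [doc_id for doc_id, _ in ranked[:k]]

    # Tiny made-up corpus of 2-D vectors:
    corpus = {"doc1": [0.9, 0.8], "doc2": [0.1, 0.2], "doc3": [0.85, 0.7]}
    print(exact_nearest_neighbors([0.88, 0.79], corpus, k=2))  # ['doc1', 'doc3']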

Michael: Yeah, that performance trade- off. I want to talk a little bit about how Vector Search can be used by developers. We're all familiar with ChatGPT and that's an implementation of Vector Search and a large language model. Why does MongoDB have Vector Search and what can developers do with it?

Doug Tarr: We think that Vector Search is both a core search capability for people doing normal language application search, and an enabler of new application types like a chat function. I'll talk about both. The core search function is, today you go type in a search box, you type dog leash, and you want to see a list of the dog leashes that you're selling. We try and find similar ones. I think that maps pretty closely to the stuff I was talking about earlier: if you just map those words, you'll find things that are similar, and you do it in a way that is actually more intuitive. Because with classic search, you have to do all the mappings yourself. If there was a brand name or something that didn't really map to the actual words, you would actually have to create a synonyms table, or you'd have to write the query in such a way that it looked for all of these terms. That brought in a lot of metadata to help you do it, which is a very time-consuming process. And frankly, it's people's jobs to do that these days. Whereas Vector Search just learns it from the language. It just reads enormous amounts of data. It will see those brands and it will be like, "Oh, that's related. That's a dog brand. Pets.com is a dog brand." It can see them. You don't actually have to do that mapping yourself, and it'll return it. On the positive side, you get a lot of that stuff for free. When you use Vector Search, you'll get a bunch of brand matches or interesting results that you don't have to work for. On the negative side, maybe there's some stuff in there that you didn't want, because it's just looking for things it finds similar to leash or dog. Hot dog is similar to dog, so maybe it would find sausage, and that is not what I want to find. That's one of the big challenges compared with classic search: how do you tune it? The other new emerging technology is chat-based solutions, and that's what you see when you get to ChatGPT or other solutions where you're asking questions and trying to get answers. And as we all know from using ChatGPT or whatnot, large language models aren't particularly good with truth. They'll just make things up, because they just see these relationships between things that are near each other and they're like, "Well, that's what you trained me to do. That's what I see." If you want truth, the state-of-the-art solution is to store the truth in a database and use Vector Search to map into it. Let's say you were trying to find an airplane reservation and you wanted to use a chat application to do that. If you just went to ChatGPT, it would just make up a reservation for you. It doesn't care. Whereas what you want to do is ask a large language model for embeddings that point you into your vector database, and then you're like, "Oh, you're looking for flights from San Francisco to Newark," and it will give it to you in a structured way, and then you actually go look into your application to find the results and send them back out through the large language model.
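
A hedged sketch of that grounding loop, often called retrieval-augmented generation, might look like the following. The embed() and ask_llm() functions are hypothetical stand-ins for your model provider's API, and the index, field, and collection names are invented; the knnBeta operator reflects the Vector Search preview syntax Doug describes later in the episode.

    def embed(text):
        # Hypothetical stand-in: call your embedding model's API here
        # and return the array of floats it produces.
        raise NotImplementedError

    def ask_llm(prompt):
        # Hypothetical stand-in: call your chat model's API here.
        raise NotImplementedError

    def answer_grounded(question, flights):
        # 1. Map the question into vector space.
        query_vector = embed(question)

        # 2. Pull real documents (the "truth") out of your own database.
        hits = flights.aggregate([
            {"$search": {
                "index": "vector_index",
                "knnBeta": {"vector": query_vector, "path": "embedding", "k": 3},
            }},
        ])
        context = "\n".join(str(doc) for doc in hits)

        # 3. Let the model phrase the answer, constrained to retrieved facts.
        return ask_llm(
            f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        )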

Michael: Great. It's really about a custom implementation that can be very specific to your use case, right?

Doug Tarr: It's about a custom implementation and it's about distinguishing the truth. You use Vector Search as a way to combat large language models hallucinating.

Michael: This all falls under the Atlas Search umbrella and Vector Search surfaces through the Atlas admin interface. Is that correct?

Doug Tarr: Yeah. We're in a preview right now where it's actually part of search, and I think that was something we delivered to get it into people's hands to use. Our goal is to make Vector Search feel like a natural part of MongoDB itself. The implementation of it today is part of search, but really our goal is to make it just part of MongoDB and feel like a natural thing you use in your database.

Michael: Vector Search is a part of your team. We talked earlier about recruiting and finding folks that are interested in working on state- of- the- art projects. Are there specific requirements that you have for somebody joining that part of the team?

Doug Tarr: We are recruiting engineers who will work on both things today. The way I tend to work is, until something gets completely specialized, you have a lot of shared infrastructure. We definitely would love people who have an understanding of large language models and an interest in machine learning, because I do think people will be asking us for help understanding relevance, how to tune these things, and how to build the applications. That is a new area that we will be looking for. We're looking for people who are curious, who, if I told you, "Hey, you need to understand HNSW. Here's a completely new algorithm," are willing to do that. Can you go learn about that and can you build solutions on top of it? That's the kind of person we'd be looking for: people who are open to new ideas. A search background is helpful but not required. Our systems are for people building production applications on top of Atlas Search and MongoDB, and we take that very seriously. We look for people who are comfortable building platforms where any kind of outage or problem is considered a really serious thing. We try and avoid that by planning and analyzing a lot. That's the shape of the person we're looking for.

Michael: It's a fascinating space, and I love that this is all coming together in the Atlas platform. What does the architecture look like that Vector Search is in?

Doug Tarr: Vector Search is really a feature of a database. It integrates with the rest of what you're doing in your database. In that sense, it's very similar to search, and we've built Vector Search on top of the same platform we built search on, which is Lucene. We are able to leverage a lot of the same infrastructure we've built, because a lot of the problems are the same. When you think about search, both Vector and classic search, one of the core requirements is that it needs to be really fast. And like I mentioned at the beginning of the podcast, databases need to be correct, so they're not willing to make compromises about consistency, whereas search engines often do. In terms of how both search and Vector Search work, we listen on what at MongoDB is called the change stream, but really it's a change data capture solution for MongoDB, where when you write data to your database, there is a stream you can consume that lets you follow updates. That's how we keep Vector Search and search up to date: we follow this change stream, which seems like a very simple thing to do. But if you think about all the edge cases, oh, what if it stops working, what if you miss data, what if you get too far behind, all of those are really interesting problems that our teams work on. Other problems we work on are things like, how do you map data that looks one way in MongoDB into data in a completely different system in Lucene? This is stuff that, if you were trying to build your own search engine or even use an off-the-shelf solution, you'd have to do yourself. We do it for you. What that means is it looks the same. When you use data in search or Vector Search, it will look the same as the rest of the data in your database. One of the interesting things about Vector Search is that today's standard solution, HNSW (there are other solutions that make different trade-offs), is a very memory-intensive process. Basically what you're doing is loading this giant graph into memory and searching it. If you have to go to disk, you've lost, because as we all know, going to disk is a very expensive process. In order to do HNSW well, you need to keep stuff in memory. And then that leads to, well, what if your dataset is too big to keep in memory? How do you partition that data? What our team does is figure out those things, and how your partitions work with the partitions you've done to your core database. A lot of what we're doing is building the infrastructure that lets people not have to think about any of this. You can just turn on a Vector Search index and we will manage that infrastructure for you and make sure that you have enough memory, enough capacity, enough resources to run those queries.
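
Applications can consume that same primitive directly. A minimal pymongo sketch, with hypothetical connection details, shows the mechanism an indexer like MongoT builds on:

    from pymongo import MongoClient

    client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net")
    collection = client["shop"]["products"]

    # A change stream is a resumable cursor over writes to a collection.
    pipeline = [
        {"$match": {"operationType": {"$in": ["insert", "update", "replace", "delete"]}}}
    ]
    with collection.watch(pipeline, full_document="updateLookup") as stream:
        for change in stream:
            # An indexer would re-index the changed document here; it also
            # has to handle falling behind, missed events, and restarts.
            print(change["operationType"], change["documentKey"])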

Michael: Hierarchical Navigable Small Worlds, HNSW. If somebody is thinking about leveraging Vector Search in MongoDB, what type of setup is going to be required? You can turn on a vector index and then what happens after that? What are they going to need to do to continue to set up Vector Search?

Doug Tarr: I think over the last few months this has been evolving pretty rapidly, but today the state of the art is: you have some text, you store it in a database, and then you call out to a large language model, like OpenAI's. It sends you back an embedding, which is an array of floats that represents that data, and you then push that into your vector database or Vector Search engine, which takes that vector and indexes it into HNSW, which you just mentioned. The loop is, if you have text in your database, which most of us do, you send it off to ChatGPT: you get their token, you call their API, and you insert that data right next to your text in your collection. MongoDB is great for this because it's a document model, so you can pretty much stick it anywhere. And then for Vector Search, you just have an index, and what the index does is tell us where to look for those vectors. Today, you have to tell us. You say, "Oh, this field called Title Embeddings is the one I want you to index," and then you maybe set a few options, and we will index it and tell you when it's ready. At that point, you've got a working vector index. The next step for you is, well, how do I use this thing? How do I query it? To query it, someone might put text either into a chat solution or into a search box. You do the same thing. You call back to your large language model, your API for ChatGPT, and it gives you another embedding. Then today you run a knnBeta query. That will take that vector and give you back the top k results that match it. It'll give you the documents back, and you then use those in your application.
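
Putting that ingest loop into a sketch, reusing the hypothetical embed() stand-in from earlier and an invented articles collection. The knnVector index definition below matches the preview-era format as I understand it; you create it through the Atlas UI or API rather than through pymongo, and the query side is the knnBeta stage sketched above.

    # Preview-era vector index definition (created in Atlas, not via pymongo).
    index_definition = {
        "mappings": {
            "dynamic": False,
            "fields": {
                "title_embedding": {
                    "type": "knnVector",
                    "dimensions": 1536,  # must match the embedding model's output
                    "similarity": "cosine",
                }
            }
        }
    }

    # Ingest loop: embed each document's text and store the vector right
    # next to it in the same document, thanks to the document model.
    for doc in articles.find({"title_embedding": {"$exists": False}}):
        vector = embed(doc["title"])  # call out to your embedding provider
        articles.update_one(
            {"_id": doc["_id"]},
            {"$set": {"title_embedding": vector}},  # an array of floats
        )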

Michael: This sounds a lot like latitude and longitude, spherical... I mean, where the truth is geometry, and now you're introducing language, which is n dimensions more complex. Is that a fair...

Doug Tarr: One of the ways people used to try and do this was... If you have a geo index, which both MongoDB and search support in slightly different ways, there are ways to store that data efficiently, but they're all optimized for two or maybe three dimensions. If you want to find the distance between San Francisco and Los Angeles, you can use one of these geo solutions to do that. But it only works up to two or three dimensions, where all the algorithms are super optimized, whereas this is n dimensions. It's hard to think about n dimensions, but it's the same idea, just a slightly different algorithm.
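
For comparison, the two- or three-dimensional case Doug mentions is what MongoDB's 2dsphere geo indexes serve today; a sketch with hypothetical names:

    from pymongo import MongoClient

    client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net")
    places = client["travel"]["places"]

    # A 2dsphere index supports highly optimized distance queries,
    # but only over two (or three) dimensions.
    places.create_index([("location", "2dsphere")])

    nearby = places.find({
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [-122.42, 37.77]},
                "$maxDistance": 600_000,  # meters; roughly San Francisco to LA
            }
        }
    })
    # Vector Search applies the same "find what's near this point" idea in
    # n dimensions, where HNSW replaces these specialized geo structures.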

Michael: Well, Doug, thanks so much for joining me today. You've enlightened me in Vector Search and so many ways. I'm grateful for that. I want to end with a couple of interesting questions. If you could invent any piece of technology, no matter how unrealistic it is, what would it be?

Doug Tarr: I was thinking about this last weekend, because actually I don't feel like it's that far off from where we are today. I'm very intrigued with large language models, machine learning, and all the technologies around there, like Stable Diffusion and whatnot. To me, I think about this idea that you could create your own stories, your own movies, your own everything, and it's all customized to you. I really want that to exist, because I feel like today we're getting other people's ideas for the future and for being creative. The great thing to me about all this technology is it really enables a new kind of creativity for people. I want to be able to go to Netflix and just be like, "Tell me a story about my English Lab Hazel and she wants to go get a bone," and it makes a movie for me about my dog and we watch her. I would love that.

Michael: Yeah, that sounds fascinating. I'd love to see that come to fruition. All right, so let's say you could switch jobs with anybody, any person, what job would it be and maybe even who would you switch jobs with?

Doug Tarr: Yeah, this is a little weird and specific, but during COVID, I got really into... Before I was into machine learning, I was really into 3D modeling. And frankly, from my time teaching kids to code, I was into game development as a little bit of a hobby. I really love 3D design and modeling. This kind of goes back to the first question in a way. I've noticed there's a lot of artists who use... They take art and then they use machine learning, and then they take this 3D... Like something you would use to make a Marvel movie almost. They combine them. There's this guy. When I was actually in New York at the MongoDB headquarters, I stopped by MoMA, and this guy, his name's Refik Anadol, and he's this artist out of LA and he makes these generative sculptures. It's all based on machine learning. But he took every piece of art in the MoMA and he combined it in a model. He made this beautiful art piece that took up an entire wall. I just feel like that's an amazing thing. It combines technology, coding, art, creativity. That would just be a dream for me.

Michael: Oh, I love that. I love that. Are you artistic? Do you create art?

Doug Tarr: I am terrible at it, but I love it. I'm just not very good at it. I can't draw. I always try. I try to draw. I'm just not good at it, but I really love... That's how I got moved towards 3D things, because it's not the same. For programmers, it just maps. If you're an engineer, oh, I can kind of model this. Literally my hand can't do it. I think I'm creative, but only using technology. My wife is an artist and she can paint amazing portraits and pictures and stuff, but I just can't do that. For me, technology's my canvas, I guess.

Michael: Yeah, I love that space, the vector overlay of technology and art. I paint, but there's a dramatic difference in the precision associated with the two worlds that I work in. Writing code is very precise, whereas art is less precise, and it's more in the analysis of the outcome. Well, it's been a great discussion. I'm going to ask one more question. We had this discussion to enlighten folks about Vector Search and about search within MongoDB and the platform, but we're also talking about an opportunity to join the team. Thinking about that, what skill or quality do you think is most important for someone to have a successful career in tech?

Doug Tarr: A successful career? It's empathy. You go into software engineering and you think your job is to write code, but it's not. And yes, when you're a new grad, that is all you do, but you pretty quickly learn nobody really knows what to do, and you basically spend most of your time trying to figure that out. And that is a human thing. You basically spend most of your time working with other humans, talking to them, listening to them. They take risks. They get upset. They get excited. And if they don't feel like you can collaborate with them, you're not going to be successful. But if you're someone people enjoy working with and you can listen to them... Understanding how to code and whatnot is table stakes. But if you can listen, you can take feedback, and you can help people in their careers, I think that's probably the most important thing you could do.

Michael: That's great. Well, Doug, thanks so much once again.

Doug Tarr: Yep.
