How Multimodal AI Is Changing Search | Google’s VP of Product Robby Stein

Josh:
[0:03] Since seemingly the beginning of time, search has come in the form of Google and their glorious 10 blue links: ask any question in the world and the search

Josh:
[0:10] engine goes off aggregating the world's knowledge and presenting it to you nearly instantly.

Josh:
[0:14] What can only really be described as magic. But now the interface is shifting. It's shifting from this black box to multimodality and things like Google Lens, where you now have sensors that engage with the real world and can give you search in many dimensions: voice, audio, visual, all of the things. And I've been curious about this. Ejaaz, I know you've been really curious about this. And thankfully, another person has been really interested in this. And this is Robby Stein, who is now on the show. Welcome, Robby. Robby is the VP of Product at Google Search. So you seem like the absolute perfect person to discuss this topic with us. And I think the place I want to start is: what does the new version of 10 blue links look like? Is search going to continue to be the same? And kind of what is the plan for Google Search moving forward in a world of AI?

Robby:
[0:57] Yeah. Well, first of all, thanks for having me on the show. Really fun to be here and to have this conversation. You know, I think ultimately this AI moment truly is expansionary. We talk about the fact that it's doing more for people than ever before. So when you think about what the future of search looks like, it actually starts from what it is today, because the everyday need of someone grabbing a quick phone number, being able to pay a bill online, find a direct link somewhere, doesn't really go away; it's that Google search can do so much more. So I think what you're finding is that the search experience has been enhanced and it's become a lot more powerful. Now there are AI experiences that start to show up. There's a little preview at the top of the page where AI could be helpful. You can imagine a world where someone asks a really hard question, you get more and more of those, and by tapping into there, you're now having this more conversational version of search. So the first big theme is that you can go and have an AI-driven experience, one that's a back and forth. You have refinement, you have follow-up questions, you have more curiosity and you have more questions. And that's a different paradigm, but it really makes sense for a specific kind of question. Other times you're just kind of browsing; you don't really know what you're trying to ask. Actually, a lot of the core search experience is really optimized for that kind of experience, and that might be the best, which is why you don't see AI for every single query. But I think the first big change you're going to observe is that conversationality. I think the second is multimodal.

Robby:
[2:12] I think in the past, AI and search in general have been more constrained to a web experience that feels like a web page. You type text in, you get a page back. But increasingly, it's just this kind of ambient, knowledgeable AI. Imagine something that has been encoded to understand as much of the world's information as possible, as much of the web and what parts of the web are most helpful for every single given question, and it's all in this brain. You can talk to it. One way to do it is through text, through the web interface I just mentioned, and you can have a whole conversation. The other way could be live. So in our apps right now, you can have a live conversation while you're driving with the exact same model and just start learning about the world as it talks to you. It's kind of unbelievable. But the other one is images. And we see a 70% increase in the amount of visual searching year over year on Google. It's actually one of the fastest ways people are searching, because people just want to take their camera out and ask questions about what they're seeing, or ask questions about what's on their device. So you kind of move from this world of a page to a world where you can just ask about what you're observing, and it can fit the form factor that you need given your life, whether you're going for a walk, you're driving, or you're just on your couch on your phone. Google's there with you.

Ejaaz:
[3:24] I love this description of it being kind of like this ambient brain that's trying to break through the confines of whatever medium it's been chained to. And originally, you know, it's these 10 links, you know, the homepage of Google, the search ranking. This is something that's pretty nostalgic to me, right? And so if it's the doorstep to the internet, anything that evolves with that is super important. And one thing I've been asking myself is, as AI has become popular through LLMs, through ChatGPT, through Google Gemini,

Ejaaz:
[3:54] I've been seeing the search page evolve and get to where it is today.

Ejaaz:
[3:57] I would love to understand the timeline and decision points that made us, or made you, get to that final medium. So the way I think about it is we started with the search engine and we get a ranked search page of 10 blue links, right? Ranked from most popular to least popular. And then we had Google Gemini appear, this LLM chat interface. And then this LLM chat interface got access to web search. And then we had Google search powered by Gemini, where we started seeing more of these AI overviews. And now we even have Google agents that are doing autonomous work behind the scenes.

Ejaaz:
[4:29] Can you briefly walk us through this timeline and decision points of how we got here?

Robby:
[4:33] Yeah, I mean, I think there are a couple of pieces in between, actually, that are helpful to clarify. I mean, Google a long time ago had, you know, blue links as this prototypical search page that everyone knows and loves, but actually it's evolved a lot over the years since then. I mean, think about universal search and how, if you ask for local information, maps information shows up, or you ask a visual question, you get universal image search results and you'll see related images, or you ask about trending information, you get a top stories unit with articles at the top, right? And then you also have very specific questions someone might ask where we would feature one web result with an extra-large snippet at the top of the page. That might be like, hey, how many... I don't know, you've got to give a very specific question that lots of people ask, and we just kind of highlight that. So each of those, I think, have been fairly large evolutions to the experience. And I think off the back of that is where AI overviews came, because we had these rich experiences. You ask about weather, you get weather information right at the top of the page, where you could get more specific information when you had a specific question.

Robby:
[5:38] And so a natural extension of that is how AI could unlock the ability to really ask these harder, longer questions with natural language. So you could ask a very specific question, and even if there's not one specific web page that has that information, we could provide this AI overview. And so that was the first big move in the evolution of what we saw. And the next one was, OK, well, once you had that, we actually observed that people were trying to get it to trigger more and show up. So people would actually put the word AI in the search box. And then the other thing was, they weren't done there. They wanted to ask follow-up questions, and there was no easy way to do that. And so really that led to this idea around AI mode.

Robby:
[6:10] Which is a way to have this conversational experience with search. And it's increasingly been, you know, easy to get to from AI overviews. So you search, you get the little AI preview at the top, you expand it, and you can go deeper in AI mode and have a full 10- or 20-turn back-and-forth conversation with Google now. And so all of that was kind of the main arc of how it's evolved. And then I think it just kind of ascends this curve of complexity and user need. We're like, OK, what's the next thing the user is trying to do? And you just keep asking that question and how else we can be helpful. So, for example, let's say you want to find a restaurant for date night in San Francisco this Friday at seven o'clock specifically. It's possible that you could put that into Google and get an AI overview and go back and forth with AI mode until you, you know, got to some list of great restaurants. But then you're not going to Google just to figure out a restaurant. You actually want the reservation as the ultimate end to that journey.

Ejaaz:
[7:07] Actions.

Robby:
[7:08] And so your next question is, okay, well, what could we do agentically to actually book that table for you, or help? We've actually implemented, and it's live now, a way for an agent to browse Tock and OpenTable and the web for reservations and then bring back a list of not only great restaurants, using, you know, analysis and reasoning to present that list for you, but also, within that list, times that a table's available, which is really, really magical. And so hopefully that tells a little bit of the story of how we evolved the experience from one that felt like a page to one that felt like an interactive AI conversation, and one that could increasingly

Robby:
[7:47] do things for you and with you.
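
A minimal sketch of how an agentic table search like the one Robby describes could be structured; the provider clients and every function name here are hypothetical stand-ins for the example, not the actual integration Google runs against Tock, OpenTable, or the web:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Slot:
    restaurant: str
    provider: str        # e.g. "opentable" or "tock"
    time: datetime

def find_tables(providers, query: str, target: datetime, party_size: int) -> list[Slot]:
    """Fan out across reservation providers and merge live availability."""
    slots: list[Slot] = []
    for p in providers:  # hypothetical provider clients sharing one interface
        for restaurant in p.search_restaurants(query):
            for t in p.availability(restaurant, date=target.date(), party_size=party_size):
                # keep only times within an hour of the requested slot
                if abs((t - target).total_seconds()) <= 3600:
                    slots.append(Slot(restaurant, p.name, t))
    # surface the closest matches first; a real agent would also reason over
    # reviews, cuisine, and price before presenting the final list
    return sorted(slots, key=lambda s: abs((s.time - target).total_seconds()))
```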

Ejaaz:
[7:48] That interactive conversation is exactly what I think this new world looks like. But I'm kind of struggling to get my head around what that mental leap looks like, right? So I'm used to search. I'm used to tapping keys on my keyboard, typing words and getting the kind of information that I want, and then kind of scrolling, clicking and getting to an action. What does the multimodal version of this look like?

Josh:
[8:13] And maybe we can even do it in the form of like a mental model. So like if classic search was kind of like type to links, then what is it when my camera, voice, screenshots, and TV are kind of like the way you interface? So like, what does that analogy look like?

Robby:
[8:26] Yeah. So I guess the first thing to say is, I don't think people want to dramatically change their mental model of Google. They basically want to think, and this is my personal view, particularly how I feel as a user: in general, you kind of want to just say, I have a question. I want to ask Google this question. You put it into Google. And right now, the main way that happens is text, but it's very quickly growing to be multimodal, visual, voice, et cetera. But they're all just different modes of the same root need, which is: I have a question. It could be a question about something you're looking at. You could take a screenshot on social media and say, I wonder what this outfit is and how I could buy it. It could be a tree you're walking past and take a picture of, because you want to know what that tree is. But it's a question nevertheless.

Robby:
[9:07] And so you take your question, you put it into Google, and then you kind of decide what's the easiest way to do it. Well, if you're on your computer, on your desktop, or maybe you have a homework question, you'd copy and paste it, or you'd ask a question, or you'd just type it. If you're out and about on your phone, you'd take a picture. If you happen to be on an Android device and you're looking at your phone, you use Circle to Search. People are already doing this at enormous scale. It's like a billion people a month using Lens, using these products where you can take a photo. So I think this is a pretty commonly understood pattern at this point. And then it just goes to Google. And from the user's perspective, they don't really want, I think, to be bothered with too much more than that. They just want to get their question into Google, and then Google should do the work to give you the best possible information. And so if you ask a question that's really basic and you want to browse about it, like let's say there's a new starting quarterback you've never heard of and you put the person's name in, sure, maybe you want a quick description, but you typically want to browse for a people search like that. You want to see photos of the person, a quick knowledge panel. You want to see their recent posts on social media. You know, you want to see articles that have been written about them. In many cases, an AI response is actually not great for that kind of question, that browsier kind of need. But if you ask a really specific question, boom, you're going to get AI right at the top with an overview and a way to dive in and have that back and forth. And so you kind of rely on Google to always give you the best format, given the question.

Robby:
[10:27] And if it's predictable, then you kind of know: if you ask for something inspirational and imagey, you get images, right? You get visual stuff. If you ask a specific question about knowledge, you get AI. If you ask a question about a person, you get to see photos of that person, you get a description of the person. So that's how I think about it, and I think that's what we hear from users.
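
As a toy illustration of that "best format for the question" idea, here's a sketch in Python; the intent labels and the classifier are invented for the example, not how Google actually triages queries:

```python
def render(query: str, classify) -> str:
    """Toy dispatch from query intent to result format."""
    intent = classify(query)              # hypothetical intent classifier
    if intent == "navigational":          # "acme plumbing phone number"
        return "direct link plus quick actions"
    if intent == "person":                # a new starting quarterback's name
        return "knowledge panel, photos, recent posts, articles"
    if intent == "inspirational":         # "kids bedroom ideas"
        return "browsable grid of images"
    if intent == "specific_question":     # long, specific natural-language ask
        return "AI overview with an expandable AI-mode conversation"
    return "classic ranked web results"   # default browsing experience
```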

Josh:
[10:45] So you're talking about the form factor a lot, which is something I'm interested in, because I think a lot of the time when people talk about AI, ourselves included, we kind of imagine, well, there's a chatbot, or in terms of search, there's a chatbot and then there's a text box, and there's not really much in between. And I guess what I'm curious about is the way you see the final form of this evolving over time, because you describe a text box with Google, and then we have the multimodality with cameras. And when I hear this, it kind of reminds me of the early computer, where we started with text-based stuff, which was better than punch cards. And then we evolved to a graphical user interface. And then we evolved to web browsers. And there was this natural evolution that wasn't obvious, but in hindsight, it was so clear. And I'm curious if you have any ideas on where that evolution goes from here, because I know you pioneered Google Lens, which is amazing. I use the product all the time. It's my favorite way of engaging with the world because it just feels like this magical wand that I can look at anything and get answers from.

Josh:
[11:37] And I'm curious where you see that kind of progressing to over time. Yeah.

Robby:
[11:40] Well, I mean, you were talking, I mean, earlier you asked me about what's the pattern, like what's the mental model. And I guess in hearing you rearticulate it, the most succinct way of saying it is it's the mental model of conversation. And so if you think about, you know, when you have a conversation with something or a person, then you have a text field, you have a way to upload photos, you can go live with that person and see them right now, right? You can have like a live video feed, you know, chat with that person. They're all just different modes of accessing that person. And I do think that technology is moving in a direction where it's as simple to communicate with as talking to a person. I mean, it accepts all of the modes and all the ways of discussing with the person too. Now, it just turns out that most people for most needs don't need to have a long conversation. Like they're kind of single shot moments in time. It's like you're sitting at your computer. You're like, oh, I just totally forgot. I need to call this place. You just Google really quick phone number, whatever, call them. Right.

Robby:
[12:37] And so you don't need, I don't need to have like a 20-minute conversation with the thing. But if you think about it, in that little moment, you were kind of communicating with Google. You're expressing a need, you're getting information back. And so I think increasingly that is the right way to think about it from a mental model perspective. And so if you think about where things are going, you can kind of just articulate, you know, at least for Google, we think a lot about informational needs. So, what other kinds of informational tasks could we do to be helpful for you? It could be new modes, if new modes come up that could be useful, but it could also be doing more for you given a certain question. The same way you might text someone to help you with something and say, hey, can you do me a favor and scan all this stuff and tell me what I should know about it, or...

Robby:
[13:19] You know, can you, you know, book this event? I'm trying to get tickets to this game and it's complicated. You know, you might even hire someone to help you in some cases, get the perfect ticket for a Super Bowl game or some, you know, hard-to-get ticket. You know, maybe an agent could go do that work for you. And on the back end, it's almost like texting a person, the person getting back to you saying, give me a few minutes, I'm going to look into this. And then they give you, you know, some result: actually, these two tickets are available and they're next to each other, and they're at this price point. Except that the AI can do things that, you know, most people couldn't pay for, or that would be really hard for a person to go do, because it would require searching like hundreds of things to get back to you. So I think that's kind of a helpful way of thinking about it.

Josh:
[13:57] To that last point, the searching hundreds of things: we asked our audience what they wanted to ask you. And one of the questions is, how is Google going to survive when the perceived thing is that, well, AI is kind of cannibalizing search, and one has an ad model, one doesn't. And I kind of want to ask you, because I have a feeling what you're going to say: I want you to help me falsify this zero-sum theory as it relates to chatbots. I know I've heard you say, and I've heard Google as a whole say, AI overviews and AI as a whole kind of make people search more. And in a way, we're seeing Jevons paradox, where as there becomes more data, there becomes more of an inkling to search more, to create more queries. And I kind of want you to help us understand that phenomenon, I guess, where in the case that there is AI, well, there's actually a lot more search happening. The number of searches doesn't go down; it actually goes up.

Robby:
[14:45] Yeah. So I think AI is an expansionary thing. It's like people had all these questions and they could ask so much more, but they didn't, because there were limitations. And I think the best example of that is something like Lens with AI, which is the best way to see it, right? So you can now take a picture of your bookshelf and say, given these books, what should I read next? Or you can take a screenshot of, you know, a celebrity outfit or something and say, where could I buy this jacket? It's possible someone would try to put that kind of question into Google 10-plus years ago, but it'd be pretty hard to do, in the same way it would be pretty hard for someone to type in a 20-sentence question like: I'm going on this trip, I have a kid, the kid has this allergy, they have this need, I have to go to a hotel, the hotel needs to be far. You just couldn't do that, and so people would just kind of give up or not do it. And so you get growth when you're unlocking new needs for people. And that's what we're seeing. So I think the best way I can summarize it is: the everyday need I just mentioned, people getting fast, efficient information from search, isn't really changing. But now you can ask technology so many more questions, and that's where the growth is coming from. You know, and we talked about this with AI overviews, where we're seeing growth. We talked about where AI overviews shows up: it's around when people have a more specific question. They put a longer, more specific question into Google, and they get this AI response.

Robby:
[16:06] Those kinds of questions are up about 10% in large markets like the U.S., which at Google scale is a pretty enormous kind of growth number year over year. And then these kind of visual searches I mentioned are up 70% year over year. So huge growth in these areas where the market's expanding the fastest. And that's exactly what we're seeing. But obviously the pie is growing and there's lots and lots of opportunities for people to get information from lots of people. And that's exciting.

Ejaaz:
[16:31] I'm curious, how much importance do you place on consumer hardware devices when you're kind of thinking about building out this vision? You know, Meta has been attempting to, you know, revamp the form factor with glasses. OpenAI is rumored to be building their own consumer device. Goodness knows what that looks like. And, you know, obviously, Google values their partnership with Apple being the primary search function through Safari. Is this something you consider at all? Is this something that Google kind of wants to own and kind of like form themselves? or is it just kind of like, let's see what happens and search will be integrated everywhere?

Robby:
[17:05] I mean, I think search is so ubiquitous that we think of it as a service that should just be accessible in any device or context where someone needs help. You know, whether you're on your phone and you have a question about what you're looking at, or you're taking a picture of something, we want people to get the value of Google in whatever that context looks like. I personally am a big fan of the saying, the best camera is the one in your pocket. I think it's just a really apt point. And so there are lots of cool new things that will probably be created. They'll probably take time, but really, the convenience of what people use every day is important. People rely on certain technologies, certain ways of asking questions. One is their camera: they have it in their pocket and they're constantly taking photos. And so if you can be close to that experience, great. That means that you can be one tap away from sending your photo to Google, let's say, and asking a question about what you're looking at. And I think that's how a lot of people want to interact with technology. And obviously, if there's an exciting breakout hardware category, you know, that would be something that I think search historically has always

Robby:
wanted to be a part of, every exciting new way that people are interacting with technology.

Ejaaz:
[18:13] Robby, the number one question we get from viewers and listeners of our show is: this sounds magical, but how does this work? Help me understand what's happening under the hood. I would like to point that question towards Gemini and Google search in general, with the AI overviews and everything you've explained so far. Can you give us maybe a high-level breakdown of what's happening under the hood?

Robby:
[18:37] So at Google, what we've done with AI is we're really trying to create the world's most knowledgeable AI, one that really is connected to the vast information that obviously sits at Google, but also around the web, right? It's about connecting to and understanding the world. And that's unique. And so we have an opportunity to have an AI where, you know, obviously there are billions of products in the shopping catalog, hundreds of millions of places that businesses are updating with local information every day. There's a trillion facts in the knowledge base, updated all the time. There's information about live finance data, sports, travel information, live prices for flights, all that stuff. We want to be able to make that easily and quickly accessible to anyone. And then you obviously have the vastness of the web that we want to connect you to. And so among the models that we've built, there's an AI model that is based on Gemini's foundational model, the one that is a large language model that understands natural language and multimodal questions and is able to generate responses.

Robby:
[19:36] It's able to also understand all that knowledge. And so you can ask a question, and what will happen on the back end is the AI model will start to actually generate Google searches to start researching it. And given the complexity of the question, it may actually spend time thinking and reasoning and doing research. And so if you ask a question about what kinds of sunglasses you should get, and you're learning more about polarized versus not and its benefits, there might be dozens of questions connected to that question. And what happens under the hood is that the model is actually searching. It's issuing a bunch of Google searches as a person would, and it's potentially using APIs like the shopping graph to do research as well. It would then retrieve all of the relevant information. And because of all the search knowledge and signals that are available in search, there's a good understanding of what information is great for a given question. All of that is brought back into the model, and the model reasons about it and generates a response with links to dive in, learn more, potentially buy the thing you're looking for, and continue your journey. And that all happens, you know, through this AI experience. And this is largely previewed in AI overviews. So if you put in a hard question right now, like, how do I get ketchup stains out of a white couch? Put that into Google. You're very likely to get a little AI preview at the top, AI overviews. And if you were to expand it and click AI mode, you can have a whole back and forth with that. And that's the model that's doing that.
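
A minimal sketch of the query fan-out loop Robby is describing, assuming a generic LLM client and search function; every name here (Result, llm, search, the prompts) is a hypothetical stand-in for illustration, not Google's actual internal API:

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    snippet: str
    quality: float   # stand-in for search ranking signals

def answer(question: str, llm, search) -> str:
    """Fan out: decompose the question, search each piece, then synthesize."""
    # 1. The model breaks a hard question into the dozens of simpler
    #    searches a person might have issued themselves.
    sub_queries = llm.generate(
        f"List the Google searches needed to research: {question}"
    )
    # 2. Each sub-query runs through ordinary search, so ranking signals
    #    decide which pages are actually helpful for that sub-question.
    retrieved: list[Result] = []
    for q in sub_queries:
        retrieved.extend(search(q, top_k=5))
    # 3. Everything comes back into the model, which reasons over it and
    #    writes a response with links to dive deeper.
    sources = sorted(retrieved, key=lambda r: r.quality, reverse=True)[:20]
    context = "\n".join(f"- {r.url}: {r.snippet}" for r in sources)
    return llm.generate(
        f"Answer '{question}' using only these sources, citing URLs:\n{context}"
    )
```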

Ejaaz:
[21:01] Blot the excess ketchup immediately, Robby, and apply a solution of cold water

Robby:
[21:06] And mild dish soap. I knew that was my problem. I love hearing how you are.

Josh:
[21:10] Robby, I have to confess something, which is that I'm a bit of a fanboy. I didn't quite realize until recently because you had a stint at Instagram and you just mentioned a little earlier, the best camera is the one in your pocket. I love photography. I love taking photos. And particularly, I love Instagram stories because it's my favorite way of sharing content in the world on the internet. And what

Josh:
[21:31] I found out is.

Josh:
[21:32] That you were the guy responsible for implementing that at Instagram, which was amazing. It's such a great product application and probably the single reason why I still use Instagram.

Robby:
[21:41] That's great to hear. A lot of people worked on making that successful, but I was very privileged to have a chance to work on that as well. Yes.

Josh:
[21:47] Oh, it's such a great feature. So what I kind of wanted to ask you about that is how you think about implementing features like that as it relates to search. Because I imagine this wasn't an original idea. Famously, Snapchat kind of had it first, but then Instagram implemented it in what I believe was a much better way, and that's the one that I use. And I remember hearing you on another interview describing the way you reasoned through it, which is like: on Snapchat, I couldn't upload my own photos. I had to use the inline camera, and that was just a poor experience. But on Instagram, I could upload my own photos that I thought were much more

Josh:
[22:17] beautiful and I much prefer that. So I guess I'm kind of wondering the thought process that was behind that and how you apply that to companies like Google, where now you're developing product for this new AI technology.

Robby:
[22:28] Yeah. Interestingly, I feel like there are a lot of similarities in terms of what to learn about product building from that experience. And I think, you know, the main one is that if you have a product that's beloved and used by lots and lots of people, you don't want to dramatically upend that on people, because there's a natural, well-worn path that people are traveling every day, billions of times a day. And you don't want to just show up one day and have it feel like upside-down world. That's just not a service to anyone and will create a bunch of problems. Now, that said, if you're building in a space where the need for your product is directly connected to what people already come for, in this case information at Google, but people just want to do more with it, there's a natural opportunity to expand what you can do for people. In the way that people came to Instagram to share through photos, it turns out there was a

Robby:
[23:10] potentially even better way to do that for friends through stories, because it was this low-pressure, kind of ephemeral format, and it allowed you to get a DM and have a fun conversation with your friends and feel connected. And that whole system really worked well, but it didn't replace Instagram, you know; it became an additional way that Instagram could help you. And I think of AI and search in the same way. People come to Google every day for information, billions and billions of times a day, and actually people have tried typing crazy stuff into search even before AI existed, really. And before, you couldn't really do much; you might even get to the end of the search results page and see, we couldn't find anything. But now you can really help with almost anything. And so that feels like a natural thing to do. But from the same learning, you have to really design for the needs of your user. And so in the same way for search, like, you know, remember when you asked a question about what was happening today and models used to be like, oh, I don't know, I was trained up until a year ago or something? It always seemed crazy to me that you couldn't get information within 100 milliseconds of what was happening in the world, just because of the way the technology evolved. But now, you know, this has evolved to be very different, and particularly in Google, it's finding information on a near real-time basis across all of our knowledge. So that's something I think we can do uniquely well. And, you know, another example is around visual and inspiration. People come to search all the time. They search for images. They go to image search. It's a huge search engine in and of itself.

Robby:
[24:31] But people look for design, they want wallpapers, they want uplighting ideas, they want to redecorate their kids' bedrooms, and they browse for these images. And if you ask AI these kinds of questions, it'll describe in text how to design a bedroom, which I always thought was really weird. And so now with visual AI mode, you can ask, help me design my daughter's bedroom, looking for ideas, looking for inspiration; it could be about anything. You could be shopping for fashion, for dresses, and AI mode will actually go find inspirational images, and then you can have a multi-turn conversation. You could say, actually, I want maximalist dark tones and a super brooding theme, and it will know what that means. And, using a lot of our Lens technology for imagery, it'll go change the whole grid from something airy and light and Californian to this dark lodge vibe. And it knows what that means visually. And I think these are ways that Google, based on what Google users need, can add unique value to the world, versus just trying to implement another kind of general-purpose chatbot, which isn't what our intention is.

Josh:
[25:34] I'm curious to understand the advantage that is uniquely Google's, because to that example, the reason I'm using a virtual background is I have nothing on my walls. I'd love some assistance on that. And I understand Google is good for that, but I'd love to understand why Google is uniquely good at that. Because if I ask another model, if I ask xAI or Grok, they'll actually go and search the internet, which I assume is mostly indexed by Google. Is there a unique advantage to Google being Google, versus having to query against Google? Like the unique data set, the unique kind of profile and indexing that Google does, that separates you from a lot of other companies in the same space?

Robby:
[26:08] Yeah, I mean, there are a bunch of things that I think allow us to be really uniquely helpful in these cases. I think one is just in the technology and the inputs themselves. There have been many, many, many years of building multimodal capabilities for image recognition and visual understanding. So our models are able to segment the background of your scene and put attention on the correct parts of the object. If you were to say, hey, I want a little tree behind me on the ledge, or a little plant: well, what's a ledge? And what does behind you mean? And what is the bottom shelf versus the middle shelf? How does the model know which part of the shelf to look at? Our models understand that really well, uniquely well. Then once you select that region, you know, you might say, hey, replace that plant, I want a better one. Okay, well, what visual imagery have other people clicked on and used, and what has been inspirational and helpful for those people in those journeys? Whereas, you know, I could probably find you a janky plant that technically is a plant on the web, but without ranking, or understanding whether people found it useful in the past for other plant searches, you might not know that this is actually a really helpful plant that lots of people have found and clicked on and enjoyed when looking for, you know, office decor plants, right? Which is something that, you know, I think Google might more intuitively be

Robby:
[27:25] able to offer, given the people that come to Google to search for these kinds of things.
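
A rough sketch of the pipeline being gestured at here: segment the scene, ground the user's phrase to one region, then rank visually similar candidates by engagement signals. Every function and field below is a hypothetical stand-in for illustration, not Google's actual Lens stack:

```python
def visual_answer(image, request: str, models, index) -> list:
    """Toy version of 'replace that plant on the bottom shelf'."""
    # 1. Segment the scene into candidate regions (shelf, ledge, plant, ...).
    regions = models.segment(image)
    # 2. Ground the natural-language phrase to the region it refers to,
    #    so "that plant" resolves to one specific object.
    target = models.ground(request, regions)
    # 3. Retrieve visually similar items from an image index.
    candidates = index.similar(models.embed(image, region=target))
    # 4. Prefer results people actually found helpful for similar searches,
    #    not just anything that technically matches "plant".
    ranked = sorted(candidates,
                    key=lambda c: (c.click_rate, c.visual_similarity),
                    reverse=True)
    return ranked[:10]
```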

Ejaaz:
[27:29] I want to shift the conversation to ads, the monetization model, because in my mind, this breaks completely when everyone has an AI agent that represents them on the internet, that does all their shopping for them, that has access to their wallet and spends everything for them. How does this mental model, or I guess business model, break in Google search? If you're not pitching adverts to human eyeballs and trying to get their attention, how does it work with AI agents?

Robby:
[27:58] Yeah, I mean, I think there are a lot of unknowns here. This is a very fast-moving space. But one thing to mention, which I think I mentioned before, is that people are still asking, at scale, the kinds of questions they've always asked. And so I view this as expanding people's ability to do more with agents. And so this feels like...

Robby:
[28:20] You know, you can't spend an hour, let's say... like, I was recently looking at buying a safe. I have some documents, and my bank closed my safety deposit box and said they don't offer that anymore. I was like, that sucks, I probably should put all these in a safe somewhere, right? And it's kind of annoying to go to the bank. Too much information, but this is the story. So, it's actually really complicated to buy a safe. So I used our deep research product, deep search, and it looked at hundreds and hundreds of various places and created this incredible guide to safes. And it was like, there are different things around moisture, different implications for insurance. I would never have spent time doing that. But now that I did, it has all of these links, it has reviews, it has opportunities, theoretically, for me to go buy those safes, where otherwise I probably would have just put this chore off indefinitely and never done it. And those all create new opportunities, not just for discovery, but for monetization and other things down the road. And then, obviously, if you're talking about agentic tasks where you never need to show anything to the user: theoretically, I don't know, on some infinite timeline, a model knows me so well that a safe would just show up in my house that's somehow perfect. I don't know if I totally believe that that's ever true, but let's say it is. I mean, I think things will just evolve in ways we don't totally understand.

Ejaaz:
[29:36] Yeah, it sounds like the shopping experience actually becomes richer, and the AI delves more into knowing what you want, whether it's purchasing a safe or buying a new shirt for an occasion that's coming up, and it feeds Google this additional information. Do you see the ad model evolving in any way, or staying where it is right now?

Robby:
[29:57] I think the ad model is definitely going to evolve, because the format evolves. And typically, if history repeats itself, you know, ads are information. It's actually really helpful information and content, and it's also a way for people to discover, you know, new businesses and services. And so when there was a shift to mobile, there was a new set of formats that came up for mobile. When there was a shift to video and short-form video, there was a new type of ad format for video: they're taller, they're more authentic, there are people talking about products, and it feels great. And so I think in the AI world, we'll see something similar. And in an agentic world, you might see something similar again, and it'll feel more natural to the format: hey, you're just kind of talking, and here's some information, and by the way, here's a deal you might want to know about. Which I think you're starting to see some experiments around.

Ejaaz:
[30:43] But I mean, we have to address the elephant in the room, which is like,

Ejaaz:
[30:47] this is a lot of power for Google to hold, right? So how do you think about treading that line between, you know, responding to a user's prompt in a way that's helpful and factual, and also kind of giving sponsored content or a sponsored product embedded into that response?

Robby:
[31:05] Yeah, I mean, I think this is something we've had to do for most of the existence of Google. People already come to Google for these kinds of tasks, and there are ads, you know, on a page with results as well. I think the principles stay the same. One, we have an honest results policy: ads will not affect the core experience of anything you see. In AI, it's no different; they will not affect ranking, and an advertiser can't change the organic reply of what the AI is recommending to you. Now, that doesn't mean you can't insert opportunities to discover new things, but those things need to be labeled really transparently to show the user: hey, this is something that you might want to know about; this is an advertisement. The same way, the exact way, that it works today on search. So I think the principles don't change. You just have to kind of rethink them foundationally every time there's a major move in how people consume information. I think you saw that with video, we saw that with mobile, and I think people will see that again in this more conversational paradigm that we're seeing.

Josh:
[32:01] Do you have any ideas of what that form factor is, or how that exists? Because I guess the perception is that if these AI overviews trigger fewer clicks for some of the publishers, then surely there needs to be some sort of monetization or controls. Are there any ideas you're considering implementing, things that should exist?

Robby:
[32:16] I mean, I think we're running experiments right now. I think this is a learning exercise, but the principles are similar, which is that if you search for information, you should be able to find it, go deeper, and have control and transparency over what you're seeing. And I think we've started to experiment particularly in different AI surfaces. So with AI overviews and AI mode, there are experiments now with advertising across those experiences, to learn about what could work well there. But I'd say those are still in the earlier days. And then I think ultimately what we find is that if you search and you see an AI overview, those pages largely monetize very similarly to ones that don't have AI in them. And so you get to a point where you're searching for something, and once you're down the funnel of, I'm looking for this ketchup removal thing and I just need to know how to blot it in this way, it turns out I was very unlikely to want to go buy a product in that moment. I probably just wanted to know how to deal with this in the next two seconds, because I've got an active situation on the couch that I need to deal with. And so, yeah.

Robby:
[33:22] You kind of learn, you'll also learn what the moments are that are going to be most helpful for people to discover new things.

Robby:
[33:27] And from there, the other thing I'll just say overall is that you mentioned links and how to encourage and understand how people can discover the web. This is absolutely essential and something that we take as a foundational design principle for everything we do. Google and search, you know, care more about the web than arguably any company, any product out there. And so one thing our models do uniquely is they actually use and understand all these search signals. So they know, for a given question, what websites are really useful. And so our approach here is not only to provide helpful links alongside, but also to embed them. So as you're reading, you can click and go deeper on anything that you see. And what we're finding is that people do click, and they indeed want to go deeper. It's just that the paradigm is kind of changing, where people want context first. They kind of want to get a sense, a gist of things, and then they want to click in. So say I'm trying to get a credit card or buy a mattress. I ultimately probably want to read what the experts are going to say about something and read a whole article. But I'm going to get a little bit of superficial information first, and then I'm going to go read, let's say, what people say on some social media threads, and I'm going to read what experts say, paid professionals who spend a lot of time analyzing this stuff, and then I'm going to make a purchase.

Robby:
[34:40] And that's what we see. But then we think our job is to make those connections possible. And our hope is that with AI, because it's largely incremental to what we see in search, there are new opportunities to connect you to new services and new websites that you wouldn't have found, because the AI is also doing broader searching than what you would do. And so the hope is that it also promotes discovery long-term as well.

Josh:
[35:03] Okay. And then, yeah, that gets to the positive-sum pie, where there's just significantly more search and curiosity than I think a lot of people perceive, as it becomes easier to unlock the answers to that curiosity. So there's one question that I had. We actually had Logan Kilpatrick on the show, who is part of the Gemini team. Amazing guest, amazing episode. I suggest everyone go find that and listen to it after you're done with this one. But what it unlocked for me was kind of the behind-the-scenes look at what the Gemini team does at Google. And it reminds me of this Manhattan Project type thing, where there's this small subset of people working on this really intelligent AI, but there's a slight disconnect where it's kind of under a separate thing, and it interfaces with Google and search. So there's Google, but there's also Gemini. And I guess what I'm curious about is what that relationship is like between Google and the Gemini team, and how you guys work to integrate these products together, because there's the Gemini AI Studio and then there's Google search. So what's a good way for myself and for people who are listening to compartmentalize and see where the synergies lie between those two entities?

Robby:
[36:03] Yeah, I mean, we work incredibly closely with the Google DeepMind Gemini teams. A way to think about it is, there are these foundational models that are increasingly able to understand any question and help find information about it or generate information about it, and there are people working on the frontier of what that looks like in many ways. And then how that is brought to life, with products people use every day and love, is really around the product teams. And so Google search obviously is one of the largest, if not the largest, ways that people interact with AI even today. And we work extremely closely, and I kind of think of it as helping really push the frontier, particularly for how models use information. And we'll work closely to bring those models into search and customize them and make them work really well for all the things we just talked about. People are getting bedroom inspiration. They're taking photos of things. They're asking about closing times.

Robby:
[36:59] And what will happen is the modeling team will think about it more as capabilities. Let's say you want the capability for the model to use a tool, or to have reasoning so that the model can think a little bit more. Someone will go research that and add that capability. And then, from a search perspective, one of the tools it's using is something like finance, so it can make a real-time query to look up financial information. Right now, if you ask about any two stocks, you go to AI mode and you say, compare, you know, the last six months of...

Robby:
[37:31] You name two stocks, put them in there, and it'll actually use Google Finance as a tool, make a request for live information and historical data, and then plot that information in the response. Now, that uses Gemini as a model, so it has this foundational ability to understand what the question is that you were asking, but then it has the search ability to use all these search tools, which is really cool, and then it can generate that kind of response for you.
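
A minimal sketch of that tool-use loop, assuming a generic chat client; the tool registry, the get_history stub, and the llm.chat interface are all hypothetical stand-ins for the example, not Google's actual Finance integration:

```python
import json

def get_history(ticker: str, months: int) -> dict:
    """Stub standing in for a real-time finance lookup."""
    return {"ticker": ticker, "months": months, "prices": []}  # fake data

TOOLS = {"finance.get_history": get_history}

def run_turn(llm, question: str) -> str:
    """One AI-mode turn: the model may call tools before it answers."""
    messages = [{"role": "user", "content": question}]
    while True:
        reply = llm.chat(messages, tools=list(TOOLS))  # hypothetical client
        if not reply.tool_calls:
            return reply.text  # final answer, grounded in the tool results
        for call in reply.tool_calls:
            # e.g. name="finance.get_history", args={"ticker": "GOOG", "months": 6}
            result = TOOLS[call.name](**call.args)
            messages.append(
                {"role": "tool", "name": call.name, "content": json.dumps(result)}
            )
```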

Ejaaz:
[37:58] Just a follow-up question on that. Can you walk us through what it was like integrating Google search, as it was before, with an LLM? Presumably there was some friction that you ran up against, like combining data sets and stuff. I'm curious what that looked like.

Robby:
[38:14] I don't think there was too much friction per se. I mean, I think the main thing is, when you're adding a model into the mix, it has to be done in a very specific way, because models have different tendencies in terms of how they respond to information than the search stack that had otherwise been built out. But they work together now, you know, harmoniously in the search environment. And so some questions produce these AI responses, and other ones have different AI enhancements. But I'm not sure if there was a more specific kind of tension you were alluding to.

Ejaaz:
[38:45] No, no. I was just curious whether there was a massive kind of uplift, whether that was developmental or on kind of taste-making for users, that you ran up against. But it sounds like there wasn't much. Another kind of wildcard question that I had, Robby, is... I like the personalization of using Google search. One example I wasn't even aware of: if I would type in someone's name, for example a celebrity, it would come up with their net worth or something. And I would show my friends and say, see, this is the most searched thing on Google. And they would be like, well, not quite; it's kind of looking at the cookies, what you've searched before, and maybe adjusting its preferences to what you might want to search. On that theme, how far down the rabbit hole can we go when it comes to personalization of Google AI search for someone? Like, is there a world where I'm not just discovering new websites or apps or experiences on the internet, but it is highly tailored and personalized to other data sets that are unique to me? It knows my shopping preferences, kind of knows how much money I have in the bank, kind of knows that I have an event coming up in a couple of weeks' time. How does that world look for you?

Robby:
[39:54] Yeah, I mean, I think first of all, there's Google overall, and then there are the questions that people typically use AI for. I think for Google overall, there are plenty of questions out there that aren't great to personalize. And so we think a lot about the differential value in being personalized. If you ask how tall the, you know, Empire State Building is, it's kind of just a factual piece of information. Maybe there's some personalization on the source, like if you really like certain facts or something from specific providers, but, you know, many things are not great to personalize. And I think there's a value in just having this kind of universal place you go to, seeing what things show up for a given question. That said, for many questions it's the opposite; it's almost weird not to personalize. Like if you say, what kind of jeans should I get? It's like, well, I don't know, what kind of jeans would you... For me, that's going to be super different than if I were to grab a random person in the world and try to get them some jeans, right? And it turns out that in these AI experiences, like AI overviews and AI mode, people have a lot more questions that are

Robby:
[40:56] These kinds of advice-seeking, recommendation questions. They want to know where to eat for dinner. They want to know where to travel with their family. It's kind of in this more subjective camp; it kind of depends. And so we think there's a huge opportunity for our AI to know you better and then to be uniquely helpful because of that knowledge. And, you know, one of the things we talked about at I/O was how the AI can get a better understanding of you through connected services like Gmail, so that over time it could really know, wow, you tend to like these kinds of products or brands, here's another one that just came out from them, and how much more useful that would be than just generically showing you, I don't know, whatever the top 10 selling jean brands are right now, a generic list of that. And so that is very much the vision, building something that can be really knowledgeable about you specifically. But I think there's a nuance there in what you personalize. And I think our thought on this is that the user is always seeing what parts of the information are being shown because of your interests or purchases or things that you've done in the past, or things that it might think you might like, versus things that it's just suggesting generally. I think people want to intuitively understand when they're being personalized, when information is made for them, versus when it's something that everyone would see if they were to ask this question. And this is kind of the wisdom of the crowds, you know, so to speak,

Robby:
[42:18] represented in a Google search page.
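
One way to picture the transparency principle described here is that every item in a response carries a provenance label the UI can surface. This is an illustrative sketch only; the fields and labels are invented, not Google's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    item: str
    personalized: bool        # from your history vs. wisdom of the crowds
    reason: Optional[str]     # surfaced to the user when personalized

results = [
    Recommendation("Brand A slim jeans", True,
                   "Based on brands you've bought before"),
    Recommendation("Top-rated jeans overall", False, None),
]

for r in results:
    suffix = f"  [Personalized: {r.reason}]" if r.personalized else ""
    print(f"{r.item}{suffix}")
```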

Josh:
[42:20] As we get closer to the end of the show, I love ending on a more optimistic note. So I want to talk about the future vision, the future that you're excited about, as someone who builds products that billions of people use. It was funny: just today, I saw a notification that Gemini rolled out on my TV, and now I have it in my car or on my phone, and then I have it on some Google watches and some earbuds. And what about cars? What about things that are sitting on my tabletop? What I'm curious to ask you is, as a product designer, what does the ideal day where your product slots into all of this look like for someone, when search is kind of everywhere and rarely typed? What does the final form look like? And how does that change the way people go about their day-to-day lives?

Robby:
[42:58] Yeah. I think that really it's not a final form. It's more of like a multi-form. I don't know if that's a word, but that's kind of how I think about it.

Josh:
[43:05] Okay. I understood. Yeah.

Robby:
[43:06] It's kind of like, there's not a single form. It's adaptive. And the way that you think about it is, I love thinking about these journeys that are multi-day. People have needs that are kind of pending, and they're working on them over time. You know, I think about a lot of people buying a couch for their apartment, let's say, and it's not just an easy thing. You might be on your computer doing some research, and you want to just find cool couches for an apartment in New York, let's say. If you use AI mode today and you ask about it, you'll see a visual grid of couches, and you might actually click on a few of those, and it might recommend some things that you might like. Okay, great, you're kind of like, okay, those are kind of cool, I'm going to think about it. Then you're going for a walk and you walk by, you know, a furniture store, or you see something that strikes you at a friend's house. And then you go to your app, you go back to your thread, you upload a photo of it, and you're like, oh, actually, this is the thing that's super awesome, I actually want stuff that's like this. Great, thanks, here are some things like that. All right, put that away. Another time, blah, blah, blah, blah, blah. Then let's say you're driving, and it pops into your head that you actually realized this other color was the one you wanted. You go live and you say, hey, remember I was talking about couches? I actually like this color, and I'm mostly focused on these two colors. Send me some recommendations for those. Great, got it, boom. And then it does that. And then a few days after that, maybe you get a push alert that there's a deal, where one of the ones you were considering, in the exact way you wanted,

Robby:
[44:30] has it, you know, and there's a sale. I don't know, it's, what, Cyber Monday coming up, Black Friday, all of the above, big shopping season. Maybe one of the things you were loving is available. These are all ways that Google, now across modes, across different

Robby:
aspects of your life, can be incredibly helpful to you for this need. And I think that's more how I think of the future of search than any one specific feature or, you know, single form factor.

Ejaaz:
[44:57] Well, we are just about coming to the top of the hour. Robby, thank you so much for taking time out of your busy day to chat with us. That was a fascinating conversation. I think a lot of people just kind of take for granted what Google search has brought them, and the way that this thing evolves and the way that it permeates every facet of our lives, especially when it goes fully multimodal, is super important to understand. So thank you for taking us through that journey. Limitless listeners, if you enjoyed this show, please give it a thumbs up. We know that a bunch of you aren't subscribed to us, so we need you to get on that button and take care of that, please. If you enjoyed the show, and if you enjoyed any of the episodes that you've listened to so far this week, please give us a five-star rating, and we will see you on the next one. Robby, thank you again for joining us.

Robby:
[45:46] Thank you for having me.
