Grok-4 Is Now The Smartest AI Model In The World | Everything You Need To Know

Josh:
[0:03] We have a new top model in town. A new king has been crowned: Grok 4 has been announced, and it is now the smartest model that has ever existed, according to all the benchmarks that were shared last night. It was pretty amazing. I stayed up late last night watching the event. It ran on Elon time, so it went very late, and I was up until a little after one in the morning. But we have all the notes, and it's amazing. This model is smarter than you.

Josh:
[0:27] It is smarter than your PhD friend. It is smarter than any PhD in the world in any category that you can imagine. It's incredible. And one thing that I wanted to highlight before we start this episode is just how impressive the rate of acceleration is from the xAI team, because now they're sitting above OpenAI. They're sitting above Claude. They're even sitting above Google. And they haven't been around for that long. So in this chart, each one of these bullet points is a model that has been released. You'll notice Grok has, what is that, two, four, six points. Grok has released six models over the course of the last 24 months, compared to OpenAI, which has been doing this since well before 2018, and Anthropic, which has released many more models than xAI. The rate of acceleration is incredibly impressive. So before we get into exactly how everything works and what is in this, I want you to just share first impressions, because to me this is a home run. They did it. No one thought they would do it, and they did it. They now hold the crown for

Josh:
[1:21] the best model in the world, at least in terms of benchmarks.

Ejaaz:
[1:24] I'm honestly shocked, to be honest. I'm a massive fan of Elon, but something about starting a company 28 months ago, when you've had all the Anthropics and the OpenAIs of this world just going at it hammer and tongs for years on end, I just didn't think it was possible. But he's not only come through at creating the best generalized model. So that's feature number one. It's better than ChatGPT, which I know the viewers and listeners of this show use on a daily basis. So you now have a new model which is arguably better than the experience you have on your favorite model, right? So I'm using Grok 4 now more than I use ChatGPT. It's only been like 11 hours since it got released, right?

Ejaaz:
[2:06] The number two feature was something really unexpected, Josh. So for a number of episodes now, we've always heralded Anthropic's model, Claude, as the number one coding model. It's been displaced. It's done. It's Grok 4 now. I hate to say it, but Grok 4 has somehow managed to match that coding level, which is something OpenAI themselves have failed to do. But I have a third feature, which I'm super pumped about, which is, you know, some AI model producers like to compete in the same categories. They like to compare themselves on the same features. Grok decided to create a completely new feature category, and that's in gaming. They announced, and they spent, I think, like 10 minutes in the live stream, Josh, talking about how Grok 4 is going to be really amazing at helping you create games. So think about vibe coding, and how products like Cursor were really good for coding up any kind of generalized app but were never specialized in anything. Grok 4 is specialized for creating games. So now you can create a Minecraft-level game, or a high-fidelity racing game, or something as simple as Tic-Tac-Toe or Tetris, in a matter of seconds. And if you remember, actually, and we can get into this later, this is something that we predicted in yesterday's episode, where we were like, I think Grok 4 is going to come out with something gaming related, because Elon is such a major gamer. So super cool to see this.

Ejaaz:
[3:28] And then the final thing, which sounds the nerdiest but I think is super important to focus on, is that it is smarter, not just smarter than any PhD, but smarter than PhDs in any sector. So you may have a PhD in science, specifically physics or maths, or you could have a PhD in art and philosophy, and this new model is now better than that. And the final feature, which I just remembered, because there are so many features this new model is topping, is the video and audio side of things. Josh, I know you've been playing around with the voice mode quite a bit. Actually, maybe you want to talk about the video side of things.

Josh:
[4:06] Yeah, so some of the stuff is here and some of the stuff is coming. The game stuff, the coding engine, the video generation, that is coming soon, before the end of the year. It's built on top of the model; they're kind of iterating. But in terms of things that they have today, they do have a new advanced voice mode, and the new advanced voice is excellent. One of the things that I noticed when I was playing around with it this morning is that not only does the voice sound

Josh:
[4:26] great, but the latency between the request and the answer is so short. It feels like you're actually conversing with a person. You say something, it spits something back at you, and you can also control the speed at which it replies to you. So the way you might listen to a podcast at 1.5 times speed, you can actually change the speed at which the AI speaks back to you. No way. So if you get a little impatient like me, this is a very nice feature. I toggled it up to like 1.4. We're going to try that and see how it goes.

Josh:
[4:51] Yeah, the news that they announced is amazing. So I think people are probably wondering, what exactly makes this so good? Where's the proof that this is good? How does this all work? How did they accomplish this? I mean, going from zero to number one in 28 months is no easy feat, especially because Grok 2 was released less than 12 months ago. So the amount of progress

Josh:
[5:10] they've made over the course of the last year is pretty incredible. And we have it here on this visual you just pulled up: Grok 4 is smarter than pretty much all grad students at everything. What was interesting about Grok 4 is that they did this thing called reinforcement learning training, where they applied 10 times the amount of compute that they did in the previous model towards reasoning. Reasoning is basically taking these facts and applying realistic knowledge to them. It's like, if you could imagine, Grok 3 was a student in school that learned a lot of textbooks but never actually went out and got a real job; Grok 4 is the person in the workforce who's applying this knowledge to the real world. With reinforcement training, it's been debated whether or not it actually works at scale; this, I think, proves that it does. Basically what happens is you feed it a bunch of problems and you say, hey, this answer is correct or this answer is wrong, and it iterates through that over and over and over again until it learns how to apply this knowledge to a broad base. So it's incredibly smart at that. It's something that's pretty novel in terms of AI training: no one's ever applied this much compute to reasoning, and I think it shows in this model. That's part of the reason why it's so smart: it's been trained on all this data, but then iterated through all of these questions until it is

Josh:
[6:18] a brilliant, highly skilled model.
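For anyone who wants to see the shape of the reinforcement loop Josh is describing, here is a minimal toy sketch in Python. Everything in it, the problem set, the candidate answers, the weight table, is a made-up stand-in for illustration; the real training obviously runs over a large language model with graded problem sets at enormous scale, not a lookup table.

```python
import random

# Toy sketch of "feed it problems, grade the answers, reinforce what worked".
# The lookup-table "policy" below stands in for a real model's parameters.
problems = [
    {"question": "2 + 2", "answer": "4"},
    {"question": "3 * 3", "answer": "9"},
]
candidate_answers = ["4", "9", "6"]

# One weight per (question, candidate answer); higher weight = sampled more often.
weights = {(p["question"], a): 1.0 for p in problems for a in candidate_answers}

def sample_answer(question: str) -> str:
    """Sample an answer in proportion to its current weight."""
    options = [(a, weights[(question, a)]) for a in candidate_answers]
    total = sum(w for _, w in options)
    r = random.uniform(0, total)
    for answer, w in options:
        r -= w
        if r <= 0:
            return answer
    return options[-1][0]

for step in range(5000):
    problem = random.choice(problems)
    answer = sample_answer(problem["question"])
    reward = 1.0 if answer == problem["answer"] else 0.0  # "correct" / "wrong"
    # Reinforce: answers that earned reward get sampled more often next time.
    weights[(problem["question"], answer)] += 0.1 * reward

for p in problems:
    best = max(candidate_answers, key=lambda a: weights[(p["question"], a)])
    print(p["question"], "->", best)  # converges to the graded-correct answers
```

The point of the toy is the loop itself: sample an answer, grade it, nudge the policy toward whatever was graded correct, repeat.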

Ejaaz:
[6:22] Got it. So if I were to summarize what you just said, Josh, it sounds like, okay, you know the people that spend their entire time in academics, right? They're getting degree after degree, they're getting their master's, they're getting their PhD. Now, that's a lot of intelligence and knowledge that they're absorbing in that, whatever, five-to-ten-year period that they're studying, right? But it's all kind of theoretical to an extent. There are certain disciplines where you go out, you do an internship, you get some practical work experience, but it's not really real life. You're not really on the job, right? You're not really at the edge. And what you're saying here is that the knowledge gained from the academics, the studying, the school period, is now matched with the real-time work experience that someone has, right? For a model. And that's really where this model separates itself from all the other models that are out there. It has real-world practical knowledge. It understands all the different terms that you're referencing, maybe in social media culture, or any kind of work terms that you're mentioning that you're currently experiencing in your job. It just kind of overall gets you better, and it understands where you are at the edge of your learning and what you're trying to achieve in your task. Is that right, Josh? Is that fair?

Josh:
[7:36] Yeah, it's applied knowledge. You could imagine it like this: imagine Grok was a million people that learned in college and then went out into the workforce, and it has accumulated millions and millions of years' worth of work experience, and it's now applying that to the answers that it gives you. So yeah,

Josh:
that's the benefit that they found from this, actually.

Ejaaz:
[7:55] Another thing on this topic, Josh, was actually a concept I kept on seeing, which was Humanity's Last Exam, and how Grok 4 had basically achieved the highest score. It was actually almost double what the previous model had achieved. And I kind of want to set the context as to why this is so cool.

Ejaaz:
[8:10] Humanity's Last Exam is basically AI researchers' bet on AI models getting to human intelligence. That means AGI level: as smart as humans, or even smarter than us. So, as you can imagine, it's a really, really tough exam, and it's hard for the AI models that exist today to crack. But Grok 4 came in, and they were kind of expecting it to surpass the previous score, which I think was about 24.9%, achieved by an OpenAI model. And they were kind of like, yeah, it'll probably hit 30 or something. It almost doubled it. It's almost at 50%. And the way I look at that is, if it's improving at such a quick rate, well, how long has this company been around? 28 months? Where is it going to be in the next 28 months? Because this is like an exponential curve. We just looked at a graph that you showed us where, after six models, OpenAI, sorry, Grok, has already reached frontier-model level. It's beaten every single benchmark. I can't help but think that this exam is going to be blown out of the water in a matter of, I don't know, a couple of years at this point, which is shocking for me, because I assumed this AGI thing was still a number of years out, despite, you know, all these papers opining about it being ready in 2027. Do you have any takes on this, Josh? Like, I'm freaked out about

Josh:
[9:26] this. Yeah, well, again, we're getting to this point where, like, is this AGI? It depends on the definition. But what we're seeing happening is, I mean, we have Humanity's Last Exam, which it reached forty-something percent on, but there are a lot of other

Josh:
[9:40] benchmark tests that are actually fully saturated, meaning it scored 100% on those benchmarks. There's actually no room for improvement in any of them. And I think that was something that was interesting to me: okay, how are we going to continue to measure the success, measure the improvement, of these models in an objective way? Because we kind of are running out of ways to do that. And we have this.

Ejaaz:
[10:00] Yeah, we have the post here, which is, like,

Josh:
[10:02] Okay, first of all, number one across the board. So congratulations. But also we have an 88.9%, a 98.4%, a 90%. These are really, really high numbers, where we're probably just one more iteration away from fully saturating all of them. And that was what was interesting to me: we really need to re-measure, or re-index, how we even classify these models, because we're very much running out of time. And then, I guess, the AGI definition. We've kind of said this in the last few episodes, but, I mean, I don't really know. Like, are we there? Is this it? Because if you asked someone a few years ago, sure, this would totally be AGI, but today it's like, probably not. It doesn't feel like it. But man, it's really smart at just about anything a human can do. Yeah.

Ejaaz:
[10:46] It's pretty insane. One thing I actually wanted to point out in this tweet, Josh, is that it has something called a 256,000-token context window.

Josh:
[10:55] Now, I kind of want to, pun intended, set that into context on this show.

Ejaaz:
[11:00] Which is that that's like two novels' worth of information that you can just chuck into a single prompt with Grok 4. Now, think about what kind of practical context you can put that into. That means you can put in a bunch of research papers that you have no clue about or understand nothing about, and ask Grok to summarize them and relate them to you in a way that you can understand. That is the difference between typing out a simple algebraic formula and slowly learning how it builds into a massive scientific problem, versus just copy-pasting the entire thing. And I think something like that is just super cool. But it's not just the context, it's how much it costs as well.

Ejaaz:
[11:38] If you look at this, it's $3 per million input tokens and $15 per million output tokens. That is, for context here, just incredibly cheap for what this model is achieving and for the benchmarks that it just broke. So I just thought that was super cool to point out.
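To make those numbers concrete, here is the back-of-the-envelope arithmetic for a single large request at the quoted rates. The token counts are assumed for illustration, not measurements of anything.

```python
# Cost of one request at the quoted rates: $3 per million input tokens,
# $15 per million output tokens. Token counts below are assumed examples.
INPUT_PRICE_PER_M = 3.00     # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00   # USD per 1M output tokens

input_tokens = 256_000   # a prompt that fills the full context window
output_tokens = 2_000    # a longish answer (assumed)

cost = ((input_tokens / 1_000_000) * INPUT_PRICE_PER_M
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M)
print(f"~${cost:.2f} for this request")  # roughly $0.80 in this example
```

So even a prompt that fills the entire context window comes in well under a dollar at those list prices.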

Josh:
[11:53] And another part, also in terms of cost, is that this is now a free product. You are actually able to use Grok 4 right now; even if you don't pay for an account, you can go and access the Grok 4 model. So I would encourage you, if you're listening, even if you don't have an account with Grok, try it out. It is amazing. It is really smart. And one of the things that also stood out, when comparing it to o3, which I use a lot, or comparing it to Gemini 2.5, which is Google's offering, is that the time to first token feels significantly faster. With o3, a lot of the complaints that I have, and that other people have, is that it just kind of takes a little bit to get where you want to go. You ask a question, it thinks for a little bit. Sometimes it'll think for a minute; sometimes it'll think for two minutes. Grok 4 really spits out answers fairly quickly. So I think if you're building an app experience, or if you're using this as a day-to-day model, just querying things against it, the time to first token is a really big deal, and it's noticeably different in this new model. And then there's another benchmark you haven't pulled up here, which I really want you to introduce and share, because there was one line in this in particular that kind of freaked me out, and I'd love for you to just walk us through what's happening here on screen. All right.
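If you want to put a number on that time-to-first-token feeling rather than going by vibes, a rough sketch like the one below works against any OpenAI-compatible streaming endpoint. The base URL and model identifier here are assumptions for illustration; swap in whichever provider and model you are actually comparing.

```python
import os
import time
from openai import OpenAI  # pip install openai

# Measure time to first streamed token. Base URL and model name are assumed;
# point them at whichever OpenAI-compatible endpoint you want to compare.
client = OpenAI(
    base_url="https://api.x.ai/v1",   # assumed endpoint
    api_key=os.environ["XAI_API_KEY"],
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="grok-4",                    # assumed model identifier
    messages=[{"role": "user", "content": "In one sentence, what is ARC-AGI?"}],
    stream=True,
)

first_token_at = None
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta and first_token_at is None:
        first_token_at = time.perf_counter()
        print(f"time to first token: {first_token_at - start:.2f}s")
    # keep consuming the rest of the stream so the request completes
```

Run the same script against two endpoints and you get a crude but apples-to-apples comparison of how long each one makes you wait before anything appears.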

Ejaaz:
[12:59] Okay. So we have Greg Kamradt, I think that's how you pronounce his name, who is basically the guy that manages this benchmark called ARC-AGI. In simple terms, this is the AGI benchmark. It's measuring how close these AI models are to artificial general intelligence, which is, you know, the pinnacle of where we want to get to with this entire AI trend. And he says, we got a call from xAI 24 hours ago. And he puts in quotation marks, we want to test Grok 4 on ARC-AGI. We heard the rumors. We knew it would be good. We didn't know it would become the number one public model on ARC-AGI, though.

Ejaaz:
[13:38] Here's the testing story. And then he goes on to explain how he spoke to the xAI team. He explained the rules and said, hey guys, we're going to set the rules here; you can't manipulate it in any way. And the reason why I say that is that a lot of AI model providers have been rumored to manipulate score results to make the models seem much better than they are. But here we have an authentic case of the model team coming to the benchmark provider and saying, hey, we're good to go. Throw us anything you've got, and let's see how well our model does. We back it. We know it's going to do very well.

Josh:
[14:11] And so he goes.

Ejaaz:
[14:13] Exactly. And so he goes, they were on board, so we got started. And he goes, there were some initial errors in terms of setting it up, but once it got going, it absolutely blew it out of the water. And he goes, the previous top score was around 8%, set by Opus 4, and he says below 10% is kind of noisy. And then he goes here, Josh, take the sentence. This is the one that you were referring to.

Josh:
[14:37] So, getting 15.9% breaks through that noise barrier. Grok 4 is showing non-zero levels of fluid intelligence. And if you're not familiar with what fluid intelligence is, it's basically the capacity to reason abstractly. It's the ability to solve novel problems and adapt to new situations without relying on prior knowledge or experience. So this was the most interesting thing to me, where I'm like, hmm, okay, this is the first time where it's actually able to solve novel problems. Which gets me to a point that Elon actually mentioned later in the presentation, which was, hey, we are actually really close to unique technical research unlocks through AGI.

Josh:
[15:18] He said, I think the first new technology unlocks that will be learned through the Grok model will come next year, and then the first new physics breakthroughs will come the following year. So I think this is kind of the first step towards what Sam Altman often alludes to in the world of bioengineering, where he says the thing he's most excited about is new bioengineering breakthroughs that are generated through an AI model. Well, Grok is now a contender in this as well, where I think we can very well expect to see genuinely novel technology breakthroughs and physics breakthroughs over the next 24 months. And particularly at this rate of acceleration that they have, that seems really exciting to me. And that was the thing that stood out to me from this whole thing: okay, we're actually at the point where we're right on the cusp of novel unlocks

Josh:
[16:05] due to these large language models, which was really cool. And then, in addition to all of this, we had our episode yesterday where we shared our predictions, and I'm pretty happy with our predictions. I think we did pretty well. I don't want to say we fully knocked it out of the park, but almost everything we said came true, which is so high signal. Listen, if you're listening here, you're...

Ejaaz:
[16:22] Out of three, or three out of four, I would say. So, not bad. And some of the predictions were kind of out there; some of them were technically moonshot predictions, and we kind of nailed it. So I'm going to start with one of my moonshot predictions, which was that Grok 4 was going to excel at gaming. So not just Cursor or vibe coding for any general application; Grok 4 was specifically going to focus on letting anyone create the funnest, most engaging games, and from there sprout some kind of app-store plethora for gaming, where anyone and everyone can share games and interact with each other. And the reason why I said that was

Ejaaz:
[16:59] nothing novel, but, like, Elon is a massive gamer. That was literally my thesis. We were saying on yesterday's episode, he is the number one ranked player in, I think, Dota or whatever the game is, which is a highly strategic, pretty intensive game, and it just fit well with his characteristics. I was like, I bet you he's going to make a model that is super good at gaming. And in this post that I have pulled up here, that's pretty much what they spent 10 minutes on the live stream talking about. Grok will develop and play 3D games. So we're not talking about Tetris here, we're not talking about Tic-Tac-Toe; we're talking about real 3D games that, you know, you and I grew up loving, that kids nowadays love, Minecraft-type, Roblox-type games. You can now spin those up in a matter of seconds or minutes. Not just that, but Grok will have good taste for fun games, meaning it'll understand what you're trying to pitch it, instead of giving you some kind of black-and-white game with boxes or whatever. It senses your taste, it senses your vibe. It says that it'll have excellent video understanding, improved tool use, a gaming foundational model. That's super exciting, because that's something that we haven't really seen being pitched by the major model makers. You know, we've had these niche indie gaming companies that are like, hey, we're integrating AI. We've had the popular game engine Unity spin up their own thing. But we haven't really seen the big boys lean into gaming.

Ejaaz:
[18:22] X is doing that now. Grok 4 is doing that. This isn't out yet. Yeah. Do we know when this is coming out, Josh?

Josh:
[18:27] Yeah. So, I mean, Elon's prediction is the first real AI video game in 2026. I want to add some commentary to the video game thing, because I think it's actually more impressive than people realize. When you're designing and developing games, the actual code to generate the game is not the hardest part. You could ask a game engine or an AI model to generate you a copy of Flappy Bird, generate you a racing game, generate you whatever generic game you want, even a first-person shooter, and there were some examples that people made of first-person shooters. The difficult part of building a good game is the environment around you. It's nailing the physics, it's nailing the textures, it's nailing the actual design of the visual elements, because, by all means, games are reinventing the physical world in a digital space, and it's really difficult to emulate the physics, the design, the lighting, the texture, everything that makes base reality look real. So one of the interesting things that they're doing with this new gaming model, whenever it gets released, whenever the capabilities really come into full form, is they are going to allow it to work together with

Josh:
[19:25] existing game engines like Unity. And we actually talked about this a week or two ago, where you asked about the difference between, like, a Veo 3 versus a Unity engine in terms of generating content. Veo 3 is very much trained on the perception of physics, meaning it's seen a lot of videos and it can kind of guess how physics works based on its perception. But a game engine like Unity is actually hard-coded with a physics engine, with a lighting engine, with all the things that make games look real, because it's been taught how to recreate this reality. And you kind of see it with the new GTA trailers; the worlds now look incredible. So what Grok is doing is pairing these tools together. It's pairing the generative part with the hard-coded, super-high-quality part, and those two things, when combined, can make for some really amazing experiences, because it takes the hardest part of gaming out of the equation, which is designing the world around you, and it just gives this model a real-life physics engine. And that's going to be freaking awesome.

Ejaaz:
[20:27] It's a really strategic move from Elon and the xAI team as well, isn't it? So from an infrastructure level, what you're basically saying is it's not trying to own the entire stack. It's just trying to own the brain, and it's open to inviting or integrating other tools, like Unity or any other tools that are really good at nailing the physics, as you say, within its tool stack, right? It seems like its goal is just to make it easiest to make the coolest games. And I can't help but think that, you know, Elon's original vision when he renamed Twitter to X was, I want it to be the everything app. And we said this on yesterday's episode: the everything app right now is WeChat, which operates in Asia, where people can do all their finances, they socialize, they play a lot of games. And we haven't really had that app in the West. And it seems like X might end up being that app, because I'm convinced now that the next step is surfacing these games to anyone and everyone. So you could go onto an old-school Miniclip or Apple App Store-like experience and browse the top games that are trending at that moment and interact with them in real time, maybe even with your friends as well. But Josh, I also want to mention these other two sneaky points that he's mentioned down here, the first of which is: first half-hour of watchable TV, 2025.

Ejaaz:
[21:45] So what he's saying here is, like, you watch these regular sitcoms that appear on Netflix or Apple TV every day, where they're, like, half-hour episodes; you can now have fully AI-generated episodes. So what he's implying here, I'm guessing, is that it's going to be super easy to create these kinds of narratives and directed scenes, similar to a Hollywood-style VFX

Ejaaz:
[22:07] studio, but for nothing, straight from your X account. So he's kind of like...

Josh:
[22:12] Not only taking on the gaming sector.

Ejaaz:
[22:14] But he's taking on the Hollywood sector all with one single model, which is just insane. And then he says here, first watchable AI movie, 2026. I've got a bunch to say on this, but Josh, please like take the mic. You go first. Yeah.

Josh:
[22:27] So they have this very clear roadmap of everything they want to destroy. It's like, okay, Grok 4 is released today, they have the coding model coming in August, they have the multimodal agent in September, they have video generation, which is what we're discussing now, in October, and every single one of those is going to get better and better sequentially, in a way that compounds. I'm curious why you think the AI video generation is so impressive, because we've kind of seen this with Veo 3. That was the first version that we had that had audio, really, so the characters that you were making could talk, and it had spatial awareness, so if you were to cut something or interact with something, it would emulate the perceived sound. So what do you think the impact of Grok 4 doing this, presumably better, will have on the world of entertainment?

Ejaaz:
[23:11] I think Grok 4 is going to nail the AI episodes, the AI movies, better than anyone else. Not necessarily because it's a better model, but because it's going to copy all the best traits of all the other video models, Josh. Okay. And this is not something that is uncommon with other AI model providers, right? We've seen the likes of OpenAI copy some of the coding training methods that Anthropic used with Claude, and now it's become a really good coding model. We've seen Anthropic do vice versa with OpenAI. We've seen Meta's Llama do the same thing. So there's a history of, you know, mimicry is the highest form of flattery, blah, blah, blah.

Ejaaz:
[23:48] I think Elon has looked at Google's Veo 3 and said, huh, the visuals are really, really accurate, it's really high fidelity, but there's no character consistency. And then he looks over at Midjourney and their recent model and he's like, huh, their video aesthetics aren't as good as Veo 3's, but their character continuity is really good, wow, look at that anime episode that I've just watched. So I think he's picking and choosing all these different things, Josh, and he's bunging it into Grok 4. I think that's what he's going to launch. He's not necessarily going to launch a higher-aesthetic model than Veo 3, but he's going to launch a model that combines all these different characteristics, such that you can go on it and say, hey, I've generated this really cool anime character using Midjourney or whatever, and I'm going to copy and paste it into my Grok 4 model on my X account, and I want it to now direct a scene for me using this one character. That's kind of where I see this going. What do you think?

Josh:
[24:44] I'm all for it. I think that's great. Well, if they're going to have a TV show by next year, or the end of this year, it needs to have character continuity. So all these things that we are lacking right now, it must accomplish in order to have that. So in that sense, yeah, I totally think that's going to happen. And I'm really, really excited, because I think xAI has access to a lot of visual data that the rest of the world doesn't. I'm not sure how valuable it is, but in the sense of, like, the Tesla network, I'm sure that data is available for training, which is a lot of real-world data. They have a lot of factories, they have a lot of robots, they just have a lot of this weird real-world data that is kind of proprietary to them, and I'm hopeful it will make a difference in understanding the world. Yeah, it's going to be interesting. We'll see. There also is one other thing that I wanted to mention before we wrap up, which I think is notable, and it's what they're offering, because they're not just offering Grok 4,

Josh:
[25:37] right? There's another model here. It is called Grok 4 Heavy. And Grok 4 Heavy is really impressive because Grok 4 Heavy doesn't just rely on a single model. It relies on a series of agents that are kind of working together.

Ejaaz:
[25:52] To give you

Josh:
[25:54] the answer. Yeah. So multi-agent, multi-modality, multi-everything. It is really impressive. It takes a ton of compute, actually, so Grok Heavy is very expensive. It's, what, I think $300 a month, $3,000 a year. So we're talking about a good amount; this is probably the most expensive subscription that exists for an AI model right now. But the outcome is the best in the world, and when we showed those benchmarks a little bit earlier, it shows that Grok Heavy, with its multi-agent models, will produce the single best answer in the world. So if you're doing research, if you're doing any hard problem-solving, this will solve that. And the way it works is it basically takes a version of Grok 4, clones it into a series of these agents, and they all search for the answer to the same question that you asked. And then, after they've come to a conclusion, they look at each other, they compare notes, and then they form consensus on what the best answer is and push that best answer forward. What you'll often find if you're using a language model is that you'll get a slightly different answer every time you ask a question. So you could ask the same prompt twice and get two different answers, and sometimes one will be better than the other. What this does is provide the redundancy to guarantee that each answer is as close to the best answer as possible. And that was super interesting to me. So I don't have the Grok Heavy account, we're not paying $300 a month yet, but we might have to try this out for a demo, because I'm really fascinated by how that's going to work.
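The pattern Josh is describing, several copies of a model answering the same question independently and then converging on a consensus, is easy to sketch. The snippet below is a toy illustration only: `ask_model` is a stand-in stub, and the "compare notes" step is reduced to a majority vote; nothing here is xAI's actual implementation.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def ask_model(question: str) -> str:
    """Stand-in for one cloned agent; usually right, occasionally not."""
    return random.choices(["Paris", "Lyon"], weights=[0.85, 0.15])[0]

def heavy_answer(question: str, n_agents: int = 8) -> str:
    # Fan the same question out to n independent "agents" in parallel.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(ask_model, [question] * n_agents))
    # "Compare notes": here consensus is a simple majority vote. A real system
    # could instead have an orchestrator model judge the candidate answers.
    tally = Counter(answers)
    best, votes = tally.most_common(1)[0]
    print(f"votes: {dict(tally)} -> consensus: {best} ({votes}/{n_agents})")
    return best

heavy_answer("What is the capital of France?")
```

The redundancy is what buys the consistency Josh mentions: any single sample can be off, but the vote across many samples is far more stable.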

Ejaaz:
[27:22] Dude, we're going to end up paying our entire rent on AI models. I'm paying, I think, what, $200 for OpenAI's premium tier plan, or whatever it is, and it gives me access to all their cool features, their video models and their agent thing. And Grok Heavy, you just said, is $300. That's insane. Okay, so my take on this: there have been a few experiments that were talked about recently, and I say the word experiment because that's literally what they were, to see how these different models would interact with each other in real-life scenarios. We spoke about it, I don't know, like five episodes ago.

Ejaaz:
[28:00] One research group put Anthropic's Claude model, OpenAI's model, and Grok all in a room and said, hey, I want you to raise money for charity. Go figure it out. You know, you're going to have access to any tool that you want. And what was funny about that little segment that we did was it described how some models were lazy.

Ejaaz:
[28:20] Some models were super practical and some models worked really well together. And on that last point, the models that worked really well together often gave a way better, I'm not talking about marginal, I'm talking about a way better response and output to the original query. They raised way more money for charity. They were way more entertaining and they were way more strategic. And most importantly, they would call each other out for the mistakes that they would make, right? So all these traits were specific to agents that work together or models that work together.

Ejaaz:
[28:52] That's why Grok Heavy is going to win. They've seen that pattern happen, Josh. So imagine you don't just work with one singular terminal, saying, hey, can you figure out this research problem for me? It takes a research problem and, in the back end, speaks to a million replications of that exact model: one runs off and does one part of the query, another runs off and does research on another part, and they come back with answers. You have an orchestrator agent, which evaluates the answers and the responses. And all of that happens in milliseconds, or seconds rather, and gives you the best answer that you could possibly have gotten, something that would have taken you days or months to figure out. Just insane.

Josh:
[29:31] It's amazing. And it's funny you use that example. I actually just shared a post with you, if you wouldn't mind pulling it up. It's an example that they used from the presentation last night, which was using AI to make money. The example they used was a vending machine, and they showed the benchmarks here where Grok, when tasked with the problem of how to make money with vending machines, rolled this out virtually and actually made a lot of money. It sold 4,569 of these units, more than double, more than triple, the second place, which is Claude Opus 4. So then you look at the net worth over time, and it is much higher than the other models'.

Ejaaz:
[30:17] Hang on a second, mate. This is a crazy chart. Isn't that cool? What is that? That's insane.

Josh:
[30:23] Yeah. So there's this world in which, like, hey, it's now smart enough that it could actually conduct business on your behalf and kind of ideate and apply those ideas to the real world to generate money. It did really well. And you can see where the human falls in this; it's pretty disappointing. So the net worth of a human is $844. The next up is Claude at just over $2,000. And then we have Grok at $4,700. Grok sold 4,500 of these units, while a human sold 344. So in this particular example, Grok 4 is already an order of magnitude plus

Josh:
[30:58] better than a human at selling from vending machines, at least. That's our benchmark. So it's just another example of how these things are getting more aware. They have more context, they have more capability, and again, because of the reinforcement training that we talked about earlier in the show, they just have the practical knowledge to apply these ideas to the real world. And I think that's kind of what you're seeing highlighted in this chart: like, damn, it's pretty good. It's doing things in the real world, and it's making a difference.

Ejaaz:
[31:25] All right, Josh, I want to get back to the predictions that we nailed because I just remembered that you made a banging one, which couldn't be more on point.

Josh:
[31:33] It's coming to Teslas. Let's go. This is so exciting. Yeah. So yesterday we mentioned, hey, I'd really love to see Grok in a Tesla. I did cheat a little bit, because there's an account that I follow that shares the changelogs within the apps, and it showed there were some mentions of Grok last week.

Josh:
[31:49] There was no guarantee that it was going to be announced. And then Elon just this morning posted, Grok is coming to Tesla vehicles very soon, next week at the latest.

Josh:
[31:57] This is very exciting. Yeah, I'm very hopeful that it has the things that we mentioned yesterday, which is multi-modality awareness: it can read from the cameras, it can hear you through the microphone, you can have a conversation with it, you can talk about things that you're seeing. It has access to your GPS and navigational data, so it can interact with you; perhaps as you're driving around it can give you a tour of a neighborhood, it can tell you about interesting places nearby, it can converse with you about whatever you'd like, it can teach you things, it can entertain you by telling stories. You have this AI-superpowered assistant now inside of these cars. And I think that's a really fun application of it, particularly when you think about robotaxis, because if you're getting into a robotaxi, you have this screen, which is a fun entertainment system, and you can watch pre-created content, you can go on YouTube, you can go on Netflix, but now you also have this superpowered assistant inside that you can converse with about anything. And the idea, I would assume, is that, if people aren't familiar, when you get in a Tesla, even if it's another person's Tesla, you have a profile on your account, and that profile will automatically sync to the car when you get in. It will automatically adjust the seat, it'll log you into the correct accounts, it will change the temperature preferences to the way that you like, and that also probably gets paired with your Grok memory profile, so it knows all the memory about you. And when you get into a robotaxi, even if it doesn't belong to you, it still has all the context of your past experiences.

Josh:
[33:21] That's going to be really fun because you just now have this hyper-personalized profile that travels around with you everywhere you go when you're in a car. So that was a fun prediction that is seemingly coming in the next seven days.

Ejaaz:
[33:31] I mean, I said this on yesterday's episode, but the multi-modality point is a really important one, because it means that your AI is going to be everywhere that you go. And that's ultimately where we're heading, right? Like, we went from desktop computers to smaller computers called laptops that were portable, but you still had to open up these tiny, you know, metal slabs that you can use wherever you are, right, and interact and socialize and all the likes. But it's still clunky: you need to pick it up, you need to open up apps and stuff. And then AI just kind of blew all of that out of the water. But the thing with AI is you need to tell it stuff, you know? You need to tell it about yourself, you need to explain the context of things. And now you have this kind of all-in-one AI model that not only sits on your social media feed and sees all the things that you like, sees all the people that you follow, sees all the things that you search, but it's also your personal assistant, it's also your therapist, and now it can also be your eyes, right? So if it jumps in your Tesla, it's seeing everything that you see. It might even point out different kinds of shops or historical sites that it knows you might like and say, hey, you should take a right down here and you'll have a more scenic route, or whatever that might be. And I'm not going to bother to try and opine on what kinds of new experiences that's going to unlock right now, because I need to think more deeply about it, but I'm tremendously excited about what this is going to become.

Josh:
[34:58] Yeah, it's going to be really cool. I think Grok 4, the announcement we got last night, is very much the starting point, and it kind of laid out the roadmap for where they want to go. So next week, when Tesla gets Grok, it's probably not going to have the multi-modality; in fact, they said they were going to try to roll that out sometime in September. We have the coding model in August, we have the video generation in October, but I think it's safe to say that by the end of this year, this form of Grok, this version of Grok, will be feature complete. And that's going to be a very different world than we're living in today. I mean, we saw what happened when Veo 3 came to the market, how quickly video content changed. Even this morning, I saw this viral video from Popeyes, and it was generated by a guest that we had on the show a few weeks ago, and now they're in direct competition with McDonald's, and it was generated for like a couple hundred bucks by a dude in his office. That was not even possible to do two months ago; we're talking a matter of weeks. So as these tools roll out, as we get new game generation, as we get this new coding model, this new video generation that understands the world and can apply the Unity engine, the Unreal engines that we're used to seeing in AAA video games, yeah, we're going to have some pretty amazing new stuff to be entertained by and to create ourselves. It's going to get really crazy really quick. And I think that was kind of the idea that

Josh:
[36:18] Elon opened up the presentation with: hey, we are very much in the Big Bang, like, the Big Bang moment of the intelligence boom, and we are in the very, very early stages. And to go back to the chart that we started with, the rate of acceleration, the velocity at which these things get better, is so fast. If you imagine, I mean, yeah, here's the chart, if you imagine, we were at Grok 2 less than 12 months ago, and Grok 2, by today's standards, you couldn't even pay me to use it, it's so bad. So if we continue that rate of acceleration, that rate of velocity, and just extrapolate it out 12 more months, I mean, the world is a totally different place, because Grok 4 will then be this kind of dumb model that probably fits on your phone, but even though it does, you don't even want it anymore. It's getting really good, and this is where we start to get those second-order effects occurring, where it's like, hey, you start to get novel technology breakthroughs, novel physics breakthroughs, novel bioengineering breakthroughs. And all of those things are seemingly coming at a rate that I think is going to be surprising to a lot of people.

Ejaaz:
[37:21] I mean, I couldn't agree more. I think the general theme of these AI developments over the last two years that I've been heads-down studying this, Josh, is that we are in the Wild West. And every time I think one model has ended all the others, like it'll never be beaten, I eat my own words literally within the week. And I thought that we'd reached that point about two months ago, when they would talk about how the new compute clusters would require billions and billions, potentially trillions, of dollars, so they had to raise funds, and how we were running out of data. Do you remember that, Josh? And everyone was like, ah, these models are all going to reach a certain level of intelligence and it's all going to become a commodity. And I just keep eating my words. The graph just keeps going up, and I'm waiting for it to stop. I'm waiting for NVIDIA's market cap to flatten. It's just not. It's worth more than the UK's entire economy right now. It's above $4 trillion, which is 14% more than the British economy, my home, where I'm from, which is a first-world country. An insane thing to even say on this show.

Ejaaz:
[38:29] So the general theme is, I just need to keep setting the bar high, basically.

Josh:
[38:33] I think that's the trend: if you're listening to this and you are following AI closely and you're here for the day-to-day, expect things to continue to move faster. And as fast as they are today, again, you need to re-index: they're going to move faster. So for the people who are still listening, thank you. We very much appreciate you sticking with us and being here for the ride. There's a lot of stuff to look forward to, and I just want to take a second to highlight what we are going to be talking about that's coming down the pipeline soon. So we have ChatGPT 5, which is confirmed; that's coming this summer. That is probably going to beat Grok 4. It's probably going to be better. It is going to be incredible. Then next week, OpenAI is actually open-sourcing a model, so we have that to look forward to. A new Claude has been spotted, Claude 4.5, possibly; it's been out in the wild, it's been rumored. And then we have Gemini 3.0, which has also been spotted in the wild. These are a lot of really big models. So I think for the past few months we had this breather, where nothing really came in terms of frontier models; we've been using o3 for quite some time now. I think that's all about to change in the next few months. So if you're listening to us, buckle up. There's a lot of acceleration, a lot of AI, a lot of intelligence to come. Again, thank you for the comments yesterday about sharing preferences. Some people liked the daily show, some people didn't. We're just going to continue to iterate. I think today the episode works perfectly: by the afternoon, you should have all the news. So thank you for listening, thank you for sharing, thank you just for making it here.

Music:
[39:55] Music
