World Models: What are They, and is Google's Genie 3 AGI?
Josh:
Google just released something that I think may be more important to creating AGI than ChatGPT itself was. They released an AI that doesn't predict text; it simulates reality. You describe a world and it builds it in front of you in real time, at 720p, 24 frames a second, just like a video, except there's something a little bit different: you walk through it, you drive through it, the world responds to you. And it was so good that within one week, Waymo, the self-driving car company worth $126 billion, took that technology and built the most advanced driving simulator ever created. They can now simulate things that the cars have never seen, like tornadoes, or elephants on a highway, or snowstorms in cities that never get snow. So today we're going to break down what world models actually are, why some of the biggest companies on earth are racing to build them, and why this might be the most important shift in AI since the Transformer.
Ejaaz:
Yeah, Demis Hassabis, the head of Google DeepMind and Google's AI efforts, said something a few weeks ago that I haven't been able to get off my mind, which is that world models will be the single most important thing Google focuses on this year to take us to AGI. And all I could think was, this is just like a video generation model, right? Then, as we went into the weeds prepping for this episode, I realized it's so much more. But I think we have some visual examples we want to show the audience first, to demonstrate what we mean.
Josh:
Oh, this one is so cool. What we're seeing on screen is two examples. In the first, a child builds a box and a little toy, this walking little piece of cardboard, drops it into Genie 3, and it puts it into its own world in a way that's fully controllable, like a video game. So imagine you have a favorite stuffed animal or a doll, or, in the second example, a cat that lives in your apartment. You can scan in not only your apartment but your cat, and then you can walk around your apartment as if you are your cat, engaging in this pseudo-reality of the real world. It's really bizarre watching this work. And this other example we're seeing now proves that you can actually engage with the world around you: someone who looks like they're walking through a neighborhood walks up to a car, and you can actually open the door and look inside; the hand reaches out and grabs it. I think what's so amazing about this model is that it's different from any other model we've ever seen, because it actually has this deep understanding of reality. It merges base reality with this virtual reality, and I think that, one, it shows how smart it is and how well it understands physics, but two, there are a lot of downstream implications for training better AIs inside the worlds Genie 3 creates.
Ejaaz:
There are a ton, actually. And I was trying to think about how to explicitly explain the difference between an LLM, which you and I and our listeners know very well, and a world model. I think it's the following. Think of an LLM like a nerd who has never left the library. He has read every single book. He understands the world in theory, but he's never stepped out of the library, so he's never experienced the world for what it actually is. He understands how grass looks and what it should feel like, but he's never touched it, felt it, or seen the way it changes form when he brushes his hand through it, right? A world model, on the other hand, is a lived and experienced version of AI. It understands the physics of the world through simulated reality: how water flows, how people jump through the air, the physical reality of the world as we humans experience it. And that's the core difference. Now, Genie 3, Google's flagship world model, can do this at 24 frames a second at 720p, which is the big breakthrough everyone's freaking out about. You can now go from a prompt to a simulated reality that it creates in real time and that is physically accurate. And that's so cool.
Josh:
Everyone knows that ChatGPT predicts the next word, but a world model predicts the next frame of reality. They simulate how these environments evolve, how your actions change them, and the effects your physical actions have on this digital world. And what we're seeing here is this really cool graphic that you, Ejaaz, had Claude build, describing the difference between the two. If you look at the table, it shows a few of the characteristics: whether it's interactive, whether it understands physics, how it's generated, and what it's generated from. What you'll find is that the world model is unique in every way, because it reacts to the world it simulates around it. With an AI video, when you're watching a Sora video, it's all been predetermined. You could put on your goggles and watch it as if it's virtual reality, but it's all been predetermined by code, and it is fixed. With a world model, it's all dynamic, generated on a frame-by-frame basis. So now, any time you interact with something, say you put some paint on a wall, the paint actually sticks, and it stays, and the world model remembers it, and it understands that the paint drips slowly down the wall, because that's how paint works in the real world. And I think game engines probably feel threatened by this, because it does all the things game engines do, except it's infinite and fully customizable, like that example with the cat, where you can take a photo of your cat in your apartment, it'll scan it in, and you can actually walk around your apartment and jump on your couches as if you're your cat. That is a unique and novel thing, only possible through world models. And that's what makes them really unique relative to anything else that we have.
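To make that distinction concrete, here is a minimal sketch in Python of the two prediction loops. Both classes are hypothetical toy stand-ins, not ChatGPT's or Genie 3's actual interfaces: an LLM autoregressively predicts the next token from the tokens so far, while a world model autoregressively predicts the next frame, conditioned on both the frames so far and the user's action at every step.

```python
# Toy stand-ins for the two kinds of model. The only structural
# difference is what gets predicted, and that the world model is
# conditioned on the user's action at every step.

class ToyLLM:
    def next_token(self, tokens):
        # Stand-in for a language model: predict the next word
        # from the words so far.
        return f"word{len(tokens)}"

class ToyWorldModel:
    def next_frame(self, frames, action):
        # Stand-in for a world model: predict the next frame from
        # the frame history AND the user's action, which is what
        # lets paint on a wall persist into later frames.
        return f"frame{len(frames)}:after({action})"

# LLM loop: text in, text out, no notion of a viewer.
tokens = ["hello"]
llm = ToyLLM()
for _ in range(3):
    tokens.append(llm.next_token(tokens))

# World-model loop: each action steers what is generated next.
frames, actions = ["frame0"], ["walk", "open_door", "paint_wall"]
world = ToyWorldModel()
for a in actions:
    frames.append(world.next_frame(frames, a))

print(tokens)
print(frames)
```

The `action` argument is the whole difference: it is what makes the generated world respond to you, and remember what you did to it.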
Ejaaz:
Yeah, I think a lot of people seeing these demos just think, okay, this is just a video generation model, or it's kind of like a game engine, right? What's the difference between this and GTA 6? And the point is everything that goes into producing this thing. Typically, it's extremely expensive. How long has it been since the last GTA came out? It's been, what, a decade at this point? And every year they keep delaying the next one. Now you have a physical, or rather an AI, engine based on physical reality that can do this in a matter of minutes. Right now you can go on with a Google Ultra subscription, type in a prompt, and minutes later you have a 60-second interactive world that you can move around in and manipulate to your liking, which is insane. And with video models, remember, it's all predetermined, as you said.

Now, naturally, if you were a gaming company looking at this, having spent five to ten years building your next AAA game, you're going to be kind of scared. And actually, that's exactly how the stock market reacted. This is ugly. Look at this summary of the top gaming stocks and their reaction to Google releasing Genie 3. Unity, the premier game engine, dropped 30% in stock price. Roblox, obviously the number one game amongst Gen Z and older, dropped 13%, and some of the others dropped between 8% and 10%. This is Wall Street understanding that these models can now emulate any kind of game in a matter of seconds, and what that means for game development is really important. I read a wild stat in preparation for this episode, Josh. It was an official report on the state of gaming from the Game Developers Conference in 2026, and it said that over the last two years, 33% of game developers had been laid off, most of them in the last 12 months. And that's because of technology like this.
Josh:
Yeah. And as a lifelong Halo fanboy, I have to highlight a pretty interesting example that relates to this, which is Bungie, the company that made the famous video game Halo, which has lived on for 20 years. They started with a team of 40 people when they released that first game, and then ballooned to 1,300 people by 2023. And the reality is that you don't need that much; there's so much bloat in these companies. Shortly after, they laid off some employees, and I'm sure they're going to continue to do so. It doesn't take a huge team to build these things when it's just a matter of putting in the right prompts and iterating through something like Genie, a world model that can build any of this for you dynamically, in real time. It's so much more impressive, and I think the market reaction is warranted. Stocks probably should be going down, the same way SaaS stocks are, while the market and the gaming industry figure out how to re-evaluate this new toolset.
Ejaaz:
I saw a really funny meme online where someone said, we got GTA 6 before we got GTA 6, because someone had recreated what GTA 6 would probably look like in arguably better, or at least the same, quality as some of the demos for the actual game.
Josh:
Yeah, it's pretty impressive, but...
Ejaaz:
The truth of the matter is, world models aren't just a one-off novelty. They're going to be popping up everywhere. Right now we've got six or seven main startups and big companies building them: Google DeepMind with Genie, Runway, NVIDIA working on their own specifically for robotics, Roblox, which just announced their new world model, Tesla, which has been using world models to train autonomous driving for their vehicles, World Labs, and more. Which makes me want to talk about some of the other important use cases, because this isn't just a story about game engines, right, Josh? Or even Hollywood effects. There's so much more you can do with this that I want to touch upon.

The first use case I'm super excited about, and the video we're showing on the screen right now is not really related to this, but it kind of is, is robotics. Robotics is eventually where AI enters the physical world. Right now, all we're talking to is a chatbot. But making that jump from virtual software into the real world, where robots are going to do helpful and useful things for us, is going to take some kind of bridging model. And that is the world model. So why is it important? Well, robots are actually incredibly hard to train. Why? Because an LLM doesn't work for a robot; you can't just describe to it what it needs to do. You need to show it the world and help it understand the world. There are two ways to do that right now. One, you can stick a robot in a room, tell it to fold the laundry, let it try and fail a million times, and have it learn slowly and iteratively from that. Or two, the more scalable way: you can generate a simulated world that is physically accurate, a room with clothes that need to be folded, and then run it through a billion simulations at lightning speed. And Google is actually doing this. It created Genie 3, and it has an agent called SIMA, S-I-M-A, look it up, which they can place into this simulated environment and have run a million different scenarios. It's really meta. And it's the moment that made Demis's point for me: this is the missing step toward AGI, the one that was preventing robots from making that leap. And we finally have it.
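As a rough illustration of why the simulated route scales, here is a toy Python sketch. Everything in it is a hypothetical stand-in (nothing here is Genie 3's or SIMA's actual interface): a simulated environment can be reset and rolled out thousands of times faster and more cheaply than a physical robot, and the policy learns a little from every rollout.

```python
import random

class SimWorld:
    """Toy physically-plausible room with laundry to fold."""
    def reset(self):
        self.folded = 0
        return self.folded  # observation

    def step(self, action):
        # Reward the "fold" action; anything else wastes the step.
        if action == "fold":
            self.folded += 1
            return self.folded, 1.0
        return self.folded, 0.0

class Policy:
    """Toy policy: learns how often 'fold' pays off."""
    def __init__(self):
        self.fold_pref = 0.5

    def act(self):
        return "fold" if random.random() < self.fold_pref else "wander"

    def update(self, action, reward):
        # Nudge preference toward actions that earned reward.
        if action == "fold":
            self.fold_pref = min(1.0, self.fold_pref + 0.01 * reward)

world, policy = SimWorld(), Policy()
for _ in range(1000):          # thousands of episodes run in seconds
    world.reset()
    for _ in range(20):
        a = policy.act()
        _, r = world.step(a)
        policy.update(a, r)

print(f"learned fold preference: {policy.fold_pref:.2f}")
```

The point of the sketch is the loop count: a physical robot gets one slow, expensive rollout at a time, while a simulator hands the same policy a thousand.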
Josh:
Yeah. And it's funny, because I look at this and I can't help but think about Meta, and how they changed their name in anticipation of building a metaverse. This, to me, feels like the closest thing we've ever seen to one. Now imagine the video we're seeing on screen, where someone's walking around the woods with a GPS, was collaborative. Imagine two people could share the same reality, generated in real time; you could share the same spaces. This very much feels like that first step towards a metaverse. And then, if you can put a human in there to navigate and move around, surely AIs can watch how they move and learn from them. It creates this great training ground for, like you said, robotics; it creates training grounds for anything. Another is medical surgery, or driving, like we're going to talk about soon with Waymo and how you can train cars this way. I think it's just this unbelievable progress in understanding that feels like the missing piece towards embodied AI. To your point, it's not just an LLM anymore; it's much more than that. And these videos, we're looking at a Blockbuster demo right now, very nostalgic, very lifelike. And again, this is the worst it's ever going to be. One thing to note is that this model came out last year, last August or September, in the second half of the year, and only now are they actually allowing the public to use it. This is last year's technology. This isn't even what the newest version, which I'm sure they've been building on their new set of TPUs, is going to look like. It's going to get so good, so quickly.
Ejaaz:
It's super impressive. The point you made around medical surgery is actually one that most people probably don't think this will affect, but it 100% will. Surgery takes an incredibly precise set of skills. How long does the average surgeon train for? Probably their entire life, right? A decade's worth of experience at least. And having a robot that can do things precisely, with minimal error, well, it reminds me of autonomous driving. I remember when Tesla announced FSD, it was seen as this impossible feat, and everyone was kind of crapping on it. Now it is statistically better than the average human driver, right? So Tesla's proven this, Waymo is now doing it with their own world model, and I'm excited to see what other applications this gets involved in.
Josh:
Yeah, I want to do a quick run through the specs I wrote down here, just so everyone's aware of the limitations, because I thought this was interesting too. So it can generate these 3D worlds, but currently it's limited to 720p resolution, while the standard is 1080p, so it's just under what most people would call true HD. It's the first world model that allows for real-time interaction; we've spoken about this, previous versions had a lot of lag. And it's unique in that you can describe your own world and choose how to explore it. In one of the earlier examples you showed, you can fly through it or walk through it, and it understands the difference in how you navigate these worlds. As for what it can't do: it can't generate anything longer than 60 seconds, so after you've generated 60 seconds of gameplay or visuals or whatever, it stops. It can't quite render legible text inside the game engine, or rather the world engine, yet, so if you want it to write anything, it just can't. And it can't simulate real-world locations accurately. If you wanted to put yourself inside Times Square, it won't actually get Times Square right just yet, which feels a little disappointing, because they do have Google Maps, and I'm assuming it's just a matter of time. We have this example here of an F1 car driving through Times Square, but it's not quite accurate. It is not exactly how Times Square looks; it's a perception of Times Square. And I imagine there will be a world in which, you know how Microsoft Flight Simulator lets planes fly through a one-to-one replica of the world built from mapping data, I'm sure it's just a matter of time until they converge Genie with something like the Google Maps API. That'll be really amazing. And then there were other things in the August 2025 demo. This one, yeah, here's a constraint: a person walks up to a mirror, and when they look the other way, the mirror, the reflection...
Ejaaz:
Oh, there's like two of them. Like that, look, she pops out of the mirror now.
Josh:
Oh God, that's a little cursed. Yeah, so there's stuff they need to figure out still. But for the most part, it's a pretty killer model.
Ejaaz:
Yeah, and listen, this thing will get better. This is version one of this technology and it's already this good. You can imagine what it's going to be like at version three, four, five and beyond. And the iteration cycles between these model releases are getting so much quicker, right? So we'll probably have a much better model this year; it wouldn't surprise me. That being said, this stuff is also incredibly expensive, more expensive than I was expecting. We have a tweet here from Mark where he says Genie by Google DeepMind runs on four H100 GPUs, leaked from an internal presentation. That's per instance. So every 60 seconds of video you want to generate requires, I don't know if it's the total capacity, but a good chunk of four H100 GPUs. Now, the average cost of each of these GPUs is around $25,000 to $35,000. So it's not cheap: even if it feels cheap for you, it's costing Google a lot of money to run. But, you know, with all the scaling laws, I assume this cost will come down over time. One thing is pretty certain at this point, though: compute is a resource we will have a never-ending need for to power any of this, world models, ChatGPT prompts, whatever you can think of. So it's a really prescient point: if we now want to train physical, embodied AI with robots, and we haven't even got V1 of that yet, we're just going to need a ton more compute. That was the other thing I thought of. This is just going to be a very expensive game to play, pun intended.
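For what it's worth, here's the back-of-envelope math on the numbers quoted above. The per-instance GPU count comes from the tweet, not from Google, so treat it as unverified:

```python
# Hardware cost per Genie instance, using only the figures cited
# in the episode: four H100s at $25k-$35k each (leaked/unverified).
gpus_per_instance = 4
gpu_cost_low, gpu_cost_high = 25_000, 35_000

hardware_low = gpus_per_instance * gpu_cost_low    # $100,000
hardware_high = gpus_per_instance * gpu_cost_high  # $140,000
print(f"hardware per instance: ${hardware_low:,} - ${hardware_high:,}")
```

So roughly $100,000 to $140,000 of GPU hardware tied up per concurrent 60-second world, before power, cooling, or the rest of the serving stack.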
Josh:
Yeah. I mean, in addition to the consumer use cases like what we're seeing, the commercial use cases are very real. Like you said, embodied AI is a huge one. And just last week, Waymo announced the Waymo World Model, which is built on top of Genie 3 but adapted for something very specific: generating photorealistic driving simulations. What that means is it's able to create these LiDAR point clouds and all the camera outputs, as we're seeing on screen, and it can create a simulated reality that the AI training on it can't tell apart from the real thing. If you show the self-driving model a representation created by Genie 3 or by the real world, it doesn't know the difference. So they've essentially unlocked a data stream that is capped only by the number of GPUs they can use to render these scenarios for the model. What we're seeing here, and it's funny, is a weird anomaly: there is an elephant in the middle of the road. I can't imagine there are many times an elephant is in the middle of a road, and the model now understands not only what an elephant is, but roughly how it behaves, the size of it, and that you should not hit it. I think it's an interesting example of how powerful this can be, because a lot of the time, the constraint with physical, embodied AI is going to be the dataset: giving it examples that allow it to get better at navigating the real world. And now, using this world model, at least in the case of full self-driving, Waymo is able to do that at scale in situations where they previously didn't have a whole lot of data.
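The shape of that idea, reduced to a toy sketch (this is not Waymo's actual pipeline; every class and name below is hypothetical): the world model acts as a data generator for rare scenarios the fleet almost never records, and the driving model consumes the rendered samples as if they were real sensor logs.

```python
import random

# Rare events a fleet might record once a decade, if ever.
RARE_SCENARIOS = ["elephant_on_road", "tornado", "snow_in_desert_city"]

class WorldModelSim:
    def render(self, scenario):
        # Stand-in for rendering camera frames plus a lidar point
        # cloud for a described scenario.
        return {"scenario": scenario, "frames": 24, "lidar": True}

class DrivingModel:
    def __init__(self):
        self.samples_seen = 0

    def train_step(self, sample):
        # If simulated sensors are indistinguishable from real ones,
        # the model neither knows nor cares which samples were
        # rendered and which were recorded on the road.
        self.samples_seen += 1

sim, model = WorldModelSim(), DrivingModel()
for _ in range(100):
    scenario = random.choice(RARE_SCENARIOS)
    model.train_step(sim.render(scenario))

print(f"trained on {model.samples_seen} simulated rare-event samples")
```

The bottleneck in that loop is exactly what the episode says it is: how many GPUs you have to run `render` on.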
Ejaaz:
Yeah. So if I had to summarize the impact of world models this year in particular, it's that they're going to help us make the jump from digital AI to physical AI. And that's going to come in the form of robots that help with really precise surgeries, robots that handle manual automation in factories, and robots that drive us from point A to point B with, hopefully, something approaching a 0% error rate. I think this is the next step toward that AGI-level type of experience, because AI models just don't do a good job today of interacting with reality. They describe reality really well; an LLM can tell me how a glass might fall off my table and crash to the ground, but it can't simulate and emulate that. As humans, we take in everything through our eyes, our ears, and a million other senses. So if we can get an AI model that can do all of those things, we can take that next magnificent leap toward a better, AGI-like experience. And I think world models are that step.
Josh:
Yeah, I agree. I mean, think about what we described today. Google built a foundation model that understands how the world works. Waymo is using it to train their fully autonomous cars, and every AI company in the world that's building robots needs it for training. It's like every road leads to world models, and it feels like a paradigm shift. In a way, world models are to AGI what the Transformer was to LLMs: the Transformer paper in 2017 made everything in the world of LLMs possible, and now that we have Genie 3, it feels like it unlocks a whole new world of possibilities that were previously unavailable, just in time to help train the robots that are going to be coming into this world over the next 12 months. So it's a really exciting development. And again, I think it's important to note that this came out in August of last year; they've only just recently allowed the public to start using and engaging with it, and that's why you're seeing companies like Waymo adopting it. But imagine the progress they've made since last August, and how much better this is going to get, so quickly. It's something we'll be paying very close attention to as we continue to navigate and explore the frontiers of AI and technology and all the crazy stuff that happens every week.

That wraps up this episode. Thank you, as always, for listening. We have some exciting news: on Apple Podcasts, we just cracked the top 100 technology podcasts, at number 99 and counting. Thank you. A lot of that is thanks to the people who went and wrote a nice review; we got almost 50 new reviews last week, and that was so generous. If you have not written a review, let us know what you think. That's one of the best ways to improve our chances of climbing the charts, second only to sharing these episodes with your friends. If you have a friend who is into self-driving cars or gaming or anything that could possibly have ties to today's episode, send them the episode and see if they enjoy it as much as we did recording it. Thank you, as always, for watching. Ejaaz, any final parting notes before we sign off?
Ejaaz:
Yeah. If any of you listening have a Google Ultra subscription, I really want you to try this model out. I'm curious what your feedback is. Is it as cool as the videos and demos that we're seeing? Josh and I don't have access to it right now, but if any of you do, maybe there's a better use case for this that you find. Share it with us. Let us know in the comments, DM us, we're on every social media platform, and we'll hear from you soon. But until then, we'll see you on the next episode.
Josh:
See you in the next one.
Ejaaz:
Peace.
