World Models: What are They, and is Google's Genie 3 AGI?

Josh:
Google just released something that I think may be more important to creating AGI than ChatGPT itself was. They released an AI that doesn't predict text, it simulates reality. You describe a world and it builds it in front of you in real time, 720p, 24 frames a second, just like a video, except there's something a little bit different. You walk through it, you drive through it, the world responds to you. And it was so good that within one week, Waymo, a self-driving car company worth $126 billion, took that technology and built the most advanced driving simulator ever created. They can now simulate things that the cars have never seen, like tornadoes, or elephants on a highway, or snowstorms in cities that never get snow. So today we're going to break down what world models actually are, why some of the biggest companies on earth are racing to build them, and why this might be the most important shift in AI since the Transformer.

Ejaaz:
Yeah, Demis Hassabis, the head of Google DeepMind and Google's AI efforts, said something a few weeks ago that I haven't been able to get off my mind, which is that world models will be the single most important thing Google focuses on this year to take us to AGI. And all I could think was, this is just like a video generation model, right? And then, as we went down into the weeds prepping for this episode, I realized that it's so much more. But I think we have some visual examples that we want to show the audience first, to demonstrate what we mean.

Josh:
Oh, this one is so cool. So what we're seeing on screen is two versions of the video example. In one, a child builds a box and a little toy, a walking little piece of cardboard. They drop it into Genie 3, and it puts it into your own worlds, but in a way that's fully controllable, like a video game. So imagine you have a favorite stuffed animal or a doll, or, in the second example, a cat that lives in your apartment. You can scan in not only your apartment but your cat, and then you can walk around your apartment as if you are your cat, engaging in this real world's kind of pseudo-reality. It's really bizarre watching this work. And this other example we're seeing now proves that you can actually engage with the world around you. What we're seeing is someone who looks like they're walking through a neighborhood, but they walk up to a car, and you can actually open the door and look inside the car, and the hand kind of reaches out and grabs it. And I think what's so amazing about this model is that it's different from any other model we've ever seen, because it actually has this deep understanding of reality. It merges base reality with this virtual reality. One, it shows how smart it is and how well it understands physics, but two, I think there are a lot of downstream implications for training better AIs on top of this, which Genie 3 is going to enable.

Ejaaz:
There are a ton, actually. And I was trying to think about how to explicitly explain the difference between an LLM, which you and I and our listeners know very well, and a world model. And I think it's the following. Think of an LLM like a nerd who has never left the library. He has read every single book. He understands the world in theory, but he's never stepped out of the library, so he's never experienced the world for what it actually is. He understands how grass looks and what it feels like, but he's never touched it, felt it, seen the way it changes form when he brushes his hand through it, right? A world model, on the other hand, is a lived and experienced version of AI. It understands the physics and simulated reality of the world. It understands how water flows, how people can jump through the air. And it understands the physical reality of us humans as we experience it, right? And that's the core difference that we see here. Now, Genie 3, which is Google's flagship world model, can basically do this, but at 24 frames a second at 720p, which is the big breakthrough that everyone's freaking out about now. Now you can just go from a prompt to a simulated reality that it creates in real time, and that is physically accurate. And that's so cool.

Josh:
Everyone knows that ChatGPT predicts the next word, but world models predict the next frame of reality. They simulate how these environments evolve, how your actions change them, and the effects that your physical actions have on this digital world. And what we're seeing here is this really cool graphic that you had Claude build, Ejaaz, that describes the difference between the two. If you look at the table, it shows a few of the attributes: whether it's interactive, whether it understands physics, and what it's generated from. And what you'll find is that the world model is unique in every way, because it reacts to the world that it simulates around it. With an AI video, when you're watching a Sora video, it's all been predetermined. You could put on your goggles and watch it as if it's virtual reality, but it's all been predetermined by code, and it is fixed. With a world model, it's all dynamic, and it's generated on a frame-by-frame basis. So now, any time you interact with something, let's say you put some paint on a wall, the paint actually sticks and stays. The world model remembers it, and it understands that the paint drips slowly down the wall, because that's how paint works in the real world. And I think game engines probably feel threatened by this, because it does all the things game engines do, except it's infinite and it's fully customizable, like that example with the cat, where you can take a photo of your cat in your apartment, it'll scan it in, and you can actually walk around your apartment and jump on your couches as if you're your cat. That is a unique and novel thing, only possible through world models. And that's what makes them really unique relative to anything else that we have.
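
[Editor's note: a minimal sketch of the distinction Josh is drawing here, next-token prediction versus action-conditioned, frame-by-frame generation with persistent memory. Everything below is illustrative; the `WorldModel` class and its methods are hypothetical stand-ins, not Genie 3's actual API.]

```python
# Illustrative contrast: next-token prediction vs. action-conditioned
# next-frame prediction. Nothing here is Genie 3's real interface.
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Hypothetical autoregressive world model."""
    history: list = field(default_factory=list)  # persistent memory of past actions/frames

    def next_frame(self, action: str) -> str:
        """Predict the next frame conditioned on the FULL history plus the
        user's action. Persistence (paint staying on a wall) falls out of
        conditioning on everything generated so far."""
        frame = f"frame {len(self.history)} after action={action!r}"
        self.history.append((action, frame))
        return frame


def next_token(prompt: str) -> str:
    """An LLM, by contrast, predicts the next word given text alone."""
    return prompt.split()[-1]  # toy stand-in for a real language model


world = WorldModel()
for act in ["walk forward", "paint the wall", "turn around", "turn back"]:
    print(world.next_frame(act))
# When the camera turns back, the model still conditions on the
# "paint the wall" step, so the paint is remembered frame to frame.
print(next_token("the quick brown"))  # an LLM just continues the text
```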

Ejaaz:
Yeah, I think a lot of people seeing these demos just think, okay, this is just a video generation model, or it's kind of like a game engine, right? What's the difference between this and GTA 6? And the point is everything that goes into producing this thing. Typically, it's extremely expensive. How long has it been since the last GTA came out? It's been, what, a decade now at this point? And every year they keep on delaying it. And now you have a physical, or rather an AI engine based on physical reality, that can do this in a matter of minutes. Right now you can go on, type in a prompt with a Google Ultra subscription, and minutes later you have a 60-second interactive world that you can interact with, move around in, and manipulate to your liking, which is insane. And with video models, remember, it's all predetermined, as you said. Now, naturally, if you were a gaming company looking at this, having spent five to ten years building your next AAA game, you're going to be kind of scared. And actually, that's exactly how the stock market reacted. This is ugly. Look at this summary of the top gaming stocks and their reaction to Google releasing Genie 3. Unity, which is the prime-time gaming engine, dropped 30% in stock price. Roblox, obviously the number one game amongst Gen Z and older, dropped 13%, and some of the others dropped between 8% and 10%. Now, this is the market reacting to Wall Street understanding that these models can now emulate any kind of game in a matter of seconds, and how that affects game development is really important. I read this wild stat in preparation for this episode, Josh. It was an official report on the state of gaming from the Game Developers Conference in 2026. And they said that over the last two years, 33% of game developers had been laid off, most of them in the last 12 months. And that's because of technology like this.

Josh:
Yeah. And as a Halo fanboy for life, I have to highlight a pretty interesting example that relates to this, which is Bungie, the company that made the famous video game Halo, which has lived on for 20 years. They started with a team of 40 people when they released that first game, and then they ballooned to 1,300 people in 2023. And the reality is that you don't need that much. There's so much bloat in these companies. Shortly after, they laid off some employees, and I'm sure they're going to continue to do so. But the reality is that it doesn't take a huge team to build these things when it's just a matter of putting in the right prompts and iterating through something like Genie, a world model that can build any of this for you dynamically, in real time. It's so much more impressive, and I think the market reaction to it is warranted. Stocks probably should be going down, the same way SaaS companies' stocks are, while the market and the gaming industry figure out how to re-evaluate this new tool set that exists.

Ejaaz:
I saw a really funny meme online where someone said, we got GTA 6 before we got GTA 6, because someone had recreated what GTA 6 would probably look like, in arguably better, or the same, quality as some of the demos for the actual game.

Josh:
Yeah, it's pretty impressive.

Ejaaz:
The truth of the matter is, world models aren't just a one-off novelty. They're going to be popping up everywhere. Right now we've got six or seven main startups and big companies building them: Google DeepMind with Genie, Runway, NVIDIA working on their own for robotics specifically, Roblox, which just announced its new world model, Tesla, which has been using world models to train the autonomous driving models for Tesla vehicles, World Labs, and more. Which now makes me want to talk about some of the other important use cases, because this isn't just a story about game engines, right, Josh? Or even Hollywood effects. There's so much more you can do with this that I want to touch on. The first use case that I'm super excited about, and the video we're showing on the screen right now is not really related to this, but it kind of is, is robotics. Now, robotics is eventually where AI enters the physical world. Right now, all we're talking to is a chatbot. But to make that jump from virtual software into the real world, where robots are going to do helpful and useful things for us, is going to take some kind of bridging model. And that is the world model. So why is it important? Well, robots are actually incredibly hard to train. Why? Because an LLM doesn't work for a robot. You can't just describe to it what it needs to do. You need to show it the world and help it understand the world. And there are two ways to do that right now. One, you can stick a robot in a room and tell it to fold the laundry, and it can try a million times, fail a million times, and learn iteratively and slowly from that. Or two, the more scalable way: you can generate a simulated world that is physically accurate, a room with clothes that need to be folded, and then just run it through a billion simulations at lightning speed. And Google is actually doing this. It created Genie 3, and it has an agent called SIMA, S-I-M-A, look it up, a robotics model they can basically just place into this simulated environment and get to run a million different scenarios. And it's really meta. It's the moment that made the point Demis was getting at click for me: this is the massive step toward AGI. This is what was preventing robots from making that leap, and we finally have it.
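
[Editor's note: a minimal sketch of the second, scalable approach Ejaaz describes, training a policy inside a generated environment instead of on physical hardware. The environment and the random placeholder policy below are hypothetical stand-ins, not Google's SIMA or Genie 3 APIs.]

```python
# Hypothetical sketch: train a robot policy against a simulated,
# world-model-generated environment instead of a real robot.
import random


class SimulatedLaundryRoom:
    """Stand-in for a world-model-generated training environment."""

    def reset(self) -> int:
        self.folded = 0
        return self.folded  # observation: items folded so far

    def step(self, action: str):
        # A real world model would render physically accurate outcomes;
        # here we fake the physics with a success probability.
        success = action == "fold" and random.random() < 0.5
        self.folded += int(success)
        reward = 1.0 if success else 0.0
        done = self.folded >= 5  # episode ends when the laundry is folded
        return self.folded, reward, done


def train(episodes: int) -> float:
    """Run huge numbers of cheap simulated rollouts; failure costs
    nothing here, unlike crashing a physical robot a million times."""
    env, total = SimulatedLaundryRoom(), 0.0
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            action = random.choice(["fold", "wave arms"])  # placeholder policy
            _, reward, done = env.step(action)
            total += reward
    return total / episodes


if __name__ == "__main__":
    print(f"avg reward per episode: {train(1000):.2f}")
```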

Josh:
Yeah. And it's funny, because I look at this and I can't help but think about Meta, and how they changed their name to Meta in anticipation of building a metaverse. This, to me, feels like the closest thing we've ever seen to a metaverse. Now imagine this video we're seeing on screen, where someone's walking through the woods with a GPS, was collaborative. Imagine two people could share the same reality, and it could be generated in real time. You could share the same spaces. I think this very much feels like that first step toward a metaverse. And then, if you can put a human in there to navigate and move around, then surely AIs can watch how they move. They can learn from them. It creates this great training ground for, like you said, robotics. It creates training grounds for anything. Another is medical surgeries, or driving, like we're going to talk about soon with Waymo and how you can train cars to do this. I think it's just this unbelievable progress in understanding that feels like the missing piece toward embodied AI. And to your point, it's not just an LLM anymore, it's much more than that. And these videos, we're looking at a Blockbuster demo right now, very nostalgic, very lifelike. And again, this is the worst it's ever going to be. One thing to note is that this model came out last year, what was it, last August or September, in the later half of the year, and only now are they actually allowing the public to use it. So this is last year's technology. This isn't even what the newest version is going to look like, which I'm sure they've been building on their new set of TPUs. It's going to get so good, so quickly.

Ejaaz:
It's super impressive. The point you made around medical surgeries is actually one that most people probably don't think this will affect, but it 100% will. Surgery is an incredibly precise set of skills. How long does the average surgeon train for? Probably their entire life, right? A decade's worth of experience. And to have a robot that can precisely do things and make minimal errors, well, it reminds me of autonomous driving. I remember when Tesla announced FSD, it was seen as this impossible feat, and everyone was kind of crapping on it. And now it is statistically better than the average human driver, right? So Tesla's proven this. Waymo is now doing it with their own world model. And I'm excited to see what other applications this gets involved in.

Josh:
Yeah, I want to do a quick run through the specs I wrote down here, just so everyone's aware of the limitations, because I thought this was interesting too. So it can generate these 3D worlds. Currently, it's limited to 720p resolution, while the standard is 1080p, so it's just under what a lot of people would refer to as true HD. It's the first world model that allows for real-time interaction; we've spoken about this, previous versions had a lot of lag. And it's unique in that you can describe your own world and choose how to explore it. In one of the earlier examples you showed, you can fly through it or walk through it, and it understands the difference in how you navigate these worlds. What it can't do: it can't generate anything longer than 60 seconds, so after you've generated 60 seconds of gameplay or visuals or whatever, it stops. It can't quite render legible text inside the game engine, or the world engine, yet, so if you want it to type anything, it just can't. And it can't simulate real-world locations accurately. So if you wanted to put yourself inside of Times Square, it won't actually get Times Square right just yet, which feels a little disappointing, because they do have Google Maps, and I'm assuming it's just a matter of time. We have this example here of an F1 car driving through Times Square, but it's not quite accurate. It is not exactly how Times Square looks; it's a perception of Times Square. And I imagine there will be a world in which, you know how Microsoft Flight Simulator lets planes fly through a one-to-one replica of the world built from real map data, I'm sure it's just a matter of time until they converge those two. That'll be really amazing. And then there were other things that were in the August 2025 demo. This one, yeah, here's a constraint: a person walks up to a mirror, and they look the other way, and the mirror, the reflection...

Ejaaz:
Oh, there's like two of them. Like that, look, she pops out of the mirror now.

Josh:
Oh God, that's a little cursed. Yeah, so there's stuff they need to figure out still. But for the most part, it's a pretty killer model.

Ejaaz:
Yeah, and listen, this thing will get better. This is version one of this technology, and it's already this good. You can imagine what this is going to be like at version three, four, five and beyond. And the iteration cycles between these model releases are getting so much quicker, right? So we're probably going to have a much better model this year; it wouldn't surprise me. That being said, this stuff is also incredibly expensive, more expensive than I was expecting. We have a tweet here from Mark where he says Genie by Google DeepMind runs on four H100 GPUs, leaked from an internal presentation. That's per instance. So every 60 seconds of video that you want to generate requires, I don't know if it's the total capacity, but a good chunk of four H100 GPUs. Now, the average cost of each of these GPUs is around $25,000 to $35,000. So it's not cheap. Even if it feels free to you, it's costing Google a lot of money to run. But, you know, with all the scaling laws, I assume this cost will come down over time. One thing is pretty certain at this point, though: compute is a resource we will never stop needing to power any of this, world models, ChatGPT prompts, whatever you can think of. So it's a really prescient point: if we now want to train physical, embodied AI with robots, and we haven't even got V1 of that yet, we're just going to need a ton more compute. That was the other thing I thought of. This is just going to be a very expensive game to play, pun intended.
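
[Editor's note: rough back-of-the-envelope math on the figures Ejaaz quotes. The per-instance GPU count is the leaked claim discussed above; the cloud rental rate is our own illustrative assumption, not from the episode.]

```python
# Back-of-the-envelope hardware cost per Genie instance, using the
# numbers quoted in the episode. All assumptions are illustrative.

GPUS_PER_INSTANCE = 4    # leaked claim: four H100s per instance
GPU_PRICE_LOW = 25_000   # USD, low-end H100 estimate from the episode
GPU_PRICE_HIGH = 35_000  # USD, high-end estimate

low = GPUS_PER_INSTANCE * GPU_PRICE_LOW     # $100,000
high = GPUS_PER_INSTANCE * GPU_PRICE_HIGH   # $140,000
print(f"hardware per concurrent user: ${low:,} - ${high:,}")

# Assumed (not from the episode): if those GPUs rent for ~$2/hour each
# in the cloud, a single 60-second generation costs on the order of:
rate_per_gpu_hour = 2.00
session_cost = GPUS_PER_INSTANCE * rate_per_gpu_hour * (60 / 3600)
print(f"approx. rental cost per 60s session: ${session_cost:.3f}")
```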

Josh:
Yeah. I mean, in addition to the consumer use cases like what we're seeing, the commercial use cases are very real. Like you said, embodied AI is a huge perk. And just last week, Waymo announced the Waymo World Model, which is built on top of Genie 3 but adapted for something very specific: generating photorealistic driving simulations. What that means is it's able to create these LiDAR point clouds, and it can create all the camera outputs, like we're seeing on screen. It can create a simulated reality that the AI training on it can't tell apart from the real thing. If you show the self-driving model a representation created by Genie 3 or one from the real world, it doesn't know the difference. So they've essentially unlocked a data stream that is capped only by the number of GPUs they can use to render these scenarios for the model. What we're seeing here, and it's funny, is a weird anomaly: there is an elephant in the middle of the road. I can't imagine there are many times an elephant is in the middle of a road, and the model now understands not only what an elephant is, but how it works, the size of it, and that you should not hit it. I think it's an interesting example of how powerful this can be, because a lot of the time, the constraint with this physical, embodied AI is going to be the data set. It's going to be giving it examples that allow it to get better at navigating the real world. And now, using this world model, at least in the case of full self-driving, Waymo is able to do this at scale in places where they previously haven't had a whole lot of data.
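
[Editor's note: a minimal sketch of the training pattern Josh describes, mixing real fleet logs with world-model-generated rare scenarios so the driving model sees events (elephants, tornadoes) that real fleets almost never record. The function names are hypothetical, not Waymo's actual pipeline.]

```python
# Hypothetical sketch of augmenting a driving dataset with
# world-model-generated rare scenarios. Not Waymo's real pipeline.
import random

RARE_SCENARIOS = [
    "elephant in the middle of the road",
    "tornado crossing the highway",
    "snowstorm in a city that never gets snow",
]


def real_log_sample() -> dict:
    """Stand-in for a frame logged by the real vehicle fleet."""
    return {"source": "fleet", "scene": "ordinary traffic"}


def simulated_sample(prompt: str) -> dict:
    """Stand-in for a world-model render: camera frames plus LiDAR points
    the downstream model cannot distinguish from fleet data."""
    return {"source": "world_model", "scene": prompt}


def training_batch(size: int, rare_fraction: float = 0.1) -> list[dict]:
    """Mix mostly real data with a controlled dose of simulated edge cases.
    The simulated share is capped only by available render GPUs, not by
    what the fleet happens to encounter on the road."""
    batch = []
    for _ in range(size):
        if random.random() < rare_fraction:
            batch.append(simulated_sample(random.choice(RARE_SCENARIOS)))
        else:
            batch.append(real_log_sample())
    return batch


simulated = sum(s["source"] == "world_model" for s in training_batch(1000))
print(f"{simulated}/1000 samples are simulated rare scenarios")
```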

Ejaaz:
Yeah. So if I had to summarize the impact of world models this year in particular, they're going to help us make the jump from digital AI to physical AI. And that's going to come in the form of robots that help with really precise surgeries, robots that help with manual automation in factories, and robots that drive us from point A to point B with a 0% error rate, or hopefully we eventually get down to a rounding error. I think this is the next step toward that kind of AGI-level experience, because AI models just don't do a good job today of interacting with reality. They describe reality really well. An LLM can tell me how a glass might fall off my table and crash to the ground, but it can't simulate and emulate that. And as humans, we take in everything through our eyes, ears, and a million other senses. So if we can get an AI model that can achieve all of those things, we can take that next magnificent leap to a better AGI-like experience. And I think world models are that step.

Josh:
Yeah, I agree. I mean, think about what we described today. Google built a foundation model that understands how the world works. Now Waymo is using it to train their fully autonomous cars, and every AI company in the world that's building robots needs it for training. It's like every road leads to world models, and it feels like a paradigm-shifting thing. In a way, world models are to AGI what the Transformer was to LLMs. The Transformer paper in 2017 made everything in the world of LLMs possible, and now that we have Genie 3, it feels like it unlocks this whole new world of possibilities that were previously not available, just in time to help train the robotics that are going to be coming into this world over the next 12 months. So it's a really exciting development. And again, I think it's important to note that this came out in August of last year. They've only just recently allowed the public to start using and engaging with it, and that's why you're seeing companies like Waymo adopting it. But imagine the progress they've made since last August, and how much better this is going to get, so quickly. It's something we'll be paying very close attention to as we continue to navigate and explore the frontiers of AI and technology and all the crazy stuff that happens every week. That wraps up this episode. Thank you, as always, for listening. We have some exciting news: on Apple Podcasts, we just cracked the top 100 technology podcasts, number 99 and counting. Thank you. A lot of that is in part thanks to the people who went and wrote a nice review. We got almost 50 new reviews last week, so thank you for that. That was so generous. If you have not written a review, let us know what you think. That's one of the best ways to improve our chances of climbing the charts, second to sharing these episodes with your friends if you enjoyed them. If you have a friend who is into full self-driving cars, or gaming, or anything that could have ties to today's episode, send them the episode. See if they like it, see if they enjoy it as much as we did recording it. Thank you, as always, for watching. Ejaaz, any final parting notes before we sign off?

Ejaaz:
Yeah. If any of you listening have a Google Ultra subscription, I really want you to try this model out. I'm curious what your feedback is. Is it as cool as the videos and demos that we're seeing? Josh and I don't have access to it right now, but if any of you do, maybe there's a better use case for this that you find. Share it with us. Let us know in the comments, DM us, we're on every social media platform, and we'll hear from you soon. But until then, we'll see you on the next episode.

Josh:
See you in the next one.

Ejaaz:
Peace.
