The Coding Model Wars: Claude Opus 4.6 vs GPT-5.3 Codex

Ejaaz:
48 hours ago, Anthropic dropped Claude Opus 4.6, the world's most powerful AI model.

Ejaaz:
And literally 20 minutes later, OpenAI dropped Codex 5.3, which is not only

Ejaaz:
better, but also built itself.

Ejaaz:
Now, to say both of these models are powerful would literally be the understatement of the century.

Ejaaz:
By the time I'd eaten breakfast yesterday, one of the models had discovered

Ejaaz:
500 security flaws, which no one else had discovered before.

Ejaaz:
And by lunchtime, a bunch of software stocks were down hundreds of billions

Ejaaz:
of dollars out of fear that these models would replace entire teams.

Ejaaz:
And it's actually already happened. These models can replace a team of 50 software

Ejaaz:
engineers, rebuild Pokemon from scratch, and so much more.

Ejaaz:
And in this episode, we're going to be doing a live demo side by side to show

Ejaaz:
you which model is the best.

Josh:
Yeah, this is pretty cool. I wanted to spend a lot of time this episode kind

Josh:
of introducing people to these models, what they could do, how they work through

Josh:
demos that we're going to perform ourselves.

Josh:
These are definitely two frontier models, but I think more importantly, they're

Josh:
frontier coding models. And when people hear that, I

Josh:
think a lot of them get turned away because it seems like this complicated

Josh:
thing, like you need to be a developer in order to use them. And we

Josh:
are here to tell you that is not the case. Speaking as

Josh:
one non-technical person to another: I fed this

Josh:
model a prompt, I fed it some assets, and

Josh:
then I pressed play. And what I got is a

Josh:
side-scrolling game, which was exactly what I asked for. So on the screen now,

Josh:
you're seeing the one-shot prompt that I fed this model to ask it to create a side

Josh:
scroller like Mario that we can actually play. So it has coins, and I

Josh:
don't think the gravity quite works. What you're seeing is that it understands

Josh:
physics, it is able to generate graphics, and it plays like a pretty solid side

Josh:
scroller. And I created this in five minutes

Josh:
with one prompt, and it actually works.

Ejaaz:
What was the prompt that you used, Josh?

Josh:
Yeah, so I'll pause playing this game to

Josh:
actually show you the prompt. It was very simple. It was this

Josh:
one paragraph: I want you to make a game. You can

Josh:
use Python or C++, whatever you find the most convenient. A 2D

Josh:
platformer that closely resembles Super Mario. Use the

Josh:
attached background image and sprites found in the

Josh:
asset folder. Take into account that the sprites don't come with transparent backgrounds

Josh:
but pink ones, so you need to filter the background. And for those who are

Josh:
watching, you can actually see the sprites on my screen. They were

Josh:
just a series of assets, and there was no context given as

Josh:
to what each one of them was. But the model reasoned through it, it removed

Josh:
the background, and it actually generated a pretty good representation of

Josh:
that. Now, this was built one-shot on Codex, which

Josh:
is the new OpenAI Mac application that just released this

Josh:
week. And I wanted to compare it to Claude.

Josh:
So I have another instance here on the screen with Claude. This is using Opus

Josh:
4.6, the newest frontier model that they just released this week. And I want to

Josh:
do an exact one-to-one comparison. So I'm gonna launch the same exact prompt.

Josh:
We're gonna have that cook on Codex, or rather, we're gonna have that cook in Claude

Josh:
Code. And in the meantime, maybe we can talk more about what

Josh:
these models do and how they work.
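
Editor's note:
The prompt's instruction to filter the pink sprite backgrounds describes what is commonly called color keying: every pixel close to a chosen key color is made transparent. The game code the models generated is not shown in the episode, so the following is only a minimal sketch of the idea in Python. The function name `key_out_background`, the key color (pure magenta) and the tolerance value are all assumptions made for illustration.

```python
# Hypothetical illustration of color keying; not the code either model produced.
# The key color (255, 0, 255) and the tolerance of 30 are assumed values.

def key_out_background(pixels, key=(255, 0, 255), tolerance=30):
    """Turn rows of (r, g, b) pixels into (r, g, b, a), hiding key-colored pixels."""
    result = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            # A pixel is treated as background when every channel
            # is within `tolerance` of the key color.
            is_background = (
                abs(r - key[0]) <= tolerance
                and abs(g - key[1]) <= tolerance
                and abs(b - key[2]) <= tolerance
            )
            # Alpha 0 is fully transparent, 255 is fully opaque.
            new_row.append((r, g, b, 0 if is_background else 255))
        result.append(new_row)
    return result


# A tiny 2x2 "sprite": two pinkish background pixels and two foreground pixels.
sprite = [
    [(255, 0, 255), (200, 30, 30)],
    [(250, 5, 250), (255, 255, 255)],
]
keyed = key_out_background(sprite)
```

A real game would more likely load the image with a graphics library and apply the same per-pixel test there; the tolerance exists because exported sprites often contain slightly off-key pinks around their edges.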

Ejaaz:
Well, before we do that, actually, as you set this game up, I ran it on Claude Opus

Ejaaz:
4.6 as well, but with a slight twist.

Josh:
Okay, let's see your output. What do we have?

Ejaaz:
Okay, I don't know if you can see my screen,

Ejaaz:
but it is the exact game that you just created. But I don't know if those characters

Ejaaz:
look kind of familiar to you. We have the hero protagonist character, which

Ejaaz:
is my beautiful face and my beautiful person, Ejaaz. And we have, who's

Ejaaz:
this enemy over here that looks a lot like the bear guy?

Ejaaz:
And listen, we can double jump here, Josh. And I think, yep, I can crush you every time. I mean,

Ejaaz:
jokes aside, this is insane. This took me around three minutes to build end-to-end.

Ejaaz:
I used the exact same prompt that you gave me.

Ejaaz:
And we didn't have sprites ready-made of ourselves, right?

Ejaaz:
We didn't have like cartoon images of ourselves. So I uploaded an image that

Ejaaz:
we had taken, I don't know, like six months ago and said, hey,

Ejaaz:
can you make game avatars out of this?

Ejaaz:
It did it in 20 seconds. And then I said, could you add these to the game and

Ejaaz:
replace the enemy with Josh and the protagonist with Ejaaz? And it did it in a minute.

Josh:
So here we go. That's pretty amazing. And these are really, these are just using

Josh:
standard desktop applications. So what you're using right here,

Josh:
this was done in Claude Code, right?

Josh:
You just went onto Claude, the Mac app. You downloaded it. You put in the prompt.

Josh:
You shared some assets. And now it built this amazing game in one single prompt.

Josh:
And we're actually going to experiment further in this episode where we're going

Josh:
to create a trading room that does actual real-time stock analysis.

Josh:
So as I'm curating the prompts and as we're getting ready for that second demo,

Josh:
maybe we could walk through what makes these models so exceptional.

Ejaaz:
Yeah, well, you might actually notice the first difference on screen right now.

Ejaaz:
If you notice, if you look closely, my avatar is kind of glitching out, right?

Ejaaz:
And if you compare it to your Codex game that you just coded up,

Ejaaz:
there's no glitches. It runs super smoothly.

Ejaaz:
And the main takeaway here is Codex 5.3 is a superior coding model to Anthropic.

Ejaaz:
And that's a sentence I never thought I would say, at least for the next couple

Ejaaz:
of years, because Anthropic has held that prestige and title for so long.

Ejaaz:
But since Code Red was initiated at OpenAI around three months ago,

Ejaaz:
Sam has devoted pretty much all his resources towards building the best coding model.

Ejaaz:
And the benchmarks don't lie. It is a full 12 points on the software engineering

Ejaaz:
benchmark ahead of Claude Opus 4.6.

Josh:
That's a pretty significant difference.

Ejaaz:
So I've actually pulled up a more general comparison between the two models here.

Ejaaz:
And it summarizes it really well. So if we look at Claude's model,

Ejaaz:
Opus 4.6, what's good about it?

Ejaaz:
Well, they've 5x'd the context window.

Ejaaz:
So it's gone up to a million tokens or rather characters that you can put in

Ejaaz:
a single prompt, which if you want to understand how powerful this is,

Ejaaz:
you can just put way more information into your initial prompt.

Ejaaz:
It has much better context and memory. So you can end up cooking up much better

Ejaaz:
products overall, which is very, very impressive and important to have.

Ejaaz:
Number two, I would think about this as an orchestration model.

Ejaaz:
So if you look at specific benchmarks, it has beaten OpenAI at GDPval.

Ejaaz:
GDPval is a benchmark where they go out and they test a model's performance

Ejaaz:
at a really complex task versus a professional human that would normally do that task.

Ejaaz:
And the decision is, would you use the AI model or would you use the human?

Ejaaz:
And in this case, you would choose Claude 4.6 over humans way more than you

Ejaaz:
would choose OpenAI's latest model. So that's a really important thing.

Ejaaz:
And the point around Claude's latest model is that it doesn't code as well as

Ejaaz:
Codex, but it can orchestrate a bunch of agents and overall activity better than OpenAI.

Ejaaz:
Now, if you look at Codex and OpenAI's new models specifically,

Ejaaz:
it wins on software engineering. It is simply a better software engineer

Ejaaz:
than Claude is, which is a massive turnaround and a testament

Ejaaz:
to how much resourcing and fine-tuning OpenAI has put into it.

Josh:
And to the note on the quality of the models here, my prompt is done in Claude

Josh:
Code, the same one that we used in Codex. And I'm going to run it

Josh:
here for the first time now.

Josh:
You can see on screen and we'll see what it looks like.

Josh:
So underneath, we have our Codex version, which looks beautiful.

Josh:
On top, we have our brand new version that was just made by Opus. Now, I haven't

Josh:
tried this yet, so we're going to see what happens when I press space to start.

Josh:
So it looks like Opus has failed to create a

Josh:
floor, so I am just falling through the floor until the game ends. Okay, so

Josh:
just based on this one demo alone, this is a fairly significant difference, where

Josh:
GPT's Codex has created a beautiful side scroller. It doesn't have gravity, but

Josh:
I could just ask it to, or rather, it has gravity, it's a little too much, and I could ask

Josh:
it to lower it. Opus doesn't even work at all.

Josh:
And again, the test was just a one-shot prompt. So I'm going to get back to

Josh:
work prompting it again to build this new application, the trading application.

Josh:
We'll follow up with that. But I think that's a funny kind of demo just to showcase

Josh:
that one actually is kind of superior to the other in this one use case, at least.

Ejaaz:
Yeah, I mean, you said it pretty clearly, which is Codex is the best coding AI model.

Ejaaz:
And I have to like, I can't emphasize that enough because OpenAI for a long

Ejaaz:
time was behind Anthropic, and by a massive margin. And in some way,

Ejaaz:
shape, or form, they've been able to catch up.

Ejaaz:
Now, what's interesting here is both companies have focused on each other's goals.

Ejaaz:
So when Anthropic was typically meant to be the leading frontier model in coding,

Ejaaz:
it now has decided to focus on what OpenAI was really good at,

Ejaaz:
which is overall orchestration and being a better generalized model, right?

Josh:
They're taking each other's lunch.

Ejaaz:
Yeah, exactly. OpenAI has decided to eat Anthropic's

Ejaaz:
lunch and say, okay, we've got the generalized stuff sorted out.

Ejaaz:
Let's try and figure out the coding specific niche, highly defined,

Ejaaz:
professionalized functions. And it's produced the best coding model.

Ejaaz:
So it's kind of a weird win-win for both labs.

Ejaaz:
And what's awesome about this is they both now have really well-rounded,

Ejaaz:
but also very specialized models.

Ejaaz:
And the reason why this is important is, and this is like kind of maybe my hot take,

Ejaaz:
I don't think the coding models matter, Josh. I actually don't think the generalized models matter either.

Ejaaz:
I think they're both going after something much bigger, which is creating the

Ejaaz:
operating system for the future of work.

Ejaaz:
They know that AI models and AI agents are gonna automate a ton of different

Ejaaz:
industries and the industries are only gonna pick you if you can do both generalized

Ejaaz:
work and hyper-specific work really well.

Ejaaz:
That is coding and orchestration and managing your data.

Ejaaz:
And now we have two amazing models, dropped within 20 minutes of each other,

Ejaaz:
that do exactly that at the highest level of performance that we've ever seen.

Josh:
They're pretty exceptional. So now for this next demo, I have it queued up here.

Josh:
What we're going to do is, what I did is ask the model itself to build me a

Josh:
prompt for this. So I wanted it to create me an AI stock portfolio war room.

Josh:
And I asked, hey, I want to create this, create me a fully fleshed out prompt

Josh:
that kind of should solve this problem with one shot.

Josh:
So what I did is I loaded it up here in our Claude Code app.

Josh:
And then I also loaded it up into the Codex app. I created its own

Josh:
project folder, and now I'm going to hit send. So both of

Josh:
these things are thinking in real time. We will check back

Josh:
in once their outputs are done, and we'll compare again on the second version,

Josh:
which is more of a robust one. I mean, you'll see on

Josh:
the Claude screen it has this whole list of to-dos that it wants to do. It has

Josh:
an entire plan. There's nine different panels that it's going to build. It's going

Josh:
to do a risk analysis matrix and portfolio action bars and all this stuff. So we'll

Josh:
let that cook, and let's get back to what separates these, what people have been

Josh:
freaking out about on the internet more, as these things get going.

Ejaaz:
Could I take three minutes to show you some wild demos?

Josh:
Yeah, let's see what the internet's been demoing while we wait for ours to cook.

Ejaaz:
Okay, cool. Listen, our 2D Mario-inspired game was cool, but imagine if I told you

Ejaaz:
you could recreate the entire Pokemon game, including levels, cities, characters,

Ejaaz:
and creatures that you fight, from scratch in about an hour and 30 minutes.

Ejaaz:
That's pretty impressive. That's what we're looking at right now.

Josh:
Wow, it even has the fighting.

Ejaaz:
Yeah, yeah, yeah. And buttons and the multimodal gameplay.

Ejaaz:
And obviously this looks like it's been made by a child image wise,

Ejaaz:
but it's probably going to take you, what, another couple of hours to make a

Ejaaz:
really high fidelity game that you could probably run on your Nintendo Switch or whatever.

Ejaaz:
It is just so impressive that we can do these things.

Ejaaz:
Anyone can do these things with no previous background. Just upload a few images

Ejaaz:
or generate a few images and you can create childhood nostalgic games that are

Ejaaz:
worth billions of dollars, which is just super cool to see.

Josh:
Yeah, one of the cool things that I think it's really important to note is how approachable this is.

Josh:
Like for the recent example that we're having run right now on my screen,

Josh:
all I did was tell it what I wanted and ask it to develop the prompt with me.

Josh:
So even if it feels overwhelming, like you don't really know how to code,

Josh:
you don't know how to prompt things, you can actually just ask the model to

Josh:
help you generate the prompt, help explain to you how it works.

Josh:
And it's a really easy way to build basically anything you can imagine.

Josh:
It's not just games. It's productivity tools. It's CRM tracking.

Josh:
It's whatever you want it to be. So I think that's really interesting. But it

Josh:
also goes much more technical, right? I saw another crazy example with the compiler.

Ejaaz:
Okay, so for the tech nerds

Ejaaz:
out there that have spent a lot of time coding, you are going to

Ejaaz:
be wowed by this. For one of their flagship demos for Opus 4.6, the Anthropic

Ejaaz:
team decided to task the model with building a C compiler, which is an incredibly

Ejaaz:
complicated execution tool that is required to code up some of the craziest types of apps.

Ejaaz:
And they just walked away. And they just kind of like looked at it,

Ejaaz:
monitored it, made sure that it wasn't going awry.

Ejaaz:
And in two weeks, let me emphasize that,

Ejaaz:
two whole weeks, 14 days, it coded nonstop and built this compiler.

Ejaaz:
Now, you might think two weeks is quite a long time. I want my thing done in an hour and a half.

Ejaaz:
Well, let me hearken back to history. If you wanted to create

Ejaaz:
something like this in today's world, it would take a team of around 50 or

Ejaaz:
so humans, and it would take them a few months to build from scratch. That's today.

Ejaaz:
But back in the day, it would technically have taken them around a decade to

Ejaaz:
build and like thousands of people.

Ejaaz:
So we have just kind of condensed the timeline to create really complicated

Ejaaz:
tools in a matter of hours or weeks in this case.

Ejaaz:
Now, the second thing I want to point out is the fact that these models can

Ejaaz:
go untouched for two weeks is just insane.

Ejaaz:
There was another stat that was released today,

Ejaaz:
sorry, yesterday, for OpenAI's 5.2, I think, 5.2 High, I believe,

Ejaaz:
where it reaches pretty much a 50% hit rate at a 6.6-hour time horizon.

Ejaaz:
So that means if you gave it any kind of complicated coding task,

Ejaaz:
50% of the time in 6.6 hours, it would get that done, completely done.

Ejaaz:
And it would nail it 50% of the time, which is just such an impressive track

Ejaaz:
record when you look back a year.

Ejaaz:
And back then that time was, what was it, like 30 minutes, maybe an hour?

Ejaaz:
So every iteration, we see this thing double. It's just so insane.
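
Editor's note:
The doubling claim above can be checked with quick arithmetic. The sketch below takes only the figures quoted in the conversation (a 6.6-hour horizon now, versus 30 minutes to an hour a year earlier) and works out how many doublings that implies. The 12-month gap and the helper name `doublings` are assumptions made for illustration, not published measurements.

```python
import math


def doublings(start_hours, end_hours):
    """Number of times start_hours must double to reach end_hours."""
    return math.log2(end_hours / start_hours)


current_horizon = 6.6   # hours, as quoted in the episode
months_elapsed = 12     # assumed from "when you look back a year"

# The episode gives a range for the earlier figure, so both ends are computed.
for earlier_horizon in (0.5, 1.0):
    n = doublings(earlier_horizon, current_horizon)
    print(
        f"{earlier_horizon} h -> {current_horizon} h: "
        f"about {n:.1f} doublings, one every {months_elapsed / n:.1f} months"
    )
```

On those quoted numbers the horizon doubled somewhere between roughly 2.7 and 3.7 times over the year, which is in line with the "every iteration, it doubles" impression as long as there were three or four major releases in that period.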

Josh:
Yeah, it's really, it's unbelievable and almost intimidating how

Josh:
capable and competent it is. Even for someone who

Josh:
is a novice at writing code, it's not about writing

Josh:
code. It's about being able to generate whatever you want it to. So if you

Josh:
think of it, in a way it abstracts the code away and allows you to

Josh:
just speak the English language and get what you want from speaking English,

Josh:
in a way that you understand, and it will help walk you through the way. One

Josh:
of the things that I love about Claude Code in particular is the plan mode.

Josh:
If you leave a lot of things out of your prompt, it'll actually just continue

Josh:
to prompt you with additional questions to understand what you want.

Josh:
And one of the most fascinating things that I read about GPT-5.3 Codex in

Josh:
particular is, like you mentioned in the intro, it helped build itself.

Josh:
And I don't think that can be overstated because this is the first model in

Josh:
the history of OpenAI that has helped with the building and construction of itself.

Josh:
And what happens as that starts to ramp up, right? If you think of each model

Josh:
iteration as a flywheel, what is the constraint?

Josh:
The two constraints are the speed at which a developer can actually build it

Josh:
and then create the tests for it and make sure that it's safe and ready to deploy.

Josh:
And then it's the hardware that's required to actually train the model.

Josh:
What we're seeing with Codex and Opus, which I really believe was kind of Sonnet,

Josh:
is the incremental improvements.

Josh:
Now, for the incremental improvements that don't require an entirely new training

Josh:
run, the real constraint is the actual software and what you could squeeze out of it.

Josh:
And when you have a model that's helping you build this

Josh:
software, that can think for 6, 12, 24 hours

Josh:
at a time, even longer, it kind

Josh:
of creates this self-fulfilling loop, right? Where the models use the

Josh:
new models to make the future models

Josh:
stronger and more powerful and better. And I thought that was a really interesting

Josh:
thing to note, that this is the first self-propagating model, where it ran

Josh:
a lot of the tests for itself and it introduced new code that made itself better.

Josh:
And as we continue to see that, you can start to imagine that

Josh:
exponential progress line going pretty close to vertical and things getting

Josh:
really good, like really, really quick.

Ejaaz:
I think what most people listening to this might think is that,

Ejaaz:
well, what was different before?

Ejaaz:
Well, previously, models would just kind of work in a very analog mode.

Ejaaz:
You would just point it at a problem

Ejaaz:
and it would just understand what the problem was and then solve it.

Ejaaz:
But it lacked that awareness and wider context as to like what the wider vision

Ejaaz:
and goal was to achieve and then figuring out stuff for itself.

Ejaaz:
You always had to kind of handhold it. But now with its ability to kind of like

Ejaaz:
understand what it's trying to do and look internally and say,

Ejaaz:
huh, I made that mistake because of this error in my code.

Ejaaz:
I'm going to now like rewrite my code and then I'll be better at it.

Ejaaz:
It kind of functions similarly to a human. Now, I actually saw a great analogy.

Ejaaz:
I forgot who wrote it, but it's

Ejaaz:
fantastic. Imagine yourself standing on a sidewalk, right?

Ejaaz:
And a Bugatti Veyron drives super fast by you at let's say 200 miles an hour,

Ejaaz:
you'll be like, wow, that's kind of fast.

Ejaaz:
And then two minutes later, another Bugatti drives by you at 300 miles an hour.

Ejaaz:
You'll be like, wow, that's kind of fast. But you wouldn't really notice

Ejaaz:
that 100 mile an hour difference, right?

Ejaaz:
But if you were in the car strapped in, you would notice it is significantly

Ejaaz:
improved. And that's how software engineers feel right now.

Ejaaz:
Now, if you're someone that doesn't code all the time, you're not necessarily

Ejaaz:
going to understand these impacts, but it's really important for those of you

Ejaaz:
listening to this to figure out that this is massively impactful and will change

Ejaaz:
the way that a lot of things are happening today.

Ejaaz:
I mean, just take a look at this, right? This is a direct quote from someone

Ejaaz:
who is building at a major tech company, Rakuten.

Ejaaz:
And the quote here says, Claude Opus 4.6 autonomously closed 13 issues and assigned

Ejaaz:
12 issues to the right team members in a single day, managing a 50-person organization

Ejaaz:
across six repositories.

Ejaaz:
Josh, do you know who else is responsible for doing that?

Ejaaz:
An entire team of product managers that each get paid a quarter of a million

Ejaaz:
dollars in compensation automatically.

Josh:
Minimum, per year, at least.

Ejaaz:
Yeah, their jobs are automated now.

Josh:
Well, one of the earlier moments in

Josh:
which I realized this was pretty profound is when Claude Cowork, they

Josh:
said they built it with what, just a handful, like four people, over the course of

Josh:
10 days, and it was 100% built by the current model of Claude, which was Opus 4.5

Josh:
at the time. The amount of leverage from these tools is so high, but

Josh:
it cuts both ways. It's like, if you can design and develop a product in 10 days,

Josh:
then that means another company can probably do that in five.

Josh:
And it starts to lower the competitive threshold for these companies to catch up.

Josh:
And it starts to raise the bar of what is possible.

Josh:
Like if you could build something that profound in 10 days, what can you build

Josh:
over the course of six months?

Josh:
Like, can you really build something fantastic that has a moat that like actually

Josh:
delivers on the total power that you have by leveraging this AI?

Josh:
It's going to be interesting to see, because, I mean, what we're finding even with

Josh:
the Codex and Opus dual launch is that these companies are right next to

Josh:
each other. And if one publishes something

Josh:
profound or something that attracts a lot of users, they're just a few days and

Josh:
a few prompts away from copying it. And that's a pretty difficult thing

Josh:
to compete against on the software front.

Ejaaz:
Well, that's why, if we look at the stock market over the last couple of days,

Ejaaz:
it's down trillions of dollars. And I'm not exaggerating. If you look at Microsoft

Ejaaz:
over the last two weeks, the stock is down 20%. It's trading like a meme stock, which is just insane.

Ejaaz:
And the reason why that is, is a lot of investors are anticipating that these models,

Ejaaz:
specifically Opus 4.6 and Codex 5.3, will just create the tools that these billions

Ejaaz:
of dollars' worth of SaaS companies have spent, or valued, their entire lives on

Ejaaz:
in a couple of seconds, just as you described.

Ejaaz:
Now, the counter argument to this, Josh, is, and Jensen Huang actually kind

Ejaaz:
of went live at a conference and spoke about this and made this point:

Ejaaz:
if you're an AI agent or AI model that is capable of building these tools, right?

Ejaaz:
Why would you rebuild the tool every single time you do a function?

Ejaaz:
Surely you would just access the best tool and use it.

Ejaaz:
So there's a bit more nuance where AI models aren't just gonna recreate your

Ejaaz:
entire software stack if you are at a Fortune 500 company.

Ejaaz:
That kind of doesn't make any sense. There are a bunch of tools that are hyper-optimized to do that.

Ejaaz:
But what it will do is it will connect all of these tools and silos in a much more effective way.

Ejaaz:
And maybe that requires rebuilding parts of it.

Ejaaz:
Maybe it requires kind of connecting them in different ways, but not rebuilding the entire tools.

Ejaaz:
And whatever operating system that ends up becoming will be the most sticky

Ejaaz:
and valuable company ever.

Ejaaz:
Now, that could be Salesforce, or it could be someone completely different,

Ejaaz:
a startup that we haven't even heard of. And I think that's really important

Ejaaz:
to understand, but people are experimenting.

Ejaaz:
And if you look at this graph right here, which may not look insane to some,

Ejaaz:
but is insane to me at least, 4% of daily GitHub commits are now from Claude Code.

Ejaaz:
That was, I think, 5% of what it is today two months ago.

Ejaaz:
So the ascent has just been insane. These companies are adopting it and they are using it.

Josh:
Yeah, the number is just going to keep going up and there's no reason why it

Josh:
wouldn't. It's such a testament to the speed.

Josh:
It feels like we're strapped in that car and now we're flying.

Josh:
To an outsider, it might not look like it. It certainly feels like that

Josh:
on the inside. And I think a lot of people are starting to notice this and get

Josh:
a little nervous about it too. Like, look at this example on the screen right

Josh:
now. This is a prompt from GPT-5.3 Codex, which basically created an entire Minecraft

Josh:
clone in a single prompt. And it looks awesome, and it works really fast, and it

Josh:
was super lightweight.

Josh:
And it says, I also tried on Opus 4.6, but for some reason it got stuck.

Josh:
But you can build anything that you want very, very quickly,

Josh:
like very cheaply as well.

Josh:
What Opus 5.3, or Opus 5.3, I'm getting them all mixed up.

Josh:
What GPT-5.3 Codex offered is double the rates, double the token rates, for

Josh:
the next couple of months.

Josh:
So you actually have the freedom for their $20 a month plan to go and build whatever you want.

Ejaaz:
Can I maybe deliver a hot take, Josh?

Josh:
Yeah, what do you got?

Ejaaz:
I think the most exciting part about these model releases isn't the models themselves.

Ejaaz:
Largely, I think the models are kind of similar in capabilities.

Ejaaz:
They are around the same coding benchmarks, and they can roughly do the same

Ejaaz:
things. They can spin up a bunch of agents and orchestrate themselves.

Ejaaz:
The bigger picture, which I think a lot of people missed, was both companies,

Ejaaz:
Anthropic and OpenAI, are at war with each other.

Ejaaz:
And they're trying to basically build and own the operating system for work,

Ejaaz:
which isn't just a model. It's a software suite.

Ejaaz:
So this week alone, OpenAI didn't just release this new model.

Ejaaz:
They released the Codex app, which is a desktop Mac app, which is kind of like

Ejaaz:
a command line interface, which makes the coding experience way better.

Ejaaz:
And they also launched an enterprise platform called Frontier,

Ejaaz:
which allows Fortune 500 companies to basically take this magical model and

Ejaaz:
give it to non-coders and let them do magical things. Now,

Ejaaz:
all of these products together create a very sticky experience where it starts

Ejaaz:
to make sense for software engineers and non-software engineers to use these products.

Ejaaz:
And it becomes incredibly sticky, which results in billion-dollar contracts, right?

Ejaaz:
Anthropic has done the same thing over the last two weeks.

Ejaaz:
They released Claude Cowork, they released agent teams this week,

Ejaaz:
and then they released this new model.

Ejaaz:
They're going after the same thing, which it kind of makes sense why they're

Ejaaz:
releasing Super Bowl ads that are kind of shitting on each other now.

Ejaaz:
It makes a lot of sense. And so the point is, if they can own this operating

Ejaaz:
system, this future of work, they will basically be the most valuable company.

Ejaaz:
And I think it's going to be winner takes most.

Josh:
I have to interrupt you here. We have some developments on our prompts that

Josh:
we've been working on, our AI stock war room. Let's go. That I'm going to have

Josh:
to share on the screen right now.

Josh:
So currently what it's doing is it's asking to do some quality assurance testing.

Josh:
So you'll see it's actually taking over control of my browser and

Josh:
it's asking to make prompts on the screen. So you can see all of this that you're

Josh:
seeing right here is generated live, and it's doing an actual real-time debug

Josh:
of the product that it made.

Josh:
It's clicking around, it's resizing things, it's going through the links,

Josh:
and it's running real quality assurance testing on the actual product.

Josh:
It's really amazing to see.

Josh:
This was all just built, all these visual charts, and they're all accurate. So

Josh:
right now we're looking at NVIDIA. We have a chart, and I'm not going to mess

Josh:
with it because it's doing the real-time manipulation to do quality assurance

Josh:
checks. But it's actually clicking through, it's making sure the

Josh:
stats are accurate, it's making sure all the widgets work. And look, it has these

Josh:
amazing graphs already. It has sentiment analysis: 85 percent of people are bullish

Josh:
on NVIDIA. It has recent signals from the news. It has a risk assessment

Josh:
matrix where it shows the export controls and chip controls.

Josh:
It has revenue and earnings every single quarter, charted, competitive moats.

Josh:
It has sector comparisons. It's like, this is unbelievable.

Josh:
And it just generated this in a single prompt. And I just find it really funny

Josh:
that we can actually watch this do it in real time.

Josh:
So you'll see in this prompt, it's clicking through, it's taking screenshots of what it's seeing.

Josh:
And then it's digesting, analyzing, and understanding what it made,

Josh:
what it messed up and what it actually still has left to finish.

Josh:
And it generated everything, all of this in real time as we're recording this episode.

Josh:
So fascinating.

Ejaaz:
Wow, it reminds me of some of the research platforms at the former companies

Ejaaz:
that I used to work at and they would pay, I'm not joking, millions of dollars

Ejaaz:
a year to get access to these types of platforms that would give them analysis

Ejaaz:
like what you're showing on the screen right now. And you just built it from scratch.

Josh:
From scratch, and look, it's doing this.

Josh:
I'm not even touching my keyboard. I just searched for Apple and now I'm sure

Josh:
if I go over to the prompt,

Josh:
it's taking screenshots of Apple. It says: Apple dashboard

Josh:
looking great, let me scroll to see the new three-column button row layout. And

Josh:
it's checking the button rows. And it's really unbelievable. Like, we have the

Josh:
investment thesis, the bull case for it, the bear case for it, catalysts and timelines.

Josh:
It has WWDC built in. It has the iPhone 18 launch, probably, set up for September.

Josh:
It's like so cool. It's absolutely unbelievable. And now this is a real tool

Josh:
that I'll be able to use to type

Josh:
in whatever stock I want to look at and actually get some analysis on it.

Josh:
Now, I'll go over to Codex over here and it looks like Codex is taking its sweet time.

Josh:
It's still zero out of six tasks completed. So it might take a little while

Josh:
for us to get a visual on that, but it's just amazing to watch this happen in

Josh:
real time as at least Claude Code and Opus 4.6

Josh:
does some quality assurance testing live by taking over my browser and running

Josh:
it for itself. I just think this is like, this is amazing.

Ejaaz:
It's magic. Something I just noticed in your Opus chatbot screen when it's going

Ejaaz:
through its thinking, it seems to have like spun up a few different agents or

Ejaaz:
instances of its own self to pull this off.

Ejaaz:
Like I think if you scroll up, like I saw a few kind of like prompts that like

Ejaaz:
suggested that that's what it was doing,

Ejaaz:
which I think underscores a very important thing that both of these models

Ejaaz:
can do, which is they can spin up multiple versions of the same model and task

Ejaaz:
it with different things to run in parallel.

Ejaaz:
What this means is you can get a really complicated product like what you're

Ejaaz:
seeing on the screen right now in a matter of minutes because it's running in parallel.

Ejaaz:
So imagine having a bunch of computer science geniuses that you can just duplicate

Ejaaz:
immediately and run at a fraction of the cost: just the cost of electricity, the cost of inference.

Ejaaz:
And now you start to see why all these NVIDIA chips and stuff are worth so much.

Ejaaz:
Because you want to do cool stuff like this. This is insane.

Josh:
It's actually incredible. Okay, so now I want to test it on Tesla.

Josh:
So I'm going to choose Tesla and see if it actually can do it in a non-controlled environment.

Ejaaz:
This UI is so cool.

Josh:
It's very pretty. What the hell? This looks great. Okay, so here we have Tesla.

Josh:
It has the charts. We're going to click through the charts. It has the one-week

Josh:
chart, the one-month chart, the three-month chart. That looks fairly accurate.

Josh:
It has the price-to-earnings ratio, the 52-week high, 52-week low.

Josh:
So it looks like at one point it was trading at $488, now it's trading at $389.

Josh:
The bull case for Tesla, RoboTaxi and FSD driving licenses could unlock $500

Josh:
billion in revenue by 2030.

Josh:
It has the RoboTaxi service launch in Austin that it's preparing for.

Josh:
And let's see the sector comparison. So it's comparing it to Rivian, Baidu, Toyota, Ford.

Josh:
It has the competitive moat where it says it's most strong in brand power,

Josh:
IP patents, and cost advantages.

Josh:
You can see the revenue, the estimated earnings per share.

Josh:
Sentiment is much worse on Tesla than it was on Apple. It's at 52% right now.

Josh:
And it looks like, as it relates to the risk assessment, valuation and competition

Josh:
and execution are all very high risk.

Josh:
And that's probably an accurate assessment, although I'm not sure the competition

Josh:
is really a problem. The execution is certainly going to be an issue.

Josh:
But it's just amazing to see how well it does. And it even gives it a verdict.

Josh:
So the AI verdict on Tesla is,

Josh:
It's a hold. Tesla's optionality is enormous, but current valuation already

Josh:
prices in multiple moonshots.

Josh:
Execution on RoboTaxi will be the key catalyst. That sounds about right.

Josh:
And it's amazing that we just built this with a single prompt without any oversight from me.

Josh:
And it works. It actually works. It's really just unbelievable how capable these things are.

Josh:
And now I have a dashboard that anytime I want to make a decision,

Josh:
I can type in the ticker and get all this, um, optionality. It even has menus that

Josh:
work. Look at this: profit margins, P/E ratios, market cap. Wow, pretty unbelievable.

Ejaaz:
It's a reactive, real-time Bloomberg terminal for the modern age.

Josh:
There's, um, there's another feature here that looks like you could compare stocks.

Josh:
Let's see if this actually works here. So if I type in, let's say, Apple's ticker

Josh:
and I hit go, will that compare the two? Now, it looks like that doesn't work very

Josh:
well. Oh my god, but it has moving average lines and everything. This is pretty robust.

Ejaaz:
I know, it's like the trader's and investor's dream. Just crazy.

Ejaaz:
Kind of like a side note on this, but like,

Ejaaz:
The fact that Tesla's down and everyone's kind of like bearish on this company,

Ejaaz:
even though they're like rumored to be merging and stuff like this.

Ejaaz:
Like the point being is there's an asymmetry between what the market is seeing

Ejaaz:
and what these inventors and builders are seeing.

Ejaaz:
These AI labs have created what they define as pretty much a low form of AGI.

Ejaaz:
You literally have an AI model that is building the next version of itself.

Ejaaz:
That by description is like a super genius and it's only limited by the function

Ejaaz:
of energy and compute, right?

Ejaaz:
And then investors are looking at this and saying, huh, Amazon and Google are

Ejaaz:
about to spend a combined $500 billion worth of CapEx this year.

Ejaaz:
Kind of bearish, that's a lot of money. So there is a real investment opportunity

Ejaaz:
here to really understand the difference of what these things can actually do.

Ejaaz:
And that might lead to a lot of like opportunities to invest.

Ejaaz:
I don't know, but I know that I'm buying Tesla today and a bunch of Google stock.

Josh:
Yeah, I mean, look at this Google valuation. One, this chart looks absolutely gorgeous,

Josh:
but two, um, the AI verdict is a buy. Even the AI thinks Google is a buy because

Josh:
they just have, um, "Alphabet offers the best value in mega-cap tech: dominant AI

Josh:
capabilities, diversified growth, and a cheap valuation if search moat holds."

Ejaaz:
Yeah, give me the week. Give me the week.

Josh:
Let's see the weekly chart here. Do you want some moving average lines as well?

Josh:
Because we could drop those in.

Ejaaz:
Please. Let's see, let's see. Yeah, look, see, it's had a slight dip. Markets are so reactive. Crazy.

Josh:
Yeah, and I think to the point of the CapEx, markets are viewing that as a scary, high-risk statement.

Josh:
But while that's true, I also think it's a testament to the fact that scaling

Josh:
laws are going to work, and the largest companies in the world are betting on

Josh:
the continuation of them working.

Josh:
And the shared consensus between all of these large-cap companies deciding to

Josh:
spend record CapEx this year

Josh:
is a testament to the fact that things are only going to go faster.

Josh:
And they believe that the more money they put in, the more outputs they will get.

Josh:
And they're going to continue to put their foot on the gas. So I think any question

Josh:
that anyone had, if these scaling laws could continue to hold up and we could

Josh:
continue to be on the path to whatever AGI looks like and beyond,

Josh:
I think that was answered this week through these earnings reports.

Josh:
And the overwhelming answer is yes, it's true.

Josh:
It is likely that this is going to happen, and everyone is betting their entire company on it.

Ejaaz:
I think we have done a great job, if I pat ourselves on the back virtually,

Ejaaz:
Josh, of showing what these models are capable of.

Ejaaz:
And remember, it's been less than 48 hours that these models have been alive.

Ejaaz:
In fact, I think it's been like 36 hours. So if any of you are interested in

Ejaaz:
trying these out, I cannot urge you enough to go out and try these things.

Ejaaz:
Try to solve a problem that you're finding at work or try to solve a problem

Ejaaz:
that you're finding just in your casual leisure time to code up a hobby or a

Ejaaz:
project in a matter of seconds. It's so, so easy.

Ejaaz:
And it'll put you at an advantage to understand how these tools work and why

Ejaaz:
they're really changing the world as we see it around us, why stocks are dumping,

Ejaaz:
why some stocks are pumping.

Ejaaz:
But yes, go demo it. Let us know what you actually end up building.

Ejaaz:
Josh and I are trying to give you more live demos in a lot of the episodes that we put out.

Ejaaz:
And with every other model release and feature that drops, we are going to be

Ejaaz:
trying and testing these things so we can bring to you exactly what these things

Ejaaz:
can do and show you kind of like the benefits and disadvantages,

Ejaaz:
what's real and what's really not.

Josh:
Yeah. And I can't stress this enough. The best way to stay on top of things,

Josh:
the best way to feel like you're not being left behind is just to use the tools

Josh:
as they come out and to understand them and what makes them different.

Josh:
And for a single subscription to ChatGPT or to Claude, you can access tools

Josh:
just like this and build stuff just like this.

Josh:
This wasn't like an incredibly difficult technical challenge.

Josh:
You just ask it what you want and you ask it to help you.

Josh:
And it will actually walk through and help you through the process and build whatever you want.

Josh:
So the most important thing for anyone listening is just to train that muscle and to get familiar with

Josh:
these tools and these skills so that you're able to leverage them to your advantage,

Josh:
however it may best fit in your life.

Josh:
And that's kind of what we wanted to share with you.

Josh:
Like, it's simple. You download the app, you log into your account,

Josh:
and you're on your way. It's really

Josh:
not as difficult as I think a lot of people make it seem like it is.

Josh:
And I mean, this beautiful dashboard is a testament to that.

Josh:
Okay, so Ejaaz, it also looks like our Codex output

Josh:
has finished itself. So we have here on the

Josh:
screen, we have Opus, which we saw, which is

Josh:
really a lovely dashboard. But it seems like Codex

Josh:
now has its own version that we could quickly compare. So maybe we'll try, we'll

Josh:
go to our favorite, Google. We'll type Google in and we'll click analyze and kind

Josh:
of see how this compares. I find it funny how they've, they've merged on the same

Josh:
type of design style. But yeah. Oh, okay. Whoa, this is interesting. This is

Josh:
different. So it has the moving averages select. Oh, is that...

Josh:
Okay, yeah, so it has the charts.

Ejaaz:
Is that accurate?

Josh:
It has the PE ratio. Yeah, that's what I was looking at. Let's go to that one-week chart and see.

Josh:
I have some questions about these. It looks pretty right.

Ejaaz:
Okay. That looks very wrong.

Josh:
Yeah, the one-year I'm a little confused about. Let's compare it to Claude here.

Josh:
Let's go to Google and we'll analyze that. Well, I think we can look at the

Josh:
rest. So it looks like it emulated pretty well.

Josh:
It has the verdict. It has the same stats.

Josh:
The risk assessment matrix is good, but you could see, like, some of the text

Josh:
you can't really read because it's black on black. Um, but nonetheless, pretty

Josh:
interesting. They both succeeded.

Ejaaz:
Yeah, I mean, as we said before, like, these models are very equally capable. And

Ejaaz:
you know, maybe it's just the way that you prompt something or, uh, the way that

Ejaaz:
some of these things work. But largely, they kind of achieve the same goal and

Ejaaz:
same quality. Um, and like, listen, we're talking about minor discrepancies here.

Ejaaz:
I can't wait to see what we will build with this. Like, this is insane.

Josh:
It's amazing. Both of these were one-shot prompts. I didn't touch anything.

Josh:
And here we are. I do think that Google one-year chart is wrong;

Josh:
I think Claude got that one right.

Josh:
But overall, both succeeded in the mission. Both look great.

Josh:
And both are just excellent models.

Ejaaz:
Amazing. Okay, well, that's it. Wherever you're listening to this,

Ejaaz:
if it is on YouTube and you're watching our lovely faces, or if you're listening

Ejaaz:
to us on Spotify, Apple Music, or wherever you listen to us,

Ejaaz:
please subscribe, give us a rating, leave us some comments.

Ejaaz:
We love your feedback and we respond to pretty much every single comment because

Ejaaz:
we're trying to figure out how to make this show better and bring you the content

Ejaaz:
that you guys deserve and want.

Ejaaz:
Turn on notifications because we are releasing more and more videos every week

Ejaaz:
on the hottest topics as they come out.

Ejaaz:
We also have the sickest newsletter ever where one of us will either write an

Ejaaz:
essay or give you the five top highlights of the week.

Ejaaz:
So if you don't want to watch any of these videos, you can just read and digest

Ejaaz:
that and you'll know everything that you need to know in AI and frontier tech.

Ejaaz:
Thank you for listening, and we will see you on the next one.

Josh:
See you in the next one. Peace.
