Kimi K2 is the Open Source Claude-Killer | US vs China AI

Ejaaz:
A bunch of AI researchers from China just released a brand new AI model called

Ejaaz:
Kimi K2, which is not only as good as any other top model like Claude,

Ejaaz:
but it is also 100% open source, which means it's free to take,

Ejaaz:
customize, and turn into your own brand new AI model.

Ejaaz:
This thing is amazing at coding, it beats any other model at creative writing,

Ejaaz:
and it also has a pretty insane voice mode.

Ejaaz:
Oh, and I should probably mention that it is one trillion parameters in size,

Ejaaz:
which makes it one of the largest models ever created.

Ejaaz:
Josh, we were winding down on a Friday night and this news broke that this team

Ejaaz:
had released this model.

Ejaaz:
An absolutely crazy bombshell, especially with OpenAI rumored to release their

Ejaaz:
open source model this week.

Ejaaz:
You've been jumping into this. What's your take?

Josh:
Yeah. So last week we crowned Grok 4 as the new leading private, closed source model.

Josh:
This week, we've got to give the crown to Kimi K2. We've got another crown

Josh:
going to the open source team. They are winning. I mean, this is

Josh:
better than DeepSeek and DeepSeek R1; this is basically DeepSeek R2,

Josh:
I would imagine. And if you remember back a couple months, DeepSeek really

Josh:
flipped the world on its head because of how efficient it was and the algorithmic

Josh:
upgrades it made. And I think what we see with Kimi K2 is a lot of the same thing.

Josh:
It's these novel breakthroughs that come as a downstream effect of their

Josh:
needing to be resourceful.

Josh:
China, they don't have the mega GPU clusters we have, they don't have all the

Josh:
cutting edge hardware, but they do have the software prowess to find these efficiencies.

Josh:
I think that's what makes this model so special. And that's what we're going

Josh:
to get into here is specifically what they did to make this model so special.

Ejaaz:
Yeah, I mean, look at these stats here, Josh, like 1 trillion parameters in total.

Ejaaz:
It has 32 billion active parameters in a mixture of experts design. So what this means is,

Ejaaz:
although it's really large in size, and typically these AI models become pretty

Ejaaz:
inefficient when they're that large, it uses this technique called mixture of

Ejaaz:
experts, which means that whenever someone queries the model,

Ejaaz:
it only uses or activates the parameters that are relevant for the query itself.

Ejaaz:
So it's smarter, it's much more efficient, and it doesn't consume

Ejaaz:
as much energy as you would if you wanted to run it locally at home or whatever

Ejaaz:
that might be. It's also super cheap.

Ejaaz:
I think I saw somewhere that this was 20% of the cost of Claude,

Ejaaz:
Josh, which we love. That's insane

Ejaaz:
for all the nerds that want to run, you know,

Ejaaz:
really long tasks, or just set and

Ejaaz:
forget the AI to run on, like, your coding log or whatever that might mean.

Ejaaz:
You can now do it at a much more affordable rate, at one-fifth the cost of

Ejaaz:
some of the top models that are out there, and it is as good as those models.

Ejaaz:
Just insane, Josh. I know there's a bunch of things that you

Ejaaz:
wanted to point out here on benchmarks. What do you want to get into?

Josh:
Yeah, it's really amazing. So they took 15 and a half trillion tokens and they

Josh:
condensed those down into a one trillion parameter model.

Josh:
And then what's amazing is when you use this model, like Ejaaz said,

Josh:
it uses a thing called mixture of experts.

Josh:
So it has, I believe, 384 experts.

Josh:
And each expert is good at a specific thing. So let's say in the case you want

Josh:
to do a math problem, it will take a 32 billion parameter subset of the one

Josh:
trillion total parameters, and it will choose eight of these different

Josh:
experts. So in the case of math, it'll find an expert that

Josh:
has the calculator tool.

Josh:
It'll find an expert that has a fact-checking tool or a proof tool

Josh:
to make sure that the math is accurate.

Josh:
It'll have just a series of tools to help itself. And that's kind of how it

Josh:
works so efficiently is instead of using a trillion parameters at once,

Josh:
it uses just 32 billion and it uses the eight best specialists out of the 384

Josh:
that it has available to it. It's really impressive.
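
A minimal sketch of the top-k expert routing Josh describes, using the episode's numbers (384 experts, 8 active per token). The hidden size is shrunk to keep the demo light, and this generic router is only an illustration of the technique, not Moonshot's actual implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative numbers from the episode: 384 experts, 8 chosen per token.
# HIDDEN is shrunk for the demo; K2's real hidden size is much larger.
NUM_EXPERTS, TOP_K, HIDDEN = 384, 8, 64

class TopKMoE(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A gate scores every expert; only the TOP_K best actually run,
        # so compute per token scales with 8/384 of the expert pool.
        self.gate = torch.nn.Linear(HIDDEN, NUM_EXPERTS)
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)
        )

    def forward(self, x):                       # x: (tokens, HIDDEN)
        weights, idx = self.gate(x).topk(TOP_K, dim=-1)
        weights = F.softmax(weights, dim=-1)    # blend the 8 chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):              # naive per-token loop for clarity
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[e](x[t])
        return out

print(TopKMoE()(torch.randn(4, HIDDEN)).shape)  # torch.Size([4, 64])
```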

Josh:
And what we see here is the benchmarks that we're showing on screen.

Josh:
And the benchmarks are really good.

Josh:
It's up there in line with just about any other top model, with the exception

Josh:
that this is open source.

Josh:
And there was another breakthrough that we had, which was the actual way that

Josh:
they handled the training of this.

Josh:
And yeah, this is the loss curve. So what you're looking at on screen for the

Josh:
people who are listening, it's this really pretty smooth curve that kind of

Josh:
starts at the top and it trends down in a very predictable and smooth way.

Josh:
And most curves don't look like this. And if they do look like this,

Josh:
it's because the company has spent tons and tons of money on error correction

Josh:
to make sure this curve is so smooth.

Josh:
So basically what you're seeing is the training run of the model.

Josh:
And a lot of times what happens is you get these very sharp spikes and it starts

Josh:
to deviate from the normal training run.

Josh:
And it takes a lot of compute to recalibrate and push that back in the right direction.

Josh:
What they've managed to do is really make it very smooth.

Josh:
And they've done this by increasing these efficiencies. So if you can think

Josh:
about it, there's this analogy I was thinking of right before we hit the record button.

Josh:
And it's if you were teaching a chef how to cook, right?

Josh:
So we have Chef Ejaz here. I am teaching him how to cook. I am an expert chef.

Josh:
And instead of telling him every ingredient and every step for every single

Josh:
dish, what I tell him is like, hey, if you're making this amazing dinner recipe,

Josh:
all you need that matters is this amount of salt applied at this time,

Josh:
this amount of heat applied for this length of time, and the other stuff doesn't matter as much.

Josh:
So just put in whatever you think is appropriate, but you'll get the same answer.

Josh:
And that's what we see with this model is just an increased amount of efficiency by being

Josh:
direct, by being intentional about the data that they used to train it on,

Josh:
the data that it fetches in order to give you high quality answers.

Josh:
And it's a really novel breakthrough. They call it the MuonClip optimizer,

Josh:
which, I mean, it's a Chinese company, maybe it means something special there,

Josh:
but it is a new type of optimizer.

Josh:
And what you're seeing in this curve is that it's working really well and it's

Josh:
working really efficiently.

Josh:
And that's part of the benefit of having this open source is now we have this

Josh:
novel breakthrough and we could take this and we could use this for even more

Josh:
breakthroughs, even more open source models. And that's been really cool to see.
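
A rough sketch of the QK-clip idea Moonshot describes alongside MuonClip: after each optimizer step, if an attention head's maximum logit grows past a threshold, the query and key projections are rescaled to pull it back, which is what keeps the loss curve from spiking. The threshold and the even split of the rescale are illustrative assumptions, and the Muon update itself is elided.

```python
import torch

TAU = 100.0  # illustrative logit cap, not Moonshot's actual value

def qk_clip(w_q: torch.Tensor, w_k: torch.Tensor, x: torch.Tensor) -> float:
    """Rescale query/key projections in place if attention logits explode."""
    q, k = x @ w_q.T, x @ w_k.T
    max_logit = (q @ k.T).abs().max().item()
    if max_logit > TAU:
        gamma = TAU / max_logit
        # Split the correction evenly so the q.k product shrinks by gamma.
        w_q.mul_(gamma ** 0.5)
        w_k.mul_(gamma ** 0.5)
    return max_logit

# Deliberately oversized weights so the demo triggers the clip.
x = torch.randn(16, 64)
w_q, w_k = torch.randn(64, 64) * 3, torch.randn(64, 64) * 3
print(qk_clip(w_q, w_k, x))  # large pre-clip logit; weights now rescaled
print(qk_clip(w_q, w_k, x))  # post-clip maximum sits at or near TAU
```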

Ejaaz:
I mean, this is just, time

Ejaaz:
and again, from China. So amazing from their research team. Just

Ejaaz:
to pick up on your comment on DeepSeek: at the end of last year,

Ejaaz:
we were utterly convinced that the only way to create a breakthrough model was

Ejaaz:
to spend billions of dollars on compute clusters.

Ejaaz:
And so therefore it was a pay-to-play game. And then DeepSeek,

Ejaaz:
a team out of China, released their model and completely open-sourced it as well.

Ejaaz:
And it was as good as OpenAI's Frontier model, which was the top model at the time.

Ejaaz:
And the revelation there was, oh, you don't actually just need to chuck a bunch of compute at this.

Ejaaz:
There are different techniques and different methods if you get creative about

Ejaaz:
how you design your model and how you run the training run,

Ejaaz:
which is basically what you need to do to make your model smart,

Ejaaz:
you can run it in different ways that are more efficient, consume less energy,

Ejaaz:
and therefore cost less money, but produce something as smart, if not smarter,

Ejaaz:
than the frontier models that American AI companies are making.

Ejaaz:
And this is just a repeat of that, Josh.

Ejaaz:
I mean, look at this curve, for those who are watching this episode on video.

Ejaaz:
It is just so clean. Yeah, it's beautiful.

Ejaaz:
The craziest part about this is, when DeepSeek

Ejaaz:
was released, they pioneered reasoning

Ejaaz:
and reinforcement learning, which are two separate

Ejaaz:
techniques that made the model super smart with less energy and less compute

Ejaaz:
spend. With this model, they didn't even implement those techniques at all, so

Ejaaz:
theoretically this model can get so much smarter than it already is,

Ejaaz:
and they just leveraged a new method to make it as smart as it already is right now.

Ejaaz:
So just such fascinating progress in research from China.

Ejaaz:
And it just keeps on coming out. It's so impressive.

Josh:
Yeah, this was the exciting part to me: we're seeing so many

Josh:
algorithmic, exponential improvements in so many different categories.

Josh:
So this was considered a breakthrough by all means. And this wasn't even the

Josh:
same type of breakthrough that DeepSeek had.

Josh:
So we get this now compounding effect where we have this new training breakthrough

Josh:
and then we have DeepSeek who has the reinforcement learning and that hasn't

Josh:
even yet been applied to this new model.

Josh:
So we get the exponential growth on one end, the exponential growth on the reasoning end,

Josh:
those come together and then you get the exponential growth on the hardware

Josh:
stack where the GPUs are getting much faster and there's all of these different

Josh:
subsets of AI that are compounding on each other and growing and accelerating

Josh:
quicker and quicker, and what you get is this unbelievable rate of progress. And

Josh:
that's what we're seeing. So

Josh:
reasoning isn't even here yet, and we're going to see it soon, because it is open

Josh:
source, so people can apply their own reasoning on top of it. I'm sure the Moonshot

Josh:
team is going to be doing their own reasoning version of this model, and I'm

Josh:
sure we're going to be getting even more impressive results soon. I see you have

Josh:
a post up here about the testing and overall performance. Can you please share?

Ejaaz:
Yeah, so this is a tweet that summarizes really well how this model performs

Ejaaz:
in relation to other Frontier models.

Ejaaz:
And the popular comparison that's taken for Kimi K2 is against Claude.

Ejaaz:
So Claude has a bunch of models out.

Ejaaz:
Claude 3.5 is its earlier model, and then Claude 4 is its latest.

Ejaaz:
And the general take is that this model is just better than those models,

Ejaaz:
which is just insane to say, because for so long, Josh, we've said that Claude

Ejaaz:
was the best coding model.

Ejaaz:
And indeed it was. And then within the span of, what is it, five days?

Ejaaz:
Grok 4 released and it just completely blew Claude 4 out of the water in terms of coding.

Ejaaz:
Now Kimi K2, an open source model out of China, whose team doesn't even have access

Ejaaz:
to the research and kind of proprietary knowledge that a lot of American AI

Ejaaz:
companies have, has beaten it as well, right?

Ejaaz:
So it kind of beats Claude at its own game, but it's also cheaper.

Ejaaz:
It's 20% of the cost of Claude 3.5, which is just an insane thing to say,

Ejaaz:
which means that if you are a developer out there that

Ejaaz:
wants to try your hand at vibe coding

Ejaaz:
a bunch of things, or actually seriously coding something

Ejaaz:
that's quite novel but you don't have the hands on deck to do it, you

Ejaaz:
can now spin up a Kimi K2 AI agent, actually multiple of them, for a very cost-efficient,

Ejaaz:
reasonable salary. You don't have to pay hundreds of thousands

Ejaaz:
of dollars, or hundreds of millions of dollars, which is what Meta is

Ejaaz:
doing to buy a bunch of these software engineers.

Ejaaz:
You can spend the equivalent of maybe a Netflix subscription, or $500

Ejaaz:
to $1,000 a month, and spin up your own app. So super, super cool.

Josh:
And also, one added perk is that if you have a lot of

Josh:
GPUs sitting around, you can actually run this model for free.

Josh:
So that's the cost if you actually query it from the servers.

Josh:
But I'm sure there's going to be companies that have access to excess GPUs.

Josh:
They can actually just download the model because it's open source,

Josh:
open weights, and they could run it on their own.

Josh:
And that brings the cost of compute down to the cost per kilowatt-hour of the energy

Josh:
required to run the GPUs.

Josh:
So because it's open source, you really start to see these costs decline,

Josh:
but the quality doesn't.

Josh:
And every time we see this, we see a huge productivity unlock in coding

Josh:
output and the amount of queries used. It's like, this is freaking awesome.
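
A back-of-envelope sketch of what "cost drops to the electricity bill" means when you self-host open weights. Every number below is an illustrative assumption, not a measurement.

```python
# All figures are illustrative assumptions for the self-hosting math.
gpus = 16              # hypothetical GPU count serving a large MoE model
watts_per_gpu = 700    # roughly an H100-class board under load
price_per_kwh = 0.12   # assumed electricity rate, in dollars
hours_per_day = 24

kwh = gpus * watts_per_gpu * hours_per_day / 1000
print(f"~{kwh:.0f} kWh/day -> ~${kwh * price_per_kwh:.2f}/day in electricity")
# ~269 kWh/day -> ~$32.26/day, versus paying a hosted API per token
```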

Ejaaz:
Yeah, Josh, I saw something else come up as well. Do you remember when Claude

Ejaaz:
first released their frontier model, I think it was 3.5 or maybe it was 4?

Ejaaz:
One of their bragging rights was it had a one million token context window, which...

Josh:
Oh yes, which was huge.

Ejaaz:
Yeah, which for listeners of the show is huge. It's like several novels'

Ejaaz:
worth of words or characters you could just bung into one single prompt.

Ejaaz:
And the reason why that was such an amazing thing was for a while,

Ejaaz:
people struggled to kind of communicate with these AIs because they couldn't set the context.

Ejaaz:
There wasn't enough bandwidth within their chat log window for them to say,

Ejaaz:
you know, and don't forget this. And then there was this.

Ejaaz:
And then, you know, this detail and that detail, there just wasn't enough space.

Ejaaz:
And models weren't performing enough to kind of consume all of this in one go.

Ejaaz:
And then Claude came out and was like, hey, we have a one million token context window.

Ejaaz:
Don't worry about it. Chuck in all the research papers that you want, chuck in

Ejaaz:
your essay, chuck in reference books, and we've got you. I saw this tweet that

Ejaaz:
was deleted. I think you sent this to me.

Josh:
We've got the screenshots. We always come with receipts. Yeah.

Ejaaz:
I wonder why they deleted it, but good catch from you. Yeah, let's get into this.

Josh:
Yeah, so it was first posted, I think,

Josh:
earlier today, like an hour ago, and then deleted pretty shortly afterwards.

Josh:
And this is from a woman named Crystal. Crystal works with the Moonshot team. She

Josh:
is part of the team that released Kimi K2. And in this post it says,

Josh:
Kimi isn't just another AI. It went viral in China as the first to support

Josh:
a 2 million token context window. And then she goes on to say,

Josh:
we're an AI lab with just 200 people, which is minuscule compared

Josh:
to a lot of the other labs they're competing with.

Josh:
And it was an acknowledgement that they had a 2 million token context window.

Josh:
And for those who need it, just a quick refresher on the context window stuff:

Josh:
imagine you have a gigantic textbook, and you've read it once, and you

Josh:
close it and you kind of have a fuzzy memory of all the pages.

Josh:
The context window allows you to lay all of those out in clear view

Josh:
and directly reference every single page. So when

Josh:
you have two million tokens, which is roughly a million and a half words

Josh:
of context, we're talking about hundreds and hundreds

Josh:
of books and textbooks and knowledge, and you could really dump a

Josh:
lot of information in this for the AI to readily access. And

Josh:
if they release a two million token

Josh:
open source model, that's a huge

Josh:
deal. I mean, even Grok 4 recently, I believe,

Josh:
what did we say it was? It was a 256,000 token context window, something like

Josh:
that. So Grok 4 is one eighth of what they supposedly have accessible right now,

Josh:
which is a really, really big deal. So I'm hoping it was deleted because they

Josh:
just don't want to share that, not because it's not true. I would like to believe

Josh:
that it's true, because man, that'd be pretty epic.
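
A small sketch of budgeting documents against a context window. It uses OpenAI's tiktoken tokenizer purely as a stand-in, since Kimi ships its own tokenizer, and the 2 million figure is the number from the deleted post, not a confirmed spec.

```python
import tiktoken

CONTEXT_WINDOW = 2_000_000                  # figure from the deleted post
enc = tiktoken.get_encoding("cl100k_base")  # stand-in tokenizer, not Kimi's

def fits_in_context(documents: list[str]) -> bool:
    """Report whether a pile of documents fits in one prompt."""
    total = sum(len(enc.encode(doc)) for doc in documents)
    print(f"{total:,} tokens of {CONTEXT_WINDOW:,} available")
    return total <= CONTEXT_WINDOW

fits_in_context(["chapter one " * 5000, "appendix " * 2000])
```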

Ejaaz:
And the people are loving it, Josh. Check out this

Ejaaz:
graph from OpenRouter, which basically shows

Ejaaz:
the split of usage between everyone

Ejaaz:
on their platform who is querying different models. So for context

Ejaaz:
here, OpenRouter is a website that you can go to

Ejaaz:
and type up a prompt, just like you do at ChatGPT, and

Ejaaz:
you can decide which model your

Ejaaz:
prompt goes to, or you can let OpenRouter decide for you,

Ejaaz:
and it kind of divvies up your query. So if you have a coding query, it's

Ejaaz:
probably going to send it to Claude, or now Kimi K2 or Grok 4, but if you have

Ejaaz:
something that's more to do with creative writing, or something like

Ejaaz:
a case study, it might send it to OpenAI's O3 model, right? So it kind of decides for you.
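
A hedged sketch of sending a prompt through OpenRouter's OpenAI-compatible API, which is how many of the queries in these charts reach K2. The model slug and the auto-router identifier are assumptions; check openrouter.ai for the current names.

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint; the key is a placeholder.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2",  # assumed slug; "openrouter/auto" lets it route
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a binary search in Python."},
    ],
)
print(resp.choices[0].message.content)
```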

Ejaaz:
OpenRouter released this graphic, which basically shows that Kimi K2 surpassed

Ejaaz:
xAI in token market share just a few days after launching, which basically means

Ejaaz:
that xAI spent, you know,

Ejaaz:
billions of dollars training up their Grok 4 model,

Ejaaz:
which just beat out the competition last week.

Ejaaz:
Then Kimi K2 gets released, completely open source,

Ejaaz:
and everyone starts to use it more than

Ejaaz:
Grok 4, which is just an insane thing to say, and it

Ejaaz:
just shows how rapidly these AI models compete with and surpass each

Ejaaz:
other. I think part of the reason for this, Josh, is that it's open source, right?

Ejaaz:
Which means that not only are retail users like myself and yourself using it

Ejaaz:
for our daily queries, you know,

Ejaaz:
create this recipe for me or whatever, but researchers and builders all over

Ejaaz:
the world, who have so far been challenged by this obstacle of needing pots of money

Ejaaz:
to start their own AI company, now have access to a frontier,

Ejaaz:
world-renowned model and can create whatever application, website,

Ejaaz:
or product they want to make.

Ejaaz:
So I think that's part of the usage there as well. Do you have any takes on this?

Josh:
Yeah, and it's downstream of cost, right? We always see this: when a model is

Josh:
cheaper and mostly equivalent, the money will always flow to the cheaper model.

Josh:
It'll always get more queries. I think it's important to note the different

Josh:
use cases of these models. So they're not directly competing head to head on the same benchmarks.

Josh:
I think what we see is like when we talk about Claude, it's generally known as the coding model.

Josh:
And I don't think OpenAI's O3 is really competing directly with Claude,

Josh:
because it's more of a general intelligence versus a coding-specific intelligence.

Josh:
K2 is probably closer to a Claude, I would assume, where it's really good at

Josh:
coding because it uses this mixture of experts.

Josh:
And I think that helps it find the tools. It uses this cool, novel thing

Josh:
called multiple tool use.

Josh:
So each one of these experts can use a tool simultaneously and they could use

Josh:
these tools and work together to get better answers.

Josh:
So in the case of coding, this is a home run.

Josh:
Like it is very cheap cost per token, very high quality outputs.

Ejaaz:
I actually think it can compete with OpenAI's O3, Josh. Check this out.

Ejaaz:
So Rowan, yeah, Rowan Cheung put this out yesterday, and he basically goes,

Ejaaz:
I think we're at the tipping point for AI-generated writing.

Ejaaz:
It's been notoriously bad, but China's Kimi K2, an open-weight model,

Ejaaz:
is now topping creative writing benchmarks.

Ejaaz:
So just to put that into context, that's like having the top most, I don't know,

Ejaaz:
smartest or slightly autistic software engineer, at the top engineering company

Ejaaz:
working on AI models, also being the best poet or creative scriptwriter, directing

Ejaaz:
the next best movie or whatever that might be,

Ejaaz:
or creating a Harry Potter novel series.

Ejaaz:
This model can basically do both. And what it's pointing out here is that compared

Ejaaz:
to O3, it tops it. Look at this. Completely beats it.

Josh:
Okay, so I take that back. Maybe it is just better at everything.

Josh:
Yeah, that's some pretty impressive results.

Ejaaz:
I think like what's worth pointing out here is, and I don't know whether any

Ejaaz:
of the American AI models do this, Josh, but mixture of experts seems to be clearly a win here.

Ejaaz:
The ability to create an incredibly smart model doesn't come without,

Ejaaz:
you know, this large storage load that is needed, right? One trillion parameters.

Ejaaz:
But then combining it with the ability to be like, hey,

Ejaaz:
you don't need to query the entire thing.

Ejaaz:
We've got you. We have a smart router, which basically pulls on the best experts,

Ejaaz:
as you described earlier, for whatever relevant query you have.

Ejaaz:
So if you have a creative writing task or if you have a coding thing,

Ejaaz:
we'll send it to two different departments of this model.

Ejaaz:
That's a really huge win. Do any other American models use this?

Josh:
Well, the first thing that came to my mind when you said that is Grok 4,

Josh:
which doesn't exactly use this but uses a similar thing, where instead of using

Josh:
a mixture of experts, it uses a mixture of agents.

Josh:
So Grok 4 Heavy uses a bunch of distributed agents that are basically clones of the large model.

Josh:
But that takes up a tremendous amount of compute. And that is the $300 a month plan.

Ejaaz:
That's replicating Grok 4 though, right? So that's like taking the model and copy pasting it.

Ejaaz:
So let's say Grok 4 was one trillion parameters, just for ease of comparison.

Ejaaz:
That's like creating, if there was four agents, that's four trillion parameters,

Ejaaz:
right? So it's still pretty costly and inefficient.

Josh:
Is that what you're saying? No, it's actually the opposite direction from K2.

Josh:
So what they have used is, and again, this is kind of similar to tracking

Josh:
sentiment between the United States and China, where the United States will throw

Josh:
compute at it, where China will throw

Josh:
kind of clever resourcefulness at it. So Grok, yeah,

Josh:
when they use their mixture of agents, it actually just costs a lot more

Josh:
money, whereas K2, when they use their mixture of

Josh:
experts, well, it costs a lot less. Instead of using 4 trillion

Josh:
parameters, in this case it uses just 32 billion, and it

Josh:
kind of reuses that 32 billion over and over. It's a really

Josh:
elegant solution that seems to be

Josh:
yielding pretty comparable results. So I think as we

Josh:
see these efficiency upgrades, I'm sure they will

Josh:
eventually trickle down into the United States models, and when they do, that

Josh:
is going to be a huge unlock in terms of cost per token, in terms of the smaller

Josh:
distilled models that we're going to be able to run on our own computers.

Josh:
But yeah, I don't know of any that are also using it at this scale. It might be

Josh:
novel just to K2 right now.
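
A toy contrast of the two approaches Josh lays out: a mixture of agents runs several full copies of a model on the same prompt and aggregates the answers, multiplying the cost, while a mixture of experts activates a small slice of one model per query. The query_model function here is a placeholder, not a real API.

```python
import concurrent.futures

def query_model(prompt: str, seed: int) -> str:
    # Placeholder standing in for one full-model call; a real agent
    # would hit an API here, so n agents means roughly n times the cost.
    return f"answer-{seed % 2}"

def mixture_of_agents(prompt: str, n_agents: int = 4) -> str:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda s: query_model(prompt, s), range(n_agents)))
    # Aggregate by majority vote across the parallel agents.
    return max(set(answers), key=answers.count)

print(mixture_of_agents("What is 2 + 2?"))  # the answer most agents agreed on
```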

Ejaaz:
And I think that this is the method that probably scales the best, Josh.

Josh:
Yeah, it makes sense. Efficiency

Ejaaz:
always wins in the end, right? And to see this kind of innovation come pretty

Ejaaz:
early on in a technology's life cycle is just super impressive.

Ejaaz:
Another thing I saw is there's two different versions of this model, I believe.

Ejaaz:
There's something called Kimi K2 Base, which is basically the model for researchers

Ejaaz:
who want full control for fine-tuning and custom solutions, right?

Ejaaz:
So imagine this model as the entire parameter set. So you have access to one

Ejaaz:
trillion parameters, all the weight designs and everything.

Ejaaz:
And if you're a nerd that wants to nerd out, you can

Ejaaz:
go crazy, you know, if you have your own GPU

Ejaaz:
cluster at home, or if you happen to have a convenient

Ejaaz:
warehouse full of servers that you weirdly

Ejaaz:
have access to. You can go crazy with it. If you

Ejaaz:
think about the early gaming days of Counter-Strike, and how you could

Ejaaz:
mod it, you can basically mod this model to your heart's desire. And then

Ejaaz:
there's a second version called K2 Instruct, which is for drop-in general purpose

Ejaaz:
chat and AI agent experiences.

Ejaaz:
So this is kind of like at the consumer level, if you're experimenting with

Ejaaz:
these things, or if you want to run an experiment at home on a specific use

Ejaaz:
case, you can kind of like take that away and do that for yourself.

Ejaaz:
That's how I understand it, Josh. Do you have any takes on this?

Josh:
That makes sense. And I think that second version that you're describing is

Josh:
what's actually available publicly on their website, right?

Josh:
So if you go to Kimi.com, it has a text box. It looks just like the ChatGPT you're used to.

Josh:
And that's where you can run that second-tier model, which

Josh:
you described as the drop-in, general purpose

Josh:
chat. And then, yeah, for the hardcore researchers, there's

Josh:
a GitHub repo, and the GitHub repo has all the weights and all the code, and

Josh:
you can really download it, dive in, use the full thing. I

Josh:
was playing around with the Kimi tool, and it's really cool.

Josh:
It's fast. Oh, I mean, it's lightning fast. If you

Josh:
go from a reasoning model to an inference model like Kimi,

Josh:
you get responses like this. Like, when

Josh:
I'm using Grok 4 or O3, I'm sitting there sometimes for a couple minutes

Josh:
waiting for an answer. This, you type it in and it just types back right away,

Josh:
no time waiting. So it's kind of refreshing to see that, but it's also a

Josh:
testament to how impressive it is. I'm getting great answers and it's just spitting

Josh:
them right out. So what happens when they add the reasoning layer on top? Well, it's

Josh:
probably going to get pretty freaking good.
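
A hedged sketch of pulling the open weights yourself, assuming the Hugging Face repo id moonshotai/Kimi-K2-Instruct; check Moonshot's GitHub and Hugging Face pages for the real identifiers and hardware notes. A trillion-parameter MoE needs a serious multi-GPU rig, not a laptop.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "moonshotai/Kimi-K2-Instruct"  # assumed repo id; verify before use

# device_map="auto" shards the model across available GPUs (requires the
# accelerate package); torch_dtype="auto" takes the dtype from the checkpoint.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, device_map="auto", torch_dtype="auto"
)

inputs = tok("Explain mixture of experts in one sentence.", return_tensors="pt")
out = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```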

Ejaaz:
So the trend we're seeing, and we saw this last week with Grok4,

Ejaaz:
is typically we're expected to wait a while when we send a prompt to a breakthrough

Ejaaz:
model because it's thinking, it's trying to basically replicate what we have in our brains up here.

Ejaaz:
And now it's just getting much quicker and much smarter and much cheaper.

Ejaaz:
So long story short, these incredibly powerful models, I kind of think about

Ejaaz:
it as how we went from massive desktop computers to slick cell phones,

Ejaaz:
Josh, and then we're going to eventually have chips in our brain.

Ejaaz:
AI is just kind of like fast tracking that entire life cycle within like a couple

Ejaaz:
of years, which is just insane.

Josh:
And these efficiency improvements are really exciting because you can see how

Josh:
quickly they're shrinking and allowing eventually for those incredible models

Josh:
to just run on our phones.

Josh:
So there's totally a world a year from now in which a

Josh:
Grok 4, O3, or Kimi K2 capable model

Josh:
is small enough that it could just run inside our

Josh:
phone, and run on a mobile device, or run locally on a laptop,

Josh:
or you're offline, and you kind of have this portable intelligence

Josh:
that's available everywhere, anytime, even if

Josh:
you're not connected to the world. And that seems really cool.

Josh:
Like, we were talking a few episodes ago about Apple's local,

Josh:
free AI inference running on an iPhone,

Josh:
but how the base models still kind of suck. Like, they don't really do

Josh:
anything super interesting. They're basically good enough to do what

Josh:
you would expect Siri to do but can't do. And these

Josh:
models, as we get more and more breakthroughs like this that allow you to

Josh:
run much larger parameter counts

Josh:
on a much smaller device, it's going to start really

Josh:
superpowering these mobile devices. And I can't help but think about the OpenAI

Josh:
hardware device. I'm like, wow, that'd be super cool if you had O3

Josh:
running locally in the middle of the jungle somewhere with no service, and you

Josh:
still had access to all of its capabilities. That's probably coming downstream

Josh:
of breakthroughs like this, where we get really big efficiency unlocks.

Ejaaz:
I mean, it's not just efficiency, though, right? It's the fact that if you can

Ejaaz:
run it locally on your device, it can have access to all your private data without

Ejaaz:
exposing all of that to the model providers themselves, right?

Ejaaz:
So one of the major concerns of not just AI models, but also with mobile phones is privacy.

Ejaaz:
I don't want to share all my kind of like private health, financial,

Ejaaz:
and social media data, because then you're just going to have everything on

Ejaaz:
me and you're going to use me.

Ejaaz:
You're going to use me as a product, right? And that's kind of been the

Ejaaz:
status quo for the last decade in tech.

Ejaaz:
And so with AI, that's a supercharged version of it. The information gets more

Ejaaz:
personal. It's not just your likes.

Ejaaz:
It's, you know, where Josh shops every day and, you know, who he's dating and

Ejaaz:
all these kinds of things, right?

Ejaaz:
And that becomes quite personal and intrusive very quickly.

Ejaaz:
So the question then becomes, how can we have the magic of an AI model without it being so intrusive?

Ejaaz:
And that is open source, locally run AI, or privately run AI. And Kimi K2 is a

Ejaaz:
frontier model that can technically run on your local device if you set up the right hardware for it.

Ejaaz:
And the way that we're trending, you can basically end up having that on your

Ejaaz:
device, which is just a huge unlock.

Ejaaz:
And if you can imagine how you use OpenAI's O3 right now, Josh,

Ejaaz:
right? I know you use it as much as I do.

Ejaaz:
The reason why you and I use it so much isn't just because it's so smart,

Ejaaz:
but it's because it remembers everything about us.

Ejaaz:
But I hate that Sam knows or has access to all that data.

Ejaaz:
I hate that if he chooses to switch on personalized ads, which is currently

Ejaaz:
the model where most of these tech companies make money right now,

Ejaaz:
he can, and I've got nothing to do about it because I don't want to use any

Ejaaz:
other model apart from that.

Ejaaz:
But if there was a locally run

Ejaaz:
model that had access to all the memory and context, I'd use that instead.

Josh:
And this is suspicious. I mean, this is a different conversation in total,

Josh:
but isn't it interesting how other companies haven't really leaned into memory

Josh:
when it's seemingly the most important moat that there is?

Josh:
Like Grok 4 doesn't have good memory rolled out. Gemini doesn't really have memory.

Josh:
Claude doesn't have memory the way that OpenAI does.

Josh:
Yet it's the single biggest reason why we both continue to go back to ChatGPT and OpenAI.

Josh:
So that's just been an interesting thing. I mean, Kimi is open source.

Josh:
I wouldn't expect them to lean too much into it. But for these closed source

Josh:
models, that's just, it's another interesting just observation.

Josh:
Like, hey, the most important thing doesn't seem to be prioritized by

Josh:
other companies just yet.

Ejaaz:
Why do you think that is? So my theory, at least from xAI or Grok 4's

Ejaaz:
perspective, is Elon's like, okay, I'm not going to be able to build a better chat

Ejaaz:
bot or chat messenger than OpenAI has. There aren't too many features I can use to

Ejaaz:
set Grok 4 apart that O3 doesn't already do, right?

Ejaaz:
But where I can beat O3 is at the app layer.

Ejaaz:
I can create a better app store than they have, because they haven't really created

Ejaaz:
one that is sticky enough for users to continually use.

Ejaaz:
And I can use that data set to then unlock memory and context at that point, right?

Ejaaz:
So I just saw today that they released, they

Ejaaz:
being xAI, released a new feature for Grok 4

Ejaaz:
called, I think it's Companions, Josh,

Ejaaz:
and it's basically these animated,

Ejaaz:
avatar-like characters. They basically look like they're from an anime

Ejaaz:
show. And you know how you can use voice mode in OpenAI and you can kind of

Ejaaz:
talk to this realistic, human-sounding AI? You now have a face and a character

Ejaaz:
on Grok 4, and it's really entertaining, Josh.

Ejaaz:
Like I find myself kind of like engaged in this thing because I'm not just typing words.

Ejaaz:
It's not just this binary to and fro with this chat messenger.

Ejaaz:
It's this human, this cute, attractive human that I'm just like now speaking to.

Ejaaz:
And I think that that's the strategy that a lot of these AI companies,

Ejaaz:
if I had to guess, are taking to kind of like seed their user base before they

Ejaaz:
unlock memory. I don't know whether you have a take on that.

Josh:
Yeah, I have a fun little demo. I actually played around with it this morning

Josh:
and I was using it totally unhinged, no filter, very vulgar,

Josh:
but like kind of fun. It's like a fun little party trick.

Josh:
And yeah, I mean, that was a surprise to me this morning when I saw that rolled

Josh:
out. I was like, huh, that doesn't really seem like it makes sense.

Josh:
But I think they're just having fun with it.

Ejaaz:
Can we for a second talk about the team?

Ejaaz:
So we've mentioned just now how they've all come from China and how China's

Ejaaz:
like really advancing open source AI models, and they've completely beat out

Ejaaz:
the competition in America, Meta's Llama being the obvious one.

Ejaaz:
We've got Qwen from Alibaba.

Ejaaz:
We've got DeepSeek R1. Now we have Kimi K2. The team is basically...

Ejaaz:
The AI Avengers of China, Josh. So these three co-founders all have deep AI

Ejaaz:
ML backgrounds that hail from the top American universities,

Ejaaz:
such as Carnegie Mellon.

Ejaaz:
One of them has a PhD from Carnegie Mellon in machine learning,

Ejaaz:
which, for those of you who don't know, is like a God-tier degree for AI.

Ejaaz:
That means you're desirable and hireable by every other AI company after you graduate.

Ejaaz:
But it's not just that. They also have credibility and degrees from the top universities in China.

Ejaaz:
Especially this one university called Tsinghua, which seems to be at the top of the field.

Ejaaz:
I looked them up on rankings for AI universities globally, and they often come

Ejaaz:
in number three or four in the top 10 AI universities. So pretty impressive from there.

Ejaaz:
But what I found really interesting, Josh, was one of the co-founders was an

Ejaaz:
expert in training AI models on low-cost optimized hardware.

Ejaaz:
And the reason why I mentioned this is it's no secret that if you want a top

Ejaaz:
frontier AI model, you need to train it on NVIDIA's GPUs.

Ejaaz:
You need to train it on NVIDIA's hardware.

Ejaaz:
NVIDIA's market cap, I think, at the end of last week, surpassed $4 trillion.

Ejaaz:
That's $4 trillion with a T. That is more than the current GDP of the entire British economy.

Josh:
Where I hail from. And the largest in the world.

Ejaaz:
And there's never been.

Josh:
A bigger company

Ejaaz:
There's never been a bigger company. It's just

Ejaaz:
insane to wrap your head around, and it's not without

Ejaaz:
reason. They supply, basically, or they have a

Ejaaz:
grasp or a monopoly on, the hardware that

Ejaaz:
is needed to train top models. Now Kimi K2

Ejaaz:
comes along, casually drops a one trillion parameter model, one of the largest

Ejaaz:
models ever released, and it's trained on hardware that isn't NVIDIA's.

Ejaaz:
And Jensen Huang, I need to find this clip, Josh, but Jensen Huang basically

Ejaaz:
was on stage, I think it was at a private conference maybe yesterday,

Ejaaz:
and he was quoted as saying 50% of the top AI researchers are Chinese and are from China.

Ejaaz:
And what he was implicitly getting at is they're a real threat now.

Ejaaz:
I think for the last decade, we've kind of been like, ah, yeah,

Ejaaz:
China's just going to copy paste everything that comes out of America's tech sector.

Ejaaz:
But when it comes to AI, we've kind of maintained that same mindset up until

Ejaaz:
now, when really they're just competing with us.

Ejaaz:
And if they have the hardware, they have the ability to research new techniques

Ejaaz:
to train these models, like DeepSeek's reinforcement learning and reasoning,

Ejaaz:
and then Kimi K2's kind of like efficient training run, which you showed earlier.

Ejaaz:
They've come to play, Josh. And I think it's worth highlighting that China has

Ejaaz:
a very strong grasp on top AI researchers in the world and models that are coming out of it.

Josh:
Where are their $100 million offers? I haven't seen any of those coming through.

Josh:
None, dude. The most impressive thing is that they do it without the resources that we have.

Josh:
Imagine if they did have access to the clusters of these like H100s that NVIDIA is making.

Josh:
I mean, that would be, would they crush us?

Josh:
And we kind of have this timeline here where we're kind of running up against

Josh:
the edge of energy that we have available to us to train these massive models.

Josh:
Whereas China does not have that constraint. They have significantly more energy to power these.

Josh:
So in the event, the inevitable event that they do get the chips and they are

Josh:
able to train at the scale that we are, I'm not sure we're able to continue

Josh:
our rate of acceleration in terms of hardware manufacturing and

Josh:
large-scale training as fast as they will.

Josh:
And they already have done the hard work on the software efficiency side.

Josh:
They've cranked out every single efficiency because they are doing it on constrained hardware.

Josh:
So it's going to create this really interesting effect where they're coming

Josh:
at it from the ingenuity, software approach, and we're coming at it from the

Josh:
brute force, throw-a-lot-of-compute-at-it approach, and we'll see where both

Josh:
sides end up. But it's clear that China is still behind, because they are the

Josh:
ones open sourcing the models, and we know at this point, if you're open sourcing

Josh:
your model, you're doing it because you're behind.

Ejaaz:
Yeah. I mean, one thing

Ejaaz:
that did surprise me, Josh, was that they released a one

Ejaaz:
trillion parameter open source model. I didn't

Ejaaz:
expect them to catch up that quickly. Like, one

Ejaaz:
trillion is a lot. Another thing

Ejaaz:
I was thinking about is, China has dominated

Ejaaz:
hardware for so long now, so it wouldn't

Ejaaz:
really surprise me if, I don't know, a

Ejaaz:
couple years from now, they're producing better models

Ejaaz:
at specific things, basically because they have better

Ejaaz:
hardware than America, than the West. But

Ejaaz:
where I think the West will continue to dominate

Ejaaz:
is at the application layer. And I don't

Ejaaz:
know, if I was a betting man, I would say that most of the money is eventually going

Ejaaz:
to be made on the application side of things. I think Grok

Ejaaz:
4 is starting to kind of show that

Ejaaz:
with all these different kinds of novel features that they're releasing. I

Ejaaz:
don't know if you've seen some of the games that are being produced with Grok

Ejaaz:
4, Josh, but it is ultimately insane. And I haven't seen any similar examples come

Ejaaz:
out of Asia from any of their AI models, even when they have access to American

Ejaaz:
models. So I still think America dominates at the app layer.

Ejaaz:
But Josh, I just came across this tweet, which you reminded me of earlier.

Ejaaz:
Tell me about OpenAI's open source model strategy, because I've got this tweet

Ejaaz:
pulled up from Sam Altman, which is kind of hilarious.

Josh:
Yeah. All right. So this week, if you remember from our episode last week,

Josh:
we were excited about talking about OpenAI's new open source model.

Josh:
OpenAI, open source model, all checks out. This was going to be the big week.

Josh:
They were going to release their new flagship open source model. Well, conveniently,

Josh:
I think the same day as K2 launched, later in the day, or perhaps the very next morning,

Josh:
Sam Altman posted a tweet. He says, Hey, we plan to launch our open weights model next week.

Josh:
We are delaying it. We need time to run additional safety tests and review high-risk

Josh:
areas. We are not yet sure how long it will take us.

Josh:
While we trust the community will build great things with this model,

Josh:
once weights are out, they can't be pulled back. This is new for us and we want to get it right.

Josh:
Sorry to be the bearer of bad news. We are working super hard.

Josh:
So there's a few points of speculation. The first, obviously,

Josh:
being, did you just get your ass handed to you and now you are going back to

Josh:
reevaluate before you push out your model?

Josh:
So that's one possible thing where they saw K2. They were like,

Josh:
oh, boy, this is pretty sweet.

Josh:
This is our first open source model. We probably don't want to come in below them.

Josh:
And there is this second point of speculation, which, Ejaaz, you mentioned to

Josh:
me a little earlier today, where maybe something went wrong with the training run.

Josh:
And it's not quite that they're getting beat up by a Chinese company;

Josh:
it's that they actually made a mistake of their own accord. Can you explain

Josh:
to me specifically what that might be, or what the speculation is, at least?

Ejaaz:
Well, I'll keep it short. I think it was a little racist under

Ejaaz:
the hood. And I can't find the tweet, but basically

Ejaaz:
one of these AI researchers slash

Ejaaz:
product builders on X got access to

Ejaaz:
the model, supposedly, according to him, and he tested it

Ejaaz:
out in the background. And he said, yeah, it's

Ejaaz:
not really an intelligence thing, it's just worse than

Ejaaz:
what you'd expect from an alignment and consumer-facing approach. It

Ejaaz:
was ill-mannered. It was saying some pretty wild shit, kind of the stuff that

Ejaaz:
you'd expect coming out of 4chan. And so Sam Altman decided to delay whilst

Ejaaz:
they figured out why it was acting out.

Josh:
Got it. Okay, so we'll leave

Josh:
that speculation where it is. There's a funny post

Josh:
that I'll actually share with you, if you want to throw it up, which was actually from Elon.

Josh:
And we'll abbreviate, but Elon was basically saying

Josh:
it's hard to avoid both the libtard and the

Josh:
MechaHitler-like approaches,

Josh:
because they're on such polar opposite ends of the spectrum. And he said he spent

Josh:
several hours trying to solve this problem with the system prompt, but there's

Josh:
too much garbage coming in at the foundation model level. So basically, I mean,

Josh:
what happens with these models is you train them based on all the human knowledge

Josh:
that exists, right? So everything that we've believed, all the ideas that we've

Josh:
shared, it's been fed into these models.

Josh:
And what happens is you can try to adjust how they interpret this data through

Josh:
the system prompt, which is basically an instruction that every single query

Josh:
gets passed through, but at some point the model is still reliant on this swath of human data, and

Josh:
it's too overbearing. And that's kind of what Elon shared.

Josh:
And the difference between OpenAI and Grok is that Grok will just ship the crazy

Josh:
update. And that's what they did. And they caught a lot of backlash from it.

Josh:
But what I find interesting and what I'm sure OpenAI will probably follow is

Josh:
this last paragraph where he says, our V7 foundation model should be much better.

Josh:
And we're being far more selective about training data rather than just training on the entire internet.

Josh:
So what they're planning to do to solve this problem, which is what I assume

Josh:
OpenAI probably ran into in the case that the AI training model kind of went

Josh:
off the rails and it started saying bad things about lots of people is that

Josh:
you kind of have to rebuild the foundation model with new sets of data.

Josh:
And in the case of Grok, I know one of the intentions for v7 is actually to

Josh:
generate its own dataset of synthetic data from their models.

Josh:
And I'm assuming OpenAI will probably have to do this too if they want to recalibrate.

Josh:
A lot of times people refer to the temperature, which is the amount of variance

Josh:
or randomness a model uses when it picks its outputs.

Josh:
And I don't know, I think we're gonna start to see interesting approaches from

Josh:
that because as they get smarter, you really don't want them to necessarily

Josh:
have these evil traits as the default.

Josh:
And it's very hard to get around that when you train them on the data that they've been trained on so far.
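
A minimal sketch of what sampling temperature actually does: logits are divided by T before the softmax, so a low T sharpens the distribution toward the safest token and a high T flattens it toward wilder picks. The numbers are illustrative.

```python
import numpy as np

def token_probs(logits, temperature):
    # Dividing logits by T before the softmax is the whole trick.
    z = np.array(logits) / temperature
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

logits = [2.0, 1.0, 0.1]             # illustrative scores for three tokens
for t in (0.5, 1.0, 2.0):
    print(t, np.round(token_probs(logits, t), 3))
# low T concentrates probability on the top token; high T spreads it out
```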

Ejaaz:
It just goes to show how, I guess, cumbersome it is to train these models,

Ejaaz:
Josh. It's such a hard thing.

Josh:
Yeah. Yeah.

Ejaaz:
It's not something that you can just kind of like jump into the code and tweak a few things.

Ejaaz:
Most of the time you don't know what's wrong with the model or where it went

Ejaaz:
wrong. I mean, we've talked about this on a previous episode, but

Ejaaz:
So essentially, if you build out this model, right, you spend hundreds of millions

Ejaaz:
of dollars, and then you feed it a query.

Ejaaz:
So you put something in and then you wait to see what it spits out.

Ejaaz:
You don't really know what it's going to spit out. You can't predict it.

Ejaaz:
It's completely probabilistic. And so if you

Ejaaz:
release a model and it starts being a little racist or, you

Ejaaz:
know, kind of crazy, you

Ejaaz:
have to go back to the drawing board, and you have

Ejaaz:
to analyze many different sectors of this model.

Ejaaz:
Like, was it the data that was poisoned, or was it the way that we trained it,

Ejaaz:
or maybe it was a particular model weight that we tweaked too much, or whatever

Ejaaz:
that might be. So I think over time it's going to get a lot easier, once we

Ejaaz:
understand how these models actually work, but my God, it must be so expensive

Ejaaz:
to just continually rerun and retrain these models.

Josh:
Yeah, when you think about a coherent cluster of 200,000

Josh:
GPUs, the amount of energy, the amount

Josh:
of resources just to retrain a mistake is huge. So I think, I mean, the more

Josh:
we go into it, the deeper we get, the more it kind of makes sense paying so much

Josh:
money for talent to avoid these mistakes. If you pay a hundred million

Josh:
dollars for one employee who will give you a strategic advantage, to avoid having

Josh:
to do another training run that would cost you more than $100 million,

Josh:
you're already in the profit. So you kind of start to see the

Josh:
scale, the complexity, the difficulties.

Josh:
I do not envy the challenges that some of these engineers have to face.

Josh:
Although I do envy the- I envy the salary.

Ejaaz:
I envy the salary, Josh.

Josh:
I envy the salary and I envy the adventure. Like how cool must that be trying

Josh:
to build super intelligence for the world as a human for the first time in like

Josh:
the history of everything.

Josh:
So it's gotta be pretty fun. This is where we're at now with the open source

Josh:
models and closed source models. K2's pretty epic. I think that's a home run. I think

Josh:
we've crowned a new model today. Do you have any closing thoughts, anything

Josh:
you want to add before we wrap up here? This is pretty amazing.

Ejaaz:
I think I'm most excited for the episode that we're probably going to release

Ejaaz:
a week from now, Josh, when we've seen what people have built with this open source

Ejaaz:
model. That's the best part about this, by the way. Just to remind the listener:

Ejaaz:
anyone can take this model right now. You, if you're listening to this, can take

Ejaaz:
this model right now, run it locally at home, and tweak it to your preference.

Ejaaz:
Now, yes, you kind of need to know how to tweak model

Ejaaz:
weights and stuff, but I think we're going to see some really cool applications

Ejaaz:
get released over the next week, and I'm excited to play around with them personally.

Josh:
Yeah, if you're listening to this and you can

Josh:
run this model, let us know, because that means you have quite a solid

Josh:
rig at your home. Yeah, I'm not sure the average person is

Josh:
going to be able to run this, but that is the beauty of the open weights: anybody

Josh:
with the capability of running this can do so. They

Josh:
can tweak it how they like, and now they have access to the new

Josh:
best open source model in the world, which, I mean, just a

Josh:
couple months ago would have been the best model in the

Josh:
world. So it's moving really quickly, it's really accessible, and

Josh:
I'm sure as the weeks go by, I mean, hopefully we'll get OpenAI's open

Josh:
source model soon, in the next few weeks, and we'll be able to cover that. But until

Josh:
then, just lots of stuff going on. This was another great episode, so thank

Josh:
you everyone for tuning in again, for rocking with us. We actually planned on making this like 20 minutes,

Josh:
but we just kind of kept trailing off into more interesting things.

Josh:
There's a lot of interesting stuff to talk about. I mean, there's really,

Josh:
you could take this in a lot of places.

Josh:
So hopefully this was interesting.

Josh:
Go check out Kimi K2. It's really, really impressive. It's really fast.

Josh:
It's really cheap. If you're a developer, give it a try.

Josh:
And yeah, that's been another episode. We'll be back again later this week with

Josh:
another topic, and we'll just keep on chugging along as the frontier of AI models continues to head west.

Ejaaz:
So also we'd love to hear from you guys. So if you have any suggestions on things

Ejaaz:
that you want us to talk more about, or maybe there's like some weird model

Ejaaz:
or feature that you just don't understand and maybe we can do a good job of explaining it, just message us.

Ejaaz:
Our DMs are open or respond to any of our tweets and we'll be happy to oblige.

Josh:
Yeah, let us know. If there's anything cool that we're missing,

Josh:
send it our way and we'll cover it. That'd be great.

Josh:
But yeah, we're all going on the journeys together. We're learning this as we go.

Josh:
So hopefully today was interesting. And if you did enjoy it,

Josh:
please share with friends, likes, comment, subscribe, all the great things.

Josh:
And we will see you on the next episode.

Ejaaz:
Thanks for watching. See you guys. See you.
