The GPT-5 Reveal Nobody Expected (Not in a Good Way)
Ejaaz:
[0:03] This is the first major model release in two and a half years from OpenAI. GPT-5 is now live and, in the words of Sam Altman, you now have a PhD-level team of experts in your pocket that can write, create, build, code, whatever you could think of. Josh and I literally just finished watching the live stream. In fact, I still think it's going on right now in the background of the screen.
Ejaaz:
[0:26] Josh, give me a take.
Josh:
[0:27] This was a huge day. This felt like the Super Bowl of AI. This was the first numbered model. I mean, GPT-4 came out two and a half years ago. GPT-5 is here today. We just got off the live stream. It's a mixed bag of sentiment. We're going to get into how we feel about it. But I want to just dive into everything
Josh:
[0:42] you need to know about GPT-5. So, Ejaaz, do you want to walk us through exactly the announcements, what they did, what's exciting, the cool new features? Let's just hear exactly what they announced today.
Ejaaz:
[0:51] Okay, so here's that quick high-level 30-second take. There are three new models. Everyone now has access to GPT-5 in some way, shape or form. You've got GPT-5, the main model, which is really deep in reasoning, learning, coding, all the things we just mentioned. You have a GPT-5 mini model and a GPT-5 nano model. We've got the official website pulled up, showing what the main differences are between these models. But basically, some are more expensive than the others, and the more expensive ones give you smarter results. But there's one key difference this time, Josh. Typically in the past, you've had to choose between the smart model, where you have to wait a couple of minutes for it to give you a good answer, and the other models, which give you quicker answers, like GPT-4o, which most people use these days. That is now gone. You can just use one singular model that answers super quickly and really smartly.
Josh:
[1:42] So this is a stark difference from what people are used to, right? Normally when you use ChatGPT, you have a dropdown menu. It has a bunch of different options of models you could choose from. Now there's just going to be one. It's going to be GPT-5. And what's interesting is a lot of people who are listening to this, who might not have a Plus plan or a Pro plan, well, they're actually going to be going from GPT-4o to GPT-5, which is going to feel like a huge step change in quality. So I think for everyone who's listening to this, anyone who's using it, whether you're a free user or a paid user, you are going to notice a big difference in the quality of your responses based on this new model change. And there's no more confusion. There's just one model that does all the thinking for you. It will determine how much reasoning is needed, how much thinking is needed. You don't have to prompt it with any specific thing. It'll just decide exactly how much time is needed to give you the best answer. And that's pretty cool.
Ejaaz:
[2:27] And they also teased a few other features that are going to be releasing soon. One was AI personalities, which means that with ChatGPT, you can choose whether it has more sarcasm or whether it sounds more professional or whether it sounds more entertaining. This is kind of similar to the Grok AI companions that we've seen come out of xAI. So I'm really excited to see OpenAI's version of that. They're also releasing an advanced voice mode, Josh, which you and I both love. We speak to the voice mode of ChatGPT so much. And to your earlier point, this is now accessible to everyone. Previously, you kind of needed to be on some kind of a premium or maybe even pro
Ejaaz:
[3:01] plan to get access to this. Now everyone has access. But Josh, like we can talk about these benchmarks, we can talk about these kinds of features all day, but the live demos are where all the fun's at, right? And they had quite a few demos to show us. What were your thoughts on that?
Josh:
[3:16] The demos were amazing. I was very impressed, which is a contrast to how I felt about the rest of the presentation. But to start with the demos, I thought they were very good. So the first demo they gave was on the Bernoulli effect, which I mean, I believe it's what happens when wind kind of goes over a plane, it allows planes to fly and it tells you when they're going to crash. And it was a really simple prompt that asked to explain the Bernoulli effect, but then it created this entire visual interface on top of it that was fully interactive, very engaging, looks very pretty. You could see it here on the screen share. Basically, there are these toggles you can change. There's a 3D visual element. It's interactive. I thought that was really neat. I think the canvas has been kind of an underutilized part of the ChatGPT experience and allowing you to prompt it to now create these virtual environments, these virtual worlds. That was pretty cool. The second demo, there was a demo where they created a web app in French. And basically, it was a way for this guy to teach his girlfriend how to speak French. And what he did is he said, hey, can you make me some flashcards? And can you make me a cheat sheet and can you make me a game that has a little mouse and it has the mouse kind of follow around this path. And it's kind of like the snake game, but it's with a mouse and cheese. It was cute. It looked great. The actual interface that it generated looked amazing. The quality of the output was really good. What I found interesting is the multi-modality aspect of it. So he asked it to create a game. Not only did it create the flashcards, the game, it created a full dashboard with your progress and it spoke to you. So it had the voice. It can speak actual French out loud.
Josh:
[4:42] It allows you to create these one-of-one experiences. You could build a game. You could build an app. You can build whatever you'd like in one prompt. I found that to be really interesting and really exciting.
Josh:
[4:52] Those are the demos that I thought were interesting. After that, it got much less interesting. Did you have any takes on demos or anything else about the presentation?
Ejaaz:
[5:01] Okay, I'll preface this with the positive news first. I, like you, thought that some of those demos were pretty cool. Actually, I think the Bernoulli effect demo that you just described, she created in two minutes, 400 lines of code, which is honestly a pretty impressive feat, but also an ambitious thing to do live in front of millions of people who are watching on the live stream. They generally recommend not to do live coding demos, let alone live demos at all. And to have the confidence to do that was pretty cool. But that's when things
Ejaaz:
[5:34] started going downhill for me, Josh. They started to demonstrate its creative writing ability, right?
Ejaaz:
[5:40] And think about this, right? You want to demonstrate how good your model is at creating prose. So you might want to create something fantastical or creative. Hint: it's in the term itself. But they decided to write a eulogy, which was the first of many weird things that they tried. So they wrote this eulogy for the old ChatGPT model, and they thought it might be a fun experiment to compare which model does better. So they had GPT-4, which is the current model that exists, write a eulogy for itself, I guess, and then they had GPT-5 write a eulogy for that older version of the model. And they were pointing out how GPT-5 sounds way more intuitive. It made people in the audience laugh, etc.
Ejaaz:
[6:24] But all in all, I thought it was kind of a weird example. And the reason why I say that, the reason why I'm so critical, is this is meant to be such a magical moment. Two and a half years in the waiting. GPT-5. OpenAI was the one to lead frontier AI models in the first place. I want it to be a magical experience, not really talk about the death of whatever in general. And then the other demos, Josh, I think I kind of tuned out when it came to the coding stuff. Not necessarily because that's not my forte, but because if you're pitching an example of coding... I think they were coding up a bunch of different... they were demonstrating pair programming, that's it. Pair programming in software development is a really important thing, because you kind of want multiple people working on the same thing at the same time so that it leaves you to focus on other things. They didn't really do it in an intuitive way. And honestly, it wasn't that different from offerings that Claude Code or Anthropic already has right now. They had this crazy benchmark. Josh, did you see this? It was the biggest chart crime that I've ever seen. Let me show you this.
Josh:
[7:25] Was a good one. Do you have the link to pull up?
Ejaaz:
[7:27] Yeah, yep. Look at this.
Josh:
[7:30] Notice any differences? That's pretty good. So it's interesting, because the 30 and the 69 are the same height, and then the 52 is taller than the 69, and the 74 is much taller than the 60, and the proportions are a little out of whack. And this feels a little dishonest. I was a bit confused. And this kind of happened throughout a few charts in this presentation, where they weren't really accurately... I mean, granted, I believe I'm looking at this right, this doesn't appear to accurately represent progress. So it seemed like there was this trend of incorrect charts. And then Sam actually publicly apologized on X. But then we saw it in a different slide, where the charts again didn't quite match up, in favor of making it look like there was more progress than there really was. So yeah, questionable charting. That was one of the things. In terms of features, there were some fun features. I mean, one thing that I got
Josh:
[8:19] Excited about, it's dumb but it's cool, is you can now change the color of your chats. So now you could have, like, blue chats. I like the customization stuff, because as a user who sits there all day, I want it to look a little bit pretty. I found the companions, the personalities, pretty interesting. Just like Grok. We talked about this on Grok literally earlier this week. They're starting to roll that out now in ChatGPT, where you can have an assistant, you can have a jokester, you could have a storyteller. You could kind of customize the sentiment of the chatbot that you're talking to. And then they had a lot of other use cases they were demoing. They really had a big moment on health. I think the health section was interesting. They spoke about someone who was suffering from cancer and needed to diagnose her symptoms, and she used the help of ChatGPT to guide her through and give her agency through this process that would otherwise be really difficult to navigate if you don't understand the health data. It was a little bit of a sad and depressing version of this. And I think that was mostly the sentiment throughout the presentation. Like you said, this is this magical moment of this superhuman artificial intelligence, and a lot of it was kind of veiled in this eulogy and cancer and kind of this grayscale presentation, and very minimalist.
Josh:
[9:27] It was too serious. Very low energy. And yeah, I think the presentation of these things matters. And it's funny seeing Sam stand on stage. He's kind of doing the hands, like this Apple pose that we see in a lot of these presentations. And as far as the actual presentation goes, those were most of the noteworthy things. I mean, if you are listening to this, you should be excited for a few reasons.
Josh:
[9:44] One, GPT-5 will be available for everyone. Even if you're a free user, you get access to it. It will be smarter. It will do things better. It will allow you to create more cool content. It will basically deprecate the need to be a coder. I mean, this trend feels very obvious now: the coding capabilities of each one of these models are improving so quickly that it seems improbable that a few years from now, anyone will really need to write code. It just, I want personal takes, personal opinions. We both just watched this. We both have our own ideas. What did you walk away from this presentation thinking? Were you mind blown? Were you a little disappointed? Like, where do you stand on the spectrum here? Eh, that's...
Ejaaz:
[10:20] That's kind of how I feel, if I'm being honest. All right. Okay. I accept that it's blown a bunch of benchmarks out of the water. In fact, not even by that much. I think it's like two percentage points better than Claude Code, which is still, don't get me wrong, as you point out, amazing. But if you're going to come out with a new frontier model and you're trying to win this entire AI race, I think you need to kind of maybe do better or have a better example. One thing I did like about the coding stuff, actually, now that I think about it, is in the demo itself, the guy spun up multiple tabs of ChatGPT and ran the same prompt, right? And he explained his reasoning behind that, which was like, I just want to see multiple versions and then pick my favorite, right? Another thing that I liked was Christina Kaplan, who is the head of memory, the memory feature in OpenAI, who we're actually getting on in a few weeks on the show to interview. She said that memory and context travel from your older model conversations as well as across all the new GPT-5 conversations. So that's amazing, right? So if you, in that example, spun out a bunch of different tabs, it'll have context from all the other tabs that you're running on. So, okay, that's amazing. But aside from that, Josh, I'm not that enthused. Like I said, this is the first OpenAI live stream where I kind of exited out of it after like 35 minutes. Sam should have just kept it at 30 minutes. This is the longest live stream he's ever done. And I are you doing
Josh:
[11:43] This? Yeah, it's funny. He just sent me a message a few minutes before the live stream even ended, and he was like, I'm done. And I was like, yeah, you know what, I'm done here too. It just got a little boring for my liking. There's this dark undertone to it, and a lot of the outputs didn't match the outputs that I was hoping for. You expect this model to not only be smarter, but you expect it to come with more features that make it exciting. And they did this a little bit with the canvas improvements, where you can actually generate code and create these UIs. But even the point that you liked, Ejaaz, where you said he opened up multiple windows and gave it the same prompt and chose his favorite, that feels like a responsibility I shouldn't have to have. I want a mixture of agents. I want you to tell me, and you to decide which one, and then I'll tell you how I want it changed. But even just their own demos of opening up five tabs, they're like, I didn't really like that one, this one's okay, I didn't love this one. I'm like, that's my job, I'm supposed to do this? Like, I use Grok Heavy, and Grok Heavy has this mixture of agents. It's got 10 of these bad boys doing it all at once, and it's giving me one great answer. And as a user, that's what I want. I want models that will improve the user experience, not kind of make me jump through hoops to optimize the way that I use it. And also, em dashes are still there.
Josh:
[13:00] So, I mean, generally speaking, I think sentiment matters a lot. When you see, I mean, I'm going to compare this to Grok because xAI had a recent announcement with Grok 4. They're very excited about the future of humanity, the future of intelligence.
Josh:
[13:13] They're very excited about this truth-seeking AI with this really grand mission statement. And it's very optimistic. It's very enthusiastic. It's very much driven in a way that feels exciting. Whereas ChatGPT is like, oh guys, this model is very safe, this model is not going to lie to you. And they actually had an entire demo about... what was the way that they called it? They had a specific word for this. Deception is what they called it. They had an entire deception category where, don't worry guys, the model will no longer try to deceive you, it's much better now. And it feels like they're doing two different approaches, and one of them is very defensive and the other one is very inspirational. And it just left me kind of feeling like, okay, well, I'm still going to use ChatGPT all the time. I'm glad I don't have to pick any models now. It'll just do it all for me. But this doesn't really change much for me. And it actually makes me feel a little more excited about other companies who are in the space, who are progressing even faster, seemingly, than OpenAI. Because when I think about Google, when I think about xAI, they're both doing really cool things that seem really impressive. I mean, xAI, they had the Genie 3 release, or sorry, Gemini had the Genie 3 release earlier this week. And that, to me, blew my mind far more than this.
Josh:
[14:26] This was a disappointment. This was a bummer in a lot of ways. This was not the Super Bowl that I had hoped for. I wore this really nice white shirt. I was ready to go and they let me down. So that was a little depressing. But I do want to get into social commentary, because the good news is we're not alone in this sentiment. For a lot of people, it also left a lot to be desired. So do we have any posts that we could share of people who... oh, actually, here's one right here. So for the people listening, we're looking at a chart, and it is the ARC-AGI-2 leaderboard, which is generally how we measure the closeness to AGI. Whatever you think that threshold may be, this is a single metric that people use to compare. And what we're seeing here, actually, is Grok 4 is far ahead of GPT-5, which doesn't seem right. Grok 4 has already been out for a couple of weeks, and yet it is much, much more powerful. Granted, it's a little more expensive than GPT-5, but that is a significant improvement over GPT-5. So if we're just comparing benchmarks in terms of AGI, Grok is winning. Oh, and here's another post from our good friend, Beff Jezos. He was on Bankless a little while ago. And he had a comparison of GPT-5's benchmarks on Humanity's Last Exam to the Grok benchmarks. And again, it missed. And Grok is actually superior in a lot of these benchmarks.
Josh:
[15:38] So it leaves a lot to be desired and a lot to be reconsidered. I think somewhere in here, Ejaaz, is a post from Polymarket. It's a picture from Polymarket, which I really, really adore. This chart right here. And it says, which company has the best AI model by the end of August? And OpenAI has been the favorite. And they've been pinned at, what was that? That was like 90% all month. And people really saw it.
Ejaaz:
[16:02] Just under 80.
Josh:
[16:03] Just under 80. They were pinned at 80% all month. And then that was in anticipation, obviously, of GPT-5. And as soon as this presentation happens, they are now down to sub 20% with Google actually taking the first place at over 80%. So that type of shift, that is a public market sentiment shift. That is a big shift to go from 80% to 20% over one presentation. It means a lot. And I think what's fun for us is now we get to reconsider the leaderboards of who is who is going to be the leader of AI and how they're going to be the leader of AI. If anything, this probably complicates it more because I mean, OpenAI still has the most users. They're still incredible. This was by no means a flop. And I don't want to make people think like this sucks. This is an incredibly smart model that is incredibly capable that I will be using literally every single day personally. So it's it's great. It's just not I mean, the stakes are so high,
Josh:
[16:53] we're moving so quickly, there was a lot left to be desired. What do you got? Give me something. Agree, disagree? Any more sentiment to share with the public?
Ejaaz:
[17:04] Yeah, I mean, I'm not going to try and debate you, because I agree with you on this one, right? Like, one word to describe it is underwhelming. It is still great. Like you said, I'm not going to use another model right now, because it still has all my memory, which is the most important thing. And the features are technically, I guess, the best, but I'm not really seeing it in practice just yet. There is no kind of magical component. The point around Grok 4 is simply that it performs better than GPT-5 on a number of different tasks. And the point that's being made in this tweet is that the tasks that were demoed by OpenAI just now in the live stream were selectively chosen, basically cherry-picked. And so it just goes to show that you can't get this past everyone else that's on social media. It is like the vessel of advertisement, and people could sense the inauthenticity, the kind of morgue-like effect when they were writing eulogies, that this wasn't really a magical Apple moment. This was just kind of a nothing burger, dare I say, right? I also saw this tweet,
Josh:
[18:06] Josh. Oh, this is great.
Ejaaz:
[18:08] Okay. So I was just going to bring this up. This is so good. To give some context here, Josh just described earlier on in the episode that their first demo straight out of the gate was this lady demonstrating how GPT-5 could write code to illustrate something called the Bernoulli effect. And this is, you know, a complicated physics effect, and it's good to see graphically, visually, the wind dynamics and all that kind of stuff. And someone screenshotted the answer that it gave describing the Bernoulli effect and cross-checked that with how the Bernoulli effect actually works. And it was hallucinating. And the reason why this is so funny is because on the live stream, they spent 10 minutes assuring everyone that this is the model least likely to hallucinate. And in fact, they showed a bunch of charts and rates showing that it hallucinated the least. So I just found this pretty funny, that you had this live stream showing all these supposedly cool things, and then you had this kind of mirror-like vertical on social media where everyone was unpacking and exposing the flaws in the demos themselves. So it was a mixed bag. Like you said earlier, Josh, they're pursuing two strategies. One, this inspirational thing. The other, like, hey, but we're also super safe and aligned with humanity, don't worry. Right? And they should have just
Josh:
[19:28] Leaned heavily on one and forgotten.
Ejaaz:
[19:31] About the other
Josh:
[19:31] Yeah. Listen, models are going to hallucinate. That's okay. But to spend 10 minutes on it and then to show an example of a hallucination live on stream, it's like, okay, well, you've got to pick your battles. I don't think those were the winning battles to pick. But I think a lot of this goes down to the authenticity you mentioned. I think it is a really big deal. It's kind of how you carry yourself as a company and as a culture when you're delivering this. And when we compare this to Google, Google feels very aligned. Like, I can kind of define Google as a personality, as a company. I could almost imagine the decisions they're going to make around certain issues before they even happen. The same is true with Grok, where I understand. And a reason why I really enjoy Grok is because it's very unfiltered. When I ask for an unhinged mode, it will actually do that. And it will be direct. And with OpenAI, it feels like it's trying to satisfy this subset of people that doesn't really feel authentic to the mission. So you're getting these mixed results, where on one end Sam is this feisty, aggressive guy who's building AGI for the world, and then on the other end it's like, hey, we're not going to lie to you, we're being really careful about this, we don't want to hurt anybody. And there's just something a little off about it. Not sure if I can put my finger on it, but yeah, it's just
Ejaaz:
[20:49] A little a
Josh:
[20:49] Little off. Which isn't to say, again... the model's great, this is awesome. It's just not quite as awesome as I hoped. And I think that's probably the overall vibe of this release: hey, this was going to be the Super Bowl, but it turned out it was just another model release. And in fact, it was a regular bowl. It was
Josh:
[21:06] a regular bowl it was just like.
Ejaaz:
[21:07] It was a Sunday
Josh:
[21:09] Of football, and that is it. Nothing special here. And in fact, like, it's college
Ejaaz:
[21:12] Level football. Yeah.
Josh:
[21:13] And it's a shame that on the week of GPT-5, the announcement that most excited me was Google's Genie 3. It's this virtual world builder that can create these virtual worlds. You can walk through them. They're dynamic. They remember where you are. And we'll probably have to do an episode covering that, because that was fascinating. Well.
Ejaaz:
[21:27] Actually, Josh, let me interrupt you there. They actually integrated Google Calendar and Gmail,
Josh:
[21:33] Which sounds like... this is huge. Yeah, I forgot this. Yeah, okay.
Ejaaz:
[21:37] So it kind of sounds like a nothing burger, but it's actually pretty cool and useful to me.
Josh:
[21:42] No, this is incredible. Let's talk about it.
Ejaaz:
[21:44] Yeah, this was in the section on their memory update. Again, I mentioned Christina Kaplan. She's coming on the show soon. We're going to interview her. We're excited about that. But she explained that for a while, ChatGPT's memory has been limited to ChatGPT itself. And she said she found that pretty frustrating in her day-to-day, when she's doing other things that weren't related to ChatGPT and her conversations with it. So she announced these two major features, where GPT-5 integrates directly with your Gmail inbox and your Google Calendar. And she went through this demo live on stream where she asked it, hey, I need to train for the marathon that I'm running in four weeks. Can you figure out when is best for me to train? Maybe give me an advisable routine and diet to go with it, and anything else I might need to be aware of. And it gave this really concise, structured output, Josh. Here's what I loved about it. It didn't just figure out the best place for me to run and the best timing, and book it in my slot so that I could see it on my calendar. It also wrote at the end, by the way, there are these two unread emails, which you should probably get to and answer before you go on your run. And I love that. That was intuitive. That was human. That felt magical to me. And I know it sounds lame talking about email and calendar, but I know most of you listeners spend a lot of your time day-to-day, Monday to Friday, doing exactly that. So if there is a tool that comes in that feels human, that can save me a bunch of time and make my life easier, I'm all game for it. Josh,
Josh:
[23:09] Thoughts? This is how you know we're doing this immediately after the live stream. We haven't even prepped, because that, in hindsight, was one of my favorite features that they announced. Yeah, it's incredible. And of course, I mean, not to brag or anything, but the coolest feature was delivered by our future guest, who's coming on in a couple of weeks, so stay tuned for that. But the integration is amazing, because we recently had Aravind on, the CEO of Perplexity, and I think a lot of my problem with his bull case on the browser is that I don't really want the browser form factor. Like, I don't want to have to engage with the browser. I don't even want it to be there. I want to tell my agent what I want it to do and it'll go off and do it. And OpenAI has that. They have the agentic feature. And what they just added today is integration into two of my most used applications that require an agent. And that is my email and that is my calendar. And those two things kind of run my life, right? It's the way I interact with all the people for work. It's the way that I schedule all the things in my life. And now OpenAI has access to that and it integrates it into my entire workflow. And that to me is incredibly powerful, because there's been nothing so far that's been able to manage my day-to-day life, because it doesn't have the context of what I'm doing. And now that it has my email context and my calendar context, that is a very huge unlock.
Josh:
[24:20] So I'm excited to try this out probably the most out of anything. I can't believe we failed to mention this. That's going to be really cool. And when you think about it and you think about the ways that you engage with the web, I mean, most of it is through, we spend a lot of time on Discord. We spend a lot of time in email.
Josh:
[24:35] We spend a lot of time on websites, but I'd say a majority of my web surfing time is on a small subset of services that are one API call away from being fully integrated into ChatGPT. And I think that is an unlock that we're starting to see here with calendar and mail: we are just a couple of integrations away from having most of your online productive life integrated into a singular model. And now it has the context of your email in addition to all of the context you've been feeding it for the last 32 months since, I think, ChatGPT came out. So that's a big deal. That feels like an exciting seed they're planting. I am looking forward to more integrations with more services that I use.
Josh:
[25:12] But yeah, that's going to be a pretty cool one to try out and actually use in my day-to-day life.
Ejaaz:
[25:16] Well, folks, you heard it all here literally first. We started recording this and it's going out live pretty soon, straight out of the live stream. So these are our freshest thoughts. Sorry if they were a little muddled. We weren't prepped as well as we normally are, but we wanted to get this content out to you because we know our listeners want to hear about this and we're extremely passionate about it. We are going to have a follow-up episode tomorrow because, OpenAI aside, there were actually a number of other really cool things that happened in AI. And we're going to cover all of that in tomorrow's episode. But again, if you enjoyed this episode, if you enjoyed all our previous episodes, if you're curious about anything else, hop on the pod, email us, message us, tell us any kind of feedback you want us to hear, like, subscribe, and share with your friends. And Josh, we'll see them on the next one. Yeah.
Josh:
[26:00] We'll see you next time. Here's your homework: I need everyone to go and actually try out the model. Don't let us tell you how to feel. Go try it out. Share your thoughts in the comments on how you feel. I actually haven't sent a single prompt to the model yet. I'm not even sure it's been rolled out. So we're going to do that, we're going to re-evaluate, discuss, and then we'll come back with even more takes. But yeah, let us know what you think. Hopefully you enjoyed this quick and dirty update on GPT-5. Go test it out, and we'll see you guys soon.