0:02 Last week, Sam Altman declared a code red
0:04 threat level midnight after OpenAI's
0:06 lead in the AI race was shattered by the
0:08 unexpected dominance of Google's Gemini
0:11 3. Just about everybody was ready to
0:13 write off OpenAI as the Netscape of the
0:15 2020s. That is until yesterday when they
0:18 dropped their answer to Gemini: GPT 5.2,
0:20 a model that once again moves the AI
0:23 hype wheel back in favor of OpenAI.
0:24 It's dominating all the Trust Me Bro
0:27 benchmarks and even beats Claude Opus
0:29 4.5 on software engineering and
0:30 reasoning. That the real flex though is
0:30 reasoning. The real flex, though, is
0:33 its rise to the top of the ARC AGI
0:35 benchmark. The ARC prize just verified a
0:38 390x efficiency improvement in one year
0:41 from the o3 model to 5.2. That's not a
0:44 typo. That's a model that's 390 times
0:46 more efficient. In today's video, we'll
find out if we've finally reached the edge of
0:50 the AGI threshold or if this is just
0:51 more smoke and mirrors to keep the AI
0:54 hype train going into 2026. It is
0:56 December 12th, 2025 and you're watching
0:58 the code report. Artificial intelligence
1:00 has already ruined my Christmas this year.
1:06 >> It's the most terrible time of the year.
1:07 Thanks to this nightmarish commercial
1:09 produced by the artificial food
1:11 generation company McDonald's, the
1:12 creators of this piece of crap you're
1:14 watching tried to act like they prompt
1:16 engineered a real work of art, but it
1:18 was so universally hated, McDonald's was
1:20 forced to pull it from the airwaves.
1:22 Unfortunately, this AI slop content is
1:24 only going to get worse. Because OpenAI
1:27 just inked a $1 billion deal with Disney
1:29 to allow their iconic characters to
1:31 appear in AI generated photos and
1:33 videos. And that's huge because it means
1:34 anybody can now generate their own
1:36 custom Star Wars or Toy Story movie and
1:38 will be forced to use OpenAI's tech to
1:41 do so. Very concerning. But speaking of
1:43 concerning, prediction markets like
1:45 Polymarket and Kalshi somehow predicted
1:47 that GPT 5.2 would be released
1:49 yesterday. And by predicted, I mean
1:52 OpenAI employees and other insiders
1:53 found a new infinite money glitch
1:55 because insider trading exists in a gray
1:57 area in these markets. An obvious
1:59 insider at Google made a million bucks
2:00 this month. And in many cases, it's
2:02 insider trading that makes these
2:04 prediction markets so accurate. Very
2:06 concerning. But now, let's get back to
2:08 GPT 5.2. And the thing everyone is
2:10 talking about, its performance on the
2:13 ARC AGI benchmark. But what even is ARC?
2:14 It stands for Abstraction and Reasoning
2:17 Corpus and is designed to test whether a
2:19 model can solve novel unique problems
2:21 it's never seen before. Problems that
2:22 require pure reasoning instead of
2:24 memorization. The problems are
2:26 intentionally weird low data puzzles
where brute-force pattern matching fails.
2:30 Regular humans can usually solve them
2:32 after a few examples. What's weird
2:34 though is most AI models completely face
2:36 plant. The important takeaway is that a
2:38 model that scores well on ARC has the
2:40 ability to generalize instead of just
2:42 acting like an autocomplete on steroids.
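To make that concrete, ARC tasks present a few input/output grid pairs and ask the solver to infer the underlying transformation, then apply it to a fresh input. Below is a minimal Python sketch in that spirit; the candidate rules and the transpose puzzle are made-up illustrations, not actual ARC tasks.

```python
# A minimal sketch of an ARC-style task: infer a grid transformation
# from a few example pairs, then apply it to a test input.
# The rules below are hypothetical illustrations, not real ARC tasks.

def transpose(grid):
    """Candidate rule: flip the grid over its main diagonal."""
    return [list(row) for row in zip(*grid)]

# A handful of simple candidate rules a toy "solver" can try.
CANDIDATES = {
    "identity": lambda g: [row[:] for row in g],
    "transpose": transpose,
    "flip_horizontal": lambda g: [row[::-1] for row in g],
}

def solve(train_pairs, test_input):
    """Return the first rule consistent with every training pair."""
    for name, rule in CANDIDATES.items():
        if all(rule(inp) == out for inp, out in train_pairs):
            return name, rule(test_input)
    return None, None

# Two training pairs that only the transpose rule explains.
train = [
    ([[1, 2], [3, 4]], [[1, 3], [2, 4]]),
    ([[5, 6], [7, 8]], [[5, 7], [6, 8]]),
]
name, answer = solve(train, [[0, 9], [9, 0]])
print(name, answer)  # transpose [[0, 9], [9, 0]]
```

Real ARC tasks can't be cracked by a fixed rule list like this; the point of the benchmark is that the space of possible transformations is open-ended, so a model has to reason its way to the rule rather than memorize it.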
2:44 And that's why OpenAI flexing on this
2:46 chart is a much bigger deal than many of
2:48 the other Trust Me Bro benchmarks. But
2:50 for the average LLM user, it's becoming
2:52 harder and harder to evaluate each new
2:55 release. Like ChatGpt 5.2 2 is also
2:57 supposedly much better at coding than
2:59 before and it's supposed to have far
3:01 fewer hallucinations, but I'm not sure I
3:02 can even tell a difference. I'm happily
using it to generate Svelte 5 code with
3:06 an MCP server, but what I'm more
3:08 concerned about is deploying my code
3:10 somewhere reliable, like Railway, the
3:12 sponsor of today's video. It's a cloud
3:14 platform that lets you instantly host
production-ready deployments and manage
3:18 your entire infrastructure stack in one
place. So instead of fighting with
3:22 mystery YAML files and 25 different
3:24 dashboards, you can spin up isolated
3:26 environments in one click that will
3:28 scale automatically as needed. And
3:30 unlike other platforms that shall not be
3:32 named, Railway only charges you for the
3:35 resources you actually use, not what you
3:38 provision, which saves over 65% on cloud
3:40 costs. And developers love it because it
3:42 gives you 50% faster build times while
3:44 letting you spin up any service you need
3:47 shockingly fast. They even have 1,800
3:49 different templates that let you deploy
3:51 any app or database with a single click.
3:53 Sign up for free today at the link below
3:55 and you'll get $20 in credits when you
3:57 upgrade. This has been the Code Report.
3:58 Thanks for watching and I will see you