YouTube Transcript:
Gemini 2.5 just leveled up. And it’s a BEAST
Video Transcript
Why do I have a photo of a tree here?
More on that in just a second. So,
Google just took their smartest AI
model, Gemini 2.5 Pro, and made it even
better. Now, confusingly, instead of
calling it 2.6 Pro, it's still called
Gemini 2.5 Pro, but they've added 0506
to the end, which signifies the date of
the release. Anyways, this model is
seriously impressive. Apparently, it
dominates the LM Arena leaderboard
across all categories, making it the
most performant, most intelligent AI
model out there. So, in this video, I'm
going to show you how and where to use
it. Plus, I'll show you some cool things
it can do, and of course, we'll go over
its specs, performance, and benchmark
scores. Let's jump right in. Thanks to
HubSpot for sponsoring this video. All
right, first of all, where can you use
this? At least at the time of this
recording, it's only available in
Google's AI Studio, which I'll link to
in the description below. At the top
right in the model dropdown, you should
see this new Gemini 2.5
Pro 0506. Note that another place to use
Google's Gemini models is the Gemini
platform, which I'll also link to in the
description below. It's just
gemini.google.com. But if you select the
model dropdown, at least for now, it's
not clear if this 2.5 Pro is the latest
0506 version. So, in this video, I'm
mostly going to use AI Studio to show
you some cool examples. And this is what
I personally prefer because you can
switch from all these different models,
including this really powerful image
editor from Gemini 2.0 Flash. And then
note that for this latest Gemini 2.5
Pro, it has a token window of over a
million tokens. This is basically how
much information you can fit into your
prompt at once. So a million tokens is
roughly over 700,000 words or an hour of
video. This is like five times larger
than what other leading AI models can
take in at once. And then we also have
this really handy temperature slider
which determines the creativity in its
responses. So, if you drag this all the
way to two, for example, this wouldn't
follow your prompt as much, and this
allows it to be more creative. If you
drag this all the way to the left, it's
going to process your prompt a lot more
literally. So, for us, I'm just going to
leave it at the default of one. And then
you also have various toggles over here.
So, structured output basically forces
the AI to format its response in a
structured way. So for example, if you
want it to only output JSON or output a
data table with specified columns, then
this would be a good toggle to turn on.
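To make the structured output idea concrete, here is a minimal Python sketch. It is not Gemini's API: it only assumes the model's reply arrives as a plain string and checks that the reply is a JSON array of rows with exactly the specified columns. The function name parse_table_reply, the column names, and the sample reply are all hypothetical.

```python
import json


def parse_table_reply(reply: str, columns: list[str]) -> list[dict]:
    """Parse a reply that is expected to be a JSON array of row objects.

    Raises ValueError if the reply is not valid JSON, is not an array,
    or contains a row whose keys differ from the requested columns.
    """
    try:
        rows = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    for i, row in enumerate(rows):
        if not isinstance(row, dict) or set(row) != set(columns):
            raise ValueError(f"row {i} does not have exactly the columns {columns}")
    return rows


# Hypothetical reply text, made up for illustration.
sample_reply = '[{"city": "Tokyo", "severity": 82}, {"city": "Osaka", "severity": 40}]'
rows = parse_table_reply(sample_reply, ["city", "severity"])
print(rows[0]["city"])
```

A check like this is what you would otherwise have to write yourself when the model is free to answer in any format.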
And then for code execution, this allows
Gemini to also execute code in the
prompt. And then for function calling,
if you enable this, the AI can use
external tools or APIs to retrieve
information. And then finally, this one
is also super useful. If you toggle this
on, it basically searches the web with
Google so that it can fetch the latest
information. Notice for the leading AI
models out there, including Gemini 2.5
Pro, they can already do simple stuff
like writing an essay or simple Q&A. So,
if you're going for simple stuff like
that, it doesn't really matter which one
of these models you use. They're all
really good. But what makes the top
models particularly useful, including
Gemini 2.5 and o3 and o4-mini, is their
ability to think and reason and solve
more complex problems for STEM subjects
like coding, math, and science. So, in
this video, that's mostly what I'm going
to show you. I'm going to test it on
some really challenging STEM-related
prompts. Plus, the awesome thing about
Gemini is that it's multimodal. So, it
can take in and understand multiple
formats, including audio, images, and
video. In fact, for the first example,
I'm going to take this video where I
draw out a diagram of the app I want and
explain what I want it to do. Let me play
you the video first. I want you to
create an interactive earthquake
visualization of Japan. So, let's say we
have a map of Japan like this. First, I
want you to list or show me all the
major cities in Japan on the map. And
then there's going to be a left sidebar
where I can adjust various settings like
earthquake magnitude, etc. So
these are the settings that I can
adjust. And whenever I click somewhere
on the map, so let's say I click here,
then you would start to create an
earthquake. So it's going to be an
animation effect that slowly ripples and
ripples all the way until it hits one of
these cities. And based on the magnitude
of the earthquake, I want you to
calculate how severe the impact would be
for each major city. So, I've uploaded
that to YouTube. And then I'm going to
paste the YouTube link in here like
this. So, notice it automatically knows
how to extract and analyze the YouTube
video. And this takes up around 14,000
tokens out of the million tokens. Now,
to make sure it's actually analyzing the
video and understanding everything in
the video, in my prompt I'm not going
to mention anything about earthquakes or
Japan. I'm just going to write, "Put
everything in a standalone single HTML
file." So, let's click run and see what
that gives us. All right, so here's its
thinking process. For all the top models
out there, usually they have this
thinking function where it takes some
time to think through its answer and
correct itself before it gives you its
final response. So let's look at its
thinking process really quickly. Here
it's breaking down the requirements
which I specified in the video. A map of
Japan, left sidebar for settings, click
to create earthquake, earthquake
animation, impact calculation, etc.,
etc. And then it starts with a plan of
attack. So phase one is the basic
structure and the map. And then phase
two is adding the sidebar controls.
Phase three is the earthquake animation.
Phase four is impact calculation, etc.,
etc. So afterwards, it proceeds to give
me the entire code. So I'm just going to
scroll all the way down and then
download this HTML. And then I'm just
going to open the HTML in my browser.
And here's what we get. Indeed, we have
an interactive map of Japan, which you
can move around. And if I click here,
for example, it does cause an earthquake.
And we can see the impact of the
earthquake on all the cities. This is so
cool. Now, if we change the magnitude,
let's set this lower. And then if I
click here again, note that the severity
is lower than for the previous earthquake,
which had a larger magnitude. And then
if I drag this all the way to like 10,
for example, and let's say I click here,
notice that the severity is a lot
higher, reaching 100 for some of these
cities that are nearby. And then let's
see what wave factor does. I think this
is like the speed of the ripples. So if
I drag this to a lower value and I click
here again. Yeah. So this ripples a bit
slower. Anyways, a really cool app. It
totally understood my really lousy
explanation and illustration from the
video. And this opens up a ton of
possibilities. Instead of just typing out
a prompt and not fully being able to
explain how you want to design an app,
you can record yourself just drawing out
an illustration explaining what each
component of the app does. And then you
just plug the video into Gemini and it
would generate the app for you. All
right, next up, again because Gemini is
multimodal and can understand images,
I'm going to upload this image of a tree
and then I'm going to ask it what is
here. Let's click run and see if it can
figure this out. It only thought for
like 5 seconds. And here it has
correctly identified that it's a mossy
leaf-tailed gecko. It even gave me the
scientific name camouflaged on the tree
trunk. And this is indeed correct. So
for those of you who have no idea what
the hell you're looking at, there is a
gecko like over here. So this is its
head. It's pointing down. You can see
here are its eyes. And if you follow my
mouse, this is roughly the outline of
its head. This is a really cool gecko
found in, I believe, Madagascar. And
it's really good at camouflage. So as
you can see, Gemini has no problem
analyzing and understanding images.
Speaking of Google's Gemini, if you're
in marketing and you find yourself
spending hours on research, strategy,
and content creation, it's time to
rethink your approach with AI. Check out
this free guide, Google Gemini at Work
by HubSpot. Inside, you'll discover the
Gemini Marketing Stack. These are AI
tools that make your research, campaign
planning, and content creation way more
productive. And my favorite part, it
provides a ton of pre-built prompts and
templates which you can just copy and
paste. You'll get step-by-step
instructions on how to do research 10
times faster using Gemini Deep Research.
They also show you how to use NotebookLM
to connect your campaign data,
competitor research, and customer
feedback into one powerful dashboard
that actually thinks for you. No more
digging through folders or piecing
insights together manually. There's even
a four-week implementation plan at the
end, so you can start small and see
results right away. This resource was
made by HubSpot, the sponsor of this
video. I recommend you download it for
free via the link in the description
below. Next, I'm going to upload this
image of a hike I did like a few years
ago. And this isn't even like the main
lake or attraction of the hike. It's a
pretty normal, you know, hiking photo of
mountains and a lake. This could be
anywhere. And then I'm going to paste
the image in here and then ask it where
is this. So let's click run and see what
that gives us. All right, here's what we
get. Let's expand its thinking process.
So it's analyzing the key visual
elements. It has turquoise water, steep
tree-covered slopes, glaciers in the
background. It could be all of these
options. Based on the feel, it seems
like Canadian Rockies or BC coast. And
then it's actually searching for
specific lakes. Now, because I didn't
turn on grounding with Google search,
it's not actually using Google to
search. It's just searching mentally in
its head based on the knowledge it was
trained on. And then it has found all
these turquoise lakes. And then after
some additional clues, it has concluded
that this is indeed Joffre Lakes. The
crazy thing is it has kind of even
identified that this looks like the
middle lake, which I believe is correct.
So, those were some tests on its image
and video analysis capabilities. Next,
let's test its knowledge on coding. So,
I'm going to get it to build a Windows
XP desktop with the following apps.
Paint. Clicking on this should open a
new window with an interactive canvas.
Video player. Clicking on this should
open a window where I can enter a
YouTube URL and press play. And then for
calculator, clicking on this should open
a window with a working calculator. Use
CSS, JS, and HTML in a single HTML file.
This is a key phrase I like to use to
keep everything in a self-contained
standalone file. So after pressing run,
if we expand its thought process, again,
it's breaking this down step by step. So
it's first understanding the request,
then it's structuring the HTML, and then
next it's handling styling, so the
desktop look and feel. And then next
it's covering the functionality with
JavaScript. And then it's refining
everything. And then here is a really
interesting observation. So it's also
self-correcting and improving its
response. So here's its initial thought,
but here it has corrected itself. You
also need dragging and stacking. So you
might need to implement this. And then
for the YouTube player, would it just be
this? The correction is no. You also
need an embedded player. And then also
for this, is it safe? etc., etc. So,
it's kind of like evaluating its own
response and then revising it further.
And so, afterwards, it has given me this
code. So, I'm just going to scroll all
the way down and download the HTML. All
right. And if I open this up, you can
indeed see a classic Windows XP desktop
with the appropriate colors. We even
have a start menu and the clock over
here. And if I click on paint, indeed,
it gives me a window. And let's try
painting this. This does work. Let me
change the color a bit. And let me
change the size. And the size and color also work.
Really impressive. So, let me exit out
of this. Next, I'm going to open this
video player. And then, let me paste in
a YouTube URL. I'm just going to paste
in my earthquake video and then press
play. I want you to create an
interactive earthquake visualization of
Japan. So, let's say we have a map of
Japan like this. First, I want you to
list or show me all the major cities in
Japan. Very nice. So, that works
perfectly. And then finally, we have
this calculator app. Let's do like 3 *
9. And yes, that equals 27. So, all
three apps are working. So, it's able to
code up a Windows XP desktop with three
functional apps in just one prompt.
Super impressive. All right. Next up,
let's get it to create some cool
visualizations. So my prompt is create a
particle cloud visualizer that can
change shape, color, and other
properties. Make it interactive. Use
Three.js. This is a JavaScript library for us
to create 3D animations. And also
anime.js. This is also another library
that can help us create smooth and
dynamic animations. And then again, my
key phrase that I like to use, put
everything in a single HTML file. So
let's click run and see what that gives
us. All right, here's what we get. And
if I expand this again, it's breaking
down the core request. Then it's
planning out the structure of the HTML.
It's setting up Three.js. It's setting up
the particle system. And then it's
coding up the interactivity,
implementing shape transitions, color
changes, etc., etc. And then finally, we
also have this really important
self-correction and refinement section
where it evaluates its own response and
revises it further. So afterwards, it
has given me this HTML code, which I'm
just going to scroll all the way down
and then click download. All right.
Next, I'm going to open this up on my
browser again. And holy smokes, what do
we have here? So, it looks like this
particle cloud is slowly forming into
a sphere. Oh my god, this looks so
cool. And I can like drag my mouse
around to view this further. If I
increase the particle size, it does
increase. Very nice. And then I can also
change the color like this. Very nice.
And then if I toggle this, apparently it
also uses this shape mod color. Let me
try changing the color of this and see
what happens. Okay, so it looks like
it's turning the color into a gradient
now. And then for shape, right now it's
a sphere. Let's turn this into a cube.
Whoa. Holy smokes. This is such a cool
animation. Look at that. And then let's
turn this into a torus. This is so
impressive. Look at that. And then
finally, let's turn this into a plane.
And indeed, it turns it into a
flat plane like this. Really cool. Let
me turn this back into a sphere. And
indeed, it creates a sphere from this.
So there you go. It also just nailed
this zero shot with just one prompt. In
fact, let me refresh the page again. I
really like the initial animation where
it turns into a sphere from this
particle cloud. Look how cool this is. I
really love that effect. All right.
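For a rough idea of how a shape transition like this can work, here is a minimal Python sketch. It is not the code Gemini generated (that used Three.js and anime.js in the browser). It assumes each particle simply has a start position on one shape and a target position on another, and moves between them by linear interpolation. The function names and the default radii are hypothetical.

```python
import math
import random

Point = tuple[float, float, float]


def sphere_point(rng: random.Random, radius: float = 1.0) -> Point:
    """Sample a point uniformly on the surface of a sphere."""
    z = rng.uniform(-1.0, 1.0)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (radius * r * math.cos(theta), radius * r * math.sin(theta), radius * z)


def torus_point(rng: random.Random, major: float = 1.0, minor: float = 0.3) -> Point:
    """Sample a point on a torus surface (uniform in angle, not in area)."""
    u = rng.uniform(0.0, 2.0 * math.pi)
    v = rng.uniform(0.0, 2.0 * math.pi)
    ring = major + minor * math.cos(v)
    return (ring * math.cos(u), ring * math.sin(u), minor * math.sin(v))


def lerp(a: Point, b: Point, t: float) -> Point:
    """Linearly interpolate between two points; t=0 gives a, t=1 gives b."""
    return tuple(pa + (pb - pa) * t for pa, pb in zip(a, b))


def morph(src: list[Point], dst: list[Point], t: float) -> list[Point]:
    """Move every particle a fraction t of the way toward its target."""
    return [lerp(a, b, t) for a, b in zip(src, dst)]


rng = random.Random(0)
sphere = [sphere_point(rng) for _ in range(1000)]
torus = [torus_point(rng) for _ in range(1000)]
halfway = morph(sphere, torus, 0.5)  # one frame, halfway through the transition
print(len(halfway))
```

An animation library would step t from 0 to 1 over time, usually with an easing curve rather than a constant rate.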
Next, let's test its ability to
understand physics. So, here the prompt
is make a Galton board simulation with a
grid of pegs, sidewalls, and separate
dividers at the bottom. Drop balls from
the top upon button click. use
matter.js. This is another really
important JavaScript library that can
simulate physics very well. And then
here's my key phrase to put everything
in a single HTML file. Let's click run
and see what that gives us. All right,
here's its response. I'm just going to
scroll all the way down and download the
HTML. And then afterwards, let me open
this up. And here you can see a perfect
Galton board with perfect physics
understanding. So if I press on drop
ball, the ball indeed drops and falls
randomly into a certain container
based on gravity and physics. So, let's
click this a few more times so you can
see a few more examples. This is again a
flawless app that it created. Zero shot.
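For reference, the behavior a Galton board is meant to show can be sketched in a few lines of Python. This is not the Matter.js app from the demo and it does no rigid-body physics: it assumes each peg is a fair 50/50 bounce left or right, so a ball's final bin is just the number of rightward bounces. The function names, ball count, and row count are hypothetical.

```python
import random
from collections import Counter


def drop_ball(rows: int, rng: random.Random) -> int:
    """Return the bin index for one ball: the count of rightward bounces."""
    return sum(rng.random() < 0.5 for _ in range(rows))


def simulate(balls: int, rows: int, seed: int = 0) -> list[int]:
    """Drop `balls` balls through `rows` rows of pegs and count balls per bin."""
    rng = random.Random(seed)
    counts = Counter(drop_ball(rows, rng) for _ in range(balls))
    return [counts.get(i, 0) for i in range(rows + 1)]


bins = simulate(balls=2000, rows=10)
for index, count in enumerate(bins):
    # One '#' per 20 balls gives a rough text histogram.
    print(f"{index:2d} {'#' * (count // 20)}")
```

With enough balls the bins pile up in the middle and thin out toward the edges, which is the bell-shaped pattern the physical board demonstrates.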
Super impressive. All right, here's
another cool example. Show me a
visualizer with animations upon mouse
hover. In a sidebar, I can choose from
different effects like blur, liquid,
chrome, particles, waves, grid,
distortion, iridescence, hyperspeed,
add more. Some of these effect names I
just made up. I'm not even sure what
it's going to give me. And then I'm
going to use anime.js, which is again
really good for creating animations on
web pages. So, let's click run and see
what that gives us. All right, here's
its response. Again, it has the usual
thought process where it breaks down
everything and tackles it step by step.
And then at the end of its thinking
process, it's also correcting itself and
then also doing a final check on all the
requirements. And then I'm just going to
scroll all the way down to the end of
the code and then press download. All
right, let's open this up and see what
we get. So here the first effect is
blur. If I hover my mouse over this, it
indeed blurs these circles. And if I
take my mouse off the screen, the
circles are sharp again. Really cool.
So, blur works. Next, let's move on to
particles. If I hover my mouse, wow,
look at that. If I move my mouse along
the screen, it automatically creates
these particle fireworks. So, you can
use Gemini to easily add these really
cool and complex animations on your
website. Next, let's try waves. All
right, here's what we get. Now, if I
place my mouse on the screen, that is
what it does. Let me just do this a few
more times so you can see the effect of
my mouse hover. Really cool. And then
next up, we have grid distortion. And
here's what grid distortion does. Again,
a very interesting effect. And then
hyperspeed. If I place my mouse on the
screen, this is so cool. So, notice that
the stars are now moving at a much
faster pace. And then if I take my mouse
off the screen, the stars now revert to
a slower pace. Let's do this again so
you can see the effect. Very nice. Next,
let's try glitch and see what that does.
Very cool. So, depending on where I
hover my mouse, it will add this glitch
effect over the text. And then let's see
what pixel stretch does. Whoa, really
interesting. So, it seems like it's just
stretching the letters either
horizontally or vertically. This kind of
looks like a barcode as well. And then
next we have liquid chrome. Let's see
what this does. Wow, this is also so
cool. Depending on where I move my
mouse, it's creating this effect which I
can't even describe. And then finally,
we have iridescence, which looks like
this. I don't even know what to expect
for iridescence, but uh this does look
like an iridescent orb. And if I move my
mouse on this sphere, I'm not sure what
happens. It does kind of change the
color slightly, but um yeah, again, I
don't really know what to expect for a
lot of these effects, so I'm not
expecting much here. The fact that it
was able to even create an iridescent
looking orb is already really
impressive. So, those are some of my
tests. Notice that this new Gemini 2.5
Pro 0506 is not like way better than the
earlier version. This is just like
marginally better. And in fact, I
already did a full review of the
original Gemini 2.5 Pro where I go over
some really insane demos. I got it to
create a Pokédex, an interactive night
sky viewer with constellations. I got it
to analyze a ton of financial reports
and even create a 3D tourist map of Hong
Kong. So, I'm not going to repeat too
many of those examples in this video. If
you want to learn more, check out this
video if you haven't already. Finally,
here are some demos by Google
themselves. So, again, because Gemini is
multimodal, it can understand images.
You can upload an image of this tree and
then get it to transform this image into
a code-based representation of its
natural behavior. And this is what you
get. And instead of a tree, if you
upload a photo of a spiderweb with the
same prompt, it would create this app.
And then here is a photo of a fire with
the same prompt. Here is a photo of
fireflies. We also have clouds, a flock
of birds, and this photo of a fern. I
really like this animation. And then
here we have some water ripples, and I
don't even know what this is. Is this
like fungus growing or something? And
then it can even create this lightning
simulator. Really cool. Here's another
awesome demonstration by Demis Hassabis
where he just drew out a really rough
sketch of the app he wants to create and
then he simply wrote, "Can you code this
app?" And this is the final
result. Or here's another example where
the user prompts it to code a game based on
his dog. He's going to upload a photo of
his dog with a Sakura background and it
actually creates a Sakura related game
with his dog as the character. How
incredible is that? All right, next
let's go over its specs and performance.
So, first up is this Chatbot Arena where
people can blind test different AI
models side by side. And apparently for
this latest version of Gemini 2.5 Pro,
not only is it ranked number one
overall, but also across all these
categories, including style control,
hard prompts, coding, math, creative
writing, instruction following, and
longer query. And by the way, the margin
is absolutely huge. So if you look at
like the next top three models, which are
OpenAI's o3 and GPT-4o and Grok 3, these
only differ within like 10 points but
for Gemini 2.5 Pro it beats the next
best one by 37 points which is an insane
lead. Now, instead of LM Arena, here's
another popular leaderboard called
LiveBench by Abacus AI. And
interestingly, in this leaderboard, the
latest version of Gemini 2.5 Pro does
not perform so well. Now, this is based
on their own benchmarks. These are not
blind tests from other users, so keep
that in mind. Notice that o3 High is
still ranked number one on their
leaderboard. And then Gemini 2.5 Pro is
in third place. It underperforms o3 in
terms of reasoning and coding and
language, but it does outperform o3 in
terms of mathematics and data analysis.
I also tried going to another
independent evaluator called Artificial
Analysis, but it looks like they have
not added the latest version of Gemini
2.5 Pro yet. So, this is still the March
version. Here's another really useful
benchmark called Fiction.LiveBench,
which tests the AI's ability to analyze
really long prompts. So, for example, if
the story is like 120,000 words in
length and you ask it some really
specific questions, can the AI model
actually get it right? And surprisingly,
OpenAI's o3 got it correct 100% of the
time, whereas the latest version of
Gemini 2.5 Pro gets it 71.9% of the
time. Keep in mind that this is the same
score as the previous version of Gemini
2.5 Pro. So, if you want to feed it a
ton of information at once and then ask
it specific questions, according to this
leaderboard, o3 might be the better
option. By the way, if you're interested
in learning more about OpenAI's o3 and
o4-mini, I also did a full review on that,
and it has some crazy abilities, so
definitely check out this video if you
haven't already. Next up, we have
another leaderboard called Humanity's
Last Exam. This name is really
misleading. It does not mean that we are
screwed once AI can get 100%. This is
basically a test of some really specific
knowledge on really obscure and
specialized scientific domains. And
interestingly, Gemini 2.5 Pro, the
latest version, actually scores a bit
below the earlier version that was
released in March, as you can see from
the score here. However, based on the
confidence intervals, this is not a
significant difference. So, in fact, all
five of these models do not have a
significant difference in terms of their
performance. So, they're all kind of
tied for first place. Finally, if
you look at this leaderboard called
GeoBench, this basically tests the AI's
ability on guessing the location based
on a photo, like I did with the Joffre
Lakes example. And you can see here that
Gemini 2.5 Pro is currently ranked
number one. And if you add search to it,
which is kind of cheating, but if you
do, it performs even better. It's also
really important that the AI model
actually gives you factually correct
information and doesn't make stuff up.
So, here's a really useful leaderboard
that lists out the hallucination rates
of these AI models, or basically how
often they make stuff up. Now, they
haven't released the results for the
latest version of Gemini 2.5 Pro yet,
but as you can see from the March
version, it hallucinates 1.1% of the
time. If you really want your
information to be factually correct,
like if this is for scientific or legal
research, then at least according to
this leaderboard, you should use Gemini
2.0 Flash instead. Finally, I also want
to go over the cost of this. So in their
official blog it says this improved
version will be available at the same
price. So if you look at the price of
Gemini 2.5 Pro, notice that it is cheaper
than Claude 3.7, Grok 3, and OpenAI's o3,
which is crazy expensive. So not only is
this one of the best models out there
but it's also cheaper than the other
ones making it really cost effective.
Anyways that sums up my review on this
latest version of Gemini 2.5 Pro. For
me, I think the most useful feature is I
can record a video explaining exactly
how I want an app to look and function,
and it would actually understand
everything and create the app for me.
This is way more effective than just
using a text prompt. But let me know in
the comments what you think. And if
you've had a chance to play around with
this latest version, what are some other
cool and impressive things you were able
to come up with? As always, I will be on
the lookout for the top AI news and
tools to share with you. So, if you
enjoyed this video, remember to like,
share, subscribe, and stay tuned for
more content. Also, there's just so much
happening in the world of AI every week,
I can't possibly cover everything on my
YouTube channel. So, to really stay up
to date with all that's going on in AI,
be sure to subscribe to my free weekly
newsletter. The link to that will be in
the description below. Thanks for
watching, and I'll see you in the next
one.