YouTube Transcript: Gemini 2.5 just leveled up. And it's a BEAST
Video Transcript
Why do I have a photo of a tree here? More on that in just a second. So, Google just took their smartest AI model, Gemini 2.5 Pro, and made it even better. Now, confusingly, instead of calling it 2.6 Pro, it's still called Gemini 2.5 Pro, but they've added 05-06 to the end, which signifies the date of the release. Anyways, this model is seriously impressive. Apparently, it dominates the LM Arena leaderboard across all categories, making it the most performant, most intelligent AI model out there. So, in this video, I'm going to show you how and where to use it. Plus, I'll show you some cool things it can do, and of course, we'll go over its specs, performance, and benchmark scores. Let's jump right in. Thanks to HubSpot for sponsoring this video. All right, first of all, where can you use this? At least at the time of this recording, it's only available in Google's AI Studio, which I'll link to in the description below. At the top right in the model dropdown, you should see this new Gemini 2.5 Pro 05-06. Note that another place to use Google's Gemini models is the Gemini platform, which I'll also link to in the description below. It's just gemini.google.com. But if you select the model dropdown there, at least for now, it's not clear if this 2.5 Pro is the latest 05-06 version. So, in this video, I'm mostly going to use AI Studio to show you some cool examples. And this is what I personally prefer, because you can switch between all these different models, including the really powerful image editor from Gemini 2.0 Flash. Also note that this latest Gemini 2.5 Pro has a context window of over a million tokens. This is basically how much information you can fit into your prompt at once. A million tokens is roughly over 700,000 words, or an hour of video. This is like five times larger than what other leading AI models can take in at once. And then we also have this really handy temperature slider, which determines the creativity in its responses. If you drag this all the way to two, for example, it won't follow your prompt as strictly, which allows it to be more creative. If you drag it all the way to the left, it's going to process your prompt a lot more literally. For us, I'm just going to leave it at the default of one. And then you also have various toggles over here. Structured output basically forces the AI to format its response in a structured way. For example, if you want it to only output JSON, or output a data table with specified columns, then this would be a good toggle to turn on. Code execution allows Gemini to also execute code in the prompt. For function calling, if you enable this, the AI can use external tools or APIs to retrieve information. And then finally, this one is also super useful: if you toggle on grounding with Google Search, it searches the web with Google so that it can fetch the latest information.
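For developers, those same AI Studio settings map onto the Gemini API. Here's a minimal sketch of that mapping using the @google/generative-ai Node SDK; the model id and the prompt are assumptions from my side, so check the current docs before relying on them.

```js
// Minimal sketch (not from the video): AI Studio's settings expressed as API config.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

const model = genAI.getGenerativeModel({
  model: "gemini-2.5-pro-preview-05-06",   // preview id at the time; may change
  generationConfig: {
    temperature: 1.0,                      // the "creativity" slider (0 to 2)
    responseMimeType: "application/json",  // structured output: force JSON
  },
  // Code execution, function calling, and Google Search grounding are all
  // configured through the `tools` field; see the SDK docs for the exact shapes.
});

const result = await model.generateContent(
  "Return a JSON array of the three largest cities in Japan."
);
console.log(result.response.text());
```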
Now, the leading AI models out there, including Gemini 2.5 Pro, can already do simple stuff like writing an essay or simple Q&A. So, if you're going for simple stuff like that, it doesn't really matter which one of these models you use. They're all really good. But what makes the top models particularly useful, including Gemini 2.5 Pro and o3 and o4-mini, is their ability to think and reason and solve more complex problems in STEM subjects like coding, math, and science. So, in this video, that's mostly what I'm going to show you. I'm going to test it on some really challenging STEM-related prompts. Plus, the awesome thing about Gemini is that it's multimodal, so it can take in and understand multiple formats, including audio, images, and video. In fact, for the first example, I'm going to take this video where I draw out a diagram of the app I want and explain what I want it to do. Let me play you the video first. "I want you to create an interactive earthquake visualization of Japan. So, let's say we have a map of Japan like this. First, I want you to list or show me all the major cities in Japan on the map. And then there's going to be a left sidebar where I can adjust various settings like earthquake magnitude, etc., etc. So these are the settings that I can adjust. And whenever I click somewhere on the map, so let's say I click here, then you would start to create an earthquake. So it's going to be an animation effect that slowly ripples and ripples all the way until it hits one of these cities. And based on the magnitude of the earthquake, I want you to calculate how severe the impact would be for each major city." So, I've uploaded that to YouTube, and then I'm going to paste the YouTube link in here like this. Notice it automatically knows how to extract and analyze the YouTube video, and this takes up around 14,000 tokens out of the million tokens. Now, to make sure it's actually analyzing the video and understanding everything in it for my prompt, I'm not going to mention anything about earthquakes or Japan. I'm just going to write "put everything in a standalone single HTML file." So, let's click run and see what that gives us. All right, so here's its thinking process. All the top models out there usually have this thinking function, where the model takes some time to think through its answer and correct itself before it gives you its final response. So let's look at its thinking process really quickly. Here it's breaking down the requirements which I specified in the video: a map of Japan, a left sidebar for settings, click to create an earthquake, earthquake animation, impact calculation, etc., etc. And then it starts with a plan of attack. Phase one is the basic structure and the map. Phase two is adding the sidebar controls. Phase three is the earthquake animation. Phase four is impact calculation, etc., etc. Afterwards, it proceeds to give me the entire code. So I'm just going to scroll all the way down, download this HTML, and then open it in my browser. And here's what we get. Indeed, we have an interactive map of Japan, which you can move around. And if I click here, for example, it does cause an earthquake, and we can see the impact of the earthquake on all the cities. This is so cool. Now, if we change the magnitude, let's change this lower. And then if I click here again, note that the severity is lower than the previous earthquake, which had a larger magnitude. And then if I drag this all the way up to 10, for example, and click here, notice that the severity is a lot higher, reaching 100 for some of the cities that are nearby. And then let's see what wave factor does. I think this is the speed of the ripples. So if I drag this to a lower value and click here again, yeah, the ripples are a bit slower. Anyways, a really cool app. It totally understood my really lousy explanation and illustration from the video.
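To give a sense of what an app like that has to compute, here's a purely hypothetical sketch of a magnitude-plus-distance impact score. It is not the code Gemini produced; the city coordinates and the falloff constant are made up for illustration.

```js
// Hypothetical impact scoring: severity grows with magnitude and decays with
// distance from the click point (the epicenter). Not the generated code.
const cities = [
  { name: "Tokyo", x: 520, y: 340 },
  { name: "Osaka", x: 430, y: 390 },
  { name: "Sapporo", x: 560, y: 120 },
];

function severity(magnitude, epicenter, city) {
  const distance = Math.hypot(city.x - epicenter.x, city.y - epicenter.y);
  const raw = (magnitude * 100) / (1 + distance / 50); // inverse-distance falloff
  return Math.min(100, Math.round(raw));               // cap the score at 100
}

const epicenter = { x: 480, y: 360 }; // where the user clicked on the map
for (const city of cities) {
  console.log(`${city.name}: severity ${severity(8, epicenter, city)}`);
}
```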
And this opens up a ton of possibilities. Instead of just typing out a prompt and not fully being able to explain how you want to design an app, you can record yourself drawing out an illustration and explaining what each component of the app does. Then you just plug the video into Gemini and it will generate the app for you. All right, next up: again, because Gemini is multimodal, it can understand images. I'm going to upload this image of a tree and then ask it what is here. Let's click run and see if it can figure this out. It only thought for like 5 seconds, and here it has correctly identified that it's a mossy leaf-tailed gecko camouflaged on the tree trunk. It even gave me the scientific name. And this is indeed correct. So for those of you who have no idea what the hell you're looking at, there is a gecko over here. This is its head, pointing down. You can see here are its eyes, and if you follow my mouse, this is roughly the outline of its head. This is a really cool gecko found in, I believe, Madagascar, and it's really good at camouflage. So as you can see, Gemini has no problem analyzing and understanding images. Speaking of Google's Gemini, if you're in marketing and you find yourself spending hours on research, strategy, and content creation, it's time to rethink your approach with AI. Check out this free guide, Google Gemini at Work by HubSpot. Inside, you'll discover the Gemini Marketing Stack. These are AI tools that make your research, campaign planning, and content creation way more productive. And my favorite part: it provides a ton of pre-built prompts and templates which you can just copy and paste. You'll get step-by-step instructions on how to do research 10 times faster using Gemini Deep Research. They also show you how to use NotebookLM to connect your campaign data, competitor research, and customer feedback into one powerful dashboard that actually thinks for you. No more digging through folders or piecing insights together manually. There's even a four-week implementation plan at the end, so you can start small and see results right away. This resource was made by HubSpot, the sponsor of this video. I recommend you download it for free via the link in the description below. Next, I'm going to upload this image of a hike I did a few years ago. And this isn't even the main lake or attraction of the hike. It's a pretty normal hiking photo of mountains and a lake. This could be anywhere. So I'm going to paste the image in here and ask it where is this. Let's click run and see what that gives us. All right, here's what we get. Let's expand its thinking process. It's analyzing the key visual elements: turquoise water, steep tree-covered slopes, glaciers in the background. It could be all of these options. Based on the feel, it seems like the Canadian Rockies or the BC coast. And then it's actually searching for specific lakes. Now, because I didn't turn on grounding with Google Search, it's not actually using Google to search. It's just searching mentally, in its head, based on the knowledge it was trained on. And then it has found all these turquoise lakes, and after some additional clues, it has concluded that this is indeed Joffre Lakes. The crazy thing is it has even identified that this looks like the middle lake, which I believe is correct. So, those were some tests on its image and video analysis capabilities.
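If you'd rather do the same image Q&A through the API instead of AI Studio, here's a minimal sketch using the same @google/generative-ai Node SDK; the local file name and the model id are assumptions.

```js
// Sketch: attach a photo as inline base64 data and ask a question about it.
import { GoogleGenerativeAI } from "@google/generative-ai";
import { readFileSync } from "node:fs";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro-preview-05-06" });

const image = {
  inlineData: {
    data: readFileSync("tree.jpg").toString("base64"), // hypothetical file
    mimeType: "image/jpeg",
  },
};

const result = await model.generateContent(["What is here?", image]);
console.log(result.response.text());
```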
Next, let's test its knowledge of coding. I'm going to get it to build a Windows XP desktop with the following apps. Paint: clicking on this should open a new window with an interactive canvas. Video player: clicking on this should open a window where I can enter a YouTube URL and press play. And for calculator, clicking on this should open a window with a working calculator. Use CSS, JS, and HTML in a single HTML file. This is a key phrase I like to use to keep everything in a self-contained, standalone file. So after pressing run, if we expand its thought process, again it's breaking this down step by step. First it's understanding the request, then it's structuring the HTML, next it's handling styling, so the desktop look and feel, then it's covering the functionality with JavaScript, and then it's refining everything. And here is a really interesting observation: it's also self-correcting and improving its response. So here's its initial thought, but here it has corrected itself. You also need dragging and stacking, so you might need to implement this. And then for the YouTube player, would it just be this? The correction is no, you also need an embedded player. And then also for this, is it safe? Etc., etc. So, it's kind of evaluating its own response and then revising it further. Afterwards, it has given me this code, so I'm just going to scroll all the way down and download the HTML. And if I open this up, you can indeed see a classic Windows XP desktop with the appropriate colors. We even have a start menu and the clock over here. And if I click on Paint, indeed, it gives me a window. And let's try painting this. This does work. Let me change the color a bit. And let me change the size. The size and color also work. Really impressive. So, let me exit out of this. Next, I'm going to open this video player and paste in a YouTube URL. I'm just going to paste in my earthquake video and then press play. "I want you to create an interactive earthquake visualization of Japan. So, let's say we have a map of Japan like this. First, I want you to list or show me all the major cities in Japan." Very nice. So, that works perfectly. And then finally, we have this calculator app. Let's do 3 * 9. And yes, that equals 27. So, all three apps are working. It's able to code up a Windows XP desktop with three functional apps in just one prompt. Super impressive. All right. Next up, let's get it to create some cool visualizations. My prompt is: create a particle cloud visualizer that can change shape, color, and other properties. Make it interactive. Use Three.js. This is a JavaScript library for creating 3D animations. And also anime.js, which is another library that can help us create smooth and dynamic animations. And then again, my key phrase that I like to use: put everything in a single HTML file. So let's click run and see what that gives us. All right, here's what we get. And if I expand this, again it's breaking down the core request. Then it's planning out the structure of the HTML. It's setting up Three.js. It's setting up the particle system. And then it's coding up the interactivity, implementing shape transitions, color changes, etc., etc. And then finally, we also have this really important self-correction and refinement section, where it evaluates its own response and revises it further. So afterwards, it has given me this HTML code, which I'm just going to scroll all the way down and then click download.
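For context on what a file like that contains, here's roughly what a bare-bones Three.js particle cloud looks like: a Points mesh over a BufferGeometry of random positions. This is a sketch of the general technique, not the code Gemini produced.

```js
// Minimal Three.js particle cloud: thousands of points rendered as one mesh.
import * as THREE from "three";

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 100);
camera.position.z = 5;

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

const COUNT = 5000;
const positions = new Float32Array(COUNT * 3);
for (let i = 0; i < positions.length; i++) {
  positions[i] = (Math.random() - 0.5) * 4; // random cloud around the origin
}

const geometry = new THREE.BufferGeometry();
geometry.setAttribute("position", new THREE.BufferAttribute(positions, 3));
const material = new THREE.PointsMaterial({ size: 0.02, color: 0x66ccff });
const points = new THREE.Points(geometry, material);
scene.add(points);

function animate() {
  requestAnimationFrame(animate);
  points.rotation.y += 0.002; // slow spin so the cloud feels alive
  renderer.render(scene, camera);
}
animate();
```

Morphing between shapes like a sphere, cube, torus, or plane then comes down to tweening each point toward a new target position, which is exactly the kind of job anime.js gets used for.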
All right. Next, I'm going to open this up in my browser. And holy smokes, what do we have here? It looks like this particle cloud is slowly forming into a sphere. Oh my god, this looks so cool. And I can drag my mouse around to view this further. If I increase the particle size, it does increase. Very nice. And then I can also change the color like this. Very nice. And then if I toggle this, apparently it also uses this shape mode color. Let me try changing the color of this and see what happens. Okay, so it looks like it's turning the color into a gradient now. And then for shape, right now it's a sphere. Let's turn this into a cube. Whoa. Holy smokes. This is such a cool animation. Look at that. And then let's turn this into a torus. This is so impressive. Look at that. And then finally, let's turn this into a plane. And indeed, it turns it into a flat plane like this. Really cool. Let me turn this back into a sphere. And indeed, it creates a sphere from this. So there you go. It also just nailed this zero-shot, with just one prompt. In fact, let me refresh the page again. I really like the initial animation where it turns into a sphere from this particle cloud. Look how cool this is. I really love that effect. All right. Next, let's test its ability to understand physics. Here the prompt is: make a Galton board simulation with a grid of pegs, sidewalls, and separate dividers at the bottom. Drop balls from the top upon button click. Use Matter.js. This is another really important JavaScript library that can simulate physics very well. And then here's my key phrase to put everything in a single HTML file. Let's click run and see what that gives us. All right, here's its response. I'm just going to scroll all the way down and download the HTML. And then afterwards, let me open this up. And here you can see a perfect Galton board with perfect physics understanding. If I press drop ball, the ball indeed drops, and it falls down randomly into one of the containers based on gravity and physics. So, let's click this a few more times so you can see a few more examples. This is again a flawless app that it created, zero-shot. Super impressive.
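For anyone curious what a setup like that involves, here's a minimal sketch of a Galton board in Matter.js: static pegs, walls, and dividers, plus dynamic balls dropped on a button click. It's a sketch of the general pattern, not the file Gemini generated, and the dimensions and the "drop" button id are made up.

```js
// Minimal Matter.js Galton board: static pegs/walls/dividers + dropped balls.
import Matter from "matter-js";

const { Engine, Render, Runner, Bodies, Composite } = Matter;

const engine = Engine.create();
const render = Render.create({
  element: document.body,
  engine,
  options: { width: 400, height: 600, wireframes: false },
});

// Static pegs in an offset grid.
const pegs = [];
for (let row = 0; row < 8; row++) {
  for (let col = 0; col < 10; col++) {
    const x = 40 + col * 36 + (row % 2 ? 18 : 0);
    const y = 100 + row * 40;
    pegs.push(Bodies.circle(x, y, 4, { isStatic: true }));
  }
}

// Floor, side walls, and bottom dividers forming the collection bins.
const statics = [
  Bodies.rectangle(200, 600, 400, 20, { isStatic: true }), // floor
  Bodies.rectangle(0, 300, 20, 600, { isStatic: true }),   // left wall
  Bodies.rectangle(400, 300, 20, 600, { isStatic: true }), // right wall
];
for (let i = 1; i < 10; i++) {
  statics.push(Bodies.rectangle(i * 40, 540, 4, 120, { isStatic: true }));
}

Composite.add(engine.world, [...pegs, ...statics]);
Render.run(render);
Runner.run(Runner.create(), engine);

// Drop a slightly bouncy ball near the top, with a little horizontal jitter.
document.getElementById("drop")?.addEventListener("click", () => {
  const ball = Bodies.circle(200 + (Math.random() - 0.5) * 10, 20, 8, {
    restitution: 0.4,
  });
  Composite.add(engine.world, ball);
});
```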
All right, here's another cool example. Show me a visualizer with animations upon mouse hover. In a sidebar, I can choose from different effects like blur, liquid chrome, particles, waves, grid distortion, iridescence, hyperspeed, add more. Some of these effect names I just made up. I'm not even sure what it's going to give me. And then I'm going to use anime.js, which again is really good for creating animations on web pages. So, let's click run and see what that gives us. All right, here's its response. Again, it has the usual thought process where it breaks down everything and tackles it step by step. And then at the end of its thinking process, it's also correcting itself and doing a final check on all the requirements. And then I'm just going to scroll all the way down to the end of the code and press download. All right, let's open this up and see what we get. So here the first effect is blur. If I hover my mouse over this, it indeed blurs these circles. And if I take my mouse off the screen, the circles are sharp again. Really cool. So, blur works. Next, let's move on to particles. If I hover my mouse, wow, look at that. If I move my mouse along the screen, it automatically creates these particle fireworks. So, you can use Gemini to easily add these really cool and complex animations to your website. Next, let's try waves. All right, here's what we get. Now, if I place my mouse on the screen, that is what it does. Let me just do this a few more times so you can see the effect of my mouse hover. Really cool. And then next up, we have grid distortion, and here's what grid distortion does. Again, a very interesting effect. And then hyperspeed. If I place my mouse on the screen, this is so cool. Notice that the stars are now moving at a much faster pace, and if I take my mouse off the screen, the stars revert to a slower pace. Let's do this again so you can see the effect. Very nice. Next, let's try glitch and see what that does. Very cool. Depending on where I hover my mouse, it adds this glitch effect over the text. And then let's see what pixel stretch does. Whoa, really interesting. It seems like it's stretching the letters either horizontally or vertically. This kind of looks like a barcode as well. And then next we have liquid chrome. Let's see what this does. Wow, this is also so cool. Depending on where I move my mouse, it's creating this effect which I can't even describe. And then finally, we have iridescence, which looks like this. I didn't even know what to expect for iridescence, but this does look like an iridescent orb. And if I move my mouse on this sphere, I'm not sure what happens. It does kind of change the color slightly, but again, I don't really know what to expect for a lot of these effects, so I'm not expecting much here. The fact that it was able to even create an iridescent-looking orb is already really impressive.
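Hover effects like these ultimately come down to a handful of tween calls. Here's a minimal sketch of the basic pattern with the anime.js v3 API (animate in on mouseenter, animate back out on mouseleave); the element selectors are hypothetical, and this is not the code Gemini generated.

```js
// Basic hover-effect pattern with anime.js: tween properties in on hover,
// tween them back out when the pointer leaves.
import anime from "animejs";

const card = document.querySelector(".effect-card"); // hypothetical container
const circles = card.querySelectorAll(".circle");    // hypothetical children

card.addEventListener("mouseenter", () => {
  anime({
    targets: circles,
    scale: 1.3,
    opacity: 0.5,
    translateY: -10,
    delay: anime.stagger(40), // ripple the animation across the circles
    duration: 400,
    easing: "easeOutQuad",
  });
});

card.addEventListener("mouseleave", () => {
  anime({
    targets: circles,
    scale: 1,
    opacity: 1,
    translateY: 0,
    duration: 400,
    easing: "easeOutQuad",
  });
});
```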
So, those are some of my tests. Notice that this new Gemini 2.5 Pro 05-06 is not way better than the earlier version; it's just marginally better. And in fact, I already did a full review of the original Gemini 2.5 Pro where I go over some really insane demos. I got it to create a Pokédex and an interactive night sky viewer with constellations, I got it to analyze a ton of financial reports, and I even got it to create a 3D tourist map of Hong Kong. So, I'm not going to repeat too many of those examples in this video. If you want to learn more, check out that video if you haven't already. Finally, here are some demos by Google themselves. Again, because Gemini is multimodal, it can understand images. You can upload an image of this tree and then get it to transform the image into a code-based representation of its natural behavior, and this is what you get. And instead of a tree, if you upload a photo of a spiderweb with the same prompt, it would create this app. And then here is a photo of a fire with the same prompt. Here is a photo of fireflies. We also have clouds, a flock of birds, and this photo of a fern. I really like this animation. And then here we have some water ripples, and I don't even know what this is. Is this like fungus growing or something? And then it can even create this lightning simulator. Really cool. Here's another awesome demonstration by Demis Hassabis, where he just drew out a really rough sketch of the app he wants to create and then simply wrote, "Can you code this app?" And this is the final result. Or here's another example where the user prompts it to code a game based on his dog. He uploads a photo of his dog with a sakura background, and it actually creates a sakura-themed game with his dog as the character. How incredible is that? All right, next let's go over its specs and performance. First up is the Chatbot Arena (LM Arena), where people can blind-test different AI models side by side. And for this latest version of Gemini 2.5 Pro, not only is it ranked number one overall, but also across all these categories, including style control, hard prompts, coding, math, creative writing, instruction following, and longer query. And by the way, the margin is absolutely huge. If you look at the next top three models, which are OpenAI's o3, GPT-4o, and Grok 3, they only differ within about 10 points, but Gemini 2.5 Pro beats the next best one by 37 points, which is an insane lead. Now, besides LM Arena, here's another popular leaderboard called LiveBench by Abacus AI. And interestingly, on this leaderboard, the latest version of Gemini 2.5 Pro does not perform so well. Now, this is based on their own benchmarks; these are not blind tests from other users, so keep that in mind. Notice that o3 High is still ranked number one on their leaderboard, and Gemini 2.5 Pro is in third place. It underperforms o3 in terms of reasoning, coding, and language, but it does outperform o3 in terms of mathematics and data analysis. I also tried going to another independent evaluator called Artificial Analysis, but it looks like they have not added the latest version of Gemini 2.5 Pro yet, so this is still the March version. Here's another really useful benchmark called Fiction.LiveBench, which tests the AI's ability to analyze really long prompts. For example, if the story is like 120,000 words in length and you ask it some really specific questions, can the AI model actually get it right? And surprisingly, OpenAI's o3 got it correct 100% of the time, whereas the latest version of Gemini 2.5 Pro gets it right 71.9% of the time. Keep in mind that this is the same score as the previous version of Gemini 2.5 Pro. So, if you want to feed it a ton of information at once and then ask it specific questions, according to this leaderboard, o3 might be the better option. By the way, if you're interested in learning more about OpenAI's o3 and o4-mini, I also did a full review on those, and they have some crazy abilities, so definitely check out that video if you haven't already. Next up, we have another leaderboard called Humanity's Last Exam. This name is really misleading. It does not mean that we are screwed once AI can get 100%. This is basically a test of some really specific knowledge in really obscure and specialized scientific domains. And interestingly, the latest version of Gemini 2.5 Pro actually scores a bit below the earlier version that was released in March, as you can see from the score here. However, based on the confidence intervals, this is not a significant difference. So, in fact, all five of these models do not have a significant difference in terms of their performance; they're all kind of tied for first place. Finally, if you look at this leaderboard called GeoBench, it basically tests the AI's ability to guess the location based on a photo, like I did with the Joffre Lakes example. And you can see here that Gemini 2.5 Pro is currently ranked number one. And if you add search to it, which is kind of cheating, it performs even better. It's also really important that the AI model actually gives you factually correct information and doesn't make stuff up. So, here's a really useful leaderboard that lists out the hallucination rates of these AI models, or basically how often they make stuff up.
Now, they haven't released the results for the latest version of Gemini 2.5 Pro yet, but as you can see from the March version, it hallucinates 1.1% of the time. If you really want your information to be factually correct, like if this is for scientific or legal research, then at least according to this leaderboard, you should use Gemini 2.0 Flash instead. Finally, I also want to go over the cost. In their official blog, it says this improved version will be available at the same price. So if you look at the price of Gemini 2.5 Pro, notice that it is cheaper than Claude 3.7, Grok 3, and OpenAI's o3, which is crazy expensive. So not only is this one of the best models out there, but it's also cheaper than the others, making it really cost-effective. Anyways, that sums up my review of this latest version of Gemini 2.5 Pro. For me, the most useful feature is that I can record a video explaining exactly how I want an app to look and function, and it will actually understand everything and create the app for me. This is way more effective than just using a text prompt. But let me know in the comments what you think. And if you've had a chance to play around with this latest version, what are some other cool and impressive things you were able to come up with? As always, I will be on the lookout for the top AI news and tools to share with you. So, if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Also, there's just so much happening in the world of AI every week that I can't possibly cover everything on my YouTube channel. So, to really stay up to date with all that's going on in AI, be sure to subscribe to my free weekly newsletter. The link to that will be in the description below. Thanks for watching, and I'll see you in the next one.