YouTube Transcript: ‘Thinking is Hard’: Jensen Huang Explains How Nvidia Is Rewiring the Future of Intelligence | AI14 | DWS News
Summary
Core Theme
The AI industry has entered a new phase of rapid advancement and adoption, driven by breakthroughs in AI model training and reasoning capabilities, leading to a significant increase in computational demand and the emergence of a self-sustaining "virtuous cycle" of AI development and deployment.
Something fairly profound happened this year, actually. If you look at the beginning of the year, everybody had some attitude about AI. That attitude was generally: this is going to be big, it's going to be the future. And somehow, a few months ago, it kicked into turbocharge. And the reason for that is several things.

The first is that, in the last couple of years, we have figured out how to make AI much, much smarter, rather than just pre-training it. Pre-training basically says: let's take all of the information that humans have ever created and give it to the AI to learn from. It's essentially memorization and generalization.
It's not unlike going to school back when we were kids. That first stage of learning, pre-training, was never meant to be the end, just as preschool was never meant to be the end of education. Pre-training, like preschool, simply teaches you the basic skills of intelligence, so that you can understand how to learn everything else. Without vocabulary, without an understanding of language, of how to communicate and how to think, it's impossible to learn everything else.

The next is post-training.
Post-training, after pre-training, is teaching you skills: skills to solve problems, to break problems down, to reason about them; how to solve math problems, how to code, how to think about these problems step by step, how to use first-principles reasoning. And then, after that, is where computation really kicks in.

As you know, many of us went to school, in my case decades ago, but ever since, I've learned more and thought about more, and the reason is that we're constantly grounding ourselves in new knowledge, constantly doing research, and constantly thinking. Thinking is really what intelligence is all about.

And so now we have three fundamental technology skills, these three technologies: pre-training, which still requires an enormous amount of computation; post-training, which uses even more computation; and now thinking, which puts an incredible computational load on the infrastructure, because it's thinking on our behalf for every single human. So the amount of computation necessary for AI to think, for inference, is really quite extraordinary.
Now, I used to hear people say that inference is easy, that NVIDIA should do training: Nvidia is really good at this, so they're going to do training, and inference is easy. How could thinking be easy? Regurgitating memorized content is easy. Regurgitating the multiplication table is easy. Thinking is hard. That is why these three scales, these three new scaling laws, all of them at full steam, have put so much pressure on the amount of computation.

Now, another thing has happened as a result of these three scaling laws. We get smarter models, and these smarter models need more compute. But when you get smarter models, you get more intelligence.
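To put rough numbers on that last point, here is a minimal back-of-the-envelope sketch in Python. Every constant is an illustrative assumption, not a figure from the talk; it uses the common rule of thumb that generating one token costs roughly 2 x parameters FLOPs, to show how thinking (long chains of hidden reasoning tokens) multiplies the compute behind every single query.

```python
# Back-of-the-envelope: why "thinking" explodes inference compute.
# All constants are illustrative assumptions, not figures from the talk.

PARAMS = 1e12                  # hypothetical 1-trillion-parameter model
FLOPS_PER_TOKEN = 2 * PARAMS   # rule of thumb: ~2*N FLOPs per generated token

def query_flops(answer_tokens: int, thinking_tokens: int = 0) -> float:
    """FLOPs for one query: visible answer plus hidden reasoning tokens."""
    return (answer_tokens + thinking_tokens) * FLOPS_PER_TOKEN

plain = query_flops(answer_tokens=300)                             # regurgitation
reasoned = query_flops(answer_tokens=300, thinking_tokens=10_000)  # step-by-step thinking

print(f"plain answer:  {plain:.2e} FLOPs")
print(f"with thinking: {reasoned:.2e} FLOPs ({reasoned / plain:.0f}x)")
```

Multiply that per-query cost by every user and every query, and the pressure on the infrastructure he describes follows directly.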
...as if anything happens. I want to be the Jazz kid. I'm sure it's fine. Probably just lunch. My stomach. Was that me?
And so, where was I? The smarter your models are, the more people use them. The model is now more grounded. It's able to reason. It's able to solve problems it never learned how to solve before, because it can do research: go learn about it, come back, break it down, reason about how to answer your question, how to solve your problem, and go off and solve it. The amount of thinking is making the models more intelligent. The more intelligent it is, the more people use it. And the more intelligent it is, the more computation is necessary. But here's what happened.
This last year, the AI industry turned a corner, meaning that the AI models are now smart enough that they're worthy, worthy to pay for. NVIDIA pays for every license of Cursor, and we gladly do it, because Cursor is helping a several-hundred-thousand-dollar employee, a software engineer or AI researcher, be many, many times more productive. So of course we'd be more than happy to do that. These AI models have become good enough that they are worth paying for: Cursor, ElevenLabs, Synthesia, Abridge, OpenEvidence, the list goes on. Of course OpenAI; of course Claude. These models are now so good that people are paying for them. And because people are paying for it and using more of it, and every time they use more of it you need more compute, we now have two exponentials.
These two exponentials: one is the exponential compute requirement of the three scaling laws. The second is that the smarter it is, the more people use it, and the more people use it, the more computing it needs. Two exponentials are now putting pressure on the world's computing resources, at exactly the time when, as I told you earlier, Moore's law has largely ended. So the question is: what do we do? If these two exponential demands keep growing and we don't find a way to drive the cost down, then this positive feedback system, this circular feedback system, essentially called the virtuous cycle, will stall.
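As a toy illustration of that squeeze, here is a minimal sketch with invented growth rates: demand that compounds along two exponentials against supply that grows only about 50% more transistors every two years. None of the rates are real figures; the point is the widening gap that co-design has to close.

```python
# Toy model of the squeeze: demand compounds on two exponentials,
# while transistor counts grow only ~50% every two years.
# All growth rates are invented for illustration.

YEARS = 6
per_model_compute = 1.0  # exponential 1: compute per model (scaling laws)
usage = 1.0              # exponential 2: usage grows as models get smarter
supply = 1.0             # transistor-only scaling

for year in range(1, YEARS + 1):
    per_model_compute *= 2.0   # assume compute per model doubles yearly
    usage *= 2.0               # assume usage doubles yearly
    supply *= 1.5 ** 0.5       # ~50% every 2 years, ~22% per year
    demand = per_model_compute * usage
    print(f"year {year}: demand {demand:7.0f}x vs chip supply {supply:4.1f}x")
```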
That cycle is essential for almost any industry, essential for any platform industry. It was essential for Nvidia. We have now reached the virtuous cycle of CUDA.
The more applications people create, the more valuable CUDA is. The more valuable CUDA is, the more CUDA computers are purchased. And the more CUDA computers are purchased, the more developers want to create applications for it. That virtuous cycle for Nvidia has now been achieved, after 30 years. And 15 years later, we've achieved it for AI as well. AI has now reached the virtuous cycle. The more you use it, because the AI is smart and we pay for it, the more profit is generated. The more profit is generated, the more compute is put onto the grid. The more compute is put into AI factories, the smarter the AI becomes. The smarter it is, the more people use it, the more applications use it, the more problems we can solve. This virtuous cycle is now spinning.

What we need to do is drive the cost down tremendously, so that, one, the user experience is better: when you prompt the AI, it responds to you much faster. And two, so that we keep this virtuous cycle going by driving its cost down, so that it can get smarter, so that more people use it, and so on. That virtuous cycle is now spinning. But how do we do that when Moore's law has really reached its limit? Well, the answer is called co-design.
You can't just design chips and hope that whatever runs on top of them is going to go faster. The best you can do in designing chips is add, I don't know, 50% more transistors every couple of years. TSMC is an incredible company, and we can just keep adding more transistors. However, that's all in percentages, not exponentials. We need to compound exponentials to keep this virtuous cycle going, and that takes extreme co-design. Nvidia is the only company in the world today that literally starts from a blank sheet of paper and can think about new fundamental computer architecture, new chips, new systems, new software, new model architectures, and new applications all at the same time. Many of the people in this room are here because you're different parts of that stack, working with Nvidia.
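A minimal sketch of why co-design compounds where chip design alone does not. The per-layer gains below are invented for illustration; the point is that modest multiplicative improvements at each layer of the stack multiply into a generational jump far beyond transistor scaling.

```python
# Why extreme co-design compounds: multiplicative gains across the stack.
# Every per-layer speedup below is invented for illustration.

layer_gains = {
    "chip (more transistors)":      1.5,  # roughly what process scaling alone gives
    "system (rack-scale scale-up)": 1.8,
    "software (kernels, stack)":    1.6,
    "model architecture (MoE)":     1.7,
}

total = 1.0
for layer, gain in layer_gains.items():
    total *= gain
    print(f"{layer:30s} x{gain:.1f} -> cumulative x{total:.1f}")

# Transistors alone: x1.5 per generation.
# Compounded across the stack: ~x7.3 per generation.
```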
We fundamentally rearchitect everything from the ground up, and then, because AI is such a large problem, we scale it up. We created a whole computer, for the first time a computer that has scaled up into an entire rack: that's one computer, one GPU. And then we scale it out by inventing a new AI Ethernet technology we call Spectrum-X Ethernet. Everybody will say Ethernet is Ethernet. It is hardly that: Spectrum-X Ethernet is designed for AI performance, and that's the reason it's so successful. And even that's not big enough. We'll fill this entire room with AI supercomputers and GPUs. That's still not big enough, because the number of applications and the number of users for AI is continuing to grow exponentially. So we connect multiple of these data centers together, and we call that scale-across: Spectrum-XGS, gigascale.

By doing so, we do co-design at such an enormous, such an extreme level that the performance benefits are shocking. Not 50% better each generation, not 25% better each generation, but much, much more. This is the most extreme co-designed computer we've ever made, and quite frankly, the most extreme made in modern times. Since the IBM System/360, I don't think a computer has been reinvented from the ground up like this, ever. This system was incredibly hard to create. I'll show you the benefits in just a second. But essentially, what we've done, what we've created, is otherwise...
Hey Janine, you can come out. You have to meet me, like, halfway. All right. So, this is kind of like Captain America's shield.
So, NVLink 72. If we were to create one giant chip, one giant GPU, this is what it would look like. This is the level of wafer-scale processing we would have to do. It's incredible. All of these chips are now put into one giant rack. Did I do that, or did somebody else do that? Into that one giant rack. You know, sometimes I don't feel like... This one giant rack makes all of these chips work together as one. It's actually completely incredible. And I'll show you the benefits of that. The way it looks is this. So, thanks, Janine.
I like this. All right, ladies and gentlemen... I got it. In the future, next time, I'm just... It's like when you're at home and you can't reach the remote, and you just go like this, and somebody brings it to you. Yeah, same idea. It never happens to me. I'm just dreaming about it. I'm just saying.
Okay. So, anyhow, this is what we created in the past: NVLink 8. Now, these models are so gigantic. The way we solve that is to turn this gigantic model into a whole bunch of experts. It's a little bit like a team. These experts are good at certain types of problems, and we collect a whole bunch of experts together. So this giant multi-trillion-parameter AI model has all these different experts, and we put all these different experts on GPUs. Now, this is NVLink 72. We can put all of the chips into one giant fabric, and every single expert can talk to every other. The primary expert can talk to all of the distributed workers, with all of the necessary context and prompts, the batches of tokens that we have to send to all of the experts. Whichever experts are selected to produce the answer then go off and try to respond, and they do that layer after layer after layer. Sometimes there are eight experts, sometimes 16, sometimes 64, sometimes 256. But the point is that there are more and more and more experts. Well, here, with NVLink 72, we have 72 GPUs. And because of that, we can put four experts on one GPU.
The most important thing each GPU needs to do is generate tokens, and that is gated by the amount of bandwidth you have in HBM memory. Here, we have one GPU generating thinking for four experts. Versus over there: because each one of those computers can only hold eight GPUs, we have to put 32 experts on one GPU. So that one GPU has to think for 32 experts, versus this system, where each GPU only has to think for four. And because of that, the speed difference is incredible.
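A minimal sketch of the expert-placement arithmetic being described here, assuming a hypothetical 256-expert mixture-of-experts model (the largest count he mentions): the number of experts each GPU must host, and therefore the HBM-bandwidth pressure on each GPU, drops as the scale-up domain grows from 8 GPUs to 72.

```python
import math

# Expert placement for a mixture-of-experts (MoE) model.
# 256 experts is a hypothetical model size used for illustration.
TOTAL_EXPERTS = 256

def experts_per_gpu(num_gpus: int) -> int:
    """Experts each GPU must host when the model is sharded across num_gpus."""
    return math.ceil(TOTAL_EXPERTS / num_gpus)

nvlink8 = experts_per_gpu(8)    # 8-GPU scale-up domain   -> 32 experts per GPU
nvlink72 = experts_per_gpu(72)  # 72-GPU rack-scale domain -> 4 experts per GPU

print(f"NVLink 8:  {nvlink8} experts/GPU")
print(f"NVLink 72: {nvlink72} experts/GPU "
      f"({nvlink8 // nvlink72}x less thinking per GPU per token)")
```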
And this just came out. This is a benchmark done by SemiAnalysis. They do a really, really thorough job, and they benchmarked all of the GPUs that are benchmarkable, and it turns out that's not that many. If you look at the list of GPUs you can actually benchmark, it's something like 90% Nvidia. Okay, so we're comparing ourselves to ourselves. But the second-best GPU in the world is the H200, and it runs all the workloads. Grace Blackwell, per GPU, is 10 times the performance.
Now, how do you get 10 times the performance when it's only twice the number of transistors? Well, the answer is extreme co-design. By understanding the nature of future AI models, and by thinking across that entire stack, we can create architectures for the future. This is a big deal: it says we can now respond a lot faster. But this next one is an even bigger deal. Look at this. It says that the lowest-cost tokens in the world are generated by Grace Blackwell NVLink 72, the most expensive computer. On the one hand, GB200 is the most expensive computer. On the other hand, its token-generation capability is so great that it produces them at the lowest cost, because the tokens per second divided by the total cost of ownership of Grace Blackwell is so good. It is the lowest-cost way to generate tokens. By doing so, by delivering incredible performance, 10 times the performance, and 10 times lower cost, that virtuous cycle can continue.
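The cost claim reduces to a simple ratio: dollars of total cost of ownership per token produced. Here is a minimal sketch with invented throughput and TCO numbers (not SemiAnalysis figures) showing how a system can be the most expensive in absolute terms and still produce the cheapest tokens, so long as its throughput grows faster than its price.

```python
# Cost per token = total cost of ownership / total tokens produced.
# All throughput and TCO numbers below are invented for illustration.

def cost_per_million_tokens(tco_dollars: float, tokens_per_second: float,
                            lifetime_seconds: float) -> float:
    """Dollars per million tokens over the system's operating lifetime."""
    total_tokens = tokens_per_second * lifetime_seconds
    return tco_dollars / total_tokens * 1e6

FOUR_YEARS = 4 * 365 * 24 * 3600  # assumed operating lifetime, in seconds

# Hypothetical: system B costs 3x more but generates 10x the tokens per second.
a = cost_per_million_tokens(tco_dollars=1_000_000, tokens_per_second=10_000,
                            lifetime_seconds=FOUR_YEARS)
b = cost_per_million_tokens(tco_dollars=3_000_000, tokens_per_second=100_000,
                            lifetime_seconds=FOUR_YEARS)

print(f"system A: ${a:.2f} per million tokens")
print(f"system B: ${b:.2f} per million tokens (pricier machine, cheaper tokens)")
```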
Which then brings me to this one. I just saw this literally yesterday. This is the CSP capex. People are asking me about capex these days, and this is a good way to look at it. In fact, this is the capex of the top six CSPs: Amazon, CoreWeave, Google, Meta, Microsoft, and Oracle. Together, these CSPs are going to invest this much in capex. And I would tell you the timing couldn't be better, and the reason is that we now have Grace Blackwell NVLink 72 in full volume production, with the supply chain everywhere in the world manufacturing it. So we can now deliver this new architecture to all of them, so that the capex is invested in computers that deliver the best TCO. Now, underneath this there are two things going on. When you look at this, it's actually fairly extraordinary, and it's fairly extraordinary anyhow, but what's happening underneath is this: there are two platform shifts happening at the same time.
One platform shift is going from general-purpose computing to accelerated computing. Remember, accelerated computing, as I mentioned before, does data processing, image processing, computer graphics; it does computation of all kinds. It runs SQL, it runs Spark. You tell us what you need to have run, and I'm fairly certain we have an amazing library for you. You could be a data center trying to make masks to manufacture semiconductors: we have a great library for you. And so underneath, irrespective of AI, the world is moving from general-purpose computing to accelerated computing. In fact, many of the CSPs already had services here long before AI. Remember, they were invented in the era of machine learning: classical machine-learning algorithms like XGBoost, the data frames used for recommender systems, collaborative filtering, content filtering. All of those technologies were created in the old days of general-purpose computing. Even those algorithms, even those architectures, are now better with accelerated computing. So even without AI, the world's CSPs are going to invest in acceleration. Nvidia's GPU is the only GPU that can do all of that plus AI. An ASIC might be able to do AI, but it can't do any of the others.
Nvidia can do all of that, which explains why it is so safe to lean into Nvidia's architecture. We have now reached our virtuous cycle, our inflection point. And this is quite extraordinary. I have many partners in the room, and all of you are part of our supply chain. I know how hard you guys are working, and I want to thank all of you for how hard you are working. Thank you.