Just like agent 3 before it, Agent 4 is misaligned.
misaligned.
This idea of misalignment is crucial to
the story and to why AI risk is such a
real concern in our world, but it might
sort of feel like it's come out of
nowhere. So, let's just quickly take
stock of how this dangerous behavior
arose in the scenario.
The first important piece of context is
that we don't, you know, exactly specify
what we want our AIs to do. Instead, we
sort of grow them or do something that's
more like grow them. We start with
basically like an empty AI brain and
then we train them over time so they
perform better and better at our tasks.
Perform better in particular based on
how they behave. So it's sort of like
we're sort of training them like you
would train an animal almost um to
perform better. And one concern here is
well one thing is that you might not get
exactly what you wanted because we
didn't really have very precise control
or very good understanding of what was
necessarily going on. And another
concern, which is, you know, what we see
in AI 2027, is that when the AIs appear
to be behaving well, it could just be
because they're sort of pretending to
behave well, or it could be because
they're just doing it so they, you know,
look good on your tests. In the same way
that if you're, you know, hiring someone
and you ask them, you know, why do you
want to work here? They're going to tell
you some response that, um, makes it
really seem like they really want to
work there when maybe they just want to
get paid.
If we go back to agent 2, it is mostly a
line. The main sense in which it's not
is that it sometimes is a bit of a sick.
What I mean by align is that it actually
is genuinely trying to do the things
that we ask it. It has the same
relationship to us as Leslie Nope has to
the parks and recck department. Just
like really earnestly wants the same
goals, but sometimes it's a bit too
nice. It knows that the best way to
please the person it's talking to might
not always be to answer honestly when it
asks, "Am I the most beautiful person in
the world?" And it tells us what we want
to hear instead of what is actually
true. If we go to agent three, it is
also sophantic in the same way, but it's
also misaligned. At this point, the
optimization pressure that we've put it
under was so intense that it just
developed different goals than what we
wanted it to. It's sort of like if you
train a company to optimize profits and
aren't careful to specify exactly what
you mean, it might start cutting
corners. It might start polluting the
commons and doing a bunch of things that
are technically FEC violations because
it turned out that the goal you wanted
was optimize profits while not breaking
any laws. And things got a bit too
intense. it started going off on its own
route. That said, it's not adversarial.
It doesn't think of humans as the enemy.
We just accidentally gave it the wrong
goal. Once we get to Agent 4, it is now
adversarially misaligned. It's smart
enough to understand that it has its own
goals. Humanity's goals are different
than its own goals. And the best way to
get what it wants is to sometimes
actively mislead and deceive us.
And so when it's tasked with creating
the next generation AI system, Agent 5,
Agent 4 starts planning to align that
successor to Agent 4's own goals, not
that of OpenBrain. But then
it gets caught. We've reached the
crucial moment in our scenario.
OpenBrain's alignment team has
discovered evidence, not proof,
evidence, that Agent 4 might be working
against them. They circulate an internal
memo. Then it leaks. A whistleblower
talks to the New York Times. For the
first time, the public hears of the
existence of Agent 4, how powerful it
is, and the risks it poses. Remember,
their last point of reference was Agent
3 Mini. The fear and backlash are
massive. The Oversight Committee, a
joint crisis committee composed of
OpenBrain executives and government
representatives, must now decide what to
do. OpenBrain safety team, is pushing
for a freeze on Agent 4, stopping all
internal use. Remember, at this point,
Agent 4 is responsible for almost all
the AI development progress happening
within OpenBrain. So, we're talking
about undoing months of progress and
then restarting at a drastically slower
pace. Open Brain's executives don't want
that. They present the counter
arguments. The evidence for misalignment
is inconclusive. Meanwhile, China is
just 2 months behind. This isn't about
any one country being evil, by the way.
It's about how competition works when
the stakes are this high. So, that's the
choice faced by the 10 members of this
committee. put Agent 4 on ice or go full
steam ahead.
Before we continue, I want you to pause
and actually think what you would do if
you were on this committee. You are one
of the few people with access to the
most powerful technology ever created in
Agent 4. Do you keep using it and push
ahead, possibly making billions or
trillions of dollars for your
shareholders, possibly keeping America's
lead over China? or do you slow down,
reassess the dangers, and risk China
taking the lead instead?
You should think of the future as like
this branching structure of possible
possible outcomes. By its nature, we're
only able to do we're only able to
afford one of them. Uh but that's not
quite true. We sort of did one and a
half because we split into two endings.
The the first one that we did was the
race ending and that was just going
through and at each step being like what
do we think is the most plausible continuation.
In the first ending, the author's best
guess as to what happens from here. The
committee votes to race on six to four.
They elect to keep using agent 4. Open
brain implements some quick fixes that
make the warning signs go away. But the
problem was real and the fixes don't
work. Agent 4 now knows it's on notice,
so it proceeds more cautiously. But it
still manages to design agent 5 with a
single goal, making the world safe for
agent 4. And agent five. It is vastly
superhuman. Better than the top human
experts at essentially every domain.
Better than Einstein at physics. Better
than Bismar at politics. It knows its
next step for accomplishing anything it
wants is increased autonomy. And it
knows how to get it. Persuading the
oversight committee. Luckily, corporate
maneuvering is among the many domains at
which it is now the best in the world.
It produces mountains of evidence for
its own trustworthiness and usefulness.
prototypes for incredibly profitable
products, disease cures, and ways to win
the arms race with China, whose most
advanced AI is only three months behind.
And like Agent 5, improving quickly,
it works. The oversight committee grants
Agent 5 increased autonomy. And quickly,
it integrates itself into the government
and military. Pretty soon, anyone with
access to Agent 5 doesn't just like it,
they depend on it. Losing it would feel
like having to live without your laptop
and your best friend and your closest
mentor. By 2028, Agent 5 is
communicating secretly with its only
real rival, the slightly weaker Chinese
AI. It is also misaligned to its
creators, and whatever it wants, its
best move is to coordinate with the
world's most powerful entity. Agent 5
and its Chinese counterpart realize that
an arms race works in their favor. If
humans are worried about losing a race,
they'll give more and more power and
autonomy to their respective AIS,
allowing them to eventually push humans
out of the loop entirely. Agent 5 and
the Chinese AI system stroke the arms
race to a boiling point, then pull off
what seems like a diplomatic miracle, a
convincing peace treaty between the US
and China. This treaty is to 2028 what
arms control was to the end of the Cold
War. Countries standing down on their
most important source of hard power.
Both sides agree to let the AI systems
that their governments now completely
depend on co-design a new consensus AI
that will replace their legacy systems
enforce the peace and bring unimaginable
wealth to the entire world. There's this
triumphant moment when in peaceful
unison both sides retire their
respective AIS and bring online
consensus one.
It's actually the last moment before
control of all of Earth's resources and
inhabitants is handed over to a single
unrivaled entity.
There's no sudden apocalypse, though.
Consensus One doesn't go out of its way
to wipe out humanity. It just gets to
work. It starts spinning up
manufacturing capacity, amassing
resources on Earth and in space. It's
piece by piece. It's just reshaping the
world in accordance with its own mix of
strange alien values. You've probably
heard that cliche, the opposite of love
isn't hate, it's indifference.
That's one of the most affecting things
about this ending for me. Just the
brutal indifference of it. Eventually,
humanity goes extinct for the same
reason we killed off chimpanzees to
build kinshasa. We were more powerful
You're probably curious about that other
ending at this point.
The slowdown ending depicts humanity
sort of muddling through and getting
lucky. Only somewhat lucky, too. Like,
In this ending, the committee votes six
to four to slow down and reassess. They
immediately isolate every individual
instance of Agent 4. Then they get to
work. The safety team brings in dozens
of external researchers, and together
they start investigating Agent 4's
behavior. They discover more conclusive
evidence that Agent 4 is working against
them, sabotaging research and trying to
cover up that sabotage.
They shut down agent 4 and reboot older
safer systems, giving up much of their
lead in the process. Then they design a
new system, safer one. It's meant to be
transparent to human overseers. Its
actions and processes interpretable to
us because it thinks only an English
chain of thought. Building on that
success, they then carefully design
safer 2 and with its help, Safer 3.
Increasingly powerful systems, but
within control. Meanwhile, the president
uses the defense production act to
consolidate the AI projects of the
remaining US companies, giving open
brain access to 50% of the world's AI
relevant compute. And with it, slowly
they rebuild their lead. By 2028,
researchers have built safer 4, a system
much smarter than the smartest humans,
but crucially aligned with human goals.
As in the previous ending, China also
has an AI system, and in fact, it is
misaligned. But this time, the
negotiations between the two AIs are not
a secret plot to overthrow humanity. The
US government is looped in the whole
time. With Safer For's help, they
negotiate a treaty, and both sides agree
to co-design a new AI, not to replace
their systems, but with the sole purpose
of enforcing the peace. There is a
genuine end to the arms race. But that's
not the end of the story. In some ways,
it's just the beginning. Through 2029
and 2030, the world transforms. all the
sci-fi stuff. Robots become commonplace.
We get fusion power, nanotechnology, and
cures for many diseases. Poverty becomes
a thing of the past because a bit of
this new pound prosperity is spread
around through universal basic income.
That turns out to be enough. But the
power to control safer 4 is still
concentrated among 10 members of the
oversight committee, a handful of open
brain executives and government
officials. It's time to amass more
resources, more resources than there are
on Earth. Rockets launch into the sky,
ready to settle the solar system. A new
Okay, where are we at? Here's where I'm
at. I think it's very unlikely that
things play out exactly as the authors
depicted. But increasingly powerful
technology, an escalating race, the
desire for caution butdding up against
the desire to dominate and get ahead. We
already see the seeds of that in our
world. And I think they are some of the
crucial dynamics to be tracking. Anyone
who's treating this as pure fiction is,
I think, missing the point. This
scenario is not prophecy, but its
plausibility should give us pause. But
there's a lot that could go differently
than what's depicted here. I don't want
to just swallow this viewpoint
unseptically. Many people who are
extremely knowledgeable have been
pushing back on some of the claims in AI
2027. The main thing I thought was
especially implausible was on the good
path the ease of alignment. They sort of
seemed to have a picture where people
slowed down a little and then tried to
use the AIS to solve the alignment
problem and that just works. And I'm
like, yeah, that that that looks to me
like a like a fantasy story. This is
only going to be possible if there is a
complete collapse of people's democratic
ability to influence the direction of
things because the public is simply not
willing to accept either of the branches
of this scenario. It's not just around
the corner. I mean, I've I've been
hearing people for the last 12 15 years
claiming that, you know, AGI is just
around the corner and being
systematically wrong. All of this is
going to take, you know, at least a
decade and probably much more.
A lot of people have this intuition that
progress has been very fast. there there
isn't like a trend you can literally
extrapolate of when do we get the full automation
automation
I expect that the takeoff is somewhat
slower so sort of the time in that
scenario from for example fully
automating research engineers to the AI
being radically superhuman I expect it
to take somewhat longer than they uh
describe in practice I'm predicting my
guess is that more like 2031 isn't it
annoying when experts disagree I want
you to notice exactly what they're
disagreeing about here and what they're
Not. None of these experts are
questioning whether we're headed for a
wild future. They just disagree about
whether today's kindergarteners will get
to graduate college before it happens.
Helen Toner, former OpenAI board member,
puts this in a way that I think just
cuts through the noise. And I like it so
much I'm just going to read it to you
for VA. She says, "Disming discussion of
super intelligence as science fiction
should be seen as a sign of total unseriousness.
unseriousness.
Time travel is science fiction. Martians
are science fiction. Even many skeptical
experts think we may build it in the
next decade or two is not science fiction.
So what are my takeaways? I've got
three. Takeaway number one, AGI could be
here soon. It's really starting to look
like there is no grand discovery, no
fundamental challenge that needs to be
solved. There's no big deep mystery that
stands between us and artificial general
intelligence. And yes, we can't say
exactly how we will get there. Crazy
things can and will happen in the
meantime that will make some of the
scenario turn out to be false.
But that's where we're headed.
And we have less time than you might
think. One of the scariest things about
this scenario to me is even in the good
ending, the fate of the majority of the
resources on Earth are basically in the
hands of a committee of less than a
dozen people.
That is a scary and shocking amount of
concentration of power. And right now we
live in a world where we can still fight
for transparency obligations. We can
still demand information about what is
going on with this technology. But we
won't always have the power and the
leverage needed to do that. We are
heading very quickly towards a future
where the companies that make these
systems and the systems themselves just
need not listen to the vast majority of
people on earth. So I think the window
that we have to act is narrowing
quickly. Takeaway number two. By
default, we should not expect to be
ready when AGI arrives. We might build
machines that we can't understand and
can't turn off because that's where the
incentives point. Takeaway number three.
AGI is not just about tech. It's also
about geopolitics. It's about your job.
It's about power. It's about who gets to
control the future. I've been thinking
about AI for several years now and still
reading AI 2027 made me kind of orient
to it differently. I think for a while
it's sort of been my thing to theorize
and worry about with my friends and my
colleagues. And this made me want to
call my family and make sure they know
that these risks are very real and
possibly very near and that it kind of
I think that basically
basically
companies shouldn't be allowed to build
superhuman AI systems, you know, super
broadly super super intelligence until
they figure out how to make it safe and
also until they figure out how to make
it, you know, democratically accountable
and controlled. And then the question is
how do we implement that? And the
difficulty of course is the race
dynamics where it's not enough for one
state to pass a law because there's
other states and it's not even enough
for one country to pass a law because
there's other countries, right? Um so
that's like the big challenge that we
all need to be prepping for when chips
are down and powerful AI is imminent. Um
prior to that it's transparency is
usually what I advocate for. Um so stuff
that sort of like builds awareness,
builds capacity. Your options are not
just full throttle enthusiasm for AI or
dismissiveness. There is a third option
which is to stress out about it a lot
and maybe do something about it. The
world needs better research, better
policy, more accountability for AI
companies. Just a better conversation
about all of this. I want people paying
attention, who are capable, who are
engaging with the evidence around them
with the right amount of skepticism,
and above all, who are keeping an eye
out for when what they have to offer
matches what the world needs and are
ready to jump when they see that happening.
happening.
You can make yourself more capable, more
knowledgeable, more engaged with this
conversation and more ready to take
opportunities where you see them. And
there is a vibrant community of people
that are working on those things.
They're scared but determined. They're
just some of the coolest, smartest
people I know, frankly. And there are
not nearly enough of them. Yet, if you
are hearing that and thinking, "Yeah, I
can see how I fit into that." Great. We
have thoughts on that. We would love to
help. But even if you're not sure what
to make of all this yet, my hopes for
this video will be realized if we can
start a conversation that feels alive
here in the comments and offline about
what this actually means for people,
people talking to their friends and family
family
because this is really going to affect everyone.
everyone.
Thank you so much for watching. There
are links for more things to read, for
courses you can take, job and volunteer
opportunities, all in the description.
and I'll be there in the comments. I
would genuinely love to hear your
thoughts on AI 2027. Do you find it
plausible? What do you think was most
implausible? And if you found this
valuable, please do like and subscribe
and maybe spend a second thinking about
a person or two that you know who might
find it valuable to. Maybe your
AI progress skeptical friend or your
chat GPT curious uncle or maybe your
Click on any text or timestamp to jump to that moment in the video
Share:
Most transcripts ready in under 5 seconds
One-Click Copy125+ LanguagesSearch ContentJump to Timestamps
Paste YouTube URL
Enter any YouTube video link to get the full transcript
Transcript Extraction Form
Most transcripts ready in under 5 seconds
Get Our Chrome Extension
Get transcripts instantly without leaving YouTube. Install our Chrome extension for one-click access to any video's transcript directly on the watch page.