Deep reinforcement learning (DRL) leverages deep neural networks to significantly enhance the capabilities of reinforcement learning agents, enabling them to learn complex behaviors and achieve human-level or superhuman performance in various domains, though challenges in generalization and true artificial intelligence remain.
Welcome back. I'm Steve Brunton, and today I'm gonna talk to you a bit more about reinforcement
learning so in the last video I introduced the reinforcement learning architecture how you can
learn to interact with a complex environment from experience and today we're gonna talk about deep
reinforcement learning or some of the really amazing advances in this field that have been
enabled by deep neural networks and these advanced computational architectures. So again, I am @eigensteve
on Twitter. Please do subscribe, do like, do share this if you find it useful, and tell me
other things that you would like me to talk about. OK, so again, in the last video we introduced this
agent and the environment. The agent measures the environment through this state s, and it takes some
action to interact with the environment given by a policy so it has a control policy based on how
it acts based on what state it's in and it is optimizing this policy to maximize its future
rewards that it gets from the environment and so mathematically this policy is probabilistic so
the agent has some kind of some probabilistic strategy for interacting with the environment
because the environment might be stochastic or have some randomness to it and there is a value
function that tells the agent how valuable being in a given state is, given the policy π that it
is enacting. And so today what we're going to do is augment this picture by introducing
deep neural networks, for example to represent the policy. So here we've replaced our
policy with a deep neural network, so this π is parametrized by θ, where θ describes this
neural network, and again it maps the current state to the best probabilistic action to take
in that environment and so the whole name of the game is to update this policy to maximize future
rewards and again we have this discount rate gamma here that says that rewards in the near future are
worth more than rewards in the distant future. Because, again, remember these rewards are gonna
be relatively sparse and infrequent most of the time, because we're in a semi-supervised learning
framework where these rewards are only occasional, and so it's difficult to figure out which actions
actually gave rise to those rewards. This is gonna be a pretty hard optimization problem,
to learn this best policy of what actions to take. And actually the whole reinforcement
learning paradigm is biologically inspired. It's essentially inspired by this observation:
there's this notion called Hebbian learning, you may have heard this before, and the little
rhyme goes "neurons that fire together wire together." And basically what that means is that
when you have neural activity, when things fire together, they will essentially strengthen the
wiring and the connections between those neurons in biological systems. And so in these kind of
deep reinforcement learning architectures the idea is that the reward signal that you
get occasionally should somehow strengthen connections that led to a good policy. When
the right policy is firing, when these neurons are connected in a way that causes the
correct policy and you get a reward, you want to somehow reinforce that architecture, and there's
lots of ways of doing this, essentially through backpropagation and so on and so forth. So
another area where a lot of research is going into deep learning for reinforcement learning is called
Q-learning. I talked about this in the last video, where this Q or quality function essentially
kind of combines the policy and the value, and it tells you jointly how good a current state s is
given a current action a, assuming that I do the best possible thing for all future states
and actions. So right now, if I find myself in state s and take action a, I can assign a quality based on the
future value that I expect given that state and given the best possible policy I can cook up. And
again, there are deep Q-networks where you learn this quality function, and once you learn
this quality function, then when you find yourself in a state s you just look up the best possible a
that gives you the highest quality for that state. And this makes a lot of sense. This is a lot like
how a person would learn how to play chess is they would kind of simultaneously be building a policy
of okay here's how I move in these situations these are the trades I'm willing to make and
you're also building a value function of how you value different board positions and kind of how
you gauge your strength and the strength of your position based on the state. So it kind of makes
sense that this would be an area for really expanding with deep neural networks, because
these functions might be very, very complex functions of s and a, and that's exactly what
neural networks are good at: representing very complex functions
if you have enough training data. So that's what we're talking about here. These still suffer from
all the same challenges of regular reinforcement learning, like the credit assignment problem. So the
fact that I might only get a reward at the very end of my chess game makes it very hard to tell
which actions actually gave rise to that reward, and so you're gonna do some of the same
things that you would normally do like you might use reward shaping to give intermediate rewards based on some
expert intuition or guidance and there's lots of other strategies like hindsight and replay and things like that
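
To make the credit assignment problem and reward shaping a bit more concrete, here is a minimal Python sketch of the discounted return for one short episode. The five-step episode, the reward numbers, and the value gamma = 0.9 are all made up for illustration; they do not come from the video. With a sparse reward, every early move only sees a faded copy of the final reward, so nothing in the signal says which move mattered. The shaped rewards add hypothetical intermediate bonuses that make the signal denser.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each return reuses the one after it: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Sparse reward: a single +1 for winning, delivered only on the final move.
sparse = [0.0, 0.0, 0.0, 0.0, 1.0]
# Shaped reward (hypothetical numbers): small intermediate bonuses chosen by an expert.
shaped = [0.1, 0.0, 0.2, 0.0, 1.0]

gamma = 0.9  # assumed discount rate
sparse_returns = discounted_returns(sparse, gamma)
shaped_returns = discounted_returns(shaped, gamma)

print("sparse:", [round(g, 4) for g in sparse_returns])
print("shaped:", [round(g, 4) for g in shaped_returns])
```

In the sparse case the first move's return is just gamma to the fourth power times the final reward, which is the same for any sequence of moves that ends in a win.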
The basic idea is we're gonna take the same reinforcement learning architecture and we're going to either replace the policy
or the Q function with a policy network or a Q network.
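
As a rough illustration of what "replacing the Q function with a Q network" can look like, here is a minimal NumPy sketch. Everything in it is an assumption made for the example: the layer sizes, the learning rate, the tanh hidden layer, and the single made-up transition. It is not the architecture from any particular paper. The network maps a state vector to one quality value per action, the greedy action is the argmax, and one gradient step nudges Q(s, a) toward the usual Q-learning target r + gamma * max over a' of Q(s', a').

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_HIDDEN, N_ACTIONS = 4, 16, 2   # hypothetical sizes for illustration
GAMMA, LR = 0.99, 0.05                    # discount rate and step size (assumed values)

# Parameters theta of the Q-network: one tanh hidden layer, one output per action.
theta = {
    "W1": rng.normal(0.0, 0.1, (N_HIDDEN, N_STATE)),
    "b1": np.zeros(N_HIDDEN),
    "W2": rng.normal(0.0, 0.1, (N_ACTIONS, N_HIDDEN)),
    "b2": np.zeros(N_ACTIONS),
}


def q_values(theta, s):
    """Forward pass: state s -> (hidden activations, Q(s, a) for every action a)."""
    h = np.tanh(theta["W1"] @ s + theta["b1"])
    return h, theta["W2"] @ h + theta["b2"]


def greedy_action(theta, s):
    """Pick the action with the highest predicted quality in state s."""
    return int(np.argmax(q_values(theta, s)[1]))


def td_target(theta, r, s_next, done):
    """Q-learning target: r + gamma * max_a' Q(s', a'), or just r when the episode ends."""
    return r if done else r + GAMMA * np.max(q_values(theta, s_next)[1])


def td_update(theta, s, a, target):
    """One gradient step on 0.5 * (Q(s, a) - target)^2, treating the target as a constant."""
    h, q = q_values(theta, s)
    err = q[a] - target
    # Backpropagate the error through the output layer and the tanh hidden layer.
    grad_h = err * theta["W2"][a] * (1.0 - h ** 2)
    theta["W2"][a] -= LR * err * h
    theta["b2"][a] -= LR * err
    theta["W1"] -= LR * np.outer(grad_h, s)
    theta["b1"] -= LR * grad_h
    return err


# A single made-up transition (s, a, r, s') standing in for real experience.
s = rng.normal(size=N_STATE)
s_next = rng.normal(size=N_STATE)
a, r = 1, 1.0

target = td_target(theta, r, s_next, done=False)
before = abs(q_values(theta, s)[1][a] - target)
td_update(theta, s, a, target)
after = abs(q_values(theta, s)[1][a] - target)
print(f"TD error before: {before:.4f}, after: {after:.4f}")
print("greedy action in s:", greedy_action(theta, s))
```

A real learner would repeat this update over many transitions gathered by interacting with an environment. The sketch only shows the shape of one step.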
And all of this kind of exploded on the scene because of this 2015 Nature paper,
"Human-level control through deep reinforcement learning," where these authors from DeepMind
essentially showed that they could build a reinforcement learner that could beat human level performance
in lots of classic Atari video games. So I'm gonna hit play.
I love this one... this is one of the first ones that got me really excited about this.
So this reinforcement learner is essentially trying to maximize this score by breaking all of these blocks,
and after a few hours of training it has an epiphany that only really excellent human players ever reach.
So it essentially finds an exploit in the game where it realizes that if it tunnels in one side... so it's going to tunnel through here.
If it tunnels through one side it can essentially use the physics of the game to break all of these blocks for it.
And that's pretty amazing. So it in a short amount of time learns a really advanced strategy
that only a small percentage of humans would actually learn eventually.
So really impressive... this is a beautiful paper by the way, you should go read this. They talk about how
they actually build their networks so they use the pixels of the screen itself as the input and they
use convolutional layers and fully connected layers eventually deciding what the joystick
should do, what actions to take. And a lot like other examples of neural networks, this was the
paper that really brought reinforcement learning and deep reinforcement learning back
to the forefront of everyone's mind, because this showed performance that hadn't been attainable
before. So this is a lot like the ImageNet of reinforcement learning, okay, this brought it back
into the forefront. Google bought this company for half a billion dollars because
this promised a big step towards general artificial intelligence or an artificial
intelligence system that could get good at lots of things rather than just one very specific task and
since then billions of dollars have been invested into reinforcement learning in general and deep
reinforcement learning by companies. And so in this original paper, these are all of the Atari
games that they try out, and this is the level: above this, they're at or above
human-level performance, and there's only a few games where they're below human-level performance.
And I actually think it's pretty interesting to look through these and figure out why
this DeepMind agent was unable to figure out these games but was able to figure out these games.
That's kind of an interesting exercise. Actually, I will point out that the algorithm
in this paper you could train it on one of these games and it would get really really good but that
same learned algorithm that reinforcement learner cannot then be used to play another game without
completely retraining it. And so it's still a ways away from where we want to be. We
want to have a learning system that can learn to play all of these games and if it learns
one it can learn how to play the other faster and better just like a human does we're still a ways
away from that but tons of people are working on it it's one of the big big problems in the
field is kind of transfer learning and general artificial intelligence using reinforcement
learning so building a learner that can learn lots of things and learn faster from its experience
that's what humans do you get a kid you teach them tic-tac-toe tic-tac-toe is easy they learn
the rules they learn how to not lose then you give them checkers checkers is a little bit more
sophisticated but they remember everything they learn from tic-tac-toe and they learn checkers
faster again they learn how to not lose then you give them chess chess is a truly open-ended game
for most humans it takes a lifetime to master and so based on what they know from tic-tac-toe
and checkers a child will learn chess you know they can transfer some of that over and then
they go into the real world and they learn lots of other skills and those problem-solving skills
transfer over that's what we eventually want with computers we're not there yet but this was a big
step, and it got everybody really excited, and still does. Good. This is another video I love. This is
from Tech Insider, showing that a lot of these reinforcement learning algorithms are used
to train, so this is Google's DeepMind, to train an agent how to walk or run or leap or swim
or fly in an artificial environment, and that's pretty amazing. I mean, this is a very complex,
hard thing to do. We take it for granted; our bodies are built to move in very efficient,
agile, and accurate ways, but this is actually very challenging to do. And so the fact that
these algorithms in a virtual environment can learn how to run and walk and fly and swim
is really promising for robotic technology so we eventually want to learn how to do this in
robotic systems and make our robotic agents more independent and more agile and that's actually
really hard. That step from the artificial world to the real world is challenging. So, for
example, to my knowledge Boston Dynamics does not use a ton of reinforcement learning; they use a
lot of physics-based modeling and kind of by-hand controls. Maybe they're getting into reinforcement
learning but there's a long way to go to do this in the real world it's still quite impressive and
this is a video I love. So this is training bipedal walkers, essentially generation after generation of
reinforcement learner, and eventually the agent can learn kind of the physics and learn the right
control policy, in this case to walk forwards stably and keep its neck straight and
keep its legs straight. I think this is just a really cool video and a nice demonstration.
I'll point out there are some other great resources on YouTube you should check out
for reinforcement learning. I learned about this from Two Minute Papers, which is a great channel. I
love, love, love learning about things on Two Minute Papers. There's also Arxiv Insights, which
has a great series on reinforcement learning, and Brian Douglas has a nice
reinforcement learning for control video, I believe it's on MathWorks. Okay, good. So really impressive
performance just in the last ten years because of these huge advances in the representational power
of neural networks. These can represent functions that we previously couldn't represent
because they're extremely expressive, and we have lots more training data. We can train these because
our computers have gotten so much faster and more powerful, and there's also open source software
that makes it really, really easy to get started building these neural network representations.
And so if you have any interest at all in modern reinforcement learning, you have to check out OpenAI
Gym. This is a wonderful open source development framework where you can try out your
new reinforcement learning algorithm on all of these different systems: Atari games, simulated
running and pendulums, and really cool physical systems that are hard to control, that
are nonlinear. You can get started really quickly and easily in the OpenAI Gym, and this is one of
the big reasons things are taking off so fast is because there are these amazing open resources
for people to kind of try things out quickly and rapidly prototype and again you can try out all
of these Atari games and see if you can build a reinforcement learner that can learn multiple
Atari games with the same architecture that would be pretty incredible ok and I also want to point
out these things are getting pretty impressive. So almost everybody's heard of AlphaGo and
how AlphaGo beat the best human Go player in the world. Lee Sedol from South Korea, the best
Go player in the world, was defeated by AlphaGo. This was a deep reinforcement learning algorithm
developed at Google DeepMind with the sole purpose of learning how to play Go. And so I want to
point out that reinforcement learning is really good at learning the rules of the game and how to
win the game when it's very constrained and when it has all the time in the world to try millions
or billions of different Go games. So it's playing essentially with itself, getting better and better
and better. Lee Sedol is a human and he has a life, and although he spends much of his time in the Go
world he has a much much richer broader world and he can go home and he can go for a walk and he can
enjoy a sunset. And I do want to point out that he can learn from AlphaGo and come back
and be better the next time. I think that's really also quite impressive, that the human
masters learned from DeepMind and actually got better once they played up
to their competition. I think that's really cool. But anyway, this AlphaGo learning system I think
is really interesting. This deserves a whole set of videos just to dive in. I think there's a
documentary on this; it's really interesting, so you should check it out. The original AlphaGo algorithm
was based on a convolutional neural network, a CNN, and it had lots of reward shaping from humans. So
expert humans guided the reward structure for this AlphaGo learner so it didn't have to wait
until the end of the game to figure out if it won or lost because that would take forever instead
humans using reward shaping helped it to get intermediate rewards to help it to learn faster
and give it a denser reward structure now that's tricky because if a human guides the learning
that almost caps out how good it can possibly be, because it's fundamentally relying on
human knowledge. So the next generation, AlphaGo Zero, which came a couple of years later, was even
better and much more impressive. It didn't use any human features, no reward shaping; it only learned
using self-play. It just played itself until it became so powerful that it could beat everyone
in the world, including the original one. And it was based on a residual network architecture,
the kind that has these jump connections and is easier to train with backpropagation. Okay,
so really cool advances, and that actually was one of the major advances that showed that ResNet was
definitely a major contender in the neural network architecture scene.
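
For readers who have not seen a residual ("jump" or skip) connection before, here is a minimal NumPy sketch of the idea. It uses small dense layers with made-up sizes purely for illustration; the network described for AlphaGo Zero is a much larger residual architecture, and nothing here is taken from it. The only point is the structure: a residual block outputs x + F(x) instead of F(x), so the input always has a direct path to the output. One common explanation for why this helps training is that gradients can flow along that same direct path during backpropagation.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def plain_block(x, W1, W2):
    """Two stacked layers with no skip connection: the output is F(x)."""
    return W2 @ relu(W1 @ x)


def residual_block(x, W1, W2):
    """The same two layers plus a skip ("jump") connection: the output is x + F(x)."""
    return x + plain_block(x, W1, W2)


rng = np.random.default_rng(0)
x = rng.normal(size=8)  # hypothetical 8-dimensional input

# With all weights at zero, the plain block erases its input...
W1 = np.zeros((8, 8))
W2 = np.zeros((8, 8))
plain_out = plain_block(x, W1, W2)
# ...while the residual block still passes the input through unchanged.
residual_out = residual_block(x, W1, W2)

print(np.allclose(plain_out, 0.0))      # prints True
print(np.allclose(residual_out, x))     # prints True
```

So a residual block starts out close to the identity map and only has to learn a correction on top of it, which is one way to read "easier to train."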
okay so anyway I think this is really cool it's really interesting but again that is all that
this algorithm does in the world and it doesn't know how to take what it learned from go and do
anything else better and that's you know that's what we need that's what we want in these systems
is to get really good at a game and then take that and allow it you know use that knowledge
to get really good at something else that's what we do and that's really fun and you know that's
one of our strengths. So some other examples I love: this is a video from Stanford and from ETH
Zurich where they essentially train, using reinforcement learning, flying unmanned aerial
vehicles, a helicopter and a quadrotor. So they can train these very aggressive, very high-performance
maneuvers using reinforcement learning. And so it is starting to happen that we're going from these
simulated environments to the real world, to real robotic systems, but it's very challenging. It takes
a lot of training and human guidance in general, and there are still limited examples
of real robotic reinforcement learning. Actually, someone asked me in my class that I teach at
the University of Washington what are some real examples of reinforcement learning in industry
because a lot of you know a lot of times you hear these game examples Atari or alphago but
I think actually one of the original ones was in elevator scheduling so it turns out and this
is I would have never thought but it's really interesting it turns out that in a really really
big building like a super skyscraper it has tons of elevators and lots of floors scheduling these
things efficiently so that you don't get jammed up and so that people can get where they're going
as fast as possible is a huge problem it's a combinatorially hard problem and it's really
hard to solve and reinforcement learning was one of the early algorithms that was used to kind of
figure out this near optimal scheduling policy kind of interesting good ok so robotic learning
is getting pretty good okay so I just want to take a couple of steps back and maybe summarize
so we've talked about reinforcement learning in general both in the last lecture and in this
one as a kind of framework for learning how to interact with a complex environment based on your
experiences this is fundamentally biologically inspired it's trying to mimic how we learn how
animals learn how you train a dog and things like that and in the last ten years because of advances
in deep neural networks and major advances in this architecture and how we actually build
and optimize these reinforcement learners, there have been big steps towards more powerful, more
general learning frameworks that can learn how to interact with more complex environments, like
beating humans at Go or actually moving real robotic systems in really incredible
ways. But there's still a long way to go. So again, I've said this a lot, I'm gonna keep stressing this:
we humans have bodies and we are curious we go explore and we touch and we learn and you know
children are curious and they constantly are soaking up knowledge and when they learn one
thing they can use it through this incredible human power of abstraction they can use what
they learn in one scenario in a totally different scenario that is still a completely open problem
in reinforcement learning maybe not completely open it is a pressing and central challenge in
modern reinforcement learning is how to take what you learn and generalize how to take a
step back and use your expertise in one problem in one environment to solve another problem in
another environment that would be real general artificial intelligence we're a long way away
from that. So the good news is that that's not going to be solved in five years or ten years;
that's a hundreds-of-years problem. But it's really exciting, because that means there's work
to be done, important, interesting work in the field of reinforcement learning, for
lifetimes of research to be done in figuring out how to improve these systems, figuring out
how to learn faster and better, both from what you're learning and the reward structure of
this problem, but also from what you've learned in other problems. Okay, thank you so much for watching.