YouTube transcript:
Deep Reinforcement Learning: Neural Networks for Learning Control Laws

Summary

Core Theme

Deep reinforcement learning (DRL) leverages deep neural networks to significantly enhance the capabilities of reinforcement learning agents, enabling them to learn complex behaviors and achieve human-level or superhuman performance in various domains, though challenges in generalization and true artificial intelligence remain.

Welcome back. I'm Steve Brunton, and today I'm gonna talk to you a bit more about reinforcement

learning so in the last video I introduced the reinforcement learning architecture how you can

learn to interact with a complex environment from experience and today we're gonna talk about deep

reinforcement learning or some of the really amazing advances in this field that have been

enabled by deep neural networks and these advanced computational architectures. So again, I am @eigensteve

on Twitter. Please do subscribe and do like, do share this if you find it useful, and tell me

other things that you would like me to talk about. OK, so again, in the last video we introduced this

agent and the environment. The agent measures the environment through this state s, it takes some

action to interact with the environment given by a policy so it has a control policy based on how

it acts based on what state it's in and it is optimizing this policy to maximize its future

rewards that it gets from the environment and so mathematically this policy is probabilistic so

the agent has some kind of some probabilistic strategy for interacting with the environment

because the environment might be stochastic or have some randomness to it and there is a value

function that tells the agent how valuable being in a given state is given the policy π that it

is enacting and so today what we're going to do is we're going to augment this picture by introducing

deep neural networks for example to represent the policy and so here we have now we've replaced our

policy with a deep neural network, so this π is parametrized by theta, where theta describes this

neural network, and again it maps the current state to the best probabilistic action to take

in that environment and so the whole name of the game is to update this policy to maximize future

rewards and again we have this discount rate gamma here that says that rewards in the near future are

worth more than rewards in the distant future okay because again remember these rewards are gonna

be relatively sparse and infrequent most of the time because we're in a semi-supervised learning

framework where these rewards are only occasional and so it's difficult to figure out what actions

actually gave rise to those rewards this is gonna be a pretty hard optimization problem

to learn this best policy of what actions to take but you know and actually the whole reinforcement

learning paradigm is biologically inspired. It's essentially inspired by this observation, so

there's this notion called Hebbian learning, you may have heard this before, and the little

rhyme goes neurons that fire together wire together and basically what that means is that

when you have neural activity kind of when things fire together they will essentially strengthen the

wiring and the connections between those neurons in biological systems. And so in these kind of

deep reinforcement learning architectures the idea is that the reward signal that you

get occasionally should somehow strengthen connections that led to a good policy. When

the right policy is firing, when these neurons are connected in a way that causes the

correct policy and you get a reward, you want to somehow reinforce that architecture, and there's

lots of ways of doing this, you know, essentially through backpropagation and so on and so forth. So

another area where a lot of research is going into deep learning for reinforcement learning is called

Q-learning. I talked about this in the last video, where this Q or quality function essentially

kind of combines the policy and the value, and it tells you jointly how good is a current state

given a current action a, okay, and so assuming that I do the best possible thing for all future states

and actions. So right now, if I find myself in state s and action a, I can assign a quality based on the

future value that I expect given that state and given the best possible policy I can cook up and

again there are deep Q-networks where you learn this quality function, and once you learn

this quality function then when you find yourself in a state s you just look up the best possible a

that gives you the highest quality for that state and this makes a lot of sense this is a lot like

how a person would learn how to play chess is they would kind of simultaneously be building a policy

of okay here's how I move in these situations these are the trades I'm willing to make and

you're also building a value function of how you value different board positions and kind of how

you gauge your strength and the strength of your position based on the state. So it kind of makes

sense that this would be an area for really expanding with deep neural networks because

these functions might be very very complex functions of s and a and that's exactly what

neural networks are good at, is representing very, very complex functions

if you have enough training data so that's what we're talking about here these still suffer from

all the same challenges of regular reinforcement learning like the credit assignment problem so the

fact that I might only get a reward at the very end of my chess game makes it very hard to tell

which actions actually gave rise to that reward, and so you're gonna do some of the same

things that you would normally do like you might use reward shaping to give intermediate rewards based on some

expert intuition or guidance and there's lots of other strategies like hindsight and replay and things like that
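
As a rough illustration of why a single end-of-game reward makes credit assignment hard, here is a minimal Python sketch of the discounted return, using the discount rate gamma mentioned earlier. The function name `discounted_returns` and the five-move episode are assumptions made up for this example; they do not come from the video.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 5-move episode: no reward until the final, winning move.
episode_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(episode_rewards, gamma=0.9)
print(returns)
```

Every earlier move receives a smaller, discounted share of the same final reward, so the return alone cannot say which of those moves actually mattered. That is the gap reward shaping tries to fill with intermediate rewards.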

The basic idea is we're gonna take the same reinforcement learning architecture and we're going to either replace the policy

or the Q function with a policy network or a Q network.
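
To make the quality function concrete, here is a minimal sketch of tabular Q-learning in plain Python, where a lookup table stands in for the Q network. The five-state corridor environment, the function names, and the hyperparameter values are all assumptions chosen for illustration, not anything shown in the video. The agent behaves randomly while learning, which is fine because Q-learning is off-policy.

```python
import random

N_STATES = 5       # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)   # 0 = step left, 1 = step right


def step(state, action):
    """Hypothetical deterministic corridor: reward 1 only on reaching the end."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q, state):
    """Look up the action with the highest quality for this state."""
    return max(ACTIONS, key=lambda a: q[state][a])


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random exploration
            next_state, reward, done = step(state, action)
            # Bootstrap from the best action in the next state (zero if terminal).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
policy = [greedy_action(q_table, s) for s in range(N_STATES - 1)]
print(policy)
```

Once the table is learned, acting is just the lookup described above: in state s, pick the a with the highest Q(s, a). A deep Q-network replaces the table with a neural network so the same idea can scale to inputs like screen pixels.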

And all of this kind of exploded on the scene because of this 2015 Nature paper

"Human-level control through deep reinforcement learning" where these authors from DeepMind

essentially showed that they could build a reinforcement learner that could beat human level performance

in lots of classic Atari video games. So I'm gonna hit play.

I love this one... this is one of the first ones that got me really excited about this.

So this reinforcement learner is essentially trying to maximize this score by breaking all of these blocks

and after a few hours of training it has an epiphany that only really excellent human players ever reach

so it essentially finds an exploit in the game where it realizes that if it tunnels in one side... so it's going to tunnel through here.

If it tunnels through one side it can essentially use the physics of the game to break all of these blocks for it.

And that's pretty amazing. So it in a short amount of time learns a really advanced strategy

that only a few humans, only a small percentage of humans, would actually learn eventually.

So really impressive... this is a beautiful paper by the way, you should go read this. They talk about how

they actually build their networks so they use the pixels of the screen itself as the input and they

use convolutional layers and fully connected layers eventually deciding what the joystick

should do what actions to take. And a lot like other examples of neural networks this was the

paper that really brought reinforcement learning and deep reinforcement learning back

to the forefront of everyone's mind because this showed performance that hadn't been attainable

before. So this is a lot like the ImageNet of reinforcement learning, okay, this brought it back

into the forefront. Since then so Google bought this company for half a billion dollars because

this promised a big step towards general artificial intelligence or an artificial

intelligence system that could get good at lots of things rather than just one very specific task and

since then billions of dollars have been invested into reinforcement learning in general and deep

reinforcement learning by companies. And so in this original paper these are all of the Atari

games that they try out, and above this level they're at or above

human level performance, and there's only a few games that they're below human level performance,

and I actually think it's pretty interesting to look through these and figure out why

this DeepMind system was unable to figure out these games but it was able to figure out these games.

That's kind of an interesting exercise. Actually I will point out that the algorithm

in this paper you could train it on one of these games and it would get really really good but that

same learned algorithm that reinforcement learner cannot then be used to play another game without

completely retraining it, and so it's still a ways away from where we want to be. We

want to have a learning system that can learn to play all of these games and if it learns

one it can learn how to play the other faster and better just like a human does we're still a ways

away from that but tons of people are working on it it's one of the big big problems in the

field is kind of transfer learning and general artificial intelligence using reinforcement

learning so building a learner that can learn lots of things and learn faster from its experience

that's what humans do you get a kid you teach them tic-tac-toe tic-tac-toe is easy they learn

the rules they learn how to not lose then you give them checkers checkers is a little bit more

sophisticated but they remember everything they learn from tic-tac-toe and they learn checkers

faster again they learn how to not lose then you give them chess chess is a truly open-ended game

for most humans it takes a lifetime to master and so based on what they know from tic-tac-toe

and checkers a child will learn chess you know they can transfer some of that over and then

they go into the real world and they learn lots of other skills and those problem-solving skills

transfer over that's what we eventually want with computers we're not there yet but this was a big

step and it got everybody really excited and still does. Good. This is another video I love, this is

from Tech Insider showing, you know, a lot of these reinforcement learning algorithms are used

to train, so this is Google's DeepMind, to train the agent how to walk or run or leap or swim

or fly in an artificial environment and that's pretty amazing I mean this is a very complex

hard thing to do. We take it for granted, our bodies are built to move in very efficient and

agile and accurate ways, but this is actually very challenging to do, and so the fact that

these algorithms in a virtual environment can learn how to run and walk and fly and swim

is really promising for robotic technology so we eventually want to learn how to do this in

robotic systems and make our robotic agents more independent and more agile and that's actually

really hard. That step from the artificial world to the real world is challenging, so for

example to my knowledge Boston Dynamics does not use a ton of reinforcement learning they use a

lot of physics based modeling and kind of by hand controls maybe they're getting into reinforcement

learning but there's a long way to go to do this in the real world it's still quite impressive and

this is a video I love so this is training bipedal walkers essentially generation after generation of

reinforcement learner, and eventually the agent can learn kind of the physics and learn the right

control policy, in this case to walk forwards stably and, you know, keep its neck straight and

keep its legs straight, and I think this is just a really cool video and a nice demonstration.

I'll point out there's some great resources other resources on YouTube you should check out

for reinforcement learning. I learned about this in Two Minute Papers, which is a great channel. I

love, love, love learning about things in Two Minute Papers. There's also Arxiv Insights, which

has a great series on reinforcement learning, and Brian Douglas has a nice kind of reinforcement

learning for control video, I believe it's on MathWorks. Okay, good. So really impressive

performance just in the last ten years because of these huge advances in the representational power

of neural networks. These can represent functions that we previously couldn't represent

because they're extremely expressive and we have lots more training data we can train these because

our computers have gotten so much faster and more powerful and there's also open source software

that makes it really really easy to get started building these neural network representations

and so if you have any interest at all in modern reinforcement learning you have to check out OpenAI

Gym. This is a wonderful open source kind of development framework where you can try out your

new reinforcement learning algorithm on all of these different systems both Atari games simulated

you know running and pendulum and really cool physical systems that are hard to control that

are nonlinear. You can get started really quickly and easily in the OpenAI Gym, and this is one of

the big reasons things are taking off so fast is because there are these amazing open resources

for people to kind of try things out quickly and rapidly prototype and again you can try out all

of these Atari games and see if you can build a reinforcement learner that can learn multiple

Atari games with the same architecture that would be pretty incredible ok and I also want to point

out these things are getting pretty impressive, so almost everybody's heard of AlphaGo and

how AlphaGo beat the best human Go player in the world, Lee Sedol from South Korea. The best

Go player in the world was defeated by AlphaGo. This was a deep reinforcement learning algorithm

developed at Google DeepMind with the sole purpose of learning how to play Go, okay, and so I want to

point out that so reinforcement learning is really good at learning the rules of the game and how to

win the game when it's very constrained and when it has all the time in the world to try millions

or billions of different Go games, so it's playing essentially with itself, getting better and better

and better. Lee Sedol is a human and he has a life, and although he spends much of his time in the Go

world he has a much much richer broader world and he can go home and he can go for a walk and he can

enjoy a sunset, and so I do want to point out that, and he can learn from AlphaGo and come back

and be better the next time, and I think that's really also quite impressive, is that the human

masters learned from DeepMind and actually got better once they played up

to their competition. I think that's really cool, but anyway this AlphaGo learning system I think

is really interesting, this deserves a whole set of videos just to dive in. I think there's a

documentary on this, it's really interesting so you should check it out. The original AlphaGo algorithm

was based on a convolutional neural network, a CNN, and it had lots of reward shaping from humans, so

expert humans guided the reward structure for this AlphaGo learner so it didn't have to wait

until the end of the game to figure out if it won or lost because that would take forever instead

humans using reward shaping helped it to get intermediate rewards to help it to learn faster

and give it a denser reward structure now that's tricky because if a human guides the learning

that almost caps out how good it can possibly be because it's fundamentally relying on

human knowledge. So the next generation, AlphaGo Zero, which came a couple of years later, was even

better and much more impressive it didn't use any human features no reward shaping it only learned

using self play it just played itself until it became so powerful that it could beat everyone

in the world including the original one, and it was based on a residual network architecture, the kind

that has these, you know, jump connections and is easier to train with backpropagation, okay,

so really cool advances, and that actually was one of the major advances that showed that ResNet was

you know kind of definitely a major contender in the neural network architecture scene.
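
For readers who have not seen a residual network, the jump (skip) connection idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: it uses small dense layers on lists of floats, whereas the real AlphaGo Zero network uses convolutional residual blocks with batch normalization. The helper names `relu`, `linear`, and `residual_block` are made up for this example.

```python
def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v, w1, w2):
    """Compute relu(v + F(v)), where F is two linear layers with a ReLU between."""
    hidden = relu(linear(w1, v))
    branch = linear(w2, hidden)
    # The jump (skip) connection adds the unchanged input back in.
    return relu([x + b for x, b in zip(v, branch)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, zeros))  # → [1.0, 2.0]
```

With all weights at zero the block simply passes its (non-negative) input through, so each layer only has to learn a correction on top of the identity. The same shortcut gives gradients a direct path backwards, which is a large part of why these networks are easier to train with backpropagation.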

okay so anyway I think this is really cool it's really interesting but again that is all that

this algorithm does in the world and it doesn't know how to take what it learned from Go and do

anything else better and that's you know that's what we need that's what we want in these systems

is to get really good at a game and then take that and allow it you know use that knowledge

to get really good at something else that's what we do and that's really fun and you know that's

one of our strengths. So some other examples I love: this is a video from Stanford and from ETH

Zurich where they are essentially going to train, using reinforcement learning, flying unmanned aerial

vehicles, a helicopter and a quadrotor, so they can train these very aggressive very high-performance

maneuvers using reinforcement learning and so it is starting to happen that we're going from these

simulated environments to the real world to real robotic systems but it's very challenging it takes

you know a lot of training and human guidance in general, and there are still limited examples

of real robotic reinforcement learning. Actually someone asked me in my class that I teach at

the University of Washington what are some real examples of reinforcement learning in industry

because a lot of, you know, a lot of times you hear these game examples, Atari or AlphaGo, but

I think actually one of the original ones was in elevator scheduling so it turns out and this

is I would have never thought but it's really interesting it turns out that in a really really

big building like a super skyscraper it has tons of elevators and lots of floors scheduling these

things efficiently so that you don't get jammed up and so that people can get where they're going

as fast as possible is a huge problem it's a combinatorially hard problem and it's really

hard to solve and reinforcement learning was one of the early algorithms that was used to kind of

figure out this near optimal scheduling policy kind of interesting good ok so robotic learning

is getting pretty good okay so I just want to take a couple of steps back and maybe summarize

so we've talked about reinforcement learning in general both in the last lecture and in this

one as a kind of framework for learning how to interact with a complex environment based on your

experiences this is fundamentally biologically inspired it's trying to mimic how we learn how

animals learn how you train a dog and things like that and in the last ten years because of advances

in deep neural networks and major advances in this architecture and how we actually build

and optimize these reinforcement learners there have been big steps towards more powerful more

general learning frameworks that can learn how to interact with more complex environments like

beating humans at Go or actually moving, you know, real robotic systems in really incredible

ways but there's still a long way to go so again I've said this a lot I'm gonna keep stressing this

we humans have bodies and we are curious we go explore and we touch and we learn and you know

children are curious and they constantly are soaking up knowledge and when they learn one

thing they can use it through this incredible human power of abstraction they can use what

they learn in one scenario in a totally different scenario that is still a completely open problem

in reinforcement learning maybe not completely open it is a pressing and central challenge in

modern reinforcement learning is how to take what you learn and generalize how to take a

step back and use your expertise in one problem in one environment to solve another problem in

another environment that would be real general artificial intelligence we're a long way away

from that so the good news is that that's not going to be solved in five years or ten years

that's a hundreds of year problem but it's really exciting because that means you know there's work

to be done important interesting work in the field of reinforcement learning for you know

lifetimes of research to be done in figuring out how to improve these systems figuring out

how to learn faster and better both from what you're learning and the reward structure of

this problem but also from what you've learned in other problems okay thank you so much for watching

https://www.youtube.com/watch?v=UF8uR6Z6KLc
