YouTube Transcript:
#1_ Q Learning Algorithm Solved Example | Reinforcement Learning | Machine Learning by Mahesh Huddar
Welcome back. In this video I will discuss how to apply the Q-learning algorithm to a given problem definition. This is solved example number one; the links to the other examples are given in the description below.
In this case we are given a building with five rooms, numbered 0, 1, 2, 3, and 4. The outside of the building is treated as one big room and is represented as 5. Between certain rooms there are doors, which means the agent can move between them: for example, from 0 to 4 or from 4 to 0, and similarly from 1 to 3 or from 3 to 1.
So what we do is convert this building into states and actions: each room is represented as a state, and each door is represented as an action. The state diagram therefore has the states 0, 1, 2, 3, 4, and 5. Between 0 and 4 there is a door, which is represented as an action, so the agent can go from 0 to 4 or come back from 4 to 0; the other doors are handled the same way. One more thing: there is also a door from 5 back to 5, so the agent can go from 5 to 5, and that self-loop is represented in the diagram as well.

In this case we assume that 5 is the goal state, so we need to identify an optimal path from each and every state to this goal state.
One more very important thing to remember: any action that leads directly to the goal state gets an instant reward of 100, and all remaining actions get a reward of 0. There are three such actions in the diagram (into 5 from room 1, into 5 from room 4, and the 5-to-5 self-loop), and each of them is given an instant reward of 100; the rest are 0.
Now we apply the Q-learning algorithm to this state diagram to find the optimal path. The very first thing we need to do is write the reward matrix. The reward matrix has the states as its rows and the actions as its columns; here we have six states, namely 0 to 5, and likewise six actions, 0 to 5.
Let me show how to fill this reward matrix. Assume you are in state 0. From state 0 you can perform only one action, namely action 4, which takes you to room 4; its reward is 0, and every other entry in that row is -1. So from this row you can see that in state 0 the only available action is 4.

Similarly, consider the second row. In state 1 you can perform action 3 or action 5. If you perform action 3 the reward is 0; if you perform action 5, which leads to the goal, the reward is 100. The entire matrix is filled in the same way. A -1 indicates that there is no direct edge (no door) between those two states.
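The reward matrix described above can be written out directly. Here is a minimal sketch in Python; the `valid_actions` helper is an illustrative addition, not something from the video:

```python
# Reward matrix R for the 5-room building (state 5 = outside = goal).
# Rows are current states 0..5, columns are actions 0..5; taking action a
# moves the agent into room a. -1 marks "no door", 100 marks an action
# that enters the goal state 5, and 0 marks any other legal move.
R = [
    [-1, -1, -1, -1,  0,  -1],  # from room 0: only door leads to 4
    [-1, -1, -1,  0, -1, 100],  # from room 1: doors to 3 and to 5 (goal)
    [-1, -1, -1,  0, -1,  -1],  # from room 2: only door leads to 3
    [-1,  0,  0, -1,  0,  -1],  # from room 3: doors to 1, 2 and 4
    [ 0, -1, -1,  0, -1, 100],  # from room 4: doors to 0, 3 and 5 (goal)
    [-1,  0, -1, -1,  0, 100],  # from 5: back to 1 or 4, or stay at 5
]

def valid_actions(state):
    """Entries with R >= 0 are the doors available from this state."""
    return [a for a, r in enumerate(R[state]) if r >= 0]

print(valid_actions(0))  # [4]
print(valid_actions(1))  # [3, 5]
```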
I have discussed the Q-learning algorithm itself in detail in the previous video; the link to that video is in the description below. Go through it so that you understand the algorithm in detail, which will help you follow this example.
Coming to the next part of the algorithm: the first thing required is the parameter gamma, which I will initialize to 0.8 (in the update rule it plays the role of the discount factor). We also need to start from some initial state; I will take the initial state to be 1. Finally, we need to initialize the Q matrix, which starts at 0 for every state-action pair.
As I said, I will take the initial state to be 1. Because the current state is 1, two actions are available: action 3 or action 5. Performing action 3 gives an immediate reward of 0, while performing action 5 gives an immediate reward of 100. Between these two we must select one action; let us assume I select action 5. If I select action 5, the immediate reward is 100 and the next state becomes 5. So the current state is 1, the next state is 5, and the immediate reward is 100.

Now that the next state is 5, we need to identify which actions can be performed there: you can perform action 1 (its reward entry is 0), action 4 (also 0), or action 5 (reward 100).
You cannot perform any other action from state 5; for example, action 0 has the entry -1, so only actions 1, 4, and 5 are possible. Now recall the Q-learning equation: Q(state, action) = R(state, action) + gamma × max[Q(next state, all actions)]. The current state is 1 and the selected action is 5, so we are computing Q(1, 5). R(1, 5) is 100. Gamma is 0.8, and it multiplies the maximum of Q over the next state and all its possible actions: the next state is 5 and its possible actions are 1, 4, and 5, so we take the maximum of Q(5, 1), Q(5, 4), and Q(5, 5). In the Q matrix all of these are currently 0, so the maximum is 0, and 0 multiplied by 0.8 is 0. Since R(1, 5) is 100, we get Q(1, 5) = 100 + 0.8 × 0 = 100. That is what you can see in the updated Q matrix.
Notice what has happened: we started in the initial state 1, applied one Q-learning update, and reached the goal state. Because the goal state has been reached, one episode is finished. We now perform the same kind of episode for other initial states.
For the next episode I will take the initial state to be 3. From state 3 you can perform action 1, action 2, or action 4, because those entries have values in the reward matrix; the remaining entries are -1, so those actions cannot be performed. Between these three we must select one action; let us say I select action 1. So the current (initial) state is 3 and the next state is 1. For moving from state 3 to 1 the immediate reward is 0; that is one more important point to remember.
Once the next state is 1, the actions possible there are action 3 and action 5. Putting everything into the equation: the initial state is 3 and the selected action is 1, so we are computing Q(3, 1). R(3, 1) is 0. Gamma is 0.8, and it multiplies the maximum of Q over the next state 1 and its possible actions 3 and 5, i.e. the maximum of Q(1, 3) and Q(1, 5). Q(1, 3) is 0 and Q(1, 5) is 100, so the maximum is 100, and 100 multiplied by 0.8 is 80. Since R(3, 1) is 0, we get Q(3, 1) = 0 + 80 = 80. You can see this in the Q matrix: Q(3, 1) was 0 initially and has now become 80.
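The two updates worked through above can be checked with a short, self-contained sketch in Python. The reward matrix is the one from the state diagram, and `q_update` is a hypothetical helper implementing the equation Q(s, a) = R(s, a) + gamma × max Q(s′, ·):

```python
GAMMA = 0.8  # gamma from the update rule

# Reward matrix from the state diagram; -1 = no door, action a leads to room a.
R = [[-1, -1, -1, -1,  0,  -1],
     [-1, -1, -1,  0, -1, 100],
     [-1, -1, -1,  0, -1,  -1],
     [-1,  0,  0, -1,  0,  -1],
     [ 0, -1, -1,  0, -1, 100],
     [-1,  0, -1, -1,  0, 100]]

Q = [[0.0] * 6 for _ in range(6)]  # Q matrix, initialized to all zeros

def q_update(state, action):
    """Q(s, a) = R(s, a) + gamma * max_a' Q(s', a'); here s' = a because
    the action label is the room the agent moves into."""
    Q[state][action] = R[state][action] + GAMMA * max(Q[action])
    return Q[state][action]

# Episode 1: start in state 1, pick action 5 -> Q(1,5) = 100 + 0.8*0 = 100
print(q_update(1, 5))  # 100.0
# Episode 2: start in state 3, pick action 1 -> Q(3,1) = 0 + 0.8*100 = 80
print(q_update(3, 1))  # 80.0
```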
We have completed only two episodes so far, so the same procedure has to be repeated again and again. Once you do that, you will arrive at the final Q matrix. With this final Q matrix you can trace the optimal path from any state, either by drawing the state diagram or by reading it directly off the matrix.

For example, suppose we are in the initial state 2 and want the best path. In row 2 there is only one best value, 64, which leads to state 3. In row 3 there are three possible values, 80, 51, and 80; the best value is 80, and it occurs for both action 1 and action 4, so you can select either one. If you select 1, then from state 1 the best possible value leads to 5, giving the path 2 → 3 → 1 → 5. If instead you select 4, then from state 4 the best value again leads to 5, giving the path 2 → 3 → 4 → 5.
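The "repeat until the final Q matrix" step and the path tracing can be sketched as follows. This is an illustrative simplification, not the exact procedure from the video: instead of sampling random episodes, it sweeps every legal state-action pair until the values stop changing, which reaches the same fixed point for this deterministic problem; the goal state is treated as absorbing (its row is never updated), so the values land on the 0-to-100 scale shown in the video:

```python
GAMMA = 0.8
R = [[-1, -1, -1, -1,  0,  -1],
     [-1, -1, -1,  0, -1, 100],
     [-1, -1, -1,  0, -1,  -1],
     [-1,  0,  0, -1,  0,  -1],
     [ 0, -1, -1,  0, -1, 100],
     [-1,  0, -1, -1,  0, 100]]
Q = [[0.0] * 6 for _ in range(6)]

# Sweep all legal (state, action) pairs of the non-goal states until
# no Q value changes any more (the "again and again" of the episodes).
changed = True
while changed:
    changed = False
    for s in range(5):          # skip goal state 5: episodes end there
        for a in range(6):
            if R[s][a] >= 0:    # only actions with a door
                new = R[s][a] + GAMMA * max(Q[a])  # next state = a
                if new != Q[s][a]:
                    Q[s][a] = new
                    changed = True

def trace(state, goal=5):
    """Greedy path: from each state take the action with the largest Q value.
    Ties (as at state 3, where actions 1 and 4 both score 80) are broken by
    taking the first maximum."""
    path = [state]
    while state != goal:
        state = max(range(6), key=lambda a: Q[state][a])
        path.append(state)
    return path

print(trace(2))  # [2, 3, 1, 5]
```

Row 3 of the converged Q matrix comes out as 80, 51.2, and 80 for actions 1, 2, and 4, matching the 80 / 51 / 80 read off in the video, and the traced path from state 2 is one of the two optimal routes described above.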
So this is how we can apply the Q-learning algorithm to any given problem definition. I hope the concept is clear. If you liked the video, do like and share it with your friends, press the subscribe button for more videos, and press the bell icon for regular updates. Thank you for watching.