0:00 Welcome back. In this video I will discuss how to apply the Q-learning algorithm to the given problem definition. This is solved example number one; the links for the other examples are given in the description below.
0:12 In this case we are given a building with five rooms, numbered 0, 1, 2, 3 and 4. The outside of the building is considered one big room and is represented as 5. Between the rooms there are doors, which means the agent can move, for example, from 0 to 4 or from 4 to 0, and similarly from 1 to 3 or from 3 to 1.
0:38 What we do is convert this building into states and actions: each room is represented as a state, and each door is represented as an action. This is how it looks. The states are 0, 1, 2, 3, 4 and 5. Between 0 and 4 there is a door, and it represents an action, so the agent can go from 0 to 4 or come back from 4 to 0. The other doors work the same way. One more thing: there is also an edge from 5 back to itself, so the agent can go from 5 to 5, and that is represented here as well.
1:20 In this case we assume that 5 is the goal state, so we need to identify an optimal path from each and every state to this goal state.
1:31 One more very important thing to remember: any action that leads directly to the goal state gets an instant reward of 100, and all the remaining actions get a reward of 0. So this action, this second action and this third action, which all lead to the goal state, are each given an instant reward of 100, and the rest are 0.
1:55 Now we will apply the Q-learning algorithm to this state diagram to get the optimal path. The very first thing we need to do is write the reward matrix. The reward matrix has the states as its rows and the actions as its columns; in this case we have six states, 0 to 5, and six actions, again 0 to 5.
2:28 Now I will show you how to fill this reward matrix. Assume you are in state 0. From state 0 you can perform only one action, the one that leads to state 4; its reward is 0 and all the remaining entries in that row are -1. So from this row you can see that in state 0 the only possible action is 4. Similarly, look at the second row: in state 1 you can perform action 3 or action 5. Action 3 gives a reward of 0 and action 5 gives a reward of 100. The rest of the matrix is filled the same way, and -1 indicates that there is no direct edge between those states.
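To make this concrete, here is a minimal sketch of the reward matrix in Python/NumPy. Only some rows are read out explicitly in the video; the others are filled in from the door layout described above, so treat those entries as an assumption.

    import numpy as np

    # Rows = states 0..5, columns = actions 0..5.
    # -1 means no door between those states; 100 marks actions that reach the goal state 5.
    R = np.array([
        [-1, -1, -1, -1,  0, -1],   # state 0: only action 4
        [-1, -1, -1,  0, -1, 100],  # state 1: actions 3 (reward 0) and 5 (reward 100)
        [-1, -1, -1,  0, -1, -1],   # state 2: only action 3 (assumed from the door layout)
        [-1,  0,  0, -1,  0, -1],   # state 3: actions 1, 2, 4
        [ 0, -1, -1,  0, -1, 100],  # state 4: actions 0, 3, 5 (assumed from the door layout)
        [-1,  0, -1, -1,  0, 100],  # state 5: actions 1, 4, 5 (self-loop at the goal)
    ])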
3:31 I discussed this algorithm in detail in the previous video; the link for that video is in the description below. Go through it so that you understand the Q-learning algorithm in detail; that will help you follow this example.
3:49 Coming to the next part of the algorithm: first we need a learning rate (gamma), which I will initialize to 0.8. We also need to start from some initial state; I will take the initial state as 1. Then we need to initialize the Q matrix, which starts at 0, so we put 0 for every state and action.
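Continuing the same sketch, this set-up would look as follows (gamma is what the video calls the learning rate; in standard Q-learning notation it is the discount factor):

    gamma = 0.8              # value used throughout the example
    Q = np.zeros((6, 6))     # rows = states 0..5, columns = actions 0..5, all zero initially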
4:14 Now, as I said earlier, I will take the initial state as 1. Because the initial state is 1, we can perform two actions: action 3 or action 5. If you perform action 3 you get an immediate reward of 0, and if you perform action 5 you get an immediate reward of 100. Between these two we need to select one action; let us assume I select action 5. If I select action 5, I get the immediate reward of 100 and the next state becomes 5. So the current state is 1, the next state is 5, and the immediate reward is 100. Now that the next state is 5, we need to identify which actions we can perform from it. We can perform action 1 (the entry is 0), action 4 (the entry is 0) and action 5 (the entry is 100). We cannot perform any other action; for example, action 0 is not possible because its entry is -1. So from state 5 we can perform action 1, 4 or 5.
5:36 Now we apply the Q-learning equation: Q(current state, action) = R(current state, action) + gamma * max[Q(next state, all actions)]. The current (initial) state is 1 and the selected action is 5, so we need Q(1, 5). R(1, 5) is 100. Gamma, the learning rate, is 0.8, and it multiplies the maximum of Q over the next state and all its actions; the next state is 5 and the possible actions there are 1, 4 and 5, so we take the maximum of Q(5, 1), Q(5, 4) and Q(5, 5). In the Q matrix these are all still 0, so the maximum is 0, and 0 multiplied by 0.8 is 0. R(1, 5) is 100, so Q(1, 5) becomes 100, and that is what you can see in the updated Q matrix.
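Against the R and Q arrays sketched above, this first update is a single line:

    # Current state 1, chosen action 5, next state 5:
    # Q(1, 5) = R(1, 5) + gamma * max(Q(5, 1), Q(5, 4), Q(5, 5)) = 100 + 0.8 * 0 = 100
    Q[1, 5] = R[1, 5] + gamma * max(Q[5, 1], Q[5, 4], Q[5, 5])
    print(Q[1, 5])   # 100.0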
6:53 What has happened here is that we started in the initial state 1, and after applying the Q-learning update we reached the goal state. Because we reached the goal state, one episode has finished. Now we need to run the same kind of episode from each of the other initial states. For the next episode I will take the initial state as 3. If 3 is the initial state, you can see that we can perform action 1, action 2 or action 4, because those entries have values; all the remaining entries are -1, so those actions are not possible, which is what I have written here.
7:36 Between these three actions we need to select one; let us say I select action 1. So the current (initial) state is 3 and the next state is 1, and going from state 3 to state 1 gives an immediate reward of 0; that is one more important point to remember. Once 1 is the next state, the actions we can perform from it are action 3 and action 5, so those two go into the equation. Again, the initial state is 3 and the selected action is 1, so we need Q(3, 1) = R(3, 1) + gamma * max[Q(next state, all actions)]. R(3, 1) is 0, which is what I have written here, and 0.8 is the gamma value. The next state is 1 and the actions we can perform there are 3 and 5, so we take the maximum of Q(1, 3) and Q(1, 5). Q(1, 3) is 0 and Q(1, 5) is 100, so the maximum is 100, and 100 multiplied by 0.8 is 80. R(3, 1) is 0, so this gives 80. The value of Q(3, 1), which was 0 initially, has now become 80.
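The same one-line sketch for this second update:

    # Current state 3, chosen action 1, next state 1:
    # Q(3, 1) = R(3, 1) + gamma * max(Q(1, 3), Q(1, 5)) = 0 + 0.8 * 100 = 80
    Q[3, 1] = R[3, 1] + gamma * max(Q[1, 3], Q[1, 5])
    print(Q[3, 1])   # 80.0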
9:27 Now we need to keep performing episodes, because we have completed only two episodes so far. The same procedure has to be repeated again and again, and once you do that you will end up with this final Q matrix.
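A minimal training-loop sketch of this repetition; the number of episodes and the purely random choice of start state and action are assumptions, since the video does not state them:

    import random

    GOAL = 5
    for _ in range(1000):                                        # assumed episode count
        state = random.randint(0, 5)                             # random initial state
        while state != GOAL:                                     # an episode ends at the goal
            valid = [a for a in range(6) if R[state, a] != -1]   # actions with a door
            action = random.choice(valid)
            next_state = action                                  # the action index is the next room
            Q[state, action] = R[state, action] + gamma * Q[next_state].max()
            state = next_state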
9:44 Once you have this Q matrix, you can trace the optimal path from any state. Let us assume we are in the initial state 2 and want the best path; you can either draw the state diagram or trace it directly in the matrix. When you are in state 2, you select the best value from that row. Here there is only one value, 64, which leads to state 3. In state 3 there are three possible values, 80, 51 and 80; between these, 80 is the best, so you can select either action 1 or action 4. If you select 1 it follows one path, and if you select 4 it follows the other. If you selected 1, the best action from state 1 is 5, so the path is 2 to 3, 3 to 1, 1 to 5. Similarly, if you selected 4, the best action from state 4 is 5, so the path is 2 to 3, 3 to 4 and then 4 to 5.
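A small sketch of this greedy trace, assuming ties are broken by taking the lowest-numbered best action (np.argmax returns the first maximum it finds):

    def trace_path(Q, start, goal=5, max_steps=10):
        path = [start]
        state = start
        while state != goal and len(path) <= max_steps:
            state = int(np.argmax(Q[state]))   # best action from the current row; action = next room
            path.append(state)
        return path

    print(trace_path(Q, 2))   # [2, 3, 1, 5] with this tie-breaking; choosing 4 at state 3 gives [2, 3, 4, 5]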
10:54 So this is how we can apply the Q-learning algorithm to any given problem definition.
11:00 I hope the concept is clear. If you like the video, do like and share it with your friends. Press the subscribe button for more videos and the bell icon for regular updates. Thank you for watching.