YouTube Transcript:
#1_ Q Learning Algorithm Solved Example | Reinforcement Learning | Machine Learning by Mahesh Huddar
Video Transcript
Welcome back. In this video I will discuss how to apply the Q-learning algorithm to a given problem definition. This is solved example number one; the link to other examples is given in the description below. We are given a building with five rooms, numbered 0 to 4, and the outside is considered one big room, represented as 5. Between rooms there are doors, which means the agent can move, for example, from 0 to 4 or from 4 to 0, and similarly from 1 to 3 or from 3 to 1. What we do is convert this building into states and actions: each room is represented as a state, and each door is represented as an action. So the states are 0, 1, 2, 3, 4, and 5, and the door between 0 and 4 represents an action; the agent can go from 0 to 4 or come back from 4 to 0.
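The room-and-door layout described above can be sketched as a simple adjacency list, where each state (room) maps to the rooms its doors lead to. This is a minimal sketch assuming the standard five-room layout used in the video (doors 0-4, 1-3, 1-5, 2-3, 3-4, 4-5, plus the loop at 5):

```python
# Each state maps to the states reachable through a door.
doors = {
    0: [4],
    1: [3, 5],
    2: [3],
    3: [1, 2, 4],
    4: [0, 3, 5],
    5: [1, 4, 5],  # 5 also loops back to itself (the goal state)
}

def can_move(state, action):
    # An action is valid only if a door connects the two states.
    return action in doors[state]
```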
The other doors work the same way. One more thing: there is also a loop at 5, so the agent can go from 5 to 5, and that is represented here as well. We assume that 5 is the goal state, so we need to identify an optimal path from each and every state to this goal state. One more very important point to remember: any action that leads directly to the goal state gets an immediate reward of 100, and all other actions get a reward of 0. There are three such actions leading to the goal state (1 to 5, 4 to 5, and 5 to 5), and each of them is given an immediate reward of 100. Now we apply the Q-learning algorithm to this state diagram to get the optimal path. The very first thing we need to do is write the reward matrix. The reward matrix has the states as rows and the actions as columns; we have six states, namely 0 to 5, and six actions, again 0 to 5. Now I will show you how to fill this reward matrix. Let us assume that you are present in state 0.
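The reward matrix R described above, with states as rows and actions as columns, -1 for missing edges, 100 for the three actions that enter the goal state 5, and 0 otherwise, can be written out directly. A sketch using NumPy, following the video's layout:

```python
import numpy as np

# R[state, action]: -1 = no door, 0 = door with no reward, 100 = door into goal state 5
R = np.array([
    [-1, -1, -1, -1,  0, -1],   # state 0: only action 4
    [-1, -1, -1,  0, -1, 100],  # state 1: actions 3 and 5
    [-1, -1, -1,  0, -1, -1],   # state 2: only action 3
    [-1,  0,  0, -1,  0, -1],   # state 3: actions 1, 2, 4
    [ 0, -1, -1,  0, -1, 100],  # state 4: actions 0, 3, 5
    [-1,  0, -1, -1,  0, 100],  # state 5: actions 1, 4, 5
])
```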
When you are present in state 0, you can perform only one action: the one that goes to state 4. The reward for that action is 0, and all other entries in that row are -1. So from that row you can understand that in state 0 the only available action is 4. Similarly for the second row: when you are in state 1, you can perform action 3 or action 5. If you perform action 3 the reward is 0; if you perform action 5 the reward is 100. We have to fill the entire matrix this way; -1 indicates there is no direct edge between those states. I have discussed this algorithm in detail in the previous video; the link to that video is in the description below. Go through it so you can understand the Q-learning algorithm in detail, which will help you understand this example.

Coming to the next part of the algorithm: the first thing required is the discount factor gamma, which I will initialize to 0.8. We also need an initial state to start from; I will take the initial state to be 1. Then we initialize the Q matrix to zero, so every state-action entry starts at 0.

Because the initial state is 1, we can perform two actions: action 3 or action 5. If you perform action 3 you get an immediate reward of 0, and if you perform action 5 you get an immediate reward of 100. Between these two we need to select one action; let us assume I select action 5. Then I get an immediate reward of 100 and the next state becomes 5. So the current state is 1, the next state is 5, and the immediate reward is 100. Now, for the next state 5, we need to identify which actions can be performed there. Looking at row 5 of the reward matrix, you can perform action 1 (the entry is 0), action 4 (the entry is 0), or action 5 (the entry is 100); you cannot perform action 0, because that entry is -1.

Now we apply the Q-learning equation: Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]. The current state is 1 and the selected action is 5, so R(1, 5) = 100. Gamma is 0.8, multiplied by the maximum of Q(next state, all actions); the next state is 5 and its possible actions are 1, 4, and 5, so we take the maximum of Q(5, 1), Q(5, 4), and Q(5, 5). In the Q matrix everything is still 0, so the maximum is 0, and 0.8 times 0 is 0. R(1, 5) is 100, so Q(1, 5) = 100 + 0.8 * 0 = 100. That is what you can see in the Q matrix.

What has happened is that we started in the initial state 1, applied one Q-learning update, and reached the goal state. Because we reached the goal state, one episode has finished. We now perform the same kind of episode starting from other initial states. For the next episode I will take the initial state to be 3. From state 3 you can perform action 1, action 2, or action 4, because those entries in the reward matrix are not -1; the other actions cannot be performed. Between these three we select one action; let us say I select action 1. So the current state (the initial state) is 3, the next state is 1, and the immediate reward for going from 3 to 1 is 0; that is one more important point to remember. Once 1 is the next state, the actions you can perform there are action 3 and action 5. Putting all of this into the equation: Q(3, 1) = R(3, 1) + 0.8 * max[Q(1, 3), Q(1, 5)]. R(3, 1) is 0. Q(1, 3) is 0 and Q(1, 5) is 100, so the maximum of the two is 100, and 100 multiplied by 0.8 is 80. Therefore Q(3, 1) = 0 + 80 = 80; the entry Q(3, 1), which was initially 0, has now become 80.

We have done only two episodes so far; the same process has to be repeated again and again. Once you do this repeatedly, you come up with the final Q matrix. Given that Q matrix, you can trace a path from any state. Let us assume we are present in initial state 2 and we try to get the best path. Either you can draw the state diagram or you can trace it in the matrix. When you are in state 2, you select the best value from that row; there is only one best value, 64, which leads to state 3. In state 3 there are three possible values: 80, 51, and 80. Between these, 80 is the best value, and it appears twice, so you can select either action 1 or action 4. If you select 1, then from state 1 the best possible action is 5, giving the path 2 to 3, 3 to 1, 1 to 5. Similarly, if you select 4, then from state 4 the best action is 5, giving the path 2 to 3, 3 to 4, 4 to 5. This is how we can apply the Q-learning algorithm to any given problem definition. I hope the concept is clear. If you like the video, do like and share it with your friends, press the subscribe button for more videos, and press the bell icon for regular updates. Thank you for watching.
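The whole procedure walked through above (initialize Q to zero, run episodes applying Q(s, a) = R(s, a) + gamma * max Q(s', all actions) until the values settle, then trace the greedy path) can be sketched in a few lines. The matrix R and gamma = 0.8 follow the video; the random 1000-episode schedule is an assumption, since the video only says to repeat "again and again":

```python
import numpy as np

rng = np.random.default_rng(0)

# Reward matrix from the video: -1 = no door, 100 = door into the goal state 5.
R = np.array([
    [-1, -1, -1, -1,  0, -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1, -1],
    [-1,  0,  0, -1,  0, -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
gamma = 0.8  # discount factor used in the video
GOAL = 5
Q = np.zeros((6, 6))

def valid_actions(state):
    # Actions with a door from this state are the entries with reward >= 0.
    return np.flatnonzero(R[state] >= 0)

# Episodes: pick a random start state, take random valid actions, and apply
# Q(s, a) = R(s, a) + gamma * max_a' Q(s', a') at every step until the goal.
for _ in range(1000):
    state = int(rng.integers(0, 6))
    while True:
        action = int(rng.choice(valid_actions(state)))
        next_state = action  # in this example the action label is the room it leads to
        Q[state, action] = R[state, action] + gamma * Q[next_state].max()
        if next_state == GOAL:
            break
        state = next_state

def greedy_path(start):
    """Trace the optimal path by always taking the largest Q value in the row."""
    path, state = [start], start
    while state != GOAL:
        state = int(np.argmax(Q[state]))
        path.append(state)
    return path

# Normalizing Q so its largest entry is 100 reproduces the video's numbers:
# Q(1, 5) = 100, Q(3, 1) = 80, Q(2, 3) = 64, and greedy_path(2) gives
# 2 -> 3 -> 1 -> 5 (or the equally good 2 -> 3 -> 4 -> 5, since both score 80).
Qn = 100 * Q / Q.max()
```

Note that the first two updates of the loop mirror the two hand-worked episodes in the transcript: Q(1, 5) = 100 + 0.8 * 0 = 100 and Q(3, 1) = 0 + 0.8 * 100 = 80; the later episodes then push the values up to their fixed point.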