0:11 Hi everyone, this is Zulaikha from Edureka,
0:13 and I welcome you to this session
0:15 on Artificial Intelligence full course.
0:17 In this video, I'll be covering
0:19 all the domains and the concepts
0:20 involved under the umbrella of artificial intelligence,
0:23 and I will also be showing you a couple of use cases
0:26 and practical implementations by using Python.
0:29 So there's a lot to cover in this session,
0:31 and let me quickly run you through today's agenda.
0:34 So we're gonna begin the session
0:35 by understanding the history of artificial intelligence
0:38 and how it came into existence.
0:40 We'll follow this by looking at
0:42 why we're talking about artificial intelligence now,
0:45 and why it has become so famous right now.
0:47 Then we'll look at what exactly is artificial intelligence.
0:51 We'll discuss the applications of artificial intelligence,
0:54 after which we'll discuss the basics of AI
0:56 wherein we'll understand
0:57 the different types of artificial intelligence.
1:00 We'll follow this by understanding
1:01 the different programming languages
1:03 that can be used to study AI.
1:05 And we'll understand why we're gonna choose Python.
1:08 Alright, I'll introduce you to Python.
1:10 And then we'll move on and discuss machine learning.
1:13 Here we'll discuss the different types of machine learning,
1:15 the different algorithms involved in machine learning,
1:18 which include classification algorithms,
1:20 regression algorithms, clustering,
1:22 and association algorithms.
1:24 To make you understand machine learning better,
1:26 we'll run a couple of demos
1:28 wherein we'll see how machine learning algorithms
1:30 are used to solve real world problems.
1:32 After that, we'll discuss
1:33 the limitations of machine learning
1:35 and why deep learning is needed.
1:37 I'll introduce you to the deep learning concept,
1:40 what are neurons, perceptrons,
1:42 multiple layer perceptrons and so on.
1:44 We'll discuss the different types of neural networks,
1:46 and we'll also look at what exactly back propagation is.
1:49 Apart from this, we'll be running a demo
1:51 to understand deep learning in more depth.
1:54 And finally we'll move onto the next module,
1:56 which is natural language processing.
1:58 Under natural language processing,
1:59 we'll try to understand what text mining is,
2:02 the difference between text mining and NLP,
2:04 what are the different terminologies in NLP,
2:06 and we'll end the session by looking at
2:08 the practical implementation of NLP using Python, alright.
2:12 So guys, there's a lot to cover in today's session.
2:15 Also, if you want to stay updated
2:16 about the recent technologies,
2:18 and would like to learn more about trending technologies,
2:20 make sure you subscribe to our YouTube channel
2:22 to never miss out on such sessions.
2:25 So let's move ahead and take a look at our first topic
2:28 which is history of artificial intelligence.
2:30 So guys, the concept of artificial intelligence
2:33 goes back to the classical ages.
2:35 In Greek mythology,
2:37 the concept of machines and mechanical men
2:39 was well thought of.
2:41 So, an example of this is Talos.
2:43 I don't know how many of you have heard of this.
2:45 Talos was a giant animated bronze warrior
2:48 who was programmed to guard the island of Crete.
2:51 Now these are just ideas.
2:52 Nobody knows if this was actually implemented,
2:55 but machine learning and AI were thought of long ago.
2:59 Now let's get back to the 20th century.
3:01 Now 1950 was speculated to be
3:03 one of the most important years
3:05 for the introduction of artificial intelligence.
3:08 In 1950, Alan Turing published a paper
3:11 in which he speculated about the possibility
3:14 of creating machines that think.
3:16 So he created what is known as the Turing test.
3:19 This test is basically used to determine
3:22 whether or not a computer can think intelligently
3:24 like a human being.
3:26 He noted that thinking is difficult to define
3:29 and devised his famous Turing test.
3:32 So, basically, if a machine can carry out a conversation
3:35 that was indistinguishable
3:37 from a conversation with a human being,
3:39 it was reasonable to say that the machine is thinking,
3:42 meaning that the machine will pass the Turing test.
3:45 Now, unfortunately, up to this date,
3:47 we haven't found a machine that has fully cleared
3:49 the Turing test.
3:51 So, the Turing test was actually the first serious proposal
3:54 in the philosophy of artificial intelligence.
3:57 Followed by this was the era of 1951.
4:00 This was also known as the era of game AI.
4:02 So in 1951, by using the Ferranti Mark 1 machine
4:06 of the University of Manchester,
4:08 a computer scientist known as Christopher Strachey
4:11 wrote a checkers program.
4:13 And at the same time,
4:14 a program was written for chess as well.
4:16 Now, these programs were later improved and redone,
4:20 but this was the first attempt
4:21 at creating programs that could play chess
4:23 or that would compete with humans in playing chess.
4:27 This is followed by the year 1956.
4:29 Now, this is probably the most important year
4:32 in the invention of AI.
4:34 Because in 1956, for the first time,
4:37 the term artificial intelligence was coined.
4:39 Alright.
4:40 So the term artificial intelligence
4:41 was coined by John McCarthy
4:43 at the Dartmouth Conference in 1956.
4:47 Coming to the year 1959,
4:49 the first AI laboratory was established.
4:52 This period marked the research era for AI.
4:55 So the first AI lab where research was performed
4:58 is the MIT lab,
4:59 which is still running to this date.
5:01 In 1960, the first robot was introduced
5:04 to the General Motors assembly line.
5:06 In 1961, the first chatbot was invented.
5:10 Now we have Siri, we have Alexa.
5:12 But in 1961,
5:14 there was a chatbot known as Eliza,
5:16 which was introduced.
5:17 This is followed by the famous IBM Deep Blue.
5:21 In 1997, the news broke
5:23 that IBM's Deep Blue beat the world champion,
5:26 Garry Kasparov, in the game of chess.
5:29 So this was kind of the first accomplishment of AI.
5:32 It was able to beat the world champion at chess.
5:35 So in 2005, when the DARPA Grand Challenge was held,
5:39 a robotic car named Stanley,
5:41 which was built by Stanford's racing team,
5:44 won the DARPA Grand Challenge.
5:46 That was another big accomplishment of AI.
5:48 In 2011, IBM's question answering system, Watson,
5:53 defeated the two greatest Jeopardy champions,
5:56 Brad Rutter and Ken Jennings.
5:58 So guys, this was how AI evolved.
6:00 It started off as a hypothetical situation.
6:03 Right now it's the most important technology
6:05 in today's world.
6:06 If you look around everywhere,
6:08 everything around us is run by AI, deep learning,
6:10 or machine learning.
6:12 So since the emergence of AI in the 1950s,
6:15 we have actually seen an exponential growth
6:17 in its potential.
6:19 So AI covers domains such as machine learning,
6:22 deep learning, neural networks,
6:23 natural language processing,
6:25 knowledge bases, expert systems, and so on.
6:28 It has also made its way into computer vision
6:30 and image processing.
6:32 Now the question here is if AI has been here
6:34 for over half a century,
6:36 why has it suddenly gained so much importance?
6:39 Why are we talking about artificial intelligence now?
6:43 Let me tell you the main reasons for the demand of AI.
6:46 The first reason is that we have
6:48 more computation power now.
6:50 So, artificial intelligence requires
6:53 a lot of computing power.
6:55 Recently, many advances have been made
6:57 and complex deep learning models are deployed.
7:00 And one of the greatest technologies
7:02 that made this possible is GPUs.
7:04 Since we have more computational power now,
7:06 it is possible for us to implement AI
7:09 in our daily lives.
7:10 Second most important reason is that
7:12 we have a lot of data at present.
7:15 We're generating data at an immeasurable pace.
7:18 We are generating data through social media,
7:21 through IoT devices.
7:22 Every possible way, there's a lot of data.
7:24 So we need to find a method or a solution
7:27 that can help us process this much data,
7:30 and help us derive useful insights,
7:32 so that we can grow business with the help of data.
7:35 Alright, so, that process
7:36 is basically artificial intelligence.
7:39 So, in order to have a useful AI agent
7:41 that can make smart decisions,
7:43 like telling you which item to recommend
7:45 next when you shop online
7:47 or how to classify an object from an image,
7:50 AI agents are trained on large data sets,
7:52 and big data enables us to do this more efficiently.
7:57 Next reason is now we have better algorithms.
8:00 Right now we have very effective algorithms
8:02 which are based on the idea of neural networks.
8:05 Neural networks are nothing but
8:06 the concept behind deep learning.
8:08 Since we have better algorithms
8:10 which can do better computations
8:12 and quicker computations with more accuracy,
8:15 the demand for AI has increased.
8:17 Another reason is that universities, governments,
8:20 startups, and tech giants are all investing in AI.
8:23 Okay, so companies like Google, Amazon,
8:25 Facebook, Microsoft,
8:27 all of these companies have heavily invested
8:29 in artificial intelligence
8:31 because they believe that AI is the future.
8:34 So AI is rapidly growing both as a field of study
8:37 and also as an economy.
8:39 So, actually, this is the right time
8:41 for you to understand what is AI and how it works.
8:44 So let's move on and understand
8:46 what exactly artificial intelligence is.
8:49 The term artificial intelligence
8:50 was first coined in the year 1956 by John McCarthy
8:54 at the Dartmouth Conference.
8:56 I already mentioned this before.
8:58 It was the birth of AI in the 1956.
9:01 Now, how did he define artificial intelligence?
9:04 John McCarthy defined AI as the science and engineering
9:08 of making intelligent machines.
9:10 In other words, artificial intelligence
9:13 is the theory and development of computer systems
9:16 able to perform tasks
9:17 that normally require human intelligence,
9:20 such as visual perception, speech recognition,
9:23 decision making, and translation between languages.
9:26 So guys, in a sense, AI is a technique of getting machines
9:30 to work and behave like humans.
9:32 In the recent past, artificial intelligence
9:34 has been able to accomplish this
9:36 by creating machines and robots
9:38 that have been used in a wide range of fields,
9:41 including healthcare, robotics, marketing,
9:43 business analytics, and many more.
9:46 With this in mind,
9:47 let's discuss a couple of real-world applications of AI,
9:49 so that you understand how important artificial intelligence
9:52 is in today's world.
9:54 Now, one of the most famous applications
9:56 of artificial intelligence
9:57 is the Google predictive search engine.
10:00 When you begin typing a search term
10:02 and Google makes recommendations for you to choose from,
10:05 that is artificial intelligence in action.
10:08 So predictive searches are based on data
10:10 that Google collects about you,
10:12 such as your browser history, your location,
10:15 your age, and other personal details.
10:17 So by using artificial intelligence,
10:20 Google attempts to guess what you might be trying to find.
10:23 Now behind this,
10:24 there's a lot of natural language processing,
10:26 deep learning, and machine learning involved.
10:28 We'll be discussing all of those concepts
10:30 in the further slides.
10:31 It's not very simple to create a search engine,
10:34 but the logic behind Google search engine
10:37 is artificial intelligence.
10:39 Moving on, in the finance sector,
10:41 JP Morgan Chase's Contract Intelligence Platform
10:45 uses machine learning, artificial intelligence,
10:47 and image recognition software
10:49 to analyze legal documents.
10:51 Now let me tell you that manually reviewing
10:54 around 12,000 agreements took over 36,000 hours.
10:58 That's a lot of time.
11:00 But as soon as this task was taken over by an AI machine,
11:03 it was able to do this in a matter of seconds.
11:06 So that's the difference between artificial intelligence
11:09 and manual or human work.
11:11 Even though AI cannot think and reason like humans,
11:14 its computational power is very strong
11:17 compared to humans.
11:19 Because of machine learning algorithms,
11:20 deep learning concepts, and natural language processing,
11:23 AI has reached a stage wherein it can compute
11:26 the most complex of problems
11:28 in a matter of seconds.
11:29 Coming to healthcare, IBM is one of the pioneers
11:32 that has developed AI software,
11:34 specifically for medicine.
11:36 Let me tell you that more than 230 healthcare organizations
11:40 use IBM AI technology,
11:42 which is basically IBM Watson.
11:44 In 2016, IBM Watson technology was able to cross reference
11:49 20 million oncology records quickly
11:52 and correctly diagnose a rare leukemia
11:54 condition in a patient.
11:56 So, it basically went through 20 million records,
11:59 which it probably did in a matter of seconds
12:01 or a few minutes at most.
12:02 And then it correctly diagnosed a patient
12:05 with a rare leukemia.
12:06 Knowing that machines are now used
12:08 in medical fields as well,
12:10 it shows how important AI has become.
12:12 It has reached every domain of our lives.
12:15 Let me give you another example.
12:17 Google's AI Eye Doctor
12:19 is another initiative taken by Google,
12:22 where they're working with an Indian eye care chain
12:24 to develop an artificial intelligence system
12:27 which can examine retinal scans
12:29 and identify a condition called
12:32 diabetic retinopathy which can cause blindness.
12:35 Now in social media platforms like Facebook,
12:38 artificial intelligence is used for face verification
12:42 wherein you make use of machine learning
12:44 and deep learning concepts
12:45 in order to detect facial features and tag your friends.
12:48 The auto-tagging feature that you see in Facebook,
12:51 behind that there's machine learning,
12:53 deep learning, and neural networks.
12:55 It's all AI behind it.
12:57 So we're actually unaware
12:58 that we use AI very regularly in our life.
13:01 All the social media platforms
13:02 like Instagram, Facebook, Twitter,
13:05 they heavily rely on artificial intelligence.
13:08 Another such example is Twitter's AI
13:10 which is being used to identify any sort of hate speech
13:14 and terrorism-related language in tweets.
13:17 So again, it makes use of machine learning,
13:19 deep learning, natural language processing
13:22 in order to filter out any offensive
13:24 or any reportable content.
13:27 Now recently, the company discovered
13:29 around 300,000 terrorist-linked accounts,
13:32 and 95% of these were found by non-human,
13:36 artificially intelligent machines.
13:38 Coming to virtual assistants,
13:40 we have virtual assistants like Siri and Alexa right now.
13:43 Let me tell you about another newly released
13:45 Google's virtual assistant called the Google Duplex,
13:48 which has astonished millions of people around the world.
13:51 Not only can it respond to calls
13:53 and book appointments for you,
13:55 it also adds a human touch.
13:58 So it adds human filters and all of that.
14:00 It makes it sound very realistic.
14:02 It's actually very hard to distinguish between
14:05 human and the AI speaking over the phone.
14:08 Another famous application of AI is self-driving cars.
14:12 So, artificial intelligence implements computer vision,
14:15 image detection, deep learning,
14:17 in order to build cars
14:18 that can automatically detect any objects or any obstacles
14:22 and drive around without human intervention.
14:25 So these are fully automated self-driving cars.
14:28 Also, Elon Musk talks a lot about how AI is implemented
14:32 in Tesla's self-driving cars.
14:34 He quoted that Tesla will have fully self-driving cars
14:37 ready by the end of the year,
14:39 and a robo-taxi version that can ferry passengers
14:42 without anyone behind the wheel.
14:44 So if you look at it, AI is actually used
14:46 by the tech giants.
14:48 A lot of tech giant companies like Google, Tesla, Facebook,
14:52 all of these data-driven companies.
14:54 In fact, Netflix also makes use of AI.
14:57 So, coming to Netflix.
14:58 So with the help of artificial intelligence
15:01 and machine learning,
15:02 Netflix has developed a personalized movie recommendation system
15:06 for each of its users.
15:08 So if each of you opened up Netflix
15:10 and if you look at the type of movies
15:11 that are recommended to you, they are different.
15:14 This is because Netflix studies
15:16 each user's personal details,
15:18 and tries to understand what each user is interested in
15:21 and what sort of movie patterns each user has,
15:24 and then it recommends movies to them.
15:27 So Netflix uses the watching history of other users
15:30 with similar taste to recommend
15:32 what you may be most interested in watching next,
15:34 so that you can stay engaged
15:36 and continue your monthly subscription.
15:39 Also, there's a known fact that over 75% of what you watch
15:43 is recommended by Netflix.
15:46 So their recommendation engine is brilliant.
15:48 And the logic behind their recommendation engine
15:51 is machine learning and artificial intelligence.
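To make that "users with similar taste" idea concrete, here is a toy sketch in Python. This is not Netflix's actual system, and the watch-history matrix below is completely made up; it only illustrates the general collaborative-filtering intuition of recommending what a similar user has already watched.

```python
# Toy sketch of "users with similar taste" (not Netflix's real system).
import numpy as np

# rows = users, columns = movies; 1 = watched, 0 = not watched
watch_history = np.array([
    [1, 1, 0, 1],   # user 0 (the user we recommend for)
    [1, 1, 1, 0],   # user 1
    [0, 0, 1, 1],   # user 2
])
movies = ["Movie A", "Movie B", "Movie C", "Movie D"]
target = 0

def cosine(a, b):
    # cosine similarity: how alike two users' watch histories are
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

others = [u for u in range(len(watch_history)) if u != target]
sims = [cosine(watch_history[target], watch_history[u]) for u in others]
most_similar = others[int(np.argmax(sims))]

# recommend whatever the most similar user watched that the target hasn't
for m, name in enumerate(movies):
    if watch_history[most_similar][m] and not watch_history[target][m]:
        print("Recommend:", name)
```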
15:54 Apart from Netflix, Gmail also uses AI on an everyday basis.
15:59 If you open up your inbox right now,
16:01 you will notice that there are separate sections.
16:04 For example, we have primary section,
16:06 social section, and all of that.
16:07 Gmail also has a separate section for spam mails.
16:11 So, what Gmail does is it makes use of
16:14 concepts of artificial intelligence
16:16 and machine learning algorithms
16:17 to classify emails as spam and non-spam.
16:21 Many times certain words or phrases
16:23 are frequently used in spam emails.
16:26 If you notice your spam emails,
16:28 they have words like lottery, earn, full refund.
16:31 All of these denote that the email
16:33 is more likely to be spam.
16:36 So such words and correlations are understood
16:38 by using machine learning and natural language processing
16:42 and a few other aspects of artificial intelligence.
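Just to make the spam idea concrete, here is a minimal sketch in Python. This is not Gmail's actual system; it assumes scikit-learn and a tiny made-up set of emails, and it simply learns which words correlate with spam.

```python
# Minimal spam / not-spam sketch (illustrative only, not Gmail's system).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "You won the lottery, claim your full refund now",   # spam
    "Earn money fast from home",                          # spam
    "Meeting rescheduled to Monday at 10 am",             # not spam
    "Please review the attached project report",          # not spam
]
labels = [1, 1, 0, 0]   # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()            # bag-of-words features
X = vectorizer.fit_transform(emails)

model = MultinomialNB()                   # learns word-to-label correlations
model.fit(X, labels)

test = vectorizer.transform(["Claim your lottery prize and earn a refund"])
print(model.predict(test))                # [1] -> classified as spam
```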
16:44 So, guys, these were the common applications
16:47 of artificial intelligence.
16:49 Now let's discuss the different types of AI.
16:52 So, AI is divided into three different evolutionary stages,
16:56 or you can say that there are three stages
16:58 of artificial intelligence.
17:00 Of course, we have artificial narrow intelligence
17:03 followed by artificial general intelligence,
17:05 and that is followed by artificial super intelligence.
17:09 Artificial narrow intelligence,
17:10 which is also known as weak AI,
17:13 it involves applying artificial intelligence
17:15 only to specific tasks.
17:17 So, many currently existing systems
17:19 that claim to use artificial intelligence
17:22 are actually operating as weak AI
17:25 focused on a narrowly defined, specific problem.
17:28 Let me give you an example
17:29 of artificial narrow intelligence.
17:31 Alexa is a very good example of weak AI.
17:34 It operates within a limited, pre-defined range of functions.
17:38 There's no genuine intelligence
17:40 or there is no self awareness,
17:42 despite being a sophisticated example of weak AI.
17:45 The Google search engine, Sophia the humanoid,
17:49 self-driving cars, and even the famous AlphaGo
17:52 fall under the category of weak AI.
17:55 So guys, right now we're at the stage
17:57 of artificial narrow intelligence or weak AI.
18:00 We actually haven't reached artificial general intelligence
18:03 or artificial super intelligence,
18:05 but let's look at what exactly it would be like
18:08 if we reach artificial general intelligence.
18:11 Now artificial general intelligence
18:12 which is also known as strong AI,
18:15 it involves machines that possess the ability
18:18 to perform any intelligent task that a human being can.
18:21 Now this is actually something
18:23 that a lot of people don't realize.
18:24 Machines don't possess human-like abilities.
18:28 They have a very strong processing unit
18:30 that can perform high-level computations,
18:33 but they're not yet capable of doing the simple
18:36 and the most reasonable things that a human being can.
18:39 If you tell a machine to process like a million documents,
18:42 it'll probably do that in a matter of 10 seconds,
18:44 or a minute, or even 10 minutes.
18:47 But if you ask a machine to walk up to your living room
18:50 and switch on the TV,
18:52 a machine will take forever to learn that,
18:54 because machines don't have a human-like way of reasoning.
18:58 They have a very strong processing unit,
19:01 but they're not yet capable
19:02 of thinking and reasoning like a human being.
19:05 So that's exactly why we're still stuck
19:07 on artificial narrow intelligence.
19:09 So far we haven't developed any machine
19:12 that can fully be called strong AI,
19:14 even though there are examples like AlphaGo Zero,
19:16 which defeated AlphaGo in the game of Go.
19:20 AlphaGo Zero basically learned in a span of four months.
19:24 It learned on its own without any human intervention.
19:27 But even then, it was not classified
19:29 as a fully strong artificial intelligence,
19:32 because it cannot reason like a human being.
19:34 Moving onto artificial super intelligence.
19:37 Now this is a term referring to the time
19:40 when the capabilities of a computer
19:42 will surpass that of a human being.
19:45 In all actuality, it'll take a while for us
19:47 to achieve artificial super intelligence.
19:50 Presently, it's seen as a hypothetical situation
19:53 as depicted in movies and science fiction books
19:57 wherein machines have taken over the world,
19:59 movies like Terminator and all of that
20:01 depict artificial super intelligence.
20:04 These don't exist yet,
20:05 which we should be thankful for,
20:07 but there are a lot of people
20:09 who speculate that artificial super intelligence
20:12 will take over the world by the year 2040.
20:15 So guys, these were the different types
20:17 or different stages of artificial intelligence.
20:20 To summarize everything, like I said before,
20:23 narrow intelligence is the only thing that exists for now.
20:25 We have only weak AI or weak artificial intelligence.
20:29 All the major AI technologies that you see
20:32 are artificial narrow intelligence.
20:35 We don't have any machines which are capable of thinking
20:38 like human beings or reasoning like a human being.
20:41 Now let's move on and discuss
20:42 the different programming languages for AI.
20:45 So there are actually N number of languages
20:47 that can be used for artificial intelligence.
20:50 I'm gonna mention a few of them.
20:52 So, first, we have Python.
20:54 Python is probably the most famous language
20:56 for artificial intelligence.
20:58 It's also known as the most effective language for AI,
21:01 because a lot of developers prefer to use Python.
21:04 And a lot of scientists are also comfortable
21:06 with the Python language.
21:08 This is partly because the syntax
21:10 of Python is very simple
21:12 and it can be learned very easily.
21:14 It's considered to be one of the
21:16 easiest languages to learn.
21:18 And also many other AI algorithms
21:19 and machine learning algorithms
21:21 can be easily implemented in Python,
21:24 because there are a lot of libraries
21:25 which have predefined functions for these algorithms.
21:28 So all you have to do is you have to call that function.
21:30 You don't actually have to code your algorithm yourself.
21:32 So, Python is considered the best choice
21:34 for artificial intelligence.
21:36 Next to Python stands R,
21:38 which is a statistical programming language.
21:41 Now R is one of the most effective languages
21:43 and environments for analyzing and manipulating data
21:46 for statistical purposes.
21:48 It is a statistical programming language.
21:51 So using R we can easily produce
21:53 well-designed, publication-quality plots,
21:56 including mathematical symbols and formulas, wherever needed.
22:00 If you ask me, I think R is also one of the
22:03 easiest programming languages to learn.
22:05 The syntax is very similar to English language,
22:08 and it also has N number of libraries
22:11 that support statistics, data science,
22:13 AI, machine learning, and so on.
22:16 It also has predefined functions
22:17 for machine learning algorithms,
22:19 natural language processing, and so on.
22:21 So R is also a very good choice
22:23 if you want to get started with programming languages
22:26 for machine learning or AI.
22:28 Apart from this, we have Java.
22:30 Now Java can also be considered as a good choice
22:33 for AI development.
22:34 Artificial intelligence has a lot to do
22:36 with search algorithms,
22:38 artificial neural networks, and genetic programming,
22:41 and Java provides many benefits.
22:43 It's easy to use.
22:45 Debugging is very easy, it offers package services,
22:48 it simplifies work with large-scale projects,
22:51 and there's good user interaction
22:53 and graphical representation of data.
22:56 It has something known as the Standard Widget Toolkit,
22:59 which can be used for making graphs and interfaces.
23:02 So, graphical visualization is actually
23:04 a very important part of AI,
23:06 or data science, or machine learning for that matter.
23:09 Let me list out a few more languages.
23:11 We also have something known as Lisp.
23:14 Now shockingly, a lot of people have not heard
23:16 of this language.
23:17 This is actually the oldest and the most suited language
23:20 for the development of artificial intelligence.
23:23 It is considered to be a language
23:25 which is very suited for the development
23:27 of artificial intelligence.
23:28 Now let me tell you that this language
23:30 was invented by John McCarthy
23:32 who's also known as the father of artificial intelligence.
23:36 He was the person who coined the term
23:38 artificial intelligence.
23:39 It has the capability of processing symbolic information.
23:43 It has excellent prototyping capabilities.
23:46 It is easy,
23:47 and it creates dynamic objects with a lot of ease.
23:51 There's automatic garbage collection and all of that.
23:54 But over the years, because of advancements,
23:57 many of these features
23:58 have migrated into many other languages.
24:01 And that's why a lot of people don't go for Lisp.
24:03 There are a lot of new languages
24:05 which have more effective features
24:06 or which have better packages, you could say.
24:09 Another language I'd like to talk about is Prolog.
24:13 Prolog is frequently used in knowledge-based
24:15 and expert systems.
24:17 The features provided by Prolog
24:19 include pattern matching, tree-based data structuring,
24:22 automatic backtracking, and so on.
24:25 All of these features provide
24:27 a very powerful and flexible programming framework.
24:30 Prolog is actually widely used in medical projects
24:33 and also for designing expert AI systems.
24:36 Apart from this, we also have C++,
24:38 we have SAS, we have JavaScript
24:41 which can also be used for AI.
24:43 We have MATLAB, we have Julia.
24:45 All of these languages are actually considered
24:47 pretty good languages for artificial intelligence.
24:50 But for now, if you ask me
24:52 which programming language should I go for,
24:54 I would say Python.
24:56 Python has all the possible packages,
24:58 and it is very easy to understand and easy to learn.
25:02 So let's look at a couple of features of Python.
25:04 We can see why we should go for Python.
25:07 First of all, Python was created
25:08 in the year 1989.
25:11 It is actually a very easy programming language.
25:14 That's one of the reasons why
25:15 a lot of people prefer Python.
25:17 It's very easy to understand.
25:18 It's very easy to grasp this language.
25:20 So Python is an interpreted, object-oriented,
25:23 high-level programming language,
25:25 and it can be very easily implemented.
25:27 Now let me tell you a few features of Python.
25:29 It's very simple and easy to learn.
25:32 Like I mentioned,
25:33 it is one of the easiest programming languages,
25:35 and it is also free and open source.
25:38 Apart from that, it is a high-level language.
25:40 You don't have to worry about
25:42 anything like memory allocation.
25:44 It is portable,
25:45 meaning that you can use it on any platform
25:47 like Linux, Windows, Macintosh, Solaris, and so on.
25:51 It supports different programming paradigms
25:54 like object-oriented and procedure-oriented programming,
25:57 and it is extensible,
25:58 meaning that it can invoke C and C++ libraries.
26:02 Apart from this, let me tell you that Python
26:04 is actually gaining unbelievably huge momentum in AI.
26:08 The language is used to develop data science algorithms,
26:11 machine learning algorithms, and IoT projects.
26:15 The other advantage to Python is
26:17 the fact that you don't have to code much
26:19 when it comes to Python for AI or machine learning.
26:22 This is because there are ready-made packages.
26:24 There are predefined packages
26:26 that have all the functions and algorithms stored.
26:29 For example, there is something known as PyBrain,
26:31 which can be used for machine learning,
26:33 NumPy which can be used for scientific computation,
26:36 Pandas and so on.
26:38 There are N number of libraries in Python.
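For a quick feel of what those predefined packages look like in practice, here is a minimal sketch using NumPy and Pandas. The values are made up and this is only an illustration, not part of the course demos.

```python
import numpy as np
import pandas as pd

# NumPy: scientific computation on arrays
a = np.array([1, 2, 3, 4])
print(a.mean())            # 2.5

# Pandas: tabular data handling
df = pd.DataFrame({"temperature": [30, 25, 28], "humidity": [70, 85, 60]})
print(df.describe())       # summary statistics per column
```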
26:41 So guys, I'm not going to go into the depths of Python.
26:43 I'm not going to explain Python to you,
26:45 since this session is about artificial intelligence.
26:48 So, those of you who don't know much about Python
26:51 or who are new to Python,
26:53 I will leave a couple of links in the description box.
26:56 You can use those to get started with Python programming
26:58 and to clear any other concepts or doubts
27:00 that you have about Python.
27:02 We have a lot of content around programming with Python
27:05 or Python for machine learning and so on.
27:08 Now let's move on and talk about
27:10 one of the most important aspects
27:12 of artificial intelligence,
27:13 which is machine learning.
27:15 Now a lot of people always ask me this question.
27:17 Is machine learning and artificial intelligence
27:20 the same thing?
27:21 Well, both of them are not the same thing.
27:23 The difference between AI and machine learning
27:25 is that machine learning is used in artificial intelligence.
27:29 Machine learning is a method
27:31 through which you can feed a lot of data to a machine
27:34 and make it learn.
27:35 Now AI is a vast field.
27:38 Under AI, we have machine learning, we have NLP,
27:40 we have expert systems, we have image recognition,
27:44 object detection, and so on.
27:46 We have deep learning also.
27:48 So, AI is sort of a process or it's a methodology
27:52 in which you make machines
27:53 mimic the behavior of human beings.
27:56 Machine learning is a way
27:57 in which you feed a lot of data to a machine,
27:59 so that it can make its own decisions.
28:01 Let's get into depth about machine learning.
28:04 So first, we'll understand the need for machine learning
28:07 or why machine learning came into existence.
28:10 Now the need for machine learning
28:12 begins since the technical revolution itself.
28:14 So, guys, since technology became the center of everything,
28:18 we've been generating an immeasurable amount of data.
28:21 As per research, we generate around
28:23 2.5 quintillion bytes of data
28:26 every single day.
28:28 And it is estimated that by this year, 2020,
28:31 1.7 MB of data will be created every second
28:35 for every person on earth.
28:37 So as I'm speaking to you right now,
28:38 I'm generating a lot of data.
28:40 Now, you watching this video on YouTube
28:42 also accounts for data generation.
28:44 So there's data everywhere.
28:46 So with the availability of so much data,
28:48 it is finally possible to build predictive models
28:52 that can study and analyze complex data
28:55 to find useful insights and deliver more accurate results.
28:59 So, top tier companies like Netflix and Amazon
29:02 build such machine learning models
29:05 by using tons of data
29:06 in order to identify any profitable opportunity
29:10 and avoid any unwanted risk.
29:12 So guys, one thing you all need to know is that
29:14 the most important thing for artificial intelligence
29:17 is data.
29:18 For artificial intelligence
29:19 or whether it's machine learning or deep learning,
29:21 it's always data.
29:23 And now that we have a lot of data,
29:25 we can find a way to analyze, process,
29:28 and draw useful insights from this data
29:31 in order to help us grow businesses
29:33 or to find solutions to some problems.
29:36 Data is the solution.
29:37 We just need to know how to handle the data.
29:40 And the way to handle data
29:41 is through machine learning, deep learning,
29:43 and artificial intelligence.
29:45 A few reasons why machine learning is so important
29:48 is, number one, due to increase in data generation.
29:51 So due to excessive production of data,
29:53 we need to find a method that can be used
29:56 to structure, analyze, and draw useful insights from data.
29:59 This is where machine learning comes in.
30:01 It is used to solve problems and find solutions
30:04 to the most complex tasks faced by organizations.
30:08 Apart from this, we also need to improve decision making.
30:11 So by making use of various algorithms,
30:14 machine learning can be used to make
30:15 better business decisions.
30:17 For example, machine learning is used to forecast sales.
30:20 It is used to predict any downfalls in the stock market
30:23 or identify any sort of risk and anomalies.
30:27 Other reasons include that machine learning helps us
30:29 uncover patterns and trends in data.
30:33 So finding hidden patterns
30:34 and extracting key insights from data
30:38 is the most important part of machine learning.
30:41 So by building predictive models
30:43 and using statistical techniques,
30:45 machine learning allows you to dig beneath the surface
30:49 and explore the data at a minute scale.
30:51 Understanding data and extracting patterns manually
30:54 takes a lot of time.
30:56 It'll take several days for us
30:57 to extract any useful information from data.
31:00 But if you use machine learning algorithms,
31:03 you can perform similar computations in less than a second.
31:07 Another reason is we need to solve complex problems.
31:11 So from detecting the genes
31:13 linked to the deadly ALS disease,
31:15 to building self-driving cars,
31:17 machine learning can be used
31:18 to solve the most complex problems.
31:21 At present, we also found a way to spot stars
31:24 which are 2,400 light years away from our planet.
31:29 Okay, all of this is possible through AI,
31:31 machine learning, deep learning, and these techniques.
31:34 So to sum it up,
31:35 machine learning is very important at present
31:37 because we're facing a lot of issues with data.
31:40 We're generating a lot of data,
31:41 and we have to handle this data
31:43 in such a way that it benefits us.
31:46 So that's why machine learning comes in.
31:48 Moving on, what exactly is machine learning?
31:51 So let me give you a short history of machine learning.
31:54 So the term machine learning was first coined by Arthur Samuel
31:57 in the year 1959,
31:59 which is just three years after
32:00 artificial intelligence was coined.
32:03 So, looking back, that year was probably
32:05 the most significant in terms of technological advancement,
32:09 because most of the technologies today
32:11 are based on the concept of machine learning.
32:14 Most of the AI technologies itself
32:16 are based on the concept of
32:17 machine learning and deep learning.
32:19 Don't get confused about
32:20 machine learning and deep learning.
32:22 We'll discuss about deep learning in the further slides,
32:25 where we'll also see the difference
32:26 between AI, machine learning, and deep learning.
32:29 So coming back to what exactly machine learning is,
32:32 if we browse through the internet,
32:34 you'll find a lot of definitions about
32:35 what exactly machine learning is.
32:38 One of the definitions I found was
32:40 a computer program is said to learn from experience E
32:43 with respect to some class of task T
32:46 and performance measure P, if its performance at tasks in T,
32:50 as measured by P, improves with experience E.
32:54 That's very confusing, so let me just narrow it down to you.
32:58 In simple terms, machine learning is a subset
33:01 of artificial intelligence
33:03 which provides machines the ability
33:05 to learn automatically and improve with experience
33:08 without being explicitly programmed to do so.
33:11 In a sense, it is the practice
33:13 of getting machines to solve problems
33:15 by gaining the ability to think.
33:18 But now you might be thinking
33:19 how can a machine think or make decisions.
33:22 Now machines are very similar to humans.
33:24 Okay, if you feed a machine a good amount of data,
33:27 it will learn how to interpret, process,
33:29 and analyze this data by using machine learning algorithms,
33:33 and it will help you solve real-world problems.
33:36 So what happens here is a lot of data
33:38 is fed to the machine.
33:40 The machine will train on this data
33:42 and it'll build a predictive model
33:44 with the help of machine learning algorithms
33:46 in order to predict some outcome
33:48 or in order to find some solution to a problem.
33:51 So it involves data.
33:53 You're gonna train the machine
33:54 and build a model by using machine learning algorithms
33:58 in order to predict some outcome
33:59 or to find a solution to a problem.
34:02 So that is a simple way of understanding
34:04 what exactly machine learning is.
34:06 I'll be going into more depth about machine learning,
34:09 so don't worry if you haven't understood everything as of now.
34:12 Now let's discuss a couple of terms
34:14 which are frequently used in machine learning.
34:17 So, the first definition that we come across very often
34:21 is an algorithm.
34:23 So, basically, a machine learning algorithm
34:25 is a set of rules and statistical techniques
34:28 that is used to learn patterns from data
34:31 and draw significant information from it.
34:33 Okay.
34:34 So, guys, the logic behind a machine learning model
34:36 is basically the machine learning algorithm.
34:39 Okay, an example of a machine learning algorithm
34:41 is linear regression, or decision tree, or a random forest.
34:45 All of these are machine learning algorithms.
34:48 These define the logic behind
34:49 a machine learning model.
34:51 Now what is a machine learning model?
34:52 A model is actually the main component
34:55 of a machine learning process.
34:57 Okay, so a model is trained by using
34:59 the machine learning algorithm.
35:01 The difference between an algorithm and a model is that
35:04 an algorithm maps all the decisions
35:06 that a model is supposed to take
35:08 based on the given input
35:10 in order to get the correct output.
35:13 So the model will use
35:14 the machine learning algorithm
35:16 in order to draw useful insights from the input
35:19 and give you an outcome that is very precise.
35:22 That's the machine learning model.
35:24 The next definition we have is predictor variable.
35:27 Now a predictor variable is any feature of the data
35:30 that can be used to predict the output.
35:33 Okay, let me give you an example
35:34 to make you understand what a predictor variable is.
35:37 Let's say you're trying to predict the height of a person,
35:41 depending on his weight.
35:43 So here your predictor variable becomes your weight,
35:46 because you're using the weight of a person
35:49 to predict the person's height.
35:51 So your predictor variable becomes your weight.
35:54 The next definition is response variable.
35:56 Now in the same example,
35:57 height would be the response variable.
36:00 Response variable is also known as
36:02 the target variable or the output variable.
36:05 This is the variable that you're trying to predict
36:07 by using the predictor variables.
36:09 So a response variable is the feature
36:11 or the output variable that needs to be predicted
36:14 by using the predictor variables.
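To make the predictor/response idea concrete, here is a tiny illustrative sketch. The weights and heights below are made up, and the use of scikit-learn's LinearRegression is my choice for illustration, not something prescribed in the session.

```python
# weight (predictor variable) -> height (response variable); made-up numbers
import numpy as np
from sklearn.linear_model import LinearRegression

weight = np.array([[50], [60], [70], [80], [90]])   # predictor variable (kg)
height = np.array([155, 162, 170, 176, 183])        # response variable (cm)

model = LinearRegression().fit(weight, height)
print(model.predict([[75]]))   # predicted height for a 75 kg person
```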
36:16 Next, we have something known as training data.
36:19 Now training and testing data are terminologies
36:22 that you'll come across very often
36:23 in a machine learning process.
36:25 So training data is basically the data that is used
36:28 to create the machine learning model.
36:31 So, basically in a machine learning process,
36:33 when you feed data into the machine,
36:35 it'll be divided into two parts.
36:37 So splitting the data into two parts
36:39 is also known as data splicing.
36:42 So you'll take your input data,
36:43 you'll divide it into two sections.
36:45 One you'll call the training data,
36:47 and the other you'll call the testing data.
36:49 So then you have something known as the testing data.
36:52 The training data is basically used
36:54 to create the machine learning model.
36:56 The training data helps the model to identify
36:59 key trends and patterns
37:00 which are essential to predict the output.
37:03 Now the testing data is, after the model is trained,
37:06 it must be tested in order to evaluate how accurately
37:09 it can predict an outcome.
37:11 Now this is done by using the testing data.
37:14 So, basically, the training data is used to train the model.
37:16 The testing data is used to test
37:18 the efficiency of the model.
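One common way to do this data splicing in Python is scikit-learn's train_test_split. The sketch below is only illustrative, with randomly generated data and an assumed 80/20 split.

```python
# Data splicing: split one data set into training and testing portions
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)    # 50 samples, 2 predictor variables
y = np.random.randint(0, 2, 50)      # 50 labels (0 or 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80% train, 20% test

print(len(X_train), len(X_test))     # 40 training rows, 10 testing rows
```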
37:21 Now let's move on and get to our next topic,
37:24 which is the machine learning process.
37:26 So what is the machine learning process?
37:29 Now the machine learning process
37:30 involves building a predictive model
37:33 that can be used to find a solution
37:34 for a problem statement.
37:36 Now in order to solve any problem in machine learning,
37:39 there are a couple of steps that you need to follow.
37:42 Let's look at the steps.
37:44 The first step is you define the objective of your problem.
37:47 And the second step is data gathering,
37:49 which is followed by preparing your data,
37:52 data exploration, building a model,
37:55 model evaluation, and finally making predictions.
37:59 Now, in order to understand the machine learning process,
38:02 let's assume that you've been given a problem
38:05 that needs to be solved by using machine learning.
38:08 So the problem that you need to solve is
38:11 we need to predict the occurrence of rain
38:13 in your local area by using machine learning.
38:16 So, basically, you need to predict
38:17 the possibility of rain by studying the weather conditions.
38:20 So what we did here is we
38:22 basically looked at step number one,
38:24 which is define the objective of the problem.
38:26 Now here you need to answer questions such as
38:28 what are we trying to predict.
38:30 Is that output going to be a continuous variable,
38:32 or is it going to be a discrete variable?
38:35 These are the kinds of questions that you need to answer
38:39 in the first stage,
38:39 which is defining the objective of the problem, right?
38:42 So, yeah, exactly, what are the target features?
38:44 So here you need to understand
38:45 which is your target variable
38:47 and what are the different predictor variables
38:49 that you need in order to predict this outcome.
38:53 So here our target variable will be basically
38:55 a variable that can tell us
38:57 whether it's going to rain or not.
38:59 For input data, we'll need data such as maybe
39:02 the temperature on a particular day
39:04 or the humidity level, the precipitation, and so on.
39:08 So you need to define the objective at this stage.
39:11 So basically, you have to form an idea of the problem
39:14 at this stage.
39:16 Another question that you need to ask yourself
39:18 is what kind of problem are you solving.
39:21 Is this a binary classification problem,
39:23 or is this a clustering problem,
39:26 or is this a regression problem?
39:28 Now, a lot of you might not be familiar
39:30 with the terms classification, clustering,
39:32 and regression in terms of machine learning.
39:35 Don't worry, I'll explain all of these terms
39:36 in the upcoming slides.
39:38 All you need to understand at step one
39:40 is you need to define how you're going to solve the problem.
39:43 You need to understand what sort of data
39:46 you need to solve the problem,
39:47 how you're going to approach the problem,
39:49 what are you trying to predict,
39:51 what variables you'll need in order to predict the outcome,
39:54 and so on.
39:55 Let's move on and look at step number two,
39:58 which is data gathering.
40:00 Now in this stage, you must be asking questions such as,
40:03 what kind of data is needed to solve this problem?
40:07 And is this data available?
40:09 And if it is available, from where can I get this data
40:12 and how can I get the data?
40:15 Data gathering is one of the most time-consuming
40:17 steps in machine learning process.
40:19 If you have to go manually and collect the data,
40:22 it's going to take a lot of time.
40:24 But lucky for us, there are a lot of resources online
40:28 which provide a wide variety of data sets.
40:30 All you need to do is a little web scraping,
40:31 where you just have to go ahead and download the data.
40:34 One of the websites I can tell you all about is Kaggle.
40:36 So if you're a beginner in machine learning,
40:38 don't worry about data gathering and all of that.
40:40 All you have to do is go to websites such as Kaggle
40:43 and just download the data set.
40:45 So coming back to the problem that we are discussing,
40:48 which is predicting the weather,
40:50 the data needed for weather forecasting
40:52 includes measures like humidity level,
40:55 the temperature, the pressure, the locality,
41:01 whether or not you live in a hill station, and so on.
41:05 Such data has to be collected and stored for analysis.
41:05 So all the data is collected
41:07 during the data gathering stage.
41:09 This step is followed by data preparation,
41:12 or also known as data cleaning.
41:14 So if you're going around collecting data,
41:16 it's almost never in the right format.
41:18 And even if you are taking data
41:19 from online resources from any website,
41:23 even then, the data will require cleaning and preparation.
41:27 The data is never in the right format.
41:28 You have to do some sort of preparation
41:30 and some sort of cleaning
41:31 in order to make the data ready for analysis.
41:35 So what you'll encounter while cleaning data
41:37 is you'll encounter a lot of inconsistencies
41:39 in the data set,
41:41 like you'll encounter some missing values,
41:43 redundant variables, duplicate values, and all of that.
41:47 So removing such inconsistencies is very important,
41:50 because they might lead to wrong
41:52 computations and predictions.
41:54 Okay, so at this stage you can scan the data set
41:56 for any inconsistencies,
41:58 and you can fix them then and there.
42:00 Now let me give you a small fact about data cleaning.
42:03 So there was a survey that was run last year or so.
42:05 I'm not sure.
42:06 And a lot of data scientists were asked
42:09 which step was the most difficult or the most
42:12 annoying and time-consuming of all.
42:14 And 80% of the data scientists said
42:17 it was data cleaning.
42:18 Data cleaning takes up 80% of their time.
42:22 So it's not very easy to get rid of missing values
42:25 and corrupted data.
42:26 And even if you get rid of missing values,
42:28 sometimes your data set might get affected.
42:31 It might get biased because maybe one variable
42:33 has too many missing values,
42:35 and this will affect your outcome.
42:37 So you'll have to fix such issues.
42:39 You'll have to deal with all of this missing data
42:41 and corrupted data.
42:43 So data cleaning is actually one of the hardest steps
42:45 in the machine learning process.
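As a rough illustration of what data cleaning looks like in practice, here is a small sketch using Pandas on a made-up weather table. The column names and values are assumptions for the example only.

```python
# Spotting and fixing missing values and duplicates (made-up data)
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [30, np.nan, 28, 28],
    "humidity":    [70, 85, np.nan, np.nan],
    "rain":        ["no", "yes", "no", "no"],
})

print(df.isnull().sum())                     # missing values per column
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())
df = df.dropna(subset=["humidity"])          # drop rows still missing humidity
df = df.drop_duplicates()                    # remove duplicate rows
print(df)
```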
42:47 Okay, now let's move on and look at our next step,
42:51 which is exploratory data analysis.
42:53 So here what you do is basically become
42:56 a detective at this stage.
42:58 So this stage, which is EDA or exploratory data analysis,
43:01 is like the brainstorming stage of machine learning.
43:04 Data exploration involves understanding the patterns
43:07 and the trends in your data.
43:09 So at this stage, all the useful insights are drawn
43:13 and any correlations between the various variables
43:16 are understood.
43:16 What do I mean by trends and patterns and correlations?
43:20 Now let's consider our example
43:22 which is we have to predict the rainfall
43:23 on a particular day.
43:25 So we know that there is a strong possibility of rain
43:28 if the temperature has fallen low.
43:30 So we know that our output will depend on
43:34 variables such as temperature, humidity, and so on.
43:37 Now to what level it depends on these variables,
43:40 we'll have to find out that.
43:42 We'll have to find out the patterns,
43:44 and we'll find out the correlations
43:45 between such variables.
43:47 So such patterns and trends have to be understood
43:50 and mapped at this stage.
43:52 So this is what exploratory data analysis is about.
43:55 It's the most important part of machine learning.
43:58 This is where you'll understand
43:59 what exactly your data is
44:01 and how you can form the solution to your problem.
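Here is a minimal sketch of what this exploration could look like in Python, on a made-up weather data set. The correlation matrix is one simple way to see how strongly variables like humidity relate to rain; the numbers are invented for illustration.

```python
# Exploring a made-up weather data set: summaries and correlations
import pandas as pd

df = pd.DataFrame({
    "temperature": [30, 22, 28, 20, 25],
    "humidity":    [60, 90, 70, 95, 80],
    "rain":        [0, 1, 0, 1, 1],      # 1 = it rained, 0 = it didn't
})

print(df.describe())   # basic statistics per variable
print(df.corr())       # correlation matrix, e.g. humidity vs. rain
```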
44:05 The next step in a machine learning process
44:08 is building a machine learning model.
44:10 So all the insights and the patterns
44:13 that you derive during the data exploration
44:15 are used to build a machine learning model.
44:18 So this stage always begins by splitting the data set
44:21 into two parts, which is training data and testing data.
44:23 I've already discussed with you
44:25 that the data that you used in a machine learning process
44:28 is always split into two parts.
44:30 We have the training data and we have the testing data.
44:33 Now when you're building a model,
44:35 you always use the training data.
44:37 So you always make use of the training data
44:40 in order to build the model.
44:41 Now a lot of you might be asking what is training data.
44:45 Is it different from the input data
44:46 that you're feeding to the machine
44:48 or is it different from the testing data?
44:50 Now training data is the same input data
44:53 that you're feeding to the machine.
44:54 The only difference is that you're
44:56 splitting the data set into two.
44:58 You're randomly picking 80% of your data
45:00 and you're assigning it for training purposes.
45:03 And the rest 20%, probably,
45:04 you'll assign for testing purposes.
45:07 So guys, always remember another thing that
45:10 the training data is always much more
45:12 than your testing data,
45:13 obviously because you need to train your machine.
45:16 And the more data you feed the machine
45:18 during the training phase,
45:19 the better it will be during the testing phase.
45:22 Obviously, it'll predict better outcomes
45:24 if it is being trained on more data.
45:27 Correct?
45:28 So the model is basically using
45:29 the machine learning algorithm that predicts the output
45:32 by using the data fed to it.
45:34 Now in the case of predicting rainfall,
45:36 the output will be a categorical variable,
45:39 because we'll be predicting
45:40 whether it's going to rain or not.
45:43 Okay, so let's say we have an output variable called rain.
45:47 The two possible values that this variable can take
45:50 is yes it's going to rain and no it won't rain.
45:53 Correct, so that is our outcome.
45:54 Our outcome is a classification or a categorical variable.
45:58 So for such cases where your outcome
46:00 is a categorical variable,
46:02 you'll be using classification algorithms.
46:04 Again, example of a classification algorithm
46:06 is logistic regression
46:08 or you can also use support vector machines,
46:10 you can use K nearest neighbor,
46:12 and you can also use naive Bayes, and so on.
46:16 Now don't worry about these terms,
46:17 I'll be discussing all these algorithms with you.
46:19 But just remember that while you're building
46:21 a machine learning model,
46:22 you'll make use of the training data.
46:24 You'll train the model by using the training data
46:27 and the machine learning algorithm.
46:30 Now like I said, choosing the machine learning algorithm
46:32 depends on the problem statement
46:34 that you're trying to solve,
46:35 because there are N number of machine learning algorithms.
46:37 You'll have to choose the algorithm
46:38 that is the most suitable for your problem statement.
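As a rough sketch of this model-building stage for the rain example, here is a minimal Python example. The data values are made up, and the train/test split and logistic regression (via scikit-learn) are illustrative choices, not the only way to do it.

```python
# Building a rain classifier on made-up data (illustrative sketch)
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# predictor variables: [temperature, humidity]; response: 1 = rain, 0 = no rain
X = np.array([[30, 60], [22, 90], [28, 70], [20, 95], [25, 80], [33, 50]])
y = np.array([0, 1, 0, 1, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

model = LogisticRegression()
model.fit(X_train, y_train)          # the model is built on training data only
print(model.predict([[24, 85]]))     # predict rain for a new day
```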
46:42 So step number six is model evaluation
46:45 and optimization.
46:46 Now after you've done building a model
46:48 by using the training data set,
46:51 it is finally time to put the model to the test.
46:54 The testing data set is used to check
46:56 the efficiency of the model
46:58 and how accurately it can predict the outcome.
47:01 So once the accuracy is calculated,
47:04 any further improvements in the model
47:06 can be implemented during this stage.
47:08 There are various methods that can help you
47:10 improve the performance of the model,
47:12 like parameter tuning
47:14 and cross-validation methods.
47:18 Now the main things you need to remember
47:19 during model evaluation and optimization
47:22 is that model evaluation is nothing but
47:25 you're testing how well your model can predict the outcome.
47:29 So at this stage, you will be using the testing data set.
47:33 In the previous stage, which is building a model,
47:35 you'll be using the training data set.
47:37 But in the model evaluation stage,
47:39 you'll be using the testing data set.
47:41 Now once you've tested your model,
47:43 you need to calculate the accuracy.
47:45 You need to calculate how accurately
47:47 your model is predicting the outcome.
47:49 After that, if you find that you need to
47:51 improve your model in some way or the other,
47:53 because the accuracy is not very good,
47:56 then you'll use methods such as parameter tuning.
47:58 Don't worry about these terms,
48:00 I'll discuss all of this with you,
48:01 but I'm just trying to make sure
48:03 that you're understanding the concept
48:04 behind each of the phases in machine learning.
48:07 It's very important you understand each step.
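Here is a hedged sketch of what evaluation and optimization might look like in code, continuing the same made-up rain example: accuracy on the testing data, cross-validation, and a simple grid search as one form of parameter tuning.

```python
# Evaluation and optimization on the same made-up rain data
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.array([[30, 60], [22, 90], [28, 70], [20, 95], [25, 80], [33, 50]])
y = np.array([0, 1, 0, 1, 1, 0])          # 1 = rain, 0 = no rain

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# cross-validation: average accuracy over several splits of the data
print("cv accuracy:", cross_val_score(LogisticRegression(), X, y, cv=3).mean())

# parameter tuning: grid search over the regularization strength C
grid = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
```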
48:10 Okay, now let's move on and look at
48:11 the last stage of machine learning, which is predictions.
48:15 Now, once a model is evaluated
48:17 and once you've improved it,
48:19 it is finally used to make predictions.
48:21 The final output can either be a categorical variable
48:24 or a continuous variable.
48:26 Now all of this depends on your problem statement.
48:29 Don't get confused about continuous variables,
48:32 categorical variables.
48:33 I'll be discussing all of this.
48:34 Now in our case, because we're predicting
48:36 the occurrence of rainfall,
48:38 the output will be a categorical variable.
48:41 It's obvious because we're predicting
48:43 whether it's going to rain or not.
48:45 As a result, we understand
48:46 that this is a classification problem
48:48 because we have a categorical variable.
48:50 So that was the entire machine learning process.
48:53 Now it's time to learn about the different ways
48:55 in which machines can learn.
48:58 So let's move ahead
48:59 and look at the types of machine learning.
49:01 Now this is one of the most
49:02 interesting concepts in machine learning,
49:04 the three different ways in which machines learn.
49:07 There is something known as supervised learning,
49:10 unsupervised learning, and reinforcement learning.
49:13 So we'll go through this one by one.
49:16 We'll understand what supervised learning is first,
49:18 and then we'll look at the other two types.
49:21 So, to define supervised learning,
49:23 it is basically a technique in which we teach
49:25 or train the machine by using data
49:28 which is well labeled.
49:30 Now, in order to understand supervised learning,
49:34 let's consider a small example.
49:37 So, as kids, we all needed guidance to solve math problems.
49:40 A lot of us had trouble solving math problems.
49:43 So our teachers always helped us understand what addition is
49:47 and how it is done.
49:48 Similarly, you can think of supervised learning
49:51 as a type of machine learning
49:52 that involves a guide.
49:55 The labeled data set acts like a teacher
49:57 that will train the machine to understand the patterns in the data.
50:00 So the labeled data set is nothing but the training data set.
50:04 I'll explain more about this in a while.
50:06 So, to understand supervised learning better,
50:08 let's look at the figure on the screen.
50:10 Right here we're feeding the machine
50:12 images of Tom and Jerry,
50:14 and the goal is for the machine to identify
50:17 and classify the images into two classes.
50:20 One will contain images of Tom
50:22 and the other will contain images of Jerry.
50:24 Now the main thing that you need to note
50:26 in supervised learning is a training data set.
50:29 The training data set is going to be very well labeled.
50:32 Now what do I mean when I say that
50:34 the training data set is labeled?
50:36 Basically, what we're doing is we're telling the machine,
50:39 this is how Tom looks and this is how Jerry looks.
50:42 By doing this, you're training the machine
50:44 by using labeled data.
50:46 So the main thing that you're doing is you're labeling
50:48 every input data that you're feeding to the model.
50:51 So, basically, your entire training data set is labeled.
50:55 Whenever you're giving an image of Tom,
50:57 there's gonna be a label there saying this is Tom.
50:59 And when you're giving an image of Jerry,
51:01 you're saying that this is how Jerry looks.
51:04 So, basically, you're guiding the machine
51:05 and you're telling that, "Listen, this is how Tom looks,
51:07 "this is how Jerry looks,
51:09 "and now you need to classify them
51:10 "into two different classes."
51:13 That's how supervised learning works.
51:15 Apart from that, it's the same old process.
51:17 After getting the input data,
51:19 you're gonna perform data cleaning.
51:21 Then there's exploratory data analysis,
51:23 followed by creating the model
51:25 by using the machine learning algorithm,
51:28 and then this is followed by model evaluation,
51:30 and finally, your predictions.
51:33 Now, one more thing to note here is that
51:35 the output that you get by using supervised learning
51:38 is also labeled output.
51:40 So, basically, you'll get two different classes,
51:42 one named Tom and the other named Jerry,
51:45 and you'll get them labeled.
51:46 That is how supervised learning works.
51:48 The most important thing in supervised learning
51:51 is that you're training the model by using
51:54 labeled data set.
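As a rough sketch of what training on labeled data looks like in code (the features and labels below are made up purely for illustration, not taken from the course material):

```python
# Minimal supervised learning sketch: every training example carries a label.
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features for each "image": [ear_pointiness, body_size]
X_train = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.2]]
y_train = ["Tom", "Tom", "Jerry", "Jerry"]   # the labels act as the teacher

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)                  # training on labeled data

print(model.predict([[0.85, 0.75]]))         # -> ['Tom']
```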
51:56 Now let's move on and look at unsupervised learning.
51:59 We'll look at the same example and understand
52:01 how unsupervised learning works.
52:04 So what exactly is unsupervised learning?
52:06 Now this involves training by using unlabeled data
52:10 and allowing the model to act on that information
52:12 without any guidance.
52:14 Alright.
52:15 Like the name itself suggests,
52:16 there is no supervision here.
52:18 It's unsupervised learning.
52:19 So think of unsupervised learning as a smart kid
52:23 that learns without any guidance.
52:25 Okay, in this type of machine learning,
52:27 the model is not fed with any labeled data,
52:30 as in the model has no clue that this is
52:33 the image of Tom and this is Jerry.
52:35 It figures out patterns
52:37 and the difference between Tom and Jerry on its own
52:40 by taking in tons and tons of data.
52:42 Now how do you think the machine identifies this as Tom,
52:45 and then finally gives us the output like
52:47 yes this is Tom, this is Jerry.
52:49 For example, it identifies prominent features of Tom,
52:53 such as pointy ears, bigger in size, and so on,
52:57 to understand that this image is of type one.
53:00 Similarly, it finds out features in Jerry,
53:02 and knows that this image is of type two,
53:05 meaning that the first image
53:06 is different from the second image.
53:08 So what the unsupervised learning algorithm
53:11 or the model does is it'll form two different clusters.
53:14 It'll form one cluster which are very similar,
53:17 and the other cluster which is very different
53:19 from the first cluster.
53:21 That's how unsupervised learning works.
53:23 So the important things that you need to know
53:25 in unsupervised learning
53:27 is that you're gonna feed the machine unlabeled data.
53:30 The machine has to understand the patterns
53:33 and discover the output on its own.
53:35 And finally, the machine will form clusters
53:38 based on feature similarity.
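By contrast, a minimal sketch of unsupervised learning would feed the same kind of features without any labels and let K-means form the clusters on its own (again, the numbers are invented for illustration):

```python
# Minimal unsupervised learning sketch: no labels, K-means discovers the groups.
from sklearn.cluster import KMeans

X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.2]]   # unlabeled data

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # e.g. [1 1 0 0] -- two clusters based on feature similarity
```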
53:41 Now let's move on and look at
53:43 the last type of machine learning,
53:45 which is reinforcement learning.
53:47 Reinforcement learning is quite different
53:49 when compared to supervised and unsupervised learning.
53:53 What exactly is reinforcement learning?
53:55 It is a part of machine learning where an agent
53:58 is put in an environment,
54:00 and he learns to behave in this environment
54:03 by performing certain actions,
54:05 and observing the rewards which it gets from those actions.
54:09 To understand what reinforcement learning is,
54:12 imagine that you were dropped off at an isolated island.
54:16 What would you do?
54:17 Panic, right?
54:18 Yes, of course, initially, we'll all panic.
54:20 But as time passes by, you will learn
54:23 how to live on the island.
54:24 You will explore the environment,
54:26 you will understand the climate conditions,
54:29 the type of food that grows there,
54:31 the dangers of the island, and so on.
54:33 This is exactly how reinforcement learning works.
54:36 It basically involves an agent,
54:37 which is you stuck on the island,
54:40 that is put in an unknown environment, which is the island,
54:44 where he must learn by observing and performing
54:46 actions that result in rewards.
54:49 So reinforcement learning is mainly used
54:51 in advanced machine learning areas
54:53 such as self-driving cars and AlphaGo.
54:55 I'm sure a lot of you have heard of AlphaGo.
54:58 So, the logic behind AlphaGo
55:00 is nothing but reinforcement learning and deep learning.
55:03 And in reinforcement learning,
55:04 there is not really any input data given to the agent.
55:07 All he has to do is explore
55:09 everything from scratch.
55:11 It's like a newborn baby with no information about anything.
55:14 He has to go around exploring the environment,
55:16 and performing some actions
55:20 which result in either rewards
55:21 or in some sort of punishment.
55:23 Okay.
55:24 So that sums up the types of machine learning.
55:27 Before we move ahead,
55:28 I'd like to discuss the difference
55:30 between the three types of machine learning,
55:32 just to make the concept clear to you all.
55:35 So let's start by looking at the definitions of each.
55:38 In supervised learning,
55:39 the machine will learn by using the labeled data.
55:42 In unsupervised learning, there'll be unlabeled data,
55:44 and the machine has to learn without any supervision.
55:48 In reinforcement learning, there'll be an agent
55:50 which interacts with the environment
55:52 by producing actions and discovering errors or rewards
55:55 based on his actions.
55:58 Now what are the types of problems
55:59 that can be solved by using supervised, unsupervised,
56:02 and reinforcement learning?
56:03 When it comes to supervised learning,
56:05 the two main types of problems that are solved
56:07 are regression problems and classification problems.
56:10 When it comes to unsupervised learning,
56:12 it is association and clustering problems.
56:14 When it comes to reinforcement learning,
56:16 it's reward-based problems.
56:18 I'll be discussing regression, classification,
56:20 clustering, and all of this in the upcoming slides,
56:23 so don't worry if you don't understand this.
56:25 Now the type of data which is used in supervised learning
56:27 is labeled data.
56:29 In unsupervised learning, it is unlabeled.
56:31 And in reinforcement learning,
56:33 we have no predefined data set.
56:36 The agent has to do everything from scratch.
56:38 Now let's look at the type of training involved
56:40 in each of these learning types.
56:42 In supervised learning, there is external supervision,
56:45 as in there is the labeled data set
56:47 which acts as a guide for the machine to learn.
56:49 In unsupervised learning, there's no supervision.
56:52 Again, in reinforcement learning,
56:53 there's no supervision at all.
56:55 Now what is the approach to solve problems
56:58 by using supervised, unsupervised,
57:00 and reinforcement learning?
57:01 In supervised learning, it is simple.
57:03 You have to map the labeled input to the known output.
57:07 The machine knows what the output looks like.
57:09 So you're just mapping the input to the output.
57:12 In unsupervised learning,
57:13 you're going to understand the patterns
57:15 and discover the output.
57:16 Here you have no clue about what the input is.
57:19 It's not labeled.
57:21 You just have to understand the patterns
57:22 and you'll have to form clusters and discover the output.
57:26 In reinforcement learning, there is no clue at all.
57:28 You'll have to follow the trial and error method.
57:31 You'll have to go around your environment.
57:33 You'll have to explore the environment,
57:35 and you'll have to try some actions.
57:38 And only once you perform those actions,
57:40 you'll know whether this is a reward-based action
57:43 or whether this is a punishment-based action.
57:47 So, reinforcement learning is totally based
57:50 on the concept of trial and error.
57:52 Okay.
57:53 Popular algorithms under supervised learning include
57:57 linear regression, logistic regression,
57:59 support vector machines, K nearest neighbor,
58:02 naive Bayes, and so on.
58:04 Under unsupervised learning,
58:05 we have the famous K-means clustering method, C-means
58:09 and all of that.
58:10 Under reinforcement learning,
58:11 we have the famous Q-learning algorithm.
58:13 I'll be discussing these algorithms in the upcoming slides.
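Just to give a feel for the trial-and-error idea behind Q-learning, here is a tiny sketch on an invented one-dimensional world; the states, rewards, and parameters are all assumptions for illustration only:

```python
# Tiny Q-learning sketch: states 0..3 in a line, a reward only at state 3.
import random

n_states, n_actions = 4, 2                 # actions: 0 = move left, 1 = move right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration

for _ in range(500):                       # episodes of trial and error
    s = 0
    while s != 3:
        # explore sometimes, otherwise pick the best known action
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])
        s_next = max(0, s - 1) if a == 0 else min(3, s + 1)
        reward = 1.0 if s_next == 3 else 0.0
        # Q-learning update rule
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q)   # the agent learns that moving right leads to the reward
```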
58:17 So let's move on and look at the next topic,
58:20 which is the types of problems solved
58:22 using machine learning.
58:24 Now this is what we were talking about earlier
58:26 when I said regression, classification,
58:28 and clustering problems.
58:30 Okay, so let's discuss what exactly I mean by that.
58:33 In machine learning, all the problems can be
58:36 classified into three types.
58:38 Every problem that is approached in machine learning
58:41 can be put into one of these three categories.
58:44 Okay, so the first type is known as a regression,
58:47 then we have classification and clustering.
58:50 So, first, let's look at regression type of problems.
58:53 So in this type of problem,
58:55 the output is always a continuous quantity.
58:58 For example, if you want to predict
59:00 the speed of a car, given the distance,
59:03 it is a regression problem.
59:05 Now a lot of you might not be very aware
59:07 of what exactly a continuous quantity is.
59:09 A continuous quantity is any quantity that can have
59:12 an infinite range of values.
59:15 For example, the weight of a person
59:17 is a continuous quantity,
59:19 because our weight can be 50, 50.1,
59:22 50.001,
59:23 50.0021,
59:26 50.0321, and so on.
59:28 It can have an infinite range of values, correct?
59:32 So for the type of problem where you have to predict
59:34 a continuous quantity, you make use of regression algorithms.
59:39 So, regression problems can be solved
59:41 by using supervised learning algorithms
59:43 like linear regression.
59:46 Next, we have classification.
59:48 Now in this type of problem,
59:50 the output is always a categorical value.
59:53 Now when I say categorical value,
59:55 it can be a value such as
59:57 the gender of a person, which is a categorical value.
60:01 Now classifying emails into two classes
60:03 like spam and non-spam is a classification problem
60:07 that can be solved by using
60:08 supervised learning classification algorithms,
60:10 like support vector machines, naive Bayes,
60:13 logistic regression, K nearest neighbor, and so on.
60:17 So, again, the main aim in classification
60:20 is to compute the category of the data.
60:24 Coming to clustering problems.
60:25 This type of problem involves
60:27 assigning the input into two or more clusters
60:31 based on feature similarity.
60:33 So when you read this sentence,
60:34 you should understand that this is unsupervised learning,
60:37 because you don't have enough data about your input,
60:40 and the only option that you have is to form clusters.
60:44 Categories are formed only when you know that
60:46 your data is of two types.
60:48 If your input data is labeled and it's of two types,
60:50 then it's gonna be a classification problem.
60:52 But a clustering problem happens
60:55 when you don't have much information about your input;
60:57 all you have to do is you have to find patterns
60:59 and you have to understand that
61:01 data points which are similar
61:02 are clustered into one group,
61:05 and data points which are different from the first group
61:07 are clustered into another group.
61:10 That's what clustering is.
61:11 An example is Netflix. What happens is
61:15 Netflix clusters their users into similar groups
61:19 based on their interests, their age,
61:22 geography, and so on.
61:24 This can be done by using unsupervised learning algorithms
61:27 like K-means.
61:28 Okay.
61:29 So guys, those were the three categories of problems
61:32 that can be solved by using machine learning.
61:35 So, basically, what I'm trying to say
61:36 is all the problems will fall into one of these categories.
61:40 So any problem that you give to a machine learning model,
61:42 it'll fall into one of these categories.
61:45 Okay.
61:46 Now to make things a little more interesting,
61:47 I have collected real world data sets
61:50 from online resources.
61:52 And what we're gonna do is we're going to try and understand
61:56 if this is a regression problem,
61:58 or a clustering problem, or a classification problem.
62:00 Okay.
62:01 Now the problem statement in here
62:03 is to study the house sales data set,
62:06 and build a machine learning model
62:08 that predicts the house pricing index.
62:11 Now the most important thing you need to understand
62:13 when you read a problem statement
62:15 is you need to understand what is your target variable,
62:18 and what are the possible predictor variables that you'll need.
62:21 The first thing you should look at is your target variable.
62:24 If you want to understand if this is a classification,
62:27 regression, or clustering problem,
62:29 look at your target variable or your output variable
62:31 that you're supposed to predict.
62:34 Here you're supposed to predict the house pricing index.
62:36 Our house pricing index is obviously
62:39 a continuous quantity.
62:40 So as soon as you understand that,
62:42 you'll know that this is a regression problem.
62:45 So for this, you can make use of
62:47 the linear regression algorithm,
62:48 and you can predict the house pricing index.
62:52 Linear regression is a regression algorithm.
62:54 It is a supervised learning algorithm.
62:57 We'll discuss more about it in the further slides.
62:59 Let's look at our next problem statement.
63:02 Here you have to study a bank credit data set,
63:05 and make a decision about whether
63:07 to approve the loan of an applicant
63:09 based on his profile.
63:11 Now what is your output variable over here?
63:14 Your output variable is to predict whether you can
63:17 approve the loan of an applicant or not.
63:20 So, obviously, your output is going to be categorical.
63:24 It's either going to be yes or no.
63:26 Yes basically means approve the loan.
63:28 No means reject the loan.
63:30 So here, you understand that
63:32 this is a classification problem.
63:34 Okay.
63:35 So you can make use of algorithms like KNN algorithm
63:38 or you can make use of support vector machines
63:41 in order to do this.
63:43 So, support vector machines and KNN,
63:45 which is the K nearest neighbor algorithm,
63:48 are basically supervised learning algorithms.
63:51 We'll talk more about that in the upcoming slides.
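As a hedged sketch only (the applicant profiles and labels below are invented, not a real bank data set), a K nearest neighbor classifier for such a loan decision could look like this:

```python
# Hypothetical loan-approval sketch with K nearest neighbor.
from sklearn.neighbors import KNeighborsClassifier

# Made-up applicant profiles: [credit_score, annual_income_in_thousands]
X = [[750, 90], [680, 60], [520, 25], [600, 40]]
y = ["approve", "approve", "reject", "reject"]   # past decisions (labels)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[640, 55]]))   # categorical output: approve or reject
```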
63:53 Moving on to our next problem statement.
63:56 Here the problem statement is to cluster
63:58 a set of movies as either good or average
64:01 based on the social media outreach.
64:04 Now if you look properly,
64:06 your clue is in the question itself.
64:08 The first line it says is to cluster a set of movies
64:12 as either good or average.
64:14 Now guys, whenever you have a problem statement
64:17 that is asking you to group the data set
64:19 into different groups
64:21 or to form different clusters,
64:23 it's obviously a clustering problem.
64:26 Right here you can make use
64:27 of the K-means clustering algorithm,
64:29 and you can form two clusters.
64:31 One will contain the popular movies
64:33 and the other will contain the non-popular movies.
64:36 These are all small examples of how you can use
64:38 machine learning to solve clustering,
64:41 regression, and classification problems.
64:43 The key is you need to identify the type of problem first.
64:48 Now let's move on and discuss the different types
64:51 of machine learning algorithms.
64:54 So we're gonna start by discussing the different
64:56 supervised learning algorithms.
64:58 So to give you a quick overview,
65:00 we'll be discussing the linear regression,
65:02 logistic regression, and decision tree,
65:05 random forest, naive Bayes classifier,
65:07 support vector machines, and K nearest neighbor.
65:11 We'll be discussing these seven algorithms.
65:14 So without any further delay,
65:15 let's look at linear regression first.
65:19 Now what exactly is a linear regression algorithm?
65:22 So guys, linear regression is basically
65:24 a supervised learning algorithm
65:26 that is used to predict a continuous dependent variable y
65:31 based on the values of independent variable x.
65:34 Okay.
65:35 The important thing to note here
65:36 is that the dependent variable y,
65:38 the variable that you're trying to predict,
65:41 is always going to be a continuous variable.
65:44 But the independent variable x,
65:47 which is basically the predictor variables,
65:50 these are the variables that you'll be using
65:52 to predict your output variable,
65:54 which is nothing but your dependent variable.
65:57 So your independent variables or your predictive variables
66:00 can either be continuous or discrete.
66:02 Okay, there is not such a restriction over here.
66:05 Okay, they can be either continuous variables
66:07 or they can be discrete variables.
66:09 Now, again, I'll tell you what a continuous variable is,
66:12 in case you've forgotten.
66:14 It is a variable that has an infinite number of possibilities.
66:18 So I'll give you an example of a person's weight.
66:21 It can be 160 pounds, or they can weigh 160.11 pounds,
66:26 or 160.1134 pounds and so on.
66:30 So the number of possibilities for weight is limitless,
66:33 and this is exactly what a continuous variable is.
66:37 Now in order to understand linear regression,
66:39 let's assume that you want to predict the
66:42 price of a stock over a period of time.
66:45 Okay.
66:46 For such a problem, you can make use of linear regression
66:49 by studying the relationship
66:50 between the dependent variable,
66:52 which is the stock price,
66:54 and the independent variable, which is the time.
66:57 You're trying to predict the stock price
66:59 over a period of time.
67:02 So basically, you're gonna check how the price of a stock
67:05 varies over a period of time.
67:07 So your stock price is going to be
67:09 your dependent variable or your output variable,
67:12 and the time is going to be your predictor variable
67:15 or your independent variable.
67:18 Let's not confuse it anymore.
67:20 Your dependent variable is your output variable.
67:23 Okay, your independent variable is your input variable
67:26 or your predictor variable.
67:29 So in our case, the stock price is obviously
67:31 a continuous quantity,
67:32 because the stock price can have
67:34 an infinite number of values.
67:36 Now the first step in linear regression
67:39 is always to draw out a relationship
67:41 between your dependent and your independent variable
67:44 by using the best fitting linear line.
67:46 We make an assumption that your dependent
67:49 and independent variables
67:51 are linearly related to each other.
67:54 We call it linear regression because
67:56 both the variables vary linearly,
67:58 which means that by plotting the relationship
68:00 between these two variables,
68:02 we'll get more of a straight line, instead of a curve.
68:06 Let's discuss the math behind linear regression.
68:10 So, this equation over here,
68:12 it denotes the relationship between your
68:14 independent variable x, which is here,
68:17 and your dependent variable y.
68:20 This is the variable you're trying to predict.
68:22 Hopefully, we all know that the equation
68:24 for a linear line in math is y equals mx plus c.
68:28 I hope all of you remember math.
68:30 So the equation for a linear line in math
68:33 is y equals to mx plus c.
68:35 Similarly, the linear regression equation
68:38 is represented along the same line.
68:40 Okay, y equals to mx plus c.
68:42 There's just a little bit of changes,
68:44 which I'll tell you what they are.
68:46 Let's understand this equation properly.
68:49 So y basically stands for your dependent variable
68:51 that you're going to predict.
68:54 B naught is the y intercept.
68:56 Now y intercept is nothing but this point here.
69:00 Now in this graph, you're basically
69:01 showing the relationship between your dependent variable y
69:04 and your independent variable x.
69:06 Now this is the linear relationship
69:08 between these two variables.
69:10 Okay, now your y intercept is basically
69:13 the point on the line
69:15 which starts at the y-axis.
69:17 This is the y intercept,
69:19 which is represented by B naught.
69:22 Now B one, or beta one, is the slope of this line.
69:25 Now the slope can either be negative or positive,
69:28 depending on the relationship between the dependent
69:30 and independent variable.
69:32 The next variable that we have is x.
69:35 X here represents the independent variable
69:37 that is used to predict our resulting output variable.
69:41 Basically, x is used to predict the value of y.
69:44 Okay.
69:45 E here denotes the error in the computation.
69:48 For example, this is the actual line,
69:51 and these dots here represent the predicted values.
69:55 Now the distance between these two
69:57 is denoted by the error in the computation.
70:00 So this is the entire equation.
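Written out cleanly, the equation being described is:

```latex
y = \beta_0 + \beta_1 x + \epsilon
```

where y is the dependent variable, beta naught is the y intercept, beta one is the slope, x is the independent variable, and epsilon is the error term (the E on the slide).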
70:02 It's quite simple, right?
70:03 Linear regression will basically draw
70:05 a relationship between your input and your output variable.
70:09 That's how simple linear regression is.
70:13 Now to better understand linear regression,
70:15 I'll be running a demo in Python.
70:18 So guys, before I get started with our practical demo,
70:22 I'm assuming that most of you
70:23 have a good understanding of Python,
70:26 because explaining Python is going to be
70:28 out of the scope of today's session.
70:31 But if some of you are not familiar
70:32 with the Python language,
70:34 I'll leave a couple of links in the description box.
70:37 Those will be related to Python programming.
70:39 You can go through those links, understand Python,
70:42 and then maybe try to understand the demo.
70:45 But I'd be explaining the logic part of the demo in depth.
70:49 So the main thing that we're going to do here
70:51 is try and understand linear regression.
70:53 So it's okay if you do not understand Python for now.
70:56 I'll try to explain as much as I can.
70:59 But if you still want to understand this in a better way,
71:03 I'll leave a couple of links in the description box
71:05 you can go to those videos.
71:07 Let me just zoom in for you.
71:13 I hope all of you can see the screen.
71:18 Now in this linear regression demo,
71:20 what we're going to do is
71:21 we're going to form a linear relationship
71:24 between the maximum temperature
71:26 and minimum temperature on a particular date.
71:28 We're just going to do weather forecasting here.
71:31 So our task is to predict the maximum temperature,
71:34 taking input feature as minimum temperature.
71:37 So I'm just going to try and make you understand
71:39 linear regression through this demo.
71:41 Okay, we'll see how it actually works practically.
71:44 Before I get started with the demo,
71:45 let me tell you something about the data set.
71:48 Our data set is stored in this path basically.
71:53 The name of the data set is weather.csv.
71:57 Okay, now, this contains data on weather conditions
72:00 recorded on each day
72:02 at various weather stations around the world.
72:04 Okay, the information includes precipitation,
72:07 snowfall, temperatures, wind speeds,
72:09 and whether the day included any thunderstorm
72:12 or other poor weather conditions.
72:15 So our first step in any demo for that matter
72:18 will be to import all the libraries that are needed.
72:22 So we're gonna begin our demo
72:24 by importing all the required libraries.
72:27 After that, we're going to read in our data.
72:28 Our data will be stored
72:30 in this variable called data set,
72:32 and we're going to use the read_csv function
72:35 since our data set is in the CSV format.
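The exact code isn't reproduced in this transcript, so here is a minimal sketch of those first steps; the file name weather.csv comes from the description above, and the local path is assumed:

```python
# Import the required libraries and read in the weather data (path assumed).
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv('weather.csv')
print(dataset.shape)    # roughly (12000, 31): rows x columns
print(dataset.head())   # a quick look at the first few rows
```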
72:38 After that, I'll be showing you
72:40 how the data set looks.
72:42 We'll also look at the data set in depth.
72:45 Now let me just show you the output first.
72:48 Let's run this demo and see first.
72:53 We're getting a couple of plots which I'll
72:56 talk about in a while.
73:01 So we can ignore this warning.
73:02 It has nothing to do with...
73:04 So, first of all, we're printing the shape of our data set.
73:08 So, when we print the shape of our data set,
73:10 This is the output that we get.
73:12 So, basically, this shows that we have around
73:15 12,000 rows and 31 columns in our data set.
73:19 The 31 columns basically represent
73:21 the predictor variables.
73:23 So you can say that we have 31 predictor variables
73:25 in order to predict the weather conditions
73:27 on a particular date.
73:28 So guys, the main aim in this problem statement
73:31 is weather forecasting.
73:32 We're going to predict the weather
73:33 by using a set of predictor variables.
73:37 So these are the different types
73:38 of predictor variables that we have.
73:41 Okay, we have something known as maximum temperature.
73:43 So this is what our data set looks like.
73:47 Now what I'm doing in this block of code is...
73:51 What we're doing is we're plotting our data points
73:53 on a 2D graph in order to understand our data set
73:56 and see if we can manually find any relationship
73:59 between the variables.
74:01 Here we've taken minimum temperature
74:04 and maximum temperature for doing our analysis.
74:07 So let's just look at this plot.
74:10 Before that, let me just comment out all of these other plots,
74:16 so that you see only the graph that I'm talking about.
74:28 So, when you look at this graph,
74:29 this is basically the graph between your minimum temperature
74:33 and your maximum temperature.
74:35 Maximum temperature is the dependent variable
74:38 that you're going to predict.
74:39 This is y.
74:40 And your minimum temperature is your x.
74:43 It's basically your independent variable.
74:45 So if you look at this graph,
74:46 you can see that there is a sort of
74:49 linear relationship between the two,
74:51 except there are a little bit of outliers here and there.
74:54 There are a few data points which are a little bit random.
74:57 But apart from that, there is a pretty linear relationship
75:00 between your minimum temperature
75:01 and your maximum temperature.
75:03 So by this graph, you can understand
75:05 that you can easily solve this problem
75:07 using linear regression,
75:09 because our data is very linear.
75:11 I can see a clear straight line over here.
75:14 This is our first graph.
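Continuing the same sketch, that scatter plot could be produced roughly like this (the column names MinTemp and MaxTemp are assumptions about the data set):

```python
# Scatter plot of minimum vs. maximum temperature (column names assumed).
dataset.plot(x='MinTemp', y='MaxTemp', style='o')
plt.title('MinTemp vs MaxTemp')
plt.xlabel('MinTemp')
plt.ylabel('MaxTemp')
plt.show()
```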
75:17 Next, what I'm doing is I'm just checking
75:19 the average and maximum temperature that we have.
75:22 I'm just looking at the average of our output variable.
75:26 Okay.
75:27 So guys, what we're doing here right now
75:28 is just exploratory data analysis.
75:31 We're trying to understand our data.
75:32 We're trying to see the relationship
75:34 between our input variable and our output variable.
75:37 We're trying to see the mean or the average
75:39 of the output variable.
75:40 All of this is necessary to understand our data set.
75:48 So, this is what our average maximum temperature looks like.
75:52 So if we try to understand where exactly this is,
75:56 so our average maximum temperature
75:58 is somewhere between 28 and, I would say, 32,
76:03 somewhere around there.
76:05 So you can say that average maximum temperature
76:08 lies between 25 and 35.
76:11 And so that is our average maximum temperature.
76:14 Now that you know a little bit about the data set,
76:16 you know that there is a very good linear relationship
76:19 between your input variable and your output variable.
76:22 Now what you're going to do is you're going to
76:25 perform something known as data splicing.
76:28 Let me just comment that for you.
76:30 This section is nothing but data splicing.
76:34 So for those of you who are paying attention,
76:36 know that data splicing is nothing but
76:38 splitting your data set into training and testing data.
76:41 Now before we do that,
76:43 I mentioned earlier that we'll be only using two variables,
76:46 because we're trying to understand the relationship between
76:49 the minimum temperature and maximum temperature.
76:51 I'm doing this because I want you to understand
76:54 linear regression in the simplest way possible.
76:57 So guys, in order to make you understand linear regression,
77:00 I have just derived only two variables from a data set.
77:04 Even though when we check the structure of a data set,
77:07 we had around 31 features,
77:09 meaning that we had 31 variables
77:11 which include my predictor variable and my target variable.
77:14 So, basically, we had 30 predictor variables
77:17 and we had one target variable,
77:19 which is your maximum temperature.
77:21 So, what I'm doing here is I'm only considering
77:24 these two variables,
77:25 because I want to show you exactly
77:27 how linear regression works.
77:29 So, here what I'm doing is I'm basically
77:32 extracting only these two variables
77:34 from our data set, storing it in x and y.
77:37 After that, I'm performing data splicing.
77:39 So here, I'm basically splitting the data
77:41 into training and testing data,
77:43 and remember one point that I am assigning
77:45 20% of the data to our testing data set,
77:48 and the remaining 80% is assigned for training.
77:52 That's how training works.
77:54 We assign most of the data set for training.
77:57 We do this because we want the machine learning model
78:00 or the machine learning algorithm to train better on data.
78:04 We wanted to take as much data as possible,
78:06 so that it can predict the outcome properly.
78:10 So, to repeat it again for you,
78:12 so here we're just splitting the data
78:13 into training and testing data set.
78:16 So, one more thing to note here is that
78:18 we're splitting 80% of the data for training,
78:21 and we're assigning 20% of the data to the test data.
78:25 The test size variable, this variable that you see,
78:27 is what is used to specify
78:29 the proportion of the test set.
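A sketch of that data splicing step, again assuming the MinTemp and MaxTemp column names:

```python
# Keep only the two variables and split the data 80% train / 20% test.
from sklearn.model_selection import train_test_split

X = dataset['MinTemp'].values.reshape(-1, 1)   # independent / predictor variable
y = dataset['MaxTemp'].values.reshape(-1, 1)   # dependent / output variable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)       # test_size gives the 20% test split
```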
78:31 Now after splitting the data into training and testing set,
78:34 finally, the time is to train our algorithm.
78:38 For that, we need to import the linear regression class.
78:41 We need to instantiate it
78:43 and call the fit method along with the training data.
78:46 This is our linear regression class,
78:48 and we're just creating an instance
78:50 of the linear regression class.
78:52 So guys, a good thing about Python is that
78:54 you have pre-defined classes for your algorithms,
78:58 and you don't have to code your algorithms from scratch.
79:00 Instead, all you have to do
79:01 is call this LinearRegression class,
79:04 and you have to create an instance of it.
79:06 Here I'm basically creating something known as a regressor.
79:08 And all you have to do is you have to call the fit method
79:11 along with your training data.
79:13 So this is my training data, x train and y train
79:16 contains my training data,
79:18 and I'm calling our linear regression instance,
79:21 which is regressor, along with this data set.
79:24 So here, basically, we're building the model.
79:26 We're doing nothing but building the model.
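In code, that model-building step would be along these lines, continuing the same sketch:

```python
# Create an instance of the LinearRegression class and fit it on the training data.
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)   # this is where the model is actually built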
79:28 Now, one of the major things that
79:30 linear regression model does is
79:32 it finds the best value for the intercept and the slope,
79:36 which results in a line that best fits the data.
79:40 I've discussed what intercept and slope is.
79:43 So if you want to see the intercept and the slope
79:45 calculated by our linear regression model,
79:48 we just have to run this line of code.
79:50 And let's look at the output for that.
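That line of code is essentially the following (the variable name regressor carries over from the sketch above):

```python
# The fitted y intercept (beta naught) and slope coefficient (beta one).
print(regressor.intercept_)   # around 10.66 in this run
print(regressor.coef_)        # around 0.92: change in MaxTemp per unit of MinTemp
```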
79:53 So, our intercept is around 10.66
79:56 and our coefficient,
79:58 these are also known as beta coefficients;
80:00 the coefficient is nothing but what we discussed as beta one, the slope.
80:04 These are beta values.
80:06 Now this will just help you understand
80:07 the significance of your input variables.
80:11 Now what this coefficient value means is,
80:13 see, the coefficient value is around 0.92.
80:16 This means that for every one unit change
80:20 in your minimum temperature,
80:21 the change in the maximum temperature is around 0.92.
80:26 This will just show you how significant
80:28 your input variable is.
80:30 So, for every one unit change in your minimum temperature,
80:34 the change in the maximum temperature
80:36 will be around 0.92.
80:39 I hope you've understood this part.
80:41 Now that we've trained our algorithm,
80:43 it's time to make some predictions.
80:45 To do so, what we'll use is we'll use our test data set,
80:49 and we'll see how accurately our algorithm
80:51 predicts the percentage score.
80:53 Now to make predictions,
80:54 we have this line of code.
80:56 Predict is basically a predefined function in Python.
81:00 And all you're going to do is you're going to pass
81:01 your testing data set to this.
81:04 Now what you'll do is you'll compare
81:06 the actual output values,
81:08 which is basically stored in your y test.
81:11 And you'll compare these to the predicted values,
81:14 which is in y prediction.
81:17 And you'll store these comparisons in our data frame
81:20 called df.
81:21 And all I'm doing here is I'm printing the data frame.
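A sketch of that prediction and comparison step, flattening the arrays so they fit into one data frame:

```python
# Predict on the test set and compare actual vs. predicted values.
y_pred = regressor.predict(X_test)

df = pd.DataFrame({'Actual': y_test.flatten(),
                   'Predicted': y_pred.flatten()})
print(df.head())
```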
81:24 So if you look at the output, this is what it looks like.
81:27 These are your actual values
81:29 and these are the values that you predicted
81:31 by building that model.
81:34 So, if your actual value is 28,
81:36 you predicted around 33,
81:37 here your actual value is 31,
81:40 meaning that your maximum temperature is 31.
81:42 And you predicted a maximum temperature of 30.
81:45 Now, these values are actually pretty close.
81:47 I feel like the accuracy is pretty good over here.
81:51 Now in some cases, you see a lot of variance, like 23.
81:54 Here it's 15.
81:55 Right here it's 22.
81:56 Here it's 11.
81:57 But such cases are not very frequent.
82:00 And the best way to improve your accuracy I would say
82:02 is by training a model with more data.
82:05 Alright.
82:06 You can also view this comparison in the form of a plot.
82:10 Let's see how that looks.
82:16 So, basically, this is a bar graph
82:18 that shows our actual values and our predicted values.
82:21 Blue represents your actual values,
82:23 and orange represents your predicted values.
82:27 At places you can see that
82:28 we've predicted pretty well,
82:29 like the predictions are pretty close
82:31 to the actual values.
82:32 In some cases, the predictions are varying a little bit.
82:37 So in a few places, it is actually varying,
82:39 but all of this depends on your input data as well.
82:42 When we saw the input data,
82:43 also we saw a lot of variation.
82:44 We saw a couple of outliers.
82:46 So, all that also might affect your output.
82:50 But then this is how you build machine learning models.
82:53 Initially, you're never going to get
82:55 a really good accuracy.
82:57 What you should do is you have to improve
82:58 your training process.
82:59 That's the best way you can predict better,
83:03 either you use a lot of data,
83:05 train your model with a lot of data,
83:07 or you use other methods like parameter tuning,
83:09 or basically you try and find another predictor variable
83:13 that'll help you more in predicting your output.
83:16 To me, this looks pretty good.
83:18 Now let me show you another plot.
83:20 What we're doing is we're drawing a straight line plot.
83:24 Okay, let's see how it looks.
83:31 So guys, this straight line
83:32 represents a linear relationship.
83:34 Now let's say you get a new data point.
83:36 Okay, let's say the value of x is around 20.
83:40 So by using this line,
83:41 you can predict that for a minimum temperature of 20,
83:44 your maximum temperature would be around 25
83:47 or something like that.
83:48 So, we basically drew a linear relationship
83:51 between our input and output variable over here.
83:53 And the final step is to evaluate
83:55 the performance of the algorithm.
83:57 This step is particularly important to compare
84:01 how well different algorithms perform
84:03 on a particular data set.
84:05 Now for regression algorithms,
84:06 three evaluation metrics are used.
84:09 We have something known as mean absolute error,
84:12 mean squared error, and root mean square error.
84:15 Now mean absolute error is nothing but
84:17 the mean of the absolute values of the errors.
84:20 Your mean squared error is a mean of the squared errors.
84:24 That's all.
84:24 It's basically you read this and you understand
84:27 what the error means.
84:28 A root mean squared error is the square root
84:30 of the mean of the squared errors.
84:33 Okay.
84:33 So these are pretty simple to understand
84:35 your mean absolute error, your mean squared errors,
84:37 your root mean squared error.
84:40 Now, luckily, we don't have to perform these
84:41 calculations manually.
84:43 We don't have to code each of these calculations.
84:46 The scikit-learn library comes with prebuilt functions
84:50 that can be used to find out these values.
84:52 Okay.
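Those prebuilt functions would be used roughly like this, continuing the same sketch:

```python
# Evaluate the regression with the three metrics from sklearn.metrics.
from sklearn import metrics
import numpy as np

print('MAE :', metrics.mean_absolute_error(y_test, y_pred))
print('MSE :', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```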
84:53 So, when you run this code, you will get
84:56 these values for each of the errors.
84:58 You'll get around 3.19 as the mean absolute error.
85:02 Your mean squared error is around 17.63.
85:05 Your root mean squared error is around 4.19.
85:09 Now these error values basically show that
85:12 our model accuracy is not very precise,
85:15 but it's still able to make a lot of predictions.
85:18 We can draw a good linear relationship.
85:20 Now in order to improve the efficiency of the model,
85:22 there are a lot of methods, like
85:24 parameter tuning and all of that,
85:25 or basically you can train your model with a lot more data.
85:29 Apart from that, you can use other predictor variables,
85:31 or maybe you can study the relationship between
85:34 other predictor variables
85:35 and your maximum temperature variable.
85:37 There are a lot of ways
85:38 to improve the efficiency of the model.
85:41 But for now, I just wanted to make you understand
85:43 how linear regression works,
85:45 and I hope all of you have a good idea about this.
85:48 I hope all of you have a good understanding
85:50 of how linear regression works.
85:53 This is a small demo about it.
85:55 If any of you still have any doubts,
85:57 regarding linear regression,
85:59 please leave that in the comment section.
86:01 We'll try and resolve all your doubts.
86:04 So, if you look at this equation,
86:05 we calculated everything here.
86:07 we drew a relationship between y and x,
86:10 which is basically x was our minimum temperature,
86:12 y was our maximum temperature.
86:14 We also calculated the slope and the intercept.
86:17 And we also calculated the error in the end.
86:20 We calculated mean squared error
86:22 we calculated the root mean squared error.
86:24 We also calculate the mean absolute error.
86:27 So that was everything about linear regression.
86:30 This was a simple linear regression model.
86:33 Now let's move on and look at our next algorithm,
86:36 which is a logistic regression.
86:38 Now, in order to understand why we use logistic regression,
86:43 let's consider a small scenario.
86:45 Let's say that your little sister
86:47 is trying to get into grad school
86:50 and you want to predict whether she'll get admitted
86:53 in her dream school or not.
86:55 Okay, so based on her CGPA and the past data,
86:58 you can use logistic regression
87:00 to foresee the outcome.
87:02 So logistic regression will allow you to analyze
87:04 the set of variables and predict a categorical outcome.
87:08 Since here we need to predict whether she will
87:10 get into a school or not,
87:13 which is a classification problem,
87:15 logistic regression will be used.
87:17 Now I know the first question in your head is,
87:20 why are we not using linear regression in this case?
87:23 The reason is that linear regression
87:25 is used to predict a continuous quantity,
87:27 rather than a categorical one.
87:29 Here we're going to predict
87:30 whether or not your sister is going to get into grad school.
87:33 So that is clearly a categorical outcome.
87:35 So when the resulting outcome
87:37 can take only a few classes of values,
87:39 like two classes of values,
87:41 it is sensible to have a model that predicts the value
87:45 as either zero or one,
87:47 or in a probability form that ranges between zero and one.
87:51 Okay.
87:51 So linear regression does not have this ability.
87:54 If you use linear regression
87:56 to model a binary outcome,
87:58 the resulting model will not predict y values
88:01 in the range of zero and one,
88:04 because linear regression works on
88:05 continuous dependent variables,
88:07 and not on categorical variables.
88:09 That's why we make use of logistic regression.
88:13 So understand that linear regression was used
88:15 to predict continuous quantities,
88:17 and logistic regression is used to
88:19 predict categorical quantities.
88:22 Okay, now one major confusion that everybody has
88:26 is people keep asking me
88:28 why is logistic regression called logistic regression
88:31 when it is used for classification.
88:34 The reason it is named logistic regression
88:36 is because its primary technique
88:38 is very similar to linear regression.
88:41 There's no other reason behind the naming.
88:43 It belongs to the generalized linear models.
88:46 It belongs to the same class as linear regression,
88:49 and that is the only reason
88:51 behind the name logistic regression.
88:53 Logistic regression is mainly used
88:55 for classification purpose,
88:56 because here you'll have to predict a dependent variable
89:00 which is categorical in nature.
89:02 So this is mainly used for classification.
89:05 So, to define logistic regression for you,
89:07 logistic regression is a method used to predict
89:10 a dependent variable y,
89:12 given an independent variable x,
89:15 such that the dependent variable is categorical,
89:18 meaning that your output is a categorical variable.
89:21 So, obviously, this is a classification algorithm.
89:25 So guys, again, to clear your confusion,
89:27 when I say categorical variable,
89:29 I mean that it can hold values like one or zero,
89:32 yes or no, true or false, and so on.
89:35 So, basically, in logistic regression,
89:37 the outcome is always categorical.
89:40 Now, how does logistic regression work?
89:43 So guys, before I tell you how logistic regression works,
89:47 take a look at this graph.
89:49 Now I told you that the outcome
89:51 in a logistic regression is categorical.
89:54 Your outcome will either be zero or one,
89:56 or it'll be a probability that ranges between zero and one.
90:00 So, that's why we have this S curve.
90:03 Now some of you might think, why do we have an S curve
90:07 when we can obviously have a straight line?
90:09 We have something known as a sigmoid curve,
90:12 because we can have values ranging between zero and one,
90:15 which will basically show the probability.
90:17 So, maybe your output will be 0.7,
90:20 which is a probability value.
90:22 If it is 0.7,
90:24 it means that your outcome is basically one.
90:27 So that's why we have this sigmoid curve like this.
90:30 Okay.
90:31 Now I'll explain more about this
90:33 in depth in a while.
90:35 Now, in order to understand how logistic regression works,
90:38 first, let's take a look at
90:39 the linear regression equation.
90:42 This was the linear regression equation
90:43 that we discussed.
90:45 Y here stands for the dependent variable
90:47 that needs to be predicted.
90:48 Beta naught is nothing but the y intercept.
90:51 Beta one is nothing but the slope.
90:53 And X here represents the independent variable
90:55 that is used to predict y.
90:58 The E denotes the error in the computation.
91:01 So, given the fact that x is the independent variable
91:05 and y is the dependent variable,
91:08 how can we represent a relationship between x and y
91:11 so that y ranges only between zero and one?
91:14 Here this value basically denotes
91:16 the probability of y equal to one,
91:19 given some value of x.
91:21 So here, because this Pr, denotes probability
91:24 and this value basically denotes that the probability
91:28 of y equal to one, given some value of x,
91:32 this is what we need to find out.
91:35 Now, if you wanted to calculate the probability
91:37 using the linear regression model,
91:39 then the probability will look something like
91:42 P of X equal to beta naught
91:43 plus beta one into X.
91:46 P of X will be equal to beta naught plus beta one into X,
91:50 where P of X is nothing but
91:52 your probability of getting y equal to one,
91:55 given some value of x.
91:56 So the logistic regression equation
91:58 is derived from the same equation,
92:01 except we need to make a few alterations,
92:03 because the output is only categorical.
92:06 So, logistic regression does not necessarily calculate
92:10 the outcome as zero or one.
92:13 I mentioned this before.
92:14 Instead, it calculates the probability of a variable
92:18 falling in the class zero or class one.
92:20 So that's how we can conclude that
92:22 the resulting variable must be positive,
92:25 and it should lie between zero and one,
92:27 which means that it must be less than one.
92:30 So to meet these conditions,
92:32 we have to do two things.
92:34 First, we can take the exponent of the equation,
92:36 because taking an exponential of any value
92:39 will make sure that you get a positive number.
92:41 Correct?
92:42 Secondly, you have to make sure
92:43 that your output is less than one.
92:46 So, a number divided by itself plus one
92:49 will always be less than one.
92:51 So that's how we get this formula.
92:53 First, we take the exponent of the equation,
92:56 beta naught plus beta one into x,
92:59 and then we divide it by that number plus one.
93:02 So this is how we get this formula.
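Putting those two steps together, the probability being described is:

```latex
P(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
```

which is always positive and always less than one.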
93:05 Now the next step is to calculate
93:07 something known as the logit function.
93:09 Now the logit function is nothing
93:11 but a link function
93:13 that is represented as an S curve
93:16 or as a sigmoid curve
93:18 that ranges between the values zero and one.
93:21 It basically calculates the probability
93:23 of the output variable.
93:25 So if you look at this equation, it's quite simple.
93:27 What we have done here is we just cross multiply
93:30 and take e raised to beta naught
93:32 plus beta one into x as common.
93:35 The RHS denotes the linear equation for
93:38 the independent variables.
93:39 The LHS represents the odds ratio.
93:42 So if you compute this entire thing,
93:43 you'll get this final value,
93:45 which is basically your logistic regression equation.
93:48 Your RHS here denotes the linear equation
93:52 for independent variables,
93:53 and your LHS represents the odds ratio,
93:56 which is also known as the logit function.
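Rearranged, that gives the logit form, with the log of the odds ratio on the left-hand side and the linear equation on the right:

```latex
\ln\!\left(\frac{P(x)}{1 - P(x)}\right) = \beta_0 + \beta_1 x
```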
93:58 So I told you that the logit function
94:00 is basically a function that represents an S curve
94:03 between zero and one.
94:06 This will make sure that our value
94:08 ranges between zero and one.
94:09 So in logistic regression,
94:11 on increasing this x by one unit,
94:15 it changes the logit by a factor of beta one.
94:18 It's the same kind of slope interpretation as I showed you in linear regression.
94:22 So guys, that's how you derive
94:23 the logistic regression equation.
94:26 So if you have any doubts regarding these equations,
94:28 please leave them in the comment section,
94:30 and I'll get back to you, and I'll clear that out.
94:33 So to sum it up, logistic regression is used
94:36 for classification.
94:37 The output variable will always be a categorical variable.
94:40 We also saw how you derive the logistic regression equation.
94:45 And one more important thing is that
94:47 the relationship between the variables
94:49 and a logistic regression is denoted as
94:52 an S curve, which is also known as a sigmoid curve,
94:55 and also the outcome does not
94:57 necessarily have to be calculated as zero or one.
95:00 It can be calculated as a probability
95:02 that the output lies in class one or class zero.
95:07 So your output can be a probability
95:09 ranging between zero and one.
95:10 That's why we have a sigmoid curve.
95:13 So I hope all of you are clear with logistic regression.
95:16 Now I won't be showing you the demo right away.
95:19 I'll explain a couple of more classification algorithms.
95:22 Then I'll show you a practical demo
95:24 where we'll use multiple classification algorithms
95:27 to solve the same problem.
95:28 Again, we'll also calculate the accuracy
95:31 and see which classification algorithm is doing the best.
95:34 Now the next algorithm I'm gonna talk about
95:36 is decision tree.
95:37 Decision tree is one of my favorite algorithms,
95:39 because it's very simple to understand
95:41 how a decision tree works.
95:43 So guys, before this, we discussed linear regression,
95:47 which was a regression algorithm.
95:49 Then we discussed logistic regression,
95:51 which is a classification algorithm.
95:54 Remember, don't get confused just because
95:56 it has the name logistic regression.
95:58 Okay, it is a classification algorithm.
96:01 Now we're discussing decision tree,
96:02 which is again a classification algorithm.
96:05 Okay.
96:06 So what exactly is a decision tree?
96:08 Now a decision tree is, again,
96:10 a supervised machine learning algorithm
96:13 which looks like an inverted tree
96:15 wherein each node represents a predictor variable,
96:19 and the link between the node represents a decision,
96:23 and each leaf node represents an outcome.
96:25 Now I know that's a little confusing,
96:27 so let me make you understand what a decision tree is
96:30 with the help of an example.
96:32 Let's say that you hosted a huge party,
96:34 and you want to know how many of your guests
96:36 are non-vegetarians.
96:38 So to solve this problem,
96:39 you can create a simple decision tree.
96:42 Now if you look at this figure over here,
96:44 I've created a decision tree that classifies a guest
96:47 as either vegetarian or non-vegetarian.
96:50 Our last outcome here is non-veg or veg.
96:54 So here you understand
96:55 that this is a classification algorithm,
96:57 because here you're predicting a categorical value.
97:00 Each node over here represents a predictor variable.
97:04 So eat chicken is one variable,
97:06 eat mutton is one variable,
97:08 seafood is another variable.
97:10 So each node represents a predictor variable
97:12 that will help you conclude whether or not
97:15 a guest is a non-vegetarian.
97:17 Now as you traverse down the tree,
97:20 you'll make decisions at each node
97:22 until you reach a dead end.
97:24 Okay, that's how it works.
97:25 So, let's say we got a new data point.
97:27 Now we'll pass it through the decision tree.
97:30 The first variable is did the guest
97:32 eat the chicken?
97:34 If yes, then he's a non-vegetarian.
97:35 If no, then you'll pass it to the next variable,
97:38 which is did the guest eat mutton?
97:41 If yes, then he's a non-vegetarian.
97:43 If no, then you'll pass it to the next variable,
97:46 which is seafood.
97:47 If he ate seafood, then he is a non-vegetarian.
97:50 If no, then he's a vegetarian.
97:53 This is how a decision tree works.
97:54 It's a very simple algorithm
97:56 that you can easily understand.
97:59 It's drawn out visually, which makes it very easy to understand.
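Purely as an illustration (the guest data below is invented, not a real data set), the same veg / non-veg classification could be built with a decision tree classifier in scikit-learn:

```python
# Hypothetical guest data: did the guest eat chicken, mutton, seafood? (1 = yes)
from sklearn.tree import DecisionTreeClassifier

X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
y = ["non-veg", "non-veg", "non-veg", "veg"]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[0, 0, 1]]))   # a guest who ate only seafood -> non-veg
```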
98:02 Now let's understand the structure of a decision tree.
98:05 I just showed you an example
98:07 of how the decision tree works.
98:09 Now let me take the same example
98:11 and tell you the structure for decision tree.
98:13 So, first of all, we have something known as the root node.
98:16 Okay.
98:17 The root node is the starting point
98:20 of a decision tree.
98:21 Here you'll perform the first split
98:23 and split it into two other nodes
98:25 or three other nodes, depending on your problem statement.
98:28 So the top most node is known as your root node.
98:31 Now guys, about the root node,
98:34 the root node is assigned to a variable
98:36 that is very significant,
98:38 meaning that that variable is very important
98:41 in predicting the output.
98:43 Okay, so you assign a variable
98:45 that you think is the most significant at the root node.
98:48 After that, we have something known as internal nodes.
98:51 So each internal node represents a decision point
98:54 that eventually leads to the output.
98:57 Internal nodes will have other predictor variables.
99:00 Each of these is nothing but a predictor variable.
99:02 I just made it into a question;
99:04 otherwise, these are just predictor variables.
99:07 Those are internal nodes.
99:09 Terminal nodes, also known as the leaf node,
99:11 represent the final class of the output variable,
99:15 because these are basically your outcomes,
99:17 non-veg and vegetarian.
99:19 Branches are nothing but connections between nodes.
99:22 Okay, these connections or links between
99:25 the nodes are known as branches,
99:27 and they're represented by arrows.
99:29 So each branch will have some response to it,
99:32 either yes or no, true or false, one or zero, and so on.
99:37 Okay.
99:37 So, guys, this is the structure of a decision tree.
99:40 It's pretty understandable.
99:41 Now let's move on and
99:44 we'll understand how the decision tree algorithm works.
99:48 Now there are many ways to build a decision tree,
99:51 but I'll be focusing on one particular approach.
99:56 Okay, this is something known as the ID3 algorithm.
99:59 That is one of the ways in which you can build
100:01 the decision tree.
100:02 ID3 stands for Iterative Dichotomiser 3 algorithm,
100:07 which is one of the most effective algorithms
100:09 used to build a decision tree.
100:11 It uses the concepts of entropy and information gain
100:15 in order to build a decision tree.
100:17 Now you don't have to know what exactly
100:19 the ID3 algorithm is.
100:20 It's just a concept behind building a decision tree.
100:24 Now the ID3 algorithm has around six defined steps
100:27 in order to build a decision tree.
100:29 So the first step is you will select the best attribute.
100:33 Now what do you mean by the best attribute?
100:36 So, attribute is nothing but
100:37 the predictor variable over here.
100:39 So you'll select the best predictor variable.
100:42 Let's call it A.
100:43 After that, you'll assign this A
100:45 as a decision variable for the root node.
100:48 Basically, you'll assign this predictor variable A
100:52 at the root node.
100:53 Next, what you'll do is for each value of A,
100:55 you'll build a descendant of the node.
100:58 Now these three steps, let's look at it
101:01 with the previous example.
101:03 Now here the best attribute is eat chicken.
101:06 Okay, this is my best attribute variable over here.
101:09 So I selected that attribute.
101:11 And what is the next step?
101:13 Step two was assigned that as a decision variable.
101:16 So I assigned eat chicken as the decision variable
101:19 at the root node.
101:21 Now you might be wondering how do I know
101:22 which is the best attribute.
101:24 I'll explain all of that in a while.
101:26 So what we did is we assigned this other root node.
101:29 After that, step number three says for each value of A,
101:33 build a descendant of the node.
101:36 So for each value of this variable,
101:39 build a descendant node.
101:41 So this variable can take two values, yes and no.
101:44 So for each of these values,
101:46 I build a descendant node.
101:48 Step number four, assign classification labels
101:52 to the leaf node.
101:53 To your leaf node, I have assigned
101:55 classification one as non-veg, and the other is veg.
101:58 That is step number four.
102:00 Step number five is if data is correctly classified,
102:03 then you stop at that.
102:05 However, if it is not,
102:07 then you keep iterating over the tree,
102:09 and keep changing the position of
102:11 the predictor variables in the tree,
102:14 or you change the root node also
102:16 in order to get the correct output.
102:19 So now let me answer this question.
102:21 What is the best attribute?
102:23 What do you mean by the best attribute
102:25 or the best predictor variable?
102:27 Now the best attribute is the one
102:30 that separates the data into different classes,
102:33 most effectively, or it is basically
102:36 a feature that best splits the data set.
102:39 Now the next question in your head must be how do I decide
102:44 which variable or which feature best splits the data.
102:49 To do this, there are two important measures.
102:52 There's something known as information gain
102:54 and there's something known as entropy.
102:57 Now guys, in order to understand
102:58 information gain and entropy,
103:00 we'll look at a simple problem statement.
103:02 This data represents the speed of a car
103:05 based on certain parameters.
103:07 So our problem statement here is to study the data set
103:10 and create a decision tree that classifies
103:13 the speed of the car as either slow or fast.
103:17 So our predictor variables here are road type,
103:19 obstruction, and speed limit,
103:21 and our response variable, or output variable, is speed.
103:25 So we'll be building a decision tree using these variables
103:28 in order to predict the speed of the car.
103:31 Now like I mentioned earlier,
103:32 we must first begin by deciding a variable
103:35 that best splits the data set
103:37 and assign that particular variable to the root node
103:40 and repeat the same thing for other nodes as well.
103:44 So step one, like we discussed earlier,
103:46 is to select the best attribute A.
103:49 Now, how do you know which variable best separates the data?
103:52 The variable with the highest information gain
103:55 best divides the data into the desired output classes.
103:59 First of all, we'll calculate two measures.
104:01 We'll calculate the entropy and the information gain.
104:04 Now this is where I tell you what exactly entropy is,
104:07 and what exactly information gain is.
104:10 Now entropy is basically used to measure
104:13 the impurity or the uncertainty present in the data.
104:17 It is used to decide how a decision tree can split the data.
104:22 Information gain, on the other hand,
104:23 is the most significant measure
104:25 which is used to build a decision tree.
104:28 It indicates how much information a particular variable
104:32 gives us about the final outcome.
104:34 So information gain is important,
104:36 because it is used to choose a variable
104:37 that best splits the data at each node
104:40 for a decision tree.
104:41 Now the variable with the highest information gain
104:43 will be used to split the data at the root node.
104:46 Now in our data set, there are four observations.
104:49 So what we're gonna do is we'll start by calculating
104:52 the entropy and information gain
104:55 for each of the predictor variables.
104:57 So we're gonna start by calculating the information gain
104:59 and entropy for the road type variable.
105:02 In our data set, you can see that
105:04 there are four observations.
105:06 There are four observations in the road type column,
105:08 which corresponds to the four labels in the speed column.
105:12 So we're gonna begin by calculating the information gain
105:15 of the parent node.
105:16 The parent node is nothing but the speed of the car node.
105:20 This is our output variable, correct?
105:22 It'll be used to show whether the speed of the car
105:26 is slow or fast.
105:27 So to find out the information gain of the
105:30 speed of the car variable,
105:32 we'll go through a couple of steps.
105:34 Now we know that there are four observations
105:36 in this parent node.
105:38 First, we have slow.
105:39 Then again we have slow, fast, and fast.
105:43 Now, out of these four observations, we have two classes.
105:47 So two observations belong to the class slow,
105:50 and two observations belong to the class fast.
105:54 So that's how you calculate P slow and P fast.
105:57 P slow is nothing but the fraction
105:59 of slow outcomes in the parent node,
106:01 and P fast is the fraction of fast outcomes
106:04 in the parent node.
106:06 And the formula to calculate P slow
106:08 is the number of slow outcomes in the parent node
106:11 divided by the total number of outcomes.
106:14 So the number of slow outcomes in the parent node is two,
106:17 and the total number of outcomes is four.
106:20 We have four observations in total.
106:22 So that's how we get P of slow as 0.5.
106:25 Similarly, for P of fast, you'll calculate
106:28 the number of fast outcomes
106:30 divided by the total number of outcomes.
106:32 So again, two by four, you'll get 0.5.
106:36 The next thing you'll do is you'll calculate
106:37 the entropy of this node.
106:39 So to calculate the entropy, this is the formula:
106:42 entropy = -[P(slow) * log2(P(slow)) + P(fast) * log2(P(fast))].
106:45 All you have to do is substitute the values in this formula.
106:48 So P of slow we're substituting as 0.5.
106:50 Similarly, P of fast as 0.5.
106:53 Now when you substitute the value,
106:54 you'll get an answer of one.
106:57 So the entropy of your parent node is one.
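As a quick sanity check, here is that parent-node entropy written out in Python. This is just the standard two-class entropy formula with the numbers from this example, not code from the course material.

```python
import math

# Parent node (speed of the car): 2 "slow" and 2 "fast" outcomes out of 4.
p_slow = 2 / 4   # 0.5
p_fast = 2 / 4   # 0.5

# Entropy = -[P(slow) * log2(P(slow)) + P(fast) * log2(P(fast))]
entropy_parent = -(p_slow * math.log2(p_slow) + p_fast * math.log2(p_fast))
print(entropy_parent)   # 1.0
```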
107:00 So after calculating the entropy of the parent node,
107:04 we'll calculate the information gain of the child node.
107:08 Now guys, remember that if the information gain
107:11 of the road type variable is greater than the information gain
107:16 of all the other predictor variables,
107:18 only then the root node can be split by using
107:21 the road type variable.
107:23 So, to calculate the information gain of the road type variable,
107:25 we first need to split the root node
107:27 by using the road type variable.
107:30 We're just doing this in order to check
107:32 if the road type variable
107:33 is giving us maximum information about the data.
107:37 Okay, so if you notice that road type has two outcomes,
107:40 it has two values, either steep or flat.
107:44 Now go back to our data set.
107:46 So here what you can notice is
107:48 whenever the road type is steep,
107:50 so first what we'll do is we'll check
107:52 the value of speed that we get
107:54 when the road type is steep.
107:56 So, first, observation.
107:57 You see that whenever the road type is steep,
108:00 you're getting a speed of slow.
108:01 Similarly, in the second observation,
108:03 when the road type is steep,
108:05 you'll get a value of slow again.
108:07 If the road type is flat, you'll get an observation of fast.
108:11 And again, if it is steep, there is a value of fast.
108:15 So for three steep values,
108:17 we have slow, slow, and fast.
108:19 And when the road type is flat,
108:21 we'll get an output of fast.
108:23 That's exactly what I've done in this decision tree.
108:27 So whenever the road type is steep,
108:29 you'll get slow, slow or fast.
108:31 And whenever the road type is flat,
108:33 you'll get fast.
108:35 Now the entropy of the right-hand side is zero.
108:39 Entropy is nothing but the uncertainty.
108:41 There's no uncertainty over here.
108:43 Because as soon as you see that the road type is flat,
108:46 your output is fast.
108:48 So there's no uncertainty.
108:50 But when the road type is steep,
108:52 you can have any one of the following outcomes,
108:54 either your speed will be slow
108:56 or it can be fast.
108:58 So you'll start by calculating the entropy
109:01 of both RHS and LHS of the decision tree.
109:05 So the entropy for the right side child node will be zero,
109:08 because there's no uncertainty here.
109:10 Immediately, if you see that the road type is flat,
109:13 your speed of the car will be fast.
109:15 Okay, so there's no uncertainty here,
109:17 and therefore your entropy becomes zero.
109:20 Now entropy for the left-hand side
109:22 is we'll again have to calculate
109:23 the fraction of P slow and the fraction of P fast.
109:26 So out of three observations,
109:28 in two observations we have slow.
109:30 That's why we have two by three over here.
109:32 Similarly for P fast,
109:34 we have one P fast
109:35 divided by the total number of observation which are three.
109:38 So out of these three, we have two slows and one fast.
109:41 When you calculate P slow and P fast,
109:43 you'll get these two values.
109:45 And then when you substitute the entropy in this formula,
109:48 you'll get the entropy as 0.9 for the road type variable.
109:52 I hope you all are understanding this.
109:54 I'll go through this again.
109:56 So, basically, here we are calculating
109:58 the information gain and entropy for road type variable.
110:02 Whenever you consider road type variable,
110:04 there are two values, steep and flat.
110:07 And whenever the value for road type is steep,
110:10 you'll get any one of these three outcomes,
110:12 either you'll get slow, slow, or fast.
110:15 And when the road type is flat,
110:17 your outcome will be fast.
110:19 Now because there is no uncertainty
110:20 whenever the road type is flat,
110:22 you'll always get an outcome of fast.
110:25 This means that the entropy here is zero,
110:27 or the uncertainty value here is zero.
110:30 But here, there is a lot of uncertainty.
110:33 So whenever your road type is steep,
110:34 your output can either be slow or it can be fast.
110:37 So, finally, you get the entropy as 0.9.
110:40 So in order to calculate the information gain
110:43 of the road type variable,
110:45 you need to calculate the weighted average.
110:47 I'll tell you why.
110:49 In order to calculate the information gain,
110:51 you need to know the entropy of the parent,
110:53 which we calculate as one,
110:55 minus the weighted average entropy
110:57 of the children.
110:59 Okay.
110:59 So for this formula, you need to calculate
111:01 all of these values.
111:03 So, first of all, you need to calculate
111:05 the weighted average entropy of the children.
111:06 Now the total number of outcomes in the parent node
111:09 we saw were four.
111:10 The total number of outcomes
111:11 in the left child node were three.
111:13 And the total number of outcomes in the right child node
111:16 was one.
111:17 Correct?
111:18 In order to verify this with you,
111:20 the total number of outcomes in the parent node are four.
111:24 One, two, three, and four.
111:26 Coming to the child node, which is the road type,
111:30 the total number of outcomes on the right-hand side
111:33 of the child node is one.
111:34 And the total number of outcomes
111:36 on the left-hand side of the child node is three.
111:39 That's exactly what I've written over here.
111:41 Alright, I hope you all understood these three values.
111:44 After that, all you have to do is
111:45 you have to substitute these values in this formula.
111:48 So when you do that, you'll get
111:50 the entropy of the children with weighted average
111:52 will be around 0.675.
111:55 Now just substitute the value in this formula.
111:58 So if you calculate the information gain
112:00 of the road type variable,
112:02 you'll get a value of 0.325.
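If you want to verify these numbers yourself, here is the same calculation in Python. Note that keeping full precision for the steep branch gives an entropy of about 0.918 rather than the rounded 0.9 used on the slide, so the information gain comes out slightly below 0.325.

```python
import math

def entropy(p_values):
    # Two-class entropy; a zero probability contributes nothing.
    return -sum(p * math.log2(p) for p in p_values if p > 0)

entropy_parent = entropy([2 / 4, 2 / 4])        # 1.0

# Splitting on road type: steep -> {slow, slow, fast}, flat -> {fast}.
entropy_steep = entropy([2 / 3, 1 / 3])         # ~0.918 (rounded to 0.9 on the slide)
entropy_flat = entropy([1.0])                   # 0.0, no uncertainty

# Weighted average entropy of the children: 3 of 4 rows go to steep, 1 of 4 to flat.
weighted_children = (3 / 4) * entropy_steep + (1 / 4) * entropy_flat   # ~0.689

info_gain_road_type = entropy_parent - weighted_children
print(round(info_gain_road_type, 3))            # ~0.311 (0.325 with the rounded values)
```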
112:05 Now by using the same method,
112:07 you're going to calculate the information gain
112:08 for each of the predictor variables,
112:11 for road type, for obstruction, and for speed limit.
112:14 Now when you follow the same method
112:16 and you calculate the information gain,
112:19 you'll get these values.
112:20 Now what does this information gain for road type
112:23 equal to 0.325 denote?
112:25 Now the value 0.325 for road type denotes that
112:29 we're getting very little information gain
112:32 from this road type variable.
112:33 And for obstruction, we literally have
112:36 information gain of zero.
112:38 Similarly, information gained for speed limit is one.
112:41 This is the highest value we've got for information gain.
112:44 This means that we'll have to use the speed limit variable
112:47 at our root node in order to split the data set.
112:51 So guys, don't get confused: whichever variable
112:54 gives you the maximum information gain,
112:56 that variable has to be chosen at the root node.
112:59 So that's why we have the root node as speed limit.
113:03 So if you've maintained the speed limit,
113:05 then you're going to go slow.
113:06 But if you haven't maintained the speed limit,
113:08 then the speed of your car is going to be fast.
113:11 Your entropy is literally zero,
113:14 and your information gain is one,
113:16 meaning that you can use this variable at your root node
113:19 in order to split the data set,
113:21 because speed limit gives you the maximum information gain.
113:25 So guys, I hope this use case is clear to all of you.
113:29 To sum everything up,
113:30 I'll just repeat the entire thing to you all once more.
113:34 So basically, here you were given a problem statement
113:36 in order to create a decision tree
113:38 that classifies the speed of a car as either slow or fast.
113:42 So you were given three predictor variables
113:44 and this was your output variable.
113:46 Information gain and entropy are basically two measures
113:49 that are used to decide which variable
113:51 will be assigned to the root node of a decision tree.
113:54 Okay.
113:55 So guys, as soon as you look at the data set,
113:57 if you compare these two columns,
113:58 that is speed limit and speed,
114:00 you'll get an output easily.
114:02 Meaning that if you're maintaining speed limit,
114:04 you're going to go slow.
114:06 But if you aren't maintaining the speed limit,
114:07 you're going to go fast.
114:10 So here itself we can understand the speed limit
114:12 has no uncertainty.
114:14 So every time you've maintained your speed limit,
114:17 you will be going slow,
114:18 and every time you're outside the speed limit,
114:20 you will be going fast.
114:22 It's as simple as that.
114:24 So how did you start?
114:26 So you started by calculating
114:28 the entropy of the parent node.
114:31 You calculated the entropy of the parent node,
114:34 which came down to one.
114:36 Okay.
114:37 After that, you calculated the information gain
114:40 of each of the child nodes.
114:42 In order to calculate
114:43 the information gain of the child node,
114:44 you start by calculating the entropy
114:46 of the right-hand side and the left-hand side
114:48 of the decision tree.
114:50 Okay.
114:50 Then you calculate the entropy
114:52 along with the weighted average.
114:54 You substitute these values in the information gain formula,
114:58 and you get the information gain
114:59 for each of the predictor variables.
115:01 So after you get the information gain
115:03 of each of the predictor variables,
115:05 you check which variable gives you
115:06 the maximum information gain,
115:08 and you assign that variable to your root node.
115:11 It's as simple as that.
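Before we leave decision trees, here is a minimal sketch of the ID3 loop in Python, just to tie the steps together. The helper names (entropy, information_gain, id3) are my own, the data is a list of plain dictionaries, and the obstruction values are made up to match the stated information gain of zero; treat this as an illustration, not a production implementation.

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    parent = entropy([r[target] for r in rows])
    children = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        children += (len(subset) / len(rows)) * entropy([r[target] for r in subset])
    return parent - children

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Step 5: if the node is pure (or nothing is left to split on), stop with a leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with the highest information gain for this node.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    # Steps 3-4: build one descendant per value of the chosen attribute.
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree

# Tiny usage example with the car data set from above.
data = [
    {"road_type": "steep", "obstruction": "yes", "speed_limit": "yes", "speed": "slow"},
    {"road_type": "steep", "obstruction": "no",  "speed_limit": "yes", "speed": "slow"},
    {"road_type": "flat",  "obstruction": "yes", "speed_limit": "no",  "speed": "fast"},
    {"road_type": "steep", "obstruction": "no",  "speed_limit": "no",  "speed": "fast"},
]
print(id3(data, ["road_type", "obstruction", "speed_limit"], "speed"))
# {'speed_limit': {'yes': 'slow', 'no': 'fast'}} (key order may vary)
```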
115:14 So guys, that was all about decision trees.
115:17 Now let's look at our next classification algorithm
115:20 which is random forest.
115:22 Now first of all, what is a random forest?
115:25 Random forest basically builds multiple decision trees
115:28 and glues them together to get a more accurate
115:31 and stable prediction.
115:33 Now if we already have decision trees,
115:35 and random forest is nothing but
115:37 a collection of decision trees,
115:39 why do we have to use a random forest
115:41 when we already have decision trees?
115:43 There are three main reasons why random forest is used.
115:45 Now even though decision trees are convenient
115:49 and easily implemented,
115:50 they are not as accurate as random forest.
115:53 Decision trees work very effectively
115:56 with the training data,
115:57 but they're not flexible
115:58 when it comes to classifying a new sample.
116:01 Now this happens because of something known as overfitting.
116:05 Now overfitting is a problem
116:06 that is seen with decision trees.
116:09 It's something that commonly occurs
116:10 when we use decision trees.
116:12 Now overfitting occurs when a model studies
116:15 the training data to such an extent
116:17 that it negatively influences
116:19 the performance of the model on new data.
116:23 Now this means that the disturbance
116:26 in the training data is recorded,
116:27 and it is learned as concept by the model.
116:30 If there's any disturbance
116:32 or any sort of noise in the training data
116:35 or any error in the training data,
116:37 that is also studied by the model.
116:39 The problem here is that these concepts
116:41 do not apply to the testing data,
116:43 and it negatively impacts the model's ability
116:46 to classify new data.
116:48 So to sum it up,
116:50 overfitting occurs whenever your model
116:53 learns the training data,
116:54 along with all the disturbance in the training data.
116:57 So it basically memorizes the training data.
116:59 And whenever a new data will be given to your model,
117:02 it will not predict the outcome very accurately.
117:05 Now this is a problem seen in decision trees.
117:08 Okay.
117:08 But in random forest, there's something known as bagging.
117:11 Now the basic idea behind bagging is
117:14 to reduce the variation in the predictions
117:16 by combining the result of multiple decision trees
117:20 on different samples of the data set.
117:22 So your data set will be divided into different samples,
117:26 and you'll be building a decision tree
117:28 on each of these samples.
117:29 This way, each decision tree will be studying
117:32 one subset of your data.
117:34 So this way overfitting will get reduced
117:36 because one decision tree is not studying
117:38 the entire data set.
117:40 Now let's focus on random forest.
117:43 Now in order to understand random forest,
117:44 we'll look at a small example.
117:45 We can consider this data set.
117:48 In this data, we have four predictor variables.
117:51 We have blood flow, blocked arteries,
117:53 chest pain, and weight.
117:55 Now these variables are used to predict
117:57 whether or not a person has a heart disease.
118:00 So we're going to use this data set
118:02 to create a random forest that predicts
118:04 if a person has a heart disease or not.
118:07 Now the first step in creating a random forest
118:10 is that you create a bootstrap data set.
118:13 Now in bootstrapping, all you have to do
118:14 is you have to randomly select samples
118:17 from your original data set.
118:19 Okay.
118:20 And a point to note is that you can select the same sample
118:23 more than once.
118:25 So if you look at the original data set,
118:28 we have abnormal, normal, normal, and abnormal.
118:31 Look at the blood flow section.
118:33 Now here I've randomly selected samples,
118:35 normal, abnormal,
118:37 and I've selected one sample twice.
118:40 You can do this in a bootstrap data set.
118:42 Now all I did here is
118:43 I created a bootstrap data set.
118:46 Bootstrapping is nothing but an estimation method
118:48 used to make predictions on data
118:50 by resampling the data.
118:53 This is a bootstrap data set.
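Here is one way to sketch that bootstrapping step with pandas, using made-up values for the heart-disease columns. sample(..., replace=True) is what allows the same row to be picked more than once.

```python
import pandas as pd

# Made-up version of the four-row heart-disease data set.
original = pd.DataFrame({
    "blood_flow":       ["abnormal", "normal", "normal", "abnormal"],
    "blocked_arteries": ["yes", "no", "no", "yes"],
    "chest_pain":       ["yes", "yes", "no", "no"],
    "weight":           [180, 150, 170, 210],
    "heart_disease":    ["yes", "no", "no", "yes"],
})

# Bootstrap data set: same number of rows, drawn with replacement,
# so some rows may repeat and others may be left out entirely.
bootstrap = original.sample(n=len(original), replace=True, random_state=1)
print(bootstrap)

# Rows that were never picked form the out-of-bag sample (used later for evaluation).
out_of_bag = original.loc[~original.index.isin(bootstrap.index)]
print(out_of_bag)
```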
118:55 Now even though this seems very simple,
118:57 in real world problems,
118:58 you'll never get such a small data set.
119:01 Okay, so bootstrapping
119:02 is actually a little more complex than this.
119:05 Usually in real world problems,
119:06 you'll have a huge data set,
119:08 and bootstrapping that data set is actually
119:10 a pretty complex problem.
119:12 Here, because I'm trying to make you understand
119:14 how random forest works,
119:15 I've considered a small data set.
119:17 Now you're going to use the bootstrap data set
119:19 that you created,
119:20 and you're going to build decision trees from it.
119:23 Now one more thing to note in random forest is
119:26 you will not be using your entire data set.
119:29 Okay, so you'll only be using a few of the variables
119:31 at each node.
119:32 So, for example, we'll only consider two variables
119:35 at each step.
119:37 So if you begin at the root node here,
119:38 we will randomly select two variables
119:41 as candidates for the root node.
119:42 Okay, let's say that we selected blood flow
119:45 and blocked arteries.
119:47 Out of these two variables we have to select the variable
119:50 that best separates the sample.
119:52 Okay.
119:53 So for the sake of this example,
119:54 let's say that blocked arteries
119:56 is the most significant predictor,
119:58 and that's why we'll assign it to the root node.
120:00 Now our next step is to repeat the same process
120:03 for each of these upcoming branch nodes.
120:05 Here we'll again select two variables at random
120:08 as candidates for each of these branch nodes,
120:10 and then choose a variable
120:12 that best separates the samples, right?
120:14 So let me just repeat this entire process.
120:17 So you know that you start creating a decision tree
120:20 by selecting the root node.
120:21 In random forest, you'll randomly select
120:24 a couple of variables for each node,
120:26 and then you'll calculate which variable
120:28 best splits the data at that node.
120:31 So for each node, we'll randomly select
120:33 two or three variables.
120:35 And out of those two or three variables,
120:37 we'll see which variable best separates the data.
120:41 Okay, so at each node,
120:42 we'll be calculating information gain and entropy.
120:45 Basically, that's what I mean.
120:46 At every node, you'll calculate information gain
120:48 and entropy of two or three variables,
120:50 and you'll see which variable
120:51 has the highest information gain,
120:53 and you'll keep descending downwards.
120:55 That's how you create a decision tree.
120:58 So we just created our first decision tree.
121:00 Now what you do is you'll go back to step one,
121:03 and you'll repeat the entire process.
121:05 So each decision tree will predict the output class
121:07 based on the predictor variables
121:09 that you've assigned to each decision tree.
121:11 Now let's say for this decision tree,
121:13 you've assigned blood flow.
121:14 Here we have blocked arteries at the root node.
121:17 Here we might have blood flow at the root node and so on.
121:21 So your output will depend on which predictor variable
121:25 is at the root node.
121:27 So each decision tree will predict the output class
121:29 based on the predictor variable
121:31 that you assigned in that tree.
121:34 Now what you do is you'll go back to step one,
121:36 you'll create a new bootstrap data set,
121:38 and then again you'll build a new decision tree.
121:41 And for that decision tree,
121:42 you'll consider only a subset of variables,
121:45 and you'll choose the best predictor variable
121:47 by calculating the information gain.
121:49 So you will keep repeating this process.
121:52 So you just keep repeating step two and step one.
121:55 Okay.
121:56 And you'll keep creating multiple decision trees.
121:58 Okay.
121:59 So having a variety of decision trees in a random forest
122:03 is what makes it more effective than
122:05 an individual decision tree.
122:08 So instead of having an individual decision tree,
122:10 which is created using all the features,
122:13 you can build a random forest
122:15 that uses multiple decision trees
122:17 wherein each decision tree has a random set
122:20 of predictor variables.
122:22 Now step number four is predicting the outcome
122:24 of a new data point.
122:26 So now that you've created a random forest,
122:29 let's see how it can be used
122:30 to predict whether a new patient has a heart disease or not.
122:34 Okay, now this diagram basically has
122:39 the data about the new patient.
122:41 He doesn't have blocked arteries.
122:43 He has chest pain, and his weight is around 185 kgs.
122:47 Now all you have to do is you have to run this data
122:50 down each of the decision trees that you made.
122:53 So, the first decision tree shows that
122:55 yes, this person has heart disease.
122:58 Similarly, you'll run the information of this new patient
123:01 through every decision tree that you created.
123:03 Then depending on how many votes you get for yes and no,
123:07 you'll classify that patient
123:09 as either having heart disease or not.
123:12 All you have to do is you have to run
123:13 the information of the new patient
123:15 through all the decision trees that you created
123:17 in the previous step,
123:18 and the final output is based on the number of votes
123:22 each of the class is getting.
123:23 Okay, let's say that three decision trees
123:25 said that yes the patient has heart disease,
123:28 and one decision tree said that no, he doesn't.
123:31 So this means you will obviously classify
123:33 the patient as having a heart disease
123:35 because three of them voted for yes.
123:38 It's based on majority.
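The voting step itself is tiny. Below is a sketch where the individual tree predictions are hard-coded stand-ins, since the point is only how the majority is taken.

```python
from collections import Counter

# New patient: no blocked arteries, has chest pain, weighs 185 kg.
new_patient = {"blocked_arteries": "no", "chest_pain": "yes", "weight": 185}

# Hypothetical output of running the patient down each of the four trees.
tree_predictions = ["yes", "yes", "no", "yes"]

votes = Counter(tree_predictions)
final_class = votes.most_common(1)[0][0]
print(votes)         # Counter({'yes': 3, 'no': 1})
print(final_class)   # 'yes' -> classified as having heart disease
```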
123:41 So guys, I hope the concept behind random forest
123:44 is understandable.
123:45 Now the next step is you will evaluate
123:47 the efficiency of the model.
123:49 Now earlier when we created the bootstrap data set
123:52 we left out one entry sample.
123:55 This is the entry sample we left out,
123:57 because we repeated one sample twice.
124:00 If you'll remember in the bootstrap data set,
124:03 here we repeated an entry twice,
124:05 and we missed out on one of the entries.
124:10 So what we're gonna do is...
124:11 So for evaluating the model,
124:13 we'll be using the data entry that we missed out on.
124:17 Now in a real world problem,
124:19 about 1/3 of the original data set is not included
124:22 in the bootstrap dataset.
124:25 Because there's a huge amount of data
124:26 in a real world problem,
124:27 so 1/3 of the original data set
124:29 is not included in the bootstrap data set.
124:32 So guys, the sample data set
124:34 which is not there in your bootstrap data set
124:36 is known as out-of-bag data set,
124:39 because basically this is our out-of-bag data set.
124:42 Now the out-of-bag data set
124:43 is used to check the accuracy of the model.
124:47 Because the model was not created
124:48 by using the out-of-bag data set,
124:50 it will give us a good understanding
124:52 of whether the model is effective or not.
124:55 Now the out-of-bag data set
124:56 is nothing but your testing data set.
124:59 Remember, in machine learning, there's training
125:01 and testing data set.
125:02 So your out-of-bag data set
125:03 is nothing but your testing data set.
125:06 This is used to evaluate the efficiency of your model.
125:09 So eventually, you can measure the accuracy
125:12 of a random forest by the proportion
125:15 of out-of-bag samples that are correctly classified,
125:19 because the out-of-bag data set
125:20 is used to evaluate the efficiency of your model.
125:23 So you can calculate the accuracy
125:25 by understanding how many samples
125:28 of this out-of-bag data set the model
125:30 was able to classify correctly.
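In practice you rarely code all of this by hand; scikit-learn's RandomForestClassifier handles the bootstrapping, the random feature selection, and the out-of-bag estimate for you. The tiny encoded data set below is only there to make the snippet runnable.

```python
from sklearn.ensemble import RandomForestClassifier

# Encoded stand-in data: blood_flow, blocked_arteries, chest_pain, weight.
X = [[0, 1, 1, 180],
     [1, 0, 1, 150],
     [1, 0, 0, 170],
     [0, 1, 0, 210]]
y = [1, 0, 0, 1]   # 1 = heart disease, 0 = no heart disease

model = RandomForestClassifier(n_estimators=100, bootstrap=True,
                               oob_score=True, random_state=42)
model.fit(X, y)

# Proportion of out-of-bag samples that are correctly classified.
print(model.oob_score_)
```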
125:33 So guys, that was an explanation about
125:35 how random forest works.
125:37 To give you an overview,
125:38 let me just run you through all the steps that we took.
125:42 So basically, this was our data set,
125:44 and all we have to do is we have to predict
125:45 whether a patient has heart disease or not.
125:48 So, our first step was to create a bootstrap data set.
125:52 A bootstrap data set is nothing but randomly selected
125:55 observations from your original data set,
125:58 and you can also have duplicate values
126:01 in your bootstrap data set.
126:02 Okay.
126:03 The next step is you're going to create a decision tree
126:05 by considering a random set of predictor variables
126:09 for each decision tree.
126:10 Okay.
126:11 So, the third step is you'll go back to step one,
126:14 create a bootstrap data set.
126:16 Again, create a decision tree.
126:18 So this iteration is performed hundreds of times
126:21 until you have multiple decision trees.
126:24 Now that you've created a random forest,
126:25 you'll use this random forest to predict the outcome.
126:28 So if you're given a new data point
126:30 and you have to classify it
126:31 into one of the two classes,
126:33 we'll just run this new information
126:35 through all the decision trees.
126:37 And you'll just take the majority
126:39 of the output that you're getting from the decision trees
126:41 as your outcome.
126:43 Now in order to evaluate the efficiency of the model,
126:45 you'll use the out-of-bag sample data set.
126:48 Now the out-of-bag sample is basically the sample
126:51 that was not included in your bootstrap data set,
126:54 but this sample is coming
126:56 from your original data set, guys.
126:57 This is not something that you randomly create.
127:00 This data set was there in your original data set,
127:02 but it was just not mentioned
127:04 in your bootstrap data set.
127:06 So you'll use your out-of-bag sample
127:09 in order to calculate the accuracy
127:11 of your random forest.
127:13 So the proportion of out-of-bag samples
127:15 that are correctly classified will give you the accuracy
127:18 of your model.
127:19 So that is all for random forest.
127:22 So guys, I'll discuss other
127:23 classification algorithms with you,
127:25 and only then I'll show you a demo on
127:27 the classification algorithms.
127:29 Now our next algorithm is something known as naive Bayes.
127:34 Naive Bayes is, again,
127:36 a supervised classification algorithm,
127:39 which is based on the Bayes Theorem.
127:42 Now the Bayes Theorem basically follows
127:44 a probabilistic approach.
127:45 The main idea behind naive Bayes is that
127:49 the predictor variables in a machine learning model
127:52 are independent of each other,
127:54 meaning that the outcome of a model
127:57 depends on a set of independent variables
128:00 that have nothing to do with each other.
128:04 Now a lot of you might ask why is naive Bayes
128:06 called naive.
128:07 Now usually, when I tell anybody about naive Bayes,
128:10 they keep asking me why it is called naive.
128:14 So in real world problems predictor variables
128:16 aren't always independent of each other.
128:19 There is always some correlation
128:21 between the independent variables.
128:23 Now because naive Bayes considers each predictor variable
128:26 to be independent of any other variable in the model,
128:30 it is called naive.
128:31 This is an assumption that naive Bayes states.
128:34 Now let's understand the math
128:35 behind the naive Bayes algorithm.
128:37 So like I mentioned, the principle behind naive Bayes
128:40 is the Bayes Theorem,
128:42 which is also known as the Bayes Rule.
128:44 The Bayes Theorem is used to calculate
128:46 the conditional probability,
128:48 which is nothing but the probability of an event occurring
128:52 based on information about the events in the past.
128:56 This is the mathematical equation for the Bayes Theorem.
129:00 Now, in this equation, the LHS is nothing
129:03 but the conditional probability of event A occurring,
129:07 given the event B.
129:08 P of A is nothing but probability of event A occurring
129:11 P of B is probability of event B.
129:14 And P of B given A is nothing but the conditional probability
129:17 of event B occurring, given the event A.
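Written out, the Bayes Rule is P(A|B) = P(B|A) x P(A) / P(B). Here's a one-line helper with arbitrary toy numbers, just to show the mechanics; the numbers are not from the course data.

```python
def bayes(p_b_given_a, p_a, p_b):
    # P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Arbitrary example values: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.5
print(bayes(0.8, 0.3, 0.5))   # 0.48
```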
129:22 Now let's try to understand how naive Bayes works.
129:25 Now consider this data set of
129:27 around 1,500 observations.
129:30 Okay, here we have the following output classes.
129:33 We have either cat, parrot, or turtle.
129:36 These are our output classes,
129:38 and the predictor variables are
129:40 swim, wings, green color, and sharp teeth.
129:44 Okay.
129:45 So, basically, your type is your output variable,
129:48 and swim, wings, green, and sharp teeth
129:51 are your predictor variables.
129:53 Your output variables has three classes,
129:55 cat, parrot, and turtle.
129:57 Okay.
129:58 Now I've summarized this table I've shown on the screen.
130:02 The first thing you can see is the class of type cats
130:05 shows that out of 500 cats,
130:08 450 can swim,
130:10 meaning that 90% of them can.
130:12 And zero number of cats have wings,
130:15 and zero number of cats are green in color,
130:18 and 500 out of 500 cats have sharp teeth.
130:22 Okay.
130:22 Now, coming to parrot, it says 50 out of 500 parrots
130:27 have true value for swim.
130:29 Now guys, obviously, this does not hold true
130:31 in real world.
130:32 I don't think there are any parrots who can swim,
130:34 but I've just created this data set
130:36 so that we can understand naive Bayes.
130:39 So, meaning that 10% of parrots have true value for swim.
130:43 Now all 500 parrots have wings,
130:46 and 400 out of 500 parrots are green in color,
130:50 and zero parrots have sharp teeth.
130:53 Coming to the turtle class,
130:55 all 500 turtles can swim.
130:57 Zero number of turtles have wings.
131:00 And out of 500, a hundred turtles are green in color,
131:04 meaning that 20% of the turtles are green in color.
131:07 And 50 out of 500 turtles have sharp teeth.
131:11 So that's what we understand from this data set.
131:14 Now the problem here is
131:16 we are given our observation over here,
131:18 given some value for swim, wings, green, and sharp teeth.
131:22 What we need to do is we need to predict
131:24 whether the animal is a cat, parrot, or a turtle,
131:27 based on these values.
131:30 So the goal here is to predict whether it is a cat,
131:33 parrot, or a turtle
131:35 based on all these defined parameters.
131:37 Okay.
131:38 Based on the value of swim, wings, green, and sharp teeth,
131:41 we'll understand whether the animal is a cat,
131:43 or is it a parrot, or is it a turtle.
131:47 So, if you look at the observation,
131:48 the variables swim and green have a value of true,
131:54 and the outcome can be anyone of the types.
131:56 It can either be a cat, it can be a parrot,
131:59 or it can be a turtle.
132:00 So in order to check if the animal is a cat,
132:02 all you have to do is you have to calculate
132:04 the conditional probability at each step.
132:07 So here what we're doing is
132:08 we need to calculate the probability that
132:11 this is a cat,
132:12 given that it can swim and it is green in color.
132:16 First, we'll calculate the probability that it can swim,
132:19 given that it's a cat.
132:21 And second, the probability
132:24 of it being green,
132:26 given that it is a cat,
132:28 and then we'll multiply it
132:30 with the probability of it being a cat
132:32 divided by the probability of swim and green.
132:35 Okay.
132:36 So, guys, I know you all can calculate the probability.
132:38 It's quite simple.
132:40 So once you calculate the probability here,
132:42 you'll get a direct value of zero.
132:45 Okay, you'll get a value of zero,
132:46 meaning that this animal is definitely not a cat.
132:49 Similarly, if you do this for parrots,
132:51 you calculate a conditional probability,
132:53 you'll get a value of 0.0264
132:57 divided by probability of swim comma green.
133:00 We don't know this probability.
133:02 Similarly, if you check this for the turtle,
133:05 you'll get a probability of 0.066
133:07 divided by P swim comma green.
133:10 Okay.
133:11 Now for these calculations, the denominator is the same.
133:15 The value of the denominator is the same,
133:17 and the probability of it
133:21 being a turtle is greater than that of a parrot.
133:25 So that's how we can correctly predict
133:27 that the animal is actually a turtle.
133:30 So guys, this is how naive Bayes works.
133:32 You basically calculate
133:33 the conditional probability at each step.
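If you want to reproduce those numbers, here is the numerator for each class computed explicitly. The priors assume 500 animals of each type out of the 1,500 rows, the likelihoods come from the summary table, and the shared denominator P(swim, green) cancels out when you compare the classes.

```python
def score(p_swim_given_class, p_green_given_class, p_class):
    # Proportional to P(class | swim, green) under the naive independence assumption.
    return p_swim_given_class * p_green_given_class * p_class

p_cat    = score(450 / 500, 0 / 500,   500 / 1500)   # 0.0 -> definitely not a cat
p_parrot = score(50 / 500,  400 / 500, 500 / 1500)   # ~0.0267 (0.0264 on the slide)
p_turtle = score(500 / 500, 100 / 500, 500 / 1500)   # ~0.0667 (0.066 on the slide)

best = max([("cat", p_cat), ("parrot", p_parrot), ("turtle", p_turtle)],
           key=lambda pair: pair[1])
print(best)   # ('turtle', ...), so the animal is classified as a turtle
```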
133:36 Whatever classification needs to be done,
133:39 that has to be calculated through probability.
133:41 There's a lot of statistics
133:42 that comes into naive Bayes.
133:44 And if you all want to learn more about statistics
133:47 and probability,
133:49 I'll leave a link in the description.
133:50 You all can watch that video as well.
133:52 There I've explained exactly
133:54 what conditional probability is,
133:56 and the Bayes Theorem is also explained very well.
133:58 So you all can check out that video also.
134:01 And apart from this, if you all have any doubts
134:03 regarding any of the algorithms,
134:06 please leave them in the comment section.
134:07 Okay, I'll solve your doubts.
134:09 And apart from that, I'll also leave a couple of links
134:12 for each of the algorithms in the description box.
134:14 Because if you want more in-depth understanding
134:17 of each of the algorithms,
134:18 you can check out that content.
134:20 Since this is a full course video,
134:22 I have to cover all the topics,
134:23 and it is hard for me to make you understand
134:25 in-depth of each topic.
134:27 So I'll leave a couple of links in the description box.
134:29 You can watch those videos as well.
134:32 Make sure you check out the probability
134:33 and statistics video.
134:36 So now let's move on and look at our next algorithm,
134:39 which is the K nearest neighbor algorithm.
134:43 Now KNN, which basically stands for K nearest neighbor,
134:46 is, again, a supervised classification algorithm
134:49 that classifies a new data point into the target class
134:53 or the output class, depending on the features
134:56 of its neighboring data points.
134:58 That's why it's called K nearest neighbor.
135:01 So let's try to understand KNN with a small analogy.
135:04 Okay, let's say that we want a machine
135:06 to distinguish between the images of cats and dogs.
135:10 So to do this, we must input our data set
135:13 of cat and dog images,
135:14 and we have to train our model to detect the animal
135:18 based on certain features.
135:20 For example, features such as pointy ears
135:23 can be used to identify cats.
135:26 Similarly, we can identify dogs
135:28 based on their long ears.
135:30 So after studying the data set
135:32 during the training phase,
135:33 when a new image is given to the model,
135:35 the KNN algorithm will classify it
135:38 into either cats or dogs,
135:40 depending on the similarity in their features.
135:43 Okay, let's say that a new image has pointy ears,
135:46 it will classify that image as cat,
135:49 because it is similar to the cat images,
135:51 because it's similar to its neighbors.
135:53 In this manner, the KNN algorithm classifies
135:56 the data point based on how similar they are
135:58 to their neighboring data points.
136:01 So this is a small example.
136:02 We'll discuss more about it in the further slides.
136:06 Now let me tell you a couple of
136:08 features of KNN algorithm.
136:10 So, first of all, we know that
136:11 it is a supervised learning algorithm.
136:13 It uses labeled input data set
136:15 to predict the output of the data points.
136:18 Then it is also one of the simplest
136:20 machine learning algorithms,
136:21 and it can be easily implemented
136:23 for a varied set of problems.
136:25 Another feature is that it is non-parametric,
136:29 meaning that it does not make any assumptions.
136:31 For example, naive Bayes is a parametric model,
136:34 because it assumes that all the independent variables
136:37 are in no way related to each other.
136:40 It has assumptions about the model.
136:42 K nearest neighbor has no such assumptions.
136:45 That's why it's considered a non-parametric model.
136:48 Another feature is that it is a lazy algorithm.
136:51 Now, lazy algorithm basically is any algorithm
136:54 that memorizes the training set,
136:56 instead of learning a discriminative function
137:00 from the training data.
137:02 Now, even though KNN is mainly a classification algorithm,
137:05 it can also be used for regression cases.
137:09 So KNN is actually both a classification
137:12 and a regression algorithm.
137:14 But mostly, you'll see that it'll be used
137:16 for classification problems.
137:19 The most important feature about
137:21 a K nearest neighbor is that
137:23 it's based on feature similarity
137:24 with its neighboring data points.
137:27 You'll understand this
137:28 in the example that I'm gonna tell you.
137:30 Now, in this image, we have two classes of data.
137:34 We have class A which is squares
137:36 and class B which are triangles.
137:39 Now the problem statement is to assign
137:41 the new input data point
137:43 to one of the two classes
137:44 by using the KNN algorithm.
137:47 So the first step in the KNN algorithm
137:49 is to define the value of K.
137:52 But what does the K in the KNN algorithm stand for?
137:55 Now the K stands for the number of nearest neighbors,
137:59 and that's why it's got the name K nearest neighbors.
138:02 Now, in this image, I've defined the value of K as three.
138:07 This means that the algorithm
138:08 will consider the three neighbors
138:11 that are closest to the new data point
138:13 in order to decide the class of the new data point.
138:17 So the closeness between the data points
138:18 is calculated by using measures
138:20 such as Euclidean distance and Manhattan distance,
138:23 which I'll be explaining in a while.
138:25 So our K is equal to three.
138:27 The neighbors include two squares and one triangle.
138:30 So, if I were to classify the new data point
138:33 based on K equal to three,
138:35 then it should be assigned to class A, correct?
138:37 It should be assigned to squares.
138:40 But what if the K value is set to seven.
138:42 Here I'm basically telling my algorithm
138:44 to look for the seven nearest neighbors
138:46 and classify the new data point
138:49 into the class it is most similar to.
138:52 So our K equal to seven.
138:54 The neighbors include three squares and four triangles.
138:57 So if I were to classify the new data point
139:00 based on K equal to seven,
139:02 then it would be assigned to class B,
139:04 since majority of its neighbors are from class B.
139:08 Now this is where a lot of us get confused.
139:10 So how do we know which K value is the most suitable
139:14 for K nearest neighbor.
139:15 Now there are a couple of methods
139:17 used to calculate the K value.
139:19 One of them is known as the elbow method.
139:21 We'll be discussing the elbow method
139:23 in the upcoming slides.
139:26 So for now let me just show you
139:27 the measures that are involved behind KNN.
139:30 Okay, there's very simple math
139:31 behind the K nearest neighbor algorithm.
139:33 So I'll be discussing the Euclidean distance with you.
139:36 Now in this figure, we have to measure the distance
139:40 between P one and P two by using Euclidean distance.
139:43 I'm sure a lot of you
139:44 already know what Euclidean distance is.
139:46 It is something that we learned in eighth or 10th grade.
139:49 I'm not sure.
139:51 So all you're doing is you're taking the coordinates of the two points.
139:53 The formula is basically x two minus x one, the whole square,
139:57 plus y two minus y one, the whole square,
139:59 and the square root of that sum is the Euclidean distance.
140:02 It's as simple as that.
140:04 So Euclidean distance is used
140:06 as a measure to check the closeness of data points.
140:10 So basically, KNN uses the Euclidean distance
140:13 to check the closeness of a new data point
140:16 with its neighbors.
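Here is a bare-bones version of that idea: the Euclidean distance function plus a simple majority vote over the K nearest training points. The coordinates and labels are made up to mirror the squares-versus-triangles picture.

```python
import math
from collections import Counter

def euclidean(p1, p2):
    # sqrt((x2 - x1)^2 + (y2 - y1)^2)
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

# Made-up training points: class A (squares) and class B (triangles).
training = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
            ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"), ((6, 6), "B")]

def knn_predict(new_point, k):
    neighbors = sorted(training, key=lambda item: euclidean(new_point, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((2, 2), k=3))   # 'A' -- the three nearest neighbors are all squares
print(knn_predict((4, 4), k=7))   # 'B' -- four of the seven neighbors are triangles
```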
140:17 So guys, it's as simple as that.
140:19 KNN makes use of simple measures
140:21 in order to solve very complex problems.
140:23 Okay, and this is one of the reasons why
140:26 KNN is such a commonly used algorithm.
140:29 Coming to support vector machine.
140:31 Now, this is our last algorithm
140:33 under classification algorithms.
140:35 Now guys, don't get paranoid because of the name.
140:38 Support vector machine actually
140:39 is one of the simplest algorithms in supervised learning.
140:43 Okay, it is basically used to classify data
140:46 into different classes.
140:47 It's a classification algorithm.
140:50 Now unlike most algorithms,
140:52 SVM makes use of something known as a hyperplane
140:56 which acts like a decision boundary
140:58 between the separate classes.
141:00 Okay.
141:01 Now SVM can be used to generate multiple
141:03 separating hyperplane,
141:05 such that the data is divided into segments,
141:08 and each segment contains only one kind of data.
141:12 So, a few features of SVM include that
141:14 it is a supervised learning algorithm,
141:16 meaning that it's going to study a labeled training data.
141:20 Another feature is that it is again
141:22 a regression and a classification algorithm.
141:25 Even though SVM is mainly used for classification,
141:28 there is something known as the support vector regressor.
141:31 That is used for regression problems.
141:33 Now, SVM can also be used to classify non-linear data
141:37 by using kernel tricks.
141:39 Non-linear data is basically data
141:41 that cannot be separated
141:42 by using a single linear line.
141:44 I'll be talking more about this
141:46 in the upcoming slides.
141:48 Now let's move on and discuss how SVM works.
141:51 Now again, in order to make you understand
141:53 how support vector machine works,
141:55 let's look at a small scenario.
141:57 For a second, pretend that you own a farm
142:00 and you have a problem.
142:02 You need to set up a fence
142:04 to protect your rabbits from a pack of wolves.
142:07 Okay, now, you need to decide
142:09 where you want to build your fence.
142:11 So one way to solve the problem is by using
142:13 support vector machines.
142:16 So if I do that and if I try to draw a decision boundary
142:20 between the rabbits and the wolves,
142:22 it looks something like this.
142:24 Now you can clearly build a fence along this line.
142:27 So in simple terms, this is exactly how
142:29 your support vector machines work.
142:31 It draws a decision boundary,
142:33 which is nothing but a hyperplane
142:36 between any two classes in order to separate them
142:39 or classify them.
142:40 Now I know that you're thinking how do you know
142:42 where to draw a hyperplane.
142:44 The basic principle behind SVM
142:46 is to draw a hyperplane
142:48 that best separates the two classes.
142:50 In our case, the two classes
142:52 are the rabbits and the wolves.
142:54 Now before we move any further,
142:55 let's discuss the different terminologies
142:58 that are there in support vector machine.
143:00 So that is basically a hyperplane.
143:02 It is a decision boundary that best separates
143:05 the two classes.
143:07 Now, support vectors, what exactly are support vectors.
143:11 So when you start with the support vector machine,
143:13 you start by drawing a random hyperplane.
143:16 And then you check the distance
143:17 between the hyperplane and the closest data point
143:21 from each of the class.
143:24 These closest data points to the hyperplane
143:26 are known as support vectors.
143:28 Now these two data points
143:30 are the closest to your hyperplane.
143:32 So these are known as support vectors,
143:34 and that's where the name comes from,
143:35 support vector machines.
143:37 Now the hyperplane is drawn
143:39 based on these support vectors.
143:43 An optimum hyperplane will be
143:44 the one which has a maximum distance
143:47 from each of the support vectors,
143:49 meaning that the distance between the hyperplane
143:52 and the support vectors has to be maximum.
143:56 So, to sum it up, SVM is used to classify data
143:59 by using a hyperplane,
144:01 such that the distance between the hyperplane
144:03 and the support vector is maximum.
144:06 Now this distance is nothing but the margin.
144:09 Now let's try to solve a problem.
144:12 Let's say that I input a new data point
144:15 and I want to draw a hyperplane
144:17 such that it best separates these two classes.
144:20 So what do I do?
144:21 I start out by drawing a hyperplane,
144:24 and then I check the distance between the hyperplane
144:27 and the support vectors.
144:29 So, basically here, I'm trying to check
144:31 if the margin is maximum for this hyperplane.
144:34 But what if I drew the hyperplane like this?
144:37 The margin for this hyperplane is clearly being more
144:41 than the previous one.
144:42 So this is my optimal hyperplane.
144:45 This is exactly how you understand
144:47 which hyperplane needs to be chosen,
144:49 because you can draw multiple hyperplanes.
144:52 Now, the best hyperplane is the one
144:54 that has a maximum margin.
144:57 So, this is my optimal hyperplane.
144:59 Now so far it was quite easy.
145:02 Our data was linearly separable,
145:04 which means that you could draw a straight line
145:06 to separate the two classes.
145:08 But what will you do if the data looks like this?
145:12 You possibly cannot draw a hyperplane like this.
145:18 It doesn't separate the two classes.
145:20 We can clearly see rabbits and wolves
145:22 in both of the classes.
145:24 Now this is exactly where non-linear SVM
145:27 comes into the picture.
145:28 Okay, this is what the kernel trick is all about.
145:31 Now, kernel is basically something that can be used
145:34 to transform data into another dimension
145:37 that has a clear dividing margin between classes of data.
145:41 So, basically the kernel function
145:43 offers the user the option of transforming
145:47 non-linear spaces into linear ones.
145:50 Until this point, if you notice that
145:52 we were plotting our data on two dimensional space.
145:55 We had x and y-axis.
145:57 A simple trick is transforming the two variables,
146:00 x and y, into a new feature space,
146:03 which involves a new variable z.
146:05 So, basically, what we're doing
146:07 is we're visualizing the data
146:08 on a three dimensional space.
146:11 So when you transform the 2D space into a 3D space,
146:15 you can clearly see a dividing margin
146:17 between the two classes of data.
146:19 You can clearly draw a line in the middle
146:21 that separates these two data sets.
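As a quick scikit-learn sketch of this, the make_circles data set below plays the role of the non-linearly separable picture: a linear kernel struggles, while the RBF kernel (one form of the kernel trick) separates it cleanly. The exact accuracies will vary a little with the random seed.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings of points: impossible to split with a single straight line.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)   # kernel trick: map to a higher dimension

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))
```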
146:24 So guys, this sums up the whole idea behind
146:27 support vector machines.
146:29 Support vector machines are very easy to understand.
146:32 Now, this was all for our
146:35 supervised learning algorithms.
146:37 Now, before I move on to unsupervised learning algorithms,
146:41 I'll be running a demo.
146:43 We'll be running a demo
146:45 in order to understand all the classification algorithms
146:47 that we studied so far.
146:49 Earlier in the session, we ran a demo
146:51 for the regression algorithms.
146:52 Now we'll run for the classification algorithms.
146:56 So, enough of theory.
146:58 Let's open up Python,
147:00 and let's start looking at how
147:02 these classification algorithms work.
147:06 Now, here what we'll be doing
147:07 is we'll implement multiple classification algorithms
147:10 by using scikit-learn.
147:12 Okay, it's one of the most popular
147:13 machine learning tool for Python.
147:16 Now we'll be using a simple data set
147:18 for the task of training a classifier to distinguish
147:22 between the different types of fruits.
147:24 The purpose of this demo is to implement
147:26 multiple classification algorithms
147:29 for the same set of problem.
147:30 So as usual, you start by importing
147:32 all your libraries in Python.
147:34 Again, guys, if you don't know Python,
147:36 check the description box,
147:37 I'll leave a link there.
147:39 You can go through that video as well.
147:41 Next, what we're doing is we're reading the fruit data
147:44 in the form of table.
147:46 We store it in a variable called fruits.
147:49 Now if you wanna see the first few rows of the data,
147:51 let's print the first few observations in our data set.
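For reference, the first few cells of this demo look roughly like the snippet below. I'm assuming the commonly used fruit_data_with_colors.txt file and its column names (fruit_label, fruit_name, fruit_subtype, mass, width, height, color_score); adjust the path and names to match your own copy.

```python
import pandas as pd

# Read the fruit data in as a table and store it in a variable called fruits.
fruits = pd.read_table('fruit_data_with_colors.txt')

print(fruits.head())                          # first few observations
print(fruits.shape)                           # (59, 7): 59 rows, 7 columns
print(fruits['fruit_name'].unique())          # apple, mandarin, orange, lemon
print(fruits.groupby('fruit_name').size())    # how many of each fruit
```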
148:14 So, this is our data set.
148:17 These are the fruit labels.
148:19 So we have around four fruits in our data set.
148:21 We have apple, we have mandarin,
148:23 orange, and lemon.
148:25 Okay.
148:25 Now, fruit label denotes nothing but the label
148:28 of apple, which is one.
148:30 Mandarin has two.
148:31 Similarly, orange is labeled as three.
148:32 And lemon is labeled as four.
148:35 Then a fruit subtype is basically
148:37 the family of fruit it belongs to.
148:39 Mass is the mass of the fruit,
148:41 width, height, and color score.
148:44 These are all our predictor variables.
148:46 We have to identify the type of fruit,
148:49 depending on these predictor variables.
148:52 So, first, we saw a couple of observations over here.
148:54 Next, if you want to see the shape of your data set,
148:58 this is what it looks like.
148:59 There are around 59 observations with
149:01 seven predictor variables,
149:03 which is one, two, three, four, five, six, and seven.
149:07 We have seven variables in total.
149:09 Sorry, not predictor variables.
149:11 This seven denotes both your predictor
149:13 and your target variable.
149:15 Next, I'm just showing you the four fruits that we have
149:18 in our data set,
149:19 which is apple, mandarin, orange, and lemon.
149:22 Next, I'm just grouping fruits by their names.
149:25 Okay.
149:26 So we have 19 apples in our data set.
149:29 We have 16 lemons.
149:31 We have only five mandarins,
149:33 and we have 19 oranges.
149:36 Even though the number of mandarin samples is low,
149:38 we'll have to work with it,
149:39 because right now I'm just trying to make you
149:42 understand the classification algorithms.
149:45 The main aim for me behind doing these demos
149:48 is so that you understand
149:49 how classification algorithms work.
149:51 Now what you can do is you can also plot a graph
149:54 in order to see the frequency of each of these fruits.
149:58 Okay, I'll show you what the plot looks like.
150:01 The number of apples and oranges is the same.
150:04 We have I think around 19 apples and oranges.
150:07 And similarly, this is the count for lemons.
150:10 Okay.
150:11 So this is a small visualization.
150:13 Guys, visualization is actually very important
150:15 when it comes to machine learning,
150:16 because you can see most of the relations
150:19 and correlations by plotting graphs.
150:22 You can't see those correlations
150:23 by just running code and all of that.
150:26 Only when you plot different variables on your graph,
150:28 you'll understand how they are related.
150:31 One of the main task in machine learning
150:33 is to visualize data.
150:35 It ensures that you understand
150:37 the correlation between data.
150:40 Next, what we're gonna do is we'll graph
150:41 something known as a box plot.
150:43 Okay, a box plot basically helps you understand
150:47 the distribution of your data.
150:50 Let me run the box plot,
150:52 and I'll show you what exactly I mean.
150:56 So this is our box plot.
150:58 So, box plot will basically give you
151:01 a clearer idea of the distribution
151:03 of your input variables.
151:05 It is mainly used in exploratory data analysis,
151:08 and it represents the distribution of the data
151:12 and its variability.
151:13 Now, the box plot contains
151:15 upper quartile and lower quartile.
151:17 So the box plot basically spanned your interquartile range
151:20 or something known as IQR.
151:22 IQR is nothing but your third quartile
151:25 subtracted from your first quartile.
151:28 Now again, this involves statistics and probability.
151:30 So I'll be leaving a link in the description box.
151:32 You can go through that video.
151:34 I've explained statistics probability, IQR,
151:37 range, and all of that in there.
151:39 So, one of the main reasons why box plots are used
151:42 is to detect any sort of outliers in the data.
151:46 Since the box plot spans the IQR,
151:49 it detects the data point
151:51 that lie outside the average range.
151:54 So if you see, in the color score,
151:56 most of the data is distributed around the IQR,
152:00 whereas here the data are not that well distributed.
152:02 Height also is not very well distributed,
152:05 but color score is pretty well distributed.
152:08 This is what the box plot shows you.
152:11 So guys, this involves a lot of math.
152:14 All of these, each and every function in machine learning
152:16 involves a lot of math.
152:18 So you know it's necessary to have a good understanding
152:20 of statistics, probability, and all of that.
152:23 Now, next, what we'll do is we'll plot a histogram.
152:26 Histogram will basically show you
152:28 the frequency of occurrence.
152:30 Let me just plot this, and then we'll try and understand.
152:38 So here you can understand a few correlations.
152:41 Okay, some pairs of these attributes are correlated.
152:44 For example, mass and width,
152:46 they're somehow correlated along the same ranges.
152:50 So this suggests a high correlation
152:52 and a predictable relationship.
152:54 Like if you look at the graphs, they're quite similar.
152:57 So for each of the predictor variables,
152:59 I've drawn a histogram.
153:01 For each of those input variables, we've drawn a histogram.
153:03 Now guys, again, like I said,
153:05 plotting graphs is very important
153:06 because you understand a lot of correlations
153:09 that you cannot understand
153:10 by just looking at your data,
153:12 or just running operations on your data,
153:15 or just running code on your data.
153:17 Okay.
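A minimal sketch of that histogram grid, again assuming the fruits DataFrame and the column names used above:

```python
import matplotlib.pyplot as plt

# Histogram of each predictor variable to see its frequency of occurrence.
fruits[['mass', 'width', 'height', 'color_score']].hist(bins=10, figsize=(8, 6))
plt.suptitle('Histograms of the predictor variables')
plt.show()
```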
153:18 Now, next, what we're doing here is we're just
153:20 dividing the data set into target and predictor variables.
153:25 So, basically, I've created an array of feature names
153:27 which has your predictor variables.
153:29 It has mass, width, height, color space.
153:32 And you have assigned that as X,
153:34 since this is your input,
153:36 and y is your output which is your fruit label.
153:38 That'll show whether it is an apple,
153:41 orange, lemon, and so on.
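In code, that split into predictors and target looks roughly like this (column names assumed as before):

```python
# Predictor variables (features) and target variable.
feature_names = ['mass', 'width', 'height', 'color_score']
X = fruits[feature_names]        # input data
y = fruits['fruit_label']        # output: which fruit it is
```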
153:44 Now, the next step that we'll perform over here
153:46 is pretty evident.
153:48 Again, this is data splicing.
153:50 So data splicing, by now,
153:52 I'm sure all of you know what it is.
153:53 It is splitting your data into training and testing data.
153:58 So that's what we've done over here.
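A sketch of the data splicing step with scikit-learn:

```python
from sklearn.model_selection import train_test_split

# Split into training and testing data (75/25 split by default).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```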
153:59 Next, we're importing something known as the MinMaxScaler.
154:03 Scaling or normalizing your data
154:05 is very important in machine learning.
154:07 Now, I'm saying this because your raw data
154:10 can be very biased.
154:12 So it's very important to normalize your data.
154:15 Now when I say normalize your data,
154:18 so if you look at the value of mass
154:21 and if you look at the value of height and color,
154:25 you see that mass is ranging in hundreds and double digits,
154:30 whereas height is in single digit,
154:32 and color score is not even in single digits.
154:34 So, if some of your variables have a very high range,
154:38 you know they have a very high scale,
154:40 like they're in two digits or three digits,
154:42 whereas other variables are single digits and lesser,
154:45 then your output is going to be very biased.
154:48 It's obvious that it's gonna be very biased.
154:50 That's why you have to scale your data
154:52 in such a way that all of these values
154:54 will have a similar range.
154:57 So that's exactly what the scaler function does.
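As a sketch, the scaling step could look like this; fitting on the training data only and reusing the same scaler on the test data is the usual convention.

```python
from sklearn.preprocessing import MinMaxScaler

# Bring every feature into the same 0-1 range so that large-valued columns
# like mass don't dominate small-valued ones like color score.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)   # learn min/max from training data
X_test = scaler.transform(X_test)         # apply the same scaling to test data
```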
155:01 Okay.
155:01 Now since we have already divided our data
155:04 into training and testing data,
155:06 our next step is to build the model.
155:09 So, first, we're gonna be using
155:10 the logistic regression algorithm.
155:12 I've already discussed logistic regression with you all.
155:15 It's a classification algorithm,
155:17 which is basically used to predict the outcome
155:19 of a categorical variable.
155:22 So we already have the logistic regression class in Python.
155:25 All you have to do is you have to
155:27 create an instance of this class,
155:29 which is logreg over here.
155:31 And I'm fitting this instance with a training data set,
155:34 meaning that I'm running the algorithm
155:36 with the training data set.
155:38 Once you do that, you can calculate
155:40 the accuracy by using this function.
155:44 So here I'm calculating the accuracy
155:46 on the training data set
155:47 and on the testing data set.
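Roughly, that logistic regression step looks like this (a sketch; hyperparameters are left at their defaults):

```python
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
logreg.fit(X_train, y_train)   # run the algorithm on the training data

# Accuracy on the training set and on the testing set.
print('Train accuracy: {:.2f}'.format(logreg.score(X_train, y_train)))
print('Test accuracy:  {:.2f}'.format(logreg.score(X_test, y_test)))
```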
155:50 Okay, so let's look at the output of this.
155:53 Now guys, ignore this future warning.
155:57 Warnings like this can be safely ignored in Python.
156:00 Now, accuracy of the logistic regression classifier
156:03 on the training data set is around 70%.
156:06 It was pretty good on the training data set.
156:08 But when it comes to classifying on the test data set,
156:11 it's only 40%,
156:13 which is not that good for a classifier.
156:16 Now again, this can depend on the problem statement,
156:19 that is, on whether logistic regression
156:20 is more suitable for that problem statement.
156:24 Next, we'll do the same thing using the decision tree.
156:26 So again, we just call the decision tree function,
156:29 and we'll fit it with the training data set,
156:32 and we'll calculate the accuracy
156:33 of the decision tree on the training,
156:35 and the testing data set.
156:37 So if you do that for a decision tree
156:39 on the training data set,
156:41 you get 100% accuracy.
156:44 But on the testing data set,
156:46 you have around 87% of accuracy.
156:49 This is something that I discussed with you all earlier,
156:52 that decision trees are very good
156:54 with the training data set
156:55 because of a process known as overfitting.
156:59 But when it comes to classifying
157:01 the outcome on the testing data set,
157:04 the accuracy reduces.
157:06 Now, this is very good compared to logistic regression.
157:09 For this problem statement, decision trees
157:11 work better than logistic regression.
157:14 Coming to KNN classifier.
157:15 Again, all you have to do is you have to
157:17 call the K neighbor classifier, this function.
157:21 And you have to fit this with the training data set.
157:25 If you calculate the accuracy for a KNN classifier,
157:28 we get a good accuracy actually.
157:31 On the training data set,
157:32 we get an accuracy of 95%.
157:35 And on the testing data set, it's 100%.
157:38 That is really good, because our testing data set
157:41 actually achieved a higher accuracy
157:42 than on the training data set.
157:45 Now all of this depends on the value of K
157:47 that you've chosen for KNN.
157:49 Now, I mentioned that you use the elbow method
157:53 to choose the K value in the K nearest neighbor.
157:55 I'll be discussing the elbow method in the next section.
157:59 So, don't worry if you haven't understood that yet.
158:03 Now, we're also using a naive Bayes classifier.
158:05 Here we're using a Gaussian naive Bayes classifier.
158:09 Gaussian is basically a type of naive Bayes classifier.
158:12 I'm not going to go into depth of this,
158:14 because it'll just make our session much longer.
158:18 Okay.
158:19 And if you want to know more about this,
158:21 I'll leave a link in the description box.
158:22 You can read all about the Gaussian naive Bayes classifier.
158:26 Now, the math behind this is the same.
158:28 It uses naive Bayes, it uses the Bayes Theorem itself.
158:32 Now again, we're gonna call this class,
158:34 and then we're going to run our data,
158:36 training data on it.
158:38 So using the naive Bayes classifier,
158:40 we're getting an accuracy of 0.86
158:43 on the training data set.
158:45 And on the testing data set, we're getting 67% accuracy.
158:50 Okay.
158:50 Now let's do the same thing with support vector machines.
158:53 Importing the support vector classifier.
158:57 And we are fitting the training data into the algorithm.
159:01 We're getting an accuracy of around 61%
159:05 on the training data set and 33% on the testing data set.
159:09 Now guys, this accuracy and all
159:11 depends also on the problem statement.
159:13 It depends on the type of data
159:15 that support vector machines get.
159:17 Usually, SVM is very good on large data sets.
159:20 Now since we have a very small data set over here,
159:22 it's sort of obvious why the accuracy is so low.
159:26 So guys, these were a couple of classification algorithms
159:30 that I showed you here.
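To put them side by side, here is a small sketch that fits each of those classifiers the same way and prints the train and test accuracy; it reuses the scaled X_train and X_test from above, and the default hyperparameters (for example k=5 for KNN) are only illustrative.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

models = {
    'Decision tree': DecisionTreeClassifier(),
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'Gaussian naive Bayes': GaussianNB(),
    'Support vector machine': SVC(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name,
          '- train: {:.2f}'.format(model.score(X_train, y_train)),
          ', test: {:.2f}'.format(model.score(X_test, y_test)))
```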
159:31 Now, because our KNN classifier
159:34 classified our data set most accurately,
159:37 we'll look at the predictions that the KNN classifier made.
159:40 Okay
159:41 Now we're storing all our predicted values
159:44 in the predict variable.
159:46 Now in order to show you the accuracy
159:48 of the KNN model,
159:49 we're going to use something known as the confusion matrix.
159:53 So, a confusion matrix is a table
159:55 that is often used to describe
159:58 the performance of a classification model.
160:01 So, confusion matrix actually
160:03 represents a tabular representation
160:05 of actual versus predicted values.
160:08 So when you draw a confusion matrix
160:10 on the actual versus predicted values
160:12 for the KNN classifier,
160:14 this is what the confusion matrix looks like.
160:18 Now, we have four rows over here.
160:21 If you see, we have four rows.
160:23 The first row represents apples,
160:25 second, mandarin, third represents lemons,
160:28 and fourth, oranges.
160:31 So this four value corresponds to zero comma zero,
160:35 meaning that it was correctly able to classify
160:38 all the four apples.
160:40 Okay.
160:41 This one value represents one comma one,
160:44 meaning that our classifier correctly classified
160:47 this as mandarins.
160:49 This matrix is drawn on actual values
160:51 versus predicted values.
160:53 Now, if you look at the summary of the confusion matrix,
160:55 we'll get something known as precision recall,
160:58 f1-score and support.
161:00 Precision is basically the ratio
161:02 of the correctly predicted positive observations
161:05 to the total predicted positive observations.
161:08 So the correctly predicted positive observations are four,
161:12 and there are total of four apples
161:14 in the testing data set.
161:16 So that's where I get a precision of one.
161:18 Okay.
161:19 Recall on the other hand
161:20 is the ratio of correctly predicted positive observations
161:24 to all the observations in the class.
161:26 Again, we've correctly classified four apples,
161:29 and there are a total of four apples.
161:32 F1-score is nothing but the harmonic mean, a weighted average,
161:34 of your precision and your recall.
161:37 Okay, and your support basically denotes
161:39 the number of actual data points of each class
161:41 in the testing data set.
161:43 So, in our KNN algorithm, since we got 100% accuracy,
161:47 all our data points were correctly classified.
161:50 So, 15 out of 15 were correctly classified
161:52 because we have 100% accuracy.
161:55 So that's how you read a confusion matrix.
161:58 Okay, you have four important measures,
162:00 precision, recall, f1-score, and support.
162:04 F1-score is just the harmonic mean
162:07 of your precision and your recall.
162:09 So precision is basically the correctly predicted
162:12 positive observations to the total predicted
162:14 positive observations.
162:16 Recall is the ratio of the correctly predicted
162:19 positive observations to all the observations in the actual class.
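A sketch of how the confusion matrix and that summary are produced (assuming the KNN model from before, with k=5 as an illustrative choice):

```python
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)                    # predicted labels for the test set

print(confusion_matrix(y_test, pred))         # actual vs predicted counts per class
print(classification_report(y_test, pred))    # precision, recall, f1-score, support
```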
162:24 So guys, that was it for the demo
162:26 of classification algorithms,
162:28 we discussed regression algorithms
162:30 and we discussed classification algorithms.
162:33 Now it's time to talk about unsupervised
162:34 learning algorithms.
162:36 Under unsupervised learning algorithms,
162:38 we mainly try to solve clustering problems.
162:40 And the most important clustering algorithm there is,
162:43 is known as K-means clustering.
162:45 So we're going to discuss the K-means algorithm,
162:47 and also show you a demo where we'll be executing
162:50 the clustering algorithm,
162:51 and you'll see how it's implemented to solve a problem.
162:55 Now, the main aim of the K-means algorithm
162:58 is to group similar elements or data points into a cluster.
163:02 So it is basically the process by which
163:04 objects are classified
163:05 into a predefined number of groups,
163:08 so that they are as dissimilar as possible
163:10 from one group to another group,
163:12 but as similar as possible within each group.
163:16 Now what I mean is let's say you're trying to cluster
163:19 this population into four different groups,
163:22 such that each group has people within
163:25 a specified range of age.
163:27 Let's say group one is of people between the age 18 and 22.
163:32 Similarly, group two is between 23 and 35.
163:36 Group three is 36 and 39 or something like that.
163:40 So let's say you're trying to cluster
163:41 people into different groups based on their age.
163:44 So for such problems, you can make use
163:46 of the K-means clustering algorithm.
163:48 One of the major applications of the clustering algorithm
163:51 is seen in targeted marketing.
163:53 I don't know how many of you are aware
163:55 of targeted marketing.
163:56 Targeted marketing is all about marketing a specific product
164:00 to a specific audience.
164:03 Let's say you're trying to sell fancy clothes
164:05 or a fancy set of bags and all of that.
164:08 And the perfect audience for such product
164:10 would be teenagers.
164:11 It would be people around the age of 16 to 21 or 18.
164:16 So that is what target marketing is all about.
164:19 Your product is marketed to a specific audience
164:22 that might be interested in it.
164:24 That is what targeted marketing is.
164:26 So K-means clustering is used majorly in targeted marketing.
164:30 A lot of eCommerce websites like Amazon, Flipkart, eBay.
164:33 All of these make use of clustering algorithms
164:36 in order to target the right audience.
164:38 Now let's see how the K-means clustering works.
164:41 Now the K in K-means denotes the number of clusters.
164:45 Let's say I give you a data set containing 20 points,
164:49 and you want to cluster this data set into four clusters.
164:53 That means your K will be equal to four.
164:56 So K basically stands for the number of clusters
164:59 in your data set,
165:00 or the number of clusters you want to form.
165:02 You start by defining the number K.
165:04 Now for each of these clusters,
165:06 you're going to choose a centroid.
165:08 So for every cluster,
165:09 and there are four clusters in our data set,
165:11 for each of these clusters,
165:12 you'll randomly select one of the data points
165:14 as a centroid.
165:16 Now what you'll do is you'll start
165:17 computing the distance from that centroid
165:20 to every other point in that cluster.
165:23 As you keep recomputing the centroid
165:25 and the distance between the centroid
165:27 and the other data points in that cluster,
165:29 your centroid keeps shifting,
165:31 because you're trying to get to the average of that cluster.
165:34 As you're trying to get to the average of the cluster,
165:37 the centroid keeps shifting
165:38 until it converges on that average.
165:42 Let's try to understand how K-means works.
165:45 Let's say that this data set is given to us.
165:48 Let's say you're given random points like these
165:51 and you're asked to use the K-means algorithm on this.
165:54 So your first step will be
165:55 to decide the number of clusters you want to create.
165:58 So let's say I wanna create three different clusters.
166:00 So my K value will be equal to three.
166:03 The next step will be to provide
166:04 centroids of all the clusters.
166:07 What you'll do is initially you'll randomly pick
166:10 three data points as your centroids
166:12 for your three different clusters.
166:14 So basically, this red denotes the centroid for one cluster.
166:18 Blue denotes a centroid for another cluster.
166:20 And this green dot denotes the centroid
166:22 for another cluster.
166:23 Now what happens in K-means,
166:26 the algorithm will calculate
166:27 the Euclidean distance of the points from each centroid
166:31 and assign the points to the closest cluster.
166:34 Now since we had three centroids here,
166:36 now what you're gonna do is
166:37 you're going to calculate the distance
166:39 from each and every data point
166:41 to all the centroids,
166:42 and you're going to check which data point
166:44 is closest to which centroid.
166:46 So let's say your data point A
166:48 is closest to the blue centroid.
166:50 So you're going to assign the data point A
166:52 to the blue cluster.
166:54 So based on the distance
166:55 between the centroids and the data points,
166:57 you're going to form three different clusters.
167:00 Now again, you're going to recalculate the centroids,
167:02 and you're going to form new,
167:04 better clusters,
167:06 because you're recomputing all those centroids.
167:09 Basically, your centroids represent
167:11 the mean of each of your cluster.
167:14 So you need to make sure
167:15 that your mean is actually the centroid of each cluster.
167:19 So you'll keep recomputing these centroids
167:22 until the position of your centroids does not change.
167:24 That means that your centroid is actually the mean
167:28 or the average of that particular cluster.
167:30 So that's how K-means works.
167:32 It's very simple.
167:33 All you have to do is you have to start
167:35 by defining the K value.
167:37 After that, you have to randomly
167:38 pick the K centroids.
167:40 Then you're going to calculate the distance
167:42 of each of the data points from the centroids,
167:45 and you're going to assign a data point
167:47 to the centroid it is closest to.
167:49 That's how K-means works.
167:51 It's a very simple process.
167:53 All you have to do is keep iterating,
167:55 and you have to recompute the centroid value
167:58 until the centroid value does not change,
168:01 until you get a constant centroid value.
168:03 Now guys, again, in K-means,
168:05 you make use of distance measures like Euclidean.
168:08 I've already discussed what Euclidean is all about.
168:11 So, to summarize how K-means works,
168:13 you start by picking the number of clusters.
168:16 Then you pick a centroid.
168:18 After that, you calculate the distance
168:19 of the objects to the centroid.
168:22 Then you group the data points into specific clusters
168:24 based on their distance.
168:26 You have to keep computing the centroid
168:29 until each data point is assigned to the closest cluster,
168:32 so that's how K-means works.
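As a minimal sketch of those steps with scikit-learn (the tiny one-dimensional age data here is made up purely for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up ages, just to illustrate clustering people into K groups.
ages = np.array([[18], [19], [21], [24], [30], [33], [36], [39], [45], [50]])

kmeans = KMeans(n_clusters=3, random_state=0)   # K = 3 clusters
labels = kmeans.fit_predict(ages)               # assign each point to its closest centroid

print(labels)                   # cluster index for each person
print(kmeans.cluster_centers_)  # final centroids, i.e. the mean of each cluster
```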
168:35 Now let's look at the elbow method.
168:37 The elbow method is basically used in order to find out
168:40 the most optimum k value for a particular problem.
168:43 So the elbow method is quite simple actually.
168:46 You start off by computing the sum of squared errors
168:50 for some values of K.
168:52 Now sum of squared error is basically
168:55 the sum of the squared distance
168:57 between each member of the cluster and its centroid.
169:01 So you basically calculate the sum of squared errors
169:04 for different values of K.
169:05 For example, you can consider K value
169:07 as two, four, six, eight, 10, 12.
169:10 Consider all these values,
169:12 compute the sum of squared errors for each of these values.
169:15 Now if you plot your K value
169:17 against your sum of squared errors,
169:20 you will see that the error decreases as K gets larger.
169:24 This is because the number of clusters increase.
169:27 If the number of clusters increases,
169:29 it means that the distortion gets smaller.
169:32 The distortion keeps decreasing
169:34 as the number of clusters increase.
169:36 That's because the more clusters you have,
169:38 the closer each centroid will be with its data points.
169:42 So as you keep increasing the number of clusters,
169:44 your distortion will also decrease.
169:46 So the idea of the elbow method is to choose the K
169:50 at which the distortion decreases abruptly.
169:53 So if you look at this graph at K equal to four,
169:56 the distortion is abruptly decreasing.
169:58 So this is how you find the value of K.
170:01 When your distortion drops abruptly,
170:03 that is the most optimal K value
170:05 you should be choosing for your problem statement.
170:08 So let me repeat the idea behind the elbow method.
170:11 You're just going to graph the number of clusters you have
170:15 versus the squared sum of errors.
170:17 This graph will basically give you the distortion.
170:20 Now the distortion is obviously going to decrease
170:22 if you increase the number of clusters,
170:25 and there is gonna be one point in this graph
170:28 wherein the distortion decreases very abruptly.
170:31 Now for that point, you need to find out the value of K,
170:34 and that'll be your most optimal K value.
170:37 That's how you choose your K-means K value
170:39 and your KNN K value as well.
170:42 So guys, this is how the elbow method is.
170:44 It's very simple and it can be easily implemented.
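A sketch of the elbow method on the same toy data: compute the sum of squared errors (scikit-learn exposes it as inertia_) for several K values and look for the bend in the curve.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

sse = []
k_values = range(1, 8)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=0).fit(ages)
    sse.append(km.inertia_)     # sum of squared distances to the closest centroid

plt.plot(k_values, sse, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('Sum of squared errors')
plt.show()
```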
170:47 Now we're gonna look at a small demo
170:49 which involves K-means.
170:51 This is actually a very interesting demo.
170:53 Now guys, one interesting application
170:55 of clustering is in color compression with images.
170:58 For example, imagine you have an image
171:01 with millions of colors in it.
171:03 In most images, a large number of colors will be unused,
171:06 and many of the pixels in the image
171:08 will have similar or even identical colors.
171:11 Now having too many colors in your image
171:13 makes it very hard for image processing and image analysis.
171:16 So this is one area where K-means is applied very often.
171:20 It's applied in image segmentation, image analysis,
171:23 image compression, and so on.
171:26 So what we're gonna do in this demo
171:27 is we are going to use an image
171:29 from the scikit-learn data set.
171:31 Okay, it is a prebuilt image,
171:33 and you will need to install the pillow package for this.
171:37 We're going to use an image
171:38 from the scikit-learn data set module.
171:40 So we'll begin by importing the libraries as usual,
171:44 and we'll be loading our image as china.
171:47 The image is china.jpg,
171:48 and we'll be loading this in a variable called china.
171:52 So if you wanna look at the shape of our image,
171:54 you can run this command.
171:56 So we're gonna get a three-dimensional value.
171:59 So we're getting 427 comma 640 comma three.
172:03 Now this is basically a three-dimensional array
172:06 of size height by width by RGB.
172:08 It contains red, green, and blue contributions
172:11 as integers from zero to 255.
172:13 So, your pixel values range between zero and 255,
172:17 and I think zero stands for your black,
172:20 and 255 represents white if I'm not wrong.
172:23 And basically, that's what this array shape denotes.
172:27 Now one way we can view this set of pixels
172:30 is as a cloud of points in a three dimensional color space.
172:34 So what we'll do is we will reshape the data
172:36 and rescale the color,
172:37 so that they lie between zero and one.
172:40 So the output of this will be a two dimensional array now.
172:43 So basically, we can visualize these pixels
172:46 in this color space.
172:47 Now what we're gonna do is we're gonna try
172:49 and plot our pixels.
172:50 We have a really huge data set
172:53 which contains around 16 million possible colors.
172:56 So this denotes a very, very large data set.
172:59 So, let me show you what it looks like.
173:01 We have red against green and red against blue.
173:05 These are our RGB value,
173:06 and we can have around 16 million possible
173:09 combination of colors.
173:10 The data set is way too large for us to compute.
173:13 So what we'll do is we will reduce these 16 million colors
173:16 to just 16 colors.
173:18 We can do that by using K-means clustering,
173:21 because we can cluster similar colors into similar groups.
173:25 So this is exactly where we'll be importing K-means.
173:28 Now, one thing to note here is
173:30 because we're dealing with a very large data set,
173:33 we will use the MiniBatchKMeans.
173:35 This operates on subsets of the data
173:37 to compute the result more quickly,
173:41 with results very close to the standard K-means algorithm,
173:43 because I told you this data set is really huge.
173:46 Even though this is a single image,
173:47 the number of pixel combinations can come up to 16 million,
173:51 which is a lot.
173:52 Now each pixel is considered as a data point
173:55 when you've taken image into consideration.
173:57 When you have data points and data values,
174:00 that's different.
174:01 When you're studying an image for image classification
174:03 or image segmentation,
174:05 each and every pixel is considered.
174:07 So, basically, you're building matrices
174:09 of all of these pixel values.
174:11 So having 16 million pixels is a very huge data set.
174:15 So, for that reason, we'll be using the MinibatchKMeans.
174:19 It's very similar to K-means.
174:20 The only difference is that it'll operate
174:22 on subsets of the data.
174:24 Because the data set is too huge, it'll operate on subsets.
174:27 So, basically, we're making use of K-means
174:29 in order to cluster these 16 million
174:32 color combinations into just 16 colors.
174:35 So basically, we're gonna form 16 clusters
174:38 in this data set.
174:39 Now, the result is the recoloring of the original pixel
174:43 where every pixel is assigned the color
174:46 of its closest cluster center.
174:48 Let's say that there are a couple of colors
174:50 which are very close to green.
174:52 So we're going to cluster all of these similar colors
174:55 into one cluster.
174:56 We'll keep doing this until we get 16 clusters.
175:00 So, obviously, to do this, we'll be using
175:01 the clustering method, K-means.
175:03 Let me show you what the output looks like.
175:06 So, basically, this was the original image
175:08 from the scikit data set,
175:10 and this is the 16-color segmented image.
175:14 Basically, we have only 16 colors here.
175:16 Here we can have around 16 million colors.
175:19 Here there are only 16 colors.
175:20 If you count them, you can see only a few particular colors.
175:23 Now obviously there's a lot of distortion over here,
175:25 but this is how you study an image.
175:27 You remove all the extra contrast that is there in an image.
175:30 You try to reduce the pixels
175:33 to as small a set of data as possible.
175:35 The more varied pixels you have,
175:37 the harder it is going to be for you
175:39 to study the image for analysis.
175:41 Now, obviously, there are some details
175:43 which are lost in this.
175:44 But overall, the image is still recognizable.
175:47 So here, basically, we've compressed this
175:49 with a compression factor of around one million,
175:52 because each cluster will have around
175:53 one million data points in it,
175:55 or pixel values in it, or pixels in it.
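Here's roughly how that demo goes end to end; it's a sketch that uses the china.jpg sample image shipped with scikit-learn (the pillow package must be installed) and 16 clusters, as described above.

```python
from sklearn.datasets import load_sample_image
from sklearn.cluster import MiniBatchKMeans
import matplotlib.pyplot as plt

china = load_sample_image('china.jpg')          # shape (427, 640, 3), values 0-255
data = (china / 255.0).reshape(-1, 3)           # rescale to 0-1 and flatten to (n_pixels, 3)

# Cluster the millions of pixel colors down to 16 representative colors.
kmeans = MiniBatchKMeans(n_clusters=16, random_state=0).fit(data)
new_colors = kmeans.cluster_centers_[kmeans.predict(data)]

# Recolor every pixel with the color of its closest cluster center.
china_recolored = new_colors.reshape(china.shape)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(china); ax[0].set_title('Original image')
ax[1].imshow(china_recolored); ax[1].set_title('16-color image')
plt.show()
```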
175:57 Now this is an interesting application of K-means.
176:00 There are actually better ways
176:02 you can compress information in an image.
176:04 So, basically, I showed you this example
176:06 because I want you to understand
176:07 the power of K-means algorithm.
176:09 You can cluster a data set that is this huge
176:12 into just 16 colors.
176:14 Initially, there were 16 million,
176:15 and now you can cluster it to 16 colors.
176:18 So guys, K-means plays a very huge role
176:20 in computer vision image processing,
176:22 object detection, and so on.
176:24 It's a very important algorithm
176:26 when it comes to detecting objects.
176:28 So self-driving cars and the like
176:30 can make use of such algorithms.
176:33 So guys, that was all about unsupervised learning
176:36 and supervised learning.
176:37 Now it's the last type of machine learning,
176:39 which is reinforcement learning.
176:41 Now this is actually
176:42 a very interesting part of machine learning,
176:44 and it is quite different from supervised and unsupervised learning.
176:48 So we'll be discussing all the concepts
176:50 that are involved in reinforcement learning.
176:52 And also reinforcement learning
176:54 is a little more advanced.
176:56 When I say advanced, I mean that it's been used
176:58 in applications such as self-driving cars
177:01 and is also a part of
177:02 a lot of deep learning applications,
177:04 such as AlphaGo and so on.
177:06 So, reinforcement learning
177:07 has a different concept to it altogether.
177:10 So we'll be discussing all the concepts under it.
177:13 So just to brush up your information
177:16 about reinforcement learning,
177:17 reinforcement learning is a part of machine learning
177:20 where an agent is put in an unknown environment,
177:24 and it learns how to behave in this environment
177:26 by performing certain actions and observing the rewards
177:30 which it gets from these actions.
177:32 Reinforcement learning is all about taking
177:34 an appropriate action in order to maximize the reward
177:37 in a particular situation.
177:39 Now let's understand reinforcement learning
177:41 with an analogy.
177:42 Let's consider a scenario
177:44 wherein a baby is learning how to walk.
177:46 This scenario can go about in two different ways.
177:49 The first is the baby starts walking
177:51 and it makes it to the candy.
177:53 And since the candy is the end goal,
177:56 the baby is very happy and it's positive.
177:59 Meaning, the baby is happy
178:00 and it received a positive reward.
178:02 Now, the second way this can go in
178:04 is that the baby starts walking,
178:07 but it falls due to some hurdle in between.
178:10 That's really cute.
178:11 So the baby gets hurt and it doesn't get to the candy.
178:14 It's negative because the baby is sad
178:16 and it receives a negative reward.
178:19 So just like how we humans learn from our mistakes
178:21 by trial and error,
178:23 reinforcement learning is also similar.
178:26 Here we have an agent,
178:27 and in this case, the agent is the baby,
178:29 and the reward is the candy
178:31 with many hurdles in between.
178:33 The agent is supposed to find
178:34 the best possible path to reach the reward.
178:37 That is the main goal of reinforcement learning.
178:39 Now the reinforcement learning process
178:42 has two important components.
178:44 It has something known as an agent
178:46 and something known as an environment.
178:48 Now the environment is
178:49 the setting that the agent is acting on,
178:52 and the agent represents the
178:54 reinforcement learning algorithm.
178:56 The whole reinforcement learning algorithm is basically the agent.
179:00 The environment is the setting
179:02 in which you place the agent,
179:03 and it is the setting
179:05 wherein the agent takes various action.
179:07 The reinforcement learning process
179:09 starts when the environment sends a state to the agent.
179:13 Now the agent, based on the observations it makes,
179:16 it takes an action in response to that state.
179:19 Now, in turn, the environment will send the next state
179:23 and the respective reward back to the agent.
179:25 Now the agent will update its knowledge
179:27 with the reward returned by the environment
179:29 to evaluate its last actions.
179:32 The loop continues until the environment
179:34 sends a terminal state
179:35 which means that the agent has accomplished
179:37 all of its task.
179:38 To understand this better,
179:40 let's suppose that our agent is playing Counter Strike.
179:44 The reinforcement learning process
179:46 can be broken down into a couple of steps.
179:48 The first step is the reinforcement learning agent,
179:51 which is basically the player,
179:53 he collects a state, S naught, from the environment.
179:56 So whenever you're playing Counter Strike,
179:58 you start off with stage zero or stage one.
180:01 You start off from the first level.
180:03 Now based on this state, S naught,
180:06 the reinforcement learning agent
180:07 will take an action, A naught.
180:10 So guys, action can be anything that causes a result.
180:13 Now if the agent moves left or right in the game,
180:16 that is also considered as an action.
180:18 So initially, the action will be random,
180:19 because the agent has no clue about the environment.
180:22 Let's suppose that you're playing Counter Strike
180:24 for the first time.
180:25 You have no idea about how to play it,
180:27 so you'll just start randomly.
180:29 You'll just go with whatever,
180:30 whichever action you think is right.
180:32 Now the environment is in stage one.
180:35 After passing stage zero,
180:37 the environment will go into stage one.
180:39 Once the environment updates the stage to stage one,
180:42 the reinforcement learning agent
180:44 will get a reward R one from the environment.
180:47 This reward can be anything like additional points
180:50 or you'll get additional weapons
180:51 when you're playing Counter Strike.
180:53 Now this reinforcement learning loop will go on
180:56 until the agent is dead or reaches the destination,
181:00 and it continuously outputs
181:01 a sequence of state action and rewards.
181:04 This is exactly how reinforcement learning works.
181:06 It starts with the agent being put in an environment,
181:10 and the agent will randomly take
181:11 some action in state zero.
181:13 After taking an action, depending on his action,
181:16 he'll either get a reward
181:17 and move on to state number one,
181:19 or he will die and go back to the same state.
181:22 So this will keep happening
181:24 until the agent reaches the last stage,
181:26 or he dies or reaches his destination.
181:30 That's exactly how reinforcement learning works.
181:32 Now reinforcement learning is the logic
181:34 behind a lot of games these days.
181:36 It's being implemented in various games, such as Dota.
181:39 A lot of you who play Dota might know this.
181:41 Now let's talk about a couple of
181:43 reinforcement learning definitions or terminologies.
181:46 So, first, we have something known as the agent.
181:48 Like I mentioned, an agent
181:50 is the reinforcement learning algorithm
181:52 that learns from trial and error.
181:54 An agent is the one that takes actions.
181:57 For example, a soldier in Counter Strike
181:59 navigating through the game,
182:01 going right, left, and all of that,
182:02 is the agent taking some action.
182:04 The environment is the world
182:07 through which the agent moves.
182:09 Now the environment, basically,
182:10 takes the agent's current state and action as input,
182:14 and returns the agent's reward
182:16 and its next state as the output.
182:19 Next, we have something known as action.
182:21 All the possible steps that an agent can take
182:23 is considered as an action.
182:25 Next, we have something known as state.
182:27 Now the current condition returned by the environment
182:30 is known as a state.
182:31 Reward is an instant return from the environment
182:35 to appraise the last action
182:37 of the reinforcement learning agent.
182:39 All of these terms are pretty understandable.
182:41 Next, we have something known as policy.
182:43 Now, policy is the approach that the agent uses
182:46 to determine the next action
182:48 based on the current state.
182:50 Policy is basically the approach
182:52 with which you go around in the environment.
182:55 Next, we have something known as value.
182:57 Now, the expected long-term return with a discount,
183:00 as opposed to the short-term rewards R,
183:03 is known as value.
183:04 Now, terms like discount and value,
183:06 I'll be discussing in the upcoming slides.
183:08 Action-value is also very similar to the value,
183:11 except it takes an extra parameter
183:12 known as the current action.
183:14 Don't worry about action and Q value.
183:16 We'll talk about all of this in the upcoming slides.
183:19 So make yourself familiar with these terms,
183:21 because we'll be seeing a whole lot of them this session.
183:24 So, before we move any further,
183:26 let's discuss a couple of more
183:28 reinforcement learning concepts.
183:30 Now we have something known as the reward maximization.
183:34 So if you haven't realized it already,
183:36 the basic aim of the reinforcement learning agent
183:39 is to maximize the reward.
183:41 How does this happen?
183:42 Let's try to understand this in a little more detail.
183:46 So, basically the agent works based on the theory
183:49 of reward maximization.
183:50 Now that's exactly why the agent must be trained
183:53 in such a way that he takes the best action,
183:56 so that the reward is maximal.
183:58 Now let me explain a reward maximization
184:00 with a small example.
184:02 Now in this figure, you can see there is a fox,
184:04 there is some meat, and there is a tiger.
184:07 Our reinforcement learning agent is the fox.
184:10 His end goal is to eat the maximum amount of meat
184:13 before being eaten by the tiger.
184:15 Now because the fox is a very clever guy,
184:17 he eats the meat that is closer to him,
184:21 rather than the meat which is close to the tiger,
184:24 because the closer he gets to the tiger,
184:26 the higher are his chances of getting killed.
184:28 That's pretty obvious.
184:30 Even if the rewards near the tiger are bigger meat chunks,
184:33 they'll be discounted.
184:34 This is exactly what discount is.
184:36 We just discussed it in the previous slide.
184:38 This is done because of the uncertainty factor
184:41 that the tiger might actually kill the fox.
184:44 Now the next thing to understand is how discounting
184:47 of a reward works.
184:48 Now, in order to understand discounting,
184:51 we define a discount rate called gamma.
184:54 The value of gamma is between zero and one.
184:58 And the smaller the gamma,
184:59 the larger the discount and so on.
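Concretely, with a discount rate gamma, a reward received t steps in the future is weighted by gamma to the power t; here is a tiny illustrative calculation with made-up reward numbers.

```python
# Made-up sequence of future rewards and a discount rate gamma.
rewards = [10, 10, 10, 10]
gamma = 0.5   # a small gamma means heavy discounting of later rewards

# Discounted return: r0 + gamma*r1 + gamma^2*r2 + ...
discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))
print(discounted_return)   # 18.75, instead of the undiscounted total of 40
```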
185:02 Now don't worry about these concepts,
185:04 gamma and all of that.
185:05 We'll be seeing that in our practical demo today.
185:07 So let's move on and discuss another concept
185:10 known as exploration and exploitation trade-off.
185:13 Now guys, before that, I hope all of you understood
185:15 reward maximization.
185:17 Basically, the main aim behind reinforcement learning
185:20 is to maximize the rewards that an agent can get.
185:23 Now, one of the most important concepts
185:25 in reinforcement learning is
185:27 the exploration and exploitation trade-off.
185:29 Now, exploration, like the name suggests,
185:32 it's about exploring and capturing
185:34 more information about an environment.
185:37 On the other hand, exploitation is about
185:39 using the already known exploited information
185:42 to heighten your reward.
185:44 Now consider the same example
185:45 that we saw previously.
185:47 So here the fox eats only the meat chunks
185:50 which are close to him.
185:51 He doesn't eat the bigger meat chunks
185:53 which are at the top,
185:54 even though the bigger meat chunks
185:56 would get him more reward.
185:58 So if the fox only focuses on the closest reward,
186:01 he will never reach the big chunks of meat.
186:03 This process is known as exploitation.
186:06 But if the fox decides to explore a bit,
186:09 it can find the bigger reward,
186:11 which is the big chunk of meat.
186:13 This is known as exploration.
186:15 So this is the difference between exploitation
186:17 and exploration.
186:18 It's always best if the agent explores the environment,
186:21 tries to figure out a way in which it can get
186:24 the maximum number of rewards.
186:26 Now let's discuss another important concept
186:29 in reinforcement learning,
186:30 which is known as the Markov's decision process.
186:33 Basically, the mathematical approach for mapping
186:36 a solution in reinforcement learning
186:38 is called Markov's decision process.
186:40 It's the mathematics behind reinforcement learning.
186:43 Now, in a way, the purpose of reinforcement learning
186:46 is to solve a Markov's decision process.
186:49 Now in order to get a solution,
186:51 there are a set of parameters
186:53 in a Markov's decision process.
186:54 There's a set of actions A,
186:56 there's a set of states S,
186:58 a reward R, policy pi, and value V.
187:01 Also, this image represents
187:03 how a reinforcement learning works.
187:06 There's an agent.
187:07 The agent take some action on the environment.
187:10 The environment, in turn, will reward the agent,
187:13 and it will give him the next state.
187:15 That's how reinforcement learning works
187:17 so to sum everything up,
187:18 what happens in Markov's decision process
187:20 and reinforcement learning is
187:22 the agent has to take an action A
187:24 to transition from the start state
187:26 to the end state S.
187:28 While doing so, the agent will receive some reward R
187:32 for each action he takes.
187:34 Now the series of action that are taken by the agent
187:37 define the policy and the rewards collected
187:39 to find the value.
187:41 The main goal here is to maximize the rewards
187:44 by choosing the optimum policy.
187:46 So you're gonna choose the best possible approach
187:49 in order to maximize the rewards.
187:52 That's the main aim of Markov's decision process.
187:55 To understand Markov's decision process,
187:57 let's look at a small example.
187:59 I'm sure all of you already know
188:01 about the shortest path problem.
188:03 We all had such problems and concepts in math
188:07 to find the shortest path.
188:08 Now consider this representation over here, this figure.
188:12 Here, our goal is to find the shortest path
188:15 between two nodes.
188:15 Let's say we're trying to find
188:17 the shortest path between node A and node D.
188:20 Now each edge, as you can see,
188:22 has a number linked with it.
188:24 This number denotes the cost to traverse
188:27 through that edge.
188:29 So we need to choose a policy to travel from A to D
188:32 in such a way that our cost is minimum.
188:35 So in this problem, the set of states are
188:37 denoted by the nodes A, B, C, D.
188:40 The action is to traverse from one node to the other.
188:43 For example, if you're going from A to C,
188:45 that is an action.
188:46 C to B is an action.
188:48 B to D is another action.
188:49 The reward is the cost represented by each edge.
188:53 Policy is the path taken to reach the destination.
188:56 So we need to make sure that we choose a policy
188:58 in such a way that our cost is minimal.
189:02 So what you can do is you can start off at node A,
189:04 and you can take baby steps to reach your destination.
189:07 Initially, only the next possible node is visible to you.
189:10 So from A, you can either go to B
189:12 or you can go to C.
189:13 So you follow the greedy approach
189:15 and take the most optimum step,
189:17 which is choosing A to C,
189:19 instead of choosing A to B to C.
189:21 Now you're at node C
189:22 and you want to traverse to node D.
189:25 Again, you must choose your path very wisely.
189:27 So if you traverse from A to C,
189:29 and C to B, and B to D,
189:32 your cost is the least.
189:34 But if you traverse from A to C to D,
189:36 your cost will actually increase.
189:39 Now you need to choose a policy
189:40 that will minimize your cost over here.
189:43 So let's say, for example, the agent
189:45 chose A to C to D.
189:47 It came to node C, and then it directly chose D.
189:50 Now the policy followed by our agent in this problem
189:54 is exploitation type,
189:56 because we didn't explore the other nodes.
189:58 We just selected three nodes and we traversed through them.
190:01 And the policy we followed is not actually
190:03 an optimal policy.
190:04 We must always explore more
190:06 to find out the optimal policy.
190:08 Even if the other nodes are not giving us any more reward
190:11 or are actually increasing our cost,
190:14 we still have to explore and find out
190:16 if those paths are actually better,
190:18 if that policy is actually better.
190:20 The method that we implemented here
190:22 is known as the policy-based learning.
190:25 Now the aim here is to find the best policy
190:27 among all the possible policies.
190:29 So guys, apart from policy-based,
190:31 we also have value-based approach
190:33 and action-based approach.
190:34 Value-based emphasizes maximizing the rewards.
190:38 And in action-based, we emphasize
190:40 each action taken by the agent.
190:43 Now a point to note is that
190:44 all of these learning approaches have a simple end goal.
190:48 The end goal is to effectively guide the agent
190:51 through the environment,
190:52 and acquire the most number of rewards.
190:55 So this was a very simple way to understand
190:57 Markov's decision process,
190:59 exploitation and exploration trade-off,
191:01 and we also discussed
191:02 the different reinforcement learning definitions.
191:06 I hope all of this was understandable.
191:08 Now let's move on and understand
191:11 an algorithm known as Q-learning algorithm.
191:14 So guys, Q-learning is one of the most important algorithms
191:17 in reinforcement learning.
191:19 And we'll discuss this algorithm
191:20 with the help of a small example.
191:22 We'll study this example,
191:23 and then we'll implement the same example using Python,
191:26 and we'll see how it works.
191:28 So this is how our demonstration looks for now.
191:31 Now the problem statement is to place an agent
191:34 in any one of the rooms numbered
191:36 zero, one, two, three, and four.
191:38 And the goal is for the agent
191:40 to reach outside the building,
191:42 which is room number five.
191:43 So, basically, this zero, one, two, three, four
191:46 represents the building,
191:47 and five represents a room which is outside the building.
191:51 Now all these rooms are connected by doors.
191:54 Now these gaps that you see between the rooms
191:56 are basically the doors,
191:57 and each room is numbered from zero to four.
192:00 The outside of the building
192:01 can be thought of as a big room
192:03 which is room number five.
192:05 Now if you've noticed this diagram,
192:06 the door number one and door number four
192:09 lead directly to room number five.
192:12 From one, you can directly go to five,
192:13 and from four, also, you can directly go to five.
192:16 But if you want to go to five from room number two,
192:19 then you'll first have to go to room number three,
192:21 room number one, and then room number five.
192:24 So these are indirect links.
192:25 Direct links are from room number one and room number four.
192:29 So I hope all of you are clear with the problem statement.
192:32 You're basically going to have
192:33 a reinforcement learning agent,
192:35 and that agent has to traverse through all the rooms
192:37 in such a way that he reaches room number five.
192:40 To solve this problem,
192:41 first, what we'll do is we'll represent the rooms
192:44 on a graph.
192:45 Now each room is denoted as a node,
192:48 and the links that are connecting these nodes are the doors.
192:51 Alright, so we have nodes zero to five,
192:54 and the links between each of these nodes
192:56 represent the doors.
192:58 So, for example, if you look at this graph over here,
193:01 you can see that there is a direct connection
193:03 from one to five,
193:04 meaning that you can directly go from room number one
193:07 to your goal, which is room number five.
193:09 So if you want to go from room number three to five,
193:12 you can either go to room number one,
193:13 and then go to five,
193:15 or you can go from room number three to four,
193:17 and then to five.
193:19 So guys, remember, end goal is to reach room number five.
193:22 Now to set the room number five as the goal state,
193:25 what we'll do is we'll associate a reward value
193:28 to each door.
193:30 The doors that lead immediately to the goal
193:32 will have an instant reward of 100.
193:35 So, basically, one to five will have a reward of hundred,
193:38 and four to five will also have a reward of hundred.
193:41 Now other doors that are not directly
193:43 connected to the target room
193:44 will have a zero reward,
193:46 because they do not directly lead us to that goal.
193:48 So let's say you placed the agent in room number three.
193:51 So to go from room number three to one,
193:54 the agent will get a reward of zero.
193:56 And to go from one to five,
193:57 the agent will get a reward of hundred.
194:00 Now because the doors are two-way,
194:02 the two arrows are assigned to each room.
194:04 You can see an arrow going towards the room
194:06 and one coming from the room.
194:08 So each arrow contains an instant reward
194:10 as shown in this figure.
194:12 Now of course room number five
194:14 will loop back to itself with a reward of hundred,
194:16 and all other direct connections to the goal room
194:19 will carry a reward of hundred.
194:21 Now in Q-learning, the goal is to reach the state
194:24 with the highest reward.
194:25 So that if the agent arrives at the goal,
194:28 it will remain there forever.
194:30 So I hope all of you are clear with this diagram.
194:32 Now, the terminologies in Q-learning
194:35 include two terms, state and action.
194:37 Okay, your room basically represents the state.
194:41 So if you're in state two,
194:42 it basically means that you're in room number two.
194:45 Now the action is basically the movement of the agent
194:48 from one room to the other room.
194:50 Let's say you're going from room number two
194:52 to room number three.
194:53 That is basically an action.
194:55 Now let's consider one more example.
194:58 Let's say you place the agent in room number two
195:00 and he has to get to the goal.
195:02 So your initial state will be state number two
195:04 or room number two.
195:06 Then from room number two, you'll go to room number three,
195:08 which is state three.
195:10 Then from state three, you can either go back to state two
195:13 or go to state one or state four.
195:15 If you go to state four, from there you can directly go to
195:18 your goal room, which is five.
195:20 This is how the agent is going to traverse.
195:23 Now in order to depict the rewards that you're going to get,
195:25 we're going to create a matrix known as the reward matrix.
195:29 Okay, this is represented by R
195:31 or also known as the R matrix.
195:33 Now the minus one in this table represents null values.
195:38 That is, wherever there isn't a link
195:40 between the nodes, it is represented as minus one.
195:43 Now there is no link between zero and zero.
195:45 That's why it's minus one.
195:47 Now if you look at this diagram,
195:49 there is no direct link from zero to one.
195:51 That's why I've put minus one over here as well.
195:53 But if you look at zero comma four,
195:56 we have a value of zero over here,
195:58 which means that you can traverse from zero to four,
196:01 but your reward is going to be zero,
196:02 because four is not your goal state.
196:05 However, if you look at the matrix,
196:07 look at one comma five.
196:09 In one comma five, we have a reward value of hundred.
196:12 This is because you can directly go
196:14 from room number one to five,
196:16 and five is the end goal.
196:18 That's why we've assigned a reward of hundred.
196:21 Similarly, for four comma five,
196:23 we have a reward of hundred.
196:24 And for five comma five,
196:25 we have a reward of hundred.
196:27 Zeroes basically represent other links,
196:30 but they are zero because they do not lead to the end goal.
196:34 So I hope you all understood the reward matrix.
196:36 It's very simple.
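Written out as an array (a sketch consistent with the links described: minus one where there is no door, zero for doors that don't reach the goal, and hundred for doors into room five), the reward matrix would look roughly like this:

```python
import numpy as np

# Rows = current state (room 0-5), columns = action (move to room 0-5).
R = np.array([[-1, -1, -1, -1,  0, -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1, -1],
              [-1,  0,  0, -1,  0, -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
```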
196:37 Now before we move any further,
196:39 we'll be creating another matrix
196:41 known as the Q matrix.
196:43 Now the Q matrix basically represents the memory
196:46 of what the agent has learned through experience.
196:49 The rows of the Q matrix
196:50 will represent the current state of the agent.
196:53 The columns will represent the next possible actions
196:56 leading to the next state,
196:57 and the formula to calculate the Q matrix is this:
197:00 Q(state, action) = R(state, action) + Gamma * max[Q(next state, all actions)].
197:01 Here we have Q(state, action),
197:04 R(state, action),
197:05 which is nothing but the reward matrix.
197:07 Then we have a parameter known as the Gamma parameter,
197:10 which I'll explain shortly.
197:12 And then we are multiplying that Gamma with the maximum
197:14 of Q(next state, all actions).
197:17 Now don't worry if you haven't understood this formula.
197:19 I'll explain this with a small example.
197:21 For now, let's understand what a Gamma parameter is.
197:24 So, basically, the value of Gamma
197:26 will be between zero and one.
197:28 If Gamma is closer to zero,
197:30 it means that the agent will tend to consider
197:32 only immediate rewards.
197:34 Now, if the Gamma is closer to one,
197:36 it means that the agent will consider future rewards
197:39 with greater weight.
197:41 Now what exactly I'm trying to say is
197:43 if Gamma is closer to one,
197:45 then we'll be performing something known as exploration.
197:48 I hope you all remember what exploitation
197:50 and exploration trade-off is.
197:53 So, if your gamma is closer to zero,
197:55 it means that the agent
197:56 is not going to explore the environment.
197:58 Instead, it'll just choose a couple of states,
198:01 and it'll just traverse through those states.
198:03 But if your gamma parameter is closer to one,
198:06 it means that the agent will traverse
198:08 through all possible states,
198:09 meaning that it'll perform exploration,
198:12 not exploitation.
198:14 So the closer your gamma parameter is to one,
198:17 the more your agent will explore.
198:19 This is exactly what Gamma parameter is.
198:22 If you want to get the best policy,
198:23 it's always practical that you choose a Gamma parameter
198:27 which is closer to one.
198:29 We want the agent to explore the environment
198:32 as much as possible
198:33 so that it can get the best policy and the maximum rewards.
198:37 I hope this is clear.
198:39 Now let me just tell you
198:40 what a Q-learning algorithm is step by step.
198:42 So you begin the Q-learning algorithm
198:44 by setting the Gamma parameter
198:46 and the environment rewards in matrix R.
198:49 Okay, so, first, you'll have to set these two values.
198:51 We've already calculated the reward matrix.
198:54 We need to set the Gamma parameter.
198:56 Next, you'll initialize the matrix Q to zero.
198:59 Now why do you do this?
199:01 Now, if you remember, I said that
199:03 Q matrix is basically the memory of the agent.
199:06 Initially, obviously,
199:07 the agent has no memory of the environment.
199:10 It's new to the environment
199:11 and you're placing it randomly anywhere.
199:14 So it has zero memory.
199:15 That's why you initialize the matrix Q to zero.
199:18 After that, you'll select a random initial state,
199:21 and you place your agent in that initial state.
199:24 Then you'll set this initial state as your current state.
199:27 Now from the current state, you'll select some action
199:30 that will lead you to the next state.
199:33 Then you'll basically get the maximum Q value
199:35 for this next state,
199:36 based on all the possible actions that we take.
199:39 Then you'll keep computing the Q value
199:42 until you reach the goal state.
199:44 Now that might be a little bit confusing,
199:45 so let's look at this entire thing with a small example.
199:49 Let's say that first, you're gonna begin
199:51 with setting your Gamma parameter.
199:53 So I'm setting my Gamma parameter to 0.8
199:55 which is pretty close to one.
199:57 This means that our agent will explore the environment
200:00 as much as possible.
200:01 And also, I'm setting the initial state as room one.
200:05 Meaning, I'm in state one or I'm in room one.
200:08 So basically, your agent is going to be in room number one.
200:11 The next step is to initialize the Q matrix as zero matrix.
200:15 So this is a Q matrix.
200:16 You can see that everything is set to zero,
200:19 because the agent has no memory at all.
200:20 He hasn't traversed to any node,
200:22 so he has no memory.
200:24 Now since the agent is in room one
200:26 he can either go to room number three
200:28 or he can go to room number five.
200:31 Let's randomly select room number five.
200:33 So, from room number five,
200:35 you're going to calculate the maximum Q value
200:38 for the next state based on all possible actions.
200:42 So all the possible actions from room number five
200:44 are one, four, and five.
200:47 So, basically, you're traversing from one to five.
200:50 That's why I put one comma five over here,
200:52 as state comma action.
200:54 Your reward matrix will have R one comma five.
200:57 Now R one comma five is basically hundred.
201:00 That's why I put hundred over here.
201:02 Now your Gamma parameter is 0.8.
201:05 So, guys, what I'm doing here
201:06 is I'm just substituting the values in this formula.
201:08 So let me just repeat this whole thing.
201:10 Q state comma action.
201:12 So you're in state number one, correct?
201:14 And your action is you're going to room number five.
201:17 So your Q state comma action is one comma five.
201:20 Again, your reward matrix R one comma five is hundred.
201:25 So here you're gonna put hundred,
201:26 plus your Gamma parameter.
201:28 Your Gamma parameter is 0.8.
201:30 Then you're going to calculate the maximum Q value
201:33 for the next state based on all possible actions.
201:37 So let's look at the next state.
201:39 From room number five, you can go to either one.
201:42 You can go to four or you can go to five.
201:44 So your actions are five comma one, five comma four,
201:47 and five comma five.
201:48 That's exactly what I mentioned over here.
201:51 Q five comma one, Q five comma four, and Q five comma five.
201:55 You're basically putting all the next possible actions
201:58 from state number five.
202:00 From here, you'll calculate the maximum
202:02 Q value that you're getting for each of these.
202:04 Now your Q value is zero,
202:06 because, initially, your Q matrix is set to zero.
202:09 So you're going to get zero for Q five comma one,
202:12 five comma four, and five comma five.
202:14 So that's why you'll get 0.8 into zero, which is zero,
202:17 and hence your Q one comma five becomes hundred.
202:20 This hundred comes from R one comma five.
202:23 I hope all of you understood this.
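Written out, the substitution just described works out like this:

```
Q(1, 5) = R(1, 5) + 0.8 * max(Q(5, 1), Q(5, 4), Q(5, 5))
        = 100     + 0.8 * max(0, 0, 0)
        = 100
```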
202:25 So next, what you'll do is you'll update
202:27 this one comma five value in your Q matrix,
202:31 because you just calculated Q one comma five.
202:34 So I've updated it over here.
202:36 Now for the next episode,
202:37 we'll start with a randomly chosen initial state.
202:41 Again, let's say that we randomly chose state number three.
202:44 Now from room number three,
202:45 you can either go to room number one, two or four.
202:48 Let's randomly select room number one.
203:51 Now, from room number one,
202:52 you'll calculate the maximum Q value
202:55 for the next possible actions.
202:57 So let's calculate the Q formula for this.
203:00 So your Q state comma action becomes three comma one,
203:03 because you're in state number three
203:05 and your action is you're going to room number one.
203:07 So your R three comma one,
203:09 let's see what R three comma one is.
203:11 R three comma one is zero.
203:14 So you're going to put zero over here,
203:15 plus your Gamma parameter, which is 0.8,
203:18 and then you're going to check the next possible actions
203:21 from room number one,
203:22 and you're going to choose the maximum value
203:24 from these two.
203:25 So Q one comma three and Q one comma five
203:28 denote your next possible actions from room number one.
203:32 So Q one comma three is zero,
203:35 but Q one comma five is hundred.
203:37 So we just calculated this hundred in the previous step.
203:41 So, out of zero and hundred,
203:42 hundred is your maximum value,
203:44 so you're going to choose hundred.
203:45 Now 0.8 into hundred is nothing but 80.
203:49 So again, your Q matrix gets updated.
203:52 You see an 80 over here.
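Again, written out, the calculation for this episode is:

```
Q(3, 1) = R(3, 1) + 0.8 * max(Q(1, 3), Q(1, 5))
        = 0       + 0.8 * max(0, 100)
        = 80
```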
203:54 So, basically what you're doing is as you're taking actions,
203:56 you're updating your Q value,
203:58 you're just calculating the Q value at every step,
204:01 you're putting it in your Q matrix
204:03 so that your agent remembers that,
204:05 okay, when I went from room number one to room number five,
204:08 I had a Q value of hundred.
204:10 Similarly, three to one gave me a Q value of 80.
204:13 So basically, this Q matrix represents
204:15 the memory of your agent.
204:17 I hope all of you are clear with this.
204:20 So basically, what we're gonna do
204:21 is we're gonna keep iterating through this loop
204:23 until we've gone through all possible states
204:25 and reach the goal state, which is five.
204:28 Also, our main aim here is to find the most optimum policy
204:33 to get to room number five.
204:35 Now let's implement the exact same thing using Python.
204:38 So that was a lot of theory.
204:40 Now let's understand how this is done practically.
204:44 Alright, so we begin by importing your library.
204:47 We're gonna be using the NumPy library over here.
204:50 After that, we'll import the R matrix.
204:52 We've already created the R matrix.
204:54 This is the exact matrix that I showed you
204:57 a couple of minutes ago.
204:59 So I've created a matrix called R
205:01 and I've basically stored all the rewards in it.
205:04 If you want to see the R matrix, let me print it.
205:14 So, basically, this is your R matrix.
205:16 If you remember, node one to five,
205:18 you have a reward of hundred.
205:20 Node four to five, you have a reward of hundred,
205:22 and five to five, you have a reward of hundred,
205:25 because all of these nodes directly lead us to the reward.
205:28 Correct?
205:30 Next, what we're doing is we're creating a Q matrix,
205:32 which is basically a six by six matrix
205:36 which represents all the states, zero to five.
205:39 And this matrix is initialized to all zeros.
205:41 After, that we're setting the Gamma parameter.
205:44 Now guys, you can play around with this code,
205:45 and you know you can change the Gamma parameter
205:48 to 0.7 or 0.9
205:50 and see how much more the agent will explore
205:52 or whether it will perform exploitation.
205:55 Here I've set the Gamma parameter to 0.8
205:57 which is a pretty good number.
205:59 Now what I'm doing is I'm setting the initial state as one.
206:03 You can randomly choose this state
206:04 according to your needs.
206:05 I've set the initial state as one.
206:08 Now, this function will basically give me
206:10 all the available actions from my initial state.
206:14 Since I've set my initial state as one,
206:16 It'll give me all the possible actions.
206:18 Here what I'm doing is since my initial state is one,
206:21 I'm checking in my row number one,
206:24 which value is equal to zero or greater than zero.
206:27 Those denote my available actions.
206:30 So look at our row number one.
206:32 Here we have a zero and we have a hundred over here.
206:36 This is one comma three and this is one comma five.
206:40 So if you look at the row number one,
206:42 since I've selected the initial state as one,
206:44 we'll consider row number one.
206:46 Okay, what I'm doing is in row number one,
206:49 I have two numbers which are either equal to zero
206:52 or greater than zero.
206:53 These denote my possible actions.
206:56 One comma three has the value of zero
206:59 and one comma five has the value of hundred,
207:01 which means that the agent
207:03 can either go to room number three
207:05 or it can go to room number five.
207:07 What I'm trying to say is from room number one,
207:09 you can basically go to room number three
207:12 or room number five.
207:13 This is exactly what I've coded over here.
207:17 If you remember the reward matrix,
207:18 from one you can traverse to only room number three directly
207:22 and room number five directly.
207:24 Okay, that's exactly what I've mentioned
207:26 in my code over here.
207:28 So this will basically give me the available actions
207:30 from my current state.
207:33 Now once I've moved to my next state,
207:34 I need to check the available actions from that state.
207:38 What I'm doing over here is basically this.
207:42 If you remember,
207:43 from room number one, we can go to three and five, correct?
207:46 And from three and five,
207:47 I'll randomly select the state.
207:49 And from that state, I need to find out
207:51 all possible actions.
207:53 That's exactly what I've done over here.
207:56 Okay.
207:57 Now this will randomly choose an action for me
207:59 from all my available actions.
208:03 Next, we need to update our Q matrix,
208:05 depending on the actions that we took,
208:06 if you remember.
208:08 So that's exactly what this update function is for.
208:11 Now guys, this entire thing is for calculating the Q value.
208:16 I hope all of you remember the formula,
208:18 which is Q state comma action
208:20 equals R state comma action plus Gamma into max value.
208:23 Max value will basically give me the maximum value
208:26 out of all the possible actions.
208:28 I'm basically computing this formula.
208:31 Now this will just update the Q matrix.
208:35 Coming to the training phase,
208:36 what we're gonna do is we are going to set a range.
208:39 Here I've set a range of 10,000,
208:42 meaning that my agent will perform 10,000 iterations.
208:46 You can set this depending on your own needs,
208:48 and 10,000 iterations is a pretty huge number.
208:52 So, basically, my agent is going to go through
208:54 10,000 possible iterations
208:55 in order to find the best policy.
208:58 Now this is the exact same thing that we did earlier.
209:01 We're setting the current state,
209:03 and then we're choosing the available action
209:05 from the current state.
209:06 Then from there, we'll choose an action at random.
209:09 Here we'll calculate a Q value
209:10 and we'll update the Q value in the matrix.
209:14 Alright.
209:15 And here I'm doing nothing,
209:16 but I'm printing the trained Q matrix.
209:19 This was the training phase.
209:21 Now the testing phase, basically,
209:23 you're going to randomly choose a current state.
209:26 You're gonna choose a current state,
209:28 and you're going to keep looping
209:29 through this entire code,
209:31 until you reach the goal state, which is room number five.
209:34 That's exactly what I'm doing in this whole thing.
209:37 Also, in the end, I'm printing the selected path.
209:39 That is basically the policy that the agent took
209:42 to reach room number five.
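(For readers following along, here is a minimal sketch of the kind of script being described, assuming the standard reward matrix for this six-room example; the variable and function names are illustrative, not necessarily the ones in the instructor's file.)

```python
import numpy as np

# Reward matrix for the six-room example (-1 marks moves that aren't possible)
R = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])

gamma = 0.8
Q = np.zeros_like(R, dtype=float)        # the agent's "memory", initialized to zero

def available_actions(state):
    # Valid actions are the columns in this row with a reward of zero or more
    return np.where(R[state] >= 0)[0]

def update(state, action):
    # Q(state, action) = R(state, action) + gamma * max Q(next state, all actions)
    Q[state, action] = R[state, action] + gamma * Q[action].max()

# Training phase: many episodes, each starting from a randomly chosen state
for _ in range(10000):
    state = np.random.randint(0, 6)
    action = np.random.choice(available_actions(state))
    update(state, action)

# Testing phase: from a chosen state, keep following the highest Q value
# until the goal state (room five) is reached
state, path = 1, [1]
while state != 5:
    state = int(np.argmax(Q[state]))
    path.append(state)
print(path)   # e.g. [1, 5]
```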
209:44 Now if I set the current state as one,
209:47 it should give me the best policy
209:49 to reach to room number five from room number one.
209:52 Alright, let's run this code,
209:54 and let's see if it's giving us that.
209:57 Now before that happens, I want you to check and tell me
210:01 which is the best possible way
210:03 to get from room number one to room number five.
210:06 It's obviously directly like this.
210:08 One to five is the best policy to get from room number one
210:11 to room number five.
210:12 So we should get an output of one comma five.
210:16 That's exactly what we're getting.
210:18 This is the Q matrix with all the Q values,
210:21 and here we are getting the selected path.
210:23 So if your current state is one,
210:25 your best policy is to go from one to five.
210:28 Now, if you want to change your current state,
210:31 let's say we set the current state to two.
210:35 And before we run the code,
210:36 let's see which is the best possible way
210:38 to get to room number five from room number two.
210:41 From room number two, you can go to three,
210:43 then you can go to one, and then you can go to five.
210:46 This will give you a reward of hundred,
210:49 or you can go to room number three,
210:50 then go to four, and then go to five.
210:53 This will also give you a reward of hundred.
210:55 Our path should be something like that.
210:57 Let's save it and let's run the file.
211:01 So, basically, from state two,
211:02 you're going to go to three, then to four,
211:04 and then to five.
211:06 This is our best possible path
211:08 from two to room number five.
211:10 So, guys, this is exactly how
211:12 the Q learning algorithm works,
211:14 and this was a simple implementation
211:15 of the entire example that I just told you.
211:19 Now if any of you still have doubts
211:20 regarding Q learning or reinforcement learning,
211:23 make sure you comment them in the comment section,
211:25 and I'll try to answer all of your doubts.
211:28 Now we're done with machine learning.
211:30 We've completed the whole machine learning model.
211:32 We've understood reinforcement learning,
211:34 supervised learning, unsupervised learning, and so on.
211:38 Before I get to deep learning,
211:40 I want to clear a very common misconception.
211:43 A lot of people get confused
211:45 between AI, machine learning, and deep learning,
211:48 because, you know, a lot of common applications
211:50 involve artificial intelligence,
211:51 machine learning, and deep learning together.
211:53 For example, Siri is an application
211:55 of artificial intelligence,
211:57 machine learning, and deep learning.
211:59 So how are these three connected?
212:00 Are they the same thing or how exactly
212:03 is the relationship between artificial intelligence,
212:05 machine learning, and deep learning?
212:07 This is what I'll be discussing.
212:09 Now artificial intelligence is basically the science
212:12 of getting machines to mimic the behavior of human beings.
212:17 But when it comes to machine learning,
212:19 machine learning is a subset of artificial intelligence
212:23 that focuses on getting machines to make decisions
212:27 by feeding them data.
212:29 That's exactly what machine learning is.
212:31 It is a subset of artificial intelligence.
212:34 Deep learning, on the other hand,
212:35 is a subset of machine learning
212:37 that uses the concept of neural networks
212:40 to solve complex problems.
212:42 So, to sum it up, artificial intelligence,
212:44 machine learning, and deep learning,
212:46 are interconnected fields.
212:48 Machine learning and deep learning
212:50 aid artificial intelligence
212:52 by providing a set of algorithms and neural networks
212:55 to solve data-driven problems.
212:58 That's how AI, machine learning, and deep learning
213:00 are related.
213:01 I hope all of you have cleared your misconceptions
213:04 and doubts about AI, ML, and deep learning.
213:07 Now let's look at our next topic,
213:09 which is limitations of machine learning.
213:11 Now the first limitation is machine learning
213:14 is not capable enough to handle high dimensional data.
213:18 This is where the input and the output are very large.
213:21 So handling and processing such type of data
213:24 becomes very complex
213:25 and it takes up a lot of resources.
213:28 This is also sometimes known as the curse of dimensionality.
213:32 So, to understand this in simpler terms,
213:35 look at the image shown on this slide.
213:37 Consider a line of hundred yards
213:40 and let's say that you dropped a coin somewhere on the line.
213:43 Now it's quite convenient for you to find the coin
213:46 by simply walking along the line.
213:48 This is very simple because this line is considered
213:51 as a single dimensional entity.
213:53 Now next, you consider that you have
213:55 a square of hundred yards,
213:57 and let's say you dropped a coin somewhere in between.
214:00 Now it's quite evident that you're going to take more time
214:03 to find the coin within that square
214:05 as compared to the previous scenario.
214:07 The square is, let's say, a two dimensional entity.
214:11 Let's take it a step ahead and let's consider a cube.
214:14 Okay, let's say there's a cube of 500 yards
214:17 and you have dropped a coin somewhere in between this cube.
214:21 Now it becomes even more difficult
214:23 for you to find the coin this time,
214:24 because this is a three dimensional entity.
214:27 So, as your dimension increases,
214:30 the problem becomes more complex.
214:32 So you can observe that the complexity
214:35 is increasing with the increase in your dimensions,
214:38 and in real life, the high dimensional data
214:40 that we're talking about has thousands of dimensions
214:43 that makes it very complex to handle and process.
214:46 And high dimensional data
214:47 can easily be found in use cases like image processing,
214:51 natural language processing, image translation, and so on.
214:55 Now in K-means itself,
214:56 we saw that we had 16 million possible colors.
214:59 That is a lot of data.
215:01 So this is why machine learning is restricted.
215:04 It cannot be used in the process of image recognition
215:08 because image recognition and images have a lot of pixels
215:11 and they have a lot of high dimensional data.
215:13 That's why machine learning becomes very restrictive
215:16 when it comes to such use cases.
215:18 Now the second major challenge is to tell the computer
215:21 what are the features it should look for
215:24 that will play an important role
215:26 in predicting the outcome and in getting a good accuracy.
215:30 Now this process is something known as feature extraction.
215:33 Now feeding raw data to the algorithm rarely works,
215:37 and this is the reason why feature extraction
215:39 is a critical part of machine learning workflow.
215:42 Now the challenge for the programmer here increases
215:44 because the effectiveness of the algorithm
215:47 depends on how insightful the programmer is.
215:50 As a programmer,
215:51 you have to tell the machine that these are the features.
215:54 And depending on these features,
215:55 you have to predict the outcome.
215:57 That's how machine learning works.
215:59 So far, in all our demos,
216:00 we saw that we were providing predictor variables.
216:03 We were providing input variables
216:05 that will help us predict the outcome.
216:07 We were trying to find correlations between variables,
216:10 and we're trying to find out the variable
216:11 that is very important in predicting the output variable.
216:15 So this becomes a challenge for the programmer.
216:17 That's why it's very difficult to apply
216:19 machine learning model to complex problems
216:22 like object recognition, handwriting recognition,
216:25 natural language processing, and so on.
216:27 Now all these problems
216:29 and all these limitations in machine learning
216:32 led to the introduction of deep learning.
216:35 Now we're gonna discuss about deep learning.
216:38 Now deep learning is one of the only methods
216:41 by which we can overcome the challenges
216:43 of feature extraction.
216:45 This is because deep learning models
216:47 are capable of learning to focus
216:49 on the right features by themselves,
216:51 which requires very little guidance from the programmer.
216:54 Basically, deep learning mimics the way our brain functions.
216:57 That is it learns from experience.
217:00 So in deep learning, what happens is
217:02 feature extraction happens automatically.
217:05 You need very little guidance by the programmer.
217:07 So deep learning will learn the model,
217:09 and it will understand which feature or which variable
217:13 is important in predicting the outcome.
217:15 Let's say you have millions of predictor variables
217:18 for a particular problem statement.
217:20 How are you going to sit down and
217:22 understand the significance
217:23 of each of these predictor variables?
217:25 It's going to be almost impossible
217:27 to sit down with so many features.
217:29 That's why we have deep learning.
217:30 Whenever there's high dimensionality data
217:33 or whenever the data is really large
217:35 and it has a lot of features
217:37 and a lot of predictor variables, we use deep learning.
217:40 Deep learning will extract features on its own
217:42 and understand which features are important
217:45 in predicting your output.
217:47 So that's the main idea behind deep learning.
217:49 Let me give you a small example also.
217:52 Suppose we want to make a system
217:54 that can recognize the face of different people
217:56 in an image.
217:57 Okay, so, basically, we're creating a system
217:59 that can identify the faces of different people in an image.
218:03 If we solve this by using
218:05 the typical machine learning algorithms,
218:07 we'll have to define facial features like eyes,
218:10 nose, ears, et cetera.
218:12 Okay, and then the system will identify
218:14 which features are more important for which person.
218:17 Now, if you consider deep learning for the same example,
218:20 deep learning will automatically find out the features
218:23 which are important for classification,
218:25 because it uses the concept of neural networks,
218:28 whereas in machine learning we have to
218:29 manually define these features on our own.
218:32 That's the main difference between deep learning
218:34 and machine learning.
218:36 Now the next question is how does deep learning work?
218:39 Now when people started coming up with deep learning,
218:42 their main aim was to re-engineer the human brain.
218:45 Okay, deep learning studies the basic unit of a brain
218:49 called the brain cell or a neuron.
218:51 All of you biology students
218:53 will know what I'm talking about.
218:54 So, basically, deep learning is inspired
218:56 from our brain structure.
218:58 Okay, in our brains, we have something known as neurons,
219:01 and these neurons are replicated in deep learning
219:04 as artificial neurons,
219:06 which are also called perceptrons.
219:08 Now, before we understand how artificial neural networks
219:11 or artificial neurons work,
219:13 let's understand how these biological neurons work,
219:16 because I'm not sure how many of you
219:18 are bio students over here.
219:19 So let's understand the functionality of biological neurons
219:22 and how we can mimic this functionality
219:25 in a perceptron or in an artificial neuron.
219:28 So, guys, if you look at this image,
219:30 this is basically an image of a biological neuron.
219:33 If you focus on the structure of the biological neuron,
219:36 it has something known as dendrites.
219:38 These dendrites are basically used to receive inputs.
219:41 Now these inputs are basically summed up in the cell body,
219:45 and the signal is passed on to the next biological neuron.
219:49 So, through dendrites, you're going to receive signals
219:51 from other neurons, basically, input.
219:53 Then the cell body will sum up all these inputs,
219:56 and the axon will transmit this input to other neurons.
220:00 The axon will fire once some threshold is crossed,
220:03 and it will get passed onto the next neuron.
220:06 So similar to this, a perceptron or an artificial neuron
220:10 receives multiple inputs,
220:12 and applies various transformations and functions
220:15 and provides us an output.
220:17 These multiple inputs are nothing but your input variables
220:20 or your predictor variables.
220:21 You're feeding input data to an artificial neuron
220:24 or to a perceptron,
220:25 and this perceptron will apply
220:27 various functions and transformations,
220:30 and it will give you an output.
220:32 Now just like our brain consists of
220:34 multiple connected neurons called neural networks,
220:38 we also build something known as
220:39 a network of artificial neurons
220:42 called artificial neural networks.
220:44 So that's the basic concept behind deep learning.
220:47 To sum it up, what exactly is deep learning?
220:50 Now deep learning is a collection of
220:52 statistical machine learning techniques
220:54 used to learn feature hierarchies based on the concept
220:58 of artificial neural networks.
221:00 So the main idea behind deep learning
221:02 is artificial neural networks
221:04 which work exactly like how our brain works.
221:07 Now in this diagram, you can see that
221:09 there are a couple of layers.
221:11 The first layer is known as the input layer.
221:14 This is where you'll receive all the inputs.
221:16 The last layer is known as the output layer
221:19 which provides your desired output.
221:21 Now, all the layers which are there between your input layer
221:24 and your output layer are known as the hidden layers.
221:27 Now, there can be any number of hidden layers,
221:30 thanks to all the resources that we have these days.
221:33 So you can have hundreds of hidden layers in between.
221:36 Now, the number of hidden layers
221:37 and the number of perceptrons in each of these layers
221:41 will entirely depend on the problem
221:43 or on the use case that you're trying to solve.
221:45 So this is basically how deep learning works.
221:49 So let's look at the example that we saw earlier.
221:52 Here what we want to do is we want to perform
221:54 image recognition using deep networks.
221:57 First, what we're gonna do is we are going to pass
221:59 this high dimensional data to the input layer.
222:03 To match the dimensionality of the input data,
222:05 the input layer will contain
222:07 multiple sub layers of perceptrons
222:09 so that it can consume the entire input.
222:13 Okay, so you'll have multiple sub layers of perceptrons.
222:16 Now, the output received from the input layer
222:18 will contain patterns and will only be able to identify
222:22 the edges of the images, based on the contrast levels.
222:25 This output will then be fed to hidden layer number one
222:29 where it'll be able to identify facial features
222:32 like your eyes, nose, ears, and all of that.
222:35 Now from here, the output will be fed
222:37 to hidden layer number two,
222:39 where it will be able to form entire faces.
222:42 It'll go deeper into face recognition,
222:45 and this output of the hidden layer
222:46 will be sent to the output layer
222:48 or any other hidden layer that is there
222:50 before the output layer.
222:52 Now, finally, the output layer will perform classification,
222:55 based on the result that you'd get
222:56 from your previous layers.
222:58 So, this is exactly how deep learning works.
223:01 This is a small analogy that I use
223:03 to make you understand what deep learning is.
223:06 Now let's understand what a single layer perceptron is.
223:10 So like I said, perceptron is basically
223:12 an artificial neuron.
223:14 There's something known as a single layer
223:15 and a multilayer perceptron;
223:17 we'll first focus on single layer perceptron.
223:20 Now before I explain what a perceptron really is,
223:23 you should know that perceptrons are linear classifiers.
223:26 A single layer perceptron
223:28 is a linear or a binary classifier.
223:31 It is used mainly in supervised learning,
223:33 and it helps to classify the given input data
223:36 into separate classes.
223:38 So this diagram basically represents a perceptron.
223:41 A perceptron has multiple inputs.
223:44 It has a set of inputs labeled X one, X two,
223:47 until X n.
223:48 Now each of these inputs is given a specific weight.
223:52 Okay, so W one represents the weight of input X one.
223:56 W two represents the weight of input X two, and so on.
224:00 Now how you assign these weights
224:01 is a different thing altogether.
224:03 But for now, you need to know that each input
224:05 is assigned a particular weightage.
224:08 Now what a perceptron does is it computes some functions
224:11 on these weighted inputs, and it will give you the output.
224:15 So, basically, these weighted inputs
224:17 go through something known as summation.
224:20 Okay, summation is nothing but the sum of the products
224:22 of each of your inputs with their respective weights.
224:25 Now after the summation is done,
224:27 this is passed on to a transfer function.
224:29 A transfer function is nothing but an activation function.
224:33 I'll be discussing more about
224:36 the activation function in a minute.
224:37 And from the activation function,
224:38 you'll get the outputs Y one, Y two, and so on.
224:42 So guys, you need to understand four important parts
224:45 in a perceptron.
224:46 So, firstly, you have the input values.
224:48 You have X one, X two, X three.
224:50 You have something known as weights and bias,
224:53 and then you have something known as the net sum
224:55 and finally the activation function.
224:58 Now, all the inputs X
224:59 are multiplied with the respective weights.
225:01 So, X one will be multiplied with W one,
225:05 X two with W two, and so on.
225:07 After this, you'll add all the multiplied values,
225:10 and we'll call that the weighted sum.
225:13 This is done using the summation function.
225:15 Now we'll apply the weighted sum
225:16 to the appropriate activation function.
225:19 Now, a lot of people have a confusion
225:21 about activation function.
225:22 Activation function is also known as the transfer function.
225:26 Now, to understand the term activation function,
225:29 note that it stems from the way neurons
225:31 in a human brain work.
225:33 The neuron becomes active
225:35 only after a certain potential is reached.
225:39 That threshold is known as the activation potential.
225:42 Therefore, mathematically,
225:44 it can be represented by a function
225:46 that reaches saturation after a threshold.
225:49 Okay, we have a lot of activation functions
225:51 like signum, sigmoid, tanh, and so on.
225:54 You can think of activation function
225:57 as a function that maps the input
225:59 to the respective output.
226:00 And now I also spoke about weights and bias.
226:04 Now why do we assign weights to each of these inputs?
226:07 What weights do is they show a strength
226:09 of a particular input,
226:11 or how important a particular input
226:13 is for predicting the final output.
226:16 So, basically, the weightage of an input
226:18 denotes the importance of that input.
226:20 Now, our bias basically allows us
226:22 to shift the activation function
226:24 in order to get a precise output.
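(As a rough sketch of the pieces just described, namely inputs, weights, bias, weighted sum, and activation, a single perceptron could be coded like this; the step activation and the example numbers are illustrative assumptions, not from the course material.)

```python
import numpy as np

def perceptron(inputs, weights, bias=0.0, threshold=0.0):
    # Summation step: each input times its respective weight, summed up, plus the bias
    weighted_sum = np.dot(inputs, weights) + bias
    # Activation (transfer) function: a simple step function that fires
    # only once the weighted sum crosses the threshold
    return 1 if weighted_sum >= threshold else 0

# Three inputs X1, X2, X3 with their respective weights W1, W2, W3
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.6, 0.2, 0.2])
print(perceptron(x, w, threshold=0.5))   # 0.6 + 0.2 = 0.8 >= 0.5, so the output is 1
```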
226:27 So that was all about perceptrons.
226:30 Now in order to make you understand perceptrons better,
226:33 let's look at a small analogy.
226:35 Suppose that you wanna go to a party
226:37 happening near your house.
226:39 Now your decision will depend on a set of factors.
226:42 First is how is the weather.
226:44 Second probably is your wife, or your girlfriend,
226:47 or your boyfriend going with you.
226:49 And third, is there any public transport available?
226:53 Let's say these are the three factors
226:54 that you're going to consider before you go to a party.
226:57 So, depending on these predictor variables
226:59 or these features,
227:01 you're going to decide whether you're going to stay at home
227:04 or go and party.
227:05 Now, how the weather is, is going to be your first input.
227:09 We'll represent this with a value X one.
227:12 Whether your wife is going with you is another input, X two.
227:15 Whether any public transport is available
227:17 is your third input, X three.
227:20 Now, X one will have two values, one and zero.
227:23 One represents that the weather is good.
227:25 Zero represents weather is bad.
227:27 Similarly, one represents that your wife is going,
227:30 and zero represents that your wife is not going.
227:33 And in X three, again, one represents that
227:36 there is public transport,
227:37 and zero represents that there is no public transport.
227:40 Now your output will either be one or zero.
227:43 One means you are going to the party,
227:45 and zero means you will be sitting at home.
227:48 Now in order to understand weightage,
227:50 let's say that the most important factor for you
227:53 is your weather.
227:54 If the weather is good,
227:55 it means that you will 100% go to the party.
228:00 Now if your weather is not good,
228:00 you've decided that you'll sit at home.
228:03 So the maximum weightage is for your weather variable.
228:06 So if your weather is really good,
228:08 you will go to the party.
228:09 It is a very important factor in order to understand
228:12 whether you're going to sit at home
228:13 or you're going to go to the party.
228:15 So, basically, if X one equal to one,
228:18 your output will be one.
228:19 Meaning that if your weather is good,
228:21 you'll go to the party.
228:23 Now let's randomly assign weights to each of our input.
228:26 W one is the weight associated with input X one.
228:29 W two is the weight with X two
228:32 and W three is the weight associated with X three.
228:35 Let's say that your W one is six,
228:37 your W two is two, and W three is two.
228:40 Now by using the activation function,
228:42 you're going to set a threshold of five.
228:45 Now this means that it will fire
228:47 when the weather is good
228:48 and won't fire if the weather is bad,
228:51 irrespective of the other inputs.
228:54 Now here, your weightage is six.
228:57 So, basically, if you consider your first input,
228:59 which has a weightage of six, above the threshold of five,
229:01 that means you're 100% going to go.
229:03 Let's say you're considering only the second input.
229:06 This means that you're not going to go,
229:08 because your weightage is two and your threshold is five.
229:11 So if your weightage is below your threshold,
229:13 it means that you're not going to go.
229:15 Now let's consider another scenario
229:17 where our threshold is three.
229:19 This means that it'll fire
229:20 when either X one is high
229:22 or the other two inputs are high.
229:24 Now W two is associated with your wife is going or not.
229:28 Let's say the weather is bad
229:30 and you have no public transportation,
229:33 meaning that your X one and X three are zero,
229:35 and only your X two is one.
229:38 Now if your X two is one,
229:39 your weightage is going to be two.
229:41 If your weightage is two,
229:42 you will not go because the threshold value is set to three.
229:45 The threshold value is set in such a way that
229:48 if X two and X three are combined together,
229:51 only then you'll go,
229:52 or only if X one is true, then you'll go.
229:55 So you're assigning threshold in such a way that
229:58 you will go for sure if the weather is good.
230:02 This is how you assign threshold.
230:04 This is nothing but your activation function.
230:07 So guys, I hope all of you understood,
230:09 the most amount of weightage
230:11 is associated with the input that is very important
230:14 in predicting your output.
230:16 This is exactly how a perceptron works.
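(Plugging the numbers from this analogy into a tiny sketch, with weights of six, two, and two and a threshold of five as described above:)

```python
def party_decision(weather, wife_going, transport, threshold=5):
    # Inputs are 1/0 flags; the weights reflect how important each factor is
    weights = [6, 2, 2]
    score = weather * weights[0] + wife_going * weights[1] + transport * weights[2]
    return 1 if score >= threshold else 0            # 1 = go to the party, 0 = stay home

print(party_decision(1, 0, 0))               # good weather alone: 6 >= 5, so go
print(party_decision(0, 1, 0))               # only the wife is going: 2 < 5, so stay home
print(party_decision(0, 1, 1, threshold=3))  # lower threshold: 2 + 2 = 4 >= 3, so go
```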
230:19 Now let's look at the limitations of a perceptron.
230:22 Now in a perceptron, there are no hidden layers.
230:25 There's only an input layer,
230:27 and there is an output layer.
230:29 We have no hidden layers in between.
230:31 And because of this, you cannot classify
230:34 non-linearly separable data points.
230:36 Okay, if you have data, like in this figure,
230:39 how will you separate this?
230:41 You cannot use a perceptron to do this.
230:43 Alright, so complex problems that involve
230:46 a lot of parameters cannot be solved
230:48 by a single layer perceptron.
230:50 That's why we need something known as
230:52 multiple layer perceptron.
230:55 So now we'll discuss something known as
230:57 multilayer perceptron.
230:59 A multilayer perceptron has the same structure
231:02 as a single layer perceptron,
231:04 but with one or more hidden layers.
231:06 Okay, and that's why it's considered a deep neural network.
231:10 So in a single layer perceptron,
231:12 we had only input layer, output layer.
231:15 We didn't have any hidden layer.
231:16 Now when it comes to multi-layer perceptron,
231:18 there are hidden layers in between,
231:20 and then there is the output layer.
231:22 It works in a similar manner, like I said,
231:24 first, you'll have the input X one, X two, X three,
231:27 and so on.
231:28 And each of these inputs will be assigned some weight.
231:31 W one, W two, W three, and so on.
231:33 Then you'll calculate the weighted summation
231:36 of each of these inputs and their weights.
231:38 After that, you'll send them to the transformation
231:40 or the activation function,
231:41 and you'll finally get the output.
231:44 Now, the only thing is that you'll have multiple
231:46 hidden layers in between,
231:48 one or more than one hidden layers.
231:51 So, guys, this is how a multilayer perceptron works.
231:54 It works on the concept of feed forward neural networks.
231:58 Feed forward means every node in each layer
232:01 is connected to every node in the next layer.
232:04 So that's what feed forward networks are.
232:06 Now when it comes to assigning weights,
232:08 what we do is we randomly assign weights.
232:11 Initially we have input X one, X two, X three.
232:13 We randomly assign some weight W one, W two, W three,
232:17 and so on.
232:18 Now it's always necessary that whatever weights
232:20 we assign to our input,
232:23 those weights are actually correct,
232:25 meaning that those weights are actually significant
232:28 in predicting your output.
232:30 So how a multilayer perceptron works is
232:33 a set of inputs are passed to the first hidden layer.
232:36 Now the activations from that layer are passed
232:39 through the next layer.
232:40 And from that layer, it's passed to the next hidden layer,
232:43 until you reach the output layer.
232:46 From the output layer, you'll form the two classes,
232:48 class one and class two.
232:49 Basically, you'll classify your input into
232:52 one of the two classes.
232:54 So that's how a multilayer perceptron works.
232:56 A very important concept in the multilayer perceptron
233:00 is back propagation.
233:03 Back propagation algorithm is
233:06 a supervised learning method for multilayer perceptrons.
233:09 Okay, now why do we need back propagation?
233:12 So guys, when we are designing a neural network
233:15 in the beginning, we initialize weights
233:17 with some random values, or any value for that matter.
233:21 Now, obviously, we need to make sure that these weights
233:24 actually are correct,
233:25 meaning that these weights show the significance
233:28 of each predictor variable.
233:30 These weights have to fit our model
233:32 in such a way that our output is very precise.
233:35 So let's say that we randomly selected
233:37 some weights in the beginning,
233:39 but our model output is much more different
233:42 than our actual output,
233:43 meaning that our error value is very huge.
233:46 So how will you reduce this error?
233:48 Basically, what you need to do is
233:50 we need to somehow explain to the model
233:52 that we need to change the weight
233:54 in such a way that the error becomes minimum.
233:58 So the main thing is that the weight and your error
234:00 are very highly related.
234:02 The weightage that you give to each input
234:05 will show how much error is there in your output,
234:08 because the most significant variables
234:09 will have the highest weightage.
234:11 And if the weightage is not correct,
234:13 then your output is also not correct.
234:15 Now, back propagation is a way to update your weights
234:19 in such a way that your outcome is precise
234:21 and your error is reduced.
234:24 So, in short back propagation is used to train
234:27 a multilayer perceptron.
234:29 It's basically used to update your weights
234:31 in such a way that your output is more precise,
234:35 and that your error is reduced.
234:37 So training a neural network is all about back propagation.
234:41 So the most common deep learning algorithm
234:43 for supervised training of the multilayer perceptron
234:46 is known as back propagation.
234:48 So, after calculating the weighted sum of inputs
234:51 and passing them through the activation function,
234:54 we propagate backwards and update the weights
234:57 to reduce the error.
234:59 It's as simple as that.
235:00 So in the beginning, you're going to assign some weights
235:03 to each of your input.
235:04 Now these inputs will go through the activation function
235:07 and it'll go through all the hidden layers
235:09 and give us an output.
235:10 Now when you get the output,
235:12 the output is not very precise,
235:14 or it is not the desired output.
235:17 So what you'll do is you'll propagate backwards,
235:20 and you start updating your weights
235:22 in such a way that your error
235:23 is as minimum as possible.
235:26 So, I'm going to repeat this once more.
235:28 So the idea behind back propagation
235:30 is to choose weights in such a way
235:32 that your error gets minimized.
235:34 To understand this, we'll look at a small example.
235:38 Let's say that we have a data set which has these labels.
235:41 Okay, your input is zero, one, two,
235:43 but your desired output is zero, two, and four.
235:46 Now the output of your model
235:48 when W equal to three is like this.
235:52 Notice the difference between your model output
235:54 and your desired output.
235:56 So, your model output is three,
235:59 but your desired output is two.
236:01 Similarly, when your model output is six,
236:04 your desired output is supposed to be four.
236:07 Now let's calculate the error when weight is equal to three.
236:11 The error is zero over here
236:13 because your desired output is zero,
236:15 and your model output is also zero.
236:17 Now the error in the second case is one.
236:20 Basically, your model output minus your desired output.
236:23 Three minus two, your error is one.
236:25 Similarly, your error for the third input is two,
236:28 which is six minus four.
236:30 When you take the square,
236:31 this is actually a very huge difference,
236:33 your error becomes larger.
236:35 Now what we need to do
236:36 is we need to update the weight value
236:38 in such a way that our error decreases.
236:40 Now here we've considered the weight as four.
236:43 So when you consider the weight as four,
236:46 your model output becomes zero, four, and eight.
236:50 Your desired output is zero, two, and four.
236:53 So your model output becomes zero, four, and eight,
236:56 which is a lot.
236:58 So guys, I hope you all know
236:59 how to calculate the output over here.
237:01 What I'm doing is I'm multiplying the input
237:04 with your weightage.
237:05 The weightage is four,
237:07 so zero into four will give me zero.
237:09 One into four will give me four,
237:11 and two into four will give me eight.
237:14 That's how I'm getting my model output over here.
237:16 For now, this is how I'm getting the output over here.
237:19 That's how you calculate your weightage.
237:21 Now, here, you see that our desired output
237:23 is supposed to be zero, two, and four,
237:25 but we're getting an output of zero, four, and eight.
237:28 So our error is actually increasing
237:31 as we increase our weight.
237:33 Our errors for W equal to four
237:35 have become zero, four, and 16,
237:37 whereas the errors for W equal to three
237:40 were zero, one, and four.
237:41 I mean the squared errors.
237:43 So if you look at this, as we increase our weightage,
237:46 our error is increasing.
237:48 So, obviously, we know that
237:49 there is no point in increasing the value of W further.
237:53 But if we decrease the value of W,
237:55 our error actually decreases.
237:57 Alright, if we give a weightage of two,
238:00 our error decreases.
238:02 If we can find a relationship between our weight and error,
238:05 basically, if you increase the weight,
238:07 your error also increases.
238:09 If you decrease the weight, your error also decreases.
238:12 Now what we did here is we first initialize
238:14 some random value to W,
238:16 and then we propagated forward.
238:19 Then we notice that there is some error.
238:21 And to reduce that error, we propagated backwards
238:24 and increase the value of W.
238:26 After that, we notice that the error has increased,
238:29 and we came to know that we can't increase the W value.
238:33 Obviously, if your error is increasing
238:34 with increasing your weight,
238:36 you will not increase the weight.
238:38 So again, we propagated backwards,
238:40 and we decreased the W value.
238:42 So, after that, we noticed that the error has reduced.
238:45 So what we're trying is we're trying to get
238:46 the value of weight in such a way
238:48 that the error becomes as minimum as possible.
238:52 So we need to figure out whether we need
238:53 to increase or decrease the weight value.
238:56 Once we know that, we keep on updating the weight value
239:00 in that direction,
239:01 until the error becomes minimum.
239:03 Now you might reach a point where
239:05 if you further update the weight,
239:07 the error will again increase.
239:10 At that point, you need to stop.
239:12 Okay, at that point
239:13 is where your final weight value is there.
239:15 So, basically, this graph denotes that point.
239:18 Now this point is nothing but the global loss minimum.
239:22 If you update the weights further,
239:24 your error will also increase.
239:27 Now you need to find out where your global loss minimum is,
239:30 and that is where your optimum weight lies.
239:33 So let me summarize the steps for you.
239:36 First, you'll calculate the error.
239:38 This is how far your model output is
239:40 from your actual output.
239:42 Then you'll check whether the error
239:44 is minimized or not.
239:46 After that, if the error is very huge,
239:49 then you'll update the weight,
239:50 and you'll check the error again.
239:52 You'll repeat the process until the error becomes minimum.
239:56 Now once you reach the global loss minimum,
239:58 you'll stop updating the weights,
240:00 and you'll finalize your weight value.
240:02 This is exactly how back propagation works.
240:05 Now in order to tell you mathematically what we're doing
240:08 is we're using a method known as gradient descent.
240:11 Okay, this method is used
240:12 to adjust all the weights in the network
240:15 with an aim of reducing the error at the output layer.
240:19 So how the gradient descent optimizer works is
240:22 the first step is you will calculate the error
240:24 by considering the below equation.
240:26 Here you're subtracting the summation of your actual output
240:29 from your network output.
240:31 Step two is based on the error you get,
240:34 you will calculate the rate of change of error
240:36 with respect to the change in the weight.
240:39 The learning rate is something that you set
240:41 in the beginning itself.
240:43 Step three is based on this change in weight,
240:46 you will calculate the new weight.
240:48 Alright, your updated weight will be your weight
240:51 plus the rate of change of weight.
240:53 So guys, that was all about
240:54 back propagation and weight update.
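(To make the weight update concrete, here is a tiny sketch using the same toy data from the example, namely inputs zero, one, two with desired outputs zero, two, four and a single weight W; the learning rate value is just an illustrative choice.)

```python
# Toy model: output = W * input; the desired outputs correspond to W = 2
inputs  = [0, 1, 2]
targets = [0, 2, 4]

W = 3.0             # start from a "wrong" weight, as in the example
learning_rate = 0.05

for step in range(100):
    # Error: sum of squared differences between model output and desired output
    error = sum((W * x - t) ** 2 for x, t in zip(inputs, targets))
    # Gradient of the squared error with respect to W (rate of change of error)
    grad = sum(2 * (W * x - t) * x for x, t in zip(inputs, targets))
    # Update the weight in the direction that reduces the error
    W = W - learning_rate * grad

print(round(W, 3))   # converges towards 2.0, where the error is minimum
```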
240:57 Now let's look at the limitations of feed forward network.
241:01 So far, we were discussing the multiple layer perceptron,
241:04 which uses the feed forward network.
241:07 Let's discuss the limitations of these
241:09 feed forward networks.
241:10 Now let's consider an example of image classification.
241:14 Okay, let's say you've trained the neural network
241:16 to classify images of various animals.
241:19 Now let's consider an example.
241:21 Here the first output is an elephant.
241:24 We have an elephant.
241:25 And this output will have nothing to do
241:28 with the previous output, which is a dog.
241:31 This means that the output at time T
241:33 is independent of the output at time T minus one.
241:37 Now consider this scenario
241:39 where you will require the use
241:41 of previously obtained output.
241:44 Okay, the concept is very similar to reading a book.
241:47 As you turn every page, you need
241:49 an understanding of the previous pages.
241:51 If you want to make sense of the information,
241:53 then you need to know what you learned before.
241:55 That's exactly what you're doing right now.
241:57 In order to understand deep learning,
242:00 you have to understand machine learning.
242:02 So, basically, with the feed forward network
242:04 the new output at time T plus one
242:07 has nothing to do with the output at time T,
242:11 or T minus one, or T minus two.
242:13 So feed forward networks cannot be used
242:15 while predicting a word in a sentence,
242:17 as it will have absolutely no relationship
242:20 with the previous set of words.
242:21 So, a feed forward network cannot be used in
242:24 use cases wherein you have to predict the outcome
242:27 based on your previous outcome.
242:30 So, in a lot of use cases,
242:32 your previous output will also determine your next output.
242:36 So, for such cases, you may not make use
242:38 of feed forward network.
242:40 Now, what modification can you make
242:42 so that your network can learn
242:44 from your previous mistakes.
242:45 For this, we have solution.
242:47 So, a solution to this is recurrent neural networks.
242:51 So, basically, let's say you have an input
242:52 at time T minus one,
242:54 and you'll get some output when you feed it to the network.
242:57 Now, some information from this input at T minus one
243:01 is fed to the next input,
243:03 which is input at time T.
243:05 Some information from this output
243:07 is fed into the next input,
243:09 which is input at T plus one.
243:11 So, basically, you keep feeding information
243:13 from the previous input to the next input.
243:16 That's how recurrent neural networks really work.
243:19 So recurrent networks
243:20 are a type of artificial neural networks
243:23 designed to recognize patterns in sequence of data,
243:27 such as text, genomes, handwriting, spoken words,
243:31 time series data, sensors, stock markets,
243:34 and government agencies.
243:35 So, guys, recurrent neural networks are actually
243:38 a very important part of deep learning,
243:40 because recurrent neural networks have
243:42 applications in a lot of domains.
243:45 Okay, in time series and in stock markets,
243:48 the main networks that are used
243:49 are recurrent neural networks,
243:51 because each of your inputs is correlated.
243:54 Now, to better understand recurrent neural networks,
243:57 let's consider a small example.
243:59 Let's say that you go to the gym regularly,
244:02 and the trainer has given you
244:04 a schedule for your workout.
244:06 So basically, the exercises are repeated
244:08 after every third day.
244:10 Okay, this is what your schedule looks like.
244:12 So, make a note that all these exercises are repeated
244:15 in a proper order or in a sequence every week.
244:18 First, let us use a feedforward network
244:20 to try and predict the type of exercises
244:23 that we're going to do.
244:24 The inputs here are the day of the week, the month,
244:27 and your health status.
244:29 Okay, so, neural network has to be trained
244:31 using these inputs to provide us with the prediction
244:34 of the exercise that we should do.
244:36 Now let's try and understand the same thing using
244:39 recurrent neural networks.
244:41 In recurrent neural networks,
244:42 what we'll do is we'll consider the inputs
244:45 of the previous day.
244:46 Okay, so if you did a shoulder workout yesterday,
244:49 then you can do a bicep exercise today,
244:51 and this goes on for the rest of the week.
244:53 However, if you happen to miss a day at the gym,
244:56 the data from the previously attended time stamps
244:58 can be considered.
245:00 It can be done like this.
245:02 So, if a model is trained based on the data
245:04 it can obtain from the previous exercise,
245:07 the output of the model will be extremely accurate.
245:10 In such cases, you need to know the output
245:12 at T minus one in order to predict the output at T.
245:16 In such cases, recurrent neural networks are very essential.
245:20 So, basically, I'm feeding some inputs
245:22 through the neural networks.
245:23 You'll go through a few functions,
245:25 and you'll get the output.
245:26 So, basically, you're predicting the output
245:28 based on past information or based on your past input.
245:33 So that's how recurrent neural networks work.
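(As a minimal sketch of the idea that information from the previous step is fed into the next one, a single recurrent step could look like this; the tanh activation and the sizes are illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4

# Randomly initialized weights: one matrix for the current input,
# one for the hidden state carried over from the previous time step
W_x = rng.normal(size=(hidden_size, input_size))
W_h = rng.normal(size=(hidden_size, hidden_size))
b   = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on the current input AND the previous state
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)                 # no memory before the first input
for x_t in [np.array([1.0, 0.0, 0.0]),    # a short input sequence
            np.array([0.0, 1.0, 0.0]),
            np.array([0.0, 0.0, 1.0])]:
    h = rnn_step(x_t, h)                  # information flows from step to step
print(h)
```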
245:35 Now let's look at another type of neural network
245:38 known as convolutional neural network.
245:41 To understand why we need convolutional neural networks,
245:44 let's look at an analogy.
245:46 How do you think a computer reads an image?
245:48 Consider this image.
245:50 This is a New York skyline image.
245:53 On the first glance,
245:54 you'll see a lot of buildings and a lot of colors.
245:57 How does a computer process this image?
246:00 The image is actually broken down into three color channels,
246:03 which is the red, green, and blue.
246:05 It reads in the form of RGB values.
246:08 Now each of these color channels is mapped
246:10 to the image's pixels.
246:12 Then the computer will recognize the value
246:15 associated with each pixel,
246:16 and determine the size of the image.
246:19 Now for the black and white images,
246:21 there is only one channel,
246:22 but the concept is still the same.
246:24 The thing is we cannot make use of
246:26 fully connected networks when it comes to
246:29 processing images like this.
246:31 I'll tell you why.
246:32 Now consider the first input image.
246:34 Okay, the first image has a size of about
246:36 28 into 28 into three pixels.
246:39 And if we input this to a neural network,
246:42 we'll get about 2,352 weights
246:45 in the first hidden layer itself.
246:47 Now consider another example.
246:49 Okay, let's say we have an image
246:50 of 200 into 200 into three pixels.
246:54 So the number of weights for your first hidden layer
246:55 becomes around 120,000.
246:57 Now if this is just the first hidden layer,
247:00 imagine the number of neurons that you need
247:03 to process an entire complex image set.
247:06 This leads to something known as overfitting,
247:09 because all of the hidden layers are connected.
247:11 They're massively connected.
247:12 There's connection between each and every node.
247:15 Because of this, we face overfitting.
247:17 We have way too much of data.
247:19 We have to use way too many neurons,
247:21 which is not practical.
247:23 So that's why we have something known as
247:25 convolutional neural networks.
247:28 Now convolutional neural networks,
247:30 like any other neural network
247:31 are made up of neurons with learnable weights and biases.
247:35 So each neuron receives several inputs.
247:38 It takes a weighted sum over them,
247:40 and it gets passed on through some activation function,
247:43 and finally responds with an output.
247:46 So, the concept in convolutional neural networks
247:49 is that the neuron in a particular layer
247:51 will only be connected to a small region
247:54 of the layer before it.
247:55 Not all the neurons will be connected
247:57 in a fully-connected manner,
247:59 which would lead to overfitting,
248:00 because we'd need way too many neurons
248:02 to solve this problem.
248:03 Only the regions, which are significant
248:05 are connected to each other.
248:07 There is no full connection
248:08 in convolutional neural networks.
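(As a rough sketch of what it means for a neuron to be connected only to a small region of the layer before it, here is one convolution filter slid over a small grayscale image; the sizes and the filter values are illustrative assumptions.)

```python
import numpy as np

image  = np.arange(36, dtype=float).reshape(6, 6)   # a toy 6x6 grayscale "image"
kernel = np.array([[ 1.0, 0.0, -1.0],               # one 3x3 learnable filter
                   [ 1.0, 0.0, -1.0],
                   [ 1.0, 0.0, -1.0]])

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        # Each output value looks only at a 3x3 patch of the input,
        # not at the whole image -- this is the local connectivity idea
        patch = image[i:i+3, j:j+3]
        out[i, j] = np.sum(patch * kernel)
print(out)
```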
248:10 So guys, what we did so far is we discussed
248:14 what a perceptron is.
248:15 We discussed the different types
248:17 of neural networks that are there.
248:20 We discussed a feedforward neural network.
248:23 We discussed multilayer perceptrons,
248:25 we discussed recurrent neural networks,
248:28 and convolutional neural networks.
248:30 I'm not going to go too much in depth
248:32 with these concepts.
248:33 Now I'll be executing a demo.
248:35 If you haven't understood any
248:37 theoretical concept of deep learning,
248:39 please let me know in the comment section.
248:41 Apart from this, I'll also leave
248:43 a couple of links in the description box,
248:44 so that you understand the whole of deep learning in a better way.
248:48 Okay, if you want a more in-depth explanation,
248:50 I'll leave a couple of links in the description box.
248:53 For now, what I'm gonna do is I'll be running
248:55 a practical demonstration to show you
248:57 what exactly deep learning does.
248:59 So, basically, what we're going to do in this demo
249:01 is we're going to predict stock prices.
249:04 Like I said, stock price prediction
249:06 is one of the very good applications
249:09 of deep neural networks.
249:10 You can easily predict the stock price
249:13 of a particular stock for the next minute
249:15 or the next day by using deep neural networks.
249:19 So that's exactly what we're gonna do in this demo.
249:22 Now, before I discuss the code,
249:23 let me tell you a few things about our data set.
249:26 The data set contains around 42,000 minutes
249:30 of data ranging from April to August 2017
249:34 on 500 stocks,
249:35 as well as the total S&P 500 Index price.
249:38 So the index and stocks are arranged
249:40 in a wide format.
249:42 So, this is my data set, data_stocks.
249:45 It's in the CSV format.
249:47 So what I'm gonna do is I'm going to use
249:49 the read CSV function in order to import this data set.
249:52 This is just the part of where my data set is stored.
249:56 This data set was actually cleaned and prepared,
249:58 meaning that we don't have any missing stock
250:01 and index prices.
250:02 So the file does not contain any missing values.
250:05 Now what we're gonna do first
250:07 is we'll drop the date variable.
250:09 We have a variable known as date,
250:10 which is not really necessary
250:12 in predicting our outcome over here.
250:14 So that's exactly what I'm doing here.
250:15 I'm just dropping the date variable.
250:18 So here, I'm checking the dimensions of the data set.
250:21 This is pretty understandable,
250:23 using the shape function to do that.
250:25 Now, you always make the data a NumPy array.
250:29 This makes computation much easier.
250:31 The next process is the data splicing.
250:33 I've already discussed data splicing with you all.
250:36 Here we're just preparing the training
250:37 and the testing data.
250:39 So the training data will contain
250:40 80% of the total data set.
250:42 Okay, and also we are not shuffling the data set.
250:45 We're just slicing the data set sequentially.
250:48 That's why we have a test start
250:50 and a test end variable.
250:52 In sequence, I'll be selecting the data.
250:54 There's no need of shuffling this data set.
250:56 These are stock prices
250:57 it does not make sense to shuffle this data.
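To make this concrete, here is a minimal sketch of what that data preparation and sequential split might look like in Python. The file name data_stocks.csv and the column name DATE are assumptions based on the description, and the variable names are only illustrative, not necessarily the exact demo code.

```python
import pandas as pd
import numpy as np

# Load the data set (assumed file name) with read_csv.
data = pd.read_csv("data_stocks.csv")

# Drop the date column (assumed to be called DATE),
# since it isn't needed to predict the outcome.
data = data.drop(["DATE"], axis=1)

# Check the dimensions of the data set.
n_rows, n_cols = data.shape

# Work with the data as a NumPy array to make computation easier.
data = data.values

# Sequential 80/20 split: no shuffling, because this is time series data.
train_start = 0
train_end = int(np.floor(0.8 * n_rows))
test_start = train_end
test_end = n_rows

data_train = data[train_start:train_end, :]
data_test = data[test_start:test_end, :]
```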
251:00 Now in the next step, what we're going to do is
251:02 we're going to scale the data.
251:04 Now, scaling data and data normalization
251:06 is one of the most important steps.
251:09 You cannot miss this step.
251:11 I already mentioned earlier
251:12 what normalization and scaling is.
251:15 Now most neural networks
251:17 benefit from scaling inputs.
251:18 This is because the most common activation functions
251:21 of the network's neurons, such as tanh and sigmoid, are bounded.
251:25 Tanh and sigmoid are basically activation functions,
251:28 and these are defined in the range of minus one to one
251:32 or zero to one.
251:33 So that's why scaling is an important thing
251:35 in deep neural networks.
251:37 For scaling, again, we'll use the MinMaxScaler.
251:40 So we're just importing that function over here.
251:42 And also one point to note is that
251:45 you have to be very cautious
251:46 about what part of data you're scaling
251:48 and when you're doing it.
251:50 A very common mistake is to scale the whole data set
251:53 before training and test splits are being applied.
251:56 So before data splicing itself,
251:58 you shouldn't be scaling your data.
252:00 Now this is a mistake because
252:02 scaling invokes the calculation of statistics.
252:06 For example, minimum or maximum range of the variable
252:09 gets affected.
252:10 So when performing time series forecasting in real life,
252:14 you do not have information from future observations
252:17 at the time of forecasting.
252:19 That's why calculation of scaling statistics
252:22 has to be conducted on training data,
252:24 and only then it has to be applied to the test data.
252:27 Otherwise, you're basically using the future information
252:30 at the time of forecasting,
252:32 which is obviously going to lead to bias.
252:35 So that's why you need to make sure
252:37 you do scaling very accurately.
252:39 So, basically, what we're doing is the number of features
252:42 in the training data are stored
252:44 in a variable known as n stocks.
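As a rough sketch, that scaling step could look like the following with scikit-learn's MinMaxScaler. Note that the scaler is fitted on the training data only and then applied to both splits; the assumption that the index price sits in the first column, and the variable names, are mine and may differ from the actual demo code.

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# Fit the scaling statistics on the training data only...
scaler.fit(data_train)
# ...and then apply the same transformation to both splits.
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)

# Inputs are the stock columns; the target is the index price
# (assumed here to be the first column).
X_train = data_train[:, 1:]
y_train = data_train[:, 0]
X_test = data_test[:, 1:]
y_test = data_test[:, 0]

# Number of features (stocks) in the training data.
n_stocks = X_train.shape[1]
```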
252:46 After this, we'll import the famous TensorFlow.
252:49 So guys, TensorFlow is actually a very good
252:52 piece of software and it is currently the leading
252:55 deep learning and neural network computation framework.
252:58 It is based on a C++ low-level backend,
253:02 but it's usually controlled through Python.
253:05 So TensorFlow actually operates as
253:07 a graphical representation of your computations.
253:10 And this is important because neural networks
253:12 are actually graphs of data and mathematical operation.
253:16 So that's why TensorFlow is just perfect
253:18 for neural networks and deep learning.
253:20 So the next thing after importing the TensorFlow library
253:23 is something known as placeholders.
253:26 Placeholders are used to store, import, and target data.
253:30 We need two placeholders in order to fit our model.
253:34 So basically, X will contain the network's input,
253:37 which is the stock prices of all the stocks
253:40 at time T.
253:42 And y will contain the network's output,
253:45 which is the stock price at time T plus one.
253:49 Now the shape of the X placeholder
253:52 means that the inputs are two-dimensional matrix.
253:55 And the outputs are a one-dimensional vector.
253:58 So guys, basically, the None argument indicates
254:01 that at this point we do not yet know
254:03 the number of observations
254:05 that'll flow through the neural network.
254:08 We just keep it as a flexible array for now.
254:10 We'll later define the variable batch size
254:13 that controls the number of observations
254:15 in each training batch.
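In TensorFlow 1.x style, which is what this demo appears to use, those two placeholders could be defined roughly like this (variable names are illustrative):

```python
import tensorflow as tf

# X holds the stock prices at time T (a 2D matrix: observations x stocks),
# Y holds the index price at time T + 1 (a 1D vector).
# None leaves the number of observations per batch flexible for now.
X = tf.placeholder(dtype=tf.float32, shape=[None, n_stocks])
Y = tf.placeholder(dtype=tf.float32, shape=[None])
```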
254:18 Now, apart from this, we also have
254:20 something known as initializers.
254:22 Now, before I tell you what these initializers are,
254:25 you need to understand that
254:27 there's something known as variables
254:28 that are used as flexible containers
254:31 that are allowed to change during the execution.
254:34 Weights and bias are represented as variables
254:37 in order to adapt during training.
254:40 I already discussed weights and bias with you earlier.
254:43 Now weights and bias is something
254:45 that you need to initialize before you train the model.
254:48 That's how we discussed it even while I was explaining
254:51 neural networks to you.
254:53 So here, basically, we make use of something known as
254:55 the variance scaling initializer,
254:57 and for the bias initializer,
254:58 we make use of the zeros initializer.
255:01 These are some predefined functions in our TensorFlow model.
255:05 We'll not get into the depth of those things.
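A minimal sketch of those two initializers in TensorFlow 1.x might look as follows; the exact arguments are an assumption on my part and depend on your TensorFlow version.

```python
# Weight initializer: variance scaling keeps the scale of the gradients
# roughly similar across layers.
sigma = 1
weight_initializer = tf.variance_scaling_initializer(
    mode="fan_avg", distribution="uniform", scale=sigma)

# Bias initializer: simply start all biases at zero.
bias_initializer = tf.zeros_initializer()
```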
255:08 Now let's look at our model architecture parameters.
255:11 So the next thing we have to discuss
255:13 is the model architecture parameters.
255:16 Now the model that we build,
255:17 it consists of four hidden layers.
255:20 For the first layer, we've assigned 1,024 neurons
255:23 which is slightly more than double the size of the inputs.
255:27 The subsequent hidden layers are always
255:29 half the size of the previous layer,
255:31 which means that in the hidden layer number two,
255:33 we'll have 512 neurons.
255:36 Hidden layer three will have 256.
255:38 And similarly, hidden layer number four
255:41 will have 128 neurons.
255:43 Now why do we keep reducing the number of neurons
255:45 as we go through each hidden layer?
255:47 We do this because the number of neurons
255:50 for each subsequent layer compresses the information
255:53 that the network identifies in the previous layer.
255:57 Of course there are other possible network architectures
255:59 that you can apply for this problem statement,
256:02 but I'm trying to keep it as simple as possible,
256:04 because I'm introducing deep learning to you all.
256:06 So I can't build a model architecture
256:09 that's very complex and hard to explain.
256:11 And of course, we have output over here
256:13 which will be assigned a single neuron.
256:15 Now it is very important to understand
256:18 the variable dimensions between your input,
256:20 hidden, and output layers.
256:22 So, as a rule of thumb in multilayer perceptrons,
256:26 the second dimension of the previous layer
256:29 is the first dimension in the current layer.
256:32 So the second dimension in my first hidden layer
256:35 is going to be my first dimension in my second hidden layer.
256:39 Now the reason behind this is pretty logical.
256:42 It's because the output from the first hidden layer
256:44 is passed on as an input to the second hidden layer.
256:47 That's why the second dimension of the previous layer
256:50 is the same as the first dimension
256:52 of the next layer or the current layer.
256:55 I hope this is understandable.
256:57 Now coming to the bias dimension over here,
256:59 the bias dimension is always
257:01 equal to the second dimension of your current layer,
257:04 meaning that you're just going to pass
257:05 the number of neurons in that particular hidden layer
257:08 as your dimension in your bias.
257:10 So here, the number of neurons, 1,024,
257:14 you're passing the same number as a parameter to your bias.
257:17 Similarly, even for hidden layer number two,
257:20 if you see, the second dimension here
257:21 is n_neurons_2.
257:23 I'm passing the same parameter over here as well.
257:27 Similarly, for hidden layer three
257:28 and hidden layer number four.
257:31 Alright, I hope this is understandable
257:34 now we come to the output layer.
257:36 The output layer will obviously have
257:38 the output from hidden layer number four.
257:40 This is our output from hidden layer four
257:42 that's passed as the first dimension in our output layer,
257:46 and it'll finally have your n target,
257:48 which is set to one over here.
257:51 This is our output.
257:53 Your bias will basically have the current layer's dimension,
257:57 which is n target.
257:58 You're passing that same parameter over here.
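Putting that dimension rule together, here is a sketch of the weight and bias variables for the four hidden layers and the output layer. The layer sizes come from the description above; the variable names are illustrative and build on the initializers sketched earlier.

```python
# Model architecture parameters.
n_neurons_1 = 1024
n_neurons_2 = 512
n_neurons_3 = 256
n_neurons_4 = 128
n_target = 1

# Hidden layer 1: first dimension = number of inputs, second = number of neurons.
W_hidden_1 = tf.Variable(weight_initializer([n_stocks, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))

# For each subsequent layer, the first dimension equals the
# second dimension of the previous layer.
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))

# Output layer: from the last hidden layer down to a single target value.
W_out = tf.Variable(weight_initializer([n_neurons_4, n_target]))
bias_out = tf.Variable(bias_initializer([n_target]))
```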
258:02 Now after you define the required weight
258:04 and the bias variables,
258:06 the architecture of the network has to be specified.
258:09 What you do is combine the placeholders and variables
258:12 into a system of
258:15 sequential matrix multiplications.
258:17 So that's exactly what's happening over here.
258:19 Apart from this, all the hidden layers
258:21 need to be transformed by using the activation function.
258:25 So, activation functions are important
258:28 components of the network
258:30 because they introduce non-linearity to the system.
258:33 This means that high dimensional data
258:35 can be dealt with with the help of the activation functions.
258:39 Obviously, we have very high dimensional data
258:41 when it comes to neural networks.
258:42 We don't have a single dimension
258:44 or we don't have two or three inputs.
258:46 We have thousands and thousands of inputs.
258:49 So, in order for a neural network to process
258:51 that much of high dimensional data,
258:53 we need something known as activation functions.
258:56 That's why we make use of activation functions.
258:58 Now, there are dozens of activation functions,
259:01 and one of the most common ones
259:03 is the rectified linear unit.
259:08 ReLU is nothing but the rectified linear unit,
259:11 which is what we're gonna be using in this model.
259:14 So, after you've applied the transformation function
259:16 to your hidden layer, you need to make sure that
259:18 your output is transposed.
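A sketch of that sequential matrix multiplication with ReLU activations, ending with the transposed output, might look like this (building on the variables sketched above; not necessarily the exact demo code):

```python
# Each hidden layer multiplies the previous layer's output by its weights,
# adds the bias, and applies ReLU for non-linearity.
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))

# Output layer: no activation here, and the result is transposed
# so that it matches the shape of the 1D target placeholder Y.
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))
```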
259:20 This is followed by a very important function known as
259:23 cost function.
259:24 So the cost function of a network
259:26 is used to generate a measure of deviation
259:30 between the network's prediction
259:31 and the actual observed training targets.
259:34 So this is basically your actual output
259:37 minus your model output.
259:39 It basically calculates the error between your actual output
259:43 and your predicted output.
259:45 So, for regression problems, the mean squared error function
259:48 is commonly used.
259:50 I have discussed MSE, mean squared error, before.
259:53 So, basically, we are just measuring
259:55 the deviation over here.
259:57 MSE is nothing but your deviation
259:59 from your actual output.
260:01 That's exactly what we're doing here.
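In code, the mean squared error between the model output and the observed targets is a single line; a sketch in the TensorFlow 1.x API:

```python
# Mean squared error: average squared deviation between the network's
# prediction (out) and the actual observed targets (Y).
mse = tf.reduce_mean(tf.squared_difference(out, Y))
```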
260:03 So after you've computed your error,
260:05 the next step is obviously to update
260:07 your weight and your bias.
260:09 So, we have something known as the optimizers.
260:11 They basically take care of all the necessary computations
260:15 that are needed to adapt the network's weight
260:18 and bias variables during the training phase.
260:21 That's exactly what's happening over here.
260:23 Now the main function of this optimizer is that
260:26 it invokes something known as a gradient.
260:29 Now if you all remember, we discussed gradients before.
260:32 It basically indicates the direction
260:34 in which the weights and the bias
260:36 has to be changed during the training
260:38 in order to minimize the network's cost function
260:42 or the network's error.
260:43 So you need to figure out whether you need to increase
260:45 the weight and the bias in order to decrease the error,
260:49 or is it the other way around?
260:51 You need to understand the relationship
260:53 between your error and your weight variable.
260:55 That's exactly what the optimizer does.
260:57 It invokes the gradient.
260:59 It will give you the direction in which the weights
261:01 and the bias have to be changed.
261:03 So now that you know what an optimizer does,
261:06 in our model, we'll be using
261:08 something known as the AdamOptimizer.
261:11 This is one of the current default optimizers
261:13 in deep learning.
261:15 Adam basically stands for adaptive moment estimation,
261:18 and it can be considered as a combination between
261:21 two very popular optimizers called Adagrad and RMSprop.
261:27 Now let's not get into the depth of the optimizers.
261:30 The main agenda here
261:31 is for you to understand the logic behind deep learning.
261:34 We don't have to go into the functions.
261:35 I know these are predefined functions
261:37 which TensorFlow takes care of.
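The optimizer step itself is just one more line; a sketch using the Adam optimizer from TensorFlow 1.x:

```python
# Adam computes the gradients of the cost with respect to the weights and
# biases, and updates them in the direction that reduces the error.
opt = tf.train.AdamOptimizer().minimize(mse)
```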
261:39 Next we have something known as initializers.
261:41 Now, initializers are used to initialize
261:44 the network's variables before training.
261:46 We already discussed this before.
261:48 I had defined the initializer here again,
261:50 but I've already done it earlier in this session,
261:53 so the initializers are already defined,
261:56 and I just removed that line of code.
261:58 Next step would be fitting the neural network.
262:02 So after we've defined the place holders, the variables,
262:05 variables which are basically weights and bias,
262:08 the initializers, the cost functions,
262:10 and the optimizers of the network,
262:12 the model has to be trained.
262:14 Now, this is usually done by using
262:16 the mini batch training method,
262:18 because we have a very huge data set.
262:21 So it's always best to use the mini batch training method.
262:24 Now what happens during mini batch training
262:26 is random data samples of any batch size
262:30 are drawn from the training data,
262:31 and they are fed into the network.
262:33 So the training data set gets divided into
262:36 N divided by your batch size batches
262:39 that are sequentially fed into the network.
262:42 So, one after the other,
262:43 each of these batches will be fed into the network.
262:46 At this point, the placeholders, which are your X and Y,
262:50 they come into play.
262:51 They store the input and the target data
262:53 and present them to the network as inputs and targets.
262:57 That's the main functionality of placeholders.
263:00 What they do is they store the input and the target data,
263:03 and they provide this to the network
263:05 as inputs and targets.
263:07 That's exactly what your placeholders do.
263:09 So let's say we have a sample data batch X.
263:13 Now this data batch flows through the network
263:15 until it reaches the output layer.
263:18 There the TensorFlow compares the model's predictions
263:21 against the actual observed targets,
263:23 which is stored in Y.
263:25 If you all remember,
263:26 we stored our actual observed targets in Y.
263:29 After this, TensorFlow will conduct
263:31 something known as optimization step,
263:33 and it'll update the network's parameters
263:35 like the weight of the network and the bias.
263:37 So after having updated your weight and the bias,
263:40 the next batch is sampled and the process gets repeated.
263:44 So this procedure will continue
263:45 until all the batches have been presented to the network.
263:49 And one full sweep over all batches
263:51 is known as an epoch.
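Here is a rough sketch of what that mini-batch training loop might look like; the batch size and number of epochs follow the description below, and the session setup and variable names are assumptions on my part.

```python
# Create a TensorFlow session and initialize the network's variables.
net = tf.Session()
net.run(tf.global_variables_initializer())

batch_size = 256
epochs = 10

for e in range(epochs):
    # Feed the training data to the network batch by batch, in sequence.
    for i in range(len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # One optimization step: forward pass, compare against the targets,
        # then update the weights and biases.
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})

    # After every epoch, check the error on the unseen test data.
    mse_test = net.run(mse, feed_dict={X: X_test, Y: y_test})
    print("Epoch", e + 1, "test MSE:", mse_test)
```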
263:53 So I've defined this entire thing over here.
263:55 So we're gonna go through 10 epochs,
263:58 meaning that all the batches
263:59 are going to go through training,
264:01 meaning you're going to input each batch that is X,
264:04 and it'll flow through the network
264:06 until it reaches the output layer.
264:08 There what happens is TensorFlow
264:09 will compare your predictions.
264:11 That is basically what your model predicted
264:13 against the actual observed targets
264:16 which is stored in Y.
264:17 After this, TensorFlow will perform optimization
264:20 wherein it'll update the network parameters
264:23 like your weight and your bias.
264:25 After you update the weight and the bias,
264:27 the next batch will get sampled
264:29 and the process will keep repeating.
264:31 This happens until all the batches are
264:33 implemented in the network.
264:35 So what I just told you was one epoch.
264:37 We're going to repeat this 10 times.
264:39 So the batch size is 256,
264:41 meaning that each batch contains 256 samples.
264:44 So here we're going to assign x and y,
264:46 what I just spoke to you about.
264:48 The mini batch training starts over here.
264:51 So, basically, your first batch
264:52 will start flowing through the network
264:54 until it reaches the output layer.
264:56 After this, TensorFlow will compare your model's prediction.
264:59 This is where predictions happen.
265:01 It'll compare your model's prediction
265:03 to the actual observed targets
265:05 which is stored in y.
265:07 Then TensorFlow will start doing optimization,
265:10 and it'll update the network parameters
265:12 like your weight and your bias.
265:14 So after you update the weight and the biases,
265:16 the next batch will get input into the network,
265:19 and this process will keep repeating.
265:21 This process will repeat 10 times
265:23 because we've defined 10 epochs.
265:26 Now, also during the training,
265:28 we evaluate the network's prediction on the test set,
265:32 which is basically the data the network hasn't learned from.
265:34 This data is set aside, and every fifth batch
265:38 the prediction on it is visualized.
265:40 So in our problem statement,
265:42 what a network is going to do
265:43 is it's going to predict the stock price
265:46 continuously over a time period of T plus one.
265:49 We're feeding it data about a stock price at time T.
265:53 It's going to give us an output of time T plus one.
265:56 Now let me run this code
265:57 and let's see how close our predicted values are
266:00 to the actual values.
266:02 We're going to visualize this entire thing,
266:04 and we've also exported this
266:06 in order to combine it into a video animation.
266:09 I'll show you what the video looks like.
266:12 So now let's look at our visualization.
266:14 We'll look at our output.
266:15 So the orange basically shows our model's prediction.
266:19 So the model quickly learns the shape
266:21 and the location of the time series in the test data
266:25 and showing us an accurate prediction.
266:27 It's pretty close to the actual values.
266:29 Now as I'm explaining this to you,
266:31 each batch is running here.
266:33 We are at epoch two.
266:34 We have 10 epochs to go over here.
266:36 So you can see that the network is actually adapting
266:39 to the basic shape of the time series,
266:41 and it's learning finer patterns in the data.
266:44 You see it keeps learning patterns
266:46 and the prediction is getting closer and closer
266:48 after every epoch.
266:50 So let's just wait till we reach epoch 10
266:52 and we complete the entire process.
266:57 So guys, I think the predictions are pretty close,
266:59 like the pattern and the shape is learned very well
267:02 by our neural network.
267:04 It is actually mimicking the actual time series.
267:07 The only deviation is in the values.
267:10 Apart from that, it's learning the shape
267:12 of the time series data in almost the same way.
267:15 The shape is exactly the same.
267:17 It looks very similar to me.
267:19 Now, also remember that there are a lot of ways
267:21 of improving your result.
267:23 You can change the design of your layers
267:25 or you can change the number of neurons.
267:27 You can choose different initialization functions
267:30 and activation functions.
267:32 You can introduce something known as dropout layers
267:34 which basically help you to get rid of overfitting,
267:38 and there's also something known as early stopping.
267:41 Early stopping helps you understand
267:42 where you must stop your batch training.
267:45 That's also another method that you can implement
267:47 for improving your model.
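As one illustration of the first idea, a dropout layer could be inserted between two hidden layers roughly like this in TensorFlow 1.x; the keep probability is a hyperparameter you would tune, and this snippet is only an example, not part of the demo code.

```python
# Randomly drop a fraction of the activations during training to reduce
# overfitting; keep_prob=0.8 keeps 80% of the activations.
hidden_1_drop = tf.nn.dropout(hidden_1, keep_prob=0.8)
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1_drop, W_hidden_2), bias_hidden_2))
```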
267:49 Now there are also different types of deep learning models
267:52 that you can use for this problem.
267:54 Here we use the feedforward network,
267:56 which basically means that the batches
267:58 will flow from left to right.
268:00 Okay, so our 10 epochs are over.
268:02 Now the final thing that's getting calculated is our error,
268:05 MSE or mean squared error.
268:07 So guys, don't worry about this warning.
268:09 It's just a warning.
268:11 So our mean square error comes down to 0.0029
268:15 which is pretty low because the target is scaled.
268:18 And this means that our accuracy is pretty good.
268:21 So guys, like I mentioned,
268:23 if you want to improve the accuracy of the model,
268:25 you can use different schemes,
268:27 you can use different initialization functions,
268:30 or you can try out different transformation functions.
268:32 You can use something known as dropout technique
268:35 and early stopping in order to make the training phase
268:37 even better.
268:39 So guys, that was the end of our deep learning demo.
268:42 I hope all of you understood the deep learning demo.
268:45 For those of you who are just learning
268:47 deep learning for the first time,
268:48 it might be a little confusing.
268:50 So if you have any doubts regarding the demo,
268:52 let me know in the comment section.
268:53 I'll also leave a couple of links in the description box,
268:56 so that you can understand deep learning
268:58 in a little more depth.
269:00 Now let's look at our final topic for today,
269:02 which is natural language processing.
269:05 Now before we understand what text mining is
269:07 and what natural language processing is,
269:10 we have to understand the need for text mining
269:12 and natural language processing.
269:14 So guys, the number one reason why we need
269:17 text mining and natural language processing
269:19 is because of the amount of data
269:21 that we're generating during this time.
269:23 Like I mentioned earlier,
269:24 there are around 2.5 quintillion bytes of data
269:27 that is created every day,
269:29 and this number is only going to grow.
269:32 With the evolution of communication
269:33 through social media,
269:35 we generate tons and tons of data.
269:37 The numbers are on your screen.
269:39 These numbers are literally for every minute.
269:41 On Instagram, every minute, 1.7 million pictures are posted.
269:46 Okay, 1.7 or more than 1.7 million pictures are posted.
269:50 Similarly, we have tweets.
269:52 We have around 347,000 tweets every minute on Twitter.
269:57 This is actually a lot and lot of data.
270:00 So, every time we're using a phone,
270:01 we're generating way too much data.
270:04 Just watching a video on YouTube
270:05 is generating a lot of data.
270:07 When sending text messages from WhatsApp,
270:10 that is also generating tons and tons of data.
270:12 Now the only problem is not our data generation.
270:16 The problem is that out of all the data
270:18 that we're generating, only 21% of the data
270:21 is structured and well-formatted.
270:23 The remaining of the data is unstructured,
270:25 and the major source of unstructured data include
270:28 text messages from WhatsApp, Facebook likes,
270:32 comments on Instagram, bulk emails
270:34 that we send out ever single day.
270:36 All of this accounts for the unstructured data
270:38 that we have today.
270:39 Now the question here is what can be done
270:42 with so much data.
270:44 Now the data that we generate
270:45 can be used to grow businesses.
270:47 By analyzing and mining the data,
270:50 we can add more value to a business.
270:52 This is exactly what text mining is all about.
270:56 So text mining or text analytics
270:58 is the analysis of data available to us
271:01 in a day-to-day spoken or written language.
271:04 It is amazing that so much of the data we generate
271:07 can actually be used in text mining.
271:10 We have data from Word documents,
271:11 PowerPoints, chat messages, emails.
271:14 All of this is used to add value to a business.
271:17 Now the data that we get from sources
271:19 like social media, IoT,
271:20 they are mainly unstructured,
271:22 and unstructured data cannot be used
271:24 to draw useful insights to grow a business.
271:27 That's exactly why we need text mining.
271:30 Text mining or text analytics
271:32 is the process of deriving meaningful information
271:35 from natural language text.
271:37 So, all the data that we generate through text messages,
271:40 emails, documents, files,
271:42 are written in natural language text.
271:44 And we are going to use text mining
271:46 and natural language processing
271:48 to draw useful insights or patterns from such data.
271:51 Now let's look at a few examples
271:53 to show you how natural language processing
271:55 and text mining is used.
271:57 So now before I move any further,
271:58 I want to compare text mining and NLP.
272:01 A lot of you might be confused
272:03 about what exactly text mining is
272:05 and how is it related to natural language processing.
272:08 A lot of people have also asked me
272:10 why is NLP and text mining
272:12 considered as one and the same
272:13 and are they the same thing.
272:15 So, basically, text mining is a vast field
272:18 that makes use of natural language processing
272:21 to derive high quality information from the text.
272:24 So, basically, text mining is a process,
272:26 and natural language processing is a method
272:29 used to carry out text mining.
272:31 So, in a way, you can say that text mining
272:33 is a vast field which uses NLP
272:36 in order to perform text analysis and text mining.
272:40 So, NLP is a part of text mining.
272:43 Now let's understand what exactly
272:45 natural language processing is.
272:47 Now, natural language processing
272:48 is a component of text mining
272:50 which basically helps a machine in reading the text.
272:54 Obviously, machines don't actually know English or French,
272:57 they interpret data in the form of zeroes and ones.
273:01 So this is where natural language processing comes in.
273:04 NLP is what computers and smart phones
273:06 use to understand our language,
273:08 both spoken and written language.
273:11 Now because we use language to interact with our devices,
273:14 NLP became an integral part of our life.
273:17 NLP uses concepts of computer science
273:19 and artificial intelligence
273:20 to study the data and derive useful information from it.
273:24 Now before we move any further,
273:26 let's look at a few applications of NLP and text mining.
273:30 Now we all spend a lot of time surfing the web.
273:33 Have you ever noticed that
273:35 if you start typing a word on Google,
273:38 you immediately get suggestions like these.
273:40 This feature is also known as autocomplete.
273:43 It'll basically suggest the rest of the word for you.
273:46 And we also have something known as spam detection.
273:49 Here is an example of how Google recognizes
273:52 a misspelling of Netflix
273:53 and shows results for keywords that match your misspelling.
273:57 So, the spam detection is also based
273:59 on the concepts of text mining
274:00 and natural language processing.
274:02 Next we have predictive typing and spell checkers.
274:05 Features like auto correct, email classification
274:08 are all applications of text mining and NLP.
274:12 Now we look at a couple of more applications
274:14 of natural language processing.
274:16 We have something known as sentiment analysis.
274:19 Sentiment analysis is extremely useful
274:21 in social media monitoring,
274:23 because it allows us to gain an overview
274:26 of the wider public opinion behind certain topics.
274:29 So, basically, sentiment analysis
274:31 is used to understand the public's opinion
274:34 or customer's opinion on a certain product
274:36 or on a certain topic.
274:38 Sentiment analysis is actually a very huge part
274:41 of a lot of social media platforms
274:44 like Twitter, Facebook.
274:45 They use sentiment analysis very frequently.
274:48 Then we have something known as chatbot.
274:51 Chatbots are basically the solutions
274:52 for all the consumer frustration,
274:55 regarding customer call assistance.
274:57 So we have companies like Pizza Hut, Uber
274:59 who have started using chatbots
275:01 to provide good customer service.
275:04 Apart from that, there's speech recognition.
275:06 NLP has widely been used in speech recognition.
275:09 We're all aware of Alexa, Siri, Google Assistant,
275:12 and Cortana.
275:13 These are all applications of natural language processing.
275:16 Machine translation is another important application of NLP.
275:20 An example of this is the Google Translator
275:22 that uses NLP to process and translate
275:25 one language to the other.
275:27 Other applications include spell checkers,
275:29 keywords search, information extraction,
275:32 and NLP can be used to get useful information
275:34 from various websites, from Word documents,
275:37 from files, et cetera.
275:39 It can also be used in advertisement matching.
275:42 This basically means a recommendation of ads
275:44 based on your history.
275:46 So now that you have a basic understanding of where
275:48 natural language processing is used
275:50 and what exactly it is,
275:52 let's take a look at some important concepts.
275:55 So, firstly, we're gonna discuss tokenization.
275:58 Now tokenization is the most basic step in text mining.
276:03 Tokenization basically means breaking down data
276:06 into smaller chunks or tokens
276:08 so that they can be easily analyzed.
276:11 Now how tokenization works is
276:13 it works by breaking a complex sentence into words.
276:17 So you're breaking a huge sentence into words.
276:19 You'll understand the importance of each of the words
276:22 with respect to the whole sentence,
276:24 after which we'll produce a description
276:26 of the input sentence.
276:28 So, for example, let's say we have this sentence,
276:31 tokens are simple.
276:33 If we apply tokenization on this sentence,
276:36 what we get is this.
276:38 We're just breaking a sentence into words.
276:40 Then we're understanding the importance
276:42 of each of these words.
276:44 We'll perform NLP process on each of these words
276:47 to understand how important each word
276:49 is in this entire sentence.
276:51 For me, I think tokens and simple are important words;
276:54 are is basically just a stop word.
276:57 We'll be discussing about stop words in our further slides.
277:00 But for now, you need to understand that tokenization
277:02 is a very simple process that involves
277:04 breaking sentences into words.
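As a quick illustration, tokenization with Python's NLTK library might look like this, assuming the punkt tokenizer models have been downloaded:

```python
import nltk
nltk.download('punkt')  # tokenizer models, needed once

from nltk.tokenize import word_tokenize

sentence = "Tokens are simple"
tokens = word_tokenize(sentence)
print(tokens)  # ['Tokens', 'are', 'simple']
```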
277:07 Next, we have something known as stemming.
277:10 Stemming is basically normalizing words
277:12 into its base form or into its root form.
277:15 Take a look at this example.
277:17 We have words like detection,
277:19 detecting, detected, and detections.
277:22 Now we all know that the root word
277:24 for all these words is detect.
277:27 Basically, all these words mean detect.
277:30 So the stemming algorithm works by cutting off the end
277:33 or the beginning of the word
277:35 and taking into account a list of common prefixes
277:38 and suffixes that can be found on any word.
277:42 So guys, stemming can be successful in some cases,
277:45 but not always.
277:46 That is why a lot of people affirm that
277:48 stemming has a lot of limitations.
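A small sketch of stemming with NLTK's PorterStemmer on the example words above:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["detection", "detecting", "detected", "detections"]
print([stemmer.stem(w) for w in words])
# All four are reduced to the same stem, 'detect'.
```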
277:51 So, in order to overcome the limitations of stemming,
277:54 we have something known as lemmatization.
277:56 Now what lemmatization does is
277:58 it takes into consideration the morphological analysis
278:02 of the words.
278:03 To do so, it is necessary to have a detailed dictionary
278:07 which the algorithm can look through to link the form
278:10 back to its lemma.
278:11 So, basically lemmatization
278:13 is also quite similar to stemming.
278:15 It maps different words into one common root.
278:18 Sometimes what happens in stemming is that
278:22 most of the word gets cut off.
278:23 Let's say we wanted to cut detection into detect.
278:27 Sometimes it becomes det or it becomes tect,
278:30 or something like that.
278:31 So because of this, the grammar
278:33 or the importance of the word goes away.
278:36 You don't know what the words mean anymore.
278:39 Due to the indiscriminate cutting of the word,
278:42 sometimes the grammar or the understanding of the word
278:45 is not there anymore.
278:47 So that's why lemmatization was introduced.
278:50 The output of lemmatization is always going to be
278:52 a proper word.
278:54 Okay, it's not going to be something that is half cut
278:56 or anything like that.
278:58 You're going to understand the morphological analysis
279:00 and then only you're going to perform lemmatization.
279:03 An example of a lemmatizer
279:05 is you're going to convert gone, going, and went into go.
279:10 All the three words anyway mean the same thing.
279:12 So you're going to convert it into go.
279:14 We are not removing the first and the last part of the word.
279:18 What we're doing is we're understanding
279:19 the grammar behind the word.
279:21 We're understanding the English
279:23 or the morphological analysis of the word,
279:25 and only then we're going to perform lemmatization.
279:29 That's what lemmatization is all about.
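Here is a small sketch of lemmatization with NLTK's WordNetLemmatizer on the same example; the wordnet corpus needs to be downloaded first, and passing the part of speech as a verb tells the lemmatizer the grammatical role of the word.

```python
import nltk
nltk.download('wordnet')  # the dictionary the lemmatizer looks through

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
for word in ["gone", "going", "went"]:
    # pos='v' tells the lemmatizer to treat the word as a verb.
    print(word, "->", lemmatizer.lemmatize(word, pos='v'))
# All three map back to the lemma 'go'.
```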
279:31 Now stop words are basically a set of
279:34 commonly used words in any language, not just English.
279:38 Now the reason why stop words
279:39 are critical to many applications is that
279:42 if we remove the words that are very commonly used
279:45 in a given language,
279:46 we can finally focus on the important words.
279:49 For example, in the context of a search engine,
279:52 let's say you open up Google
279:54 and you type how to make a strawberry milkshake.
279:57 What the search engine is going to do is
279:59 it's going to find a lot more pages
280:01 that contain the terms how to make,
280:04 rather than pages which contain the recipe
280:07 for your strawberry milkshake.
280:09 That's why you have to disregard these terms.
280:11 The search engine can actually focus
280:13 on the strawberry milkshake recipe,
280:15 instead of looking for pages that have how to and so on.
280:19 So that's why you need to remove these stop words.
280:22 Stop words are how to, begin, gone, various, and, the,
280:27 all of these are stop words.
280:30 They are not necessarily important
280:32 to understand the importance of the sentence.
280:34 So you get rid of these commonly used words,
280:37 so that you can focus on the actual keywords.
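A sketch of stop word removal with NLTK, using the search query example above (the stopwords corpus has to be downloaded once):

```python
import nltk
nltk.download('stopwords')
nltk.download('punkt')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

query = "how to make strawberry milkshake"
stop_words = set(stopwords.words('english'))

# Keep only the words that are not in the stop word list.
keywords = [w for w in word_tokenize(query) if w not in stop_words]
print(keywords)  # ['make', 'strawberry', 'milkshake']
```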
280:40 Another term you need to understand
280:41 is document term matrix.
280:44 A document term matrix is basically a matrix
280:46 with documents designated by rows and words by columns.
280:51 So if your document one has this sentence, this is fun,
280:55 or has these words, this, is, fun,
280:57 then you're going to get one, one, one over here.
281:00 In document two, if you see we have this and we have is,
281:04 but we do not have fun.
281:06 So that's what a document term matrix is.
281:08 It is basically to understand whether your document
281:11 contains each of these words.
281:12 It is a frequency matrix.
281:14 That is what a document term matrix is.
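A document term matrix can be built in a couple of lines with scikit-learn's CountVectorizer; here is a sketch on two tiny example documents (get_feature_names_out assumes a reasonably recent scikit-learn version):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["This is fun", "This is not fun"]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Rows correspond to documents, columns to words; each cell is a word count.
print(vectorizer.get_feature_names_out())  # the column (word) order
print(dtm.toarray())
```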
281:17 Now let's move on and look at
281:19 a natural language processing demo.
281:21 So what we're gonna do is we're gonna perform
281:23 sentiment analysis.
281:25 Now like I said, sentiment analysis
281:27 is one of the most popular applications
281:30 of natural language processing.
281:31 It refers to the process of determining
281:34 whether a given piece of text or a given sentence of text
281:39 is positive or negative.
281:41 So, in some variations, we consider
281:43 a sentence to also be neutral.
281:45 That's a third option.
281:47 And this technique is commonly used to discover
281:49 how people feel about a particular topic
281:52 or what are people's opinion about a particular topic.
281:55 So this is mainly used to analyze the sentiments of users
281:59 in various forms,
282:00 such as in marketing campaigns, in social media,
282:03 in e-commerce websites, and so on.
282:05 So now we'll be performing sentiment analysis
282:07 using Python.
282:09 So we are going to perform natural language processing
282:11 by using the NaiveBayesClassifier.
282:14 That's why we are importing the NaiveBayesClassifier.
282:17 So guys, Python provides a library known
282:19 as natural language toolkit.
282:22 This library contains all the functions that are needed
282:24 to perform natural language processing.
282:26 Also in this library,
282:28 we have a predefined data set called movie reviews.
282:31 What we're gonna do is we're going to download that
282:34 from our NLTK, which is natural language toolkit.
282:37 We're basically going to run our analysis
282:39 on this movie review data set.
282:41 And that's exactly what we're doing over here.
282:43 Now what we're doing is we're defining a function
282:46 in order to extract features.
282:48 So this is our function.
282:49 It's just going to extract all our words.
282:52 Now that we've extracted the data,
282:53 we need to train it,
282:54 so we'll do that by using our movie reviews data set
282:58 that we just downloaded.
282:59 We're going to understand
283:00 the positive words and the negative words.
283:03 So what we're doing here is we're just loading our positive
283:05 and our negative reviews.
283:06 We're loading both of them.
283:08 After that, we'll separate each of these
283:10 into positive features and negative features.
283:12 This is pretty understandable.
283:14 Next, we'll split the data
283:16 into our training and testing set.
283:17 Now this is something that we've been doing
283:19 for all our demos.
283:20 This is also known as data splicing.
283:22 We've also set a threshold factor of 0.8
283:25 which basically means that 80% of your data set
283:28 will belong to your training,
283:29 and 20% will be for your testing.
283:32 You're going to do this even for your positive
283:33 and your negative words.
283:35 After that, you're just extracting the features again,
283:38 and you're just printing
283:39 the number of training data points that you have.
283:41 You're just printing the length of your training features
283:44 and you're printing the length
283:45 of your testing features.
283:46 We can see the output, let's run this program.
284:03 So if you see that we're getting
284:04 the number of training data points as 1,600
284:07 and your number of testing data points are 400,
284:10 there's an 80 to 20% ratio over here.
284:13 After this, we'll be using the NaiveBayesClassifier
284:16 and we'll define the object
284:17 for the NaiveBayesClassifier, which is basically called classifier,
284:20 and we'll train this using our training data set.
284:23 We'll also look at the accuracy of our model.
284:26 The accuracy of our classifier is around 73%,
284:30 which is a really good number.
284:32 Now this classifier object will actually contain
284:35 the most informative words
284:36 that are obtained during analysis.
284:39 These words are basically essential in understanding
284:42 which word is classified as positive
284:44 and which is classified as negative.
284:46 What we're doing here is we're going to review movies.
284:49 We're going to see which movie review is positive
284:52 or which movie review is negative.
284:54 Now this classifier will basically have
284:56 all the informative words that will help us decide
284:59 which is a positive review or a negative review.
285:02 Then we're just printing these 10 most informative words,
285:07 and we have outstanding, insulting,
285:09 vulnerable, ludicrous, uninvolving,
285:12 avoids, fascination, and so on.
285:15 These are the most important words in our text.
285:19 Now what we're gonna do is we're gonna test our model.
285:22 I've randomly given some reviews.
285:24 If you want, let's add another review.
285:27 We'll say
285:29 I loved the
285:33 movie.
285:35 So I've added another review over here.
285:38 Here we're just printing the review,
285:39 and we're checking if this is a positive review
285:41 or a negative review.
285:43 Now let's look at our predictions.
285:45 We'll save this and...
285:54 I forgot to put a comma over here.
285:57 Save it and let's run the file again.
286:05 So these were our randomly written movie reviews.
286:09 The predicted sentiment is positive.
286:11 Our probability score was 0.61.
286:14 It's pretty accurate here.
286:16 This is a dull movie and I would never recommend it,
286:18 is a negative sentiment.
286:21 The cinematography is pretty great,
286:23 that's a positive review.
286:25 The movie is pathetic is obviously a negative review.
286:28 The direction was terrible,
286:29 and the story was all over the place.
286:32 This is also considered as a negative review.
286:34 Similarly, I love the movie is what I just inputted,
286:37 and I've got a positive review on that.
286:39 So our classifier actually works really well.
286:43 It's giving us good accuracy
286:44 and it's classifying the sentiments very accurately.
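To tie the whole demo together, here is a condensed sketch of the pipeline described above, using NLTK's movie_reviews corpus and NaiveBayesClassifier. The exact code shown in the video may differ, so treat the function and variable names as illustrative.

```python
import nltk
from nltk.corpus import movie_reviews
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy as nltk_accuracy

nltk.download('movie_reviews')

def extract_features(words):
    # Represent a review as a dict of word -> True (a simple bag of words).
    return {word: True for word in words}

# Load the positive and negative reviews and extract features from each.
pos_features = [(extract_features(movie_reviews.words(fileids=[f])), 'Positive')
                for f in movie_reviews.fileids('pos')]
neg_features = [(extract_features(movie_reviews.words(fileids=[f])), 'Negative')
                for f in movie_reviews.fileids('neg')]

# Data splicing: 80% for training, 20% for testing.
threshold = 0.8
n_pos = int(threshold * len(pos_features))
n_neg = int(threshold * len(neg_features))
train_features = pos_features[:n_pos] + neg_features[:n_neg]
test_features = pos_features[n_pos:] + neg_features[n_neg:]

# Train the Naive Bayes classifier and check its accuracy on the test set.
classifier = NaiveBayesClassifier.train(train_features)
print("Accuracy:", nltk_accuracy(classifier, test_features))
classifier.show_most_informative_features(10)

# Classify a new review and look at the probability of the predicted label.
review = "I loved the movie"
probs = classifier.prob_classify(extract_features(review.split()))
print(probs.max(), round(probs.prob(probs.max()), 2))
```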
286:48 So, guys, this was all about sentiment analysis.
286:50 Here we basically saw if a movie review
286:52 was positive or negative.
286:54 So guys, that was all for our NLP demo.
286:57 I hope all of you understood this.
286:58 It was a simple sentiment analysis
287:01 that we saw through Python.
287:02 So again, if you have doubts,
287:03 please leave them in the comment section,
287:05 and I'll help you with all of the queries.
287:08 So guys, that was our last module,
287:10 which was on natural language processing.
287:12 Now before I end today's session,
287:14 I would like to discuss with you
287:16 the machine learning engineers program
287:18 that we have at Edureka.
287:20 So we all are aware of
287:22 the demand of the machine learning engineer.
287:24 So, at Edureka, we have a master's program
287:28 that involves 200-plus hours of interactive training.
287:32 So the machine learning master's program at Edureka
287:34 has around nine modules and 200-plus hours
287:38 of interactive learning.
287:40 So let me tell you the curriculum
287:42 that this course provides.
287:43 So your first module will basically cover
287:45 Python programming.
287:46 It'll have all the basics and all your data visualization,
287:50 your GUI programming, your functions,
287:53 and your object-oriented concepts.
287:55 The second module will cover machine learning with Python.
287:59 So supervised algorithms
288:00 and unsupervised algorithms
288:02 along with statistics and time series
288:04 in Python will be covered in your second module.
288:06 Your third module will have graphical modeling.
288:09 This is quite important when it comes to machine learning.
288:12 Here you'll be taught about decision making,
288:14 graph theory, inference, and Bayesian and Markov's network,
288:18 and module number four will cover
288:20 reinforcement learning in depth.
288:22 Here you'll understand dynamic programming,
288:24 temporal difference, Bellman equations,
288:27 all the concepts of reinforcement learning in depth.
288:30 All the detailed and advanced concepts
288:32 of reinforcement learning.
288:34 So, module number five will cover NLP with Python.
288:37 You'll understand tokenization, stemming, lemmatization,
288:40 syntax, tree parsing, and so on.
288:43 And module number six will cover
288:45 artificial intelligence and deep learning with TensorFlow.
288:48 This module is a very advanced version
288:51 of all your machine learning
288:52 and reinforcement learning that you'll learn.
288:54 Deep learning will be in depth over here.
288:56 You'll be using TensorFlow throughout.
288:58 They'll cover all the concepts that we saw, CNN, RNN.
289:02 It'll cover the various types of neural networks,
289:05 like convolutional neural networks,
289:07 recurrent neural networks,
289:09 long short-term memory neural networks,
289:12 and auto encoders and so on.
289:14 The seventh module is all about PySpark.
289:17 It'll show you how Spark SQL works
289:19 and all the features and functions of Spark ML library.
289:23 And the last module will finally cover
289:24 Python Spark using PySpark.
289:27 Apart from these seven modules,
289:28 you'll also get two free self-paced courses.
289:32 Let's actually take a look at the course.
289:35 So this is your machine learning
289:36 engineer master's program.
289:38 You'll have nine courses, 200-plus hours
289:41 of interactive learning.
289:45 This is the whole course curriculum,
289:46 which we just discussed.
289:48 Here there are seven modules.
289:49 Apart from these seven modules,
289:50 you'll be given two free self-paced courses,
289:53 which I'll discuss shortly.
289:55 You can also get to know the average annual salary
289:57 for a machine learning engineer,
289:59 which is over $134,000.
290:03 And there are also a lot of job openings
290:05 in the field of machine learning AI and data science.
290:09 So the job titles that you might get
290:10 are machine learning engineer, AI engineer,
290:13 data scientist, data and analytics manager,
290:16 NLP engineer, and data engineer.
290:18 So this is basically the curriculum.
290:21 Your first will be Python programming certification,
290:23 machine learning certification using Python,
290:26 graphical modeling, reinforcement learning,
290:29 natural language processing,
290:30 AI and deep learning with TensorFlow.
290:33 Python Spark certification training using PySpark.
290:36 If you want to learn more about each of these modules,
290:38 you can just go and view the curriculum.
290:41 They'll explain each and every concept
290:43 that they'll be showing in this module.
290:45 All of this is going to be covered here.
290:48 This is just the first module.
290:52 Now at the end of this project,
290:53 you will be given a verified certificate of completion
290:56 with your name on it,
290:58 and these are the free elective courses
291:00 that you're going to get.
291:01 One is your Python scripting certification training.
291:05 And the other is your Python Statistics
291:07 for Data Science Course.
291:09 Both of these courses explain Python in depth.
291:12 The second course on statistics
291:13 will explain all the concepts
291:15 of statistics: probability, descriptive statistics,
291:18 inferential statistics,
291:21 time series, testing data,
291:23 data clustering, regression modeling, and so on.
291:26 So each of the modules is designed in such a way that
291:28 you'll have a practical demo
291:30 or a practical implementation
291:32 after each and every module.
291:34 So all the concepts that I theoretically taught you
291:36 will be explained through practical demos.
291:39 This way you'll get a good understanding of
291:41 the entire machine learning and AI concepts.
291:45 So, if any of you are interested
291:46 in enrolling for this program
291:48 or if you want to learn more about
291:50 the machine learning course offered by Edureka,
291:52 please leave your email IDs in the comment section,
291:55 and we'll get back to you
291:56 with all the details of the course.
291:58 So guys, with this, we come to the end
292:00 of this AI full course session.
292:03 I hope all of you have understood the basic concepts
292:05 and the idea behind AI machine learning, deep learning,
292:09 and natural language processing.
292:10 So if you still have doubts
292:12 regarding any of these topics,
292:13 mention them in the comment section,
292:15 and I'll try to answer all your queries.
292:17 So guys, thank you so much for joining me in this session.
292:20 Have a great day.
292:21 I hope you have enjoyed listening to this video.
292:23 Please be kind enough to like it,
292:25 and you can comment any of your doubts and queries,
292:29 and we will reply to them at the earliest.
292:31 Do look out for more videos in our playlist
292:34 and subscribe to Edureka channel to learn more.
292:37 Happy learning.