This content provides a comprehensive overview of machine learning, its relationship to artificial intelligence and deep learning, its types (supervised, unsupervised, reinforcement), algorithms, and practical applications, along with an introduction to essential tools like Jupyter Notebook and statistical concepts. It aims to equip learners with the foundational knowledge and skills required for a career in machine learning.
I'm sure you all agree that machine learning is one
of the hottest trends in today's market. Gartner predicts
that by 2022 at least 40%
of new application development projects in the market
will require machine learning co-developers
on their team.
It's expected that these projects will generate revenue
of around 3.9 trillion dollars.
Isn't that huge? So, looking at the huge
upcoming demand for machine learning around the world,
we at Edureka have come up with
and designed a well-structured machine learning full course
for you guys.
But before we actually drill down over there,
let me just introduce myself.
Hello all I am Atul from Edureka.
And today I'll be guiding you
through this entire machine learning course.
Well, this course has been designed in a way
that you get the most out of it.
So we'll slowly and gradually start
with the beginner level and then move towards the advanced topics.
So without delaying any further,
let's start with the agenda of today's session.
The machine learning course has been segregated
into six different modules. We'll start our first module
with an introduction to machine learning. Here
we'll discuss things
like what exactly machine learning is,
how it differs from artificial intelligence and deep learning,
what its various types and applications are,
and finally we'll end the first module
with a basic demo in Python.
Okay, our second module focuses on statistics
and probability. Here we'll cover things
like descriptive statistics, inferential statistics, probability
theory and so on.
Our third module is supervised learning.
Well, supervised learning is one type of machine learning
which focuses mainly
on regression and classification types of problems.
It deals with labeled data sets, and the algorithms
which are a part of it are linear regression,
logistic regression, Naive Bayes, random forest, decision tree
and so on.
Our fourth module is on unsupervised learning.
Well, this module focuses mainly on dealing
with unlabeled data sets,
and the algorithms which are a part
of it are the k-means algorithm
and the Apriori algorithm. As a part of the fifth module
we have reinforcement learning. Here
we are going to discuss reinforcement learning
in depth and also
the Q-learning algorithm. Finally, in the end,
it's all about making you industry ready.
Okay.
So here we are going to discuss three different projects
which are based on supervised learning,
unsupervised learning
and reinforcement learning. Finally, in the end,
I'll tell you about some of the skills
that you need to become a machine learning engineer,
okay, and I'll also discuss some
of the important questions
that are asked in a machine learning interview. Fine,
with this we come to the end of this agenda.
Before you move ahead,
don't forget to subscribe to Edureka and press
the bell icon to never miss any update from us.
Hello everyone.
This is Atul from Edureka,
and welcome to today's session on what is machine learning.
As you know,
we are living in a world of humans
and machines. Humans have been evolving
and learning from past experience for millions
of years. On the other hand, the era of machines
and robots has just begun. In today's world,
these machines, or robots,
need to be programmed
before they actually follow your instructions.
But what if machines started to learn
on their own? This is
where machine learning comes
into the picture. Machine learning is at the core
of many futuristic technological advancements in our world.
And today you can see various examples
or implementations of machine learning around us,
such as Tesla's self-driving car, Apple's Siri, the Sophia
AI robot and many more.
So what exactly is machine learning?
Well, machine learning is a subfield
of artificial intelligence
that focuses on the design of systems
that can learn from, and make decisions
and predictions based on, experience,
which is data in the case of machines. Machine learning
enables computers to act
and make data-driven decisions rather than
being explicitly programmed
to carry out a certain task. These programs
are designed to learn
and improve over time
when exposed to new data.
Let's move on and discuss one
of the biggest confusions of people in the world.
They think that all three of them,
AI, machine learning and deep learning, are the same.
Well, they are wrong.
Let me clarify things
for you. Artificial intelligence is a broader concept
of machines being able to carry out tasks in a smarter way.
It covers anything which enables a computer to behave
like a human. Think of the famous Turing test to determine
whether a computer is capable of thinking
like a human being or not.
If you are talking to Siri on your phone
and you get an answer, you're already very close to it.
So this was about artificial intelligence. Now coming
to the machine learning part:
as I already said, machine learning is a subset
or a current application of AI. It is based on the idea
that we should be able to give machines access
to data and let them learn for themselves.
It's a subset of artificial intelligence
that deals
with the extraction of patterns from data sets.
This means that the machine can not only find the rules
for optimal behavior,
but can also adapt to changes in the world. Many
of the algorithms involved have been known
for decades, centuries even; thanks to advances
in computer science and parallel computing,
they can now scale up to massive data volumes.
So this was about the machine learning part. Now coming over
to deep learning: deep learning is a subset of machine learning
where similar machine learning
algorithms are used to train deep neural networks,
so as to achieve better accuracy in those cases
where the former was not performing up to the mark, right?
I hope now you understand that machine learning, AI
and deep learning are all three different.
Okay, moving on ahead.
Let's see in general how machine learning works.
One of the approaches is
where the machine learning algorithm is trained
using a labeled or unlabeled training data
set to produce a model.
New input data is introduced to the machine learning algorithm,
and it makes predictions based on the model.
The prediction is evaluated for accuracy,
and if the accuracy is acceptable, the machine
learning algorithm is deployed.
If the accuracy is not acceptable,
the machine learning algorithm is trained again
and again with an augmented training data set.
This was just a high-level example,
as there are many more factors and other steps involved in it.
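Just to make that loop concrete, here is a minimal sketch of it in Python. The data set and the model chosen here are only illustrative, and scikit-learn is assumed to be installed; it is not the exact code used later in the course.

```python
# Minimal sketch of the train -> predict -> evaluate -> (re)train loop described above.
# scikit-learn is assumed to be installed; the data set and model are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                                  # labeled training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)     # training produces a model
predictions = model.predict(X_test)                                # new input data -> predictions
print("accuracy:", accuracy_score(y_test, predictions))            # evaluate; retrain if too low
```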
Now, let's move on and subcategorize machine
learning into three different types: supervised learning,
unsupervised learning and reinforcement
learning. Let's see what each of them is, how they work,
and how each
of them is used in the fields of banking, healthcare, retail
and other domains.
Don't worry.
I'll make sure
that I use enough examples and implementations of all three
of them to give you a proper understanding.
So starting with supervised learning.
What is it?
Let's see a mathematical definition
of supervised learning. Supervised learning is
where you have input variables X and an output variable Y,
and you use an algorithm to learn the mapping function
from the input to the output,
that is, Y = f(X).
The goal is to approximate the mapping function
so well that whenever you have new input data
X, you can predict the output variable,
that is, Y, for that data.
In case this was confusing for you,
let me simplify the definition of supervised learning.
We can rephrase the understanding
of the mathematical definition as a machine learning method
where each instance of a training data set is composed
of different input attributes
and an expected output. The input attributes
of a training data set can be any kind of data: they can be
the pixels of an image,
the value of a database row,
or even an audio frequency histogram. For each input instance,
an expected output value is
associated; that value can be discrete, representing a category,
or a real, continuous value. In either case,
the algorithm learns the input pattern
that generates the expected output.
Once the algorithm is trained,
it can be used to predict the correct output
of a never-seen input.
You can see an image on your screen.
In this image you can
see that we are feeding raw input, an image of an apple,
to the algorithm. As a part of the algorithm,
we have a supervisor who keeps on correcting
the machine, or who keeps on training the machine.
It keeps on telling it that yes, it is an apple,
or no, it is not an apple, and things like that.
So this process keeps
on repeating until we get a final trained model.
Once the model is ready,
it can easily predict the correct output
of a never-seen input. In this slide
you can see
that we are giving an image of a green apple to the machine,
and the machine can easily identify it, saying yes,
it is an apple, and giving the correct result.
Let me make things clearer to you.
Let's discuss another example.
In this slide,
the image shows an example
of a supervised learning process used to produce a model
which is capable of recognizing the ducks in an image.
The training data set is composed of labeled pictures
of ducks and non-ducks.
The result of the supervised learning process is
a predictive model
which is capable of associating a label, duck
or not duck, to a new image presented to the model.
Now once trained,
the resulting predictive model can be deployed
to a production environment,
a mobile app, for example. Once deployed, it is ready to recognize
new pictures.
Now you might be wondering why this category
of machine learning is named supervised learning.
Well, it is called supervised learning
because the process of an algorithm learning
from the training data set can be thought
of as a teacher supervising the learning process.
We know the correct answers;
the algorithm iteratively makes predictions
on the training data
and is corrected by the teacher. The learning stops
when the algorithm achieves an acceptable level of performance.
Now, let's move on and see some
of the popular supervised learning algorithms.
We have linear regression, random forest
and support vector machines.
These are just for your information;
we will discuss these algorithms
in our next video.
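Just for a feel of how these algorithms are used in code, here is a tiny, purely illustrative sketch (scikit-learn assumed; the toy data is made up) showing that all three share the same fit-and-predict pattern:

```python
# Illustrative only: the three algorithms named above, used through the common
# fit/predict interface of scikit-learn. The toy data below is made up.
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [2, 2], [3, 3]]       # input attributes
y_class = [0, 0, 1, 1]                     # discrete labels (classification)
y_value = [0.0, 1.0, 2.0, 3.0]             # continuous targets (regression)

print(RandomForestClassifier(random_state=0).fit(X, y_class).predict([[1.5, 1.5]]))
print(SVC().fit(X, y_class).predict([[1.5, 1.5]]))
print(LinearRegression().fit(X, y_value).predict([[1.5, 1.5]]))
```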
Now, let's see some of the popular use cases
of supervised learning.
So we have Cortana or any other speech
assistant in your mobile phone: it trains using your voice,
and once trained it starts working based on that training.
This is an application of supervised learning. Suppose
you say, OK Google, call Sam,
or you say, Hey Siri, call Sam; you get an answer to it,
an action is performed,
and automatically a call goes to Sam.
So these are just examples
of supervised learning. Next comes the weather app.
Based on some prior knowledge,
like when it is sunny the temperature is higher,
and when it is cloudy the humidity is higher, it
predicts the weather parameters for a given time.
So this is also an example of supervised learning,
as we are feeding the data to the machine and telling it
that whenever it is sunny,
the temperature should be higher, and whenever it is cloudy,
the humidity should be higher.
So it's an example of supervised learning.
Another example is biometric attendance,
where you train the machine, and after a couple of inputs
of your biometric identity, be it your thumb, your iris
or anything else,
once trained the machine can validate your future input
and can identify you. Next comes the banking sector.
In the banking sector,
supervised learning is used to predict the creditworthiness
of a credit card holder
by building a machine learning model that looks
for faulty attributes, by providing it
with data on delinquent
and non-delinquent customers.
Next comes the healthcare sector. In the healthcare sector,
it is used to predict patient readmission rates
by building a regression model,
by providing data
on the patients' treatment administration and readmissions
to show the variables
that best correlate with readmission.
Next comes the retail sector. In the retail sector,
it is used to analyze the products
that a customer buys together.
It does this by building a supervised model
to identify frequent itemsets
and association rules from the transactional data.
Now, let's learn about the next category
of machine learning, the unsupervised part. Mathematically,
unsupervised learning is
where you only
have input data X and no corresponding output variable.
The goal of unsupervised learning is to model
the underlying structure
or distribution of the data
in order to learn more about the data.
So let me rephrase this for you in simple terms:
in the unsupervised learning approach, the data instances
of a training data set do not have
an expected output associated
with them. Instead, the unsupervised
learning algorithm detects patterns based
on the innate characteristics
of the input data. An example of a machine learning task
that applies unsupervised learning is clustering.
In this task, similar data instances are grouped together
in order to identify clusters of data. In this slide,
you can see that initially we have different varieties
of fruits as input.
Now this set of fruits is given as input X to the model.
Once the model is trained using an
unsupervised learning algorithm,
the model will create clusters on the basis of its training.
It will group the similar fruits and make clusters of them.
Let me make things clearer to you.
Let's take another example.
In this slide, the image below shows an example
of an unsupervised learning process. The algorithm processes
an unlabeled training data set,
and based on the characteristics,
it groups the pictures
into three different clusters of data. Despite the ability
to group similar data into clusters,
the algorithm is not capable of adding labels to the groups.
The algorithm only knows which data instances are similar,
but it cannot identify the meaning of these groups.
So now you might be wondering why this category
of machine learning is named unsupervised learning.
Well, it is called
unsupervised learning because, unlike supervised learning,
here there are no correct answers
and there is no teacher. Algorithms are left
on their own to discover
and present the interesting structure in the data.
Let's move on and see some
of the popular unsupervised learning algorithms.
Here we have k-means, the Apriori algorithm
and hierarchical clustering.
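To give you a feel for one of these, here is a small k-means sketch (scikit-learn assumed; the 2-D points are invented just to show unlabeled data being grouped into clusters):

```python
# A minimal k-means clustering sketch; the points are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [0, 2],       # one natural group of points
              [10, 2], [10, 4], [11, 0]])   # another natural group of points

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                       # cluster index assigned to each point
print(kmeans.predict([[0, 0], [12, 3]]))    # assign new, never-seen points to clusters
```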
Now let's move on and see some of the examples
of unsupervised learning. Suppose a friend invites you to his party
where you meet total strangers.
Now, you will classify them using unsupervised learning,
as you don't have any prior knowledge about them,
and this classification can be done on the basis
of gender, age group, dressing, education qualification,
or whatever way you might like. Now, why
is this learning different from supervised learning?
Since you didn't use any past or prior knowledge
about the people, you kept on classifying them on the go
as they kept on coming.
Yeah, this category of people belongs to this group,
that category of people belongs to that group, and so on.
Okay, let's see one more example.
Let's suppose you have never seen a football match before,
and by chance you watch a video on the internet.
Now, you can easily classify the players on the basis
of different criteria,
like players wearing the same kind of jersey are
in one class, players wearing a different kind
of jersey are in a different class,
or you can classify them on the basis
of their playing style, like this guy is an attacker,
so he's in one class,
that guy is a defender, so he's in another class,
or you can classify them
whatever way you observe the things.
So this was also an example of unsupervised learning.
Let's move on and see
how unsupervised learning is used in the sectors
of banking, healthcare and retail.
So starting with the banking sector:
in the banking sector it is used to segment customers
by behavioral characteristics, by surveying prospects
and customers to develop multiple segments
using clustering. In the healthcare sector,
it is used to categorize MRI data as normal or abnormal
images. It uses deep learning techniques to build a model
that learns from different features of images to recognize
different patterns.
Next is the retail sector. In the retail sector,
it is used to recommend products to customers
based on their past purchases.
It does this by building a collaborative filtering model
based on their past purchases.
I assume you guys
now have a proper idea of what unsupervised learning means.
If you have even the slightest doubt,
don't hesitate to add your doubt to the comment section.
So let's discuss the third
and the last type of machine learning,
that is, reinforcement learning.
So what is reinforcement learning?
Well, reinforcement learning
is a type of machine learning algorithm
which allows software agents
and machines to automatically determine the ideal behavior
within a specific context to maximize their performance.
Reinforcement learning is about the interaction
between two elements,
the environment and the learning agent.
The learning agent leverages two mechanisms, namely exploration
and exploitation. When the learning agent acts on a trial-
and-error basis,
it is termed exploration,
and when it acts based on the knowledge gained
from the environment,
it is referred to as exploitation.
Now, this environment rewards the agent for correct actions,
which is the reinforcement signal. Leveraging the rewards
obtained, the agent
improves its environment knowledge to select
the next action. In this image,
you can see that the machine is confused
whether it is an apple or it's not an apple.
Then the machine is trained using reinforcement learning.
If it makes a correct decision,
it gets reward points for it,
and in case of a wrong one it gets a penalty.
Once the training is done,
now
the machine can easily identify which one of them is an apple.
Let's see an example here.
We can see that we have an agent
who has to judge from the environment to find out
which of the two is a duck. The first task
it does is observe the environment. Next,
it selects some action using some policy.
It seems that the machine has made a wrong decision
by choosing a bunny as a duck,
so the machine will get a penalty for it,
for example minus 50 points for a wrong answer. Now
the machine will update its policy,
and this will continue
till the machine gets an optimal policy.
From the next time, the machine will know that a bunny is not a duck.
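To connect this to the Q-learning algorithm mentioned in the agenda, here is a toy sketch of the reward-and-penalty update the agent performs; the states, actions and reward values are made up purely for illustration:

```python
# Toy Q-learning update: the agent's table of knowledge is nudged by rewards and penalties.
# States, actions and the reward numbers below are invented for illustration.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))        # the agent's knowledge of the environment
alpha, gamma = 0.1, 0.9                    # learning rate and discount factor

def update(state, action, reward, next_state):
    # Move Q(state, action) toward the reward plus the discounted best future value.
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

update(state=0, action=1, reward=-50, next_state=1)   # a penalty for a wrong choice
update(state=0, action=0, reward=+10, next_state=2)   # a reward for a correct choice
print(Q)
```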
Let's see some of the use cases of reinforcement learning,
but before that, let's see
how Pavlov trained his dog using reinforcement learning,
or how he applied
the reinforcement method to train his dog.
Pavlov divided the training
into four stages. Initially, Pavlov gave meat to his dog,
and in response to the meat the dog started salivating. Next,
what he did was create a sound
with a bell; to this the dog did not respond at all.
In the third part, he tried to condition the dog
by ringing the bell
and then giving him the food. Seeing the food,
the dog started salivating. Eventually, a situation came
when the dog started salivating just after hearing the bell, even
if the food was not given to him, as the dog was reinforced
to expect that whenever the master rang the bell, he
would get the food.
Now let's move on and see
how reinforcement learning is applied in the fields
of banking, healthcare and retail.
So starting with the banking sector:
in the banking sector, reinforcement learning is used to create
a "next best offer" model
for a call center by building a predictive model
that learns over time
as users accept or reject offers made by the sales staff. Now,
in the healthcare sector it is used to allocate scarce
resources to handle different types of ER
cases by building a Markov decision process
that learns treatment strategies for each type of ER case. Next,
and last, comes the retail sector.
So let's see
how reinforcement learning is applied to the retail sector.
In the retail sector,
it can be used to reduce excess stock
with dynamic pricing, by building a dynamic pricing model
that adjusts the price based
on customer response to the offers.
I hope by now you have attained some understanding of
what machine learning is and you are ready to move ahead.
Welcome to today's topic of discussion on AI
versus machine learning versus deep learning.
These are terms which have confused a lot
of people, and if you too are one among them,
let me resolve it for you.
Well, artificial intelligence is a broader umbrella
under which machine learning
and deep learning come. You can also see in the diagram
that even deep learning
is a subset of machine learning, so you can say
that deep learning is a subset of machine learning,
which in turn is a subset of AI.
So let's move on and understand
how exactly they differ from each other.
So let's start with artificial intelligence.
The term artificial intelligence
was first coined in the year 1956.
The concept is pretty old,
but it has gained its popularity recently.
But why? Well,
the reason is that earlier we had a very small amount of data;
the data we had was not enough to predict the correct result.
But now there's a tremendous increase
in the amount of data. Statistics
suggest that by 2020 the accumulated volume
of data will increase
from 4.4 zettabytes to roughly around 44 zettabytes,
or 44 trillion gigabytes,
of data. Along with such an enormous amount of data,
now we have more advanced algorithms
and high-end computing power and storage
that can deal with such large amounts of data. As a result,
it is expected
that 70% of enterprises will implement AI
over the next 12 months,
which is up from 40 percent in 2016 and 51 percent in 2017.
Just for your understanding,
what is AI? Well,
it's nothing but a technique
that enables machines to act like humans
by replicating their behavior and nature. With AI,
it is possible
for machines to learn from experience.
The machines adjust their responses based
on new input, thereby
performing human-like tasks. Artificial intelligence can be
trained to accomplish
specific tasks by processing large amounts of data
and recognizing patterns in them.
You can consider
that building artificial intelligence is like building
a church: the first church took generations to finish,
so most of the workers who were working on it never saw
the final outcome. Those working
on it took pride in their craft, building bricks
and chiseling stone
that was going to be placed into the great structure.
So as AI researchers,
we should think of ourselves as humble brick makers whose job
is just to study
how to build components, for example planners
or learning algorithms, or anything
that someday, someone, somewhere will integrate
into intelligent systems. Some of the examples
of artificial intelligence from our day-to-day life
are Apple's Siri, chess-playing computers, Tesla's self-driving car
and many more. These examples are based on deep learning
and natural language processing.
Well, this was about what AI is and how it gained its hype.
So moving on ahead,
let's discuss machine learning and see what it is
and why it was even introduced. Well,
machine learning came into existence in the late 80s
and the early 90s.
But what were the issues people faced
which made machine learning come into existence? Let
us discuss them one by one. In the field of statistics,
the problem was
how to efficiently train large complex models. In the fields
of computer science and artificial intelligence,
the problem was how to train more robust versions
of AI systems, while in the case of neuroscience,
the problem faced by researchers was
how to design operational models of the brain.
So these were some of the issues
which had the largest influence and led to the existence
of machine learning.
Now, machine learning shifted its focus
from the symbolic approaches
it had inherited from AI and moved
towards the methods and models
it had borrowed from statistics and probability theory.
So let's proceed and see
what exactly machine learning is.
Well, machine learning is a subset of AI
which enables the computer to act
and make data-driven decisions to carry out a certain task.
These programs or algorithms are designed in a way
that they can learn and improve over time
when exposed to new data.
Let's see an example of machine learning.
Let's say you want to create a system
which tells the expected weight of a person based on their height.
The first thing you do is collect the data.
Let's say this is
how your data looks. Now, each point
on the graph represents one data point. To start
with, we can draw a simple line to predict the weight based
on the height, for example a simple line W = H - 100,
where W is weight in kg and H is height
in centimeters. This line can help us to make predictions.
Our main goal is to reduce the difference
between the estimated value and the actual value.
So in order to achieve it,
we try to draw a straight line that fits through all
these different points and minimizes the error.
So our main goal is to minimize the error
and make it as small as possible. Decreasing the error,
or the difference between the actual value and the estimated
value, increases the performance of the model. Further,
the more data points
we collect, the better
our model will become. We can also improve our model
by adding more variables
and creating different prediction lines for them.
Once the line is created,
then the next time we feed new data,
for example the height of a person, to the model,
it will easily make the prediction for you and tell you
what the predicted weight could be.
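As a rough sketch of this example in code (the height and weight numbers below are invented, and scikit-learn is assumed), fitting the line and predicting for a new height looks like this:

```python
# Sketch of the height -> weight example; the data values are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

height_cm = np.array([[150], [160], [170], [180], [190]])
weight_kg = np.array([52, 58, 69, 79, 91])

model = LinearRegression().fit(height_cm, weight_kg)   # fits the line that minimizes the error
print(model.coef_, model.intercept_)                    # learned slope and intercept
print(model.predict([[175]]))                           # predicted weight for a new height
```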
I hope you got a clear understanding
of machine learning.
So moving on ahead,
let's learn about deep learning. Now, what is deep learning?
You can consider a deep learning model as a rocket engine,
and its fuel is the huge amount of data
that we feed to these algorithms. The concept
of deep learning is not new,
but recently its hype has increased,
and deep learning is getting more attention.
This field is a particular kind of machine learning
that is inspired by
the functionality of our brain cells, called neurons,
which led to the concept of the artificial neural network.
It simply takes
the data connections between all the artificial neurons
and adjusts them according to the data pattern.
More neurons are added
as the size of the data gets larger. It automatically performs
feature learning at multiple levels of abstraction,
thereby allowing a system
to learn complex function mappings without depending
on any specific algorithm.
You know what, no one actually knows what happens
inside a neural network and why it works so well,
so currently you can call it a black box.
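Even though we treat it as a black box, training a small neural network is only a few lines. Here is a minimal sketch using scikit-learn's MLPClassifier as a stand-in for a deeper network (the data set and layer sizes are just illustrative choices):

```python
# A small feed-forward neural network, standing in for a deep network; purely illustrative.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)                      # 8x8 pixel images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
net.fit(X_train, y_train)                                # the connections adjust to the data pattern
print("test accuracy:", net.score(X_test, y_test))
```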
Let us discuss some examples of deep learning
and understand it in a better way.
Let me start with a simple example
and explain how things work at a conceptual level.
Let us try and understand
how you would recognize a square among other shapes.
The first thing you do is check
whether there are four lines associated with the figure
or not: a simple concept, right?
If yes, we further check
if they are connected and closed; again, if yes,
we finally check whether they are perpendicular
and all the sides are equal, correct?
If everything is fulfilled,
yes, it is a square.
Well, it is nothing but a nested hierarchy of concepts.
What we did here: we took a complex task
of identifying a square
in this case and broke it into simpler tasks.
Now, deep learning also does the same thing,
but at a larger scale.
Let's take the example of a machine which recognizes
animals. The task of the machine is to recognize
whether a given image is of a cat or a dog.
What if we were asked to resolve the same issue using the concept
of machine learning? What would we do first?
We would define features such as
checking whether the animal has whiskers or not, checking
whether the animal has pointed ears
or not, or whether its tail is straight or curved. In short,
we would define the facial features and let
the system identify which features are more important
in classifying a particular animal. Now,
when it comes to deep learning, it takes this one step ahead:
deep learning automatically finds out the features
which are most important for classification, compared
to machine learning,
where we had to manually give out those features. By now,
I guess you have understood that AI is the bigger picture,
and machine learning and deep learning are its parts.
So let's move on
and focus our discussion on machine learning
and deep learning. The easiest way to understand the difference
between machine learning and deep learning is to know
that deep learning is machine learning; more specifically,
it is the next evolution of machine learning.
Let's take a few important parameters
and compare machine learning with deep learning.
So starting with data dependencies:
the most important difference between deep learning
and machine learning is their performance as the volume
of data grows. From the below graph,
you can see
that when the size of the data is small, a deep learning algorithm
doesn't perform that well.
But why? Well,
this is because a deep learning algorithm needs
a large amount of data to understand it perfectly;
on the other hand, a machine learning algorithm can easily
work with a smaller data set.
Next come the hardware dependencies. Deep learning
algorithms are heavily dependent on high-end machines,
while machine learning algorithms can work
on low-end machines as well.
This is because the requirements
of deep learning algorithms include GPUs,
which are an integral part
of their working. Deep learning algorithms require GPUs
as they do a large
number of matrix multiplication operations,
and these operations
can only be efficiently optimized using a GPU,
as it is built for this purpose.
Our third parameter
will be feature engineering. Well, feature engineering is the process
of putting domain knowledge into creating features to reduce the complexity
of the data
and make patterns more visible to learning algorithms.
This process is difficult and expensive in terms of time
and expertise. In the case of machine learning,
most of the features need to be identified by an expert
and then hand-coded as per the domain
and the data type.
For example, the features
can be pixel values, shapes, textures, position, orientation
or anything. The performance
of most machine learning algorithms depends
on how accurately the features are identified and extracted,
whereas in the case
of deep learning algorithms, they try to learn high-level features
from the data.
This is a very distinctive part of deep learning,
which makes it way ahead
of traditional machine learning. Deep learning reduces the task
of developing a new feature extractor for every problem;
for example, in the case
of a CNN algorithm, it first tries to learn the low-level features
of the image, such as edges and lines,
then it proceeds to parts of faces of people,
and then finally to the high-level representation
of the face.
I hope that things are getting clearer to you.
So let's move on ahead and see the next parameter.
Our next parameter is the problem-solving approach.
When we are solving a problem using a traditional machine
learning algorithm,
it is generally recommended
that we first break down the problem
into different sub-parts, solve them individually,
and then finally combine them to get the desired result.
This is how a machine learning algorithm handles the problem.
On the other hand, a deep learning algorithm
solves the problem end to end.
Let's take an example
to understand this. Suppose you have a task
of multiple object detection,
and your task is to identify
what the object is and where it is present in the image.
So let's see and compare
how you would tackle this issue using the concepts
of machine learning
and deep learning. Starting with machine learning:
in a typical machine learning approach,
you would first divide the problem into two steps,
first object detection and then object recognition.
First of all,
you would use a bounding box detection algorithm,
like GrabCut for example, to scan through the image
and find out all the possible objects.
Now, once the objects are detected, you would use an
object recognition algorithm,
like SVM with HOG, to recognize the relevant objects.
Now, finally,
when you combine the results, you would be able to identify
what the object is and where it is present
in the image. On the other hand, in the deep learning approach,
you would do the process end to end. For example,
in a YOLO net,
which is a type of deep learning algorithm, you would pass
an image, and it would give out the location along with the name
of the object.
Now, let's move on to our fifth comparison parameter,
execution time.
Usually a deep learning algorithm takes a long time
to train. This is
because there are so
many parameters in a deep learning algorithm
that training takes longer
than usual; the training might even last for two weeks
or more than that
if you are training completely from scratch,
whereas in the case of machine learning,
it takes relatively much less time to train, ranging
from a few seconds
to a few hours.
Now,
the execution time is completely reversed
when it comes to the testing of data. During testing,
the deep learning algorithm takes much less time to run,
whereas if you compare it with a KNN algorithm,
which is a type of machine learning algorithm, the test
time increases as the size of the data increases. Last
but not least, we have interpretability as
a factor for comparison of machine learning
and deep learning.
This factor is the main reason why deep learning is still
thought about ten times before anyone
uses it in the industry.
Let's take an example. Suppose
we use deep learning to give
automated scoring to essays. The performance it gives
in scoring is quite excellent and is near
to human performance,
but there's an issue with it:
it does not reveal why it
has given that score. Indeed, mathematically
it is possible to find out
which nodes of a deep neural network were activated,
but we don't know
what the neurons were supposed to model
and what these layers of neurons were doing collectively,
so we fail to interpret the result.
On the other hand, a machine learning algorithm
like a decision tree gives us a crisp rule for why it chose
what it chose,
so it is particularly easy to interpret the reasoning
behind it. Therefore, algorithms like decision trees
and linear or logistic
regression are primarily used in industry for interpretability.
Let me summarize things
for you: machine learning uses algorithms to parse
the data, learn from the data,
and make informed decisions based on what it has learned.
In contrast, deep learning structures algorithms in layers to create
an artificial neural network
that can learn
and make intelligent decisions on its own. Finally,
deep learning is a subfield of machine learning;
while both fall under the broad category
of artificial intelligence, deep learning is usually
what's behind the most
human-like artificial intelligence.
Now, in the early days, scientists used to have a lab notebook
to note progress, results
and conclusions. Jupyter is a modern-day tool
that allows data scientists
to record the complete analysis process, much
in the same way other scientists use a lab notebook.
Now, the Jupyter product was originally developed as a part
of the IPython project. The IPython
project was used to provide interactive online access
to Python. Over time, it became useful to interact
with other data analysis tools, such as R, in the same manner.
With this split from Python,
the tool grew into its current manifestation of Jupyter.
Now, IPython is still an active tool
that's available for use.
The name Jupyter itself is derived from the combination
of Julia, Python
and R. While Jupyter runs code
in many programming languages, Python is a requirement
for installing the Jupyter Notebook itself. Now,
to download Jupyter Notebook,
there are a few ways. On the official website,
it is strongly recommended to install Python and Jupyter
using the Anaconda distribution,
which includes Python, the notebook,
and other commonly used packages
for scientific computing as well as data science,
although one can also
do so using the pip installation method. Personally,
what I would suggest is downloading Anaconda
Navigator, which is
a desktop graphical user interface included in Anaconda.
Now, this allows you to launch applications
and easily manage conda packages, environments
and channels without the need to use command-line commands.
So all you need to do is go to the Anaconda website,
and inside you go
to Anaconda Navigator.
So as you can see here,
we have the conda installation instructions which you're going
to use to install it on your particular PC,
so you can use these installers.
Once you download Anaconda Navigator,
it looks something like this.
As you can see here,
we have JupyterLab and Jupyter Notebook, you have the Qt Console,
which is an IPython console,
we have Spyder, which is somewhat similar to RStudio
in terms of Python, again
we have RStudio, we have Orange 3,
we have Glueviz, and we have VS Code.
Our focus today will be on the Jupyter Notebook itself.
Now, when you launch the Navigator,
you can see there are many options available
for launching Python as well
as R instances. Now, by definition, a Jupyter
notebook is fundamentally
a JSON file with a number of annotations.
Now, it has three main parts,
which are the metadata, the notebook format, and the list
of cells. Now, you
should get yourself acquainted with the environment:
the Jupyter user interface has a number of components,
so it's important to know
which components you should be using
on a daily basis and to get acquainted with them.
So as you can see here,
our focus today will be on the Jupyter notebook,
so let me just launch the Jupyter notebook.
Now, what it does is create an online Python instance
for you to use over the web.
So let's launch it now.
As you can see, we have the Jupyter logo at the top left,
as expected, and this acts as a button to go
to your home page: whenever you click
on it you get back to your particular home page,
that is, the dashboard. Now, there are three tabs displayed,
which are Files, Running and Clusters.
Now, what we'll do is understand all
of these three and see
what the importance
of these three tabs is. The Files tab shows the list
of the current files in the directory.
So as you can see, we have so many files here.
Now the Running tab
presents another screen of the currently running processes
and notebooks. Now, the drop-down lists for the terminals
and notebooks are populated with their
running numbers.
So as you can see inside,
we do not have any running terminals,
nor are there any running notebooks as of now,
and the Clusters tab
presents another screen to display the list
of clusters available. In the top right corner of the screen,
there are three buttons, which are the Upload, New
and Refresh buttons.
Let me go back so you can see here:
we have the Upload, New and Refresh buttons.
Now, the Upload button is used to add files
to the notebook space, and you may also just drag and drop,
as you would when handling files.
Similarly, you can drag
and drop notebooks into specific folders as well.
Now, the New menu at the top presents
a further menu of Text File, Folder, Terminal
and Python 3.
Now, the Text File option is used to add a text file
to the current directory. Jupyter will open a new browser window
for you, running a new text editor.
Now, the text entered is automatically saved
and will be displayed in your notebook's Files display.
Now the Folder option,
what it does is create a new folder
with the name Untitled Folder, and remember, all the file
and folder names are editable.
Now the Terminal option allows you to start
an IPython session.
The notebook options available will be activated
when additional notebooks are available in your environment.
The Python 3 option is used to begin a Python 3 session
interactively in your notebook.
The interface looks like the following screenshot.
Now, what you have is full file editing capabilities
for your script, including saving it as a new file.
You also have a complete IDE
for your Python script. Now we come to the Refresh button.
The Refresh button is used to update the display.
It's not really necessary, as the display is reactive
to any changes in the underlying file structure.
Next to the Files tab item,
there is a check box, a drop-
down menu and a home button, as you can see here:
we have the checkbox, the drop-down menu
and the home button.
Now, the checkbox is used to toggle all the checkboxes
in the item list.
So as you can see, you can select all of these and either move
or delete all of the files selected,
or what you can do is select all
and deselect some of the files
as you wish. Now, the drop-down menu presents a list
of choices available,
which are Folders, All Notebooks, Running
and Files. The Folders section
will select all the folders in the display
and present a count of the folders in the small box.
So as you can see here,
we have 18 folders. Now, the All Notebooks section
will change the count to the number of notebooks
and provide you with options,
so you can see here
it has selected all the given notebooks,
and you get the option to either duplicate the current notebook,
move it, view it, edit it or delete it.
Now, the Running section will select any running scripts;
as you can see here,
we have zero running scripts,
and it updates the count to the number selected.
Now, the Files section will select all the files
in the notebook display and update the count accordingly.
So if you select the files here,
there are seven files; as you can see here,
we have seven files: some datasets, CSV files
and text files. Now, the home button
brings you back to the home screen of the notebook.
So all you need to do is click on the Jupyter
Notebook logo, and
it will bring you back to the Jupyter Notebook dashboard.
Now, as you can see, on the left-hand side
of every item is a checkbox, an icon and the item's name.
The checkbox is used to build a set of files to operate
upon, and the icon is indicative of the type of the item.
In this case,
all of the items here are folders; coming down,
we have the running notebooks,
and finally we have certain files, which are the text files
and the CSV files. Now, a typical workflow
of any Jupyter
notebook is to, first of all, create a notebook
for your project or your data analysis,
add your analysis steps, coding
and output, and surround your analysis with organization
and presentation markdown to communicate
an entire story. Now, interactive notebooks
that include widgets and display modules
will then be used by others, who modify parameters
and the data to note the effects of the changes. Now,
if we talk about security, Jupyter notebooks are created
in order to be shared with other users, in many cases
over the internet.
However, a Jupyter notebook can execute arbitrary code
and generate arbitrary output.
This can be a problem
if malicious aspects have been placed
in the notebook. Now, the default security mechanisms for Jupyter
notebooks include raw HTML,
which is always sanitized and checked for malicious coding.
Another aspect is that you cannot run external JavaScript.
Now, the cell contents,
especially the HTML and the JavaScript, are not trusted:
they require user validation
to continue,
and the output from any cell is not trusted. All other HTML
or JavaScript is never trusted,
and clearing the output will cause the notebook
to become trusted
when saved. Now, notebooks can also use a security digest
to ensure the correct user is modifying the contents.
For that, what you need to know is that a digest
takes into account
the entire contents of the notebook and a secret
which is only known by the notebook creator,
and this combination ensures
that malicious coding is not going to be added
to the notebook.
So you can add a security digest to a notebook
using the approach I have shown here:
under the Jupyter profile
you have selected, you
place the secret in the security notebook secret file.
So what you can do is replace the notebook
secret with your own secret,
and that will act as a key for the particular notebook.
So what you need to do is share that particular key
with all your colleagues,
or whoever you want to share that particular notebook
with, and in that case
it keeps the notebook
secured and away from other malicious coders.
Another aspect of Jupyter is configuration.
You can configure some of the display parameters used
in presenting notebooks.
Now, these are configurable due to the use of a product known
as CodeMirror to present and modify the notebook.
So what is CodeMirror, basically? It is a JavaScript-
based editor for use
within web pages and notebooks.
So if you look up CodeMirror,
as you can see here,
CodeMirror is a versatile text editor implemented
in JavaScript for the browser.
So what it does is
allow you to configure the display options for Jupyter.
So now let's execute some Python code
and understand the notebook in a better way. Jupyter
does not interact
with your scripts so much as it executes your script
and requests the result.
I think this is how Jupyter notebooks
have been extended to other languages besides Python:
it just takes
a script, runs it against a particular language engine,
and records the output from the engine, all
the while not really knowing what kind
of script is being executed. Now, the new window shows
an empty cell
for you to enter the Python code. Now,
what you need to do is, under New, select Python 3, and
it will open a new notebook.
Now, this notebook is Untitled,
so let's give the new work area a name, "Python code".
So as you can see, we have renamed this particular notebook.
Now, the save status should be shown next to the title;
as you can see, it says "Last
Checkpoint: a few days ago (unsaved changes)".
The autosave option is always on, and
with an accurate name
we can find this
particular notebook very easily
from the notebook home page.
So if you select your browser's Home tab
and refresh, you will find
this new notebook name displayed here again.
So if you just go to the notebook home,
as you can see,
we have the "Python code" entry, and under Running
also you have the "Python code" entry here.
So let's get back to that particular page,
or the notebook. One thing to note here is
that it has
a notebook item icon rather than a folder icon,
and the automatically assigned extension,
as you can see here, is .ipynb, the IPython notebook. Since
the item is open in a browser in a Jupyter environment,
it is marked as running, and there
is a file by that name in this directory as well.
So if you go to your directory,
let me go and check it.
So as you can see, if you go into the user folder, you
can see we have the in-class projects,
and the "Python code" notebook, like the others, automatically
has that particular IPython notebook file created
in our working environment
and in the local disk space also.
So if you open the .ipynb file in a text editor,
you will see the basic contents of a Jupyter file; as you can see,
if I open it,
the cells are empty.
Nothing is there, so let's type in some code.
For example, I'm going to put in name equals "Edureka".
Next, what I'm going to do is provide subscribers
equals 700, and to run this particular cell,
what you need to do is click on the Run icon,
and you will see here we have [1],
so this is the first cell to be executed. In the second cell,
we enter Python code
that references the variables from the first cell.
So as you can see here,
we print the name, the string " has ", and the subscribers.
So let me just run this particular cell.
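For reference, the two cells described here amount to something like the following (the values are taken from the narration; your own name and number would of course differ):

```python
# Cell 1
name = "Edureka"
subscribers = 700

# Cell 2 - references the variables defined in the first cell
print(name + " has " + str(subscribers) + "K YouTube subscribers")
```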
So as you can see here, note
that we now have an output
that says Edureka has 700K YouTube subscribers,
and it has since crossed 700K. Now, to know more about Jupyter
and other technologies,
what you can do is subscribe to our channel and get
updates on the latest trending technologies.
So note that Jupyter color-codes
your Python just as a decent editor would,
and we have bracketed numbers
to the left of each code block, as you can see here.
If we execute a cell,
the results are displayed inline. Now, it's interesting
that Jupyter keeps
the output last generated in the saved version of the file,
and it saves checkpoints.
Now, if we were to rerun the cells using Run
or Run All, the output would be regenerated
and saved by autosave. Now,
the cell number is incremented, and as you can see,
if I rerun this, you see the cell number change
from one to three,
and if I rerun this, the cell number will change from 2 to 4.
So what Jupyter does is keep a track of the latest version
of each cell. Similarly,
if you were to close the browser tab and refresh the display
in the Home tab,
you will find the new item we created,
which is the "Python code" notebook, saved and autosaved;
as you can see here, in the bracket it says autosaved.
So if we close this and click the home button,
you can see here
we have "Python code".
So as you can see, if we click that, it opens the same notebook,
and the previously
displayed items will always be there, showing the output
that we generated in the last run. Now
that we have seen how Python works
in Jupyter, including the underlying encoding,
let's see how a pandas data set works in Jupyter.
So let me create another new Python notebook.
What I'm going to do is name this one "pandas".
From here,
what we will do is read in a large data set
and compute some standard statistics of the data.
What we are interested in is seeing
how to use pandas in Jupyter,
how well the script performs,
and what information is stored in the metadata,
especially if it's a large data set.
Our Python script accesses the iris data set
that's built into one of the Python packages.
All we are looking to do is read in a slightly large number
of items and calculate some basic operations
on the data set.
So first of all,
what we need to do is,
from sklearn, import the datasets module. sklearn
is scikit-learn, and it is another Python library.
It contains a lot of data sets for machine learning
and all the algorithms
which are present for machine learning,
along with those data sets.
So what we're going to do now is pull in the IRS data.
What we're going to do is Iris underscore data set equals
and the load on the screen now that should do and I'm sorry,
it's data set start lower.
So so as you can see here,
the number here is considered three now
because in the second drawer
and we encountered an error it was data set.
He's not data set.
So so what we're going
to do is grab the first two corner of the data.
So let's pretend x equals.
If you press the tab, it automatically detects
what you're going to write as Todd datasets dot data.
And what we're going to do is take the first two rows comma
not to run it from your keyboard.
All you need to do is press shift + enter.
So next what we're going to do is calculate
some basic statistics.
So what we're going to do is X underscore.
Count equals x I'm going to use the length function and said
that we're going to use x dot flat similarly.
We going to see X-Men
and X Max and the Min our display our results.
What we're going to do is you just play the results now, so
as you can see the counter 300 the minimum value is 3.8 m/s.
And what is 0.4
and the mean is five point eight four three three three.
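Put together, the cells narrated above amount to roughly the following; this is a reconstruction, so the exact slicing of the data is an assumption:

```python
# Reconstruction of the cells described above; the column slice is an assumption.
from sklearn import datasets

iris_dataset = datasets.load_iris()
x = iris_dataset.data[:, :2]        # first two columns of the measurements

x_count = x.size                    # number of values (the video uses len(x.flat))
x_min = x.min()
x_max = x.max()
x_mean = x.mean()
print(x_count, x_min, x_max, x_mean)
```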
So let me connect this to real life
and tell you all the things
which you can easily do using the concepts of machine learning.
So you can easily get answers to questions
like which type of house lies in this segment,
or what is the market value of this house, or is this
email spam or not spam?
Is there any fraud?
Well, these are some of the questions you could ask
the machine,
but to get answers to these you need some algorithm;
the machine needs to be trained on the basis of some algorithm.
Okay, but how will you decide which algorithm
to choose and when?
Okay.
So the best option for us is to explore them one by one.
So the first is the classification algorithm,
where a category is predicted using the data.
If you have a question
like, is this person a male
or a female, or is this email spam or not
spam, then these categories of question fall
under the classification algorithm. Classification is
a supervised learning approach
in which the computer program learns from the input
given to it
and then uses
this learning to classify new observations. Some examples
of classification problems
are speech recognition, handwriting
recognition, biometric identification, document
classification, etc.
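As a tiny illustration of what a classifier looks like in code (the single feature and the labels below are invented; scikit-learn assumed):

```python
# A minimal classification sketch: logistic regression choosing between two categories.
# The feature values and labels are invented purely for illustration.
from sklearn.linear_model import LogisticRegression

X = [[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]]   # one made-up input feature
y = [0, 0, 0, 1, 1, 1]                            # the known category for each example

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [8.5]]))                # categories predicted for new inputs
```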
So next is the anomaly detection algorithm,
where you identify unusual data points.
So what is anomaly detection?
Well, it's a technique
that is used to identify unusual patterns
that do not conform to expected behavior,
or what you can call outliers.
It has many applications
in business, like intrusion detection,
such as identifying strange patterns in network traffic
that could signal a hack, or system health monitoring,
that is, spotting a deadly tumor in an MRI scan,
or you can even use it
for fraud detection in credit card transactions,
or to deal with fault detection in operating environments.
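One common way to do this in code is with an isolation forest; here is a brief sketch (the transaction amounts are invented, and this is just one of several possible techniques):

```python
# A brief anomaly-detection sketch using an Isolation Forest; the amounts are invented.
from sklearn.ensemble import IsolationForest

amounts = [[25], [30], [27], [22], [26], [29], [24], [5000]]   # one obvious outlier
detector = IsolationForest(random_state=0).fit(amounts)
print(detector.predict(amounts))    # 1 = normal, -1 = flagged as an anomaly
```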
So next comes the clustering algorithm.
You can use the clustering algorithm to group the data
based on some similar conditions.
Now you can get answers to which type of houses lie
in this segment, or what type of customer buys this product.
Clustering is the task of dividing the population
or data points into a number of groups such
that the data points in the same group are more
similar to other data points in the same group than to those
in the other groups. In simple words,
the aim is to segregate groups with similar traits
and assign them into clusters.
Let's understand this
with an example. Suppose you are the head of a rental store
and you wish to understand the preferences of your customers
to scale up your business.
So is it possible
for you to look at the details of each customer and design
a unique business strategy for each of them?
Definitely not, right?
But what you can do is cluster all your customers into,
say, 10 different groups based on their purchasing habits,
and use a separate strategy
for customers in each of these ten different groups.
And this is what we call clustering.
Next, we have the regression algorithm, where a data
value itself is predicted. Questions
you may ask of this type of model are like, what is
the market value of this house,
or is it going to rain tomorrow or not?
So regression is one of the most important and broadly
used machine learning and statistics tools.
It allows you to make predictions from data by learning
the relationship between the features of your data
and some observed, continuous-valued response. Regression
is used in a massive number of applications;
you know, stock price prediction can be done
using regression. Now that
you know about the different machine learning algorithms,
how will you decide which algorithm to choose
and when? So let's cover this part using a demo.
So in this demo part,
what we will do is create six different machine learning models
and pick the best model and build confidence such
that it has the most reliable accuracy.
For our demo part we will be using the iris data set.
This data set is quite famous
and is considered one of the best small projects to start
with. You can consider
this as a "hello world" data set for machine learning.
So this data set consists
of 150 observations of iris flowers.
There are four columns of measurements of the flowers in centimeters,
the fifth column being the species
of the flower observed. All the observed flowers belong
to one of three species: Iris setosa, Iris virginica
and Iris versicolor.
Well, this is a good project
because it is so well understood.
The attributes are numeric,
so you have to figure out how to load and handle the data.
It is a classification problem,
thereby allowing you to practice
with perhaps an easier type of supervised learning algorithm.
It has only four attributes and 150 rows,
meaning it is very small
and can easily fit into memory, and all
of the numeric attributes are in the same unit
and the same scale, meaning you do not require any special scaling
or transformation to get started.
So let's start coding, and as I told earlier, for the demo I'll be using Anaconda with Python 3 installed on it. So when you install Anaconda, this is how your Navigator will look. So this is the home page of my Anaconda Navigator. On this, I'll be using the Jupyter Notebook, which is a web-based interactive computing notebook environment, which will help me write and execute my Python code. So let's hit the launch button and launch our Jupyter Notebook. So as you can see, my Jupyter Notebook is starting on localhost:8890.
Okay, so there's my Jupyter Notebook. What I'll do here is select New and then Python 3 notebook. That's my environment where I can write and execute all my Python code.
So let's start by checking the versions of the libraries. In order to make this video short, more interactive and more informative, I've already written the set of code, so let me just copy and paste it, and then I'll explain it to you one by one. So let's start by checking the versions of the Python libraries. Okay, so there is the code; let's just copy it and paste it.
Okay, first let me summarize things for you. What we are doing here is just checking the versions of the different libraries, starting with Python: we'll first check what version of Python we are working on, then we'll check what versions of SciPy, NumPy, matplotlib, pandas and scikit-learn we are using. Okay.
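For reference, a minimal sketch of such a version-check cell might look like this (the exact script used in the video isn't shown on screen, so treat this as an assumption of its contents):

```python
# Check the versions of the main libraries
import sys
import scipy
import numpy
import matplotlib
import pandas
import sklearn

print('Python: {}'.format(sys.version))
print('scipy: {}'.format(scipy.__version__))
print('numpy: {}'.format(numpy.__version__))
print('matplotlib: {}'.format(matplotlib.__version__))
print('pandas: {}'.format(pandas.__version__))
print('sklearn: {}'.format(sklearn.__version__))
```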
So let's execute the Run button and see what versions of the libraries we are using. So we are working on Python 3.6.4, SciPy 1.0, NumPy 1.14, matplotlib 2.1.2, pandas 0.22 and scikit-learn version 0.19. Okay. So these are the versions which I'm using. Ideally your versions should be more recent or they should match, but don't worry if you are a few versions behind, as the APIs do not change so quickly; everything in this tutorial will very likely still work for you.
Okay, but in case you are getting an error, stop and try to fix that error. In case you are unable to find the solution for the error, feel free to reach out to Edureka, even after this class. Let me tell you this: if you are not able to run the script properly, you will not be able to complete this tutorial. Okay, so whenever you get a doubt, reach out to Edureka and just resolve it. Now, if everything is working smoothly, then now is the time to load the data set.
So as I said, I'll be using the Iris flower data set for this tutorial, but before loading the data set, let's import all the modules, functions and objects which we are going to use in this tutorial. Again, I've already written the set of code, so let's just copy and paste them and load all the libraries. So these are the various libraries which we'll be using in our tutorial. Everything should work fine without an error; if you get an error, just stop, because you need to work on your SciPy environment before you continue any further. So I guess everything should work fine; let's hit the Run button and see.
Okay, it worked.
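The exact import cell isn't shown on screen, but based on what is used later in the demo, it presumably looks something like this sketch:

```python
# Load the libraries used throughout this tutorial
import pandas
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
```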
So let's now move ahead and load the data. We can load the data directly from the UCI Machine Learning Repository. First of all, let me tell you, we are using pandas to load the data. Okay. So let's say my URL is this: this is my URL for the UCI Machine Learning Repository, from where I will be downloading the data set. Okay.
Now, what I'll do is specify the name of each column when loading the data. This will help me later to explore the data. Okay, so I'll just copy and paste it down. Okay, so I'm defining a variable names, which consists of the various column names, including sepal length, sepal width, petal length, petal width and class. So these are just the names of the columns from the data set. Okay.
Now let's define the data set. So dataset equals pandas dot read_csv, and inside that we are passing the URL and names equals names. As I already said, we'll be using pandas to load the data. Alright, so we are using pandas read_csv, which means we are reading a CSV file, and inside that we specify where that CSV is coming from, which is the URL. So there's my URL. Okay, and names equals names just specifies the names of the various columns in that particular CSV file. Okay. So let's move forward and execute it.
So now our data set is loaded. In case you have some network issues, just go ahead and download the Iris data file into your working directory and load it using the same method, but make sure that you change the URL to the local file name, or else you might get an error.
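Putting those pieces together, the loading cell presumably looks roughly like this (the UCI URL below is the usual location of the Iris CSV, stated here as an assumption since it is only read out loud in the video):

```python
# Load the Iris data set from the UCI Machine Learning Repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
```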
Okay.
Yeah, our data set is loaded. So let's move ahead and check our data set. Let's see how many columns and rows we have in our data set. Okay. So let's print the number of rows and columns in our data set; that's dataset dot shape. What this will do is give you the total number of rows and the total number of columns, or you can say the total number of instances and attributes in your data set, fine. So for print dataset dot shape we are getting 150 and 5. So 150 is the total number of rows in your data set and 5 is the total number of columns, fine.
So, moving on ahead, what if I want to see a sample of the data set? Okay. So let me just print the first few instances of the data set. Okay, so print dataset dot head; what I want is the first 30 instances, fine. This will give me the first 30 rows of my data set. Okay. So when I hit the Run button, what I am getting is the first 30 results, okay, 0 to 29. So this is how my sample data set looks: sepal length, sepal width, petal length, petal width and the class, okay.
So this is how our data set looks. Now, let's move on and look at the summary of each attribute. What if I want to find out the count, the mean, the minimum and the maximum values, and some other percentiles as well? So what should I do then? For that, print dataset dot describe. What did it give? Let's see. So you can see that all the numbers are on the same scale, in a similar range between 0 and 8 centimeters, right? The mean value, the standard deviation, the minimum value, the 25th percentile, 50th percentile, 75th percentile and the maximum value all lie in the range between 0 and 8 centimeters. Okay. So what we just did is take a summary of each attribute.
Now, let's look at the number of instances that belong to each class. So for that, what we'll do is print the data set grouped by class, and I want the size of each class, fine, and let's hit the Run. Okay. So what I want to do is print out the data set, but I want it by class, so group by class; and now I want the size of each class, so group by class dot size, and hit the Run. So you can see that I have 50 instances of Iris setosa, 50 instances of Iris versicolor and 50 instances of Iris virginica. Okay, all of data type int64, fine.
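Taken together, these quick checks presumably correspond to cells like the following sketch, assuming the column names defined above:

```python
# Dimensions of the data set: (rows, columns)
print(dataset.shape)

# Peek at the first 30 rows
print(dataset.head(30))

# Statistical summary of each attribute
print(dataset.describe())

# Number of instances that belong to each class
print(dataset.groupby('class').size())
```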
So now we have a basic idea of our data. Now, let's move ahead and create some visualizations for it.
So for this we are going to create two different types of plots: first would be the univariate plots, and next would be the multivariate plots. We'll be creating univariate plots to better understand each attribute, and then we'll be creating the multivariate plots to better understand the relationships between the different attributes. Okay. So we start with some univariate plots, that is, plots of each individual variable. Given that the input variables are numeric, we can create box and whisker plots for them. Okay.
So let's move ahead and create a box and whisker plot. So dataset dot plot; what kind do I want? It's a box. Okay, and do I need subplots? Yeah, I need subplots, so subplots equals True. What type of layout do I want? My layout structure is 2 by 2. Next, do I want to share my x and y coordinates? No, I don't want to share them, so sharex equals False and sharey equals False as well. Okay. So we have our dataset dot plot, kind equals box, subplots True, layout 2 by 2, and then what do I want to do? I want to see it, so plot dot show whatever I created. Okay, execute it.
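In code, that box-and-whisker cell presumably looks like this sketch:

```python
# Box and whisker plots, one per input variable
dataset.plot(kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False)
plt.show()
```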
That just gives us a much clearer idea about the distribution of the input attributes. Now, what if instead of giving the layout as 2 by 2 I had given it as 4 by 4? What would that result in? Just see: everything would be printed in just one single row.
Hold on guys, Aria has a doubt. He's asking why we're using the sharex and sharey values: what are these, and why have we assigned False values to them? Okay, Aria, in order to resolve this query, I need to show you what will happen if I give True values to them. Okay, so be with me: sharex equals True and sharey equals True as well. So let's see what result we'll get. You're getting it: the x and y coordinates are just shared among all four visualizations, right? So you can see that the sepal length and sepal width have y values ranging from 0.0 to 7.5, which are being shared among both the visualizations, and so is it with the petal length; it has shared values between 0.0 and 7.5. Okay, so that is why I don't want to share the values of x and y; it's just giving us a cluttered visualization. So Aria, why am I doing this? I'm just doing it because I don't want my x and y coordinates to be shared among any visualization. Okay, that is why my sharex and sharey values are False. Okay, let's execute it.
So this is a much clearer visualization, which gives a clear idea about the distribution of the input attributes. Now, if you want, you can also create a histogram of each input variable to get a clearer idea of the distribution. So let's create a histogram for it: dataset dot hist. Okay, and I would need to see it, so plot dot show. Let's see. So there's my histogram, and it seems that we have two input variables that have a Gaussian distribution. This is useful to note, as we can use algorithms that can exploit this assumption.
Okay.
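The histogram cell is presumably just:

```python
# Histogram of each input variable
dataset.hist()
plt.show()
```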
So next come the multivariate plots. Now that we have created the univariate plots to understand each attribute, let's move on and look at the multivariate plots and see the interactions between the different variables. So first, let's look at the scatter plots of all the attributes; this can be helpful to spot structured relationships between input variables. Okay. So let's create a scatter matrix. For creating a scatter plot we need scatter_matrix, and we need to pass our data set into it. Okay, and then I want to see it, so plot dot show. So this is how my scatter matrix looks. Note the diagonal grouping of some pairs of attributes, right? This suggests a high correlation and a predictable relationship. All right.
This was our multivariate plot.
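That cell, in sketch form:

```python
# Scatter plot matrix to spot pairwise relationships between attributes
scatter_matrix(dataset)
plt.show()
```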
Now, let's move on and evaluate some algorithms. It's time to create some models of the data and estimate their accuracy on the basis of unseen data. Okay. So now we know all about our data set, right? We know how many instances and attributes there are in our data set, and we know the summary of each attribute, so I guess we have seen enough about our data set. Now, let's move on, create some models, and estimate their accuracy based on unseen data. Okay.
So for that, first of all, let's create a validation data set. How will we create a validation data set? For creating a validation data set, what we are going to do is split our data set into two parts. Okay. So the very first thing we'll do is create a validation data set. Why do we even need a validation data set? We need a validation data set to know that the model we created is any good. Later, we'll use statistical methods to estimate the accuracy of the models that we create on unseen data, and we also want a more concrete estimate of the accuracy of the best model on unseen data by evaluating it on actual unseen data. Okay, confused? Let me simplify this for you. What we'll do is split the loaded data into two parts: the first 80 percent of the data will be used to train our models, and the remaining 20% we will hold back as the validation data set, which we'll use to verify our trained model. Okay, fine.
So let's define an array. This is my array; it will consist of all the values from the data set, so array equals dataset dot values. Okay, next I'll define a variable X, which will consist of the first four columns from the array, and a variable Y, which will consist of the class column. So first of all, we define a variable X that consists of all the rows of the array and the columns from 0 up to 4. Okay, so these are the columns which we will include in the X variable. And for the Y variable, I'll define it as the class, or the output; what I need is just the fourth column, that is, my class column, so I'll take all the rows and just the fourth column. Okay, now I'll define my validation size: validation_size, which I'll define as 0.20, and I'll also use a seed; I define seed equals 6. This seed sets the starting value used in generating random numbers. Okay, I'll tell you what the importance of that is later on. Okay. So let me define a few variables such as X_train, X_test, Y_train and Y_test. Okay, so what we want to do is use model_selection. But before doing that, what we have to do is split our data set into two parts. Okay, so model_selection dot train_test_split; what we want to split are the values of X and Y, okay, my test size equals validation_size, which is 0.20, correct, and my random state is equal to seed. So what the seed is doing is helping me keep the same randomness in the training and testing data sets, fine. So let's execute it and see what our result is.
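As a sketch, that split cell presumably looks something like this (the variable names follow what is described above):

```python
# Split the data set: 80% for training, 20% held back for validation
array = dataset.values
X = array[:, 0:4]          # the four measurement columns
Y = array[:, 4]            # the class column
validation_size = 0.20
seed = 6
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed)
```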
It's executed. Next, we'll create a test harness. For this we'll use 10-fold cross-validation to estimate accuracy. What this will do is split our data set into 10 parts, train on nine parts and test on one part, and this will repeat for all combinations of train and test splits. Okay. So for that, let's define again my seed, which was six, already defined, and scoring equals accuracy, fine. So we are using the metric of accuracy to evaluate the models. What is this? This is the ratio of the number of correctly predicted instances divided by the total number of instances in the data set, multiplied by 100 to give a percentage; for example, 98% accurate or 99% accurate, things like that. Okay, so we'll be using the scoring variable when we build and evaluate each model in the next step.
The next part is building the models. Till now we don't know which algorithm would be good for this problem or what configuration to use, so let's evaluate six different algorithms. I'll be using logistic regression, linear discriminant analysis, k-nearest neighbors, classification and regression trees, naive Bayes and support vector machines. These algorithms I'm using are a good mixture of simple linear and non-linear algorithms: the simple linear ones include logistic regression and linear discriminant analysis, and the non-linear part includes the KNN algorithm, the CART algorithm, naive Bayes and support vector machines. Okay. We reset the random number seed before each run to ensure that the evaluation of each algorithm is performed using exactly the same data splits; it ensures the results are directly comparable. Okay, so let me just copy and paste it. Okay. So what we're doing here is building six different types of models: logistic regression, linear discriminant analysis, k-nearest neighbors, decision tree, Gaussian naive Bayes and the support vector machine.
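In code, the model-building and evaluation cells presumably look roughly like this sketch (the exact list and loop are read out rather than shown, so this follows the description above):

```python
# Build and evaluate six different models with 10-fold cross-validation
scoring = 'accuracy'
models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))

results = []
names = []
for name, model in models:
    # Same seed for every model so each is evaluated on the same splits
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    cv_results = model_selection.cross_val_score(model, X_train, Y_train,
                                                 cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    print('{}: {} ({})'.format(name, cv_results.mean(), cv_results.std()))
```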
Okay, next what we'll do is evaluate each model in turn. Okay. So what is this? We have six different models and an accuracy estimation for each one of them; now we need to compare the models to each other and select the most accurate of them all. So running the script, we see the following results on the screen. What is it? It is just the accuracy score using the different algorithms: when we are using logistic regression, what is the accuracy; when we are using linear discriminant analysis, what is the accuracy; and so on. Okay. So from the output, it seems that the LDA algorithm was the most accurate model that we tested. Now we want to get an idea of the accuracy of the model on our validation set, or the testing data set. This will give us an independent final check on the accuracy of the best model. It is always valuable to keep a testing data set, just in case you made an error such as overfitting to the training data set, or a data leak; both will result in an overly optimistic result.
Okay, you can run the LDA model directly on the validation set and summarize the results as a final accuracy score, a confusion matrix and a classification report.
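That final check presumably looks like this sketch, assuming the LDA model came out on top as described:

```python
# Make predictions on the held-back validation set with the best model (LDA here)
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, Y_train)
predictions = lda.predict(X_test)

print(accuracy_score(Y_test, predictions))
print(confusion_matrix(Y_test, predictions))
print(classification_report(Y_test, predictions))
```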
Statistics and probability are essential because these disciplines form the basic foundation of all machine learning algorithms, deep learning, artificial intelligence and data science. In fact, mathematics and probability are behind everything around us, from shapes, patterns and colors to the count of petals in a flower; mathematics is embedded in each and every aspect of our lives.
So I'm going to go ahead and discuss the agenda for today with you all. We're going to begin the session by understanding what data is. After that, we'll move on and look at the different categories of data, like quantitative and qualitative data. Then we'll discuss what exactly statistics is, the basic terminologies in statistics, and a couple of sampling techniques. Once we're done with that, we'll discuss the different types of statistics, which involve descriptive and inferential statistics. Then, in the next section, we will mainly be focusing on descriptive statistics; here we'll understand the different measures of center, measures of spread, Information Gain and entropy, and we'll also understand all of these measures with the help of a use case.
And finally, we'll discuss what exactly a confusion Matrix is.
Once we've covered the entire descriptive statistics module, we'll discuss the probability module. Here we'll understand what exactly probability is and the different terminologies in probability. We will also study the different probability distributions, then we'll discuss the types of probability, which include marginal probability, joint probability and conditional probability. Then we move on and discuss a use case wherein we will see examples that show us how the different types of probability work, and to better understand Bayes' theorem, we'll look at a small example. Also, I forgot to mention that at the end of the descriptive statistics module we'll be running a small demo in the R language. So for those of you who don't know much about R, I'll be explaining every line in depth, but if you want to have a more in-depth understanding of R, I'll leave a couple of blogs and a couple of videos in the description box; you all can definitely check out that content.
Now, after we've completed the probability module, we'll discuss the inferential statistics module. We'll start this module by understanding what point estimation is, we'll discuss what a confidence interval is and how you can estimate the confidence interval, and we'll also discuss margin of error; we'll understand all of these concepts by looking at a small use case. We finally end the inferential statistics module by looking at what hypothesis testing is. Hypothesis testing is a very important part of inferential statistics, so we'll end the session by looking at a use case that discusses how hypothesis testing works, and to sum everything up, we'll look at a demo that explains how inferential statistics works.
Right?
So guys, there's a lot to cover today.
So let's move ahead and take a look at our first topic
which is what is data.
Now, this is quite a simple question. If I ask any of you what data is, you'll say that it's a set of numbers or some sort of documents that are stored on my computer. Now, data is actually everything. All right, look around you: there is data everywhere. Each click on your phone generates more data than you know, and this generated data provides insights for analysis and helps us make better business decisions. This is why data is so important. To give you a formal definition, data refers to facts and statistics collected together for reference or analysis.
All right.
This is the definition of data in terms
of statistics and probability.
So as we know data can be collected it
can be measured and analyzed
it can be visualized by using statistical models
and graphs now data is divided into two major subcategories.
Alright, so first we have qualitative data
and quantitative data.
These are the two different types of data. Under qualitative data we have nominal and ordinal data, and under quantitative data we have discrete and continuous data.
Now, let's focus on qualitative data.
Now this type of data deals with characteristics and descriptors
that can't be easily measured
but can be observed subjectively
now qualitative data is further divided
into nominal and ordinal data.
So nominal data is any sort of data that doesn't have any order or ranking. Okay. An example of nominal data is gender. Now, there is no ranking in gender; there's only male, female or other, right? There is no one, two, three, four or any sort of ordering in gender. Race is another example of nominal data.
Now ordinal data is basically an ordered series of information.
Okay, let's say that you went to a restaurant.
Okay.
Your information is stored in the form of customer ID.
All right.
So basically you are represented with a customer ID.
Now, you would have rated their service as either good or average. All right, that's what ordinal data is,
and similarly they'll have a record of other customers
who visit the restaurant along with their ratings.
All right.
So any data which has some sort of sequence
or some sort of order to it is known as ordinal data.
All right, so guys,
this is pretty simple to understand now,
let's move on and look at quantitative data.
So quantitative data basically deals with numbers. Okay, you can understand that by the word quantitative itself: quantitative is basically quantity, right? So it deals with numbers; it deals with anything that you can measure objectively, right?
So there are two types
of quantitative data there is discrete and continuous data
now discrete data is also known as categorical data
and it can hold a finite number of possible values.
Now, the number of students in a class is a finite Number.
All right, you can't have infinite number
of students in a class.
Let's say in your fifth grade.
There were a hundred students in your class.
All right, there weren't infinite number but there was
a definite finite number of students in your class.
Okay, that's discrete data.
Next.
We have continuous data.
Now this type of data can hold infinite number
of possible values.
Okay.
So when I say the weight of a person is an example of continuous data, what I mean to say is that my weight can be 50 kgs, or it can be 50.1 kgs, or it can be 50.001 kgs, or 50.0001, or 50.023 and so on, right? There are an infinite number of possible values, right?
So this is what I mean by continuous data.
All right.
This is the difference between discrete and continuous data.
And also I would like to mention a few other things over here.
Now, there are a couple of types of variables as well.
We have a discrete variable
and we have a continuous variable discrete variable
is also known as a categorical variable
or and it can hold values of different categories.
Let's say that you have a variable called message
and there are two types of values that this variable
can hold let's say
that your message can either be a Spam message
or a non spam message.
Okay, that's when you call a variable as discrete
or categorical variable.
All right, because it can hold values
that represent different categories of data. Now, continuous variables are basically variables that can store an infinite number of values. So the weight of a person can be denoted as a continuous variable. All right, let's say there is a variable called weight, and it can store an infinite number of possible values; that's why we'll call it a continuous variable. So guys, basically a variable is anything that can store a value, right? So if you associate any sort of data with a variable, then it will become either a discrete variable or a continuous variable.
There are also dependent and independent types of variables. Now, we won't discuss all of that in depth because that's pretty understandable. I'm sure all of you know what an independent variable and a dependent variable are, right? A dependent variable is any variable whose value depends on some other independent variable. So guys, that much knowledge I expect all of you to have, all right.
So now, let's move on and look at our next topic, which is what is statistics. Now, coming to the formal definition of statistics: statistics is an area of applied mathematics which is concerned with data collection, analysis, interpretation and presentation. Now, usually when I speak about statistics, people think statistics is all about analysis, but statistics has other parts to it: data collection is also part of statistics, as are data interpretation and presentation. All of this comes into statistics. All right, we are going to use statistical methods to visualize data, to collect data and to interpret data. Alright, so this area of mathematics deals with understanding how data can be used to solve complex problems.
Okay.
Now I'll give you a couple of examples
that can be solved by using statistics.
Okay, let's say
that your company has created a new drug
that may cure cancer.
How would you conduct a test to confirm the drug's effectiveness? Now, even though this sounds like a biology problem, it can be solved with statistics. All right, you will have to create a test which can confirm the effectiveness of the drug. All right, this is a common problem that can be solved using statistics.
Let me give you another example you
and a friend are at a baseball game and out of the blue.
He offers you a bet
that neither team will hit a home run in that game.
Should you take the bet? All right, here you'd discuss the probability of whether you'll win or lose. All right, this is another problem that comes under statistics.
Let's look at another example.
The latest sales data has just come in
and your boss wants you to prepare a report
for management on places
where the company could improve its business.
What should you look for, and what should you not look for? Now, this problem involves a lot of data analysis. You'll have to look at the different variables that are causing your business to go down, or you'll have to look at the few variables that are increasing performance and thus growing your business.
Alright, so this involves a lot of data analysis
and the basic idea
behind data analysis is to use statistical techniques
in order to figure out the relationship
between different variables
or different components in your business.
Okay.
So now let's move on and look at our next topic, which is the basic terminologies in statistics.
Now before you dive deep into statistics, it is important
that you understand the basic terminologies
used in statistics.
The two most important terminologies in statistics
are population and Sample.
So throughout the statistics course, or throughout any problem that you're trying to solve with statistics, you will come across these two words, which are population and sample. Now, a population is a collection or a set of individuals or objects or events whose properties are to be analyzed.
Okay.
So basically you can refer to population as a subject
that you're trying to analyze now a sample is just
like the word suggests.
It's a subset of the population.
So you have to make sure that you choose the sample
in such a way
that it represents the entire population.
All right.
It shouldn't focus on one part of the population; instead, it should represent the entire population. That's how your sample should be chosen. So a well-chosen sample will contain most of the information about a particular population parameter.
Now, you must be wondering how can one choose a sample
that best represents the entire population now
sampling is a statistical method
that deals with the selection of individual observations
within a population.
So sampling is performed
in order to infer statistical knowledge about a population.
All right, if you want to understand
the different statistics of a population
like the mean, the median, the mode, the standard deviation or the variance of a population, then you're going to perform sampling.
All right,
because it's not reasonable for you to study a large population
and find out the mean median and everything else.
So why is sampling performed, you might ask? What is the point of sampling? Can't we just study the entire population? Now guys,
think of a scenario
where in you're asked to perform a survey
about the eating habits of teenagers in the US.
So at present there are over 42 million teens in the US
and this number is growing
as we are speaking right now, correct.
Is it possible to survey each of these 42 million individuals
about their health?
Is it possible?
Well, it might be possible, but this would take forever to do. Now, obviously, it's not reasonable to go around knocking on each door and asking what your teenage son eats and all of that, right?
This is not very reasonable.
That's why sampling is used. It's a method wherein a sample of the population is studied in order to draw inferences about the entire population. So it's basically a shortcut to studying the entire population: instead of taking the entire population and finding out all the answers, you're just going to take a part of the population that represents the entire population, and you're going to perform all your statistical analysis, your inferential statistics, on that small sample. All right, and that sample basically represents the entire population.
All right, so I'm sure I've made clear to you all what a sample is and what a population is. Now,
There are two main types of sampling techniques
that are discussed today.
We have probability sampling and non-probability sampling. Now, in this video we'll only be focusing on probability sampling techniques, because non-probability sampling is not within the scope of this video. All right, we'll only discuss the probability part, because we're focusing on statistics and probability, correct?
Now again under probability sampling.
We have three different types.
We have random sampling, systematic sampling and stratified sampling. All right, and just to mention the different types of non-probability sampling, we have snowball, quota, judgment and convenience sampling.
All right now guys in this session.
I'll only be focusing on probability.
So let's move on
and look at the different types of probability sampling.
So what is Probability sampling.
It is a sampling technique
in which samples from a large population
are chosen by using the theory of probability.
All right, so there are three types
of probability sampling.
All right first we have the random sampling now
in this method each member
of the population has an equal chance
of being selected in the sample.
All right.
So each and every individual or each and every object
in the population has an equal chance of being a part of the sample.
That's what random sampling is all about.
Okay, you are randomly going to select any individual
or any object.
So this way, each individual has an equal chance of being selected.
Correct?
Next.
We have systematic sampling now
in systematic sampling every nth record is chosen
from the population to be a part of the sample.
All right.
Now refer to this image that I've shown over here: out of these six groups, every second group is chosen as a sample. Okay. So every second record is chosen here, and this is how systematic sampling works.
Okay, you're randomly selecting the nth record
and you're going to add that to your sample.
Next.
We have stratified sampling.
Now in this type
of technique a stratum is used to form samples
from a large population.
So what is a stratum? A stratum is basically a subset of the population that shares at least one common characteristic. So let's say that your population has a mix of both males and females; you can create two stratums out of this, where one will have only the male subset and the other will have the female subset. All right, this is what a stratum is: it is basically a subset of the population that shares at least one common characteristic.
All right, in our example it is gender. So after you've created the stratums, you're going to use random sampling on the stratums and you're going to choose a final sample. So random sampling, meaning that all of the individuals in each of the stratums will have an equal chance of being selected in the sample, correct?
So Guys, these were the three different types
of sampling techniques.
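As a rough illustration in Python, here is a sketch of the three techniques using a hypothetical customer table made up purely for demonstration (pandas and scikit-learn offer convenient helpers for this):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical population of 1,000 customers with a gender column (illustrative only)
population = pd.DataFrame({
    'customer_id': range(1000),
    'gender': ['male', 'female'] * 500,
})

# Random sampling: every member has an equal chance of being picked
random_sample = population.sample(n=100, random_state=0)

# Systematic sampling: pick every 10th record
systematic_sample = population.iloc[::10]

# Stratified sampling: sample within each stratum (gender) so proportions are preserved
stratified_sample, _ = train_test_split(
    population, train_size=100, stratify=population['gender'], random_state=0)
```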
Now, let's move on and look at our next topic
which is the different types of Statistics.
So after this,
we'll be looking at the more advanced concepts of statistics. Right, so far we discussed the basics of statistics, which are basically what statistics is, the different sampling techniques and the terminologies in statistics. All right. Now we'll look at the different types of statistics.
So there are two major types of Statistics
descriptive statistics
and inferential statistics in today's session.
We will be discussing both of these types
of Statistics in depth.
All right, we'll also be looking at a demo, which I'll be running in the R language, in order to make you understand what exactly descriptive and inferential statistics are. So guys, don't worry if you don't have much knowledge; I'm explaining everything from a basic level.
All right, so guys descriptive statistics is a method
which is used to describe and understand the features
of specific data set by giving a short summary of the data.
Okay, so it is mainly
focused upon the characteristics of data.
It also provides a graphical summary of the data. Now, in order to make you understand what descriptive statistics is, let's suppose that you want to gift all your classmates a t-shirt, so you need to study the average shirt size of a student in the classroom. If you were to use descriptive statistics to study the average shirt size of students in your classroom, then what you would do is record the shirt size of all the students in the class, and then you would find out the maximum, minimum and average shirt size of the class.
Okay.
So coming to inferential statistics: inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population.
Okay.
So in simple words,
it generalizes a large data set and it applies probability
to draw a conclusion.
Okay.
So it allows you to infer data parameters
based on a statistical model by using sample data.
So if we consider the same example of finding the average shirt size of students in a class, in inferential statistics you will take a sample set of the class, which is basically a few people from the entire class. All right, you would have grouped the class into large, medium and small. All right, in this method you basically build a statistical model and extend it to the entire population in the class.
So guys, there was a brief understanding of descriptive
and inferential statistics.
So that's the difference between descriptive and inferential statistics. Now, in the next section, we will go in depth into descriptive statistics. All right, so let's discuss more about descriptive statistics.
So like I mentioned
earlier descriptive statistics is a method
that is used to describe and understand the features
of a specific data set by giving short summaries about the sample
and measures of the data.
There are two important measures in descriptive statistics.
We have measure of central tendency,
which is also known as measure
of center and we have measures of variability.
This is also known as measures of spread.
So measures of center include mean, median and mode. Now, what are measures of center? Measures of center are statistical measures that represent the summary of a data set. Okay, the three main measures of center are mean, median and mode. Coming to measures of variability, or measures of spread, we have range, interquartile range, variance and standard deviation. All right. So now let's discuss each of these measures in a little more depth, starting with the measures of center.
Now, I'm sure all of you know what the mean is: the mean is basically the measure of the average of all the values in a sample. Okay, so it's basically the average of all the values in a sample. How do you measure the mean? I hope all of you know how the mean is measured: if there are 10 numbers and you want to find the mean of these 10 numbers, all you have to do is add up all the 10 numbers and divide by n, where n represents the number of samples in your data set. All right, since we have 10 numbers, we're going to divide by 10. All right, this will give us the average, or the mean. So to better understand the measures
of central tendency.
Let's look at an example.
Now the data set over here is basically the cars data set
and it contains a few variables.
All right, it has something known as cars; it has mileage per gallon, cylinder type, displacement, horsepower and rear axle ratio. All right, all of these measures are related to cars.
Okay.
So what you're going to do is you're going
to use descriptive analysis
and you're going to analyze each of the variables
in the sample data set
for the mean standard deviation median mode and so on.
So let's say that you want to find out the mean, or the average, horsepower of the cars among the population of cars. Like I mentioned earlier, what you'll do is check the average of all the values. So in this case, we will take the sum of the horsepower of each car and we'll divide that by the total number of cars.
Okay, that's exactly
what I've done here in the calculation part.
So this hundred and ten basically
represents the horsepower for the first car.
Alright, similarly.
I've just added up all the values of horsepower
for each of the cars
and I've divided it by 8; now, 8 is basically the number of cars in our data set. All right, so 103.625 is what our mean, or average horsepower, is. All right.
Now, let's understand what median is with an example?
Okay.
So to define the median: the measure of the central value of the sample set is called the median.
All right, you can see that it is a middle value.
So if we want to find out the center value
of the mileage per gallon among the population
of cars first,
what we'll do is arrange the MPG values in ascending or descending order and choose the middle value. Right, in this case,
since we have eight values, right?
We have eight values which is an even entry.
So whenever you have even number of data points
or samples in your data set,
then you're going to take the average
of the two middle values.
If we had nine values over here.
We can easily figure out the middle value
and you know choose that as a median.
But since they're even number of values we're going
to take the average of the two middle values.
All right, so,
22.8 and 23 are my two middle values, and I'm taking the mean of those two, and hence I get 22.9, which is my median.
All right.
Lastly let's look at how mode is calculated.
So what is mode the value
that is most recurrent
in the sample set is known as mode or basically the value
that occurs most often.
Okay, that is known as mode.
So let's say
that we want to find out the most common type of cylinder among the population of cars, all we have to do is check the value which is repeated the most number of times. Here,
We can see that the cylinders come in two types.
We have cylinder of Type 4 and cylinder of type 6, right?
So take a look at the data set.
You can see that the most recurring value is 6 right.
Counting them up, we have one, two, three, four, five: five cars with 6-type cylinders, and we have one, two, three: three cars with 4-type cylinders. So basically we have three 4-type cylinders and five 6-type cylinders.
All right.
So our mode is going to be 6, since 6 is more recurrent than 4. So guys,
those were the measures
of the center or the measures of central tendency.
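Although the demo later in this module is in R, here is a quick Python sketch of these three measures, using a small made-up horsepower list just to illustrate the mechanics:

```python
import statistics

# Hypothetical horsepower values for a handful of cars (illustrative only)
horsepower = [110, 110, 93, 110, 175, 105, 245, 62]

print(statistics.mean(horsepower))    # average of all values
print(statistics.median(horsepower))  # middle value of the sorted list
print(statistics.mode(horsepower))    # most frequently occurring value
```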
Now, let's move on and look at the measures of the spread.
All right.
Now, what is a measure of spread? A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population.
Okay, you can think of it as some sort
of deviation in the sample.
All right.
So you measure this
with the help of the different measure of spreads.
We have range interquartile range variance
and standard deviation.
Now range is pretty self-explanatory, right?
It is a measure of how spread apart the values in a data set are. The range can be calculated as shown in this formula: you're basically going to subtract the minimum value from the maximum value in your data set. That's how you calculate the range of the data.
Alright, next we have the interquartile range. So before we discuss the interquartile range, let's understand what a quartile is, right? So quartiles basically tell us about the spread of a data set by breaking the data set into different quarters. Okay, just like how the median breaks the data into two parts, the quartiles will break it into different quarters. So to better understand how the quartiles and the interquartile range are calculated, let's look at a small example.
Now, this data set basically represents the marks of a hundred students, ordered from the lowest to the highest scores, right? So the quartiles lie in the following ranges: the first quartile, which is also known as Q1, lies between the 25th and 26th observations. All right. So if you look at this, I've highlighted the 25th and the 26th observations. The way you can calculate Q1, or the first quartile, is by taking the average of these two values. Alright, since both the values are 45, when you add them up and divide them by two you'll still get 45. Now, the second quartile, or Q2, is between the 50th and the 51st observations, so you're going to take the average of 58 and 59 and you will get a value of 58.5; this is my second quartile. The third quartile, Q3, is between the 75th and the 76th observations. Here again we'll take the average of the two values, which are the 75th value and the 76th value, right, and you'll get a value of 71.
All right, so guys, this is exactly how you calculate the different quartiles. Now, let's look at what the interquartile range is. So the IQR, or interquartile range, is a measure of variability based on dividing a data set into quartiles. Now, the interquartile range is calculated by subtracting Q1 from Q3, so your IQR is Q3 minus Q1. All right. Now, this is how each of the quartiles is found, and each quartile represents a quarter, which is 25%. All right.
So guys, I hope all of you are clear on the interquartile range and what quartiles are.
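A small Python sketch of the same idea, with a made-up list of marks (note that numpy's percentile interpolation may differ slightly from the averaging-by-hand method described above):

```python
import numpy as np

# Hypothetical exam marks (illustrative only)
marks = np.array([35, 40, 45, 45, 50, 55, 58, 59, 62, 65, 70, 71, 75, 80, 90])

q1 = np.percentile(marks, 25)   # first quartile
q2 = np.percentile(marks, 50)   # second quartile (the median)
q3 = np.percentile(marks, 75)   # third quartile
iqr = q3 - q1                   # interquartile range

print(q1, q2, q3, iqr)
```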
Now, let's look at variance. Variance is basically a measure that shows how much a random variable differs from its expected value. Okay. It's basically the variability in any variable. Now, variance
can be calculated by using this formula. Right here, x basically represents any data point in your data set, n is the total number of data points in your data set, and x bar is basically the mean of the data points. All right, this is how you calculate variance. Variance basically computes the squares of the deviations; okay, that's why it says s squared there. Now, let's look at what deviation is. Deviation is just the difference between each element and the mean.
Okay, so it can be calculated by using this simple formula, where x i basically represents a data point and mu is the mean of the population. All right, this is exactly how you calculate the deviation. Now, population variance and sample variance are very specific to whether you're calculating the variance of your population data set or of your sample data set; that is the only difference between population and sample variance. So the formula for population variance is pretty self-explanatory: x is basically each data point, mu is the mean of the population, and n is the number of samples in your data set. All right.
Now, let's look at sample variance. Sample variance is the average of the squared differences from the mean. All right, here x i is any data point or any sample in your data set, and x bar is the mean of your sample. All right, it's not the mean of your population, it's the mean of your sample. And if you notice, the n here is a smaller n: it is the number of data points in your sample. And this is basically the difference between sample and population variance. I hope that is clear. Coming to standard deviation: standard deviation is the measure of the dispersion of a set of data from its mean.
All right, so it's basically the deviation from your mean; that's what standard deviation is. Now, to better understand how the measures of spread are calculated, let's look at a small use case. So let's say that Daenerys has 20 dragons; they have the numbers 9, 2, 5, 4 and so on, as shown on the screen, and what you have to do is work out the standard deviation. All right, in order to calculate the standard deviation, you need to know the mean, right?
So first you're going to find out the mean of your sample set. How do you calculate the mean? You add all the numbers and divide by the total number of samples in your data set, so you get a value of 7 here. Then you calculate the RHS of your standard deviation formula. All right, so from each data point you're going to subtract the mean and you're going to square that. All right. So when you do that, you will get the following result: you'll basically get 4, 25, 4, 9 and so on. So finally you will just find the mean of the squared differences, and your standard deviation will come out to 2.983 once you take the square root.
So guys, this is pretty simple.
It's a simple mathematic technique.
All you have to do is you have to substitute the values
in the formula.
All right.
I hope this was clear to all of you.
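The same calculation in Python: the full list of 20 dragon numbers is only partly visible in the transcript, so the list below is a stand-in whose first values match those read out and whose mean happens to be 7, purely to show the mechanics.

```python
import math

# Dragon numbers: the first few match those read out; the rest are assumed for illustration
values = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

mean = sum(values) / len(values)                       # mean = 7
squared_diffs = [(x - mean) ** 2 for x in values]      # 4, 25, 4, 9, ...
variance = sum(squared_diffs) / len(values)            # population variance
std_dev = math.sqrt(variance)                          # about 2.983

print(mean, variance, std_dev)
```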
Now let's move on
and discuss the next topic which is Information Gain
and entropy now.
This is one of my favorite topics in statistics.
It's very interesting and this topic is mainly involved
in machine learning algorithms,
like decision trees and random forest.
All right, it's very important
for you to know
how Information Gain and entropy really work and why they are
so essential in building machine learning models.
We'll focus on the statistics part of Information Gain and entropy, and after that we'll discuss a use case and see how Information Gain and entropy are used in decision trees.
So for those of you
who don't know what a decision tree is it is
basically a machine learning algorithm.
You don't have to know anything about this.
I'll explain everything in depth.
So don't worry.
Now.
Let's look at what exactly entropy and Information Gain are. So entropy is basically the measure of any sort of uncertainty that is present in the data.
All right, so it can be measured by using this formula. So here, S is the set of all instances in the data set, or all the data items in the data set, N is the number of different classes in your data set, and p i is the event probability. Now, this might seem a little confusing to you all, but when we get to the use case you'll understand all of these terms even better.
All right, coming to Information Gain: as the word suggests, Information Gain indicates how much information a particular feature or a particular variable gives us about the final outcome. Okay, it can be measured by using this formula. So again, here H(S) is the entropy of the whole data set S, Sj is the number of instances with the j-th value of an attribute A, S is the total number of instances in the data set, V is the set of distinct values of an attribute A, H(Sj) is the entropy of the subset of instances, and H(A, S) is the entropy of attribute A. Even though this seems confusing,
I'll clear out the confusion.
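For reference, the formulas being read out here appear to be the standard definitions, which can be written as:

```latex
H(S) = \sum_{i=1}^{N} -p_i \log_2 p_i

IG(A, S) = H(S) - \sum_{v \in V} \frac{|S_v|}{|S|}\, H(S_v)
```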
All right, let's discuss a small problem statement
where we will understand
how Information Gain
and entropy is used to study the significance of a model.
So like I said Information Gain
and entropy are very important statistical measures
that let us understand
the significance of a predictive model.
Okay to get a more clear understanding.
Let's look at a use case.
All right now suppose we are given a problem statement.
All right, the statement is that you have to predict
whether a match can be played
or Not by studying the weather conditions.
So the predictor variables here are Outlook, Humidity and Wind; Day is also a predictor variable. The target variable is basically Play. All right, the target variable is the variable that you're trying to predict. Okay. Now, the value of the target variable will decide whether or not a game can be played. All right, so that's why Play has two values, no and yes: no,
meaning that the weather conditions are not good.
And therefore you cannot play the game.
Yes, meaning that the weather conditions are good and suitable
for you to play the game.
Alright, so that was a problem statement.
I hope the problem statement is clear to all of you now
to solve such a problem.
We make use of something known as decision trees.
So guys think of an inverted tree
and each branch of the tree denotes some decision.
All right, each branch is known as a branch node,
and at each branch node,
you're going to take a decision in such a manner
that you will get an outcome at the end of the branch.
All right.
Now, this figure here basically shows that out of 14 observations, 9 observations result in a yes, meaning that out of 14 days, the match can be played only on nine days. Alright, so here, if you see, on day 1, day 2, day 8, day 9 and day 11, the Outlook has been sunny. All right, so basically we try to split the data set depending on the Outlook: when the Outlook is sunny, this is our data set, and this is what we have when the Outlook is overcast and when the Outlook is rain.
All right, so
when it is sunny we have two yeses and three nos. Okay, when the Outlook is overcast, we have all four as yeses, meaning that on the four days when the Outlook was overcast, we can play the game. All right. Now, when it comes to rain, we have three yeses and two nos.
All right.
So if you notice here,
the decision is being made by choosing the Outlook variable
as the root node.
Okay.
So the root node is
basically the topmost node in a decision tree.
Now, what we've done here is we've created a decision tree
that starts with the Outlook node.
All right, then you're splitting the decision tree further
depending on other parameters like Sunny overcast and rain.
All right, now, as we know, Outlook has three values: sunny, overcast and rain. So let me explain this
in a more in-depth manner.
Okay.
So what you're doing here is you're making
the decision Tree by choosing the Outlook variable
at the root node.
The root node is basically the topmost node
in a decision tree.
Now the Outlook node has three branches coming out from it,
which are sunny, overcast and rain. So basically Outlook can have three values: either it can be sunny, it can be overcast, or it can be rainy. Okay, now these three values are assigned to the immediate branch nodes, and for each of these values the possibility of Play equals yes is calculated.
So the sunny
and the rain branches will give you an impure output.
Meaning that there is a mix of yes and no, right? There are two yeses and three nos here, and there are three yeses and two nos over here, but when it comes to the overcast value, it results in a hundred percent pure subset. All right, this shows that the overcast value will result in a definite and certain output. This is exactly what entropy is used to measure. All right, it calculates the impurity or the uncertainty. Alright, so the lesser the uncertainty, or the entropy, of a variable, the more significant that variable is. So when it comes to overcast, there's literally no impurity in the data set; it is a hundred percent pure subset, right? So we want variables like these in order to build a model.
All right, now, we don't always get lucky, and we don't always find variables that will result in pure subsets. That's why we have the measure entropy. So the lesser the entropy of a particular variable, the more significant that variable will be. So in a decision tree, the root node is assigned the best attribute, so that the decision tree can predict the most precise outcome, meaning that at the root node you should have the most significant variable. All right, that's why we've chosen Outlook. And now some of you might ask me, why haven't you chosen overcast? Okay, overcast is not a variable.
It is a value of the Outlook variable.
All right.
That's why we've chosen outlook here because it has
a hundred percent pure subset which is overcast.
All right.
Now, the question in your head is: how do I decide which variable or attribute best splits the data? Now, right now I looked at the data and I told you that, you know, here we have a hundred percent pure subset; but what if it's a more complex problem and you're not able to understand which variable will best split the data? So guys, when it comes to decision trees, Information Gain and entropy will help you understand which variable will best split the data set, or which variable you have to assign to the root node, because whichever variable is assigned to the root node will best split the data set, and it has to be the most significant variable. All right. So the way we can do this is by using Information Gain and entropy.
So from the total of the 14 instances that we saw, nine of them said yes and 5 of the instances said no, meaning that you cannot play on those particular days. All right. So how do you calculate the entropy? This is the formula; you just substitute the values into it. So when you substitute the values in the formula, you will get a value of 0.940. All right. This is the entropy, or the uncertainty, of the data present in the sample.
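As a quick check of that number, here is a small Python sketch of the entropy calculation:

```python
import math

def entropy(counts):
    """Entropy of a class distribution given the counts per class."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 9 'yes' days and 5 'no' days out of 14
print(entropy([9, 5]))  # about 0.940
```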
Now, in order to ensure that we choose the best variable for the root node, let us look at all the possible combinations that you can use at the root node. Okay, so these are all the possible combinations: you can either have Outlook, or you can have Windy, Humidity or Temperature. Okay, these are the four variables, and you can have any one of these variables as your root node.
of these variables as your root node.
But how do you select
which variable best fits the root node?
That's what we are going to see by using
Information Gain and entropy.
So guys now the task at hand is to find the information gain
for each of these attributes.
All right.
So for Outlook, for Windy, for Humidity and for Temperature, we're going to find out the Information Gain. Right, now a point to remember is that the variable that results in the highest Information Gain must be chosen, because it will give us the most precise output information. All right. We'll calculate the Information Gain for the attribute Windy first. Here we have six instances of true and eight instances of false. Okay. So when you substitute all the values in the formula, you will get a value of 0.048. So we get a value of 0.048.
Now.
This is a very low value for Information Gain.
All right, so the information
that you're going to get from Windy attribute is pretty low.
So let's calculate the information gain
of attribute Outlook.
All right, so from the total of 14 instances, we have five instances which say Sunny, four instances which are Overcast and five instances which are Rainy. All right, for Sunny we have three yeses and two nos, for Overcast we have all the four as yes, and for Rainy we have three yeses and two nos.
Okay.
So when you calculate the information gain of the Outlook variable, you will get a value of 0.247. Now compare this to the information gain of the Windy attribute; this value is actually pretty good. Right, we have 0.247, which is a pretty good value
for Information Gain.
Now, let's look at the information gain
of attribute humidity now over here.
We have seven instances with say hi and seven instances
with say normal.
Right, and under the high branch node we have three instances which say yes, and the rest, four instances, say no. Similarly, under the normal branch we have six instances which say yes and one instance which says no.
All right.
So when you calculate the information gain
for the humidity variable,
you're going to get a value of 0.151.
Now.
This is also a pretty decent value,
but when you compare it to the Information Gain of the attribute Outlook, it is less. Right, now
Let's look at the information gain of attribute temperature.
All right, so the temperature attribute can hold three values. So basically the temperature attribute can hold hot, mild and cool. Okay, under hot we have two instances which say yes and two instances which say no; under mild we have four instances of yes and two instances of no; and under cool we have three instances of yes
and one instance of no.
All right.
When you calculate
the information gain for this attribute,
you will get a value of 0.029,
which is again very less.
So what you can summarize from here is, if we look at the information gain for each of these variables, we'll see that for Outlook we have the maximum gain. All right, we have 0.247,
which is the highest Information Gain value
and you must always choose a variable with the highest
Information Gain to split the data at the root node.
So that's why we assign The Outlook variable
at the root node.
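To make these numbers concrete, here is a minimal Python sketch, not the course's own demo, that reproduces the entropy and information gain values quoted above. The helper names entropy and information_gain are just illustrative, and the Windy split counts (3 yes / 3 no for true, 6 yes / 2 no for false) are assumed from the classic play-tennis data.

```python
# Minimal sketch: entropy and information gain for the play-tennis style counts.
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, splits):
    """Information gain = parent entropy minus the weighted entropy of the splits."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in splits)
    return entropy(parent_counts) - weighted

# 9 yes / 5 no in the full data set
print(round(entropy([9, 5]), 3))                                      # ~0.940

# Outlook splits the data into Sunny, Overcast and Rainy subsets
print(round(information_gain([9, 5], [[3, 2], [4, 0], [3, 2]]), 3))   # ~0.247

# Windy splits the data into true and false subsets (assumed counts)
print(round(information_gain([9, 5], [[3, 3], [6, 2]]), 3))           # ~0.048
```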
All right, so guys.
I hope this use case was clear. If any of you have doubts,
Please keep commenting those doubts now,
let's move on and look at what exactly a confusion matrix is. The confusion matrix is the last topic for descriptive statistics. Right, after this
I'll be running a short demo where I'll be showing you
how you can calculate mean, median, mode, standard deviation, variance and all of those values by using R. Okay.
So let's talk about confusion Matrix now guys.
What is the confusion Matrix now don't get confused.
This is not any complex topic now confusion.
Matrix is a matrix
that is often used to describe the performance of a model.
All right, and this is specifically used
for classification models
or a classifier
and what it does is it will calculate the accuracy
or it will calculate the performance of your classifier
by comparing your actual results and Your predicted results.
All right.
So this is what it looks like: true positive, true negative and all of that. Now this is a little confusing. I'll get back to what exactly true positive, true negative and all of this stands for. For now,
Let's look at an example and let's try and understand what
exactly confusion Matrix is.
So guys, I have made sure
that I put examples after each and every topic
because it's important you
understand the Practical part of Statistics.
All right statistics has literally nothing to do
with Theory you need to understand how Calculations
are done in statistics.
Okay.
So here what I've done is now let's look at a small use case.
Okay, let's consider that you're given data about 165 patients, out of which 105 patients have a disease and the remaining 60 patients don't have a disease.
Okay.
So what you're going to do is you will build a classifier
that predicts by using
these hundred and sixty five observations.
You'll feed all
of these 165 observations to your classifier
and it will predict the output every time
a new patients detail is fed to the classifier right now
out of these 165 cases.
Let's say that the classifier predicted yes 110 times and no 55 times.
Alright, so yes basically stands for yes.
The person has a disease and no stands for know.
The person does not have a disease.
All right, that's pretty self-explanatory.
But yeah, so it predicted 110 times that the patient has a disease and 55 times
that know the patient doesn't have a disease.
However in reality only hundred and five patients
in the sample have the disease and 60 patients
who do not have the disease, right?
So how do you calculate the accuracy of your model?
You basically build the confusion Matrix?
All right.
This is how the matrix looks. n basically denotes the total number of observations that you have, which is 165 in our case; actual denotes the actual values in the data set and predicted denotes the values predicted by the classifier.
So the actual value is no here
and the predicted value is no here.
So your classifier was correctly able
to classify 50 cases as no.
All right, since both of these are no, it was correctly able to classify those 50, but 10 of these cases it incorrectly classified, meaning that your actual value here is no but your classifier predicted it as yes. All right, that's why this 10 is over here. Similarly,
it wrongly predicted
that five patients do not have diseases
whereas they actually did have diseases
and it correctly predicted hundred patients,
which have the disease.
All right.
I know this is a little bit confusing.
But if you look at these values: No/No 50, meaning that it correctly predicted 50 values; No/Yes means that it wrongly predicted yes for values where it was supposed to predict no.
All right.
Now what exactly is this true positive, true negative and all of that?
I'll tell you what exactly it is.
So true positives are the cases in which we predicted a yes and they actually do have the disease; in our matrix that is the 100. True negatives are the cases where we predicted no and they really don't have the disease, which is the 50, meaning that this is correct. A false positive is where we predicted yes but they do not actually have the disease; that's the 10 over here, and this is also known as a type 1 error. A false negative is where we predicted no but they actually do have the disease; that's the 5, also known as a type 2 error. So guys, basically true positives and true negatives are the correct classifications.
All right.
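Here is a small illustrative sketch of this patient example, assuming scikit-learn is available; the label vectors are made up so that they reproduce the counts quoted above (TP = 100, TN = 50, FP = 10, FN = 5).

```python
# Build the confusion matrix and accuracy for the 165-patient example.
from sklearn.metrics import confusion_matrix, accuracy_score

# 1 = has the disease, 0 = does not
actual    = [1] * 105 + [0] * 60                          # 105 sick, 60 healthy
predicted = [1] * 100 + [0] * 5 + [1] * 10 + [0] * 50     # TP=100, FN=5, FP=10, TN=50

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(tn, fp, fn, tp)                        # 50 10 5 100
print(round(accuracy_score(actual, predicted), 3))   # (100 + 50) / 165 ≈ 0.909
```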
So this was confusion Matrix
and I hope this concept is clear again guys.
If you have doubts,
please comment your doubt in the comment section.
So guys, that was the entire descriptive statistics module, and now we will discuss probability.
Okay.
So before we understand what exactly probability is,
let me clear out a very common misconception people
often tend to ask me this question.
What is the relationship between statistics and probability?
So probability and statistics are related fields.
All right.
So probability is a mathematical method used
for statistical analysis.
Therefore we can say that probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events. So they're very interconnected fields; probability makes use of statistics and statistics makes use of probability. All right, they're very interconnected fields.
So that is the relationship between statistics
and probability.
Now, let's understand what exactly is probability.
So probability is the measure of how likely an event is to occur. To be more precise, it is the ratio of desired outcomes to the total outcomes.
Now, the probabilities of all outcomes always sum up to 1; the probability will always sum up to 1 and a probability cannot go beyond one. Okay. So your probability can be 0, or it can be 1, or it can be in the form of decimals like 0.52 or 0.55, or it can be 0.5, 0.7, 0.9. But its value will always stay between the range 0 and 1.
Okay, a famous example of probability is the rolling a dice example. So when you roll a dice you get six possible outcomes, right? You get the one, two, three, four, five, six faces of a dice. Now each possibility only has one outcome.
So what is the probability that on rolling a dice you will get 3? The probability is 1 by 6, right, because there's only one face which has the number 3 on it out of six faces. There's only one face which has the number three.
So the probability of getting 3
when you roll a dice is 1 by 6 similarly,
if you want to find the probability of getting
a number 5 again,
the probability is going to be 1 by 6.
All right, so all of this will sum up to 1.
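Just to check that arithmetic, here is a tiny Python sketch (not part of the course material) that computes the dice probabilities and verifies they sum to 1.

```python
# Verify the dice probabilities: P(3) = 1/6 and all outcomes sum to 1.
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
p_three = Fraction(sum(1 for o in outcomes if o == 3), len(outcomes))
print(p_three)                                             # 1/6
print(sum(Fraction(1, len(outcomes)) for _ in outcomes))   # 1
```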
All right, so guys this is exactly what probability is.
It's a very simple concept; we all learnt it from 8th standard onwards. Right, now
Let's understand the different terminologies
that are related to probability.
Now the three terminologies that you often come across
when We talk about probability.
We have something known as the random experiment.
Okay, it's basically an experiment or a process
for which the outcomes cannot be predicted with certainty.
All right.
That's why you use probability.
You're going to use probability in order to predict the outcome
with some sort
of certainty sample space is the entire possible set
of outcomes of a random experiment an event is
one or more outcomes of an experiment.
So if you consider the example of rolling a dice.
Now.
Let's say that you want to find out the probability of getting a two when you roll the dice.
Okay.
So finding this probability is the random experiment
the sample space is basically your entire possibility.
Okay.
So one two, three, four,
five six phases are there and out of that you need
to find the probability of getting a 2, right.
So all the possible outcomes
will basically represent your sample space.
Okay.
So 1 to 6 are all your possible outcomes; this represents your sample space. An event is one or more outcomes of an experiment. So in this case my event is to get a two when I roll a dice, right? So my event is the probability of getting a two when I roll a dice.
So guys, this is basically what random experiment sample space
and event really means alright.
Now, let's discuss the different types of events.
There are two types of events that you should know about: there are disjoint and non-disjoint events. Disjoint events are events that do not have any common outcome.
For example,
if you draw a single card from a deck of cards,
it cannot be a king and a queen correct.
It can either be king or it can be Queen.
Now a non disjoint events are events
that have common outcomes.
For example, a student can get hundred marks
in statistics and hundred marks in probability.
All right, and also the outcome of a ball delivered can be a no ball and it can be a six, right? So this is what non-disjoint events are.
These are very simple to understand right now.
Let's move on and look at the different types
of probability distribution.
All right, I'll be discussing
the three main probability distribution functions.
I'll be talking about probability density
function normal distribution and Central limit theorem.
Okay probability density function also known
as PDF is concerned
with the relative likelihood for a continuous random variable.
To take on a given value.
All right.
So the PDF gives the probability
of a variable that lies between the range A and B.
So basically what you're trying to do is you're going to try
and find the probability of a continuous random variable
over a specified range.
Okay.
Now this graph denotes the PDF of a continuous variable.
Now, this graph is also known as the bell curve right?
It's famously called the bell curve because of
its shape, and there are three important properties that you need to know about a probability density function.
Now the graph of a PDF will be continuous over a range.
This is because you're finding the probability
that a continuous variable lies between the ranges A and B,
right the second property is
that the area bounded by the curve of a density function
and the x-axis is equal to 1 basically the area
below the curve is equal to 1 all right,
because it denotes probability; again, the probability cannot range more than one, it has to be between 0 and 1. Property number three is that the probability
that our random variable assumes a value between A
and B is equal to the area
under the PDF bounded by A and B. Okay.
Now what this means is
that the probability value is denoted by the area
of the graph.
All right, so whatever value that you get here, which is basically this area, is the probability that a random variable will lie between the range A and B.
All right, so I hope all of you have understood the probability density function;
it's basically the probability of finding the value
of a continuous random variable between the range A and B.
All right.
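As a hedged sketch of property three, here is a short Python example (assuming SciPy is available, and using the standard normal with a = -1 and b = 1 purely as an illustration): the probability that the variable lies between a and b equals the area under the PDF, which matches the difference of the CDF values.

```python
# Area under a normal PDF between a and b equals P(a <= X <= b).
from scipy.stats import norm
from scipy.integrate import quad

a, b = -1.0, 1.0
area, _ = quad(norm.pdf, a, b)                # integrate the PDF from a to b
print(round(area, 4))                         # ~0.6827
print(round(norm.cdf(b) - norm.cdf(a), 4))    # same value via the CDF
```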
Now, let's look at our next distribution,
which is normal distribution now normal distribution,
which is also known as
the gaussian distribution is a probability distribution
that denotes the symmetric property
of the mean right meaning
that the idea behind this function is
that The data near the mean
occurs more frequently than the data away from the mean.
So what it means to say is
that the data around the mean represents the entire data set.
Okay.
So if you just take a sample of data
around the mean it can represent the entire data set now similar
to the probability density function the normal distribution
appears as a bell curve.
All right.
Now when it comes to normal distribution,
there are two important factors.
All right, we have the mean of the population.
And the standard deviation.
Okay, so the mean determines the location of the center of the graph, right, and the standard deviation determines the height and spread of the graph.
Okay.
So if the standard deviation is large the curve is going
to look something like this.
All right, it'll be short and wide and
if the standard deviation is small the curve
is tall and narrow.
All right.
So this was it about normal distribution.
Now, let's look at the central limit theorem.
Now the central limit theorem states
that the sampling distribution
of the mean of any independent random variable will be normal
or nearly normal
if the sample size is large enough now,
that's a little confusing.
Okay.
Let me break it down for you now in simple terms
if we had a large population
and we divided it into many samples.
Then the mean of all the samples
from the population will be almost equal
to the mean of the entire population right meaning
that each of the sample is normally distributed.
Right.
So if you compare the mean of each of the sample,
it will almost be equal to the mean of the population.
Right?
So this graph basically gives a clearer understanding of the central limit theorem. Here you can see each sample, and the mean
of each sample is almost along the same line, right?
Okay.
So this is exactly
what the central limit theorem States now the accuracy
or the resemblance
to the normal distribution depends on two main factors.
Right.
So the first is the number of sample points
that you consider.
All right,
and the second is a shape of the underlying population.
Now the shape obviously depends on the standard deviation
and the mean of a sample, correct.
So guys the central limit theorem basically states
that each sample will be normally distributed
in such a way
that the mean of each sample will coincide with the mean
of the actual population.
All right in short terms.
That's what central limit theorem States.
Alright, and this holds true only for a large data set. For a small data set there are more deviations when compared to a large data set, and this is because of the scaling factor, right? The smallest deviation in a small data set will change the value very drastically, but in a large data set a small deviation will not matter at all.
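Here is a small simulation sketch of the central limit theorem described above. The population is deliberately non-normal (an exponential distribution) and the sample size of 50 is only an assumed example; the mean of the sample means still lands close to the population mean.

```python
# Central limit theorem: sample means cluster around the population mean.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)    # skewed, non-normal population

sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

print(round(population.mean(), 3))        # population mean, ~2.0
print(round(np.mean(sample_means), 3))    # mean of the sample means, close to it
# A histogram of sample_means would look approximately bell shaped.
```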
Now, let's move on and look at our next topic
which is the different types of probability.
Now, this is an important topic
because most of your problems can be solved by understanding
which type of probability should I use to solve this problem, right?
So we have three important types of probability.
We have marginal joint and conditional probability.
So let's discuss each
of these now the probability of an event occurring unconditioned
on any other event is known as marginal probability
or unconditional probability.
So let's say that you want to find the probability
that a card drawn is a heart.
All right.
So if you want to find the probability
that a card drawn is a heart, the probability will be 13 by 52, since there are 52 cards in a deck and there are 13 hearts in a deck of cards. Right, and there are 52 cards in a total deck. So your marginal probability will be 13 by 52.
That's about marginal probability.
Now, let's understand.
What is joint probability.
Now joint probability
is a measure of two events happening at the same time.
Okay.
Let's say that the two events are A and B.
So the probability of event A and B occurring is the intersection of A and B.
So for example,
if you want to find the probability
that a card is a four and a red that would be joint probability.
All right, because you're finding a card
that is 4 and the card has to be red in color.
So for the answer,
this will be 2 by 52, because we have one 4 in hearts and we have one 4 in diamonds, correct? So both of these are red in color; therefore our probability is 2 by 52, and if you reduce it further down it is 1 by 26, right?
So this is what joint probability is all
about moving on.
Let's look at what exactly conditional probability is.
So if the probability
of an event or an outcome is based on the occurrence
of a previous event or an outcome,
then you call it as a conditional probability.
Okay.
So the conditional probability of an event B is the probability
that the event will occur given
that an event a has already occurred, right?
So if A and B are dependent events, then the expression for conditional probability is given by this: P(B|A) = P(A and B) / P(A). Now this first term on the left-hand side, which is P(B|A), is basically the probability of event B occurring given that event A has already occurred. All right. So like I said, if A and B are dependent events, then this is the expression, but if A and B are independent events, then the expression for conditional probability is simply P(B|A) = P(B), right? So guys, P(A) and P(B) are obviously the probability of A and the probability of B.
Let's move on now in order
to understand conditional probability joint probability
and marginal probability.
Let's look at a small use case.
Okay, now basically we're going to take a data set which examines the salary package and training undergone by candidates.
Okay.
Now in this there are 60 candidates without training and 45 candidates who have enrolled for Edureka's training.
Right.
Now the task here is you have to assess the training
with a salary package.
Okay, let's look at this in a little more depth.
So in total,
we have 105 candidates, out of which 60 of them have not enrolled for Edureka's training and 45 of them have enrolled for Edureka's training. All right, this is a small survey that was conducted and this is the rating of the package or the salary that they got, right?
So if you read through the data,
you can understand there were five candidates without Edureka training who got a very poor salary package. Okay. Similarly, there are 30 candidates with Edureka training
who got a good package, right?
So guys basically you're comparing the salary package
of a person depending on
whether or not they've enrolled for Edureka training, right?
This is our data set.
Now, let's look at our problem statement: find the probability that a candidate has undergone Edureka's training. Quite simple. Which type of probability is this? This is marginal probability.
Right?
So the probability that a candidate has undergone Edureka's training is obviously 45 divided by 105, since 45 is the number of candidates with Edureka training and 105 is the total number of candidates. So you get a value of approximately 0.42. All right, that's the probability of a candidate that has undergone Edureka's training. Next question:
find the probability that a candidate has attended Edureka's training and also has a good package.
Now.
This is obviously a joint probability problem, right?
So how do you calculate this now?
Since our table is quite well formatted, we can directly find that the people who have gotten a good package along with Edureka training are 30, right? So out of 105 people, 30 people have Edureka training and a good package, right? They're specifically asking for people with Edureka training. Remember that, right? The question is: find the probability that a candidate has attended Edureka's training and also has a good package.
All right, so we need to consider two factors, that is, a candidate who has attended Edureka's training and who has a good package. So clearly that number is 30; 30 divided by the total number of candidates, which is 105, right?
So here you get the answer clearly next.
We have find the probability
that a candidate has a good package given
that he has not undergone training.
Okay.
Now this is clearly conditional probability, because here you're defining a condition. You're saying that you want to find the probability of a candidate who has a good package given that he has not undergone any training, right?
All right.
So the number of people who have not undergone training are 60, and out of that five of them have got a good package. So that's why this is 5 by 60 and not 5 by 105, because here they have clearly mentioned "has a good package given that he has not undergone training". So you have to only consider people who have not undergone training, right? So only five people who have not undergone training have gotten a good package, right? So 5 divided by 60, you get a probability of around 0.08,
which is pretty low, right?
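Here is a minimal sketch of the three probabilities from this use case, using only the counts quoted in the example (105 total, 45 trained, 30 trained with a good package, 60 untrained, 5 untrained with a good package); the variable names are just for illustration.

```python
# Marginal, joint and conditional probability from the training/salary table.
total = 105
trained = 45
trained_and_good = 30
untrained = 60
untrained_and_good = 5

p_trained = trained / total                              # marginal, ~0.4286
p_trained_and_good = trained_and_good / total            # joint, ~0.2857
p_good_given_untrained = untrained_and_good / untrained  # conditional, ~0.0833

print(round(p_trained, 2), round(p_trained_and_good, 2), round(p_good_given_untrained, 2))
```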
Okay.
So this was all
about the different types of probability now,
let's move on and look at our last topic in probability, which is Bayes theorem. Now guys, Bayes theorem is a very important concept when it comes to statistics and probability. It is majorly used in the Naive Bayes algorithm. For those of you who aren't aware, Naive Bayes is a supervised learning classification algorithm and it is mainly used in Gmail spam filtering, right? A lot of you might have noticed that if you open up Gmail, you'll see that you have a folder called spam, right? All right, that is carried out through machine learning and the algorithm used there is Naive Bayes, right?
So now let's discuss what exactly the Bayes theorem is and what it denotes. The Bayes theorem is used to show the relation between one conditional probability and its inverse. All right. Basically, it's nothing but the probability of an event occurring based on prior knowledge of conditions that might be related to the same event. Okay. So mathematically the Bayes theorem is represented like this, right?
As shown in this equation, the term on the left-hand side is what is known as the posterior; it is referred to as the posterior, which means the probability of occurrence of A given an event B, right? The second term is referred to as the likelihood ratio, and this measures the probability of occurrence of B given an event A. Now P of A is also known as the prior, which refers to the actual probability distribution of A, and P of B is again the probability of B, right?
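As a hedged, generic sketch of that relation, P(A|B) = P(B|A) * P(A) / P(B), here is a tiny Python example; the numbers (a 90% likelihood, a 1% prior and a 10% positive rate) are made up purely for illustration and are not from the course.

```python
# Bayes theorem: posterior from likelihood, prior and evidence.
def bayes(p_b_given_a, p_a, p_b):
    """Return the posterior P(A|B)."""
    return p_b_given_a * p_a / p_b

posterior = bayes(p_b_given_a=0.9, p_a=0.01, p_b=0.10)
print(round(posterior, 3))   # 0.09
```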
That is the Bayes theorem, and in order to better understand the Bayes theorem, let's look at a small example.
Let's say that we have three bowls: we have bowl A, bowl B and bowl C. Okay, bowl A contains two blue balls and four red balls, bowl B contains eight blue balls and four red balls, and bowl C contains one blue ball and three red balls. Now if we draw one ball from each bowl, what is the probability of drawing a blue ball from bowl A, if we know that we drew exactly a total of two blue balls, right?
If you didn't understand the question,
please read it I shall pause for a second or two.
Right.
So I hope all of you have understood the question.
Okay.
Now what I'm going to do is I'm going to draw
a blueprint for you
and tell you how exactly to solve the problem.
But I want you all to give me the solution
to this problem, right?
I'll draw a blueprint.
I'll tell you what exactly the steps are
but I want you to come up with a solution
on your own right the formula is also given to you.
Everything is given to you.
All you have to do is come up with the final answer.
Right?
Let's look at how you can solve this problem.
So first of all,
what we will do is, let's consider A. All right, let A be the event of picking a blue ball from bag A, and let X be the event of picking exactly two blue balls,
right because these are the two events
that we need to calculate the probability of now
there are two probabilities that you need to consider here.
One is the event of picking a blue ball from bag a
and the other is the event of picking exactly two blue balls.
Okay.
So these two are represented by a and X respectively
and so what we want is the probability of occurrence
of event a given X,
which means that given
that we're picking exactly two blue balls.
What is the probability
that we are picking a blue ball from bag?
So by the definition of conditional probability,
this is exactly what our equation will look like.
Correct.
This is basically the occurrence of event A given event X, and this is the probability of A and X, and this is the probability of X alone, correct?
What we need to do is we need to find these two probabilities
which is probability of a and X occurring together
and probability of X. Okay.
This is the entire solution.
So how do you find the probability of X? This you can do in three ways: either blue from A and blue from B, or blue from A and blue from C, or blue from B and blue from C. Now, first is to find the probability of X; X basically represents the event of picking exactly two blue balls.
Right.
So these are the three ways in which it is possible.
So you'll pick one blue ball from bowl A and one from bowl B; in the second case, you can pick one from A and another blue ball from C; in the third case, you can pick a blue ball from bag B and a blue ball from bag C.
Right?
These are the three ways in which it is possible.
So you need to find the probability of each of these. Step two is that you need to find the probability of A and X occurring together. This is the sum of terms one and two. Okay, this is because in both of these events you're picking a blue ball from bag A, correct?
So go ahead, find out this probability and let me know your answer in the comment section.
All right.
We'll see if you get the answer right?
I gave you the entire solution to this.
All you have to do is substitute the value right?
If you want a second or two,
I'm going to pause on the screen so that you can go through this
in a more clearer way right?
Remember that you need to calculate two probabilities. The first probability that you need to calculate is the event of picking a blue ball from bag A given that you're picking exactly two blue balls. Okay, the second probability you need to calculate is the event of picking exactly two blue balls.
All right.
These are the two probabilities.
You need to calculate so remember that and this
is the solution.
All right, so guys,
make sure you mention your answers
in the comment section for now.
Let's move on and get to our next topic, which is inferential statistics.
So guys, we just completed the probability module right now.
We will discuss inferential statistics,
which is the second type of Statistics.
We discussed descriptive statistics earlier.
All right.
So like I mentioned earlier inferential statistics also
known as statistical inference is a branch of Statistics
that deals with forming inferences and predictions
about a population based on a sample of data.
Taken from the population.
All right, and the question you should ask is
how does one form inferences or predictions on a sample?
The answer is you use Point estimation?
Okay.
Now you must be wondering
what is point estimation. Point estimation is concerned
which serves as an approximate value
or the best estimate of an unknown population parameter.
That's a little confusing.
Let me break it down for you. For example, in order to calculate the mean of a huge population,
and then we find the sample mean
right the sample mean is then used to estimate
the population mean this is basically Point estimate,
you're estimating the value of one of the parameters
of the population, right?
Basically the main
you're trying to estimate the value of the mean.
This is what point estimation is the two main terms
in point estimation.
There's something known as the estimator and something known as the estimate. The estimator is a function of the sample that is used to find out the estimate.
Alright in this example.
It's basically the sample mean right so a function
that calculates the sample mean is known as the estimator
and the realized value
of the estimator is the estimate right?
So I hope Point estimation is clear.
Now, how do you find the estimates?
There are four common ways in which you can do this.
The first one is the method of moments. Here, what you do is you form an equation
in the sample data set
and then you analyze the similar equation
in the population data set
as well like the population mean population variance and so on.
So in simple terms,
what you're doing is you're taking down some known facts
about the population
and you're extending those ideas to the sample.
Alright, once you do that,
you can analyze the sample and estimate more
essential or more complex values right next.
We have maximum likelihood.
This method basically uses a model to estimate a value.
All right.
Now a maximum likelihood is majorly based on probability.
So there's a lot of probability involved in this method next.
We have the Bayes estimator; this works by minimizing
the errors or the average risk.
Okay, the base estimator has a lot to do
with the Bayes theorem.
All right, let's not get into the depth
of these estimation methods.
Finally.
We have the best unbiased estimators. In this method, there are several unbiased estimators that can be used to approximate a parameter.
Okay.
So Guys these were a couple of methods
that are used to find the estimate
but the most well-known method to find the estimate is known as
the interval estimation.
Okay.
This is one
of the most important estimation methods right?
This is where confidence interval also comes
into the picture right apart from interval estimation.
We also have something known as margin of error.
So I'll be discussing all of this.
In the upcoming slides.
So first let's understand.
What is interval estimate?
Okay, an interval or range of values,
which are used to estimate a population parameter is known as
an interval estimation, right?
That's very understandable.
Basically what this is trying to say is you're going to estimate
the value of a parameter.
Let's say you're trying to find the mean of a population.
What you're going to do is you're going to build a range
and your value will lie in that range or in that interval.
Alright, so this way your output is going to be more accurate
because you've not predicted a point estimation instead.
You have estimated an interval
within which your value might occur, right?
Okay.
Now this image clearly shows
how Point estimate and interval estimate or different
so guys interval estimate is obviously more accurate
because you are not just focusing on a particular value
or a particular point
in order to predict the probability instead.
You're saying that the value might be
within this range between the lower confidence limit
and the upper confidence limit.
All right, this is denotes the range or the interval.
Okay, if you're still confused about interval estimation,
let me give you a small example
if I stated that I will take 30 minutes to reach the theater.
This is known as Point estimation.
Okay, but if I stated
that I will take between 45 minutes
to an hour to reach the theater.
This is an example of interval estimation.
All right.
I hope it's clear.
Now now interval estimation gives rise to two important
statistical terminologies one is known as confidence interval
and the other is known as margin of error.
All right.
So there's it's important
that you pay attention
to both of these terminologies confidence interval is one
of the most significant measures
that are used to check how accurate a machine learning model is.
All right.
So what is confidence interval confidence interval is
the measure of your confidence
that the interval estimated contains
the population parameter or the population mean
or any of those parameters right now statisticians
use confidence interval to describe the amount
of uncertainty associated
with the sample estimate of a population parameter now guys,
this is a lot of definition.
Let me just make you understand confidence interval
with a small example.
Okay.
Let's say that you perform
a survey and you survey a group of cat owners to see how many cans of cat food they purchase in one year.
Okay, you test
your statistics at the 99 percent confidence level
and you get a confidence interval of (100, 200). This means that you think that the cat owners buy between 100 to 200 cans in a year, and also, since the confidence level is 99%, it shows that you're very confident that the results are correct.
Okay.
I hope all of you are clear with that.
Alright, so your confidence interval here will be 100 to 200 and your confidence level will be 99%. Right?
That's the difference between confidence interval
and confidence level So within your confidence interval
your value is going to lie and your confidence level will show
how confident you are about your estimation, right?
I hope that was clear.
Let's look at margin of error.
Now, margin of error for a given level of confidence is the greatest possible distance between the point estimate and the value of the parameter that it is estimating. You can say that it is a deviation from the actual point estimate, right.
Now.
The margin of error can be calculated using this formula: E = z_c × s / √n. Now z_c here denotes the critical value for the chosen confidence level, and this is multiplied by the standard deviation divided by the root of the sample size. All right, n is basically the sample size. Now,
let's understand how you can estimate
the confidence intervals.
So guys the level of confidence
which is denoted by C is the probability
that the interval estimate contains a population parameter.
Let's say that you're trying to estimate the mean.
All right.
So the level of confidence is the probability
that the interval estimate contains
the population parameter.
So this interval between minus Z and z
or the area beneath this curve is nothing but the probability
that the interval estimate contains a population parameter.
All right.
It should basically contain the value
that you are predicting right.
Now.
These are known as critical values.
This is basically your lower limit
and your higher limit confidence level.
Also, there's something known as the Z score. Now, this score can be calculated by using the standard normal table. All right, if you look it up anywhere on Google you'll find the z-score table or the standard normal table. To understand how this is done,
Let's look at a small example.
Okay, let's say that the level of confidence is 90%. This means that you are 90% confident that the interval contains the population mean.
Okay, so the remaining 10% which is out of hundred percent.
The remaining 10% is equally distributed
on these tail regions.
Okay, so you have 0.05 here and 0.05 over here, right?
So on either side of C you will distribute the leftover percentage. Now these Z scores are calculated from the table as I mentioned before. All right, 1.645 is calculated from the standard normal table.
Okay, so guys how you estimate the level of confidence?
So to sum it up.
Let me tell you the steps that are involved in constructing
a confidence interval first.
You would start by identifying a sample statistic.
Okay.
This is the statistic
that you will use to estimate a population parameter.
This can be anything like the mean
of the sample next you will select a confidence level
now the confidence level describes the uncertainty
of a Sampling method right
after that you'll find something known as the margin
of error right?
We discussed margin of error earlier.
So you find this based on the equation
that I explained in the previous slide,
then you'll finally specify the confidence interval.
All right.
Now, let's look at a problem statement
to better understand this concept a random sample
of 32 textbook prices is taken from a local College Bookstore.
The mean of the sample is so-and-so and the sample standard deviation is given. Use a 95% confidence level and find the margin of error for the mean price of all textbooks in the bookstore.
Okay.
Now, this is a very straightforward question.
If you want you can read the question again.
All you have to do is you have to just substitute the values
into the equation.
All right, so guys,
we know the formula for margin of error you take the Z score
from the table.
After that we have deviation Madrid's 23.4 for right
and that's standard deviation and n stands for the number
of samples here.
The number of samples is 32 basically 32 textbooks.
So approximately your margin of error is going to be
around 8.1 to this is a pretty simple question.
All right.
I hope all of you understood this now
that you know,
the idea behind confidence interval.
Let's move ahead to one
of the most important topics in statistical inference,
which is hypothesis testing, right?
So basically, statisticians use hypothesis testing to formally check whether the hypothesis is accepted or rejected.
Okay, hypothesis testing is an inferential statistical technique
used to determine
whether there is enough evidence in a data sample to infer
that a certain condition holds true for an entire population.
So to understand
the characteristics of a general population,
we take a random sample,
and we analyze the properties of the sample right we test.
Whether or not the identified conclusion represents
the population accurately
and finally we interpret the results now
whether or not to accept the hypothesis depends
upon the percentage value that we get from the hypothesis.
Okay, so to better understand this,
let's look at a small example before that.
There are a few steps that are followed
in hypothesis testing you begin by stating the null
and the alternative hypothesis.
All right.
I'll tell you what exactly these terms are
and then you formulate an analysis plan. Right after that you analyze the sample data
and finally you can interpret the results
right. Now, to understand the entire hypothesis testing, let's look at a good example. Okay, now consider four boys: Nick, John, Bob and Harry. These boys were caught bunking a class
and clean the classroom as a punishment, right?
So what John did is he decided that the four of them would take turns to clean their classroom. He came up with a plan of writing each of their names on chits and putting them in a bowl. Now every day they had to pick up a name from the bowl and that person had to clean the class, right?
That sounds pretty fair enough. Now it has been three days and everybody's name has come up except John's. Assuming that this event is completely random and free of bias, what is the probability of John not cheating, right? Or what is the probability that he's not actually cheating? This can be solved by using hypothesis testing.
Okay.
So we'll Begin by calculating the probability of John
not being picked for a day.
Alright, so we're going to assume
that the event is free of bias.
So we need to find out the probability
of John not cheating right first we will find the probability
that John is not picked for a day, right?
We get 3 out of 4, which is basically 75%. 75% is fairly high. So if John is not picked for three days in a row, the probability will drop down to approximately 42%. Okay, so three days in a row means that the probability drops down to 42 percent. Now, let's consider a situation where John is not picked for 12 days in a row; the probability drops down to 3.2 percent.
Okay.
So the probability of John cheating becomes fairly high, right?
So in order
for statisticians to come to a conclusion,
they Define what is known as a threshold value.
Right considering the above situation
if the threshold value is set to 5 percent.
It would indicate
that if the probability lies below 5% then John is cheating
his way out of detention.
But if the probability is above the threshold value, then John is just lucky and his name isn't getting picked.
So the probability
and hypothesis testing give rise to two important components
of hypothesis testing,
which is null hypothesis and alternative hypothesis.
The null hypothesis is basically approving the assumption; the alternate hypothesis is when your result disproves the assumption, right? Therefore in our example, if the probability of an event occurring is less than 5%, which it is, then the event is biased; hence it proves the alternate hypothesis.
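Here is a short sketch of the detention example, just the arithmetic from above: the probability of John's name never being picked over n fair draws, compared against the 5% threshold.

```python
# Probability that John is never picked over several fair draws.
p_not_picked_one_day = 3 / 4

for days in (1, 3, 12):
    p = p_not_picked_one_day ** days
    verdict = "below threshold -> looks biased" if p < 0.05 else "plausible by luck"
    print(days, round(p, 3), verdict)
# 1  0.75   plausible by luck
# 3  0.422  plausible by luck
# 12 0.032  below threshold -> looks biased
```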
So guys with this we come to the end of this session.
Let's go ahead and understand what exactly supervised learning is. So supervised learning is
where you have the input variable X
and the output variable Y and use an algorithm
to learn the mapping function from the input to the output
as I mentioned earlier
with the example of face detection.
So it is called supervised learning
because the process of an algorithm learning
from the training data set can be thought
of as a teacher supervising the learning process.
So if we have a look at the supervised learning steps, or what we would rather say, the workflow.
So the model is used as you can see here.
We have the historic data.
Then again we have the random sampling. We split the data into the training data set and the testing data set. Using the training data set, with the help of machine learning, which is supervised machine learning, we create a statistical model. Then after we have a model which is generated with the help of the training data set, what we do is use the testing data set for prediction and testing.
What we do is get the output
and finally we have the model validation outcome.
That was the training and testing.
So if we have a look at the prediction part of any particular supervised learning algorithm, the model is used for predicting the outcome of a new data set. So whenever the performance of the model degrades, or if there are any performance issues, the model is retrained with the help of the new data.
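Here is a minimal sketch of that train/test workflow, assuming scikit-learn and its built-in iris toy dataset; it is only an illustration of the steps described, not the course's exact demo.

```python
# Supervised learning workflow: split, train, predict, validate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Random sampling: split the historic data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Build the statistical model on the training data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Use the testing data for prediction, then validate the outcome
predictions = model.predict(X_test)
print(round(accuracy_score(y_test, predictions), 3))
```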
Now when we talk about supervised learning, there's not just one but quite a few algorithms here. So we have linear regression, logistic regression, decision tree; we have random forest; we have Naive Bayes classifiers.
So linear regression is used to estimate real values.
For example, the cost of houses.
The number of calls the total sales based
on the continuous variables.
So that is what linear regression is.
Now when we talk about logistic regression,
which is used to estimate discrete values, for example,
which are binary values like 0 and 1 yes,
or no true.
False based on the given set of independent variables.
So for example,
when you are talking about something like the chances
of winning or if you talk
about winning which can be either true or false
if will it rain today with it can be the yes or no,
so it cannot be
like when the output of a particular algorithm
or the particular question is either.
Yes.
No or Banner e then
only we use a large stick regression the next
we have decision trees.
So now these are used for classification problems it work.
X for both categorical and continuous
dependent variables and
if we talk about random forest, so random forest is an ensemble of decision trees; it gives better prediction accuracy than a decision tree.
So that is another type of supervised learning algorithm.
And finally we have the Naive Bayes classifier. So it is a classification technique based on the Bayes theorem with an assumption of independence between predictors.
Linear regression is one of the easiest algorithms in machine learning.
It is a statistical model
that attempts to show the relationship
between two variables with a linear equation.
But before we drill down
to linear regression algorithm in depth,
I'll give you a quick overview of today's agenda.
So we'll start a session
with a quick overview of what is regression
as linear regression is one of a type
of regression algorithm.
Once we learn about regression,
its use case the various types of it next.
We'll learn about the algorithm from scratch, where I'll teach you its mathematical implementation first,
then we'll drill down to the coding part
and Implement linear regression using python
In today's session we will deal with the linear regression algorithm using the least square method, check its goodness of fit,
or how close the data is
to the fitted regression line using the R square method.
And then finally
what we will do is optimize it using the gradient descent method. In the last part, in the coding session,
I'll teach you to implement linear regression using Python
and the coding session would be divided into two parts. The first part would consist
of linear regression using python from scratch
where you will use the mathematical algorithm
that you have learned in this session.
And in the next part of the coding session
will be using scikit-learn for direct implementation
of linear regression.
So let's begin our session with what is regression.
Well regression analysis is
a form of predictive modeling technique
which investigates the relationship between a dependent
and independent variable. A regression analysis involves graphing a line over a set of data points that most closely fits the overall shape of the data. Regression shows the changes in a dependent variable on the y-axis relative to the changes in the explanatory variable on the x-axis. Fine.
Now you would ask what are the uses of regression?
Well, there are three major uses of regression analysis, the first being determining the strength of predictors. The regression might be used to identify the strength of the effect that the independent variables have on the dependent variable. All right, so you can ask questions
like what is the strength of the relationship between sales and marketing spending, or what is the relationship between age and income. Second is forecasting an effect; in this, the regression can be used to forecast effects or the impact of changes. That is, the regression analysis helps us to understand how much the dependent variable changes with a change in one or more independent variables. Fine. For example, you can ask a question like how much additional sales income will I get for each thousand dollars spent on marketing. Third is trend forecasting; in this the regression analysis predicts trends and future values.
The regression analysis can be used to get point estimates; in this you can ask questions like what will be the price of Bitcoin in the next six months, right?
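As a hedged example of that regression idea, here is a tiny sketch that fits a straight line to marketing spend versus sales using scikit-learn; the numbers are invented purely for illustration and are not from the course.

```python
# Fit a simple linear regression: sales as a function of marketing spend.
import numpy as np
from sklearn.linear_model import LinearRegression

marketing_spend = np.array([[1], [2], [3], [4], [5]])   # in thousands of dollars
sales = np.array([2.1, 4.3, 6.2, 8.1, 9.9])             # in thousands of units

model = LinearRegression().fit(marketing_spend, sales)
print(round(model.coef_[0], 2))           # additional sales per extra $1k of marketing
print(round(model.predict([[6]])[0], 2))  # forecast for a $6k spend
```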
So next topic is linear versus logistic regression by now.
I hope that you know, what a regression is.
So let's move on and understand its type.
So there are various kinds of regression like linear and much more. Now, all this math might seem intimidating at first if you have been away from it for a while; machine learning is much more math intensive than something like front-end development. Just like any other skill, getting better at math is a matter of focused practice. The next skill
in our list is the neural network architectures.
We need machine learning for tasks that are too complex for humans to code directly, that is, tasks that are so complex that coding them directly is impractical. Now neural networks are a class
of models within the general machine learning literature
or neural networks are a specific set of algorithms
that have revolutionized machine learning.
They're inspired by biological neural networks,
and the current so-called deep neural networks
have proven to work quite well.
Well, neural networks are themselves general function approximators, which is why they can be applied to almost any machine learning problem that is about learning a complex mapping from the input to the output space.
Of course, there are still good reason for the surge
in the popularity of neural networks,
but neural networks have been by far the most accurate way of
approaching many problems like translation speech recognition
and image classification now coming to our next point
which is the natural language processing now
since it combines computer science and linguistics, there are a bunch of libraries like NLTK and Gensim, and techniques such as sentiment analysis and summarization that are unique to NLP. Now audio and video processing has a frequent overlap with natural language processing.
However, natural language processing can be applied
to non-audio data like text, while voice and audio analysis involves extracting useful information from the audio signals themselves. Being well-versed in math will get you far in this one, and you should also be familiar with concepts such as the fast Fourier transform.
Now, these were the technical skills
that are required
to become a successful machine learning engineer.
So next I'm going to discuss some of the non-technical skills
or the soft skills,
which are required to become a machine-learning engineer.
So first of all, we have the industry knowledge.
Now the most successful machine learning projects out there are going to be those that address real pain points. Whichever industry you are working for, you should know how that industry works and what will be beneficial for the business.
If a machine learning engineer does not have business acumen and the know-how of the elements that make up a successful business model, then all those technical skills cannot be channeled productively;
you won't be able to discern the problems
and potential challenges that need solving
for the business to sustain
and grow you won't really be able to help
your organization explore new business opportunities.
So this is a must-have skill now next we
have effective communication.
You'll need to explain the machine learning
Concepts to the people with little to no expertise
in the field chances are you'll need to work
with a team of Engineers as well as many other teams.
So communication is going to make all of this much easier. Companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate their technical findings to a non-technical team such as the marketing or sales department. And next on our list,
we have rapid prototyping. So iterating on ideas as quickly as possible is mandatory for finding one that works. In machine learning this applies to everything from picking the right model to working on projects such as A/B testing. Rapid prototyping is a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer-aided design, which is CAD. So last
but not the least we have the final skill
and that is to keep updated.
You must stay up to date with any upcoming changes. Every month new neural network models come out that outperform the previous architecture.
It also means being aware
of the news regarding the development of the tools, the changelogs, the conferences and much more. You need to know about the theories and algorithms.
Now this you can achieve
by reading the research papers blogs the conference's videos.
And also you need to focus on the online community, which changes very quickly. So expect and cultivate this change. Now, this is not all; here we have certain bonus skills,
which will give you an edge over other competitors
or the other persons
who are applying
for a machine-learning engineer position on the bonus point.
We have physics.
Now, you might be in a situation where you would like to apply machine learning techniques to a system that will interact with the real world; having some knowledge of physics will take you far. Next we have reinforcement learning.
So reinforcement learning has been a driver behind many of the most exciting developments in the deep learning and AI community, from AlphaGo Zero to OpenAI's Dota 2 bot. This will be critical to understand
if you want to go into robotics self-driving cars
or other AI related areas.
And finally we have
computer vision out of all the disciplines out there.
There are by far
the most resources available for learning computer vision.
This field appears to have the lowest barriers to entry
but of course this
likely means you will face slightly more competition.
So having a good knowledge of computer vision and how it works will give you an edge over other competitors. Now,
I hope you got acquainted with all the skills
which are required to become a successful
machine learning engineer.
As you know,
we are living in the world of humans and machines. In today's world, these machines or the robots have to be programmed before they start following your instructions. But what if the machines started learning on their own from their experience, worked like us, felt like us, and did things more accurately than us?
Well, this is where a machine learning engineer comes into the picture, to make sure everything is working according to the procedures and the guidelines.
So in my opinion, machine learning is one of the most recent technologies there is. You probably use it dozens of times every day without even knowing it.
So before we indulge
into the different roles the salary Trends
and what should be there on the resume
of a machine learning engineer
while applying for a job.
Let's understand
who exactly a machine learning engineer is. So machine learning engineers are sophisticated programmers
who develop machines and systems
that can learn
and apply knowledge without specific Direction artificial
intelligence is the goal of a machine-learning engineer.
They are computer programmers
but their focus goes beyond specifically
programming machines to perform specific tasks.
They create programs
that will enable machines to take actions
without being specifically directed to perform those tasks.
Now if we have a look at the job trends
of machine learning in general.
So as you can see in Seattle itself,
we have 2,000 jobs in New York.
We have 1100 San Francisco.
We have 1100 in Bengaluru India,
we have 1100 and then we have Sunnyvale,
California, where we also have a good number of jobs,
so as you can see the number of jobs in the market is too much
and probably with the emergence of machine learning
and artificial intelligence.
This number is just going to get higher now.
If you have a look at the job opening salary-wise percentage,
so you can see for the $90,000 per annum bracket.
We have 32.7 percentage and that's the maximum.
So be assured
that if you get a job as a machine-learning engineer,
you'll probably get around 90 thousand bucks a year.
That's safe to say.
Now for the $110,000 per year bracket we have 25%; for $120,000 we have almost 20 percent; then we have $130,000, which are the senior machine learning engineers, and that's 13.67%. And finally, we have the most senior machine learning engineers, or the data scientists here, who have a salary of $140,000 per annum, and the percentage for that one is really low.
So as you can see, there is a great opportunity for people who are trying to go into the machine learning field and get started with it.
So let's have a look at the machine learning engineer salary. So the average salary in the US is around $111,490, and the average salary in India is around 7 lakh 19 thousand 646 rupees. That's a very good average salary for any particular profession.
So moving forward, if we have a look at the salary of an entry-level machine learning engineer, the salary ranges from $76,000 or $77,000 up to around $151,000 per annum. That's a huge salary.
And if you talk about the bonus here,
we have like
three thousand dollars to twenty five thousand dollars depending
on the work YouTube and the project you are working on.
Let's talk about the profit sharing now.
So it's around two thousand dollars
to fifty thousand dollars.
Now this again depends
upon the project you are working the company you are working
for and the percentage
that Give to the in general or the developer
for that particular project.
Now, the total pay comes around seventy six thousand dollars
or seventy-five thousand dollars
two hundred and sixty two thousand dollars
and this is just for the entry level machine learning engineer.
Just imagine if you become an experience machine
learning engineer your salary is going to go through the roof.
So now that we have understood who exactly a machine learning engineer is, the various salary trends, the job trends in the market, and how it's rising, let's understand what skills it takes to become a machine learning engineer.
So first of all, we have programming languages. Programming languages are a big deal when it comes to machine learning, because you don't just need proficiency in one language: you might require proficiency in Python, Java, R, or C++, because you might be working in a Hadoop environment where you require Java programming to do the MapReduce coding; sometimes R is great for visualization purposes; and Python, as you know, is another favorite language when it comes to machine learning.
Now, the next skill that a particular individual needs is calculus and statistics. A lot of machine learning algorithms are mostly maths and statistics, and a lot of that is required, majorly the matrix multiplication and so on, so a good understanding of calculus as well as statistics is required.
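To make the matrix multiplication point concrete, here is a minimal sketch, assuming NumPy is available; the array values are invented purely for illustration and are not from the course.

```python
import numpy as np

# A toy "dataset": 4 samples with 3 features each (made-up values).
X = np.array([[1.0, 2.0, 3.0],
              [0.5, 1.5, 2.5],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# A weight matrix mapping 3 input features to 2 outputs,
# the kind of operation behind a single linear layer.
W = np.array([[0.1, 0.4],
              [0.2, 0.5],
              [0.3, 0.6]])

# Matrix multiplication: (4 x 3) @ (3 x 2) -> (4 x 2) outputs.
Y = X @ W
print(Y.shape)  # (4, 2)
print(Y)
```

This is the same shape bookkeeping that shows up again and again in regression and neural network code.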
Next we have signal processing. Advanced signal processing is something that will give you an upper edge over other machine learning engineers if you are applying for a job anywhere.
The next skill we have is applied maths. As I mentioned earlier, many of the machine learning algorithms here are purely mathematical formulas, so a good understanding of maths and how the algorithms work will take you far ahead.
Next on our list, we have neural networks. Neural networks are something that has been emerging quite popularly in recent years, and due to their efficiency and the extent to which they can work and get results as quickly as possible, neural networks are a must for a machine learning engineer.
Now moving forward, we have language processing. A lot of the time, machine learning engineers have to deal with text data, voice data, as well as video data, and processing any kind of language, audio, or video is something a machine learning engineer has to do on a daily basis, so one needs to be proficient in this area as well. Now, these are only some of the skills which are absolutely necessary, I would say, for any machine learning engineer. So let's now discuss the job description,
or the roles and responsibilities, of a particular machine learning engineer. Depending on their level of expertise, machine learning engineers may have to study and transform data science prototypes. They need to design machine learning systems. They also need to research and implement appropriate machine learning algorithms and tools, as that's a very important part of the job. They need to develop new machine learning applications according to the industry requirements. They select the appropriate data sets and the data representation methods, because if there is a slight deviation in the data set or the data representation, that's going to affect the model a lot. They need to run machine learning tests and experiments, and they need to perform statistical analysis and fine-tuning using the test results.
So sometimes people ask what exactly is the difference between a data analyst and a machine learning engineer. Well, statistical analysis is just a small part of a machine learning engineer's job, whereas it is a major part, or probably covers a large part, of a data analyst's job.
Machine learning engineers might need to train and retrain the systems whenever necessary, and they also need to extend the existing machine learning libraries and frameworks to their full potential so that they can make the models work superbly. And finally, they need to keep abreast of developments in the field; needless to say, any machine learning engineer, or any particular individual, has to stay updated on the technologies coming into the market, because every now and then a new technology arises which will overthrow the older one.
So you need to be up to date. Now, coming to the resume part of a machine learning engineer: any machine learning engineer's resume should consist of a clear career objective, the skills which that particular individual possesses, the educational qualifications, certain certifications, the past experience if you are an experienced machine learning engineer, and the projects which you have worked on, and that's it.
So let's have a look at the various elements that are required in a machine learning engineer's resume. First of all, you need to have a clear career objective; here you need not stretch it too much, and keep it as precise as possible. Next we have the skills required, and these skills can be technical as well as non-technical, so let's have a look at the various technical and non-technical skills out here.
Starting with the technical skills: first of all, we have programming languages such as R, Java, Python, and C++. The first and foremost requirement is to have a good grip on a programming language, preferably Python, as it is easy to learn and its applications are wider than any other language. It is important to have a good understanding of topics like data structures, memory management, and classes. Although Python is a very good language, it alone cannot help you, so you will probably have to learn all these languages like C++, R, Python, and Java, and also work on MapReduce at some point of time.
Next on our list, we have calculus, linear algebra, and statistics. You'll need to be intimately familiar with matrices, vectors, and matrix multiplication. Statistics is going to come up a lot, so at least make sure you are familiar with the Gaussian distribution, means, standard deviations, and much more. You also need a firm understanding of probability and stats to understand machine learning models.
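As a small illustration of those statistics basics, here is a hedged sketch, assuming NumPy is available, that samples from a Gaussian distribution and checks its mean and standard deviation; the parameters are arbitrary example values, not figures from the course.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Draw 10,000 samples from a Gaussian (normal) distribution
# with mean 5.0 and standard deviation 2.0 (illustrative values).
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)

print("sample mean:", samples.mean())  # should be close to 5.0
print("sample std :", samples.std())   # should be close to 2.0

# Empirical rule check: roughly 68% of samples fall within one standard deviation.
within_one_std = np.mean(np.abs(samples - 5.0) < 2.0)
print("fraction within 1 std:", within_one_std)
```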
The next, as I mentioned earlier, is signal processing techniques. Feature extraction is one of the most important parts of machine learning, and different types of problems need different solutions, so you may be able to utilize really cool advanced signal processing algorithms such as wavelets, shearlets, curvelets, and bandlets. Try to learn about time-frequency analysis and apply it to your problems, as it gives you an upper edge over other machine learning engineers, so just go for it.
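Wavelets, shearlets, curvelets, and bandlets need specialized libraries, but as a simpler, hedged sketch of frequency-domain feature extraction, here is a plain-NumPy Fourier example; the signal and its frequencies are invented purely for illustration.

```python
import numpy as np

# Build a toy signal: two sine waves (5 Hz and 50 Hz) sampled at 1 kHz for 1 second.
fs = 1000                      # sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)  # time axis
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

# FFT-based feature extraction: magnitude spectrum of the signal.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two dominant peaks should sit near 5 Hz and 50 Hz.
top = freqs[np.argsort(spectrum)[-2:]]
print("dominant frequencies:", sorted(top))  # approximately [5.0, 50.0]
```

The same idea, turning a raw waveform into a handful of informative numbers, is what feature extraction for audio or sensor data usually comes down to.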
so just go for the next we
have mathematics and a lot of machine learning techniques out.
There are just fancy types of function approximation
having a firm understanding of algorithm Theory and knowing
how the algorithm works is really necessary
and understanding subjects
like gradient descent convex optimization
quadratic programming
and partial differentiation will help a lot the neural networks
as I was talking earlier.
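To make the gradient descent idea concrete, here is a minimal sketch in plain NumPy that fits a one-variable linear model on made-up data by repeatedly stepping against the gradient of the squared error; it is an illustration under those assumptions, not the exact method used in any particular course demo.

```python
import numpy as np

# Made-up data roughly following y = 3x + 1 with a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3 * x + 1 + 0.05 * rng.standard_normal(50)

w, b = 0.0, 0.0   # parameters to learn
lr = 0.5          # learning rate (step size)

for _ in range(500):
    y_pred = w * x + b
    error = y_pred - y
    # Partial derivatives of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Gradient descent step: move against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should end up close to 3 and 1
```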
Then come the neural networks I was talking about earlier. We need machine learning for tasks that are too complex for humans to code directly, that is, tasks so complex that it is impractical to program them by hand. Neural networks are a class of models within the general machine learning literature; they are a specific set of algorithms that have revolutionized machine learning. Deep neural networks have proven to work quite well, and neural networks are themselves general function approximators, which is why they can be applied to almost any machine learning problem out there, and they help a lot with learning a complex mapping from the input to the output space.
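To illustrate the "general function approximator" point, here is a hedged sketch, assuming scikit-learn is installed, where a small feed-forward network learns an invented nonlinear mapping from input to output; the target function and settings are example choices only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Invented regression task: learn y = sin(3x) from samples on [-2, 2].
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(3 * X).ravel()

# A small feed-forward neural network: one hidden layer of 50 tanh units.
net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(X, y)

# The learned mapping should roughly approximate sin(3x) on new inputs.
X_test = np.array([[0.0], [0.5], [1.0]])
print(net.predict(X_test))         # predictions from the network
print(np.sin(3 * X_test).ravel())  # ground truth for comparison
```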
Now, next we have language processing. Since natural language processing combines two of the major areas of work, namely linguistics and computer science, chances are at some point you are going to work with either text, audio, or video.
So it's necessary to have command over libraries like gensim and NLTK, and techniques like word2vec, sentiment analysis, and text summarization. Voice and audio analysis involves extracting useful information from the audio signals themselves; being very well versed in maths and concepts like the Fourier transform will get you far in this one.
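As a small, hedged illustration of the word2vec technique mentioned above, here is what training word embeddings with gensim (4.x API) might look like; the toy corpus is made up and far too small to produce meaningful vectors.

```python
from gensim.models import Word2Vec

# A tiny made-up corpus; real applications need millions of tokens.
corpus = [
    "machine learning engineers build systems that learn from data",
    "deep learning models learn complex mappings from input to output",
    "natural language processing combines linguistics and computer science",
]
sentences = [line.lower().split() for line in corpus]

# Train word2vec embeddings: each word becomes a 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=100)

print(model.wv["learning"].shape)                 # (50,)
print(model.wv.most_similar("learning", topn=3))  # nearest words by cosine similarity
```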
These were the technical skills that are required, but be assured that there are a lot of non-technical skills also required to land a good job in the machine learning industry.
So first of all, you need to have industry knowledge. The most successful machine learning projects out there are going to be those that address real pain points, don't you agree? So whichever industry you are working for, you should know how that industry works and what will be beneficial for the industry. If a machine learning engineer does not have business acumen and the know-how of the elements that make up a successful business model, all those technical skills cannot be channeled productively; you won't be able to discern the problems and the potential challenges that need solving for the business to sustain and grow.
Next on our list, we have effective communication, and this is one of the most important parts of any job requirement. You'll need to explain machine learning concepts to people with little to no expertise in the field, and chances are you will need to work with a team of engineers as well as many other teams like marketing and sales, so communication is going to make all of this much easier. Companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate technical findings to a non-technical team.
Rapid prototyping is another skill which is very much required for any machine learning engineer. Iterating on ideas as quickly as possible is mandatory for finding the one that works; in machine learning this applies to everything from picking the right model to working on projects such as A/B testing and much more. Traditionally, rapid prototyping refers to a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer-aided design, that is, CAD data. Now, coming to the final skill that is required for any machine learning engineer: keeping yourself updated.
So you must stay up to date with any upcoming changes; every month new neural network models come out that outperform the previous architectures. It also means being aware of the news regarding the development of tools, theory, and algorithms through research papers, blogs, conference videos, and much more.
Another part of any machine learning engineer's resume is the educational qualification. A bachelor's or master's degree in computer science, IT, economics, statistics, or even mathematics can help you land a job in machine learning; plus, if you are an experienced machine learning engineer, some standard industry certifications will help you a lot when landing a good job in machine learning.
And finally, coming to the professional experience: you need to have experience in computer science, statistics, or data analysis if you are switching from another profession into a machine learning engineer role, and if you have previous experience in machine learning, that is very good as well. Finally, if we talk about the projects, you need not just any project that you have worked on; you need to have worked on machine-learning-related projects that involve a certain level of AI and working with neural networks to a certain degree to land a good job as a machine learning engineer.
Now, if you have a look at the companies hiring machine learning engineers, every other company is looking for machine learning engineers who can modify the existing models into something that does not need much maintenance and can sustain itself, so basically, working on artificial intelligence and new algorithms that can work on their own is what every company desires. We have Amazon and Facebook; we have tech giants like Microsoft and IBM; in the gaming, GPU, and graphics industry we have Nvidia; in the banking industry we have JPMorgan Chase; and again we have LinkedIn and also Walmart. All of these companies require machine learning engineers at some point of time.
So be assured that if you are looking for a machine learning engineer post, every other company, be it a big-shot company or even a new startup, is looking for machine learning engineers, so be assured you will get a job. Now, with this we come to the end of this video. I hope you've got a good understanding of who exactly a machine learning engineer is, the various job trends, the salary trends, what skills are required to become a machine learning engineer, and, once you become a machine learning engineer, what the roles and responsibilities are and what appears on the resume, the job description, or the job application of any machine learning engineer. I also hope you got to know how to prepare your resume in the correct format and what to keep there: the career objectives, the skills, technical and non-technical, previous experience, educational qualifications, and certain projects which are related to it.
So that's it, guys. Edureka, as you know, provides a Machine Learning Engineer Master's Program that is aligned in such a way that it will get you acquainted with all the skills required to become a machine learning engineer, and that too in the correct format.