Statistics For Data Science: COMPLETE Course For Beginners (2025) | Statistics Tutorial | Intellipaat
Summary
Core Theme
This content is a comprehensive tutorial on statistics for data science, covering fundamental concepts from descriptive statistics to probability distributions and hypothesis testing, with practical Python implementations. It emphasizes building a strong statistical foundation essential for aspiring data scientists.
[Music]
Welcome to Intellipaat's Statistics for
Data Science full course, your complete
tutorial to master the most important
pillar of data science: statistics. If
you want to become a data scientist,
here's one truth you can't ignore:
statistics is non-negotiable. That is
why we have created the Statistics for
Data Science course, designed for
absolute beginners who want to build a
strong foundation that leads to
real-world data science jobs. Let's face it:
machine learning, AI, even deep
learning, all of it stands on the
shoulders of statistical thinking. And in
this statistics for data science
tutorial, you will not only learn theory
but also apply statistical principles
practically to different problem
statements. So you're not memorizing anything
here. You're solving data problems
hands-on using stats and a little bit of
Python. In this statistics for data
science full course, we will start from
scratch. You will learn what statistics
and data science are, descriptive
versus inferential statistics, mean,
median, mode, variance, standard
deviation, probability theory and
distributions (binomial, normal, Poisson),
hypothesis testing, p-values, confidence
intervals, z-test, t-test, and ANOVA. We
have also included hands-on coding
implementations of these stats concepts,
their real-world examples, and job-focused
explanations, so you will know exactly
why you are learning each concept and
how it is used by data scientists in
companies like Amazon, Google, and
Netflix. By the end of this statistics course
for data science, you will be ready to
take a dig at machine learning, make
sense of numbers, and figure out the
patterns between them. So let's get started.
So stats is basically a branch of
mathematics, right? So how can I give
this answer? Stats is a branch of mathematics
that helps us analyze large amounts of data. So if we have
smaller data, like suppose I have a
company with only 10 employees,
I can easily manage the data
myself. I do not need any tools
to manage it. But if I
have a large amount of data, then it
is difficult for me to manually analyze
the data. So that's why we need the
tools of stats.
Now the next question which is asked is:
okay, we understood what
statistics is, but why is it important for us?
Why is statistics important for us as
data analysts or data scientists?
So guys, everybody knows the
difference between raw data and information.
Raw data means meaningless data, right?
Data which does not have any meaning. And
information means meaningful data. Yes?
So don't you think statistics is
actually helping you to convert your raw
data into information?
So with the help of statistics we are
actually converting our raw data
into information.
Also, we can say that [...]
from large data sets, which helps me
to make predictions, decisions, and
all other things. So this is how we can
answer this question. So now we understood
that our data science or data analyst
field is full of analysis: analyzing the
data, doing predictions, finding
something, and all that stuff. And
stats also has the ability to do that.
So you can see how important this concept is
for us, right? I hope you got a basic
idea about this.
So now let us start with another
topic, which is population versus sample. Okay.
So first let me explain in layman's
language what exactly population and
sample are. Now, we live on
planet Earth, right?
Suppose this is your Earth. If I want to
divide this Earth into sub-parts, then I
think of continents. So how many continents
do we have? Seven, right? So
I divided this Earth into seven
different continents.
Right? That means the entire entity that
we are talking about is the population,
and a sub-part of it is the sample.
Now, how can we talk about this when we have a
data frame? The entire data set that
we have, the entire CSV file that
we have, is the population, and if I
take a sub-part of that data,
that will be our
sample. So how do we do it in real
life? Suppose you have some
employees' data, okay, all employees' data.
So all the managers, all the HRs, all the
interns, all the senior employees,
everyone's data is here, and from
this your question is: you need to find
the average salary of the managers
in the company. So what you will do is
filter the data so that
you have a sample of only managers,
and from this data you will find
the average salary
of managers. Clear? Now, if this part is clear, the
question that arises is: how do we choose samples?
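The manager-salary example above can be sketched in a few lines of Python. This is only an illustration: the `employees` records, their field names, and the salary figures are hypothetical and do not come from the talk.

```python
from statistics import mean

# Hypothetical population: every employee record in the company.
employees = [
    {"name": "A", "designation": "manager", "salary": 90_000},
    {"name": "B", "designation": "intern", "salary": 20_000},
    {"name": "C", "designation": "manager", "salary": 110_000},
    {"name": "D", "designation": "hr", "salary": 50_000},
    {"name": "E", "designation": "senior", "salary": 80_000},
]

# Sample: the sub-part of the population we care about (managers only).
managers = [e for e in employees if e["designation"] == "manager"]

# Average salary computed on the sample, not on the whole population.
avg_manager_salary = mean(e["salary"] for e in managers)

print(len(employees), len(managers), avg_manager_salary)
```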
Okay, we have some techniques, guys,
through which we decide samples, and those
techniques are known as sampling
techniques. There are many sampling
techniques, but we are going to discuss
the three most important
ones. Okay. So the first sampling
technique is known as random sampling.
The second sampling technique
is known as stratified sampling,
and the third sampling technique is
known as systematic sampling. So I will
give you a hint and you will tell me
about these sampling techniques, guys.
The thing is, when you face
interviews, it's not guaranteed that
you have heard about all the concepts, all
the terms there, right? There might be
some new terms which will come up in
interviews, but going by the name you
need to answer something. It's very bad
if you are sitting blank in the
interview. So at least you need to say
something. So let's practice that.
Okay. So first we will talk about random
sampling. From the name itself, random,
I think it's easy. What is random
sampling? Yes, it means taking data
randomly, right? So no pattern,
nothing, I will just select randomly. Like,
suppose from the entire school you need
to randomly pick five students,
irrespective of what class they are in, what
gender they are, what age they are. You are
just randomly picking five students, right?
Second is stratified sampling. Now from
the word stratified, I think the
English word strategy comes to my
mind. So what could this stratified sample be?
So don't you think the example which
we took above fits here, that from the entire
employees' data I want
a sample of only managers? Because
I was having a strategy in my mind: the
statement was that I need to find the average
salary of managers only. So when I was
creating the sample, I was thinking that only
managers should come in my data. So
don't you think this is an example of
stratified sampling, where I am putting
the condition that the designation of the
employee should be manager only? So
I'm putting some conditions here. So
stratified sampling can also be known as
condition-based sampling, where you are
putting some conditions.
Clear, guys? So I'll give you one more example:
from the entire data set of all
genders, I want to create a sample of
women only.
The third is systematic sampling.
So from the word systematic, a systematic
order comes to my mind. Yes or no? A
systematic order: it is order-based
sampling. Order-based, what does it mean?
Like, you are creating a sample.
Okay, you are creating this sample and
in this sample you are saying that I
want to take
every nth element.
What this means is,
suppose your data points are
arranged like this: 1 2 3 4 5 6 7 8 9
10 11 12 13 14.
You are creating a sample in which you
are saying I want every third element.
So what points will you get in your
sample? I think 3, 6, 9,
12. So you will have a sample
with the points 3, 6,
9, and 12.
So an order is maintained in this type of sampling.
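Random and systematic sampling on the points 1 to 14 can be sketched as follows. This is a minimal illustration using Python's standard `random` module; the fixed seed and the sample size of five are assumptions made only so the sketch is repeatable.

```python
import random

population = list(range(1, 15))  # the data points 1..14 from the example

# Random sampling: pick five points with no pattern at all.
rng = random.Random(42)  # fixed seed, assumed here only for repeatability
random_sample = rng.sample(population, 5)

# Systematic sampling: take every third element (the 3rd, 6th, 9th, 12th).
n = 3
systematic_sample = population[n - 1::n]

print(random_sample)
print(systematic_sample)  # [3, 6, 9, 12]
```

Slicing with a step of `n` is what maintains the order: the same positions are always chosen, whereas the random sample changes with the seed.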
Okay. Let me make it more clear by
giving you the exact difference between the three
sampling techniques. Okay. So
suppose you have data in which the
points are stored like this: male,
female,
male, male, female, female, and male.
Suppose the data is arranged like this.
So if I want to perform random sampling
here, in random sampling I can get
anything. I can get males, I can
get females, I can get anything, right? If
I'm doing stratified sampling, in
stratified sampling I am saying that I
want only females,
so I will get only females' data.
Whatever
number of females are present, I will get
only females' data. Now if I want to do
systematic here, in systematic I am
saying I want every second element. So
what elements will I get? First I will
get female, then I will get male, then I
will again get female. So this is how I
will get the data. Getting the difference
between the three sampling
techniques? Now my question to you is:
you need to find, in real time, what is
the use of these three sampling techniques.
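The three techniques can be compared on the same gender list in a short sketch. The list order below is the one implied by the every-second-element result described above, and the seed and sample size are arbitrary assumptions. Note that most textbooks define stratified sampling as splitting the population into groups (strata) and drawing a random sample from each group; the filter here follows the speaker's condition-based description instead.

```python
import random

data = ["male", "female", "male", "male", "female", "female", "male"]

# Random sampling: any mix of males and females can come back.
rng = random.Random(0)  # seed chosen arbitrarily for repeatability
random_sample = rng.sample(data, 3)

# Condition-based sampling, as described in the talk: keep only females.
females_only = [g for g in data if g == "female"]

# Systematic sampling: every second element.
every_second = data[1::2]

print(random_sample)
print(females_only)
print(every_second)  # ['female', 'male', 'female']
```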
So I'll give you one example of
systematic. Okay. So in systematic,
suppose you're working in some factory,
okay, and you are in product
management there, you're a manager of the
products, and you want to look at the
production of the products. Now see,
if any new product is launched, like we
talk about iPhones. So recently which
iPhone was launched? iPhone 15 was launched.
Now when this iPhone 15 was launched, do
you think they would produce lakhs of products?
Do you think that they would produce
one lakh, 10 lakh, or even crores of
products in one go? Usually they
don't. Why? Because they will first
create some samples. Like suppose, this
is not the exact number, I'm just
giving you an idea, suppose they
put 10,000 phones in the market and
they see what the sales of these
10,000 are. If the sales are high, they will
produce more phones. If the sales are low,
they will slow down the production.
That's how any industry works. Now see,
from those 10,000 phones that they have
made,
do you think they will manually
check each and every product, whether it
is working fine or not? No.
They will convert this into batches. Batches
means batch one maybe of 500 phones,
then 500 phones, then 500. From these 500
phones, so suppose this is your
population now, 500 phones,
from this batch they will check every
second or every third product.
So if,
suppose, 99.9%
of the checked products are working fine,
then most probably this batch is fine.
This is how it works, right? It means most
probably the 500 phones will work fine.
Similarly they will check the other batches. This is
the example of systematic. So instead of
looking at each and every product
individually, they will look at every
second mobile or every third mobile and
see the percentage. Like suppose they
checked 10 mobiles here. Now from these
10 mobiles, all 10 were working fine,
or eight were working fine, or five were
working fine. So 50% was fine, 99% was
fine, or 100% was fine. So if the
percentage is good, like they're saying
90% of the phones are working fine, meaning
nine out of 10 phones were
working fine, then most probably the batch
of 500 will also work fine. Getting the
point? So here we can use systematic
sampling when we need to check the
condition of any
specific product in factories. So if it
is clear, let us move to the next topic,
which is central tendencies.
So suppose you have some data, okay, and you
are plotting the histogram of the data. So
this is your x-axis, and what is there on the
y-axis of this histogram?
It's frequency or
count. Okay. So if you forgot what
exactly a histogram is, I'll show you:
this is something like what a histogram looks
like. Yes or no? So for this data, if I
plot this histogram and I smooth
out this histogram like this,
right? Now central tendencies will
help you find where the center of
the distribution is, at what value the
center of the distribution lies.
So there are three types of central
tendencies: mean, median, and mode. So for the
distribution, the center is lying
at the mean, or the center is lying at the median,
or the center is lying at the mode
value. That's what your central
tendencies tell you. We have some
distribution of data like this, right?
So where is the center point of that distribution?
Is its center point at the
mean? That means the center of the data
is the mean of the data, or the center of
the data is actually the median of the
data, or the center of the data is at the
mode of the data. That's what central
tendencies tell you.
So they are used to indicate
where the
middle or center of the distribution lies.
Okay. So central tendencies have three
categories.
I already told you about them: they are your
mean, median, and mode.
Okay. So let us talk about these terms one by
one, starting with the mean. What is the mean of the data? It
is the average of the data. So what is
the formula of the mean? Sum of observations
divided by total number of observations. Right?
So suppose I give you the data
7, 10, 15, 27, 38,
42, 96. Tell me, what is the mean of this
data? So basically we sum up all the
values like this, 7 + 10 + ... + 96,
divided by the total count, which is
7. So that is approx 33.57.
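The same calculation can be checked in Python. This sketch uses the seven values from the example and, as an assumption about tooling, the standard library's `statistics.mean`.

```python
from statistics import mean

data = [7, 10, 15, 27, 38, 42, 96]

# Mean = sum of observations / total number of observations.
manual_mean = sum(data) / len(data)
library_mean = mean(data)

print(round(manual_mean, 2))   # 33.57
print(round(library_mean, 2))  # 33.57
```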
So this was the mean. Everyone is clear,
right? Let's talk about the median then. What
is the median? Median means the middle value,
right? So suppose I give you data
like this: 20, 10, 40, 5, 4, 2, 3, 50. Tell me
what the median of this data is. Okay, one
second, I'll add one more value, that is 49.
The median is the middle value after
sorting the data, and sorting can be
in ascending or descending order. So if I
sort this data, first I'll get 2, 3, 4,
5. After five what do we have? 10, 20, 40,
49, 50. And now you tell me, what is the
median? 10. Now 10 is the middle value,
not 4. Okay. And suppose I have the same data
without 49.
If the count is an odd number, then it's fine,
we can easily find the median. If it is
even, and here it is even, then
we need to take the two centermost
elements and take the average of them: (5
+ 10) by 2, that is 15 by 2. So the median
here will be 7.5. Everyone, this was
the basic concept which you should remember.
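Both median cases can be reproduced with the standard library. A minimal sketch, assuming `statistics.median`, which sorts the data internally and averages the two middle values when the count is even:

```python
from statistics import median

odd_data = [20, 10, 40, 5, 4, 2, 3, 50, 49]  # nine values
even_data = [20, 10, 40, 5, 4, 2, 3, 50]     # the same data without 49

sorted_odd = sorted(odd_data)
median_odd = median(odd_data)
median_even = median(even_data)

print(sorted_odd)   # [2, 3, 4, 5, 10, 20, 40, 49, 50]
print(median_odd)   # 10
print(median_even)  # 7.5
```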
Now let us talk about the mode. What is the
meaning of mode? The most repeated value. So
suppose I take the example 2,
2, 3, 3, 3, 4, 4. Tell me, what is the mode of
this data? The mode is three, right? But if
my data is this one, 2, 2, 3, 3, 3, 4, 4, 2, now
what is the mode? Now you can see we
have two elements: one is two, which is
repeated three times, and then three,
which is also repeated three times. That means the
maximum repetition is three here. So
this time we have two modes,
that is two and three. So if we want to
use any one element, we can either use
two here or use three here.
Clear? So don't get confused about what to do if we
have multiple modes.
Do we need to subtract? Do we need
to take the average or something? No
average, nothing. These two elements will
be your modes. Clear? If there is no repetition,
is there no mode? So basically, if you got
data like this, here the maximum
repetition is one only,
because every element is repeated one
time only. So all will be considered as
the mode values. All the elements will
be considered as modes. So this was the
concept of mean, median, and mode.
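The three mode cases (one mode, two modes, no repetition) can be sketched with `statistics.multimode` from Python's standard library, which returns every value that shares the highest count. The first two lists are the reconstructed examples from above; the third list is a hypothetical one with no repeated values.

```python
from statistics import multimode

single_mode = multimode([2, 2, 3, 3, 3, 4, 4])     # 3 appears most often
two_modes = multimode([2, 2, 3, 3, 3, 4, 4, 2])    # 2 and 3 both appear three times
no_repeats = multimode([1, 2, 3, 4])               # every value appears once

print(single_mode)  # [3]
print(two_modes)    # [2, 3]
print(no_repeats)   # [1, 2, 3, 4]
```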
Okay. So let us jump to the next concept,
which is variation. What is the
meaning of variation,
or we can also call it variance. What
is the meaning of variance? Variance
actually tells you about the spread of
the data, guys. Not only the spread, but
the spread of the data from the mean value.
Okay. So suppose you have two sets
of data, in one the points are like
this and in the other the points are like
this. So can you tell me where there is more
variance and where there is less variance? So
we are basically calculating from the
mean. So suppose this is the mean of
this data and this is the mean of that
data. So with the help of variance we
are trying to calculate the distance of
each and every point from the mean value. Now
let's discuss the formula of
variance, the standard variance formula. So
no need to memorize the formula. It is
very, very easy. So guys, if you want
to calculate the distance between any
two points here, suppose x1
and x2, how do you calculate the distance
between them?
By taking the difference, x2 - x1.
That will give me the distance, right?
Similarly, in variance we are calculating
the distance of the points from the
mean, right? So that means x minus
the mean of the data. Then we
need to square it. Why? Because we don't
want any negative values, right? And this
is for one point, x, but we have
multiple points in the data. So we will
do the summation and divide by
capital N. This is how the standard variance
formula looks. So it was very easy,
right? Okay. Then when we talk about
standard deviation, it is nothing but
the square root of the variance.
Okay. So, the next topic is to find the
range. So, tell me, what is the range of
the data? It is the difference or
distance between the minimum and maximum
values in the data. So, the formula will
be: range equals max value of the
data minus min value. So suppose you have
data points 7, 8, 2, 9, 10, 11. Tell me,
what is the range of this data?
The maximum value is
11 here. The minimum value is two here. So
we get nine. So now let us
discuss one more topic which
sometimes feels difficult to
understand, so everyone please focus. That topic is
percentiles. Okay. Let us discuss
the topic of percentiles here. Now
suppose you have some data
which stores the ages of
persons. So in your data the ages are stored
something like this: 1, 2, 4, 5, 7, 10, 12,
6, 5, and suppose 50. So what I can
say is that 99% of the data is
of, what we say, small
children. 99% of the data is of small children,
whereas I have only one data point
here which is of an old person. That's
what we can infer from here, right?
This is how your percentile works:
here your 99th percentile value
is equal to 50. What does this mean?
That 99%
of the data is less than the number 50.
Getting the point? So percentiles help
you find a value below which a certain
percentage of observations lie. So if
I'm talking about the 99th percentile, it means
99% of the data. If I'm talking about the
50th, it means 50% of the data. If I'm
talking about the 75th, it means 75% of the data.
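The idea of "a value below which a certain percentage of observations lie" can be checked by counting. The helper name `percent_below` is hypothetical. With only ten ages in the list, nine of them (90%) lie strictly below 50, so the 99% figure in the talk is best read as an illustration of the idea rather than an exact result for this list.

```python
ages = [1, 2, 4, 5, 7, 10, 12, 6, 5, 50]

def percent_below(values, threshold):
    """Percentage of observations strictly below the threshold."""
    below = sum(1 for v in values if v < threshold)
    return 100 * below / len(values)

below_50 = percent_below(ages, 50)
below_7 = percent_below(ages, 7)

print(below_50)  # 90.0
print(below_7)   # 60.0
```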
Getting the point? Let me explain it
with one more example. So
I hope everyone knows the
marking system of competitive exams: IIT
JEE or IIT JAM, CAT, GATE, any
competitive exam. Do you know how the
marking system works? It is
percentile-based,
not percentage-based. So
let me explain. Suppose the
topper in your class scored 80 marks out
of 100. Okay, these are the marks of the
topper. Now, topper means the
rest of the complete class has marks
less than 80. That's why we are saying
the person is the topper, right? So that
means the 100th percentile value here is
equal to 80, meaning 100%
of the other data points, the other students,
have marks which are less than 80.
Getting the point? So percentiles are
the scores that are used to describe a
value below which some percentage of
observations fall. If this is clear, the
next topic is very, very easy for you
guys, which is quartiles. So quartiles, I
think I can relate this word to the
English word quarter, and what does
quarter mean? 1 by 4. So that means I
need to divide my data into four equal
parts. So the first
division will be made at the value of the 25th percentile.
The second division will be made at the
value of the 50th percentile, and the third
division will be made at the 75th percentile.
So for example, if you have data in
which your minimum value is 1 and the
maximum value is 100, and suppose in between we
have 15, 45, and 80. Now what does
this mean? That at 25% the
value is 15, meaning 25% of the data is
less than the value 15. Similarly, 50% of
the data is less than the value 45.
Similarly, 75% of the data is less than the value 80.
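The three division points can be computed with `statistics.quantiles` from the standard library. The nine-value list is hypothetical, and the `"inclusive"` method is an assumption made so that the minimum and maximum are treated as the 0th and 100th percentiles; other quartile conventions give slightly different cut points.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data, not from the talk

# n=4 splits the data into four equal parts and returns the three cut points:
# the 25th, 50th, and 75th percentiles.
first, second, third = quantiles(data, n=4, method="inclusive")

print(first, second, third)    # 3.0 5.0 7.0
print(second == median(data))  # True: the 50th percentile is the median
```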
Getting the point? So this 25% is known
as Q1, 50% is known as Q2, and 75% is
known as Q3. That's just the notation
that we give, right? Similarly, we have
one more term which we use, which is IQR.
So, IQR's full form is interquartile
range. What is the value? It's nothing
but the difference of two values,
that is Q3 minus Q1. Like we have range,
similarly we have IQR. So Q3 means the
value of the 75th percentile and Q1 means the
value of the 25th percentile. So
in this data, what is your
IQR? At 75, meaning Q3, we have 80, and at
Q1 we have 15. So 80 minus 15, which is
65. This is your IQR value. Right? Then
we will discuss one topic which is
outliers. In machine learning, the
outliers topic is very important. Let us
first understand the meaning of
outliers. I call this the odd one
out. Suppose you have data
points like this: 1, 2, 3, 5, 7, 15, 20, 100.
Can you tell me any value here which is
the odd one out? 100, right? Because
the other values' range is very small, but
100 is very high. So that odd one
out value is your outlier. Similarly,
if I take another set: 0.02, 01,
1, 2, 3, 5, 7, 15, 20. Now can you tell me any
value which is an outlier here, or the odd one
out? On the right-hand side we have two
values, 101, and on the left-hand side also we
have two values. So these values are
your outliers. So the extreme values
on the right or the left side, which are out of the
range, are known as outliers. So if your data is this, now
don't consider 100, if your data is only
this, then you don't have any outlier,
because the range is quite okay here.
But if the
range of the data, like here you can see
the range is 80, whereas here it is just
five, you can see the difference, then
only will you say it has outliers. It's
not necessary that every data set will
have outliers. Some have, some don't.
Clear? Now tell me, which
visualization plot do we use to find
whether our data has outliers
or not?
Box plot. So how does that box plot look?
If you have these black
points like this, then we say that the
data has outliers; otherwise the data
doesn't have outliers. Similarly, yes, the
violin plot also. A violin plot looks like a
leaf shape. So if that leaf also
has points, that means outliers,
otherwise no, right?
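A box plot usually marks points as outliers when they fall more than 1.5 times the IQR below Q1 or above Q3. That 1.5 rule is a common convention and is not stated in the talk, so treat the sketch below as one possible way to make "odd one out" precise. The helper name `find_outliers` is hypothetical, and the quartiles use the standard library's `"inclusive"` method as an assumption.

```python
from statistics import quantiles

def find_outliers(values):
    """Return the values outside the 1.5 * IQR fences used by box plots."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1           # interquartile range: Q3 - Q1
    lower = q1 - 1.5 * iqr  # anything below this is flagged
    upper = q3 + 1.5 * iqr  # anything above this is flagged
    return [v for v in values if v < lower or v > upper]

with_100 = [1, 2, 3, 5, 7, 15, 20, 100]
without_100 = [1, 2, 3, 5, 7, 15, 20]

outliers_with = find_outliers(with_100)
outliers_without = find_outliers(without_100)

# Range and IQR for the earlier examples in the talk.
range_points = [7, 8, 2, 9, 10, 11]
data_range = max(range_points) - min(range_points)
iqr_example = 80 - 15  # Q3 - Q1 from the quartile example

print(outliers_with)            # [100]
print(outliers_without)         # []
print(data_range, iqr_example)  # 9 65
```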
What is the meaning of correlation?
Correlation basically tells you the
relationship between columns,
like if I'm increasing one
column, what is the effect on the other
column. Okay, so correlation is of three
types.
What are they? Positive correlation,
negative correlation, and zero
correlation. Now what does this mean?
So guys, do you remember the
concept of directly proportional and
inversely proportional?
What is directly proportional? That if I'm increasing
one variable, the other variable will
also increase with it, like your
supply and demand. If the demand
is increasing, the supply will
also increase, right? Similarly,
inversely proportional: if one is
increasing, the other will decrease with
it. Like if you increase the speed
of your vehicle, the time to get to your
destination will decrease. That's what
positive and negative correlation are.
Positive means directly proportional,
negative means inversely proportional,
and zero means no effect: if I'm
increasing one variable or if I'm
decreasing one variable, there is no
effect on the other. Now, which
visualization tool do we use
for correlation?
Heat map.
So in ML you are going to use this heat
map a lot, so you will automatically
remember this. Yeah. So this was all
about descriptive statistics from
my end. Okay.
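The three kinds of correlation can be illustrated with the Pearson correlation coefficient, which is what a correlation heat map typically displays. The talk does not name a specific coefficient, so Pearson is an assumption here, and the helper `pearson` and all three small data sets are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
positive = pearson(x, [2, 4, 6, 8, 10])  # y rises as x rises
negative = pearson(x, [10, 8, 6, 4, 2])  # y falls as x rises
zero = pearson(x, [2, 1, 3, 1, 2])       # no linear relationship

print(positive)  # close to +1
print(negative)  # close to -1
print(zero)      # close to 0
```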
Regarding probability: first
tell me, what is probability, guys? What
idea do you have about probability? Or we
can also ask, in what
scenarios can you say that
probability will help me in data
science? Why is probability important
for data science? Probability basically
helps data scientists. Like if we do
any kind of prediction or any
kind of statistical analysis, it tells us
how probable it is that our analysis will be
right or how probable it is that it will be wrong.
Like suppose I have forecasted that
tomorrow it will rain. Now if I know
with how much probability my
prediction will be correct, I can
proceed. Like if the
probability of rain comes out to be 90%, then if
I'm working in some traffic
department, I can help them
reduce the traffic. I can give them a
prior idea that tomorrow I need more
staff present, I need more traffic police,
as it is raining so there could be more
traffic. So it gives you the likelihood
or the probability of how right
your predictions will be. So probability
helps data scientists and even data
analysts to assess the likelihood of
events, which in turn enhances the efficiency
of insights, the insights we are
drawing from statistical models.
Now tell me what introduction you have to
probability. What exactly is probability?
What is the definition of probability?
Try to recall it.
So probability is a way of
measuring how likely
something is to happen. Right? And the
formula, one of your
colleagues has already given the
formula. So the formula is: probability
equals favorable outcomes
divided by
total number of outcomes.
So let me also define the
outcomes. So what do you understand by
favorable outcomes, guys?
According to our needs, the specific
outcomes we are interested in. Like
suppose we toss a coin and I want
the probability of getting a head.
So I want the probability of head, right?
So head will be my favorable outcome,
meaning the outcome which you are
interested in. If you want to find
the probability that tomorrow it will
rain, then yes will be your
favorable outcome. So favorable outcomes are the specific outcomes
you are interested in.
Right. Now, then, what are total outcomes?
We have the
formula as favorable outcomes divided by
total number of outcomes. So tell me,
what are total outcomes? All the
possibilities, all the possible outcomes.
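The formula above can be written as a tiny function. This is a sketch under the assumption that every outcome is equally likely; the function name `probability` is hypothetical, and `Fraction` is used only to keep the result exact.

```python
from fractions import Fraction

def probability(favorable, total):
    """P(event) = number of favorable outcomes / total number of outcomes."""
    return Fraction(len(favorable), len(total))

coin = ["head", "tail"]                # all possible outcomes of one toss
p_head = probability(["head"], coin)   # the outcome we are interested in

print(p_head, float(p_head))  # 1/2 0.5
```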
So let us discuss some
simple examples. So first, flipping a coin.
Flipping a single coin, right? So what
is the sample space you will get? Either
you'll get head or you will get tail.
Right? So for your favorable outcome, suppose
we want to find
the probability of head, right?
Then we need to find the total
outcomes here. Good, two. So now what will
be the probability that it's a head? Put
that into the formula. Yes, 1 by 2. So here
head is coming one time, so 1
by 2, which is 0.5, is the probability that
you will get a head, right?
Now suppose we flip two coins.
So what is the sample space you're getting?
Either I can get both heads, or I
can get one head then one tail, or first tail
then head, or I can get both tails. This
is the sample space, right? So for the favorable
outcome here, suppose you want
the probability that I will get head and head.
So how many times is it occurring? Once. And
what is the total, how many total
outcomes do you have? Four. So the
probability will be 1 by 4. So these are
very basic examples. I hope you have
studied them in your sixth or seventh standard as
well, in your math classes, remember? So
before moving ahead with further
advanced topics, let's cover some important terminology
we use. So I'll mention this topic as
important terminology.
So the first thing we are going to
discuss is: what do you understand by an experiment?
What does an experiment mean? An
experiment is an event where uncertain
outcomes are possible. This is how you
can define the term
experiment. Then what is an event, guys?
What is the meaning of event? The
possible outcomes. Like if I am
doing this, I am flipping a
coin, so my task is to flip the coin,
right? That's what an event is. Or we can
also say, I'll give you another
definition, directly from
Wikipedia: an event is a
set of outcomes of an experiment. For
example, if I'm flipping a coin, my
sample space will be head or tail,
right? What is the event? That I'm
getting a head, or the event of getting a
tail. Okay, so you can even consider it
as a subset of your sample space. So
an event is a set of outcomes of an experiment.
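The relationship between an experiment, its sample space, and an event can be sketched by listing the outcomes explicitly. The variable names are hypothetical; `itertools.product` simply enumerates every combination of heads and tails for two coins.

```python
from itertools import product

# Experiment: flip two coins. Sample space: every possible outcome.
two_coins = list(product("HT", repeat=2))

# Event: a subset of the sample space, here "both coins show heads".
both_heads = [outcome for outcome in two_coins if outcome == ("H", "H")]

p_both_heads = len(both_heads) / len(two_coins)

print(two_coins)     # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(p_both_heads)  # 0.25
```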
So let us discuss sample space.
Now tell me, what is sample space?
Sample space means, guys, all
the possible outcomes. Like if I'm
tossing a coin, what are all the
possible outcomes? I can get a head, I
can get a tail, right? So when I'm
flipping only one coin, my sample space
has only two outcomes: I can get head or
I can get tail. Whereas if I'm flipping
two coins, this will be my sample space:
I can get both heads, I can get first
head then tail, or first tail then
head, or I can get both tails.
Similarly, if we have a deck of cards, I
hope everybody knows about the deck of
cards game, right? So 52 cards exactly. So
if I'm taking any card from that deck of
cards, what will be the sample
space? All the 52 cards which I
have in that deck will be in my sample
space. So sample space versus
total outcomes: total outcomes basically
means that we are counting,
like total outcomes here we have
two, here we have four. If we are
counting the total number of outcomes,
that is total outcomes. If I'm
listing those outcomes,
what all the possible outcomes are,
that is your sample space. Okay, let me
talk about three more terms; they are also
important. What do you understand by
equally likely events, guys?
What is the meaning of equally likely
events? Equally likely guys from the
name itself try to understand like equal
what look likely means probability
events means the event what we having
don't you think so events which have the
equal probability from the name itself
we can define the definition right
events that have equal probability
for example your coin
The probability of head and the
probability of tail is equal that is 1
by 2 or when you have a dice. So the
probability of any number here is equals
to 1 by 6 means every outcome is having
equal number of probability that events
are known as equally likely events. So
Events that have the same chances or
probability of occurring. But here is
one note point guys. What is that? No
the events or the outcomes should be
independent of each other. They should
not depend upon each other. So if I am
toss uh tossing a coin my head and tails
are not dependent right like if first
time I'll get head second time it's
necessary I will get tail or something
no they are independent of each other so
that is very much necessary the outcome
of one event should be independent of
other. I hope it is clear.
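To make the coin and dice examples concrete, here is a small Python sketch. It lists each sample space and gives every outcome the same probability. The names `equally_likely`, `one_coin`, `two_coins` and `die` are made up for this illustration, and the coin and die are assumed to be fair.

```python
from fractions import Fraction
from itertools import product

# Sample spaces from the examples: one coin, two coins, one six-sided die.
one_coin = ["H", "T"]
two_coins = list(product("HT", repeat=2))
die = [1, 2, 3, 4, 5, 6]


def equally_likely(sample_space):
    """Give every outcome the same probability: 1 / (total outcomes)."""
    total_outcomes = len(sample_space)
    return {outcome: Fraction(1, total_outcomes) for outcome in sample_space}


coin_probs = equally_likely(one_coin)
die_probs = equally_likely(die)

print("Two-coin sample space:", two_coins)
print("Total outcomes:", len(two_coins))
print("P(head) =", coin_probs["H"])
print("P(any one die face) =", die_probs[3])
```

The sample space is the list itself, and total outcomes is just its length, which is the distinction made above.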
The next one: mutually exclusive events.
Try to think. Exclusive means don't
include anything, right? So events that
cannot happen simultaneously are
mutually exclusive. Like again flipping
a coin: I cannot get head and tail both
together, right? With a dice also, I
cannot get two numbers at the same
time. So, events that cannot
happen simultaneously.
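A quick way to check the idea in Python is to treat each event as a set of outcomes. Two events are mutually exclusive when the sets share nothing. This is only a sketch for a fair six-sided die; the helper names `probability` and `mutually_exclusive` are invented for the example.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}


def probability(event, sample_space):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def mutually_exclusive(event_a, event_b):
    """Two events are mutually exclusive when they share no outcome."""
    return event_a.isdisjoint(event_b)


rolled_two = {2}
rolled_five = {5}

# One roll cannot show both 2 and 5, so the overlap is empty.
print(mutually_exclusive(rolled_two, rolled_five))
print(probability(rolled_two & rolled_five, die))
```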
Right? Another example of weather or
climate. Climate can be either hot or
cold. At one time it will be either hot
or it will be either cold. Coin example,
dice example, true false example or any
one zero example as well. Clear? Then
similarly we have one more which is
mutually inclusive.
Now tell me, what would mutually
inclusive be? Events that can occur
simultaneously. Okay, I'll give you one
example, then you think of any other.
So if I'm taking a card from the deck
of cards, I want to find the
probability that the card is a diamond
and also a number four. So here we are
having two events together: one is
that the card is a diamond, and the
other is that the number is four. Or we
can say the card is red and a number
four. Another example is holiday and
Sunday: what is the probability that it
will be a Sunday and also a national
holiday? So these are a few examples of
mutually exclusive and mutually
inclusive events. Before moving ahead,
let's discuss the types of probability.
So, probability can be broadly classified
into three types.
The first type is known as marginal.
The second type is known as joint and
the third type is known as conditional.
So probability is broadly classified
into these three types. So have you
heard about these types before? So let
us discuss with a very very beautiful
example. That example is actually I've
taken so that you can understand about
these three individually and also the
differences between these three. So
using one example only we will try to
distinguish between these three types.
So let me build the example first.
So a survey was conducted
with 500 strangers. Okay. It was a
survey about games: which game is
played by each person. The results
were put in a table with male, female
and total, against how many of them
play football or rugby or any other
game. Okay, a survey was conducted. So
in total we are having 500 persons.
From all the 500 persons there were
270 males and 230 females. The total
who play football was 195, for rugby
it was 125, and for other it was 180.
Then from all the persons, who plays
football and is also male, means how
many males were playing football? That
is 120. How many females were playing
football? That was 75. Then how many
males were playing rugby? That was 100.
And how many females were playing
rugby? That was 25. And how many males
play other sports? That was 50. And how
many females play? That is 130. Let me know
if this uh picture is clear like how it
came. So this was the number but if I
want to get the probabilities, I'll
note it here: to get the probabilities,
divide each value by the total number
of persons, for male, female, total,
football, rugby and others.
So kindly divide each and every value by
500 and tell me what will be the output.
So I'm giving you time of 2 minutes. You
make a note at your end where you will
divide each and every element by 500:
120 by 500, 75 by 500, 195 by 500. So
after 2 minutes I will check what
answers you get. Clear? Keep a note at
your end and divide each and everything
by 500. What is 120 by 500? 0.25 or
0.24? It's 0.24, not 0.25. 75 divided
by 500, good, 0.15, and 195 by 500 is
0.39. Similarly, 100 by 500 will be
0.2, 25 by 500 will be 0.05, and 125 by
500 will be 0.25. Then 0.1, 0.26 and
0.36 for others, and then 0.54 and 0.46
for the male and female totals. Right,
so we have found the
probabilities. Now let us discuss what
marginal, joint and conditional
probability are. So first I will talk
about joint probability. That was just
an example; through this example we
will understand the probabilities.
Okay? Right now we were just discussing
one example. I have built the table,
and now I will use this example to
explain the probabilities to you. Now
joint, guys: from the name itself it is
clear that we are joining two events.
That means, don't you think, it is kind
of like mutually inclusive events: the
probability of the events occurring
together. So in our case, if we want to
find the probability that the person is
a male and plays rugby, this
probability is 0.2. Similarly, the
person is a female and plays rugby,
that is 0.05. So these two
probabilities are joint probabilities.
What does this mean? It is used to
calculate the probability of two events
occurring together at the same time. So
it is denoted by P(A and B), or
probability of A comma B. It can be for
more than two events also, Mohammed; it
depends upon the example or upon the
condition.
So in our example, joint probability
of someone being a male and liking
football, right? Or all the values which
are here 1 2 3 4. Don't you think so?
These probabilities are joint
probabilities. All the six highlighted
probabilities are the example of joint
probability. Joint probability means the
probability of two events occurring
together. So in this all probabilities
we have two events. What one is the
gender event other is the game event. So
here means that I am finding the
probability the person is a male and
playing football.
Right? Similarly here the example is
person is a female and playing football.
So two events we are having one is it's
a football and the gender is female.
Similarly here we can say that the
person is a male and play rugby whereas
here the person is a female and plays
rugby. Clear? But there is one point to
note, guys. What is that note? Joint
probability is symmetrical.
Now what do you mean by symmetrical
here? It means if I'm finding the
probability of A and B, it should equal
the probability of B and A. Like if I'm
finding the probability that the person
is a male and plays rugby, it should
equal the probability that the person
plays rugby and is a male. In both
cases it should be equal. So that's
why joint probability is symmetrical.
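The survey table and its joint probabilities can be reproduced with a few lines of Python. This is a minimal sketch using the counts from the example; the dictionary layout and the function names `p_gender_and_game` and `p_game_and_gender` are assumptions made for illustration.

```python
# Survey counts from the example: (gender, game) -> number of people.
counts = {
    ("male", "football"): 120,
    ("female", "football"): 75,
    ("male", "rugby"): 100,
    ("female", "rugby"): 25,
    ("male", "other"): 50,
    ("female", "other"): 130,
}

total = sum(counts.values())  # 500 people in the survey

# Joint probability of each cell: count divided by the total.
joint = {pair: count / total for pair, count in counts.items()}


def p_gender_and_game(gender, game):
    """P(gender and game): both events together."""
    return joint[(gender, game)]


def p_game_and_gender(game, gender):
    """P(game and gender): the same table cell, named in the other order."""
    return joint[(gender, game)]


print(p_gender_and_game("male", "rugby"))
print(p_game_and_gender("rugby", "male"))
```

Both calls read the same cell of the table, which is all that "symmetrical" means here.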
What will be marginal? Try to think. Try
to think guys. What will be marginal? So
if joint means two events together,
don't you think? So marginal means a
single event we are talking about. So in
our example we are basically talking
about the person is either male or
female or either the person is playing a
particular game. So
marginal probability is the
probability of an event irrespective
of the outcome of another variable. So
it is denoted by P(A) or P(B),
whichever single event we are talking
about. So in this example, from all the
500 people I want to find the
probability that the person was male.
We don't care what game he preferred;
we only care that the person was a
male. So this probability is 0.54.
Similarly, 0.46 says that the person is
a female. Or from 500 people I want the
people who play football irrespective
of what gender they are: 0.39, and then
0.25 and 0.36 for rugby and others. So
don't you think these are the examples
of marginal probabilities? In these
examples, we are talking about the
person being either male or female. We
don't care what game they are playing.
We only care, from 500 people, what is
the probability the person was a male
or the person was a female. I don't
care about the game. Similarly, these
probabilities were for games
irrespective of the gender. I want the
probability of people who play football
irrespective of whether they are males
or females. So individual probabilities
are known as marginal probabilities.
Individual means that we are concerned
about only one event. Here we have two
events: one is gender, the other is
game. So either we are focusing only
upon the gender, whether the person is
male or female irrespective of the
game, or we are only concerned with the
game irrespective of gender. So either
we will consider game or we will
consider gender. Then there's another,
which is conditional probability.
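The marginal probabilities described above are just the row and column totals of the survey table divided by 500. A short Python sketch, using the same counts from the example (the variable names are illustrative):

```python
# Survey counts from the example: (gender, game) -> number of people.
counts = {
    ("male", "football"): 120,
    ("female", "football"): 75,
    ("male", "rugby"): 100,
    ("female", "rugby"): 25,
    ("male", "other"): 50,
    ("female", "other"): 130,
}
total = sum(counts.values())

# Add up the counts for one variable while ignoring the other one.
gender_totals = {}
game_totals = {}
for (gender, game), count in counts.items():
    gender_totals[gender] = gender_totals.get(gender, 0) + count
    game_totals[game] = game_totals.get(game, 0) + count

marginal_gender = {g: n / total for g, n in gender_totals.items()}
marginal_game = {g: n / total for g, n in game_totals.items()}

print(marginal_gender)
print(marginal_game)
```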
Conditional probability. Now
conditional probability defines the
probability of one event occurring
given that another event has occurred.
That means one event already occurred,
and we want to find the probability of
the next event, which is dependent upon
the previous event. So the formula of
conditional probability is probability
of A comma B divided by probability of
B, that is P(A | B) = P(A, B) / P(B).
So here we are finding probability of A
given B. Now let's talk about an
example. Suppose in our example it is
given that the person we selected is a
female. So for example, probability of
preferring a sport given that the
person observed is female. Now one
event is already given, that the person
is female, and we want to find the
probability of preferring a sport,
suppose rugby. Right? So we need to
find the probability of rugby given
that it's a female. According to the
formula, A comma B means the
probability that a female plays rugby,
divided by the probability that it is a
female. So probability of a female who
plays rugby: what is the probability,
guys? 0.05, divided by the probability
that it's a female. What is the
probability that it's a female? 0.46.
So 0.05 by 0.46 is about 0.109. So this
is how we calculate our conditional
probability. Let's talk about one more,
that is Bayes' theorem.
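The rugby-given-female calculation can be checked in Python. This is a sketch built on the survey counts from the example; the function names are made up, and `Fraction` is used only so the division stays exact.

```python
from fractions import Fraction

# Survey counts from the example: (gender, game) -> number of people.
counts = {
    ("male", "football"): 120,
    ("female", "football"): 75,
    ("male", "rugby"): 100,
    ("female", "rugby"): 25,
    ("male", "other"): 50,
    ("female", "other"): 130,
}
total = sum(counts.values())


def p_joint(gender, game):
    """P(gender and game)."""
    return Fraction(counts[(gender, game)], total)


def p_gender(gender):
    """Marginal P(gender): add the joint probabilities over all games."""
    return sum(p_joint(g, game) for (g, game) in counts if g == gender)


def p_game_given_gender(game, gender):
    """Conditional probability P(game | gender) = P(gender, game) / P(gender)."""
    return p_joint(gender, game) / p_gender(gender)


p_rugby_given_female = p_game_given_gender("rugby", "female")
print(p_rugby_given_female, float(p_rugby_given_female))
```

The exact answer is 25 out of 230 females, which is the same number as 0.05 divided by 0.46.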
Bayes' theorem is actually nothing,
guys. It's the extension of conditional
probability, the mathematical extension
of conditional probability.
So the formula of Bayes' theorem is
probability of B given A into
probability of A, divided by the same
term again, probability of B given A
into probability of A, plus probability
of B when A did not occur multiplied by
the probability of not A:
P(A | B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | not A) P(not A)].
It is nothing, just the mathematical
extension of your conditional
probability. So derivation and all that
stuff will not be important, guys, for
Bayes' theorem. Okay. So this was all
about your probability.
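As a worked illustration of the Bayes formula, the sketch below applies it to the same survey. Taking A as "the person is female" and B as "the person plays rugby" is a choice made for this example only, and the function name `bayes` is made up.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|not A) P(not A)]."""
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    denominator = numerator + p_b_given_not_a * p_not_a
    return numerator / denominator


# Values read off the survey table (A = female, B = plays rugby).
p_female = Fraction(230, 500)
p_rugby_given_female = Fraction(25, 230)
p_rugby_given_male = Fraction(100, 270)

p_female_given_rugby = bayes(p_rugby_given_female, p_female, p_rugby_given_male)
print(p_female_given_rugby)
```

The result agrees with reading the table directly: 25 of the 125 rugby players are female.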
So if no doubts, let's discuss
probability distribution.
Okay, probability distribution, guys,
it's very very easy to understand.
Suppose we throw a dice 100 times.
Right? Now I will record how many times
I'll get one, how many times I'll get
two, how many times I'll get the other
outcomes. Okay, I'll record that for 1,
2, 3, 4, 5, 6. Now suppose you got one
five times, you got two suppose 10
times, you got three more number of
times, say 25 times, and then you got
four, then you got five, then you got
six. You have just counted the
frequencies, right? Now when I am
smoothing out this histogram like this,
this smoothed curve is your probability
distribution. Clear? The smoothed
version of this histogram is known as
probability distribution.
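The dice experiment is easy to simulate. The sketch below rolls a fair die 100 times with Python's `random` module, counts the frequencies, and turns them into proportions. Note that a fair die is assumed here, so the counts will not match the made-up numbers in the spoken example (5, 10, 25 and so on); the seed is fixed only to make the run repeatable.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Throw a fair six-sided die 100 times and count how often each face shows up.
rolls = [random.randint(1, 6) for _ in range(100)]
frequency = Counter(rolls)

# Relative frequency of each face: an empirical probability distribution.
distribution = {face: frequency[face] / len(rolls) for face in range(1, 7)}

for face in range(1, 7):
    # A crude text histogram: one '#' per observed roll.
    print(face, frequency[face], distribution[face], "#" * frequency[face])
```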
So it is basically finding the
probability of different possible
values of a variable. So here my
possible values were 1, 2, 3, 4, 5, 6.
So it is telling me what the
probability of each is. Here I can see
that the probability of getting three
is the highest and the probability of
getting one or six is the lowest. So
that's what a probability distribution
tells you. Clear, the basic meaning of
probability distribution? Yes or no?
Now there are two types of probability
distribution, guys, because we have two
types of data. I hope everybody knows
what those two types of data are:
discrete and continuous. You know the
meaning of discrete and the meaning of
continuous? Yes. So that means for
discrete data we will be using a
discrete probability distribution, and
for continuous data we will be using a
continuous probability distribution.
Right? So I will write here: used for
discrete values, used for continuous
values. Right? So can anybody give me
examples of discrete data? Number of
students, not marks, number of students
in a class, because students can be
either 30 or 31. We cannot say we have
30.5 students. Right? Number of
students, or we can say the coin
example, head or tail, or the dice
example: we can get 1, 2, 3, 4, 5, 6;
we cannot get 1.1 or 2.1 like that.
Right? Then what will be the example of
continuous values? Heartbeat, good,
very nice. Weight or height of a
student, loan amount, temperature, very
nice, load. These are the examples of
continuous values. Now guys, I'll tell
you one important thing. In reality we
will get this histogram. To achieve
this curve we use a function of the
probability distribution. Okay? So when
we have discrete data, the function
name is PMF, which is probability mass
function. Using this function, we will
plot the discrete probability
distributions. Right? Similarly, when
we have continuous data, the function
we will use is PDF. The full form is
probability density function. So using
these functions we will plot the
probability distribution curve. Now we
will study the different types of
probability distribution we are having.
So in discrete we have binomial
probability distribution, Bernoulli and
Poisson, whereas in continuous we have
normal distribution. So we will discuss
these distributions one by one. So this
was the flow chart of probability
distributions.
First let us discuss Bernoulli.
Bernoulli's distribution. It's very
very easy. Bernoulli distribution is
used whenever the outcome of an event
is binary, guys. Like in real time,
where can we use it? Suppose I want to
find the probability that tomorrow it
will be raining or not: one or zero,
head or tail. Whenever we are having
only two outcomes we will be using the
Bernoulli distribution. What is the
probability that you will pass
tomorrow's exam? Right? But one thing
you need to note here is the number of
trials. Suppose I am tossing a coin and
I want to find the probability of head
or tail. So I'm having two outcomes,
that means I'll use Bernoulli's there,
but the number of trials will be one,
means I can toss the coin only one
time. So for this the function is PMF.
Remember I told you that we will use
the function PMF. So let me give you
the formula of PMF, but you don't need
to remember it. So don't worry, it is
just a mathematical formula. You don't
need to remember these formulas. So
either the probability will be of yes
or the probability will be of no. Here
x is the outcome of the one trial, 1
for yes and 0 for no. So the formula
exactly is, don't go with the formulas,
the probability mass function.
Okay.
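The transcript does not spell the Bernoulli formula out, so the sketch below uses the standard textbook form, P(X = x) = p^x (1 - p)^(1 - x) for x equal to 0 or 1. The function name `bernoulli_pmf` and the 0.3 chance of rain are hypothetical.

```python
def bernoulli_pmf(x, p):
    """Bernoulli PMF: P(X = x) = p**x * (1 - p)**(1 - x), with x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (no) or 1 (yes)")
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    return p ** x * (1 - p) ** (1 - x)


# One trial only: will it rain tomorrow? Assume a 0.3 chance of rain.
p_rain = 0.3
print("P(rain)    =", bernoulli_pmf(1, p_rain))
print("P(no rain) =", bernoulli_pmf(0, p_rain))
```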
This is the PMF function. Now when we
come to real cases, do we need to find
anything only one time? Means, should
the number of trials always be one? Yes
or no? No. Right? That's why we have
the extension of this Bernoulli. We
have the extension that is binomial
distribution.
Extension of Bernoulli's.
P means probability. Here also we are
having two independent outcomes, like
head and tail, true false, yes no kind
of situations, but the only difference
is that here the number of trials will
be multiple. So the formula here is,
again I'm saying, you do not need to
learn the formula; it is only for the
developers who actually create these
formulas, but a basic understanding is
required: n is equal to the number of
trials, x is equal to the number of
successes, p is the probability of
success, and q is equal to 1 minus p.
So the difference between Bernoulli and
binomial is only one, that is the
number of trials. Clear?
Bernoulli means one trial. Binomial
means multiple trials. That's it. You
do not need to remember the formulas.
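For reference, the standard binomial PMF with the symbols just listed is P(X = x) = C(n, x) p^x q^(n - x). The transcript only names the symbols, so treat the sketch below as the textbook formula rather than what was written on screen; `binomial_pmf` is a made-up name, and `math.comb` needs Python 3.8 or newer.

```python
import math


def binomial_pmf(x, n, p):
    """Binomial PMF: probability of exactly x successes in n trials."""
    q = 1 - p  # probability of failure
    return math.comb(n, x) * p ** x * q ** (n - x)


# Ten coin tosses with a fair coin: chance of exactly 5 heads.
print(binomial_pmf(5, 10, 0.5))

# With a single trial (n = 1) the binomial reduces to the Bernoulli case.
print(binomial_pmf(1, 1, 0.3), binomial_pmf(0, 1, 0.3))
```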
Clear? Then let's discuss the
Poisson distribution. Now think of a
situation, guys, that you are manager
of the server team in your company.
Okay. Now it's a call center company
where you get several calls. Right? Now
everyone knows that if we get calls
there will be a burden on the server if
the number of calls is very high.
Right? Now if you are a manager of
that, you should be aware of the times
when you get the highest calls in your
customer care, so that you can increase
the capacity of the servers, right? And
what is the time window when you get
very few calls, so that you can leave
the server as it is? Because the thing
is, suppose the capacity of the server
is low and you got multiple calls at
that time, your server will be
overloaded and maybe it will crash
also. That time we use this Poisson
distribution, when we need to find the
probability of events in a certain
duration. Let's take another example.
Suppose this Saturday and Sunday you
sold this many books; you want to find
the probability of what the sales will
be next Saturday and Sunday. Or you can
say the arrival of emails. Getting it?
So wherever you want to find the
probability within a specific time
duration, you will use Poisson
distribution. It is a discrete
distribution that estimates the
likelihood of an event in a given time
duration.
So here the formula is lambda to the
power x, divided by x factorial, into e
to the power minus lambda. So x means
number of successes. Lambda means mean
number of successes. e means a constant
value, about 2.718.
So these were the three discrete
probability distributions.
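The Poisson formula above translates directly into Python. In this sketch the call-center rate of 4 calls per minute is a made-up number for illustration, and `poisson_pmf` is a hypothetical helper name.

```python
import math


def poisson_pmf(x, lam):
    """Poisson PMF: P(X = x) = lam**x * e**(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)


# Assume the call center receives 4 calls per minute on average.
mean_calls = 4.0

for calls in range(0, 9):
    print(calls, round(poisson_pmf(calls, mean_calls), 4))

# Chance of a busy minute with more than 8 calls.
p_more_than_8 = 1 - sum(poisson_pmf(k, mean_calls) for k in range(0, 9))
print("P(more than 8 calls) =", round(p_more_than_8, 4))
```

A manager could use a tail probability like the last line to decide how much server capacity a time window needs.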
So the last distribution is normal
distribution, which is very very
important compared to your other
distributions, and also very very easy.
When I plot the distribution curve, it
could come to any shape, right? The
shape could be like this, or the shape
could be like this, or anything else.
But when the shape of your distribution
is like this, this shape is known as a
bell curve. So if the shape of your
distribution is like a bell curve, it's
looking like a bell here, right? So
when the shape of the distribution is
like a bell we call it a normal
distribution. So when a distribution
follows a bell curve we call it a
normal distribution. Only the shape,
guys. Now why is this shape so
important? Right? Why, if I have this
specific shape, do we have a type of
distribution? There are some special
things about this. The first property
of this shape is that the mean, median
and mode values will be equal to each
other. Means whenever we have this
shape, my three Ms of statistics will
be the same. The second property is it
is symmetrical. Symmetrical means
mirror image: left hand side will be
equal to right hand side. And the third
property is a special case of this
normal distribution, which is known as
standard normal distribution.
Now what is the special case? So I know
that in normal distribution my values
of mean, median, mode will be equal.
But that could be equal to anything:
that could be 6, that could be 10, that
could be 100, anything. But in standard
normal distribution that value will
always be equal to zero, means at
origin. What does this mean? That if
I'm talking about the normal
distribution, it could be centered
anywhere in the graph, but if I'm
talking about standard normal
distribution, it will only be centered
at the origin. And one more thing: the
standard deviation value will be equal
to 1. That's about your normal
distribution. The function here is f of
x equal to 1 by sigma into square root
of 2 pi, into the exponential of minus
1 by 2 into the square of x minus mu by
sigma:
f(x) = 1 / (sigma * sqrt(2 pi)) * exp(-(1/2) * ((x - mu) / sigma)^2).
This is the function,
where sigma is standard deviation and
mu means mean.
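The density function above can be written out in Python and compared against the standard library's `statistics.NormalDist` (available from Python 3.8). The helper name `normal_pdf` and the example values mu = 10 and sigma = 2 are assumptions made for this sketch.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal PDF: 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((x - mu) / sigma)**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z ** 2) / (sigma * math.sqrt(2 * math.pi))


# Standard normal: mean 0 and standard deviation 1, peak at the origin.
print(normal_pdf(0.0, 0.0, 1.0))

# Symmetry: the curve has the same height at equal distances from the mean.
print(normal_pdf(8.0, 10.0, 2.0), normal_pdf(12.0, 10.0, 2.0))

# Cross-check against the standard library implementation.
print(NormalDist(mu=10.0, sigma=2.0).pdf(11.0), normal_pdf(11.0, 10.0, 2.0))
```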
Now there's one last concept, which is related to this normal distribution: the central limit theorem. Anybody know what the central limit theorem is? It is related to the normal distribution only.
It is a proven result, guys, and it is also something you can observe experimentally. What it says is this: whatever the shape of the population is, it could be this shape, it could be that shape, it could be any shape of distribution, if I keep taking samples from this population, each of size greater than about 30, and compute the mean of each sample, those sample means will approximately follow a normal distribution.
That means the distribution of the sample means does not depend on the shape of the population. Whatever the shape of the population is, if I take samples of size greater than 30, their means tend towards a normal distribution.
So let me define it properly so that you can answer it easily. It's an important interview question as well. It states that if we take several random samples from a population that has a finite variance, the mean of the samples tends to follow a normal distribution. Clear? And as a rule of thumb, your sample size should be greater than 30. That's what your central limit theorem is.
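The theorem can be checked with a small simulation. This is only a sketch under stated assumptions: the exponential population, the seed, the sample size of 40, and all variable names are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A clearly non-normal (right-skewed) population: exponential with mean 2.
population = rng.exponential(scale=2.0, size=100_000)

sample_size = 40      # above the rule-of-thumb value of 30
n_samples = 5_000     # how many random samples we draw

# Draw many random samples and keep the mean of each one.
samples = rng.choice(population, size=(n_samples, sample_size), replace=True)
sample_means = samples.mean(axis=1)

pop_mean = population.mean()
pop_std = population.std()

print("population mean      :", pop_mean)
print("mean of sample means :", sample_means.mean())
print("std of sample means  :", sample_means.std())
print("sigma / sqrt(n)      :", pop_std / np.sqrt(sample_size))
```

The sample means cluster around the population mean, and their spread is close to sigma divided by the square root of the sample size. A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.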
So I'm importing the necessary libraries first. Now keep the name of the data as df. Okay. So let it be equal to pd.read_csv, in which I am loading country_profile_variables.csv, and then df.head().
What is the shape, guys? The complete shape of the file. The command we can use is df.shape. This gives me the shape of the data, which is 229 and 50. So you have 229 rows and 50 columns, right?
And you can see which 50 columns we have with df.columns. I think everyone can understand that this data set is about the variables of a country: the name of the country, the region the country is in, the surface area of the country, population in thousands, population density, sex ratio, and so on.
If you are from a geography background you will be well aware of these parameters. Otherwise you can just understand them as the variables related to your country.
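The loading and inspection steps can be sketched as follows. Because the real CSV is not available here, the example reads a tiny made-up stand-in from a string. The rows, the column names, and the placeholder path in the comment are assumptions.

```python
import io
import pandas as pd

# Tiny stand-in for the CSV used in the session (made-up rows and column names).
csv_text = """country,Region,Population in thousands
Aland,North,120
Borduria,South,4500
Costaguana,South,980
Freedonia,West,35
"""

# With the real file this would be pd.read_csv("<path to the CSV file>").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())          # first few rows
print(df.shape)           # (number of rows, number of columns)
print(list(df.columns))   # column names
```

On the data set from the session, `df.shape` would report 229 rows and 50 columns instead of the 4 and 3 of this toy frame.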
So from this data we just need to apply the descriptive statistics concepts. Before that, a few commands that are very, very important for you.
The first command is how to check for null values in your data. Write the name of the data frame, then .isnull(). This command gives you the null values, but there is one problem with it. What is that? It gives the output in boolean format. Wherever you have a null value it gives True, otherwise it gives False. But it is difficult to check each and every value manually to see where you have True, right?
So what we use is .sum() here. What .sum() does is count the number of null values in each column. Right now you can see that in every column it is zero, which means you don't have null values in your data. If I use .sum() again, it gives me the total number of null values in the data, which is zero in our case. If we had null values, it would give me the count of those null values.
Rajesh, the first sum gives me the null values in every column, as you can see here, and the second sum adds up all the numbers present here to give the total count. So it depends on you. If you want to see the null values in every column, use the single sum, and if you want the total null values in the data frame, use the double sum. I'll separate the commands so that you can easily relate them if you're getting confused.
Now the thing is, guys: right now we don't have any, but what if we have null values? What do we do in that case? There are two methods. One is to drop the null values: if there are very few of them, we can drop them. Otherwise we can fill the null values. We use the fillna command to fill those values. But what do we fill them with? We fill the null values, guys, most probably with the mean or the median (object data needs a different fill value), and sometimes we fill with zero also.
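The null-value commands above can be sketched on a small made-up frame. The column names, the values, and the `"Unknown"` placeholder for object data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "population": [120.0, np.nan, 980.0, 35.0],
    "region": ["North", "South", None, "West"],
})

# Single sum: number of null values in each column.
nulls_per_column = df.isnull().sum()

# Double sum: total number of null values in the whole data frame.
total_nulls = int(df.isnull().sum().sum())

# Option 1: drop the rows that contain null values.
dropped = df.dropna()

# Option 2: fill the null values instead of dropping them.
filled = df.copy()
filled["population"] = filled["population"].fillna(filled["population"].median())
filled["region"] = filled["region"].fillna("Unknown")  # placeholder chosen for this sketch

print(nulls_per_column)
print(total_nulls)
```

Dropping is reasonable when only a few rows are affected; here it would throw away half of the toy data, which is why filling is the other option.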
How do we check for duplicates? Do we have duplicate data or not? What is the command? It's df.duplicated(), guys. It's not duplicate, and it's not duplicates either. It is duplicated. See, if I write duplicates here it shows me an error, and if I write only duplicate it again shows me an error. So the command is duplicated.
But again we have the same problem: it gives the output in boolean format. So we use .sum(), which counts the duplicate rows. Right now we have zero duplicate values.
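A minimal sketch of the duplicate check, on a made-up frame that contains one repeated row. The `drop_duplicates` call at the end is an extra step not shown in the session.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "B", "B", "C"],
    "population": [120, 4500, 4500, 980],
})

# Boolean Series: True for every row that repeats an earlier row.
is_duplicate = df.duplicated()

# Summing the booleans counts the duplicate rows.
n_duplicates = int(is_duplicate.sum())

# Optional follow-up: remove the repeated rows.
deduplicated = df.drop_duplicates()

print(is_duplicate.tolist())
print(n_duplicates)
```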
Let us start our descriptive stats. For this you can pick any column at random, guys. I am taking this population in thousands column, so we will do our descriptive stats based on this column. You can pick any column you want, but for this session first follow what I am doing, and for practice you can take any other column.
So first we will find the mean. What is the basic formula of the mean, guys? First let us do it manually and then we will use the Python function. The formula of the mean is the sum of observations divided by the total number of observations, right? Now let us see how we can do it manually if we don't know the function.
Sum of observations means I need to take the sum of this population in thousands column. I'll copy the name from here. It's better to copy the name instead of typing it, because in typing we can miss some spaces, which will cause errors. So how do we get the sum, guys? I think you should know which function gives the sum: .sum(), yes or no? Now I can print "sum of observations is" followed by the sum. So this is the sum of observations we got.
Now we need to find the total. I'm choosing a variable, and again I'm choosing this column. How do we find the total number of observations here? Either you can use the .count() function, which counts the number of observations, or you can use the len() function. With no null values in the column, len gives the same answer, so you can use either.
So we got the sum, we got the total. Now we can easily find the mean as sum divided by total. Let's see what the answer is. So it's 32756.
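The manual mean calculation can be sketched like this. The five values are made up, and the variable names are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({"Population in thousands": [120, 4500, 980, 35, 980]})
column = "Population in thousands"

total_sum = df[column].sum()       # sum of observations
total_count = df[column].count()   # number of non-null observations
# len(df[column]) gives the same number only when the column has no nulls.

mean_manual = total_sum / total_count

print("Sum of observations is", total_sum)
print("Total number of observations is", total_count)
print("Mean is", mean_manual)
```

The difference between `.count()` and `len()` matters once nulls appear: `.count()` skips them, `len()` does not.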
So now, similarly, guys, try to do the median and mode manually as well. How can we find the median manually, and how can we find the mode manually?
First we find the length of the column. We need to know whether it is an even length or an odd length. We can see here that it is an odd length. So if I divide the length by two and keep the whole-number part, it gives me the index of the middle element, and I can then look up the element at this index.
So I'll take the column name, but first we need to sort the values with sort_values, because the middle element only makes sense after sorting, right? And here I can look up index 114, which gives 223.
But there is one more thing you need to understand, guys. When we sorted the values, the index values were also rearranged along with the sorting. That means you also need to reset the index, or we can say ignore the index: ignore_index=True. What this does is give you index values 0, 1, 2 and so on instead of the shuffled ones. Clear, everyone? And now I can find what is at index 114. 548 is your median. Clear, everyone, how we found the median?
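The manual median, including the index pitfall after sorting, can be sketched on five made-up values. The variable names are assumptions.

```python
import pandas as pd

values = pd.Series([120, 4500, 980, 35, 610], name="Population in thousands")

n = len(values)     # 5 values, an odd length
middle = n // 2     # integer division gives the middle position: 2

# Sorting alone keeps the old index labels attached to the values.
sorted_keep = values.sort_values()

# ignore_index=True relabels the sorted values 0, 1, 2, ...
sorted_reset = values.sort_values(ignore_index=True)

wrong_lookup = sorted_keep.loc[middle]     # label 2 still points at the old row
median_manual = sorted_reset.loc[middle]   # position 2 of the sorted values

print(sorted_keep.index.tolist())
print(wrong_lookup, median_manual)
```

This shortcut only covers an odd length. For an even length the median is the average of the two middle values of the sorted column.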
Then how do we find the mode? The mode is very simple, I think: just value counts, right? .value_counts() gives me how many times each value is repeated. So the most repeated is 2890 51. I can see here that I have multiple modes. Remember, we discussed in the theory example that we can have multiple modes. So you can see here that we do have multiple modes.
Now this is how we have done it manually right. But do we have functions for it?
right. But do we have functions for it? Yes we have. What is that functions?
Yes we have. What is that functions? So in real life we don't do manually we
So in real life we don't do manually we use functions. So we have for mean we
use functions. So we have for mean we have dot mean function.
have dot mean function. So I'll print this
So I'll print this mean is
similarly for median and for mode
and for mode for median and for mode we have these
for median and for mode we have these functions.
functions. Dear everyone so you can see here you
Dear everyone so you can see here you are getting multiple modes here. Why?
are getting multiple modes here. Why? because we have multiple values which is
because we have multiple values which is repeated equal number of times like the
repeated equal number of times like the example we discussed in theory. So
example we discussed in theory. So that's why you have multiple modes here.
that's why you have multiple modes here. Clear everyone. So in real we can use
Clear everyone. So in real we can use these functions to find mean, median and
these functions to find mean, median and mode.
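The built-in equivalents can be sketched on the same kind of made-up data. The values and names are assumptions.

```python
import pandas as pd

values = pd.Series([3, 8, 3, 5, 8, 1])

mean_value = values.mean()       # sum of observations / number of observations
median_value = values.median()   # middle of the sorted values
mode_values = values.mode()      # a Series, because there can be several modes

print("Mean is", mean_value)
print("Median is", median_value)
print("Mode is", mode_values.tolist())
```

Note that `.mode()` returns a Series rather than a single number, which is why ties show up as several rows in the output.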
So see the difference in the output. After sorting alone, you have the index values jumbled up here, right? And with ignore_index=True you don't have the jumbled-up values.
Okay. So we have found the mean, median, and mode. Next we will find the variance and standard deviation. For variance the function is .var(), and for standard deviation the function name is .std(). Run this and you can see the variance of your data and the standard deviation of your data, right?
Now tell me, and this is a very good interview question also: what is the relationship between variance and standard deviation? Guys, we have studied this. Your standard deviation is equal to the square root of the variance. That was the relationship.
Now can we cross-check whether it is really true? It means this value should be the square root of that value. So let us check. First import math, then std is equal to math.sqrt; the function name is sqrt. Yes, just a minute. In sqrt I will pass the variable name. So let me store the variance in a variable, or we can pass it directly instead of storing it. Let it run.
So don't you think you are getting the exact same value? It is now verified that your standard deviation is actually the square root of the variance. The thing we studied in theory, we have confirmed here.
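The cross-check between variance and standard deviation can be sketched as follows, on made-up values. pandas uses the sample versions of both by default (`ddof=1`), so the relationship holds as long as both come from the same defaults.

```python
import math
import pandas as pd

values = pd.Series([120, 4500, 980, 35, 610])

variance = values.var()   # sample variance (ddof=1 by default in pandas)
std_dev = values.std()    # sample standard deviation (same default)

# The standard deviation should equal the square root of the variance.
std_from_variance = math.sqrt(variance)

print("Variance of the data is", variance)
print("Standard deviation of the data is", std_dev)
print("Square root of the variance is", std_from_variance)
```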
Similarly, how do we find the range? We have studied range also. What is the formula for the range? Maximum minus minimum, right? So how can we find the maximum? Which function do we have for the maximum? .max(), right? Similarly, for the minimum we have the function .min().
We can print "maximum value is" with the max variable and "minimum value is" with the min variable. So this is your maximum and this is your minimum, and we can easily find the range. Range is equal to max minus min, and we print "range of the data is" with the range variable. So this is how you can calculate the range also. Clear?
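A short sketch of the range calculation on made-up values. The variable names here avoid `max`, `min`, and `range`, since those would shadow Python's built-ins; that naming is a choice made for this example.

```python
import pandas as pd

values = pd.Series([120, 4500, 980, 35, 610])

max_value = values.max()
min_value = values.min()
data_range = max_value - min_value   # range = maximum - minimum

print("Maximum value is", max_value)
print("Minimum value is", min_value)
print("Range of the data is", data_range)
```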
So you can see how easy these concepts are to implement, right? We found the mean, median, and mode, we found the range, and we found the variance and standard deviation. We are left with percentiles, quartiles, and correlation. These three topics are what is left.
So let's talk about percentiles. Everyone remembers the concept of percentiles? Let me relate it to that example. Suppose you have one student in the class who scored 75 marks out of 100, and this value of 75 is at the 60th percentile. What does this mean? It means that 60% of the other students in the class have marks lower than 75.
So percentiles are observations that give you a value below which a certain percentage of the other values fall. Clear?
So how can we find it using Python? Let's see. Suppose we want to find the 75th percentile. We use the function np.percentile, in which I pass the column name and then, after a comma, which percentile I want. So suppose I want to check the 75th percentile value. Print this.
"np is not defined." I thought we had run the code for NumPy. Okay, we have not run the import numpy as np line.
So you can see that the 75th percentile value is 9193, which basically means that 75% of the other values are less than this value. Clear? Similarly you can find the 25th, and you can find the 50th also. So tell me, what is the value of the 25th percentile? I'll copy the same code here, change the variable name, and pass 25. And so, yes, 431.
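The percentile lookup can be sketched with `np.percentile` on five made-up values. NumPy's default linear interpolation is assumed, and the names are chosen for this example.

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

# np.percentile takes the data and the percentile on a 0-100 scale.
p75 = np.percentile(values, 75)
p25 = np.percentile(values, 25)

print("75th percentile is", p75)
print("25th percentile is", p25)
```

The import has to run before `np.percentile` is called; otherwise Python raises a `NameError` saying `np` is not defined, which is the error seen in the session.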
Similarly we can find the quartiles also. So what were quartiles? Remember Q1, Q2, Q3: Q1 means the 25th percentile, Q2 means the 50th, and Q3 means the 75th, right? So we can find them through percentiles: this 25th percentile here is basically my Q1.
But we can use another function also. What is that? I'll take the column name, then the .quantile() function, in which I pass 0.25. So that is 431. You can see you got the exact same value. You can store it as Q1 and print Q1. So you can find it either through percentiles or through quantiles.
Tell me what values you get for Q2 and Q3. Q2 means 50%. And if we ask for 100%, this gives you the value below which all 100% of the values fall. That means we are finding the maximum value, and you can see here that the maximum value is the same. So yes, the Q2 and Q3 values also came out perfectly fine. See how easy it is to implement in Python, right?
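The quartiles can be sketched with the pandas `.quantile()` method, which takes a fraction between 0 and 1 rather than a number between 0 and 100. The values and names are made up for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

q1 = values.quantile(0.25)   # same as the 25th percentile
q2 = values.quantile(0.50)   # same as the median
q3 = values.quantile(0.75)   # same as the 75th percentile
q_max = values.quantile(1.0) # the 100% point is the maximum

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("100% quantile:", q_max, "max:", values.max())
```

Because both functions interpolate linearly by default, `values.quantile(0.25)` and `np.percentile(values, 25)` agree.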
The last concept we have here is correlation. Remember how many types of correlation we had? As we discussed at the start: positive, negative, and zero. So how do we find it? The function name is .corr(). With df.corr() you can see you're getting the correlation values for each and every column here, right?
We can also plot it. Which visualization do we use? We discussed it, guys, but at that time you were not familiar with the plot, so let me discuss it with you now. The plot I told you we use for correlation is the heatmap. And how do we get that heatmap? sns.heatmap, in which we pass df.corr(). Run this. See, you're getting a proper heatmap.
Now, how do we read the values? You can write annot=True here, so the values are also visible. But the values overlap here, right? So I will increase the figure size: plt.figure, and inside that figsize equal to, I think, (15, 10) will work fine for my system. So yes, you can see the values here. Clear, everyone?
You can see some negative values and some positive values. Negative means the columns are negatively correlated, and positive means they are positively correlated. Clear, everyone? So this is how we find the correlation.
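A hedged sketch of the correlation steps follows. The data is made up, and the `plot_heatmap` helper is a name introduced for this example. The plotting imports sit inside the helper so the correlation part runs even where seaborn is not installed. Selecting numeric columns first is an addition here, since text columns cannot be correlated.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "x": [1, 2, 3, 4, 5],
    "y_up": [2, 4, 6, 8, 10],     # rises with x: positive correlation
    "y_down": [9, 8, 7, 6, 5],    # falls as x rises: negative correlation
})

# Correlation matrix of the numeric columns only.
corr = df.select_dtypes(include="number").corr()
print(corr)

def plot_heatmap(corr_matrix):
    """Draw an annotated heatmap; needs matplotlib and seaborn installed."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.figure(figsize=(15, 10))         # larger figure so the numbers don't overlap
    sns.heatmap(corr_matrix, annot=True) # annot=True prints the values in the cells
    plt.show()
```

Calling `plot_heatmap(corr)` in a notebook would show the matrix as a colored grid, with values near 1 for positively correlated pairs and near -1 for negatively correlated ones.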
Just a quick info, guys: Intellipaat offers a data science course in collaboration with iHub, IIT Roorkee, which will help you master concepts like Python, SQL, machine learning, AI, Power BI, and more. With this course, we have already helped thousands of professionals make a successful career transition. You can check out their testimonials on our Achievers channel, whose link is given in the description below. Without a doubt, this course can take your career to new heights. So visit the course page link given in the description below and take the first step towards career growth with the data science course.
So first things first,
So, first things first: what is the difference between statistic and statistics? Descriptive is done, so here it is: statistic versus statistics. This is the first question I ask when I talk about statistics. If you observe, there's just one extra letter "s" at the end, right? That's the difference in the spelling, not in the concept. "A statistic can be a single piece of data." Okay. This is a question out of syllabus, right? You might say you have already done statistics, so this is more like a teacher's way of testing, where an unexpected question might come.

So see, on "statistic", you are close. A statistic talks about a statistical technique, like the z-test, the t-test, the chi-square test. All these are your statistics: it's more like an approach, a method within the universe of statistics. Statistics is a subject, right? A statistic is more like a method, an approach, a technique which you apply on a set of data which you have collected over a period of time. You have done some transformation, and you are going to analyze it with the help of this test. Okay.
So statistics, you already know, is a subject, and quite an interesting subject. How was your introduction to statistics? Did you feel that statistics can be a really interesting topic, an interesting area to learn? How many of you felt, "Oh, statistics is something I should pick up," when you studied statistics up to the point where descriptive was taught to you? I know some of the students are there. Okay, reasonable, quite interesting.

So let me check a couple of things and understand a little more of what your understanding of descriptive statistics is, so that I can use some of those things in inferential. I'm just making sure. So tell me one thing. Suppose I talk about salary, and I have a very good diagram for that.
This is the time, by the way, guys. For those who have been to engineering colleges, typically the IITs I would say, December 1 is day zero for them. So this is the moment when, I am sure, my juniors will be getting placed. So this is a bell curve which I put together.

You might be seeing something in the news, right? The 1 Cr package. Let me tell you, I have been asked this question a lot of times. The 1 Cr package is a myth, or more like misinformation given to the news and print media to get some footage or mileage. Definitely some of the guys, maybe one or two or three people in a batch, get 1 Cr, but that's all in dollars. It means the job is not in India; it's in the US or the likes of the US. So they convert dollars into rupees and then tell you that you are getting a package of 1 Cr. People forget that they have to spend in dollars. They will not take a flight, come back to India and spend in rupees. Understand my point? So that is the 1 Cr package, okay? In India hardly anybody gets a 1 Cr package as a fresher. If some company is giving that, the company owners or the board are idiots. It doesn't make sense.
So you can definitely see this is a skewed distribution, and as you can also see, I've already written that it's right-skewed. Anybody, why is it called right-skewed? Because it has a long tail on the right side. Some of you will be wondering what that long tail is. Let me highlight it and you'll see it. Can you see the distribution is going like this? It has a long tail. This is what we call a long tail, and the peak is on the left-hand side. So it's right-skewed. Okay.
Now, you must have been taught that when the data is symmetric, when the distribution has a shape like this, the mean, median and mode are all going to be at the same place. Remember? Now, when the data is skewed, there is an issue with using the mean here. Okay, so far with me, guys? And what is the reason for that? Because the mean is sensitive to outliers. I hope you have been taught this, right? The mean is sensitive to outliers, and this is a very, very common concept. Are you guys aware of it? Let me show you if you're not.

Tell me one thing. I've also written some packages here. Let's say, and this is not 51 by the way, this is 5 LPA; let me write it clearly, else you'll start laughing at me. 5 LPA. Let's say most of the guys are getting 5 LPA. That becomes the peak of the salary, which you can call the mode. It is the highest point you see in the distribution, so that's your highest-frequency value, which means it's the mode, right? Or you can say it's close to the median. But it cannot be the average. You know why? Because of these guys who are sitting somewhere here. Can you see this? They are going to impact the average. Let me give a simple example.
So let's say these are your salaries: 3, 5 and so on. I'm taking some numbers in LPA, lakhs per annum. And 50 and 80, let's say. This is all in LPA. So if you do the math, what is the mean salary? If someone can do this for me, how much will you get? I think I have, three, six, nine, ten values. So what is it coming to, guys? Did someone say 81.5? No. How can the average be 81.5? Only two values are 50 and 80. 18.2, so many people are telling me. So let's say that's the average. Okay?

But if you observe closely, most of your data is not even higher than 10, right? Guys, can you see this? Just these two guys, who are like the IIT guys in the media getting this high package, have impacted your overall average salary. So why don't you do this: exclude these outliers and do the math again. Add these eight salaries only and divide by 10, or sorry, divide by 8. How much are you getting, guys? I think it will be close to six or seven. 6.5, okay. Thank you to those who are helping me.

Which one do you think looks closer to reality as an average salary, this one or this one? Definitely 6.5, right? So you understand what I mean when I say the mean is sensitive to outliers. That's the reason people say that whenever you are looking at data with outliers, you should not go with the mean; you should go with the median.
Now, why don't we find the median for this one? Just arrange the values: 3, 4, 5, 5, 6, 8, 10, 11, 50, 80. I missed one number; I think five is there two times, right, guys? Okay. Now tell me, guys, what is the median? I'll come to you. So what you will do, my dear, is take out the first four and the last four. What will you be left with? I'm sure you must have been taught how to calculate the median. So what is the average of these two? I'm not going to teach that; I'm just making a point. Now tell me, looking at this number and this number, which one is more realistic as the typical salary you might get: the mean or the median? So now you understand.
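The salary example can be checked in a few lines. The ten values below are an assumption: they are reconstructed from the numbers read out in the session so that they reproduce the quoted averages (18.2 with the two large packages, 6.5 without them).

```python
from statistics import mean, median

# Salaries in LPA. Reconstructed values (assumption), chosen to match
# the averages quoted in the session.
salaries = [3, 4, 5, 5, 6, 8, 10, 11, 50, 80]

# Drop the two large packages that act as outliers.
without_outliers = [s for s in salaries if s < 50]

mean_all = mean(salaries)
mean_trimmed = mean(without_outliers)
median_all = median(salaries)  # average of the two middle values, 6 and 8

print("mean with outliers:   ", mean_all)      # 18.2
print("mean without outliers:", mean_trimmed)  # 6.5
print("median with outliers: ", median_all)    # 7.0
```

The median stays at 7 even with the 50 and 80 in the data, which is close to the trimmed mean of 6.5, while the plain mean is pulled up to 18.2.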
That's why, if you see some apps like Glassdoor, or similar ones like PayScale, reporting an average salary, you should never go with that. Let me also tell you, there are students who are coming from ISB and the like, and they're getting 30 lakhs as a fresher salary, and then there will be students who are coming from a normal college, and they will be getting three lakhs or five lakhs. So if someone keeps taking that 30 lakhs blindly, he will be in a bubble, thinking, "Oh, I am also going to get 30 lakhs." This is just to bust that thought. It doesn't work like this. Okay? So does it make sense, guys?
Now, coming back to outliers: identifying outliers and treating outliers. Remember, I had taught you guys the box plot. I'm just revisiting a few things here. If you do a little math and try to identify an outlier, you will be able to do that. So now my question is, did you guys already go through some of those examples where you identify the outlier and treat the outlier? Right. So here there are two outliers on the upper side and two on the lower side. Okay. There is a statistical way to calculate these thresholds, known as whiskers. I have also talked about it when I covered visualization.
Let's say this is your upper threshold. Guys, I'll not repeat it; I'll quickly write the equation. You take Q3 and then you add 1.5 times the interquartile range, which is Q3 minus Q1. The number 1.5 has been given to you statistically. So the upper threshold is Q3 + 1.5 × IQR. Anything which is greater than this, like these two guys, is going to be an outlier.

Similarly, you can also do the lower threshold, just to refresh your memory. That will be Q1 - 1.5 × IQR, where IQR is the interquartile range. Anything lower than this lower threshold, like these guys, is an outlier.

And you can also make a little sense out of it. An outlier is someone getting something unexpected or weird, right? Everybody is getting five lakhs, six lakhs, 10 lakhs, and somebody is getting 80 lakhs: definitely that is an outlier. That's the normal way of saying it. So an outlier is something which is not normal, which doesn't fit in the normal scenario. I'm trying to explain it literally; mathematically, this is how you do that. Okay.
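The two thresholds above can be sketched with the standard library. The salary list is illustrative (assumed values in LPA, in the spirit of the session's example), and the quartile convention is also an assumption: `method="inclusive"` is one of several common ways to compute quartiles, and other conventions give slightly different thresholds.

```python
from statistics import quantiles

# Illustrative salaries in LPA (assumed values, not read from the slide).
salaries = [3, 4, 5, 5, 6, 8, 10, 11, 50, 80]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Anything outside the two thresholds is flagged as an outlier.
outliers = [s for s in salaries if s < lower_threshold or s > upper_threshold]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"lower threshold={lower_threshold}, upper threshold={upper_threshold}")
print("outliers:", outliers)
```

With these values only the 50 and 80 LPA packages fall above the upper threshold; nothing falls below the lower one.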
Now guys, I'll jump to one question. If you have been taught the normal distribution, you should be able to take it, because then you must have been taught the z-score as well. Guys, was the z-score taught to you? No? So then what did you guys do in the normal distribution? There's no normal distribution without the z-score. And without the normal distribution and the z-score, I cannot go into inferential. So it's a prerequisite.
Min and max are nothing special. The maximum value in the data is the maximum; the minimum is the minimum. Okay, simple. Why are the outliers not considered as what they are? See, this "minimum" on the box plot: they are actually taking it at the lower threshold, and the "maximum" at the upper threshold. Okay. So don't get confused. This is not our data's minimum and maximum. Obviously the minimum of the data will be among the outliers, right, if there are values lower than the lower threshold. You have a good point, but they were talking about this whisker. Okay. So remember, there's a bell curve, the standard normal distribution. Let me do the thing. Okay.
So one of the key distributions is the normal distribution, also known as the bell curve. It has gotten the name bell curve; anybody know why? Because if you go to a temple, you have seen a bell, right? The normal distribution can be shown as a bell-shaped curve. Okay? And here you go. Now, if you look at the normal distribution, most of the things in the world can be modeled as a bell curve. Trust me on that. So that's what I was telling you: if the distribution is symmetric, your mean, median and mode will all be at the same point. Okay. So ideally, when we talk about the standard normal distribution, you make sure that it is symmetric; it's not skewed. Okay.
And if you look at the top one, not the bottom one, you must be seeing that I have given three colors. One is the band between mu plus and minus sigma. I hope you know what mu is and what sigma is. So how much of your data is this going to cover? 68.27%, right? Then, if you look at mu ± 2 sigma, about 95% of the data will lie there. And if you go and stretch it further, to mu ± 3 sigma, about 99.7% of the data will be there.
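These three coverage figures can be reproduced from the normal cumulative distribution function. The sketch below uses Python's `statistics.NormalDist`; the choice of the standard normal (mu = 0, sigma = 1) is just a convenience, since the coverage within k sigma is the same for any normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

coverage = {}
for k in (1, 2, 3):
    # P(mu - k*sigma <= X <= mu + k*sigma) for a normal random variable X
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within mu ± {k} sigma: {coverage[k]:.2%}")
```

This prints roughly 68.27%, 95.45% and 99.73%, which are usually rounded to 68%, 95% and 99.7%.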
So you might be wondering why we need the normal distribution, right? And why mu ± sigma is so important. See, this is what has been found, or observed, on a normal distribution, because it's all about the probabilities of what data will lie where. If you take any normal distribution curve, you can always find mu, which is more like the expectation of X. Don't go into this; X is a random variable, which I'm damn sure you don't know yet. The expectation of X is generally written as mu. You can call it the mean, but it's not quite the ordinary mean: the mean of deterministic data is your average, while this is the expected value of X. So I'm going to write "expected value", and the moment expectation comes in, it's probability. Okay. Then you also have sigma, which, you know, is the standard deviation, the square root of the variance.

Now, what we are saying is that if you take a range of mu ± sigma, roughly 68% of the data will lie in it; within mu ± 2 sigma it will be about 95% of the data; and within mu ± 3 sigma about 99.7% of the data will lie.
This is the reason college and university professors use this a lot. Suppose they fit a normal distribution on your marks to calculate grades. What happens, guys? Definitely it will not be perfectly symmetric; the distribution can be skewed, and they know that. So we have a mean somewhere here; this is going to be mu. They know that there will be some sigma. So this is mu + sigma, this will be mu + 2 sigma, this will be mu + 3 sigma, and let's say this is mu - sigma.

Then they create bands. So this band is going to be their favorite students, maybe one or two, and they will be getting, let's say, an A+ grade. Then they'll go to the next band, which is mu + 2 sigma to mu + 3 sigma. Focus, guys. All those who are here are going to get, let's say, grade A, then grade B, grade C, grade D, and everybody who is behind this is going to get what grade? The F grade, which is a fail grade.

First of all, this is just a high-level overview of how the grading system works. It's not that straightforward; this is just to give an example. And generally an F grade means you have to repeat the course and the exam as well.
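The banding idea can be sketched as a function of the z-score, z = (score - mu) / sigma. The exact cutoffs below are an assumption made for illustration (A+ above mu + 3 sigma, A between mu + 2 sigma and mu + 3 sigma, then one band per sigma down to F below mu - sigma); the session only describes the bands loosely, and real grading schemes differ. The marks are hypothetical.

```python
from statistics import mean, pstdev

def grade(score, mu, sigma):
    """Map a score to a letter grade using z-score bands (assumed cutoffs)."""
    z = (score - mu) / sigma
    if z >= 3:
        return "A+"
    if z >= 2:
        return "A"
    if z >= 1:
        return "B"
    if z >= 0:
        return "C"
    if z >= -1:
        return "D"
    return "F"

# Hypothetical class marks.
scores = [35, 42, 48, 50, 53, 55, 58, 61, 66, 72, 80, 95]
mu, sigma = mean(scores), pstdev(scores)

for s in scores:
    print(s, grade(s, mu, sigma))
```

Because the bands are relative to mu and sigma, the same raw score can earn a different grade in a different class.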
Are you guys able to follow how the normal distribution can help you? As a matter of fact, even the COVID cases were modeled as a normal distribution. Income can be modeled as a normal distribution. IIT's performance or results can be modeled as a normal distribution. So the normal distribution is like bread and butter. Okay. The beauty of the normal distribution is that it starts from the negative side, so it can fit all real-valued data from minus infinity to plus infinity. No other distribution can do justice to continuous data like this one. Okay.
Now, coming back to how we are going to use it. Okay. By the way, do you understand the meaning of standard deviation? What is the interpretation of a high standard deviation versus a low standard deviation? Anybody? For example, I give you two curves. Tell me which one has the higher standard deviation: which color, black or red? Where do you observe the higher standard deviation? The first one, right? Black.

Those who are saying red, please pay attention: standard deviation means the spread of the data, and spread means the amount of variation. And do you know why we need the standard deviation? Can you see that the mean of the data, the peak of the data, is the same for both? So just stating the mean, median and mode is not enough a lot of the time. They are all the same here, but these two distributions are quite different. That is why the standard deviation is required: it talks about the amount of variation, also known as the dispersion or the spread of the data. You guys can clearly see that the black one varies quite a lot and the red one varies less, if you look at a similar range. That's why it is this one. Okay.
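Here is a small sketch of the same point with numbers instead of curves. The two data sets are invented for illustration: they share the same mean and median, but one is tightly packed (like the red curve) and the other is spread out (like the black curve).

```python
from statistics import mean, median, pstdev

narrow = [48, 49, 50, 51, 52]   # values close to the center (red curve)
wide = [30, 40, 50, 60, 70]     # values far from the center (black curve)

for name, data in (("narrow", narrow), ("wide", wide)):
    # pstdev is the population standard deviation: sqrt of the mean squared deviation.
    print(f"{name}: mean={mean(data)}, median={median(data)}, "
          f"std dev={pstdev(data):.2f}")
```

Both sets report a mean and median of 50, so only the standard deviation tells them apart.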
this okay now guys there is a probability density function for the
probability density function for the standard normal distribution so what
standard normal distribution so what happens from the mean from the mean so
happens from the mean from the mean so probability density function now you
probability density function now you might be wondering what is it don't
might be wondering what is it don't wonder much I'll just use this jargon
wonder much I'll just use this jargon for you although it should have been
for you although it should have been with you but I can't do much about it.
with you but I can't do much about it. Probability density function is actually
Probability density function is actually a mathematical function to represent the
a mathematical function to represent the any distribution. So for normal
any distribution. So for normal distribution you also have a probability
distribution you also have a probability distribution function. Density function
distribution function. Density function those who have done a little bit of
those who have done a little bit of random variable they know that. So it is
random variable they know that. So it is actually a mathematical equation
actually a mathematical equation 1 upon roo<unk> under 2 pi sigma is
1 upon roo<unk> under 2 pi sigma is outside e to the power -/ x - mu by
outside e to the power -/ x - mu by sigma square where x can be between this
sigma square where x can be between this doesn't look this looks better. Okay
Guys, now this is your normal distribution; this is not the standard one, and that creates a problem.
Let me give you a little idea. Suppose you draw a normal distribution curve like this.
I hope everybody knows that the area under this curve is the total probability.
If the curve represents all the different probabilities, the total must equal one. I think this is common sense.
Now, if the curve is symmetric about the center, and the center is at zero, what is the area from minus infinity to 0? Let me take some time to shade it.
If the total is one, this area is 0.5. Similarly, the area on the other side is another 0.5, so the total is one.
So far with me, guys? 0.5 plus 0.5 is one.
Since it is symmetric, this kind of juggling is easy: minus infinity to 0 is one part and 0 to infinity is the other part. Right? So it is symmetric.
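The two claims above (total area one, and 0.5 on each side of a center at zero) can be checked numerically. This is a minimal sketch, assuming that "infinity" can be approximated by ten standard deviations; the names `normal_pdf` and `area` are illustrative and do not come from the lecture.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area(a, b, mu=0.0, sigma=1.0, steps=20_000):
    """Approximate the area under the density between a and b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

# Assumption: +/- 10 standard deviations stands in for +/- infinity;
# the tails beyond that are negligibly small.
total_area = area(-10.0, 10.0)   # whole curve
left_half = area(-10.0, 0.0)     # minus infinity to 0

print(round(total_area, 4), round(left_half, 4))
```

Passing a non-zero `mu` to `area` shows why a shifted curve is harder to reason about by symmetry alone: the area to the left of 0 is then no longer half.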
Look at this. Suppose this is a distribution. This is also a normal distribution.
It looks like a skewed distribution, and ideally people would not call a skewed curve normal, let me also tell you that.
But say zero is here. About its peak the curve is still symmetric; it has only shifted. Can you see that, guys? Are you with me?
The peak is not at zero any more. It has shifted towards the right.
Now tell me, what is the area from minus infinity to 0? Will it be easy for you to calculate? I think not.
If you have done a little maths, you know that finding this area means integrating the density from minus infinity to 0, $\int_{-\infty}^{0} f(x)\,dx$, and you will die doing that integration.
So statisticians and mathematicians agreed: boss, we can't do this all the time.
They said, why don't we normalize this and make it a standard normal distribution? Whatever data we get, at the end of the day we make it look like the first curve.
That is how we go from the normal distribution to the standard normal distribution. Who wants to do this integration every time? It is doable, but not easily doable.
In Python you must have heard people say "I standardized the data" or "I normalized the data". Data scientists say this all the time; the word is overused, even abused.
What it means is that you take every x value, subtract the mean, and divide by sigma, and you call the new variable z (I pronounce it "zed").
This is where I introduce z: $z = \frac{x - \mu}{\sigma}$. You have probably seen this at some point.
Basically you take the whole data, subtract the mean, and divide by sigma, which means normalizing by sigma.
This is known as the z value, or z-score. And what
are you going to do with this? The idea is to take $Z$ and find its expectation, $E[Z] = E\left[\frac{X - \mu}{\sigma}\right]$.
This comes a little from random variables, so please excuse me, but I want to show you.
If you do the maths, it becomes $\frac{1}{\sigma}\left(E[X] - E[\mu]\right)$.
The expectation of $X$ is the average, if you remember what I wrote at the top, so you write it as $\mu$.
The expectation of a constant is the constant itself, so $E[\mu]$ is also $\mu$. The two cancel, and this comes to zero: $E[Z] = \frac{1}{\sigma}(\mu - \mu) = 0$.
So the beauty of the z-score is that its expectation, which is its mean, is zero.
That means the mean of the standard normal distribution is always zero.
It is a calculated mean. It is not the natural mean coming from the data; you have standardized it.
Don't miss this step in the whole scheme of things. I'll take a pause here.
You did this extra step on your data to normalize it, and that is the reason the expectation of $Z$ comes out to zero.
Likewise sigma comes out to one, which I am not deriving because it needs more extensive calculation with random variables, which is not meant for you here.
Okay, let's not go too mathematical. So far with me, guys?
Now you might be wondering what the benefit of this is. There is a benefit.
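The result that a standardized variable has mean zero and sigma one can be checked on any data set. The lecture skips the sigma part; it follows from $\mathrm{Var}\left(\frac{X-\mu}{\sigma}\right) = \frac{\mathrm{Var}(X)}{\sigma^2} = 1$. Below is a minimal sketch using only the standard library; the `scores` list is hypothetical data invented for illustration.

```python
import statistics

# Hypothetical sample, used only for illustration (not from the lecture).
scores = [420.0, 455.0, 500.0, 510.0, 530.0, 585.0, 640.0]

mu = statistics.fmean(scores)
sigma = statistics.pstdev(scores)  # population standard deviation

# z = (x - mu) / sigma for every value in the data
z_scores = [(x - mu) / sigma for x in scores]

z_mean = statistics.fmean(z_scores)    # zero, up to floating-point error
z_sigma = statistics.pstdev(z_scores)  # one, up to floating-point error

print(z_mean, z_sigma)
```

The same two lines of arithmetic work for any non-constant data; only the original `mu` and `sigma` change.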
The moment you do this, look at what happens. You started with
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$.
What you have done, my dear, is take $\frac{x-\mu}{\sigma}$ and call it $z$. Focus.
So now the transition happens: it becomes $f(z)$, where $\mu$ becomes zero and $\sigma$ becomes 1.
With $\sigma = 1$ you get a shorter one: $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$.
Guys, doesn't this look simpler than the one I just showed you?
Your equation gets simpler, you can generate a table out of it, and that table is what you are going to use.
So when I say table, you will be looking at something like this.
Yes, the formula is difficult to remember. Since I am teaching it, I can, but nobody expects you to remember it.
They are telling you: forget about the formula, why don't you learn this table?
This is the z-score table. We need to understand what this table does and how it can help us, and we will come to that shortly.
It is known as the z-score table. I say "zed", so it may sound like something else, but it is the letter z: the z-score table. Okay.
So now, guys, look at the table. Z = 0 is here.
All the z values are listed down the left: 0.1, 0.2, 0.3 and so on.
Along the horizontal line you have another set of values, which are the second decimal place. So you have two lines, a vertical one and a horizontal one.
Say I want the table value for z = 1. The moment I say one, it means the area from 0 to 1.
How do you find 0 to 1? You can clearly see the value here: 0.3413. It is a probability. Don't get me wrong, it is a probability.
Let me take an example so you understand this table. Say I fit a normal distribution curve, with 0 here, 1 here, and minus 1 here.
I am interested in $\mu \pm \sigma$. Here $\mu$ is 0 and $\sigma$ is 1, so basically you are looking for $0 \pm 1$.
The curve is symmetric, so if you find 0 to 1 you can get the whole thing, and this table always gives the area from 0 to z.
For 0 to 1 you look up z = 1 in the chart. What is the value for z = 1, my dear? 0.3413.
Since the curve is symmetric, the area from minus 1 to 0 is the same. It is not in this table; do you see any negative values? No.
If you add the two, do you see a number that is very familiar to you? In case you cannot recall it, let me show you: $\mu \pm \sigma$ covers 68.27% of the data.
Are you getting almost the same? 0.3413 plus 0.3413 is 0.6826, so in percentage it is 68.26% of the data.
Interesting. So this table works, right?
What you calculate from it, and the answer you get, make sense, guys.
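The table value for z = 1 and the familiar 68% figure can be reproduced without a printed table. This is a small sketch using the error function from Python's `math` module; the helper names `phi` and `area_zero_to` are illustrative, not from the lecture.

```python
import math

def phi(z):
    """Area under the standard normal curve from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_zero_to(z):
    """Area from 0 to z, the quantity a 0-to-z table lists."""
    return phi(z) - 0.5

one_side = area_zero_to(1.0)         # 0 to 1
within_one_sigma = 2.0 * one_side    # -1 to 1, by symmetry

print(round(one_side, 4), round(within_one_sigma, 4))
# prints: 0.3413 0.6827
```

Doubling the already rounded table entry gives 0.6826, while doubling the unrounded area gives 0.6827; that rounding is the only difference between the 68.26% and 68.27% figures.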
Why would it be 50%?
Someone had a geometry question, and he was supposed to find this angle.
What he did was draw this extension and assume that this angle would be 60, so the other angle would be 180 minus 60, that is 120.
What is the basis for that? You are only eyeballing the lines. You are not looking at any condition: which angle, what shape of triangle?
You are doing exactly the same thing. Students sometimes make this blunder: they just observe and then answer.
Observation doesn't work all the time. You have to go with the concept.
The concept says: to find 0 to 1 in the z-score table, you go and find the value for 1, because this table gives you the area from 0 to z.
So for 0 to 1, I can see that the value is this one.
Then you might wonder what these other columns are. You should ask that.
Suppose you want to go till 1.5, or let's say 1.05. How will you find 1.05?
You take 1.0 here, my dear, and 0.05 up here, you draw two lines, and the value where they intersect is your answer.
So what is the value, guys? 0.3531. Are you getting how to use this? The top row is for the second decimal place.
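The row-and-column lookup can be mimicked in code: the row is z to one decimal place, the column is the second decimal place, and the cell is the area from 0 to their sum. A minimal sketch, where `lookup` is a hypothetical helper and the printed row is generated rather than copied from the lecture's table:

```python
import math

def area_zero_to(z):
    """Area under the standard normal curve from 0 to z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def lookup(row, col):
    """Table cell for z = row + col, e.g. row 1.0 and column 0.05 give z = 1.05."""
    return round(area_zero_to(row + col), 4)

# Regenerate the z = 1.0 row: columns 0.00, 0.01, ..., 0.09
columns = [c / 100 for c in range(10)]
row_1_0 = [lookup(1.0, c) for c in columns]
print(row_1_0)

value_1_05 = lookup(1.0, 0.05)
print(value_1_05)  # prints: 0.3531
```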
If you are not getting what's going on, don't feel bad: that is normal behavior with the normal distribution.
Normal doesn't seem so normal, right? I'm just trying to lighten the moment.
Why don't we do a question? Let me take this one. The top
5% of applicants, as measured by GRE scores, will receive scholarships.
If GRE scores are normally distributed with mean 500 and standard deviation $\sigma$ = 100 (remember, $\sigma^2$ is the variance, so be smart about which one you are given), how high does your GRE score have to be to qualify for a scholarship?
It is a real question, and we are trying to estimate the answer.
This is for all those guys with the American dream: go to a US university, study there, get a job, and settle there. So they write the GRE exam.
Some of you might be in the US already, so you know it is not that easy. Either you need a lot of money in the bank or you get a scholarship; there are only two options.
So did you understand the question, guys? You have a normal distribution, $\mu$ is 500, $\sigma$ is 100, and they are asking what the GRE score should be to qualify for a scholarship.
You will not be able to do this right now, so let me do it for you.
When you look at this, it is a normal distribution, not a standard normal distribution. Why is it not standard? Still awake, not sleepy?
How do you realize that it is a normal distribution and not a standard normal distribution? Come on, people, let me know.
In a standard normal distribution, is the value of $\mu$ 500? No. Here $\mu$ is non-zero and $\sigma$ is not one, it is 100, so it is a plain normal distribution. Cool.
Now I need to convert it into standard normal. How will I get it? I calculate $z = \frac{x - \mu}{\sigma}$, which is basically $z = \frac{x - 500}{100}$. Right guys, so far with me?
Now here you need a little help from the question itself: you need some information about the threshold, and it is given.
The top 5% of the applicants are going to receive a scholarship. It is like saying that if you are at the 96th or 97th percentile you get the scholarship.
Remember, those who have written the CAT exam or any exam with percentiles: if you are at the 99th percentile, it means 99% of the students are behind you.
Similarly here, this whole region is your 95%, and this small part is your 5%.
Those who are in the pink area are going to get the scholarship. So far with me, guys?
The major chunk is 95%; obviously the majority doesn't get a scholarship.
This 5% cutoff is given by the question itself. I will explain it when I teach you confidence intervals and hypothesis testing, so don't worry about it now.
95% is a probability, that probability gets converted into a z-score, and that z-score is 1.65. For the time being just remember it.
Alpha = 0.05 gives you the cutoff, and the corresponding z-score is 1.65. Take it as a value for now; I will explain later.
It comes from the confidence-interval cutoff, and that is where the magic happens.
I am saying it again and again: forget for the moment how I reached this number. You will see later.
So now you have the z-score cutoff, and anybody above this value gets the scholarship.
So $1.65 = \frac{x - 500}{100}$. If you calculate, $x = 500 + 1.65 \times 100 = 665$.
So anyone scoring more than 665 will get a scholarship. Are you able to follow this question, guys?
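The cutoff calculation can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a sketch that compares the lecture's rounded table value, z = 1.65, with the exact 95th percentile; the variable names are illustrative.

```python
from statistics import NormalDist

# GRE scores as described in the question: mean 500, standard deviation 100
gre = NormalDist(mu=500, sigma=100)

# Lecture's approach: take z = 1.65 from the table and invert z = (x - mu) / sigma
z_cut = 1.65
cutoff_from_table = gre.mean + z_cut * gre.stdev

# Exact 95th percentile, without rounding z to two decimals
cutoff_exact = gre.inv_cdf(0.95)

# Fraction of applicants scoring above the table-based cutoff
share_above = 1.0 - gre.cdf(cutoff_from_table)

print(cutoff_from_table, round(cutoff_exact, 1), round(share_above, 4))
```

The exact percentile comes out slightly below 665 because the unrounded z for 95% is a little under 1.65; for a scholarship cutoff the difference is about half a point.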
So you might be wondering how you know it is "more than".
If this is your cutoff, z = 1.65, then as you go towards the right the score increases.
The question asks how high your GRE score has to be. "How high" here means the minimum you have to maintain to get the scholarship.
They are not asking for some higher cutoff; they are asking for the cutoff itself. If the qualifying mark is 665, then at 665 you get the scholarship. Clear?
Now let me explain how we got that 1.65. Why don't you reverse engineer it?
Where is 1.6 here? It is this row, and 0.05 is this column.
Can we go there and see the value, my dear? It is 0.4505.
Can you multiply this probability by two? How much is that?
About 0.90, that is 90%.
We need 95.
So it is close but not right. What am I missing? We are looking at a different chart.
No, this is not correct. Something is fishy with this chart.
Let me show you another table. Ideally I use this one, because it is better for this.
The problem with the first chart is that it can only be used when we are measuring from zero. Remember, it says 0 to z.
But if you are measuring from minus infinity up to this value, you should use the minus-infinity-to-z table. That is the most appropriate chart here.
Now tell me, where is 1.65? This is the 1.6 row and this is the 0.05 column.
The values closest to 0.95 are here. Now it is making sense, guys.
If you take the higher cutoff, it comes to 0.9505, that is 95.05%.
So there is no concept issue, guys. Don't get worked up. We were just looking at a different table.
There are two formats of the table: one from minus infinity to z and the other from 0 to z.
Which one did we need this time, from zero or from the start? From the start.
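The mismatch between the two table formats is easy to see side by side. This sketch, again using `statistics.NormalDist`, prints what each format lists at z = 1.65; the variable names are illustrative. Note that doubling the 0-to-z entry gives the central area between -z and +z, which is a different quantity from the area below z that the scholarship question needs.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sigma 1

z = 1.65
cumulative = std_normal.cdf(z)   # entry in a minus-infinity-to-z table
zero_to_z = cumulative - 0.5     # entry in a 0-to-z table

# Doubling the 0-to-z entry gives the area between -z and +z (about 90%).
central = 2.0 * zero_to_z

# Adding the left half (0.5) to the 0-to-z entry recovers the cumulative area.
recovered = 0.5 + zero_to_z

print(round(zero_to_z, 4), round(central, 4), round(cumulative, 4))
# prints: 0.4505 0.9011 0.9505
```

So either table can answer the question: read 0.9505 directly from the cumulative table, or add 0.5 to the 0-to-z entry.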
Now guys, let me talk about an important thing, which is, I feel, the first
question I generally ask when moving away from descriptive to inferential:
what is the difference between descriptive and inferential statistics?
Anyone want to take a shot at answering it? "Inferential is conclusive.
Descriptive is unbiased measurement." Okay. But someone has answered in terms
of what we will do with what, right? But what is the key difference?
Descriptive, you know, if you go with the literal meaning of it, is describing
your data. Now whether that data will be population or sample, that is another
discussion, right? But you are summarizing and presenting the data in a
meaningful way to describe it, without making or generalizing any
inferences based on the presented data.
So inferential is conclusive. So if I can give you some pointers guys. Okay.
So descriptive, if I may write, descriptive statistics, I'll just write stats,
you know, involves the summarization and presentation of data in a meaningful
way. It's not like it's good for nothing; it just describes the data, you know.
So what is the primary goal here, guys? The primary goal or objective is to
describe the data, obviously with the help of, you know, measures of central
tendency and dispersion. Leveraging measures of central tendency: what is that,
guys? What are the measures of central tendency? Mean, median, mode, right? You
must have been taught. Measures of dispersion, right? Spread: variance, standard
deviation, right? Measures of shape. What are measures of shape? Okay, I hope you
must have been taught skewness, kurtosis, right, etc.
"Inferential is used when you want to make prediction or generalization about a
population based on a sample." Yes, that's the application of inferential
statistics, right? Why do we need inferential? That's the answer to that
question. We are still understanding what's the difference, what is what.
So guys, definitely all of the answers which you have given are correct, that
descriptive is all about population, inferential is all about sample. But one
thing you have to be very much clear about is that descriptive stats do not make
any inferences. Okay? That's why you have a separate branch altogether. They do
not, you know, draw conclusions or inferences. But it doesn't mean that you will
not find any logical conclusion, right? To RN's point, it does not draw any
conclusion or inferences beyond what is presented in the data. So basically it
means what? Without generalizing, what you see in the data, you tell it, right?
It's an unbiased approach. Then I'll go into inferential.
so this is what descriptive is I'll get into inferential
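The descriptive measures listed above (central tendency, dispersion, shape) can be sketched in a few lines. The marks below are made-up numbers for illustration only, and skewness and kurtosis are computed here from population standardized moments, which is one common definition among several.

```python
import statistics as st

# Hypothetical marks of eight students (illustrative data, not from the lecture).
marks = [35, 42, 42, 50, 55, 61, 68, 90]

# Measures of central tendency.
mean = st.mean(marks)
median = st.median(marks)
mode = st.mode(marks)

# Measures of dispersion (sample versions, dividing by n - 1).
variance = st.variance(marks)
std_dev = st.stdev(marks)

# Measures of shape, from population standardized moments.
n = len(marks)
pop_std = st.pstdev(marks)
skewness = sum(((x - mean) / pop_std) ** 3 for x in marks) / n
excess_kurtosis = sum(((x - mean) / pop_std) ** 4 for x in marks) / n - 3

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}")
print(f"skewness={skewness:.3f}, excess_kurtosis={excess_kurtosis:.3f}")
```

Every number printed here only describes these eight values; nothing is claimed about any larger group, which is exactly the boundary between descriptive and inferential statistics.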
If you remember, in school days, in the English MCB, Main Course Book, those who
have done CBSE would know. I don't know if it's still there or not, but I'm
talking about more than a decade ago. So when I was in class 9, 10, right, we had
a book called MCB, Main Course Book. There they used to give some, you know, five
or six lines in the paper in the exam, and then they used to ask, what is the
inference of these four or five lines? Right? Somebody remembers that? And then
they used to give five options or four options; one would be correct, right? So
obviously you read those four or five lines and try to draw some conclusion or
inference about them. That's what you must have done in English, but it's a
similar kind of situation for statistics as well.
So inferential statistics, as the name suggests, you draw some inferences, right?
The literal meaning of inference is conclusion, some generalization. So I'll keep
talking and writing so that I can save some time. It involves making inferences.
And once you have the inferences with you, you draw some conclusion about a
population based on sample data, and that will be a random sample. There are
numerous ways of doing the random sample.
So what is the goal? The primary goal is to generalize findings or insights from
a sample to a larger population, right? And then you also have some techniques
there. Leveraging, what are the techniques? Generally we use, guys, something as
simple as hypothesis testing, right? So you actually form a hypothesis and you
test it to come to a conclusion, you know, using the confidence interval concept,
right? The CI concept, and you will love the way I'll teach you, I can promise.
As simple as running some tests like the t-test, right?
ANOVA analysis of variance etc okay so this is my introduction to the
difference between these two, guys. Now at least we have come to a point where we
know that, you know, descriptive is all about generalizing the data, sorry,
describing the data. Inferential is all about making inferences or
generalizations about a larger data set from a sample of it, right? That is
clear? If that is clear, guys, then I'll give you a movie recommendation. And
this is just one summary which I have already written; I don't want to write it
again. See, if I have to summarize it in a nutshell, descriptive talks about
describing the visible characteristics of data, which can be a population or a
sample. So it need not always be like, "I only do population." You can also find
descriptive statistics for a sample. Who is stopping you? Meanwhile, inferential
talks about inferences or generalization, look at the green highlighter, about a
larger data set based on a sample of it, and that sample will be random. There
are a lot of sampling methods, stratified random, simple random, a lot of things are
there let me talk about sample versus population okay so whenever we talk
about sampling, you know, sampling is very, very important for any inferential
statistics, right? Any sort of test, it cannot test everybody. Okay, so let's say
you are looking at a population, let's say the country's population. India has
how many people? How many are we, 1.4, or are we now 1.5? What is it, 1.4, 1.5?
What is it? 1.6? Okay, so 160 crores. So let's say I try to measure something for
160 crore people. This will be quite a lot of data, isn't it? Hello? And quite a
lot of you are... if you are referring to "what is India's population in 2023", I
think it is saying 1.43 or 1.44 billion, not 1.6. So yeah, we are close to 1.42,
1.43. Anyway, it doesn't matter for my example. Suppose I ask you to go and talk
to everybody and ask their mathematics marks, right, and give me the mean of
maths marks for Indians who have studied maths, I mean, except those guys who are
in, let's say... and grade-wise: grade one average marks, grade two, grade four.
Do you think it is even possible to do that? I think no. I'm asking you to go and
talk to everybody,
right? Okay, let me take a recent example. Election just concluded, right?
You must have heard about exit polls, one of the famous things that happens in
the media. Exit poll. Anybody has any idea about exit polls? An exit poll is
like, you know, agencies do the polling, and they do random polling, random
sampling of the voters. They ask in a constituency, "Okay, tell us, what do you
think?" They don't go and ask, "Whom did you vote for? Was it party A?" And I'll
not take any party name. So let's say there are three parties, you know, party A,
B, C. "Whom did you vote for?" They don't ask. They have a very amazing set of
questions, which they call a questionnaire, and they go and do this survey. But
do you think it is possible to, let's say, even just... so I was in Indore, so MP,
Indore. Let's say we go to Vijay Nagar. Anybody from Indore, say hi. So let's say
the MP election just concluded. Suppose, this is a nice place in Indore, MP. So
even for the Vijay Nagar population, let's say the population is, you know, maybe
20 lakhs. Okay, hypothetically, I'm just making up some numbers. Is it easy to go
and talk to everybody who is 18 plus, who can vote? We can vote at 18 plus or 21
plus? My GK has gone. I think 18 plus is the voter. So 18 plus guys can vote.
You're not
going to ask all the voters that whom do you vote for, right? It's not possible.
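The idea of asking a random sample instead of every voter can be sketched as a small simulation. Everything here is hypothetical: the 20 lakh population size follows the made-up number above, and the 45/35/20 split between parties A, B and C, the sample size of 1,000 and the helper `share` are assumptions chosen only for illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch gives the same result on every run

# Hypothetical constituency of 2,000,000 voters (20 lakhs), each backing
# party A, B or C. The 45/35/20 split is invented for this example.
parties = ["A", "B", "C"]
population = random.choices(parties, weights=[45, 35, 20], k=2_000_000)

# Simple random sample: 1,000 voters drawn without replacement.
sample = random.sample(population, k=1_000)

def share(votes, party):
    """Fraction of the given votes that went to the given party."""
    return votes.count(party) / len(votes)

for party in parties:
    print(
        f"party {party}: population share = {share(population, party):.3f}, "
        f"sample share = {share(sample, party):.3f}"
    )
```

The sample shares land close to the population shares even though only 1,000 of the 2,000,000 voters were asked. How close is "close" is the question a confidence interval answers.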
So what do these agencies do, guys? Agencies do random sampling of voters. And if
I'm correct, they also pay the ones who participate in it, but they never accept
it, by the way. So agencies do random sampling of voters, they'll do it in Vijay
Nagar, and ask a set of questions to gauge the sentiment, right, and tie it back
to which party these guys voted for. And then what happens? They prepare a nice
summary, and you must have seen something like this. I cannot show you who is
getting what, but I can definitely show you how it looks, right? Have you seen
this? Hello? So can you see, if I take the first number, 120 to 140, let's say
this is party A, party B, party C, party D, whatever. It seems that party A has a
clear majority according to most of the agencies. So coming back to this example,
rather than politicizing who is what in this list, let's focus on our concept. So
what is this 120 to 140? What do you mean by that? So what do these agencies do,
right? Let's understand the process. 120 to 140 is the range they have given,
isn't it? A range that says party A will get a minimum of 120 seats, and at the
maximum they are going to get
140 hello so far with me guys this is what we call it as confidence interval
Interesting? So we have been seeing these exit polls for long. This is known as a
confidence interval, and they give a confidence score also: "We feel there is a
95% chance that party A will get these many seats." This is known as a confidence
interval estimate. It's an estimate, right, because you are not talking to
everybody. You are talking to a sample of people. You are asking those questions,
and you are coming to a conclusion that, okay, party A will get these many seats.
Don't forget that you did some sampling here, right? This is how sampling looks:
you go to a constituency, you pick people and ask questions. This is your sample,
right? And then you come back to generalize something about that population.
Point estimate versus confidence interval estimate: a point estimate would be
like, "Okay, I will say party A will get 130 seats. Period." You cannot exactly
predict 130. Astrologers do that. "I think party A will get 130 seats." It is a
point estimate: one value, a single value. That is like, you know, getting the
mean. Generally people say this will be the mean of the sample, and then I am
saying that this will become the population mean. So the mean of the population
is what I'm predicting. So on average I feel the party will get 130. But tell me,
getting 130 exactly correct is tough or not? Tell me. If the party gets 132,
you're wrong. If the party gets 128, you are wrong. So with a point estimate you
cannot, you know, put a confidence on it. So that's why statistically we
generally go with the confidence interval concept. Now if I'm getting 132 or 128,
still both the options are going to fall
in here. Hello and your analysis is still correct. That happens with
clinical trials, you know, for vaccines as well, when Covid hit us. So a lot of
vaccine trials would have happened, and I'll take that example. I talked about
that case study, vaccine trials, right? Vaccine trials are not going to be 100%
accurate, and when, you know, a pandemic happens, right, a mahamari, it's very
tough because you are running against the clock, right? You need to get a vaccine
in the market. So you cannot be waiting for 99.99% accuracy, right? You will go
with some confidence interval. Generally there they use 95, because it's related
to health. You cannot administer something to someone who has some comorbidities;
they might die, right? Rather than dying of the virus, they die because of your
medicine. So you have to be very careful. But they do all those clinical trials
to come to the range of efficacy. So that is the efficacy of the medicine, how
effective it is, you know; it's a score. So if that score comes within a
threshold, you go and release the vaccine in the market, hello, and then you put
it into batch production. So that's the difference between the point estimate and
the confidence interval range. I'll deliberate more on it. But a point estimate
does not provide room for error. There's no margin of error. 130 means 130. But a
confidence interval does
provide margin of error. You can make mistake and that you can also set it up.
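The difference between a point estimate and a confidence interval, and the way the margin of error changes with the confidence level you set, can be sketched as follows. The survey numbers (430 of 1,000 sampled voters backing party A) and the function name `proportion_ci` are hypothetical, and the interval is the simple normal-approximation interval for a proportion of vote share, not the method any polling agency actually uses to project seats.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level):
    """Normal-approximation confidence interval for a sample proportion.

    Returns (point_estimate, margin_of_error, (low, high)).
    """
    p_hat = successes / n                              # point estimate: one single value
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)      # critical z for the chosen level
    margin = z * sqrt(p_hat * (1 - p_hat) / n)         # margin of error around p_hat
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Hypothetical survey: 430 of 1,000 sampled voters say they back party A.
for level in (0.90, 0.95, 0.99):
    p_hat, margin, (low, high) = proportion_ci(430, 1000, level)
    print(
        f"{level:.0%} confidence: point estimate {p_hat:.3f}, "
        f"margin ±{margin:.3f}, interval {low:.3f} to {high:.3f}"
    )
```

The point estimate stays at 0.43 in every row, while the interval gets wider as the confidence level goes up. That is the trade-off when setting the level: more confidence costs a wider, less informative range.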
Suppose you ask for a much higher confidence level; then your range will be like
100 to 140 or 100 to 160. Are you following, guys? Then you will say, "What is
this? You cannot have this much of a range, right? 100 to 160?" And then, you
know, the majority is like only 60; let's say the majority to form a government
is 70, and you are putting a range of 100 to 200. The error cannot be as big as
the majority, right? Are you getting my point, what I'm saying? Let's say to form
the government the majority is 70, and you are putting a range where the gap is
100. So that's not a good study. So that's where the question comes of how wide
the range should be and what percentage we should keep: 95, 99, 90. Generally
these are the confidence levels: 90, 95, 99. In any exit poll they follow a very
good set of approaches to come to a number, which is your confidence interval.
That's what they show on the TV, which you are seeing, right? And they also claim
that "my analysis depends on blah blah blah", right? That's why, if you see, some
of the guys are like rockstars. Can you see these guys? "I am telling you, you'll
get exactly this." This is like ED; they must be new. So definitely these are the
top players, right? C-Voter I have been seeing since my childhood days. But this
is like new nation whatever. Can you see that they are giving the point estimate?
Right? And there we
understood that there can be two types of estimate. One is point estimate and
the other is confidence interval estimate. So we are going to extend this
conversation further. Okay. I think there was one question, right, this family
income one: if the poverty level is this, what percent of the population lives in
poverty? So here mu is what? $25,000. Sigma is 10,000. So zed is what? (X - mu) by
sigma. If the poverty level is this, which is 10,000, then it is 10,000 minus
25,000 by sigma, 10,000. So that will be -15,000 by 10,000; you'll get -1.5,
right? That's what you are getting. Now you're looking for a negative table. So
try to understand this. Let's say this is
your normal distribution right standardized normal distribution
And let's say, after doing this, this is zero. Let's say this is your 1.5 Z score
and this is minus 1.5. Definitely, if you keep going on the left-hand side, these
are the guys who are going to be below the poverty level, right? It is as good as
saying that these are the guys, because it is symmetric. So you need the area
from minus infinity to minus 1.5. What can you do? You can find the area from 0
to 1.5. You find this area, that will give you the probability, and subtract it
from 0.5. So you have to use this table, my dear. So, where is 1.5? It is 0.4332,
right? So then how much is that? 0 to 1.5 is how much? 0.4332. Okay. And 0.5 minus
0.4332 is how much? I can write 0.5 - 0.4332 equal to 0.0668. Into 100%, how much
is that? 6.68%. Now if you're wondering what this is, it's very simple. What I
have done: I have calculated this area from the table, the Z score table. It
comes to what? 0.4332. This area is going to be the same as this area, right?
It's symmetric. So what I've done: I have taken the whole area, which is from
here to here, and that is 0.5, you remember? I've taken 0.5 and subtracted this
part, and I have gotten this part. Clear? It was an easy question. The only
complication was how to calculate the area using the same table right you
could not have used this table because you don't have minus infinity... I think
this is a typo from my side. You're right, it is 0 to zed, right? 0 to zed. It's
not from minus... no, actually it is minus infinity. Sorry, my bad. It is from
minus infinity, because if you see here, it is 0.5. So minus infinity to 0 is 0.5
only. So I wrote it correctly. No, there's no typo from my side. Sorry, I got
confused. But this is from minus infinity to zed. But you don't have minus 1.5
anywhere here. So what can you do? You can go till the last, like others are
doing. If you go till the last, let's say this is 1.5; subtract the value at 1.5
from 1 and you will get the same answer. I think Benor has done a similar
approach, right, Ben Wen? So there
are multiple ways to reach to the same answer let me show you 9332
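
The table lookups for the area below z = −1.5 can be cross-checked in a few lines of Python. This is only a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative and not from the lecture.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Route 1: ask for the left-tail area below -1.5 directly.
direct = std_normal.cdf(-1.5)

# Route 2: a "0 to z" table gives the area between 0 and 1.5;
# by symmetry the left tail is 0.5 minus that area.
area_0_to_z = std_normal.cdf(1.5) - 0.5
via_zero_table = 0.5 - area_0_to_z

# Route 3: a "minus infinity to z" table gives the area up to +1.5;
# by symmetry the tail below -1.5 is 1 minus that area.
via_cumulative_table = 1.0 - std_normal.cdf(1.5)

print(round(area_0_to_z, 4))           # 0.4332
print(round(std_normal.cdf(1.5), 4))   # 0.9332
print(round(direct, 4))                # 0.0668, i.e. about 6.68%
```

All three routes agree, which is the point made in the session: the table layout changes the lookup, not the answer.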
See, it comes out the same. Now I'll move on. I was talking about that exit poll example, right? So just to extend it a little more. For the exit poll, a point estimate looks like this. Let's say there are 300 seats. I'll take my own example. You have three parties: party A, party B, and party C. Let's say you are saying party A will get 180 seats, party B will get 80 seats, and party C will get 40. That totals 300, if you do the maths. This is what we call a point estimate: you are giving a single value and saying that exactly this will happen. So there we don't have any margin of error, right?
On the contrary, if you go to the other side of the story, so this is the point estimate, let's talk about the confidence interval. What we do is add a margin of error to the point estimate to develop a confidence interval, and then we have different levels of confidence: 90%, 95%, 99%, whichever one you choose. For example, if I look at party A, I'll say party A will get 170 to 190 seats. So let me rewrite this for party A, party B, and party C. Let's say party A would get 170 to 190 seats. What have you done, my dear? You have added a margin of error. Obviously the average is still 180, but you are not saying 180, you are saying 170 to 190, which means that you are going with a certain alpha. Let's say the confidence is 95%. I already explained that 95% means alpha is 0.05, you have to take the opposite, and alpha is known as the level of significance. Similarly, for party B you will say 70 to 90, and for party C 35 to 45, something like this, right? This is known as an interval estimate, a confidence interval estimate. You have a range here.
So I have a very good example to give you the definition of a confidence interval. I always show it, and you will love it. Can you see that? There are two images. I have already kept the concept and notes here, but look at the first one, the point estimate. Let's say you go fishing and you're using something like a spear to fish, right? You think, oh, I'll just hit the fish with this. What is the margin of error in this? There is no margin for error, right? That means the chances that you miss are very high. Guys, you agree with me? Then there is another image on the right-hand side that talks about the confidence interval: you have a fishing net. Tell me, which option gives you a better chance of getting a fish? Obviously the net, the confidence interval, because it covers a range. Your chances are very high compared with trying to hit exactly that one fish.
So this is a very cool image to understand that a confidence interval refers to the probability that a population parameter will fall between a set of values, right? That set of values is your range, your interval, and the parameter will fall within it a fixed or certain proportion of times. So when you say 95% confidence interval, what do you mean by that? It means that if you repeat this activity of exit polling, 95% of the time the result will fall in the range you gave, and 5% of the time it will not. So there will still be some error, there will be mistakes, but you will be within the range 95% of the time and outside it 5% of the time, which means your exit polling has gone wrong. Are you getting it? So that's how you define a confidence interval.
So see, this is a normal distribution curve, and this is a 95% confidence interval, by which I mean alpha is 0.05. Look at this again and you can clearly see that if alpha is 0.05 and it is two-tailed, we have divided it into alpha by two. One part is for the left tail, right? If you do alpha by two, how much is that, guys? 0.025; in percentage it will be 2.5%. In the right tail you also have alpha by two, so 0.025, equal to 2.5%. The total is 5%. Now look at the limits: this is your upper limit and this is your lower limit. What we are saying is that we are trying to estimate the population mean, and there is a 95% chance, listen to me, a 95% chance that the population mean will fall between these lower and upper limits. This is your confidence interval, right? Just like in the example I mentioned: let's say your estimate is 180 and I give a range of 170 to 190; I am saying that 95% of the time the true value will lie within this range. Clear?
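
The "95% of the time" reading can be illustrated with a small simulation: draw many samples from a population whose mean is known, build an interval of the sample mean plus or minus 1.96 standard errors for each sample, and count how often the interval contains the true mean. The population values, sample size, and seed below are assumptions chosen for illustration, not figures from the lecture.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 180.0   # hypothetical population mean
TRUE_SD = 20.0      # hypothetical population standard deviation
SAMPLE_SIZE = 50
TRIALS = 2000
Z_95 = 1.96         # z-score for a 95% confidence level


def interval_covers_truth() -> bool:
    """Draw one sample, build its 95% interval, report whether it contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    centre = mean(sample)
    margin = Z_95 * stdev(sample) / SAMPLE_SIZE ** 0.5
    return centre - margin <= TRUE_MEAN <= centre + margin


coverage = sum(interval_covers_truth() for _ in range(TRIALS)) / TRIALS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95. The remaining few percent are the samples whose interval misses, which is the "exit poll gone wrong" case.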
Don't worry, I have given you the formula, derivation, everything. But did you understand the concept? The question is: in which cases is the point estimate used then, since the CI has a broader approach? Whenever you are interested in getting a single value and you are not interested in giving a confidence interval, you can go and use it. For example, let's say you want to estimate the average height of a population. In this case the point estimate can be the sample mean height, right? You take the sample, you find the average, and that will be your point estimate for the population. Okay.
And that's exactly what was done in the exit polling also, remember? You took a sample and worked out the number of seats from it. That is a point estimate. So it's not that the point estimate is not being used. You are using the point estimate as the first step, and it becomes the base for your confidence interval. Let's say I have a sample mean height, some value x̄, and now you want to represent this as a confidence interval. You want to say, no, I don't want to state just the average height, I would like to say I am 95% confident that the true population height is between the lower limit and the upper limit. So you go from a point estimate to a confidence interval when you want to attach a level of certainty to the value. If you are not going to give that certainty, and you just want to estimate the value and state it, you can use the sample mean, and that's where a point estimate can be used. You getting my point? It's not A versus B. It's A, and then how we can add a layer of certainty to A, and that becomes your confidence interval. A point estimate is a single value you are estimating; a confidence interval is for when you want to give a range of values and show that you are 95% sure the value will fall in that range. So now why don't we get into some questions on how to calculate a confidence interval?
See, I'm telling you the mathematics behind the scenes as well. Now guys, always remember that you have a z-score for each confidence level. Okay, confidence level is different from confidence interval. Always remember that. So this is your confidence level, which is as good as saying 1 minus alpha, right? So a 95% chance is 0.95, and the z-score is 1.96. You can see this in your table, right? For 1.96, go to the row for 1.9 and find the column for 0.06. So you come here. See? What value are you getting, guys? 0.9750, right? You are here, and this is your 2.5%. Do the maths: if you subtract 0.9750 from 1, you get 0.025, which is 2.5%, and similarly you have the other 2.5% on the other side. So are you getting it? 1.96 means a 95% chance, right? These are the values you should always remember. They should be at your fingertips.
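
The z-scores for the common confidence levels can also be recovered without a printed table. The sketch below assumes a two-tailed interval, so alpha is split equally between the tails; `z_for_confidence` is an illustrative helper name, and `NormalDist.inv_cdf` comes from Python's standard library.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-tailed critical z: leaves alpha/2 in each tail of the standard normal."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")

# The table lookup from the lecture, in reverse: area to the left of 1.96.
print(round(NormalDist().cdf(1.96), 4))  # 0.975
```

For 90%, 95%, and 99% this gives roughly 1.645, 1.960, and 2.576, the values worth memorising.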
So once you have the confidence level and the z-score, the formula becomes, to be honest, very easy. Let me give you the formula: the confidence interval CI is equal to x̄ ± z · s/√n, where x̄ is the sample mean and z is the chosen z value, right? And s is what, my dear? s divided by √n is your standard error. Standard error, not standard deviation; I'll come to that. And n is the sample size. Clear? Now you might be interested in the upper limit and the lower limit, so you can calculate them. The upper limit of the confidence interval is x̄ + z · s/√n, and the lower limit of the confidence interval is x̄ − z · s/√n.
So now guys, this is known as the standard error of the mean. Okay, let's talk a little bit more about the standard error of the mean. What is the meaning of it? Let me write one line and you'll understand. It is again a standard deviation, but not a typical standard deviation. You always have to say standard deviation of what. So this is a measure of the variability associated with the sample mean when it is used to estimate the population mean.
Suppose you have a population, right? This is your population, and you have a sample, this is your sample, and you can have multiple samples, right? Every time you will get a sample mean: sample one, sample two. So far with me, guys? If you do that random sampling, the population mean is not going to change, right? Let's say this is your mu, mu population, mu pop. Every time there will be a delta between mu pop and what you have estimated. Let's say mu pop minus x̄1 is a, mu pop minus x̄2 is b, mu pop minus x̄3 is c, and so on. If you find the standard deviation of these deltas a, b, c, that becomes your standard error. It is the measure of variability associated with the difference when you're trying to estimate. Okay, that becomes your standard error. Are we clear, guys?
And I'm going to give the formula. See, right? Correct, guys. So you can see what I was saying, right? x_i is each of your different sample means, and what are we trying to do? We are finding x1 − mu, x2 − mu, x3 − mu. No, this is a different formula. As per this formula we are not actually subtracting from the sample mean. I'm also just contemplating that formula: sigma divided by √n. Yeah, so see, that's correct only. You are right. So n is the sample size, and this whole thing I have written is your standard error. So this s, okay, let me update this, because otherwise you will get confused. You're right: this whole thing, s/√n, is your standard error. Okay. Guys, I think this other one is something I need to validate, but s/√n is the formula of the standard error of the mean; there is no doubt about it. I'll validate the rest; for the time being, this formula can be a little tricky from time to time. And what is N in this formula, by the way, guys? N is the size of the population, whereas the n in s/√n is the sample size. The rest of the terms you know, right? But I'll get back to that; I need to validate it. All the resources will tell you that s/√n is the standard error of the mean. Okay. Sometimes, you know, the portals are not that great. They also make slight mistakes in the formula, not in the concept. But this is how we go about it.
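
The two descriptions of the standard error, "the standard deviation of the sample means" and "sigma divided by root n", can be compared with a quick simulation. The population parameters, sample size, and seed below are assumptions for illustration only; the sketch draws many samples, records each sample mean, and compares the spread of those means with the formula.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN = 100.0    # hypothetical population mean (mu)
POP_SD = 15.0       # hypothetical population standard deviation (sigma)
SAMPLE_SIZE = 25    # n, the size of each sample
NUM_SAMPLES = 5000  # how many samples we draw

# One sample mean per sample: x̄1, x̄2, x̄3, ...
sample_means = [
    mean([random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

# Spread of the sample means around their own centre.
empirical_se = pstdev(sample_means)

# The formula from the lecture: sigma / sqrt(n).
theoretical_se = POP_SD / SAMPLE_SIZE ** 0.5

print(f"standard deviation of sample means: {empirical_se:.3f}")
print(f"sigma / sqrt(n):                    {theoretical_se:.3f}")
```

The two numbers should be close (the formula gives exactly 3.0 here), which is why s/√n is used as the standard error when only one sample is available.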
For the time being, guys, let's go into a question so that you have a certain idea of how we calculate a confidence interval. Okay. Rather than a question, let me give you an example. Suppose I say the sample mean is 86, z is 1.96, and the standard error of the mean is 6.2 by √46, right? 46 is your sample size. How will you do it? This is your sample mean. x̄ and mu are not the same. Mu is for the population mean and x̄ is for the sample mean. They're not the same. Calculate the margin. Give me the upper limit, give me the lower limit, and tell me the answer. Do the maths.
By the way, guys, tell me one thing. This whole thing is the standard error of the mean, but what is this s? This is your sample standard deviation, right? And the moment you divide it by √n, it becomes the standard error of the mean. So that's the difference. Okay, come here. So what is it? 86 plus 1.96 times 6.2 by √46. Why are you guys thinking so much? Come on, someone can give me this, or I can write it myself: 86 plus 1.96 times 6.2, and guys, this will be over root 46. I can do it in OneNote, but has anybody solved it? I have the answer for the upper limit as 87.79. Has anybody gotten this? This is what you will get. For the lower limit you have to do the same thing with a minus, and it should be 84.21.
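
The arithmetic of this example can be checked with a short function. This is a minimal sketch of x̄ ± z · s/√n; the function and parameter names are illustrative, and the inputs are the numbers quoted in the example (sample mean 86, z = 1.96, s = 6.2, n = 46).

```python
from math import sqrt


def confidence_interval(sample_mean: float, z: float, sample_sd: float, n: int):
    """Return (lower, upper) for sample_mean ± z * sample_sd / sqrt(n)."""
    margin = z * sample_sd / sqrt(n)  # z times the standard error of the mean
    return sample_mean - margin, sample_mean + margin


lower, upper = confidence_interval(sample_mean=86, z=1.96, sample_sd=6.2, n=46)
print(round(lower, 2), round(upper, 2))  # 84.21 87.79
```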
So if you look at this formula, guys, x̄ ± z · s/√n, this is your confidence interval. If you deliberate a little more, you'll realize that as the sample size increases, the confidence interval becomes narrower. It's obvious, right? It's a general rule of thumb: it should become narrower, and vice versa, because n is in the denominator and you have more data points to get a more exact answer. So your confidence interval can be narrower and you can still achieve better accuracy.
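
How quickly the interval narrows follows from the √n in the denominator: quadrupling the sample size halves the width. The sketch below reuses z = 1.96 and s = 6.2 from the earlier example and holds s fixed while n grows, which is an assumption made purely for illustration.

```python
from math import sqrt

Z_95 = 1.96      # z-score for 95% confidence
SAMPLE_SD = 6.2  # sample standard deviation, held fixed for illustration


def interval_width(n: int) -> float:
    """Full width of the interval: twice the margin of error z * s / sqrt(n)."""
    return 2 * Z_95 * SAMPLE_SD / sqrt(n)


widths = {n: interval_width(n) for n in (46, 184, 736)}
for n, width in widths.items():
    print(f"n = {n:4d} -> width = {width:.3f}")
```

Each step multiplies n by four and the printed width drops by half each time.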
Let me give a question here. Do it, guys. In a sample of 40 light bulbs, the mean lifetime is 5,000 hours and the standard deviation is 400 hours. Compute a 90% confidence interval for the average lifetime of the bulbs. Get to it. I can also help you. See, whenever you're working on any confidence interval, there is generally the concept of the central limit theorem, which says that the distribution of the sample mean is approximately normal when the sample size n is large, and the usual rule of thumb is n greater than 30. In that case we can use the z-score. For 90%, how much is it? I said 1.676? Look at that table: 90% is 1.645. Sorry, the z-score is 1.645. If n is not more than 30, then you will not be able to use this; there are some rules.
Now, you have the z-score, you have the sample size n = 40, you have the mean lifetime, how much? 5,000. The sample standard deviation is how much? 400. So what is going to be your confidence interval? It will be x̄ ± z-score times, listen to me guys, s/√n, right? Someone should have done it by now. How much is that? 1.645 times 400 by √40. What is the range coming out to be? How much is the upper limit and how much is the lower limit? So is 4895 the upper limit or the lower limit? Which one is it? The lower, I'm getting... no, I think I made some calculation mistake. What is the square root of 40? It works here, wow, this is good. Okay, so 400 / √40, I can use it. Then I multiply 1.645 by this, right? And I also add 5,000. Guys, this is my OneNote equation, and I'm getting 5104. Okay, I'll copy and paste it and just put a minus here, and there is your 4896. I think you guys are right, you know. So apparently the answer key I was looking at is wrong. It is 5104 and 4896, right, guys? So the upper limit is 5104 and the lower limit is 4896. That's why I don't trust these questions' answer keys; you have to be careful. I'm only interested in the question. But this is the range, guys. If you're getting it, whatever you have done is correct.
Okay, I think it's a good place to stop. But guys, let me take quick feedback: did you understand the point estimate and the confidence interval? I think you cannot say no, right?
So yeah, this is how statisticians generally come up with this kind of approach to estimate population characteristics with the help of a sample. You can clearly observe that in a question like this you are not looking at all the bulbs.
Let's interpret it now. You're not looking at all the bulbs; you're only looking at 40 bulbs, taking the sample mean of those bulbs and the standard deviation of those bulbs' hours. And then you are coming with the confidence of saying: I am 90% confident that the average lifetime of all the bulbs in the population is between 4896 and 5104 hours. Interesting, right?
So you are looking at this batch sample and then generalizing it to the whole population in that batch. In manufacturing you use confidence intervals a lot; the quality assurance people, the quality engineers, use them a lot. So if you are not finding a lot of issues and you feel that the average lifetime is x, you can generalize it to the population, using this sample mean and putting a buffer around it, which is your margin of error. This is your margin of error. Now guys, in case you forgot: this is your margin of error.
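The bulb calculation above can be checked with a few lines of Python. This is only a minimal sketch, assuming the numbers quoted in the session (n = 40, sample mean 5,000 hours, sample standard deviation 400 hours, 90% confidence); the function name z_confidence_interval is made up for illustration and is not something from the course notes.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sample_sd, n, confidence):
    """Return (lower, upper, margin) for a z-based interval around the mean.

    Illustrative helper (hypothetical name): x_bar +/- z * s / sqrt(n).
    """
    # Two-sided critical value; for 90% confidence this is about 1.645.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error: the "buffer" put around the sample mean.
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Values from the bulb example: 40 bulbs, mean 5,000 hours, s = 400 hours.
lower, upper, margin = z_confidence_interval(5000, 400, 40, 0.90)
print(f"margin of error = {margin:.1f} hours")
print(f"90% CI for the mean lifetime = ({lower:.0f}, {upper:.0f}) hours")
```

Note that the divisor is the square root of 40, not 40 itself, which is the slip that caused the confusion during the live calculation.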
Right. So let me ask one question; I'll talk about the next very important topic later on.
Guys, you remember I said the sample size n should be greater than 30, and only then can we use the z-score. But nobody asked me: what if the sample size is less than 30? If the sample size is less than 30, you will be using the t-distribution, and you will be using a t-score. Hello?
I'm giving this as a note because I have not talked about the t-distribution or the t-test yet. Please keep it with you. You don't need to write it; it's going to be in the notes.
So I'll write a note: let's say we have a small sample, say n is less than 30; then the t-distribution is used instead of the z-distribution (the normal distribution) to construct the confidence interval. Okay guys, clear? I'm going to highlight it in pink or purple.
When n is less than 30, your formula will change. Your formula will look like this, my dear: x̄ ± t × s/√n. Here you bring in the t-score, into the same thing, s by root n. Nothing else changes. The only thing changing is what, my dear? The z-score is replaced with the t-score.
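To see what swapping the z-score for a t-score does in practice, here is a small sketch. The sample (n = 10, mean 5,000, s = 400) is hypothetical and not from the session, and the two critical values are simply read from standard tables for a two-sided 90% interval: about 1.833 for t with 9 degrees of freedom and 1.645 for z.

```python
from math import sqrt

# Critical values taken from standard tables (two-sided 90% interval).
# The t value is for n - 1 = 9 degrees of freedom.
T_CRIT_90_DF9 = 1.833
Z_CRIT_90 = 1.645


def margin_of_error(critical_value, sample_sd, n):
    """critical value * s / sqrt(n); only the critical value differs."""
    return critical_value * sample_sd / sqrt(n)


# Hypothetical small sample, chosen only for illustration.
n, sample_mean, sample_sd = 10, 5000, 400

t_margin = margin_of_error(T_CRIT_90_DF9, sample_sd, n)
z_margin = margin_of_error(Z_CRIT_90, sample_sd, n)

print(f"t-based 90% CI: ({sample_mean - t_margin:.0f}, {sample_mean + t_margin:.0f})")
print(f"z-based 90% CI: ({sample_mean - z_margin:.0f}, {sample_mean + z_margin:.0f})")
```

The t-based interval comes out wider, which reflects the extra uncertainty of estimating the standard deviation from a small sample.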
Clear? But I'll get back to that. Now I'm more interested in hypothesis testing, a very interesting topic. In India, people are always ready with their own hypotheses.
So here is a simple hypothesis; I always take this example. A man goes to a trial where he is being tried for the murder of his wife. This is a very classical example. I know discussing this example is not a great idea, but I'm bound to take it.
What do you guys think? Ladies, let me know. Ideally, when I say this, without getting into anything, 95% will say, "Okay, definitely he would have done it." The moment you see "murder of his wife", living in today's society, we believe that this is 99% correct.
But do you think the judiciary works on this principle? Do you think the judge in the court will behave like you? Never, ever. There's a rule in the law. And obviously, just like we Indians are opinionated, I don't have any real idea about law and order; I'm just reading some lines and telling you how cases proceed, not only in India but across the globe. The judge will only work on the documents, and how and what the administration presents is going to decide the fate of the case. That's how reality is. Okay? There's no emotion attached to it.
But if you go and say this statement in society, the moment something happens, 100% of people will start believing that this man, blah blah blah, has 100% murdered his wife. But this is a hypothesis.
You need to test it. So if you want to test it, you have to form two sets of hypotheses, and you must have heard that in hypothesis testing there is one known as the null hypothesis. So you have to first frame the null hypothesis.
Now you tell me, guys, what should be the null hypothesis, H0, in this case? In this case the null hypothesis is what? Not guilty. The court will start this discussion with the guy pleading not guilty.
Have you seen it in the movies? I think we learn a lot from movies, but only a few things are correct. Just to save the court's time, the judge will ask the person: do you accept that you are guilty or not? There's a very typical language for it, and the lawyer will tell you how to respond. If the person himself accepts that he is guilty, then there is nothing to be done; obviously he will just be punished for it. Agree? So the first thing they ask the person is: are you guilty or not? Do you accept that you have done it? If you say you have not done it, then the court will take that as the null hypothesis.
The null hypothesis means the natural state. Null means no change. Always remember that; someone can ask this in an interview. Null means no change.
Tell me, do you really think that every man in the world wakes up every day and plans to kill his wife? I don't think so. What is rare is going to be your alternate hypothesis. What is very common in society is going to be your null hypothesis.
So what is common in this world? You sleep during the night; that's common. I know some of us are night owls who don't sleep during the night; we are up at 2:00 a.m. and then sleep around 6:00 a.m. in the morning. I'm not talking about the exception. I'm talking about what is natural. What is natural in this world? You sleep during the night and you wake up in the morning. Guys, hello? You agree that's your normal.
Similarly, it is also normal that a man is not going to kill his wife. Right? This is still normal. I know it's hard to believe, but otherwise women would not get married, right? There's no point if you know you are going to die. So definitely this is normal.
That's why the hypothesis should be very carefully framed. The null hypothesis is "not guilty" because of this normal, natural phenomenon that nobody wants to kill anybody. Trust me, this is the philosophy, and building hypotheses is a lot of philosophy.
Then, the alternate hypothesis. The alternate will be what? There is some change. Change in what? Change in the normal behavior, right? And that will be HA. Sometimes people also write H1; both notations are there, and it looks a little weird. HA or H1, and here it is going to be "guilty". Hello?
And now you will start with the data to test this assumption, guilty or not guilty, and you're going to take the whole data-driven approach to come to a conclusion. We always start by assuming the null hypothesis is true. Only if the evidence against it is strong enough does the alternate win. Clear? So you can lose the case while trying to win it, right?
Have you seen this also with a defendant? If you are getting convicted of something... not convicted, sorry, if you're getting tried for something, whether you have done it or not, you will still get a defense, right? You are going to be defended. Victim, defendant, I don't know exactly what the words are, but defendant means the person who is getting blamed, and that person needs to be defended. So naturally even you can feel that you still need to defend that person, because killing a person is not natural.
So the null hypothesis is the starting position: the person is not guilty. Only if the evidence is strong enough to reject it will he be convicted and get the punishment. Simple.
So see, we were discussing hypotheses, right? The null hypothesis means no change, no impact of the treatment, nothing has been done. Say you're doing a test in a pharmaceutical company with a control group and a test group: what is the effect of the treatment on your patients? Null means no effect. That's where your normal comes from; that's your null hypothesis.
Now let me get into a little more of the detail. You should know that a hypothesis is basically an assumption. Always remember that. Let me give it to you as a note, else you will forget. My screen is up, right?
A hypothesis is basically an assumption that we make about a population parameter. It's an assumption that this will happen or that will happen; until you have completed the testing, it's still an assumption. Okay?
If I continue: hypothesis testing evaluates two mutually exclusive statements. Correct, guys? H0 and H1. What is the meaning of mutually exclusive? It means if H0 is correct, H1 will be false. Right, guys? You are going to evaluate two mutually exclusive statements about the population to determine which statement is best supported by the sample data. Okay.
Now, how does hypothesis testing work? I want to show you, and to show that I have to use an image. Let me show you this. This is very, very important, guys.
Again, you're going to fit a normal distribution, right? You can see that this is two-sided. By the way, can you see that the two tails are red? This is two-sided. Suppose you say alpha equals 0.05. That's your level of significance. In another 5 to 10 minutes I'll explain it in full detail. But tell me, for alpha 0.05, alpha by 2 is what? I have just calculated it: 0.025. In terms of percentage it is 2.5% and 2.5%, guys, correct? For a 95% confidence interval, you have alpha by 2 equal to 2.5%.
So guys, if you look at this, I'll start shading. This is your happy region. That's your confidence interval. This green shade, the bigger area, is your confidence interval region. You can call it your happy region or CI region, where you will say whether the thing you're trying to test is going to fall in this place or not.
Now, what happens whenever you're trying to form a hypothesis? I'm giving one example of a hypothesis: the average height of the sample is equal to the national average height, which is your population height. Right, guys? Come on. You're comparing the sample height with the national average height, which is your population height.
Let's say the data you're looking at is all adult males between age 18 and 25, something like that, or 20 to 25; you can decide the test. So you are saying that if you take a sample out of this population for the average height of adult males, we believe ideally it will be the same. That's why it is part of which hypothesis? H0.
Listen to me. You also have an alpha. Always remember that you have an alpha, which is your level of significance. Remember that in this study I have fixed it to 0.05.
Now there are certain rules. If your p-value, which is denoted as p, is less than alpha (listen to me, less than alpha; sometimes people also write less than or equal to alpha), you are going to reject the null hypothesis. Listen to me. Don't worry about how or why; I'll explain and you will understand. If your p-value is greater than alpha, you fail to reject the null hypothesis; loosely, people say you "accept" it.
Now the question comes, and I'm going there only: what is the p-value? Let's answer that. P stands for probability. The p-value is the probability of getting a result at least as extreme as the one in your sample, assuming the null hypothesis is true.
If the p-value is less than or equal to the significance level alpha (let me highlight it), then the data is inconsistent with the null hypothesis, and it needs to be rejected. I think that's the same thing I've written here, guys: if p is less than or equal to alpha, you are going to reject the null hypothesis.
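The p-value rule can be tried out on the height example with a short sketch. Everything numeric here is hypothetical and chosen only for illustration (a national average of 170 cm, a known population standard deviation of 7 cm, a sample of 50 adults), and the function name two_sided_z_test is made up. It assumes a two-sided z-test, which needs a known population standard deviation and a reasonably large sample.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Return (z, p_value, reject_null) for H0: mean == pop_mean.

    Illustrative helper (hypothetical name), two-sided test.
    """
    # How many standard errors the sample mean is from the H0 value.
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # p-value: probability of a result at least this extreme, in either
    # tail, assuming the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value <= alpha


# Hypothetical samples against a hypothetical national average of 170 cm.
z_far, p_far, reject_far = two_sided_z_test(172.5, 170.0, 7.0, 50)
z_near, p_near, reject_near = two_sided_z_test(170.8, 170.0, 7.0, 50)

print(f"sample mean 172.5: z = {z_far:.2f}, p = {p_far:.4f}, reject H0: {reject_far}")
print(f"sample mean 170.8: z = {z_near:.2f}, p = {p_near:.4f}, reject H0: {reject_near}")
```

The first sample lands far out in the tail, so its p-value falls below alpha and H0 is rejected; the second sits close to the centre, so we fail to reject H0.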
This is a little tricky, and Anthony, everyone, please hold on to your questions. Let me explain; I'm not done. To explain this, I need to show you one cool example with the help of this chart. I think you are going to understand it like this.
Tell me one thing. I'm talking about two scenarios. Scenario number one: a dog bites a man. You must have seen some news or videos where pet dogs in the lift are doing all kinds of nonsense things. Have you seen it, guys? A dog bites a man. Do you think it's going to be here, or in the tail? What do you say? There are two options on the plate: a most likely observation or a very unlikely observation. Is it going to be in the tail, or is it going to be at the peak?
Why would it be in the tail? A dog biting a man is a natural phenomenon. It will always come somewhere as a most likely observation. In India, every day a dog is biting somebody, right? It's a common phenomenon.
The second phenomenon I'll give you is a man bites a dog. The moment a man bites a dog, it becomes breaking news. Is a man biting a dog likely? What do you think? A man biting a dog is very unlikely. It's like some psychotic behavior, right? That's how we put it. I don't see this being a common phenomenon. You guys agree or not? Hello? So it will come somewhere in the tail.
Now, you tell me: a dog biting a man is like the null hypothesis. It's normal behavior. To reject it, you have to come to the tail. Hello, listen to me. For a man biting a dog, you have to come here, and that is in the tail. Very unlikely. So you will come here.
Now, you know that you have a cutoff here, remember, and you have another cutoff here. At least you know this, right? You know that this part of the graph is your green region. If there were an easy way to create this diagram, I would have done it all myself.
So now, if you see here, this is what is most likely to happen; that's the green zone. And this is your purple zone, where the unlikely things happen. Agree? Everybody with me so far? Don't lose it; keep holding on to the thought.
Now tell me, this is your region of what? 0.025, remember, alpha by 2. And this is also 0.025. This guy is 95%. And this guy is how much? 2.5%, and 2.5%. Don't forget that. Okay. Now you have an alpha. Alpha is how much? 0.05.
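The two cutoffs on the chart can be computed rather than read off a table. This is a minimal sketch, assuming a standard normal curve and a two-sided test with alpha = 0.05; the variable names are only for illustration.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Each tail gets alpha / 2 = 0.025 of the total area.
tail_area = alpha / 2
lower_cutoff = std_normal.inv_cdf(tail_area)
upper_cutoff = std_normal.inv_cdf(1 - tail_area)

# Area between the cutoffs: the "green" region on the chart.
middle_area = std_normal.cdf(upper_cutoff) - std_normal.cdf(lower_cutoff)

print(f"area in each tail   = {tail_area}")
print(f"cutoffs (z-scores)  = {lower_cutoff:.2f} and {upper_cutoff:.2f}")
print(f"area between cutoffs = {middle_area:.2f}")
```

A test statistic beyond either cutoff falls in a tail, which is the same thing as saying its two-sided p-value is below alpha.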
You are saying that if your p-value, the probability value, is less than or equal to alpha, you reject. Remember that. The p-value is a probability. The peak in the middle is the highest point of the curve; look at the y-axis. So if the p-value is less than alpha... and your alpha is what? Fixed. You fixed it. Somebody is asking me whether alpha is going to be fixed. Yes, you fix it before the test. Your alpha is 0.05, or for a two-tailed test you put 0.025 in each tail.
It depends on you, but let's keep it at 0.05. If you get any p-value less than that, you automatically land in this region where the probability is low. Less than the cutoff means you are going into this purple zone. Everybody, I need a yes or no. Good to go? That is what happens the moment you say less than 0.05. Let me give you some numbers; then you will see the beauty. Say the height of the curve here is 0.3, here 0.2, here 0.1, and out here 0.05. If you go lower than this, you move further toward the tail. Are you feeling it now, guys? People don't feel it because they don't look at this: the y-axis is the probability (strictly, the probability density), and the p-value is the area left in the tail beyond your observed value. Examples like this, combining everything into one story or narrative, you will not find easily; maybe after an hour of searching you will find someone, but it's tough, trust me on that. Understanding the p-value, understanding alpha, and understanding where to reject: explaining this visually is very tough, but once you can see it visually it's very easy. That's why, when I say p is less than or equal to alpha, it means you go to this region. Alpha is 0.05, which is your cutoff (or alpha by 2 in each tail), whatever you set. What happens in our heads? You always think like a kid, x-axis and y-axis. Suppose this is your cutoff, alpha. When I write "less than or equal to alpha", your brain automatically pushes you back toward the left, isn't it? That's what always happens. But what is the issue here? On the left the probability is higher, because this is different from your normal XY axis.
Right? So here the lower probability is out on the right-hand side. Always remember, this is a distribution. You are not reading along the x-axis, buddy; you are looking at the y-axis and deciding on the x-axis. That's why it is confusing. You look at the probability on the y-axis, and the probability gets smaller as you come down the curve, whether you move toward the right or toward the left. The center is the highest probability, guys. Is this point clear? So that's why, when p is less than or equal to alpha, you reject. And why do you reject? Because it would be a very unlikely observation if the null hypothesis were true. It's not normal, right?
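The rule "reject when p is less than or equal to alpha" can be sketched in a few lines of Python. This is a hedged illustration, assuming a two-tailed test on a standard normal curve; the function names `two_tailed_p_value` and `decide` are made up for this sketch.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Area in both tails beyond |z| under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# The further the observed z is from the center, the smaller the tail area.
for z in (0.5, 1.0, 2.0, 3.0):
    p = two_tailed_p_value(z)
    print(f"z = {z:.1f}  p = {p:.4f}  ->  {decide(p)}")
```

Notice that the p-value shrinks as the observation moves away from the center in either direction, which is the "come down the curve" picture in code.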
Again, I'll tie this back to the diagram. A man kills his wife: are we going to put that here, in the middle? No. Is it likely or unlikely? Is it likely for a person to kill someone? No. So in the middle I'll write "a man doesn't kill his wife": likely to happen. And out here in the tail it will be "a man kills his wife": unlikely. So now you understand why I say that not guilty is the null hypothesis, because it is the likely, default case, and being guilty is your alternate hypothesis, which sits out here. Are we clear, guys? If you observe, I have connected the dots. So now I will move on to the typical steps in doing hypothesis testing.
Steps involved in hypothesis testing. First, the null hypothesis means no change. It doesn't mean no assumption; you are in fact assuming that he has not killed his wife. It means the natural state, anything with no change in behavior. In my example, killing a person is not normal behavior; it's not natural. Got it? So if something is natural, with no change, no effect, no cause and effect, it is going to be the null hypothesis. Now, how do you get into hypothesis testing? Step one: formulate it. You have to decide which statement is the null and which is the alternate. So formulate two hypotheses for analysis. Step two: once you have the hypothesis statements, draw samples from the population. Step three: perform the appropriate statistical test or technique; I'll teach you multiple techniques there. The last step: either accept or reject. We don't even talk about the alternate; we always talk about accepting or rejecting the null hypothesis based on the evidence. (Strictly speaking, we "fail to reject" the null rather than accept it.) Don't worry, I will take an example in the hands-on to show you this in action. Clear?
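As a rough preview of those four steps, here is a small Python sketch. Everything in it is an assumption made for illustration: the hypothesized mean of 50, the "known" population standard deviation of 10, the simulated sample, and the choice of a one-sample Z-test as the test in step three.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Step 1: formulate. H0: the population mean is 50. H1: it is not 50.
mu_0 = 50.0
sigma = 10.0   # assumed known population standard deviation
alpha = 0.05

# Step 2: draw a sample (simulated here; the true mean is set to 55).
sample = [random.gauss(55.0, sigma) for _ in range(40)]

# Step 3: perform the test (one-sample Z statistic, two-tailed p-value).
z = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: decide about H0 based on the evidence.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

The decision is always phrased in terms of the null hypothesis, matching the last step above.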
Now, guys, I'll talk about two types of error. When you do hypothesis testing you are bound to make mistakes, and there can be a Type I or a Type II error. You must have heard about Type I versus Type II in statistics and hypothesis testing. Very famous. So let me show you one nice diagram. It doesn't matter what you do; you are always going to make mistakes. The chances may be small, but you are bound to make them, and the moment you make a mistake you get a certain kind of error. So, first things first, let me explain what these errors are. You are going to see two normal distribution curves. Take this curve as the base: it is the distribution under your null hypothesis, and you are thinking of it as the null hypothesis part. This is the mean value you have assumed, mu-naught (the other curve is centered at mu-one), and here is the cutoff. Can you see this area beyond the line? This is known as the Type I error. Why is it a Type I error? A Type I error happens when the null hypothesis is rejected although it is actually true. Let me take my example. Suppose a man is being tried for the murder of his wife, and he is innocent. A man who is innocent got convicted in the trial for the murder of his wife. What kind of error is this? Come on. This is your Type I error.
Generally, in law and order there's a philosophy that it doesn't matter if you set some criminals free, but make sure you don't convict an innocent man. Have you heard of it? It's okay to make a mistake in punishing a criminal, but you cannot punish an innocent person. The irony is that it happens a lot, right across the globe, but the idea is to avoid it. This is known as a blunder. This is not just an error; it's a blunder. You cannot go with this. In machine learning models too, a Type I error (a false positive) is often treated as the more serious mistake, although which error costs more depends on the problem. I mean, how can you do that? It's like telling a man, "You are pregnant." Telling a man that he is pregnant is a Type I error. Let me show you that image. It's a very cool image; generally it comes up as a meme. Let me just pull that up for you.
Yeah. So this is the image I was talking about. You must have seen it in a lot of data science memes. Yeah, this one is much better. They are labeling it with true positive and true negative; I cannot go into that yet, because that is more of a classification, ML-based kind of understanding. But can you see the Type I error there in the box? What is the Type I error? You are looking at an old man with the doctor, and he is being told, "You are pregnant." Obviously it's a blunder. You can look at a pregnant lady and say, "You are not pregnant"; that's your Type II error, and that is still more acceptable. But how can you say this to a man? So this is a typical example of making an error about the null hypothesis. And can you see that this particular part of the tail still belongs to the null hypothesis curve? But since these two curves, null and alternate, overlap, and you have a cutoff, anything beyond the cutoff you say is rejected. So you have rejected the null hypothesis. Anything before the cutoff, can you see that, guys? There you have accepted the null hypothesis,
and there can be a chance, if you look at this part, that it is actually part of the other curve, the alternate. So what's happening, guys? This is known as the Type II error. With me so far? So what is a Type II error? A Type II error is the kind of error that occurs when we do not reject the null hypothesis even though the null hypothesis was false. It is equivalent to a false negative. I need some space below, so I'll write it here: this is like saying a man is proven not guilty even when he killed his wife. This also happens, by the way. Are you getting it, guys? This is equivalent to which error? Type II error. As per law and order this is okay. I know you might be feeling, "How can you let this person go?" But this is how it is. In case you make a mistake and a criminal goes free, it's okay, but an innocent person cannot be convicted. Now I'll give you the definitions. This is the Type I error, and this is Type II. A Type I error is when the null hypothesis is rejected although it is actually true, and a Type II error is when we do not reject a null hypothesis that is false.
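Both error types can be estimated by simulation. The following is a minimal sketch under stated assumptions: a two-tailed one-sample Z-test with known sigma of 1, samples of size 30, and an arbitrary true shift of 0.5 when the null is false. The names `rejects_null` and `estimate_error_rates` are hypothetical helpers, not part of any library.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

ALPHA = 0.05


def rejects_null(sample, mu_0, sigma):
    """Two-tailed one-sample Z-test; True when H0 is rejected at ALPHA."""
    z = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= ALPHA


def estimate_error_rates(shift, trials=2000, n=30, seed=0):
    """Return (Type I rate, Type II rate) estimated over repeated trials."""
    rng = random.Random(seed)
    type_1 = 0  # H0 true but rejected ("innocent man convicted")
    type_2 = 0  # H0 false but not rejected ("guilty man set free")
    for _ in range(trials):
        innocent = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true
        if rejects_null(innocent, 0.0, 1.0):
            type_1 += 1
        guilty = [rng.gauss(shift, 1.0) for _ in range(n)]   # H0 is false
        if not rejects_null(guilty, 0.0, 1.0):
            type_2 += 1
    return type_1 / trials, type_2 / trials


type_1_rate, type_2_rate = estimate_error_rates(shift=0.5)
print(f"estimated Type I rate:  {type_1_rate:.3f}")
print(f"estimated Type II rate: {type_2_rate:.3f}")
```

The estimated Type I rate should land close to alpha, because alpha is exactly the chance of rejecting a true null; the Type II rate depends on how far the true mean is from the hypothesized one and on the sample size.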
So this marks the closure of hypothesis testing, not from the example or use-case side; this is more from the concept side. When I take this example in the hands-on, you'll
understand it nicely. So when do we use the t-test, guys?
You have a Z-test. Why do we need a t-test? When the sample size is a little trouble for us, that is, when the sample size is less than 30. Your Z-test will not work well there, because by the usual rule of thumb the Z-test needs at least 30 observations, so that the sampling distribution of the mean can be treated as a normal distribution.
So for small sample sizes, if you want to compare group means, or compare the means of two groups, you go ahead and use the t-test. The t-test uses a t-statistic to find the difference between two means, or to compare two means. Let me draw this and we can discuss it. Whenever you are comparing two sample means, you are looking at a population. Imagine this is your population, and you have two samples, sample one and sample two. There can be two different populations as well; that's another way of looking at it. You can have one population and another population, pop one and pop two, with two samples coming out of these two different populations. Now the question is which t-test is used where. The first case is known as the paired t-test (the two samples are related, typically the same subjects measured twice). The other case, where the samples are taken from two independent, different populations, is known as the two-sample t-test. This is something you have to keep in your head; you can't do much about it, these are the concepts. So when you are comparing two means like this, you have a two-sample t-test and a paired t-test. There is one more case: when you compare the mean against a standard value. Let me take a population. You draw a sample out of it and compare this sample against a standard value. That becomes your one-sample t-test. Now you don't have two samples; you have only one sample, and you are comparing it against some standard value. Say I have a sample and I want to compare the sample mean against 20. Then you are comparing with respect to 20, and that becomes your one-sample t-test. So these are the different flavors of the t-test.
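The three flavors differ only in how the t-statistic is built. Below is a hedged Python sketch of the standard textbook formulas using only the standard library. The function names and the small data sets are invented for illustration, and the two-sample version assumes equal variances (pooled variance), which is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, mu_0):
    """Compare one sample mean against a standard value mu_0."""
    n = len(sample)
    return (mean(sample) - mu_0) / (stdev(sample) / sqrt(n))


def two_sample_t(a, b):
    """Independent samples, equal variances assumed (pooled variance)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def paired_t(before, after):
    """Paired samples: a one-sample t-test on the per-pair differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Hypothetical data, made up for this sketch.
scores = [18, 22, 19, 21, 20]
before = [72, 75, 70, 68, 74]
after = [74, 78, 71, 70, 77]

print(f"one-sample t vs 20: {one_sample_t(scores, 20):.2f}")
print(f"two-sample t:       {two_sample_t(after, before):.2f}")
print(f"paired t:           {paired_t(before, after):.2f}")
```

These functions return only the statistic. To get a p-value you would compare it against the t distribution with the right degrees of freedom; in practice `scipy.stats` provides `ttest_1samp`, `ttest_ind`, and `ttest_rel`, which return the statistic and the p-value together.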
As I was telling you, the t-test is the one to use when the sample size is less than 30 (that is the rule of thumb we follow here), and there are a few assumptions, guys. The samples are independent, which I just talked about. Then there is homogeneity of the sample variances, also known as homoscedasticity. It's a very difficult word to pronounce: homoscedasticity and heteroscedasticity.
Anybody recall that? Homoscedasticity, let me use this word, means homogeneity of the variance. Whenever all your samples, sample one and sample two, have the same variance, different samples exhibiting the same variance, that is called homoscedasticity. So that is one assumption we have to check before applying the t-test. The data is also assumed to be normally distributed. So: normal distribution, homogeneity of sample variance, and the samples have to be independent. These are the assumptions you have to comply with. Now I'll get into more of a tree diagram. Suppose you are doing hypothesis testing, you want to reach a conclusion, and you are trying to choose a test. This tree diagram will help you, guys, and it is about your population. Suppose you know that in your data sigma is known, that is, the population standard deviation (and so the population variance) is known, and n is greater than 30. Then you go ahead and use the Z-score, or Z-test. What is that? You remember I talked about it: you take x-bar, the sample mean, subtract the population mean, and divide by sigma over root n, that is, z = (x̄ − μ) / (σ / √n). That's what we call the Z-test. Now, the Z-test will also be of two types. Let me extend this a little further. The Z-test is also going to be of two types.
One-sample and two-sample, just like I mentioned for the t-test. In the one-sample case the sample is compared against a standard value; in the two-sample case the two samples are compared against each other. This is what you call the Z-test. I will show all these things in the hands-on as well when we reach there. Now, what is x-bar? The sample mean. Let's get familiarized. What is mu? The population mean. What is sigma? The population standard deviation. What is n? n is your sample size.
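The Z formula with those four symbols translates directly into code. This is a minimal sketch working from summary statistics; the function names are illustrative, and the two-sample version uses the standard textbook form with both population standard deviations assumed known, which the lecture has not spelled out.

```python
from math import sqrt


def z_statistic(x_bar, mu, sigma, n):
    """One-sample Z: (sample mean - population mean) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


def two_sample_z(x_bar_1, x_bar_2, sigma_1, sigma_2, n_1, n_2):
    """Two-sample Z: difference of means over its standard error."""
    standard_error = sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
    return (x_bar_1 - x_bar_2) / standard_error


# Hypothetical numbers: sample mean 52, hypothesized mean 50, sigma 10, n = 100.
print(f"one-sample z: {z_statistic(52, 50, 10, 100):.2f}")
print(f"two-sample z: {two_sample_z(52, 50, 10, 10, 50, 50):.2f}")
```

With a larger n the denominator shrinks, so the same difference in means produces a larger Z.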
Should be good. Give me a go-ahead if you're clear about the formula and the bifurcation. If sigma is known and n is greater than 30, we use the Z-test. This is a convention we have to follow; we are not going to question how it came about.
It definitely has a history, but there's no point getting into it. Then, on the other side, we have the t-test. For the t-test you don't know sigma, so you come with s, your sample standard deviation, and n is less than 30. These are the two criteria: you don't know the population standard deviation, but you know s. Let me write it here: s is the sample standard deviation. Now look at the formula. s is known and n is less than 30, so what will you do? You again use a flavor of the Z formula you have already written, with one replacement: since you don't have sigma, you write s over root n, so t = (x̄ − μ) / (s / √n). You have seen this as well. And this can also be one-sample or two-sample. The structure is the same as the Z formula, guys; only the flavor is different. Now the question is: can we still call this a Z-test when n is less than 30? People say, it is x-bar minus mu again, a similar flavor of the Z-test, so can it be called a Z-test? No. When n is less than 30 and sigma is unknown, you name it the t-test (the statistic follows the t distribution with n − 1 degrees of freedom rather than the normal). But if you see, the inspiration is from here only. It is similar to the Z-test, but you rename it given the sample size requirement, and you say it is a t-test. The only things that are different, my dear, are the sample size and that s is known while sigma is not. So the t-test I've written here, you will observe, is close to the Z-test. That's what I'm highlighting.
to z test. That's what I'm highlighting. Nothing else. Okay. What is the name of
Nothing else. Okay. What is the name of symbol of standard deviation? So I've
symbol of standard deviation? So I've written now see Xar sample mean, mu is
written now see Xar sample mean, mu is population mean, sigma is population
population mean, sigma is population standard deviation, s is your sample
standard deviation, s is your sample standard deviation. N is sample size.
standard deviation. N is sample size. This is what we call also as student t
This is what we call also as student t test. So by default we say loosely t
test. So by default we say loosely t test or you will say student t test.
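To make the comparison concrete, here is a minimal Python sketch of the two statistics described above. The function names `z_statistic` and `t_statistic` and the numbers in the usage lines are illustrative assumptions, not part of the course material.

```python
import math


def z_statistic(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Z statistic: uses the known population standard deviation sigma."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


def t_statistic(x_bar: float, mu: float, s: float, n: int) -> float:
    """t statistic: same shape, but sigma is replaced by the sample standard deviation s."""
    return (x_bar - mu) / (s / math.sqrt(n))


# Hypothetical numbers, only to show that both formulas share one structure.
print(z_statistic(x_bar=52.0, mu=50.0, sigma=4.0, n=36))  # sigma known, larger sample
print(t_statistic(x_bar=52.0, mu=50.0, s=4.0, n=16))      # sigma unknown, small sample
```

The arithmetic is identical in both functions; what changes is which standard deviation goes in, and which distribution the result is later compared against.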
Clear? Now, moving on. Sometimes people ask: what is one tail, what is two tail? So let's understand the one-tailed versus the two-tailed test. First, understand the two-tailed test, which you have already seen. Let me show you one nice diagram.
This is the two-tailed test, where you say the population mean is equal to the sample mean. Can you see that? That is going to be your null hypothesis: population mean is equal to sample mean. And here, in the alternate hypothesis, we are saying the population mean is not equal. So guys, the moment you say "not equal", it can go in any direction. It can go towards the right-hand side, where the population mean is more than the sample mean, or it can go in the negative direction, sorry, the left-side direction, where the population mean is less than the sample mean. When you're forming a hypothesis like this, you need a two-tailed test; you cannot do one tail. Why? Because you don't have a direction there. Do you know whether you will go towards the left or the right? No, it's bidirectional. It can go in any direction.
For that, you have already seen me playing this same game. Alpha is the area of your rejection region, also known as the significance level. This is where you reject your null hypothesis. Let me write it: this is for alpha equal to 0.05, which is your level of significance. And when your probability value, the p-value, is less than or equal to alpha, what do you do, my dear? You reject the null hypothesis, which is H0. The moment you reject, you land either in this part of the chart or in that one. These are your rejection regions. And this middle region is known as what, my dear? 1 minus alpha, which is 95%. This is the region where you accept (strictly speaking, fail to reject) your null hypothesis. Basically, you will say that your population mean is equal to the sample mean. Let me shade it, then it will look really nice. Take a look at it, guys.
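The two-tailed picture just described can be sketched numerically. This is a minimal illustration on the standard normal curve using Python's built-in `statistics.NormalDist`; the function names and the test statistic value are assumptions for illustration, and for a small-sample t-test the cut-offs would come from the t distribution instead.

```python
from statistics import NormalDist


def two_tailed_critical_values(alpha: float = 0.05) -> tuple[float, float]:
    """Split alpha equally between the two tails of the standard normal curve."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 of the area lies to the right
    return -upper, upper                          # the curve is symmetric around zero


def two_tailed_p_value(z: float) -> float:
    """Probability of a value at least as extreme as z in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


lower, upper = two_tailed_critical_values(0.05)
print(f"Reject H0 if z < {lower:.2f} or z > {upper:.2f}")

z = 2.5  # hypothetical test statistic
p = two_tailed_p_value(z)
print(f"p = {p:.4f}; reject H0: {p <= 0.05}")
```

With alpha = 0.05, each tail holds 2.5% of the area and the middle 95% is the acceptance region.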
Okay. Now, this is the two-tailed test. There can be another one, which is one-tailed. A one-tailed test can be left or right, because it's a directional one, right? Either you go left or you go right. Just for our reference, I'll use left. When you say left, it means your population mean is less than the sample mean. See what I'm saying, my dear: "population mean is less than the sample mean" is the alternate hypothesis. The null hypothesis is the same as before: population mean is equal to sample mean. So now do you guys agree that whether the test is one-tailed or two-tailed depends totally on the way you are forming the hypothesis? You agree? Now, this is known as the left one-tailed test. You can clearly see that the whole of the 5% is sitting here, on the left, with a total acceptance region of 95%. Hello, are we clear? And you can imagine what the right-tailed one will be: just the opposite of it. Let me also quickly show you. You will see the rejection region on the right-hand side, as you can see on the screen. See, the total 5%.
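For the one-tailed case, the whole of alpha sits in a single tail. The sketch below shows this on the standard normal curve, again with the standard library only; the function names and the `tail` parameter are hypothetical, chosen for this illustration.

```python
from statistics import NormalDist


def one_tailed_critical_value(alpha: float, tail: str) -> float:
    """Return the cut-off that puts the whole of alpha into one tail."""
    if tail == "left":
        return NormalDist().inv_cdf(alpha)      # all of alpha sits on the left
    if tail == "right":
        return NormalDist().inv_cdf(1 - alpha)  # all of alpha sits on the right
    raise ValueError("tail must be 'left' or 'right'")


def in_rejection_region(z: float, alpha: float, tail: str) -> bool:
    """Check whether a test statistic falls in the one-tailed rejection region."""
    cutoff = one_tailed_critical_value(alpha, tail)
    return z <= cutoff if tail == "left" else z >= cutoff


print(one_tailed_critical_value(0.05, "left"))
print(one_tailed_critical_value(0.05, "right"))
print(in_rejection_region(-2.0, alpha=0.05, tail="left"))
```

The one-tailed cut-off (about 1.645 in absolute value) is closer to the center than the two-tailed one (about 1.96), because the full 5% is placed on one side instead of being split.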
But I just want to conclude: you are going to use the Z-test when your n is greater than 30 (even equal to 30 will work) and the population variance is known. Always remember that: population variance is known, meaning sigma is known, right? And you will do the t-test, Student's t-test, when n is less than 30 and the population variance is unknown. That's when you use the sample standard deviation. Clear?
Now, there are different types of t-test. One-sample, if you're comparing against a standard value. Two-sample, if the samples are taken from two different populations, right? And if the two samples are taken from the same population, you will use the paired test. Okay, I'll give an example also.
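The selection rules just summarized can be written down as a small helper. This is only a sketch of the lecture's rule of thumb; the function names, parameter names, and return strings are assumptions made for illustration.

```python
def choose_test(n: int, sigma_known: bool) -> str:
    """Rule of thumb from the lecture for picking between the Z-test and the t-test."""
    if sigma_known and n >= 30:
        return "z-test"
    if not sigma_known and n < 30:
        return "t-test"
    # The lecture's summary does not cover the mixed cases.
    return "not covered by this rule of thumb"


def choose_t_test_variant(num_samples: int, same_population: bool = False) -> str:
    """Pick the t-test flavor from how many samples there are and where they come from."""
    if num_samples == 1:
        return "one-sample t-test"  # one sample against a standard or target value
    if num_samples == 2:
        return "paired t-test" if same_population else "two-sample t-test"
    raise ValueError("this sketch only handles one or two samples")


print(choose_test(n=10, sigma_known=False))
print(choose_t_test_variant(num_samples=1))
print(choose_t_test_variant(num_samples=2, same_population=True))
```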
So I have a couple of examples to explain. I think that's what maybe Suman is asking. The one-sample t-test tests whether the population mean of a single population is equal to a target value, right? Some target value. For example: is the mean height of female college students greater than 5.5 ft? So 5.5 ft is the standard or target value you're comparing against. That's your one-sample test.
Two-sample is when you compare the difference between the means of two independent populations. So you say, okay, you have a target value, but the target value is for the difference between the means of two independent populations. The example can be: does the mean height of female college students significantly differ from the mean height of male college students? Right? So there are two different colleges.
Right? One college is only for female students. So that's your first population, the female college, and you will take a sample from it called S1. Similarly, you have another population, which is the male college, and there you will extract another sample. Then you will compare the difference between the means using these two samples. Clear, guys? Got the difference between one-sample and two-sample? Perfect.
Let's do a question, guys. The following data represents hemoglobin values in g/dL for 10 patients. Okay, so the numbers are 10.5, 9, 6.5, 8, 11, 7.5, 8.5, 9.5, and 12 [the transcript captures only nine of the ten values]. The question is: does the mean value for the patients significantly differ from the mean value of the general population? So guys, the data you are seeing is a sample, and the mean hemoglobin value of the general population is 12 g/dL, right? Test at alpha = 0.05. My first question to you guys: look at this question and tell me what kind of test you're going to do on this.
So guys, since n is less than 30 (how much is n? n is equal to 10, which is less than 30) and you have a sample compared against a standard value of 12 g/dL, combining these two means I need to do a one-sample t-test. Okay, so let's do that now.
What is going to be x-bar? We know that n is 10. X-bar is what? The sample mean. So you have to sum all these values, I'm not going to go through them till the last, and divide by 10. If you do that, guys, and you can confirm it later, it will come out to be 8.95. Right? Then we need to calculate the sample standard deviation. The formula is $s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$. So basically n will be 10 and n minus 1 will be 9. That's how you calculate the sample standard deviation. Guys, if you do this math, you will find 1.80.
So, what is your t statistic? $t = (\bar{x} - \mu) / (s/\sqrt{n}) = (8.95 - 12) / (1.80/\sqrt{10})$. Hands-on, you will get this result in 1 second, or a maximum of 2 seconds once Python boots. So now, guys, we have the t-statistic value. What is it?
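The calculation above can be reproduced in a few lines of Python. Because the transcript lists only nine of the ten hemoglobin values, this sketch starts from the summary statistics quoted in the lecture (n = 10, mean 8.95, standard deviation 1.80) rather than the raw data. The helper `sample_std` is a hypothetical name that implements the n − 1 formula and is shown for reference only.

```python
import math


def sample_std(values: list[float]) -> float:
    """Sample standard deviation with the n - 1 denominator (reference helper)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))


# Summary values quoted in the lecture for the hemoglobin example.
n = 10         # sample size
x_bar = 8.95   # sample mean (g/dL)
s = 1.80       # sample standard deviation (g/dL), rounded as in the lecture
mu = 12.0      # mean of the general population (g/dL)

standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu) / standard_error

print(f"standard error = {standard_error:.3f}")
print(f"t statistic    = {t_stat:.2f}")
```

With the rounded s = 1.80 this gives about −5.36. The lecture quotes −5.35, presumably from the unrounded standard deviation; the difference does not affect the conclusion.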
Generally, once we have the t statistic, we take the mod of it. I think I'm audible, right? So what you're doing is taking minus 5.35 and taking its mod, its absolute value. Clear? There's a reason for it, and you will understand it the moment I show you the t table. Look at the table, please focus.
So this is the table, guys, and in this table you have a couple of things. One is your degrees of freedom, which is n minus 1. Your sample size minus one is your degrees of freedom; that is something you have to keep in your head. So tell me, what is n minus 1? 10 minus 1 is 9, right? So you'll come here, to 9. Okay. Now, what is the critical value for this particular question? You need two things, let me put it this way: you have degrees of freedom of 9, and you have an alpha of 0.05. Look up alpha and the degrees of freedom.
What is alpha? This is your alpha. Now, is it going to be the one-tail significance level or the two-tail significance level? It is a one-sample test, but one tail or two tail? You don't know which direction it will go, right? So you're going to take what? Two-tail. The question is not saying more than or less than; we are just comparing whether it is equal or not. So it's two-tail. Within two-tail, you take alpha equal to 0.05. Tell me, for degrees of freedom 9 and alpha equal to 0.05, where are you intersecting, guys? You're intersecting at the cell with 2.26. So it basically gives you the t-critical value coming from the table: 2.26.
What is the t statistic? You have got a t statistic that is 5.35. It was actually minus 5.35, but since the table is all in positive values, we take the mod, so it is 5.35. Now there's a condition on the t statistic, and you will learn it step by step. I'll give you the steps to follow once you have these values.
Step number one: calculate t critical at the given level of significance and degrees of freedom from the above table. Right? That's step number one, and you know that t critical is coming out as 2.26. You have done it.
Step number two: if t critical is greater than the calculated value, which we know as the mod of the t statistic, then what will you do, my dear? You will accept (that is, fail to reject) the null hypothesis. Maybe I'll use green: accept the null hypothesis. Okay, that is 2(a). There can be a scenario where t critical is less than or equal to the calculated t statistic, and then you will say: reject the null hypothesis.
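The two steps above translate directly into code. In this sketch the dictionary `T_CRITICAL_TWO_TAILED_05` is a hand-copied excerpt of a standard t table (two-tailed, alpha = 0.05) for a few degrees of freedom, and both it and the function name `decide` are assumptions made for illustration.

```python
# Excerpt of a standard t table: two-tailed critical values at alpha = 0.05.
T_CRITICAL_TWO_TAILED_05 = {8: 2.306, 9: 2.262, 10: 2.228}


def decide(t_stat: float, df: int) -> str:
    """Compare |t| with the table value and return the decision."""
    t_critical = T_CRITICAL_TWO_TAILED_05[df]  # step 1: look up t critical
    if t_critical > abs(t_stat):               # step 2(a)
        return "accept H0 (fail to reject)"
    return "reject H0"                         # step 2(b): t critical <= |t|


# Hemoglobin example from the lecture: t = -5.35 with 9 degrees of freedom.
print(decide(-5.35, df=9))  # reject H0
```

Taking the absolute value is what lets a single positive table entry serve both tails of the symmetric t curve.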
If you compare these conditions with what you remember from before, the first is like p is greater than alpha, and the second is like p is less than or equal to alpha. Hello? This is just a reference point. Are we not saying the same thing, guys?
Come on, now let's look at the diagram also. By the way, what is happening here, for our question? Your t critical is 2.26 and the t statistic is 5.35. So which case is it? Definitely, t critical is less than or equal to the calculated value. Right, guys? Since t critical is less than the mod of the t statistic, what conclusion will you get? You are going to reject the null hypothesis.
So basically you are saying that the sample mean is not equal to the population mean; the sample is not a true reflection of the population. And look at the values. Your sample mean came out quite low, right? 8.95. Look at the different values: you have as high as 12 and as low as 6.5. So that is definitely not giving you the confidence that your sample is a true reflection of your population, and that's why you are going to reject it. You can even visualize it, and you can also come to the conclusion from the t statistic.
Now, try to put it in terms of the same distribution diagram. Here, if you see, this is going to be two tails, so again this will be 2.5% and this will be 2.5%. Let's say this is your t critical. What is that value? It's 2.26. So you know that your t critical is less than the t calculated value. The t calculated value is what? 5.35. Although, trust me, it is actually somewhere here, at minus 5.35, and this will be minus 2.26. The curve is symmetric; that's why I'm saying positive or negative does not matter. Isn't it? It is in this rejection region. You know that this is your rejection region.
Or you can say that it is statistically significant; that's why you're rejecting. So we also write that this t statistic is statistically significant. Why is it statistically significant? Because its absolute value is higher than the critical value, and so you will reject the null hypothesis. Basically, it is going to fall in this area. What is this region known as? The rejection region, or significance region. I have shown you that; if you're not able to recall, let me go up and show you the diagram. See, it's there: the rejection region, with area equal to the significance level. I hope you are following, right? See, I'm getting into the details of the things. I hope you can see the effort. So are you feeling that, okay, it makes sense? The statistical tests are making sense. Even for a smaller sample, you can make a judgment, and it looks correct. Hello?
Generally, when you do this in Python, it will tell you in just one line. It will give you this t calculated number, and then it will tell you whether you can accept or reject. What I'm showing is the behind-the-scenes calculation, guys. Good to go? We're following this; the rest of the tests are going to follow a similar kind of style, right?
So if you see different behavior in the sample and the population, you might need a different sampling approach or something else. But this sample doesn't reflect the hemoglobin level of the population, because its mean differs significantly from the population mean. You can write a conclusion like that. Let me write it also. What is the conclusion? The sample mean differs significantly from the mean hemoglobin of the population.
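The same conclusion can be cross-checked through the p-value rule mentioned earlier (reject when p is less than or equal to alpha). The sketch below is one possible illustration, not the course's own code: it integrates the Student's t density numerically with Simpson's rule using only the standard library, and the helper names are hypothetical. In practice a statistics package would normally do this step; for example, SciPy provides `scipy.stats.ttest_1samp` for raw data.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value, 2 * P(T > |t|), using Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)  # half the area lies on each side of zero


alpha = 0.05
p = two_tailed_p_value(-5.35, df=9)  # t statistic and degrees of freedom from the lecture
print(f"p-value = {p:.5f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```

The p-value comes out far below 0.05, which agrees with the table-based decision: both routes reject the null hypothesis.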
Let me find some more questions for you guys. I want to do a two-sample t-test now. Why don't you look at this? It's more like an MCQ question, guys: you have to choose the one correct answer. So you have Mio, a restaurant owner, who wants to test if her two managers perform at the same level. Let's say it's an organization, a restaurant, and Mio is comparing two managers to see whether they're performing at the same level or not.
So what is she doing? She collects data about the number of customer complaints in two random samples of shifts, one for each manager. You know that in a restaurant people work in shifts. Someone will start at 6:00 a.m., so shift one can be 6:00 a.m. to 3:00 p.m., and another guy can come in for another shift, which will be 3:00 p.m. till 12:00 at night.
Hello? So what she's doing is collecting a sample for this manager and a sample for that manager. The sample is all about the number of customer complaints. So far with me, guys? Then they have given you a summary of the results. The means are four complaints and five complaints, the standard deviations are 0.3 and 0.5, and the numbers of shifts are 19 and 21. Let's say this is manager A and this is manager B. How many shifts of data has she looked at? 19 here, and 21 here, right? For each of those shifts she's counting the number of complaints.
Now the question is: what are you going to use, I mean, how do you solve this? You have an idea; definitely this is going to be a two-sample t statistic. So guys, here you are going to use the statistic called t, and you're going to use the sample means, x1-bar and x2-bar. Here you are not going to write x-bar minus mu. And then you are going to use this formula: $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$. Okay, this is the formula we use all the time.
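The two-sample formula just stated can be sketched as follows. Several things here are assumptions: the function name, the `hypothesized_diff` parameter (the difference under the null hypothesis, set to zero), and the standard deviations 0.3 and 0.5, which are read from a garbled line of the transcript. The denominator is the unpooled standard error, which matches the form of the formula in the lecture.

```python
import math


def two_sample_t(mean1: float, mean2: float, s1: float, s2: float,
                 n1: int, n2: int, hypothesized_diff: float = 0.0) -> float:
    """Two-sample t statistic with an unpooled standard error."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Summary values for the restaurant example, as read from the transcript.
t_stat = two_sample_t(mean1=4, mean2=5, s1=0.3, s2=0.5, n1=19, n2=21)
print(f"t statistic = {t_stat:.2f}")
```

Under these readings the statistic is about −7.75; the sign only reflects which manager is listed first.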
this is the formula we use all the time. Now that's the reason that I am telling
Now that's the reason that I am telling you this because this formula can be
you this because this formula can be derived and I hope you know right
derived and I hope you know right everything here. X1 is let me write this
everything here. X1 is let me write this is something you are not supposed to
is something you are not supposed to remember because you are going to code
remember because you are going to code in Python so you can easily use it but
in Python so you can easily use it but you should write this is like sample
you should write this is like sample mean uh this is sample mean one
mean uh this is sample mean one X2 bar sample mean two for the second
X2 bar sample mean two for the second population sample then S1 is what sample
population sample then S1 is what sample standard deviation one s_ub_2 means
standard deviation one s_ub_2 means sample standard deviation two n_sub_1
sample standard deviation two n_sub_1 means means sample size one and n_sub_2
means means sample size one and n_sub_2 means sample size two. So when you do
means sample size two. So when you do two sample t test you do the difference
two sample t test you do the difference between the mean. This is much much
This is very important. So, guys, the numerator will be 4 minus 5. So this option is correct, this is correct, and this is also correct. So out of these options, these two are definitely gone. Now, the standard deviation term is 0.3 squared, so I think this one looks closest, right? But tell me one thing: it is very close, but why is this zero appearing? If you look at the exact formula, there is one more term, which we generally assume to be zero. The formula says the sample difference is definitely x̄1 - x̄2, but you are also subtracting one more difference, and that is known as the hypothesized difference. Basically, I'm saying the population difference, if there is any, and then you divide by the standard error of the difference. What I have seen people miss is this: the numerator is (x̄1 - x̄2) - (μ1 - μ2). This μ1 - μ2 is there in case you think there is also a difference in the population means. It comes from the null hypothesis. If there is no difference in the population means, it implies that you put μ1 - μ2 equal to zero. That's how this zero is coming. Are we clear, guys? The denominator does not change. You will still write the same square root of s1²/n1 + s2²/n2. Comfortable?
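The formula described above can be sketched in a few lines of Python. This is only an illustration: the function name `two_sample_t` and its parameter names are made up for this sketch, and the inputs are the summary figures of the two managers from the question (means 4 and 5, standard deviations 0.3 and 0.5, 19 and 21 shifts).

```python
import math


def two_sample_t(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """t statistic for the difference between two sample means.

    The numerator is (x1_bar - x2_bar) - (mu1 - mu2); the hypothesized
    difference mu1 - mu2 is 0 when the null hypothesis says "no difference".
    The denominator is the standard error sqrt(s1^2/n1 + s2^2/n2).
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Manager A: mean 4, sd 0.3, 19 shifts. Manager B: mean 5, sd 0.5, 21 shifts.
t_stat = two_sample_t(4, 5, 0.3, 0.5, 19, 21)
print(round(t_stat, 4))
```

Keeping the hypothesized difference as an explicit argument with a default of zero mirrors the point made above: the term is always in the formula, it just usually vanishes.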
I hope you like this question, right? It's easy. Then you can do a number of things. Can someone calculate this t statistic quickly for me? What is that? 5 - 4, sorry, 4 - 5. Do the maths, anybody: divide by the square root of 0.3² by 19 plus 0.5² by 21. What is the t statistic you have got? -4.46?

And you also have to take the degrees of freedom now. So see, n1 is equal to 19 and n2 = 21. So you know the degrees of freedom will be what? (n1 - 1) + (n2 - 1). So it actually comes out to n1 + n2 - 2, right? So how much is that, guys? 40 - 2 = 38. And alpha, if nothing is given, you can always take as 0.05. Okay, I'm just confirming my degrees of freedom. Let me open my notes. I think it is n1 + n2 - 2, so that's correct. So this is your degrees of freedom. Now look at the table. I know this is 38. I don't know whether we'll have 38 in this. No. But we need the t critical value for df = 38 and alpha = 0.05.
It's very tough. Generally, you should not get worried. Sometimes these things trouble us, but that's fine. There's a website called Mera Calculator: enter the significance level 0.05 and calculate, and that will give the critical value. Guys, what is the value for that? So see, this is a calculator for the critical value. I have taken two-tailed, 0.05, and 38 is my degrees of freedom. So that gives me this value. What is this value, guys? So t critical comes out to be, please help me here, about 2.024. What is the modulus of my t statistic? -4.46, you told me earlier. Can someone also confirm that? Nobody's calculating. I assume that this is the value, but I don't want to assume. That's where the problem is.
0.3 and 0.5, right? These are the values: 0.3, 0.5, 19 and 21. So my equation is 0.3 into 0.3 divided by 19. I'll just do this; I'm using Excel, by the way, in case you're wondering what I'm doing. This plus this, then I'll take the square root of this, and then I'll divide -1 by this number. It comes to -7.7518. Why are you getting so many different values? -7.7518, period. That should be the value. Why are you all getting different values? What's going on with the calculation?

So the modulus will be 7.7518. You can definitely see that it is higher than the critical value, so you reject the null hypothesis. That's it. Clear? So what you are saying is that there is definitely a significant difference in the managers' performance levels, because they are not the same, right? Which one is better and which one is not, that is a different question.
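The decision step can be written out as a small sketch. The helper names `degrees_of_freedom` and `reject_null` are hypothetical, the t statistic is the value worked out in Excel in the session, and the critical value is hard-coded as a table or calculator lookup (two-tailed, alpha = 0.05, df = 38) rather than computed.

```python
def degrees_of_freedom(n1, n2):
    """Degrees of freedom as used in the session: (n1 - 1) + (n2 - 1)."""
    return (n1 - 1) + (n2 - 1)


def reject_null(t_stat, t_critical):
    """Two-tailed rule: reject H0 when |t| is larger than the critical value."""
    return abs(t_stat) > t_critical


df = degrees_of_freedom(19, 21)
t_stat = -7.7518      # value obtained in Excel during the session
t_critical = 2.024    # looked up for df = 38, alpha = 0.05, two-tailed (not computed here)

decision = reject_null(t_stat, t_critical)
print(f"df = {df}, reject H0: {decision}")
```

Taking the absolute value is what "modulus" refers to above: the sign of t only says which manager's mean is larger, not whether the difference is significant.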
Okay, so now, guys, the next thing we can look at is the F test. So I'm putting the heading "F test". The F test is more about looking at the variances of two populations, right? Not the means. So when you want to compare the variances of two populations, you are going to come to the F test. And there are a few assumptions, similar to the t test: the data is normally distributed, and the data is independent.

So generally, when you do the F test, what you try to check (my screen is up, right?) is whether the variances of the two populations the groups are taken from are equal or not. So there are two populations and you have extracted two samples. Let me draw it: this is population one, this is population two, and you are extracting sample one, group one, and sample two. So what you do is try to estimate, using the two samples, whether the variances of the two populations are the same or not. Are you able to understand, guys, what you are comparing? You're comparing whether the variances are the same or not. Earlier you were comparing the means, remember?
So what will be your null hypothesis? The null hypothesis can be that the variances are equal, and the alternate hypothesis is that the variances are not equal. Okay, this is your null hypothesis and this is your alternate hypothesis. Once you have them, you calculate the value of F. And what is the formula we use? It is the larger sample variance (whichever one is larger, you put it in the numerator) divided by the smaller sample variance, that is, sigma 1 squared divided by sigma 2 squared. And then you need the degrees of freedom, and there are going to be two of them, df1 and df2, because of the two samples: n1 - 1 and n2 - 1. Okay, it's very easy, guys. Trust me, this is easier than the t test.

This is a distribution table for the F test. Can you see that? You have degrees of freedom one, and you have degrees of freedom two. Let's say, hypothetically, one of your degrees of freedom is 8 and the other is 12. So where will you finally land, guys? Come on. What is your F critical? Where the two intersect, guys: 2.5196. That becomes your F critical. And then it is the same game again: you compare and come to a conclusion. The F test is also known as the variance ratio test. Right? Some fancy name, but there you have it.
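A minimal sketch of the variance ratio calculation follows. The function name `variance_ratio_f` is made up, and the sample variances and sizes are hypothetical numbers chosen only so that the degrees of freedom come out as 8 and 12, as in the table lookup above. The critical value is the one read off the table in the session and is passed in as a plain number.

```python
def variance_ratio_f(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator).

    The larger sample variance always goes in the numerator, so F >= 1,
    and each degree of freedom is the matching sample size minus one.
    """
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Hypothetical samples: variance 4.0 from 13 observations, variance 10.0 from 9.
f_stat, df1, df2 = variance_ratio_f(var_a=4.0, n_a=13, var_b=10.0, n_b=9)
f_critical = 2.5196  # read from the F table for df1 = 8, df2 = 12 (assumed, not computed)

print(f"F = {f_stat}, df1 = {df1}, df2 = {df2}")
print("reject H0 (equal variances):", f_stat > f_critical)
```

Note that the degrees of freedom follow whichever sample ends up in the numerator, which is why the function returns them together with F instead of leaving the ordering to the caller.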
Trust me, there are a lot of distributions. One of them is the F distribution. But the problem is that we cannot get too deep into the distributions; that's not in the scope of the syllabus. You have a t distribution, you have an F distribution, you have a chi-square distribution, right? The sky is the limit; there are quite a few distributions. But rather than looking too much into the distributions, our focus is to look directly at the tests so that you can handle them. See, if you try to do the formula derivation, you will get a little lost.

Okay, is there any sample size limit? I think not. As you can see in the notes, we are not saying that n has to be less than or more than some number, so you can use it. And this table is different for different alphas, guys. So one thing I wish to note is that the table I've shown now is for 90% confidence, so alpha is 0.10. You can also have a table for alpha 0.05, and so on. This is a very good website that you can look at for the tables. For each alpha there is a different table, and you can leverage that. Okay.
Just a quick info, guys. Intellipaat offers a data science course in collaboration with iHub IIT Roorkee, which will help you master concepts like Python, SQL, machine learning, AI, Power BI and more. With this course, we have already helped thousands of professionals make successful career transitions. You can check out their testimonials on our Achievers channel, whose link is given in the description below. Without a doubt, this course can take your career to new heights. So visit the course page link given in the description below and take the first step towards career growth with the data science course.
Okay. So let me put the heading "ANOVA". ANOVA is another statistical test, guys. Let me share this and talk it through. ANOVA is also known as analysis of variance. See, till now we have only compared means or variances between two groups, right? Either you were doing one sample or you were doing two samples. Let's say I want to compare more than two groups, and I want to compare the means. What method should I go with? Because these tests will not allow me to do that, right? That's where ANOVA comes into the picture. Although the name is analysis of variance, it compares the means of different groups to understand whether one group is significantly different from another. Clear?

Again, we have certain assumptions. The samples need to be independent, right? Another one, which is very important, is that all populations should have a common variance, right? That is homoscedasticity. And the samples are drawn from normally distributed populations. Okay.
So, there are different types of ANOVA, but the one we are going to cover in the syllabus is one-way ANOVA. One-way ANOVA is basically an extension of the t test, right? For example, you have population one, population two, and you can go all the way to population k. So how many groups do you have? You have k groups in total. The idea here is that if I want to find the difference between the means, I'll follow an approach that helps us find the variation among the sample means. Every population will have a sample coming out of it: sample one, sample two, up to sample k.

For three groups, let's say k equal to 3, I can show you a diagram; see if it makes sense. Look here. This is how the different populations vary. But do you notice one thing? All of them have a similar variance, right? Look at the length of this arrow. So the first thing you should observe is that the three populations have the same variance, the common variance, although their means differ from each other. Sorry for the very blurred photo, but it is very helpful; that's why I'm using it. If you compare the means of these two, can you quickly see that there's not much difference between them? So μ1 and μ2 look close. However, they are significantly different from μ3. Can I say that, guys? So there are three populations, 1, 2 and 3, with three different means, and you're trying to compare them. Okay. So once you have a fair idea of what it does, why don't we get into the calculation, the approach? How do you do it?
Okay. So let me give you the F statistic for this. How do they calculate it, guys? They use F: they find the variation among the sample means and normalize it by the natural variation, or you can just write the variation within groups. Or you can write it as the mean sum of squares between the groups, which is known as MSB, divided by the mean square of errors, which is MSE. So finally it boils down to F = MSB / MSE. Just keep that in your head, and we are going to deep-dive on this.

First things first: what is MSB? As the name suggests, it is the mean square between groups. So how do we calculate it, guys? The formula is SSB / (k - 1). Suppose you have k populations; you divide by k - 1. It's a formula, you can't do anything about it. You have to keep it in your head, but I'll explain it later. For SSB you write (x̄j - x̄) all squared, and in front of it you write nj, summed over the groups: SSB = Σ nj (x̄j - x̄)². You will not understand much here, guys, I'm telling you; just note down the formula. That's your MSB, which is SSB by k - 1.

We have another one called MSE, the mean square of errors. I'm going to explain it, but first let me give you the formula. It is SSE, the sum of squared errors, divided by N - k. SSE is a double sum: a Σ over the groups, then another Σ over the observations in each group, of (x - x̄j)². That comes to Σ (nj - 1) sj², so MSE = Σ (nj - 1) sj² / (N - k). The N you are seeing here is the total number of observations across all k groups. This is like the grand total.
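For reference, the formulas dictated above can be collected in one place. The index i and the symbol x_ij (observation i in group j) are notation added here for the double sum; everything else follows the symbols used in the session.

```latex
F = \frac{\mathrm{MSB}}{\mathrm{MSE}}, \qquad
\mathrm{MSB} = \frac{\mathrm{SSB}}{k-1}
             = \frac{\sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{x})^2}{k-1}, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{N-k}
             = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{N-k}
             = \frac{\sum_{j=1}^{k} (n_j - 1)\, s_j^2}{N-k}
```

The last equality holds because the sample variance of group j is s_j² = Σ_i (x_ij - x̄_j)² / (n_j - 1), so each inner sum equals (n_j - 1) s_j².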
So let's get into an example, because we need to calculate this in a very simple, step-by-step approach. Ideally, at the end of this exercise you will be trying to create this table, which I call the ANOVA table. Some of these things you have not got yet, I know that, so please don't worry. Just follow along; I'll explain with the example.

The first column is the source of variation: between groups, this is your error, and this is your total. There are three segments, if you see. Now, the first thing we have to calculate is the sum of squares. You already know that for between groups you have to calculate SSB, and for error it will be SSE. SSB is something you know: it is Σ nj (x̄j - x̄)². Guys, I missed the bar on x̄j in what I wrote at the top; it is x̄j minus x̄, whole squared. SSE is Σ (x - x̄j)². The total is SST, which is SSB plus SSE. Then you need the degrees of freedom: df1 is k - 1, df2 is N - k, and df3 is N - 1. Then the mean squares: MSB equals SSB by k - 1, and MSE equals SSE by N - k. Then finally we'll calculate the F value, which is the ratio MSB by MSE. Okay. I know this is a little overwhelming, but let me try to explain it by taking an example, and I think you will get it. I don't see any reason why you would not get it.
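The ANOVA table described above can be sketched as a small Python function. This is an illustration only: the function name `one_way_anova_table`, the dictionary layout, and the three toy groups are all made up for this sketch and are not the data used in the session.

```python
def one_way_anova_table(groups):
    """Build the one-way ANOVA table (sum of squares, df, mean squares, F)."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # N, the grand total of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # SSB: n_j * (group mean - overall mean)^2, summed over the groups
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # SSE: (observation - its own group mean)^2, summed over everything
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msb = ssb / (k - 1)
    mse = sse / (n_total - k)
    return {
        "between": {"ss": ssb, "df": k - 1, "ms": msb, "f": msb / mse},
        "error": {"ss": sse, "df": n_total - k, "ms": mse},
        "total": {"ss": ssb + sse, "df": n_total - 1},
    }


# Toy data, purely illustrative: three groups of three observations each.
table = one_way_anova_table([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for source, row in table.items():
    print(source, row)
```

Each key of the returned dictionary corresponds to one row of the table (between groups, error, total), and the degrees of freedom of the first two rows add up to the third, just as SSB and SSE add up to SST.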
So let's say I have a table of the times required by three workers to perform an assembly-line task. They are working in an industry, and the times were recorded on five selected occasions. Let's say we are the people: the first worker is me, the second worker is, let's say, Dilip, and the third is Vertical. Okay. So now let's say these are the numbers, guys: 8, 8, 10. Okay. 10, 9. 10. Okay. 8, 11. 10. 9. Yeah. So these are the five days, the five selected occasions: 1, 2, 3 and so on. So far with me on the data, guys? And this is all in hours. So let's say 8 hours, 10 hours, 9 hours, 11 hours, like that.

So now tell me: there are three groups. Maybe I can give them three colors. That's why I'm coming to Excel. You like the way I teach this? There are three populations and three samples, right? If I ask you, what is the sample average? I think you know what to write: the average of each of these columns. It comes to 9.6 for the first sample, for Dilip it is 8.8, and for Vertical it is 9.8. These are your sample averages. What is the sample variance? You know that it will be the variance function with S for sample (VAR.S). See: 1.3, 0.7 and 0.7. Okay. Now there can be an overall average also, and what will that be? Not the average within each population; it is going to be the average of the whole thing, across the total N, right? So how much is that value? 9.4. Now also tell me, what is the value of N and what is the value of k? How many observations are there in total? 15. And how many groups are there? Three. Good.
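The summary figures quoted above can be checked with Python's standard `statistics` module. One loud assumption: the raw values are not fully legible in the transcript, so the three lists below are a hypothetical data set chosen to reproduce the quoted sample averages (9.6, 8.8, 9.8) and sample variances (1.3, 0.7, 0.7). The worker labels are placeholders as well.

```python
import statistics

# Hypothetical raw times in hours (assumed, chosen to match the quoted summaries).
times = {
    "worker_1": [8, 10, 9, 11, 10],
    "worker_2": [8, 9, 9, 8, 10],
    "worker_3": [10, 9, 10, 11, 9],
}

# Sample average and sample variance (n - 1 in the denominator, like Excel's VAR.S).
sample_means = {name: statistics.mean(values) for name, values in times.items()}
sample_variances = {name: statistics.variance(values) for name, values in times.items()}

# Overall average across all N observations, not the average within one group.
all_values = [x for values in times.values() for x in values]
overall_mean = statistics.mean(all_values)

n_total = len(all_values)   # N: total number of observations
k = len(times)              # k: number of groups

for name in times:
    print(name, round(sample_means[name], 2), round(sample_variances[name], 2))
print("overall mean:", round(overall_mean, 2), "N:", n_total, "k:", k)
```

`statistics.variance` is the sample variance; `statistics.pvariance` would divide by n instead of n - 1 and give different numbers.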
Now we need to calculate the sum of squares between the groups. How will you find
the sum of squares between the groups? What does the formula say? It says that you
will use n_j, and let me write it; the beauty of Excel is that I can write also.
So maybe I can write here. Nice, it doesn't disappoint me. What is SSB?
SSB = Σ n_j (x̄_j − x̄)². Okay. So guys, let's understand: what is the meaning of
n_j? n_j is the number of data points within each sample, which I had not written
there. Now I'm writing it: n_j is the number of data points within each sample.
Okay. What about x̄_j? This is your sample mean. And what is this x̄ with no j? It
is the overall mean. We'll call this the overall mean. And you know the numbers
also, right? The sample mean is how much? This row is going to be the sample
average; I'm giving the arrow because I like giving it. And the overall mean is
this number, guys. Good.
Now SSB we can calculate. First things first: how many data points are there? It
is five, so 5 into, now you have to write the sample mean minus the overall mean.
So this is 5 into the sample mean, 9.6, minus the overall mean, whole square. I
hope you know a little bit of Excel. Plus, you will do the same thing again for
each group. You'll write 5 × (9.6 − 9.4)² + 5 × (8.8 − 9.4)² + 5 × (9.8 − 9.4)²,
and that will give me the value of the sum of squares between the samples. Right
guys, so far with me? So this is your sum of squares between the groups; that's
why you're using 9.4, 9.4 all the time. This is known as the variation between
the groups.
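The between-group arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the group means and group size read out in the walkthrough; the variable names are illustrative and do not come from the instructor's spreadsheet.

```python
# Group means and the common group size, as read out in the walkthrough.
group_means = [9.6, 8.8, 9.8]
n_per_group = 5

# With equal group sizes, the overall mean is the mean of the group means.
grand_mean = sum(group_means) / len(group_means)

# SSB = sum over groups of n_j * (group mean - overall mean)^2
ssb = sum(n_per_group * (m - grand_mean) ** 2 for m in group_means)

print(round(grand_mean, 2))  # 9.4
print(round(ssb, 2))         # 2.8
```

Note that averaging the group means only gives the overall mean because every group has five observations; with unequal groups the overall mean would have to be weighted by group size.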
Now I need to find the sum of squares error, which is actually within the group
itself. So for that, what is my approach? The approach is very simple. I'll do n
minus k, sorry, n_j minus one. Remember the formula: n_j minus one. Let me write
the formula. What is the sum of squares error? SSE = Σ (n_j − 1) s_j², where s_j²
is the variance. That's it for now, to follow this. Okay. So let's calculate:
this is equal to n_j minus one, and each group has an equal number of samples, so
this is a very simple case. So you do (5 − 1) into C12. What is C12, guys? For me
it is the variance; I have already calculated it. What is s_j², guys? Your
individual variance. Plus (5 − 1) into the next variance, plus (5 − 1) into the
last variance. Are you guys following me? Done: 10.8. So you already have the
within-sample variance calculated, right? Remember what this s_j² is: your
within-sample variance. Right? So that gives you SSE. So what is your SST? The sum
of these guys. You will sum it up: 10.8 + 2.8 = 13.6. That's your total. Okay.
This is SSB, SSE, SST.
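The within-group sum of squares and the total can be reproduced the same way. The sketch below assumes the three sample variances (1.3, 0.7, 0.7) and the SSB value of 2.8 from the walkthrough; the variable names are illustrative.

```python
# Sample variances and group sizes, as read out in the walkthrough.
sample_variances = [1.3, 0.7, 0.7]
group_sizes = [5, 5, 5]

# SSE = sum over groups of (n_j - 1) * s_j^2
sse = sum((n - 1) * s2 for n, s2 in zip(group_sizes, sample_variances))

# Total sum of squares is the between-group part plus the within-group part.
ssb = 2.8
sst = ssb + sse

print(round(sse, 1))  # 10.8
print(round(sst, 1))  # 13.6
```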
This is not your MSB. To get to MSB and MSE, remember I had given a formula; you
have to use that formula. Let me show you what it says: for one of them you divide
by k minus one, and for the other one you have to divide by n minus k. So MSB and
MSE you will calculate by that division. Remember what that division is? You'll
take this SSB and divide by k minus one. What is k minus one? 3 minus 1. So how
much is that? 1.4. And MSE is what? SSE divided by what? n minus k, 15 minus 3,
remember. What did I do? Let me rewrite. Sorry, guys, I missed something: it is
this value divided by n minus k. Right. Perfect. This is cool. Now you tell me,
what is the value of F? It is the ratio of MSB to MSE: 1.556.
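The mean squares and the F ratio follow directly from the two sums of squares. A minimal sketch, assuming SSB = 2.8, SSE = 10.8, n = 15 and k = 3 as stated above; the names are illustrative.

```python
# Sums of squares and design sizes from the walkthrough.
ssb, sse = 2.8, 10.8
n_total, k = 15, 3

# Each mean square is a sum of squares divided by its degrees of freedom.
msb = ssb / (k - 1)        # between groups: 2 degrees of freedom
mse = sse / (n_total - k)  # within groups: 12 degrees of freedom

# The F statistic is the ratio of the two mean squares.
f_stat = msb / mse

print(round(msb, 2))     # 1.4
print(round(mse, 2))     # 0.9
print(round(f_stat, 3))  # 1.556
```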
Wow. Now once you have the F statistic value, what will you do? We need to find
something, right? You have the F statistic. What is going to be your F critical?
Calculate it now. What you have calculated so far is known as the F statistic,
remember, and that is coming to 1.555, whatever, 1.55. F critical is what you
will find, and that's a question mark for us. Now you have to use a certain rule.
First number: alpha is 0.05. One of the degrees of freedom is k minus one. The
other degree of freedom is n minus k. So this is equal to 3 − 1, in our case 2,
and this is equal to n minus k; n is 15, and 15 − 3 is 12. Right? And you will be
looking at a table, and this is what I have to show you. Maybe I'll write here.
Wow, this is good. So guys, tell me the degrees of freedom. This one is between
the groups, so you will come here, and for the other one you will come here, and
then you will draw a line and see where they meet. Where are you able to circle
it, guys? Here: 3.89. This is the F table for these degrees of freedom at the
significance level alpha 0.05. So now you tell me: your F stat is what? Less than
F critical. What do you do in this case? Accept. Nothing changes: accept the null
hypothesis. And what was the null hypothesis? That your means are not
significantly different. That means the means are more or less close. And let's
see: one of the means is 9.6, one of the means is 8.8, and one is 9.8. Okay. So
they are not significantly different.
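The table lookup can also be done in code. The sketch below assumes SciPy is installed and uses its F distribution (`scipy.stats.f`) to get the critical value and a p-value; the helper name `anova_decision` is hypothetical and is not something the instructor uses. Statistically, "accept the null" here means "fail to reject the null".

```python
from scipy import stats

def anova_decision(f_stat, k, n_total, alpha=0.05):
    """Compare an F statistic against the critical value for a one-way ANOVA."""
    dfn = k - 1        # degrees of freedom between groups
    dfd = n_total - k  # degrees of freedom within groups
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)  # upper-tail critical value
    p_value = stats.f.sf(f_stat, dfn, dfd)     # P(F >= f_stat) under the null
    return f_crit, p_value, bool(f_stat > f_crit)

f_crit, p_value, reject = anova_decision(1.556, k=3, n_total=15)
print(round(f_crit, 2))  # 3.89
print(reject)            # False
```

The critical value depends only on alpha and the two degrees of freedom, so it stays the same when the data change but the design (3 groups, 15 observations) does not.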
You might be wondering, is there a way to show the other case? What I'll do now,
guys, is just copy-paste the whole thing, and that's the beauty of Excel: you can
replicate it in a nanosecond. You just have to update it. Say Akash is very hard
working, as you know. I'm just kidding. I'll just put in some weird numbers, like
he's working 18 hours out of 24. So definitely I'm in line to beat Narendra Modi,
our PM. Can you see, guys, my F statistic is 21 something. Now tell me again: what
is the value of F critical? Critical will not change now, because it depends on
alpha and the degrees of freedom. So what is the value of F critical? What was
that? 3.89. What is the F stat? What do you see? You see that the F stat is more
than F critical. So what do you do? Reject. That means you say that the
difference in the sample means is significant. Are you feeling it, guys?
So this is what I have made you learn today, and now I can see that you
understand; give me a go if you are with me. When I gave you those mathematical
sigmas and all, I know a lot of people are from a non-mathematical background.
For me it's like the back of my hand, I can do it like this, but for you it can be
tricky. That's why I strategically put you on hold: come to this table, and then
you suddenly understand.
Can you see the question, guys? A pharmaceutical company conducts an experiment
to test the effect of a new cholesterol medication. Very much required; it's a
real pharma question. The company selects 15 subjects randomly from a larger
population. Each subject is randomly assigned to one of three treatment groups.
Within each treatment group, subjects receive a different dose of the new
medication: 0 mg per day, 50 mg per day, and 100 mg per day. So what they're
telling you, my dear, is this: first group 0 mg, group two 50 mg, group three
100 mg. After 30 days, doctors measure the cholesterol level of each subject, and
the results for the 15 subjects appear in this table. These are cholesterol
values. Okay. In conducting this experiment, the experimenter had two research
questions. First, does dosage level have a significant effect on cholesterol?
It's a very obvious question, guys, right? Second, how strong is the effect of
dosage level on cholesterol? So it is about significance and the strength of the
effect. Right? 210, 240, what is that, 270, 270, 300.
Right? So now, obviously the color formatting has gone, but there are three
groups, and here you go. You clearly see that the average is what? 258. I'm just
making sure that n is 15, k is 3, and the overall average looks correct. So F is
coming to 4.16. Okay, so we have F equal to 4.16, and we know F critical; we have
already calculated that: 3.89 at 0.05. So we know the maths. What is it? The F
statistic comes out to be 4.16. F critical is 3.8, no, 3.89. So definitely the F
statistic is more than F critical, so it is going to be: reject the null
hypothesis.
right definitely it looks meaningful that
that right the different doses will have
right the different doses will have different impact on the cholesterol
different impact on the cholesterol level right can you see that so they
level right can you see that so they differ see different mg will impact on
differ see different mg will impact on the cholesterol level differently and
the cholesterol level differently and look at the sample average it looks that
look at the sample average it looks that 210 is little little better. So 100 mg
210 is little little better. So 100 mg medicine is making your cholesterol
medicine is making your cholesterol level little lower guys isn't it? We
level little lower guys isn't it? We have a view right 260 and 210 definitely
have a view right 260 and 210 definitely there is quite a difference you cannot
there is quite a difference you cannot say these see you can say more or less
say these see you can say more or less these are similar range but it does
these are similar range but it does impact so if I have to put a visual so I
impact so if I have to put a visual so I can say that hello you have zero mg you
can say that hello you have zero mg you have what do you mean by zero mg 50 mg
have what do you mean by zero mg 50 mg 100 mg so you have cholesterol level 258
100 mg so you have cholesterol level 258 you come Here you have a cholesterol
you come Here you have a cholesterol level 238
level 238 246 and then you come here you see that
246 and then you come here you see that your cholesterol level has come down to
your cholesterol level has come down to 210 and this is like guys not from zero
210 and this is like guys not from zero this is like let's say 150 range to this
this is like let's say 150 range to this one can you see that you have a downward
one can you see that you have a downward trend coming down
trend coming down not like drastically but it does come
not like drastically but it does come down
down why I have done this so that you can
why I have done this so that you can feel this coming
feel this coming slightly but it is coming up.
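The same ANOVA steps can be run directly on raw observations instead of on precomputed means and variances. The sketch below is one possible implementation in plain Python; the function name `one_way_anova` is hypothetical, and the three small groups are made-up toy values chosen so the arithmetic is easy to follow. They are not the cholesterol readings from the table.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Return (SSB, SSE, F) for a list of groups of raw observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between groups: n_j * (group mean - overall mean)^2
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: (n_j - 1) * sample variance of each group
    sse = sum((len(g) - 1) * variance(g) for g in groups)

    msb = ssb / (k - 1)
    mse = sse / (n_total - k)
    return ssb, sse, msb / mse

# Hypothetical toy data, for illustration only.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ssb, sse, f_stat = one_way_anova(toy_groups)
print(round(ssb, 2), round(sse, 2), round(f_stat, 2))  # 26.0 6 13.0
```

To use it on the experiment, each inner list would hold the five cholesterol values of one dose group.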
Okay. So guys, we are going to import certain libraries: import numpy as np,
import pandas as pd. These are the standard ones. Import os, the operating system
module, for the working directory. Import matplotlib.pyplot as plt, import
seaborn as sns. Right. Cool. This should work. Yeah, these are the libraries,
guys. Please type them in so that you have them ready. Now we will start with the
first test, which is our Z test. Okay. So from scipy import stats. SciPy is the
scientific Python library. I'm asking you to call it, I mean import it, not
install it.
So guys, let's start with the one-sample Z test. Okay. If I can give you a point,
you know what it does: it checks whether the sample comes from a known
population. Remember, the population mean and sigma should be known. Right? So I
need to find certain data to do this. Just wondering which one to take. Okay. So
do you recall what we do the one-sample Z test for? Just to give you a flashback:
let us say that you have sample data and you want to test, I'm writing in plain
English, if the sample mean is significantly different from the population mean.
You don't need to type; I type fast because I do it all the time. That's the
reason for doing it, right guys?
Come on. So what I'm going to do is just create sample data, just type in a list.
So sample: I can give the numbers 3, 4, 5, 6, 4, 6, 4, 6, 5, 4, 6, 3, 5. Okay. And
let's say I assume a population mean and standard deviation; follow me. So I say
pop_mean, the population mean, is 4.5, and pop_sd, the population standard
deviation, is 1.2. Okay, these are two variables I have created, guys. Now I'm
going to perform the one-sample Z test. For that I'll write z_statistic and
p_value, so there are two variables I'm creating, guys, and then I'm going to use
my function stats.ztest. The first argument is my sample, then I need to give the
value. It says it has no attribute called ztest. Really? Import scipy.stats as
stats; maybe this is the one we should use. So guys, this is also in SciPy only.
SciPy is a fairly big one, right? scipy.stats as stats.
Okay, let's run it now. I think you should get, interesting. Hold on. stats dot
z... Okay, it has completed it to zscore. But this is not the Z statistic; that
is my variable name. This is zscore. Hold on guys, let me check, I'm missing the
function. Is it zscore or not? Okay. So the Z test I was trying to do, guys, it
seems that it's not working out for us. Okay. So guys, the reason is that I was
maybe not importing the right library. We have to import from statsmodels; that's
why it's missing. Let's see: from statsmodels.stats.weightstats import ztest.
Wow, now it will work, guys. So we have to call it. I'm taking two values, to be
honest, z_score and p_value, and I'll simply call ztest. Right, it picked up
sample, and value will be equal to the population mean, pop_mean, and then the
population standard deviation. Okay, and let's see, did it do anything now? Oh, I
missed one thing, my bad: sigma is equal to... Now see, sigma is not there. There
is value, alternative, usevar pooled, but how can we give sigma? Value is zero,
guys. Let me do one thing: I'll move away from this, because this library is not
helping me here.
what I'm going to do I'm going to install another library it's not a very
install another library it's not a very good one which I was trying to do so it
good one which I was trying to do so it is known as bio info kit
is known as bio info kit okay Okay, perfect.
okay Okay, perfect. Now what you have to do? You have to
Now what you have to do? You have to call this bio info kit
call this bio info kit dot analyst import.
dot analyst import. This is to get data. So you can also get
This is to get data. So you can also get some data which is inbuilt data for
some data which is inbuilt data for doing Z sample one sample Z test. So to
doing Z sample one sample Z test. So to give you some background about this
give you some background about this data, forget about the data I was
data, forget about the data I was looking at because that's anyway my some
looking at because that's anyway my some mocked up data. Now let's come to this
mocked up data. Now let's come to this data. So guys, this is to get sample
data. So guys, this is to get sample data for
data for understanding
understanding the statistical test.
the statistical test. Okay.
Okay. Now let's say I have a data which I get
Now let's say I have a data which I get it from this function get data. So this
it from this function get data. So this is the function you're calling get data
is the function you're calling get data and there you will call Z1 sample. You
and there you will call Z1 sample. You have to write Jed one sample and you
have to write Jed one sample and you have to say data. This is the way you
have to say data. This is the way you get data. Okay. Wow. Perfect. This
get data. Okay. Wow. Perfect. This works, right? So it's a data frame,
works, right? So it's a data frame, right? And if you see it is a series,
right? And if you see it is a series, right? It is only one column. It's a
right? It is only one column. It's a series. Now this data is guys, it's a
series. Now this data is guys, it's a factory data. Let me talk about a little
factory data. Let me talk about a little bit more about the data. So I can give a
bit more about the data. So I can give a factory produces
factory produces a factory produces you know you know the
a factory produces you know you know the the tennis balls of diameter of 5 cm but
the tennis balls of diameter of 5 cm but due to manufacturing condition every
due to manufacturing condition every ball may not have the exactly the same
ball may not have the exactly the same diameter. Then the standard deviation
diameter. Then the standard deviation they are telling of the ball is balls is
they are telling of the ball is balls is 0.4. Now the quality officer would like
0.4. Now the quality officer would like to test whether the ball diameter is
to test whether the ball diameter is sign significantly different from 5 cm
sign significantly different from 5 cm in a sample of 50 balls randomly taken
in a sample of 50 balls randomly taken from the manufacturing line. So did you
from the manufacturing line. So did you understand the problem guys? There is a
understand the problem guys? There is a factory which manufactured the tennis
factory which manufactured the tennis balls right uh the ideal size is 5 m. So
balls right uh the ideal size is 5 m. So they have taken a sample from the
they have taken a sample from the population right? Let's say you have a
population right? Let's say you have a lot of tennis, you have a lot of tennis
lot of tennis, you have a lot of tennis balls. You have taken a taken a sample
balls. You have taken a taken a sample which is of size 50, right? And these
which is of size 50, right? And these samples you are going to use for the one
samples you are going to use for the one sample t test one sample z test because
sample t test one sample z test because your sample size is greater than greater
your sample size is greater than greater than 30 and you are going to say that
than 30 and you are going to say that okay whether your sample size uh mean is
okay whether your sample size uh mean is equivalent to equal to five or not
equivalent to equal to five or not that's a sample jet test you will do.
that's a sample jet test you will do. Okay. So now guys once you have the data
Okay. So now guys once you have the data what you can do you're going to use the
what you can do you're going to use the Z test function available in the B info
Z test function available in the B info kit library. So to do that to do that
kit library. So to do that to do that you have to import fromkit
analysts import stat. So first you imported the
import stat. So first you imported the data. Now you're importing the function
data. Now you're importing the function and then you will call the function
and then you will call the function using right rest uh you can put some you
using right rest uh you can put some you know results
know results result dot z test. Now z test needs you
result dot z test. Now z test needs you know a data frame number one. Suppose
know a data frame number one. Suppose you have your own data you can also run
you have your own data you can also run it right. So you have to give a data
it right. So you have to give a data frame x equal to your column name. What
frame x equal to your column name. What is the name of the column? Just go on
is the name of the column? Just go on the top and read sizes. So you'll put
the top and read sizes. So you'll put the sizes
the sizes it's there and then you have a mu. So
it's there and then you have a mu. So what is the population mean? You want to
what is the population mean? You want to compare against five and what is your x
compare against five and what is your x standard deviation that is 04 and test
standard deviation that is 04 and test type is equal to one. One sample is test
type is equal to one. One sample is test type one. So see pandas data frame
type one. So see pandas data frame containing x group x is the column name
containing x group x is the column name for the x group. Mu is your you know
for the x group. Mu is your you know hypothes hypothesized or known
hypothes hypothesized or known population mean standard deviation for
population mean standard deviation for the group alpha you can also set and
the group alpha you can also set and type of jet test use one for one sample
type of jet test use one for one sample jet test. The moment you run this now
jet test. The moment you run this now you are done and then you can print the
you are done and then you can print the result summary by using the function
result summary by using the function result summary and you will be happy to
result summary and you will be happy to see the output and if you look at the
see the output and if you look at the output given by uh sci library right
output given by uh sci library right let's say for example I show you uh one
let's say for example I show you uh one sample z test sci stats output let's see
sample z test sci stats output let's see right so they calculate the z score
right so they calculate the z score I don't know if they've given some
I don't know if they've given some example but look at that. No, this is
example but look at that. No, this is not the jet test
not the jet test one sample. Yeah, it seems to be this
one sample. Yeah, it seems to be this one a Z test. It's a t test. First of
one a Z test. It's a t test. First of all, finding the jet test has become
all, finding the jet test has become like something I don't know why they are
like something I don't know why they are not able to. But anyway, but if you say
not able to. But anyway, but if you say this is the t test they're doing and if
this is the t test they're doing and if you look at the output, can you see the
you look at the output, can you see the result? Look at that. This result is
result? Look at that. This result is like very lame. See jet test t test
like very lame. See jet test t test same. Yeah. So t test result look at
same. Yeah. So t test result look at that. Then the result we get in our
that. Then the result we get in our function. Look at this. So structured
function. Look at this. So structured you can take a screenshot and you go. So
you can take a screenshot and you go. So first thing first it shows sample size
first thing first it shows sample size 50 mean 5.0176
50 mean 5.0176 zed value is 317. And now they're giving
zed value is 317. And now they're giving p value for one tail two tail lower
p value for one tail two tail lower upper everything. Is the p value
upper everything. Is the p value significant? What is the concept?
The concept says: if your p-value is less
than alpha, you reject. Less than or equal to
alpha, you reject what? The null. If your
p-value, don't forget the basics, is greater
than alpha, you accept it, or more precisely,
you fail to reject it. Remember that. Now tell
me, for alpha 0.05, which is our
standard, what is the p-value you are
getting? 0.75, and even the one-tailed is 0.37. So p is
greater than alpha. So are you going to
accept the null hypothesis or reject it?
Accept, right? So what is the
interpretation, my dear? The interpretation
is, I'll write: the p-value
obtained from the one-sample z-test
is not significant.
That is what you can write.
I hope you understand.
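To make the steps above concrete, here is a minimal sketch of a one-sample z-test in plain Python. It is not the course's own function: the names one_sample_z_test and p_values_from_z, the dictionary keys, and the small sample of measurements are assumptions made up for illustration. It only uses the standard library, and it reports the same kind of items the summary shows: sample size, mean, z value, the lower, upper and two-tailed p-values, and the decision against alpha.

```python
import math
from statistics import NormalDist, mean


def p_values_from_z(z):
    """Return lower-tail, upper-tail and two-tailed p-values for a z value."""
    lower = NormalDist().cdf(z)        # P(Z <= z) under the standard normal
    upper = 1.0 - lower                # P(Z >= z)
    two_tail = 2.0 * min(lower, upper)
    return {"lower": lower, "upper": upper, "two_tail": two_tail}


def one_sample_z_test(sample, mu, sigma, alpha=0.05):
    """One-sample z-test of H0: population mean == mu, with known sigma.

    Hypothetical helper for illustration, not the function from the video.
    """
    n = len(sample)
    x_bar = mean(sample)
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    p = p_values_from_z(z)
    return {
        "n": n,
        "mean": x_bar,
        "z": z,
        "p_lower": p["lower"],
        "p_upper": p["upper"],
        "p_two_tail": p["two_tail"],
        # Decision rule from the lesson: reject the null when p <= alpha.
        "reject_null": p["two_tail"] <= alpha,
    }


# The z value read out in the lesson, plugged into the p-value helper.
# The two-tailed and upper-tail values come out near 0.75 and 0.37.
lesson_p = p_values_from_z(0.317)
print(lesson_p)

# Made-up measurements, only to show the call and the summary it returns.
demo_sample = [5.1, 4.9, 5.0, 5.2, 4.8]
demo_result = one_sample_z_test(demo_sample, mu=5.0, sigma=0.2, alpha=0.05)
print(demo_result)
```

With a z value of 0.317, both p-values are well above alpha 0.05, so the null hypothesis is not rejected, which matches the interpretation written down in the lesson.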
Okay. But
do you think it's an important
tool for quality managers? Think
about it. Is it practical for any
quality manager to open every
box of tennis balls and check each ball? Right,
it's impossible. So what do they do? They
take the batches of production, they
take a sample, and they run a
hypothesis test, a z-test, right? And if,
statistically, they get a result like
the one we got, they'll say, okay,
pass. They are not going to check each
and every ball, right? They will,
obviously, check each and every
ball in the sample, not in the population,
and then quickly
compare against the population value and say,
okay, it is not significantly
different from the ball size which I
need to produce, so I'm more than happy
to pass your batch. So that's
how the whole cycle goes.
Okay.
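The batch check described above can be sketched as a small simulation. Everything specific here is an assumption for illustration: the target size of 5.0, the known standard deviation of 0.2, the batch size of 10,000, the sample size of 50, and the function name batch_passes. The idea is only to show that a sample, not the whole batch, is measured and then tested against the target with a two-tailed z-test.

```python
import math
import random
from statistics import NormalDist, mean


def batch_passes(sample, target, sigma, alpha=0.05):
    """Pass the batch when the sample mean is not significantly
    different from the target (two-tailed one-sample z-test)."""
    n = len(sample)
    z = (mean(sample) - target) / (sigma / math.sqrt(n))
    p_two_tail = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_two_tail > alpha


rng = random.Random(42)        # fixed seed so the run is repeatable
TARGET, SIGMA = 5.0, 0.2       # hypothetical target size and known sigma

# Two simulated production batches: one on target, one clearly off target.
on_target_batch = [rng.gauss(TARGET, SIGMA) for _ in range(10_000)]
off_target_batch = [rng.gauss(TARGET + 0.5, SIGMA) for _ in range(10_000)]

# The quality manager measures only a sample of 50 balls from each batch.
on_target_sample = rng.sample(on_target_batch, 50)
off_target_sample = rng.sample(off_target_batch, 50)

print("on-target batch passes: ", batch_passes(on_target_sample, TARGET, SIGMA))
print("off-target batch passes:", batch_passes(off_target_sample, TARGET, SIGMA))
```

An on-target batch will still be flagged about 5 percent of the time at alpha 0.05, which is the price of testing a sample instead of opening every box.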
Just a quick info, guys. Intellipaat
offers a data science course in
collaboration with iHub, IIT Roorkee, which
will help you master concepts like
Python, SQL, machine learning, AI,
Power BI and more. With this course, we
have already helped thousands of
professionals make a successful career
transition. You can check out their
testimonials on our achievers channel,
whose link is given in the description
below. Without a doubt, this course can
take your career to new heights. So,
visit the course page link given below
in the description and take the first
step towards career growth with the data
science course.