YouTube Transcript:
A8. Boxplot
Video Transcript
hi we are now on the second graphical
analysis tool
and this time we will have box plot
box plot is also known as a box and whisker plot
box plot is defined as a graphical
method of displaying variation
in a set of data in most cases
a histogram analysis provides a
sufficient display
but a box and whisker plot can provide
additional detail
while allowing multiple sets of data to
be displayed in the same graph
like histogram this falls under the data
distribution tools
but unlike histogram box plot can give
us an idea of central tendency
and dispersion better if you want to compare
data sets and their measures of central tendency
and dispersion at the same time you
can actually use
box plot this is my personal favorite
because i have done so many things using
box plot and i have established a system
using a box plot that i can share with
you on the next section
now let's study the anatomy of a box
plot a box plot
technically divides the data set into four
partitions each partition is called a quartile
just like how a year is divided
into four parts
which we call quarters now for every quartile
there is 25 percent of the data set within it
now if we divide the data into four
we will have four
quartiles meaning four partitions of
25 percent each the first cut point is called
the first quartile at or below which 25 percent
of the data is located the second is the
second quartile
for the second quartile it would be 25 percent
and another 25 percent which is 50 percent of the data
at or below it it is also known as the median
one of the central tendency measures
next is the third quartile
so we have to add another 25
percent to our second quartile so this has
75 percent of the data at or below it
and the last will be the fourth quartile
which is the 100 percent
point which means 100 percent of the data is
at or below this number
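
The quartile description above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure shown in the video: the data values are made up, and the choice of `method="exclusive"` (interpolating at position p*(n+1) in the sorted data) is an assumption about which quartile convention to use, since statistical packages differ on this.

```python
import statistics

# Illustrative values only (NOT the dataset used in the video).
data = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35, 40]

# Three cut points split the sorted data into four parts of roughly 25% each.
# Assumption: the "exclusive" convention; other tools may interpolate differently.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

# The second quartile is the median by definition.
median = statistics.median(data)

def share_at_or_below(values, cut):
    """Fraction of values that are less than or equal to the cut point."""
    return sum(1 for v in values if v <= cut) / len(values)

for name, cut in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("max", max(data))]:
    print(f"{name} = {cut}: {share_at_or_below(data, cut):.0%} of data at or below")
```

With a small sample the shares land near, not exactly on, 25, 50 and 75 percent, which is why the quartiles are best read as approximate cut points.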
now we can actually detect outliers
denoted by an asterisk symbol whenever
you are using
the box plot if the tails of what we
call the whiskers
this one and this one cannot contain
the data value the very high or very low
data value
then boxplot will tag it as an outlier
or an unusual observation if you can
still remember our study about central
tendency measures
the mean is susceptible to outliers that is why
we are using median when we are using
box plot as our graphical analysis tool
in box plot central tendency is based on
the median
and dispersion is based on what we call
the inter-quartile range or
iqr iqr is the difference between the third quartile
and the first quartile practically speaking
if we have a greater iqr
we have a taller box
and if we have a taller
box or a greater span of
the box
there is a higher amount of variation present
now if we want to check on the smaller
amount of variation
we're looking for smaller boxes
the smaller the size of the box the smaller
the amount of variation
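
To make the link between box size and variation concrete, here is a small sketch that computes the IQR for two made-up samples with the same median. The sample values and the helper name `iqr` are hypothetical, and the quartile convention (`method="exclusive"`) is an assumption rather than something stated in the video.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

# Two hypothetical samples centered on the same median (50).
tight = [48, 49, 50, 50, 51, 52, 53]   # values cluster near the center
wide = [20, 35, 45, 50, 55, 65, 80]    # values are spread far apart

print("tight: median", statistics.median(tight), "IQR", iqr(tight))
print("wide:  median", statistics.median(wide), "IQR", iqr(wide))
```

Both samples would draw their median line at the same height, but the second one would have a box ten times as tall, which is the visual cue for more variation.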
so let's put it into practice we have to
go to minitab again
but before that let's go to our
worksheet and
copy energy okay so from worksheet we
have to copy here
the same dataset that we used in our
histogram case study
so this is again energy cost now we want
to check the distribution of the energy cost
using box plot
we have to click graph and then
find box plot here it is click box plot
now we have again simple y here one
column with one data so we will be using
this one
upper left we have to double click
and what is the variable data
energy cost double left click after that
we have to click ok
drag your worksheet down a little and
then adjust so you can see
now we have a box plot of distribution
of the energy cost
you can see here on the left side or the y-axis
it's the data values of your
energy cost
so you don't have anything on your
x-axis because normally it would be energy cost
as a function of a particular x variable
but for now
it's blank now how to read this you can
actually get the values of the quartiles
and everything that you want to know
about the box plot by putting your
mouse cursor on top of your box plot
so it reveals that the first quartile is
197.5
the second quartile or median is 320 the third
quartile is 447.5
the iqr is 250 the whiskers
span from 7 to 676
and the number of data values that we have is
25 data points now how to
interpret it it only says that 25 percent
of the data is equal to or less than 197.5
and the remaining 75 percent
is more than that value for the median
it only says that there is 50 percent
of the data above
and 50 percent below that number because it's the midpoint
now talking about the third quartile
447.5
there is 75 percent of the data below that number
and the remaining 25 percent above that number
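
The values read from the chart can be cross-checked with a little arithmetic. The sketch below recomputes the IQR from the reported quartiles and then derives the limits beyond which a point would be flagged as an outlier. The 1.5 x IQR rule used here is an assumption: it is a common box plot convention, but the video does not say which rule the software applies.

```python
# Values reported in the video for the energy cost box plot.
q1, median, q3 = 197.5, 320.0, 447.5
whisker_low, whisker_high = 7.0, 676.0

# IQR is the third quartile minus the first quartile.
iqr = q3 - q1                      # 250.0, matching the value on the chart

# Assumption: outliers are points more than 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr       # -177.5
upper_fence = q3 + 1.5 * iqr       # 822.5

print("IQR:", iqr)
print("points below", lower_fence, "or above", upper_fence, "would be flagged")
print("whisker ends inside the fences:",
      lower_fence <= whisker_low and whisker_high <= upper_fence)
```

Under that assumed rule, an energy cost would have to exceed 822.5 to be drawn as an asterisk, and the reported whisker ends of 7 and 676 sit well inside those limits.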
that is how we use the data and the
quartile values
as we use box plot in order for us to analyze
our data set and our data set has no
missing values and it has no outliers
because we don't see any asterisk on the
chart that has been
created here say we have
a target value let's say our target value
is 300 now
let's put a reference line so
we have to right click here and then
click edit graph okay so this
graph will appear then you have to
right click and then add
now we will be adding a reference line
why because we want to check
where we are against our target so a
reference line could help us
visualize that so we want to
put a reference line at a y
value so
let's enter 300 as mentioned
and then click ok and there will
appear a 300 here now click ok
for you to apply that now if we have a
target of 300 how can we answer the
question of how many of the data points
are already meeting
the 300 target so because this is cost
we want
lower the better again we have to
put our cursor on the box plot so we can
see the value of the median
so the closest value is the median okay
so you will be using the closest
value of the quartile in interpreting
for this case closer to 300 is
the median which is 320 using an
estimation based on the visual
output or the graphical output of our
box plot
we can say that almost 50 percent of the data
is actually meeting the target of 300
based on the median
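
The visual estimate above can also be checked by counting directly when the raw values are available. The following is a minimal sketch: the function name `share_meeting_target` and the cost values are hypothetical (the video's actual 25 data points are not shown in the transcript), chosen only so that the median comes out at 320.

```python
import statistics

def share_meeting_target(values, target, lower_is_better=True):
    """Fraction of values that meet the target in the desired direction."""
    if lower_is_better:
        hits = sum(1 for v in values if v <= target)
    else:
        hits = sum(1 for v in values if v >= target)
    return hits / len(values)

# Hypothetical energy costs (NOT the dataset from the video).
costs = [120, 180, 250, 290, 310, 330, 400, 450, 520, 610]

median = statistics.median(costs)
share = share_meeting_target(costs, target=300)

print("median:", median)
print(f"share at or below the 300 target: {share:.0%}")
```

Because the target of 300 sits slightly below the median, the exact share comes out a little under 50 percent, which is consistent with reading "almost 50 percent" off the box plot.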
so that is how we can use box plot to
interpret the results
of our data as we visualize them moving
forward you can use box plot
to check the distribution of your data
set as against your target
to check your process capability on how capable
you are of meeting the target so you
will have an idea or understanding
of how big the problem is that you are facing
based on historical data now let's take
another example
this time we will use the same data set
from histogram
the fertilizer problem but now using
box plot so now let's go to our
i have to create a new worksheet i have
to close this one
and then paste it
now we will have to create a box plot
for this case study
we have to go to graph and then
box plot earlier we used
simple y because we have one column of
data but
we now have three columns of data so we
have to choose
the lower left which corresponds to
multiple y
then i have to click okay now i have to
repeat the same process
highlight c1 drag down and then select
make sure that all of the data variables
are here
if it's already there you have to click ok
for you to generate the chart and then
you have your chart already
now let's move this and interpret
so as you can see this is now a
representation of
the previous data set that we have
regarding fertilizer
you can see here on the y-axis we have
the data
which is the plant height in centimeters
for the three conditions that we have
we have none grow fast and super plant
now as mentioned using a box plot
central tendency is given using the
median so remember the median is the line
inside the box so these are the medians
okay for dispersion it is
the height of the box so this
height this height of the boxes
now if you are asked to check which has the
highest central tendency
and to put this in context this is
plant height so we want higher the
better so zero
to higher value so we're asking for
a central tendency measure or median
which is closer to
the upper part of this chart which is 40
on this particular axis
now which has the highest
median among the three
okay so we have grow fast so we can check
for none the median is 18 for grow fast the
median is 25.5
and for super plant the median is 21
therefore the highest median
is yes grow fast
okay so that's for the measure of central tendency
how about for measures of dispersion
we have to check on which has the smallest
box so which has the smallest box
using visual judgment we have
grow fast again we can check using what
yes iqr so iqr
for none it's 8 for grow fast
it's 6.25 and for
super plant it's 8 therefore the smallest
amount of variation can be found in
grow fast because it has the smallest iqr
and visually speaking it has the smallest
size of the box okay so therefore
you can use box plot if you want to
compare categories of data
it should be continuous data your y is
continuous data and these categories are your x's
plant height in terms of the
condition of whether there is fertilizer
or no fertilizer so again y
is continuous and x is categorical
so if you have that kind of data set
therefore you can use this
for your data analysis or your root
cause analysis
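
The comparison of a continuous y across categories of x can be sketched as follows: compute the median and IQR for each category, then pick the group with the highest median and the one with the smallest IQR. The plant heights below are made up for illustration (the video's raw fertilizer data is not in the transcript), the helper name `summarize` is hypothetical, and the quartile convention is an assumption.

```python
import statistics

def summarize(values):
    """Median (center) and IQR (spread) for one category."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {"median": med, "iqr": q3 - q1}

# Hypothetical plant heights in cm per fertilizer condition
# (illustrative only, NOT the dataset from the video).
heights = {
    "none": [10, 14, 16, 18, 20, 22, 25],
    "grow fast": [20, 23, 24, 26, 27, 28, 31],
    "super plant": [14, 17, 19, 21, 23, 25, 29],
}

summary = {name: summarize(values) for name, values in heights.items()}

# Higher is better for plant height, so look for the largest median.
best_center = max(summary, key=lambda name: summary[name]["median"])
# Less variation means a shorter box, so look for the smallest IQR.
least_spread = min(summary, key=lambda name: summary[name]["iqr"])

for name, stats in summary.items():
    print(f"{name}: median {stats['median']}, IQR {stats['iqr']}")
print("highest median:", best_center)
print("smallest IQR:", least_spread)
```

The same two questions answered by eye on the chart (which median line is highest, which box is shortest) are answered here by the `max` and `min` lookups.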
moving forward you will be using box
plot as you prove
your root cause analysis in your case study