YouTube Transcript:
Measure 19 Histogram


Video Transcript

It's just that, as you can see here, this is your box plot and this is your histogram. A histogram is just another view of what we can see in a box plot. The reason I prefer the box plot is that I can see a lot of insights in it, but the histogram also has strong points that the box plot doesn't have.

Say, for example, you want to see how this data performed against certain requirements. If you're familiar with this view, let's say you have your lower spec limit and your upper spec limit here. What a histogram does is chart the data using bars: the bars show the frequency distribution, and on top of them we put the curve line. Just like the box plot, this represents the central tendency and the dispersion.
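To make "how the data performed against requirements" concrete, here is a minimal Python sketch. It assumes a made-up data set and made-up spec limits (LSL = 5.5, USL = 7.5); it counts how many values fall in each bar of the histogram and what share of the data lies outside the spec limits. The function names are illustrative helpers, not Minitab or library functions.

```python
import math
from collections import Counter

def histogram_counts(data, bin_width, start):
    """Return (lower bin edge, count) for every bar from the first to the last occupied bin."""
    counts = Counter(math.floor((x - start) / bin_width) for x in data)
    # Include empty bins in between so gaps in the histogram stay visible
    return [(start + k * bin_width, counts[k])
            for k in range(min(counts), max(counts) + 1)]

def share_outside_spec(data, lsl, usl):
    """Fraction of values below the lower spec limit or above the upper spec limit."""
    return sum(1 for x in data if x < lsl or x > usl) / len(data)

# Hypothetical measurements and spec limits, for illustration only
data = [5.0, 6.1, 6.3, 6.4, 6.4, 6.5, 6.6, 6.7, 6.9]
LSL, USL = 5.5, 7.5

for edge, count in histogram_counts(data, bin_width=0.5, start=5.0):
    print(f"{edge:.1f}-{edge + 0.5:.1f} | {'#' * count}")
print(f"outside spec: {share_outside_spec(data, LSL, USL):.1%}")
```

The text bars stand in for the chart; in practice the spec limits would come from the customer or engineering requirement.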

Just to refresh: when you're using a histogram, the width of the distribution (let's just focus on the curve for simplicity) represents the dispersion. The wider the base of your curve, the wider the distribution. That width at the base of your histogram pretty much corresponds to the entire span of your box plot, as you can see in this view, but the concentration of the data is here, in the area of the box, which is the interquartile range. So the wider the box, the wider the base of the histogram, and the wider the dispersion of the data.

Now, the central tendency: in most cases, it's where the tallest bar of the histogram is located. For this one it's probably here, for this one probably here, and for this one probably here. That's how it works.
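The correspondence described above (histogram base roughly matching the box plot's full span, the box marking where the data concentrate, the tallest bar marking the central tendency) can be checked numerically. This is a minimal sketch assuming hypothetical data and a bin width of 0.5. Quartile conventions differ slightly between tools; Python's `statistics.quantiles` is used here with its default "exclusive" method.

```python
import math
import statistics
from collections import Counter

def shape_summary(data, bin_width):
    """Range (histogram base), IQR (box plot box) and center of the tallest histogram bar."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    data_range = max(data) - min(data)
    iqr = q3 - q1
    bins = Counter(math.floor(x / bin_width) for x in data)
    # Tallest bar; on a tie, max() keeps the first (lowest) bin in sorted order
    peak = max(sorted(bins), key=lambda k: bins[k])
    peak_center = (peak + 0.5) * bin_width
    return data_range, iqr, peak_center

# Hypothetical data: most values between 6.1 and 6.8, one low value at 5.0
data = [5.0, 6.1, 6.2, 6.3, 6.4, 6.4, 6.5, 6.6, 6.8]
rng, iqr, peak = shape_summary(data, bin_width=0.5)
print(f"range={rng:.2f}  IQR={iqr:.2f}  tallest bar centered at {peak:.2f}")
```

The peak location depends on the bin width and starting edge, which is one reason reading the central tendency off the tallest bar is only an approximation.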

Can a histogram still detect an outlier with this view? What do you think? Yes, it can. An extremely low value, for example, would show up as an outlier here. So it really depends on how you want your data to be presented. Some would say, "I prefer the histogram, I'm more comfortable with it," and some would say, "I'd rather use the box plot." But when we talk about process capability measures, it's the histogram that is used, not the box plot. We'll talk more about that, if not today, maybe tomorrow.

So that's the histogram. It basically serves the same function, checking the distribution, but the histogram and the box plot present it in two different forms.
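On a histogram, an outlier shows up as an isolated bar far from the rest. A numeric cross-check is the 1.5 × IQR fence rule that box plots commonly use to flag outliers. The sketch below assumes that rule (k = 1.5) and hypothetical data with one extremely low value; `iqr_outliers` is an illustrative helper.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with one extremely low value
data = [6.1, 6.4, 6.2, 5.0, 6.6, 6.4, 6.3, 6.8, 6.5]
print(iqr_outliers(data))
```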

Here's an example. I'm going to jump to Minitab again. Let's go back to the basic example from earlier and create a histogram: go to Graph > Histogram, choose With Fit, and click OK, and you'll have this view. That's one basic flow we can take.

Without looking at the statistics, I'd estimate the average at about 6.5, so pretty much here, basically where the middle of the curve lies. If you look at the actual average here, it's 6.45.
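Outside Minitab, the same "histogram with fit" idea can be sketched in plain Python: count the values in each bar and compare each count with what a normal curve, fitted with the sample mean and sample standard deviation, would predict for that bar. This assumes the fitted curve is a normal distribution (to our understanding, Minitab's With Fit overlays a normal curve by default) and uses hypothetical data; it prints a text chart rather than drawing a graph.

```python
import math
import statistics
from collections import Counter

def histogram_with_fit(data, bin_width, start):
    """Observed count per bar plus the count expected under a fitted normal curve."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)                  # sample standard deviation (n - 1)
    fit = statistics.NormalDist(mu, sigma)
    counts = Counter(math.floor((x - start) / bin_width) for x in data)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        lo = start + k * bin_width
        # Expected count = n * probability mass of the fitted curve inside this bar
        expected = len(data) * (fit.cdf(lo + bin_width) - fit.cdf(lo))
        rows.append((lo, counts[k], expected))
    return mu, sigma, rows

# Hypothetical data, for illustration only
data = [5.0, 6.1, 6.2, 6.3, 6.4, 6.4, 6.5, 6.6, 6.8]
mu, sigma, rows = histogram_with_fit(data, bin_width=0.5, start=5.0)
print(f"mean={mu:.2f}  stdev={sigma:.2f}")
for lo, observed, expected in rows:
    print(f"{lo:.1f}-{lo + 0.5:.1f} | {'#' * observed:<6} fit~{expected:.1f}")
```

The center of the fitted curve is exactly the sample mean, which is why "drawing a line through the middle of the curve" gives an eyeball estimate of the average.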

So basically, if you draw a line through the middle of this curve and check where it lands on the x-axis, it gives you the average, give or take a small difference, of course, since we're using the eyeball method. For the dispersion, we can't see the actual value, but we can see that the distribution is a little wider than we'd expect, let's say. The standard deviation is shown here as the measure of dispersion.

So, comparing the histogram to the box plot: for dispersion we're using the standard deviation rather than the IQR, and for central tendency we're using the average rather than the median used in the box plot. So that's the histogram, using that path.

That's how it looks, and you can still capture that outlier. But there's a more convenient path, which is not under Graph but under Stat: go to Stat > Basic Statistics and look for Graphical Summary. The graphical summary gives you pretty much every basic statistic available for your consumption. Because we added that 5, of course, you're getting a different view from the one in your PDF. This particular view gives you the essential statistics.
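For readers without Minitab, a rough text equivalent of the descriptive statistics in a graphical summary can be computed with Python's standard `statistics` module. This sketch covers only a subset of what the graphical summary shows (no confidence intervals, no normality test) and uses hypothetical data that include one low value of 5.0; `graphical_summary` is an illustrative helper, and its quartiles may differ slightly from Minitab's depending on the quartile convention.

```python
import statistics

def graphical_summary(data):
    """A subset of the descriptive statistics shown in a graphical summary."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "N": len(data),
        "Mean": statistics.mean(data),
        "StDev": statistics.stdev(data),       # sample standard deviation
        "Variance": statistics.variance(data),
        "Minimum": min(data),
        "1st Quartile": q1,
        "Median": statistics.median(data),
        "3rd Quartile": q3,
        "Maximum": max(data),
    }

# Hypothetical data that include one extremely low value (5.0)
data = [6.1, 6.4, 6.2, 5.0, 6.6, 6.4, 6.3, 6.8, 6.5]
for name, value in graphical_summary(data).items():
    print(f"{name:<13}{round(value, 3)}")
```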

So, with this illustration that Vlad requested, how do you think the outliers affect the central tendency measures? Take the mean, for example: now it becomes 6.42. Because we have a 5, this extremely low value tends to pull the average to the left, correct? It gives some weight to the left side of the distribution and somehow attracts the average, as if the outlier were saying, "Hey, I'm an outlier, and I'm inviting the average to come near me." That creates some noise and bias.
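The pull described above is easy to reproduce. A minimal sketch with hypothetical readings: appending one extremely low value (5.0) shifts the mean noticeably, while the median does not move. `center_with_and_without` is an illustrative helper.

```python
import statistics

def center_with_and_without(data, extra_value):
    """Mean and median before and after appending one extra (outlying) value."""
    with_extra = data + [extra_value]
    return {
        "mean": (statistics.mean(data), statistics.mean(with_extra)),
        "median": (statistics.median(data), statistics.median(with_extra)),
    }

# Hypothetical readings clustered around 6.4, then one extremely low value of 5.0
readings = [6.1, 6.2, 6.3, 6.4, 6.4, 6.5, 6.6, 6.8]
for measure, (before, after) in center_with_and_without(readings, 5.0).items():
    print(f"{measure:<6} without outlier={before:.3f}  with outlier={after:.3f}")
```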

Now, that's where the median becomes more useful. If you have outliers and want to avoid their effect and avoid bias, you might want to consider the median as your measure of central tendency, because the average is susceptible to bias from errors in the data and from outliers.

This is actually not a common practice, especially for organizations that have been using the average as their basic measure of central tendency, but from time to time there are cases where we really need to resort to the median. Even in our OEE project at a previous organization, the metric we used was not the average, because there were lines, there were tools, performing way below the common performance exhibited by this group of tools or machines. With the average, the central tendency was somehow polluted, so we decided to use the median rather than the mean for that project. That's just to give you an idea of how you can play with the statistics and when it's best to use each one.

We can also check normality. A normality test is basically a requirement before we do any in-depth data analysis; in statistics we call these assumptions. Certain tests assume that the data set follows a normal distribution, and if it doesn't, you deploy statistical tests intended for non-normally distributed data. Statistical tests used for normally distributed data are called parametric tests, and statistical tests used for non-normally distributed data are called non-parametric tests.
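The parametric/non-parametric split can be written down as a simple lookup. This is a sketch under stated assumptions: the pairings are the usual textbook counterparts, the 0.05 cut-off is the common default, and `choose_test` and `NONPARAMETRIC_ALTERNATIVE` are hypothetical helpers. Note that a p-value above the cut-off means normality was not rejected, not that it was proven, and most non-parametric tests compare medians or ranks rather than means.

```python
# Textbook parametric tests mapped to commonly used non-parametric counterparts
NONPARAMETRIC_ALTERNATIVE = {
    "1-sample t-test": "1-sample sign test or 1-sample Wilcoxon",
    "2-sample t-test": "Mann-Whitney",
    "paired t-test": "Wilcoxon signed-rank test on the paired differences",
    "one-way ANOVA": "Kruskal-Wallis or Mood's median test",
}

def choose_test(parametric_test, normality_p_value, alpha=0.05):
    """Keep the parametric test unless the normality test rejects normality (p < alpha)."""
    if normality_p_value < alpha:
        return NONPARAMETRIC_ALTERNATIVE[parametric_test]
    return parametric_test

# Hypothetical p-values from a normality test
print(choose_test("2-sample t-test", normality_p_value=0.31))
print(choose_test("2-sample t-test", normality_p_value=0.004))
```

In practice the choice is not purely mechanical; sample size and the reason for non-normality (for example, an unexplained outlier) also matter.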

Normality is a very well-known concept in statistics; I guess you've all heard of it from the previous discussion. The normal distribution looks something like this. But in a practical context, this is what happens: when you're doing a project or crunching data, first you have to understand how the data is distributed, so you'll be doing something like this.

Then, if you see that the p-value is below 0.05, it means the distribution of the data is not the same as a normal distribution. So it's basically non-normally distributed; for example, we could say that it's skewed.
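Minitab's graphical summary reports, to our understanding, an Anderson-Darling normality test (A-squared and a p-value). As a rough, standard-library-only sketch of that kind of test: compute A-squared with the mean and standard deviation estimated from the sample, apply the usual small-sample adjustment, and convert it to an approximate p-value with the widely used approximation from Stephens (1986). Minitab's exact p-value computation may differ, so treat this as an illustration rather than a replacement; `anderson_darling_normal` is an illustrative helper and the data are hypothetical.

```python
import math
import statistics

def anderson_darling_normal(data):
    """Anderson-Darling A-squared and an approximate p-value for H0: the data are normal.

    Mean and standard deviation are estimated from the sample. Assumes no
    standardized value is so extreme that the normal CDF rounds to 0 or 1.
    """
    n = len(data)
    mu, sigma = statistics.mean(data), statistics.stdev(data)
    std_normal = statistics.NormalDist()
    z = sorted((x - mu) / sigma for x in data)
    # (2i - 1) * [ln F(Y_i) + ln(1 - F(Y_{n+1-i}))], using 1 - F(y) = F(-y)
    terms = (
        (2 * i - 1) * (math.log(std_normal.cdf(z[i - 1])) + math.log(std_normal.cdf(-z[n - i])))
        for i in range(1, n + 1)
    )
    a2 = -n - sum(terms) / n
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)   # small-sample adjustment
    if a2_adj < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    elif a2_adj < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    elif a2_adj < 0.6:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj < 10:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    else:
        p = 0.0  # far outside the approximation's range: effectively zero
    return a2, p

# Hypothetical data with one extremely low value
a2, p = anderson_darling_normal([6.1, 6.4, 6.2, 5.0, 6.6, 6.4, 6.3, 6.8, 6.5])
print(f"A-squared={a2:.3f}  p-value={p:.3f}  normal at 0.05? {p >= 0.05}")
```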

So what do we do next? Excuse me. What do we do next? Should we collect another set of data? That's the question, right?

Think about it this way: say the data you're seeing right now is the 12-week performance of your project's primary metric. This is supposedly how it goes from Define: you have summarized the data, but you haven't created any distribution like this yet. Then you might want to check the individual distribution behind that 12-week performance, because each data point in the 12-week range contains daily data, and that daily data contains yet another set of data. Each point is just an aggregation of the whole week.

For example, if we talk about yield, or about output in production, there's daily data within each weekly mean. For output, the daily data may be further broken down per machine; for yield, the daily data may be per machine or per lot. And in Michelle's case, if we talk about, say, the customer satisfaction index or inventory levels, the average inventory level for the week is further divided into the average inventory level per day, and that in turn is per SKU or per line item. That's how the data is constructed; that's the architecture of the data.
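The data architecture described above (lot or machine, then day, then week) can be sketched in a few lines. This minimal example assumes hypothetical per-lot yield records and simple unweighted averages at each level; it shows how one poor lot is diluted in the weekly figure, which is why drilling down into the lower levels comes before deciding to collect new data. `roll_up` is an illustrative helper.

```python
import statistics
from collections import defaultdict

# Hypothetical per-lot yield records (%): (week, day, lot yield)
records = [
    (1, "Mon", 97.0), (1, "Mon", 95.5), (1, "Tue", 96.5), (1, "Tue", 82.0),
    (2, "Mon", 96.0), (2, "Mon", 97.5), (2, "Tue", 95.0), (2, "Tue", 96.5),
]

def roll_up(records):
    """Average lot yields per day, then average the daily figures per week."""
    by_day = defaultdict(list)
    for week, day, lot_yield in records:
        by_day[(week, day)].append(lot_yield)
    daily = {key: statistics.mean(values) for key, values in by_day.items()}
    by_week = defaultdict(list)
    for (week, _day), daily_mean in daily.items():
        by_week[week].append(daily_mean)
    weekly = {week: statistics.mean(values) for week, values in by_week.items()}
    return daily, weekly

daily, weekly = roll_up(records)
print("weekly:", weekly)
print("daily:", daily)
# Drill down: the single lot that drags week 1 down is not visible in the weekly number
print("lowest lot:", min(records, key=lambda r: r[2]))
```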

Now imagine you're seeing that metric in this particular view, and the p-value is telling you it's not normally distributed. What will you do next? Will you collect a new data set, or would you rather understand why you have that outlier there? We're not aiming to make the data normally distributed. Data won't always be normal, because some data sets are inherently non-normally distributed.

All right, and that's the common misconception. As mentioned earlier, there are tests intended for normally distributed data and tests intended for non-normally distributed data. And in statistics there's a gray area involving the central limit theorem.

The central limit theorem states that even when the individual data points are not normally distributed, the distribution of the sample mean approaches a normal distribution as the number of data points in each sample increases. A commonly cited rule of thumb is a sample size of about 30. That principle is being used, abusively, to justify applying parametric tests to non-normally distributed data.
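To make the distinction concrete: the central limit theorem is about the distribution of sample means, not the raw data. This small simulation sketch assumes an exponential distribution as a stand-in for a skewed metric, samples of 30 values, and a fixed random seed; the individual values stay strongly skewed while the averages come out much closer to symmetric.

```python
import random
import statistics

def skewness(xs):
    """Population (moment) skewness: about 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(42)
# Individual values from a strongly right-skewed (exponential) distribution
raw = [random.expovariate(1.0) for _ in range(5000)]
# Means of 2,000 samples of 30 values each, drawn from the same distribution
sample_means = [statistics.fmean([random.expovariate(1.0) for _ in range(30)])
                for _ in range(2000)]

print(f"skewness of individual values: {skewness(raw):.2f}")
print(f"skewness of sample means (n=30): {skewness(sample_means):.2f}")
```

The raw data never become normal by collecting more of them; only statistics such as the mean behave approximately normally, which is the nuance behind the debate.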

So that's one gray area I see in the field of statistics, and if you have some spare time and read a lot of articles, you'll find there's a great debate about it.

But to Vlad's point, yes, that's correct: you want to understand what causes that outlier and eventually remove it, so that you have a better view of the data without it. Not to make the data normally distributed, but to have a better view of the data. Having that outlier there is an insight; it should trigger you to investigate. But if you want to present the data properly, either you remove the outlier after you understand what caused it, or you use a central tendency measure that is not affected by it, which is the median rather than the average.

So we don't have to worry that "it's not normally distributed, so I have to collect another set of data." That could be the case if you want to prove that this data should supposedly follow a normal distribution, but in some cases there really are factors that cause it not to be normally distributed. That's one story.

Another story is that some data sets, for example mean time between failures or the customer satisfaction index, are naturally non-normal. You want your customer satisfaction index to be the higher the better, so the chart will probably look something like this. It's skewed, because on a customer satisfaction index you would want a 5 rather than a 1. Output is another example: it's supposedly higher the better, so you don't want 1,000 units of output, you want 5,000. The higher the better.

But if we're talking about data with a plus/minus tolerance, for example the resistance of a certain PCB or electronic component, say plus or minus 2 ohms, that should follow a normal distribution, since you have both a minus side and a plus side. That's how we treat this normal distribution thing.
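The contrast between "higher is better" data and plus/minus-tolerance data shows up in a simple skewness number. A sketch with made-up numbers: the ratings are assumed 1-5 satisfaction scores, and the resistance part is assumed to have a 100 ohm nominal with the plus or minus 2 ohm tolerance mentioned above (the nominal value is hypothetical). Ratings that pile up near the top of the scale give negative (left) skew; readings spread evenly around the nominal give skew near zero.

```python
import statistics

def skewness(xs):
    """Population (moment) skewness: negative = long tail to the left, 0 = symmetric."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# Hypothetical 1-5 customer satisfaction ratings: higher is better, so they cluster near 5
csat = [5, 5, 5, 5, 4, 4, 4, 3, 2, 1]
# Hypothetical resistance readings (ohms) for an assumed 100 ohm +/- 2 ohm spec
resistance = [98.5, 99.0, 99.5, 100.0, 100.0, 100.5, 101.0, 101.5]

print(f"CSAT skewness: {skewness(csat):.2f}")
print(f"resistance skewness: {skewness(resistance):.2f}")
print("all resistance within spec:", all(98.0 <= r <= 102.0 for r in resistance))
```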


https://www.youtube.com/watch?v=UF8uR6Z6KLc
