0:04 Next is the scatter plot, another very powerful graphical analysis tool. A scatter plot is a diagram that presents the relationship between two variables of a data set. In most cases it is used for quantitative data: for example, if you have a quantitative Y paired with a quantitative X, you would use a scatter plot. A scatter plot consists of a set of data points. Each observation is represented by a single point whose horizontal position equals the value of one variable and whose vertical position equals the value of the other.
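To make the one-point-per-observation idea concrete, here is a minimal sketch in Python (an assumption; the video itself uses point-and-click software) that draws paired data as a text scatter plot. The function `ascii_scatter` and its `width`/`height` parameters are hypothetical. Each pair becomes one mark, with X setting the column and Y setting the row.

```python
def ascii_scatter(xs, ys, width=20, height=10, mark="*"):
    """Render paired (x, y) observations as a text scatter plot.

    Each observation becomes one mark: its column comes from x (horizontal
    position) and its row from y (vertical position, larger y drawn higher).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid division by zero when all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top line
        grid[row][col] = mark
    return "\n".join("".join(r) for r in grid)

print(ascii_scatter([1, 2, 3, 4], [1, 2, 3, 4], width=4, height=4))
```

With four points on a rising line, the marks run from the bottom-left corner to the top-right corner.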
0:51 A scatter plot helps us understand whether the two variables are related to each other or not, how strong their relationship is, what shape and direction the relationship has, and where any outliers are.
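The strength and direction of a straight-line relationship are often summarized with the Pearson correlation coefficient r, which runs from -1 to +1. The sketch below computes it from paired lists; `pearson_r` is a hypothetical helper name. Note that r only describes linear relationships and can be pulled strongly by a few outliers, which is one reason to look at the scatter plot itself rather than the number alone.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data.

    The sign gives the direction of a linear relationship and the magnitude
    (0 to 1) gives its strength. It does not capture curved shapes.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0 (a perfect positive relationship), and reversing the Y values to `[8, 6, 4, 2]` gives -1.0.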
1:13 Let's try an example. Let's create another worksheet here.
1:43 In this case we're talking about mileage, or miles per gallon, which is essentially the car's fuel yield. If you own a car, I would guess that at this time of year you are very conscious of your fuel yield, because the cost of fuel is very high right now; I think diesel is at about 82 per liter.
2:19 So this is my miles per gallon, and this is the weight of the car. We can use a scatter plot to check whether these two quantitative variables are associated with, or correlated to, each other. Correlation is the term we use to express a potential association or relationship between the two variables being paired.
3:06 What we can do is go to the Graph menu and click Scatter Plot. There are many options here, and you might want to use the third one, because it shows you where the fitted line is located, that is, the line drawn through the cluster of data points in the scatter plot. With the simple option you have to imagine where the line should go, but with this option the software draws the line for you.
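In most statistics packages the fitted line on a scatter plot is an ordinary least-squares straight line. Assuming that is what this option draws, its slope and intercept can be computed directly from the paired data, as in the sketch below (`fitted_line` is a hypothetical name).

```python
def fitted_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = S_xy / S_xx: how x and y vary together, relative to the spread of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when every x is the same")
    slope = sxy / sxx
    # the least-squares line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, `fitted_line([1, 2, 3], [3, 5, 7])` returns `(2.0, 1.0)`, i.e. the line y = 1 + 2x.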
3:47 Then you enter the variables. Let's assume, for example, that miles per gallon is affected by the weight of the car, so miles per gallon goes in as Y and weight goes in as X. Remember that Y is a function of X, Y = f(X); when you're doing root cause analysis or hypothesis testing, this is a very important concept that you always have to remember. Then you click OK, and you get this view.
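The Y = f(X) assignment matters for the fitted line too: least squares minimizes the vertical distances from the points to the line, so modeling Y on X and modeling X on Y generally give different lines unless the points fall exactly on one line. A small sketch with made-up numbers (`ols_slope` is a hypothetical helper):

```python
def ols_slope(predictor, response):
    """Least-squares slope when `response` is modeled as a function of `predictor`."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_r = sum(response) / n
    spr = sum((p - mean_p) * (r - mean_r) for p, r in zip(predictor, response))
    spp = sum((p - mean_p) ** 2 for p in predictor)
    return spr / spp

# Made-up paired data, for illustration only.
x = [1, 2, 3, 4]
y = [2, 1, 5, 4]

slope_y_on_x = ols_slope(x, y)  # Y = f(X): 1.0
slope_x_on_y = ols_slope(y, x)  # X = f(Y): 0.5
# Redrawn on the same axes (X horizontal), the X-on-Y line rises 1 / 0.5 = 2.0 per unit
# of X, so the two fitted lines differ. Their slopes multiply to r squared (0.5 here).
print(slope_y_on_x, slope_x_on_y, slope_y_on_x * slope_x_on_y)
```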
4:17 I don't know exactly how many data points this is, but it's a lot: about 398 paired data points. From that view we see that the line leans to the left, that is, it slopes downward from left to right. The slope of the line is what we use to check for a possible correlation, and a line like this one denotes a negative correlation. So why is it a negative correlation?
4:56 Again, correlation is the term we use to express a possible association or relationship between two paired variables. Done by hand, we plot the data points and then draw a line through them; this red line here is called the fitted line. From the fitted line we read the slope, and depending on the slope we assess whether there is a possible positive correlation, a possible negative correlation, or no correlation at all. If the line leans to the left like this, there is a possible negative correlation. What does that mean? As the weight of the vehicle increases, say from zero up to this value, what happens to the miles per gallon? As weight increases, miles per gallon decreases. That is why it's a negative correlation.
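The read-the-slope step can be sketched with a handful of hypothetical weight and mileage values (invented for illustration; they are not the 398 data points in the video). A negative fitted slope corresponds to the line leaning to the left as described above.

```python
# Hypothetical data: vehicle weight in thousands of pounds (X) and miles per gallon (Y).
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [32, 29, 24, 21, 17]

def fitted_slope(xs, ys):
    """Least-squares slope of ys modeled as a function of xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

slope = fitted_slope(weight, mpg)
print(f"fitted slope: {slope:.1f} mpg per 1000 lb")  # about -7.6
if slope < 0:
    print("line slopes down: possible negative correlation")
elif slope > 0:
    print("line slopes up: possible positive correlation")
```

Read as: each additional 1000 lb of weight goes with roughly 7.6 fewer miles per gallon in this made-up sample.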
6:01 With this concept you can identify important factors that affect a certain KPI or metric. The scatter plot is widely used to establish an association, relationship, or correlation, but this is just the visual, graphical part of it. There is also correlation and regression analysis, which we will cover in the Analyze phase; for now, as one of the seven basic QC tools, we are exploring it as a graphical analysis tool. So that's negative correlation.
6:40 If you see something that is the opposite of this, say the cluster of data points runs like this so that the line slopes upward, that signifies a possible positive correlation, wherein as one variable increases, the other also increases. There is also the case where no line can be drawn because the data points are too dispersed; that shows no correlation at all.
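One way to turn the three cases (positive, negative, none) into a rule is to compare r against a cutoff. The 0.3 cutoff used below is an assumed rule of thumb, not a fixed standard, and `correlation_label` is a hypothetical helper. The scatter plot should still be checked, because a curved pattern can also give r near zero.

```python
import math

def correlation_label(xs, ys, threshold=0.3):
    """Classify paired data by the sign and size of Pearson's r.

    `threshold` is an assumed rule-of-thumb cutoff on |r|, not a standard value.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    if r >= threshold:
        return "possible positive correlation"
    if r <= -threshold:
        return "possible negative correlation"
    return "no correlation"

x = [1, 2, 3, 4]
print(correlation_label(x, [1, 3, 2, 4]))  # r = 0.8
print(correlation_label(x, [4, 2, 3, 1]))  # r = -0.8
print(correlation_label(x, [3, 1, 4, 2]))  # r = 0.0
```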
7:31 So that's pretty much how we use the scatter plot when we are dealing with graphical analysis.
7:41 For example, if you want to check whether output depends on the number of people present in your workplace, that is, the attendance rate, you can do this. Or if, say, the inventory level of medicine at your pharmacy depends on the number of inpatients or outpatients, you can use this too, as long as both Y and X are quantitative.

8:16 Because these are paired data points, I would suggest that, as much as possible, you make sure every pair truly belongs together. For example, the output for day one should be paired with the attendance rate for that same day. Don't just pull the data sets independently: say you take a data set from one team that contains the output data and another data set from another team that contains the attendance data. Both are quantitative, but when you try to pair them they don't actually have the same reference. The first value could be the output for Monday, while the value you paired with it, pulled from the group that provides the attendance report, is from Tuesday, so there's going to be a disconnect. As much as possible, we try to get truly paired data points.
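The same-day pairing rule can be enforced by joining the two data sets on a shared date key instead of lining up rows by position. In this sketch the dates, values, and the helper `pair_by_day` are all hypothetical. Monday's output and Thursday's attendance have no partner, so they are dropped rather than mismatched.

```python
from datetime import date

# Hypothetical records from two different teams, each keyed by the day it describes.
output_by_day = {
    date(2024, 3, 4): 410,   # Monday
    date(2024, 3, 5): 395,   # Tuesday
    date(2024, 3, 6): 430,   # Wednesday
}
attendance_by_day = {
    date(2024, 3, 5): 0.92,  # Tuesday
    date(2024, 3, 6): 0.97,  # Wednesday
    date(2024, 3, 7): 0.88,  # Thursday
}

def pair_by_day(y_by_day, x_by_day):
    """Return (day, x, y) triples only for days present in both data sets."""
    shared = sorted(y_by_day.keys() & x_by_day.keys())
    return [(d, x_by_day[d], y_by_day[d]) for d in shared]

# X = attendance rate, Y = output, matched on the same day.
pairs = pair_by_day(output_by_day, attendance_by_day)
for day, attendance, output in pairs:
    print(day.isoformat(), attendance, output)
```

Zipping the two lists by position would instead have paired Monday's output with Tuesday's attendance, which is exactly the disconnect described above.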
9:30 This is just my own idea and best practice, because otherwise I'd expect some sort of false context: almost any data you pair will show some sort of relationship or correlation, so it's very important that we stick to the context. Context matters not only when you interpret the result but also when you gather the data you will use for certain tools and gaps, either graphical