0:02 okay so before we get into the actual
0:05 content of our lesson for today I
0:06 want to talk just really briefly about
0:08 what I did with this section here this
0:11 topic on blocking and experiments is
0:13 technically part of section 4.2 still
0:16 but I took this piece of the section and
0:19 I kind of pushed it back so that when I
0:22 give you guys a quiz over 4.2 blocking
0:25 will not be on it so the first time
0:27 you guys are watching this video for me
0:30 in class we will talk about just one
0:33 quick thing in these slides that'll help
0:34 you guys review in general
0:37 completely randomized design experiment
0:39 stuff that'll get you ready for your 4.2
0:41 quiz so I would expect that the first
0:43 time you're watching this it'll be for
0:44 like the first five minutes of the video
0:47 and then you can revisit this and watch
0:50 the whole new lesson um later on for the
0:53 next class period so anyway the one
0:55 thing I want to talk about in this
0:57 first part here um for the first time
1:00 you're watching this is introducing this
1:02 problem right here and talking about how
1:04 we would use a completely randomized
1:07 design as an experiment so remember a
1:09 completely randomized design is just a
1:11 general experiment like you would
1:13 probably think to do um where there's
1:16 nothing fancy going on blocking which is
1:18 the subject of this lesson is like a
1:20 more sophisticated or fancier way of
1:23 running an experiment so the
1:25 context for this problem is that we have
1:29 20 students available and those 20
1:31 students have taken the SAT
1:33 before and they're gonna participate in
1:36 an experiment with an online and an
1:39 in-class prep course for the SAT so we
1:41 want to see if these kids are gonna do
1:43 better with their SATs if they're either
1:45 in an online class or an in-person class
1:48 and we need to describe a completely
1:51 randomized design um to get this done
1:53 now I've done this a couple times with
1:55 you guys where we've made a little
1:58 experiment diagram for the randomized
2:00 design but I want to show you because I
2:02 haven't done this in a lesson with you yet
2:05 how you would actually describe this
2:07 with like a little list because if the
2:10 AP test asks you to do a question like
2:11 this you're generally going to want to
2:13 do it via like just
2:14 writing rather than having to make a
2:17 picture um so pay attention as I go over
2:19 this because your 4.2 quiz
2:21 will have a problem where I ask you to
2:24 design an experiment like this and use a
2:27 completely randomized design so I've got
2:29 20 students at my disposal and they need
2:32 to go to online and in class those are
2:34 going to be my two treatment groups
2:36 right here so the first thing I'm going
2:38 to do um you can do it where you put
2:39 everybody's name on an index card and
2:41 shuffle those up or you can give
2:43 everybody a unique number I'm gonna go
2:45 ahead and do it the number way although
2:48 either way is cool so assign each
2:55 volunteer a unique number from 1 to 20
3:08 so unique number from 1 to 20 when you
3:10 specify this make sure you're saying
3:12 unique which gets rid of like the
3:14 technicality of oh I gave somebody the
3:15 same number twice so everybody gets a
3:18 unique number here after I give
3:20 everybody a number what I'm gonna do is
3:24 use randInt on my calculator use some
3:28 sort of random number generator so randInt
3:32 the command would be 1 comma 20 I'm
3:36 gonna pick a number from 1 to 20 and I'm
3:42 gonna do this to generate 10 unique
3:47 numbers again and I'm specifying that
3:50 the numbers are unique which means that
3:53 there are no repeats allowed it's always
3:55 a good thing when you do problems like
3:58 this to explicitly say if repeats are
4:00 allowed or not here I definitely don't
4:01 want repeats I wouldn't want to pick the
4:04 same person twice after you talk about
4:06 how you're gonna pick your 10 people um
4:08 didn't give myself nearly enough space
4:11 here the next step is going to be to
4:12 say what you're gonna do with those 10
4:16 people so those 10 people the 10 people
4:22 corresponding to those numbers so the 10
4:26 people corresponding to the selected numbers go to online and the other ten go
4:49 to in person it doesn't matter which ten
4:51 go to which group as long as the
4:53 selection of the ten was random so if
4:54 you would reverse the order of this that
4:56 would be totally fine as well
4:58 so they go to the class they take their
5:01 class I'm running out of space here so
5:02 I'll delete this when I continue with my
5:05 video but the next step in my process I
5:08 would have to give them the SAT so they
5:18 take their little class after class have
5:31 subjects take the SAT and measure their
5:39 improvement okay so they take the class
5:41 they retake the SAT again we see how
5:42 they improve because everybody's taken
5:44 it before the very last thing you have
5:46 to do when you talk about an experiment
5:48 is talk about what happens with your
5:50 results the comparison step that's
5:51 really really important that kids tend
5:53 to forget so my last step in this
5:57 process is going to be to compare the
6:01 improvement scores between the two
6:09 groups so when I ask you to design me
6:12 a completely randomized experiment you
6:14 need to talk about labels how do they
6:16 get their labels so I said assign each
6:17 volunteer a number from 1 to 20 that's
6:20 unique talk about how the randomness
6:21 takes place it could be if you're
6:23 shuffling names out of a hat doing that
6:25 it could be talking about randInt with
6:28 repeats or no repeats then you need to
6:30 talk about what happens to the people
6:32 once you've randomly selected them so
6:34 where do they go what do they do talk
6:36 about the experiment itself
6:37 oh yeah collect data blah blah blah and
6:39 then talk about a comparison at the end
6:41 you're gonna compare the two
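The label, randomize, assign steps listed above can be sketched in a few lines of Python. This is only an illustration, not the calculator method from the video: `random.sample` stands in for repeated randInt(1, 20) with repeats ignored, and the function name and `seed` parameter are made up for this sketch.

```python
import random

# Each of the 20 volunteers gets a unique number from 1 to 20.
volunteers = list(range(1, 21))


def completely_randomized_design(labels, group_size, seed=None):
    """Split labels into two treatment groups completely at random.

    Hypothetical helper for illustration; seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so no number can repeat.
    chosen = rng.sample(labels, group_size)
    online = sorted(chosen)
    in_person = sorted(n for n in labels if n not in chosen)
    return online, in_person


online, in_person = completely_randomized_design(volunteers, 10, seed=1)
print("online class:   ", online)
print("in-person class:", in_person)
```

After both groups take their class and retake the SAT, the improvement scores of the two lists would be compared, which is the last step of the written description.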
6:43 so this is what I would expect to see
6:45 when you are asked to design a
6:47 completely randomized experiment I'm
6:49 gonna go ahead and delete this and I'm
6:51 gonna continue with the video right now
6:54 but if you haven't had your 4.2 quiz yet
6:56 this is where you should stop for now
6:58 I'm gonna carry on though and you'll
7:00 pick up from this point in the next
7:05 lesson so
7:07 I'm talking to you in the future now
7:09 compared to when you first watch this
7:11 video this is where you pick up to talk
7:14 about our new concept of blocking so
7:15 what we just finished diagramming right
7:17 there um it was just written on the
7:19 screen is an example of a completely
7:21 randomized design but the issue with
7:24 this design um in our little fictitious
7:26 problem about the SAT it turns out that
7:28 out of those 20 people who volunteered
7:30 to be in our study they were in
7:33 different math classes 10 of the kids
7:35 were in precalc six were in algebra two
7:39 and four are only in geometry what kind
7:42 of a problem would this cause in terms of
7:44 SAT scores and the value of this class
7:47 so think about that for a second and
7:48 hopefully come up with something in your
7:51 head the biggest thing I'm telling you
7:53 the answer now the biggest thing that
7:55 this would cause is depending on how
7:57 much math you already know this class
8:00 probably isn't gonna be as valuable if
8:03 you don't know as much okay if I put a
8:05 third grader through an SAT class it's
8:07 not gonna help them at all because they
8:09 just don't know enough math that it's
8:11 not gonna make sense anyway if I put
8:13 somebody in precalc through this class
8:15 who already knows the stuff and may have
8:17 forgotten things the review is gonna be
8:19 really beneficial and I would expect the
8:22 precalc kids to benefit more than the
8:24 geometry kids who haven't learned those
8:26 tougher things like logarithms etc etc
8:30 okay so going in the amount of math you
8:32 already have learned in the past is
8:34 probably gonna have a pretty significant
8:36 impact on how much this class actually
8:39 helps you so the amount of math you
8:47 know so the amount of math you know
8:58 we'll have or could have a major impact
9:13 on how much a review class helps again
9:14 if you've never learned the content in
9:16 the first place seeing it in a quick
9:19 review SAT class isn't gonna make you
9:20 understand it well enough you have to
9:22 have been in the content and seen the
9:25 class for real so that is not really an
9:27 ideal thing right here because those
9:30 precalc kids may end up just getting way
9:31 more out of this class than the other
9:33 kids and that could make things
9:35 trickier when we're actually analyzing
9:37 our results one way to solve this
9:39 problem the best way to solve this
9:41 problem is to use a concept called
9:46 blocking so blocking is a different type
9:48 of experimental design as opposed to a
9:51 completely randomized design and that is
9:52 what I need to teach you about in this lesson
9:55 so let's go on to the next page and I'll
9:58 explain what blocking actually is so a
10:03 block as a general vocab term right here
10:08 a block is a
10:21 group established by a researcher with a
10:32 common characteristic and the big thing
10:36 about a block is that it's established
10:43 before you do random assignment okay so
10:45 a block is established before random
10:46 assignment this will make more sense
10:48 when I actually break down this problem
10:50 um and I'll show you how it works
10:52 but a randomized block
10:55 design is an experiment involving this
10:56 is a bad definition right here but it's
11:04 an experiment involving blocks so how
11:07 would this work rather than have all 20
11:09 of my people and be like a random
11:12 assignment 10 here 10 here if I do that
11:15 it could be that I just end up putting
11:17 all or most of the precalc kids like
11:18 what if 7 or 8 precalc kids just
11:21 magically end up in one group by random
11:23 chance and the other group only has 2 or
11:25 3 precalc kids well this group with
11:27 a lot of precalc kids is just gonna look
11:30 really good whether it's online or in
11:31 person it could just be the fact that
11:33 they're in precalc that's making this
11:36 group look better than this one right
11:38 here so what we do if we're pretty
11:40 confident that math class is going to
11:42 make a difference we take each group
11:44 separately we take our precalc kids
11:47 all 10 of them and then we would divvy
11:49 them up so like randomly assign 5
11:52 randomly assign 5 to each group there
11:55 were six kids in algebra two randomly
11:57 assign three here randomly assign
11:59 three here so rather than relying on
12:01 just my big pool and hoping that they
12:04 end up about even when you deal them out
12:06 you break them into little piles called
12:09 blocks and then from those blocks what
12:11 you do is you randomly assign parts to
12:14 each this is actually a lot like
12:16 stratified sampling so this is a concept
12:18 that kids sometimes confuse with
12:20 stratified sampling the big difference
12:22 with stratified sampling is that you are
12:26 sampling it says it in the name when
12:28 you sample you pull out of the
12:30 population so again my promise
12:32 sample I have freshmen sophomores
12:34 juniors seniors I'm sampling people to
12:37 be in my study so I would pull from the
12:39 list of freshmen I would pull from the
12:42 list of sophomores with blocking let's
12:44 say I was doing an experiment
12:47 based on school somehow and I wanted to
12:49 make sure I got some freshmen in each
12:51 group I am assigning putting it into
12:54 groups rather than sampling and pulling
12:57 it but it's a very similar concept here
13:00 so creating a diagram outlining this
13:04 problem with our randomized block design
13:07 for SAT we would start out with our 10
13:14 precalc kids that would be one block we
13:17 would have six kids who are in algebra
13:22 two and then we have my little toolbar's
13:23 in the way here
13:26 I got four kids who are in that does not
13:30 look like a four four kids in geo and
13:32 then what you're basically gonna do is
13:34 do a very similar process I'm not
13:36 necessarily gonna write this whole thing
13:38 here um but what you would do is you
13:46 would randomly assign it so that five
13:52 of the kids end up in each group so you
13:53 would have five of the kids go to
13:55 treatment one which is I don't know the
13:59 online class and then you have the
14:02 remaining five go to the in-person class
14:05 and you would have the same sort of deal
14:08 with the algebra two except this time
14:10 there were only six of them so you have
14:12 three and three so you that way you're
14:14 basically making sure you have five
14:16 precalc kids in each group
14:19 three algebra two kids in each group and
14:21 we'll talk about why this is beneficial
14:23 as the lesson progresses here but that
14:26 is what blocking essentially means now
14:27 one more thing because I'm not writing
14:29 this whole thing out here for my
14:32 experimental design but after you get
14:34 your results so after they take their
14:36 class blah blah blah when you make your
14:39 comparison what you're gonna end up
14:41 doing is comparing within the block
14:45 first so compare
14:49 within the block what does that mean
14:51 basically when you get your results
14:54 compare apples to apples precalc kids
14:56 versus precalc kids algebra 2 kids
14:58 versus algebra 2 kids
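The block-then-randomize idea described above can be sketched like this. The block sizes (10 precalc, 6 algebra 2, 4 geometry) come from the problem; the student labels such as "P1" and the function name are invented for the sketch, and shuffling each block is just one reasonable way to do the random assignment.

```python
import random

# Block sizes are from the lesson; the labels are hypothetical placeholders.
blocks = {
    "precalc": [f"P{i}" for i in range(1, 11)],
    "algebra 2": [f"A{i}" for i in range(1, 7)],
    "geometry": [f"G{i}" for i in range(1, 5)],
}


def randomized_block_design(blocks, seed=None):
    """Randomly assign half of every block to each treatment."""
    rng = random.Random(seed)
    groups = {"online": [], "in person": []}
    for members in blocks.values():
        shuffled = members[:]          # copy so the original block is untouched
        rng.shuffle(shuffled)          # the randomness happens inside the block
        half = len(shuffled) // 2
        groups["online"] += shuffled[:half]
        groups["in person"] += shuffled[half:]
    return groups


groups = randomized_block_design(blocks, seed=2)
for treatment, students in groups.items():
    print(treatment, sorted(students))
```

Whatever the shuffle does, each treatment ends up with 5 precalc, 3 algebra 2, and 2 geometry students, which is the balance a completely randomized design can only hope for.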
15:00 you compare those different parts of the
15:02 same block to each other because that is
15:04 more likely to tell you if something is
15:08 really going on so let's keep things
15:10 going in our lesson right here and what
15:11 I have on this next slide I'm pretty
15:14 sure yeah it's like hypothetical results
15:15 for what actually happened in this study
15:19 so um the kids took their classes life
15:22 was good and then what we did is we went
15:25 ahead I think they did use blocking in
15:27 this yeah so this was with
15:29 blocking if you count it out there are
15:31 five precalc kids in each group three
15:33 algebra two kids in each group two geo kids in each
15:35 group and this shows when they took
15:37 their SAT again hypothetically what
15:41 their results ended up being so we need to
15:44 make ourselves dot plots right here side
15:47 by side so we can look at our results
15:49 our whole next chapter we will talk
15:50 about is all about graphs and what
15:51 they're good for
15:53 but when you look at this it's kind of
15:55 hard to tell if one is necessarily
15:57 better than the other so I'm gonna go
15:59 ahead and graph these on top of each
16:01 other right here and see what's going on
16:07 so I'll start at 0 10 20 30 40 50 and
16:10 you guys should be graphing these as
16:12 well so while I'm just talking here you
16:16 can start making your own graph I don't
16:19 know why I started numbering all these
16:21 so yeah make your graph and try to make
16:24 them stack on top of each other because
16:26 when you do it that way it'll be easier
16:28 to see if there is a difference between
16:30 the groups we talk about overlap in the
16:33 pictures and we did that in the previous
16:35 lesson here so that should be a little
16:37 familiar for you guys I'm gonna make my
16:39 first group this will be the online
16:41 group and this will be the classroom
16:43 group and what I'm gonna do is put a dot
16:47 for each kid where they end up so precalc
16:50 kids start out with online I have two
16:52 kids who got hundred point
16:54 improvements that's good
16:57 etc so go ahead and graph them and make
16:59 sure you put them on the appropriate
17:04 axis here and another 100 those are my
17:06 precalc kids then it switches to
17:20 all right that's precalc
17:28 algebra 2 online is 50 60 40 50 60 40
17:34 is right there okay um algebra 2 in
17:44 class is 30 40 20 and then geo online
17:54 30 30 and then geo classroom 0 20 okay
17:56 so you guys should have these graphed on
17:58 your own here um and then what we're
18:00 gonna do is look at these pictures and
18:02 try to decide is there compelling
18:05 evidence that the online class is better
18:07 than the in-person class or vice versa
18:10 and we will learn much more
18:12 sophisticated ways of doing this using
18:14 like actual statistics later on
18:17 primarily second semester for right now
18:19 we're just kind of using our gut and
18:20 we're looking at the picture and the key
18:23 like I told you is overlap if there's
18:25 a ton of overlap between the two groups
18:29 the results aren't compelling enough for
18:30 me to be like wow there probably
18:32 actually is a difference there could
18:33 be just a small difference if there's
18:34 only like a very small difference
18:35 between the two groups
18:38 I could assume that difference or I
18:39 could suspect that that difference only
18:41 occurred due to the chance of the random
18:43 assignment so if you look at these two
18:45 pictures here that's what I care about
18:46 when you compare dot plots is the
18:49 overlap and if you look at the bulk of
18:51 these points these points go from here
18:57 to here these points go from here to
19:00 here these points right here have an
19:02 awful lot of overlap to them there's a
19:05 very strong amount of overlap right here
19:07 where if I was looking at this picture
19:08 yes the kids
19:11 and ooh nobody got a 90 or 100 at all
19:13 in the class group but I have five kids
19:15 who got it in the online you might be
19:17 like ah online can be a little better
19:19 but there's so much overlap between the
19:22 two groups that this probably isn't
19:25 enough to be convincing okay so what I'm
19:28 gonna write here just informally I'm
19:33 gonna say there's too much overlap to be
19:40 confident of a difference if I was
19:42 writing this as like a full free
19:43 response question I would say there's
19:44 too much overlap between the scores
19:46 context is really important too much
19:48 overlap between the scores we can't be
19:51 confident that one class is necessarily
19:52 better than the other
19:54 okay so when your data is spread out
19:57 that's a lot of variability we talked
19:57 about that before
20:00 variability is not our friend in
20:01 experiments because when you're more
20:03 spread out it's more likely you're gonna
20:05 have that overlap and then your results
20:09 aren't gonna be convincing then I need
20:10 to erase this picture I think let me see
20:14 if I can keep it here actually my tabs
20:18 my next okay it's a little bit of a mess
20:19 right here but this is important and I
20:21 want you guys to see what's going on
20:24 when we made our dot plots by hand we
20:27 kind of looked past the fact that
20:28 kids were in different math classes like
20:30 you might remember oh those kids I think
20:31 were in precalc
20:33 but it's not very evident by looking at
20:35 these dots right here that they're in
20:36 different math classes we kind of
20:39 disregarded that so what this picture
20:41 did is it took that same data you can
20:43 see the shape of this graph is the same
20:45 although their classroom and their
20:48 online are reversed for mine I should
20:51 have looked at that ahead of time so be
20:54 aware that this matches up with here and
20:57 that this matches up with here but you
20:59 can see that the shape of the graph is
21:01 very much the same all they did
21:02 differently instead of dots for
21:04 everybody though is they put different
21:07 symbols for each math class that
21:09 the kid was in so our precalc kids are
21:12 the triangles right here our algebra two
21:13 kids are the grey dots and then the
21:16 squares are the geometry kids here is
21:19 why this is actually more beneficial if
21:22 you look at the groups
21:24 individually earlier in this video I
21:25 said the whole point of blocking is that
21:27 you're comparing apples to apples
21:31 oranges to oranges etc but when I looked
21:33 at my graph right over here I didn't
21:34 really do that I just like looked at
21:37 everything as a whole look at the
21:40 group's individually so if I look here
21:44 are my precalc kids for in class
21:47 here are my precalc kids for online no
21:48 overlap between those groups this group
21:51 is all higher than all of these kids
21:53 right here if this was just random
21:55 assignment here it's very unlikely that
21:57 every single kid would beat every single
21:59 kid right here so what that tells me is
22:01 man that online class might have
22:03 actually helped same thing when you look
22:05 at algebra two granted there are fewer
22:08 dots so not great making conclusions
22:10 based on six data points but there's
22:12 quite a bit of separation right here
22:13 only a little bit of overlap with
22:16 two kids tied and then with the geo
22:20 I got this versus this so when you look
22:23 at the groups individually apples to
22:26 apples it becomes a lot more clear that
22:29 the online class in this scenario was better
22:42 it appears that online led to higher
22:52 improvements within students of the same
23:00 math class so out of all the precalc
23:02 kids the kids in the online class did do
23:04 better and same with algebra two and
23:07 same with geometry so when you actually
23:09 look at the blocks and compare within
23:11 the blocks apples to apples it becomes
23:14 easier to see if your results could be
23:16 statistically significant where things
23:17 get cloudy when you look at the data
23:20 overall so this is kind of a visual
23:22 demonstration of like why blocking is
23:26 helpful it helps us determine if things
23:28 are statistically significant or not by
23:32 accounting for a source of variability
23:35 in our data that was a big mouthful
23:35 right there
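The overall versus within-block comparison can be checked numerically. The improvement scores below are reconstructed from the numbers read aloud in the video, so treat them as an assumption about the slide rather than a copy of it; the helper names are made up for the sketch, and "overlap" here just means the two ranges of scores touch.

```python
# Improvement scores as read aloud in the video (reconstructed, not copied from the slide).
scores = {
    "precalc": {"online": [100, 100, 90, 90, 100], "in person": [70, 70, 80, 80, 80]},
    "algebra 2": {"online": [50, 60, 40], "in person": [30, 40, 20]},
    "geometry": {"online": [30, 30], "in person": [0, 20]},
}
TREATMENTS = ("online", "in person")


def mean(values):
    return sum(values) / len(values)


def ranges_overlap(a, b):
    """True when the span of scores in a touches the span of scores in b."""
    return min(a) <= max(b) and min(b) <= max(a)


# Ignoring the blocks: all 10 online kids versus all 10 in-person kids.
pooled = {t: [s for block in scores.values() for s in block[t]] for t in TREATMENTS}
print("pooled ranges overlap:", ranges_overlap(pooled["online"], pooled["in person"]))

# Apples to apples: compare the two treatments inside each block.
for name, block in scores.items():
    diff = mean(block["online"]) - mean(block["in person"])
    overlap = ranges_overlap(block["online"], block["in person"])
    print(f"{name}: online minus in person = {diff}, ranges overlap: {overlap}")
```

With these numbers the pooled groups overlap a lot, while inside every block the online average is higher, which matches what the symbols on the second graph show.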
23:38 source of variability in our data is
23:40 what math class you're in depending on
23:41 what math class you're in that's gonna
23:43 swing how much you improve in this class
23:46 quite a bit well if you kind of take
23:48 that off the table and you balance it
23:50 out so you've got this versus this right
23:51 here all precalc
23:53 then you don't have to worry about that
23:55 clouding your results anymore so it
23:58 accounts for a source of variability in
24:00 the data I'm gonna have you write that
24:02 at some point in the slides but I want
24:05 to see if I did that elsewhere so what I
24:07 want to do right here is show you guys
24:12 just a quick way that you can this is an
24:13 alternative kind of extension sort of a
24:16 deal so just a straight up law there is
24:18 another way we could kind of put
24:21 everybody on the same playing field so
24:23 what they did if you look at this right
24:24 here because this is gonna take a little
24:26 bit of explaining for you guys to
24:29 understand it says right here that the
24:32 average improvement for students in
24:36 precalc was 86 I'm gonna start right there
24:39 so if you take all those
24:42 numbers right here 100 100 90 90 70
24:44 70 all those precalc kids and you average
24:46 their scores together turns out that
24:50 average is an 86 so no matter online or
24:52 classroom everybody together who was in
24:55 precalc improved by an average of 86
24:59 points and in algebra 2 so if you add up
25:00 all these numbers right here for the
25:03 algebra 2 kids those numbers average out
25:08 to 40 and geo averaged out to 20 so
25:10 another option this is like kind of a
25:12 just above and beyond sort of thing that
25:16 you can do if you are in precalc your
25:20 score is expected to improve by 86
25:23 points just by virtue of being in precalc
25:25 that's just the average of everybody in
25:29 precalc okay so this is kind of like the
25:32 head start or the bonus or whatever that
25:33 you get the benefit you get just from
25:36 being a kid in precalc what we do a lot a
25:39 lot a lot in statistics is we analyze
25:44 the difference after you account for
25:46 things like we do a lot with differences
25:48 in statistics
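The block averages quoted above (86, 40, 20) and the idea of looking at differences from them can be sketched as follows. The scores are again reconstructed from the numbers read aloud in the video, which is an assumption, and the variable names are made up for the sketch.

```python
# Improvement scores as read aloud in the video (reconstructed, not copied from the slide).
scores = {
    "precalc": {"online": [100, 100, 90, 90, 100], "in person": [70, 70, 80, 80, 80]},
    "algebra 2": {"online": [50, 60, 40], "in person": [30, 40, 20]},
    "geometry": {"online": [30, 30], "in person": [0, 20]},
}


def mean(values):
    return sum(values) / len(values)


# Average improvement of everybody in a block, online and in person together.
block_means = {
    name: mean(block["online"] + block["in person"]) for name, block in scores.items()
}
print("block averages:", block_means)

# Subtract each kid's block average so every math class is on the same playing field.
adjusted = {
    treatment: [
        score - block_means[name]
        for name, block in scores.items()
        for score in block[treatment]
    ]
    for treatment in ("online", "in person")
}
for treatment, values in adjusted.items():
    print(treatment, values, "average:", mean(values))
```

After the subtraction the online values are mostly positive and the in-person values mostly negative, which is the pattern described when the adjusted numbers are compared.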
25:50 so in this context what I'm gonna do is
25:52 I'm going to take each of my precalc
25:55 kids and I'm gonna subtract 86 points
25:58 right here so if I subtract 86 from this
26:00 I wish I had written these out ahead of
26:04 time but that's a 14 that's a 14 4 4
26:08 14 and then we have negative 16 negative
26:13 16 negative 6 negative 6 negative 6 ok
26:16 so this is the number these are the
26:18 numbers after we take out that average
26:22 improvement of precalc kids I'm
26:23 gonna do the same thing with algebra 2
26:26 but by being in algebra 2 your score
26:28 improved by an average of 40 points
26:30 across the board so by being in algebra
26:32 2 this is like your algebra 2 bonus
26:35 right here so if I subtract that out I
26:40 get 10 20 so I'm subtracting 40 each
26:44 time right here um so I have negative 10
26:50 I have 0 I have negative 20 and then
26:54 with the geo kids I have a minus 20 for
26:56 each of them so I should change colors
27:01 and then I'm gonna have 20 points
27:04 subtracted off from each of them this is
27:10 gonna end up being a 10 a 10 a 0 that's
27:15 negative 20 and a 0 so what this did is
27:18 it basically leveled the playing field
27:22 in a way by removing um the advantage of
27:25 being in that class so to speak so these
27:28 ones got 86 points knocked off these
27:30 ones got 40 points knocked off and these
27:34 got 20 points knocked off now that all
27:36 the kids are more or less on the same
27:37 playing field and we eliminated the
27:39 advantage to being in a certain math
27:42 class what we could do now is we could
27:44 look at all of our online kids so these
27:48 guys are online um these guys are online
27:52 and these guys are online and we could
27:55 graph all of these against all of the
27:57 not circled ones I'm not gonna graph
27:59 it but if you look most of the kids with
28:01 positive numbers are in the online group
28:03 and there's a whole lot of negative
28:07 numbers in the in-person group that would be
28:08 another way of looking at your data
28:11 holistically and seeing that maybe there
28:14 is an advantage to being in the online
28:17 group okay so this slide right here a
28:18 little bit more above and beyond just a
28:20 different way of approaching it but
28:22 looking at things within blocks like we
28:24 started off on these slides very very
28:31 important but I've said a few of these
28:32 things already but this is good just to
28:34 be able to talk about it again and have
28:36 it here in your notes blocking in
28:38 experiments is similar to stratified
28:40 sampling I've already talked about that
28:41 in the difference you can backpedal in
28:43 this video and hear me say that again if
28:44 you need it but picture it's like
28:48 stratified is for sampling blocking is
28:51 for experiments so blocking is a form of
28:55 random or blocking goes well with random
29:00 assignments stratified goes well with
29:04 random sampling and then um blocking is
29:05 a good way to increase your chances of
29:06 finding convincing evidence we've talked
29:08 about that it was easier to see there
29:09 was a difference when we compare apples
29:12 to apples blocks should be
29:13 chosen like strata units within a block
29:15 should be similar and different than the
29:18 other blocks this line right here is
29:22 important you should only block when you
29:24 expect that the blocking variable is
29:26 associated with the response variable
29:29 okay so in my same problem with my
29:32 online class my in-person class blocking
29:34 by somebody's hair color would be a
29:36 stupid thing to do because there's no
29:39 real like um incentive like there's no
29:41 real thought that people with brown hair
29:44 are gonna do differently in online or in
29:48 person if you block by a poor choice of
29:49 a variable one that doesn't actually
29:51 influence things like you were thinking
29:54 what you've done is you've fragmented
29:56 your data instead of looking at twenty
29:58 and having a 10 and a 10 if I went by
30:00 hair color now I've got three people
30:02 with brown hair here three people with brown
30:04 hair three versus three is not as good
30:08 as 10 versus 10 so if you block poorly
30:10 by a variable it doesn't have to be as
30:13 ridiculous as my example right there but
30:14 if you block by something that doesn't
30:15 actually matter
30:17 you run the risk of fragmenting your
30:18 data too much to find anything meaningful
30:22 I mean blocks are not formed at random
30:24 they are chosen in advance by the
30:27 experimenters there's one other phrase
30:28 that's super important I want you to
30:31 put a star by it in your notes I said
30:32 it out loud earlier but now it's
30:34 actually here for you guys
30:37 blocking accounts for a source of
30:40 variability in your data the math class
30:42 you're in is gonna be a source of
30:43 variability in how much you improve
30:46 blocking balances it out and makes sure
30:49 that you're comparing apples to apples
30:51 all right so something to think about
30:53 here at the bottom what are some
30:56 variables that we can block for in the
30:58 caffeine experiment that we did so let's
31:00 go with our hypothetical one we didn't
31:02 actually do with the caffeine so I'm
31:04 gonna give you guys cups of soda so
31:06 think about what would be a good
31:09 choice of a blocking variable okay so a
31:11 couple things we can do here now that
31:13 you've thought about it one probably
31:14 pretty good one would be caffeine
31:16 tolerance so if I have 10 of you guys
31:18 who have caffeine a lot and 8 of you
31:20 guys who don't have caffeine as much I
31:22 would take the 10 that'd be my block and
31:25 make sure 5 of you get the caffeine 5 of
31:27 you get the caffeine free then I take my
31:30 8 kids who are not big with caffeine and
31:33 do 4 and 4 and then you compare those
31:35 little groups instead of comparing the
31:38 whole big pile um you can use other
31:40 things like weight athletic ability
31:42 etcetera etc as an experimenter what
31:44 you would do is you would put people
31:46 into piles based on some variable you
31:48 can block by more than one variable but
31:52 you fragment your data into too
31:53 small groups so generally you'll see
31:55 it just with one variable and then what
31:57 you do is you deal out the people in the
32:01 piles randomly in general how can we
32:02 determine which variables might be best
32:04 for blocking just kind of common sense
32:07 logic maybe if you have prior research
32:10 in the area you may know that something
32:12 is going to influence things you don't
32:13 want to just guess something and block
32:15 on it just because it's a good idea
32:16 because you think it's a good thought
32:18 you have to have some motivation and
32:20 reason to believe that what you're doing
32:25 will be helpful all right so I believe
32:27 that this guy right here's my last slide
32:29 let me
32:32 yep it is so we have a fresh context right
32:34 here and I'm not gonna write this I'm
32:36 just going to talk you through it so we
32:41 have um a popcorn aficionado here and
32:45 she has four types of popcorn and she's
32:46 gonna see if the popcorn button on the
32:49 microwave does better with popping the
32:51 kernels than using the amount of time
32:54 printed on the back and when she goes to
32:56 the store she's gonna buy four kinds of
32:58 popcorn
33:00 there's movie butter light butter natural
33:02 kettle for 40 bags altogether
33:04 why would a randomized block design be
33:07 preferable to a completely randomized
33:09 design this I do want to write at least
33:11 something on because this is important
33:12 and something I could very well ask you
33:22 to do on a quiz the type of popcorn is
33:49 how many kernels will blocking accounts
34:02 okay some types of popcorn probably pop
34:04 better than other types of popcorn if
34:05 you've got all sorts of butter wrapped
34:07 around it maybe that stops it from
34:09 popping as well or whatever so rather
34:12 than comparing all 40 bags 20 versus 20
34:14 and oh maybe I ended up with more
34:16 butter in this kind of group right here
34:19 you break it up so you get half and half
34:22 when you do like if I would have done
34:25 just a completely randomized design with all 40
34:28 bags my results are not going to be
34:30 biased and they are not you're not
34:33 running the risk of confounding so those
34:35 are two things students will incorrectly
34:38 assume sometimes your process is still
34:40 random so you're not favoring one side
34:43 over the other consistently just in one
34:46 trial you may have more in one group
34:48 than the other so your results if you
34:50 repeat this you replicate it multiple
34:53 times will have more variability because
34:54 you could get more butter over here this
34:56 time and more butter over here the next
34:58 so your results are more spread out and
35:00 it's a lot harder to tell there's overlap
35:04 for statistical significance okay
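The point about variability can be sketched in code. This is a minimal illustration, not part of the lesson: it assumes 10 bags of each of the four popcorn types (40 bags total) and uses made-up names such as `completely_randomized`, `randomized_block`, and `movie_butter_in_button`. It repeats each design many times and counts how many movie butter bags land in the popcorn-button group.

```python
import random

# Assumption: 10 bags of each type, 40 bags altogether.
TYPES = ["movie butter", "light butter", "natural", "kettle"]
BAGS = [(popcorn_type, i) for popcorn_type in TYPES for i in range(10)]


def completely_randomized(bags, rng):
    """Shuffle all 40 bags together; the first 20 get the popcorn button."""
    shuffled = bags[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"button": shuffled[:half], "package time": shuffled[half:]}


def randomized_block(bags, rng):
    """Block by popcorn type, then randomly split each block 5 and 5."""
    groups = {"button": [], "package time": []}
    for popcorn_type in TYPES:
        block = [bag for bag in bags if bag[0] == popcorn_type]
        rng.shuffle(block)
        half = len(block) // 2
        groups["button"] += block[:half]
        groups["package time"] += block[half:]
    return groups


def movie_butter_in_button(groups):
    """Count movie butter bags that ended up in the button group."""
    return sum(1 for popcorn_type, _ in groups["button"]
               if popcorn_type == "movie butter")


rng = random.Random(0)  # fixed seed so the run is repeatable
crd_counts = [movie_butter_in_button(completely_randomized(BAGS, rng))
              for _ in range(1000)]
block_counts = [movie_butter_in_button(randomized_block(BAGS, rng))
                for _ in range(1000)]

print("completely randomized: movie butter in button group ranges from",
      min(crd_counts), "to", max(crd_counts))
print("randomized block: movie butter in button group ranges from",
      min(block_counts), "to", max(block_counts))
```

Both designs are random, so neither one favors a treatment. The difference is that the block design always puts exactly 5 movie butter bags in each group, while the completely randomized design lets that count drift from one replication to the next, which is where the extra variability comes from.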
35:07 outline a randomized block design for
35:08 this experiment we've talked about that
35:10 before I don't want to write it all out
35:12 right here but basically what you would
35:13 do is you could give me a play-by-play
35:16 with this you would start by putting
35:17 them in groups of 10 so take the 10
35:21 movie butter randomly assign 5 give a
35:22 number to each one and randomly assign 5
35:25 to this group 5 to this group blah blah
35:27 blah proceed and do the same thing
35:28 repeat the process with light butter
35:31 natural etc you just talk in words about
35:33 what you're gonna do making sure you
35:35 highlight those key things we talked
35:39 about earlier in the lesson very last
35:40 thing I didn't want to waste an extra
35:42 piece of paper this is the eighth slide
35:44 but this is actually important and it
35:45 looks not important because I like
35:48 tacked it on the end right here but put
35:52 a star by this last definition matched
35:56 pairs design is important and it's the
35:57 last thing we have to talk about
35:59 in our notes right here matched pairs
36:09 design is blocks of size 2
36:20 with very similar individuals in each
36:23 group so I'm gonna give you a few
36:25 examples of this matched pairs is a
36:28 special type of blocking where you have
36:30 two people in each group and what you do
36:32 is you take those two people and you
36:35 give one one treatment and the other the
36:37 other treatment and then you compare
36:41 those two people okay um very easy
36:42 example to understand say you had
36:44 identical twins and you wanted to test
36:47 um I don't know some sort of medication
36:49 since they're genetically so similar you
36:53 could give one twin one kind of medicine
36:55 and one twin the other and compare
36:57 between the two of them how their
36:59 responses are gonna be now naturally you
37:01 don't always have identical twins and
37:03 that's not even necessary to do a matched
37:07 pairs um let's take our SAT example
37:10 let's say I was gonna do an SAT study on
37:12 you guys in class what I could do
37:15 instead of doing a blocking since you
37:17 guys are all in the same class what I
37:18 could do is take the two people with the
37:21 highest SAT scores in the class or ACT
37:24 scores and put one in each group
37:27 randomly then the next two highest scores
37:29 one here one here next two highest
37:31 scores here here etc so you basically
37:33 are pairing people up who are very
37:36 similar and then you're making sure one
37:38 goes to each treatment there's also an
37:40 example in our book it doesn't even have
37:41 to be two separate people let's say I
37:43 wanted to test like two kinds of
37:45 deodorant to see which one works better
37:47 I can use myself as a matched pairs and
37:50 randomly assign one arm one kind one arm
37:53 the other and compare on myself so you
37:55 can even get creative with things like
37:57 that but matched pairs even further
38:00 accounts for variability because if I'm
38:03 testing both things on myself all those
38:05 other factors about sweat levels
38:07 athletic ability etc etc etc are the
38:10 same so matched pairs is a nice way to
38:12 account for a lot of variability but
38:14 blocking in general whether it's regular
38:17 blocking or matched pairs basically in a
38:20 nutshell makes it easier to tell if what
38:22 we are testing is actually