0:05 Thank you everyone again for being at the
0:08 experimentation session. We have our
0:11 last but not least session from Liz
0:16 Openmire. Liz is a C uh sorry almost Liz
0:18 is a data scientist at Statsig and before
0:22 Statsig she used to work at Meta. At
0:26 Statsig she built a lot of important
0:29 customer-facing features directly, for
0:32 example surrogate metrics, if you are
0:35 familiar with this concept, as well as
0:38 Fieller intervals, as one of the very first
0:41 platforms that has that feature. Um so
0:43 today she's going to talk about 10 types
0:46 of experiments to run before you die. Here
0:47 you go. [Music]
0:52 >> hi everyone and thank you so much for
0:54 coming. I really appreciate you all
0:56 turning out and I hope you're as excited
0:59 as I am to talk about experiments today.
1:01 Um I know it's a bit of a clickbaity
1:05 title. Uh but uh hopefully the talk
1:08 itself lives up to this hype. Um I think
1:10 there's a little bit of an elephant in
1:11 the room. Obviously I'm going to try to
1:12 convince you today that you should be
1:14 running more experiments and I work for
1:16 Statsig, a company that's trying to sell
1:20 you experiments. Um, but hopefully my
1:22 content today will be convincing enough
1:25 that you can look past my impure motives
1:27 and uh hopefully run some more
1:28 experiments.
1:31 >> She do. I wanted to start with this
1:34 quote by George Box that all models are
1:37 wrong but some are useful. It's a very
1:40 popular adage in kind of analytics and
1:42 experimentation community. You're all
1:43 probably familiar with it. I've found that
1:46 it's very controversial but I still like
1:49 it. Um, but I think that folks don't
1:50 like it because they're like, "Oh, all
1:52 models are wrong. What do you mean
1:55 by that? They're all wrong." Um, but
1:57 what we mean by that is every model is
2:00 going to have uncertainty in its results
2:02 and is built on assumptions. And
2:05 there's this inherent tradeoff between the
2:07 assumptions you're willing to make and
2:09 how certain you can be about your
2:11 results based on those assumptions. And
2:14 that's a very core tension to model
2:16 building, right? And it is going to be
2:18 something we'll talk about today with
2:19 all of these 10 different types of
2:21 experiments. And when I talk about
2:24 models, I also mean causal inference
2:39 >> sorry, spoilers.
2:43 Um, which is a good experiment.
2:46 Um, but I've obviously butchered this
2:48 quote. I've made it way too complex and
2:50 have been very pedantic about it. Uh,
2:53 but unfortunately as data scientists, I
2:54 feel like that's within our fatal flaw.
2:56 We want to be really precise, really
2:59 talk through all our assumptions when
3:00 business folks are maybe like just tell
3:02 me what just tell me what you want me to
3:05 do. Um, but for a data science audience,
3:07 for a data audience like this, I wanted
3:10 to walk through all of the nuance to
3:13 this kind of a quote, right?
3:15 With that in mind, when I'm talking
3:16 about experiments, I'm usually talking
3:20 about randomized control trials. And why
3:23 is that? Has anyone here run other types
3:25 of causal inference studies like
3:28 behavioral studies? A lot of
3:30 econometrics is built off behavioral
3:32 studies. A lot of different areas of
3:35 academia rely on this. Uh but the
3:37 painful thing is it's really hard and
3:39 you're working with a lot of assumptions
3:41 and you're working with a lot of
3:44 confounders. However, with
3:46 experimentation and in particular
3:48 randomized control trials, you have the
3:50 benefit of randomization doing a lot of
3:53 heavy lifting for you. You get to be a
3:55 little lazy, right? You're like, well,
3:57 in expectation if the sample size is big
3:59 enough, it'll just all balance out, right?
4:01 Any confounder that you think about or
4:03 even those that you don't think about at
4:06 all will be balanced out when you
4:07 have enough of a sample size. And what
4:09 that actually does practically is free
4:12 up brain power for you know tackling the
4:14 corner cases, the edge cases,
4:16 communicating to stakeholders, doing
4:18 everything else other than trying to
4:20 convince people that your assumptions
4:22 are valid and that we should run it this
4:25 way rather than this other tweaked way.
4:27 Uh because they're very, you know, well
4:29 structured assumptions. I know that what
4:31 I basically said is well I don't like
4:34 making assumptions like the parallel
4:36 trends assumption for a diff-in-diff uh
4:40 behavioral study, but I love making SUTVA. SUTVA
4:42 is my favorite assumption. Um but I
4:44 think that it's about a little bit of
4:45 the framing and the standard practices
4:47 of experimentation that that make them
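The point about randomization balancing even the confounders you never thought about can be illustrated with a small simulation. This is only a sketch under assumed conditions: the hidden confounder is a made-up standard normal variable, assignment is a fair coin flip, and `confounder_gap` is a hypothetical helper name, not something from the talk.

```python
import random
import statistics


def confounder_gap(n_units, seed=0):
    """Randomly assign units and return the gap in a hidden confounder between arms."""
    rng = random.Random(seed)
    # Hypothetical confounder (for example prior engagement) that is never measured.
    confounder = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    # Fair coin flip per unit: True means the unit lands in the test arm.
    in_test = [rng.random() < 0.5 for _ in range(n_units)]
    test = [c for c, t in zip(confounder, in_test) if t]
    control = [c for c, t in zip(confounder, in_test) if not t]
    return abs(statistics.mean(test) - statistics.mean(control))


small_gap = confounder_gap(100)
large_gap = confounder_gap(100_000)
print(f"gap with 100 units: {small_gap:.3f}")
print(f"gap with 100,000 units: {large_gap:.3f}")
```

The gap between arms is random for any single run, but its typical size shrinks roughly with the square root of the sample size, which is the "in expectation, if the sample size is big enough" argument.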
4:53 The rest of this talk really is going to
4:56 be 10 different experiments that you can
4:58 run. Uh it's not that original. The
5:01 structure isn't mind-blowing by any means. Uh
5:03 so we really will be going through 10
5:05 different experiment types. Uh there
5:06 will be the basic experiments that I am
5:09 thinking probably everyone here has run
5:11 or heard of or interacted with before.
5:13 Um there will be the hard questions
5:15 where we have some like atypical
5:17 situations where you might need to tweak
5:19 how you're setting up your experiment to
5:22 handle that. And then I'll be looking at
5:23 what I'm calling jet fuel, which is
5:25 things that help you speed up your
5:27 experimentation practices and make
5:31 decisions faster.
5:33 With our basic experiments, like I said,
5:36 they're basic, but they're fundamental.
5:37 Uh we're actually going to spend most of
5:40 our time here in the list because these are
5:42 kind of the things that most people
5:43 are going to potentially be working
5:46 with. Our first example that we'll be
5:47 talking about, our first type of
5:49 experiment is the standard AB growth
5:52 test. And this is actually what most
5:55 experiments are in practice uh in the
5:58 industry, right? Because a lot of times
6:00 we have folks who are doing marketing
6:02 and want to get adoption for whatever
6:05 their product is, whatever their company
6:09 is. Um so you might realize that I am
6:11 definitely not a marketer because my
6:13 choice for the treatments here were
6:15 probably pretty terrible. I have the
6:17 control, which just tells you to sign up today,
6:21 please. I have uh another uh test arm
6:23 that says, "Hey, if you sign up today,
6:27 you can get a small sliver of Bitcoin."
6:30 And there's a third version where I say
6:32 offer expires today. We'll match your
6:35 first deposit. Um so these are all
6:37 potential different call to actions that
6:40 might happen on an email or a landing
6:43 page or something like that. And what we
6:46 can do here is really measure the
6:49 outcomes of those super effectively. But
6:51 I actually want people to think beyond
6:54 just the basic metrics of like, okay,
6:56 did they actually sign up? Did they
6:58 check it out? Were they a visitor to the
7:00 site? And think about much further downstream
7:03 impact than just did they click the
7:05 button? And did they convert on this one
7:07 instance? I'd love to ask the room. Does
7:09 anyone have an idea of what may be a
7:11 good additional metric to use in this
7:13 situation? >> Yeah.
7:15 >> Second transactions,
7:17 >> second time transactions was what he
7:19 said, right? You care about the lifetime
7:22 of that user. Did getting that email
7:25 make them a repeat retentive customer.
7:27 Was that onboarding experience something
7:29 that impacted them down the line and for
7:32 the long term? Because often times uh
7:34 when we're making these kinds of
7:37 experiments, it's the entry point to a
7:39 product and that can have really big
7:42 implications beyond just did they sign
7:45 up after seeing this.
7:47 >> Anyone else have any additional ideas?
7:48 Yeah, definitely.
7:53 >> Churn or cancellation.
7:55 >> Churn or cancellation. Yes. Did we
7:59 have clickbait in our call to action and
8:00 people clicked on it, so that we achieved
8:02 that goal, but they actually churned
8:04 immediately after because they realized
8:06 maybe it wasn't as good of a deal as
8:08 they thought. We got them to do that
8:11 initial click but not follow through on
8:13 what we wanted them to do. Those are some great
8:15 ideas. I'm sure everyone has tons of
8:17 ideas. It's rare to be running an experiment
8:19 where we only care about three metrics,
8:22 right? Um but if we talk about every
8:24 experiment type in this detail, we'll be
8:27 here for the rest of the afternoon. So um
8:33 the next one I want to talk about is the
8:36 standard AB product test. The way that
8:38 this is going to differ from the growth
8:40 test is that people are already on your
8:42 product when this is happening. It could
8:44 be something like a notification test.
8:47 Uh who here likes notifications on their
8:49 phone?
8:52 No volunteers. No volunteers. Sometimes
8:54 that was actually probably the most
8:55 honest answer, right? Sometimes they're
8:58 really helpful, sometimes they're super
9:00 annoying. And so this is going to be
9:01 another common type of test that people
9:04 will do to kind of understand how users
9:06 interact with their product and what's
9:13 sorry. Um, I know that none of that was
9:15 probably a surprise to any of the folks
9:17 in this room, but I do want to emphasize
9:19 that, uh, this is going to be a
9:22 commonplace type of experiment, but they
9:24 can get really complex really quickly.
9:25 And there are a lot of ways to think
9:27 outside the box even with just the basic
9:30 types of experiments. For example, if we
9:32 think about things like UI changes, the
9:34 classic example, it's like okay, it's
9:36 just a button test, right? Button, blue
9:36 button, contrived example. But there's a lot
9:39 of things that you can do that are behind
9:41 the scenes and that have a big impact
9:46 rather than just like, oh, the UI looks
9:48 a little different now. Um, if anyone
9:50 here's a user of Statsig, you've been a
9:53 part of one of my experiments, which is
9:56 uh doing different strategies for
10:00 caching and serving and querying the
10:03 different uh metrics explorer queries
10:07 you might be using. Um, so again, it
10:09 doesn't just have to be UI. There was no
10:10 human involved in that at all. It was
10:13 just different query techniques. Um,
10:15 next, it's the population that you're
10:17 working on. So for the experiment I was
10:19 talking about, my population wasn't users. I
10:22 kind of just assumed that faster query
10:24 equals better, which is a pretty safe
10:27 assumption. Um but what that meant is
10:29 that I could use a different unit type,
10:30 being each individual query being
10:34 randomly assigned, rather than needing to use
10:36 uh users, which is a really typical unit
10:38 of randomization. Um, so this could
10:40 really be anything. You could have sessions
10:42 as the unit identifier. And obviously as
10:44 you're choosing your unit of
10:45 randomization, you're going to want to
10:47 look through your assumption list of
10:48 whenever you're running an experiment of
10:51 like, hey, are these going to, you know,
10:54 conform to SUTVA? Is this reasonable? Like
10:56 what do I need to do in this situation?
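Choosing the unit of randomization comes down to which identifier gets mapped to an arm. One common way to do this is a deterministic hash of the unit ID; the sketch below assumes that approach and is not a description of how Statsig assigns units. The `assign` function, the experiment name `"caching"`, and the `query-…` IDs are all hypothetical.

```python
import hashlib


def assign(unit_id, experiment, arms=("control", "test")):
    """Deterministically map a unit (user, session, or query ID) to an arm."""
    # Salting with the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(arms)
    return arms[bucket]


# The same unit always gets the same arm, whatever the unit type is.
print(assign("query-123", "caching"))
print(assign("user-42", "caching"))
```

Swapping the unit type is then just a matter of which ID is passed in, so the real question is whether units of that type can reasonably be treated as independent of each other.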
10:59 Um, the next piece of advice for these
11:00 kinds of experiments, just don't mess
11:03 things up. Easy as that, right?
11:05 Um I think that it can be really
11:07 sneaky to uh get bugs or regressions in
11:09 something that you're testing. And
11:10 honestly, when you're running an
11:12 experiment, that's probably the biggest
11:15 value of having an experiment run, is that
11:17 you catch those mistakes or things that
11:19 are really bad for your business even
11:20 though you thought it was a great idea.
11:21 And either you can tweak the
11:24 implementation of it or you can be like, we
11:26 better scrap this idea altogether
11:29 because it's just not. Um and then
11:32 measurement is not one-size-fits-all. Um
11:34 different products are going to require
11:36 uh you know different measurement and
11:38 this is really where domain expertise
11:40 comes into play. You know, uh it
11:43 isn't just like a black box, like any
11:45 data scientist can make good decisions
11:48 about anything. There is a level of in-depth
11:50 knowledge that really helps when you're
11:58 I have the third type of test up here
12:00 and I'd love if anyone has a guess for
12:05 what type of test this is.
12:06 I'll give you a second reading it, but
12:08 I'd love to hear if anyone has a guess
12:18 Any guesses? No volunteers? You can
12:20 just shout it out.
12:22 >> what's
12:24 performance to this. That's a good
12:27 guess. Um, this is actually a really
12:30 messed up AA test. Um, obviously I
12:32 cherry-picked this example, you know, uh,
12:34 I'll own up to that, but I think
12:36 that it's really important to be running
12:38 AA tests as well, to understand, like,
12:41 is your randomization working when there
12:43 is no real change, and also kind of just
12:45 to get a sense of an AA test. Like,
12:47 they can really sneak up on you if you
12:50 like have a real test that's running
12:53 where you think it's an AB test but it
12:55 actually didn't really change anything
12:56 meaningfully and it's actually an AA
12:58 test. Those can be really sneaky and you
13:00 might be shipping things that don't make
13:02 a difference, because with a 95%
13:03 confidence interval, which is pretty
13:06 industry standard, yeah, there's that 5%
13:07 chance that, you know, you're getting
13:11 those false positives, right? Um so I think
13:12 that the AA test can be a really
13:14 powerful tool, (a) for making sure that
13:17 your randomization and telemetry is all
13:19 working correctly and (b) just to kind of
13:21 familiarize yourself with the concept to
13:23 be like wait am I am I getting tricked?
13:24 Am I getting tricked by something that's
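The "5% chance" remark can be checked with a quick simulation of many AA tests. This is a minimal sketch under assumptions that are not from the talk: both arms draw from the same standard normal distribution, and each test is a simple two-sided z-test at `alpha = 0.05`. The function name and default sizes are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def aa_false_positive_rate(n_tests=1000, n_per_arm=100, alpha=0.05, seed=1):
    """Simulate AA tests (no true difference) and count how often they look significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    significant = 0
    for _ in range(n_tests):
        # Both arms come from the same distribution, so any "win" is a false positive.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        standard_error = (variance(a) / n_per_arm + variance(b) / n_per_arm) ** 0.5
        z = (mean(a) - mean(b)) / standard_error
        if abs(z) > z_crit:
            significant += 1
    return significant / n_tests


rate = aa_false_positive_rate()
print(f"share of AA tests flagged as significant: {rate:.3f}")
```

If the share on a real platform is far from `alpha`, that suggests a problem with randomization or telemetry rather than a real effect.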
13:30 The fourth type of experiment that I
13:34 wanted to talk about was holdouts.
13:36 Uh there are different ways to do holdouts
13:39 or back tests, but it's basically
13:41 kind of this big umbrella of experiment
13:44 types where we're talking about withholding
13:46 products that you've shipped from a
13:48 certain set of the population. Um this
13:51 beautiful chart is actually from Etsy.
13:53 Uh they have a really great blog called
13:55 Code as Craft that I've really really
13:57 enjoyed their articles on. Um but this
13:59 is how Etsy runs their holdouts, right?
14:02 They're shipping things across, you
14:04 know, a quarter and then they have a
14:06 comparison period between two untainted
14:09 samples that have not gotten any of the
14:11 shipped experiences during the quarter
14:13 and they compare them into a sack. This
14:15 is one of the ways that you can run a
14:16 hold out, right? you're saying like,
14:19 hey, how do I account for the winner's curse and
14:21 make sure my shipped experiments are
14:23 actually doing good? But I think there's
14:26 also a really interesting methodology um
14:29 that we like at Statsig, which is uh
14:32 comparing what you're shipping to that
14:35 hold out during that Q1 because it can
14:37 help you kind of understand what is the
14:39 total impact of my experimentation
14:41 program over time. You can also kind of
14:44 look at this daily time series type view
14:47 and you can see kind of intuitively the
14:49 impact of different rollouts as they
14:51 happen. Right? Any roll out before
14:53 you've shipped anything starts as an AA
14:55 test. So that looks really reasonable
14:57 that at the start there's there's zero
14:58 difference between the test and control
15:00 group. Sorry, I realize I haven't
15:02 explained this visualization at all as I
15:05 got here. Um but basically this
15:07 visualization is comparing your test to
15:09 your control group over the different
15:12 days uh that are happening here. And so
15:16 when this is a holdout uh you're
15:17 getting that sense of like okay as
15:21 certain uh launches are happening what
15:23 is the impact on the total population
15:25 and what is their aggregated impact at
15:33 That ends our section of the basic
15:36 experiments. Um, and so we're going to
15:38 be able to move on to the hard
15:41 questions. And I think that when we go
15:43 on to this next section, what I really
15:45 want to emphasize is these may not be
15:49 your hard questions, but they are
15:50 existing ones. And there's a lot of
15:52 literature out there about these kinds
15:54 of challenges that you might be facing.
15:57 So just kind of you know thinking about
15:59 like the experimentation community and
16:01 what kind of solutions there might be uh
16:03 for different challenges that folks
16:05 face. Uh one of them that I wanted to
16:08 talk about was interference.
16:10 Um I've talked about SUTVA a lot. My
16:11 favorite assumption as I told you
16:14 earlier. Um we're basically assuming
16:16 that every unit that we're experimenting
16:19 on has a stable treatment. There's no
16:22 interference between those units. It's
16:24 nice and clean. Everyone's independent
16:26 of each other. I love assuming things
16:29 are independent variables, right? Um so
16:32 what about when SUTVA is violated? Like if
16:34 we're putting up a billboard in a giant
16:36 city, you know, shameless plugs for
16:40 Statsig too, obviously. Um but
16:43 basically when we're treating populations
16:45 and we can't control for these kind of
16:48 network effects, um there are a few ways
16:50 that we have to deal with this kind of
16:54 violation of SUTVA and be a little creative here.
16:58 Uh one of them, if it could be called an
17:02 experiment, maybe it is, there's some
17:04 fuzzy definitions here, but basically
17:06 using a synthetic control can be
17:09 really helpful for this type of situation uh
17:12 because you don't know,
17:14 like, what if the billboard wasn't there
17:17 in that particular city? So you have a
17:20 synthetic control modeling what that would look
17:23 like from units that aren't having
17:24 billboards,
17:28 then kind of compare that test result
17:30 that you're observing to the synthetic
17:34 control that you built, right? Um an example of
17:36 how this works would be let's say you
17:38 know I'm changing something about the
17:39 vibe of Seattle and I'm like well
17:42 Seattle's kind of like if you mix
17:46 SF and Boston. I would basically be like
17:49 okay that's what the vibe of Seattle is, and
17:52 I'm going to do something that changes the vibe
17:55 of Seattle. Um and so what I would do if
17:57 I'm making that experiment is I would
18:02 say okay well based on how SF
18:05 and Boston are doing, that's my synthetic
18:08 control, because I can use that model and
18:10 it's going to spit out what
18:13 the vibe of Seattle would be if I had no
18:15 treatment, and that works as long as I
18:19 make sure those cities are untreated.
18:20 So it lets you be a little bit
18:23 creative in how you're measuring things
18:25 and it works really well when you have network
18:26 effects or you don't have a lot of sample
18:28 size. You're kind of bumped by some of
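The Seattle example can be sketched as a tiny two-donor synthetic control. Everything here is illustrative: the numbers are invented, the helper names are hypothetical, and the weights come from plain unconstrained least squares on pre-treatment data. Standard synthetic control methods usually constrain the weights to be non-negative and sum to one, which this sketch does not do.

```python
def fit_two_donor_weights(donor_a, donor_b, target):
    """Least-squares weights so that w_a * donor_a + w_b * donor_b tracks the target."""
    saa = sum(a * a for a in donor_a)
    sbb = sum(b * b for b in donor_b)
    sab = sum(a * b for a, b in zip(donor_a, donor_b))
    say = sum(a * y for a, y in zip(donor_a, target))
    sby = sum(b * y for b, y in zip(donor_b, target))
    det = saa * sbb - sab * sab  # determinant of the 2x2 normal equations
    if det == 0:
        raise ValueError("donor series are collinear")
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


# Made-up pre-treatment metric for the untreated donors and the treated city.
sf_pre, boston_pre = [10, 12, 11, 13], [20, 18, 19, 21]
seattle_pre = [15, 15, 15, 17]
w_sf, w_boston = fit_two_donor_weights(sf_pre, boston_pre, seattle_pre)

# Post-treatment: donors stay untreated, Seattle gets the treatment.
sf_post, boston_post = [12, 14], [20, 22]
seattle_post = [18.0, 20.0]
counterfactual = [w_sf * s + w_boston * b for s, b in zip(sf_post, boston_post)]
effects = [obs - cf for obs, cf in zip(seattle_post, counterfactual)]
print(w_sf, w_boston, effects)
```

The estimated effect is the gap between what Seattle actually did and what the SF and Boston mix says it would have done, which only holds if the donors really are untreated.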
18:35 Sorry. Well, we'll grab pushies at the
18:37 but I know there's a lot to discuss
18:40 here. Obviously this is kind of like a
18:42 bunch of different uh experimental
18:44 techniques, but we can definitely chat
18:45 after and I think there will be
18:47 different questions at the end too. Um,
18:50 another way to handle this same issue is
18:53 switchbacks. Um, this is really frequently
18:55 used in like the ride share problem
18:59 where it's like you have two-sided markets. Uh,
19:01 or in gaming, right, where you try to match
19:03 people to play against each other. Um,
19:05 not only do you have the issue of, you
19:07 know, spillovers between units, but you
19:10 also have, you know, people who are on
19:13 different teams or people who are buying
19:16 the, you know, ride share, uh, you know,
19:18 rides and people who are driving are in
19:22 the marketplace, right? Um, so these can
19:24 make, uh, you know, an already tricky
19:28 problem even trickier, right? Um but the
19:31 uh example of switchbacks that is really
19:33 interesting because instead of
19:38 experimenting on uh units of, you know, drivers
19:40 or riders or different people playing a
19:43 video game, you're experimenting on
19:45 units of time which might also be broken
19:47 up into like different geographies or
19:48 maybe it's different servers, to use the
19:50 video games example. And what you're
19:53 doing is you're kind of swapping in your
19:55 treatment and your control in these
19:58 units of time. Uh this obviously comes
20:01 with some new assumptions, right? You
20:02 probably don't want it to be something
20:04 super visible to the customer. So this
20:06 is something more like, oh, Uber's
20:09 pricing could be experimented on this way or
20:12 the different matching algorithm in the
20:13 video game could be experimented on this
20:15 way, but not like the UI changed. that
20:17 would be super jarring if it's like well
20:20 I saw that 10 minutes ago but it's not
20:24 there anymore. Um there are also some
20:25 other constraints of this kind of
20:27 methodology: you have this kind of
20:30 burn-in period in uh the diagram that I'm
20:33 showing, where we actually don't attribute those to
20:36 either the test or the control, and we
20:37 need to come with things that are relatively
20:40 short-term too. That's actually why the
20:43 matchmaking and the Uber example work
20:45 really well, because when you're on the Uber
20:47 app, right, you're usually able to
20:49 make a quick decision of like, hey, I want
20:51 to get this ride here. Here's the price
20:53 option. Cool. Let me book it. When
20:54 you're getting matched to play a video
20:56 game, right, you kind of enter the
20:58 like gameplay screen and there's
20:59 matching that happens and you play the
21:02 game. They're all relatively short-term.
21:04 So you can cut it with this time period
21:07 and use that as your randomization and
21:09 kind of stick it to later. Whereas like
21:11 for a billboard there's no way that
21:13 we're taking it down, putting it up, taking
21:15 it down, putting it up. You know, there's just
21:19 these practical constraints, right?
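The switchback idea of randomizing units of time, with a period at the start of each window that counts toward neither arm, could be sketched like this. The 30-minute windows, the 5-minute excluded period, and the function names are assumptions for illustration; a real design might exclude data only when the arm actually changes, or split further by geography or server.

```python
import random


def switchback_schedule(n_windows, seed=0, arms=("control", "test")):
    """Randomly assign each time window, rather than each user, to an arm."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_windows)]


def arm_for_event(minute, schedule, window_minutes=30, burn_in_minutes=5):
    """Return the arm an event counts toward, or None if it should be excluded."""
    window = minute // window_minutes
    if window >= len(schedule):
        return None  # event falls after the experiment ended
    if minute % window_minutes < burn_in_minutes:
        return None  # start of a window: attributed to neither test nor control
    return schedule[window]


schedule = switchback_schedule(n_windows=8)
print(schedule)
print(arm_for_event(2, schedule), arm_for_event(10, schedule), arm_for_event(40, schedule))
```

This only makes sense for short-lived interactions such as booking a ride or getting matched into a game, where an event starts and finishes inside one window.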
21:22 Um moving on to a different area, not uh
21:24 dealing with those kind of uh violations
21:28 of SUTVA in different forms. Uh the next one is
21:30 about elasticity.
21:32 Um, I think that they're really cool
21:33 because they help you turn
21:36 experimentation from a tool to make one
21:38 decision into a tool to make many
21:42 decisions, right? The idea is that let's
21:44 say, you know, that I'm improving the
21:47 performance and latency of the metrics
21:48 queries that I was talking about
21:50 earlier. I know that's I've made that a
21:53 screen and you know can see hey load and
21:57 MP issues that trouble so much. If I
21:59 decrease load time, will people
22:02 actually, you know, stay there longer or
22:04 like have a better experience because of
22:07 that? And how much is really worth it,
22:09 right? Should I have my engineers spend
22:12 all quarter on decreasing
22:15 latency, or worry about other things
22:17 like building features and, you know, all
22:19 of the many other things they could be
22:22 working on, right? Um so I think that
22:25 this is really powerful to
22:28 understand: like yes we want to make the
22:30 right decision on an individual case but
22:32 also how do we keep making the right
22:35 decisions on what to prioritize? Um and so
22:37 this can help answer the questions if we
22:40 have a finite amount of load time
22:43 decrease and we see how other
22:47 metrics are affected. Um, another way to be able to
22:49 achieve the same kind of learning is using a
22:53 regression test, in which I on purpose make the
22:56 product worse. This is very
22:57 controversial. I know several companies
23:00 that are just fundamentally like, you know, we
23:01 don't we don't do regression testing
23:04 actually. Um, and it's very
23:07 controversial because you are making an
23:10 experience worse for people and you're
23:12 potentially causing your customers to
23:14 churn. like it it could really hurt your
23:17 metrics overall, right? Um but kind of
23:21 that balance between learning and uh
23:22 making the right thing. If you're
23:24 familiar with like ML works, you
23:26 probably know explore exploit, how much
23:29 are you wanting to explore here and how
23:31 fruitful is it going to be to user
23:33 population if you do this kind of
23:37 experimentation, right? I but
23:40 interesting because it does help you,
23:41 right? You get to make those
23:43 prioritization decisions really clearly
23:48 based on data instead of vibes.
23:51 >> Okay. Update. Next section that I'm
23:52 talking about is jet fuel: things that help you
23:56 experiment really really quickly. Um I
23:59 think that's important in terms of
24:02 addictums and what's nice we actually
24:04 borrow a lot from like medical
24:07 literature in the past um because when they
24:09 were running experiments they were you
24:13 know people see right so they have a lot
24:15 of techniques to be able to make
24:17 decisions quickly and either really
24:19 quickly stop something that's hurting
24:22 people or really quickly uh ship something
24:26 that's helping people Okay,
24:28 one example of this is SPRT or
24:31 sequential probability ratio tests, which
24:33 are basically a different way of
24:36 thinking through the problem. Instead of a
24:39 p-value we have this probability ratio,
24:43 like this likelihood ratio. One benefit is that
24:45 it's kind of eating your vegetables, like you
24:48 really have to do a power analysis for
24:51 every single metric that you're looking at and
24:54 decide up front, like, okay, what is the evidence
24:56 that convinces me there's no effect?
24:57 What is the evidence that convinces me
25:00 there is an effect? Which we should all
25:02 be doing, power analysis, for like
25:05 your classic frequentist experiment. But
25:07 as Dylan was talking about earlier, you
25:11 know, not always necessarily, but with
25:13 SPRT, it's fundamentally part of
25:15 the process. So it does make everyone
25:17 eat their vegetables. And then also it
25:19 helps make decisions for sure in most
25:21 cases where you're able to kind of
25:23 quantify like do I have evidence that
25:27 this is you know not at all different or
25:28 do I have evidence that there isn't kind
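The likelihood-ratio idea can be sketched with Wald's classic SPRT on a single conversion rate. This is a simplification chosen for illustration, not the two-arm setup the talk has in mind: `p0` and `p1` are assumed "no effect" and "effect" conversion rates that have to be chosen up front, and `alpha` and `beta` are the assumed error rates.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.2):
    """Wald's SPRT for a conversion rate: H0 says rate = p0, H1 says rate = p1."""
    upper = math.log((1 - beta) / alpha)  # crossing this is evidence for an effect
    lower = math.log(beta / (1 - alpha))  # crossing this is evidence for no effect
    llr = 0.0  # running log-likelihood ratio of H1 versus H0
    for n, converted in enumerate(observations, start=1):
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(observations)


print(sprt_bernoulli([1] * 10, p0=0.1, p1=0.2))
print(sprt_bernoulli([0] * 50, p0=0.1, p1=0.2))
```

Both stopping boundaries are fixed before any data arrives, which is the "eat your vegetables" part: the effect size and error rates have to be stated for the test to run at all.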
25:37 Another thing I think is great for this is bandits. Again,
25:40 not exactly a randomized trial, but you
25:43 do know the bias that you're introducing. And
25:46 the problem that it solves here is if we
25:48 see this cumulative success rate over
25:51 time and our probability of a variant
25:53 being tested over time, it's such a bummer that
25:57 we're still making people use the circle
25:58 variant that's in the red and under
26:02 there like I don't know you get pretty
26:03 early on that maybe that's way to go
26:08 with neutron um so the great thing that
26:11 they do is say hey let's take that of
26:13 best variant over time and let's use
26:16 that as the sign
26:18 I think that was actually totally wrong
26:21 let's see specific but like we need
26:22 different types of
26:27 Yeah, uh but I think again, like sequential
26:30 probability ratio tests, these help you uh
26:33 make decisions early but also not make
26:35 decisions too early if you're worried, right? You're
26:37 just doing differential allocation
26:39 you're not necessarily like taking
26:41 something away when there still could be
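Differential allocation can be illustrated with Thompson sampling, one common bandit rule. The talk does not say which allocation rule the slide used, so this is an assumed example: the conversion rates are invented, and each round draws from a Beta posterior per variant and serves the variant with the highest draw.

```python
import random


def thompson_bandit(true_rates, n_rounds=5000, seed=0):
    """Allocate traffic with Thompson sampling and return how often each variant was served."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins, losses, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(n_rounds):
        # Sample a plausible conversion rate per variant from its Beta(wins+1, losses+1) posterior.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = samples.index(max(samples))
        pulls[arm] += 1
        # Simulated user outcome using the (normally unknown) true rate.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls


pulls = thompson_bandit([0.05, 0.20])
print(pulls)
```

The weaker variant keeps a small share of traffic instead of being switched off outright, which matches the point that nothing is taken away while it could still turn out to be better.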
26:49 and the last type of experiment that I
26:51 wanted to talk about was interleaving
26:53 experiments, which are very,
26:56 very domain specific, right? Um interleaving
27:00 experiments are for sort of like a ranking
27:02 type situation, like if you're searching
27:05 for a product on Amazon: well, what are the
27:06 results that they show you and in what
27:08 order, right?
27:10 So you're able to have these kind of two
27:13 outputs of what those should be and
27:16 intersperse them, and
27:19 then uh be able to understand user
27:21 preference and make these decisions
27:22 faster.
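Interspersing two rankings and reading preference from clicks is often done with team-draft interleaving. The talk does not name a specific scheme, so this is an assumed choice, and the function names and example items are hypothetical. The sketch also assumes each ranking has no duplicate items.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, seed=0):
    """Merge two rankings into one list, remembering which ranker contributed each item."""
    rng = random.Random(seed)
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, credit = [], {}
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        # The ranker with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = rng.choice(["A", "B"])
        candidates = [item for item in rankings[team] if item not in credit]
        if not candidates:
            # This ranker has nothing new left, so the other one fills the slot.
            team = "B" if team == "A" else "A"
            candidates = [item for item in rankings[team] if item not in credit]
        interleaved.append(candidates[0])
        credit[candidates[0]] = team
        picks[team] += 1
    return interleaved, credit


def score_clicks(clicked_items, credit):
    """Count clicks per ranker; the ranker with more credited clicks is preferred."""
    wins = {"A": 0, "B": 0}
    for item in clicked_items:
        if item in credit:
            wins[credit[item]] += 1
    return wins


interleaved, credit = team_draft_interleave(["x", "y", "z"], ["y", "w", "x"], seed=3)
print(interleaved, score_clicks(["y"], credit))
```

Because every user sees results from both rankers in one list, each session carries a direct comparison, which is why this kind of test can reach a decision with less traffic than a split AB test.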
27:27 Um and basically um what you're also
27:29 doing at the same time is your
27:33 understanding as a byproduct of this and
27:35 again very domain specific helps you
27:37 understand your specific topic much
27:40 better. Um but it is worth talking about
27:43 this because um this is one of those
27:45 cases where I'm like okay you actually
27:48 get to learn really well. Uh but also uh
27:50 you know you have the standard
27:52 experiment side um and you have a little
27:55 product of safe learn and this
27:58 experiment. Um okay I had you sitting
28:01 here for a long time. So I want to get
28:03 you all standing up through this last
28:06 little piece of participation and my 10
28:17 Yeah, still on. Um, and if you have done
28:20 four or fewer of these experiments,
28:22 please sit down. >> Okay.
28:25 >> Okay.
28:28 >> If you've done five or fewer, please sit down.
28:34 >> If you've done six or fewer, sit down. I think you
28:37 know where this is going, right? Um,
28:40 sound interest. Yes. I'm not giving in
28:43 the back the chance to participate.
28:45 How everyone's standing in the back has
28:53 please sit down
28:56 or senior please sit down. You have done
29:00 all okay but I think nine is all right.
29:02 Has anyone done all 10? Raise your hand
29:05 if you've done all eight.
29:09 Okay. But you can I click on four. I
29:11 don't know. I don't want to make you can
29:14 I pick more you've done.
29:16 >> Yeah. What's your favorite
29:21 switch? Switch back to game. Yeah. So,
29:23 hopefully this gives you some
29:26 inspiration. Um this not exhaustive by
29:28 any means. They're just ones I really
29:32 like. Um and so, uh if you'd like to, uh
29:33 I'd love to also figure out if you have
29:36 an extra one thing I missed. Um but I
29:38 think we're close to time. So, if anyone
29:40 has any questions, um, I'd love to
29:42 address them, but also know I'll be
29:45 around. I'm happy to talk to people later, too. But we have a question?
29:50 your favorite.
29:53 >> Oh, well, my favorite, I really like holdouts. I
29:55 know they're basic, but there's just so
29:58 many ways that you can do them. Um
30:00 because you can do, like I showed you, the
30:03 way that Etsy does them, versus comparing
30:04 your whole experimentation program to
30:06 a control that's being held out, versus
30:08 doing a back test to basically like
30:10 double confirm what you do. They're just
30:13 so versatile and like it's like uh
30:15 double checking your homework too. So you
30:23 >> Talking about synthetic controls, how do
30:24 you compare them with pre-post testings?
30:26 because but synthetic synthetic controls
30:29 has a hard questions. So that would be
30:30 with pre-post. How do you compare those
30:31 two?
30:34 >> So our synthetic control methodology
30:37 that I'm used to using is not actually a
30:40 pre-post methodology. Um it's basically
30:43 using other units that are not being
30:46 treated to uh kind of calculate
30:50 counterfactual. Um, so basically instead
30:53 of using the pre-period, you can use
30:56 different units that you can kind of
30:57 model into the unit that you're
31:00 experimenting on, right? Um, and that way
31:02 what you can do is still account for
31:04 those kind of like time seasonality
31:05 things, right? Like I don't want to
31:09 compare um Black Friday sales to two
31:11 weeks ago sales in like a pre-post
31:13 situation. So I think that then
31:16 constructing your synthetic control not
31:18 from pre-experiment data but instead
31:19 from different units that aren't being
31:22 treated can be really powerful. Yeah.
31:24 >> The the diagram.
31:28 >> Yeah. Yeah. Oh. Oh, sorry. I
31:29 misunderstood what you're saying. Okay.
31:31 Okay. But I think that is cuing from
31:34 that pre-experiment period, right? If
31:36 you can, you know, construct your model,
31:38 you know, just, uh, constructing your
31:40 model based on some of the data, uh,
31:42 confirming your model based on some of
31:45 the data, right? Um, you can figure out
31:48 what your MSE is for that model, right?
31:50 And then you can kind of understand how
31:54 the uncertainty added to your analysis
31:57 by the fact that your uh control group
32:01 is a model can be taken into account, right?
32:07 We can talk later too like more in
32:10 conversation. Yeah. Um I think we are at
32:11 time but again I would love to talk to