YouTube Transcript:
Median Elimination

Skip watching entire videos - get the full transcript, search for keywords, and copy with one click.

Video Transcript

Video Summary

Summary

Core Theme

This content explains the "Median Elimination Algorithm," a method for solving multi-armed bandit problems by iteratively eliminating suboptimal arms based on their estimated values, aiming to identify an approximately optimal arm with high probability and bounded sample complexity.

Key Points

[Music]

for

that

is okay good took a while to write uh so essentially what we do is we start off with

K arms right and then in the first round okay so median elimination is going to proceed in

rounds okay in the first round we pull each arm some number of times right determined by

this magic quantity so can people read this magic quantity it's 1 by Epsilon L by 2 whole

squ log 3 by Delta L

okay so I pull each arm that many number of times okay and then I estimate there empirical value

right qla a I'll estimate right where L indicates that this is the Q function I'm forming at the L

stage okay then what I do is I take these QA arrange them in ascending descending whatever

order I find the median of all these arms and then eliminate all those arms whose values is below the

median right I'm eliminating so this is the set difference so I take my current set SL

right these are all the arms that are under consideration so S1 will be all the ACT arms

and then I'm eliminating all those arms such that so my qla a is less than ml where ml is

the median at the L level right and then I change my constants by Magic compounds again

so my Epsilon L + 1 becomes 3/4 of Epsilon L Epsilon l itself started as Epsilon by 4

okay now it becomes 3x 4 of Epsilon by 4 and my Delta L + 1 becomes Delta L by 2 which

already started as Delta by 2 so it becomes Delta by 4 and Lal L + 1 and I keep going

until my SL becomes one the number of arms left becomes one right so how many rounds will I go

block K rounds because every time I'm eliminating half THS right because I'm

looking at the median I'm eliminating everything below the median so I'm guaranteed to eliminate

half the arms in every round so I'll end in log K rounds okay but the trick to provide

proving any kind of sample complexity here is to show that okay this quantity right when I

Sumit over this lock rounds is some bounded quantity right and that gives me the sample

complexity I want I already told you what the sample complexity is going to be what was

it K by Epsilon s log 1 by Delta some constant by Delta so as far as order notations we can

ignore the constant so log 1 by Delta so that is what is going to be the sample complexity so

I'll have to show you that first second in fact first I have to show you that this is actually a

pack algorithm right that it actually gives you a Epsilon optimal arm with a high probability

right becomes a little tricky here because I'm eliminating half the arms every time so now I

have to think about what is the probability that I eliminated all the Epsilon optimal

arms right so I have to in some round along the way before I reach the final round okay

I shouldn't eliminate all the Epsilon optimal arms right so what we will do here you see the

right because at the end of the thing if that one arm left is some Epsilon optimal arm I'm

safe right but along the way if I eliminate all the Epsilon optimal arms then I'm in trouble so

what I will do now is to show that at every round the probability of me doing that is

very small so small that when I add up the probability of me eliminating across all the

rounds it is still below Delta right still below Delta so that's essentially what I'm

going to try and show did that make sense so every round the probability of me eliminating

all the Epsilon optimal arms is small so small that when I add it up across all the

rounds it is still less than Delta okay so that's essentially what I'm going to what I'm going to

show okay so first theorem that I'll state it but I'll come back and prove this later is

the the median elimination algorithm takes Epsilon

Delta an Epsilon Delta

pack with

so we'll prove this in two parts the first LMA will help us establish that okay and then then

we'll have another result later that will help us establish the sample complex sample complex

is actually trivial right if you think about it it's essentially summing this quantity up

right starting from Epsilon by 4 and Delta by 2 and successively changing the value of

Epsilon and Delta okay it's just just a bunch of algebra right you just move move numbers around

and simplify things you'll get it conceptually there is nothing deep about looking at the sample

complexity so is that clear so I essentially have to sum up this quantity right for every

arm so this will be multiplied by the size of the arms which will be which will start with K

and then it will become k by2 k by 4 successively it become smaller and smaller and likewise the

Epsilon and my Delta also become smaller and smaller so the idea is to just to Sumit over L

this quantity and then replace this by a function of Epsilon and L you can replace Epsilon L right

by a function of Epsilon so what is Epsilon 1 it's Epsilon by 4 what is Epsilon 2 3x 4 into

Epsilon by 2 Epsilon by 4 Epsilon 3 is 3x 4 the whole squared so so Epsilon L is essentially 3x

4^ lus 1 into Epsilon by 4 right so like that so you can write the similar expression for

Delta and you can essentially simplify it right so that part is fairly straightforward so if you

have time I'll just give you an intuition into that but otherwise you can work it out easily uh

the first part is one that requires some work right so this is what we'll do for every face

l in the median elimination algorithm we

have the

probability

okay so this requires a little bit of thought so what I'm saying so what is

this this is the true best arm at Lev L right so at level one I would expect this to be a star

right so the true best arm I'm not talking about the capital Q right I'm talking about Q Stars here

so I'm talking about the true best arm right so in this case I would expect the true this this should

be Q star a star right in level one but in level two itself it could be something different because

I might have lost a star in the first round itself right what I'm saying is okay this is the best I

could have done in the lth round right potentially right and what is this this is the best I could

have done in the l+ one round right so the bad case is when this is less than that right that

means I eliminated some high ranking arms all I have left or low ranking arms right so the bad Cas

is when this quantity is less than that quantity right by how much it should be less right that's

a question that's a trick we are asking here so if I add an Epsilon to this so essentially I have my

uh I know let's call this J Star right that's a maximum in the L level so I have my qar J star

here right and I have my Q star I star here which is the guy that gives me the max

here right if I add an Epsilon to

this if I add an Epsilon to this and this is within that that means

this is Epsilon optimal when you consider against

that right do you see that if I add an Epsilon to this lower value and

the higher value is actually less than that then it is a acceptable

situation because the lower value is within Epsilon of the higher

value correct so this is a good outcome so I'm saying the probability of the good

outcome is at least 1 minus Delta that means the probability is high okay is

that Mak sense the probability of the good outcome is at least 1 minus Delta okay now

the thing to notice here this is not truly Epsilon this Epsilon L right and this is not

Delta this is Delta l so why is that the case because every round I lose Epsilon L

right at most Epsilon L right so at the end of the day I shouldn't have lost more than

Epsilon so what I'm going to show you is that every round I lose Epsilon L and Epsilon L is

chosen such that even if I lose the Epsilon L for every L My overall loss will be epon right

so that because we know that when s in S1 the best arm is a star right and if overall loss is

less than Epsilon then at the end of this I'll have at least one Epsilon optimal arm left okay

so that is basically what we are trying to show here so this this this relation makes sense why

we are trying to show this okay and likewise this 1us Delta L is there because at the end

of the day if all the events hold true also my probability must be at least 1 minus Delta

right therefore we use Delta L and that is the reason your Epsilon 1's and Delta 1es are

already smaller than Epsilon and Delta because if I start off with Epsilon here and Delta here I'm

gone

[Applause]

okay so people know what w log means right without loss of generality I'm going to

consider L equal to 1 right so whatever results I show here will will hold for the subsequent L

so we start off by looking at the first the event

E1 which

is first thing I'm looking at is I'm grossly underestimating the the True Value right I mean

this this should now start looking familiar to you because in everything we have been doing the same

thing we start off by saying okay first thing that has to go wrong is I grossly underestimate

the the true optimal actions value and then we'll think about okay what happens if I overestimate

the suboptimal actions value but I start off by saying okay grossly underestimate this

right okay so now what is the probability of

this no and we already have our magic number here so substitute everything and tell

me so you have the expression here right and you have the magic number

there

that this guy will get cancelled

out again do you have a four problem there yeah yeah so well the square should be outside

I mean I I'm just writing it down from the algorithm here so their results are wrong

right so it should be Epsilon l s by 2 not Epsilon L by 2 the whole squ

okay and this is a one-sided bound so the outside two will not be there

right so everything will simplify and you'll

get Delta 1 by

3 okay this is the first event suppose it doesn't hold okay suppose even as doesn't

know that essentially I've not made a bad mistake underestimating the best arm right and since it's

maybe I should have done it slightly differently um I have said without loss of generality consider

L equal to one but then I put in a star here which actually violates the without loss of generality

part okay so the a star here is actually the best arm left at level l that's how you should think

of it it's not the overall best arm whatever be the level L it's a best arm left at level

L and for L equal to one it happens to be a star the true a star right for for L equal

2 3 and four this will essentially become the best arm at that level so I'll just put

a l there so that you know what I'm meaning okay it's the best arm in the Heth level

that is left among all the arms that is left the true best arm in the health level okay

good

okay so next I'm going to look at the probability that this translation is killing me

sh up EX

hm this one put the two in the numerator I should put

the two in the numerator what do you need for it to cancel out uh

Epsilon have Epsilon 1 by 2 so Epsilon 1 by

no yeah so this is Epsilon by 2 so substitute that I get Epsilon squ by 4 right that will

become half that's one by Epsilon squ by 2 so it will get things are all fine okay

good so I'm going to look at this problem right so sorry not

this so E1 does not hold that means that this is true right so E1 is the event where I have

underestimated it so I'm saying E1 does not hold that means that it is no longer an underestimate

so this is my estimate is within Epsilon by 2 right but then I'm looking at the possib ility

that different yeah so but my some J which is not Epsilon 1 optimal right so that's that's a

condition

no on L uh right so I know that current my current estimate in the lth round I'm going

to decide about eliminating in the lth round right in the current round I not

underestimated my a star L okay so it's I'm within Epsilon of that within Epsilon 1 by

two of the true a star value right but then I take some other arm J which is a bad arm bad

in the sense J is not Epsilon L close to a star it's a bad arm right and I figured out what is

the probability that the Q estimate for that arm will be greater than the Q estimate for

the a star right so if this happens right then there's a chance that a star will get

eliminated right only a chance why it has to be in the lower half not

just get beaten by one guy but it has to be in the lower half of the arms

right so I need at least K by 2 or whatever uh the size of SK SL by two bad arms to beat

me correct for me to eliminate this arm right I need at least half of the arms to beat me

not just one arm right but before we go to that part so what is the probability of this happening

well so this conditioning tells us that ql a star is not too bad right so what does that

mean qlj has to be more than Epsilon by2 so we remember that figure we drew right

so either this has to be underestimate well I erase the figure but either the a star has to

be underestimated by more than Epsilon by 2 or the a prime has to be overestimated

by more than Epsilon by 2 right here we are conditioning it on the event that a star is

not underestimated by more than Epsilon by2 therefore J has to be overestimated by more

than Epsilon by 2 right since this event we have conditioned it as false so that event has to be

true okay

so what do we get in that

case same thing here

right so I'm kind of skipping a step here but people are all fine with that right I'm skipping

rewriting this this bound right so essentially you have to now replace this with overestimating by

Epsilon by 2 right in fact I should probably just make you write that missing step as a homework so

that I I'm sure that people are all comfortable with the notation nothing more the notation can

get really complex otherwise the concepts are very simple that is why I'm taking so much time

talking over it right if I could just write the proof and leave it at that because it's

fairly easy these things are fairly easy so this is for one arm right and like this they

have K arms right so the

probability uh

good so let uh I don't know we use the same symbols here which I don't agree with let hash

bad be the number of bad arms okay hash bad be the number of bad arms such that this this condition

holds right hash bad be the number of bad arms such that that condition

holds so these are the number of bad arms that are beating the best

for

so what will that expectation

huh mod by 2 yeah not really but

uh I agree with the mod SL part but not SL by two I have a probability for that happening right and

how many such experiments actually is mod SL minus one right potentially everything other

than the best arm could be a bad arm right at at maximum thing is SL minus one right and so it's

SL -1 into Delta 1 by 3 so that's the probability of one arm being bad I mean one arm actually for

which this first relation holds for one bad arm and there are SL minus one I'm just making it SL

to make our life easier so SL s bad arms so the expected number of bad arms will be SL into Delta

1x3 right now I want to look at the probability

that hash

bad

this is where we are going to use

the marco

inequality right so I already written the expectation here so I'm asking you

the probability of X greater than equal to a it's the expected value

of x divided by a right so this is less than or equal to the expected value which

is divided by

okay so what have we shown there is that if E1 does not hold if I'm good in my estimate for

the best arm okay the probability that half the I mean so many bad arms will beat me is

bounded by two Dela 1 by 3 right but I'm also considering E1 as a bad event right E1 is also

bad because I my estimate for the best arm is really bad you know so all bets are off

in some sense right because I don't really I can't really identify my best St because

it's so bad right so all bets are also even is also a bad event right all of this is bad

and even is also bad so what is the total probability of something bad happening the

two together but they huh use the union bound you want use a fancy term we use

the union bound and then we say the probability of something bad happening is bounded by Delta

1 right so this bad is Delta 1 by 3 that bad is 2 Delta 1 by3 and this is actually

the compliment event of E1 in fact I can just add the probabilities I don't really need the union

Bond so the total probability of something bad happening is Delta 1 right so that's essentially

what we are started off by saying here so the probability of good is 1 minus Delta

now we have shown that the probability of bad is Delta l so the probability of good is 1us Delta

L okay

great so one of the the nice things about the median elmination algorithm is that uh it was one

of the earliest papers to introduce this kind of a round based algorithms to the Bandit literature

right now a lot of uh later uh improvements on both regret based case and people trying to play

around with the uh uh the pack guarantees all have switched to some kind of round based idea

you know so you do a certain number of pulls and then you eliminate a certain number of arms right

and then go back and pull the remaining arms some more and then eliminate some more arms so these

kinds of round based elimination methods have become very popular in the Pandit literature and

the Medan elimination is one of the earliest such round based uh elimination approaches and after

this became very popular and a lot of papers uh started being written using this kind of a round

based idea right so we have so far what we have shown is that it is have we shown that it is uh

back huh what is it oh that is fine I mean that that I'm already spoken about this LMA is taken

As proved right this LMA has taken As proved yeah so you have to still use the union bond for every

round that I'm eliminating right so what you have to show show is well I have all these

Deltas right so you have to keep adding them up to

make sure that eventually you are less than overall you are less than Delta

right so every round the probability of failure is bounded by Delta 1 right so for the K rounds

the probability of failure is I mean whatever the uh log K rounds the probability of failure

is bounded by summing it up so Delta 2 Delta by 2 Delta by 4 Delta by 8 and so on so forth until you

reach one right so that's the that's the summation and or not until you reach one until you reach log

K rounds so it's you'll get Delta by 2 power log k okay and the summation of that will go

to Delta is essentially Delta into half + 1/4 + 1/8 right if the sum runs to Infinity you'll

get 1 but the sum is not running to Infinity it stops at some finite time so it will be

less than one so the summation will be less than Delta Pro right so some probability of

something bad happening will be less than Delta so likewise you can show the same things for uh

the Epsilon right so at every level you you drop atmost by Epsilon L right so the total

you would have dropped is Epsilon by 4 + so it's essentially 1/4 plus uh what is it uh 34

into one or or yeah right so Epsilon by 4 into 1 + 1/4 and so on so forth right 3x 4 L minus one

yeah yeah that that is also upper bounded by one if you run it to Infinity no it's

yeah it's upper bound it be one if you run the summation to Infinity but you're

stopping at some finite amount so it'll be less than Epsilon okay so both of that are

satisfied that is easy enough uh the harder the trickier part is to show the the sample

complexity

right it's not hard it's just algebra can you let you guys do it or you want me to

do the sample complexity part of the proof as well no is it huh depends somebody say depends

depends on what that's a paper is it there in the paper it is there in the paper yeah yeah

it's it's there in the paper you can look at it and so I'll stop it here for the pack the pack

complexity we'll come back to all of this regret and pack and everything when we look at the full

RL problem at some point uh but right now for the bandits I'll stop [Music] [Applause] here

Click on any text or timestamp to jump to that moment in the video

Most transcripts ready in under 5 seconds

One-Click Copy125+ LanguagesSearch ContentJump to Timestamps

Paste YouTube URL

Enter any YouTube video link to get the full transcript

Most transcripts ready in under 5 seconds

Get Our Chrome Extension

Get transcripts instantly without leaving YouTube. Install our Chrome extension for one-click access to any video's transcript directly on the watch page.

Add to Chrome — Free

Works with YouTube, Coursera, Udemy and more educational platforms

Get Instant Transcripts: Just Edit the Domain in Your Address Bar!

YouTube

←

→

↻

https://www.youtube.com/watch?v=UF8uR6Z6KLc

YoutubeToText

←

→

↻

https://youtubetotext.net/watch?v=UF8uR6Z6KLc

YouTube TranscriptPreparing your results…

YouTube Transcript:Median Elimination