Hang tight while we fetch the video data and transcripts. This only takes a moment.
Connecting to YouTube player…
Fetching transcript data…
We’ll display the transcript, summary, and all view options as soon as everything loads.
Next steps
Loading transcript tools…
Median Elimination | Reinforcement Learning | YouTubeToText
YouTube Transcript: Median Elimination
Skip watching entire videos - get the full transcript, search for keywords, and copy with one click.
Share:
Video Transcript
Video Summary
Summary
Core Theme
This content explains the "Median Elimination Algorithm," a method for solving multi-armed bandit problems by iteratively eliminating suboptimal arms based on their estimated values, aiming to identify an approximately optimal arm with high probability and bounded sample complexity.
Key Points
[Music]
for
is
EX
that
o
is okay good took a while to write uh so essentially what we do is we start off with
K arms right and then in the first round okay so median elimination is going to proceed in
rounds okay in the first round we pull each arm some number of times right determined by
this magic quantity so can people read this magic quantity it's 1 by Epsilon L by 2 whole
squ log 3 by Delta L
okay so I pull each arm that many number of times okay and then I estimate there empirical value
right qla a I'll estimate right where L indicates that this is the Q function I'm forming at the L
stage okay then what I do is I take these QA arrange them in ascending descending whatever
order I find the median of all these arms and then eliminate all those arms whose values is below the
median right I'm eliminating so this is the set difference so I take my current set SL
right these are all the arms that are under consideration so S1 will be all the ACT arms
and then I'm eliminating all those arms such that so my qla a is less than ml where ml is
the median at the L level right and then I change my constants by Magic compounds again
so my Epsilon L + 1 becomes 3/4 of Epsilon L Epsilon l itself started as Epsilon by 4
okay now it becomes 3x 4 of Epsilon by 4 and my Delta L + 1 becomes Delta L by 2 which
already started as Delta by 2 so it becomes Delta by 4 and Lal L + 1 and I keep going
until my SL becomes one the number of arms left becomes one right so how many rounds will I go
block K rounds because every time I'm eliminating half THS right because I'm
looking at the median I'm eliminating everything below the median so I'm guaranteed to eliminate
half the arms in every round so I'll end in log K rounds okay but the trick to provide
proving any kind of sample complexity here is to show that okay this quantity right when I
Sumit over this lock rounds is some bounded quantity right and that gives me the sample
complexity I want I already told you what the sample complexity is going to be what was
it K by Epsilon s log 1 by Delta some constant by Delta so as far as order notations we can
ignore the constant so log 1 by Delta so that is what is going to be the sample complexity so
I'll have to show you that first second in fact first I have to show you that this is actually a
pack algorithm right that it actually gives you a Epsilon optimal arm with a high probability
right becomes a little tricky here because I'm eliminating half the arms every time so now I
have to think about what is the probability that I eliminated all the Epsilon optimal
arms right so I have to in some round along the way before I reach the final round okay
I shouldn't eliminate all the Epsilon optimal arms right so what we will do here you see the
right because at the end of the thing if that one arm left is some Epsilon optimal arm I'm
safe right but along the way if I eliminate all the Epsilon optimal arms then I'm in trouble so
what I will do now is to show that at every round the probability of me doing that is
very small so small that when I add up the probability of me eliminating across all the
rounds it is still below Delta right still below Delta so that's essentially what I'm
going to try and show did that make sense so every round the probability of me eliminating
all the Epsilon optimal arms is small so small that when I add it up across all the
rounds it is still less than Delta okay so that's essentially what I'm going to what I'm going to
show okay so first theorem that I'll state it but I'll come back and prove this later is
the the median elimination algorithm takes Epsilon
Delta an Epsilon Delta
pack with
so we'll prove this in two parts the first LMA will help us establish that okay and then then
we'll have another result later that will help us establish the sample complex sample complex
is actually trivial right if you think about it it's essentially summing this quantity up
right starting from Epsilon by 4 and Delta by 2 and successively changing the value of
Epsilon and Delta okay it's just just a bunch of algebra right you just move move numbers around
and simplify things you'll get it conceptually there is nothing deep about looking at the sample
complexity so is that clear so I essentially have to sum up this quantity right for every
arm so this will be multiplied by the size of the arms which will be which will start with K
and then it will become k by2 k by 4 successively it become smaller and smaller and likewise the
Epsilon and my Delta also become smaller and smaller so the idea is to just to Sumit over L
this quantity and then replace this by a function of Epsilon and L you can replace Epsilon L right
by a function of Epsilon so what is Epsilon 1 it's Epsilon by 4 what is Epsilon 2 3x 4 into
Epsilon by 2 Epsilon by 4 Epsilon 3 is 3x 4 the whole squared so so Epsilon L is essentially 3x
4^ lus 1 into Epsilon by 4 right so like that so you can write the similar expression for
Delta and you can essentially simplify it right so that part is fairly straightforward so if you
have time I'll just give you an intuition into that but otherwise you can work it out easily uh
the first part is one that requires some work right so this is what we'll do for every face
l in the median elimination algorithm we
have the
probability
okay so this requires a little bit of thought so what I'm saying so what is
this this is the true best arm at Lev L right so at level one I would expect this to be a star
right so the true best arm I'm not talking about the capital Q right I'm talking about Q Stars here
so I'm talking about the true best arm right so in this case I would expect the true this this should
be Q star a star right in level one but in level two itself it could be something different because
I might have lost a star in the first round itself right what I'm saying is okay this is the best I
could have done in the lth round right potentially right and what is this this is the best I could
have done in the l+ one round right so the bad case is when this is less than that right that
means I eliminated some high ranking arms all I have left or low ranking arms right so the bad Cas
is when this quantity is less than that quantity right by how much it should be less right that's
a question that's a trick we are asking here so if I add an Epsilon to this so essentially I have my
uh I know let's call this J Star right that's a maximum in the L level so I have my qar J star
here right and I have my Q star I star here which is the guy that gives me the max
here right if I add an Epsilon to
this if I add an Epsilon to this and this is within that that means
this is Epsilon optimal when you consider against
that right do you see that if I add an Epsilon to this lower value and
the higher value is actually less than that then it is a acceptable
situation because the lower value is within Epsilon of the higher
value correct so this is a good outcome so I'm saying the probability of the good
outcome is at least 1 minus Delta that means the probability is high okay is
that Mak sense the probability of the good outcome is at least 1 minus Delta okay now
the thing to notice here this is not truly Epsilon this Epsilon L right and this is not
Delta this is Delta l so why is that the case because every round I lose Epsilon L
right at most Epsilon L right so at the end of the day I shouldn't have lost more than
Epsilon so what I'm going to show you is that every round I lose Epsilon L and Epsilon L is
chosen such that even if I lose the Epsilon L for every L My overall loss will be epon right
so that because we know that when s in S1 the best arm is a star right and if overall loss is
less than Epsilon then at the end of this I'll have at least one Epsilon optimal arm left okay
so that is basically what we are trying to show here so this this this relation makes sense why
we are trying to show this okay and likewise this 1us Delta L is there because at the end
of the day if all the events hold true also my probability must be at least 1 minus Delta
right therefore we use Delta L and that is the reason your Epsilon 1's and Delta 1es are
already smaller than Epsilon and Delta because if I start off with Epsilon here and Delta here I'm
gone
[Applause]
okay so people know what w log means right without loss of generality I'm going to
consider L equal to 1 right so whatever results I show here will will hold for the subsequent L
so we start off by looking at the first the event
E1 which
is first thing I'm looking at is I'm grossly underestimating the the True Value right I mean
this this should now start looking familiar to you because in everything we have been doing the same
thing we start off by saying okay first thing that has to go wrong is I grossly underestimate
the the true optimal actions value and then we'll think about okay what happens if I overestimate
the suboptimal actions value but I start off by saying okay grossly underestimate this
right okay so now what is the probability of
this no and we already have our magic number here so substitute everything and tell
me so you have the expression here right and you have the magic number
there
that this guy will get cancelled
out again do you have a four problem there yeah yeah so well the square should be outside
I mean I I'm just writing it down from the algorithm here so their results are wrong
right so it should be Epsilon l s by 2 not Epsilon L by 2 the whole squ
okay and this is a one-sided bound so the outside two will not be there
right so everything will simplify and you'll
get Delta 1 by
3 okay this is the first event suppose it doesn't hold okay suppose even as doesn't
know that essentially I've not made a bad mistake underestimating the best arm right and since it's
maybe I should have done it slightly differently um I have said without loss of generality consider
L equal to one but then I put in a star here which actually violates the without loss of generality
part okay so the a star here is actually the best arm left at level l that's how you should think
of it it's not the overall best arm whatever be the level L it's a best arm left at level
L and for L equal to one it happens to be a star the true a star right for for L equal
2 3 and four this will essentially become the best arm at that level so I'll just put
a l there so that you know what I'm meaning okay it's the best arm in the Heth level
that is left among all the arms that is left the true best arm in the health level okay
good
okay so next I'm going to look at the probability that this translation is killing me
sh up EX
hm this one put the two in the numerator I should put
the two in the numerator what do you need for it to cancel out uh
Epsilon have Epsilon 1 by 2 so Epsilon 1 by
no yeah so this is Epsilon by 2 so substitute that I get Epsilon squ by 4 right that will
become half that's one by Epsilon squ by 2 so it will get things are all fine okay
good so I'm going to look at this problem right so sorry not
this so E1 does not hold that means that this is true right so E1 is the event where I have
underestimated it so I'm saying E1 does not hold that means that it is no longer an underestimate
so this is my estimate is within Epsilon by 2 right but then I'm looking at the possib ility
that different yeah so but my some J which is not Epsilon 1 optimal right so that's that's a
condition
no on L uh right so I know that current my current estimate in the lth round I'm going
to decide about eliminating in the lth round right in the current round I not
underestimated my a star L okay so it's I'm within Epsilon of that within Epsilon 1 by
two of the true a star value right but then I take some other arm J which is a bad arm bad
in the sense J is not Epsilon L close to a star it's a bad arm right and I figured out what is
the probability that the Q estimate for that arm will be greater than the Q estimate for
the a star right so if this happens right then there's a chance that a star will get
eliminated right only a chance why it has to be in the lower half not
just get beaten by one guy but it has to be in the lower half of the arms
right so I need at least K by 2 or whatever uh the size of SK SL by two bad arms to beat
me correct for me to eliminate this arm right I need at least half of the arms to beat me
not just one arm right but before we go to that part so what is the probability of this happening
well so this conditioning tells us that ql a star is not too bad right so what does that
mean qlj has to be more than Epsilon by2 so we remember that figure we drew right
so either this has to be underestimate well I erase the figure but either the a star has to
be underestimated by more than Epsilon by 2 or the a prime has to be overestimated
by more than Epsilon by 2 right here we are conditioning it on the event that a star is
not underestimated by more than Epsilon by2 therefore J has to be overestimated by more
than Epsilon by 2 right since this event we have conditioned it as false so that event has to be
true okay
so what do we get in that
case same thing here
right so I'm kind of skipping a step here but people are all fine with that right I'm skipping
rewriting this this bound right so essentially you have to now replace this with overestimating by
Epsilon by 2 right in fact I should probably just make you write that missing step as a homework so
that I I'm sure that people are all comfortable with the notation nothing more the notation can
get really complex otherwise the concepts are very simple that is why I'm taking so much time
talking over it right if I could just write the proof and leave it at that because it's
fairly easy these things are fairly easy so this is for one arm right and like this they
have K arms right so the
probability uh
good so let uh I don't know we use the same symbols here which I don't agree with let hash
bad be the number of bad arms okay hash bad be the number of bad arms such that this this condition
holds right hash bad be the number of bad arms such that that condition
holds so these are the number of bad arms that are beating the best
for
so what will that expectation
be
huh mod by 2 yeah not really but
uh I agree with the mod SL part but not SL by two I have a probability for that happening right and
how many such experiments actually is mod SL minus one right potentially everything other
than the best arm could be a bad arm right at at maximum thing is SL minus one right and so it's
SL -1 into Delta 1 by 3 so that's the probability of one arm being bad I mean one arm actually for
which this first relation holds for one bad arm and there are SL minus one I'm just making it SL
to make our life easier so SL s bad arms so the expected number of bad arms will be SL into Delta
1x3 right now I want to look at the probability
that hash
bad
this is where we are going to use
the marco
inequality right so I already written the expectation here so I'm asking you
the probability of X greater than equal to a it's the expected value
of x divided by a right so this is less than or equal to the expected value which
is divided by
okay so what have we shown there is that if E1 does not hold if I'm good in my estimate for
the best arm okay the probability that half the I mean so many bad arms will beat me is
bounded by two Dela 1 by 3 right but I'm also considering E1 as a bad event right E1 is also
bad because I my estimate for the best arm is really bad you know so all bets are off
in some sense right because I don't really I can't really identify my best St because
it's so bad right so all bets are also even is also a bad event right all of this is bad
and even is also bad so what is the total probability of something bad happening the
two together but they huh use the union bound you want use a fancy term we use
the union bound and then we say the probability of something bad happening is bounded by Delta
1 right so this bad is Delta 1 by 3 that bad is 2 Delta 1 by3 and this is actually
the compliment event of E1 in fact I can just add the probabilities I don't really need the union
Bond so the total probability of something bad happening is Delta 1 right so that's essentially
what we are started off by saying here so the probability of good is 1 minus Delta
now we have shown that the probability of bad is Delta l so the probability of good is 1us Delta
L okay
great so one of the the nice things about the median elmination algorithm is that uh it was one
of the earliest papers to introduce this kind of a round based algorithms to the Bandit literature
right now a lot of uh later uh improvements on both regret based case and people trying to play
around with the uh uh the pack guarantees all have switched to some kind of round based idea
you know so you do a certain number of pulls and then you eliminate a certain number of arms right
and then go back and pull the remaining arms some more and then eliminate some more arms so these
kinds of round based elimination methods have become very popular in the Pandit literature and
the Medan elimination is one of the earliest such round based uh elimination approaches and after
this became very popular and a lot of papers uh started being written using this kind of a round
based idea right so we have so far what we have shown is that it is have we shown that it is uh
back huh what is it oh that is fine I mean that that I'm already spoken about this LMA is taken
As proved right this LMA has taken As proved yeah so you have to still use the union bond for every
round that I'm eliminating right so what you have to show show is well I have all these
Deltas right so you have to keep adding them up to
make sure that eventually you are less than overall you are less than Delta
right so every round the probability of failure is bounded by Delta 1 right so for the K rounds
the probability of failure is I mean whatever the uh log K rounds the probability of failure
is bounded by summing it up so Delta 2 Delta by 2 Delta by 4 Delta by 8 and so on so forth until you
reach one right so that's the that's the summation and or not until you reach one until you reach log
K rounds so it's you'll get Delta by 2 power log k okay and the summation of that will go
to Delta is essentially Delta into half + 1/4 + 1/8 right if the sum runs to Infinity you'll
get 1 but the sum is not running to Infinity it stops at some finite time so it will be
less than one so the summation will be less than Delta Pro right so some probability of
something bad happening will be less than Delta so likewise you can show the same things for uh
the Epsilon right so at every level you you drop atmost by Epsilon L right so the total
you would have dropped is Epsilon by 4 + so it's essentially 1/4 plus uh what is it uh 34
into one or or yeah right so Epsilon by 4 into 1 + 1/4 and so on so forth right 3x 4 L minus one
yeah yeah that that is also upper bounded by one if you run it to Infinity no it's
yeah it's upper bound it be one if you run the summation to Infinity but you're
stopping at some finite amount so it'll be less than Epsilon okay so both of that are
satisfied that is easy enough uh the harder the trickier part is to show the the sample
complexity
right it's not hard it's just algebra can you let you guys do it or you want me to
do the sample complexity part of the proof as well no is it huh depends somebody say depends
depends on what that's a paper is it there in the paper it is there in the paper yeah yeah
it's it's there in the paper you can look at it and so I'll stop it here for the pack the pack
complexity we'll come back to all of this regret and pack and everything when we look at the full
RL problem at some point uh but right now for the bandits I'll stop [Music] [Applause] here
Click on any text or timestamp to jump to that moment in the video
Share:
Most transcripts ready in under 5 seconds
One-Click Copy125+ LanguagesSearch ContentJump to Timestamps
Paste YouTube URL
Enter any YouTube video link to get the full transcript
Transcript Extraction Form
Most transcripts ready in under 5 seconds
Get Our Chrome Extension
Get transcripts instantly without leaving YouTube. Install our Chrome extension for one-click access to any video's transcript directly on the watch page.