0:04 good morning so I'm especially honored
0:08 and excited to introduce Dr. Michael
0:12 Kearns for our morning keynote because
0:16 in addition to all of the extraordinary
0:18 success that he's had which I'll tell
0:20 you about in a moment a little-known
0:23 fact is that a little over three years
0:26 ago Michael introduced me to a small
0:30 startup named Quantopian, actually at
0:33 this conference
0:36 so Michael Kearns is a professor at the
0:38 University of Pennsylvania in their
0:40 Computer and Information Science
0:43 Department he's also a founding director
0:46 of Penn's Warren Center for Network and
0:49 Data Sciences his research interests
0:51 include topics in machine learning
0:54 artificial intelligence algorithmic game
0:57 theory computational social science and
1:00 quantitative trading he's consulted and
1:03 worked widely within the technology and
1:05 finance industries and is chief
1:07 scientist at MANA Partners a
1:09 quantitative hedge fund and trading
1:11 technology firm here in New York City
1:16 and Michael is also a scientific adviser
1:22 to Quantopian. OK, good morning, thanks to
1:24 all of you for showing up so early on a
1:26 Saturday anyway thanks for coming and
1:28 thanks very much to the Quantopian team
1:31 for having me so I noticed in the
1:34 program there are a lot of talks that
1:36 are on the discussion of overfitting and
1:39 train test methodology etc which is all
1:41 good stuff but one way of avoiding
1:43 overfitting is to never assume in the
1:45 first place that the future will have
1:46 anything at all to do with the past and
1:48 that's what I want to talk about today
1:51 as you can see many talented people have
1:53 given deep thought to the concept of
1:55 regret in the past but I'm here to talk
1:57 about a particular technical definition
1:59 of regret and to tell you about the
2:00 subfield of machine learning which is
2:03 called no regret learning which is a
2:05 thriving subfield of machine learning
2:07 and is finding wide applications in many
2:09 domains and today I just want to tell
2:11 you about what the basic framework is
2:12 the kind of result that this field
2:13 produces theoretically
2:16 and then talk mainly about potential
2:19 applications of it to trading domains
2:22 and you know if you're fundamentally
2:24 suspicious of a title like trading
2:25 without regret your instincts are
2:27 correct and you'll notice that I've
2:29 added an asterisk to my title which we
2:34 will come to eventually okay so what is
2:35 no regret learning I just want to, on one
2:37 slide, kind of describe the technical
2:39 framework for you so no regret learning
2:41 in the basic no regret learning
2:43 framework I want you to imagine you have
2:45 some set of n signals or predictors and
2:46 I want you to think about these signals
2:48 or predictors in an extremely general
2:51 way so to give examples of what these
2:52 signals or predictors might be they
2:54 might be, you know, the time
2:57 series of returns of individual equities
2:59 or other instruments or funds of these
3:01 things or portfolios of such instruments
3:04 or they might be full-fledged strategies
3:06 like blackbox algorithms that are taking
3:08 any data sources you want as input and
3:10 producing if you like predictions about
3:13 the future so you might think of them as
3:15 alphas for instance in a trading context
3:17 or they might be you know other sources
3:19 of data or information like advisers
3:20 like stock analysts who are giving
3:23 upgrade and downgrade recommendations or
3:25 sentiment indicators from news or social
3:27 media anything you want you can simply
3:30 think of these things as just boxes that
3:31 have unknown inputs and are
3:33 generating a sequence of numerical
3:37 values and each trading period we're
3:38 going to imagine that each one of these
3:40 signals received some arbitrary payoff
3:42 and when I say arbitrary I mean I'm
3:44 gonna make absolutely no stochastic
3:46 assumptions or probabilistic modeling
3:49 whatsoever in particular you can think
3:52 of the returns or payoffs or losses of
3:54 these black boxes as being generated by
3:57 some omniscient or even malicious
4:00 nature or even an adversary whose express
4:03 purpose is to sort of foil your use of
4:04 these signals and so you should already
4:06 be wondering what in the world could you
4:08 possibly hope to say in such a setting
4:10 and I'll come to that in a second and in
4:12 particular these signals or black boxes
4:14 might have all kinds of side information
4:16 or even inside information they might
4:18 have expertise and specialization that
4:20 lets them do well under certain
4:22 conditions and poorly under
4:24 other ones and so the only thing I'm
4:26 going to assume about
4:28 the payoffs that these n predictors
4:30 get at each time step is that they're
4:32 bounded okay that they're you know
4:34 not arbitrarily
4:36 large either positive or negative and
4:38 note that this assumption is already
4:40 present in most stochastic models that
4:42 are used in finance such as you know the
4:45 transition from discrete Brownian motion
4:47 to continuous Brownian motion that
4:48 underlies things like options pricing
4:53 okay so in this setting your goal is to
4:55 design an algorithm that's gonna
4:57 maintain some portfolio or dynamic
4:59 weighting or if you like a probability
5:02 distribution over these n signals or
5:04 predictors and whatever a probability
5:06 distribution or weighting you're using
5:08 at each time step you're basically going
5:10 to get the weighted average according to
5:13 that distribution of the underlying
5:16 payoffs for the signals okay and so
5:17 what's the goal you know so I've set up
5:19 this extremely general setting where you
5:21 have these n arbitrary black boxes
5:23 your algorithm is maintaining some
5:25 weighting or distribution over them but
5:27 there's no assumptions at all on the
5:28 sequence of payoffs and so in particular
5:30 there's no assumption here that future
5:32 payoffs have anything to do with past
5:36 payoffs for these n signals well so
5:38 in the absence of there being some
5:40 relationship between past data and
5:42 future data the only thing you might
5:43 hope to say is something about the past
5:46 about your performance in hindsight on
5:49 the data so far so no regret learning
5:51 the goal is that no matter what the
5:52 sequence of payoffs is for these
5:54 underlying n signals your algorithm
5:56 should have what's called no regret
5:59 which means that if we look at the
6:02 payoff of the best one of these signals
6:06 in hindsight that your algorithm's payoff
6:08 is not too much worse than that
6:10 okay so you know that's what I'm sort of
6:12 saying informally here no regret is
6:14 going to mean that if we look at the
6:16 best payoff meaning the best payoff
6:18 amongst these n signals in hindsight
6:20 and we look at the actual payoff that
6:23 your algorithm got in hindsight that the
6:24 difference between those things is
6:27 sublinear right so if T is the number of
6:29 time periods so far we'd like for
6:32 instance results like this: the difference of
6:33 the best payoff minus your algorithm's
6:35 payoff growing like square
6:37 root of T because if it grows like
6:39 square root of T then if we look at your per-step
6:42 regret, that's square root of T
6:44 divided by T which is 1 over square root
6:46 of T and that's going to 0 as the number
6:47 of periods increases and this is the
6:50 source of the term no regret so no
6:53 regret refers to having per step regret
6:56 that diminishes with time which is
6:57 equivalent to having sublinear
7:01 cumulative regret okay now this
7:04 is the goal and so you know it's a form
7:05 of learning but it's a form of learning
7:07 in which you're not generalizing in any
7:08 sense at all
7:10 you're just claiming that at all points
7:13 in time you're sort of tracking the best
7:16 one of these n strategies or predictors
7:18 in hindsight ok so it's an ex
7:20 post statement rather than an ex ante
7:23 statement now let me point out something
7:25 that we're not proposing here which
7:27 would be deeply suspicious we're not
7:30 competing with the optimal unconstrained
7:32 policy that can switch willy-nilly
7:34 between the different strategies or
7:36 predictors okay so we're not sort of
7:38 saying like in hindsight we should have
7:40 been doing this, you know,
7:42 following predictor 1 on the first day
7:44 predictor 7 on the second day predictor
7:46 19 on the third day with arbitrary
7:48 switching we're just comparing ourselves
7:50 to predictor 1 in isolation, predictor 2 in
7:52 isolation, predictor 3 in isolation, and
7:54 whichever of those is best we want to
7:57 track that ok so this is the basic
7:59 definition even though there are a lot
8:02 of deep mathematical connections between
8:03 boosting which many of you may be
8:06 familiar with and no regret learning the
8:07 fundamental assumptions of those two
8:09 frameworks and the goals of them are
8:10 quite different
8:12 so in boosting you do make probabilistic
8:15 assumptions you make stochastic or
8:16 distributional assumptions on how the
8:18 data is generated you essentially have
8:21 an iid setting and there your goal is to
8:23 actually beat the best of your weak
8:25 learner so to speak here we're not
8:27 trying to beat the best right we're just
8:29 trying to track the best and so you can
8:30 think of the criteria here as being the
8:33 difference between finding a needle in a
8:33 haystack,
8:36 i.e. the, you know, good Quantopian
8:39 strategy out of the sea of Quantopian
8:41 strategies versus the other challenge of
8:43 trying to build something that's better
8:44 than any one of the individual
8:49 strategies in the first place ok now no
8:50 regret learning this framework has
8:52 obvious appeal in financial settings
8:53 where
8:55 in particular, you know, non-stationarity
8:57 is the rule and in fact even worse than
8:59 that, a lot of that
9:01 non-stationarity might be generated by
9:04 game theoretic or strategic or even
9:05 adversarial interactions between
9:08 multiple parties and so even typical time
9:10 series models aren't appropriate
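The regret accounting described above is easy to make concrete. Here is a minimal Python sketch (my illustration, not from the talk) on made-up bounded payoffs: the "algorithm" is just the fixed uniform average of the n signals, and we measure its cumulative and per-step regret against the best single signal in hindsight:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 10_000, 5                      # time periods, number of signals
payoffs = rng.uniform(-1, 1, (T, N))  # arbitrary bounded payoffs, one per signal per period

# Suppose the algorithm simply plays the uniform average of the N signals.
algo_payoff = payoffs.mean(axis=1).sum()

# Best single signal in hindsight (an ex post quantity, not ex ante).
best_payoff = payoffs.sum(axis=0).max()

cumulative_regret = best_payoff - algo_payoff
per_step_regret = cumulative_regret / T  # "no regret" means this goes to 0 as T grows
print(cumulative_regret, per_step_regret)
```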
9:15 okay okay so this is probably the only
9:17 semi technical slide and I'm only gonna
9:19 talk about the highlighted parts for the
9:21 most part so in the top box here I've
9:23 reproduced from a survey paper on no
9:24 regret learning that I'll give a
9:27 reference to at the end what a typical
9:30 algorithm that achieves this no regret
9:32 criterion looks like and the first thing
9:33 I want you to notice is how simple and
9:35 how short it is right if you coded this
9:38 up in MATLAB in full detail it would be
9:40 maybe five or ten lines inside of a loop
9:43 and these algorithms are sort of very
9:45 appealingly simple and almost every no
9:47 regret algorithm looks more or less like
9:50 this you would essentially have a weight
9:52 associated with each one of the n
9:54 signals and you're going to adjust those
9:57 weights in the semi-obvious way
10:00 according to historical performance and
10:01 in particular the way you're gonna
10:04 update a weight at the next time step is
10:05 you're going to say the new value for
10:07 the weight is the old value of the weight
10:10 times some multiplicative factor this
10:12 happens to be phrased as losses rather
10:14 than payoff so negative values for this
10:18 signal M are good and positive is bad
10:20 but in plain English what this update is
10:23 basically doing is saying look whatever
10:26 your payoff was I'm going to increase
10:28 your weight by a multiplicative factor
10:30 of 1 plus something times your payoff
10:32 and if you lost money I'm
10:34 gonna multiply your weight by
10:36 something less than 1 which is going to
10:38 be 1 minus something times your payoff
10:40 and there's something here is this
10:42 single free parameter of this algorithm
10:46 which is a learning rate okay so I have
10:48 one knob and if I kind of crank that
10:50 knob up very high then I'm making very
10:53 very radical changes in response to
10:55 positive or negative payoff so I'm gonna
10:57 greatly up weight things that get
10:58 slightly positive payoff and greatly
11:00 down weight things that get slightly
11:02 negative payoff if I set that learning
11:05 rate very very low then I'm making
11:06 much more modest changes
11:09 the important algebraic part
11:10 of this update is that it's
11:12 multiplicative it's not additive so
11:14 those of you who are familiar with
11:15 things like the perceptron learning
11:17 algorithm for linear threshold functions
11:19 which has nice mathematical properties
11:21 that's an additive update you sort of
11:23 say like well I'm going to add something
11:24 to the weight if it got a positive
11:26 payoff and I'm going to subtract
11:27 something from the weight here we're
11:29 actually multiplying things and so in
11:32 particular if one of these signals goes
11:35 on a run of positive payoffs it will
11:37 exponentially increase the weight of
11:40 that signal over that period of time
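The multiplicative-weights update just described can be sketched in a few lines of Python, consistent with the five-or-ten-lines-of-MATLAB remark; this version is my paraphrase, phrased in payoffs rather than losses, with `eta` as the single learning-rate knob:

```python
import numpy as np

def multiplicative_weights(payoffs, eta=0.1):
    """Run the multiplicative-weights update over a (T, N) array of
    bounded per-period payoffs, one column per signal. Returns the
    algorithm's total payoff."""
    T, N = payoffs.shape
    w = np.ones(N)                    # start with equal weights
    total = 0.0
    for t in range(T):
        p = w / w.sum()               # normalized distribution over the signals
        total += p @ payoffs[t]       # play the weighted average of the signals
        w *= 1.0 + eta * payoffs[t]   # multiplicative (not additive) update
    return total

rng = np.random.default_rng(1)
pay = rng.uniform(-1, 1, (5000, 4))   # made-up bounded payoffs
print(multiplicative_weights(pay))
```

Because the update is multiplicative, a run of positive payoffs compounds: a consistently good signal's weight grows exponentially over that stretch, exactly as described.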
11:42 okay and then the way you're actually
11:44 making your decisions at each round is
11:47 just by playing the weighted average of
11:49 these of the underlying signals
11:51 according to the normalized distribution
11:54 given by these weights okay so that's
11:56 the algorithm it's incredibly simple and
11:59 here's a very very nice theorem you can
12:02 prove about this algorithm and I'll just
12:03 go through sort of the final part
12:05 this theorem is saying look if you
12:08 set the learning rate of the algorithm
12:11 in a particular way then what this pink
12:14 box is saying is that the payoff of this
12:17 algorithm on an arbitrary sequence of
12:18 payoff vectors for the underlying
12:21 signals the payoff of this algorithm at
12:24 the top will be the payoff of the best
12:26 one of the underlying signals in
12:29 hindsight minus something that looks like
12:32 square root of T log of the number of
12:34 underlying signals okay so let me say
12:36 that again because I want
12:38 you to keep just this one
12:40 expression in mind the payoff of this
12:43 algorithm will be no matter what the
12:45 sequence of underlying payoffs for the
12:46 signals is the payoff of the algorithm
12:50 will be at worst the best payoff in
12:52 hindsight of any of the signals minus
12:55 the square root of T, the number
12:57 of time steps, times log of the number of
12:59 underlying signals that you're tracking
13:03 okay so there is a penalty here for
13:05 how many signals you're tracking
13:07 as there should be and I'll say
13:08 something a little bit more precise
13:09 about that in a second there is a
13:11 penalty for that but it's modest it's
13:15 logarithmic okay okay so let me just
13:17 make some miscellaneous remarks about
13:19 this framework and type of result so
13:21 first of all no regret algorithms
13:23 and the analysis of them have actually a
13:25 very long and rich history it goes back
13:27 at least to kind of the game theory
13:30 micro econ literature in the 50s in the
13:31 form of something called Blackwell
13:33 approachability it has a very large and
13:35 very currently active as I mentioned
13:37 machine learning literature there are
13:39 lots of connections to finance some of
13:41 which I'll talk about but a couple of
13:43 others that I won't there are modern
13:45 connections to Black-Scholes so in
13:47 particular it is possible to derive the
13:51 Black-Scholes pricing formula without
13:53 any probabilistic assumptions whatsoever
13:56 by essentially viewing the pricing
14:00 process as a game between a price setter
14:03 and an arbitrageur both of whom are
14:05 playing a no regret algorithm and this
14:06 was done in the last few years or so
14:08 there are also very close connections
14:11 between no regret learning and the right
14:13 way of pricing in prediction markets in
14:16 order to elicit truthful information
14:19 from an underlying population there are
14:21 very strong and long-standing
14:23 connections to game theory you can
14:26 derive the minimax theorem and you can
14:28 also derive linear programming again as
14:30 the limiting behavior between two
14:33 parties playing no regret algorithms in
14:36 a zero-sum game and more generally if
14:38 you take two no regret algorithms and have
14:40 them play repeatedly against each other
14:42 so they're each generating the payoffs
14:44 for the other one in an arbitrary game
14:47 zero-sum or otherwise you get
14:49 convergence to Nash or a more general
14:51 notion of equilibrium called correlated
14:53 equilibrium now this all sounds great
14:56 and very powerful let me make a couple
14:58 of remarks though that will demystify
15:01 this kind of result in this framework so
15:04 first of all you know a lot of what's
15:06 difficult about proving the type of
15:08 theorem that I gave on the previous
15:11 slide is to
15:12 really deal with arbitrary sequences of
15:15 payoffs for the underlying signals so in
15:18 particular you know here's an algorithm
15:20 that won't meet the no regret criterion
15:22 but you can still say something about, so
15:24 it's just the greedy algorithm which is
15:25 known in the no regret literature
15:26 as follow-the-leader
15:29 which is all you do is you basically say
15:31 well I'm going to look at the payoffs so
15:32 far and whichever one of these
15:34 underlying signals has the highest
15:36 cumulative payoff so far I'm gonna
15:39 put all of my money or all of my weight
15:41 on that particular signal tomorrow okay
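Follow-the-leader as just described is almost a one-liner per step; here is a sketch (my illustration) on an arbitrary (T, N) payoff array:

```python
import numpy as np

def follow_the_leader(payoffs):
    """Each period, put all weight on the signal with the highest
    cumulative payoff so far (defaulting to signal 0 on day one)."""
    T, N = payoffs.shape
    cum = np.zeros(N)
    total = 0.0
    for t in range(T):
        leader = int(np.argmax(cum)) if t > 0 else 0  # greedy: back yesterday's leader
        total += payoffs[t, leader]
        cum += payoffs[t]                             # update everybody's record
    return total
```

If the lead rarely changes this does fine, but an adversary that flips the lead every period makes it lose every single round while the best single signal in hindsight breaks even, which is exactly why it fails the no-regret criterion.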
15:43 and then, you know, tomorrow
15:45 happens and you update everybody's data
15:47 and you repeat okay so this is sort of
15:49 the most naive algorithm you can imagine
15:51 it does not meet anything like the
15:53 result I gave on the previous slide but
15:55 on the other hand what you can say is
15:57 that the regret of this algorithm will
15:59 essentially scale like the number of
16:02 times that the lead changes in the
16:03 underlying signals for the n
16:05 predictors right because the only time
16:07 you sort of lose is when, you know, you
16:11 went with expert five or
16:13 predictor five and, you know, they weren't
16:15 the best and then they actually lost the
16:18 lead so, you know, if you
16:20 happen to have a collection of
16:22 underlying signals in which the one
16:24 that's the best doesn't change that
16:26 often then this algorithm will work
16:28 perfectly well but in general it'll do
16:31 poorly okay maybe a slightly more
16:33 related but more mathematical thing to
16:36 say is that even though this log n
16:40 term in the regret is modest it means
16:41 that you can't just try everything
16:43 okay so here's something that you know
16:45 doesn't work and shouldn't work which is
16:47 imagine that what's going on is that
16:49 nature is flipping a sequence of fair
16:52 coins and these predictors that you have
16:53 are trying to predict what the outcome
16:56 of the coin is going to be each day okay
16:58 so you're gonna try to look at the
17:01 predictors and use their guesses as
17:02 to what the next value of the coin is
17:04 going to be to make your prediction okay
17:07 so I think we can all agree that the
17:09 result I gave on the last slide better
17:11 not say that you can predict the outcome
17:13 of a fair coin better than random
17:15 guessing and in fact it doesn't right
17:17 because in order for there
17:19 to be an expert or a predictor in your
17:21 pool that's good you'd essentially need
17:24 to add as predictors all
17:27 two to the T possible sequences for this
17:29 coin over this sequence of coin flips
17:30 right and you'll notice that if you
17:33 choose N equals two
17:36 to the T then log of two to the T is T
17:38 and then when you take square root of T
17:40 times log N it's square root of T squared
17:43 which gives you
17:44 exactly
17:47 T okay and this is
17:47 as
17:49 it should be right what it's saying is
17:51 that you can use this method to kind of
17:53 come within predicting the outcome of
17:56 fair coin making sort of like 1 over
17:59 square root of T you know which is
18:00 exactly the standard
18:03 deviation okay so it's not telling you
18:05 that you know impossible things are
18:07 possible which is good on the other hand
18:09 it's often the case in the no regret
18:11 literature that you find that when you
18:13 do make probabilistic assumptions of
18:15 some kind on how the data is generated
18:18 and you run a no regret algorithm which
18:21 is not using such assumptions that no
18:23 regret learning algorithms often in fact
18:25 give you optimal or near optimal
18:27 solutions for your particular stochastic
18:30 assumptions so this is a very nice thing
18:31 because it means that you can have these
18:34 worst-case adversarial guarantees no
18:36 matter what the data is but if the world
18:39 is nice to you you might actually enjoy
18:42 the optimal result for that nice set of
18:46 assumptions ok so now let's turn to
18:48 trading, that's background about
18:52 no regret learning so the use of no
18:54 regret learning in trading or finance
18:56 settings is not new at all and what I've
19:01 reproduced here is the cover page of a
19:02 very famous paper on this topic called
19:05 Universal Portfolios by Tom Cover who
19:07 died a few years ago but was for many
19:09 years a prominent statistician at
19:13 Stanford and his
19:15 paper actually predates
19:19 kind of modern flurry of research in the
19:20 machine learning community on no regret
19:23 learning and he was considering a
19:26 setting in which these n predictors were
19:29 just simply the returns of stocks ok so
19:32 you have historical returns of
19:32 stocks
19:34 he's not going to make any stochastic
19:36 assumptions like Brownian motion with
19:39 drift or any other time series model
19:41 it's just each day there's a vector of
19:43 returns for some underlying universe of
19:45 n stocks and he wants to maintain a
19:48 weighted portfolio over those stocks and
19:50 say something about it and he actually
19:52 says something stronger even than what I
19:55 said about comparing favorably to the
19:57 best single stock in hindsight he
20:00 compares his algorithm's
20:00 return
20:04 to the best constant rebalanced portfolio
20:07 in hindsight okay so what I mean by
20:08 that just so we're all on the same page
20:11 is that the sort of class that he wants
20:14 to compare himself against is all distributions
20:17 over these n underlying instruments in
20:19 which you do constant rebalancing so you
20:21 just have some fixed weighting and then
20:23 every day, you know, some things make money
20:26 and some things lose money and
20:27 whatever money you
20:29 have left you're gonna redistribute it
20:31 according to this constant vector of
20:33 weights okay and as some of you might
20:36 know rebalancing is an extremely
20:39 powerful thing in particular it's easy
20:40 to kind of you know come up with very
20:43 simple sequences of returns for even two
20:45 stocks in which neither of the stocks is
20:47 really doing anything at all,
20:50 they're essentially flat, but
20:51 because of the rebalancing you're
20:53 actually getting exponential growth in
20:53 wealth
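A standard toy example of that phenomenon (my numbers, not the speaker's figures): one stock sits flat, the other doubles and halves on alternate days, so each stock alone goes nowhere, yet a 50/50 constant rebalanced portfolio compounds by a factor of 9/8 every two days:

```python
import numpy as np

# Gross per-period returns: stock A is flat; stock B alternates x2, x1/2.
T = 20
returns = np.array([[1.0, 2.0] if t % 2 == 0 else [1.0, 0.5] for t in range(T)])

wealth = 1.0
for r in returns:
    wealth *= 0.5 * r[0] + 0.5 * r[1]  # rebalance to 50/50 every period

# Each stock alone ends flat (B just does 2 then 1/2, ten times over)...
print(returns.prod(axis=0))  # both end at 1.0
# ...but the rebalanced portfolio grows like (3/2 * 3/4)^(T/2) = (9/8)^10
print(wealth)
```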
20:57 okay so Cover proposed a particular
20:58 algorithm that doesn't look too
21:01 dissimilar from the ones I described a
21:04 few slides ago and compares it to the
21:06 best constant rebalanced portfolio after
21:08 the stock outcomes so again in hindsight
21:11 for an arbitrary sequence of returns,
21:14 but he compares to a stronger
21:17 class of comparators but he has a weaker
21:20 criterion for what he means by regret in
21:22 particular he looks at log wealth so he
21:26 wants the log of his cumulative
21:28 wealth to compare favorably to the log
21:30 of the cumulative wealth of the best
21:31 constant rebalance portfolio in
21:34 hindsight and this is all very very
21:37 sensible in the regime where you imagine
21:39 that the best constant rebalanced
21:41 portfolio is experiencing exponential
21:43 growth right because if you take the log
21:44 and it's not exponential growth this is
21:47 not an interesting statement okay
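For intuition, with two assets the best constant rebalanced portfolio in hindsight, and its log wealth, can be found by brute force over the single mixing weight; a sketch (my illustration) on made-up gross returns:

```python
import numpy as np

rng = np.random.default_rng(2)
gross = 1.0 + rng.uniform(-0.05, 0.05, (2500, 2))  # made-up gross returns, 2 assets

def crp_log_wealth(b, gross):
    """Log wealth of the constant rebalanced portfolio that puts weight b
    on asset 0 (and 1 - b on asset 1), rebalanced every period."""
    return np.log(b * gross[:, 0] + (1 - b) * gross[:, 1]).sum()

# Best CRP in hindsight: grid search over the one mixing weight.
grid = np.linspace(0.0, 1.0, 1001)
best_b = max(grid, key=lambda b: crp_log_wealth(b, gross))
print(best_b, crp_log_wealth(best_b, gross))
```

By construction the best mixture does at least as well in log wealth as holding either asset alone, which are the two endpoints of the grid.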
21:50 okay and so he actually did some
21:53 simulations so here is a figure
21:55 reproduced from Cover's paper in which,
21:59 so n is two
22:01 here, there are two underlying stocks and
22:03 here they are down at the bottom and
22:05 here he's showing the kind of cumulative
22:07 return of those stocks and then the
22:09 cumulative return of his so-called
22:11 Universal portfolio algorithm and you
22:13 can see this looks quite good these
22:14 stocks aren't doing much,
22:17 his is generally doing quite well and
22:19 not only does it have kind of
22:21 no regret to either of these stocks it's
22:23 doing much much better he doesn't
22:25 unfortunately show what the best
22:26 constant rebalanced portfolio would be
22:28 but there is a theoretical guarantee
22:32 about that and this is from a much later
22:35 paper that proposed a similar algorithm
22:36 for a similar setting in which again
22:38 they've taken some underlying universe I
22:41 think in this case of about 30 tickers
22:43 and they're not showing you the
22:44 underlying tickers but they are showing
22:46 you the performance of their algorithm
22:49 which is here this is the performance of
22:51 the best constant rebalanced portfolio so
22:53 they're kind of comparing favorably to
22:54 that they're not doing as well but
22:57 they're doing almost as well and this is
22:59 the return of the best single stock in
23:02 the underlying universe of 30 ok so this
23:05 all looks well and good and you know so
23:08 maybe 10 years ago or so you know when I
23:10 started or maybe more like 15 years ago
23:12 or so when I started thinking seriously
23:15 about trading and finance I, you
23:16 know I was aware of this literature and
23:17 I was aware of these papers and I
23:19 thought well this is an obvious thing to
23:21 try and I learned a lot in doing that
23:24 which is where we come to the asterisk
23:27 on the title here and so here,
23:30 these are plots from some experiments I
23:33 did and in this particular case I've
23:35 taken a much much larger universe than
23:37 any of the previous papers consider in
23:40 particular I took the S&P 500 okay so
23:43 now instead of n being 2 or 30, n is
23:46 500 okay and what I'm showing you here
23:50 is the S&P 500 between 2005 and 2011
23:52 which is a very interesting period for
23:55 the S&P 500 because it includes the
23:57 financial crisis and most of the
23:59 recovery from the financial crisis but
24:02 the index ends up in
24:06 2011 about where it started in 2005 so
24:09 there's been sort of huge directional
24:11 movement but overall there's not been
24:12 much change in the index from the start
24:14 point to the end point but it's a period of
24:16 extremely high volatility and so what
24:18 I'm showing you on this left-hand plot
24:20 is there's a blue curve for each of the
24:24 underlying stocks in the S&P 500 this is
24:26 the key each one of these blue lines is
24:27 the cumulative return over the
24:29 period for one of those 500
24:32 stocks and these lines that are buried
24:33 down here in the middle, again with
24:36 a little red arrow are the performance
24:38 of the kind of multiplicative weights
24:40 algorithm that I showed you before that
24:43 has this no regret guarantee and now it
24:45 doesn't look so great and there's
24:47 many of these, I'll get to why
24:48 there's many different colored lines
24:50 here in a second
24:53 but you can see that you know this does
24:56 not look especially better than just
24:58 trading the index itself, ok,
25:00 you're not doing much better here
25:02 with this particular algorithm than you
25:04 would have done by just sort of equally
25:05 weighting the 500 stocks at the
25:08 beginning of 2005 and
25:10 letting it sit or rebalancing for
25:12 instance over that time period and what
25:14 I've done over here on the right plot is
25:17 taken away the other 500 curves here so
25:18 we can just see the performance of the
25:20 algorithm a little bit better and the
25:21 different curves here correspond to
25:23 different values of the learning rate so
25:25 remember I told you there's this one
25:27 parameter of this method which is how
25:29 aggressively you make these updates and
25:31 if you set it to zero then you're not
25:33 doing any adaptation at all you're
25:36 simply parking your weights equally and
25:38 leaving them and if you crank it up
25:40 really high then you're making very very
25:43 dramatic changes in response to positive
25:45 or negative returns and so the green
25:47 line here is where the learning rate is
25:48 the highest and the blue one is where
25:51 it's the lowest and you can see like to
25:53 me none of these look appealing right
25:55 the blue one is kind of
25:57 trading the index the green one you know
25:59 looks a little bit better at the end but
26:01 then you had of course a much more
26:03 significant drawdown during the sort of
26:05 period of the financial crisis and
26:07 the real problem here isn't that the
26:09 returns don't look great, that is a
26:11 problem, but the real problem
26:12 is how you're getting the
26:14 return, which is with extremely
26:16 concentrated portfolios so in particular
26:18 if you crank the learning rate up to this
26:20 particular green curve that I get here
26:22 this is what the underlying weights look
26:25 like so there are actually 500 curves in
26:28 this plot here you just can't see almost
26:30 all of them because they're nailed down
26:33 here at zero and so you can see that all
26:36 this you know green strategy is doing is
26:39 wildly switching its capital and sort of
26:41 chasing whichever thing
26:43 is outperforming for a particular sort of
26:45 time period you can see that roughly
26:48 when this particular stock takes off is
26:50 when the algorithm sort of midway
26:52 through the time period more or less
26:54 puts almost all of its weight in that
26:55 stock for the remainder of the period
26:58 right and so even if you sort of
27:01 found these returns acceptable from a
27:03 quant perspective you could never trade
27:04 this right you know like a quant hedge
27:05 fund,
27:07 Quantopian in particular I believe
27:08 would never let you trade such a
27:10 strategy because it's too concentrated
27:12 right you're basically like doing stock
27:13 picking you're not actually doing
27:15 something quantitative and with low
27:17 concentration and high diversity by the
27:19 way if you're interested in what this
27:22 stock is it is of course Apple and so
27:24 even though, you know, the
27:26 algorithm was doing something sensible
27:27 and I would have been very happy for you
27:30 to have told me in 2005 to put all of my
27:31 wealth in Apple and just leave it there
27:34 still from a quant perspective this is
27:36 not the sort of you know kind of ex-post
27:38 behavior that you want to see okay and
27:41 so this leads to the natural question of
27:43 like well can we can we you know fix
27:46 this can we adapt these algorithms and
27:48 this framework which i think is very
27:49 appealing where you're not making any
27:51 stochastic assumptions and you're still
27:53 getting these very strong ex-post
27:56 guarantees to an arbitrary sequence of
27:59 data can we somehow preserve those
28:01 properties but deal with the you know
28:03 the sort of additional constraints that
28:05 come into a quant trading environment
28:07 which is you want things like a
28:09 diversified portfolio and other forms of
28:10 risk management you know diversification
28:13 being one form of risk management okay
28:16 and so you know after doing these
28:18 experiments I spent you know some time
28:21 over the coming years writing a series
28:22 of papers in which I tried to address
28:24 these things and let me tell you
28:28 about some things that don't work so one
28:29 thing you could ask for is you could
28:31 just say like well look you know of
28:34 course if your metric is to you know
28:37 let's say in a finance setting to have
28:39 the return of your algorithm
28:41 compared favorably in hindsight to the
28:44 return of the best individual stock or
28:45 the return of the best constant
28:47 rebalanced portfolio you know there's
28:49 nothing in that statement that mentions
28:51 anything about risk okay or
28:54 concentration or Sharpe ratio or the
28:55 like and so
28:56 one thing we could say is like well
28:58 maybe we can just change the criterion
29:01 and go back and kind of redo the math
29:03 and work out an algorithm that will
29:05 make a similar statement for the altered
29:08 criterion so let me be precise you
29:11 know every individual stock not
29:13 only has a cumulative return it has a
29:15 Sharpe ratio right we can look at
29:18 the S&P 500 and after any time period we
29:20 could say what is the Sharpe ratio of
29:22 each one of these stocks over the time
29:24 period in question and I could therefore
29:27 ask can we design an algorithm that in
29:30 hindsight has a Sharpe ratio which
29:34 compares favorably has low regret to
29:37 the best individual stock's Sharpe ratio
29:40 in hindsight perfectly reasonable
29:41 question we could also ask it about
29:43 whatever your other favorite kind of
29:46 risk versus reward trade-off is like
29:47 mean-variance optimization I could
29:49 sort of say like well I'm gonna for each
29:52 stock look at its return minus its standard
29:54 deviation historically and I want
29:56 your algorithm to compare to whichever
29:58 stock has the highest mean minus standard
30:00 deviation okay so this seemed like a
30:01 good idea
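As a sketch of what this altered criterion's hindsight benchmark would be, assuming a returns matrix with one column per stock (illustrative code, not from the talk):

```python
import numpy as np

def best_hindsight_mean_minus_std(returns):
    """returns: T x n array of per-period returns for n stocks.
    The altered regret benchmark discussed above: which single stock
    maximized mean return minus standard deviation in hindsight,
    and what that value is."""
    r = np.asarray(returns, dtype=float)
    scores = r.mean(axis=0) - r.std(axis=0)
    best = int(np.argmax(scores))
    return best, float(scores[best])

# toy example: stock 0 is steady, stock 1 has a higher mean but is volatile
rets = [[0.01, 0.05],
        [0.01, -0.04],
        [0.01, 0.06]]
```

On this toy data the steady stock wins the risk-adjusted criterion even though the volatile one has the higher mean, which is exactly the kind of benchmark the talk shows you cannot in general compete with.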
30:03 and unfortunately there are strong
30:06 negative results on this so not only
30:09 could I not find an algorithm which gave
30:11 me a mathematical statement like the one
30:13 we saw at the beginning but for say
30:16 something like Sharpe ratio even worse
30:18 we were able to prove an impossibility
30:20 here we can prove that you just
30:23 cannot design an algorithm which will
30:25 give you such a guarantee on an
30:28 arbitrary sequence of returns so no
30:30 regret is provably impossible in fact
30:32 not only can you prove that no regret is
30:34 impossible you can prove strong lower
30:36 bounds on what's called the competitive
30:39 ratio so you know instead of sort of
30:42 getting optimal minus something you know I
30:44 could ask well maybe can I get half of
30:47 optimal or a 1/10 of optimal so it's
30:49 like some constant fraction of optimal
30:50 you can even prove impossibility results
30:52 for that so there's somehow something
30:55 about this framework which forbids you
30:57 from changing to the sort of the types
30:59 of criteria we might like to have in a
31:01 finance setting and still kind of
31:02 preserving the types of mathematical
31:04 guarantees that we get and intuitively
31:05 you know
31:07 the reason for these impossibility
31:10 results is that volatility introduces
31:13 switching costs right so you know I can
31:15 be tracking a stock putting all
31:17 of my weight in one particular stock and
31:19 enjoying the sequence of returns it gets
31:21 and maybe some other stock starts
31:22 looking better and so I want to switch
31:25 to that other stock well at the moment I
31:26 switch to that other stock I start
31:28 enjoying whatever returns that stock is
31:30 getting but I don't enjoy the same
31:33 historical volatility right so
31:35 standard deviation is somehow keeping
31:38 track of some sort of state and this
31:41 state is causing problems for the no
31:47 regret type of framework okay so one
31:50 thing that we have more success with an
31:52 alternative approach if you like is to
31:56 instead of assuming that these black
31:59 boxes or predictors are arbitrary let's
32:01 imagine that they have some risk
32:03 consideration properties as well okay so
32:07 in particular let's internalize the risk
32:09 within these strategies and then if we
32:11 have a collection of strategies that are
32:13 doing some risk management internally
32:15 can we run an algorithm on top of those
32:17 which has sort of favorable no regret
32:20 style properties and so let me
32:22 just tell you a little bit about this in
32:24 one slide so this is what you might
32:27 consider no regret learning under some
32:30 sort of risk constraint okay and so as
32:32 many of you know like the Sharpe ratio
32:34 is something you can always measure but
32:35 it's a very difficult criterion to
32:39 actually manage to right the closest
32:41 you can come to actually managing to the
32:43 Sharpe ratio is to just impose risk
32:45 limits to control your risk somehow and
32:47 so in particular one natural way of
32:49 controlling risk is to restrict yourself
32:53 to portfolios or strategies whose daily
32:56 P&L has a standard deviation of at most a
32:57 certain amount so we're basically going
33:00 to say look you know in a
33:02 kind of hedge fund context you might
33:05 imagine saying whatever the returns are
33:06 we want to restrict ourselves to
33:09 strategies that on a typical day you
33:11 know won't have a standard deviation of
33:13 let's say more than ten million dollars
33:15 one way or the other okay so we're
33:17 gonna basically we have some underlying
33:18 capital and a strategy
33:21 we can look historically at the P&L
33:23 that's been generated on that capital
33:25 and look at its distribution and say we
33:27 just don't want anything whose standard
33:29 deviation is more than a particular
33:33 dollar amount so if you do that right if
33:36 for instance you go through the exercise of
33:38 taking the historical returns of the S&P
33:40 500 and you estimate the covariance
33:44 matrix between those 500 stocks or you
33:45 have some perhaps lower-dimensional
33:49 factor model for doing so and you
33:51 basically say well what are the
33:54 portfolio's over those stocks that have
33:56 less than a certain dollar value of
33:59 volatility historically it's easy to
34:00 work out the math that this is basically
34:02 going to give you an elliptical
34:04 constraint in portfolio space and the
34:06 eccentricity of that ellipse will depend
34:08 on the strength of the correlations so
34:09 like if all of the stocks are
34:12 independent then this constraint on the
34:16 standard deviation of your GMV is just a
34:18 sphere in n dimensions if some stocks
34:20 are highly correlated then you'll get
34:22 sort of eccentricity in those particular
34:26 dimensions okay so this is nice
34:28 because it means that you know if we
34:30 adopt if we sort of assume that the
34:32 underlying strategies or portfolios that
34:34 we're trying to track in a no regret
34:37 setting are risk managed in the sense
34:40 that they have some standard deviation
34:43 limit on their you know on their daily
34:46 P&L we end up with a convex in
34:47 particular an elliptical but more
34:50 generally a convex constraint on
34:53 our portfolio space okay and what I'm
34:55 showing you in these kind of simulated
34:58 plots are you know just two strategies
35:01 one of which is essentially a momentum
35:03 strategy and this is where there are two
35:06 stocks okay so there are two stocks here
35:09 and and the X and y values here are
35:12 essentially indicating the position in
35:14 that particular stock at different
35:16 moments of time and the color-coding
35:18 here you can ignore it's just sort of I
35:20 think blue is sort of early in the data
35:22 and red is later in the data so I'm
35:23 just trying to show you the movement
35:27 over time so in in these simulated plots
35:29 all I'm showing you is one strategy
35:32 that's essentially following
35:34 a momentum rule and is therefore
35:36 changing its portfolio over time and
35:37 another one is essentially a pairs
35:38 trading strategy where it's
35:41 basically offsetting
35:43 long positions in one stock with short
35:45 positions in the other stock and the
35:46 stocks are pretty correlated which is
35:48 why we have this highly elliptical shape
35:51 here so these are sort of two underlying
35:54 strategies that we've imposed risk
35:55 considerations on so you can see that
35:58 you know for either this
36:00 momentum strategy or this kind of fully
36:03 hedged pairs trading style strategy they
36:05 never leave this elliptical constraint
36:06 because if they hit up against the
36:08 border we just say you know don't make
36:10 that trade don't make any trade that
36:12 would actually cause you to cross this
36:15 risk constraint and so now we've changed
36:17 the game right rather than considering
36:20 arbitrary black boxes we're considering
36:22 arbitrary black boxes but we don't let
36:24 them go outside of this little playpen
36:26 right because we're enforcing a risk
36:28 consideration on them and now you can
36:29 recover
36:31 you know the type of theorem I gave in
36:34 an earlier slide so now we're only
36:35 going to compete with strategies that
36:37 obey these risk limits and we're also
36:39 only going to look at strategies that
36:41 as these strategies do only
36:43 make local moves in portfolio space
36:44 and you can think of this as
36:46 modeling you know kind of limiting your
36:50 market impact okay and to kind of cut to
36:52 the chase if you make the additional
36:53 assumption that your underlying
36:55 strategies are risk managed in this
36:58 particular way then you can
37:00 give an algorithm that looks similar to
37:01 but is a little bit more complicated
37:02 than the one I gave at the beginning
37:04 that will actually give you
37:08 a no regret guarantee in the
37:11 risk managed criterion okay say like
37:14 mean minus variance okay and the only thing you
37:16 need to do is sort of combine no
37:18 regret learning with some tricks that
37:20 are from an interesting literature
37:22 called the pursuit-evasion literature
37:24 which you can go look up which has a
37:26 long mathematical history but it's
37:28 basically about games in which one party
37:30 is trying to chase another party let's
37:34 say in a convex region and so you can
37:36 recover theoretical guarantees and you
37:38 can get kind of nice experimental
37:42 results as well let me just finish up
37:45 here by talking about some variations
37:50 of this framework so you know one thing
37:52 that would be nice is to maintain the
37:55 original no regret guarantee I talked
37:57 about which is to have small regret to
37:59 the best single predictor in hindsight
38:01 but also guarantee that you will never
38:04 do much worse than some fixed index okay
38:07 so for example it would be great if I
38:09 could say like well you'll get close to
38:11 the returns of whatever the best stock
38:13 is in the in the index but you'll never
38:17 do worse than the index itself okay
38:19 whereas the original statement allows
38:21 you to do kind of cumulatively square
38:23 root of t worse than the index right so
38:25 if sort of everything's moving sideways
38:27 and there's no winner taking off
38:29 relative to the index you might actually
38:32 lose square root of
38:33 T regret to the index itself
38:36 which wouldn't be so great so it turns
38:38 out that you can make again slight
38:40 modifications to the algorithm the type
38:41 of algorithm I gave you at the beginning
38:43 of the talk and get this kind of
38:45 guarantee as well sort of promise that
38:47 you'll track the index and if something
38:49 in the index takes off then your
38:53 profit will track that as well a
38:56 very important generalization in many
38:58 applications including in finance
39:00 applications is to go to what's called a
39:03 bandit setting so in the framework I've
39:05 described to you so far you know each
39:07 day you're maintaining a weighting over
39:09 these underlying predictors or stocks
39:12 but whatever weighting you're
39:13 maintaining even if you put all of your
39:16 weight in a single stock you're still
39:18 finding out what the returns of all of
39:21 the stocks are okay and so that's fine
39:22 if your underlying predictors are
39:25 actually stocks but
39:26 in many settings
39:28 you know you might only get feedback on
39:30 the thing that you chose to play and
39:32 this is what's called a multi-armed
39:33 bandit problem for those of you that are
39:35 familiar with that literature so for
39:38 example if instead of being n stocks I
39:40 have n strategies and each of these
39:44 strategies has internal state and makes
39:47 trades each day you know and I want to
39:49 maintain kind of a weighting over these
39:51 underlying strategies like well to
39:53 really find out what a strategy
39:55 does on a given day I really need to
39:57 trade it in particular because you know
39:59 I can back test all of them
40:03 right in parallel but to really
40:04 find out what a strategy would have
40:06 gotten today I really need to trade it
40:08 live in the market among other reasons
40:10 because of trading costs right because
40:11 of market impact because of the
40:15 counterfactual that you know if I make
40:17 trades that cause other traders and
40:18 algorithms to change their behavior and
40:21 the only way of simulating that is by
40:23 actually trading live in the market okay
40:25 so in a setting like that I can't just
40:27 sort of maintain this conceptual
40:29 weighting and every day find out what
40:30 these strategies would have done I
40:32 really need to pick one or more of them
40:34 and play it in the market to get the
40:36 feedback okay and so this is called a
40:37 bandit setting where you only get the
40:39 feedback or payoff of
40:42 whatever choice you made that day and
40:45 and basically everything I've said in
40:47 this talk also applies to the bandit
40:49 setting there are simple modifications
40:50 of the algorithm I gave on the first
40:53 slide where you only get feedback from
40:55 the thing that you chose and you still
40:57 get similar mathematical results with
40:59 a worse regret bound I think
41:01 the dependence on the number of
41:03 time steps goes from square root of T to
41:06 like T to the 2/3 something like that
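A minimal sketch of the kind of bandit modification being described, in the spirit of the classical EXP3 algorithm (the `payoff(t, arm)` interface returning values in [0, 1] is an assumption for illustration, not the speaker's code):

```python
import math
import random

def exp3(n_arms, payoff, T, gamma=0.1, seed=0):
    """EXP3-style no-regret learning in the bandit setting: maintain
    weights over arms, play one arm per round, and update only that
    arm using an importance-weighted payoff estimate, since we only
    observe the payoff of the arm we actually played."""
    rng = random.Random(seed)
    w = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        s = sum(w)
        # mix the weight distribution with uniform exploration
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = payoff(t, arm)
        total += x
        # importance weighting keeps the payoff estimate unbiased
        xhat = x / probs[arm]
        w[arm] *= math.exp(gamma * xhat / n_arms)
    return total, w
```

Run against an arm that always pays off, the weights concentrate on it even though the algorithm never sees the unplayed arm's payoffs.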
41:11 okay let's see the other
41:13 tool that I think you
41:15 should be aware of and by the way one of
41:16 the reasons I wanted to give this talk
41:18 is that these tools are simple enough
41:21 but powerful enough that I think a lot
41:23 of you could find applications for them
41:25 in your own strategies and so let me
41:26 tell you about you know kind of one of
41:29 the biggest generalizations of this
41:31 framework that's known today which
41:34 is sort of general online or sequential
41:37 convex optimization so now imagine your
41:39 action space instead of being like
41:42 picking between one of n discrete
41:44 alternatives which is really the same as
41:45 maintaining a weighting over them you
41:46 can think of maintaining a weighting or
41:48 distribution over them as like a
41:50 probabilistic strategy where I have a
41:52 weighting and I'm going to pick randomly
41:55 from that distribution and play it at
41:56 that particular time right so
41:58 mathematically it's exactly the same
42:00 because it's just you know either the
42:01 inner product or it's an expectation
42:04 okay so a very nice generalization of
42:06 this is where our action space is
42:08 continuous right so instead of there
42:11 being n distinct choices there's a
42:14 continuous infinity of choices
42:16 but we need the shape of that continuous
42:18 infinity to be nice like let's say some
42:20 convex and compact region but if you
42:22 want to think of a specific
42:24 example imagine that you have some
42:28 parametric strategy class okay and that
42:30 you know each point in this space is one
42:32 setting of the parameters for your
42:35 strategy okay so a typical thing to do
42:37 would be to get historical data and
42:40 optimize the parameters of your strategy
42:41 on the historical data and use that
42:44 going forward but then again you're
42:45 making the assumption that like the
42:46 future is going to look something like
42:50 the past so you can choose not to do that and
42:52 instead think in a no regret way and say
42:54 like well there's this continuous
42:56 infinity of settings of parameters for
42:59 my strategy I'd like to be able to
43:00 promise that no matter what the data
43:03 looks like I will run a strategy whose
43:06 performance in hindsight looks like the
43:09 optimal setting of the parameters you
43:12 know after the data's come in ex-post
43:16 okay um and furthermore let's
43:19 imagine that the way the payoffs are
43:20 determined each day is with some
43:22 arbitrary convex function over your
43:24 parameter space so the game
43:27 if you like is that you know every day
43:30 nature picks some arbitrary potentially
43:33 different convex function describing the
43:35 payoffs over your parameter space you
43:37 don't know what that function is but you
43:38 pick one of the parameter points and you
43:40 get the corresponding payoff that day
43:42 the next day nature changes what that
43:44 function looks like you again pick a
43:45 setting for your parameters you can
43:47 basically get the same kind of guarantee
43:49 that I gave in the no regret setting
43:51 where you promise that no matter what
43:54 the sequence of payoff functions chosen
43:56 by nature is the performance of your
43:59 algorithm will compare favorably to the
44:00 best setting of your parameters in
44:03 hindsight okay and this is sort of the
44:06 most general
44:08 formulation of the no regret
44:12 setting so just to conclude you know no
44:14 regret learning I think is something
44:16 that I would encourage all of you to
44:18 look at it has a very rich history and
44:21 very powerful theory that touches on you
44:23 know many topics from finance to game
44:25 theory
44:28 and even sort of mathematical
44:29 programming in the form of convex
44:31 optimization linear programming and the
44:34 like on the other hand deviations from
44:38 the typical additive objective function
44:40 like cumulative returns or cumulative
44:43 payoffs kind of break the theory for the
44:45 most part which is not a great thing but
44:47 there are workarounds right so
44:49 one workaround that I
44:51 mentioned and there are others the one
44:53 is to sort of endogenize risk levels
44:55 within your strategies and then run such
44:57 an algorithm on top of those risk managed
45:01 strategies as well and I've listed here
45:03 at the end the Quantopian team will
45:05 make these slides available I've just
45:07 listed some references I especially
45:10 recommend the first two which are
45:12 very nice survey papers on the entire
45:14 topic of no regret learning and they're
45:16 they're relatively recent as well and
45:18 I'm happy to take questions if there are any
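The elliptical risk constraint described midway through the talk can be made concrete with a small sketch (illustrative code, not the speaker's implementation): given dollar positions and a return covariance matrix, the predicted daily P&L standard deviation is sqrt(w' C w), and one simple way to stay inside the playpen is to scale an offending portfolio back to the constraint boundary:

```python
import numpy as np

def cap_volatility(weights, cov, max_dollar_vol):
    """If the predicted daily P&L standard deviation sqrt(w' C w)
    exceeds the dollar cap, scale the whole portfolio back onto the
    boundary of the elliptical constraint; otherwise leave it alone."""
    w = np.asarray(weights, dtype=float)
    vol = float(np.sqrt(w @ cov @ w))
    if vol <= max_dollar_vol:
        return w
    return w * (max_dollar_vol / vol)

# two highly correlated stocks, as in the pairs-trading illustration
cov = np.array([[0.04, 0.03],
                [0.03, 0.04]])
```

The eccentricity of the feasible ellipse comes from the off-diagonal covariance terms, matching the talk's point that independent stocks would give a sphere.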
45:44 okay so first off um I just want to
45:46 thank you
45:48 for giving such an amazing talk
45:49 thank you I wish I had you as a
45:55 professor but well what I want to ask
45:59 about is this I guess that when
46:04 you have the payoff of the
46:05 algorithm the performance of the
46:07 algorithm I guess it depends on how you
46:10 would define you know the payoff
46:13 that you're aiming at like if you have a
46:17 collection of n strategies then I
46:19 guess it just becomes more concerning
46:21 that what you're aiming
46:24 at is total cumulative return over the
46:26 entire backtest as compared to say you
46:27 know the total cumulative return over
46:31 the past three months six months I see
46:33 so you want to have kind
46:35 of a sliding guarantee so for this
46:37 guarantee there are variants
46:39 to give you this as well right so rather
46:41 than sort of saying like well T just
46:43 keeps ticking up and I always will
46:45 promise that I'm like square root of T
46:48 behind at worst the best thing on the
46:50 entire history so far I might want
46:52 a moving guarantee right that basically
46:55 says in any window of length you know
46:57 let's give it a different name of
46:59 any window of length T prime I will
47:01 compare favorably to the best thing on
47:04 that window you can get such a guarantee
47:07 as well okay but but you know but in
47:09 general that that fixed window T prime
47:11 won't be going to infinity right because
47:13 it's a fixed window so if I want to you
47:16 know you should think of it's like it's
47:17 it's a learning it is a learning
47:18 algorithm right it's sort of getting
47:21 better in the sense that the guarantee
47:23 that you're giving is getting stronger
47:26 the longer time goes on but if you
47:27 choose to compare yourself to a fixed
47:30 moving window of like 250 trading days
47:33 right the last calendar year then that
47:35 value will be fixed and you will you
47:36 know you'll track to that extent but you
47:38 won't have as strong a guarantee as you
47:40 would if you let kind of T go
47:45 asymptotically to infinity okay okay it
47:46 just seems like you know
47:48 the classic you know momentum
47:51 strategies like they seem to sort of
47:53 mesh pretty well with
47:55 this yeah so that's a good point
47:58 this is a momentum strategy the Cover
48:00 thing is a little bit different right
48:02 you're also you're running it on top of
48:04 constant rebalance portfolio so there's
48:07 this interesting there's both momentum
48:09 and reversion going on because the meta
48:11 algorithm that's reweighting underlying
48:14 portfolios that's momentum but the
48:16 constant rebalancing is a reversion
48:18 process right when you rebalance
48:21 you take money away from things that did
48:22 well today and you give it to things
48:24 that didn't do well today right so in
48:26 the Universal portfolio case
48:27 there's both the momentum and a
48:29 reversion aspect but in the straight
48:31 versions where you're just comparing to
48:33 the underlying n stocks for instance
48:35 it is a pure momentum strategy
48:38 you're right that's that's pretty cool
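The reversion mechanics of constant rebalancing described in this answer can be seen in a tiny sketch (illustrative, not from the talk): after returns move the positions, restoring the fixed target weights necessarily sells what outperformed and buys what underperformed:

```python
import numpy as np

def rebalance_to_target(dollar_positions, target_weights):
    """One step of constant rebalancing: take the post-return dollar
    positions and restore the fixed target weights. Dollars flow out
    of whatever outperformed today and into whatever underperformed,
    which is the reversion effect inside a universal portfolio."""
    pos = np.asarray(dollar_positions, dtype=float)
    total = pos.sum()
    return total * np.asarray(target_weights, dtype=float)

# 50/50 target; stock 0 rallied to $120, stock 1 stayed at $100
new_pos = rebalance_to_target([120.0, 100.0], [0.5, 0.5])
```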
48:42 thank you yeah actually I have two
48:45 questions about this the first one is
48:48 that I'm a little bit confused about why
48:51 risk sort of derails the algorithm I
48:53 would have thought that you know the
48:54 theoretical proof has already
48:57 incorporated all possible random
49:02 deviations from the mean so that you know
49:04 despite the presence of which it still
49:10 satisfies that convergence yeah so let
49:11 me let me try to answer that question
49:14 even more generally than these methods
49:17 in what is maybe a more or less satisfying
49:19 way of answering in general one of
49:21 the dark secrets of all of modern
49:24 machine learning is its reliance on
49:26 additive objective functions so what I
49:28 mean by that is you have an objective
49:31 function that you're optimizing that can
49:34 be decomposed as a sum of the objective
49:36 functions on the individual data points
49:38 okay so for example
49:40 least squares regression right I can
49:42 write down the objective function as a
49:45 sum over each data point of my squared
49:47 error on that data point okay and
49:50 basically everything you know from
49:51 modern machine learning almost
49:54 everything relies very heavily on that
49:56 additivity both the theory and the
50:00 practice okay and anytime you have an
50:03 objective function which breaks that
50:06 various kinds of chaos ensue and the
50:08 simple thing that happens here is that
50:10 you know square root does not have that
50:13 property right if I look at like the
50:16 variance of a random variable or like
50:18 the performance of a model over time
50:22 it's a square root of a sum okay and you
50:24 can't rewrite a square root of a sum as
50:27 a sum of square roots okay and
50:29 so that's sort of a math answer
50:32 the intuitive answer is that things like
50:34 variance and standard deviation and
50:36 other common risk metrics they actually
50:38 have state right another way of saying
50:39 less mathematically what I just said is
50:42 that if there's a list of numbers
50:45 and you tell
50:47 me that there's n numbers in the
50:48 sequence and I want to know what the
50:50 contribution of this number is to the
50:52 average I know exactly what it is it's
50:54 that number divided by n if you ask me
50:57 what the contribution of this number is
50:59 to the variance of that sequence I can't
51:02 answer you I need to see the rest of the
51:04 numbers and so things like standard
51:06 deviation and variance that you can
51:07 think of them as having state right
51:09 there you're actually keeping
51:10 there's something about the
51:12 global information about the
51:15 sequence in that number in a way that's
51:17 not present in things like just the
51:19 straight mean or any other additive loss
51:22 function okay thank you so the second
51:25 part of my question is about the slow
51:27 convergence you know I
51:29 understand that you have sublinear
51:32 convergence to the best algorithm so the
51:35 question is is this too slow for a
51:39 practical you know trader I guess my
51:41 answer to that would be
51:44 you know it depends
51:46 on how much data you have and many
51:49 other things but my general answer would
51:51 be if sort of one over square root of T
51:53 convergence to optimality is too slow
51:56 for you then so is all of machine
51:58 learning right because basically sort of
52:01 power law convergence is the rule in
52:04 machine learning both theoretically
52:07 and largely empirically as well so you
52:09 know I definitely agree there are you
52:11 know there are data sets for which the T
52:15 we have is just you know not big enough
52:16 just for these things to make
52:18 interesting statements but in general
52:20 that's a sign that you just don't have
52:22 enough data to apply even very simple
52:23 learning methods like linear
52:27 regression for example thanks for a
52:30 great talk very thought-provoking so my
52:31 question is in regards to the
52:33 multi-armed bandit yeah can you talk a
52:34 little bit about the exploitation
52:37 exploration trade-off and how you sort
52:39 of handle that yeah it's a good question
52:43 so sort of no regret algorithms in the
52:47 bandit setting essentially work on the
52:49 principle they handle exploration
52:51 exploitation with sort of an optimism
52:53 under uncertainty principle and what I
52:56 mean by that is like a classical bandit
52:57 algorithm that did very explicit
52:59 exploration exploitation would be
53:01 something like look you know at each
53:04 round you know with some probability
53:06 you know with high
53:07 probability choose the thing that's done
53:09 the best so far and with the remaining
53:12 probability pick something at random and
53:14 then sort of gradually schedule that
53:17 exploration probability to go to zero in
53:20 sort of the no regret framework you
53:22 instead have implicit exploration what
53:25 you basically do is you say things that
53:27 you know things that I've tried a lot
53:30 like it's gonna sound like I'm
53:31 making probabilistic assumptions but I'm
53:33 not but the intuition in a
53:35 probabilistic setting would be things
53:37 that you've tried a lot you have very
53:38 tight error bars on what their real
53:41 underlying value is and things that you
53:42 haven't tried that much have very wide
53:46 error bars so instead of using like the
53:48 empirical mean of each one of the
53:51 actions so far use an upper confidence
53:53 bound okay and so things that you
53:54 haven't tried much kind of get this
53:57 automatic promotion and then you try
53:58 them more and then you find out whether
53:59 they're really good or not and this is
54:01 basically how all these algorithms look
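A sketch of the upper-confidence-bound idea just described, along the lines of the classical UCB1 rule (illustrative code, not the speaker's):

```python
import math

def ucb1_choice(counts, means, t):
    """Optimism under uncertainty: score each arm by its empirical
    mean plus a confidence bonus that shrinks the more the arm has
    been tried, and play the highest score. Untried arms get an
    infinite bonus so they are explored first, which is the
    'automatic promotion' described in the answer."""
    scores = []
    for n, m in zip(counts, means):
        if n == 0:
            scores.append(float("inf"))
        else:
            scores.append(m + math.sqrt(2.0 * math.log(t) / n))
    return max(range(len(scores)), key=lambda i: scores[i])
```

With equal empirical means, the arm tried fewer times wins the tie through its wider confidence bonus, so exploration falls out implicitly rather than through an explicit exploration probability.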
54:09 awesome thanks so much okay so you
54:11 showed that once you manage the risk of
54:14 your strategies you could get good
54:17 regret guarantees now what is the
54:19 intuitive reason that you didn't get it
54:20 when you didn't manage the risk
54:22 what's the intuitive reason why what
54:24 like theoretically you should have
54:26 gotten it without managing the risk as
54:29 your theory showed initially so what is
54:30 the intuitive reason when you manage the
54:31 risk you could get it so no what I'm saying
54:34 is the original theory will break
54:36 if you don't kind of have each one of
54:38 the constituent strategies managing its
54:40 own risk in some sense only in the
54:43 experiments right huh no even in
54:45 theory right in theory I can just sort
54:49 of give you bad sequences in which the
54:52 underlying constituent strategies are
54:55 not risk managed in any way
54:56 and not only will the particular
54:58 algorithm I showed do poorly you can
55:00 actually prove outright that there
55:01 doesn't exist an algorithm which will
55:03 kind of recover the theoretical
55:06 guarantees I gave on the early slide
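The earlier point that variance carries state, unlike the additive mean, can be checked numerically in a toy illustration (not from the talk): a data point's contribution to the mean is always x/n, but its contribution to the variance depends on the rest of the sequence:

```python
import statistics

# The mean is additive: each point's contribution is x / n,
# independent of the other points in the sequence.
def mean_contribution(x, n):
    return x / n

# The variance is not additive: the very same value 5 contributes
# nothing to one sequence's variance and a lot to another's.
seq_a = [5, 5, 5, 5]   # 5 sits exactly at the mean
seq_b = [5, 0, 0, 0]   # the same 5 is now far from the mean

var_a = statistics.pvariance(seq_a)
var_b = statistics.pvariance(seq_b)
```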
55:12 okay last question thank you Dr. Kearns I
55:13 have a question about updating the
55:17 weights to optimize the Sharpe ratio can
55:19 we consider using some like momentum
55:22 gradient descent method that we can
55:25 update instead of doing it like
55:27 stochastically we can add each day's
55:31 information with respect to the gradient
55:34 of the Sharpe ratio so that we can use
55:37 some momentum method that matches the idea
55:41 that you use here too like yeah I'm
55:43 afraid my
55:44 answer is gonna be the unsatisfying one
55:47 which is you know nice idea it just
55:49 turns out that doesn't work okay so in
55:51 you know like as per these
55:53 impossibility theorems that I mentioned
55:56 so by the way you can interpret these
55:58 algorithms as momentum algorithms
56:00 engaging in gradient descent so in
56:02 particularly this last generalization I
56:04 mentioned where you have like a convex
56:07 parameter space and each day nature is
56:08 picking some payoff function over that
56:10 parameter space and you're sort of
56:11 trying to track the best setting of your
56:13 parameters that though if you go look at
56:15 the actual algorithm that does that it's
56:19 essentially you know even though there's
56:21 no gradient to speak of because every
56:23 day the objective function is changing
56:24 according to nature's choice it's
56:26 essentially trying to compute some sort
56:29 of aggregate gradients of the functions
56:31 chosen the average of the functions
56:33 chosen by nature so far so these
56:35 algorithms in general they are momentum
56:36 algorithms and they are following a
56:39 gradient but sort of trying to do it
56:41 with Sharpe ratio as the objective just
56:43 kind of breaks all of the theory and
56:45 it's because of this sort of
56:46 non-additivity that I mentioned before
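The gradient-following view in this final answer corresponds to online gradient descent over a convex set, which can be sketched as follows (illustrative code with an assumed `grads(t, x)` interface returning the gradient of round t's adversarially chosen convex loss; the specific loss in the usage example is mine):

```python
import numpy as np

def online_gradient_descent(grads, x0, lo, hi, etas):
    """Online convex optimization sketch: each round play the current
    point, receive the gradient of that round's convex loss at the
    played point, step against it, and project back into the feasible
    box [lo, hi]. No stochastic assumptions: nature may change the
    loss function arbitrarily from round to round."""
    x = np.asarray(x0, dtype=float)
    played = []
    for t, eta in enumerate(etas):
        played.append(x.copy())
        g = grads(t, x)
        # projection keeps us inside the convex constraint set
        x = np.clip(x - eta * g, lo, hi)
    return played
```

Run against the fixed loss (x - 1)^2, the played points settle at the hindsight-optimal parameter, which is the guarantee the talk states for arbitrary sequences of convex payoff functions.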