This content introduces "no-regret learning," a machine learning framework whose algorithms perform nearly as well as the best single strategy in hindsight, even under adversarial conditions and without assumptions about future data. It explores the framework's theoretical underpinnings, practical algorithms, and limitations, particularly in financial trading.
Good morning. I'm especially honored and excited to introduce Dr. Michael Kearns for our morning keynote because, in addition to all of the extraordinary success he's had, which I'll tell you about in a moment, a little-known fact is that a little over three years ago Michael introduced me to a small startup named Quantopian — actually at this very conference. Michael Kearns is a professor at the University of Pennsylvania in the Computer and Information Science department. He's also a founding director of Penn's Warren Center for Network and Data Sciences. His research interests include topics in machine learning, artificial intelligence, algorithmic game theory, computational social science, and quantitative trading. He has consulted and worked widely within the technology and finance industries, is chief scientist at MANA Partners, a quantitative hedge fund and trading technology firm here in New York City, and is also a scientific adviser to Quantopian.

Okay, good morning. Thanks to all of you for showing up so early on a Saturday — thanks for coming, and thanks very much to the Quantopian team for having me.
I noticed in the program that there are a lot of talks on overfitting and train/test methodology and so on, which is all good stuff, but one way of avoiding overfitting is to never assume in the first place that the future will have anything at all to do with the past — and that's what I want to talk about today. As you can see, many talented people have given deep thought to the concept of regret in the past, but I'm here to talk about a particular technical definition of regret, and to tell you about a subfield of machine learning called no-regret learning, which is thriving and finding wide application in many domains. Today I just want to tell you what the basic framework is and the kind of result this field produces theoretically, and then talk mainly about potential applications to trading domains. And if you're fundamentally suspicious of a title like "Trading Without Regret," your instincts are correct — you'll notice I've added an asterisk to my title, which we will come to eventually.
So what is no-regret learning? I want to describe the technical framework for you on one slide. In the basic no-regret learning framework, imagine you have some set of n signals or predictors, and I want you to think about these signals or predictors in an extremely general way. To give examples, they might be the time series of returns of individual equities or other instruments, or funds or portfolios of such instruments; or they might be full-fledged strategies — black-box algorithms taking any data sources you want as input and producing, if you like, predictions about the future, so you might think of them as alphas in a trading context; or they might be other sources of data or information, like advisers — stock analysts giving upgrade and downgrade recommendations — or sentiment indicators from news or social media. Anything you want: you can think of these things as just boxes with unknown inputs that generate a sequence of numerical values.

Each trading period, we're going to imagine that each one of these signals receives some arbitrary payoff, and when I say arbitrary I mean I'm going to make absolutely no stochastic assumptions or probabilistic modeling whatsoever. In particular, you can think of the returns or payoffs or losses of these black boxes as being generated by some omniscient, even adversarial, nature whose express purpose is to foil your use of these signals — and you should already be wondering what in the world you could possibly hope to say in such a setting; I'll come to that in a second. In particular, these signals or black boxes might have all kinds of side information or even inside information; they might have expertise and specialization that lets them do well under certain conditions and poorly under others. The only thing I'm going to assume about the payoffs these n predictors get at each time step is that they're bounded — they're not arbitrarily large, either positive or negative. Note that this assumption is already present in most stochastic models used in finance, such as the transition from discrete Brownian motion to continuous Brownian motion that underlies things like options pricing.
In this setting, your goal is to design an algorithm that maintains some portfolio or dynamic weighting — if you like, a probability distribution — over these n signals or predictors, and whatever distribution or weighting you're using at each time step, you're basically going to get the weighted average, according to that distribution, of the underlying payoffs of the signals. So what's the goal? I've set up this extremely general setting where you have these n arbitrary black boxes, your algorithm is maintaining some weighting or distribution over them, but there are no assumptions at all on the sequence of payoffs — in particular, there's no assumption that future payoffs have anything to do with past payoffs for these n signals. Well, in the absence of some relationship between past data and future data, the only thing you might hope to say is something about the past: about your performance in hindsight on the data so far. So in no-regret learning, the goal is that no matter what the sequence of payoffs is for the n underlying signals, your algorithm should have what's called no regret, which means that if we look at the payoff of the best one of these signals in hindsight, your algorithm's payoff is not too much worse than that.
That's what I'm saying informally here: no regret is going to mean that if we look at the best payoff — the best payoff amongst these n signals in hindsight — and we look at the actual payoff that your algorithm got in hindsight, the difference between those two things is sublinear. So if T is the number of time periods so far, we'd like results in which this difference — the best payoff minus your algorithm's payoff — grows like, say, the square root of T, because if it grows like the square root of T, then your per-step regret is the square root of T divided by T, which is 1 over the square root of T, and that goes to 0 as the number of periods increases. This is the source of the term "no regret": it refers to having per-step regret that diminishes with time, which is equivalent to having sublinear cumulative regret. So this is the goal. It's a form of learning, but a form of learning in which you're not generalizing in any sense at all — you're just claiming that at all points in time you're tracking the best one of these n strategies or predictors in hindsight. It's an ex post statement rather than an ex ante statement.
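To pin that down in symbols (the notation here is mine, not from the slides): if $p_{i,t}$ is the payoff of signal $i$ in period $t$ and $w_t$ is the weighting the algorithm plays, the cumulative regret after $T$ periods is

```latex
R_T \;=\; \max_{1 \le i \le n} \sum_{t=1}^{T} p_{i,t} \;-\; \sum_{t=1}^{T} \langle w_t, p_t \rangle ,
\qquad\text{and ``no regret'' means } R_T = o(T),
\text{ e.g. } R_T = O(\sqrt{T}), \text{ so that } R_T / T \to 0 .
```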
Now let me point out something that we're not proposing here, which would be deeply suspicious: we're not competing with the optimal unconstrained policy that can switch willy-nilly between the different strategies or predictors. We're not saying that, in hindsight, we should have been following predictor 1 on the first day, predictor 7 on the second day, predictor 19 on the third day, with arbitrary switching. We're just comparing ourselves to predictor 1 in isolation, predictor 2 in isolation, predictor 3 in isolation — and whichever of those is best, we want to track that. So that's the basic definition.

Even though there are a lot of deep mathematical connections between boosting, which many of you may be familiar with, and no-regret learning, the fundamental assumptions and goals of the two frameworks are quite different. In boosting you do make probabilistic assumptions — stochastic or distributional assumptions on how the data is generated; you essentially have an i.i.d. setting — and there your goal is to actually beat the best of your weak learners, so to speak. Here we're not trying to beat the best, we're just trying to track the best. So you can think of the criterion here as the difference between finding a needle in a haystack — the good Quantopian strategy out of the sea of Quantopian strategies — versus the other challenge of trying to build something that's better than any one of the individual strategies in the first place.
Now, this no-regret framework has obvious appeal in financial settings, where non-stationarity is the rule — and in fact, even worse than non-stationarity, a lot of that non-stationarity may be generated by game-theoretic, strategic, or even adversarial interactions between multiple parties, so that even typical time series models aren't appropriate.
Okay, this is probably the only semi-technical slide, and I'm mostly only going to talk about the highlighted parts. In the top box I've reproduced, from a survey paper on no-regret learning that I'll give a reference to at the end, what a typical algorithm achieving this no-regret criterion looks like. The first thing I want you to notice is how simple and how short it is — if you coded it up in MATLAB in full detail it would be maybe five or ten lines inside a loop — and these algorithms are appealingly simple; almost every no-regret algorithm looks more or less like this. You essentially have a weight associated with each one of the n signals, and you adjust those weights in the semi-obvious way according to historical performance. In particular, the way you update a weight at the next time step is to say that the new value of the weight is the old value of the weight times some multiplicative factor. This happens to be phrased in terms of losses rather than payoffs, so negative values for a signal are good and positive values are bad, but in plain English what this update is doing is saying: whatever your payoff was, I'm going to increase your weight by a multiplicative factor of 1 plus something times your payoff, and if you lost money I'm going to multiply your weight by something less than 1, namely 1 minus something times your payoff. That "something" is the single free parameter of the algorithm, a learning rate. So I have one knob: if I crank that knob up very high, I'm making very radical changes in response to positive or negative payoffs — I'll greatly up-weight things that get slightly positive payoffs and greatly down-weight things that get slightly negative payoffs. If I set the learning rate very low, I'm making much more modest changes.

The important algebraic part of this update is that it's multiplicative, not additive. Those of you familiar with things like the perceptron learning algorithm for linear threshold functions, which has nice mathematical properties — that's an additive update: you add something to the weight if it got a positive payoff and subtract something if it didn't. Here we're actually multiplying, so in particular, if one of these signals goes on a run of positive payoffs, its weight will increase exponentially over that period of time. And the way you actually make your decisions at each round is just by playing the weighted average of the underlying signals according to the normalized distribution given by these weights. So that's the algorithm — it's incredibly simple.
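As a rough illustration of that update — a minimal sketch of my own, phrased in terms of payoffs rather than the loss-based pseudocode on the slide — it might look something like this in Python:

```python
import numpy as np

def multiplicative_weights(payoffs, eta=0.1):
    """Minimal multiplicative-weights sketch (not the exact pseudocode from the slide).

    payoffs: T x n array of per-period payoffs for the n signals,
             assumed bounded, e.g. in [-1, 1] (so weights stay positive for eta <= 1).
    eta:     the single learning-rate knob discussed above.
    Returns the algorithm's payoff in each period.
    """
    T, n = payoffs.shape
    w = np.ones(n)                          # start with equal weights
    algo_payoff = np.zeros(T)
    for t in range(T):
        dist = w / w.sum()                  # normalized distribution over the n signals
        algo_payoff[t] = dist @ payoffs[t]  # weighted-average payoff this period
        w *= 1.0 + eta * payoffs[t]         # multiplicative update: up-weight winners,
                                            # down-weight losers, by a 1 +/- eta*payoff factor
    return algo_payoff
```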
And here's a very nice theorem you can prove about this algorithm — I'll just go through the final part. The theorem says: if you set the learning rate of the algorithm in a particular way, then what this pink box says is that the payoff of the algorithm, on an arbitrary sequence of payoff vectors for the underlying signals, will be the payoff of the best one of the underlying signals in hindsight, minus something that looks like the square root of T times the log of the number of underlying signals. Let me say that again, because I want you to keep just this one extra expression in mind: no matter what the sequence of underlying payoffs for the signals is, the payoff of the algorithm will be, at worst, the best payoff in hindsight of any of the signals, minus the square root of T — the number of time steps — times the log of the number of underlying signals you're tracking. So there is a penalty for how many signals you're tracking, as there should be — I'll say something a little more precise about that in a second — but it's modest: it's logarithmic.
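Written out (a standard form of this multiplicative-weights/Hedge guarantee; the exact constants vary by source), the statement is roughly

```latex
\sum_{t=1}^{T} \langle w_t, p_t \rangle
\;\ge\;
\max_{1 \le i \le n} \sum_{t=1}^{T} p_{i,t}
\;-\; O\!\left(\sqrt{T \,\log n}\right)
\qquad\text{for any bounded payoff sequence,}
```

with the learning rate chosen as a function of T and n.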
Okay, let me make some miscellaneous remarks about this framework and this type of result. First of all, no-regret algorithms and their analysis have a very long and rich history: it goes back at least to the game theory and micro-economics literature of the 1950s, in the form of something called Blackwell approachability, and it has a very large and, as I mentioned, currently very active machine learning literature. There are lots of connections to finance, some of which I'll talk about, and a couple of others that I won't. There are modern connections to Black-Scholes: in particular, it's possible to derive the Black-Scholes pricing formula without any probabilistic assumptions whatsoever, by essentially viewing the pricing process as a game between a price setter and an arbitrageur, both of whom are playing a no-regret algorithm — and this was done in the last few years or so. There are also very close connections between no-regret learning and the right way of pricing in prediction markets in order to elicit truthful information from an underlying population. And there are very strong, long-standing connections to game theory: you can derive the minimax theorem, and you can also derive linear programming, as the limiting behavior of two parties playing no-regret algorithms in a zero-sum game; and more generally, if you take two no-regret algorithms and have them play repeatedly against each other — each generating the payoffs for the other — in an arbitrary game, zero-sum or otherwise, you get convergence to Nash or to a more general notion of equilibrium called correlated equilibrium.

Now this all sounds great and very powerful; let me make a couple of remarks, though, that will demystify this kind of result and this framework.
First of all, a lot of what's difficult about proving the type of theorem I gave on the previous slide is really dealing with arbitrary sequences of payoffs for the underlying signals. In particular, here's an algorithm that won't meet the no-regret criterion but about which you can still say something. It's just the greedy algorithm, known in the no-regret literature as follow-the-leader: all you do is look at the payoffs so far, and whichever one of the underlying signals has the highest cumulative payoff so far, you put all of your money — all of your weight — on that particular signal tomorrow. Then tomorrow happens, you update everybody's data, and you repeat. So this is the most naive algorithm you can imagine. It does not meet anything like the result I gave on the previous slide, but on the other hand, what you can say is that the regret of this algorithm essentially scales like the number of times the lead changes among the n underlying predictors — because the only time you lose is when you went with, say, predictor five, and it wasn't the best and actually lost the lead. So if you happen to have a collection of underlying signals in which the best one doesn't change very often, this algorithm will work perfectly well, but in general it'll do poorly.
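For concreteness, here is a minimal sketch of that greedy follow-the-leader rule (my own illustration, not code from the talk); note how trivial it is next to the multiplicative-weights sketch above:

```python
import numpy as np

def follow_the_leader(payoffs):
    """Each period, put all weight on the signal with the highest cumulative
    payoff so far; its regret scales with the number of lead changes."""
    T, n = payoffs.shape
    cumulative = np.zeros(n)
    algo_payoff = np.zeros(T)
    for t in range(T):
        leader = int(np.argmax(cumulative))   # current best signal in hindsight
        algo_payoff[t] = payoffs[t, leader]   # bet everything on it this period
        cumulative += payoffs[t]              # then observe everyone's payoff
    return algo_payoff
```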
A slightly more related, but more mathematical, thing to say is that even though this log n term in the regret is modest, it means you can't try everything. Here's something that doesn't work and shouldn't work. Imagine that nature is flipping a sequence of fair coins, and the predictors you have are trying to predict the outcome of the coin each day — you're going to look at the predictors and use their guesses about the next value of the coin to make your prediction. I think we can all agree that the result I gave on the last slide had better not say you can predict the outcome of a fair coin better than random guessing, and in fact it doesn't. In order for there to be an expert or predictor in your pool that's good, you'd essentially need to add as predictors all 2^T possible sequences for this coin over the sequence of coin flips. And you'll notice that if you choose n = 2^T, then log of 2^T is T, and the square root of T times log n becomes the square root of T times T — the regret bound becomes linear in T, so the guarantee no longer promises anything beyond chance. This is as it should be: it's saying you can only come within the usual statistical fluctuation — on the order of the standard deviation — of predicting a fair coin. So the theorem is not telling you that impossible things are possible, which is good.

On the other hand, it's often the case in the no-regret literature that when you do make probabilistic assumptions of some kind on how the data is generated, and you run a no-regret algorithm that is not using such assumptions, the no-regret algorithm in fact gives you an optimal or near-optimal solution for your particular stochastic assumptions. This is a very nice thing, because it means you can have these worst-case adversarial guarantees no matter what the data is, but if the world is nice to you, you may actually enjoy the optimal result for that nice set of assumptions.
Okay, so now let's turn to trading — that's the background on no-regret learning. The use of no-regret learning in trading or finance settings is not new at all. What I've reproduced here is the cover page of a very famous paper on this topic, called "Universal Portfolios," by Tom Cover, who died a few years ago but was for many years a prominent statistician at Stanford. His paper actually predates the modern flurry of research on no-regret learning in the machine learning community. He was considering a setting in which the n predictors are simply the returns of stocks: you have the historical returns of stocks, he's not going to make any stochastic assumptions like Brownian motion with drift or any other time series model — each day there's just a vector of returns for some underlying universe of n stocks — and he wants to maintain a weighted portfolio over those stocks and say something about it. He actually says something stronger than what I said about comparing favorably to the best single stock in hindsight: he compares his algorithm's return to the best constant-rebalanced portfolio in hindsight. What I mean by that, just so we're all on the same page, is that the class he compares himself to is all distributions over the n underlying instruments in which you do constant rebalancing: you have some fixed weighting, and every day some things make money and some things lose money, and whatever money you have left you redistribute according to this constant vector of weights. And as some of you may know, rebalancing is an extremely powerful thing — in particular, it's easy to come up with very simple sequences of returns, even for two stocks, in which neither stock is really doing anything at all — they're essentially flat — but because of the rebalancing you're actually getting exponential growth in wealth.
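Here is a toy numerical illustration of that rebalancing effect (my own example, not one from Cover's paper): one asset alternately doubles and halves, so it ends flat, and the other is flat cash, yet a 50/50 portfolio rebalanced every period grows exponentially.

```python
import numpy as np

T = 100
returns_a = np.tile([1.0, -0.5], T // 2)   # +100% then -50%, repeating: flat overall
returns_b = np.zeros(T)                    # "cash": flat every period

# Buy-and-hold each asset individually: both end where they started.
hold_a = np.prod(1 + returns_a)            # == 1.0
hold_b = np.prod(1 + returns_b)            # == 1.0

# Constant-rebalanced portfolio: rebalance to 50/50 every period.
wealth = 1.0
for ra, rb in zip(returns_a, returns_b):
    wealth *= 0.5 * (1 + ra) + 0.5 * (1 + rb)   # factor 1.5 then 0.75, i.e. 1.125 per pair of periods

print(hold_a, hold_b, wealth)              # wealth ~ 1.125**(T/2): exponential growth
```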
So Cover proposed a particular algorithm — one that doesn't look too dissimilar from the ones I described a few slides ago — and compares it to the best constant-rebalanced portfolio after the stock outcomes, so again in hindsight, for an arbitrary sequence of returns. He compares to a stronger class of comparators, but he has a weaker criterion for what he means by regret: in particular, he looks at log wealth. He wants the log of his cumulative wealth to compare favorably to the log of the cumulative wealth of the best constant-rebalanced portfolio in hindsight. This is all very sensible in the regime where you imagine the best constant-rebalanced portfolio is experiencing exponential growth, because if you take the log and it's not exponential growth, this is not an interesting statement.

He actually did some simulations. Here is a figure reproduced from Cover's paper in which n is two: there are two underlying stocks, down at the bottom he shows their cumulative returns, and then the cumulative return of his so-called universal portfolio algorithm. You can see this looks quite good — these stocks aren't doing much, and his algorithm is generally doing quite well; not only does it have no regret with respect to either of these stocks, it's doing much, much better. He unfortunately doesn't show what the best constant-rebalanced portfolio would have been, but there is a theoretical guarantee about that. And this is from a much later paper that proposed a similar algorithm for a similar setting, in which they've taken some underlying universe — I think in this case of about 30 tickers. They don't show you the underlying tickers, but they do show the performance of their algorithm, which is here; this is the performance of the best constant-rebalanced portfolio, so they compare favorably to that — not doing as well, but almost as well — and this is the return of the best single stock in the underlying universe of 30.
So this all looks well and good. Maybe ten years ago — or more like fifteen years ago — when I started thinking seriously about trading and finance, I was aware of this literature and of these papers, and I thought this was an obvious thing to try; and I learned a lot in doing so, which is where we come to the asterisk on the title. These are plots from some experiments I did, and in this particular case I've taken a much larger universe than any of the previous papers consider — I took the S&P 500, so now instead of n being 2 or 30, n is 500. What I'm showing you here is the S&P 500 between 2005 and 2011, which is a very interesting period because it includes the financial crisis and most of the recovery from the financial crisis, to the extent that the index ends up in 2011 about where it started in 2005. So there has been huge directional movement, but overall not much change in the index from the start point to the end point — a period of extremely high volatility. What I'm showing on this left-hand plot is a blue curve for each of the underlying stocks in the S&P 500 — each of these blue lines is the cumulative return over the period for one of those 500 stocks — and the lines buried down here in the middle, with the little red arrow, are the performance of the multiplicative-weights algorithm I showed you before, the one with the no-regret guarantee. And now it doesn't look so great. I'll get to why there are many different colored lines in a second, but you can see that this does not look especially better than just trading the index itself, and it's not: you're not doing much better with this particular algorithm than you would have done by simply equally weighting the 500 stocks at the beginning of 2005 and letting it sit, or rebalancing, over that time period.
What I've done on the right-hand plot is take away the other 500 curves so we can see the performance of the algorithm a little better, and the different curves correspond to different values of the learning rate. Remember, I told you there's this one parameter of the method — how aggressively you make the updates. If you set it to zero you're not doing any adaptation at all, you're simply parking your weights equally and leaving them; if you crank it up really high, you're making very dramatic changes in response to positive or negative returns. The green line is where the learning rate is highest and the blue one is where it's lowest, and to me none of these look appealing: the blue one is basically tracking the index, and the green one looks a little better at the end, but then you had, of course, a much more significant drawdown during the period of the financial crisis. But the real problem here isn't that the returns don't look great — that is a problem — the real problem is the way you're getting those returns, which is with extremely concentrated portfolios. In particular, if you crank the learning rate up to the one for this green curve, this is what the underlying weights look like. There are actually 500 curves in this plot; you just can't see almost all of them, because they're nailed down at zero. So all this green strategy is doing is wildly switching its capital and chasing whichever thing is outperforming over a particular time period; you can see that roughly when this particular stock takes off is when the algorithm, midway through the period, puts almost all of its weight in that stock for the remainder of the period. So even if you found these returns acceptable from a quant perspective, you could never trade this: a quant hedge fund — Quantopian in particular, I believe — would never let you trade such a strategy, because it's too concentrated. You're basically doing stock picking; you're not doing something quantitative, with low concentration and high diversification. By the way, if you're interested in what this stock is, it is of course Apple, and even though the algorithm was doing something sensible — and I would have been very happy for you to have told me in 2005 to put all of my wealth in Apple and just leave it there — still, from a quant perspective, this is not the sort of ex post behavior you want to see.
So this leads to the natural question: can we fix this? Can we adapt these algorithms and this framework — which I think is very appealing, since you're not making any stochastic assumptions and you're still getting very strong ex post guarantees against an arbitrary sequence of data — can we somehow preserve those properties but deal with the additional constraints that come up in a quant trading environment, where you want things like a diversified portfolio and other forms of risk management (diversification being one form of it)? After doing these experiments, I spent some time over the following years writing a series of papers in which I tried to address these things, and let me tell you about some things that don't work.

One thing you could ask for is this. Of course, if your metric — say, in a finance setting — is to have the return of your algorithm compare favorably in hindsight to the return of the best individual stock, or of the best constant-rebalanced portfolio, there's nothing in that statement that mentions anything about risk, or concentration, or Sharpe ratio, or the like. So one thing we could say is: maybe we can just change the criterion, go back and redo the math, and work out an algorithm that makes a similar statement for the altered criterion. Let me be precise. Every individual stock has not only a cumulative return but also a Sharpe ratio; we can look at the S&P 500 and, after any time period, ask what the Sharpe ratio of each of these stocks was over the period in question. I could therefore ask: can we design an algorithm whose Sharpe ratio in hindsight compares favorably — that is, has low regret — to the best individual stock's Sharpe ratio in hindsight? A perfectly reasonable question. You could also ask it about whatever your other favorite risk-versus-reward trade-off is, like mean-variance optimization: for each stock, look at its return minus its standard deviation historically, and ask your algorithm to compare to whichever stock has the highest mean minus standard deviation. So this seemed like a good idea.
Unfortunately, there are strong negative results on this. Not only could I not find an algorithm giving a mathematical statement like the one we saw at the beginning but for something like the Sharpe ratio — even worse, we were able to prove an impossibility here: you simply cannot design an algorithm that will give you such a guarantee on an arbitrary sequence of returns. No regret is provably impossible. In fact, not only can you prove that no regret is impossible, you can prove strong lower bounds on what's called the competitive ratio: instead of getting optimal minus something, I could ask whether I can get half of optimal, or a tenth of optimal — some constant fraction of optimal — and you can prove impossibility results even for that. So there is something about this framework that forbids you from changing to the sorts of criteria we might like in a finance setting while still preserving the types of mathematical guarantees we get. Intuitively, the reason for these impossibility results is that volatility introduces switching costs. I can be tracking a stock — putting all of my weight in one particular stock and enjoying the sequence of returns it gets — and maybe some other stock starts looking better, so I want to switch to it. Well, at the moment I switch, I start enjoying whatever returns that stock is getting, but I don't enjoy its historical volatility. Somehow, standard deviation is keeping track of some sort of state, and this state causes problems for the no-regret type of framework.
One thing we had more success with — an alternative approach, if you like — is, instead of assuming that these black boxes or predictors are arbitrary, to imagine that they have some risk-control properties as well. In particular, let's internalize the risk within the strategies: if we have a collection of strategies that are doing some risk management internally, can we run an algorithm on top of them that has favorable no-regret-style properties? Let me tell you a little about this in one slide. This is what you might consider no-regret learning under a risk constraint.

As many of you know, the Sharpe ratio is something you can always measure, but it's a very difficult criterion to actually manage to; the closest you can come to managing to the Sharpe ratio is to impose risk limits to control your risk somehow. One natural way of controlling risk is to restrict yourself to portfolios or strategies whose daily P&L has a standard deviation of at most a certain amount. In a hedge fund context, you might imagine saying: whatever the returns are, we restrict ourselves to strategies that, on a typical day, won't have a standard deviation of more than, say, ten million dollars one way or the other. So we have some underlying capital and a strategy; we can look historically at the P&L generated on that capital, look at its distribution, and say we don't want anything whose standard deviation is more than a particular dollar amount. If you do that — if, for instance, you go through the exercise of taking the historical returns of the S&P 500 and estimating the covariance matrix between those 500 stocks, or you have some lower-dimensional factor model for doing so — and you ask which portfolios over those stocks have less than a certain dollar value of volatility historically, it's easy to work out the math: this gives you an elliptical constraint in portfolio space, and the eccentricity of that ellipse depends on the strength of the correlations. If all of the stocks were independent, this standard-deviation constraint would just be a sphere in n dimensions; if some stocks are highly correlated, you get eccentricity in those particular dimensions. So this is nice, because it means that if we assume the underlying strategies or portfolios we're trying to track in a no-regret setting are risk-managed, in the sense that they have some standard-deviation limit on their daily P&L, we end up with a convex — in particular elliptical, but more generally convex — constraint on our portfolio space.
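As a rough sketch of that constraint (my own formulation under the setup just described): with a covariance estimate Sigma of daily returns, the positions whose estimated daily P&L standard deviation stays under a dollar limit form the ellipsoid w' Sigma w <= limit^2, and one simple way to respect it is to scale any proposed position back onto it.

```python
import numpy as np

def cap_daily_vol(positions, cov, dollar_vol_limit):
    """Shrink a dollar-position vector so its estimated one-day P&L standard
    deviation, sqrt(w' Sigma w), does not exceed the dollar limit.

    positions:        length-n vector of dollar positions in each stock
    cov:              n x n covariance matrix of daily returns (an estimate)
    dollar_vol_limit: e.g. 10_000_000 for a $10M daily standard-deviation cap
    """
    daily_vol = float(np.sqrt(positions @ cov @ positions))
    if daily_vol <= dollar_vol_limit:
        return positions                                  # already inside the ellipsoid
    return positions * (dollar_vol_limit / daily_vol)     # scale radially back onto it
```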
What I'm showing you in these simulated plots is just two strategies, in a universe of two stocks. The x and y values indicate the position in each stock at different moments in time, and you can ignore the color coding — blue is early in the data and red is later; I'm just trying to show you the movement over time. One strategy is essentially following a momentum rule and is therefore changing its portfolio over time; the other is essentially a pairs-trading strategy, basically offsetting long positions in one stock with short positions in the other, and the stocks are pretty correlated, which is why we have this highly elliptical shape. So these are two underlying strategies on which we've imposed risk constraints: neither the momentum strategy nor the fully hedged pairs-trading-style strategy ever leaves this elliptical region, because if they hit up against the border we just say, don't make that trade — don't make any trade that would cause you to cross the risk constraint. So now we've changed the game: rather than considering arbitrary black boxes, we're considering arbitrary black boxes that we don't let go outside this little playpen, because we're enforcing a risk constraint on them. And now you can recover the type of theorem I gave on an earlier slide. We're only going to compete with strategies that obey these risk limits, and we're also only going to look at strategies that, as these do, make only local moves in portfolio space — you can think of that as modeling a limit on your market impact. To cut to the chase: if you make the additional assumption that your underlying strategies are risk-managed in this particular way, then you can give an algorithm — similar to, but a little more complicated than, the one I gave at the beginning — that actually gives you a no-regret guarantee in the risk-managed criterion, say something like mean-variance. The only thing you need to do is combine no-regret learning with some tricks from an interesting literature called the pursuit-evasion literature, which you can go look up — it has a long mathematical history, and it's basically about games in which one party is trying to chase another, say within a convex region. So you can recover theoretical guarantees, and you can get nice experimental results as well.
Let me finish up by talking about some variations of this framework. One thing it would be nice to have is to maintain the original no-regret guarantee I talked about — small regret to the best single predictor in hindsight — but also guarantee that you will never do much worse than some fixed index. For example, it would be great if I could say you'll get close to the returns of whatever the best stock in the index is, but you'll never do worse than the index itself. The original statement allows you to do cumulatively square-root-of-T worse than the index: if everything is moving sideways and there's no winner taking off relative to the index, you might actually suffer square-root-of-T regret to the index itself, which wouldn't be so great. It turns out that you can again make slight modifications to the type of algorithm I gave you at the beginning of the talk and get this kind of guarantee as well — a promise that you'll track the index, and if something in the index takes off, your profit will track that too.
A very important generalization in many applications, including finance, is to go to what's called a bandit setting. In the framework I've described so far, each day you're maintaining a weighting over the underlying predictors or stocks, but whatever weighting you're maintaining — even if you put all of your weight in a single stock — you still find out what the returns of all of the stocks are. That's fine if your underlying predictors are actually stocks, but in many settings you might only get feedback on the thing you chose to play; this is what's called a multi-armed bandit problem, for those of you familiar with that literature. For example, if instead of n stocks I have n strategies, and each of these strategies has internal state and makes trades each day, and I want to maintain a weighting over these underlying strategies — well, to really find out what a strategy does on a given day, I really need to trade it. I can backtest all of them in parallel, but to really find out what a strategy would have gotten today, I need to trade it live in the market — among other reasons, because of trading costs, because of market impact, and because of the counterfactual that if I make trades, that causes other traders and algorithms to change their behavior, and the only way of simulating that is by actually trading live in the market. So in a setting like that, I can't just maintain a conceptual weighting and find out every day what these strategies would have done; I really need to pick one or more of them and play it in the market to get the feedback. This is called a bandit setting, where you only get the feedback or payoff of whatever choice you made that day. Basically everything I've said in this talk also applies to the bandit setting: there are simple modifications of the algorithm I gave on the first slide where you only get feedback from the thing you chose, and you still get similar mathematical results, with a worse regret bound — I think the dependence on the number of time steps goes from square root of T to something like T to the 2/3, something like that.
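For concreteness, the bandit-style modification looks roughly like the standard Exp3 family of algorithms — this sketch is my own and is not the specific variant being referred to: you only observe the payoff of the arm you actually pulled, and you compensate with an importance-weighted estimate before doing the usual multiplicative update.

```python
import numpy as np

def exp3(get_payoff, n, T, eta=0.05, explore=0.05):
    """Exp3-style sketch for the bandit setting.

    get_payoff(t, i): payoff in [0, 1] of arm i at round t -- only the arm we
                      actually pull is observed.
    """
    rng = np.random.default_rng(0)
    w = np.ones(n)
    total = 0.0
    for t in range(T):
        probs = (1 - explore) * w / w.sum() + explore / n   # mix in a little exploration
        i = rng.choice(n, p=probs)                          # pull one arm
        x = get_payoff(t, i)                                # the only feedback we get
        total += x
        x_hat = x / probs[i]                                # importance-weighted estimate
        w[i] *= np.exp(eta * x_hat)                         # multiplicative update
        w /= w.max()                                        # rescale for numerical stability
    return total
```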
Okay. The other tool you should be aware of — and by the way, one of the reasons I wanted to give this talk is that these tools are simple enough, yet powerful enough, that I think a lot of you could find applications for them in your own strategies — is one of the biggest generalizations of this framework known today, which is general online, or sequential, convex optimization. Now imagine your action space, instead of being a choice between one of n discrete alternatives — which is really the same as maintaining a weighting over them, since you can think of a weighting or distribution as a probabilistic strategy where you pick randomly from that distribution and play that particular one; mathematically it's exactly the same, because it's just an inner product, an expectation. A very nice generalization is the case where your action space is continuous: instead of n distinct choices there's a continuous infinity of choices, but we need the shape of that continuous set to be nice — say, a convex and compact region. If you want a concrete example, imagine you have some parametric strategy class, and each point in this space is one setting of the parameters of your strategy. A typical thing to do would be to take historical data, optimize the parameters of your strategy on that data, and use them going forward — but then again you're assuming that the future is going to look something like the past. So you can choose not to do that and instead think in a no-regret way: there is this continuous infinity of parameter settings for my strategy, and I'd like to promise that, no matter what the data looks like, I will run a strategy whose performance in hindsight looks like the optimal setting of the parameters after the data has come in, ex post. Furthermore, let's imagine that the way the payoffs are determined each day is by some arbitrary convex function over your parameter space. The game, if you like, is that every day nature picks some arbitrary — potentially different — convex function describing the payoffs over your parameter space; you don't know what that function is, but you pick one of the parameter points and get the corresponding payoff that day; the next day nature changes what the function looks like, you again pick a setting of your parameters, and so on. You can basically get the same kind of guarantee I gave in the no-regret setting: no matter what the sequence of payoff functions chosen by nature is, the performance of your algorithm will compare favorably to the best setting of your parameters in hindsight. This is the most general formulation of the no-regret setting.
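A minimal sketch of that online convex optimization setting (my own illustration, assuming a hypothetical convex, compact parameter region and convex per-day losses — standard online projected gradient descent, not an algorithm from the talk):

```python
import numpy as np

def online_gradient_descent(grad_fns, project, theta0, eta=0.1):
    """Online projected gradient descent over a convex parameter region.

    grad_fns: list of length T; grad_fns[t](theta) is the gradient of the day-t
              convex loss (nature's function, revealed only after you commit).
    project:  maps any point back into the convex feasible region.
    theta0:   starting parameter vector.
    Standard guarantee: regret to the best fixed theta in hindsight is O(sqrt(T)).
    """
    theta = np.asarray(theta0, dtype=float)
    played = []
    for t, grad in enumerate(grad_fns):
        played.append(theta.copy())                 # commit to today's parameters
        step = eta / np.sqrt(t + 1)                 # decaying step size
        theta = project(theta - step * grad(theta))
    return played

# Example feasible region: the unit ball ||theta|| <= 1.
project_ball = lambda x: x if np.linalg.norm(x) <= 1.0 else x / np.linalg.norm(x)
```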
So just to conclude: no-regret learning is something I would encourage all of you to look at. It has a very rich history and very powerful theory that touches on many topics, from finance to game theory, and even mathematical programming in the form of convex optimization, linear programming, and the like. On the other hand, deviations from the typical additive objective function — like cumulative returns or cumulative payoffs — mostly break the theory, which is not a great thing, but there are workarounds. The one I mentioned (and there are others) is to endogenize risk limits within your strategies and then run such an algorithm on top of those risk-managed strategies. I've listed some references here at the end — the Quantopian team will make these slides available — and I especially recommend the first two, which are very nice and relatively recent survey papers on the entire topic of no-regret learning. I'm happy to take questions if there are any.
First off, I just want to thank you for giving such an amazing talk — I wish I'd had you as a professor. What I want to ask about is this: when you look at the payoff of the algorithm — the performance of the algorithm — I guess it depends on how you define the payoff you're aiming at. If you have a collection of n strategies, is what you're aiming at the total cumulative return over the entire backtest, as compared to, say, the total cumulative return over the past three months or six months?

I see — so you want a kind of sliding guarantee. There are variants that give you that as well. Rather than saying T just keeps ticking up and I always promise I'm at worst square root of T behind the best thing on the entire history so far, I might want a moving guarantee that says: in any window of length — let's give it a different name — T prime, I will compare favorably to the best thing on that window. You can get such a guarantee as well. But in general that fixed window T prime won't be going to infinity, because it's a fixed window. You should think of it as a learning algorithm: it's getting better in the sense that the guarantee you're giving gets stronger the longer time goes on, but if you choose to compare yourself to a fixed moving window of, say, 250 trading days — the last calendar year — then that value will be fixed, and you'll track to that extent, but you won't have as strong a guarantee as you would if you let T go asymptotically to infinity.

Okay. It just seems like the classic momentum strategies mesh pretty well with this.

Yeah, that's a good point — this is a momentum strategy. The Cover setting is a little bit different, because you're also running it on top of constant-rebalanced portfolios, so there's both momentum and reversion going on: the meta-algorithm that's weighting the underlying portfolios is momentum, but the constant rebalancing is a reversion process — when you rebalance, you take money away from things that did well today and give it to things that didn't. So in the universal portfolio case there's both a momentum and a reversion aspect, but in the straight versions, where you're just comparing to the underlying n stocks, for instance, it is a pure momentum strategy — you're right.

That's pretty cool, thank you.
Yeah, actually I had two questions about this. The first is that I'm a little bit confused about why risk derails the algorithm — I would have thought that the theoretical proof had already incorporated all possible random deviations from the mean, so that, despite their presence, it would still satisfy that convergence.

Yeah, let me try to answer that question even more generally than these methods, in what may be a more or less satisfying way. In general, one of the dark secrets of all of modern machine learning is its reliance on additive objective functions. What I mean by that is that you have an objective function you're optimizing that can be decomposed as a sum of the objective function on the individual data points. For example, in least-squares regression I can write the objective function as a sum, over each data point, of my squared error on that point. Basically everything — almost everything — in modern machine learning relies very heavily on that additivity, both the theory and the practice, and any time you have an objective function that breaks it, various chaos ensues. The simple thing that happens here is that a square root does not have that property: if I look at the variance of a random variable, or the standard deviation of a model's performance over time, it's a square root of a sum, and you can't rewrite a square root of a sum as a sum of square roots. So that's the math answer. The intuitive answer is that things like variance and standard deviation and other common risk metrics actually have state. A less mathematical way of saying the same thing: if there's a list of n numbers and you ask me what the contribution of one particular number is to the average, I know exactly what it is — it's that number divided by n. If you ask me what the contribution of that number is to the variance of the sequence, I can't answer you; I need to see the rest of the numbers. So you can think of standard deviation and variance as having state — there's global information about the sequence in that number in a way that's not present in the straight mean or any other additive loss function.
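To make that non-additivity concrete (my notation, not from the talk):

```latex
% A squared-error objective decomposes across data points:
\mathcal{L} \;=\; \sum_{t=1}^{T} (y_t - \hat y_t)^2 ,
% but a standard deviation is the square root of a sum, and in general
\sqrt{\textstyle\sum_t a_t} \;\neq\; \sum_t \sqrt{a_t} ;
% moreover, each point's contribution to
\sigma \;=\; \sqrt{\tfrac{1}{T}\textstyle\sum_{t=1}^{T} (x_t - \bar{x})^2}
% depends on the whole sequence through \bar{x} -- the "state" referred to above.
```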
Thank you. The second part of my question is the slow convergence: I understand that you have sublinear convergence to the best algorithm, so the question is whether this is too slow for a practical trader.

I guess my answer would be: it depends on how much data you have and on many other things, but my general answer is that if one-over-square-root-of-T convergence to optimality is too slow for you, then so is all of machine learning, because power-law convergence is the rule in machine learning, both theoretically and largely empirically as well. I definitely agree there are data sets for which the T we have is just not big enough for these things to make interesting statements, but in general that's a sign that you just don't have enough data to apply even very simple learning methods, like linear regression, for example.
Thanks for a great talk — very thought-provoking. My question is in regard to the multi-armed bandit: can you talk a little bit about the exploration–exploitation trade-off and how you handle it?

Yeah, it's a good question. No-regret algorithms in the bandit setting essentially handle exploration and exploitation with an optimism-under-uncertainty principle. What I mean by that is that a classical bandit algorithm doing very explicit exploration and exploitation would be something like: at each round, with high probability choose the thing that's done the best so far, and with the remaining probability pick something at random, and then gradually schedule that exploration probability to go to zero. In the no-regret framework you instead have implicit exploration. It's going to sound like I'm making probabilistic assumptions, but I'm not — the intuition in a probabilistic setting would be: things that you've tried a lot have very tight error bars on what their real underlying value is, and things that you haven't tried much have very wide error bars. So instead of using the empirical mean of each action so far, you use an upper confidence bound. Things you haven't tried much get this automatic promotion, so you try them more, and then you find out whether they're really good or not. And this is basically how all of these algorithms look.
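A minimal sketch of that optimism-under-uncertainty idea in the stochastic setting (a standard UCB1-style rule; my own illustration, not one of the adversarial no-regret variants being described):

```python
import numpy as np

def ucb1(get_payoff, n, T):
    """UCB1-style rule: play the arm with the highest optimistic estimate --
    empirical mean plus a bonus that shrinks the more the arm has been tried."""
    counts = np.zeros(n)
    means = np.zeros(n)
    total = 0.0
    for t in range(T):
        if t < n:
            i = t                                      # try every arm once first
        else:
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            i = int(np.argmax(means + bonus))          # optimism under uncertainty
        x = get_payoff(t, i)                           # payoff in [0, 1] of the chosen arm
        total += x
        counts[i] += 1
        means[i] += (x - means[i]) / counts[i]         # running-mean update
    return total
```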
Awesome, thanks so much. So you showed that once you manage the risk of your strategies, you can get good regret guarantees. What is the intuitive reason that you didn't get them when you didn't manage the risk?

What's the intuitive reason why what?

Theoretically you should have gotten it without managing the risk, as your theory showed initially — so what is the intuitive reason that managing the risk is needed?

No, what I'm saying is that the original theorem breaks if you don't have each of the constituent strategies managing its own risk in some sense.

Only in the experiments, right?

No, even in theory: in theory I can just give you bad sequences, in which the underlying constituent strategies are not risk-managed, where not only will the particular algorithm I showed do poorly, you can actually prove outright that there doesn't exist an algorithm that recovers the theoretical guarantees I gave on the earlier slide.
Okay, last question.

Thank you, Dr. Kearns. I have a question about updating the weights to optimize the Sharpe ratio: could we consider using some momentum gradient-descent method, where instead of doing it stochastically we incorporate each day's information with respect to the gradient of the Sharpe ratio, so that we can use a momentum method that matches the idea you use here?

I'm afraid my answer is going to be the unsatisfying one, which is: nice idea, it just turns out it doesn't work — as per the impossibility theorems I mentioned. By the way, you can interpret these algorithms as momentum algorithms engaging in gradient descent. In particular, in this last generalization I mentioned — where you have a convex parameter space, each day nature picks some payoff function over that space, and you're trying to track the best setting of your parameters — if you go look at the actual algorithm that does that, even though there's no single gradient to speak of, because every day the objective function changes according to nature's choice, it's essentially trying to compute some sort of aggregate gradient of the functions chosen by nature so far — the average of those functions. So these algorithms in general are momentum algorithms, and they are following a gradient; but trying to do it with the Sharpe ratio as the objective just breaks all of the theory, and it's because of the non-additivity I mentioned before.