0:04 good morning so I'm especially honored
0:08 and excited to introduce Dr. Michael
0:12 Kearns for our morning keynote because
0:16 in addition to all of the extraordinary
0:18 success that he's had which I'll tell
0:20 you about in a moment a little-known
0:23 fact is that a little over three years
0:26 ago Michael introduced me to a small
0:30 startup named Quantopian, actually at
0:33 this conference
0:36 so Michael Kearns is a professor at the
0:38 University of Pennsylvania in their
0:40 Computer and Information Science
0:43 Department he's also a founding director
0:46 of Penn's Warren Center for Network and
0:49 Data Sciences his research interests
0:51 include topics in machine learning
0:54 artificial intelligence algorithmic game
0:57 theory computational social science and
1:00 quantitative trading he's consulted and
1:03 worked widely within the technology and
1:05 finance industries and is chief
1:07 scientist at MANA Partners a
1:09 quantitative hedge fund and trading
1:11 technology firm here in New York City
1:16 and Michael is also a scientific adviser
1:22 to Quantopian. OK, good morning, thanks to
1:24 all of you for showing up so early on a
1:26 Saturday anyway thanks for coming and
1:28 thanks very much to the Quantopian team
1:31 for having me so I noticed in the
1:34 program there are a lot of talks that
1:36 are on the discussion of overfitting and
1:39 train test methodology etc which is all
1:41 good stuff but one way of avoiding
1:43 overfitting is to never assume in the
1:45 first place that the future will have
1:46 anything at all to do with the past and
1:48 that's what I want to talk about today
1:51 as you can see many talented people have
1:53 given deep thought to the concept of
1:55 regret in the past but I'm here to talk
1:57 about a particular technical definition
1:59 of regret and to tell you about the
2:00 subfield of machine learning which is
2:03 called no regret learning which is a
2:05 thriving subfield of machine learning
2:07 and is finding wide applications in many
2:09 domains and today I just want to tell
2:11 you about what the basic framework is
2:12 the kind of result that this field
2:13 produces theoretically
2:16 and then talk mainly about potential
2:19 applications of it to trading domains
2:22 and you know if you're fundamentally
2:24 suspicious of a title like trading
2:25 without regret your instincts are
2:27 correct and you'll notice that I've
2:29 added an asterisk to my title which we
2:34 will come to eventually okay so what is
2:35 no regret learning I just want to, on one
2:37 slide, kind of describe the technical
2:39 framework for you so no regret learning
2:41 in the basic no regret learning
2:43 framework I want you to imagine you have
2:45 some set of n signals or predictors and
2:46 I want you to think about these signals
2:48 or predictors in an extremely general
2:51 way so to give examples of what these
2:52 signals or predictors might be they
2:54 might be, you know, the time
2:57 series of returns of individual equities
2:59 or other instruments or funds of these
3:01 things or portfolios of such instruments
3:04 or they might be full-fledged strategies
3:06 like blackbox algorithms that are taking
3:08 any data sources you want as input and
3:10 producing if you like predictions about
3:13 the future so you might think of them as
3:15 alphas for instance in a trading context
3:17 or they might be you know other sources
3:19 of data or information like advisers
3:20 like stock analysts who are giving
3:23 upgrade and downgrade recommendations or
3:25 sentiment indicators from news or social
3:27 media anything you want you can simply
3:30 think of these things as just boxes that
3:31 have unknown inputs and are
3:33 generating a sequence of numerical
3:37 values and each trading period we're
3:38 going to imagine that each one of these
3:40 signals received some arbitrary payoff
3:42 and when I say arbitrary I mean I'm
3:44 gonna make absolutely no stochastic
3:46 assumptions or probabilistic modeling
3:49 whatsoever in particular you can think
3:52 of the returns or payoffs or losses of
3:54 these black boxes as being generated by
3:57 some omniscient or even malicious
4:00 nature or even an adversary whose express
4:03 purpose is to sort of foil your use of
4:04 these signals and so you should already
4:06 be wondering what in the world could you
4:08 possibly hope to say in such a setting
4:10 and I'll come to that in a second and in
4:12 particular these signals or black boxes
4:14 might have all kinds of side information
4:16 or even inside information they might
4:18 have expertise and specialization that
4:20 lets them do well under certain
4:22 conditions and poorly under
4:24 other ones and so the only thing I'm
4:26 going to assume about
4:28 the payoffs that these n predictors
4:30 get at each time step is that they're
4:32 bounded okay that they're you know
4:34 not arbitrarily
4:36 large either positive or negative and
4:38 note that this assumption is already
4:40 present in most stochastic models that
4:42 are used in finance such as you know the
4:45 transition from discrete Brownian motion
4:47 to continuous Brownian motion that
4:48 underlies things like options pricing
4:53 okay so in this setting your goal is to
4:55 design an algorithm that's gonna
4:57 maintain some portfolio or dynamic
4:59 weighting or if you like a probability
5:02 distribution over these n signals or
5:04 predictors and whatever a probability
5:06 distribution or weighting you're using
5:08 at each time step you're basically going
5:10 to get the weighted average according to
5:13 that distribution of the underlying
5:16 payoffs for the signals okay and so
5:17 what's the goal you know so I've set up
5:19 this extremely general setting where you
5:21 have these n arbitrary black boxes
5:23 your algorithm is maintaining some
5:25 weighting or distribution over them but
5:27 there's no assumptions at all on the
5:28 sequence of payoffs and so in particular
5:30 there's no assumption here that future
5:32 payoffs have anything to do with past
5:36 payoffs for these n signals well so
5:38 in the absence of there being some
5:40 relationship between past data and
5:42 future data the only thing you might
5:43 hope to say is something about the past
5:46 about your performance in hindsight on
5:49 the data so far so no regret learning
5:51 the goal is that no matter what the
5:52 sequence of payoffs is for these
5:54 underlying n signals your algorithm
5:56 should have what's called no regret
5:59 which means that if we look at the
6:02 payoff of the best one of these signals
6:06 in hindsight that your algorithm's payoff
6:08 is not too much worse than that
6:10 okay so you know that's what I'm sort of
6:12 saying informally here no regret is
6:14 going to mean that if we look at the
6:16 best payoff meaning the best payoff
6:18 amongst these n signals in hindsight
6:20 and we look at the actual payoff that
6:23 your algorithm got in hindsight that the
6:24 difference between those things is
6:27 sublinear right so if T is the number of
6:29 time periods so far we'd like for
6:32 instance results like this: the difference of
6:33 the best payoff minus your algorithm's
6:35 payoff growing like square
6:37 root of T because if it grows like
6:39 square root of T then if we look at your per-step
6:42 regret, that's square root of T
6:44 divided by T which is 1 over square root
6:46 of T and that's going to 0 as the number
6:47 of periods increases and this is the
6:50 source of the term no regret so no
6:53 regret refers to having per step regret
6:56 that diminishes with time which is
6:57 equivalent to having sublinear
7:01 cumulative regret okay now this
7:04 is the goal and so you know it's a form
7:05 of learning but it's a form of learning
7:07 in which you're not generalizing in any
7:08 sense at all
7:10 you're just claiming that at all points
7:13 in time you're sort of tracking the best
7:16 one of these n strategies or predictors
7:18 in hindsight ok so it's an ex
7:20 post statement rather than an ex ante
7:23 statement now let me point out something
7:25 that we're not proposing here which
7:27 would be deeply suspicious we're not
7:30 competing with the optimal unconstrained
7:32 policy that can switch willy-nilly
7:34 between the different strategies or
7:36 predictors okay so we're not sort of
7:38 saying like in hindsight we should have
7:40 been doing this, you know,
7:42 following predictor 1 on the first day
7:44 predictor 7 on the second day predictor
7:46 19 on the third day with arbitrary
7:48 switching we're just comparing ourselves
7:50 to predictor 1 in isolation, predictor 2 in
7:52 isolation, predictor 3 in isolation, and
7:54 whichever of those is best we want to
7:57 track that ok so this is the basic
7:59 definition even though there are a lot
8:02 of deep mathematical connections between
8:03 boosting which many of you may be
8:06 familiar with and no regret learning the
8:07 fundamental assumptions of those two
8:09 frameworks and the goals of them are
8:10 quite different
8:12 so in boosting you do make probabilistic
8:15 assumptions you make stochastic or
8:16 distributional assumptions on how the
8:18 data is generated you essentially have
8:21 an iid setting and there your goal is to
8:23 actually beat the best of your weak
8:25 learner so to speak here we're not
8:27 trying to beat the best right we're just
8:29 trying to track the best and so you can
8:30 think of the criteria here as being the
8:33 difference between finding a needle in a
8:33 haystack,
8:36 i.e. the, you know, good Quantopian
8:39 strategy out of the sea of Quantopian
8:41 strategies versus the other challenge of
8:43 trying to build something that's better
8:44 than any one of the individual
8:49 strategies in the first place ok now no
8:50 regret learning this framework has
8:52 obvious appeal in financial settings
8:53 where
8:55 in particular, you know, non-stationarity
8:57 is the rule and in fact even worse than
8:59 that, a lot of that
9:01 non-stationarity might be generated by
9:04 game theoretic or strategic or even
9:05 adversarial interactions between
9:08 multiple parties and so even typical time
9:10 series models aren't appropriate
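The regret accounting described above is easy to make concrete. Here is a minimal Python sketch (my illustration, not from the talk) on made-up bounded payoffs: the "algorithm" is just the fixed uniform average of the n signals, and we measure its cumulative and per-step regret against the best single signal in hindsight:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 10_000, 5                      # time periods, number of signals
payoffs = rng.uniform(-1, 1, (T, N))  # arbitrary bounded payoffs, one per signal per period

# Suppose the algorithm simply plays the uniform average of the N signals.
algo_payoff = payoffs.mean(axis=1).sum()

# Best single signal in hindsight (an ex post quantity, not ex ante).
best_payoff = payoffs.sum(axis=0).max()

cumulative_regret = best_payoff - algo_payoff
per_step_regret = cumulative_regret / T  # "no regret" means this goes to 0 as T grows
print(cumulative_regret, per_step_regret)
```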
9:15 okay okay so this is probably the only
9:17 semi technical slide and I'm only gonna
9:19 talk about the highlighted parts for the
9:21 most part so in the top box here I've
9:23 reproduced from a survey paper on no
9:24 regret learning that I'll give a
9:27 reference to at the end what a typical
9:30 algorithm that achieves this no regret
9:32 criterion looks like and the first thing
9:33 I want you to notice is how simple and
9:35 how short it is right if you coded this
9:38 up in MATLAB in full detail it would be
9:40 maybe five or ten lines inside of a loop
9:43 and these algorithms are sort of very
9:45 appealingly simple and almost every no
9:47 regret algorithm looks more or less like
9:50 this you would essentially have a weight
9:52 associated with each one of the n
9:54 signals and you're going to adjust those
9:57 weights in the semi-obvious way
10:00 according to historical performance and
10:01 in particular the way you're gonna
10:04 update a weight at the next time step is
10:05 you're going to say the new value for
10:07 the weight is the old value of the weight
10:10 times some multiplicative factor this
10:12 happens to be phrased as losses rather
10:14 than payoff so negative values for this
10:18 signal M are good and positive is bad
10:20 but in plain English what this update is
10:23 basically doing is saying look whatever
10:26 your payoff was I'm going to increase
10:28 your weight by a multiplicative factor
10:30 of 1 plus something times your payoff
10:32 and if you lost money I'm
10:34 gonna multiply your weight by
10:36 something less than 1 which is going to
10:38 be 1 minus something times your payoff
10:40 and there's something here is this
10:42 single free parameter of this algorithm
10:46 which is a learning rate okay so I have
10:48 one knob and if I kind of crank that
10:50 knob up very high then I'm making very
10:53 very radical changes in response to
10:55 positive or negative payoff so I'm gonna
10:57 greatly up weight things that get
10:58 slightly positive payoff and greatly
11:00 down weight things that get slightly
11:02 negative payoff if I set that learning
11:05 rate very very low then I'm making
11:06 much more modest changes
11:09 the important algebraic part
11:10 of this update is that it's
11:12 multiplicative it's not additive so
11:14 those of you who are familiar with
11:15 things like the perceptron learning
11:17 algorithm for linear threshold functions
11:19 which has nice mathematical properties
11:21 that's an additive update you sort of
11:23 say like well I'm going to add something
11:24 to the weight if it got a positive
11:26 payoff and I'm going to subtract
11:27 something from the weight here we're
11:29 actually multiplying things and so in
11:32 particular if one of these signals goes
11:35 on a run of positive payoffs it will
11:37 exponentially increase the weight of
11:40 that signal over that period of time
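The multiplicative-weights update just described can be sketched in a few lines of Python, consistent with the five-or-ten-lines-of-MATLAB remark; this version is my paraphrase, phrased in payoffs rather than losses, with `eta` as the single learning-rate knob:

```python
import numpy as np

def multiplicative_weights(payoffs, eta=0.1):
    """Run the multiplicative-weights update over a (T, N) array of
    bounded per-period payoffs, one column per signal. Returns the
    algorithm's total payoff."""
    T, N = payoffs.shape
    w = np.ones(N)                    # start with equal weights
    total = 0.0
    for t in range(T):
        p = w / w.sum()               # normalized distribution over the signals
        total += p @ payoffs[t]       # play the weighted average of the signals
        w *= 1.0 + eta * payoffs[t]   # multiplicative (not additive) update
    return total

rng = np.random.default_rng(1)
pay = rng.uniform(-1, 1, (5000, 4))   # made-up bounded payoffs
print(multiplicative_weights(pay))
```

Because the update is multiplicative, a run of positive payoffs compounds: a consistently good signal's weight grows exponentially over that stretch, exactly as described.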
11:42 okay and then the way you're actually
11:44 making your decisions at each round is
11:47 just by playing the weighted average of
11:49 these of the underlying signals
11:51 according to the normalized distribution
11:54 given by these weights okay so that's
11:56 the algorithm it's incredibly simple and
11:59 here's a very very nice theorem you can
12:02 prove about this algorithm and I'll just
12:03 go through sort of the final part
12:05 this theorem is saying look if you
12:08 set the learning rate of the algorithm
12:11 in a particular way then what this pink
12:14 box is saying is that the payoff of this
12:17 algorithm on an arbitrary sequence of
12:18 payoff vectors for the underlying
12:21 signals the payoff of this algorithm at
12:24 the top will be the payoff of the best
12:26 one of the underlying signals in
12:29 hindsight minus something that looks like
12:32 square root of T log of the number of
12:34 underlying signals okay so let me say
12:36 that again because I want
12:38 you to keep just this one
12:40 expression in mind the payoff of this
12:43 algorithm will be no matter what the
12:45 sequence of underlying payoffs for the
12:46 signals is the payoff of the algorithm
12:50 will be at worst the best payoff in
12:52 hindsight of any of the signals minus
12:55 the square root of T, the number
12:57 of time steps, times log of the number of
12:59 underlying signals that you're tracking
13:03 okay so there is a penalty here for
13:05 how many signals you're tracking
13:07 as there should be and I'll say
13:08 something a little bit more precise
13:09 about that in a second there is a
13:11 penalty for that but it's modest it's
13:15 logarithmic okay okay so let me just
13:17 make some miscellaneous remarks about
13:19 this framework and type of result so
13:21 first of all no regret algorithms
13:23 and the analysis of them have actually a
13:25 very long and rich history it goes back
13:27 at least to kind of the game theory
13:30 micro econ literature in the 50s in the
13:31 form of something called Blackwell
13:33 approachability it has a very large and
13:35 very currently active as I mentioned
13:37 machine learning literature there are
13:39 lots of connections to finance some of
13:41 which I'll talk about but a couple of
13:43 others that I won't there are modern
13:45 connections to Black-Scholes so in
13:47 particular it is possible to derive the
13:51 Black-Scholes pricing formula without
13:53 any probabilistic assumptions whatsoever
13:56 by essentially viewing the pricing
14:00 process as a game between a price setter
14:03 and an arbitrageur both of whom are
14:05 playing a no regret algorithm and this
14:06 was done in the last few years or so
14:08 there are also very close connections
14:11 between no regret learning and the right
14:13 way of pricing in prediction markets in
14:16 order to elicit truthful information
14:19 from an underlying population there are
14:21 very strong and long-standing
14:23 connections to game theory you can
14:26 derive the minimax theorem and you can
14:28 also derive linear programming again as
14:30 the limiting behavior between two
14:33 parties playing no regret algorithms in
14:36 a zero-sum game and more generally if
14:38 you take two no regret algorithms and have
14:40 them play repeatedly against each other
14:42 so they're each generating the payoffs
14:44 for the other one in an arbitrary game
14:47 zero-sum or otherwise you get
14:49 convergence to Nash or a more general
14:51 notion of equilibrium called correlated
14:53 equilibrium now this all sounds great
14:56 and very powerful let me make a couple
14:58 of remarks though that will demystify
15:01 this kind of result in this framework so
15:04 first of all you know a lot of what's
15:06 difficult about proving the type of
15:08 theorem that I gave on the previous
15:11 slide is to
15:12 really deal with arbitrary sequences of
15:15 payoffs for the underlying signals so in
15:18 particular you know here's an algorithm
15:20 that won't meet the no regret criterion
15:22 but you can still say something about, so
15:24 it's just the greedy algorithm which is
15:25 known in the no regret literature
15:26 as follow-the-leader
15:29 which is all you do is you basically say
15:31 well I'm going to look at the payoffs so
15:32 far and whichever one of these
15:34 underlying signals has the highest
15:36 cumulative payoff so far I'm gonna
15:39 put all of my money or all of my weight
15:41 on that particular signal tomorrow okay
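Follow-the-leader as just described is almost a one-liner per step; here is a sketch (my illustration) on an arbitrary (T, N) payoff array:

```python
import numpy as np

def follow_the_leader(payoffs):
    """Each period, put all weight on the signal with the highest
    cumulative payoff so far (defaulting to signal 0 on day one)."""
    T, N = payoffs.shape
    cum = np.zeros(N)
    total = 0.0
    for t in range(T):
        leader = int(np.argmax(cum)) if t > 0 else 0  # greedy: back yesterday's leader
        total += payoffs[t, leader]
        cum += payoffs[t]                             # update everybody's record
    return total
```

If the lead rarely changes this does fine, but an adversary that flips the lead every period makes it lose every single round while the best single signal in hindsight breaks even, which is exactly why it fails the no-regret criterion.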
15:43 and then, you know, tomorrow
15:45 happens and you update everybody's data
15:47 and you repeat okay so this is sort of
15:49 the most naive algorithm you can imagine
15:51 it does not meet anything like the
15:53 result I gave on the previous slide but
15:55 on the other hand what you can say is
15:57 that the regret of this algorithm will
15:59 essentially scale like the number of
16:02 times that the lead changes in the
16:03 underlying signals for the n
16:05 predictors right because the only time
16:07 you sort of lose is when, you know, you
16:11 went with expert five or
16:13 predictor five and, you know, they weren't
16:15 the best and then they actually lost the
16:18 lead so, you know, if you
16:20 happen to have a collection of
16:22 underlying signals in which the one
16:24 that's the best doesn't change that
16:26 often then this algorithm will work
16:28 perfectly well but in general it'll do
16:31 poorly okay maybe a slightly more
16:33 related but more mathematical thing to
16:36 say is that even though this log n
16:40 term in the regret is modest it means
16:41 that you can't just try everything
16:43 okay so here's something that you know
16:45 doesn't work and shouldn't work which is
16:47 imagine that what's going on is that
16:49 nature is flipping a sequence of fair
16:52 coins and these predictors that you have
16:53 are trying to predict what the outcome
16:56 of the coin is going to be each day okay
16:58 so you're gonna try to look at the
17:01 predictors and use their guesses as
17:02 to what the next value of the coin is
17:04 going to be to make your prediction okay
17:07 so I think we can all agree that the
17:09 result I gave on the last slide better
17:11 not say that you can predict the outcome
17:13 of a fair coin better than random
17:15 guessing and in fact it doesn't right
17:17 because in order for there
17:19 to be an expert or a predictor in your
17:21 pool that's good you'd essentially need
17:24 to add as predictors all
17:27 two to the T possible sequences for this
17:29 coin over this sequence of coin flips
17:30 right and you'll notice that if you
17:33 choose N equals two
17:36 to the T then log of two to the T is T
17:38 and then when you take square root of T
17:40 times log N it's square root of T squared
17:43 which gives you
17:44 exactly
17:47 T okay and this is
17:47 as
17:49 it should be right what it's saying is
17:51 that you can use this method to kind of
17:53 come within predicting the outcome of
17:56 fair coin making sort of like 1 over
17:59 square root of T you know which is
18:00 exactly the standard
18:03 deviation okay so it's not telling you
18:05 that you know impossible things are
18:07 possible which is good on the other hand
18:09 it's often the case in the no regret
18:11 literature that you find that when you
18:13 do make probabilistic assumptions of
18:15 some kind on how the data is generated
18:18 and you run a no regret algorithm which
18:21 is not using such assumptions that no
18:23 regret learning algorithms often in fact
18:25 give you optimal or near optimal
18:27 solutions for your particular stochastic
18:30 assumptions so this is a very nice thing
18:31 because it means that you can have these
18:34 worst-case adversarial guarantees no
18:36 matter what the data is but if the world
18:39 is nice to you you might actually enjoy
18:42 the optimal result for that nice set of
18:46 assumptions ok so now let's turn to
18:48 trading, that's background about
18:52 no regret learning so the use of no
18:54 regret learning in trading or finance
18:56 settings is not new at all and what I've
19:01 reproduced here is the cover page of a
19:02 very famous paper on this topic called
19:05 Universal Portfolios by Tom Cover who
19:07 died a few years ago but was for many
19:09 years a prominent statistician at
19:13 Stanford and his
19:15 paper actually predates
19:19 kind of modern flurry of research in the
19:20 machine learning community on no regret
19:23 learning and he was considering a
19:26 setting in which these n predictors were
19:29 just simply the returns of stocks ok so
19:32 you have historical returns of
19:32 stocks
19:34 he's not going to make any stochastic
19:36 assumptions like Brownian motion with
19:39 drift or any other time series model
19:41 it's just each day there's a vector of
19:43 returns for some underlying universe of
19:45 n stocks and he wants to maintain a
19:48 weighted portfolio over those stocks and
19:50 say something about it and he actually
19:52 says something stronger even than what I
19:55 said about comparing favorably to the
19:57 best single stock in hindsight he
20:00 compares his algorithm's
20:00 return
20:04 to the best constant rebalanced portfolio
20:07 in hindsight okay so what I mean by
20:08 that just so we're all on the same page
20:11 is that the sort of class that he wants
20:14 to compare himself against is all distributions
20:17 over these n underlying instruments in
20:19 which you do constant rebalancing so you
20:21 just have some fixed weighting and then
20:23 every day, you know, some things make money
20:26 and some things lose money and
20:27 whatever money you
20:29 have left you're gonna redistribute it
20:31 according to this constant vector of
20:33 weights okay and as some of you might
20:36 know rebalancing is an extremely
20:39 powerful thing in particular it's easy
20:40 to kind of you know come up with very
20:43 simple sequences of returns for even two
20:45 stocks in which neither of the stocks is
20:47 really doing anything at all,
20:50 they're essentially flat, but
20:51 because of the rebalancing you're
20:53 actually getting exponential growth in
20:53 wealth
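A standard toy example of that phenomenon (my numbers, not the speaker's figures): one stock sits flat, the other doubles and halves on alternate days, so each stock alone goes nowhere, yet a 50/50 constant rebalanced portfolio compounds by a factor of 9/8 every two days:

```python
import numpy as np

# Gross per-period returns: stock A is flat; stock B alternates x2, x1/2.
T = 20
returns = np.array([[1.0, 2.0] if t % 2 == 0 else [1.0, 0.5] for t in range(T)])

wealth = 1.0
for r in returns:
    wealth *= 0.5 * r[0] + 0.5 * r[1]  # rebalance to 50/50 every period

# Each stock alone ends flat (B just does 2 then 1/2, ten times over)...
print(returns.prod(axis=0))  # both end at 1.0
# ...but the rebalanced portfolio grows like (3/2 * 3/4)^(T/2) = (9/8)^10
print(wealth)
```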
20:57 okay so Cover proposed a particular
20:58 algorithm that doesn't look too
21:01 dissimilar from the ones I described a
21:04 few slides ago and compares it to the
21:06 best constant rebalanced portfolio after
21:08 the stock outcomes so again in hindsight
21:11 for an arbitrary sequence of returns,
21:14 but he compares to a stronger
21:17 class of comparators but he has a weaker
21:20 criterion for what he means by regret in
21:22 particular he looks at log wealth so he
21:26 wants the log of his cumulative
21:28 wealth to compare favorably to the log
21:30 of the cumulative wealth of the best
21:31 constant rebalance portfolio in
21:34 hindsight and this is all very very
21:37 sensible in the regime where you imagine
21:39 that the best constant rebalanced
21:41 portfolio is experiencing exponential
21:43 growth right because if you take the log
21:44 and it's not exponential growth this is
21:47 not an interesting statement okay
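For intuition, with two assets the best constant rebalanced portfolio in hindsight, and its log wealth, can be found by brute force over the single mixing weight; a sketch (my illustration) on made-up gross returns:

```python
import numpy as np

rng = np.random.default_rng(2)
gross = 1.0 + rng.uniform(-0.05, 0.05, (2500, 2))  # made-up gross returns, 2 assets

def crp_log_wealth(b, gross):
    """Log wealth of the constant rebalanced portfolio that puts weight b
    on asset 0 (and 1 - b on asset 1), rebalanced every period."""
    return np.log(b * gross[:, 0] + (1 - b) * gross[:, 1]).sum()

# Best CRP in hindsight: grid search over the one mixing weight.
grid = np.linspace(0.0, 1.0, 1001)
best_b = max(grid, key=lambda b: crp_log_wealth(b, gross))
print(best_b, crp_log_wealth(best_b, gross))
```

By construction the best mixture does at least as well in log wealth as holding either asset alone, which are the two endpoints of the grid.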
21:50 okay and so he actually did some
21:53 simulations so here is a figure
21:55 reproduced from Cover's paper in which,
21:59 so n is two
22:01 here, there are two underlying stocks and
22:03 here they are down at the bottom and
22:05 here he's showing the kind of cumulative
22:07 return of those stocks and then the
22:09 cumulative return of his so-called
22:11 Universal portfolio algorithm and you
22:13 can see this looks quite good these
22:14 stocks aren't doing much,
22:17 his is generally doing quite well and
22:19 not only does it have kind of
22:21 no regret to either of these stocks it's
22:23 doing much much better he doesn't
22:25 unfortunately show what the best
22:26 constant rebalanced portfolio would be
22:28 but there is a theoretical guarantee
22:32 about that and this is from a much later
22:35 paper that proposed a similar algorithm
22:36 for a similar setting in which again
22:38 they've taken some underlying universe I
22:41 think in this case of about 30 tickers
22:43 and they're not showing you the
22:44 underlying tickers but they are showing
22:46 you the performance of their algorithm
22:49 which is here this is the performance of
22:51 the best constant rebalanced portfolio so
22:53 they're kind of comparing favorably to
22:54 that they're not doing as well but
22:57 they're doing almost as well and this is
22:59 the return of the best single stock in
23:02 the underlying universe of 30 ok so this
23:05 all looks well and good and you know so
23:08 maybe 10 years ago or so you know when I
23:10 started or maybe more like 15 years ago
23:12 or so when I started thinking seriously
23:15 about trading and finance I, you
23:16 know I was aware of this literature and
23:17 I was aware of these papers and I
23:19 thought well this is an obvious thing to
23:21 try and I learned a lot in doing that
23:24 which is where we come to the asterisk
23:27 on the title here and so here,
23:30 these are plots from some experiments I
23:33 did and in this particular case I've
23:35 taken a much much larger universe than
23:37 any of the previous papers consider in
23:40 particular I took the S&P 500 okay so
23:43 now instead of n being 2 or 30, n is
23:46 500 okay and what I'm showing you here
23:50 is the S&P 500 between 2005 and 2011
23:52 which is a very interesting period for
23:55 the S&P 500 because it includes the
23:57 financial crisis and most of the
23:59 recovery from the financial crisis but
24:02 the index ends up in
24:06 2011 about where it started in 2005 so
24:09 there's been sort of huge directional
24:11 movement but overall there's not been
24:12 much change in the index from the start
24:14 point to the end point but it's a period of
24:16 extremely high volatility and so what
24:18 I'm showing you on this left-hand plot
24:20 is there's a blue curve for each of the
24:24 underlying stocks in the S&P 500 this is
24:26 the key each one of these blue lines is
24:27 the cumulative return over the
24:29 period for one of those 500
24:32 stocks and these lines that are buried
24:33 down here in the middle, again with
24:36 a little red arrow are the performance
24:38 of the kind of multiplicative weights
24:40 algorithm that I showed you before that
24:43 has this no regret guarantee and now it
24:45 doesn't look so great and there's
24:47 many of these, I'll get to why
24:48 there's many different colored lines
24:50 here in a second
24:53 but you can see that you know this does
24:56 not look especially better than just
24:58 trading the index itself, ok,
25:00 you're not doing much better here
25:02 with this particular algorithm than you
25:04 would have done by just sort of equally
25:05 weighting the 500 stocks at the
25:08 beginning of 2005 and
25:10 letting it sit or rebalancing for
25:12 instance over that time period and what
25:14 I've done over here on the right plot is
25:17 taken away the other 500 curves here so
25:18 we can just see the performance of the
25:20 algorithm a little bit better and the
25:21 different curves here correspond to
25:23 different values of the learning rate so
25:25 remember I told you there's this one
25:27 parameter of this method which is how
25:29 aggressively you make these updates and
25:31 if you set it to zero then you're not
25:33 doing any adaptation at all you're
25:36 simply parking your weights equally and
25:38 leaving them and if you crank it up
25:40 really high then you're making very very
25:43 dramatic changes in response to positive
25:45 or negative returns and so the green
25:47 line here is where the learning rate is
25:48 the highest and the blue one is where
25:51 it's the lowest and you can see like to
25:53 me none of these look appealing right
25:55 the blue one is kind of
25:57 trading the index the green one you know
25:59 looks a little bit better at the end but
26:01 then you had of course a much more
26:03 significant drawdown during the sort of
26:05 period of the financial crisis and
26:07 the real problem here isn't that the
26:09 returns don't look great, that is a
26:11 problem, but the real problem
26:12 is how you're getting the
26:14 return, which is with extremely
26:16 concentrated portfolios so in particular
26:18 if you crank the learning rate up to this
26:20 particular green curve that I get here
26:22 this is what the underlying weights look
26:25 like so there are actually 500 curves in
26:28 this plot here you just can't see almost
26:30 all of them because they're nailed down
26:33 here at zero and so you can see that all
26:36 this you know green strategy is doing is
26:39 wildly switching its capital and sort of
26:41 chasing whichever thing
26:43 is outperforming for a particular sort of
26:45 time period you can see that roughly
26:48 when this particular stock takes off is
26:50 when the algorithm sort of midway
26:52 through the time period more or less
26:54 puts almost all of its weight in that
26:55 stock for the remainder of the period
26:58 right and so even if you sort of
27:01 found these returns acceptable from a
27:03 quant perspective you could never trade
27:04 this right you know like a quant hedge
27:05 fund,
27:07 Quantopian in particular I believe
27:08 would never let you trade such a
27:10 strategy because it's too concentrated
27:12 right you're basically like doing stock
27:13 picking you're not actually doing
27:15 something quantitative and with low
27:17 concentration and high diversity by the
27:19 way if you're interested in what this
27:22 stock is it is of course Apple and so
27:24 even though, you know, the
27:26 algorithm was doing something sensible
27:27 and I would have been very happy for you
27:30 to have told me in 2005 to put all of my
27:31 wealth in Apple and just leave it there
27:34 still from a quant perspective this is
27:36 not the sort of you know kind of ex-post
27:38 behavior that you want to see okay and
27:41 so this leads to the natural question of
27:43 like well can we can we you know fix
27:46 this can we adapt these algorithms and
27:48 this framework which i think is very
27:49 appealing where you're not making any
27:51 stochastic assumptions and you're still
27:53 getting these very strong ex-post
27:56 guarantees to an arbitrary sequence of
27:59 data can we somehow preserve those
28:01 properties but deal with the you know
28:03 the sort of additional constraints that
28:05 come into a quant trading environment
28:07 which is you want things like a
28:09 diversified portfolio and other forms of
28:10 risk management you know diversification
28:13 being one form of risk management okay
28:16 and so you know after doing these
28:18 experiments I spent you know some time
28:21 over the coming years writing a series
28:22 of papers in which I tried to address
28:24 these things and let me tell you
28:28 about some things that don't work so one
28:29 thing you could ask for is you could
28:31 just say like well look you know of
28:34 course if your metric is to you know
28:37 let's say in a finance setting to have
28:39 the return of your algorithm
28:41 compared favorably in hindsight to the
28:44 return of the best individual stock or
28:45 the return of the best constant
28:47 rebalanced portfolio you know there's
28:49 nothing in that statement that mentions
28:51 anything about risk okay or
28:54 concentration or Sharpe ratio or the
28:55 like and so
28:56 one thing we could say is like well
28:58 maybe we can just change the criterion
29:01 and go back and kind of redo the math
29:03 and work out an algorithm that will
29:05 make a similar statement for the altered
29:08 criterion so let me be precise you
29:11 know every individual stock not
29:13 only has a cumulative return it has a
29:15 Sharpe ratio right we can look at
29:18 the S&P 500 and after any time period we
29:20 could say what is the Sharpe ratio of
29:22 each one of these stocks over the time
29:24 period in question and I could therefore
29:27 ask can we design an algorithm that in
29:30 hindsight has a Sharpe ratio which
29:34 compares favorably has low regret to
29:37 the best individual stock's Sharpe ratio
29:40 in hindsight perfectly reasonable
29:41 question we could also ask it about
29:43 whatever your other favorite kind of
29:46 risk versus reward trade-off is like
29:47 mean-variance optimization I could
29:49 sort of say like well I'm gonna for each
29:52 stock look at its return minus its standard
29:54 deviation historically and I want
29:56 your algorithm to compare to whichever
29:58 stock has the highest mean minus standard
30:00 deviation okay so this seemed like a
30:01 good idea
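As a sketch of what this altered criterion's hindsight benchmark would be, assuming a returns matrix with one column per stock (illustrative code, not from the talk):

```python
import numpy as np

def best_hindsight_mean_minus_std(returns):
    """returns: T x n array of per-period returns for n stocks.
    The altered regret benchmark discussed above: which single stock
    maximized mean return minus standard deviation in hindsight,
    and what that value is."""
    r = np.asarray(returns, dtype=float)
    scores = r.mean(axis=0) - r.std(axis=0)
    best = int(np.argmax(scores))
    return best, float(scores[best])

# toy example: stock 0 is steady, stock 1 has a higher mean but is volatile
rets = [[0.01, 0.05],
        [0.01, -0.04],
        [0.01, 0.06]]
```

On this toy data the steady stock wins the risk-adjusted criterion even though the volatile one has the higher mean, which is exactly the kind of benchmark the talk shows you cannot in general compete with.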
30:03 and unfortunately there are strong
30:06 negative results on this so not only
30:09 could I not find an algorithm which gave
30:11 me a mathematical statement like the one
30:13 we saw at the beginning but for say
30:16 something like Sharpe ratio even worse
30:18 we were able to prove an impossibility
30:20 here we can prove that you just
30:23 cannot design an algorithm which will
30:25 give you such a guarantee on an
30:28 arbitrary sequence of returns so no
30:30 regret is provably impossible in fact
30:32 not only can you prove that no regret is
30:34 impossible you can prove strong lower
30:36 bounds on what's called the competitive
30:39 ratio so you know instead of sort of
30:42 getting optimal minus something you know I
30:44 could ask well maybe can I get half of
30:47 optimal or a 1/10 of optimal so it's
30:49 like some constant fraction of optimal
30:50 you can even prove impossibility results
30:52 for that so there's somehow something
30:55 about this framework which forbids you
30:57 from changing to the sort of the types
30:59 of criteria we might like to have in a
31:01 finance setting and still kind of
31:02 preserving the types of mathematical
31:04 guarantees that we get and intuitively
31:05 you know
31:07 the reason for these impossibility
31:10 results is that volatility introduces
31:13 switching costs right so you know I can
31:15 be tracking a stock putting all
31:17 of my weight in one particular stock and
31:19 enjoying the sequence of returns it gets
31:21 and maybe some other stock starts
31:22 looking better and so I want to switch
31:25 to that other stock well at the moment I
31:26 switch to that other stock I start
31:28 enjoying whatever returns that stock is
31:30 getting but I don't enjoy the same
31:33 historical volatility right so
31:35 standard deviation is somehow keeping
31:38 track of some sort of state and this
31:41 state is causing problems for the no
31:47 regret type of framework okay so one
31:50 thing that we have more success with an
31:52 alternative approach if you like is to
31:56 instead of assuming that these black
31:59 boxes or predictors are arbitrary let's
32:01 imagine that they have some risk
32:03 consideration properties as well okay so
32:07 in particular let's internalize the risk
32:09 within these strategies and then if we
32:11 have a collection of strategies that are
32:13 doing some risk management internally
32:15 can we run an algorithm on top of those
32:17 which has sort of favorable no regret
32:20 style properties and so let me
32:22 just tell you a little bit about this in
32:24 one slide so this is what you might
32:27 consider no regret learning under some
32:30 sort of risk constraint okay and so as
32:32 many of you know like the Sharpe ratio
32:34 is something you can always measure but
32:35 it's a very difficult criterion to
32:39 actually manage to right the closest
32:41 you can come to actually managing to the
32:43 Sharpe ratio is to just impose risk
32:45 limits to control your risk somehow and
32:47 so in particular one natural way of
32:49 controlling risk is to restrict yourself
32:53 to portfolios or strategies whose daily
32:56 P&L has a standard deviation of at most a
32:57 certain amount so we're basically going
33:00 to say look you know in a
33:02 kind of hedge fund context you might
33:05 imagine saying whatever the returns are
33:06 we want to restrict ourselves to
33:09 strategies that on a typical day you
33:11 know won't have a standard deviation of
33:13 let's say more than ten million dollars
33:15 one way or the other okay so we're
33:17 gonna basically we have some underlying
33:18 capital and a strategy
33:21 we can look historically at the P&L
33:23 that's been generated on that capital
33:25 and look at its distribution and say we
33:27 just don't want anything whose standard
33:29 deviation is more than a particular
33:33 dollar amount so if you do that right if
33:36 for instance you go through the exercise of
33:38 taking the historical returns of the S&P
33:40 500 and you estimate the covariance
33:44 matrix between those 500 stocks or you
33:45 have some perhaps lower-dimensional
33:49 factor model for doing so and you
33:51 basically say well what are the
33:54 portfolio's over those stocks that have
33:56 less than a certain dollar value of
33:59 volatility historically it's easy to
34:00 work out the math that this is basically
34:02 going to give you an elliptical
34:04 constraint in portfolio space and the
34:06 eccentricity of that ellipse will depend
34:08 on the strength of the correlations so
34:09 like if all of the stocks are
34:12 independent then this constraint on the
34:16 standard deviation of your GMV is just a
34:18 sphere in n dimensions if some stocks
34:20 are highly correlated then you'll get
34:22 sort of eccentricity in those particular
34:26 dimensions okay so this is nice
34:28 because it means that you know if we
34:30 adopt if we sort of assume that the
34:32 underlying strategies or portfolios that
34:34 we're trying to track in a no regret
34:37 setting are risk managed in the sense
34:40 that they have some standard deviation
34:43 limit on their you know on their daily
34:46 P&L we end up with a convex in
34:47 particular an elliptical but more
34:50 generally a convex constraint on
34:53 our portfolio space okay and what I'm
34:55 showing you in these kind of simulated
34:58 plots are you know just two strategies
35:01 one of which is essentially a momentum
35:03 strategy and this is where there are two
35:06 stocks okay so there are two stocks here
35:09 and and the X and y values here are
35:12 essentially indicating the position in
35:14 that particular stock at different
35:16 moments of time and the color-coding
35:18 here you can ignore it's just sort of I
35:20 think blue is sort of early in the data
35:22 and red is later in the data so I'm
35:23 just trying to show you the movement
35:27 over time so in in these simulated plots
35:29 all I'm showing you is one strategy
35:32 that's essentially following
35:34 a momentum rule and is therefore
35:36 changing its portfolio over time and
35:37 another one is essentially a pairs
35:38 trading strategy where it's
35:41 basically offsetting
35:43 long positions in one stock with short
35:45 positions in the other stock and the
35:46 stocks are pretty correlated which is
35:48 why we have this highly elliptical shape
35:51 here so these are sort of two underlying
35:54 strategies that we've imposed risk
35:55 considerations on so you can see that
35:58 you know for either this
36:00 momentum strategy or this kind of fully
36:03 hedged pairs trading style strategy they
36:05 never leave this elliptical constraint
36:06 because if they hit up against the
36:08 border we just say you know don't make
36:10 that trade don't make any trade that
36:12 would actually cause you to cross this
36:15 risk constraint and so now we've changed
36:17 the game right rather than considering
36:20 arbitrary black boxes we're considering
36:22 arbitrary black boxes but we don't let
36:24 them go outside of this little playpen
36:26 right because we're enforcing a risk
36:28 consideration on them and now you can
36:29 recover
36:31 you know the type of theorem I gave in
36:34 an earlier slide so now we're only
36:35 going to compete with strategies that
36:37 obey these risk limits and we're also
36:39 only going to look at strategies that
36:41 as these strategies do only
36:43 make local moves in portfolio space
36:44 and you can think of this as
36:46 modeling you know kind of limiting your
36:50 market impact okay and to kind of cut to
36:52 the chase if you make the additional
36:53 assumption that your underlying
36:55 strategies are risk managed in this
36:58 particular way then you can
37:00 give an algorithm that looks similar to
37:01 but is a little bit more complicated
37:02 than the one I gave at the beginning
37:04 that will actually give you
37:08 a no regret guarantee in the
37:11 risk managed criterion okay say like
37:14 mean minus variance okay and the only thing you
37:16 need to do is sort of combine no
37:18 regret learning with some tricks that
37:20 are from an interesting literature
37:22 called the pursuit-evasion literature
37:24 which you can go look up which has a
37:26 long mathematical history but it's
37:28 basically about games in which one party
37:30 is trying to chase another party let's
37:34 say in a convex region and so you can
37:36 recover theoretical guarantees and you
37:38 can get kind of nice experimental
37:42 results as well let me just finish up
37:45 here by talking about some variations
37:50 of this framework so you know one thing
37:52 that would be nice is to maintain the
37:55 original no regret guarantee I talked
37:57 about which is to have small regret to
37:59 the best single predictor in hindsight
38:01 but also guarantee that you will never
38:04 do much worse than some fixed index okay
38:07 so for example it would be great if I
38:09 could say like well you'll get close to
38:11 the returns of whatever the best stock
38:13 is in the in the index but you'll never
38:17 do worse than the index itself okay
38:19 whereas the original statement allows
38:21 you to do kind of cumulatively square
38:23 root of t worse than the index right so
38:25 if sort of everything's moving sideways
38:27 and there's no winner taking off
38:29 relative to the index you might actually
38:32 lose square root of
38:33 T regret to the index itself
38:36 which wouldn't be so great so it turns
38:38 out that you can make again slight
38:40 modifications to the algorithm the type
38:41 of algorithm I gave you at the beginning
38:43 of the talk and get this kind of
38:45 guarantee as well sort of promise that
38:47 you'll track the index and if something
38:49 in the index takes off then your
38:53 profit will track that as well a
38:56 very important generalization in many
38:58 applications including in finance
39:00 applications is to go to what's called a
39:03 bandit setting so in the framework I've
39:05 described to you so far you know each
39:07 day you're maintaining a weighting over
39:09 these underlying predictors or stocks
39:12 but whatever weighting you're
39:13 maintaining even if you put all of your
39:16 weight in a single stock you're still
39:18 finding out what the returns of all of
39:21 the stocks are okay and so that's fine
39:22 if your underlying predictors are
39:25 actually stocks but
39:26 in many settings
39:28 you know you might only get feedback on
39:30 the thing that you chose to play and
39:32 this is what's called a multi-armed
39:33 bandit problem for those of you that are
39:35 familiar with that literature so for
39:38 example if instead of being n stocks I
39:40 have n strategies and each of these
39:44 strategies has internal state and makes
39:47 trades each day you know and I want to
39:49 maintain kind of a weighting over these
39:51 underlying strategies like well to
39:53 really find out what a strategy
39:55 does on a given day I really need to
39:57 trade it in particular because you know
39:59 I can back test all of them
40:03 right in parallel but to really
40:04 find out what a strategy would have
40:06 gotten today I really need to trade it
40:08 live in the market among other reasons
40:10 because of trading costs right because
40:11 of market impact because of the
40:15 counterfactual that you know if I make
40:17 trades that cause other traders and
40:18 algorithms to change their behavior and
40:21 the only way of simulating that is by
40:23 actually trading live in the market okay
40:25 so in a setting like that I can't just
40:27 sort of maintain this conceptual
40:29 weighting and every day find out what
40:30 these strategies would have done I
40:32 really need to pick one or more of them
40:34 and play it in the market to get the
40:36 feedback okay and so this is called a
40:37 bandit setting where you only get the
40:39 feedback or payoff of
40:42 whatever choice you made that day and
40:45 and basically everything I've said in
40:47 this talk also applies to the bandit
40:49 setting there are simple modifications
40:50 of the algorithm I gave on the first
40:53 slide where you only get feedback from
40:55 the thing that you chose and you still
40:57 get similar mathematical results with
40:59 a worse regret bound I think
41:01 the dependence on the number of
41:03 time steps goes from square root of T to
41:06 like T to the 2/3 something like that
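A minimal sketch of the kind of bandit modification being described, in the spirit of the classical EXP3 algorithm (the `payoff(t, arm)` interface returning values in [0, 1] is an assumption for illustration, not the speaker's code):

```python
import math
import random

def exp3(n_arms, payoff, T, gamma=0.1, seed=0):
    """EXP3-style no-regret learning in the bandit setting: maintain
    weights over arms, play one arm per round, and update only that
    arm using an importance-weighted payoff estimate, since we only
    observe the payoff of the arm we actually played."""
    rng = random.Random(seed)
    w = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        s = sum(w)
        # mix the weight distribution with uniform exploration
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = payoff(t, arm)
        total += x
        # importance weighting keeps the payoff estimate unbiased
        xhat = x / probs[arm]
        w[arm] *= math.exp(gamma * xhat / n_arms)
    return total, w
```

Run against an arm that always pays off, the weights concentrate on it even though the algorithm never sees the unplayed arm's payoffs.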
41:11 okay let's see the other
41:13 tool that I think you
41:15 should be aware of and by the way one of
41:16 the reasons I wanted to give this talk
41:18 is that these tools are simple enough
41:21 but powerful enough that I think a lot
41:23 of you could find applications for them
41:25 in your own strategies and so let me
41:26 tell you about you know kind of one of
41:29 the biggest generalizations of this
41:31 framework that's known today which
41:34 is sort of general online or sequential
41:37 convex optimization so now imagine your
41:39 action space instead of being like
41:42 picking between one of n discrete
41:44 alternatives which is really the same as
41:45 maintaining a weighting over them you
41:46 can think of maintaining a weighting or
41:48 distribution over them as like a
41:50 probabilistic strategy where I have a
41:52 weighting and I'm going to pick randomly
41:55 from that distribution and play it at
41:56 that particular time right so
41:58 mathematically it's exactly the same
42:00 because it's just you know either the
42:01 inner product or it's an expectation
42:04 okay so a very nice generalization of
42:06 this is where our action space is
42:08 continuous right so instead of there
42:11 being n distinct choices there's a
42:14 continuous infinity of choices
42:16 but we need the shape of that continuous
42:18 infinity to be nice like let's say some
42:20 convex and compact region but if you
42:22 want to think of a specific
42:24 example imagine that you have some
42:28 parametric strategy class okay and that
42:30 you know each point in this space is one
42:32 setting of the parameters for your
42:35 strategy okay so a typical thing to do
42:37 would be to get historical data and
42:40 optimize the parameters of your strategy
42:41 on the historical data and use that
42:44 going forward but then again you're
42:45 making the assumption that like the
42:46 future is going to look something like
42:50 the past so you can choose not to do that and
42:52 instead think in a no regret way and say
42:54 like well there's this continuous
42:56 infinity of settings of parameters for
42:59 my strategy I'd like to be able to
43:00 promise that no matter what the data
43:03 looks like I will run a strategy whose
43:06 performance in hindsight looks like the
43:09 optimal setting of the parameters you
43:12 know after the data's come in ex-post
43:16 okay um and furthermore let's
43:19 imagine that the way the payoffs are
43:20 determined each day is with some
43:22 arbitrary convex function over your
43:24 parameter space so the game
43:27 if you like is that you know every day
43:30 nature picks some arbitrary potentially
43:33 different convex function describing the
43:35 payoffs over your parameter space you
43:37 don't know what that function is but you
43:38 pick one of the parameter points and you
43:40 get the corresponding payoff that day
43:42 the next day nature changes what that
43:44 function looks like you again pick a
43:45 setting for your parameters you can
43:47 basically get the same kind of guarantee
43:49 that I gave in the no regret setting
43:51 where you promise that no matter what
43:54 the sequence of payoff functions chosen
43:56 by nature is the performance of your
43:59 algorithm will compare favorably to the
44:00 best setting of your parameters in
44:03 hindsight okay and this is sort of the
44:06 most general
44:08 formulation of the no regret
44:12 setting so just to conclude you know no
44:14 regret learning I think is something
44:16 that I would encourage all of you to
44:18 look at it has a very rich history and
44:21 very powerful theory that touches on you
44:23 know many topics from finance to game
44:25 theory
44:28 and even sort of mathematical
44:29 programming in the form of convex
44:31 optimization linear programming and the
44:34 like on the other hand deviations from
44:38 the typical additive objective function
44:40 like cumulative returns or cumulative
44:43 payoffs kind of break the theory for the
44:45 most part which is not a great thing but
44:47 there are workarounds right so
44:49 one workaround that I
44:51 mentioned and there are others the one
44:53 is to sort of endogenize risk levels
44:55 within your strategies and then run such
44:57 an algorithm on top of those risk managed
45:01 strategies as well and I've listed here
45:03 at the end the Quantopian team will
45:05 make these slides available I've just
45:07 listed some references I especially
45:10 recommend the first two which are
45:12 very nice survey papers on the entire
45:14 topic of no regret learning and they're
45:16 they're relatively recent as well and
45:18 I'm happy to take questions if there are any
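The elliptical risk constraint described midway through the talk can be made concrete with a small sketch (illustrative code, not the speaker's implementation): given dollar positions and a return covariance matrix, the predicted daily P&L standard deviation is sqrt(w' C w), and one simple way to stay inside the playpen is to scale an offending portfolio back to the constraint boundary:

```python
import numpy as np

def cap_volatility(weights, cov, max_dollar_vol):
    """If the predicted daily P&L standard deviation sqrt(w' C w)
    exceeds the dollar cap, scale the whole portfolio back onto the
    boundary of the elliptical constraint; otherwise leave it alone."""
    w = np.asarray(weights, dtype=float)
    vol = float(np.sqrt(w @ cov @ w))
    if vol <= max_dollar_vol:
        return w
    return w * (max_dollar_vol / vol)

# two highly correlated stocks, as in the pairs-trading illustration
cov = np.array([[0.04, 0.03],
                [0.03, 0.04]])
```

The eccentricity of the feasible ellipse comes from the off-diagonal covariance terms, matching the talk's point that independent stocks would give a sphere.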
45:44 okay so first off um I just want to
45:46 thank you
45:48 for giving such an amazing talk
45:49 thank you I wish I had you as a
45:55 professor but well what I want to ask
45:59 about is this I guess that when
46:04 you have the payoff of the
46:05 algorithm the performance of the
46:07 algorithm I guess it depends on how you
46:10 would define you know the payoff
46:13 that you're aiming at like if you have a
46:17 collection of n strategies then I
46:19 guess it just becomes more concerning
46:21 that what you're aiming
46:24 at is total cumulative return over the
46:26 entire backtest as compared to say you
46:27 know the total cumulative return over
46:31 the past three months six months I see
46:33 so you want to have kind
46:35 of a sliding guarantee so for this
46:37 guarantee there are variants
46:39 to give you this as well right so rather
46:41 than sort of saying like well T just
46:43 keeps ticking up and I always will
46:45 promise that I'm like square root of T
46:48 behind at worst the best thing on the
46:50 entire history so far I might want
46:52 a moving guarantee right that basically
46:55 says in any window of length you know
46:57 let's give it a different name of
46:59 any window of length T prime I will
47:01 compare favorably to the best thing on
47:04 that window you can get such a guarantee
47:07 as well okay but but you know but in
47:09 general that that fixed window T prime
47:11 won't be going to infinity right because
47:13 it's a fixed window so if I want to you
47:16 know you should think of it's like it's
47:17 it's a learning it is a learning
47:18 algorithm right it's sort of getting
47:21 better in the sense that the guarantee
47:23 that you're giving is getting stronger
47:26 the longer time goes on but if you
47:27 choose to compare yourself to a fixed
47:30 moving window of like 250 trading days
47:33 right the last calendar year then that
47:35 value will be fixed and you will you
47:36 know you'll track to that extent but you
47:38 won't have as strong a guarantee as you
47:40 would if you let kind of T go
47:45 asymptotically to infinity okay okay it
47:46 just seems like you know
47:48 the classic you know momentum
47:51 strategies like they seem to sort of
47:53 mesh pretty well with
47:55 this yeah so that's a good point
47:58 this is a momentum strategy the Cover
48:00 thing is a little bit different right
48:02 you're also you're running it on top of
48:04 constant rebalance portfolio so there's
48:07 this interesting there's both momentum
48:09 and reversion going on because the meta
48:11 algorithm that's reweighting underlying
48:14 portfolios that's momentum but the
48:16 constant rebalancing is a reversion
48:18 process right when you rebalance
48:21 you take money away from things that did
48:22 well today and you give it to things
48:24 that didn't do well today right so in
48:26 the Universal portfolio case
48:27 there's both the momentum and a
48:29 reversion aspect but in the straight
48:31 versions where you're just comparing to
48:33 the underlying n stocks for instance
48:35 it is a pure momentum strategy
48:38 you're right that's that's pretty cool
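The reversion mechanics of constant rebalancing described in this answer can be seen in a tiny sketch (illustrative, not from the talk): after returns move the positions, restoring the fixed target weights necessarily sells what outperformed and buys what underperformed:

```python
import numpy as np

def rebalance_to_target(dollar_positions, target_weights):
    """One step of constant rebalancing: take the post-return dollar
    positions and restore the fixed target weights. Dollars flow out
    of whatever outperformed today and into whatever underperformed,
    which is the reversion effect inside a universal portfolio."""
    pos = np.asarray(dollar_positions, dtype=float)
    total = pos.sum()
    return total * np.asarray(target_weights, dtype=float)

# 50/50 target; stock 0 rallied to $120, stock 1 stayed at $100
new_pos = rebalance_to_target([120.0, 100.0], [0.5, 0.5])
```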
48:42 thank you yeah actually I have two
48:45 questions about this the first one is
48:48 that I'm a little bit confused about why
48:51 risk sort of derails the algorithm I
48:53 would have thought that you know the
48:54 theoretical proof has already
48:57 incorporated all possible random
49:02 deviations from the mean so that you know
49:04 despite the presence of which it still
49:10 satisfies that convergence yeah so let
49:11 me let me try to answer that question
49:14 even more generally than these methods
49:17 in what is maybe a more or less satisfying
49:19 way of answering in general one of
49:21 the dark secrets of all of modern
49:24 machine learning is its reliance on
49:26 additive objective functions so what I
49:28 mean by that is you have an objective
49:31 function that you're optimizing that can
49:34 be decomposed as a sum of the objective
49:36 functions on the individual data points
49:38 okay so for example
49:40 least squares regression right I can
49:42 write down the objective function as a
49:45 sum over each data point of my squared
49:47 error on that data point okay and
49:50 basically everything you know from
49:51 modern machine learning almost
49:54 everything relies very heavily on that
49:56 additivity both the theory and the
50:00 practice okay and anytime you have an
50:03 objective function which breaks that
50:06 various kinds of chaos ensue and the
50:08 simple thing that happens here is that
50:10 you know square root does not have that
50:13 property right if I look at like the
50:16 variance of a random variable or like
50:18 the performance of a model over time
50:22 it's a square root of a sum okay and you
50:24 can't rewrite a square root of a sum as
50:27 a sum of square roots okay and
50:29 so that's sort of a math answer
50:32 the intuitive answer is that things like
50:34 variance and standard deviation and
50:36 other common risk metrics they actually
50:38 have state right another way of saying
50:39 less mathematically what I just said is
50:42 that if there's a list of numbers
50:45 and you tell
50:47 me that there's n numbers in the
50:48 sequence and I want to know what the
50:50 contribution of this number is to the
50:52 average I know exactly what it is it's
50:54 that number divided by n if you ask me
50:57 what the contribution of this number is
50:59 to the variance of that sequence I can't
51:02 answer you I need to see the rest of the
51:04 numbers and so things like standard
51:06 deviation and variance that you can
51:07 think of them as having state right
51:09 there you're actually keeping
51:10 there's something about the
51:12 global information about the
51:15 sequence in that number in a way that's
51:17 not present in things like just the
51:19 straight mean or any other additive loss
51:22 function okay thank you so the second
51:25 part of my question is about the slow
51:27 convergence you know I
51:29 understand that you have sublinear
51:32 convergence to the best algorithm so the
51:35 question is is this too slow for a
51:39 practical you know trader I guess my
51:41 answer to that would be
51:44 you know it depends
51:46 on how much data you have and many
51:49 other things but my general answer would
51:51 be if sort of one over square root of T
51:53 convergence to optimality is too slow
51:56 for you then so is all of machine
51:58 learning right because basically sort of
52:01 power law convergence is the rule in
52:04 machine learning both theoretically
52:07 and largely empirically as well so you
52:09 know I definitely agree there are you
52:11 know there are data sets for which the T
52:15 we have is just you know not big enough
52:16 just for these things to make
52:18 interesting statements but in general
52:20 that's a sign that you just don't have
52:22 enough data to apply even very simple
52:23 learning methods like linear
52:27 regression for example thanks for a
52:30 great talk very thought-provoking so my
52:31 question is in regards to the
52:33 multi-armed bandit yeah can you talk a
52:34 little bit about the exploitation
52:37 exploration trade-off and how you sort
52:39 of handle that yeah it's a good question
52:43 so sort of no regret algorithms in the
52:47 bandit setting essentially work on the
52:49 principle they handle exploration
52:51 exploitation with sort of an optimism
52:53 under uncertainty principle and what I
52:56 mean by that is like a classical bandit
52:57 algorithm that did very explicit
52:59 exploration exploitation would be
53:01 something like look you know at each
53:04 round you know with some probability
53:06 you know with high
53:07 probability choose the thing that's done
53:09 the best so far and with the remaining
53:12 probability pick something at random and
53:14 then sort of gradually schedule that
53:17 exploration probability to go to zero in
53:20 sort of the no regret framework you
53:22 instead have implicit exploration what
53:25 you basically do is you say things that
53:27 you know things that I've tried a lot
53:30 like it's gonna sound like I'm
53:31 making probabilistic assumptions but I'm
53:33 not but the intuition in a
53:35 probabilistic setting would be things
53:37 that you've tried a lot you have very
53:38 tight error bars on what their real
53:41 underlying value is and things that you
53:42 haven't tried that much have very wide
53:46 error bars so instead of using like the
53:48 empirical mean of each one of the
53:51 actions so far use an upper confidence
53:53 bound okay and so things that you
53:54 haven't tried much kind of get this
53:57 automatic promotion and then you try
53:58 them more and then you find out whether
53:59 they're really good or not and this is
54:01 basically how all these algorithms look
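A sketch of the upper-confidence-bound idea just described, along the lines of the classical UCB1 rule (illustrative code, not the speaker's):

```python
import math

def ucb1_choice(counts, means, t):
    """Optimism under uncertainty: score each arm by its empirical
    mean plus a confidence bonus that shrinks the more the arm has
    been tried, and play the highest score. Untried arms get an
    infinite bonus so they are explored first, which is the
    'automatic promotion' described in the answer."""
    scores = []
    for n, m in zip(counts, means):
        if n == 0:
            scores.append(float("inf"))
        else:
            scores.append(m + math.sqrt(2.0 * math.log(t) / n))
    return max(range(len(scores)), key=lambda i: scores[i])
```

With equal empirical means, the arm tried fewer times wins the tie through its wider confidence bonus, so exploration falls out implicitly rather than through an explicit exploration probability.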
54:09 awesome thanks so much okay so you
54:11 showed that once you manage the risk of
54:14 your strategies you could get good
54:17 regret guarantees now what is the
54:19 intuitive reason that you didn't get it
54:20 when you didn't manage the risk
54:22 what's the intuitive reason why what
54:24 like theoretically you should have
54:26 gotten it without managing the risk as
54:29 your theory showed initially so what is
54:30 the intuitive reason when you manage the
54:31 risk you could get it so no what I'm saying
54:34 is the original theory will break
54:36 if you don't kind of have each one of
54:38 the constituent strategies managing its
54:40 own risk in some sense only in the
54:43 experiments right huh no even in
54:45 theory right in theory I can just sort
54:49 of give you bad sequences in which the
54:52 underlying constituent strategies are
54:55 not risk managed in any way
54:56 and not only will the particular
54:58 algorithm I showed do poorly you can
55:00 actually prove outright that there
55:01 doesn't exist an algorithm which will
55:03 kind of recover the theoretical
55:06 guarantees I gave on the early slide
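The earlier point that variance carries state, unlike the additive mean, can be checked numerically in a toy illustration (not from the talk): a data point's contribution to the mean is always x/n, but its contribution to the variance depends on the rest of the sequence:

```python
import statistics

# The mean is additive: each point's contribution is x / n,
# independent of the other points in the sequence.
def mean_contribution(x, n):
    return x / n

# The variance is not additive: the very same value 5 contributes
# nothing to one sequence's variance and a lot to another's.
seq_a = [5, 5, 5, 5]   # 5 sits exactly at the mean
seq_b = [5, 0, 0, 0]   # the same 5 is now far from the mean

var_a = statistics.pvariance(seq_a)
var_b = statistics.pvariance(seq_b)
```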
55:12 okay last question thank you Dr. Kearns I
55:13 have a question about updating the
55:17 weights to optimize the Sharpe ratio can
55:19 we consider using some like momentum
55:22 gradient descent method that we can
55:25 update instead of doing it like
55:27 stochastically we can add each day's
55:31 information with respect to the gradient
55:34 of the Sharpe ratio so that we can use
55:37 some momentum method that matches the idea
55:41 that you use here too like yeah I'm
55:43 afraid my
55:44 answer is gonna be the unsatisfying one
55:47 which is you know nice idea it just
55:49 turns out that doesn't work okay so in
55:51 you know like as per these
55:53 impossibility theorems that I mentioned
55:56 so by the way you can interpret these
55:58 algorithms as momentum algorithms
56:00 engaging in gradient descent so in
56:02 particularly this last generalization I
56:04 mentioned where you have like a convex
56:07 parameter space and each day nature is
56:08 picking some payoff function over that
56:10 parameter space and you're sort of
56:11 trying to track the best setting of your
56:13 parameters that though if you go look at
56:15 the actual algorithm that does that it's
56:19 essentially you know even though there's
56:21 no gradient to speak of because every
56:23 day the objective function is changing
56:24 according to nature's choice it's
56:26 essentially trying to compute some sort
56:29 of aggregate gradients of the functions
56:31 chosen the average of the functions
56:33 chosen by nature so far so these
56:35 algorithms in general they are momentum
56:36 algorithms and they are following a
56:39 gradient but sort of trying to do it
56:41 with Sharpe ratio as the objective just
56:43 kind of breaks all of the theory and
56:45 it's because of this sort of
56:46 non-additivity that I mentioned before
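The gradient-following view in this final answer corresponds to online gradient descent over a convex set, which can be sketched as follows (illustrative code with an assumed `grads(t, x)` interface returning the gradient of round t's adversarially chosen convex loss; the specific loss in the usage example is mine):

```python
import numpy as np

def online_gradient_descent(grads, x0, lo, hi, etas):
    """Online convex optimization sketch: each round play the current
    point, receive the gradient of that round's convex loss at the
    played point, step against it, and project back into the feasible
    box [lo, hi]. No stochastic assumptions: nature may change the
    loss function arbitrarily from round to round."""
    x = np.asarray(x0, dtype=float)
    played = []
    for t, eta in enumerate(etas):
        played.append(x.copy())
        g = grads(t, x)
        # projection keeps us inside the convex constraint set
        x = np.clip(x - eta * g, lo, hi)
    return played
```

Run against the fixed loss (x - 1)^2, the played points settle at the hindsight-optimal parameter, which is the guarantee the talk states for arbitrary sequences of convex payoff functions.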