0:01 Thank you very much. So, I'm very excited
0:04 to be here, first of all because the
0:05 speaker lineup is something out of my
0:07 dreams: it's either my friends, or people
0:09 that I always wanted to see again, or
0:10 people I always wanted to meet, so I'm
0:13 really keen for what's happening
0:16 this week, thanks very much as well.
0:18 and secondly I'm excited and also
0:20 slightly nervous, because in the last three
0:22 years I've actually not given talks at
0:24 all if I could help it, really the bare
0:26 minimum, and the reason for that was that
0:31 I had a growing feeling of unease with
0:32 the literature in quantum machine
0:34 learning, especially in the field of
0:36 thinking about how quantum computers can
0:38 be used for machine learning, and this
0:39 unease was also reflected in my own
0:41 research. I always had this feeling
0:42 something was missing, there was an
0:43 elephant in the room that we don't
0:47 address, that there were problems. And so
0:49 this is now the first talk where I
0:50 want to present a bit of work where I'm
0:52 very confident again that this is going
0:54 in the right direction. I spent the
0:55 last couple of years, besides having a
0:57 child (there's another one on the way, so
0:59 a bit of family stuff as well), I
1:00 spent a lot of the last years really
1:02 thinking about how to verbalize these
1:05 issues that are kind of felt, and
1:07 secondly how to build a research agenda
1:09 that addresses them, or circumvents
1:10 them, or sometimes is a little bit
1:12 different. And so what I want
1:14 to present here are preliminary results
1:16 from two pilot studies in two focus
1:18 areas that we're now following at
1:22 Xanadu. And the reason why I could
1:23 spend so much time actually thinking
1:25 about a conceptual vision, and also, you
1:27 know, not publish a lot, not go to talks a
1:30 lot, is that I was asked to
1:32 build a little quantum machine learning
1:33 team at Xanadu, and a lot of the credit goes
1:35 to those folks here. This is the core
1:37 team, this is a bit more the larger
1:39 circle. If you like what you hear here,
1:41 then consider working for us, it's a very
1:43 sweet team, and we are, what's both an
1:45 opportunity and a curse, all remote.
1:49 So yeah, okay, cool. The first thing I
1:51 realized in the last couple of years,
1:52 speaking also to people like, for example,
1:54 Ryan, or a couple of you in the
1:58 audience, is that these concerns that I
1:59 have come from the objective that I'm
2:00 trying to optimize, and you might not
2:02 share this objective, but the first thing
2:03 I have to do is be open about this
2:06 objective. I'm working in industry;
2:08 Xanadu's mission is to build quantum
2:09 computers that are useful and available
2:11 to people everywhere, and this hides what
2:13 the company actually does: most of the
2:14 company is building a photonic quantum
2:17 computer, but a large part of the team
2:19 is actually trying to build software
2:20 that makes them available and hopefully
2:22 a joy to program. You might have heard of
2:26 PennyLane, and I'm part of the
2:27 software team, leading this little quantum
2:29 machine learning team that's trying to
2:31 make these computers
2:33 useful. And so the derived mission or
2:34 objective that we're working on is to
2:36 make quantum computers useful for machine
2:38 learning, and this sounds now like a bit
2:39 of industry babble, but it's actually
2:41 not; it's a very precisely and
2:43 consciously formulated objective, and
2:45 what you do not see here is that we want
2:47 to prove a practical quantum advantage,
2:48 because this is something completely
2:50 different. What we're doing here is
2:52 envisioning a state of the world; this is
2:54 something that you do in startups a lot,
2:55 and I learned from it, and I find it
2:56 actually quite an interesting way of
2:59 working. You envision a state of the
3:01 world, maybe in 10 years' time, where the
3:03 typical machine learning practitioner
3:04 doesn't only need specialized knowledge
3:07 on how to program a GPU, or on linear
3:09 algebra, but also needs to know quantum
3:11 computing, and the question is: how do we
3:12 get there from here? So this is an
3:14 industrial question; it doesn't mean to
3:15 understand the world, it means to change
3:18 the world, but I wouldn't do it if it
3:20 didn't also encompass a lot of
3:22 understanding the world, obviously. And
3:23 now the question is: you look at the
3:24 field, especially when
3:26 you're building a team and you have to
3:29 give them something to do, and the
3:30 question is now, with the models that
3:32 Nathan also gave this wonderful
3:35 introduction to: are we actually on the
3:36 right path? So basically, if we
3:38 extrapolate, our computers get better, our
3:40 models get a little bit better, we tune a
3:41 bit here and there, will we get to that
3:45 state of the world? And my answer to this,
3:47 if I formulate it nicely, is that a
3:50 lot of things have to change in research
3:52 for us to be on the right path. Obviously
3:54 I can't predict the future, but let's
3:57 put it like that: if I had to put my money
3:59 or my good name or something on a path,
4:01 I would say: let's not choose
4:03 those, let's change a couple of
4:06 things. Also, I have to say, when I talk
4:07 about quantum machine learning, in the
4:08 back of my mind I have this very
4:11 mainstream approach that Nathan was also
4:12 talking about: we load data into a
4:15 quantum computer, we run some
4:16 kind of variational algorithm, and then
4:18 it gives us the answer to a
4:20 machine learning task, which is a kind of
4:21 narrow approach to quantum machine
4:23 learning in general; I'm sure we'll discuss
4:26 a lot more facets of it. So let me try to
4:27 phrase four patterns that I see in the
4:29 literature and in my own work
4:32 that I started feeling uneasy about.
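To fix ideas before the four patterns: the mainstream pipeline just described (encode a datum into a quantum state, apply a trainable circuit, read out an expectation value) can be sketched in a few lines. This is a hypothetical single-qubit toy simulated classically, not one of the models from the talk; in practice one would use a framework like PennyLane.

```python
import math

def ry(angle, state):
    """Apply a single-qubit RY rotation to a real 2-amplitude state vector."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    a0, a1 = state
    return [c * a0 - s * a1, s * a0 + c * a1]

def expval_z(state):
    """Expectation value of Pauli Z: |amp_0|^2 - |amp_1|^2."""
    return state[0] ** 2 - state[1] ** 2

def model(x, theta):
    """Load the datum x, run a one-parameter variational circuit, read out <Z>."""
    state = [1.0, 0.0]        # start in |0>
    state = ry(x, state)      # data encoding
    state = ry(theta, state)  # trainable circuit
    return expval_z(state)    # a classifier could then take sign(<Z>)
```

Note that rotations about the same axis compose, so this toy model is just cos(x + theta), which already hints at a point made later in the talk: an expressive-looking circuit can end up computing a very simple function.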
4:34 And I think I'll make myself... well, I
4:36 don't know if I'll make some enemies here,
4:38 but the first one is the pattern of
4:40 "we prove an exponential speedup".
4:42 And actually even Nathan, who
4:43 knows everything about speedups, was
4:45 saying maybe this is not the right
4:47 approach; I actually loved a
4:48 couple of the comments in this
4:51 talk. Why is this so problematic? From
4:53 an academic point of view you can play
4:55 this game, but again, it's problematic if
4:57 you want quantum computers to be used
4:58 for machine learning one day on a grand
5:00 scale.
5:01 And the point is that the language of
5:03 computational complexity is not used in
5:04 machine learning; if you go to NeurIPS
5:06 you hear very few people talking
5:09 about exponential speedups, and the
5:10 reason is that this language cannot
5:13 really formulate or phrase what we see
5:15 in current state-of-the-art machine
5:17 learning. So this language is not useful
5:18 for current state-of-the-art machine
5:20 learning. Another way to phrase this is:
5:22 machine learning is happening in
5:25 heuristics, and this is a problem, and we
5:26 should acknowledge that this is a
5:27 problem for the language that we have
5:29 learned and inherited from quantum
5:31 computing.
5:32 So I do think that our performance
5:33 measures in this sense, our theoretical
5:36 ones, are to some extent not meaningful
5:38 for getting good results in machine learning.
5:40 And I put "mainstream machine
5:41 learning" here because when I discuss this
5:43 with people they always say: yeah, but
5:45 what about quantum data? Maybe there are
5:46 tasks that machine learning hasn't tackled
5:48 yet. And there, again, I think Nathan
5:51 dropped a very interesting comment. In
5:53 the last weeks I looked a little bit
5:54 into transformer models and what's
5:56 actually happening in state-of-the-art
5:58 machine learning, and I'm actually not
6:00 sure that they can't solve a lot of our
6:02 problems in quantum physics, because they
6:04 learn, in a very interesting manner,
6:06 very complicated correlations. Physics
6:08 problems are simple, there's low energy
6:09 involved, and most of the things we do
6:11 are very structured; why on Earth
6:12 shouldn't they be able to learn most of
6:14 them, except for a couple of exotic cases?
6:16 But again, my idea is not to prove that
6:17 there's an exotic case where things will
6:19 happen; I want this to be mainstream.
6:24 So here, the second pattern is also related
6:27 to performance. So if we don't go to
6:28 quantum computing as our parent discipline...
6:34 The answer is no... no, the answer is yes... is
6:39 it fair to say that a
6:41 proposed machine learning algorithm
6:45 that is only polynomially slower than a quantum
6:47 solution is actually quantum, because it
6:50 could be simulated in polynomial time then, or
6:52 polynomial-equivalent time, on a Turing
6:54 machine? Oh my gosh, this
6:57 question... can you phrase it again, and
6:58 tell me the context of why this comes up
7:00 right now? Oh, the context is, I'm
7:04 talking from the previous slide. So the
7:07 basic thing is that if we have a
7:10 black box, right, we can't stare inside
7:12 the black box to be able to determine
7:15 whether or not it's a quantum
7:18 computer, or it's a D-Wave quantum
7:20 computer.
7:23 Okay, so if we don't know whether
7:27 it's a classical or a quantum computer,
7:29 then how can we tell, from the input-output
7:32 pairs, if there's only a polynomial
7:34 separation between them? Because then
7:38 there's no clear indication
7:39 that we couldn't just be secretly
7:42 simulating everything going on
7:45 inside.
7:48 Okay... yes, can we talk about this
7:49 afterwards? It leads into a very
7:51 different topic, and I don't know if I
7:53 entirely understand the problem here.
7:55 Okay, yeah... I'm not cutting you off, I'm
7:58 really just affirming that I'm
8:00 not on that level.
8:03 Okay, lovely, thank you. Cool, second
8:05 pattern. I'm actually quite impressed...
8:07 do they have cameras on the videos on
8:15 you? Okay, second pattern, I guess. Oh yeah,
8:17 yeah, you know me quite well by now.
8:20 So the second pattern is: we look
8:21 at performance from a machine learning
8:24 angle. What we do here is: okay, heuristics,
8:25 so we start
8:27 benchmarking. So a lot of papers run
8:28 little benchmarks; there's always the
8:30 sentence where we start testing things
8:32 on MNIST or whatever, and the big
8:34 elephant in the room, I think, here is
8:37 that machine learning has different
8:39 regimes, and I think that small problems
8:42 in machine learning are solved and
8:45 have a very different working reality
8:46 than big problems, and the big problems,
8:48 remember, again, are what we're
8:50 trying to solve here. So we don't
8:51 actually know, from the benchmarks that
8:53 are so small at the moment, what happens
8:55 on larger scales, and if you have ever
8:56 trained a very performant neural
8:58 network on small problems,
8:59 they actually often perform really badly.
9:01 So there seems to be something
9:05 going on, and at this stage in time it's
9:07 very hard to change this, but we could be
9:08 a lot more aware in our research, asking
9:10 questions like: what benchmarks can we
9:12 actually design that are designed to
9:18 scale? There's another issue, actually,
9:21 which is still kind of guesswork,
9:23 so I just want to share it with you,
9:25 but, you know, I'd need to see data to know if
9:26 this is really true: I think we have a
9:28 huge positivity bias in quantum machine
9:30 learning. Now, I know everyone's
9:31 claiming this, but on my
9:33 Fridays, actually, since forever, I don't
9:35 work in quantum computing but with
9:37 classical machine learning: I
9:38 work with a group of social
9:39 psychologists, and they shared with me
9:42 that around 2010, I think it was, in
9:43 their field they had a huge scandal,
9:45 because a huge collaboration of people
9:47 who had started feeling really uneasy about
9:49 the research sat down and tried to reproduce
9:50 studies that people did in social
9:53 psychology, and found almost zero
9:55 reproducibility. So people had built
9:57 their Harvard careers on something that
9:59 you couldn't reproduce. So I started
10:01 wondering if someone shouldn't do this
10:02 for us too, and you will see that we might
10:04 actually start doing something like this, and
10:06 we have very interesting findings that
10:07 I'll share
10:09 just now. So this was more about performance.
10:11 The third pattern is more
10:13 about model design. Again, these are
10:14 patterns that worry me; I don't know
10:17 if they worry you.
10:20 And this is that, in a paper that uses
10:22 variational models for quantum machine
10:23 learning, at some stage there's always
10:24 this place where you introduce the
10:25 circuit, and there's always a statement
10:27 like "we use these rotation gates and some
10:29 entangling gates", and so on and so forth. I
10:31 challenge you to find papers where there is a logical
10:33 explanation of why they use this circuit
10:35 and not another one. Sometimes you
10:36 find a paper that has some design
10:38 principles that they optimize, but those
10:41 design principles are, in 99% of the cases,
10:43 derived from quantum physics: we want
10:46 a circuit that
10:48 incorporates a model class that's
10:50 classically intractable, or we want a
10:51 model class that's universal of
10:53 some sort, or that's easy to implement in
10:56 hardware. But these design principles are
10:57 not coming from machine
10:59 learning. And this is, for
10:59 learning and so like um and this is for
11:01 me like this really became app parent in
11:03 work I did with Ryan and Johannes but
11:04 also like many of you will know this you
11:06 can build a crazy onet that you plug
11:08 into your Quantum machine learning model
11:10 and it's a really useless machine
11:12 learning model in the end you can have a
11:13 crazy circuit that only gives you a sign
11:19 function in the end so yeah yes um so uh
11:21 One thing that we often hear is
11:23 expressibility: we want our circuit to
11:25 be expressive, to be able to express the whole
11:28 Hilbert space. Is that a machine learning
11:29 perspective, or is it...?
11:31 I think this is super difficult, because,
11:32 first of all, quantum
11:34 expressibility, in the sense of "can we express any
11:37 unitary", is not the point, because, again,
11:39 you can build expressive models that are
11:42 very, very limited as functions. If you ask
11:43 what your quantum models express, I think the
11:45 latest papers don't fall into this trap
11:47 anymore; they're actually talking about
11:48 the function
11:50 classes we want to express. And expressibility, if you ever
11:52 looked into the theory of machine
11:55 learning, is a very, very subtle topic,
11:57 because what classical machine learning
11:59 theory was about, for
12:01 many, many years, was finding this balance
12:02 between very expressive models and very
12:04 simple models, so that you regularize
12:06 well. So it's actually the focus of a lot
12:08 of theory, and definitely in classical
12:11 machine learning more expressivity is
12:13 not at all better. I think this comes
12:15 from people just reading about deep
12:16 learning; these models are very
12:18 expressive, but the fact that they're
12:20 interpolating your data and still doing
12:22 well does not mean that all of a sudden
12:24 expressibility is
12:26 a good thing. This is very complicated to
12:27 talk about and to use as a
12:28 measure.
12:33 Yeah, how do you streamline your
12:35 ansätze when you design them, like which
12:38 parameter is important and which is not?
12:40 Or maybe I can rephrase: are there
12:42 classical techniques that can be useful
12:44 here to reduce the parameters? I don't
12:47 know. So, all these patterns
12:49 are based on a problem
12:50 where we often can't do better, but
12:52 I think we can analyze better, we can be
12:54 more honest. So, for example, the previous
12:55 slide: also, we cannot benchmark in high
12:57 dimensions, but we can actually talk
12:59 about this. And here, again, I
13:00 don't know... I have never found in classical
13:02 machine learning a principle that you
13:04 can now use to design your ansätze; it's
13:06 very subtle, but we have to
13:08 start thinking about this. In my world,
13:10 you will see just now, the answer that I'm
13:13 trying to find to this is to design
13:15 models from first principles, to know
13:16 that there's a mechanism in there that
13:18 is interesting, and then see how well it
13:20 does, instead of just closing your eyes
13:22 and hoping that a quantum model delivers.
13:23 When I started with this in 2017, I
13:25 trained the first quantum neural networks,
13:27 which is really a long time ago; I
13:28 mean, even parameter-shift rules hadn't
13:32 been invented yet. And at the
13:33 beginning, I thought maybe this was
13:35 just magically going to happen, and
13:36 this is where the frustration came
13:39 in over the years, where I realized these
13:40 are not working for me out of the box.
13:42 I don't know... when I speak to
13:43 students they often agree with me:
13:45 if I try to train a quantum neural
13:47 network, it doesn't work out of the box;
13:49 if I use a scikit-learn model, it works
13:52 out of the box. And there's a discrepancy
13:53 between what I see in the literature,
13:55 which is why I think there's a positivity
13:58 bias, and what I see in my own research.
13:59 Okay, I gave a long answer to a short
14:00 question.
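Since parameter-shift rules were mentioned a moment ago: for a gate generated by a Pauli operator, the exact gradient of an expectation value follows from two shifted circuit evaluations, with no finite differences needed. A toy sketch (the "circuit" here is a single RY on |0>, whose Z expectation is analytically cos theta; this is an illustration, not one of the talk's models):

```python
import math

def expval(theta):
    """<Z> after RY(theta) on |0>; for this toy circuit it equals cos(theta)."""
    return math.cos(theta)

def parameter_shift_grad(f, theta):
    """Exact derivative of a Pauli-rotation expectation from two evaluations."""
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2

# Matches the analytic derivative d cos(theta)/d theta = -sin(theta).
```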
14:03 Yeah. Yes? So, I think I have a handle on
14:05 the first two patterns you described, but
14:07 I'm not sure I understand this one. Would
14:09 you like to see more uniformity in the
14:12 kinds of models that people use, or what
14:14 do you mean exactly? I have some sense of
14:16 what you mean when you say that the
14:18 circuit ansatz is motivated by the physics,
14:19 but would you like to see it more
14:22 motivated by the learning? That's
14:24 where I don't understand. Yes, so I want
14:26 to know why on Earth you are using this
14:28 ansatz and not another one. I want to know
14:29 also that you tested what another ansatz
14:31 would do, for example in your benchmark.
14:33 Okay, so: did you cherry-pick an ansatz on
14:35 which the performance is good, and which is
14:37 not necessarily well motivated in and of
14:39 itself? Yes. Or just forget about the
14:40 concept of an ansatz in a variational
14:42 circuit in general, but try to get an
14:44 algorithm that is more handmade, or where
14:46 there's a property in there that you
14:48 know... I get to this, actually, in the
14:52 second part. Well... yeah, thank you. Yes, so,
14:55 is the point somehow to do with...
14:56 so you're mentioning expressibility
14:59 in the classical machine learning
15:01 setting, where there is kind of a
15:03 contradiction between too much
15:05 expressibility and overfitting, so
15:07 there's some rigidity or some kind of
15:09 regularity that is required of the
15:11 approximating function that
15:13 prevents it from overfitting; is that
15:15 what you're saying? And, you know, I mean,
15:17 a support vector machine with a
15:20 Gaussian kernel can express any function; I
15:21 mean, it's cool, but it still doesn't do
15:23 deep learning at this stage.
15:26 So...
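The Gaussian-kernel point is easy to make concrete: the kernel itself is a one-liner, and with a suitable bandwidth an SVM built on it can fit essentially any decision boundary, which is exactly why raw expressive power alone does not explain what deep learning does. A minimal sketch:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel entry: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram(xs, gamma=1.0):
    """Gram matrix over a data set, e.g. for an SVM with a precomputed kernel."""
    return [[rbf_kernel(a, b, gamma) for b in xs] for a in xs]
```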
15:29 Yeah... cool. And then the last pattern,
15:30 I think this might make
15:33 you... I think this attacks a bit more the
15:36 work of all of us here in the room, and
15:37 this is the one that frustrated
15:38 me the most, because I love working in
15:40 theory, and there's so much beautiful
15:41 machine learning theory. So, for example,
15:43 one thing I really loved is the spectral
15:45 analysis of kernels, so I thought maybe
15:46 we can do this with quantum kernels; or
15:48 statistical physics of learning: there
15:49 was actually a workshop here a couple of
15:51 years ago that was fantastic, blew my
15:53 mind. I started working with these scientists,
15:55 and at some stage they asked me: okay, we can
15:56 now analyze quantum models, but what
15:57 model class would you actually like to
15:59 analyze? Going back to the last slide, I
15:59 analyze going back to the last slide I
16:01 could now take one of um you know
16:04 Nathan's um feature Maps they make a big
16:05 difference of what I take I could just
16:08 arbitrarily take one of them and start
16:11 analyzing um but what people actually do
16:12 a lot in theory that you see is a
16:14 pattern where they start with when when
16:15 I see this in a paper always like s
16:18 saying like Okay this is the pattern we
16:20 consider model class tra RM which means
16:22 any embedding any Quantum computation
16:24 maybe not any Quantum computation but a
16:26 very very large class of anything I
16:28 could do in the world and then I prove a
16:30 theorem proof that you know if I kind of
16:33 approximate this like or I have a kind
16:35 of randomly sampled model out of this
16:37 class that I have Baron plateaus I prove
16:39 for example amir's we were here from
16:41 Amir like a beautiful results on there's
16:42 no back propagation scaling in general
16:44 happening or a lot of cool theory on my
16:47 kernel decompositions you know but the
16:50 point is that all of these results might
16:52 not even hold for the models we should
16:55 care about. So if quantum computing in
16:57 general can't be trainable, or
16:59 can't have backpropagation scaling, and the
17:00 discussion at the end of your talk was
17:01 going in that
17:03 direction, then okay, so be it; but there
17:05 might be a model class that is actually
17:06 good. And if you think about classical
17:08 machine learning, we do not prove lots of
17:09 things about any probability
17:11 distribution that I could parameterize;
17:13 we prove things about neural
17:15 networks. So we have a very different
17:16 situation than in deep learning: in deep
17:18 learning we know a model that we are
17:20 interested in, and we build theory about
17:23 it. Now we copy this theory and analyze a
17:24 rather large or arbitrary class of
17:26 models. So this was
17:29 starting to drive me insane. And any
17:31 kind of investigation into, you know,
17:33 what parameterized circuits actually are
17:34 in terms of quantum models, this was
17:37 work I did a lot of,
17:40 before. Okay, so now I have destroyed a
17:42 lot, and in the last years I always gave
17:43 these talks that were so negative, so let
17:46 me... Yes? About the patterns, I was
17:48 wondering, could you comment briefly
17:51 on to what extent these patterns
17:53 exist in classical machine learning?
17:55 Because I think some of them ought to;
17:56 like, there is a separation between
17:59 theory and practice, and I think data sets
18:01 and papers aren't publicly available a
18:03 lot of the time. I think only the third
18:04 one, to be honest. So the first one, that
18:06 we use computational complexity, that we
18:08 basically use a language that
18:10 doesn't suit the reality: I think people
18:12 have actually given up using that
18:13 language, I mean, we do a lot of
18:14 benchmarking there. The second one: we can
18:16 benchmark in the big regime, so that's
18:18 not a problem. The third one, a bit, that
18:20 models are... no, also the third one is not
18:22 true, because the models are validated by
18:24 their success, so we know we should
18:26 analyze these models because they're
18:28 doing well. And the last one: we're
18:30 actually doing theory about the models
18:31 that are doing well, so we don't have
18:33 this arbitrariness. So I don't think
18:36 any pattern applies there. Yes? The
18:38 question of how on Earth you are going
18:41 to see where the blue field is, right?
18:43 How do we find out what the relevant
18:47 issues are? And I tried to, as we say, go
18:49 across Green Street and talk to our
18:51 experimentalists... yeah, it takes 10 years
18:54 and I'm not a lot wiser. So how on Earth
18:57 are you going to find out where the blue
19:02 spot is? Okay, I'll try... I come to
19:03 this now, actually.
19:05 So this is kind of what I'm
19:07 trying. So now I'm sitting there, I'm
19:09 starting a team, and I want to work on
19:11 something. I could actually
19:13 hire any team, right? What specialists do I
19:14 need, what field do I want to go into,
19:16 especially if I'm so frustrated myself
19:17 about a lot of things? So what on Earth
19:21 do we set up? And we arrived at two
19:29 having a bit of susp I if they're doing
19:31 so well let's do the good thing a
19:32 scientist does Let's test this
19:33 hypothesis especially because no one
19:35 else seems to really worry about it
19:36 actually except for those students are
19:38 sometimes talk to so let's try to assess
19:40 how good Quantum models really are and
19:42 this is a very big field so kind of
19:44 reassess benchmarking reassess a lot of
19:45 things that we do and there's a lot in
19:47 the pipeline but the first pilot study I
19:48 talk about here is just to
19:50 systematically Benchmark popular ideas
19:51 in Quantum machine learning trying to
19:54 make the benchmarks as objective as we
19:57 can sounds super boring everyone who
19:58 went onto this project at the beginning
20:00 was like I think this is the most
20:01 exciting piece of research I've done
20:03 ever and for various reasons actually
20:07 Yeah. And the second one
20:08 goes a bit in this direction, and I
20:11 think... you know, I've tried to find good
20:12 models for so long, and it's always:
20:14 what is my principle to optimize? I
20:15 think we should go the other way around:
20:18 we look into quantum algorithms, we
20:19 identify what's the core engine, what are
20:22 they good at, and then we start asking
20:23 machine learning questions that have
20:25 nothing to do with deep learning, and we
20:27 see what we find. I'll motivate that in
20:31 the second part. Okay, cool. So, first part,
20:32 and this is where I share one
20:33 result that I'm actually really shocked
20:36 about; I hope you are too. So, a very
20:38 innocent study: let's just get a team of
20:40 very senior researchers together, let's
20:42 talk all the time together, we have a
20:43 good software team, let's implement the
20:45 best benchmark design we can come up
20:48 with. My realization is: it's an
20:50 art to benchmark well. There
20:51 are millions of questions you have to
20:52 answer, and we always ask our
20:54 students in the first year of their PhD to
20:56 answer those, and they're completely lost.
20:57 And there's a lot of arbitrariness
21:00 coming from the model selection. We had
21:02 a lot of procedures; the one we ended up
21:04 with: take arXiv papers after 2018 with
21:06 certain keywords, where we cast a very
21:08 wide net, trying to get everything
21:10 that could be related to
21:12 QML. We only take the ones with more than
21:14 30 Google Scholar citations. This
21:16 introduces a very, very serious bias, but
21:19 one we want: it introduces a bias towards
21:21 earlier papers that are highly cited, and
21:23 we want this because these are often
21:25 models that people reproduce, that they
21:27 talk about, so influential papers; but
21:30 they may not be the most tweaked ones, so just
21:32 be careful about that. Notably, this
21:34 excludes your 2017
21:36 work. Oh my gosh, that was really bad... no,
21:38 no, no, that was... you mean the one we're both
21:40 on? No, no, no, that was published in 2018,
21:43 so no. Why did we do this as well?
21:45 If you search for "classifier" on arXiv
21:47 with no time limit, you find all of these
21:48 state classification papers; you just have to
21:50 go through so many papers. So we
21:52 literally sat for a week and just went
21:54 through thousands of papers, just
21:57 to give you an idea of what work this is.
21:59 Now we limit the topic: we only want
22:01 NISQ, because we can only implement those;
22:04 only qubit models, only new models; we are
22:05 only looking at supervised
22:07 classification, and we want conventional
22:09 classical data, so that includes images
22:11 as well, to some extent, but not
22:12 something very specific like
22:15 graph-structured data. We randomly select 15
22:16 papers, because we felt this is what we
22:19 can implement. And then we realized,
22:20 when we read through the papers properly,
22:22 that some of them, unfortunately really
22:23 good ones, we can't implement, because, for
22:24 example, they don't actually give us an
22:26 idea of what feature embedding they want;
22:28 these kinds of gaps.
22:30 These are the ones we actually had in
22:31 the final selection. I'll tell you
22:34 immediately that three of them are not
22:36 yet implemented well enough that I'm
22:37 confident to share the results, because,
22:39 I don't know if you ever did a lot of data
22:41 analysis, it's very different from theoretical
22:42 work: you don't just implement an
22:44 experiment, it's almost like a friend
22:45 you get to know. You have to do it
22:47 over and over again; you find a bug, you
22:48 start not understanding things, you change
22:50 the setting. And so we've interacted with
22:53 the other models, not endlessly, there's a
22:55 lot more work to do, but quite a lot. You
22:56 also see, because there's an old-paper
22:58 bias, that there's actually...
23:01 I promise this was
23:03 accidental, but I'm happy that it's
23:04 also my own work, because you have to
23:05 criticize your own work as well.
23:09 Something that comes out... you see here,
23:12 this paper, the author is actually in the
23:16 audience, and it appears
23:18 twice here because it
23:21 proposes two types of models
23:22 that we put into different
23:25 classes. So now, one observation here is
23:28 that the types, the families, of
23:30 models are, I think, quite representative
23:31 of what we see in the literature of
23:34 supervised QML. There are these QNN
23:37 designs, which, I know, I also feel is a
23:38 complete misnomer, but it's the idea of
23:40 encoding data and then training a
23:42 variational circuit as a classifier.
23:44 There are quantum kernel methods, where the idea is
23:47 you embed the data, you compare the
23:48 quantum states of two embedded data
23:50 points, and then you feed the result into
23:51 a classical machine learning algorithm
23:53 like a support vector machine.
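A minimal sketch of that quantum-kernel recipe, with a hypothetical single-qubit angle embedding (so the "fidelity" reduces to a simple overlap; real proposals use larger embeddings, and the choice below is illustrative, not one of the benchmarked models):

```python
import math

def embed(x):
    """Angle-encode a scalar feature as the state RY(x)|0> on one qubit."""
    return [math.cos(x / 2), math.sin(x / 2)]

def fidelity_kernel(x1, x2):
    """Kernel entry: squared overlap |<phi(x1)|phi(x2)>|^2 of embedded states."""
    s1, s2 = embed(x1), embed(x2)
    overlap = s1[0] * s2[0] + s1[1] * s2[1]
    return overlap ** 2

# The resulting Gram matrix is what gets handed to a classical method,
# e.g. a support vector machine with a precomputed kernel.
```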
23:56 There are quantum convolutional neural
23:57 networks, and there's actually one quantum
23:59 generative model.
24:02 Okay, cool. Then you have to decide on a
24:03 task. I'll also try to be quick here; this is
24:05 the hardest thing, I think; there's a
24:06 whole research program I want to build
24:08 around this. We use binary
24:10 classification, we optimize the accuracy,
24:12 and then we use four data sets. The first
24:13 one is supposed to be the vanilla
24:15 example: we just sample from a hypercube
24:18 and use a perceptron model to create a
24:19 linear decision boundary, and then we
24:21 label
24:23 data. Then there's the one that I really don't
24:25 like being used, but... I think the first
24:27 QML paper actually using it, we realized
24:29 when we looked on arXiv, was my own, which is
24:30 like the worst paper in the entire world; I'm
24:31 so happy it didn't make the selection,
24:33 because it didn't get cited. That's
24:35 MNIST, obviously. We can't do original
24:37 MNIST, we have to pre-process it somehow,
24:39 and this is a very simple problem, by the
24:42 way, because where you see MNIST results
24:44 they use multiclass classification; in
24:45 quantum machine learning we often use
24:47 binary classification, and this is a very
24:48 easy problem: you just get these two
24:50 blobs, and you have to find some very
24:52 benign hyperplane or decision boundary,
24:54 not very curvy, that goes through
24:56 them. And then, what I'm currently working
24:58 on a lot is to donate some more
25:00 realistic data sets that are based on
25:02 classical machine learning models of data.
25:05 data first result is that hyper
25:07 parameters matter and this is a problem
25:09 I tell you now why let me tell you what
25:11 you look at look at the time oh no it's
25:14 actually good um what you look at is the
25:16 three families of models that we already
25:18 implemented; we also compare them to
25:20 classical models. From the names, if you
25:21 know scikit-learn, you'll realize that
25:23 we're using a framework that's a mixture
25:24 of JAX, PennyLane, and scikit-learn
25:27 to do the hyperparameter
25:28 optimization. scikit-learn, by the way, is a
25:30 super cool framework — I know it sounds a
25:31 bit like a beginner's framework, but it's
25:33 actually quite wicked what's in
25:37 there. MNIST was pre-processed
25:39 slightly differently for the three classes
25:41 of models, and what I show you now is
25:44 the range of accuracies you get, from the
25:46 worst to the best hyperparameter setting,
25:48 and the hyperparameter grid we use is
25:50 very, very small, because growing it
25:51 really increases the
25:53 run times of your algorithms. So we
25:55 took, for example, the three most
25:57 important hyperparameters of a model, we
25:58 used a grid of two or three
26:00 points each, and then we computed every
26:02 combination of those and ran the
26:03 model. And so, depending on what
26:05 hyperparameters you have, you get really
26:07 bad performance or reasonable
26:09 performance. That's the first insight into why
26:10 this is suddenly such a problem: you
26:12 no longer have a benchmark with a couple
26:14 of experiments to run — for every model
26:15 you have to run hundreds of thousands of
26:17 experiments, which makes for a big headache, I
26:19 promise. You can see that we only ran our
26:20 results so far — this is why they're
26:23 preliminary — up to 8 qubits. Joseph, who's
26:25 leading the study, is a very
26:28 well-spoken British gentleman, but he said if
26:29 he ever reads a benchmark study up to 8
26:31 qubits only these days, he would vomit
26:33 onto the paper. So at the moment we're
26:35 doing the work of going higher, and this —
26:36 I mean, it's really opening my
26:38 eyes how hard this actually is for these
26:40 kinds of things. Here are the results of the
26:42 best
26:45 models. Now, I could start to try to
26:46 interpret them, but we're actually really
26:48 still playing with
26:49 interpretations, but I want to
26:52 pick out a few items here to show you
26:54 how hard it is to interpret. At
26:56 first sight you would say: okay, the one model
26:57 class that's actually performing really,
26:59 really well — as good as the classical
27:01 model — is the quantum
27:03 kernels. But now, I told you that I use
27:05 different pre-processing, so you should
27:06 be suspicious: in this class here I
27:08 don't use MNIST with its 60,000 data
27:11 points, I use 250 points subsampled from this
27:13 data, so this is a much easier problem
27:15 than the one on the left. But maybe
27:17 kernel methods are still good, because a lot of
27:19 the work is done on a classical computer, so
27:22 maybe that's why. By the way, these total
27:24 drops in some of the models are also not entirely
27:26 clear to me yet. I think
27:27 this model here is one of the ones I
27:29 was involved in, and I think it's really
27:30 just struggling — I tried to play
27:32 around with it a bit and couldn't get it
27:34 better; there could be convergence issues
27:36 coming up here, so this is something we
27:39 still have to study. Now, the second thing —
27:41 maybe just one second, I'll get
27:42 to you in a second — the second thing that
27:44 someone would immediately comment
27:46 on: maybe if I were a first-year PhD
27:49 student, in this horrible
27:50 culture of having to push out a
27:52 paper in four months' time because I'm doing
27:53 an internship somewhere, I'd probably
27:55 now say: oh, there's a quantum model
27:56 that's better than the classical model,
27:58 and it's the dressed quantum circuit
27:59 classifier. Guess what — the dressed
28:01 quantum circuit classifier, from a paper by
28:04 Andrea Mari, uses a neural network,
28:05 then a quantum circuit, and then another
28:08 neural network. So what's really good? Oh,
28:10 it's a neural network. Now, if you
28:12 consider this, you could say: ah, but
28:13 the quantum circuit is making it
28:15 slightly better — and I've read a lot of
28:16 papers where there's a small line above,
28:17 and it's like: the quantum model is
28:19 better, let's put the result out,
28:23 let's not even try to interpret it. But
28:24 there's also a "but", and it's a bit more
28:26 complicated: you see, the best
28:29 neural network in cross-validation is
28:30 actually much better than the quantum
28:32 models, but for some reason, if we use the
28:34 test set — and I have no clue how this is
28:36 even possible — the neural network shows
28:38 some strange overfitting: it gets
28:40 worse. And I think that if we find
28:41 out what's happening here, we can
28:43 actually push this
28:46 up. Okay, so far so good — and now here's
28:48 the big bomb I want to drop; at least to
28:49 me it was quite a
28:52 surprise. When you benchmark, you
28:54 should try to break a model, so that's
28:56 what we tried to do. We took
28:58 these models that I'm talking about — they
29:00 have a lot going on, it's a big
29:02 pipeline, all of it: a different loss
29:04 function, different pre-processing —
29:05 there are a lot of decisions that come
29:07 from these papers. But what we did was
29:08 take the quantum part and replace
29:10 it — however interesting the
29:12 feature map is, or whatever is happening —
29:14 by a separable circuit. So we encode our
29:16 data into Pauli rotations that are not
29:18 entangled, we do our variational part with
29:20 rotations that are not entangled, we
29:22 measure something that's not entangled —
29:24 and these are the
29:28 results. And they're almost the same.
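To make the separable-circuit replacement concrete: with unentangled encoding rotations, unentangled variational rotations, and an unentangled measurement, the whole expectation value factorizes qubit by qubit. A toy numpy sketch (RY rotations and a Z measurement are my illustrative choices, not the study's exact gate set), cross-checked against a full state-vector simulation:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation (real-valued, which keeps the sketch simple)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])
ket0 = np.array([1.0, 0.0])

def separable_expectation(encode_angles, var_angles):
    """One encoding rotation + one variational rotation per qubit, no
    entangling gates, measuring Z everywhere: everything factorizes."""
    total = 1.0
    for a, b in zip(encode_angles, var_angles):
        psi = ry(b) @ ry(a) @ ket0   # this qubit evolves on its own
        total *= psi @ Z @ psi       # <Z> on this qubit
    return total

def full_expectation(encode_angles, var_angles):
    """The same circuit on the full 2**n state vector, as a cross-check."""
    U, O = np.array([[1.0]]), np.array([[1.0]])
    for a, b in zip(encode_angles, var_angles):
        U = np.kron(U, ry(b) @ ry(a))
        O = np.kron(O, Z)
    psi = U[:, 0]                    # circuit applied to |0...0>
    return psi @ O @ psi
```

Both functions agree, but the separable version costs O(n) instead of O(2^n) — which is why a fully separable model matching the original models' accuracy is such an uncomfortable result.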
29:29 And this is something I wouldn't show if
29:31 we hadn't consistently seen it over the
29:33 last months; it's possible that one or
29:34 two of the models still go up and down.
29:36 There's something super strange here: the
29:38 circuit-centric model gets so much
29:40 better, and I have a feeling it's because
29:42 that model introduces a classical bias
29:44 term that it adds — I think that is actually
29:46 what's happening here. Anyway, this model
29:50 here, which is the feature map
29:52 where you encode into an IQP circuit, is
29:54 the only one, I think, where the separable
29:56 model gets worse. What's happening here
29:58 is that the feature map doesn't only
30:00 take x1, x2, x3 and encode them; it also
30:03 encodes x1*x2, x1*x3, and so on and so
30:05 forth. So it builds higher-order features
30:07 classically and then encodes them — so
30:08 there's classical pre-processing
30:10 going on, and when we take the separable
30:12 model we switch that off. So maybe that's
30:15 actually what's happening here. Okay, cool.
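The classical pre-processing attributed to the IQP-style feature map can be mimicked in a couple of lines — a sketch, with the exact feature set (first-order terms plus all pairwise products) as an illustrative choice:

```python
import numpy as np
from itertools import combinations

def iqp_style_features(x):
    """First-order features x_i plus all pairwise products x_i * x_j — the
    kind of higher-order feature construction an IQP-style embedding
    performs classically, before anything is encoded into a circuit."""
    x = np.asarray(x, dtype=float)
    pairs = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate([x, pairs])

iqp_style_features([1.0, 2.0, 3.0])
# -> array([1., 2., 3., 2., 3., 6.]) : x1, x2, x3, x1*x2, x1*x3, x2*x3
```

Turning this expansion off — as the separable replacement effectively does — removes exactly the higher-order terms, which would explain why this is the one model class where the separable variant gets worse.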
30:17 So that's kind of where it stands. I've been
30:18 interacting with this data on a
30:20 daily basis for a long time, and I've
30:22 started having this feeling: if you
30:23 have ever played around with the neural
30:25 network playground from TensorFlow, you
30:27 realize that you can not only put in the
30:28 raw features, you can also build
30:29 these polynomial features and
30:31 trigonometric features. And it turns out
30:32 that all the models are doing this: they
30:35 build low-order trigonometric features
30:37 or polynomial features, and this
30:39 increases the performance capacity — but I
30:42 do not know if this works at larger
30:45 scales. One second — I totally —
30:47 wait, where was the question? I'm so sorry.
30:49 Yeah, I was just wondering: did you
30:53 run these models on actual hardware? No,
30:56 no, no, oh my — completely simulated, and
30:57 it's already the biggest headache. Just
30:59 to give you an idea — this is why I
31:01 find this research so interesting,
31:02 because part of my passion is also
31:04 software, right? — if you want
31:05 to push this: for example, what we use
31:07 at the moment is just-in-time
31:08 compilation with JAX, which makes things
31:10 really fast, but then you get a problem:
31:11 at some stage you can't compile anymore
31:14 because things get too big. I also have a
31:15 memory leak that I noticed on the plane,
31:17 anyway. And then we have to rewrite our
31:19 entire code to use backends of PennyLane
31:20 that use GPUs or high-performance
31:22 computing, just to get access to a
31:25 cluster. Oh my gosh. Yeah.
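The "things get too big" problem is easy to quantify: a state-vector simulator stores 2^n complex amplitudes, so memory doubles with every added qubit. A back-of-the-envelope sketch:

```python
def statevector_bytes(n_qubits):
    """A state vector of n qubits holds 2**n complex amplitudes;
    at complex128 precision each amplitude takes 16 bytes."""
    return 16 * 2 ** n_qubits

for n in (8, 20, 30):
    print(n, "qubits:", statevector_bytes(n) / 2 ** 30, "GiB")
# 8 qubits are tiny; around 30 qubits the state alone needs ~16 GiB,
# before any JIT-compiled gradients or batched hyperparameter runs
```

This is why "only 8 qubits" benchmarks are cheap on a laptop while going meaningfully higher forces GPUs or HPC clusters.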
31:28 Anyway, sorry. Yeah — just a quick
31:31 question: have you taken a look at the
31:33 performance
31:36 of these different models as a
31:37 function of the number of quantum and
31:40 classical parameters? Because they've got
31:42 different numbers of model parameters,
31:44 so it may not be the fairest
31:45 comparison. As far as I know it's
31:46 completely mixed, so there's no
31:48 correlation between the two — but these
31:50 will be plots that I'm working on
31:51 at the moment, to show you basically how
31:53 the parameters grow, because it's very
31:54 mixed, it's very different for each
31:56 model. Yeah — one of the things that I
31:57 found is that looking at
32:00 things like the AIC to distinguish between
32:01 different models that have different
32:03 numbers of parameters sometimes leads to very,
32:04 very
32:07 different results about
32:09 whether a quantum or a classical model
32:10 is preferred. Yes — but what I definitely
32:11 don't see is that more quantum
32:13 parameters or more qubits is better, or more
32:14 layers; even in the hyperparameter
32:15 optimization we hardly ever see that
32:17 more layers is
32:19 better. That's something I really find
32:22 strange, to be honest. Yes? Which
32:24 optimizer do you use, and do you have an
32:26 assumed noise model, or is it just noiseless?
32:27 Noiseless.
32:30 The optimizer — oh, good question. The way we
32:32 implemented it — and this, you know,
32:34 was like months of discussion of how we'd
32:36 do it — is literally to go into the paper
32:37 and pick out everything they suggest, and
32:40 do that even if we know it's maybe not
32:41 the best loss function — you know, someone
32:43 doesn't use cross-entropy, but we
32:45 actually implemented
32:47 that. And wherever this wasn't suggested,
32:49 we tried to make
32:51 reasonable choices. So really, for
32:52 most of these models, where they didn't
32:54 say something, we used
32:57 Adam. When you say no noise, do you also
32:59 mean no shot noise? With shot noise —
33:01 nothing, it's just perfectly simulated
33:02 expectation
33:05 values. Yeah, I've actually never touched
33:07 noise, because what I'm working on
33:12 already is really a lot to do.
33:15 Yeah, with the hyperparameters: did you
33:17 randomize the hyperparameters,
33:20 or how did you optimize them? So
33:21 we do a grid search, a complete grid
33:23 search — and actually, you could increase
33:24 the grid; then you would probably get
33:26 bigger ranges for the models. So this is
33:28 always a really difficult choice,
33:29 and sometimes you can't increase it: for
33:31 example, circuit-centric is not doing
33:32 very well
33:34 here, and I do believe it could do much
33:36 better if we increased the grid — but then
33:38 our laptops just die. And as I said,
33:40 this we can only do afterwards; some
33:42 models could still increase, and I
33:43 do believe the neural network could also
33:44 increase, because we have a very small
33:48 grid for that one as well. Yes — so for
33:52 these models, what kind of number
33:55 of variational parameters do you have — in
33:57 the thousands, or millions? Usually
34:00 around 50, but that depends: this is
34:01 also a hyperparameter, how many layers
34:03 you have in your circuit, and they change.
34:07 So here it is roughly around 50 — for
34:09 the classical and the quantum? — for the
34:11 quantum models; for the classical as
34:14 well, I think — yeah, as well — but that's
34:15 actually a good point, we need to
34:17 check. So there will be plots added
34:19 for this; it's actually quite hard, because for every
34:21 hyperparameter setting it's completely
34:22 different how many parameters, how many
34:23 qubits, and so on and so forth — so you
34:25 would really have to visualize this
34:27 well, and we're working on that. Just
34:30 because, obviously, I heard classical machine
34:33 learning models have a million parameters? Yeah — but
34:34 for example, for the support vector
34:35 machine the parameters are the same,
34:37 because you put it into a classical
34:39 model. Okay, cool, let me move on, because
34:41 I'm really also super excited about the
34:44 next topic, which is a complete change of
34:47 energy, mentally. This is now very deep
34:49 quantum computing — well, not very deep; I'm
34:52 trying to stay superficial,
34:54 because I wasn't formally trained in quantum
34:55 computing, so it's a topic that most of
34:58 you will know more about than me.
34:59 And this is: how can we design models
35:01 from first principles? Let's say our
35:02 benchmark study showed us that most
35:04 models are actually crap and we really
35:05 need better ones — how do we get better
35:09 ones if we don't have this golden rule
35:10 of how to build a good machine learning
35:12 model from classical machine
35:14 learning? For this, I think we have to go
35:17 back and ask: how did quantum neural
35:18 networks come about? I think they
35:21 came explicitly from taking two
35:23 assumptions: we interpret quantum
35:24 computing as something that has to do
35:26 with provable quantum advantage and with
35:29 these circuits, and we interpret machine
35:31 learning as the state of the art, which is
35:33 deep learning — you know, big models, gradient
35:35 descent. And of course then you get quantum neural
35:37 networks: they kind of inherit nicely the
35:39 speed-ups of quantum models because you
35:41 can make them expressive, they use gates
35:43 we can implement on our hardware, they
35:45 follow the blueprint of deep
35:47 learning. Let's take a different starting
35:49 point. What I want to do in these
35:52 last slides is really convince
35:54 you — well, not convince you, but kind
35:56 of open up this way of thinking, which
35:58 was a long process for me, because at any
35:59 stage of this research I'm always like:
36:01 okay, now let's train a variational circuit —
36:02 no, that's not what we're doing, let's
36:04 do something different — and I'm not sure
36:07 where this is leading yet. So this is
36:08 quantum computing interpreted as solving
36:10 highly structured problems with
36:12 interference — and I'll give an example
36:14 to give you an intuition of what I mean —
36:16 and machine learning taken as generalizing
36:18 from samples: no gradient descent, no
36:21 hardware, not even a trainable
36:22 parameter. Think of k-nearest neighbours: it
36:24 doesn't have trainable parameters, and it's
36:25 still machine
36:27 learning — it's actually quite a good
36:30 algorithm, if you've ever tried it. Yeah.
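k-nearest neighbours really is machine learning without a single trainable parameter — all the work happens at prediction time. A minimal numpy sketch with made-up toy data:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """k-nearest-neighbour classification: no trainable parameters, no
    gradient descent — just a majority vote among the k closest points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest_labels = y_train[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest_labels, return_counts=True)
    return values[np.argmax(counts)]

# Toy data: two clusters of two points each.
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
knn_predict(X_train, y_train, np.array([0.05, 0.0]))  # -> 0
```

Generalization here comes purely from the geometry of the stored samples — the same sense of "learning from samples" the talk wants to carry over to the quantum setting.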
36:32 And so — I said I was doing family building —
36:33 I read this book, I don't know if you
36:35 know it, I read this book to my son, and
36:37 it's so beautiful — "oh, the places you
36:39 might go". There I started to be convinced
36:41 that using these two things as starting
36:43 points will get us somewhere where we
36:45 start understanding something that will
36:47 lead to better model design — we're
36:49 only at the first steps of the way
36:52 yet. Okay, cool. So the example I want to
36:54 use to get these two points
36:56 across a bit better is period finding.
36:57 Let's say you've got a couple of
36:58 integers and you've got a function — let's
37:01 say it also maps to integers — and this
37:02 function is periodic; there are a couple
37:04 more requirements you need. And the
37:07 question is: find the period. This
37:09 sounds very simple, but it's actually an
37:11 example of a huge class of quantum
37:13 algorithms, the hidden subgroup problems,
37:14 and the most famous one is Shor's
37:16 algorithm, as many of you know. This is
37:17 where I'd say you guys probably all
37:19 learned this at university, where I
37:22 didn't — and I also only learned group
37:24 theory this year; my colleague asked me,
37:25 didn't you have a mathematical
37:27 education? But somehow I just jumped
37:30 over it at my university. Here and
37:32 there, you know, all these words that
37:33 sound so normal start becoming a larger
37:35 concept: the integers become a group,
37:38 for example — in this case Z_12 — the
37:40 function hides cosets of a subgroup (this
37:41 is why it's called the hidden
37:42 subgroup problem), and I try to find a
37:45 generator of the
37:47 subgroup. So this is
37:48 basically what I mean by structured
37:50 problems. Now, how do quantum
37:52 algorithms use interference for
37:53 structured problems? I'll just show you
37:54 a couple of pictures that you have to
37:56 keep in the back of your mind when you
37:59 think about this. What we do,
38:00 as a standard textbook
38:02 algorithm to solve this problem, is we
38:05 put our integers — our x values — into
38:06 superposition: we interpret them
38:09 as computational basis states. Then we
38:11 have this magic oracle that always comes
38:12 in, which knows all the function
38:15 values, and takes an ancilla and
38:16 writes the function value into the
38:17 ancilla
38:21 state. And by the way — one of the super
38:22 cool things about this approach is that I
38:24 started realizing it might kick us
38:26 out of the assumption that there's a
38:29 perfect oracle; it might be that we
38:30 only have samples of the oracle — I'll come
38:31 to that just
38:34 now um the next thing we do is we
38:36 measure the second register and what
38:37 does it do it only takes the X values
38:40 out that have the same value and it
38:41 actually doesn't matter what we measure
38:43 here so I'm just use you know we
38:46 measured one in this case and I'm almost
38:48 there what we do then is the magic thing
38:50 that we do if you look into Scott AR's
38:53 talks for example he often talks about F
38:55 being the one thing that is actually
38:56 like interesting especially in the Pap
38:58 that Nathan mentioned you apply a
39:00 Quantum fre transform and what you do is
39:01 you create a superp position with these
39:03 amplitudes that you see here and you see
39:06 it's super structured right and then the
39:07 magic happens of interference I just
39:09 like wrote different you know I
39:10 evaluated these values numerically that
39:13 you see this better in those Darkly
39:15 highlighted States what you get is
39:16 constructive interference because these
39:19 ones here are integer values of um you
39:21 know the exponential function so you get
39:23 like ones here by the way I forgot about
39:25 normalization here so constructive
39:27 interference and this one here in this
39:29 column you always have ones but the rest
39:32 of this the amplitudes will give you um
39:33 exactly minus one and so you get a
39:35 negative interference and by the way if
39:37 you know a group Theory then these are
39:40 obviously like irreducible
39:42 representations and so what you get is
39:43 actually also a very simple State and
39:45 this is what I mean by the solution
39:48 comes from interference why is this a
39:50 solution actually I was super surprised
39:52 that in Hidden subgroup problems uh how
39:54 you get the solution out of the final
39:55 state after the quantum for transform is
39:58 not so simple to to do but in this case
39:59 it's actually very simple because all of
40:00 these states have the property that's an
40:02 integer times 12 which is the size of
40:04 the group divided by the number you want
40:07 to get and you can get this out of a few
40:10 samples okay cool I'm almost ready uh
40:12 done with what I'm trying to
40:15 say here. And by the way — you're probably
40:16 also expecting a huge preliminary
40:17 result that I'll show you now, "this is
40:20 the new model class!" — but here it comes: the
40:21 preliminary result is literally a
40:23 question that we're asking in research
40:25 now. But I promise, this took us half a
40:26 year to get to this question, and sometimes
40:29 that is not the wrong way to do things.
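For concreteness, the period-finding interference pattern described above can be simulated classically at this toy size — Z_12 with period 3, where the function f is a stand-in for the oracle:

```python
import numpy as np

N, r = 12, 3          # the group Z_12 from the talk, hidden period r = 3
f = lambda x: x % r   # a toy periodic function standing in for the oracle

# Post-measurement state of the first register: after measuring the ancilla
# register and obtaining (say) 1, only the x with f(x) = 1 survive.
kept = [x for x in range(N) if f(x) == 1]
state = np.zeros(N, dtype=complex)
state[kept] = 1 / np.sqrt(len(kept))

# The quantum Fourier transform on Z_N is just the DFT matrix.
QFT = np.array([[np.exp(2j * np.pi * j * k / N) for k in range(N)]
                for j in range(N)]) / np.sqrt(N)
probs = np.abs(QFT @ state) ** 2

peaks = np.flatnonzero(probs > 1e-9)
# Constructive interference leaves only k = 0, 4, 8: every outcome is an
# integer multiple of N/r = 12/3, so a few samples reveal the period.
```

Everything off those three peaks is exactly cancelled by destructive interference — that cancellation is the "solution comes from interference" point.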
40:31 So: interpret quantum computing as
40:32 solving highly structured problems with
40:34 interference. Now let's put in machine
40:36 learning — and I think you know where this
40:38 is going — machine learning as generalizing
40:39 from samples. Let's say we don't have
40:41 an oracle, but
40:42 something that we started talking about
40:44 as a "data-rised" oracle. If you know any
40:46 papers or research directions that do
40:47 this already — I'm almost sure people have
40:49 thought about this from another
40:52 direction — then please let me know. So:
40:53 what happens if I only have a constant
40:56 number of actual examples of what the
40:58 oracle does? Let's go very quickly,
41:00 visually, through the algorithm. For
41:02 example, if I measure one now — so I
41:04 kept this state here — I won't have
41:06 all of the states, I will only have very
41:07 few of the states
41:10 left. What happens in the interference
41:11 pattern is that some columns will be
41:13 blacked out and won't exist, and what it
41:16 does to the rows of
41:18 constructive interference is the same
41:19 thing: they still
41:21 interfere constructively, but the
41:23 amplitude is linearly
41:25 smaller. The destructive interference,
41:28 though, gets destroyed. And we started off
41:30 thinking: maybe there's a certain
41:32 probability distribution of data that
41:34 doesn't destroy this — this was
41:36 where we started. To give
41:37 you an example, here I plot the
41:39 final distribution that you measure from
41:41 your hidden subgroup problem: if you
41:43 have a perfect oracle you get nice peaks,
41:45 and if you have only 10% of your oracle
41:46 you get these interference patterns — I
41:49 guess it gets worse and worse. I'm actually
41:50 finished now — let me tell you two questions
41:51 that we're currently investigating; they
41:53 are slightly different flavours
41:55 of this. The first is: can we
41:57 amplify the signal in Fourier sampling — this
41:59 is called Fourier sampling, basically sampling
42:01 from this distribution. These
42:03 peaks are still very structured, so is
42:04 there a way to still get out what we
42:07 want — to generalize from a
42:10 data-rised oracle,
42:12 basically? And the second question, which is
42:14 a bit different: can we learn to
42:17 reconstruct the oracle from data? So,
42:19 given only a couple of states, can we
42:20 recover the full structure and
42:22 then just run the HSP? And now — who in
42:23 the room doesn't immediately
42:25 think: oh, let's train a variational circuit?
42:28 But the point is that we
42:30 don't want to use arbitrary ansätze, so at
42:31 the moment we're really working on
42:33 trying to find a very clear way that
42:35 uses the inductive bias that this highly
42:36 structured problem gives us, to
42:38 really see how we can solve it.
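The degradation of the interference pattern under a partial ("data-rised") oracle can be seen in the same toy simulation — the particular subset of known oracle inputs below is an arbitrary illustrative choice:

```python
import numpy as np

N, r = 12, 3               # toy group Z_12 with hidden period r = 3
f = lambda x: x % r        # stand-in for the oracle

def fourier_probs(known_x):
    """Fourier sampling when only a subset of the oracle's inputs is known:
    keep the known x with f(x) = 1, apply the QFT (the DFT matrix on Z_N),
    and return the outcome distribution over the N basis states."""
    kept = [x for x in known_x if f(x) == 1]
    state = np.zeros(N, dtype=complex)
    state[kept] = 1 / np.sqrt(len(kept))
    QFT = np.array([[np.exp(2j * np.pi * j * k / N) for k in range(N)]
                    for j in range(N)]) / np.sqrt(N)
    return np.abs(QFT @ state) ** 2

full = fourier_probs(range(N))         # perfect oracle: sharp peaks at 0, 4, 8
partial = fourier_probs([1, 4, 2, 5])  # only four sampled inputs are known
# With the full oracle, destructive interference zeroes out every k outside
# {0, 4, 8}; with the partial oracle the off-peak amplitudes no longer cancel.
```

The constructive peaks survive with linearly smaller amplitude, while the destructive cancellation breaks — which is exactly the structure the two research questions above (amplifying the signal, or reconstructing the oracle) try to exploit.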