0:02 So let's get started, because this is
0:03 what I want to talk to you about: how we
0:06 use data analytics in soccer. It's
0:07 been a big boom over the last
0:10 four or five years, with data
0:11 analysis, performance analysis,
0:15 statistical analysis, and specifically in
0:17 soccer. I'm sure you've seen it on social media,
0:18 on Twitter, everybody; in blogs there are lots of
0:20 bloggers talking about how we use
0:22 statistics and how we use data analytics. But
0:23 I want to talk to you about how we use
0:29 it here at NYCFC. Basically, I'm
0:30 going to talk to you a little
0:33 bit about the history of statistics and
0:35 data and where we're at now, because
0:36 there are two types that we've used:
0:38 physical data and technical data.
0:41 I'm predominantly on the
0:43 technical side, but physical data is
0:45 where, as a sport, we feel like we've
0:46 mastered it. With physical data, I'm sure
0:49 you've heard of GPS: how far a player has
0:52 run, at what intensity a player runs,
0:54 how long he runs for, how
0:55 many accelerations, how many
0:56 decelerations, how many times he has
0:58 jumped, landed and hit the floor.
0:59 All this kind of physical data we feel
1:01 like we've mastered, and that's been
1:04 through the use of GPS and heart rate data.
1:08 I'll give you an example of how we
1:11 use it. If a player got
1:13 injured and had a muscular injury, a
1:15 hamstring injury, we could go back over
1:17 the period of time just before his
1:19 injury and look at what that
1:22 player actually did: how long did he run
1:25 for, and what kind of physical load did he put
1:28 on his body that may have contributed to
1:31 that injury occurring? We
1:34 can look at that time frame, maybe six,
1:35 seven, eight weeks before, and see what
1:37 that player did. Then, during our training
1:39 phases, we can go back and
1:41 track it: okay, we might be at a level where
1:43 this player could be at risk again. So we
1:46 can manage this player's specific training
1:48 load over a one-week, two-week or
1:50 three-week period. So from a physical
1:52 perspective, we'll never
1:54 eradicate the chance of injury, but we can
1:56 try to minimize it and help this
1:58 player progress through a specific
2:02 training phase in which he might otherwise
2:04 have a recurring injury, and try to prevent that.
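The injury review described above can be sketched in a few lines of Python. This is a minimal illustration, not the club's system: it assumes daily load values (for example, metres run from GPS) are already available as a list, and the six-week window and 90% threshold are hypothetical choices. It averages the weekly load over the weeks before a past injury and flags current weeks that reach that level.

```python
from statistics import mean

def weekly_loads(daily_loads, days_per_week=7):
    """Sum daily load values (e.g. metres run) into consecutive weekly totals."""
    return [sum(daily_loads[i:i + days_per_week])
            for i in range(0, len(daily_loads), days_per_week)]

def pre_injury_reference(weekly, injury_week, window=6):
    """Mean weekly load over the `window` weeks before the injury week."""
    start = max(0, injury_week - window)
    return mean(weekly[start:injury_week])

def flag_risk_weeks(current_weekly, reference, threshold=0.9):
    """Indices of current weeks whose load reaches `threshold` x the pre-injury level."""
    return [i for i, load in enumerate(current_weekly) if load >= threshold * reference]

# Hypothetical weekly metres for one player; the injury happened in week index 6.
history = [30000, 32000, 35000, 38000, 41000, 44000, 20000]
reference = pre_injury_reference(history, injury_week=6)
print(flag_risk_weeks([28000, 31000, 34000, 36000], reference))  # [2, 3]
```

The threshold is deliberately a parameter: how close to the pre-injury level counts as "at risk" is a judgment for the performance staff, not something the data decides.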
2:06 From the technical side of things,
2:08 we're now
2:11 getting to a stage where
2:13 we have access to lots of different data.
2:16 Has everybody heard of Opta Sports? Yeah?
2:18 Opta? No? So for those who don't know, Opta
2:20 are
2:22 a statistical company who basically
2:23 watch our games and code our
2:24 games, just from a statistical
2:27 perspective. They take information on
2:29 every single event on the field, so a
2:33 pass, a shot, a throw-in, a tackle; they
2:34 code everything, and they produce the
2:36 data: a quantitative
2:39 analysis of everything that
2:41 has happened on the field. So that is the
2:42 type of data that I use, and what we use
2:44 to produce our statistical analysis and
2:49 our data analysis as well.
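Event data of this kind is essentially a long list of coded actions. A minimal sketch, assuming a simplified, hypothetical record layout (minute, team, event type, success flag) rather than Opta's actual feed format, shows how counts such as passes, shots and pass accuracy fall out of it. The pass share is included only as a rough proxy; treat it as an illustration rather than an official possession figure.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Event:
    minute: int
    team: str
    kind: str             # e.g. "pass", "shot", "throw_in", "tackle"
    successful: bool = True

def match_summary(events):
    """Turn a list of coded events into simple per-team match statistics."""
    counts = Counter((e.team, e.kind) for e in events)
    completed = Counter(e.team for e in events if e.kind == "pass" and e.successful)
    total_passes = sum(n for (_, kind), n in counts.items() if kind == "pass")
    summary = {}
    for team in sorted({e.team for e in events}):
        passes = counts[(team, "pass")]
        summary[team] = {
            "passes": passes,
            "pass_accuracy": completed[team] / passes if passes else 0.0,
            "shots": counts[(team, "shot")],
            # Share of all passes: a rough stand-in for possession only.
            "pass_share": passes / total_passes if total_passes else 0.0,
        }
    return summary

events = [
    Event(1, "A", "pass"), Event(2, "A", "pass", successful=False),
    Event(3, "A", "pass"), Event(4, "A", "shot", successful=False),
    Event(5, "B", "pass"), Event(6, "B", "tackle"),
]
print(match_summary(events)["A"]["pass_share"])  # 0.75
```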
2:52 And so, for me, this is my definition of my role from a
2:54 data perspective: it's bringing
2:59 objectivity and predictability to what
3:00 I believe is one of the most fluid
3:02 sports in the world and one of the most
3:04 opinionated sports in the world. I'm
3:06 sure everybody in the room today has an
3:08 opinion on how NYCFC played last
3:10 game. Were we good, were we bad? Was there
3:12 a player who was
3:16 particularly strong or weak? Did we attack
3:17 well, did we defend well? Everyone's got
3:19 an opinion, and it's the same with fans,
3:21 with everybody in the press and the media,
3:24 with coaches, with analysts, within
3:26 families, people who support one team or
3:27 another team. Everyone's got an opinion.
3:30 So I see my role as important:
3:34 bringing objectivity to help
3:36 inform people's opinions, support people's
3:38 opinions, and see what has actually
3:40 happened from a different angle, from a
3:42 very objective angle. That's how I
3:44 see my role: what I can
3:46 produce and what I can bring from a data
3:48 perspective, and from a video perspective
3:54 as well. So, statistics and analytics:
3:57 I want to talk about the differences. Are
3:58 there any statisticians or
4:00 mathematicians, or anybody who
4:02 works in data analysis, in the room? Yeah?
4:04 Can you tell me what you do? Can you
4:05 share what you do?
4:38 Okay. So for me, from
4:40 a soccer perspective, we have spent a lot
4:42 of time looking at statistics and not as
4:44 much time looking at analytics. So you
4:48 may have seen on social media, at the end of
4:50 games, lots of stats: passes,
4:52 successful passes, how many
4:53 passes forward,
4:56 how many shots, how many goals. These
4:58 types of statistics. For me, we've not
5:00 looked at analytics, and this is what the
5:02 main part of my talk, my presentation,
5:03 will be about: the differences between
5:05 statistics and
5:10 analytics. So this is from one
5:14 of our games this season. In
5:16 this room, what do you think this data tells
5:18 us? Have a look at it and
5:20 form your own opinions: what
5:21 could have happened in this
5:25 game, what could have gone on? Sorry?
5:29 Very dominant? "Team B is very dominant,"
5:33 one quote. Would you agree with that? Yeah.
5:34 Is there anything else that you could
5:35 take from it, apart from the
5:37 respective dominance of the
5:42 teams? Say again? So you can't tell who won.
5:53 Who do you think won the game? Team B?
5:58 There's team A. Team A won the game, yeah.
5:59 So this is actually one of our games
6:02 this season, and my next question is:
6:07 which team do you think NYCFC is? Team B?
6:11 Team B. Have you told him?
6:13 That was going to be my next question: which
6:15 team is NYCFC, and, if you're really good,
6:17 which team is it that we played
6:19 against? And yes, Orlando City, when
6:20 they played at home, the first game of the
6:20 played on my home first game of the
6:23 season okay so this was out this was
6:25 some of the basic data statistical
6:28 analysis that we did and by this game
6:29 the first game of the season that we
6:31 played and it's an example just a real
6:33 basic statistical overview of the game
6:36 so well done I'm really impressed didn't
6:37 think anybody will get that a far as
6:38 being really clever when I picked that
6:47 game okay so my next question is do we
6:51 need to question this data previously we
6:53 don't set a squad a game just an obvious
6:54 question do you think we need to
6:57 question this data? I think we do, because
7:01 we've lost the game, but allegedly,
7:02 statistically, we've been really, really
7:04 dominant in every aspect, in
7:07 possession and out of possession:
7:10 possession nearly 70 percent, double the
7:11 passes, box entries nearly
7:13 double, shots, shots on target, shots
7:16 inside the box eight compared to this. So we
7:18 still lost the game. How did we lose the
7:20 game? Why did we lose the game? We're
7:23 going to question it. So I'm asking you your
7:27 opinion now: just looking at
7:29 this, from an analyst's perspective, do
7:31 you think looking at this is enough to
7:34 help tactically improve
7:36 the team, help inform and make decisions?
7:44 "Need more historical data." Definitely. Anything
7:53 else? No? Okay. So yeah, definitely,
7:55 we'd need more historical data, just to
7:57 look back and see where we are, like you said,
8:00 historically. But for me, just looking at a
8:03 very singular level, these figures, these
8:07 variables, just describe what has
8:10 happened. There's no underlying
8:13 theory, there's no underlying explanation;
8:14 we need to delve deeper into this. We can
8:16 use it as a good starting point
8:18 to see: okay, we've been dominant, but how?
8:20 We use it as a guide. There's
8:24 no real meaning behind
8:25 them.
8:27 And that's how I see statistics, and
8:30 that's why I feel that, as a sport, soccer has
8:34 kind of been set back a couple of
8:35 years, because we've been so focused on
8:37 statistics. They've been
8:38 lauded so much in the media, they've
8:40 been built up so much, and
8:41 really we needed to look at data
8:44 analytics, and we needed to use these
8:48 variables to help do this, like you
8:49 alluded to before: create predictability,
8:53 look into it and get some more insight
8:55 into what is actually going on. And
8:57 analytics, for me, is helping us do that.
8:59 We're scratching the surface
9:00 with what we're
9:02 doing at the minute, but
9:06 we're at a point,
9:09 using the data that we have,
9:10 where we can now
9:14 substantially say we can do this, because
9:16 from an analytical perspective we have
9:20 the systems in place where, for
9:22 example, every event that happens on the
9:24 field we can quantify, and we can
9:27 quantify it in relation to scoring and
9:28 conceding goals, because ultimately
9:29 that's what we want to do as a team. My
9:31 job is to try and help my team score
9:32 more goals than the opponent scores
9:36 against us, so we get three points. So
9:38 every event on the field, no matter if
9:40 it's Sean Johnson putting the ball
9:42 down for a goal kick and passing to the
9:44 right to Fred Brillant or Maxime Chanot or
9:45 R.J. Allen or whoever it might be,
9:49 has a quantitative value; you can put
9:51 a quantitative value on that. So what
9:54 might happen going forward? If
9:56 you move into the attacking third, it
9:57 might be Maxi Moralez beating a player
9:59 1v1 and sliding a through ball to David Villa
10:01 running into the box. That
10:03 will have a bigger value as
10:04 a pass
10:06 than the pass from Sean Johnson to
10:09 Fred. So we're now starting to be able
10:11 to quantify everything that we do,
10:13 and its effect and its influence on
10:15 scoring and the subsequent possession.
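One common way to put a number on every action, in the spirit of possession-value models such as expected threat (xT), is to give each zone of the pitch a value and score a pass by how much value it adds. The sketch below uses made-up zone values and coarse, hypothetical zone names purely to illustrate why a through ball into the box outscores a short pass out from the goalkeeper; it is not the model used at the club.

```python
# Hypothetical chance that a possession held in each zone ends in a goal.
# Illustrative numbers only; a real model would estimate these from event data.
ZONE_VALUE = {
    "own_box": 0.005,
    "own_half": 0.01,
    "middle_third": 0.02,
    "attacking_third": 0.05,
    "penalty_area": 0.15,
}

def pass_value(start_zone, end_zone, completed=True):
    """Value added by a pass: zone value gained, or possession value lost."""
    if not completed:
        return -ZONE_VALUE[start_zone]
    return ZONE_VALUE[end_zone] - ZONE_VALUE[start_zone]

goal_kick = pass_value("own_box", "own_half")                 # short pass out from the back
through_ball = pass_value("attacking_third", "penalty_area")  # through ball into the box
print(round(goal_kick, 3), round(through_ball, 3))  # 0.005 0.1
```

Scoring a failed pass as the loss of the starting zone's value is one simple design choice; richer models also credit what the opponent can do with the ball next.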
10:20 And this is the one value that we've
10:22 been looking at a lot this season. Has
10:23 anybody heard of this, expected goals?
10:25 You've heard of it? What's your...
10:47 You want to do the presentation? Sure,
10:49 but that's basically spot-on. So, expected
10:51 goals, what is it? He's hit the nail on
10:53 the head pretty much there. Expected
10:55 goals looks at shots: it looks at
10:57 Opta-coded shots, over a million,
10:59 nearly a million and a half shots in the
11:01 Opta database, and it looks at the
11:03 location on the field and the type of shot:
11:05 was it a header, was it a volley, was it
11:08 weak foot, right foot, left foot? How did
11:10 it originate? Was it from a cross, was it
11:13 from a free ball, was it from a counter, a
11:14 quick counter-attack, a slow
11:16 counter-attack, slow buildup play? It looks
11:18 at all these different variables and it
11:21 puts them together in a regression
11:23 analysis. So we look at the comparative
11:25 value of every variable, the comparative
11:26 relationships between all these
11:29 variables and their effect: how do they
11:30 relate, how do they compare, how do they
11:32 interact with each other? Through this
11:33 type of analysis we get an expected
11:37 goals value. So for every shot we
11:39 can assign a value: how likely is that
11:42 shot, or that attempt, to result in a
11:45 goal, or a goal conceded.
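The talk describes the model only as a regression over shot variables. A common choice for a yes/no outcome like "goal or no goal" is logistic regression, so the sketch below assumes that. It is a toy: two features (distance and a header flag), fifteen invented shots, and plain gradient descent instead of a statistics library, so the fitted weights mean nothing beyond the example.

```python
import math

# Toy training data: (distance to goal in metres, is_header, scored?).
# Invented for illustration; a real model uses far more shots and features.
SHOTS = [
    (6, 0, 1), (8, 0, 1), (7, 0, 0), (10, 0, 1), (12, 0, 0),
    (14, 0, 1), (15, 0, 0), (18, 0, 0), (20, 0, 0), (22, 0, 0),
    (25, 0, 0), (30, 0, 0), (6, 1, 1), (9, 1, 0), (11, 1, 0),
]

def features(distance_m, is_header):
    # Bias term, distance in tens of metres, header flag.
    return [1.0, distance_m / 10.0, float(is_header)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, distance_m, is_header):
    """Expected-goals value: modelled probability that the shot is scored."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(distance_m, is_header))))

def fit_logistic(rows, lr=0.5, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for distance_m, is_header, goal in rows:
            err = predict(w, distance_m, is_header) - goal
            for j, xj in enumerate(features(distance_m, is_header)):
                grad[j] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

weights = fit_logistic(SHOTS)
print(round(predict(weights, 8, 0), 2), round(predict(weights, 25, 0), 2))
```

Adding a variable the talk lists, such as how the shot originated, would just mean another column in `features` and another weight.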
11:47 But what doesn't it do? Obviously,
11:48 as analysts, we need to look at the
11:50 limitations as well as everything that
11:51 we can do, and we need to understand
11:53 these limitations. One limitation of
11:56 expected goals, and I'll give you an
11:58 example: if Jack Harrison goes down the
12:00 right-hand side and puts a cross in with
12:01 his right foot, although he's not really
12:04 likely to do it, and it goes right (I
12:05 saw you, Jack), it goes right across the
12:08 six-yard box, and there's nobody there in
12:09 front of goal to tap it into an empty
12:11 net because the goalkeeper's dived and
12:13 the ball's gone past him. If nobody's
12:15 there, that shot doesn't happen;
12:16 there's nobody there to take the shot. We
12:18 don't quantify and qualify
12:19 that with an expected goals value, because a
12:21 shot didn't happen. So that's one of the
12:23 limitations, but for me it's not one of
12:26 the major ones, as there's a
12:27 tactical reason why there wasn't a player
12:31 in that position to have that shot. Okay,
12:32 if that makes sense.
12:34 What else it doesn't account for, and
12:36 this is probably a major limitation for
12:38 me, is the teammates, defenders and
12:40 goalkeepers: their location, who's
12:41 in front of the player when they've
12:43 taken the shot. We're not looking at
12:44 players' locations and
12:46 positions. If I was to shoot and you were
12:47 to block it, we're not taking that into
12:49 account, which is probably a major
12:52 limitation. But given what we have got,
12:54 what we do, how reliable
12:55 and
12:57 how substantial expected goals has
12:59 been, we can't eradicate it, but we're
13:00 taking it into account when we're working
13:05 to make this a part of what we do. Okay.
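The missing piece described here, who is between the shooter and the goal, becomes a simple geometric feature once player locations are available. A minimal sketch, assuming a 105 x 68 m pitch with the attacked goal centred on x = 105 and posts 7.32 m apart: it counts opponents inside the triangle formed by the shot location and the two posts. The coordinate convention and the choice of feature are assumptions, not part of the model described in the talk.

```python
Point = tuple[float, float]

GOAL_X = 105.0            # assumed: attacking towards x = 105 on a 105 x 68 m pitch
LEFT_POST: Point = (GOAL_X, 34.0 - 3.66)
RIGHT_POST: Point = (GOAL_X, 34.0 + 3.66)

def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); its sign says which side of line o->a point b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies inside or on the edge of triangle abc."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def players_in_shooting_cone(shooter: Point, opponents: list[Point]) -> int:
    """Count opponents (defenders or goalkeeper) between the shot and the goal mouth."""
    return sum(in_triangle(p, shooter, LEFT_POST, RIGHT_POST) for p in opponents)

# A shot from 15 m out: one defender in the way, one wide, goalkeeper on his line.
print(players_in_shooting_cone((90.0, 34.0), [(95.0, 34.0), (95.0, 20.0), (104.0, 34.0)]))  # 2
```

A count like this could then sit alongside distance and shot type as one more input to an expected goals regression.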
13:06 So this is the expected goals chart. I can
13:08 tell you that in this blue zone here there's
13:12 a 33% chance of any shot going in; in the
13:16 yellow zone, 18%; in the red zone, 10%.
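The chart's zones can be turned into a lookup. The talk gives only the zone percentages, not their shapes, so this sketch assumes concentric circles around the centre of the goal with hypothetical radii of 6, 12 and 18 m on a 105 x 68 m pitch. Shots beyond the red zone get no figure because the chart does not supply one.

```python
import math

GOAL_CENTRE = (105.0, 34.0)   # assumed pitch coordinates, attacking towards x = 105

# (hypothetical outer radius in metres, zone, chance of scoring quoted for the zone)
ZONES = [
    (6.0, "blue", 0.33),
    (12.0, "yellow", 0.18),
    (18.0, "red", 0.10),
]

def zone_for_shot(x, y):
    """Return (zone name, scoring chance) for a shot taken at (x, y)."""
    distance = math.dist((x, y), GOAL_CENTRE)
    for radius, name, chance in ZONES:
        if distance <= radius:
            return name, chance
    return "outside", None    # the chart gives no value beyond the red zone

for x in (101.0, 95.0, 88.0, 80.0):
    print(x, zone_for_shot(x, 34.0))
```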
13:17 So what does that mean? Everything outside of
13:20 that: every free kick Andrea Pirlo has
13:21 scored in his career from outside here, you
13:22 probably should have passed.
13:26 Okay, I'm talking about a run of
13:29 play: it's generally going to be a wasted
13:32 possession, so you should look to
13:34 pass or find a further avenue to
13:35 create an opportunity. That's statistically,
13:38 over 1.5 million shots. Okay,
13:39 I'll let you form
13:41 your own opinions on that. I believe it's quite
13:43 solid, and I believe a lot of the chances,
13:45 although the emotion of the game
13:47 kind of takes over you in these
13:49 situations: was it the right
13:52 decision? And when we get to a stage,
13:54 which I'll talk about later, where we can
13:57 quantify or qualify location and
13:59 look at that, this, I think, will become
14:04 even more solid: expected goals. So
14:06 what do we get out of looking at
14:07 expected goals? What do we get from
14:11 this statistic? We get: did we deserve
14:12 to win the game? And I'll show you a
14:14 couple of graphs and a couple of tables
14:15 in a second.
14:17 How is our team performing in attack
14:20 and in defense? So how
14:21 are we performing when we score, and how are we
14:23 performing from a defensive
14:25 perspective? And this is a big one for me,
14:27 because we can do all the work we want
14:29 from a statistical point, but if we can't
14:31 impact practice, can't impact training,
14:33 can't impact the 11 players
14:34 and the 18 players in the squad that
14:35 take to the field every game day,
14:39 then what's the point? Okay, so we need
14:40 to be able to influence and create
14:42 something where this analysis works and
14:44 fits with how our coaches
14:47 work, and can affect our players as well.
14:51 I'll go back to this little graph here.
14:53 We decided that we can't really improve
14:55 performance directly just from looking at
14:57 these metrics. Okay, so from an expected
15:00 goals perspective, in this same game we
15:02 were expected to
15:05 have scored 1.1 goals, compared to Orlando
15:07 City's 0.91. So
15:08 taking all their chances,
15:11 their shots, everything we spoke about
15:16 here, they were only
15:19 expected to score one goal, and we
15:21 were expected to score one, although
15:22 we'd been dominant in every other
15:27 aspect of the game. Okay, why is that?
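One way to turn per-shot expected goals into an answer to "did we deserve to win?" is to treat each shot as an independent chance of scoring and work out the distribution of goals for each side. The sketch below assumes that independence (a simplification) and uses hypothetical shot values that add up to 1.1 and 0.91 expected goals; they are not the real shot-by-shot numbers from the match.

```python
def goal_distribution(shot_xg):
    """P(exactly k goals) for k = 0..n, treating shots as independent chances."""
    dist = [1.0]
    for p in shot_xg:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)      # this shot missed
            nxt[k + 1] += prob * p        # this shot scored
        dist = nxt
    return dist

def result_probabilities(ours, theirs):
    """Return (win, draw, loss) probabilities from each side's shot values."""
    a, b = goal_distribution(ours), goal_distribution(theirs)
    win = draw = loss = 0.0
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i > j:
                win += pa * pb
            elif i == j:
                draw += pa * pb
            else:
                loss += pa * pb
    return win, draw, loss

# Hypothetical shot values summing to 1.1 and 0.91 expected goals.
ours = [0.30, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05, 0.05]
theirs = [0.60, 0.20, 0.11]
win, draw, loss = result_probabilities(ours, theirs)
print(f"win {win:.2f}, draw {draw:.2f}, loss {loss:.2f}")
```

Because expected goals add up, the mean of each distribution equals that side's total xG; the win/draw/loss split adds information about how the chances were spread across shots.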
15:31 So this is a chance timeline: look at all
15:34 our attempts in blue and all of Orlando's
15:36 in yellow. This maps out, in relation
15:38 to the time of the game, where we've had
15:40 our chances and the value of each chance
15:42 that we have created, each shot we
15:44 have had: how likely it was expected to
15:48 be a goal. Okay, so you can see it's
15:52 taken us just over
15:54 fifty minutes to get to a similar level
15:56 to Orlando's chance: the chance they
15:58 created had a sixty percent chance of scoring.
16:04 Okay, and over the course of the game it
16:07 took us over eighty minutes. When you
16:09 look at how dominant we were, it still
16:10 took us eighty minutes to get to the
16:12 same level that Orlando's chances
16:13 were. Although we
16:15 controlled possession, and they had
16:16 the ball for
16:18 about half as long as we did,
16:19 it still took us 80 minutes to get
16:21 there. So this provoked further
16:24 and further questioning. We can take this
16:26 and use it as a good guide to the
16:27 periods where we've been quite dominant
16:29 but the chances we created haven't
16:34 been that high in quality, should we say.
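A chance timeline is a running total of expected goals against the match clock. The sketch below assumes shots arrive as (minute, xG) pairs and asks the question posed here: at what minute did one side's cumulative xG first reach a given level? The shot values are invented for illustration.

```python
from itertools import accumulate

def cumulative_xg(shots):
    """shots: list of (minute, xg). Returns [(minute, running total)] in time order."""
    ordered = sorted(shots)
    totals = accumulate(xg for _, xg in ordered)
    return [(minute, total) for (minute, _), total in zip(ordered, totals)]

def minute_reaching(shots, target):
    """First minute at which the running xG total reaches `target`, or None."""
    for minute, total in cumulative_xg(shots):
        if total >= target:
            return minute
    return None

# Hypothetical shots for two teams.
team_b = [(12, 0.05), (30, 0.10), (51, 0.30), (67, 0.20), (81, 0.40)]
team_a = [(20, 0.60), (70, 0.31)]
target = sum(xg for _, xg in team_a)
print(minute_reaching(team_b, target))  # 81
```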
16:37 So, this next one:
16:39 now we're looking at where our shots
16:41 were from, relative, if you remember,
16:43 to the map I showed you with the circles of
16:45 the blue, red and yellow zones, looking at
16:48 each actual chance. We've had all these inside
16:50 the box, these are the eight shots,
16:52 compared to Orlando's one chance, and
16:53 how good this chance was for them to
16:55 score. But as you can see, we've had one
16:57 shot inside the blue zone, where there's a
16:59 fifty percent chance of scoring; we've
17:00 had three shots inside the yellow zone,
17:02 which is a fifty-four percent chance of
17:04 scoring. So those attempts
17:06 combined probably give us that goal we
17:08 should have scored that game. So we
17:10 created those chances there, and
17:12 we'd think, okay, we can look at those
17:13 chances in a little bit more detail: how
17:15 did we create those chances, why did we
17:18 not score? Outside of that, we had four in
17:18 the red zone,
17:20 where there's a forty percent chance of scoring, but
17:21 obviously we need to look at them in a
17:24 little bit more detail as well.
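The zone arithmetic here is additive: expected goals for a group of shots is the sum of their individual chances, so three yellow-zone shots at 18% give 0.54 and four red-zone shots at 10% give 0.40. That sum is an expected number of goals rather than the probability of scoring at least once, which is lower. The sketch below computes both, using the chart's zone averages and treating shots as independent (an assumption).

```python
from math import prod

ZONE_CHANCE = {"blue": 0.33, "yellow": 0.18, "red": 0.10}   # figures from the zone chart

def zone_xg(shot_counts):
    """Expected goals per zone: number of shots x the zone's average chance."""
    return {zone: n * ZONE_CHANCE[zone] for zone, n in shot_counts.items()}

def chance_of_at_least_one(chances):
    """Probability that at least one of several independent shots goes in."""
    return 1 - prod(1 - p for p in chances)

print({zone: round(xg, 2) for zone, xg in zone_xg({"yellow": 3, "red": 4}).items()})  # {'yellow': 0.54, 'red': 0.4}
print(round(chance_of_at_least_one([0.18] * 3), 2))  # 0.45
```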
17:25 So now, looking at everything combined: our
17:27 expected goals, knowing and
17:29 understanding expected goals and how
17:31 we formulate and calculate it, looking at
17:33 the chance timeline and the types of
17:36 chances we've conceded there, can we
17:40 start to make a few tactical decisions
17:42 now? Can we start to inform the process?
17:47 Yes? No? What do you think? We're going
17:49 into a little bit more detail, and we can
17:50 quantify a little bit better, and we have
17:52 a better understanding of the types of
17:55 chances we've created in relation to our
17:57 dominance. Can we now start to formulate
18:01 that? Say that again, sorry?
18:13 Yeah, yes. So now we have a better
18:15 understanding, and we're
18:17 better educated in what we need to do,
18:18 where we want to go, how we want to do
18:20 that. Yeah, okay, definitely, a hundred
18:24 percent. But again, for me, tactically we
18:26 still can't really influence anything
18:27 from this. We have
18:29 a better, deeper understanding, and it's
18:30 pointing us in a direction,
18:32 a direction where we can
18:35 start reviewing specific footage.
18:37 So it's giving us a real understanding
18:39 of where our chances have come
18:41 from and the quality of those chances, and
18:43 pointing us in the direction of the areas,
18:45 relative to the timeline, where we need
18:47 to look and how we need to look. Okay, so we
18:49 go back and we review this, and then we
18:53 work on specific technical and
18:55 tactical content during training sessions.
18:56 So that was pointing us in that direction,
18:58 to use the video to then see: okay, so
19:01 this chance we created, which had a
19:03 thirty-three percent chance of being
19:05 converted, what kind of chance was it, and
19:07 how did we create that chance in the
19:09 context of that game? Then we can look
19:11 at that chance: were there ways that we
19:14 could create better chances, or how did
19:15 we create that chance? And then we look
19:17 at the other chances that maybe weren't
19:19 so good: why was that, why was
19:20 that? So it's provoking us into looking at
19:27 it from a different perspective.
19:28 So this is where we are now in relation to
19:31 expected goals. In expected goals in the
19:33 league, we have the highest
19:35 in MLS. Of all the chances we've
19:37 created, with the analysis that we've
19:40 done on all the chances during
19:42 the season, we've had a
19:44 lot... I'm not going to tell you much,
19:45 because I've given a lot of secrets
19:47 away, but we've looked at how we create
19:50 our chances, and now we're just above
19:52 Chicago in terms of goals we were
19:54 expected to score, which is four less than
19:55 what we have actually scored.
19:57 So we're outperforming what we were
19:59 predicted to do and what we're supposed to do
20:03 this season.
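The comparison being made is goals scored minus expected goals: a positive number means a team has scored more than its chances would predict. A minimal sketch with hypothetical season totals (not actual MLS figures):

```python
def goals_minus_xg(table):
    """table: {team: (goals, xg)}. Returns (team, goals - xg) pairs, biggest over-performer first."""
    diffs = {team: goals - xg for team, (goals, xg) in table.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)

# Hypothetical season totals.
season = {"Team X": (40, 36.0), "Team Y": (30, 33.5), "Team Z": (35, 35.0)}
for team, diff in goals_minus_xg(season):
    print(f"{team}: {diff:+.1f}")
```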
20:05 But using this data, pointing us in the right directions to guide us
20:08 and to inform the coaches, we've been
20:09 able to structure
20:11 training. We've been able to use it as a
20:13 guiding principle; we're able to use it
20:15 to guide our focus on
20:18 goalscoring and defending, on a
20:19 really basic level. And our
20:21 coaches take this information really,
20:23 really well, to come up with decisions
20:25 like this. But we're using their
20:27 expertise from a soccer-playing
20:28 perspective and a coaching perspective,
20:30 mixed with the data, to come up with an
20:32 objective standpoint on
20:34 where to focus our attention, because we
20:36 want to focus our attention on the plays
20:37 that are going to score us more goals and
20:40 stop us conceding. On a really obvious
20:42 level, we want to focus on that, and
20:44 the data is giving us that direction
20:47 to be able to focus, as well as taking on
20:48 board the subjective feelings of the
20:49 coaches from a tactical perspective. That's
20:51 really, really important, because they
20:52 have really good experience, great
20:53 experience in the game; they know the
20:55 game really, really well. Their intuition
20:57 is not something that we, as data analysts
20:58 working in a form of science,
21:00 can ignore. No way, because their
21:02 intuition, what they feel about the
21:03 game, how they see the game, is really,
21:04 really important. But we, from an
21:07 objective standpoint, need to use this to
21:10 help: okay, is this correct? If it's not,
21:13 how can we look at a way of saying, okay,
21:15 from the data's standpoint, this is
21:16 what we need to do, this is how we need
21:18 to do it? Then you come together. So
21:19 it's a really, really strong working
21:22 relationship that I have daily with
21:27 the coaches here at NYCFC.
21:30 So, just to finish: data analytics and what's next.
21:32 I alluded to it before, with the location of
21:34 players: being able to map, from a
21:36 technical standpoint, where they are on
21:38 the field and how their relationships
21:40 integrate with each other, how my
21:42 movement five yards forward could
21:43 influence your movement five yards
21:45 backwards, and how that, in the middle,
21:47 ends up creating a goal-scoring opportunity.
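With X-Y location data, the "my movement five yards forward, your movement five yards back" idea becomes measurable frame by frame. A minimal sketch, assuming each frame is a mapping from a hypothetical player ID to an (x, y) position in metres: it reports a player's displacement and how the distance between two players changed.

```python
import math

Frame = dict[str, tuple[float, float]]   # player id -> (x, y) in metres

def displacement(before: Frame, after: Frame, player: str) -> tuple[float, float]:
    """How far a player moved in x and y between two frames."""
    (x0, y0), (x1, y1) = before[player], after[player]
    return (x1 - x0, y1 - y0)

def gap_change(before: Frame, after: Frame, a: str, b: str) -> float:
    """Change in the distance between players a and b (negative means closer)."""
    return math.dist(after[a], after[b]) - math.dist(before[a], before[b])

# The attacker steps 5 m forward and the defender drops 5 m back: the gap holds.
t0: Frame = {"attacker": (50.0, 30.0), "defender": (60.0, 30.0)}
t1: Frame = {"attacker": (55.0, 30.0), "defender": (65.0, 30.0)}
print(displacement(t0, t1, "attacker"), gap_change(t0, t1, "attacker", "defender"))  # (5.0, 0.0) 0.0
```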
21:50 This is X-Y coordinate data; location
21:51 data is what we're calling it. There's a
21:53 company called TRACAB
21:56 trialing this type
21:58 of data in the Premier League, so we're on the cusp of being
22:00 able to really solidify everything that
22:02 we do from a data perspective and
22:04 have a little bit more predictive
22:06 analysis of what could actually happen,
22:08 to try and map out, from a really
22:10 objective standpoint, what is going on
22:12 from a numbers perspective. When we get
22:14 this X-Y data, it's going to be really,
22:15 really exciting for the sport, and you'll
22:17 see a massive boom and a massive change
22:19 in how data analytics is perceived in
22:21 the wider community. So we're
22:24 still looking for this, and
22:26 we've not started on it yet, but
22:28 as a group, CFG, from a research
22:30 perspective, we're leading the way in
22:32 Europe and in England on trying to get
22:34 this and push it forward, trialing
22:35 different algorithms, trialing
22:37 different software companies, just to see
22:39 where we are at with this location
22:41 data. Because once we get that from a
22:43 technical perspective, comparing the
22:44 interactions of every single player on
22:47 the field (imagine an Excel
22:49 data sheet with millions and millions
22:50 and millions of rows), being able to go
22:51 through that on how each player's
22:54 movement has affected the next player's,
22:56 then we really start to see the game
22:57 from a tactical perspective, and it
22:59 really starts to relate to the numbers.
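The "millions of rows" picture maps naturally onto a long table with one row per player per frame. The sketch below assumes hypothetical rows of (frame, player, x, y) and measures how closely one player's forward/backward movement is followed by another player's movement a frame later, using a Pearson correlation of their step sizes. It is an illustrative measure only: correlation shows that movements go together, not that one caused the other.

```python
import math
from collections import defaultdict

def x_steps(rows):
    """rows: iterable of (frame, player, x, y). Returns {player: {frame: change in x since previous frame}}."""
    last_x = {}
    steps = defaultdict(dict)
    for frame, player, x, _y in sorted(rows):
        if player in last_x:
            steps[player][frame] = x - last_x[player]
        last_x[player] = x
    return steps

def pearson(xs, ys):
    """Pearson correlation; both series must vary, otherwise this divides by zero."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def lagged_follow(rows, leader, follower, lag=1):
    """Correlate the leader's x-steps with the follower's x-steps `lag` frames later."""
    steps = x_steps(rows)
    frames = [f for f in steps[leader] if f + lag in steps[follower]]
    return pearson([steps[leader][f] for f in frames],
                   [steps[follower][f + lag] for f in frames])

# Hypothetical rows: player "b" repeats player "a"'s step one frame later.
rows = [
    (0, "a", 0.0, 30.0), (1, "a", 1.0, 30.0), (2, "a", 3.0, 30.0), (3, "a", 3.0, 30.0), (4, "a", 6.0, 30.0),
    (0, "b", 10.0, 35.0), (1, "b", 10.0, 35.0), (2, "b", 11.0, 35.0), (3, "b", 13.0, 35.0), (4, "b", 13.0, 35.0),
]
print(round(lagged_follow(rows, "a", "b"), 3))  # 1.0
```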