YouTube Transcript: Machine Learning for Security and Security for Machine Learning with Nicole Nichols - TWiML Talk...
Stripe's machine learning infrastructure has evolved from supporting critical production use cases like fraud detection to providing a flexible and scalable platform for model training and inference, leveraging Kubernetes and a robust feature framework.
All right everyone, I'm on the line with Kelley Rivoire. Kelley is an engineering manager at Stripe working on machine learning infrastructure. Kelley, welcome to This Week in Machine Learning and AI.
Thanks for having me, I'm really excited to chat.
Same here, same here. So we got in touch with you, kind of occasioned by a talk you're giving at Strata, which is actually happening as we speak. I'm not physically in SF for it at this time, but your talk, which is going to be later today, is on scaling model training, from flexible training APIs to resource management with Kubernetes. And of course machine learning infrastructure and AI platforms is a very popular topic here on the podcast, so I'm looking forward to digging into the way Stripe is platforming its machine learning processes and operations. But before we do that, I'd love to hear a little bit about your background and how you got started working in this space.
Yes, great. Maybe I'll say a little bit about what I do now and then kind of work backwards from that.
Awesome.
So right now I'm an engineering manager at Stripe and I work with our data infrastructure group, which is seven teams: kind of at the lowest level, things like our production databases or things like Elasticsearch clusters, and then kind of working up through batch and streaming platforms, core ETL data pipelines and libraries, and also machine learning infrastructure. I've been at Stripe for very close to six years now, from when the company was about 50 people, and have basically worked on a bunch of different things in sort of risk, data, and machine learning, both as an engineer
and engineering manager, and also initially more on kind of the application side, and then over time moving over to the infrastructure side. By training I am kind of a research scientist person: I studied physics and electrical engineering in school, did my PhD at Stanford working on nanophotonics, and then a short postdoc at HP Labs.
Nanophotonics, yeah. I think I had recently covered optics, which is not too far away, so maybe that gives me a little bit of an idea.
Okay, and then, yeah, I was at HP Labs for a year or so
working on sort of similar things and
also some 3d imaging and I guess I like
to call what I did although I don't know
that anyone else calls it that sort of
like full stack science where like you
have an idea and some theory or modeling
or simulation and then you use that to
design a device and then you actually go
in the cleanroom and like make the
device and then you actually go in the
optics lab and like you know shoot a
bunch of lasers out your device and
measure it and then you sort of like
process the data and compare it to your
theory and simulation and I was like I
found like kind of the two ends the most
like sort of the magical moment where
like you know the data that you
collected like matches what you thought
was gonna happen from your modeling and
I kind of decided that I wanted to do
more of that and a little less than like
fabrication or material science and I
was kind of sitting in Silicon Valley
and started looking around and like
stripe was super exciting in terms of
its mission like having interesting data
and just like having amazing people
awesome awesome stripe sounds really
interesting but shooting lasers at stuff
also sounds really really cool nice nice
And so maybe tell us a little bit about Stripe's machine learning journey from an infrastructure perspective. It sounds like you're doing a bunch of interesting things, from a training perspective, from a data management perspective, inference; but how
did it evolve?
Yeah, I think one thing that's interesting about machine learning at Stripe is that, at a lot of places you talk to, machine learning kind of started out as being for some kind of offline analytics, more like internal business questions, like maybe you're trying to calculate the long-term value of your users. We do stuff like that now, but our kind of core uses have always been very much on the production side. Our most business-critical and first machine learning use cases were things like scoring transactions in the charge flow to evaluate whether they're fraudulent or not, or doing
kind of like internal risk management of
like you know making sure our users are
you know selling things that we can
support from our Terms of Service or
that they're kind of like you know good
users that we want to support and so we
we started out from having kind of a lot
of these more like production
requirements and it needs to be this
fast and it needs to be this reliable
and I think our machine learning
platform kind of like evolved from that
side where you know initially we had
kind of like one machine learning team
and then even just having a couple of
applications we started seeing like oh
here are some commonalities like
everyone needs to be able to score
models or you know even like having some
notion of shared features could be
really valuable across just a couple of
applications and then as we split our
machine learning team one piece of that
became machine learning infrastructure
which we've developed since then and you
know it's really important for that team
to work both with the teams doing the
business applications which now include
a bunch of other things in our user
facing products like radar and billing
as well as internally and also you know
it's important for the machine learning
infrastructure to build on the rest of
your data infrastructure and really the
rest of all of your infrastructure and
we've worked really closely with like
our orchestration team on you know as
you said and chatting about my talk like
getting training to run on kubernetes
yeah that's maybe an interesting place
to start the you kind of alluded to the
the interfaces between machine learning
infrastructure as a team and you know
data infrastructure you know just
infrastructure how do they how do they
connect you know maybe even
organizationally and how do they tend to
work with them up with one another for
example you know in you know training on
kubernetes you know where is the line
between what the ml infrastructure team
is doing and you know what it's
requiring of some you know broader
technology infrastructure group yeah I
think the kubernetes case is really
interesting and it's one that's been
super successful for us so I guess maybe
like a year or two ago we'd initially
focused on the kind of scoring like
real-time inference part of models
because that's the hardest and we'd sort
of left people on their own it's like
well, you figure out how to train a model, and then, you know, if you manage to do that we'll help you score it, and we
realized that that wasn't like great
right so we started thinking you know
what can we do and at first we built
some CLI tools to kind of like wrap the
Python people were doing but then we
wanted to kind of do more so eventually
we built an API. And then a big hassle had been the resource management, and we just kind of wanted to abstract that all away. As it happened, at that time our orchestration team had gotten really interested in Kubernetes; I think they wrote a blog post maybe a year and a half ago. They had kind of just moved our first application onto Kubernetes, which was some of our cron jobs that we use in our financial infrastructure, and so we ended up
collaborating this was kind of like a
great next step of a second application
they could work on and you know we had
some details we had to work out we're
having to figure out like how do we
package up all of our Python code and to
you know some docker file we can deploy
and it was really useful to be able to
work with them on that but I think we
have found really good interfaces in
working with them, where, you know, we wrote a client for the Kubernetes API, but anytime we need help, or anytime there's management of the Kubernetes cluster, they take care of all of that. So it's kind of given us this flexibility where we can define different instance and resource types and swap them out really easily if we need CPUs or GPUs or we need to expand the cluster, but we as machine learning infrastructure kind of don't have to deal with managing Kubernetes or updating it; we have this amazing team of people who are totally focused on that for Stripe.
Mm-hmm, awesome, awesome. And then actually let's maybe stay on this topic for a moment. Your talk at Strata was focused on this area; what was kind of the flow of your talk, what were the main points that you're planning to go through with the audience there?
Yeah, great question. So we kind of think about this in two pieces, and maybe that's because that's how we actually did it. One piece was the resource management that I talked about, getting things to run on Kubernetes, and that was actually kind of the second piece for us. The first piece was figuring out sort of how the user should interact with things, and where we should give them flexibility and where we should constrain things. And so we ended up building what we call internally Railyard, which is a model training API, and there are sort of two pieces to it: there's what you put in the API request, and then there's what we call a workflow. The API request is a little bit more constrained: you have to say your metadata for who's training so we can track it, you have to tell us where your data is and how you're doing things like holdout, just kind of basic things that you'll always need to put in. Then we have this workflow piece where people can write kind of whatever Python they want, as long as they define a train method in it that will hand us back the fitted model. And we definitely have found that, while initially we were very focused on binary classifiers for things like fraud, people have done things like word embeddings, people doing time series forecasting, using things like scikit-learn, and actually have used fastText or Prophet. So this has worked pretty well in terms of providing enough flexibility that people can do things that we actually didn't anticipate originally, but it's constrained enough that we can run it and sort of track what's going on and, you know, give them what they need and be able to automate the things we need to automate.
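To make the workflow idea concrete, here is a minimal sketch, not Stripe's actual Railyard interface, of what such a Python workflow might look like: arbitrary code, as long as it defines a train method that hands back a fitted model plus evaluation data. The class name, method signature, and custom_params handling are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, List

import pandas as pd
from sklearn.ensemble import RandomForestClassifier


@dataclass
class TrainResult:
    model: Any               # the fitted estimator or pipeline
    eval_data: pd.DataFrame  # held-out rows handed back for evaluation


class FraudWorkflow:
    """Hypothetical workflow: the training service calls train() with the
    data and column names taken from the API request."""

    def train(self, df: pd.DataFrame, features: List[str], label: str,
              custom_params: dict) -> TrainResult:
        # Simple holdout split; the real request also describes holdout config.
        holdout = df.sample(frac=0.2, random_state=0)
        train_df = df.drop(holdout.index)

        model = RandomForestClassifier(
            n_estimators=custom_params.get("n_estimators", 200))
        model.fit(train_df[features], train_df[label])
        return TrainResult(model=model, eval_data=holdout)
```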
Okay, so the interface you're describing is this kind of Python and this train method... well, actually, maybe a question: do you think of your users as more the data science type of user or the machine learning engineer type of user, or is there a mix of those two types of backgrounds?
Yeah, it's a mix, which has
been really interesting and I think
coming back to what I said earlier like
because we initially focused on these
kind of critical production in these
cases we started out where the team's
users were really pretty much all
machine learning engineers and very
highly skilled machine learning
engineers like people who are excellent
programmers and you know they know stats
in ml and they're kind of like the
unicorns to hire and over time we've
been able to broaden that and I think
having things like you know this tooling
has made that possible like in our user
survey right after we first shipped even
just the kind of like API workflow piece
and we were actually just like running
it on some box as a sidecar process we
hadn't even done kubernetes yet but a
lot of the feedback we got was like oh
this new person started on my team and I
just like pointed them to the directory
where the workflows are and I like
didn't have to think about how to split
all these things out because like you
know you just kind of pointed me in the
right direction and I could point them
in the right direction so I think that
having having these kind of like common
ways of doing things has been a way to
broaden our user set and as our data
science team which is more internally
focused has grown they've been able to
kind of like start picking up
increasingly large pieces of what we
built for the ML engineers as well and
we've been like excited to see that and
work with them.
And so the interface then is kind of Python code; is the platform containerizing that code, or is the user expected to do it, or is it integrated into some kind of workflow, like they check it in and then it becomes available to the platform via check-in or a CI/CD type of process?
Yeah, so we still have the experimental flow where people can kind of try things out, but when you're ready to productionize your workflow, basically what you do is you get your code reviewed, you merge it, and we ended up using Google's subpar library, because it works really well with Bazel, which we use for a lot of our build tooling.
What are those two?
Yeah, so subpar is a Google library that helps us package Python code into a self-contained executable, both the source code and any dependencies, like if you're running PyTorch and you need some CUDA stuff. And it works kind of out of the box with Bazel, which is the open-source version of Google's build system, which we started to use at Stripe a few years ago and have extended since; it's really nice for speed, reproducibility, and working with multiple languages. So this is where our ML infra team worked with our orchestration team to figure out the details, to be able to package up all this Python code and have it so that, basically almost like a service deploy, you can have it turn into a Docker image that you can deploy to Amazon's ECR, and then Kubernetes will know how to pull that down and be able to run it. So the ML engineer or data scientist doesn't really have to think about any of that; it just kind of works as part of, you know, you get your PR merged and you deploy something if you need to change the workflow.
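As an illustration of the deployment pattern described, a packaged trainer image in ECR that Kubernetes pulls down and runs, here is a hedged sketch using the open-source Kubernetes Python client; the image URI, namespace, and arguments are made up, and this is not Stripe's actual tooling.

```python
from kubernetes import client, config

# Authenticate against the cluster (inside a cluster you would typically
# use config.load_incluster_config() instead).
config.load_kube_config()

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="train-fraud-model-42"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="trainer",
                        # Hypothetical image built from the packaged Python
                        # code and pushed to ECR.
                        image="123456789.dkr.ecr.us-west-2.amazonaws.com/ml-trainer:abc123",
                        args=["--workflow", "FraudWorkflow", "--job-id", "42"],
                        resources=client.V1ResourceRequirements(
                            # Swapping resource shapes (CPU, GPU, high memory)
                            # is just a matter of changing these requests.
                            requests={"cpu": "4", "memory": "16Gi"},
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ml-training", body=job)
```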
Okay, but earlier on in the process, when you're experimenting, the currency is, you know, some Python code. What kind of tooling have you built up around experiment management and automatically tracking various experiment parameters, or hyperparameters, hyperparameter optimization, that kind of thing? Are you doing all that, or is that all on the user to do?
Yeah, that's a really good
question. So one of the things that we added in our API for training, which we found really useful, is this custom params field, especially because, you know, we have some shared services to support this, like sort of a retraining service that can automate your training requests. And one of the things that people from the beginning used the custom params for was hyperparameter optimization. We are kind of working toward building that out as a first-class thing; we now have evaluation workflows that can be integrated with all of this as well, and that's the first step you need for hyperparameter optimization: if you want to do it as a service, what are you optimizing if you don't know what you're looking at? So that's something we hope to do over the next, you know, three to six months, to give that a little bit more first-class support.
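A hypothetical request body in the spirit of what's described: required metadata and data location, plus a free-form custom params blob that teams used for things like hyperparameter settings. Field names are illustrative, not Railyard's real schema.

```python
train_request = {
    "metadata": {"owner": "fraud-ml", "project": "charge-scoring"},
    "workflow": "FraudWorkflow",
    "data": {
        "source": "s3_parquet",
        "path": "s3://example-bucket/features/charges/2019-01/",
        "features": ["amount", "card_country", "charges_last_24h"],
        "label": "is_fraud",
        "holdout_fraction": 0.2,
    },
    # Free-form parameters the service just passes through to the workflow;
    # from early on, teams used this field for hyperparameter settings.
    "custom_params": {"n_estimators": 200, "max_depth": 8},
}
```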
And you mentioned this directory of workflows; can you elaborate on that a little bit?
Yeah, so
one of the nice things is, you know, when you're writing your workflow, if you put it in the right place then our Scala service, Railyard, will know where to find it. But one of the side benefits has also just been that there is one place where people's workflows are, and so that's been kind of a nice place for people to get started and see, you know, what models other people are using, or what pre-processing or other things they are doing, or what types of parameters, like estimator parameters, they are looking at changing; just to have that be a little bit more available to our internal users.
Mm-hmm. And the workflow element of this, is it graph-based, is it something like Airflow? How's that implemented?
Yeah, so in this case the workflow, I mean, it's just Python code: Railyard, our API, passes to it what your features are and what your labels are, and then your Python code returns, here is the fitted pipeline or model, and usually something like the evaluation data set that we can pass back. We have had users build interesting things on top of having a training API. Some of our users, actually the folks working on Radar, a fraud product, built out an auto-retraining service that we've since kind of taken over and generalized, where they schedule nightly retraining of tens and hundreds of models, and that's integrated so that, if the evaluation looks better, it can potentially automatically deploy them. We do also have people who have put training models via our service into Airflow DAGs if they have some slightly more complicated set of things they want to run, so we definitely see that as well.
Okay, and
you've mentioned Radar a couple of times; is that a product at Stripe or an internal project?
Yeah, it's a user-facing fraud product. It runs on all of our machine learning infrastructure, and for every charge that goes through Stripe, within usually 100 milliseconds or so, we've done a bunch of real-time feature generation and evaluated kind of all of the models that are appropriate. And in addition to the machine learning piece there's also a product piece, where users can get more visibility into what our ML has done, they can write their own rules and set block thresholds on them, and there's sort of a manual review functionality. So there are some more product pieces that are complementary to the underlying machine learning.
Okay, interesting. And so just
trying to complete the picture here
you've got these workflows, which are essentially Python, they expose a train entry point, and, you mentioned this directory of workflows, is that like a directory on a server somewhere with just .py files, or do you require that they be versioned, and are you kind of managing those versions?
Yeah, so that's just actually in a codebase, so the workflows live together in code as part of our training API. When you send us, here's my training request, which has, you know, here's my data, here's my metadata, this is the workflow I want you to run, we give you back a job ID, which you can then check the status of; you can check the result, and the result will have things in it like what the git SHA was, so that's something that we can track as well.
Got it. So you're submitting the job with a little bit about which workflow you're running?
Like, in the case where you're running on Kubernetes, you've merged your code to master, and then we kind of package up all this code and deploy the Docker image, and from there you can make requests to our service, which will run the job on Kubernetes. So at that point your code is, you know, whatever is on master for the workflow, plus whatever you've put in the request.
Got it.
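To illustrate the job lifecycle described, submit a request, get back a job ID, then poll for status and provenance such as the git SHA, here is a small sketch against hypothetical REST endpoints; the URLs and field names are assumptions, not the real service.

```python
import time

import requests

# Hypothetical internal endpoints; the real service and routes aren't public.
BASE = "https://railyard.internal.example"

resp = requests.post(
    f"{BASE}/v1/jobs",
    json={
        "metadata": {"owner": "fraud-ml"},
        "workflow": "FraudWorkflow",
        "data": {"source": "s3_parquet",
                 "path": "s3://example-bucket/features/charges/2019-01/"},
    },
    timeout=10,
)
job_id = resp.json()["job_id"]

# Poll until the job finishes; the result carries provenance such as the
# git SHA of the workflow code that was packaged and run.
while True:
    status = requests.get(f"{BASE}/v1/jobs/{job_id}", timeout=10).json()
    if status["state"] in ("succeeded", "failed"):
        break
    time.sleep(30)

print(status["state"], status.get("git_sha"), status.get("model_id"))
```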
Okay, and so that's kind of the shape of the training infrastructure. You've mentioned a couple of times, it sounds like, there's some degree to which... actually, I'm not sure, maybe I'm inferring a lot here, but let's talk about where the data comes from for training and what kind of platform support you're offering folks.
Yeah, that's a really interesting question. Kind of within the framework of what you need for a Railyard API request, we support two different types of data sources. One is more for experimentation, which is, you can kind of tell us the SQL to query the data warehouse, and that's nice for experimentation but not so nice for production. What pretty much everyone uses for production is the other data source we support, which is Parquet from S3. So you tell us where to find that and what your feature names are, and usually that's generated by our features framework, which we call Semblance; it's basically a DSL that gives you a lot of ways to write complex features, like have things like counters, be able to do things like joins, do a lot of transformations. And then our infrastructure team figures out how to run that code in batch if you are doing training, or there's a way to run it in real time, basically in kind of a consumer setup, but you only have to write your feature code once.
Okay, and is it the user that's only writing the feature code once, or are you going after kind of sharing features across the user base? To what extent are you seeing shared features?
Yeah, the user writes their code once, and also, I think having a framework, similar to the training workflows, where people can see what other people have done has been really powerful. So we do have people who are definitely sharing features across applications, and there's a little bit of a trade-off: it's a huge amount of leverage if you don't have to rewrite some complicated business logic, but you do have to manage a little bit of making sure that everything is versioned, that you're paying attention to not deprecating something someone else is using, and that you're not just changing a definition in place, that you are creating a new version every time you change something. So there's a little bit more management there, and hopefully over time we can improve our tooling around that, but I think, even since before we had a features framework, being able to share some of that stuff has been hugely valuable for us.
Mmm. And so what is the features framework; is that a set of APIs, or is that kind of a runtime? What exactly is it?
Yeah, there are kind of two pieces. One is basically sort of what you said, the API: what are the things we let users express. And one thing we tried to do there is actually constrain it a little bit, so you have to use events for everything, and we don't really let you express notions of time, so you kind of can't mess up that time machine of what the state of the features was at some time in the past when you want to be training your model; we take care of that for you. So that's one piece, and then we compile that into an AST, and we use that to essentially write a compiler to be able to run it on different backends, and then we can write tests and try to check at the framework level that things are going to be as close as possible to the same across those different backends. So a backend could be something for training, where you're going to materialize what the value of the features was at each point in time in the past that you want as inputs to training your model, or another backend could be, as I mentioned, this consumer-based backend that we use, for example, for Radar, to be able to evaluate these features as a charge is happening.
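A toy illustration of the point-in-time, "time machine" guarantee described: when materializing a feature for training, only events that happened before the charge being scored are counted, so no future information leaks in. The column names and window are invented for the example.

```python
import pandas as pd

# Raw events archived over time (e.g. prior charges on a card).
events = pd.DataFrame({
    "card_id": ["c1", "c1", "c1", "c2"],
    "event_time": pd.to_datetime(
        ["2019-01-01 10:00", "2019-01-01 11:00",
         "2019-01-02 09:00", "2019-01-01 12:00"]),
})

# Charges we want to score, each with its own point in time.
charges = pd.DataFrame({
    "card_id": ["c1", "c2"],
    "charge_time": pd.to_datetime(["2019-01-01 12:00", "2019-01-01 11:00"]),
})


def charges_last_24h(charges: pd.DataFrame, events: pd.DataFrame) -> pd.Series:
    """Count prior events per card in the 24h window ending at charge_time,
    using only events strictly before the charge (no future leakage)."""
    counts = []
    for _, row in charges.iterrows():
        window = events[
            (events["card_id"] == row["card_id"])
            & (events["event_time"] < row["charge_time"])
            & (events["event_time"] >= row["charge_time"] - pd.Timedelta("24h"))
        ]
        counts.append(len(window))
    return pd.Series(counts, index=charges.index)


charges["card_charges_24h"] = charges_last_24h(charges, events)
print(charges)
```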
And so to what extent do you find that that limitation of everything being event-based gets in the way of what folks want to do?
Yeah, that's definitely a little bit of a paradigm shift for people, because they're like, oh, I just want to use this thing from the database, right? But we found that actually it's worked out pretty well, and especially when you have users who are ML engineers, they really do understand the value of why you want to have things event-based and the sort of gotchas that helps prevent. Because I think everyone has their story about how you were just looking something up in the database, but then the value changed and you didn't realize it, so you're leaking future information into your training data, and then your model is not going to do as well as you thought it did. So I think moving to a more event-based world, and, I mean, in general Stripe has also been doing more streaming work and having good support at the infrastructure level with Kafka, has been really helpful with that.
And so does that mean that the models they're building need to be aware of kind of this streaming paradigm during training, or where do they get a static data set to train on?
Yeah, so basically you can use our features framework to just generate Parquet in S3 that has materialized all the information you want, of what the value of each of the features that you want was at all the points in time that you want, and then your input to the training API is, please use this Parquet from S3. We could make it a little more seamless than that, but that works pretty well.
And Parquet is just like a serialized file format?
Yeah, it's pretty efficient; I think it's used in a lot of kind of big-data uses. You can also do things like predicate pushdown, and we have a way in the training API to specify some filters there, to just kind of save some effort.
Using predicate pushdown?
Yeah, so if you know you only need certain columns or something, you can load it a little bit more efficiently and not have to carry around a lot of extra data.
Got it.
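As a concrete example of column projection and predicate pushdown when reading training data from Parquet, here is a minimal sketch using pyarrow; the bucket, path, and column names are made up.

```python
import pyarrow.parquet as pq

# Only the listed columns are read, and row groups that cannot match the
# filter are skipped, so far less data is pulled from storage. Reading an
# s3:// URI directly assumes pyarrow's S3 filesystem support is available.
table = pq.read_table(
    "s3://example-bucket/features/charges/2019-01/",
    columns=["charge_id", "amount", "charges_last_24h", "is_fraud"],
    filters=[("card_country", "=", "US")],
)
df = table.to_pandas()
```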
Okay, the other interesting thing that you talked about in the context of this event-based framework is the whole "time machine", as you put it, kind of alluding to the point-in-time correctness of feature snapshots. Can you elaborate a little bit on that? Did you start there, or did you evolve to that? That seems to be, in my conversations, maybe one of the cutting edges or bleeding edges that people are trying to deal with as they scale up these data management systems for features.
Yeah, for this particular project, in this version, we started there. Stripe had previously looked at something a little bit related a couple of years before, and in a lot of ways we learned from that, so we ended up with something that was more powerful and sort of solved some of these issues at the platform level. You know, at that point we had been running machine learning applications in production for a few years, so I think everyone has their horror stories, right, of all the things that can go wrong, especially at the correctness level, and everyone has their story about reimplementing features in different languages, which we did for a while too, and all the things that can go wrong there. So I think we really tried to learn both from what we'd seen go well or go wrong in individual applications, and also from our previous attempts at this type of thing: what was good and what could still be better.
Mm-hmm. And out of curiosity, what do you use for a data warehouse, and are there multiple, or is there just one?
We've used a combination of Redshift and Presto over the past couple of years; they have a little bit of different abilities and strengths, and those are things that people like to use to experiment with machine learning, although we generally don't use them in our production flows, because we kind of prefer the event-based model.
So is the event-based model kind of parallel or orthogonal to Redshift or Presto, or is it a front-end to either of these two systems?
Yeah, I guess we actually have a front-end that we've built for Redshift and Presto, separately from machine learning, that's really nice and lets people, to the extent they have permissions to do so, explore tables or put annotations on tables, and we haven't integrated our... in general, I would say we could do some work on our UIs for ML stuff; we definitely focus more on the backend and infra and API side, although we do have some things like our auto-retraining service, which has a UI where you can see, what's the status of my job, did it finish, did it produce a model that was better than the previous model.
Mm-hmm. I think I'm just trying to
wrap my head around the event-based model here. As an example of a question that's coming to mind: in an event-based world, are you regenerating the features every time? And if you've got some complex feature that involves a lot of transformation, or you have to backfill a ton of data, what does that even mean in an event-based world, where I think of, like, you have events and they go away? Is the kind of store for all that in Redshift or Presto?
Well, you know, we're publishing something to Kafka and then we're archiving it to S3, and that persists as long as we want it to, in some cases basically forever, and so that is available. We do end up doing a decent amount of backfilling, of, you know, you define the transformed features you want but then you need to run that back over all the data you'll need for your training. That's something that we've actually done a lot of from the beginning, partly because of our applications: when you're looking at fraud, the way you find out if you were right or not is that in some time period, usually within 90 days but sometimes longer than that, the cardholder decides whether they're going to dispute something as fraudulent or not. That's compared to, you know, if you're doing ads or trying to get clicks, where you kind of get the result right away, right? So I think we've always been interested in being able to backfill, so you can log things forward, but then you'll probably have to wait a little bit of time before you have enough of a dataset that you can train on.
Okay, cool. So we talked about the
data side of things; we talked about training and experiments. How about inference?
Yes, that's a really great question, and that's kind of the first thing that we built infrastructure support for, a decent number of years ago, I think even before things like TensorFlow were really popular. So we have our own Scala service that we use to do our production real-time inference. And we started out, especially because we have mostly transactional data, we don't have a lot of things like images, at least in our most critical applications at this point, and a lot of our early models, and even still today most of our production models, are tree-based models: initially things like random forests and now things more like XGBoost. So we have the serialization for that built in to our training workflows, and we've optimized that to run pretty efficiently in our Scala inference service. And then we've built some nice layers on top of that for things like model composition, kind of what we call meta models, where you can take your machine learning model and, almost within the model, sort of compose something, like add a threshold to it. Or, like for Radar, we trained some array of, in some pieces, user-specific models along with maybe some more global models, and so you can incorporate, in the framework of a model, doing that dispatch, where if it matches these conditions, score with these models, otherwise score with this model, and here's how you combine it. And then the way that interfaces with your application is that each application has what we call a tag, and basically the tag points to the model identifier, which is immutable, and then whenever you have a new model and you're ready to ship, you just update what that tag points to, and then, you know, to put it in production you're saying, score the model for this tag. And that is pretty similar to, like, if you read about Michelangelo and things like that; sometimes we're like, we all came up with it.
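Here is a minimal sketch of the two ideas just described: an application-level tag that points at an immutable model identifier, and a meta model that dispatches between user-specific and global models. The names and structure are illustrative, not the real service.

```python
from typing import Callable, Dict

# Tag -> immutable model id; shipping a new model just repoints the tag.
TAGS: Dict[str, str] = {"radar-charge-scoring": "model_2019_03_14_a"}

# Model id -> scoring function (stand-ins for real fitted models).
MODELS: Dict[str, Callable[[dict], float]] = {
    "model_2019_03_14_a": lambda features: 0.12,   # global model
    "model_user_acct_42": lambda features: 0.91,   # user-specific model
}


def score(tag: str, features: dict) -> float:
    """Resolve the tag, then dispatch: use a user-specific model when one
    exists for this account, otherwise fall back to the global model
    behind the tag."""
    user_model_id = f"model_user_{features.get('account', '')}"
    model_id = user_model_id if user_model_id in MODELS else TAGS[tag]
    return MODELS[model_id](features)


# Falls back to the global model because acct_7 has no dedicated model.
print(score("radar-charge-scoring", {"account": "acct_7", "amount": 120}))
```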
It also sounds a little bit like... sorry, say that again?
Yeah, I think a lot of people kind of come up with some of these ways of doing things that just kind of make sense.
Mm-hmm. It also sounds a little bit like some of what Seldon is trying to capture in a Kubernetes environment, which I guess brings me to: is the inference running in Kubernetes, or is that a separate infrastructure?
It's not right
now, but I think that's mostly a matter of time and prioritization. The first thing we moved to Kubernetes was the training piece, because the workflow management piece was so powerful, or sorry, the resource management piece was so powerful, like being able to swap out CPU, GPU, high memory. We've moved some of the sort of real-time feature evaluation to Kubernetes, which has been really great and made it a lot less toil to deploy new feature versions. At some point we will probably also move the inference service to Kubernetes; we just haven't gotten there yet, because it is still some work to do that.
And is the inference happening on AWS as well, and are you using kind of standard CPU instances, or are you doing anything fancy there?
Yeah, so we run in the cloud for pretty much everything and definitely use a lot of AWS for the real-time inference of the most sensitive production use cases. We're definitely mostly using CPU, and we've done a lot of optimization work, so that has worked pretty well for us. I think we do have some folks who've experimented a little bit with hourly or batch scoring using some other things; that's something we're definitely thinking about as we have more people productionizing more complex types of models, where we might want something different.
You mentioned a lot of optimization that you've done; is that on a model-by-model basis, or are there platform things that you've done that help optimize across the various models that you're deploying for inference?
Yeah, it's
definitely a lot of things at the platform level. I think the first models that we ever scored in our inference service were serialized with YAML, and they were really huge and they caused a lot of garbage when we tried to load them, so we did some work there for tree-based models to be able to load things from disk to memory really quickly without producing much garbage. That kind of thing is something we did especially in the earlier days.
Okay. And what are you using for querying the models; are you doing REST or gRPC or something altogether different?
Yeah, we use REST right now. I think gRPC is something that we're interested in, but we haven't done it yet.
Okay, and is all of the inference done via REST in a kind of microservice style, or do you also do more, I guess, embedded types of inference where you have super low latency requirements? Does REST kind of meet the need across the application portfolio?
Yeah, even for the most critical applications, I think it has worked pretty well. One other thing our
orchestration team has done that's worked really well for us is migrating a lot of things to Envoy. We've seen some things where we didn't understand why there was some delay between what we measured for how long things took versus what it took to get to the user, and that just kind of went away as we moved to Envoy.
And what is Envoy?
Envoy is like a service-to-service networking mesh. It was developed by Lyft, and it's kind of an open-source library, and it handles a lot of things like service-to-service communication.
Okay, cool.
And so the inference environment, is it doing, absent of Kubernetes, all the things that you'd expect Kubernetes to do in terms of auto-scaling and load balancing across the different service instances, or is that stuff all done statically?
We take care of the routing ourselves, and we also at this point have kind of sharded our inference service, so not all models are stored on every host and we don't need hosts with infinite memory; that we take care of ourselves. The scaling is not fully automated at this point. We have kind of a quality-of-service setup, where we have multiple clusters of machines and we tier a little bit by how sensitive your application is and what you need from it, so that we can be a little bit more relaxed with people who are developing and want to test, and not have that potentially have any impact on more critical applications. But we haven't done totally automated scaling; that's something we still look at a little bit ourselves.
Awesome, awesome. So
if you were just starting down this journey, without having done all the things that you've done at Stripe, where do you think you would start if you're at an organization that's kind of increasingly invested in, or investing in, machine learning and needs to try to gain some efficiencies?
Yeah, I mean, I think if you're just starting out, it's good to think about what your requirements are, right? And if you're just trying to iterate quickly, do the simplest thing possible, right? So if you can do things in batch, great, do things in batch. I think there are a lot of both open-source libraries as well as managed solutions on all the different cloud providers, so if you're only one person, then I think those could make a lot of sense for people starting out. Because I think one of the interesting things with machine learning applications is that it takes a little bit of work; usually there's sort of this threshold of, your modeling has to be good enough for this to be a useful thing for you to do. For fraud detection, that's like, if we can't catch any fraud with our models then we probably shouldn't have a fraud detection product. So I think it is useful to have a quick iteration cycle to find out, is this a viable thing that you even want to pursue? And if you have an infrastructure team, they can help lower the bar for that, but I think there are other ways to do that, especially as there's been this Cambrian explosion in the ecosystem of different open-source platforms as well as different managed solutions.
Yeah. How do you think an organization knows when they should have an infrastructure team, an ML one in particular?
Yeah, I think that's a really interesting question. I guess in our case, the person who originally founded the machine learning infrastructure team had worked in this area before at Twitter, and kind of had a sense of, this is going to be a thing that we're really going to want to invest in, given how important it is for the business, and also that if you don't dedicate some folks to it, it's easy for them to get sucked up in other things, like if you just have data infrastructure that's undifferentiated. So I think it's a really interesting question. There probably is this business piece, right, of what are your ML applications, how critical are they to your business, and how difficult are your infrastructure requirements for them as well. I think a lot of companies develop their ML infrastructure starting out with things like making the notebook experience really great, because they want to support a lot of data scientists who are doing a lot of analysis, and so that's a little bit of a different arc from the one that we've been on, and I think that's actually a pretty business-dependent thing.
Okay, awesome, awesome. Well, Kelley, thanks so much for taking the time to chat with me about this really interesting story; I've enjoyed learning about it.
Cool, and thanks so much for chatting.