YouTube Transcript: Machine Learning for Security and Security for Machine Learning with Nicole Nichols - TWiML Talk...
Stripe's machine learning infrastructure has evolved from supporting critical production use cases like fraud detection to providing a flexible and scalable platform for model training and inference, leveraging Kubernetes and a robust feature framework.
All right everyone, I'm on the line with Kelley Rivoire. Kelley is an engineering manager at Stripe working on machine learning infrastructure. Kelley, welcome to This Week in Machine Learning and AI.
Thanks for having me, I'm really excited to chat.
Same here, same here. So we got in touch with you, kind of occasioned by a talk you're giving at Strata, which is actually happening as we speak. I'm not physically in SF for it at this time, but your talk, which is going to be later today, is on scaling model training, from flexible training APIs to resource management with Kubernetes. And of course machine learning infrastructure and AI platforms is a very popular topic here on the podcast, so I'm looking forward to digging into the way Stripe is platforming its machine learning processes and operations. But before we do that, I'd love to hear a little bit about your background and how you got started working in this space.
Yes, great. Maybe I'll say a little bit about what I do now and then kind of work backwards from that.
Awesome.
So right now I'm an engineering manager at Stripe and I work with our data infrastructure group, which is seven teams: kind of at the lowest level, things like our production databases or things like Elasticsearch clusters, and then kind of working up through batch and streaming platforms, core ETL data pipelines and libraries, and also machine learning infrastructure. I've been at Stripe for very close to six years now, from when the company was about 50 people, and have basically worked on a bunch of different things in sort of risk, data, and machine learning, both as an engineer
and engineering manager, and also initially more on kind of the application side, and then over time moving over to the infrastructure side. By training I am kind of a research scientist person: I studied physics and electrical engineering in school, did my PhD at Stanford working on nanophotonics, and then a short postdoc at HP Labs.
Nanophotonics, yeah. I think I had recently covered optics, which is not too far away, so maybe that gives me a little bit of an idea.
Okay, and then, yeah, I was at HP Labs for a year or so
working on sort of similar things and
also some 3d imaging and I guess I like
to call what I did although I don't know
that anyone else calls it that sort of
like full stack science where like you
have an idea and some theory or modeling
or simulation and then you use that to
design a device and then you actually go
in the cleanroom and like make the
device and then you actually go in the
optics lab and like you know shoot a
bunch of lasers out your device and
measure it and then you sort of like
process the data and compare it to your
theory and simulation and I was like I
found like kind of the two ends the most
like sort of the magical moment where
like you know the data that you
collected like matches what you thought
was gonna happen from your modeling and
I kind of decided that I wanted to do
more of that and a little less than like
fabrication or material science and I
was kind of sitting in Silicon Valley
and started looking around and like
stripe was super exciting in terms of
its mission like having interesting data
and just like having amazing people
awesome awesome stripe sounds really
interesting but shooting lasers at stuff
also sounds really really cool nice nice
And so maybe tell us a little bit about Stripe's machine learning journey from an infrastructure perspective. It sounds like you're doing a bunch of interesting things, from a training perspective, from a data management perspective, inference; but how
did it evolve?
Yeah, I think one thing that's interesting about machine learning at Stripe is that, at a lot of places you talk to, machine learning kind of started out as being for some kind of offline analytics, more like internal business questions, like maybe you're trying to calculate the long-term value of your users. We do stuff like that now, but our kind of core uses have always been very much on the production side. Our most business-critical and first machine learning use cases were things like scoring transactions in the charge flow to evaluate whether they're fraudulent or not, or doing
kind of like internal risk management of
like you know making sure our users are
you know selling things that we can
support from our Terms of Service or
that they're kind of like you know good
users that we want to support and so we
we started out from having kind of a lot
of these more like production
requirements and it needs to be this
fast and it needs to be this reliable
and I think our machine learning
platform kind of like evolved from that
side where you know initially we had
kind of like one machine learning team
and then even just having a couple of
applications we started seeing like oh
here are some commonalities like
everyone needs to be able to score
models or you know even like having some
notion of shared features could be
really valuable across just a couple of
applications and then as we split our
machine learning team one piece of that
became machine learning infrastructure
which we've developed since then and you
know it's really important for that team
to work both with the teams doing the
business applications which now include
a bunch of other things in our user
facing products like radar and billing
as well as internally and also you know
it's important for the machine learning
infrastructure to build on the rest of
your data infrastructure and really the
rest of all of your infrastructure and
we've worked really closely with like
our orchestration team on you know as
you said and chatting about my talk like
getting training to run on kubernetes
yeah that's maybe an interesting place
to start the you kind of alluded to the
the interfaces between machine learning
infrastructure as a team and you know
data infrastructure you know just
infrastructure how do they how do they
connect you know maybe even
organizationally and how do they tend to
work with them up with one another for
example you know in you know training on
kubernetes you know where is the line
between what the ml infrastructure team
is doing and you know what it's
requiring of some you know broader
technology infrastructure group yeah I
think the kubernetes case is really
interesting and it's one that's been
super successful for us so I guess maybe
like a year or two ago we'd initially
focused on the kind of scoring like
real-time inference part of models
because that's the hardest and we'd sort
of left people on their own it's like
well, you figure out how to train a model, and then, you know, if you manage to do that we'll help you score it, and we
realized that that wasn't like great
right so we started thinking you know
what can we do and at first we built
some CLI tools to kind of like wrap the
Python people were doing but then we
wanted to kind of do more so eventually
we built an API. And then a big hassle had been the resource management, and we just kind of wanted to abstract that all away. As it happened, at that time our orchestration team had gotten really interested in Kubernetes; I think they wrote a blog post maybe a year and a half ago. They had kind of just moved our first application onto Kubernetes, which was some of our cron jobs that we use in our financial infrastructure, and so we ended up
collaborating this was kind of like a
great next step of a second application
they could work on and you know we had
some details we had to work out we're
having to figure out like how do we
package up all of our Python code and to
you know some docker file we can deploy
and it was really useful to be able to
work with them on that but I think we
have found really good interfaces in
working with them, where, you know, we wrote a client for the Kubernetes API, but anytime we need help, or anytime there's management of the Kubernetes cluster, they take care of all of that. So it's kind of given us this flexibility where we can define different instance and resource types and swap them out really easily if we need CPUs or GPUs or we need to expand the cluster, but we as machine learning infrastructure kind of don't have to deal with managing Kubernetes or updating it; we have this amazing team of people who are totally focused on that for Stripe.
Mm-hmm, awesome, awesome. And then actually let's maybe stay on this topic for a moment. Your talk at Strata was focused on this area; what was kind of the flow of your talk, what were the main points that you're planning to go through with the audience there?
Yeah, great question. So we kind of think about this in two pieces, and maybe that's because that's how we actually did it. One piece was the resource management that I talked about, getting things to run on Kubernetes, and that was actually kind of the second piece for us. The first piece was figuring out sort of how the user should interact with things, and where we should give them flexibility and where we should constrain things. And so we ended up building what we call internally Railyard, which is a model training API, and there are sort of two pieces to it: there's what you put in the API request, and then there's what we call a workflow. The API request is a little bit more constrained: you have to say your metadata for who's training so we can track it, you have to tell us where your data is and how you're doing things like holdout, just kind of basic things that you'll always need to put in. Then we have this workflow piece where people can write kind of whatever Python they want, as long as they define a train method in it that will hand us back the fitted model. And we definitely have found that, while initially we were very focused on binary classifiers for things like fraud, people have done things like word embeddings, people doing time series forecasting, using things like scikit-learn, and actually have used fastText or Prophet. So this has worked pretty well in terms of providing enough flexibility that people can do things that we actually didn't anticipate originally, but it's constrained enough that we can run it and sort of track what's going on and, you know, give them what they need and be able to automate the things we need to automate.
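To make the workflow idea concrete, here is a minimal sketch, not Stripe's actual Railyard interface, of what such a Python workflow might look like: arbitrary code, as long as it defines a train method that hands back a fitted model plus evaluation data. The class name, method signature, and custom_params handling are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, List

import pandas as pd
from sklearn.ensemble import RandomForestClassifier


@dataclass
class TrainResult:
    model: Any               # the fitted estimator or pipeline
    eval_data: pd.DataFrame  # held-out rows handed back for evaluation


class FraudWorkflow:
    """Hypothetical workflow: the training service calls train() with the
    data and column names taken from the API request."""

    def train(self, df: pd.DataFrame, features: List[str], label: str,
              custom_params: dict) -> TrainResult:
        # Simple holdout split; the real request also describes holdout config.
        holdout = df.sample(frac=0.2, random_state=0)
        train_df = df.drop(holdout.index)

        model = RandomForestClassifier(
            n_estimators=custom_params.get("n_estimators", 200))
        model.fit(train_df[features], train_df[label])
        return TrainResult(model=model, eval_data=holdout)
```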
Okay, so the interface you're describing is this kind of Python and this train method... well, actually, maybe a question: do you think of your users as more the data science type of user or the machine learning engineer type of user, or is there a mix of those two types of backgrounds?
Yeah, it's a mix, which has
been really interesting and I think
coming back to what I said earlier like
because we initially focused on these
kind of critical production in these
cases we started out where the team's
users were really pretty much all
machine learning engineers and very
highly skilled machine learning
engineers like people who are excellent
programmers and you know they know stats
in ml and they're kind of like the
unicorns to hire and over time we've
been able to broaden that and I think
having things like you know this tooling
has made that possible like in our user
survey right after we first shipped even
just the kind of like API workflow piece
and we were actually just like running
it on some box as a sidecar process we
hadn't even done kubernetes yet but a
lot of the feedback we got was like oh
this new person started on my team and I
just like pointed them to the directory
where the workflows are and I like
didn't have to think about how to split
all these things out because like you
know you just kind of pointed me in the
right direction and I could point them
in the right direction so I think that
having having these kind of like common
ways of doing things has been a way to
broaden our user set and as our data
science team which is more internally
focused has grown they've been able to
kind of like start picking up
increasingly large pieces of what we
built for the ML engineers as well and
we've been like excited to see that and
work with them.
And so the interface then is kind of Python code; is the platform containerizing that code, or is the user expected to do it, or is it integrated into some kind of workflow, like they check it in and then it becomes available to the platform via check-in or a CI/CD type of process?
Yeah, so we still have the experimental flow where people can kind of try things out, but when you're ready to productionize your workflow, basically what you do is you get your code reviewed, you merge it, and we ended up using Google's subpar library, because it works really well with Bazel, which we use for a lot of our build tooling.
What are those two?
Yeah, so subpar is a Google library that helps us package Python code into a self-contained executable, both the source code and any dependencies, like if you're running PyTorch and you need some CUDA stuff. And it works kind of out of the box with Bazel, which is the open-source version of Google's build system, which we started to use at Stripe a few years ago and have extended since; it's really nice for speed, reproducibility, and working with multiple languages. So this is where our ML infra team worked with our orchestration team to figure out the details, to be able to package up all this Python code and have it so that, basically almost like a service deploy, you can have it turn into a Docker image that you can deploy to Amazon's ECR, and then Kubernetes will know how to pull that down and be able to run it. So the ML engineer or data scientist doesn't really have to think about any of that; it just kind of works as part of, you know, you get your PR merged and you deploy something if you need to change the workflow.
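As an illustration of the deployment pattern described, a packaged trainer image in ECR that Kubernetes pulls down and runs, here is a hedged sketch using the open-source Kubernetes Python client; the image URI, namespace, and arguments are made up, and this is not Stripe's actual tooling.

```python
from kubernetes import client, config

# Authenticate against the cluster (inside a cluster you would typically
# use config.load_incluster_config() instead).
config.load_kube_config()

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="train-fraud-model-42"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="trainer",
                        # Hypothetical image built from the packaged Python
                        # code and pushed to ECR.
                        image="123456789.dkr.ecr.us-west-2.amazonaws.com/ml-trainer:abc123",
                        args=["--workflow", "FraudWorkflow", "--job-id", "42"],
                        resources=client.V1ResourceRequirements(
                            # Swapping resource shapes (CPU, GPU, high memory)
                            # is just a matter of changing these requests.
                            requests={"cpu": "4", "memory": "16Gi"},
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ml-training", body=job)
```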
Okay, but earlier on in the process, when you're experimenting, the currency is, you know, some Python code. What kind of tooling have you built up around experiment management and automatically tracking various experiment parameters, or hyperparameters, hyperparameter optimization, that kind of thing? Are you doing all that, or is that all on the user to do?
Yeah, that's a really good
question. So one of the things that we added in our API for training, which we found really useful, is this custom params field, especially because, you know, we have some shared services to support this, like sort of a retraining service that can automate your training requests. And one of the things that people from the beginning used the custom params for was hyperparameter optimization. We are kind of working toward building that out as a first-class thing; we now have evaluation workflows that can be integrated with all of this as well, and that's the first step you need for hyperparameter optimization: if you want to do it as a service, what are you optimizing if you don't know what you're looking at? So that's something we hope to do over the next, you know, three to six months, to give that a little bit more first-class support.
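A hypothetical request body in the spirit of what's described: required metadata and data location, plus a free-form custom params blob that teams used for things like hyperparameter settings. Field names are illustrative, not Railyard's real schema.

```python
train_request = {
    "metadata": {"owner": "fraud-ml", "project": "charge-scoring"},
    "workflow": "FraudWorkflow",
    "data": {
        "source": "s3_parquet",
        "path": "s3://example-bucket/features/charges/2019-01/",
        "features": ["amount", "card_country", "charges_last_24h"],
        "label": "is_fraud",
        "holdout_fraction": 0.2,
    },
    # Free-form parameters the service just passes through to the workflow;
    # from early on, teams used this field for hyperparameter settings.
    "custom_params": {"n_estimators": 200, "max_depth": 8},
}
```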
And you mentioned this directory of workflows; can you elaborate on that a little bit?
Yeah, so
one of the nice things is, you know, when you're writing your workflow, if you put it in the right place then our Scala service, Railyard, will know where to find it. But one of the side benefits has also just been that there is one place where people's workflows are, and so that's been kind of a nice place for people to get started and see, you know, what models other people are using, or what pre-processing or other things they are doing, or what types of parameters, like estimator parameters, they are looking at changing; just to have that be a little bit more available to our internal users.
Mm-hmm. And the workflow element of this, is it graph-based, is it something like Airflow? How's that implemented?
Yeah, so in this case the workflow, I mean, it's just Python code: Railyard, our API, passes to it what your features are and what your labels are, and then your Python code returns, here is the fitted pipeline or model, and usually something like the evaluation data set that we can pass back. We have had users build interesting things on top of having a training API. Some of our users, actually the folks working on Radar, a fraud product, built out an auto-retraining service that we've since kind of taken over and generalized, where they schedule nightly retraining of tens and hundreds of models, and that's integrated so that, if the evaluation looks better, it can potentially automatically deploy them. We do also have people who have put training models via our service into Airflow DAGs if they have some slightly more complicated set of things they want to run, so we definitely see that as well.
Okay, and
you've mentioned Radar a couple of times; is that a product at Stripe or an internal project?
Yeah, it's a user-facing fraud product. It runs on all of our machine learning infrastructure, and for every charge that goes through Stripe, within usually 100 milliseconds or so, we've done a bunch of real-time feature generation and evaluated kind of all of the models that are appropriate. And in addition to the machine learning piece there's also a product piece, where users can get more visibility into what our ML has done, they can write their own rules and set block thresholds on them, and there's sort of a manual review functionality. So there are some more product pieces that are complementary to the underlying machine learning.
Okay, interesting. And so just
trying to complete the picture here
you've got these workflows, which are essentially Python, they expose a train entry point, and, you mentioned this directory of workflows, is that like a directory on a server somewhere with just .py files, or do you require that they be versioned, and are you kind of managing those versions?
Yeah, so that's just actually in a codebase, so the workflows live together in code as part of our training API. When you send us, here's my training request, which has, you know, here's my data, here's my metadata, this is the workflow I want you to run, we give you back a job ID, which you can then check the status of; you can check the result, and the result will have things in it like what the git SHA was, so that's something that we can track as well.
Got it. So you're submitting the job with a little bit about which workflow you're running?
Like, in the case where you're running on Kubernetes, you've merged your code to master, and then we kind of package up all this code and deploy the Docker image, and from there you can make requests to our service, which will run the job on Kubernetes. So at that point your code is, you know, whatever is on master for the workflow, plus whatever you've put in the request.
Got it.
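To illustrate the job lifecycle described, submit a request, get back a job ID, then poll for status and provenance such as the git SHA, here is a small sketch against hypothetical REST endpoints; the URLs and field names are assumptions, not the real service.

```python
import time

import requests

# Hypothetical internal endpoints; the real service and routes aren't public.
BASE = "https://railyard.internal.example"

resp = requests.post(
    f"{BASE}/v1/jobs",
    json={
        "metadata": {"owner": "fraud-ml"},
        "workflow": "FraudWorkflow",
        "data": {"source": "s3_parquet",
                 "path": "s3://example-bucket/features/charges/2019-01/"},
    },
    timeout=10,
)
job_id = resp.json()["job_id"]

# Poll until the job finishes; the result carries provenance such as the
# git SHA of the workflow code that was packaged and run.
while True:
    status = requests.get(f"{BASE}/v1/jobs/{job_id}", timeout=10).json()
    if status["state"] in ("succeeded", "failed"):
        break
    time.sleep(30)

print(status["state"], status.get("git_sha"), status.get("model_id"))
```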
Okay, and so that's kind of the shape of the training infrastructure. You've mentioned a couple of times, it sounds like, there's some degree to which... actually, I'm not sure, maybe I'm inferring a lot here, but let's talk about where the data comes from for training and what kind of platform support you're offering folks.
Yeah, that's a really interesting question. Kind of within the framework of what you need for a Railyard API request, we support two different types of data sources. One is more for experimentation, which is, you can kind of tell us the SQL to query the data warehouse, and that's nice for experimentation but not so nice for production. What pretty much everyone uses for production is the other data source we support, which is Parquet from S3. So you tell us where to find that and what your feature names are, and usually that's generated by our features framework, which we call Semblance; it's basically a DSL that gives you a lot of ways to write complex features, like have things like counters, be able to do things like joins, do a lot of transformations. And then our infrastructure team figures out how to run that code in batch if you are doing training, or there's a way to run it in real time, basically in kind of a consumer setup, but you only have to write your feature code once.
Okay, and is it the user that's only writing the feature code once, or are you going after kind of sharing features across the user base? To what extent are you seeing shared features?
Yeah, the user writes their code once, and also, I think having a framework, similar to the training workflows, where people can see what other people have done has been really powerful. So we do have people who are definitely sharing features across applications, and there's a little bit of a trade-off: it's a huge amount of leverage if you don't have to rewrite some complicated business logic, but you do have to manage a little bit of making sure that everything is versioned, that you're paying attention to not deprecating something someone else is using, and that you're not just changing a definition in place, that you are creating a new version every time you change something. So there's a little bit more management there, and hopefully over time we can improve our tooling around that, but I think, even since before we had a features framework, being able to share some of that stuff has been hugely valuable for us.
Mmm. And so what is the features framework; is that a set of APIs, or is that kind of a runtime? What exactly is it?
Yeah, there are kind of two pieces. One is basically sort of what you said, the API: what are the things we let users express. And one thing we tried to do there is actually constrain it a little bit, so you have to use events for everything, and we don't really let you express notions of time, so you kind of can't mess up that time machine of what the state of the features was at some time in the past when you want to be training your model; we take care of that for you. So that's one piece, and then we compile that into an AST, and we use that to essentially write a compiler to be able to run it on different backends, and then we can write tests and try to check at the framework level that things are going to be as close as possible to the same across those different backends. So a backend could be something for training, where you're going to materialize what the value of the features was at each point in time in the past that you want as inputs to training your model, or another backend could be, as I mentioned, this consumer-based backend that we use, for example, for Radar, to be able to evaluate these features as a charge is happening.
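A toy illustration of the point-in-time, "time machine" guarantee described: when materializing a feature for training, only events that happened before the charge being scored are counted, so no future information leaks in. The column names and window are invented for the example.

```python
import pandas as pd

# Raw events archived over time (e.g. prior charges on a card).
events = pd.DataFrame({
    "card_id": ["c1", "c1", "c1", "c2"],
    "event_time": pd.to_datetime(
        ["2019-01-01 10:00", "2019-01-01 11:00",
         "2019-01-02 09:00", "2019-01-01 12:00"]),
})

# Charges we want to score, each with its own point in time.
charges = pd.DataFrame({
    "card_id": ["c1", "c2"],
    "charge_time": pd.to_datetime(["2019-01-01 12:00", "2019-01-01 11:00"]),
})


def charges_last_24h(charges: pd.DataFrame, events: pd.DataFrame) -> pd.Series:
    """Count prior events per card in the 24h window ending at charge_time,
    using only events strictly before the charge (no future leakage)."""
    counts = []
    for _, row in charges.iterrows():
        window = events[
            (events["card_id"] == row["card_id"])
            & (events["event_time"] < row["charge_time"])
            & (events["event_time"] >= row["charge_time"] - pd.Timedelta("24h"))
        ]
        counts.append(len(window))
    return pd.Series(counts, index=charges.index)


charges["card_charges_24h"] = charges_last_24h(charges, events)
print(charges)
```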
And so to what extent do you find that that limitation of everything being event-based gets in the way of what folks want to do?
Yeah, that's definitely a little bit of a paradigm shift for people, because they're like, oh, I just want to use this thing from the database, right? But we found that actually it's worked out pretty well, and especially when you have users who are ML engineers, they really do understand the value of why you want to have things event-based and the sort of gotchas that helps prevent. Because I think everyone has their story about how you were just looking something up in the database, but then the value changed and you didn't realize it, so you're leaking future information into your training data, and then your model is not going to do as well as you thought it did. So I think moving to a more event-based world, and, I mean, in general Stripe has also been doing more streaming work and having good support at the infrastructure level with Kafka, has been really helpful with that.
And so does that mean that the models they're building need to be aware of kind of this streaming paradigm during training, or where do they get a static data set to train on?
Yeah, so basically you can use our features framework to just generate Parquet in S3 that has materialized all the information you want, of what the value of each of the features that you want was at all the points in time that you want, and then your input to the training API is, please use this Parquet from S3. We could make it a little more seamless than that, but that works pretty well.
And Parquet is just like a serialized file format?
Yeah, it's pretty efficient; I think it's used in a lot of kind of big-data uses. You can also do things like predicate pushdown, and we have a way in the training API to specify some filters there, to just kind of save some effort.
Using predicate pushdown?
Yeah, so if you know you only need certain columns or something, you can load it a little bit more efficiently and not have to carry around a lot of extra data.
Got it.
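As a concrete example of column projection and predicate pushdown when reading training data from Parquet, here is a minimal sketch using pyarrow; the bucket, path, and column names are made up.

```python
import pyarrow.parquet as pq

# Only the listed columns are read, and row groups that cannot match the
# filter are skipped, so far less data is pulled from storage. Reading an
# s3:// URI directly assumes pyarrow's S3 filesystem support is available.
table = pq.read_table(
    "s3://example-bucket/features/charges/2019-01/",
    columns=["charge_id", "amount", "charges_last_24h", "is_fraud"],
    filters=[("card_country", "=", "US")],
)
df = table.to_pandas()
```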
Okay, the other interesting thing that you talked about in the context of this event-based framework is the whole "time machine", as you put it, kind of alluding to the point-in-time correctness of feature snapshots. Can you elaborate a little bit on that? Did you start there, or did you evolve to that? That seems to be, in my conversations, maybe one of the cutting edges or bleeding edges that people are trying to deal with as they scale up these data management systems for features.
Yeah, for this particular project, in this version, we started there. Stripe had previously looked at something a little bit related a couple of years before, and in a lot of ways we learned from that, so we ended up with something that was more powerful and sort of solved some of these issues at the platform level. You know, at that point we had been running machine learning applications in production for a few years, so I think everyone has their horror stories, right, of all the things that can go wrong, especially at the correctness level, and everyone has their story about reimplementing features in different languages, which we did for a while too, and all the things that can go wrong there. So I think we really tried to learn both from what we'd seen go well or go wrong in individual applications, and also from our previous attempts at this type of thing: what was good and what could still be better.
Mm-hmm. And out of curiosity, what do you use for a data warehouse, and are there multiple, or is there just one?
We've used a combination of Redshift and Presto over the past couple of years; they have a little bit of different abilities and strengths, and those are things that people like to use to experiment with machine learning, although we generally don't use them in our production flows, because we kind of prefer the event-based model.
So is the event-based model kind of parallel or orthogonal to Redshift or Presto, or is it a front-end to either of these two systems?
Yeah, I guess we actually have a front-end that we've built for Redshift and Presto, separately from machine learning, that's really nice and lets people, to the extent they have permissions to do so, explore tables or put annotations on tables, and we haven't integrated our... in general, I would say we could do some work on our UIs for ML stuff; we definitely focus more on the backend and infra and API side, although we do have some things like our auto-retraining service, which has a UI where you can see, what's the status of my job, did it finish, did it produce a model that was better than the previous model.
Mm-hmm. I think I'm just trying to
wrap my head around the event-based model here. As an example of a question that's coming to mind: in an event-based world, are you regenerating the features every time? And if you've got some complex feature that involves a lot of transformation, or you have to backfill a ton of data, what does that even mean in an event-based world, where I think of, like, you have events and they go away? Is the kind of store for all that in Redshift or Presto?
Well, you know, we're publishing something to Kafka and then we're archiving it to S3, and that persists as long as we want it to, in some cases basically forever, and so that is available. We do end up doing a decent amount of backfilling, of, you know, you define the transformed features you want but then you need to run that back over all the data you'll need for your training. That's something that we've actually done a lot of from the beginning, partly because of our applications: when you're looking at fraud, the way you find out if you were right or not is that in some time period, usually within 90 days but sometimes longer than that, the cardholder decides whether they're going to dispute something as fraudulent or not. That's compared to, you know, if you're doing ads or trying to get clicks, where you kind of get the result right away, right? So I think we've always been interested in being able to backfill, so you can log things forward, but then you'll probably have to wait a little bit of time before you have enough of a dataset that you can train on.
Okay, cool. So we talked about the
data side of things; we talked about training and experiments. How about inference?
Yes, that's a really great question, and that's kind of the first thing that we built infrastructure support for, a decent number of years ago, I think even before things like TensorFlow were really popular. So we have our own Scala service that we use to do our production real-time inference. And we started out, especially because we have mostly transactional data, we don't have a lot of things like images, at least in our most critical applications at this point, and a lot of our early models, and even still today most of our production models, are tree-based models: initially things like random forests and now things more like XGBoost. So we have the serialization for that built in to our training workflows, and we've optimized that to run pretty efficiently in our Scala inference service. And then we've built some nice layers on top of that for things like model composition, kind of what we call meta models, where you can take your machine learning model and, almost within the model, sort of compose something, like add a threshold to it. Or, like for Radar, we trained some array of, in some pieces, user-specific models along with maybe some more global models, and so you can incorporate, in the framework of a model, doing that dispatch, where if it matches these conditions, score with these models, otherwise score with this model, and here's how you combine it. And then the way that interfaces with your application is that each application has what we call a tag, and basically the tag points to the model identifier, which is immutable, and then whenever you have a new model and you're ready to ship, you just update what that tag points to, and then, you know, to put it in production you're saying, score the model for this tag. And that is pretty similar to, like, if you read about Michelangelo and things like that; sometimes we're like, we all came up with it.
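Here is a minimal sketch of the two ideas just described: an application-level tag that points at an immutable model identifier, and a meta model that dispatches between user-specific and global models. The names and structure are illustrative, not the real service.

```python
from typing import Callable, Dict

# Tag -> immutable model id; shipping a new model just repoints the tag.
TAGS: Dict[str, str] = {"radar-charge-scoring": "model_2019_03_14_a"}

# Model id -> scoring function (stand-ins for real fitted models).
MODELS: Dict[str, Callable[[dict], float]] = {
    "model_2019_03_14_a": lambda features: 0.12,   # global model
    "model_user_acct_42": lambda features: 0.91,   # user-specific model
}


def score(tag: str, features: dict) -> float:
    """Resolve the tag, then dispatch: use a user-specific model when one
    exists for this account, otherwise fall back to the global model
    behind the tag."""
    user_model_id = f"model_user_{features.get('account', '')}"
    model_id = user_model_id if user_model_id in MODELS else TAGS[tag]
    return MODELS[model_id](features)


# Falls back to the global model because acct_7 has no dedicated model.
print(score("radar-charge-scoring", {"account": "acct_7", "amount": 120}))
```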
It also sounds a little bit like... sorry, say that again?
Yeah, I think a lot of people kind of come up with some of these ways of doing things that just kind of make sense.
Mm-hmm. It also sounds a little bit like some of what Seldon is trying to capture in a Kubernetes environment, which I guess brings me to: is the inference running in Kubernetes, or is that a separate infrastructure?
It's not right
now, but I think that's mostly a matter of time and prioritization. The first thing we moved to Kubernetes was the training piece, because the workflow management piece was so powerful, or sorry, the resource management piece was so powerful, like being able to swap out CPU, GPU, high memory. We've moved some of the sort of real-time feature evaluation to Kubernetes, which has been really great and made it a lot less toil to deploy new feature versions. At some point we will probably also move the inference service to Kubernetes; we just haven't gotten there yet, because it is still some work to do that.
And is the inference happening on AWS as well, and are you using kind of standard CPU instances, or are you doing anything fancy there?
Yeah, so we run in the cloud for pretty much everything and definitely use a lot of AWS for the real-time inference of the most sensitive production use cases. We're definitely mostly using CPU, and we've done a lot of optimization work, so that has worked pretty well for us. I think we do have some folks who've experimented a little bit with hourly or batch scoring using some other things; that's something we're definitely thinking about as we have more people productionizing more complex types of models, where we might want something different.
You mentioned a lot of optimization that you've done; is that on a model-by-model basis, or are there platform things that you've done that help optimize across the various models that you're deploying for inference?
Yeah, it's
definitely a lot of things at the platform level. I think the first models that we ever scored in our inference service were serialized with YAML, and they were really huge and they caused a lot of garbage when we tried to load them, so we did some work there for tree-based models to be able to load things from disk to memory really quickly without producing much garbage. That kind of thing is something we did especially in the earlier days.
Okay. And what are you using for querying the models; are you doing REST or gRPC or something altogether different?
Yeah, we use REST right now. I think gRPC is something that we're interested in, but we haven't done it yet.
Okay, and is all of the inference done via REST in a kind of microservice style, or do you also do more, I guess, embedded types of inference where you have super low latency requirements? Does REST kind of meet the need across the application portfolio?
Yeah, even for the most critical applications, I think it has worked pretty well. One other thing our
orchestration team has done that's worked really well for us is migrating a lot of things to Envoy. We've seen some things where we didn't understand why there was some delay between what we measured for how long things took versus what it took to get to the user, and that just kind of went away as we moved to Envoy.
And what is Envoy?
Envoy is like a service-to-service networking mesh. It was developed by Lyft, and it's kind of an open-source library, and it handles a lot of things like service-to-service communication.
Okay, cool.
And so the inference environment, is it doing, absent of Kubernetes, all the things that you'd expect Kubernetes to do in terms of auto-scaling and load balancing across the different service instances, or is that stuff all done statically?
We take care of the routing ourselves, and we also at this point have kind of sharded our inference service, so not all models are stored on every host and we don't need hosts with infinite memory; that we take care of ourselves. The scaling is not fully automated at this point. We have kind of a quality-of-service setup, where we have multiple clusters of machines and we tier a little bit by how sensitive your application is and what you need from it, so that we can be a little bit more relaxed with people who are developing and want to test, and not have that potentially have any impact on more critical applications. But we haven't done totally automated scaling; that's something we still look at a little bit ourselves.
Awesome, awesome. So
if you were just starting down this journey, without having done all the things that you've done at Stripe, where do you think you would start if you're at an organization that's kind of increasingly invested in, or investing in, machine learning and needs to try to gain some efficiencies?
Yeah, I mean, I think if you're just starting out, it's good to think about what your requirements are, right? And if you're just trying to iterate quickly, do the simplest thing possible, right? So if you can do things in batch, great, do things in batch. I think there are a lot of both open-source libraries as well as managed solutions on all the different cloud providers, so if you're only one person, then I think those could make a lot of sense for people starting out. Because I think one of the interesting things with machine learning applications is that it takes a little bit of work; usually there's sort of this threshold of, your modeling has to be good enough for this to be a useful thing for you to do. For fraud detection, that's like, if we can't catch any fraud with our models then we probably shouldn't have a fraud detection product. So I think it is useful to have a quick iteration cycle to find out, is this a viable thing that you even want to pursue? And if you have an infrastructure team, they can help lower the bar for that, but I think there are other ways to do that, especially as there's been this Cambrian explosion in the ecosystem of different open-source platforms as well as different managed solutions.
Yeah. How do you think an organization knows when they should have an infrastructure team, an ML one in particular?
Yeah, I think that's a really interesting question. I guess in our case, the person who originally founded the machine learning infrastructure team had worked in this area before at Twitter, and kind of had a sense of, this is going to be a thing that we're really going to want to invest in, given how important it is for the business, and also that if you don't dedicate some folks to it, it's easy for them to get sucked up in other things, like if you just have data infrastructure that's undifferentiated. So I think it's a really interesting question. There probably is this business piece, right, of what are your ML applications, how critical are they to your business, and how difficult are your infrastructure requirements for them as well. I think a lot of companies develop their ML infrastructure starting out with things like making the notebook experience really great, because they want to support a lot of data scientists who are doing a lot of analysis, and so that's a little bit of a different arc from the one that we've been on, and I think that's actually a pretty business-dependent thing.
Okay, awesome, awesome. Well, Kelley, thanks so much for taking the time to chat with me about this really interesting story; I've enjoyed learning about it.
Cool, and thanks so much for chatting.