0:00 Are you using Flask or FastAPI to serve
0:02 your machine learning model?
0:04 Google's TensorFlow team has developed
0:06 a tool called TF Serving,
0:08 which serves models in a better way than Flask
0:12 and also lets you do
0:15 your model version management in a much
0:17 better way. So in this video we'll look
0:19 into some theory and then we'll
0:20 practically see how this tool works.
0:23 Let's begin!
0:24 Let's say you're building an email
0:26 classification model where you're
0:27 predicting whether an email is spam or
0:30 not spam. A typical data science workflow
0:32 would be:
0:33 you collect data, do data cleaning and
0:36 feature engineering,
0:38 and you train a model. Let's say
0:39 this is a TensorFlow model.
0:41 You then export it to a file: you
0:44 can just call
0:45 model.save and that will export the
0:48 model to a file on your hard disk.
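TF Serving expects each model version in its own numbered sub-directory (you'll see saved_models/1, 2, 3 later in the video). Here is a minimal sketch of that layout in Python; the actual model.save call needs TensorFlow and a trained model, so it is shown as a comment:

```python
from pathlib import Path

def export_dir(base: str, version: int) -> Path:
    """Create and return the numbered directory for one model version."""
    d = Path(base) / str(version)
    d.mkdir(parents=True, exist_ok=True)
    return d

# With a trained tf.keras model you would then export each version:
# for version in (1, 2, 3):
#     model.save(export_dir("saved_models", version))
```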
0:52 Then you can write a FastAPI- or
0:55 Flask-based server. This is the typical
0:58 approach:
0:58 people write these servers, which
1:01 load the saved model, as you can see in
1:03 this line,
1:04 and when clients make an HTTP
1:08 call such as this predict call,
1:11 it invokes this function, where you use
1:13 the loaded model.
1:16 Now let's say this model is running fine
1:19 in production.
1:21 You get new data, you train a new model,
1:23 and you are
1:24 ready to deploy the next version, which
1:26 is version 2,
1:28 to your beta users. So what happens here:
1:31 version 1 is in production, and version
1:34 2
1:35 is ready to be deployed to beta users.
1:39 Now imagine how you would have to change
1:41 your FastAPI code in this case.
1:45 You would somehow detect in your predict
1:47 function
1:48 that the given user is a beta user, and
1:50 then
1:51 you can call the beta model. So see, here I'm
1:54 loading
1:55 models one and two into two different
1:57 variables, and I can
1:59 route the request based on
2:03 what type of user it is. This is
2:06 one approach;
2:06 maybe you could have a different server
2:10 altogether
2:10 just for beta users. But you can already
2:13 see the complexity here:
2:15 you have to do if-else, and maybe you have
2:18 five different versions which you want
2:20 to serve to different types of
2:22 users. Overall you get the idea:
2:25 the version management is a little tedious.
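The if-else routing described above might look like this as a plain-Python sketch. The stand-in model callables and the way beta users are flagged are my assumptions, not the actual code shown in the video:

```python
# Stand-in "models": in the real server these would be loaded with
# tf.keras.models.load_model(); plain callables keep the sketch
# self-contained.
MODELS = {
    "production": lambda text: 0.1,   # hypothetical v1 score
    "beta":       lambda text: 0.9,   # hypothetical v2 score
}

BETA_USERS = {"alice@example.com"}    # assumed way of flagging beta users

def predict(user: str, text: str) -> float:
    """Route each request to the beta or production model."""
    tier = "beta" if user in BETA_USERS else "production"
    return MODELS[tier](text)
```

Every new version or user tier adds another branch here, which is exactly the complexity TF Serving removes.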
2:28 TF Serving makes this version management
2:31 and model serving very easy.
2:34 Here you had to write all this code;
2:37 as we will see,
2:38 in TF Serving you don't have to write
2:40 any code:
2:41 you just run one command and your server
2:43 is ready.
2:44 So I will show you practically how it
2:46 looks, but let me mention one other
2:48 benefit of TF Serving:
2:50 batch inference. You might have,
2:52 let's say,
2:53 thousands of incoming requests for
2:55 inference.
2:57 TF Serving can
3:00 batch those requests and send them to
3:02 the model
3:04 in batches; the benefit is better
3:07 hardware
3:08 resource utilization. You can set a
3:11 timeout parameter:
3:12 say 5 seconds is the timeout and
3:15 your batch size is 100.
3:17 If in 5 seconds you only receive 52
3:19 requests,
3:20 then it will batch only those 52, because you
3:23 don't want these requests to be waiting
3:24 until you receive 100 requests.
3:26 So batching
3:29 works too. Now let me just show you directly
3:31 how this whole thing works.
3:33 In one of my videos in the deep learning
3:35 tutorial playlist
3:36 I built a text classification model
3:38 using BERT,
3:39 so here is the video if you want to see the
3:42 model-building process, but I'm just
3:44 going to open that same notebook here.
3:47 You can see we're classifying emails
3:50 as spam and non-spam
3:51 using BERT and TensorFlow. Once the
3:54 model is
3:54 built and ready, you can export
3:58 it to a file using the .save method.
4:01 So here I have called .save three
4:03 times, basically
4:05 just to show you three different
4:07 versions,
4:08 and you can see these three
4:09 versions are saved here. You see the
4:11 saved_models directory here; these are
4:14 essentially the same models.
4:16 But in real life you would have
4:19 different models:
4:20 version one and version two
4:22 would have some differences.
4:24 Here, just for tutorial purposes, I
4:26 saved the same model.
4:28 If you go to an individual model you'll
4:30 see a couple of files,
4:32 like assets and
4:35 variables and so on. You don't have
4:38 to worry about what these files are; you
4:39 can just directly load the model and
4:41 start using it.
4:45 The first step here is to install
4:47 TensorFlow Serving.
4:49 The most convenient way to install
4:51 TensorFlow Serving is
4:52 Docker, so you can just pull the Docker
4:55 image
4:56 by running this docker pull command.
4:58 There are, by the way, other ways: if
5:00 you're using Ubuntu you can use
5:01 apt-get and things like that, but here
5:05 I will just use Docker. Okay, so in my Git
5:08 Bash
5:09 I can just run docker pull tensorflow/serving
5:12 and it will pull the latest image. I
5:14 already have the latest image,
5:16 so it's not doing much, but
5:21 if you run Docker Desktop
5:24 (I have Windows, and on Windows I have
5:26 already installed Docker Desktop)
5:28 and look at Docker Desktop, you
5:29 see
5:31 I already have the tensorflow/serving
5:35 image on my computer.
5:39 You can use git clone here; this will
5:42 just
5:42 download some sample models for
5:44 you, but we are going to use our own
5:46 model,
5:47 so you can follow those commands just
5:48 for your own
5:50 learning. Here I'm going to
5:54 just load the models which I saved, so
5:58 in my tf-serving directory I have all these
6:00 saved models, okay.
6:02 So now I'm going to open
6:05 Windows PowerShell. So you open Windows
6:08 PowerShell; I already have it open here.
6:10 I'm just going to
6:11 run clear, and
6:16 okay, it will look like this, and
6:19 once you have this open you need to
6:22 first
6:23 launch the Docker container. So how to do
6:26 that?
6:27 Okay, I have created a GitHub page
6:31 where I have given all the commands, and
6:34 I'm going to put the link to this in the
6:35 video description below,
6:36 but the way you launch Docker
6:40 is by calling docker run with -v. So
6:44 I will
6:45 just gradually type in those commands so
6:47 you get an idea. So you will say
6:49 docker run, okay.
6:53 See, there are a couple of command-line
6:55 options. This is not a Docker tutorial, so
6:57 I'm not going to go
6:58 into each option in detail, but
7:02 -v is an important one. Here
7:05 you supply your
7:08 host directory. So
7:12 I want to use this directory here, right,
7:14 so I will just say Ctrl+C,
7:16 okay, and Ctrl+V.
7:20 So I want this directory to be
7:23 mapped to some directory
7:26 inside my Docker container, so I'll just
7:29 give it the same name,
7:31 tf_serving. So this directory on my
7:34 host
7:35 maps to this directory in my Docker
7:38 container.
7:40 Then I will map the port. Let's say
7:44 I want
7:47 port 8605 to be exposed as 8605,
7:51 so 8605 on my host
7:54 system is mapped to 8605
7:58 on my, what,
8:01 on my Docker container.
8:05 Then I will set the entry point. If I
8:08 don't set the entry point,
8:10 what's going to happen is Docker will use
8:12 its
8:13 default entry point, so for the image that we
8:16 pulled,
8:17 the default entry point will directly
8:21 run the TF Serving command. We don't
8:23 want that, so I will just
8:26 set the entry point. The entry point is
8:29 /bin/bash; /bin/bash will take you to a command
8:33 prompt, basically, okay. And then
8:37 you give the name of your image.
8:40 So what is the name of your image? Well,
8:43 it is
8:45 tensorflow/serving.
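Assembled in one place, the command dictated above looks roughly like this (the host path is my assumption; substitute the directory that holds your saved models):

```shell
docker run -it \
  -v /c/code/tf-serving:/tf_serving \
  -p 8605:8605 \
  --entrypoint /bin/bash \
  tensorflow/serving
```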
8:49 When you do that, you will enter, you
8:52 see,
8:54 you have now entered your Docker container,
8:57 and within the container, since you
8:59 mapped this to tf_serving,
9:01 you see tf_serving
9:04 here. Let me just clear it.
9:08 If you do ls,
9:15 you see all these directories; you
9:19 see saved_models, which is good.
9:21 Now, you run TensorFlow Serving
9:24 by running this command. This will run
9:28 your TensorFlow Serving,
9:29 and here you need to supply your REST
9:33 API port.
9:34 So my REST API port, where I'll be making
9:37 my HTTP calls,
9:39 is 8605.
9:43 You can use pretty much any port, but
9:46 we decided on 8605.
9:50 You need to give a model name;
9:53 you can give anything, x y z.
9:57 My model name is email_model.
10:00 And what is my model base path?
10:04 My model base path is nothing but
10:07 this directory. Okay, when you run this
10:10 command,
10:12 see, in one line we...
10:17 okay, what is it saying? There is
10:21 some
10:21 error happening. Okay, we forgot to include
10:26 the model directory in the path; it is
10:32 actually saved_models, so you need to
10:34 give saved_models in the base path.
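Put together, the serving command run inside the container looks roughly like this (the exact mount point is my assumption, based on the volume mapping above):

```shell
tensorflow_model_server \
  --rest_api_port=8605 \
  --model_name=email_model \
  --model_base_path=/tf_serving/saved_models
```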
10:38 Okay, so that was the reason I was
10:39 getting an error. So now
10:41 this means my model server is ready.
10:45 See, just by writing one line
10:48 I created my server. Now how does this
10:52 server work?
10:53 Well, for that you need to run Postman,
10:56 so install Postman; it's a popular tool
10:58 which is used to make
11:00 HTTP requests. You know what, before I run
11:03 Postman, let me just show you
11:05 in a browser itself. So 8605
11:10 is my port, then you do v1 (it's just a
11:13 fixed prefix, okay, v1),
11:15 then you do models, then you do email_model.
11:21 So when I do this it is saying I have
11:24 version 3 available,
11:25 which means by default, when I ran that
11:28 command,
11:30 it looked into the saved_models directory, and
11:33 whatever is the highest number,
11:34 version 3, it says that is the model
11:38 which is available. Now I will use
11:40 Postman to make actual requests. So in
11:42 Postman,
11:44 what you can do is you can say
11:50 email_model colon predict.
11:54 Okay, so I have the same URL basically;
11:57 see, this part is kind of fixed.
12:00 I was confused initially by v1, but
12:02 ignore that: v1 is always v1.
12:04 It's not the actual model version, okay;
12:08 then email_model and then colon predict.
12:12 In the body, so by default you'll be here,
12:15 you have to come to Body,
12:16 and in the body click on raw,
12:20 and this is the format that it expects.
12:22 Okay, so let me reduce the font size
12:24 a little bit. Okay, so here
12:27 you will say instances. This is a fixed
12:30 format, okay; don't ask me why it is like this,
12:32 that's the format that TF Serving expects.
12:35 And I'm giving two emails: this one is
12:37 not a spam email,
12:39 this one is a spam email, and when I hit
12:42 Send, see, it sends the request and it got
12:44 this prediction back.
12:46 It got this prediction back from the
12:48 model server that we just started.
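The same calls can be sketched in Python. The host, port, and email strings below are assumptions for illustration; the URL shapes follow TF Serving's REST API (version labels are configured later in the video):

```python
import json

BASE = "http://localhost:8605/v1/models/email_model"  # assumed host/port

def predict_url(version=None, label=None):
    """Build the TF Serving REST :predict URL, optionally pinned
    to a specific version number or a version label."""
    if version is not None:
        return f"{BASE}/versions/{version}:predict"
    if label is not None:
        return f"{BASE}/labels/{label}:predict"
    return f"{BASE}:predict"

# TF Serving expects the request body in the {"instances": [...]} format:
payload = json.dumps({
    "instances": [
        "hi, are we still on for lunch tomorrow?",   # not spam
        "WIN a FREE prize now, click this link!!!",  # spam
    ]
})
```

Sending `payload` as the POST body to `predict_url()` (from Postman, curl, or the `requests` library) returns the predictions.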
12:51 So here, if the value is less than 0.5
12:55 it is not spam; if it is more
12:57 than 0.5 it is spam.
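That cutoff as a tiny sketch; the score values below are made-up examples, not the server's actual output:

```python
def classify(score: float) -> str:
    """Map the model's sigmoid output to a label using the 0.5 cutoff."""
    return "spam" if score >= 0.5 else "not spam"

# Illustrative scores for the two emails in the request above:
predictions = [[0.12], [0.87]]
labels = [classify(p[0]) for p in predictions]  # ["not spam", "spam"]
```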
12:59 So you can clearly see the second one
13:01 is spam; that's why the value is
13:02 more than 0.5. This one is not spam,
13:05 hence the value is
13:06 less than 0.5. If you want to call this
13:09 by version number,
13:11 you can simply say slash versions
13:16 slash 3 colon predict, and you get the
13:19 same response.
13:20 But we already saw this server has only
13:26 version 3 loaded. Change it to version 2,
13:27 hit Send, and see what happens:
13:31 it says this model is not found;
13:33 even
13:35 version 1 is not found. So what if
13:38 I want to make all three versions
13:41 available?
13:46 For that you have to use a model config
13:48 file.
13:49 How do I do that? Okay, let me exit; Ctrl+C
13:53 will exit. Okay,
13:58 I can do that by changing the command a
14:01 little bit:
14:07 so here I will say model config file
14:11 equals, so I need to supply a
14:16 model config file.
14:20 I already have this config, called
14:23 model.config.a,
14:24 and if you look at that file, what it
14:27 says is
14:28 model version policy: all, which means:
14:31 in this directory,
14:33 whatever versions you see, make them all
14:36 servable.
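The file isn't shown line by line in the video; a typical model.config.a with the all-versions policy looks like this (the base path is my assumption about the mount point). This is TF Serving's ModelServerConfig in protobuf text format, passed via --model_config_file:

```
model_config_list {
  config {
    name: "email_model"
    base_path: "/tf_serving/saved_models"
    model_platform: "tensorflow"
    model_version_policy { all {} }
  }
}
```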
14:42 So I'm going to run this.
14:48 See: successfully loaded version 2,
14:52 version 1, version 3. All
14:56 three will be servable. So now when I
14:58 say version 1 predict,
15:02 it works; version 2 predict,
15:07 it works. I get the same output, by the
15:10 way,
15:11 because my models are the same, but in a
15:13 real-life scenario
15:14 version 1, version 2, and version 3
15:16 would give slightly different
15:18 outputs.
15:20 Version 3 works too.
15:23 Version 4? What's going to happen?
15:25 Obviously you don't have version 4,
15:27 guys.
15:29 Many times you have a production version;
15:31 let's say version 1 is the production version,
15:33 then you build version 2 and you want
15:35 to deploy that only to beta users,
15:38 and you don't want these calls to be
15:41 made by version number. Maybe
15:44 it would be better if I could do something
15:46 like production
15:47 versus beta.
15:51 Do you think that would be better? Well,
15:53 that's also supported,
15:55 and in order to do that you have to
15:58 use version labels. So I have the same
16:01 config file,
16:02 but I added this section. I'm saying my
16:04 version 1
16:05 is my production and version 2 is my beta.
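The labels section added to the config would look roughly like this (again a sketch; the path is my assumption):

```
model_config_list {
  config {
    name: "email_model"
    base_path: "/tf_serving/saved_models"
    model_platform: "tensorflow"
    model_version_policy { all {} }
    version_labels { key: "production" value: 1 }
    version_labels { key: "beta" value: 2 }
  }
}
```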
16:08 And I'm going to run my model server now
16:12 with that particular file, so it's a
16:14 different config file,
16:16 basically, see. Okay, so...
16:20 "request to..." oh, I see, okay.
16:26 Let me just show you:
16:29 if you give this command it gives this
16:31 error, and to
16:33 tackle this error you need to give
16:37 this particular option here.
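The video doesn't read the option aloud. Based on TF Serving's documented flags, the error about assigning a label to a version that isn't loaded yet is handled by --allow_version_labels_for_unavailable_models, so the command would look roughly like this (both the flag choice and the config filename here are my assumptions):

```shell
tensorflow_model_server \
  --rest_api_port=8605 \
  --model_config_file=/tf_serving/model.config.b \
  --allow_version_labels_for_unavailable_models
```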
16:43 If you supply this particular command-
16:46 line option,
16:48 then you don't get this error. Okay, so
16:51 now my
16:53 TF Serving is ready; whenever
16:56 you see this,
16:57 it means it is ready, and it can serve
17:00 using labels.
17:02 So let's see. First of all,
17:05 you can obviously call using version
17:08 numbers, so let's verify that first.
17:10 Here I will supply
17:13 version number 1. You know, it works
17:16 just fine.
17:17 But now I want to use labels, so you will
17:19 just say, instead of versions,
17:20 labels, and let's say beta.
17:25 See, my beta works. You can also do
17:29 production; these are the two labels I
17:32 have.
17:34 This works just fine. And in your client
17:37 code,
17:38 what you'll be doing is, let's
17:41 say you are writing this code in
17:42 JavaScript:
17:43 in JavaScript you will have all these
17:45 URLs, production or beta or whatever,
17:47 and based on whether it's a beta or
17:50 production user you can
17:51 switch the URL. So this is almost
17:54 like
17:56 an A/B-testing type of scenario, where you
17:58 are
17:59 testing a new version with
18:03 beta users. If you look at the documentation,
18:07 the model config file has a few other options
18:09 as well.
18:09 For example, you can serve two
18:14 entirely different models: you can have a
18:16 dog-and-cat classifier
18:18 here and a truck-and-car classifier
18:21 here.
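Two unrelated models in one config file would look roughly like this (the names and paths are made up for illustration):

```
model_config_list {
  config {
    name: "dog_cat_model"
    base_path: "/models/dog_cat"
    model_platform: "tensorflow"
  }
  config {
    name: "truck_car_model"
    base_path: "/models/truck_car"
    model_platform: "tensorflow"
  }
}
```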
18:22 And just by running one command
18:25 you can have a server which can do
18:27 different types of
18:29 inference. So just go through the
18:33 documentation;
18:34 I did not cover all the options. We
18:36 talked about
18:39 batching; the batching configuration can
18:41 look like this:
18:42 you just pass this batch parameters file,
18:45 and in the file you can say, okay, batch
18:49 128 requests with a certain timeout,
18:53 and it will help you utilize your
18:56 hardware resources
18:58 in the most appropriate way.
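A batching parameters file like the one described, with the values from earlier in the video (128 requests, 5-second timeout; the thread count is my assumption). It is passed with --enable_batching --batching_parameters_file=... and is also protobuf text format:

```
max_batch_size { value: 128 }
batch_timeout_micros { value: 5000000 }   # 5 seconds
num_batch_threads { value: 4 }
```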
19:02 That's all I had for this tutorial. I
19:04 have a GitHub page
19:05 where I have given my notebook,
19:09 so you can export the models yourself (I
19:11 did not upload the
19:12 exported models because they were very
19:13 big),
19:15 and you have all these config files, so see,
19:18 this is
19:18 config.c and so on. I highly encourage
19:22 you to practice whatever you learned
19:24 in this video,
19:25 because just by watching a video you're
19:28 not going to learn anything.
19:30 Trust me, you need to practice. So just
19:33 install TF Serving, practice whatever you
19:36 learned in this video,
19:37 and I hope you get a good
19:39 understanding of how this thing works.
19:41 If you like this video please give it a
19:43 thumbs up and share it with your friends.
19:45 All the useful links are in the video
19:47 description below.
19:48 Thank you. Thanks for watching.