0:03 Thanks for sticking with me for this
0:06 final optional video on image generation.
0:08 So far this week, we've focused most of our
0:11 attention on text generation, and text
0:13 generation is what a lot of users are
0:16 using and is having the biggest impact
0:17 of all the different tools of generative
0:20 AI. But part of the excitement of
0:23 generative AI is also image generation,
0:25 and there are also starting to be some
0:27 models that can generate either text or
0:29 images. These are sometimes called
0:31 multimodal models because they can operate
0:35 in multiple modalities: text or images.
0:37 What I'd like to do in this video is
0:39 share with you how image generation
0:41 works. Let's take a look. With just a
0:43 prompt, you can use generative AI to
0:46 generate a beautiful picture of a person
0:48 that never existed, or a picture of
0:51 a futuristic scene, or a picture of a
0:53 cool robot like this. How does this
0:56 technology work? Image generation today
0:59 is mostly done via a method called a
1:02 diffusion model. Diffusion models
1:04 have learned from huge numbers of images
1:07 found on the internet or elsewhere, and
1:09 it turns out that at the heart of a
1:13 diffusion model is supervised learning.
1:14 Here's what it does. Let's say the algorithm
1:16 finds a picture on the internet of an
1:19 apple like this, and it wants to learn,
1:21 from pictures like this and hundreds of
1:23 millions of others, how to generate
1:26 images. The first step is to take this
1:29 image and gradually add more and more
1:32 noise to it, so that you go from this
1:34 nice picture of an apple to a noisier one, to
1:38 an even noisier one, to finally a picture
1:39 that just looks like pure noise, where
1:41 all the pixels are just chosen at random
1:43 and it doesn't look at all like an apple.
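The gradual noising described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: an "image" is a flat list of pixel brightness values in [0, 1], and noise is added by linearly blending each pixel with a uniform random value. That blend is a simplified, hypothetical schedule, not the Gaussian noise schedule used by actual diffusion models, and the names `add_noise` and `noising_sequence` are illustrative only.

```python
import random


def add_noise(image, amount, rng):
    """Blend each pixel with uniform random noise.

    amount=0.0 keeps the image unchanged; amount=1.0 replaces every pixel
    with a random value (pure noise). This linear blend is a simplified,
    hypothetical schedule, not the one used by any particular model.
    """
    return [(1.0 - amount) * p + amount * rng.random() for p in image]


def noising_sequence(image, num_steps, seed=0):
    """Return [clean, noisier, ..., pure noise]: num_steps + 1 images."""
    rng = random.Random(seed)
    return [add_noise(image, t / num_steps, rng) for t in range(num_steps + 1)]


# A toy 4-pixel "apple" with brightness values in [0, 1].
apple = [0.9, 0.8, 0.2, 0.1]
steps = noising_sequence(apple, num_steps=3)
```

With `num_steps=3` the sketch yields four images (clean, noisier, even noisier, pure noise), mirroring the four apple pictures in the lecture.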
1:45 The diffusion model then uses pictures
1:49 like these as data to learn, using
1:52 supervised learning, to take as input a
1:55 noisy image and to output a slightly
1:58 less noisy image. Specifically, it would
2:00 create a data set where the first data
2:03 point says: if it's given the second
2:06 input image, what we want the supervised
2:08 learning algorithm to do is learn to output
2:11 a cleaner version of this apple. And
2:13 here's another data point: given this
2:15 third, even noisier image, we
2:17 would like the algorithm to learn to
2:20 output a slightly less noisy version
2:22 like this. And finally, given an image of
2:25 pure noise like this fourth image, we
2:27 would like it to learn to output a
2:29 slightly less noisy picture here that
2:31 just suggests the presence of an apple.
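The data set described above pairs each noisier image (the input) with the image one step cleaner (the target). The sketch below shows that construction under stated assumptions: images are flat lists of pixel values, the pixel values are made up for illustration, and `make_training_pairs` is a hypothetical helper. The supervised learning model that would be trained on these pairs is not shown.

```python
from typing import List, Tuple

Image = List[float]  # a toy image: a flat list of pixel values


def make_training_pairs(sequence: List[Image]) -> List[Tuple[Image, Image]]:
    """Turn one noising sequence into supervised (input, target) pairs.

    `sequence` is ordered from the clean image to pure noise. Each pair maps
    a noisier image (input) to the image one step less noisy (target).
    """
    return [(sequence[t + 1], sequence[t]) for t in range(len(sequence) - 1)]


# Four toy images: clean apple, noisier, even noisier, pure noise
# (pixel values made up for illustration).
sequence = [
    [0.9, 0.8, 0.2, 0.1],
    [0.8, 0.7, 0.3, 0.2],
    [0.6, 0.6, 0.4, 0.4],
    [0.3, 0.9, 0.5, 0.7],
]
pairs = make_training_pairs(sequence)
```

Four images give three data points (second to first, third to second, fourth to third), matching the three examples in the lecture.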
2:34 After training on maybe hundreds of
2:36 millions of images via a process like
2:38 this, when you want to apply it to
2:40 generate a new image, this is how you
2:43 would run it. You would start off with a
2:46 pure noise image, so start by taking a
2:48 picture where every single pixel in the
2:49 picture is just chosen completely at
2:52 random. We then feed this picture to the
2:54 supervised learning algorithm that we
2:56 trained up on the previous slide. When we
2:58 feed in pure noise, it removes a
3:00 little bit of noise from this picture,
3:02 and you may end up with a picture like
3:04 this that suggests some sort of fruit in
3:06 the middle, but we're not quite sure what
3:09 it is yet. Given the second picture, we
3:12 again feed it to the model, and it then
3:14 takes away even a little bit more noise,
3:16 and now it looks like we can see a noisy
3:18 picture of a watermelon.
3:20 And then if you apply this
3:23 one more time, we end up with this fourth
3:25 image, which looks like a pretty nice
3:27 picture of a watermelon. I'm illustrating
3:31 this process using four steps of adding
3:32 noise on the previous slide and four
3:34 steps of removing noise on this slide,
3:37 but in practice about 100 steps
3:39 would be more typical for a diffusion
3:41 model. So this algorithm will work for
3:44 generating pictures completely at random,
3:46 but we want to be able to control the
3:48 image it generates by specifying a
3:50 prompt to tell it what we want it to
3:53 generate. Let me describe a modification
3:55 of the algorithm that lets you add text,
3:58 or add a prompt, to tell it what you want
4:01 it to generate. In this training data,
4:03 we're given pictures like this apple, as
4:06 well as a description or prompt that
4:09 could have generated this apple. So here
4:11 I have a text description saying this is
4:14 a red apple. Then, same as before, we will
4:18 add noise to this picture until we get
4:20 this fourth image, which is pure noise,
4:21 but we're going to change how we build
4:24 the learning algorithm. Rather
4:26 than inputting the slightly noisy
4:28 picture and expecting it to generate a
4:30 clean picture, we'll instead have the
4:32 input A to the supervised learning algorithm
4:35 be this noisy picture as well as the
4:38 text caption or prompt that could
4:41 have generated this picture, namely "red
4:44 apple." Given this input, we now want
4:46 the algorithm to output this clean
4:49 picture of an apple. Similarly, we'll
4:51 generate additional data points for the
4:54 algorithm using the other noisy images,
4:57 where each time, given a noisy image and
5:00 the text prompt "red apple," we want the algorithm
5:02 to learn to generate a less noisy
5:06 picture of a red apple. So, having learned
5:08 from a very large data set, when you want
5:11 to apply the algorithm to generating, say, a
5:14 green banana, this is what you do. Same as
5:17 before, we start off with an image of
5:20 pure noise, so every single pixel is
5:22 chosen completely at random. If you
5:25 want to generate a green banana, you
5:27 input to the supervised learning algorithm
5:29 that picture of pure noise together
5:32 with the prompt "green banana." Now that it
5:34 knows you want a green banana, hopefully
5:36 the algorithm will output a picture that maybe
5:38 looks like this. You can't see the banana
5:40 that clearly, but maybe there's a suggestion
5:42 of some sort of greenish fruit in the
5:44 middle, and this is the first step of
5:46 image generation. The next step is we
5:48 then take this image on the right that
5:51 was the output B and now feed that in as
5:54 the input A, again with the prompt "green
5:57 banana," to get it to generate a slightly
5:59 less noisy picture, and now we can see clearly
6:01 that it looks like a green banana,
6:03 but a pretty noisy one. We do this
6:06 one more time, and it finally removes
6:09 most of the noise until we end up
6:12 with that picture of a pretty nice green
6:15 banana. So that's how diffusion models
6:18 work for generating images, and at the
6:21 heart of this really magical process of
6:23 generating beautiful images is, again,
6:26 supervised learning. Thanks for sticking
6:27 with me for this optional video. I
6:30 look forward to seeing you next week,
6:32 where we'll dive much more into
6:34 applications being built using
6:36 generative AI. I'll see you in the next video.
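The text-conditioned generation loop from this video (start from pure noise, then repeatedly feed the previous output B back in as input A together with the prompt) can be sketched as follows. This is a minimal sketch under loud assumptions: `toy_denoiser` is a hypothetical stand-in for a trained supervised learning model. It simply "knows" one clean target image per prompt and removes 10% of the remaining difference on each call, whereas a real model learns this behavior from a very large set of captioned images. The default of 100 steps follows the remark that about 100 steps is more typical than the four illustrated.

```python
import random
from typing import Callable, Dict, List

Image = List[float]  # a toy image: a flat list of pixel values
Denoiser = Callable[[Image, str], Image]


def generate(denoise: Denoiser, prompt: str, num_pixels: int,
             num_steps: int = 100, seed: int = 0) -> Image:
    """Start from pure noise and repeatedly apply the denoiser.

    Each step feeds the previous output (B) back in as the next input (A),
    together with the same text prompt.
    """
    rng = random.Random(seed)
    image = [rng.random() for _ in range(num_pixels)]  # pure noise
    for _ in range(num_steps):
        image = denoise(image, prompt)
    return image


# Hypothetical stand-in for a trained model: one made-up clean target per
# prompt. This is NOT how a real diffusion model stores what it has learned.
TARGETS: Dict[str, Image] = {
    "green banana": [0.2, 0.9, 0.3, 0.1],
    "red apple": [0.9, 0.1, 0.1, 0.2],
}


def toy_denoiser(image: Image, prompt: str) -> Image:
    """Move each pixel 10% of the way toward the prompt's target image."""
    target = TARGETS[prompt]
    return [p + 0.1 * (t - p) for p, t in zip(image, target)]


banana = generate(toy_denoiser, "green banana", num_pixels=4)
```

Because each call shrinks the remaining difference by a factor of 0.9, after 100 steps the toy output is within 0.0001 of the stored target for the prompt; changing the prompt changes which image the same noise is steered toward.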