Generative AI image generation primarily utilizes diffusion models, a supervised learning technique that learns to progressively denoise images from pure noise, guided by text prompts, to create novel visuals.
Thanks for sticking with me for this final optional video on image generation.
So far this week, we've focused most of our attention on text generation. Text generation is what a lot of users are using, and it is having the biggest impact of all the different tools of generative AI. But part of the excitement of generative AI is also image generation, and there are also starting to be some models that can generate either text or images. These are sometimes called multimodal models, because they can operate in multiple modalities: text or images.
What I'd like to do in this video is share with you how image generation works. Let's take a look. With just a prompt, you can use generative AI to generate a beautiful picture of a person that never existed, or a picture of a futuristic scene, or a picture of a cool robot like this. How does this technology work? Image generation today is mostly done via a method called a diffusion model. Diffusion models have learned from huge numbers of images found on the internet or elsewhere, and it turns out that at the heart of a diffusion model is supervised learning.
Here's what it does. Let's say the algorithm finds a picture on the internet of an apple like this, and it wants to learn from pictures like this, and hundreds of millions of others, how to generate images. The first step is to take this image and gradually add more and more noise to it, so that you go from this nice picture of an apple, to a noisier one, to an even noisier one, to finally a picture that just looks like pure noise, where all the pixels are just chosen at random and it doesn't look at all like an apple.
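The noising step described here can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `add_noise` and `noising_sequence` are made up for this example, the "image" is a short list of pixel values between 0 and 1, and blending with uniformly random values is a simplification (real diffusion models typically add Gaussian noise on a carefully chosen schedule).

```python
import random

def add_noise(image, level, rng):
    """Blend each pixel with a random value.

    level=0.0 keeps the image unchanged; level=1.0 gives pure noise.
    (Simplified assumption: uniform random blending, not a real noise schedule.)
    """
    return [(1.0 - level) * p + level * rng.random() for p in image]

def noising_sequence(image, num_steps, seed=0):
    """Return the clean image followed by progressively noisier copies."""
    rng = random.Random(seed)
    levels = [step / num_steps for step in range(num_steps + 1)]
    return [add_noise(image, level, rng) for level in levels]

# Hypothetical stand-in for the apple photo: six pixel intensities.
apple = [0.9, 0.1, 0.1, 0.8, 0.2, 0.1]

# Three noising steps give four images: clean, noisier, even noisier, pure noise.
sequence = noising_sequence(apple, num_steps=3)
for image in sequence:
    print([round(p, 2) for p in image])
```

The first entry is the untouched apple and the last entry no longer depends on the apple at all, which mirrors the four pictures on the slide.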
The diffusion model then uses pictures like these as data to learn, using supervised learning, to take as input a noisy image and to output a slightly less noisy image. Specifically, it would create a data set where the first data point says: if it's given the second image as input, what we want the supervised learning algorithm to do is learn to output a cleaner version of this apple. Here's another data point: given this third, even noisier image, we would like the algorithm to learn to output a slightly less noisy version like this. And finally, given an image of pure noise like this fourth image, we would like it to learn to output a slightly less noisy picture that just suggests the presence of an apple.
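One way to picture this data set is as a list of (input A, output B) pairs, where each input is a noisier image and each output is the slightly cleaner image that came just before it. The sketch below uses text labels in place of real images, and the function name `make_training_pairs` is an assumption made for illustration.

```python
def make_training_pairs(noising_sequence):
    """Pair each noisier image (input A) with the slightly cleaner one before it (output B)."""
    return [
        (noising_sequence[i + 1], noising_sequence[i])
        for i in range(len(noising_sequence) - 1)
    ]

# Labels stand in for the four pictures on the slide, from clean to pure noise.
sequence = ["clean apple", "noisy apple", "noisier apple", "pure noise"]

pairs = make_training_pairs(sequence)
for input_a, output_b in pairs:
    print(f"input A: {input_a:<14} -> output B: {output_b}")
```

Four pictures yield three training examples, one for each pair of neighbouring noise levels.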
After training on maybe hundreds of millions of images via a process like this, when you want to apply it to generate a new image, this is how you would run it. You would start off with a pure noise image, so start by taking a picture where every single pixel in the picture is just chosen completely at random. We then feed this picture to the supervised learning algorithm that we trained on the previous slide. When we feed in pure noise, it has learned to remove a little bit of noise from this picture, and you may end up with a picture like this that suggests some sort of fruit in the middle, but we're not quite sure what it is yet. Given the second picture, we again feed it to the model, and it then takes away a little bit more noise, and now it looks like we can see a noisy picture of a watermelon. And then if you apply this one more time, we end up with this fourth image, which looks like a pretty nice picture of a watermelon. I'm illustrating this process using four steps of adding noise on the previous slide and four steps of removing noise on this slide.
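The generation loop described above, starting from pure noise and repeatedly feeding the output back in, might look roughly like this. It is a sketch, not a real model: `toy_denoise_step` is a hypothetical stand-in for the trained network, and it simply nudges each pixel halfway toward a fixed target pattern so that the loop has something to converge to.

```python
import random

def generate(denoise_step, num_pixels, num_steps, seed=0):
    """Start from pure noise and apply the denoising step repeatedly."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(num_pixels)]  # every pixel chosen at random
    for _ in range(num_steps):
        image = denoise_step(image)  # each call removes a little more noise
    return image

def toy_denoise_step(image, target=(0.2, 0.8, 0.3, 0.9)):
    """Hypothetical stand-in for a trained model: move each pixel halfway to a fixed target."""
    return [p + 0.5 * (t - p) for p, t in zip(image, target)]

result = generate(toy_denoise_step, num_pixels=4, num_steps=4)
print([round(p, 3) for p in result])
```

The number of steps is just a parameter of the loop, so the same code runs whether a handful of steps or many more are used.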
But in practice, maybe about 100 steps would be more typical for a diffusion model. So this algorithm will work for generating pictures completely at random, but we want to be able to control the image it generates by specifying a prompt to tell it what we want it to generate. Let me describe a modification of the algorithm that lets you add text, or add a prompt, to tell it what you want it to generate. In this training data, we're given pictures like this apple, as well as a description or prompt that could have generated this apple. So here I have a text description saying this is a red apple. Then we will, same as before, add noise to this picture until we get this fourth image, which is pure noise.
But we're going to change how we build the learning algorithm. Rather than inputting the slightly noisy picture and expecting it to generate a clean picture, we instead have the input A to the supervised learning algorithm be this noisy picture as well as the text caption or prompt that could have generated this picture, namely "red apple". Given this input, we now want the algorithm to output this clean picture of an apple. And similarly, we'll generate additional data points for the algorithm using the other noisy images, where each time, given a noisy image and the text prompt "red apple", we want the algorithm to learn to generate a less noisy picture of a red apple.
So, having learned from a very large data set, when you want to apply the algorithm to generating, say, a green banana, this is what you do. Same as before, we start off with an image of pure noise, so every single pixel is chosen completely at random. If you want to generate a green banana, you input to the supervised learning algorithm that picture of pure noise together with the prompt "green banana". Now that it knows you want a green banana, hopefully the algorithm will output a picture that maybe looks like this. You can't see the banana that clearly, but maybe there's a suggestion of some sort of greenish fruit in the middle. This is the first step of image generation. The next step is we then take this image on the right, which was the output B, and now feed that as the input A, again with the prompt "green banana", to get it to generate a slightly less noisy picture. Now we see it clearly looks like a green banana, but a pretty noisy one. We do this one more time, and it finally removes most of the noise, until we end up with that picture of a pretty nice green banana.
So that's how diffusion models work for generating images, and at the heart of this really magical process of generating beautiful images is, again, supervised learning. Thanks for sticking with me for this optional video, and I look forward to seeing you next week, where we'll dive much more into applications being built using generative AI. I'll see you in the next video.
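As a recap, the prompt-conditioned version described in this video can be sketched as follows. Everything here is illustrative: `make_conditioned_pairs`, `generate`, `toy_denoise_step`, and the `TOY_TARGETS` lookup table are hypothetical names, and the lookup table only stands in for what a trained model would have learned about each prompt.

```python
import random

def make_conditioned_pairs(noising_sequence, prompt):
    """Input A is (noisier image, prompt); output B is the slightly cleaner image."""
    return [
        ((noising_sequence[i + 1], prompt), noising_sequence[i])
        for i in range(len(noising_sequence) - 1)
    ]

# Hypothetical "learned knowledge": one tiny RGB-like pattern per prompt.
TOY_TARGETS = {
    "red apple": [0.9, 0.1, 0.1],
    "green banana": [0.2, 0.8, 0.2],
}

def toy_denoise_step(image, prompt):
    """Stand-in for a trained model: move halfway toward the pattern for this prompt."""
    target = TOY_TARGETS[prompt]
    return [p + 0.5 * (t - p) for p, t in zip(image, target)]

def generate(denoise_step, prompt, num_pixels, num_steps, seed=0):
    """Start from pure noise; pass the same prompt to the model at every step."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(num_pixels)]
    for _ in range(num_steps):
        image = denoise_step(image, prompt)  # output B becomes the next input A
    return image

pairs = make_conditioned_pairs(["clean", "noisy", "noisier", "pure noise"], "red apple")
banana = generate(toy_denoise_step, "green banana", num_pixels=3, num_steps=3)
print(pairs[0])
print([round(p, 3) for p in banana])
```

The only change from the unconditioned loop is that the prompt travels alongside the noisy image, both in the training pairs and at every generation step.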