This video introduces the PyMC3 library as a powerful tool for performing Bayesian analyses, demonstrating how it can extract meaningful insights from limited data (as few as 10 data points) by providing full parameter distributions rather than simple point estimates.
[Music]
hey everyone welcome back so today we're
going to be learning about machine
learning with just 10 data points
and now that we got that clickbaity
title out of the way the real topic of
this video
is PyMC3 which is a very cool library
in python
which markets itself as a probabilistic
coding framework but
i like to think that it's something that
really helps us to run our bayesian
analyses so we'll see that through the
course of this video
to keep things simple in this video
we're going to be looking at a linear
model which is something we have learned
very early on
in statistics and maybe learned several times
so just as a refresher a linear model
says that the true values of y
are generated according to this linear
process we'll just be keeping it as one variable
single variable regression so it's mx
plus b where m
is the slope and b is the intercept
pretty simple
but of course we don't observe these
true values because there's a little bit
of noise
added to them and so the values that we
observe which are called y
are equal to the true values plus some
normally distributed noise with mean
zero and standard deviation sigma so if
we look at these two equations for a
while there's three things that we're
going to care about in this video
m which is the slope b which is the
intercept and sigma which is the
standard deviation of the noise so let's
generate some data
where we're going to pick the true slope
to be 5 the true intercept will be 10
and the true sigma will be 1
for this video and if we generate 10
points 10 data points according to that
process it looks
like this for example so all the blue
points the 10 of them are data points
that we observe
the red line is the true equation y
equals mx plus b
of course we don't observe that but you
can just reference that against the
points we do have
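Here is a minimal sketch of how data like this could be generated. The true slope, intercept and sigma are the values stated in the video; the x range of 0 to 1, the seed, and the names `generate_data`, `xs` and `ys` are assumptions made for illustration.

```python
import random

TRUE_SLOPE = 5.0      # m, as stated in the video
TRUE_INTERCEPT = 10.0  # b, as stated in the video
TRUE_SIGMA = 1.0      # standard deviation of the noise, as stated in the video


def generate_data(n_points=10, seed=0):
    """Draw (x, y) pairs from y = m*x + b plus Normal(0, sigma) noise."""
    rng = random.Random(seed)
    # Assumption: x is drawn uniformly from [0, 1]; the video does not say.
    xs = [rng.uniform(0.0, 1.0) for _ in range(n_points)]
    ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0.0, TRUE_SIGMA) for x in xs]
    return xs, ys


xs, ys = generate_data()
```

With a different seed or x range the ten points will of course look different from the ones plotted in the video.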
now if we weren't thinking in a PyMC3
or a bayesian framework what would we do
we would just fit a usual linear model
so let's do that first to see if we can
do better
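As a rough sketch, the usual linear fit can be written out by hand with the closed-form least-squares formulas, which coincide with the maximum likelihood estimates under normal noise. The function name `fit_line` and the noise-free example points are assumptions for illustration, and dividing by n for sigma is the MLE convention; the video may have used a different divisor for its residual standard deviation.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept, sigma) for a one-variable least-squares fit."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # MLE of the noise standard deviation: divide by n, not n - 2.
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, sigma


# Illustrative noise-free points lying exactly on y = 5x + 10.
slope, intercept, sigma = fit_line([0.0, 1.0, 2.0, 3.0], [10.0, 15.0, 20.0, 25.0])
```

On noisy data the three returned numbers are single point estimates, with no distribution attached to them.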
so if we fit a regular linear model this
is the output so the true model is again
y true is equal to 5x
plus 10 and the true sigma is equal to 1.
if we fit a linear model and to get a
little bit mathy for a second
we are using the maximum likelihood
estimator the mle estimate in this case
the mle estimate of the slope is 5.9
which is pretty far away from five
and for the intercept is 9.8 which is
closer to 10
and then the standard deviation of the
residuals is 1.16 so these three
estimated values
are somewhat close some closer than
others to their true counterparts but
they're not
exact but the main issue is not really
how close they are but the main issue is
that these are what we call point estimates
in stats point estimates are kind of
just single estimates hey i think this
is the approximation that i'm going to
give you
what we usually want in stats is a whole distribution
of values for these estimated parameters
the reason we want a whole distribution
is so that we can start looking at and
say that okay the most likely thing
is this but it could also very likely be
this or this
or if the distribution is really narrow
then we have a lot of confidence in the
point estimate that we get back so we
would love to have a full distribution
and so that's where we start turning our
gears towards bayesian analysis
so let's work through some of the math
at a high level first and talk about why
we need or why it
is nice to have PyMC3 so first we start
with priors
this is always kind of a point of
contention in bayesian stats but
let's just pick some priors so we have a
prior for m which is the slope normally
distributed with mean 0
and standard deviation 20. b is the same
thing and sigma is going to be
exponentially distributed with parameter 1.
now let's talk about these upper two for
a second notice i put a standard
deviation of 20 which is a pretty big
number and so this is what we call a
relatively flat prior
by picking such a big standard deviation
in our prior
we are basically encoding this idea that
i don't really know what these variables
should be
therefore i'm going to center them at
zero and give them a big standard
deviation so i'm not really
locking them down into any value because
i don't really understand too well
what they should be
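To get a feel for what "relatively flat" means here, one option is to compare how much the prior density drops between zero and values like 5 or 10. This sketch uses `statistics.NormalDist` from the standard library; the helper name `relative_density` and the narrow comparison prior with standard deviation 1 are assumptions added for contrast, not something from the video.

```python
from statistics import NormalDist

flat_prior = NormalDist(mu=0.0, sigma=20.0)    # the prior used for m and b
narrow_prior = NormalDist(mu=0.0, sigma=1.0)   # hypothetical prior, for contrast only


def relative_density(dist, value):
    """Density at `value` divided by the density at the centre of `dist`."""
    return dist.pdf(value) / dist.pdf(dist.mean)


for value in (5.0, 10.0):
    print(
        value,
        relative_density(flat_prior, value),
        relative_density(narrow_prior, value),
    )
```

Under the wide prior a value of 10 is still almost as plausible as 0, while the narrow prior would effectively rule it out before seeing any data.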
now the likelihood function is what is
the probability of
seeing the data we actually observe
seeing the y values that we actually observe
given some setting of m b and sigma and
that's normally distributed with mean mx
plus b
and standard deviation sigma as we have
from the linear model
above now the posterior is the most
important quantity in bayesian stats
and the posterior asks the question
about what is the probability
distribution of our parameters
m b and sigma given that we have some
observed data y
so it's asking the reverse question of
the likelihood and we know according to
bayes theorem that the posterior which
is this guy on the left is proportional
to the likelihood which is this guy here
times the prior the reason i've split up
the priors into three pieces is because
we have this implicit assumption
that the priors are independent of each
other now this doesn't generally have to
be true we can always have our priors be
dependent on each other in interesting
ways that would just complicate the
analysis a little bit today we'll keep
it simple
and so we have that the posterior is
proportional to this quantity here
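The equations being pointed to on screen are not part of this transcript. Reconstructed from the spoken description only, they would look roughly like the following, where the index i over the ten observations and the convention of writing the normal distribution with its variance as second argument are notational assumptions.

```latex
\begin{align*}
m &\sim \mathcal{N}(0,\, 20^2), \qquad
b \sim \mathcal{N}(0,\, 20^2), \qquad
\sigma \sim \mathrm{Exponential}(1) \\
y_i \mid m, b, \sigma &\sim \mathcal{N}(m x_i + b,\, \sigma^2) \\
p(m, b, \sigma \mid y) &\propto p(y \mid m, b, \sigma)\; p(m)\; p(b)\; p(\sigma)
\end{align*}
```

The last line is Bayes' theorem with the prior split into three factors, which is exactly where the independence assumption on the priors is used.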
now the difficult part with bayesian
stats is that we want to sample from the
posterior i want to get samples of m
b and sigma from the posterior but the posterior
is the multiplication of a normal
distribution another normal another
normal and an exponential
which doesn't seem mathematically fun to
work with or
code and that's where PyMC3 comes in
PyMC3 says that
you don't need to know anything
mathematically about the posterior
just tell me what the priors are just
tell me what the likelihood is
and i'm going to use mcmc behind the scenes
to sample from this posterior for you
and that's the power of
PyMC3 but with that great power comes
great responsibility and we should still
understand what's
going on it's just that we kind of give
the work away to somebody who can handle
it more efficiently than us
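To make "what's going on" a bit more concrete, here is a minimal sketch of MCMC on this posterior using random-walk Metropolis in plain Python. This is not what PyMC3 does internally: its default sampler for continuous parameters is NUTS, a more sophisticated algorithm, and Metropolis is swapped in here only because it fits in a few lines. The step size, starting point, seed, example data and the names `log_posterior` and `metropolis` are all assumptions.

```python
import math
import random


def log_posterior(m, b, sigma, xs, ys):
    """Unnormalised log posterior: log likelihood plus the three log priors."""
    if sigma <= 0.0:
        return -math.inf  # the exponential prior has no mass at sigma <= 0
    # Normal(0, 20) priors on m and b, Exponential(1) prior on sigma.
    log_prior = -(m * m) / (2 * 20.0**2) - (b * b) / (2 * 20.0**2) - sigma
    log_lik = sum(
        -math.log(sigma) - (y - (m * x + b)) ** 2 / (2 * sigma * sigma)
        for x, y in zip(xs, ys)
    )
    return log_prior + log_lik


def metropolis(xs, ys, n_samples=1000, step=0.3, seed=0):
    """Return a list of (m, b, sigma) samples from a random-walk Metropolis chain."""
    rng = random.Random(seed)
    current = (0.0, 0.0, 1.0)  # arbitrary starting point (assumption)
    current_lp = log_posterior(*current, xs, ys)
    samples = []
    for _ in range(n_samples):
        proposal = tuple(v + rng.gauss(0.0, step) for v in current)
        proposal_lp = log_posterior(*proposal, xs, ys)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Illustrative data from y = 5x + 10 with unit noise (x grid is an assumption).
data_rng = random.Random(1)
xs = [i / 10 for i in range(10)]
ys = [5.0 * x + 10.0 + data_rng.gauss(0.0, 1.0) for x in xs]
samples = metropolis(xs, ys)
```

Notice that the code only ever evaluates the prior times the likelihood, never the normalised posterior, which is why it is enough to tell the library what the priors and the likelihood are.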
and so let's look at the code this is
the easiest part of the video the PyMC3
coding so at the very top i imported pymc3 as pm
so with pm.Model() as model so you always
start a PyMC3 program like this you specify your
priors so we have three priors sigma is
exponentially distributed with lambda
equals one
the intercept and the slope are both
normally distributed with mean
zero and standard deviation 20. so
that's all encoded there
so the likelihood is normally
distributed with mean slope
times x values plus intercept mx plus b
and standard deviation equals sigma and
then we go ahead and put the observed
values that we actually get
which are those 10 blue points at the
very beginning of this video
and then we go ahead and say i want you
to sample a thousand
samples from the posterior and this
cores equals four says i want you to do
that four times independently
so that we can see how those four
independent runs compare to see if they
line up or not
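The code shown on screen is not included in the transcript, so the following is a reconstruction from the spoken description rather than the author's exact code. The variable names `sigma`, `intercept` and `slope` follow the narration; the function wrapper `fit_linear_model`, the observed-variable name `"y"`, and the imports placed inside the function (so the sketch can be loaded without PyMC3 installed) are assumptions. It also assumes a PyMC3 version that accepts the `sigma` keyword for `pm.Normal`.

```python
def fit_linear_model(x_values, y_values, draws=1000, cores=4):
    """Build the linear model from the video and sample from its posterior."""
    # Imported here so this sketch can be loaded without PyMC3 installed.
    import numpy as np
    import pymc3 as pm

    x = np.asarray(x_values, dtype=float)
    y = np.asarray(y_values, dtype=float)

    with pm.Model() as model:
        # Priors: exponential for the noise scale, wide normals for the line.
        sigma = pm.Exponential("sigma", lam=1.0)
        intercept = pm.Normal("intercept", mu=0.0, sigma=20.0)
        slope = pm.Normal("slope", mu=0.0, sigma=20.0)

        # Likelihood: observed y is normal around slope * x + intercept.
        pm.Normal("y", mu=slope * x + intercept, sigma=sigma, observed=y)

        # Draw posterior samples; PyMC3 picks the sampler itself.
        trace = pm.sample(draws, cores=cores)

    return model, trace
```

Strictly speaking `cores` sets how many CPU cores are used in parallel; as far as I know, when the number of chains is not given PyMC3 runs one chain per core, which is how `cores=4` ends up giving four independent runs.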
and the results drumroll please and by
the way this took
about two minutes to complete so it is
kind of a time-intensive process
i guess it depends on the strength of
your computer as well but at the end of
the day
we have plots that look like this so
let's pause here because this is the
main plot we're going to need to analyze
in this video
so let's actually look at the right hand
side these are kind of jagged looking
plots here
so notice that this goes from 0 to 1000
so this is the samples each iteration of
the samples from sample 1 to sample 1000
and you can see if we zoomed into the
early points here you might see a lot of
change but you can see eventually they
converge around
10 5 and maybe one point something
respectively for the intercept slope and sigma
and if we were to take all those samples
and plot distributions of them
we get plots that look like this notice
that there are kind of four overlapping
density plots here and that's because we
did four independent runs of a thousand samples
but you can see that the shapes look
generally the same now as nice as these
plots are i
replotted them for ourselves so we can
put some additional information on them
to tie this whole story together
so here's the plot of the slope so you
can see this power of the bayesian
analysis is that now we have a full distribution
for sampling from this slope from this m
you can see this black line which is the
true slope of five and you can see that
the solid blue line
is the posterior mean or the average of
all the samples from the posterior
distribution for this particular parameter
and the dashed blue line is the mle
estimate so another
point i want to drive home is that
is that we don't often use bayesian
stats because we want to do a better
point estimate
you can see the point estimates which
means the posterior mean and the mle estimate
are pretty much the same here it's just
that using bayesian stats we get a whole distribution
which gives us a better idea about how
confident we should be in this posterior mean
and these red lines are the standard
deviations away from the posterior mean
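The lines drawn on a plot like this can be computed directly from the posterior samples. A minimal sketch, assuming the red lines sit one standard deviation either side of the mean (the video does not say how many), and using stand-in samples from a normal distribution in place of real posterior draws; the name `summarize` is hypothetical.

```python
import random
import statistics


def summarize(samples):
    """Posterior mean, standard deviation, and the mean +/- one sd band."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    return {"mean": mean, "sd": sd, "lower": mean - sd, "upper": mean + sd}


# Stand-in for real posterior draws of the slope (NOT output from PyMC3):
# 4 runs of 1000 samples pooled together would give 4000 values like this.
rng = random.Random(0)
stand_in_slope_samples = [rng.gauss(5.0, 1.0) for _ in range(4000)]
summary = summarize(stand_in_slope_samples)
```

A narrow band between `lower` and `upper` is what would justify a lot of confidence in the posterior mean as a point estimate.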
this is the intercept we can see we're
doing a pretty good job here too
sigma there's a little bit of a more
interesting story going on the true
value is here
this dashed line which was the mle
estimate is closer
this solid blue line which is the
posterior mean is a little bit further
away but the other interesting thing
about the sigma distribution is that you
can see the exponential prior kind of
hiding in there you can see this
posterior distribution is a little bit
skewed to the right like an exponential
would be
whereas these guys are looking more like
normal distributions so
you can see the choice of the prior does
affect the final outcome here
but the main point i wanted to get
across is that we can use this cool
library PyMC3
to get a lot of information out of just
10 or 5 data points
which we didn't necessarily get by just
doing a simple point estimate the
mle estimate so if you like this video
please like and subscribe for more
videos just like this and