YouTube Transcript:
Machine Learning with 10 Data Points - Or an Intro to PyMC3

Summary

Core Theme

This video introduces the PyMC3 library as a powerful tool for performing Bayesian analyses, demonstrating how it can extract meaningful insights from limited data (as few as 10 data points) by providing full parameter distributions rather than simple point estimates.

[Music]

hey everyone welcome back so today we're

going to be learning about machine

learning with just 10 data points

and now that we got that clickbaity

title out of the way the real topic of

this video

is PyMC3 which is a very cool library

in python

which markets itself as a probabilistic

coding framework but um

i like to think that it's something that

really helps us to run our bayesian

analyses so we'll see that through the

course of this video

to keep things simple in this video

we're going to be looking at a linear

model which is something we have learned

very early on

in statistics and maybe learned several times

so just as a refresher a linear model

says that the true values of y

are generated according to this linear

process we'll just be keeping it as one variable

single variable regression so it's mx

plus b where m

is the slope and b is the intercept

pretty simple

but of course we don't observe these

true values because there's a little bit

of noise

added to them and so the values that we

observe which are called y

are equal to the true values plus some

normally distributed noise with mean

zero and standard deviation sigma so if

we look at these two equations for a

while there's three things that we're

going to care about in this video

m which is the slope b which is the

intercept and sigma which is the

standard deviation of the noise so let's

generate some data

where we're going to pick the true slope

to be 5 the true intercept will be 10

and the true sigma will be 1
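Written out, the two equations being described are roughly the following. The notation is reconstructed from the spoken description, since the slide itself is not part of the transcript; the symbol names are assumptions.

```latex
y_{\text{true}} = m x + b, \qquad
y = y_{\text{true}} + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^{2})
```

With the values chosen in the video this gives m = 5, b = 10 and sigma = 1.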

for this video and if we generate 10

points 10 data points according to that

process it looks

like this for example so all the blue

points the 10 of them are data points

that we observe

the red line is the true equation y

equals mx plus b

of course we don't observe that but you

can just reference that against the

points we do have
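A minimal sketch of how ten points like these could be generated, using only the Python standard library. The video does not say how the x values were chosen, so the uniform range [0, 1], the seed, and the function name `generate_data` are assumptions made for illustration.

```python
import random

TRUE_SLOPE = 5.0       # m, as stated in the video
TRUE_INTERCEPT = 10.0  # b, as stated in the video
TRUE_SIGMA = 1.0       # standard deviation of the noise, as stated in the video


def generate_data(n=10, seed=0):
    """Draw n (x, y) pairs from y = m*x + b + Normal(0, sigma) noise."""
    rng = random.Random(seed)
    # Assumption: x is uniform on [0, 1]; the video does not give the x range.
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0.0, TRUE_SIGMA) for x in xs]
    return xs, ys


xs, ys = generate_data()
```

Fixing the seed makes the ten points reproducible, which helps when comparing a point estimate against a posterior later on.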

now if we weren't thinking in a PyMC3

or a bayesian framework what would we do

we would just fit a usual linear model

so let's do that first to see if we can

do better

so if we fit a regular linear model this

is the output so the true model is again

y true is equal to 5x

plus 10 and the true sigma is equal to 1.

if we fit a linear model and to get a

little bit mathy for a second

we are using the maximum likelihood

estimator the mle estimate in this case

the mle estimate of the slope is 5.9

which is pretty far away from five

and for the intercept is 9.8 which is

closer to 10

and then the standard deviation of the

residuals is 1.16 so these three

estimated values

are somewhat close some closer than

others to their true counterparts but

they're not

exact but the main issue is not really

how close they are but the main issue is

that these are what we call point estimates

in stats point estimates are kind of

just single estimates hey i think this

is the approximation that i'm going to

give you
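For reference, here is a sketch of how such point estimates can be computed by hand. With normally distributed noise, the maximum likelihood estimates of the slope and intercept coincide with ordinary least squares. The function name `fit_line` and the toy data are illustrative assumptions. The video does not say whether its residual standard deviation divides by n or by n - 2; this sketch uses n, which is the maximum likelihood version.

```python
import math


def fit_line(xs, ys):
    """Closed-form least-squares fit; returns (slope, intercept, sigma_hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Maximum likelihood estimate of sigma divides by n, not n - 2.
    sigma_hat = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, sigma_hat


# Toy noise-free data on the line y = 5x + 10, used only to check the formulas.
slope, intercept, sigma_hat = fit_line([0.0, 1.0, 2.0, 3.0], [10.0, 15.0, 20.0, 25.0])
```

Each returned value is a single number, which is exactly the limitation discussed here: nothing in the output says how uncertain the estimate is.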

what we usually want in stats is a whole distribution

of values for these estimated parameters

the reason we want a whole distribution

is so that we can start looking at it and

say that okay the most likely thing

is this but it could also very likely be

this or this

or if the distribution is really narrow

then we have a lot of confidence in the

point estimate that we get back so we

would love to have a full distribution

and so that's where we start turning our

gears towards bayesian analysis

so let's work through some of the math

at a high level first and talk about why

we need or

it's nice to have PyMC3 so first we start

with priors

this is always kind of a point of

contention in bayesian stats but

let's just pick some priors so we have a

prior for m which is the slope normally

distributed with mean 0

and standard deviation 20. b is the same

thing and sigma is going to be

exponentially distributed with parameter 1.

now let's talk about these upper two for

a second notice i put a standard

deviation of 20 which is a pretty big

number and so this is what we call a

relatively flat prior

by picking such a big standard deviation

in our prior

we are basically encoding this idea that

i don't really know what these variables

should be

therefore i'm going to center them at

zero and give them a big standard

deviation so i'm not really

locking them down into any value because

i don't really understand too well

what they should be

now the likelihood function is what is

the probability of

seeing the data we actually observe

seeing the y values that we actually observe

given some setting of m b and sigma and

that's normally distributed with mean mx

plus b

and standard deviation sigma as we have

from the linear model

above now the posterior is the most

important quantity in bayesian stats

and the posterior asks the question

about what is the probability

distribution of our parameters

m b and sigma given that we have some

observed data y

so it's asking the reverse question of

the likelihood and we know according to

bayes theorem that the posterior which

is this guy on the left is proportional

to the likelihood which is this guy here

times the prior the reason i've split up

the priors into three pieces is because

we have this implicit assumption

that the priors are independent of each

other now this doesn't generally have to

be true we can always have our priors be

dependent on each other in interesting

ways that would just complicate the

analysis a little bit today we'll keep

it simple

and so we have that the posterior is

proportional to this quantity here
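The quantities on screen at this point can be reconstructed from the spoken description roughly as follows. The notation is an assumption, and the product over the ten points assumes the observations are independent given the parameters.

```latex
\begin{aligned}
m &\sim \mathcal{N}(0,\, 20^{2}), \qquad
b \sim \mathcal{N}(0,\, 20^{2}), \qquad
\sigma \sim \mathrm{Exponential}(1) \\
y_i \mid m, b, \sigma &\sim \mathcal{N}(m x_i + b,\ \sigma^{2}) \\
p(m, b, \sigma \mid y) &\propto
\Bigl[\prod_{i=1}^{10} p(y_i \mid m, b, \sigma)\Bigr]\, p(m)\, p(b)\, p(\sigma)
\end{aligned}
```

The three separate prior factors on the right are the independence assumption mentioned above.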

now the difficult part with bayesian

stats is that we want to sample from the

posterior i want to get samples of m

b and sigma from the posterior but the posterior

is the multiplication of a normal

distribution another normal another

normal and an exponential

which doesn't seem mathematically fun to

work with or

code and that's where PyMC3 comes in PyMC3

says that

you don't need to know anything

mathematically about the posterior

just tell me what the priors are just

tell me what the likelihood is

and i'm going to use mcmc behind the scenes

to sample from this posterior for you
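To give a feel for what "using MCMC behind the scenes" means, here is a hand-rolled random-walk Metropolis sampler for this model. This is a swapped-in teaching technique, not what PyMC3 actually runs (for continuous models it defaults to the more sophisticated NUTS sampler). The function names, step size, starting point, and toy data are all assumptions for illustration.

```python
import math
import random


def log_posterior(m, b, sigma, xs, ys):
    """Unnormalised log posterior: log likelihood plus log priors."""
    if sigma <= 0:
        return -math.inf  # the Exponential(1) prior has no mass at sigma <= 0
    # Normal(0, 20) priors on m and b, Exponential(1) prior on sigma (constants dropped).
    log_prior = -0.5 * (m / 20.0) ** 2 - 0.5 * (b / 20.0) ** 2 - sigma
    # Normal likelihood with mean m*x + b and standard deviation sigma.
    log_lik = sum(
        -math.log(sigma) - 0.5 * ((y - (m * x + b)) / sigma) ** 2
        for x, y in zip(xs, ys)
    )
    return log_prior + log_lik


def metropolis(xs, ys, draws=2000, step=0.3, seed=0):
    """Random-walk Metropolis over (m, b, sigma); returns a list of tuples."""
    rng = random.Random(seed)
    current = (0.0, 0.0, 1.0)  # arbitrary starting point
    current_lp = log_posterior(*current, xs, ys)
    samples = []
    for _ in range(draws):
        # Propose a small Gaussian jump in all three parameters at once.
        proposal = tuple(v + rng.gauss(0.0, step) for v in current)
        proposal_lp = log_posterior(*proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Toy data from the same process as in the video (x range is an assumption).
data_rng = random.Random(1)
xs = [i / 10 for i in range(10)]
ys = [5.0 * x + 10.0 + data_rng.gauss(0.0, 1.0) for x in xs]
samples = metropolis(xs, ys)
```

Only the unnormalised posterior is ever evaluated, which is why the sampler needs nothing more than the priors and the likelihood. Early samples depend on the starting point and are usually discarded before summarising.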

and that's the power of PyMC3

but with that great power comes

great responsibility and we should still

understand what's

going on it's just that we kind of give

the work away to somebody who can handle

it more efficiently than us

and so let's look at the code this is

the easiest part of the video the PyMC3

coding so at the very top i imported

pymc3 as pm

so with pm.Model() as model so you always

start a PyMC3

program like this you specify your

priors so we have three priors sigma is

exponentially distributed with lambda

equals one

the intercept and the slope are both

normally distributed with mean

zero and standard deviation 20. so

that's all encoded there

so the likelihood is normally

distributed with mean slope

times x values plus intercept mx plus b

and standard deviation equals sigma and

then we go ahead and put the observed

values that we actually get

which are those 10 blue points at the

very beginning of this video

and then we go ahead and say i want you

to sample a thousand

samples from the posterior and this

cores equals four says i want you to do

that four times independently

so that we can see how those four

independent runs compare to see if they

line up or not
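The on-screen code is not part of the transcript, so the following is a reconstruction from the spoken description rather than the author's exact code. The wrapper function `sample_posterior` is a hypothetical addition for illustration; the variable names `sigma`, `intercept` and `slope` follow the narration. It assumes PyMC3 3.7 or later (where the `sigma` keyword is available) and that `x` and `y` are 1-D NumPy arrays.

```python
def sample_posterior(x, y, draws=1000, cores=4):
    """Build the linear model described in the video and sample its posterior."""
    # Imported here so that defining this function does not itself require PyMC3.
    import pymc3 as pm

    with pm.Model() as model:
        # Priors: Exponential(1) on the noise scale, wide Normals on the line.
        sigma = pm.Exponential("sigma", lam=1.0)
        intercept = pm.Normal("intercept", mu=0.0, sigma=20.0)
        slope = pm.Normal("slope", mu=0.0, sigma=20.0)

        # Likelihood: observed y is Normal around slope * x + intercept.
        pm.Normal("y", mu=slope * x + intercept, sigma=sigma, observed=y)

        # Draw `draws` posterior samples per chain, using `cores` processes.
        trace = pm.sample(draws, cores=cores)
    return model, trace
```

Calling `sample_posterior(x, y)` with the ten observed points would return the model and the trace. In PyMC3 3.x, passing that trace to `pm.traceplot` inside the model context produces a figure with density plots on the left and per-iteration sample paths on the right.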

and the results drumroll please and by

the way this took

about two minutes to complete so it is

kind of a time-intensive process

i guess it depends on the strength of

your computer as well but at the end of

the day

we have plots that look like this so

let's pause here because this is the

main plot we're going to need to analyze

in this video

so let's actually look at the right hand

side these are kind of jagged looking

plots here

so notice that this goes from 0 to 1,000

so this is the samples each iteration of

the samples from sample 1 to sample 1000

and you can see if we zoomed into the

early points here you might see a lot of

change but you can see eventually they

converge around

10, 5, and maybe one point something

respectively for the intercept slope and sigma

and if we were to take all those samples

and plot distributions of them

we get plots that look like this notice

that there are kind of four overlapping

density plots here and that's because we

did four independent runs of a thousand samples

but you can see that the shapes look

generally the same now as nice as these

plots are i

replotted them for ourselves so we can

put some additional information on them

to tie this whole story together

so here's the plot of the slope so you

can see this power of the bayesian

analysis is that now we have a full distribution

for sampling from this slope from this m

you can see this black line which is the

true slope of five and you can see that

the solid blue line

is the posterior mean or the average of

all the samples from the posterior

distribution for this particular parameter

and the dashed blue line is the mle

estimate so another

point i want to drive home

is that we don't often use bayesian

stats because we want to do a better

point estimate

you can see the point estimates which

means the posterior mean and the mle estimate

are pretty much the same here it's just

that using bayesian stats we get a whole distribution

which gives us a better idea about how

confident we should be in this posterior mean

and these red lines are the standard

deviations away from the posterior mean
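The lines on this plot can be computed directly from the posterior samples. A small sketch, assuming the red lines sit one standard deviation on either side of the posterior mean (the video does not say how many). The function name `summarize` is hypothetical, and the three draws are placeholder numbers chosen to make the arithmetic easy to check, not results from the video.

```python
import statistics


def summarize(draws):
    """Posterior mean, standard deviation, and a +/- 1 sd band for one parameter."""
    mean = statistics.mean(draws)
    sd = statistics.stdev(draws)
    return {"mean": mean, "sd": sd, "lower": mean - sd, "upper": mean + sd}


# Placeholder draws standing in for real posterior samples of the slope.
toy_draws = [4.0, 5.0, 6.0]
summary = summarize(toy_draws)
```

A narrow band between `lower` and `upper` corresponds to the "a lot of confidence in the point estimate" case described earlier in the video.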

this is the intercept we can see we're

doing a pretty good job here too

sigma there's a little bit of a more

interesting story going on the true

value is here

this dashed line which was the mle

estimate is closer

this solid blue line which is the

posterior mean is a little bit further

away but the other interesting thing

about the sigma distribution is that you

can see the exponential prior kind of

hiding in there you can see this

posterior distribution is a little bit

skewed to the right like an exponential

would be

whereas these guys are looking more like

normal distributions so

you can see the choice of the prior does

affect the final outcome here

but the main point i wanted to get

across is that we can use this cool

library PyMC3

to get a lot of information out of just

10 or so data points

which we didn't necessarily get by just

doing a simple point estimate the

mle estimate so if you like this video

please like and subscribe for more

videos just like this and

https://www.youtube.com/watch?v=UF8uR6Z6KLc
