Core Theme: The content is a comprehensive overview of the AWS Certified AI Practitioner (AIF-C01) certification, detailing its curriculum, exam format, and the foundational AI/ML concepts covered, with a strong emphasis on AWS services like Amazon Bedrock and SageMaker.
Key Points:
The course prepares individuals for the AWS Certified AI Practitioner (AIF-C01) certification, covering traditional ML, managed AI services, and generative AI/LLMs.
Hey, this is Andrew Brown, your favorite cloud instructor, bringing you another free cloud certification course. This one is the AWS Certified AI Practitioner, also known as the AIF-C01. The way we're going to get certified is by doing lecture content and hands-on labs, and as always, I provide you a free practice exam so you can ace that exam, put it on your resume or LinkedIn, and go try to get that job you've been looking for. If you like courses like this one, the best way to support it is by purchasing the optional paid materials on exampro.co; for this course, that's at AIF-C01. This is where you'll get additional practice exams, cheat sheets, downloadable lecture slides, and more. If you do not know me, I've taught a lot of courses: Microsoft, AWS, GCP, Terraform, Kubernetes; you name it, I've taught it. So I'm looking forward to jumping into the AI.
Okay, hey, this is Andrew Brown, and we're at the start of our journey, asking the most important question first: what is the AI Practitioner? This is an AI certification teaching you the foundational knowledge of AI cloud workloads: AWS offerings around traditional ML pipelines, AWS offerings around managed AI services, and offerings around GenAI and large language models. The course code here is AIF-C01, so make sure you check the course code so that you know you're using the latest course. Consider the certification if you want to become an AI engineer or a data scientist, or if you have to work with AI in your developer job. If you don't know what an AI engineer is, it's someone that builds AI solutions using managed AI services; it could also be building ML pipelines or working with data scientists to some degree. You will want this certification if you're looking to architect business use cases for ML and GenAI. This certification is more focused on the C-suite and decision makers, to help them buy into the AWS ecosystem for AI/ML, but I'm going to cram in a bunch of developer stuff, because I know that people want to do this for real and not just talk about it. If you enjoy tasks like stats and math, working with data, and working with Python, then this is a career path for you. If you don't, you'd better watch out, because this stuff creeps up on you unexpectedly; for generative AI, though, it's not so much an issue. Here's our AWS certification roadmap, and again, this is just a suggestion, you can do these in any order you want, but I strongly suggest that before you do the AI Practitioner, you do the Cloud Practitioner, because a lot of those skills are expected for this one. I also need to remind you that AWS certifications do not validate programming, technical diagramming, code management, and many other technical skills that are required for obtaining technical roles, so do not assume that when you get a cert you can do the job. It's part of your learning journey, yes, of course, but you need to really make sure that you can do the skills.
How long will it take to pass? Well, if you're a beginner, 20 hours; if you're experienced, five hours. This is not a hard certification. I probably made the content harder than it had to be, but I want to prep you for your roles, for actually being able to do this stuff. You're looking at about a 10-hour average study time: spend half your time with lectures and labs, the other half with practice exams. I'm recommending you study one or two hours a day for possibly 14 days; again, it won't take long to get through this course. Watch the lecture content and do the hands-on labs. Now, this certification doesn't require any hands-on experience, but I really think that you should do it, because in practice and on paper are two completely different things. The labs are not hard here, and they will really help cement your knowledge. In some cases I'm keeping the lecture slides light because we're going to be doing the lab, so even if you don't do the labs, watch what I do so you get at least that experience. Get the paid practice exams, because this exam has new question types, and people said it threw them off. You can go over to the ExamPro platform and get your free practice exam; we also have paid ones, and you can find them at AIF-C01. Buying the paid ones supports more of this free content; we really appreciate your support, and this stuff is hard to make.
Let's talk about the domains for the exam. There are five domains, and each domain has its own weighting; this determines how many questions will show up per domain. Domain one is Fundamentals of AI and ML; domain two is Fundamentals of Generative AI; domain three is Applications of Foundation Models (I love just saying applications as "apps", and by the way, this is not a spelling mistake, I copied and pasted it: it's "foundation models", though "foundational models" is also correct); domain four is Guidelines for Responsible AI; and domain five is Security, Compliance, and Governance for AI Solutions, which there's not a lot to talk about, so they really over-emphasize it when there's not much to say. But domains two and three are all GenAI, so I put a lot of GenAI in this course. Amazon Bedrock is done end to end for this, so you're in really good shape; I probably have the best course for the AI Practitioner for the Bedrock stuff. SageMaker, I do an okay job of. SageMaker used to be SageMaker Studio Classic, and they've migrated over to this new experience, which is not very good, so I'm kind of grumpy when making the content for SageMaker, because I miss the old experience, and I think AWS has kind of not done a good job reimagining that solution.
But anyway, where do you take this exam? In person or online. AWS uses Pearson VUE for their proctored online exam system and also for their test-center network. PSI is gone; if you remember PSI from a long time ago, AWS is not using them anymore. The experience with PSI hadn't been great, but I also think the reason AWS is going with a single provider now is that they can leverage that platform to its maximum and add new features, like new exam question types, which we'll talk about in a moment. A proctor is someone that watches the exam; the idea is they're there to make sure you do not cheat, so understand that's a component of the test experience. The grading here is 700 out of 1,000 for a passing score. I put an asterisk there because it's around 70%: because it uses scaled scoring, you could technically fail at 70%, so always aim higher. There are 65 questions on this exam, 50 scored and 15 unscored; you can get 15 scored questions wrong, and there's no penalty for wrong answers. The format of the questions is multiple choice and multiple answer, but also ordering, matching, and case studies, for sure, on this exam. Right now the exam is in beta at the time of making this video, so they might change and get rid of those question types, because people don't like them, but understand that AWS is trying new exam question types and you'll experience them. Our platform simulates them, so you'll be in good shape if you use our practice exams; not all providers can even simulate things like case studies. We absolutely have that in spades, and we've been doing it well before this, so it was just coincidental that AWS decided to do that.
Fifteen questions on the exam are unscored; they will not count towards your final score. Why are they unscored? Unscored questions are used to evaluate the introduction of new questions, to determine whether the exam is too easy and the passing score or question difficulty needs to be increased, and to discover users who are attempting to cheat. So there are lots of reasons why they do this. If you encounter questions you've never studied for that seem really hard, keep your cool; remember, they may be unscored questions. The duration of the exam is 2.5 hours, and you get about 1.5 minutes per question: that's 120 minutes of exam time within 150 minutes of seat time. Seat time refers to the time you should allocate for the exam; this includes reviewing the instructions, showing up for the online proctor to look at your workspace, reading and accepting the NDA, completing the exam, and providing feedback at the end. If it seems like I'm tired, it's because I shot this three times (my microphone wasn't on), so my voice is kind of wearing out, but we'll get through this. The exam is valid for 36 months, so three years before recertification. I don't know that for certain, because at the time of this exam they didn't say, but the general rule is that certs for AWS are always three years. If you're going to get recertified, you'll probably get it for free through AWS Skill Builder; they're always trying to do that.
Let's have some real talk about certifications. I have to remind you that cloud certifications expect you to have foundational technical skills like programming, scripting, SQL, IT networking, Linux and Windows servers, project management, developer tools, app development skills, computer science and algorithm skills, and more. If you do not have these skills and you get these certs, you cannot do the job, right? This only teaches you how to do ML and AI on the AWS platform, and it's missing a lot of stuff. AWS likes to position this certification as a fundamental exam, but I find there are tons of gaps with this one. I'm producing my own foundational, generic GenAI certification to really fill the gaps here, but in the meantime, to fill the gaps, leverage freeCodeCamp and their large catalog of general technical content; we at ExamPro also make additional materials beside the certification to really help you there, though that's only available in the subscription.
AWS itself does not care about AWS certifications for hiring for their own technical roles. Certifications serve as a structured way of learning, with a goalpost. Originally, certifications actually mattered: back in 2016-17, if you had an AWS certification, companies took notice. But now it's more of a learning-path thing. Newer certifications can be more valuable, so the reception of the AI Practitioner might be better, but I don't know at this point, so I don't want to give you false hope; still, it's good to learn this and stuff like that.
Understand that you might need to add 250 to 500 hours beside the certification to have the developer knowledge, or AI knowledge if you will, to perform this stuff. So again, just consider that there's additional work to be done if you want to work as an AI engineer or data scientist. We are going to add hands-on labs to help you fill the gaps here, so if you see me taking detours and it seems like we're doing long labs, I'm trying to help you out. You can watch them and not do them if you want, but you really should do them, because I'm giving you real-world skills here, and you folks keep saying that you want that, so I'm giving it to you. Some of the labs might even end up as failed implementations; though not for this certification, I think there was only one that was a bust, and it wasn't my fault. We were trying to do fine-tuning on Amazon Bedrock, and it just wasn't clear what the spend would be, and I did not want to end up with a $5,000 bill or something crazy, so I showed the process, and I did tell you that. This course has next to no failures; it's just that one. But understand that it's about seeing the problems, seeing what's worth using and what's not worth using, because these certifications are marketing tools to convince you to utilize these services, and I'm here as your community hero (I actually am an AWS Community Hero) to tell you the real truth about these services, which ones you should use and which ones to maybe avoid, and I want to be really clear about that. We do try our best to clean up infrastructure, but you should always be proactive and check whether resources are running; you're responsible for the cost and spend in your AWS account.
In the AWS Cloud Practitioner course I show you budgeting and such; I'm not showing it in this course, but I do in that one. And by the way, in this course I actually had unexpected spend. I usually don't have it, but I had it with SageMaker Canvas: it was almost $400 to $500 Canadian, because it's US dollars converted afterwards. It's just one of those services where they really mislead you, not intentionally, but because the UI was so bad; I really pointed that out, and I even tell you not to use SageMaker Canvas and to just watch me do it. So be very careful with spend. I do my best, but you are responsible; just remember that.
Okay, hey, this is Andrew Brown. I'm on the AWS training and certification pages, and we're looking at the AWS Certified AI Practitioner. I do want to point out that right now the exam is in beta, so generally I would recommend you wait for it to go out of beta, because beta means the exam questions are going to change, and often beta is for testing whether the exam is good or not, not really for getting that validation. Anyway, if you want to go sit it early, you can, but again, my recommendation is to wait. The exam guide is very unlikely to change; AWS doesn't usually change much from the beta experience, it's more about the exam questions, so this part is going to be fine. Let's scroll on down and take a look here and see what they're recommending: familiarity with AWS core services, the shared responsibility model, IAM, and global infrastructure. This is all stuff that gets covered in the AWS Cloud Practitioner, so you should have your Cloud Practitioner before proceeding to this certification. Things you don't have to do: develop or code ML algorithms, implement data engineering or feature engineering techniques, tune hyperparameters, build and deploy AI pipelines, or conduct math or statistics. Basically, you don't have to do any hands-on. But I'm going to tell you, I packed hands-on stuff back in, because I think that if you do some hands-on, it's going to really help cement that information in your head, and there's no reason not to do it. We can read something on paper, but that has nothing to do with what's actually happening, so you should do hands-on labs, and I have hands-on labs for you. I have a lot around Amazon Bedrock, just because I feel it should have been strengthened more in this certification, or just the knowledge in general, because it's such a large product, so I spend a lot of time in Bedrock. Let's scroll down here and take a look. We have multiple choice, multiple response, ordering, matching, and case studies. The last three are new (well, not new if you're from Azure, because they have similar things over there), but yeah, these are new question types. We have 15 unscored questions. Continuing on: the results are between 100 and 1,000, with a minimum passing score of 700. Your score report can contain tables of classifications of performance, which I'm not really interested in.
We'll scroll on down and take a look at the domains: we have Fundamentals of AI and ML, Fundamentals of Generative AI, Applications of Foundation Models, Guidelines for Responsible AI, and then Security. So we'll take a look here: they rattle off a bunch of different terms, and I do my best to cover as much as I can. The problem is that it's not very succinct about exactly what it is they want you to know, and because we're in beta right now, I don't know exactly what's going to show up on there, but I did a lot of coverage here, as much as I can. Over here we have "recognize appropriate AI workloads", so they're just talking about when you should use them and when you should not, and whether you know all the managed services; we cover all those managed services here. Then we're talking about SageMaker and the ML pipeline, all the steps, and all the core SageMaker features and services you should know. Then there's model performance, and I give this a bit of extra time in the course, just because it becomes valuable later down the road; it's not super technical, though, so you're not going to have a hard time with it. For GenAI we have a lot of stuff, and I really dot the i's on GenAI, not because I'm huge on GenAI, but because I just happen to be building a lot of GenAI projects, so I was able to pack in a lot of good stuff here, and I think this is where a lot of companies' focus is going to be when they're taking the AI Practitioner, so there's a lot of information on that. Then they're talking about more of the other services, like PartyRock, the Bedrock playground, and Amazon Q. By the way, Amazon Q is a terrible, terrible product. AWS keeps telling me it's new and improved every two weeks, and I come back and it's just garbage. I don't know why they keep promoting it; I guess they've invested a lot of energy into that product, and unfortunately it's just not very good. So sorry, I don't have anything nice to say about it; maybe in the future it's better, but every time I look at it, it's bad.
bad applications of foundation model so
yeah we're talking about not just
Foundation models but just types of what
do they call this application Foundation
model but yeah just general generative
AI knowledge it's weird that they just
have this here because it's basically
that section as well uh and then
responsible AI you know there's isn't
there isn't a whole lot to say about
responsible AI it's kind of weird that
it has uh so much attention here but we
literally just spend one video on it and
there's three other videos of services
to look look at but um you can pretty
much guess like what is responsible and
what's not so it's not like that hard
we'll go down below here um you know not
a whole lot to talk about security I
mean they listed a bunch of stuff in
here but some a lot of the things that
they were listing don't even exist yet
so um and not I don't mean because it's
a beta I just mean like they're talking
about things that just don't exist like
or they haven't been implemented so you
know again I think that this is just
adabs is not not doing a very good job
putting together these exam guides as
they used to they're really throwing a
lot of stuff at the wall here but that's
okay I'm going to make sure you come
through this uh uh pretty well here with
no problems there's the appendix of a
lot of services and what is in scope and
what's out of scope so here you can see
a bunch of services here um you know and
not all of them are in the course but I
I listed the ones that I thought were
most relevant and what my experience was
and what logically made sense and uh
And yeah, so there you go.

Hey, this is Andrew Brown, and we are taking a look at the definition of artificial intelligence, and we really want to put this up against the terms machine learning, deep learning, and generative AI, so that it's very clear what the differences are. Often people just say AI when they mean ML or deep learning, so understand that these terms are often not used correctly, but people will generally understand what you're trying to say, so it's not a big deal if you use them out of turn. But let's make sure that we know what they are. First, artificial intelligence, also known as AI: these are machines that perform jobs that mimic human behavior. That's the key thing here, they are human-like, doing tasks that you'd expect a human to do, and that is clearly a very broad definition, so you can see why so many things get attributed to being AI. Then you have machine learning, initialized as ML: machines that get better at a task without explicit programming. Now, of course, we have to code a machine learning model, but once we have that model and we pass things into it, it's able to complete its task with its very complex algorithms. So you could also think of it as a special algorithm that performs a task, which negates you having to do the calculations or programming yourself. Then we have deep learning, and when we think of a lot of the AI stuff, we're usually thinking of deep learning, because these are machines that have an artificial neural network, inspired by the human brain, to solve complex problems. You've probably seen a graphic of it: nodes that are interconnected and pass through layers. That's deep learning; a lot of people call it machine learning or AI, but no, that's DL. Then we have GenAI. GenAI is more of a marketing term, but generative AI is a specialized subset of AI that generates content such as images, video, text, and audio. Now, I don't have it in the graphic on the left, because it's hard to say exactly where it goes; it is a subset of AI, but technically GenAI often utilizes deep learning, because a lot of GenAI techniques, like large language models or vision models, are utilizing neural networks. So it is deep learning.
All right, so I know we keep talking about what AI is and what GenAI is, but we're going to cover it again so that it becomes clearer from different perspectives. Let's talk about what artificial intelligence is. AI is computer systems that perform tasks typically requiring human intelligence; these include things like problem solving, decision making, understanding natural language, and recognizing speech and images. An AI's goal is to interpret, analyze, and respond to human actions; it's there to simulate human intelligence in machines. When we use the word simulate, we're talking about mimicking aspects and resembling behaviors. What we're not talking about is emulation, which is replicating exact processes and mechanisms; that's what it would be if you literally created a virtual human brain. AI applications are vast and include areas such as expert systems, natural language processing (also known as NLP), speech recognition, robotics, and more. AI is used in various industries for all sorts of tasks: for business-to-consumer, think of a customer service chatbot; for e-commerce, think of a recommendation system; for the auto industry, maybe autonomous vehicles; for medical, medical diagnosis. There are a lot of applications for AI; it's a broad term covering all sorts of things.

Now let's take a look at generative AI. Generative AI, often initialized as GenAI, is a subset of AI that focuses on creating new content or data that is novel and realistic. It can interpret or analyze data, but it can also generate new data itself. The types of content it produces would be text, images, music, speech, and other forms of media. It often involves advanced machine learning techniques: it could be using things like GANs (generative adversarial networks), it could be using VAEs (variational autoencoders), and a lot of current LLMs use the Transformer architecture, so if you're using ChatGPT or Claude Sonnet or any of the popular ones, they're basically all Transformer architectures. GenAI has multiple modalities, and when we say modalities, think about your senses: you have touch, taste, hearing, smell. Modalities are the kinds of content a model works with. We have vision, so realistic images and videos; text, generating human-like text; audio, composing music; and molecular, which is a more interesting one: drug discovery via genomic data. And I want to make it clear again: large language models (LLMs) generate human-like text and are a subset of GenAI; it's just one modality of the many modalities, but it's often conflated with AI or GenAI as a whole, just because it's the most popular, most in-demand, and most developed right now. So just make sure you understand that GenAI and AI are not all about large language models; that's just one modality, one application, of the broad sense of AI and GenAI.

Now let's make sure we have a side-by-side comparison, and after this you'll definitely know the difference between AI and GenAI. In terms of functionality, AI focuses on understanding and decision making, whereas GenAI is about creating new and original outputs. For data handling, AI analyzes and makes decisions based on existing data; GenAI uses existing data to generate new and unseen outputs. In terms of applications, AI spans various sectors, including data analysis, automation, NLP, and healthcare, whereas GenAI (and yes, I see the spelling mistake) is creative and innovative, focusing on content creation, synthetic data generation, deepfakes, and design. So there you go.
Let's talk about Jupyter. Jupyter Notebook is a web-based application for authoring documents that combine live code, narrative text, equations, and visualizations. Before it was called Jupyter Notebook, it was known as IPython Notebook, and Jupyter notebooks were later overhauled and turned into an IDE called JupyterLab, which we'll talk about in a moment; you generally want to open notebooks in JupyterLab. The legacy web-based interface is known as Jupyter Classic Notebook, and to be honest, I get confused between JupyterLab and Classic. I think most things you use these days are JupyterLab, but the confusion is because we just call them all notebooks, even though Jupyter Classic Notebook is the older one and the newer one is JupyterLab. Let's take a look at JupyterLab. JupyterLab is the next-generation web-based user interface; it has all the familiar features of the classic Jupyter Notebook in a flexible and more powerful user interface, so it has notebooks, terminals, a text editor, a file browser, and rich outputs. The way you know you're using JupyterLab is that it will have these tabs on the side and a bunch of extra functionality. JupyterLab will eventually replace the classic Jupyter Notebook, and that's kind of true, but not fully, because in some places I still come across classic notebooks launching; for the most part, though, it has been functionally replaced. Then we have JupyterHub. JupyterHub is a server that runs JupyterLab for multiple users; it's intended for a class of students, a corporate data science group, or scientific research groups, and it has some components underneath. You will also come across notebook-like experiences that are like JupyterLab, where some companies extend its functionality. One example is SageMaker Studio Classic: for whatever reason, AWS spent all this time creating extensions and extending JupyterLab, and then they decided they're not going to have extensions anymore and will just use the vanilla version. There are also things like VS Code, which has notebooks, or Colab, which has notebooks; VS Code is its own kind of notebook implementation, not JupyterLab, but JupyterLab-compatible. So just understand that you'll come across things that are notebooks and look like JupyterLab but are not necessarily JupyterLab, okay?
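As an aside on the file format: under the hood, a notebook (.ipynb) is just a JSON document that interleaves narrative (markdown) cells with code cells, which is exactly the "live code plus narrative text" combination described above. Here's a minimal sketch using only the Python standard library; the cell contents and filename are made up for illustration, and the exact schema fields follow the nbformat specification:

```python
import json

# A minimal .ipynb document: one markdown (narrative) cell and one code cell.
notebook = {
    "nbformat": 4,         # major version of the notebook file format
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# My analysis\n", "Narrative text goes here."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": ["print(1 + 1)"]},
    ],
}

# Writing this JSON to disk yields a file JupyterLab (or the classic notebook) can open.
with open("example.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Opening example.ipynb in JupyterLab would render the markdown cell as formatted text and make the code cell executable.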
let's take a look at natural language
processing also known as NLP and in
machine learning it's a technique that
can understand the context of a corpus a
corpus is a body of related text the
text that you are working with and NLP
intersects with computer science and
Linguistics so if you know a lot about
the the nature of uh spoken and written
language then uh computer science here
is going to meet in the middle here so
that we can um make sense of it using
algorithms so NLP enables us to do
things like analyze and interpret text
within documents emails and messages
interpret or contextualize spoken texts
like sentiment analysis synthesiz speech
uh such as using a voice assistant
talking to you automatically translate
spoken or written phrases and sentences
between languages in uh interpret spoken
or written commands and determine
appropriate actions another thing you'll
hear a lot is language understanding
which is supposed to be it's a it's more
like a specialized subset of NLP um uh
that just goes farther to understand uh
more traditional older ways of doing NLP
but uh anyway what I'll do is we'll just
take a look at this um very simple
flowchart to give you some idea of
things that are related with an NLP this
is mostly just get you exposed to some
terms it's not important to remember
what these are and I can't even describe
them off the top of my head um but again
just get you exposure to NLP terms so
that when you see them later you'll go
look up and be like oh I remember seeing
that term here so here we have like text
wrangling pre-processing language
understanding so structure and syntax
processing functionality which is what
the NLP uh does for you in the end but
text text Rand pre-processing is where
you are preparing uh text to be uh put
into possibly um a machine learning
model or maybe you're using it for um
some kind of analysis or something like
that and so this is basically taking
text and um formatting it changing it
and so what could we be doing here well
we could be doing conversions maybe
we're lower casing things maybe we're
upper casing things um maybe we're
turning contractions into their full
forms or vice versa sanitation this is
where you are maybe stripping out HTML
or special characters or you are
removing stop wordss when uh you have
stop wordss later on in your ml models
tokenization which is conver converting
um the text into uh Vector embeddings we
have stemming okay we have uh lonization
so there's a lot of things here but you
can see it's mostly just like formatting
the text to be utilized for something
else we have language understanding so
these are processes to make sense of the
text so part of speech tagging so is
this an adjective is this a noun things
like that chunking how can we uh break
up the text and then work with those
chunks later on down the road so that
still makes sense dependency parsing so
you know which word relies on other
words and what relationships do they
have to other ones uh
consti constitu parsing very hard for
word for me to say but like imagine a um
a a tra GRE tra green and so like you
know a noun has an adjective under it
which has another thing under it you
look up if you look it up and go to
Google Images you'll you'll know what
I'm talking about then we have
processing functionality what are we
using NLP 4 so we have name and
recognition this is where you have a
body of text and it's highlighting uh
important words like maybe important
nouns that it thinks you you care about
or things like that or personally
identifiable information we got engrams
sentiment analysis is this text positive
negative happy sad information
extraction what are we trying to get out
of a large body of text yeah um same
thing with information retrieval
questioning and answering topic modeling
so you know again not super important to
know these in depth right now but the
things that are important we will see
these terms again um and you'll know
what they are then so don't worry about
trying to memorize this now but just get
that exposure to NLP terms
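The preprocessing steps above (conversion, sanitization, tokenization, stop-word removal) can be sketched as a small pure-Python pipeline. The stop-word list and sample text here are made up for illustration; real pipelines use larger lists, e.g. from NLTK or spaCy.

```python
import re

# A hypothetical, tiny stop-word list just for this example
STOP_WORDS = {"the", "is", "a", "an", "to", "of"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip HTML/special characters, tokenize, remove stop words."""
    text = text.lower()                      # conversion: lowercasing
    text = re.sub(r"<[^>]+>", " ", text)     # sanitization: strip HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)    # sanitization: special characters
    tokens = text.split()                    # naive whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("<p>The weather IS going to be sunny!</p>"))
# → ['weather', 'going', 'be', 'sunny']
# A fuller pipeline would continue with stemming or lemmatization.
```

This is only the "formatting the text" half; language understanding (POS tagging, parsing) would sit on top of these tokens.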
okay hey this is Andrew Brown and we're
looking at the concept of a regression
and this is a process of finding a
function to correlate a label data set
into a continuous variable or number so
imagine we need to predict a variable in
the future such as the weather what is
it going to be next week and so the idea
is that you're going to plot your data
onto a graph or vector space our dots
are represented as vectors um and we're
going to draw a line through it which we
call a regression line and the point of
the regression line is that is our
prediction so if this is going over time
based on the temperature um you know uh
that is how we are figuring out in the
future what things are going to be so
the distance of a vector from the
regression line I'm going to just get
out a different colored pen tool other
than red so maybe cyan so imagine this
dot here to the line that's what we're
going to call an error because the idea
is that things that are closer to the
line fit the prediction and things that
are farther away from the line are an
error from the line so hopefully that
makes sense there are different
regression algorithms we can use to
predict future variables and different
error metrics that measure how far the
points sit from the line such as mean
squared error root mean squared error
and mean absolute error so based on the
algorithm and error metric you use to
fit your line that's going to change
the prediction
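Here is a minimal sketch of fitting a regression line with ordinary least squares and computing the error metrics just mentioned. The temperature numbers are made-up toy data, not from the course.

```python
# Toy data: x = week number, y = temperature
xs = [1, 2, 3, 4, 5]
ys = [10.0, 12.0, 13.5, 15.0, 17.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares slope/intercept for the line y = a + b*x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

preds = [a + b * x for x in xs]

# Error metrics: how far each actual point (vector) sits from the line
mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n
rmse = mse ** 0.5
mae = sum(abs(y - p) for y, p in zip(ys, preds)) / n

print(f"next week's prediction: {a + b * 6:.2f}")
print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f}")
```

The regression line itself is the prediction; the metrics quantify the error of the points that don't fall on it.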
okay let's take a look at classification
this is the process of finding a
function to divide a label data set into
classes or categories so the idea here
is we're going to predict a category to
apply to the input of data so will it
rain next Saturday is it going to be
sunny or is it going to be raining so
the idea is uh we have our data we're
plotting it on a graph but we're drawing
a classification line that divides the
data set okay and the idea is that if it
falls on one side then it's sunny it
falls on the other side then it's rainy
and so again if you have a different
type of algorithm that's the thing
that's doing the division um it's going
to have different results you have a a
logistic regression a decision tree
random Forest you can use a neural
network you can use a Naive Bayes (I
always say that wrong so I do
apologize) or you can use KNN or you can
use a support Vector machine at or svm
so just understand that there could be
more algorithms of this but these are
the common ones and you know if you want
to learn more about how these different
algorithms will change the result just
look up on the internet what that
would look like
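Since KNN was just named as one of the common classification algorithms, here is a tiny nearest-neighbor classifier in plain Python. The features (humidity, pressure) and labels are invented toy data.

```python
# Labeled training data: (humidity, pressure) -> "sunny" / "rainy" (toy values)
train = [((0.2, 1.02), "sunny"), ((0.3, 1.01), "sunny"),
         ((0.8, 0.99), "rainy"), ((0.9, 0.98), "rainy")]

def classify(point, k=3):
    """k-nearest-neighbor: vote among the k closest labeled vectors."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(point, item[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)   # majority vote

print(classify((0.25, 1.015)))  # → sunny
print(classify((0.85, 0.985)))  # → rainy
```

The "line" dividing the classes here is implicit: a point falls on whichever side its nearest labeled neighbors dominate.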
let's talk about clustering this is the
process of grouping unlabeled data based
on similarities and differences the key
word here is unlabeled when we looked at
uh um classification that was labeled
data so the idea here is that we're
grouping based on similarities and
differences so imagine that this
grouping of dots that are close together
we determined that that is Windows and
this uh group of dots are Mac computers
and just like classification and regression you
have different algorithms they're going
to give you different results and the
reason why I show you these algorithm
names is because when you have to do
classification regression or uh
clustering uh you're going to see these
names because you're going have to
choose what algorithm you want to
utilize right now it's not so important
to uh know them but when they are
important we will look at them uh in
more detail
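A clustering algorithm gets no labels at all; it has to find the groups itself. Here is a plain-Python k-means sketch on made-up one-dimensional data (imagine some usage metric where two kinds of machines naturally cluster apart).

```python
import random

# Unlabeled toy data: two natural groups around 1.0 and 8.0
data = [1.0, 1.2, 0.8, 8.0, 8.5, 7.9]

def kmeans(points, k=2, iters=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

random.seed(0)
centers = kmeans(data)
print(centers)  # two group centers emerge, roughly 1.0 and 8.1
```

Notice no label ever appears: the algorithm only sees similarities (distances) between the unlabeled points.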
okay so we are going to dive into the
types of machine learning in other
slides in more detail but this is just
kind of an overview so that you can kind
of see these terms up front um so we'll
just quickly go through this here and
we're going to group them um based on
what they're trying to do so the first
is learning problems we have supervised
unsupervised reinforcement these are
three terms you're going to hear quite a
bit with machine learning uh the key
thing here is that supervised is where
you have labeled data and unsupervised
is where you're working with unlabeled
data for
reinforcement this is an agent that
operates in an environment and
must learn to operate using feedback and
this kind of sounds like agentic
workflows or agentic coding we're
talking about gen which we'll learn
about later but the idea is like imagine
you wanted to make a uh a machine
learning model that played the the Mario
or or the Sonic video game that'd be
using reinforcement learning okay then
we have hybrid learning problems so we
have semisupervised self- supervised
multi- instance so semisupervised is
where you have a mix of labeled and
unlabeled data you have a lot of
unlabeled data and a little bit of
labeled data and so that's kind of a a
mix between supervised and unsupervised
you have
self-supervised um and I believe that
this is where um the idea is that it can
label its own data I think but we'll
find out later on in future slides we
have multi- instance where we have um
examples of unlabeled data and so then
we just kind of bag them together um
again we'll cover that later on we have
statistical inference so here we have
inductive deductive and transductive
inductive is using evidence to determine
the outcome then we have deductive using
general rules to determine the specific
outcomes and then we have transductive
used to predict specific examples
from a specific domain okay then for learning
techniques we have multitask active
online transfer and Ensemble so
multitask is fitting a model on one data
set that addresses multiple related
problems active is the model is able to
query a human operator during the
learning process online is using
available data and updating the model
before a prediction is made kind of sounds
like RAG when we're talking about gen AI
but again this is just general machine
learning right so we have transfer where a
model is first trained on one task and
then some or all of the model is used as a
starting point for a related task
and then we have uh Ensemble where uh
two or more models are fit on the same
data and the predictions from each model
are combined so yeah we're going to see
these terms again but just trying to get
it up front here for you
okay let's take a look at the divisions
of machine learning this is just another
way to break up machine learning and
these terms you're going to see uh more
in how we're going to structure our
upcoming slides here so I just want to
give you a quick overview here so we
have classical machine learning and the
advantage of classical machine learning
is the data is simple you have clear
features um and generally classic
machine learning is extremely uh cost
efficient compared to other types of
machine learning but this is where you
have supervised unsupervised uh kind of
uh stuff so you know when you think of
classical machine learning think of
those two things supervised and
unsupervised um uh learning then you
have reinforcement learning this is uh
when there is no data and the idea is
that the model is going to through trial
and error figure out what is the right
thing to do this is where we have
real-time decision- making game AI so we
talked about Mario or sonic uh uh like
the ml model playing those games and
failing again and again and again until
it can pass the game a learning task or
robot navigation so think of autonomous
driving vehicles that would be a good
case for reinforcement learning we have
Ensemble methods when uh quality of data
is a problem so then you are going to
have different strategies to work with
multiple models or algorithms to have a
better outcome and here we have things
like bagging boosting stacking okay and
so you know you'll see those terms like
boosting you'll definitely see the word
boost more uh when we get to that then
we have neural networks and deep
learning you should just really think of
deep learning as neural networks this is
when the data is complicated and or the
features are unclear this is where you'd
use uh neural networks like a
convolutional neural network a
recurrent neural network a GAN
(generative adversarial network) a
multi-layer perceptron or MLP
autoencoders and I just have a really
hard time pronouncing these things but
yeah you're going to see these terms
again so don't worry about it right now
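The ensemble idea (bagging, boosting, stacking all build on it) is simply combining several models so the group outperforms any single one. A minimal sketch of the combining step, using three hypothetical toy "models" that are just threshold functions:

```python
# Three hypothetical weak "models" (plain functions here) that each make
# a binary prediction from a number; an ensemble combines their votes.
model_a = lambda x: 1 if x > 4 else 0
model_b = lambda x: 1 if x > 5 else 0
model_c = lambda x: 1 if x > 6 else 0

def ensemble_predict(x):
    votes = [model_a(x), model_b(x), model_c(x)]
    return 1 if sum(votes) >= 2 else 0   # majority vote

print(ensemble_predict(5.5))  # models vote (1, 1, 0) -> ensemble says 1
print(ensemble_predict(4.5))  # models vote (1, 0, 0) -> ensemble says 0
```

Real bagging or boosting differs in how the individual models are trained, but the "combine multiple models for a better outcome" step looks like this.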
now let's take a look here at classical
machine learning and so when we say
classical we're talking about algorithms
that have existed for quite a while
maybe as early as the 1950s because we
had these mathematicians and they
figured these out and a lot of these
things actually relate to um statistics
right so we're taking statistics um and
utilizing them uh in these algorithms in
our Computing spaces so hopefully that
makes sense but yeah it's they're called
classical ml because we are dealing with
algorithms and one example would be
nearest neighbor algorithm which was
invented in
1967 and lots of companies today
definitely could utilize classical
machine learning uh to solve business
problems just because they're old does
not mean that they're not good it's just
a matter of organizations knowing how to
adopt uh classical machine learning so
let's talk about first supervised
learning so this is where we have data
that has been labeled into categories
and this is great when we are doing
something that is Task driven we're
trying to make a prediction because the
idea is we have this labeled data and so
then we can bring unlabeled data and
tell the machine to label it right so
here we have classification so we want
an outcome this would be to predict
what category something belongs to a
use case here would be identity fraud
detection we have regression this is
where maybe we want to predict a
variable in the future so we're we're
trying to figure out a market forecast
um and we cover you know classical
regression so you should know what these
are um if not you will know about what
they are soon enough because we'll cover
them more than once um then for
unsupervised learning we have data that
has not been labeled okay this is
where things are datadriven so we
recognize a structure or a pattern we're
not making a very specific prediction um
here we have clustering so the outcome
of something so you group data based on
similarities or differences example here
would be targeted marketing Association
so find a relationship between variables
through Association the use case here
would be a customer
recommendation we have
dimensionality reduction so here we help
reduce the amount of data pre-processing
this is a problem you have a lot of data
and a use case here would be big
data visualization so yeah there you
go all right let's compare supervised
versus unsupervised learning and I know
we've already talked about it like twice
before but we're going to talk about it
again and then again because I'm just
trying to give it to you in different
perspectives so that you really know the
difference between these so let's talk
about what is supervised learning so
this is a machine learning task or
function that needs to be provided
training data and the training data is
when you provide labeled data the
correct answers and the Machine can
learn from those results so show me how
to do it and then I can do it on my own
that's what's happening here and so for
supervised learning models we have
classification and regression
what about unsupervised
learning this is a machine learning
task or function that needs no existing
training data for this it will take
the unlabeled data and discover its
patterns applying its own labels so I am
an independent worker I can figure this
out on my own right uh and for this
these unsupervised learning models we
really should have put the 'un' on
there let me just fix that
unsupervised we have clustering
association dimensionality
reduction and so supervised learning
tends to be more accurate than
unsupervised learning but requires more
upfront work whereas unsupervised
learning still requires human
intervention to validate the results so
hopefully that is clear
okay okay let's review it one more time
I know it's getting tiresome but it's
very important that you remember the
difference between supervised unsupervised
and reinforcement so supervised learning
is where the data has been labeled for
training it's task driven and you're
making a prediction this is when the
labels are known and you want a precise
outcome when you need a specific value
returned and so here we use classification
and regression as examples of supervised
learning there's more than just those
two but that's what I want you to know
for now we have unsupervised learning
data has not been labeled the ml model
needs to do its own labeling it is Data
driven you're recognizing a structure or
a pattern when the labels are not known
the outcome does not need to be precise
when you're trying to make sense of data
here we have clustering dimensionality
reduction Association then you have
reinforcement learning so there's no
data and there's an environment and an
ml model generates data and many
attempts to reach the goal this is
decision driven you have game AI
learning task robot navigation so
hopefully that is clear and it's in your
head um we are going to repeat these
again but it's going to be less of this
and more detail
okay let's talk about supervised
learning models and we're going to cover
classification and regression
again um just so that we really know
that we know what these things are so
classification is a process of finding a
function to divide a data set into
classes or categories so imagine will it
be cold or will it be hot tomorrow right
so very clear it's either one or the
other it's going to fall on one side of
the line or the other one we have
different algorithms we can use like
logistic regression K-nearest neighbors
support vector machines kernel SVMs
Naive Bayes decision tree
classification random forest
classification so we're listing a lot
more here we have what is regression
regression is a process of finding a
function to correlate a data set into a
continuous variable number so what is
the temperature going to be tomorrow and
here we have things like simple
linear regression multiple linear
regression polynomial regression support
vector regression decision tree
regression random forest regression just
again want to continuously repeat that
so you know what these things are
okay let's take a look at unsupervised
learning uh so what can we do here we
have clustering and again we've covered
these prior but I just really want to
make sure that you know what they are so
clustering is a process of grouping
unlabeled data based on similarities and
differences right so we used an example
previously um you know is this a Mac or
is it a Windows here it's about age and
something else and so it's saying you
know do these people have cholesterol
are they high-risk or
low-risk for clustering algorithms we
have K-means DBSCAN K-modes then we
have Association so Association is the
process of finding relationship between
variables through Association um so the
idea is that if somebody buys bread
then suggest butter because based on
previous combinations we know what
people want um so there are different
algorithms for that I cannot say those
words so I'm not going to attempt it you
can see them here on the right hand side
we have dimensionality reduction this is
where we're reducing the amount of data
while retaining the data integrity often
used as a pre-processing stage and we
have lots of algorithms for this
principal component analysis linear
discriminant analysis generalized
discriminant analysis singular value
decomposition latent Dirichlet
allocation (I can't say that word)
there's just too many
words that are too hard to say but
there's a lot for dimensionality reduction
yeah and so hopefully you can remember
those things classification regression
clustering association
dimensionality reduction
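The bread-and-butter association idea can be sketched by counting how often items co-occur in past baskets. The baskets below are invented toy data, and "support" here is the standard association-rule term for the fraction of baskets containing a pair.

```python
from collections import Counter
from itertools import combinations

# Past shopping baskets (toy data); association mining looks for items
# that frequently appear together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support for (bread, butter): fraction of baskets containing both
support = pair_counts[("bread", "butter")] / len(baskets)
print(f"bread+butter support: {support:.2f}")  # → 0.50
```

Real algorithms like Apriori build on exactly these co-occurrence counts to decide which "if bread then butter" rules are worth suggesting.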
okay let's take a look here at neural
networks and deep learning first
defining what are neural networks so
these are often described as mimicking
the brain you have a neuron or node
that represents an algorithm the data is
input into the neuron and based on the
output the data will be passed to one of
the many connected neurons the
connections between neurons are weighted
the network is organized into layers
there will be an input layer multiple
hidden layers and an output layer you
could technically have one hidden layer
but often you have multiple layers if
you have three or more now we're talking
about deep learning if you have less
than three then it's just a neural
network um and just look at the visual
for here for a moment because each node
or neuron remember has its
own algorithm for how it's
going to process that data and I'm
pretty certain that in most neural networks
the algorithm is going to be the same
for all the nodes but we'll talk about
that as we dig deeper into the neurons
themselves then there's the
concept of a feed forward neural network
which is abbreviated as FNN I don't know
why it's not FFNN but whatever so these
are neural networks where connections
between between nodes do not form a
cycle that means that they always move
forward so data moves forward okay we
don't have neural networks going back
and this way and that way they're just
going One Direction which is forward
then you have back propagation this is
where after everything has run through
it's going to
move backwards through the neural
network and adjust the
weights okay to improve the outcome on
the next iteration so after it's run it
actually has to update all the weights
and that is back propagation this is how
a neural network learns it has to do
back propagation okay then we have a
loss function so it's a function that
compares the ground truth to the
prediction to determine the error rate
so how bad the network performed ground
truth right is data that is labeled that
you know to be correct okay now we're
talking about how these neurons are
going to have their own algorithm right
because up here we say that uh it
represents an algorithm so this is where
we have these um algorithms which we
call activation functions so an
activation function is an algorithm
applied to a hidden layer node it's one
of these things right here let me just
get my pen out again one of these that
affects the connected output and so an
example of that would be ReLU I
don't know how to pronounce it properly
but I recognize it but we will be
looking at activation functions when we
look at neurons a bit soon here
there's the concept of density so when the
network layer increases the amount of
nodes we call it more dense uh and when
the layers decrease the the amount of
nodes we call it sparse okay so when we
see increase it's dense if it's
decreasing it's sparse um
and for deep learning algorithms we have
supervised and unsupervised just like
with classical machine learning um and
so on the supervised side we're going to
see things like FNNs RNNs CNNs so you
are passing in labeled data for this to
work for unsupervised learning we
have DBNs SAEs RBMs and it's not important
to really remember this but I just
wanted you to know that they have
supervised and unsupervised learning
for deep learning
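The layer structure just described (input layer passes values in, each hidden/output node sums its weighted inputs and applies an activation function) can be shown with a tiny forward pass in plain Python. The weights and biases here are made up; in a real network they would come from training via back propagation.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    """One dense layer: weighted sum of inputs per node, then activation."""
    return [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# Tiny network: 2 inputs -> 2 hidden nodes -> 1 output node (made-up weights)
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.1, -0.2]
out_w = [[1.2, -0.7]]
out_b = [0.05]

x = [0.9, 0.1]                       # input layer just passes values in
h = layer(x, hidden_w, hidden_b)     # hidden layer
y = layer(h, out_w, out_b)           # output layer
print(y)                             # a single value between 0 and 1
```

This is a feed-forward pass only: data moves in one direction. Back propagation would run afterwards to nudge `hidden_w`/`out_w` based on a loss function.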
okay let's take a look at what a
perceptron is so a perceptron is an
algorithm for supervised learning of
binary classifiers invented in
1943 and then the machine was built in
1957 the Mark I Perceptron which is
the name of the machine it was able
to do some form of image recognition
uh what that would be I don't know I I
wasn't able to extrapolate that but you
can see all of the interconnected uh
work just kind of like the human brain
would have where you have these uh
connections and layers and so this is
kind of where the idea of a um of a
neural network you know came from and
the fact that it's so old just shows you
that we've been doing ml longer than you
think but yeah hopefully that lays the
groundwork for the word perceptron and
we'll take a look now at a perceptron
network all right so let's take a look
at a basic perceptron Network and you
might be saying why are we so interested
in this very old type of network well
neural networks are essentially
perceptron networks so it just
goes to show you that the concept is
not new it's just that we have now
scaled it and we have a lot more compute
and we're not connecting everything by
hand right so a basic perceptron has an
input and output layer each layer
contains a number of nodes nodes between
layers have established connections that
are weighted so here is that example the
amount of nodes in the input layer the
input layer right I'm going get my pen
out here over here is determined by the
number of dimensions of the input
vector what does that mean the number of
dimensions of an input vector so a
vector remember our our graph we're
taking a DOT and putting it somewhere so
if you had a graph um or a vector space
that had an X and A Y then you have two
inputs for the node right you'd have X
and Y and it doesn't have to be X and Y
it could be different kinds of values
but that's the point there okay so the
input layer is just connection points
okay this input layer nothing that this
layer does will modify the data okay
just the starting point for it so the
amount of nodes in the output layer is
determined by the application of the
neural network so if you have a yes and no
classification then you would only
have one output node because you just
want to know is it yes or is it no is it
zero or is it one so it would not matter
if there was a thousand input nodes but
if your classification is yes or no you
only need a single node for that right
the output nodes and other layers can
modify and compute new values based on
the input data okay and so data moving between
nodes are uh are multiplied by the
weights right so that is what a weight
does it it affects uh the the strength
or the weakness of the number of what
you want to adjust it for the weights
will be modified during the training
process to produce a better outcome so
hopefully that is clear but the only
thing that's you don't see here is those
hidden layers those additional layers
but anyway we'll move on now to talking
about how the algorithm of the actual
neuron works
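A single perceptron is exactly the structure just described: a weighted sum over the dimensions of the input vector, then a step decision for the single yes/no output node. The weights and bias below are made up; training would adjust them.

```python
# A single perceptron over a 2-dimensional input vector.
# Weights and bias are hypothetical; training would tune them.
weights = [0.6, 0.4]
bias = -0.5

def perceptron(inputs):
    # weighted sum of the inputs, multiplied by the connection weights
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0     # one output node: yes (1) or no (0)

print(perceptron([1.0, 1.0]))  # 0.6 + 0.4 - 0.5 = 0.5 > 0 → 1
print(perceptron([0.2, 0.1]))  # 0.12 + 0.04 - 0.5 < 0 → 0
```

Two input dimensions means two input connections, and a yes/no task means one output node, regardless of how many inputs there are.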
okay let's take a look at activation
function so when data arrives to a node
that can perform a computation all
arriving inputed data is summed and then
an activation function is triggered so
the idea here is let's say
you have two nodes and you have
connections to the output
node notice that it's summing that is
the mathematical symbol for a sum and
then we have a
mathematical symbol for a function
right so it's going to sum it and then
trigger the activation function so the
activation function acts as a gate
between nodes and determines whether
output will proceed to the next layer
the activation function will
determine if a node is active or
inactive based on its own output which
could be in a range like 0 to 1 or -1 to
1 and there's all sorts of activation
functions you can put in here um and
this is not the full list and depending
on whether you're watching a beginner
course because I'm going to have this video in
more than one course so if you're in a
beginner course we will not show you
how the types of activation functions
literally work but in a more
advanced ML one we will because you will
want to know them there so just
understand that um you know if you don't
see exactly what these look like it
doesn't matter right now okay so we have
linear activation functions these can't
do back propagation so here it
just passes along the data then we have
nonlinear activation functions these can
do back propagation can stack and have
many layers here we have binary step so
if greater than a threshold then activate
we have sigmoid used in binary
classification susceptible to the vanishing
gradient problem these are things again
if you are doing real ml with me here
then we will talk about them if you
don't see it in the course it's because
I'm trying to make things easy on you
okay we have tanh I'm not sure
how to pronounce it this is a modified
scaled version of sigmoid still
susceptible to the vanishing
gradient problem which is something we
really want to avoid ReLU again I
don't know how to say it properly
(and we're missing an L there
nobody tell me that okay) the most commonly
used activation function will treat any
negative value as a zero we have leaky
ReLU this counters the dying ReLU problem
with a small slope for negative values
typically fixed at 0.01x
parametric ReLU a type of leaky ReLU
where the negative slope is learned as a
parameter instead of being fixed
exponential linear unit similar to
ReLU no dying ReLU problem saturates
for large negative numbers we have swish
this is an alternative to the
ReLU by the Google Brain team maxout used
in a maxout layer it chooses the output
to be the max of the inputs softmax
this is something you'll see a lot
if you're looking at architectural
diagrams like if you look at the
Transformer architecture look for the
word softmax you'll always see these
near the outputs it converts the outputs to
probabilities for the multiple
classifications so yeah you know I might
cover these or we might not uh based on
that course but anyway that is
the activation functions
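Several of the functions listed above are one-liners, so here is a plain-Python sketch of ReLU, leaky ReLU, and softmax to make the list concrete. The inputs are arbitrary example numbers.

```python
import math

def relu(x):
    """ReLU: any negative value becomes zero."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: small fixed slope for negatives counters dying ReLU."""
    return x if x > 0 else slope * x

def softmax(xs):
    """Softmax: converts raw outputs into probabilities summing to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-3.0), relu(2.0))          # 0.0 2.0
print(leaky_relu(-3.0))               # ≈ -0.03
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])   # three probabilities summing to 1
```

This is why softmax sits near the outputs in diagrams like the Transformer: it turns arbitrary scores into a probability distribution over the classes.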
okay all right so we're taking a look at
activation functions the first being the
linear activation function it is also
known as the identity function it's a
straight line as you can tell here the
model is not really learning it does not
improve upon the error term it cannot
perform back propagation it cannot stack
layers it only ever has one layer this
means your model will behave as if it's
linear so it can no longer handle complex
nonlinear data the range is
unbound so it's infinite its derivative is
one what you put in is what you get out
um so you know why would you want to use
this I think that it's used for inputs
um because you know if you're just
passing something along then that's
totally fine there but if you had
multiple hidden layers with this it's
not going to be very useful but there you
go
let's take a look at the binary step
activation function so this function
will either return zero or one if the
value is zero or less it will return
zero if the value is greater than zero
it'll be one and that's why it's called
a binary step function because it's
clearly in one place or the other it can
only handle binary classification so on
or off or true or false it has a range
of zero or one it is bound so it's not
infinite it's one of the earliest used
activation functions not used much today
but when we were looking at that example
of producing a yes or no you could see
that this would be the activation
function on the output node right
because that'd be very clear but you can
see this is very very simplistic
okay let's take a look at the sigmoid
activation function which is a logistic
curve that resembles an S shape so there
it is it can handle binary and multiclass
classifications so think cow horse pig
as we are looking at multiple types of
classification we can now stack layers
we have ranges between zero and one it
tends to bring the activations to either
side of the curve with clear
distinctions on prediction it's one of
the most widely used functions near the
ends of the function y responds less to
x and this causes the vanishing gradient
that's what we're talking about when we
say vanishing gradient look at this it
just goes and it vanishes into the
gradient the network refuses to learn
further or is drastically slow so if
values are over here then you're going
to run into some trouble so sigmoid is
analog meaning almost all neurons will
fire and be active activation will be
both dense and slow and costly so think
about that versus binary step because if
it's binary step it's either on or off
remember the purpose of it is that if
it's zero it's not going to pass data
along and if it's one it is with sigmoid
it could technically be zero but even if
it's here a little bit off to the right
it's always on it's either really on or
it's teeny tiny on right so there you go
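The vanishing gradient can be seen directly from the sigmoid's derivative: it peaks at x = 0 and shrinks toward zero at the tails, which is where learning stalls. A small sketch with example values:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)   # derivative of the sigmoid

# Near zero the gradient is at its largest; far from zero it vanishes,
# which is the vanishing gradient problem described above.
print(sigmoid_grad(0.0))    # 0.25, the maximum
print(sigmoid_grad(10.0))   # tiny, so weight updates barely move
```

During back propagation, weight updates are proportional to these gradients, so activations stuck out on the flat tails of the S curve make the network learn "distractedly slow" or not at all.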
[Music] go all right I want to admit something
go all right I want to admit something that's really embarrassing but when we
that's really embarrassing but when we initially listed out those activation
initially listed out those activation functions I think I swapped the h&n so I
functions I think I swapped the h&n so I called it tan H when it's just ton and
called it tan H when it's just ton and that's why I was saying ton before
that's why I was saying ton before because I'm like in my mind I knew it
because I'm like in my mind I knew it was Tom but like the H was off so I said
was Tom but like the H was off so I said tan H so I do apologize for that but it
tan H so I do apologize for that but it is ton it is the same as a sigmoid
is ton it is the same as a sigmoid function but it's scaled and it's made
function but it's scaled and it's made larger so it looks really really similar
larger so it looks really really similar so it can handle binary multi
so it can handle binary multi classification because it's analog just
classification because it's analog just like the other one we can stack layers
like the other one we can stack layers we have ranges between1 and one the
we have ranges between1 and one the gradient is stronger so it has a a
gradient is stronger so it has a a steeper curve it still has a vanished
steeper curve it still has a vanished and gradient problem like the sigid um
and gradient problem like the sigid um but versus taon and sigmoid is based on
but versus taon and sigmoid is based on your use case so ton can assist in to
your use case so ton can assist in to avoid bias in gradients ton can
avoid bias in gradients ton can outperform sigmoid so you know it's
outperform sigmoid so you know it's depends if you need to do it or not
depends if you need to do it or not [Music]
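To make the relationship concrete, here's a minimal sketch in plain Python (no ML libraries, purely illustrative) showing that tanh is just a scaled, shifted sigmoid:

```python
import math

def sigmoid(x):
    # Analog activation: smooth output between 0 and 1
    return 1 / (1 + math.exp(-x))

# tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1,
# which shifts the range from (0, 1) to (-1, 1) and steepens the curve
for x in (-2.0, 0.0, 2.0):
    rescaled = 2 * sigmoid(2 * x) - 1
    print(x, round(math.tanh(x), 6), round(rescaled, 6))
```

The zero-centered (-1, 1) range is what helps tanh avoid bias in the gradients compared to sigmoid's (0, 1) range.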
Right, let's take a look here at ReLU. ReLU stands for rectified linear unit, an activation function where the positive axis is linear and the negative axis is always zero, so it looks like that. Again, just remember the point of activation functions: a neuron is either on or off, or on to some degree. Here the range is zero to infinity, so we have a positive axis that is unbounded. With sigmoid and tanh, almost all the neurons fire, and this leads to things being dense; remember we said dense as in it's adding more information as it goes, as opposed to staying the same or less, and that's slow and costly. ReLU will sparsely trigger activations because its negative-axis gradient is zero: if something is really low, it's going to be zero, not a teeny tiny bit on. It's less costly and more efficient, so it's a lot faster. The negative axis with a zero gradient has a side effect called the dying ReLU problem: the gradient will go toward zero and get stuck at zero, because variations adjusting due to input or error will have nothing to adjust to, so the nodes essentially die, okay.
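Here's a quick plain-Python sketch of ReLU and its gradient, illustrating both the sparsity and the dying-ReLU effect described above:

```python
def relu(x):
    # Linear on the positive axis, always zero on the negative axis
    return max(0.0, x)

def relu_grad(x):
    # Zero gradient for negative inputs: if a node gets stuck here,
    # there is nothing to adjust, so the node "dies"
    return 1.0 if x > 0 else 0.0

inputs = [-3.0, -0.5, 0.0, 1.2, 4.0]
print([relu(x) for x in inputs])       # negatives zero out -> sparse activations
print([relu_grad(x) for x in inputs])  # no learning signal for negative inputs
```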
Let's take a look at the leaky ReLU activation function. The leaky rectified linear unit is where the positive axis is linear and the negative axis has a gentle gradient close to zero. Do you notice that every time we look at one of these, it's trying to solve a problem and be better? Hopefully you're seeing that as we go through these activation functions. It's similar to ReLU, but it reduces the effects of the dying ReLU gradient. It's leaky because the negative axis leaks, which causes some nodes not to die. We also have parametric ReLU, which is leaky ReLU where the negative slope is supplied as a parameter rather than fixed at 0.01. And we have ReLU6, which is ReLU where the positive axis has an upper limit, so it's not infinite; the idea here is that it's bound to a max value, okay.
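Here's a small sketch of the three variants just described; the 0.01 default slope is the conventional choice for leaky ReLU, and the cap of 6 is what gives ReLU6 its name:

```python
def leaky_relu(x, alpha=0.01):
    # Negative axis "leaks" with a small fixed slope, so nodes keep a gradient
    return x if x > 0 else alpha * x

def prelu(x, alpha):
    # Parametric ReLU: same shape, but the negative slope is a learned parameter
    return x if x > 0 else alpha * x

def relu6(x):
    # ReLU with the positive axis bound to a max value of 6
    return min(max(0.0, x), 6.0)

print(leaky_relu(-10.0))   # -0.1, the node is not dead
print(prelu(-10.0, 0.2))   # -2.0, steeper learned slope
print(relu6(10.0))         # capped at 6.0
```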
Let's take a look here at the exponential linear unit, also known as ELU. It has a slope toward the negative-one axis and a linear gradient on the positive axis, so that's what it looks like; kind of like the last one, leaky ReLU, so something between ReLU and leaky ReLU. ELU slopes toward the negative-one value, and it pushes the mean of the activations closer to zero; activations with a mean closer to zero cause faster learning and convergence. ELU avoids the dying ReLU problem, but it saturates for large negative numbers, so everything is a trade-off with these things.
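A minimal ELU sketch (alpha = 1.0 is the common default) showing the saturation toward -alpha for large negative inputs:

```python
import math

def elu(x, alpha=1.0):
    # Linear for x > 0; for x <= 0 it curves toward -alpha and saturates,
    # keeping the mean activation closer to zero than ReLU does
    return x if x > 0 else alpha * (math.exp(x) - 1)

print(elu(2.0))     # 2.0 (linear region)
print(elu(-1.0))    # about -0.632
print(elu(-100.0))  # about -1.0 (saturated)
```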
Okay, let's take a look at the swish activation function. It has a slope that dips and eases out to zero on the negative axis, and a linear gradient on the positive axis, so it looks kind of similar but a little bit different. Swish was proposed by the Google Brain team as a replacement for ReLU. It's called swish because of its switching dip. It looks similar to ReLU, but it's a smooth function; it never abruptly changes direction. It is non-monotonic, meaning it doesn't always move in one direction as the input increases. Similar to ReLU, it will have sparsity: very negative values will zero out. There are other variants in the swish family, such as Mish and Hard Swish.
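Swish is x times sigmoid(x); a quick sketch showing the smooth dip below zero and the zeroing-out of very negative values:

```python
import math

def swish(x):
    # x * sigmoid(x): smooth, non-monotonic, with a small dip below zero
    return x / (1 + math.exp(-x))

print(swish(3.0))     # about 2.86 (near-linear positive region)
print(swish(-1.0))    # about -0.269 (the "switching dip")
print(swish(-20.0))   # about 0.0 (very negative values zero out)
```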
Let's take a look at maxout. This is a function that will take multiple inputs, select the maximum value, and return that value. Maxout is a generalization of the ReLU and leaky ReLU functions; a maxout neuron has all the benefits of ReLU neurons without the dying ReLU problem. The downside of maxout is that it's expensive, as it doubles the number of parameters for each neuron.
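Here's a sketch of a single maxout unit with two linear pieces; note that each piece carries its own weights and bias, which is why the parameter count doubles. Choosing the pieces as (x, 0) recovers plain ReLU:

```python
def maxout(x, weight_sets, biases):
    # Compute each linear piece w . x + b and return the maximum
    return max(
        sum(w * xi for w, xi in zip(ws, x)) + b
        for ws, b in zip(weight_sets, biases)
    )

# Two pieces: identity (w=1, b=0) and zero (w=0, b=0) -> behaves like ReLU
print(maxout([2.5], [[1.0], [0.0]], [0.0, 0.0]))   # 2.5
print(maxout([-2.5], [[1.0], [0.0]], [0.0, 0.0]))  # 0.0
```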
All right, here's our last one: the softmax activation function. It calculates the probability of each class over all possible classes when used for multi-class classification models. It returns the probabilities of each class, and the target class will have the highest probability. The calculated probabilities will be in the range of zero to one, and the sum of all probabilities is equal to one. The softmax function is generally used for multi-class classification on the output layer. Again, as I said, if you look at the Transformer architecture, which probably is in this course, you will see it there, and you'll see it in other ML model diagrams for sure. With softmax you can only assign a single label per prediction.
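Here's a minimal softmax sketch; subtracting the max logit first is the standard numerical-stability trick:

```python
import math

def softmax(logits):
    # Exponentiate (shifted for stability), then normalize so outputs sum to 1
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # highest logit -> highest probability
print(sum(probs))                    # ~1.0
```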
Okay, let's define an algorithm and a function. An algorithm is a set of mathematical or computer instructions to perform a specific task, and an algorithm can be composed of several smaller algorithms. You're basically saying how you do something; that's what an algorithm is: how are we going to do something. So I want to take a look here at the k-nearest neighbors (KNN) algorithm, which can be used to create a supervised classification machine learning algorithm. Tell me who your closest neighbors are, and we will infer that you can be considered of the same class. Within KNN you can use different distance metrics, such as Euclidean, Hamming, Minkowski, and Manhattan; there are all different ones that you can utilize. A function is a way of grouping algorithms together so you can call them to compute a result. Sounds like a machine learning model, right, where you have a grouping of algorithms. So look at KNN here for a moment, because we do see this come up a lot: k-nearest neighbors is just asking how close am I from here to here to here; it's literally in the name: who are my nearest neighbors, okay. KNN itself is not machine learning, but when applied to solve a machine learning problem, it becomes a machine learning algorithm, okay.
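A small KNN classifier sketch using the Euclidean distance metric (any of the other metrics mentioned above could be swapped in); the data here is made up for illustration:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs
    # Sort neighbors by Euclidean distance and vote among the k closest
    neighbors = sorted((math.dist(x, query), label) for x, label in train)
    votes = [label for _, label in neighbors[:k]]
    return Counter(votes).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((8, 8), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 1)))  # "A": its nearest neighbors are A's
```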
Let's take a look at what a machine learning model is, but before we do that, let's define what a model is in general terms. In general terms, a model is an informational representation of an object, person, or system. Models can be concrete, so they have a physical form; think a design of a vehicle, or a person posing for a picture. Then you have abstract models, expressed as behavioral patterns; think mathematics, computer code, written words. So what is a machine learning model then? An ML model is a function that takes in data and performs a machine learning algorithm to produce a prediction. A machine learning model is trained, not to be confused with the training model, which is still learning to make correct predictions. An ML model can be the training model that is simply deployed once it has been tuned to make good predictions. So normally you'd have training data, let's say labeled data, and here you're going to have your learning algorithm, and you're going to put it through training; that's your training model. Then you have hyperparameter tuning, where you are continuously tweaking the model to get it to where you want it to be, okay. Then once you deploy the model, that is your trained model, your machine learning model, which can go and produce predictions. From here you could then provide it unlabeled data, because its goal is to make predictions, and that could be labeling data or doing other things, okay. And we call the interaction with the deployed machine learning model inference: when you are inferring something, you're providing data and saying, hey, can you make a prediction for me? That's what inference is.
Okay, so let's take a look at what a feature is. A feature is a characteristic extracted from our unstructured data set that has been prepared to be ingested by our machine learning model to infer a prediction. ML models generally only accept numerical data, so we prepare our data into a machine-readable format by encoding, which we'll revisit later in more detail. So let's talk about what feature engineering is. Feature engineering is the process of extracting features from our provided data sources. Imagine you have your data sources, which give you your raw data; you're going to clean and transform that into features, turning it into machine-readable information for your machine learning models, and then you go from there.
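As one concrete example of encoding, here's a one-hot encoding sketch, a common way to turn a categorical feature into the numeric form ML models expect (the color data is made up for illustration):

```python
def one_hot_encode(values):
    # Map each distinct category to a position in a binary vector
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        encoded.append(vec)
    return categories, encoded

colors = ["red", "green", "red", "blue"]
cats, features = one_hot_encode(colors)
print(cats)      # ['blue', 'green', 'red']
print(features)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```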
Okay, so what is inference? Inference is the act of requesting and getting a prediction. When we're talking about it in the context of machine learning, we're inputting data into a machine learning model that has been deployed for production use, to then output a prediction. So imagine our raw data is a banana, and we say "tell me what this is" to the machine learning model; it's going to bring back information saying it's a yellow banana with a confidence score of 0.9. The inference textbook definition is "steps in reasoning, moving from premises to logical consequences," but I think it's easier to remember as the act of requesting and getting a prediction.
Okay, let's talk about parameters and hyperparameters. A model parameter is a variable that configures the internal state of a model and whose value can be estimated. The value of a parameter is not manually set; it is learned and output after training. Parameters are used to make predictions. Then we have the model hyperparameter. This is a variable that is external to the model and whose value cannot be estimated from the data. The value of a hyperparameter is manually set before the training of the model, and hyperparameters are used to estimate model parameters. So we have things like learning rate, epochs, and batch size. And here's kind of a diagram; hopefully it helps this make sense. Imagine you have a variable and you want to input it into your model, and we'll just make a box here to indicate that this is the model. It's going to go into layers, and we'll talk about this again later on, but parameters are the connections between nodes. The idea is that each connection will have a value and a weight, and those are the internal state, those parameters, okay. So hopefully that is very clear, because the idea is that when you want to utilize something for training, you're going to pass in content or variables, it's going to go through all those layers, and then all these connections, these parameters, have to be set so you get the result that you want to get. So hopefully that is clear, but we will cover it again later on if it's not.
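To make the split concrete, here's a toy training sketch (illustrative, not a real pipeline): the learning rate and epoch count are hyperparameters set by hand before training, while the weight w is a parameter the training loop learns:

```python
# Hyperparameters: chosen manually before training starts
learning_rate = 0.1
epochs = 100

# Parameter: internal state learned during training (true relation is y = 2x)
w = 0.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

for _ in range(epochs):
    for x, y in data:
        grad = 2 * (w * x - y) * x   # gradient of squared error w.r.t. w
        w -= learning_rate * grad    # parameter update

print(round(w, 3))  # learned weight, approximately 2.0
```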
Okay, hey, this is Andrew Brown. Let's take a look at responsible AI, specifically for AWS. You'll often see a list of things like fairness, explainability, privacy and security, safety, controllability, veracity, robustness, governance, and transparency. This is the list that AWS defines; others like Microsoft have similar lists, so they're more or less the same. But for the AI Practitioner exam they might give you a list of these, so you might want to remember those key terms. Let's go ahead and see what we have in terms of resources for responsible AI here. We have model evaluation on Amazon Bedrock. We have Amazon SageMaker Clarify; we do look at that later, and that's for explainable AI, to determine what's going on in a model. Again, we have Guardrails, and we look at that as well. We have Clarify again, and Model Monitor, which is more about monitoring the degradation of a model; we do talk about that. Amazon Augmented AI, that is a human reviewing the endpoints. So all these things are covered. Yeah, it doesn't look like they have a whole lot here. Let's see: AWS AI Service Cards provide transparency documents describing intended use cases and fairness. I know Microsoft has something very similar, but I guess they're just down below here. Not super exciting, to be honest. You've got a bunch of stuff you can read through, so you can see how they're being responsible with it, I guess. So nothing super, super exciting here, but I guess Clarify is their big thing here, and remembering this list.
Okay, let's take a look at labeling. Data labeling is the process of identifying raw data (images, text files, videos) and adding one or more meaningful and informative labels to provide context, so a machine learning model can learn from it. With supervised machine learning, labeling is a prerequisite to produce training data, and each piece of data will generally be labeled by a human. On the left-hand side is an example from Amazon Rekognition, where it's trying to identify bounding boxes or classify an image under particular categories; that's an example of supervised machine learning that requires labeled data. With unsupervised machine learning, labels will be produced by the machine and may not be human-readable. Then there's this concept of ground truth: a properly labeled data set that you use as an objective standard to train and assess a given model, and it is often called ground truth. The accuracy of trained models will depend on the accuracy of your ground truth, so ground truth data is very important for successful models, okay.
Let's take a look here at data mining. This is the extraction of patterns and knowledge from large amounts of data, not the extraction of the data itself. The industry has this framework called CRISP-DM, which defines it in six phases. First is business understanding: what does the business need? Data understanding: what data do we have? Data preparation: how do we organize the data for modeling? Modeling: what modeling techniques should we apply? Evaluation: which model best meets the business objectives? And deployment: how do people access the data? So that gives you an idea about working with data mining.
Okay, let's take a look here at data mining methods. These are ways that we find valid patterns and relationships in huge data sets, and they're important when we're talking about machine learning, because sometimes that is what the model is trying to do: it's trying to find a pattern or relationship, and trying to predict it. I'm not going to read through all of this, because you can read through it if you want, but these are terms we've seen already, like classification, clustering, regression, sequential patterns, association rules, outlier detection, and prediction. And notice down here, when we have prediction, it says it uses a combination of other data mining techniques, such as trends, clustering, and classification, to predict future data, which is fine. But classification, clustering, regression, and association, these four, are going to show up again and again when we're looking at classical machine learning models. Anyway, I just wanted to include that, even though this is more of a data slide.
[Music] okay let's take a look here at knowledge
okay let's take a look here at knowledge mining this is a discipline in AI that
mining this is a discipline in AI that uses combination of intelligent services
uses combination of intelligent services to quickly learn from vast amounts of
to quickly learn from vast amounts of information it allows organizations to
information it allows organizations to deeply understand and easily explore
deeply understand and easily explore information uncover hidden insights and
information uncover hidden insights and find relationships and patterns at scale
find relationships and patterns at scale this is a term that was kind of coin
this is a term that was kind of coin over at Microsoft you don't hear about
over at Microsoft you don't hear about it over at Azure or gcp but it still is
it over at Azure or gcp but it still is a good concept to know the other thing
a good concept to know the other thing is that when we look at rag so that's
is that when we look at rag so that's retrieval augmented generation there is
retrieval augmented generation there is a lot of overlap with this or in many
a lot of overlap with this or in many cases you can look at rag being
cases you can look at rag being knowledge mining um but let's talk about
knowledge mining um but let's talk about what we have here so the first thing is
what we have here so the first thing is ingest then we have enrich and we have
ingest then we have enrich and we have explore so inest is ingest content from
explore so inest is ingest content from a range of sources using connectors to
a range of sources using connectors to fir uh uh to first and third party data
fir uh uh to first and third party data stores so we have structured data like
stores so we have structured data like databases csvs unstructured data like
databases csvs unstructured data like PDF video images and audio we have
PDF video images and audio we have enrich so enrich the content with AI
enrich so enrich the content with AI capabilities and let you extract
capabilities and let you extract information find patterns and deep
information find patterns and deep deepening understanding so for manage AI
deepening understanding so for manage AI Services we have Vision Services
Services we have Vision Services language Services speech services
language Services speech services decision services and search Services
decision services and search Services now those literally map to Azure uh AI
now those literally map to Azure uh AI managed services but we're talking about
managed services but we're talking about AWS uh when we're talking about Vision
AWS uh when we're talking about Vision we're talking about recognition we're
we're talking about recognition we're talking about language um I guess that
talking about language um I guess that could be something like um I'm trying to
could be something like um I'm trying to remember the service that does NLP here
remember the service that does NLP here uh okay remember off the top of my head
uh okay remember off the top of my head but for speech we have poly um for for
but for speech we have poly um for for search this could be um not necessarily
search this could be um not necessarily an AI well it could be Kendra right so
an AI well it could be Kendra right so there's a lot of manag AI services that
there's a lot of manag AI services that can be utilized at that level then we
can be utilized at that level then we have Explorer so the newly indexed data
have Explorer so the newly indexed data via search Bots or existing business
via search Bots or existing business applications and data visualizations so
applications and data visualizations so here it could be used in a CRM it could
here it could be used in a CRM it could be in a wrap system it could be powerbi
be in a wrap system it could be powerbi and I didn't list it here but it could
and I didn't list it here but it could also be used to return back to an llm to
also be used to return back to an llm to interpret and then complete rag so there
interpret and then complete rag so there you
you [Music]
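As a rough illustration (nothing AWS-specific), that ingest, enrich, explore flow can be sketched as a toy pipeline. The document contents, the keyword index, and the search helper below are hypothetical stand-ins for connectors, managed AI enrichment, and a search service like Kendra:

```python
# Ingest: raw content pulled from two hypothetical sources via "connectors"
documents = {
    "doc1.txt": "Amazon Rekognition analyses images and video.",
    "doc2.txt": "Amazon Polly converts text to lifelike speech.",
}
corpus = {name: text.lower() for name, text in documents.items()}

# Enrich: build a crude keyword index (a real system would call managed
# AI services for entity extraction, key phrases, and so on)
index = {}
for name, text in corpus.items():
    for word in text.strip(".").split():
        index.setdefault(word, set()).add(name)

# Explore: query the index the way a search bot or business app would
def search(term):
    return sorted(index.get(term.lower(), set()))

print(search("speech"))  # ['doc2.txt']
```

The same explore step could instead hand the matching documents to an LLM as context, which is exactly the retrieval half of RAG mentioned above.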
Let's take a look here at data wrangling. This is the process of transforming and mapping data from one raw data form into another format, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics, also known as