YouTube Transcript:
Binning and Binarization | Discretization | Quantile Binning | KMeans Binning
Hey guys, welcome to my YouTube channel.
We are doing feature engineering, and today we will start a new topic in feature engineering called encoding numerical data, or encoding numerical features.
A few days ago we learned how to encode categorical data, and today we are looking at another variant of that: how to encode numerical data.
That means, if you have a dataset with numerical values in it, how can you convert those values into categorical data? That is what we are going to study. Okay? Right.
You might think that numerical data is already good, so why convert numerical data into categorical data?
The real reason is that sometimes the data you have is better represented once you categorize it.
For example, a few days ago I was working on a small machine learning problem on the Google Play Store apps dataset.
There is a column in it for the number of downloads. The entries in that column were strange: some apps that were very famous had very big values, and a lot of apps had been downloaded very little.
What I did there was convert this column into categories by creating buckets: 100+ downloads, 1,000+ downloads, 1 million+ downloads, and so on.
As soon as I did that, my whole problem became a little simpler and results started coming in.
So what I am telling you is that this is really problem-specific: sometimes you may have to convert your numerical data into categorical data. Okay, so in today's video, we are going to
see the two famous techniques for converting numerical data into categorical data.
The first technique is called discretization, which we also call binning. Just a small comment: in some places you will find it called discretization and in some places binning, and both mean the same thing.
The second technique is binarization.
In this video I will first teach you discretization, or binning, in a little detail, and then I will give you a brief idea of binarization.
If you learn these two techniques, you will never have any problem converting numerical data into categorical. Okay, so
let's focus on discretization, which we can also call binning.
If you want to define it, here is what is written: discretization is the process of transforming continuous variables into discrete variables by creating a set of contiguous intervals that span the range of the variable's values. Discretization is also called binning, where bin is an alternative name for interval.
It is very simple. Suppose you have a lot of numbers and you say you want to convert them into something more discrete.
What will you do? You will create intervals: the first interval from 0 to 10, the second from 10 to 20, the third from 20 to 30, and so on. It works exactly like a histogram.
Now we check how many values there are between 0 and 10, how many between 10 and 20, and how many between 20 and 30.
So basically you are making a histogram of your data, with one bar for each range.
That is what discretization is about. There is nothing more to it than this. Okay, so now we have
understood discretization. Now, what is the benefit of discretization? What do you gain by using binning?
The first thing is that you can handle outliers a little better.
How? Suppose there is a point at the very end whose value is very high, so it is an outlier. When you make bins, that point falls into the last bin, and every value in that bin is treated equally.
So your outlier is now the same as the value at the very beginning of that bin. That is how you handle outliers a little.
The second benefit is that it can improve the value spread, meaning the spread of your data. If there is too much data bunched in the middle, you can spread it out.
I will tell you three techniques, and with one of them you will be able to make the spread of your data a little more uniform. That is what we want.
Okay, so now let's discuss the types of discretization. It is divided into three categories.
The first category is unsupervised binning, the second is supervised binning, and the third is custom binning.
Within unsupervised binning there are three techniques:
equal-width binning, which is also called uniform binning;
equal-frequency binning, which is also called quantile binning;
and K-means binning.
In supervised binning only one technique is mainly used, and that is decision tree binning.
In today's video we will cover everything except decision tree binning: equal width, equal frequency, K-means, and custom binning.
I am not focusing on decision tree binning. If you have time you should read about it, and I may make a video about it later.
Most of the time you can get by with one of the other techniques, so that is why I am not covering it in this video. You can read about it yourself.
Let's focus on the first one, which is called equal-width or uniform binning.
The idea here is simple: it is exactly the description of binning you have heard so far.
You have a column with different numbers in it. First of all, you have to say how many bins you want. Say you want 10 bins; this is something you have to decide.
Then there is a formula for the width of each bin: (max - min) / number of bins.
For example, if the maximum value is 100 and the minimum value is 0, and the number of bins is 10, then each interval is (100 - 0) / 10 = 10 wide.
So the first interval is from 0 to 10, the second from 10 to 20, and so on, for a total of 10 bins.
Now you pick up each value and ask which bin it falls into: this one goes in the first bin, the next one goes here, another lands somewhere in the middle.
In this way you write down the frequency for each bin, for example 5 values between 0 and 10, 15 values between 10 and 20, and so on up to the last bin.
Then you can plot it: the first bar, the second bar, up to the last bar, and your histogram plot is ready.
It is called equal width because the width is the same for all the intervals; here every interval has a width of 10. That is why it is called equal-width binning, or you can also say uniform binning. So I
have kept an example here with an age column.
This age came in the first bin and this is that bin's range, this one came in the third bin and this is its range, and so on.
If you understood that, it is a very simple concept: you just do the same thing as when making a histogram.
Now, what happens with this type of binning? The first benefit is that outliers are handled a little: the last data point comes in the last bin and is then treated like everything else in that bin.
Second, the spread of the data does not change. Whatever the original distribution was, if you plot the histogram after binning it looks exactly the same, so there is no change in the spread of the data.
So you can use this binning technique to handle outliers. Okay, so that was uniform binning.
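The equal-width procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name equal_width_bins and the sample ages are made up for the example and do not come from the video.

```python
import numpy as np

def equal_width_bins(values, n_bins):
    """Return (edges, bin_index) for equal-width binning of a 1-D array."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    width = (hi - lo) / n_bins                  # (max - min) / number of bins
    edges = lo + width * np.arange(n_bins + 1)  # n_bins + 1 edges
    # Use only the interior edges so the maximum lands in the last bin.
    idx = np.digitize(values, edges[1:-1], right=False)
    return edges, idx

# Made-up values; 100 plays the role of the outlier.
ages = np.array([0, 4, 9, 15, 23, 31, 38, 47, 52, 100])
edges, idx = equal_width_bins(ages, n_bins=10)
counts = np.bincount(idx, minlength=10)

print(edges)   # ten intervals of width 10 between 0 and 100
print(idx)     # [0 0 0 1 2 3 3 4 5 9]
print(counts)  # [3 1 1 2 1 1 0 0 0 1]
```

Note how the outlier 100 simply becomes "bin 9", while the empty bins in between show that the shape of the distribution is left as it was.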
So the second technique is called equal-frequency or quantile binning. It is very simple.
What do we do here? You again have to say how many bins you want, so suppose again that you want 10.
Its basic meaning is that if you want 10 intervals over this data, then each interval will contain 10% of the total observations.
Simply put, you start from the minimum and go up to the value where your 10th percentile is. Suppose that value is 16. That means 10% of the population lies between 0 and 16.
Then you go from 16 up to the value where the 20th percentile is. If you get 20, then 20% of the population lies between 0 and 20.
Then from 20 you go up to the value where the 30th percentile is, say 22 in this case, which means 30% of your population lies between 0 and 22.
You carry on like this up to the 100th percentile, and in this way you create 10 quantiles.
The difference here is that the width of every interval is not equal. If you remember, in equal-width binning every interval had a width of 10. Here that is not necessary: the first interval has a width of 16, the second one 4, the third one 2, but the number of values falling in each of them is almost the same.
So this is called quantile binning. Generally it is used a little more than equal-width binning, and there are two reasons for that.
First, like the one above, it works well on outliers. Second, it makes your value spread uniform, meaning that when you look at its result there are almost equal numbers of items in every interval, so the histogram looks nearly flat.
So those are both the benefits: you are handling outliers, and you are also improving your value spread.
That is why it is used a little more. In fact, this is the default strategy when you use scikit-learn.
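As a rough sketch of the quantile idea, the bin edges can be taken at the 0th, 10th, 20th, ... 100th percentiles. The function name quantile_bins and the synthetic, skewed "fares" below are assumptions made for the example, not data from the video.

```python
import numpy as np

def quantile_bins(values, n_bins):
    """Return (edges, bin_index) for equal-frequency binning of a 1-D array."""
    values = np.asarray(values, dtype=float)
    # Edges at the 0th, 10th, 20th, ... 100th percentile when n_bins is 10.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
    idx = np.digitize(values, edges[1:-1], right=False)
    return edges, idx

rng = np.random.default_rng(0)
fares = rng.exponential(scale=30.0, size=1000)  # skewed synthetic data

edges, idx = quantile_bins(fares, n_bins=10)
counts = np.bincount(idx, minlength=10)
widths = np.diff(edges)

print(counts)  # roughly the same number of values in every bin
print(widths)  # but the interval widths differ a lot
```

The counts come out nearly equal while the widths grow towards the tail, which is the trade-off described above.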
Okay, the third technique is K-means binning, and this is the most interesting one.
What happens in K-means binning is that you use a clustering algorithm, which you will also study later. Its name is K-means, and it is a clustering algorithm.
What it does is create clusters. Let me explain: if your data looks something like this, the algorithm will automatically make clusters from it.
Here I can show it in 2D because that is easy to draw, but you can also have data in n dimensions. The specialty of K-means is that it makes clusters for you in 1D, in 2D, and in n dimensions alike.
So how is it used for discretization? You should know that it is especially useful when your data is spread out in clusters: some values here, then after some distance another group of values, then after some distance another group, with no values in the gaps in between.
If your data looks like this, then K-means will work better. Otherwise the other two techniques will do the job. Okay, now
see what you do. You say that you need n intervals. Here the intervals are represented by centroids; that is what they are called in the nomenclature of this algorithm.
So what do you do? You place those centroids randomly, anywhere, with no logic behind it. Say you have placed five of them.
In this picture you can see black lines drawn between every two centroids. The picture is taken from a website and they have used that way of drawing it; I will tell you another way to think about it.
What you do is calculate the distance of every point from every centroid.
Then the points that are nearest to a given centroid are considered part of that centroid's cluster.
Here, for example, all these points were nearest to this centroid, so all of them have come into this cluster, all these points have come into that cluster, and so on.
You could have drawn boundaries between every two clusters, with everything on one side in this cluster and everything on the other side in that one, but the technique I told you is easier: you find the distance from every centroid and simply decide, for a given point, which centroid is closest, and you map the point to that cluster. So
now suppose you have done the mapping. After mapping, you move the centroids.
What does it mean to move a centroid? It means you send the centroid to the mean of the points in its cluster.
Pay attention here and you will understand: if you look at the last cluster, the mean of these three points was at this place, while the cluster's centroid was over here in the beginning, so you shift it a little and bring it to the mean. You do this with all the centroids.
Once your centroids have been updated, you repeat: you again calculate the distance of every point from every centroid, decide which cluster each point will go to, and reassign all the clusters. Then you again send each centroid to the mean, and you keep doing this until nothing changes any more.
If you do not understand this whole algorithm, go to my channel and watch the video on K-means clustering, and you will understand immediately.
So it is a very interesting algorithm, and people have used it for binning. At the end you get the values of your centroids, and those values are what define your intervals.
Let me tell you again: this one is used when your data is spread in the form of clusters. Okay, so that is the K-means binning technique.
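The assign-then-move loop described above can be sketched for a single column as follows. Two things here are assumptions rather than something stated in the video: the centroids start evenly spaced instead of randomly (to keep the sketch reproducible), and the cut points between bins are taken as the midpoints between neighbouring centroids. The function name kmeans_bins_1d and the data are made up.

```python
import numpy as np

def kmeans_bins_1d(values, n_bins, n_iter=100):
    """1-D K-means binning: returns (centroids, cut_points, bin_index)."""
    values = np.asarray(values, dtype=float)
    # Assumption: evenly spaced starting centroids instead of random ones.
    centroids = np.linspace(values.min(), values.max(), n_bins)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dist = np.abs(values[:, None] - centroids[None, :])
        labels = np.argmin(dist, axis=1)
        # Update step: move each centroid to the mean of its points.
        new_centroids = centroids.copy()
        for k in range(n_bins):
            members = values[labels == k]
            if members.size:
                new_centroids[k] = members.mean()
        if np.allclose(new_centroids, centroids):
            break  # nothing moved, so we are done
        centroids = new_centroids
    centroids = np.sort(centroids)
    # Assumption: bin boundaries are midpoints between adjacent centroids.
    cuts = (centroids[:-1] + centroids[1:]) / 2
    return centroids, cuts, np.digitize(values, cuts)

# Made-up data that sits in three clumps with gaps in between.
data = np.array([1, 2, 3, 20, 21, 22, 50, 51, 52], dtype=float)
centroids, cuts, idx = kmeans_bins_1d(data, n_bins=3)

print(centroids)  # [ 2. 21. 51.]
print(cuts)       # [11.5 36. ]
print(idx)        # [0 0 0 1 1 1 2 2 2]
```

Because the data is clumped, each bin ends up covering exactly one clump, which is the situation where this strategy is said to work best.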
Now, if you want to implement all this, you do not have to do anything yourself.
In scikit-learn there is a class called KBinsDiscretizer, and you just have to call that class.
You have to give it three parameters.
The first is your number of bins, n_bins.
The second is your strategy. Three values are used for strategy: uniform, quantile, and kmeans, for the three techniques.
The third is encode. When you discretize a column, each value is replaced by the interval it falls into, so you have to say how that result should be represented.
Here you basically get two options. The first is ordinal, which you use if you think there is an order in the resulting categories. Otherwise you can use one-hot encoding.
Generally ordinal is used, and one-hot very rarely.
I will quickly show you the documentation: n_bins, encode, and strategy. These are the three values you handle; there is nothing else special here.
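A minimal sketch of those three parameters in use, assuming a single synthetic numeric column (the data is made up; only KBinsDiscretizer and its n_bins, encode, and strategy parameters come from the discussion above). Depending on the scikit-learn version, the quantile strategy may print a FutureWarning about default settings, which does not affect this sketch.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(42)
X = rng.exponential(scale=30.0, size=(500, 1))  # one skewed synthetic column

for strategy in ("uniform", "quantile", "kmeans"):
    kbin = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy=strategy)
    X_binned = kbin.fit_transform(X)  # each value replaced by its bin index
    counts = np.bincount(X_binned.ravel().astype(int), minlength=5)
    print(strategy, counts)  # how many values landed in each of the 5 bins
```

Comparing the three printed count vectors is a quick way to see how differently the strategies spread the same column.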
Okay, so what I am going to do now is give you a small example. We will again use the Titanic dataset, and I will show you the results without binning and with binning.
In some places you will see a benefit. This dataset is not very appropriate for binning, but let's see.
These are all our imports: pandas and numpy, train_test_split, DecisionTreeClassifier, accuracy_score, and cross_val_score for cross-validation. Then there is the main class, KBinsDiscretizer, which you get from preprocessing, and ColumnTransformer, which you must have used by now.
First I imported train.csv, and I will work on only three columns: Age, Fare, and Survived.
There is a small mismatch in the column naming here compared with earlier; ignore that.
We already know that there are missing values in Age, so I dropped all the rows with missing values. This just keeps the problem simple.
This is my data shape; 177 rows got deleted because of the missing values, and this is my current DataFrame. Okay, so
what I do first is show you how my result comes out without applying any sort of binning.
I extracted X and y and did the train-test split; I am not explaining all of that at this point. This is my X_train, with these two columns inside it.
After that I created a DecisionTreeClassifier, trained the model by fitting it on X_train, calculated y_pred, and then computed the accuracy, which comes to about 62.9%. That is without any sort of transformation on the numerical columns, and with cross-validation the accuracy is also around 63%.
Okay, now I will quickly do the same thing using binning. Most of what I do is the same; there is only one real difference.
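The baseline workflow just described might look roughly like this. Since train.csv is not available here, the sketch uses synthetic "age" and "fare" columns and a made-up label, so the scores it prints say nothing about the Titanic results quoted above; the random_state values are also assumptions added for repeatability.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Age and Fare columns and the Survived label.
rng = np.random.default_rng(0)
n = 700
age = rng.uniform(1, 80, n)
fare = rng.exponential(30.0, n)
X = np.column_stack([age, fare])
y = (fare + rng.normal(0, 25, n) > 30).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: decision tree on the raw numerical columns, no binning.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
test_acc = accuracy_score(y_test, clf.predict(X_test))

# 10-fold cross-validated accuracy on the same raw columns.
cv_acc = cross_val_score(
    DecisionTreeClassifier(random_state=42), X, y, cv=10, scoring="accuracy"
).mean()

print(test_acc, cv_acc)
```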
Now what are we doing? We are applying our KBinsDiscretizer.
I have created two objects, called kbin_age and kbin_fare. I have kept the number of bins the same in both, but you do not have to: I have made two separate objects so that, if you want a different number of bins for the first column than for the second, you can do that kind of thing.
You could apply uniform in the beginning, but first of all I will show you quantile, so quantile is our first strategy, and I am going with 10 bins. So this is my object.
Then I created a ColumnTransformer with two transformers inside it.
The first one is named first; I passed kbin_age to it and gave it the index of my first column in X_train, which is 0.
Similarly I created one named second, passed kbin_fare to it, and gave it column index 1.
So this is my ColumnTransformer. Then I just called fit_transform on X_train and transform on X_test, and put the results in these two variables.
Okay, now you can inspect this a little.
If you write trf.named_transformers_, you get both of your transformers, first and second.
From there I pick the first one to see what is going on inside it.
If you look at n_bins_ on it, you can see the number of bins that was used. And if I go to second instead of first, I get the same thing, 10.
If you want, you can also see the ranges of the intervals, that is, which range each bin covers. For that you look at bin_edges_ on the transformer.
And look at this, guys: these are our intervals for the Age column. The first value is the start of the first interval, the next value is where the first interval ends and the second begins, and so on.
Since we used quantile, each of these is a quantile range in which 10% of the values lie: 10% of the values are between the first two edges, another 10% between the next two, and so on.
Okay, so you can see all of this by running the code.
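Here is a small sketch of that setup and inspection. The names kbin_age, kbin_fare, first, second, and trf follow my reading of the transcript, and the two columns are synthetic stand-ins for Age and Fare, so the printed edges are not the ones shown in the video.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-ins: column 0 plays Age, column 1 plays Fare.
rng = np.random.default_rng(1)
X_train = np.column_stack([rng.uniform(1, 80, 400), rng.exponential(30.0, 400)])
X_test = np.column_stack([rng.uniform(1, 80, 100), rng.exponential(30.0, 100)])

kbin_age = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
kbin_fare = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")

trf = ColumnTransformer([
    ("first", kbin_age, [0]),    # bin column 0
    ("second", kbin_fare, [1]),  # bin column 1
])

X_train_trf = trf.fit_transform(X_train)  # learn the edges on train only
X_test_trf = trf.transform(X_test)        # reuse the same edges on test

age_binner = trf.named_transformers_["first"]
print(age_binner.n_bins_)        # number of bins actually used per column
print(age_binner.bin_edges_[0])  # 11 edges that delimit the 10 intervals
```

Test values that fall outside the range seen during fitting are placed in the first or last bin, so the transformed test column still only contains bin indices 0 to 9.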
And now what I did is build a DataFrame from this. You can go through the code later; there is nothing special about it, but its output is worth seeing.
Initially the Age column looked like this, and after transforming it looks like this. Similarly this was Fare, and after the transformation it looks like this: each value has been allotted a bin.
If you look at the label columns, you can see which range each value is coming in: this one is coming in this range, that one in that range, and so on. I wrote this code quickly; you do not strictly need it, but you can use it to see which range each value was assigned to. You can go through it a little later.
Okay, now what do I do? I again apply the decision tree, this time on top of the transformed data, and again test on the transformed X_test and compute the accuracy and the cross-validation score.
There is no special improvement. The numbers move a little from run to run, so do not read too much into one result; it is better that you try it yourself. Now, what have I done next?
I have created a function for you called discretize. You pass it the number of bins and the strategy, and it gives you the accuracy and also shows you the graphs by plotting them.
Let me show you. I have run this code, and this time I passed quantile to discretize first. With cross-validation its accuracy is coming out at about 63.01%.
This is the Age column: this is its distribution before any transformation, and now see it after the quantile binning transformation. You can see that the value spread has become uniform, as I was explaining.
If you talk about Fare, Fare was very skewed, but you can see that it too has become quite uniform.
If you pass uniform instead, the distribution comes out exactly as it was before; there is almost no change visible, for Age as well as for Fare.
And if you pass kmeans, the result is again just a little different. You can play with all of this yourself.
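A helper along the lines of the one described could be sketched as below. This is not the function from the video: the signature, the synthetic data, and the choice to leave out the plotting are all assumptions. It also fits the bins on the full data before cross-validating, which is fine for a quick look but is less strict than wrapping the steps in a Pipeline.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

def discretize(X, y, bins, strategy):
    """Bin both columns of X, then return (binned X, mean CV accuracy)."""
    trf = ColumnTransformer([
        ("first", KBinsDiscretizer(n_bins=bins, encode="ordinal",
                                   strategy=strategy), [0]),
        ("second", KBinsDiscretizer(n_bins=bins, encode="ordinal",
                                    strategy=strategy), [1]),
    ])
    X_trf = trf.fit_transform(X)
    score = cross_val_score(
        DecisionTreeClassifier(random_state=0), X_trf, y,
        cv=10, scoring="accuracy",
    ).mean()
    return X_trf, score

# Synthetic two-column data standing in for Age and Fare.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(1, 80, 600), rng.exponential(30.0, 600)])
y = (X[:, 1] + rng.normal(0, 25, 600) > 30).astype(int)

results = {}
for strategy in ("quantile", "uniform", "kmeans"):
    X_trf, results[strategy] = discretize(X, y, bins=10, strategy=strategy)
    print(strategy, round(results[strategy], 4))
```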
Next there is custom binning, which is also called domain-based binning. When will you use it? When you have knowledge of the domain, business knowledge, you can define the bins yourself.
For example, on age you could make 0 to 18 one interval, then from 18 to 60 another category, and whatever is from 60 onwards a last category.
You have made these according to your own knowledge: you know that 18 and above means adult, and that above 60 is retirement age. That is your domain knowledge, and sometimes you can go and use it.
So rather than using one of the three strategies, you use your own knowledge to create the bins.
Unfortunately you cannot do this through scikit-learn; the option is not available. You will have to write the logic yourself, for example in pandas.
Okay, so custom or domain-based binning means creating bins with your own knowledge. There is nothing special in it and nothing to teach. So that was the whole discussion of discretization.
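One way to write that logic yourself in pandas is with pd.cut and hand-picked edges. The ages and the label names child, adult, and senior below are made up for illustration; only the cut points 18 and 60 come from the example above.

```python
import pandas as pd

ages = pd.Series([5, 17, 18, 35, 59, 60, 72], name="Age")  # made-up ages

age_group = pd.cut(
    ages,
    bins=[0, 18, 60, float("inf")],        # edges chosen from domain knowledge
    labels=["child", "adult", "senior"],   # hypothetical category names
    right=False,                           # intervals: [0, 18), [18, 60), [60, inf)
)

print(age_group.tolist())
# ['child', 'child', 'adult', 'adult', 'adult', 'senior', 'senior']
```

With right=False, an age of exactly 18 counts as adult and exactly 60 counts as senior; flip that flag if your domain rule says otherwise.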
Now let's talk about binarization. Binarization is a special case of discretization.
What do you do in discretization? You convert a continuous value into discrete bins. In binarization, you convert a continuous value, a numerical column, into a binary one, that is, 0 or 1.
For example, suppose you have a column of annual income. If your income is less than six lakhs, you are not in the taxable bracket, so 0; if it is more, you are in the taxable bracket, so 1.
What have you done here? You have converted the income into a binary value: taxable or not. That is called binarization.
It is used in some specific cases. One very good example is image processing. In image processing you have pixels whose values go from 0 to 255.
If you want to convert the image into pure black and white, you take a threshold of 127.5: pixels below this you make 0, which means black, and pixels above this you make 1, which means white.
So the image you had has become a black-and-white image.
So this is a rather different kind of technique, and it is useful in very specific places. Let me show you how to use it.
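Both examples come down to comparing each value with a threshold. A minimal NumPy sketch, with made-up incomes and pixel values, and assuming that an income of exactly six lakhs counts as not taxable:

```python
import numpy as np

# Annual income: above 6 lakh (600,000) -> 1 (taxable), otherwise 0.
income = np.array([250_000, 600_000, 750_000, 1_200_000])
taxable = (income > 600_000).astype(int)
print(taxable)  # [0 0 1 1]

# Pixel values from 0 to 255: above 127.5 -> 1 (white), otherwise 0 (black).
pixels = np.array([[0, 100, 127],
                   [128, 200, 255]])
black_white = (pixels > 127.5).astype(int)
print(black_white)
# [[0 0 0]
#  [1 1 1]]
```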
In scikit-learn there is a class called Binarizer. If you use this class, there are just two important things in it.
The first is threshold: as in that last example with 127.5, everything at or below it becomes 0 and everything above it becomes 1.
The second is copy. If the value of copy is True, it creates a new copy of the column, and if its value is False, it makes the changes in the existing column.
These are the two parameters, and it is a very simple thing. I will quickly show you an example.
This time we are again going with the same Titanic data, because by now you understand it a little better.
Here I have taken the Age, Fare, SibSp, Parch, and Survived columns, and again dropped all the rows with missing values. This is my dataset.
Now what did I do with it? I added SibSp and Parch together and called the result family, so I now have another column named family that holds the sum of those two.
After that I did not need SibSp and Parch any more, so I dropped both of those columns, and now my data looks like this.
Survived is my output column. Among the input columns, I am not going to do anything to Age and Fare; I am going to do all the work on family.
Okay, so what do I do now? First of all I extracted X and y, after that I did the train-test split, and this is my X_train. You understand all of this by now.
Then, without binarization, I used these columns and applied the decision tree, and my accuracy came out at about this number. The cross-validation score is also in the same range.
Now, how do we apply binarization here? I said that I want to find out whether a passenger is traveling alone or not.
If the family value is 0, it means the person is traveling alone; if the family value is more than 0, it means someone is traveling with them.
So I am transforming the family column into a new kind of column, traveling alone or not: 0 means the passenger is traveling alone, and 1 means the passenger is not traveling alone.
All we have to do is this: I have created a ColumnTransformer here, and inside it I use the Binarizer class.
The default threshold is already what I want, so I do not need to change it. For copy I passed False, because I want the changes to happen in the existing family column.
Then I passed in my X_train, called fit_transform, and after transforming, look at this, guys: the first passenger is traveling with someone, this one is not alone either, and this fourth passenger is traveling alone.
So the column has just been converted into binary. I again applied the decision tree, and the accuracy came to about 63.01%. There is no special improvement; there is just a slight difference from last time.
This example is not a very accurate one, so do not rely on it too much. I was just showing you an example for the sake of showing it; there is nothing special here.
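The family-to-binary step can be sketched with Binarizer on its own. The family values below are made up, and for simplicity this sketch keeps copy=True and skips the ColumnTransformer, unlike the setup described in the video.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Made-up family sizes (SibSp + Parch) for five passengers.
family = np.array([[1], [0], [0], [4], [2]])

# threshold=0: values above 0 become 1, values of 0 stay 0.
binarizer = Binarizer(threshold=0, copy=True)
with_family = binarizer.fit_transform(family)

# 1 means someone is traveling with the passenger, 0 means traveling alone.
print(with_family.ravel())
print(family.ravel())  # unchanged, because copy=True
```

With copy=False the same call would overwrite the input array in place, which matches the "change the existing column" behaviour described above.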
Okay, so this is another feature engineering technique. We set out to learn how to encode numerical data, and sometimes this will be useful for you.
So keep this in mind. Using it is very easy, as you have seen. If possible, download the code once and try it yourself.
See you in the next video. Thanks for watching.
https://www.youtube.com/watch?v=UF8uR6Z6KLc