[S1E10] Making Sense of Text Analytics | 5 Minutes With Ingo | Altair RapidMiner How-To | YouTubeToText
Video Transcript
Hi Mara.
Data scientist number seven.
Oh hey guys.
Hey Graham.
Hey, uh, Ingo, Ralph. How are you today?
Ingo, it's time for
five minutes with Ingo.
And today also with Ralph, co-founder of
RapidMiner and a true expert on text analytics.
Just talking about sentiment analysis
for data scientist number seven. Ralph,
how is that working?
Well, we've seen classification in
the past. So now we want to apply that
to text. The question is, how do we do that?
Um well let's look at some statements.
Okay. What do people say about data
scientist number seven?
I just happen to have, like, a
statement here.
How convenient.
Yeah, it is. So, we have, for example, a
positive statement: Unicorns are amazing.
Yeah. Yes, they are.
Yeah, of course they are.
Other people have some trouble. So,
let's check out what they're saying:
Finding unicorns is difficult.
Okay. Two text sentences. Well, yeah,
unstructured data, that's what it really is. How
can we bring this into a structured format?
So Ralph, what would the first step
look like?
Well, grammar is always so difficult, and
I think the most important part is in
the words. So let's skip grammar.
That's basically pretty much exactly
what I'm doing in English every day.
True.
Yeah, I know. Anyway.
Okay, so let's tear the text apart into components.
So we are, like, just breaking this
down. So we still have those two
sentences, but every sentence is broken
down into what we call tokens. So
every word is becoming a token.
Okay.
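The tokenization step described above can be sketched in a few lines of Python. This is only an illustration, not what RapidMiner does internally: the `tokenize` helper and its regular expression are assumptions made for this example.

```python
import re

def tokenize(text):
    """Split a sentence into lowercase word tokens, dropping punctuation."""
    # Assumption: a token is any run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())

sentences = ["Unicorns are amazing.", "Finding unicorns is difficult."]
tokens = [tokenize(s) for s in sentences]
print(tokens)
# [['unicorns', 'are', 'amazing'], ['finding', 'unicorns', 'is', 'difficult']]
```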
So what's the next step?
Well, there are still a lot of words, and some of
them don't carry as much information.
Words like "is" and "are" are not so central.
Let's just get them out.
Okay. So we throw them away. That's what
we call stop words. We can remove them.
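Stop word removal can be sketched as a simple filter over the tokens. The two-word `STOP_WORDS` set below is a hypothetical minimal list taken from the words named in the conversation; real stop word lists are much longer.

```python
# Hypothetical, deliberately tiny stop word list for this example.
STOP_WORDS = {"is", "are"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t not in stop_words]

docs = [
    ["unicorns", "are", "amazing"],
    ["finding", "unicorns", "is", "difficult"],
]
filtered = [remove_stop_words(d) for d in docs]
print(filtered)
# [['unicorns', 'amazing'], ['finding', 'unicorns', 'difficult']]
```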
Well now we still have the two
sentences. Not really as structured as I
would like it. We're heading towards the
table. So let's put in more structure.
Since we put away the grammar, we ignore
the word order and basically only look
at the words.
Okay, so that is interesting. That
almost looks like a good structure
already. I think we need a pen. Um,
whiteboard number one, can you give us a pen?
All right, perfect.
Thank you.
What a good whiteboard.
Um, so we could actually say, like, those
two words are occurring in both
sentences. So we can actually make a cut here,
another one here,
and another one here. So it's almost
like columns in a table. You have almost
a structure now already. Okay. So then
what can we do now?
Well, you want to simplify things. So we
just count: is a word occurring or not?
So here there's
no word here.
True. But we have one here for finding.
Actually we have two ones.
And for difficult we have
a simple one.
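The table being filled in on the whiteboard can be sketched as a list of count rows, one row per sentence and one column per distinct token. The alphabetical column order and the variable names are assumptions for this example; the whiteboard may order the columns differently.

```python
from collections import Counter

# Tokens of the two example sentences after stop word removal.
docs = [
    ["unicorns", "amazing"],
    ["finding", "unicorns", "difficult"],
]

# One column per distinct token, sorted alphabetically (an arbitrary choice).
vocabulary = sorted({token for doc in docs for token in doc})

# One row per sentence: how often each vocabulary word occurs in it.
table = [[Counter(doc)[term] for term in vocabulary] for doc in docs]

print(vocabulary)  # ['amazing', 'difficult', 'finding', 'unicorns']
for row in table:
    print(row)
# [1, 0, 0, 1]
# [0, 1, 1, 1]
```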
So look at that. Now we actually have a
table. Every token is becoming a column.
The values here are just the count of
each word in each
sentence. Now we can add another column,
which is basically the positive
sentiment or the negative sentiment
here. And now we have this label we
usually want to predict with machine
learning. So we can now use any machine
learning method (SVMs, by the way, are
great for that) by just taking this table
here and training the model on this data in
order to predict if it's a positive or a
negative sentiment. And that's really
about it for the general
idea of text transformation.
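To make the training step concrete, here is a toy linear SVM trained on the two-row table with the sentiment label added. The video only recommends SVMs and does not show how the model is trained, so everything below is an assumption: the function names, the hyperparameters, and the choice of plain subgradient descent on the regularised hinge loss. In practice one would use a library implementation and far more than two examples.

```python
def train_linear_svm(rows, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularised hinge loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # Regularisation step: shrink all weights slightly.
            weights = [w * (1 - lr * reg) for w in weights]
            # Hinge loss step: only examples inside the margin push the model.
            if margin < 1:
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias

def predict(weights, bias, x):
    """Return +1 (positive sentiment) or -1 (negative sentiment)."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

# Columns: amazing, difficult, finding, unicorns
rows = [[1, 0, 0, 1], [0, 1, 1, 1]]
labels = [1, -1]  # +1 = positive sentiment, -1 = negative sentiment

weights, bias = train_linear_svm(rows, labels)
print([predict(weights, bias, x) for x in rows])  # [1, -1]
```

The learned weight for "amazing" ends up positive and the weights for "difficult" and "finding" end up negative, which matches the intuition from the conversation.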
The only problem is: what are you doing if
the texts are becoming longer?
Well, the longer the text, the higher
the values will be. So it's kind of
unfair. The longer texts are stronger. So
we want to divide that by the length of
the text. This has two words. So I
divide those by two. The other one has
three words. So length is three. I
divide it by three. So afterwards it
does not depend on the length anymore.
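The division by text length described above can be sketched as follows. The helper name `term_frequencies` is an assumption for this example; the two documents are the example sentences after stop word removal.

```python
docs = [
    ["unicorns", "amazing"],               # length 2
    ["finding", "unicorns", "difficult"],  # length 3
]
vocabulary = sorted({token for doc in docs for token in doc})

def term_frequencies(doc, vocabulary):
    """Count of each vocabulary word divided by the length of the text."""
    return [doc.count(term) / len(doc) for term in vocabulary]

for doc in docs:
    print([round(value, 2) for value in term_frequencies(doc, vocabulary)])
# Columns: amazing, difficult, finding, unicorns
# [0.5, 0.0, 0.0, 0.5]
# [0.0, 0.33, 0.33, 0.33]
```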
So that really means "unicorn", for example,
is more typical for the first text. Yeah.
Because it makes up 50% of all the
words, basically, and here it's only
33% of all the words. So that makes it a
more typical word.
Unfortunately, it occurs in both texts. So
it isn't a typical word for
positive or negative. But for example,
"amazing" and "difficult": you can see
there's some difference.
I can.
Okay. That is excellent. Uh, that's
really amazing. Now of course there's
another problem in my opinion, because
what are you doing with words which
are just very frequent overall? Like, we
have been throwing away those stop words
like "is" or "are". So those are words which
are very frequent in all text documents.
What can you do about them?
Yeah. So this is called term
frequency. It's about how often the
word is in the text. Oh, I think we need
a second whiteboard. Where's whiteboard
number two? Here it is. There we go. So
this is term frequency. It's the count
of the word divided by the length of the
text. And then the other term is taking
care of words that are too frequent in
too many documents. So we count how
many documents they are in and take the
inverse of that.
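The two terms just described can be sketched together as follows. This follows the wording in the conversation literally, so the second term is simply one divided by the number of documents containing the word. That literal reading is an assumption; many libraries use a log-scaled variant of the inverse document frequency instead. The function names are also made up for this example.

```python
docs = [
    ["unicorns", "amazing"],
    ["finding", "unicorns", "difficult"],
]

def term_frequency(term, doc):
    """Count of the word divided by the length of the text."""
    return doc.count(term) / len(doc)

def document_frequency(term, docs):
    """Number of documents in which the word occurs at least once."""
    return sum(1 for doc in docs if term in doc)

def tf_idf(term, doc, docs):
    """Term frequency times the plain inverse of the document frequency."""
    return term_frequency(term, doc) * (1 / document_frequency(term, docs))

print(tf_idf("unicorns", docs[0], docs))  # 0.25 (occurs in both documents)
print(tf_idf("amazing", docs[0], docs))   # 0.5  (occurs in one document only)
```

In the first text, "unicorns" and "amazing" have the same term frequency, but "unicorns" ends up with half the weight because it appears in both documents.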
So that means we are normalizing the
term frequency. So the term frequency is
what we see here. And by normalizing
this, we give terms which are, well, very
frequent in all documents a smaller
weight. And that's a perfect
representation for text documents in
general, and that's really a great way to
transform unstructured information into
structured information. And that's it
for today.
Interesting. Thanks, Ingo. Thanks, Ralph.
And this has been your five minutes with