YouTube Transcript:
[S1E10] Making Sense of Text Analytics | 5 Minutes With Ingo


Video Transcript

Hi Mara.

Data scientist number seven.

Oh, hey guys.

Hey, Graham.

Hey, uh, Ingo, Ralph. How are you today?

Ingo, it's time for...

Five minutes with... Five minutes with Ingo.

And today also with Ralph, co-founder of RapidMiner and a true expert on text analytics.

We were just talking about sentiment analysis for data scientist number seven. Ralph, how does that work?

Well, we've seen classification in the past. Now we want to apply that to text. The question is: how do we do that?

Um, well, let's look at some statements.

Okay. What do people say about data scientist number seven?

I just happen to have, like, a statement here.

How convenient.

Yeah, it is. So we have, for example, a positive statement: "Unicorns are amazing."

Yeah. Yes, they are.

Yeah, of course they are.

Other people have some trouble, so let's check out what they're saying: "Finding unicorns is difficult."

Okay, two text sentences. Well, yeah, unstructured data, that's what it really is. How can we bring this into a structured format? So, Ralph, what would the first step look like?

Well, grammar is always so difficult, and I think the most important part is in the words. So let's skip grammar.

That's pretty much exactly what I'm doing in English every day.

True.

Yeah, I know. Anyway.

Okay, so let's tear the text apart into components.

So we are just breaking this down. We still have those two sentences, but every sentence is broken down into what we call tokens. Every word becomes a token.

Okay.
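A minimal Python sketch of this tokenization step, assuming that a token is simply a run of letters and that upper and lower case can be ignored. Real tokenizers handle punctuation, numbers, and contractions more carefully; the function name `tokenize` is illustrative only.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split a sentence into lowercase word tokens (runs of letters only)."""
    return re.findall(r"[a-z]+", text.lower())


sentences = ["Unicorns are amazing.", "Finding unicorns is difficult."]
tokens = [tokenize(s) for s in sentences]
print(tokens)
# [['unicorns', 'are', 'amazing'], ['finding', 'unicorns', 'is', 'difficult']]
```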

So what's the next step?

Well, there are still a lot of words, and some of them don't carry as much information. Words like "is" and "are" are not so central. Let's just get them out.

Okay, so we throw them away. That's what we call stop words; we can remove them.
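Stop-word removal can be sketched as a simple filter against a fixed word list. The tiny `STOP_WORDS` set below is a hypothetical example chosen for these two sentences; real stop-word lists are much longer and language-specific.

```python
import re

# Hypothetical, deliberately tiny stop-word list (real lists contain hundreds of words).
STOP_WORDS = {"is", "are", "a", "an", "the", "and", "of", "to"}


def tokenize(text: str) -> list[str]:
    """Split a sentence into lowercase word tokens (runs of letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


sentences = ["Unicorns are amazing.", "Finding unicorns is difficult."]
filtered = [remove_stop_words(tokenize(s)) for s in sentences]
print(filtered)
# [['unicorns', 'amazing'], ['finding', 'unicorns', 'difficult']]
```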

Well, now we still have the two sentences, but they're not really as structured as I would like. We're heading towards a table, so let's put in more structure. Since we put away the grammar, we ignore the word order and basically only look at the words.
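Ignoring word order like this is usually called a bag-of-words representation. A minimal sketch using Python's standard `collections.Counter`: two token lists with the same words in a different order become the same bag.

```python
from collections import Counter

# A "bag of words" keeps each token and how often it occurs,
# but forgets the order in which the tokens appeared.
bag_1 = Counter(["unicorns", "amazing"])
bag_2 = Counter(["amazing", "unicorns"])  # same words, reversed order

print(bag_1 == bag_2)  # True
```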

Okay, so that is interesting. That almost looks like a good structure already. I think we need a pen. Um, whiteboard number one, can you give us a pen?

All right, perfect.

Thank you.

What a good whiteboard.

Um, so we could actually say that these two words, "unicorns," occur in both sentences. So we can make a cut here, another one here, and another one here. So it's almost like columns in a table. You almost have a structure already. Okay, so what can we do now?

Well, we want to simplify things, so we just count whether a word is occurring or not. So here there's no word.

True. But we have a one here for finding. Actually, we have two ones.

And for difficult we have a single one.
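The counting on the whiteboard can be sketched as building a small document-term table, assuming the two sentences have already been reduced to the tokens unicorns/amazing and finding/unicorns/difficult. The column order here is simply the order of first appearance and may differ from the whiteboard.

```python
# Token lists after tokenization and stop-word removal.
docs = [
    ["unicorns", "amazing"],               # "Unicorns are amazing."
    ["finding", "unicorns", "difficult"],  # "Finding unicorns is difficult."
]

# One column per distinct token, in order of first appearance.
vocabulary = []
for tokens in docs:
    for t in tokens:
        if t not in vocabulary:
            vocabulary.append(t)

# One row per sentence; each cell counts how often that token occurs in it.
table = [[tokens.count(term) for term in vocabulary] for tokens in docs]

print(vocabulary)  # ['unicorns', 'amazing', 'finding', 'difficult']
for row in table:
    print(row)
# [1, 1, 0, 0]
# [1, 0, 1, 1]
```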

So look at that: now we actually have a table. Every token becomes a column, and the values are just the count of each word in each sentence. Now we can add another column, which is basically the positive or negative sentiment. That's the label we usually want to predict with machine learning. So we can use any machine learning method (SVMs, by the way, are great for that): just take this table, train the model on this data, and predict whether a statement has a positive or a negative sentiment. And that's really about it for the general idea of text transformation.
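In practice a library model such as scikit-learn's `LinearSVC` would usually be trained on this table. To stay dependency-free, here is a minimal sketch of a linear SVM trained with plain sub-gradient descent on the regularized hinge loss. The hyperparameters `lam`, `epochs`, and `lr` are illustrative assumptions, and a two-row table is of course far too small to learn anything general.

```python
# Count table: columns = unicorns, amazing, finding, difficult.
X = [
    [1, 1, 0, 0],  # "Unicorns are amazing."
    [1, 0, 1, 1],  # "Finding unicorns is difficult."
]
y = [+1, -1]  # label column: +1 = positive, -1 = negative sentiment


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by sub-gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin: the hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / len(X)
                grad_b -= yi / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (positive) or -1 (negative) from the sign of the score."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)  # [1, -1]
```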

The only problem is: what do you do if the texts become longer?

Well, the longer the text, the higher the values will be. So it's kind of unfair: longer texts get more weight. So we want to divide the counts by the length of the text. This one has two words, so I divide by two. The other one has three words, so its length is three, and I divide by three. Afterwards, the values no longer depend on the length.

So that means "unicorns," for example, is really more typical for the first text.

Yeah. Because it makes up 50% of all the words there, and here it's only 33% of all the words. So that makes it a more typical word.
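Dividing each count by the number of tokens in its sentence (counted after stop-word removal, as on the whiteboard) reproduces the 50% vs. 33% figures for unicorns. This sketch uses exact fractions only for readability; floats are the usual choice.

```python
from fractions import Fraction

docs = [
    ["unicorns", "amazing"],
    ["finding", "unicorns", "difficult"],
]
vocabulary = ["unicorns", "amazing", "finding", "difficult"]


def term_frequencies(tokens, vocabulary):
    """Count of each term divided by the number of tokens in the document."""
    n = len(tokens)
    return [Fraction(tokens.count(term), n) for term in vocabulary]


tf = [term_frequencies(tokens, vocabulary) for tokens in docs]
for row in tf:
    print([str(v) for v in row])
# ['1/2', '1/2', '0', '0']
# ['1/3', '0', '1/3', '1/3']
```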

Unfortunately, it occurs in both texts, so it isn't a typical word for either positive or negative. But for amazing and difficult, for example, you can see there's some difference.

I can.

Okay, that is excellent. Uh, that's really amazing. Now, of course, there's another problem in my opinion: what do you do with words that are just very frequent overall? We have been throwing away stop words like "is" or "are"; those are words that are very frequent in all text documents. What can you do about them?

Yeah. So what we had so far is called term frequency: it's about how often the word occurs in the text. Oh, I think we need a second whiteboard. Where's whiteboard number two? Here it is. There we go. So this is term frequency: it's the count of the word divided by the length of the text. And then the other term takes care of words that are frequent in too many documents. So we count in how many documents they occur and take the inverse of that.
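Written as formulas, the two whiteboard quantities are usually called term frequency (tf) and inverse document frequency (idf). The version below, with N documents and df(t) the number of documents containing term t, is one common convention; taking the logarithm of N / df(t) is a detail not mentioned in the video, and other variants (with smoothing, or without the logarithm) exist.

```latex
\begin{aligned}
\mathrm{tf}(t, d) &= \frac{\text{number of times } t \text{ occurs in } d}{\text{number of tokens in } d} \\
\mathrm{idf}(t) &= \log \frac{N}{\mathrm{df}(t)} \\
\mathrm{tfidf}(t, d) &= \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
\end{aligned}
```

For the two example sentences, N = 2 and unicorns occurs in both, so idf(unicorns) = log(2/2) = 0, while amazing, finding, and difficult each occur in only one document and get idf = log 2.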

So that means we are normalizing the term frequency, which is what we see here. By normalizing it, we give terms that are very frequent in all documents a smaller weight. And that's a perfect representation for text documents in general, and a really great way to transform unstructured information into structured information. And that's it for today.
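Putting the pieces together, the tf-idf weighting discussed in the episode can be sketched in a few lines of Python. The `log(N / df)` form of the inverse document frequency is an assumed convention (one of several in use), and the token lists are the two example sentences after stop-word removal.

```python
import math

docs = [
    ["unicorns", "amazing"],
    ["finding", "unicorns", "difficult"],
]
vocabulary = ["unicorns", "amazing", "finding", "difficult"]
N = len(docs)

# Document frequency: in how many documents does each term occur?
df = {term: sum(term in tokens for tokens in docs) for term in vocabulary}

# Inverse document frequency, using the common log(N / df) convention (an assumption).
idf = {term: math.log(N / df[term]) for term in vocabulary}


def tf_idf(tokens):
    """Term frequency (count / length) multiplied by inverse document frequency."""
    n = len(tokens)
    return [tokens.count(term) / n * idf[term] for term in vocabulary]


table = [tf_idf(tokens) for tokens in docs]
for row in table:
    print([round(v, 3) for v in row])
# [0.0, 0.347, 0.0, 0.0]
# [0.0, 0.0, 0.231, 0.231]
```

Note how unicorns, which occurs in every document, ends up with weight zero: the word that appears everywhere no longer helps to tell positive from negative statements.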

Interesting. Thanks, Ingo. Thanks, Ralph.

And this has been your five minutes with Ingo.


https://www.youtube.com/watch?v=UF8uR6Z6KLc
