Our brains are wired to make rapid, often unconscious, judgments about people based on facial and vocal cues, a phenomenon rooted in evolutionary survival instincts but susceptible to learned stereotypes and biases.
A look
A word
A first impression.
And straight away we know.
I can trust her.
But not him.
She's lying.
And so is he.
But he seems reliable.
Hello.
The level of trustworthiness that you perceive
in another person's face alone,
even when they're complete strangers,
can predict criminal sentencing decisions,
up to and including capital punishment, or can predict hiring decisions.
A brief glance at someone's face or the sound of their voice
can affect our decisions.
Gathering information from facial and vocal cues
has been fundamental to social interactions.
Language has only been around for tens of thousands of years,
which, in evolutionary terms, is no more than a blink of the eye.
First impressions can be alluring, but often deceptive.
I'm looking forward to tomorrow.
Our face and voice reveal a lot about us.
Our mood, our disposition, our health.
They point to whatís going on inside us
and the cues they give
can even be interpreted by artificial intelligence.
Science fiction has been predicting this development for ages.
But it's still hard to fathom and it leaves us feeling skeptical
and uneasy because we're not used to it.
We encounter strangers every day.
A new face, an unfamiliar voice.
Both unique and distinct.
They express our individuality.
But they also help us decide whether we like the person.
And whether we accept or reject their advances.
Decisions we make instantly.
In just a hundred milliseconds of exposure,
people already make up their mind about trustworthiness
and competence and dominance.
But fully making up their mind takes several hundred milliseconds.
You only need a very quick glance at certain facial features,
even in a static photograph, that convey levels of intelligence,
and that can lead to judgement and biased decisions.
Jon Freeman is looking at what happens in our brain
after a short glance at someone's face.
His theory: many of these instantaneous decisions
are based on learned stereotypes.
The same applies to voices.
Pascal Belin continues to find evidence
that we associate certain emotions
and traits with how someone's voice sounds.
We see voices as a type of auditory face.
We need just one word to form an opinion.
A voice like this...
is seen as inspiring and confident by most people, whereas this one
leaves the listener thinking they wouldn't trust him with their money.
What does science say?
Do we all see the same thing when we look at someone's face?
Jon Freeman uses a special morphing program to get a more accurate answer.
He can alter gender, age, mood and character traits subtly.
If you ask hundreds of different subjects
to judge the trustworthiness of these individual faces,
you'll find that they generally agree
in terms of being highly correlated with one another,
so the same faces appear trustworthy or relatively untrustworthy
across the board generally.
Although we're all different,
the result is surprisingly similar for everyone,
at least if we're asked if someone is trustworthy.
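To make "highly correlated with one another" concrete: if each rater scores the same set of faces for trustworthiness, the raters' rating profiles can be compared pairwise. A minimal sketch with made-up numbers, not Freeman's actual analysis:

```python
# Illustration only: simulate raters who share a consensus impression per face
# plus individual noise, then measure how strongly their ratings correlate.
import numpy as np

rng = np.random.default_rng(0)

n_raters, n_faces = 200, 30
consensus = rng.normal(size=n_faces)                        # shared impression per face
ratings = consensus + 0.5 * rng.normal(size=(n_raters, n_faces))

corr = np.corrcoef(ratings)                                 # (n_raters, n_raters) matrix
off_diag = corr[~np.eye(n_raters, dtype=bool)]              # drop self-correlations

print(f"mean inter-rater correlation: {off_diag.mean():.2f}")
```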
The first impression is when we decide who we want to communicate,
cooperate or form a close relationship with.
Is it surprising that people have these kinds of unconscious tendencies
despite humans being such rational creatures?
I would say not really.
When we think evolutionarily about it,
in terms of our evolutionary past
before we had verbal language as non-human primates,
non-verbal communication and using facial appearance,
using cues of the face, voice and body were really critical for survival,
for the maintenance of resources, for building social groups.
It can be attributed to our evolution.
Making instant decisions about who is friend or foe
greatly increased our chances of survival.
As pack animals, we've always formed communities.
Long before language played a role,
humans developed a keen sense of how those around them felt.
And being able to read the room is a huge advantage.
If someone in the group is scared,
your own life may be in danger too.
If someone is seething with rage, you placate them or run!
Our brains are still wired the same way today.
As soon as we encounter someone new,
we immediately attempt to establish
whether they are with us or against us.
But to what extent do these first impressions
actually alter our behaviour?
The evidence shows that they have a strong impact.
And they predict all sorts of downstream social outcomes
and real-world consequences.
There are findings like: faces that appear more competent
are more likely to be elected to Senate and governor positions
in the United States, and even presidential candidates
are more likely to win in the United States.
Competent looking managers and attractive people are paid more.
And defendants who look untrustworthy are given longer sentences.
But what about our voices?
We can "hear" fine nuances of confidence,
dominance and competence too.
Even if they have little in common with the speaker's actual personality.
Beyond words, a voice also transports emotions
and can even bring things to life.
Puppets become real figures we relate to like other human beings.
It illustrates how instinctively we relate to something as a human personality
if it has a voice.
A puppet, for example.
And by changing body cues and changing vocal cues,
the perception of the emotion, the perception of the person
or of the puppet's intentions changes.
Oh yes
Look me in the eyes
Just a little bit
Generosity
Sympathetic
Ambivalent
Our brains create real people from the voices they hear
even when the people aren't real.
Once you give a machine a voice,
it gets a personality as if it were human.
It's an automatic reflex,
and research shows that people's feelings change if their computer,
car or coffee machine has a voice.
The vocal acoustics we give machines could
even determine how we interact with them.
Wake up! Wake up!
How can I help you?
What can you do?
You can for example ask me to introduce myself or chat a little.
Can you introduce yourself?
I'm a Furhat robot.
A social robot built to interact with people
in the same way you interact with each other.
So I can smile - and nod.
Gabriel Skantze is one of Furhatís creators.
A first prototype of the robot was launched in 2011.
I looked a bit more crude with cables sticking out of my head.
They came up with the idea of covering the cables with a fur hat and that,
ladies and gentlemen, is where the name "Furhat" comes from.
I don't really need my fur hat anymore.
I look pretty as I am, don't you think?
I don't know what the original interest comes from really.
I just think it's a very fascinating idea of creating an agent
that interacts like a human and behaves like a human.
It's fascinating in its own right.
But it's also again back to the idea that if we can do that,
we start to get a better understanding of how we, as humans, work.
In the future, Gabriel Skantze wants Furhat
to behave like a human during a conversation.
But as soon as scientists try to transfer
our human behaviours to machines, it quickly becomes apparent
how complex our behaviours are.
Today Furhat is supposed to make small talk.
The robot searches the web on its own for responses.
What do you mean by that?
You are quite stupid!
That's a rude thing to say, Furhat.
So I have no idea what the robot will say next,
so it's a surprise for me what it says.
And it's a bit fascinating to see how their conversation unfolds.
Although the conversation takes some unexpected turns,
Furhat has already mastered the basics.
When to speak, where its conversation partner is looking,
and how much eye contact is appropriate.
The scientists have programmed Furhat
with a whole range of emotional facial cues.
However, the finer differences we express through facial expressions
and our voices are proving trickier.
So as humans, for example, we have these micro expressions.
So my eyes move a little bit all the time.
I make small movements with my face.
And we want the robot to have those small movements also.
Otherwise it looks very robotic and not very human-like.
So we think that the face is extremely important
and the way we give feedback to each other and everything is expressed
through the face,
but also through the voice and the tone of our voice and so on.
That's why it's so difficult for Furhat to react appropriately.
The same word or the same sentence can come across very differently
depending on the mood, the occasion or the person we're talking to.
Unfortunately, there's no user manual for humans that Furhat can learn from.
Not yet anyway.
There are plenty of cases where a face can be identical, the same features,
but the context, the body and the voice dramatically change
how we understand that person.
There are all sorts of different kinds of cues in terms of intonation,
pitch contour, formant characteristics, that change
how we perceive other people's voices,
the emotions that they're feeling, their intentions.
How do we read moods?
Marc Swerts is researching how tiny movements in our facial muscles
can influence our communication.
The eyebrows, cheeks, lips and chin all contribute
to forming very different types of smiles.
It's very subtle because it has to do with micro expressions
that you see around the eye region and the mouth region.
Or you can fake a smile.
Like, if I do this in a very fake manner.
You can see that the person is pretending to be happy
or being cynical or sarcastic but is not revealing
what his or her true sentiments or emotions are.
And it's not only smiling,
it's also in the very subtle movements of eyebrows,
the very subtle movements of blinking.
A recent US TV series focused on body language.
Dr Lightman was the protagonist of the crime show "Lie To Me".
He was an expert in micro expressions,
who believed facial expressions could expose lies and suppressed emotions.
Can we really identify every single emotion just by practicing?
Some scientists think so.
Apparently, all we need to do is to consciously notice
each millisecond-long facial expression.
The results are used in market research to find out
which commercials are most effective.
Specially trained security teams at airports also analyze facial cues
to spot potential terrorists.
But is it really that easy to tell when criminals are lying?
Hollywood wants us to think so.
43 muscles are responsible for some ten thousand facial expressions.
And if you learn them all, you'll never need a lie detector.
But the scientific world takes a slightly dimmer view.
In real life it's often much harder to do.
For instance there are these claims that from micro expressions
you can see whether someone is lying or not.
But that's close to impossible.
So most of the time when people lie about something,
you're close to chance level at guessing
whether or not someone is speaking the truth.
Ironically, if your lie becomes more important,
like if I'm lying about something which really matters, like
you have to hide something, it's called the pink elephant effect.
Your cues to lying become clear to the other person.
So the more you try your best not to show that you're lying,
the more likely it is that people will see that you're lying.
How easy is it to tell when someone is lying?
Marc Swerts is looking to children aged five and over for the answer.
The children are asked to tell the prince in a computer game the truth.
But lie to the dragon.
Theyíre supposed to help the prince hide from the dragon.
Cameras and microphones record the children's behavior
in an attempt to find any differences.
After recording numerous children,
the results highlight signs that point to lying.
When you look at the face when they're being truthful,
they have a very open and spontaneous kind of expression.
When they're lying and they have the impression that they're being watched
and being observed, you see that they have this sense of
"Oh, I'm being observed".
And you can tell from facial expressions around the mouth area,
which is more marked than in the truthful condition.
And there's something about the voice too.
So when they're being truthful they have a very soft, normal, warm voice.
When they're lying they tend to use a little bit more of a creaky voice,
talking a little bit like this.
But not every child showed the same cues
so it's not a reliable way to tell if they are telling the truth or not.
Generally, we are much better at controlling our facial cues
than our vocal cues.
Every sound we produce is created by over a hundred muscles
all working closely together.
Emotions alter muscular tension which impacts the tone of our voices.
Everything is controlled
by different parts of the brain:
the muscles in the chest and abdomen that create the required air pressure,
the muscles in the tongue, lips and face that vary the voice,
and, of course, the larynx and vocal cords.
When we become excited, for example,
they vibrate faster and the pitch rises.
But does everyone hear the same thing when a stranger talks to us?
Do we all come to the same conclusion in deciding if someone is trustworthy,
extroverted or willing to try new things?
Björn Schuller is conducting research using a range of different voices.
Although the group doesn't agree on everything,
the data shows some clear tendencies.
Artificial intelligence is being used to help identify them.
My theory is that if a human can hear something,
a computer can pick up on it too.
But it becomes a little spooky when we go beyond what a human can spot.
We're now trying to assess whether the speaker has Covid-19 or not.
Plus for yes and minus for no.
We've got one vote for positive.
It was negative.
Here's the next voice...
We now have three positives and one negative.
I'm going to say positive.
Maurice?
Yes, that's right.
Diagnosing Covid by simply listening to someone's voice sounds risky.
At least, when we rely on the human ear.
At the start of the pandemic,
Björn Schuller trained an artificial intelligence on a range of voices.
Is a more accurate diagnosis now possible?
This is the asymptomatic negative case.
And this is the symptomatic positive case.
What we can see quite clearly on the right here are the upper tones.
There are lots more signals we can use too,
like the uncontrolled vibration of the vocal cords
that leads to irregularities in the stimuli, a certain throatiness,
breathlessness that causes longer speech breaks.
All these things give the computer enough examples to reach a decision
and to differentiate it from asthma or a cold.
At least 85% of the diagnoses made by artificial intelligence were correct.
What's more, computers can also identify ADHD,
Parkinson's, Alzheimer's and depression by analyzing voices.
Anything that goes wrong in the body or brain impacts our voices.
To make a diagnosis, artificial intelligence
looks at up to six thousand different vocal cues.
The new technology could allow diagnoses to be made more easily,
and earlier.
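The broadcast describes this only in outline. A minimal sketch of such a voice-screening pipeline, assuming the opensmile Python package (openSMILE, co-developed by Schuller, whose ComParE 2016 feature set contains several thousand acoustic descriptors) and scikit-learn; the file paths and labels below are placeholders, not real data:

```python
# Rough sketch, not the broadcast's actual pipeline: extract a large set of
# acoustic descriptors per recording and train a standard classifier on them.
import opensmile
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ComParE 2016 functionals: thousands of statistics of pitch, energy,
# spectral shape, voice quality (jitter/shimmer), pauses, and so on.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

def extract(paths):
    """One feature vector per recording."""
    return pd.concat([smile.process_file(p) for p in paths]).to_numpy()

# Hypothetical labelled recordings: 0 = negative, 1 = positive.
train_paths, train_labels = ["neg_01.wav", "pos_01.wav"], [0, 1]
test_paths = ["unknown.wav"]

clf = make_pipeline(StandardScaler(), SVC())
clf.fit(extract(train_paths), train_labels)
print(clf.predict(extract(test_paths)))
```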
Every word we say reveals more about us than we realize.
And as listeners, we are influenced by the person speaking to us.
Subconsciously, we relate to the person speaking.
We internalize their anxiety, uncertainty,
excitement or happiness as if it were our own.
It's a type of synchronization that connects two people through mimicry.
But in general, mimicry is something that we do observe
a lot in normal kind of conversations.
And itís reflected in various aspects of our communication.
From the words we use, the syntax we use, the prosody we produce,
or the intonation and the tempo.
But also the non-verbal communication, for instance, smiling behaviour.
The closer the relationship or a desire for a relationship,
the more intense our subconscious mimicry becomes.
We also mimic more strongly when we want to be liked.
A smile is the clearest signal.
The smiling person often triggers something of happiness in yourself.
Like if you see a smiling person, you sometimes start to smile yourself.
Maybe one of the attractive features of the Mona Lisa
has exactly to do with that.
Like there's something intriguing, something attractive about
looking at the painting because she elicits smiles, she elicits happiness.
We allow ourselves to be influenced by someone else's mood.
Marc Swerts wanted to take a closer look.
The experiment: the speaker is describing something to her audience.
Her manner is animated and she smiles frequently.
Her audience reacts similarly.
They smile back, nod in agreement and give positive feedback.
But what happens when the same speaker repeats the process,
but more seriously?
Her listeners also look more earnest.
They appear to concentrate more and their reactions are more constrained.
Synchronization signals empathy and interest to the other person.
If our communication is successful, we tune into them more closely.
And it's not just our facial cues that sync.
It's our voices too.
Trying to express an emotion vocally that we're not feeling
is nearly impossible.
So what transforms a voice into an instrument that can appeal to,
persuade or motivate other people?
Oliver Niebuhr has carried out numerous case studies
and all have the same outcome: It's not what we say that counts,
it's how we say it.
The voice is an extremely complex, multi-layered signal.
We all have information that we want to share,
but absorbing it is hard work from a cognitive point of view.
We have to work on how we present it.
Emphasizing words can send a clear signal
indicating which parts of the conversation are important.
We need to consider short pauses too.
In fact, people who communicate like this are seen as more likeable.
So it's all about how we use our voices to "package" the content.
The phonetician is performing a short test.
How well can his co-worker present a text he's reading for the first time?
Artificial intelligence is again used for analysis.
The computer calculates a score of 47.7.
It splits the voice into sixteen parameters including speed,
rhythm, melody, volume and pauses.
A score between 1 and 100 for "acoustic appeal" is then calculated.
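How sixteen measured parameters become a single 1-to-100 score is not spelled out in the film. A minimal sketch of one way such a composite could be computed; the parameter names, target ranges and scoring rule are illustrative assumptions, not Niebuhr's actual model:

```python
# Illustration only: each parameter gets a sub-score of 1.0 inside an assumed
# "charismatic" target range, falling off linearly outside it; the mean of the
# sub-scores is mapped onto the 1-100 scale.
measured = {
    "pitch_range_octaves": 1.2,     # melodic variation
    "tempo_syll_per_sec": 4.8,      # speed
    "loudness_variation_db": 6.0,   # loud vs. quiet contrast
    "pause_rate_per_min": 9.0,      # short, well-placed pauses
}

targets = {                          # assumed (low, high) target ranges
    "pitch_range_octaves": (1.5, 2.0),
    "tempo_syll_per_sec": (4.0, 6.0),
    "loudness_variation_db": (8.0, 14.0),
    "pause_rate_per_min": (8.0, 15.0),
}

def sub_score(value, low, high):
    if low <= value <= high:
        return 1.0
    distance = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - distance / (high - low))

scores = [sub_score(measured[k], *targets[k]) for k in measured]
appeal = 1 + 99 * sum(scores) / len(scores)
print(f"acoustic appeal: {appeal:.1f}")
```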
Our ears prefer a varied pitch, usually around two octaves.
Charismatic speakers often switch between loud and quiet, fast and slow.
It makes the voice sound more melodic.
For centuries, this principle has been used by populists
and demagogues to capture the attention of audiences.
Because even ancient civilizations understood
that the only way to motivate and inspire people
was to get them to listen.
Public speaking, projecting and modulating your voice
to achieve the best result was taught in classical antiquity.
Now it's become a lost skill.
It's possible to train your voice to transport information effectively.
It's no different to learning new vocabulary or grammar.
Oliver Niebuhr has developed a computer training program.
The principle is fairly basic.
The upper and lower lines show the pitch.
Circles in different colours and sizes represent speed, volume and pauses.
Users are shown what they can improve in real-time.
After one day of training, the speaker tries again
and scores 12 points higher than the previous day.
The main improvements are in pitch variation,
where his score has soared from 34.5 to 73.3,
more than double his previous attempt.
It's a clear improvement.
Other voices are so seductive that we can lose ourselves in them,
as shown in this experiment.
Some drivers were given instructions by this voice.
And others by this slightly less engaging voice.
What the drivers didn't know was that halfway through the experiment
the sat-nav started giving incorrect directions.
And it got progressively worse and worse.
We wanted to see at what point the drivers would quit:
We were able to show that the more expressive,
the more convincing voice kept drivers following the wrong route for longer,
despite it going against their better judgement.
We had to call them, explain it was just a test and ask them to come back!
An engaging voice plays a key role
when it comes to flirting or meeting a romantic partner.
Hello, I'm Willi.
Hello Willi, I'm Cordula.
I'm looking for a woman, for the long-term.
Whoís attractive? Who do we desire?
We make snap judgements when it comes to one of life's
most important decisions.
Quickly and irrationally.
I have always thought that snap judgements
are really fascinating in sort of how we determine who we want to date,
who we want to be friends with, whom we don't want to be friends with.
And we don't have a lot of introspective access
to those feelings.
What drives that? We usually encounter people
and we like them or we don't like them.
And we have a good sense of that.
We get along with them and these things determine all of our behaviour.
Are we all born with a universal code?
Is it nature or nurture that allows us to interpret character traits
and read emotions through facial and vocal cues?
One thing is certain.
We react to these cues from a very young age.
Just like our animal ancestors.
Obviously, primates never developed verbal language like humans.
But just like us they communicate vocally
and they know how to interpret the signals.
Scientists have always assumed that there are huge differences
between verbal humans and non-verbal primates.
But research shows that the auditory cortex
in both species is more similar than expected.
Whether verbal or non-verbal,
it makes no difference to how the brain processes signals.
Pascal Belin has tested humans and primates
using functional magnetic resonance imaging or "fMRI".
The results show that both groups react to their own speciesí voices
in the same parts of the brain.
Our ancestors probably also used these areas twenty million years ago.
Primates process vocal cues the same way we do - even without language.
The brain's architecture has changed slightly
in humans to process language.
But for anything beyond language, identity, emotions, personality,
the mechanisms have stayed the same as in other species.
Research into how primates interpret facial cues shows similar results.
Again, similar brain structures to humans are activated in the primates.
Does that mean that we are born with the ability to understand
facial and vocal cues?
Joris is ten months old and is getting ready for an experiment.
Sarah Jessen wants to measure Joris's brainwaves
to see how he reacts to unfamiliar faces.
Can he already judge who is trustworthy and untrustworthy?
We carried out research where we showed babies
a range of faces for fifty milliseconds.
That's only a twentieth of a second.
It's so quick we assumed that babies wouldn't even register the faces.
However, we identified activity in the brain
that proved the babies had not only registered the faces,
but had even made a decision about whether they were trustworthy or not.
But is it innate or learned?
We don't believe that a baby is born with the ability to judge
whether a face is trustworthy or not.
It's more likely to be a combination of learning processes
and an inherent or early interest in faces.
Over the first few months, faces and voices are a baby's
most important learning resources.
Parents intuitively use pronounced facial cues,
emphasize certain words and exaggerate.
This captures the baby's attention
and allows them to recognize emotions more easily.
By six months, babies can already differentiate between happiness,
fear, sadness and anger.
Just like a child, Furhat is learning to understand us better
and practicing how to behave in a conversation.
He analyzes the movements and eyes
of his conversation partner using a camera.
Furhat has to know whether I am talking to Furhat
or my colleague here.
So that's quite tricky and we call that multiparty interaction
where more than two people are talking.
And one of the ways of handling this is
that we track the head pose of the users.
So here we can see that the camera has detected us two here
and you can also recognize our faces.
So if I turn around and look back it will see that I'm the same person still.
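A minimal sketch of the addressee logic described here: if the robot knows each tracked user's head pose, it can guess whether the speaker is facing it or the other person. The angles and threshold are illustrative assumptions, not Furhat's actual tracking code:

```python
# Illustration only: decide, from head yaw alone, whether the current speaker
# is addressing the robot or the other person in a multiparty conversation.
from dataclasses import dataclass

@dataclass
class TrackedUser:
    user_id: str
    yaw_deg: float   # 0 = facing the robot's camera, +/-90 = facing sideways

def addressee(speaker: TrackedUser, facing_threshold_deg: float = 25.0) -> str:
    """Guess whether the speaker is talking to the robot."""
    if abs(speaker.yaw_deg) <= facing_threshold_deg:
        return "robot"
    return "other person"

users = [TrackedUser("left", yaw_deg=8.0), TrackedUser("right", yaw_deg=62.0)]
for u in users:
    print(u.user_id, "is addressing the", addressee(u))
```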
It's time to play a game with Furhat.
The team takes turns drawing a shape
while the other players guess what it is.
Could it be star?
No
Is it a flower?
Yes, it's a flower.
My guess is feather
No
Is it a worm?
I know it, it is snake.
It is, yes.
This is a good one.
A boat?
Keep guessing
Furhat will look and sound more like a human over time
but he won't be identical.
Studies show we prefer a clear distinction
between humans and robots or we find it "too creepy".
The boundaries are already blurred in the media.
After forty years, ABBA is back on stage.
Not in real life of course, but as avatars.
It's like time has simply passed the group of seventy-year-olds by.
Only their voices are still real.
We're at the start of a huge technological shift.
We are often asked to record actors' voices.
I predict that over the next few years these real-life voice recordings
will become superfluous
because we'll be able to create new,
synthetic voices using scientific principles.
I don't know what you're talking about, HAL.
I know that you and Frank were planning to disconnect me
and I'm afraid that's something I cannot allow to happen.
Artificial intelligence making independent decisions
was still a pipedream in director Stanley Kubrick's day.
But today we live with it.
And maybe that's a good thing.
Because AI doesn't make emotional decisions
or succumb to alluring voices.
It's neutral and impartial and everything we're not. Right?
If you program an AI, if they see that, you know,
an African American is linked with hostility and crime
in media depictions on TV,
the AI is going to pick up on that and will act accordingly.
So AI is more like us than we realise.
In "Lie To Me",
we're shown how some faces are automatically linked to stereotypes.
What are you guys up to?
We're testing for racial bias.
Clock the racist.
Oh, they're... they're all racists.
Yeah, 80% of people who take this test are biased.
And science dictates that subconscious bias directly impacts our perceptions.
They leave a lot of collateral damage in the brain.
It's not just a stereotype living in sort of a filing cabinet in the brain.
They're changing approach and avoidance tendencies,
behavioural tendencies,
motor tendencies, visual tendencies, auditory tendencies.
Jon Freeman is looking at exactly what happens using fMRI.
The test group is shown several different faces.
As well as the part of the brain responsible for facial recognition,
other areas that process social and emotional information
are also activated.
These areas memorize bias and personality traits.
To provide a rapid response, our brains make fast predictions.
We register what we perceive to be most probable.
This can often be adaptive.
If you walk into a restaurant, you expect to see chairs and tables,
and a waiter, et cetera.
You're not going to waste a lot of metabolic resources, the brain's time,
the visual system's resources in processing every single object
in that space.
You generate a bunch of hypotheses. You know what a restaurant is and you
kind of run with those hypotheses and you use expectations
to fill in the gaps that the brain is too lazy
to figure out itself and does not want to waste its resources on.
So are we merely at the mercy of these mechanisms?
Jon Freeman refuses to accept this theory.
Heís carrying out research to find out how we learn these stereotypes
and whether we can "unlearn" them.
He shows the test group a range of faces
linked to specific character traits.
So given all that, we wanted to explore our capacity
to rapidly acquire completely novel facial stereotypes out of thin air.
People that have a wide sellion, which is the nose bridge on the face,
and it's a cue that really has nothing to do with anything interesting.
It's just simply how wide the bridge of the nose is.
So it's an arbitrary facial feature, and 80% of the time
we're pairing this wide sellion with trustworthy behaviors.
So now they see completely new faces,
not the ones that they had previously learned about,
that have wide and narrow sellions, that arbitrary facial feature.
And indeed what we found was that on a variety of different measures,
more conscious, less conscious, that people are applying these stereotypes,
they are automatically activating these stereotypes
without their conscious awareness from just a couple of minutes of learning.
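A minimal sketch of the learning-phase contingency just described, with the 80% pairing of the wide sellion and trustworthy behaviour; the trial counts, the treatment of narrow sellions and the behaviour labels are illustrative assumptions, not the study's actual materials:

```python
# Illustration only: generate a learning-phase trial list in which an arbitrary
# facial feature is paired with trustworthy behaviour on 80% of trials.
import random

random.seed(1)

def make_trials(n_per_condition=50, contingency=0.8):
    trials = []
    for feature, likely in [("wide sellion", "trustworthy"),
                            ("narrow sellion", "untrustworthy")]:
        other = "untrustworthy" if likely == "trustworthy" else "trustworthy"
        for _ in range(n_per_condition):
            behaviour = likely if random.random() < contingency else other
            trials.append({"face_feature": feature, "behaviour": behaviour})
    random.shuffle(trials)
    return trials

trials = make_trials()
wide = [t for t in trials if t["face_feature"] == "wide sellion"]
share = sum(t["behaviour"] == "trustworthy" for t in wide) / len(wide)
print(f"wide-sellion faces paired with trustworthy behaviour: {share:.0%}")
```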
Our brains are highly flexible when it comes to stereotyping.
We can learn to evaluate faces differently
at least over the short-term.
Jon is now looking into a training method that works long-term.
The same principle applies to voices.
Ultimately, the more we know about these mechanisms,
the less susceptible we are to being deceived by first impressions.
First impressions - fascinating, sometimes deceptive
but always a special, even magical, moment.
And maybe the start of a new relationship
where we discover so much more than what we saw at first sight.