Bell's theorem demonstrates that quantum mechanics is fundamentally non-local, meaning that entangled particles can influence each other instantaneously regardless of distance, a concept that challenges classical notions of space, time, and causality.
Hey everyone. Today I have for you a
genuine glitch in reality that's going
to blow your mind and change the whole
way you think about everything. So it's
called Bell's theorem and this is one of
the most mysterious, unsettling,
magnificent results in all of
theoretical physics. So let's talk about it.
Bell's theorem demonstrates that quantum
mechanics is weirdly non-local.
That is, there's something going on with
quantum physics that doesn't seem to be
bothered by the limitations of space and
time. Now, of course, much has been said
about this, including in various popular
science uh articles and videos and all
that sort of thing. You often hear about
quantum entanglement, spooky action at a
distance, and all that kind of stuff.
And there's often some crossover with uh
sci-fi, about communication systems that
work faster than light and all that. And
there's also kind of this woo woo
connotation about consciousness and all
that sort of thing. And those are all
really fanciful notions, but in many
cases, what you hear about Bell's
theorem and quantum entanglement and all
that is not well grounded in the actual
physics and the math of quantum mechanics.
And so I wanted to make a video where we
actually really get into the technical
details of what exactly did Bell teach
us about the nature of reality. And so I
wanted to go through his famous
legendary 1964 paper, you know, word for
word, equation for equation. I want to
really dive into it and explore with you
exactly what is his argument and what
does it imply about the nature of reality.
I should point out in case you don't
know, I recently made a video on the
Einstein-Podolsky-Rosen paradox, which is
definitely a prequel to this video.
In fact, Bell's legendary 1964 paper is
called "On the Einstein-Podolsky-Rosen
Paradox." Okay, so this is a follow-up to
the argument that Einstein, Podolsky,
and Rosen put forward back in 1935 in
which they looked at quantum mechanics
and said, "Hey, wait a minute.
Something's wrong here. Something's
paradoxical. Either quantum mechanics is
super weird or maybe it's just incomplete."
And so almost 30 years after that, John
Stewart Bell thought about it real hard
and was like, "You know what? Sorry
Einstein and friends, actually quantum
mechanics is not incomplete, but rather
it's just really weird and genuinely
non-local, at least in some subtle
ways." So that's the context in which
Bell wrote this paper. It's a follow-up
to the argument put forward by Einstein,
Podolsky, and Rosen. So before watching
this video, I do recommend watching my
video on the EPR paradox. Or if you
haven't seen that video, but you're just
familiar with the EPR paradox, then
that's cool, too. You don't have to get
your info from me. I'm just one of many
sources on this beautiful internet.
All right, then let's get into the
paper. Well, first of all, this paper is
broken up into six parts. Part one is
the introduction.
Part two is the formulation where we
sort of define our terms and think about
what it is we're going to be thinking
about. Part three is an illustration of
some examples.
And part four has the main argument of
the paper in which we find that if you
try to explain quantum physics using a
local hidden variable theory, you run
into a contradiction. In part five, the
ideas are generalized. And in part six,
we have our conclusion. So those are the
six parts of this paper. We're going to
go through them one at a time. And in
between these, I'm also going to have
some animations and some information and
equations that provide context because
one thing you got to know about this
paper is it is so cryptic and it is so
dense with equations and very few words
that if you just try to read it, it's
really hard actually. You really got to
take your time with this one. And so
we're going to take our time and I'm
going to have related animations and
equations to help us along and to fill
in the gaps in the paper where it's
assumed that the reader is going to be
imagining a certain thing in mind when
they read it. Oh, and speaking of, I've
put a link to the PDF in the description
below the video. And I definitely
recommend printing out this paper so
that you have it for reference as we go
through it. If you don't have a printer,
that's fine, but then you should open it
up on another screen or another tab or something.
All right. So now it's time to get into
the introduction of the paper. The paper
begins. The paradox of Einstein,
Podolsky, and Rosen was advanced as an
argument that quantum mechanics could
not be a complete theory, but should be
supplemented by additional variables.
Remember at the end of the EPR paper
they talked about how quantum physics is
incomplete and it's missing something
and you have to put variables into
quantum physics in order to have it
provide a complete description of reality.
These additional variables were to
restore to the theory causality and locality
and that's often called local causality.
It's just the idea that cause and effect
should propagate such that an object is
only affected by its immediate
surroundings, as opposed to some kind of
weird teleportation or spooky action at
a distance. So Einstein and friends
argued that you have to put some kind of
additional variables into quantum
mechanics in order to resolve the EPR
paradox and give quantum mechanics local causality.
In this note that is Bell's paper, that
idea will be formulated mathematically
and shown to be incompatible with the
statistical predictions of quantum
mechanics. So that's what we're going to
do today. We're going to mathematically
explore the concept of hidden additional
variables in quantum mechanics and show
that it doesn't work and that therefore
quantum mechanics genuinely does exhibit
non-local phenomena which is crazy. Like
that goes against everything we think we
know about the nature of reality.
Anyway, it is the requirement of
locality or more precisely that the
result of a measurement on one system be
unaffected by operations on a distant
system with which it has interacted in
the past. That creates the essential
difficulty. So the hidden variable story
doesn't work if you require the theory
to be local. There have been attempts to
show that even without such a
separability or locality requirement, no
hidden variable interpretation of
quantum mechanics is possible.
These attempts have been examined
elsewhere and found wanting. That is to
say, actually, you can make a hidden
variable interpretation of quantum
mechanics work if you relax the
constraint of locality. But then it's
like what's the point, right? Moreover,
a hidden variable interpretation of
elementary quantum theory has been
explicitly constructed. Here he's
referring to Bohmian mechanics. That
particular interpretation, Bohmian mechanics,
has indeed a grossly non-local
structure. Famously, Bohmian mechanics is a
non-local theory. This non-locality
is characteristic, according to the
results to be proved here of any such
theory which reproduces exactly the
quantum mechanical predictions. That is
to say, what we're going to show in this
paper is that if you want a theory that
matches the quantum mechanical
statistics and you want it to involve
hidden variables as advocated for by
Einstein, Podolsky, and Rosen, then
necessarily you're going to end up with
a non-local theory. And of course, that
non-locality is the same kind of dilemma
that you end up having to confront if
you just take quantum mechanics at face
value in which it does appear to be a
non-local theory. So, no matter how you
look at it, there's some weird non-local
stuff going on in quantum mechanics.
All right. Now, before going further, I
want to say a few words about spin 1/2
particles because spin 1/2 particles are
the main characters of this paper. And
so, it'll be helpful to review some of
the main points regarding the experiment
and theory of spin 1/2 particles.
So on the experimental side for sure the
most important and famous spin 1/2
experiment is the Stern-Gerlach
experiment. The way this experiment
works is imagine that you have an oven
and inside the oven you put some silver
and the oven is so hot that the silver
atoms start to evaporate and fly around
with crazy high speeds and some of them
are going to fly out of a hole in the
oven. And then suppose you have some
kind of apparatus called a collimator so
that we end up with a line of silver
atoms flying in a particular direction.
And also suppose this whole experiment
happens in a vacuum so that the silver
atoms aren't bumping into air as they
fly along. Now then this beam of atoms
is directed to fly through a strong
non-uniform magnetic field. And
amazingly, what happens is that magnetic
field somehow splits the beam of atoms
into two beams. And it's like, what's
going on with that? Two beams? Why
do we have two beams? How can it be that
you have one beam of atoms coming in and
you have two beams going out? Well, the
key to understanding this is that a
silver atom is electrically neutral.
Its 47 protons perfectly cancel out
its 47 electrons, because it's just a
neutral atom. It's not ionized. But if
you look at the electrons in a silver
atom, you find that all of the electrons
are paired up in their various orbitals,
but there remains a single unpaired
electron in the 5s orbital.
And so for all of the paired electrons
in the silver atom, their spins cancel
each other out. But the unpaired 5s
electron has a spin of 1/2 because an
electron is a spin 1/2 particle. And as
a result, it's sort of like the whole
silver atom behaves like an electrically
neutral spin 1/2 particle. So that
unpaired electron spin gives the whole
atom a tiny magnetic moment. That is it
makes the silver atom sort of like a
tiny little magnet.
I should also say the nucleus of the
silver atom also has a net spin of 1/2.
But because the nucleus is so tightly
packed compared to the electrons, the
magnetic effect of the nuclear spin is
thousands of times smaller than the
magnetic effect of the electron spin. So
for all intents and purposes, it doesn't
matter in this experiment.
So then what happens to the silver atoms
as they're flying through this apparatus
is that the initial beam is totally
thermally random. I mean, you're talking
about evaporated silver atoms. There's
no preferred directionality to the spin.
It's all a random distribution over the
spin directions. But then as they fly
through the Stern-Gerlach magnet, for some
reason the spins get projected either
onto purely spin up or purely spin down.
And that's really weird because it's not
a distribution over some continuous
quantity. No, it's quantized: either
up or down. There are only two options
that it can be, which is super weird,
right? This is a very quantum effect.
And so then if we want to say okay well
these two states are going to be
separated by one quantum unit then you
realize that given the symmetry of the
situation since both beams are deflected
by equal amounts we can say that spin up
is associated with a quantity of plus
1/2 and spin down is associated with a
quantity of -1/2, so that the
difference between +1/2 and -1/2
is one quantum unit. And so that's
why we call this a spin 1/2 particle.
Okay. So we have two discrete beams. And
clearly there's something weirdly
quantum going on here. But what's really
going on here? You know, cuz the story I
just told about spin 1/2 and the
electron, it's like a little magnet and
it separates out. What does that really
mean? Like physically, how should we
imagine that? Well, in a moment I'll
tell you a little bit of the quantum
theory and then we'll also imagine some
kind of speculative hidden variable
theory and we'll see that those don't
really work. So, we'll get into the
theory in a moment, but for now I
actually want to stick on the
experimental side of things so that we
can learn a little bit more about how
spin 1/2 particles actually behave.
So, imagine we do a Stern-Gerlach
experiment where we have a beam of
silver atoms flying through. It goes
through the Stern-Gerlach magnet and it
splits into two beams, spin up and spin
down. Now suppose we put a wall so that
all the spin down atoms hit the wall and
they stop going. But then the spin up
atoms, they can fly right through and
they can keep going. And now we have a
beam of spin up atoms. So then we line
it up and pass it through another
Stern-Gerlach magnet that's oriented along
the same axis, the same direction in
space. Well, then an amazing thing
happens, which is that in the second
Stern-Gerlach magnet, we only see a spin
up beam. There's no spin down. And I
guess that's not too surprising. It kind
of makes sense because we start off with
a random beam of silver atoms. We split
that into a spin up and a spin down. And
then we re-measure and we find, okay,
there's only spin up. Yeah. Okay, that's
not too mind-blowing. That kind of makes
a lot of sense, right? And remember, all
of this is happening in a vacuum
chamber. So there's no air molecules
that the silver atoms are bumping into
cuz if there were, then we could imagine
the beam kind of rerandomizing. You
know, eventually the silver atoms are
slamming into air molecules and getting
all reoriented and all that sort of
thing. So this is all happening inside a
vacuum chamber. What this two-stage
Stern-Gerlach experiment shows is that
spin is a state that the atom is in,
right? It's a property that persists
with the atom and has some continuity
across time. So that it makes sense to
say this is a spin up atom at least for
now. You know, I mean, it can bump into
something and change its spin. But
supposing it doesn't, then it can
continue on in that spin-up state for
some amount of time. So that's cool.
That gives us some sense of the
physicality of spin. But we're still
left with the mysterious question of why
do we have two discrete options for a
spin measurement anyway as opposed to
some continuous range of outcomes? And
how should we visualize a spin state?
Well, again, we'll talk about the theory
of that in just a moment, but there's
one more experimental thing I want to
show you before we get there. What we're
going to do now is imagine slightly
rotating the second magnet by some small
angle theta. And then a magical thing
happens. The second beam now mostly
comes out as spin up. But now there's
also a spin down beam as well. And it's
very subtle because all the spin up
atoms that are flying through the second
detector, most of them are going to come
out spin up. But every now and then
there is a chance that it'll come out
spin down. And so if you think about
many atoms flying through and so it's
sort of like a continuous beam
situation, then imagine a very bright
spin up beam and a dull but nonzero spin
down beam. And so then the question
becomes what is the probability of it
being spin up versus spin down in this
kind of an experiment? And there's
actually a very good agreement between
quantum mechanics and experimental
results which show that for the atoms
passing through the second magnet they
have a cos²(θ/2) probability of being
spin up and likewise a sin²(θ/2)
probability of being spin down. Remember
that cos² + sin² = 1, so those
probabilities add up to 100%. And
we're going to take that as sort of a
ground truth for this video.
This cos²(θ/2) and sin²(θ/2). We're
going to take that as an absolute fact
about reality because it has been
measured in many experiments and it is a
pretty direct result of quantum theory.
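As a quick sanity check on this ground truth, here's a minimal Python sketch of the cos²(θ/2) and sin²(θ/2) rule. The function names are my own, just for illustration:

```python
import math

def spin_up_probability(theta):
    """Quantum prediction: probability that a spin-up particle is
    measured spin up again along an axis tilted by angle theta."""
    return math.cos(theta / 2) ** 2

def spin_down_probability(theta):
    """Complementary probability of it coming out spin down."""
    return math.sin(theta / 2) ** 2

for deg in (0, 45, 90, 180):
    theta = math.radians(deg)
    up, down = spin_up_probability(theta), spin_down_probability(theta)
    print(f"theta = {deg:3d} deg: P(up) = {up:.3f}, P(down) = {down:.3f}")
    # the two probabilities always sum to 1, since cos^2 + sin^2 = 1
    assert abs(up + down - 1.0) < 1e-12
```

Notice it reproduces the extremes discussed above: certainty at 0°, a 50/50 split at 90°, and a guaranteed flip at 180°.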
Oh, and one thing I should say in this
diagram, you see that second beam is
still horizontal even though I tilted
the picture of the detector. In reality,
if you're doing an experiment like this,
you would want to realign the second
beam so that it comes in parallel to the
detector. But there are ways of doing
that without modifying the spin state of
the particle. So I just didn't show that
in this diagram because I wanted to keep
things simple. Actually, let me show you
this. This is a cool much better
diagram. So this comes from Wikipedia.
Shout outs to Clara Kate Jones for
making this beautiful diagram. What this
diagram shows is a two-stage Stern-Gerlach
experiment. The particle beam
comes in. You get a 50-50 split between
spin up and spin down denoted as Z plus
and Z minus. You know, because we're
measuring along the Z-axis.
Then we send that second beam through
the second detector. The second detector
appears to be tilted, but is actually
just in alignment with the way the Z
plus beam comes out of the first detector.
But now I want to look at something
really cool, which is what if the second
detector measures along a whole
different axis. So, for example, if the
second detector measures along the
x-axis, the spin up particle beam goes
through the second detector and then
splits into a 50/50 probability mix of
being spin left or spin right. By the
way, instead of spin left and spin
right, let's use the language spin up
along x and spin down along x. So you
see when we say spin up and spin down,
it's always with reference to a
measurement axis and spin up is going to
be the beam which goes up relative to
that axis. Okay? So we can always use
the words spin up and spin down. But in
this experiment, you can also think
about it as spin left and spin right
when we're measuring along the x-axis.
I suppose this experiment is not too
surprising either because we see that
the particles come in spin up. We
wouldn't really expect any kind of
probabilistic biases as far as spin
left, spin right because all we know is
that the particles are all spin up and
up is perpendicular to left and right.
So it'd be kind of weird if the second
particle beam had some kind of bias
towards left and right, right? Like
where would that come from? We should
still expect some kind of randomness
along the x direction. Okay, so that
doesn't really blow your mind, but this
next one might. See, imagine we have a three-stage
experiment where the particle beam comes
in, the first detector splits into spin
up and spin down. We send only the spin
up through. Then the second detector
measures along X. So we get our spin
left and our spin right, or in other
words, we can talk about it in
terms of spin up and spin down along X.
And then suppose we only allow the spin
up along X beam to go through. Then we
measure again along the Z-axis. And the
craziest thing happens. Look what we
get. We get a 50/50 particle beam of
spin up or spin down along Z. Well, how
can that be? Because the first magnet
already filtered out all of the spin
down along Z. So, shouldn't we expect
that the outgoing beam should have
only spin-ups, right? Isn't that what
we should expect, only spin up along Z,
because the first magnet already
filtered out the spin down? But no, in
reality, in experiments, you get a 50/50
split of spin up and spin down along Z.
So what is going on
there? That's very strange. And the
reason this is so strange is that we
know that spin is a property of the
atom. We know that it's a physical thing
that the atom carries with it as it
moves along. Right? I mean, we
thought about this earlier and we
realized, yeah, the Stern-Gerlach
experiment shows us that spin is a state
that the atom can be in, and it's a property
of the atom at some moment in time. And
so, how can it be that if we've filtered
out the spin down along Z atoms, somehow
after the third detector, we get spin
down along Z? Like, what's happening
there? How can spin be a conserved
quantity if it comes back like that?
Like, what's going on? Now, what I'm
showing here, this is just an
experimental fact. This is the reality.
And then as people, it's on us to figure
out how do we tell a story that makes
sense of this reality. And so in just a
moment, I'm going to tell you the
quantum story, which is going to explain
what's happening here. And the long
story short of that is when you measure
the spin along some axis, the particle
forgets its spin information along the
other axis because you're resetting the
spin state of the particle. You're
projecting it into a spin eigenstate of
whatever axis you most recently measured
it on. And so once you measure it spin
up or spin down along X, now all of a
sudden, if it's in a spin-up-along-X
eigenstate, that has equal 50/50 odds of
being measured spin up or spin down
along Z. But then of course when you
learn quantum physics you're always
thinking about this is so weird and so
strange and I don't like it and surely
there's some kind of more classical
explanation with some kind of hidden
variable. Surely there's some kind of
secret behavior happening inside the
atom or to do with these detectors.
Maybe the detectors are modifying the
atom in such a way as to flip them up
and flip them down and kind of reset
their state. All right. So when you
learn quantum physics, you yearn for a
more sane explanation.
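Before we try those more classical explanations, the quantum projection story just described can be sketched numerically. This is a minimal Monte Carlo sketch under the textbook projection rule; `measure` is a hypothetical helper name of my own, not anything from the paper:

```python
import math
import random

def measure(state_angle, axis_angle, rng):
    """Project a spin state (pointing along state_angle after its last
    measurement) onto a new axis. Returns (+1 or -1, new state angle).
    Measuring resets the state: the particle 'forgets' its old axis."""
    theta = axis_angle - state_angle
    if rng.random() < math.cos(theta / 2) ** 2:
        return +1, axis_angle              # projected into the spin-up eigenstate
    return -1, axis_angle + math.pi        # projected into the spin-down eigenstate

rng = random.Random(0)
n = z_up_count = 0
for _ in range(100_000):
    state = 0.0                            # spin up along Z (survived the first filter)
    r, state = measure(state, math.pi / 2, rng)  # second stage: measure along X
    if r != +1:
        continue                           # keep only spin-up-along-X atoms
    r, state = measure(state, 0.0, rng)    # third stage: measure along Z again
    n += 1
    z_up_count += (r == +1)
print(f"P(up along Z after the X filter) ~ {z_up_count / n:.3f}")  # ~0.5
```

Because the X measurement resets the state, the final Z measurement comes out roughly 50/50, which is exactly the surprising three-stage result described above.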
And especially, you know what would be
really nice is if we didn't have all
these weird quantum probabilities,
right? So wouldn't it be cool if we can
come up with some kind of explanation
for what's going on in the Stern-Gerlach
experiment, but rather than this
confusing quantum story with wave
functions and states, what if we can
come up with some kind of more classical
deterministic model of what's going on
here? Even though such models don't
work, it's still very helpful to give it
a try, see what we can come up with, and
then when we figure out the way in which
the model doesn't work, that'll help us
appreciate why we need quantum
mechanics, even though it's super weird.
And seeing the failure of these local
hidden variable models is going to segue
very nicely into the core argument of
Bell's paper. All right. So, I want to
return to this picture of the two-stage
Stern-Gerlach experiment where we use
the first magnet just to filter out the
spin down atoms and give us a beam of
nice pure spin up atoms. Then, we're
going to send those through a second
detector tilted relative to the first by
an angle of theta. And as we talked
about earlier, the probability of the
atom being spin up in the second
detector is going to be cos²(θ/2).
In this plot, we put the
theta angle along the x-axis and we put
the percentage probability that it'll be
spin up on the y-axis.
So on the far left of this plot, you can
see that we have a 100% chance of
measuring spin up when the second
detector is tilted 0° relative to the
first. That is when they're in
alignment. A spin up coming in is always
a spin up going out. On the opposite
extreme, if you imagine we put the
second detector all the way upside down,
180 degrees tilted, then relative to
that orientation, the detector is going
to say, "Hey, every particle spin down."
And now that's not too surprising
because all that is is we're flipping
the second detector around. So what was
defined as spin up is now relative to
the second detector spin down. And so
really, we don't have to think about an
angle of all the way up to 180° because
the interesting stuff happens with a
tilt angle between 0 and 90°. And beyond
that point, there's a kind of symmetry
where it's the same thing, but it's just
everything's flipped relative to before.
And speaking of 90°, if we tilted the
second detector 90°, then we'd have a
50/50 chance of an incoming spin up atom
going out as either spin up or spin down.
Here's an animation, and this will give
us a more dynamic picture of what's
going on here. So, we have our incoming
beam of silver atoms coming in from the
left. They go through the first
detector. We split out, spin up, spin
down. The spin- ups keep going. And on
the right, what I'm showing here, and
this is just a rectangle, so it's kind
of abstract, but all I mean to indicate
there is we're doing a spin measurement
along the axis symbolized by the
orientation of that rectangle.
And as the rectangle goes back and
forth, you can kind of get a feel for
how the relative probability of
measuring spin up and spin down along
that second measurement axis changes as
a function of the angle.
On one extreme, when the detectors are
aligned, spin up is always spin up. On
the other hand, when the detector is
90°, we get a 50/50 split. And in
between, we get a probability which goes
with this cos²(θ/2) curve.
Now, this equation, the cos²(θ/2),
comes from the spinor math of
what happens when you project a spin
state relative to one axis onto another
axis. But all of that spinor math and
projection, that's the weird
quantum stuff we don't want to have to
deal with if we don't have to. So when
we're trying to come up with a hidden
variable explanation, we want to think
in terms of some kind of quantity that
we can attach to each particle, maybe
some kind of arrow that indicates some
sort of direction. And you know, one of
the first things that comes to mind when
you think about the Stern-Gerlach
experiment is maybe each incoming atom
has some kind of vector-like directional
quantity associated with it and then
maybe the detector sort of flips that
vector up or down as the particle passes through.
Now, I'm not saying that's the case. I'm
just saying that's kind of something
that we might instinctively or
intuitively think might be the case. And
so let's go ahead and test our intuition
against logic and reason and see if it
actually holds up. So what I'm showing
here is an animation where we have these
atoms coming in and there's a yellow
vector associated with each one of them
which encodes some sort of orientational
direction like thing that goes with the
atom. And so for the sake of argument,
we can say our incoming beam should have
a random distribution over those vector
angles because these are evaporated
silver atoms and it's all thermally
random. Then suppose we claim that what
a Stern-Gerlach magnet does is it's going
to flip that arrow either up or down.
And then if it flips it up, it sends it
upwards. If it flips it down, it sends
it downwards.
Well, at first glance, an explanation
like this seems like it could possibly
be kind of what's going on here. This is
a model where the Stern-Gerlach magnet plays
a really active role in aligning the
particle a certain way. And whether or
not it flips up or flips down, we can
say the rule there is just if the vector
is pointing even a little bit up, it
goes up. If it's pointing even a little
bit down, it goes down. If it's pointing
perfectly horizontal, well, in reality,
nothing's perfectly horizontal. There's
probability zero of that happening. And
even if it did happen, it happens so
rarely you'd never even notice.
You know, the cool thing about physics
is that you can put an idea forward and
you can really propose it like, hey,
maybe this is how it is. But one of the
rules of physics is you have to stick to
whatever principles you propose. But
then if you can show that your own
principle leads to a contradiction, well
then sorry, but you have to redesign
your model. Okay. So what I want to show
now is that this assumption that the
Stern-Gerlach magnet flips up or flips
down the atom is actually not consistent
with the experimental data. And the
reason is actually very simple and you
can totally see it, which is this: if you
have a two-stage Stern-Gerlach
experiment where the second detector is
tilted, we know from the experimental
data that some of the particles should
sometimes come out spin down even if
they went in as spin up.
But if we tilt the detector anywhere
between 0° and all the way up to 89.9°,
then by this rule that the Stern-Gerlach
magnet is going to flip the particle in
whichever way it was already kind of
pointing, we're led to see
that an incoming beam of spin up is
always going to come out spin up.
And so right there you see that this
model doesn't actually work by our own
principle that we put forward about
these arrows getting flipped up or
flipped down and all that, it doesn't
work. It just doesn't match the
two-stage Stern-Gerlach experiment.
And so whatever is going on with spin,
it's not that. It's something else.
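That failure is easy to make concrete. Here's a sketch of the flip model in Python (2D, with angles measured from "up"; the function name is my own invention):

```python
import math

def flip_model_outcome(vector_angle, detector_angle):
    """Hidden-variable 'flip' model: the magnet flips the vector fully up
    or fully down along its own axis, depending on which hemisphere the
    vector already points into. Returns +1 for up, -1 for down."""
    # signed angle between the vector and the detector axis, wrapped to (-pi, pi]
    delta = (vector_angle - detector_angle + math.pi) % (2 * math.pi) - math.pi
    return +1 if abs(delta) < math.pi / 2 else -1

# After the first magnet, every surviving atom's vector points exactly up (angle 0).
for deg in (10, 45, 80):
    outcome = flip_model_outcome(0.0, math.radians(deg))
    print(f"tilt {deg:2d} deg -> spin {'up' if outcome == +1 else 'down'}")
# The model predicts spin up for every tilt below 90 degrees, but the
# experiment sees spin down with probability sin^2(theta/2) > 0.
```

No randomness ever enters: a spin-up beam stays 100% spin up at any tilt under 90°, which directly contradicts the two-stage data.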
So what do we do? Well, just because our
model didn't work doesn't mean we can't
massage it into something that might work.
So let's go ahead and see if we can
massage our model into something which
matches the experimental data at least
better than our first attempt which kind
of matched the data in the case of one
Stern-Gerlach magnet but failed miserably
when we had two and the second one was
tilted. Well, okay. So what if we did
this? Let's say that a Stern-Gerlach
magnet doesn't actually flip the
particle up or down, right? Because if
it does that, then as we've seen, the
second detector is just going to give us
a bunch of spin ups and no spin downs.
So let's say instead of flipping the
arrow up or down, the Stern-Gerlach
magnet just kind of passively sorts
these particles based on whether their
vector points a little bit up or a
little bit down.
And so any vector that points even a
little bit up, that gets sent towards
the up beam. And any vector that points
a little bit down, that atom goes in the
down beam. But the Stern-Gerlach magnet
doesn't change the direction of that vector.
So maybe this vector represents a kind
of classical spin axis. Then in this
model, the angular momentum of the
particle would be conserved as it passes
through the detector. But somehow and
for some reason, the detector is just
sorting the incoming particles into two
beams depending on whether they're a
little bit up or a little bit down.
Well, you know, there's a problem with
this model, which is that
philosophically, it's starting to feel a
bit contrived because it's hard to
reconcile the fact that we see two
discrete beams with such a passive thing
going on at the detector.
Because at least before when we thought
that maybe the magnet just flips the
thing up or flips the thing down, there
you have kind of a naturally, physically
dichotomous situation where, yeah, it's a
sorter, but then it's also an action
where the particles are really separated
out in a binary way.
So if you have a more passive situation
where it's just a sorter, you kind of
have to wonder, well then how is it that
we get two sharp beams? But never mind
all that because even though it seems
implausible, that's different than it
being illogical or impossible or
incoherent. You know, nature is weird.
So maybe this is how it is. But now if
we take this model and pass it through a
second Stern-Gerlach magnet, the question
comes up of does this model match the
data? In particular, do we find a
cos²(θ/2) probability of an incoming
spin up remaining spin up versus a
sin²(θ/2) probability of it going
spin down? Well, if you just look at the
animation shown here, you can see that
at first glance it kind of does seem to
work because when the second detector is
not tilted at all, anything coming in
spin up is going to go out spin up. So
that's good. At θ = 0, this
model matches experiment.
And then if you imagine at 90°, well,
there it's a 50/50 because coming in the
spin up beam, that's just going to be a
vector that's pointing up a little bit,
but the distribution is totally random
as far as left and right. And so when
the detector is tilted at 90°, that
could go either way at that point, you
know. And so there again, we find
another angle at which our model matches
the data. And another wonderful thing
about this model is that for
intermediate angles, it kind of seems
like it would fit the data. You know, if
you tilt the detector like 45°, you can
see there's kind of a chance that it
would be spin down versus spin up. And
so at first, this feels very exciting
and very promising.
But when you think through it carefully, you realize that this model actually doesn't quite match the cos²(θ/2) statistics that we get from the experiment and from quantum physics, because instead of a cosine-squared function, it's actually just a linear function in theta. And that's
actually a very important point. So I
want to linger on that for a moment and
I want to see exactly why this model
gives us a probability which is linear
in theta. So you think about the fact
that we have evaporated silver atoms
coming in and presumably they're all
going to be randomly oriented. And so if
we want to come up with a picture that
involves this hidden variable of an
orientational vector-like degree of
freedom, call it lambda, then the
situation we're describing here begins
with lambda vectors chosen totally at
random as far as their direction is
concerned. And if you like, you can imagine lambda as being selected uniformly from the unit circle, or, if you want to be fully three-dimensional, from the unit sphere. Although, as we're
about to see, it actually really doesn't
matter whether we think about it in
terms of a two-dimensional situation or
a three-dimensional situation. In either
case, we find the same linear trend. All
right, then. So, the particle passes through the first Stern-Gerlach magnet, and all of these vectors lambda that were pointing a little bit downwards get filtered out. They go into the spin-down beam, and we block that. But if the vector is pointing even a little bit up, then it keeps passing through, and it moves on to the next Stern-Gerlach detector.
So let's go ahead and use the vector P
to symbolize the polarization vector
that is the axis of measurement for the
first Stern-Gerlach magnet. You see here, based on the diagram, that all of the particles that have made it through our filter are going to be measured spin up if they're measured again perfectly along the direction P, with no tilt angle.
And so that's what it means experimentally to prepare some spin-1/2 particles, some fermions, with the spin polarization along the vector P. It means that for sure we know, if we measure the spin along P, we're going to get spin up.
Now then what can we say about that
hidden variable vector lambda? Well, we
can say that the particles that are
allowed through necessarily have lambda
which is somewhere in the northern hemisphere, that is, the hemisphere that points in the same kind of direction as the polarization vector P. Or in other words, these are the lambda such that lambda · P is greater than zero. And the lambda are still going to be uniformly distributed around that hemisphere, because they came in uniformly distributed around the sphere and we've just cut it in half. So now we want to
ask the question of what is the
probability of a particle with some
lambda vector being measured spin up in
the second detector which would happen
in our local hidden variable model if lambda · A is greater than zero, that is, if the lambda vector happens to be pointing in the same hemisphere as the measurement axis A. And when you think
about it, you realize that the
probability of lambda measuring spin up
depends on the overlap of the lambda
hemisphere and the a hemisphere.
See, because if we draw A, and then we think about the hemisphere of vectors that point in kind of the same direction as A, that is, those whose dot product with A is positive, you realize that the set of all lambdas which are going to be measured spin up is precisely the overlap between the lambda hemisphere and the A hemisphere. And given that
lambda is going to have a uniform
probability distribution, we can see
then that the probability of measuring
spin up is just going to be the fraction
of the lambda hemisphere that overlaps
with A. And the probability of it
measuring spin down is going to be the
fraction of lambda's hemisphere that
does not overlap with A. And if you see
that, then you see one of the core
concepts of Bell's paper. We're going to
describe this slightly differently in a
moment when we get into the paper and
it's going to be a little bit more
complicated, but this right here is a
very fundamental insight. Imagining
rotating hemispheres and seeing how the
overlap varies linearly. That is a
mental image that you want to keep in
mind as we get into parts three and four
of the paper. All right, then. So, just
to be really formal about this, let's go
ahead and say that theta is the tilt
angle between our polarization vector P
and our measurement axis vector A. And
then I want you to go ahead and imagine
rotating theta from 0 to pi or 180 if
you want to talk in terms of degrees.
Well, when you start off with theta
equals 0, p and a are aligned the same
way. And there's a complete overlap
between the lambda hemisphere and the a
hemisphere. And so you have a 100%
chance, guaranteed chance that when
theta is zero, you're going to measure
the particle spin up. But now imagine
theta growing and growing until theta equals 90°, or π/2 radians. Well,
at that point you're going to have a
50/50 overlap between the lambda
hemisphere and the a hemisphere. And so
then you're going to have a 50/50 chance
of measuring spin up versus spin down.
And then if you go ahead and flip it all
the way around 180° A and P are
perfectly antiparallel, then it'll be
guaranteed that you'll measure spin down
for a theta of 180°. Bearing in mind
that spin down is relative to that
upside down vector a. Now these three
points for which theta is 0, theta is 90
and theta is 180° all of those actually
do match the experimental data and
quantum mechanics. So that's all good.
But what's not all good is that linear
dependence on the probability of
measuring spin up as a function of the
angle theta. And you can see that linear
dependence just based on the way the
area fraction changes as you slide theta
around and you change the overlap
between these two hemispheres.
You know, one way to think about the
probability logic here is just imagine
you're playing one of those board games
that has the spinner thing and you spin
the thing and then the probability that
it lands on some wedge is just going to
be the wedge area. Well, yeah. So when
you think about that kind of logic and
then you think about the wedge area of
the overlap between the hemispheres and
the way it changes you can see that the
probability is indeed linear in theta.
But now that linearity is actually a
real problem because from experiments
and from quantum mechanics we can very
confidently say that the probability of
measuring the particle spin up is not
linear in the tilt angle theta, but rather it's cos²(θ/2). And that cosine-squared, curvy fact makes our linear model very hard to believe, because the math is wrong: the statistical predictions of our model are not the true statistics of the situation.
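To see that linear trend concretely, here's a small Monte Carlo sketch of the hidden-variable model described above (my own illustration, not code from Bell's paper): lambda vectors are drawn uniformly on the unit sphere, filtered by lambda · P > 0 at the first magnet, and counted as spin up at the second magnet whenever lambda · a > 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample hidden-variable vectors lambda uniformly on the unit sphere
# (normalized Gaussian triples are uniform in direction).
lam = rng.normal(size=(500_000, 3))
lam /= np.linalg.norm(lam, axis=1, keepdims=True)

p = np.array([0.0, 0.0, 1.0])   # polarization axis of the first magnet
lam = lam[lam @ p > 0]          # keep only the spin-up (northern) hemisphere

def prob_up(theta):
    """Fraction measured spin up along an axis a tilted by theta from p."""
    a = np.array([np.sin(theta), 0.0, np.cos(theta)])
    return np.mean(lam @ a > 0)

for theta in [0.0, np.pi/4, np.pi/2, 3*np.pi/4, np.pi]:
    linear = 1 - theta/np.pi    # hemisphere-overlap prediction: linear in theta
    qm = np.cos(theta/2)**2     # quantum / experimental statistics
    print(f"theta={theta:.2f}  model={prob_up(theta):.3f}  "
          f"linear={linear:.3f}  cos^2(theta/2)={qm:.3f}")
```

The model matches at 0, 90°, and 180°, but at θ = π/4 it gives about 0.75 while quantum mechanics predicts cos²(π/8) ≈ 0.854, which is exactly the discrepancy being discussed.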
So what do we do? Do we just give up? Well, we actually should give up, because, as we'll see, the whole paper is about how local hidden variable models don't work. But let's not give up yet. Let's be very stubborn, okay?
Because technically there is a way that
we can fix this particular model for
this particular situation.
And the way in which we do that is going
to involve a concept which we'll see
later on in the paper. So we're going to
try to save this model somehow. And the
way that we're going to try to do that
is going to be illustrative and teach us
something about the situation. Even
though ultimately this fix is going to
break down when we later on start
looking at quantum entanglement.
All right, then. So the way to fix the model is to define an effective measurement axis, call it A′, and define that as the measurement axis A tilted towards the polarization vector P such that the equation 1 − 2θ′/π = cos θ is satisfied. Now, here by θ′ I mean the tilt angle between the polarization vector P and the effective measurement axis A′, which has been magically tilted in towards the polarization vector P. And
when you look at this equation here, the 1 − 2θ′/π, that is a linear expression; and then you look on the right-hand side, and that's a cosine. Now, with this equation, it's not immediately obvious what it has to do with cos²(θ/2). In just a minute,
though, we're going to talk about
expectation values and cosine of theta.
And then when we come back to this
equation later on in the paper, it'll
make more sense why exactly it has the
form that it does. But I don't want to
get into that just now because it's a
bit of a tangent. For now, all I want to
say is that this equation involving θ′ and θ is going to warp the linear-in-theta probability dependence of our model into the cos²(θ/2) curve that we expect from quantum mechanics. And in fact, that is the definition of where this θ′-and-θ equation comes from. So this trick
is actually a lot simpler than it seems
because when you think about what we
have here, as we've seen, our model
works when theta is 0, when theta is
90°, when theta is 180, but it breaks
down in between because we have a line
instead of a cosine squared curve. And so all this trick is, is just saying that we can
go ahead and warp that line into that
cosine squared curve simply by saying
that the effective measurement axis that
the particle is actually being measured
along is not the A that we thought it
was but is actually this A tilted
slightly towards the polarization vector
P. And by doing that we can go ahead and
bend the statistical predictions of our
model in such a way as to make it match
the experimental data and also quantum mechanics.
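Here's a quick numerical check of that warp (a sketch of the trick just described, with θ′ solved from the defining equation 1 − 2θ′/π = cos θ): feeding the effective angle into the linear overlap probability reproduces cos²(θ/2) exactly.

```python
import numpy as np

def theta_prime(theta):
    """Effective tilt angle defined by 1 - 2*theta_prime/pi = cos(theta)."""
    return (np.pi / 2) * (1 - np.cos(theta))

def linear_prob_up(t):
    """Spin-up probability of the hemisphere-overlap model: linear in the tilt."""
    return 1 - t / np.pi

# Feeding the warped angle into the linear model reproduces cos^2(theta/2)
for theta in np.linspace(0, np.pi, 5):
    warped = linear_prob_up(theta_prime(theta))
    qm = np.cos(theta / 2) ** 2
    print(f"theta={theta:.2f}  warped model={warped:.4f}  cos^2(theta/2)={qm:.4f}")
```

Algebraically, 1 − θ′/π = (1 + cos θ)/2 = cos²(θ/2), which is why this definition of θ′ does the job.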
Now, the first time you hear this, I
mean, you should be thinking, "Rich,
come on now. What? This is absurd. We
should not tolerate this. We should not
go along with this." Your eyebrows should rise skeptically to the point where your forehead starts to get sore. Like,
there's just no credible way to justify
this move, this little trick that we're
doing. And so, for that reason, I want
to go ahead and call this the sketchy
move. I know it's kind of a playful
terminology, but there's a couple of
good reasons why we want to call it
this. First of all, it's a concept that
we're going to see a couple more times
throughout the paper. And then secondly,
I want to emphasize that this move is
not illegal. It's not logically
impossible. Technically, it doesn't
violate locality. There's nothing uh
physically impossible going on when we
put forward this model. But it's
extremely sketchy and hard to believe
because it raises so many questions. Why
should the effective measurement axis be
a prime? And also, how is it then that
we have the polarization vector and also
our hidden variable lambda vector that
we both have to take into account?
Because the polarization vector bends
the effective measurement axis. Then we
also have this lambda vector and what's
going on there? And our whole model
starts to become complicated and
contrived and very very hard to believe.
But we're not going to dismiss it just yet, because later, when we think about
quantum entanglement, we're going to
prove that even the sketchy move is no
longer enough to save our model or any
local hidden variable model. And that's
really at the heart of Bell's theorem.
So in summary, by going along with the
sketchy move for now, we're being
maximally open-minded, we're giving the
local hidden variable perspective every
benefit of the doubt. So that later on
when we absolutely destroy local hidden
variables, when we crush this idea,
we'll say, "Look, we even allowed the
sketchy move and that still wasn't
enough to make it work."
Now, I want to take just a moment to
talk about the kind of mathematical
vocabulary we use in quantum physics
when we're describing measuring the spin
of a spin 1/2 particle along some
direction, call it a. And to do that, you often see the expression σ·a. Let me tell you what that is. So we have the famous Pauli matrices: σx = [[0, 1], [1, 0]], σy = [[0, −i], [i, 0]], and σz = [[1, 0], [0, −1]].
And you can find the definition of these Pauli matrices in Griffiths' Introduction to Elementary Particles, equation 4.26. Although, honestly, if you just Google "Pauli matrices," you'll find them all over the place. They're super famous. And these Pauli matrices are generators of su(2), the Lie algebra of SU(2), which is the group that has to do with transformations of two-component spinors. It's the special unitary group of degree 2. Anyway, today we don't need
to get into the group theory of SU(2), but I just bring up the Pauli matrices in a sort of vocabulary-like context. We're not actually going to have to explore their mathematical properties; I just want to show you why it is that these matrices are associated with measuring the spin of a spin-1/2 particle.
You often see sigma with an arrow over it, and you can think of that as a vector whose components are the three Pauli matrices. So you have σx, σy, σz all packaged into this vector-like quantity. And with that sigma vector, we can go ahead and define the spin operator along the unit vector a as Ŝ = (ħ/2) σ·a.
And what we mean by σ·a is that we multiply each component of our measurement direction a with the corresponding Pauli matrix: σ·a = a_x σx + a_y σy + a_z σz. So when you pick out a particular direction in three-dimensional space and you want to measure the spin of a particle along that direction, the components of that direction unit vector are like weights for how much of each of the Pauli matrices we're going to bake into our spin operator along that direction.
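As a concrete sketch of this construction (using NumPy for the matrix algebra; the direction a below is just an arbitrary example), here is σ·a built exactly as that weighted sum, with a check that the resulting spin operator has eigenvalues ±ħ/2 no matter which unit vector we pick:

```python
import numpy as np

# The Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def sigma_dot(a):
    """sigma . a: each component of the unit vector a weights one Pauli matrix."""
    return a[0] * sx + a[1] * sy + a[2] * sz

# Spin operator along an arbitrary direction, in units where hbar = 1
a = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
S = 0.5 * sigma_dot(a)          # S = (hbar/2) sigma.a

# Hermitian 2x2 matrix -> two real eigenvalues, always +/- hbar/2
print(np.linalg.eigvalsh(S))
```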
Now, why do we care about a spin operator?
Well, as we talked about in the EPR paper, when you have an observable quantity like spin, the value of the quantity is going to be the eigenvalue corresponding to the eigenstate of the operator. So if we have a spin-1/2 particle and its state is represented by the two-component spinor s, then the spin operator acts on s as Ŝ s = (ħ/2)(σ·a) s.
And bear in mind, σ·a is going to be a 2×2 matrix. In fact, if you want to think about it in terms of the Lie algebra su(2), that matrix is going to live at the coordinates (a_x, a_y, a_z) within the Lie algebra, which is spanned by the Pauli matrices σx, σy, σz. If that makes sense, great. If it doesn't, don't worry about it. That's a level of group theory that we don't have to get into today.
Instead, I want to give you a specific example of what it means for a particle to be in an eigenstate of the spin operator.
So if a particle has definite spin, that is, we've measured the spin and it's either spin up or spin down along some axis, then it is going to be in an eigenstate of the spin operator along that axis. That's what the measurement does: you measure the spin of a particle, and you're projecting its wave function onto an eigenstate of the spin operator along that axis. And so therefore, s is going to be a solution to the equation Ŝ s = λ s for some real value λ, which is going to be the spin of the particle.
As a concrete example, let's suppose we're measuring the spin of a particle along the z-axis. Well, in that case, our direction vector becomes (0, 0, 1), because the vector doesn't point in x, it doesn't point in y, it points entirely in z. And so, if we evaluate the quantity σ·a, we find that we have no σx, no σy, and all σz. And so our spin operator along the z direction becomes Ŝ_z = (ħ/2) [[1, 0], [0, −1]].
And so now, if we want to solve for the eigenstates of spin up and spin down along z, all we have to do is solve the equation (ħ/2) σz s = λ s for some real eigenvalue λ. And this eigenvector-eigenvalue equation has the solutions s = (1, 0) or s = (0, 1), with eigenvalues of +ħ/2 and −ħ/2, respectively. And you can verify that for yourself if you plug these different options for s and λ into that eigenvector-eigenvalue equation.
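You can do that plugging-in numerically too; this sketch (illustrative NumPy, in units where ħ = 1) verifies that (1, 0) and (0, 1) solve the eigenvector-eigenvalue equation with λ = ±ħ/2:

```python
import numpy as np

# S_z = (hbar/2) sigma_z, in units where hbar = 1
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

# The eigenvector-eigenvalue equation S s = lambda s for both spinors
assert np.allclose(Sz @ up, +0.5 * up)      # spin up:   lambda = +hbar/2
assert np.allclose(Sz @ down, -0.5 * down)  # spin down: lambda = -hbar/2

# Multiplying by a global complex phase keeps an eigenstate an eigenstate
phase = np.exp(0.7j)
assert np.allclose(Sz @ (phase * up), +0.5 * (phase * up))
print("eigenvalue equation verified")
```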
Oh, and one other thing I'll say is that for these eigenvectors, you can go ahead and slap a complex phase factor onto both components, and they remain eigenstates. In a moment, I'll show you a picture which makes that point obvious. But for now, I'll just leave that as an algebraic statement. All right.
Now, instead of the spin operator Ŝ, we may as well just talk in terms of σ·a, which is conceptually exactly the same thing as Ŝ. The only difference is that it's not scaled by that factor of ħ/2. And so this sigma operator has nice dimensionless eigenvalues of plus or minus one for spin up versus spin down. Therefore, the sentence "the particle was measured spin up along the a axis" can be said as "measuring σ·a yielded a value of +1." Or, if you want to say the particle was measured spin down along the a axis, you can say σ·a yielded a value of −1. Or, if you want to say the particle was measured spin up along the b axis, you say σ·b yielded a value of +1. Right? So what we have here is a very concise mathematical way of saying that a spin-1/2 particle was measured along some axis, and the result of that measurement is simply the eigenvalue +1 or −1.
So in Bell's paper, he's going to use this a lot, and that's why I wanted to show you where σ·a comes from and what it means. And we don't really have to get too deep today into the theory of SU(2) and spinors and Pauli matrices and all that. So if you're not super familiar with all of these algebraic details, that's actually totally fine. For the purpose of understanding Bell's paper, you really just have to know, from a vocabulary point of view, that σ·a means measuring the particle's spin along the a axis, and that the results are going to be +1 or −1, depending on whether it turns out to be spin up or spin down, respectively.
Before we move on, I do want to give you
just a couple more examples of this
concept just to make the idea a little
bit more intuitive, a little bit more
familiar. So suppose we had measured, instead of along z, along the x direction. Well, then we find that the spin operator along x is going to be (ħ/2) σx. And when you think about the solutions to (ħ/2) σx s = λ s, you find the eigenstates (1/√2)(1, ±1), corresponding to eigenvalues of ±ħ/2. That is to say, we find the same exact kind of situation as before, when we measured along z, as far as the eigenvalues: you have two options, spin up or spin down, and the magnitude of the observable is ħ/2. But now you have a spinor that's in a different state; it's pointing in a different direction. And by the way, the 1/√2 is just a normalization constant. And likewise, we
can repeat exactly the same procedure. We can measure along y. We find that the spin operator along the y direction is (ħ/2) σy. You solve that eigenvector-eigenvalue equation, and you find the eigenstates (1/√2)(1, ±i), with the same old eigenvalues of ±ħ/2.
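And the same check works for the x and y eigenstates just quoted (an illustrative sketch; working with the bare Pauli matrices, whose eigenvalues ±1 correspond to ±ħ/2 for the spin operator):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

# The eigenstates (1/sqrt 2)(1, +/-1) for x and (1/sqrt 2)(1, +/-i) for y
x_up = np.array([1, 1]) / np.sqrt(2)
x_dn = np.array([1, -1]) / np.sqrt(2)
y_up = np.array([1, 1j]) / np.sqrt(2)
y_dn = np.array([1, -1j]) / np.sqrt(2)

# Each satisfies sigma s = (+/-1) s
for op, state, val in [(sx, x_up, 1), (sx, x_dn, -1),
                       (sy, y_up, 1), (sy, y_dn, -1)]:
    assert np.allclose(op @ state, val * state)
print("sigma_x and sigma_y eigenstates verified")
```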
And I know all of this feels very abstract, but there is a visual story that goes with this algebra. I've touched on it in my previous videos about the mystery of spinors, electromagnetism as a gauge theory, and also deriving the Dirac equation, where there's a way of drawing a two-component spinor as a flag in three dimensions. So, for example, let's take the eigenstate for a particle that's in a spin-up state relative to the z-axis, that is, the spinor (1, 0). Well, if we plot that
using this flag picture diagram and
we'll go ahead and slap on a time
evolution phase factor corresponding to
the energy of the particle, we see that
we have a flag that points straight up
along Z. And then the time evolution
phase factor, that is the rotation in
the complex plane, is going to twirl
that flag around.
If you're curious as to the algebraic machinery that's happening behind the scenes, definitely check out the paper "An Introduction to Spinors" by Andrew Steane. That paper explains in depth how exactly the two-component spinors map onto these flag diagrams. But now, if we plot the spin-down-along-z spinor, (0, 1), you see, hey, it's a flag that's pointing down along z. So that
makes sense. And now notice that the time-evolution phase factor, which rotates the flag in the complex plane, has the effect of twirling the flag, but in the opposite way as before. Although really it's the same way; it's just that the flag is pointing in the opposite direction. The
way to see this is point your right
thumb along the direction that the flag
pole is pointing and then you find that
the phase factor is going to twirl the
flag in the same way that your fingers
go around on your right hand.
So we find in these spinors a picture of a thing, some kind of quantity, that has an orientation and that spins around under a complex-phase time evolution. And so that gives you a feel for some of the algebraic machinery that's happening behind the scenes when we talk about spinors and Pauli matrices and all of that.
And so now I want you to imagine in your mind: what would the eigenstate of spin up along the x-axis look like?
Well, there it is. Makes sense, right?
So, this is (1/√2)(1, 1), with the time-evolution phase factor. We can go ahead and also add on the spin-down-along-x eigenstate, and that's exactly as you would expect. Now, let's also add in the spin-up-along-y eigenstate, and there it is, pointing along y, spinning around. And if you add in the spin-down-along-y eigenstate, well, then there it is.
So without going into too much detail about the algebra of spinors and all that, I just wanted to show you that there is a picture corresponding to all of this algebra. And that's something that I would definitely encourage you to read more about and to explore. But for the purposes of Bell's paper, we actually don't need to get too into the details there. I hope this has been useful context.
All right. So before returning to the paper, I want to say a couple of words about the concept of the expectation value of these spin measurements, because we're going to see that concept later on
in the paper. So remember earlier we
were looking at the slide shown here and
we thought about how if we rotate the
second magnet by an angle theta for a
particle beam, which we know is going to
be spin up if we measure it vertically,
then the beam is going to split into two
beams. And for a small angle theta, it's
going to be mostly spin up. But there's
some probability of that also being spin
down. And then, as we talked about before, the probability of spin up is going to be cos²(θ/2), where theta is the tilt angle. And likewise, the probability of it being spin down is going to be 1 minus that, so sin²(θ/2). And
that's all fine and good and that's
totally true and that's one way to talk
about it. But there's another way we can
talk about it in terms of expectation
value, which is in some ways more convenient.
So, to be really technical about this, suppose we go ahead and call the second magnet's axis the vector a, and then, as we talked about, we can use the notation σ·a as shorthand for the result of measuring the spin along the axis a. Because, as you know, when you dot the sigma vector comprised of the Pauli matrices with some unit vector a, you end up with something that's directly proportional to the spin operator, but which has eigenvalues of +1 if the particle is measured spin up and −1 if the particle is measured spin down. So now we ask the question: what is the expectation value of σ·a? And all we mean by expectation value is the average over many measurements, holding the a vector constant. Let me give you an analogy.
Let's say you're a gambler, and somehow you have the opportunity to play a game where you have a 60% chance of winning a dollar and a 40% chance of losing a dollar. Well, in that case, the expectation value is going to be 20 cents, because you have 0.6 × 1, which is 0.6, and then you add on to that 0.4 × (−1), which is −0.4, and so you have a net expectation value of 0.2, a 20-cent profit. And so you should play that game. Now, the reason I bring up this analogy is because, of course, if you play the game once, you're not going to get 20 cents. You're either going to make a dollar or you're going to lose a dollar. So we should not expect one game to yield 20 cents.
However, if you play that game a hundred times, you're going to have about 20 bucks; that's what you should expect to have. And so that's exactly the sense in
which we use the term expectation value
when thinking about these spin
measurements. In every case, when you
measure the spin, it's going to be a
plus one or a minus one. But depending
on the tilt angle and depending on the
probability that depends on the tilt
angle, there's going to be some average
number that we'll find for that tilt
angle over many subsequent measurements
along that axis. And if you work out the math, as we'll do in just a moment, you end up with the plot shown here, where on the x-axis we have the tilt angle theta, and then there's this curve for the expectation value; by the way, we use the bracket notation ⟨σ·a⟩ to indicate the expectation value. Well, as a sanity check, let's go ahead and look at a few points and see if this curve kind of makes sense.
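The gambler's logic gives a quick way to compute this curve numerically (a sketch assuming the cos²(θ/2) and sin²(θ/2) probabilities quoted earlier): weight +1 by the spin-up probability and −1 by the spin-down probability.

```python
import numpy as np

def expectation(theta):
    """Gambler's average: P(up)*(+1) + P(down)*(-1), quantum probabilities."""
    p_up = np.cos(theta / 2) ** 2
    p_down = np.sin(theta / 2) ** 2
    return p_up * (+1) + p_down * (-1)

for theta in [0.0, np.pi / 2, np.pi]:
    print(f"theta={theta:.2f}  <sigma.a>={expectation(theta):+.3f}  "
          f"cos(theta)={np.cos(theta):+.3f}")
```

By the identity cos²(θ/2) − sin²(θ/2) = cos θ, the whole curve is simply cos θ, passing through +1, 0, and −1 at the three sanity-check angles.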
So, first of all, when theta is zero, and a is aligned with the polarization of those incoming spin-up atoms, then we find an expectation value of one. And that makes sense, because when the second detector is not tilted, then every single time, a spin up coming in is going to be a spin up going out, and so σ·a is
going to yield an igen value of plus one all the time. So you do it 100 times
all the time. So you do it 100 times you're going to get 100 plus ones. And
you're going to get 100 plus ones. And then conversely, if we flip a all the
then conversely, if we flip a all the way upside down, then you have a spin up
way upside down, then you have a spin up coming in relative to the upside down
coming in relative to the upside down second detector. That's always going to
second detector. That's always going to come out as a spin down. And so in that
come out as a spin down. And so in that extreme case, you always have a negative
extreme case, you always have a negative 1 for sigma. A, therefore, the
1 for sigma. A, therefore, the expectation value is precisely -1. Now,
expectation value is precisely -1. Now, if you check out this point in the
if you check out this point in the middle of the plot when theta is 90° and
middle of the plot when theta is 90° and the measurement axis A is perfectly
the measurement axis A is perfectly perpendicular to the incoming spin up
perpendicular to the incoming spin up polarization, well, in that case, sigma.
polarization, well, in that case, sigma. A is going to be a +1 or a minus1, you
A is going to be a +1 or a minus1, you know, each with a 50% probability. And
know, each with a 50% probability. And so if you have a set of 100 numbers
so if you have a set of 100 numbers which are either +1 or minus1 with equal
which are either +1 or minus1 with equal probability, well, you add those all up
probability, well, you add those all up and on average you're going to get zero.
and on average you're going to get zero. All right, then. So based on the three
All right, then. So based on the three points we've looked at, the curve seems
points we've looked at, the curve seems to make sense. But how do we calculate
to make sense. But how do we calculate the exact form of this curve? Well, all
the exact form of this curve? Well, all you have to do is think like a gambler
you have to do is think like a gambler and say the expectation value is going
and say the expectation value is going to be the probability of measuring spin
to be the probability of measuring spin up along the axis A times a plus one
up along the axis A times a plus one corresponding to spin up plus the
corresponding to spin up plus the probability of measuring spin down along
probability of measuring spin down along the axis A time the negative 1 that
the axis A time the negative 1 that corresponds to spin down. This is just
corresponds to spin down. This is just like in that game where you have 60%
like in that game where you have 60% chance of winning a dollar, 40% chance
chance of winning a dollar, 40% chance of losing a dollar. So the expectation
of losing a dollar. So the expectation value is $0.2. So it's the same
value is $0.2. So it's the same reasoning as a gambling calculation. And
reasoning as a gambling calculation. And as we saw earlier, we already know the
as we saw earlier, we already know the probability of measuring spin up versus
probability of measuring spin up versus spin down. In the first case, we have a
spin down. In the first case, we have a cosine^ 2 / 2 probability of measuring
cosine^ 2 / 2 probability of measuring spin up. And then we have a sin^ 2 thet
spin up. And then we have a sin^ 2 thet / 2 probability of measuring spin down.
/ 2 probability of measuring spin down. Now, if you are a trig identity
Now, if you are a trig identity enthusiast, you'll recognize this form
enthusiast, you'll recognize this form as having a delightful simplification,
as having a delightful simplification, which is that cosine^ 2 / 2us theta / 2
which is that cosine^ 2 / 2us theta / 2 equals cosine of theta. Isn't that
equals cosine of theta. Isn't that wonderful how that simplifies? So that's
wonderful how that simplifies? So that's a super nice result. And we're going to
a super nice result. And we're going to see the same result in Belle's paper in
see the same result in Belle's paper in equation 3 in a slightly different
equation 3 in a slightly different context, but it's the same exact
context, but it's the same exact reasoning. So anyway, that's all I
reasoning. So anyway, that's all I wanted to say about the expectation
wanted to say about the expectation value. So just think about this as a
value. So just think about this as a pretty common and useful way of putting
pretty common and useful way of putting a statistical handle on this kind of
a statistical handle on this kind of probabilistic situation.
probabilistic situation. All right, then. So now I think we've
All right, then. So now I think we've discussed all of the prerequisites that
discussed all of the prerequisites that we need for the remainder of the paper.
we need for the remainder of the paper. So now let's go ahead and get into part
So now let's go ahead and get into part two formulation.
So remember how in the EPR paper they gave a specific example of a two-particle wave function with anti-correlated momenta and correlated positions. And with that wave function, we saw how if we measure the momentum of one of the particles, we end up putting the other particle in a momentum state. And conversely, if we choose to measure the position of the particle, then we put the other one into a position state. So that specific wave function in the EPR paper was a very mathematically convenient example to illustrate the point. However, of course, the EPR paradox is more general than just a single specific two-particle wave function. And if you look at equations 7 and 8 of the EPR paper, you can see that, more generically, whenever you have two particles in an entangled state and you think about representing that wave function as a sum over states of the first particle, then when you measure the first particle and put it into an eigenstate, that's going to have an impact on the state of the second particle. And so really the EPR paradox is just the observation that because we have the freedom to choose which observable we measure on the first particle, we have the ability to affect the quantum state of the second particle in a way that somehow appears to violate the constraint of local causality.
So anyway, the reason I bring that up is because in Bell's paper, we're going to use a different two-particle state to get at the same fundamental paradoxical nature of quantum physics. So instead of the particles having anti-correlated momenta and correlated positions, we're going to imagine a pair of spin-1/2 particles whose spins are going to be in an entangled state. And this configuration for thinking about the EPR paradox is actually not original to Bell. It was first put forward by Bohm and Aharonov in 1957.
So part two of Bell's paper begins with the example advocated by Bohm and Aharonov. The EPR argument is the following. Consider a pair of spin-1/2 particles formed somehow in the singlet spin state. Now, I want to pause here and ask: what exactly is the singlet spin state? Well, it means that the spins of the two particles have no preferred direction a priori. If you think about either of the particles and you're going to measure its spin, there's total rotational symmetry, in that neither of the particles has a preferred spin axis. It's totally uniformly distributed over all possibilities.
However, the spins of the particles exhibit perfectly anti-correlated outcomes when measured along the same axis. And this is a very bizarre state of affairs. Intuitively, you would think that such a state is not possible. And yet the singlet state has been measured in all kinds of experiments. So this really is possible. This is something that is real. And as we'll talk about later in the paper, even though it's very hard to imagine and it seems kind of surreal, the experimental data very strongly indicates that the singlet state is actually a legit thing that can exist. Now, you sometimes hear the singlet state described as the particles having equal and opposite spins. But that's not exactly true, or rather, that's too narrow a description.
It is true that if you measure the two particles along the same axis, you'll always find that their spins are equal and opposite. But, and this is a really super important fact about the singlet spin state, so I want to re-emphasize it: before the measurement, neither of the particles has a preferred spin direction. This is very hard to imagine, but it is a super important aspect of what it is for the particles to be in the singlet state.
All right. So that's the singlet state. Now imagine that we have some process which produces pairs of spin-1/2 particles in the singlet state, and then each particle goes its separate way, both moving freely in opposite directions. Now then, suppose we send each particle into a detector, say a Stern-Gerlach magnet, and we measure the spin of both particles to get a sense of the kind of thing that happens here. At first, we're going to say that the detectors are measuring along the same axis.
Let's go ahead and denote the two measurement axes with the unit vectors a and b respectively. And for starters, those unit vectors are going to be precisely aligned, so that we're measuring both particles along the same spin axis. And now, because the particles are in the singlet spin state, if we measure the spin of particle 1 along the direction a and we get the value of +1, that is, suppose particle 1 measures spin up along a, then according to quantum mechanics and what it means for the particles to be in the singlet state, it's 100% guaranteed that measuring the spin of particle 2 along the same axis is going to yield a value of -1, that is, spin down. And vice versa: had we measured particle 1 in the spin-down state, then we would know for sure that particle 2 would be spin up along the same axis.
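To make that behavior concrete, here's a tiny simulation sketch (my own illustration, not from Bell's paper; the function name is made up). It samples outcome pairs using the standard quantum joint probabilities for the singlet state, in which P(++) = P(--) = (1 - cos θ)/4 and P(+-) = P(-+) = (1 + cos θ)/4.

```python
import math
import random

def sample_singlet(theta, rng=random):
    """Sample one pair of spin outcomes (each +1 or -1) for a singlet pair,
    with the two detector axes separated by angle theta (radians).

    Standard quantum joint probabilities for the singlet state:
      P(++) = P(--) = (1 - cos theta)/4,  P(+-) = P(-+) = (1 + cos theta)/4
    """
    p_same = (1 - math.cos(theta)) / 2  # probability the two outcomes agree
    a = rng.choice((+1, -1))            # either side on its own is a fair coin
    b = a if rng.random() < p_same else -a
    return a, b

# Same axis (theta = 0): the outcomes are always equal and opposite.
pairs = [sample_singlet(0.0) for _ in range(1000)]
assert all(a * b == -1 for a, b in pairs)
```

Notice the two faces of the singlet state in one place: each detector by itself just sees a 50/50 coin flip (no preferred direction), yet with aligned axes the pair is perfectly anti-correlated every single time.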
By the way, just a comment on the notation here. As we talked about earlier, the expression σ·a is shorthand for measuring the spin of the particle along the axis a. And this operator returns +1 if the particle is spin up along a, and -1 if it's spin down along a. Now, the subscripts 1 and 2 just indicate that in the first case we're measuring particle 1 and in the second case we're measuring particle 2. So it's not like we have two different sigma vectors. No, it's the same Pauli matrices, the same operator. It's just that in the first case, σ₁, we apply it to the first particle, and in the second case, σ₂, we apply it to the second particle.
So now we make the hypothesis of local causality: it seems one at least worth considering that, if the two measurements are made at places remote from one another, the orientation of one magnet does not influence the result obtained with the other. And just to really emphasize that point, imagine that detector A and detector B are separated so far apart, and that the measurement of particle 1 and the measurement of particle 2 happen so closely together in time, that not even light could travel between detectors A and B during whatever tiny time difference there is between the two measurements. So we imagine that the measurements going on at detector A and detector B are completely causally disconnected, if local causality is to be believed.
But here's where we run into the EPR paradox. Since we can predict in advance the result of measuring any chosen component of the spin of particle 2, by previously measuring the same component of the spin of particle 1, it follows that the result of any such measurement must actually be predetermined.
That is to say, the particles start off in the singlet state with no preferred spin direction. Then imagine particle 1 is measured in detector A ever so slightly before particle 2 is measured in detector B, you know, by 0.0001 ns or whatever. Well, as soon as we've measured particle 1 along the axis a, we can predict with certainty the component of the spin of particle 2 along the same axis. And yet that certainty does not exist in quantum physics. Now, we can tell a story about non-local wave function collapse, where you measure particle 1 along axis a and the wave function instantly collapses, so that particle 2 is no longer in the singlet state but is now for sure going to be polarized in accordance with that measurement direction a. But assuming that we don't allow for non-local wave function collapse, because we want to preserve our sanity and hold on to this concept of local causality, then we find here an apparent contradiction, because the spin of particle 2 along the axis a should definitely not be predictable with certainty given the wave function of the singlet state. Quantum physics just doesn't allow for that level of predictability, unless we allow for the possibility of instantaneous wave function collapse. So then, since the initial quantum mechanical wave function, that is, the singlet state, does not determine the result of an individual measurement, this predetermination implies the possibility of a more complete specification of the state. And so that is, apparently, the EPR paradox, this time thought of in terms of spins rather than momentum and position
states. And so, in other words, all of this thought process leads us to think that surely there must be some kind of hidden variables that go along with particles 1 and 2 in a way that quantum mechanics doesn't account for. And if only we had some kind of more complete model, where we could figure out what those hidden variables are, what their dynamics are, and how they influence the spin measurements, then surely we could find a more complete, more sane, more understandable explanation of what's going on here than what quantum mechanics currently has to offer. Well, all right then. So we want a more complete theory involving some kind of hidden variables. So let this more complete specification be effected by means of parameters lambda. These are going to be our hidden variables. In this video, whenever you see this yellow lambda, it stands for whatever hidden variables we want to put into our model to give us a more complete description of what's happening. You know, earlier we were looking at the Stern-Gerlach experiment, and we were trying to explain it in terms of particles carrying with them this yellow vector. That was an example of lambda. But now we're going to broaden that up a little bit. Or actually, we're going to broaden it up all the way and say lambda can be whatever you want it to be, whatever you can imagine: a vector, a scalar, a tensor, a function, a set, whatever you want it to be. It is a matter of indifference in the following whether lambda denotes a single variable or a set, or even a set of functions, and whether the variables are discrete or continuous. The beautiful thing about Bell's paper is that it accounts for all possible hidden variable models in one fell swoop, because it's such a generic argument, as we'll see. However, we write as if lambda were a single continuous parameter. So in the notation that we'll be using, for example, we'll integrate over all possible lambda, and it'll look like we're assuming that lambda is a continuous parameter. However, what Bell is saying here is that if you want to modify the argument so that lambda is not a continuous parameter but is rather a discrete parameter or a set or whatever contrived thing you want to come up with, you can trivially modify the argument to account for that.
Replace an integral with a sum, or whatever you have to do. Those kinds of modifications won't have any effect on the logical structure of the argument put forward in this paper. So now let's think about what's happening in these detectors. And at this moment, we can go ahead and say that the axis of measurement in detector B does not have to be the same as the axis of measurement in detector A. So we're going to make this more generic. Oh, and one thing that I'll point out is that, in everything we're about to talk about, what matters as far as the orientations of the unit vectors a and b is only the angle between those two vectors, the extent to which they're aligned or misaligned.
And when you think about two vectors in three-dimensional space, the two vectors are going to span a plane, and there's going to be some angle between them in that plane. And that angle between them, that theta angle, is the relevant quantity when we're thinking about how the orientations of these two measurement axes matter. And so, if you want, you can imagine a fully generic three-dimensional situation where a and b point whichever ways you want to imagine them pointing. But because it's only the theta angle between them that matters, in whatever plane they happen to span, we may as well imagine the a vector pointing straight up, and then we can imagine the b vector having some random orientation in the plane. And so, in the diagram shown here on your two-dimensional screen, with a pointing up and b pointing wherever, imagine rotating the b axis a full 360°. For all intents and purposes, that 360° sweep is going to span all of the possibilities as far as the ways in which we can misorient our detectors relative to each other. And actually, you only need 180°, because once you tilt past 180°, theta starts to come back in. See what I mean? And then, technically, by symmetry, all the interesting stuff happens between 0 and 90°.
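If you want to check that only the relative angle matters, here's a small sketch (my own illustration): rotating both detector axes by one and the same rotation leaves the angle between them unchanged, which is why we're free to draw a pointing straight up and keep track of theta alone.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def angle_between(u, v):
    """Angle (radians) between two unit vectors."""
    return math.acos(max(-1.0, min(1.0, dot(u, v))))

def rotate_z(v, phi):
    """Rotate a 3D vector by angle phi about the z axis."""
    x, y, z = v
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi),
            z)

a = (1.0, 0.0, 0.0)
b = (math.cos(0.7), math.sin(0.7), 0.0)  # 0.7 rad away from a

# Rotating BOTH axes by the same rotation leaves theta unchanged.
theta_before = angle_between(a, b)
theta_after = angle_between(rotate_z(a, 1.9), rotate_z(b, 1.9))
assert abs(theta_before - theta_after) < 1e-9
```

The same holds for any common rotation, not just about z; that rotational symmetry is exactly what lets us collapse the whole 3D setup down to a single misalignment angle theta.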
Okay. So then, what is actually going on in these detectors? Well, if we assume this hidden variable model, then the result A of measuring the spin of particle 1 along the a axis is determined by the a axis and the hidden variable lambda. So, particle 1 is coming in, carrying with it some kind of hidden variable, maybe some vector, some scalar, some tensor, whatever hidden variable we want to imagine. And as particle 1 goes into detector A, and detector A is oriented along the a axis, the only things that are going to affect the spin measurement of particle 1 are the orientation of that a vector and the hidden variable lambda that goes with particle 1. Because particles 1 and 2 are in the singlet state, they don't have any a priori preferred directions. So the result of the spin measurement is going to be deterministically well determined by however the hidden variable lambda interacts with the detector oriented along a. And likewise, the result B of measuring the spin of particle 2 along the b axis in the same instance is determined by the b axis and lambda, for exactly the same reason. And so we can write that the measurement outcome at A, as a function of the measurement direction a and the hidden variables lambda, can take on a value of +1 or -1, depending on whether particle 1 is measured spin up or spin down respectively. And likewise, the measurement result at detector B, which is a function of the b axis and the hidden variables lambda, is also going to take on a value of +1 or -1 for spin up and spin down respectively.
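As one concrete toy instance of such functions (my own choice, in the spirit of the yellow-vector picture from earlier; certainly not the only possibility), take lambda to be a random unit vector carried by the pair, and let each detector output the sign of the dot product with its own axis:

```python
import math
import random

def random_unit_vector(rng=random):
    """One possible hidden variable lambda: a random direction in 3D."""
    while True:  # rejection-sample inside the unit ball, then normalize
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        n = math.sqrt(sum(x * x for x in v))
        if 0 < n <= 1:
            return tuple(x / n for x in v)

def A(a, lam):
    """Outcome at detector A: a function of its own axis a and lambda only."""
    return 1 if sum(p * q for p, q in zip(a, lam)) >= 0 else -1

def B(b, lam):
    """Outcome at detector B: likewise a function of b and lambda only."""
    return -A(b, lam)  # built so that equal axes give opposite outcomes

axis = (0.0, 0.0, 1.0)
lam = random_unit_vector()
assert A(axis, lam) in (+1, -1) and B(axis, lam) in (+1, -1)
assert A(axis, lam) * B(axis, lam) == -1  # same axis: always anti-correlated
```

Note how this toy model has both properties just described: each outcome is a deterministic ±1 function of its own axis and lambda, and neither function takes the other detector's axis as an input.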
And we're going to leave this fully generic as far as in what way, or by what function, the hidden variables interact with the measurement axis. Whatever you can imagine, whatever principle you want to go ahead and postulate, it's still for sure the case, whatever these functions actually are, that by definition they're going to have values of plus or minus one, depending on the outcome of the spin measurement. Now, the vital assumption of local causality is that the result B for particle 2 does not depend on the setting a of the magnet for particle 1, nor A on b. So in equation 1, you see that A is a function of the a vector and the hidden variables lambda, and B is a function of the b vector and the hidden variables lambda. But notice that A is not a function of the b vector, nor is B a function of the a vector. The reason being, detectors A and B are separated so far apart, and these measurements happen so quickly, that there's no way the information about which way one detector is oriented can propagate over to the other detector and affect the measurement result in any way. No, these two measurements happen in different light cones. And so, by local causality, you can't have the measurement result at A depending on the b vector, or vice versa.
And one of the things that we're going to show in this paper is that any hidden variable model is going to have to violate that assumption. The only way to get it to work is to relax that constraint and say, okay, the measurement outcome at A depends on the orientation at B, and vice versa. And then it's like, oh, that's weird. That's non-local. That is absurd. And at that point, there's no advantage to using a hidden variable model, because whether you take ordinary quantum mechanics or some speculative hidden variable model, in both cases you're going to have a non-local model. And so, no matter how you look at it, it's a glitch in reality. All right. Then suppose we define rho of lambda as the probability distribution of the hidden variables lambda.
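For example (again my own illustration, not Bell's): take rho of lambda to be the uniform distribution over unit vectors, pair it with simple sign-rule detectors (A returns the sign of a·λ, B the opposite sign of b·λ), and estimate the average of the product A·B by Monte Carlo:

```python
import math
import random

def random_unit_vector(rng=random):
    """Draw lambda from rho(lambda): here, uniform over 3D directions."""
    while True:  # rejection-sample inside the unit ball, then normalize
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        n = math.sqrt(sum(x * x for x in v))
        if 0 < n <= 1:
            return tuple(x / n for x in v)

def A(a, lam):
    return 1 if sum(p * q for p, q in zip(a, lam)) >= 0 else -1

def B(b, lam):
    return -A(b, lam)

def expectation(theta, n=20000):
    """Monte Carlo average of A*B with the detector axes theta apart."""
    a = (0.0, 0.0, 1.0)
    b = (math.sin(theta), 0.0, math.cos(theta))
    total = 0
    for _ in range(n):
        lam = random_unit_vector()  # one shared lambda per particle pair
        total += A(a, lam) * B(b, lam)
    return total / n

# Aligned axes reproduce the singlet anti-correlation exactly.
assert expectation(0.0, 1000) == -1.0
```

For this particular model the average comes out to roughly -1 + 2θ/π, which agrees with quantum mechanics at θ = 0 and θ = 90° but departs from the quantum prediction of -cos θ at intermediate angles; that kind of gap between hidden variable models and quantum mechanics is exactly what Bell's argument goes on to make precise.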
So, in other words, imagine all possible configurations of our hidden variables lambda, whether they're vectors or scalars or tensors or functions or sets, whatever you want to imagine for lambda. There's going to be some space of configurations, some space of possibilities that lambda can take on. And you can assign a probability to each and every configuration. And so rho of lambda is precisely the distribution which defines how likely our hidden variables are to exist in whatever state we can imagine them existing in. So this is quite a generic thing, and as we go through the paper, we'll imagine some specific cases with some simple functions for rho of lambda. But notice the power in keeping this generic. See, so far we haven't narrowed down what lambda can be. Our hidden variables can be whatever you can imagine. And then rho of lambda, as a probability distribution on those hidden variables, can also be whatever you want to imagine, whatever distribution you want to take over whatever space of variables you want to define. And even though our setup is so generic, one of the things we can still say for sure is that the expectation value of the product of the two components, measuring particle 1 along the a axis and measuring particle 2 along the b axis, is going to be P of a
2 along the B ais is going to be P of A and B where here P is the expectation
and B where here P is the expectation value of the products of A and B that is
value of the products of A and B that is the plus or minus one that's recorded at
the plus or minus one that's recorded at each detector. We can say that P of A
each detector. We can say that P of A and B is going to be the integral over
and B is going to be the integral over all possible configurations of hidden
all possible configurations of hidden variables. Each one weighted by row of
variables. Each one weighted by row of lambda that is how likely that
lambda that is how likely that configuration is to be. And then as
configuration is to be. And then as we're integrating over that space of
we're integrating over that space of possible hidden variables for each
possible hidden variables for each possibility, we simply multiply the
possibility, we simply multiply the outcome of the measurement at detector
outcome of the measurement at detector A, that is A of A and lambda times the
A, that is A of A and lambda times the measurement outcome at detector B, that
measurement outcome at detector B, that is B of B and lambda.
is B of B and lambda. By the way, in Belle's paper, he writes
By the way, in Belle's paper, he writes this integral as integral row lambda D
this integral as integral row lambda D lambda A * B. I like to write it in the
lambda A * B. I like to write it in the sandwich notation where you have the
sandwich notation where you have the integral sign on the left and the
integral sign on the left and the differential element on the right and
differential element on the right and then whatever you're integrating over in
then whatever you're integrating over in between. It doesn't matter either way.
between. It doesn't matter either way. It's just a stylistic choice. So, well,
It's just a stylistic choice. So, well, anyway, I want to reflect on exactly
anyway, I want to reflect on exactly what this equation means, equation two,
what this equation means, equation two, because it is of central importance to
because it is of central importance to everything that follows. So, this
everything that follows. So, this parameter P, we're going to go ahead and
parameter P, we're going to go ahead and call that the correlation between our
call that the correlation between our measurements.
And this correlation has a really intuitive meaning. The first thing to notice is that P, the correlation, has to be somewhere in between -1 and 1.

When it's -1, the measurement outcomes at detector A are going to be perfectly anti-correlated with the measurement outcomes at detector B. For example, this happens when detector A and detector B are aligned along precisely the same axis, because if we have a pair of particles in the singlet state and we measure them both along the same axis, then if one is spin up, the other is spin down, and vice versa. So if A is +1, then B is -1, and vice versa. When we're measuring the singlet state along the same axis, the product of A and B is always going to be -1, because 1 times -1 is -1, and -1 times 1 is -1. And in that configuration, if the product of A and B is always -1, then equation 2 is simply the negative of the integral of ρ(λ) dλ. Now, ρ is a normalized probability distribution, so when you integrate over all possibilities, each one weighted by the probability distribution, the result is always going to equal 1, because there's a 100% chance that the hidden variables are in some kind of configuration. And so we find that P(a, b), when a and b are the same vector, is equal to -1.

Conversely, if we flip b around so that now b equals -a, and our measurement axes are pointing in equal and opposite directions, then we find a correlation of +1. That is, the product of A and B is always going to equal 1, because if we measure the particle spin up in detector A, but detector B is flipped upside down relative to detector A, then the other particle is also going to be measured spin up in detector B, but along the upside-down axis. The singlet correlation is still there; it's just that flipping the vector b upside down is effectively a redefinition of what spin up and spin down mean in detector B. And in that case, if the product of A and B is always equal to 1, because 1 times 1 is 1 and -1 times -1 is 1, then equation 2 simply reduces to the integral of ρ(λ) dλ, which, because ρ is a normalized probability distribution, equals 1.

Now, there's one more special case we can imagine, which is when a and b are perpendicular. Suppose a is pointing straight up and b is pointing straight to the right. In that case, we should expect a correlation of zero. The reason is that in the singlet state, say you measure spin up along a; if b is perpendicular to a, then the other measurement could go either way, spin up or spin down. So on average, the product of A and B is going to be +1 or -1 about 50/50, and that averages out to zero. So if we have a value of P equal to zero, there is no correlation between the two detectors.
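These special cases are easy to check numerically. Here's a minimal sketch (my own illustration, not from the paper) using a hypothetical toy hidden-variable model: λ is a unit vector distributed uniformly over the sphere, detector A reports sign(a · λ), and detector B reports -sign(b · λ), with the singlet anti-correlation built in by hand.

```python
import math
import random

random.seed(0)

def rand_unit_vector():
    """Uniform random point on the unit sphere (Gaussian trick)."""
    while True:
        v = [random.gauss(0, 1) for _ in range(3)]
        r = math.sqrt(sum(x * x for x in v))
        if r > 1e-12:
            return [x / r for x in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sign(x):
    return 1.0 if x >= 0 else -1.0

def correlation(a, b, n=100_000):
    """Monte Carlo estimate of P(a, b) = E[A * B] for the toy model
    A = sign(a . lam), B = -sign(b . lam), lam uniform on the sphere."""
    total = 0.0
    for _ in range(n):
        lam = rand_unit_vector()
        total += sign(dot(a, lam)) * -sign(dot(b, lam))
    return total / n

z = [0.0, 0.0, 1.0]
x = [1.0, 0.0, 0.0]
print(correlation(z, z))                  # same axis: product is always -1
print(correlation(z, [0.0, 0.0, -1.0]))   # opposite axes: product is always +1
print(correlation(z, x))                  # perpendicular axes: about 0
```

Even this crude model reproduces the three special values; the question the paper turns to is whether any such model can match the correlation at intermediate angles as well.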
Okay, so that's equation 2. The correlation between our measurement outcomes is found simply by integrating, over the space of all possible hidden variables weighted by the probability of each configuration, the product of the ±1 outcome at A times the ±1 outcome at B.

Now, that correlation given by equation 2, based on a hidden variable model, should equal the quantum mechanical expectation value, which for the singlet state is -a · b, or as we saw earlier, -cos θ, where θ is the angle between the two measurement axis vectors a and b. And the way to see that equation 3 is true, that this is the quantum mechanical expectation value and that it matches the experimental data, is to imagine that particle 1 gets to detector A ever so slightly before particle 2 gets to detector B. Particle 1 is measured along the a axis and the wave function instantly collapses, so now particle 2 is polarized opposite to the a axis. Then, when you measure the spin of particle 2 along the direction b, you can think about it sort of like the two-stage Stern-Gerlach experiment, where we create a beam of purely polarized spin-up particles and send it through a second detector which is tilted by some angle θ. As we know, there's a cos²(θ/2) probability of measuring spin up and a sin²(θ/2) probability of measuring spin down. And if you think like a gambler and calculate the expectation value, you end up with cos θ for the measurement outcome at the second detector, if spin up is +1 and spin down is -1. We saw that earlier. The minus sign here simply comes from the fact that the two particles in the singlet state are anti-correlated: if particle 1 is spin up along the axis a, then particle 2 is actually polarized spin down along a. That's where the minus sign comes from. It's basically just a 180° flip of the two-stage Stern-Gerlach experiment that we were looking at earlier.
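The gambler's arithmetic is short enough to verify directly: an outcome of +1 with probability cos²(θ/2) and -1 with probability sin²(θ/2) averages to cos²(θ/2) - sin²(θ/2) = cos θ, and the singlet's anti-correlation flips the sign to -cos θ. A quick check (my own, not from the paper):

```python
import math

def tilted_detector_expectation(theta):
    """Expected outcome for a spin-up beam measured along an axis tilted
    by theta: +1 with probability cos^2(theta/2), -1 with sin^2(theta/2)."""
    return math.cos(theta / 2) ** 2 * (+1) + math.sin(theta / 2) ** 2 * (-1)

for theta in [0.0, math.pi / 3, math.pi / 2, 2 * math.pi / 3, math.pi]:
    # double-angle identity: cos^2(t/2) - sin^2(t/2) = cos(t)
    assert abs(tilted_detector_expectation(theta) - math.cos(theta)) < 1e-12
    # singlet: particle 2 starts opposite to a, so P(a, b) = -cos(theta)
    print(f"theta = {theta:.3f}  P(a, b) = {-tilted_detector_expectation(theta):+.3f}")
```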
Well, anyway, all that's to say, quantum mechanics tells us that the correlation of the measurement outcomes, for unit vector a at detector A and unit vector b at detector B, for two particles in the singlet state, should be -cos θ, where θ is the angle between the two vectors.

And so the main question of this paper is: is it possible to have some hidden variable model, based on some set of possible λs and some probability distribution describing the likelihood of each λ, such that equation 2 matches the quantum mechanical and experimental value of -cos θ between the vectors a and b? If so, then such a hidden variable model might be plausible, because it would match the data. It would match quantum theory, and yet it would be an alternate way of looking at things. So that's cool. But what we're going to show in this paper, in particular in part four, the contradiction, is that no local hidden variable model can produce an equation 2 correlation which matches the quantum mechanical correlation and the experimental data. And therefore we cannot have a local hidden variable explanation of what's going on here, and we have to confront the fact that quantum mechanics genuinely is super weird and non-local and a glitch in reality.
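As a preview of that contradiction: a simple toy model of this sketchy-sign type (A = sign(a · λ), B = -sign(b · λ), with λ uniform on the sphere; the linear formula below is a standard property of that toy model, not something stated in the paper's text here) gives a correlation that is linear in the angle, -1 + 2θ/π. That agrees with -cos θ at θ = 0, π/2, and π, but nowhere in between:

```python
import math

for deg in [0, 30, 45, 60, 90, 120, 150, 180]:
    theta = math.radians(deg)
    qm = -math.cos(theta)            # quantum mechanics / experiment
    hv = -1 + 2 * theta / math.pi    # linear toy hidden-variable correlation
    print(f"{deg:3d} deg   qm = {qm:+.3f}   hv = {hv:+.3f}   gap = {qm - hv:+.3f}")
```

The gap at intermediate angles (about -0.21 at 45°) is the sort of mismatch that part four turns into a fully general no-go argument.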
Oh, and then one little caveat on the way we've formulated things here. Some might prefer a formulation in which the hidden variables fall into two sets, with the measurement outcome at A dependent on one set of hidden variables and the measurement at B depending on another set. However, this possibility is contained in the above, since λ stands for any number of variables and the dependencies thereon of A and B are unrestricted. In other words, if you want a hidden variable model where particle 1 carries with it some set of hidden variables and particle 2 carries with it a whole other set of hidden variables, go right ahead. That's fine. We're not ruling out that possibility. When we use this character λ to stand for any imaginable hidden variables, you can imagine that in whatever way you want, including the situation where you have two sets of hidden variables, one for each particle. Go for it. That's totally fine. We're not restricting that possibility at all.

And likewise, in a complete physical theory of the type envisaged by Einstein, the hidden variables would have dynamical significance and laws of motion; our λ can then be thought of as initial values of these variables at some suitable instant. In other words, if you want to think about hidden variables as some kind of fields with dynamical significance, that's cool too. Everything we're about to argue doesn't rule out that possibility at all. If you want, you can imagine λ representing a snapshot in time of those fields, and then imagine those fields evolving in accordance with some dynamical equations. But none of that time evolution is going to take this thought experiment outside the framework we're setting up, because our argument is fully generic. Anything you can imagine for λ, λ can be.

You know, I just noticed this yellow λ kind of looks like a banana peel. You wouldn't want that as a hidden variable.

[laughter]

Hey, that would affect the measurement of your spin state. All right, moving on.
Part three of the paper begins: the proof of the main result is quite simple. Well, according to Bell, at least. I don't know if I would say it's quite simple, but anyway: before giving it in part four, however, a number of illustrations may serve to put it in perspective.

So part three is all about establishing some context for part four, looking at some specific examples which we're then going to generalize in part four, when we give the formal argument that local hidden variable models don't work. Now, I'm going to break part three up into three parts, 3A, 3B, and 3C, because this part of the paper naturally falls into those three pieces anyway, and I want to take the time to zoom in on each one individually.
So the first part of part three says that for a single particle, we can make up a hidden variable story of what's going on with the spin, and it's okay; it seems to work. Firstly, there is no difficulty in giving a hidden variable account of spin measurements on a single particle. Suppose we have a spin-1/2 particle in a pure spin state with polarization denoted by a unit vector p. All that means is: imagine we send a beam of spin-1/2 particles through a Stern-Gerlach magnet and then filter it, like what we saw before, where we allow only the spin-up particles through. If the axis of that Stern-Gerlach magnet is the vector p, then the outgoing beam of particles is polarized with reference to that vector p. That is to say, if you were to do a subsequent spin measurement on such a particle along the direction p, then for sure the result of that measurement is going to be spin up. So that's what it means for the particle to be polarized along the direction p.
All right. Now suppose we let our hidden variable be, for example, a unit vector λ with uniform probability distribution over the hemisphere λ · p > 0. That is to say, λ is going to be some additional directional or orientational degree of freedom that travels along with the particle. We don't know exactly what λ is going to be; all we know about it is that it has a uniform probability distribution over the hemisphere which points in the same direction as p. And this constraint, that the dot product of λ and p is greater than zero, just means that λ kind of points towards p rather than away from p.

Now, if you think back to what we saw earlier in this video, where we sent our particle through a two-stage Stern-Gerlach experiment, and we supposed that all the magnet does is filter out the particles that point a little up versus a little down, without actively flipping the arrow up or down, you'll see that that thought experiment actually gives us a beam of exactly this kind of particle. We start off with the assumption that the incoming particles, those evaporated silver atoms, have totally randomly oriented λ vectors, but then we send them through the first Stern-Gerlach magnet to get a beam that's purely polarized along the axis of that magnet. At that point, what we know about the λ vector is that it's still totally random, but only on the half of the sphere that points along the direction p, because the particles for which λ pointed away from p were sent into the spin-down beam, and those didn't go forward.
And so the question comes up: what happens if we measure the spin of this kind of particle along some axis a? Well, we already know what the expectation value is going to be. The expectation value of the spin of this kind of particle, from quantum mechanics and from experiment, is the cosine of the tilt angle of the second detector relative to the first. In this language, we would say it's the cosine of the angle θ between the polarization vector p and the measurement vector a.
So then suppose that, as we're building our hidden variable model, we speculate that the result of measuring along some axis a is going to be the sign of the hidden variable vector λ dotted with an effective measurement axis a′. See, we're going to have to make a sketchy move here of the kind we talked about earlier. This a′ is a unit vector which depends on a and p in a way to be specified; we'll talk about exactly what it has to be in a moment, but this is exactly the same kind of sketchy move we looked at earlier, when we were thinking about how to modify our hidden variable model into something that matches the data. In fact, the example we looked at earlier in the video is mathematically equivalent to what we're talking about now.

Oh, and the sign function here simply takes on the values +1 or -1 according to the sign of its argument. So the sign of the dot product of the vector λ and the effective measurement axis a′ is positive if λ kind of points along a′, and negative if λ kind of points away from a′. All this is to say: the measurement result is going to be spin up if λ is in the hemisphere whose pole is a′, and otherwise, if λ is outside that hemisphere, it'll be spin down.
And then you might ask: what if λ is right on the equator relative to the north pole of a′? Well, the probability of λ lying perfectly on the equator is zero, so we don't have to worry about it. As Bell says in his paper: actually this leaves the result undetermined when λ · a′ = 0, but as the probability of this is zero, we will not make special prescriptions for it. So we don't have to worry about that.

Now, if you average over all possible hidden variable vectors λ, in accordance with the setup we've described here, the expectation value of the spin measurement is going to be 1 - 2θ′/π. Call that equation 5, where θ′ is the angle between the effective measurement axis a′ and the polarization vector p; that's the same θ′ from the sketchy move we talked about earlier. So let's go ahead and see where equation 5 comes from. Why does this model give us an expectation value of 1 - 2θ′/π?
Well, the reason is that the expectation value of the spin measurement along the measurement axis a, in accordance with the rule we've stipulated here, is going to be the probability that the lambda vector is in the hemisphere defined with a prime at the pole, times the +1 for the spin-up result, plus the probability of lambda not being in a prime's hemisphere, times the -1 value which goes along with the spin-down measurement.
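To make this concrete, here's a minimal Monte Carlo sketch of the single-particle model just described (my own illustration, not from Bell's paper): lambda is sampled uniformly over the hemisphere around the polarization p, the outcome is the sign of lambda dot a prime, and the average lands on the linear expectation value 1 - 2 theta prime / pi of equation 5. The function name and the choice of p along +z are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_expectation(theta_prime, n=400_000):
    """Average outcome sign(lambda . a') with lambda uniform over the
    hemisphere around the polarization p (taken here along +z)."""
    v = rng.normal(size=(n, 3))
    lam = v / np.linalg.norm(v, axis=1, keepdims=True)  # uniform on the sphere
    lam[lam[:, 2] < 0] *= -1.0                          # fold onto p's hemisphere
    # Effective measurement axis a' at angle theta_prime from p, in the x-z plane.
    a_prime = np.array([np.sin(theta_prime), 0.0, np.cos(theta_prime)])
    return np.sign(lam @ a_prime).mean()

for tp in (0.0, np.pi / 4, np.pi / 2, np.pi):
    print(f"theta'={tp:.2f}  model={model_expectation(tp):+.3f}  "
          f"linear={1 - 2 * tp / np.pi:+.3f}")
```

Each printed pair should agree to within Monte Carlo noise.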
So we're thinking like a gambler here, and we're calculating that expectation value. And when we think about this, what we realize is that the expectation value of the spin measurement is going to be 1, its maximum value, when the theta prime angle is zero. That is, when our polarization vector is exactly aligned with the effective measurement axis a prime, then we're always going to get spin up, like for sure, 100% guaranteed, because when you think about the hemisphere of possible lambda vectors, well, those are going to be in the same hemisphere as the polarization vector. So if the polarization vector and the a prime vector point in exactly the same direction, then lambda is guaranteed to be in a prime's hemisphere, so you're always going to get a +1 in that case. And conversely, if the polarization vector p is completely antiparallel to the effective measurement axis a prime, that is, if theta prime is pi, or 180°, then we're always going to get a -1, a spin-down measurement. If the polarization vector is pointing completely away from a prime, then the space of possible lambda vectors is precisely the opposite of a prime's hemisphere, and so you're always going to get a spin-down measurement in that case.

And then if you think about rotating the polarization vector p relative to a prime, and think about the overlap in the hemispheres of p and a prime, you see that the overlap varies linearly with the angle theta prime. This goes back to what we were talking about earlier: when you imagine the board game with the spinner and you spin the needle, the probability of it landing somewhere simply has to do with the area of the wedge it's going to land on. Well, as you rotate theta prime, you see that our expectation value is going to vary linearly with the angle theta prime for precisely the same reason. And you can think about that as a two-dimensional circle with a board game spinner, or you can think about it in the full three dimensions, as if it's an orange, with the volume of the orange slice going along with the wedge angle. But in any case, this model is going to give us an expectation value of the spin measurement which is linearly dependent on theta prime. And so if you consider the two boundary conditions we've looked at, theta prime = 0 and theta prime = pi, then apply the fact that this is a linear function, and just think in terms of y = mx + b, you see that our equation for the expectation value of the spin measurement is necessarily 1 - 2 theta prime / pi. And as we know, this linear function is not what quantum mechanics predicts, and it's not a match for the experimental data, because in both cases that's going to be the cosine of the angle, not a linear function of the angle.

But here's where the sketchy move comes in. Right? Here's why we have a prime instead of just a. Suppose then that a prime is obtained from a by rotation towards the polarization vector p, until 1 - 2 theta prime / pi equals cosine of theta. Call that equation 6, where theta is the angle between the measurement axis a and the polarization vector p. So that's the sketchy move that we use in order to warp the linear function into a cosine function. Well then, if we do that, if we apply equation 6, then we have the desired result that the expectation value of the spin measurement is cosine of theta, which is in alignment with quantum physics and with the experimental data.

And so technically we haven't done anything illegal here. We haven't broken any rules, and this model therefore cannot be completely dismissed, though it is contrived and implausible, and we don't want to have to believe it. Because if we have a detector which is oriented along the vector a, and we have to stipulate that, no, actually, what's happening there is that the effective measurement axis is bent a little bit in towards the polarization vector, it's like, well, you can say that, but why would that be the case? This is not a very convincing model, but we will not dismiss it on the basis that it's not convincing. Instead, we're going to go ahead and say, look, it's possible. We're not going to rule it out just yet. And by lowering the epistemic standards for the hidden variable model, that's going to hold us to a higher standard later on, when we rule out all possible local hidden variable models. Because then we'll be able to say, look, we went along with the sketchy move. We allowed it. But even allowing that, our proof later on is going to be so strong that, despite our generosity here, despite being maximally charitable to the local hidden variable perspective, we're going to show that it just doesn't work.

All right. So in this simple case, there is no difficulty in the view that the result of every measurement is determined by the value of an extra variable lambda, and that the statistical features of quantum mechanics arise because the value of this variable is unknown in individual instances. That is, in this particular case we can come up with a story involving local hidden variables, and it kind of appears to work, even though it is a little bit sketchy.
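And the sketchy move itself is just algebra: solving equation 6 for the effective angle gives theta prime = pi (1 - cos theta) / 2. Here's a tiny sketch of my own (the function name is mine) showing that the linear expectation of equation 5, evaluated at that warped angle, lands exactly on cosine of theta:

```python
import numpy as np

def warped_angle(theta):
    # Equation 6: choose theta' so that 1 - 2*theta'/pi = cos(theta).
    return np.pi * (1.0 - np.cos(theta)) / 2.0

for theta in np.linspace(0.0, np.pi, 7):
    tp = warped_angle(theta)
    print(f"theta={theta:.2f}  theta'={tp:.2f}  "
          f"linear(theta')={1 - 2 * tp / np.pi:+.3f}  cos(theta)={np.cos(theta):+.3f}")
```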
Okay, so part 3 of the paper then goes on to show that hidden variables also seem to work for special cases in which the two detectors have special orientations for their measurement axes. Secondly, there is no difficulty in reproducing, in the form of equation 2 (that is, the correlation function based on local hidden variables), the only features of the quantum mechanical and experimental correlation function, equation 3, commonly used in verbal discussions of this problem. That is, when our two measurement directions are the same, in which case we have P(a, a), since a and b are the same vector when they're aligned the same way, and that equals the negative of the correlation we would find when b is equal to negative a, and that's equal to -1.

So when the unit vectors a and b are aligned the same way, we get a perfect anti-correlation of -1. And when a and b are oppositely aligned, then we get a perfect correlation of 1. And the other special case is when the dot product of a and b equals zero, that is, when a and b are perfectly perpendicular to each other, in which case we have no correlation.

So: aligned the same way, P is -1. Aligned opposite ways, P is 1. Perpendicular, P is 0. And these three special cases can be explained by a local hidden variable model. For example, let lambda now be a unit vector with uniform probability distribution over all directions, and take the rule that the measurement outcome A, as a function of the unit vector a and this hidden variable vector lambda, is going to be the sign of a dot lambda. And conversely, the measurement outcome B, as a function of the unit vector b and the hidden variable vector lambda, is going to be the negative sign of b dot lambda. By the way, in Bell's paper there's a typo here: in the paper it's written as B being a function of a and b, but that should be B as a function of b and lambda.

All right, so what are we doing here? Well, what we're saying is that we have the two particles in the singlet state, and we're going to stick a unit vector onto this pair of particles. So you can imagine particles 1 and 2 both carrying along this orientational piece of information, this unit vector lambda, which is chosen totally randomly out of all possible directions. And then when particle 1 gets to detector A, if lambda is pointing roughly along the direction of a, that is, if the dot product of a and lambda is positive, then you measure a spin up of particle 1 in detector A. And likewise, as particle 2 is measured in detector B, if the lambda vector is pointing in the same kind of direction as b, then you measure a spin down at B. So this model is kind of what we might instinctively expect to be happening with a pair of particles that have entangled spins, because you might expect that there is some kind of orientational quantity that each particle intrinsically has, but that quantum mechanics doesn't account for, and that this hidden variable, which carries with it a kind of orientation, is what predetermines how particles 1 and 2 are going to be measured at A and B respectively.

And so the claim is that this rule, given by equation 9, works in the special cases that the vectors a and b are perfectly parallel, perfectly antiparallel, or perfectly perpendicular. And you can show that that's the case. In the first case, imagine a and b being perfectly parallel. Well then, in equation 9 you see that the rules for the measurement outcomes at A and B are going to be equal and opposite, because for A we have the sign of a dot lambda, but if a and b are the same vector, then for B the rule is that it's the negative sign of b dot lambda, which equals the negative sign of a dot lambda. So you have the negative of the outcome at A. Therefore we find perfect anti-correlation in the case that the unit vector a equals the unit vector b.

Likewise, if you reverse that logic and look at rule 9 in the case that a and b are antiparallel, so b equals negative a, then the measurement outcome at detector A is the sign of a dot lambda, and the measurement outcome at detector B is the negative sign of b dot lambda. But b dot lambda in this case equals negative a dot lambda, and you can carry that negative sign outside of the sign function, so the two negatives cancel out, and we find for the measurement outcome at B the sign of a dot lambda, which is precisely the same as the measurement outcome at A. So in the case that the measurement directions a and b are perfectly antiparallel, we find a perfect correlation of 1 for the measurement outcomes with this local hidden variable model. And so in that case this model works just fine. And then finally, for the case that a and b are perpendicular, whatever the measurement outcome is at A, you're going to have a 50/50 chance of it being the same or the opposite at B. And so in that case too, this model works just fine.
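Here's a quick numerical check of those three special cases, a sketch of my own assuming equation 9 as stated: A = sign(a dot lambda), B = negative sign of b dot lambda, with lambda uniform over the sphere (the function name is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def correlation(a, b, n=400_000):
    """Monte Carlo estimate of E[A*B] for the equation-9 model:
    A = sign(a . lambda), B = -sign(b . lambda), lambda uniform on the sphere."""
    v = rng.normal(size=(n, 3))
    lam = v / np.linalg.norm(v, axis=1, keepdims=True)
    A = np.sign(lam @ np.asarray(a, dtype=float))
    B = -np.sign(lam @ np.asarray(b, dtype=float))
    return (A * B).mean()

z = [0.0, 0.0, 1.0]
x = [1.0, 0.0, 0.0]
print(correlation(z, z))                  # parallel: perfect anti-correlation, -1
print(correlation(z, [0.0, 0.0, -1.0]))   # antiparallel: perfect correlation, +1
print(correlation(z, x))                  # perpendicular: no correlation, ~0
```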
But again, this model has a flaw, which is that, just like what we saw before in part 3A, the dependence of the measurement correlation on the angle theta between the vectors a and b is linear in theta. It's not the negative cosine of theta that we expect from quantum physics and that is shown in experiments.

And to see that, let's draw a picture where we imagine all possibilities for lambda, selected uniformly across all possible directions, and then we draw the measurement direction a, and you consider the hemisphere of all possible vectors that point in the same general direction as a, that is, all vectors whose dot product with a is positive. Well then, the measurement result at detector A is going to be spin up if lambda is in the same hemisphere as a, or spin down if lambda is in the opposite hemisphere. So we have a 50/50 chance of measuring spin up or spin down, which is in agreement with experiment. But then things get a little tricky when you also draw the measurement direction b for detector B, and apply the same reasoning about what the measurement result is going to be in detector B. In this case, the result is going to be spin down if lambda is in the same hemisphere as b. Spin down, because we're in the singlet state, where the spins are anti-correlated, and that's encoded in the minus sign in the second part of equation 9. And conversely, detector B will measure spin up if lambda is not in the same hemisphere as the measurement direction b.

And then if we want to go ahead and imagine this as an animation, where we're sweeping the theta angle and considering simultaneously all possibilities for the hidden variable lambda, uniformly distributed over the sphere, which you may as well imagine as a circle or a sphere, because in either case the area or the volume, respectively, changes the same way as a function of the theta angle. Well then, just think about the probability of having the same outcome at both detectors versus the probability of having opposite outcomes. And what you realize is that you're going to have the same outcome at both detectors when lambda is in the hemisphere of one of the measurement directions, but not in the hemisphere of the other. So in this animation, if you look at the two sectors with the blue arc, for both of those sectors you're going to have the same measurement outcome at both A and B. And so the product of the outcomes at A and B is going to equal 1 if lambda lies in one of the two blue sectors shown here. And on the other hand, if lambda is in the hemispheres of both measurement directions, or of neither, then you're going to have opposite outcomes at the two detectors, and so the product of the outcomes at A and B is going to be -1.

And so to find the correlation, all we have to do is compare the area of the blue sectors to the area of the red sectors. The formula is just +1 times the fraction of the circle taken up by the blue sectors, minus 1 times the fraction of the circle taken up by the red sectors.
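That sector-counting formula can be checked numerically too: sweeping theta and averaging the product of equation-9 outcomes traces out the straight line -1 + 2 theta / pi rather than the quantum mechanical negative cosine. A rough sketch, with my own names and with a taken along +z:

```python
import numpy as np

rng = np.random.default_rng(2)

def model_correlation(theta, n=400_000):
    # Equation-9 outcomes, with a along +z and b at angle theta from a.
    v = rng.normal(size=(n, 3))
    lam = v / np.linalg.norm(v, axis=1, keepdims=True)  # uniform on the sphere
    a = np.array([0.0, 0.0, 1.0])
    b = np.array([np.sin(theta), 0.0, np.cos(theta)])
    return (np.sign(lam @ a) * -np.sign(lam @ b)).mean()

for theta in np.linspace(0.0, np.pi, 5):
    print(f"theta={theta:.2f}  model={model_correlation(theta):+.3f}  "
          f"line={-1 + 2 * theta / np.pi:+.3f}  qm={-np.cos(theta):+.3f}")
```

The model column hugs the straight line and only touches the negative cosine at the three special angles.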
sectors. And then as we sweep theta around, we
And then as we sweep theta around, we can see the linear dependence of the
can see the linear dependence of the correlation on the theta angle. And this
correlation on the theta angle. And this linear dependence of the correlation on
linear dependence of the correlation on theta, which now we've seen a few times
theta, which now we've seen a few times in a few different contexts, is really
in a few different contexts, is really at the heart of Bell's argument, as
at the heart of Bell's argument, as we're going to see in part four.
we're going to see in part four. And so in part 3B of this paper, Bell
And so in part 3B of this paper, Bell shows us that the local hidden variable model does work for the three special cases where a and b are either parallel, antiparallel, or perpendicular. And when you look at the plot of the correlation that we get from our local hidden variable model, that is this blue line, and you compare it to the quantum mechanical correlation that we would expect, namely negative cosine of theta, you see that even though these two curves are different, they do in fact intersect at precisely these three special cases. And so part 3B of Bell's paper is all about saying: yeah, the local hidden variable model does seem to work for those three special cases. But nonetheless, the local hidden variable model breaks down for anything other than those three special cases, because a line is not a cosine.

And there are actually a couple of ways in which a line is not a cosine. The most obvious one is that there's just a mismatch between these two curves for most values. So pick a theta value at random, and negative cosine of theta is just not the same value as what our linear correlation gives us. So it doesn't match. But the other noticeable thing that differs between this linear correlation from our local hidden variable model and the quantum mechanical correlation is that the linear correlation has a nonzero slope at a theta angle of 0, whereas the quantum mechanical correlation has a flat slope of zero at theta = 0. And this is kind of a subtle difference between these two correlation functions, but nonetheless it is a difference, and it's a difference that's totally generic to all local hidden variable models. So one of the things that we're going to prove in part 4A of this paper is that any local hidden variable model is going to have a nonzero slope at a theta angle of zero.
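To make the difference between the two curves concrete, here's a quick numeric sketch (my own illustration, not code from the video): the linear correlation −1 + 2θ/π described above agrees with the quantum correlation −cos θ at the three special cases, but the two slopes at θ = 0 differ.

```python
import numpy as np

def p_lhv(theta):
    """Linear correlation from the local hidden variable model: -1 + 2*theta/pi."""
    return -1 + 2 * theta / np.pi

def p_qm(theta):
    """Quantum mechanical singlet correlation: -cos(theta)."""
    return -np.cos(theta)

# The two curves agree at the three special cases (parallel, perpendicular, antiparallel)...
for t in (0.0, np.pi / 2, np.pi):
    assert np.isclose(p_lhv(t), p_qm(t))

# ...but their slopes at the minimum theta = 0 differ:
h = 1e-6
slope_lhv = (p_lhv(h) - p_lhv(0)) / h   # 2/pi, nonzero
slope_qm = (p_qm(h) - p_qm(0)) / h      # sin(0) = 0, stationary
print(slope_lhv, slope_qm)
```

The nonzero slope at the minimum is exactly the generic feature of local hidden variable models that part 4A makes rigorous.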
So this animation gives us a great intuition for how the local hidden variable model gives us a correlation which depends linearly on the angle theta between the vectors a and b. And therefore Bell goes on to say: this gives a correlation, as a function of a and b, of −1 + 2θ/π. Call that equation 10, where theta is the angle between the vectors a and b, and 10 has the properties of equation 8; that is, it works for the three special cases. And of course, the precise form of equation 10, this 2/π, that's just y = mx + b. That's just what it has to be in order to be a line that goes through the boundary conditions given by equation 8. But noticeably, the blue curve and the purple curve are not the same in general. Not only do their values not match in general, but also, at theta equals zero, the blue line has a nonzero slope, whereas the purple quantum curve has a slope of zero.

Now here Bell abruptly brings up a very important point (although it is kind of jarring the way in which he brings it up so abruptly), but in any case, following the paper: for comparison, consider the result of a modified theory in which the pure singlet state is replaced in the course of time by an isotropic mixture of product states. This gives the correlation function −(a·b)/3. Call that equation 11. Now, what does that mean? I mean, that sentence just comes out of nowhere, right? And there is a lot that Bell is communicating in this one sentence. So I want to take a moment to unpack exactly what he means, because this is actually a really profound point.

So when we have our purple curve of negative cosine theta for the correlation between the measurement outcome at detector A and detector B, this is based on the two particles being in the singlet spin state, where before the measurement neither particle has a preferred spin direction, but the spin measurement outcomes for the two particles are guaranteed to be anti-correlated along the same measurement axis, whatever that measurement axis may be. On the other hand, if instead of the singlet state we imagine that the two particles already have some preferred spin direction before they're measured, but still their spins are equal and opposite relative to that particular spin direction, then we would expect anti-correlated spin measurements if the particles are measured along that particular spin direction. But if the particles are measured perpendicularly to that spin direction, then in that case we would expect no correlation between the spin outcomes of those two particles.

And so what Bell means by an isotropic mixture of product states is this: imagine that when we're producing these particles, instead of being in the singlet state, with pure rotational symmetry and no preferred spin axis a priori, the particle pairs do have an intrinsic preferred spin direction, relative to which they're equal and opposite. And then by isotropic, all that means is that that direction, call it n hat, is selected uniformly from the sphere. So the particles' preferred direction is going to be totally random. And so now imagine measuring over many such pairs of particles, and for the sake of argument suppose we measure them along the same measurement axis a. Well, sometimes that spin axis n is going to be aligned with a, but usually it's not going to be very aligned, in which case we won't really see much of a correlation. And when you work out the math of what correlation strength we would expect on average, you find a correlation strength which is the same as for the singlet state, but divided by a factor of three, which represents the fact that when you average over all three dimensions of space, more often than not our measurement directions are not going to be aligned with the spin direction n.
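The factor of three can be checked directly with a small Monte Carlo sketch (my own illustration, not code from the video; I'm assuming the standard quantum expectation values for a product state polarized along n̂, namely ⟨A⟩ = a·n̂ and ⟨B⟩ = −b·n̂, with independent outcomes at the two detectors):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_unit_vectors(m):
    """Directions distributed uniformly over the unit sphere."""
    v = rng.normal(size=(m, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

a = np.array([0.0, 0.0, 1.0])
theta = 1.2  # angle between the measurement axes a and b
b = np.array([np.sin(theta), 0.0, np.cos(theta)])

# each pair gets its own random preferred spin axis n (the "isotropic" part)
n = random_unit_vectors(1_000_000)

# for a product state polarized along n, <A> = a.n and <B> = -b.n, and the
# two outcomes are independent, so the per-pair expectation of A*B is their product
corr = np.mean((n @ a) * (-(n @ b)))

print(corr, -np.dot(a, b) / 3)  # both close to -cos(theta)/3
```

The averaging over n̂ is what produces the 1/3: the mean of (a·n̂)(b·n̂) over the sphere is (a·b)/3, one third for each spatial dimension.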
And so we actually see a very strong theoretical and experimental difference between the singlet state and a situation where the particles have equal and opposite spin along some random axis. The correlation we get from the singlet state is weirdly strong, in a surreal kind of way. And this reflects the fact that in the singlet state neither particle has a preferred direction before it's measured. And so if you think in terms of one of the particles being measured ever so slightly before the other, then you're guaranteed to collapse the wave function along that measurement direction. And so in the singlet state, the pair's spin direction always ends up aligned with your measurement axis, whereas for an isotropic mixture of product states, in general, you're not going to have this kind of alignment.

All right. So Bell then goes on to say: it is probably less easy, experimentally, to distinguish equation 10 from equation 3 than equation 11 from equation 3. So equation 10 is the linear correlation that we get from our local hidden variable model, and equation 11 is the −(a·b)/3, that is, negative cosine theta over 3, correlation that we get from a quantum mechanical model in which the two particles are not in the singlet state, but rather are in a product state with some preferred direction. And what Bell is saying here is that there's really a big contrast in the experimental data between a singlet state and an isotropic mixture of product states, whereas the linear correlation from a local hidden variable model is going to be a better approximation to the actual quantum mechanical singlet correlation. So that's just a point about experimental practicality.

Now, before moving on from part 3B, Bell makes one final comment, which is that unlike equation 3, the quantum mechanical correlation negative cosine of theta, the function of equation 10, this linear correlation we get from the local hidden variable model, is not stationary. That is, the slope is nonzero at the minimum value −1, where theta equals 0. So we talked about that earlier when thinking about the differences between the blue line and the magenta curve, that is, between the local hidden variable model and the quantum mechanical correlation. One of the differences is that the values in general are not the same. But another difference is that the quantum mechanical correlation has a slope of zero at its minimum value, whereas the local hidden variable line does not. It'll be seen in part 4A that this is characteristic of functions of type (2), that is, where the correlation is given by a local hidden variable model. So in part 4A we're going to prove that any local hidden variable model is going to have a nonzero slope in its correlation function at the minimum value, which is incompatible with quantum mechanics and with the experimental data. And then in part 4B, we're going to prove that, in general, the two correlation curves for a local hidden variable model and for quantum mechanics cannot take on the same values everywhere.

So in part four, we're going to prove in two different ways that local hidden variable models are not compatible with quantum mechanics and not compatible with the experimental data.

Okay, so then Bell wraps up part three by talking about how a hidden variable model could work if we allow for non-locality. Thirdly and finally: there is no difficulty in reproducing the quantum mechanical correlation of equation 3 if the results of the spin measurements at A and B in equation 2, the correlation function of the local hidden variable model, are allowed to depend on the measurement directions b and a respectively, as well as on a and b. And Bell shows this by saying: if we do a non-local sketchy move, we can warp the blue line into the magenta curve. So the reasoning here is exactly the same as what we've seen before, when we thought about doing a sketchy move to warp the line into the curve. But the key difference now is that when you have two entangled particles that are separated in space, you can't do this sketchy move unless you know the angle between the measurement directions a and b, which are in different light cones. And so this is a non-local sketchy move, because somehow what's happening at detector A depends on the measurement axis at detector B, and vice versa. So as a concrete example of this, we can replace the vector a in equation 9 by an effective measurement axis a′, obtained from a by rotation towards the measurement vector b until 1 − 2θ′/π = cos θ, where θ′ is the angle between the effective measurement axis a′ and b. So if you make that sketchy move, then the blue line is going to warp into the magenta quantum curve, and in that case we would have a match between our hidden variable model and quantum mechanics and the experimental data. And so this is exactly the same reasoning as the sketchy moves that we looked at before. In fact, it's exactly the same mathematical maneuver. However, for given values of the hidden variables, the results of measurements with one magnet now depend on the setting of the distant magnet, which is just what we would wish to avoid: that is, non-locality.
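We can verify the warp with a short algebraic sketch (my own check, assuming the substitution is exactly as stated): solve 1 − 2θ′/π = cos θ for the effective angle θ′, and confirm that the linear correlation, evaluated at θ′, lands exactly on the quantum curve.

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 181)  # true angle between the measurement axes a and b

# Bell's substitution: rotate a toward b until 1 - 2*theta'/pi = cos(theta),
# where theta' is the angle between the effective axis a' and b
theta_prime = (np.pi / 2) * (1 - np.cos(theta))

# the linear hidden-variable correlation, evaluated at the warped angle...
p_warped = -1 + 2 * theta_prime / np.pi

# ...exactly reproduces the quantum singlet correlation -cos(theta)
print(np.allclose(p_warped, -np.cos(theta)))  # True
```

The catch, of course, is that computing θ′ requires knowing θ, the angle between both measurement settings at once, and that is precisely the non-locality.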
And there's really no way around that. If you look at the example shown here, where we replaced a with a′, and you think maybe there's some way to do the sketchy move differently, in a way that doesn't violate locality, well, try to do that and you'll find it doesn't work. So for example, what if instead of rotating a into a′, we leave a alone and rotate b into b′ in a way that gives us the same result? Well, that would require b′ to be a vector that's slightly rotated towards a. And again, it's the same thing. And in fact, by symmetry, that reasoning is the same as before, where now we're just saying that what's happening at detector B is somehow bent towards the measurement direction a. And so really, it's the same kind of nonsense.

And then also, philosophically, we might expect there to be some symmetry here. So if we wanted an idea like this to work, maybe we should actually bend a to a′ and b to b′, where a′ is bent towards b and b′ is bent towards a, in an equal and opposite kind of way. But in that case, both detectors know something about how the other detector is configured. And so fundamentally it's exactly the same problem, no matter how you look at it.

So, reflecting on part three, we've seen some specific examples of how hidden variable models don't really work. They just don't match the experimental data, whereas quantum mechanics does. And so what follows in part four is going to be very abstract, very mathematical, very algebraic, and we're going to take our time with it, because it's a whole lot of equations and symbols and all that. But if you followed along with part three, then you already have the fundamental insight required to make sense of part four. All we're doing in part four is generalizing from this specific example to show, first, that every local hidden variable model is going to have a correlation function with nonzero slope at its minimum value, which is in contradiction with quantum mechanics and the experimental data. And then second, in part 4B, we're going to show that in general, the correlation function given by a local hidden variable model cannot take on the same value as the correlation given by quantum mechanics and experiment at every theta point, that is, for every possible configuration of the measurement axes a and b. And so it's the same kind of reasoning that we've seen in part three, but just in a much more abstract and generic kind of way. And the abstraction is worth it. Even though it is somewhat impenetrable, and it takes a lot of time to digest, it's going to be a very powerful result. And so, as usual, ask not for easier equations, but for stronger coffee. You've got to prepare yourself for this, because it's going to be a bit of work, but it is well worth the effort.

All right, my friends. We're now ready to approach the core argument of Bell's paper: part four, contradiction.

Okay, so in the first part of part four, we're going to show that the correlation function that we get from a local hidden variable model cannot be stationary at its minimum value when theta equals 0, unlike the quantum correlation, which is stationary, that is, does have zero slope at its minimum value for theta equals 0. And so this is going to be a generic difference between the kinds of correlations that local hidden variable models can give us and the correlation that we expect from quantum mechanics, which is also the correlation measured in experiments.

All right: the main result will now be proved. Because rho is a normalized probability distribution, the integral of rho d-lambda equals 1. And we saw that before. That just means that if you consider every possible configuration of hidden variables and add them all up, each one weighted by its probability, then the result is going to be one. In other words, the hidden variables have to be in some configuration or other. And next, because of the properties of equation 1, where we saw that the measurement outcomes at detectors A and B can only take on the values of +1 or −1, depending on whether that detector measured spin up or spin down respectively, then if we consider the definition of our local hidden variable correlation function in equation 2, where we found that P is going to be the integral, over all possible configurations of the hidden variables, of the measurement result at A times the measurement result at B, and this correlation is going to be a function of the measurement axes a and b, then, as you can see, this correlation P cannot be less than −1. That is, the lowest value our correlation can take is a perfectly anti-correlated value of negative 1.
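Here's a tiny discrete toy (my own illustration, replacing the integral over lambda with a weighted sum over a handful of lambda values) showing why P can never dip below −1, and that the bound is saturated exactly when B = −A everywhere:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy hidden-variable space: 10 discrete values of lambda with random weights
rho = rng.random(10)
rho /= rho.sum()                  # normalized: the "integral" of rho equals 1
A = rng.choice([-1, 1], size=10)  # outcome at detector A for each lambda
B = rng.choice([-1, 1], size=10)  # outcome at detector B for each lambda

P = np.sum(rho * A * B)           # equation (2) as a discrete sum
print(P)                          # always between -1 and +1, since |A*B| = 1

# perfect anti-correlation is reached only when B = -A for every lambda
print(np.sum(rho * A * -A))       # -1, up to floating point
```

Since each term contributes rho(lambda) times ±1 and the weights sum to one, the sum is squeezed into [−1, 1]; forcing it to sit at −1 is what pins down B = −A except on a set of zero probability.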
value of negative 1. And when can it take on that value?
And when can it take on that value? Well, as we've seen, the correlation
Well, as we've seen, the correlation function can only reach -1 at a equals
function can only reach -1 at a equals b. That is when the two measurements are
b. That is when the two measurements are aligned along the same axis. Then for
aligned along the same axis. Then for the singlet state, you're going to have
the singlet state, you're going to have perfectly anti-correlated results.
perfectly anti-correlated results. Measure spin up at detector A along the
Measure spin up at detector A along the axis A. And for sure you know you're
axis A. And for sure you know you're going to measure spin down at detector B
going to measure spin down at detector B for an axis B which is equal to A. So
for an axis B which is equal to A. So we've seen that before. That's nothing
we've seen that before. That's nothing new. And now Belle makes a technically
new. And now Belle makes a technically nuanced comment which is that this is
nuanced comment which is that this is only the case if A as a function of A
only the case if A as a function of A and lambda is equal to B as a function
of A and lambda except at a set of points lambda of zero probability. Now this is a technical caveat that is designed to keep this argument fully generic. We know from experiments that for the singlet state, it is going to be true that the measurement result at A for measurement axis A is indeed going to be equal to the negative of the measurement result at B for measurement along the same axis A. But because we're trying to rule out the possibility of all imaginable hidden variable models, you could in theory imagine a model where A as a function of A and lambda is not necessarily equal to the negative of B as a function of A and lambda, but where you have some superfluous configurations of hidden variables. And that's technically fine as long as those configurations of hidden variables have zero probability. So this is a really minor point, and honestly it probably goes without saying, because we know from the experimental data that for sure the result at detector A is going to be the negative of the result at detector B when you're measuring along the same axis. So you can think of that
the same axis. So you can think of that as an experimental boundary condition.
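For reference, here is that condition as I read equation 13 in the 1964 paper (my transcription, writing the detector settings as vectors and the hidden variables as lambda):

```latex
% Bell's equation (13): perfect anticorrelation for a common setting,
% required to hold except on a set of lambda of zero probability
A(\vec{b}, \lambda) = -B(\vec{b}, \lambda)
```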
And if any local hidden variable model disagrees with that, that is, if you have a local hidden variable model that goes against equation 13, well, that can only match the experiment if the lambda which violate equation 13 have zero probability of occurring. Anyway, I think the paper probably could have gone without that little comment about a set of points lambda of zero probability, but it's in there just for the reader who's going to be very pedantic about that. So, all right then. If we assume equation 13, which is really less of an assumption and more of an experimental boundary condition, then equation 2, the correlation for a local hidden variable model, can be written as P as a function of A and B is equal to the negative of the integral over all possible configurations of hidden variables of the result at detector A as a function of A and lambda times the hypothetical result at detector A as a function of B and lambda. Now let's linger on that for a second. What is this term, A as a function of B and lambda? Well, what that is is, imagine a generic case where we have our detectors A and B, and A is aligned with some axis A, and the alignment of detector B is some axis B. Well, we know that our correlation is going to depend on the product of the measurement results at detectors A and B. And all equation 14 says is that the result at detector B can be thought of as the negative of the result that detector A would measure if A were aligned along the B axis. And so you see, the only difference between equation 14 and equation 2 is that the measurement result at B aligned along the axis B, as a function of our hidden variables lambda, has been replaced with what would have been the result of the measurement at A if A were aligned along the same axis B, and we had the same hidden variables lambda. So this is just a way
of writing our correlation in terms of measurement results at detector A.
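In symbols, the rewritten correlation (equation 14 of the paper, in my notation) is:

```latex
% Equation (14): the correlation with B eliminated via A(b,lambda) = -B(b,lambda)
P(\vec{a}, \vec{b}) = -\int d\lambda \,\rho(\lambda)\, A(\vec{a}, \lambda)\, A(\vec{b}, \lambda)
```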
All right. And next, what we're going to do is we're going to let C be another unit vector which is an alternative option for B. So imagine C as the alignment of detector B. In fact, at first imagine that C is the same thing as B, and then give it just a little nudge so that C is just a little different than B. And then the question we can ask is, if you imagine two hypothetical scenarios, one where you had the measurement axes A and B, and another where you had the measurement axes A and C, where C is just a little nudge away from B, then how do we calculate the difference in the correlations P of A and B and P of A and C? In other words, what kind of difference in the correlation do we get when we apply a small little nudge on the axis of detector B? Well, all we have to do is replace P with the integral formula given by equation 14. And we can go ahead and smush these together into one integral. And we see that we have the negative integral over all possibilities for the hidden variables of A as a function of A and lambda times A as a function of B and lambda, minus A as a function of A and lambda times A as a function of C and lambda, which is how the correlation function would change if we slightly changed the measurement axis at detector B from the vector B to the very similar vector C. So now Bell goes on to algebraically massage this integral
expression into a different form shown here.
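Written out, the two forms of that integral are (my rendering of the step; the key fact used is that A(b, lambda) squared equals 1):

```latex
% Difference of correlations for settings (a,b) versus (a,c):
P(\vec{a},\vec{b}) - P(\vec{a},\vec{c})
  = -\int d\lambda\,\rho(\lambda)\,
     \bigl[ A(\vec{a},\lambda)A(\vec{b},\lambda) - A(\vec{a},\lambda)A(\vec{c},\lambda) \bigr]
% Factoring out A(a,lambda)A(b,lambda) and using A(b,lambda)^2 = 1:
  = \int d\lambda\,\rho(\lambda)\,
     A(\vec{a},\lambda)A(\vec{b},\lambda)
     \bigl[ A(\vec{b},\lambda)A(\vec{c},\lambda) - 1 \bigr]
```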
And to see what he's done here, let's go ahead and color code this like so. So first of all, you see that both parts of the integrand have in common this factor of A as a function of A and lambda. So we can go ahead and factor that out and pull that to the left. And the next thing you want to look at is, in the top expression there, we have that factor of A of B and lambda. And we also have a minus sign. So now, what we're going to do to bring that into the bottom expression is we're going to factor out that term A of B and lambda. So we're going to bring that to the left. And then what remains is just the number one. But then we're going to go ahead and pull in that minus sign from the outside of the integral to the inside. And so that term is just going to be a -1 inside of the brackets from which we factored out A of B and lambda. And then the final thing that we have to prove is that, in that top expression, the term on the right involving the A of C and lambda can be brought down below and turned into this expression A of B and lambda times A of C and lambda. And to show that this is in fact a legitimate move, first of all, in the top equation, notice how we have two minus signs. And so those are going to cancel each other out. And then the only question that remains is: is the product of these two purple expressions times A of C and lambda equal to A of C and lambda? Well, yeah, it is. The reason being, that purple expression is the square of A of B and lambda. But remember, this capital A, this is the measurement result at detector A. And the only values it can take on are either plus or minus one. But in either case, the square of plus or minus one equals one. And so yeah, the purple expression then collapses onto the number one. And we see that this was in fact a legitimate move, the way we've factored things out here. So what we end up with is the same
integral we had before, but just massaged into a different form.
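As a quick sanity check on that factoring step, here's a small Python sketch (my own, not from the paper) that verifies the identity -(AaAb - AaAc) = AaAb(AbAc - 1) for every configuration of a toy hidden-variable model in which each measurement result is plus or minus one:

```python
import random

random.seed(0)

# Toy hidden-variable model: lambda ranges over 1000 discrete values, and
# the measurement result A(axis, lam) at detector A is always +1 or -1.
lambdas = range(1000)
A = {(axis, lam): random.choice((-1, 1))
     for axis in ("a", "b", "c") for lam in lambdas}

for lam in lambdas:
    Aa, Ab, Ac = A[("a", lam)], A[("b", lam)], A[("c", lam)]
    # Integrand of P(a,b) - P(a,c) before the rearrangement...
    before = -(Aa * Ab - Aa * Ac)
    # ...and after factoring out Aa*Ab and using the fact that Ab**2 == 1.
    after = Aa * Ab * (Ab * Ac - 1)
    assert before == after

print("identity holds for every sampled lambda")
```

The assertion never fires, regardless of the random seed: since the two forms differ only by a factor of Ab squared, and Ab is always plus or minus one, they agree configuration by configuration, which is exactly why the rearranged integral equals the original one.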
All right. So now Bell is going to claim that this integral expression is less than another integral. So using equation 1, which is where we specified that the measurement outcomes at detectors A and B can only either be +1 or -1, then we can show that our integral expression is going to be less than or equal to the integral over rho of lambda d lambda of 1 minus A as a function of B and lambda times A of C
and lambda. Now, when I got to this part of the
Now, when I got to this part of the paper, I was looking at it and I was
paper, I was looking at it and I was like, "Uh, hm. Okay, [clears throat]