Bell's theorem demonstrates that quantum mechanics is fundamentally non-local, meaning that entangled particles can influence each other instantaneously regardless of distance, a concept that challenges classical notions of space, time, and causality.
Hey everyone. Today I have for you a
genuine glitch in reality that's going
to blow your mind and change the whole
way you think about everything. So it's
called Bell's theorem and this is one of
the most mysterious, unsettling,
magnificent results in all of
theoretical physics. So let's talk about it.
Bell's theorem demonstrates that quantum mechanics is weirdly non-local. That is, there's something going on in quantum physics that doesn't seem to be bothered by the limitations of space and time. Now, of course, much has been said about this, including in various popular science articles and videos and all that sort of thing. You often hear about quantum entanglement, spooky action at a distance, and that kind of stuff. There's often some crossover with sci-fi, about communication systems that work faster than light. And there's also this woo-woo connotation about consciousness. Those are all really fanciful notions, but in many cases what you hear about Bell's theorem and quantum entanglement is not well grounded in the actual physics and math of quantum mechanics.
And so I wanted to make a video where we really get into the technical details of what exactly Bell taught us about the nature of reality. I want to go through his famous, legendary 1964 paper, word for word and equation for equation. I want to really dive into it and explore with you exactly what his argument is and what it implies about the nature of reality.
I should point out, in case you don't know, that I recently made a video on the Einstein-Podolsky-Rosen paradox, which is definitely a prequel to this video. In fact, Bell's legendary 1964 paper is titled "On the Einstein Podolsky Rosen Paradox." So this is a follow-up to the argument that Einstein, Podolsky, and Rosen put forward back in 1935. They looked at quantum mechanics and said, "Hey, wait a minute. Something's wrong here. Something's paradoxical. Either quantum mechanics is super weird, or maybe it's just incomplete."
And so almost 30 years after that, John Stewart Bell thought about it really hard and was like, "You know what? Sorry, Einstein and friends. Quantum mechanics is not incomplete. Rather, it's just really weird and genuinely non-local, at least in some subtle ways." So that's the context in which Bell wrote this paper: it's a follow-up to the argument put forward by Einstein, Podolsky, and Rosen. Before watching this video, I do recommend watching my video on the EPR paradox. Or if you haven't seen that video but you're already familiar with the EPR paradox, that's cool, too. You don't have to get your info from me. I'm just one of many sources on this beautiful internet.
All right, then let's get into the paper. First of all, this paper is broken up into six parts. Part one is the introduction. Part two is the formulation, where we define our terms and think about what it is we're going to be thinking about. Part three is an illustration with some examples. Part four has the main argument of the paper, in which we find that if you try to explain quantum physics using a local hidden variable theory, you run into a contradiction. In part five, the ideas are generalized. And in part six, we have our conclusion. We're going to go through these six parts one at a time. In between them, I'm also going to include some animations, information, and equations that provide context. One thing you've got to know about this paper is that it is so cryptic and so dense, with lots of equations and very few words, that if you just try to read it, it's really hard. You really have to take your time with this one. So we're going to take our time, and I'll use related animations and equations to help us along and to fill in the gaps where the paper assumes the reader already has a certain picture in mind. Oh, and speaking of which, I've put a link to the PDF in the description below the video. I definitely recommend printing out this paper so that you have it for reference as we go through it. If you don't have a printer, that's fine, but then you should open it up on another screen or in another tab.
All right. So now it's time to get into the introduction of the paper. The paper begins: "The paradox of Einstein, Podolsky and Rosen was advanced as an argument that quantum mechanics could not be a complete theory but should be supplemented by additional variables." Remember, at the end of the EPR paper they said that quantum physics is incomplete and missing something. In their view, you have to add variables to quantum physics for it to provide a complete description of reality. "These additional variables were to restore to the theory causality and locality." That combination is often called local causality. It's just the idea that cause and effect should propagate such that an object is only affected by its immediate surroundings, as opposed to some kind of weird teleportation or spooky action at a distance. So Einstein and friends argued that you have to put some kind of additional variables into quantum mechanics in order to resolve the EPR paradox and give quantum mechanics local causality.
"In this note," that is, Bell's paper, "that idea will be formulated mathematically and shown to be incompatible with the statistical predictions of quantum mechanics." So that's what we're going to do today. We're going to mathematically explore the concept of additional hidden variables in quantum mechanics and show that local hidden variables don't work. It follows that quantum mechanics genuinely does exhibit non-local phenomena, which is crazy. That goes against everything we think we know about the nature of reality.
Anyway: "It is the requirement of locality, or more precisely that the result of a measurement on one system be unaffected by operations on a distant system with which it has interacted in the past, that creates the essential difficulty." So the hidden variable story doesn't work if you require the theory to be local. "There have been attempts to show that even without such a separability or locality requirement no 'hidden variable' interpretation of quantum mechanics is possible. These attempts have been examined elsewhere and found wanting." In other words, you actually can make a hidden variable interpretation of quantum mechanics work if you relax the constraint of locality. But then it's like, what's the point, right? "Moreover, a hidden variable interpretation of elementary quantum theory has been explicitly constructed." Here he's referring to Bohmian mechanics. "That particular interpretation has indeed a grossly non-local structure." Famously, Bohmian mechanics is a non-local theory. "This is characteristic, according to the result to be proved here, of any such theory which reproduces exactly the quantum mechanical predictions." So what we're going to show in this paper is this: suppose you want a theory that matches the quantum mechanical statistics, and you want it to involve hidden variables as advocated by Einstein, Podolsky, and Rosen. Then you're necessarily going to end up with a non-local theory. And of course, that non-locality is the same kind of dilemma you have to confront if you just take quantum mechanics at face value, since it does appear to be a non-local theory. So no matter how you look at it, there's some weird non-local stuff going on in quantum mechanics.
All right. Now, before going further, I want to say a few words about spin-1/2 particles, because spin-1/2 particles are the main characters of this paper. So it'll be helpful to review some of the main points about the experiments and theory of spin-1/2 particles.
On the experimental side, the most important and famous spin-1/2 experiment is definitely the Stern-Gerlach experiment. Here's how it works. Imagine that you have an oven, and inside the oven you put some silver. The oven is so hot that the silver atoms evaporate and fly around at very high speeds, and some of them fly out of a hole in the oven. Then suppose you have an apparatus called a collimator, so that we end up with a narrow beam of silver atoms flying in a particular direction. Also suppose this whole experiment happens in a vacuum, so that the silver atoms aren't bumping into air as they fly along. Now this beam of atoms is directed through a strong non-uniform magnetic field. And amazingly, that magnetic field splits the beam of atoms into two beams. What's going on with that? Why do we have two beams? How can it be that one beam of atoms goes in and two beams come out? Well, the key to understanding this is that a silver atom is electrically neutral. It's not ionized, so its 47 protons perfectly cancel out its 47 electrons. But if you look at the electrons in a silver atom, you find that all of them are paired up in their various orbitals except for a single unpaired electron in the 5s orbital.
For all of the paired electrons in the silver atom, their spins cancel each other out. But the unpaired 5s electron has spin 1/2, because an electron is a spin-1/2 particle. As a result, the whole silver atom behaves like an electrically neutral spin-1/2 particle. That unpaired electron's spin gives the whole atom a tiny magnetic moment; that is, it makes the silver atom like a tiny little magnet.
I should also say that the nucleus of the silver atom has a net spin of 1/2 as well. But because the nucleus is so much more massive than an electron, the magnetic effect of the nuclear spin is thousands of times smaller than the magnetic effect of the electron spin. So for all intents and purposes, it doesn't matter in this experiment.
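Here's a rough way to see the size of that gap. The natural unit of a particle's magnetic moment scales as charge over mass. So the nuclear magneton, which is built from the proton mass, is smaller than the Bohr magneton, which is built from the electron mass, by the proton-to-electron mass ratio. These are standard constants, not numbers taken from the video. Silver's actual nuclear moments are also only a fraction of a nuclear magneton, which widens the gap further.

$$
\mu_B = \frac{e\hbar}{2m_e}, \qquad \mu_N = \frac{e\hbar}{2m_p}, \qquad \frac{\mu_B}{\mu_N} = \frac{m_p}{m_e} \approx 1836
$$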
So what happens to the silver atoms as they fly through this apparatus? The initial beam is totally thermally random. I mean, you're talking about evaporated silver atoms, so there's no preferred direction for the spin. It's a random distribution over spin directions. But as they fly through the Stern-Gerlach magnet, for some reason the spins get projected onto either purely spin up or purely spin down. That's really weird, because the outcome isn't a distribution of some continuous quantity. No, it's quantized: either up or down. There are only two options, which is super weird, right? This is a very quantum effect. Now suppose we say these two states are separated by one quantum unit. Given the symmetry of the situation (both beams are deflected by equal amounts), we can then say that spin up is associated with a quantity of +1/2 and spin down with a quantity of -1/2. That way, the difference between +1/2 and -1/2 is one quantum unit. And that's why we call this a spin-1/2 particle.
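In conventional notation, the "quantum unit" here is the reduced Planck constant $\hbar$. The two beams correspond to the two allowed values of the spin component along the magnet's axis, which we can call $z$:

$$
S_z \in \left\{ +\tfrac{1}{2}\hbar,\; -\tfrac{1}{2}\hbar \right\}, \qquad \left(+\tfrac{1}{2}\hbar\right) - \left(-\tfrac{1}{2}\hbar\right) = \hbar
$$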
Okay. So we have two discrete beams, and clearly there's something weirdly quantum going on here. But what's really going on? Take the story I just told about spin 1/2, where the electron is like a little magnet and the beam separates out. What does that really mean? Physically, how should we imagine it? In a moment I'll tell you a little of the quantum theory. We'll also imagine some kind of speculative hidden variable theory and see that it doesn't really work. But for now I want to stick with the experimental side of things, so that we can learn a little more about how spin-1/2 particles actually behave.
So imagine we do a Stern-Gerlach experiment where a beam of silver atoms flies through the Stern-Gerlach magnet and splits into two beams, spin up and spin down. Now suppose we put up a wall so that all the spin-down atoms hit the wall and stop. The spin-up atoms fly right through and keep going, so now we have a beam of spin-up atoms. Then we line that beam up and pass it through another Stern-Gerlach magnet oriented along the same axis, the same direction in space. What happens is that in the second Stern-Gerlach magnet, we only see a spin-up beam. There's no spin down. And I guess that's not too surprising. It kind of makes sense: we start off with a random beam of silver atoms, we split it into spin up and spin down, then we remeasure and find, okay, there's only spin up. That's not too mind-blowing; it makes a lot of sense, right? And remember, all of this is happening in a vacuum chamber, so there are no air molecules for the silver atoms to bump into. If there were, we could imagine the beam rerandomizing as the silver atoms slam into air molecules and get reoriented. What this two-stage Stern-Gerlach experiment shows is that spin is a state that the atom is in, right? It's a property that persists with the atom and has some continuity across time. So it makes sense to say "this is a spin-up atom," at least for now. I mean, it can bump into something and change its spin. But supposing it doesn't, it can continue on in that spin-up state for some amount of time. So that's cool.
That gives us some sense of the physicality of spin. But we're still left with a mysterious question: why do we have two discrete options for a spin measurement in the first place, as opposed to some continuous range of outcomes? And how should we visualize a spin state? Again, we'll talk about the theory in just a moment, but there's one more experimental thing I want to show you before we get there. Now imagine slightly rotating the second magnet by some small angle $\theta$. Then a magical thing happens: the second beam still comes out mostly spin up, but now there's also a spin-down beam. It's very subtle. Of all the spin-up atoms flying through the second detector, most come out spin up, but every now and then one comes out spin down. So if you think about many atoms flying through as a continuous beam, picture a very bright spin-up beam and a dim but nonzero spin-down beam. The question then becomes: what is the probability of spin up versus spin down in this kind of experiment? Here there's very good agreement between quantum mechanics and experimental results. For the atoms passing through the second magnet, both show a $\cos^2(\theta/2)$ probability of being spin up and, likewise, a $\sin^2(\theta/2)$ probability of being spin down. Remember that $\cos^2 x + \sin^2 x = 1$, so those probabilities add up to one, or 100%. We're going to take this rule, $\cos^2(\theta/2)$ versus $\sin^2(\theta/2)$, as a ground truth for this video. In fact, we'll treat it as an absolute fact about reality, because it has been measured in many experiments and it is a pretty direct result of quantum theory.
Oh, and one thing I should say about this diagram: the second beam is still drawn horizontal even though I tilted the picture of the detector. In a real experiment like this, you'd want to realign the second beam so that it comes in parallel to the detector. There are ways of doing that without modifying the spin state of the particle. I just didn't show that in this diagram because I wanted to keep things simple. Actually, let me show you this. It's a much better diagram, and it comes from Wikipedia. Shout-out to Clara Kate Jones for making this beautiful diagram. It shows a two-stage Stern-Gerlach experiment. The particle beam comes in, and you get a 50-50 split between spin up and spin down, denoted Z+ and Z-, because we're measuring along the z-axis. Then we send the Z+ beam through the second detector. The second detector appears to be tilted, but it's actually just aligned with the direction in which the Z+ beam comes out of the first detector.
But now I want to look at something really cool: what if the second detector measures along a whole different axis? For example, suppose the second detector measures along the x-axis. Then the spin-up particle beam goes through the second detector and splits with 50/50 probability into spin left and spin right. By the way, instead of spin left and spin right, let's use the language "spin up along x" and "spin down along x." You see, when we say spin up and spin down, it's always with reference to a measurement axis, and spin up is the beam that goes up relative to that axis. So we can always use the words spin up and spin down. But in this experiment, you can also think of them as spin left and spin right when we're measuring along the x-axis.
I suppose this experiment is not too surprising either, because the particles come in spin up. We wouldn't really expect any probabilistic bias toward spin left or spin right. All we know is that the particles are all spin up, and up is perpendicular to left and right. So it'd be kind of weird if the second particle beam had some bias toward left or right, right? Where would that come from? We should still expect randomness along the x direction. Okay, so that doesn't really blow your mind, but this next one will. Imagine a three-stage experiment. The particle beam comes in, and the first detector splits it into spin up and spin down. We send only the spin-up beam through. Then the second detector measures along x, so we get spin left and spin right, or in other words spin up and spin down along x. Suppose we only allow the spin-up-along-x beam through, and then we measure again along the z-axis. And the craziest thing happens. Look what we get: a 50/50 particle beam of spin up and spin down along z. How can that be? The first magnet already filtered out all of the spin down along z. So shouldn't we expect only spin up along z in the outgoing beam? But no: in real experiments you get a 50/50 split along z. So what is going on there? That's very strange. And the reason it's so strange is that we know spin is a property of the atom. It's a physical thing that the atom carries with it as it moves along, right? We thought about this earlier and realized that the Stern-Gerlach experiment shows us that spin is a state the atom can be in, a property of the atom at some moment in time. So how can it be that we filtered out the spin-down-along-z atoms, and yet somehow we get spin down along z after the third detector? What's happening there? How can spin be a property the atom carries along with it if it comes back like that?
Like, what's going on? Now, what I'm showing here is just an experimental fact. This is the reality. And it's on us to figure out how to tell a story that makes sense of this reality. In just a moment I'm going to tell you the quantum story, which explains what's happening here. The long story short is this: when you measure the spin along some axis, the particle forgets its spin information along the other axes, because you're resetting its spin state. You're projecting it into a spin eigenstate of whatever axis you most recently measured it on. So once you measure spin up or spin down along x, an atom in the spin-up-along-x eigenstate has equal 50/50 odds of being measured spin up or spin down along z. But of course, when you learn quantum physics, you're always thinking, "This is so weird and so strange, and I don't like it. Surely there's some more classical explanation with some kind of hidden variable. Surely there's some secret behavior happening inside the atom or in these detectors. Maybe the detectors are modifying the atom in such a way as to flip it up or flip it down and reset its state." So when you learn quantum physics, you yearn for a more sane explanation.
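For the record, the projection story above can be checked with a small simulation. This is a minimal sketch, not material from the video or Bell's paper. It assumes the standard textbook spinor convention for measurement axes in the x-z plane, and the function names (`up`, `down`, `born`, `measure`, `three_stage`) are hypothetical. Each measurement draws an outcome with the Born-rule probability, then replaces the state with the eigenstate of the axis just measured. That replacement is the "forgetting" described above.

```python
import math
import random

# Spin states for a measurement axis in the x-z plane at angle theta from +z,
# written as two-component spinors in the z basis (standard textbook convention;
# real-valued for axes in this plane).
def up(theta):
    return (math.cos(theta / 2), math.sin(theta / 2))


def down(theta):
    return (math.sin(theta / 2), -math.cos(theta / 2))


def born(outcome, state):
    """Born rule |<outcome|state>|^2; real spinors, so the overlap is a dot product."""
    amp = outcome[0] * state[0] + outcome[1] * state[1]
    return amp * amp


def measure(state, theta, rng):
    """Measure spin along the axis at angle theta; return (+1 or -1, new state).

    The new state is the eigenstate of the axis just measured, so any information
    about other axes is discarded (the projection step)."""
    if rng.random() < born(up(theta), state):
        return +1, up(theta)
    return -1, down(theta)


Z_AXIS = 0.0
X_AXIS = math.pi / 2


def three_stage(n_atoms, seed=0):
    """z magnet (keep up) -> x magnet (keep up) -> z magnet; count final outcomes."""
    rng = random.Random(seed)
    counts = {+1: 0, -1: 0}
    for _ in range(n_atoms):
        state = up(Z_AXIS)                           # stage 1 let only spin up along z through
        result, state = measure(state, X_AXIS, rng)  # stage 2: measure along x
        if result != +1:
            continue                                 # block spin down along x
        result, state = measure(state, Z_AXIS, rng)  # stage 3: measure along z again
        counts[result] += 1
    return counts


if __name__ == "__main__":
    counts = three_stage(100_000)
    total = counts[+1] + counts[-1]
    print(f"final z measurement: up {counts[+1] / total:.3f}, down {counts[-1] / total:.3f}")
```

Every simulated atom starts out spin up along z, yet the final z split comes out close to 50/50. In this sketch, that comes entirely from the middle x measurement resetting the state.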
And especially, you know what would be really nice? If we didn't have all these weird quantum probabilities. Wouldn't it be cool to find some other explanation for what's going on in the Stern-Gerlach experiment? Rather than this confusing quantum story with wave functions and states, what if we could come up with a more classical, deterministic model of what's going on here? Such models don't work, but it's still very helpful to give it a try and see what we can come up with. Once we figure out the way in which the model fails, that'll help us appreciate why we need quantum mechanics, even though it's super weird. And seeing the failure of these local hidden variable models is going to segue very nicely into the core argument of Bell's paper. All right. So I want to return to the picture of the two-stage Stern-Gerlach experiment. We use the first magnet just to filter out the spin-down atoms and give us a beam of nice, pure spin-up atoms. Then we send those through a second detector tilted relative to the first by an angle $\theta$. As we talked about earlier, the probability of the atom being spin up in the second detector is $\cos^2(\theta/2)$. In this plot, we put the angle $\theta$ along the x-axis and the percentage probability that the atom will be spin up on the y-axis.
On the far left of this plot, you can see that we have a 100% chance of measuring spin up when the second detector is tilted 0° relative to the first, that is, when they're aligned. A spin up coming in is always a spin up going out. At the opposite extreme, imagine we put the second detector all the way upside down, tilted 180°. Relative to that orientation, the detector is going to say, "Hey, every particle is spin down." That's not too surprising, because all we're doing is flipping the second detector around, so what was defined as spin up is now spin down relative to the second detector. So we don't really have to think about angles all the way up to 180°, because the interesting stuff happens with a tilt angle between 0 and 90°. Beyond that point there's a symmetry: the probabilities at $180° - \theta$ are the same as those at $\theta$, just with spin up and spin down swapped. And speaking of 90°, if we tilted the second detector by 90°, an incoming spin-up atom would have a 50/50 chance of going out as either spin up or spin down.
Here's an animation that gives us a more dynamic picture of what's going on. Our incoming beam of silver atoms comes in from the left. The atoms go through the first detector, which splits them into spin up and spin down, and the spin-ups keep going. On the right, I'm showing a rectangle. It's kind of abstract, but all it indicates is that we're doing a spin measurement along the axis symbolized by the orientation of that rectangle. As the rectangle rotates back and forth, you can get a feel for how the relative probability of measuring spin up and spin down along that second axis changes as a function of the angle. At one extreme, when the detectors are aligned, spin up is always spin up. At the other, when the detector is at 90°, we get a 50/50 split. And in between, the probability follows this $\cos^2(\theta/2)$ curve.
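For reference, this curve follows from a standard spinor projection. The video doesn't spell out the convention, so the one below is the usual textbook choice. A magnet tilted by $\theta$ from the z-axis within the x-z plane has spin-up and spin-down states that, written in the z basis, look like this. The Born rule then gives the curve directly. Setting $\theta = 90^\circ$ gives the equal-weight x-axis state behind the 50/50 splits described earlier.

$$
|{\uparrow_\theta}\rangle = \cos\tfrac{\theta}{2}\,|{\uparrow_z}\rangle + \sin\tfrac{\theta}{2}\,|{\downarrow_z}\rangle,
\qquad
|{\downarrow_\theta}\rangle = \sin\tfrac{\theta}{2}\,|{\uparrow_z}\rangle - \cos\tfrac{\theta}{2}\,|{\downarrow_z}\rangle
$$

$$
P(\uparrow_\theta \mid \uparrow_z) = \big|\langle{\uparrow_\theta}|{\uparrow_z}\rangle\big|^2 = \cos^2\tfrac{\theta}{2},
\qquad
P(\downarrow_\theta \mid \uparrow_z) = \big|\langle{\downarrow_\theta}|{\uparrow_z}\rangle\big|^2 = \sin^2\tfrac{\theta}{2}
$$

$$
\theta = 90^\circ:\quad |{\uparrow_x}\rangle = \tfrac{1}{\sqrt{2}}\big(|{\uparrow_z}\rangle + |{\downarrow_z}\rangle\big),
\qquad
P(\uparrow_x \mid \uparrow_z) = P(\uparrow_z \mid \uparrow_x) = \tfrac{1}{2}
$$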
Now, this equation, $\cos^2(\theta/2)$, comes from the spinor math of projecting a spin state defined relative to one axis onto another axis. But all of that spinor math and projection is the weird quantum stuff we don't want to deal with if we don't have to. So when we're trying to come up with a hidden variable explanation, we want to think in terms of some quantity that we can attach to each particle. Maybe it's some kind of arrow that indicates a direction. One of the first things that comes to mind when you think about the Stern-Gerlach experiment is that each incoming atom might have some vector-like directional quantity associated with it. Then maybe the detector flips that vector up or down as the particle passes through.
Now, I'm not saying that's the case. I'm just saying it's something we might instinctively or intuitively think might be the case. So let's test our intuition against logic and reason and see if it actually holds up. What I'm showing here is an animation of atoms coming in, each with a yellow vector attached. The vector encodes some sort of orientational, direction-like thing that goes with the atom. For the sake of argument, we can say our incoming beam has a random distribution over those vector angles, because these are evaporated silver atoms and it's all thermally random. Then suppose we claim that what a Stern-Gerlach magnet does is flip that arrow either up or down. If it flips the arrow up, it sends the atom upward; if it flips it down, it sends the atom downward.
Well, at first glance, an explanation like this seems like it could possibly be what's going on. This is a model where the Stern-Gerlach magnet plays a really active role in aligning the particle a certain way. As for whether it flips up or down, we can say the rule is: if the vector is pointing even a little bit up, it goes up; if it's pointing even a little bit down, it goes down. What if it's pointing perfectly horizontally? Well, in reality nothing is perfectly horizontal. That happens with probability zero, and even if it did happen, it would be so rare you'd never notice.
You know, the cool thing about physics is that you can put an idea forward and really propose it: hey, maybe this is how it is. But one of the rules of physics is that you have to stick to whatever principles you propose. And if someone can show that your own principles lead to a contradiction, then sorry, you have to redesign your model. So what I want to show now is that this assumption, that the Stern-Gerlach magnet flips the atom up or down, is actually not consistent with the experimental data. The reason is very simple, and you can totally see it. Take a two-stage Stern-Gerlach experiment where the second detector is tilted. We know from the experimental data that some of the particles should come out spin down even though they went in spin up. But in this model, the first magnet has already flipped every surviving arrow to point exactly up. Now tilt the second detector anywhere between 0° and 89.9°. That arrow still has a positive component along the tilted axis. So by the rule that the magnet flips the particle whichever way it was already leaning, an incoming beam of spin up is always going to come out spin up.
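The failure of the flip model can be checked with a quick simulation. This is a hypothetical 2D sketch of the model exactly as described above. The arrow starts at a random angle, and each magnet sends the atom according to the sign of the arrow's component along the magnet's axis, then flips the arrow onto that axis. The function names are made up for illustration. The quantum column is just the $\cos^2(\theta/2)$ rule taken as ground truth earlier.

```python
import math
import random


def flip_model_p_up(theta, n_atoms=10_000, seed=0):
    """Hypothetical 'flip' hidden-variable model in 2D.

    Each atom carries an arrow at a uniformly random angle, measured from the first
    magnet's axis p. A magnet sends the atom up if the arrow has a positive component
    along the magnet's axis, and it also flips the arrow to point exactly along that
    axis (or against it, for the down beam). The second axis is tilted by theta."""
    rng = random.Random(seed)
    ups = passed = 0
    for _ in range(n_atoms):
        lam = rng.uniform(0.0, 2.0 * math.pi)
        if math.cos(lam) <= 0.0:
            continue                      # spin down at the first magnet: blocked
        lam = 0.0                         # first magnet flips the arrow onto +p
        passed += 1
        if math.cos(lam - theta) > 0.0:   # positive component along the tilted axis
            ups += 1
    return ups / passed


def quantum_p_up(theta):
    """The cos^2(theta/2) rule taken as ground truth in the text."""
    return math.cos(theta / 2) ** 2


if __name__ == "__main__":
    for deg in (0, 10, 45, 80, 100, 180):
        t = math.radians(deg)
        print(f"{deg:3d} deg  flip model {flip_model_p_up(t):.3f}  quantum {quantum_p_up(t):.3f}")
```

For every tilt between 0° and 90° (exclusive), the flip model reports exactly 1.000, while the quantum rule drops below 1 as soon as $\theta > 0$. Past 90°, the flip model jumps straight to 0.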
So right there you see that this model doesn't work, by the very principle we put forward about these arrows getting flipped up or down. It just doesn't match the two-stage Stern-Gerlach experiment. So whatever is going on with spin, it's not that. It's something else. So what do we do? Well, just because our model didn't work doesn't mean we can't massage it into something that might work.
So let's see if we can massage our model into something that matches the experimental data at least better than our first attempt. That attempt matched the data with one Stern-Gerlach magnet but failed miserably when we had two and the second one was tilted. Okay, so what if we did this? Let's say that a Stern-Gerlach magnet doesn't actually flip the particle up or down. If it did, then as we've seen, the second detector would just give us a bunch of spin ups and no spin downs. Instead, let's say the Stern-Gerlach magnet just passively sorts the particles based on whether their vector points a little bit up or a little bit down. Any atom whose vector points even a little bit up gets sent to the up beam, and any atom whose vector points even a little bit down goes in the down beam. But the Stern-Gerlach magnet doesn't change the direction of that vector.
Maybe this vector represents a kind of classical spin axis. Then in this model, the angular momentum of the particle would be conserved as it passes through the detector. Somehow, for some reason, the detector just sorts the incoming particles into two beams depending on whether they point a little bit up or a little bit down. There's a problem with this model, though. Philosophically, it's starting to feel a bit contrived, because it's hard to reconcile the two discrete beams we see with such a passive process at the detector.
At least before, when we thought that maybe the magnet just flips the thing up or down, we had a naturally dichotomous physical situation. Yes, it's a sort, but it's also an action where the particles really are separated out in a binary way. With a more passive situation where it's just a sort, you have to wonder how we end up with two sharp beams. But never mind all that. Even though it seems implausible, that's different from it being illogical, impossible, or incoherent. Nature is weird, so maybe this is how it is. Now, if we take this model and pass it through a second Stern-Gerlach magnet, the question comes up: does this model match the data? In particular, do we find a $\cos^2(\theta/2)$ probability of an incoming spin up remaining spin up, versus a $\sin^2(\theta/2)$ probability of it going spin down? If you just look at the animation shown here, at first glance it does seem to work. When the second detector is not tilted at all, anything coming in spin up goes out spin up. So that's good: at $\theta = 0$, this model matches experiment.
Now imagine 90°. There it's 50/50. Each atom in the incoming spin-up beam has a vector that points up at least a little bit, but the distribution is totally random as far as left and right are concerned. So when the detector is tilted 90°, each atom could go either way. So there again we find another angle at which our model matches the data. Another wonderful thing about this model is that at intermediate angles it also seems to fit the data. If you tilt the detector by, say, 45°, you can see there's some chance of spin down versus spin up. So at first, this feels very exciting and very promising.
But when you think through it carefully,
you realize that this model actually
doesn't quite match the
cosine squared statistics that we get
from the experiment and from quantum
physics because instead of a cosine
squared function, it's actually just a
linear function in theta. And that's
actually a very important point. So I
want to linger on that for a moment and
I want to see exactly why this model
gives us a probability which is linear
in theta. So you think about the fact
that we have evaporated silver atoms
coming in and presumably they're all
going to be randomly oriented. And so if
we want to come up with a picture that
involves this hidden variable of an
orientational vector-like degree of
freedom, call it lambda, then the
situation we're describing here begins
with lambda vectors chosen totally at
random as far as their direction is
concerned. And if you like, you can
imagine lambda as being selected
uniformly from the unit circle, or if
you want to be fully three-dimensional,
the unit sphere. Although, as we're
about to see, it actually really doesn't
matter whether we think about it in
terms of a two-dimensional situation or
a three-dimensional situation. In either
case, we find the same linear trend. All
right, then. So, the particle passes
through the first Stern-Gerlach magnet and
all of these vectors lambda that were
pointing a little bit downwards get
filtered out. They go in the spin down
beam and we block that. But if the
vector is pointing even a little bit up,
then it keeps passing through, and then
it moves on to the next Stern-Gerlach detector.
So let's go ahead and use the vector P
to symbolize the polarization vector,
that is, the axis of measurement for the
first Stern-Gerlach magnet. You see here,
based on the diagram, that all of the
particles that have made it through our
filter are all going to be measured spin
up if they're measured again perfectly
along the direction P with no tilt angle.
And so that's what it means
experimentally to prepare some spin-1/2
particles, some fermions, with the spin
polarization along the vector P. It
means that for sure we know if we
measure the spin along P we're going to
get spin up.
Now then what can we say about that
hidden variable vector lambda? Well, we
can say that the particles that are
allowed through necessarily have lambda
which is somewhere in the northern
hemisphere, that is, the hemisphere that
points in the same kind of direction as
the polarization vector P. Or in other
words, these are the lambda such that
$\vec\lambda\cdot\vec p > 0$. And the
lambda are still going to be uniformly
distributed around that hemisphere
because they came in uniformly
distributed around the sphere and we've
just cut it in half. So now we want to
ask the question of what is the
probability of a particle with some
lambda vector being measured spin up in
the second detector which would happen
in our local hidden variable model if
$\vec\lambda\cdot\vec a > 0$, that is,
if the lambda vector happens to be
pointing in the same hemisphere as the
measurement axis a. And when you think
about it, you realize that the
probability of lambda measuring spin up
depends on the overlap of the lambda
hemisphere and the a hemisphere.
See, cuz if we draw a and then we think
about the hemisphere of vectors that
point in kind of the same direction as a,
that is, the vectors lambda for which $\vec\lambda\cdot\vec a$ is
positive, you realize that the set of
all lambdas which are going to be
measured spin up is precisely the
overlap between the lambda hemisphere
and the a hemisphere. And given that
lambda is going to have a uniform
probability distribution, we can see
then that the probability of measuring
spin up is just going to be the fraction
of the lambda hemisphere that overlaps
with A. And the probability of it
measuring spin down is going to be the
fraction of lambda's hemisphere that
does not overlap with A. And if you see
that, then you see one of the core
concepts of Bell's paper. We're going to
describe this slightly differently in a
moment when we get into the paper and
it's going to be a little bit more
complicated, but this right here is a
very fundamental insight. Imagining
rotating hemispheres and seeing how the
overlap varies linearly. That is a
mental image that you want to keep in
mind as we get into parts three and four
of the paper. All right, then. So, just
to be really formal about this, let's go
ahead and say that theta is the tilt
angle between our polarization vector P
and our measurement axis vector A. And
then I want you to go ahead and imagine
rotating theta from 0 to pi, or 180° if
you want to talk in terms of degrees.
Well, when you start off with theta
equals 0, p and a are aligned the same
way. And there's a complete overlap
between the lambda hemisphere and the a
hemisphere. And so you have a 100%
chance, guaranteed chance that when
theta is zero, you're going to measure
the particle spin up. But now imagine
theta growing and growing until theta
equals 90°, or π/2 radians. Well,
at that point you're going to have a
50/50 overlap between the lambda
hemisphere and the a hemisphere. And so
then you're going to have a 50/50 chance
of measuring spin up versus spin down.
And then if you go ahead and flip it all
the way around to 180°, so that A and P are
perfectly antiparallel, then it'll be
guaranteed that you'll measure spin down
for a theta of 180°. Bearing in mind
that spin down is relative to that
upside down vector a. Now these three
points for which theta is 0, theta is 90
and theta is 180° all of those actually
do match the experimental data and
quantum mechanics. So that's all good.
But what's not all good is that linear
dependence on the probability of
measuring spin up as a function of the
angle theta. And you can see that linear
dependence just based on the way the
area fraction changes as you slide theta
around and you change the overlap
between these two hemispheres.
You know, one way to think about the
probability logic here is just imagine
you're playing one of those board games
that has the spinner thing and you spin
the thing and then the probability that
it lands on some wedge is just going to
be the wedge area. Well, yeah. So when
you think about that kind of logic and
then you think about the wedge area of
the overlap between the hemispheres and
the way it changes you can see that the
probability is indeed linear in theta.
But now that linearity is actually a
real problem because from experiments
and from quantum mechanics we can very
confidently say that the probability of
measuring the particle spin up is not
linear in the tilt angle theta but
rather $\cos^2(\theta/2)$. And that curvy
cosine-squared fact makes our linear model very
hard to believe, because the math is
wrong: the statistical predictions of
our model are not the true statistics of
the situation.
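A quick way to see the linear trend, and how it misses the cosine-squared curve, is a Monte Carlo run of the hemisphere model. This is only a sketch under my own assumptions: P points along z, A is tilted by theta in the xz-plane, and lambda is drawn uniformly from the sphere and kept only if it lands in P's hemisphere (i.e., it survives the first magnet). The function names are hypothetical.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform random direction on the unit sphere (normalized Gaussian sample)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return [c / n for c in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def model_p_up(theta, kept_target=50_000, seed=1):
    """Hemisphere model: fraction of surviving lambdas with lambda.a > 0."""
    rng = random.Random(seed)
    p = [0.0, 0.0, 1.0]                              # polarization axis P (assumed along z)
    a = [math.sin(theta), 0.0, math.cos(theta)]      # measurement axis A, tilted by theta
    kept = up = 0
    while kept < kept_target:
        lam = random_unit_vector(rng)
        if dot(lam, p) <= 0:                         # spin-down beam: blocked after magnet 1
            continue
        kept += 1
        if dot(lam, a) > 0:                          # second magnet reads spin up
            up += 1
    return up / kept


for deg in (0, 45, 90, 135, 180):
    th = math.radians(deg)
    print(f"{deg:3d} deg  model ~ {model_p_up(th):.3f}   "
          f"1 - theta/pi = {1 - th / math.pi:.3f}   "
          f"cos^2(theta/2) = {math.cos(th / 2) ** 2:.3f}")
```

The model column tracks 1 - theta/pi up to sampling noise rather than cos²(θ/2); the two curves agree only at 0°, 90°, and 180°.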
So what do we do? Do we just give up? Well,
we actually should give up, because, as
we'll see, the whole
paper is about how local hidden variable
models don't work. But let's not give up
yet. Let's be very stubborn, okay?
Because technically there is a way that
we can fix this particular model for
this particular situation.
And the way in which we do that is going
to involve a concept which we'll see
later on in the paper. So we're going to
try to save this model somehow. And the
way that we're going to try to do that
is going to be illustrative and teach us
something about the situation. Even
though ultimately this fix is going to
break down when we later on start
looking at quantum entanglement.
All right, then. So the way to fix the
model is to define an effective
measurement axis, call it a prime, and
define it as the measurement axis A
tilted towards the polarization vector P
such that the equation
$$1 - \frac{2\theta'}{\pi} = \cos\theta$$
is satisfied. Now
here by theta prime I mean the tilt
angle between the polarization vector P
and the effective measurement axis a
prime, which has been magically tilted in
towards the polarization vector P. And
when you look at the left-hand side of
this equation, $1 - 2\theta'/\pi$, that is
linear in theta prime. And then you look on
the right-hand side and that's a cosine.
Now this equation here, it's not
immediately obvious what this has to do
with $\cos^2(\theta/2)$. In just a minute,
though, we're going to talk about
expectation values and cosine of theta.
And then when we come back to this
equation later on in the paper, it'll
make more sense why exactly it has the
form that it does. But I don't want to
get into that just now because it's a
bit of a tangent. For now, all I want to
say is that this equation involving
theta prime and theta is going to take
the probability dependence of our
model, which is linear in theta, and
warp it into the $\cos^2(\theta/2)$
curve that we expect from quantum
mechanics. And in fact, that requirement
is exactly how this theta prime and
theta equation is defined. So this trick
is actually a lot simpler than it seems,
because when you think about what we
have here, as we've seen, our model
works when theta is 0°, when theta is
90°, and when theta is 180°, but it breaks
down in between because we have a line
instead of a cosine squared. And so all
this trick is doing is saying that we can
go ahead and warp that line into that
cosine squared curve simply by saying
that the effective measurement axis that
the particle is actually being measured
along is not the A that we thought it
was, but is actually this A tilted
slightly towards the polarization vector
P. And by doing that, we can go ahead and
bend the statistical predictions of our
model in such a way as to make it match
the experimental data and also quantum
mechanics.
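Here is a small numerical check of the sketchy move. It is a sketch that just solves $1 - 2\theta'/\pi = \cos\theta$ for theta prime, giving $\theta' = \tfrac{\pi}{2}(1 - \cos\theta)$, and feeds theta prime into the linear hemisphere model $P_{\text{up}} = 1 - \theta'/\pi$. Algebraically that gives $1 - \tfrac{1}{2}(1 - \cos\theta) = \tfrac{1}{2}(1 + \cos\theta) = \cos^2(\theta/2)$, so the warped model reproduces the quantum curve exactly. The function names are hypothetical.

```python
import math


def effective_angle(theta):
    """Solve 1 - 2*theta_prime/pi = cos(theta) for theta_prime (the 'sketchy move')."""
    return (math.pi / 2) * (1 - math.cos(theta))


def linear_model_p_up(tilt):
    """Hemisphere-overlap model: P(spin up) falls off linearly with the tilt angle."""
    return 1 - tilt / math.pi


for deg in (0, 30, 60, 90, 120, 150, 180):
    th = math.radians(deg)
    th_eff = effective_angle(th)
    print(f"theta = {deg:3d} deg   theta' = {math.degrees(th_eff):6.1f} deg   "
          f"model P(up) = {linear_model_p_up(th_eff):.4f}   "
          f"cos^2(theta/2) = {math.cos(th / 2) ** 2:.4f}")
```

Taken literally, the solved theta prime is smaller than theta for tilts below 90°, so a prime sits closer to P there, but it comes out slightly larger than theta between 90° and 180° (for example, 135° at theta = 120°); the two angles coincide at 0°, 90°, and 180°.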
Now, the first time you hear this, I
mean, you should be thinking, "Rich,
come on now. What? This is absurd. We
should not tolerate this. We should not
go along with this." Your eyebrow should
raise skeptically to the point where
your forehead starts to get sore. Like,
there's just no credible way to justify
this move, this little trick that we're
doing. And so, for that reason, I want
to go ahead and call this the sketchy
move. I know it's kind of a playful
terminology, but there's a couple of
good reasons why we want to call it
this. First of all, it's a concept that
we're going to see a couple more times
throughout the paper. And then secondly,
I want to emphasize that this move is
not illegal. It's not logically
impossible. Technically, it doesn't
violate locality. There's nothing
physically impossible going on when we
put forward this model. But it's
extremely sketchy and hard to believe
because it raises so many questions. Why
should the effective measurement axis be
a prime? And also, how is it then that
we have the polarization vector and also
our hidden variable lambda vector that
we both have to take into account?
Because the polarization vector bends
the effective measurement axis. Then we
also have this lambda vector and what's
going on there? And our whole model
starts to become complicated and
contrived and very very hard to believe.
But we're not going to dismiss it just
yet, because later, when we think about
quantum entanglement, we're going to
prove that even the sketchy move is no
longer enough to save our model or any
local hidden variable model. And that's
really at the heart of Bell's theorem.
So in summary, by going along with the
sketchy move for now, we're being
maximally open-minded, we're giving the
local hidden variable perspective every
benefit of the doubt. So that later on
when we absolutely destroy local hidden
variables, when we crush this idea,
we'll say, "Look, we even allowed the
sketchy move and that still wasn't
enough to make it work."
Now, I want to take just a moment to
talk about the kind of mathematical
vocabulary we use in quantum physics
when we're describing measuring the spin
of a spin 1/2 particle along some
direction, call it a. And to do that, you
often see this expression $\vec\sigma\cdot\vec a$. Let
me tell you what that is. So we have the
famous Pauli matrices, which are
$$\sigma_x = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},\quad \sigma_y = \begin{pmatrix}0 & -i\\ i & 0\end{pmatrix},\quad \sigma_z = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}.$$
And you can find the definition of these
Pauli matrices in Griffiths' Introduction to
Elementary Particles, equation 4.26.
Although honestly, if you just Google
"Pauli matrices," you'll find them all over
the place. They're super famous. And
these Pauli matrices are generators of su(2),
the Lie algebra of SU(2), which is the
group that has to do with
transformations of two-component
spinors. SU(2) is the special unitary group
of degree 2. Anyway, today we don't need
to get into the group theory of SU(2), but
I just bring up the Pauli matrices in a
sort of vocabulary-like context. Like,
we're not actually going to have to
explore their mathematical properties,
but I just want to show you why it is
that these matrices are associated with
measuring the spin of a spin-1/2 particle.
You often see sigma with an arrow over
it, $\vec\sigma$, and you can think of that as a
vector whose components are the three
Pauli matrices. So you have $\sigma_x$,
$\sigma_y$, $\sigma_z$ all packaged into this
vector-like quantity. And with that
sigma vector, we can go ahead and define
the spin operator along the unit vector
$\vec a$ as
$$\hat S_a = \frac{\hbar}{2}\,\vec\sigma\cdot\vec a.$$
And what we mean by $\vec\sigma\cdot\vec a$ is that we're
going to multiply each of the components
of our measurement direction $\vec a$ by
the corresponding Pauli matrix. So
we have
$$\vec\sigma\cdot\vec a = a_x\sigma_x + a_y\sigma_y + a_z\sigma_z.$$
So when you
pick out a particular direction in
three-dimensional space and you want to
measure the spin of a particle along
that direction, the components of that
direction's unit vector are like weights
of how much of each of the Pauli matrices
we're going to bake into our spin
operator along that direction.
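To make $\vec\sigma\cdot\vec a$ concrete, here is a dependency-free sketch (plain Python complex numbers rather than a linear-algebra library, which is my choice) that builds the Pauli matrices, forms $a_x\sigma_x + a_y\sigma_y + a_z\sigma_z$, and checks that its eigenvalues are +1 and -1 for a unit vector. The example direction and helper names are hypothetical.

```python
import cmath
import math

# Pauli matrices as 2x2 nested lists (row-major)
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]


def sigma_dot(a):
    """Return the 2x2 matrix a_x*sigma_x + a_y*sigma_y + a_z*sigma_z."""
    ax, ay, az = a
    return [[ax * SIGMA_X[i][j] + ay * SIGMA_Y[i][j] + az * SIGMA_Z[i][j]
             for j in range(2)] for i in range(2)]


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt((tr / 2) ** 2 - det)
    return tr / 2 + disc, tr / 2 - disc


# Hypothetical measurement direction: the unit vector along (1, 1, 1)
a = tuple(c / math.sqrt(3) for c in (1.0, 1.0, 1.0))
m = sigma_dot(a)
print(m)                    # structure: [[a_z, a_x - i a_y], [a_x + i a_y, -a_z]]
print(eigenvalues_2x2(m))   # approximately (+1, -1)
```

Because $\vec\sigma\cdot\vec a$ is traceless with determinant $-|\vec a|^2$, any unit vector gives eigenvalues ±1; multiplying by ħ/2 turns those into the ±ħ/2 of the spin operator.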
Now why do we care about a spin operator?
Well, as we talked about in the EPR
paper, when you have an observable
quantity like spin, the measured value of the
quantity is going to be an eigenvalue
of the operator, corresponding to one of its
eigenstates. So if we have a spin-1/2
particle and its state is represented by
the two-component spinor s, then the
spin operator acts on s as
$$\hat S_a\, s = \frac{\hbar}{2}\,(\vec\sigma\cdot\vec a)\, s.$$
And bear in mind, $\vec\sigma\cdot\vec a$ is
going to be a 2×2 matrix. In fact, if you
want to think about it in terms of the
Lie algebra su(2), that matrix is going
to live at the coordinates $(a_x, a_y, a_z)$
in the space spanned by the Pauli matrices
$\sigma_x$, $\sigma_y$, $\sigma_z$, which, up to a
factor of i, is su(2). If that makes
sense, great. If it doesn't, don't worry
about it. That's a level of group theory
that we don't have to get into today.
Instead, I want to give you a specific
example of what it means for a particle
to be an eigenstate of the spin operator.
So if a particle has definite spin, that
is, we've measured the spin and it's
either spin up or spin down along some
axis, then it is going to be an
eigenstate of the spin operator along that
axis. That's what the measurement does.
You measure the spin of a particle and
you're projecting its wave function onto
an eigenstate of the spin operator along
that axis. And so therefore s is going
to be a solution to the equation
$$\hat S_a\, s = \lambda\, s$$
for some real value lambda (here lambda
is just an eigenvalue, not the hidden
variable from before), which is going to be
the spin of the particle along that axis.
As a concrete example, let's suppose
we're measuring the spin of a particle
along the z-axis.
Well, in that case our direction vector
becomes $\vec a = (0, 0, 1)$, cuz the vector doesn't
point in x, it doesn't point in y, it
points entirely in z. And so therefore,
if we evaluate $\vec\sigma\cdot\vec a$,
we find that we have no $\sigma_x$, no $\sigma_y$,
and all $\sigma_z$. And so then our spin
operator along the z direction becomes
$$\hat S_z = \frac{\hbar}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}.$$
And so now if we want to solve for
the eigenstates of spin up and spin
down along z, all we have to do is solve
the equation
$$\frac{\hbar}{2}\,\sigma_z\, s = \lambda\, s$$
for some real eigenvalue lambda. This
eigenvector-eigenvalue equation has the
solutions $s = (1, 0)$ or $s = (0, 1)$,
with eigenvalues $+\hbar/2$
and $-\hbar/2$, respectively. And you
can verify that for yourself if you plug
these different options for s
and lambda into that eigenvector-eigenvalue
equation.
Oh, and one other thing I'll say is that
for these eigenvectors, you can go ahead
and slap the same complex phase factor onto
both components, and they remain eigenstates
with the same eigenvalue.
And in a moment, I'll show you a picture
which makes that point obvious. But for
now, I'll just leave that as a mathematical,
algebraic statement.
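The z-axis eigenvector claims, plus the remark about an overall phase factor, can be checked by direct matrix-vector multiplication. This is a minimal sketch assuming units where ħ = 1 (so the eigenvalues come out as ±1/2); the helper names and the particular phase angles are arbitrary choices for illustration.

```python
import cmath

HBAR = 1.0  # assumption: natural units, hbar = 1

# S_z = (hbar/2) * sigma_z
S_Z = [[HBAR / 2, 0], [0, -HBAR / 2]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-component spinor."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def is_eigenpair(m, v, eigval, tol=1e-12):
    """True if m @ v equals eigval * v component by component (within tol)."""
    mv = matvec(m, v)
    return all(abs(mv[k] - eigval * v[k]) < tol for k in range(2))


spin_up_z = [1, 0]
spin_down_z = [0, 1]
print(is_eigenpair(S_Z, spin_up_z, +HBAR / 2))    # True
print(is_eigenpair(S_Z, spin_down_z, -HBAR / 2))  # True

# Multiplying both components by the same phase e^{i*phi} keeps it an eigenstate
phase = cmath.exp(1j * 0.7)
print(is_eigenpair(S_Z, [phase * c for c in spin_up_z], +HBAR / 2))  # True
```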
Now, instead of the spin operator $\hat S_a$,
we may as well just talk in terms of
$\vec\sigma\cdot\vec a$, which conceptually is
exactly the same thing as $\hat S_a$. The only
difference is it's not scaled by that
factor of $\hbar/2$. And so therefore
this sigma operator has nice
dimensionless eigenvalues of plus or minus
one for spin up versus spin down. And so
the sentence "the particle was
measured spin up along the a axis" can be
said as "measuring $\vec\sigma\cdot\vec a$ yielded a
value of +1." Or, if you
want to say the particle was measured
spin down along the a axis, we can say
"$\vec\sigma\cdot\vec a$ yielded a value of -1."
Or if you want to say the particle was
measured spin up along the b axis, you
say "$\vec\sigma\cdot\vec b$ yielded a value of +1."
Right? So what we have
here is a very concise and mathematical
way of saying that a spin-1/2 particle
was measured along some axis and the
result of that measurement is simply the
eigenvalue +1 or -1.
So in Bell's paper, he's going to use
this a lot. And so that's why I wanted
to show you where $\vec\sigma\cdot\vec a$ comes from and
what it means. And we don't really have
to get too deep today into the theory of
SU(2) and spinors and Pauli
matrices and all that. So if you're not super
familiar with all of these algebraic
details, that's actually totally fine.
For the purpose of understanding Bell's
paper, you really just have to know, from
a vocabulary point of view, that $\vec\sigma\cdot\vec a$
means measuring the particle spin along
the a axis, and that the results are going
to be +1 or -1 depending on
whether it turns out to be spin up or
spin down, respectively.
Before we move on, I do want to give you
just a couple more examples of this
concept just to make the idea a little
bit more intuitive, a little bit more
familiar. So suppose we had measured
along the x direction instead of along
z. Well, then we find that the
spin operator along x is going to be
$\hat S_x = \frac{\hbar}{2}\sigma_x$. And when you think
about the solutions to
$\frac{\hbar}{2}\sigma_x\, s = \lambda\, s$, you find
the eigenstates
$$s = \frac{1}{\sqrt2}\begin{pmatrix}1\\ \pm 1\end{pmatrix}$$
corresponding to eigenvalues of
$\pm\hbar/2$. That is to say, we
find the same exact kind of situation as
before, when we measured along z, as far
as the eigenvalues go. You have two options,
spin up or spin down. The magnitude of
the observable is $\hbar/2$. But
now you have this spinor that's in a
different state. It's pointing in a
different direction. And by the way, the
$1/\sqrt2$, that's just a
normalization constant. And likewise, we
can repeat exactly the same procedure.
We can measure along y. We find that the
spin operator along the y direction is
$\hat S_y = \frac{\hbar}{2}\sigma_y$. You solve that
eigenvector-eigenvalue equation, and you find the
eigenstates
$$s = \frac{1}{\sqrt2}\begin{pmatrix}1\\ \pm i\end{pmatrix},$$
with the same old eigenvalues of $\pm\hbar/2$.
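The x and y eigenstates can be verified the same way. This sketch works with the dimensionless Pauli matrices, so the eigenvalues are ±1 (multiply by ħ/2 to get the spin operator's ±ħ/2); the helper names are hypothetical, and it also checks that each spinor is normalized by the 1/√2 factor.

```python
import math

R = 1 / math.sqrt(2)  # the 1/sqrt(2) normalization constant
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-component spinor."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def is_eigenpair(m, v, eigval, tol=1e-12):
    """True if m @ v equals eigval * v component by component (within tol)."""
    mv = matvec(m, v)
    return all(abs(mv[k] - eigval * v[k]) < tol for k in range(2))


def norm(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))


# (matrix, spinor, eigenvalue of sigma); S = (hbar/2) * sigma gives +/- hbar/2
cases = {
    "up along x":   (SIGMA_X, [R, R], +1),
    "down along x": (SIGMA_X, [R, -R], -1),
    "up along y":   (SIGMA_Y, [R, 1j * R], +1),
    "down along y": (SIGMA_Y, [R, -1j * R], -1),
}

for name, (sigma, s, ev) in cases.items():
    print(f"{name:12s}  eigenpair: {is_eigenpair(sigma, s, ev)}  norm: {norm(s):.6f}")
```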
And I know all of this feels very
abstract, but there is a visual story
that goes with this algebra. And I've
touched on it in my previous videos
about the mystery of spinors,
electromagnetism as a gauge theory, and
also deriving the Dirac equation, where
there's a way of drawing a two-component
spinor as a flag in three dimensions.
So for example, let's take the
eigenstate for a particle that's in a spin up
state relative to the z-axis, that is,
the spinor $(1, 0)$. Well, if we plot that
using this flag picture diagram and
we'll go ahead and slap on a time
evolution phase factor corresponding to
the energy of the particle, we see that
we have a flag that points straight up
along Z. And then the time evolution
phase factor, that is the rotation in
the complex plane, is going to twirl
that flag around.
If you're curious about the algebraic
machinery that's happening behind the
scenes, definitely check out the paper
"An Introduction to Spinors" by Andrew
M. Steane. That paper explains in depth how
exactly two-component spinors map
onto these flag diagrams. But now then,
if we plot the spin down along z spinor,
$(0, 1)$, you see, hey, it's a flag
that's pointing down along z. So that
makes sense. And now notice the time
evolution phase factor, which rotates the
flag in the complex plane, has the effect
of twirling the flag but in the opposite
way as before. Although really it's the
same way. It's just that the flag is
pointing in the opposite direction. The
way to see this is point your right
thumb along the direction that the flag
pole is pointing and then you find that
the phase factor is going to twirl the
flag in the same way that your fingers
go around on your right hand.
So we find in these spinors a picture
of some kind of quantity that
has an orientation and that kind of
spins around under a complex phase time
evolution. And so that gives you a feel
for some of the algebraic machinery
that's happening behind the scenes when
we talk about spinors and Pauli matrices
and all of that.
And so now I want you to
imagine in your mind: what would the
eigenstate of spin up along the x-axis look like?
Well, there it is. Makes sense, right?
So, this is $\frac{1}{\sqrt2}(1, 1)$ with the time
evolution phase factor. We can go ahead
and also add on the spin down along x
eigenstate. And that's exactly as you would
expect. Now, let's also add in the spin
up along y eigenstate. And there it is,
pointing along y, spinning around. And if
you add in the spin down along y
eigenstate, well, then there it is.
So without going into too much
detail about the algebra of spinors and
all that, I just wanted to show you that
there is a picture corresponding to all
of this algebra. And that's something
that I would definitely encourage you to
read more about and to explore. But for
the purposes of Bell's paper, we
actually don't need to get too deep into the
details there. But I hope this has been
useful context.
All right. So before returning to the
paper, I want to say a couple of words
about the concept of the expectation
value of these spin measurements cuz
we're going to see that concept later on
in the paper. So remember earlier we
were looking at the slide shown here and
we thought about how if we rotate the
second magnet by an angle theta for a
particle beam, which we know is going to
be spin up if we measure it vertically,
then the beam is going to split into two
beams. And for a small angle theta, it's
going to be mostly spin up. But there's
some probability of that also being spin
down. And then as we talked about
before, the probability of spin up is
going to be $\cos^2(\theta/2)$, where theta is the tilt
angle. And likewise, the
probability of it being spin down is
going to be 1 minus that, so we're going
to have $\sin^2(\theta/2)$. And
that's all fine and good, and that's
totally true, and that's one way to talk
about it. But there's another way we can
talk about it, in terms of expectation
value, which is in some ways more convenient.
So to be really technical about this,
suppose we go ahead and call the second
magnet's axis the vector A and then as
we talked about, we can use the notation
$\vec\sigma\cdot\vec a$ as a shorthand for the result of
measuring the spin along the axis A.
Because, as you know, when you dot the
sigma vector made up of the Pauli
matrices with some unit vector a, you end
up with something that's directly
proportional to the spin operator, but
which has eigenvalues of +1 if the
particle is measured spin up and
-1 if the particle is measured
spin down. So now we ask the
question: what is the expectation
value of $\vec\sigma\cdot\vec a$? And all we mean by
expectation value is the average over
many measurements on identically prepared
particles, holding the A vector
constant. Let me give you an analogy.
Let's say you're a gambler and somehow
you have the opportunity to play a game
where you have a 60% chance of winning a
dollar and a 40% chance of losing a
dollar. Well, in that case, the
expectation value is going to be 20
cents, because you have 0.6 × 1, which
is 0.6, and then you add on to that 0.4
× (-1), which is -0.4, and so you have a net
expected profit of 0.2 dollars, and so
you should play that game. Now, the
reason I bring up this analogy is
because, of course, if you play the game
once, you're not going to get 20 cents. You're
either going to make a dollar or you're
going to lose a dollar. So we should not
expect one game to yield 20 cents.
However, if you play that game 100
times, you're going to be up about 20
dollars. That's what you should expect to
have. And so that's exactly the sense in
which we use the term expectation value
when thinking about these spin
measurements. In every case, when you
measure the spin, it's going to be a
plus one or a minus one. But depending
on the tilt angle and depending on the
probability that depends on the tilt
angle, there's going to be some average
number that we'll find for that tilt
angle over many subsequent measurements
along that axis. And if you work out the
math as we'll do in just a moment, you
end up with the plot shown here where on
the x-axis we have the tilt angle theta
and then if you look at this curve for
the expectation value and by the way we
use the bracket notation here to
indicate expectation value. Well, as a
sanity check, let's go ahead and look at
a few points and see if this curve kind
of makes sense.
So first of all, when theta is zero and
a is aligned with the polarization
of those incoming spin-up atoms, we
find an expectation value of one, and
that makes sense, because when the second
detector is not tilted, then every single
time a spin up coming in is going to be
a spin up going out, and so $\vec\sigma\cdot\vec a$ is
going to yield an eigenvalue of +1 all the time.
So if you do it 100 times,
you're going to get 100 plus ones. And
then conversely, if we flip a all the
way upside down, then you have a spin up
coming in relative to the upside down
second detector. That's always going to
come out as a spin down. And so in that
extreme case, you always get -1
for $\vec\sigma\cdot\vec a$; therefore, the
expectation value is precisely -1. Now,
if you check out this point in the
middle of the plot, when theta is 90° and
the measurement axis A is perfectly
perpendicular to the incoming spin up
polarization, well, in that case, $\vec\sigma\cdot\vec a$
is going to be +1 or -1,
each with a 50% probability. And
so if you have a set of 100 numbers
which are either +1 or -1 with equal
probability, well, you add those all up,
and on average you're going to get zero.
All right, then. So based on the three
points we've looked at, the curve seems
to make sense. But how do we calculate
the exact form of this curve? Well, all
you have to do is think like a gambler
and say the expectation value is going
to be the probability of measuring spin
up along the axis A times the +1
corresponding to spin up, plus the
probability of measuring spin down along
the axis A times the -1 that
corresponds to spin down. This is just
like in that game where you have a 60%
chance of winning a dollar and a 40% chance
of losing a dollar, so the expectation
value is 20 cents. It's the same
reasoning as a gambling calculation. And
as we saw earlier, we already know the
probability of measuring spin up versus
spin down. In the first case, we have a
$\cos^2(\theta/2)$ probability of measuring
spin up, and then we have a $\sin^2(\theta/2)$
probability of measuring spin down:
$$\langle \vec\sigma\cdot\vec a\rangle = (+1)\cos^2\frac{\theta}{2} + (-1)\sin^2\frac{\theta}{2}.$$
Now, if you are a trig identity
enthusiast, you'll recognize this form
as having a delightful simplification,
which is that
$$\cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2} = \cos\theta.$$
Isn't that
wonderful how that simplifies? So that's
a super nice result. And we're going to
see the same result in Bell's paper, in
equation 3, in a slightly different
context, but it's the same exact
reasoning. So anyway, that's all I
wanted to say about the expectation
value. Just think about this as a
pretty common and useful way of putting
a statistical handle on this kind of
probabilistic situation.
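The gambler-style recipe translates directly into code. This sketch (hypothetical helper names, plain Python) computes an expectation value as a probability-weighted sum, checks the 60/40 game, and compares $(+1)\cos^2(\theta/2) + (-1)\sin^2(\theta/2)$ against $\cos\theta$. As an independent cross-check it also evaluates the quantum expectation value $s^\dagger(\vec\sigma\cdot\vec a)\,s$ directly, assuming the incoming spin-up state is $s = (1, 0)$ along z and that $\vec a = (\sin\theta, 0, \cos\theta)$ lies in the xz-plane.

```python
import math


def expectation(outcomes):
    """Expectation value from (probability, value) pairs, like the gambling game."""
    return sum(p * v for p, v in outcomes)


def spin_expectation(theta):
    """<sigma.a> from the probabilities cos^2(theta/2) (spin up) and sin^2(theta/2) (spin down)."""
    p_up = math.cos(theta / 2) ** 2
    p_down = math.sin(theta / 2) ** 2
    return expectation([(p_up, +1), (p_down, -1)])


def sigma_dot_xz(theta):
    """sigma.a for a = (sin theta, 0, cos theta): sin(theta)*sigma_x + cos(theta)*sigma_z."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def braket(state, m):
    """<state| m |state> for a 2-component spinor (real part)."""
    mv = [m[0][0] * state[0] + m[0][1] * state[1],
          m[1][0] * state[0] + m[1][1] * state[1]]
    return sum(complex(x).conjugate() * y for x, y in zip(state, mv)).real


print(expectation([(0.6, +1), (0.4, -1)]))  # the 60/40 game: about 0.2 (20 cents)

spin_up_z = [1, 0]
for deg in (0, 60, 90, 120, 180):
    th = math.radians(deg)
    print(f"{deg:3d} deg   from probabilities: {spin_expectation(th):+.4f}   "
          f"<s|sigma.a|s>: {braket(spin_up_z, sigma_dot_xz(th)):+.4f}   "
          f"cos(theta): {math.cos(th):+.4f}")
```

Both routes land on $\cos\theta$, which is the point of the trig identity: the probability-weighted sum and the matrix expectation value are two views of the same number.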
All right, then. So now I think we've
discussed all of the prerequisites that
we need for the remainder of the paper.
So now let's go ahead and get into part
two, "Formulation."
Remember how, in the EPR paper, they gave a specific example of a two-particle wave function with anti-correlated momenta and correlated positions? With that wave function, we saw that if we measure the momentum of one of the particles, we put the other particle into a momentum eigenstate. Conversely, if we choose to measure the position of the first particle, we put the other one into a position eigenstate. That specific wave function was a mathematically convenient example to illustrate the point. But the EPR paradox is more general than any single two-particle wave function. If you look at equations 7 and 8 of the EPR paper, you can see that, more generically, whenever two particles are in an entangled state and you write that wave function as a sum over eigenstates of the first particle, measuring the first particle and putting it into one of those eigenstates has an impact on the state of the second particle. So really, the EPR paradox is just the observation that, because we're free to choose which observable of the first particle we measure, we can affect the quantum state of the second particle in a way that appears to violate the constraint of local causality.
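Here is a minimal NumPy sketch of that general structure. For concreteness it assumes the two-spin singlet state that Bell uses (rather than EPR's position/momentum wave function), and the helper name `conditional_state_of_particle_2` is hypothetical. Projecting particle 1 onto an eigenstate of whichever observable we pick (σ_z or σ_x here) leaves particle 2 in an eigenstate of that same observable.

```python
import numpy as np

# Single-particle basis states in the z basis: |up_z> = (1, 0), |down_z> = (0, 1)
up_z = np.array([1.0, 0.0])
down_z = np.array([0.0, 1.0])
# Eigenstates of sigma_x, written in the z basis
up_x = (up_z + down_z) / np.sqrt(2)
down_x = (up_z - down_z) / np.sqrt(2)

# Two-particle amplitudes psi[i, j]: particle 1 in z-basis state i, particle 2 in state j.
# Assumed example: the singlet (|up,down> - |down,up>) / sqrt(2).
psi = (np.outer(up_z, down_z) - np.outer(down_z, up_z)) / np.sqrt(2)

def conditional_state_of_particle_2(psi, u):
    """Normalized state of particle 2 after particle 1 is found in state u.

    Writing psi = sum_n u_n (x) psi_n, this returns psi_n / |psi_n| for u_n = u.
    """
    phi = u.conj() @ psi          # contract over particle 1's index
    return phi / np.linalg.norm(phi)

# The choice of observable measured on particle 1 decides which basis
# particle 2 ends up in (up to an overall phase).
for label, basis in [("sigma_z", (up_z, down_z)), ("sigma_x", (up_x, down_x))]:
    for u in basis:
        print(label, np.round(conditional_state_of_particle_2(psi, u), 3))
```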
causality. So anyway, the reason I bring that up is
So anyway, the reason I bring that up is because in Bell's paper, we're going to
because in Bell's paper, we're going to use a different two particle state to
use a different two particle state to get at the same fundamental paradoxical
get at the same fundamental paradoxical nature of quantum physics. So instead of
nature of quantum physics. So instead of the particles having anti-correlated
the particles having anti-correlated momenta and correlated positions, we're
momenta and correlated positions, we're going to imagine a pair of spin 1/2
going to imagine a pair of spin 1/2 particles whose spins are going to be in
particles whose spins are going to be in an entangled state. And this
an entangled state. And this configuration for thinking about the EPR
configuration for thinking about the EPR paradox is actually not original to
paradox is actually not original to Bell. It was first put forward by Bow
Bell. It was first put forward by Bow and Aharonov in 1957.
So Part II of Bell's paper begins with the example advocated by Bohm and Aharonov. The EPR argument is the following. Consider a pair of spin-1/2 particles formed somehow in the singlet spin state. Now I want to pause here and ask: what exactly is the singlet spin state? It means that the spins of the two particles have no preferred direction a priori. If you think about measuring the spin of either particle, there's total rotational symmetry: neither particle has a preferred spin axis. It's totally uniformly distributed over all possibilities.

However, the spins of the particles exhibit perfectly anti-correlated outcomes when measured along the same axis. This is a very bizarre state of affairs. Intuitively, you would think that such a state is not possible, and yet the singlet state has been measured in all kinds of experiments. So this really is possible; it's something real. And as we'll talk about later in the paper, even though it's very hard to imagine and seems kind of surreal, the experimental data very strongly indicate that the singlet state is a legitimate thing that can exist.

You sometimes hear the singlet state described as the particles having equal and opposite spin. That's not exactly true, or rather, it's too narrow a description. It is true that if you measure the two particles along the same axis, you'll always find that their spins are equal and opposite. But, and this is a super important fact about the singlet spin state, so I want to re-emphasize it: before the measurement, neither particle has a preferred spin direction. This is very hard to imagine, but it is an essential part of what it means for the particles to be in the singlet state.
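Both claims can be checked numerically. This is a sketch assuming NumPy and the standard state vector (|↑↓⟩ − |↓↑⟩)/√2 for the singlet; `sigma_dot` is just a local helper name for σ·n. Tracing out particle 2 leaves particle 1 in the maximally mixed state I/2, so its spin along any axis n averages to zero (no preferred direction), while the product of the two spins along the same n always averages to -1 (perfect anti-correlation).

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def sigma_dot(n):
    """sigma . n for a unit vector n = (nx, ny, nz)."""
    return n[0] * sx + n[1] * sy + n[2] * sz

# Singlet in the ordering |uu>, |ud>, |du>, |dd>
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

# Reduced density matrix of particle 1: reshape to [a, j, b, l] and trace over j = l
rho = np.outer(singlet, singlet.conj()).reshape(2, 2, 2, 2)
rho_1 = np.einsum("ajbj->ab", rho)
print(np.round(rho_1.real, 3))  # identity / 2: no preferred direction

rng = np.random.default_rng(0)
for _ in range(3):
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)  # random unit vector
    single = np.real(np.trace(rho_1 @ sigma_dot(n)))  # <sigma_1 . n>
    joint = np.real(singlet.conj() @ np.kron(sigma_dot(n), sigma_dot(n)) @ singlet)
    print(f"<sigma1.n> = {single:+.3f}, <(sigma1.n)(sigma2.n)> = {joint:+.3f}")
```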
All right, so that's the singlet state. Now imagine that we have some process which produces pairs of spin-1/2 particles in the singlet state, after which each particle goes its separate way, both moving freely in opposite directions.

Now suppose we send each particle into a detector, say a Stern-Gerlach magnet, and measure the spin of both particles. To get a sense of what happens here, we'll start by saying that both detectors measure along the same axis. Let's denote the detector orientations by the unit vectors a and b, respectively, and for starters, those unit vectors are precisely aligned, so that we're measuring both particles along the same spin axis. Because the particles are in the singlet spin state, suppose we measure the spin of particle 1 along the direction a and get the value +1, that is, particle 1 is spin up along a. Then, according to quantum mechanics and what it means for the particles to be in the singlet state, it's 100% guaranteed that measuring the spin of particle 2 along the same axis will yield -1, that is, spin down. And vice versa: had we found particle 1 spin down, we would know for sure that particle 2 would be spin up along the same axis.
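To see that anti-correlation as probabilities rather than an average, here is a sketch (NumPy assumed; `projector` and `joint_probability` are hypothetical helper names) that builds the projectors (I ± σ·n)/2 onto the ±1 outcomes and evaluates them on the singlet. With both detectors along the same axis a, the outcome pairs (+1, -1) and (-1, +1) each occur with probability 1/2, and (+1, +1) and (-1, -1) never occur.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def sigma_dot(n):
    return n[0] * sx + n[1] * sy + n[2] * sz

def projector(n, s):
    """Projector onto outcome s (+1 or -1) of a spin measurement along n."""
    return (I2 + s * sigma_dot(n)) / 2

singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def joint_probability(a, b, s1, s2):
    """P(particle 1 gives s1 along a AND particle 2 gives s2 along b)."""
    P = np.kron(projector(a, s1), projector(b, s2))
    return float(np.real(singlet.conj() @ P @ singlet))

a = np.array([1.0, 2.0, 2.0]) / 3.0   # an arbitrary unit vector (|a| = 1)
for s1 in (+1, -1):
    for s2 in (+1, -1):
        print(s1, s2, round(joint_probability(a, a, s1, s2), 3))
```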
By the way, a comment on the notation here. As we talked about earlier, the expression σ·a is shorthand for measuring the spin of the particle along the axis a. A measurement of this operator gives +1 if the particle is spin up along a and -1 if it's spin down along a; those are its two eigenvalues. The subscripts 1 and 2 only indicate which particle we're measuring: in the first case particle 1, in the second case particle 2. So it's not that we have two different sigma vectors. It's the same Pauli matrices, the same operator. It's just that in the first case, σ₁, we apply it to the first particle, and in the second case, σ₂, we apply it to the second particle.
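In matrix form, σ₁·a and σ₂·b are the same 2×2 Pauli combination placed in different slots of a tensor (Kronecker) product. A sketch assuming NumPy, with hypothetical helper names `sigma_1_dot` and `sigma_2_dot`: because the two operators act on different particles they commute, and each still has only the eigenvalues ±1.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def sigma_dot(n):
    return n[0] * sx + n[1] * sy + n[2] * sz

def sigma_1_dot(n):
    """sigma_1 . n: the Pauli operator acting on particle 1 only."""
    return np.kron(sigma_dot(n), I2)

def sigma_2_dot(n):
    """sigma_2 . n: the same Pauli operator, acting on particle 2 only."""
    return np.kron(I2, sigma_dot(n))

a = np.array([0.0, 0.0, 1.0])
b = np.array([0.6, 0.0, 0.8])
A_op, B_op = sigma_1_dot(a), sigma_2_dot(b)
print(np.allclose(A_op @ B_op, B_op @ A_op))   # operators on different particles commute
print(np.round(np.linalg.eigvalsh(A_op), 6))   # eigenvalues -1 and +1, each twice
```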
So now we make the hypothesis of local causality, and it seems one at least worth considering: if the two measurements are made at places remote from one another, the orientation of one magnet does not influence the result obtained with the other. To really emphasize that point, imagine that detectors A and B are so far apart, and the measurements of particle 1 and particle 2 happen so close together in time, that within whatever tiny time difference there is between the two measurements, not even light could travel between detectors A and B. So, if local causality is to be believed, the measurements going on at detector A and detector B are completely causally disconnected.
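The "not even light could travel between the detectors" condition is a one-line inequality: the two measurement events are spacelike separated when c·Δt < d. A minimal sketch; the function name and the example distances and time gaps are hypothetical.

```python
# Speed of light in m/s (exact by the SI definition of the metre)
C = 299_792_458.0

def spacelike_separated(distance_m, delta_t_s):
    """True if a light signal cannot cover distance_m within the time gap delta_t_s."""
    return C * abs(delta_t_s) < distance_m

# Hypothetical numbers: measurements 1 microsecond apart (light covers ~300 m)
print(spacelike_separated(10_000.0, 1e-6))   # detectors 10 km apart -> True
print(spacelike_separated(100.0, 1e-6))      # detectors 100 m apart -> False
```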
But here's where we run into the EPR paradox. Since we can predict in advance the result of measuring any chosen component of the spin of particle 2, by previously measuring the same component of the spin of particle 1, it follows that the result of any such measurement must actually be predetermined.

That is to say, the particles start off in the singlet state with no preferred spin direction. Now imagine particle 1 is measured in detector A ever so slightly before particle 2 is measured in detector B, by 0.0001 ns or whatever. As soon as we've measured particle 1 along the axis a, we can predict with certainty the component of the spin of particle 2 along the same axis. And yet the singlet wave function by itself contains no such certainty.

Now, we can tell a story about non-local wave function collapse: you measure particle 1 along axis a, the wave function instantly collapses, and particle 2 is no longer in the singlet state but is now definitely polarized along the measurement direction a, opposite to particle 1's result. But suppose we don't allow non-local wave function collapse, because we want to preserve our sanity and hold on to the concept of local causality. Then we find an apparent contradiction, because the spin of particle 2 along the axis a should definitely not be predictable with certainty given the wave function of the singlet state. Quantum physics just doesn't allow that level of predictability unless we allow for the possibility of instantaneous wave function collapse. So then, since the initial quantum mechanical wave function, the singlet state, does not determine the result of an individual measurement, this predetermination implies the possibility of a more complete specification of the state.

So that, apparently, is the EPR paradox, this time phrased in terms of spins rather than momentum and position states. In other words, this whole line of thought leads us to think that surely there must be some kind of hidden variables that go along with particles 1 and 2 in a way that quantum mechanics doesn't account for.
If only we had some kind of more complete model, where we could figure out what those hidden variables are, what their dynamics are, and how they influence the spin measurements, then surely we could find a more complete, more sane, and more understandable explanation of what's going on here than what quantum mechanics currently has to offer.

Well, all right then. We want a more complete theory involving some kind of hidden variables. So let this more complete specification be effected by means of parameters λ. These are going to be our hidden variables. In this video, whenever you see this yellow λ, it stands for whatever hidden variables we want to put into our model to give us a more complete description of what's happening. Earlier we looked at the Stern-Gerlach experiment and tried to explain it in terms of particles carrying this yellow vector with them; that was an example of λ. But now we're going to broaden that, in fact all the way, and say λ can be whatever you want it to be, whatever you can imagine: a vector, a scalar, a tensor, a function, a set. It is a matter of indifference in the following whether λ denotes a single variable or a set, or even a set of functions, and whether the variables are discrete or continuous. The beautiful thing about Bell's paper is that, as we'll see, it accounts for all possible hidden variable models in one fell swoop, because it's such a generic argument.

However, we write as if λ were a single continuous parameter. So in the notation we'll be using, for example, we'll integrate over all possible λ, and it will look like we're assuming λ is a continuous parameter. What Bell is saying here is that if you want to modify the argument so that λ is not a continuous parameter but rather a discrete parameter, or a set, or whatever contrived thing you want to come up with, you can trivially modify the argument to account for that.
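To make the point about discrete versus continuous λ concrete, here is a sketch (NumPy assumed) that averages the same ±1 outcome function over a discrete distribution of λ, as a weighted sum, and over a continuous density ρ(λ), as a numerically approximated integral. The outcome rule, both distributions, and the helper names are all hypothetical; the point is only that the averaging step has the same structure in both cases.

```python
import numpy as np

def outcome(lam):
    """Hypothetical toy +/-1 outcome rule: +1 if cos(lam) >= 0, else -1 (vectorized)."""
    return np.where(np.cos(lam) >= 0, 1.0, -1.0)

def expect_discrete(f, lams, probs):
    """Average of f over a discrete distribution: sum_k p_k f(lam_k)."""
    return float(sum(p * f(lam) for lam, p in zip(lams, probs)))

def expect_continuous(f, rho, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of rho(lam) f(lam) over [lo, hi]."""
    width = (hi - lo) / n
    grid = lo + (np.arange(n) + 0.5) * width
    return float(np.sum(rho(grid) * f(grid)) * width)

def uniform_density(lam):
    """rho(lam) = 1 / (2 pi) on [0, 2 pi)."""
    return np.full_like(lam, 1 / (2 * np.pi))

# Discrete lambda: two values with probabilities 3/4 and 1/4
print(expect_discrete(outcome, [0.0, np.pi], [0.75, 0.25]))          # 0.5
# Continuous lambda: uniform density on [0, 2 pi)
print(expect_continuous(outcome, uniform_density, 0.0, 2 * np.pi))    # close to 0
```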
For a discrete λ, for instance, you replace an integral with a sum, or whatever you have to do. Those kinds of modifications won't have any effect on the logical structure of the argument put forward in this paper.

So now let's think about what's happening in these detectors. At this point we can say that the axis of measurement in detector B does not have to be the same as the axis of measurement in detector A, so we're going to make this more generic. One thing I'll point out is that in everything we're about to talk about, the only thing that matters about the orientations of the unit vectors a and b is the angle between them, that is, the extent to which they're aligned or misaligned.

When you think about two vectors in three-dimensional space, the two vectors span a plane, and there's some angle between them in that plane. That angle, θ, is the relevant quantity when we think about how the orientations of the two measurement axes matter. So if you want, you can imagine a fully generic three-dimensional situation where a and b point whichever way you like. But because only the angle θ between them matters, in whatever plane they happen to span, we may as well imagine the a vector pointing straight up and the b vector having some arbitrary orientation in that plane. So in the diagram shown here on your two-dimensional screen, with a pointing up and b pointing wherever, imagine rotating the b axis a full 360°. For all intents and purposes, that 360° sweep spans all of the possible ways we can misorient our detectors relative to each other. And actually, you only need 180°, because once you tilt b past 180°, θ starts to come back down. See what I mean? And technically, by symmetry, all the interesting stuff happens between 0 and 90°.
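A sketch of the point that only the angle θ between a and b matters, assuming NumPy (`angle_between` is a hypothetical helper name): with a pointing straight up and b swept around the plane, θ = arccos(a·b) climbs to 180° and then comes back down, so rotating b by 225° gives the same θ as 135°, and 315° the same as 45°.

```python
import numpy as np

def angle_between(a, b):
    """Angle theta in degrees between two 3D vectors, always in [0, 180]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # clip guards against rounding pushing |cos_theta| slightly above 1
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

a = np.array([0.0, 0.0, 1.0])  # a pointing "straight up"
for deg in (0, 45, 135, 225, 315):
    phi = np.radians(deg)
    b = np.array([np.sin(phi), 0.0, np.cos(phi)])  # b rotated by deg in the x-z plane
    print(deg, round(angle_between(a, b), 6))
```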
Okay. So then what is actually going on in these detectors? If we assume this hidden variable model, then the result A of measuring the spin of particle 1 along the a axis is determined by a and the hidden variable λ.

Particle 1 comes in carrying some kind of hidden variable with it, maybe a vector, a scalar, a tensor, whatever hidden variable we want to imagine. As particle 1 goes into detector A, which is oriented along a, the only things that affect the spin measurement of particle 1 are the orientation of the a vector and the hidden variable λ that goes with particle 1. Because particles 1 and 2 are in the singlet state, they don't have any a priori preferred directions, so the result of the spin measurement is fully determined by however λ interacts with the detector oriented along a. Likewise, the result B of measuring the spin of particle 2 along the b axis in the same instance is determined by b and λ, for exactly the same reason. So we can write (Bell's equation 1):

A(a, λ) = ±1,   B(b, λ) = ±1

The measurement outcome A, as a function of the measurement direction a and the hidden variables λ, takes the value +1 or -1 depending on whether particle 1 is measured spin up or spin down, respectively. Likewise, the measurement result B at detector B, a function of b and λ, takes the value +1 or -1 for spin up and spin down, respectively.
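For a concrete and purely hypothetical instance of such functions, suppose λ is a unit vector carried by the pair, like the yellow vector from the earlier Stern-Gerlach discussion, and take A(a, λ) = sign(a·λ) and B(b, λ) = -sign(b·λ). A NumPy sketch of that toy rule: each function depends only on its own detector's setting and λ, always returns ±1, and reproduces perfect anti-correlation whenever both detectors share an axis.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unit_vector(rng):
    """A direction drawn uniformly over the sphere."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def A(a, lam):
    """Toy outcome at detector A: sign(a . lam), taking sign(0) as +1."""
    return 1 if np.dot(a, lam) >= 0 else -1

def B(b, lam):
    """Toy outcome at detector B: -sign(b . lam). Depends only on b and lam."""
    return -1 if np.dot(b, lam) >= 0 else 1

a = random_unit_vector(rng)
lams = [random_unit_vector(rng) for _ in range(10_000)]
outcomes = [(A(a, lam), B(a, lam)) for lam in lams]   # both detectors along a
assert all(x in (1, -1) and y in (1, -1) for x, y in outcomes)
assert all(x == -y for x, y in outcomes)              # same axis: always opposite
print("perfect anti-correlation along a shared axis in all", len(lams), "trials")
```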
We're going to leave this fully generic as far as how, or by what function, the hidden variables interact with the measurement axis. Whatever you can imagine, whatever principle you want to postulate, it's still true by definition that these functions take the values ±1 depending on the outcome of the spin measurement.

Now, the vital assumption of local causality is that the result B for particle 2 does not depend on the setting a of the magnet for particle 1, nor A on b. So in equation 1, you see that A is a function of the vector a and the hidden variables λ, and B is a function of the vector b and λ. But notice that A is not a function of b, nor is B a function of a. The reason is that detectors A and B are so far apart, and these measurements happen so quickly, that there's no way information about which way one detector is oriented can propagate to the other detector and affect its measurement result in any way. The two measurements are spacelike separated: each lies outside the other's light cone. So by local causality, the measurement result A can't depend on the vector b, or vice versa.
One of the things we're going to show in this paper is that any hidden variable model that reproduces the predictions of quantum mechanics has to violate that assumption. The only way to get it to work is to relax that constraint and say, okay, the measurement outcome at A depends on the orientation at B, and vice versa. And then it's like, oh, that's weird. That's non-local. That's absurd. At that point there's no advantage to using a hidden variable model, because whether you take ordinary quantum mechanics or some speculative hidden variable model, in both cases you end up with a non-local model. So no matter how you look at it, it's a glitch in reality.
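To see why a local model struggles, here is a Monte Carlo sketch (NumPy assumed) of the same hypothetical toy rule, A(a, λ) = sign(a·λ) and B(b, λ) = -sign(b·λ), with λ uniformly distributed over the sphere, compared with the standard quantum prediction -a·b = -cos θ for the singlet. The toy model's correlation works out to -1 + 2θ/π: it agrees with quantum mechanics at 0° and 90° but not in between. This is only one model, so it illustrates the claim rather than proving it; Bell's argument is what covers every local model.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
# Hypothetical rho(lambda): unit vectors uniformly distributed over the sphere
lams = rng.normal(size=(N, 3))
lams /= np.linalg.norm(lams, axis=1, keepdims=True)

def A(a, lam):
    """Toy local rule at detector A: sign(a . lambda), with sign(0) taken as +1."""
    return np.where(lam @ a >= 0, 1, -1)

def B(b, lam):
    """Toy local rule at detector B: -sign(b . lambda)."""
    return np.where(lam @ b >= 0, -1, 1)

def local_correlation(a, b):
    """Monte Carlo estimate of the average of A(a, lambda) B(b, lambda) over rho(lambda)."""
    return float(np.mean(A(a, lams) * B(b, lams)))

a = np.array([0.0, 0.0, 1.0])
for deg in (0, 30, 45, 60, 90):
    theta = np.radians(deg)
    b = np.array([np.sin(theta), 0.0, np.cos(theta)])
    toy_exact = -1 + 2 * theta / np.pi   # closed form for this toy model
    quantum = -np.cos(theta)             # singlet prediction: -a . b
    print(f"{deg:2d} deg: toy ~ {local_correlation(a, b):+.3f} "
          f"(exact {toy_exact:+.3f}), quantum {quantum:+.3f}")
```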
All right. Then suppose we define ρ(λ) as the probability distribution of the hidden variables λ. In other words, imagine all possible configurations of our hidden variables λ, whether they're vectors or scalars or tensors or functions or sets, whatever you want to imagine for λ. There's going to be some space of configurations, some space of possibilities that λ can take on, and you can assign a probability to each and every configuration. So ρ(λ) is precisely the distribution that defines how likely our hidden variables are to be in whatever state we can imagine them being in.

This is quite a generic thing, and as we go through the paper we'll look at some specific cases with some simple functions for ρ(λ). But notice the power in keeping this generic. So far we haven't narrowed down what λ can be. Our hidden variables can be whatever you can imagine, and ρ(λ), as a probability distribution on those hidden variables, can also be whatever you want: whatever distribution you like over whatever space of variables you want to define.

And even though our setup is so generic, one thing we can still say for sure is that the expectation value of the product of the two components, measuring particle 1 along the a axis and particle 2 along the b axis, is P(a, b). Here P is the expectation value of the product of A and B, the ±1 values recorded at each detector. P(a, b) is the integral over all possible configurations of hidden variables, each one weighted by ρ(λ), that is, by how likely that configuration is. As we integrate over that space of possible hidden variables, for each possibility we simply multiply the outcome at detector A, which is A(a, λ), by the outcome at detector B, which is B(b, λ):

P(a, b) = ∫ ρ(λ) A(a, λ) B(b, λ) dλ    (2)
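Equation 2 can be estimated numerically by drawing λ from ρ(λ) and averaging the product A·B, which is the Monte Carlo version of the weighted integral. The sketch below assumes NumPy and a purely hypothetical toy model (not taken from the text above): λ is a unit vector distributed uniformly over the sphere, A(a, λ) = sign(λ·a), and B(b, λ) = −sign(λ·b). Each outcome depends only on its own detector axis and λ, so the model is local.

```python
import numpy as np

def random_unit_vectors(n, rng):
    """Draw n unit vectors uniformly on the sphere (the assumed rho(lambda))."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def correlation(a, b, outcome_A, outcome_B, n=200_000, seed=0):
    """Monte Carlo estimate of P(a, b) = integral of rho(λ) A(a, λ) B(b, λ) dλ.

    Sampling λ from the normalized distribution rho and averaging A*B
    approximates the weighted integral in equation 2.
    """
    rng = np.random.default_rng(seed)
    lam = random_unit_vectors(n, rng)
    return float(np.mean(outcome_A(a, lam) * outcome_B(b, lam)))

# Hypothetical toy local model: each outcome uses only its own axis and λ.
def toy_A(a, lam):
    return np.sign(lam @ a)

def toy_B(b, lam):
    return -np.sign(lam @ b)

a = np.array([0.0, 0.0, 1.0])
print(correlation(a, a, toy_A, toy_B))                         # aligned axes
print(correlation(a, -a, toy_A, toy_B))                        # antiparallel axes
print(correlation(a, np.array([1.0, 0.0, 0.0]), toy_A, toy_B))  # perpendicular axes
```

For aligned and antiparallel axes the product A·B is the same for every λ, so the estimate comes out as −1 and +1 respectively. For perpendicular axes it hovers near 0 up to sampling noise.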
By the way, in Bell's paper he writes this integral as ∫ dλ ρ(λ) A(a, λ) B(b, λ), with the differential right after the integral sign. I like to write it in the sandwich notation, where you have the integral sign on the left, the differential element on the right, and whatever you're integrating in between. It doesn't matter either way; it's just a stylistic choice. Anyway, I want to reflect on exactly what this equation, equation two, means, because it is of central importance to everything that follows. This quantity P we're going to call the correlation between our measurements.
And this correlation has a really intuitive meaning. The first thing to notice is that the correlation P has to lie somewhere between −1 and 1. When it's −1, the measurement outcomes at detector A are perfectly anti-correlated with the measurement outcomes at detector B. This happens, for example, when detectors A and B are aligned along precisely the same axis. If we have a pair of particles in the singlet state and we measure them both along the same axis, then if one is spin up the other is spin down, and vice versa. So if A is +1 then B is −1, and vice versa. When we measure the singlet state along the same axis, the product of A and B is therefore always −1, because 1 × (−1) = −1 and (−1) × 1 = −1. In that configuration, since the product of A and B is always −1, equation 2 is simply the negative of the integral of ρ(λ) dλ. Now, ρ is a normalized probability distribution. When you integrate over all possibilities, each weighted by the probability distribution, the result is always 1, because there's a 100% chance that the hidden variables are in some configuration. So we find that P(a, b) = −1 when a and b are the same vector.

Conversely, if we flip b around so that b = −a, with our measurement axes pointing in equal and opposite directions, then we find a correlation of +1. That is, the product of A and B is always 1. If we measure the particle spin up in detector A, but detector B is flipped upside down relative to detector A, then the other particle is also measured spin up in detector B, just along the upside-down axis. The singlet correlation is still there; flipping the vector b upside down simply redefines what spin up and spin down mean in detector B. So in that case the product of A and B is always 1, because 1 × 1 = 1 and (−1) × (−1) = 1, and equation 2 reduces to the integral of ρ(λ) dλ, which, because ρ is a normalized probability distribution, equals 1.
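The two special cases can be written out as a short worked equation using only equation 2 and the normalization ∫ ρ(λ) dλ = 1. The aligned case assumes A·B = −1 for every λ, which is what perfect anticorrelation requires; the antiparallel case assumes A·B = +1 for every λ.

```latex
\begin{aligned}
\mathbf{b} = \mathbf{a}: \quad
P(\mathbf{a}, \mathbf{a}) &= \int \rho(\lambda)\,
  \underbrace{A(\mathbf{a}, \lambda)\, B(\mathbf{a}, \lambda)}_{=\,-1}\, d\lambda
  = -\int \rho(\lambda)\, d\lambda = -1, \\
\mathbf{b} = -\mathbf{a}: \quad
P(\mathbf{a}, -\mathbf{a}) &= \int \rho(\lambda)\,
  \underbrace{A(\mathbf{a}, \lambda)\, B(-\mathbf{a}, \lambda)}_{=\,+1}\, d\lambda
  = \int \rho(\lambda)\, d\lambda = +1.
\end{aligned}
```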
Now, there's one more special case we can imagine, which is when a and b are perpendicular. Suppose a points straight up and b points straight to the right. In that case we should expect a correlation of zero. The reason is this: in the singlet state, say you measure spin up along a. If b is perpendicular to a, then the other measurement could go either way; you could get spin up or spin down with equal probability. So on average the product of A and B is +1 or −1 about 50/50, and that averages out to zero. A value of P = 0 means there is no correlation between the two detectors.
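The 50/50 argument can be checked by sampling outcomes. This sketch assumes NumPy and uses the standard quantum rule for the singlet, the same one behind equation 3: A is ±1 with probability 1/2 each, and B = −A with probability cos²(θ/2), otherwise B = +A, where θ is the angle between a and b. The function name is illustrative.

```python
import numpy as np

def sample_singlet_products(theta, n=100_000, seed=1):
    """Sample the products A*B for singlet spin measurements at relative angle theta.

    Assumed quantum rule: A = ±1 with probability 1/2 each; then B = -A with
    probability cos^2(theta/2) and B = +A with probability sin^2(theta/2).
    """
    rng = np.random.default_rng(seed)
    A = rng.choice([-1, 1], size=n)
    opposite = rng.random(n) < np.cos(theta / 2) ** 2
    B = np.where(opposite, -A, A)
    return A * B

cases = [("aligned", 0.0), ("perpendicular", np.pi / 2), ("antiparallel", np.pi)]
for label, theta in cases:
    products = sample_singlet_products(theta)
    # average of A*B, and the fraction of runs where A*B = +1
    print(label, products.mean(), (products == 1).mean())
```

At θ = π/2 the fraction of +1 products is close to one half, so the average sits near 0. More generally the average is (−1)·cos²(θ/2) + (+1)·sin²(θ/2) = −cos θ.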
Okay, so that's equation two. The correlation between our measurement outcomes is found simply by integrating the product of the ±1 outcome at A and the ±1 outcome at B over the space of all possible hidden variables, weighted by the probability of each configuration. Now, that correlation from a hidden variable model, given by equation 2, should equal the quantum mechanical expectation value. For the singlet state, the expectation value of that product is −a·b, or, as we saw earlier, −cos θ, where θ is the angle between the two measurement axis vectors a and b:

⟨(σ₁·a)(σ₂·b)⟩ = −a·b = −cos θ    (3)

The way to see that equation three is true, that this is the quantum mechanical expectation value and that it matches the experimental data, is to imagine that particle 1 gets to detector A ever so slightly before particle 2 gets to detector B. Then particle 1 is measured along the a axis and the wave function instantly collapses, so particle 2 is now polarized opposite to the a axis. When you then measure the spin of particle 2 along the direction b, you can think of it like the two-stage Stern-Gerlach experiment, where we create a beam of purely polarized spin-up particles and send it through a second detector tilted by some angle θ. As we know, there's a cos²(θ/2) probability of measuring spin up and a sin²(θ/2) probability of measuring spin down. If you think like a gambler and calculate the expectation value, counting spin up as +1 and spin down as −1, you end up with an expectation value of cos θ for the outcome at the second detector. We saw that earlier. The minus sign here simply comes from the fact that the two particles in the singlet state are anti-correlated: if particle 1 is spin up along the axis a, then particle 2 is actually polarized spin down along a. That's where the minus sign comes from. It's basically just a 180° flip of the two-stage Stern-Gerlach experiment we looked at earlier.
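The collapse argument can be cross-checked with a direct calculation. This sketch assumes NumPy. It builds the singlet state (|↑↓⟩ − |↓↑⟩)/√2 as a 4-component vector, forms the operator (σ·a) ⊗ (σ·b) from the Pauli matrices, and takes its expectation value. The helper names are illustrative. Since the gambler's average cos²(θ/2) − sin²(θ/2) equals cos θ and the anticorrelation supplies the minus sign, the result should match −a·b.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def spin_along(n):
    """The spin operator sigma·n for a unit vector n = (nx, ny, nz)."""
    return n[0] * SX + n[1] * SY + n[2] * SZ

# Singlet (|up,down> - |down,up>)/sqrt(2) in the basis |uu>, |ud>, |du>, |dd>
SINGLET = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def singlet_correlation(a, b):
    """<psi| (sigma1·a) ⊗ (sigma2·b) |psi> for the singlet state."""
    op = np.kron(spin_along(a), spin_along(b))
    return float(np.real(SINGLET.conj() @ op @ SINGLET))

def unit(theta):
    """Unit vector in the x-z plane at angle theta from the z axis."""
    return np.array([np.sin(theta), 0.0, np.cos(theta)])

a = unit(0.0)
for theta in (0.0, np.pi / 3, np.pi / 2, np.pi):
    b = unit(theta)
    # angle, quantum expectation value, and -a·b side by side
    print(round(theta, 3), round(singlet_correlation(a, b), 6), round(-(a @ b), 6))
```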
Well, anyway, all that's to say that quantum mechanics tells us the correlation of the measurement outcomes for two particles in the singlet state, with unit vector a at detector A and unit vector b at detector B, should be −cos θ, where θ is the angle between the two vectors. So the main question of this paper is this: is it possible to have some hidden variable model, based on some set of possible λs and some probability distribution describing the likelihood of each λ, such that equation 2 matches the quantum mechanical and experimental value of −cos θ between the vectors a and b? If so, then such a hidden variable model might be plausible, because it would match the data. It would match quantum theory, and yet it would be an alternative way of looking at things. That would be cool. But what we're going to show in this paper, in particular in part four, the contradiction, is that no local hidden variable model can have an equation 2 correlation that matches the quantum mechanical correlation and the experimental data. Therefore we cannot have a local hidden variable explanation of what's going on here, and we have to confront the fact that quantum mechanics genuinely is super weird, non-local, and a glitch in reality.
Oh, and one little caveat on the way we've formulated things here. Some might prefer a formulation in which the hidden variables fall into two sets, with the measurement outcome at A depending on one set of hidden variables and the measurement at B depending on another set. However, this possibility is contained in the above, since λ stands for any number of variables and the dependences thereon of A and B are unrestricted. In other words, if you want a hidden variable model where particle 1 carries one set of hidden variables and particle 2 carries a whole other set, go right ahead. That's fine. We're not ruling out that possibility. When we use the character λ to stand for any imaginable hidden variables, you can imagine that in whatever way you want, including the situation where you have two sets of hidden variables, one for each particle. Go for it. That's totally fine. We're not restricting that possibility at all.
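To see why the two-set formulation is already covered, take λ to be the pair (λ₁, λ₂), with ρ(λ₁, λ₂) their joint distribution, which may correlate the two sets. Equation 2 then reads as follows, where A simply happens not to depend on λ₂ and B not on λ₁:

```latex
\lambda = (\lambda_1, \lambda_2), \qquad
P(\mathbf{a}, \mathbf{b})
  = \iint \rho(\lambda_1, \lambda_2)\,
    A(\mathbf{a}, \lambda_1)\, B(\mathbf{b}, \lambda_2)\,
    d\lambda_1\, d\lambda_2
```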
And likewise, in a complete physical theory of the type envisaged by Einstein, the hidden variables would have dynamical significance and laws of motion. Our λ can then be thought of as initial values of these variables at some suitable instant. In other words, if you want to think of hidden variables as some kind of fields with dynamical significance, that's cool too. Nothing we're about to argue rules out that possibility. If you want, you can imagine λ representing a snapshot in time of those fields, and then imagine those fields evolving according to some dynamical equations. But none of that time evolution is going to take the thought experiment outside the framework we're setting up, because our argument is fully generic. Anything you can imagine for λ, λ can be. You know, I just noticed this yellow λ kind of looks like a banana peel. You wouldn't want that as a hidden variable.
[laughter]
Hey, that would affect the measurement of your spin state.
All right, moving on.
Part three of the paper begins: "The proof of the main result is quite simple." Well, according to Bell, at least. I don't know if I would say it's quite simple, but anyway: "Before giving it [in part four], however, a number of illustrations may serve to put it in perspective." So part three is all about establishing some context for part four, looking at some specific examples that we're then going to generalize in part four, when we give the formal argument that local hidden variable models don't work.

Now, I'm going to break part three up into three parts, 3A, 3B, and 3C, because this part of the paper is naturally divided that way anyway, and I want to take the time to zoom in on each part individually. The first part of part three is that, for a single particle, we can make up a hidden variable story of what's going on with the spin, and it seems to work.
Firstly, there is no difficulty in giving a hidden variable account of spin measurements on a single particle. Suppose we have a spin-1/2 particle in a pure spin state with polarization denoted by a unit vector p. All that means is: imagine we send a beam of spin-1/2 particles through a Stern-Gerlach magnet and then filter it, like we saw before, allowing only the spin-up particles through. If the axis of that Stern-Gerlach magnet is the vector p, then the outgoing particles are polarized with reference to p. That is to say, if you did a subsequent spin measurement on such a particle along the direction p, the result would certainly be spin up. That's what it means for the particle to be polarized along the direction p.

All right. Now suppose we let our hidden variable be, for example, a unit vector λ with uniform probability distribution over the hemisphere λ·p > 0. That is to say, λ is some additional directional degree of freedom that travels along with the particle. We don't know exactly what λ is. All we know is that it has a uniform probability distribution over the hemisphere that points in the same direction as p. The constraint that the dot product of λ and p is greater than zero just means that λ points roughly towards p rather than away from it.

Now, think back to what we saw earlier in this video, where we sent our particle through a two-stage Stern-Gerlach experiment and supposed that all the magnet does is filter out the particles whose arrow points a little up versus a little down, without actively flipping the arrow. That thought experiment gives us a beam of exactly this kind of particle. We start with the assumption that the incoming particles, those evaporated silver atoms, have a totally randomly oriented λ vector, and then we send them through the first Stern-Gerlach magnet to get a beam that's purely polarized along the axis of that magnet. At that point, what we know about λ is that it's still totally random, but only over the half of the sphere that points along p, because the particles whose λ pointed away from p were sent into the spin-down beam and didn't go forward.
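The filtering story amounts to rejection sampling: draw isotropic unit vectors and keep only those with λ·p > 0. This sketch assumes NumPy; the function name and the idealized "keep or discard" filter are illustrative, not a model of real Stern-Gerlach physics.

```python
import numpy as np

def hemisphere_by_filtering(p, n, rng):
    """Sample λ uniformly on the hemisphere λ·p > 0 by rejection.

    Start with n isotropic unit vectors (the incoming beam) and keep only
    those pointing 'towards' p (the idealized spin-up beam).
    """
    p = np.asarray(p, dtype=float)
    p = p / np.linalg.norm(p)
    lam = rng.normal(size=(n, 3))
    lam /= np.linalg.norm(lam, axis=1, keepdims=True)
    return lam[lam @ p > 0]

rng = np.random.default_rng(2)
p = np.array([0.0, 0.0, 1.0])
lam = hemisphere_by_filtering(p, 100_000, rng)
print(len(lam) / 100_000)  # fraction of the incoming beam that passes the filter
print((lam @ p).mean())    # average of λ·p over the surviving particles
```

Roughly half of the incoming particles pass, and the mean of λ·p comes out near 1/2, which is its exact average over a uniform hemisphere.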
So the question then comes up: what happens if we measure the spin of this kind of particle along some axis a? Well, we already know what the expectation value is going to be. The expectation value of the spin of this kind of particle, from quantum mechanics and from experiment, is the cosine of the tilt angle of the second detector relative to the first. In this language, it's the cosine of the angle θ between the polarization vector p and the measurement vector a.

So suppose that, as we build our hidden variable model, we speculate that the result of measuring along some axis a is the sign of the hidden variable vector λ dotted with an effective measurement axis a′:

A(a, λ) = sign(λ·a′)

See, we're going to have to do a sketchy move here of the kind we talked about earlier. a′ is a unit vector that depends on a and p in a way to be specified. We'll talk about exactly what it has to be in a moment, but this is the same kind of sketchy move we looked at earlier when we were thinking about how to modify our hidden variable model into something that matches the data. In fact, the example we looked at earlier in the video is mathematically equivalent to what we're talking about now. Oh, and the sign function here simply takes the value +1 or −1 according to the sign of its argument. So the sign of the dot product of λ and the effective measurement axis a′ is positive if λ points roughly along a′ and negative if λ points roughly away from a′. All this is to say: the measurement result is spin up if λ is in the hemisphere whose pole is a′, and spin down if λ is outside that hemisphere.
And then you might ask: what if λ lies right on the equator relative to the pole a′? Well, the probability of λ being exactly on the equator is zero, so we don't have to worry about it. As Bell says in his paper, "Actually this leaves the result undetermined when λ·a′ = 0, but as the probability of this is zero we will not make special prescriptions for it." Now, if you average over all possible hidden variable vectors λ in the setup we've described, the expectation value of the spin measurement is 1 − 2θ′/π. Call that equation 5:

⟨σ·a⟩ = 1 − 2θ′/π    (5)

where θ′ is the angle between the effective measurement axis a′ and the polarization vector p. That's the same θ′ from the sketchy move we talked about earlier. So let's see where equation 5 comes from. Why does this model give us an expectation value of 1 − 2θ′/π?
Well, the reason is that the expectation value of the spin measurement along the measurement axis a, under the rule we've stipulated here, is the probability that the lambda vector lies in the hemisphere with a′ at its pole times +1 for the spin-up result, plus the probability that lambda is not in a′'s hemisphere times the -1 value that goes with the spin-down measurement.
So we're thinking like a gambler here, calculating an expectation value. And when we think about it this way, we realize that the expectation value of the spin measurement reaches its maximum value of 1 when the angle θ′ is zero. That is, when the polarization vector is exactly aligned with the effective measurement axis a′, we always get spin up, a 100% guarantee, because the possible lambda vectors all lie in the hemisphere centered on the polarization vector. So if p and a′ point in exactly the same direction, lambda is guaranteed to be in a′'s hemisphere, and you always get +1. Conversely, if the polarization vector p is completely antiparallel to the effective measurement axis a′, that is, if θ′ is π (180°), then we always get -1, a spin-down measurement. With the polarization vector pointing completely away from a′, the space of possible lambda vectors is precisely the hemisphere opposite a′'s, so you always measure spin down. And if you think about rotating the polarization vector p relative to a′ and look at the overlap of the hemispheres of p and a′, you see that the overlap varies linearly with the angle θ′.
This goes back to what we were talking about earlier with the board game spinner: you spin the needle, and the probability of it landing somewhere simply depends on the area of the wedge it lands on. As you rotate θ′, the expectation value varies linearly with θ′ for precisely the same reason. You can picture it as a two-dimensional circle with a board game spinner, or in the full three dimensions, like an orange where the volume of a slice goes along with its wedge angle. Either way, this model gives an expectation value of the spin measurement that depends linearly on θ′. So if you take the two boundary conditions we've looked at, θ′ = 0 and θ′ = π, apply the fact that this is a linear function, and just think in terms of y = mx + b, you see that the expectation value of the spin measurement is necessarily 1 - 2θ′/π.
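For anyone who wants the gambler's calculation spelled out, here is one way to write it, assuming (as in Bell's section 3A) that λ is uniform over the hemisphere λ·p > 0 and writing A = sign(λ·a′) for the outcome. The only extra ingredient, not stated in the video, is the standard spherical-geometry fact that a lune of angle θ′ on the unit sphere has area 2θ′, so the part of p's hemisphere lying outside a′'s hemisphere is a fraction θ′/π of it.

```latex
\begin{aligned}
\langle A \rangle
  &= (+1)\,\Pr[\lambda\cdot a' > 0] + (-1)\,\Pr[\lambda\cdot a' < 0]
   = 1 - 2\,\Pr[\lambda\cdot a' < 0],\\[4pt]
\Pr[\lambda\cdot a' < 0]
  &= \frac{\text{area of lune of angle } \theta'}{\text{area of hemisphere}}
   = \frac{2\theta'}{2\pi} = \frac{\theta'}{\pi},\\[4pt]
\langle A \rangle &= 1 - \frac{2\theta'}{\pi}. \qquad (5)
\end{aligned}
```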
And as we know, this linear function is not what quantum mechanics predicts, and it doesn't match the experimental data: in both cases the result is the cosine of the angle, not a linear function of the angle.
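A quick numerical sanity check of equation 5 might look like the sketch below. It is a Monte Carlo simulation under the section 3A assumptions (λ uniform on the hemisphere λ·p > 0, outcome sign(λ·a′)); the function names, the seed, and the choice of p along the z axis are illustrative assumptions, not anything from Bell's paper. It prints the simulated expectation next to 1 - 2θ′/π and cos θ′, so the linear-versus-cosine mismatch is visible.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly random direction on the unit sphere (normalized 3D Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return [x / norm for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sign(x):
    # +1 or -1; x == 0 happens with probability zero, so it is simply lumped with -1.
    return 1 if x > 0 else -1


def expectation_model_3a(theta_prime, n=100_000, seed=0):
    """Monte Carlo estimate of the average of sign(lambda . a') when lambda is
    uniform on the hemisphere lambda . p > 0 and a' makes angle theta_prime with p."""
    rng = random.Random(seed)
    p = [0.0, 0.0, 1.0]  # polarization vector along z (arbitrary choice)
    a_eff = [math.sin(theta_prime), 0.0, math.cos(theta_prime)]
    total = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        if dot(lam, p) < 0:
            lam = [-x for x in lam]  # reflect into p's hemisphere; stays uniform there
        total += sign(dot(lam, a_eff))
    return total / n


if __name__ == "__main__":
    for deg in (0, 45, 90, 135, 180):
        tp = math.radians(deg)
        print(f"theta'={deg:3d} deg  MC={expectation_model_3a(tp):+.3f}  "
              f"1-2theta'/pi={1 - 2 * tp / math.pi:+.3f}  cos={math.cos(tp):+.3f}")
```

The simulated column should track the straight line 1 - 2θ′/π to within sampling noise, and differ visibly from cos θ′ at 45° and 135°.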
But here's where the sketchy move comes in, and here's why we have a′ instead of just a. Suppose then that a′ is obtained from a by rotation towards the polarization vector p until 1 - 2θ′/π = cos θ. Call that equation 6, where θ is the angle between the measurement axis a and the polarization vector p. That's the sketchy move we use to warp the linear function into a cosine function. If we apply equation 6, we get the desired result: the expectation value of the spin measurement is cos θ, which agrees with quantum physics and with the experimental data.
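Equation 6 can be solved explicitly for θ′: 1 - 2θ′/π = cos θ gives θ′ = (π/2)(1 - cos θ). The sketch below (function names are illustrative) computes that effective angle and confirms that plugging it back into equation 5 returns cos θ. For 0 < θ < π/2 it gives θ′ < θ, i.e. a′ ends up tilted toward p.

```python
import math


def effective_angle(theta):
    """Equation 6 solved for theta': the angle between a' and p such that
    1 - 2*theta'/pi == cos(theta)."""
    return (math.pi / 2) * (1 - math.cos(theta))


def expectation_eq5(theta_prime):
    """Equation 5: the hidden variable model's expectation value as a function of theta'."""
    return 1 - 2 * theta_prime / math.pi


for deg in (0, 30, 60, 90, 120, 180):
    theta = math.radians(deg)
    tp = effective_angle(theta)
    print(f"theta={deg:3d} deg  theta'={math.degrees(tp):7.2f} deg  "
          f"eq5={expectation_eq5(tp):+.4f}  cos(theta)={math.cos(theta):+.4f}")
```

For example, θ = 60° maps to θ′ = 45°, and the "eq5" and "cos(theta)" columns agree row by row.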
So technically we haven't done anything illegal here. We haven't broken any rules, and this model therefore cannot be completely dismissed, though it is contrived and implausible. We don't really want to have to believe it: we have a detector oriented along the vector a, and we have to stipulate that what's actually happening is that the effective measurement axis is bent a little bit in towards the polarization vector. Well, you can say that, but why would that be the case? This is not a very convincing model, but we will not dismiss it on the basis that it's not convincing. Instead we're going to say, look, it's possible. We're not going to rule it out just yet. By lowering the epistemic standards for the hidden variable model here, we hold ourselves to a higher standard when we later rule out all possible local hidden variable models.
Because then we'll be able to say, look, we went along with the sketchy move. We allowed it. But even allowing that, the proof later on is so strong that, despite our generosity here, despite being maximally charitable to the local hidden variable perspective, we're going to show that it just doesn't work. All right. So in this simple case there is no difficulty in the view that the result of every measurement is determined by the value of an extra variable λ, and that the statistical features of quantum mechanics arise because the value of this variable is unknown in individual instances. That is, in this particular case we can come up with a story involving local hidden variables, and it kind of appears to work, even though it is a little bit sketchy.
Okay, so part 3 of the paper then goes on to show that hidden variables also seem to work for special cases in which the two detectors have special orientations for their measurement axes. Secondly, there is no difficulty in reproducing, in the form of equation 2 (the correlation function based on local hidden variables), the only features of the quantum mechanical and experimental correlation function (3) commonly used in verbal discussions of this problem: P(a, a) = -P(a, -a) = -1, and P(a, b) = 0 if a·b = 0. The first case is when our two measurement directions are the same, so we have P(a, a), because a and b are identical when they're aligned the same way. That equals the negative of the correlation we would find when b = -a, and its value is -1. So when the unit vectors a and b are aligned the same way, we get a perfect anti-correlation of -1, and when a and b are oppositely aligned, we get a perfect correlation of 1. The other special case is when the dot product of a and b equals zero, that is, when a and b are perpendicular to each other, in which case we have no correlation.
So: aligned the same way, P is -1; aligned opposite ways, P is 1; perpendicular, P is zero. And these three special cases can be explained by a local hidden variable model. For example, let λ now be a unit vector with uniform probability distribution over all directions, and take the rules of equation 9: A(a, λ) = sign(a·λ) and B(b, λ) = -sign(b·λ). That is, the measurement outcome A, as a function of the unit vector a and the hidden variable vector λ, is the sign of a·λ, and conversely, the measurement outcome B, as a function of the unit vector b and λ, is the negative sign of b·λ. By the way, there's a typo here in Bell's paper: it's written as B as a function of a and b, but it should be B as a function of b and λ. All right, so what are we doing here? We have the two particles in the singlet state, and we're going to attach a unit vector to this pair of particles. So you can imagine particles 1 and 2 both carrying along this orientational piece of information.
This unit vector λ is chosen totally at random out of all possible directions. Then, when particle 1 gets to detector A, if λ points roughly along the direction of a, that is, if the dot product of a and λ is positive, you measure spin up for particle 1 at detector A. Likewise, when particle 2 is measured at detector B, if λ points roughly in the same direction as b, you measure spin down at B. So this model is roughly what we might instinctively expect is happening with a pair of particles with entangled spins: you might expect that there's some kind of orientational quantity that each particle intrinsically has but that quantum mechanics doesn't account for, and that this hidden variable, which carries a kind of orientation with it, is what predetermines how particles 1 and 2 are going to be measured at A and B respectively.
The claim, then, is that the rule given by equation 9 works in the special cases where the vectors a and b are perfectly parallel, perfectly antiparallel, or perfectly perpendicular, and you can show that that's the case. In the first case, imagine a and b being perfectly parallel. Then in equation 9 you see that the measurement outcomes at A and B are equal and opposite, because for A we have sign(a·λ), and if a and b are the same vector, then for B the rule gives -sign(b·λ), where b·λ equals a·λ. So the outcome at B is the negative of the outcome at A. Therefore we find perfect anti-correlation in the case that the unit vector a equals the unit vector b.
Likewise, if you reverse that logic and look at rule 9 in the case that a and b are antiparallel, so b = -a, then the measurement outcome at detector A is sign(a·λ), and the measurement outcome at detector B is -sign(b·λ). In this case b·λ equals -a·λ, and you can carry that negative sign outside of the sign function, so the two negatives cancel and the measurement outcome at B is sign(a·λ), precisely the same as the measurement outcome at A. So when the measurement directions a and b are perfectly antiparallel, this local hidden variable model gives a perfect correlation of 1 for the measurement outcomes, and in that case the model works just fine. Finally, for the case that a and b are perpendicular, whatever the measurement outcome is at A, you have a 50/50 chance of it being the same or the opposite at B. So in that case too, this model works just fine.
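Here is a minimal sketch of rule (9) together with a check of the three special cases. λ is sampled uniformly on the sphere; the helper names, the seed, and the sample size are illustrative assumptions. For parallel and antiparallel settings the product A·B comes out the same (-1 or +1) for every sampled λ, while for perpendicular settings it only averages to roughly zero.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly random direction on the unit sphere (normalized 3D Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return [x / norm for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sign(x):
    return 1 if x > 0 else -1  # x == 0 has probability zero


def outcome_A(a, lam):
    """Rule (9), first half: A(a, lambda) = sign(a . lambda)."""
    return sign(dot(a, lam))


def outcome_B(b, lam):
    """Rule (9), second half: B(b, lambda) = -sign(b . lambda)."""
    return -sign(dot(b, lam))


def products(a, b, n=20_000, seed=1):
    """Outcome products A*B for n hidden variables drawn uniformly on the sphere."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        lam = random_unit_vector(rng)
        out.append(outcome_A(a, lam) * outcome_B(b, lam))
    return out


a = [0.0, 0.0, 1.0]
cases = {"parallel": a, "antiparallel": [0.0, 0.0, -1.0], "perpendicular": [1.0, 0.0, 0.0]}
for name, b in cases.items():
    prods = products(a, b)
    print(f"{name:13s} min={min(prods):+d} max={max(prods):+d} mean={sum(prods) / len(prods):+.3f}")
```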
But again, this model has a flaw: just like what we saw before in part 3A, the dependence of the measurement correlation on the angle θ between the vectors a and b is linear in θ. It's not the -cos θ that we expect from quantum physics and that is observed in experiments.
To see that, let's draw a picture in which we imagine all possibilities for λ, selected uniformly across all possible directions. Then we draw the measurement direction a and consider the hemisphere of all vectors that point roughly in the same direction as a, that is, all vectors whose dot product with a is positive. The measurement result at detector A is spin up if λ is in the same hemisphere as a, or spin down if λ is in the opposite hemisphere. So we have a 50/50 chance of measuring spin up or spin down, which is in agreement with experiment. But things get a little tricky when you also draw the measurement direction b at detector B and apply the same reasoning to the measurement result there. In this case, the result is spin down if λ is in the same hemisphere as b. Spin down, because we're in the singlet state, where the spins are anti-correlated, and that's encoded in the minus sign in the second part of equation 9. Conversely, detector B measures spin up if λ is not in the same hemisphere as the measurement direction b.
Now imagine this as an animation in which we sweep the angle θ while considering simultaneously all possibilities for the hidden variable λ, uniformly distributed over the sphere. You may as well picture it as a circle or as a sphere, because in either case the area or the volume, respectively, changes the same way as a function of θ. Then just think about the probability of getting the same outcome at both detectors versus the probability of getting opposite outcomes. What you realize is that you get the same outcome at both detectors when λ is in the hemisphere of one of the measurement directions but not in the hemisphere of the other. So in this animation, if you look at the two sectors with the blue arc, for both of those sectors you get the same measurement outcome at A and B, and so the product of the outcomes at A and B equals 1 if λ lies in one of the two blue sectors shown here. On the other hand, if λ is in the hemispheres of both measurement directions or of neither, you get opposite outcomes at the two detectors, and the product of the outcomes A and B is -1.
So to find the correlation, all we have to do is compare the area of the blue sectors to the area of the red sectors. The formula is just +1 times the fraction of the circle taken up by the blue sectors, minus 1 times the fraction of the circle taken up by the red sectors.
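Putting numbers to the blue-versus-red picture: the two blue sectors together span an angle 2θ out of 2π, so λ lands in them with probability θ/π, and the correlation should be (+1)(θ/π) + (-1)(1 - θ/π) = -1 + 2θ/π. The sketch below, with illustrative names, seed, and sample size, estimates P(a, b) for rule (9) by Monte Carlo and prints it beside that straight line and beside the quantum prediction -cos θ.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly random direction on the unit sphere (normalized 3D Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return [x / norm for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sign(x):
    return 1 if x > 0 else -1  # x == 0 has probability zero


def correlation_model_3b(theta, n=50_000, seed=2):
    """Monte Carlo estimate of P(a, b) = mean of A*B under rule (9),
    with a along z and b at angle theta from a in the x-z plane."""
    rng = random.Random(seed)
    a = [0.0, 0.0, 1.0]
    b = [math.sin(theta), 0.0, math.cos(theta)]
    total = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        total += sign(dot(a, lam)) * -sign(dot(b, lam))  # A(a, lam) * B(b, lam)
    return total / n


def correlation_linear(theta):
    """Blue-minus-red area fraction: (+1)(theta/pi) + (-1)(1 - theta/pi)."""
    return -1 + 2 * theta / math.pi


for deg in range(0, 181, 30):
    t = math.radians(deg)
    print(f"theta={deg:3d} deg  MC={correlation_model_3b(t):+.3f}  "
          f"-1+2theta/pi={correlation_linear(t):+.3f}  -cos={-math.cos(t):+.3f}")
```

The Monte Carlo column should follow the straight line, and the three columns agree only at 0°, 90°, and 180°.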
As we sweep θ around, we can see the linear dependence of the correlation on θ. And this linear dependence of the correlation on θ, which we've now seen a few times in a few different contexts, is really at the heart of Bell's argument, as we're going to see in part 4.
So in part 3B of the paper, Bell shows us that the local hidden variable model does work for the three special cases where a and b are parallel, antiparallel, or perpendicular.
And when you look at the plot of the correlation we get from our local hidden variable model, this blue line, and compare it to the quantum mechanical correlation we would expect, namely -cos θ, you see that even though the two curves are different, they intersect at precisely these three special cases.
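One way to confirm that the line and the cosine meet only at these three angles: their difference d(θ) = -cos θ - (-1 + 2θ/π) is exactly zero at θ = 0 and θ = π, and its derivative sin θ - 2/π vanishes only twice on [0, π], so d can have at most three zeros there. The sketch below (grid size and names are arbitrary, illustrative choices) checks the endpoints and scans the interior on a midpoint grid that never lands exactly on a crossing; it should find a single interior sign change, near 90°.

```python
import math


def difference(theta):
    """Quantum minus local-model correlation: -cos(theta) - (-1 + 2*theta/pi)."""
    return -math.cos(theta) - (-1 + 2 * theta / math.pi)


def interior_sign_changes(n=100_000):
    """Angles (radians) in (0, pi) where the difference changes sign, found on a
    midpoint grid that never lands exactly on 0, pi/2 or pi (n should be even)."""
    ts = [math.pi * (i + 0.5) / n for i in range(n)]
    ds = [difference(t) for t in ts]
    return [(ts[i] + ts[i + 1]) / 2 for i in range(n - 1) if ds[i] * ds[i + 1] < 0]


print("difference at 0, pi/2, pi:", difference(0.0), difference(math.pi / 2), difference(math.pi))
print("interior crossings (deg):", [round(math.degrees(t), 3) for t in interior_sign_changes()])
```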
So part 3B of Bell's paper is all about saying, yes, the local hidden variable model does seem to work for those three special cases. But nonetheless, the local hidden variable model breaks down for anything other than those three special cases, because a line is not a cosine. And there are actually a couple of ways in which a line is not a cosine. The most obvious one is that the two curves simply don't match for most values: pick a θ at random, and -cos θ is just not the same value as what our linear correlation gives us. But the other noticeable difference between the linear correlation from our local hidden variable model and the quantum mechanical correlation is that the linear correlation has a nonzero slope at θ = 0, whereas the quantum mechanical correlation has a flat slope of zero at θ = 0.
This is a fairly subtle difference between the two correlation functions, but it is a difference nonetheless, and one that's totally generic to all local hidden variable models. So one of the things we're going to prove in part 4A of this paper is that any local hidden variable model has a nonzero slope at a theta angle of zero.
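To make "a line is not a cosine" concrete, here's a small Python sketch, just an illustration and not something from Bell's paper, that evaluates the local hidden variable line (which Bell writes as equation 10, P = -1 + 2θ/π) against the singlet correlation -cos θ of equation 3. It checks the three special angles, one generic angle, and the slope at θ = 0 using a one-sided finite difference. The function names and the step size h are illustrative choices.

```python
import math

def p_local(theta: float) -> float:
    """Equation (10): correlation of the local hidden variable line."""
    return -1.0 + 2.0 * theta / math.pi

def p_quantum(theta: float) -> float:
    """Equation (3) for the singlet state: -a.b = -cos(theta)."""
    return -math.cos(theta)

def slope_at(f, theta: float, h: float = 1e-6) -> float:
    """Forward-difference slope; one-sided because theta = 0 is the edge of [0, pi]."""
    return (f(theta + h) - f(theta)) / h

angles = {
    "parallel": 0.0,
    "perpendicular": math.pi / 2,
    "antiparallel": math.pi,
    "generic (pi/4)": math.pi / 4,
}
for name, theta in angles.items():
    print(f"{name:>15}: line = {p_local(theta):+.4f}   -cos = {p_quantum(theta):+.4f}")

print(f"slope at theta = 0: line = {slope_at(p_local, 0.0):.4f}, "
      f"-cos = {slope_at(p_quantum, 0.0):.4f}")
```

The two agree at 0, π/2, and π, but at π/4 the line gives -0.5 while -cos gives about -0.707, and the line's slope at 0 is 2/π (about 0.64) while the cosine's is essentially zero.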
This animation gives us a great intuition for how the local hidden variable model produces a correlation that depends linearly on the angle theta between the vectors a and b. And so Bell goes on to say that this gives a correlation
$P(\vec a, \vec b) = -1 + \frac{2\theta}{\pi}$   (10)
where theta is the angle between the vectors a and b, and (10) has the properties of equation 8, that is, it works for the three special cases. Of course, the precise form of equation 10, with its factor of 2/π, is just y = mx + b with slope 2/π and intercept -1: it's exactly what it has to be for a line to pass through the boundary conditions given by equation 8. But notice that the blue curve and the purple curve are not the same in general. Not only do their values not match in general, but at theta = 0 the blue line also has a nonzero slope, whereas the purple quantum curve has a slope of zero.
Now here Bell abruptly brings up a very important point, although the way he brings it up so suddenly is kind of jarring. Following the paper: for comparison, consider the result of a modified theory in which the pure singlet state is replaced in the course of time by an isotropic mixture of product states. This gives the correlation function
$-\frac{1}{3}\,\vec a \cdot \vec b$   (11)
Now, what does that mean? That sentence just comes out of nowhere, right? Bell is communicating a lot in this one sentence, so I want to take a moment to unpack exactly what he means, because it's actually a really profound point.
Our purple curve, negative cosine theta, for the correlation between the measurement outcomes at detectors A and B, is based on the two particles being in the singlet spin state, where before the measurement neither particle has a preferred spin direction. Even so, the spin measurement outcomes for the two particles are guaranteed to be anti-correlated along the same measurement axis, whatever that axis may be. On the other hand, suppose that instead of the singlet state, the two particles already have some preferred spin direction before they're measured, but their spins are still equal and opposite relative to that direction. Then we would expect anti-correlated spin measurements if the particles are measured along that particular direction. But if the particles are measured perpendicularly to that direction, we would expect no correlation between the spin outcomes of the two particles.
So what Bell means by an isotropic mixture of product states is this. Imagine that when we produce these particles, instead of being in the singlet state, with pure rotational symmetry and no preferred spin axis a priori, each particle pair does have an intrinsic preferred spin direction relative to which the two spins are equal and opposite. "Isotropic" just means that this direction, call it n-hat, is selected uniformly from the sphere, so each pair's preferred direction is totally random. Now imagine measuring many such pairs, and for the sake of argument, suppose we measure both particles along the same axis a. Sometimes the spin axis n will be aligned with a, but usually it won't be very well aligned, in which case we won't see much of a correlation. When you work out the math of what correlation strength we should expect on average, you find a correlation that is the same as for the singlet state, but divided by a factor of three. That factor of three reflects the fact that when you average over all three dimensions of space, more often than not our measurement directions are not going to be aligned with the spin direction n.
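Here's a minimal Monte Carlo sketch of that averaging in Python. The modelling choices are assumptions made for this illustration: each pair is in the product state "spin up along n" for particle 1 and "spin up along -n" for particle 2, n is uniform on the sphere, and a spin-1/2 prepared along n gives +1 along a with probability (1 + a·n)/2. For a fixed n the average product of outcomes is then (a·n)(-b·n), and averaging n over the sphere turns that into -a·b/3, which is equation 11.

```python
import math
import random

def random_unit_vector(rng: random.Random) -> list[float]:
    """Uniform direction on the sphere: normalize a 3D standard Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def dot(u, v) -> float:
    return sum(x * y for x, y in zip(u, v))

def mixture_correlation(a, b, n_pairs: int = 100_000, seed: int = 1) -> float:
    """Estimate <A*B> for an isotropic mixture of 'up along n' x 'down along n' pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_pairs):
        n = random_unit_vector(rng)
        # Particle 1 points along +n: P(+1 along a) = (1 + a.n) / 2.
        outcome_a = 1 if rng.random() < (1 + dot(a, n)) / 2 else -1
        # Particle 2 points along -n: P(+1 along b) = (1 - b.n) / 2.
        outcome_b = 1 if rng.random() < (1 - dot(b, n)) / 2 else -1
        total += outcome_a * outcome_b
    return total / n_pairs

theta = math.pi / 3
a = [1.0, 0.0, 0.0]
b = [math.cos(theta), math.sin(theta), 0.0]
print(f"simulated mixture: {mixture_correlation(a, b):+.3f}")
print(f"-a.b / 3:          {-dot(a, b) / 3:+.3f}")
print(f"singlet -a.b:      {-dot(a, b):+.3f}")
```

Calling mixture_correlation(a, a), with both detectors on the same axis, gives roughly -1/3 instead of the singlet's -1: that is the factor-of-three weakening.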
So we actually see a very strong theoretical and experimental difference between the singlet state and a situation where the particles have equal and opposite spin along some random axis. The correlation we get from the singlet state is weirdly strong, in a surreal kind of way. This reflects the fact that in the singlet state neither particle has a preferred direction before it's measured. If you think of one particle as being measured ever so slightly before the other, then that first measurement is guaranteed to collapse the wave function along its own measurement direction. So in the singlet state, the spin direction that matters is always lined up with one of the measurement axes, whereas for an isotropic mixture of product states you generally don't get this kind of alignment.
All right. Bell then goes on to say that it is probably less easy, experimentally, to distinguish (10) from (3) than (11) from (3). Equation 10 is the linear correlation that we get from our local hidden variable model. Equation 11 is the $-\frac{1}{3}\,\vec a \cdot \vec b$ correlation, that is, negative cosine theta over 3, that we get from a quantum mechanical model in which the two particles are not in the singlet state but rather in an isotropic mixture of product states, each pair with its own preferred direction. What Bell is saying is that there's a big contrast in the experimental data between a singlet state and an isotropic mixture of product states, whereas the linear correlation from a local hidden variable model is a much closer approximation to the actual quantum mechanical singlet correlation. So that's just a point about experimental practicality.
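One crude way to see Bell's practicality point is to compare the largest gap between each curve and the singlet curve over 0 ≤ θ ≤ π. The Python sketch below does that on a uniform grid. Using the maximum gap as a stand-in for "how hard it is to distinguish experimentally" is a simplification made for this illustration, not Bell's own criterion.

```python
import math

def p_singlet(theta):      # equation (3): -a.b = -cos(theta)
    return -math.cos(theta)

def p_local(theta):        # equation (10): linear local hidden variable correlation
    return -1.0 + 2.0 * theta / math.pi

def p_mixture(theta):      # equation (11): -a.b / 3
    return -math.cos(theta) / 3.0

def max_gap(f, g, steps=10_000):
    """Largest |f - g| on a uniform grid over theta in [0, pi]."""
    return max(abs(f(t) - g(t)) for t in (math.pi * k / steps for k in range(steps + 1)))

gap_10 = max_gap(p_local, p_singlet)
gap_11 = max_gap(p_mixture, p_singlet)
print(f"max |(10) - (3)| = {gap_10:.3f}")
print(f"max |(11) - (3)| = {gap_11:.3f}")
```

The linear model's worst-case gap is only about 0.21 (near θ ≈ 0.69 and θ ≈ 2.45 rad, where sin θ = 2/π), while the mixture's is 2/3, reached at θ = 0 and θ = π.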
Now, before moving on from part 3B, Bell makes one final comment: unlike equation 3, the quantum mechanical correlation negative cosine of theta, the function of equation 10, the linear correlation we get from the local hidden variable model, is not stationary. That is, its slope is nonzero at the minimum value of -1, where theta equals 0. We talked about that earlier when looking at the differences between the blue line and the magenta curve, that is, between the local hidden variable model and the quantum mechanical correlation.
One of the differences is that the values are in general not the same. But another difference is that the quantum mechanical correlation has a slope of zero at its minimum value, whereas the local hidden variable line does not. It will be seen in part 4A that this is characteristic of functions of type (2), that is, correlations given by a local hidden variable model. So in part 4A we're going to prove that any local hidden variable model has a nonzero slope in its correlation function at the minimum value, which is incompatible with quantum mechanics and with the experimental data. And then in part 4B, we're going to prove that the correlation curves for a local hidden variable model and for quantum mechanics cannot, in general, take on the same values everywhere.
So in part four, we're going to prove in two different ways that local hidden variable models are compatible neither with quantum mechanics nor with the experimental data.
Okay, so then Bell wraps up part three by talking about how a hidden variable model could work if we allow for non-locality. Thirdly and finally, there is no difficulty in reproducing the quantum mechanical correlation (3) if the results A and B in (2), the correlation function of the local hidden variable model, are allowed to depend on b and a respectively, as well as on a and b. Bell shows this by saying that if we do a non-local sketchy move, we can warp the blue line into the magenta curve. The reasoning is exactly the same as before, when we thought about doing a sketchy move to warp the line into the curve. The key difference now is that when you have two entangled particles separated in space, you can't do this sketchy move unless you know the angle between the measurement directions a and b, which are chosen outside each other's light cones. So this is a non-local sketchy move, because somehow what's happening at detector A depends on the measurement axis at detector B, and vice versa.
As a concrete example, we can replace the vector a in equation 9 by an effective measurement axis a′, obtained from a by rotation towards b until
$1 - \frac{2}{\pi}\,\theta' = \cos\theta$
where θ′ is the angle between a′ and b. If you make that sketchy move, then the blue line warps into the magenta quantum curve, and we would have a match between our hidden variable model, quantum mechanics, and the experimental data. This is exactly the same reasoning as the sketchy moves we looked at before; in fact, it's exactly the same mathematical maneuver. However, for given values of the hidden variables, the results of measurements with one magnet now depend on the setting of the distant magnet, which is just what we would wish to avoid: non-locality.
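To see both the local line and the non-local fix in action, here's a Monte Carlo sketch in Python of Bell's equation 9 model, in which λ is a unit vector uniformly distributed over all directions, A(a, λ) = sign(a·λ), and B(b, λ) = -sign(b·λ). With nonlocal_=False it should reproduce the line of equation 10. With nonlocal_=True it swaps a for a′ at the angle θ′ = (π/2)(1 - cos θ) from b, which is just the condition 1 - 2θ′/π = cos θ solved for θ′, and it should reproduce -cos θ. The geometry (both axes in the x-y plane, b along x), the sample count, and the seed are illustrative choices. Notice that computing θ′ requires knowing θ, the angle to the distant setting b: that's the non-locality in code form.

```python
import math
import random

def sign(x: float) -> int:
    return 1 if x >= 0 else -1

def in_plane(angle: float) -> list[float]:
    """Unit vector in the x-y plane at `angle` radians from the x axis."""
    return [math.cos(angle), math.sin(angle), 0.0]

def random_unit_vector(rng: random.Random) -> list[float]:
    """Hidden variable lambda: a direction drawn uniformly from the sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def dot(u, v) -> float:
    return sum(x * y for x, y in zip(u, v))

def bell_model_correlation(theta: float, nonlocal_: bool,
                           n: int = 100_000, seed: int = 7) -> float:
    """Estimate P(a, b) for A = sign(a.lambda), B = -sign(b.lambda); theta = angle(a, b)."""
    rng = random.Random(seed)
    b = in_plane(0.0)
    if nonlocal_:
        # Effective axis a' at angle theta' from b, chosen so 1 - 2*theta'/pi = cos(theta).
        # Detector A can only do this if it knows theta, i.e. the distant setting b.
        angle_used = (math.pi / 2) * (1 - math.cos(theta))
    else:
        angle_used = theta  # purely local: detector A uses its own axis a
    a_used = in_plane(angle_used)
    total = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        total += sign(dot(a_used, lam)) * -sign(dot(b, lam))
    return total / n

theta = math.pi / 3
print(f"local (9):    {bell_model_correlation(theta, False):+.3f}  vs (10): {-1 + 2 * theta / math.pi:+.3f}")
print(f"non-local a': {bell_model_correlation(theta, True):+.3f}  vs -cos: {-math.cos(theta):+.3f}")
```

At θ = π/3 the local version lands near -1/3, on the line, and the non-local version lands near -0.5, on the cosine.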
And there's really no way around that non-locality. Looking at the example shown here, where we replaced a with a′, you might think there's some way to do the sketchy move differently that doesn't violate locality. Well, try it, and you find it doesn't work. For example, what if instead of rotating a into a′, we leave a alone and rotate b into some b′ that gives the same result? That would require b′ to be rotated towards a, and again it's the same thing. In fact, by symmetry, the reasoning is the same as before, except that now what's happening at detector B is somehow bent towards the measurement direction a. So really, it's the same kind of nonsense.
Philosophically, we might also expect some symmetry here. So if we wanted an idea like this to work, maybe we should bend a to a′ and b to b′, where a′ is bent towards b and b′ is bent towards a in an equal and opposite way. But then both detectors know something about how the other detector is configured. So fundamentally it's exactly the same problem, no matter how you look at it.
So, reflecting on part three, we've seen some specific examples of how hidden variable models don't really work: they just don't match the experimental data, whereas quantum mechanics does. What follows in part four is going to be very abstract, very mathematical, very algebraic, and we're going to take our time with it, because it's a whole lot of equations and symbols. But if you followed along with part three, then you already have the fundamental insight required to make sense of part four. All we're doing in part 4 is generalizing this specific example to show, first, that every local hidden variable model has a correlation function with nonzero slope at its minimum value, which contradicts quantum mechanics and the experimental data. And second, in part 4B, we're going to show that in general the correlation function given by a local hidden variable model cannot take the same value as the correlation given by quantum mechanics and experiment at every theta, that is, for every possible configuration of the measurement axes a and b. So it's the same kind of reasoning that we've seen in part three, just in a much more abstract and generic way.
And the abstraction is worth it. Even though it is somewhat impenetrable and takes a lot of time to digest, it leads to a very powerful result. So, as usual, ask not for easier equations, but for stronger coffee. You've got to prepare yourself for this, because it's going to be a bit of work, but it is well worth the effort.
All right, my friends. We're now ready to approach the core argument of Bell's paper: part four, contradiction.
Okay, so in the first part of part four, we're going to show that the correlation function from a local hidden variable model cannot be stationary at its minimum value, at theta equals 0, unlike the quantum correlation, which is stationary, that is, it has zero slope at its minimum value at theta equals 0. So this is going to be a generic difference between the kinds of correlations that local hidden variable models can give us and the correlation that we expect from quantum mechanics, which is also the correlation measured in experiments.
All right: the main result will now be proved. Because ρ is a normalized probability distribution,
$\int d\lambda\, \rho(\lambda) = 1$   (12)
We saw that before. It just means that if you consider every possible configuration of hidden variables and add them all up, each one weighted by its probability, the result is one. In other words, the hidden variables have to be in some configuration or other. Next, recall the properties of equation 1: the measurement outcomes at detectors A and B can only take the values +1 or -1, depending on whether that detector measured spin up or spin down. Now consider the definition of our local hidden variable correlation function in equation 2, where P, a function of the measurement axes a and b, is the integral over all possible configurations of the hidden variables of the result at A times the result at B:
$P(\vec a, \vec b) = \int d\lambda\, \rho(\lambda)\, A(\vec a, \lambda)\, B(\vec b, \lambda)$   (2)
Then, as you can see, this correlation P cannot be less than -1. That is, the lowest value our correlation can take is the perfectly anti-correlated value of -1.
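The reason P can't go below -1 is a one-liner: the product A·B is either +1 or -1 for every λ, so P = ∫dλ ρ(λ) A B ≥ -∫dλ ρ(λ) = -1. Here's a quick Python sketch that checks this on randomly generated toy models. Replacing the integral with a finite sum over a handful of hidden-variable values, and the random model generator itself, are hypothetical simplifications for illustration only.

```python
import random

def discrete_correlation(weights, outcomes_a, outcomes_b) -> float:
    """Finite-sum stand-in for equation (2): sum of rho(lambda) * A(a, lambda) * B(b, lambda)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "rho must be normalized, as in (12)"
    assert all(x in (-1, 1) for x in outcomes_a + outcomes_b), "outcomes are +/-1, as in (1)"
    return sum(w * x * y for w, x, y in zip(weights, outcomes_a, outcomes_b))

rng = random.Random(0)
lowest = 1.0
for _ in range(10_000):
    k = rng.randint(1, 8)                          # number of hidden-variable values
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    weights = [r / total for r in raw]             # normalized rho
    outcomes_a = [rng.choice((-1, 1)) for _ in range(k)]
    outcomes_b = [rng.choice((-1, 1)) for _ in range(k)]
    lowest = min(lowest, discrete_correlation(weights, outcomes_a, outcomes_b))

print(f"lowest P over 10,000 random toy models: {lowest:.6f}")
```

The minimum comes out at -1 (up to floating-point rounding), and it is reached only by toy models in which A = -B on every λ that carries weight.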
And when can it take on that value? Well, as we've seen, the correlation reaches -1 at a = b, that is, when the two measurements are aligned along the same axis. Then, for the singlet state, you get perfectly anti-correlated results: measure spin up at detector A along the axis a, and you know for sure you're going to measure spin down at detector B along an axis b equal to a. We've seen that before; that's nothing new. Now Bell makes a technically nuanced comment: P can reach -1 at a = b only if
$A(\vec a, \lambda) = -B(\vec a, \lambda)$   (13)
except at a set of points λ of zero probability.
Now, this is a technical caveat designed to keep the argument fully generic. We know from experiments that for the singlet state, the measurement result at A along axis a is indeed equal to the negative of the measurement result at B along the same axis a. But because we're trying to rule out all imaginable hidden variable models, you could in theory imagine a model where A(a, λ) is not equal to -B(a, λ) for every single λ. Such a model could have some superfluous configurations of hidden variables where (13) fails, and that's technically fine as long as those configurations have zero probability. So this is a really minor point, and honestly it probably goes without saying, because we know from the experimental data that the result at detector A is always the negative of the result at detector B when you measure along the same axis. You can think of that as an experimental boundary condition.
And if a local hidden variable model disagrees with that, that is, if it goes against equation 13, it can only match the experiment if the values of lambda that violate equation 13 have zero probability of occurring. Anyway, I think the paper probably could have done without that little comment about a set of points lambda of zero probability, but it's in there for the reader who's going to be very pedantic about that.

So, all right then. If we assume equation 13, which is really less of an assumption and more of an experimental boundary condition, then equation 2, the correlation for a local hidden variable model, can be rewritten as equation 14: P as a function of a and b is equal to the negative of the integral, over all possible configurations of hidden variables, each weighted by rho of lambda, of the result at detector A as a function of a and lambda times the hypothetical result at detector A as a function of b and lambda.

Now let's linger on that for a second. What is this second factor, A as a function of b and lambda? Imagine a generic case with our detectors A and B, where detector A is aligned with some axis a and detector B is aligned with some axis b. We know that our correlation depends on the product of the measurement results at detectors A and B. All equation 14 says is that the result at detector B can be thought of as the negative of the result that detector A would measure if it were aligned along the b axis. So the only difference between equation 14 and equation 2 is that the measurement result at B along axis b, as a function of the hidden variables lambda, has been replaced with minus what the measurement at A would have given if A were aligned along the same axis b with the same hidden variables lambda. So this is just a way of writing our correlation entirely in terms of measurement results at detector A.
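For reference, here is a sketch of the three equations in play, written in the usual notation: bold letters for the unit-vector measurement axes and rho(lambda) for the normalized probability density of the hidden variables. It restates the steps just described rather than reproducing the paper's typesetting.

```latex
% Equation 2: correlation in a local hidden variable model
P(\mathbf a,\mathbf b) = \int d\lambda\,\rho(\lambda)\,A(\mathbf a,\lambda)\,B(\mathbf b,\lambda)

% Equation 13: perfect anticorrelation along a common axis
A(\mathbf a,\lambda) = -B(\mathbf a,\lambda)

% Equation 14: apply equation 13 on axis b, i.e. B(b, lambda) = -A(b, lambda)
P(\mathbf a,\mathbf b) = -\int d\lambda\,\rho(\lambda)\,A(\mathbf a,\lambda)\,A(\mathbf b,\lambda)
```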
All right. Next, we're going to let c be another unit vector, an alternative option for b. So imagine c as the alignment of detector B. In fact, at first imagine that c is the same as b, and then give it just a little nudge so that c is slightly different from b. Then the question we can ask is this: imagine two hypothetical scenarios, one with measurement axes a and b, and another with measurement axes a and c, where c is just a little nudge away from b. How do we calculate the difference between the correlations P of a and b and P of a and c? In other words, how much does the correlation change when we apply a small nudge to the axis of detector B?

Well, all we have to do is replace each P with the integral formula given by equation 14, and then we can smush these together into one integral. We get the negative of the integral, over all possibilities for the hidden variables, of A of a and lambda times A of b and lambda, minus A of a and lambda times A of c and lambda. That is how the correlation function changes if we slightly change the measurement axis at detector B from the vector b to the very similar vector c. Now Bell goes on to algebraically massage this integral expression into a different form, shown here.
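Written as a math block in the same notation as equation 14, the difference described above and the rearranged form it is massaged into look like this (a restatement, so the layout is only a sketch):

```latex
\begin{aligned}
P(\mathbf a,\mathbf b) - P(\mathbf a,\mathbf c)
  &= -\int d\lambda\,\rho(\lambda)\,\big[A(\mathbf a,\lambda)A(\mathbf b,\lambda)
       - A(\mathbf a,\lambda)A(\mathbf c,\lambda)\big] \\
  &= \int d\lambda\,\rho(\lambda)\,A(\mathbf a,\lambda)A(\mathbf b,\lambda)
       \big[A(\mathbf b,\lambda)A(\mathbf c,\lambda) - 1\big]
\end{aligned}
```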
To see what he's done here, let's color code this like so. First of all, you see that both parts of the integrand have in common the factor A of a and lambda, so we can factor that out and pull it to the left. The next thing to look at is the top expression, where we have the factor A of b and lambda along with a minus sign. To bring that into the bottom expression, we factor out the term A of b and lambda and bring it to the left, and what remains is just the number one. But then we pull the minus sign from outside the integral to the inside, so that term becomes a -1 inside the brackets from which we factored out A of b and lambda.

The final thing we have to prove is that, in the top expression, the term on the right involving A of c and lambda can be brought down below and turned into the expression A of b and lambda times A of c and lambda. To show that this is a legitimate move, first notice that in the top equation we have two minus signs, which cancel each other out. Then the only question that remains is whether the product of these two purple expressions times A of c and lambda equals A of c and lambda. Well, yes, it does. That purple product is the square of A of b and lambda. But remember, this capital A is the measurement result at detector A, and the only values it can take on are plus or minus one. In either case, the square is 1. So the purple expression collapses to the number one, and the way we've factored things out here is legitimate. What we end up with is the same integral we had before, just massaged into a different form.
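The pointwise algebra behind that rearrangement fits in three lines. Here A_x is a shorthand introduced only for this sketch, standing for A(x, lambda), and the only fact used is that A_b squared equals 1:

```latex
% Shorthand for this sketch only: A_x \equiv A(\mathbf x,\lambda), with A_x = \pm 1
\begin{aligned}
-\big[A_a A_b - A_a A_c\big]
  &= A_a A_b\,(-1) + A_a A_c \\
  &= A_a A_b\,(-1) + A_a A_b\,(A_b A_c) && \text{since } A_b^2 = 1 \\
  &= A_a A_b\,\big[A_b A_c - 1\big]
\end{aligned}
```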
All right. Now Bell claims that this integral expression is less than or equal to another integral. Using equation 1, which is where we specified that the measurement outcomes at detectors A and B can only be +1 or -1, we can show that our integral expression is less than or equal to the integral over rho d lambda of 1 minus A of b and lambda times A of c and lambda.

Now, when I got to this part of the paper, I was looking at it and I was like, "Uh, hm. Okay, [clears throat] why? [laughter] How do we know that's the case?" I was staring at it for a while and I just couldn't figure it out. I don't know if there's supposed to be an easier way to do this. If you look at the two sides of this inequality, on the left side we have something of the form n times (m minus 1), and on the right side we have something of the form 1 minus m. Notice that in both cases the blue expression m is the same number on the left side and the right side. And because n and m are each just a product of two measurement results, we know that each of them is plus or minus 1. So the question just becomes whether n times (m minus 1) is always less than or equal to 1 minus m for the four possibilities of n and m each being plus or minus one. I ended up just checking all four possibilities and verifying that it is true for every one of them. I don't know if there's a more elegant way of demonstrating that, but in any case this way works fine; it's just a little bit tedious.

If you check all four possibilities, you find that this is in fact a legit move, and the integral expression on the left is indeed always less than or equal to the integral expression on the right.
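The four-case check is easy to automate. The sketch below enumerates every sign assignment for the three results A(a, lambda), A(b, lambda), A(c, lambda) and confirms the pointwise inequality n(m - 1) <= 1 - m, with n = A(a, lambda)A(b, lambda) and m = A(b, lambda)A(c, lambda); the helper name `integrand_bound_holds` is hypothetical. The cases can also be skipped entirely: n(m - 1) - (1 - m) = (n + 1)(m - 1), which is never positive because n + 1 >= 0 and m - 1 <= 0. Since rho(lambda) is non-negative, a pointwise inequality between integrands carries over to the integrals.

```python
from itertools import product


def integrand_bound_holds(a_res: int, b_res: int, c_res: int) -> bool:
    """Check n*(m - 1) <= 1 - m for one sign assignment.

    a_res, b_res, c_res stand for A(a, lambda), A(b, lambda), A(c, lambda),
    each +1 or -1 (equation 1). n and m are the products used in the text.
    """
    n = a_res * b_res
    m = b_res * c_res
    return n * (m - 1) <= 1 - m


# All 2**3 = 8 sign assignments; together they cover the four (n, m) cases.
for a_res, b_res, c_res in product((+1, -1), repeat=3):
    n, m = a_res * b_res, b_res * c_res
    left, right = n * (m - 1), 1 - m
    print(f"A_a={a_res:+d} A_b={b_res:+d} A_c={c_res:+d}: "
          f"n={n:+d} m={m:+d}  {left:+d} <= {right:+d}  "
          f"{integrand_bound_holds(a_res, b_res, c_res)}")
```

The same observation also bounds the absolute value of the left-hand integrand directly, since |n(m - 1)| = |m - 1| = 1 - m.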
Okay, then. So that works. But why do we care? What are we doing here? Well, notice this. If we look at that integral expression, the second term on the right is our correlation function evaluated for the vectors b and c. By equation 14, we know we can calculate our correlation function in terms of results at detector A as a function of the measurement axes and the hidden variables: we integrate rho d lambda times A of a and lambda times A of b and lambda, with a minus sign on the outside. So by pattern recognition, the second term on the right of this integral is, by equation 14, equal to P evaluated with the vectors b and c. And that's going to be very important in just a moment.
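Spelled out, the step being pointed to here uses the normalization of rho (its integral over all lambda is 1) together with equation 14 applied to the pair of axes (b, c) in place of (a, b):

```latex
\begin{aligned}
\int d\lambda\,\rho(\lambda)\,\big[1 - A(\mathbf b,\lambda)A(\mathbf c,\lambda)\big]
  &= \int d\lambda\,\rho(\lambda)
     \;-\; \int d\lambda\,\rho(\lambda)\,A(\mathbf b,\lambda)A(\mathbf c,\lambda) \\
  &= 1 + P(\mathbf b,\mathbf c)
\end{aligned}
```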
Okay. So having recognized P as a function of b and c, it follows that 1 plus P of b and c is greater than or equal to the absolute value of P of a and b minus P of a and c. You can kind of read that directly from the colorful characters here. Remember, we started off by asking about the difference in the correlation function: P of a and b, minus the correlation P of a and c, where c is a vector very much like b but with a little nudge. After evaluating all of these integral expressions, we showed that this difference in correlations has to be less than or equal to an integral expression that contains P of b and c plus one. There's a one inside the integral, but because rho is normalized, that one just pops outside of the integral. Then Bell switches this expression around so that the difference in the correlations is on the right side and the expression involving P of b and c is on the left side. That's why the less-than-or-equal-to sign flips around into a greater-than-or-equal-to.
So that reasoning justifies equation 15 without the absolute value. Now we need to justify where the absolute value comes from, and as it turns out, it arises from symmetry. Imagine swapping the vectors b and c in all of these equations. On the left-hand side, P as a function of c and b is the same thing as P as a function of b and c, because at the end of the day you're still measuring along the same two axes, and it doesn't matter which detector we call detector A and which we call detector B. So the order of the inputs b and c doesn't matter in the correlation function.

However, on the right-hand side, where we have the difference P of a and b minus P of a and c, swapping b and c gives P of a and c minus P of a and b, which is the same right-hand side as before but with a sign flip. And since nothing in the argument singled out b over c, we should be able to swap them; by symmetry, there's no meaningful difference between the vectors b and c. So the same line of reasoning shows that the left-hand side is greater than or equal to both plus and minus the right-hand side. We can clean that up by saying that the left-hand side is greater than or equal to the absolute value of the right-hand side. We're not losing anything by packaging the two options together this way.
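A compact way to write the symmetry argument: the derivation holds for any pair of axes, so it can be run once as written and once with b and c exchanged. Equation 14 also makes P(b, c) = P(c, b) explicit, since the integrand A(b, lambda)A(c, lambda) is unchanged by the swap.

```latex
\begin{aligned}
&1 + P(\mathbf b,\mathbf c) \ge P(\mathbf a,\mathbf b) - P(\mathbf a,\mathbf c) \\
&1 + P(\mathbf c,\mathbf b) \ge P(\mathbf a,\mathbf c) - P(\mathbf a,\mathbf b) \\
&P(\mathbf c,\mathbf b) = P(\mathbf b,\mathbf c)
  \;\Longrightarrow\;
  1 + P(\mathbf b,\mathbf c) \ge \big|P(\mathbf a,\mathbf b) - P(\mathbf a,\mathbf c)\big| \quad (15)
\end{aligned}
```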
All right. Bell then goes on to say that unless P is constant, the right-hand side is in general of order the absolute value of b minus c, for small absolute value of b minus c. In just a moment I'm going to unpack why that is. But real quick, I want to read the next thing Bell wrote, which is that thus P of b and c cannot be stationary at the minimum value, which is -1, where b equals c. Right? When the axes are aligned, our correlation takes on its minimum value: perfect anti-correlation. And therefore the correlation function cannot equal the quantum mechanical value given by equation 3, which is minus a dot b, also known as negative cosine of theta.
Okay, now I am a huge fan of Bell and his work, and he is a great genius, but my goodness, does he say so much with so few words, and here it's kind of hard to see exactly what he's talking about. So I want to take a moment to unpack this and really get into what exactly he's saying here.

The first thing to realize is this. Think of our correlation function P of b and c just as a mathematical function that takes two vectors as input. We know this function takes on its minimum value of -1 when the vector b equals the vector c. If the function were stationary there, the curve would be flat at that value, just like the negative cosine of theta curve of quantum mechanics is flat at theta equals 0. So if we claim to have a hidden variable model that matches the quantum mechanical predictions, we should expect its correlation to have a slope of zero whenever its two vector inputs are the same vector.

But if it's flat wherever its two vector inputs are the same, then we can say something more. Imagine that the vectors b and c are very similar, almost the same. Call the size of the tiny difference between them, the absolute value of b minus c, epsilon, and say epsilon is much less than one. Then our correlation function evaluated for the inputs b and c is going to be -1 plus some positive number of order epsilon squared. In principle it could be of even higher order in epsilon, but for small epsilon the biggest it could be is of order epsilon squared. What we can't have is a first-order term, of order epsilon, because that would be a slope in the function. You see what I mean? If P of b and c, for b approximately equal to c, had the form -1 plus something of order epsilon, then the function would be sloped there, and that wouldn't be stationary; it wouldn't be a zero-slope situation. So what we're saying here about second order or higher in epsilon is really just the definition of what it means for the function to be stationary at its minimum value: the slope is zero.
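To make "stationary" concrete, here is a small numerical sketch for the quantum correlation -cos(theta), which is equation 3 for unit vectors at angle theta. The helper names are hypothetical. At a stationary minimum, the rise above the minimum divided by epsilon should go to zero, while the rise divided by epsilon squared should settle to a constant; for -cos(theta) that constant is 1/2, from cos(epsilon) being approximately 1 - epsilon^2/2.

```python
import math


def p_quantum(theta: float) -> float:
    """Quantum singlet correlation for axes separated by angle theta: -cos(theta)."""
    return -math.cos(theta)


def rise_ratios(eps: float) -> tuple[float, float]:
    """Return (rise / eps, rise / eps**2), where rise = P(eps) - P(0) = P(eps) + 1."""
    rise = p_quantum(eps) + 1.0
    return rise / eps, rise / eps**2


for eps in (1e-1, 1e-2, 1e-3):
    first, second = rise_ratios(eps)
    # first -> 0 (no linear term, i.e. zero slope); second -> 0.5 (quadratic rise)
    print(f"eps={eps:g}: rise/eps={first:.2e}, rise/eps^2={second:.5f}")
```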
Well, okay then. But now consider the absolute value of the difference in the correlations, P of a and b minus P of a and c. That's the difference between the correlation we would get with our detectors set up with axis a on one side and axis b on the other, and the correlation we would get with axis a on one side and axis c on the other, where again c is b plus a tiny little nudge. The absolute value of that difference in the correlations is going to be first order in epsilon.

The reason is that our correlation function changes only when the results at A and/or B change. Think about how the correlation is defined: the integral of the product of A and B over all possible lambda, each weighted by its probability density rho of lambda. The only thing that can change in the correlation function as we rotate c a little bit away from b is that some of the results flip sign, from +1 to -1 or from -1 to +1. This is a very binary thing, and so the amount of change is generically going to be proportional to the difference between the vectors b and c, epsilon, for small epsilon. If you think about what the measurement results are doing as we move c slightly away from b, you can picture a belt of area where the results are flipping sign and contributing to the change in the correlation, sort of like an orange slice having a volume proportional to its wedge angle. The size of that belt of flipping results, and therefore the change in the correlation, is directly proportional to epsilon.
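A toy example may help with the orange-slice picture. The sketch below uses a deliberately simple local hidden variable model, chosen only for illustration and restricted to a plane: lambda is an angle drawn uniformly from [0, 2*pi), detector A at angle 0 reports the sign of cos(lambda), and detector B at angle theta reports minus the sign of cos(lambda - theta). It satisfies equation 13 by construction. The results disagree on two arcs of total length 2*theta, so the correlation comes out as roughly -1 + 2*theta/pi: linear in the angle, with a nonzero slope at theta = 0, which is the first-order behavior described above. The function names are hypothetical.

```python
import math


def sign(x: float) -> int:
    """Measurement outcome: +1 or -1 (equation 1)."""
    return 1 if x >= 0 else -1


def p_toy_local(theta: float, n_points: int = 100_000) -> float:
    """Correlation P for a toy planar local hidden variable model (hypothetical).

    lambda is uniform on [0, 2*pi); detector A at angle 0 gives
    sign(cos(lambda)), detector B at angle theta gives -sign(cos(lambda - theta)).
    The integral over rho(lambda) d lambda is approximated by a midpoint sum.
    """
    total = 0
    for k in range(n_points):
        lam = 2 * math.pi * (k + 0.5) / n_points
        a_result = sign(math.cos(lam))
        b_result = -sign(math.cos(lam - theta))
        total += a_result * b_result
    return total / n_points


for theta in (0.2, 0.1, 0.05):
    rise = p_toy_local(theta) + 1.0
    # rise/theta stays near 2/pi (about 0.637): a first-order (linear) change
    print(f"theta={theta}: rise/theta={rise / theta:.3f}, 2/pi={2 / math.pi:.3f}")
```

Contrast this with the quantum curve, whose rise above -1 vanishes faster than linearly near theta = 0.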
Well, okay. Considering all of that, we run into a contradiction, because equation 15, $1 + P(\vec b,\vec c) \ge |P(\vec a,\vec b) - P(\vec a,\vec c)|$, would then imply that a positive number which is second order in a small epsilon (the small nudge of $\vec c$ away from $\vec b$) is greater than or equal to a positive number which is first order in that same small epsilon. But that's not true. For a small positive epsilon the first-order term dominates: if epsilon is small, then epsilon squared is a small number times a small number, which is tiny, so the inequality should be flipped around the other way. Something of order epsilon squared is going to be smaller than something of order epsilon, not larger, as equation 15 would imply. That mathematical contradiction proves that our correlation function cannot be stationary at its minimum value, unlike the quantum mechanical correlation function, which is stationary at its minimum value. So this is one way in which we see that a local hidden variable model cannot give us the same correlation function as quantum mechanics, which is also the correlation function that we see in experiments. And so this is the first of the two parts of part 4, where we've proven that a local hidden variable model just is not capable of reproducing the statistical predictions of quantum mechanics.
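To see the order-of-magnitude clash numerically, here is a small sketch, not from Bell's paper, that plugs the quantum correlation $-\cos\theta$ into both sides of equation 15. The coplanar geometry, with $\vec a$ perpendicular to $\vec b$ and $\vec c$ a small angle $\theta$ away from $\vec b$, is an assumption chosen purely for illustration.

```python
import math

def quantum_corr(theta):
    """Singlet-state correlation P = -a.b = -cos(theta) for unit vectors at angle theta."""
    return -math.cos(theta)

def check_eq15(theta):
    """Return (lhs, rhs) of equation 15, 1 + P(b,c) >= |P(a,b) - P(a,c)|, using the quantum P.

    Assumed geometry: coplanar unit vectors with b at angle 0, c at angle theta,
    and a at angle pi/2 (perpendicular to b).
    """
    angle_a, angle_b, angle_c = math.pi / 2, 0.0, theta
    lhs = 1 + quantum_corr(angle_c - angle_b)  # 1 - cos(theta): second order in theta
    rhs = abs(quantum_corr(angle_b - angle_a) - quantum_corr(angle_c - angle_a))  # sin(theta): first order
    return lhs, rhs

for theta in (0.5, 0.1, 0.01):
    lhs, rhs = check_eq15(theta)
    print(f"theta={theta}: lhs={lhs:.6f} rhs={rhs:.6f} ratio={lhs / rhs:.4f} eq15 holds: {lhs >= rhs}")
```

The left side is $1 - \cos\theta \approx \theta^2/2$ and the right side is $\sin\theta \approx \theta$, so their ratio is $\tan(\theta/2)$, which goes to zero along with $\theta$. A correlation that is stationary at its minimum, like the quantum one, therefore cannot satisfy equation 15 for small angles.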
Now it's time to get into the second part of part four. This is the main argument of Bell's paper: the really powerful proof that the correlation function we get from a local hidden variable model cannot be equal to the quantum mechanical correlation function. In other words, in just the same way that we've seen that a line cannot be a cosine, it's true more generally that no correlation function we can get from a classical hidden variable model can equal the quantum mechanical correlation, negative cosine of theta, also known as $-\vec a\cdot\vec b$. All right. So having already proven the result about the slope being nonzero, Bell goes on to say: nor can the quantum mechanical correlation of equation 3, that is $-\vec a\cdot\vec b$ (negative cosine of theta), be arbitrarily closely approximated by the form of equation 2, that is, by a correlation function given by a local hidden variable model. No matter what kind of local hidden variable model you come up with, the correlations that model gives you are not going to be the same as the quantum mechanical correlations. And this holds for all possible local hidden variable models.
The formal proof of this may be set out as follows. First of all, we would not worry about the failure of the approximation at isolated points. So let us consider, instead of equations 2 and 3, the functions $\bar P(\vec a,\vec b)$ and $\overline{\vec a\cdot\vec b}$. These are essentially the same things as equations 2 and 3, but averaged over vectors near $\vec a$ and $\vec b$. The bar denotes independent averaging of $P(\vec a',\vec b')$ and $\vec a'\cdot\vec b'$ over $\vec a'$ and $\vec b'$ within specified small angles of $\vec a$ and $\vec b$.
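As a rough sketch of what the bar means, the function below averages a correlation over $\vec a'$ and $\vec b'$ within a small angle of $\vec a$ and $\vec b$. It assumes coplanar vectors (so the correlation depends only on the angle between them), uniform independent smearing of each vector by at most `half_width` radians, and a midpoint grid in place of an exact average; `smeared` and `half_width` are hypothetical names for this illustration only.

```python
import math

def smeared(corr, theta, half_width, n=200):
    """Average corr over a' and b' smeared independently within half_width of a and b.

    Coplanar sketch: nudging a by u and b by v turns the angle theta into theta + v - u.
    """
    step = 2 * half_width / n
    offsets = [-half_width + (k + 0.5) * step for k in range(n)]  # midpoint grid on [-w, w]
    total = sum(corr(theta + v - u) for u in offsets for v in offsets)
    return total / (n * n)

def quantum_corr(theta):
    """Quantum correlation -a.b = -cos(theta)."""
    return -math.cos(theta)

w, theta = 0.05, 1.0
numeric = smeared(quantum_corr, theta, w)
closed_form = -math.cos(theta) * (math.sin(w) / w) ** 2  # exact average for uniform smearing
print(numeric, closed_form)
```

For uniform smearing, the averaged quantum correlation is just the exact one scaled by $(\sin w/w)^2$, which tends to 1 as the window $w$ shrinks.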
Okay, so let's pause here and think about what Bell is saying and why it matters. This averaging business is kind of a mathematically pedantic point. But what Bell is saying here is: look, let's be generous. Suppose someone came up with a local hidden variable model whose correlation function matched the quantum mechanical correlation for the most part, but with isolated points at specific values of $\vec a$ and $\vec b$ where there was a mismatch between the local hidden variable correlation and the quantum mechanical correlation.
For example, let's say $P(\vec a,\vec b)$ is equal to $-\vec a\cdot\vec b$ everywhere except at one special point, where $\vec a = \vec b$ or whatever it may be, and suppose that at that one infinitesimally small point there's some disagreement between $P$ and the quantum mechanical correlation.
What Bell is saying is: don't worry about that. If the local hidden variable model matches the quantum correlations except at some special isolated points where, for whatever reason, it doesn't work out, we're going to be generous and say that the model works. The reason is that experimentally we might not notice a mismatch at very specific isolated points between local hidden variables and quantum mechanics. So when you're thinking about the mismatch between the local hidden variable correlation and the quantum mechanical correlation, you want to smear or smooth things out just a bit, so that a mismatch at an isolated point is totally washed away. All we're doing when we take this average over very close nearby points is saying: don't worry if the correlation fails at specific isolated points. That's all it is. So to picture the vectors $\vec a'$ and $\vec b'$, just imagine the vectors $\vec a$ and $\vec b$, but smeared out a little over a tiny space of nearby vectors. That's all that means.
Suppose that for all $\vec a$ and $\vec b$, the difference between the local hidden variable correlation and the quantum mechanical correlation is bounded by some number epsilon. That is, the absolute value of $\bar P(\vec a,\vec b) + \overline{\vec a\cdot\vec b}$ is always less than or equal to epsilon: $|\bar P(\vec a,\vec b) + \overline{\vec a\cdot\vec b}| \le \epsilon$ (equation 16).
Now, the thing you have to see about equation 16 is that this is just the local hidden variable correlation minus the quantum mechanical correlation. The quantum mechanical correlation is $-\vec a\cdot\vec b$, so adding $\overline{\vec a\cdot\vec b}$ is the same as subtracting the (averaged) quantum mechanical correlation. Then you take the absolute value, and that is just the magnitude of the error, or the mismatch, between our local hidden variable model's correlation and the quantum mechanical correlation.
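For a concrete feel for epsilon, the sketch below evaluates the mismatch $|P(\theta) + \cos\theta|$ for one hypothetical local model: outcomes $A = \mathrm{sign}(\vec a\cdot\vec\lambda)$ and $B = -\mathrm{sign}(\vec b\cdot\vec\lambda)$ with $\vec\lambda$ uniformly distributed, which gives the straight-line correlation $-1 + 2\theta/\pi$ for $0 \le \theta \le \pi$. The model and the grid search are assumptions for illustration, and the bars are ignored here because this particular model has no isolated bad points.

```python
import math

def sign_model_corr(theta):
    """Correlation of a hypothetical local model: A = sign(a.lam), B = -sign(b.lam),
    lam uniform on the unit circle. Gives P = -1 + 2*theta/pi for theta in [0, pi]."""
    theta = abs(math.remainder(theta, 2 * math.pi))  # fold the angle into [0, pi]
    return -1 + 2 * theta / math.pi

def max_mismatch(corr, n=100_000):
    """Largest |P(theta) + cos(theta)|, i.e. |P_model - P_quantum|, over a grid on [0, pi]."""
    return max(abs(corr(k * math.pi / n) + math.cos(k * math.pi / n)) for k in range(n + 1))

eps = max_mismatch(sign_model_corr)
print(f"epsilon for this toy model is about {eps:.4f}")
```

For this particular model the worst-case mismatch comes out to roughly 0.21, near $\theta \approx 0.69$ radians (where $\sin\theta = 2/\pi$). Bell's argument is much stronger: it says that no local model at all can make epsilon arbitrarily small.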
So what epsilon represents is the maximum amount of error in our local hidden variable model relative to quantum mechanics. And as a reminder, the bars are just there to say: don't worry about single isolated points where there's a mismatch. We'll allow that. We're going to smooth out, or filter out, any infinitesimally small points of mismatch. So epsilon is the maximum mismatch once you factor out any infinitesimally small regions where the two correlation functions disagree. If some local hidden variable model could make epsilon zero, or arbitrarily close to zero, then that model would effectively reproduce the quantum mechanical correlation, and it would work. However, it will be shown that epsilon cannot be made arbitrarily small. What we're about to prove is that epsilon has to be at least some nonzero number. Therefore you're always going to have some mismatch between the local hidden variable correlation and the quantum mechanical correlation, no matter the details of your local hidden variable model. That's going to be the main proof of Bell's paper.

All right. Next we're going to massage equation 16 into a slightly different form. To do that, suppose that for all $\vec a$ and $\vec b$, $|\overline{\vec a\cdot\vec b} - \vec a\cdot\vec b| \le \delta$ for some small number delta (equation 17).
This expression is the mismatch between the average dot product over the $\vec a'$ and $\vec b'$ that are close to $\vec a$ and $\vec b$, and the exact dot product $\vec a\cdot\vec b$. So you can think of it as the error introduced into the quantum mechanical correlation as a result of our averaging technique. As we smooth things out just a little and average away those potential infinitesimal points of mismatch, suppose the smearing is such that the averaged dot product minus the dot product of exactly $\vec a$ and exactly $\vec b$ is, in absolute value, at most some small number delta.
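To see that delta really can be made small, this sketch measures $\max_\theta |\overline{\vec a\cdot\vec b} - \vec a\cdot\vec b|$ numerically under illustrative assumptions: coplanar unit vectors, each smeared uniformly and independently by at most $w$ radians, with a midpoint grid standing in for the exact average. For this smearing the exact answer is $1 - (\sin w/w)^2$, printed alongside for comparison.

```python
import math

def smeared_dot(theta, w, n=100):
    """Average of a'.b' = cos(theta + v - u) over independent midpoint grids u, v in [-w, w]."""
    step = 2 * w / n
    offsets = [-w + (k + 0.5) * step for k in range(n)]
    return sum(math.cos(theta + v - u) for u in offsets for v in offsets) / (n * n)

def delta(w, n_theta=20):
    """Largest |avg(a.b) - a.b| over a grid of angles between a and b in [0, pi]."""
    angles = [k * math.pi / n_theta for k in range(n_theta + 1)]
    return max(abs(smeared_dot(t, w) - math.cos(t)) for t in angles)

for w in (0.2, 0.1, 0.05):
    print(f"w={w}: delta={delta(w):.6f}  1-(sin w/w)^2={1 - (math.sin(w) / w) ** 2:.6f}")
```

Delta shrinks roughly like $w^2/3$, so narrowing the averaging window drives it toward zero.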
Then from equation 16 we find that $|\bar P(\vec a,\vec b) + \vec a\cdot\vec b| \le \epsilon + \delta$ (equation 18). Inside the absolute value is the averaged local hidden variable correlation function minus the exact quantum mechanical correlation function evaluated at exactly $\vec a$ and $\vec b$; notice we no longer have the bar over $\vec a\cdot\vec b$. So the mismatch between $\bar P$ and the exact quantum correlation at $\vec a$ and $\vec b$ has to be less than or equal to the maximum mismatch between $\bar P$ and $\overline{\vec a\cdot\vec b}$, plus whatever maximum error results from smearing the quantum mechanical correlation $\vec a\cdot\vec b$ into $\overline{\vec a\cdot\vec b}$. That kind of makes sense just by looking at it. But to show exactly how equation 18 follows from equations 16 and 17, we can write the left-hand side of equation 18 as $|\bar P(\vec a,\vec b) + \overline{\vec a\cdot\vec b} + \vec a\cdot\vec b - \overline{\vec a\cdot\vec b}|$. All we've done here is, within that absolute value, add an $\overline{\vec a\cdot\vec b}$ and subtract an $\overline{\vec a\cdot\vec b}$. Why does that matter? Because now we know it has to be less than or equal to $|\bar P(\vec a,\vec b) + \overline{\vec a\cdot\vec b}| + |\vec a\cdot\vec b - \overline{\vec a\cdot\vec b}|$. That comes from the triangle inequality, which says that $|x + y|$ can be at most $|x| + |y|$. Now if you examine these two quantities on the right side, you see that the first one, $|\bar P(\vec a,\vec b) + \overline{\vec a\cdot\vec b}|$, is what we have in equation 16, so we know it has to be less than or equal to epsilon.
And then the yellow expression, $|\vec a\cdot\vec b - \overline{\vec a\cdot\vec b}|$, is the same thing we have in equation 17, so it has to be less than or equal to delta. Therefore the whole thing has to be less than or equal to epsilon plus delta, and so we've just proven equation 18.
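As a numerical sanity check of equations 16 to 18, the sketch below smears a hypothetical sign-based toy model (correlation $-1 + 2\theta/\pi$) and the quantum dot product over a window of $w = 0.1$ radians, computes epsilon and delta on a grid of angles, and confirms that the unbarred mismatch in equation 18 stays within $\epsilon + \delta$. The model, the window, and the grids are all assumptions made for illustration.

```python
import math

def sign_model_corr(theta):
    """Hypothetical local model correlation, -1 + 2*theta/pi with theta folded into [0, pi]."""
    theta = abs(math.remainder(theta, 2 * math.pi))
    return -1 + 2 * theta / math.pi

def smear(f, theta, w, n=60):
    """Independent uniform smearing of both settings by up to w radians (midpoint grid)."""
    step = 2 * w / n
    offsets = [-w + (k + 0.5) * step for k in range(n)]
    return sum(f(theta + v - u) for u in offsets for v in offsets) / (n * n)

w = 0.1
thetas = [k * math.pi / 100 for k in range(101)]
p_bar = [smear(sign_model_corr, t, w) for t in thetas]  # P-bar(a, b)
dot_bar = [smear(math.cos, t, w) for t in thetas]        # averaged a.b

eps = max(abs(p + d) for p, d in zip(p_bar, dot_bar))               # equation 16
delta = max(abs(d - math.cos(t)) for d, t in zip(dot_bar, thetas))  # equation 17
lhs_18 = max(abs(p + math.cos(t)) for p, t in zip(p_bar, thetas))   # left side of equation 18
print(f"eps={eps:.4f} delta={delta:.4f} max|P_bar + a.b|={lhs_18:.4f} bound holds: {lhs_18 <= eps + delta}")
```

Because the bound holds angle by angle via the triangle inequality, it holds for any model and any smearing; the toy model just makes the numbers concrete.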
Okay. Next we want to think about what exactly $\bar P(\vec a,\vec b)$ is. By equation 2, it's just $P(\vec a,\vec b)$, the local hidden variable correlation function, averaged over a space of vectors $\vec a'$ and $\vec b'$ that are very close to $\vec a$ and $\vec b$, smeared out just a little so we don't worry about weird singular points. So we can write $\bar P$ in exactly the same way as we write $P$ in equation 2, but with a bar over $A$ and $B$: $\bar P(\vec a,\vec b) = \int d\lambda\,\rho(\lambda)\,\bar A(\vec a,\lambda)\,\bar B(\vec b,\lambda)$ (equation 19). When we smear out the vectors $\vec a$ and $\vec b$ a little and ask what $\bar P$ is, that just depends on how the smearing affects the average of the results at detector A and detector B, because the correlation function is just the product of the results at A and B integrated over all possible hidden variables. Since the averages over $\vec a'$ and $\vec b'$ are taken independently, and by locality $A$ depends only on $\vec a'$ and $B$ only on $\vec b'$, the average of the product $A(\vec a',\lambda)B(\vec b',\lambda)$ is the product of the separate averages $\bar A(\vec a,\lambda)\,\bar B(\vec b,\lambda)$. And remember that the bar just smears out the vectors $\vec a$ and $\vec b$ a little, so it's not going to affect the distribution of hidden variables. That's why we don't have a bar over $\rho$: smearing out $\vec a$ and $\vec b$ has no effect on the probability distribution of our hidden variables.
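Here is a small sketch of why the averaged product factorizes. For one fixed value of the hidden variable, it averages $A(\vec a',\lambda)B(\vec b',\lambda)$ over independent grids of $\vec a'$ and $\vec b'$ and compares the result with $\bar A(\vec a,\lambda)\,\bar B(\vec b,\lambda)$. The sign-based outcome functions, the angle $\lambda = 1.55$, and the window are hypothetical choices; what matters is that `outcome_a` depends only on its own setting and `outcome_b` only on its own.

```python
import math

def outcome_a(angle, lam):
    """Hypothetical detector-A outcome: +1 or -1, depending only on its own setting and lam."""
    return 1 if math.cos(angle - lam) >= 0 else -1

def outcome_b(angle, lam):
    """Hypothetical detector-B outcome: depends only on its own setting and lam."""
    return -1 if math.cos(angle - lam) >= 0 else 1

def window(center, w, n=101):
    """Midpoint grid of n angles within +/- w of center (stand-in for the smearing)."""
    step = 2 * w / n
    return [center - w + (k + 0.5) * step for k in range(n)]

lam = 1.55                   # one fixed hidden-variable value, near a sign boundary
a_primes = window(0.0, 0.1)  # smeared setting for detector A
b_primes = window(0.0, 0.1)  # smeared setting for detector B, averaged independently

avg_of_product = sum(outcome_a(x, lam) * outcome_b(y, lam)
                     for x in a_primes for y in b_primes) / (len(a_primes) * len(b_primes))
A_bar = sum(outcome_a(x, lam) for x in a_primes) / len(a_primes)
B_bar = sum(outcome_b(y, lam) for y in b_primes) / len(b_primes)
print(avg_of_product, A_bar * B_bar)
```

The two numbers agree up to floating-point rounding, because a double average of a product of separate factors over an independent grid equals the product of the single averages.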
But now think about what values $\bar A$ and $\bar B$ can take on. Remember that the results at A and B can only ever be plus or minus one. So when we smear out $\vec a$ and $\vec b$ and average over the values that A and B take on for these smeared-out vectors, we find that the absolute values of $\bar A$ and $\bar B$ are at most one. But now it is possible for $\bar A$ and $\bar B$ to be less than one in magnitude, if smearing out $\vec a$ and $\vec b$ dips into a region where the sign of the detector result flips relative to what it would have been exactly along the measurement direction $\vec a$ or $\vec b$. That is to say, suppose the result at detector A for the direction $\vec a$ is $+1$, but giving $\vec a$ a little nudge could flip the result to $-1$, because $\vec a$ sits right on the edge of what determines the sign of the result at detector A. In that case, $\bar A$ might be something like 0.9 or 0.8 or whatever. But no matter what, its absolute value is going to be some number less than or equal to 1: $|\bar A(\vec a,\lambda)| \le 1$ and $|\bar B(\vec b,\lambda)| \le 1$ (equation 20).
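To illustrate the bound $|\bar A| \le 1$, the sketch below averages a hypothetical sign-based outcome $A = \mathrm{sign}(\cos(\alpha' - \lambda))$ over measurement angles $\alpha'$ within $w = 0.05$ radians of $\alpha$, holding $\lambda = 0$ fixed. Far from the sign boundary at $\pi/2$ the average is exactly $\pm 1$; near it, the window straddles the flip and $\bar A$ takes a fractional value, but never exceeds 1 in magnitude.

```python
import math

def outcome_a(angle, lam):
    """Hypothetical deterministic outcome at detector A: +1 or -1."""
    return 1 if math.cos(angle - lam) >= 0 else -1

def a_bar(angle, lam, w, n=1000):
    """Average outcome over settings within w radians of angle (midpoint grid)."""
    step = 2 * w / n
    return sum(outcome_a(angle - w + (k + 0.5) * step, lam) for k in range(n)) / n

lam, w = 0.0, 0.05
for angle in (0.0, math.pi / 2 - 0.02, math.pi / 2 + 0.04, math.pi):
    print(f"angle={angle:.3f}  A_bar={a_bar(angle, lam, w):+.3f}")
```

With these assumed numbers, the window around $\pi/2 - 0.02$ lies 70% on the positive side, giving $\bar A \approx 0.4$, and the window around $\pi/2 + 0.04$ gives $\bar A \approx -0.8$.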
All right. Next, Bell constructs equation 21 from equations 18 and 19, with the measurement direction $\vec a$ set equal to the measurement direction $\vec b$. That yields $\int d\lambda\,\rho(\lambda)\,[\bar A(\vec b,\lambda)\,\bar B(\vec b,\lambda) + 1] \le \epsilon + \delta$ (equation 21). In just a moment we're going to use equation 21 and see why it matters and why Bell writes it out. But for now, I just want to reflect on how it follows from equations 18 and 19. The first thing to recognize is that the right-hand side of equation 21 is precisely the same as the right-hand side of equation 18. Then, on the left-hand side of equation 18, you see $\bar P(\vec a,\vec b)$, and you can recognize that on the left-hand side of equation 21 as the integral over $\rho(\lambda)\,d\lambda$ of $\bar A(\vec b,\lambda)$ times $\bar B(\vec b,\lambda)$. Remember, in the context of equation 21 we're setting the two measurement axes to the same vector $\vec b$ for both detectors.
And so we see that this integral expression is $\bar P$ evaluated for the vectors $\vec b$ and $\vec b$.
Then notice there's also that $+1$ inside the integrand, and that is simply $\vec a\cdot\vec b$: when $\vec a$ and $\vec b$ are the same unit vector, you have $\vec b\cdot\vec b = |\vec b|^2 = 1$, because $\vec b$ is a unit vector. And we can bring the 1 inside or outside of the integral because $\int d\lambda\,\rho(\lambda) = 1$, since $\rho$ is a normalized probability distribution.
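For a hypothetical sign-based toy model, the left-hand side of equation 21 can be evaluated directly. The sketch assumes $\lambda$ is an angle uniformly distributed on $[0, 2\pi)$, with outcomes $A = \mathrm{sign}(\cos(\beta' - \lambda))$ and $B = -\mathrm{sign}(\cos(\beta' - \lambda))$, so that with both detectors at $\vec b$ we get $\bar B = -\bar A$ and the integrand becomes $1 - \bar A^2$, which is nonzero only for hidden variables near a sign boundary. The model and grid sizes are illustrative assumptions.

```python
import math

def a_bar(angle, lam, w, n=100):
    """Average of the hypothetical outcome sign(cos(angle' - lam)) over angle' within w of angle."""
    step = 2 * w / n
    total = 0
    for k in range(n):
        shifted = angle - w + (k + 0.5) * step  # midpoint grid on [angle - w, angle + w]
        total += 1 if math.cos(shifted - lam) >= 0 else -1
    return total / n

def eq21_lhs(b, w, n_lam=4000):
    """Approximate the integral of rho(lam) * [A_bar(b, lam) * B_bar(b, lam) + 1], rho uniform on [0, 2*pi).

    In this toy model B = -sign(cos(angle' - lam)), so with both detectors at b, B_bar = -A_bar.
    """
    total = 0.0
    for j in range(n_lam):
        lam = (j + 0.5) * 2 * math.pi / n_lam
        a_val = a_bar(b, lam, w)
        b_val = -a_val
        total += a_val * b_val + 1  # never negative, since |a_val|, |b_val| <= 1
    return total / n_lam  # uniform rho: the integral is the mean over lambda

for w in (0.1, 0.05):
    print(f"w={w}: numeric={eq21_lhs(0.0, w):.4f}  4w/(3*pi)={4 * w / (3 * math.pi):.4f}")
```

For this model a short calculation gives $4w/(3\pi)$, about 0.042 for $w = 0.1$, so the left-hand side is small and non-negative, consistent with equation 21.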
And there's another little detail here: notice how we've dropped the absolute value sign from the left side of equation 18. That's an okay move because, by inspection, the integral on the left-hand side of equation 21 cannot be negative. Consider the product $\bar A\bar B$: since $|\bar A| \le 1$ and $|\bar B| \le 1$, the minimum value it can take is $-1$, say with $\bar A = 1$ and $\bar B = -1$. Therefore $\bar A\bar B + 1$ is at least zero; it can't go negative. And since $\rho(\lambda)$ is also non-negative, when we integrate $\rho(\lambda)\,[\bar A\bar B + 1]$ we're always integrating a non-negative quantity. That's why we can drop the absolute value sign: if we know it's not negative, there's no point in having one.

Okay, so I'm sure you're wondering: what's the point of equation 21? Where are we going with this? Why does it matter? Well, I want to take a moment to recognize where we currently are in the paper, as a kind of natural checkpoint in part 4B. Everything we've done up until now is sort of the warm-up of part 4B. We've essentially been setting the stage: thinking about what it is that we want to prove, thinking about averaging, smoothing things out, not worrying about isolated points and all that sort of thing, and then introducing the quantities epsilon and delta and making some algebraic observations.
In the next part of this derivation, we're going to use these equations to make an algebraic argument that leads to Bell's famous result: the error between the local hidden variable correlation and the quantum mechanical correlation, that is, epsilon, cannot be made arbitrarily small. Which is to say that no local hidden variable model can reproduce the statistics of quantum mechanics to an arbitrarily good approximation.
The next thing Bell does is write an expression for $\bar P(\vec a,\vec b) - \bar P(\vec a,\vec c)$. In just a moment I'll tell you exactly what that is, but for now, let's see why the equation is true. If you look at equation 19, we have the definition of $\bar P(\vec a,\vec b)$, which is simply the integral definition of the correlation $P(\vec a,\vec b)$ from equation 2, but averaged over smeared-out vectors near $\vec a$ and $\vec b$. That's why we have $\bar A$ and $\bar B$ in the integrand. So if we want to write $\bar P(\vec a,\vec b) - \bar P(\vec a,\vec c)$, we can just copy equation 19 twice, in the first case evaluated for the vectors $\vec a$ and $\vec b$ and in the second case for the vectors $\vec a$ and $\vec c$, and then combine them into the same integral: $\bar P(\vec a,\vec b) - \bar P(\vec a,\vec c) = \int d\lambda\,\rho(\lambda)\,[\bar A(\vec a,\lambda)\,\bar B(\vec b,\lambda) - \bar A(\vec a,\lambda)\,\bar B(\vec c,\lambda)]$. That's all we've written here. It's basically just equation 19.
So now let's reflect on what this quantity, P bar of a and b minus P bar of a and c, actually is. You want to think of c as an alternative to b, that is, another choice for the measurement axis of detector B. This is just like how we imagined the vector c before in part 4A. However, before we imagined that b and c were very similar vectors, so that c was just a little nudge away from b, which let us probe the behavior of the correlation P of b and c near its minimum value, where b equals c. Now we're going to imagine c as being totally unrelated to b: not just a nudge away, but a whole different vector that we are totally free to choose as the measurement axis of detector B. So in that context, P bar of a and b minus P bar of a and c is the difference between the correlation strengths we would measure for the detector settings a and b compared to the detector settings a and c. Of course, we have the bar on the P, so we're neglecting aberrant isolated points; we're smoothing out any infinitesimal pathological points.
That's why we have the bar on the P. All right. Now Bell goes ahead and writes this equation in a form that looks way more complicated but is going to be useful in a moment:
$\bar P(a,b) - \bar P(a,c) = \int d\lambda\,\rho(\lambda)\,\bar A(a,\lambda)\bar B(b,\lambda)\,[1 + \bar A(b,\lambda)\bar B(c,\lambda)] - \int d\lambda\,\rho(\lambda)\,\bar A(a,\lambda)\bar B(c,\lambda)\,[1 + \bar A(b,\lambda)\bar B(b,\lambda)]$
I'm not going to try to pronounce this equation because it's a mouthful, but I will show you why this is a legitimate move and why this complicated expression is in fact algebraically equivalent to the previous integral. To see this, consider that any expression of the form $xy - xz$ can be written as $xy(1 + wz) - xz(1 + wy)$, assuming all these variables commute, which they do because they're scalars. The reason is that on the right-hand side, the terms involving w cancel each other out: in one case you have $xywz$, but then you have $-xzwy$, so you end up with $wxyz - wxyz = 0$. What remains are the terms multiplied by 1, which is just $xy - xz$, exactly the left-hand side of the equation. If you look at the integral expression on the bottom line, you see that it has exactly this form, with $x = \bar A(a,\lambda)$, $y = \bar B(b,\lambda)$, $z = \bar B(c,\lambda)$, and $w = \bar A(b,\lambda)$. So that's how to see that these two integrals are equivalent. This kind of feels like backwards math. If you started with the second line, you would feel a sense of accomplishment upon seeing the terms simplify into the first line. But here we're going backwards: we're expanding out the equation and making it messier, because this form is going to be useful for us in just a moment.
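As a quick sanity check on the identity $xy - xz = xy(1 + wz) - xz(1 + wy)$, here's a minimal Python sketch that tests it with exact rational arithmetic on random values in [-1, 1]. The labels x = Ā(a,λ), y = B̄(b,λ), z = B̄(c,λ), w = Ā(b,λ) are our reading of Bell's integrand, and the function names are illustrative only, not from the paper.

```python
from fractions import Fraction
import random

def original_form(x, y, z):
    """Integrand from copying equation 19 twice: x*y - x*z."""
    return x * y - x * z

def expanded_form(x, y, z, w):
    """Bell's rewritten integrand: x*y*(1 + w*z) - x*z*(1 + w*y)."""
    return x * y * (1 + w * z) - x * z * (1 + w * y)

def identity_holds(trials=1000, seed=0):
    """Compare both forms on random exact rationals; any mismatch returns False."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Exact rationals in [-1, 1], so equality is tested without rounding error
        x, y, z, w = (Fraction(rng.randint(-100, 100), 100) for _ in range(4))
        if expanded_form(x, y, z, w) != original_form(x, y, z):
            return False
    return True

print(identity_holds())  # True: the w terms cancel for every sample
```

Because the check is pointwise in λ, it holds for any weighting ρ(λ), which is why the two integrals agree.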
All right. So where do we go from here? Well, think about what this equation is. It's a generic statement that, for any local hidden variable model, the difference between the correlations we would expect with the measurement axes a and b compared to a and c is equal to this big mess of an equation, which integrates over expressions involving the various outcomes at detectors A and B for the measurement axes a, b, and c. So the difference in correlations equals a big mess. The next move is to convert this equation into an inequality, and in the process convert the big mess into a medium-sized mess. Using equation 20, which says that $|\bar A(a,\lambda)| \le 1$ and $|\bar B(b,\lambda)| \le 1$, we find that the absolute value of this difference in correlations is less than or equal to this medium-sized mess:
$|\bar P(a,b) - \bar P(a,c)| \le \int d\lambda\,\rho(\lambda)\,[1 + \bar A(b,\lambda)\bar B(c,\lambda)] + \int d\lambda\,\rho(\lambda)\,[1 + \bar A(b,\lambda)\bar B(b,\lambda)]$
Getting to this inequality from the previous equation takes only a couple of steps. First, take the absolute value of both sides. On the left-hand side, we've simply taken the absolute value of the difference in correlations. On the right-hand side, you're taking the absolute value of an integral minus an integral, or plus a negative integral if you want to think about it like that. By the triangle inequality, the absolute value of the sum of two integrals can be at most the absolute value of the first integral plus the absolute value of the second integral. Because we're converting the equation into an inequality anyway, we can let the absolute value on the right-hand side apply to each integral individually, and then move it inside each integral. Next, the prefactors $\bar A(a,\lambda)\bar B(b,\lambda)$ and $\bar A(a,\lambda)\bar B(c,\lambda)$ have magnitude at most 1 by equation 20, so dropping them can only make each integral larger. Finally, a product like $\bar A(b,\lambda)\bar B(c,\lambda)$ is at least $-1$, since neither factor can have magnitude greater than 1, so the quantity $1 + \bar A \bar B$ is always non-negative and needs no absolute value of its own. So that's all good.
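Here's a rough numerical sketch of that step. It assumes a hypothetical discrete hidden variable model: finitely many values of λ with random weights ρ(λ) summing to 1, and arbitrary averaged outcomes Ā and B̄ in [-1, 1]. Sums over λ stand in for the integrals. If the reasoning above is right, the inequality should hold for every such model, not just for special ones. All names here are illustrative.

```python
import random

def random_model(n_lambda, rng):
    """Hypothetical discrete hidden-variable model.

    Returns weights rho (non-negative, summing to 1) and averaged outcomes
    A_bar[setting][i], B_bar[setting][i] in [-1, 1] for settings 'a', 'b', 'c'.
    """
    raw = [rng.random() for _ in range(n_lambda)]
    total = sum(raw)
    rho = [r / total for r in raw]
    A_bar = {s: [rng.uniform(-1, 1) for _ in range(n_lambda)] for s in "abc"}
    B_bar = {s: [rng.uniform(-1, 1) for _ in range(n_lambda)] for s in "abc"}
    return rho, A_bar, B_bar

def correlation(rho, A_bar, B_bar, s, t):
    """Discrete stand-in for the integral of rho * A_bar(s) * B_bar(t)."""
    return sum(r * x * y for r, x, y in zip(rho, A_bar[s], B_bar[t]))

def medium_mess_bound_holds(rho, A_bar, B_bar, tol=1e-12):
    """|P(a,b) - P(a,c)| <= sum rho(1 + A(b)B(c)) + sum rho(1 + A(b)B(b))."""
    lhs = abs(correlation(rho, A_bar, B_bar, "a", "b")
              - correlation(rho, A_bar, B_bar, "a", "c"))
    first = sum(r * (1 + x * y) for r, x, y in zip(rho, A_bar["b"], B_bar["c"]))
    second = sum(r * (1 + x * y) for r, x, y in zip(rho, A_bar["b"], B_bar["b"]))
    return lhs <= first + second + tol  # tolerance absorbs floating-point rounding

rng = random.Random(1)
print(all(medium_mess_bound_holds(*random_model(20, rng)) for _ in range(500)))
```

Random sampling can't prove the bound, but a single failure would reveal a mistake in the argument. The proof itself rests on the pointwise fact that $|xy(1+wz) - xz(1+wy)| \le (1+wz) + (1+wy)$ whenever all four values lie in [-1, 1].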
All right. At this stage in the derivation, it should not be obvious why we care about the inequality we've written here. But if you look at it, you can see a bit of foreshadowing. We have a very generic statement that applies to any local hidden variable model: the magnitude of the difference in the correlations for the settings a and b versus the settings a and c is bounded by the upper limit given by the right-hand side of this inequality. So you can imagine that we're just a few algebraic moves away from a very interesting result, one that constrains all possible local hidden variable models in a way that is relevant to the question of whether they can reproduce the statistical correlations of quantum mechanics.
In service of that goal, we can now rewrite this inequality with a much simpler right-hand side. From equations 19 and 21, the expression on the right-hand side is less than or equal to $1 + \bar P(b,c) + \varepsilon + \delta$. Here's why. In the first of the two integrals on the right-hand side, there's a 1, which can be pulled outside the integral as just 1, because $\rho(\lambda)$ is a normalized probability distribution. What remains in that integral is, by equation 19, the definition of P bar evaluated with the vectors b and c. So the first integral is exactly equal to $1 + \bar P(b,c)$. The second integral is exactly the left-hand side of equation 21, because we're integrating $\rho(\lambda)\,d\lambda$ times $1 + \bar A(b,\lambda)\bar B(b,\lambda)$, and we've already established in equation 21 that this has to be less than or equal to $\varepsilon + \delta$. So those inequalities stack.
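To see the bound in action, here's a sketch built on a hypothetical toy local model, chosen purely for illustration. The hidden variable λ is an angle uniform on [0, 2π). Detector A reports the sign of cos(λ − setting), and detector B reports the opposite sign, so the second integral vanishes exactly and the bound reduces to $|P(a,b) - P(a,c)| \le 1 + P(b,c)$. A uniform grid of λ values stands in for ρ(λ). For comparison, the same bound is also evaluated with the quantum singlet prediction P = −cos(angle).

```python
import math

def outcome_a(setting: float, lam: float) -> int:
    """Detector A's deterministic +1/-1 result; depends only on its own setting and lambda."""
    return 1 if math.cos(lam - setting) >= 0 else -1

def outcome_b(setting: float, lam: float) -> int:
    """Opposite of A's result for the same setting, so 1 + A(b)B(b) = 0 for every lambda."""
    return -outcome_a(setting, lam)

def correlation_sum(alpha: float, beta: float, lambdas: list) -> int:
    """n times the correlation P(alpha, beta), kept as an exact integer."""
    return sum(outcome_a(alpha, lam) * outcome_b(beta, lam) for lam in lambdas)

def lhv_bound_holds(a: float, b: float, c: float, n: int = 3600) -> bool:
    """Check |P(a,b) - P(a,c)| <= 1 + P(b,c) for the toy model."""
    # Uniform grid for lambda; the half-step offset keeps points off the sign boundaries.
    lambdas = [2 * math.pi * (k + 0.5) / n for k in range(n)]
    s_ab = correlation_sum(a, b, lambdas)
    s_ac = correlation_sum(a, c, lambdas)
    s_bc = correlation_sum(b, c, lambdas)
    return abs(s_ab - s_ac) <= n + s_bc  # the bound multiplied through by n

def quantum_bound_holds(a: float, b: float, c: float) -> bool:
    """The same bound with the singlet prediction P(x, y) = -cos(x - y) plugged in."""
    p = lambda x, y: -math.cos(x - y)
    return abs(p(a, b) - p(a, c)) <= 1 + p(b, c)

settings = (0.0, math.pi / 4, math.pi / 2)  # a, b, c at 0, 45, 90 degrees
print(lhv_bound_holds(*settings), quantum_bound_holds(*settings))
```

At these settings the toy model satisfies the bound (with equality on this grid). The quantum values violate it, because $|-0.707 - 0| \approx 0.707$ exceeds $1 - 0.707 \approx 0.293$. That violation is what the rest of the argument turns into a quantitative statement.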
So we can pull that down to the bottom line, and we end up with a much more elegant upper bound on the difference between the correlations of a local hidden variable model for detector settings a and b versus a and c: $|\bar P(a,b) - \bar P(a,c)| \le 1 + \bar P(b,c) + \varepsilon + \delta$. Now we're really getting somewhere; things are starting to clean up really nicely. Bell then abruptly says that, finally, using equation 18,
$|a\cdot c - a\cdot b| - 2(\varepsilon + \delta) \le 1 - b\cdot c + 2(\varepsilon + \delta)$.
That's a bit of a leap. You can't really see it just by looking at it.
So we have to take a moment to see why that's the case. Equation 18 says that $|\bar P(a,b) + a\cdot b| \le \varepsilon + \delta$. Remember what that equation means. It's the absolute value of the difference between two things: the correlation function given by a local hidden variable model, smoothed out a little so that we neglect any pathological aberrant points, and the quantum mechanical correlation $-a\cdot b$. As we saw earlier, that has to be less than or equal to $\varepsilon + \delta$, where $\varepsilon$ is the upper bound on the mismatch between the local hidden variable correlation and the quantum mechanical correlation.
The small number $\delta$ encodes the mismatch between the precise quantum mechanical correlation and the slightly smeared-out quantum mechanical correlation, obtained when we average over vectors a′ and b′ near a and b, respectively. We saw earlier why equation 18 is true, but now we can think of it another way: equation 18 tells us that $\bar P(a,b)$ is equal to $-a\cdot b$ plus some error, which we'll call $e_{ab}$. This follows directly from equation 18, because it says that the absolute value of the difference between $\bar P(a,b)$ and $-a\cdot b$ is bounded by the sum of two small numbers, $\varepsilon$ and $\delta$. So $\bar P(a,b)$ and $-a\cdot b$ are pretty similar numbers, and we can think of them as the same thing plus an error term: $\bar P(a,b) = -a\cdot b + e_{ab}$, with $|e_{ab}| \le \varepsilon + \delta$.
Now apply that reasoning to the inequality we derived before for $|\bar P(a,b) - \bar P(a,c)|$. We can replace those P bars with the quantum correlation plus an error: $\bar P(a,b)$ becomes $-a\cdot b + e_{ab}$, and for the same reason $-\bar P(a,c)$ becomes $+a\cdot c - e_{ac}$, where we've distributed the minus sign through those terms. So, thinking of equation 18 as a statement about the error between P bar and the quantum mechanical correlation, with the absolute value of the error bounded by $\varepsilon + \delta$, we can replace any expression involving P bar with the quantum mechanical correlation plus that error. Likewise, on the right-hand side, we can replace $\bar P(b,c)$ with $-b\cdot c + e_{bc}$. This gives $|a\cdot c - a\cdot b + e_{ab} - e_{ac}| \le 1 - b\cdot c + e_{bc} + \varepsilon + \delta$.
Ideally, we would now like to replace these error terms with factors of $\varepsilon + \delta$. But we have to be careful when we do that, because it's not guaranteed that the absolute value of each error equals $\varepsilon + \delta$; in general it's only less than or equal to $\varepsilon + \delta$. Suppose we start from this inequality about $|\bar P(a,b) - \bar P(a,c)|$ and want to move to another inequality in which the error terms are replaced with $\varepsilon$ and $\delta$. If we want the new inequality to follow logically from the previous one, we have to consider the so-called worst-case scenario, where the magnitude of each error is indeed equal to $\varepsilon + \delta$. In a way, this is the best-case scenario for ensuring that the inequality we arrive at is true. On the left-hand side, we subtract $2(\varepsilon + \delta)$, the most that the two error terms could pull down the left side of the inequality. Likewise, on the right-hand side, we let the error be as large as it could possibly be, so we add another $\varepsilon + \delta$, bringing the right-hand side up as much as we possibly can.
Because we handled it that way, assuming the worst case in which the errors are as big as possible and letting them pull down the small side and push up the big side, we know for sure that the simpler inequality, with the errors replaced by $\varepsilon + \delta$, is still guaranteed to be true.
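Here is the worst-case argument written out as a single chain. Write $e_{ab}$, $e_{ac}$, $e_{bc}$ for the three error terms, each of magnitude at most $\varepsilon + \delta$ by equation 18. The first step uses the reverse triangle inequality $|X + e| \ge |X| - |e|$, and the last uses $e_{bc} \le \varepsilon + \delta$:

```latex
\begin{aligned}
\bar P(a,b) - \bar P(a,c) &= (a\cdot c - a\cdot b) + e_{ab} - e_{ac},
  \qquad |e_{ab}|,\ |e_{ac}|,\ |e_{bc}| \le \varepsilon + \delta, \\
|a\cdot c - a\cdot b| - 2(\varepsilon + \delta)
  &\le |\bar P(a,b) - \bar P(a,c)|
   \le 1 + \bar P(b,c) + \varepsilon + \delta \\
  &= 1 - b\cdot c + e_{bc} + \varepsilon + \delta
   \le 1 - b\cdot c + 2(\varepsilon + \delta).
\end{aligned}
```

The two outer ends of this chain give exactly the inequality Bell states "using equation 18".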
All right. There's one little adjustment we're going to make, because when you look at an inequality like this, you think maybe we can clean it up a bit. So let's pull all factors of $\varepsilon$ and $\delta$ over to the left side, and while we're at it, flip the inequality around and put everything else on the right. With just a little algebraic maneuvering, we end up with the inequality
$4(\varepsilon + \delta) \ge |a\cdot c - a\cdot b| + b\cdot c - 1$.
This is equation 22 of Bell's paper, and it's a very profound result. In fact, the term "Bell's theorem" is a somewhat vague, generic label that applies broadly to Bell's observation that local hidden variable models don't work. But if you had to take a single equation, or in this case an inequality, from Bell's paper and say "this is the result, this is the statement," it would be the inequality shown here. Why is that? What's the big deal? Who cares about equation 22?
Well, to see what equation 22 can tell us, let's imagine a thought experiment with the vectors a, b, and c. The vector a is a fixed measurement axis at detector A, and b and c are the two different options we imagine for detector B. Suppose, for the sake of a specific example, that a and c are perpendicular, so that $a\cdot c = 0$, and that $a\cdot b = b\cdot c = 1/\sqrt{2}$. That is to say, there is a 45° angle between the vectors a and b, and also a 45° angle between the vectors b and c. For example, if a points straight up and c points straight to the right, then b lies right between them, at a 45° angle pointing up and to the right. If we evaluate the dot products in equation 22 for this scenario, we get $|0 - 1/\sqrt{2}| + 1/\sqrt{2} - 1$, so $4(\varepsilon + \delta)$ has to be greater than or equal to $\sqrt{2} - 1$, which is about 0.41.
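Here's a small Python sketch that evaluates the right-hand side of equation 22 for this configuration, using 2D unit vectors (a straight up, c to the right, b at 45° between them). The helper names `dot` and `eq22_rhs` are ours, for illustration only.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def eq22_rhs(a, b, c):
    """Right-hand side of Bell's equation 22: |a.c - a.b| + b.c - 1."""
    return abs(dot(a, c) - dot(a, b)) + dot(b, c) - 1

a = (0.0, 1.0)                            # straight up
c = (1.0, 0.0)                            # straight to the right
b = (1 / math.sqrt(2), 1 / math.sqrt(2))  # 45 degrees, halfway between

rhs = eq22_rhs(a, b, c)          # 4 * (epsilon + delta) must be at least this
min_eps_plus_delta = rhs / 4
print(f"{rhs:.4f} {min_eps_plus_delta:.4f}")  # prints: 0.4142 0.1036
```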
Dividing both sides by 4, we find that $\varepsilon + \delta$ has to be greater than or equal to about 0.1; more precisely, $(\sqrt{2} - 1)/4 \approx 0.104$. Now remember that $\delta$ is an artifact of our smearing process, so you can imagine making it as small as you want. In fact, you can set it to zero and forget about the averaging process altogether. But even then, you'll find that $\varepsilon$ cannot be made arbitrarily small, because in this case it would have to be at least about 0.1. And remember what $\varepsilon$ is: a bound on the mismatch between the local hidden variable correlation and the quantum mechanical correlation. So if $\varepsilon$ cannot be set to zero, then the quantum mechanical expectation value cannot be represented, either accurately or arbitrarily closely, in the form of equation 2, which is the definition of a generic local hidden variable correlation.
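For a concrete feel for what this means, here's a sketch that measures ε for one hypothetical local model. The model is assumed to have the linear correlation P(θ) = −1 + 2θ/π, where θ is the angle between the two measurement axes; this is what a sign-function model with a uniformly random planar hidden angle produces. The sketch scans θ over [0, π], takes the largest mismatch with the quantum value −cos θ, and compares it with the lower bound (√2 − 1)/4 obtained with δ set to zero.

```python
import math

def toy_lhv_correlation(theta):
    """Assumed correlation of a hypothetical local model: linear in the angle theta."""
    return -1 + 2 * theta / math.pi

def quantum_correlation(theta):
    """Singlet-state prediction -a.b for unit vectors separated by angle theta."""
    return -math.cos(theta)

def estimate_epsilon(n=100_000):
    """Largest |P_toy - P_quantum| over a grid of angles in [0, pi]."""
    grid = (math.pi * k / n for k in range(n + 1))
    return max(abs(toy_lhv_correlation(t) - quantum_correlation(t)) for t in grid)

epsilon = estimate_epsilon()
lower_bound = (math.sqrt(2) - 1) / 4  # equation 22 at the 45-degree settings, delta = 0
print(f"epsilon ~ {epsilon:.4f}, equation 22 requires at least {lower_bound:.4f}")
```

For this model the scan should report an ε of roughly 0.21, comfortably above the bound of about 0.104, as equation 22 requires of any local model.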
So that is the argument. There's a bit of algebra, and it takes a moment to soak it in. When you first encounter this argument, the thing to do is probably to focus on how each step follows logically from the previous one, and then think big-picture about what our assumptions are and what the result is. Think about how generic our assumptions were, going all the way back to equation 2, which defines the correlation for a local hidden variable model. We made no assumptions or restrictions of any kind on what sort of thing our hidden variables $\lambda$ could be. And so we've proven a very generic result: at least for some measurement settings a, b, and c, there is going to be a finite, nonzero mismatch between the correlation given by a local hidden variable model and the correlation given by quantum mechanics.
There's a possibility of getting confused by equation 22 here, because you might say, "Wait a minute, aren't there settings of a, b, and c that make the right-hand side zero, so this isn't a problem?" That is true, for example when b = c, but it's not surprising. As we saw earlier in part 3B of this paper, a local hidden variable model can agree with quantum mechanics for certain specific settings of our measurement directions.
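To see which settings are harmless and which are not, here's a sketch that evaluates the right-hand side of equation 22 for coplanar unit vectors. Here b sits at angle θ_ab from a, and c sits a further θ_bc beyond b; coplanarity is a simplifying assumption, since Bell's argument does not require it. Setting b = c gives exactly zero, while scanning the symmetric family θ_ab = θ_bc = θ finds settings where the bound bites.

```python
import math

def eq22_rhs_planar(theta_ab, theta_bc):
    """|a.c - a.b| + b.c - 1 for coplanar unit vectors, where c sits
    theta_bc beyond b, which sits theta_ab beyond a (angles in radians)."""
    a_dot_b = math.cos(theta_ab)
    b_dot_c = math.cos(theta_bc)
    a_dot_c = math.cos(theta_ab + theta_bc)
    return abs(a_dot_c - a_dot_b) + b_dot_c - 1

# b = c: the right-hand side vanishes, so equation 22 places no constraint.
print(eq22_rhs_planar(math.pi / 4, 0.0))

# Scan the symmetric family theta_ab = theta_bc = theta over [0, 90 degrees].
n = 900
best_rhs, best_theta = max(
    (eq22_rhs_planar(t, t), t) for t in (0.5 * math.pi * k / n for k in range(n + 1))
)
print(f"max rhs {best_rhs:.4f} at {math.degrees(best_theta):.1f} degrees")
```

In this family the right-hand side works out to 2cos θ (1 − cos θ), which peaks at 0.5 when θ = 60°. Equation 22 would then demand ε + δ ≥ 1/8 there, slightly more than the 45° example gives.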
So the fact that there exist experimental configurations where a local hidden variable model might agree with quantum mechanics is not philosophically profound. The profound thing is that there exist experimental conditions where no local hidden variable model can explain the results of quantum mechanics. All that's to say, if you as an experimenter design an experiment where local hidden variables and quantum mechanics agree, that's fine. Okay. But suppose someone else designs an experiment where they orient their detectors in such a way, like the example given here, that no local hidden variable explanation makes sense. Suppose only quantum mechanics, with its weird non-local wave function collapse or something mathematically isomorphic, is able to explain the data. Well, then that's the case in point right there that reality is not described by a local hidden variable model. And so even the existence of one possible experimental setup that violates local realism is all you need to know that something other than local realism is going on in this universe. So that's a glitch in reality right there. You know, this is one of those things where the more you think about it, the more it blows your mind. You'd like to think that the more you think about something, the less it blows your mind. But no, in this case, it's the opposite.
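The converse claim is that some settings rule out every local model. You can check it against the inequality from Bell's paper, 1 + P(b, c) ≥ |P(a, b) - P(a, c)|. The Python sketch below reuses the same assumed toy local model. The angles 0, 60 and 120 degrees are our own illustrative choice, not necessarily the example discussed in the video.

```python
import math


def p_quantum(theta):
    # Singlet correlation for detectors at relative angle theta.
    return -math.cos(theta)


def p_local(theta):
    # Assumed toy local sign model: P = -1 + 2*theta/pi.
    return -1.0 + 2.0 * theta / math.pi


def bell_1964_ok(p, a, b, c, tol=1e-12):
    """Check Bell's 1964 inequality 1 + P(b, c) >= |P(a, b) - P(a, c)| for
    coplanar unit vectors at angles a, b, c (radians, all within [0, pi])."""
    lhs = abs(p(abs(a - b)) - p(abs(a - c)))
    rhs = 1.0 + p(abs(b - c))
    return lhs <= rhs + tol


if __name__ == "__main__":
    aligned = (0.0, 0.0, 0.0)                      # all detectors parallel
    spread = (0.0, math.pi / 3, 2 * math.pi / 3)   # 0, 60, 120 degrees
    for name, angles in (("aligned", aligned), ("spread", spread)):
        print(name, "QM:", bell_1964_ok(p_quantum, *angles),
              "local:", bell_1964_ok(p_local, *angles))
```

With all detectors aligned, both models pass. With the spread-out settings, the toy local model sits exactly on the bound. The quantum prediction violates it, with 1 on the left-hand side against 0.5 on the right.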
Part five, generalization. All right. So in this part of the paper, Bell is going to argue that even though we've been thinking in terms of spin and the singlet state of two spin-1/2 particles with entangled spin, the same arguments about non-locality, correlations, and hidden variables apply much more generally in quantum mechanics. They don't depend specifically on spin. We just thought about it in terms of spin because that's an example that's easy to think about. So Bell begins part five, generalization, with: "The example considered above has the advantage that it requires little imagination to envisage the measurements involved actually being made." That's because you can picture the Stern-Gerlach magnets, their orientation, the spin, and all of that. But "in a more formal way, assuming that any Hermitian operator with a complete set of eigenstates is an 'observable', the result is easily extended to other systems." So in other words, it's not just about spin. We can apply this reasoning to any quantum mechanical observable.
"If two systems have state spaces of dimensionality greater than two, we can always consider two-dimensional subspaces and define, in their direct product, operators σ₁ and σ₂ formally analogous to those used above and which are zero for states outside the product subspace."
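Here is a small NumPy sketch of the construction in that passage. It takes a three-level system, places σ·n on the subspace spanned by basis states |0⟩ and |1⟩, and puts zeros everywhere else. The dimension 3 and the choice of subspace are assumptions for illustration, and the helper name is hypothetical.

```python
import numpy as np

# Pauli matrices of an ordinary two-level system.
PAULI = (
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)


def embedded_sigma(direction, dim=3):
    """sigma.n acting on the subspace spanned by basis states |0>, |1> of a
    dim-level system, and zero on every other basis state.
    Choosing |0>, |1> as the two-dimensional subspace is an assumption."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    op = np.zeros((dim, dim), dtype=complex)
    op[:2, :2] = sum(c * s for c, s in zip(n, PAULI))
    return op


if __name__ == "__main__":
    s = embedded_sigma([1.0, 1.0, 0.0])
    print("Hermitian:", np.allclose(s, s.conj().T))
    # +/-1 inside the subspace, 0 for the state |2> outside it.
    print("eigenvalues:", np.round(np.linalg.eigvalsh(s), 6))
```

The resulting operator is Hermitian with eigenvalues -1, 0 and +1. The eigenvalue 0 belongs to the state outside the chosen subspace.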
Whenever we have two quantum systems, no matter how complicated they might be, they'll always contain smaller two-state parts that we can focus in on. Within those parts, we can define measurements that behave just like the simple spin measurements we discussed earlier. And when we do that in that two-dimensional subspace, there's going to be a state which is analogous to the singlet spin state, but pertaining to whatever observable we're talking about in this more general context.
"Then for at least one quantum mechanical state, the 'singlet' state in the combined subspaces, the statistical predictions of quantum mechanics are incompatible with separable predetermination."
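This sketch continues the same assumed three-level setup. It builds the "singlet" state inside the product of the two embedded subspaces and computes the quantum correlation numerically. It reproduces P(a, b) = -cos θ, so Bell's 1964 inequality fails for the same illustrative angles (0, 60 and 120 degrees) used for spin. The helper names are hypothetical.

```python
import numpy as np

PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def embedded_sigma(theta, dim=3):
    """sigma.n for n = (sin theta, 0, cos theta), acting on span{|0>, |1>}
    of a dim-level system and zero elsewhere (illustrative assumption)."""
    op = np.zeros((dim, dim), dtype=complex)
    op[:2, :2] = np.sin(theta) * PAULI_X + np.cos(theta) * PAULI_Z
    return op


def embedded_singlet(dim=3):
    """(|0,1> - |1,0>) / sqrt(2) inside the dim x dim product space.
    Basis state |i, j> sits at index i*dim + j, matching np.kron."""
    psi = np.zeros(dim * dim, dtype=complex)
    psi[0 * dim + 1] = 1 / np.sqrt(2)
    psi[1 * dim + 0] = -1 / np.sqrt(2)
    return psi


def correlation(theta_a, theta_b, dim=3):
    """Quantum expectation <psi| sigma1.a (x) sigma2.b |psi>."""
    psi = embedded_singlet(dim)
    op = np.kron(embedded_sigma(theta_a, dim), embedded_sigma(theta_b, dim))
    return float(np.real(psi.conj() @ op @ psi))


if __name__ == "__main__":
    a, b, c = 0.0, np.pi / 3, 2 * np.pi / 3   # illustrative settings
    pab, pac, pbc = correlation(a, b), correlation(a, c), correlation(b, c)
    print("P(a,b) =", round(pab, 6), " -cos(b-a) =", round(-np.cos(b - a), 6))
    print("Bell 1964 inequality holds:", abs(pab - pac) <= 1 + pbc)
```

The third basis level never enters the result. The singlet lives entirely inside the two-dimensional subspaces, which is why the spin argument carries over unchanged.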
Separable predetermination is the kind of realism, or local causality, that we would expect from a local hidden variable theory. We would also expect it from a quantum mechanical picture where the two states are separable. Remember, earlier we talked about the isotropic mixture of product states, where each particle had an equal and opposite spin. We saw how that gave a correlation three times weaker than the singlet state. Well, that same kind of reasoning applies to this two-dimensional subspace of whatever observable we're dealing with. You can create a state which is directly analogous to the spin singlet state. Then you separate out the particles and measure them in different ways. For suitable measurement settings, you'll find that this quantum mechanical "singlet" state has weirdly strong non-local correlations.
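The factor of three mentioned here can be checked with a quick Monte Carlo sketch in Python. It assumes the mixture described earlier in the video: particle 1 is spin up and particle 2 spin down along a direction n, with n uniformly distributed on the sphere. Averaging -(a·n)(b·n) over n gives -a·b/3, one third of the singlet value -a·b.

```python
import numpy as np


def isotropic_mixture_correlation(a, b, samples=200_000, seed=0):
    """Monte Carlo estimate of <sigma1.a sigma2.b> for an isotropic mixture of
    product states |+n>|-n> (particle 1 up along n, particle 2 down along n).
    For one product state <sigma1.a> = a.n and <sigma2.b> = -b.n, and the
    joint expectation is their product; we then average over n."""
    rng = np.random.default_rng(seed)
    n = rng.normal(size=(samples, 3))
    n /= np.linalg.norm(n, axis=1, keepdims=True)   # uniform on the sphere
    return float(np.mean(-(n @ a) * (n @ b)))


if __name__ == "__main__":
    theta = np.pi / 3
    a = np.array([0.0, 0.0, 1.0])
    b = np.array([np.sin(theta), 0.0, np.cos(theta)])
    print("singlet:           ", round(-np.cos(theta), 4))
    print("isotropic mixture: ", round(isotropic_mixture_correlation(a, b), 4))
    print("-cos(theta) / 3:   ", round(-np.cos(theta) / 3, 4))
```

At θ = 60 degrees the estimate should land close to -0.167, against -0.5 for the singlet. The fixed seed makes the run reproducible, but the result is still a statistical estimate.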
And so, all that's to say, Bell's theorem is not about spin per se. Generically, quantum mechanics can exhibit non-local correlations in all kinds of different observables.
All right, my friends, let's go ahead and wrap things up with part six, conclusion.
"In a theory in which parameters are added to quantum mechanics to determine the results of individual measurements, without changing the statistical predictions, there must be a mechanism whereby the setting of one measuring device can influence the reading of another instrument, however remote."
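Here is a deliberately crude Python toy model of what such a "mechanism" could look like. It is our own illustration, not anything proposed in Bell's paper: Bob's device is simply told Alice's setting and biases its outcome accordingly. Because Bob's rule takes `theta_a` as an input, the model is non-local. It reproduces the singlet correlation -cos θ exactly.

```python
import math
from itertools import product


def nonlocal_joint_prob(A, B, theta_a, theta_b):
    """Toy non-local hidden variable model (illustrative assumption):
    Alice's outcome A = +/-1 is a fair coin flip, and Bob's device, which is
    handed Alice's setting theta_a, outputs B = -A with probability
    (1 + cos(theta_a - theta_b)) / 2 and B = A otherwise."""
    p_anti = (1 + math.cos(theta_a - theta_b)) / 2
    p_b_given_a = p_anti if B == -A else 1 - p_anti
    return 0.5 * p_b_given_a


def correlation(theta_a, theta_b):
    """E[A*B], summed over the four possible outcome pairs."""
    return sum(A * B * nonlocal_joint_prob(A, B, theta_a, theta_b)
               for A, B in product((+1, -1), repeat=2))


if __name__ == "__main__":
    for deg in (0, 45, 60, 90, 180):
        t = math.radians(deg)
        print(deg, round(correlation(t, 0.0), 6), round(-math.cos(t), 6))
```

The point is the function signature, not the physics. Once Bob's outcome is allowed to depend on the distant setting, matching quantum mechanics is trivial, and that dependence is exactly what Bell says any such hidden variable theory must contain.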
That is to say, suppose you take Einstein's perspective that quantum mechanics needs to be supplemented with hidden variables. Then Bell has proven that, to reproduce the predictions of quantum mechanics, the hidden variable model has to contain non-local interactions, which are apparently unrestricted by the normal limitations of space and time. "Moreover, the signal involved must propagate instantaneously, so that such a theory could not be Lorentz invariant." And Lorentz invariance is one of the main principles of special relativity. That is to say, once you have a non-local theory, you run into all kinds of problems with special relativity. Really, a non-local theory just totally goes against the usual relativistic notions of space, time, and causality.
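A few lines of Python show why instantaneous signals clash with Lorentz invariance. For two spacelike-separated events, different inertial frames disagree about which one happened first, so "instantaneous" has no frame-independent meaning. The event coordinates below are hypothetical, in units where c = 1.

```python
import math


def boost_time(t, x, v, c=1.0):
    """Time coordinate of the event (t, x) seen from a frame moving at
    velocity v along x (standard Lorentz transformation)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (t - v * x / c**2)


def is_spacelike(dt, dx, c=1.0):
    """Events are spacelike separated if no light signal can connect them."""
    return abs(dx) > c * abs(dt)


if __name__ == "__main__":
    # Hypothetical events, units with c = 1: Alice measures at (t=0, x=0),
    # Bob measures at (t=1, x=3), i.e. later in the lab frame.
    print("spacelike:", is_spacelike(1.0, 3.0))
    for v in (-0.5, 0.0, 0.5):
        later = boost_time(1.0, 3.0, v) > boost_time(0.0, 0.0, v)
        order = "after" if later else "before"
        print(f"v = {v:+.1f}: Bob's measurement happens {order} Alice's")
```

In the lab frame (v = 0), Bob's measurement comes second. In the frame moving at v = +0.5, it comes first. For a timelike pair such as (t = 3, x = 1), no boost with |v| < 1 can reverse the order.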
Now, fortunately, because of the no-signaling theorem, the non-local correlations in quantum physics cannot actually be used to transmit information faster than the speed of light, so they don't corrupt causality in our universe. But still, there's a deep philosophical tension between the non-local correlations in quantum mechanics and the way we usually think about space and time from a relativistic perspective. And to this day, that tension remains unresolved. We really do not have a good explanation for what's going on with the non-local correlations in quantum mechanics.
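The no-signaling point can be checked directly from the singlet state. In the NumPy sketch below, Bob's probability of getting +1 stays at exactly 1/2 whatever setting Alice chooses, once we sum over her outcome. This holds even though the joint probabilities depend strongly on her setting, so Bob alone learns nothing about Alice's choice. The helper names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
# Singlet (|01> - |10>) / sqrt(2) in the basis |00>, |01>, |10>, |11>.
SINGLET = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)


def projector(theta, outcome):
    """Projector onto the outcome (+1 or -1) eigenspace of sigma.n,
    with n = (sin theta, 0, cos theta) in the x-z plane."""
    sigma_n = np.sin(theta) * X + np.cos(theta) * Z
    return (I2 + outcome * sigma_n) / 2


def joint_prob(A, B, theta_a, theta_b):
    """Born-rule probability that Alice sees A and Bob sees B."""
    op = np.kron(projector(theta_a, A), projector(theta_b, B))
    return float(np.real(SINGLET.conj() @ op @ SINGLET))


def bob_marginal(B, theta_a, theta_b):
    """Bob's probability of outcome B, summing over Alice's unseen outcome."""
    return sum(joint_prob(A, B, theta_a, theta_b) for A in (+1, -1))


if __name__ == "__main__":
    for deg in (0, 45, 90, 135, 180):
        ta = np.radians(deg)
        print(f"Alice at {deg:3d} deg: P(Bob = +1) = {bob_marginal(+1, ta, 0.0):.6f}")
```

The correlations only show up when the two lists of results are brought together and compared, which itself requires an ordinary, slower-than-light channel.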
Depending on who you ask, different people have different ideas and theories, but there's really no consensus. One of the reasons is that all these different models are so crazy that it's like, what are you going to believe in? Do you want to believe in many worlds, or superdeterminism, or that you just give up the concept of realism? I mean, no matter how you try to explain the implications of Bell's theorem, it ends up just blowing your mind. No one has yet found a sane explanation for what's going on here.

All right, so this is basically the conclusion of Bell's paper right here. But then he goes on to add one additional note, a little caveat: "Of course, the situation is different if the quantum mechanical predictions are of limited validity. Conceivably, they might apply only to experiments in which the settings of the instruments are made sufficiently in advance to allow them to reach some mutual rapport by exchange of signals with velocity less than or equal to that of light. In that connection, experiments of the type proposed by Bohm and Aharonov, in which the settings are changed during the flight of the particles, are crucial."

All that's to say, suppose you're doing an experiment where the settings of the two detectors are fixed in advance and then you send your entangled particles to each detector. Well, maybe the two detectors have communicated with each other or established some sort of rapport somehow. For each pair of particles, the measurements happen so fast that they're spacelike separated, outside each other's light cones. Even so, perhaps the two detectors are already in sync, in the sense that they somehow know each other's settings. Then you wouldn't need non-locality to explain the correlation results. Now, that would be a very hard situation to believe, because you'd be like, how can that be? How and why would the two detectors know about each other? But in theory, that is a loophole you could imagine somehow being true. And so that's why Bell mentions these experiments where you change the settings of the detectors while the particles are in flight. That leaves no time for any signal traveling at or below the speed of light to let the two detectors establish a rapport with one another. Each detector's setting is then truly independent of the other detector's, and you're really ensuring that these correlations are genuinely non-local.
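The timing condition behind this loophole is simple arithmetic. A setting choice can only influence the distant detector if a signal at speed c has time to cover the separation. The sketch below assumes both detections happen at the same lab time. The 12 m separation and the switching times are hypothetical numbers, not figures from a specific experiment.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)


def light_travel_time(separation_m):
    """Time a light-speed signal needs to cross the detector separation."""
    return separation_m / C


def setting_is_spacelike(separation_m, choice_to_detection_s):
    """True if a setting chosen choice_to_detection_s seconds before the
    distant detection cannot reach it by any signal with speed <= c.
    Simplifying assumption: both detections happen at the same lab time."""
    return choice_to_detection_s < light_travel_time(separation_m)


if __name__ == "__main__":
    L = 12.0  # hypothetical detector separation in meters
    print(f"light travel time: {light_travel_time(L) * 1e9:.1f} ns")
    print("settings switched 10 ns before detection:", setting_is_spacelike(L, 10e-9))
    print("settings fixed 1 s in advance:", setting_is_spacelike(L, 1.0))
```

With 12 m between the stations, light needs about 40 ns. Switching 10 ns before detection leaves no room for a subluminal signal, while settings fixed a second in advance leave plenty.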
Well, okay. So that's the end of the paper. I hope you found this interesting, and I hope it's given you something to think about. So yeah, thanks for watching. I really appreciate it, and I'll see you next time.
Hey, I want to say thank you to everyone who's been supporting my channel on Patreon. Your support really means a lot and makes a big difference. Genuinely, without your support, I wouldn't be able to dive into this full-time. So I'm so grateful for all of you. Thank you so much.