This content introduces the fundamental concepts of geometric perception for robotics, transitioning from an assumed "perception oracle" to using real-world sensor data (cameras) to locate objects. It focuses on the geometric problem of aligning 3D point clouds to determine an object's pose, laying the groundwork for more advanced perception techniques.
All right, welcome back,
everybody, if anybody's out there.
I could use the ethernet cable, if
anybody's back there in the
booth; somehow my Wi-Fi is not
working again.
Okay, so today we're transitioning
into perception. Let me start with a
quick setup.
Last time we built an almost
complete manipulation system, if you will.
We had our hardware abstraction, our hardware
simulation, and we built on top of
that our differential inverse
kinematics block,
which, after some
integration, sent iiwa
position commands.
The DiffIK block needed to know the
current iiwa state, so we had an important line
here. But the whole thing was
predicated
on something over here, which
just had a gripper velocity trajectory coming
in. And although
that's a complete closed-loop
system, hidden inside, when we designed
the gripper trajectories and
then differentiated them to get the
gripper velocities, we made a big
assumption, which is that we knew where
the red brick was.
So everything was based on
this: someone told me exactly where in
the world I needed to reach to. And
that's just not good enough, right? At
some point we have to use our
sensors to figure out where the brick is
to do the work. So far we assumed a
perception oracle: someone you
could just query who would tell us where
the object was in the world
frame. We were using some of
the cheat ports; we could read
the body pose of any
object in the world directly from our
hardware station. So the goal today
is to stop using those cheat ports and
start using cameras instead, and close
the simplest loop we can around that
whole
system, one that could actually be run
on a robot.
So I'm going to download
a ridiculously big MeshCat animation
over my phone's 5G and burn
my data plan for the month, but hey, here we
go. It's ridiculously big just because of
the YCB objects, if you know what
these are: this is a mustard bottle
from the YCB object
database. That's a little funny here.
The meshes, or sorry, the
material files, the texture maps for
these files, are like 50
megabytes, so it takes a minute to
download. Okay, this is roughly what we're
doing, almost the same as what we did
last time. The big difference is that
we're going to have cameras now in the
scene. These are
cameras. I brought some; I don't always
have cameras in my pocket,
but today I do. They're like
this: this is the D415, this is the
D435, RealSense cameras. Super:
you just plug them in over USB and you're good
to go. They're amazing depth cameras, and I'm
going to tell you a little bit about
them today, about what's going on
here. What you see in the
scene are these depth cameras sprinkled
throughout, a bunch of them actually; we
put three on each bin, and you'll understand
why. And then mostly this is just
the same kind of thing we did before: figure
out where the mustard bottle is, that's
the role of perception; then we do
our standard gripper trajectory and make
our move from point A to point B. Now, what
the heck is this thing that's getting
left behind? That thing is a point
cloud, which is computed by reading the
cameras at time zero and doing some
initial processing on the data that you
get back from those cameras in order to
make the decision about where to grasp.
So at the moment of
perception I went ahead and
put those in. I have observations
like this, and then I have a model of
what I expect those observations
to be, and they end up matching.
We'll do all that by the end of
the lecture. Okay, so we're going to think in
again. All right, so like I said, today is
the first day of perception. We're going
to do a lot of perception throughout the
course; today is the more
geometric view of perception. We're not
just learning a deep network that goes
from image to whatever
representation we want; we're going to
start by doing the geometry version.
Now, even though going from the image
through a deep network directly is, I
think by all accounts, the best way to do
things today, there are still a lot of
perception tools that have
baked into them, whether they're neural
networks or not, some of the fundamentals
of
geometry. It builds
beautifully on what we did last time,
I think, in the kinematics. And if
you're interested in ideas
like neural radiance fields, or
baking 3D geometric
priors into your neural networks and
everything like this, this is going
to be the foundation you need to do that
kind of
work. So today is a slightly
old-school version of perception, but
it's the foundation.
Yes, we will talk about where
NeRF comes into the stack later.
Okay, so everybody knows about
the deep learning revolution. I think
fewer people realize how much of a
geometry revolution we've had at the
same time. It was powered
partly by deep neural networks, but even
before that it was powered by, I don't
know, autonomous driving companies really
caring about where the pedestrians were
in space, and people building a lot better
sensors. Virtual reality and
augmented reality were a big driver too.
And we started getting pretty
incredible things that had no neural
networks involved. This is even, I
don't know, eight years ago or something
like this: DynamicFusion, where
we started having algorithms that could
run in real time and build instantaneous
3D reconstructions of what
they were seeing, and track
people moving around, building
these beautiful 3D models. That was a
culmination of algorithmic work and of
new
sensors. In particular, the sensors
were not only higher quality but they
were
faster, faster to the point where, if the
world doesn't change too much between
frames, then you can write a simpler
algorithm to do tracking and
reconstruction. So there was just
this massive revolution in geometry
that's happened alongside the
deep learning revolution, and
interestingly they've come together, so
now there are people baking
geometric priors into neural networks and the
like. Okay, so it started with sensors
that were, you know, maybe
autonomous-driving related. So you see a
Velodyne; actually, Spot has a Velodyne you
can stick on top of it, so it's still a
relevant sensor today. These are
lidars: lasers being shot out and bouncing
back, estimating the distance from
each point of light to the target.
Some of these reconstructions
are just absolutely amazing. You'll see
an autonomous car driving through
a city street, and hundreds of yards
ahead you can see, like, a cat
walking around a bush or something like
that. It's just incredible what kind of
resolution and range those sensors
have. That's the Luminar in
particular: 500-meter range, crazy.
Indoors, though, you'll tend to see a
different type of laser scanner. These
Hokuyos were very, very popular, and are
still very popular, for
indoor navigation, where the lighting
conditions are not as severe, the
payloads can be a lot smaller, and the energy
budget is maybe smaller. But lidar
is definitely one of the tools that
enabled this
revolution. But alongside that were
cameras that didn't only return depth,
lists of numbers that are just
the depth, but coupled RGB (red, green, blue)
color images with depth images.
You get those in a handful of
different ways. Some of them are
actually just using stereo processing of
the RGB images:
if you take an image with your right
eye and an image with your left eye, and
you know the relative position of your
eyes, you can do some quick stereo
matching that says, well, this block over
here looks a lot like the block over
there, and therefore
those pixels must be at a certain range.
People do that now on FPGAs, for
instance, on specialized hardware, so that
you can package it nicely into a
block that basically just outputs
both an image and a depth. This is
the Carnegie MultiSense, which is actually the
head that we carried around on Atlas the
entire time. You'll see the Bumblebee from Point
Grey; there are a lot of
systems doing that. Okay,
let me go through the first round
first. Another line of
tools like this: the Kinect sensor was a
big deal when it came out, not only
because it worked very well but because
it was so inexpensive. So
somehow the home entertainment gaming
world kind of revolutionized robotics by
building a sensor that we needed.
That was fantastic. The first
version of the Kinect worked with
structured light, which is about
as simple as it gets. I mean, this was an
idea for decades and decades,
but it became practical with the
Microsoft Kinect: you
just project patterns, the patterns
deform on the object, and you can back out
from the geometry something about the
depth. There are a bunch of different
hardware manufacturers that built
structured-light-based depth cameras.
The ones that I showed you here, like the
D415, which we'll talk the most about, are
actually projected-texture stereo.
So the problem with just taking two
camera images is that if you're looking at a
white wall, for instance, there's nothing
to compare and contrast between the two
images. But if you
project in infrared, for instance,
something that just puts a
little bit of a pattern on it, then
even in low-contrast situations you can
get depth to come back out. One of
the reasons that's nice compared to some
of the other options, like the time
of flight I'll show you next, is that
these cameras can work even if
they're looking at the same scene, and
they won't interfere with each other.
Some of these other cameras,
if they're sending out pings or
something like this, can
actively interfere with each other
unless you synchronize them very
carefully. With projected
texture you can just point them and forget.
And then these days there
really is a massive movement,
powered by deep learning, which is that you
can just go straight from RGB. A
lot of times, even if you only have a
cell phone camera and you don't have depth
(actually, a lot of cell phone cameras do have
depth sensors,
TrueDepth on the iPhone for instance), a
lot of times you can build
beautiful 3D models even from a single
camera. You don't even need the stereo
pair. There are enough cues,
of course, from movement and other things,
but there's also additional
information that can be available if
you've, you know, read everything on the
internet. That's not what's happening in
this particular one; this is a neural
radiance field, which we'll talk about later.
But just to say, this was one of the
first videos of neural radiance fields,
showing that just from camera
images you could stitch together
something that could project and
generate new images from new viewing
angles. There are also lines of work in
monocular depth. A very clever
thing to do, for instance, is to drive a
car around with two cameras on it,
use stereo matching between them to get depth,
and learn a model from the single
camera to that depth.
Then, when it's time to make
100 million of them because you're
going to make it a product, you just
take away that extra camera and use
the learned mapping from image to depth.
So monocular depth is a
surprisingly effective technology
now. Okay, now is a good time if you have questions.
I don't actually know what's in the iPad.
I know that there's a TrueDepth
sensor here. If you look at the
back of these, they're projecting
something here. You do have a depth camera
on your iPhone. I don't have an
Okay, so the second
question was about the lidar on Spot.
Typically the Velodynes on a car
or on Spot (the Spot I brought
did not have a Velodyne on it; it just had
the surround cameras), yes, typically
those are scanning lidars, and you have to
do some careful work to think about
blurring from fast motions and timing it.
True. I think Spot's pretty good.
Yes, so the question is about the
FPGA I mentioned and the Carnegie head.
I think that basically, if
you're doing block-matching stereo,
that's the simplest algorithm. They're
doing more than that in those
heads, but I think the essence
of the hardness in the
computation is that you're doing a
relatively simple computation, taking
like an 8-by-8 block of pixels and comparing
it to another 8-by-8 block of pixels, but
you have to do that for all possible
pairs in a row, for instance. So
it's a very trivially parallelized
operation, and in order to operate
at full frame rates (I mean, computers
have gotten faster and faster) it's
just a beautiful fit for
specialized hardware. Yes?
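The block-matching idea described above can be sketched in a few lines. This is only a minimal illustration, not what any particular stereo head runs: the function name `match_block_along_row`, the sum-of-absolute-differences cost, and the synthetic images are all assumptions made for the example.

```python
import numpy as np

def match_block_along_row(left, right, row, col, block=8, max_disparity=32):
    """Return the disparity (in pixels) whose block in `right` best matches
    the block in `left` with top-left corner (row, col), using the sum of
    absolute differences (SAD) as the matching cost."""
    ref = left[row:row + block, col:col + block].astype(np.float64)
    best_disparity, best_cost = 0, np.inf
    for d in range(max_disparity + 1):
        c = col - d  # the same point appears shifted left in the right image
        if c < 0:
            break
        cand = right[row:row + block, c:c + block].astype(np.float64)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_disparity, best_cost = d, cost
    return best_disparity

# Synthetic check: the "right" image is the left image shifted by 5 pixels.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(32, 64))
true_disparity = 5
right = np.roll(left, -true_disparity, axis=1)
estimated = match_block_along_row(left, right, row=10, col=30)
```

Every block in every row runs the same independent search, which is why the operation maps so naturally onto parallel hardware.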
Yeah, it depends on the technology.
NeRF out of the box, the neural
radiance fields, don't have scale
unless you do some work a priori to
tell them about the relative poses, so
you can run a different geometry
processing algorithm first. Most of the
projected-texture cameras, or anything
that's projecting light actively, do
have absolute scale. They do have a maximum
range and a minimum range, so
it can be
frustrating to use them: you put
a beautiful sensor on your wrist and
then you realize that the minimum range
of the D415 is 0.3 m or something like
that, so
when you're getting close to the object
that camera becomes blind to the
Okay, alongside these many
sources of camera input, we also have
a lot of different ways to store
that data when it comes in, lots of
representations,
a bit like we talked about
representations of
rotation. There's a
handful of different formats;
some of them are good
for one type of computation and
some are good for another. You
should expect to convert
back and forth between them and figure
out which one's best for particular
sources.
So the image that you get
directly out of a depth camera that has
RGB, you think of that as an RGB-plus-D
image. It has four channels: the
first three are color values and the last
one is just the depth, at every pixel.
That's the output of these
cameras by default:
one depth per pixel.
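As a concrete picture of that four-channel layout, here is a small sketch. The image size and the channel ordering (color first, depth last) are assumptions for illustration; real camera drivers often deliver color and depth as two separate arrays.

```python
import numpy as np

# Hypothetical image size; an RGB-D image stores four numbers per pixel.
height, width = 480, 640
rgbd = np.zeros((height, width, 4), dtype=np.float32)

rgb = rgbd[..., :3]    # first three channels: color values
depth = rgbd[..., 3]   # last channel: one depth value per pixel
```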
Right. We're going to take those RGB-D
images, which are a perfectly good
representation, and convert them to point clouds.
So while this is a 4-by-
(size of the image) representation, a point cloud is a list
of points in
3D, possibly annotated with color values or other
attributes. Compare that to some of
the ones you might have seen in a
graphics course: you'll see triangle
meshes, or there are volumetric meshes.
Triangle meshes are surface
meshes, and you can have
volumetric meshes in addition to surface
meshes. You can think about a signed distance
representation, and increasingly now people
are choosing to store those.
NeRF, the neural radiance
fields I was mentioning before,
is almost a signed distance function; we'll
get to the nuances of that when we get
closer. We're
going to go into
each of these when it becomes most
relevant, but I want to just
get the landscape out here first.
You can also use
grids. There are lots of different
ways to represent
3D data like this. Some of the
algorithms will
more naturally fit with one versus the
other, and most of the time you can go back and forth between them.
Oh, no. I think by default the depth
channel will always
have some minimum range, some
maximum range, and some resolution
inside that range, of course, but you
should think of it as an image that has,
for every pixel, a depth specified. So in
Yeah, yep, you should think of every
Yes. So there are a lot of algorithms,
in fact the one we'll talk about
today, that don't use color to start.
But you can potentially do better with color
values. Some algorithms will only use the
D part; in fact, I would say many of the
algorithms before deep learning really
came in would have made only very
limited use of the RGB values, because
RGB... I mean, computer vision is hard.
Let's just take a second to
remember why computer vision is
hard. If I take two slightly
different images of a
similar scene, if the lighting changes at
all, the color values are going
to be wildly different for
pixels that correspond to the same point
in real space. Or if I take the same
object and put it in two different rooms,
the same point on the same object
is going to come up with very
different color values. But having
said that, I remember
the time when I was with my
students and we were enjoying
how well RGB techniques were starting to
work, and I said, okay, today, if
someone could only give
you depth or only give you RGB, what
would you pick? And nowadays it's RGB all
the way. RGB is so much more informative;
there are so many things that you can't
see through a depth camera that you can
see in RGB, and humans of course are very
good at
that. So I think if you have a method
that's limited to using only depth,
it's probably not
the state of the art.
Okay, all right, so let's dig in to
this part of the pipeline first:
go from RGB-D to point cloud, and start
seeing the connections between the
geometry of these camera
representations and the geometry of
spatial transforms and the like, and how
we write optimization problems over
these.
You could argue that
perception is just a hard kinematics
problem, at least the
first version of perception we're going to do
today. It's certainly a controls
problem, but even before
that, we're going to think of perception
today as a kinematics
problem. What do I mean by that? Okay, so
let me say I've got an
object in space. I'll do 2D
objects here,
because my artistic abilities are
limiting in that way. So let's say I
have an object in space, and it's got
some coordinate system, some canonical
coordinate frame O; this will be the
x-axis and this will be the y-axis.
I'm going to
say the first thing I want to do is
represent this geometry. I could
represent it as a series of faces, for
instance; that would be most similar to
the triangle mesh, and in 2D it would just be
line segments. But instead I'm
going to use a point cloud
representation of that object. So I
want to represent this object with points on its
boundary. They're going to be points in,
for my example here, a 2D
space, and they're going to be written in
the object frame. I'll call those
points
my model of the object,
and I'll write them as
points, p for point,
with $m_i$ for model point i, and I'll
say that my model exists in the
object's
frame: the position of model point i in the object
frame. Okay, now I have a camera that's
kicking out some other points. Hopefully
they're relatively similar, similar in
the space, but maybe I have something
that is coming
out like
this.
By the
way, you rarely get all the points from a
camera, but we'll assume that for just a
moment. I'll call these the scene points, $s_i$ for scene,
and what I get out of my camera is points in the camera
frame. Let's say I took great care
when I mounted my camera, so maybe we can
say that the location of the camera is
known. If it's bolted to your hand or
something like that, that could also be
just a forward kinematics problem to
figure out the location of the camera.
Yeah, it can. Absolutely: if
it's bolted to the robot, then you would
definitely compute it as a function of
the joint positions. So far this is
just a pose, not a
velocity.
The goal is to figure out the transform of the object in the
world. We have a lot of the pieces:
we have
camera points in the world, model points
and scene points, and then we have the
Yeah, great. So that's a really good
question. The question was:
I think of a camera image as
giving me points in a 2D picture,
but if I have a depth channel inside
that, then the first step, which I
should have said, is that I'm going
to take those 2D points in the
camera and project them
into 3D points. In my camera frame that's easy:
I just say that it's at some depth in
the image frame. Now, there are a
couple of steps
involved in that. First of all, there are
the intrinsics of the camera: you have
to take out any distortion from the lens
or something like this. This is
something we know a lot about; it's still
a pain, but it's something we
know a lot about. And then the other
thing is the extrinsics, which is to take
those points in the camera frame and
bring them into the world frame, and that
would be this $X^{WC}$; that's the camera
extrinsics.
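The two steps just described, intrinsics then extrinsics, might look like the sketch below. It assumes an ideal pinhole camera with no lens distortion, depth measured along the optical axis, and made-up names (`fx`, `fy`, `cx`, `cy` for the intrinsics, `R_WC` and `p_WC` for the rotation and translation in $X^{WC}$); a real camera driver supplies calibrated values and handles distortion.

```python
import numpy as np

def depth_to_world_points(depth, fx, fy, cx, cy, R_WC, p_WC):
    """Turn a depth image into an (N, 3) array of points in the world frame.

    Assumes an ideal pinhole model: pixel (u, v) with depth z maps to the
    camera-frame point ((u - cx) * z / fx, (v - cy) * z / fy, z).
    """
    v, u = np.indices(depth.shape)  # pixel rows (v) and columns (u)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    p_C = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Drop pixels with no valid depth return.
    valid = np.isfinite(p_C[:, 2]) & (p_C[:, 2] > 0)
    p_C = p_C[valid]

    # Extrinsics: rotate and translate camera-frame points into the world.
    return p_C @ R_WC.T + p_WC

# Tiny example: a flat wall 2 m in front of a camera mounted 1 m along world z.
depth = np.full((4, 4), 2.0)
points_W = depth_to_world_points(
    depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5,
    R_WC=np.eye(3), p_WC=np.array([0.0, 0.0, 1.0]))
```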
Okay, good. But all of those are quite
doable, to go from the 2D picture with a depth channel to 3D points.
Go ahead. So let's say this is exactly
the same object, my model
was perfect, and
my sensor had zero noise.
There are still things that can get in the
way,
which is that those points
might have been sampled in different
places along the surface. There
are reasons why that's almost never
going to be perfect. But as a toy problem
to start, I'm actually going to say: let's
consider the case where we've just taken
the model points and
someone transformed them through an
unknown transform, and we're going to try
to get that back. That's the easy
case, and we'll look at the harder case,
where there's noise and outliers and
other things in between. Great
question. Okay, there's another thing that
we have to assume, well, that we
will assume to get started.
These are all just yellow dots,
and these are all just yellow dots, and
your incredible brain knows
that this yellow dot
probably corresponds to that yellow
dot. But if it's just a list of
numbers on the computer, that mapping, the
correspondence it's called, between these
points and those points is not given, and
in general it has to be acquired by some
sort of logic. If you just have a
pile of points over here and a pile of
points over there, figuring out those
correspondences is a massive part of the
problem. But let's just start by
assuming that someone said that the i-th
point here matches the i-th point over there,
and we'll solve the second part of that
Yeah, so let's say I had a CAD
model; I could take these points
directly from the CAD model, if
that's helpful. That's why I'm using
the word model. Sometimes the way you get
it is you put your object
down in a nice situation and you get one
scan, and then you use that as your model
for finding it in other scenes. But
think of this as: you've got a
CAD model, and then this is the real
object out in the world that I got
returns from in the
scene.
Okay, so step
one: we
will assume known
correspondences. Sometimes
I feel like I forget to define the word, but it's just
the mapping (I'll define it in
symbols in a minute): the mapping from
these points to those points is the correspondences.
Right?
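To make "correspondences" concrete, here is a small 2D sketch. The model points, the pose, and the variable names (`p_Om`, `p_Ws`, `c`) are all made up for illustration. The scene points arrive in an arbitrary order, and the correspondence vector `c` records which model point each scene point came from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2D model points, expressed in the object frame O.
p_Om = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.5], [0.0, 0.5]])

# A made-up "unknown" pose of the object in the world.
theta = 0.3
R_WO = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])
p_WO = np.array([2.0, 1.0])

# Scene points in the world frame, returned in a shuffled order,
# since a camera gives no guarantee about ordering.
order = rng.permutation(len(p_Om))
p_Ws = (p_Om @ R_WO.T + p_WO)[order]

# The correspondences: scene point i matches model point c[i].
c = order

# With the correspondences known, the kinematic equations hold point by point.
residual = p_Ws - (p_Om[c] @ R_WO.T + p_WO)
```

In the known-correspondence setting of this lecture, `c` is simply handed to us; recovering it from two unordered piles of points is the harder part of the problem.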
Okay, so in that case you can sort of
see that we have a nice little
kinematics problem, an optimization
problem: the i-th point of the model,
$p^{m_i}$, that's the model point in the object frame,
mapped to the world frame, should correspond
to the i-th scene point. In the simple, exactly one-to-one
correspondence problem, these are the
obvious kinematic equations, and the object pose is
unknown. So the question becomes: how do I
extract the pose given a list of
correspondences? Now if we dig in just a
little bit into the pose representation,
you remember from our spatial
transforms that
doing this is equivalent to applying both the
translation and the rotation.
I'm leaving off the W in my
shorthand,
but the W is
everywhere. So really, the
unknowns are both the translation inside
there and the rotation. The rotation,
you'll remember, we have lots of choices
about how to represent. The translation is
always three numbers in 3D;
in 2D it would be two numbers. For the rotation we have
many different ways we could
represent 3D
rotations, but somehow we need to search
over
these, and that's the
question. So solving this is actually
an inverse kinematics problem:
given the data, the points in
space, we're trying to back out the
positions and rotations. So this is an
inverse problem. Now let's stop and think for a
second here. Last time we didn't
actually do inverse kinematics; we did
differential inverse kinematics. I said
inverse kinematics is
harder, and we're going to defer that to
later. But this time we're going to go
directly: I haven't written any
differential kinematics yet, I've just
written a kinematics problem. So
the positions and
orientations inside here are like our
generalized coordinates that we're
trying to back
out. So what is the difference? Why
did I immediately advocate DiffIK
for moving the arm around, but I'm saying
we're going to have to solve the full inverse kinematics problem here?
Yeah, I think that's right. He says
you don't have the ground truth.
Effectively, the magic that happens
in the robot case is that you
know the initial conditions and you want
to change those initial conditions:
you have an initial q and you're making
small changes to that, so we have a
place to linearize around. In
perception, at least once you
have to wake up and figure out where the
objects are. You don't have an
initial guess. If someone gives you a good
initial guess, then you could be in the
land of differential inverse
kinematics. But we're saying that
at least once you have to figure
out the hard problem, find the needle in
the haystack potentially. Now, once
you solve that once, I actually would
advocate differential inverse kinematics
if you wanted to track, for instance. If
you want to do real-time tracking, then
by all means you should be thinking
about gradients and the like.
But the one-time problem
must be solved, I guess, in the
perception case. Fortunately, this is not
some complicated chain of equations
that can lead to lots of
nonlinearities and local minima and
stuff. This is about the simplest inverse
kinematics problem we could have to
solve, and we're going to see it has
beautiful structure and good
solutions.
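For reference, the kinematic equations described in words above can be written out as follows. The notation is an assumption pieced together from the spoken symbols ("X WC", "p m i", "p s i"): $X^{WO}$ is the unknown object pose, $X^{WC}$ the known camera pose, ${}^{O}p^{m_i}$ the i-th model point in the object frame, and ${}^{C}p^{s_i}$ the i-th scene point in the camera frame.

```latex
% Model point i, mapped to the world, should land on scene point i:
X^{WO}\,{}^{O}p^{m_i} \;=\; X^{WC}\,{}^{C}p^{s_i}, \qquad i = 1, \dots, N.

% Writing the pose as a translation p and rotation R (dropping the W),
% and letting p^{s_i} denote the scene point already mapped to the world frame:
p + R\,p^{m_i} \;=\; p^{s_i}, \qquad i = 1, \dots, N.
```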
Okay, so let's start. We get to pick a
rotation representation. The
derivation I'll do would go through fine
at least for
quaternions, and we're going to do rotation
matrices here, just because
it's obviously a linear
equation in a 3x3 matrix, so it's
just a little easier on the board.
So even though this is
my spatial algebra rotation, we're going
to represent it with 3x3 matrices
for the purpose of
Question? So, you know, I've got a mug
here. At least once, I'm going to
build a CAD model or
whatever; I'm going to pick an origin of
my CAD system and I'm going to just
define an object-relative model of this.
So I think that is the natural thing: if
you think about what you would have to
do to build a model and write down some
coordinate system, the natural thing
would be to attach your coordinate
system to the object
itself. And then the question of
where it is in space becomes:
where is that object frame in space?
That's the second transform, I think.
Yep. So
pick the bottom corner;
it's just like if you were in
SolidWorks and you have to
just start drawing lines: you've got to pick
a zero-zero somewhere, and all of your
points are relative to that zero-zero.
It doesn't matter if you put it in the
middle, it doesn't matter if you put it
in the bottom corner; it's just the
frame of reference that you're going to
define those points relative to each
other. Thank you, that's a good
question. Okay, so now, to solve this
problem, we want to back out those 9 + 3
numbers, 12 numbers, in order to make
those equations match. A little
annoying that I picked the divider right
up. Okay.
Do you see, even though
it doesn't look quite the normal way,
that that's just a
linear equation? You
should see that this is like $Ax$
approximately equal to $b$, except in
this case the
$x$ would be
my three
positions and then the
nine rotation coefficients, all stacked in
one line; I just flatten those into one
vector. The $A$ has a bunch of stuff about
the $p^{m_i}$ in here, flipped around a little
bit and shifted, and $b$ has got
the $p^{s_i}$
inside it. But it really is
just that: you could rewrite it,
if I flatten that out, as a big $A$
matrix, that's your data matrix, and a big $b$
vector, and you're trying to solve a
least-squares problem to back that out.
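The flattening into $Ax \approx b$ can be sketched as below. The function name `fit_pose_unconstrained`, the row-by-row flattening of $R$, and the test data are assumptions for illustration. The nine rotation entries are treated as free numbers here, so nothing forces the result to be a valid rotation matrix.

```python
import numpy as np

def fit_pose_unconstrained(p_Om, p_Ws):
    """Solve p + R @ m_i ~= s_i in the least-squares sense, treating the
    3 translation entries and 9 rotation entries as 12 free unknowns."""
    n = len(p_Om)
    A = np.zeros((3 * n, 12))
    b = p_Ws.reshape(-1)
    for i, m in enumerate(p_Om):
        A[3 * i:3 * i + 3, :3] = np.eye(3)                        # translation terms
        A[3 * i:3 * i + 3, 3:] = np.kron(np.eye(3), m[None, :])   # R @ m, R flattened row by row
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:].reshape(3, 3)

# Made-up test data: random model points and a known pose.
rng = np.random.default_rng(0)
p_Om = rng.normal(size=(10, 3))
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
p_true = np.array([0.5, -0.2, 1.0])
p_Ws = p_Om @ R_true.T + p_true

# Noise-free data: the linear solve recovers the pose.
p_est, R_est = fit_pose_unconstrained(p_Om, p_Ws)

# Noisy data: the 3x3 block is generally no longer orthonormal.
noisy = p_Ws + 0.01 * rng.normal(size=p_Ws.shape)
_, R_noisy = fit_pose_unconstrained(p_Om, noisy)
orthogonality_error = np.abs(R_noisy.T @ R_noisy - np.eye(3)).max()
```

With perfect data the unconstrained solve happens to return a rotation; with any noise it returns a general 3x3 matrix, which is the motivation for adding constraints on $R$.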
Yes. So these are the scene points,
I guess in the camera frame; it
also is going to have inside it the
$X^{WC}$, which is
given, in this bright pink thing on
the
top. That's the right-hand
side, yeah. And then the rest of it is
the decision variables, along with the
$p^{m_i}$. So if I just tried to solve $Ax$ equal
to $b$, if I just tried to take an $A$
inverse, what's wrong with that?
Right. You
need some number of points; we'll ask
you on the problem set exactly how many points
you need for that to be well defined.
You need some number of points for
this to have a unique solution.
But most of the time you're going
to be in a situation where you have many
points, many more than 12 points, let's
say. Well, each point
contributes three equations, so
many more than four points, hopefully. If
you're in the four-point regime,
find another camera or something.
Okay, so solving that with exact
equality would be very brittle, for all
the reasons we brought up a minute ago:
if there's any noise, if you
sampled slightly different points on
the surface just because of the
position of your camera or something
like that, that's not going to be a good
way to go. So we're going to solve this in the least-squares sense.
Wow, it used to be that I had
options one through 32, but now it
actually just says "chalkboard" and "center
light, turn on". That's good; they just
upgraded that this year. That was
way too easy; I should have done that before.
Sorry, I'm using this
as an abstraction.
When I see this, I think of a
standard linear algebra problem, where
typically, when
people write this, they will use $A$ and $b$.
What I was trying to convey
is that the problem we have, with
different variables that are
rooted in geometry, could be interpreted
as our standard $Ax = b$, where the $A$ matrix
is populated with the data from $p^{m}$
and the $b$ vector is populated from the $p^{s}$.
Yeah, that's exactly it. Yes, so you got
it right so so the reason I said almost
he says is this going to give you a
valid rotation Matrix right so what
really want to write in an optimization
world is I want to
minimize the difference between the
right and the left hand side this is
basically my ax minus B okay but I'll
say um
P plus
r p Mii minus uh P SII that's already
rotated into the world frame
okay sum over I I'm going to minimize
this over
P that's almost what I want to say okay
that's like the least squares version of
that fitting but there's one important
detail which is that not all nine
numbers you can't pick arbitrary nine
numbers and get a valid rotation Matrix
okay so what I'll do is I'll write in
here R has to be part of s
SO3 as a constraint this is the special
orthogonal group three okay uh and
you're going to understand it completely
For a matrix, one way to write that is with additional constraints on
the optimization problem. So I could write this, you know, in the
shorthand, but in order to implement that, what I would actually do is
write the things that define a valid rotation matrix. First of all,
R transpose has to be R inverse. That's one constraint that makes a
valid rotation matrix. And the other one is that the determinant of R
has to be plus one. So this is the optimization problem.
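Written out, the optimization problem described here looks roughly like the following. The symbols are an assumed shorthand, not necessarily what was on the board: p is the translation, R the rotation, p^{m_i} the i-th model point, and p^{s_i} the corresponding scene point already expressed in the world frame.

```latex
\begin{aligned}
\min_{p \in \mathbb{R}^3,\; R \in \mathbb{R}^{3 \times 3}} \quad
  & \sum_{i} \left\| p + R\, p^{m_i} - p^{s_i} \right\|^2 \\
\text{subject to} \quad
  & R^\top = R^{-1}, \\
  & \det(R) = +1 .
\end{aligned}
```

The two constraints together are the shorthand R in SO(3).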
Yes, that's a good question. So it turns out that this constraint by
itself ensures that the determinant is either plus one or minus one. But
the minus one case gets you what would be called an improper rotation,
which is a rotation plus a reflection. So if you want to stay with the
proper rotations, you need the extra constraint. That's a good question.
In practice we will often drop this and then check for reflections
afterwards and flip it back, because this one is harder in general.
In fact, if you think about this: we said that, in the decision
variables, the inside of this is a linear function, which means the
square is a quadratic function. So you should be thinking, I've got a
quadratic bowl like this. That's a good case. What are these in terms of
the constraints? I told you quadratic programming is beautiful, but it's
defined for when you have linear constraints. Are these constraints
linear? Right, this is not a linear constraint. It's also a quadratic
constraint. The way you could see that is you multiply both sides by R:
R times R transpose equals I would be an equivalent writing of that. And
so the entries of that product have to be zero or one, to make all of
the nine elements match, okay? But each of the entries in this matrix is
quadratic in the original decision variables, okay? And this other
constraint, the determinant, turns out to be cubic for 3x3 matrices. In
2x2 matrices it's actually a quadratic again, okay, but in general it
can be cubic. So those are less good. But it turns out that we have
really, really good solutions to this one. This is like one potentially
ugly, hard class of optimization problems, where this one we just nail,
okay.
Let's actually do the 2D example. I think it's useful to understand the
optimization landscape, right, what are we setting out to solve. Okay,
so let's say we're going to do the 2x2 version of it. In that case, a
2x2 rotation matrix, you'll often see it as cosine theta, minus sine
theta, sine theta, cosine theta. We could try to search directly for
theta, but to keep our analogy to the 3D case, what I'm going to do
instead is just name a variable, I'll call it, what did I call it, a for
cosine theta, and b for sine theta. And so I'll parameterize my matrix
as a, minus b, b, a, and I'm going to search over a and b. This is a
trick that you can't quite do in 3D; there I would have had to name all
the entries. Here I could have done a, b, c, d, for instance, like this.
It just happens that in 2D I know there's not enough degrees of freedom;
I know that I can solve away c and d, so I haven't done that. But in 3D
you don't.
Okay, we'll come back to it when it makes sense, then. Okay, all right.
So let's just multiply this out. What does R times R transpose equals I
look like? Well, that looks like a, minus b, b, a, times the transpose
of that, a, b, minus b, a. In order for this to equal the identity,
1, 0, 0, 1, that implies a squared plus b squared equals 1, and it says
that ab minus ba has to equal zero. Okay, so those are the two quadratic
constraints that define the rotation matrix being a valid rotation. And
the same thing is happening in 3D, the same type of thing, but you just
have more equations flying around. It happens that if you wanted to say
that the determinant of this equals positive one, in this case it's the
same as this, right? This is also a squared plus b squared equals one,
and that's because I took out the improper rotations by my
parameterization.
Yes, yep, sorry, that was just me spelling it out, but you're right,
true. Yep, also because I did the change of variables. Okay.
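A quick numerical check of the 2D algebra above, as a small sketch. The helper name `rotation_from_ab` and the sample values of a and b are made up for illustration.

```python
import numpy as np

def rotation_from_ab(a, b):
    """2x2 matrix [[a, -b], [b, a]]; a valid rotation only if a^2 + b^2 = 1."""
    return np.array([[a, -b],
                     [b,  a]])

# Arbitrary (a, b) that is NOT on the unit circle.
a, b = 0.3, 1.7
R = rotation_from_ab(a, b)

# R R^T is (a^2 + b^2) times the identity: the off-diagonal entries
# (ab - ba) vanish automatically, so the only real constraint is a^2 + b^2 = 1.
print(R @ R.T)

# The determinant is also a^2 + b^2, so det(R) = +1 is the same constraint here.
print(np.linalg.det(R), a**2 + b**2)

# Choosing a = cos(theta), b = sin(theta) satisfies the constraint.
theta = 0.4
R_valid = rotation_from_ab(np.cos(theta), np.sin(theta))
print(np.allclose(R_valid @ R_valid.T, np.eye(2)))
```

This is why, in 2D with this parameterization, the orthogonality constraint and the determinant constraint collapse into a single unit-circle constraint on (a, b).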
So I made an animation, a visualization of this, okay. What I want you
to think of is an objective that's a quadratic bowl and a constraint
that is this quadratic constraint. I went out of order, but here it is.
Okay, this is what it looks like. So I took a few points and I made my
quadratic objective, and I took my points and I just rotated them, and
I'm just trying to bring it back. So I can actually dynamically rotate
what those points are.
Okay, and it's moving around, and my goal is to find the bottom of the
quadratic bowl, but it has to be on this constraint. If you look down
from the top, this is the unit circle constraint, which looks like a
cylinder, I mean, I projected it up. Okay, so I have to find the lowest
point on the parabola that intersects with the red constraint. And it
turns out, in the case where there's no noise, the minimum is always
going to be on the manifold, right, because the best rotation that you
could find is going to be a valid rotation. Okay, so this case is
actually the good case. Now, as soon as you add noise, the parabola
could move off the unit circle, and you're going to need to project it
back onto the unit circle. That's the fundamental geometry of this
problem.
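The bowl-plus-circle picture can be sketched numerically. This is a minimal example under stated assumptions: rotation only (no translation), known correspondences, a made-up noise level, and the (a, b) parameterization from the 2D example. In this particular 2D setup the bowl is round, so rescaling the unconstrained minimizer onto the unit circle gives the constrained minimizer; that shortcut should not be assumed to carry over unchanged to 3D.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground-truth rotation and noisy, corresponding points.
theta_true = 1.1
R_true = np.array([[np.cos(theta_true), -np.sin(theta_true)],
                   [np.sin(theta_true),  np.cos(theta_true)]])
model = rng.normal(size=(20, 2))
scene = model @ R_true.T + 0.05 * rng.normal(size=(20, 2))

# With R = [[a, -b], [b, a]], the product R @ m is linear in (a, b):
#   x-row: mx * a - my * b = sx
#   y-row: my * a + mx * b = sy
mx, my = model[:, 0], model[:, 1]
A = np.concatenate([np.stack([mx, -my], axis=1),
                    np.stack([my,  mx], axis=1)])  # 2N x 2
rhs = np.concatenate([scene[:, 0], scene[:, 1]])   # 2N

# Bottom of the quadratic bowl (unconstrained least squares).
ab_bowl, *_ = np.linalg.lstsq(A, rhs, rcond=None)

# With noise the bottom is slightly off the unit circle; project it back.
ab_circle = ab_bowl / np.linalg.norm(ab_bowl)
theta_fit = np.arctan2(ab_circle[1], ab_circle[0])

print("norm of unconstrained (a, b):", np.linalg.norm(ab_bowl))
print("recovered angle:", theta_fit, "true angle:", theta_true)
```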
It's also a very famous problem. It's the point correspondence problem,
Wahba's problem; it comes up in all kinds of fields. It's a famous
problem of this form: minimize a quadratic objective on the unit circle.
Go ahead.
It's only a little bit hard. So it is harder to do; basically, I don't
have the simple relationship to just know that this is negative sine
theta, so I would have to use nine numbers instead of the two I just
used. In 3x3 I'll, you know, have a, b, c, d, e, f, and so on, right?
And then the R transpose R is still quadratic; the determinant is cubic
for a 3x3 matrix, yeah. Okay, but the geometry is roughly that: this
constraint and the
Yep, okay. Interestingly, let me just get one more thing in.
Interestingly, we could have parameterized the whole thing directly with
theta, okay? That's only one variable in 2D. Of course in 3D we'd pick
quaternions, or we'd pick one of the other representations. In this case
it's maybe illustrative to see, if we just did it in terms of theta,
what does that cost function look like? I threw that on a plot, okay,
and it is similarly beautiful and good. So now there's no constraints.
If I were to just parameterize it with theta, then I would always get a
valid rotation out; I don't need this. So I could just write my
objective. But the objective is no longer quadratic. It's a non-convex
objective. It's got sines and cosines in the middle of it, and the sines
and cosines multiplied out, you get cosine squared, whatever, you know,
to the second power, gives you a cost landscape that looks like this.
And if I move my theta, that rotates the two relative to each other,
then the minimum moves correspondingly. Luckily, all of the minima are
good in this case; they're all just 2 pi apart. So similarly this is a
good optimization: if I start with an initial guess and I go down, then
I'm always going to find a good answer in the no-noise case. These
things are going to behave differently not only when you have noise but
also when you have additional constraints, like for instance if you
don't want your object to penetrate the world or something like this.
That becomes a harder constraint, and we'll see the differences between
those representations more. Yeah?
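To make the theta-parameterized landscape concrete, here is a small sketch that evaluates the cost on a grid of angles. It assumes the same simplified setting as the 2D example (rotation only, known correspondences, no noise); the point set, the grid, and the function names are made up for illustration.

```python
import numpy as np

def rot2(theta):
    """2x2 rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

rng = np.random.default_rng(2)
theta_true = -0.8
model = rng.normal(size=(10, 2))
scene = model @ rot2(theta_true).T  # noise-free scene points

def cost(theta):
    """Sum of squared distances between rotated model points and scene points."""
    return np.sum((model @ rot2(theta).T - scene) ** 2)

# Evaluate the (non-convex, periodic) cost over two full turns.
thetas = np.linspace(-2 * np.pi, 2 * np.pi, 2001)
costs = np.array([cost(t) for t in thetas])
best = thetas[np.argmin(costs)]

# The grid minimizer matches theta_true up to a multiple of 2*pi.
wrapped_error = (best - theta_true + np.pi) % (2 * np.pi) - np.pi
print("grid minimizer:", best, "wrapped error:", wrapped_error)
```

No constraint is needed here because every theta gives a valid rotation; the price is that the cost is no longer a quadratic bowl.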
In the 2x2 case it is quadratic; in the 3x3 it's not quadratic, but what
we're going to do is ignore it. The R transpose R constraint is
sufficient to get determinant plus or minus one. We're going to solve,
and if our determinant was minus one, then we're going to multiply by
negative one in one of the... yeah. So basically it's whether you have a
right-handed rule or a left-handed rule. If you end up with a
left-handed rule, you flip it back to a right-handed rule and you call
it a day. Yeah, so that's how we get around the cubic.
Okay, so that was just searching for rotations. In this simple example I
left out the positions. But one of the most important insights I want
you to take away today is that you can actually separate solving for the
rotations from solving for the translations.
Okay, why is that? So, registering the points: the key insight, and we
already had it in the first lecture about spatial transforms. Remember,
I did a check-yourself kind of thing, saying the position of B relative
to A only depends on the rotation between those two frames, not on the
position, because it's already a vector; the base of that vector is, you
know, rooted in the coordinate system. So the relative positions of two
points depend only on rotations. So the trick is, if you just try to fit
every point by itself, then you have to solve for the translation and
rotation together. But if you just take the difference between two
points, that quantity does not depend on the translation of the object.
You could take an object, I don't know, in Building 32, or an object
here, okay, and the absolute translation does not affect the relative
points, only the orientation. So the trick is: you subtract out some
nominal point in the middle of your point cloud from all your points,
you solve for the rotations, and then at the end, now that you know the
rotations, you can easily figure out the positions. Okay? You can solve
for the rotations separately. You guys didn't look like you got that. I
mean, I'm not trying to, but you didn't look as happy as, you know, on
average, let's just say. Yeah?
Yes, true. So you said it almost right, and I have to re-say it for the
people on the video. So we're going to take the model point cloud, we'll
find the centroid of the model point cloud, and write all of the model
points relative to the model centroid. And I'll take all of those scene
points, I'll take the scene centroid, and write all of theirs relative
to the scene centroid. And then I'm going to try to take those relative
coordinates and rotate them until they match, and now I have an easy
problem to just snap the positions into alignment. So it's a two-step
process.
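A small sketch of why the two-step process works, under the assumptions of known correspondences and no noise. The pose and point cloud are made up; the rotation is taken as given here just to show the structure (how to solve for it is a separate step).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical pose: rotation about z and a translation.
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
p = np.array([2.0, -1.0, 0.5])

model = rng.normal(size=(30, 3))
scene = model @ R.T + p  # scene[i] = R @ model[i] + p

# Step 1: write every point relative to its own cloud's centroid.
model_c = model - model.mean(axis=0)
scene_c = scene - scene.mean(axis=0)

# The translation has dropped out: the centered clouds differ by R only.
print(np.allclose(scene_c, model_c @ R.T))

# Step 2: once R is known, the translation snaps into place from the centroids.
p_recovered = scene.mean(axis=0) - R @ model.mean(axis=0)
print(p_recovered, p)
```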
Yes. So the question is, why pick an arbitrary point? So the key insight
is that it's the relative points that match. It turns out there is a
kind of natural point to pick, which is the centroid, because then it
actually just drops right out of the equations in a beautiful way. Yeah?
It's the average of the points. It's literally the average of the data
points. Yeah?
Yes, in 3D also, it's only rotations that affect the relative points. If
you have, you know, this point relative to this point, then the only
thing that changes that number, you know, the location of the coordinate
system does not change the relative number, it's only the rotations. In
3D it works fine, yeah.
Okay, so quick quiz. What happens if you have a symmetric object? So I
drew one sort of intentionally that had an asymmetry there. What if it
was a box?
So she says it's impossible to know. If it was four-way symmetric, an
actual box, then it's impossible to know. You're right, of course, but
the thing I just want to make sure is clear: so far I've assumed known
correspondences. In the case of known correspondences there is no
symmetry. It cannot be, right? If someone told me that this point
corresponds to that face, then there is always a unique solution, right?
And the reason that's maybe puzzling is I'm showing you these plots that
look like they have a unique solution even in the case of something that
has symmetries. And the reason for that, it's not a trick, that function
doesn't change if your object suddenly becomes symmetric; it's because
it's the known correspondences case. Okay, let me keep moving a little
bit, so I can get through a couple things. All right, and we'll ask you
a couple questions about uniqueness and the like on the problems.
Okay, is that basic idea clear? If someone gives me the correspondences,
then I have a very good algorithm, which is just solving this, that can
find the optimal solution. In fact, even solving the quadratic thing,
projecting onto the unit circle, you don't have to just solve and then
project; you actually can solve it beautifully, and it turns out the
solution is given by the singular value decomposition. Okay, so Wahba's
problem is famously solved by the singular value decomposition. If you
have extra constraints, then you're going to use extra machinery.
Typically a very powerful way to write that is as a semidefinite
program, which we'll get to later. Okay, but these kind of quadratic
constraints are a particularly nice case of a semidefinite program. But
in the unconstrained case, or, you know, with only this constraint, this
is like the SVD. If you know the basic picture of the geometry of SVD,
right, you warp it to the circle, you rotate it, and then you warp it
back. Okay, well, that warping to the circle is exactly the warping that
happens in projecting onto the unit circle. Okay, so it turns out to be
exactly related to the SVD.
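One common way to write the SVD-based solution with known correspondences is sketched below. This is an illustration under stated assumptions, not necessarily the exact formulation used in the course code: the function name `fit_pose_known_correspondences` is made up, rows of `model` and `scene` are assumed to be paired, and the reflection check follows the "flip it back" idea from earlier in the lecture.

```python
import numpy as np

def fit_pose_known_correspondences(model, scene):
    """Least-squares R, p such that scene[i] ~= R @ model[i] + p (rows paired)."""
    m_bar = model.mean(axis=0)
    s_bar = scene.mean(axis=0)
    # Data matrix built from centered points: W = sum_i s_c[i] m_c[i]^T.
    W = (scene - s_bar).T @ (model - m_bar)
    U, _, Vt = np.linalg.svd(W)
    # If U @ Vt has determinant -1 (a reflection), flip the last direction.
    D = np.eye(model.shape[1])
    D[-1, -1] = np.linalg.det(U @ Vt)
    R = U @ D @ Vt
    # With the rotation fixed, the translation follows from the centroids.
    p = s_bar - R @ m_bar
    return R, p

rng = np.random.default_rng(4)

# Hypothetical ground truth: a random proper rotation and a translation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0  # make it a proper rotation (determinant +1)
R_true, p_true = Q, np.array([0.3, -1.2, 0.8])

model = rng.normal(size=(30, 3))
scene = model @ R_true.T + p_true

R_fit, p_fit = fit_pose_known_correspondences(model, scene)
print(np.allclose(R_fit, R_true), np.allclose(p_fit, p_true))
```

Note that there is no iteration and no initial guess: with the correspondences given, the fit is a single SVD of a small matrix, no matter how many points there are.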
Okay, so equipped with that, we now have the most important algorithm
for sort of geometric perception, which is the iterative closest point
algorithm.
The biggest assumption we made so far was this known correspondence,
right? Someone told me the correspondences. If I instead have to solve
for the correspondences, then I have extra work to do. And just to give
my notation, let's define a correspondence vector. Okay, so I'll use a
correspondence vector c; its length is num points by one. Okay, so it's
a num-points-by-one vector, and the i-th element takes an integer value.
I'll say the i-th element c_i is an integer j if scene point s_i
corresponds to model point m_j.
And I was careful to choose that so it doesn't have to be a one-to-one
mapping, right? I'm going to try to power through a little bit more. So
it doesn't have to be a one-to-one mapping: there could be model points
that don't have a corresponding scene point, which is important because
oftentimes, if you have a camera, you're just looking at one side;
you're not going to have scene points all the way around the object.
Okay, but we're saying that every scene point corresponds to a model
point. There's other choices people sometimes make, where you assume
that every model point goes to a scene point, and all of them have
implications, but here we're going to choose it like this.
Okay, I could then just write my objective: X times the model point
m_{c_i}, using that notation, minus p_si, squared. But now I have to
find both c, which is this discrete thing, right, its elements live in
the set one to num model points, so it's kind of a weird thing to
optimize over, and X in SE(3). So how am I going to optimize that?
Right, it looks like kind of a quadratic objective but with a
combinatorial aspect to it, of trying to figure out all these
correspondences simultaneously. And you can do that; we've had a paper
that does that kind of thing, where we're trying to do the combinatorial
search at the same time as the continuous search, but it's a very
expensive optimization. So the ICP algorithm famously does it by
splitting it into two.
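The joint problem described above can be written as follows. The notation is an assumed shorthand: X is the pose (rotation and translation) applied to a model point, c_i is the index of the model point matched to scene point i, N_s is the number of scene points, and N_m the number of model points.

```latex
\min_{c,\; X \in SE(3)} \;\; \sum_{i=1}^{N_s}
  \left\| X\, p^{m_{c_i}} - p^{s_i} \right\|^2,
\qquad c_i \in \{1, \dots, N_m\} .
```

The continuous variable X and the discrete variable c appear together in the same objective, which is what makes the joint search expensive and motivates splitting it into two simpler problems.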
In many optimizations kind of like this, it's often the case that if you
fix one set of variables then the optimization is easy, fix another set
of variables and the other optimization is easy, and then you end up
with natural algorithms that alternate between the two optimization
problems. And that's exactly what we'll do here, because if the
correspondences are known, then the optimization is exactly what we did
a minute ago; that's the point registration with known correspondences,
and it has a beautiful solution. And the other side of it is, if the
transform is known, then finding the corresponding points is just a
nearest neighbor problem.
Okay, so if we have an initial guess for X, then we'll take step one:
solve the nearest neighbor problem. So our new c_i is just going to be,
we can just try all the possible correspondences, basically I'm going to
say the argmin over j of X times p_mj minus p_si, squared. So for a
small point cloud you can just literally try all the possible
correspondences. When X is known you can just measure the distance and
take the smallest one, okay? When it gets bigger, you start using
efficient nearest neighbor data structures like k-d trees and stuff like
that. Okay, but this is a fast nearest neighbor query. And then the
second step is, given the correspondences, solve for X. And then you
just repeat until convergence.
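The two alternating steps can be sketched as follows. This is a minimal illustration, not the course implementation: the function names are made up, the nearest-neighbor step is brute force (a k-d tree would replace it for large clouds), the stopping rule is "correspondences stopped changing", and the toy example uses a small L-shaped model with a gentle pose offset so that the first set of nearest neighbors is already correct.

```python
import numpy as np

def fit_pose(model, scene):
    """Known-correspondence step: SVD-based least-squares R, p (rows paired)."""
    m_bar, s_bar = model.mean(axis=0), scene.mean(axis=0)
    U, _, Vt = np.linalg.svd((scene - s_bar).T @ (model - m_bar))
    D = np.eye(model.shape[1])
    D[-1, -1] = np.linalg.det(U @ Vt)  # guard against reflections
    R = U @ D @ Vt
    return R, s_bar - R @ m_bar

def nearest_model_index(model_in_scene, scene):
    """Brute-force step: for each scene point, index of the closest model point."""
    d2 = ((scene[:, None, :] - model_in_scene[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def icp(model, scene, max_iters=50):
    """Alternate nearest-neighbor correspondences and pose fitting."""
    dim = model.shape[1]
    R, p = np.eye(dim), np.zeros(dim)  # initial guess: identity pose
    c = None
    for _ in range(max_iters):
        c_new = nearest_model_index(model @ R.T + p, scene)
        if c is not None and np.array_equal(c_new, c):
            break  # correspondences stopped changing
        c = c_new
        R, p = fit_pose(model[c], scene)
    return R, p, c

# Toy 2D example: an asymmetric L-shaped model with well-separated points.
model = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                  [3.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
p_true = np.array([0.1, -0.05])
scene = model @ R_true.T + p_true

R_fit, p_fit, c_fit = icp(model, scene)
print(c_fit, np.allclose(R_fit, R_true), np.allclose(p_fit, p_true))
```

With a worse initial guess the first correspondences would be wrong, and the loop would need several alternations, or could settle on a wrong answer; this sketch makes no attempt to handle that.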
Okay, so let's see what that looks like. I'll try to not stand in the
middle right here. Okay, so this is the kind of plot I'm going to show
you here. This was the known correspondence one. I picked a lovely
salmon color for the random object, with random points in 2D and known
points, and then I rotated it and translated it by some random
quantities and got my blue scene points. In the first step we have a
known correspondence problem, and the registration just works exactly.
Okay, that's the known correspondence version. Now if I take another
example here, with my model points and my scene points, if I start the
ICP algorithm, then I think I can just step through. Here we go. Okay,
the first thing I do is, given that initial guess, which was a bad
initial guess, you know, this is the initial guess here, I just compute
the nearest neighbors: for every scene point I find the nearest model
point. Okay, and then given those correspondences I solve the new
optimization, and it doesn't do very well, because my correspondences
were all wrong, but the hope is that it gets you closer. Okay, and then
I get a new chance at my correspondences. And many times this converges
beautifully in a small number of alternations, because at some point
your correspondences are correct and you snap right into place.
Yeah? Yes. So just like you can separate translation and rotation,
scaling can be separated too. And the trick is, just like the
observation that the relative positions only depend on rotations, it
turns out the distances between points only depend on scale. So if you
play that trick one more time, you get something that only depends on
scale. So you can fit scale first, and then fit rotations, and then fit
translations. I actually cite that in the notes, because I think that's
part of the story.
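As a rough sketch of the scale idea: distances from the centroid are unchanged by rotation and translation, so their ratio between the two clouds exposes the scale. The estimator below (a ratio of summed squared centered norms) is one simple choice made for illustration, assuming known correspondences and no noise; it is not necessarily the method in the reference cited in the notes.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical similarity transform: scale, then rotate, then translate.
scale_true = 2.5
theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
p = np.array([1.0, -3.0])

model = rng.normal(size=(40, 2))
scene = scale_true * (model @ R.T) + p

# Centering removes the translation; taking norms removes the rotation.
model_c = model - model.mean(axis=0)
scene_c = scene - scene.mean(axis=0)
scale_fit = np.sqrt((scene_c ** 2).sum() / (model_c ** 2).sum())

print(scale_fit, scale_true)
```

After dividing the centered scene points by the fitted scale, the rotation and then the translation can be fit as before.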
Yeah? Yes, good. So this algorithm can absolutely get stuck in local
minima. You know, in the code you can play with, it's just random. So
the fact that it's mostly translation, it's probably because I wanted
one that fits on the slide, but I didn't think of it that way. Now I
feel like I picked a bad example. But yeah, it absolutely can get stuck
in local minima. Yes, if you pick the wrong correspondences and you
don't make enough change, you could get the same wrong correspondences
back, and then it will never leave, right? So there are many ways, and
we're going to talk about those more next time. You can try random
initializations of the correspondences, but there are also more robust
methods that can try to avoid some of those local minima.
Yeah? So, and we're going to talk about noise also next time. It's
actually a fairly subtle question, and I had a slide I blew past real
quick, but just to show you: real-world noise is extremely structured.
If you think about noise as like adding Gaussian values to all of those
values independently, that's not the way cameras have noise. Cameras
tend to have, like, this is the actual depth image, and this is the
depth image you get out of a camera. They have dropouts, like pixels
that are just missing in the middle. They will have swaths, like a shiny
material might have very noisy things and, you know, a flat material
might not. So the answer to your question requires thinking a little bit
about the types of noise.
Yeah. Yep, you just alternate back and forth: when X is fixed, the problem is easy, it's nearest neighbors; when the correspondences are fixed, then the problem is easy, it's this W.
That's true, you're solving many optimization problems in the loop. This one is so easy that it becomes an SVD, it's a call to SVD, so I wouldn't even call it an optimization problem. In the implementation it's very fast, even for very big point clouds. But yes, it is alternating between those. And let me take us home with one more example here. So yeah, I've got lots of examples of real noisy things.
But you're going to play with the bunny, because everybody who ever does ICP makes the Stanford bunny snap into alignment with another Stanford bunny. Early in computer graphics the Stanford bunny sort of did a winner-take-all thing and it just won out, and everybody uses the Stanford bunny. Okay, so you'll do that on your problem set.

But just to show you, even in the examples I showed you, like loading a dishwasher for instance, if you watch carefully what happens: there was a perception system that tried to figure out where the mug was to begin with. But as the robot moves, even in this sort of state-of-the-art perception system, state of the art a few years ago I guess, watch this. That was running ICP. I mean, that wasn't the ICP updates, but when it goes there it has a model of the mug, back in the day, and it actually tried to align the model of the mug before going in, to close the difference between the far-away camera's rough estimate of where the mugs were and actually making the pick.

And people still do that today. We were in a meeting with Leslie and Tomás the other day and they're like, "we're going to do ICP for this," and the young students were like, "okay, that's kind of old school." But it still works. It still works really well.
Yeah. Yep, so this is part one. Part two is how you do more robust versions of this with partial views and outliers and noise. Yeah, so we're going to talk about that next time. Good. I'll answer that; you can come up afterwards. Yeah, thank you.