The lecture explores the challenges and techniques for enabling robots to learn and operate effectively in the real world, primarily focusing on the "sim-to-real" transfer problem, where models trained in simulation are applied to physical robots.
hi everyone welcome to lecture today we
have a guest lecturer
Josh Tobin Josh got his PhD in AI from
Berkeley has been a research scientist
at OpenAI for several years and
is one of the world's leading experts I
would say the world's leading pioneer in
sim-to-real how to make robots learn in
simulation and have it still somehow
work well in the real world and that's
exactly what we're going to learn about
today before Josh gets started a couple
of quick logistical things you have your
last homework your homework five is out
and due in about two weeks
so keep track of that and then your
midterm has happened and then final
project presentation time slot poll has
gone out we'll look at that
soon and then we'll send out something
for signups in the next couple of days
you can pick a specific slot for your
team and if you have any conflicts with all
the slots we'll figure something out
probably with you doing a recording
for us ahead of time all right any
logistical questions okay then please
join me in welcoming Josh all right
thank you Peter I'm really excited to be
here before I dive into the topic of
today's talk which is sim to real just a
bit about me
my background was in pure math and then
I you know decided that I wanted to like
do stuff in the real world went into
consulting for a little bit but missed
being technical so I came back to
Berkeley to do my PhD in applied math
but little did I know when I came
back to Berkeley that I would do one
thing that would change it all which is
actually to take this class and so I
took CS 287 I think the last time it was
offered around four years ago and that
sort of changed my trajectory
to robotics and artificial intelligence
I spent time doing those things at OpenAI
and Berkeley so the first thing I
want to talk about is just you know what
was my like takeaway from CS 287
actually I'm curious you you all are
almost done with the semester what is
what's your like main takeaway from this
yeah I really do or maybe no one's
learned anything from the class yeah
LQR yeah that's great that's a
that's not exactly my main takeaway but
I definitely did take that away my main
takeaway was get started early on
homework five no I'm just kidding
I think like the thing the thing
that I took away from the class more
than anything else is that you know
robotics is really hard right and so why
why is that the case right so you all
have talked about you know the
simplified model of how robots interact
with the real world which is an MDP but
I think one of the the core things that
makes robotics so hard is that the real
world really isn't an MDP right so in an
MDP you have an agent that gets to
observe the ground truth state of the
world but in the real world you know
states are super complex and they're
ambiguous and they they're really hard
to model so these are all kind of
examples of scenes where you should
think about how would we actually model
the state of this so what the robot does
get instead of the state is an
observation but the observations
themselves are often really high
dimensional and they're multimodal so
maybe they have many camera inputs and
they're also super noisy right so even
even though we do get an observation of
the world that observation might not be
reliable the next assumption that we
have in an MDP is that we get a reward
but my question is you know where does
reward come from in the real world right
so if you're trying to have a robot pour
a cup of coffee for someone how do you
actually set up a system that will give
you a reward back when they do that task
successfully and a couple of other
examples that you might want to think
about is what does the reward function
look like for folding a towel or for
you know cooking someone dinner or you
know ultimately like making the user
happy all right how would you define a
reward function for those things and
then even if you can define a reward
function a lot of times
measuring our reward relies on having
sensors to tell us sort of where the
robot is in space so how do you measure
a reward outside the lab and then lastly
actions that robots take right so
designing controllers for robotics is a
really hard engineering problem you need
to understand the system that the robot
is interacting with very well and it
doesn't always scale that well to high
dimensions I like this quote from Russ
Tedrake at MIT and TRI which is that you
know manipulation like maybe one of the
main things that we care about in
robotics breaks all the rigorous and
reliable methods for control that we
know about and then once you do get your
controller you know your robot is going
to break and it's going to degrade and
the sensors are going to fail
and so how do you make sure that things
are reliable for that and so one thing
that I was really excited about when I
took this class is deep reinforcement
learning and you know deep learning more
generally applied to robotics and so you
know I think the the hope there is that
rather than like us needing to spend a
ton of time understanding the
environment that the robots can operate
in maybe we can just collect a lot of
experience and let the algorithm handle
the rest and so the next thing I want to
talk about is like what's preventing us
from doing that why is this so hard and
so I think that the main observation
here is that deep learning is super data
hungry right so from you know training
models on images to sentences to
robotic control you really need like
often millions or tens of millions or
even more sort of labeled examples in
order to get things to work well but for
those of us in
robotics that presents a big challenge
right because robotic data is super
expensive robots themselves are
expensive and you know actually going
out and collecting data on robots can be
dangerous and then often in the real
world is hard to actually get labels for
the things that we care about our robots
doing and so one of the main things that
motivated me when I started working on
my PhD is how can we get around the data
availability problem in robotics so is
there any way to make data more
plentiful so that we can do deep
learning in robotics there's a few ways
that people have thought about approaching
this maybe if the problem is we don't
have enough data like let's just scale
up data collection and so some
researchers have thought about how to
make fleets of robots that all collect
data together and learn from their
shared experiences maybe if the problem
is that our learning algorithms are too
inefficient maybe we should just make
them more efficient and so more
efficient than sort of model free
reinforcement learning could be
model-based RL meta learning and
learning from demonstrations and I know
that you've talked about at least two of
these model-based and LfD in the class so
far or you know maybe another way to get
around this is if
one of the problems is it's really hard
to get labeled data in the real world
maybe we can do a lot of our learning
in an unsupervised way and so I think you
know all of these approaches that I've
sort of touched on at a really high
level so far are really interesting and
I think are going to be a big part of
the story about how robots make it to
the real world but the question I want
to focus on is what can we do without
doing that so what can we do with
simulated data and you might ask you
know why why would we even bother with
simulated data at all
well if simulated data works it has a
lot of really big advantages so unlike
robotic data simulated data is super
cheap basically zero marginal cost it's
very fast you can run simulators faster
than real time and it's scalable right
so you can have a simulation running on
every core in your data center you don't
need to go and buy new robots maybe more
importantly it's safe right so you can't
actually damage something by running a
simulation at least not yet
you get labels for free because you
design the world so you know where all
the objects are and you know kind of how
the task is evolving and you're not
beholden to real-world probability
distributions and I'll expand on what I
mean by this in a second but first
labeling I think this is kind of like an
underrated advantage of simulation there
are a lot of tasks where it's very hard
to get a human to label the data for you
so for example if you have images from
the real world and you want to have
someone annotate the ground truth for
depth in those images that's kind of
like a hard task for for a human to do
or similarly annotating the 3d pose of
objects when you only have a 2d image
both of those things you know
it's kind of hard to just go and ask
people in Amazon to do it for you what
do I mean by not being beholden to
real-world probability distributions a
couple of kind of examples I want to
mention here first is the edge case
problem so if you're training a
self-driving car right most of the time
your car is just driving on the highway
but every once in a while you see
something like this right like a you
know cyclist in a pink bodysuit or you
know a kangaroo hopping across the road
maybe that's common in Australia I don't
know but definitely not here or this
like crazy roundabout that's like five
roundabouts in one or something like that
and so you know the challenge with edge
cases in self-driving cars is that by
definition we have sort of very few if
any training examples for our robot and
so how do we like if we're doing machine
learning how do we not overfit to the
training examples that we have and if we
can train on simulated data that might
be a way around that another reason I'm
excited about simulation is for reducing
bias so you know take a toy example here
like let's say that we're training a
model to distinguish I don't know dogs
from puppies and our training data looks
like this right so this training data is
biased because it only has golden
retrievers in the dogs category and so
what would happen if we train a model on
data that looks like this well the model
might classify all Australian Shepherds
as puppies right that's really bad
because there are adult Australian
Shepherds as well and so you know the
question I have is can we fix this by
synthesizing adult Australian Shepherds
but I think kind of the core question to
ask yourself about all this is like if
simulation works it could be really
valuable but what reason do we
have to believe that it should actually
work and so this is a quote from Rodney
Brooks that I like because I think it
captures the way that most people
in the robotics community and the
machine learning community felt for a
long time about the value of simulation
right there's actually a near certainty
that programs that work well on our
simulated robots will completely fail in
the real world and the reason that
they'll fail is because you know the
real world is not like
simulation there are differences
between the dynamics in the real world
and the dynamics
in our simulator okay so what am I going
to talk about for for the rest of this
talk the first thing I want to touch on
is just I want to give you sort of an
intuitive sense of why it's so hard to
use simulated training data so you know
why is it the case that we have a gap
between simulation and the real world and
then I want to sort of briefly mention
you know simulation is a broad topic in
robotics and even if you don't
solve this sim-to-real problem you can
still use simulation to help you build
robot systems that work and so I'll talk
about a couple of those then I'll
mention kind of some ways that you can
go about building a good simulation so a
simulation that's a good fit for the
real world and then I'll talk about a
couple of techniques to bridge the gap
so the first is domain adaptation and
then the second is domain randomization
which I'll have the most to say about
and then finally I want to mention a
couple of thoughts about sort of what's
next for this field of sim-to-real but are
there any questions before I dive into
that all right
so why is it so hard to use simulated
training data I think at its
core there are two reasons the first is
that it's really hard to accurately and
also efficiently model sensors and
physical systems and then the second is
that even if you have only like a small
modeling error that can tend to lead to
large errors in the behavior of the
downstream control system so why is it
hard to accurately and efficiently model
sensors and physical systems well you
know as we talked about a couple weeks
ago physics simulators make some big
assumptions about the world in order to
run faster right so a lot of physics
simulators assume that all the objects
are convex or that we have sort of
discrete time steps with a relatively
large DT or that all bodies are rigid or
we have a simplified model of friction
let's say and so there's there's
inherently going to be gaps between any
model that makes assumptions that
large and the physics of the real world
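
As a rough illustration of the discrete-time-step assumption mentioned above, the sketch below integrates a frictionless mass on a spring with explicit Euler steps and compares the result with the known closed-form solution. The system, the function names, and the step sizes are all illustrative assumptions and are not taken from the lecture or from any particular physics engine; real simulators use more careful integrators, but the qualitative effect of a larger dt is the same.

```python
import math


def simulate_spring(dt, t_end=5.0, stiffness=1.0, mass=1.0):
    """Integrate x'' = -(stiffness/mass) * x with explicit Euler steps.

    Starts from position 1 and velocity 0 and returns the final position.
    """
    steps = round(t_end / dt)
    x, v = 1.0, 0.0
    for _ in range(steps):
        a = -(stiffness / mass) * x
        # Explicit Euler: both updates use the values from the previous step.
        x, v = x + dt * v, v + dt * a
    return x


def exact_spring(t, stiffness=1.0, mass=1.0):
    """Closed-form position of the same system at time t."""
    return math.cos(math.sqrt(stiffness / mass) * t)


def final_error(dt, t_end=5.0):
    """Gap between the simulated and the exact position at t_end."""
    return abs(simulate_spring(dt, t_end) - exact_spring(t_end))


if __name__ == "__main__":
    for dt in (0.001, 0.01, 0.1):
        print(f"dt={dt}: |sim - exact| at t=5s is {final_error(dt):.4f}")
```

Running the script prints one error per step size; the gap to the exact solution grows as dt grows, which is the speed-versus-fidelity trade-off described above.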
but also you know even if you can model
everything accurately then if you want
to carefully match the real world you
still need to get the parameters of that
simulation right and so how do you
measure things that are not directly
observable in your data like damping
inertia and friction
and you know the more accurate your
model is the more parameters it's going
to have and so the more of these things
you need to measure so that means that
you need more data in order to
accurately estimate them I'll talk a
little bit more on how to how to do this
later but it's not just physics
you know we can do a reasonably good job
now of photorealistic rendering of
sensors and so this is an example from a
movie a few years ago which is the
remake of The Jungle Book and I think
this is a really good example of super
super high quality rendered images but
if you look at how much effort went into
creating images like this it's like tens
of hours of artist effort per frame
right so getting sensor data that's this
high quality is very expensive and it's
really not a solved problem right and so
I think lidar is kind of one example of
something that people see as being
relatively easy to simulate but in fact
there are a lot of gaps between
how lidar simulators work and
real-world lidar data ok so there's so
there's always going to be some sort of
gap between your simulator and the data
that you get from the real world but
what's worse is that you know if there
is a gap
then your model will tend to exploit
it right so one reason for that is that
like one of my intuitions about neural
networks is that they're very lazy
all right and so if there's like some
artifact in your data distribution that
they can exploit they will exploit it
and so an example to illustrate this
point is the Virtual KITTI dataset and
so they essentially took each scene in a
self-driving car data set and
exhaustively reproduced it in simulation
they trained a model on both the real
data distribution and the simulated data
distribution and so even though this is
kind of like the best that you could
expect to do in terms of recreating your
real data distribution there's still a
big gap in the performance between
training on the simulated version and
the real version but so you might say
well maybe it's ok if we have some
errors because our robot should be
robust to errors that we make in
modeling right I think one challenge is
that errors tend to compound
for gaps between sim and real so what
we hope happens is you know if you have
some blue curve that's like the path
that you want the robot to follow and
the green curve is the path that it
actually follows where you kind of make
small mistakes along the way but those
mistakes are uncorrelated and so you're
able to kind of keep the robot on track
but what actually happens a lot in the
real world is that you have you know the
same path that you're trying to follow
but the robot gets off the path and it
gets so far off the path that it's out
of the data distribution that it's trained
on and it's not able to recover all
right any any questions about what I've
covered so far all right um the next
thing I want to talk about is like so
we've sort of established that
this sim-to-real problem like the
problem of using simulated data for
real-world tasks is hard so the next
thing I want to address is like why why
should we do this at all maybe we can
take advantage of simulation without
needing to train robots on it and
there's a couple of ways that you can do
that one is you know simulation is great
for prototyping your algorithms
simulation is also really good for
debugging your specific implementation
and making sure that you know you have
sort of bug free code before running it
on a robot prototyping entire systems
and then testing so for prototyping
algorithms I think this is like really
common in reinforcement learning for
example where people will if they're
trying to come up with a better
reinforcement learning algorithm they'll
you know almost always run that in
simulation before it ever makes it to a
real robot and you know the reason for
that is you want to you're going to have
to do a lot of cycles since you want to
make sure you're using those cycles
efficiently it's also useful for
debugging your software and so typically
this is done in tools like Gazebo and
ROS that are very similar to the
software that's actually going to be
running on the robot and so what you do
here is you actually implement your
entire stack that you want to run on
your robot and then you make sure that
with realistic latency and all the
sort of bugs in your ROS stack
you're able to get things to work in
simulation first before
deploying them on the robot
another use case that you see a lot in
industrial robotics is for prototyping
entire systems right so you know for
example like if you have some task that
you want your industrial robot to solve
then you need to figure out what robot
you're going to use and you need to once
you figure that out you want to like
kind of make sure that it's going to be
able to solve the task before you go and
buy it and like invest the effort into
installing it and then you often need to
design the entire cell itself so like
the entire workflow that the robot is
going to be part of and you can
prototype things like that much faster
in simulation and then finally you can
kind of test how long things are going
to take and make some sort of ROI
calculation before you decide to invest
in expensive robots one other kind of
use case that I'm actually really
excited about that I want to mention is
for reliability testing and/or like
continuous integration for robotic
software development and so you know
that I think the question is like say
you're developing a self-driving car and
you make some change to your to your
vision model right and so how do you
make sure that that change to your
vision model is not actually going to
degrade performance in the real world so
the most straightforward way to do that
is you can run tests against your log
data so you can like look at all the
sensor data that your robot has seen
before and then you can look at your
model against that and make sure that
the errors it makes aren't too big but
the challenges are that like log data
is itself incomplete right it's a
noisy observation of the world you don't
have a full state information and you
know it's partially observed and then
importantly like log data is also static
right so really what you want to do is
you want to make sure that your entire
control system still behaves well when
you make a change to your vision system
but if you're looking at log data then
you know you you can't like you can't
explore what happens when your robot's
behavior itself changes you can only
look at sort of what's happening in the
current time step and so a lot of
self-driving car companies have invested
pretty heavily in this there's an
article that came out about Waymo's
simulation testing setup that I
recommend checking out and you know
they've run like several
orders of magnitude more
tests in simulation than they have on
real cars another example of this that I
really like is the approach that Russ
Tedrake's group is taking at Toyota
Research Institute and so they call it
simulation first robotic development and
basically what this means is that they
have a bunch of tests that run in
simulation every night and so they want
to make sure that like any changes to
the code base that they push during the
day they want to know how those affect
the behavior of the robot in the
simulation and so the keys that
they've mentioned that have been
important for this technique being
successful are making sure that the
simulation is harder than the real
environment that they're trying to solve
being rigorous about sources of
randomness so you know knowing that if
you have a degradation in
performance it's not just because you
got unlucky with the random seed and
then manually going through the errors
that your model makes to find kind of
sources of bugs so like looking you know
if you had a degradation
in performance overnight then like
actually going and looking at each of
those cases and saying like ok why did
we make this mistake
and then lastly good contact simulation
is important for them ok the next thing
I want to talk about is like if like
let's say that we decide to use
simulation to train our robots or just
to do testing then how can we actually
go about building a good simulation but
I'll pause there just to see if there
were any questions on the past section
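
The nightly simulation testing practice described in this section (fixed sources of randomness, then a manual review of the failing cases) can be sketched roughly as follows. This is only a toy under stated assumptions: `run_episode` is a hypothetical stand-in for a full simulated scenario, the one-dimensional "robot", the gains, and the tolerance are invented for illustration, and nothing here reflects the actual test infrastructure of any company or lab mentioned in the lecture.

```python
import random


def run_episode(controller_gain, seed):
    """Hypothetical stand-in for one simulated test scenario.

    A 1-D 'robot' tries to reach a random target under actuation noise.
    Returns True if it ends within tolerance of the target.
    """
    rng = random.Random(seed)  # every random draw flows from this seed
    target = rng.uniform(-1.0, 1.0)
    position = 0.0
    for _ in range(20):
        noise = rng.gauss(0.0, 0.02)
        position += controller_gain * (target - position) + noise
    return abs(target - position) < 0.1


def evaluate(controller_gain, seeds):
    """Run every seeded scenario; return the success rate and failing seeds."""
    failures = [s for s in seeds if not run_episode(controller_gain, s)]
    return 1.0 - len(failures) / len(seeds), failures


# The same scenarios are replayed on every run, so a change in the success
# rate reflects a change in the code rather than luck with the random seed.
SEEDS = list(range(200))

if __name__ == "__main__":
    baseline_rate, _ = evaluate(0.5, SEEDS)
    candidate_rate, candidate_failures = evaluate(0.05, SEEDS)
    print(f"baseline success rate:  {baseline_rate:.2f}")
    print(f"candidate success rate: {candidate_rate:.2f}")
    if candidate_rate < baseline_rate:
        # These seeds can be replayed one by one to inspect each failure.
        print("seeds to review manually:", candidate_failures[:10])
```

Because each scenario is keyed by its seed, any failure can be reproduced exactly, which is what makes the manual review step practical.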
so my question is do you know if most
self-driving car companies just use
simulation for testing or do they also
train on that data yeah I think
self-driving car companies are sort of
interested and curious about training on
simulated data but I think
from the ones that I've talked to
they have tried doing this but I don't
think it's a widespread practice right
now you quoted earlier about like make
simulation harder than reality
yeah like expand on that how can you
actually make simulation like a more
rigorous than reality that sounds a
little bit like counterintuitive sure
yeah it's a good question so the so like
one way to think about it is let's say
that you're you're training a robot that
you want to be able to grasp objects
right and so you have some properties
that you know about the objects that you
want to be able to grasp like maybe
they're all they're all objects that you
would see in a kitchen and they're all
between this size and this size and you
know it's always going to be well lit
like lighting conditions are always
going to be good when you're trying to
grasp the objects one thing that you
might think about when designing the
simulator is like take your worst-case
estimate of all those things and just
make sure that your simulator is like
sort of biased towards that so make sure
that you're giving the robot lots of the
hardest objects for it to grasp in the
simulation rather than you know like if
you only see those hard objects 1% of
the time in the real world maybe you see
it more of the time in the simulated
world all right so I think the process
that people typically go through when
they're designing a simulator is you
know first you kind of build the model
of the world and then you create
scenarios so the first part is about
like designing the physics and making
sure that you have an accurate kind of
model of your robot then the second part
is about creating the scenarios that the
robot is going to interact with so like
which roads is it going to need to drive
on or which objects is it going to need
to pick up
and then finally typically what you'll
do is you'll collect a bunch of data
from the real world you use that data to
do system ID which is a process of
improving your simulation so just really
briefly on designing the simulation
model I'll just refer you back to
Peter's lecture earlier in this class in
practice what most people do is they
don't build their simulator from scratch
they just pick Bullet or PyBullet or
MuJoCo and then use the models that are
provided for them by the developer of
their robot a couple of other simulators
that are worth looking at Drake from
tetrax group again at MIT which is I
think pushing a little bit more towards
trying to make more realistic simulation
at the expense of it being maybe a
little bit slower and then Gazebo which
was the most popular simulator
for a while in
robotics and has since fallen out of
favor but if you're doing a lot of stuff
with ROS it's still worth exploring the
next thing you need to do is create your
scenarios for the robot and so this is
kind of the process of like designing
the world that the robot is going to
interact with and so I think kind of one
of the main questions here is like where
do we actually get 3d models for things
right so if the robot needs to interact
with objects where do we find examples of
objects that the robot can interact with
in the simulator there's a spectrum of
different options that sort of have
trade-offs in terms of the quality of
the objects and the number of objects
ShapeNet is the one that's freely
available that I think has the most
objects and so it's like in the high
tens of thousands but the quality of the
objects themselves tends to vary YCB is
sort of at the other end of the spectrum
very very high quality object models but
Dex-Net is a dataset from
Jeff Mahler at Berkeley and it's
actually a combination of other data
sets and this is I think at a
pretty good trade-off point between
quality of the models themselves and the
number of models that you have access to
a couple of other things to be aware of
there are like all these sort of 3d
model repositories that you might see if
you've ever done like game development
these are worth checking out generally
you can't get things for free from them
and then lastly procedural object
generation and I'll talk a little bit
more about that in a second and so then
you know the next question is like we
have our database of objects that we
want the robot to be able to interact
with like many hammers and cups and
whatever it is that we need our robot to
do and so then the next question is like
how do we place those into the world in
a coherent way you could just try
placing them randomly but what tends to
happen if you do that is that all the
objects will sort of collide with each
other and they'll be in very unrealistic
configurations you can place them
randomly according to physics so maybe
you just have like a box that the
objects are sitting in and you might
drop them from above the box before the
scene starts so that they're
at least placed in a way that's
consistent with physics or you might do
it procedurally so I mentioned
procedural content generation in the
context of object modeling and also
world design and so this is kind of a
pretty big and well explored area in
game design I won't really go into it
but there's a book that I recommend if
all right and so lastly you have you
know you've built your simulator you
have collected a bunch of scenarios that
you can put in your simulator that you
want your robot to perform well on and
so then the next thing to do is like
you've made guesses at all of your
physics parameters when you've been
modeling the world and so the
next thing you want to do is like
actually collect a bunch of
data and use that data to try to make
the simulator a better match for reality
and so this is the process of system ID
so what is the problem that we're trying
to solve with system ID well we have
some like parameters of our
simulation so these are things like
friction damping you know mass of
different links of the robot and then we
have some sequence of actions that we
want the robot to follow and so the goal
of system ID is to try to find the set
of parameter values that give you the
sort of lowest loss the lowest
difference between what the robot does
when you execute those actions
in the simulator and
what it does when you execute those
actions in reality alright so there are
a few design choices here
one is like how do you actually choose
this sequence of actions so like what do
you want to actually run on the robot in
order to like in order to kind of
minimize the difference between sim and
real and then another is like what
distance function do you want to use so
how do you measure if
these two trajectories are close to one
another and so I'm going to just give a
quick sort of case study of how this
works for one problem which was doing
system ID for the Shadow Hand in some
of the OpenAI robotics experiments so
in this case they chose trajectories
that consisted of kind of individually
moving each joint to its limits and then
moving each finger individually along
like kind of spline curves to try to
capture the interdependencies between
the joints and then the distance
function that they used is you
know they applied the same sequence of
actions both in Sim and real and then
they looked at where the robot was one
second later and then they took the
difference between those those states
and tried to minimize that distance and
then finally the optimization algorithm
that they used to do this
minimization was sort of iterating
between the coordinates and doing
coordinate descent until all things converged
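
To make the system ID recipe above concrete, here is a minimal sketch on a toy problem. It is not the procedure used for the Shadow Hand experiments; it only mirrors the three ingredients described above: a set of action sequences, a distance between the simulated and the "real" state one second later, and a coordinate-wise search over the simulator parameters. The one-dimensional joint model, the parameter names `mass` and `damping`, the step-halving search, and the use of a second simulator run as a stand-in for real robot data are all assumptions made for illustration.

```python
def rollout(params, actions, dt=0.01):
    """Simulate a 1-D joint obeying mass * acc = force - damping * vel.

    Returns the final (position, velocity) after applying all actions.
    """
    mass, damping = params
    pos, vel = 0.0, 0.0
    for force in actions:
        acc = (force - damping * vel) / mass
        vel += dt * acc
        pos += dt * vel
    return pos, vel


def sim_to_real_loss(params, action_sequences, real_outcomes):
    """Squared distance between simulated and real end states, summed."""
    total = 0.0
    for actions, (real_pos, real_vel) in zip(action_sequences, real_outcomes):
        sim_pos, sim_vel = rollout(params, actions)
        total += (sim_pos - real_pos) * (sim_pos - real_pos)
        total += (sim_vel - real_vel) * (sim_vel - real_vel)
    return total


def coordinate_descent(action_sequences, real_outcomes, init,
                       step=0.5, tol=1e-4):
    """Adjust one parameter at a time; halve the step when nothing helps."""
    params = list(init)
    best = sim_to_real_loss(params, action_sequences, real_outcomes)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                if trial[i] < 0.05:  # assumed lower bound, keeps values physical
                    continue
                trial_loss = sim_to_real_loss(trial, action_sequences,
                                              real_outcomes)
                if trial_loss < best:
                    params, best, improved = trial, trial_loss, True
        if not improved:
            step /= 2.0
    return params, best


# Each sequence is 100 steps of dt = 0.01, so outcomes are one second later.
ACTION_SEQUENCES = [
    [1.0] * 100,               # push for the full second
    [1.0] * 50 + [0.0] * 50,   # push for half a second, then coast
]
# Stand-in for the real robot: the same model with hidden "true" parameters.
TRUE_PARAMS = (1.0, 2.0)
REAL_OUTCOMES = [rollout(TRUE_PARAMS, a) for a in ACTION_SEQUENCES]

if __name__ == "__main__":
    fitted, final_loss = coordinate_descent(
        ACTION_SEQUENCES, REAL_OUTCOMES, init=(2.0, 0.5))
    print(f"fitted mass={fitted[0]:.3f}, damping={fitted[1]:.3f}, "
          f"loss={final_loss:.2e}")
```

On a real robot the outcomes would come from logged trajectories rather than from a second simulator run, and the choice of action sequences matters: the push-then-coast sequence is included here because it makes the two parameters easier to tell apart than a constant push alone.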
okay so you've you know you've designed
your simulator and you've tried to make
it as good a fit to reality as you can
but you know as I alluded to earlier
there's still gonna be gaps all right so
your simulator is still not really going
to be a perfect match for reality and so
kind of the rest of the talk is about
what to do about that but before I dive
have you ever seen a simulator that has
good like other agents as well like
simulation of like other cars in the
scene or maybe things that the robot is
gonna interact with yeah it's a it's a
really good question I think so a couple
of ways that I've seen people deal
with this one is if you're if you can
incorporate the learning of the other
agent into the actual optimization
process so if you're in like a multi
agent reinforcement learning setting
that's kind of one way that people deal
with this but in general this is a
really hard problem and this is like
when you talk to self-driving car
companies about their problems with
simulation one of the biggest ones that
they cite is not having good models of
like how pedestrians are going to behave
or how other drivers are going to behave
and so I think you know if you can
figure out an answer to that question
then it's going to be really really
valuable in that industry yeah for the
data collection in the previous slide
how about running an optimization over
the data collection mm-hmm so you're
saying also figuring out trying to
optimize which data set to collect in
order to minimize yeah that's really
interesting I've I'm sure that there are
examples of people doing that I've not
yeah do you have a sense for which types
of tasks as possible to make a good
simulation forum which types of tasks is
just like near impossible mm-hmm so I'll
give a couple of categories of things
that are maybe harder to make a
simulation for I think anything where
it's so anything where it like
explicitly violates the assumptions that
most simulators make so if you have non
rigid bodies like if you need to do
cloth simulation for example you can do
that but it's it tends to be harder so
that's kind of one category things like
maybe if your robot needs to interact
with fluids that might also be really
difficult and then another category of
things is just where you have or the
like set of things that you need your
robot to be able to solve is just really
wide so if for example you're trying to
make a simulator for all of self-driving
cars then it's like how can you possibly
get enough variety in your simulator to
capture all the different scenarios that
well it's really recording I think yeah
how do you think about like the model
class that you should have here for
example like you could you can imagine
using a neural network but like in
between all the data points that you
observe its arbitrarily bad so how do
you how do you what is your process for
selecting the appropriate model for
the dynamics itself yeah I think I
think typically the thing that you want
to do is like so neural networks can
like basically learn any function right
or they can learn any function if you
have enough data but the challenge with
neural nets is that they don't really
generalize that well to data that's out
of the distribution that they've seen
before and so I think like one of the
reasons why simulated training data
works so well is because we like we as
humans actually know things about the
world right it's not like the world is
just a black box that spits data out at
us like we understand at least at a
simplified level how physics works and so
I think one of the one of the reasons
why this approach has actually been
really successful is because if you
build a physics model of the world and
then use that to generate data then even
if it's not perfect we're exploiting
some of our knowledge about how the real
world works so I guess my answer is I
would I would suggest like trying to use
like physically based models and yeah
and I think Pieter's lecture on that from
a couple weeks ago is a there's a good
all right so addressing the gap between
the simulator and the real world the
first class of techniques that I want to
talk about is domain adaptation so this
is sort of a broad topic in machine
learning and I don't really have time to
do it justice but what I do want to do
is just give a few examples of how
domain adaptation techniques have been
applied for sim to real problems in
robotics and so I would kind of
categorize domain adaptation techniques
into two buckets one is supervised
domain adaptation where you know like
let's say that you assume that you're
able to get labels or reward signal in
the real world and then the other is
unsupervised or weakly supervised domain
adaptation where you know your
assumption is that we have labels and
rewards in the simulator but in the real
world we don't we just have like only
unlabeled sensor data so the first
category is supervised domain adaptation
and so you know for those of you that
have trained things like
convolutional neural networks for
example the simplest form of supervised
domain adaptation is just fine-tuning
right so you train on some source data
distribution and then you kind of take
the weights that you get from training
on that source data distribution and
then you just retrain them a little bit
on data from your target data
distribution so from the real world in
this example and so this this can work
quite well in robotics and it's present
in a lot of papers but it's kind of
rarely the focus of people's effort when
they're doing research in this area one
kind of extension of fine-tuning is this
idea of progressive networks and so you
know one of the challenges with fine
tuning is that when you fine-tune a
model that's trained on one data set to
another data set it tends to forget what
it learns in the first data set and so
progressive networks are kind of a way
to try to address that where instead of
fine tuning the same network they
instead like add some additional layers
to the network and then those are what
they train on the second data set
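A toy sketch of the contrast just described, under loud assumptions: the "network" is a two-parameter linear model, the "simulator" data follows y = 2x, and the "real" data follows y = 2x + 0.5. None of these numbers come from the lecture, and the second variant is only in the spirit of progressive networks (frozen source model plus a new trainable piece), not the published architecture.

```python
import random

def sgd_fit(params, features, data, lr=0.05, epochs=200):
    """Fit y ~ dot(params, features(x)) with plain per-sample gradient steps."""
    params = list(params)
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            err = sum(p * fi for p, fi in zip(params, f)) - y
            params = [p - lr * err * fi for p, fi in zip(params, f)]
    return params

def predict(params, features, x):
    return sum(p * fi for p, fi in zip(params, features(x)))

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
sim_data = [(x, 2.0 * x) for x in xs]              # hypothetical source (simulator) data
real_data = [(x, 2.0 * x + 0.5) for x in xs[:10]]  # hypothetical small target (real) data

def base_features(x):
    return [x, 1.0]

# train on the source distribution
sim_w = sgd_fit([0.0, 0.0], base_features, sim_data)

# fine-tuning: start from the source weights and keep updating all of them
finetuned_w = sgd_fit(sim_w, base_features, real_data)

# progressive-style: leave sim_w frozen, train only new parameters that
# take the frozen model's output as their input
def new_column_features(x):
    return [predict(sim_w, base_features, x), 1.0]

new_w = sgd_fit([0.0, 0.0], new_column_features, real_data)
```

Both variants end up fitting the real data, but after fine-tuning the single set of weights no longer reproduces the simulator data, while in the second variant `sim_w` is untouched and still does.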
another approach that people have tried
here is what I would call like learning
inverse dynamics and so inverse dynamics
is basically you you have kind of the
current state of the world and then you
have some goal state that you want to
get to and the learning problem you
solve is learning what action will take
you from from this state to the next
state and so there's a couple of
different variations of this technique
that people have tried another idea in
this category is using simulation to
find a low dimensional search space so
one thing that you can do is like you
know one of the reasons why it's slow to
learn policies or models in the real
world is because you're searching over
this really high dimensional space which
is like all possible policies but if you
train in simulation and you use that to
find like a sub manifold of of that huge
space and then search over that in the
real world then that could make learning
more efficient and then the final
category that I wanna just quickly
mention here is using simulation
explicitly as a Bayesian prior for for
your learning in the real world and you
know there's there's quite a bit of
research in this area and I think this
category is actually particularly
exciting for kind of ongoing research
yes yeah so the idea in this category of
of techniques is like so you train a
model in simulation and the goal of that
model in simulation is to tell the robot
what states it's trying to reach at any
given time point and so that model might
say like alright here's a trajectory
that I you know given the state of the
world I see now here's the trajectory
that I want to follow but you know the
challenge is that if you if you just
apply the actions that you took in the
simulator then that won't actually allow
you to follow the same trajectory in the
real world and so what you're doing here
is you're kind of like you're you're
taking the output of that simulated
model which says like alright since I'm
in this state I need to go to this state
next and then what you're learning on
real data is the function that allows
you to get from this state to the next
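A minimal sketch of that inverse dynamics idea, assuming a made-up one-dimensional system where an action moves the state by a gain that differs between simulator and reality. The gains, the hard-coded `sim_plan`, and the linear form of the inverse model are all illustrative assumptions rather than anything from the papers mentioned.

```python
import random

K_SIM, K_REAL = 1.0, 0.5  # hypothetical gains: state change per unit of action

def real_step(s, a):
    """Stand-in for the physical robot."""
    return s + K_REAL * a

def sim_plan(s, goal):
    """Stand-in for the model trained in simulation: the state to reach next."""
    return s + 0.5 * (goal - s)

# collect real transitions with random actions, then fit a = c * (s_next - s)
rng = random.Random(0)
transitions = []
s = 0.0
for _ in range(100):
    a = rng.uniform(-1.0, 1.0)
    s_next = real_step(s, a)
    transitions.append((s, a, s_next))
    s = s_next
num = sum(a * (sn - s0) for s0, a, sn in transitions)
den = sum((sn - s0) ** 2 for s0, a, sn in transitions)
c = num / den  # least-squares slope, close to 1 / K_REAL

def inverse_dynamics(s, s_target):
    """Learned on real data: which action takes us from s to s_target."""
    return c * (s_target - s)

def rollout(policy, steps=5, goal=1.0):
    s = 0.0
    for _ in range(steps):
        s = real_step(s, policy(s, goal))
    return s

# baseline: apply the action that would have worked in the simulator
def use_sim_actions(s, goal):
    return (sim_plan(s, goal) - s) / K_SIM

# inverse dynamics: keep the simulator's target states, learn actions on real data
def use_sim_states(s, goal):
    return inverse_dynamics(s, sim_plan(s, goal))

print(rollout(use_sim_actions), rollout(use_sim_states))
```

With these toy numbers the second rollout tracks the planned states and gets closer to the goal in the same number of steps, because only the state-to-action mapping had to be corrected.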
all right so there's also kind of less
supervised domain adaptation one
category is weakly supervised where you
take your model's predictions and you treat
those as noisy labels for fine tuning
there's self-supervised domain adaptation
so if you can create a system that
allows the robot to do things that
automatically allow it to label the data
so if you know for example that if you have a
sensor that tells you that this object
has moved then that might tell you well
okay if the object has moved above a certain
height then that means that our
attempt to grasp that object was
successful that's kind of this category
of things and then lastly is
unsupervised domain adaptation and so I
think kind of the most exciting recent
advance in this is taking image to
image translation models and applying
those to domain adaptation and so what I
mean by that is you might have some data
from your simulator that's unrealistic
and then you might also have some data
from the real world but the data from
the real world doesn't have labels and
so what you can do is you can learn a
function that maps your simulated data
into the real world and tries to match
the data distribution from the real
world and so the idea is like you're
kind of translating the image from the
simulated domain into the real domain
and if you can do that successfully then
what that allows you to do is
train on data that's instead of
just your simulated data it's the
translated data so you can train on data
that looks like this and then the hope
is that when you go to the real world
the it's close enough that that things
all right any questions about domain
adaptation though it's kind of like a
quick tour it's a very deep topic but I
all right the next topic I want to talk
about is domain randomization and so the
idea here is you know in a lot of in
sort of the techniques that we've talked
about so far for sim to real transfer
the assumption has been like let's try
to model the real world as closely as we
can in the simulator and if we get it
close enough then like maybe that data
will be more useful in the real world or
maybe at least though it will allow us
to kind of adapt between that data and
the real world data the idea of domain
randomization is a little bit different
which is instead of trying to find a
single best simulator let's just make
the simulator as varied as possible and
you know maybe the hypothesis is that
like maybe if the if the model sees
enough simulated variation so it sees
enough kind of different simulated
worlds then when it does get into the
real world
it'll have learned sort of a general
enough strategy because it sees so much
variety that it'll be able to figure out
what's happening in the real world so
this is kind of a core idea and what I
want to cover on domain randomization
first I want to give like kind of a
little bit of a history of the idea
because I think it's it's important to
kind of know where this idea came from
and it's not a new idea then I want to
talk about some of the applications that
people have used it for then I want to
try to give you a little bit of an
intuition as to why it works right
because it's kind of a counterintuitive
thing right
why should training on a lot of really
unrealistic data allow us to generalize
to realistic data then I want to
briefly mention a few tools if you want
to use this in practice that you can
that you can go try and then finally I
want to talk about some extensions that
people have made to this core idea and
sort of how this research field is
evolving all right starting with the
history I think you know so so again
this is like the this idea of using
really noisy simulators is not new in
robotics and the first instance that I
know of is from this paper called the
radical envelope of noise hypothesis
from 1997 and the idea here is if you're
trying to solve a task like you know you
have a robot driving down a hallway and it
needs to decide whether to turn
left or right depending on whether it
gets a flash of light from the left or
the right how do you build a simulator
to solve this task well the the insight
of this paper was to say there's you
know some things that we really need to
model carefully in order to actually
solve the problem at all and so that's
that's what we call like the the base
set of things in the simulator right so
is the light coming from the left or is
it coming from the right how long is the
hallway and things like that and then
there's a bunch of other things that
that we need to model in our simulator
but that are sort of inconsequential for
solving the task so like you know what
is the what is the friction model
between the wheels and the hallway and
so that the insight here is we want to
take the base set and model those things
as carefully as possible and then take
everything else and maximally randomize
it and they were able to solve this task
using sort of a very simple simulator by
using this technique in the deep
learning world the first example of this
that I've seen is from this like
very underrated paper called live
repetition counting and the idea of this
paper is you know they wanted to train a
model that could count when people are
doing cyclical behavior so when they're
doing push-ups or jumping jacks or
something like that but they didn't
really want to like go through all the
effort of labeling data of people doing
that so what they did instead is they
created the synthetic training data set
that consisted of kind of random white
noise in the background and then
cyclical periodic noise in the
foreground and the really surprising
result from this paper was that when
they trained a model on data that looked
like this random noise and then tested
on real data they were able to actually
solve the task right they're able to
count how many times people were doing
jumping jacks the first application of
this sort of concept in robotics in sort
of deep learning for robotics that I'm
aware of is this paper called CAD2RL
from Sergey Levine's group here at
Berkeley and the the task that they were
trying to solve here is driving a
quadcopter down a hallway and making
sure that it doesn't hit the walls and so
they built this simulator that
was randomized with these different sort of
semi-realistic textures and floor plans
that they that they designed and what
they found was that when they trained on
the simulator they were able to fly a
quadcopter down a real hallway and not
crash at least reasonably frequently and
then I think you know the last two
papers that I mentioned were kind of the
inspiration for for us to start working
on this and the the core thing that we
wanted to try to figure out is whether
we could apply this idea to sort of more
precise tasks in robotics so to
grasping something where you need to be
able to position the gripper really
carefully and we were also curious to
see whether you know whether we could
get away without needing to design floor
plans and textures ourselves if instead
we could just procedurally generate
those in a really unrealistic way and
then finally we were curious if we really
needed to pre train these models on
imagenet in order for this to work or
whether we could just train them only on
synthetic data all right so the next
thing I'm going to talk about is some of
the ways that people have applied this idea
and the first is kind of the problem
that I just mentioned which is using
domain randomization for computer vision
and in particular using it to
estimate the pose of a particular object
in a scene and so what we did here is we
for each scene so for each like image
that the the model saw we gave it a
unique set of randomizations so we
randomized things like textures and
materials colors of the background and
things like that we changed the
positions of the cameras we changed the
lighting and we added a bunch of other
objects to the scene that were sort of
trying to distract the model from the
object that it ultimately cares about we
trained a relatively simple neural
network so this is kind of just a vgg
with the top two fully connected layers
popped off then smaller ones put on top
and the model is taking an image of a
scene and regressing it to just the XY
and z coordinates of a particular object
that we care about in that scene so how
well does it work you know this is sort
of an unfair comparison because you know
all these papers use different objects
and different distances for
the camera and so on but kind of at a
high level were sort of within what
you'd expect to be able to do with
relatively state-of-the-art pose
estimation techniques from a single
monocular camera
training entirely on synthetic data and
so here's what this work looks like when
you deploy it on a robot to grasp an
all right well we'll see if we can get
oh there we go okay yeah so this is a
kind of an extension of the
original paper where we were you know it
was April Fool's Day and we wanted to
like see if we could train a robot that
could detect like spam in the real world
so we built the spam
detecting robot and it would pick up the
spam off the table and drop it in the
trash can the other the next thing that
we applied this to at OpenAI was block
stacking and so the goal here was we had
trained a policy that could do block
stacking in simulation using one-shot
imitation learning right so see a single
demonstration of a human doing the task
in virtual reality and then apply it
from different initial conditions but
the challenge was in order to deploy
this in the real world we needed to know
where the blocks were and so we we
trained a similar model on a really similar
data set and we were much more careful
about sort of calibrating cameras and
stuff like that and we were able to get
really precise localizations of the
objects so that you could actually stack
six blocks on top of each other using
you know vision models trained
entirely on synthetic data so how does
it work a few observations one is you
know one of the really important things
is just using a lot of data so as you
increase the number of the amount of
training data on the x-axis then the the
error goes down at least until you get
to around fifty or a hundred thousand
images you might ask like what's what's
the important part of having more data
is it just having more training examples
so we tested whether we could get the
same results with the same number of
images but with fewer unique textures
and it turns out that the important
thing here was you need to have as many
unique textures as you can so as if as
you increase the number of unique
textures the error also goes down and
then lastly and this is kind of
surprising to us was to find that
you know pre training our model on
imagenet is actually not necessary and
so you can see like pre training on
imagenet actually does help right like
if you are in the low data regime then
the model pre trained on imagenet is
still able to do something reasonable
but as you get
enough data then pre-training
becomes unnecessary alright so here
here's just a few other kind of
highlights of results that people have
had using using this technique or
extensions of this technique to solve
other kind of perception problems in robotics
so people have extended it to estimating
you know not just where the object is on
the table but the full sort of 6d pose
so the position and orientation of the
object people have extended it to doing
objects with really challenging textures
and so this is a paper where they use
domain randomization to train a
perception model for a system that was
grasping fish out of a bucket not sure
why they picked that task but it's it's
a really challenging one for from a
perception standpoint because fish are
like kind of shiny and reflective and
and difficult to model those textures
and then people have also extended to
you know instead of just localizing
where a single object is with a single
network instead localizing kind of an
entire corpus of objects a few others I
want to highlight people have used these
techniques for object detection for
autonomous vehicles for face tracking so
taking a simulated model of your face
randomizing it training a model on that
to sort of tell the pose of your face
from a single camera image localizing a
robot within a lung so you know your
your if you have like if you're driving
a robot around a lung and you want to
decide whether to turn left or turn
right you need to know where in the lung
you are and so if you have a vision
model that allows you to sort of take
the image that the robot sees and
then map it back to where on the map of
the lung you are sort of end-to-end
control so instead of just training a
pose estimator you can also train a
policy that takes images directly and
outputs commands to the robot and so
people have people have done that with
domain randomized data and then also
cloth manipulation so estimating kind of
the state of a cloth so that a robot can
fold the corners together so also you
know tasks where there's
non-rigid objects this technique also
works for other types of sensors so some
of Jeff Mahler's work here at Berkeley
on Dex-Net which is a sort of work
that does a really good job of grasping
generic objects there their models are
trained the inputs to their models are
images from at least in this version of
the work are images from a depth camera
and you can apply a similar set of
techniques by like adding a lot of
random noise to the depth image and then
you can train on synthetic depth images
and generalize to real depth images in
the real world a couple of assumptions
that the results I've shown you so far
have in common the first is that you
actually have 3d models of kind of all
the objects that you want to track and
so you know one thing that you might ask
is how can we how can we move from this
right so how can we move from needing 3d
models of every object that we care
about to being able to train a sort of
generic vision policy that can work for
any type of object so we explore this in
the context of grasping right so you
know in grasping like you care about
really being able to grasp any object
but as we talked about earlier it's
really hard to get good databases of
objects to train your model on and so we
ask the question well maybe you know
similarly to how you don't really need
realistic textures in order to train a
vision model maybe you also don't need
realistic objects to train a grasping
model and so we procedurally generated
these sort of highly unrealistic
objects like on the left and we trained
a policy to pick up those objects in
simulation based on you know based on
depth images of the objects and then we
tested on real-world objects and kind of
the surprising thing that we found out
was that we're able to actually
generalize to grasping realistic objects
in the real world from only training on
highly unrealistic procedurally
generated objects entirely in simulation
this is what it looks like so again it's
not perfect right this is not we're not
getting a hundred percent success here
but you know
generalized grasping is a very
difficult problem and the the
interesting result here is that we're
able to get something to work at all
using entirely simulated data another
assumption baked into the results I've
shown you so far is that sort of
dynamics are relatively consistent
between the simulator and the real world
so what if what if that's not true
right so what if your what it you know
what if there's some gap in physics
between your simulator and the real world
and so similar set of ideas also applies
to randomizing dynamics so the way this
typically works is you know in standard
reinforcement learning you'll train like
a feed-forward neural network policy on
a single best version of the environment
and then you'll execute that on your
test environment but what these
techniques do is instead they train a
recurrent neural network so a neural
network that has some state and they'll
train that on a variety of different
physical environments and the idea here
is that the the memory of the neural
network in principle should allow the
neural network to kind of figure out
which version of the simulation it's in
and adapt to that simulation so the
first set of results here was from Jason
Peng during his internship at OpenAI
and they worked on kind of these tasks
that involved sort of sliding objects on
table so this is trained entirely in
simulation with their recurrent
network and then generalizes to
the real world this has also been
extended to more challenging tasks so
this is a result from OpenAI about a
little more than a year ago where they
we trained a robotic hand high
dimensional robotic hand to sort of
reorient it and manipulate objects in
hand so it's a very like contact rich
and challenging task and this is trained
you know more or less exactly the same
way using randomized physics
parameters in a little bit more detail
you know the way this worked is there
were a bunch of different sort of
variations of the environment and robots
were trained in those environments using
reinforcement learning and there were
the the recurrent policies were forced
to adapt to a wide range of different
physics environments and then in the
real world there was also a sort of a
state estimation module that was trained
in a similar way so trained on vision
data from the simulator and then
deployed deployed in order to like
estimate the state so that the policy
could know what to do next and so the
things that are typically randomized
here and that were randomized in this
paper are things like physical
parameters but then also just sort of
correlated and uncorrelated noise being
added to the simulation sensor dropout
so occasionally just assuming that the
sensor fails how long the physics time
how long DT is in the physics simulation
there's a model of backlash that's
applied to it and there are random forces
that are applied to the object as well
so there's like quite a bit of effort
that went into figuring out what are
really all the things that we need to
randomize in order to make something
like this transfer all right any
questions about kind of like the high
level idea here sort of where people have
applied it to where it works where it
doesn't before I move on to talking
about my intuition about why this
actually works yeah behind you
so when you see sim-to-real not working
how can you tell what the failure mode
was like was it the dynamics that was
different was it the pose estimation
that was different
yeah it's a great question and I think
this is sort of one of the core things
that makes that makes using these
techniques still difficult there's a few
things that you can do to make it easier
I think in a lot of the OpenAI
results the sort of approach that we
took was to separate perception and
control so we'll have one neural network
that's looking at raw sensor data like
images and then it's trying to output
what it thinks the state of the world is
given those images and then we'll have
another module that says alright given
that I know the state of the world let
me try to predict what I should do next
and so separating those two things
allows you to audit it a little bit more
easily because then you can look at the
the errors that the pose estimation
module like the state estimator is doing
specifically and isolate that as a
source of error but in general like if
you you know if you're if you're
deploying a model in the real world and
it fails right it's
like it's still a very hard problem to
go back and see like okay what like did
this fail because you know there was
there's like way more friction in this
scenario than we modelled or you know is it something else
and so I think there's kind of a lot of
intuition and engineering that
goes into this still I also have one
more question yeah so how much domain
knowledge is required when you're
applying these techniques to one robot
versus then trying it for maybe a
slightly different application is there
a lot of fine-tuning with each
individual robot and each individual
application or do you think you're
getting closer to a general strategy
yeah it's a little bit hard to
answer that definitively because this
has really only been tried on to my
knowledge like a pretty small number of
robots there's the Fetch robot that
I showed you for the grasping examples
and the Shadow Hand and I think
there's other examples too but those are
the two that I'm most familiar with I
think the thing that gives me hope that
maybe we're starting to
figure out the limits of the parameters
that we need to randomize and things
like that is that the OpenAI team
was able to get the in hand
block manipulation result to work with
other objects like different shaped
objects with sort of relatively little
additional effort on top of that so
that's one of the pieces of evidence that I can
point to that says you know maybe once
we figure these things out once it's
easier to expand them to other types of problems question behind you so for
doing the physics parameter
randomization can you skip the
system ID stuff ah great question in
principle you would hope right that like
if we're randomizing physics then the
whole point is that like we're trying to
make our simulator so much more
diverse than the real world that you
know it doesn't matter that it's not
exactly the same and I think for vision
that's mostly true like you don't really
have to be super careful about
calibrating things for vision but
for dynamics that's decidedly not true
so it is still important like the
better your system ID is the more likely
this technique is to be successful
for dynamics randomization even if
you sort of have a big range of
randomization parameters and I'm not
sure why that's the case all right so the next topic I want to touch on is
the next topic I want to touch on is like why why does this work right it's
like why why does this work right it's kind of this mysterious thing you have
kind of this mysterious thing you have sort of all of its very training data
sort of all of its very training data and you train a model on it and then it
and you train a model on it and then it kind of just magically works on your
kind of just magically works on your real data even though the simulated data
real data even though the simulated data is super low fidelity and unrealistic so
is super low fidelity and unrealistic so there's a few so I think no one really
there's a few so I think no one really like has a great answer to this question
like has a great answer to this question there's a few intuitions that I have
there's a few intuitions that I have that I want to just sort of lay out for
that I want to just sort of lay out for you and I'll talk a little bit about
you and I'll talk a little bit about each of these so the first intuition
each of these so the first intuition that I have is that you know maybe the
that I have is that you know maybe the training data itself comes from like
training data itself comes from like some sort of covering distribution of
some sort of covering distribution of the real world data and so this
the real world data and so this intuition intuition says that like if
intuition intuition says that like if you have a none randomized simulator
you have a none randomized simulator maybe like this is sort of the
maybe like this is sort of the distribution of kind of like
distribution of kind of like environments and physics that you would
environments and physics that you would see but the real data is like this
see but the real data is like this complicated messy distribution of of
complicated messy distribution of of environments that's like much wider than
environments that's like much wider than your simulated distribution and so
your simulated distribution and so that's why you don't generalize well if
that's why you don't generalize well if you use a none randomized simulation and
you use a none randomized simulation and so what this intuition says is like well
so what this intuition says is like well maybe what we can do is we can just make
maybe what we can do is we can just make the range of randomizations so big in
the range of randomizations so big in simulation that like everything that we
simulation that like everything that we might see in the real world like
might see in the real world like lies somewhere in between different
lies somewhere in between different things that we've seen in rent when
things that we've seen in rent when we're randomizing and so like maybe the
we're randomizing and so like maybe the domain randomized data looks like this
domain randomized data looks like this it's like this massive distribution that
it's like this massive distribution that covers the real distribution there's so
covers the real distribution there's so I think this is kind of a flawed
I think this is kind of a flawed intuition there are a few things that I
intuition there are a few things that I think are useful about it one is the
think are useful about it one is the idea that wider distribution like you
idea that wider distribution like you know implication of this is that as you
know implication of this is that as you make the distribution of simulated
make the distribution of simulated parameters wider you should get better
parameters wider you should get better results and that tends to be true
results and that tends to be true another thing that I think is useful
another thing that I think is useful about this is that the concept that
about this is that the concept that we've already touched on in the in the K
we've already touched on in the in the K in the sort of instance of testing that
in the sort of instance of testing that you want your simulated task to be
you want your simulated task to be harder than your real task and then I
harder than your real task and then I think the last intuition that this the
think the last intuition that this the last sort of useful thing about this
last sort of useful thing about this intuition is that it's clear from this
intuition is that it's clear from this intuition that if you want your model to
intuition that if you want your model to perform well then you need kind of you
perform well then you need kind of you need to be able to perform well in all
need to be able to perform well in all parts of this like massive distribution
parts of this like massive distribution that you're training on right so it's
that you're training on right so it's not okay if like if you know if your
not okay if like if you know if your model performs really badly over here
model performs really badly over here right because it might be the case that
right because it might be the case that your real data sort of lies in that part
your real data sort of lies in that part of the region I think there are some
of the region I think there are some problems with this intuition so we're
problems with this intuition so we're mostly operating in a high dimensional
mostly operating in a high dimensional space and so you know we really should
space and so you know we really should need a like a really massive amount of
need a like a really massive amount of data to truly cover this real data
data to truly cover this real data distribution and then you know there are
distribution and then you know there are a lot of real-world effects that we
a lot of real-world effects that we might not model at all in our simulator
might not model at all in our simulator like backlash Gear backlash for example
like backlash Gear backlash for example or the specific distortion of the camera
or the specific distortion of the camera that you're using and so if we're not
that you're using and so if we're not modeling and effect at all is it like
modeling and effect at all is it like really reasonable to believe that the
really reasonable to believe that the impact of that effect will somehow be
impact of that effect will somehow be accounted for by the things that we are
accounted for by the things that we are randomizing another intuition that I
Another intuition that I think can be helpful in thinking about this is that domain randomization is a way of telling the model what it can ignore. The example I have here is: let's say that you have a data set that looks like this, and you train a neural network on it to predict whether the image comes from class 1 or class 2. If you train on this data distribution, then what your model is going to do is train a detector for blue owls on green backgrounds. Neural networks are kind of lazy, and they'll exploit any sort of commonality in the data that you give them. So if you want to instead train an owl detector, then what might work better is to use data like this. If you don't want the model to pick up on the fact that all of the owls in your data set are blue, then maybe you should just change the color of the owl every single time, so that you force that feature to be unreliable and the neural network can't exploit it in order to make its decision.
The last intuition I want to touch on is this idea of domain randomization as meta-learning. The high-level idea of meta-learning is that in a standard machine learning task you're trying to find some parameters that minimize some loss function on your data, but in meta-learning you assume that the data itself is not static. You're minimizing a loss over data that is sampled from some distribution over datasets. A concrete example here: suppose that you want to train a model that can, from a very small number of images, distinguish between two different classes. Your training examples in this paradigm are themselves data sets. You might have one data set that has cats and birds, and the model has to decide whether it's a cat or a bird, and then you might have another data set that has flowers and bikes, and the model has to decide for a new image whether it's a flower or a bike. Then at test time you'll be given some other data set that might have some different classes, maybe ones that you haven't seen before, and by just using this small amount of labeled data your model will need to take a new image, let's say of a dog, and correctly predict whether it's a dog or an otter.
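To make "your training examples are themselves data sets" concrete, here is a minimal sketch of sampling one few-shot task. The function name `sample_episode`, the toy `DATA` dictionary, and the string stand-ins for images are all hypothetical and only there for illustration.

```python
import random


def sample_episode(data_by_class, n_way=2, k_shot=2, rng=random):
    """Build one few-shot task: n_way classes, k_shot labeled examples each, plus one held-out query."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, held_out = [], []
    for label in classes:
        # Draw one extra example per class so the query never appears in the support set.
        examples = rng.sample(data_by_class[label], k_shot + 1)
        support += [(x, label) for x in examples[:k_shot]]
        held_out.append((examples[k_shot], label))
    query = rng.choice(held_out)
    return support, query


# Toy stand-ins for images (hypothetical identifiers, not a real data set).
DATA = {
    "cat": ["cat0", "cat1", "cat2"],
    "bird": ["bird0", "bird1", "bird2"],
    "flower": ["flower0", "flower1", "flower2"],
    "bike": ["bike0", "bike1", "bike2"],
}

if __name__ == "__main__":
    support, query = sample_episode(DATA, rng=random.Random(0))
    print("support:", support)
    print("query:", query)
```

A meta-learner would be trained over many such episodes, and at test time it would be handed an episode built from classes it has not seen.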
This idea has also been applied to reinforcement learning. In this formulation, the counterpart of a task, like predicting whether something is a cat or a dog, is basically one or more rollouts in a given environment, and you can reset the state of the policy that you're using to learn between each of those tasks. The one paper that I like on this is the RL² paper from Rocky Duan, who was one of Pieter's former PhD students. The idea here is that the recurrent neural network is allowed to use its hidden state within a given task to quickly figure out how to solve a new reinforcement learning problem, and then there's a slow learning process on top of that which allows it to figure out what it needs to do when it's faced with a new environment in order to learn quickly.
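The fast and slow loops can be sketched roughly as follows. This is a toy under stated assumptions, not the RL² implementation: the one-unit `RecurrentPolicy`, its fixed weights, and the two-armed bandit from `make_bandit` are all made up for illustration, and the slow outer loop that would train the weights is omitted.

```python
import math


class RecurrentPolicy:
    """One-unit recurrent policy. Weights are fixed here; the slow outer loop would train them."""

    def __init__(self, w_hidden=0.5, w_obs=1.0, w_reward=1.0):
        self.w_hidden, self.w_obs, self.w_reward = w_hidden, w_obs, w_reward
        self.hidden = 0.0

    def reset(self):
        # Forget everything picked up inside the previous task.
        self.hidden = 0.0

    def act(self, obs, prev_reward):
        # The hidden state accumulates what has been observed so far in this task.
        self.hidden = math.tanh(
            self.w_hidden * self.hidden + self.w_obs * obs + self.w_reward * prev_reward
        )
        return 1 if self.hidden > 0 else 0


def make_bandit(good_arm):
    """Toy two-armed bandit: reward 1.0 for pulling good_arm, else 0.0; observation is always 0.0."""
    def step(action):
        return 0.0, (1.0 if action == good_arm else 0.0)
    return step


def run_task(policy, env_step, n_steps=5):
    policy.reset()  # the hidden state is reset between tasks, never within one
    obs, reward, total = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        action = policy.act(obs, reward)
        obs, reward = env_step(action)
        total += reward
    return total


if __name__ == "__main__":
    policy = RecurrentPolicy()
    for good_arm in (0, 1):
        print("good arm", good_arm, "return", run_task(policy, make_bandit(good_arm)))
```

The point of the sketch is only the structure: previous rewards are fed back into the policy, the hidden state carries information across steps within a task, and it is cleared when a new task begins.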
I'll skip through the formalisms here, but I do want to touch on why domain randomization might be meta-learning. The formulation of domain randomization as meta-learning is that each set of physics parameters corresponds to some environment, and one attempt at solving the problem in that environment is a task. During the rollout itself, when you're trying to solve that problem in your new environment, the recurrent state of the policy allows you to adapt to whatever new physics you're seeing. There's a little bit of evidence that this might actually be happening when policies are trained in simulation and then deployed in the real world.
There are some tools that you can use to do domain randomization if you're using different simulators: Gazebo, Unity, Unreal, or a custom self-driving simulator. I recommend checking these out if you want to apply this.
There are also some challenges to applying domain randomization. In practice, how does this process actually work? The first thing that you do is build a simulated world, and then you take your simulated world and calibrate it to the real environment. Then you design some randomizations that you think intuitively might capture the real-world variability. Then you train a model in that simulation and evaluate it in the real world. And then finally you have to go through this manual, iterative process of examining the failure modes in the real world and trying to design new randomizations that allow you to get around those failure modes.
I think the core challenge here is that this process is very manual. You need to do all the 3D modeling yourself. You need to do this system ID problem, which itself can be challenging. You need to decide what to randomize, which can require a lot of judgment, and you need to decide how much to randomize it. And then finally, as was pointed out, once you evaluate in the real world you need to somehow go back and figure out what you should do when the model is failing: what additional randomization should you add?
There's been some recent work that's tried to extend domain randomization to alleviate some of these challenges. What I'll do is give a high-level sense of the ways that people are trying to address the challenges of domain randomization, and I also have some references to specific papers where people are trying to do this, which you can dive into if you're curious about the specifics of the approaches that people have taken.
The first class of techniques that people have tried to make domain randomization better is to say: maybe we can design specific types of neural network architectures that are better suited to transfer, that work better for the particular type of task that we're doing. An example of a paper here is Randomized-to-Canonical Adaptation Networks from some folks at Google. The idea here is that instead of taking a randomized simulation and training your model on that to directly output what it's supposed to do, you add an intermediate step. In this intermediate step, we first train a model that maps the randomized simulation into some sort of canonical simulation. Then, when we get to the real world, we take our real-world data and also map that into the canonical simulation. It turns out that in their experiments this performs better than just training on randomized data from scratch alone.
Another class of techniques that people have tried is trying to match the simulator to real data, so this is kind of like combining domain randomization and system ID. One approach in this category is SimOpt, where they iteratively train in randomized environments, use the policy from that randomized environment to collect data in the real world, and then use that real-world data to try to update the parameters of the simulator to be a better match for what was seen in the real world. So this is an iterative approach to automatically incorporating real-world data into simulator design, and they had some pretty interesting results from that.
Another approach here is this idea of Meta-Sim, which is really addressing the problem of world design in simulation. The challenge is that if you're trying to design a self-driving car simulator and you're just placing objects randomly, then you're going to get a lot of scenes that look like the one on the left, where objects are not placed in a way that's physically realistic or that you would see in your real data set. The goal of this approach is to take these scenes that you generate naively and use a little bit of real-world data to make the scenes that your simulator generates more physically plausible.
Another thing that you could think about doing is actually using the real data itself to try to directly improve the performance of the model you train in simulation on the tasks that you care about. The one paper that I really like in this category is called Learning to Simulate. The idea here is that in standard domain randomization we have this manual process of tuning the simulator parameters: we create some simulator parameters, we train a model on those parameters, we see how well it works in the real world, and then we try to use our intuition to go back and say, all right, can we design better simulator parameters? What they do in this paper is, instead of doing that tuning process manually, they use meta-learning to find the parameter distribution. They're optimizing the distribution of simulator parameters itself, based on this metric: how well does the model that I train on that simulator perform in the real world?
Another class of techniques in this category is providing some way of telling whether you're overfitting to the simulation before you actually go into the real world. What this could allow you to do is, if you know that you've trained too much on the simulation, stop training and deploy into the real world before you've overfit to the simulator. This is a really interesting paper that follows that approach.
Another thing you might think about doing: we have this intuition that it's really good if your simulator is harder than the real world, but most of the examples in your simulator maybe are not really that hard. So the question here is, is there some automated way of surfacing the hardest examples in the simulator, so that you can focus your model's training on those hard examples? There are two papers that I like in this category. One is Active Domain Randomization, where they have their randomized simulators and then some reference simulator, and they train a model to try to tell whether the policy was being rolled out in the reference simulator, where you know that you should do well, or in one of the randomized simulators. If this discriminator can tell the difference in the behavior of the robot in the randomized simulator, then that means that simulator might be harder, and so you can focus more of your effort on training in that simulator.
I think I'm going to skip over this one. The last category of extensions to domain randomization that I'm excited about is this: we had this idea that the wider the range of simulation, the better. If we can train on a wide range of simulators, then it's more likely that our model will generalize to the real world. But the challenge is that in a lot of cases, if you make the distribution of simulations too wide, it becomes too hard a task for your network to do well on. If you degrade the performance in simulation, then you can't expect it to do well in the real world either. So this category is about trying to allow the model to perform well in a wider range of simulations, so you can continue to expand the range of simulations that you train on without hurting your model's performance in simulation.
There are two ideas in this category that I want to touch on. One is essentially allowing the model, when you're training in simulation, to see which simulator it's in. Instead of needing to figure out which simulator it's in, you provide the information about which simulator it's in to the policy. That allows the policy to have less work to do, because it doesn't need to figure out which version of the world it's in; it already knows that. Then in the real environment, obviously, you don't have that information, so what they do is run an optimization algorithm in the real world that allows them to find the value of that simulator parameter vector that allows the model to perform best there. They have some promising results in simulation that show that this can work well.
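A minimal sketch of that two-stage idea follows, under heavy assumptions: the "policy" is a hand-written rule rather than a trained network, the simulator parameter is a single hypothetical mass, and the real-world optimization is a plain search over a handful of candidates standing in for whatever optimizer the authors actually used.

```python
def policy(target_accel, mass_guess):
    """Policy conditioned on a simulator parameter (here: the mass it believes it is pushing)."""
    return mass_guess * target_accel  # force = m * a


def real_rollout(force, target_accel, true_mass=2.0):
    """Hypothetical real-world trial; true_mass is hidden from the policy."""
    achieved_accel = force / true_mass
    return -(achieved_accel - target_accel) ** 2  # higher is better


def find_best_param(candidates, target_accel=1.5):
    """Pick the parameter value whose conditioned policy scores best in the real world."""
    return max(
        candidates,
        key=lambda mass: real_rollout(policy(target_accel, mass), target_accel),
    )


if __name__ == "__main__":
    print("best mass guess:", find_best_param([0.5, 1.0, 1.5, 2.0, 2.5, 3.0]))
```

Note that the search never needs to measure the mass directly. It only needs the real-world return for each candidate, which is what makes the approach usable when the true parameters are unobservable.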
And then the last idea I want to touch on is automatic domain randomization. This is the extension to domain randomization that allowed OpenAI to recently solve the Rubik's Cube with a robotic hand. The core concept here is that, since wide randomization ranges lead to poor performance of a model that's trained on the entire randomization range, maybe we can allow the model to perform well on a wider and wider range of simulations by gradually growing the width of the range that it's trained on. We start with a really narrow range of simulations, and once we perform well on that narrow range, we make the range a little bit wider. The idea is that maybe that's an easier learning problem, and so we can continue to expand the number of simulators that the model is trained on.
is trained on I'll skip over the details here but this is kind of the result that
here but this is kind of the result that you can get by using something like this
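The narrow-to-wide schedule described above can be sketched in a few lines. This is a toy illustration of the idea only, not the published automatic domain randomization algorithm: `ParamRange`, `train_and_evaluate`, the fixed "skill" increment, and the 0.8 success threshold are all hypothetical stand-ins for real training and evaluation.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Randomization range for one simulator parameter (e.g. friction)."""
    nominal: float
    half_width: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.nominal - self.half_width,
                           self.nominal + self.half_width)


def train_and_evaluate(skill, param_range, rng):
    """Hypothetical stand-in for training on sampled simulators.

    Each call adds a fixed amount of 'skill'. Success drops as sampled
    parameters move away from nominal, so wider ranges are harder.
    """
    skill += 0.1
    offsets = [abs(param_range.sample(rng) - param_range.nominal)
               for _ in range(50)]
    mean_offset = sum(offsets) / len(offsets)
    success = max(0.0, min(1.0, skill - mean_offset))
    return skill, success


def grow_randomization(steps=30, threshold=0.8, widen_by=0.05, seed=0):
    rng = random.Random(seed)
    param_range = ParamRange(nominal=1.0, half_width=0.05)  # start narrow
    skill, history = 0.0, []
    for _ in range(steps):
        skill, success = train_and_evaluate(skill, param_range, rng)
        if success >= threshold:
            # Performing well on the current range: make it a bit wider.
            param_range.half_width += widen_by
        history.append(param_range.half_width)
    return history


history = grow_randomization()
print(history[0], history[-1])
```

In this toy version the range only ever grows; a real system would also need a rule for what to do when performance on the widened range drops.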
And so, this is running in real time, so it takes a few seconds for it to actually start
doing things, but this robot ultimately is able to solve the Rubik's Cube in hand. A couple
of caveats to this result. The first is that this doesn't actually work all that reliably;
it's maybe 20% of the time that it's actually able to solve it successfully. And then there
were some explicit choices that needed to be made around how the sensors were configured, so
it's not directly estimating the state of the world from vision. But I think it's an
impressive result nonetheless.
Okay, the last thing I want to touch on is what's coming in this field. So what's next? I
hope that we're going to see more and better tools. I flashed a slide that has some tools
for domain randomization, but I think that we can do better, especially on physics
randomization. Hopefully simulators will continue to get more and more accurate and more and
more scalable, and so we can do larger and larger training runs. And I think there's a
generation of sim-to-real techniques that are coming, so I'm really excited about the
research areas that I showed you earlier around automating different parts of the very
manual domain randomization process, and I think those are going to continue to get better.
I see domain randomization and domain adaptation, as well as model-based reinforcement
learning, as things people kind of think about separately right now, but I think they're all
going to converge. There's no reason that you can't do domain randomization and then also do
domain adaptation on top of that. So I think approaches that combine ideas from these three
fields are promising.
And then in terms of use cases, I think this comes back to what I touched on at the
beginning as motivation for studying these techniques to begin with. I hope that people will
start to prove out that you can use synthetic data for edge cases, for reducing bias, and
ultimately for getting robots to learn on really complicated, wide, messy real-world data
distributions. I think an awesome project for someone to do would be to try to get a
remote-controlled car to drive around Berkeley campus, only training on synthetic data.
And then lastly, the dream for all this. Right now it's a super manual process, but I think
the North Star for this field is what I'd call real-to-sim-to-real. The idea here is that
where you want to be is where you can collect some data about the real world, like maybe you
have some sensors that are observing your scene, and then you can use that sensor data to
automatically construct a simulation and automatically decide what ranges of parameters to
randomize. Then you train a model in those simulations, use the policy that results from
that to go and collect more real-world data, and then go back and improve and widen your
simulation. So my hope is that in the long term this whole process is going to get
automated, and we're going to be able to just build really powerful robotic systems on top
of these techniques.
Okay, if you're interested in learning more about this, here are a few references that I
recommend. And yeah, thanks. I think we're over time, but happy to take questions outside or
offline by email after.
Yeah, thanks. Yeah, I had way too much there, realized halfway through, but tends to happen.
I think we'll share the slides, right? Yeah.