YouTube Transcript:
Lecture 22 Sim2Real and Domain Randomization -- CS287-FA19 Advanced Robotics at UC Berkeley

Summary

Core Theme

The lecture explores the challenges and techniques for enabling robots to learn and operate effectively in the real world, primarily focusing on the "sim-to-real" transfer problem, where models trained in simulation are applied to physical robots.

hi everyone welcome to lecture today we

have a guest lecture

Josh Tobin Josh got his PhD in AI from

Berkeley has been a research scientist

at OpenAI for several years and

is one of the world's leading experts I

would say the world's leading pioneer in

sim-to-real how to make robots learn in

simulation and have it still somehow

work well in the real world and that's

exactly what we're going to learn about

today before Josh gets started a couple

of quick logistical things you have your

last homework your homework five is out

and due in about two weeks

so keep track of that and then your

midterm has happened and then final

project presentation time slot poll has

gone out we'll look at that

soon and then we'll send out something

for signups in the next couple of days

you can pick a specific slot for your

team and if you have any conflicts with all

the slots will figure something out

probably with you doing a recording

for us ahead of time all right any

logistical questions okay then please

join me in welcoming Josh all right

thank you Pieter I'm really excited to be

here before I dive into the topic of

today's talk which is sim to real just a

bit about me

my background was in pure math and then

I you know decided that I wanted to like

do stuff in the real world went into

consulting for a little bit but missed

being technical so I came back to

Berkeley to do my PhD in applied math

but little did I know when I came

back to Berkeley that I would do one

thing that would change it all which is

actually to take this class and so I

took CS 287 I think the last time it was

offered around four years ago and that

sort of changed my trajectory

to robotics and artificial intelligence

I've spent time doing those things at OpenAI

and Berkeley so the first thing I

want to talk about is just you know what

was my like takeaway from CS 287

actually I'm curious you you all are

almost done with the semester what is

what's your like main takeaway from this

yeah I really do or maybe no one's

learned anything from the class yeah

LQR yeah that's great that's a

that's not exactly my main takeaway but

I definitely did take that away my main

takeaway was get started early on

homework five no I'm just kidding

I think like the thing the thing

that I took away from the class more

than anything else is that you know

robotics is really hard right and so why

why is that the case right so you all

have talked about you know the

simplified model of how robots interact

with the real world which is an MDP but

I think one of the the core things that

makes robotics so hard is that the real

world really isn't an MDP right so in an

MDP you have an agent that gets to

observe the ground truth state of the

world but in the real world you know

states are super complex and they're

ambiguous and they they're really hard

to model so these are all kind of

examples of scenes where you should

think about how would we actually model

the state of this so what the robot does

get instead of the state is an

observation but the observations

themselves are often really high

dimensional and they're multimodal so

maybe they have many camera inputs and

they're also super noisy right so even

even though we do get an observation of

the world that observation might not be

reliable the next assumption that we

have in an MDP is that we get a reward

but my question is you know where does

reward come from in the real world right

so if you're trying to have a robot pour

a cup of coffee for someone how do you

actually set up a system that will give

you a reward back when they do that task

successfully and a couple of other

examples that you might want to think

about is what does the reward function

look like for folding a towel or for

you know cooking someone dinner or you

know ultimately like making their user

happy all right how would you define a

reward function for those things and

then even if you can define a reward

function a lot of times our reward

measuring our reward relies on having

sensors to tell us sort of where the

robot is in space so how do you measure

a reward outside the lab and then lastly

actions that robots take right so

designing controllers for robotics is a

really hard engineering problem you need

to understand the system that the robot

is interacting with very well and it

doesn't always scale that well to high

dimensions I like this quote from Russ

Tedrake at MIT and TRI which is that you

know manipulation like maybe one of the

main things that we care about in

robotics breaks all the rigorous and

reliable methods for control that we

know about and then once you do get your

controller you know your robot is going

to break and it's going to degrade and

the sensors are going to fail

and so how do you make sure that things

are reliable for that and so one thing

that I was really excited about when I

took this class is deep reinforcement

learning and you know deep learning more

generally applied to robotics and so you

know I think the the hope there is that

rather than like us needing to spend a

ton of time understanding the

environment that the robots can operate

in maybe we can just collect a lot of

experience and let the algorithm handle

the rest and so the next thing I want to

talk about is like what's preventing us

from doing that why is this so hard and

so I think that the main observation

here is that deep learning is super data

hungry right so from you know training

models and images to sentences to

robotic control you really need like

often millions or tens of millions or

even more sort of labeled examples in

order to get things to work well but for

those who have for those of us in

robotics that presents a big challenge

right because robotic data is super

expensive robots themselves are

expensive and you know actually going

out and collecting data on robots can be

dangerous and then it often in the real

world is hard to actually get labels for

the things that we care about our robots

doing and so one of the main things that

motivated me when I started working on

my PhD is how can we get around the data

availability problem in robotics so is

there any way to make data more

plentiful so that we can do deep

learning in robotics there's a few ways

that people have thought about approaching

this maybe if the problem is we don't

have enough data like let's just scale

up data collection and so some

researchers have thought about how to

make fleets of robots that all collect

data together and learn from their

shared experiences maybe if the problem

is that our learning algorithms are too

inefficient maybe we should just make

them more efficient and so more

efficient than sort of model free

reinforcement learning could be

model-based RL meta learning and

learning from demonstrations and I know

that you've talked about at least two of

these model-based and LfD in the class so

far or you know maybe another way to get

around this is if if the problem is it's

one of the problems is it's really hard

to get labeled data in the real world

maybe we can do a lot of our learning

in an unsupervised way and so I think you

know all of these approaches that I've

sort of touched on at a really high

level so far are really interesting and

I think are going to be a big part of

the story about how robots make it to

the real world but the question I want

to focus on is what can we do without

doing that so what can we do with

simulated data and you might ask you

know why why would we even bother with

simulated data at all

well if simulated data works it has a

lot of really big advantages so unlike

robotic data simulated data is super

cheap basically zero marginal cost it's

very fast you can run simulators faster

than real time and it's scalable right

so you can have a simulation running on

every core in your data center you don't

need to go and buy new robots maybe more

importantly it's safe right so you can't

actually damage something by running a

simulation at least not yet

you get labels for free because you

design the world so you know where all

the objects are and you know kind of how

the task is evolving and you're not

beholden to real-world probability

distributions and I'll expand on what I

mean by this in a second but first

labeling I think this is kind of like an

underrated advantage of simulation there

are a lot of tasks where it's very hard

to get a human to label the data for you

so for example if you have images from

the real world and you want to have

someone annotate the ground truth for

depth in those images that's kind of

like a hard task for for a human to do

or similarly annotating the 3d pose of

objects when you only have a 2d image

both of those things you know

it's kind of hard to just go and ask

people in Amazon to do it for you what

do I mean by not being beholden to

real-world probability distributions a

couple of kind of examples I want to

mention here first is the edge case

problem so if you're training a

self-driving car right most of the time

your car is just driving on the highway

but every once in a while you see

something like this right like a you

know cyclist in a pink bodysuit or you

know a kangaroo hopping across the road

maybe that's common in Australia I don't

know but definitely not here or this

like crazy roundabout that's like five

roundabouts in one or something like that

and so you know the challenge with edge

cases in self-driving cars is that by

definition we have sort of very few if

any training examples for our robot and

so how do we like if we're doing machine

learning how do we not over fit to the

training examples that we have and if we

can train on simulated data that might

be a way around that another reason I'm

excited about simulation is for reducing

bias so you know take a toy example here

like let's say that we're training a

model to distinguish I don't know dogs

from puppies and our training data looks

like this right so this training data is

biased because it only has golden

retrievers in the dogs category and so

what would happen if we train a model on

data that looks like this well the model

might classify all Australian Shepherds

as puppies right that's really bad

because there are adult Australian

Shepherds as well and so you know the

question I have is can we fix this by

synthesizing adult Australian Shepherds

but I think kind of the core question to

ask yourself about all this is like if

simulation works it could be really

valuable but what reason do we

have to believe that it should actually

work and so this is a quote from Rodney

Brooks that I like because I think it

captures the way that most people

in the robotics community and the

machine learning community felt for a

long time about the value of simulation

right there's actually a near certainty

that programs that work well on our

simulated robots will completely fail in

the real world and the reason that

they'll fail is because you know in the

real the real world is not like

simulation there are differences between

the dynamics in the real world

and the dynamics

in our simulator okay so what am I going

to talk about for for the rest of this

talk the first thing I want to touch on

is just I want to give you sort of an

intuitive sense of why it's so hard to

use simulated training data so you know

why is it the case that we have a gap

between simulation and the real world and

then I want to sort of briefly mention

you know simulation is a broad topic in

robotics and even if you don't

solve this sim-to-real problem you can

still use simulation to help you build

robot systems that work and so I'll talk

about a couple of those then I'll

mention kind of some ways that you can

go about building a good simulation so a

simulation that's a good fit for the

real world and then I'll talk about a

couple of techniques to bridge the gap

so the first is domain adaptation and

then the second is domain randomization

which I'll have the most to say about

and then finally I want to mention a

couple of thoughts about sort of what's

next for this field of sim-to-real but are

there any questions before I dive into

that all right

so why is it so hard to use simulated

training data I think at its

core there are two reasons the first is

that it's really hard to accurately and

also efficiently model sensors and

physical systems and then the second is

that even if you have only like a small

modeling error that can tend to lead to

large errors in the behavior of the

downstream control system so why is it

hard to accurately and efficiently model

sensors and physical systems well you

know as we talked about a couple weeks

ago physics simulators make some big

assumptions about the world in order to

run faster right so a lot of physics

simulators assume that all the objects

are convex or that we have sort of

discrete time steps with a relatively

large DT or that all bodies are rigid or

we have a simplified model of friction

let's say and so there's there's

inherently going to be gaps between any

model that makes assumptions that

large and the physics of the real world
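
[Editor's sketch, not from the lecture: one way to see the cost of a large time step is to integrate a system whose exact answer is known. The example below assumes a hypothetical unit mass on a unit spring and uses plain explicit Euler integration; the function names and the chosen step sizes are illustrative only, and real simulators use more careful integrators.]

```python
import math


def simulate_spring(dt, t_end=5.0, k=1.0, m=1.0):
    """Explicit-Euler rollout of a mass on a spring, starting at x=1, v=0.

    The spring constant k, mass m, and horizon t_end are illustrative values.
    """
    x, v = 1.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        a = -(k / m) * x                   # spring force, no damping
        x, v = x + dt * v, v + dt * a      # both updates use the old state
    return x


def position_error(dt, t_end=5.0):
    """Gap between the simulated position and the exact solution cos(t)."""
    exact = math.cos(t_end)                # valid for k = m = 1, x0 = 1, v0 = 0
    return abs(simulate_spring(dt, t_end) - exact)


if __name__ == "__main__":
    for dt in (0.1, 0.01, 0.001):
        print(f"dt={dt:<6} position error after 5 s = {position_error(dt):.5f}")
```

[In this toy setting the error shrinks as the step shrinks, at the price of proportionally more steps per simulated second, which is the speed versus accuracy trade-off described above.]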

but also you know even if you can model

everything accurately then if you want

to carefully match the real world you

still need to get the parameters of that

simulation right and so how do you

measure things that are not directly

observable in your data like damping

inertia and friction

and you know the more accurate your

model is the more parameters it's going

to have and so the more of these things

you need to measure so that means that

you need more data in order to

accurately estimate them I'll talk a

little bit more on how to how to do this

later but it's not just physics

you know we can do a reasonably good job

now of photorealistic rendering of

sensors and so this is an example from a

movie a few years ago which is the

remake of The Jungle Book and I think

this is a really good example of super

super high quality rendered images but

if you look at how much effort went into

creating images like this it's like tens

of hours of artist effort per frame

right so getting sensor data that's this

high quality is very expensive and it's

really not a solved problem right and so

I think lidar is kind of one example of

something that people see as as being

relatively easy to simulate but in fact

there there are a lot of gaps between

how lidar simulators work and

real-world lidar data ok so there's so

there's always going to be some sort of

gap between your simulator and the data

that you get from the real world but

what's worse is that you know if there

is a gap then then simulators will tend

or then your model will tend to exploit

it right so one reason for that is that

like one of my intuitions about neural

networks is that they're very lazy

alright and so if there's like some

artifact in your data distribution that

they can exploit they will exploit it

and so an example to illustrate this

point is the Virtual KITTI dataset and

so they essentially took each scene in a

self-driving car data set and

exhaustively reproduced it in simulation

they trained a model on both the real

data distribution and the simulated data

distribution and so even though this is

kind of like the best that you could

expect to do in terms of recreating your

real data distribution there's still a

big gap in the performance between

training on the simulated version and

the real version but so you might say

well maybe it's ok if we have some

errors because our our robot should be

robust to errors that we make in in

modeling right I think one challenge is

that errors tend to compound

for gaps between sim and real so what

we hope happens is you know if you have

some blue curve that's like the path

that you want the robot to follow and

the green curve is the path that it

actually follows where you kind of make

small mistakes along the way but those

mistakes are uncorrelated and so you're

able to kind of keep the robot on track

but what actually happens a lot in the

real world is that you have you know the

same path that you're trying to follow

but the robot gets off the path and it

gets so far off the path that it's out

of the data distribution that it's trained

on and it's not able to recover all

right any any questions about what I've

covered so far all right um the next

thing I want to talk about is like so

we've we've sort of established that

this sim-to-real problem like the

problem of using simulated data for

real-world tasks is hard so the next

thing I want to address is like why why

should we do this at all maybe we can

take advantage of simulation without

needing to train robots on it and

there's a couple of ways that you can do

that one is you know simulation is great

for prototyping your algorithms

simulation is also really good for

debugging your specific implementation

and making sure that you know you have

sort of bug free code before running it

on a robot prototyping entire systems

and then testing so for prototyping

algorithms I think this is like really

common in reinforcement learning for

example where people will if they're

trying to come up with a better

reinforcement learning algorithm they'll

you know almost always run that in

simulation before it ever makes it to a

real robot and you know the reason for

that is you want to you're going to have

to do a lot of cycles since you want to

make sure you're using those cycles

efficiently it's also useful for

debugging your software and so typically

this is done in tools like Gazebo and

ROS that are very similar to the

software that's actually going to be

running on the robot and so what you do

here is you actually implement your

entire stack that you want to run on

your robot and then you make sure that

it with realistic latency and all the

sort of bugs in your ROS stack that

you're able to get things to work in

simulation first before

deploying them on the robot

another use case that you see a lot in

industrial robotics is for prototyping

entire systems right so you know for

example like if you have some tasks that

you want your industrial robot to solve

then you need to figure out what robot

you're going to use and you need to once

you figure that out you want to like

kind of make sure that it's going to be

able to solve the task before you go and

buy it and like invest the effort into

installing it and then you often need to

design the entire cell itself so like

the entire workflow that the robot is

going to be part of and you can

prototype things like that much faster

in simulation and then finally you can

kind of test how long things are going

to take and make some sort of ROI

calculation before you decide to invest

in expensive robots one other kind of

use case that I'm actually really

excited about that I want to mention is

for reliability testing and/or like

continuous integration for robotic

software development and so you know

that I think the question is like say

you're developing a self-driving car and

you make some change to your to your

vision model right and so how do you

make sure that that change to your

vision model is not actually going to

degrade performance in the real world so

the most straightforward way to do that

is you can run tests against your log

data so you can like look at all the

sensor data that your robot has seen

before and then you can look at your

model against that and make sure that

the errors it makes aren't too big but

the challenges are that like log data

is itself incomplete right it's a

noisy observation of the world you don't

have a full state information and you

know it's partially observed and then

importantly like log data is also static

right so really what you want to do is

you want to make sure that your entire

control system still behaves well when

you make a change to your vision system

but if you're looking at log data then

you know you you can't like you can't

explore what happens when your robots

behavior itself changes you can only

look at sort of what's happening in the

current time step and so a lot of

self-driving car companies have invested

pretty heavily in this there's an

article that came out about Waymo's

simulation testing setup that I

recommend checking out and you know

they've they've run like several more

orders of magnitude

tests in simulation than they have on

real cars another example of this that I

really like is the approach that Russ

Tedrake's group is taking at Toyota

Research Institute and so they call it

simulation first robotic development and

basically what this means is that they

have a bunch of tests that run in

simulation every night and so they want

to make sure that like any changes to

the code base that they push during the

day they want to know how those affect

the behavior of the robot in the

simulation and so the keys that

they've mentioned that have been

important for this technique being

successful are making sure that the

simulation is harder than the real

environment that they're trying to solve

being rigorous about sources of

randomness so you know knowing that if

you're you have a degradation in

performance it's not just because you

got unlucky with the random seed and

then manually going through the errors

that your model makes to find kind of

sources of bugs so like looking you know

if you if you if you had a degradation

in performance overnight then like

actually going and looking at each of

those cases and saying like ok why did

we make this mistake

and then lastly good contact simulation

is important for them ok the next thing

I want to talk about is like if like

let's say that we decide to use

simulation to train our robots or just

to do testing then how can we actually

go about building a good simulation but

I'll pause there just to see if there

were any questions on the past section

so my question is do you know if most

self-driving car companies just use

simulation for testing or do they also

train on that data yeah I think

self-driving car companies are sort of

interested and curious about training on

simulated data and I think

from the ones that I've talked to

they have tried doing this but I don't

think it's a widespread practice right

now you quoted earlier about like make

simulation harder than reality

yeah like expand on that how can you

actually make simulation like a more

rigorous than reality that sounds a

little bit like counterintuitive sure

yeah it's a good question so the so like

one way to think about it is let's say

that you're you're training a robot that

you want to be able to grasp objects

right and so you have some properties

that you know about the objects that you

want to be able to grasp like maybe

they're all they're all objects that you

would see in a kitchen and they're all

between this size and this size and you

know it's always going to be well

lit like lighting conditions are always

going to be good when you're trying to

grasp the objects one thing that you

might think about when designing the

simulator is like take your worst-case

estimate of all those things and just

make sure that your simulator is like

sort of biased towards that so make sure

that you're giving the robot lots of the

hardest objects for it to grasp in the

simulation rather than you know like if

you only see those hard objects 1% of

the time in the real world maybe you see

it more of the time in in the simulated

world all right so I think the process

that people typically go through when

they're designing a simulator is you

know first you kind of build the model

of the world and then you create

scenarios so the first part is about

like designing the physics and making

sure that you have an accurate kind of

model of your robot then the second part

is about creating the scenarios that the

robot is going to interact with so like

which roads is it going to need to drive

on or which objects is it going to need

to pick up

and then finally typically what you'll

do is you'll collect a bunch of data

from the real world you use that data to

do system ID which is a process of

improving your simulation so just really

briefly on designing the simulation

model I'll just refer you back to

Pieter's lecture earlier in this class in

practice what most people do is they

don't build their simulator from scratch

they just pick Bullet or PyBullet or

MuJoCo and then use the models that are

provided for them by the developer of

their robot a couple of other simulators

that are worth looking at Drake from

Tedrake's group again at MIT which is I

think pushing a little bit more towards

trying to make more realistic simulation

at the expense of it being maybe a

little bit slower and then Gazebo which

was the most

popular simulator for a while in

robotics and has since fallen out of

favor but if you're doing a lot of stuff

with ROS it's still worth exploring the

next thing you need to do is create your

scenarios for the robot and so this is

kind of the process of like designing

the world that the robot is going to

interact with and so I think kind of one

of the main questions here is like where

do we actually get 3d models for things

right so if the robot needs to interact

with objects where do we find examples of

objects that the robot can interact with

in the simulator there's a spectrum of

different options that sort of have

trade-offs in terms of the quality of

the objects and the number of objects

ShapeNet is the one that's freely

available that I think has the most

objects and so it's like in the high

tens of thousands but the quality of the

objects themselves tends to vary YCB is

sort of at the other end of the spectrum

very very high quality object models but

Dex-Net is a dataset from

Jeff Mahler at Berkeley and it's

actually a combination of other data

sets and this is I think at a

pretty good trade-off point between

quality of the models themselves and the

number of models that you have access to

a couple of other things to be aware of

there are like all these sort of 3d

model repositories that you might see if

you've ever done like game development

these are worth checking out generally

you can't get things for free from them

and then lastly procedural object

generation and I'll talk a little bit

more about that in a second and so then

you know the next question is like we

have our database of objects that we

want the robot to be able to interact

with like many hammers and cups and

whatever it is that we need our robot to

do and so then the next question is like

how do we place those into the world in

a coherent way you could just try

placing them randomly but what tends to

happen if you do that is that all the

objects will sort of collide with each

other and they'll be in very unrealistic

configurations you can place them

randomly according to physics so maybe

you just have like a box that the

objects are sitting in and you might

drop them from above the box before the

scene starts so that they're

at least placed in a way that's

consistent with physics or you might do

it procedurally so I mentioned

procedural content generation in the

context of object modeling and also

world design and so this is kind of a

pretty big and well explored area in

game design I won't really go into it

but there's a book that I recommend if

all right and so lastly you have you

know you've built your simulator you

have collected a bunch of scenarios that

you can put in your simulator that you

want your robot to perform well on and

so then the next thing to do is like

you've made guesses at all of your

physics parameters when you've been

modeling the world and so the

next thing you want to do is like

actually collect a bunch of

data and use that data to try to make

the simulator a better match for reality

and so this is the process of system ID

so what is the problem that we're trying

to solve with system ID well we have

some like parameters of our

simulation so these are things like

friction damping you know mass of

different links of the robot and then we

have some sequence of actions that we

want the robot to follow and so the goal

of system ID is to try to find the set

of parameter values that give you the

sort of lowest loss the lowest

difference between what the robot does

when you execute those actions

in the simulator and

what it does when you execute those

actions in reality alright so there are

a few design choices here

one is like how do you actually choose

this sequence of actions so like what do

you want to actually run on the robot in

order to like in order to kind of

minimize the difference between sim and

real and then another is like what

distance function do you want to use so

how do you measure if

these two trajectories are close to one

another and so I'm going to just give a

quick sort of case study of how this

works for one problem which was doing

system ID for the Shadow Hand in some

of the OpenAI robotics experiments so

in this case they chose trajectories

that consisted of kind of individually

moving each joint to its limits and then

moving each finger individually along

like kind of spline curves to try to

capture the inter dependencies between

the joints and then the distance

function that they use is they took you

know they apply the same sequence of

actions both in Sim and real and then

they looked at where the robot was one

second later and then they took the

difference between those those states

and tried to minimize that distance and

then finally the optimization algorithm

that they use to do this

minimization was sort of iterating

between the coordinates and doing

coordinate descent until all things converged
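
[Editor's sketch, not from the lecture: the system ID recipe above (pick action sequences, compare the state one second later in sim and real, adjust one parameter at a time) can be illustrated on a toy problem. Everything here is an assumption made for illustration: a 1-D point mass with viscous damping stands in for the robot, "real" data is generated from hidden TRUE_PARAMS rather than measured, and the optimizer is a simple derivative-free coordinate-wise search, not necessarily the exact procedure used for the Shadow Hand.]

```python
TRUE_PARAMS = (1.0, 2.0)  # hidden (mass, damping); stand-in for the real robot
DT = 0.01                 # simulator time step in seconds
HORIZON = 100             # 100 steps of 0.01 s, i.e. one second per rollout


def rollout(params, actions, dt=DT):
    """Simulate a 1-D point mass under a force sequence; return final (x, v)."""
    mass, damping = params
    x, v = 0.0, 0.0
    for u in actions:
        a = (u - damping * v) / mass
        v += dt * a
        x += dt * v
    return x, v


def make_action_sequences():
    """Three hypothetical excitation signals: constant push, pulse, reversal."""
    constant = [1.0] * HORIZON
    pulse = [1.0] * 20 + [0.0] * (HORIZON - 20)
    reversal = [1.0] * 50 + [-1.0] * (HORIZON - 50)
    return [constant, pulse, reversal]


def loss(params, sequences, real_states):
    """Sum of squared differences between sim and real states after one second."""
    total = 0.0
    for actions, (x_real, v_real) in zip(sequences, real_states):
        x_sim, v_sim = rollout(params, actions)
        total += (x_sim - x_real) ** 2 + (v_sim - v_real) ** 2
    return total


def identify(initial_params, sequences, real_states,
             step=0.5, tol=1e-4, max_sweeps=5000):
    """Adjust one parameter at a time, halving the step when nothing improves."""
    params = list(initial_params)
    best = loss(params, sequences, real_states)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                if trial[i] <= 0.0:        # keep mass and damping positive
                    continue
                trial_loss = loss(trial, sequences, real_states)
                if trial_loss < best:
                    params, best, improved = trial, trial_loss, True
                    break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return tuple(params), best


if __name__ == "__main__":
    sequences = make_action_sequences()
    real_states = [rollout(TRUE_PARAMS, seq) for seq in sequences]
    estimate, final_loss = identify((1.5, 1.5), sequences, real_states)
    print("estimated (mass, damping):", estimate, "loss:", final_loss)
```

[On a real robot the real_states list would come from logged measurements, and the choice of action sequences matters: signals that excite the parameters differently make them easier to tell apart.]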

okay so you've you know you've designed

your simulator and you've tried to make

it as good a fit to reality as you can

but you know as I alluded to earlier

there's still gonna be gaps all right so

your simulator is still not really going

to be a perfect match for reality and so

kind of the rest of the talk is about

what to do about that but before I dive

have you ever seen a simulator that has

good like other agents as well like

simulation of like other cars in the

scene or maybe things that the robot is

gonna interact with yeah it's a it's a

really good question I think so a couple

of ways that I've seen people deal

with this one is if you can

incorporate the learning of the other

agent into the actual optimization

process so if you're in like a multi

agent reinforcement learning setting

that's kind of one way that people deal

with this but in general this is a

really hard problem and this is like

when you talk to self-driving car

companies about their problems with

simulation one of the biggest ones that

they cite is not having good models of

like how pedestrians are going to behave

or how other drivers are going to behave

and so I think you know if you can

figure out an answer to that question

then it's going to be really really

valuable in that industry yeah for the

data collection in the previous slide

how about running an optimization over

the data collection mm-hmm so you're

saying also figuring out trying to

optimize which data set to collect in

order to minimize yeah that's really

interesting I've I'm sure that there are

examples of people doing that I've not

yeah do you have a sense for which types

of tasks it's possible to make a good

simulation for and which types of tasks it's

just like near impossible mm-hmm so I'll

give a couple of categories of things

that are maybe harder to make a

simulation for I think anything where

it's so anything where it like

explicitly violates the assumptions that

most simulators make so if you have non

rigid bodies like if you need to do

cloth simulation for example you can do

that but it's it tends to be harder so

that's kind of one category things like

maybe if your robot needs to interact

with fluids that might also be really

difficult and then another category of

things is just where you have or the

like set of things that you need your

robot to be able to solve is just really

wide so if for example you're trying to

make a simulator for all of self-driving

cars then it's like how can you possibly

get enough variety in your simulator to

capture all the different scenarios that

well it's really recording I think yeah

how do you think about like the model

class that you should have here for

example like you could you can imagine

using a neural network but like in

between all the data points that you

observe its arbitrarily bad so how do

you how do you what is your process for

selecting the appropriate model

for the dynamics itself yeah I think I

think typically the thing that you want

to do is like so neural networks can

like basically learn any function right

or they can learn any function if you

have enough data but the challenge with

neural nets is that they don't really

generalize that well to data that's out

of the distribution that they've seen

before and so I think like one of the

reasons why simulated training data

works so well is because we like we as

humans actually know things about the

world right it's not like the world is

just a black box that spits data out at

us like we understand at least at a

simplified level how physics work and so

I think one of the one of the reasons

why this approach has actually been

really successful is because if you

build a physics model of the world and

then use that to generate data then even

if it's not perfect we're exploiting

some of our knowledge about how the real

world works so I guess my answer is I

would I would suggest like trying to use

like physically based models and yeah

and I think Pieter's lecture on that from

a couple weeks ago is a there's a good

all right so addressing the gap between

the simulator and the real world the

first class of techniques that I want to

talk about is domain adaptation so this

is sort of a broad topic in machine

learning and I don't really have time to

do it justice but what I do want to do

is just give a few examples of how

domain adaptation techniques have been

applied for sim to real problems in

robotics and so I would kind of

categorize domain adaptation techniques

into two buckets one is supervised

domain adaptation where you know like

let's say that you assume that you're

able to get labels or reward signal in

the real world and then the other is

unsupervised or weakly supervised domain

adaptation where you know your

assumption is that we have labels and

rewards in the simulator but in the real

world we don't we just have like only

unlabeled sensor data so the first

category is supervised domain adaptation

and so you know for those of you that

have trained things like

the convolutional neural networks for

example the simplest form of supervised

domain adaptation is just fine-tuning

right so you train on some source data

distribution and then you kind of take

the weights that you get from training

on that source data distribution and

then you just retrain them a little bit

on data from your target data

distribution so from the real world in

this example and so this can work

quite well in robotics and it's present

in a lot of papers but it's kind of

rarely the focus of people's effort when

they're doing research in this area one

kind of extension of fine-tuning is this

idea of progressive networks and so you

know one of the challenges with fine

tuning is that when you fine-tune a

model that's trained on one data set to

another data set it tends to forget what

it learned on the first data set and so

progressive networks are kind of a way

to try to address that where instead of

fine tuning the same network they

instead like add some additional layers

to the network and then those are what

they train on the second data set

another approach that people have tried

here is what I would call like learning

inverse dynamics and so inverse dynamics

is basically you you have kind of the

current state of the world and then you

have some goal state that you want to

get to and the learning problem you

solve is learning what action will take

you from this state to the next

state and so there's a couple of

different variations of this technique

that people have tried another idea in

this category is using simulation to

find a low dimensional search space so

one thing that you can do is like you

know one of the reasons why it's slow to

learn policies or models in the real

world is because you're searching over

this really high dimensional space which

is like all possible policies but if you

train in simulation and you use that to

find like a submanifold of that huge

space and then search over that in the

real world then that could make learning

more efficient and then the final

category that I wanna just quickly

mention here is using simulation

explicitly as a Bayesian prior for

your learning in the real world and you

know there's there's quite a bit of

research in this area and I think this

category is actually particularly

exciting for kind of ongoing research

yes yeah so the idea in this category

of techniques is like so you train a

model in simulation and the goal of that

model in simulation is to tell the robot

what states it's trying to reach at any

given time point and so that model might

say like alright here's a trajectory

that I you know given the state of the

world I see now here's the trajectory

that I want to follow but you know the

challenge is that if you if you just

apply the actions that you took in the

simulator then that won't actually allow

you to follow the same trajectory in the

real world and so what you're doing here

is you're kind of like you're you're

taking the output of that simulated

model which says like alright since I'm

in this state I need to go to this state

next and then what you're learning on

real data is the function that allows

you to get from this state to the next
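
The inverse dynamics idea just described can be illustrated with a small sketch: the simulator says which state to reach next, and a model fit on real transitions picks the action that gets there. Everything here is an assumption for illustration: the 1-D system, the made-up gain and drift in `real_step`, and the choice of a linear least-squares inverse model are not taken from any specific paper.

```python
import random

def sim_step(state, action):
    """The simulator's (imperfect) model: actions move the state one-to-one."""
    return state + action

def real_step(state, action):
    """Stand-in for the real robot: weaker actuator plus a constant drift."""
    return state + 0.6 * action - 0.05

def fit_inverse_dynamics(transitions):
    """Least-squares fit of action = w * (next_state - state) + b."""
    xs = [s_next - s for s, _, s_next in transitions]
    ys = [a for _, a, _ in transitions]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - w * mean_x
    return lambda state, goal: w * (goal - state) + b

# 1) Plan in simulation: the plan is a sequence of states to reach.
sim_actions = [0.1] * 10
sim_states = [0.0]
for a in sim_actions:
    sim_states.append(sim_step(sim_states[-1], a))

# 2) Collect real transitions with random actions and fit the inverse model.
rng = random.Random(0)
transitions = []
for _ in range(50):
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    transitions.append((s, a, real_step(s, a)))
inverse_dynamics = fit_inverse_dynamics(transitions)

# 3a) Naively replaying the simulator's actions on the real system drifts off.
replayed = 0.0
for a in sim_actions:
    replayed = real_step(replayed, a)

# 3b) Asking the inverse model for the action that reaches each planned state.
tracked = 0.0
for goal in sim_states[1:]:
    tracked = real_step(tracked, inverse_dynamics(tracked, goal))

print(sim_states[-1], replayed, tracked)
```

In this toy the real dynamics are exactly linear, so the fitted inverse model tracks the simulated plan almost perfectly; on a real robot the inverse model would typically be a learned function of the full state and would only be approximate.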

all right so there's also kind of less

supervised domain adaptation one

category is weakly supervised where you

take your model's predictions and you treat

those as noisy labels for fine-tuning

there's self-supervised domain adaptation

so if you can create a system that

allows the robot to do things that

automatically allow it to label the data

so for example if you have a

sensor that tells you that this object

has moved then that might tell you well

okay if the object has moved above a certain

height then that means that our

attempt to grasp that object was

successful that's kind of this category

of things and then lastly is

unsupervised domain adaptation and so I

think kind of the most exciting recent

advance in this is taking image to

image translation models and applying

those to domain adaptation and so what I

mean by that is you might have some data

from your simulator that's unrealistic

and then you might also have some data

from the real world but the data from

the real world doesn't have labels and

so what you can do is you can learn a

function that maps your simulated data

into the real world and tries to match

the data distribution from the real

world and so the idea is like you're

kind of translating the image from the

simulated domain into the real domain

and if you can do that successfully then

what that allows you to do is

train on data that's instead of

just your simulated data it's the

translated data so you can train on data

that looks like this and then the hope

is that when you go to the real world

it's close enough that things

all right any questions about domain

adaptation though it's kind of like a

quick tour it's a very deep topic but I

all right the next topic I want to talk

about is domain randomization and so the

idea here is you know in a lot of in

sort of the techniques that we've talked

about so far for sim to real transfer

the assumption has been like let's try

to model the real world as closely as we

can in the simulator and if we get it

close enough then like maybe that data

will be more useful in the real world or

maybe at least it will allow us

to kind of adapt between that data and

the real world data the idea of domain

randomization is a little bit different

which is instead of trying to find a

single best simulator let's just make

the simulator as varied as possible and

you know maybe the hypothesis is that

like maybe if the model sees

enough simulated variation so it sees

enough kind of different simulated

worlds then when it does get into the

real world

it'll have learned sort of a general

enough strategy because it sees so much

variety that it'll be able to figure out

what's happening in the real world so

this is kind of a core idea and what I

want to cover on domain randomization

first I want to give like kind of a

little bit of a history of the idea

because I think it's it's important to

kind of know where this idea came from

and it's not a new idea then I want to

talk about some of the applications that

people have used it for then I want to

try to give you a little bit of an

intuition as to why it works right

because it's kind of a counterintuitive

thing right

why should training on a lot of really

unrealistic data allow us to generalize

to realistic data then I want to

briefly mention a few tools if you want

to use this in practice

that you can go try and then finally I

want to talk about some extensions that

people have made to this core idea and

sort of how this research field is

evolving all right starting with the

history I think you know so again

this idea of using

really noisy simulators is not new in

robotics and the first instance that I

know of is from this paper called the

Radical Envelope-of-Noise Hypothesis

from 1997 and the idea here is if you're

trying to solve a task like you know you

have a robot driving down a hallway and it

needs to decide whether to turn

left or right depending on whether it

gets a flash of light from the left or

the right how do you build a simulator

to solve this task well the insight

of this paper was to say there's you

know some things that we really need to

model carefully in order to actually

solve the problem at all and so that's

that's what we call like the base

set of things in the simulator right so

is the light coming from the left or is

it coming from the right how long is the

hallway and things like that and then

there's a bunch of other things that

we need to model in our simulator

but that are sort of inconsequential for

solving the task so like you know what

is the friction model

between the wheels and the hallway and

so the insight here is we want to

take the base set and model those things

as carefully as possible and then take

everything else and maximally randomize

it and they were able to solve this task

using sort of a very simple simulator by

using this technique in the deep

learning world the first example of this

that I've seen is from this like

very underrated paper called Live

Repetition Counting and the idea of this

paper is you know they wanted to train a

model that could count when people are

doing cyclical behavior so when they're

doing push-ups or jumping jacks or

something like that but they didn't

really want to like go through all the

effort of labeling data of people doing

that so what they did instead is they

created a synthetic training data set

that consisted of kind of random white

noise in the background and then

cyclical periodic noise in the

foreground and the really surprising

result from this paper was that when

they trained a model on data that looked

like this random noise and then tested

on real data they were able to actually

solve the task right they were able to

count how many times people were doing

jumping jacks the first application of

this sort of concept in sort

of deep learning for robotics that I'm

aware of is this paper called CAD2RL

from Sergey Levine's group here at

Berkeley and the task that they were

trying to solve here is driving a

quadcopter down a hallway and making

sure that it doesn't hit the walls and so

they built this simulator that

was randomized with these different sort of

semi-realistic textures and floor plans

that they designed and what

they found was that when they trained on

this simulator they were able to fly a

quadcopter down a real hallway and not

crash at least reasonably frequently and

then I think you know the last two

papers that I mentioned were kind of the

inspiration for us to start working

on this and the core thing that we

wanted to try to figure out is whether

we could apply this idea to sort of more

precise tasks in robotics so to

grasping something where you need to be

able to position the gripper really

carefully and we were also curious to

see whether you know whether we could

get away without needing to design floor

plans and textures ourselves if instead

we could just procedurally generate

those in a really unrealistic way and

then finally we were curious if we really

needed to pre-train these models on

ImageNet in order for this to work or

whether we could just train them only on

synthetic data all right so the next

thing I'm going to talk about is some of

the ways that people have applied this idea

and the first is kind of the problem

that I just mentioned which is using

domain randomization for computer vision

and in particular using it to

estimate the pose of a particular object

in a scene and so what we did here is we

for each scene so for each like image

that the model saw we gave it a

unique set of randomizations so we

randomized things like textures and

materials colors of the background and

things like that we changed the

positions of the cameras we changed the

lighting and we added a bunch of other

objects to the scene that were sort of

trying to distract the model from the

object that it ultimately cares about we

trained a relatively simple neural

network so this is kind of just a VGG

with the top two fully connected layers

popped off then smaller ones put on top

and the model is taking an image of a

scene and regressing it to just the x y

and z coordinates of a particular object

that we care about in that scene so how

well does it work you know this is sort

of an unfair comparison because you know

all these papers use different objects

and different distances for

the camera and so on but kind of at a

high level we're sort of within what

you'd expect to be able to do with

relatively state-of-the-art pose

estimation techniques from a

single monocular camera

training entirely on synthetic data and

so here's what this work looks like when

you deploy it on a robot to grasp an object
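
The per-image randomization described above (textures and colors, camera position, lighting, distractor objects, with the object position as the label) can be sketched as a scene sampler. This is a minimal sketch under stated assumptions: the `SceneConfig` fields, all numeric ranges, and the function names are hypothetical, and the rendering step that would turn each configuration into an image is not shown.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneConfig:
    object_rgb: tuple        # random flat color standing in for a texture
    table_rgb: tuple
    background_rgb: tuple
    camera_offset: tuple     # perturbation of the nominal camera position (m)
    light_count: int
    light_intensity: float
    distractor_count: int    # extra objects the model should learn to ignore
    target_xyz: tuple        # label: position of the object of interest (m)

def random_rgb(rng):
    return tuple(rng.random() for _ in range(3))

def sample_scene(rng):
    """Draw one fully randomized scene; every range below is made up."""
    return SceneConfig(
        object_rgb=random_rgb(rng),
        table_rgb=random_rgb(rng),
        background_rgb=random_rgb(rng),
        camera_offset=tuple(rng.uniform(-0.1, 0.1) for _ in range(3)),
        light_count=rng.randint(1, 4),
        light_intensity=rng.uniform(0.2, 2.0),
        distractor_count=rng.randint(0, 10),
        target_xyz=(rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3), 0.0),
    )

def make_dataset(num_images, seed=0):
    """Each training image gets its own independently sampled scene."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(num_images)]

dataset = make_dataset(1000)
print(dataset[0])
```

A pose regressor would then be trained on (rendered image, `target_xyz`) pairs, so the labels come for free from the simulator.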

all right well we'll see if we can get

oh there we go okay yeah so this is a

kind of an extension of the

original paper where we were you know it

was April Fool's Day and we wanted to

like see if we could train a robot that

could detect like spam in the real world

so we built the spam

detecting robot and it would pick up the

spam off the table and drop it in the

trash can the next thing that

we applied this to at OpenAI was block

stacking and so the goal here was we had

trained a policy that could do block

stacking in simulation using one-shot

imitation learning right so see a single

demonstration of a human doing the task

in virtual reality and then apply it

from different initial conditions but

the challenge was in order to deploy

this in the real world we needed to know

where the blocks were and so we

trained a similar model on a really similar

data set and we were much more careful

about sort of calibrating cameras and

stuff like that and we were able to get

really precise localizations of the

objects so that you could actually stack

six blocks on top of each other using

you know vision models trained

entirely on synthetic data so how does

it work a few observations one is you

know one of the really important things

is just using a lot of data so as you

increase the amount of

training data on the x-axis then the

error goes down at least until you get

to around fifty or a hundred thousand

images you might ask like what's what's

the important part of having more data

is it just having more training examples

so we tested whether we could get the

same results with the same number of

images but with fewer unique textures

and it turns out that the important

thing here was you need to have as many

unique textures as you can so as

you increase the number of unique

textures the error also goes down and

then lastly and this is kind of

surprising to us was to find that

you know pre-training our model on

ImageNet is actually not necessary and

so you can see like pre-training on

ImageNet actually does help right like

if you are in the low data regime then

the model pre-trained on ImageNet is

still able to do something reasonable

but as you get

enough data then pre-training

becomes unnecessary all right so

here's just a few other kind of

highlights of results that people have

had using this technique or

extensions of this technique to solve

other kind of perception problems in robotics

so people have extended it to estimating

you know not just where the object is on

the table but the full sort of 6D pose

so the position and orientation of the

object people have extended it to doing

objects with really challenging textures

and so this is a paper where they use

domain randomization to train a

perception model for a system that was

grasping fish out of a bucket not sure

why they picked that task but it's

a really challenging one from a

perception standpoint because fish are

like kind of shiny and reflective and

it's difficult to model those textures

and then people have also extended it to

you know instead of just localizing

where a single object is with a single

network instead localizing kind of an

entire corpus of objects a few others I

want to highlight people have used these

techniques for object detection for

autonomous vehicles for face tracking so

taking a simulated model of your face

randomizing it training a model on that

to sort of tell the pose of your face

from a single camera image localizing a

robot within a lung so you know

if you're driving

a robot around a lung and you want to

decide whether to turn left or turn

right you need to know where in the lung

you are and so if you have a vision

model that allows you to sort of take

the image that the robot sees and

then map it back to where on the map of

the lung you are sort of end-to-end

control so instead of just training a

pose estimator you can also train a

policy that takes images directly and

outputs commands to the robot and so

people have done that with

domain randomized data and then also

cloth manipulation so estimating kind of

the state of a cloth so that a robot can

fold the corners together so also you

know tasks where there's

non-rigid objects this technique also

works for other types of sensors so some

of Jeff Mahler's work here at Berkeley

on Dex-Net which is a sort of work

that does a really good job of grasping

generic objects

the inputs to their models

at least in this version of

the work are images from a depth camera

and you can apply a similar set of

techniques by like adding a lot of

random noise to the depth image and then

you can train on synthetic depth images

and generalize to real depth images in

the real world a couple of assumptions

that the results I've shown you so far

have in common the first is that you

actually have 3d models of kind of all

the objects that you want to track and

so you know one thing that you might ask

is how can we how can we move from this

right so how can we move from needing 3d

models of every object that we care

about to being able to train a sort of

generic vision policy that can work for

any type of object so we explored this in

the context of grasping right so you

know in grasping like you care about

really being able to grasp any object

but as we talked about earlier it's

really hard to get good databases of

objects to train your model on and so we

asked the question well maybe you know

similarly to how you don't really need

realistic textures in order to train a

vision model maybe you also don't need

realistic objects to train a grasping

model and so we procedurally generated

these sort of highly unrealistic

objects like on the left and we trained

a policy to pick up those objects in

simulation based on you know based on

depth images of the objects and then we

tested on real-world objects and kind of

the surprising thing that we found out

was that we were able to actually

generalize to grasping realistic objects

in the real world from only training on

highly unrealistic procedurally

generated objects entirely in simulation
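
The lecture does not spell out how the unrealistic objects were generated, so the following is only one hypothetical way to do it: glue together random boxes so that each new part overlaps an existing one. The function names, the box-only primitives, and the size ranges are all assumptions for illustration.

```python
import random

def random_box(rng, center):
    """An axis-aligned box with random half extents (in meters, made up)."""
    half_extents = tuple(rng.uniform(0.01, 0.05) for _ in range(3))
    return {"center": center, "half_extents": half_extents}

def generate_object(rng, max_parts=6):
    """Build one object from 2..max_parts boxes that form a connected shape."""
    parts = [random_box(rng, (0.0, 0.0, 0.0))]
    for _ in range(rng.randint(1, max_parts - 1)):
        parent = rng.choice(parts)
        # Place the new center inside the parent so the two boxes must overlap.
        center = tuple(
            c + rng.uniform(-h, h)
            for c, h in zip(parent["center"], parent["half_extents"])
        )
        parts.append(random_box(rng, center))
    return parts

def boxes_overlap(a, b):
    """Axis-aligned boxes overlap if they overlap along every axis."""
    return all(
        abs(ca - cb) <= ha + hb
        for ca, cb, ha, hb in zip(
            a["center"], b["center"], a["half_extents"], b["half_extents"]
        )
    )

objects = [generate_object(random.Random(seed)) for seed in range(100)]
print(len(objects[0]), "parts in the first object")
```

A grasping policy would then be trained in simulation on many such objects, each loaded as a compound rigid body, and evaluated on real household objects it has never seen.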

this is what it looks like so again it's

not perfect right we're not

getting a hundred percent success here

but you know

generalized grasping is a very

difficult problem and the

interesting result here is that we're

able to get something to work at all

using entirely simulated data another

assumption baked into the results I've

shown you so far is that sort of

dynamics are relatively consistent

between the simulator and the real world

so what if that's not true

right so you know

what if there's some gap in physics

between your simulator and the real world

and so a similar set of ideas also applies

to randomizing dynamics so the way this

typically works is you know in standard

reinforcement learning you'll train like

a feed-forward neural network policy on

a single best version of the environment

and then you'll execute that on your

test environment but what these

techniques do is instead they train a

recurrent neural network so a neural

network that has some state and they'll

train that on a variety of different

physical environments and the idea here

is that the memory of the neural

network in principle should allow the

neural network to kind of figure out

which version of the simulation it's in

and adapt to that simulation so the

first set of results here was from Jason

Peng during his internship at OpenAI

and they worked on kind of these tasks

that involved sort of sliding objects on a

table so this is trained entirely in

simulation with their recurrent

network and then generalizes to

the real world this has also been

extended to more challenging tasks so

this is a result from OpenAI about a

little more than a year ago where

we trained a high-dimensional

robotic hand to sort of

reorient and manipulate objects in

hand so it's a very like contact rich

and challenging task and this is trained

you know more or less exactly the same

way using randomized physics

parameters and in a little bit more detail

you know the way this worked is there

were a bunch of different sort of

variations of the environment and robots

were trained in those environments using

reinforcement learning and

the recurrent policies were forced

to adapt to a wide range of different

physics environments and then in the

real world there was also a sort of a

state estimation module that was trained

in a similar way so trained on vision

data from the simulator and then

deployed in order to like

estimate the state so that the policy

could know what to do next and so the

things that are typically randomized

here and that were randomized in this

paper are things like physical

parameters but then also just sort of

correlated and uncorrelated noise being

added to the simulation sensor dropout

so occasionally just assuming that the

sensor fails how long the physics time step is

so how long dt is in the physics simulation

there's a model of backlash that's

applied to it and there are random forces

that are applied to the object as well
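
The kinds of randomization listed above (physical parameters, correlated and uncorrelated noise, sensor dropout, the time step, random forces) can be sketched as a per-episode sampler wrapped around a toy environment. This is an illustrative sketch only: the 1-D sliding puck, the viscous friction model, every numeric range, and all names are assumptions, and backlash is left out to keep it short.

```python
import random
from dataclasses import dataclass

@dataclass
class EpisodeRandomization:
    mass: float            # kg
    friction: float        # viscous friction coefficient
    dt: float              # physics time step (s)
    obs_bias: float        # correlated noise: fixed for the whole episode
    obs_noise_std: float   # uncorrelated noise: resampled every step
    dropout_prob: float    # chance the sensor returns a stale reading
    force_std: float       # std of random perturbation forces on the object

def sample_randomization(rng):
    """Draw one set of episode parameters; all ranges are made up."""
    return EpisodeRandomization(
        mass=rng.uniform(0.5, 2.0),
        friction=rng.uniform(0.1, 1.0),
        dt=rng.uniform(0.008, 0.012),
        obs_bias=rng.gauss(0.0, 0.01),
        obs_noise_std=rng.uniform(0.0, 0.01),
        dropout_prob=rng.uniform(0.0, 0.1),
        force_std=rng.uniform(0.0, 0.2),
    )

class RandomizedPuckEnv:
    """Toy 1-D sliding puck whose physics are re-sampled at every reset."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.params = sample_randomization(self.rng)
        self.pos, self.vel = 0.0, 0.0
        self.last_obs = 0.0
        return self.last_obs

    def step(self, force):
        p = self.params
        force += self.rng.gauss(0.0, p.force_std)      # random perturbation
        acc = (force - p.friction * self.vel) / p.mass
        self.vel += acc * p.dt
        self.pos += self.vel * p.dt
        if self.rng.random() >= p.dropout_prob:        # sensor reading arrived
            noise = self.rng.gauss(0.0, p.obs_noise_std)
            self.last_obs = self.pos + p.obs_bias + noise
        return self.last_obs                           # stale if dropped out

env = RandomizedPuckEnv(seed=0)
for episode in range(3):
    obs = env.reset()
    for _ in range(10):
        obs = env.step(1.0)
    print(episode, env.params.mass, obs)
```

A policy with memory, such as the recurrent networks mentioned earlier, would be trained across many such resets so that it has to infer the current dynamics from its observation history.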

so there's like quite a bit of effort

that went into figuring out what are

really all the things that we need to

randomize in order to make something

like this transfer all right any

questions about kind of like the high

level idea here sort of where people have

applied it to where it works where it

doesn't before I move on to talking

about my intuition about why this

actually works yeah behind you

so when you see sim-to-real not working

how can you tell what the failure mode

was like was it the dynamics that was

different was it the pose estimation

that was different

yeah it's a great question and I think

this is sort of one of the core things

that makes that makes using these

techniques still difficult there's a few

things that you can do to make it easier

I think in a lot of the OpenAI

results the sort of approach that we

took was to separate perception and

control so we'll have one neural network

that's looking at raw sensor data like

images and then it's trying to output

what it thinks the state of the world is

given those images and then we'll have

another module that says alright given

that I know the state of the world let

me try to predict what I should do next

and so separating those two things

allows you to audit it a little bit more

easily because then you can look at the

the errors that the pose estimation

module like the state estimator is making

specifically and isolate that as a

source of error but in general like if

you you know if you're if you're

deploying a model in the real world and

you know it fails right it's

like it's still a very hard problem to

go back and see like okay what like did

this fail because you know there was

there's like way more friction in this

scenario than we modelled or you know is it something else

it something else and so I think there's kind of a lot of

and so I think there's kind of a lot of like intuition and an engineering that

like intuition and an engineering that goes into this still I also have one

goes into this still I also have one more question yeah so how much domain

more question yeah so how much domain knowledge is required when you're

knowledge is required when you're applying these techniques to one robot

applying these techniques to one robot versus then trying it for maybe a

versus then trying it for maybe a slightly different application is there

slightly different application is there a lot of fine-tuning with each

a lot of fine-tuning with each individual robot and each individual

individual robot and each individual application or do you think you're

application or do you think you're getting closer to a general strategies

getting closer to a general strategies yeah it's it's a little bit hard to

yeah it's it's a little bit hard to answer that definitively because this

answer that definitively because this has really only been tried on to my

has really only been tried on to my knowledge like a pretty small number of

knowledge like a pretty small number of robots there's the the fetch robot that

robots there's the the fetch robot that I showed you for the grasping examples

I showed you for the grasping examples and the shadow hand and I think there's

and the shadow hand and I think there's there's other examples too but those are

there's other examples too but those are the two that I'm most familiar with I

the two that I'm most familiar with I think the thing that gives me hope that

think the thing that gives me hope that maybe it's like maybe we're starting to

maybe it's like maybe we're starting to figure out the limits of the parameters

that we need to randomize and things like that, is that the OpenAI team was able to get the in-hand block manipulation result to work with other objects, differently shaped objects, with relatively little additional effort on top of that. So that's one of the pieces of evidence I can point to that says maybe once we figure these things out once, it's easier to expand them to other types of problems. Question behind you?

So for the physics parameter randomization, can the system ID stuff be skipped?

Ah, great question. In principle you would hope, right, that if we're randomizing physics, then the whole point is that we're trying to make our simulator so much more diverse than the real world that it doesn't matter that it's not exactly the same. I think for vision that's mostly true: you don't really have to be super careful about calibrating things for vision. But for dynamics that's decidedly not true. It is still important: the better your system ID is, the more likely this technique is to be successful for dynamics randomization, even if you have a big range of randomization parameters. And I'm not sure why that's the case.

The next topic I want to touch on is: why does this work? It's kind of a mysterious thing. You have all of this varied training data, you train a model on it, and then it kind of just magically works on your real data, even though the simulated data is super low fidelity and unrealistic. I think no one really has a great answer to this question. There are a few intuitions that I have that I want to lay out for you, and I'll talk a little bit about each of these.

The first intuition that I have is that maybe the training data itself comes from some sort of covering distribution of the real-world data. This intuition says that if you have a non-randomized simulator, maybe this is the distribution of environments and physics that you would see, but the real data is this complicated, messy distribution of environments that's much wider than your simulated distribution. That's why you don't generalize well if you use a non-randomized simulation. So what this intuition says is: maybe we can just make the range of randomizations so big in simulation that everything we might see in the real world lies somewhere in between different things that we've seen when we're randomizing. So maybe the domain-randomized data looks like this: it's this massive distribution that covers the real distribution.

I think this is kind of a flawed intuition, but there are a few things that I think are useful about it. One implication is that as you make the distribution of simulated parameters wider, you should get better results, and that tends to be true. Another thing that I think is useful about this is the concept that we've already touched on in the case of testing, that you want your simulated task to be harder than your real task. And the last useful thing about this intuition is that it makes clear that if you want your model to perform well, then you need to be able to perform well in all parts of this massive distribution that you're training on. It's not okay if your model performs really badly over here, because it might be the case that your real data lies in that part of the region.
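
As a rough illustration of the covering-distribution idea, the following Python sketch samples one set of simulator parameters per episode from uniform ranges and shows how widening the ranges can bring a real-world value inside the randomized distribution. The parameter names, the ranges, and the "real" values are made up for illustration and are not taken from the lecture.

```python
import random

# Hypothetical parameter ranges as (low, high); names and values are illustrative only.
NOMINAL_RANGES = {
    "friction": (0.8, 1.2),
    "object_mass_kg": (0.09, 0.11),
    "light_intensity": (0.9, 1.1),
}

def widen(ranges, scale):
    """Scale each range about its midpoint; scale > 1 gives a wider distribution."""
    widened = {}
    for name, (low, high) in ranges.items():
        mid = 0.5 * (low + high)
        half = 0.5 * (high - low) * scale
        widened[name] = (mid - half, mid + half)
    return widened

def sample_environment(ranges, rng):
    """Draw one simulated environment by sampling every parameter uniformly."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

def covers(ranges, real_params):
    """True if every real-world parameter value lies inside the randomized range."""
    return all(ranges[k][0] <= v <= ranges[k][1] for k, v in real_params.items())

rng = random.Random(0)
wide_ranges = widen(NOMINAL_RANGES, scale=3.0)
# One fresh parameter draw per training episode.
episode_params = [sample_environment(wide_ranges, rng) for _ in range(5)]
```

Note that `covers` only checks the parameters the simulator actually models, so it says nothing about real-world effects that are missing from the simulator altogether.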

I think there are some problems with this intuition. We're mostly operating in a high-dimensional space, so we really should need a massive amount of data to truly cover this real data distribution. And there are a lot of real-world effects that we might not model at all in our simulator, like gear backlash, for example, or the specific distortion of the camera that you're using. If we're not modeling an effect at all, is it really reasonable to believe that the impact of that effect will somehow be accounted for by the things that we are randomizing?

Another intuition that I think can be helpful in thinking about this is that domain randomization is a way of telling the model what it can ignore. The example I have here is: let's say that you have a data set that looks like this, and you train a neural network on it to predict whether the image comes from class 1 or class 2. If you train on this data distribution, then what your model is going to do is train a detector for blue owls on green backgrounds. Neural networks are kind of lazy, and they'll exploit any sort of commonality in the data that you give them. So if you want to instead train an owl detector, then what might work better is to use data like this. If you don't want the model to pick up on the fact that all of the owls in your data set are blue, then maybe you should just change the color of the owl every single time, so that you force that feature to be unreliable and the neural network can't exploit it in order to make its decision.
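
To make the "tell the model what it can ignore" idea concrete, here is a small Python sketch that builds a toy data set in which the foreground and background colors are resampled for every image, so only the shape predicts the label. The 3x3 masks and class names are hypothetical stand-ins for the owl images on the slide.

```python
import random

# Two tiny hypothetical "shapes" as binary masks; the label depends only on the mask.
SHAPES = {
    "class_1": [[0, 1, 0],
                [1, 1, 1],
                [0, 1, 0]],
    "class_2": [[1, 0, 1],
                [0, 1, 0],
                [1, 0, 1]],
}

def random_color(rng):
    """Sample an RGB color uniformly at random."""
    return tuple(rng.randrange(256) for _ in range(3))

def render(mask, rng):
    """Paint the mask with freshly sampled foreground and background colors."""
    fg, bg = random_color(rng), random_color(rng)
    while fg == bg:  # keep the shape visible
        fg = random_color(rng)
    return [[fg if cell else bg for cell in row] for row in mask]

def make_dataset(n, rng):
    """Return (image, label) pairs whose colors carry no information about the label."""
    data = []
    for _ in range(n):
        label = rng.choice(sorted(SHAPES))
        data.append((render(SHAPES[label], rng), label))
    return data

dataset = make_dataset(100, random.Random(0))
```

Because color is independent of the label in this data, a classifier trained on it has no reason to rely on color, which is the point of the owl example.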

The last intuition I want to touch on is this idea of domain randomization as meta-learning. The high-level idea of meta-learning is that in a standard machine learning task you're trying to find some parameters that minimize some loss function on your data, but in meta-learning you assume that the data itself is not static. So you're finding parameters that minimize the loss for data that is sampled from some distribution over datasets.
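
The contrast described here can be written compactly. The notation below is an assumption introduced for illustration (the slides' formalism is not in the transcript): $\theta$ are the model parameters, $D$ is a dataset, $\mathcal{L}$ is the loss, and $p(D)$ is a distribution over datasets.

```latex
% Standard learning: one fixed dataset D
\theta^{*} = \arg\min_{\theta} \; \mathcal{L}(\theta; D)

% Meta-learning: the dataset is itself a random draw from a distribution p(D)
\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{D \sim p(D)} \big[ \mathcal{L}(\theta; D) \big]
```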

A concrete example here: suppose that you want to train a model that can, from a very small number of images, distinguish between two different classes. Your training examples in this paradigm are themselves data sets. So you might have one data set that has cats and birds, and the model has to decide whether it's a cat or a bird. Then you might have another data set that has flowers and bikes, and the model has to decide for a new image whether it's a flower or a bike. Then at test time you'll be given some other data set that might have some different classes, maybe ones that you haven't seen before, and using just this small amount of labeled data, your model will need to take a new image, let's say of a dog, and correctly predict whether it's a dog or an otter.

This idea has also been applied to reinforcement learning. In this formulation, the concept of a task, like predicting whether something is a cat or a dog, is basically one or more rollouts in a given environment, and you can reset the state of the policy that you're using to learn between each of those tasks. The one paper that I like on this is the RL² paper from Rocky Duan, who was one of Pieter's former PhD students. The idea here is that the recurrent neural network is allowed to use its hidden state within a given task to quickly figure out how to solve a new reinforcement learning problem, and then there's a slow learning process on top of that that allows it to figure out what it needs to do when it's faced with a new environment in order to learn quickly.

I'll skip through the formalisms here, but I do want to touch on why domain randomization might be meta-learning. The formulation of domain randomization as meta-learning is that each set of physics parameters corresponds to some environment, and one attempt at solving the problem in that environment is a task. During the rollout itself, when you're trying to solve that problem in your new environment, the recurrent state of the policy allows you to adapt to whatever new physics you're seeing. There's a little bit of evidence that this might actually be happening when policies are trained in simulation and then deployed in the real world.
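
A toy Python sketch of why memory within a rollout can help under randomized physics is given below. It is not RL² or any method from the lecture: the one-dimensional dynamics, the gain range, and the hand-written gain estimator are all assumptions, with the estimator standing in for whatever a trained recurrent policy might encode in its hidden state.

```python
import random

def rollout(policy_step, gain, steps=5, x0=1.0):
    """Run one episode of x_{t+1} = x_t + gain * a_t and return the final |x|."""
    x, hidden, prev = x0, None, None
    for _ in range(steps):
        action, hidden = policy_step(x, prev, hidden)
        prev = (x, action)
        x = x + gain * action
    return abs(x)

def memoryless_policy(x, prev, hidden):
    """Assumes the nominal gain of 1.0 and never revises that belief."""
    return -x / 1.0, None

def adaptive_policy(x, prev, hidden):
    """Keeps a gain estimate in its hidden state and refines it from the last transition."""
    estimate = 1.0 if hidden is None else hidden
    if prev is not None:
        x_prev, a_prev = prev
        if a_prev != 0.0:
            estimate = (x - x_prev) / a_prev  # observed effect of the last action
    return -x / estimate, estimate

rng = random.Random(0)
gains = [rng.uniform(0.5, 2.0) for _ in range(100)]  # one "physics" draw per episode
memoryless_error = sum(rollout(memoryless_policy, g) for g in gains) / len(gains)
adaptive_error = sum(rollout(adaptive_policy, g) for g in gains) / len(gains)
```

In this toy setting the policy with memory identifies the gain after one step and drives the state to the target on the next, while the memoryless policy keeps overshooting or undershooting whenever the sampled gain differs from its fixed assumption.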

There are some tools that you can use to do domain randomization if you're using different simulators: Gazebo, Unity, Unreal, or a custom self-driving simulator. I recommend checking these out if you want to apply this.

There are also some challenges to applying domain randomization. In practice, how does this process actually work? The first thing that you do is build a simulated world. Then you take your simulated world and calibrate it to the real environment. Then you design some randomizations that you think intuitively might look like the real-world variability. Then you train a model in that simulation and evaluate it in the real world. Finally, you have to go through this manual, iterative process of examining the failure modes in the real world and trying to design new randomizations that allow you to get around those failure modes.

I think the core challenges here are that this process is very manual. You need to do all the 3D modeling yourself. You need to do this system ID problem, which itself can be challenging. You need to decide what to randomize, which can require a lot of judgment, and you need to decide how much to randomize it. And finally, as was pointed out, once you evaluate in the real world you need to somehow go back and figure out what to do when the model is failing: what additional randomization should you add? There's been some recent work that tries to extend domain randomization to alleviate some of these challenges.

What I'll do is give a high-level sense of the ways that people are trying to address the challenges of domain randomization. I also have some references to specific papers where people are trying to do this, which you can dive into if you're curious about the specifics of the approaches that people have taken.

The first class of techniques that people have tried to make domain randomization better is to say: maybe we can design specific types of neural network architectures that are better suited to transfer, that work better for the particular type of task that we're doing. An example of a paper here is Randomized-to-Canonical Adaptation Networks, from some folks at Google. The idea here is, instead of taking a randomized simulation and training your model on that to directly output what it's supposed to do, what if you add this intermediate step? The intermediate step is that we first train a model that maps the randomized simulation into some sort of canonical simulation, and then when we get to the real world, we take our real-world data and also map that into the canonical simulation. It turns out in their experiments this performs better than just training on randomized data from scratch alone.

Another class of techniques that people have tried is trying to match the simulator to real data, so this is kind of like combining domain randomization and system ID. One approach in this category is SimOpt, where they iteratively train on randomized environments, use the policy from that randomized environment to collect data in the real world, and then use that real-world data to try to update the parameters of the simulator to be a better match for what was seen in the real world. So this is an iterative approach to automatically incorporating real-world data into simulator design, and they had some pretty interesting results from that.
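
The loop of collecting real data and then updating the simulator's parameter distribution can be sketched on a toy problem. The Python code below is an assumption-heavy illustration rather than the published method: the friction dynamics are invented, a fixed action sequence stands in for rolling out the current policy (policy retraining is omitted), and a simple cross-entropy-style refit is swapped in for whatever optimizer the paper uses.

```python
import random
import statistics

def simulate(friction, actions, v0=0.0):
    """Toy dynamics: velocity decays with friction and is pushed by each action."""
    v, trajectory = v0, []
    for a in actions:
        v = (1.0 - friction) * v + a
        trajectory.append(v)
    return trajectory

def discrepancy(traj_a, traj_b):
    """Sum of squared differences between two trajectories."""
    return sum((p - q) ** 2 for p, q in zip(traj_a, traj_b))

def update_distribution(mean, std, real_traj, actions, rng, n_samples=64, n_elite=8):
    """One cross-entropy-style update: refit to the samples that best match real data."""
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    samples.sort(key=lambda f: discrepancy(simulate(f, actions), real_traj))
    elite = samples[:n_elite]
    return statistics.mean(elite), max(statistics.stdev(elite), 1e-3)

rng = random.Random(0)
REAL_FRICTION = 0.3           # hidden from the algorithm; used only to produce "real" data
actions = [1.0] * 10          # stand-in for rolling out the current policy
mean, std = 0.7, 0.3          # initial, miscalibrated simulator distribution
for _ in range(10):
    real_traj = simulate(REAL_FRICTION, actions)   # pretend this came from the robot
    mean, std = update_distribution(mean, std, real_traj, actions, rng)
```

The design choice to keep a distribution (mean and standard deviation) rather than a single best-fit value mirrors the idea of combining system ID with randomization: the simulator is pulled toward the real data but training can still sample around the estimate.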

Another approach here is this idea of Meta-Sim, which is really addressing the problem of world design in simulation. The challenge is that if you're trying to design a self-driving car simulator and you're just placing objects randomly, then you're going to get a lot of scenes that look like the ones on the left, where objects are not placed in a way that's physically realistic or that you would see in your real data set. The goal of this approach is to take these scenes that you generate naively and use a little bit of real-world data to make the scenes that your simulator generates more physically plausible.

Another thing that you could think about doing is actually using the real data itself to try to directly improve the performance of the model you train in simulation on the tasks that you care about. One paper that I really like in this category is called Learning to Simulate. The idea here is that in standard domain randomization we have this manual process of tuning the simulator parameters. We create some simulator parameters, we train a model on those parameters, we see how well it works in the real world, and then we try to use our intuition to go back and say, all right, can we design better simulator parameters? What they do in this paper is, instead of doing that tuning process manually, they use meta-learning to find the parameter distribution. They're optimizing the distribution of simulator parameters itself, based on this metric of how well the model trained on that simulator performs in the real world.
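
One way to write down the two nested optimizations described here is as a bilevel problem. The symbols are assumptions introduced for illustration and may not match the paper's notation: $\psi$ parameterizes the distribution $p_{\psi}$ over simulator parameters $\xi$, $\theta$ are the model parameters, $\mathcal{L}_{\text{sim}}$ is the training loss in simulation, and $J_{\text{real}}$ is the performance measured on real data.

```latex
% Inner problem: train the model on simulations drawn from p_\psi
\theta^{*}(\psi) = \arg\min_{\theta} \; \mathbb{E}_{\xi \sim p_{\psi}} \big[ \mathcal{L}_{\text{sim}}(\theta; \xi) \big]

% Outer problem: choose the simulator-parameter distribution by real-world performance
\psi^{*} = \arg\max_{\psi} \; J_{\text{real}}\big(\theta^{*}(\psi)\big)
```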

Another class of techniques in this category is providing some way of telling whether you're overfitting to the simulation before you actually go into the real world. What this could allow you to do is, if you know that you've trained too much on the simulation, you can stop training and deploy into the real world before you've overfit to the simulator. This is a really interesting paper that follows that approach.

Another thing you might think about doing: we have this intuition that it's really good if your simulator is harder than the real world, but most of the examples in your simulator are maybe not really that hard. So the question here is, is there some automated way of surfacing the hardest examples in the simulator so that you can focus your model's training on those hard examples?

There are two papers that I like in this category. One is Active Domain Randomization, where they have their randomized simulators and some reference simulator, and they train a model to try to tell whether the policy was being rolled out in the reference simulator, where you know that you should do well, or in one of the randomized simulators. If this discriminator can tell the difference between the behavior of the robot in the reference simulator and in the randomized simulator, then that means that simulator might be harder, and so you can focus more of your effort on training in that simulator. I think I'm going to skip over this one.

over this one and then I think the the last category of extensions to demand

last category of extensions to demand randomization that I'm excited about is

randomization that I'm excited about is so you know we had this idea that the

so you know we had this idea that the wider the range of simulation and the

wider the range of simulation and the better right so if we can train on a

better right so if we can train on a wide range of simulators then it's more

wide range of simulators then it's more likely that our model will generalize to

likely that our model will generalize to the real world but the challenge is that

the real world but the challenge is that in a lot of cases if you make the

in a lot of cases if you make the distribution of simulations too wide it

distribution of simulations too wide it becomes too hard a task for your network

becomes too hard a task for your network to do well on right so if you if you

to do well on right so if you if you degrade the performance in simulation

degrade the performance in simulation then you can't expect it to do well in

then you can't expect it to do well in the real world either and so this

the real world either and so this category of things is about trying to

category of things is about trying to allow the model to perform well in a

allow the model to perform well in a wider range of simulations so you can

wider range of simulations so you can continue to expand the range of

continue to expand the range of simulations that you train on without

simulations that you train on without hurting your models performance in

hurting your models performance in simulation and so two ideas in this

And so there are two ideas in this category that I want to touch on. One is essentially allowing the model, when you're training in simulation, to see which simulator it's in. Instead of needing to figure out which simulator it's in, you provide the information about which simulator it's in to the policy. That allows the policy to have less work to do, because it doesn't need to figure out which version of the world it's in; it already knows that. Then in the real environment, obviously, you don't have that information, and so what they do is run an optimization algorithm in the real world that finds the value of that simulator parameter vector that allows the model to perform best there. They have some promising results in simulation that show that this can work well.
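To make the idea of a parameter-conditioned policy concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the one-dimensional dynamics, the names `rollout` and `search_mu`, and the use of plain random search as the optimizer (the lecture only says "an optimization algorithm"). The conditioned policy is also hand-written here rather than learned in simulation.

```python
import random

def rollout(gain, mu, steps=20, x0=1.0):
    """Return of a parameter-conditioned policy in a world with the given gain."""
    x, total = x0, 0.0
    for _ in range(steps):
        action = -x / mu        # the policy sees mu, its assumed simulator parameter
        x = x + gain * action   # toy dynamics: the true gain scales the action
        total -= abs(x)         # reward: stay close to the origin
    return total

def search_mu(real_gain, low=0.5, high=2.0, trials=200, seed=0):
    """Random search over mu, using only returns measured in the 'real' world."""
    rng = random.Random(seed)
    best_mu, best_ret = None, float("-inf")
    for _ in range(trials):
        mu = rng.uniform(low, high)
        ret = rollout(real_gain, mu)   # real_gain is hidden from the policy
        if ret > best_ret:
            best_mu, best_ret = mu, ret
    return best_mu, best_ret
```

Calling `search_mu(real_gain=1.3)` should return a value of `mu` close to the hidden gain, since in this toy the policy performs best when the parameter it is told matches the world it is in. The search never reads `real_gain` directly; it only compares returns.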

And then the last idea I want to touch on is automatic domain randomization. This is the extension to domain randomization that allowed OpenAI to recently solve the Rubik's Cube with a robotic hand. The core concept here is that, since wide randomization ranges lead to poor performance of a model that's trained on the entire randomization range, maybe we can allow the model to perform well on a wider and wider range of simulations by gradually growing the width of the range of simulations that it's trained on. So we start with a really narrow range of simulations, and then once we perform well on that narrow range, we make the range a little bit wider. The idea here is that maybe that's an easier learning problem, and so we can continue to expand the number of simulators that the model is trained on.
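The grow-the-range-when-performance-is-good loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the algorithm from the OpenAI work, whose details the lecture skips: the "policy" is reduced to a single hypothetical number `skill` (the largest parameter offset it can handle), learning progress is assumed to scale with the success rate, and the names `success_rate` and `run` are made up for this example.

```python
import random

def success_rate(skill, width, rng, episodes=100):
    """Fraction of sampled environments (offset in [-width, width]) the toy policy handles."""
    wins = sum(abs(rng.uniform(-width, width)) <= skill for _ in range(episodes))
    return wins / episodes

def run(iterations, width, widen_by, threshold=0.9, seed=0):
    """Train on the current range; widen it whenever performance clears the threshold."""
    rng = random.Random(seed)
    skill = 0.05
    for _ in range(iterations):
        rate = success_rate(skill, width, rng)
        # Toy learning rule (an assumption): progress is proportional to how often
        # the policy succeeds, so a range that is far too wide gives little signal.
        skill = min(width, skill + 0.05 * rate)
        if rate >= threshold:
            width += widen_by   # performing well, so make the task a bit harder
    return width, skill
```

Comparing `run(300, width=0.1, widen_by=0.1)` with `run(300, width=5.0, widen_by=0.0)` contrasts a gradually widened range with a fixed wide one. In this toy the gradually widened run ends with a larger `skill`, which mirrors the argument above that growing the range step by step may be an easier learning problem than training on the full range from the start.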

I'll skip over the details here, but this is the kind of result that you can get by using something like this. This is running in real time, so it takes a few seconds for it to actually start doing things, but this robot ultimately is able to solve the Rubik's Cube in hand. A couple of caveats to this result. The first is that this doesn't actually work all that reliably; it's maybe 20% of the time that it's actually able to solve it successfully. And then there were some explicit choices that needed to be made around how the sensors were configured, so it's not directly estimating the state of the world from vision. But I think it's an impressive result nonetheless.

Okay, the last thing I want to touch on is just what's coming in this field. So what's next? I hope that we're going to see more and better tools. I flashed a slide that has some tools for domain randomization, but I think that we can do better, especially on physics randomization. Hopefully simulators will continue to get more and more accurate and more and more scalable, so we can do larger and larger training runs. And I think there's a generation of sim-to-real techniques that are coming. I'm really excited about the research areas that I showed you earlier around automating different parts of the very manual domain randomization process, and I think those are going to continue to get better. I see domain randomization and domain adaptation, as well as model-based reinforcement learning, as things people think about separately right now, but I think that they're all going to converge. There's no reason that you can't do domain randomization and then also do domain adaptation on top of that. So I think approaches that combine ideas from these three fields are promising.

And then in terms of use cases, I think this comes back to what I touched on at the beginning as motivation for studying these techniques to begin with. I hope that people will start to prove out that you can use synthetic data for edge cases, for reducing bias, and ultimately for getting robots to learn on really complicated, wide, messy real-world data distributions. I think an awesome project for someone to do would be to try to get a remote-controlled car to drive around Berkeley campus only training on synthetic data.

And then lastly, the dream for all this. Right now it's a super manual process, but I think the North Star for this field is what I'd call real-to-sim-to-real. The idea here is that where you want to be is where you can collect some data about the real world, like maybe you have some sensors that are observing your scene, and then you can use that sensor data to automatically construct a simulation and automatically decide what ranges of parameters to randomize. Then you train a model in those simulations, use the policy that results from that to go and collect more real-world data, and then go back and improve and widen your simulation. My hope is that in the long term this whole process is going to get automated, and we're going to be able to just build really powerful robotic systems on top of these techniques.
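The shape of that real-to-sim-to-real loop can be written down as a skeleton, even though the lecture presents it as a long-term goal rather than an existing system. The Python below is only a sketch: the function names are hypothetical, the "sensor data" is simulated noise around a made-up physical parameter, the range-fitting rule (observed min and max plus a margin) is an assumption, and policy training is a stub.

```python
import random

def collect_real_measurements(rng, n, policy=None):
    """Stand-in for sensor data: noisy observations of a hidden physical parameter.
    A real system would run `policy` on the robot; this stub ignores it."""
    return [rng.gauss(0.8, 0.1) for _ in range(n)]

def fit_range(measurements, margin=0.05):
    """Pick a randomization range covering everything observed so far (assumed rule)."""
    return min(measurements) - margin, max(measurements) + margin

def train_in_sim(param_range):
    """Stub for training in randomized simulation; returns a placeholder policy."""
    return {"trained_on": param_range}

def real_to_sim_to_real(rounds=3, seed=0):
    rng = random.Random(seed)
    data, ranges, policy = [], [], None
    for _ in range(rounds):
        data += collect_real_measurements(rng, 20, policy)  # real: gather data
        param_range = fit_range(data)                       # real to sim: set ranges
        policy = train_in_sim(param_range)                  # sim: train a policy
        ranges.append(param_range)                          # next round widens further
    return data, ranges
```

Because the measurements accumulate across rounds, each fitted range contains the previous one, which corresponds to the "go back and improve and widen your simulation" step described above.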

Okay, if you're interested in learning more about this, here are a few references that I recommend. And yeah, thanks. I think we're over time, but happy to take questions outside or offline by email after. Yeah, thanks. I had way too much there, realized halfway through, but that tends to happen. I think we'll share the slides, right? Yeah.
