MACHINE LEARNING AND ROBOTICS
Lisa Lyons
10/22/08
OUTLINE
- Machine Learning Basics and Terminology
- An Example: DARPA Grand/Urban Challenge
- Multi-Agent Systems
- Netflix Challenge (if time permits)
INTRODUCTION
- Machine learning is commonly associated with robotics.
- When some think of robots, they think of machines like WALL-E: human-looking, with feelings, capable of complex tasks.
- Goals for machine learning in robotics aren't usually this advanced, but some think we're getting there.
- The next three slides outline some goals that motivate researchers to continue work in this area.
HOUSEHOLD ROBOT TO ASSIST THE HANDICAPPED
- Could come preprogrammed with general procedures and behaviors.
- Needs to learn to recognize objects and obstacles, and maybe even its owner (face recognition?).
- Also needs to manipulate objects without breaking them.
- May not always have complete information about its environment (poor lighting, obscured objects).
FLEXIBLE MANUFACTURING ROBOT
- A configurable robot that could manufacture multiple items.
- Must learn to manipulate new types of parts without damaging them.
LEARNING SPOKEN DIALOG SYSTEM FOR REPAIRS
- Given some initial information about a system, a robot could converse with a human and help repair it.
- Speech understanding is a very hard problem in itself.
MACHINE LEARNING BASICS AND TERMINOLOGY With applications and examples in robotics
LEARNING ASSOCIATIONS
- Association rule: the conditional probability P(Y|X) that event Y occurs given that event X already has. (A minimal sketch follows below.)
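As a minimal illustration of this definition, the sketch below estimates the confidence of a rule X → Y, i.e. P(Y|X), by counting over a handful of transactions. The items and data are invented for illustration.

```python
# Estimate the confidence of an association rule X -> Y, i.e. P(Y | X),
# from a list of transactions. Items and transactions are hypothetical.
transactions = [
    {"chips", "beer"},
    {"chips", "salsa"},
    {"chips", "beer", "salsa"},
    {"beer"},
]

def confidence(transactions, x, y):
    """P(Y | X): the fraction of transactions containing x that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

print(confidence(transactions, "chips", "beer"))  # 2/3, about 0.67
```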
CLASSIFICATION
- Classification: a model in which an input is assigned to a class based on some data.
- Prediction: assuming a future scenario is similar to a past one, using past data to decide what the new scenario will look like.
- Pattern recognition: a method used to make predictions, e.g. face recognition and speech recognition.
- Knowledge extraction: learning a rule from data.
- Outlier detection: finding exceptions to the rules.
REGRESSION
- Linear regression is one example.
- Both classification and regression are supervised learning strategies, where the goal is to find a mapping from input to output.
- Example: navigation of an autonomous car (a sketch follows below).
  - Training data: actions of human drivers in various situations.
  - Input: data from sensors (such as GPS or video).
  - Output: angle to turn the steering wheel.
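Below is a minimal sketch of this supervised setup, fitting a linear map from sensor features to a steering angle with NumPy least squares. The feature names and numbers are synthetic stand-ins, not data from a real vehicle.

```python
import numpy as np

# Hypothetical training data: each row is a sensor feature vector
# (say, lateral offset and heading error); each target is the steering
# angle a human driver chose in that situation.
X = np.array([[0.0, 0.0], [1.0, 0.1], [-0.5, -0.2], [2.0, 0.3]])
y = np.array([0.0, -0.25, 0.15, -0.55])   # steering angle (radians)

# Add a bias column and solve the least-squares problem min ||Xb w - y||^2.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# The learned mapping predicts a steering angle for a new sensor reading.
x_new = np.array([0.5, 0.05, 1.0])        # features plus bias term
print(float(x_new @ w))
```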
UNSUPERVISED LEARNING
- Only the input is available.
- The goal is to find regularities in the input.
- Density estimation: finding patterns in the input space.
- Clustering: finding groupings in the input (a sketch follows below).
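A minimal clustering sketch follows, using plain k-means on made-up 2-D points; k-means is one standard clustering method, chosen here for brevity rather than named on the slide.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate between assigning each point to its
    nearest centroid and moving each centroid to its cluster's mean."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

# Two obvious groupings in made-up data.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [4.9, 5.0]])
centroids, labels = kmeans(pts, k=2)
print(labels)   # e.g. [0 0 1 1]
```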
REINFORCEMENT LEARNING
- Policy: generating correct actions to reach the goal.
- Learn from past good policies.
- Example: a robot navigating an unknown environment in search of a goal (a toy sketch follows below).
- Some data may be missing.
- There may be multiple agents in the system.
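As a toy sketch of policy learning, the snippet below runs tabular Q-learning on an invented five-state corridor world; the environment, rewards, and constants are all made up for illustration.

```python
import random

# Corridor world: states 0..4, goal at state 4.
# Actions: 0 = step left, 1 = step right. Reward 1 only at the goal.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.3

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0)

for _ in range(500):                       # training episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection.
        a = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy: go right (action 1) from every non-goal state.
print([max((0, 1), key=lambda x: Q[s][x]) for s in range(N_STATES - 1)])
```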
POSSIBLE APPLICATIONS
- Exploring a world
- Learning object properties
- Learning to interact with the world and with objects
- Optimizing actions
- Recognizing states in a world model
- Monitoring actions to ensure correctness
- Recognizing and repairing errors
- Planning
- Learning action rules
- Deciding actions based on tasks
WHAT WE EXPECT ROBOTS TO DO
- React promptly and correctly to changes in the environment or in internal state.
- Work in situations where information about the environment is imperfect or incomplete.
- Learn through experience and human guidance.
- Respond quickly to human interaction.
Unfortunately, these are very high expectations, and they don't always correlate well with what machine learning techniques can deliver.
DIFFERENCES BETWEEN OTHER TYPES OF MACHINE LEARNING AND ROBOTICS

Other ML Applications:
- Planning can frequently be done offline.
- Actions are usually deterministic.
- No major time constraints.

Robotics:
- Often requires simultaneous planning and execution (online).
- Actions may be nondeterministic depending on data (or lack thereof).
- Real-time response is often required.
AN EXAMPLE: DARPA GRAND/URBAN CHALLENGE
THE CHALLENGE
- Run by the Defense Advanced Research Projects Agency (DARPA).
- Goal: to build a vehicle capable of traversing unrehearsed off-road terrain.
- Started in 2003.
- 142-mile course through the Mojave Desert.
- No one made it through more than 5% of the course in the 2004 race.
- In 2005, 195 teams registered, 23 teams raced, and 5 teams finished.
THE RULES
- Must traverse a desert course up to 175 miles long in under 10 hours.
- The course is kept secret until 2 hours before the race.
- Must follow speed limits for specific areas of the course to protect infrastructure and ecology.
- If a faster vehicle needs to overtake a slower one, the slower one is paused so that vehicles don't have to handle dynamic passing.
- Teams are given data on the course 2 hours before the race, so no global path planning is required.
A DARPA GRAND CHALLENGE VEHICLE CRASHING
A DARPA GRAND CHALLENGE VEHICLE THAT DID NOT CRASH … namely Stanley, the winner of the 2005 challenge
TERRAIN MAPPING AND OBSTACLE DETECTION
- Data from 5 laser scanners mounted on top of the car is used to generate a point cloud of what's in front of the car.
- This is a classification problem: the area in front of the vehicle is divided into a grid, and each cell is labeled drivable, occupied, or unknown.
- Stanley's system finds the probability that ∆h > δ, where ∆h is the observed height of the terrain in a given cell.
- If this probability is higher than some threshold α, the system labels the cell as occupied (a toy version follows below).
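A toy version of the labeling rule is sketched below. It assumes we already have the laser height samples that fell in one grid cell, and it replaces the paper's probabilistic model with a crude empirical count; δ = 0.15 m and α = 0.05 are invented values.

```python
import numpy as np

def classify_cell(heights, delta=0.15, alpha=0.05):
    """Toy occupancy test for one grid cell. The real system tunes
    delta and alpha from data; these defaults are invented."""
    if len(heights) == 0:
        return "unknown"
    # Crude empirical stand-in for P(dh > delta): the fraction of
    # samples more than delta above the lowest sample in the cell.
    p = float(np.mean(heights > heights.min() + delta))
    return "occupied" if p > alpha else "drivable"

flat = np.array([0.02, 0.03, 0.01, 0.02])    # level ground -> drivable
bumpy = np.array([0.02, 0.03, 0.01, 0.30])   # one tall return -> occupied
print(classify_cell(flat), classify_cell(bumpy))
```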
(CONT.)
- A discriminative learning algorithm is used to tune the parameters.
- Data is collected as a human driver drives through mapped terrain avoiding obstacles (supervised learning).
- The algorithm uses coordinate ascent to determine δ and α (a generic sketch follows below).
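A generic coordinate-ascent sketch over the two parameters is shown below. The objective here is a placeholder; in the real system the score would measure agreement with the human driver's obstacle-free path.

```python
def coordinate_ascent(score, delta0, alpha0, step=0.01, iters=50):
    """Generic coordinate ascent: repeatedly nudge one parameter at a
    time in whichever direction improves the score."""
    delta, alpha = delta0, alpha0
    for _ in range(iters):
        for name in ("delta", "alpha"):
            for sign in (+1, -1):
                d2 = delta + sign * step if name == "delta" else delta
                a2 = alpha + sign * step if name == "alpha" else alpha
                if score(d2, a2) > score(delta, alpha):
                    delta, alpha = d2, a2
                    break
    return delta, alpha

# Placeholder objective: just prefers delta near 0.15 and alpha near 0.05.
score = lambda d, a: -((d - 0.15) ** 2 + (a - 0.05) ** 2)
print(coordinate_ascent(score, delta0=0.0, alpha0=0.0))  # ~(0.15, 0.05)
```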
COMPUTER VISION ASPECT
- The lasers only make it safe for the car to drive at under 25 mph.
- The car needs to go faster to satisfy the time constraint.
- A color camera is used for long-range obstacle detection.
- This is still the same classification problem, but now there are more factors to consider: lighting, material, dust on the lens.
- Stanley takes an adaptive approach.
VISION ALGORITHM
- Remove the sky from the image.
- Map a quadrilateral onto the camera video corresponding to the laser sensor boundaries.
- As long as this region is deemed drivable, use the pixels in the quadrilateral as a training set for the concept of drivable surface.
- Maintain Gaussians that model the color of drivable terrain.
- Adapt by adjusting previous Gaussians and/or discarding them and adding new ones.
  - Adjustment allows slow adaptation to changing lighting conditions.
  - Replacement allows rapid response to a change in the color of the road.
- Label regions as drivable if their pixel values are near one or more of the Gaussians and they are connected to the laser quadrilateral (a toy sketch follows below).
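The sketch below is a toy version of this adaptive color model, assuming RGB pixel values; the match threshold, adaptation rate, and capacity are invented, and it is a simplification rather than Stanley's actual mixture model.

```python
import numpy as np

class ColorModel:
    """Toy adaptive color model: a few (mean, variance) Gaussians over
    RGB values of terrain known (from the lasers) to be drivable."""
    def __init__(self, max_gaussians=3, thresh=3.0, rate=0.05):
        self.gaussians = []    # list of (mean, var) pairs, newest last
        self.max_g, self.thresh, self.rate = max_gaussians, thresh, rate

    def update(self, pixels):
        """Adapt to pixels from the laser-verified drivable quadrilateral."""
        mean, var = pixels.mean(axis=0), pixels.var(axis=0) + 1e-6
        for i, (m, v) in enumerate(self.gaussians):
            if np.linalg.norm(mean - m) < self.thresh * np.sqrt(v.mean()):
                # Near an existing Gaussian: adjust it slowly
                # (tracks gradual lighting changes).
                self.gaussians[i] = (m + self.rate * (mean - m), v)
                return
        # No match: add a new Gaussian (handles an abrupt change in road
        # color), dropping the oldest if at capacity.
        self.gaussians.append((mean, var))
        self.gaussians = self.gaussians[-self.max_g:]

    def drivable(self, pixel):
        return any(np.linalg.norm(pixel - m) < self.thresh * np.sqrt(v.mean())
                   for m, v in self.gaussians)

model = ColorModel()
model.update(np.array([[120.0, 110.0, 100.0], [118.0, 112.0, 99.0]]))
print(model.drivable(np.array([119.0, 111.0, 100.0])))   # True
```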
ROAD BOUNDARIES
- The best way to avoid obstacles on a desert road is to find the road boundaries and drive down the middle.
- Stanley uses low-pass one-dimensional Kalman filters to estimate the road boundary on each side of the vehicle.
- Small obstacles have little effect on the estimated boundary; large obstacles, over time, have a stronger effect (illustrated below).
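Below is a minimal one-dimensional Kalman filter of the kind described, tracking a single boundary's lateral offset; the noise parameters are invented. Note how the lone outlier (a small obstacle) barely moves the estimate, while a persistent shift would pull it over time.

```python
class Kalman1D:
    """Minimal 1-D Kalman filter tracking a road boundary's lateral
    offset from noisy per-frame measurements."""
    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=0.5):
        self.x, self.p = x0, p0    # state estimate and its variance
        self.q, self.r = q, r      # process and measurement noise

    def update(self, z):
        self.p += self.q                  # predict: uncertainty grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x

kf = Kalman1D()
for z in [2.0, 2.1, 1.9, 3.5, 2.0, 2.05]:   # 3.5 is a one-frame outlier
    print(round(kf.update(z), 3))
```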
SLOPE AND RUGGEDNESS
- If the terrain becomes too rugged or steep, the vehicle must slow down to maintain control.
- Slope is found from the vehicle's pitch estimate.
- Ruggedness is determined from the vehicle's z-accelerometer data, with gravity and vehicle vibration filtered out (a toy sketch follows below).
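A toy sketch of the ruggedness measure follows, assuming raw z-accelerometer samples: an exponential moving average strips out the slow component (gravity plus body motion), and the power of the remaining high-frequency shocks is reported. The real filtering is more careful than this.

```python
import numpy as np

def ruggedness(z_accel, alpha=0.99):
    """Toy ruggedness: remove the slow component with an exponential
    moving average, then return the mean squared residual (shock power)."""
    slow, residuals = z_accel[0], []
    for z in z_accel:
        slow = alpha * slow + (1 - alpha) * z   # low-frequency estimate
        residuals.append(z - slow)              # high-frequency shocks
    return float(np.mean(np.square(residuals)))

# Synthetic trace: gravity plus vibration-like noise.
z = 9.81 + 0.3 * np.random.default_rng(0).standard_normal(1000)
print(ruggedness(z))
```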
PATH PLANNING
- No global planning is necessary.
- The coordinate system used is the base trajectory plus a lateral offset.
- The base trajectory is a smoothed version of the driving corridor on the map given to contestants before the race.
PATH SMOOTHING
The base trajectory is computed in 4 steps:
1. Points are added to the map in proportion to local curvature.
2. Least-squares optimization is used to adjust trajectories for smoothing.
3. Cubic spline interpolation is used to find a path that can be resampled efficiently (a sketch of this step follows below).
4. The speed limit is calculated.
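A sketch of the spline step (step 3) is shown below, using SciPy's CubicSpline on invented waypoints; parameterizing by cumulative chord length lets the path be resampled at any distance along it.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical corridor waypoints (x, y) after the smoothing step.
waypoints = np.array([[0.0, 0.0], [10.0, 2.0], [20.0, 1.0], [30.0, 4.0]])

# Parameterize by cumulative chord length so the spline can be
# resampled efficiently at any distance along the path.
seg = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
d = np.concatenate([[0.0], np.cumsum(seg)])
spline = CubicSpline(d, waypoints, axis=0)

samples = spline(np.arange(0.0, d[-1], 2.5))   # resample every 2.5 m
print(samples[:3])
```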
ONLINE PATH PLANNING
- Determines the actual trajectory of the vehicle during the race.
- A search algorithm minimizes a linear combination of continuous cost functions, subject to dynamic and kinematic constraints: maximum lateral acceleration, maximum steering angle, maximum steering rate, and maximum acceleration.
- The cost functions penalize hitting obstacles, leaving the corridor, and leaving the center of the road (a toy sketch follows below).
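The sketch below illustrates this kind of search in miniature: score candidate lateral offsets with a weighted sum of cost functions and keep the best candidate that respects a constraint. The cost functions, weights, and limit are all invented stand-ins.

```python
import numpy as np

def plan_offset(candidates, obstacle_cost, corridor_cost, center_cost,
                lateral_accel, max_lat_accel=2.0, weights=(10.0, 5.0, 1.0)):
    """Pick the lateral offset minimizing a weighted sum of costs,
    skipping candidates that violate the lateral-acceleration limit."""
    w_obs, w_cor, w_cen = weights
    best, best_cost = None, float("inf")
    for off in candidates:
        if abs(lateral_accel(off)) > max_lat_accel:   # dynamic constraint
            continue
        cost = (w_obs * obstacle_cost(off) + w_cor * corridor_cost(off)
                + w_cen * center_cost(off))
        if cost < best_cost:
            best, best_cost = off, cost
    return best

# Toy costs over lateral offset (meters from the base trajectory).
print(plan_offset(
    candidates=np.linspace(-2.0, 2.0, 41),
    obstacle_cost=lambda o: max(0.0, 1.0 - abs(o - 1.0)),  # obstacle near +1 m
    corridor_cost=lambda o: max(0.0, abs(o) - 1.5),        # corridor is +/-1.5 m
    center_cost=lambda o: abs(o),                          # stay near center
    lateral_accel=lambda o: 0.5 * o,                       # crude proxy
))
```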
MULTI-AGENT SYSTEMS
RECURSIVE MODELING METHOD (RMM)
- Agents model the belief states of other agents.
- Bayesian methods are implemented.
- Useful in homogeneous non-communicating multi-agent systems (MAS).
- The recursion has to be cut off at some point: we don't want a situation where agent A thinks that agent B thinks that agent A thinks that… (a toy sketch of the cutoff follows below).
- Agents can affect other agents by affecting the environment to produce a desired reaction.
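As a toy sketch of the recursion and its cutoff, the snippet below predicts an agent's action in an invented 2x2 matrix game by modeling the other agent modeling it, bottoming out in a uniform prior at depth 0.

```python
# PAYOFF[agent][a][b] = payoff to `agent` when A plays a and B plays b.
# The game and the depth-0 prior are invented for illustration.
PAYOFF = {
    "A": [[3, 0], [1, 2]],
    "B": [[2, 1], [0, 3]],
}

def best_action(me, depth):
    """Predict `me`'s action by recursively modeling the other agent;
    the recursion is cut off at depth 0 with a uniform prior."""
    other = "B" if me == "A" else "A"
    if depth == 0:
        p_other = [0.5, 0.5]      # cutoff: assume the other plays uniformly
    else:
        predicted = best_action(other, depth - 1)
        p_other = [1.0 if a == predicted else 0.0 for a in (0, 1)]

    def expected(a):
        if me == "A":
            return sum(p * PAYOFF["A"][a][b] for b, p in enumerate(p_other))
        return sum(p * PAYOFF["B"][a2][a] for a2, p in enumerate(p_other))

    return max((0, 1), key=expected)

print(best_action("A", depth=3))   # A's choice after 3 levels of nesting
```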
HETEROGENEOUS NON-COMMUNICATING MAS
- Both competitive and cooperative learning are possible.
- Competitive learning is more difficult because agents may end up in an "arms race."
- Credit-assignment problem: it is hard to tell whether an agent benefited because its actions were good or because its opponent's actions were bad.
- Experts and observers have proven useful.
- Different agents may be given different roles to reach the goal, with supervised learning used to "teach" each agent how to do its part.
COMMUNICATION
- Allowing agents to communicate can lead to deeper levels of planning, since agents know (or think they know) the beliefs of others.
- Communication could allow one agent to "train" another to follow its actions using reinforcement learning.
- It also enables negotiations and commitment.
- Autonomous robots could estimate their position in an environment by querying other robots for their believed positions and making a guess based on that (Markov localization, SLAM).
NETFLIX CHALLENGE (if time permits)
REFERENCES
- Alpaydin, E. Introduction to Machine Learning. Cambridge, Mass.: MIT Press, 2004.
- Kreuziger, J. "Application of Machine Learning to Robotics – An Analysis." In Proceedings of the Second International Conference on Automation, Robotics, and Computer Vision (ICARCV '92), 1992.
- Mitchell et al. "Machine Learning." Annu. Rev. Comput. Sci. 4:417-433, 1990.
- Stone, P. and Veloso, M. "Multiagent Systems: A Survey from a Machine Learning Perspective." Autonomous Robots 8:345-383, 2000.
- Thrun et al. "Stanley: The Robot that Won the DARPA Grand Challenge." Journal of Field Robotics 23(9):661-692, 2006.
