Imitating Experts with Privileged Information (via Zoom)

Abstract: Imitation learning is a flexible paradigm for programming robots implicitly through demonstrations, interventions, or preferences. However, the expert often has access to context that is hidden from the learner. For instance, in self-driving, the human expert has richer context about the scene than the car's limited perception system. While a common solution is to add a history of past states and actions to the model, practitioners have often noted that off-policy methods lead to a “latching effect,” where the learner simply repeats its own past action. On-policy approaches that leverage interaction with the demonstrator or the environment, on the other hand, can match expert performance in the limit. We study this question and show a sharp phase transition in the performance of off-policy approaches, in contrast to the uniformly good performance of on-policy approaches. We believe this strong separation helps explain the variable performance of behavior cloning, even in regimes with large data and powerful model classes, and the consistent success of on-policy methods across domains like search, self-driving, and mobile manipulation.
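The latching effect described above can be illustrated with a toy simulation. This is only a sketch under assumed conventions (a two-action problem, a hidden context that persists within an episode, and a previous-action feature initialized to 0), not the actual setup from the talk: because the expert's privileged context is persistent, the expert's previous action predicts its current action almost perfectly in the off-policy data, so a behavior-cloned policy learns to copy the previous action and ignore the informative observation.

```python
import random

random.seed(0)

# Toy setting: a hidden context z in {0, 1} persists for a whole episode.
# The expert sees z (privileged information) and always plays a = z.
# The learner sees only a noisy observation o (correct 75% of the time)
# plus the expert's previous action a_prev.

def gen_episode(T=10):
    """Roll out one expert episode, logging (o, a_prev, a) tuples."""
    z = random.randint(0, 1)
    a_prev = 0  # assumed convention: every episode starts with a_prev = 0
    traj = []
    for _ in range(T):
        # Observation equals z half the time, is pure noise otherwise.
        o = z if random.random() < 0.5 else random.randint(0, 1)
        a = z  # expert action uses the privileged context
        traj.append((o, a_prev, a))
        a_prev = a
    return traj

data = [step for _ in range(2000) for step in gen_episode()]

# Off-policy behavior cloning here reduces to the conditional frequency
# estimate P(a | o, a_prev) over the logged expert data.
counts = {}
for o, a_prev, a in data:
    counts.setdefault((o, a_prev), [0, 0])[a] += 1

def bc_policy(o, a_prev):
    c = counts[(o, a_prev)]
    return 0 if c[0] >= c[1] else 1

# After the first step of each episode, a_prev equals the expert action
# exactly, while o is only 75% informative, so the cloned policy latches:
# it outputs a_prev regardless of what the observation says.
print(bc_policy(1, 0), bc_policy(0, 1))
```

Run on policy, this latched controller never recovers once it emits a wrong action, which is the intuition behind the off-policy failure mode the abstract contrasts with interactive, on-policy training.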

Bio: Sanjiban Choudhury is an Assistant Professor in the Department of Computer Science at Cornell University and a Research Scientist at Aurora Innovation. His research goal is to enable robots to work seamlessly alongside human partners in the wild. To this end, his work focuses on imitation learning, decision making, and human-robot interaction. He received his Ph.D. in Robotics from Carnegie Mellon University and was a postdoctoral fellow at the University of Washington. His research received a best paper award at ICAPS 2019, was a best paper finalist at IJRR 2018 and AHS 2014, and won the 2018 Howard Hughes award. He is a Siebel Scholar, class of 2013.