Learning to build an actionable, composable, and controllable digital twin
Abstract: Simulation has been the driving force behind robot development. With recent advances in computer vision and graphics, simulating sensor observations has drawn particularly wide attention across the community, since it may enable end-to-end testing of full autonomy systems. Unfortunately, existing sensor simulators, while impressive, still fall short in realism and can neither effectively model the outcomes of actions nor hallucinate counterfactual scenarios. In this talk, I will summarize our recent efforts toward this goal.
First, I will discuss how we develop a high-fidelity closed-loop sensor simulator for self-driving vehicles. Our key insight is to build a digital twin directly from real-world data and to leverage the compositional structure of the world by decomposing the scene into foreground actors and background. This not only allows us to synthesize extremely high-quality sensor observations that avoid the domain gap, but also facilitates better modeling of the interactions between the actors and the scene. Next, I will discuss how we can further expand the simulator to generate physically plausible sensor observations under different lighting conditions and improve the robustness of autonomous systems. Finally, I will present our recent efforts to push the boundaries of digital twins with generative models. I will showcase how we distill knowledge from multimodal LLMs into existing 3D systems, making them interactable, actionable, and thus suitable for physical intelligence.
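The compositional decomposition described above (independently modeled foreground actors composited over a reconstructed background) can be illustrated with a minimal sketch. This is a hypothetical simplification for intuition only, not the speaker's actual system: it assumes each component has already been rendered to per-pixel color and depth buffers, and merges them by z-buffer ordering so that actors correctly occlude the background and one another.

```python
import numpy as np

def composite_scene(background, actor_layers):
    """Composite independently rendered foreground actors over a
    background via per-pixel depth ordering (z-buffering).

    background:   dict with "rgb" (H, W, 3) and "depth" (H, W) arrays
    actor_layers: list of dicts, each with "rgb" (H, W, 3),
                  "depth" (H, W), and a boolean "mask" (H, W)
                  marking the pixels the actor covers
    """
    rgb = background["rgb"].copy()
    depth = background["depth"].copy()
    for actor in actor_layers:
        # An actor pixel wins only where the actor covers it AND it is
        # closer to the camera than what is currently in the buffer.
        closer = actor["mask"] & (actor["depth"] < depth)
        rgb[closer] = actor["rgb"][closer]
        depth[closer] = actor["depth"][closer]
    return rgb, depth

# Toy usage: a 2x2 background at depth 10, one actor covering a single
# pixel at depth 5 (hypothetical values for illustration).
bg = {"rgb": np.zeros((2, 2, 3)), "depth": np.full((2, 2), 10.0)}
actor = {
    "rgb": np.ones((2, 2, 3)),
    "depth": np.full((2, 2), 5.0),
    "mask": np.array([[True, False], [False, False]]),
}
out_rgb, out_depth = composite_scene(bg, [actor])
```

Because each actor is a separate layer, this structure makes it straightforward to add, remove, or reposition actors to synthesize counterfactual scenarios, which is exactly the kind of controllability the abstract argues a digital twin should provide.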