Talk Title: Multimodal Learning from the Bottom Up
Abstract: Today's machine perception systems rely extensively on human-provided supervision, such as language. I will talk about our efforts to develop systems that instead learn directly about the world from unlabeled multimodal signals, bypassing the need for this supervision. First, I will discuss our work on creating models that learn from analyzing unlabeled videos, particularly self-supervised approaches for learning space-time correspondence. Next, I will present models that learn from the paired audio and visual signals that naturally occur in video, including methods for generating soundtracks for silent videos. I will also discuss methods for capturing and learning from paired visual and tactile signals, such as models that augment visual 3D reconstructions with touch. Finally, I will talk about work that explores the limits of pretrained text-to-image generation models by using them to create visual illusions.
Bio: Andrew Owens is an assistant professor in the Department of Electrical Engineering and Computer Science at the University of Michigan. Before that, he was a postdoctoral scholar at UC Berkeley, and he received his Ph.D. in computer science from MIT in 2016. He is a recipient of an NSF CAREER Award, a Computer Vision and Pattern Recognition (CVPR) Best Paper Honorable Mention Award, and a Microsoft Research Ph.D. Fellowship.