Talk Title: Investigating Length Correlations in RLHF
Abstract: Reinforcement Learning from Human Feedback (RLHF) has been reported to succeed at aligning language models, but it often drives models to produce longer outputs. We investigate this phenomenon and find that response length is a more significant factor behind RLHF's reported improvements than previously thought. Across three diverse data settings, we find that performance improvements after RLHF are largely due to increased length rather than other important features. In fact, optimizing a purely length-based reward reproduces most downstream RLHF improvements over fine-tuned models. We test a comprehensive set of length-countering interventions and identify reward models as the dominant source of this bias.
Bio: Tanya Goyal is an assistant professor in the Computer Science department at Cornell University. Her research interests include building reliable and sustainable evaluation frameworks for large language models (LLMs) and understanding LLM behaviors as a function of training data and/or alignment strategies. Previously, she was a postdoctoral scholar at the Princeton Language and Intelligence Center (2023-2024). Tanya completed her Ph.D. in Computer Science at UT Austin in 2023, advised by Greg Durrett, and her thesis received UTCS's Bert Kay Dissertation Award.