Title: Strategies for Training Massive AI Workloads
Abstract: The rapid advancement of deep learning for generative tasks has revealed strong scaling laws, where model performance increases in proportion to model size. This has led to the proliferation of machine learning models with billions or even trillions of parameters. Training such large-scale models presents significant challenges in memory efficiency, compute utilization, and communication overhead. Addressing these challenges requires non-trivial strategies for parallelizing and synchronizing models at scale. This talk surveys the landscape of training performant models at large scale and discusses techniques such as 5D Parallelism, DeepSpeed, and FSDP. We examine the trade-offs among these methods in terms of memory efficiency, communication overhead, and compute intensity, offering insights into their optimizations. Finally, we cover best practices and practical implementation insights for training large models.
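The following is a minimal sketch, not taken from the talk, of one of the techniques the abstract names: sharding a model's parameters, gradients, and optimizer state across GPUs with PyTorch's FullyShardedDataParallel (FSDP). The model here is a small hypothetical stand-in; in practice the wrapped module would be a multi-billion-parameter transformer, and the process group would be launched with torchrun.

```python
# Minimal FSDP sketch (illustrative only; assumes launch via torchrun,
# e.g. `torchrun --nproc_per_node=8 train.py`, and available CUDA GPUs).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Hypothetical toy model standing in for a large transformer.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 4096),
    ).cuda()

    # FSDP shards parameters across ranks instead of replicating them,
    # trading extra communication (all-gathers) for lower per-GPU memory.
    sharded_model = FSDP(model)
    optimizer = torch.optim.AdamW(sharded_model.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device="cuda")
    loss = sharded_model(x).sum()
    loss.backward()   # gradients are reduce-scattered across ranks
    optimizer.step()  # each rank updates only its parameter shard

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```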
Bio: Tanmaey Gupta is a first-year Ph.D. student working with Prof. Chris De Sa and Prof. Udit Gupta at the intersection of systems and machine learning. His interests lie in designing and implementing systems, software, and algorithms that enable efficient and scalable machine learning in distributed and resource-constrained settings. Prior to joining Cornell, he was a Pre-doctoral Research Fellow at Microsoft Research India, working on projects in the AI Infrastructure team and the Center for Societal Impact through Cloud and Artificial Intelligence.