Veridical Data Science Toward Trustworthy AI (via Zoom)
Abstract:
"AI is like nuclear energy–both promising and dangerous."
Bill Gates, 2019
Data science is central to AI and has driven most of the recent advances in biomedicine and beyond. Human judgment calls are ubiquitous at every step of a data science life cycle (DSLC): problem formulation, data cleaning, exploratory data analysis (EDA), modeling, and reporting. Such judgment calls are often responsible for the "dangers" of AI by creating a universe of hidden uncertainties well beyond sample-to-sample uncertainty.
To mitigate these dangers, veridical (truthful) data science is introduced based on three principles: Predictability, Computability, and Stability (PCS). The PCS framework and documentation unify, streamline, and expand on the ideas and best practices of statistics and machine learning. In every step of a DSLC, PCS emphasizes reality checks through predictability, considers computability up front, and takes into account expanded sources of uncertainty, including those from data curation/cleaning and algorithm choice, to build more trust in data results. PCS will be showcased through collaborative research on finding genetic drivers of a heart disease, stress-testing a clinical decision rule, and identifying a microbiome-related metabolite signature for possible early cancer detection.
Bio: Bin Yu is Chancellor's Distinguished Professor and Class of 1936 Second Chair in Statistics, EECS, and Computational Biology at UC Berkeley. Her recent research focuses on statistical machine learning practice, algorithms, and theory; veridical data science for trustworthy AI; and interdisciplinary data problems in neuroscience, genomics, and precision medicine. She is a member of the U.S. National Academy of Sciences and the American Academy of Arts and Sciences. She was a Guggenheim Fellow, Tukey Memorial Lecturer of the Bernoulli Society, and Rietz Lecturer of the Institute of Mathematical Statistics (IMS), and won the E. L. Scott Award given by the Committee of Presidents of Statistical Societies (COPSS). She delivered the IMS Wald Lectures and the COPSS Distinguished Achievement Award and Lecture (DAAL, formerly Fisher) at the Joint Statistical Meetings (JSM) in August 2023. She holds an Honorary Doctorate from the University of Lausanne. She served on the inaugural scientific advisory board of the UK Turing Institute of Data Science and AI, and is serving on the editorial board of PNAS and as a senior advisor at the Simons Institute for the Theory of Computing at UC Berkeley.