Title: Inherent Interpretability via Language Model Guided Bottleneck Design
Abstract: As deep learning systems improve, their applicability to critical domains is hampered by a lack of transparency. Post-hoc explanations attempt to address this concern, but they provide no guarantee of faithfulness to the model’s computations. Inherently interpretable models are an alternative, but such models are often considered too simple to perform well. In this talk, we challenge this assumption by demonstrating how to create high-performance inherently interpretable models. Our methods extend concept bottlenecks, a class of inherently interpretable models, by casting their creation as a generation problem for large language models. This allows us to develop search routines for finding high-performing bottlenecks. We specialize this general approach to image classification, text classification, and visual question answering. In these domains, language model-guided bottleneck models perform competitively with their uninterpretable counterparts and, in low-data settings, sometimes even outperform them.
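To make the bottleneck idea concrete, here is a minimal sketch of a concept bottleneck classifier in the spirit the abstract describes: a language model proposes human-readable concept phrases, each input is scored against those concepts, and a linear head maps concept scores to class logits. All names (`ConceptBottleneckClassifier`, the use of pre-computed embeddings) are illustrative assumptions, not the speaker's actual implementation, and the LLM-driven search over candidate bottlenecks is not shown.

```python
# Illustrative sketch only: a concept-bottleneck classifier whose concepts are
# phrases proposed by a language model. Names are hypothetical; the search
# routine over candidate bottlenecks mentioned in the abstract is omitted.
import torch
import torch.nn as nn


class ConceptBottleneckClassifier(nn.Module):
    """Input embedding -> scores on LLM-proposed concepts -> class logits."""

    def __init__(self, concept_embeddings: torch.Tensor, num_classes: int):
        super().__init__()
        # concept_embeddings: (num_concepts, d) text embeddings of concept
        # phrases such as "has whiskers", obtained from some text encoder.
        self.register_buffer("concepts", concept_embeddings)
        self.head = nn.Linear(concept_embeddings.shape[0], num_classes)

    def forward(self, input_embeddings: torch.Tensor) -> torch.Tensor:
        # The bottleneck: the classifier sees only similarities to
        # human-readable concepts, so the linear head's weights can be read
        # directly as concept-to-class evidence.
        concept_scores = input_embeddings @ self.concepts.T  # (batch, num_concepts)
        return self.head(concept_scores)                      # (batch, num_classes)


# Usage sketch with random stand-ins for encoder outputs.
if __name__ == "__main__":
    num_concepts, dim, num_classes = 32, 512, 10
    model = ConceptBottleneckClassifier(torch.randn(num_concepts, dim), num_classes)
    logits = model(torch.randn(4, dim))  # 4 inputs -> (4, num_classes)
    print(logits.shape)
```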
Bio: Mark Yatskar is an Assistant Professor at the University of Pennsylvania in the Department of Computer and Information Science. He did his PhD at the University of Washington, co-advised by Luke Zettlemoyer and Ali Farhadi. He was a Young Investigator at the Allen Institute for Artificial Intelligence for several years, working with their computer vision team, Prior. His work spans Natural Language Processing, Computer Vision, and Fairness in Machine Learning. He received a Best Paper Award at EMNLP for work on gender bias amplification, and his work has been featured in Wired and the New York Times.