Title: Scalable Deep Bayesian Optimization for Biological Sequence Design (via Zoom)
Abstract: Bayesian optimization is a framework that leverages the ability of Gaussian processes to quantify uncertainty in order to efficiently solve black-box optimization problems. For many years, much of the work in this area has focused on relatively low-dimensional continuous optimization problems where the objective function is very expensive to evaluate and the budget is limited to a few hundred evaluations at most. In this talk, I'll discuss the application of Bayesian optimization to a radically different problem setting: the design of biological sequences such as antibodies and mRNA molecules. In these settings, scientists may have access to vast libraries of known compound properties, and the objective functions are structured, discrete, and high-dimensional. By uniting recent work on deep representation learning for molecules, scalable Gaussian processes, and high-dimensional black-box optimization, we achieve up to a 20x performance improvement over the state of the art on several of the most popular benchmarks for molecule design, and even find large sets of diverse molecules that all achieve high reward.
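For readers unfamiliar with the loop the abstract describes, here is a minimal, self-contained sketch of Bayesian optimization: a Gaussian-process surrogate is fit to all evaluations so far, and an acquisition function (expected improvement here) chooses the next point to query. This is an illustration only, on a toy 1-D continuous objective rather than the discrete, high-dimensional sequence-design setting of the talk; the kernel, hyperparameters, and toy objective are assumptions, not the speaker's method.

```python
# Minimal Bayesian optimization sketch (illustrative assumptions throughout).
# A real application would use a library such as GPyTorch/BoTorch.
import numpy as np
from scipy.stats import norm

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel k(a, b) on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Exact GP posterior mean and standard deviation at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    # Solve against [y | K_ts] in one call to reuse the factorization.
    solve = np.linalg.solve(k_tt, np.column_stack([y_train[:, None], k_ts]))
    mean = k_ts.T @ solve[:, 0]
    cov = k_ss - k_ts.T @ solve[:, 1:]
    return mean, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mean, std, best):
    """EI acquisition: expected gain over the incumbent best value."""
    z = (mean - best) / std
    return (mean - best) * norm.cdf(z) + std * norm.pdf(z)

def objective(x):
    # Stand-in black-box function; in the talk's setting this would be an
    # expensive laboratory or simulation-based measurement.
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 2.0, size=3)   # small initial design
y_train = objective(x_train)
candidates = np.linspace(-1.0, 2.0, 500)   # candidate pool to score

for _ in range(15):
    mean, std = gp_posterior(x_train, y_train, candidates)
    ei = expected_improvement(mean, std, y_train.max())
    x_next = candidates[np.argmax(ei)]      # query the most promising point
    x_train = np.append(x_train, x_next)
    y_train = np.append(y_train, objective(x_next))

print(f"best x = {x_train[y_train.argmax()]:.3f}, f(x) = {y_train.max():.3f}")
```

Each iteration refits the posterior to every evaluation collected so far and spends the next evaluation where the model's mean and uncertainty jointly suggest the largest expected gain, which is how the framework makes a limited evaluation budget go far.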
Bio: Jacob Gardner is an assistant professor in the Department of Computer and Information Science at the University of Pennsylvania. Before that, he was a research scientist at Uber AI Labs. He completed his PhD at Cornell University in 2018. His lab's research is in machine learning, with a focus on scalable probabilistic methods and the intersection of deep learning and Bayesian machine learning. Recently, he has been interested in how machine learning techniques can be applied to large-scale, high-dimensional optimization problems in the natural sciences. In 2022, he received an NSF CAREER award funding his work on these kinds of optimization problems.