The Complexity of Making the Gradient Small in Stochastic Convex Optimization

Abstract: We give nearly matching upper and lower bounds on the oracle complexity of finding $\epsilon$-stationary points in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimization complexity and sample complexity, and reveals some surprising differences between the complexity of stochastic optimization and that of learning. Notably, we show that in the global oracle / statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model. In other words, the separation in complexity between the two models can be exponential, and the folklore understanding that smoothness is required to find stationary points is only weakly true for statistical learning.
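As a concrete illustration (not part of the talk itself), an $\epsilon$-stationary point of a differentiable function $f$ is a point $x$ with $\|\nabla f(x)\| \le \epsilon$. The sketch below finds one with plain gradient descent, assuming an exact (rather than stochastic) gradient oracle for an $L$-smooth convex function; the function names and the quadratic test objective are illustrative choices, not from the paper:

```python
import numpy as np

def find_near_stationary(grad, x0, L, eps, max_iters=100_000):
    """Run gradient descent with step size 1/L until the gradient
    norm drops to eps, i.e. until an eps-stationary point is found.

    grad : callable returning the exact gradient at a point
    L    : smoothness constant of the objective
    eps  : target stationarity tolerance
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x -= g / L  # classical 1/L step for an L-smooth function
    return x

# Illustrative smooth convex objective: f(x) = 0.5 * x^T A x,
# whose gradient is A x and whose smoothness constant is lambda_max(A).
A = np.diag([1.0, 4.0])
x_hat = find_near_stationary(lambda x: A @ x, [10.0, -3.0], L=4.0, eps=1e-6)
```

Note that the talk's models differ in what the oracle returns: a local stochastic oracle would replace `grad(x)` with a noisy unbiased estimate, while the global (statistical learning) oracle reveals whole sampled functions; the sketch above corresponds to neither exactly and is only meant to make the target condition $\|\nabla f(x)\| \le \epsilon$ concrete.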

This is joint work with Dylan J. Foster, Ohad Shamir, Nathan Srebro, Karthik Sridharan, and Blake Woodworth, and was presented at COLT 2019.

Bio: Ayush is a Ph.D. student in the Computer Science department at Cornell University, advised by Prof. Karthik Sridharan and Prof. Robert D. Kleinberg. His research interests lie at the interplay between machine learning, optimization, and theoretical computer science. Before coming to Cornell, Ayush spent a year at Google as part of the Brain Residency program. Before Google, Ayush completed his undergraduate studies at the Indian Institute of Technology (IIT) Kanpur, where he was awarded the President's Gold Medal for the best academic performance in his graduating class.