Efficient Learning of Latent Variable Models

Abstract: We formulate a new geometric problem - of identifying a simplex K given data points obtained by perturbing latent points in K. It is a common generalization of several latent variable models including Topic Modeling (LDA), Mixed Membership Block models (MMBM), non-negative matrix factorization and others. We show that K is well approximated by a data-determined polytope K which admits a polynomial time optimization oracle. We then prove that successive vertices of K can be found by optimizing carefully chosen linear functions over K. The algorithm is simply stated and has running time matching the best previous algorithms. The proof of correctness involves new and existing tools from Numerical Analysis, Random Projections and convex geometry. In the last part of the talk, we describe how related techniques can be used to derive near-optimal sample complexity for LDA and MMBM.

Joint Work with Chiranjib Bhattacharyya.