Bayesian Learning via Stochastic Gradient Langevin Dynamics
The Bayesian approach to machine learning is a theoretically well-motivated framework to learning from data. It provides a coherent framework to reasoning about uncertainties, and an inbuilt protection against overfitting. However, computations in the framework can be expensive, and most approaches to Bayesian computations do not scale well to the big data setting. In this talk we propose a new computational approach for Bayesian learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. We apply the method to logistic regression and latent Dirichlet allocation, showing state-of-the-art performance.
Joint work with Max Welling and Sam Patterson.