Using second-order information in training large-scale machine learning models
We will give a broad overview of recent developments in using deterministic and stochastic second-order information to speed up optimization methods for problems arising in machine learning. Specifically, we will show how such methods tend to perform well in the convex setting but often fail to improve over simple methods, such as stochastic gradient descent, when applied to large-scale nonconvex deep learning models. We will discuss the difficulties faced by quasi-Newton methods that rely on stochastic first-order information and by Hessian-free methods that use stochastic second-order information. We will then give an overview of some recent theoretical results for optimization methods based on stochastic information.
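To make the contrast concrete, the following is a minimal sketch (not taken from the talk) comparing a plain stochastic gradient descent step with a generic Hessian-free Newton-CG step that uses only mini-batch Hessian-vector products, illustrated on logistic regression. All names and parameters here (batch size, number of CG iterations, damping, step size) are illustrative assumptions, not the speaker's algorithm.

```python
# Sketch: SGD vs. a Hessian-free (Newton-CG) step on mini-batch logistic regression.
# Hypothetical example; batch size, CG iterations, damping and step size are assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_grad(w, X, y):
    # Gradient of the average logistic loss over a mini-batch (labels in {-1, +1}).
    margins = y * (X @ w)
    return -(X.T @ (y * sigmoid(-margins))) / len(y)

def hessian_vector_product(w, X, y, v, damping=1e-3):
    # Exact mini-batch Hessian-vector product: (X^T D X v) / n, plus damping for stability.
    p = sigmoid(X @ w)
    d = p * (1.0 - p)
    return (X.T @ (d * (X @ v))) / len(y) + damping * v

def conjugate_gradient(hvp, b, max_iter=10, tol=1e-8):
    # Approximately solve H x = b using only Hessian-vector products (no explicit Hessian).
    x = np.zeros_like(b)
    r = b.copy()          # residual b - H x (x starts at 0)
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

w_sgd = np.zeros(d)
w_hf = np.zeros(d)
for step in range(50):
    idx = rng.choice(n, size=128, replace=False)   # sample a mini-batch
    Xb, yb = X[idx], y[idx]

    # Plain SGD step: first-order stochastic information only.
    w_sgd -= 0.5 * batch_grad(w_sgd, Xb, yb)

    # Hessian-free Newton-CG step: solve H p ~ -g with a few CG iterations.
    g = batch_grad(w_hf, Xb, yb)
    p = conjugate_gradient(lambda v: hessian_vector_product(w_hf, Xb, yb, v), -g)
    w_hf += p

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("SGD alignment with w_true:", cos(w_sgd, w_true))
print("HF  alignment with w_true:", cos(w_hf, w_true))
```

The point of the sketch is that the second-order method never forms the Hessian explicitly; it only needs Hessian-vector products on a mini-batch, which is what makes Hessian-free approaches feasible at scale, while the noise in those mini-batch products is one source of the difficulties discussed in the talk.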
Speaker bio
Katya Scheinberg is the Harvey E. Wagner Endowed Chair Professor in the Industrial and Systems Engineering Department at Lehigh University.
She attended Moscow University for her undergraduate studies in applied mathematics and then moved to New York, where she received her PhD in operations research from Columbia University. After receiving her doctoral degree, she worked at the IBM T.J. Watson Research Center as a research staff member for over a decade before joining Lehigh in 2010. In 2016–2017 Katya is on sabbatical leave, visiting Google Research in New York and the University of Oxford.
Katya’s main research areas involve developing practical algorithms (and their theoretical analysis) for various problems in continuous optimization, including convex optimization, derivative-free optimization, machine learning, and quadratic programming. She has been focusing on large-scale optimization methods for big data and machine learning applications since 2000. In 2015, jointly with Andy Conn and Luis Vicente, she received the Lagrange Prize, awarded jointly by SIAM and the Mathematical Optimization Society (MOS).