Kernel Methods for Large Scale Representation Learning
Kernel methods hold great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. In this talk, I introduce expressive kernel methods that are simultaneously flexible and scalable. I show how my approach can be used to profoundly improve the performance of kernel machines in general settings, and to enable feature extraction and extrapolation on large-scale multidimensional patterns -- e.g., time series extrapolation, image inpainting, video extrapolation, and long-range spatiotemporal forecasting.
In particular, I introduce expressive kernels derived by modelling spectral densities with scale-location Gaussian mixtures. Scaling up such expressive kernel learning approaches poses different challenges than scaling standard kernel machines: one faces additional computational constraints, as well as the need to retain significant model structure for expressing the rich information available in a large dataset. I derive, extend, and argue for structure-exploiting approaches to scalability, such as Kronecker methods, which distinctly enable large-scale pattern extrapolation with kernel machines such as Gaussian processes. I also argue for the importance of fully nonparametric probabilistic approaches to pattern discovery and extrapolation. This work is intended to help unify efforts in simultaneously enhancing the flexibility and scalability of kernel methods. In a sense, flexibility and scalability are one and the same problem: we want the most expressive methods for the biggest datasets.
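As a rough illustration of the two ideas in the abstract, the sketch below builds a one-dimensional kernel from a Gaussian mixture over the spectral density and then uses Kronecker structure to multiply by a grid covariance without forming it. It is a minimal sketch, not the speaker's implementation. It assumes the standard spectral mixture form k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q), inputs on a two-dimensional Cartesian grid, and a kernel that factorizes as a product across the grid dimensions. The function names and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """1-D kernel whose spectral density is a symmetrized Gaussian mixture.

    weights, means, and variances are the mixture weights, frequency means,
    and frequency variances of the Gaussian components.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    tau = x1[:, None] - x2[None, :]  # pairwise input differences
    k = np.zeros_like(tau)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k


def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without building the Kronecker product.

    Uses the identity (A kron B) vec(X) = vec(A X B^T) for row-major vec.
    """
    X = x.reshape(A.shape[1], B.shape[1])
    return (A @ X @ B.T).reshape(-1)


# Hypothetical hyperparameters, for illustration only.
weights = [1.0, 0.5]
means = [0.1, 0.35]
variances = [0.01, 0.02]

# One kernel matrix per grid dimension (assumed product kernel on a grid).
g1 = np.linspace(0.0, 10.0, 30)
g2 = np.linspace(0.0, 5.0, 20)
K1 = spectral_mixture_kernel(g1, g1, weights, means, variances)
K2 = spectral_mixture_kernel(g2, g2, weights, means, variances)

# Compare the structured product against the dense 600 x 600 matrix.
rng = np.random.default_rng(0)
x = rng.standard_normal(g1.size * g2.size)
fast = kron_matvec(K1, K2, x)
dense = np.kron(K1, K2) @ x
print(np.allclose(fast, dense))
```

The structured product only ever touches the 30 x 30 and 20 x 20 factors, which is the kind of saving that makes grid-structured problems such as images and video tractable; the dense matrix is built here only to check the result.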
1) Andrew Gordon Wilson. Covariance Kernels for Fast Automatic Pattern Discovery and Extrapolation with Gaussian Processes. PhD Thesis, University of Cambridge. January 2014.
2) Andrew Gordon Wilson, Elad Gilboa, Arye Nehorai, and John P. Cunningham. Fast Kernel Learning for Multidimensional Pattern Extrapolation. To appear in NIPS 2014.
3) Andrew Gordon Wilson. A Process over All Stationary Kernels. Technical Report, University of Cambridge. June 2012.
4) Andrew Gordon Wilson and Ryan P. Adams. Gaussian Process Kernels for Pattern Discovery and Extrapolation. International Conference on Machine Learning, 2013.
Andrew Gordon Wilson is a postdoctoral research fellow in the Machine Learning Department at Carnegie Mellon University. He graduated with a PhD from the University of Cambridge in 2014, under the supervision of Professors Zoubin Ghahramani and Carl Edward Rasmussen. He has won best dissertation, best paper, and best reviewer awards, and has given a number of invited talks, most recently at MLSS 2014 in Pittsburgh. Andrew has reviewed for top machine learning and statistics conferences and journals, including NIPS, ICML, AISTATS, IJCAI, JMLR, Biometrika, The Electronic Journal of Statistics, IEEE TPAMI, and IEEE Transactions on Neural Networks. Andrew specializes in building scalable and expressive kernel learning methods, particularly for automatic pattern discovery and extrapolation. His work often involves Bayesian nonparametrics and Gaussian processes, and has applications in econometrics, geostatistics, gene expression, nuclear magnetic resonance spectroscopy, multi-task learning, image and video inpainting and extrapolation, time series extrapolation, and audio modelling. A website with a list of publications can be found at: http://www.cs.cmu.edu/~andrewgw