Distributed, Real-Time Bayesian Learning in Online Services
The last ten years have seen tremendous growth in Internet-based online services such as search, advertising, gaming and social networking. Traditionally, offline analysis of large collections of user interaction data has informed the building of predictive models for these services; today, adjusting and improving these models in real time has become just as important.
One of the biggest challenges in this setting is scale: the sheer volume of data necessitates not only parallel processing but also distributed models. With over 900 million active users at Facebook, any user-specific set of features in a linear or non-linear model yields a model too large to be stored on a single physical computer.
In this talk, I will give a hands-on introduction to one of the most versatile tools for handling large collections of data with distributed probabilistic models: the sum-product algorithm for approximate message passing in factor graphs. I will discuss the application of this algorithm to the specific case of generalised linear models and outline the challenges of both approximate and distributed message passing, including an in-depth discussion of expectation propagation and its relation to MapReduce, a related technique for dealing with big data and distributed learning. The talk will include experimental findings from running such systems at the scale of Facebook.
Speaker bio
Ralf works at Facebook on large-scale, distributed learning and prediction as web services/infrastructure. Before joining Facebook, he headed the Bing Personalization team, which focused on prototyping and enabling personalized experiences across Microsoft's Online Services Division. Prior to his work on Bing, Ralf was Director of Microsoft's Future Social Experiences (FUSE) Labs UK, working on new social experiences powered by computational intelligence technologies on large online data collections. Ralf joined Microsoft Research in 2000 as a Postdoctoral Researcher and Research Fellow of Darwin College, Cambridge. During his time at Microsoft Research, Ralf worked in the areas of machine learning, information retrieval, game theory, artificial intelligence, optimization and social network analysis. Prior to joining Microsoft, Ralf worked as a teaching assistant at the Technical University Berlin, where he obtained both a Diploma degree in Computer Science and a Ph.D. degree in Statistics.
Ralf's research interests include Bayesian inference and decision making, computer games, kernel methods, statistical learning theory and distributed systems. Ralf is one of the inventors of the Drivatars system used in the Forza Motorsport series as well as the TrueSkill ranking and matchmaking system in Xbox 360 Live. He also co-invented the click-prediction technology used in Bing's online advertising system.