The Quest to Automate Everything with Bayesian Optimization
Big data systems are not only vast, but also extremely complex and dynamic. Automation is a core need in these systems, and the problems that motivate it are many and varied. Big data products often integrate many software components developed by different teams, each with its own parameters and choices. How do we find the optimal set of parameters for the entire system? Likewise, the information extraction architectures and machine learning techniques used to translate big data into knowledge and applications also have many free meta-parameters. Setting these parameters is a time-consuming task for experts, and these experts are often unavailable. Given a dataset, how can we automatically recommend the best machine learning technique to a non-expert? That is, how do we build automatic machine learning solutions that enable non-experts to model data, extract knowledge, and make predictions and informed decisions? In addition, in systems with millions of users, how do we personalize the content delivered to them so as to learn their preferences, maximize revenue, or simply provide better personalized services? More generally, how do we automatically and simultaneously configure software and hardware platforms for more efficient data processing?
Although these problems may superficially seem very different, I will argue that most of them can be cast within the growing framework of Bayesian optimization. I will also argue that for Bayesian optimization to be feasible in practice, it is essential to extend the technique to high dimensions. The talk will provide an overview of some automation challenges, discuss Bayesian optimization at an introductory level, and present a solution for attacking high-dimensional problems using random embeddings.
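To make the final idea concrete, here is a minimal sketch of Bayesian optimization through a random embedding. It assumes (hypothetically) a 100-dimensional objective with low effective dimensionality, a random linear map from a 2-dimensional search space into the full space, a simple Gaussian-process surrogate with an RBF kernel, and an upper-confidence-bound acquisition rule; all names, constants, and the test function are illustrative, not from the talk.

```python
import numpy as np

# Hypothetical high-dimensional objective: 100 inputs, but only
# coordinates 3 and 7 actually matter (low effective dimensionality).
# Its maximum value is 0, attained at x[3] = 0.2, x[7] = -0.1.
D, d = 100, 2
def f(x):
    return -((x[3] - 0.2) ** 2 + (x[7] + 0.1) ** 2)

rng = np.random.default_rng(0)
A = rng.normal(size=(D, d))          # random embedding: R^d -> R^D

def g(y):
    # Optimize f through the embedding; clip to keep x in the box [-1, 1]^D.
    return f(np.clip(A @ y, -1.0, 1.0))

# Minimal Gaussian-process surrogate: RBF kernel over the low-dim space.
def k(Y1, Y2, ls=0.5):
    d2 = ((Y1[:, None, :] - Y2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

Y = rng.uniform(-1, 1, size=(5, d))          # initial design points
z = np.array([g(y) for y in Y])              # their observed values

for _ in range(20):
    K_inv = np.linalg.inv(k(Y, Y) + 1e-6 * np.eye(len(Y)))
    cand = rng.uniform(-1, 1, size=(500, d))  # random candidate pool
    Ks = k(cand, Y)
    mu = Ks @ K_inv @ z                       # GP posterior mean
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, K_inv, Ks)
    ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))  # acquisition function
    y_next = cand[np.argmax(ucb)]             # query the most promising point
    Y = np.vstack([Y, y_next])
    z = np.append(z, g(y_next))

print(z.max())   # best value found; the true optimum is 0
```

The key point of the sketch is that the surrogate model and the acquisition search both live in the 2-dimensional embedded space, so the cost of Bayesian optimization does not grow with the 100 ambient dimensions.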