Hierarchical bandits for decision making and planning
The "optimism in the face of uncertainty" principle has recently been applied to complex sequential decision-making and planning problems, such as the game of Go. The idea is to use regret-minimization algorithms (so-called bandit algorithms) to efficiently explore the set of possible strategies given noisy evaluations of their performance. In this talk I will review some theoretical results obtained with colleagues, including the Hierarchical Optimistic Optimization algorithm and other related hierarchical bandit algorithms.
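To illustrate the optimism principle the abstract refers to, here is a minimal sketch of a classical optimistic bandit rule (UCB1-style index), not the hierarchical algorithm of the talk: each arm is scored by its empirical mean plus an exploration bonus, and the arm with the highest optimistic score is played. The arm means, exploration constant, and horizon below are hypothetical choices for the example.

```python
import math
import random

def ucb1(means, horizon, c=2.0, seed=0):
    """UCB1-style bandit: play each arm once, then repeatedly pull the
    arm maximizing (empirical mean + sqrt(c * ln t / pulls)).
    Rewards are simulated as Bernoulli with the given (hypothetical) means."""
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms       # number of times each arm was played
    totals = [0.0] * n_arms    # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # optimistic index = empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: totals[a] / pulls[a]
                                    + math.sqrt(c * math.log(t) / pulls[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0  # noisy evaluation
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.8], horizon=2000)
print(pulls)  # most pulls should concentrate on the best arm (index 2)
```

Hierarchical variants such as HOO apply the same optimistic scoring over a tree of nested regions of the strategy space rather than a flat set of arms.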
Speaker bio
Dr. Munos's research interests cover reinforcement learning, multi-armed bandits, and dynamic programming. He was also a contributor to the Go-playing program MoGo, which uses Monte-Carlo Tree Search with patterns in the simulations and improvements to UCT.
Since 2006, Dr. Munos has been a senior researcher at INRIA Lille, France, in the SequeL (Sequential Learning) team. From 2000 to 2006, he was an assistant and then associate professor at Ecole Polytechnique in Paris, and from 1998 to 2000 he was a postdoc at Carnegie Mellon University, Pittsburgh. Dr. Munos received his PhD in Cognitive Science from Ecole des Hautes Etudes en Sciences Sociales in 1997.