MergeRUCB: A Method for Large-Scale Online Ranker Evaluation
Masrour Zoghi, Shimon Whiteson and Maarten de Rijke
A key challenge in information retrieval is that of online ranker evaluation: determining which one of a finite set of rankers performs best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandit problem. In the setting of web search, the number of rankers under consideration may be large. Scaling effectively in the presence of so many rankers is a key challenge not adequately addressed by existing algorithms. We propose a new method, which we call mergeRUCB, that uses ``localized'' comparisons to provide the first provably scalable K-armed dueling bandit algorithm. Empirical comparisons on several large learning-to-rank datasets show that mergeRUCB can substantially outperform state-of-the-art K-armed dueling bandit algorithms when many rankers must be compared. Moreover, we provide theoretical guarantees demonstrating the soundness of our algorithm.
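To make the regret formulation concrete, the following minimal sketch computes cumulative regret for a sequence of pairwise comparisons. It is not mergeRUCB itself. It rests on assumptions that the abstract does not state: that a Condorcet winner exists (one ranker beats every other with probability above 0.5), and that the regret of comparing rankers i and j is the average of their gaps to that winner, a definition commonly used in the dueling bandit literature. The preference matrix `P` and all function names are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

def condorcet_winner(P: List[List[float]]) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 0.5, if one exists."""
    k = len(P)
    for i in range(k):
        if all(P[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None

def duel_regret(P: List[List[float]], i: int, j: int) -> float:
    """Regret of comparing arms i and j: the average of their gaps to the winner.

    Assumed definition: ((P[best][i] - 0.5) + (P[best][j] - 0.5)) / 2.
    """
    best = condorcet_winner(P)
    if best is None:
        raise ValueError("preference matrix has no Condorcet winner")
    return ((P[best][i] - 0.5) + (P[best][j] - 0.5)) / 2

def cumulative_regret(P: List[List[float]], duels: Sequence[Tuple[int, int]]) -> float:
    """Total regret accumulated over a sequence of (i, j) comparisons."""
    return sum(duel_regret(P, i, j) for i, j in duels)

# Hypothetical preference matrix: P[i][j] is the probability that
# ranker i wins an interleaved comparison against ranker j.
P = [
    [0.50, 0.60, 0.70],
    [0.40, 0.50, 0.55],
    [0.30, 0.45, 0.50],
]

if __name__ == "__main__":
    print(condorcet_winner(P))
    print(cumulative_regret(P, [(1, 2), (0, 1), (0, 0)]))
```

Under this definition, comparing the best ranker against itself incurs zero regret, so an algorithm that stops comparing suboptimal rankers early accumulates less total regret.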