Utile Context Tree Weighting

João Messias and Shimon Whiteson

Abstract

Reinforcement learning (RL) in partially observable settings is challenging because the agent's immediate observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes utile context tree weighting (UCTW), a model-learning method that addresses these limitations. UCTW dynamically expands a suffix tree while ensuring that the total size of the model, but not its depth, remains bounded. We show that UCTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply UCTW to model-based RL, showing that, on tasks that require memory of past observations, UCTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend.
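The core mechanism the abstract describes can be illustrated with a minimal sketch: a suffix (context) tree over observation histories that expands on demand while capping the total number of nodes, but not the tree's depth. The class names, the count-based statistics, and the simple node-budget rule below are illustrative assumptions for exposition only; they are not the paper's actual UCTW algorithm, which additionally uses context tree weighting and utile-distinction tests.

```python
# Illustrative sketch (assumed design, not the paper's UCTW): a suffix
# tree over observation histories whose TOTAL size is bounded, while its
# depth is not.

class SuffixNode:
    def __init__(self):
        self.children = {}   # previous observation -> child SuffixNode
        self.counts = {}     # next-symbol frequency counts at this context

class BoundedSuffixTree:
    def __init__(self, max_nodes=100):
        self.root = SuffixNode()
        self.size = 1
        self.max_nodes = max_nodes   # bound on total size, not depth

    def update(self, history, symbol):
        """Record that `symbol` followed `history`, deepening the
        matching suffix path only while the node budget allows."""
        node = self.root
        node.counts[symbol] = node.counts.get(symbol, 0) + 1
        for obs in reversed(history):        # walk most-recent-first
            if obs not in node.children:
                if self.size >= self.max_nodes:
                    break                    # budget exhausted: stop deepening
                node.children[obs] = SuffixNode()
                self.size += 1
            node = node.children[obs]
            node.counts[symbol] = node.counts.get(symbol, 0) + 1

    def predict(self, history):
        """Return the next-symbol counts stored at the deepest
        suffix of `history` present in the tree."""
        node = self.root
        for obs in reversed(history):
            if obs not in node.children:
                break
            node = node.children[obs]
        return node.counts
```

Because the bound applies to the node count rather than the depth, the tree can grow deep along suffixes that matter for prediction while staying within a fixed memory budget overall.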

Book Title
NIPS 2017: Proceedings of the Thirty-First Annual Conference on Neural Information Processing Systems
Month
December
Note
To appear.
Year
2017