Utile Context Tree Weighting

João Messias and Shimon Whiteson

Abstract

Reinforcement learning (RL) in partially observable settings is challenging because the agent's immediate observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes utile context tree weighting (UCTW), a model-learning method that addresses these limitations. UCTW dynamically expands a suffix tree while ensuring that the total size of the model, but not its depth, remains bounded. We show that UCTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply UCTW to model-based RL, showing that, on tasks that require memory of past observations, UCTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend.
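The core mechanism the abstract describes can be illustrated with a minimal sketch: a suffix (context) tree over observation histories that expands on demand while capping the total number of nodes, but not the tree's depth. The class names, the count-based statistics, and the simple node-budget rule below are illustrative assumptions for exposition only; they are not the paper's actual UCTW algorithm, which additionally uses context tree weighting and utile-distinction tests.

```python
# Illustrative sketch (assumed design, not the paper's UCTW): a suffix
# tree over observation histories whose TOTAL size is bounded, while its
# depth is not.

class SuffixNode:
    def __init__(self):
        self.children = {}   # previous observation -> child SuffixNode
        self.counts = {}     # next-symbol frequency counts at this context

class BoundedSuffixTree:
    def __init__(self, max_nodes=100):
        self.root = SuffixNode()
        self.size = 1
        self.max_nodes = max_nodes   # bound on total size, not depth

    def update(self, history, symbol):
        """Record that `symbol` followed `history`, deepening the
        matching suffix path only while the node budget allows."""
        node = self.root
        node.counts[symbol] = node.counts.get(symbol, 0) + 1
        for obs in reversed(history):        # walk most-recent-first
            if obs not in node.children:
                if self.size >= self.max_nodes:
                    break                    # budget exhausted: stop deepening
                node.children[obs] = SuffixNode()
                self.size += 1
            node = node.children[obs]
            node.counts[symbol] = node.counts.get(symbol, 0) + 1

    def predict(self, history):
        """Return the next-symbol counts stored at the deepest
        suffix of `history` present in the tree."""
        node = self.root
        for obs in reversed(history):
            if obs not in node.children:
                break
            node = node.children[obs]
        return node.counts
```

Because the bound applies to the node count rather than the depth, the tree can grow deep along suffixes that matter for prediction while staying within a fixed memory budget overall.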

Book Title
NIPS 2017: Proceedings of the Thirty-First Annual Conference on Neural Information Processing Systems
Month
December
Note
To appear.
Year
2017