Efficient Abstraction Selection in Reinforcement Learning
Harm van Seijen‚ Shimon Whiteson and Leon Kester
This article addresses reinforcement learning problems based on factored Markov decision processes (MDPs) in which the agent must choose among a set of candidate abstractions, each build up from a different combination of state components. We present and evaluate a new approach that can perform effective abstraction selection that is more resource-efficient and/or more general than existing approaches. The core of the approach is to make selection of an abstraction part of the learning agent's decision-making process by augmenting the agent's action space with internal actions that select the abstraction it uses. We prove that under certain conditions this approach results in a derived MDP whose solution yields both the optimal abstraction for the original MDP and the optimal policy under that abstraction. We examine our approach in three domains of increasing complexity: contextual bandit problems, episodic MDPs, and general MDPs with context-specific structure.