Pareto Local Policy Search for MOMDP Planning
Chiel Kooijman‚ Maarten de Waard‚ Maarten Inja‚ Diederik Roijers and Shimon Whiteson
Standard single-objective methods such as value iteration are not applicable to multi-objective Markov decision processes (MOMDPs) because they depend on a maximization, which is not defined if the rewards are multi-dimensional. As a result, special multi-objective algorithms are needed to find a set of policies that contains all optimal trade-offs between objectives, i.e. a set of Pareto optimal policies. In this paper, we propose Pareto local policy search (PLoPS), a new planning method for MOMDPs based on Pareto local search (PLS). This method produces a good set of policies by iteratively scanning the neighbourhood of locally non-dominated policies for improvements. It is fast because neighbouring policies can be quickly identified as improvements, and their values can be computed incrementally. We test the performance of PLoPS on several MOMDP benchmarks, and compare it to popular decision-theoretic and evolutionary alternatives. The results show that PLoPS outperforms the alternatives.