Why Multi-Objective Reinforcement Learning?
Diederik Roijers, Shimon Whiteson, Peter Vamplew and Richard Dazeley
We argue that multi-objective methods are underrepresented in RL research, and present three scenarios to justify the need for explicitly multi-objective approaches. Key to these scenarios is that although the utility the user derives from a policy — which is what we ultimately aim to optimize — is scalar, it is sometimes impossible, undesirable, or infeasible to formulate the problem as single-objective at the time when policies must be learned. We also present the case for a utility-based view of multi-objective RL, i.e., that the appropriate multi-objective solution concept should be derived from what we know about the user's utility function, rather than axiomatically assumed to be the Pareto front.