
Multi-Objective Reinforcement Learning

Diederik Roijers (Vrije Universiteit Brussel)

Most decision problems have more than one objective, and are thus most
naturally expressed using a vector-valued reward signal. While it might
seem tempting to define a single scalar reward function for these problems
anyway by combining the different objectives in some way a priori, doing so
can be error-prone, and forces human decision-makers to define
(hypothetical) preferences before they can look at the actually available
alternatives. Instead, we can apply multi-objective planning or learning to
obtain a set of possibly optimal policies, together with their values per
objective, and ask humans to state their preferences among these actual
alternatives. This typically leads to a better-informed decision, and is
therefore highly preferable.
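
To make the contrast concrete, here is a minimal Python sketch (not from the
talk itself) comparing a priori scalarisation with returning the set of
possibly optimal policies. The value vectors and the helper functions
a_priori_scalarisation and pareto_front are illustrative assumptions, not an
actual algorithm from the speaker's work.

```python
import numpy as np

# Illustrative per-objective values of a few candidate policies,
# e.g. (objective 1, objective 2), where higher is better in both.
policy_values = np.array([
    [1.0, 9.0],
    [4.0, 7.0],
    [6.0, 6.0],
    [8.0, 3.0],
    [5.0, 4.0],   # dominated by [6.0, 6.0] in both objectives
])

def a_priori_scalarisation(values, weights):
    """Collapse objectives up front with fixed weights: only one policy survives."""
    scores = values @ np.asarray(weights)
    return int(np.argmax(scores))

def pareto_front(values):
    """Keep every policy that no other policy dominates in all objectives."""
    keep = []
    for i, v in enumerate(values):
        dominated = any(
            np.all(w >= v) and np.any(w > v)
            for j, w in enumerate(values) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Scalarising a priori commits to preferences before seeing the alternatives...
print("single policy for weights (0.5, 0.5):",
      a_priori_scalarisation(policy_values, [0.5, 0.5]))

# ...whereas the multi-objective approach returns all possibly optimal
# policies, so the decision-maker can choose among actual alternatives.
print("Pareto-optimal policies:", pareto_front(policy_values))
```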

In this talk, we focus on an important question within the realm of
multi-objective decision making: "What if there are infinitely many
possibly optimal policies? How can users select optimal policies from an
infinite set?" To answer this, we make use of Gaussian Processes, and show
how they can be used exploit the properties of multi-objective decision
making. Then, if time permits, we will go into what I consider to be some
important open problems, such as "Is it still possible for agents to
cooperate if they receive the same vector-valued (i.e., multi-objective)
rewards, but may differ in their preferences?"
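
The abstract does not spell out the algorithm, but the general idea of using
a Gaussian Process as a surrogate for an unknown user utility over a
continuum of possibly optimal policies can be sketched as follows. This is
only an assumed, simplified illustration: the policy family policy_value,
the simulated hidden_user_utility, and the upper-confidence-bound query rule
are stand-ins, not the method presented in the talk.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# A continuous family of possibly optimal policies, parameterised by t in
# [0, 1]; each t yields a value vector on a concave trade-off curve.
def policy_value(t):
    return np.array([np.sin(0.5 * np.pi * t), np.cos(0.5 * np.pi * t)])

# The user's true utility is unknown to the agent; this stand-in is used
# only to simulate the user's answers to queries.
def hidden_user_utility(v):
    return 0.7 * v[0] + 0.3 * v[1]

# Query the user on a few concrete alternatives, fit a GP surrogate of their
# utility, and pick the next query where the posterior looks most promising.
queried_t, observed_u = [], []
candidates = np.linspace(0.0, 1.0, 201)

for step in range(6):
    if step < 2:
        t_next = rng.uniform()  # a couple of initial random queries
    else:
        values = np.array([policy_value(t) for t in candidates])
        mean, std = gp.predict(values, return_std=True)
        t_next = candidates[np.argmax(mean + 1.0 * std)]  # UCB-style choice
    queried_t.append(t_next)
    observed_u.append(hidden_user_utility(policy_value(t_next))
                      + rng.normal(scale=0.01))

    X = np.array([policy_value(t) for t in queried_t])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
    gp.fit(X, np.array(observed_u))

best_t = queried_t[int(np.argmax(observed_u))]
print("best queried policy parameter:", round(best_t, 3),
      "value vector:", np.round(policy_value(best_t), 3))
```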
