Unifying task specification in reinforcement learning
Martha White (University of Alberta)
Markov decision processes have long been the standard formalism for sequential decision-making in reinforcement learning. This is not the full story, however: in practice there are specialized problem classes that require separate treatment, most notably episodic and continuing problems. In this talk, I will discuss a generalization of the discount that enables a more unified formalism for these settings. I will discuss the advantages of this generalization, both for specifying a broader class of policy evaluation questions and for unifying the theoretical treatment of these different settings.
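As a rough illustration of the idea (a minimal sketch, not material from the talk itself): one way to generalize the discount is to make it a per-transition quantity rather than a fixed scalar. Episodic tasks are then recovered by setting the discount to zero on transitions into terminal states, and continuing tasks by using a constant discount everywhere, so both fit a single return definition. The function name and values below are illustrative assumptions.

```python
def compute_return(rewards, discounts):
    """Compute the return G_0 = r_1 + gamma_1 * (r_2 + gamma_2 * (...)),
    where each gamma_t is a transition-dependent discount rather than a
    single fixed constant."""
    g = 0.0
    # Accumulate backwards using the recursion G_t = r_{t+1} + gamma_{t+1} * G_{t+1}.
    for r, gamma in zip(reversed(rewards), reversed(discounts)):
        g = r + gamma * g
    return g

# Episodic view: the third transition terminates an episode (discount 0),
# so rewards beyond it do not contribute to the return from time 0.
episodic = compute_return([1.0] * 5, [0.9, 0.9, 0.0, 0.9, 0.9])

# Continuing view: a constant discount on every transition.
continuing = compute_return([1.0] * 5, [0.9] * 5)
```

Here `episodic` evaluates to 1 + 0.9 + 0.81 = 2.71, while `continuing` keeps accumulating discounted reward past the third step; the same return computation covers both cases, with the task specification pushed into the discount sequence.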