Constrained Reinforcement Learning via Self-Supervision
Despite the recent advances and successes of reinforcement learning methods, trial-and-error-based approaches have largely been limited to toy settings (e.g. video/board games). In safety-critical domains (e.g. self-driving vehicles, domestic robots), autonomous systems are trained against a simulator, since mistakes in the real world cannot be tolerated. Nonetheless, simulators can hardly capture the full complexity of the problem, and hence the trained machines fail when exposed to novel settings. Consequently, we would like those systems to be able to learn online, in the real world, without acting catastrophically, respecting pre-defined constraints. Self-supervised (a.k.a. unsupervised) reinforcement learning studies sequential decision-making without external reward functions, driven only by intrinsically-motivated utilities. The experiments will be based on the recently proposed OpenAI Safety Gym [2], and a suite of unsupervised reinforcement learning methods [3, 4, 5] will be implemented and benchmarked in this framework.
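To make the "intrinsically-motivated utilities" concrete: one of the methods to be benchmarked, DIAYN ("Diversity is All You Need"), replaces the external reward with an intrinsic one, r(s, z) = log q(z | s) - log p(z), computed from a learned skill discriminator q and a fixed skill prior p(z). The sketch below is illustrative only; the discriminator posterior is a made-up example, not output from a trained model.

```python
import numpy as np

# DIAYN-style intrinsic reward: r(s, z) = log q(z | s) - log p(z).
# q(z | s) is normally a learned discriminator; here it is a made-up
# posterior for a single state, purely for illustration.

n_skills = 4
p_z = 1.0 / n_skills                       # uniform prior over skills

q_z_given_s = np.array([0.7, 0.1, 0.1, 0.1])  # pretend discriminator output

def diayn_reward(q, z, prior=p_z):
    """Intrinsic reward for using skill z in a state with posterior q."""
    return np.log(q[z]) - np.log(prior)

# A skill the discriminator can identify (z = 0) earns a positive reward;
# an indistinguishable skill (z = 1) earns a negative one.
r_good = diayn_reward(q_z_given_s, 0)
r_bad = diayn_reward(q_z_given_s, 1)
```

The sign of the reward is what drives diversity: skills are rewarded exactly when visiting states from which the discriminator can recover the skill label, pushing different skills toward distinguishable behaviour without any external reward.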
Requirements: constrained optimisation, experience with deep learning frameworks (e.g. TensorFlow), reinforcement learning.
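The constrained-optimisation requirement typically enters through a Lagrangian relaxation: maximise expected reward subject to an expected-cost budget by alternating policy updates with dual ascent on a multiplier. A minimal toy sketch under assumed illustrative numbers (a three-action softmax "policy", not a Safety Gym agent):

```python
import numpy as np

# Toy sketch of Lagrangian-relaxed constrained optimisation:
# maximise E[r] subject to E[c] <= d, via primal-dual updates.
# The rewards, costs, and budget below are illustrative assumptions.

rewards = np.array([1.0, 2.0, 3.0])   # expected reward of each action
costs = np.array([0.1, 0.5, 1.5])     # expected constraint cost of each action
d = 1.0                               # cost budget

logits, lam = np.zeros(3), 0.0        # policy parameters, Lagrange multiplier
lr_pi, lr_lam = 0.5, 0.05
avg_pi = np.zeros(3)

for _ in range(2000):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    # softmax policy-gradient step on L(pi, lam) = E[r] - lam * (E[c] - d)
    adv = rewards - lam * costs
    logits += lr_pi * pi * (adv - pi @ adv)
    # dual ascent: raise lambda while the constraint is violated
    lam = max(0.0, lam + lr_lam * (pi @ costs - d))
    avg_pi += pi

avg_pi /= 2000
# the averaged iterate is (near-)feasible: E[c] close to the budget d = 1.0
```

The averaged policy ends up mixing the high-reward, high-cost action with a cheaper one until the expected cost sits at the budget, which is the behaviour a Lagrangian safe-RL method (e.g. as benchmarked in Safety Gym) aims for at scale.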
[1] Generalizing from a Few Environments in Safety-Critical Reinforcement Learning. Zachary Kenton, Angelos Filos, Owain Evans, Yarin Gal.
[2] Benchmarking Safe Exploration in Deep Reinforcement Learning. Alex Ray, Joshua Achiam, Dario Amodei.
[3] Unsupervised Meta-Learning for Reinforcement Learning. Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, Sergey Levine.
[4] Reinforcement Learning with Unsupervised Auxiliary Tasks. Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu.
[5] Diversity is All You Need: Learning Skills without a Reward Function. Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine.