Deep Reinforcement Learning

Reinforcement learning is a field of machine learning in which an agent learns to perform tasks by trial and error, receiving feedback in the form of reward signals. Solving such tasks involves dealing with high-dimensional state and action spaces, sparse reward signals, and uncertainty in the agent’s observations. In recent years, much of the success in scaling reinforcement learning up to more complex tasks has come from leveraging deep neural networks, giving rise to the term deep reinforcement learning.
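To make the trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm (the page does not name a specific one). It runs on a hypothetical six-cell corridor where the only reward arrives at the far end, so the reward signal is sparse. The environment, the `step` and `q_learning` helpers, and all hyperparameters are illustrative assumptions. A deep reinforcement learning agent would replace the lookup table `q` with a neural network.

```python
import random

# Hypothetical toy task: a corridor of N_STATES cells. The agent starts in cell 0
# and receives a reward of 1 only on reaching the last cell (a sparse reward).
N_STATES = 6
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the discounted return of taking action a in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore (trial) with probability epsilon, else exploit,
            # breaking ties between equally valued actions at random.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Temporal-difference update: move the estimate toward the observed
            # reward plus the discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
# Greedy action in every non-terminal cell after training.
greedy_policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
                 for s in range(N_STATES - 1)]
print(greedy_policy)  # expected: [1, 1, 1, 1, 1]
```

After training, the greedy policy moves right in every non-terminal cell: the single reward at the goal has propagated back through the temporal-difference updates, even though the agent was never told which direction to go.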

Research

Some of the main areas of deep reinforcement learning research that we focus on at the University of Oxford are:

Multi-Agent Deep Reinforcement Learning: Multi-agent systems naturally model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. We are developing new algorithms that enable teams of cooperating agents to learn control policies for solving complex tasks, including techniques for learning to communicate and for stabilising experience replay in multi-agent settings.
Deep Model-Based Reinforcement Learning: In model-based reinforcement learning, the agent learns a model of its environment and uses this model to learn optimal behaviour more efficiently, instead of learning that behaviour directly. We are developing methods for model-based reinforcement learning with deep neural networks, using techniques such as look-ahead tree planning.
Robust Reinforcement Learning: We are developing new reinforcement learning methods that are robust to significant rare events, i.e., events that occur with low probability but nonetheless have a large effect on expected performance. For example, certain rare wind conditions may increase the risk of crashing a helicopter. Because crashes are so catastrophic, avoiding them is key to maximising expected performance, even though the wind conditions that contribute to a crash occur only rarely.
Active Perception: We are developing decision-theoretic methods for helping perception systems, such as multi-camera tracking systems, to make efficient use of scarce resources such as computation and bandwidth. By exploiting submodularity, we can efficiently determine which subset of cameras to use, or which subset of pixel boxes in an image to process, so as to maximise metrics such as information gain and expected coverage.
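The model-based idea above (learn a model, then plan with it) can be sketched in a few lines. This is a simplified illustration, not the group's method: the learned model here is a lookup table built from observed transitions rather than a deep neural network, and planning is exhaustive depth-limited search, a basic form of look-ahead tree planning. The class `TabularModel`, the function `plan`, the states, and the rewards are all hypothetical.

```python
from collections import defaultdict


class TabularModel:
    """Hypothetical learned model: stores the observed next state and the mean
    observed reward for each (state, action) pair. Assumes deterministic dynamics."""

    def __init__(self):
        self.next_state = {}
        self.reward_sum = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, s, a, r, s_next):
        # Learn the model from one observed transition.
        self.next_state[(s, a)] = s_next
        self.reward_sum[(s, a)] += r
        self.count[(s, a)] += 1

    def predict(self, s, a):
        # Return (next_state, mean_reward), or None if (s, a) was never observed.
        if (s, a) not in self.next_state:
            return None
        return self.next_state[(s, a)], self.reward_sum[(s, a)] / self.count[(s, a)]


def plan(model, s, actions, depth, gamma=0.9):
    """Exhaustive depth-limited look-ahead over the learned model.
    Returns (best predicted discounted return, best first action)."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = 0.0, None
    for a in actions:
        prediction = model.predict(s, a)
        if prediction is None:
            continue  # the model has no prediction for this action, so skip it
        s_next, r = prediction
        value = r + gamma * plan(model, s_next, actions, depth - 1, gamma)[0]
        if best_action is None or value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


# Hypothetical experience: "left" pays 1 immediately, "right" pays 10 one step later.
experience = [
    ("start", "left", 1.0, "a"),
    ("start", "right", 0.0, "b"),
    ("a", "left", 0.0, "goal"),
    ("b", "right", 10.0, "goal"),
]
model = TabularModel()
for s, a, r, s_next in experience:
    model.update(s, a, r, s_next)

myopic = plan(model, "start", ["left", "right"], depth=1)     # prefers "left"
lookahead = plan(model, "start", ["left", "right"], depth=2)  # prefers "right"
print(myopic, lookahead)
```

With a one-step horizon the planner takes the immediate reward of 1; looking two steps ahead through the learned model reveals the delayed reward of 10 (worth about 9 after discounting), so it switches to "right" without any further interaction with the real environment.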
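A small worked calculation shows why rare events can dominate expected performance in the helicopter example. All numbers here are hypothetical (the page gives none): rare wind occurs on 1% of flights, an "aggressive" policy earns 10 per normal flight but crashes (return -1000) in rare wind, and a "cautious" policy earns 8 on every flight.

```python
# Hypothetical numbers for illustration only; the page gives no figures.
P_RARE = 0.01           # probability of the rare wind condition on a given flight
CRASH_REWARD = -1000.0  # return of a flight that ends in a crash


def expected_return(normal_reward, rare_reward, p_rare=P_RARE):
    """Expected return over the normal and the rare wind condition."""
    return (1 - p_rare) * normal_reward + p_rare * rare_reward


# "Aggressive" earns more in normal wind but crashes in rare wind;
# "cautious" earns slightly less but never crashes.
aggressive = expected_return(normal_reward=10.0, rare_reward=CRASH_REWARD)
cautious = expected_return(normal_reward=8.0, rare_reward=8.0)

# Chance that n independent sampled flights never encounter the rare condition,
# in which case a plain sample average would rate the aggressive policy at 10.
n = 100
p_never_seen = (1 - P_RARE) ** n

print(f"aggressive: {aggressive:.2f}, cautious: {cautious:.2f}")
print(f"P(rare wind unseen in {n} flights) = {p_never_seen:.3f}")
```

Although the rare wind affects only 1% of flights, it pulls the aggressive policy's expected return from 10 down to about -0.1, well below the cautious policy's 8. Yet an evaluation over 100 sampled flights has roughly a 37% chance of never observing the rare condition at all, which is one reason such events are hard to account for by sampling alone.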
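The active-perception item relies on submodularity. One standard way to exploit it, though the page does not say which algorithm the group uses, is greedy selection: repeatedly add the camera with the largest marginal gain in the objective. The sketch uses set coverage as the objective; the helpers `coverage` and `greedy_select`, the camera names, and their views are hypothetical.

```python
def coverage(selected, camera_views):
    """Number of distinct locations seen by the selected cameras.
    Set coverage is monotone and submodular: adding a camera never hurts,
    and its extra benefit shrinks as more cameras are already selected."""
    covered = set()
    for cam in selected:
        covered |= camera_views[cam]
    return len(covered)


def greedy_select(camera_views, budget):
    """Pick up to `budget` cameras, each time adding the one with the largest
    marginal gain in coverage."""
    selected = []
    for _ in range(budget):
        current = coverage(selected, camera_views)
        best_cam, best_gain = None, 0
        for cam in camera_views:
            if cam in selected:
                continue
            gain = coverage(selected + [cam], camera_views) - current
            if gain > best_gain:
                best_cam, best_gain = cam, gain
        if best_cam is None:  # no remaining camera adds any coverage
            break
        selected.append(best_cam)
    return selected


# Hypothetical cameras and the locations (numbered cells) each one can see.
camera_views = {
    "cam_a": {1, 2, 3, 4},
    "cam_b": {3, 4, 5},
    "cam_c": {5, 6},
    "cam_d": {1, 2},
}
chosen = greedy_select(camera_views, budget=2)
print(chosen, coverage(chosen, camera_views))  # ['cam_a', 'cam_c'] 6
```

Note that the greedy rule skips `cam_b` in the second round: its views overlap heavily with `cam_a`, so its marginal gain drops from 3 to 1, which is exactly the diminishing-returns property of submodularity. A classic result shows that for monotone submodular objectives under a cardinality budget, greedy selection achieves at least a (1 - 1/e), or about 63%, fraction of the optimal value while evaluating only about budget × (number of cameras) candidates instead of every subset.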