
Probabilistic Shielding for Safe Reinforcement Learning

Edwin Hamel-de le Court (Imperial College London)

In real-life scenarios, a Reinforcement Learning (RL) agent aiming to maximise its reward must often also behave safely, including at training time. Much attention has therefore been given in recent years to Safe RL, where an agent aims to learn an optimal policy among all policies that satisfy a given safety constraint. However, strict probabilistic safety guarantees are typically provided through approaches based on linear programming, which scale poorly. We present a new, scalable method that enjoys strict formal guarantees for Safe RL in the case where the safety dynamics of the Markov Decision Process (MDP) are known. Our approach is based on state augmentation of the MDP and on the design of a shield that restricts the actions available to the agent. We demonstrate through experimental evaluation that our approach is viable in practice.
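
To illustrate the general idea of a shield that restricts the agent's action set using known safety dynamics, here is a minimal Python sketch. It is not the speaker's method: the function `unsafe_prob` and the per-step safety budget are hypothetical stand-ins for whatever the known safety dynamics of the MDP provide, and the fallback rule is an assumption made purely for illustration.

```python
def shielded_actions(state, actions, unsafe_prob, safety_budget):
    """Return the actions whose probability of causing a safety
    violation in `state` does not exceed `safety_budget`.

    `unsafe_prob(state, action)` is a hypothetical handle to the known
    safety dynamics: the probability that taking `action` in `state`
    leads to an unsafe outcome.
    """
    allowed = [a for a in actions if unsafe_prob(state, a) <= safety_budget]
    # Illustrative fallback: if no action meets the budget, keep the
    # least risky one so the agent always has something to choose.
    if not allowed:
        allowed = [min(actions, key=lambda a: unsafe_prob(state, a))]
    return allowed


# Toy usage: two actions in state "s0", one much riskier than the other.
risk = {("s0", "left"): 0.02, ("s0", "right"): 0.30}
safe_set = shielded_actions("s0", ["left", "right"],
                            lambda s, a: risk[(s, a)],
                            safety_budget=0.05)
print(safe_set)  # ['left']
```

During learning, the agent would then sample (and explore) only from the shielded set, so that the safety constraint is respected even while training.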