Active Defenses against Illusory Attacks

Supervisors

Suitable for

MSc in Advanced Computer Science
Mathematics and Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part C
Computer Science, Part B

Abstract

Autonomous agents deployed in the real world need to be robust against adversarial attacks on their sensory inputs. Robustifying agent policies requires anticipating the strongest possible attacks. We recently demonstrated that existing observation-space attacks on reinforcement learning agents share a common weakness: while effective, they impose no information-theoretic detectability constraints, which makes them detectable by automated means or human inspection. Detectability is undesirable for adversaries, as it may trigger security escalations. In response, we introduced illusory attacks, a novel class of adversarial attacks on sequential decision-makers that are both effective and of epsilon-bounded statistical detectability [1]. In this project, we investigate, for the first time, how victims of epsilon-bounded illusory attacks can actively defend themselves, i.e. adapt their test-time behaviour to maximise the information-theoretic detectability of an ongoing attack. Drawing on the active learning literature, and inspired by Bayesian optimal experiment design and constraint preferences in economics, we will develop novel ways of optimising the victim's best response to a cost-bounded adversary. This project is designed to lead to publication. We are looking for a highly motivated student with an interest in theory.
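
To make the intended optimisation concrete, the sketch below illustrates one way a victim's active defence could be scored: each candidate test-time action is treated as an experiment and ranked by the expected information gain (mutual information) about the binary hypothesis "am I under attack?", in the spirit of Bayesian optimal experiment design. The observation distributions, epsilon-style perturbations, and candidate actions are illustrative assumptions for this sketch, not part of the project specification.

import numpy as np

# Minimal sketch (illustrative only): rank candidate victim actions by the
# expected information gain about the hypothesis "am I under attack?",
# in the spirit of Bayesian optimal experiment design. The observation
# distributions below are hypothetical placeholders, not the project's
# actual environment model.

def expected_information_gain(p_obs_nominal, p_obs_attacked, prior_attack=0.5):
    """Mutual information I(H; O) between the binary attack hypothesis H
    and the next observation O, for one candidate action over a discrete
    observation space."""
    prior = np.array([1.0 - prior_attack, prior_attack])
    joint = np.stack([prior[0] * p_obs_nominal, prior[1] * p_obs_attacked])
    p_obs = joint.sum(axis=0)                                  # marginal p(o)
    ratio = joint / (prior[:, None] * p_obs[None, :])
    return float(np.sum(np.where(joint > 0, joint * np.log(ratio), 0.0)))

# Under an epsilon-bounded illusory attack the attacked observation
# distribution stays statistically close to the nominal one, so the
# attainable information gain per step is small; the active defender
# picks the action that maximises it.
p_nominal = np.array([0.60, 0.30, 0.10])
candidates = {
    "action_a": np.array([0.55, 0.35, 0.10]),  # hypothetical attacked distribution if action a is played
    "action_b": np.array([0.58, 0.30, 0.12]),  # hypothetical attacked distribution if action b is played
}

scores = {a: expected_information_gain(p_nominal, p_att) for a, p_att in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> probe with", best)

In the full project, such a per-step detection score would have to be traded off against task reward, since the victim's probing behaviour itself has a cost and the adversary is cost-bounded.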

[1] Tim Franzmeyer, Stephen Marcus McAleer, Joao F. Henriques, Jakob Nicolaus Foerster, Philip Torr, Adel Bibi, Christian Schroeder de Witt. Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers. https://openreview.net/forum?id=F5dhGCdyYh (under review at ICLR 2024).