Bisimulation on Distributions
for Reinforcement Learning

Nathanaël Fijalkow

Who am I?


A Logician at the
Alan Turing Institute
of Data Science, London


Thanks to them:

The virtuous circle

Automata Analysis


An automaton is a data structure to facilitate data analysis.

Typical problems:
  • Emptiness (boolean properties): is the property non-trivial?
  • Equivalence: are two properties equivalent?
    Notion of minimal / canonical automaton

Markov Decision Process

Making things simple before learning them


Probabilistic Bisimulation (Larsen, Skou)


Definition: $p \sim q$ just if $\forall a \in A, \forall C \in [S]_\sim,\ \tau(p,a)(C) = \tau(q,a)(C)$.

Transfer results


Bisimulation is extended to a bisimulation metric.

Theorem (Castro, Precup, Panangaden) The difference in Q-values is bounded by the bisimulation metric.

Bisimulation on Distributions


Definition: $p \sim_D q$ just if $\forall a \in A,\ \tau(p,a) \sim_D \tau(q,a)$.

What transfer results?