Testing Probabilistic Equivalence through Reinforcement Learning
We propose a new approach to quantify differences between two dynamical systems with which one can interact by performing actions and receiving observations. The quantification is called a trace-divergence: its value is zero on processes that are trace-equivalent and strictly positive otherwise.
We also introduce a new family of equivalences, called K-moment equivalences, for which such a notion of divergence can likewise be defined. The weakest of them, 1-moment equivalence, coincides with trace-equivalence; the others are weaker than bisimulation but stronger than trace-equivalence.
The key idea is to define a Markov Decision Process (MDP) based on the systems to be compared, in such a way that the optimal value of the MDP's initial state can be interpreted as a divergence (or dissimilarity) between the systems. This dissimilarity can therefore be estimated by Reinforcement Learning (RL) methods, which are known for their ability to tackle very large systems efficiently.
Moreover, when the two systems are not equivalent, our divergence algorithm provides a test that witnesses the non-equivalence.
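As a toy illustration of the underlying notion (not the paper's construction), the 1-moment (trace) divergence of two small probabilistic systems can be computed by brute force: enumerate all traces up to a horizon and take the maximal absolute difference of their probabilities. The two example systems and the horizon below are hypothetical choices for illustration; the point of the paper's approach is that, for large systems, RL replaces this exhaustive enumeration by learning the optimal value of an MDP whose initial-state value equals the divergence.

```python
import itertools

# Two toy probabilistic systems over the observation alphabet {"a", "b"}.
# Each system maps a state to a list of (probability, observation, next_state).
# These systems are illustrative only, not taken from the paper.
SYS1 = {0: [(0.5, "a", 0), (0.5, "b", 0)]}
SYS2 = {0: [(0.7, "a", 0), (0.3, "b", 0)]}


def trace_prob(system, trace, state=0):
    """Probability that `system` emits `trace` starting from `state`."""
    if not trace:
        return 1.0
    obs, rest = trace[0], trace[1:]
    return sum(p * trace_prob(system, rest, nxt)
               for p, o, nxt in system[state] if o == obs)


def trace_divergence(sys1, sys2, alphabet=("a", "b"), horizon=4):
    """Brute-force trace-divergence up to a fixed horizon:
    the maximal absolute difference in trace probabilities.
    It is zero iff the systems agree on all traces up to the horizon."""
    best = 0.0
    for length in range(1, horizon + 1):
        for trace in itertools.product(alphabet, repeat=length):
            gap = abs(trace_prob(sys1, trace) - trace_prob(sys2, trace))
            best = max(best, gap)
    return best


print(trace_divergence(SYS1, SYS2))  # strictly positive: not trace-equivalent
print(trace_divergence(SYS1, SYS1))  # 0.0: a system is equivalent to itself
```

The maximizing trace (here a short word over the alphabet) plays the role of the witnessing test mentioned above: it is a concrete experiment on which the two systems behave measurably differently.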