Skip to main content

Information Theory, Lattice of Information and Information Leakage

Pasquale Malacaria ( Queen Mary )

The general aim of Quantitative Information Flow (QIF) is to measure the amount of information about a source of information that can be gained by observations on some related system component. An application in the context of programming languages is for example to measure leakage of confidential information in programs. The simplest example motivating QIF is a password checking program where secret data (stored in the high variable h) can be deduced by observing the final value of the low variable l:
if (h == l) l = 0 else l = 1

The above program is insecure and the aim of QIF is to measure its insecurity. Further examples of this scenario are code handling medical records, code querying databases, arithmetic operations within a
cryptographic package, RFID tag identification etc. In these examples we expect sound code to leak as little as possible. There are also positive applications of QIF, where the analysis should return high leakage; examples of this scenario are programs computing best fit representation for data, hash coding, image processing code, strategy game solving programs etc.

The basic albeit crude unit of measure in the analysis is the number of states of the source of information that can be distinguished by the observations. Classical non-interference amounts to not to be able to make any distinction by looking at the observations, whereas total leakage corresponds to be able to make all possible distinctions. The tricky bit is, of course, what lies in between these two extremes. In the past years a number of works have refined this idea and provided a solid theoretical background for QIF. The theory is based on Information Theoretical terms, the main being Shannon’s entropy but also Renyi’s entropies and guessability measures have been shown to be relevant. The Information Theoretical approach has been shown to be powerful and able to provide natural, intuitive and precise reasoning principles for program constructs including loops. The theory has also been fruitfully applied to the analysis of anonymity protocols, where the same framework has been used to measure the loss of anonymity.More recent works have investigated automation of such ideas, and verification techniques like abstract interpretation, bounded model checkers and SAT solvers have been applied in this context. Implementation of the quantitative aspects of the analysis presents however a number of challenges, the main being scalability and handling of data structures.

Some of these challenges could probably be solved by appropriate abstraction techniques, some may be better addressed by statistical means.The talk will give an overview of the field, the basic ideas, reasoning principles and problems.

Share this: