Software Verification for Weak Memory via Program Transformation

Abstract

Despite multiprocessors implementing weak memory models, verification methods often assume Sequential Consistency (SC), thus may miss bugs due to weak memory. We propose a sound transformation of the program to verify, enabling SC tools to perform verification w.r.t. weak memory. We present experiments for a broad variety of models (from x86/TSO to Power/ARM) and a vast range of verification tools, quantify the additional cost of the transformation and highlight the cases when we can drastically reduce it. Our benchmarks include work-queue management code from PostgreSQL.

Paper

Software Verification for Weak Memory via Program Transformation (2012)

Proofs

Proofs of the paper formalised and checked in Coq (2012)


News


Tool manual

Here is the manual of the tool, and how to plug in a model-checker (we have tried CheckFence, ESBMC, MMChecker, Poirot, SatAbs, Threader and our new CImpact).



Case Study: Fixing a WMM bug in PostgreSQL

We show here how we can detect this bug for Power reported by PostgreSQL developers.





We then test their fix (the reader should search for the comment "XXX there really ought to be a memory barrier operation right here" in the diff). We show that this fix might not be sufficient, we provide a counter-example generated by our tool, and we propose an additional fix, which provably fixes the problem.


Summary of systematic testing for SatAbs

Here are the average runtime and overhead for SatAbs, on all litmus tests. We provide the same data for all model-checkers, on all litmus tests but also on C examples below.

The following scatter plot (click for more details and an enlarged version) depicts the execution time of running SatAbs on a fully instrumented program vs. execution times for the optimised instrumentation. Any dot below the diagonal line marks a case where full instrumentation is slower than the optimised one.


Experiments

Here are all our experimental data, for all model-checkers, on all litmus tests and C examples, including standard ones, borrowed ones, the PostgreSQL bug, and RCU.

As we can independently compute the expected verification result for all litmus tests, we are able to assess the soundness of each of the verification tool. The following graphics provide an executive summary of this assessment.









People