Case Study: Fixing a WMM bug in Pgsql

We show here how we can detect this bug for Power reported by PostgreSQL developers, and test their fix. We show that this fix might not be sufficient, we provide a counter-example generated by our tool, and we propose an additional fix.

Background

In this code, each worker has a critical section, which is the body of the if statement. To ensure that two workers can never access their critical section at the same time, the workers share a token, flag. A worker can enter its critical section only if it has the token. And a worker can only pass the token to its next neighbour, and only when it leaves its critical section. After having passed the token, the worker loops until getting the token again (final blocking while, with latch). This system should prevent starvation. However, some infinite loops have been observed, meaning that the token was lost. After some discussions, it has been agreed that this is due to a weak memory effect. The failure has indeed been detected on some multi-core PowerPC machines.

Detecting the bug

We want to detect this behaviour, which leads to the starvation of the system. First, we consider a version of the code with two workers (two threads). We model the risk of getting the final while looping forever with an assertion, which would be violated if such a case occurred. The guarantee of non-starvation is ensured by the token, modeled by the latch and the flag. If the latch of a worker passes to 1, it wakes up, and tests its flag before entering the critical section, where it will pass the token to the next worker. The flag was updated at the same time as latch, so should have the same value. If it is not the case, the body of the if is skipped, the worker waits again its turn, but does not pass the token. The assertion is "assert(!latch[i] || flag[i])", to place between the wake-up of the worker and its test before entering the critical section.

We convert the program with assertion into goto-program using goto-cc, then apply goto-instrument --power. We now have an instrumented version, which is simulating Power behaviours with an SC execution. Goto-instrument also generated a list of cycles of events that could potentially induce some non-expected WMM behaviours (graph). We now check this instrumented program with Satabs. After 21.34 seconds, Satabs returns "Verification failed" with this counter-example. This states that the worker 0, in its critical section, delays its write to the flag. Worker 1 wakes up, thanks to the update of latch, but when arriving at the if, reads from the memory, which has not been hit yet by the write of worker 0. The flag is false, worker 1 skips its critical section and loops forever in its loops. Meanwhile, the write to the flag can be flushed, but it is already too late, as the only way to relaunch the exchange of token is to wake the worker 1 up. Unfortunately, only the latch can do this... This corresponds to a specific cycle (figure below), which relates writes to latch and flag in a critical section, and their reads (in the while and the following if) in the next worker.

We can also reduce the cycle to the program mp, which only models the events of this cycles. Satabs finds a counter-example in 9.13 sec.

Testing the developers' fix

Pgsql developers proposed to to fix the bug by inserting a memory fence just before the test preceding the critical section, to force the flush of pending writes for latch and flag. In the cycle above, this means inserting a fence between the two reads to prevent their re-ordering, and blocking the Rfe by AC. But the two writes on the previous thread would not be affected by this barrier. Yet, they can be re-ordered, on Power...

We add the fence in the program, as proposed. After generating a new goto-program, we instrument it, and pass it to Satabs. After 3.79 sec, Satabs returns "Verification failed". The bug has not been fully corrected, as the two writes can still be re-ordered (but Rfe is correctly fixed by AC, and RR by sync).

Actually fixing the bug

We propose to prevent the re-ordering of these two writes by adding a fence inbetween. The fence initially proposed between the while and the if can be replaced by an address dependency, less expensive than a sync or a lwsync. The Rfe is still blocked by BC thanks to the other barrier.

We add the fence and the dependency in the program, than generate a goto-program, instrument it and check it with Satabs. After 4.37 sec, the model-checker eventually returns "Verification successful". The bug has been fixed, with one sync, and a dependency by address.