Mercedes: Maximal Reliability of Concurrent and Distributed Software


Funded by: NWO
Duration: June 2017 until June 2022
Contact: prof.dr. M. Huisman

Summary of the project


Software is everywhere, and the whole world depends on software to behave reliably under all circumstances. It is well-known that all software contains errors, and  large amounts of money are spent on correcting errors in safety-critical and business-critical applications.
As many software applications consist of multiple, relatively independent tasks, and have to live up to high performance demands, more and more software has computations done in parallel. Parallel computations further increase the risk of introducing software errors, because the computations can interact with each other in subtle ways. Moreover, when re-executing the parallel software, errors might not re-occur: in every execution the behaviour may be different, because the computations  interact differently, due to different timing behaviour. This makes detecting and fixing the sources of errors in concurrent and distributed software notoriously difficult, and new techniques are needed to address the challenge of software reliability.

The Mercedes project develops automated verification techniques that can be integrated in the software development process, to detect and correct errors before the software is put in use.  We focus in particular on techniques to verify that a concurrent or distributed application indeed behaves as intended. It is well-understood how to do this for sequential software, but these techniques cannot be directly carried over to concurrent or distributed software, because they do not take the interactions between the parallel computations into account. In the ERC project VerCors, we have developed a technique to prove that concurrent software is free of basic errors such as data races (two parallel computations simultaneously trying to change the same memory location).

The Mercedes project will prove general properties about the software's intended behaviour, by  constructing a mathematical model of  a concurrent or distributed program, providing an abstract view of its executions, which is amenable to verification.  The unique characteristic of our approach is that the correspondence between the abstract model and the concrete program code is proven using existing program verification techniques.

To achieve our goal, we develop an abstraction theory, extending and generalising existing abstractions for concurrent and distributed software. The framework will be general, capturing many different concurrent and distributed programming paradigms. Furthermore, we will investigate how to make reasoning at the abstract level compositional, to enable further scaling and reasoning about open programs. To verify the correspondence between the abstract model and the concrete software automatically using a program logic, we develop techniques to generate the necessary additional annotations. Abstractions traditionally capture safety properties, we will investigate how to expand the approach to progress and liveness properties. Finally, we will investigate use of the abstraction technique in the opposite direction, to derive a correct program via refinement.