|title:||Distributed Symbolic Reachability Analysis|
|keywords:||distributed, symbolic, model checking|
|topics:||Algorithms and Data Structures|
Jaco van de Pol
Tom van Dijk ,
|type:||MTV Master project|
Model checking is an important tool in program verification and software validation. Model checkers generally examine the entire state space of a model to find behaviour that differs from a given formal specification. Most temporal safety properties can be verified via reachability analysis. A major limitation is the state space explosion problem, which occurs when the state space does not fit into the available memory.
State spaces can efficiently be stored as Binary Decision Diagrams (BDDs), a data structure used to efficiently represent sets. The BDD representations of state spaces can then be manipulated via BDD operations, which enables symbolic reachability analysis. Moreover, more hardware resources can be employed. By increasing processing units and available memory, larger state spaces can be stored and processed in less time.
This thesis focuses on symbolic reachability analysis, in addition to adding hardware resources by using a network of workstations. Our research goal is to implement distributed BDD operations that scale efficiently along all processing units and memory connected via a high-performance network.
Compared to existing work on distributed symbolic reachability, we use modern techniques and components, like Infiniband and RDMA, to reduce network latency. Furthermore, we use hierarchical private deque work stealing to implement efficient load-balancing. Finally, we attempt to use the idle-times of workers as efficient as possible by overlapping RDMA roundtrips as much as possible.
We designed an RDMA-based distributed hash table and private deque work stealing algorithms. Both components are micro-benchmarked in detail and used to implement the actual BDD operations for distributed symbolic reachability. The algorithms are evaluated by performing symbolic reachability over a number of BEEM models.
Instead of network latency, the limited throughput of the RDMA devices appears to be the performance bottleneck. As a result, we achieve good scalability along the number of machines added to the network, but scalability along the number of processes per machine remains limited. Compared to sequential runs, speedups up to 28 are observed. With respect to parallel symbolic reachability, by paying an acceptable performance price, the combined memory of a network of workstations can be used efficiently, and adding workstations does not decrease performance.