author: | Danny Bergsma |
title: | Porting tree-based hash table compression to GPGPU model checking |
keywords: | gpu programming, verification, compression algorithm |
topics: | Algorithms and Data Structures |
committee: |
Anton Wijs
, Mohsen Safari , Sebastiaan Joosten , Marieke Huisman |
started: | July 2017 |
end: | December 2019 |
Abstract
To reduce the costs of faulty software, methods to improve software quality are very popular nowadays. One of these methods is model checking: verifying the functional correctness of the model of a hardware or software system. The model implies a state space, which consists of all possible states of the system and all possible transitions between them.
For complex systems, the number of states can be millions or even more. Consequently, exploring the state space and checking that the system satisfies its specification is computationally very intensive. Multiple model checking algorithms have been adapted to make use of the massive parallelism of GPUs by GPGPU programming, resulting in spectacular performance improvements.
One such implementation is the GPU-based model checker GPUexplore. GPUexplore uses a hash table for the shared store of visited states in the model checking process. Considering the spectacular speedups, this hash table is now the bottleneck, as GPU memory size is relatively limited. Compression of the hash table can be used to reduce space requirements.
But we first tackled two large flaws in GPUexplore’s uncompressed hash table: replication of states in the hash table and a hash function that resulted in the inferior distribution of states across the table. We successfully designed and implemented a replication-free hash table with a probabilistic-optimal distribution, whose performance was equal to or even better than the flawed implementation.
Subsequently, we have successfully ported an existing tree-based hash table compression algorithm designed for multi-core CPUs to stand-alone GPGPU hash tables. The recursion in the compression algorithm was the main challenge, as this is implemented differently on GPUs than on CPUs and impacts performance. We made multiple improvements to the original implementation, aimed at reducing recursive overhead. Moreover, we designed and implemented a solution without any recursion at all, consisting of fifteen variants that differ in aspects related to GPU performance.
We used parameterised random data input and state sequences extracted from real-world models, consisting of up to 375M states, for performance analysis: We exhaustively examined the impact of different GPU, input and hash table parameters on performance, both uncompressed and compressed, including the improved versions. We used the results to find optimal settings for each input/program combination, which enabled a fair evaluation of compression overhead and for measuring the performance impact of the improvements.
Ultimately, there was still compression overhead, but very limited, up to a 3.8x slowdown, corresponding to the scattered memory accesses needed for tree-based compression. Considering the spectacular speedups of GPU model checking, the maximum slowdown of 3.8x would not be an issue for integrating compression into GPU model checkers, which would enable checking bigger models and models with data. As integration enables possibilities for reducing recursive overhead and the amount of required memory accesses, compression overhead may then even reduce.
Now, there is only one step left: integrating our compression algorithm into a GPU-based model checker, e.g., into GPUexplore, and examining whether the results of our performance analysis also apply when integrated.