|title:||Parallel Prefix Sum: A Case Study for GPGPU Program Verification|
|keywords:||GPU programming, Verification, OpenCL, Parallel algorithms|
|topics:||Case studies and Applications|
The Prefix Sum is an algorithm used as a building block for various other algorithms, for example radix sort, quicksort and lexically comparing strings. Implementing the Prefix Sum algorithm on the CPU is trivial, but a parallel approach with OpenCL is more complicated. An implementation in OpenCL has been made, and optimized to minimize branch divergence by comparing two different storage solutions. These storage solutions have been benchmarked in order to show the difference in execution time. In addition to performance, verification of the algorithm is important. The two aspects that need to be verified are the absence of data races and functional correctness. This paper describes a specification that covers the absence of data races, by using Permission-Based Separation Logic. The first part of this specification (the up-sweep phase of the algorithm) has been verified using the VerCors tool.