author: | Jeroen Vonk |
title: | Puzzle solving using GPUs |
keywords: | GPU, verification, permissions |
topics: | Algorithms and Data Structures , Case studies and Applications |
committee: |
Marieke Huisman
, Marina Zaharieva-Stojanovski , Matej Mihelčić |
end: | June 2013 |
Abstract
Computing general problems using the graphical processing unit (GPU) of a
device is an emerging field. The parallel structure of the GPU allows for massive
concurrency, when executing a program. Therefore, by execute (a part of)
the code on the GPU, a previously unused resource can be used, to achieve a
speed-up of an application. Previously, programming on GPUs was a tedious
job, and the implementation was depending on the manufacturer - or even on
the model of the GPU. With the arrival of OpenCL, an open and broad platform
was offered, focussed to deliver General Purpose computing on the GPU,
or GPGPU to a broader audience. Despite the sometimes simple appearance
of OpenCL code, it is important to keep in mind that threads running the
code can be executed concurrently with thousands of threads. All these threads
concurrently executing, and potentially accessing the same memory locations,
can easily lead to implementation errors. This research is focussed on verifying
OpenCL code, using permission-based separation logic, to prevent those errors
in an early stage. Moreover, we have investigated what the consequences are
of optimizations of a OpenCL-program for the verification of that program. It
is common practice to use optimization in GPGPU, since the code executed on
the GPU is often "resource-hungry", either for memory, processing power, or
both. Therefore, optimizing the GPGPU part of a program will often result in
a significant speed-up.
As a verification use-case, we have developed a simple implementation of Conway’s
Game of Life, a well-known zero player game, based on a cellular automaton.
We have verified this implementation using permission-based separation
logic, enriched with some rules specifically for OpenCL. Therefore, we had to
annotate the code in a similar way when using the VerCors tool-set. Furthermore,
we developed three optimizations of this code using common optimization
techniques. To verify each of the optimizations we have looked at the changes
needed in the verification, in relation to the original verification. Our optimized
versions, upon execution, are indeed faster than the original implementation.
Moreover, we can show several patterns for changing our verification to fit our
optimization. Using these patterns, one could possibly automatically optimize
OpenCL code, whilst still guaranteeing the correctness of the program, given
that the previous implementation was correct.