FMT - Education

author:	Jeroen Vonk
title:	Puzzle solving using GPUs
keywords:	GPU, verification, permissions
topics:	Algorithms and Data Structures , Case studies and Applications
committee:	Marieke Huisman , Marina Zaharieva-Stojanovski , Matej Mihelčić
end:	June 2013

Abstract

Computing general problems using the graphical processing unit (GPU) of a

device is an emerging field. The parallel structure of the GPU allows for massive

concurrency, when executing a program. Therefore, by execute (a part of)

the code on the GPU, a previously unused resource can be used, to achieve a

speed-up of an application. Previously, programming on GPUs was a tedious

job, and the implementation was depending on the manufacturer - or even on

the model of the GPU. With the arrival of OpenCL, an open and broad platform

was offered, focussed to deliver General Purpose computing on the GPU,

or GPGPU to a broader audience. Despite the sometimes simple appearance

of OpenCL code, it is important to keep in mind that threads running the

code can be executed concurrently with thousands of threads. All these threads

concurrently executing, and potentially accessing the same memory locations,

can easily lead to implementation errors. This research is focussed on verifying

OpenCL code, using permission-based separation logic, to prevent those errors

in an early stage. Moreover, we have investigated what the consequences are

of optimizations of a OpenCL-program for the verification of that program. It

is common practice to use optimization in GPGPU, since the code executed on

the GPU is often "resource-hungry", either for memory, processing power, or

both. Therefore, optimizing the GPGPU part of a program will often result in

a significant speed-up.

As a verification use-case, we have developed a simple implementation of Conway’s

Game of Life, a well-known zero player game, based on a cellular automaton.

We have verified this implementation using permission-based separation

logic, enriched with some rules specifically for OpenCL. Therefore, we had to

annotate the code in a similar way when using the VerCors tool-set. Furthermore,

we developed three optimizations of this code using common optimization

techniques. To verify each of the optimizations we have looked at the changes

needed in the verification, in relation to the original verification. Our optimized

versions, upon execution, are indeed faster than the original implementation.

Moreover, we can show several patterns for changing our verification to fit our

optimization. Using these patterns, one could possibly automatically optimize

OpenCL code, whilst still guaranteeing the correctness of the program, given

that the previous implementation was correct.