Title: Implementation of dynamic data structures on GPU
Keywords: Dynamic data structures, Linked list, GPU, GPGPU programming, OpenCL
Topics: Algorithms and Data Structures
With the emergence of general purpose GPU (GPGPU) programming, concurrent processing of large arrays of data has gained a significant boost in performance. However, due to the memory architecture between the host and the GPU device, and other limitations in the instructions available on GPUs, the implementation of dynamic data structures, such as linked lists and trees, is not straightforward. In GPU programming, a high number of computing elements (work-items) all execute a single instruction on multiple data (SIMD). With dynamically changing data structures, and in the absence of locks in GPU programming, one of the biggest challenges is to ensure that each work-item operates on its own piece of data. In this paper we show that it is possible to implement (shared) dynamic data structures on GPUs. A kernel memory allocator is used to manage the memory on the GPU device, and lock-free algorithms have been used to implement a linked list. A study of the performance of this implementation showed that the massive parallelism found on GPUs does not improve the performance of the basic linked-list operations that were developed. However, dynamic data structures like linked lists are often used to store intermediate results. Being able to implement these data structures on GPUs can aid developers in porting other algorithms to GPUs that could benefit from parallelisation but rely on linked lists. By using a kernel memory allocator, memory can be reused during the execution of the program, offering the possibility to make more efficient use of the memory available on the device and limiting the memory overhead.