Workshop Programme May 16, 2018

10:00 - 10:30 Welcome with coffee
10:30 - 11:00 Mohsen Safari - Parallel Algorithms to solve Shortest Path Problems in large graphs on GPUs
11:00 - 11:30 Pieter Hijma - An Ecosystem for Combining Performance and Correctness for Many-Cores
11:30 - 12:00 Roel Aaij - Potential for GPUs in HEP experiments: two examples
12:00 - 12:30 Shirin Dora - Deep Predictive Coding Networks on GPUs
12:30 - 13:30 Lunch
13:30 - 14:00 Antoine Veenstra - Edit distance on GPU clusters using MPI
14:00 - 14:30 Alfons Laarman - Optimal Storage of Combinatorial State Spaces
14:30 - 14:45 Coffee
14:45 - 15:15 Tanja Lange - Cryptanalysis using GPUs
15:15 - 15:45 Kai-Chun Ning - Solving large polynomial systems on GPUs
15:45 - 16:15 Muhammad Osama Mahmoud - SAT Simplifications on GPU Architecture
16:15 - 17:00 Brainstorming session: Roadmap for future Dutch GPU research
17:00 Drinks, Snacks & Networking

10:30 - 11:00 Mohsen Safari (University of Twente) Parallel Algorithms to solve Shortest Path Problems in large graphs on GPUs
Graph processing algorithms have a significant impact on several application domains, as graphs are used to model conceptual networks, systems and natural phenomena. Two of the most important problems in graph processing are the Single-Source Shortest Path (SSSP) problem and the Shortest Path Query Problem (SPQP), which have applications in a variety of contexts (e.g., traffic routing, circuit design, formal analysis of computing systems). Due to the significance of the time/space efficiency of solving SSSP and SPQP on large graphs, researchers have proposed parallel/distributed algorithms. Amongst these, the algorithms that harness the computational power of Graphics Processing Units (GPUs) using NVIDIA’s Compute Unified Device Architecture (CUDA) have attracted noticeable attention in the research community. However, efficient utilization of the computational power of GPUs is a challenging (and problem-dependent) task. In this talk, we present two highly efficient methods that solve SSSP on GPUs with two different approaches, chosen according to the input graph. More precisely, we introduce a vertex-based SSSP algorithm for sparse graphs and an arc-based SSSP algorithm for non-sparse graphs. Based on these SSSP algorithms, we design SPQP algorithms. Finally, we show experimental results of the algorithms for large real-world graphs on GPUs.
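To illustrate the vertex-based idea, here is a minimal frontier-based SSSP relaxation sketch (Bellman-Ford style); it is not the speaker's algorithm, only the general pattern that GPU SSSP solvers parallelize. The function name and graph encoding are hypothetical.

```python
from math import inf

def sssp_frontier(n, adj, source):
    """Frontier-based SSSP relaxation (Bellman-Ford style).

    adj: dict mapping vertex -> list of (neighbor, weight) pairs.
    Each round relaxes all outgoing arcs of the current frontier;
    on a GPU, the per-vertex (vertex-based) or per-arc (arc-based)
    work of one round would run as parallel threads.
    """
    dist = [inf] * n
    dist[source] = 0
    frontier = {source}
    while frontier:
        next_frontier = set()
        for u in frontier:
            for v, w in adj.get(u, []):
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    next_frontier.add(v)
        frontier = next_frontier
    return dist

# Tiny example graph with four vertices
adj = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 5)], 1: [(3, 1)]}
print(sssp_frontier(4, adj, 0))  # [0, 3, 1, 4]
```

The sparse/non-sparse distinction in the talk concerns exactly which of the two inner loops (over frontier vertices or over all arcs) is mapped to GPU threads.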

11:00 - 11:30 Pieter Hijma (VU University) An Ecosystem for Combining Performance and Correctness for Many-Cores
In this presentation we describe our vision of an ecosystem for programming clusters of many-core devices such as GPUs. The system aids programmers in achieving high performance by providing performance feedback. We also aim to incorporate model checking into the system in a non-intrusive way, letting programmers verify properties of their programs. We evaluate the ecosystem with several real-world applications from various domains.

11:30 - 12:00 Roel Aaij (Nikhef) Potential for GPUs in HEP experiments: two examples
High Energy Physics experiments produce data at rates of GiB/s, far more than they can afford to store long term. Dedicated server farms are used to select interesting data in real time. At the moment these data are processed largely with custom reconstruction algorithms running on CPUs. Porting such algorithms - or developing new ones - to run on GPUs has the potential to provide significant gains in processing speed and/or cost savings. Two examples of this potential will be discussed: the LHCb and KM3NeT experiments.

12:00 - 12:30 Shirin Dora (CWI) Deep Predictive Coding Networks on GPUs
Deep learning has helped achieve near-human performance in many domains, such as image processing and natural language processing. Most of these results have been achieved by using new and, sometimes, complicated architectures. Error backpropagation is typically the learning algorithm used to train these massive networks. A property of error backpropagation is that it requires a forward pass followed by a backward pass through the network, and the operations in these two passes are very hard to parallelize. Predictive coding is an alternative to error backpropagation, inspired by the efficient information coding mechanisms observed in the brain. It allows all layers in the network to be trained in parallel. This presentation describes a predictive-coding learning algorithm for training a neural network whose architecture is inspired by the brain.
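The layer-parallelism can be seen in a tiny linear predictive-coding inference loop: each layer's update depends only on its own prediction error and the error of the layer below, so every layer can update concurrently. This is a generic textbook-style sketch under assumed toy dimensions, not the network from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer linear predictive coding sketch: layer 1 predicts the data,
# layer 2 predicts layer 1. All error/state updates are local to a layer,
# so on a GPU each layer's update could run as its own kernel in parallel.
W1 = rng.normal(size=(4, 3)) * 0.1   # hypothetical weights: layer 1 -> data
W2 = rng.normal(size=(3, 2)) * 0.1   # hypothetical weights: layer 2 -> layer 1
data = rng.normal(size=4)

x1 = np.zeros(3)                      # latent state of layer 1
x2 = np.zeros(2)                      # latent state of layer 2
lr = 0.05
for _ in range(200):
    e0 = data - W1 @ x1               # prediction error at the data layer
    e1 = x1 - W2 @ x2                 # prediction error at layer 1
    # Local gradient-descent updates on the total squared prediction error:
    # each layer needs only its own error and the error one level below.
    x1 = x1 + lr * (W1.T @ e0 - e1)
    x2 = x2 + lr * (W2.T @ e1)

print(np.linalg.norm(data - W1 @ x1))  # residual error after inference
```

Weight learning (not shown) is equally local: each W is updated from the error and state of its own layer pair.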

13:30 - 14:00 Antoine Veenstra (University of Twente) Edit distance on GPU clusters using MPI
This presentation will describe a verified implementation of the Levenshtein distance problem on a GPU cluster using MPI and OpenCL. The implementation is based on an existing verified single-GPU implementation. The implementation achieves higher speed on a cluster, but its efficiency is reduced by the overhead of the extra communication between nodes. See the accompanying paper for more details.
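For reference, the underlying dynamic program is the classic edit-distance recurrence; cells on the same anti-diagonal are mutually independent, which is the property GPU and MPI implementations exploit. A minimal sequential sketch (not the verified implementation from the talk):

```python
def levenshtein(a, b):
    """Classic DP for edit distance, kept to two rows of the table.
    Cells on one anti-diagonal depend only on the previous two
    anti-diagonals, so a GPU (or an MPI rank per tile) can compute
    each anti-diagonal in parallel."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))         # distances against the empty prefix
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n]

print(levenshtein("kitten", "sitting"))  # 3
```

In the cluster setting, each node owns a band of columns and exchanges only its boundary cells with neighbours per wavefront step, which is the communication overhead mentioned above.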

14:00 - 14:30 Alfons Laarman (Leiden University) Optimal Storage of Combinatorial State Spaces
Bandwidth is a critical resource for many parallel computations. We show that an information-theoretical view on data promises linear reductions assuming only locality of computations. We then exploit this fact to reduce bandwidth usage in (parallel) model checkers by compressing state spaces. While the information-theoretical view provides a lower bound on the size of the compressed states, it does not suggest how to efficiently reach it. Instead of accommodating the locality between states, we propose a data structure that compresses the arising combinatorial structure of the state space. This tree-based data structure provides logarithmic access times, but in practice performs as efficiently as a plain hash table. Experiments using hundreds of process algebra, Petri net and imperative program inputs demonstrate that the information-theoretic lower bound is often reached with negligible performance penalty.
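The core idea of tree-based compression can be sketched in a few lines: a state vector is split recursively into halves, and each unique (left, right) pair is stored once and referenced by index, so subvectors shared between states are stored only once. This is a simplified illustration with hypothetical names, not the data structure from the talk (a real tree database would, among other things, keep one table per tree level).

```python
# Minimal sketch of tree compression for fixed-length state vectors.
def make_tree_db():
    table = {}             # (left, right) -> node index
    def insert(vec):
        if len(vec) == 1:
            return vec[0]  # leaves are the raw slot values
        mid = len(vec) // 2
        key = (insert(vec[:mid]), insert(vec[mid:]))
        return table.setdefault(key, len(table))
    return insert, table

insert, table = make_tree_db()
# Two 4-slot states sharing their right half (7, 7):
a = insert((1, 2, 7, 7))
b = insert((3, 4, 7, 7))
print(len(table))  # 5 unique tree nodes instead of 2 full vectors
```

Each stored state shrinks to a single root index; lookups walk down the tree, giving the logarithmic access mentioned in the abstract.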

14:45 - 15:15 Tanja Lange (TU Eindhoven) Cryptanalysis using GPUs
Cryptography plays an important part in securing our electronic communications. To make sure that the security holds up to serious attackers, cryptanalysts investigate how hard it is to break a system. This typically translates to studying algorithms and their complexity as well as implementing attacks on downsized versions. Many of the attacks can be run in parallel and have a relatively small memory footprint, which makes them ideal candidates for implementation on GPUs. An interesting feature of cryptanalysis is that we're only interested in the cost of obtaining the result; the attacker is free to change the algorithm and make it more suitable for the computing environment. The Cryptographic Implementations group and the Coding Theory and Cryptology group at TU/e will report on their attacks against some (scaled down) cryptosystems.

15:15 - 15:45 Kai-Chun Ning (KPN) Solving large polynomial systems on GPUs
Breaking some cryptographic schemes is in essence searching for a solution to a polynomial system. To do so, one can perform Gaussian elimination on the polynomial system and subsequently perform exhaustive search. Although Gaussian elimination is simple, performing it on a 30 GB matrix of size 205,000 x 1,200,000 can be extremely slow. Therefore, one might be tempted to use GPUs to speed up this procedure. This presentation describes an approach to shrinking the matrix from 30 GB to 4.5 GB and subsequently solving the polynomial system on GPUs, which turned out to be up to 60 times faster than the state of the art.
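Since such systems typically live over GF(2), the elimination step reduces to XOR-ing bit-packed rows; a row-XOR touches a whole machine word of columns at once, and independent row-XORs map naturally onto GPU threads. A minimal bit-packed sketch (an illustration of the general technique, not the speaker's implementation):

```python
# Bit-packed Gaussian elimination over GF(2): each row of the Boolean
# matrix is one integer, so a single XOR eliminates many columns at once.
# The inner row-XOR loop is the part a GPU would run with one thread per row.
def gauss_gf2(rows, ncols):
    rows = list(rows)
    rank = 0
    for col in range(ncols - 1, -1, -1):     # leftmost column = highest bit
        pivot_bit = 1 << col
        pivot = next((i for i in range(rank, len(rows))
                      if rows[i] & pivot_bit), None)
        if pivot is None:
            continue                          # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):            # eliminate the column everywhere
            if i != rank and rows[i] & pivot_bit:
                rows[i] ^= rows[rank]
        rank += 1
    return rows, rank

# 3 equations in 3 Boolean variables, encoded as bit rows
rows, rank = gauss_gf2([0b110, 0b011, 0b101], 3)
print(rank)  # 2: the third row is the XOR of the first two
```

After elimination, the remaining free variables are enumerated exhaustively, which is the brute-force phase mentioned above.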

15:45 - 16:15 Muhammad Osama Mahmoud (Eindhoven University of Technology) SAT Simplifications on GPU Architecture
The growing scale of industrial applications encoded as Boolean Satisfiability (SAT) problems has made SAT simplification an imperative requirement for any SAT solver. In this talk, I will discuss how GPUs can be utilized to perform variable elimination and subsumption elimination in parallel. Benchmarks show that our proposed simplifier (SIGmA) achieved an acceleration of 250x over SatELite. Regarding SAT solving, when combined with MiniSat, SIGmA outperformed SatELite in terms of problems solved faster by a factor of 77%. Moreover, MiniSat was more effective than Lingeling by 73.3% in solving simplified formulas produced by SIGmA.
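One of the two simplifications, subsumption elimination, is easy to state: a clause C subsumes a clause D when C's literals are a subset of D's, in which case D is redundant and can be dropped. A small sequential sketch of that check (with hypothetical names; SIGmA performs these checks in parallel on the GPU and additionally does variable elimination, which is not shown):

```python
# Subsumption elimination on a clause set. Literals are signed integers
# (DIMACS style: 1 means x1, -1 means NOT x1). A clause is dropped if a
# strict subset of it occurs elsewhere, or if it duplicates an earlier clause.
def subsumption_eliminate(clauses):
    cs = [frozenset(c) for c in clauses]
    kept = []
    for i, c in enumerate(cs):
        subsumed = any((d < c) or (d == c and j < i)
                       for j, d in enumerate(cs) if j != i)
        if not subsumed:
            kept.append(sorted(c))
    return kept

# (x1) subsumes (x1 OR x2); the duplicate (NOT x1 OR x3) is kept once
print(subsumption_eliminate([[1], [1, 2], [-1, 3], [-1, 3]]))
# [[1], [-1, 3]]
```

On a GPU, each clause pair (or each clause against a signature index) becomes an independent check, which is what makes the simplification parallelize well.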