## Workshop Programme May 16, 2018

**10:00 - 10:30**Welcome with coffee

**10:30 - 11:00**Mohsen Safari - Parallel Algorithms to solve Shortest Path Problems in large graphs on GPUs

**11:00 - 11:30**Pieter Hijma - An Ecosystem for Combining Performance and Correctness for Many-Cores

**11:30 - 12:00**Roel Aaij - Potential for GPUs in HEP experiments: two examples

**12:00 - 12:30**Shirin Dora - Deep Predictive Coding Networks on GPUs

**12:30 - 13:30**Lunch

**13:30 - 14:00**Antoine Veenstra - Edit distance on GPU clusters using MPI

**14:00 - 14:30**Alfons Laarman - Optimal Storage of Combinatorial State Spaces

**14:30 - 14:45**Coffee

**14:45 - 15:15**Tanja Lange - Cryptanalysis using GPUs

**15:15 - 15:45**Kai-Chun Ning - Solving large polynomial systems on GPUs

**15:45 - 16:15**Muhammad Osama Mahmoud - SAT Simplifications on GPU Architecture

**16:15 - 17:00**Brainstorming session: Roadmap for future Dutch GPU research

**17:00**Drinks, Snacks & Networking

**10:30 - 11:00** **Mohsen Safari** (University of Twente)
- *Parallel Algorithms to solve Shortest Path Problems in large graphs on GPUs*

**Abstract**

Graph processing algorithms have a significant impact on several domains of applications as graphs are used to model conceptual networks, systems and natural phenomena. Two of the most important problems in graph processing are the Single-Source Shortest Path (SSSP) and Shortest Path Query Problem (SPQP) that have applications in a variety of contexts (e.g., traffic routing, circuit design, formal analysis of computing systems). Due to the significance of the time/space efficiency of solving SSSP and SPQP on large graphs, researchers have proposed parallel/distributed algorithms. Amongst these, the algorithms that harness the computational power of Graphical Processing Units (GPUs) using NVIDIAâ€™s Compute Unified Device Architecture (CUDA) have attracted noticeable attention in research community. However, efficient utilization of the computational power of GPUs is a challenging (and problem-dependent) task. In this talk, we present two highly efficient methods that solve SSSP on GPUs with two different approaches according to input graphs. More precisely, we introduce a vertex-based SSSP algorithm for sparse graphs and an arc-based SSSP algorithm for non-sparse graphs. Based on these SSSP algorithms, we design SPQP algorithms. Finally, we show the experimental results of the algorithms
for large real-world graphs on GPUs.

**11:00 - 11:30** **Pieter Hijma** (VU University)
*An Ecosystem for Combining Performance and Correctness for
Many-Cores*

**Abstract**

In this presentation we will present our vision on an ecosystem for
programming clusters of many-core devices such as GPUs. The system aids
programmers to achieve high performance by providing performance
feedback. We also aim to incorporate model checking into the system in
a non-intrusive way to let programmers verify properties of their
programs. We show several real-world applications from various domains
with which we evaluate our ecosystem.

**11:30 - 12:00** **Roel Aaij** (Nikhef)
*Potential for GPUs in HEP experiments: two examples*

**Abstract**

High Energy Physics experiments produce GiB/s more data than they can
afford to store long term. Dedicated server farms are used to select
interesting data in real-time. At the moment this data is processed
largely with custom reconstruction algorithms running on CPUs. Porting
such algorithms - or developing new ones - to run on GPUs has the
potential to provide significant gains in processing speed and/or cost
savings. Two examples of this potential will be discussed: the LHCb and
KM3NeT experiments.

**12:00 - 12:30** **Shirin Dora** (CWI)
*Deep Predictive Coding Networks on GPUs*

**Abstract**

Deep Learning has helped in achieving near human performance in many
domains like image processing, natural language processing, etc. Most of
these results have been achieved by using new and, sometimes,
complicated architectures. Error-backpropagation has often been the
learning algorithm that is used to train these massive networks. A
property of error-backpropagation is that it requires a forward-pass
followed by a backward-pass through the network. The operations in these
two passes are very hard to parallellize. Predictive coding is an
alternative to error-backpropagation which is inspired by the efficient
information coding mechanisms observed in the brain. It allows all the
layers in the network to be trained in parallel. This presentation
describes the learning algorithm of predictive coding to train a neural
network whose architecture is inspired by the brain.

**13:30 - 14:00** **Antoine Veenstra**
(University of Twente)
*Edit distance on GPU clusters using MPI*

**Abstract**

This presentation will describe a verified implementation of the Levenshtein distance problem on a GPU cluster using MPI and OpenCL. The implementation is based on an existing verified single GPU implementation. The speed of the implementation is higher on a cluster, but the efficiency is affected by the overhead, which is caused by the extra communication between nodes.
See
paper for more info

**14:00 - 14:30** **Alfons Laarman**
(Leiden University)
*Optimal Storage of Combinatorial State Spaces*

**Abstract**

Bandwidth is a critical resource for many parallel computations. We
show that an information theoretical view on data promises linear
reductions assuming only locality of computations. We then exploit
this fact to reduce bandwidth usage in (parallel) model checkers by
compressing state spaces. While the information theoretical view only
provides a lower bound on the size of the compressed states, it does
not suggest how to efficiently reach it. Instead of accommodating the
locality between states, we propose a data structure that compresses
the arising combinatorial structure of the state space. This
tree-based data structure provides logarithmic access times, but in
practice performs as efficient as a plain hash table. Experiments
using hundreds of process algebra, petri net and imperative program
inputs demonstrate that the information theoretic lower bound is often
reached with negligible performance penalty.

**14:45 - 15:15** **Tanja Lange** (TU Eindhoven)
*Cryptanalysis using GPUs*

**Abstract**

Cryptography plays an important part in securing our electronic communications.
To make sure that the security holds up to serious attackers, cryptanalysts
investigate how hard it is to break a system. This typically translates to
studying algorithms and their complexity as well as implementing attacks on
downsized version. Many of the attacks can be run in parallel and have a
relatively small memory footprint which makes them ideal candidates for
implementations on GPUs.
An interesting feature of cryptanalysis is that we're only interested in
the cost of obtaining the result; the attacker is free to change the
algorithm and make it more suitable for the computing environment. The
Cryptographic Implementations group and the Coding Theory and Cryptology
group at TU/e will report on their attacks against some (scaled down)
cryptosystems.

**15:15 - 15:45** **Kai-Chun Ning** (KPN)
*Solving large polynomial systems on GPUs*

**Abstract**

Breaking some cryptographic schemes is in essence searching for a solution to a
polynomial system. To achieve so, one can perform Gaussian elimination on the
polynomial system and subsequently perform exhaustive search. Although Gaussian
elimination is simple, performing it on a 30GB matrix of size 205,000 x
1,200,000 can be extremely slow. Therefore, one might be tempted to use GPUs to
speed up this procedure. This presentation describes an approach to shrinking a
matrix from 30GB to 4.5GB and subsequently solving the polynomial system on
GPUs, which turned out to be faster than the state of the art by a factor as
high as 60.

**15:45 - 16:15****Muhammad Osama Mahmoud **
(Eindhoven University of Technology)
*SAT Simplifications on GPU Architecture*

**Abstract**

The growing scale of industrial applications encoded to Boolean Satisfiability (SAT) problems imposed the researchers to practice SAT simplification as an imperative requirement for any SAT solver. In this talk, I will discuss how GPU can be utilized to perform variable and subsumption eliminations in parallel. Benchmarks show that our proposed simplifier (SIGmA) achieved an acceleration of 250x over SatELite. Regarding SAT solving, SIGmA outperformed SatELite in terms of problems solved faster when combined with MiniSat by a factor of 77%. Moreover, MiniSat was more effective than Lingeling by 73.3% in solving simplified formulas given by SIGmA.