GPU-Tasking à la Carte? Eventify Meets GPUs
Time: Monday, June 22nd, 9:29pm - 10:06pm
Description: Task-parallel programming is omnipresent in data mining and machine learning, in matrix factorization algorithms, and in molecular dynamics simulations. We aim for a uniform task-parallel programming technology for applications that target strong scaling on heterogeneous systems. To this end, we extend the event-based, low-overhead, task-parallel programming library Eventify to GPUs.
Our use case is the implementation of a Fast Multipole Method (FMM) for molecular dynamics simulations. Since the FMM is based on a tree-structured task graph, parallelizing it in a level-wise manner introduces heavy synchronization overhead. As a result, current FMM implementations barely use GPUs for acceleration, except for the embarrassingly parallel computation of near-field interactions (P2P). Our goal is to execute the entire FMM on a GPU by means of Eventify.
Our requirements for a GPU tasking approach are derived from its CPU counterpart, so that both share a uniform code base with reusable kernel functions. Hence, the requirements cover the generation of ready-to-execute tasks, multiple critical-path-aware task queues, and a scalable load-balancing approach. On CPUs, these Eventify concepts led to a performance improvement of 52% compared to an OpenMP-based Fast Multipole Method.
In this poster, we present our persistent-thread-based programming model for GPU-Eventify, provide details on our hierarchical task queues, and reveal first performance results.
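To illustrate two of the requirements named above, ready-to-execute tasks and critical-path-aware task queues, the following is a minimal single-threaded C++ sketch. It is not Eventify's API: the `Task` struct, the `run_graph` function, and the two-level priority scheme are hypothetical names and simplifications chosen for illustration. A task enters a queue only once its dependency counter reaches zero, so no level-wise barrier is needed, and tasks on the critical path are always dequeued first.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical task node for illustration; not Eventify's actual data structure.
struct Task {
    std::string name;
    int pending;                          // number of unmet dependencies
    int priority;                         // 0 = on the critical path, 1 = off it
    std::vector<std::size_t> successors;  // indices of tasks waiting on this one
};

// Executes the graph sequentially and returns the order in which tasks ran.
// A task is queued only when its dependency counter drops to zero, so every
// queued task is ready to execute without further synchronization.
std::vector<std::string> run_graph(std::vector<Task> tasks) {
    std::deque<std::size_t> queues[2];  // one queue per priority level

    for (std::size_t i = 0; i < tasks.size(); ++i) {
        assert(tasks[i].priority == 0 || tasks[i].priority == 1);
        if (tasks[i].pending == 0) queues[tasks[i].priority].push_back(i);
    }

    std::vector<std::string> order;
    while (!queues[0].empty() || !queues[1].empty()) {
        // Always serve the critical-path queue first.
        auto& q = !queues[0].empty() ? queues[0] : queues[1];
        std::size_t id = q.front();
        q.pop_front();
        order.push_back(tasks[id].name);  // stand-in for running the kernel

        // Completion acts as an event: notify successors and release
        // those whose last dependency was just satisfied.
        for (std::size_t s : tasks[id].successors) {
            if (--tasks[s].pending == 0) queues[tasks[s].priority].push_back(s);
        }
    }
    return order;
}
```

For a toy graph with an independent near-field task (P2P, priority 1) and a dependent far-field chain (priority 0), the chain drains before the near-field task runs. On a GPU, a persistent-thread variant would presumably replace the sequential loop with long-running thread blocks polling shared queues; that part is not sketched here.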