Presentation
Evaluating Asynchronous Schwarz solvers for GPUs
Session
PhD Forum Posters
Event Type
PhD Forum
Pre-Recorded
Time
Monday, June 22nd, 8:52pm - 9:29pm
Location
Applaus
Description
With the commencement of the exascale computing era, the majority of
leadership supercomputers are heterogeneous and massively
parallel. Even a single node can contain multiple co-processors such as GPUs
and multiple CPU cores. For example, each node of ORNL's Summit has six NVIDIA
Tesla V100 GPUs and 42 IBM Power9 cores. Synchronizing across
compute resources of multiple nodes can be prohibitively expensive. Hence, it
is necessary to develop and study asynchronous algorithms that circumvent this
issue of bulk-synchronous computing. In this study, we examine the
asynchronous version of the abstract Restricted Additive Schwarz method as a
solver. We do not explicitly synchronize, but allow the communication between
the sub-domains to be completely asynchronous, thereby removing the
bulk-synchronous nature of the algorithm.
We accomplish this by using the one-sided Remote Memory Access (RMA) functions
of the MPI standard. We study the benefits of using such an asynchronous
solver over its synchronous counterpart. We also study how the communication
patterns, governed by the partitioning and the overlap between the sub-domains,
affect the global solver. Finally, we show that this approach can yield attractive
performance benefits over the synchronous counterpart even for a
well-balanced problem.
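The idea of a Restricted Additive Schwarz iteration without a global synchronization point can be illustrated with a small serial sketch. This is not the implementation from the poster: it uses no MPI and no GPU, and the model problem (1D Poisson with f = 1), the function names `solve_local`, `residual_max` and `async_ras`, and all parameter values are assumptions chosen for illustration. Sub-domains are updated in random order to mimic processes that advance at different speeds, and writing into the shared vector `u` stands in for a one-sided put into a neighbour's memory window.

```python
import random


def solve_local(rhs):
    """Solve tridiag(-1, 2, -1) x = rhs with the Thomas algorithm."""
    m = len(rhs)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual_max(u, b):
    """Max-norm of b - A u for A = tridiag(-1, 2, -1), zero boundary values."""
    n = len(u)
    worst = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(b[i] - (2.0 * u[i] - left - right)))
    return worst


def async_ras(n=64, parts=4, overlap=2, tol=1e-12, max_updates=50000, seed=0):
    """Emulated asynchronous RAS for -u'' = 1 on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    b = [h * h] * n  # right-hand side scaled by h^2
    u = [0.0] * n    # shared iterate; stands in for remotely writable memory
    size = n // parts
    rng = random.Random(seed)
    for updates in range(1, max_updates + 1):
        # No barrier: whichever sub-domain is "ready" updates next.
        k = rng.randrange(parts)
        own_lo = k * size
        own_hi = n - 1 if k == parts - 1 else own_lo + size - 1
        lo = max(0, own_lo - overlap)
        hi = min(n - 1, own_hi + overlap)
        # Local Dirichlet problem using whatever neighbour data is present.
        rhs = b[lo:hi + 1]
        if lo > 0:
            rhs[0] += u[lo - 1]
        if hi < n - 1:
            rhs[-1] += u[hi + 1]
        local = solve_local(rhs)
        # Restricted update: write back only the owned, non-overlapping part.
        u[own_lo:own_hi + 1] = local[own_lo - lo:own_hi - lo + 1]
        if residual_max(u, b) < tol:
            return u, updates, True
    return u, max_updates, False


u, updates, converged = async_ras()
h = 1.0 / (len(u) + 1)
# For f = 1 the exact solution x(1 - x)/2 is reproduced by the 3-point stencil.
error = max(abs(u[i] - 0.5 * (i + 1) * h * (1.0 - (i + 1) * h))
            for i in range(len(u)))
print("converged:", converged, "updates:", updates, "max error:", error)
```

In a distributed setting, the random choice of sub-domain would be replaced by each process iterating at its own pace, and the global residual check used here would itself need a non-blocking termination scheme, which this sketch does not attempt to model.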
Poster PDF