SHARP Streaming-Aggregation Hardware Design and Evaluation
Event Type
Research Paper
Time
Tuesday, June 23rd, 8:30pm - 8:55pm
Description
This paper introduces SHARP Streaming-Aggregation, a new hardware-based reduction capability of the Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), added to Mellanox’s HDR InfiniBand switches. This capability is designed to achieve reduction bandwidths similar to those of point-to-point messages of the same size, and it complements the latency-optimized Low-Latency-Aggregation reduction capability, which is aimed at small data reductions. MPI_Allreduce() bandwidth measured on an HDR InfiniBand-based system reaches about 96% of point-to-point bandwidth. For medium and large data reductions, this also improves the reduction bandwidth by a factor of about two to five relative to host-based (e.g., software-based) reduction algorithms. Using this capability also increased DL-Poly and PyTorch application performance by as much as 7% and 10%, respectively. This paper describes the SHARP Streaming-Aggregation hardware architecture, a set of synthetic and application benchmarks used to study this new reduction capability, and the range of data sizes for which Streaming-Aggregation performs better than the Low-Latency-Aggregation algorithm.
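For readers unfamiliar with hierarchical reduction, the following is a minimal conceptual sketch, in plain Python, of an allreduce performed up an aggregation tree and then broadcast back to every host. It is not the SHARP hardware design or any Mellanox API: the names `aggregate` and `tree_allreduce` and the `radix` parameter are hypothetical, and the sketch models only the arithmetic, not bandwidth, streaming, or switch behavior.

```python
from typing import List

Vector = List[float]


def aggregate(vectors: List[Vector]) -> Vector:
    """Element-wise sum of equally sized vectors (the work of one tree node)."""
    return [sum(column) for column in zip(*vectors)]


def tree_allreduce(host_buffers: List[Vector], radix: int = 2) -> List[Vector]:
    """Sum all host buffers through a reduction tree, then return one copy per host.

    `radix` is a hypothetical fan-in: how many children each aggregation
    node combines at every level of the tree.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    if not host_buffers:
        return []

    # Reduction phase: each level combines groups of `radix` partial results.
    level = host_buffers
    while len(level) > 1:
        level = [aggregate(level[i:i + radix]) for i in range(0, len(level), radix)]

    # Distribution phase: every host receives its own copy of the final result.
    result = level[0]
    return [list(result) for _ in host_buffers]


if __name__ == "__main__":
    buffers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(tree_allreduce(buffers, radix=2))
```

In this model the hosts contribute their data once and receive the reduced result once, while the summation happens at the intermediate tree nodes. That is the general idea behind offloading reductions to the network instead of running them on the hosts.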
Senior Director
Mellanox Senior Vice President of Marketing at NVIDIA
Staff Architect
Principal Architect