Presentation
Simplifying Communication Overlap in OpenSHMEM Through Integrated User-Level Thread Scheduling
Session: Research Paper Session
Event Type: Research Paper
Pre-Recorded
Time: Tuesday, June 23rd, 8:55pm - 9:20pm
Location: Digital
Description: Overlap of communication with computation is a key optimization for high-performance computing (HPC) applications. In this paper, we explore the use of user-level threading to enable productive and efficient communication overlap and pipelining. We extend OpenSHMEM with integrated user-level thread scheduling, enabling applications to leverage fine-grained threading as an alternative to non-blocking communication. Our solution introduces communication-aware thread scheduling that uses the communication state of threads to minimize context-switching overheads. We identify several patterns common to multithreaded OpenSHMEM applications, leverage user-level threads to increase overlap of communication and computation, and explore the impact of different thread scheduling policies. Results indicate that user-level threading can enable blocking communication to match the performance of highly optimized, non-blocking, single-threaded codes with significantly lower application-level complexity. In one case, we observe a 28.7% performance improvement for the Smith-Waterman DNA sequence alignment benchmark.
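The listing does not describe the scheduler's internals, so the following is only a toy model of the communication-aware idea, assuming a discrete simulated clock. Python generators stand in for user-level threads, and each yields a hypothetical ("get", LATENCY) request in place of a blocking OpenSHMEM get such as shmem_getmem. A naive round-robin scheduler resumes threads whose get is still in flight, and those threads wake only to yield again. A communication-aware scheduler instead consults each thread's pending-completion time and skips such threads without switching to them. The names worker, schedule, LATENCY, and COMPUTE are illustrative assumptions, not part of OpenSHMEM or the paper's runtime.

```python
from collections import deque

# Hypothetical costs for this toy model, in simulated clock ticks.
LATENCY = 5   # time for one blocking get to complete
COMPUTE = 2   # local computation per iteration


def worker(iters):
    """A user-level thread: each iteration blocks on a get, then computes."""
    for _ in range(iters):
        yield ("get", LATENCY)      # thread blocks inside the runtime
        yield ("compute", COMPUTE)  # local work on the fetched data


def schedule(nthreads, iters, comm_aware):
    """Run all workers to completion; return (finish_time, switches, wasted)."""
    threads = {tid: worker(iters) for tid in range(nthreads)}
    ready_at = dict.fromkeys(threads, 0)  # when each thread's pending get completes
    queue = deque(threads)                # round-robin run queue
    now = switches = wasted = polled = 0
    while queue:
        if comm_aware and all(ready_at[t] > now for t in queue):
            now = min(ready_at[t] for t in queue)  # idle until the earliest get completes
            continue
        tid = queue.popleft()
        if comm_aware and ready_at[tid] > now:
            queue.append(tid)  # skipped using its communication state: no switch
            continue
        switches += 1  # resuming a thread counts as one context switch
        if ready_at[tid] > now:
            # Naive policy only: the thread wakes, finds its get incomplete, yields.
            wasted += 1
            polled += 1
            queue.append(tid)
            if polled == len(queue):  # every live thread is blocked
                now = min(ready_at[t] for t in queue)
                polled = 0
            continue
        polled = 0
        try:
            while True:  # run until the thread blocks on its next get
                op, cost = next(threads[tid])
                if op == "compute":
                    now += cost
                else:
                    ready_at[tid] = now + cost
                    break
            queue.append(tid)
        except StopIteration:
            pass  # thread finished; it leaves the run queue
    return now, switches, wasted


print(schedule(4, 3, comm_aware=False))  # (29, 20, 4)
print(schedule(4, 3, comm_aware=True))   # (29, 16, 0)
```

Both policies finish at the same simulated time here because the model does not charge for a context switch. In a real runtime, each of the naive policy's wasted switches would add overhead, and that overhead is what a communication-aware policy avoids.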