Simplifying Communication Overlap in OpenSHMEM Through Integrated User-Level Thread Scheduling
Performance Analysis and Optimization
Programming Models & Languages
System Software & Runtime Systems
Time: Tuesday, June 23rd, 4:15pm - 4:45pm
Location: Analog 1, 2
Description: Overlap of communication with computation is a key optimization for high performance computing (HPC) applications. In this paper, we explore the usage of user-level threading to enable productive and efficient communication overlap and pipelining. We extend OpenSHMEM with integrated user-level thread scheduling, enabling applications to leverage fine-grain threading as an alternative to non-blocking communication. Our solution introduces communication-aware thread scheduling that utilizes the communication state of threads to minimize context switching overheads. We identify several patterns common to multithreaded OpenSHMEM applications, leverage user-level threads to increase overlap of communication and computation, and explore the impact of different thread scheduling policies. Results indicate that user-level threading can enable blocking communication to meet the performance of highly optimized, non-blocking, single-threaded codes with significantly lower application-level complexity. In one case, we observe a 28.7% performance improvement for the Smith-Waterman DNA sequence alignment benchmark.
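The idea of communication-aware scheduling can be illustrated with a small simulation. The sketch below is not the paper's implementation and does not use the OpenSHMEM API. It assumes Python generators as stand-ins for user-level threads, a simulated clock in place of a real network, and hypothetical names (`worker`, `run`, `latency`). Each thread issues a "blocking" get by yielding to the scheduler. A naive round-robin scheduler switches into every thread in turn, even when its data has not arrived, while a communication-aware scheduler checks the thread's communication state first and skips threads that are still waiting.

```python
from collections import deque


def worker(name, n_chunks, latency, log):
    """Hypothetical user-level thread: fetch a chunk, then compute on it."""
    for chunk in range(n_chunks):
        # Stands in for a blocking get: hand control back to the scheduler
        # and report how many ticks the transfer needs (an assumption of
        # this simulation, not an OpenSHMEM call).
        yield latency
        log.append((name, chunk))  # "compute" on the fetched chunk


def run(threads, comm_aware):
    """Round-robin scheduler over generator-based threads.

    Returns the number of context switches performed. One loop iteration
    is one tick of simulated time.
    """
    clock = 0
    switches = 0
    # Each entry: (thread, tick at which its pending transfer completes).
    queue = deque((t, 0) for t in threads)
    while queue:
        thread, ready_at = queue.popleft()
        clock += 1
        if clock < ready_at:
            # The thread's data has not arrived yet.
            if not comm_aware:
                # Naive policy: switch in anyway; the thread finds nothing
                # to do and yields straight back (a wasted switch).
                switches += 1
            queue.append((thread, ready_at))
            continue
        switches += 1  # useful switch: the thread can make progress
        try:
            latency = next(thread)
            queue.append((thread, clock + latency))
        except StopIteration:
            pass  # thread finished
    return switches


def demo(comm_aware):
    """Run two threads with three chunks each; return (switches, log)."""
    log = []
    threads = [worker("A", 3, 4, log), worker("B", 3, 4, log)]
    return run(threads, comm_aware), log
```

Calling `demo(True)` and `demo(False)` performs the same work in the same order, but the communication-aware run does so with fewer context switches, because it never switches into a thread whose transfer is still in flight. In a real runtime the readiness check would consult actual communication state rather than a simulated clock.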