Log-based Fingerprinting of HPC Applications
Time: Tuesday, June 23rd, 4pm - 4:05pm
Description: High Performance Computing (HPC) is an important method for scientific discovery via large-scale simulation, data analysis, and artificial intelligence.
Leadership supercomputers are expensive, but essential to run large HPC applications.
In order to improve our understanding of HPC applications, user demands, and resource usage characteristics, we perform correlative analysis of various logs for different subsystems of a leadership supercomputer.
Drawing on the holistic view of each application that this combined analysis of multiple logs provides, together with general intuition, we engineer features to "fingerprint" an HPC application.
We use t-SNE (a machine learning technique for dimensionality reduction) to validate the explainability of our features and finally train machine learning models to identify HPC applications and group applications with similar characteristics.
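As a rough illustration of identifying an application from its fingerprint, the sketch below uses a nearest-centroid classifier in plain Python. This is a stand-in chosen for brevity, not the models used in this work, which the abstract does not specify. The application names, the three-element feature layout, and all numeric values are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical fingerprints: [io_share, comm_share, cpu_utilization].
# Names and values are invented for illustration only.
TRAINING_JOBS = [
    ("app_a", [0.70, 0.10, 0.30]),
    ("app_a", [0.66, 0.14, 0.34]),
    ("app_b", [0.05, 0.60, 0.80]),
    ("app_b", [0.07, 0.56, 0.84]),
]


def train_centroids(samples):
    """Average the feature vectors of each application into one centroid."""
    sums = {}
    counts = defaultdict(int)
    for app, vec in samples:
        if app not in sums:
            sums[app] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[app][i] += value
        counts[app] += 1
    return {app: [s / counts[app] for s in total] for app, total in sums.items()}


def identify(centroids, vec):
    """Return the application whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda app: math.dist(centroids[app], vec))


centroids = train_centroids(TRAINING_JOBS)
print(identify(centroids, [0.65, 0.11, 0.31]))  # closest to the app_a centroid
```

In practice the features would likely need normalization before distances are compared, and a job that is far from every centroid could be flagged as unknown rather than forced into the nearest class.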
The learned application representation can be used for verification, identification (e.g. identifying unapproved applications) and prediction using machine learning models.
Since the fingerprint is built on performance counters, we can use it to categorize applications, for example as I/O intensive, communication intensive, computing intensive, or memory bound.
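One simple way to picture this categorization is to label a job by whichever counter-derived share dominates its fingerprint. The sketch below assumes the fingerprint has already been reduced to four normalized shares. The field names and the "largest share wins" rule are assumptions made for illustration, not the method described in this work.

```python
# Hypothetical mapping from a normalized counter-derived share to a category.
CATEGORY_BY_COUNTER = {
    "io_time_share": "I/O intensive",
    "comm_time_share": "communication intensive",
    "compute_time_share": "computing intensive",
    "memory_stall_share": "memory bound",
}


def categorize(fingerprint):
    """Label a job by its largest share; missing counters count as 0.0."""
    dominant = max(
        CATEGORY_BY_COUNTER,
        key=lambda name: fingerprint.get(name, 0.0),
    )
    return CATEGORY_BY_COUNTER[dominant]


# Invented example job: most of its time is attributed to I/O.
job = {
    "io_time_share": 0.55,
    "comm_time_share": 0.15,
    "compute_time_share": 0.20,
    "memory_stall_share": 0.10,
}
print(categorize(job))  # I/O intensive
```

A real classification would probably use thresholds or learned boundaries so that a job can carry more than one label, or none, when no single share dominates.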
Such categories can be used to target applications for optimization and other specialized services.
We also believe that this characterization is useful for optimizing facility management, improving energy efficiency, and optimizing scheduling policy.
Moreover, log-based characterization places no extra instrumentation burden on the supercomputer, and adds value to logs originally collected for debugging, troubleshooting, and auditing purposes.
We also hope that insights gained from our analysis can suggest directions for further analysis and encourage other HPC centers to undertake similar efforts.