Presentation
Leveraging LSTMs for interference-aware run-time system Predictability of cloud workloads
Session: PhD Forum Posters
Event Type: PhD Forum, Pre-Recorded
Time: Monday, June 22nd, 10:06pm - 10:44pm
Location: Applaus
Description: Modern micro-service and container-based cloud-native applications have adopted multi-tenancy as a first-class system design concern. The increasing number of services/workloads co-located in server facilities stresses resource availability and system capability in an unconventional and unpredictable manner.
To efficiently manage resources in such dynamic environments, run-time observability and forecasting are required to capture workload sensitivities under differing interference effects, according to applied co-location scenarios.
While several research efforts have emerged on interference-aware performance modelling, they are usually applied in a very coarse-grained manner, e.g. estimating the overall performance degradation of an application, and thus fail to effectively quantify, predict or provide educated insights on the impact of continuous runtime interference on per-resource allocations.
In this work, we present a predictive monitoring system that leverages the power of Long Short-Term Memory networks to enable fast and accurate runtime forecasting of key performance metrics and resource stresses of cloud-native applications under interference.
We evaluate our approach under a diverse set of interference scenarios for a range of representative cloud workloads, showing that we i) achieve very high prediction accuracy, with an average R^2 value of 0.98, ii) enable deep prediction horizons while retaining high accuracy, e.g. R^2 of around 0.99 for a horizon of 1 sec ahead and around 0.94 for a horizon of 5 sec ahead, and iii) satisfy, at the same time, the strict latency constraints required to make our proposed framework practical for continuous predictive monitoring at runtime.
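To make the reported metric concrete, the following is a minimal sketch of how R^2 at a given prediction horizon could be computed from a monitored time series. It is not the authors' pipeline: the helper names `make_windows` and `r2_score`, the synthetic metric, the one-sample-per-second assumption, and the persistence baseline (a stand-in for the LSTM forecaster) are all illustrative assumptions.

```python
import numpy as np


def make_windows(series, history, horizon):
    """Pair each window of `history` past samples with the value `horizon` steps ahead.

    Hypothetical helper: the sampling interval is an assumption, so a horizon
    of 5 means "5 samples ahead", which is 5 sec only at 1 sample per second.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - history - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this history/horizon")
    X = np.stack([series[i:i + history] for i in range(n)])
    start = history + horizon - 1
    y = series[start:start + n]
    return X, y


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic stand-in for a monitored metric (e.g. a utilisation trace).
    rng = np.random.default_rng(0)
    t = np.arange(600)
    metric = 0.5 + 0.3 * np.sin(2 * np.pi * t / 120) + 0.02 * rng.standard_normal(t.size)

    for horizon in (1, 5):
        X, y = make_windows(metric, history=10, horizon=horizon)
        # Persistence baseline: predict that the last observed value persists.
        # A trained forecaster would replace this line.
        y_pred = X[:, -1]
        print(f"horizon={horizon}: R^2={r2_score(y, y_pred):.3f}")
```

An R^2 of 1 means the predictions match the observed values exactly, while 0 means they do no better than always predicting the mean, which is why values of 0.94 to 0.99 at multi-second horizons indicate that the forecasts track the metric closely.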
Poster PDF