Leveraging LSTMs for interference-aware run-time system Predictability of cloud workloads
Time: Monday, June 22nd, 10:06pm - 10:44pm
Description: Modern micro-service and container-based cloud-native applications have made multi-tenancy a first-class system design concern. The increasing number of services and workloads co-located in server facilities stresses resource availability and system capability in unconventional and unpredictable ways.
To efficiently manage resources in such dynamic environments, run-time observability and forecasting are required to capture workload sensitivities under differing interference effects, according to applied co-location scenarios.
While several research efforts have emerged on interference-aware performance modelling, they are usually applied in a very coarse-grained manner, e.g. estimating the overall performance degradation of an application, and thus fail to effectively quantify, predict or provide educated insights on the impact of continuous runtime interference on per-resource allocations.
In this work, we present a predictive monitoring system that leverages the power of Long Short-Term Memory networks to enable fast and accurate runtime forecasting of key performance metrics and resource stresses of cloud-native applications under interference.
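As an illustration of how runtime forecasting of this kind is commonly framed, the sketch below turns a stream of metric samples into supervised (history window, future value) pairs that a sequence model such as an LSTM could be trained on. The abstract does not describe the authors' actual pipeline, so the function name `make_windows`, the parameters `history` and `horizon`, and the sample values are all hypothetical.

```python
def make_windows(series, history, horizon):
    """Build (inputs, targets) pairs from a list of metric samples.

    Each input holds `history` consecutive samples. Its target is the
    sample that lies `horizon` steps after the last sample of the input.
    All names here are illustrative assumptions, not the authors' API.
    """
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be positive")
    inputs, targets = [], []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        end = start + history  # index just past the input window
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Hypothetical utilisation samples taken at a fixed monitoring interval.
samples = [0.10, 0.15, 0.30, 0.55, 0.60, 0.40]
windows, future_values = make_windows(samples, history=3, horizon=2)
```

A larger `horizon` corresponds to predicting further ahead, at the cost of fewer training pairs from the same trace.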
We evaluate our approach under a diverse set of interference scenarios for a wide range of representative cloud workloads, showing that it i) achieves very high prediction accuracy, with an average R^2 value of 0.98, ii) enables deep prediction horizons while retaining high accuracy, e.g. R^2 of around 0.99 for a horizon of 1 sec ahead and around 0.94 for a horizon of 5 sec ahead, and iii) satisfies, at the same time, the strict latency constraints required to make the proposed framework practical for continuous predictive monitoring at runtime.
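For readers unfamiliar with the accuracy metric quoted above, R^2 (the coefficient of determination) is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observed values from their mean. The following is a minimal sketch of that standard definition; the helper name `r2_score` and the toy values are assumptions for illustration and are not taken from the evaluation.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Toy observed and forecast values (made up for this example).
observed = [1.0, 2.0, 3.0]
perfect = r2_score(observed, [1.0, 2.0, 3.0])     # forecasts match exactly
mean_only = r2_score(observed, [2.0, 2.0, 2.0])   # always predicts the mean
```

A value of 1 means the forecasts match the observations exactly, while a value of 0 means they do no better than always predicting the mean of the observations.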