Scaling Genomics Data Processing with Memory-Driven Computing
Time: Tuesday, June 23rd, 7:15pm - 7:40pm
Description: Research is increasingly data-driven, and the natural sciences are no exception. In biology and medicine, we observe an exponential growth of data that allows us to gain novel insights that would not be possible without data science and increasing efforts toward structured collections from experiments and population studies. However, these growing data sets pose a challenge for existing compute infrastructures, since data growth outpaces compute capacity. In this work, we present the application of a novel approach, Memory-Driven Computing (MDC), in the life sciences. MDC proposes a data-centric approach that has been designed for growing data sizes and provides a composable infrastructure for changing workloads. In particular, we show how a typical pipeline for genomics data processing can be accelerated and which modifications are required to exploit this novel architecture. Furthermore, we demonstrate how the isolated evaluation of individual tasks misses the major overheads of typical pipelines in genomics data processing.