Scaling Genomics Data Processing with Memory-Driven Computing to Accelerate Computational Biology
Scientific Software Development
Time: Wednesday, June 24th, 4:00pm - 4:30pm
Location: Analog 1, 2
Description: Research is increasingly data-driven, and the natural sciences are no exception. In biology and medicine, we observe exponential growth of data, which yields novel insights that would not be possible without data science and the increasing efforts to build structured collections from experiments and population studies. However, these growing data sets pose a challenge to existing compute infrastructures, because data volumes are outgrowing compute capacity. In this work, we present the application of a novel approach, Memory-Driven Computing (MDC), in the life sciences. MDC is a data-centric approach designed for growing data sizes, and it provides a composable infrastructure for changing workloads. In particular, we show how a typical genomics data processing pipeline can be accelerated and which modifications are required to exploit this novel architecture. Furthermore, we demonstrate that evaluating individual tasks in isolation misses the major overheads of typical genomics data processing pipelines.