Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers

Laura Wratten, Andreas Wilm, Jonathan Göke
Published in: Nature Methods (2021)
The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not just for experiments, but also for computational analysis. However, transforming data into information involves running a large number of tools, optimizing parameters, and integrating dynamically changing reference data. Workflow managers were developed in response to such challenges. They simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing. In this Perspective, we highlight key features of workflow managers, compare commonly used approaches for bioinformatics workflows, and provide a guide for computational and noncomputational users. We outline community-curated pipeline initiatives that enable novice and experienced users to perform complex, best-practice analyses without having to manually assemble workflows. In sum, we illustrate how workflow managers contribute to making computational analysis in biomedical research shareable, scalable, and reproducible.
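
To make the idea concrete, the sketch below shows what such a pipeline definition can look like in Snakemake, one of the Python-based workflow managers this Perspective compares. The filenames, environment files, and tool invocations are illustrative assumptions, not examples taken from the paper; they show how each step declares its inputs, outputs, software environment, and resource needs.

```python
# Minimal Snakemake sketch (hypothetical files and tools).
# Each rule declares inputs and outputs, so the workflow manager can
# infer the dependency graph and re-run only out-of-date steps.

rule all:
    input:
        "results/sample1.sorted.bam"  # final target of the workflow

rule align:
    input:
        reads="data/sample1.fastq.gz",  # assumed raw reads
        ref="refs/genome.fa"            # assumed reference genome
    output:
        "results/sample1.bam"
    conda:
        "envs/bwa.yaml"   # pinned tool versions for reproducibility
    threads: 4            # resource request passed to the scheduler
    shell:
        "bwa mem -t {threads} {input.ref} {input.reads} "
        "| samtools view -b - > {output}"

rule sort:
    input:
        "results/sample1.bam"
    output:
        "results/sample1.sorted.bam"
    conda:
        "envs/samtools.yaml"
    shell:
        "samtools sort -o {output} {input}"
```

Running `snakemake --use-conda --cores 8` would then resolve the pinned software environments and execute the steps in dependency order; the same definition can be dispatched to a laptop, cluster, or cloud backend without changes to the pipeline itself, which is the portability the abstract describes.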