DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

Michael D Linderman, Davin Chia, Forrest Wallace, Frank A Nothaft
Published in: BMC Bioinformatics (2019)
We describe DECA's performance, our algorithmic and implementation enhancements to XHMM to obtain that performance, and our lessons learned porting a complex genome analysis application to ADAM and Spark. ADAM and Apache Spark are a performant and productive platform for implementing large-scale genome analyses, but efficiently utilizing large clusters can require algorithmic optimizations and careful attention to Spark's configuration parameters.