DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.
Michael D Linderman, Davin Chia, Forrest Wallace, Frank A Nothaft
Published in: BMC Bioinformatics (2019)
We describe DECA's performance, the algorithmic and implementation enhancements to XHMM that made that performance possible, and the lessons we learned porting a complex genome analysis application to ADAM and Spark. Together, ADAM and Apache Spark form a performant and productive platform for implementing large-scale genome analyses, but efficiently utilizing large clusters can require algorithmic optimizations and careful attention to Spark's configuration parameters.