BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data.

Kristiina Ausmees, Aji John, Salman Z Toor, Andreas Hellander, Carl Nettelblad
Published in: BMC Bioinformatics (2018)
BAMSI constitutes a framework for efficient filtering of large genomic data sets that is flexible in the use of compute as well as storage resources. The data resulting from the filter is assumed to be greatly reduced in size, and can easily be downloaded or routed into e.g. a Hadoop cluster for subsequent interactive analysis using Hive, Spark or similar tools. In this respect, our framework also suggests a general model for making very large datasets of high scientific value more accessible by offering the possibility for organizations to share the cost of hosting data on hot storage, without compromising the scalability of downstream analysis.
Keyphrases
  • electronic health record
  • big data
  • healthcare
  • mental health
  • machine learning
  • data analysis
  • rna seq
  • single cell
  • deep learning