bamSliceR: cross-cohort variant and allelic bias analysis for rare variants and rare diseases.
Yizhou Peter Huang, Lauren Harmon, Eve Deering-Gardner, Xiaotu Ma, Josiah Harsh, Zhaoyu Xue, Hong Wen, Marcel Ramos, Sean Davis, Timothy J Triche
Published in: bioRxiv: the preprint server for biology (2023)
Rare diseases and conditions create unique challenges for genetic epidemiologists precisely because cases and samples are scarce. In recent years, whole-genome and whole-transcriptome sequencing (WGS/WTS) have eased the study of rare genetic variants. Paired WGS and WTS data are ideal, but logistical and financial constraints often preclude generating both for the same specimen. As a result, many databases contain a patchwork of specimens with either WGS or WTS data, and only a minority of samples have both. The NCI Genomic Data Commons facilitates controlled access to genomic and transcriptomic data for thousands of subjects, many with unpaired sequencing results. Local reanalysis of expressed variants across whole transcriptomes requires significant data storage, compute, and expertise. We developed the bamSliceR package to facilitate a swift transition from aligned sequence reads to expressed variant characterization. bamSliceR leverages the NCI Genomic Data Commons API to query genomic sub-regions of aligned sequence reads from specimens identified through the robust Bioconductor ecosystem. We demonstrate that population-scale targeted genomic analysis can be completed in this fashion using orders of magnitude fewer resources and with minimal compute burden. We also present pilot results from bamSliceR for the TARGET pediatric AML and BEAT-AML projects, where identification of rare but recurrent somatic variants directly yields biologically testable hypotheses. bamSliceR and its documentation are freely available on GitHub at https://github.com/trichelab/bamSliceR.
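
To illustrate what querying a genomic sub-region of aligned reads can look like, the following is a minimal Python sketch that builds a request against the public GDC BAM slicing endpoint. It is not bamSliceR's own interface (bamSliceR is an R package); the function names, the placeholder file UUID, and the example coordinates are assumptions made for illustration only. The request is constructed but not sent, because controlled-access files require a valid GDC authentication token.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public GDC endpoint for BAM slicing; the file UUID is appended to the path.
GDC_SLICING_ENDPOINT = "https://api.gdc.cancer.gov/slicing/view"


def build_slice_url(file_uuid, regions):
    """Return the slicing URL for one BAM file and a list of regions.

    Each region is a string such as "chr2" or "chr2:100-200" and is passed
    as a repeated "region" query parameter.
    """
    # safe=":" keeps the chromosome:start-end form readable in the URL.
    query = urlencode([("region", r) for r in regions], safe=":")
    return f"{GDC_SLICING_ENDPOINT}/{file_uuid}?{query}"


def build_slice_request(file_uuid, regions, token):
    """Return an unsent GET request carrying the GDC token header."""
    return Request(
        build_slice_url(file_uuid, regions),
        headers={"X-Auth-Token": token},
    )


if __name__ == "__main__":
    # Hypothetical placeholders: substitute a real file UUID and token.
    example_uuid = "FILE-UUID-PLACEHOLDER"
    # Arbitrary illustrative coordinates, not taken from the paper.
    example_regions = ["chr2:100-200", "chr17"]
    print(build_slice_url(example_uuid, example_regions))
```

Requesting only the regions of interest means that a small slice of each BAM file is transferred rather than the whole alignment, which is the source of the resource savings described in the abstract.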