Estimation of neutral mutation rates and quantification of somatic variant selection using cancereffectsizeR.

Jeffrey D Mandell, Vincent L Cannataro, Jeffrey P Townsend
Published in: Cancer research (2022)
Somatic nucleotide mutations, or variants, can contribute to cancer cell survival, proliferation, and pathogenesis. Quantifying the proliferative effects of specific variants within clinically relevant contexts has scientific and therapeutic implications regarding cancer biology, prognosis, and treatment. To enable researchers to estimate these cancer effects, we have developed cancereffectsizeR, an R package that organizes somatic variant data, facilitates mutational signature analysis, calculates site-specific mutation rates, and tests models of selection. Built-in models support effect estimation at arbitrary scale, from single nucleotides to genes. Users may also estimate epistatic effects between paired sets of variants, or design and test custom models. We validate the utility of cancer effect by showing in a pan-cancer data set that somatic variants classified as likely pathogenic or pathogenic in ClinVar exhibit substantially higher effects than most other variants. Indeed, a multiple logistic regression demonstrates cancer effect to be a better predictor of pathogenic status than variant prevalence or functional impact scores such as SIFT or PolyPhen-2. In addition, we illustrate the application of this approach toward pairwise epistasis in lung adenocarcinoma, showing that driver mutations in any of BRAF, EGFR, and KRAS typically reduce selection for alterations in the other two genes. Companion reference data packages support analyses using the hg19 or hg38 human genome builds, and an included reference data builder enables use of the package with any species or custom genome build for which genomic and transcriptomic data are available. A reference manual, tutorial, and public source code repository are available at https://townsend-lab-yale.github.io/cancereffectsizeR.