A Bayesian model for unsupervised detection of RNA splicing based subtypes in cancers.
David Wang, Mathieu Quesnel-Vallieres, San Jewell, Moein Elzubeir, Kristen Lynch, Andrei Thomas-Tikhonenko, Yoseph Barash
Published in: Nature Communications (2023)
Identification of cancer subtypes is a pivotal step toward developing personalized treatment. In particular, several recent studies have motivated subtyping based on changes in RNA splicing. We therefore develop CHESSBOARD, an unsupervised algorithm tailored to RNA splicing data. It captures "tiles" in the data, each defined by a subset of unique splicing changes in a subset of patients. CHESSBOARD allows for a flexible number of tiles, accounts for uncertainty in splicing quantification, and can model missing values as an additional signal. We first apply CHESSBOARD to synthetic data to assess the advantages of its domain-specific modeling, and then analyze several leukemia datasets. We show that the detected tiles are reproducible in independent studies, investigate their possible regulatory drivers, and probe their relation to known acute myeloid leukemia (AML) mutations. Finally, we demonstrate the potential clinical utility of CHESSBOARD by supplementing mutation-based diagnostic assays with the discovered splicing profiles to improve correlation with drug response.
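The abstract does not spell out CHESSBOARD's model. The sketch below is only a rough, hypothetical illustration of the three ideas it names: a "tile" as a block of events by patients, uncertainty in splicing quantification, and missing values as a signal. It assumes splicing is summarized as PSI (percent spliced in) per event and patient, and that PSI uncertainty follows a Beta posterior from inclusion and exclusion read counts under a uniform prior. It also assumes a tile can be scored with a simple mean-difference contrast. The function names `psi_posterior`, `tile_contrast`, and `missing_indicator` are hypothetical. None of this is CHESSBOARD's Bayesian likelihood or inference procedure.

```python
import numpy as np

def psi_posterior(inclusion_reads, exclusion_reads):
    """Posterior mean and variance of PSI under an assumed uniform Beta(1, 1) prior.
    More reads give a tighter posterior, i.e. less quantification uncertainty."""
    a = inclusion_reads + 1
    b = exclusion_reads + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def tile_contrast(psi, events, patients):
    """Hypothetical tile score: mean PSI inside a candidate tile minus the mean PSI
    of the same events in all other patients. NaNs (unquantified events) are ignored."""
    in_tile = np.zeros(psi.shape[1], dtype=bool)
    in_tile[patients] = True
    inside = psi[np.ix_(events, np.flatnonzero(in_tile))]
    outside = psi[np.ix_(events, np.flatnonzero(~in_tile))]
    return np.nanmean(inside) - np.nanmean(outside)

def missing_indicator(psi):
    """Binary matrix that treats missingness as its own signal (1 = missing)."""
    return np.isnan(psi).astype(int)

# Toy PSI matrix: rows are splicing events, columns are patients.
rng = np.random.default_rng(0)
psi = rng.uniform(0.4, 0.6, size=(6, 8))
tile_events, tile_patients = [0, 1, 2], [0, 1, 2, 3]
psi[np.ix_(tile_events, tile_patients)] = 0.95  # planted tile: high inclusion
psi[4, 5] = np.nan                               # one unquantified event

planted = tile_contrast(psi, tile_events, tile_patients)
background = tile_contrast(psi, [3, 4, 5], tile_patients)
print(round(planted, 2), round(background, 2), missing_indicator(psi).sum())
```

In this toy setup, the planted block scores a clearly positive contrast. The same patients paired with untouched events score near zero. A probabilistic method would replace the hard contrast with a likelihood that weights each PSI value by its posterior uncertainty.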