Agnostic Framework for the Classification/Identification of Organisms Based on RNA Post-Transcriptional Modifications.
William D McIntyre, Reza Nemati, Mehraveh Salehi, Colin C Aldrich, Molly FitzGibbon, Limin Deng, Manuel A Pazos, Rebecca E Rose, Botros Toro, Rachel E Netzband, Cara T Pager, Ingrid P Robinson, Sean M Bialosuknia, Alexander T Ciota, Daniele Fabris
Published in: Analytical chemistry (2021)
We propose a novel approach for building a classification/identification framework based on the full complement of RNA post-transcriptional modifications (rPTMs) expressed by an organism at basal conditions. The approach relies on advanced mass spectrometry techniques to characterize the products of exonuclease digestion of total RNA extracts. Sample profiles comprising identities and relative abundances of all detected rPTMs were used to train and test the capabilities of different machine learning (ML) algorithms. Each algorithm proved capable of identifying rigorous decision rules for differentiating closely related classes and correctly assigning unlabeled samples. The ML classifiers resolved different members of the Enterobacteriaceae family, alternative Escherichia coli serotypes, a series of Saccharomyces cerevisiae knockout mutants, and primary cells of the Homo sapiens central nervous system, which shared very similar genetic backgrounds. The excellent levels of accuracy and resolving power achieved by training on a limited number of classes were successfully replicated when the number of classes was significantly increased to escalate complexity. A dendrogram generated from ML-curated data exhibited a hierarchical organization that closely resembled those afforded by established taxonomic systems. Finer clustering patterns revealed the extensive effects induced by the deletion of a single pivotal gene. This information provided a putative roadmap for exploring the roles of rPTMs in their respective regulatory networks, which will be essential to decipher the epitranscriptomics code. The ubiquitous presence of RNA in virtually all living organisms promises to enable the broadest possible range of applications, with significant implications in the diagnosis of RNA-related diseases.
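The idea of classifying samples from rPTM relative-abundance profiles can be illustrated with a minimal sketch. The abstract does not name the ML algorithms used, so a nearest-centroid rule stands in here purely for illustration; the class labels, the three modifications chosen (m6A, m5C, and pseudouridine, written Y), and all intensity values are hypothetical and are not data from the study.

```python
import math


def normalize(profile):
    """Convert raw rPTM signal intensities to relative abundances that sum to 1."""
    total = sum(profile.values())
    return {mod: value / total for mod, value in profile.items()}


def centroid(profiles):
    """Average a list of relative-abundance profiles, modification by modification."""
    mods = sorted({m for p in profiles for m in p})
    return {m: sum(p.get(m, 0.0) for p in profiles) / len(profiles) for m in mods}


def distance(a, b):
    """Euclidean distance between two profiles; a missing modification counts as 0."""
    mods = set(a) | set(b)
    return math.sqrt(sum((a.get(m, 0.0) - b.get(m, 0.0)) ** 2 for m in mods))


def train(labelled_samples):
    """Build one centroid per class from (label, raw_profile) pairs."""
    by_class = {}
    for label, profile in labelled_samples:
        by_class.setdefault(label, []).append(normalize(profile))
    return {label: centroid(ps) for label, ps in by_class.items()}


def classify(model, profile):
    """Assign an unlabeled sample to the class with the nearest centroid."""
    rel = normalize(profile)
    return min(model, key=lambda label: distance(model[label], rel))


# Hypothetical training set: labels and intensities are invented for illustration.
TRAINING = [
    ("organism_A", {"m6A": 30, "m5C": 10, "Y": 60}),
    ("organism_A", {"m6A": 28, "m5C": 12, "Y": 60}),
    ("organism_B", {"m6A": 10, "m5C": 40, "Y": 50}),
    ("organism_B", {"m6A": 12, "m5C": 38, "Y": 50}),
]

model = train(TRAINING)

# An unlabeled sample measured at a different overall signal level;
# normalization makes it comparable to the training profiles.
prediction = classify(model, {"m6A": 58, "m5C": 22, "Y": 120})
print(prediction)
```

Normalizing to relative abundances before comparison reflects the abstract's description of profiles as "identities and relative abundances"; a real workflow would likely use many more modifications and a validated train/test split.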
Keyphrases
- machine learning
- deep learning
- escherichia coli
- saccharomyces cerevisiae
- mass spectrometry
- big data
- artificial intelligence
- transcription factor
- nucleic acid
- gene expression
- induced apoptosis
- single cell
- genome wide
- high resolution
- social media
- cell death
- ms ms
- cell proliferation
- health information
- cell cycle arrest
- cerebrospinal fluid
- bioinformatics analysis
- endoplasmic reticulum stress
- rna seq
- wild type
- gas chromatography
- heat shock
- urinary tract infection
- simultaneous determination