RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization.

Guo-Hua Yuan, Ying Wang, Guang-Zhong Wang, Li Yang
Published in: Briefings in Bioinformatics (2022)
Different RNAs have distinct subcellular localizations. However, the nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. Using the Tree SHAP algorithm, RNAlight extracts nucleotide features underlying cytoplasmic or nuclear localization of RNAs, indicating a sequence basis for distinct RNA subcellular localizations. By assembling k-mers into sequence features and subsequently mapping them to known RBP-associated motifs, we additionally uncovered different types of sequence features, and their associated RBPs, for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting that RNAlight is generally applicable to RNA subcellular localization prediction.
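The abstract does not specify how RNAlight encodes sequences, so the following is only a minimal sketch of the general idea of k-mer features, not the authors' implementation. It turns an RNA sequence into a fixed-length vector of overlapping k-mer frequencies, the kind of input a gradient-boosted tree model such as LightGBM could be trained on and Tree SHAP could then attribute. The function names `kmer_vocabulary` and `kmer_frequencies`, the default k values of 3 to 5, and the per-k normalization are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: sequences are RNA (A/C/G/U); DNA-style T is converted to U.
ALPHABET = "ACGU"


def kmer_vocabulary(k_values):
    """List every possible k-mer for each k, in a fixed lexicographic order."""
    return ["".join(p) for k in k_values for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(seq, k_values=(3, 4, 5)):
    """Map every k-mer in the vocabulary to its relative frequency in `seq`.

    Hypothetical helper: counts use overlapping sliding windows and are
    normalized separately for each k, so transcripts of different lengths
    yield comparable vectors. Windows containing characters outside ACGU
    (e.g. N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    features = dict.fromkeys(kmer_vocabulary(k_values), 0.0)
    for k in k_values:
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        valid = {kmer: n for kmer, n in counts.items() if set(kmer) <= set(ALPHABET)}
        total = sum(valid.values())
        for kmer, n in valid.items():
            features[kmer] = n / total
    return features


if __name__ == "__main__":
    print(len(kmer_vocabulary((3, 4, 5))))  # 64 + 256 + 1024 = 1344 features
    freqs = kmer_frequencies("ACGUACGU", k_values=(2,))
    print(round(freqs["AC"], 4))  # AC occurs in 2 of 7 windows -> 0.2857
```

Because `dict.fromkeys` preserves the vocabulary order, `list(kmer_frequencies(seq).values())` gives every transcript a vector with identical column order, which a tabular classifier requires. Whether RNAlight normalizes per k, uses raw counts, or adds other features is not stated in the abstract.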