Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes.
Siddharth Sethi, David Zhang, Sebastian Guelfi, Zhongbo Chen, Sonia Garcia-Ruiz, Emmanuel O Olagbaju, Mina Ryten, Harpreet Saini, Juan A Botia
Published in: Nature communications (2022)
There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource F3UTER (https://astx.shinyapps.io/F3UTER/). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.
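The abstract reports that unannotated 3'UTRs are "significantly enriched" for RBP motifs but does not say which statistical test was used. As a rough illustration of what such an enrichment claim can mean, the sketch below computes a one-sided hypergeometric p-value. This is an assumed, generic approach and not necessarily the authors' method. The function name `hypergeom_enrichment_p` and all counts in the example are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: total number of regions in the background set
    K: background regions that contain the motif
    n: number of regions in the set of interest (e.g. predicted 3'UTRs)
    k: regions in the set of interest that contain the motif
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("counts must satisfy 0 <= K, n <= N and 0 <= k <= n")
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability mass for observing k or more motif-carrying regions.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p = hypergeom_enrichment_p(k=40, n=100, K=2000, N=10000)
    print(f"enrichment p-value: {p:.3g}")
```

A small p-value here would indicate that the motif occurs in the set of interest more often than expected from the background rate. In practice the result depends heavily on how the background set is chosen, and testing many motifs requires a multiple-testing correction.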
Keyphrases
- endothelial cells
- binding protein
- machine learning
- genome wide
- bioinformatics analysis
- induced pluripotent stem cells
- pluripotent stem cells
- electronic health record
- cerebral ischemia
- single cell
- white matter
- gene expression
- genome wide identification
- transcription factor
- rna seq
- dna methylation
- copy number
- blood brain barrier
- deep learning