Surveying the landscape of RNA isoform diversity and expression across 9 GTEx tissues using long-read sequencing data.
Madeline L Page, Bernardo Aguzzoli Heberle, J Anthony Brandon, Mark E Wadsworth, Lacey A Gordon, Kayla A Nations, Mark T W Ebbert. Published in: bioRxiv : the preprint server for biology (2024)
Even though alternative RNA splicing was discovered in 1977 (nearly 50 years ago), we still understand very little about most isoforms arising from a single gene, including in which tissues they are expressed and whether their functions differ. Human gene annotations suggest remarkable transcriptional complexity, with approximately 252,798 distinct RNA isoform annotations from 62,710 gene bodies (Ensembl v109; 2023), emphasizing the need to understand their biological effects. For example, 256 gene bodies have ≥50 annotated isoforms and 30 have ≥100; one protein-coding gene (MAPK10) even has 192 distinct RNA isoform annotations. Whether such isoform diversity results from biological noise (i.e., spurious alternative splicing) or represents biological intent and specialized functions (even if subtle) remains a mystery. Recent studies by Aguzzoli-Heberle et al., Leung et al., and Glinos et al. demonstrate that long-read RNAseq enables improved RNA isoform quantification for essentially any tissue, cell type, or biological condition (e.g., disease, development, aging), making it possible to better assess individual isoform expression and function. While each study provided important discoveries related to RNA isoform diversity, deeper exploration is needed. We sought, in part, to quantify real isoform usage across tissues (compared to annotations) and explore whether observed diversity is biological noise or intent. We used long-read RNAseq data from 58 GTEx samples across nine tissues (three brain, two heart, muscle, lung, liver, and cultured fibroblasts) generated by Glinos et al. and found considerable isoform diversity within and across tissues. Cerebellar hemisphere was the most transcriptionally complex tissue (22,522 distinct isoforms; 3,726 unique); liver was the least diverse (12,435 isoforms; 1,039 unique). We highlight gene clusters exhibiting high tissue-specific isoform diversity per tissue (e.g., TPM1 expresses 19 isoforms in the heart's atrial appendage), and specific genes (PAX6 and TPM1) that counterintuitively exhibit evidence that their expressed isoform diversity results from both biological noise and intent. We also validated 447 of the 700 new isoforms discovered by Aguzzoli-Heberle et al. and found that 88 were expressed in all nine tissues, while 58 were specific to a single tissue. This study represents a broad survey of the RNA isoform landscape, demonstrates isoform diversity across nine tissues, and emphasizes the need to better understand how individual isoforms from a single gene body contribute to human health and disease.
Keyphrases
- genome wide
- gene expression
- copy number
- genome wide identification
- heart failure
- risk assessment
- poor prognosis
- oxidative stress
- air pollution
- electronic health record
- nucleic acid
- binding protein
- genome wide analysis
- small molecule
- skeletal muscle
- atrial fibrillation
- single molecule
- big data
- mitral valve
- machine learning
- left ventricular
- brain injury
- subarachnoid hemorrhage