FLIBase: a comprehensive repository of full-length isoforms across human cancers and tissues.
Qili Shi, Xinrong Li, Yizhe Liu, Zhiao Chen, Xianghuo He
Published in: Nucleic Acids Research (2023)
Regulatory processes at the RNA transcript level play a crucial role in shaping transcriptome diversity and proteome composition in human cells, affecting both physiological and pathological states. This study introduces FLIBase (www.FLIBase.org), a specialized database for annotating full-length isoforms identified with long-read sequencing. We collected and integrated long-read (351 samples) and short-read (12 469 samples) RNA sequencing data from diverse normal and cancerous human tissues and cells. The current version of FLIBase comprises 983 789 full-length spliced isoforms, identified from long-read sequences and verified against short-read exon-exon splice junctions. Of these, 188 248 isoforms have been annotated, while 795 541 remain unannotated. By overcoming the limitations of short-read RNA sequencing, FLIBase provides an accurate and comprehensive representation of full-length transcripts, and its annotations support a range of downstream analyses. In particular, FLIBase identifies a substantial number of previously unannotated isoforms and tumor-specific RNA transcripts. These tumor-specific transcripts are a potential source of immunogenic recurrent neoantigens, which could inform the development of tailored RNA-based diagnostic and therapeutic strategies for various human cancers.
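The abstract states that long-read isoforms are verified against short-read exon-exon splice junctions but does not say how. The Python sketch below is one plausible reading of that step, not the authors' pipeline. It assumes an isoform counts as verified only when every one of its junctions appears in the short-read junction set. The function names, the coordinate convention, and the sample data are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# A junction is (chromosome, donor exon end, acceptor exon start, strand).
# This coordinate convention is an assumption made for the sketch.
Junction = Tuple[str, int, int, str]
Exon = Tuple[int, int]


def splice_junctions(chrom: str, strand: str, exons: Iterable[Exon]) -> List[Junction]:
    """Derive exon-exon junctions from an isoform's exon coordinates."""
    ordered = sorted(exons)
    return [
        (chrom, left_end, right_start, strand)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    ]


def is_verified(chrom: str, strand: str, exons: Iterable[Exon],
                short_read_junctions: Set[Junction]) -> bool:
    """Return True when the isoform is spliced and every junction has short-read support."""
    junctions = splice_junctions(chrom, strand, exons)
    # Single-exon isoforms have no junctions and are not treated as spliced here.
    return bool(junctions) and all(j in short_read_junctions for j in junctions)


# Hypothetical short-read junction set and two candidate long-read isoforms.
short_read_set: Set[Junction] = {("chr1", 200, 300, "+"), ("chr1", 400, 500, "+")}
isoform_a = [(100, 200), (300, 400), (500, 600)]  # both junctions supported
isoform_b = [(100, 200), (350, 400), (500, 600)]  # junction 200-350 unsupported

print(is_verified("chr1", "+", isoform_a, short_read_set))
print(is_verified("chr1", "+", isoform_b, short_read_set))
```

Requiring every junction to be supported is a strict rule; a real pipeline might also apply read-count thresholds or tolerate small coordinate offsets, none of which the abstract specifies.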
Keyphrases
- single molecule
- single cell
- endothelial cells
- gene expression
- induced pluripotent stem cells
- rna seq
- pluripotent stem cells
- small molecule
- squamous cell carcinoma
- big data
- machine learning
- induced apoptosis
- high throughput
- risk assessment
- high resolution
- papillary thyroid
- transcription factor
- climate change
- cell death
- cell cycle arrest
- smoking cessation
- neural network