Principal Component Analysis of Alternative Splicing Profiles Revealed by Long-Read ONT Sequencing in Human Liver Tissue and Hepatocyte-Derived HepG2 and Huh7 Cell Lines.
Elizaveta Sarygina, Anna S Kozlova, Kseniia Deinichenko, Sergey Radko, Konstantin Ptitsyn, Svetlana Khmeleva, Leonid K Kurbatov, Pavel V Spirin, Vladimir S Prassolov, Ekaterina Ilgisonis, Andrey Lisitsa, Elena Ponomarenko
Published in: International journal of molecular sciences (2023)
Long-read RNA sequencing developed by Oxford Nanopore Technologies (ONT) provides direct quantification of transcript isoforms. This makes the number of transcript isoforms per gene an intrinsically suitable metric for alternative splicing (AS) profiling with this type of RNA sequencing. Using this simple metric, with principal component analysis (PCA) as a tool to visualize the high-dimensional transcriptomic data, we were able to group biospecimens of normal human liver tissue and hepatocyte-derived malignant HepG2 and Huh7 cells into clear clusters in a 2D space. In the transcriptome-wide analysis, the clustering was observed regardless of whether all genes were included or only those expressed in all biospecimens tested. However, when the analysis was restricted to pharmacogenes, a set of genes involved in drug metabolism, the clustering worsened dramatically in the latter case. Based on the PCA data, the subsets of genes contributing most to the grouping of biospecimens into clusters were selected and subjected to gene ontology analysis, which identified the top 20 biological processes; among these, translation and processes related to its regulation dominate. The suggested metric can be a useful addition to existing metrics for describing AS profiles, especially in transcriptome studies with long-read sequencing.
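The workflow described in the abstract (count isoforms per gene in each biospecimen, optionally keep only genes detected in all biospecimens, project onto two principal components, then rank genes by their contribution) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' pipeline: the function names, the `min_reads` threshold, the gene labels, and the toy isoform counts are all hypothetical and invented for illustration, and PCA is computed here by simple power iteration with deflation rather than by any particular library.

```python
import math
import random
from collections import defaultdict


def isoform_count_matrix(records, min_reads=1):
    """Build a samples x genes matrix of detected isoforms per gene.

    records: iterable of (sample, gene, transcript, reads) tuples.
    min_reads is a hypothetical detection threshold, not a value from the paper.
    """
    detected = defaultdict(set)  # (sample, gene) -> detected transcript IDs
    samples, genes = set(), set()
    for sample, gene, transcript, reads in records:
        samples.add(sample)
        genes.add(gene)
        if reads >= min_reads:
            detected[(sample, gene)].add(transcript)
    samples, genes = sorted(samples), sorted(genes)
    matrix = [[len(detected[(s, g)]) for g in genes] for s in samples]
    return samples, genes, matrix


def genes_detected_everywhere(genes, matrix):
    """Keep only genes with at least one detected isoform in every sample."""
    keep = [j for j in range(len(genes)) if all(row[j] > 0 for row in matrix)]
    return [genes[j] for j in keep], [[row[j] for j in keep] for row in matrix]


def pca(matrix, n_components=2, n_iter=500, seed=0):
    """PCA by power iteration with deflation on the column-centered matrix."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in matrix]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    loadings, scores = [], []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(n_iter):
            t = [sum(x * c for x, c in zip(row, v)) for row in X]          # X v
            w = [sum(t[i] * X[i][j] for i in range(n)) for j in range(p)]  # X^T X v
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [c / norm for c in w]
        t = [sum(x * c for x, c in zip(row, v)) for row in X]  # sample scores
        loadings.append(v)
        scores.append(t)
        # Deflate: remove the variance captured by this component.
        X = [[X[i][j] - t[i] * v[j] for j in range(p)] for i in range(n)]
    return loadings, scores


# Toy isoform counts per gene (invented numbers, hypothetical gene labels).
toy = {
    "liver_1": {"GENE_A": 6, "GENE_B": 5, "GENE_C": 1, "GENE_D": 2},
    "liver_2": {"GENE_A": 6, "GENE_B": 6, "GENE_C": 1, "GENE_D": 2},
    "HepG2_1": {"GENE_A": 2, "GENE_B": 1, "GENE_C": 4, "GENE_D": 2},
    "HepG2_2": {"GENE_A": 2, "GENE_B": 1, "GENE_C": 5, "GENE_D": 2},
    "Huh7_1": {"GENE_A": 1, "GENE_B": 2, "GENE_C": 2, "GENE_D": 0},
    "Huh7_2": {"GENE_A": 1, "GENE_B": 2, "GENE_C": 2, "GENE_D": 1},
}
# Expand the counts into per-transcript records, as a quantifier might report them.
records = [(s, g, f"{g}.t{k}", 10)
           for s, row in toy.items() for g, n in row.items() for k in range(n)]

samples, genes, matrix = isoform_count_matrix(records)
loadings, scores = pca(matrix)
for s, pc1, pc2 in zip(samples, scores[0], scores[1]):
    print(f"{s:8s} PC1={pc1:+.2f} PC2={pc2:+.2f}")

# Genes contributing most to PC1, ranked by absolute loading.
ranked = sorted(genes, key=lambda g: abs(loadings[0][genes.index(g)]), reverse=True)
print("Genes ranked by |PC1 loading|:", ranked)

# Same analysis restricted to genes detected in every biospecimen.
shared_genes, shared_matrix = genes_detected_everywhere(genes, matrix)
shared_loadings, shared_scores = pca(shared_matrix)
print("Genes detected in all samples:", shared_genes)
```

The signs of principal components are arbitrary, so only the relative positions of samples along PC1 and PC2 are meaningful. On real data, a numerical library's SVD-based PCA would normally replace the hand-written power iteration; it is used here only to keep the sketch self-contained.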