An NMF-based approach to discover overlooked differentially expressed gene regions from single-cell RNA-seq data.
Hirotaka Matsumoto, Tetsutaro Hayashi, Haruka Ozaki, Koki Tsuyuzaki, Mana Umeda, Tsuyoshi Iida, Masaya Nakamura, Hideyuki Okano, Itoshi Nikaido
Published in: NAR genomics and bioinformatics (2019)
Single-cell RNA sequencing has enabled researchers to quantify the transcriptomes of individual cells, infer cell types and investigate differential expression among cell types, which will lead to a better understanding of the regulatory mechanisms of cell states. Transcript diversity caused by phenomena such as aberrant splicing events has been revealed, and differential expression of previously unannotated transcripts might be overlooked by annotation-based analyses. Accordingly, we have developed an approach to discover overlooked differentially expressed (DE) gene regions that complements annotation-based methods. Our algorithm decomposes the mapped count data matrix for a gene region using non-negative matrix factorization, quantifies the differential expression level based on the decomposed matrix, and compares it with the differential expression level obtained by an annotation-based approach to discover previously unannotated DE transcripts. We performed single-cell RNA sequencing for human neural stem cells and applied our algorithm to the dataset. We also applied our algorithm to two public single-cell RNA sequencing datasets corresponding to mouse ES and primitive endoderm cells, and human preimplantation embryos. As a result, we discovered several intriguing DE transcripts, including a transcript related to the modulation of neural stem/progenitor cell differentiation.
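The three steps named in the abstract (decompose, quantify, compare) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the NMF variant, the DE statistic, or the decision rule, so the multiplicative-update NMF, the `group_score` effect size, the `threshold` value, and the toy count matrix below are all assumptions chosen only for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Multiplicative-update NMF minimising the Frobenius error ||X - WH||.
    (Assumption: the abstract does not say which NMF variant is used.)"""
    rng = np.random.default_rng(seed)
    n_pos, n_cells = X.shape
    W = rng.random((n_pos, rank)) + eps    # coverage pattern per component
    H = rng.random((rank, n_cells)) + eps  # per-cell activity per component
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def group_score(values, labels):
    """Hypothetical DE level: absolute difference of group means divided by
    a pooled standard deviation (not the statistic used in the paper)."""
    a, b = values[labels == 0], values[labels == 1]
    pooled = np.sqrt((a.var() + b.var()) / 2) + 1e-9
    return abs(a.mean() - b.mean()) / pooled

# Toy gene region: 20 positions x 12 cells, two cell types (labels 0 and 1).
labels = np.array([0] * 6 + [1] * 6)
annotated = np.arange(10)                    # positions covered by the annotation
p_annot = np.r_[np.ones(10), np.zeros(10)]   # annotated transcript coverage
p_novel = np.r_[np.zeros(10), np.ones(10)]   # unannotated part of the region
h_annot = np.array([4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3], dtype=float)
h_novel = np.array([0, 0, 0, 0, 0, 0, 3, 4, 2, 3, 4, 2], dtype=float)
X = np.outer(p_annot, h_annot) + np.outer(p_novel, h_novel)

# Step 1: decompose the mapped count matrix of the region.
W, H = nmf(X, rank=2)

# Step 2: DE level from the decomposed matrix (best component).
nmf_score = max(group_score(H[k], labels) for k in range(H.shape[0]))

# Step 3: compare with an annotation-based DE level (counts on annotated positions).
annot_score = group_score(X[annotated].sum(axis=0), labels)

threshold = 1.0  # hypothetical cut-off, for illustration only
overlooked = nmf_score > threshold and annot_score < threshold
print(f"NMF-based: {nmf_score:.2f}, annotation-based: {annot_score:.2f}, "
      f"overlooked DE region: {overlooked}")
```

In this toy region the annotated positions are expressed equally in both cell types, so an annotation-based comparison sees no difference, while one NMF component isolates the unannotated positions that are expressed in only one cell type.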
Keyphrases
- single cell
- rna seq
- high throughput
- induced apoptosis
- endothelial cells
- machine learning
- deep learning
- genome wide
- copy number
- cell cycle arrest
- electronic health record
- emergency department
- healthcare
- big data
- genome wide identification
- transcription factor
- cell death
- induced pluripotent stem cells
- artificial intelligence
- data analysis
- gene expression
- drug induced