A data-efficient deep learning tool for scRNA-Seq label transfer in neuroscience.
Julian Lehrer, Jesus Gonzalez-Ferrer, David Haussler, Mircea Teodorescu, Vanessa D Jonsson, Mohammed A Mostajo-Radji
Published in: bioRxiv : the preprint server for biology (2023)
Large single-cell RNA datasets have contributed to unprecedented biological insight. Often, these take the form of cell atlases and serve as a reference for automating cell labeling of newly sequenced samples. Yet classification algorithms have lacked the capacity to accurately annotate cells, particularly in complex datasets. Here we present SIMS (Scalable, Interpretable Machine Learning for Single-Cell), an end-to-end, data-efficient machine learning pipeline for discrete classification of single-cell data that can be applied to new datasets with minimal coding. We benchmarked SIMS against common single-cell label transfer tools and demonstrated that it performs as well as or better than state-of-the-art algorithms. We then use SIMS to classify cells in one of the most complex tissues: the brain. We show that SIMS classifies cells of the adult cerebral cortex and hippocampus at a remarkably higher accuracy than state-of-the-art single-cell classifiers. This accuracy is maintained in trans-sample label transfers of the adult human cerebral cortex. We then apply SIMS to classify cells in the developing brain and demonstrate a high level of accuracy at predicting neuronal subtypes, even in periods of fate refinement. Finally, we apply SIMS to single-cell datasets of cortical organoids to predict cell identities in previously unclassified cells and to uncover genetic variations in the developmental trajectories of organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
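To make the idea of label transfer concrete, the sketch below trains on a labeled reference "atlas" and assigns labels to newly sequenced, unlabeled cells. It is not the SIMS model or its API: a simple nearest-centroid classifier stands in for the classifier, and the function names (`log_normalize`, `fit_centroids`, `transfer_labels`), the gene counts, and the cell-type labels are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import log1p, sqrt


def log_normalize(counts, scale=1e4):
    """Scale a cell's counts to a fixed total, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [log1p(c / total * scale) for c in counts]


def fit_centroids(ref_counts, ref_labels):
    """Average the normalized profiles of reference cells sharing a label."""
    sums = {}
    sizes = defaultdict(int)
    for counts, label in zip(ref_counts, ref_labels):
        profile = log_normalize(counts)
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        sums[label] = [s + p for s, p in zip(sums[label], profile)]
        sizes[label] += 1
    return {
        label: [s / sizes[label] for s in total]
        for label, total in sums.items()
    }


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_labels(centroids, query_counts):
    """Give each query cell the label of its nearest reference centroid."""
    predictions = []
    for counts in query_counts:
        profile = log_normalize(counts)
        nearest = min(centroids, key=lambda lab: distance(profile, centroids[lab]))
        predictions.append(nearest)
    return predictions


# Toy reference atlas: 4 cells x 3 genes (made-up counts and labels).
reference = [[90, 5, 5], [80, 10, 10], [5, 90, 5], [10, 85, 5]]
labels = ["neuron", "neuron", "astrocyte", "astrocyte"]
centroids = fit_centroids(reference, labels)

# Newly sequenced, unlabeled cells.
query = [[70, 20, 10], [8, 60, 4]]
predicted = transfer_labels(centroids, query)
print(predicted)
```

A real pipeline would operate on thousands of genes and would typically use a trained neural network or similar model rather than centroids; the reference-then-query structure is the part this sketch is meant to show.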
Keyphrases
- single cell
- RNA-seq
- machine learning
- deep learning
- induced apoptosis
- high throughput
- cell cycle arrest
- big data
- artificial intelligence
- endoplasmic reticulum stress
- endothelial cells
- signaling pathway
- multiple sclerosis
- oxidative stress
- stem cells
- DNA methylation
- resting state
- convolutional neural network
- cell proliferation
- functional connectivity
- cognitive impairment
- depressive symptoms
- blood brain barrier
- copy number
- mesenchymal stem cells
- induced pluripotent stem cells