Learning Cell Annotation under Multiple Reference Datasets by Multisource Domain Adaptation.

Yan Liu, He Yan, Long-Chen Shen, Dong-Jun Yu

Published in: Journal of Chemical Information and Modeling (2022)

Accurate and efficient cell type annotation is essential for single-cell sequence analysis. Annotating cell types by pairing well-annotated reference datasets with powerful models has become increasingly popular. As the amount of single-cell data grows, however, there is an urgent need for annotation methods that can integrate multiple reference datasets to improve cell type annotation performance. Because of unwanted batch effects between individual reference datasets, integrating multiple reference datasets remains an open challenge. To address this, we proposed two methods, scMDR and scMultiR, which use multisource domain adaptation to learn cell type-specific information from multiple reference datasets and query cells. Based on the learned cell type-specific information, scMDR and scMultiR predict the most likely cell type for each query cell. Benchmark experiments demonstrated their state-of-the-art effectiveness for integrative single-cell assignment with multiple reference datasets.
