CaSee: A lightning transfer-learning model directly used to discriminate cancer/normal cells from scRNA-seq.
Yuan Sh, Xiuli Zhang, Zhimin Yang, Jierong Dong, Yuanzhuo Wang, Ying Zhou, Xuejie Li, Caixia Guo, Zhiyuan Hu
Published in: Oncogene (2022)
Single-cell RNA sequencing (scRNA-seq) is one of the most efficient technologies for human tumor research. However, data analysis still faces technical challenges, especially the difficulty of efficiently and accurately discriminating cancer cells from normal cells in the scRNA-seq expression matrix. Addressing these challenges would allow a deeper understanding of intratumoral and intertumoral heterogeneity. In this study, we developed pan-Cancer Seeker (CaSee), a cancer/normal cell discrimination pipeline for scRNA-seq expression matrices that uses transfer learning from traditional high-quality pan-cancer bulk sequencing data. CaSee is the first tool that directly discriminates cancer/normal cells in the scRNA-seq expression matrix, and it has wider applicability and higher efficiency than the copy number variation (CNV) method, which requires corresponding reference cells. CaSee is user-friendly and adapts to a variety of data sources, including but not limited to scRNA-seq data from tissue, cell lines, xenograft cells, and circulating tumor cells. It is compatible with mainstream sequencing technology platforms: 10× Genomics Chromium, Smart-seq2, and Microwell-seq. In a multicenter evaluation of 11 retrospective cohorts and one independent dataset, the CaSee pipeline achieved an average discrimination accuracy of 96.69%. Overall, CaSee, a deep-learning-based pan-cancer cell discrimination model that distinguishes cancer cells from normal cells, should be of interest to researchers working in the genomics, cancer, and single-cell fields.
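The general idea of learning from labeled bulk profiles and applying the result to single cells can be illustrated with a deliberately small sketch. This is not the CaSee architecture, which the abstract does not describe in detail: a plain logistic regression stands in for the deep model, and all data are synthetic. The gene panel size, the marker genes, the sequencing depths, and every function name (`simulate`, `normalize`, `fit_logistic`, `run_demo`) are assumptions made for illustration only.

```python
import numpy as np

N_GENES = 50      # toy gene panel shared by both data types (assumption)
N_MARKERS = 10    # hypothetical genes that are higher in "cancer" profiles


def simulate(n, depth, rng):
    """Draw synthetic Poisson counts; label 1 = cancer, 0 = normal."""
    labels = np.arange(n) % 2                  # alternate so both classes appear
    rates = np.ones((n, N_GENES))
    rates[labels == 1, :N_MARKERS] *= 4.0      # marker genes are 4x higher in cancer
    rates = rates / rates.sum(axis=1, keepdims=True) * depth
    return rng.poisson(rates).astype(float), labels


def normalize(counts):
    """Scale each profile to 10,000 total counts, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                  # guard against empty profiles
    return np.log1p(counts / totals * 1e4)


def fit_logistic(X, y, lr=0.1, steps=300):
    """Fit logistic regression with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30, 30)        # clip to keep exp() well-behaved
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def run_demo(seed=0):
    """Train on bulk-like profiles, then score single-cell-like profiles."""
    rng = np.random.default_rng(seed)
    bulk_counts, bulk_labels = simulate(200, depth=1_000_000, rng=rng)
    cell_counts, cell_labels = simulate(500, depth=3_000, rng=rng)

    bulk = normalize(bulk_counts)
    cells = normalize(cell_counts)
    center = bulk.mean(axis=0)                 # centering learned on bulk only

    w, b = fit_logistic(bulk - center, bulk_labels)
    predicted = ((cells - center) @ w + b > 0).astype(int)
    return float((predicted == cell_labels).mean())


if __name__ == "__main__":
    print(f"single-cell accuracy on synthetic data: {run_demo():.3f}")
```

The sketch only works because both data types are put on the same scale before the model sees them. Real scRNA-seq data has dropout and batch effects that this toy simulation ignores, so the accuracy it reports says nothing about the performance figures quoted in the abstract.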
Keyphrases
- single cell
- rna seq
- papillary thyroid
- high throughput
- data analysis
- squamous cell
- induced apoptosis
- electronic health record
- genome wide
- copy number
- big data
- poor prognosis
- deep learning
- lymph node metastasis
- cell cycle arrest
- mesenchymal stem cells
- endothelial cells
- cross sectional
- stem cells
- young adults
- drinking water
- squamous cell carcinoma
- clinical trial
- gene expression
- childhood cancer
- artificial intelligence
- pi k akt