scCDC: a computational method for gene-specific contamination detection and correction in single-cell and single-nucleus RNA-seq data.
Weijian Wang, Yihui Cen, Zezhen Lu, Yueqing Xu, Tianyi Sun, Ying Xiao, Wanlu Liu, Jingyi Jessica Li, Chaochen Wang
Published in: Genome Biology (2024)
In droplet-based single-cell and single-nucleus RNA-seq assays, systematic contamination by ambient RNA molecules biases the quantification of gene expression levels. Existing methods correct the contamination for all genes globally. However, a specific evaluation of their correction efficacy across varying contamination levels is lacking. Here, we show that DecontX and CellBender under-correct highly contaminating genes, while SoupX and scAR over-correct lowly contaminating and non-contaminating genes. We then develop scCDC as the first method to detect the contamination-causing genes and correct the expression levels of only these genes, some of which are cell-type markers. Compared with existing decontamination methods, scCDC excels in decontaminating highly contaminating genes while avoiding over-correction of other genes.
Keyphrases
- rna seq
- single cell
- genome wide
- genome wide identification
- gene expression
- bioinformatics analysis
- risk assessment
- high throughput
- drinking water
- health risk
- dna methylation
- genome wide analysis
- human health
- poor prognosis
- machine learning
- electronic health record
- air pollution
- transcription factor
- copy number
- heavy metals
- long non coding rna
- particulate matter
- binding protein