Reconstruction Set Test (RESET): a computationally efficient method for single sample gene set testing based on randomized reduced rank reconstruction error.

Hildreth Robert Frost
Published in: bioRxiv : the preprint server for biology (2023)
Gene set testing is a widely used hypothesis aggregation technique that can improve the power, interpretation and replication of genomic data analyses by focusing on biological pathways instead of individual genes. These benefits are amplified for genomic data generated on individual cells, which has significantly elevated levels of noise and sparsity relative to the output from bulk tissue assays. To address the lack of gene set testing methods optimized for single cell data, we recently developed a new technique for cell-level gene set scoring of single cell transcriptomic data called Variance-adjusted Mahalanobis (VAM). While the VAM technique offers a significant improvement in terms of computational performance and accuracy over other single sample methods, it has four important limitations. First, all existing single sample gene set testing methods, including VAM, are designed to detect differences in mean value and struggle to identify biologically relevant patterns of differential correlation. Second, the VAM method, and other computationally efficient techniques, are self-contained methods that generate scores for a given gene set without considering the values of other genes; so-called competitive scenarios, where the measured values of set genes differ from non-set genes in the same sample, cannot be directly detected. Third, the scores generated by existing methods can only be accurately compared across samples for a single set and not between sets, which complicates downstream analyses. Fourth, the computational cost of VAM, while lower than that of most existing methods, can still be significant on very large datasets. To address these challenges, we have developed a new, and analytically novel, single sample method called Reconstruction Set Test (RESET). RESET quantifies gene set importance at both the sample level and for the entire dataset based on the ability of genes in each set to reconstruct values for all measured genes. RESET is realized using a computationally efficient randomized reduced rank reconstruction algorithm and can effectively detect patterns of differential abundance and differential correlation for both self-contained and competitive scenarios. As we demonstrate using simulated and real single cell RNA-sequencing data, the RESET method provides superior classification accuracy at a lower computational cost relative to VAM and other popular single sample gene set testing approaches. An R implementation, which supports integration with the Seurat framework, is available in the RESET package on CRAN.
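The core idea of scoring a gene set by how well its genes reconstruct all measured genes can be illustrated with a minimal Python sketch. It is not the RESET package's implementation or API. It assumes a samples-by-genes matrix stored as a list of rows, builds a randomized rank-limited orthonormal basis from the set's columns (a Gaussian random projection followed by Gram-Schmidt), projects the full matrix onto that basis, and reports the per-sample reconstruction error. The function names, the `rank` and `seed` parameters, and the use of the Euclidean norm are illustrative assumptions. How RESET converts these errors into sample-level and overall scores is not described in the abstract and is omitted here.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthonormal_basis(y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of y.

    Returns the basis vectors as a list of rows (i.e. Q transposed);
    columns that are numerically dependent on earlier ones are dropped.
    """
    basis = []
    for col in zip(*y):
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def reconstruction_errors(x, set_idx, rank, seed=0):
    """Per-sample error when x is reconstructed from the columns in set_idx.

    Hypothetical helper: x is samples-by-genes, set_idx lists the column
    indices of the gene set, rank is the target rank of the basis.
    """
    rng = random.Random(seed)
    x_set = [[row[j] for j in set_idx] for row in x]
    # Gaussian test matrix (set size x rank) compresses the set's columns.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in set_idx]
    q_t = orthonormal_basis(matmul(x_set, omega))  # rows span the sketch
    # Project every gene onto the basis: x_hat = Q (Q^T x).
    x_hat = matmul(transpose(q_t), matmul(q_t, x))
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row, row_hat)))
        for row, row_hat in zip(x, x_hat)
    ]


# Toy data (4 samples, 4 genes): genes 2 and 3 are linear combinations of
# genes 0 and 1, so the set [0, 1] can reconstruct every gene at rank 2.
demo = [[1, 0, 1, 2], [0, 1, 1, -1], [1, 1, 2, 1], [2, 1, 3, 3]]
demo_errors = reconstruction_errors(demo, set_idx=[0, 1], rank=2)
```

In this toy setting a small error means the set's genes capture most of the structure in the sample. Because the error is measured over all genes rather than only the set's genes, such a quantity naturally reflects how set genes relate to non-set genes, which is the kind of competitive behavior the abstract describes.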