Compositionally aware estimation of cross-correlations for microbiome data.

Ib Thorsgaard Jensen, Luc Janss, Simona Radutoiu, Rasmus Waagepetersen

Published in: PLOS ONE (2024)

In microbiome studies, it is of interest to infer correlations between the abundances of different microbes (here referred to as operational taxonomic units, OTUs). Several existing methods take the compositional nature of the sequencing data into account. However, these methods cannot infer correlations between OTU abundances and other variables. In this paper we introduce two novel methods, SparCEV (Sparse Correlations with External Variables) and SparXCC (Sparse Cross-Correlations between Compositional data), for quantifying correlations between OTU abundances and either continuous phenotypic variables or components of other compositional datasets, such as transcriptomic data. SparCEV and SparXCC both assume that the average correlation in the dataset is zero. Iterative versions of SparCEV and SparXCC are proposed to alleviate the bias that results from deviations from this assumption. We compare these new methods to empirical Pearson cross-correlations computed after applying naive transformations of the data (log and log-TSS). Additionally, we test the centered log-ratio (CLR) transformation and the variance stabilising transformation (VST). We find that CLR and VST outperform the naive transformations, except when the correlation matrix is dense. SparCEV and SparXCC outperform CLR and VST when the number of OTUs is small and perform similarly to CLR and VST when the number of OTUs is large. Adding the iterative procedure increases the accuracy of SparCEV and SparXCC in all cases, except when the average correlation in the dataset is close to zero or the correlation matrix is dense. These results are consistent with our theoretical considerations.
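
To make the baseline comparison concrete, the following is a minimal sketch of the transformation-based approach mentioned above: transform the OTU counts (log-TSS or CLR) and compute the empirical Pearson cross-correlation of each transformed OTU with an external variable. It is not an implementation of SparCEV or SparXCC. The function names, the pseudocount of 1 used to avoid taking the log of zero, and the simulated Poisson counts are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def log_tss(counts, pseudocount=1.0):
    """Log of total-sum-scaled counts (rows are samples, columns are OTUs).

    The pseudocount is an illustrative assumption to avoid log(0).
    """
    x = counts + pseudocount
    return np.log(x / x.sum(axis=1, keepdims=True))


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log counts minus the per-sample mean log count."""
    logx = np.log(counts + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def cross_correlations(transformed, external):
    """Pearson correlation of each column of `transformed` with `external`."""
    z = transformed - transformed.mean(axis=0)
    e = external - external.mean()
    denom = np.sqrt((z ** 2).sum(axis=0) * (e ** 2).sum())
    return (z * e[:, None]).sum(axis=0) / denom


# Simulated data, purely hypothetical: 50 samples, 8 OTUs, one phenotype.
rng = np.random.default_rng(0)
n_samples, n_otus = 50, 8
counts = rng.poisson(lam=20.0, size=(n_samples, n_otus)).astype(float)
phenotype = rng.normal(size=n_samples)

r_log_tss = cross_correlations(log_tss(counts), phenotype)
r_clr = cross_correlations(clr(counts), phenotype)
```

Both `r_log_tss` and `r_clr` hold one estimated cross-correlation per OTU. Because both transformations operate on relative rather than absolute abundances, these estimates can be biased, which is the problem the proposed methods aim to address.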
