An optimal normalization method for high sparse compositional microbiome data.
Michael B Sohn, Cynthia L Monaco, Steven R Gill
Published in: PLoS computational biology (2024)
In many types of omics data, including microbiome sequencing data, we are only able to measure relative information. Various computational or statistical methods have been proposed to extract absolute (or biologically relevant) information from this relative information; however, these methods rely on rather strong assumptions that may not be suitable for multigroup (more than two groups) and/or longitudinal outcome data. In this article, we first introduce the minimal assumption required to extract absolute information from relative information. This assumption is less stringent than those imposed by existing methods and is therefore applicable to multigroup and/or longitudinal outcome data. We then propose the first normalization method that works under this minimal assumption. The optimality and validity of the proposed method, and its beneficial effects on downstream analysis, are demonstrated in extensive simulation studies in which existing methods fail to perform consistently under the minimal assumption. We also demonstrate its application to real microbiome datasets to identify microbes that are biologically relevant to a specific disease or condition.
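To make the phrase "relative information" concrete, the following is a minimal sketch of why proportions alone can mislead. It is not the normalization method proposed in the article. The taxa, the abundance values, and the helper name `to_relative` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def to_relative(counts):
    """Convert absolute abundances to proportions that sum to 1 (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


# Hypothetical absolute abundances of three taxa (A, B, C) under two conditions.
# Only taxon A changes in absolute terms; B and C are unchanged.
absolute_before = np.array([100.0, 100.0, 200.0])
absolute_after = np.array([400.0, 100.0, 200.0])

# What a sequencing experiment would let us observe: proportions only.
relative_before = to_relative(absolute_before)
relative_after = to_relative(absolute_after)

# Fold changes computed on each scale.
absolute_fold_change = absolute_after / absolute_before
relative_fold_change = relative_after / relative_before

print("absolute fold change:", absolute_fold_change)
print("relative fold change:", np.round(relative_fold_change, 3))
```

On the absolute scale only taxon A changes, but on the relative scale taxa B and C appear to decrease, because the proportions are forced to sum to one. Recovering the absolute picture from such data requires some additional assumption, which is the problem the article addresses.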