Generalized Hotelling's test for paired compositional data with application to human microbiome studies.

Ni Zhao, Xiang Zhan, Katherine A Guthrie, Caroline M Mitchell, Joseph Larson
Published in: Genetic epidemiology (2018)
The human microbiome is a dynamic system that changes due to diseases, medication, change in diet, etc. The paired design is a common approach to evaluate the microbial changes while controlling for the inherent differences between people. For example, microbiome data may be collected from the same individuals before and after a treatment. Two challenges exist in analyzing this type of data. First, microbiome data are compositional such that the reads for all taxa in each sample are constrained to sum to a constant. Second, the number of taxa can be much larger than the sample size. Few statistical methods exist to analyze such data besides methods that test one taxon at a time. In this paper, we propose to first conduct a log-ratio transformation of the compositions, and then develop a generalized Hotelling's test (GHT) to evaluate whether the average microbiome compositions are equivalent in the paired samples. We replace the sample covariance matrix in standard Hotelling's statistic by a shrinkage-based covariance, calculated as a weighted average of the sample covariance and a positive definite target matrix. The optimal weighting can be obtained for many commonly used target matrices. We develop a permutation procedure to assess the statistical significance. Extensive simulations show that our proposed method has well-controlled type I error and better power than a few ad hoc approaches. We apply our method to examine the vaginal microbiome changes in response to treatments for menopausal hot flashes. An R package "GHT" is freely available at https://github.com/zhaoni153/GHT.
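The pipeline described in the abstract (log-ratio transformation, shrinkage covariance, Hotelling-type statistic, permutation p-value) can be illustrated with a minimal Python sketch. This is not the API of the GHT R package, and several choices here are assumptions made only for illustration: the centered log-ratio transform with a pseudocount of 0.5, a scaled-identity target matrix, a fixed shrinkage weight of 0.5 rather than the optimal weight the paper derives, and sign-flipping of the paired differences as the permutation scheme. All function names are hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of each row (sample) of a count matrix.
    The pseudocount (an assumption) avoids taking log(0)."""
    x = np.asarray(counts, dtype=float) + pseudocount
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def shrunk_hotelling(diffs, weight=0.5):
    """Hotelling-type statistic on paired differences, using a shrinkage
    covariance: (1 - weight) * S + weight * target.
    The target is a scaled identity and the weight is fixed (both assumptions)."""
    n, p = diffs.shape
    mean = diffs.mean(axis=0)
    S = np.atleast_2d(np.cov(diffs, rowvar=False))
    target = np.eye(p) * np.trace(S) / p  # positive definite when trace(S) > 0
    S_shrunk = (1.0 - weight) * S + weight * target
    # Solve S_shrunk @ v = mean instead of forming an explicit inverse.
    return n * mean @ np.linalg.solve(S_shrunk, mean)

def paired_permutation_test(before, after, weight=0.5, n_perm=999, seed=0):
    """Permutation p-value obtained by randomly flipping the sign of each
    subject's difference, which corresponds to swapping the before/after
    labels within a pair (an assumed scheme for the paired design)."""
    rng = np.random.default_rng(seed)
    diffs = clr(after) - clr(before)
    observed = shrunk_hotelling(diffs, weight)
    n = diffs.shape[0]
    exceed = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=n)
        if shrunk_hotelling(diffs * signs[:, None], weight) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (exceed + 1) / (n_perm + 1)

# Usage with simulated counts: 10 subjects, 30 taxa (more taxa than samples).
rng = np.random.default_rng(1)
before = rng.poisson(20, size=(10, 30))
after = rng.poisson(20, size=(10, 30))
stat, p_value = paired_permutation_test(before, after, n_perm=199)
print(f"statistic = {stat:.3f}, permutation p-value = {p_value:.3f}")
```

Because the shrinkage weight is positive and the target is positive definite, the shrunk covariance stays invertible even when the number of taxa exceeds the sample size, which is the situation where the plain sample covariance in the standard Hotelling's statistic becomes singular.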
Keyphrases
  • electronic health record
  • big data
  • endothelial cells
  • machine learning
  • minimally invasive
  • case control
  • deep learning
  • monte carlo