Application of Compositional Data Analysis to Study the Relationship between Bacterial Diversity in Human Faeces and Sex, Age, and Weight.
Elio López-García, Antonio Benítez-Cabello, Antonio Pablo Arenas-de Larriva, Francisco Miguel Gutierrez-Mariscal, Pablo Pérez-Martínez, Elena Maria Yubero-Serrano, Francisco Noé Arroyo-López, Antonio Garrido-Fernández
Published in: Biomedicines (2023)
This work uses Compositional Data Analysis (CoDA) to examine the typical human faecal bacterial diversity of 39 healthy volunteers from the Andalusian region (Spain). Stool samples were subjected to high-throughput sequencing of the V3 and V4 regions of the 16S ribosomal RNA gene on an Illumina MiSeq platform. The number of sequences per sample and their genus-level assignment were obtained using the Phyloseq R package. The alpha diversity indices of the faecal bacterial population were not influenced by the volunteers' sex (male or female), age (19-46 years), or weight (48.6-99.0 kg). To study the relationship between these variables and the faecal bacterial population, the CoDA packages ALDEx2 and coda4microbiome were used. ALDEx2 revealed a trend suggesting a connection between sex and the genera Senegalimassilia and Negatibacillus (slightly more abundant in females) and Desulfovibrio (more abundant in males). Moreover, age was tentatively associated with Streptococcus, Tyzzerella, and Ruminococcaceae_UCG-003, while weight was linked to Senegalimassilia. The exploratory tool of the coda4microbiome package revealed numerous bacterial log-ratios strongly related to sex and, to a lesser extent, to age and weight. In addition, the cross-sectional analysis identified bacterial signature balances able to assign sex to samples whether or not the volunteers' age or weight was controlled for. Desulfovibrio, Faecalitalea, and Romboutsia were relevant in the numerator, while Coprococcus, Streptococcus, and Negatibacillus were prominent in the denominator; a greater relative presence of the denominator genera could characterise the female sex. Predictions for age included Caproiciproducens, Coprobacter, and Ruminoclostridium in the numerator and Odoribacter, Ezakiella, and Tyzzerella in the denominator. These predictions depend on the balance between the two groups, but a relatively high abundance of the numerator genera together with a scarcity of the denominator genera could be related to older individuals.
However, modelling the association between the faecal bacterial population and weight did not yield a satisfactory model, indicating that weight has little influence. These results demonstrate the usefulness of the CoDA methodology for studying metagenomics data and, specifically, the human microbiota.
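The abstract reports that alpha diversity did not differ by sex, age, or weight. As a minimal illustration of what such within-sample indices measure, the sketch below computes observed richness and the Shannon index (natural logarithm) from the genus-level read counts of one sample. The exact set of indices used in the study is not given here; the function names and counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)), skipping genera with zero counts."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this genus
            h -= p * math.log(p)   # accumulate -p ln p
    return h

def observed_richness(counts):
    """Number of genera with at least one read."""
    return sum(1 for c in counts if c > 0)

# Hypothetical genus-level read counts for one sample (not study data)
sample = [120, 80, 40, 0, 10]
print(observed_richness(sample), round(shannon_index(sample), 3))
```

Both indices summarise a single sample, so unlike the log-ratio methods they say nothing about which genera shift between groups.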
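CoDA methods such as ALDEx2 and coda4microbiome work with log-ratios rather than raw counts, because read counts only carry relative information: sequencing a sample more deeply scales every count but leaves the ratios intact. The following minimal sketch shows the centred log-ratio (clr) transform and the set of pairwise log-ratios, and checks that both are unaffected by sequencing depth. It assumes strictly positive counts (the packages apply their own zero handling, which is not reproduced here); the genus counts are hypothetical.

```python
import math
from itertools import combinations

def clr(counts):
    """Centred log-ratio: ln(x_i) minus the mean of ln(x) over all parts.

    Requires strictly positive values; zeros must be replaced beforehand.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("replace zeros before applying the clr transform")
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def pairwise_log_ratios(counts, names):
    """All ln(x_i / x_j) for i < j, keyed by (name_i, name_j)."""
    if any(c <= 0 for c in counts):
        raise ValueError("replace zeros before computing log-ratios")
    return {(names[i], names[j]): math.log(counts[i] / counts[j])
            for i, j in combinations(range(len(counts)), 2)}

# Hypothetical counts for three genera in one sample (not study data)
names = ["Desulfovibrio", "Coprococcus", "Streptococcus"]
shallow = [12, 40, 8]
deep = [10 * c for c in shallow]  # same composition, ten times the sequencing depth

# Log-ratios depend only on relative abundances, not on sequencing depth.
print(pairwise_log_ratios(shallow, names))
print(all(abs(a - b) < 1e-12 for a, b in zip(clr(shallow), clr(deep))))
```

With D genera there are D(D-1)/2 pairwise log-ratios, which is why an exploratory screen over all of them can flag "numerous" ratios related to a covariate.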
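The signature balances described above contrast a numerator group of genera with a denominator group. A minimal sketch, assuming the signature takes the zero-sum log-contrast form score = sum_j beta_j * ln(x_j), with positive weights for numerator genera and negative weights for denominator genera, is shown below. The weights are hypothetical placeholders built from the sex-signature genera named in the abstract, not the coefficients fitted in the study, and the counts are invented.

```python
import math

def split_signature(coefficients):
    """Genera with positive weights form the numerator, negative weights the denominator."""
    numerator = sorted(g for g, b in coefficients.items() if b > 0)
    denominator = sorted(g for g, b in coefficients.items() if b < 0)
    return numerator, denominator

def log_contrast_score(abundances, coefficients, tol=1e-9):
    """Signature score sum_j beta_j * ln(x_j) with zero-sum weights.

    Because the weights sum to zero, multiplying every abundance by the same
    constant (e.g. a different sequencing depth) leaves the score unchanged.
    """
    if abs(sum(coefficients.values())) > tol:
        raise ValueError("log-contrast weights must sum to zero")
    return sum(b * math.log(abundances[g]) for g, b in coefficients.items())

# Hypothetical weights (NOT the coefficients fitted in the study)
betas = {"Desulfovibrio": 0.4, "Faecalitalea": 0.3, "Romboutsia": 0.3,
         "Coprococcus": -0.5, "Streptococcus": -0.3, "Negatibacillus": -0.2}
# Hypothetical counts for one sample (not study data); zeros must be replaced first
sample = {"Desulfovibrio": 12, "Faecalitalea": 5, "Romboutsia": 30,
          "Coprococcus": 40, "Streptococcus": 8, "Negatibacillus": 3}

print(split_signature(betas))
score = log_contrast_score(sample, betas)
rescaled = {g: 10 * c for g, c in sample.items()}
print(round(score, 4), abs(score - log_contrast_score(rescaled, betas)) < 1e-9)
```

A higher score means the numerator genera dominate relative to the denominator genera. How a score translates into a predicted sex or age depends on the fitted model (for a binary outcome such as sex, typically a logistic model), which this sketch does not attempt to reproduce.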