A bacterial sensor taxonomy across earth ecosystems for machine learning applications.
Helen Park, Marcin P Joachimiak, Sean P Jungbluth, Ziming Yang, William J Riehl, R Shane Canon, Adam Paul Arkin, Paramvir S Dehal
Published in: mSystems (2023)
Microbes infect, colonize, and proliferate owing to their ability to sense and respond quickly to their surroundings. In this research, we extract the sensory proteins from a diverse range of environmental, engineered, and host-associated metagenomes. We train machine learning classifiers that use these sensors as features, so that the ecosystem of a metagenome can be predicted from its sensor profile. We use the optimized model's feature importances to identify the most impactful and predictive sensors in different environments. We next use the sensor profiles of human gut metagenomes to classify their disease states and explore which sensors can explain differences between diseases. The sensors found here to be most predictive of environmental labels, most of which correspond to uncharacterized proteins, are a useful starting point for discovering important environmental signals and developing possible diagnostic interventions.
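The classify-then-rank idea described above can be illustrated with a small sketch. It assumes each metagenome is summarized as a vector of sensor counts and, purely for illustration, swaps in a nearest-centroid classifier scored with permutation importance; the abstract does not specify the paper's model, and the sensor names and toy data below are hypothetical. For brevity, importances are computed on the same samples used to fit the centroids, whereas a real analysis would use held-out data.

```python
import random
from statistics import mean


def centroids(X, y):
    """Fit: mean sensor profile per ecosystem label (nearest-centroid classifier)."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}


def predict(cents, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, cents[lab])))


def accuracy(cents, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(cents, r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(cents, X, y, feature_names, n_repeats=10, seed=0):
    """Mean accuracy drop when one sensor column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(cents, X, y)
    scores = {}
    for j, name in enumerate(feature_names):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between this sensor and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - accuracy(cents, X_perm, y))
        scores[name] = mean(drops)
    return scores


# Hypothetical toy data: sensor_A separates the ecosystems, sensor_B is constant.
features = ["sensor_A", "sensor_B"]
X = [[10, 5], [12, 5], [11, 5], [1, 5], [2, 5], [0, 5]]
y = ["soil", "soil", "soil", "gut", "gut", "gut"]

cents = centroids(X, y)
imp = permutation_importance(cents, X, y, features)

if __name__ == "__main__":
    print(predict(cents, [9, 5]))  # soil
    print(imp)
```

Shuffling a sensor that carries no label information (here the constant `sensor_B`) leaves accuracy unchanged, so its importance is zero, while shuffling the discriminative `sensor_A` misassigns samples and yields a positive score. Any classifier exposing a predict step could be substituted for the nearest-centroid stand-in.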