Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder.

Wenxing Gao, Weili Lin, Qiang Li, Wanning Chen, Wenjing Yin, Xinyue Zhu, Sheng Gao, Lei Liu, Wenjie Li, Dingfeng Wu, Guoqing Zhang, Rui-Xin Zhu, Na Jiao
Published in: Nature Protocols (2024)
Microbial signatures have emerged as promising biomarkers for disease diagnostics and prognostics, yet their variability across different studies calls for a standardized approach to biomarker research. Therefore, we introduce xMarkerFinder, a four-stage computational framework for microbial biomarker identification with comprehensive validation across cross-cohort datasets. The four stages are differential signature identification, model construction, model validation and biomarker interpretation. xMarkerFinder enables the identification and validation of reproducible biomarkers for cross-cohort studies, along with the construction of classification models and the investigation of potential microbiome-induced mechanisms. Originally developed for gut microbiome research, xMarkerFinder's adaptable design makes it applicable to various microbial habitats and data types. Distinct from existing biomarker research tools, which typically concentrate on a single aspect, xMarkerFinder incorporates a feature selection process specifically designed to address the heterogeneity between different cohorts, extensive internal and external validations, and detailed specificity assessments. Execution time varies depending on the sample size, selected algorithm and computational resources. Accessible via GitHub (https://github.com/tjcadd2020/xMarkerFinder), xMarkerFinder supports users with diverse expertise levels through different execution options, including step-by-step scripts with detailed tutorials and frequently asked questions, a single-command execution script, a ready-to-use Docker image and a user-friendly web server (https://www.biosino.org/xmarkerfinder).
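To make the idea of external validation across cohorts concrete, the sketch below shows a generic leave-one-cohort-out split: each cohort is held out in turn as an external test set while the remaining cohorts form the training set. This is a common cross-cohort validation pattern and is offered only as an illustration; it is not taken from xMarkerFinder's code, and the function name `leave_one_cohort_out` and the cohort labels are hypothetical.

```python
from collections import defaultdict


def leave_one_cohort_out(cohort_labels):
    """Yield (held_out_cohort, train_indices, test_indices) for each cohort.

    cohort_labels: one cohort identifier per sample, in sample order.
    Illustrative helper only; not part of xMarkerFinder.
    """
    # Group sample indices by the cohort they belong to.
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)

    # Hold out each cohort in turn; train on all remaining cohorts.
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = sorted(
            i
            for cohort, indices in by_cohort.items()
            if cohort != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical cohort labels for five samples drawn from three studies.
cohorts = ["A", "A", "B", "B", "C"]
splits = list(leave_one_cohort_out(cohorts))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

A model trained on `train_idx` and scored on `test_idx` is then evaluated on samples from a study it has never seen, which is the situation in which cohort heterogeneity matters most.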
Keyphrases
  • microbial community
  • deep learning
  • machine learning
  • bioinformatics analysis
  • rna seq
  • single cell
  • electronic health record
  • big data
  • climate change
  • neural network
  • protein protein