Scalable Log-ratio Lasso Regression Enhances Microbiome Feature Selection for Predictive Models.
Teng Fei, Tyler Funnell, Nicholas R Waters, Sandeep S Raj, Sean M Devlin, Angel Dai, Oriana Miltiadous, Roni Shouval, Meng Lv, Jonathan U Peled, Doris M Ponce, Miguel-Ángel Perales, Mithat Gönen, Marcel R M van den Brink
Published in: bioRxiv: the preprint server for biology (2023)
Identifying predictive biomarkers of patient outcomes from high-throughput microbiome data is of high interest in contemporary cancer research. We present FLORAL, an open-source computational tool that performs scalable log-ratio lasso regression and microbial feature selection for continuous, binary, time-to-event, and competing-risk outcomes. The proposed method adapts the augmented Lagrangian algorithm to the zero-sum constrained optimization problem and adds a two-stage screening process for extended false-positive control. In extensive simulation studies, FLORAL achieved consistently better false-positive control than other lasso-based approaches and a better variable-selection F1 score than popular differential abundance approaches. We demonstrate the practical utility of the proposed tool with a real-data application to an allogeneic hematopoietic-cell transplantation cohort. The R package is available at https://github.com/vdblab/FLORAL.
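To make the zero-sum idea concrete, the following is a minimal Python sketch of a lasso fit on log-transformed relative abundances with the constraint that the coefficients sum to zero, solved by an augmented Lagrangian (method of multipliers) outer loop around coordinate descent. It is not the FLORAL implementation: the function names (`soft_threshold`, `zero_sum_lasso`), the penalty parameter `mu`, the tolerances, and the simulated data are all assumptions made for illustration, and only the continuous-outcome case is covered. The two-stage screening step is not shown. Because the coefficients sum to zero, the fitted linear predictor can be read as a combination of log-ratios between taxa and is unchanged when each sample's abundances are rescaled.

```python
import numpy as np


def soft_threshold(x, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def zero_sum_lasso(Z, y, lam, mu=1.0, n_outer=200, n_inner=1000, tol=1e-10):
    """Hypothetical helper: minimize (1/2n)||y - Z b||^2 + lam * ||b||_1
    subject to sum(b) = 0, with a scaled-dual augmented Lagrangian."""
    n, p = Z.shape
    Z = Z - Z.mean(axis=0)           # center so no intercept is needed
    y = y - y.mean()
    col_sq = (Z ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    alpha = 0.0                      # scaled dual variable for sum(beta) = 0
    resid = y.copy()                 # residual y - Z @ beta, kept up to date
    for _ in range(n_outer):
        # Inner loop: coordinate descent on the augmented Lagrangian
        # (squared loss + L1 penalty + (mu/2) * (sum(beta) + alpha)^2).
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                old = beta[j]
                s_rest = beta.sum() - old
                rho = Z[:, j] @ resid / n + col_sq[j] * old
                new = soft_threshold(rho - mu * (s_rest + alpha), lam)
                new /= col_sq[j] + mu
                if new != old:
                    resid -= Z[:, j] * (new - old)
                    beta[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break
        # Outer step: dual update driven by the constraint violation.
        violation = beta.sum()
        alpha += violation
        if abs(violation) < 1e-9:
            break
    return beta


# Simulated compositional data (an assumption for illustration only).
# Real count tables contain zeros and need pseudo-count handling, not shown.
rng = np.random.default_rng(0)
n, p = 200, 20
counts = rng.lognormal(mean=0.0, sigma=1.0, size=(n, p))
rel_abund = counts / counts.sum(axis=1, keepdims=True)
Z = np.log(rel_abund)

true_beta = np.zeros(p)
true_beta[:4] = [1.0, -1.0, 0.5, -0.5]   # sums to zero: two log-ratio effects
y = Z @ true_beta + rng.normal(scale=0.1, size=n)

beta_hat = zero_sum_lasso(Z, y, lam=0.05)
selected = np.flatnonzero(beta_hat)      # indices of selected taxa
print("selected taxa:", selected)
print("sum of coefficients:", beta_hat.sum())
```

In this sketch the single equality constraint is handled by one scalar dual variable, so each outer iteration is just a penalized lasso solve followed by a one-line dual update. For the other outcome types named above, the squared-error loss would have to be replaced by the corresponding likelihood, which is beyond this example.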