Controlled variable selection in Weibull mixture cure models for high-dimensional data.
Han Fu, Deedra Nicolet, Krzysztof Mrózek, Richard M Stone, Ann-Kathrin Eisfeld, John C Byrd, Kellie J Archer
Published in: Statistics in Medicine (2022)
Medical breakthroughs in recent years have led to cures for many diseases. The mixture cure model (MCM) is a type of survival model that is often used when a cured fraction exists. Many have sought to identify genomic features associated with a time-to-event outcome, which requires variable selection strategies for high-dimensional spaces. Unfortunately, few variable selection methods currently exist for MCMs, especially when there are more predictors than samples. This study develops high-dimensional penalized Weibull MCMs, which allow for identification of prognostic factors associated with cure status, survival, or both. We demonstrate how such models may be estimated using two different iterative algorithms. The model-X knockoffs method was combined with these algorithms to control the false discovery rate (FDR) in variable selection. In extensive simulation studies, our penalized MCMs outperformed alternative methods on multiple metrics and achieved high statistical power while controlling the FDR. In an acute myeloid leukemia (AML) application with gene expression data, our proposed approach identified 14 genes associated with potential cure and 12 genes associated with time to relapse, which may help inform treatment decisions for AML patients.
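To make the two main ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common Weibull MCM parameterization in which the probability of being uncured follows a logistic model and the latency follows a Weibull proportional hazards model, and it assumes the knockoff+ selection rule applied to feature statistics W that have already been computed from a penalized fit on the original and knockoff variables. All function and argument names (`weibull_mcm_survival`, `knockoff_plus_threshold`, `knockoff_select`, `lam`, `alpha`, `b0`, `b`, `beta`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def weibull_mcm_survival(t, x, w, beta, b0, b, lam, alpha):
    """Population survival under an assumed Weibull mixture cure model.

    S_pop(t) = (1 - p) + p * S_u(t), where
      p      = logistic(b0 + w'b)               probability of being uncured
      S_u(t) = exp(-lam * t**alpha * exp(x'beta))  Weibull PH survival of the uncured
    """
    p_uncured = 1.0 / (1.0 + np.exp(-(b0 + w @ b)))
    s_uncured = np.exp(-lam * t**alpha * np.exp(x @ beta))
    return (1.0 - p_uncured) + p_uncured * s_uncured


def knockoff_plus_threshold(W, q):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q.

    Estimated FDP at t: (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}).
    Returns np.inf when no threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Indices of variables selected at target FDR level q."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau)


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the study.
    x = np.array([0.5, -1.0])      # covariates in the latency (survival) part
    w = np.array([1.0, 0.0])       # covariates in the incidence (cure) part
    s = weibull_mcm_survival(2.0, x, w, beta=np.array([0.3, 0.1]),
                             b0=0.2, b=np.array([0.4, -0.2]),
                             lam=0.1, alpha=1.5)
    print("S_pop(2.0) =", s)

    W = np.array([5.0, 4.0, 3.0, 2.5, -1.0, 0.5, 0.0, 2.0])
    print("selected indices:", knockoff_select(W, q=0.25))
```

Under this parameterization the population survival curve plateaus at the cured fraction 1 - p as t grows, which is the feature that distinguishes an MCM from a standard survival model. In a model with two sets of coefficients, one plausible design is to run the selection step separately on the incidence and latency statistics, so that variables tied to cure status and to survival are each selected at the target FDR level.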
Keyphrases
- acute myeloid leukemia
- gene expression
- machine learning
- end stage renal disease
- allogeneic hematopoietic stem cell transplantation
- free survival
- ejection fraction
- deep learning
- big data
- dna methylation
- newly diagnosed
- genome wide
- magnetic resonance imaging
- peritoneal dialysis
- risk assessment
- bioinformatics analysis
- high throughput
- climate change
- data analysis
- human health
- case control