The discovery of biomarkers that are informative for cancer risk assessment, diagnosis, prognosis and treatment prediction is crucial. Recent advances in high-throughput genomics make it feasible to select biomarkers from the vast number of human genes in an unbiased manner. Yet controlling false discoveries is challenging because the number of genes is large relative to the small number of patients in a typical cancer study. To ensure that most discoveries are true, we employ a knockoff procedure to control false discoveries. Our method is general and flexible, accommodating arbitrary covariate distributions, linear and nonlinear associations, and survival models. In simulations, our method compares favorably to the alternatives; its utility for identifying important genes in real clinical applications is demonstrated by the identification of seven genes associated with Breslow thickness in skin cutaneous melanoma patients.
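As a rough illustration of how a knockoff procedure turns per-gene statistics into a selected set, the sketch below implements the standard knockoff+ selection rule. It is not taken from the paper, whose procedure may differ. It assumes each gene j already has a statistic W_j that is large and positive when the real gene looks more important than its knockoff copy. Constructing the knockoffs and computing W_j are not shown. The names `knockoff_plus_threshold`, `knockoff_select`, `w`, and `q` are illustrative.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (assumed knockoff+ rule)."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # candidate discoveries
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Indices of features whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten genes, with target level q = 0.2.
w = [5.0, 4.2, 3.9, 3.1, 2.8, -0.4, 0.3, -0.2, 2.5, 0.1]
selected = knockoff_select(w, q=0.2)
```

With these made-up statistics the threshold is 2.5, so the six genes with the largest positive statistics are selected. A stricter level such as q = 0.1 selects nothing, because the "+1" in the numerator makes the rule conservative when few features are available.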