Model-free conditional screening for ultrahigh-dimensional survival data via conditional distance correlation.

Hengjian Cui, Yanyan Liu, Guangcai Mao, Jing Zhang
Published in: Biometrical journal. Biometrische Zeitschrift (2022)
Selecting the active variables that significantly affect the event of interest is an important problem in the statistical analysis of ultrahigh-dimensional data. In many applications, researchers already know from previous investigations and experience that a certain set of covariates is active. Using this prior knowledge of active variables, we propose a model-free conditional screening procedure for ultrahigh-dimensional survival data based on conditional distance correlation. The proposed procedure can effectively detect hidden active variables that are jointly important but only weakly correlated with the response. Moreover, it performs well when covariates are strongly correlated with each other. We establish the sure screening property and the ranking consistency of the proposed method and conduct extensive simulation studies, which suggest that the proposed procedure works well in practical situations. Finally, we illustrate the new approach on a real dataset from the diffuse large-B-cell lymphoma study S1.
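To make the screening idea concrete, the following is a minimal sketch of distance-correlation screening in its simplest marginal form. It is not the authors' procedure: it ignores censoring, and it does not condition on the known active covariates, both of which the proposed method handles. The function names (`distance_correlation`, `screen_by_distance_correlation`) and the `keep` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0  # one of the samples is constant
    # Clip tiny negative values that can arise from floating-point error.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_by_distance_correlation(X, y, keep):
    """Rank the columns of X by distance correlation with y.

    Assumption: y is fully observed (no censoring) and no conditioning
    set is used, so this is marginal screening only.
    Returns the indices of the `keep` top-ranked columns and all scores.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)  # descending by score
    return order[:keep], scores
```

Because distance correlation is zero in the population only under independence, a ranking of this kind can pick up nonlinear dependence that a Pearson-correlation ranking would miss, which is what makes the screening model-free. A conditional version would replace the marginal statistic with one computed given the known active covariates.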
Keyphrases
  • diffuse large b cell lymphoma
  • electronic health record
  • healthcare
  • big data
  • minimally invasive
  • epstein barr virus
  • machine learning
  • free survival
  • case control