Postprediction Inference for Clinical Characteristics Extracted With Machine Learning on Electronic Health Records.

Arjun Sondhi, Alexander S Rich, Siruo Wang, Jeffery T Leek
Published in: JCO Clinical Cancer Informatics (2023)
We describe and evaluate methods for fitting statistical models using variables extracted by machine learning (ML) models, which are subject to model error. We show that estimation and inference are generally valid when the extracted data come from high-performing ML models. More complex methods that incorporate auxiliary labeled data provide further improvements.
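The abstract does not specify the authors' methods, so the sketch below is only an illustration of one generic way auxiliary labeled data can correct for extraction error. It is not the paper's approach. It uses a residual-corrected ("prediction-powered") estimate of the prevalence of a binary characteristic. Everything here is an assumption for illustration: the simulated data, the hypothetical sensitivity (0.9) and specificity (0.85) of the ML extractor, the sample sizes, the independence of the labeled and unlabeled samples, and the function names `simulate_record`, `naive_mean`, and `corrected_mean`.

```python
import math
import random
import statistics


def simulate_record(rng, prevalence=0.3, sens=0.9, spec=0.85):
    """Draw one hypothetical record: true label y and ML-extracted label f."""
    y = 1 if rng.random() < prevalence else 0
    if y == 1:
        f = 1 if rng.random() < sens else 0   # true positive vs. false negative
    else:
        f = 0 if rng.random() < spec else 1   # true negative vs. false positive
    return y, f


def naive_mean(pred_unlabeled):
    """Treat ML-extracted values as if they were the true values."""
    return statistics.fmean(pred_unlabeled)


def corrected_mean(pred_unlabeled, y_labeled, pred_labeled, z=1.96):
    """Residual-corrected (prediction-powered) mean with a Wald-type CI.

    pred_unlabeled: ML-extracted values on the large unlabeled set (size N).
    y_labeled, pred_labeled: true and ML-extracted values on a small
        labeled set (size n), assumed independent of the unlabeled set.
    """
    N, n = len(pred_unlabeled), len(y_labeled)
    # Residuals estimate the ML model's average error where the truth is known.
    resid = [y - f for y, f in zip(y_labeled, pred_labeled)]
    estimate = statistics.fmean(pred_unlabeled) + statistics.fmean(resid)
    # Standard error adds the sampling variance from both independent samples.
    se = math.sqrt(statistics.variance(pred_unlabeled) / N
                   + statistics.variance(resid) / n)
    return estimate, (estimate - z * se, estimate + z * se)


rng = random.Random(0)
unlabeled = [simulate_record(rng) for _ in range(20_000)]  # truth treated as hidden
labeled = [simulate_record(rng) for _ in range(500)]       # hypothetical labeled subset
pred_u = [f for _, f in unlabeled]
y_l = [y for y, _ in labeled]
f_l = [f for _, f in labeled]

naive = naive_mean(pred_u)
est, (lo, hi) = corrected_mean(pred_u, y_l, f_l)
print(f"naive: {naive:.3f}  corrected: {est:.3f}  95% CI: ({lo:.3f}, {hi:.3f})")
```

With these settings the ML labels have expected prevalence 0.3 × 0.9 + 0.7 × 0.15 = 0.375. The naive estimate is therefore biased upward from the true 0.3. The corrected estimate should land near 0.3, at the cost of a wider interval driven mainly by the labeled sample size. As sensitivity and specificity approach 1, the residuals shrink toward zero and the two estimates converge. This is consistent with the observation that a high-performing extractor already yields approximately valid inference.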