Postprediction Inference for Clinical Characteristics Extracted With Machine Learning on Electronic Health Records.
Arjun Sondhi, Alexander S Rich, Siruo Wang, Jeffery T Leek
Published in: JCO Clinical Cancer Informatics (2023)
We describe and evaluate methods for fitting statistical models using machine learning (ML)-extracted variables that are subject to model error. We show that estimation and inference are generally valid when using extracted data from high-performing ML models. More complex methods that incorporate auxiliary labeled data provide further improvements.
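To illustrate why the performance of the extraction model matters, here is a minimal simulation sketch. It is not the authors' method. It assumes a hypothetical setup: a binary clinical characteristic with 50% prevalence, an ML-extracted version of it that is wrong at a fixed, symmetric rate, and a continuous outcome that depends linearly on the true characteristic. The function names, the effect size `beta`, and the accuracy levels are all illustrative choices. Under these assumptions, regressing the outcome on the extracted variable shrinks the estimated slope toward zero by a factor of roughly `2 * accuracy - 1`.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_slope(accuracy, beta=2.0, n=20000, seed=0):
    """Estimate the slope of y on an error-prone, ML-extracted covariate.

    Assumptions (illustrative only): the true covariate is Bernoulli(0.5),
    the extracted covariate flips it with probability 1 - accuracy, and
    y = beta * x_true + standard normal noise.
    """
    rng = random.Random(seed)
    x_true = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    # Extracted value equals the truth with probability `accuracy`.
    x_hat = [xi if rng.random() < accuracy else 1 - xi for xi in x_true]
    # The outcome depends on the true covariate, not the extracted one.
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x_true]
    return ols_slope(x_hat, y)


if __name__ == "__main__":
    for acc in (1.0, 0.95, 0.70):
        print(f"accuracy={acc:.2f}  estimated slope={simulate_slope(acc):.3f}")
```

In this toy setting, a highly accurate extractor leaves the estimate close to the true effect, while a weaker one attenuates it substantially. Auxiliary labeled data could in principle be used to estimate the error rate and correct for it.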