Core concepts in pharmacoepidemiology: Validation of health outcomes of interest within real-world healthcare databases.
Erica J Weinstein, Mary Elizabeth Ritchey, Vincent Lo Re
Published in: Pharmacoepidemiology and drug safety (2022)
Real-world healthcare data, including administrative and electronic medical record databases, provide a rich source of data for the conduct of pharmacoepidemiologic studies but carry the potential for misclassification of health outcomes of interest (HOIs). Validation studies quantify the degree of error associated with case-identifying algorithms for HOIs and are crucial for interpreting study findings within real-world data. This review provides a rationale, framework, and step-by-step approach to validating case-identifying algorithms for HOIs within healthcare databases. Key steps in validating a case-identifying algorithm within a healthcare database include: (1) selecting the appropriate health outcome; (2) determining the reference standard against which to validate the algorithm; (3) developing the algorithm using diagnosis codes, diagnostic tests or their results, procedures, drug therapies, patient-reported symptoms or diagnoses, or some combination of these parameters; (4) selecting patients and sample sizes for validation; (5) collecting data to confirm the HOI; (6) confirming the HOI; and (7) assessing the algorithm's performance. Additional strategies for algorithm refinement and methods to correct for bias due to misclassification of outcomes are discussed. The review concludes by discussing factors affecting the transportability of case-identifying algorithms and the need for ongoing validation as data elements within healthcare databases, such as diagnosis codes, change over time or as new variables, such as patient-generated health data, are included in these data sources.
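Step (7), assessing the algorithm's performance, is commonly summarized with positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity computed from a 2x2 table of algorithm result against the reference standard. The following is a minimal Python sketch of that calculation, not the authors' method. The class and function names are hypothetical, the counts are invented for illustration, and the Wilson score interval is one of several reasonable choices for the confidence interval. It also assumes the validated patients are a simple random sample of the whole source population; if only algorithm-positive patients are sampled, only PPV can be estimated directly.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ValidationCounts:
    """2x2 table: algorithm result versus reference standard (hypothetical names)."""
    tp: int  # algorithm positive, HOI confirmed
    fp: int  # algorithm positive, HOI not confirmed
    fn: int  # algorithm negative, HOI confirmed
    tn: int  # algorithm negative, HOI not confirmed


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def performance(c: ValidationCounts) -> dict[str, float]:
    """Point estimates of the four standard performance measures."""
    return {
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Invented counts, for illustration only.
counts = ValidationCounts(tp=85, fp=15, fn=10, tn=190)
metrics = performance(counts)
ppv_low, ppv_high = wilson_interval(counts.tp, counts.tp + counts.fp)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"PPV 95% CI: {ppv_low:.3f} to {ppv_high:.3f}")
```

Reporting an interval alongside each point estimate makes the effect of the validation sample size in step (4) visible: a smaller sample yields a wider interval around the same PPV.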
Keyphrases
- healthcare
- machine learning
- big data
- electronic health record
- deep learning
- artificial intelligence
- patient reported
- public health
- type diabetes
- health information
- physical activity
- newly diagnosed
- ejection fraction
- insulin resistance
- adipose tissue
- risk assessment
- patient reported outcomes
- social media
- human health
- sleep quality
- health promotion