Dark Data in Real-World Evidence: Challenges, Implications, and the Imperative of Data Literacy in Medical Research.
Hun Sung Kim
Published in: Journal of Korean medical science (2024)
Randomized controlled trials (RCTs) and real-world evidence (RWE) studies are crucial and complementary in generating clinical evidence. RCTs provide controlled settings to validate the clinical effect of specific drugs or medical devices, while RWE incorporates extrinsic factors, the external influences at work in real-world scenarios, thereby challenging RCT results in practical applications. In this study, we explore the impact of extrinsic factors on RWE outcomes, focusing on "dark data," which refers to data that are collected but either not used or excluded from the analyses. Dark data can arise in many ways during the research process, from selecting study samples to data collection and analysis. However, even unused or unanalyzed dark data hold potential insights, providing a more comprehensive view of clinical contexts. Extrinsic factors can cause RWE outcomes to diverge from RCT results in ways that lie beyond the scope of statistical correction. Two main types of dark data exist: "known-unknown" and "unknown-unknown." The distinction between these types highlights the complexity of RWE. The transformation of unknown into known depends on data literacy, that is, the capability to make powerful use of data and to interpret them on the basis of medical expertise. Shifting the focus to excluded subjects or unused data in real-world contexts reveals unexplored potential. Understanding the significance of dark data is vital in reflecting the complexity of clinical settings. Connecting RCTs and RWE studies requires medical data literacy, which enables clinicians to extract meaningful insights. In the era of big data and artificial intelligence, medical staff must navigate data complexities while upholding the core role of medicine. Prepared clinicians will lead this transformative journey, ensuring that the value of data shapes the medical landscape.