NCI Cancer Research Data Commons: Lessons Learned and Future State.
Erika Kim, Tanja M Davidsen, Brandi N Davis-Dusenbery, Alexander Baumann, Angela Maggio, Zhaoyi Chen, Daoud Meerzaman, Esmeralda Casas-Silva, David Pot, Todd Pihl, John Otridge, Eve R Shalley, Jill S Barnholtz-Sloan, Anthony R Kerlavage
Published in: Cancer Research (2024)
More than ever, scientific progress in cancer research hinges on our ability to combine datasets and extract meaningful interpretations to better understand diseases and ultimately inform the development of better treatments and diagnostic tools. To enable the successful sharing and use of big data, the NCI developed the Cancer Research Data Commons (CRDC), providing access to a large, comprehensive, and expanding collection of cancer data. The CRDC is a cloud-based data science infrastructure that eliminates the need for researchers to download and store large-scale datasets by allowing them to perform analysis where data reside. Over the past 10 years, the CRDC has made significant progress in providing access to data and tools along with training and outreach to support the cancer research community. In this review, we provide an overview of the history and the impact of the CRDC to date, lessons learned, and future plans to further promote data sharing, accessibility, interoperability, and reuse. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Pot et al., p. 1396.