Diversity and inclusion: A hidden additional benefit of Open Data.
Marie-Laure CharpignonLeo Anthony CeliMarisa CobanajRene EberAmelia FiskeJack GallifantChenyu LiGurucharan LingamalluAnton PetushkovRobin PiercePublished in: PLOS digital health (2024)
The recent imperative by the National Institutes of Health to share scientific data publicly underscores a significant shift in academic research. Effective as of January 2023, it emphasizes that transparency in data collection and dedicated efforts towards data sharing are prerequisites for translational research, from the lab to the bedside. Given the role of data access in mitigating potential bias in clinical models, we hypothesize that researchers who leverage open-access datasets rather than privately-owned ones are more diverse. In this brief report, we proposed to test this hypothesis in the transdisciplinary and expanding field of artificial intelligence (AI) for critical care. Specifically, we compared the diversity among authors of publications leveraging open datasets, such as the commonly used MIMIC and eICU databases, with that among authors of publications relying exclusively on private datasets, unavailable to other research investigators (e.g., electronic health records from ICU patients accessible only to Mayo Clinic analysts). To measure the extent of author diversity, we characterized gender balance as well as the presence of researchers from low- and middle-income countries (LMIC) and minority-serving institutions (MSI) located in the United States (US). Our comparative analysis revealed a greater contribution of authors from LMICs and MSIs among researchers leveraging open critical care datasets (treatment group) than among those relying exclusively on private data resources (control group). The participation of women was similar between the two groups, albeit slightly larger in the former. Notably, although over 70% of all articles included at least one author inferred to be a woman, less than 25% had a woman as a first or last author. Importantly, we found that the proportion of authors from LMICs was substantially higher in the treatment than in the control group (10.1% vs. 6.2%, p<0.001), including as first and last authors. Moreover, we found that the proportion of US-based authors affiliated with a MSI was 1.5 times higher among articles in the treatment than in the control group, suggesting that open data resources attract a larger pool of participants from minority groups (8.6% vs. 5.6%, p<0.001). Thus, our study highlights the valuable contribution of the Open Data strategy to underrepresented groups, while also quantifying persisting gender gaps in academic and clinical research at the intersection of computer science and healthcare. In doing so, we hope our work points to the importance of extending open data practices in deliberate and systematic ways.
Keyphrases
- electronic health record
- big data
- healthcare
- artificial intelligence
- minimally invasive
- machine learning
- mental health
- primary care
- metabolic syndrome
- end stage renal disease
- clinical decision support
- adipose tissue
- peritoneal dialysis
- skeletal muscle
- newly diagnosed
- health insurance
- polycystic ovary syndrome
- single cell
- intensive care unit
- insulin resistance
- quality improvement
- social media
- risk assessment
- acute respiratory distress syndrome
- smoking cessation