A large dataset of annotated incident reports on medication errors.
Zoie Shui-Yee WongNeil WatersJiaxing LiuShin UshiroPublished in: Scientific data (2024)
Incident reports of medication errors are valuable learning resources for improving patient safety. However, pertinent information is often contained within unstructured free text, which prevents automated analysis and limits the usefulness of these data. Natural language processing can structure this free text automatically and retrieve relevant past incidents and learning materials, but to be able to do so requires a large, fully annotated and validated corpus of incident reports. We present a corpus of 58,658 machine-annotated incident reports of medication errors that can be used to advance the development of information extraction models and subsequent incident learning. We report the best F1-scores for the annotated dataset: 0.97 and 0.76 for named entity recognition and intention/factuality analysis, respectively, for the cross-validation exercise. Our dataset contains 478,175 named entities and differentiates between incident types by recognising discrepancies between what was intended and what actually occurred. We explain our annotation workflow and technical validation and provide access to the validation datasets and machine annotator for labelling future incident reports of medication errors.