Identifying unreported links between ClinicalTrials.gov trial registrations and their published results.
Shifeng Liu, Florence T Bourgeois, Adam G Dunn. Published in: Research synthesis methods (2022)
A substantial proportion of trial registrations are not linked to corresponding published articles, limiting analyses and new tools. Our aim was to develop a method for finding articles reporting the results of trials that are registered on ClinicalTrials.gov when they do not include metadata links. We used a set of 27,280 trial registration and article pairs to train and evaluate methods for identifying missing links in both directions: from articles to registrations and from registrations to articles. We trained a classifier with six distance metrics as feature representations to rank the correct article or registration, using recall@K to evaluate performance and compare it to baseline methods. When identifying links from registrations to published articles, the classifier ranked the correct article first (recall@1) among 378,048 articles in 80.8% of evaluation cases, compared to 34.9% for the baseline method. Recall@10 was 85.1% compared to 60.7% for the baseline. When predicting links from articles to registrations, recall@1 was 83.4% for the classifier and 39.8% for the baseline. Recall@10 was 89.5% compared to 65.8% for the baseline. The proposed method improves on our baseline document similarity method enough to be feasible for identifying missing links in practice. Given a ClinicalTrials.gov registration, a user checking 10 ranked articles can expect to identify the matching article in at least 85% of cases, if the trial has been published. The proposed method can be used to improve the coupling of ClinicalTrials.gov and PubMed, with applications related to automating systematic review and evidence synthesis processes.
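To make the evaluation metric concrete, the following is a minimal sketch of how recall@K could be computed for ranked candidate lists. It is not the authors' code. The function name `recall_at_k`, the data layout, and the toy identifiers are all assumptions made for illustration; the abstract only states that recall@K was used to compare the classifier with baseline methods.

```python
from typing import Dict, Sequence


def recall_at_k(
    rankings: Dict[str, Sequence[str]],
    truth: Dict[str, str],
    k: int,
) -> float:
    """Fraction of queries whose known match appears in the top k candidates.

    rankings maps each query ID (e.g. a registration) to candidate IDs
    ordered from most to least likely match; truth maps each query ID to
    its single known match. Both structures are hypothetical.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not truth:
        raise ValueError("truth must contain at least one query")
    hits = sum(
        1
        for query, target in truth.items()
        if target in rankings.get(query, [])[:k]
    )
    return hits / len(truth)


# Toy data with made-up identifiers (not real registrations or articles).
rankings = {
    "reg-A": ["art-3", "art-1", "art-2"],    # match ranked second
    "reg-B": ["art-5", "art-4", "art-6"],    # match ranked first
    "reg-C": ["art-7", "art-8", "art-9"],    # match ranked third
    "reg-D": ["art-11", "art-12", "art-10"], # match not retrieved
}
truth = {
    "reg-A": "art-1",
    "reg-B": "art-5",
    "reg-C": "art-9",
    "reg-D": "art-13",
}

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(rankings, truth, k):.2f}")
```

Under this definition, the reported recall@10 of 85.1% means that the correct article appeared among the first 10 ranked candidates for 85.1% of the evaluated registrations.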