A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses.

Heidi Seibold, Severin Czerny, Siona Decke, Roman Dieterle, Thomas Eder, Steffen Fohr, Nico Hahn, Rabea Hartmann, Christoph Heindl, Philipp Kopper, Dario Lepke, Verena Loidl, Maximilian Mandl, Sarah Musiol, Jessica Peter, Alexander Piehler, Elio Rojas, Stefanie Schmid, Hannah Schmidt, Melissa Schmoll, Lennart Schneider, Xiao-Yin To, Viet Tran, Antje Völker, Moritz Wagner, Joshua Wagner, Maria Waize, Hannah Wecker, Rui Yang, Simone Zellner, Malte Nalenz

Published in: PLOS ONE (2021)

Computational reproducibility is a cornerstone of sound and credible research. In complex statistical analyses, such as the analysis of longitudinal data, reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce the longitudinal data analyses of 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software and only one published part of the analysis code. Replication was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results, and for another two we could reproduce only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main lesson is that reproducing papers is difficult if no code is supplied, and that it places a high burden on those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.
