Analysis of erroneous data entries in paper based and electronic data collection.

Benedikt Ley, Komal Raj Rijal, Jutta Marfurt, Naba Raj Adhikari, Megha Raj Banjara, Upendra Thapa Shrestha, Kamala Thriemer, Ric N Price, Prakash Ghimire
Published in: BMC Research Notes (2019)
Data from 52 variables collected from 358 participants were available. Discrepancies between the two data sets were found in 12.6% of all entries (2352/18,616). Differences between data points were identified in 18.0% of continuous variables (643/3580), 15.8% of time variables (113/716), 13.0% of date variables (140/1074), 12.0% of text variables (86/716), and 10.9% of categorical variables (1370/12,530). Overall, 64% (1499/2352) of all discrepancies were due to data omissions, and 76.6% (1148/1499) of these missing entries were among categorical data. Omissions in paper-based data collection (PBDC; n = 1002) were twice as frequent as in electronic data collection (EDC; n = 497; p < 0.001). Data omissions, specifically among categorical variables, were identified as the greatest source of error. If designed accordingly, EDC can address this shortfall effectively.
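The reported proportions can be cross-checked from the counts given in the abstract. The following is a minimal sketch in Python, not the authors' analysis code; the variable names (`reported`, `rate`, `by_type`, and so on) are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Counts as reported in the abstract:
# (discrepant entries, total entries) per variable type.
reported = {
    "continuous": (643, 3580),
    "time": (113, 716),
    "date": (140, 1074),
    "text": (86, 716),
    "categorical": (1370, 12530),
}


def rate(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Discrepancy rate for each variable type.
by_type = {name: rate(d, n) for name, (d, n) in reported.items()}

# Totals across all variable types.
total_discrepancies = sum(d for d, _ in reported.values())
total_entries = sum(n for _, n in reported.values())
overall = rate(total_discrepancies, total_entries)

# Omission counts reported for PBDC and EDC.
omissions_pbdc, omissions_edc = 1002, 497
omissions = omissions_pbdc + omissions_edc
omission_share = rate(omissions, total_discrepancies)
pbdc_to_edc_ratio = round(omissions_pbdc / omissions_edc, 2)

if __name__ == "__main__":
    for name, pct in by_type.items():
        print(f"{name}: {pct}%")
    print(f"overall: {overall}% ({total_discrepancies}/{total_entries})")
    print(f"omissions: {omission_share}% of discrepancies")
    print(f"PBDC to EDC omission ratio: {pbdc_to_edc_ratio}")
```

Under these counts, the per-type denominators sum to 52 × 358 = 18,616 entries and the per-type numerators sum to 2352, so the figures in the abstract are internally consistent. The omission share computes to 63.7%, which the abstract rounds to 64%.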
Keyphrases
  • electronic health record
  • big data