Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing.

Debbie Rankin, Michaela Black, Raymond R Bond, Jonathan Wallace, Maurice D Mulvenna, Gorka Epelde
Published in: JMIR medical informatics (2020)
The results of this study are promising, with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure that individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.
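The comparison described above (train one model on real data and one on synthetic data, then test both on the same held-out real data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the study's method: the toy two-class dataset, the per-class Gaussian generator `synthesize`, and the nearest-centroid classifier are hypothetical stand-ins for the real health care datasets, synthetic data generators, and supervised classifiers the authors evaluated.

```python
import random
import statistics

def make_real_data(n, rng):
    """Toy stand-in for a real dataset: two classes, two numeric features."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows

def group_by_label(rows):
    """Collect feature vectors per class label."""
    groups = {}
    for x, y in rows:
        groups.setdefault(y, []).append(x)
    return groups

def synthesize(rows, n, rng):
    """Hypothetical generator: draw each feature from a per-class Gaussian
    fitted to the real rows. No real record is copied into the output."""
    groups = group_by_label(rows)
    labels = [y for _, y in rows]
    out = []
    for _ in range(n):
        y = rng.choice(labels)  # roughly preserves the real class balance
        columns = zip(*groups[y])
        x = [rng.gauss(statistics.mean(c), statistics.stdev(c)) for c in columns]
        out.append((x, y))
    return out

def fit_centroids(rows):
    """Nearest-centroid classifier: one mean vector per class."""
    return {y: [statistics.mean(c) for c in zip(*xs)]
            for y, xs in group_by_label(rows).items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
real = make_real_data(600, rng)
real_train, real_test = real[:400], real[400:]
synthetic_train = synthesize(real_train, 400, rng)

# Both models are scored on the same held-out real data.
acc_real = accuracy(fit_centroids(real_train), real_test)
acc_synth = accuracy(fit_centroids(synthetic_train), real_test)
print(f"trained on real:      {acc_real:.3f}")
print(f"trained on synthetic: {acc_synth:.3f}")
print(f"difference:           {acc_real - acc_synth:+.3f}")
```

The quantity of interest is the difference between the two accuracies on the real test set. The numbers this toy example prints say nothing about the study's own results.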
Keyphrases
  • big data
  • electronic health record
  • machine learning
  • healthcare
  • artificial intelligence
  • data analysis
  • mental health
  • body composition