Optimal distribution-preserving downsampling of large biomedical data sets (opdisDownsampling).

Jörn Lötsch, Sebastian Malkusch, Alfred Ultsch
Published in: PLoS ONE (2021)
Optimal distribution-preserving class-proportional downsampling yields data subsets that reflect the structure of the entire data set better than those obtained with the standard method. Because distributional similarity is the only selection criterion, the proposed method does not in any way affect the results of a later planned analysis.
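The idea of selecting a subset purely by distributional similarity can be illustrated with a minimal Python sketch. It is not the opdisDownsampling code, and every name in it (`ks_statistic`, `stratified_indices`, `downsample`, `n_trials`) is hypothetical. The sketch assumes one plausible reading of the abstract: draw several class-proportional random subsets and keep the one whose per-feature empirical distributions are closest to the full data. The two-sample Kolmogorov-Smirnov statistic is used as the distance here only as an illustrative choice.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def stratified_indices(labels, fraction, rng):
    """Draw random row indices so that each class keeps its share of the data."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    chosen = []
    for members in by_class.values():
        k = max(1, round(len(members) * fraction))  # at least one row per class
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def downsample(rows, labels, fraction, n_trials=50, seed=0):
    """Return (indices, score) of the trial subset with the smallest worst-feature KS distance.

    Assumption: similarity is judged feature by feature and the worst feature decides.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    best_idx, best_score = None, float("inf")
    for _ in range(n_trials):
        idx = stratified_indices(labels, fraction, rng)
        score = max(
            ks_statistic([columns[j][i] for i in idx], columns[j])
            for j in range(n_features)
        )
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score
```

Note that the selection criterion never looks at any downstream model or test statistic, only at how closely the subset's distributions match those of the full data. Raising `n_trials` trades computation for a closer match.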
Keyphrases
  • electronic health record
  • big data
  • data analysis
  • machine learning