Designing accurate emulators for scientific processes using calibration-driven deep models.

Jayaraman J Thiagarajan, Bindya Venkatesh, Rushil Anirudh, Peer-Timo Bremer, Jim Gaffney, Gemma J Anderson, Brian Spears
Published in: Nature Communications (2020)
Predictive models that accurately emulate complex scientific processes can achieve speed-ups over numerical simulators or experiments and at the same time provide surrogates for improving the subsequent analysis. Consequently, there is a recent surge in utilizing modern machine learning methods to build data-driven emulators. In this work, we study an often overlooked, yet important, problem of choosing loss functions while designing such emulators. Popular choices such as the mean squared error or the mean absolute error are based on a symmetric noise assumption and can be unsuitable for heterogeneous data or asymmetric noise distributions. We propose Learn-by-Calibrating, a novel deep learning approach based on interval calibration for designing emulators that can effectively recover the inherent noise structure without any explicit priors. Using a large suite of use-cases, we demonstrate the efficacy of our approach in providing high-quality emulators, when compared to widely-adopted loss function choices, even in small-data regimes.
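The point about symmetric-noise losses can be illustrated with a small sketch. This is not the Learn-by-Calibrating algorithm, which the abstract does not specify; it only shows, on hypothetical synthetic data with skewed noise, that the constants favored by the mean squared error and the mean absolute error differ, and that an interval with good empirical coverage is asymmetric around them. The data, the helper `empirical_coverage`, and the 5%/95% quantile choice are all assumptions made for illustration.

```python
import random
import statistics


def empirical_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside the interval [lower, upper]."""
    inside = sum(lower <= y <= upper for y in y_true)
    return inside / len(y_true)


random.seed(0)

# Hypothetical data: a constant signal of 1.0 plus skewed (exponential) noise.
y = [1.0 + random.expovariate(1.0) for _ in range(10_000)]

# Best constant prediction under each loss.
mse_fit = statistics.fmean(y)    # minimizes mean squared error (the mean)
mae_fit = statistics.median(y)   # minimizes mean absolute error (the median)

# An interval built from sample quantiles: cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(y, n=20)
lower, upper = cuts[0], cuts[-1]

coverage = empirical_coverage(y, lower, upper)

print(f"MSE-optimal constant: {mse_fit:.3f}")
print(f"MAE-optimal constant: {mae_fit:.3f}")
print(f"interval: [{lower:.3f}, {upper:.3f}], empirical coverage: {coverage:.3f}")
```

Here the interval extends much further above the point estimates than below them, which a single symmetric error term cannot express. Checking that a nominal 90% interval actually contains about 90% of the targets is the basic idea behind interval calibration.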
Keyphrases
  • machine learning
  • deep learning
  • air pollution
  • big data
  • electronic health record
  • artificial intelligence
  • low cost
  • data analysis
  • mass spectrometry