Double-Parallel Monte Carlo for Bayesian Analysis of Big Data.

Jingnan Xue, Faming Liang
Published in: Statistics and Computing (2017)
This paper proposes a simple, practical and efficient MCMC algorithm for Bayesian analysis of big data. The algorithm divides the big dataset into smaller subsets and provides a simple method for aggregating the subset posteriors to approximate the full-data posterior. To further speed up computation, it employs the population stochastic approximation Monte Carlo (Pop-SAMC) algorithm, a parallel MCMC algorithm, to simulate from each subset posterior. Because the algorithm involves two levels of parallelism, data parallelism and simulation parallelism, it is called "Double Parallel Monte Carlo". The validity of the proposed algorithm is justified mathematically and numerically.
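The divide-and-aggregate idea can be illustrated with a minimal Python sketch. The abstract does not spell out the aggregation rule, so everything specific here is an assumption made for illustration: a conjugate normal-mean model with known noise variance, a subset likelihood raised to the power k (the number of subsets), and aggregation by recentering each subset's draws at the average of the subset means. The function names `subset_posterior_draws` and `aggregate_by_recentering` are hypothetical, and exact conjugate sampling stands in for Pop-SAMC, which is not implemented here.

```python
import random
import statistics


def subset_posterior_draws(subset, k, n_draws, rng, prior_var=100.0, noise_var=1.0):
    """Draw from one subset posterior of a normal mean (prior mean 0).

    Assumption: the subset likelihood is raised to the power k so that the
    subset posterior has roughly the spread of the full-data posterior.
    """
    precision = 1.0 / prior_var + k * len(subset) / noise_var
    mean = (k * sum(subset) / noise_var) / precision
    sd = precision ** -0.5
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def aggregate_by_recentering(draws_per_subset):
    """Shift each subset's draws to the average of the subset means, then pool."""
    subset_means = [statistics.fmean(d) for d in draws_per_subset]
    grand_mean = statistics.fmean(subset_means)
    pooled = []
    for draws, m in zip(draws_per_subset, subset_means):
        pooled.extend(x - m + grand_mean for x in draws)
    return pooled


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]

k = 10
subsets = [data[j::k] for j in range(k)]  # data-parallel split into k subsets
draws = [subset_posterior_draws(s, k, 2_000, rng) for s in subsets]
pooled = aggregate_by_recentering(draws)

# Exact full-data posterior for comparison (available only in this toy model).
full_precision = 1.0 / 100.0 + len(data) / 1.0
full_mean = sum(data) / full_precision
full_sd = full_precision ** -0.5

print(statistics.fmean(pooled), full_mean)
print(statistics.pstdev(pooled), full_sd)
```

In this toy setting each call to `subset_posterior_draws` is independent of the others, so the k calls could run on separate workers; replacing the exact sampler with a parallel MCMC sampler inside each worker would give the second level of parallelism the abstract describes.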
Keyphrases
  • big data
  • machine learning
  • monte carlo
  • artificial intelligence
  • deep learning
  • neural network
  • electronic health record
  • peripheral blood
  • data analysis