Login / Signup

Private measures, random walks, and synthetic data.

March BoedihardjoThomas StrohmerRoman Vershynin
Published in: Probability theory and related fields (2024)
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex-but very common-machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget ε bounded away from zero. A key ingredient in our construction is a new superregular random walk , whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
Keyphrases
  • big data
  • machine learning
  • health information
  • artificial intelligence
  • healthcare
  • health insurance
  • electronic health record
  • social media
  • deep learning
  • working memory
  • high resolution
  • public health