
Clustering with varying risks of false assignments in discrete latent variable model.

Donghwan Lee, Dongseok Choi, Youngjo Lee
Published in: Statistical Methods in Medical Research (2020)
In clustering problems, latent variable models are frequently used to model the intrinsic structure of unlabeled data. These model-based clustering methods often provide a clustering rule that minimizes the total false assignment error. However, in many clustering applications it is desirable to treat false assignment errors for certain clusters differently. In this paper, we introduce the false assignment rate for clustering and estimate it by using the extended likelihood approach. We propose VRclust, a novel clustering rule that controls errors differently across clusters. Real data examples illustrate the use of the estimated false assignment rate, and a simulation study shows that the error control is consistent as the sample size increases.
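To make the contrast concrete, the sketch below compares the usual maximum-posterior rule, which minimizes the expected total number of false assignments under the fitted model, with a cluster-specific threshold rule. It assumes a two-component Gaussian mixture whose parameters are already known, and it uses a simple plug-in false assignment rate: the average of 1 - P(k | x) over the points assigned to cluster k. The threshold rule, the "unassigned" outcome, and the plug-in estimate are illustrative assumptions. They are not VRclust and not the extended-likelihood estimator described in the paper.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posteriors(x, weights, means, sds):
    """Posterior P(cluster k | x) under a Gaussian mixture with known parameters."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def max_posterior_rule(post):
    """Assign each point to its most probable cluster; this minimizes the
    expected total number of false assignments under the model."""
    return [max(range(len(p)), key=p.__getitem__) for p in post]

def threshold_rule(post, thresholds):
    """Hypothetical cluster-specific rule (an assumption, not VRclust):
    assign to the most probable cluster k only if P(k | x) >= thresholds[k],
    otherwise leave the point unassigned (None)."""
    labels = []
    for p in post:
        k = max(range(len(p)), key=p.__getitem__)
        labels.append(k if p[k] >= thresholds[k] else None)
    return labels

def estimated_far(post, labels, k):
    """Plug-in false assignment rate for cluster k (assumed estimator):
    mean of 1 - P(k | x) over points assigned to k, or 0.0 if none are."""
    assigned = [p[k] for p, lab in zip(post, labels) if lab == k]
    if not assigned:
        return 0.0
    return sum(1.0 - q for q in assigned) / len(assigned)

# Illustrative data and mixture parameters (made up for this sketch).
data = [-2.0, -0.5, 0.4, 1.0, 2.2, 3.1]
post = [posteriors(x, [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]) for x in data]

bayes = max_posterior_rule(post)
strict = threshold_rule(post, [0.5, 0.95])  # stricter error control for cluster 1

print(bayes)   # [0, 0, 0, 0, 1, 1]
print(strict)  # [0, 0, 0, 0, None, 1]
print(estimated_far(post, bayes, 1), estimated_far(post, strict, 1))
```

Raising the threshold for cluster 1 lowers its estimated false assignment rate at the cost of leaving the ambiguous point at x = 2.2 unassigned, while cluster 0 is treated exactly as under the maximum-posterior rule. Treating clusters asymmetrically in this way is the kind of trade-off the abstract motivates.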