
clusterMLD: An Efficient Hierarchical Clustering Method for Multivariate Longitudinal Data.

Junyi Zhou, Ying Zhang, Wanzhu Tu
Published in: Journal of Computational and Graphical Statistics: a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America (2023)
Longitudinal data clustering is challenging because the grouping has to account for the similarity of individual trajectories in the presence of sparse and irregular observation times. This paper puts forward a hierarchical agglomerative clustering method based on a dissimilarity metric that quantifies the cost of merging two distinct groups of curves, where the repeatedly measured data are represented by B-splines. Extensive simulations show that the proposed method has superior performance in determining the number of clusters, in classifying individuals into the correct clusters, and in computational efficiency. Importantly, the method is suitable not only for clustering multivariate longitudinal data with sparse and irregular measurements but also for intensively measured functional data. We provide an R package that implements these analyses. To illustrate the use of the proposed clustering method, we analyze two large data sets from real-world clinical studies.
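To make the idea of a "cost of merging two groups of curves" concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's metric or its R package. It assumes a Ward-style cost: the increase in residual sum of squares (RSS) when two groups share one least-squares cubic B-spline fit instead of keeping separate fits. It also assumes a clamped knot vector with equally spaced interior knots, a fixed number of clusters instead of the data-driven choice the paper reports, and multivariate outcomes handled as columns sharing one basis with their RSS summed unweighted. The function names (`clamped_knots`, `bspline_basis`, `group_rss`, `agglomerate`) are hypothetical.

```python
import numpy as np


def clamped_knots(lo, hi, n_interior, degree=3):
    """Clamped knot vector with equally spaced interior knots (an assumed choice)."""
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    return np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])


def bspline_basis(x, knots, degree=3):
    """Evaluate every B-spline basis function at x via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open knot intervals [t_i, t_{i+1}).
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):
        B[:, i] = (x >= t[i]) & (x < t[i + 1])
    # Assign the right boundary point to the last non-empty interval.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time; 0/0 terms are treated as 0.
    for d in range(1, degree + 1):
        B_next = np.zeros((len(x), len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B  # shape (len(x), len(knots) - degree - 1)


def group_rss(curves, knots):
    """RSS of one least-squares B-spline fit to the pooled observations of a group.

    Each curve is (times, values); values may be 1-D or (n_obs, n_outcomes),
    in which case the outcomes share the basis and their RSS values are summed.
    """
    t = np.concatenate([c[0] for c in curves])
    y = np.concatenate([c[1] for c in curves])
    X = bspline_basis(t, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def agglomerate(curves, knots, n_clusters):
    """Greedy agglomeration: repeatedly merge the pair with the smallest RSS increase."""
    clusters = [[i] for i in range(len(curves))]
    rss = [group_rss([c], knots) for c in curves]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = [curves[k] for k in clusters[a] + clusters[b]]
                cost = group_rss(merged, knots) - rss[a] - rss[b]
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        clusters[a] += clusters[b]
        rss[a] += rss[b] + cost  # equals the RSS of the merged fit
        del clusters[b], rss[b]  # b > a, so index a stays valid
    return clusters


if __name__ == "__main__":
    # Synthetic example: two groups of sparsely, irregularly observed curves.
    rng = np.random.default_rng(0)
    knots = clamped_knots(0.0, 1.0, n_interior=2)
    curves = []
    for offset in (0.0, 0.0, 0.0, 2.0, 2.0, 2.0):
        times = rng.uniform(0.0, 1.0, size=8)
        values = np.sin(2 * np.pi * times) + offset + rng.normal(0.0, 0.1, size=8)
        curves.append((times, values))
    print(sorted(sorted(c) for c in agglomerate(curves, knots, n_clusters=2)))
```

Pooling all observations of a group into one fit is what lets individuals with only a few irregular visits contribute to a shared trajectory. The exhaustive pairwise search shown here is quadratic per merge and is meant for illustration only, not as an account of the efficiency the paper reports.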