Model-based multifacet clustering with high-dimensional omics applications.

Wei Zong, Danyang Li, Marianne L Seney, Colleen A McClung, George C Tseng
Published in: Biostatistics (Oxford, England) (2024)
High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.
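To make the idea of coexisting sample partitions concrete, the following is a minimal illustrative sketch, not the MFClust algorithm itself. It simulates a small expression matrix in which two hypothetical gene subsets ("facets") each separate the same samples along a different grouping, then clusters the samples once per subset. Everything here is an assumption made for illustration: the function names, the sample and gene counts, the mean shift, and the use of a plain two-cluster k-means in place of the paper's Gaussian mixture model. The facet membership of each gene is also taken as known, whereas MFClust infers it from the data.

```python
import random


def simulate(n_samples=40, genes_per_facet=5, shift=4.0, seed=0):
    """Simulate samples whose two gene subsets follow different groupings (toy data)."""
    rng = random.Random(seed)
    # Hypothetical grouping A: first half of the samples vs. second half.
    label_a = [0 if i < n_samples // 2 else 1 for i in range(n_samples)]
    # Hypothetical grouping B: even-indexed vs. odd-indexed samples.
    label_b = [i % 2 for i in range(n_samples)]
    data = []
    for i in range(n_samples):
        # Facet A genes shift their mean with label_a, facet B genes with label_b.
        row = [rng.gauss(shift * label_a[i], 1.0) for _ in range(genes_per_facet)]
        row += [rng.gauss(shift * label_b[i], 1.0) for _ in range(genes_per_facet)]
        data.append(row)
    return data, label_a, label_b


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def two_means(points, n_iter=20):
    """Plain 2-cluster k-means with a deterministic farthest-point initialisation."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers = [list(c0), list(c1)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in points]
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agreement(x, y):
    """Fraction of samples on which two binary partitions agree, up to label swap."""
    same = sum(1 for a, b in zip(x, y) if a == b)
    return max(same, len(x) - same) / len(x)


data, label_a, label_b = simulate()
facet_a = [row[:5] for row in data]   # genes assumed to belong to facet A
facet_b = [row[5:] for row in data]   # genes assumed to belong to facet B
clusters_a = two_means(facet_a)
clusters_b = two_means(facet_b)

print("facet A clusters vs grouping A:", agreement(clusters_a, label_a))
print("facet B clusters vs grouping B:", agreement(clusters_b, label_b))
print("facet A clusters vs grouping B:", agreement(clusters_a, label_b))
```

In this toy setting, each gene subset yields a partition that tracks its own grouping and carries little information about the other one, which is the situation a single-solution clustering method cannot represent. A multifacet method additionally has to learn which genes belong to which facet, a step this sketch deliberately skips.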
Keyphrases
  • single cell
  • rna seq
  • high throughput
  • electronic health record
  • high resolution
  • small molecule
  • healthcare
  • multiple sclerosis
  • case control
  • blood brain barrier
  • data analysis
  • deep learning
  • subarachnoid hemorrhage