DiviK: divisive intelligent K-means for hands-free unsupervised clustering in big biological data.

Grzegorz Mrukwa, Joanna Polańska
Published in: BMC Bioinformatics (2022)
DiviK could be the default choice for exploring mass spectrometry imaging (MSI) data. Thanks to its unique GMM-based local optimisation of the feature space and its deglomerative (divisive) scheme, DiviK results do not depend strongly on the feature engineering technique applied, and they can reveal hidden structure in a tissue sample. DiviK is also highly scalable: it can process, in a single run, big omics data with more than 1.5 million instances and a few thousand features. Finally, owing to its simplicity, DiviK generalises easily to an even more flexible framework. It is therefore useful for other -omics data (such as single-cell spatial transcriptomics) and for tabular data in general (including medical images after appropriate embedding). A generic implementation is freely available under the Apache 2.0 license at https://github.com/gmrukwa/divik .
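To make the divisive idea concrete, here is a minimal, standard-library-only sketch of recursive splitting with a local feature filter at every step. It is not the authors' implementation and does not use the divik package: a plain variance ranking stands in for the GMM-based feature filter, each split is a simple 2-means, and the stop rule (minimum cluster size and maximum depth) is an assumption made for illustration. All function and parameter names are hypothetical.

```python
import random
from statistics import pvariance


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def select_features(rows, keep_fraction=0.5):
    """Return indices of the most variable features in this subset.

    Assumption: a variance ranking replaces the GMM-based filter here.
    """
    n_features = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_features)]
    n_keep = max(1, int(n_features * keep_fraction))
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return ranked[:n_keep]


def two_means(points, rng, n_iter=20):
    """Lloyd's algorithm with k=2; returns a 0/1 label for every point."""
    centers = [list(p) for p in rng.sample(points, 2)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min((0, 1), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def divisive_clustering(rows, indices=None, min_size=4, max_depth=3,
                        rng=None, depth=0):
    """Recursively split `rows`; returns a list of leaf clusters (row indices)."""
    rng = rng or random.Random(0)
    if indices is None:
        indices = list(range(len(rows)))
    if depth >= max_depth or len(indices) < 2 * min_size:
        return [indices]
    subset = [rows[i] for i in indices]
    # Features are re-selected locally, on the current subset only.
    features = select_features(subset)
    reduced = [[r[j] for j in features] for r in subset]
    labels = two_means(reduced, rng)
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    if len(left) < min_size or len(right) < min_size:
        return [indices]  # reject a degenerate split and stop here
    return (divisive_clustering(rows, left, min_size, max_depth, rng, depth + 1)
            + divisive_clustering(rows, right, min_size, max_depth, rng, depth + 1))


# Toy data: two groups that differ only in the first two of four features.
data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1),
         data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)] for _ in range(20)]
rows += [[data_rng.gauss(10, 1), data_rng.gauss(10, 1),
          data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)] for _ in range(20)]
leaves = divisive_clustering(rows)
```

Because the features are chosen again inside every subregion, the split at a deeper level can rely on different features than the split above it, which is the property the abstract attributes to local optimisation of the feature space.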
Keyphrases
  • single cell
  • big data
  • electronic health record
  • machine learning
  • rna seq
  • deep learning
  • healthcare
  • high throughput
  • gene expression
  • functional connectivity
  • optical coherence tomography