
Gaining Biological Insights through Supervised Data Visualization.

Jake S Rhodes, Adrien Aumon, Sacha Morin, Marc Girard, Catherine Larochelle, Elsa Brunet-Ratnasingham, Amélie Pagliuzza, Lorie Marchitto, Wei Zhang, Adele Cutler, Francois Grand'Maison, Anhong Zhou, Andrés Finzi, Nicolas Chomont, Daniel E Kaufmann, Stephanie E J Zandee, Alexandre Prat, Guy Wolf, Kevin R Moon
Published in: bioRxiv : the preprint server for biology (2024)
Dimensionality reduction-based data visualization is pivotal in comprehending complex biological data. The most common methods, such as PHATE, t-SNE, and UMAP, are unsupervised and therefore reflect the dominant structure in the data, which may be independent of expert-provided labels. Here we introduce a supervised data visualization method called RF-PHATE, which integrates expert knowledge for further exploration of the data. RF-PHATE leverages random forests to capture intricate feature-label relationships. Extracting information from the forest, RF-PHATE generates low-dimensional visualizations that highlight relevant data relationships while disregarding extraneous features. This approach scales to large datasets and applies to classification and regression. We illustrate RF-PHATE's prowess through three case studies. In a multiple sclerosis study using longitudinal clinical and imaging data, RF-PHATE unveils a sub-group of patients with non-benign relapsing-remitting multiple sclerosis, demonstrating its aptitude for time-series data. In the context of Raman spectral data, RF-PHATE effectively showcases the impact of antioxidants on diesel exhaust-exposed lung cells, highlighting its proficiency in noisy environments. Furthermore, RF-PHATE aligns established geometric structures with COVID-19 patient outcomes, enriching interpretability in a hierarchical manner. RF-PHATE bridges expert insights and visualizations, promising knowledge generation. Its adaptability, scalability, and noise tolerance underscore its potential for widespread adoption.
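The general idea of a forest-guided embedding can be illustrated with a minimal sketch. This is not the authors' RF-PHATE implementation: it assumes scikit-learn is available, uses a simple "same leaf" proximity, and swaps in classical MDS where RF-PHATE would use PHATE's diffusion-based embedding. The helper names `forest_proximity` and `classical_mds` and the synthetic dataset are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def forest_proximity(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS; a stand-in here, not the PHATE embedding step."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues that arise when dist is not exactly Euclidean.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Synthetic data (an assumption): 3 label-relevant features among 20, the rest noise.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=3, n_redundant=0, random_state=0
)

# The forest learns which features matter for the labels.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Supervised similarity: samples are close if the forest routes them together.
prox = forest_proximity(forest, X)

# Turn proximities into dissimilarities and embed in two dimensions.
embedding = classical_mds(1.0 - prox)
```

Because the similarity is read off the trained forest rather than computed from raw feature distances, features the forest never splits on have little influence on the resulting layout, which is the sense in which a supervised embedding can disregard extraneous features.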
Keyphrases
  • electronic health record
  • multiple sclerosis
  • big data
  • machine learning
  • healthcare
  • coronavirus disease
  • SARS-CoV
  • magnetic resonance imaging
  • deep learning
  • high resolution
  • cell death
  • PI3K/Akt
  • cell cycle arrest