Complexity of possibly gapped histogram and analysis of histogram.
Fushing Hsieh, Tania Roy
Published in: Royal Society open science (2018)
We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through a data-driven possibly gapped histogram, which in turn leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics because the ensemble of candidate histograms is captured by a two-layer Ising model. This construction is also a distinctive problem of Information Theory from the perspective of data compression via uniformity. With the Hamiltonian (or energy) defined as the sum of the total coding lengths of boundaries and the total decoding errors within bins, the problem of computing the minimum-energy macroscopic states is, surprisingly, resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. The first phase of ANOHT is developed for simultaneous comparison of multiple treatments, while the second phase is developed based on classical empirical process theory for a tree-geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. In addition, a large baseball pitching dataset and a heavily right-censored divorce dataset are analysed to showcase the existential gaps and utilities of ANOHT.
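The idea of trading boundary coding cost against within-bin error can be illustrated with a small sketch. This is not the authors' algorithm or their Hamiltonian: the functions `gapped_bins` and `toy_energy`, the `boundary_cost` parameter, and the sample data are all hypothetical, chosen only to show the shape of the computation. The sketch relies on the fact that, for one-dimensional data, cutting a single-linkage hierarchical clustering tree into k clusters is the same as cutting the sorted values at the k - 1 widest gaps.

```python
def gapped_bins(values, k):
    """Split values into k bins by cutting at the k - 1 widest gaps.

    For 1D data this matches cutting a single-linkage clustering tree
    into k clusters (an illustrative stand-in, not the paper's method).
    """
    xs = sorted(values)
    # Candidate cut positions i (between xs[i-1] and xs[i]), widest gap first.
    by_gap = sorted(range(1, len(xs)), key=lambda i: xs[i] - xs[i - 1], reverse=True)
    cuts = sorted(by_gap[:k - 1])
    bins, start = [], 0
    for c in cuts:
        bins.append(xs[start:c])
        start = c
    bins.append(xs[start:])
    return bins


def toy_energy(bins, boundary_cost=1.0):
    """Hypothetical energy: boundary cost plus deviation from uniformity.

    Each bin pays for its two boundaries, plus the total absolute distance
    between its sorted points and an evenly spaced grid over [min, max].
    """
    error = 0.0
    for b in bins:
        n = len(b)
        if n > 1:
            lo, hi = b[0], b[-1]
            error += sum(abs(x - (lo + (hi - lo) * i / (n - 1))) for i, x in enumerate(b))
    return boundary_cost * 2 * len(bins) + error


# Made-up data with one obvious gap between two evenly spaced groups.
data = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3]
best_k = min(range(1, 5), key=lambda k: toy_energy(gapped_bins(data, k)))
best_bins = gapped_bins(data, best_k)
```

On this made-up data a single bin pays a large uniformity error for spanning the gap, while more than two bins pay extra boundary cost without reducing the error, so the toy energy is lowest with two bins separated by the gap.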
Keyphrases
- diffusion weighted imaging
- contrast enhanced
- diffusion weighted
- electronic health record
- big data
- magnetic resonance imaging
- machine learning
- healthcare
- computed tomography
- emergency department
- magnetic resonance
- health information
- single cell
- patient safety
- artificial intelligence
- deep learning
- social media
- adverse drug
- convolutional neural network
- data analysis
- high density