Computing the polytomous discrimination index.

Douglas C Dover, Sunjidatul Islam, Cynthia M Westerhout, Linn E Moore, Padma Kaul, Anamaria Savu
Published in: Statistics in Medicine (2021)
Polytomous regression models generalize logistic models to a categorical outcome variable with more than two distinct categories. These models are currently used in clinical research, and it is essential to measure their ability to distinguish between the categories of the outcome. In 2012, Van Calster et al. proposed the polytomous discrimination index (PDI) as an extension of the binary discrimination c-statistic to unordered polytomous regression. The PDI summarizes the simultaneous discrimination between all outcome categories. Previous implementations of the PDI are not capable of running on "big data." This article shows that the PDI formula can be manipulated to depend only on the distributions of the predicted probabilities evaluated for each outcome category and within each observed level of the outcome, which substantially reduces the computation time. We present a SAS macro and an R function that can rapidly evaluate the PDI and its components. The routines are evaluated on several simulated datasets, varying the number of outcome categories and the size of the data, and on two large real-world administrative health datasets. We compare the PDI with two other discrimination indices, the M-index and the hypervolume under the manifold (HUM), on simulated examples. We describe situations where the PDI and HUM, indices based on multiple comparisons, are superior to the M-index, an index based on pairwise comparisons, at detecting predictions that are no different from random selection or that are erroneous due to incorrect ranking.
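The abstract does not state the formula, so the following Python sketch illustrates only the general idea, under stated assumptions. It assumes the usual reading of the PDI: take a set of K cases, one from each outcome category. The case from category i counts as correctly identified when its predicted probability for category i is the highest in the set. The PDI is the average over categories of the proportion of such sets. The function names `pdi` and `pdi_brute_force` are hypothetical. This is not the authors' SAS macro or R function, and for simplicity ties are counted as failures, which may differ from the published tie handling. The fast version replaces enumeration of all sets by rank counts within each observed category, found by binary search on sorted predicted probabilities.

```python
from bisect import bisect_left
from itertools import product
from math import prod


def pdi(probs, labels, n_classes):
    """Rank-count sketch of the PDI (assumption: ties count as failures).

    probs  : list of per-case lists, probs[c][i] = predicted P(category i)
    labels : observed categories, integers 0 .. n_classes - 1
    Returns (overall PDI, list of category-specific components).
    """
    K = n_classes
    # sorted_p[i][j]: sorted predicted probabilities of category i
    # among the cases whose observed category is j
    sorted_p = [
        [sorted(p[i] for p, y in zip(probs, labels) if y == j) for j in range(K)]
        for i in range(K)
    ]
    # Number of possible sets containing one case from each category
    total_sets = prod(len(sorted_p[0][j]) for j in range(K))

    components = []
    for i in range(K):
        wins = 0
        for p, y in zip(probs, labels):
            if y != i:
                continue
            # Count the sets in which this case has the strictly highest p_i:
            # product over other categories of the number of cases below it
            ways = 1
            for j in range(K):
                if j != i:
                    ways *= bisect_left(sorted_p[i][j], p[i])
            wins += ways
        components.append(wins / total_sets)
    return sum(components) / K, components


def pdi_brute_force(probs, labels, n_classes):
    """Reference version that enumerates every set; small data only."""
    K = n_classes
    groups = [[p for p, y in zip(probs, labels) if y == j] for j in range(K)]
    total_sets = prod(len(g) for g in groups)
    components = []
    for i in range(K):
        wins = sum(
            all(combo[i][i] > combo[j][i] for j in range(K) if j != i)
            for combo in product(*groups)
        )
        components.append(wins / total_sets)
    return sum(components) / K, components


# Toy data (made up for illustration): every case is assigned probability 0.8
# for its own observed category, so discrimination is perfect.
perfect_labels = [0, 0, 1, 1, 2, 2]
perfect_probs = [
    [0.8 if i == y else 0.1 for i in range(3)] for y in perfect_labels
]
print(pdi(perfect_probs, perfect_labels, 3))
```

On the toy data every set is ranked correctly, so the overall value and all three components equal 1.0. A model that ranks at random would be expected to score about 1/K. The brute-force version grows with the product of the category sizes, while the rank-count version needs only sorting and binary searches, which illustrates why a reformulation of this kind can matter for large datasets.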
Keyphrases
  • big data
  • healthcare
  • machine learning
  • mental health
  • preterm birth
  • neural network
  • single cell
  • data analysis