Contrast trees and distribution boosting.

Jerome H Friedman
Published in: Proceedings of the National Academy of Sciences of the United States of America (2020)
A method for decision tree induction is presented. Given a set of predictor variables x = (x_1, x_2, ..., x_p) and two outcome variables y and z associated with each x, the goal is to identify those values of x for which the respective distributions of y | x and z | x, or selected properties of those distributions such as means or quantiles, are most different. Contrast trees provide a lack-of-fit measure for statistical models of such statistics, or for the complete conditional distribution p_y(y | x), as a function of x. They are easily interpreted and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method. A corresponding contrast-boosting strategy is described for remedying any uncovered errors, thereby producing potentially more accurate predictions. This leads to a distribution-boosting strategy for directly estimating the full conditional distribution of y at each x under no assumptions concerning its shape, form, or parametric representation.
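The core idea of locating regions of x where y and z disagree can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the split criterion or how the tree is grown, so the code below assumes a single predictor, a single split, and the absolute difference in means as the discrepancy measure. The names `mean_discrepancy`, `best_contrast_split`, and `min_leaf` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def mean_discrepancy(y, z):
    """Absolute difference between the means of y and z in one region.

    Assumed discrepancy measure for this sketch; quantiles or a full
    distribution distance could be substituted.
    """
    return abs(float(np.mean(y)) - float(np.mean(z)))


def best_contrast_split(x, y, z, min_leaf=5):
    """Find the threshold on one predictor x whose more discrepant child
    region has the largest mean discrepancy between y and z.

    Returns (threshold, discrepancy_left, discrepancy_right), or None if
    no split leaves at least min_leaf points on each side.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)

    # Sort all three arrays by x so each candidate split is a prefix/suffix.
    order = np.argsort(x)
    x, y, z = x[order], y[order], z[order]

    best = None
    best_score = -np.inf
    for i in range(min_leaf, len(x) - min_leaf + 1):
        if x[i] == x[i - 1]:
            continue  # no threshold separates equal x values
        d_left = mean_discrepancy(y[:i], z[:i])
        d_right = mean_discrepancy(y[i:], z[i:])
        score = max(d_left, d_right)
        if score > best_score:
            best_score = score
            best = ((x[i - 1] + x[i]) / 2.0, d_left, d_right)
    return best
```

Here z might hold a model's predictions and y the observed outcomes, so a region with a large discrepancy marks values of x where the model fits poorly. A full tree would apply such a search recursively over all predictors.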