Prediction and outlier detection in classification problems.

Leying Guan, Robert Tibshirani
Published in: Journal of the Royal Statistical Society. Series B, Statistical methodology (2022)
We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class and to detect outliers x as often as possible. BCOPS returns no prediction (corresponding to C(x) equal to the empty set) if it infers x to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
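To make the idea of a possibly empty prediction set C(x) concrete, the following is a minimal sketch of class-wise split conformal prediction. It is not the authors' BCOPS procedure: the conformity score used here (negative distance to a class centroid) is a hypothetical stand-in chosen for brevity, whereas the abstract describes a score learned with supervised algorithms and tuned to the out-of-sample distribution. The function names `fit_classwise_conformal` and `prediction_set`, the synthetic data, and the level `alpha` are all assumptions made for illustration.

```python
import numpy as np

def fit_classwise_conformal(X_train, y_train, X_cal, y_cal):
    """Fit one centroid per class on the training split, then store the
    conformity scores of a separate calibration split for each class."""
    model = {}
    for k in np.unique(y_train):
        center = X_train[y_train == k].mean(axis=0)
        # Hypothetical score (an assumption, not the BCOPS score):
        # a larger value means the point looks more like class k.
        cal_scores = -np.linalg.norm(X_cal[y_cal == k] - center, axis=1)
        model[int(k)] = (center, cal_scores)
    return model

def prediction_set(model, x, alpha=0.05):
    """Return the set of labels whose conformal p-value exceeds alpha.
    The set is empty when x looks atypical for every class."""
    labels = set()
    for k, (center, cal_scores) in model.items():
        score = -np.linalg.norm(x - center)
        # Conformal p-value: rank of the new score among calibration scores.
        p_value = (np.sum(cal_scores <= score) + 1) / (len(cal_scores) + 1)
        if p_value > alpha:
            labels.add(k)
    return labels

# Synthetic two-class data: unit-variance Gaussians centred at (0, 0) and (5, 5).
rng = np.random.default_rng(0)

def sample(center, n):
    return rng.normal(loc=center, scale=1.0, size=(n, 2))

X_train = np.vstack([sample(0.0, 100), sample(5.0, 100)])
y_train = np.repeat([0, 1], 100)
X_cal = np.vstack([sample(0.0, 100), sample(5.0, 100)])
y_cal = np.repeat([0, 1], 100)

model = fit_classwise_conformal(X_train, y_train, X_cal, y_cal)
inlier_set = prediction_set(model, np.array([0.0, 0.0]))     # typical of class 0
outlier_set = prediction_set(model, np.array([50.0, 50.0]))  # far from both classes
print(inlier_set, outlier_set)
```

Because each class is calibrated separately on held-out data, a test point drawn from class k is included in its set with probability at least 1 - alpha under exchangeability, regardless of the score used. The choice of score only affects how small the sets are and how often outliers receive the empty set, which is the part BCOPS aims to optimize.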