Understanding metric-related pitfalls in image analysis validation.

Annika Reinke, Minu D Tizabi, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-Nötzel, A Emre Kavur, Tim Rädsch, Carole H Sudre, Laura Ación, Michela Antonelli, Tal Arbel, Spyridon Bakas, Arriel Benis, Florian Buettner, M Jorge Cardoso, Veronika Cheplygina, Jianxu Chen, Evangelia Christodoulou, Beth A Cimini, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, Ben Glocker, Patrick Godau, Daniel A Hashimoto, Michael M Hoffman, Merel Huisman, Fabian Isensee, Pierre Jannin, Charles E Kahn, Dagmar Kainmueller, Bernhard Kainz, Alexandros Karargyris, Jens Kleesiek, Florian Kofler, Thijs Kooi, Annette Kopp-Schneider, Michal Kozubek, Anna Kreshuk, Tahsin Kurc, Bennett A Landman, Geert Litjens, Amin Madani, Klaus Maier-Hein, Anne L Martel, Erik Meijering, Bjoern H Menze, Karel G M Moons, Henning Muller, Brennan Nichyporuk, Felix Nickel, Jens Petersen, Susanne M Rafelski, Nasir M Rajpoot, Mauricio Reyes, Michael A Riegler, Nicola Rieke, Julio Saez-Rodriguez, Clara I Sánchez, Shravya Shetty, Ronald M Summers, Abdel A Taha, Aleksei Tiulpin, Sotirios A Tsaftaris, Ben Van Calster, Gael Varoquaux, Ziv Rafael Yaniv, Paul F Jaeger, Lena Maier-Hein
Published in: Nature Methods (2024)
Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
Keyphrases
  • artificial intelligence
  • healthcare
  • machine learning
  • primary care
  • deep learning
  • big data
  • mental health
  • quality improvement
  • health information
  • clinical practice
  • drug induced