A Unified Approach for Outliers and Influential Data Detection - The Value of Information in Retrospect.

Jacob Parsons, Le Bao
Published in: Stat (International Statistical Institute) (2021)
Identifying influential and outlying data is important because it guides the effective collection of future data and the proper use of existing information. We develop a unified approach for outlier detection and influence analysis. Our proposed method is grounded in intuitive value-of-information concepts and has distinct advantages in interpretability and flexibility over existing methods: it decomposes data influence into a leverage effect (influence that is expected) and an outlying effect (influence beyond what is expected), and it applies to all decision problems, such as estimation, prediction, and hypothesis testing. We study the theoretical properties of three value-of-information quantities, establish the relationship between the proposed measures and classic measures in the linear regression setting, and provide real data analysis examples showing how to apply the new value-of-information approach to linear regression, generalized linear mixed models, and hypothesis testing.
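For readers unfamiliar with the classic linear regression measures the abstract refers to, the sketch below computes three standard ones: leverage (the hat-matrix diagonal), internally studentized residuals, and Cook's distance. It does not implement the paper's value-of-information quantities, which the abstract does not define. The function name `classic_influence` and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def classic_influence(X, y):
    """Classic OLS diagnostics: leverage, studentized residuals, Cook's distance."""
    n, p = X.shape
    # Leverage is the diagonal of the hat matrix H = Q Q^T, with Q from a thin QR of X.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    # Ordinary least squares fit and residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)  # residual variance estimate
    # Internally studentized residuals flag outlying responses.
    student = resid / np.sqrt(s2 * (1.0 - leverage))
    # Cook's distance combines leverage and residual size into one influence score.
    cooks = student**2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks

# Simulated data (hypothetical): one high-leverage point and one outlier.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
x[0], y[0] = 6.0, 13.0      # extreme predictor value, response exactly on the true line
x[1] = 0.0
y[1] = 1.0 + 5.0            # ordinary predictor value, response shifted far off the line
X = np.column_stack([np.ones_like(x), x])

leverage, student, cooks = classic_influence(X, y)
print("highest leverage at index:", int(np.argmax(leverage)))
print("largest |studentized residual| at index:", int(np.argmax(np.abs(student))))
```

In this toy setup the first point has high leverage without being an outlier, and the second is an outlier with ordinary leverage. This is the distinction that the abstract's leverage effect and outlying effect are meant to separate.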
Keyphrases
  • data analysis
  • electronic health record
  • health information
  • big data
  • machine learning
  • artificial intelligence
  • social media
  • label free
  • loop mediated isothermal amplification
  • deep learning
  • neural network