
Identification of outlying observations for large-dimensional data.

Tao Wang, Xiaona Yang, Yunfei Guo, Zhonghua Li
Published in: Journal of Applied Statistics (2021)
This work proposes a two-stage procedure for identifying outlying observations in a large-dimensional data set. In the first stage, an outlier identification measure is defined using a max-normal statistic, and a clean subset containing non-outliers is obtained. Because the identification of outliers can be treated as a multiple hypothesis testing problem, the second stage explores the asymptotic distribution of the proposed measure and derives the threshold for declaring an observation outlying. Furthermore, to improve the identification power and better control the misjudgment rate, a one-step refined algorithm is proposed. Simulation results and two real data analysis examples show that the proposed procedure has clear advantages over other methods in identifying outliers in various data situations.
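To make the shape of such a two-stage procedure concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the exact measure, the asymptotic threshold, or the refinement rule. The sketch assumes that the "max-normal" measure is the largest absolute standardized coordinate of an observation, that the clean subset is the half of the data with the smallest scores, and that a Bonferroni-corrected normal quantile can stand in for the asymptotic threshold. The function names and the parameters `alpha` and `clean_fraction` are hypothetical.

```python
import random
from statistics import NormalDist, mean, median, stdev


def column_stats_robust(rows):
    """Per-coordinate median and scaled MAD (robust center and scale)."""
    cols = list(zip(*rows))
    center = [median(c) for c in cols]
    # 1.4826 * MAD is consistent for the standard deviation under normality.
    scale = [1.4826 * median([abs(x - m) for x in c])
             for c, m in zip(cols, center)]
    return center, scale


def column_stats_classical(rows):
    """Per-coordinate sample mean and sample standard deviation."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [stdev(c) for c in cols]


def max_normal_scores(rows, center, scale):
    """Assumed measure: largest absolute standardized coordinate per row."""
    return [max(abs(x - c) / s for x, c, s in zip(row, center, scale))
            for row in rows]


def identify_outliers(rows, alpha=0.05, clean_fraction=0.5):
    """Return the indices of rows flagged as outlying (illustrative only).

    Assumes continuous data, so that no coordinate has zero spread.
    """
    n, p = len(rows), len(rows[0])

    # Stage 1: score every row with robust estimates and keep the
    # lowest-scoring rows as the clean subset.
    scores = max_normal_scores(rows, *column_stats_robust(rows))
    h = max(2, int(clean_fraction * n))
    clean = sorted(range(n), key=scores.__getitem__)[:h]

    # Stage 2: test each row against a threshold. A Bonferroni bound over
    # all n * p coordinates is an assumption made for this sketch; the
    # paper derives its threshold from an asymptotic distribution instead.
    threshold = NormalDist().inv_cdf(1 - alpha / (2 * n * p))

    flagged = []
    for _ in range(2):  # first pass, then one refinement step
        center, scale = column_stats_classical([rows[i] for i in clean])
        scores = max_normal_scores(rows, center, scale)
        flagged = [i for i in range(n) if scores[i] > threshold]
        # Refinement: re-estimate from every row not flagged so far.
        clean = [i for i in range(n) if scores[i] <= threshold]
    return flagged


if __name__ == "__main__":
    random.seed(1)
    data = [[random.gauss(0, 1) for _ in range(50)] for _ in range(200)]
    for j in range(5):
        data[10][j] += 12.0  # plant one clear outlier in row 10
    print(identify_outliers(data))
```

In this sketch the second pass plays the role of the one-step refinement. Scale estimates taken from the trimmed clean subset tend to be too small, which inflates the scores. Re-estimating from all rows that were not flagged corrects much of that bias before the final decision.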
Keyphrases
  • data analysis
  • electronic health record
  • bioinformatics analysis
  • big data
  • machine learning