
Simulation-Based Performance Evaluation of Missing Data Handling in Network Analysis.

Kai Jannik Nehler, Martin Schultze
Published in: Multivariate Behavioral Research (2024)
Network analysis has gained popularity as an approach to investigate psychological constructs. However, there are currently no guidelines for applied researchers who encounter missing values. In this simulation study, we compared the performance of three approaches: a two-step EM algorithm with separate steps for missing data handling and regularization, a combined direct EM algorithm, and pairwise deletion. We investigated conditions with varying network sizes, numbers of observations, missing data mechanisms, and percentages of missing values. We evaluated the approaches on how well they recover population networks in terms of loss in the precision matrix, edge set identification, and network statistics. The simulation showed adequate performance only in conditions with large samples (n ≥ 500) or small networks (p = 10). Comparing the missing data approaches, the direct EM appears to be more sensitive and superior in nearly all chosen conditions. The two-step EM yields better results when the ratio n/p is very large, where it is less sensitive but more specific. Pairwise deletion failed to converge across numerous conditions and yielded inferior results. In sum, direct EM is recommended in most cases, as it mitigates the impact of missing data quite well, while modifications to two-step EM could improve its performance.
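To make the pairwise deletion baseline concrete, the following is a minimal Python sketch. It is not the authors' simulation code: the function names `pairwise_covariance` and `partial_correlations`, the use of NumPy, the toy data, and the 20% missing-completely-at-random rate are all assumptions for illustration, and regularization is left out entirely. It estimates each covariance from the rows where both variables are observed, then converts a precision matrix into partial correlations, the edge weights of a Gaussian network. A matrix assembled this way is not guaranteed to be positive definite, which is why the sketch checks the smallest eigenvalue before inverting. Whether this property accounts for the convergence failures reported above is not stated in the abstract.

```python
import numpy as np


def pairwise_covariance(x):
    """Covariance matrix from pairwise-complete observations (NaN = missing).

    Each entry uses only the rows where both variables are observed, so
    different entries may rest on different subsets of the sample.
    """
    p = x.shape[1]
    cov = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            xi, xj = x[both, i], x[both, j]
            # Maximum-likelihood style estimate (divides by the pair count).
            cov[i, j] = cov[j, i] = np.mean((xi - xi.mean()) * (xj - xj.mean()))
    return cov


def partial_correlations(precision):
    """Partial correlations from a precision matrix, with a zero diagonal."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    return pcor


# Toy data: n = 200 observations, p = 5 variables (illustrative values only).
rng = np.random.default_rng(0)
n, p = 200, 5
data = rng.standard_normal((n, p))
data[rng.random((n, p)) < 0.2] = np.nan  # hypothetical 20% MCAR missingness

cov = pairwise_covariance(data)
min_eig = np.linalg.eigvalsh(cov).min()

if min_eig > 0:
    # Unregularized network: invert the covariance and rescale.
    pcor = partial_correlations(np.linalg.inv(cov))
    print("largest absolute partial correlation:", np.abs(pcor).max())
else:
    print("pairwise covariance is not positive definite; cannot invert safely")
```

By contrast, an EM-based approach estimates the mean vector and covariance matrix jointly from all observed values under a multivariate normal model, so every entry is based on the same likelihood rather than on a different subset of rows.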
Keyphrases
  • network analysis
  • electronic health record
  • big data
  • machine learning
  • deep learning
  • data analysis
  • high resolution
  • artificial intelligence
  • atomic force microscopy