On kernel machine learning for propensity score estimation under complex confounding structures.
Baiming ZouXinlei MiPatrick J TigheGary G KochFei ZouPublished in: Pharmaceutical statistics (2021)
Post marketing data offer rich information and cost-effective resources for physicians and policy-makers to address some critical scientific questions in clinical practice. However, the complex confounding structures (e.g., nonlinear and nonadditive interactions) embedded in these observational data often pose major analytical challenges for proper analysis to draw valid conclusions. Furthermore, often made available as electronic health records (EHRs), these data are usually massive with hundreds of thousands observational records, which introduce additional computational challenges. In this paper, for comparative effectiveness analysis, we propose a statistically robust yet computationally efficient propensity score (PS) approach to adjust for the complex confounding structures. Specifically, we propose a kernel-based machine learning method for flexibly and robustly PS modeling to obtain valid PS estimation from observational data with complex confounding structures. The estimated propensity score is then used in the second stage analysis to obtain the consistent average treatment effect estimate. An empirical variance estimator based on the bootstrap is adopted. A split-and-merge algorithm is further developed to reduce the computational workload of the proposed method for big data, and to obtain a valid variance estimator of the average treatment effect estimate as a by-product. As shown by extensive numerical studies and an application to postoperative pain EHR data comparative effectiveness analysis, the proposed approach consistently outperforms other competing methods, demonstrating its practical utility.