A comparison of bias-adjusted generalized estimating equations for sparse binary data in small-sample longitudinal studies.

Masahiko Gosho, Ryota Ishii, Hisashi Noma, Kazushi Maruo
Published in: Statistics in Medicine (2023)
Using generalized estimating equations (GEE) can lead to biased regression coefficients when the sample is small or the data are sparse. The bias-corrected GEE (BCGEE) and penalized GEE (PGEE) were proposed to resolve this small-sample bias. In addition, the standard sandwich covariance estimator yields biased standard errors in small samples, and several modified covariance estimators have been proposed to address this issue. We review the modified GEEs and modified covariance estimators and evaluate their performance on sparse binary data from small-sample longitudinal studies. The simulation results showed that GEE and BCGEE often failed to converge, whereas the convergence proportion for PGEE was quite high. The bias of the regression coefficients was generally in the ascending order PGEE < BCGEE < GEE. However, PGEE and BCGEE did not sufficiently remove the bias in settings with 20 to 30 subjects, unequal exposure levels, and a 5% response rate. The coverage probability (CP) of the confidence interval for BCGEE was relatively poor compared with that for GEE and PGEE. The CP with the sandwich covariance estimator deteriorated for all GEE methods under small sample sizes and low response rates, whereas the CP with the modified covariance estimators, such as Morel's method, was relatively acceptable. PGEE is a reasonable approach for analyzing sparse binary data in small-sample studies. Instead of the standard sandwich covariance estimator, one should always apply a modified covariance estimator when analyzing these data.
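To make the convergence and standard-error issues concrete, here is a minimal standard-library Python sketch under stated assumptions. It fits a logistic GEE with an independence working correlation by Fisher scoring, optionally adding a Firth-type penalty term. That penalty is our assumed stand-in for PGEE; the paper's exact PGEE and BCGEE formulations are not reproduced. It then computes the cluster-robust sandwich covariance. The `small_sample` option applies a simple K/(K - p) scaling for illustration only and is not Morel's estimator. All function names and the toy data are hypothetical.

```python
import math

def inv(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv_row = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv_row] = a[piv_row], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def _fitted(X, beta):
    """Fitted probabilities pi_i and binomial weights w_i = pi_i * (1 - pi_i)."""
    pi = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi)))) for xi in X]
    return pi, [q * (1.0 - q) for q in pi]

def _information(X, w):
    """Model-based information A = sum_i w_i x_i x_i' (independence working correlation)."""
    p = len(X[0])
    return [[sum(wi * xi[r] * xi[s] for wi, xi in zip(w, X)) for s in range(p)]
            for r in range(p)]

def fit_logistic_gee(X, y, firth=False, max_iter=50, tol=1e-8):
    """Fisher scoring for a logistic GEE with an independence working correlation.

    firth=True adds the Firth-type term h_i * (1/2 - pi_i) to each residual, where
    h_i is the i-th diagonal of the weighted hat matrix. Returns (beta, converged)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        pi, w = _fitted(X, beta)
        a_inv = inv(_information(X, w))
        resid = [yi - q for yi, q in zip(y, pi)]
        if firth:
            h = [wi * sum(x * v for x, v in zip(xi, matvec(a_inv, xi))) for wi, xi in zip(w, X)]
            resid = [r + hi * (0.5 - q) for r, hi, q in zip(resid, h, pi)]
        score = [sum(xi[r] * ri for xi, ri in zip(X, resid)) for r in range(len(beta))]
        step = matvec(a_inv, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, True
    return beta, False  # no convergence within max_iter

def sandwich(X, y, ids, beta, small_sample=False):
    """Robust covariance A^{-1} B A^{-1}; B sums outer products of per-cluster scores
    built from the unpenalized residuals y_i - pi_i at the supplied beta.
    small_sample=True multiplies by K / (K - p): an illustrative scaling, NOT Morel's method."""
    pi, w = _fitted(X, beta)
    p = len(beta)
    a_inv = inv(_information(X, w))
    scores = {}
    for xi, yi, q, k in zip(X, y, pi, ids):
        u = scores.setdefault(k, [0.0] * p)
        for r in range(p):
            u[r] += xi[r] * (yi - q)
    b = [[sum(u[r] * u[s] for u in scores.values()) for s in range(p)] for r in range(p)]
    v = matmul(matmul(a_inv, b), a_inv)
    if small_sample:
        n_clusters = len(scores)
        v = [[x * n_clusters / (n_clusters - p) for x in row] for row in v]
    return v

# Hypothetical sparse data: 20 subjects x 3 visits, exposure fixed per subject;
# no events among the 10 exposed subjects, 3 events among the 10 unexposed.
ids, X, y = [], [], []
for subj in range(20):
    exposed = 1.0 if subj < 10 else 0.0
    for visit in range(3):
        ids.append(subj)
        X.append([1.0, exposed])
        y.append(1 if (exposed == 0.0 and subj < 13 and visit == 0) else 0)

beta_ml, ok_ml = fit_logistic_gee(X, y)
beta_f, ok_f = fit_logistic_gee(X, y, firth=True)
v_f = sandwich(X, y, ids, beta_f)
v_f_adj = sandwich(X, y, ids, beta_f, small_sample=True)
print(ok_ml, ok_f, [round(b, 3) for b in beta_f])  # prints: False True [-2.061, -2.049]
```

On these toy data, the unpenalized fit does not converge because the exposure coefficient keeps drifting toward minus infinity, while the Firth-type fit converges. With a single binary covariate and an independence working correlation, its estimates equal the log odds obtained by adding 0.5 to each cell of the 2 x 2 table.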