Penalized likelihood methods for modeling count data.

Minh Thu Bui, Cornelis J Potgieter, Akihito Kamata
Published in: Journal of Applied Statistics (2022)
The paper considers parameter estimation in count data models using penalized likelihood methods. The motivating data consist of multiple independent count variables with a moderate sample size per variable. The data were collected during the assessment of oral reading fluency (ORF) in school-aged children. Each student in a sample of fourth-grade students was given one of ten available passages to read, with the passages differing in length and difficulty. The observed number of words read incorrectly (WRI) is used to measure ORF. Three models are considered for WRI scores, namely the binomial, the zero-inflated binomial, and the beta-binomial. We aim to efficiently estimate passage difficulty, a quantity expressed as a function of the underlying model parameters. Two types of penalty functions are considered for penalized likelihood, with the respective goals of shrinking parameter estimates closer to zero or closer to one another. A simulation study evaluates the efficacy of the shrinkage estimates using mean squared error (MSE) as the metric. Large reductions in MSE relative to unpenalized maximum likelihood are observed. The paper concludes with an analysis of the motivating ORF data.
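To make the idea of the two penalty types concrete, the following is a minimal sketch for the simplest of the three models, the binomial. It is not the authors' implementation. The abstract does not give the form of the penalties, so the quadratic penalties on the logit scale, the function name `fit_penalized_binomial`, the tuning parameter `lam`, and the toy counts are all assumptions made for illustration. The "fusion" option pulls the per-passage parameters toward their common mean (closer to one another), while the "ridge" option pulls them toward zero.

```python
import math


def sigmoid(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_penalized_binomial(errors, trials, lam=0.0, penalty="fusion", n_iter=5000):
    """Penalized ML estimates of per-passage error probabilities.

    errors[j]: total words read incorrectly on passage j (hypothetical data)
    trials[j]: total words attempted on passage j
    lam:       penalty weight; lam = 0 gives the unpenalized MLE
    penalty:   "fusion" shrinks logits toward their mean (assumed form),
               "ridge" shrinks logits toward zero (assumed form)

    Maximizes  sum_j [e_j * th_j - t_j * log(1 + exp(th_j))]
               - lam * sum_j (th_j - c)^2,
    where c is the mean of th for "fusion" and 0 for "ridge".
    """
    n_passages = len(errors)
    # Start every passage at the pooled logit (needs 0 < sum(errors) < sum(trials)).
    pooled = sum(errors) / sum(trials)
    theta = [math.log(pooled / (1.0 - pooled))] * n_passages

    for _ in range(n_iter):
        center = sum(theta) / n_passages if penalty == "fusion" else 0.0
        updated = []
        for j in range(n_passages):
            # Gradient of the penalized log-likelihood with respect to theta_j.
            grad = (errors[j] - trials[j] * sigmoid(theta[j])
                    - 2.0 * lam * (theta[j] - center))
            # Upper bound on the curvature: the logistic term contributes at
            # most trials/4, the quadratic penalty at most 2*lam. Dividing by
            # this bound gives a step that never decreases the objective.
            curvature = trials[j] / 4.0 + 2.0 * lam
            updated.append(theta[j] + grad / curvature)
        theta = updated

    return [sigmoid(t) for t in theta]


# Toy counts for three passages (made up for illustration only).
toy_errors = [30, 60, 90]
toy_trials = [1000, 1000, 1000]

mle = fit_penalized_binomial(toy_errors, toy_trials, lam=0.0)
fused = fit_penalized_binomial(toy_errors, toy_trials, lam=50.0, penalty="fusion")
```

With `lam=0.0` the estimates reduce to the raw proportions; increasing `lam` under the "fusion" option moves the passage-level estimates toward each other, which is the mechanism by which shrinkage can trade a small bias for lower variance when the sample size per passage is moderate. In practice `lam` would have to be chosen by some data-driven rule, which this sketch does not attempt.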
Keyphrases
  • electronic health record
  • big data
  • mental health
  • young adults
  • physical activity
  • data analysis
  • single molecule
  • machine learning