Sample size and sample composition for constructing growth reference centiles.

Tim James Cole
Published in: Statistical Methods in Medical Research (2020)
Growth reference centile charts are widely used in child health to assess weight, height and other age-varying measurements. The centiles are easy to construct from reference data, using the LMS method or GAMLSS (Generalised Additive Models for Location, Scale and Shape). However, there is as yet no clear guidance on how to design such studies, and in particular how much reference data to collect, and this has led to study sizes varying widely. The paper aims to provide a theoretical framework for optimally designing growth reference studies based on cross-sectional data. Centiles for weight, height, body mass index and head circumference, in 6878 boys aged 0-21 years from the Fourth Dutch Growth Study, were fitted using GAMLSS. The effect on precision of varying the sample size and the distribution of measurement ages (sample composition) was explored by fitting a series of GAMLSS models to simulated data. Sample composition was defined as uniform on the age^λ scale, where λ was chosen to give constant precision across the age range. Precision was measured on the z-score scale, and was the same for all four measurements, with a standard error of 0.041 z-score units for the median and 0.066 for the 2nd and 98th centiles. Compared to a naïve calculation, the process of smoothing the centiles increased the notional sample size two- to threefold by 'borrowing strength'. The sample composition for estimating the median curve was optimal for λ=0.4, reflecting considerable over-sampling of infants compared to children. However, for the 2nd and 98th centiles, λ=0.75 was optimal, with less infant over-sampling. The conclusion is that both sample size and sample composition need to be optimised. The paper provides practical advice on design, and concludes that optimally designed studies need 7000-25,000 subjects per sex.
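The design rule above (measurement ages drawn uniformly on the age^λ scale) and the LMS z-score transformation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: the 0-21 year age range is taken from the abstract, all function names are hypothetical, and `naive_centile_se` is one textbook "naive" standard error (a centile estimated as mean + z·SD from n Normal values, with no smoothing across age), which may differ from the naive calculation actually used in the paper.

```python
import math
import random
from statistics import NormalDist


def sample_ages(n, lam, age_min=0.0, age_max=21.0, seed=None):
    """Hypothetical helper: draw n ages uniformly on the age**lam scale (lam > 0)."""
    rng = random.Random(seed)
    lo, hi = age_min ** lam, age_max ** lam
    # Uniform on the transformed scale, then back-transform to years.
    # lam < 1 crowds the draws towards young ages (infant over-sampling).
    return [rng.uniform(lo, hi) ** (1.0 / lam) for _ in range(n)]


def fraction_below(age, lam, age_min=0.0, age_max=21.0):
    """Expected share of the sample younger than `age` under the same design."""
    lo, hi = age_min ** lam, age_max ** lam
    return (age ** lam - lo) / (hi - lo)


def lms_zscore(y, L, M, S):
    """Convert measurement y to a z-score given the LMS values at that age."""
    if abs(L) < 1e-12:
        # Limiting case L -> 0 is the log transformation.
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(z, L, M, S):
    """Inverse of lms_zscore: the measurement at z-score z (needs 1 + L*S*z > 0)."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def naive_centile_se(p, n):
    """Assumed naive SE, in z-score units, of the p-th centile estimated as
    mean + z_p * SD from n Normal observations, ignoring smoothing over age."""
    z = NormalDist().inv_cdf(p)
    # Var(mean) = 1/n and Var(SD) ~ 1/(2n) on the z scale, independent.
    return math.sqrt((1.0 + z * z / 2.0) / n)
```

For example, `fraction_below(1, 0.4)` equals 21^-0.4, about 0.30, so roughly 30% of subjects fall in the first year of life, compared with about 10% for λ=0.75 and about 5% (1/21) for uniform sampling in age. This makes concrete why a smaller λ means heavier over-sampling of infants.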