An evaluation of the performance and suitability of R × C methods for ecological inference with known true values.

Carolina Plescia, Lorenzo De Sio
Published in: Quality & Quantity (2017)
Ecological inference refers to the study of individuals using aggregate data, and it is used in a large number of studies; it is well known, however, that inferences about individuals drawn from group data are exposed to the ecological fallacy (Robinson in Am Sociol Rev 15:351-357, 1950). This paper evaluates the accuracy of three methods, each of which estimates all cells of R × C tables simultaneously from aggregate data alone: two recent methods, those of Rosen et al. (Stat Neerl 55:134-156, 2001) and Greiner and Quinn (J R Stat Soc Ser A (Statistics in Society) 172:67-81, 2009), and the long-standing method of Goodman (Am Sociol Rev 18:663-664, 1953; Am J Sociol 64:610-625, 1959). To conduct these tests, we draw on extensive electoral data for which the true quantities of interest are known. In particular, we examine the extent to which the confidence intervals provided by the three methods contain the true values. The paper also provides guidelines on the contexts in which these models are appropriate.
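To make the R × C setting concrete, the following is a minimal sketch of the basic idea behind Goodman's ecological regression, not the authors' implementation. It assumes that every unit shares the same transition matrix and that the data are noise-free; the function name `goodman_regression`, the synthetic data, and the matrix `B_true` are all hypothetical, chosen only for illustration. Each column share is regressed on the row shares without an intercept, so the coefficient in row r and column c estimates the fraction of row group r that falls in column c.

```python
import numpy as np


def goodman_regression(X, Y):
    """Least-squares estimate of an R x C table of row-to-column fractions.

    X: (n_units, R) array of row-group shares per unit (each row sums to 1).
    Y: (n_units, C) array of column shares per unit (each row sums to 1).
    Returns an (R, C) array B such that Y is approximately X @ B.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # No intercept: the row shares already sum to 1 in every unit.
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B


# Synthetic example (assumed data, not from the paper): 2 row groups, 3 columns.
rng = np.random.default_rng(0)
B_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.6, 0.3]])
X = rng.dirichlet(alpha=[2.0, 2.0], size=200)  # row shares in 200 units
Y = X @ B_true                                 # aggregate column shares, no noise
B_hat = goodman_regression(X, Y)
```

With real data the constancy assumption rarely holds exactly, and the least-squares estimates can fall outside the interval [0, 1]. The sketch also produces point estimates only; it does not compute the confidence intervals whose coverage the paper examines.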