
A synthetic population dataset for estimating small area health and socio-economic outcomes in Great Britain.

Guoqiang Wu, Alison Heppenstall, Petra Meier, Robin Purshouse, Nik Lomax
Published in: Scientific Data (2022)
To understand health outcomes for distinct sub-groups of the population or across different geographies, it is advantageous to be able to build bespoke groupings from individual-level data. Individuals possess distinct characteristics, exhibit distinct behaviours and accumulate their own unique history of exposures and experiences. However, in most disciplines, not least public health, little individual-level data is available outside of secure settings, especially data covering large portions of the population. This paper details the creation of a synthetic micro dataset of individuals in Great Britain with detailed attributes that can be used to model a wide range of health and other outcomes. These attributes are constructed from a range of sources, including the United Kingdom Census, survey and administrative datasets. The paper provides a rationale for this synthetic population, discusses methods for creating the dataset and presents example results showing attribute distributions for distinct sub-population groups and across different geographical areas.
Keyphrases
  • public health
  • healthcare
  • mental health
  • wastewater treatment
  • machine learning
  • risk assessment
  • big data
  • cross sectional
  • drinking water
  • health information
  • rna seq
  • human health