This paper describes the creation of the Longitudinal, Intergenerational Family Electronic Micro-Database (LIFE-M), a new data resource linking vital records and decennial censuses for millions of individuals and families living in the United States in the late 19th and 20th centuries. This combination of records provides a life-course and intergenerational perspective on the evolution of health and economic outcomes. Vital records also make it possible to link women, whose surnames typically changed at marriage, because these records contain a crosswalk between women's birth (i.e., "maiden") and married names. We describe (1) the data sources, coverage, and linking sequence; (2) the process and supervised machine-learning methods used to link records longitudinally and across generations; and (3) the resulting linked samples, including linking rates, representativeness, and weights.
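To illustrate why the maiden-to-married name crosswalk matters, here is a minimal Python sketch. It assumes exact string matching on first name and surname, a ±2-year tolerance on birth year, and that a bride takes her husband's surname. The record types and the function `candidate_links` are hypothetical illustrations, not LIFE-M's schema or its supervised machine-learning linking procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical birth or census record: name plus birth year."""
    first: str
    last: str
    birth_year: int


@dataclass(frozen=True)
class MarriageRecord:
    """Hypothetical marriage record carrying the maiden/married name crosswalk."""
    bride_first: str
    bride_maiden: str      # bride's surname at birth
    married_surname: str   # assumption: bride takes the groom's surname
    bride_birth_year: int


def candidate_links(birth, marriages, census, year_tol=2):
    """Return census records linked to a birth record via a marriage record.

    Assumes exact name matches and a +/- year_tol window on birth year;
    a real linkage pipeline would compare names fuzzily and score
    candidate pairs with a trained classifier.
    """
    links = []
    for m in marriages:
        # Step 1: match the birth record to a bride by her birth (maiden) name.
        if (m.bride_first == birth.first
                and m.bride_maiden == birth.last
                and abs(m.bride_birth_year - birth.birth_year) <= year_tol):
            # Step 2: search the later census under her married surname.
            for c in census:
                if (c.first == birth.first
                        and c.last == m.married_surname
                        and abs(c.birth_year - birth.birth_year) <= year_tol):
                    links.append(c)
    return links


birth = Person("Mary", "Smith", 1880)
marriages = [
    MarriageRecord("Mary", "Smith", "Jones", 1880),
    MarriageRecord("Anna", "Smith", "Brown", 1879),
]
census = [
    Person("Mary", "Jones", 1881),
    Person("Mary", "Brown", 1880),
]
print(candidate_links(birth, marriages, census))
# [Person(first='Mary', last='Jones', birth_year=1881)]
```

A census search for "Mary Smith" alone would miss this woman once she is enumerated as Mary Jones; the marriage record supplies the bridge between the two names.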