This paper describes the creation of the Longitudinal, Intergenerational Family Electronic Micro-Database (LIFE-M), a new data resource linking vital records and decennial censuses for millions of individuals and families who lived in the United States in the late 19th and 20th centuries. This combination of records provides a life-course and intergenerational perspective on the evolution of health and economic outcomes. Vital records also enable the linkage of women, because they contain a crosswalk between women's birth (i.e., "maiden") and married names. We describe (1) the data sources, coverage, and linking sequence; (2) the process and supervised machine-learning methods used to link records longitudinally and across generations; and (3) the resulting linked samples, including linking rates, representativeness, and weights.