This paper describes the creation of the Longitudinal, Intergenerational Family Electronic Micro-Database (LIFE-M), a new data resource linking vital records and decennial censuses for millions of individuals and families living in the United States in the late 19th and 20th centuries. This combination of records provides a life-course and intergenerational perspective on the evolution of health and economic outcomes. Vital records also make it possible to link women across records, because they contain a crosswalk between women's birth (i.e., "maiden") and married names. We describe (1) the data sources, coverage, and linking sequence; (2) the process and supervised machine-learning methods used to link records longitudinally and across generations; and (3) the resulting linked samples, including linking rates, representativeness, and weights.