How to make causal inferences using texts.

Naoki Egami, Christian J Fong, Justin Grimmer, Margaret E Roberts, Brandon M Stewart
Published in: Science Advances (2022)
Text-as-data techniques offer great promise: the ability to inductively discover measures that are useful for testing social science theories with large collections of text. Nearly all text-based causal inferences depend on a latent representation of the text, but we show that estimating this latent representation from the data creates underacknowledged risks: we may introduce an identification problem or overfit. To address these risks, we introduce a split-sample workflow for making rigorous causal inferences with discovered measures as treatments or outcomes. We then apply this workflow to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic responsiveness.
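The split-sample idea can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each document is a (treated, text) pair and uses a toy keyword indicator in place of a real latent representation such as a topic model. All function names (split_sample, usage_rate, discover_measure, estimate_effect) are hypothetical. The point is only that the measure is chosen on one half of the data and the effect is estimated on the other half, so the discovery step cannot overfit to the documents used for estimation.

```python
import random


def split_sample(docs, train_fraction=0.5, seed=0):
    """Randomly partition documents into a discovery set and an estimation set."""
    rng = random.Random(seed)
    shuffled = list(docs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def usage_rate(docs, word, treated):
    """Share of documents in one treatment arm whose text contains `word`."""
    group = [text for t, text in docs if t == treated]
    if not group:
        raise ValueError("a treatment arm is empty in this split")
    return sum(word in text.split() for text in group) / len(group)


def discover_measure(train_docs):
    """Toy discovery step (an assumption, standing in for a text model):
    pick the word whose usage rate differs most between the two arms."""
    vocabulary = sorted({w for _, text in train_docs for w in text.split()})
    return max(
        vocabulary,
        key=lambda w: abs(usage_rate(train_docs, w, 1) - usage_rate(train_docs, w, 0)),
    )


def estimate_effect(test_docs, word):
    """Difference in means of the already-fixed measure on held-out documents."""
    return usage_rate(test_docs, word, 1) - usage_rate(test_docs, word, 0)


if __name__ == "__main__":
    # Synthetic data for illustration only.
    docs = [(1, "people discuss border")] * 20 + [(0, "people discuss")] * 20
    train, test = split_sample(docs, seed=1)
    measure = discover_measure(train)      # chosen using the discovery set only
    print(measure, estimate_effect(test, measure))
```

The design choice worth noting is that estimate_effect never sees the discovery set, and discover_measure never sees the estimation set. In a real analysis the discovery step would fit a text model and the estimation step would apply the fitted model to the held-out documents.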