Integrating topic modeling and word embedding to characterize violent deaths.

Alina Arseniev-Koehler, Susan D Cochran, Vickie M Mays, Kai-Wei Chang, Jacob G Foster
Published in: Proceedings of the National Academy of Sciences of the United States of America (2022)
Significance
We introduce an approach to identify latent topics in large-scale text data. Our approach integrates two prominent methods of computational text analysis: topic modeling and word embedding. We apply our approach to written narratives of violent death (e.g., suicides and homicides) in the National Violent Death Reporting System (NVDRS). Many of our topics reveal aspects of violent death not captured in existing classification schemes. We also extract gender bias in the topics themselves (e.g., a topic about long guns is particularly masculine). Our findings suggest new lines of research that could contribute to reducing suicides or homicides. Our methods are broadly applicable to text data and can unlock similar information in other administrative databases.
Keyphrases
  • smoking cessation
  • big data
  • electronic health record
  • machine learning
  • mental health
  • genome wide
  • data analysis
  • healthcare
  • gene expression
  • emergency department
  • health information
  • dna methylation