Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning.
Yang Liu, Christopher Whitfield, Tianyang Zhang, Amanda Hauser, Taeyonn Reynolds, Mohd Anwar
Published in: Health information science and systems (2021)
Our findings show that Reddit data can be used effectively to monitor the COVID-19 pandemic in North Carolina (NC). The study demonstrates the utility of NLP methods (e.g., cosine similarity, Latent Dirichlet Allocation (LDA) topic modeling, custom named entity recognition (NER), and BERT-based sentence clustering) for discovering how the public's concerns and behaviors changed over the course of the pandemic in NC. Moreover, the results show that social media data can be used to surveil the epidemic situation in a specific community.
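As a rough illustration of one of the methods named above, cosine similarity can compare the vocabulary of posts from two time periods: a score near 1 suggests similar wording, and a score near 0 suggests the discussion has shifted. The sketch below is a minimal example under stated assumptions, not the study's pipeline. The helper names (`term_counts`, `cosine_similarity`) and the two sample strings are hypothetical, and the whitespace tokenization is a simplification chosen for brevity.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for the pooled text of two periods (not study data).
early_posts = "masks testing lockdown testing"
later_posts = "reopening masks school testing"

similarity = cosine_similarity(term_counts(early_posts), term_counts(later_posts))
print(f"similarity between periods: {similarity:.3f}")
```

In practice the term vectors would more likely be TF-IDF weights or embeddings than raw counts, but the similarity computation itself is unchanged.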