Pharmacovigilance with Transformers: A Framework to Detect Adverse Drug Reactions Using BERT Fine-Tuned with FARM.

Sajid Hussain, Hammad Afzal, Ramsha Saeed, Naima Iltaf, Mir Yasir Umair
Published in: Computational and Mathematical Methods in Medicine (2021)
Adverse drug reactions (ADRs) are undesirable effects associated with the use of a drug that arise from its pharmacological action. In recent years, social media has become a popular platform where people discuss their health problems and, as a result, a popular source of ADR-related information expressed in natural language. This paper presents an end-to-end system for ADR detection from text, built by fine-tuning BERT with the highly modular Framework for Adapting Representation Models (FARM). BERT has outperformed the previously predominant neural network models, bringing remarkable performance gains. However, training BERT is computationally expensive, which limits its use in production environments and makes it difficult to determine the most important hyperparameters for the downstream task. Furthermore, developing an end-to-end ADR extraction system comprising two downstream tasks, i.e., text classification to filter text containing ADRs and extraction of ADR mentions from the classified text, is also challenging. The framework used in this work, FARM-BERT, supports multitask learning by combining multiple prediction heads, which makes training end-to-end systems easier and computationally faster. In the proposed model, one prediction head is used for text classification and the other for ADR sequence labeling. Experiments are performed on the Twitter, PubMed, TwiMed-Twitter, and TwiMed-PubMed datasets. The proposed model is compared with baseline models and state-of-the-art techniques and is shown to yield better results on the given task, with F-scores of 89.6%, 97.6%, 84.9%, and 95.9% on the Twitter, PubMed, TwiMed-Twitter, and TwiMed-PubMed datasets, respectively. Moreover, the training and testing times of the proposed model are compared with those of BERT, and the proposed model is shown to be computationally faster.
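
As an illustration of how the outputs of the two prediction heads could be combined, the following minimal Python sketch gates mention extraction on the classification result and then decodes token-level tags into ADR mentions. It is not the authors' implementation: the function name `extract_adr_mentions`, the `contains_adr` flag, and the BIO tag names `B-ADR`, `I-ADR`, and `O` are assumptions made for this example, since the abstract does not specify a tagging scheme, and the model itself is replaced by hand-written predictions.

```python
from typing import List


def extract_adr_mentions(tokens: List[str], tags: List[str], contains_adr: bool) -> List[str]:
    """Decode ADR mentions from per-token tags, gated by the classification result.

    tokens       -- the words of one text (e.g. a tweet)
    tags         -- one tag per token; the BIO scheme here is an assumed convention
    contains_adr -- hypothetical output of the text-classification head
    """
    # Stage 1: texts the classifier marks as ADR-free are filtered out.
    if not contains_adr:
        return []
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    # Stage 2: group consecutive B-ADR / I-ADR tokens into mention strings.
    mentions: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ADR":
            # A new mention starts; close any mention still open.
            if current:
                mentions.append(" ".join(current))
            current = [token]
        elif tag == "I-ADR" and current:
            # Continue the mention that is currently open.
            current.append(token)
        else:
            # "O" (or a stray I-ADR with no open mention) ends the current mention.
            if current:
                mentions.append(" ".join(current))
                current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


# Hand-written stand-ins for the predictions of the two heads.
tokens = ["this", "drug", "gave", "me", "a", "severe", "headache", "and", "nausea"]
tags = ["O", "O", "O", "O", "O", "B-ADR", "I-ADR", "O", "B-ADR"]
print(extract_adr_mentions(tokens, tags, contains_adr=True))
```

In a multitask setup such as the one described, both the flag and the tags would come from heads that share a single encoder, so one forward pass could serve both stages.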