Large Language Models and Genomics for Summarizing the Role of microRNA in Regulating mRNA Expression.
Balu Bhasuran, Sharanya Manoharan, Oviya Ramalakshmi Iyyappan, Gurusamy Murugesan, Archana Prabahar, Kalpana Raja
Published in: Biomedicines (2024)
microRNA (miRNA)-messenger RNA (mRNA or gene) interactions are pivotal in various biological processes, including the regulation of gene expression, cellular differentiation, proliferation, apoptosis, and development, as well as the maintenance of cellular homeostasis and the pathogenesis of numerous diseases, such as cancer, cardiovascular diseases, neurological disorders, and metabolic conditions. Understanding the mechanisms of miRNA-mRNA interactions can provide insights into disease mechanisms and potential therapeutic targets. However, extracting these interactions efficiently from the large collection of published articles in PubMed is challenging. In the current study, we annotated a miRNA-mRNA Interaction Corpus (MMIC) and used it to evaluate the performance of a variety of machine learning (ML) models, deep learning-based transformer (DLT) models, and large language models (LLMs) in extracting the miRNA-mRNA interactions mentioned in PubMed. We used genomics approaches to validate the extracted miRNA-mRNA interactions. Among the ML, DLT, and LLM models, PubMedBERT showed the highest precision, recall, and F-score, each equal to 0.783. Among the LLMs, Llama 2 performed best, achieving 0.56 precision, 0.86 recall, and 0.68 F-score in a zero-shot experiment and 0.56 precision, 0.87 recall, and 0.68 F-score in a three-shot experiment. Our study shows that Llama 2 achieves better recall than the ML and DLT models but leaves room for improvement in precision and F-score.
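The reported scores can be cross-checked with a short calculation. The sketch below assumes that "F-score" means the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state this explicitly. The helper name `f_score` is hypothetical and is not taken from the study.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 (beta = 1).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall values as reported in the abstract
llama2_zero_shot = f_score(0.56, 0.86)
llama2_three_shot = f_score(0.56, 0.87)
pubmedbert = f_score(0.783, 0.783)

print(round(llama2_zero_shot, 2))
print(round(llama2_three_shot, 2))
print(round(pubmedbert, 3))
```

Under this assumption, both Llama 2 settings round to the reported 0.68, and equal precision and recall of 0.783 give an F-score of 0.783, which matches the PubMedBERT figures.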
Keyphrases
- gene expression
- machine learning
- deep learning
- cardiovascular disease
- binding protein
- type diabetes
- oxidative stress
- autism spectrum disorder
- signaling pathway
- systematic review
- randomized controlled trial
- DNA methylation
- cell death
- single cell
- risk assessment
- genome wide
- brain injury
- subarachnoid hemorrhage
- big data