PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package.
Ashutosh KumarAakanksha SharaffPublished in: SN computer science (2023)
Biomedical article extraction is the preliminary step for every biomedical application. These applications are helpful in finding the gene, disease, chemical, drugs, protein entities. Finding entities relation such as gene-gene entities, drug-disease interaction, and chemical protein relation the PubExN can be helpful for these types of biomedical applications. In most cases, domain experts do this extraction process on their own. Human interference makes this process time-consuming and there is a high probability, that documents can be missed during the extraction process. To get rid of these complicated processes a python package is introduced to automate the process of bulk extraction from the PubMed database. The extraction process covers all the citation information with the associated abstract. The batch approach is used to extract the bulk extraction. The motivation for the development of PubExN was to provide flexibility for the extraction process of biomedical article's text data from NCBI's PubMed database. Basically, NCBI's PubMed database article contains the article id or can say PubMed-id (PMID), the title of the article, abstract, authors information, etc. This package will benefit many biomedical texts mining research including biomedical named entity recognition, biomedical relation extraction, literature discovery, knowledgebase creation, and various biomedical Natural Language Processing (NLP) tasks. In addition, it could be used in the author name disambiguation problems and new drug discoveries. This package will help save time and extra effort for the extraction and normalization process of PubMed articles.
Keyphrases