Taxonize-gb: A tool for filtering GenBank non-redundant databases based on taxonomy.
Mohamed S Sarhan, Michele Filosi, Frank Maixner, Christian Fuchsberger
Published in: bioRxiv : the preprint server for biology (2024)
Characterizing taxonomic diversity and identifying taxa in diverse ecological samples has become a crucial routine in various research and industrial fields. While DNA-barcoding marker-gene approaches were once prevalent, the decreasing costs of next-generation sequencing have made metagenomic shotgun sequencing more popular and feasible. In contrast to DNA barcoding, metagenomic shotgun sequencing offers possibilities for in-depth characterization of structural and functional diversity. However, analysis of such data is still considered a hurdle due to the absence of taxa-specific databases. Here we present taxonize-gb, a command-line software tool that extracts the entries of the GenBank non-redundant nucleotide and protein databases related to one or more input taxonomy identifiers. Our tool allows the creation of taxa-specific reference databases tailored to specific research questions, which reduces search times and therefore represents a practical solution for researchers who analyze large metagenomic datasets on a regular basis. Taxonize-gb is an open-source, Python-based command-line tool, freely available for installation from PyPI at https://pypi.org/project/taxonize-gb/ and on GitHub at https://github.com/msabrysarhan/taxonize_genbank. It is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
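The general idea of taxonomy-based filtering can be illustrated with a minimal sketch. It is not taxonize-gb's actual implementation or interface: the function names `descendant_taxids` and `filter_fasta`, the accessions, and the simplified toy taxonomy are all assumptions made for illustration. The sketch assumes a child-to-parent taxonomy map and an accession-to-taxid map are already loaded in memory. It first collects every taxid below the requested one, then keeps only the FASTA records whose accession maps to one of those taxids.

```python
from collections import defaultdict, deque


def descendant_taxids(parent_of, roots):
    """Return the root taxids plus every taxid below them in the tree."""
    # Invert the child -> parent map into parent -> children.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if child != parent:  # the top node is listed as its own parent
            children[parent].append(child)

    # Breadth-first walk down from the requested taxids.
    selected = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


def filter_fasta(lines, acc_to_taxid, wanted):
    """Yield FASTA lines of records whose accession maps to a wanted taxid."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Assumption: the accession is the first token of the header.
            accession = line[1:].split()[0]
            keep = acc_to_taxid.get(accession) in wanted
        if keep:
            yield line


# Toy, heavily simplified taxonomy (child -> parent); accessions are made up.
parent_of = {1: 1, 2: 1, 1224: 2, 562: 1224, 2759: 1, 9606: 2759}
acc_to_taxid = {"ACC_A": 562, "ACC_B": 9606, "ACC_C": 1224}
fasta = [
    ">ACC_A toy bacterial record",
    "ATGC",
    ">ACC_B toy human record",
    "GGCC",
    ">ACC_C toy bacterial record",
    "TTAA",
]

wanted = descendant_taxids(parent_of, [2])  # taxid 2 = Bacteria
filtered = list(filter_fasta(fasta, acc_to_taxid, wanted))
```

With real data, the taxonomy tree and the accession-to-taxid mapping are far too large for plain in-memory lists like these, so a practical tool would need to stream or index them. The sketch only shows the selection logic.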