Taxonize-gb: A tool for filtering GenBank non-redundant databases based on taxonomy.
Mohamed S Sarhan, Michele Filosi, Frank Maixner, Christian Fuchsberger
Published in: bioRxiv: the preprint server for biology (2024)
Identifying taxa and analyzing taxonomic diversity in diverse ecological samples has become a crucial routine in various research and industrial fields. While DNA-barcoding marker-gene approaches were once prevalent, the decreasing costs of next-generation sequencing have made metagenomic shotgun sequencing more popular and feasible. In contrast to DNA barcoding, metagenomic shotgun sequencing offers possibilities for in-depth characterization of structural and functional diversity. However, analysis of such data is still considered a hurdle due to the absence of taxa-specific databases. Here we present taxonize-gb, a command-line software tool that extracts the subsets of the GenBank non-redundant nucleotide and protein databases related to one or more input taxonomy identifiers. Our tool allows the creation of taxa-specific reference databases tailored to specific research questions, which reduces search times and therefore represents a practical solution for researchers analyzing large metagenomic data on a regular basis. Taxonize-gb is an open-source, Python-based command-line tool freely available for installation at https://pypi.org/project/taxonize-gb/ and on GitHub at https://github.com/msabrysarhan/taxonize_genbank. It is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
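The general idea of taxonomy-based filtering can be illustrated with a minimal Python sketch. This is not the taxonize-gb implementation or its interface: the function names, the toy taxonomy (a simplified lineage), and the accession labels are all hypothetical, chosen only to show how a set of input taxonomy identifiers could be expanded to their descendants and then used to select sequence records.

```python
from collections import defaultdict, deque


def descendant_taxids(parent_of, roots):
    """Return the root taxids plus every taxid below them in the tree.

    parent_of maps child taxid -> parent taxid (hypothetical in-memory
    form of a taxonomy table; the root points to itself).
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if child != parent:  # skip the self-referencing root entry
            children[parent].append(child)

    selected = set()
    queue = deque(roots)
    while queue:  # breadth-first walk down the tree
        taxid = queue.popleft()
        if taxid in selected:
            continue
        selected.add(taxid)
        queue.extend(children[taxid])
    return selected


def filter_records(records, acc_to_taxid, wanted):
    """Yield (accession, sequence) pairs whose taxid is in `wanted`."""
    for accession, sequence in records:
        if acc_to_taxid.get(accession) in wanted:
            yield accession, sequence


# Toy data (assumed for illustration only; lineage is simplified).
parent_of = {1: 1, 2: 1, 1224: 2, 561: 1224, 562: 561, 2759: 1, 9606: 2759}
acc_to_taxid = {"ACC_A": 562, "ACC_B": 9606, "ACC_C": 1224}
records = [("ACC_A", "ATGC"), ("ACC_B", "GGCC"), ("ACC_C", "TTAA")]

wanted = descendant_taxids(parent_of, [2])  # everything under taxid 2
kept = list(filter_records(records, acc_to_taxid, wanted))
print(kept)
```

In this toy example, only the records assigned to taxid 2 or one of its descendants are retained. A real database of GenBank scale would need streaming input and a more compact accession-to-taxid lookup than an in-memory dictionary.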
Keyphrases
- big data
- circulating tumor
- antibiotic resistance genes
- electronic health record
- single molecule
- copy number
- single cell
- artificial intelligence
- wastewater treatment
- magnetic resonance
- quality improvement
- computed tomography
- data analysis
- dna methylation
- nucleic acid
- optical coherence tomography
- clinical practice
- risk assessment
- transcription factor
- microbial community
- magnetic resonance imaging
- amino acid