Hound: a novel tool for automated mapping of genotype to phenotype in bacterial genomes assembled de novo.
Carlos Reding, Naphat Satapoomin, Matthew B Avison. Published in: Briefings in Bioinformatics (2024)
Increasing evidence suggests that microbial species show strong within-species genetic heterogeneity. This can be problematic for the analysis of prokaryote genomes, which commonly relies on a reference genome to guide the assembly process. Differences between the reference and sample genomes therefore introduce errors into the final assembly. These errors jeopardize the detection of variants ranging from structural variations to point mutations, and such detection is critical for genomic surveillance of antibiotic resistance. Here we present Hound, a pipeline that integrates publicly available tools to assemble prokaryote genomes de novo and to detect user-supplied genes by sequence similarity. For each gene it reports mutations in the coding sequence and promoter, as well as the gene's relative copy number within the assembly. Importantly, Hound can use the query sequence as a guide to merge contigs and reconstruct genes that were fragmented by the assembler. To showcase Hound, we screened 5032 whole-genome sequences of bacteria isolated from farmed animals and human infections. We used the amino acid sequence encoded by blaTEM-1 to detect and predict resistance to amoxicillin/clavulanate, which is driven by over-expression of this gene. We believe this tool can facilitate the analysis of prokaryote species that currently lack a reference genome. It can also be scaled either up, to build automated systems for genomic surveillance, or down, to integrate into point-of-care antibiotic susceptibility diagnostics.
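The abstract mentions reporting a gene's relative copy number within a de novo assembly but does not say how Hound computes it. The following is only a minimal sketch of one common way to estimate it, not Hound's actual method. It assumes SPAdes-style contig headers (`NODE_<n>_length_<len>_cov_<cov>`). It divides the coverage of the contig carrying the gene by the length-weighted median coverage of all contigs, which is used here as a proxy for single-copy chromosomal depth. The function names `parse_header`, `weighted_median_coverage` and `relative_copy_number` are hypothetical.

```python
import re

# Assumed SPAdes-style contig header, e.g. "NODE_1_length_5000_cov_20.5".
HEADER_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_header(header: str) -> tuple[int, float]:
    """Return (contig length, coverage) parsed from a SPAdes-style header."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"unrecognised contig header: {header!r}")
    return int(match.group(2)), float(match.group(3))


def weighted_median_coverage(contigs: list[tuple[int, float]]) -> float:
    """Coverage at which half of all assembled bases are reached,
    walking contigs in order of increasing coverage."""
    ordered = sorted(contigs, key=lambda contig: contig[1])
    total = sum(length for length, _ in ordered)
    running = 0
    for length, coverage in ordered:
        running += length
        if running * 2 >= total:
            return coverage
    raise ValueError("no contigs supplied")


def relative_copy_number(gene_header: str, all_headers: list[str]) -> float:
    """Coverage of the gene-bearing contig relative to the assembly baseline.

    Hypothetical helper: assumes the gene sits on a single contig and that
    contig coverage scales with copy number.
    """
    baseline = weighted_median_coverage([parse_header(h) for h in all_headers])
    if baseline == 0:
        raise ValueError("baseline coverage is zero")
    _, gene_coverage = parse_header(gene_header)
    return gene_coverage / baseline


if __name__ == "__main__":
    headers = [
        "NODE_1_length_900000_cov_20.0",  # large chromosomal contig
        "NODE_2_length_600000_cov_22.0",  # large chromosomal contig
        "NODE_3_length_5000_cov_80.0",    # small contig carrying the query gene
    ]
    print(relative_copy_number("NODE_3_length_5000_cov_80.0", headers))
```

Weighting the median by contig length keeps small, high-coverage contigs (such as multicopy plasmids) from shifting the baseline. With the toy headers above, the baseline is 20.0, so the gene-bearing contig reports a relative copy number of 4.0. A gene that the assembler split across several contigs would need its fragments merged first. The abstract says Hound does this merging by using the query sequence as a guide.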