Hound: a novel tool for automated mapping of genotype to phenotype in bacterial genomes assembled de novo.
Carlos Reding, Naphat Satapoomin, Matthew B Avison
Published in: Briefings in Bioinformatics (2024)
Increasing evidence suggests that microbial species show strong within-species genetic heterogeneity. This can be problematic for the analysis of prokaryote genomes, which commonly relies on a reference genome to guide the assembly process. Differences between reference and sample genomes will therefore introduce errors in the final assembly, jeopardizing the detection of variants ranging from structural variations to point mutations, which is critical for genomic surveillance of antibiotic resistance. Here we present Hound, a pipeline that integrates publicly available tools to assemble prokaryote genomes de novo, detect user-given genes by similarity, and report mutations found in the coding sequence and promoter, as well as relative gene copy number within the assembly. Importantly, Hound can use the query sequence as a guide to merge contigs and reconstruct genes that were fragmented by the assembler. To showcase Hound, we screened 5032 bacterial whole-genome sequences isolated from farmed animals and human infections, using the amino acid sequence encoded by blaTEM-1, to detect and predict resistance to amoxicillin/clavulanate, which is driven by over-expression of this gene. We believe this tool can facilitate the analysis of prokaryote species that currently lack a reference genome, and can be scaled either up to build automated systems for genomic surveillance or down to integrate into antibiotic susceptibility point-of-care diagnostics.
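The abstract does not say how Hound computes relative gene copy number. As a rough illustration only, one common approach is to compare read depth over the gene with the typical depth across the whole assembly. The sketch below assumes per-base depth values are already available as plain lists; the function name `relative_copy_number` and the toy data are hypothetical and are not taken from Hound.

```python
from statistics import median


def relative_copy_number(gene_depths, assembly_depths):
    """Estimate copy number as gene depth relative to assembly-wide depth.

    Assumption (not from the paper): copy number is approximated by the
    ratio of the gene's median per-base read depth to the median per-base
    read depth across the whole assembly.
    """
    if not gene_depths or not assembly_depths:
        raise ValueError("depth lists must be non-empty")
    baseline = median(assembly_depths)
    if baseline == 0:
        raise ValueError("assembly baseline depth is zero")
    return median(gene_depths) / baseline


# Toy data: most of the assembly is covered at 30x, while a 50-base
# gene region is covered at 90x.
assembly = [30] * 1000 + [90] * 50
gene = [90] * 50
print(relative_copy_number(gene, assembly))
```

Medians are used here rather than means so that a few highly covered regions, such as the amplified gene itself, do not inflate the baseline.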
Keyphrases
- copy number
- genome wide
- dna methylation
- mitochondrial dna
- amino acid
- public health
- genetic diversity
- deep learning
- machine learning
- high throughput
- gene expression
- poor prognosis
- endothelial cells
- high resolution
- microbial community
- transcription factor
- single cell
- genome wide identification
- emergency department
- pluripotent stem cells
- long non coding rna
- bioinformatics analysis
- real time pcr
- genome wide analysis