
Sensitive and error-tolerant annotation of protein-coding DNA with BATH.

Genevieve R Krause, Walt Shands, Travis J Wheeler
Published in: bioRxiv : the preprint server for biology (2024)
We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the annotation workflow for pHMM-based annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and produces superior accuracy to all tested tools for annotation of sequences containing nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is true with long-read sequencing data and in the context of pseudogenes.