A large-scale assessment of sequence database search tools for homology-based protein function prediction.

Chengxin Zhang, Peter L Freddolino
Published in: bioRxiv: the preprint server for biology (2023)
Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tool and configure its parameters to achieve the best function prediction. In this paper, we evaluate how the choice among popular search tools, and the choice of search parameters, affects protein function prediction. When predicting GO terms on a large benchmark dataset, we found that, under default search parameters, BLASTp and MMseqs2 consistently outperform other tools, including DIAMOND, one of the most popular tools for function prediction. However, with appropriate parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. This study emphasizes the critical role of search parameter settings in homology-based function transfer.
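To make "homology-based function transfer" concrete, here is a minimal Python sketch of one common way to score GO terms from search hits: each term is scored by the bit-score-weighted fraction of hits that carry it. This is an illustrative assumption, not necessarily the scoring used in the paper. The function name `transfer_go_terms`, the input layout, and the example identifiers are all hypothetical, and parsing the output of BLASTp, MMseqs2, or DIAMOND is left out.

```python
from collections import defaultdict


def transfer_go_terms(hits, annotations):
    """Score GO terms for one query protein from its database hits.

    hits        -- list of (subject_id, bit_score) pairs from a sequence search
    annotations -- dict mapping subject_id to a set of GO term IDs
    Returns a dict mapping GO term -> score in [0, 1], where the score is the
    share of total bit score contributed by hits annotated with that term.
    (Hypothetical scoring scheme, chosen for illustration only.)
    """
    total = sum(score for _, score in hits)
    if total <= 0:
        return {}  # no usable hits, so nothing can be transferred

    weighted = defaultdict(float)
    for subject_id, score in hits:
        # Hits without known annotations contribute to the total only.
        for term in annotations.get(subject_id, ()):
            weighted[term] += score

    return {term: weight / total for term, weight in weighted.items()}


# Toy example with placeholder protein IDs and bit scores.
hits = [("P1", 200.0), ("P2", 100.0), ("P3", 100.0)]
annotations = {
    "P1": {"GO:0003677", "GO:0005634"},
    "P2": {"GO:0003677"},
    "P3": {"GO:0005737"},
}
scores = transfer_go_terms(hits, annotations)
```

Because the scores depend entirely on which hits the search returns and on their bit scores, changing the search tool or its parameters (for example, sensitivity mode or E-value cutoff) changes the predictions. That dependence is what the study measures.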
Keyphrases
  • gene expression
  • emergency department
  • binding protein
  • dna methylation
  • protein protein
  • genome wide
  • genome wide identification