Developing robust protein analysis profiles to identify bacterial acid phosphatases in genomes and metagenomic libraries.
Zulema Udaondo, Estrella Duque, Abdelali Daddaoua, Carlos Caselles, Amalia Roca, Paloma Pizarro-Tobias, Juan Luis Ramos
Published in: Environmental Microbiology (2020)
Phylogenetic analysis of more than 4000 annotated bacterial acid phosphatases was carried out. This analysis enabled us to sort these enzymes into three types: (1) class B acid phosphatases, which are distantly related to the other two types, (2) class C acid phosphatases and (3) generic acid phosphatases (GAP). Whereas class B phosphatases are found in a limited number of bacterial families, which include known pathogens, class C acid phosphatases and GAP proteins are found in a wide variety of microbes that inhabit soil, freshwater and marine environments. As part of our analysis, we developed three profiles, named Pfr-B-Phos, Pfr-C-Phos and Pfr-GAP, to describe the three groups of acid phosphatases. These sequence-based profiles were then used to scan genomes and metagenomes, identifying a large number of previously unknown acid phosphatases. The profiles also identified, as putative acid phosphatases, a number of database proteins annotated only as hypothetical proteins. To validate these in silico results, we obtained genes encoding candidate acid phosphatases in three ways: by cloning them from genomic DNA, by recovering them from metagenomic libraries, or by synthesizing them in vitro based on protein sequences recovered from metagenomic data. Expression of a number of these genes, followed by enzymatic analysis of the encoded proteins, confirmed that sequence similarity searches using our profiles can successfully identify previously unknown acid phosphatases.
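The abstract does not say how the Pfr-B-Phos, Pfr-C-Phos and Pfr-GAP profiles are built or scored. As a rough illustration only, the sketch below shows the general idea of scanning proteins with a sequence profile using a position-specific scoring matrix (PSSM), a simpler relative of the profile HMMs commonly used for this kind of search. The functions `build_pssm` and `scan`, the toy aligned sequences, the uniform background frequency and the score threshold are all hypothetical assumptions; the toy sequences are not real phosphatase motifs.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free aligned sequences.

    Hypothetical simplification: assumes a uniform background of 1/20 per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned)
        # Pseudocounts keep unseen residues from scoring minus infinity.
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(pssm, protein, threshold):
    """Slide the PSSM along a protein; return (start, score) for windows scoring >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        # Non-standard residues (e.g. X) get a fixed, hypothetical penalty of -4.0.
        score = sum(pssm[i].get(aa, -4.0) for i, aa in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


# Toy "family" alignment and query proteins (hypothetical, for illustration only).
toy_alignment = ["HGDAWR", "HGDSWR", "HGEAWR"]
profile = build_pssm(toy_alignment)
candidate_hits = scan(profile, "MKTLHGDAWRQLP", threshold=6.0)  # one hit starting at index 4
background_hits = scan(profile, "MKTLPQLPSTV", threshold=6.0)   # no hits
```

A fixed-width PSSM cannot model insertions or deletions, which is one reason dedicated tools based on profile HMMs (for example HMMER) are generally preferred for genome- and metagenome-scale searches; the threshold here is arbitrary, whereas such tools report statistical significance for each hit.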