Assessment of selection pressure exerted on genes from complete pangenomes helps to improve the accuracy in the prediction of new genes.
Alejandro Rubio, Juan Jimenez, Antonio J Pérez-Pulido
Published in: Briefings in Bioinformatics (2022)
Bacterial genomes are being sequenced on a massive scale, providing valuable data for characterizing the complete set of genes of a species. Analysing thousands of bacterial strains can identify both shared genes and those appearing only in pathogenic strains. Current computational gene finders facilitate this task but often miss some existing genes. However, the availability of many genomes from the same species makes it possible to estimate the selective pressure acting on the genes of complete pangenomes. This estimate may assist in evaluating gene predictions, either by checking the certainty of a new gene or by annotating it as a gene under positive selection. Here, we estimated the selective pressure on 19 271 genes that are part of the pangenome of the human opportunistic pathogen Acinetobacter baumannii and found that most genes in this bacterium are subject to negative selection. However, 23% of them showed values compatible with positive selection. The latter were mainly uncharacterized proteins or genes required to evade the host defence system, including genes related to resistance and virulence, whose changes may be favoured to acquire new functions. Finally, we evaluated the utility of measuring selection pressure for detecting sequencing errors and validating gene predictions.
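To make the idea of classifying genes by selective pressure concrete, the sketch below labels genes as under negative or positive selection. It assumes the pressure is summarized as the ratio of nonsynonymous to synonymous substitution rates (dN/dS), with values above 1 read as positive selection and values below 1 as negative selection. The abstract does not name the metric or any threshold, so the ratio, the cutoff of 1, the function names, and the example gene identifiers and rates are all illustrative assumptions rather than the authors' method or data.

```python
from typing import Dict, Optional, Tuple


def omega(dn: float, ds: float) -> Optional[float]:
    """Return the dN/dS ratio, or None when dS is zero (ratio undefined)."""
    if ds <= 0:
        return None
    return dn / ds


def classify(dn: float, ds: float) -> str:
    """Label a gene by comparing its dN/dS ratio with 1 (assumed cutoff)."""
    w = omega(dn, ds)
    if w is None:
        return "undetermined"
    if w > 1:
        return "positive"
    if w < 1:
        return "negative"
    return "neutral"


def fraction_positive(genes: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of genes labelled 'positive'; 0.0 for an empty input."""
    if not genes:
        return 0.0
    labels = [classify(dn, ds) for dn, ds in genes.values()]
    return labels.count("positive") / len(labels)


# Hypothetical gene identifiers with made-up (dN, dS) pairs, for illustration only.
example_genes = {
    "gene_A": (0.02, 0.10),  # ratio below 1
    "gene_B": (0.30, 0.10),  # ratio above 1
    "gene_C": (0.01, 0.20),  # ratio below 1
    "gene_D": (0.05, 0.00),  # no synonymous changes, ratio undefined
}

for name, (dn, ds) in example_genes.items():
    print(name, classify(dn, ds))
print("fraction under positive selection:", fraction_positive(example_genes))
```

Genes with no synonymous changes are reported as undetermined rather than forced into a class, since the ratio is undefined there; a real analysis would obtain the rates from codon alignments of orthologous genes across strains.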
Keyphrases
- genome wide
- genome wide identification
- genome wide analysis
- bioinformatics analysis
- dna methylation
- copy number
- transcription factor
- pseudomonas aeruginosa
- multidrug resistant
- drug resistant
- escherichia coli
- emergency department
- gene expression
- endothelial cells
- cystic fibrosis
- staphylococcus aureus
- quality improvement
- machine learning
- artificial intelligence
- patient safety
- biofilm formation
- big data
- label free
- real time pcr