
AHLS-pred: a novel sequence-based predictor of acyl-homoserine-lactone synthases using machine learning algorithms.

Jie Hu, Jin Wang, Jiahao Li, Haidong Hu, Bin Wu, Hongqiang Ren, Jinfeng Wang
Published in: Environmental Microbiology Reports (2022)
Acyl-homoserine-lactones (AHLs), the major quorum sensing (QS) signalling molecules in Gram-negative bacteria, have shown great application potential in regulating biological nutrient removal processes. Identifying AHL synthases (AHLSs) is essential for in-depth research on QS mechanisms and their applications in biological wastewater treatment. This work proposes AHLS-pred, the first machine learning-based prediction model for AHLSs. A training dataset, AHLS1400, and an independent testing dataset, AHLS132, were first established for AHLS prediction. Three sequence-based feature extraction methods were used to generate feature descriptors: amino acid composition, dipeptide composition and G-gap dipeptide composition. The optimal features were then selected by ranking the feature descriptors by F-score and applying a sequential forward search strategy. Of the five machine learning algorithms compared, the support vector machine classifier performed best in fivefold cross-validation on AHLS1400 (ACC = 99.43%, MCC = 0.989, AUC = 0.997) and was used to train the final prediction model. On the independent testing dataset AHLS132, AHLS-pred achieves an ACC of 94.70%, an MCC of 0.894 and an AUC of 0.995. These results indicate that AHLS-pred is a promising and powerful method for accelerating the computational identification of AHLSs.
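The feature descriptors and F-score ranking named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The function names (aac, g_gap_dipeptide, f_score) are hypothetical, residues outside the 20 standard amino acids are simply skipped, and the F-score formula is the commonly used two-class definition, which the abstract does not spell out. With g = 0, the G-gap dipeptide composition reduces to the ordinary dipeptide composition.

```python
from itertools import product
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order (AA, AC, ..., YY).
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: 20 frequencies of the standard residues."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # assumption: non-standard residues are ignored
    return [c / total if total else 0.0 for c in counts]


def g_gap_dipeptide(seq, g=0):
    """G-gap dipeptide composition: 400 frequencies of residue pairs
    separated by g positions (g=0 gives adjacent dipeptides)."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        key = seq[i] + seq[i + g + 1]
        if key in counts:  # skip pairs containing non-standard residues
            counts[key] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def f_score(pos, neg):
    """F-score of one feature, given its values in the positive and
    negative classes (assumed definition: squared deviations of the
    class means from the overall mean, divided by the summed sample
    variances of the two classes)."""
    overall = mean(pos + neg)
    denom = variance(pos) + variance(neg)
    if denom == 0:
        return 0.0
    return ((mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2) / denom
```

In a workflow like the one described, each sequence would be turned into a feature vector, each feature column scored with f_score, and features added in descending score order while a classifier is re-evaluated at each step (the sequential forward search). The classifier and evaluation steps are omitted here.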
Keyphrases
  • machine learning
  • deep learning
  • wastewater treatment
  • artificial intelligence
  • amino acid
  • big data
  • fatty acid
  • microbial community
  • antibiotic resistance genes
  • neural network