A generalized machine-learning aided method for targeted identification of industrial enzymes from metagenome: A xylanase temperature dependence case study.
Mehdi Foroozandeh Shahraki, Kiana Farhadyar, Kaveh Kavousi, Mohammad H Azarabad, Amin Boroomand, Shohreh Ariaeenejad, Ghasem Hosseini Salekdeh
Published in: Biotechnology and Bioengineering (2020)
Growing industrial use of enzymes and the increasing availability of metagenomic data highlight the demand for effective methods of targeted identification and verification of novel enzymes from various environmental microbiota. Xylanases are a class of enzymes with numerous industrial applications and are involved in the degradation of xylan, a component of lignocellulose. The optimum temperature of an enzyme is an essential factor when choosing an appropriate biocatalyst for a particular purpose. Therefore, in silico prediction of this attribute is a significant cost- and time-effective step in the effort to characterize novel enzymes. The objective of this study was to develop a computational method to predict the thermal dependence of xylanases. This tool was then applied to targeted screening of putative xylanases with specific thermal dependencies from metagenomic data, which resulted in the identification of three novel xylanases from sheep and cow rumen microbiota. Here we present Thermal Activity Prediction for Xylanase (TAXyl), a new sequence-based machine learning method trained on a selected combination of protein features. This random forest classifier discriminates non-thermophilic, thermophilic, and hyper-thermophilic xylanases. The model's performance was evaluated through multiple iterations of sixfold cross-validation as well as holdout tests, and it is freely accessible as a web service at arimees.com.
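To make the evaluation setup more concrete, the following is a minimal sketch of two steps that a sequence-based classifier of this kind would need: turning a protein sequence into numeric features and splitting labeled examples into six class-balanced folds. It is not the authors' implementation. The abstract does not list the protein features used, so amino acid composition is assumed here purely for illustration, and the function names, toy sequence, and toy label counts are all hypothetical. The random forest itself is omitted; in practice each fold would be held out in turn while a classifier is trained on the remaining five.

```python
from collections import Counter, defaultdict

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 floats).

    Assumed feature for illustration only; the abstract does not specify
    which protein features the published model uses.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def stratified_folds(labels, k=6):
    """Assign sample indices to k folds, round-robin within each class.

    Distributing each class separately keeps the class proportions
    similar across folds, which matters for a three-class problem with
    unequal class sizes.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for pos, idx in enumerate(by_class[label]):
            folds[pos % k].append(idx)
    return folds


# Hypothetical toy data: 24 labeled examples across the three classes.
labels = (
    ["non-thermophilic"] * 12
    + ["thermophilic"] * 6
    + ["hyper-thermophilic"] * 6
)
folds = stratified_folds(labels, k=6)

# Hypothetical short peptide, used only to show the feature vector shape.
features = aa_composition("MKTAYIAKQR")
```

With these toy labels, every fold receives two non-thermophilic, one thermophilic, and one hyper-thermophilic example. Repeating the split after shuffling the indices within each class would give the multiple cross-validation iterations mentioned in the abstract.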