Predictive classifier models built from natural products with antimalarial bioactivity using machine learning approach.

Samuel Egieyeh, James Syce, Sarel F Malan, Ruben Cloete
Published in: PloS one (2018)
Given the vast number of natural products with potential antiplasmodial bioactivity and the cost of conducting antiplasmodial bioactivity assays, it may be judicious to learn from previous antiplasmodial bioassays and predict the bioactivity of these natural products before running experimental bioassays. This study set out to harness antimalarial bioactivity data of natural products to build accurate predictive models, using classical machine learning approaches, that can find potential antimalarial hits in new sets of natural products. Four classifier models (Naïve Bayesian, Voted Perceptron, Random Forest and Sequential Minimal Optimization of Support Vector Machines) were built from bioactivity data of natural products with in-vitro antiplasmodial activity (NAA), using a combination of molecular descriptors and two-dimensional molecular fingerprints of the compounds. The models were evaluated with an independent test dataset. Possible chemical features associated with the reported antimalarial activities of the compounds were also extracted. Random Forest (accuracy 82.81%, Kappa statistic 0.65 and area under the Receiver Operating Characteristic curve 0.91) and Sequential Minimal Optimization (accuracy 85.93%, Kappa statistic 0.72 and area under the Receiver Operating Characteristic curve 0.86) showed good predictive performance on the NAA dataset. The amine chemical group (specifically alkyl amines and basic nitrogen) was confirmed to be essential for antimalarial activity in the active NAA dataset. The classifier models built and evaluated in this study were then used to predict the antiplasmodial bioactivity class (active or inactive) of a set of natural products from the InterBioScreen chemical library.
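The abstract reports accuracy and the Kappa statistic for each classifier. As a rough illustration of how these two numbers relate, the sketch below computes both from a binary (active/inactive) confusion matrix. The function name `accuracy_and_kappa` and the counts are hypothetical and chosen only for easy arithmetic; they are not taken from the study, and this is not the authors' evaluation code.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix.

    tp/fn: actual actives predicted active/inactive
    fp/tn: actual inactives predicted active/inactive
    """
    total = tp + fn + fp + tn
    # Observed agreement between predictions and true labels (accuracy).
    observed = (tp + tn) / total
    # Agreement expected by chance, from the marginal frequencies.
    chance_active = ((tp + fn) / total) * ((tp + fp) / total)
    chance_inactive = ((fp + tn) / total) * ((fn + tn) / total)
    expected = chance_active + chance_inactive
    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (NOT from the paper), for illustration only.
acc, kappa = accuracy_and_kappa(tp=40, fn=10, fp=5, tn=45)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Because Kappa discounts agreement expected by chance, it is always at most the accuracy, which is consistent with the pairs of values reported above.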
Keyphrases
  • machine learning
  • plasmodium falciparum
  • big data
  • climate change
  • electronic health record
  • nuclear factor
  • high throughput
  • data analysis
  • deep learning
  • human health
  • toll like receptor