Discovering type I cis-AT polyketides through computational mass spectrometry and genome mining with Seq2PKS.
Donghui Yan, Muqing Zhou, Abhinav Adduri, Yihao Zhuang, Mustafa Guler, Sitong Liu, Hyonyoung Shin, Torin Kovach, Gloria Oh, Xiao Liu, Yuting Deng, Xiaofeng Wang, Liu Cao, David H Sherman, Pamela J Schultz, Roland D Kersten, Jason A Clement, Ashootosh Tripathi, Bahar Behsaz, Hosein Mohimani
Published in: Nature Communications (2024)
Type 1 polyketides are a major class of natural products used as antiviral, antibiotic, antifungal, antiparasitic, immunosuppressive, and antitumor drugs. Analysis of public microbial genomes leads to the discovery of over sixty thousand type 1 polyketide gene clusters. However, the molecular products of only about a hundred of these clusters are characterized, leaving most metabolites unknown. Characterizing polyketides relies on bioactivity-guided purification, which is expensive and time-consuming. To address this, we present Seq2PKS, a machine learning algorithm that predicts chemical structures derived from Type 1 polyketide synthases. Seq2PKS predicts numerous putative structures for each gene cluster to enhance accuracy. The correct structure is identified using a variable mass spectral database search. Benchmarks show that Seq2PKS outperforms existing methods. Applying Seq2PKS to Actinobacteria datasets, we discover biosynthetic gene clusters for monazomycin, oasomycin A, and 2-aminobenzamide-actiphenol.
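The candidate-then-match idea described in the abstract can be illustrated with a minimal sketch. This is not the Seq2PKS implementation: the names `Candidate` and `match_candidates`, the ppm tolerance, the offset window, and the masses in the usage note are all hypothetical, and the sketch compares precursor masses only, whereas a real spectral search would also score fragment peaks. It assumes a singly charged [M+H]+ ion and keeps candidates that either match the observed mass exactly or differ from it by one unexplained mass offset (the "variable" part).

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # proton mass in Da, used to convert [M+H]+ m/z to neutral mass


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical label for one predicted structure
    monoisotopic_mass: float  # neutral monoisotopic mass in Da


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_candidates(candidates, precursor_mz, ppm_tol=10.0, max_offset=100.0):
    """Return (candidate, offset_da, is_exact) tuples.

    Exact matches (within ppm_tol) come first; the remaining hits are
    candidates whose mass differs from the observed neutral mass by at most
    max_offset Da, sorted by the size of that offset. Both thresholds are
    illustrative assumptions.
    """
    neutral = precursor_mz - PROTON_MASS  # assumes a singly charged [M+H]+ ion
    hits = []
    for cand in candidates:
        offset = neutral - cand.monoisotopic_mass
        exact = abs(ppm_error(neutral, cand.monoisotopic_mass)) <= ppm_tol
        if exact or abs(offset) <= max_offset:
            hits.append((cand, offset, exact))
    # Exact matches first, then smallest absolute offset.
    hits.sort(key=lambda h: (not h[2], abs(h[1])))
    return hits
```

For example, with made-up candidates of mass 500.0, 514.0157, and 900.0 Da and an observed precursor at 514.0157 + `PROTON_MASS`, the second candidate is returned as an exact match, the first as a variable match with an offset of about 14.0157 Da, and the third is discarded.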
Keyphrases
- genome wide
- rna seq
- dna methylation
- machine learning
- single cell
- copy number
- mass spectrometry
- high resolution
- healthcare
- small molecule
- ms ms
- microbial community
- high throughput
- deep learning
- mental health
- emergency department
- optical coherence tomography
- artificial intelligence
- single molecule
- high performance liquid chromatography
- capillary electrophoresis
- drug induced