A new genome-mining tool redefines the lasso peptide biosynthetic landscape.
Jonathan I Tietz, Christopher J Schwalen, Parth S Patel, Tucker Maxson, Patricia M Blair, Hua-Chia Tai, Uzma I Zakai, Douglas A Mitchell
Published in: Nature Chemical Biology (2017)
Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are attractive for genome-driven discovery and re-engineering, but limitations in bioinformatic methods and exponentially increasing genomic data make large-scale mining of RiPP data difficult. We report RODEO (Rapid ORF Description and Evaluation Online), which combines hidden-Markov-model-based analysis, heuristic scoring, and machine learning to identify biosynthetic gene clusters and predict RiPP precursor peptides. We initially focused on lasso peptides, which display intriguing physicochemical properties and bioactivities, but their hypervariability renders them challenging prospects for automated mining. Our approach yielded the most comprehensive mapping to date of lasso peptide space, revealing >1,300 compounds. We characterized the structures and bioactivities of six lasso peptides, prioritized based on predicted structural novelty, including one with an unprecedented handcuff-like topology and another with a citrulline modification exceptionally rare among bacteria. These combined insights significantly expand the knowledge of lasso peptides and, more broadly, provide a framework for future genome-mining efforts.
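To make the idea of heuristic scoring of candidate precursor peptides concrete, here is a minimal Python sketch. It is not RODEO's scoring scheme, which the abstract does not specify. The rules, weights, length window, and the names `CORE_MOTIF`, `score_candidate`, and `rank_candidates` are all assumptions chosen for illustration. The sketch rewards a short peptide that contains a leader ending in Thr-x followed by a core that starts with Gly/Cys/Ser/Ala and carries Asp or Glu at core position 7, 8, or 9, where a macrolactam ring could close.

```python
import re

# Illustrative motif (an assumption, not RODEO's model):
# "T." is the end of a leader with Thr at the penultimate position,
# then a core starting with G/C/S/A and an acidic residue (D/E)
# at core position 7, 8, or 9, i.e. 5 to 7 residues in between.
CORE_MOTIF = re.compile(r"T.([GCSA].{5,7}[DE])")


def score_candidate(peptide: str) -> int:
    """Return a toy heuristic score for one candidate precursor peptide."""
    score = 0
    # Rule 1 (assumed window): precursor peptides are short.
    if 30 <= len(peptide) <= 60:
        score += 1
    # Rule 2 (assumed weight): leader/core motif is present.
    if CORE_MOTIF.search(peptide):
        score += 2
    return score


def rank_candidates(peptides):
    """Sort candidates from highest to lowest toy score."""
    return sorted(peptides, key=score_candidate, reverse=True)


if __name__ == "__main__":
    # Synthetic sequences made up for this example, not real peptides.
    candidates = [
        "M" + "K" * 40,
        "M" + "K" * 20 + "LTK" + "GAAAAAAD" + "F" * 12,
    ]
    for pep in rank_candidates(candidates):
        print(score_candidate(pep), pep)
```

In a fuller pipeline, scores of this kind would be one input alongside hidden-Markov-model hits and a trained classifier. Fixed hand-set weights are used here only to keep the sketch short.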
Keyphrases
- machine learning
- genome wide
- amino acid
- big data
- electronic health record
- high resolution
- current status
- high throughput
- copy number
- healthcare
- small molecule
- social media
- dna methylation
- single cell
- health information
- data analysis
- transcription factor
- loop mediated isothermal amplification
- sensitive detection
- genome wide identification