Bioinformatics-Guided Expansion and Discovery of Graspetides.
Sangeetha Ramesh, Xiaorui Guo, Adam J DiCaprio, Ashley M De Lio, Lonnie A Harris, Bryce L Kille, Taras V Pogorelov, Douglas A Mitchell
Published in: ACS Chemical Biology (2021)
Graspetides are a class of ribosomally synthesized and post-translationally modified peptide natural products featuring ATP-grasp ligase-dependent formation of macrolactones/macrolactams. These modifications arise from serine, threonine, or lysine donor residues linked to aspartate or glutamate acceptor residues. Characterized graspetides include serine protease inhibitors such as the microviridins and plesiocin. Here, we report an update to Rapid ORF Description and Evaluation Online (RODEO) for the automated detection of graspetides, which identified 3,923 high-confidence graspetide biosynthetic gene clusters. Sequence and co-occurrence analyses doubled the number of graspetide groups from 12 to 24, defined based on core consensus sequence and putative secondary modification. Bioinformatic analyses of the ATP-grasp ligase superfamily suggest that extant graspetide synthetases diverged once from an ancestral ATP-grasp ligase and later evolved to introduce a variety of ring connectivities. Furthermore, we characterized thatisin and iso-thatisin, two graspetides related by conformational stereoisomerism from Lysobacter antibioticus. Derived from a newly identified graspetide group, thatisin and iso-thatisin feature two interlocking macrolactones with identical ring connectivity, as determined by a combination of tandem mass spectrometry (MS/MS), methanolytic, and mutational analyses. NMR spectroscopy of thatisin revealed a cis conformation for a key proline residue, while molecular dynamics simulations, solvent-accessible surface area calculations, and partial methanolytic analysis coupled with MS/MS support a trans conformation for iso-thatisin at the same position. Overall, this work provides a comprehensive overview of the graspetide landscape, and the improved RODEO algorithm will accelerate future graspetide discoveries by enabling open-access analysis of existing and emerging genomes.
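The donor/acceptor chemistry described in the abstract can be illustrated with a minimal sketch. It is not part of RODEO and does not reproduce the authors' analysis; the function names and the example core sequence are hypothetical. It assumes only standard chemistry: each macrolactone or macrolactam ring forms with loss of one water (about 18.0106 Da), and methanolysis of one ester ring adds one methanol (about 32.0262 Da) to the cyclized peptide.

```python
# Monoisotopic masses in Da (standard values).
WATER = 18.010565
METHANOL = 32.026215

# Donor residues and the linkage each forms with an Asp/Glu side chain.
DONORS = {"S": "ester", "T": "ester", "K": "amide"}
ACCEPTORS = {"D", "E"}


def candidate_crosslinks(core):
    """List every donor/acceptor pair in a core peptide (1-based positions).

    This enumerates possibilities only; it does not predict which
    pairs an ATP-grasp ligase would actually join.
    """
    pairs = []
    for i, donor in enumerate(core, start=1):
        if donor not in DONORS:
            continue
        for j, acceptor in enumerate(core, start=1):
            if acceptor in ACCEPTORS:
                pairs.append((i, donor, j, acceptor, DONORS[donor]))
    return pairs


def mass_shift(n_rings, n_methanolyzed=0):
    """Mass change relative to the linear peptide.

    n_rings: number of macrolactone/macrolactam rings formed.
    n_methanolyzed: number of ester rings opened by methanol.
    """
    return -n_rings * WATER + n_methanolyzed * METHANOL


if __name__ == "__main__":
    core = "TAKDE"  # hypothetical sequence, not taken from the paper
    for pair in candidate_crosslinks(core):
        print(pair)
    print(round(mass_shift(2), 4))     # two rings formed
    print(round(mass_shift(2, 1), 4))  # one of the two esters opened
```

In this picture, a doubly cyclized peptide such as the two-macrolactone products described above would appear roughly 36.02 Da lighter than its linear precursor, and each ester opened by partial methanolysis would add roughly 32.03 Da back, which is the kind of shift MS/MS can localize.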
Keyphrases
- molecular dynamics simulations
- ms/ms
- tandem mass spectrometry
- ultra high performance liquid chromatography
- high performance liquid chromatography
- machine learning
- deep learning
- molecular docking
- protein kinase
- simultaneous determination
- liquid chromatography
- high throughput
- amino acid
- single cell
- gas chromatography
- loop mediated isothermal amplification
- liquid chromatography tandem mass spectrometry
- small molecule
- solid phase extraction
- high resolution
- genome wide
- multiple sclerosis
- genome wide identification
- current status
- healthcare
- neural network
- solar cells
- health information
- gene expression
- white matter
- copy number
- label free
- transcription factor
- electron transfer
- energy transfer