HSQC Spectra Simulation and Matching for Molecular Identification.
Martin Priessner, Richard J Lewis, Magnus J Johansson, Jonathan M Goodman, Jon Paul Janet, Anna Tomberg. Published in: Journal of Chemical Information and Modeling (2024)
In the pursuit of improved compound identification and database searching, this study explores heteronuclear single quantum coherence (HSQC) spectra simulation and matching methodologies. HSQC spectra serve as unique molecular fingerprints, offering a favorable balance between data collection time and information content. We conducted a comprehensive evaluation of four HSQC simulation techniques: ACD/Labs (ACD), MestReNova (MNova), Gaussian NMR calculations (DFT), and a graph-based neural network (ML). For the latter two techniques, we developed a reconstruction logic to combine proton and carbon 1D spectra into HSQC spectra. The methodology involved the implementation of three peak-matching strategies (minimum-sum, Euclidean-distance, and Hungarian distance) combined with three padding strategies (zero-padding, peak-truncated, and nearest-neighbor double assignment). We found that coupling these strategies with a robust simulation technique enables the correct molecule to be distinguished accurately from similar analogues (regio- and stereoisomers) and allows for fast and accurate searches of large databases. Furthermore, we demonstrated the efficacy of the best-performing methodology by correcting the structures of a set of previously misidentified molecules. This research indicates that effective HSQC spectral simulation and matching methodologies significantly facilitate molecular structure elucidation. We also provide a Google Colab notebook so that researchers can apply our methods to their own data (https://github.com/AstraZeneca/hsqc_structure_elucidation.git).
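To make the matching step concrete, the following is a minimal sketch of how zero-padding might be combined with an optimal one-to-one peak assignment to rank candidate structures against an experimental HSQC peak list. It is not the authors' implementation: the function names, the axis scale factors, and the example peak lists are all hypothetical. For brevity the optimal assignment is found by brute force over permutations, which gives the same minimum cost that the Hungarian algorithm would find but is only practical for short peak lists; for real spectra a solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.

```python
from itertools import permutations
from math import hypot
from typing import Dict, List, Tuple

# An HSQC peak is a (1H shift in ppm, 13C shift in ppm) pair.
Peak = Tuple[float, float]


def zero_pad(a: List[Peak], b: List[Peak]) -> Tuple[List[Peak], List[Peak]]:
    """Pad the shorter peak list with (0.0, 0.0) peaks so both have equal length."""
    n = max(len(a), len(b))
    pad_a = list(a) + [(0.0, 0.0)] * (n - len(a))
    pad_b = list(b) + [(0.0, 0.0)] * (n - len(b))
    return pad_a, pad_b


def peak_distance(p: Peak, q: Peak, h_scale: float = 1.0, c_scale: float = 0.1) -> float:
    """Scaled Euclidean distance between two peaks.

    The scale factors are illustrative assumptions that put the 1H and 13C
    axes on a comparable footing; they are not taken from the paper.
    """
    return hypot((p[0] - q[0]) * h_scale, (p[1] - q[1]) * c_scale)


def assignment_cost(simulated: List[Peak], experimental: List[Peak]) -> float:
    """Mean peak distance under the best one-to-one assignment.

    Brute force over all permutations: a stand-in for the Hungarian
    algorithm that returns the same optimum for small inputs.
    """
    sim, exp = zero_pad(simulated, experimental)
    if not sim:
        return 0.0
    best = min(
        sum(peak_distance(s, exp[j]) for s, j in zip(sim, perm))
        for perm in permutations(range(len(exp)))
    )
    return best / len(sim)


def rank_candidates(experimental: List[Peak], candidates: Dict[str, List[Peak]]) -> List[str]:
    """Return candidate names sorted from best to worst match."""
    return sorted(candidates, key=lambda name: assignment_cost(candidates[name], experimental))


# Hypothetical peak lists, for illustration only.
experimental = [(7.26, 128.0), (3.70, 55.0), (1.20, 14.0)]
candidates = {
    "candidate_A": [(7.30, 128.5), (3.65, 55.4), (1.25, 14.2)],
    "candidate_B": [(7.30, 128.5), (2.10, 30.0)],  # one peak short, so it gets zero-padded
}
ranking = rank_candidates(experimental, candidates)
```

In this sketch a candidate with a missing peak is penalized because the padded (0.0, 0.0) peak must be paired with a real experimental peak far from the origin. The peak-truncated and nearest-neighbor double-assignment strategies named in the abstract would handle the length mismatch differently and are not shown here.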