Cross-Modal Retrieval Between ¹³C NMR Spectra and Structures Based on Focused Libraries.
Hanyu Sun, Xi Xue, Xue Liu, Hai-Yu Hu, Yafeng Deng, Xiao-Jian Wang
Published in: Analytical Chemistry (2024)
Library matching, in which a carbon-13 nuclear magnetic resonance (¹³C NMR) spectrum is compared with the spectral data in a library, is a crucial method for compound identification. In our previous paper, we introduced a deep contrastive learning system called CReSS, which used a larger structure library. However, CReSS has two limitations: its library contained no unknown structures, and a redundant library reduces the structure-elucidation accuracy. Herein, we replaced the oversized traditional libraries with focused libraries containing a small number of molecules. A previously reported generative model, CMGNet, was used to generate focused libraries for CReSS. The combined model achieved a Top-10 accuracy of 54.03% when tested on 6,471 ¹³C NMR spectra. In comparison, CReSS with a random reference structure library achieved an accuracy of only 9.17%. Furthermore, to extend the advantages of the focused libraries, we proposed SAmpRNN, a recurrent neural network (RNN). With the focused library enlarged by SAmpRNN, the structure-identification accuracy of the model increased in 70.0% of 30 randomly selected example cases. In general, cross-modal retrieval between ¹³C NMR spectra and structures based on focused libraries (CFLS) achieved high accuracy and provided more accurate candidate structures than traditional libraries for compound identification.
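The retrieval and Top-k scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that spectra and structures have already been mapped to vectors in a shared embedding space, and that candidates are ranked by cosine similarity, which is a common choice for contrastive models but is not confirmed by the abstract. The function names `cosine`, `top_k`, and `top_k_accuracy` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, library, k=10):
    """Return the ids of the k library structures most similar to the query.

    library: dict mapping a structure id to its embedding (assumed precomputed).
    """
    ranked = sorted(library, key=lambda sid: cosine(query_vec, library[sid]), reverse=True)
    return ranked[:k]


def top_k_accuracy(queries, k=10):
    """Fraction of queries whose true structure appears among the top-k candidates.

    queries: list of (spectrum_embedding, true_structure_id, focused_library),
    so that each spectrum can be searched against its own focused library.
    """
    hits = sum(1 for vec, true_id, lib in queries if true_id in top_k(vec, lib, k))
    return hits / len(queries)


# Toy example with made-up 2-D embeddings (hypothetical values).
toy_library = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.7, 0.7]}
toy_queries = [([1.0, 0.1], "A", toy_library), ([1.0, 0.1], "B", toy_library)]
print(top_k([1.0, 0.1], toy_library, k=2))
print(top_k_accuracy(toy_queries, k=2))
```

Passing a separate library with each query reflects the idea of a focused library generated per spectrum; with a single shared library, the same dictionary would simply be reused for every query.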