Machine Learning Enhanced Spectrum Recognition Based on Computer Vision (SRCV) for Intelligent NMR Data Extraction.
Wenqiang Jia, Zhuo Yang, Minjian Yang, Liang Cheng, Zengrong Lei, Xiao-Jian Wang
Published in: Journal of Chemical Information and Modeling (2020)
A machine learning enhanced spectrum recognition system, spectrum recognition based on computer vision (SRCV), has been developed to extract data from previously analyzed 13C and 1H NMR spectra. The system comprises four function modules that extract data from three areas of an NMR image: the 13C and 1H chemical shifts, the integrals, and the range of the shift values. Three machine learning models were pretrained for number recognition, the key step in NMR data extraction. The k-nearest-neighbor (kNN) method with an optimized k (k = 4) was selected and achieved a 100% recognition rate. SRCV was then tested and validated, showing high accuracy and a short processing time (11-21 s) per NMR spectral image. The spectrum recognizer enables high-throughput extraction of 13C and 1H NMR data from the many spectra available in the literature and could be used to construct spectral databases. In addition, the system could be further developed to import data into computer-assisted structure elucidation systems, which would substantially automate that procedure. SRCV is available on GitHub (https://github.com/WJmodels/SRCV).
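The number-recognition step can be illustrated with a minimal kNN sketch. This is not the SRCV implementation: the abstract states only that kNN with k = 4 was chosen, so the 3x5 digit bitmaps, the synthetic grayscale noise, the squared Euclidean distance, and every function name below are assumptions made for illustration.

```python
import random
from collections import Counter

K = 4  # neighbor count reported in the abstract

# Hypothetical 3x5 digit bitmaps, flattened row by row (1 = ink, 0 = background).
GLYPHS = {
    0: "111101101101111",
    1: "010110010010111",
    2: "111001111100111",
    3: "111001111001111",
    4: "101101111001001",
    5: "111100111001111",
    6: "111100111101111",
    7: "111001001001001",
    8: "111101111101111",
    9: "111101111001111",
}


def glyph_vector(digit):
    """Return the clean bitmap of a digit as a list of floats."""
    return [float(c) for c in GLYPHS[digit]]


def make_training_set(samples_per_digit=6, noise=0.1, seed=0):
    """Build labeled samples by adding bounded grayscale noise to each bitmap.

    The noise model is an assumption standing in for real cropped digit images.
    """
    rng = random.Random(seed)
    data = []
    for digit in GLYPHS:
        base = glyph_vector(digit)
        for _ in range(samples_per_digit):
            sample = [p + rng.uniform(-noise, noise) for p in base]
            data.append((sample, digit))
    return data


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, query, k=K):
    """Label a query vector by majority vote among its k nearest samples."""
    nearest = sorted(train, key=lambda s: squared_distance(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def read_number(train, glyph_vectors):
    """Classify a left-to-right sequence of digit vectors and join the labels."""
    return "".join(str(knn_predict(train, v)) for v in glyph_vectors)


TRAIN = make_training_set()
```

For example, `read_number(TRAIN, [glyph_vector(d) for d in (1, 2, 8)])` returns `"128"`. In a real pipeline the query vectors would come from digit regions segmented out of the spectrum image, and decimal points and minus signs would need their own classes.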