Towards compound identification of synthetic opioids in nontargeted screening using machine learning techniques.
Joshua Klingberg, Adam T Cawley, Ronald Shimmon, Shanlin Fu
Published in: Drug testing and analysis (2020)
The constant evolution of the illicit drug market makes the identification of unknown compounds problematic. Obtaining certified reference materials for a broad array of new analogues can be difficult and cost-prohibitive. Machine learning provides a promising avenue to putatively identify a compound before confirmation against a standard. In this study, machine learning approaches were used to develop class prediction and retention time prediction models. The class prediction model used a naïve Bayes architecture to classify opioids as belonging to the fentanyl analogues, the AH series or the U series, with an accuracy of 89.5%. The model was most accurate for the fentanyl analogues, most likely due to their greater number in the training data. This classification model can provide guidance to an analyst when determining a suspected structure. A retention time prediction model was also trained for a wide array of synthetic opioids. This model utilised Gaussian process regression to predict the retention time of analytes from multiple generated molecular features, with 79.7% of the samples predicted within ±0.1 min of their experimental retention time. Once the suspected structure of an unknown compound is determined, molecular features can be generated and input to the prediction model, and the predicted retention time compared with the experimental value. Incorporating machine learning prediction models into a compound identification workflow can support putative identifications with greater confidence and ultimately save the time and money otherwise spent purchasing and/or producing superfluous certified reference materials.
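The retention time check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fraction_within_tolerance` and `is_rt_consistent` are hypothetical, the retention times are invented for illustration, and the predictions are assumed to come from some already-trained regression model. The sketch only shows how a ±0.1 min window could be used to score a model and to screen a candidate structure.

```python
def fraction_within_tolerance(predicted, experimental, tol=0.1):
    """Fraction of analytes whose predicted retention time lies within
    +/- tol minutes of the experimental retention time."""
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have the same length")
    if not predicted:
        raise ValueError("at least one retention time pair is required")
    hits = sum(1 for p, e in zip(predicted, experimental) if abs(p - e) <= tol)
    return hits / len(predicted)


def is_rt_consistent(predicted_rt, experimental_rt, tol=0.1):
    """True if a candidate's predicted retention time agrees with the
    experimental value within the tolerance window (in minutes)."""
    return abs(predicted_rt - experimental_rt) <= tol


# Hypothetical retention times in minutes (illustrative only, not study data).
predicted = [5.02, 6.40, 7.15, 8.90, 9.31]
experimental = [5.00, 6.55, 7.20, 8.95, 9.30]

fraction = fraction_within_tolerance(predicted, experimental)
print(f"{fraction:.1%} of predictions fall within +/-0.1 min")

# Screening one suspected structure against an observed peak.
print(is_rt_consistent(predicted_rt=6.40, experimental_rt=6.55))
```

A candidate that fails this check is not ruled out by the check alone; the window only indicates whether the suspected structure is consistent with the model, and confirmation against a reference standard is still required.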
Keyphrases
- machine learning
- chronic pain
- big data
- artificial intelligence
- pain management
- molecular docking
- high resolution
- deep learning
- pulmonary embolism
- electronic health record
- high throughput
- bioinformatics analysis
- structure activity relationship
- single molecule
- single cell
- resistance training
- high density
- adverse drug
- body composition
- high intensity
- high resolution mass spectrometry
- gas chromatography