A machine learning approach for handling big data produced by high resolution mass spectrometry after data independent acquisition of small molecules - Proof of concept study using an artificial neural network for sample classification.

Gabriel L Streun, Marco P Elmiger, Akos Dobay, Lars Christian Ebert, Thomas Krämer

Published in: Drug Testing and Analysis (2020)

Liquid chromatography coupled to high-resolution mass spectrometry (HRMS) enables data-independent acquisition (DIA) and untargeted screening. However, to avoid handling the resulting large datasets, most laboratories in the field still use targeted screening methods, which offer good sensitivity and specificity but are limited to known compounds. Machine learning is a promising field that offers new possibilities, such as artificial neural networks that can be trained to classify large amounts of data. In this proof-of-concept study, we illustrate such a machine learning approach applied to raw HRMS-DIA data files. We evaluated a machine learning model using training, validation, and test sets of solvent and whole blood samples containing drugs (of abuse) common in forensic toxicology. Different platforms were used for this purpose. Using a feedforward neural network architecture, we aimed to predict the sample category (blank sample vs. drug-containing sample). With the applied machine learning approaches, the sensitivity and specificity of the sample-class predictions on the validation and test sets were in a range suitable for actual use in a (routine) laboratory (e.g., workplace drug testing). In conclusion, this proof-of-concept study clearly demonstrated the huge potential of machine learning in the analysis of HRMS-DIA data.
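To make the classification task and its evaluation concrete, here is a minimal, dependency-free sketch of a feedforward network that labels samples as blank (0) or drug-containing (1), scored by sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP), with drug-containing samples as the positive class. It is not the authors' pipeline. The abstract does not specify the input representation, layer sizes, activation, or training settings, so everything here is an assumption. This includes the idea that each raw DIA file has already been reduced to a fixed-length vector of binned intensities, the hypothetical names `TinyFeedforward` and `make_toy_data`, the single tanh hidden layer, and the learning rate. The data are synthetic.

```python
import math
import random


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity with 1 = drug-containing (positive), 0 = blank (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyFeedforward:
    """Hypothetical network: one tanh hidden layer and a sigmoid output,
    trained by plain per-sample SGD on binary cross-entropy."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, _sigmoid(z)

    def predict_proba(self, x):
        return self._forward(x)[1]

    def predict(self, x, threshold=0.5):
        return 1 if self.predict_proba(x) >= threshold else 0

    def fit(self, X, y, epochs=100, lr=0.1):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                dz = p - t  # d(BCE)/d(output logit)
                # Backpropagate to hidden pre-activations before updating w2.
                da = [dz * w * (1.0 - hj * hj) for w, hj in zip(self.w2, h)]
                self.w2 = [w - lr * dz * hj for w, hj in zip(self.w2, h)]
                self.b2 -= lr * dz
                for j, row in enumerate(self.w1):
                    self.w1[j] = [w - lr * da[j] * xi for w, xi in zip(row, x)]
                    self.b1[j] -= lr * da[j]


def make_toy_data(n, n_bins=8, seed=1):
    """Hypothetical stand-in for preprocessed DIA files: noisy binned intensities,
    with one elevated bin in 'drug-containing' samples. Not real HRMS data."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate blank / drug-containing for a balanced set
        x = [rng.gauss(0.0, 0.05) for _ in range(n_bins)]
        if label == 1:
            x[rng.randrange(n_bins)] += 1.0
        X.append(x)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_toy_data(200, seed=1)
    X_test, y_test = make_toy_data(100, seed=2)
    model = TinyFeedforward(n_in=len(X_train[0]))
    model.fit(X_train, y_train)
    preds = [model.predict(x) for x in X_test]
    print("sensitivity=%.2f specificity=%.2f" % sensitivity_specificity(y_test, preds))
```

Plain Python is used only so the sketch has no dependencies. In practice a deep learning framework would typically replace the hand-written training loop, and the preprocessing that turns raw DIA files into model inputs is not described in the abstract.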
