Retention Index Prediction Using Quantitative Structure-Retention Relationships for Improving Structure Identification in Nontargeted Metabolomics.

Yabin Wen, Ruth I J Amos, Mohammad Talebi, Roman Szucs, John W Dolan, Christopher A Pohl, Paul R Haddad

Published in: Analytical Chemistry (2018)

Structure identification in nontargeted metabolomics based on liquid chromatography coupled to mass spectrometry (LC-MS) remains a significant challenge. Quantitative structure-retention relationship (QSRR) modeling can accelerate the structure identification of metabolites by predicting their retention, allowing false positives to be eliminated during the interpretation of metabolomics data. In this work, 191 compounds were grouped according to molecular weight, and a QSRR study was carried out on the 34 resulting groups to eliminate false positives. Partial least squares (PLS) regression combined with a genetic algorithm (GA) was applied to construct linear QSRR models based on a variety of VolSurf+ molecular descriptors. A novel dual-filtering approach, which combines Tanimoto similarity (TS) searching as the primary filter and retention index (RI) similarity clustering as the secondary filter, was used to select the training-set compounds from which the QSRR models were derived. The models yielded an R² of 0.8512 and an average root mean square error in prediction (RMSEP) of 8.45%. With a retention index filter expressed as ±2 standard deviations (SD) of the error, representative compounds were predicted with >91% accuracy, and for 53% of the groups (18/34), at least one false positive compound could be eliminated. The proposed strategy can thus narrow down the number of false positives to be assessed in nontargeted metabolomics.
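As a rough illustration of two ideas named in the abstract, the sketch below computes a Tanimoto similarity between fingerprints and applies a ±2 SD retention index window to candidate structures. It is not the authors' implementation. The fingerprint representation (sets of "on" bit positions), the 0.5 similarity threshold, the function names, and the choice to center the window on the predicted RI are all assumptions made for this example; the abstract does not specify them.

```python
from statistics import stdev


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_by_similarity(target_bits, library, ts_threshold=0.5):
    """Primary filter: keep library compounds similar enough to the target.

    ts_threshold is a hypothetical value, not taken from the paper.
    """
    return [
        name
        for name, bits in library.items()
        if tanimoto(target_bits, bits) >= ts_threshold
    ]


def ri_window(predicted_ri, prediction_errors, k=2.0):
    """Return (low, high) bounds of predicted RI +/- k sample SDs of the error."""
    sd = stdev(prediction_errors)
    return predicted_ri - k * sd, predicted_ri + k * sd


def filter_candidates(predicted, measured_ri, prediction_errors, k=2.0):
    """Keep candidates whose RI window contains the measured RI.

    Candidates falling outside their window are treated as false positives.
    """
    kept = []
    for name, pred_ri in predicted.items():
        low, high = ri_window(pred_ri, prediction_errors, k)
        if low <= measured_ri <= high:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    library = {"cmpd_1": {1, 2, 3, 4}, "cmpd_2": {7, 8, 9}}
    print(select_by_similarity({2, 3, 4, 5}, library))

    errors = [-10.0, 0.0, 10.0]  # hypothetical model errors in RI units
    candidates = {"A": 500.0, "B": 560.0}  # hypothetical predicted RIs
    print(filter_candidates(candidates, measured_ri=510.0, prediction_errors=errors))
```

In practice the fingerprints would come from a cheminformatics toolkit and the error distribution from model validation; the sketch only shows how the two filters relate to each other.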

Keyphrases