A sparse QSRR model for predicting retention indices of essential oils based on robust screening approach.
A M Al-Fakih, Z Y Algamal, M H Lee, M Aziz. Published in: SAR and QSAR in environmental research (2018)
A robust screening approach and a sparse quantitative structure-retention relationship (QSRR) model are proposed for predicting the retention indices (RIs) of 169 constituents of essential oils. The approach consists of two steps. First, dimension reduction is performed using the proposed modified robust sure independence screening (MR-SIS) method. Second, RIs are predicted using the proposed robust sparse QSRR with smoothly clipped absolute deviation (SCAD) penalty (RSQSRR). The RSQSRR model was internally and externally validated based on [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], the Y-randomization test, [Formula: see text], [Formula: see text], and the applicability domain. The validation results indicate that the model is robust and that its performance is not due to chance correlation. For the training dataset, the descriptor selection and prediction performance of the RSQSRR exceed those of the other two modelling methods used. The RSQSRR shows the highest [Formula: see text], [Formula: see text], and [Formula: see text], and the lowest [Formula: see text]. For the test dataset, the RSQSRR shows a high external validation value ([Formula: see text]) and a low value of [Formula: see text] compared with the other methods, indicating its higher predictive ability. In conclusion, the results show that the proposed RSQSRR is an efficient approach for modelling high-dimensional QSRRs and is useful for estimating the RIs of essential oils that have not been experimentally tested.
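
The two-step idea (screen descriptors first, then fit a sparse SCAD-penalized model) can be illustrated with a minimal sketch. This is not the authors' MR-SIS or RSQSRR. It makes three assumptions: Spearman rank correlation stands in for the robust screening statistic, an ordinary least-squares loss replaces the robust loss, and the SCAD fit uses a local linear approximation with coordinate descent. The function names `rank_screen`, `scad_derivative` and `scad_fit` are hypothetical, and `a=3.7` is the conventional SCAD default rather than a value taken from the paper.

```python
import numpy as np


def rank_screen(X, y, d):
    """Step 1 (stand-in for MR-SIS): keep the d descriptors with the
    largest absolute Spearman correlation with the response."""
    # Spearman correlation is Pearson correlation on ranks.
    # Ties are not averaged here, which is fine for continuous descriptors.
    Xr = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    yr = np.argsort(np.argsort(y)).astype(float)
    Xr -= Xr.mean(axis=0)
    yr -= yr.mean()
    corr = np.abs(Xr.T @ yr) / (np.linalg.norm(Xr, axis=0) * np.linalg.norm(yr))
    return np.sort(np.argsort(corr)[::-1][:d])


def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |beta|."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))


def scad_fit(X, y, lam, a=3.7, n_lla=5, n_cd=200):
    """Step 2 (non-robust stand-in for RSQSRR): SCAD-penalized least squares.

    Each local linear approximation (LLA) step solves a weighted lasso by
    coordinate descent. Coefficients are returned on the standardized scale.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # columns have mean 0, variance 1
    yc = y - y.mean()
    beta = np.zeros(p)
    for _ in range(n_lla):
        w = scad_derivative(beta, lam, a)      # per-coefficient lasso weights
        for _ in range(n_cd):
            for j in range(p):
                # partial residual with descriptor j removed from the fit
                r = yc - Xs @ beta + Xs[:, j] * beta[j]
                rho = Xs[:, j] @ r / n
                # soft-thresholding update (x_j'x_j / n == 1 after scaling)
                beta[j] = np.sign(rho) * max(abs(rho) - w[j], 0.0)
    return beta
```

A typical call would be `kept = rank_screen(X, y, d)` followed by `beta = scad_fit(X[:, kept], y, lam)`, where the screening size `d` and the penalty level `lam` would normally be chosen by cross-validation. The abstract does not say how the authors tuned them.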