Analysis of Isocratic-Chromatographic-Retention Data using Bayesian Multilevel Modeling.
Łukasz Kubik, Roman Kaliszan, Paweł Wiczling
Published in: Analytical Chemistry (2018)
The objective of this work was to develop a multilevel (hierarchical) model based on isocratic reversed-phase high-performance chromatographic data collected in methanol and acetonitrile for 58 chemical compounds. Such a multilevel model is a regression model of the analyte-specific chromatographic measurements in which all the regression parameters are given a probability model. This differs fundamentally from the most common approach, in which parameters are estimated separately for each analyte (without sharing information across analytes and organic modifiers). The statistical analysis was done with Stan software, which implements Bayesian inference with Markov chain Monte Carlo sampling. During the model-building process, a series of multilevel models of increasing complexity was obtained: (1) a model with no pooling (separate models were fitted for each analyte), (2) a model with partial pooling (a common distribution was used for analyte-specific parameters), and (3) a model with partial pooling as well as a regression model relating analyte-specific parameters to analyte-specific properties (QSRR equations). All the models were compared with each other using 10-fold cross-validation. The benefits of multilevel models for inference and prediction were shown. In particular, the obtained models allowed us to (i) better understand the data and (ii) solve many routine analytical problems, such as obtaining well-calibrated predictions of retention factors for an analyte in acetonitrile-containing mobile phases given zero, one, or several measurements in methanol-containing mobile phases, and vice versa.
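The difference between no pooling and partial pooling can be illustrated with a minimal Python sketch. It is not the authors' Stan model: it assumes a much simpler normal-normal setting with a single analyte-specific parameter, a known measurement standard deviation (`sigma`), and a known population mean and standard deviation (`mu`, `tau`), so that the posterior mean has a closed form. All function names and numerical values are hypothetical.

```python
from statistics import fmean


def no_pooling_mean(measurements):
    """Estimate from the analyte's own data only; undefined with no data."""
    return fmean(measurements) if measurements else None


def partial_pooling_mean(measurements, mu, tau, sigma):
    """Posterior mean of an analyte-specific parameter in a normal-normal model.

    Assumed (hypothetical) setup: parameter ~ Normal(mu, tau) across analytes,
    each measurement ~ Normal(parameter, sigma), with mu, tau, sigma known.
    """
    n = len(measurements)
    if n == 0:
        # With zero measurements the estimate falls back to the population mean.
        return mu
    prior_precision = 1.0 / tau**2
    data_precision = n / sigma**2
    # Precision-weighted average of the analyte's mean and the population mean.
    return (data_precision * fmean(measurements) + prior_precision * mu) / (
        data_precision + prior_precision
    )


# Hypothetical values on a log-retention-factor scale.
mu, tau, sigma = 1.0, 0.5, 0.1
for data in ([], [2.0], [2.0, 2.0, 2.0]):
    print(len(data), no_pooling_mean(data), partial_pooling_mean(data, mu, tau, sigma))
```

With zero measurements the partially pooled estimate equals the population mean, and it moves toward the analyte's own mean as measurements accumulate. This mirrors, in a highly simplified way, the abstract's point about predictions given zero, one, or several measurements.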