Automated Feature Mining for Two-Dimensional Liquid Chromatography Applied to Polymers Enabled by Mass Remainder Analysis.
Stef R A Molenaar, Bram van de Put, Jessica S Desport, Saer Samanipour, Ron A H Peters, Bob W J Pirok
Published in: Analytical chemistry (2022)
A fast algorithm was developed for automated feature mining of synthetic (industrial) homopolymers or perfectly alternating copolymers. The algorithm processes comprehensive two-dimensional liquid chromatography-mass spectrometry (LC × LC-MS) data in four distinct parts. In the first part, the data are reduced by selecting regions of interest. All regions of interest are then clustered in the time and mass-to-charge domains to obtain isotopic distributions, after which single-value clusters and background signals are removed from the data structure. In the second part, the isotopic distributions are used to determine the charge state of the polymeric units, and the charge-state-reduced masses of the units are calculated. In the third part, the mass of the repeating unit (i.e., the monomer) is selected automatically by comparing all mass differences within the data structure. With the mass of the repeating unit known, mass remainder analysis can be performed on the data, which yields groups sharing the same end-group composition. Lastly, information from the clustering step in the first part is combined with the mass remainder analysis to create compositional series, which are mapped onto the chromatogram. Series with similar chromatographic behavior are separated in the mass-remainder domain, whereas series with an overlapping mass remainder are separated in the chromatographic domain. These series were extracted within a calculation time of 3 min. The false positives were then assessed within a reasonable time. The algorithm was verified with LC × LC-MS data of an industrial hexahydrophthalic anhydride-derivatized propylene glycol-terephthalic acid copolyester. A chemical structure was then proposed for each compositional series found in the data.
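The second to fourth parts described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tolerance values, the rounding used to count mass differences, and the toy masses are all assumptions made for illustration. The charge state is estimated from the spacing between isotopic peaks, the repeating-unit mass is taken as the most frequent pairwise mass difference (a crude stand-in for the selection step in the abstract), and the mass remainder is the mass modulo the repeating-unit mass.

```python
from collections import Counter
from itertools import combinations

# Mass difference between 13C and 12C in Da; isotopic peaks of an ion with
# charge z are spaced roughly ISOTOPE_SPACING / z apart on the m/z axis.
ISOTOPE_SPACING = 1.003355


def charge_from_isotope_spacing(mz_peaks):
    """Estimate the charge state from the mean spacing of isotopic peaks."""
    peaks = sorted(mz_peaks)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(ISOTOPE_SPACING / mean_spacing)


def charge_reduced_mass(mz, charge, adduct_mass=0.0):
    """Convert m/z to a charge-reduced mass.

    adduct_mass is a hypothetical parameter: the mass of the charge carrier
    per charge (for example a proton). It defaults to 0 because the abstract
    does not specify the adduct.
    """
    return charge * (mz - adduct_mass)


def most_common_difference(masses, decimals=2):
    """Pick the most frequent pairwise mass difference as the repeat unit.

    Assumption: rounding to `decimals` places is enough to make equal
    differences coincide. Real data would need a proper tolerance.
    """
    diffs = Counter(
        round(abs(a - b), decimals) for a, b in combinations(masses, 2)
    )
    return diffs.most_common(1)[0][0]


def group_by_remainder(masses, repeat_mass, tol=0.01):
    """Group masses whose remainders modulo repeat_mass lie within tol.

    Masses in one group differ only by whole repeating units, so they share
    the same end-group composition. Wrap-around at 0 / repeat_mass is
    ignored in this sketch.
    """
    ordered = sorted(masses, key=lambda m: (m % repeat_mass, m))
    groups = []
    last_remainder = None
    for mass in ordered:
        remainder = mass % repeat_mass
        if last_remainder is None or remainder - last_remainder > tol:
            groups.append([])  # remainder gap exceeds tol: start a new series
        groups[-1].append(mass)
        last_remainder = remainder
    return groups


# Toy data: two series with a hypothetical 100.0 Da repeating unit and
# end-group remainders of 18.0 Da and 42.0 Da.
toy_masses = [218.0, 342.0, 318.0, 242.0, 418.0]
repeat = most_common_difference(toy_masses)
series = group_by_remainder(toy_masses, repeat)
```

In this toy example the two series separate cleanly in the mass-remainder domain. Series with overlapping remainders would additionally need the retention-time information from the clustering step, which this sketch leaves out.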
Keyphrases
- mass spectrometry
- machine learning
- electronic health record
- liquid chromatography
- big data
- deep learning
- healthcare
- simultaneous determination
- artificial intelligence
- heavy metals
- magnetic resonance imaging
- high throughput
- high resolution mass spectrometry
- social media
- health information
- magnetic resonance
- drug release