We develop and discuss a methodology for batch-level analysis of hyperspectral stimulated Raman scattering (hsSRS) data sets of human meibum in the CH-stretching vibrational range. The analysis consists of two steps. The first step uses a training set (n = 19) to determine chemically meaningful reference spectra that jointly constitute a basis set for the sample. This procedure makes use of batch-level vertex component analysis (VCA), followed by unsupervised k-means clustering to express the data set in terms of spectra that represent lipid and protein mixtures in changing proportions. The second step uses a random forest classifier to rapidly classify hsSRS stacks in terms of the pre-determined basis set. The overall procedure allows a rapid quantitative analysis of large hsSRS data sets, enabling a direct comparison among samples using a single set of reference spectra. We apply this procedure to assess 50 specimens of expressed human meibum, rich in both protein and lipid, and show that the batch-level analysis reveals marked variation among samples that potentially correlates with meibum health quality.
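The two-step procedure can be illustrated with a minimal sketch on synthetic data, assuming scikit-learn and NumPy are available. It is not the authors' implementation. The VCA step is not reproduced here: simulated candidate spectra stand in for the endmembers that VCA would extract. The band positions, number of spectral points, number of clusters, noise level, and all variable names are hypothetical, and labelling the training pixels by their nearest reference spectrum is an assumption made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_bands = 40                                  # hypothetical number of spectral points
axis = np.linspace(2800.0, 3050.0, n_bands)   # illustrative wavenumber axis (cm^-1)

def band(center, width):
    """Gaussian band used as a toy spectral signature."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

# Toy "lipid-like" and "protein-like" spectra (illustrative shapes, not measured data)
lipid = band(2850.0, 12.0)
protein = band(2935.0, 15.0)

def simulate_pixels(lipid_fractions, noise=0.02):
    """Return noisy linear mixtures of the two toy spectra, one row per pixel."""
    f = np.clip(np.asarray(lipid_fractions, dtype=float), 0.0, 1.0)[:, None]
    spectra = f * lipid + (1.0 - f) * protein
    return spectra + rng.normal(0.0, noise, spectra.shape)

# Step 1: candidate spectra pooled over 19 training samples (stand-in for VCA output),
# grouped by k-means into a small set of reference spectra.
candidate_fractions = np.repeat([0.1, 0.5, 0.9], 19) + rng.normal(0.0, 0.03, 57)
candidates = simulate_pixels(candidate_fractions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(candidates)
references = kmeans.cluster_centers_          # shape (3, n_bands)

# Step 2: train a random forest on pixels labelled by their nearest reference spectrum
# (assumed labelling scheme), then classify a new stack pixel by pixel.
train_pixels = simulate_pixels(rng.choice([0.1, 0.5, 0.9], size=3000))
train_labels = kmeans.predict(train_pixels)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(train_pixels, train_labels)

height, width = 32, 32
stack = simulate_pixels(rng.choice([0.1, 0.5, 0.9], size=height * width))
stack = stack.reshape(height, width, n_bands)
label_map = forest.predict(stack.reshape(-1, n_bands)).reshape(height, width)

# Fraction of pixels assigned to each reference spectrum in this stack
class_fractions = np.bincount(label_map.ravel(), minlength=3) / label_map.size
print(class_fractions)
```

Because every stack is expressed in terms of the same `references`, the per-stack `class_fractions` vectors can be compared directly across samples, which is the point of using a single batch-level basis set.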