The Impact of Preprocessing Methods for a Successful Prostate Cell Lines Discrimination Using Partial Least Squares Regression and Discriminant Analysis Based on Fourier Transform Infrared Imaging.
Danuta Liberda, Ewa Pięta, Katarzyna Pogoda, Natalia Piergies, Maciej Roman, Paulina Koziol, Tomasz P Wrobel, Czeslawa Paluszkiewicz, Wojciech M Kwiatek
Published in: Cells (2021)
Fourier transform infrared spectroscopy (FT-IR) is widely used to analyze the chemical composition of biological materials and has the potential to reveal new aspects of the molecular basis of diseases, including different types of cancer. In cancer research, this potential lies in the ability of FT-IR to monitor the biochemical status of cells undergoing malignant transformation and to identify, with suitable mathematical approaches, the spectral features that differentiate normal cells from cancerous ones. Such analysis can be performed with chemometric tools such as partial least squares discriminant analysis (PLS-DA) for classification and partial least squares regression (PLSR); applying the preprocessing methods properly and in the correct sequence is crucial for success. Here, we compared several state-of-the-art methods commonly used in infrared biospectroscopy (denoising, baseline correction, and normalization) together with methods not previously used in infrared biospectroscopy classification problems: Mie extinction extended multiplicative signal correction, Eilers' smoothing, and probabilistic quotient normalization. We assessed the effect of all these approaches on the data structure and on classification and regression capability, using experimental FT-IR spectra collected from five prostate cell lines, both normal and cancerous. Additionally, we tested the influence of added spectral noise. Overall, we concluded that, for the data analyzed here, baseline correction had the biggest impact on data structure and on the performance of PLS-DA and PLSR; this preprocessing step therefore deserves particular attention.
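The abstract names probabilistic quotient normalization (PQN) without describing how it was implemented. As an illustration only, the following minimal Python sketch follows the commonly described PQN procedure: build a reference spectrum as the channel-wise median of all spectra, divide each spectrum by the reference, and rescale the spectrum by the median of those quotients. The function name `pqn_normalize`, the toy data, and the omission of any preliminary integral normalization are assumptions made for this example; it is not the authors' code, and it uses only the standard library to stay self-contained.

```python
from statistics import median


def pqn_normalize(spectra, reference=None):
    """Hypothetical helper: PQN for a list of equal-length spectra."""
    if reference is None:
        # Reference spectrum: channel-wise median across all samples.
        reference = [median(channel) for channel in zip(*spectra)]
    normalized = []
    for spectrum in spectra:
        # Quotients are only defined where the reference is non-zero.
        quotients = [x / r for x, r in zip(spectrum, reference) if r != 0]
        # The median quotient is taken as the most probable scaling factor.
        factor = median(quotients)
        normalized.append([x / factor for x in spectrum])
    return normalized


# Toy data (assumed): one band shape measured at three overall intensities.
base = [1.0, 2.0, 4.0, 2.0, 1.0]
spectra = [[scale * x for x in base] for scale in (0.5, 1.0, 2.0)]
normalized = pqn_normalize(spectra)
```

In this toy case the three spectra differ only by a global scale factor, so PQN maps all of them onto the same profile. Real spectra would normally be denoised and baseline-corrected first, and a median quotient of zero would need separate handling.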