Estimation of biological variance in coherent Raman microscopy data of two cell lines using chemometrics.
Rajendhar Junjuri, Matteo Calvarese, Mohammadsadegh Vafaeinezhad, Federico Vernuccio, Marco Ventura, Tobias Meyer-Zedler, Benedetta Gavazzoni, Dario Polli, Renzo Vanna, Italia Bongarzone, Silvia Ghislanzoni, Matteo Negro, Juergen Popp, Thomas Wilhelm Bocklitz
Published in: The Analyst (2024)
Broadband coherent anti-Stokes Raman scattering (BCARS) is a valuable spectroscopic imaging tool for visualizing cellular structures and lipid distributions in biomedical applications. However, unavoidable biological variation in the samples (cells, tissues, or lipids) introduces spectral variations into BCARS data and makes their analysis challenging. In this work, we conducted a systematic study to estimate the biological variance in BCARS data of two cell lines commonly used in biomedical research, HEK293 and HepG2. To evaluate the reproducibility of the results, the BCARS data were acquired with two different experimental setups, one at the Leibniz Institute of Photonic Technology (IPHT) in Jena and one at the Politecnico di Milano (POLIMI) in Milan. In addition, spontaneous Raman data were independently acquired at POLIMI to validate these results. First, the Kramers-Kronig (KK) algorithm was used to retrieve Raman-like signals from the BCARS data, and a pre-processing pipeline was then applied to standardize the data. Principal component analysis combined with linear discriminant analysis (PCA-LDA) was performed using two cross-validation (CV) methods: batch-out CV and 10-fold CV. The analysis was also repeated with different spectral regions of the data as input to the PCA-LDA. Finally, the classification accuracies of the two BCARS datasets were compared with those obtained from the spontaneous Raman data. The results showed that the CH band region (2770-3070 cm⁻¹) and the 1500-1800 cm⁻¹ region contributed significantly to the classification. With 10-fold CV, balanced accuracies of up to 100% were obtained for both BCARS setups, whereas with batch-out CV the balanced accuracy was 92.4% for the IPHT dataset and 98.8% for the POLIMI dataset. This study offers a comprehensive overview of how to estimate biological variance in biomedical applications.
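The KK retrieval step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common time-domain KK approach in which the measured BCARS spectrum is divided by a non-resonant background (NRB) reference, the phase is taken as the Hilbert transform of ½·ln(I_CARS/I_NRB), and the Raman-like signal is sqrt(I_CARS/I_NRB)·sin(phase). The function name `kk_retrieve`, the edge padding, and the synthetic Lorentzian test spectrum are hypothetical choices made for this example.

```python
import numpy as np
from scipy.signal import hilbert


def kk_retrieve(i_cars, i_nrb, pad=None):
    """Return a Raman-like (imaginary) spectrum from a BCARS spectrum.

    Hypothetical helper, assuming:
    i_cars: measured BCARS intensity on an evenly spaced, increasing
            Raman-shift axis.
    i_nrb:  non-resonant background reference on the same axis.
    """
    ratio = np.asarray(i_cars, dtype=float) / np.asarray(i_nrb, dtype=float)
    log_amp = 0.5 * np.log(ratio)  # ln|chi / chi_NR|
    n = log_amp.size
    pad = n if pad is None else pad
    # Edge padding limits the wrap-around error of the FFT-based Hilbert transform.
    padded = np.pad(log_amp, pad, mode="edge")
    # For a causal response the phase is the Hilbert transform of ln|chi|;
    # scipy's hilbert() returns x + i*H{x}, so H{x} is its imaginary part.
    phase = np.imag(hilbert(padded))[pad:pad + n]
    return np.sqrt(ratio) * np.sin(phase)  # approximates Im(chi / chi_NR)


# Synthetic check: one Lorentzian resonance on a flat non-resonant background.
shift = np.linspace(500.0, 3500.0, 3001)       # Raman shift in cm^-1, 1 cm^-1 step
chi_nr = 1.0
chi = chi_nr + 5.0 / (2000.0 - shift - 10.0j)  # resonance at 2000 cm^-1, HWHM 10 cm^-1
i_cars = np.abs(chi) ** 2
i_nrb = np.full_like(shift, chi_nr ** 2)

raman_like = kk_retrieve(i_cars, i_nrb)
true_im = np.imag(chi / chi_nr)                # exact imaginary part, 0.5 at the peak
peak_idx = int(np.argmax(raman_like))
print(shift[peak_idx], round(float(raman_like[peak_idx]), 3))
```

With this sign convention, a resonance of the form A/(Ω - ω - iΓ) yields a positive Raman-like peak at Ω; other conventions flip the sign. On real data the NRB reference never matches the true background exactly, so the retrieved spectrum usually still needs phase and amplitude error correction before classification.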
The insights gained from this analysis hold promise for improving the reliability of BCARS measurements in biomedical applications, paving the way for more accurate and meaningful spectroscopic analyses in the study of biological systems.
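The PCA-LDA evaluation with both cross-validation schemes and region-restricted inputs can be sketched with scikit-learn. Everything specific here is an assumption for illustration: the synthetic spectra (two hypothetical cell lines measured in four batches), L2 vector normalization as a stand-in for the unspecified pre-processing pipeline, 10 principal components, and the helper names `region_mask` and `pca_lda_balanced_accuracy`. Only the two CV schemes, the balanced-accuracy metric, and the CH band (2770-3070 cm⁻¹) and 1500-1800 cm⁻¹ regions come from the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import (LeaveOneGroupOut, StratifiedKFold,
                                     cross_val_predict)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def region_mask(wavenumbers, regions):
    """Boolean mask selecting the (low, high) ranges in cm^-1, bounds inclusive."""
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in regions:
        mask |= (wavenumbers >= low) & (wavenumbers <= high)
    return mask


def pca_lda_balanced_accuracy(X, y, batches, cv_name, n_components=10):
    """Pooled cross-validated balanced accuracy of a PCA-LDA classifier."""
    # PCA and LDA sit in one pipeline, so both are refit on every training fold
    # and the held-out spectra never influence the projection.
    model = make_pipeline(
        Normalizer(norm="l2"),  # assumption: stand-in for the real pre-processing
        PCA(n_components=n_components),
        LinearDiscriminantAnalysis(),
    )
    if cv_name == "batch-out":
        cv, groups = LeaveOneGroupOut(), batches  # hold out one whole batch per fold
    else:
        cv, groups = StratifiedKFold(n_splits=10, shuffle=True, random_state=0), None
    y_pred = cross_val_predict(model, X, y, groups=groups, cv=cv)
    return balanced_accuracy_score(y, y_pred)


# Hypothetical synthetic data: 2 cell lines x 4 batches x 25 spectra.
rng = np.random.default_rng(0)
wn = np.arange(500.0, 3101.0, 5.0)  # Raman shift axis, cm^-1


def band(center, width):
    return np.exp(-0.5 * ((wn - center) / width) ** 2)


X, y, batches = [], [], []
for batch in range(4):
    baseline = rng.normal(0.0, 0.02) * band(2000.0, 800.0)  # batch-specific offset
    for label, (h_1740, h_2930) in enumerate([(0.2, 1.0), (0.4, 1.3)]):
        for _ in range(25):
            X.append(band(1450, 15) + 0.6 * band(1660, 15) + h_1740 * band(1740, 15)
                     + 0.8 * band(2850, 15) + h_2930 * band(2930, 20)
                     + baseline + rng.normal(0.0, 0.03, wn.size))
            y.append(label)
            batches.append(batch)
X, y, batches = np.array(X), np.array(y), np.array(batches)

regions = {
    "full": region_mask(wn, [(500, 3100)]),
    "CH band": region_mask(wn, [(2770, 3070)]),
    "1500-1800": region_mask(wn, [(1500, 1800)]),
}
scores = {
    (name, cv_name): pca_lda_balanced_accuracy(X[:, mask], y, batches, cv_name)
    for name, mask in regions.items()
    for cv_name in ("batch-out", "10-fold")
}
for key, value in scores.items():
    print(key, round(value, 3))
```

Batch-out CV holds out every spectrum of one batch at a time, so it estimates how well the model transfers to an unseen batch, whereas 10-fold CV places spectra from the same batch in both training and test folds. The stricter batch-out scheme is therefore expected to give equal or lower accuracies, which is consistent with the batch-out results reported above being below the 10-fold ones.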