Strategies for data normalization and missing data imputation and consequences for potential diagnostic microRNA biomarkers in epithelial ovarian cancer.
Joanna Lopacinska-JørgensenPatrick H D PetersenDouglas V N P OliveiraClaus K HøgdallEstrid V HøgdallPublished in: PloS one (2023)
MicroRNAs (miRNAs) are small non-coding RNA molecules regulating gene expression with diagnostic potential in different diseases, including epithelial ovarian carcinomas (EOC). As only a few studies have been published on the identification of stable endogenous miRNA in EOC, there is no consensus which miRNAs should be used aiming standardization. Currently, U6-snRNA is widely adopted as a normalization control in RT-qPCR when investigating miRNAs in EOC; despite its variable expression across cancers being reported. Therefore, our goal was to compare different missing data and normalization approaches to investigate their impact on the choice of stable endogenous controls and subsequent survival analysis while performing expression analysis of miRNAs by RT-qPCR in most frequent subtype of EOC: high-grade serous carcinoma (HGSC). 40 miRNAs were included based on their potential as stable endogenous controls or as biomarkers in EOC. Following RNA extraction from formalin-fixed paraffin embedded tissues from 63 HGSC patients, RT-qPCR was performed with a custom panel covering 40 target miRNAs and 8 controls. The raw data was analyzed by applying various strategies regarding choosing stable endogenous controls (geNorm, BestKeeper, NormFinder, the comparative ΔCt method and RefFinder), missing data (single/multiple imputation), and normalization (endogenous miRNA controls, U6-snRNA or global mean). Based on our study, we propose hsa-miR-23a-3p and hsa-miR-193a-5p, but not U6-snRNA as endogenous controls in HGSC patients. Our findings are validated in two external cohorts retrieved from the NCBI Gene Expression Omnibus database. We present that the outcome of stability analysis depends on the histological composition of the cohort, and it might suggest unique pattern of miRNA stability profiles for each subtype of EOC. Moreover, our data demonstrates the challenge of miRNA data analysis by presenting various outcomes from normalization and missing data imputation strategies on survival analysis.
Keyphrases
- gene expression
- data analysis
- electronic health record
- high grade
- big data
- end stage renal disease
- dna methylation
- poor prognosis
- ejection fraction
- computed tomography
- newly diagnosed
- magnetic resonance imaging
- magnetic resonance
- systematic review
- binding protein
- machine learning
- peritoneal dialysis
- climate change
- artificial intelligence
- deep learning
- emergency department
- image quality
- long non coding rna
- pet ct