Discovering Common miRNA Signatures Underlying Female-Specific Cancers via a Machine Learning Approach Driven by the Cancer Hallmark ERBB.
Katia Pane, Mario Zanfardino, Anna Maria Grimaldi, Gustavo Baldassarre, Marco Salvatore, Mariarosaria Incoronato, Monica Franzese. Published in: Biomedicines (2022)
Big data processing, using omics data integration and machine learning (ML) methods, drives efforts to discover diagnostic and prognostic biomarkers for clinical decision making. Previously, we used the TCGA database for gene expression profiling of breast, ovarian, and endometrial cancers and identified a top-scoring network centered on the ERBB2 gene, which plays a crucial role in carcinogenesis in these three estrogen-dependent tumors. Here, we focused on the similarity of microRNA (miRNA) expression signatures, asking whether these miRNAs could target the ERBB family. We applied an ML approach to integrated TCGA miRNA profiling of breast, endometrial, and ovarian cancers to identify common miRNA signatures differentiating tumor and normal conditions. Using the ML-based algorithm and the miRTarBase database, we found 205 features and 158 miRNAs targeting ERBB isoforms, respectively. By merging the results of both analyses and ranking each feature according to the weighted Support Vector Machine model, we prioritized 42 features, with accuracy (0.98), AUC (0.93; 95% CI 0.917-0.94), sensitivity (0.85), and specificity (0.99), indicating their diagnostic capability to discriminate between the two conditions. In vitro validation by qRT-PCR experiments, using model and parental cell lines for each tumor type, showed that five miRNAs (hsa-mir-323a-3p, hsa-mir-323b-3p, hsa-mir-331-3p, hsa-mir-381-3p, and hsa-mir-1301-3p) had concordant expression trends across breast, ovarian, and endometrial cancer cell lines compared with normal lines, confirming our in silico predictions. This shows that an integrated computational approach combined with biological knowledge could identify expression signatures as potential diagnostic biomarkers common to multiple tumors.
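The prioritization and evaluation steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `prioritize` and `diagnostic_metrics`, the placeholder feature names, and all weights and labels are hypothetical toy values. The sketch assumes that each ML-selected feature carries a single model weight, that "merging" means keeping features present in both the ML-selected set and the ERBB-targeting set, and that ranking is by absolute weight.

```python
from typing import Dict, List, Set, Tuple


def prioritize(weights: Dict[str, float],
               erbb_targeting: Set[str]) -> List[Tuple[str, float]]:
    """Keep features found in both sets, ranked by absolute weight (largest first).

    `weights` maps each ML-selected feature to a model weight (assumed to come
    from something like a linear SVM); `erbb_targeting` is the set of miRNAs
    reported to target ERBB genes (assumed to come from a target database).
    """
    shared = [(name, w) for name, w in weights.items() if name in erbb_targeting]
    return sorted(shared, key=lambda item: abs(item[1]), reverse=True)


def diagnostic_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels (1 = tumor, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy usage with placeholder names and made-up numbers (not study data).
toy_weights = {"miR-A": 0.8, "miR-B": -1.2, "miR-C": 0.1, "miR-D": 0.5}
toy_targets = {"miR-A", "miR-B", "miR-D", "miR-E"}
print(prioritize(toy_weights, toy_targets))
print(diagnostic_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

Ranking by absolute weight treats features that push strongly toward either class as equally informative. The AUC and its confidence interval are omitted here because they need continuous decision scores rather than hard class labels.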
Keyphrases
- machine learning
- big data
- genome wide
- artificial intelligence
- tyrosine kinase
- deep learning
- papillary thyroid
- poor prognosis
- copy number
- decision making
- dna methylation
- squamous cell
- single cell
- genome wide identification
- healthcare
- magnetic resonance
- adverse drug
- lymph node metastasis
- magnetic resonance imaging
- binding protein
- computed tomography
- emergency department
- squamous cell carcinoma
- quality improvement
- electronic health record
- contrast enhanced
- drug delivery
- gene expression
- climate change
- transcription factor
- risk assessment
- cancer therapy