Identifying Biomarkers Using Support Vector Machine to Understand the Racial Disparity in Triple-Negative Breast Cancer.
Bikram Sahoo, Zandra Pinnix, Seth Sims, Alexander Zelikovsky
Published in: Journal of computational biology : a journal of computational molecular cell biology (2023)
Triple-negative breast cancer (TNBC) is an aggressive type of breast cancer with heterogeneous tumor biology and a poor clinical outcome. Because TNBC tumors lack estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (HER2), fewer treatment options are available in the clinic. The incidence of TNBC is higher, and clinical outcomes are worse, in African American (AA) women than in European American (EA) women. The significant factors responsible for this racial disparity in TNBC are socioeconomic lifestyle and tumor biology. The current study used open-source gene expression data from TNBC samples annotated with racial information. We applied a state-of-the-art Support Vector Machine (SVM) classifier with recursive feature elimination to the gene expression data to identify significant biomarkers deregulated between AA women and EA women. We also included Spearman's rho and Ward's linkage method in our feature selection workflow. Our proposed method yields 24 features/genes that classify the AA and EA samples with 98% accuracy. We also performed Kaplan-Meier analysis and the log-rank test on the 24 features/genes. Of the 24 genes, we discuss only 2, KLK10 and LRRC37A2, focusing on the correlation between their deregulated expression and cancer progression with a poor survival rate. We believe that further improvement of our method with a larger number of RNA-seq gene expression samples will provide more accurate insight into racial disparity in TNBC.
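The abstract mentions Kaplan-Meier analysis and the log-rank test but gives no implementation details. The following is a minimal standard-library sketch of both techniques, not the authors' code. The function names `kaplan_meier` and `logrank_test`, the split into "high" and "low" expression groups, and all follow-up times are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival, curve = 1.0, []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value), 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d_b = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical follow-up times (months) and event flags for two groups
# split by expression of a single gene; these are NOT data from the study.
high_times, high_events = [5, 8, 12, 14, 20], [1, 1, 1, 0, 1]
low_times, low_events = [10, 18, 25, 30, 34], [1, 0, 1, 0, 0]

print(kaplan_meier(high_times, high_events))
print(logrank_test(high_times, high_events, low_times, low_events))
```

In a workflow like the one described, each selected gene would be used to split patients into two groups, and a small log-rank p-value would flag genes whose expression level is associated with a difference in survival. For real analyses, a maintained survival-analysis library is preferable to this hand-rolled version.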
Keyphrases
- african american
- gene expression
- genome wide
- deep learning
- dna methylation
- rna seq
- epidermal growth factor receptor
- electronic health record
- polycystic ovary syndrome
- machine learning
- papillary thyroid
- single cell
- bioinformatics analysis
- big data
- genome wide identification
- pregnancy outcomes
- breast cancer risk
- endothelial cells
- primary care
- metabolic syndrome
- squamous cell
- genome wide analysis
- healthcare
- squamous cell carcinoma
- data analysis
- artificial intelligence
- adipose tissue
- pregnant women
- induced pluripotent stem cells
- young adults
- health information
- hiv infected
- binding protein
- neural network