Discovering predisposing genes for hereditary breast cancer using deep learning.
Gal Passi, Sari Lieberman, Fouad Zahdeh, Omer Murik, Paul Renbaum, Rachel Beeri, Michal Linial, Dalit May, Ephrat Levy-Lahad, Dina Schneidman-Duhovny
Published in: Briefings in Bioinformatics (2024)
Breast cancer (BC) is the most common malignancy affecting Western women today. It is estimated that as many as 10% of BC cases can be attributed to germline variants. However, the genetic basis of the majority of familial BC cases has yet to be identified. Discovering predisposing genes contributing to familial BC is challenging due to their presumed rarity, low penetrance, and complex biological mechanisms. Here, we focused on an analysis of rare missense variants in a cohort of 12 families of Middle Eastern origin characterized by a high incidence of BC cases. We devised a novel, high-throughput variant analysis pipeline adapted for family studies, which analyzes variants at the protein level by employing state-of-the-art machine learning models and three-dimensional protein structural analysis. Using our pipeline, we analyzed 1218 rare missense variants that are shared between affected family members and classified 80 genes as candidate pathogenic genes. Among these genes, we found significant functional enrichment in peroxisomal and mitochondrial biological pathways, which segregated across seven families in the study and covered diverse ethnic groups. We present multiple lines of evidence that peroxisomal and mitochondrial pathways play an important, yet underappreciated, role in both germline BC predisposition and BC survival.
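The abstract does not spell out how variants are selected, so the following is only a minimal sketch of what a first filtering step for "rare missense variants shared between affected family members" might look like. Everything in it is an assumption made for illustration: the `Variant` record, the `shared_rare_missense` helper, the 1% allele-frequency cutoff, the rule that a variant must be carried by every affected member, and the placeholder gene names. It does not reproduce the authors' pipeline, which additionally relies on machine learning models and protein structural analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    gene: str
    change: str         # protein-level change, e.g. "p.Arg12Cys"
    consequence: str    # e.g. "missense" or "synonymous"
    allele_freq: float  # population allele frequency


def shared_rare_missense(carriers, affected, max_af=0.01):
    """Return rare missense variants carried by every affected family member.

    carriers: dict mapping Variant -> set of family-member IDs carrying it
    affected: set of IDs of the affected members of one family
    max_af:   assumed rarity cutoff (not taken from the paper)
    """
    return sorted(
        (
            v for v, ids in carriers.items()
            if v.consequence == "missense"
            and v.allele_freq <= max_af
            and affected <= ids  # all affected members carry the variant
        ),
        key=lambda v: (v.gene, v.change),
    )


# Toy data with placeholder gene names (not real results).
toy_carriers = {
    Variant("GENE_A", "p.Arg12Cys", "missense", 0.0004): {"m1", "m2", "m3"},
    Variant("GENE_B", "p.Gly45Asp", "missense", 0.2): {"m1", "m2"},    # too common
    Variant("GENE_C", "p.Leu7Pro", "missense", 0.001): {"m1"},         # not shared
    Variant("GENE_D", "p.Ser3=", "synonymous", 0.0001): {"m1", "m2"},  # not missense
}
toy_affected = {"m1", "m2"}

candidates = shared_rare_missense(toy_carriers, toy_affected)
print([v.gene for v in candidates])
```

In this toy family only the `GENE_A` variant passes all three filters. In a real family study the surviving variants would then go on to protein-level scoring rather than being treated as candidates directly.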
Keyphrases
- copy number
- genome wide
- machine learning
- deep learning
- high throughput
- bioinformatics analysis
- genome wide identification
- oxidative stress
- south africa
- dna methylation
- intellectual disability
- artificial intelligence
- dna repair
- early onset
- type diabetes
- risk factors
- insulin resistance
- metabolic syndrome
- genome wide analysis
- autism spectrum disorder
- polycystic ovary syndrome
- protein protein
- skeletal muscle
- young adults
- dna damage
- convolutional neural network