Sequence-to-expression approach to identify etiological non-coding DNA variations in P53 and cMYC-driven diseases.
Katherine Kin, Shounak Bhogale, Lisha Zhu, Derrick Thomas, Jessica Bertol, W Jim Zheng, Saurabh Sinha, Walid D Fakhouri
Published in: Human Molecular Genetics (2024)
Disease risk prediction based on genomic sequence and transcriptional profile can improve disease screening and prevention. Although many disease-associated DNA variants have been identified, the ability to distinguish deleterious non-coding DNA variations remains poor for most common diseases. In this study, we designed in vitro experiments to uncover the significance of occupancy and competitive binding between P53 and cMYC on common target genes. Analysis of publicly available ChIP-seq data for P53 and cMYC in embryonic stem cells showed that ~344-366 regions are co-occupied, and on average two cis-overlapping motifs (CisOMs) per region were identified, suggesting that co-occupancy is evolutionarily conserved. Using U2OS and Raji cells, either untreated or treated with doxorubicin to increase P53 protein level while potentially reducing cMYC level, ChIP-seq analysis showed that around 16 to 922 genomic regions were co-occupied by P53 and cMYC, and substitutions of cMYC signals by P53 were detected after doxorubicin treatment. Around 187 expressed genes near co-occupied regions were altered at the mRNA level according to RNA-seq data analysis. We used a computational motif-matching approach to show that changes in predicted P53 binding affinity in CisOMs of co-occupied elements significantly correlate with alterations in reporter gene expression. We performed a similar analysis using SNPs mapped in CisOMs for P53 and cMYC from ChIP-seq data and the expression of target genes from the GTEx portal, and found a significant correlation between the change in cMYC-motif binding affinity in CisOMs and altered expression. Our study brings us closer to developing a generally applicable approach to filter etiological non-coding variations associated with common diseases.
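The motif-matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the position weight matrix below is a toy E-box-like matrix (consensus CACGTG) with made-up probabilities, the uniform background, the best-window scoring rule, and all function names are assumptions chosen for illustration. The sketch scores a reference and a variant sequence against the matrix and reports the change in predicted binding score; a plain Pearson correlation helper is included for relating such changes to expression changes.

```python
import math

BASES = "ACGT"

# Toy E-box-like matrix (consensus CACGTG); probabilities are illustrative only,
# not a published P53 or cMYC motif. Each row is one motif position: P(A, C, G, T).
TOY_PROBS = [
    [0.05, 0.85, 0.05, 0.05],  # C
    [0.85, 0.05, 0.05, 0.05],  # A
    [0.05, 0.85, 0.05, 0.05],  # C
    [0.05, 0.05, 0.85, 0.05],  # G
    [0.05, 0.05, 0.05, 0.85],  # T
    [0.05, 0.05, 0.85, 0.05],  # G
]


def log_odds_pwm(prob_rows, background=0.25):
    """Convert per-position base probabilities to log2-odds against a uniform background."""
    return [
        {base: math.log2(p / background) for base, p in zip(BASES, row)}
        for row in prob_rows
    ]


def best_score(seq, pwm):
    """Return the highest log-odds score over all motif-width windows of seq."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    )


def delta_affinity(ref_seq, pos, alt_base, pwm):
    """Change in best motif score (alt minus ref) for a substitution at 0-based pos."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    pwm = log_odds_pwm(TOY_PROBS)
    # Hypothetical sequence with a C>T substitution inside the CACGTG site.
    print(delta_affinity("TTCACGTGAA", 4, "T", pwm))
```

In this sketch a negative value means the substitution weakens the best predicted site. Given a list of such score changes for a set of variants and a matching list of measured expression changes, `pearson` could be used to check whether the two are correlated; the abstract does not state which correlation statistic the study used.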
Keyphrases
- rna seq
- genome wide
- single cell
- binding protein
- data analysis
- gene expression
- poor prognosis
- copy number
- dna methylation
- circulating tumor
- high throughput
- circulating tumor cells
- cell free
- single molecule
- transcription factor
- genome wide identification
- induced apoptosis
- amino acid
- big data
- electronic health record
- nucleic acid
- small molecule
- cell proliferation
- mass spectrometry
- heat shock protein
- artificial intelligence
- machine learning
- pi3k akt