EndoQuad: a comprehensive genome-wide experimentally validated endogenous G-quadruplex database.
Sheng Hu Qian, Meng-Wei Shi, Yu-Li Xiong, Yuan Zhang, Ze-Hao Zhang, Xue-Mei Song, Xin-Yin Deng, Zhen-Xia Chen
Published in: Nucleic Acids Research (2023)
G-quadruplexes (G4s) are non-canonical four-stranded structures that are emerging as novel genetic regulatory elements. However, a comprehensive genomic annotation of endogenous G4s (eG4s) and a systematic characterization of their regulatory network are still lacking, posing major challenges for eG4 research. Here, we present EndoQuad (https://EndoQuad.chenzxlab.cn/) to address these issues by integrating high-throughput experimental data. First, based on high-quality genome-wide eG4 mapping datasets (human: 1181; mouse: 24; chicken: 2) generated by G4 ChIP-seq/CUT&Tag, we generate a reference set of genome-wide eG4s. Second, our multi-omics analyses show that most eG4s are identified in only one or a few cell types. The eG4s that occur in more samples are more structurally stable, more evolutionarily conserved and enriched in promoter regions; they mark highly expressed genes and associate with complex regulatory programs, indicating a higher confidence level for further experiments. Finally, we integrate millions of functional genomic variants and prioritize eG4s with regulatory functions in disease and cancer contexts. These efforts have culminated in a comprehensive and interactive database of experimentally validated DNA eG4s. EndoQuad enables users to easily access, download and repurpose these data for their own research. We expect EndoQuad to become a one-stop resource for eG4 research and to lay the foundation for future functional studies.
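The idea of a reference set annotated with per-region occurrence across samples can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `merge_peaks`, the sample names, the coordinates and the simple "merge any overlapping peaks" rule are all assumptions made for illustration. Peaks are treated as half-open intervals, and each merged region reports how many distinct samples support it.

```python
from collections import defaultdict


def merge_peaks(samples):
    """Merge overlapping peaks across samples (hypothetical helper).

    samples: dict mapping sample name -> list of (chrom, start, end),
             with half-open coordinates [start, end).
    Returns a list of (chrom, start, end, n_samples), sorted by position,
    where n_samples is the number of distinct samples with a peak
    overlapping the merged region.
    """
    by_chrom = defaultdict(list)
    for sample, peaks in samples.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, sample))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_samples = set()
        for start, end, sample in sorted(by_chrom[chrom]):
            if cur_start is not None and start < cur_end:
                # Overlaps the open region: extend it and record the sample.
                cur_end = max(cur_end, end)
                cur_samples.add(sample)
            else:
                # No overlap: close the previous region, start a new one.
                if cur_start is not None:
                    merged.append((chrom, cur_start, cur_end, len(cur_samples)))
                cur_start, cur_end, cur_samples = start, end, {sample}
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end, len(cur_samples)))
    return merged


# Toy input (invented values, not data from EndoQuad).
samples = {
    "cellA": [("chr1", 100, 200), ("chr1", 500, 600)],
    "cellB": [("chr1", 150, 250)],
    "cellC": [("chr1", 180, 220), ("chr2", 10, 50)],
}

reference = merge_peaks(samples)
for chrom, start, end, n in reference:
    print(chrom, start, end, f"{n}/{len(samples)} samples")
```

In this toy input, the region supported by all three samples would rank above the two single-sample regions. Any cutoff for turning occurrence counts into confidence levels would be a separate design decision that the abstract does not specify.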
Keyphrases
- genome wide
- copy number
- dna methylation
- transcription factor
- high throughput
- single cell
- rna seq
- electronic health record
- high resolution
- gene expression
- big data
- adverse drug
- public health
- stem cells
- machine learning
- circulating tumor
- papillary thyroid
- emergency department
- mesenchymal stem cells
- single molecule
- current status
- bone marrow
- circulating tumor cells
- induced pluripotent stem cells
- squamous cell
- case control
- genome wide identification