Dissecting endogenous genetic circuits from first principles.
Rosalind Wenshan Pan, Tom Röschinger, Kian Faizi, Rob Phillips
Published in: bioRxiv : the preprint server for biology (2024)
With the rapid advancement of sequencing technology, there has been an exponential increase in the amount of data on the genomic sequences of diverse organisms. Nevertheless, deciphering the sequence-phenotype mapping of the genomic data remains a formidable task, especially when dealing with non-coding sequences such as the promoter and its allied regulatory architecture. In current databases, annotations of transcription factor binding sites are sorely lacking, which makes it difficult to develop a systematic theory of transcriptional regulation. To address this gap in knowledge, high-throughput methods such as massively parallel reporter assays (MPRAs) have been employed to decipher the regulatory genome. In this work, we use thermodynamic models to computationally simulate MPRAs in the context of transcriptional regulation studies and produce synthetic MPRA datasets. We examine how well typical MPRA experimental and data analysis procedures recover common regulatory architectures under different sets of experimental and biological parameters. By establishing a dialogue between high-throughput experiments and a physical theory of transcription, our efforts serve both to improve current experimental procedures and to enhance our broader understanding of the sequence-function landscape of regulatory sequences.
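The idea of simulating an MPRA with a thermodynamic model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest case of a constitutive (unregulated) promoter, an additive energy matrix with randomly drawn mismatch penalties, and illustrative values for the RNA polymerase copy number, the wild-type binding energy (in units of kT), and the number of non-specific binding sites. All function and parameter names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def binding_energy(seq, energy_matrix, wt_energy):
    """Additive model: wild-type energy plus a per-position penalty (kT)."""
    return wt_energy + sum(energy_matrix[i][b] for i, b in enumerate(seq))


def p_bound(energy_kt, copies, n_nonspecific=4.6e6):
    """Probability that RNA polymerase occupies a constitutive promoter.

    `n_nonspecific` is an assumed number of non-specific sites
    (roughly the length of the E. coli genome in base pairs).
    """
    weight = copies / n_nonspecific * math.exp(-energy_kt)
    return weight / (1.0 + weight)


def simulate_library(wt_seq, n_variants=1000, rate=0.1, seed=0):
    """Return (variant, predicted occupancy) pairs for a mutant library."""
    rng = random.Random(seed)
    # Hypothetical energy matrix: the wild-type base costs 0 kT and
    # every mismatch costs a random penalty between 0 and 2 kT.
    matrix = [
        {b: (0.0 if b == wt else rng.uniform(0.0, 2.0)) for b in BASES}
        for wt in wt_seq
    ]
    library = []
    for _ in range(n_variants):
        variant = mutate(wt_seq, rate, rng)
        energy = binding_energy(variant, matrix, wt_energy=-5.0)
        library.append((variant, p_bound(energy, copies=5000)))
    return library
```

Calling `simulate_library` on a promoter sequence gives a synthetic dataset of sequence variants paired with predicted occupancies, which could then be passed through a typical MPRA analysis to see how well the binding site is recovered. In this sketch the occupancy stands in for expression; noise, sequencing depth, and regulation by transcription factors are left out.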
Keyphrases
- transcription factor
- high throughput
- data analysis
- single cell
- DNA binding
- copy number
- big data
- genome wide
- gene expression
- high resolution
- DNA methylation
- machine learning
- amino acid
- artificial intelligence
- CRISPR-Cas