Deciphering regulatory architectures from synthetic single-cell expression patterns.
Rosalind Wenshan Pan, Tom Röschinger, Kian Faizi, Hernan G Garcia, Rob Phillips
Published in: bioRxiv : the preprint server for biology (2024)
For the vast majority of genes in sequenced genomes, there is limited understanding of how they are regulated. Without such knowledge, it is not possible to perform a quantitative theory-experiment dialogue on how such genes give rise to physiological and evolutionary adaptation. One category of high-throughput experiments used to understand the sequence-phenotype relationship of the transcriptome is massively parallel reporter assays (MPRAs). However, to improve the versatility and scalability of MPRA pipelines, we need a "theory of the experiment" to help us better understand the impact of various biological and experimental parameters on the interpretation of experimental data. These parameters include binding site copy number, where a large number of specific binding sites may titrate away transcription factors, as well as the presence of overlapping binding sites, which may affect analysis of the degree of mutual dependence between mutations in the regulatory region and expression levels. To that end, in this paper we create tens of thousands of synthetic single-cell gene expression outputs using both equilibrium and out-of-equilibrium models. These models make it possible to imitate the summary statistics (information footprints and expression shift matrices) used to characterize the output of MPRAs and, from these summary statistics, to infer the underlying regulatory architecture. Specifically, we use a more refined implementation of the so-called thermodynamic models in which the binding energies of each sequence variant are derived from energy matrices. Our simulations reveal important effects of the parameters on MPRA data and we demonstrate our ability to optimize MPRA experimental designs with the goal of generating thermodynamic models of the transcriptome with base-pair specificity.
Further, this approach makes it possible to carefully examine the mapping between mutations in binding sites and their corresponding expression profiles, a tool useful not only for better designing MPRAs, but also for exploring regulatory evolution.
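To make the pipeline described above more concrete, the following is a minimal Python sketch of one way such a synthetic dataset could be built. It is not the authors' implementation. It assumes a simple-repression architecture (one RNA polymerase site and one repressor site), an equilibrium thermodynamic model in which binding energies (in units of k_B T) are summed from an energy matrix, and an information footprint defined as the mutual information between "position mutated or not" and "expression above or below the median". The sequences, energy values, copy numbers, mutation rate, and every function name here are hypothetical placeholders chosen for illustration only.

```python
import math
import random

BASES = "ACGT"

# Hypothetical wild-type sequences, chosen only for illustration.
RNAP_SITE = "TTGACATAAT"
SPACER = "GCGCG"  # assumed to bind nothing
REP_SITE = "AATTGTGAGC"
WILD_TYPE = RNAP_SITE + SPACER + REP_SITE


def toy_matrix(site, wt_energy, penalty):
    """Hypothetical energy matrix (k_B T): the wild-type site sums to
    wt_energy, and every mismatch adds a fixed penalty."""
    per_pos = wt_energy / len(site)
    return [{b: per_pos + (0.0 if b == wt else penalty) for b in BASES}
            for wt in site]


def binding_energy(seq, matrix):
    """Sum the per-position energy contributions for one site sequence."""
    return sum(row[base] for row, base in zip(matrix, seq))


def p_bound(e_rnap, e_rep, n_rnap=1000, n_rep=10, n_ns=4.6e6):
    """Equilibrium probability that RNAP is bound under simple repression.
    Copy numbers and the number of non-specific sites are assumed values."""
    w_p = n_rnap / n_ns * math.exp(-e_rnap)
    w_r = n_rep / n_ns * math.exp(-e_rep)
    return w_p / (1.0 + w_p + w_r)


def mutate(seq, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    return "".join(
        rng.choice([b for b in BASES if b != c]) if rng.random() < rate else c
        for c in seq)


def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length 0/1 lists."""
    n = len(xs)
    joint = {}
    for pair in zip(xs, ys):
        joint[pair] = joint.get(pair, 0) + 1
    px = {v: xs.count(v) / n for v in (0, 1)}
    py = {v: ys.count(v) / n for v in (0, 1)}
    return sum(c / n * math.log2((c / n) / (px[x] * py[y]))
               for (x, y), c in joint.items())


def simulate(n_variants=5000, rate=0.1, seed=0):
    """Build a mutant library, score it, and return a per-base footprint."""
    rng = random.Random(seed)
    rnap_matrix = toy_matrix(RNAP_SITE, -5.0, 1.0)
    rep_matrix = toy_matrix(REP_SITE, -15.0, 2.0)
    r0, r1 = len(RNAP_SITE), len(RNAP_SITE) + len(SPACER)
    variants = [mutate(WILD_TYPE, rate, rng) for _ in range(n_variants)]
    # Noise-free expression proxy: probability that RNAP is bound.
    expression = [p_bound(binding_energy(v[:r0], rnap_matrix),
                          binding_energy(v[r1:], rep_matrix))
                  for v in variants]
    median = sorted(expression)[n_variants // 2]
    high = [1 if e > median else 0 for e in expression]
    return [mutual_information([1 if v[i] != wt else 0 for v in variants], high)
            for i, wt in enumerate(WILD_TYPE)]


footprint = simulate()
for pos, bits in enumerate(footprint):
    print(pos, WILD_TYPE[pos], round(bits, 4))
```

In this toy setup, positions inside the two assumed binding sites should carry noticeably more information than the spacer, which is the qualitative signature an information footprint is meant to expose. The sketch omits experimental noise, binding-site titration, overlapping sites, and out-of-equilibrium kinetics, all of which the abstract names as effects the full simulations explore.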
Keyphrases
- single cell
- genome wide
- transcription factor
- high throughput
- rna seq
- copy number
- poor prognosis
- gene expression
- dna methylation
- mitochondrial dna
- molecular dynamics
- genome wide identification
- healthcare
- binding protein
- dna binding
- high resolution
- primary care
- long non coding rna
- aqueous solution
- crispr cas
- social media
- quality improvement
- mass spectrometry
- machine learning
- amino acid
- genome wide analysis