Deciphering regulatory architectures from synthetic single-cell expression patterns.
Rosalind Wenshan PanTom RöschingerKian FaiziHernan G GarciaRob PhillipsPublished in: bioRxiv : the preprint server for biology (2024)
With the rapid advancement of sequencing technology, there has been an exponential increase in the amount of data on the genomic sequences of diverse organisms. Nevertheless, deciphering the sequence-phenotype mapping of the genomic data remains a formidable task, especially when dealing with non-coding sequences such as the promoter. In current databases, annotations on transcription factor binding sites are sorely lacking, which creates a challenge for developing a systematic theory of transcriptional regulation. To address this gap in knowledge, high-throughput methods such as massively parallel reporter assays (MPRAs) have been employed to decipher the regulatory genome. In this work, we make use of thermodynamic models to computationally simulate MPRAs in the context of transcriptional regulation and produce thousands of synthetic MPRA datasets. We examine how well typical experimental and data analysis procedures of MPRAs are able to recover common regulatory architectures under different sets of experimental and biological parameters. By establishing a dialogue between high-throughput experiments and a physical theory of transcription, our efforts serve to both improve current experimental procedures and enhancing our broader understanding of the sequence-function landscape of regulatory sequences.