Predicting gene expression responses to environment in Arabidopsis thaliana using natural variation in DNA sequence.
Margarita Takou, Emily S Bellis, Jesse R Lasky
Published in: bioRxiv: the preprint server for biology (2024)
The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype-to-phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, let alone for how these factors respond to the environment. We tested whether flexible machine learning models could learn part of the underlying cis-regulatory genotype-to-phenotype map, using cold-responsive transcriptome profiles from 5 diverse Arabidopsis thaliana accessions. We first tested for evidence that cis regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the upstream and downstream regions, respectively, of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn de novo cis-regulatory motifs in DNA sequences, to predict expression response to environment. CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by the biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted from variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.
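To make the idea of a CNN "learning cis-regulatory motifs" concrete, the following is a minimal sketch of what a single first-layer convolutional filter computes on one-hot encoded DNA. It is not the authors' model: the function names, the example sequence, and the hand-set filter (which rewards the 5-mer CCGAC, chosen only for illustration) are all assumptions, and a real CNN would learn its filter weights from data rather than have them set by hand.

```python
# Illustrative only: one convolutional filter scanning one-hot encoded DNA.
# All names, the sequence, and the filter weights are hypothetical.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def kmer_kernel(kmer):
    """Build a hand-set filter: +1 for the expected base, -1 for any other."""
    return [[1.0 if b == base else -1.0 for b in BASES] for base in kmer.upper()]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; one score per start position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores


def max_pool_relu(scores):
    """ReLU followed by global max pooling: the strongest match in the sequence."""
    return max([max(0.0, s) for s in scores], default=0.0)


if __name__ == "__main__":
    promoter = "ATGGCCGACATTA"          # made-up sequence containing CCGAC
    kernel = kmer_kernel("CCGAC")       # hypothetical motif filter
    scores = conv1d_scan(one_hot(promoter), kernel)
    print(scores.index(max(scores)), max_pool_relu(scores))  # prints: 4 5.0
```

In a trained network, many such filters run in parallel and their pooled activations feed later layers that output the predicted expression response; the sketch shows only the motif-scanning step.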
Keyphrases
- transcription factor
- gene expression
- arabidopsis thaliana
- machine learning
- dna methylation
- circulating tumor
- poor prognosis
- convolutional neural network
- dna binding
- single molecule
- genome wide
- genome wide identification
- cancer therapy
- deep learning
- high intensity
- nucleic acid
- drug delivery
- single cell
- artificial intelligence