In Silico Verification of Predicted Potential Promoter Sequences in the Rice (Oryza sativa) Genome.
Anastasiya N Bubnova, Irina V Yakovleva, Eugene V Korotkov, Anastasiya M Kamionskaya
Published in: Plants (Basel, Switzerland) (2023)
The exact identification of promoter sequences remains a serious problem in computational biology, as promoter prediction algorithms under development continue to produce false-positive results. Therefore, to fully assess the validity of predicted sequences, it is necessary to comprehensively test their properties, such as the presence of transcribed DNA regions downstream of them or chromatin accessibility for transcription factor binding. In this paper, we examined promoter sequences on chromosome 1 of the rice (Oryza sativa) genome taken from the Database of Potential Promoter Sequences, which were predicted using a mathematical algorithm based on the derivation and calculation of statistically significant promoter classes. We identified TATA motifs and cis-regulatory elements in the predicted promoter sequences. We also verified the presence of potential transcription start sites near the predicted promoters by analyzing CAGE-seq data, searched for unannotated transcripts downstream of the predicted sequences by de novo transcript assembly from RNA-seq data, and examined chromatin accessibility in the region of the predicted promoters by analyzing ATAC-seq data. As a result, we identified the predicted sequences most likely to be promoters, which are candidates for further experimental validation in an in vivo or in vitro system.
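As an illustration of the TATA-motif step mentioned in the abstract, the following is a minimal sketch of scanning a candidate promoter sequence for TATA-box-like motifs. It is not the authors' method, which the abstract does not specify. The consensus pattern TATAWAW (W = A or T) is an assumption chosen for illustration, and the function name `find_tata_motifs` and the example sequence are hypothetical.

```python
import re

# Assumed consensus for illustration only: TATAWAW, where W is A or T.
# The lookahead wrapper lets the scan report overlapping hits as well.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_motifs(sequence):
    """Return (0-based start, matched motif) for each TATA-box-like hit."""
    seq = sequence.upper()  # accept lower-case (soft-masked) input
    return [(m.start(), m.group(1)) for m in TATA_BOX.finditer(seq)]


# Hypothetical sequence, not taken from the rice genome.
example = "GCGCTATAAATAGGCCTATATATACC"
for start, motif in find_tata_motifs(example):
    print(start, motif)
```

A scan like this only checks the forward strand; a fuller check would also scan the reverse complement and restrict hits to a window upstream of the candidate transcription start site.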
Keyphrases
- transcription factor
- DNA methylation
- RNA-seq
- genome wide
- gene expression
- DNA binding
- single cell
- machine learning
- electronic health record
- big data
- genetic diversity
- genome wide identification
- risk assessment
- emergency department
- molecular docking
- cell free
- molecular dynamics simulations
- molecular dynamics
- single molecule