TRACE: transcription factor footprinting using chromatin accessibility data and DNA sequence.
Ningxin Ouyang, Alan P Boyle
Published in: Genome Research (2020)
Transcription is tightly regulated by cis-regulatory DNA elements where transcription factors (TFs) can bind. Thus, identification of TF binding sites (TFBSs) is key to understanding gene expression and whole regulatory networks within a cell. The standard approaches for TFBS prediction, position weight matrices (PWMs) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), have drawbacks: high false-positive rates and limited antibody availability, respectively. Several computational footprinting algorithms have been developed to detect TFBSs by investigating chromatin accessibility patterns; however, these also have limitations. We have developed TRACE, a footprinting method that predicts TF footprints in active chromatin elements, to improve TFBS prediction. TRACE incorporates DNase-seq data and PWMs within a multivariate hidden Markov model (HMM) to detect footprint-like regions with matching motifs. TRACE is an unsupervised method that annotates binding sites for specific TFs automatically, with no requirement for pregenerated candidate binding sites or ChIP-seq training data. Compared with published footprinting algorithms, TRACE has the best overall performance, with the distinct advantage of targeting multiple motifs in a single model.
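To make the idea of a multivariate HMM over accessibility and motif signals concrete, here is a minimal toy sketch in Python. It is not the TRACE model: the three states, the Gaussian emission parameters, the transition probabilities, and the example tracks are all assumptions chosen for illustration. Each position carries two observations (a DNase cut count and a precomputed PWM score), and Viterbi decoding labels a position as a footprint when cuts are depleted and the motif score is high.

```python
import math

STATES = ["background", "flank", "footprint"]

# Hypothetical (mean, std) per state for each observed track.
# "flank" = accessible DNA with many cuts; "footprint" = protected, motif-like.
EMISSIONS = {
    "background": {"cuts": (1.0, 1.0), "motif": (-5.0, 3.0)},
    "flank": {"cuts": (10.0, 3.0), "motif": (-5.0, 3.0)},
    "footprint": {"cuts": (2.0, 1.5), "motif": (6.0, 2.0)},
}

# Hypothetical transition probabilities; a footprint is only reachable from a flank.
TRANSITIONS = {
    "background": {"background": 0.90, "flank": 0.10, "footprint": 0.00},
    "flank": {"background": 0.05, "flank": 0.80, "footprint": 0.15},
    "footprint": {"background": 0.00, "flank": 0.20, "footprint": 0.80},
}
START = {"background": 1.0, "flank": 0.0, "footprint": 0.0}


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_gauss(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def log_emission(state, cut, score):
    """Joint log density of both tracks, assumed independent given the state."""
    params = EMISSIONS[state]
    return log_gauss(cut, *params["cuts"]) + log_gauss(score, *params["motif"])


def viterbi(cuts, motif):
    """Return the most likely state path for the two aligned observation tracks."""
    best = {s: safe_log(START[s]) + log_emission(s, cuts[0], motif[0]) for s in STATES}
    pointers = []
    for cut, score in zip(cuts[1:], motif[1:]):
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + safe_log(TRANSITIONS[p][s]))
            row[s] = best[prev] + safe_log(TRANSITIONS[prev][s]) + log_emission(s, cut, score)
            ptr[s] = prev
        best = row
        pointers.append(ptr)
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(pointers):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


def footprint_intervals(path):
    """Collapse the state path into half-open (start, end) footprint intervals."""
    intervals, start = [], None
    for i, s in enumerate(path + [None]):
        if s == "footprint" and start is None:
            start = i
        elif s != "footprint" and start is not None:
            intervals.append((start, i))
            start = None
    return intervals


# Toy tracks (made up): an accessible region with a cut-depleted, motif-matching core.
cuts = [1, 0, 2, 9, 11, 10, 2, 1, 3, 2, 12, 9, 10, 1, 0, 1]
motif = [-6, -4, -5, -5, -3, -6, 5, 7, 6, 5, -4, -6, -5, -5, -6, -4]

print(footprint_intervals(viterbi(cuts, motif)))  # [(6, 10)]
```

In this sketch the parameters are fixed by hand. An unsupervised method would instead estimate them from the data, for example with Baum-Welch, which is omitted here to keep the example short.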
Keyphrases
- transcription factor
- single cell
- gene expression
- genome wide
- machine learning
- heavy metals
- rna seq
- dna binding
- electronic health record
- high throughput
- big data
- genome wide identification
- dna methylation
- dna damage
- circulating tumor
- physical activity
- single molecule
- data analysis
- circulating tumor cells
- body mass index
- weight loss
- deep learning
- risk assessment
- cell therapy
- virtual reality
- amino acid