Assessment of transcriptional importance of cell line-specific features based on GTRD and FANTOM5 data.
Ruslan N Sharipov, Yury V Kondrakhin, Anna S Ryabova, Ivan S Yevshin, Fedor A Kolpakov
Published in: PLoS ONE (2020)
Creating a complete picture of the regulation of transcription is an urgent task of modern biology. Regulation of transcription is a complex process carried out by transcription factors (TFs) and auxiliary proteins. Over the past decade, ChIP-Seq has become the most common experimental technology for studying genome-wide interactions between TFs and DNA. We assessed the transcriptional significance of cell line-specific features using regression analysis of ChIP-Seq datasets from the GTRD database and transcription start site (TSS) activities from the FANTOM5 expression atlas. For this purpose, we first generated a large number of features, each defined as the presence or absence of a TF in a particular promoter region around a TSS. Using feature selection and regression analysis, we identified sets of the most important TFs that affect the expression activity of TSSs in human cell lines such as HepG2, K562 and HEK293. We demonstrated that some TFs can be classified as repressors or activators depending on their location relative to the TSS.
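To make the feature definition concrete, here is a minimal Python sketch of presence/absence features followed by a per-feature regression. It is not the authors' pipeline. The window boundaries in `REGIONS`, the function names, and the toy peaks and activities are all assumptions made for illustration; real inputs would come from GTRD peak sets and FANTOM5 TSS activities. The sketch uses univariate least squares in place of the paper's feature selection and regression. For a 0/1 predictor, the least-squares slope equals the difference between the mean activity of TSSs with and without the feature.

```python
from statistics import mean

# Hypothetical promoter windows relative to the TSS, in bp (not from the paper).
REGIONS = {"upstream": (-1000, -1), "downstream": (0, 500)}


def build_features(tss_list, peaks):
    """Return {tss_id: {(tf, region): 0 or 1}} marking TF presence per window."""
    features = {}
    for tss_id, chrom, pos, strand in tss_list:
        sign = 1 if strand == "+" else -1  # orient offsets along the transcript
        row = {}
        for tf, tf_peaks in peaks.items():
            for region, (lo, hi) in REGIONS.items():
                row[(tf, region)] = int(any(
                    p_chrom == chrom and lo <= (p_pos - pos) * sign <= hi
                    for p_chrom, p_pos in tf_peaks
                ))
        features[tss_id] = row
    return features


def univariate_slopes(features, activity):
    """Least-squares slope of activity on each binary feature.

    Features that are constant across all TSSs carry no information
    and are skipped.
    """
    slopes = {}
    names = next(iter(features.values())).keys()
    for name in names:
        present = [activity[t] for t, row in features.items() if row[name] == 1]
        absent = [activity[t] for t, row in features.items() if row[name] == 0]
        if present and absent:
            slopes[name] = mean(present) - mean(absent)
    return slopes


# Toy data (invented): TSSs as (id, chromosome, position, strand),
# peaks as TF -> list of (chromosome, peak position).
tss_list = [
    ("t1", "chr1", 1000, "+"),
    ("t2", "chr1", 5000, "+"),
    ("t3", "chr1", 9000, "-"),
    ("t4", "chr1", 13000, "+"),
]
peaks = {
    "TF_A": [("chr1", 800), ("chr1", 9100)],
    "TF_B": [("chr1", 5100), ("chr1", 13200)],
}
activity = {"t1": 5.0, "t2": 1.0, "t3": 7.0, "t4": 2.0}

features = build_features(tss_list, peaks)
slopes = univariate_slopes(features, activity)
for name, slope in sorted(slopes.items()):
    print(name, slope)
```

In this toy setup, a positive slope marks a feature associated with higher TSS activity and a negative slope one associated with lower activity. Because each (TF, region) pair is a separate feature, the same TF can receive slopes of opposite sign in different windows, which is the kind of location dependence the abstract describes.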
Keyphrases
- transcription factor
- genome wide
- single cell
- dna methylation
- poor prognosis
- gene expression
- rna seq
- dna binding
- high throughput
- endothelial cells
- circulating tumor cells
- binding protein
- genome wide identification
- machine learning
- circulating tumor
- emergency department
- copy number
- deep learning
- cell free
- heat shock
- big data
- electronic health record
- long non coding rna
- induced pluripotent stem cells
- nucleic acid
- high throughput sequencing