Underlying causes for prevalent false positives and false negatives in STARR-seq data.
Pengyu Ni, Siwen Wu, Zhengchang Su. Published in: NAR Genomics and Bioinformatics (2023)
Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR-seq peaks in repressive chromatin might be active in other cell/tissue types, others might be false positives. Meanwhile, many active enhancers may not be identified by current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, artifacts due to the intrinsic limitations of current STARR-seq methods are still prevalent, and their underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome, as well as predicted active and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods, along with possible underlying causes. Our results will help design strategies to improve STARR-seq methods and to interpret their results.
Keyphrases
- genome wide
- single cell
- rna seq
- dna methylation
- copy number
- endothelial cells
- transcription factor
- gene expression
- magnetic resonance imaging
- escherichia coli
- emergency department
- dna damage
- stem cells
- induced apoptosis
- electronic health record
- mesenchymal stem cells
- crispr cas
- oxidative stress
- cell proliferation
- computed tomography
- artificial intelligence
- quality improvement
- adverse drug