MethyMer: Design of combinations of specific primers for bisulfite sequencing of complete CpG islands.
George S Krasnov, Nataliya V Melnikova, Valentina A Lakunina, Anastasiya V Snezhkina, Anna V Kudryavtseva, Alexey A Dmitriev
Published in: Journal of Bioinformatics and Computational Biology (2018)
We present MethyMer, a Python-based tool for selecting primers to amplify complete CpG islands. Primer design for these regions is difficult because of their low complexity and high GC content. Moreover, bisulfite treatment effectively reduces the 4-letter alphabet (ATGC) to a 3-letter one (ATG, except for methylated cytosines), which further reduces region complexity and increases mispriming potential. MethyMer has a flexible scoring system that balances various characteristics such as nucleotide composition, thermodynamic features (melting temperature, dimer ΔG, etc.), the presence of CpG sites and polyN tracts, and primer specificity, which is assessed by aligning primers to the bisulfite-treated genome using bowtie (up to three mismatches are allowed). Users can customize the desired and limiting ranges of various parameters as well as the penalties for undesired values. MethyMer also selects the optimal combination of PCR primer pairs for amplifying a large genomic locus, e.g. a CpG island or other hard-to-study region, with minimal overlap between individual amplicons. MethyMer incorporates ENCODE genome annotation records (promoter/enhancer/insulator), The Cancer Genome Atlas (TCGA) CpG methylation data obtained with Illumina Infinium 450K microarrays, and records of correlations between TCGA RNA-Seq and CpG methylation data for 20 cancer types. These databases are included in the MethyMer release. Our tool is available at https://sourceforge.net/projects/methymer/ .
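The alphabet reduction caused by bisulfite treatment can be illustrated with a minimal Python sketch. This is not MethyMer's code: the function names `bisulfite_convert` and `cpg_positions` and the example sequence are hypothetical, and the sketch models only the top strand, assuming that every unprotected cytosine is read as T after conversion and PCR.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of the top strand (illustrative only).

    Unmethylated C is read as T; cytosines at the 0-based indices listed in
    `methylated_positions` are treated as methylated and stay C.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def cpg_positions(seq):
    """Return the 0-based indices of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


# Hypothetical example region, not taken from any real locus.
region = "ACGTCCGGATCG"

# No methylation: every C becomes T, leaving a 3-letter (A, T, G) sequence.
fully_unmethylated = bisulfite_convert(region)

# All CpG cytosines methylated: only non-CpG cytosines are converted.
fully_methylated = bisulfite_convert(region, cpg_positions(region))

print(fully_unmethylated)  # ATGTTTGGATTG
print(fully_methylated)    # ACGTTCGGATCG
```

Because the two converted sequences differ only at CpG sites, a primer that overlaps a CpG would bind methylated and unmethylated templates with different efficiency, which is one reason the presence of CpG sites is among the scored characteristics.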
Keyphrases
- dna methylation
- genome wide
- rna seq
- single cell
- gene expression
- papillary thyroid
- copy number
- big data
- electronic health record
- transcription factor
- high resolution
- machine learning
- squamous cell carcinoma
- risk assessment
- smoking cessation
- artificial intelligence
- newly diagnosed
- tandem mass spectrometry
- data analysis