W2V-repeated index: Prediction of enhancers and their strength based on repeated fragments.

Weiming Xie, Zhaomin Yao, Yizhe Yuan, Jingwei Too, Fei Li, Hongyu Wang, Ying Zhan, Xiaodan Wu, Zhiguo Wang, Guoxu Zhang
Published in: Genomics (2024)
Enhancers play a crucial role in regulating gene expression, dictating the specificity and timing of transcriptional activity, so identifying enhancers and their strength is critical for unravelling the intricacies of genetic regulation. Repeated sequences in the genome are repeats of the same or symmetrical fragments, and there is a great deal of evidence that such repetitive sequences contain enormous amounts of genetic information. We therefore introduce the W2V-Repeated Index, which is designed to identify enhancer sequence fragments and evaluate their strength through the analysis of repeated K-mer sequences in enhancer regions. Using the word2vector algorithm for numerical conversion and Manta Ray Foraging Optimization for feature selection, the method effectively captures the frequency and distribution of K-mer sequences. By concentrating on repeated K-mer sequences, it reduces computational complexity and facilitates the analysis of larger K values. Experiments indicate that our method performs better than the other advanced methods on almost all indicators.
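The abstract does not spell out how repeated K-mers are extracted, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a "repeated" K-mer is one occurring at least twice in a single sequence when scanned with overlapping windows; the function name `repeated_kmers` and the `min_count` parameter are hypothetical. The number of positions gives a K-mer's frequency, and the positions themselves describe its distribution.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> dict[str, list[int]]:
    """Map each K-mer that occurs at least `min_count` times to its start positions.

    Assumption (not from the paper): overlapping windows, exact string matches,
    case-insensitive comparison.
    """
    seq = seq.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    # Slide a window of width k over the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start)
    # Keep only K-mers that repeat; singletons are discarded.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count}


if __name__ == "__main__":
    hits = repeated_kmers("ACGTACGTAC", k=4)
    for kmer, pos in hits.items():
        print(kmer, len(pos), pos)
```

Discarding singletons shrinks the K-mer vocabulary that any later embedding or feature-selection step has to handle, which is one plausible reading of why focusing on repeats makes larger K values tractable.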
Keyphrases
  • gene expression
  • transcription factor
  • machine learning
  • genome wide
  • dna methylation
  • high frequency
  • genetic diversity
  • social media
  • oxidative stress
  • health information
  • heat shock