Fundamentals for predicting transcriptional regulations from DNA sequence patterns.
Masaru Koido, Kohei Tomizuka, Chikashi C Terao. Published in: Journal of Human Genetics (2024)
Cell-type-specific regulatory elements, cataloged through extensive experiments and bioinformatic analyses by large-scale consortia, have enabled enrichment analyses of genetic associations that rely primarily on the genomic positions of these elements. These analyses have identified cell types and pathways genetically associated with complex human traits. However, our understanding of how specific alleles affect these elements' activities and on/off states remains incomplete, which hampers the interpretation of human genetic study results. This review introduces machine learning methods that learn sequence-dependent mechanisms of transcriptional regulation directly from DNA sequences in order to predict such allelic effects (as opposed to associations). We provide a concise history of machine-learning-based approaches, their requirements, and their key computational processes, with a focus on primer-level (introductory) machine learning concepts. Convolution and self-attention, which are pivotal in modern deep-learning models, are explained through geometric interpretations based on dot products. These interpretations clarify both the underlying concepts and why these operations have been adopted for machine learning on DNA sequences. We hope they will inspire further research in genetics and genomics.
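The abstract frames convolution and self-attention in terms of dot products. The following is a minimal pure-Python sketch of that idea, not the authors' code. Several parts are illustrative assumptions: the A/C/G/T one-hot channel order, a hypothetical hand-set "GATA" filter (not a learned or published motif), identity query/key/value projections in the attention step, and a toy "allelic effect" defined as the change in the best filter score between two alleles. Because each window and the filter are one-hot, their dot product simply counts matching bases, which is the geometric intuition: the score is largest when the window points in the same direction as the filter.

```python
import math

BASES = "ACGT"  # one-hot channel order (an assumption; libraries differ)


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def dot(u, v):
    """Dot product; for one-hot vectors it counts matching positions."""
    return sum(a * b for a, b in zip(u, v))


def conv1d(x, kernel):
    """'Valid' 1D convolution as implemented in deep-learning libraries
    (strictly a cross-correlation): each output is the dot product between
    the flattened kernel and one flattened sliding window of the input."""
    k = len(kernel)
    flat_kernel = [w for row in kernel for w in row]
    scores = []
    for i in range(len(x) - k + 1):
        window = [v for row in x[i:i + k] for v in row]
        scores.append(dot(window, flat_kernel))
    return scores


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head self-attention with identity query/key/value projections
    (a simplification; real models learn these projections). Each output is a
    weighted average of all positions, with weights from scaled dot products."""
    d = len(x[0])
    out = []
    for q in x:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in x])
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Hypothetical hand-set filter matching "GATA" (illustration only).
gata_filter = one_hot("GATA")
ref_scores = conv1d(one_hot("TTGATAAC"), gata_filter)
alt_scores = conv1d(one_hot("TTGACAAC"), gata_filter)  # T>C substitution inside the site
# Toy "allelic effect": change in the best filter score between the two alleles.
allelic_effect = max(alt_scores) - max(ref_scores)

# Attention over a short sequence; each output row is a convex mix of one-hot rows.
attended = self_attention(one_hot("ACGTAC"))
```

Here `ref_scores` is `[1.0, 0.0, 4.0, 1.0, 1.0]`, peaking where the window exactly matches the filter, while `alt_scores` is `[1.0, 0.0, 3.0, 1.0, 1.0]`, so `allelic_effect` is `-1.0`. Real sequence-to-function models learn many such filters and stack further layers, so this only illustrates the dot-product view rather than how allelic effects are actually predicted in practice.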