Machine learning applications for transcription level and phenotype predictions.
Juthamard Chantaraamporn, Pongpannee Phumikhet, Sarintip Nguantad, Todsapol Techo, Varodom Charoensawan
Published in: IUBMB Life (2022)
Predicting phenotypes and complex traits from genomic variations has long been a major challenge in molecular biology, at least in part because the task is often complicated by the influence of external stimuli and the environment on the regulation of gene expression. With today's abundance of omic data and advances in high-throughput computing and machine learning (ML), we now have an unprecedented opportunity to uncover the missing links and molecular mechanisms that control gene expression and phenotypes. To empower molecular biologists and researchers in related fields to start using ML for in-depth analyses of their large-scale data, we provide a summary of the fundamental concepts of ML and describe a wide range of research questions and scenarios in molecular biology where ML has been implemented. Because transcriptomic data are abundant, reproducible, and genome-wide in coverage, we focus on transcriptomics and on two ML tasks that involve it: (a) predicting transcriptomic profiles or transcription levels from genomic variations in DNA, and (b) predicting phenotypes of interest from transcriptomic profiles or transcription levels. Similar approaches can also be applied to more complex data, such as those in multi-omic studies. We envisage that the concepts and examples described here will raise awareness and promote the application of ML among molecular biologists, and eventually help improve a framework for the systematic design and prediction of gene expression and phenotypes in synthetic biology applications.
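To make the two tasks concrete, the following is a minimal sketch in Python using only the standard library. It is not taken from the paper: the data are synthetic, the single variant, the effect sizes, and the binary trait are all hypothetical, and the models (one-predictor least squares for task (a), a nearest-centroid classifier for task (b)) are deliberately simple stand-ins for the ML methods a real study would use.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the synthetic data are reproducible

# --- Synthetic data (entirely made up for illustration) ---
# genotype: alternate-allele count (0, 1 or 2) at one hypothetical variant
genotypes = [random.choice([0, 1, 2]) for _ in range(200)]
# expression: hypothetical transcript level that rises with allele count, plus noise
expression = [5.0 + 1.5 * g + random.gauss(0, 0.5) for g in genotypes]
# phenotype: hypothetical binary trait that is more likely at high expression
phenotype = [1 if e + random.gauss(0, 0.5) > 6.5 else 0 for e in expression]


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_centroids(x, labels):
    """Nearest-centroid classifier: store the mean feature value of each class."""
    return {c: mean(xi for xi, li in zip(x, labels) if li == c) for c in set(labels)}


def predict_class(centroids, xi):
    """Assign the class whose centroid is closest to xi."""
    return min(centroids, key=lambda c: abs(xi - centroids[c]))


# Hold out the last 50 samples for evaluation
train, test = slice(0, 150), slice(150, 200)

# Task (a): predict transcription level from genomic variation
slope, intercept = fit_line(genotypes[train], expression[train])
pred_expr = [intercept + slope * g for g in genotypes[test]]
mae = mean(abs(p - e) for p, e in zip(pred_expr, expression[test]))

# Task (b): predict phenotype from transcription level
centroids = fit_centroids(expression[train], phenotype[train])
pred_pheno = [predict_class(centroids, e) for e in expression[test]]
accuracy = mean(int(p == t) for p, t in zip(pred_pheno, phenotype[test]))

print(f"(a) slope={slope:.2f}, intercept={intercept:.2f}, test MAE={mae:.2f}")
print(f"(b) test accuracy={accuracy:.2f}")
```

Both models are evaluated on held-out samples rather than on the training data, which is the basic safeguard against overfitting regardless of which ML method is eventually chosen. Real transcriptomic data would involve many variants and many genes, so the single-feature models here would be replaced by regularised or non-linear methods.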
Keyphrases
- gene expression
- machine learning
- big data
- DNA methylation
- genome wide
- electronic health record
- single cell
- high throughput
- artificial intelligence
- single molecule
- transcription factor
- copy number
- RNA-seq
- healthcare
- climate change
- working memory
- antibiotic resistance genes
- wastewater treatment
- circulating tumor cells
- health insurance