Deep learning models predict regulatory variants in pancreatic islets and refine type 2 diabetes association signals.
Agata Wesolowska-Andersen, Grace Zhuo Yu, Vibe Nylander, Fernando Abaitua, Matthias Thurner, Jason M Torres, Anubha Mahajan, Anna L Gloyn, Mark I McCarthy
Published in: eLife (2020)
Genome-wide association analyses have uncovered multiple genomic regions associated with type 2 diabetes (T2D), but identifying the causal variants at these loci remains a challenge. There is growing interest in the potential of deep learning models, which predict epigenome features from DNA sequence, to support inference about the regulatory effects of disease-associated variants. Here, we evaluate the advantages of training convolutional neural network (CNN) models on a broad set of epigenomic features collected in a single disease-relevant tissue (pancreatic islets, in the case of T2D), as opposed to models trained on multiple human tissues. We report convergence of CNN-based metrics of regulatory function with conventional approaches to variant prioritization: genetic fine-mapping and regulatory annotation enrichment. We demonstrate that CNN-based analyses can refine association signals at T2D-associated loci, and we provide experimental validation for one such signal. We anticipate that these approaches will become routine in downstream analyses of genome-wide association studies (GWAS).
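To illustrate how a sequence-based CNN can turn a prediction into a variant-level regulatory score, here is a minimal sketch that compares a model's output for the reference and alternative sequence. It is not the authors' model. The single hand-set filter, the hypothetical 4-bp motif `TAAT`, all weights, and the function names are assumptions chosen for illustration. The two scores it reports (change in predicted probability and change in log-odds, alternative minus reference) are common choices in sequence-based variant scoring, not necessarily the metrics used in this study.

```python
import math

BASES = "ACGT"  # column order of the one-hot encoding


def one_hot(seq):
    """One-hot encode a DNA string as rows of [A, C, G, T] indicators."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv_relu_maxpool(x, filt, bias):
    """Slide one convolutional filter along the sequence, apply ReLU, then global max-pool."""
    width = len(filt)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is the floor
    for i in range(len(x) - width + 1):
        act = bias + sum(
            x[i + j][k] * filt[j][k] for j in range(width) for k in range(4)
        )
        best = max(best, act)
    return best


def predict(seq, filt, bias, out_w, out_b):
    """Toy one-filter CNN: probability that the sequence carries the epigenomic feature."""
    h = conv_relu_maxpool(one_hot(seq), filt, bias)
    return 1.0 / (1.0 + math.exp(-(out_w * h + out_b)))


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def variant_effect(ref_seq, pos, alt, model):
    """Score a single-nucleotide variant as (alt - ref) in probability and in log-odds."""
    if ref_seq[pos].upper() == alt.upper():
        raise ValueError("alternative allele matches the reference base")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    p_ref = predict(ref_seq, *model)
    p_alt = predict(alt_seq, *model)
    return p_alt - p_ref, logit(p_alt) - logit(p_ref)


# Hypothetical filter that rewards an exact match to the motif "TAAT".
MOTIF = "TAAT"
FILTER = [[2.0 if base == m else -1.0 for base in BASES] for m in MOTIF]
MODEL = (FILTER, -4.0, 1.0, -2.0)  # (filter, conv bias, output weight, output bias)

dp, dlogodds = variant_effect("GGCTAATGCC", 4, "G", MODEL)
print(f"{dp:.3f} {dlogodds:.3f}")  # → -0.612 -3.000
```

Here the A→G change disrupts the motif, so the max-pooled activation falls from 4 to 1 and the predicted probability drops accordingly. A model trained on islet data would instead learn many filters from data and emit one output per epigenomic feature, giving a vector of such scores for each variant.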