Biomedical Data Commons (BMDC) prioritizes B-lymphocyte non-coding genetic variants in Type 1 Diabetes.
Samantha N Piekos, Sadhana Gaddam, Pranav Bhardwaj, Prashanth Radhakrishnan, Ramanathan V Guha, Anthony E Oro
Published in: PLoS computational biology (2021)
The repurposing of biomedical data is inhibited by its fragmented and multi-formatted nature, which requires redundant investment of time and resources by data scientists. This is particularly true for Type 1 Diabetes (T1D), one of the most intensely studied common childhood diseases. The contributions of pancreatic β-islets and T-lymphocytes to T1D have been intensely investigated. However, genetic contributions from B-lymphocytes, which are known to play a role in a subset of T1D patients, remain relatively understudied. We have addressed this issue through the creation of Biomedical Data Commons (BMDC), a knowledge graph that integrates data from multiple sources into a single queryable format. This increases the speed of analysis by multiple orders of magnitude. We develop a pipeline using B-lymphocyte multi-dimensional epigenome and connectome data and deploy BMDC to assess genetic variants in the context of T1D. Pipeline-identified variants are primarily common, non-coding, poorly conserved, and of unknown clinical significance. While variants and their chromatin connectivity are cell-type specific, they are associated with well-studied disease genes in T-lymphocytes. Candidates include established variants in the HLA-DQB1, HLA-DRB1, and IL2RA loci that have previously been demonstrated to protect against T1D in humans and mice, providing validation for this method. Others are included in the well-established T1D GRS2 genetic risk scoring method. More intriguingly, other prioritized variants are completely novel and form the basis for future mechanistic and clinical validation studies. The BMDC community-based platform can be expanded and repurposed to increase the accessibility, reproducibility, and productivity of biomedical information for diverse applications, including the prioritization of cell type-specific disease alleles from complex phenotypes.
Keyphrases
- type diabetes
- electronic health record
- copy number
- genome wide
- big data
- healthcare
- end stage renal disease
- cardiovascular disease
- peritoneal dialysis
- transcription factor
- peripheral blood
- systemic lupus erythematosus
- glycemic control
- resting state
- systemic sclerosis
- climate change
- high throughput
- disease activity
- single cell
- idiopathic pulmonary fibrosis
- case control
- insulin resistance
- artificial intelligence
- white matter
- oxidative stress
- social media
- convolutional neural network
- interstitial lung disease