Structure-based network analysis predicts pathogenic variants in human proteins associated with inherited retinal disease.
Blake M Hauser, Yuyang Luo, Anusha Nathan, Ahmad Al-Moujahed, Demetrios George Vavvas, Jason Comander, Eric A Pierce, Emily M Place, Kinga M Bujakowska, Gaurav D Gaiha, Elizabeth J Rossin
Published in: NPJ genomic medicine (2024)
Advances in gene sequencing technologies have accelerated the identification of genetic variants, but better tools are needed to determine which variants cause disease. This would be particularly useful in fields where gene therapy is a potential therapeutic modality for a disease-causing variant, such as inherited retinal disease (IRD). Here, we apply structure-based network analysis (SBNA), which has been successfully utilized to identify variant-constrained amino acid residues in viral proteins, to identify residues that may cause IRD if subject to missense mutation. SBNA is based entirely on structural first principles and is not fit to specific outcome data, which makes it distinct from other contemporary missense prediction tools. In 4 well-studied human disease-associated proteins (BRCA1, HRAS, PTEN, and ERK2) with high-quality structural data, we find that SBNA scores correlate strongly with deep mutagenesis data. When applied to 47 IRD genes with available high-quality crystal structure data, SBNA scores reliably identified disease-causing variants according to phenotype definitions from the ClinVar database. Finally, we applied this approach to 63 patients at Massachusetts Eye and Ear (MEE) with IRD for whom no genetic cause had been identified. Untrained models built using SBNA scores and BLOSUM62 scores for IRD-associated genes successfully predicted the pathogenicity of novel variants (AUC = 0.851), allowing us to identify likely causative disease variants in 40 IRD patients. Model performance was further augmented by incorporating orthogonal data from EVE scores (AUC = 0.927), which are based on evolutionary multiple sequence alignments. In conclusion, SBNA can be used to successfully identify variants as causal of disease in human proteins and may help predict variants causative of IRD in an unbiased fashion.
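To make the AUC figures above concrete, the following is a minimal sketch of how a combined per-variant score can be evaluated against pathogenic/benign labels. It is not the authors' pipeline. The abstract does not specify how the "untrained" models combine features, so the equal-weight average used here is an assumption, and the function names (`auc`, `min_max`, `combine_unweighted`) and all numeric values are hypothetical toy data, not results from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combine_unweighted(*features):
    """Equal-weight average of min-max scaled features.

    ASSUMPTION: this is only one way to build a score without fitting
    to outcome data; the paper may combine features differently.
    """
    scaled = [min_max(f) for f in features]
    return [sum(col) / len(col) for col in zip(*scaled)]


if __name__ == "__main__":
    # Hypothetical toy data for six missense variants (not from the paper).
    network_score = [0.9, 0.7, 0.4, 0.2, 0.8, 0.1]   # stand-in for a structural score
    blosum_entry = [-3, -1, 1, 2, -2, 0]              # illustrative substitution-matrix entries
    labels = [1, 1, 0, 0, 1, 0]                       # 1 = pathogenic, 0 = benign

    # Negate substitution entries so that higher always means "more damaging".
    penalty = [-b for b in blosum_entry]

    combined = combine_unweighted(network_score, penalty)
    print("AUC of combined score:", auc(combined, labels))
```

The pairwise formulation is quadratic in the number of variants, which is fine for a small illustrative set; a rank-based implementation would be preferable for large variant lists.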