A comprehensive WGS-based pipeline for the identification of new candidate genes in inherited retinal dystrophies.
María González-Del Pozo, Elena Fernández-Suárez, Nereida Bravo-Gil, Cristina Méndez-Vidal, Marta Martín-Sánchez, Enrique Rodríguez-de la Rúa, Manuel Ramos-Jiménez, María José Morillo-Sánchez, Salud Borrego, Guillermo Antiñolo
Published in: NPJ genomic medicine (2022)
To enhance the use of Whole Genome Sequencing (WGS) in clinical practice, it is still necessary to standardize data analysis pipelines. Herein, we aimed to define a WGS-based algorithm for the accurate interpretation of variants in inherited retinal dystrophies (IRD). This study comprised 429 phenotyped individuals divided into three cohorts. A comparison of 14 pathogenicity predictors, and the redefinition of their cutoffs, were performed using curated panel-sequencing data from 209 genetically diagnosed individuals with IRD (training cohort). The optimal tool combinations, previously validated in 50 additional IRD individuals, were also tested in patients with hereditary cancer (n = 109) and with neurological diseases (n = 47) to evaluate the translational value of this approach (validation cohort). Then, our workflow was applied to the WGS data analysis of 14 individuals from genetically undiagnosed IRD families (discovery cohort). The statistical analysis showed that the optimal filtering combination included the CADD v1.6, MAPP, Grantham, and SIFT tools. Our pipeline allowed the identification of one homozygous variant in the candidate gene CFAP20 (c.337C>T; p.Arg113Trp), a conserved ciliary gene that was abundantly expressed in human retina and was located in the photoreceptor layer. Although further studies are needed, we propose CFAP20 as a candidate gene for autosomal recessive retinitis pigmentosa. Moreover, we offer a translational strategy for accurate WGS data prioritization, which is essential for the advancement of personalized medicine.
Keyphrases
- data analysis
- copy number
- electronic health record
- diabetic retinopathy
- genome wide
- clinical practice
- big data
- optical coherence tomography
- machine learning
- endothelial cells
- optic nerve
- small molecule
- genome wide identification
- deep learning
- high resolution
- high throughput
- squamous cell carcinoma
- gene expression
- escherichia coli
- dna methylation
- pseudomonas aeruginosa
- induced pluripotent stem cells
- bioinformatics analysis
- young adults