Repeat Detector: versatile sizing of expanded tandem repeats and identification of interrupted alleles from targeted DNA sequencing.
Alysha S Taylor, Dinis Barros, Nastassia Gobet, Thierry Schuepbach, Branduff McAllister, Lorene Aeschbach, Emma L Randall, Evgeniya Trofimenko, Eleanor R Heuchan, Paula Barszcz, Marc Ciosi, Joanne Morgan, Nathaniel J Hafford-Tear, Alice E Davidson, Thomas H Massey, Darren G Monckton, Lesley Jones, Registry Investigators Of The European Huntington's Disease Network, Ioannis Xenarios, Vincent Dion
Published in: NAR genomics and bioinformatics (2022)
Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and preclinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. Interrupted alleles are sometimes present within repeats and can alter disease manifestation. Determining repeat size mosaicism and identifying interruptions in targeted sequencing datasets remain major challenges. This is in part because standard alignment tools are ill-suited for repetitive and unstable sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington's disease and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 blood-derived samples from Huntington's disease individuals and did not require prior knowledge of the flanking sequences. Furthermore, RD can be used to identify alleles with interruptions and provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and in the development of novel therapies.
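The general idea of counting repeats directly in reads, without aligning to flanking sequences, can be illustrated with a minimal sketch. This is not RD's profile weighting algorithm, which the abstract does not specify in detail; it swaps in plain regular-expression matching. The function names, the `CAG` default motif, the `min_units` threshold, and the "fraction of reads differing from the modal size" instability measure are all assumptions made for illustration only.

```python
import re
from collections import Counter


def longest_repeat_tract(read: str, motif: str = "CAG") -> int:
    """Number of motif units in the longest uninterrupted tandem run."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)


def has_interruption(read: str, motif: str = "CAG", min_units: int = 3) -> bool:
    """True if two runs of at least min_units motif units are separated
    by exactly one motif-length unit that is not the motif itself."""
    unit = re.escape(motif)
    k = len(motif)
    pattern = re.compile(
        f"(?:{unit}){{{min_units},}}(?!{unit})[ACGT]{{{k}}}(?:{unit}){{{min_units},}}"
    )
    return pattern.search(read) is not None


def summarize_reads(reads: list[str], motif: str = "CAG") -> dict:
    """Per-sample summary: modal repeat size, a crude instability
    measure (hypothetical, not RD's metric), and interrupted-read count."""
    if not reads:
        raise ValueError("no reads supplied")
    sizes = [longest_repeat_tract(r, motif) for r in reads]
    modal_size, _ = Counter(sizes).most_common(1)[0]
    non_modal = sum(1 for s in sizes if s != modal_size)
    return {
        "modal_size": modal_size,
        "fraction_non_modal": non_modal / len(sizes),
        "interrupted_reads": sum(has_interruption(r, motif) for r in reads),
    }


# Toy reads, invented for illustration (not data from the study).
example_reads = [
    "TT" + "CAG" * 5 + "GG",
    "CAG" * 5,
    "A" + "CAG" * 7 + "T",
    "CAG" * 3 + "CAA" + "CAG" * 4,
]
summary = summarize_reads(example_reads)
```

Exact matching of this kind breaks a tract at every sequencing error, so a real tool would need a scoring scheme that tolerates mismatches; the sketch only shows why flanking sequences are not required for the counting step itself.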
Keyphrases
- single molecule
- single cell
- cancer therapy
- circulating tumor
- cell free
- end stage renal disease
- healthcare
- living cells
- machine learning
- atomic force microscopy
- rna seq
- endothelial cells
- deep learning
- mass spectrometry
- high resolution
- computed tomography
- magnetic resonance
- high frequency
- peritoneal dialysis
- optical coherence tomography
- big data
- drug induced
- image quality
- wound healing