High-throughput Interpretation of Killer-cell Immunoglobulin-like Receptor Short-read Sequencing Data with PING.
Wesley M Marin, Ravi Dandekar, Danillo Gardenal Augusto, Tasneem Yusufali, Bianca Heyn, Jan A Hofmann, Vinzenz Lange, Jürgen Sauter, Paul J Norman, Jill A Hollenbach
Published in: PLoS computational biology (2021)
The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as to pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes make short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining the presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING classifies gene copy number for all KIR genes using a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To assess the accuracy of copy number and genotype determination, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding of how to interpret short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering highly accurate, high-resolution KIR genotypes and enabling high-throughput KIR genotyping for disease and population studies.
Keyphrases
- copy number
- genome wide
- high throughput
- single cell
- mitochondrial dna
- dna methylation
- high resolution
- electronic health record
- rna seq
- genome wide identification
- natural killer cells
- single molecule
- big data
- solid phase extraction
- healthcare
- machine learning
- bioinformatics analysis
- molecularly imprinted
- gene expression
- artificial intelligence
- cell therapy
- deep learning
- pregnant women
- metabolic syndrome
- mass spectrometry
- mesenchymal stem cells
- bone marrow
- genome wide analysis