Accurate multi-population imputation of MICA, MICB, HLA-E, HLA-F and HLA-G alleles from genome SNP data.
Silja Tammi, Satu Koskela, Blood Service Biobank, Kati Hyvärinen, Jukka Partanen, Jarmo Ritari
Published in: PLoS Computational Biology (2024)
In addition to the classical HLA genes, the major histocompatibility complex (MHC) harbors many other polymorphic genes whose roles in disease associations and transplantation matching are less well established. To facilitate studies of the non-classical and non-HLA genes in large patient and biobank cohorts, we trained imputation models for MICA, MICB, HLA-E, HLA-F and HLA-G alleles on genome SNP array data. We show, using both population-specific and multi-population 1000 Genomes references, that the alleles of these genes can be accurately imputed for screening and research purposes. The best imputation model for MICA, MICB, HLA-E, -F and -G achieved a mean accuracy of 99.3% (min, max: 98.6, 99.9). Furthermore, validation of the 1000 Genomes exome short-read sequencing-based allele calling against clinical-grade reference data showed an average accuracy of 99.8%, attesting to the quality of the 1000 Genomes data as an imputation reference. We also fitted the models for Infinium Global Screening Array (GSA, Illumina, Inc.) and Axiom Precision Medicine Research Array (PMRA, Thermo Fisher Scientific Inc.) SNP content, with mean accuracies of 99.1% (min, max: 97.2, 100) and 98.9% (min, max: 97.4, 100), respectively.
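
The abstract reports accuracy as a mean with a minimum and maximum across genes but does not define the metric. As a minimal sketch, assuming accuracy is the fraction of reference alleles recovered per gene (two alleles per sample, order ignored), such a summary could be computed as follows. The function names and the toy genotypes are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def per_gene_accuracy(truth, imputed):
    """Fraction of reference alleles matched by the imputed call, per gene.

    Both arguments map (sample_id, gene) -> tuple of two allele names.
    Assumption: allele-level concordance, ignoring the order of the pair.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for (sample, gene), true_pair in truth.items():
        pred_pair = imputed.get((sample, gene))
        if pred_pair is None:
            continue  # sample not imputed for this gene; skip it
        remaining = list(pred_pair)
        for allele in true_pair:
            totals[gene] += 1
            if allele in remaining:
                remaining.remove(allele)  # each imputed allele matches once
                hits[gene] += 1
    return {gene: hits[gene] / totals[gene] for gene in totals}


def summarize(accuracy_by_gene):
    """Return (mean, min, max) over the per-gene accuracies."""
    values = list(accuracy_by_gene.values())
    return sum(values) / len(values), min(values), max(values)


# Toy data for illustration only (not results from the paper).
toy_truth = {
    ("s1", "MICA"): ("MICA*008", "MICA*002"),
    ("s2", "MICA"): ("MICA*008", "MICA*008"),
    ("s1", "HLA-E"): ("E*01:01", "E*01:03"),
    ("s2", "HLA-E"): ("E*01:01", "E*01:01"),
}
toy_imputed = {
    ("s1", "MICA"): ("MICA*002", "MICA*008"),
    ("s2", "MICA"): ("MICA*008", "MICA*004"),
    ("s1", "HLA-E"): ("E*01:03", "E*01:01"),
    ("s2", "HLA-E"): ("E*01:01", "E*01:01"),
}

toy_accuracy = per_gene_accuracy(toy_truth, toy_imputed)
toy_summary = summarize(toy_accuracy)
```

Counting matches at the allele level gives partial credit when one of a sample's two alleles is imputed correctly. A stricter genotype-level metric would count a sample as correct only when both alleles match.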