An overview of SNP-SNP microhaplotypes in the 26 populations of the 1000 Genomes Project.
Jiaming Xue, Shengqiu Qu, Mengyu Tan, Yuanyuan Xiao, Ranran Zhang, Dezhi Chen, Meili Lv, Yiming Zhang, Lin Zhang, Wei Bo Liang
Published in: International Journal of Legal Medicine (2022)
Microhaplotypes (MHs) are a promising new type of forensic marker, defined as combinations of two or more single-nucleotide polymorphisms (SNPs) within 200 bp. Their advantages, such as low mutation rates, lack of stutter artifacts, and short amplicons, have improved human identification, kinship analysis, ancestry prediction, and mixture deconvolution. Information on published MHs, e.g., allele frequencies, is available in two widely used public databases, namely the ALlele FREquency Database and MicroHapDB. However, abundant unpublished MHs are spread over the whole genome, and those databases do not incorporate other databases (e.g., the SNP Database, dbSNP) to give users more integrated information. It is therefore essential to establish a robust, responsive, and comprehensive MH database. In this study, we thoroughly screened for SNP-SNP MHs among the 26 populations of the 1000 Genomes Project (Phase 3). The genotype data of the SNPs in each MH were converted to PHASE input files, and allele frequencies were estimated using PHASE. We compiled a detailed summary of SNP-SNP MHs at the global, continental, and population levels, focusing on haplotypes and the effective number of alleles (Ae), and supplemented our database with dbSNP data (last updated in 2015). We established a dual-SNP MH database (D-SNPsDB) of MHs within 50 bp for the 26 populations, integrating basic data such as physical positions in the human genome, mapping of variant identifiers (rsIDs), allele frequencies, and basic variant information. For public queries, the D-SNPsDB web app was developed with the R Shiny package to provide this integrated information.
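To make the Ae statistic concrete, the sketch below computes haplotype frequencies and the effective number of alleles for a single SNP-SNP microhaplotype. It assumes the standard definition Ae = 1 / Σ p_i², where p_i is the frequency of the i-th haplotype; the abstract does not spell out the formula. The function names and the toy haplotypes are hypothetical, and simple counting of already-phased haplotypes stands in for the PHASE-based estimation used in the study.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return {haplotype: relative frequency} from a list of phased haplotypes.

    Each haplotype is a string such as "AG" (allele at SNP 1, allele at SNP 2).
    Assumption: the input is already phased; the study estimated this with PHASE.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def effective_number_of_alleles(freqs):
    """Compute Ae = 1 / sum(p_i^2), assuming the standard definition."""
    return 1.0 / sum(p * p for p in freqs.values())


# Hypothetical toy sample: 10 chromosomes carrying 3 distinct haplotypes.
toy_haplotypes = ["AG"] * 5 + ["AT"] * 3 + ["CG"] * 2
freqs = haplotype_frequencies(toy_haplotypes)
ae = effective_number_of_alleles(freqs)
print(freqs)
print(round(ae, 4))
```

Ae equals the number of observed haplotypes only when they are equally frequent; the more skewed the frequencies, the further it falls below that count, which is why it serves as a measure of how informative a marker is.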
Keyphrases
- genome wide
- dna methylation
- adverse drug
- high density
- genetic diversity
- big data
- electronic health record
- endothelial cells
- mental health
- health information
- randomized controlled trial
- physical activity
- emergency department
- machine learning
- induced pluripotent stem cells
- systematic review
- gene expression
- artificial intelligence
- deep learning
- meta analyses