
Genome-wide association studies combined with k-fold cross-validation identify rs17822931 as an ancestry-informative marker in Han Chinese population.

Zheng Li, Jiayi Wu, Jiawen Yang, Kai Li, Ji Chen, Shuainan Huang, Qiang Ji, Xiaochao Kong, Sumei Xie, Wenxuan Zhan, Beilei Zhang, Ke Ye, Qingfan Liu, Zhengsheng Mao, Yue Cao, Huijie Huang, Youjia Yu, Kang Wang, Yanfang Yu, Ding Li, Feng Chen, Peng Chen
Published in: Electrophoresis (2023)
DNA-based ancestry inference has long been a research hot spot in forensic science. Differentiating substructure within the Han Chinese population, such as the north-to-south substructure, would benefit forensic practice. In the present study, we enrolled participants from northern and southern China. Each participant was genotyped at ∼400 K single-nucleotide polymorphisms (SNPs), and CHB and CHS data from the 1000 Genomes Project were used to perform genome-wide association analyses. We also introduced a new method that combines genome-wide association study (GWAS) analyses with k-fold cross-validation for small sample sizes. As a result, one SNP, rs17822931, emerged with a p-value of 7.51 × 10⁻⁶. We further simulated a large dataset to verify whether k-fold cross-validation could reduce the false-negative rate of GWAS. Allele frequencies of the identified SNP, ABCC11 rs17822931, have been reported to vary along a geographical gradient in humans. We also found a large difference in the allele frequency distributions of rs17822931 among five different cohorts of the Chinese population. In conclusion, our study demonstrates that even a small-scale GWAS has the potential to identify effective loci when combined with k-fold cross-validation, and it sheds light on rs17822931 as a potential marker for differentiating the north-to-south substructure of the Han Chinese population.
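The abstract does not spell out how the association test and the k-fold cross-validation are combined. The sketch below shows one plausible reading, and it is an assumption rather than the authors' procedure. Each SNP is tested with a 2x2 allelic chi-square test on every training split, and only SNPs that pass a threshold in all k splits are kept. The function names (`allelic_p_value`, `kfold_stable_snps`, `simulate_snp`), the threshold `alpha`, and the simulated allele frequencies are all hypothetical and chosen for illustration only.

```python
import math
import random


def allelic_p_value(genotypes, labels):
    """P-value of a 2x2 allelic chi-square test (1 degree of freedom).

    genotypes: alternate-allele count (0, 1 or 2) per sample.
    labels: group per sample, 0 (e.g. north) or 1 (e.g. south).
    """
    alt = [0, 0]
    total = [0, 0]
    for g, y in zip(genotypes, labels):
        alt[y] += g
        total[y] += 2  # two alleles per diploid sample
    ref = [total[0] - alt[0], total[1] - alt[1]]
    n = total[0] + total[1]
    row_alt, row_ref = alt[0] + alt[1], ref[0] + ref[1]
    if 0 in (row_alt, row_ref, total[0], total[1]):
        return 1.0  # monomorphic SNP or empty group: no evidence
    chi2 = n * (alt[0] * ref[1] - alt[1] * ref[0]) ** 2
    chi2 /= row_alt * row_ref * total[0] * total[1]
    # Survival function of chi-square with 1 df is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def kfold_stable_snps(genotype_matrix, labels, k=5, alpha=1e-4, seed=0):
    """Return indices of SNPs with p < alpha on every one of k training splits.

    genotype_matrix[s][i] is the genotype of sample i at SNP s.
    Assumption: "stable across all folds" is the selection rule.
    """
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    hits = [0] * len(genotype_matrix)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        y = [labels[i] for i in train]
        for s, row in enumerate(genotype_matrix):
            if allelic_p_value([row[i] for i in train], y) < alpha:
                hits[s] += 1
    return [s for s, h in enumerate(hits) if h == k]


def simulate_snp(rng, n_per_group, freq_group0, freq_group1):
    """Simulate one SNP as two independent allele draws per sample."""
    def draw(freq):
        return int(rng.random() < freq) + int(rng.random() < freq)
    return ([draw(freq_group0) for _ in range(n_per_group)]
            + [draw(freq_group1) for _ in range(n_per_group)])


if __name__ == "__main__":
    rng = random.Random(1)
    n = 100
    labels = [0] * n + [1] * n
    # SNP 0 differs strongly between groups; the rest are null SNPs.
    matrix = [simulate_snp(rng, n, 0.9, 0.1)]
    matrix += [simulate_snp(rng, n, 0.5, 0.5) for _ in range(49)]
    print(kfold_stable_snps(matrix, labels))
```

Requiring a hit in every fold is a conservative choice that favours robustness to sampling noise. A looser rule, such as a hit in most folds, would trade some of that robustness for sensitivity, which may be closer to the abstract's stated aim of reducing false negatives.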