Population-specific genetic variation in large sequencing data sets: why more data is still better.
Jeroen G J van Rooij, Mila Jhamai, Pascal P Arp, Stephan C A Nouwens, Marijn Verkerk, Albert Hofman, Mohammad Arfan Ikram, Annemieke J Verkerk, Joyce B J van Meurs, Fernando Rivadeneira, André G Uitterlinden, Robert Kraaij
Published in: European journal of human genetics : EJHG (2017)
We have generated a next-generation whole-exome sequencing data set of 2628 participants of the population-based Rotterdam Study cohort, comprising 669 737 single-nucleotide variants and 24 019 short insertions and deletions. Because of the broad and deep longitudinal phenotyping of the Rotterdam Study, this data set permits extensive interpretation of genetic variants in relation to a range of clinically relevant outcomes, and is accessible as a control data set. We show that next-generation sequencing data sets yield a large number of population-specific variants that are not captured by other available large sequencing efforts, namely ExAC, ESP, 1000G, UK10K, GoNL and DECODE.