Rare coding variant analysis for human diseases across biobanks and ancestries.
Sean Joseph Jurgens, Xin Wang, Seung Hoan Choi, Lu-Chen Wang, Satoshi Koyama, James Paul Pirruccello, Trang T Nguyen, Patrick Smadbeck, Dongkeun Jang, Mark D Chaffin, Roddy Walsh, Carolina Roselli, Amanda L Elliott, Leonoor F J M Wijdeveld, Kiran J Biddinger, Shinwan Kany, Joel T Rämö, Pradeep Natarajan, Krishna G Aragam, Jason Flannick, Noël P Burtt, Connie R Bezzina, Steven A Lubitz, Kathryn L Lunetta, Patrick T Ellinor
Published in: Nature Genetics (2024)
Large-scale sequencing has enabled unparalleled opportunities to investigate the role of rare coding variation in human phenotypic variability. Here, we present a pan-ancestry analysis of sequencing data from three large biobanks, including the All of Us research program. Using mixed-effects models, we performed gene-based rare variant testing for 601 diseases across 748,879 individuals, including 155,236 with ancestry dissimilar to European. We identified 363 significant associations, which highlighted core genes for the human disease phenome and identified potential novel associations, including UBR3 for cardiometabolic disease and YLPM1 for psychiatric disease. Pan-ancestry burden testing represented an inclusive and useful approach for discovery in diverse datasets, although we also highlight the importance of ancestry-specific sensitivity analyses in this setting. Finally, we found that effect sizes for rare protein-disrupting variants were concordant between samples similar to European ancestry and other genetic ancestries (β_Deming = 0.7-1.0). Our results have implications for multi-ancestry and cross-biobank approaches in sequencing association studies for human disease.
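
For readers unfamiliar with the Deming slope quoted in the abstract, the following is a minimal sketch of how such a slope can be computed from paired effect estimates. It is not the authors' pipeline: the function name `deming_slope`, the error-variance ratio `delta` (assumed here to be 1, which corresponds to orthogonal regression), and the example effect sizes are all illustrative assumptions. The abstract does not state how error variances were handled.

```python
import math


def deming_slope(x, y, delta=1.0):
    """Return the Deming regression slope of y on x.

    x, y  : paired effect estimates (e.g. per-gene betas in two ancestry groups).
    delta : assumed ratio of error variances, var(error in y) / var(error in x).
            delta = 1.0 is an assumption of this sketch, not a value from the paper.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 points")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample variances and covariance.
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

    if s_xy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")

    # Closed-form Deming estimator.
    d = s_yy - delta * s_xx
    return (d + math.sqrt(d * d + 4.0 * delta * s_xy * s_xy)) / (2.0 * s_xy)


# Hypothetical effect sizes, for illustration only (not data from the study).
beta_group_a = [0.5, 1.0, 1.5, 2.0, 2.5]
beta_group_b = [0.8 * b for b in beta_group_a]
slope = deming_slope(beta_group_a, beta_group_b)
```

Unlike ordinary least squares, this estimator allows for measurement error in both sets of effect estimates. That is the usual reason to prefer it when comparing noisy betas across groups. A slope near 1 indicates effect sizes of similar magnitude in the two groups.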