The complete sequence of a human Y chromosome.

Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J Hoyt, Dylan J Taylor, Nicolas Altemose, Paul W Hook, Sergey Koren, Mikko Rautiainen, Ivan A Alexandrov, Jamie Allen, Mobin Asri, Andrey V Bzikadze, Nae-Chyun Chen, Chen-Shan Chin, Mark E Diekhans, Paul Flicek, Giulio Formenti, Arkarachai Fungtammasan, Carlos Garcia Giron, Erik P Garrison, Ariel Gershman, Jennifer L Gerton, Patrick G S Grady, Andrea Guarracino, Leanne Haggerty, Reza Halabian, Nancy F Hansen, Robert Harris, Gabrielle A Hartley, William T Harvey, Marina Haukness, Jakob Heinz, Thibaut Hourlier, Robert M Hubley, Sarah E Hunt, Stephen Hwang, Miten Jain, Rupesh K Kesharwani, Alexandra P Lewis, Heng Li, Glennis A Logsdon, Julian K Lucas, Wojciech Makalowski, Christopher Markovic, Fergal J Martin, Ann M Mc Cartney, Rajiv C McCoy, Jennifer McDaniel, Brandy M McNulty, Paul Medvedev, Alla Mikheenko, Katherine M Munson, Terence D Murphy, Hugh E Olsen, Nathan D Olson, Luis F Paulin, David Porubsky, Tamara A Potapova, Fedor D Ryabov, Steven L Salzberg, Michael E G Sauria, Fritz J Sedlazeck, Kishwar Shafin, Valery A Shepelev, Alaina Shumate, Jessica M Storer, Likhitha Surapaneni, Angela M Taravella Oill, Françoise Thibaud-Nissen, Winston Timp, Marta Tomaszkiewicz, Mitchell R Vollger, Brian P Walenz, Allison C Watwood, Matthias H Weissensteiner, Aaron M Wenger, Melissa A Wilson, Samantha Zarate, Yiming Zhu, Justin M Zook, Evan E Eichler, Rachel J O'Neill, Michael C Schatz, Karen H Miga, Kateryna D Makova, Adam M Phillippy
Published in: Nature (2023)
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure, which includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence, and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y), which corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference. T2T-Y reveals the complete ampliconic structures of the TSPY, DAZ and RBMY gene families; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.