Automated assembly of centromeres from ultra-long error-prone reads.

Andrey V Bzikadze, Pavel A Pevzner
Published in: Nature Biotechnology (2020)
Centromeric variation has been linked to cancer and infertility, but centromere sequences contain multiple tandem repeats and can only be assembled manually from long error-prone reads. Here we describe the centroFlye algorithm for centromere assembly using long error-prone reads, and apply it to assemble human centromeres on chromosomes 6 and X. Our analyses reveal putative breakpoints in the manual reconstruction of the human X centromere, demonstrate that the human X chromosome is partitioned into repeat subfamilies and provide initial insights into centromere evolution. We anticipate that centroFlye could be applied to automatically close the remaining multimegabase gaps in the reference human genome.
Keyphrases
  • endothelial cells
  • induced pluripotent stem cells
  • pluripotent stem cells
  • deep learning
  • gene expression
  • mass spectrometry
  • high resolution
  • adipose tissue
  • polycystic ovary syndrome
  • neural network