The complete human diploid reference genome of RPE-1 identifies the phased epigenetic landscapes from multi-omics data.

Emilia Volpe, Luca Corda, Elena Di Tommaso, Franca Pelliccia, Riccardo Ottalevi, Danilo Licastro, Andrea Guarracino, Mattia Capulli, Giulio Formenti, Evelyne Tassone, Simona Giunta
Published in: bioRxiv: the preprint server for biology (2023)
Comparative analysis of recent human genome assemblies highlights profound sequence divergence that peaks within polymorphic loci such as centromeres. This raises the question of whether human reference genomes are adequate for accurately analyzing sequencing data derived from experimental cell lines. Here we propose a novel approach, referred to as "isogenomic reference", that leverages a matched reference genome to perform multi-omics analyses. We have generated a new diploid genome assembly for RPE-1, a widely used non-cancer human retinal pigment epithelial cell line with a stable diploid karyotype; the assembly presents phased haplotypes and chromosome-level scaffolds that completely span centromeres. Using this assembly, we have characterized haplotype-resolved genomic variation unique to RPE-1, including a stable marker chromosome X carrying a 73.18 Mb segmental duplication of chromosome 10 translocated onto the microdeleted telomere, t(Xq;10q), which is specific to this cell line. Comparative analyses revealed sequence polymorphism within centromeric regions, including unexpected genetic and epigenetic diversity between haplotypes for all chromosomes. Using our assembly as reference, we re-analyzed both our own and publicly available sequencing, methylation and epigenetic data generated in RPE-1, which had previously been analyzed with non-matched and non-diploid reference genomes. Our results show that the isogenomic reference improves alignments, increasing mapping quality by up to 85% while halving mismatches, and produces significant changes in peak calling related to centromeres. Our work represents a proof of concept for the use of matched reference genomes in multi-omics analyses and, at scale, serves as the foundation for a call to comprehensively assemble experimentally relevant cell lines so that isogenomic reference genomes can be widely applied.
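The headline alignment metrics above, mapping quality and mismatches, are standard per-record fields in SAM output. As a minimal sketch, assuming the same reads were aligned once to a non-matched reference and once to the isogenomic assembly and exported as SAM text, one could summarize each run by mean MAPQ (column 5) and mean NM tag, then express the difference as a percent change. This is not the authors' pipeline: the function names and toy records are hypothetical, and NM is an edit distance (mismatches plus inserted and deleted bases), so it only approximates a pure mismatch count.

```python
"""Hypothetical sketch: compare alignment quality of the same reads against two references.

Assumes SAM text input: 11 tab-separated mandatory fields, then optional TAG:TYPE:VALUE fields.
NM is an edit distance (mismatches + indel bases), used here as a proxy for mismatches.
"""
from statistics import mean


def alignment_stats(sam_lines):
    """Return (mean MAPQ, mean NM) over primary, mapped records with a defined MAPQ."""
    mapqs, edits = [], []
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines and blanks
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (0x4 | 0x100 | 0x800) or mapq == 255:
            continue  # skip unmapped, secondary, supplementary, and MAPQ-unavailable records
        mapqs.append(mapq)
        # Collect integer-typed optional tags, e.g. "NM:i:3" -> {"NM": "3"}
        tags = {f[:2]: f[5:] for f in fields[11:] if f[2:5] == ":i:"}
        edits.append(int(tags.get("NM", 0)))
    if not mapqs:
        raise ValueError("no usable alignment records")
    return mean(mapqs), mean(edits)


def relative_change(new, old):
    """Percent change from old to new (positive means an increase)."""
    return 100.0 * (new - old) / old
```

Usage is to call `alignment_stats` on both SAM files and pass matching metrics to `relative_change`; a positive MAPQ change together with a negative NM change would correspond to the kind of improvement the abstract reports.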