Assembled and annotated 26.5 Gbp coast redwood genome: a resource for estimating evolutionary adaptive potential and investigating hexaploid origin.
David B Neale, Aleksey V Zimin, Sumaira Zaman, Alison D Scott, Bikash Shrestha, Rachael E Workman, Daniela Puiu, Brian J Allen, Zane J Moore, Manoj K Sekhwal, Amanda R De La Torre, Patrick E McGuire, Emily Burns, Winston Timp, Jill L Wegrzyn, Steven L Salzberg
Published in: G3 (Bethesda, Md.) (2022)
Sequencing, assembly, and annotation of the 26.5 Gbp hexaploid genome of coast redwood (Sequoia sempervirens) were completed, a step toward discovering genes related to climate adaptation and investigating the origin of the hexaploid genome. Deep-coverage short-read Illumina sequencing data from haploid tissue of a single seed were combined with long-read Oxford Nanopore Technologies sequencing data from diploid needle tissue to create an initial assembly, which was then scaffolded using proximity ligation data to produce a highly contiguous final assembly, SESE 2.1, with a scaffold N50 size of 44.9 Mbp. The assembly included several scaffolds that span entire chromosome arms, as confirmed by the presence of telomere and centromere sequences on the ends of the scaffolds. The structural annotation produced 118,906 genes, 113 of which contain introns that exceed 500 Kbp in length, with one intron reaching 2 Mbp. Nearly 19 Gbp of the genome consisted of repetitive content, the vast majority characterized as long terminal repeats, with a 2.9:1 ratio of Copia to Gypsy elements that may aid in gene expression control. Comparison of coast redwood to other conifers revealed species-specific expansions of many abiotic and biotic stress response genes, including those involved in fungal disease resistance, detoxification, and physical injury/structural remodeling, and others supporting flavonoid biosynthesis. Analysis of multiple genes that exist in triplicate in coast redwood but only once in its diploid relative, giant sequoia, supports a previous hypothesis that the hexaploidy is the result of autopolyploidy rather than hybridization with separate but closely related conifer species.
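For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how N50 is conventionally computed: the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy scaffold lengths are illustrative assumptions; they are not taken from the SESE 2.1 assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mbp (toy data, not from the paper).
toy_scaffolds_mbp = [80, 70, 50, 30, 20, 10]
print(n50(toy_scaffolds_mbp))  # prints 70
```

In this toy case the total is 260 Mbp, and the two longest scaffolds (80 + 70 = 150 Mbp) are the first to cover at least half of it, so the N50 is 70 Mbp. By the same definition, a scaffold N50 of 44.9 Mbp means that at least half of the assembled sequence lies in scaffolds of 44.9 Mbp or longer.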