Construction of a reference transcriptome for the analysis of male sterility in sugi (Cryptomeria japonica D. Don) focusing on MALE STERILITY 1 (MS1).
Fu-Jin Wei, Saneyoshi Ueno, Tokuko Ujino-Ihara, Maki Saito, Yoshihiko Tsumura, Yuumi Higuchi, Satoko Hirayama, Junji Iwai, Tetsuji Hakamata, Yoshinari Moriguchi
Published in: PLoS ONE (2021)
Sugi (Cryptomeria japonica D. Don) is an important conifer used for afforestation in Japan. At 11 Gbp, the genome of this species is too large to assemble within a short timeframe. Transcriptomics is one approach that can address this deficiency. Here we designed a three-stage workflow to assemble the transcriptome de novo using Oases and Trinity. The three stages were independent assembly, automatic and semi-manual integration, and refinement by filtering out potential contamination. We identified a set of 49,795 cDNAs and an equal number of translated proteins. According to the benchmark set by BUSCO, 87.01% of the cDNAs identified were complete genes, and 78.47% were complete and single-copy genes. Compared with other full-length cDNA resources obtained with Sanger and PacBio sequencers, our dataset had the highest coverage, indicating that these data are suitable for further studies. A comparison of two tissue-specific libraries showed significant expression differences between the male strobili set and the leaf and bark set. Moreover, subtle expression differences between male-fertile and male-sterile libraries were detected. Orthologous genes from model plants and other conifer species were identified. We demonstrated that our transcriptome assembly output (CJ3006NRE) can serve as a reference transcriptome for future functional genomics and evolutionary biology studies.
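As a small worked illustration of the BUSCO figures quoted above, the following Python sketch derives two shares that the abstract does not state directly. It assumes the usual BUSCO convention that the "complete" category is the sum of "complete and single-copy" and "complete and duplicated"; the function names are hypothetical and the derived values are simple arithmetic on the reported percentages, not results taken from the paper.

```python
def duplicated_pct(complete_pct: float, single_copy_pct: float) -> float:
    """Complete-and-duplicated share, assuming complete = single-copy + duplicated."""
    if single_copy_pct > complete_pct:
        raise ValueError("single-copy share cannot exceed the complete share")
    return round(complete_pct - single_copy_pct, 2)


def not_complete_pct(complete_pct: float) -> float:
    """Share that is not complete (fragmented plus missing, taken together)."""
    return round(100.0 - complete_pct, 2)


# Percentages reported in the abstract.
complete = 87.01
single_copy = 78.47

duplicated = duplicated_pct(complete, single_copy)
not_complete = not_complete_pct(complete)

print(f"complete and duplicated: {duplicated}%")
print(f"fragmented or missing:   {not_complete}%")
```

Under that assumption, roughly 8.54% of the benchmark genes would be complete but duplicated and 12.99% would be fragmented or missing.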