Comprehensive Stress-Based De Novo Transcriptome Assembly and Annotation of Guar (Cyamopsis tetragonoloba (L.) Taub.): An Important Industrial and Forage Crop.
Fahad Al-Qurainy, Aref Alshameri, Abdel-Rhman Zakaria Gaafar, Salim Khan, Mohammad Nadeem, Abdulhafed Abdullah Alameri, Mohamed Tarroum, Muhammad Ashraf
Published in: International Journal of Genomics (2019)
The forage crop Guar (Cyamopsis tetragonoloba (L.) Taub.) can tolerate heat, drought, and mild salinity. A complete picture of its genic architecture will improve our understanding of gene expression networks and of the different tolerance mechanisms at the molecular level. Therefore, a whole-mRNA sequencing approach was applied to Guar to provide a snapshot of cellular mRNA under salinity, heat, and drought stresses, for integration with previous transcriptomic studies. RNA-Seq was performed as 2 × 100 bp paired-end sequencing on an Illumina HiSeq 2500 platform, using the leaf transcriptome of C. tetragonoloba under normal, heat, drought, and salinity conditions. Trinity was used for de novo assembly, followed by gene annotation, functional classification, metabolic pathway analysis, and identification of simple sequence repeat (SSR) markers. A total of 218.2 million paired-end raw reads (~44 Gbp) were generated. Of these, 193.5 million high-quality paired-end reads were used to reconstruct 161,058 transcripts (~266 Mbp) with an N50 of 2552 bp, representing 61,508 putative genes. In total, 6463 proteins had >90% full-length coverage against the Swiss-Prot database, and 94% complete orthologs were found against Embryophyta. Approximately 62.87% of transcripts had BLAST hits, 50.46% were mapped, and 43.50% were annotated. A total of 4715 InterProScan families, 3441 domains, 74 repeats, and 490 sites were detected. Biological process, molecular function, and cellular component terms accounted for 64.12%, 25.42%, and 10.4% of the Gene Ontology annotations, respectively. The transcriptome was associated with 985 enzymes and 156 KEGG pathways. A total of 27,066 SSRs were identified, with an average frequency of one SSR per 9.825 kb of assembled transcripts. These data will support further analysis of multi-stress tolerance in Guar.
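Two of the summary statistics reported above, the assembly N50 and the SSR frequency, are easy to recompute from an assembly. The following Python sketch shows both, together with a simple scanner for perfect SSRs. It is not the authors' pipeline: the function names (`n50`, `is_primitive`, `find_ssrs`, `ssr_frequency_kb`) are hypothetical, and the minimum repeat counts in `MIN_REPEATS` are an assumption modelled on thresholds commonly used with MISA, since the study's own SSR criteria are not given in this abstract.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# ASSUMPTION: minimum repeat counts per motif length (MISA-style thresholds);
# the study's own SSR criteria are not stated in the abstract.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def n50(lengths: Iterable[int]) -> int:
    """Length L such that transcripts of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Greedy left-to-right scan for perfect SSRs; returns (start, motif, repeats)."""
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i < len(seq):
        best: Optional[Tuple[int, str, int]] = None
        for k, min_rep in MIN_REPEATS.items():
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases (e.g. N), and non-primitive motifs.
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            # Keep the qualifying motif that covers the most bases at this position.
            if reps >= min_rep and (best is None or reps * k > best[2] * len(best[1])):
                best = (i, motif, reps)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])  # jump past the reported repeat
        else:
            i += 1
    return hits


def ssr_frequency_kb(total_bp: int, n_ssrs: int) -> float:
    """Average spacing, in kb, between SSRs across the assembled transcripts."""
    return total_bp / n_ssrs / 1000


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))               # 8
    print(find_ssrs("GATATATATATATG"))                     # [(1, 'AT', 6)]
    print(f"{ssr_frequency_kb(266_000_000, 27_066):.2f}")  # 9.83
```

With the abstract's rounded figures (~266 Mbp of transcripts, 27,066 SSRs), the spacing works out to about 9.83 kb per SSR, consistent with the reported 9.825 kb; the small gap comes from rounding the assembly size. The scanner reports only perfect, non-overlapping repeats and does not merge compound SSRs, which dedicated tools handle separately.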