Transcriptome Profile of the Asian Giant Hornet (Vespa mandarinia) Using Illumina HiSeq 4000 Sequencing: De Novo Assembly, Functional Annotation, and Discovery of SSR Markers.
Bharat Bhusan Patnaik, So Young Park, Se Won Kang, Hee-Ju Hwang, Tae Hun Wang, Eun Bi Park, Jong Min Chung, Dae Kwon Song, Changmu Kim, Soonok Kim, Jae Bong Lee, Heon Cheon Jeong, Hong Seog Park, Yeon Soo Han, Yong Seok Lee. Published in: International journal of genomics (2016)
Vespa mandarinia, found in the forests of East Asia including Korea, occupies the highest rank in the arthropod food web within its geographical range. It serves as a source of nutrition in the form of the Vespa amino acid mixture and is listed as a threatened species, although no conservation measures have been implemented. Here, we performed de novo assembly of the V. mandarinia transcriptome using Illumina HiSeq 4000 sequencing. Over 60 million raw reads and 59,184,811 clean reads were obtained. After assembly, a total of 66,837 unigenes were clustered, of which 40,887, 44,455, and 22,390 showed homologous matches against the PANM, UniGene, and KOG databases, respectively. A total of 15,675 unigenes were assigned to Gene Ontology terms, and 5,132 unigenes were mapped to 115 KEGG pathways. The C2H2-like zinc finger domain, the serine/threonine/dual-specificity protein kinase domain, and the RNA recognition motif domain were among the most frequently predicted InterProScan domains in V. mandarinia sequences. Among the unigenes, we identified 534,922 cDNA simple sequence repeats (SSRs) as potential markers. This is the first transcriptomic analysis of the wasp V. mandarinia using Illumina HiSeq 4000. The resulting datasets should aid the search for new genes that help explain the physiological attributes of this wasp.
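The abstract does not say how the cDNA SSRs were detected or which thresholds were applied. As a rough illustration only, the sketch below scans a nucleotide sequence for perfect (uninterrupted) mono- to hexanucleotide repeats. The minimum repeat counts in `MIN_REPEATS` and the names `SSR`, `is_primitive`, and `find_ssrs` are hypothetical assumptions chosen for the example; they are not the pipeline or the criteria used in this study.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical minimum repeat counts (motif length -> minimum copies).
# Illustrative assumption only; the study's actual criteria are not given here.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass
class SSR:
    start: int    # 0-based start position in the sequence
    motif: str    # repeat unit
    repeats: int  # number of consecutive copies

    @property
    def end(self) -> int:
        # Exclusive end coordinate of the repeat tract.
        return self.start + len(self.motif) * self.repeats


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    for d in range(1, k):
        if k % d == 0 and motif[:d] * (k // d) == motif:
            return False
    return True


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[SSR]:
    """Scan left to right and report perfect SSRs, keeping the longest tract at each start."""
    seq = seq.upper()
    found: List[SSR] = []
    i = 0
    while i < len(seq):
        best = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated motifs, compound motifs, and non-ACGT characters.
            if len(motif) < k or not is_primitive(motif) or not set(motif) <= set("ACGT"):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                candidate = SSR(i, motif, count)
                if best is None or candidate.end > best.end:
                    best = candidate
        if best is not None:
            found.append(best)
            i = best.end  # resume scanning after the reported tract
        else:
            i += 1
    return found


if __name__ == "__main__":
    toy = "GG" + "AT" * 7 + "CCC" + "GAA" * 5 + "T"
    for ssr in find_ssrs(toy):
        print(ssr.start, ssr.motif, ssr.repeats)
```

On the toy sequence this reports an (AT)7 tract at position 2 and a (GAA)5 tract at position 19. Rejecting non-primitive motifs keeps a run such as ATATAT from also being counted as a tetranucleotide or hexanucleotide repeat; real marker-discovery tools additionally handle compound and imperfect repeats, which this sketch ignores.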