Discovery and Analyses of Caulimovirid-like Sequences in Upland Cotton (Gossypium hirsutum).
Nina Aboughanem-Sabanadzovic, Thomas W Allen, James Frelichowski, Jodi A Scheffler, Sead Sabanadzovic
Published in: Viruses (2023)
Analyses of Illumina-based high-throughput sequencing data generated during characterization of the cotton leafroll dwarf virus population in Mississippi (2020-2022) consistently yielded contigs varying in size (most frequently from 4 to 7 kb) with identical nucleotide content and sharing similarities with reverse transcriptases (RTases) encoded by extant plant pararetroviruses (family Caulimoviridae). These initial data prompted an in-depth study involving molecular and bioinformatic approaches to characterize the nature and origins of these caulimovirid-like sequences. Here, we report on endogenous viral elements (EVEs) related to extant members of the family Caulimoviridae and integrated into the genome of upland cotton (Gossypium hirsutum), for which we propose the provisional name "endogenous cotton pararetroviral elements" (eCPRVE). Our investigations pinpointed a ~15 kbp-long locus on the A04 chromosome consisting of head-to-head oriented tandem copies located on the positive- and negative-sense DNA strands (eCPRVE+ and eCPRVE-). Sequences of the eCPRVE+ comprised nearly complete and slightly decayed genome information, including ORFs coding for the viral movement protein (MP), coat protein (CP), RTase, and transactivator/viroplasm protein (TA). Phylogenetic analyses of major viral proteins suggest that the eCPRVE+ may have been initially derived from a genome of a cognate virus belonging to a putative new genus within the family. Unexpectedly, an identical 15 kb-long locus composed of two eCPRVE copies was also detected in a newly recognized species, G. ekmanianum, shedding some light on the relatively recent evolution within the cotton family.