Complex Analysis of Retroposed Genes' Contribution to Human Genome, Proteome and Transcriptome.
Magdalena Regina Kubiak, Michał Wojciech Szcześniak, Izabela Makałowska
Published in: Genes (2020)
Gene duplication is a major driver of organismal evolution. One of the main mechanisms of gene duplication is retroposition, a process in which mRNA is first reverse-transcribed into DNA and then reintegrated into the genome. Most gene retrocopies lack regulatory regions. Nevertheless, the number of known functional retrogenes is rapidly increasing. Their functions arise from the gain of new spatio-temporal expression patterns, imposed by the genomic sequence surrounding the inserted cDNA and/or by selectively advantageous mutations, which may lead to a switch from protein coding to regulatory RNA. As recent studies have shown, these genes may give rise to new protein domains through fusion with other genes, new regulatory RNAs, or other regulatory elements. We utilized existing data from high-throughput technologies to create a complex description of retrogene functionality. Our analysis led to the identification of human retroposed genes that substantially contributed to the transcriptome and proteome. These retrocopies demonstrated the potential to encode proteins or short peptides, act as cis- and trans-natural antisense transcripts (NATs), regulate their progenitors' expression by competing for the same microRNAs, and provide sequence to lncRNAs and novel exons to existing protein-coding genes. Our study also revealed that retrocopies, similarly to retrotransposons, may act as recombination hot spots. To the best of our knowledge, this is the first complex analysis of these functions of retrocopies.
Keyphrases
- genome wide
- genome wide identification
- transcription factor
- dna methylation
- copy number
- bioinformatics analysis
- binding protein
- genome wide analysis
- high throughput
- endothelial cells
- poor prognosis
- single cell
- rna seq
- healthcare
- dna damage
- protein protein
- long non coding rna
- gene expression
- induced pluripotent stem cells
- electronic health record
- long noncoding rna
- cell free
- data analysis
- human health