Genes divided according to the relative position of the longest intron show increased representation in different KEGG pathways.
Pavel Dvořák, Viktor Hlavac, Vojtech Hanicinec, Bhavana Hemantha Rao, Pavel Souček
Published in: BMC Genomics (2024)
Although introns impose an energy and time burden on eukaryotic cells, they play an irreplaceable role in the diversification and regulation of protein production. A common feature of eukaryotic genomes, as previously reported, is that in protein-coding genes the longest intron is usually one of the first introns. The goal of our work was to determine whether genes that share this common feature differ in biological function from genes that do not. Data on the lengths of all introns in each gene were extracted from the genomes of six vertebrates (human, mouse, koala, chicken, zebrafish and fugu) and two other model organisms (the nematode worm and Arabidopsis). We showed that in more than 40% of protein-coding genes, the longest intron lies in the second or third tertile of all introns by relative position. Genes divided according to the relative position of the longest intron were found to be significantly enriched in different KEGG pathways. Genes with the longest intron in the first tertile predominate in a range of pathways for amino acid and lipid metabolism, various signaling pathways, cell junctions and ABC transporters. Genes with the longest intron in the second or third tertile show increased representation in pathways associated with the formation and function of the spliceosome and ribosomes. Between the two groups of genes defined in this way, we further demonstrated differences in the length of the longest introns and in the distribution of their absolute positions. We also pointed out other characteristics, namely a positive correlation between the length of the longest intron and the sum of the lengths of all other introns in the gene, and the conservation of the exact same absolute and relative position of the longest intron between orthologous genes.
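The tertile classification and the longest-versus-rest comparison described in the abstract can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' pipeline: introns are assumed to be listed 5' to 3', the relative position of the longest intron is taken as index / (n - 1), tertile boundaries are placed at 1/3 and 2/3 (inclusive on the upper side), ties go to the most 5' intron, and single-intron genes are rejected. The function names and toy intron lengths are hypothetical, and Pearson's coefficient is used only for illustration, since the abstract does not state which correlation measure was applied.

```python
import math
from typing import Sequence


def longest_intron_tertile(intron_lengths: Sequence[int]) -> int:
    """Return 1, 2 or 3: the tertile holding the longest intron.

    Hypothetical definition (not taken from the paper):
    - introns are listed 5' -> 3';
    - relative position = index / (n - 1), so first intron = 0, last = 1;
    - tertile 1 = [0, 1/3], tertile 2 = (1/3, 2/3], tertile 3 = (2/3, 1];
    - ties go to the most 5' intron; single-intron genes are rejected.
    """
    n = len(intron_lengths)
    if n < 2:
        raise ValueError("need at least two introns to define a relative position")
    # max() returns the first maximal index, i.e. the most 5' longest intron
    idx = max(range(n), key=lambda i: intron_lengths[i])
    # Integer comparisons avoid floating-point error exactly at 1/3 and 2/3
    if 3 * idx <= n - 1:
        return 1
    if 3 * idx <= 2 * (n - 1):
        return 2
    return 3


def longest_vs_rest(intron_lengths: Sequence[int]) -> tuple[int, int]:
    """Return (length of the longest intron, summed length of all other introns)."""
    longest = max(intron_lengths)
    return longest, sum(intron_lengths) - longest


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (illustrative choice of measure).

    Raises ZeroDivisionError if either input is constant.
    """
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy genes: intron lengths in bp, listed 5' -> 3'
    genes = {
        "geneA": [5000, 800, 300, 250],     # longest is first  -> tertile 1
        "geneB": [400, 2000, 350],          # longest is middle -> tertile 2
        "geneC": [2000, 3000, 2500, 9000],  # longest is last   -> tertile 3
    }
    for name, introns in genes.items():
        print(name, longest_intron_tertile(introns))

    # Correlate longest-intron length with the summed length of the others
    pairs = [longest_vs_rest(v) for v in genes.values()]
    r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    print(f"Pearson r (longest vs. rest): {r:.3f}")
```

Integer comparisons are used instead of floating-point fractions so that a longest intron sitting exactly on a boundary (for example the second of four introns, at relative position 1/3) is assigned deterministically; under the inclusive boundaries assumed here it falls in tertile 1. A different boundary convention would move such genes between groups, so the thresholds should follow whatever definition the study actually used.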