Interspecific comparison of gene expression profiles using machine learning.
Artem S Kasianov, Anna V Klepikova, Alexey V Mayorov, Gleb S Buzanov, Maria D Logacheva, Aleksey A Penin
Published in: PLoS Computational Biology (2023)
Interspecific gene comparisons are a keystone of many areas of biological research and are especially important for translating knowledge from model organisms to economically important species. Currently, they are hampered by the low resolution of sequence-based methods and by the complex evolutionary history of eukaryotic genes. The problem is especially acute in plants, whose genomes have been shaped by multiple whole-genome duplications and subsequent gene loss, and it calls for new methods of comparing gene function across species. Here, we report ISEEML (Interspecific Similarity of Expression Evaluated using Machine Learning), a novel machine learning-based algorithm for interspecific gene classification. In contrast to previous studies focused on sequence similarity, our algorithm focuses on functional similarity inferred from the comparison of gene expression profiles. We propose a novel metric of expression pattern similarity, the expression score (ES), that is suitable for species with differing morphologies. As a proof of concept, we compare detailed transcriptome maps of the model species Arabidopsis thaliana, Zea mays (maize), and Fagopyrum esculentum (common buckwheat), which represent distant clades within flowering plants. The classifier achieved an AUC of 0.91; at an ES threshold of 0.5, the specificity was 94% and the sensitivity was 72%.
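For readers unfamiliar with the reported metrics, the following is a minimal Python sketch of how sensitivity, specificity, and AUC are conventionally computed from a classifier's scores. It is not the authors' code or data: the function names, the toy scores and labels, and the choice to call a gene pair positive when its score is greater than or equal to the threshold are all assumptions made for illustration.

```python
# Illustrative only: toy scores and labels, not data from the study.

def sensitivity_specificity(scores, labels, threshold=0.5):
    """Return (sensitivity, specificity) when score >= threshold is called positive.

    The ">=" convention is an assumption; the abstract does not state it.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if label and predicted:
            tp += 1
        elif label and not predicted:
            fn += 1
        elif not label and predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive example has the higher score (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for eight gene pairs; 1 = true match, 0 = non-match.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(toy_scores, toy_labels):.4f}")
```

Raising the threshold typically trades sensitivity for specificity, which is why the abstract reports both values at one fixed threshold alongside the threshold-independent AUC.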
Keyphrases
- genome wide
- machine learning
- genome wide identification
- copy number
- poor prognosis
- arabidopsis thaliana
- dna methylation
- deep learning
- healthcare
- genome wide analysis
- gene expression
- transcription factor
- magnetic resonance
- lymph node
- artificial intelligence
- computed tomography
- genetic diversity
- big data
- long non coding rna
- gram negative