Statistical characteristics of amino acid covariance as possible descriptors of viral genomic complexity.
C K Sruthi, Meher K Prakash
Published in: Scientific Reports (2019)
At the sequence level, it is hard to describe the complexity of viruses that allows them to challenge the host immune system, some for a few weeks and others up to a complete compromise. Paradoxically, viral genomes are both complex and simple. They are complex because amino acid mutation rates are very high, and yet viruses remain functional. They are simple because they have only around 10 types of proteins, so viral protein-protein interaction networks are not insightful. In this work we use fine-grained amino acid level information and its evolutionary characteristics, obtained from large-scale genomic data, to develop a statistical panel, towards the goal of developing quantitative descriptors for the biological complexity of viruses. Networks were constructed from pairwise covariation of amino acids and were statistically analyzed. Three differentiating factors arise: whether the covariance relations are predominantly intra- or inter-protein, the nature of the node degree distribution, and the network density. Interestingly, the covariance relations were primarily intra-protein in avian influenza and inter-protein in HIV. The degree distributions showed two universality classes: a power law with exponent -1 in HIV and avian influenza, and random behavior in human flu and dengue. The calculated covariance network density correlates well with the mortality strengths of viruses on the viral-Richter scale. These observations suggest the potential utility of the statistical metrics for describing the covariance patterns in viruses. Our host-virus interaction analysis points to the possibility that host proteins which can interact with multiple viral proteins may be responsible for shaping the inter-protein covariance relations. With the available data, it appears that network density might be a surrogate for the virus Richter scale; however, the hypothesis needs re-examination when large-scale complete genome data for more viruses become available.
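The three network descriptors named in the abstract (intra- vs inter-protein share, node degree distribution, and network density) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that covarying residue pairs have already been identified, that each residue is labelled as a hypothetical `(protein, position)` tuple, and that density is the standard undirected graph density 2E / (N(N-1)) computed over residues that appear in at least one pair. The function name `network_stats` and the toy edge list are illustrative only and contain no real data.

```python
from collections import Counter


def network_stats(edges):
    """Summarize a covariance network.

    edges: iterable of (node_a, node_b) pairs, where each node is a
    hypothetical (protein_name, residue_position) tuple.
    """
    # Treat edges as undirected: drop self-loops and duplicate pairs.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}

    degree = Counter()
    intra = 0
    for edge in unique:
        a, b = tuple(edge)
        degree[a] += 1
        degree[b] += 1
        if a[0] == b[0]:  # both residues belong to the same protein
            intra += 1

    n, m = len(degree), len(unique)
    # Assumed definition: standard density of an undirected simple graph.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "intra_fraction": intra / m if m else 0.0,
        # Maps degree k to the number of nodes with that degree.
        "degree_histogram": dict(Counter(degree.values())),
    }


# Toy input (made up for illustration, not taken from the paper).
toy_edges = [
    (("HA", 10), ("HA", 25)),
    (("HA", 10), ("NA", 7)),
    (("NA", 7), ("HA", 25)),
    (("PB2", 3), ("HA", 10)),
]
stats = network_stats(toy_edges)
print(stats)
```

On real data, the degree histogram would be the starting point for checking whether the distribution looks like a power law or like that of a random network; that fitting step is not shown here.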
Keyphrases
- amino acid
- protein protein
- sars cov
- small molecule
- antiretroviral therapy
- electronic health record
- big data
- hepatitis c virus
- hiv infected
- human immunodeficiency virus
- genetic diversity
- hiv testing
- endothelial cells
- hiv aids
- high resolution
- lymph node
- cardiovascular events
- gene expression
- healthcare
- copy number
- wastewater treatment
- cardiovascular disease
- men who have sex with men
- risk assessment
- risk factors
- artificial intelligence
- aedes aegypti
- deep learning
- coronary artery disease
- computed tomography
- social media
- binding protein
- mass spectrometry
- data analysis
- human health
- gestational age
- neural network