Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli.
Minseung Kim, Navneet Rai, Violeta Zorraquino, Ilias Tagkopoulos. Published in: Nature Communications (2016)
A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive metadata. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. Integrating additional layers yields an incremental increase in prediction performance, as does including information on known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 across the omics layers, far exceeding various baselines. This work provides an integrative framework for omics-driven predictive modelling that is broadly applicable to guiding biological discovery.
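The abstract reports per-layer predictive performance between 0.54 and 0.87 but does not name the metric on this page. Assuming it is a correlation between predicted and measured values, scored separately for each omics layer, the evaluation step could be sketched as below. This is a minimal illustration under that assumption, not the authors' code: the function names `pearson` and `score_layers` and all numbers are hypothetical toy values, not data from the study.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and measured values."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    if sd_p == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_o)


def score_layers(
    layers: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> Dict[str, float]:
    """Score each omics layer on its own: layer name -> (predicted, measured).

    Assumption: performance is a per-layer Pearson correlation; the abstract
    does not state the metric.
    """
    return {name: pearson(pred, obs) for name, (pred, obs) in layers.items()}


# Toy, made-up values for illustration only; NOT results from the study.
toy = {
    "transcriptome": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]),
    "growth": ([0.1, 0.4, 0.9], [0.2, 0.3, 1.0]),
}

for layer, r in score_layers(toy).items():
    print(f"{layer}: r = {r:.2f}")
```

Scoring each layer separately, rather than pooling all values, keeps layers with very different scales (for example expression levels versus growth rates) from dominating a single combined score, which matches the abstract's practice of reporting a range across layers.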