The sbv IMPROVER Systems Toxicology Computational Challenge: Identification of Human and Species-Independent Blood Response Markers as Predictors of Smoking Exposure and Cessation Status.
Vincenzo Belcastro, Carine Poussin, Yang Xiang, Maurizio Giordano, Kumar Parijat Tripathi, Akash Boda, Stéphanie Boué, Mario Guarracino, Florian Martin, Manuel C Peitsch, Julia Hoeng, Roberto Romero, Adi L Tarca, Zhongqu Duan, Hao Yang, Xiaofeng Gong, Peixuan Wang, Chenfang Zhang, Wenxin Yang, Omer Sinan Sarac, Ismail Bilgen, Ali Tugrul Balci, Rahul Kumar, Sandeep Kumar Dhanda
Published in: Computational toxicology (Amsterdam, Netherlands) (2017)
Cigarette smoking entails chronic exposure to a mixture of harmful chemicals that trigger molecular changes over time, and is known to increase the risk of developing diseases. Risk assessment in the context of 21st century toxicology relies on the elucidation of mechanisms of toxicity and the identification of exposure response markers, usually from high-throughput data, using advanced computational methodologies. The sbv IMPROVER Systems Toxicology computational challenge (Fall 2015 to Spring 2016) aimed to evaluate whether robust and sparse (≤40 genes) human (sub-challenge 1, SC1) and species-independent (sub-challenge 2, SC2) exposure response markers (so-called gene signatures) could be extracted from human and mouse blood transcriptomics data of current (S), former (FS) and never (NS) smoke-exposed subjects as predictors of smoking and cessation status. Best-performing computational methods were identified by scoring anonymized participants' predictions. Worldwide participation resulted in 12 (SC1) and six (SC2) final submissions that qualified for scoring. The results showed that blood gene expression data were informative for predicting smoking exposure status (i.e., discriminating current smokers from former or never smokers) in humans and across species with a high level of accuracy. By contrast, the prediction of cessation status (i.e., distinguishing FS from NS) remained challenging, as reflected by lower classification performance. Participants successfully developed inductive predictive models and extracted human and species-independent gene signatures, including genes with high consensus across teams. Post-challenge analyses highlighted "feature selection" as a key step in the process of building a classifier and confirmed the importance of testing a gene signature in independent cohorts to ensure the generalized applicability of a predictive model at a population-based level.
In conclusion, the Systems Toxicology challenge demonstrated the feasibility of extracting a consistent blood-based smoke exposure response gene signature and further stressed the importance of independent and unbiased data and method evaluations to provide confidence in systems toxicology-based scientific conclusions.
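The workflow the abstract describes (select a sparse signature of at most 40 genes, build a classifier on it, then test it in an independent cohort) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not any participant's method: the data are synthetic, the problem is reduced to two classes (exposed versus not exposed), genes are ranked by a simple standardized mean difference, and the classifier is a nearest-centroid rule. The function names and parameters are hypothetical.

```python
import random
import statistics

MAX_GENES = 40  # sparsity limit stated in the challenge description


def make_cohort(rng, n_per_class, n_genes=200, n_informative=10, shift=2.0):
    """Simulate expression profiles (synthetic). Label 1 = exposed, 0 = not exposed."""
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            profile = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                # Only the first n_informative genes respond to exposure.
                for g in range(n_informative):
                    profile[g] += shift
            samples.append(profile)
            labels.append(label)
    return samples, labels


def select_signature(samples, labels, max_genes=MAX_GENES):
    """Feature selection: keep the max_genes genes with the largest
    standardized difference in mean expression between the two classes."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / pooled_sd, g))
    scores.sort(reverse=True)
    return sorted(g for _, g in scores[:max_genes])


def fit_centroids(samples, labels, signature):
    """Compute the per-class mean profile restricted to the signature genes."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = [statistics.mean(r[g] for r in rows) for g in signature]
    return centroids


def predict(profile, signature, centroids):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((profile[g] - c) ** 2 for g, c in zip(signature, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


rng = random.Random(0)
train_x, train_y = make_cohort(rng, n_per_class=40)  # training cohort
test_x, test_y = make_cohort(rng, n_per_class=20)    # independent test cohort

# The signature and centroids are derived from the training cohort only.
signature = select_signature(train_x, train_y)
centroids = fit_centroids(train_x, train_y, signature)

predictions = [predict(p, signature, centroids) for p in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"signature size: {len(signature)}, independent-cohort accuracy: {accuracy:.2f}")
```

The signature is selected on the training cohort alone and then scored on samples that played no part in selection, which is the kind of independent evaluation the abstract emphasizes. A closer analogue of the challenge would need a three-class setup (S, FS, NS) and real transcriptomics data.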
Keyphrases
- endothelial cells
- genome wide
- gene expression
- induced pluripotent stem cells
- risk assessment
- high throughput
- electronic health record
- copy number
- machine learning
- big data
- single cell
- computed tomography
- physical activity
- magnetic resonance
- oxidative stress
- transcription factor
- deep learning
- zika virus
- contrast enhanced