Tissue-guided LASSO for prediction of clinical drug response using preclinical samples.
Edward W Huang, Ameya Bhope, Jing Lim, Saurabh Sinha, Amin Emad
Published in: PLoS computational biology (2020)
Prediction of clinical drug response (CDR) of cancer patients, based on their clinical and molecular profiles obtained prior to administration of the drug, can play a significant role in individualized medicine. Machine learning models have the potential to address this issue but training them requires data from a large number of patients treated with each drug, limiting their feasibility. While large databases of drug response and molecular profiles of preclinical in-vitro cancer cell lines (CCLs) exist for many drugs, it is unclear whether preclinical samples can be used to predict CDR of real patients. We designed a systematic approach to evaluate how well different algorithms, trained on gene expression and drug response of CCLs, can predict CDR of patients. Using data from two large databases, we evaluated various linear and non-linear algorithms, some of which utilized information on gene interactions. Then, we developed a new algorithm called TG-LASSO that explicitly integrates information on samples' tissue of origin with gene expression profiles to improve prediction performance. Our results showed that regularized regression methods provide better prediction performance. However, including the network information or common methods of including information on the tissue of origin did not improve the results. On the other hand, TG-LASSO improved the predictions and distinguished resistant and sensitive patients for 7 out of 13 drugs. Additionally, TG-LASSO identified genes associated with the drug response, including known targets and pathways involved in the drugs' mechanism of action. Moreover, genes identified by TG-LASSO for multiple drugs in a tissue were associated with patient survival. In summary, our analysis suggests that preclinical samples can be used to predict CDR of patients and identify biomarkers of drug sensitivity and survival.
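To make the idea of tissue-guided regularized regression concrete, here is a minimal sketch in pure Python. The abstract does not say how TG-LASSO integrates tissue of origin, so the scheme below is an assumption made for illustration only: the LASSO penalty `alpha` is chosen by training on cell lines from other tissues and validating on cell lines from the patient's tissue, and the final model is then refit on all cell lines. The function names (`lasso_fit`, `predict`, `tissue_guided_lasso`) and the data layout (rows are samples, columns are genes) are hypothetical and not taken from the paper.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO with an unpenalized intercept.

    Minimizes (1 / 2n) * sum((y - X w - b)^2) + alpha * sum(|w|).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    resid = [y[i] - b for i in range(n)]  # residuals y - X w - b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero feature carries no signal
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
        shift = sum(resid) / n  # re-center residuals by updating the intercept
        b += shift
        resid = [r - shift for r in resid]
    return w, b


def predict(x, w, b):
    """Predicted drug response for one expression profile x."""
    return b + sum(xj * wj for xj, wj in zip(x, w))


def tissue_guided_lasso(X, y, tissues, target_tissue, alphas):
    """ASSUMED scheme (not confirmed by the abstract): pick alpha using
    cell lines of the target tissue as validation, then refit on all."""
    train = [i for i, t in enumerate(tissues) if t != target_tissue]
    valid = [i for i, t in enumerate(tissues) if t == target_tissue]
    best_alpha, best_mse = None, float("inf")
    for alpha in alphas:
        w, b = lasso_fit([X[i] for i in train], [y[i] for i in train], alpha)
        mse = sum((predict(X[i], w, b) - y[i]) ** 2 for i in valid) / len(valid)
        if mse < best_mse:
            best_alpha, best_mse = alpha, mse
    w, b = lasso_fit(X, y, best_alpha)  # final model uses every cell line
    return w, b, best_alpha
```

In this sketch, a patient's predicted response would be `predict(patient_expression, w, b)`, where `w` and `b` come from calling `tissue_guided_lasso` with the patient's tissue as `target_tissue`. Genes with nonzero entries in `w` are the ones the model selected, which is the sense in which a LASSO-style model can point to candidate response-associated genes.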
Keyphrases
- machine learning
- end stage renal disease
- ejection fraction
- gene expression
- newly diagnosed
- chronic kidney disease
- peritoneal dialysis
- prognostic factors
- adverse drug
- drug induced
- genome wide
- big data
- cell therapy
- health information
- dna methylation
- case report
- mesenchymal stem cells
- risk assessment
- patient reported
- electronic health record
- body composition
- genome wide identification