Drug sensitivity prediction from cell line-based pharmacogenomics data: guidelines for developing machine learning models.
Hossein Sharifi-Noghabi, Soheil Jahangiri-Tazehkand, Petr Smirnov, Casey Hon, Anthony Mammoliti, Sisira Kadambat Nair, Arvind Singh Mer, Martin Ester, Benjamin Haibe-Kains
Published in: Briefings in Bioinformatics (2022)
The goal of precision oncology is to tailor treatment to individual patients based on the genomic profile of their tumors. Pharmacogenomic datasets generated from cancer cell lines are among the most valuable resources for drug sensitivity prediction, a crucial task in precision oncology. Machine learning methods have been used to predict drug sensitivity from the multiple omics data types available for large panels of cancer cell lines. However, there are no comprehensive guidelines on how to properly train and validate such machine learning models for drug sensitivity prediction. In this paper, we introduce a set of guidelines covering different aspects of training gene expression-based predictors on cell line datasets. These guidelines provide an extensive analysis of the generalization of drug sensitivity predictors and challenge many current practices in the community, including the choice of training dataset and of drug sensitivity measure. Applying these guidelines in future studies will enable the development of more robust preclinical biomarkers.
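To make the notion of cross-dataset generalization concrete, here is a minimal Python sketch, not the authors' pipeline: it trains a gene expression-based predictor on one cell line panel and evaluates it on a second, independent panel. Ridge regression, per-dataset z-scoring of genes, Pearson correlation as the evaluation metric, and the simulated panels (the hypothetical `simulate_panel` helper with made-up gene effects) are all illustrative assumptions; no real pharmacogenomic data or results are involved.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def standardize_columns(X: Matrix) -> Matrix:
    """Z-score each gene (column) within one dataset; an assumed way to reduce platform offsets."""
    n, p = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0  # guard constant genes
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / sd
    return out


def fit_ridge(X: Matrix, y: Sequence[float], lam: float = 1.0) -> List[float]:
    """Closed-form ridge regression on standardized X; returns [intercept, w_1, ..., w_p]."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Normal equations: (X^T X + lam * I) w = X^T yc
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(X[i][a] * yc[i] for i in range(n)) for a in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    w = [0.0] * p
    for r in range(p - 1, -1, -1):
        w[r] = (rhs[r] - sum(A[r][c] * w[c] for c in range(r + 1, p))) / A[r][r]
    # Columns of X have mean zero, so the intercept is simply the mean response
    return [y_mean] + w


def predict(coef: List[float], X: Matrix) -> List[float]:
    return [coef[0] + sum(w * x for w, x in zip(coef[1:], row)) for row in X]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def simulate_panel(n_cells: int, true_w: List[float], noise: float,
                   shift: float, rng: random.Random):
    """Hypothetical cell line panel: expression ~ N(shift, 1); response = linear signal + noise."""
    X, y = [], []
    for _ in range(n_cells):
        row = [rng.gauss(shift, 1.0) for _ in true_w]
        signal = sum(w * (x - shift) for w, x in zip(true_w, row))
        X.append(row)
        y.append(signal + rng.gauss(0.0, noise))
    return X, y


rng = random.Random(0)
true_w = [0.8, -0.5, 0.0, 0.3, 0.0]  # hypothetical per-gene effects on drug response
X_train, y_train = simulate_panel(200, true_w, 0.5, 0.0, rng)  # training panel
X_test, y_test = simulate_panel(100, true_w, 0.5, 2.0, rng)    # external panel, shifted platform
coef = fit_ridge(standardize_columns(X_train), y_train, lam=1.0)
r_external = pearson(predict(coef, standardize_columns(X_test)), y_test)
print(f"cross-dataset Pearson r = {r_external:.2f}")
```

The external panel contributes nothing to fitting; only its own unlabeled expression statistics are used to standardize it, so the correlation estimates cross-dataset generalization rather than within-dataset fit.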