MSIpred: a python package for tumor microsatellite instability classification from tumor mutation annotation data using a support vector machine.
Chen Wang, Chun Liang. Published in: Scientific Reports (2018)
Microsatellite instability (MSI) is characterized by a high degree of polymorphism in microsatellite lengths caused by deficiency in the mismatch repair (MMR) system. MSI is associated with several tumor types, and its status can be considered an important indicator for patient prognosis. Conventional clinical diagnosis of MSI examines PCR products of a panel of microsatellite markers using electrophoresis (MSI-PCR), which is laborious, costly, and time-consuming. We developed MSIpred, a Python package for automatic MSI classification using a support vector machine (SVM), a machine learning method. MSIpred computes 22 features characterizing tumor somatic mutational load from mutation data in mutation annotation format (MAF) generated from paired tumor-normal exome sequencing data, and then uses these features to predict tumor MSI status with an SVM classifier trained on MAF data of 1074 tumors belonging to four types. Evaluation of MSIpred on an independent testing set, MAF data of another 358 tumors, achieved an overall accuracy of ≥98% and an area under the receiver operating characteristic (ROC) curve of 0.967. Further analysis of discrepant cases revealed that the discrepancies were partially due to misclassification by MSI-PCR. Additional testing of MSIpred on non-TCGA data also validated its good classification performance. These results indicate that MSIpred is a robust pan-tumor MSI classification tool and can serve as a complementary diagnostic to MSI-PCR in MSI diagnosis.
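To make the idea of "features characterizing tumor somatic mutational load" concrete, the following is a minimal sketch of how per-tumor mutation rates might be derived from MAF records. It is not the MSIpred API and does not reproduce the package's 22 features: the function name `mutational_load_features`, the four illustrative features, and the exome size are all assumptions made for this example. Only the column names `Tumor_Sample_Barcode` and `Variant_Type` come from the standard MAF layout.

```python
import csv
import io
from collections import defaultdict

# Assumed exome size in megabases; the real value depends on the capture kit.
EXOME_SIZE_MB = 44.0

# MAF Variant_Type values treated as insertions/deletions.
INDEL_TYPES = {"INS", "DEL"}


def mutational_load_features(maf_text, exome_size_mb=EXOME_SIZE_MB):
    """Hypothetical helper: map each tumor barcode to a few load features."""
    counts = defaultdict(lambda: {"total": 0, "snp": 0, "indel": 0})

    # MAF files may begin with '#' comment lines (e.g. a version header).
    lines = [ln for ln in io.StringIO(maf_text) if not ln.startswith("#")]

    for row in csv.DictReader(lines, delimiter="\t"):
        c = counts[row["Tumor_Sample_Barcode"]]
        c["total"] += 1
        if row["Variant_Type"] == "SNP":
            c["snp"] += 1
        elif row["Variant_Type"] in INDEL_TYPES:
            c["indel"] += 1

    features = {}
    for tumor, c in counts.items():
        features[tumor] = {
            "total_per_mb": c["total"] / exome_size_mb,
            "snp_per_mb": c["snp"] / exome_size_mb,
            "indel_per_mb": c["indel"] / exome_size_mb,
            # Every tumor in `counts` has at least one record, so total > 0.
            "indel_fraction": c["indel"] / c["total"],
        }
    return features


# Tiny made-up MAF excerpt, for illustration only.
example_maf = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Type\tTumor_Sample_Barcode\n"
    "GENE_A\tSNP\tTUMOR_1\n"
    "GENE_B\tDEL\tTUMOR_1\n"
    "GENE_C\tINS\tTUMOR_1\n"
    "GENE_D\tSNP\tTUMOR_2\n"
)
features = mutational_load_features(example_maf, exome_size_mb=2.0)
```

Feature vectors of this kind, one per tumor, are the sort of input an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`) could be trained on, with MSI-PCR status as the label. The features and kernel settings actually used by MSIpred are not described in this abstract.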