The causes of traffic violations by elderly drivers differ from those of other age groups. To reduce the serious traffic violations that are more likely to lead to serious crashes, this study divided traffic violations into three severity levels (slight, ordinary, and severe) based on point deduction and explored the patterns of serious violations (ordinary and severe) using multi-source data. An interpretable machine learning framework was designed in which four popular machine learning models were enhanced and compared. Specifically, the adaptive synthetic sampling (ADASYN) method was applied to mitigate the effects of imbalanced data and improve prediction accuracy for the minority classes (ordinary and severe); multi-objective feature selection based on NSGA-II was used to remove redundant factors, which increases computational efficiency and makes the patterns discovered by the explainer more effective; and Bayesian hyperparameter optimization was used to obtain a more effective hyperparameter combination in fewer iterations and to boost model adaptability. The results show that the proposed framework significantly improves the performance of the four machine learning models and the two post-hoc interpretation methods and makes the differences among them easier to distinguish. Six of the top ten important factors are multi-scale built environment attributes. Comparing feature contributions and interaction effects across the two serious levels yields three findings: (1) ordinary and severe violations share some influencing factors and interaction effects; (2) for some shared factors or factor combinations, the factor values associated with each level differ; and (3) each level also has some unique influencing factors and unique factor combinations.
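To make the multi-objective feature selection step more concrete, the following is a minimal sketch of the Pareto-dominance test that underlies NSGA-II's non-dominated sorting. It is not the study's implementation. The two objectives (classification error and number of selected features, both minimized), the feature masks, and the numeric values are all assumptions chosen for illustration, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# A candidate feature subset: (feature_mask, error, n_features).
# ASSUMPTION: both objectives are minimized; the abstract does not state
# which objectives the study actually used.
Candidate = Tuple[Tuple[int, ...], float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Hypothetical candidates; the error values are made up for illustration.
candidates: List[Candidate] = [
    ((1, 1, 1, 1), 0.20, 4),
    ((1, 1, 0, 0), 0.22, 2),
    ((1, 0, 0, 0), 0.30, 1),
    ((1, 1, 1, 0), 0.25, 3),  # worse than (1, 1, 0, 0) on both objectives
]

front = pareto_front(candidates)
for mask, error, n_features in front:
    print(mask, error, n_features)
```

In a full NSGA-II run, this dominance test is applied repeatedly to rank a population into successive fronts, and crowding distance breaks ties within a front. The sketch shows only the first front, which is where redundant factors drop out: a subset that uses more features without lowering the error is dominated and discarded.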