Explainable deep learning enhances robust and reliable real-time monitoring of a chromatographic protein A capture step.
Matthias Medl, Friedrich Leisch, Astrid Dürauer, Theresa Scharl
Published in: Biotechnology Journal (2024)
The application of model-based real-time monitoring in biopharmaceutical production is a major step toward quality-by-design and the foundation for model predictive control. Data-driven models have proven to be a viable option for modeling bioprocesses. In the high-stakes setting of biopharmaceutical manufacturing, it is essential to ensure high model accuracy, robustness, and reliability. That is only possible when (i) the data used for modeling is of high quality and sufficient size, (ii) state-of-the-art modeling algorithms are employed, and (iii) the input-output mapping of the model has been characterized. In this study, we evaluate the accuracy of multiple data-driven models in predicting the monoclonal antibody (mAb) concentration, double-stranded DNA concentration, host cell protein concentration, and high molecular weight impurity content during elution from a protein A chromatography capture step. The models achieved high-quality predictions, with a normalized root mean squared error of <4% for the mAb concentration and of ≈10% for the other process variables. Furthermore, we demonstrate how permutation/occlusion-based methods can be used to gain an understanding of the dependencies learned by one of the most complex types of data-driven model, convolutional neural network ensembles. We observed that the models generally relied on correlations that agreed with first-principles knowledge, thereby bolstering confidence in model reliability. Finally, we present a workflow to assess model behavior in the case of systematic measurement errors that may result from sensor fouling or failure. This study represents a major step toward improved viability of data-driven models in biopharmaceutical manufacturing.
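The three evaluation ideas named in the abstract (normalized root mean squared error, permutation-based dependency analysis, and a check of model behavior under systematic measurement error) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `nrmse`, `permutation_importance`, and `sensor_offset_sweep` are hypothetical, the normalization by the observed range is an assumption (the abstract does not state which normalization was used), and a toy linear function on synthetic data stands in for the convolutional neural network ensembles.

```python
import math
import random


def nrmse(y_true, y_pred):
    """RMSE divided by the observed range of y_true.

    Assumption: range normalization; the abstract does not specify one.
    """
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in NRMSE when a single input column is shuffled."""
    rng = random.Random(seed)
    baseline = nrmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        scores = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between input j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            scores.append(nrmse(y, predict(X_perm)))
        importances.append(sum(scores) / n_repeats - baseline)
    return importances


def sensor_offset_sweep(predict, X, y, column, offsets):
    """NRMSE after adding a constant offset to one input column.

    A constant offset is one simple way to mimic a systematic sensor error.
    """
    results = {}
    for off in offsets:
        X_off = [row[:column] + [row[column] + off] + row[column + 1:]
                 for row in X]
        results[off] = nrmse(y, predict(X_off))
    return results


def toy_model(X):
    """Stand-in for a trained model: depends on inputs 0 and 1, ignores 2."""
    return [2.0 * r[0] + 0.5 * r[1] for r in X]


# Synthetic data: three hypothetical sensor signals, 500 time points.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = toy_model(X)

imp = permutation_importance(toy_model, X, y)
sweep = sensor_offset_sweep(toy_model, X, y, column=0, offsets=[0.0, 0.5, 1.0])
```

In this toy setting, the ignored third input receives zero importance and the prediction error grows with the size of the offset applied to the first input. For a real model, the same pattern of checks would be run on held-out process data, and the resulting dependencies compared against first-principles expectations.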