# Validation of Machine Learning Models with Algorithms from the Area of Explainable AI for Regression and Classification Tasks

Master's thesis at ifp - Moritz Johannes Weixler

## Moritz Johannes Weixler


Duration: 6 months

Completion: June 2021

Supervisors: Dr. Sibylle Sager (Robert Bosch GmbH), Prof. Dr.-Ing. Norbert Haala

Examiner: Prof. Dr.-Ing. Norbert Haala

As the scientific field of artificial intelligence (AI) has evolved rapidly over the past few years, concerns have grown about the safety, security, reliability and resiliency of AI-based systems. There have been increasing efforts to understand machine learning (ML) models in order to detect security-related problems at an early stage. The resulting methods are known collectively as explainable AI (XAI). For the explanation of neural networks, several methods have become established. These algorithms aim to explain the decisions of the network either by its sensitivity to input features (gradient methods) or by the importance of input features (relevance and activation-based methods).

To evaluate the usability of different explainability approaches, eight methods are applied to classification and regression networks: Gradient Backpropagation (Simonyan et al., 2014), Deconvolutional Network (Zeiler and Fergus, 2014), Guided Backpropagation (Springenberg et al., 2015), Layer-wise Relevance Propagation (LRP) (Bach et al., 2015), DeepLIFT (Shrikumar et al., 2017), Integrated Gradients (Sundararajan et al., 2017), Grad-CAM (Selvaraju et al., 2019) and Grad-CAM++ (Chattopadhay et al., 2018). The image classification tasks are represented by the PEG model. Explanations from three chosen algorithms (those with the visually most pleasing results) can be seen in Figure 1 as an overlay on the input image. All three explanations focus on the same area of the input image. Since exactly this area contains the defect to be classified, the model seems to work as intended. To validate the model thoroughly, this hypothesis needs to be tested. Adversarial attacks can be used to confirm or reject it.
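As an illustration of the simplest of these methods, the following minimal sketch computes a Gradient Backpropagation explanation, that is, the sensitivity of the output with respect to each input, for a tiny ReLU network. The network, its weights and the input are hypothetical and chosen only for readability; they are unrelated to the PEG model, and the thesis's own implementation is not described here.

```python
# Tiny one-hidden-layer ReLU network with hand-picked (hypothetical) weights.
W1 = [[1.0, -2.0, 0.5],
      [0.0, 1.5, -1.0]]   # 2 hidden units, 3 inputs
b1 = [0.1, -0.2]
w2 = [2.0, -1.0]          # hidden-to-output weights
b2 = 0.3


def forward(x):
    """Return the scalar output and the hidden pre-activations."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, zi) for zi in z]
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    return y, z


def gradient_saliency(x):
    """Backpropagate d(output)/d(input) through the network."""
    _, z = forward(x)
    # A ReLU passes the gradient only where its pre-activation is positive.
    upstream = [w if zi > 0 else 0.0 for w, zi in zip(w2, z)]
    return [sum(upstream[j] * W1[j][i] for j in range(len(W1)))
            for i in range(len(x))]


x = [1.0, 0.5, -1.0]      # hypothetical input sample
saliency = gradient_saliency(x)
```

For an image model the same quantity is computed per pixel and shown as an overlay. Because the gradient describes only the local slope of the model function, it reflects any noise in that function directly.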

To evaluate the methods objectively, existing and new metrics are introduced. They examine the ability of the methods to accurately describe the model at the given sample (exactness), the ability to compensate for noise in the model function (susceptibility to noise) and the computational speed of the algorithms. Regarding exactness, the gradient methods are treated separately from the other methods. The evaluation of the results shows that the ability of the methods to filter model-induced noise is of great importance. Here, the use of Gradient Backpropagation or the Deconvolutional Network carries risks. Similarly, LRP shows drawbacks in terms of exactness, and Integrated Gradients in terms of computational speed. Among the activation-based methods, Grad-CAM shows slight advantages over Grad-CAM++.
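The thesis's metric definitions are not reproduced here. As a related sanity check, the following sketch approximates Integrated Gradients for a small differentiable function and measures how closely the attributions sum to the difference between the output at the sample and the output at the baseline (the completeness axiom of Sundararajan et al., 2017). The model, the baseline and the step count of 50 are assumptions made for illustration.

```python
import math

weights = [0.8, -0.5, 0.3]   # hypothetical model parameters


def model(x):
    """Toy differentiable model: tanh of a weighted sum."""
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)))


def gradient(x):
    """Analytic gradient of tanh(w . x) with respect to x."""
    y = model(x)
    return [(1.0 - y * y) * w for w in weights]


def integrated_gradients(x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the sample.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 2.0, -1.0]           # hypothetical sample
baseline = [0.0, 0.0, 0.0]     # hypothetical baseline
attr = integrated_gradients(x, baseline)
# Completeness: the attributions should sum to model(x) - model(baseline).
completeness_gap = abs(sum(attr) - (model(x) - model(baseline)))
```

The loop evaluates the gradient once per step, so the cost grows linearly with `steps`, whereas a single-pass method needs one backward pass per sample. This is one plausible reason for the speed disadvantage of Integrated Gradients.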

Based on these findings, the applicability of XAI methods to simple regression networks is investigated. For this purpose, a model for the correction of fuel quantities in common rail injection systems for diesel vehicles is used. The model takes a pressure curve with 20 values ($p_{1}$ to $p_{20}$), the engine speed $n_{\mathrm{eng}}$ and the angle of the start of the main injection $\Phi_{\mathrm{MI}}$ as input and estimates the total injected fuel mass $m$ as output. Only the first six methods can be applied to this model, since the Grad-CAM methods require convolutional layers in the network. The attributions for this regression network are only one-dimensional. This opens up more detailed ways of visualizing explanations. In particular, it allows explanations to be visualized simultaneously for a large number of samples. As a result, the model can be examined not only locally (using a single sample) but also, approximately, globally (over the complete data space). This allows the model to be explained in much greater detail. To achieve a global model visualization, the attributions of all available samples are calculated and arranged along the x-axis, sorted by the output of the network. For this network and a relevance-based approach like DeepLIFT (as shown in Figure 2), the visualization shows which input features the model considers important for which outputs. From the explanation in Figure 2, we can form the hypothesis that the model bases its decision mainly on the features $p_{7}$ to $p_{9}$, $p_{17}$ to $p_{18}$ and $p_{20}$. This can be related to a thermodynamic formula. However, techniques like adversarial attacks are required here as well to prove this statement.
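The global visualization described above can be sketched as follows: attributions are computed for every sample, the samples are sorted by network output, and the result is arranged as a feature-by-sample matrix. The linear stand-in model, its weights, the zero reference and the random samples are all hypothetical. The thesis model has 22 inputs and is not linear; four inputs are used here only to keep the sketch readable.

```python
import random

random.seed(0)
N_FEATURES = 4                      # stand-in for the 22 inputs of the thesis model
weights = [0.5, -1.0, 2.0, 0.0]     # hypothetical linear stand-in model
reference = [0.0] * N_FEATURES      # hypothetical reference input


def model(x):
    return sum(w * xi for w, xi in zip(weights, x))


def attribute(x):
    """For a linear model, difference-from-reference attributions
    reduce to w_i * (x_i - ref_i)."""
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]


samples = [[random.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]
           for _ in range(200)]

# Pair each output with its attribution vector, then sort by output.
rows = sorted(((model(x), attribute(x)) for x in samples),
              key=lambda row: row[0])
outputs = [y for y, _ in rows]
# heatmap[i][k]: attribution of feature i for the k-th sample in output order.
heatmap = [[attr[i] for _, attr in rows] for i in range(N_FEATURES)]
```

Plotting `heatmap` as an image, for example with `matplotlib.pyplot.imshow`, puts the network output on the x-axis and one row per input feature on the y-axis, so that the features that matter in each output range become visible at a glance.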

Concerning the evaluation of the XAI methods with metrics, many similarities but also some differences emerge between image classification and simple regression models. The main differences lie in the importance of exactness and in the susceptibility to noisy model functions. For simple regression networks, exactness proves to be of much greater importance than for image classification models. This applies especially to the gradient methods. Little to no consideration has to be given to the filtering of noisy model functions, since the smaller number of degrees of freedom significantly reduces the amount of noise in the explanations. In contrast to image classification tasks, Gradient Backpropagation proves to be the best gradient method. The Deconvolutional Network performs worse due to missing max pooling layers (as shown by Springenberg et al. (2015)), and Guided Backpropagation filters too aggressively, removing noise that does not exist. For the relevance-based methods, only minor differences from the image classification networks exist. Here too, DeepLIFT offers the best results in terms of accuracy and computational speed. Again, LRP suffers from a meaningless baseline of 0. Although Integrated Gradients produces attributions very similar to those of DeepLIFT, it has a major drawback: it takes roughly 11 times as long as DeepLIFT.
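The effect of the reference point can be illustrated with a minimal sketch, assuming a linear stand-in model for which attributions relative to a reference reduce to w_i (x_i - ref_i). The weights, the sample and the reference values are hypothetical. The zero reference stands in for the baseline of 0 mentioned above; the code is not an implementation of LRP or DeepLIFT.

```python
weights = [0.002, 0.5]          # hypothetical linear stand-in model
x = [1200.0, 0.4]               # pressure-like value, angle-like value
typical_input = [1000.0, 0.5]   # hypothetical reference: a typical operating point


def attributions(sample, reference):
    """Linear-model attributions relative to a reference: w_i * (x_i - ref_i)."""
    return [w * (xi - ri) for w, xi, ri in zip(weights, sample, reference)]


# Relative to an all-zero input, which never occurs for a pressure signal.
zero_ref = attributions(x, [0.0, 0.0])
# Relative to a typical operating point: only deviations are attributed.
typical_ref = attributions(x, typical_input)
```

With the zero reference, both attributions are positive and the first is inflated by the large absolute pressure value. With the typical operating point as reference, the attributions describe deviations only, and the sign of the second feature flips.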

For both use cases, relevance-based methods yield explanations that are easier to interpret. DeepLIFT in particular shows remarkable results for all examined models.

### References

Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. *PLOS ONE, 10*(7), 1–46.

Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018). Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. *2018 IEEE Winter Conference on Applications of Computer Vision (WACV).*

Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2019). Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. *International Journal of Computer Vision, 128*(2), 336–359.

Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In *Proceedings of the 34th International Conference on Machine Learning (ICML)* (Vol. 70, pp. 3145–3153).

Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In Y. Bengio & Y. LeCun (Eds.), *2nd International Conference on Learning Representations (ICLR).*

Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2015). Striving for Simplicity: The All Convolutional Net. In *International Conference on Learning Representations (ICLR).*

Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. In *Proceedings of the 34th International Conference on Machine Learning (ICML)* (Vol. 70, pp. 3319–3328).

Zeiler, M. D., & Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. *Lecture Notes in Computer Science*, 818–833.

### Contact

**apl. Prof. Dr.-Ing.**

### Norbert Haala

Deputy Director of the Institute