# Explaining Neural Network Decisions Is Hard

Poster

Jan Macdonald, Stephan Wäldchen, Sascha Hauch, Gitta Kutyniok

### Abstract

We connect the widespread idea of interpreting classifier decisions to probabilistic prime implicants. A set of input features is deemed relevant for a classification decision if the classifier score remains nearly constant when the remaining features are randomised. This introduces a rate-distortion trade-off between the size of the set and the deviation of the score. We explain how relevance maps can be interpreted as a greedy strategy for calculating the rate-distortion function. For neural networks, we show that approximating this function even at a single point, up to any non-trivial approximation factor, is NP-hard. Hence, unless P = NP, no efficient algorithm is guaranteed to find small relevant sets of input features even when they exist. Finally, as a numerical comparison, we express a Boolean function, for which the prime implicant sets are known, as a neural network and investigate which relevance mapping methods are able to highlight them.
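The relevance criterion in the abstract can be made concrete with a small Monte Carlo sketch: fix the features in a candidate set, randomise the rest, and measure how much the classifier score deviates. The toy classifier, the uniform randomisation distribution, and the function names below are illustrative assumptions, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy classifier: linear score followed by a sigmoid.
w = np.array([2.0, -1.0, 0.1, 0.05])

def classifier(x):
    return 1.0 / (1.0 + np.exp(-x @ w))

def distortion(x, S, n_samples=10_000):
    """Expected squared deviation of the score when the features
    outside S are replaced by uniform random values in [0, 1]."""
    mask = np.zeros(len(x), dtype=bool)
    mask[list(S)] = True
    base = classifier(x)
    # Randomise the complement of S; keep the features in S fixed.
    samples = rng.uniform(0.0, 1.0, size=(n_samples, len(x)))
    samples[:, mask] = x[mask]
    scores = classifier(samples)
    return float(np.mean((scores - base) ** 2))

x = np.array([0.9, 0.1, 0.5, 0.5])
# The small set {0, 1} covers the high-weight features, so fixing it
# should yield a much lower distortion than fixing {2, 3}.
print(distortion(x, {0, 1}), distortion(x, {2, 3}))
```

In this picture, a relevant set is one whose distortion stays below a chosen tolerance, and the rate-distortion trade-off is between the set's size and that tolerance; the NP-hardness result concerns approximating the smallest such set for a neural network classifier.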

My research is at the interface of applied and computational mathematics and scientific machine learning. I am interested in inverse problems, signal and image recovery, and robust and interpretable deep learning.