Explaining Neural Network Decisions Is Hard

Poster

Abstract

We connect the widespread idea of interpreting classifier decisions to probabilistic prime implicants. A set of input features is deemed relevant for a classification decision if the classifier score remains nearly constant when randomising the remaining features. This introduces a rate-distortion trade-off between the size of the set and the deviation of the score. We explain how relevance maps can be interpreted as a greedy strategy to calculate the rate-distortion function. For neural networks we show that approximating this function even at a single point up to any non-trivial approximation factor is NP-hard. Thus, no algorithm will provably find small relevant sets of input features even if they exist. Finally, as a numerical comparison, we express a Boolean function, for which the prime implicant sets are known, as a neural network and investigate which relevance mapping methods are able to highlight them.
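To make the rate-distortion idea from the abstract more tangible, the following is a minimal illustrative sketch, not the method from the paper itself: the function names, the uniform reference distribution used to randomise the non-relevant features, the squared-deviation distortion measure, and the toy logistic classifier are all assumptions chosen for simplicity here.

```python
import numpy as np

def expected_distortion(classifier, x, relevant_idx, n_samples=1000, rng=None):
    """Monte Carlo estimate of the score deviation when all features outside
    `relevant_idx` are randomised (here: resampled uniformly on [0, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    base_score = classifier(x)
    mask = np.zeros(x.size, dtype=bool)
    mask[list(relevant_idx)] = True
    deviations = np.empty(n_samples)
    for i in range(n_samples):
        z = rng.uniform(0.0, 1.0, size=x.shape)   # random completion of the input
        x_mixed = np.where(mask, x, z)            # keep the relevant features fixed
        deviations[i] = (classifier(x_mixed) - base_score) ** 2
    return deviations.mean()

def greedy_relevant_set(classifier, x, max_size):
    """Greedily add the feature whose inclusion reduces the distortion the most,
    producing one (approximate) point of the rate-distortion curve per step."""
    selected = set()
    trace = []
    for _ in range(max_size):
        best_j, best_d = None, np.inf
        for j in range(x.size):
            if j in selected:
                continue
            d = expected_distortion(classifier, x, selected | {j}, n_samples=200)
            if d < best_d:
                best_j, best_d = j, d
        selected.add(best_j)
        trace.append((len(selected), best_d))     # (rate, distortion) pair
    return selected, trace

if __name__ == "__main__":
    # Hypothetical toy classifier: a logistic score driven mainly by the first two features.
    weights = np.array([3.0, -2.0, 0.1, 0.05, 0.0])
    classifier = lambda x: 1.0 / (1.0 + np.exp(-(weights @ x)))
    x = np.array([0.9, 0.1, 0.5, 0.3, 0.7])
    relevant, trace = greedy_relevant_set(classifier, x, max_size=3)
    print("greedy relevant set:", sorted(relevant))
    print("rate-distortion trace:", trace)
```

The greedy loop reflects how a relevance map can be read as an ordering of input features: adding features in order of decreasing relevance traces out an approximation of the rate-distortion function, which is exactly the quantity shown to be NP-hard to approximate for neural networks.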

Date
Jul 17, 2020
Location
Virtual Workshop
Jan Macdonald

My research lies at the interface of applied and computational mathematics and scientific machine learning. I am interested in inverse problems, signal and image recovery, and robust and interpretable deep learning.