We connect the widespread idea of interpreting classifier decisions to probabilistic prime implicants. A set of input features is deemed relevant for a classification decision if the classifier score remains nearly constant when the remaining features are randomised. This induces a rate-distortion trade-off between the size of the set and the deviation of the score. We explain how relevance maps can be interpreted as a greedy strategy for calculating the rate-distortion function. For neural networks, we show that approximating this function at even a single point up to any non-trivial approximation factor is NP-hard. Consequently, no efficient algorithm can provably find small relevant sets of input features, even if they exist. Finally, as a numerical comparison, we express a Boolean function with known prime implicant sets as a neural network and investigate which relevance mapping methods are able to highlight them.
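
The rate-distortion view admits a simple Monte Carlo estimate of the distortion of a candidate feature set, and the greedy reading of relevance maps then amounts to tracing this estimate along the feature ordering the map induces. The sketch below is illustrative rather than taken from the poster: the score function `f`, the uniform randomisation of the complement features, and the squared deviation as the distortion measure are all assumptions.

```python
import numpy as np

def estimate_distortion(f, x, S, n_samples=1000, rng=None):
    """Monte Carlo estimate of the expected squared score deviation when
    the features outside S are randomised and those in S stay fixed."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    complement = np.setdiff1d(np.arange(d), S)
    base_score = f(x)
    deviations = np.empty(n_samples)
    for i in range(n_samples):
        z = x.copy()
        # randomise the features outside S, here uniformly on [0, 1]
        z[complement] = rng.uniform(0.0, 1.0, size=complement.size)
        deviations[i] = (f(z) - base_score) ** 2
    return float(deviations.mean())

def greedy_rate_distortion_curve(f, x, relevance, n_samples=1000):
    """Read a relevance map as a greedy strategy: fix the k most relevant
    features and measure the resulting distortion, for k = 0, ..., d."""
    order = np.argsort(relevance)[::-1]          # most relevant first
    return [estimate_distortion(f, x, order[:k], n_samples)
            for k in range(x.shape[0] + 1)]
```

For a good relevance map this curve should drop quickly with k; the poster's numerical comparison asks exactly which methods achieve this on a function whose optimal sets are known.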
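
The poster does not spell out its Boolean benchmark function, so the following is a hypothetical stand-in: the DNF (x1 AND x2) OR (x3 AND x4), whose prime implicants of the positive class are {x1, x2} and {x3, x4}, written as a one-hidden-layer ReLU network. The construction (one ReLU unit per conjunction, a threshold acting as the disjunction) is a standard encoding, not necessarily the one used in the experiments.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

# One hidden ReLU unit per conjunction: relu(xi + xj - 1) equals
# the AND of two binary inputs.
W1 = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0]])
b1 = np.array([-1.0, -1.0])
w2 = np.array([1.0, 1.0])  # sum of conjunctions; threshold 0.5 acts as OR

def net(x):
    return float(w2 @ relu(W1 @ x + b1))

x = np.array([1.0, 1.0, 0.0, 0.0])   # the prime implicant {x1, x2} is active
print(net(x) >= 0.5)                  # True: an ideal method highlights x1, x2
```

On such a network, a relevance mapping method can be judged by whether the features it ranks highest coincide with a prime implicant of the decision.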