Deep learning systems to explain their decisions


Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory have found a new way to train neural networks so that they provide not only predictions and classifications but also rationales for their decisions.

“In real-world applications, sometimes people want to know why the model makes the predictions it does,” said graduate student Tao Lei. “One major reason that doctors don’t trust machine-learning methods is that there’s no evidence.”

“You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make,” commented Tommi Jaakkola, an MIT professor of electrical engineering and computer science.

The researchers address neural nets trained on textual data. To enable interpretation of a neural net’s decisions, the group divides the net into two modules. The first module extracts segments of text from the training data, and the segments are scored according to their length and their coherence: the shorter the segment, and the more of it that is drawn from strings of consecutive words, the higher its score.

The segments selected by the first module are then passed to the second module, which performs the prediction or classification task. The modules are trained together, and the goal of training is to maximise both the score of the extracted segments and the accuracy of prediction.
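As a rough illustration of this setup (a minimal sketch, not the researchers’ own code), the PyTorch-style model below assumes a generator module that assigns each word a selection weight, an encoder module that predicts a rating from the selected words only, and a training loss that combines prediction error with penalties on long and fragmented selections. The continuous mask here stands in for the discrete segment extraction described above, and all layer sizes and penalty weights are illustrative assumptions.

```python
# Sketch of the two-module idea: a generator selects words, an encoder
# predicts from the selection, and regularisers reward short, coherent
# rationales. Not the authors' implementation.
import torch
import torch.nn as nn

class RationaleModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Generator: produces a per-word selection weight in [0, 1]
        self.gen_rnn = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.gen_out = nn.Linear(2 * hidden_dim, 1)
        # Encoder: predicts the rating from the selected words only
        self.enc_rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.enc_out = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):
        emb = self.embed(tokens)                            # (batch, seq, embed)
        gen_h, _ = self.gen_rnn(emb)
        z = torch.sigmoid(self.gen_out(gen_h)).squeeze(-1)  # soft selection mask
        masked = emb * z.unsqueeze(-1)                      # keep only selected words
        _, (h_n, _) = self.enc_rnn(masked)
        pred = self.enc_out(h_n[-1]).squeeze(-1)            # predicted rating
        return pred, z

def loss_fn(pred, target, z, sparsity=0.01, coherence=0.01):
    # Prediction accuracy plus the two rationale penalties described above:
    # fewer selected words, and adjacent words with similar selection values.
    mse = nn.functional.mse_loss(pred, target)
    length = z.sum(dim=1).mean()                                   # long rationales
    gaps = (z[:, 1:] - z[:, :-1]).abs().sum(dim=1).mean()          # fragmented rationales
    return mse + sparsity * length + coherence * gaps

# Example usage with toy data:
model = RationaleModel(vocab_size=5000)
tokens = torch.randint(0, 5000, (4, 50))   # batch of 4 reviews, 50 words each
ratings = torch.rand(4)                    # target ratings in [0, 1]
pred, mask = model(tokens)
loss = loss_fn(pred, ratings, mask)
loss.backward()
```

Because both modules are trained against this single objective, the generator is pushed towards selections that are short and contiguous yet still informative enough for the encoder to predict accurately.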

One of the data sets on which the researchers tested their system was a group of reviews from a website where users evaluate different beers, rating attributes such as appearance, aroma and palate. Some of the reviews had also been annotated by hand, with the sentences that support each attribute rating highlighted.

For example, a review might consist of eight or nine sentences, and the annotator might have highlighted those that refer to the beer’s ‘tan-coloured head about half an inch thick’, ‘signature Guinness smells’ and ‘lack of carbonation’. Each sentence is correlated with a different attribute rating.

If the first module has extracted those three phrases, and the second module has correlated them with the correct ratings, then the system has identified the same basis for judgment as the human annotator.
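One simple way to picture that comparison (a hypothetical sketch, not necessarily the study’s exact evaluation protocol) is as the fraction of words the model selects that fall inside the spans the human annotator highlighted:

```python
# Hypothetical agreement measure: the share of model-selected word
# positions that also appear in the human-annotated spans.
def rationale_agreement(selected, annotated):
    """selected, annotated: sets of word positions in a review."""
    if not selected:
        return 0.0
    return len(selected & annotated) / len(selected)

# Example: the model selects words 3-7 and the annotator marked 3-9,
# so every selected word agrees with the annotation.
print(rationale_agreement(set(range(3, 8)), set(range(3, 10))))  # 1.0
```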

In experiments, the system’s agreement with the human annotations was 96% and 95%, respectively, for ratings of appearance and aroma, and 80% for the more nebulous concept of palate.