Interpretable and Safe AI Research Landscape Map

Surveys

Interpretability Techniques

A. Adadi and M. Berrada, “Peeking inside the black-box: A survey on explainable artificial intelligence (XAI),” IEEE Access, vol. 6, 2018.

R. Ashmore, R. Calinescu, and C. Paterson, “Assuring the machine learning lifecycle: Desiderata, methods, and challenges,” arXiv:1905.04223, 2019.

O. Biran and C. Cotton, “Explanation and justification in machine learning: A survey,” IJCAI-17 Workshop on Explainable AI (XAI), 2017.

S. Mohseni et al., “Explainable Artificial Intelligence: A Survey.”

F. K. Došilović, M. Brčić, and N. Hlupić, “Explainable artificial intelligence: A survey,” MIPRO 2018, Opatija, Croatia, 2018.

Theory

T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” arXiv:1706.07269, 2018.

S. Mohseni, N. Zarei, and E. D. Ragan, “A survey of evaluation methods and measures for interpretable machine learning.”

Safety Critical Systems

Assurance

T. Kelly, “A systematic approach to safety case management,” https://www-users.cs.york.ac.uk/tpk/04AE-149.pdf, 2004.

R. Ashmore, R. Calinescu, and C. Paterson, “Assuring the machine learning lifecycle: Desiderata, methods, and challenges,” arXiv:1905.04223, May 2019.

C. Picardi et al., “A pattern for arguing the assurance of machine learning in medical diagnosis systems,” Assuring Autonomy International Programme, The University of York, York, U.K.

Machine Learning

Interpretability techniques for ML models

RL

L. A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell, “Generating visual explanations,” arXiv:1603.08507, 2016.

Neural Nets

T. Zahavy, N. Ben Zrihem, and S. Mannor, “Graying the black box: Understanding DQNs,” ICML 2016.

C. Olah, L. Schubert, and A. Mordvintsev, “Feature visualization: How neural networks build up their understanding of images,” Distill, 2017.

A. Avati, K. Jung, S. Harman, L. Downing, A. Ng, and N. H. Shah, “Improving palliative care with deep learning,” arXiv:1711.06402, 2017.

L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, “Explaining explanations: An overview of interpretability of machine learning,” arXiv:1806.00069, 2019.

S. Sarkar, “Accuracy and interpretability trade-offs in machine learning applied to safer gambling,” CEUR Workshop Proceedings.

Q. Zhang and S.-C. Zhu, “Visual interpretability for deep learning: A survey,” Frontiers of Information Technology & Electronic Engineering, 2018.

K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” https://arxiv.org/abs/1312.6034, 2013.

Supervised Learning

A. Avati, K. Jung, S. Harman, L. Downing, A. Ng, and N. H. Shah, “Improving palliative care with deep learning,” arXiv:1711.06402, 2017.

F. K. Došilović, M. Brčić, and N. Hlupić, “Explainable artificial intelligence: A survey,” MIPRO 2018, May 21-25, 2018, Opatija, Croatia, 2018.

Z. C. Lipton, “The mythos of model interpretability,” arXiv:1606.03490, 2017.

Unsupervised Learning

S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” arXiv:1705.07874, 2017.

M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’ Explaining the predictions of any classifier,” https://arxiv.org/abs/1602.04938, 2016.

Model Agnostic

P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” ICML ’17: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1885-1894, 2017.

S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” arXiv:1705.07874, 2017.

H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec, “Interpretable & explorable approximations of black box models,” arXiv:1707.01154, 2017.

I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “β-VAE: Learning basic visual concepts with a constrained variational framework,” ICLR 2017.

S. Wachter, B. D. Mittelstadt, and C. Russell, “Counterfactual explanations without opening the black box: Automated decisions and the GDPR,” CoRR, vol. abs/1711.00399, 2017.

Classifiers

M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’ Explaining the predictions of any classifier,” https://arxiv.org/abs/1602.04938, 2016.

C. Otte, “Safe and interpretable machine learning: A methodological review.”

H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec, “Interpretable & explorable approximations of black box models,” arXiv:1707.01154, 2017.

L. A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell, “Generating visual explanations,” arXiv:1603.08507, 2016.

Applications

Health Care

A. Avati, K. Jung, S. Harman, L. Downing, A. Ng, and N. H. Shah, “Improving palliative care with deep learning,” arXiv:1711.06402, 2017.

O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” MICCAI 2015.

N. Tomašev, X. Glorot, and S. Mohamed, “A clinically applicable approach to continuous prediction of future acute kidney injury,” Nature, 2019.

Autonomous Vehicles

Requirements and needs

Legality

B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation’,” AI Magazine, vol. 38, no. 3, 2017.

R. Budish et al., “Accountability of AI under the law: The role of explanation.”

S. Wachter, B. Mittelstadt, and L. Floridi, “Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation,” 2017.

Safety

R. Ashmore, R. Calinescu, and C. Paterson, “Assuring the machine learning lifecycle: Desiderata, methods, and challenges,” arXiv:1905.04223, 2019.

Discussion Of Explanations

Philosophy

S. R. Grimm, “The goal of explanation,” Studies in History and Philosophy of Science Part A, 41(4), 337-344, 2010.

P. Achinstein, The Nature of Explanation, Oxford University Press, 1983.

P. Lipton, “Contrastive explanation,” Royal Institute of Philosophy Supplement, 27:247-266, 1990.

Explanation in AI

S. T. Mueller et al., “Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI.”

T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” https://arxiv.org/abs/1706.07269, 2018.

Z. C. Lipton, “The mythos of model interpretability,”

D. Doran, S. Schulz, and T. R. Besold, “What does explainable AI really mean? A new conceptualization of perspectives,” arXiv:1710.00794, 2017.

C. Rudin, “Please stop explaining black box models for high-stakes decisions,” arXiv:1811.10154, 2018.

Z. Lipton, “The doctor just won’t accept that!” arXiv:1711.08037, 2017.

Highlighted papers are key discussions and surveys of interpretable ML.