An illustration of model agnostic explainability methods applied to environmental data

Christopher K. Wikle, Abhirup Datta, Bhava Vyasa Hari, Edward L. Boone, Indranil Sahoo, Indulekha Kavila, Stefano Castruccio, Susan J. Simmons, Wesley S. Burr, Won Chang

Research output: Contribution to journal › Article › peer-review


Historically, two primary criticisms that statisticians have leveled at machine learning and deep neural models are their lack of uncertainty quantification and their inability to support inference (i.e., to explain which inputs are important). Explainable AI has developed over the last few years as a sub-discipline of computer science and machine learning to mitigate these concerns (as well as concerns about fairness and transparency in deep modeling). In this article, our focus is on explaining which inputs are important in models for predicting environmental data. In particular, we focus on three general methods for explainability that are model agnostic and thus applicable across a breadth of models without internal explainability: “feature shuffling”, “interpretable local surrogates”, and “occlusion analysis”. We describe particular implementations of each of these and illustrate their use with a variety of models, all applied to the problem of long-lead forecasting of monthly soil moisture in the North American corn belt given sea surface temperature anomalies in the Pacific Ocean.
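The first of the three methods, “feature shuffling” (often called permutation importance), can be sketched on a toy regression: permuting one input column at a time breaks its relationship with the response, and the resulting increase in prediction error measures that input's importance. The code below is a minimal illustration under assumed synthetic data, not the authors' implementation; the simple least-squares predictor stands in for any black-box model.

```python
import numpy as np

# Hypothetical toy data (not from the paper): y depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Fit a simple least-squares predictor; any black-box model works the same way,
# since feature shuffling only needs a predict function.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda Z: Z @ beta

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

baseline = mse(y, predict(X))
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])        # shuffle one feature, keep the rest
    importance.append(mse(y, predict(Xp)) - baseline)  # rise in error = importance

print(importance)
```

Shuffling feature 0 should degrade the fit far more than shuffling feature 1, while shuffling the noise feature should leave the error essentially unchanged; in practice one would average over several random permutations to stabilize the estimates.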

Original language: English (US)
Article number: e2772
Issue number: 1
State: Published - Feb 2023


Keywords

  • LIME
  • Shapley values
  • explainable AI
  • feature shuffling
  • machine learning

ASJC Scopus subject areas

  • Statistics and Probability
  • Ecological Modeling


