A Case Study Competition Among Methods for Analyzing Large Spatial Data

Matthew J. Heaton; Abhirup Datta; Andrew O. Finley; Reinhard Furrer; Joseph Guinness; Rajarshi Guhaniyogi; Florian Gerber; Robert B. Gramacy; Dorit Hammerling; Matthias Katzfuss; Finn Lindgren; Douglas W. Nychka; Furong Sun; Andrew Zammit-Mangion

doi:10.1007/s13253-018-00348-w

Research output: Contribution to journal › Article › peer-review

66 Scopus citations

Abstract

The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.

Original language	English (US)
Pages (from-to)	398-425
Number of pages	28
Journal	Journal of Agricultural, Biological, and Environmental Statistics
Volume	24
Issue number	3
DOIs	https://doi.org/10.1007/s13253-018-00348-w
State	Published - Sep 15 2019
Externally published	Yes

Keywords

Big data
Gaussian process
Low-rank approximation
Parallel computing

ASJC Scopus subject areas

Statistics and Probability
General Environmental Science
Agricultural and Biological Sciences (miscellaneous)
General Agricultural and Biological Sciences
Statistics, Probability and Uncertainty
Applied Mathematics

Access to Document

10.1007/s13253-018-00348-w

Cite this

Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., Lindgren, F., Nychka, D. W., Sun, F., & Zammit-Mangion, A. (2019). A Case Study Competition Among Methods for Analyzing Large Spatial Data. Journal of Agricultural, Biological, and Environmental Statistics, 24(3), 398-425. https://doi.org/10.1007/s13253-018-00348-w

Heaton, MJ, Datta, A, Finley, AO, Furrer, R, Guinness, J, Guhaniyogi, R, Gerber, F, Gramacy, RB, Hammerling, D, Katzfuss, M, Lindgren, F, Nychka, DW, Sun, F & Zammit-Mangion, A 2019, 'A Case Study Competition Among Methods for Analyzing Large Spatial Data', Journal of Agricultural, Biological, and Environmental Statistics, vol. 24, no. 3, pp. 398-425. https://doi.org/10.1007/s13253-018-00348-w

@article{dac221b26f8a4967b898d4fc69c59b06,

title = "A Case Study Competition Among Methods for Analyzing Large Spatial Data",

abstract = "The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.",

keywords = "Big data, Gaussian process, Low-rank approximation, Parallel computing",

author = "Heaton, {Matthew J.} and Abhirup Datta and Finley, {Andrew O.} and Reinhard Furrer and Joseph Guinness and Rajarshi Guhaniyogi and Florian Gerber and Gramacy, {Robert B.} and Dorit Hammerling and Matthias Katzfuss and Finn Lindgren and Nychka, {Douglas W.} and Furong Sun and Andrew Zammit-Mangion",

note = "Funding Information: This material was based upon work supported by the National Science Foundation (NSF) under Grant Number DMS-1417856. Dr. Katzfuss was partially supported by NSF Grants DMS–1521676 and DMS–1654083. Dr. Gramacy and Furong Sun are partially supported by NSF Award #1621746. Dr. Finley was partially supported by NSF DMS-1513481, EF-1241874, EF-1253225, and National Aeronautics and Space Administration (NASA) Carbon Monitoring System (CMS) grants. Dr. Guhaniyogi is partially supported by ONR N00014-18-1-2741. Dr. Gerber and Dr. Furrer were partially supported by SNSF Grant 175529 and acknowledge the support by the University of Zurich Research Priority Program on Global Change and Biodiversity. Dr. Zammit-Mangion{\textquoteright}s research was supported by an Australian Research Council (ARC) Discovery Early Career Research Award, DE180100203. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the ARC, NSF or NASA. Publisher Copyright: {\textcopyright} 2018, The Author(s).",

year = "2019",

month = sep,

day = "15",

doi = "10.1007/s13253-018-00348-w",

language = "English (US)",

volume = "24",

pages = "398--425",

journal = "Journal of Agricultural, Biological, and Environmental Statistics",

issn = "1085-7117",

publisher = "Springer New York",

number = "3",

}

TY - JOUR

T1 - A Case Study Competition Among Methods for Analyzing Large Spatial Data

AU - Heaton, Matthew J.

AU - Datta, Abhirup

AU - Finley, Andrew O.

AU - Furrer, Reinhard

AU - Guinness, Joseph

AU - Guhaniyogi, Rajarshi

AU - Gerber, Florian

AU - Gramacy, Robert B.

AU - Hammerling, Dorit

AU - Katzfuss, Matthias

AU - Lindgren, Finn

AU - Nychka, Douglas W.

AU - Sun, Furong

AU - Zammit-Mangion, Andrew

N1 - Funding Information: This material was based upon work supported by the National Science Foundation (NSF) under Grant Number DMS-1417856. Dr. Katzfuss was partially supported by NSF Grants DMS–1521676 and DMS–1654083. Dr. Gramacy and Furong Sun are partially supported by NSF Award #1621746. Dr. Finley was partially supported by NSF DMS-1513481, EF-1241874, EF-1253225, and National Aeronautics and Space Administration (NASA) Carbon Monitoring System (CMS) grants. Dr. Guhaniyogi is partially supported by ONR N00014-18-1-2741. Dr. Gerber and Dr. Furrer were partially supported by SNSF Grant 175529 and acknowledge the support by the University of Zurich Research Priority Program on Global Change and Biodiversity. Dr. Zammit-Mangion’s research was supported by an Australian Research Council (ARC) Discovery Early Career Research Award, DE180100203. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the ARC, NSF or NASA. Publisher Copyright: © 2018, The Author(s).

PY - 2019/9/15

Y1 - 2019/9/15

AB - The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.

KW - Big data

KW - Gaussian process

KW - Low-rank approximation

KW - Parallel computing

UR - http://www.scopus.com/inward/record.url?scp=85056024454&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056024454&partnerID=8YFLogxK

U2 - 10.1007/s13253-018-00348-w

DO - 10.1007/s13253-018-00348-w

M3 - Article

C2 - 31496633

AN - SCOPUS:85056024454

SN - 1085-7117

VL - 24

SP - 398

EP - 425

JO - Journal of Agricultural, Biological, and Environmental Statistics

JF - Journal of Agricultural, Biological, and Environmental Statistics

IS - 3

ER -
