TY - JOUR
T1 - A Case Study Competition Among Methods for Analyzing Large Spatial Data
AU - Heaton, Matthew J.
AU - Datta, Abhirup
AU - Finley, Andrew O.
AU - Furrer, Reinhard
AU - Guinness, Joseph
AU - Guhaniyogi, Rajarshi
AU - Gerber, Florian
AU - Gramacy, Robert B.
AU - Hammerling, Dorit
AU - Katzfuss, Matthias
AU - Lindgren, Finn
AU - Nychka, Douglas W.
AU - Sun, Furong
AU - Zammit-Mangion, Andrew
N1 - Funding Information:
This material was based upon work supported by the National Science Foundation (NSF) under Grant Number DMS-1417856. Dr. Katzfuss was partially supported by NSF Grants DMS-1521676 and DMS-1654083. Dr. Gramacy and Furong Sun are partially supported by NSF Award #1621746. Dr. Finley was partially supported by NSF DMS-1513481, EF-1241874, EF-1253225, and National Aeronautics and Space Administration (NASA) Carbon Monitoring System (CMS) grants. Dr. Guhaniyogi is partially supported by ONR N00014-18-1-2741. Dr. Gerber and Dr. Furrer were partially supported by SNSF Grant 175529 and acknowledge the support of the University of Zurich Research Priority Program on Global Change and Biodiversity. Dr. Zammit-Mangion’s research was supported by an Australian Research Council (ARC) Discovery Early Career Research Award, DE180100203. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the ARC, NSF or NASA.
Publisher Copyright:
© 2018, The Author(s).
PY - 2019/9/15
Y1 - 2019/9/15
N2 - The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.
AB - The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations and each was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.
KW - Big data
KW - Gaussian process
KW - Low-rank approximation
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=85056024454&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056024454&partnerID=8YFLogxK
U2 - 10.1007/s13253-018-00348-w
DO - 10.1007/s13253-018-00348-w
M3 - Article
C2 - 31496633
AN - SCOPUS:85056024454
SN - 1085-7117
VL - 24
SP - 398
EP - 425
JO - Journal of Agricultural, Biological, and Environmental Statistics
JF - Journal of Agricultural, Biological, and Environmental Statistics
IS - 3
ER -