On nearest-neighbor Gaussian process models for massive spatial data

Abhirup Datta, Sudipto Banerjee, Andrew O. Finley, Alan E. Gelfand

Research output: Contribution to journal (Review article)

Abstract

Gaussian Process (GP) models provide a very flexible nonparametric approach to modeling location-and-time indexed datasets. However, the storage and computational requirements for GP models are infeasible for large spatial datasets. Nearest Neighbor Gaussian Processes (NNGP; Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 2016) provide a scalable alternative by using local information from a few nearest neighbors. Scalability is achieved by using the neighbor sets in a conditional specification of the model. We show how this is equivalent to sparse modeling of Cholesky factors of large covariance matrices. We also discuss a general approach to construct scalable Gaussian Processes using sparse local kriging. We present a multivariate data analysis which demonstrates how the nearest-neighbor approach yields inference indistinguishable from the full-rank GP despite being several times faster. Finally, we also propose a variant of the NNGP model for automating the selection of the neighbor set size. WIREs Comput Stat 2016, 8:162–171. doi: 10.1002/wics.1383.
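
The conditional specification and its link to sparse Cholesky factors can be illustrated with a minimal Python sketch. It is not taken from the article: the exponential covariance and its parameters, the ordering of locations by first coordinate, and the names `exp_cov`, `nngp_factors`, and `nngp_precision` are all assumptions made for illustration. Each location is conditioned on at most m nearest earlier locations, which gives a precision matrix Q = (I - A)^T F^{-1} (I - A) where A is strictly lower triangular with at most m nonzeros per row. Dense arrays are used only for readability; a practical implementation would store A in a sparse format.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two sets of locations (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def nngp_factors(coords, m, cov=exp_cov):
    """Return (A, f) such that the precision is (I - A)^T diag(1/f) (I - A).

    Row i of A holds the local kriging weights of location i on its
    (at most) m nearest neighbors among locations 0..i-1; f[i] is the
    corresponding conditional variance.
    """
    n = coords.shape[0]
    A = np.zeros((n, n))
    f = np.empty(n)
    f[0] = cov(coords[:1], coords[:1])[0, 0]
    for i in range(1, n):
        # Neighbor set: closest locations that come earlier in the ordering.
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nb = np.argsort(dist)[:m]
        c_nn = cov(coords[nb], coords[nb])
        c_ni = cov(coords[nb], coords[i:i + 1])[:, 0]
        b = np.linalg.solve(c_nn, c_ni)  # local kriging weights
        A[i, nb] = b
        f[i] = cov(coords[i:i + 1], coords[i:i + 1])[0, 0] - c_ni @ b
    return A, f


def nngp_precision(A, f):
    """Assemble Q = (I - A)^T F^{-1} (I - A) from the sparse factors."""
    L = np.eye(A.shape[0]) - A
    return L.T @ (L / f[:, None])


rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
coords = coords[np.argsort(coords[:, 0])]  # assumed ordering: by first coordinate
A, f = nngp_factors(coords, m=10)
Q = nngp_precision(A, f)
```

When m is set to n - 1, every location conditions on all earlier ones and Q reduces to the exact inverse of the full covariance matrix; smaller m trades that exactness for sparsity.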

Original language: English (US)
Pages (from-to): 162-171
Number of pages: 10
Journal: Wiley Interdisciplinary Reviews: Computational Statistics
Volume: 8
Issue number: 5
DOIs: https://doi.org/10.1002/wics.1383
State: Published - Sep 1 2016
Externally published: Yes

Fingerprint

Gaussian Model
Spatial Data
Gaussian Process
Process Model
Nearest Neighbor
Multivariate Data Analysis
Cholesky
Kriging
Modeling
Large Data Sets
Covariance matrix
Scalability
Specification
Resources
Alternatives
Requirements
Model
Demonstrate

Keywords

  • Bayesian methods and theory
  • computational Bayesian methods
  • data structures
  • image and spatial data

ASJC Scopus subject areas

  • Statistics and Probability

Cite this

On nearest-neighbor Gaussian process models for massive spatial data. / Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.

In: Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 8, No. 5, 01.09.2016, p. 162-171.

Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E. / On nearest-neighbor Gaussian process models for massive spatial data. In: Wiley Interdisciplinary Reviews: Computational Statistics. 2016; Vol. 8, No. 5. pp. 162-171.

@article{ef90ee85ae6e49f4b407a6076461122a,
title = "On nearest-neighbor Gaussian process models for massive spatial data",
abstract = "Gaussian Process (GP) models provide a very flexible nonparametric approach to modeling location-and-time indexed datasets. However, the storage and computational requirements for GP models are infeasible for large spatial datasets. Nearest Neighbor Gaussian Processes (NNGP; Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 2016) provide a scalable alternative by using local information from a few nearest neighbors. Scalability is achieved by using the neighbor sets in a conditional specification of the model. We show how this is equivalent to sparse modeling of Cholesky factors of large covariance matrices. We also discuss a general approach to construct scalable Gaussian Processes using sparse local kriging. We present a multivariate data analysis which demonstrates how the nearest-neighbor approach yields inference indistinguishable from the full-rank GP despite being several times faster. Finally, we also propose a variant of the NNGP model for automating the selection of the neighbor set size. WIREs Comput Stat 2016, 8:162–171. doi: 10.1002/wics.1383.",
keywords = "Bayesian methods and theory, computational Bayesian methods, data structures, image and spatial data",
author = "Datta, Abhirup and Banerjee, Sudipto and Finley, {Andrew O.} and Gelfand, {Alan E.}",
year = "2016",
month = "9",
day = "1",
doi = "10.1002/wics.1383",
language = "English (US)",
volume = "8",
pages = "162--171",
journal = "Wiley Interdisciplinary Reviews: Computational Statistics",
issn = "1939-5108",
publisher = "John Wiley and Sons Inc.",
number = "5",

}

TY  - JOUR
T1  - On nearest-neighbor Gaussian process models for massive spatial data
AU  - Datta, Abhirup
AU  - Banerjee, Sudipto
AU  - Finley, Andrew O.
AU  - Gelfand, Alan E.
PY  - 2016/9/1
Y1  - 2016/9/1
AB  - Gaussian Process (GP) models provide a very flexible nonparametric approach to modeling location-and-time indexed datasets. However, the storage and computational requirements for GP models are infeasible for large spatial datasets. Nearest Neighbor Gaussian Processes (NNGP; Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 2016) provide a scalable alternative by using local information from a few nearest neighbors. Scalability is achieved by using the neighbor sets in a conditional specification of the model. We show how this is equivalent to sparse modeling of Cholesky factors of large covariance matrices. We also discuss a general approach to construct scalable Gaussian Processes using sparse local kriging. We present a multivariate data analysis which demonstrates how the nearest-neighbor approach yields inference indistinguishable from the full-rank GP despite being several times faster. Finally, we also propose a variant of the NNGP model for automating the selection of the neighbor set size. WIREs Comput Stat 2016, 8:162–171. doi: 10.1002/wics.1383.
KW  - Bayesian methods and theory
KW  - computational Bayesian methods
KW  - data structures
KW  - image and spatial data
UR  - http://www.scopus.com/inward/record.url?scp=84983063683&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84983063683&partnerID=8YFLogxK
U2  - 10.1002/wics.1383
DO  - 10.1002/wics.1383
M3  - Review article
AN  - SCOPUS:84983063683
VL  - 8
SP  - 162
EP  - 171
JO  - Wiley Interdisciplinary Reviews: Computational Statistics
JF  - Wiley Interdisciplinary Reviews: Computational Statistics
SN  - 1939-5108
IS  - 5
ER  -