Improved semiparametric time series models of air pollution and mortality

Francesca Dominici, Aidan McDermott, Trevor J. Hastie

Research output: Contribution to journal > Article

Abstract

In 2002, methodological issues around time series analyses of air pollution and health attracted the attention of the scientific community, policy makers, the press, and the diverse stakeholders concerned with air pollution. As the U.S. Environmental Protection Agency (EPA) was finalizing its most recent review of epidemiologic evidence on particulate matter air pollution (PM), statisticians and epidemiologists found that the S-PLUS implementation of generalized additive models (GAMs) can overestimate effects of air pollution and understate statistical uncertainty in time series studies of air pollution and health. This discovery delayed completion of the PM Criteria Document prepared as part of the review of the U.S. National Ambient Air Quality Standard, because the time series findings represented a critical component of the evidence. In addition, it raised concerns about the adequacy of current model formulations and their software implementations. In this article we provide improvements in semiparametric regression directly relevant to risk estimation in time series studies of air pollution. First, we introduce a closed-form estimate of the asymptotically exact covariance matrix of the linear component of a GAM. To ease the implementation of these calculations, we develop the S package gam.exact, an extended version of gam. Use of gam.exact allows a more robust assessment of the statistical uncertainty of the estimated pollution coefficients. Second, we develop a bandwidth selection method to reduce confounding bias in the pollution-mortality relationship due to unmeasured time-varying factors, such as season and influenza epidemics. Third, we introduce a conceptual framework to fully explore the sensitivity of the air pollution risk estimates to model choice. We apply our methods to data of the National Mortality Morbidity Air Pollution Study, which includes time series data from the 90 largest U.S. cities for the period 1987-1994.

Original language: English (US)
Pages (from-to): 938-948
Number of pages: 11
Journal: Journal of the American Statistical Association
Volume: 99
Issue number: 468
DOI: 10.1198/016214504000000656
State: Published - Dec 2004

Keywords

  • Bandwidth selection
  • Generalized additive model
  • Generalized linear model
  • Mean squared error
  • Particulate matter
  • Semiparametric regression
  • Time series

ASJC Scopus subject areas

  • Mathematics(all)
  • Statistics and Probability

Cite this

Improved semiparametric time series models of air pollution and mortality. / Dominici, Francesca; McDermott, Aidan; Hastie, Trevor J.

In: Journal of the American Statistical Association, Vol. 99, No. 468, 12.2004, p. 938-948.

@article{88bc8b92c6fb4517a7158e3bde4a8f44,
title = "Improved semiparametric time series models of air pollution and mortality",
abstract = "In 2002, methodological issues around time series analyses of air pollution and health attracted the attention of the scientific community, policy makers, the press, and the diverse stakeholders concerned with air pollution. As the U.S. Environmental Protection Agency (EPA) was finalizing its most recent review of epidemiologic evidence on particulate matter air pollution (PM), statisticians and epidemiologists found that the S-PLUS implementation of generalized additive models (GAMs) can overestimate effects of air pollution and understate statistical uncertainty in time series studies of air pollution and health. This discovery delayed completion of the PM Criteria Document prepared as part of the review of the U.S. National Ambient Air Quality Standard, because the time series findings represented a critical component of the evidence. In addition, it raised concerns about the adequacy of current model formulations and their software implementations. In this article we provide improvements in semiparametric regression directly relevant to risk estimation in time series studies of air pollution. First, we introduce a closed-form estimate of the asymptotically exact covariance matrix of the linear component of a GAM. To ease the implementation of these calculations, we develop the S package gam.exact, an extended version of gam. Use of gam.exact allows a more robust assessment of the statistical uncertainty of the estimated pollution coefficients. Second, we develop a bandwidth selection method to reduce confounding bias in the pollution-mortality relationship due to unmeasured time-varying factors, such as season and influenza epidemics. Third, we introduce a conceptual framework to fully explore the sensitivity of the air pollution risk estimates to model choice. We apply our methods to data of the National Mortality Morbidity Air Pollution Study, which includes time series data from the 90 largest U.S. cities for the period 1987-1994.",
keywords = "Bandwidth selection, Generalized additive model, Generalized linear model, Mean squared error, Particulate matter, Semiparametric regression, Time series",
author = "Dominici, Francesca and McDermott, Aidan and Hastie, {Trevor J.}",
year = "2004",
month = "12",
doi = "10.1198/016214504000000656",
language = "English (US)",
volume = "99",
pages = "938--948",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "468",
}

TY - JOUR

T1 - Improved semiparametric time series models of air pollution and mortality

AU - Dominici, Francesca

AU - McDermott, Aidan

AU - Hastie, Trevor J.

PY - 2004/12

Y1 - 2004/12

N2 - In 2002, methodological issues around time series analyses of air pollution and health attracted the attention of the scientific community, policy makers, the press, and the diverse stakeholders concerned with air pollution. As the U.S. Environmental Protection Agency (EPA) was finalizing its most recent review of epidemiologic evidence on particulate matter air pollution (PM), statisticians and epidemiologists found that the S-PLUS implementation of generalized additive models (GAMs) can overestimate effects of air pollution and understate statistical uncertainty in time series studies of air pollution and health. This discovery delayed completion of the PM Criteria Document prepared as part of the review of the U.S. National Ambient Air Quality Standard, because the time series findings represented a critical component of the evidence. In addition, it raised concerns about the adequacy of current model formulations and their software implementations. In this article we provide improvements in semiparametric regression directly relevant to risk estimation in time series studies of air pollution. First, we introduce a closed-form estimate of the asymptotically exact covariance matrix of the linear component of a GAM. To ease the implementation of these calculations, we develop the S package gam.exact, an extended version of gam. Use of gam.exact allows a more robust assessment of the statistical uncertainty of the estimated pollution coefficients. Second, we develop a bandwidth selection method to reduce confounding bias in the pollution-mortality relationship due to unmeasured time-varying factors, such as season and influenza epidemics. Third, we introduce a conceptual framework to fully explore the sensitivity of the air pollution risk estimates to model choice. We apply our methods to data of the National Mortality Morbidity Air Pollution Study, which includes time series data from the 90 largest U.S. cities for the period 1987-1994.

KW - Bandwidth selection

KW - Generalized additive model

KW - Generalized linear model

KW - Mean squared error

KW - Particulate matter

KW - Semiparametric regression

KW - Time series

UR - http://www.scopus.com/inward/record.url?scp=10844238813&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=10844238813&partnerID=8YFLogxK

U2 - 10.1198/016214504000000656

DO - 10.1198/016214504000000656

M3 - Article

AN - SCOPUS:10844238813

VL - 99

SP - 938

EP - 948

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 468

ER -