A smoothing approach for masking spatial data

Yijie Zhou, Francesca Dominici, Thomas Louis

Research output: Contribution to journal (article)

Abstract

Individual-level health data are often not publicly available due to confidentiality; masked data are released instead. Therefore, it is important to evaluate the utility of using the masked data in statistical analyses such as regression. In this paper we propose a data masking method which is based on spatial smoothing techniques. The proposed method allows for selecting both the form and the degree of masking, thus resulting in a large degree of flexibility. We investigate the utility of the masked data sets in terms of the mean square error (MSE) of regression parameter estimates when fitting a Generalized Linear Model (GLM) to the masked data. We also show that incorporating prior knowledge on the spatial pattern of the exposure into the data masking may reduce the bias and MSE of the parameter estimates. By evaluating both utility and disclosure risk as functions of the form and the degree of masking, our method produces a risk-utility profile which can facilitate the selection of masking parameters. We apply the method to a study of racial disparities in mortality rates using data on more than 4 million Medicare enrollees residing in 2095 zip codes in the Northeast region of the United States.
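The idea of masking by smoothing and then scoring utility by the MSE of a regression estimate can be illustrated with a small simulation. The sketch below is not the authors' procedure: the Gaussian kernel, the bandwidth parameter standing in for the "degree of masking", the simulated sites, and the ordinary least squares fit (the Gaussian, identity-link special case of a GLM) are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def gaussian_smoother(coords, bandwidth):
    """Row-normalized Gaussian kernel weights (kernel choice is an assumption)."""
    # Squared Euclidean distance between every pair of sites.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    # Each row sums to 1, so a masked value is a weighted local average.
    return w / w.sum(axis=1, keepdims=True)


def mask(values, weights):
    """Replace each site's value by its spatially smoothed value."""
    return weights @ values


def slope_mse(bandwidth, n_sites=200, n_reps=200, beta=1.0, seed=0):
    """Monte Carlo MSE of the OLS slope fitted to masked exposure and outcome."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(n_sites, 2))  # simulated site locations
    w = gaussian_smoother(coords, bandwidth)
    sq_errors = []
    for _ in range(n_reps):
        x = rng.normal(size=n_sites)                        # simulated exposure
        y = beta * x + rng.normal(scale=0.5, size=n_sites)  # simulated outcome
        design = np.column_stack([np.ones(n_sites), mask(x, w)])
        coef, *_ = np.linalg.lstsq(design, mask(y, w), rcond=None)
        sq_errors.append((coef[1] - beta) ** 2)
    return float(np.mean(sq_errors))


if __name__ == "__main__":
    # Larger bandwidths mean heavier masking; compare the resulting utility loss.
    for h in (0.001, 0.05, 0.3):
        print(f"bandwidth={h}: slope MSE={slope_mse(h):.5f}")
```

Pairing such an MSE curve with a disclosure-risk measure computed over the same bandwidths would give a simple analogue of the risk-utility profile described in the abstract; the risk measure itself is not specified here and is left out of the sketch.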

Original language: English (US)
Pages (from-to): 1451-1475
Number of pages: 25
Journal: Annals of Applied Statistics
Volume: 4
Issue number: 3
DOI: 10.1214/09-AOAS325
State: Published - Sep 2010

Keywords

  • Data masking
  • Data utility
  • Disclosure risk
  • Spatial smoothing
  • Statistical disclosure limitation

ASJC Scopus subject areas

  • Statistics and Probability
  • Modeling and Simulation
  • Statistics, Probability and Uncertainty

Cite this

Zhou, Yijie; Dominici, Francesca; Louis, Thomas. A smoothing approach for masking spatial data. In: Annals of Applied Statistics, Vol. 4, No. 3, 09.2010, p. 1451-1475.

@article{e5274bbc58a645e7be696780ffb0d4f4,
title = "A smoothing approach for masking spatial data",
abstract = "Individual-level health data are often not publicly available due to confidentiality; masked data are released instead. Therefore, it is important to evaluate the utility of using the masked data in statistical analyses such as regression. In this paper we propose a data masking method which is based on spatial smoothing techniques. The proposed method allows for selecting both the form and the degree of masking, thus resulting in a large degree of flexibility. We investigate the utility of the masked data sets in terms of the mean square error (MSE) of regression parameter estimates when fitting a Generalized Linear Model (GLM) to the masked data. We also show that incorporating prior knowledge on the spatial pattern of the exposure into the data masking may reduce the bias and MSE of the parameter estimates. By evaluating both utility and disclosure risk as functions of the form and the degree of masking, our method produces a risk-utility profile which can facilitate the selection of masking parameters. We apply the method to a study of racial disparities in mortality rates using data on more than 4 million Medicare enrollees residing in 2095 zip codes in the Northeast region of the United States.",
keywords = "Data masking, Data utility, Disclosure risk, Spatial smoothing, Statistical disclosure limitation",
author = "Yijie Zhou and Francesca Dominici and Thomas Louis",
year = "2010",
month = "9",
doi = "10.1214/09-AOAS325",
language = "English (US)",
volume = "4",
pages = "1451--1475",
journal = "Annals of Applied Statistics",
issn = "1932-6157",
publisher = "Institute of Mathematical Statistics",
number = "3",

}

TY - JOUR
T1 - A smoothing approach for masking spatial data
AU - Zhou, Yijie
AU - Dominici, Francesca
AU - Louis, Thomas
PY - 2010/9
Y1 - 2010/9
AB - Individual-level health data are often not publicly available due to confidentiality; masked data are released instead. Therefore, it is important to evaluate the utility of using the masked data in statistical analyses such as regression. In this paper we propose a data masking method which is based on spatial smoothing techniques. The proposed method allows for selecting both the form and the degree of masking, thus resulting in a large degree of flexibility. We investigate the utility of the masked data sets in terms of the mean square error (MSE) of regression parameter estimates when fitting a Generalized Linear Model (GLM) to the masked data. We also show that incorporating prior knowledge on the spatial pattern of the exposure into the data masking may reduce the bias and MSE of the parameter estimates. By evaluating both utility and disclosure risk as functions of the form and the degree of masking, our method produces a risk-utility profile which can facilitate the selection of masking parameters. We apply the method to a study of racial disparities in mortality rates using data on more than 4 million Medicare enrollees residing in 2095 zip codes in the Northeast region of the United States.
KW - Data masking
KW - Data utility
KW - Disclosure risk
KW - Spatial smoothing
KW - Statistical disclosure limitation
UR - http://www.scopus.com/inward/record.url?scp=84870265884&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870265884&partnerID=8YFLogxK
U2 - 10.1214/09-AOAS325
DO - 10.1214/09-AOAS325
M3 - Article
AN - SCOPUS:84870265884
VL - 4
SP - 1451
EP - 1475
JO - Annals of Applied Statistics
JF - Annals of Applied Statistics
SN - 1932-6157
IS - 3
ER -