Expanding the role of synthetic data at the U.S. Census Bureau

Ron S. Jarmin, Thomas Louis, Javier Miranda

Research output: Contribution to journal › Article

Abstract

National statistical offices (NSOs) create official statistics from data collected from survey respondents, government administrative records, and other sources. The raw source data are usually considered confidential. In the case of the U.S. Census Bureau, the confidentiality of survey and administrative-record microdata is mandated by statute, and this mandate is often at odds with users' need to extract as much information from the data as possible. Traditional disclosure protection techniques yield official data products that do not fully use the information content of the underlying microdata; typically, these products take the form of simple aggregate tabulations. In a few cases, anonymized public-use microdata samples are released, but these face a growing risk of re-identification as more information about individuals and firms becomes available in the public domain. One approach to overcoming these risks is to release products based on synthetic data, in which values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata. We describe recent Census Bureau work to develop and deploy such products, and we discuss the benefits and challenges of extending the scope of synthetic data products in official statistics.
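The core idea in the abstract, releasing records simulated from a model fitted to the confidential microdata instead of the microdata themselves, can be illustrated with a deliberately simple sketch. It assumes a multivariate normal model for continuous variables; that model choice is an illustrative assumption, not the Census Bureau's synthesis method, and production synthesizers use much richer models together with formal disclosure review. The function names `fit_mvn`, `cholesky`, and `synthesize`, and the toy employment/payroll data, are hypothetical.

```python
import math
import random
import statistics


def fit_mvn(rows):
    """Estimate the mean vector and sample covariance matrix of numeric rows."""
    n, k = len(rows), len(rows[0])
    mean = [statistics.fmean([r[j] for r in rows]) for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            s = sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows)
            cov[a][b] = s / (n - 1)
    return mean, cov


def cholesky(m):
    """Return lower-triangular L with L times L-transpose equal to m (m positive definite)."""
    k = len(m)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def synthesize(rows, n_synth, seed=0):
    """Draw n_synth synthetic records from a multivariate normal fitted to rows.

    Only the fitted parameters (mean, covariance) touch the confidential rows;
    the released records are fresh draws, not perturbed copies of real records.
    """
    rng = random.Random(seed)
    mean, cov = fit_mvn(rows)
    L = cholesky(cov)
    k = len(mean)
    out = []
    for _ in range(n_synth):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # independent standard normals
        out.append([mean[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(k)])
    return out


if __name__ == "__main__":
    # Hypothetical "confidential" microdata: (log employment, log payroll) per firm.
    gen = random.Random(42)
    confidential = []
    for _ in range(5000):
        emp = gen.gauss(3.0, 1.0)
        confidential.append([emp, 1.0 + 0.9 * emp + gen.gauss(0.0, 0.3)])
    synthetic = synthesize(confidential, 5000, seed=1)
    print("confidential covariance:", fit_mvn(confidential)[1])
    print("synthetic covariance:   ", fit_mvn(synthetic)[1])
```

The synthetic records reproduce the fitted means and covariances, so analyses that depend only on those moments carry over. Any structure the model omits (nonlinear relationships, skewed tails, rare categories) is lost, which is why the fidelity of the synthesis model largely determines the analytic usefulness of a synthetic data product.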

Original language: English (US)
Pages (from-to): 117-121
Number of pages: 5
Journal: Statistical Journal of the IAOS
Volume: 30
Issue number: 2
DOI: 10.3233/SJI-140813
State: Published - 2014

Keywords

  • Confidentiality
  • official statistics
  • synthetic micro data

ASJC Scopus subject areas

  • Management Information Systems
  • Statistics, Probability and Uncertainty
  • Economics and Econometrics

Cite this

Expanding the role of synthetic data at the U.S. Census Bureau. / Jarmin, Ron S.; Louis, Thomas; Miranda, Javier.

In: Statistical Journal of the IAOS, Vol. 30, No. 2, 2014, p. 117-121.

Jarmin, Ron S.; Louis, Thomas; Miranda, Javier. / Expanding the role of synthetic data at the U.S. Census Bureau. In: Statistical Journal of the IAOS. 2014; Vol. 30, No. 2. pp. 117-121.

@article{8e6c4064181749cd8e95c762e0d1a084,
title = "Expanding the role of synthetic data at the U.S. Census Bureau",
keywords = "Confidentiality, official statistics, synthetic micro data",
author = "Jarmin, {Ron S.} and Thomas Louis and Javier Miranda",
year = "2014",
doi = "10.3233/SJI-140813",
language = "English (US)",
volume = "30",
pages = "117--121",
journal = "Statistical Journal of the IAOS",
issn = "1874-7655",
publisher = "IOS Press",
number = "2",

}

TY - JOUR

T1 - Expanding the role of synthetic data at the U.S. Census Bureau

AU - Jarmin, Ron S.

AU - Louis, Thomas

AU - Miranda, Javier

PY - 2014

Y1 - 2014

KW - Confidentiality

KW - official statistics

KW - synthetic micro data

UR - http://www.scopus.com/inward/record.url?scp=84948155519&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84948155519&partnerID=8YFLogxK

U2 - 10.3233/SJI-140813

DO - 10.3233/SJI-140813

M3 - Article

AN - SCOPUS:84948155519

VL - 30

SP - 117

EP - 121

JO - Statistical Journal of the IAOS

JF - Statistical Journal of the IAOS

SN - 1874-7655

IS - 2

ER -