Multiple imputation with large data sets: A case study of the children's mental health initiative

Research output: Contribution to journal › Article

Abstract

Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method is still relatively rarely used in epidemiology, perhaps in part because relatively few studies have looked at practical questions about how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses this gap by focusing on the practicalities and diagnostics for multiple imputation in large data sets. It primarily discusses the method of multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. This case study provides an illustration to assist researchers in implementing multiple imputation in their own data.
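The chained-equations idea described in the abstract (cycle through the variables, re-imputing each one from a model conditional on all the others) can be illustrated with a short sketch. This is not the authors' implementation. It assumes that every variable is continuous and that each one is imputed by Bayesian linear regression on all the others. Under that approach, the residual variance is drawn from a scaled inverse chi-square distribution and the coefficients from their approximate posterior, so the imputations reflect parameter uncertainty. The function names `mice` and `pool_means`, the default of 10 iterations, and the use of a simple column mean as the pooled estimate are hypothetical choices made for illustration. A real analysis would normally use an established package and match each imputation model to the variable's type (binary, categorical, count, and so on).

```python
import numpy as np


def mice(data, n_iter=10, rng=None):
    """Return ONE completed copy of `data` (2-D float array, NaN = missing)
    via chained Bayesian linear-regression imputations (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    x = np.array(data, dtype=float)
    miss = np.isnan(x)
    # Start every missing cell at its column's observed mean.
    col_means = np.nanmean(x, axis=0)
    x[miss] = np.take(col_means, np.nonzero(miss)[1])
    for _ in range(n_iter):
        for j in range(x.shape[1]):
            m = miss[:, j]
            if not m.any():
                continue
            # Regress variable j on all other variables (current imputed values).
            design = np.column_stack([np.ones(len(x)), np.delete(x, j, axis=1)])
            obs = ~m
            xo, yo = design[obs], x[obs, j]
            beta_hat, *_ = np.linalg.lstsq(xo, yo, rcond=None)
            resid = yo - xo @ beta_hat
            df = obs.sum() - design.shape[1]
            # Draw sigma^2 ~ RSS / chi2(df), then beta ~ N(beta_hat, sigma^2 (X'X)^-1).
            sigma2 = resid @ resid / rng.chisquare(df)
            cov = sigma2 * np.linalg.inv(xo.T @ xo)
            beta = rng.multivariate_normal(beta_hat, (cov + cov.T) / 2)
            # Replace the missing cells with predictions plus residual noise.
            x[m, j] = design[m] @ beta + rng.normal(0.0, np.sqrt(sigma2), m.sum())
    return x


def pool_means(completed, col):
    """Combine the mean of one column across m completed data sets (Rubin's rules)."""
    m = len(completed)
    ests = np.array([d[:, col].mean() for d in completed])
    within = np.array([d[:, col].var(ddof=1) / len(d) for d in completed])
    qbar = ests.mean()          # pooled point estimate
    ubar = within.mean()        # average within-imputation variance
    b = ests.var(ddof=1)        # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    z = rng.normal(size=n)
    full = np.column_stack([z, 2 * z + rng.normal(size=n), rng.normal(size=n)])
    data = full.copy()
    data[rng.random(n) < 0.2, 1] = np.nan  # about 20% missing in column 1
    completed = [mice(data, rng=seed) for seed in range(5)]
    est, var = pool_means(completed, col=1)
    print(f"pooled mean {est:.3f}, SE {np.sqrt(var):.3f}")
```

Each call produces one completed data set, and the analysis is repeated over several of them. Rubin's rules then combine the results, so the total variance includes both the within-imputation and the between-imputation components.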

Original language: English (US)
Pages (from-to): 1133-1139
Number of pages: 7
Journal: American Journal of Epidemiology
Volume: 169
Issue number: 9
DOI: 10.1093/aje/kwp026
State: Published - May 2009

Keywords

  • Mental health services
  • Missing at random
  • Missing data
  • Multiple imputation

ASJC Scopus subject areas

  • Epidemiology

Cite this

@article{bf2c42bf0c024cf8bf5f3bd1ba3884e1,
title = "Multiple imputation with large data sets: A case study of the children's mental health initiative",
keywords = "Mental health services, Missing at random, Missing data, Multiple imputation",
author = "Stuart, {Elizabeth A.} and Melissa Azur and Constantine Frangakis and Philip Leaf",
year = "2009",
month = may,
doi = "10.1093/aje/kwp026",
language = "English (US)",
volume = "169",
pages = "1133--1139",
journal = "American Journal of Epidemiology",
issn = "0002-9262",
publisher = "Oxford University Press",
number = "9",

}

TY - JOUR

T1 - Multiple imputation with large data sets

T2 - A case study of the children's mental health initiative

AU - Stuart, Elizabeth A.

AU - Azur, Melissa

AU - Frangakis, Constantine

AU - Leaf, Philip

PY - 2009/5

Y1 - 2009/5

AB - Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method is still relatively rarely used in epidemiology, perhaps in part because relatively few studies have looked at practical questions about how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses this gap by focusing on the practicalities and diagnostics for multiple imputation in large data sets. It primarily discusses the method of multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. This case study provides an illustration to assist researchers in implementing multiple imputation in their own data.

KW - Mental health services

KW - Missing at random

KW - Missing data

KW - Multiple imputation

UR - http://www.scopus.com/inward/record.url?scp=65249094801&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65249094801&partnerID=8YFLogxK

U2 - 10.1093/aje/kwp026

DO - 10.1093/aje/kwp026

M3 - Article

C2 - 19318618

AN - SCOPUS:65249094801

VL - 169

SP - 1133

EP - 1139

JO - American Journal of Epidemiology

JF - American Journal of Epidemiology

SN - 0002-9262

IS - 9

ER -