Multiple imputation with large data sets: A case study of the children's mental health initiative

Elizabeth A. Stuart; Melissa Azur; Constantine Frangakis; Philip Leaf

doi:10.1093/aje/kwp026

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

142 Scopus citations

Abstract

Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method is still relatively rarely used in epidemiology, perhaps in part because relatively few studies have looked at practical questions about how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses this gap by focusing on the practicalities and diagnostics for multiple imputation in large data sets. It primarily discusses the method of multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. This case study provides an illustration to assist researchers in implementing multiple imputation in their own data.
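The chained-equations idea described in the abstract (cycle through the variables, imputing each one conditional on the others) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes every variable is continuous and uses an ordinary linear regression with normal noise for each conditional model. The function names `chained_equations_impute` and `multiply_impute` are hypothetical, and the sketch omits the step of drawing regression parameters from their posterior distribution that a full multiple imputation procedure would include.

```python
import numpy as np


def chained_equations_impute(X, n_iter=10, seed=0):
    """Return one imputed copy of X (2-D float array, NaN marks missing).

    Simplifying assumption: each variable is modeled by a linear
    regression on all the others, with normally distributed residuals.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Starting values: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    n, p = X.shape
    for _ in range(n_iter):
        # One cycle: visit every variable that has missing values.
        for j in range(p):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs_j = ~miss_j

            # Predictors: intercept plus the current values of all other columns.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(n), others])

            # Fit the conditional model on rows where variable j is observed.
            beta, *_ = np.linalg.lstsq(A[obs_j], filled[obs_j, j], rcond=None)
            resid = filled[obs_j, j] - A[obs_j] @ beta
            dof = max(int(obs_j.sum()) - A.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)

            # Impute by prediction plus random noise, so that imputed
            # values carry residual variability rather than sitting on the fit.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = A[miss_j] @ beta + noise
    return filled


def multiply_impute(X, m=5, n_iter=10, seed=0):
    """Create m imputed data sets by rerunning the chain with different seeds."""
    return [chained_equations_impute(X, n_iter, seed + k) for k in range(m)]
```

In use, an analysis would be run separately on each of the `m` completed data sets and the results then combined. Categorical or skewed variables would need a different conditional model than the linear one assumed here.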

Original language	English (US)
Pages (from-to)	1133-1139
Number of pages	7
Journal	American Journal of Epidemiology
Volume	169
Issue number	9
DOIs	https://doi.org/10.1093/aje/kwp026
State	Published - May 2009

Keywords

Mental health services
Missing at random
Missing data
Multiple imputation

ASJC Scopus subject areas

Epidemiology

Cite this

@article{bf2c42bf0c024cf8bf5f3bd1ba3884e1,

title = "Multiple imputation with large data sets: A case study of the children's mental health initiative",

abstract = "Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method is still relatively rarely used in epidemiology, perhaps in part because relatively few studies have looked at practical questions about how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses this gap by focusing on the practicalities and diagnostics for multiple imputation in large data sets. It primarily discusses the method of multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. This case study provides an illustration to assist researchers in implementing multiple imputation in their own data.",

keywords = "Mental health services, Missing at random, Missing data, Multiple imputation",

author = "Stuart, {Elizabeth A.} and Melissa Azur and Constantine Frangakis and Philip Leaf",

year = "2009",

month = may,

doi = "10.1093/aje/kwp026",

language = "English (US)",

volume = "169",

pages = "1133--1139",

journal = "American Journal of Epidemiology",

issn = "0002-9262",

publisher = "Oxford University Press",

number = "9",

}

TY - JOUR

T1 - Multiple imputation with large data sets

T2 - A case study of the children's mental health initiative

AU - Stuart, Elizabeth A.

AU - Azur, Melissa

AU - Frangakis, Constantine

AU - Leaf, Philip

PY - 2009/5

Y1 - 2009/5

N2 - Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method is still relatively rarely used in epidemiology, perhaps in part because relatively few studies have looked at practical questions about how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses this gap by focusing on the practicalities and diagnostics for multiple imputation in large data sets. It primarily discusses the method of multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. This case study provides an illustration to assist researchers in implementing multiple imputation in their own data.

AB - Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method is still relatively rarely used in epidemiology, perhaps in part because relatively few studies have looked at practical questions about how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses this gap by focusing on the practicalities and diagnostics for multiple imputation in large data sets. It primarily discusses the method of multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. This case study provides an illustration to assist researchers in implementing multiple imputation in their own data.

KW - Mental health services

KW - Missing at random

KW - Missing data

KW - Multiple imputation

UR - http://www.scopus.com/inward/record.url?scp=65249094801&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65249094801&partnerID=8YFLogxK

U2 - 10.1093/aje/kwp026

DO - 10.1093/aje/kwp026

M3 - Article

C2 - 19318618

AN - SCOPUS:65249094801

SN - 0002-9262

VL - 169

SP - 1133

EP - 1139

JO - American Journal of Epidemiology

JF - American Journal of Epidemiology

IS - 9

ER -
