TY - JOUR
T1 - Multiple imputation with large data sets
T2 - A case study of the Children's Mental Health Initiative
AU - Stuart, Elizabeth A.
AU - Azur, Melissa
AU - Frangakis, Constantine
AU - Leaf, Philip
PY - 2009/5
Y1 - 2009/5
N2 - Multiple imputation is an effective method for dealing with missing data, and it is becoming increasingly common in many fields. However, the method is still relatively rarely used in epidemiology, perhaps in part because relatively few studies have looked at practical questions about how to implement multiple imputation in large data sets used for diverse purposes. This paper addresses this gap by focusing on the practicalities and diagnostics for multiple imputation in large data sets. It primarily discusses the method of multiple imputation by chained equations, which iterates through the data, imputing one variable at a time conditional on the others. Illustrative data were derived from 9,186 youths participating in the national evaluation of the Community Mental Health Services for Children and Their Families Program, a US federally funded program designed to develop and enhance community-based systems of care to meet the needs of children with serious emotional disturbances and their families. Multiple imputation was used to ensure that data analysis samples reflect the full population of youth participating in this program. This case study provides an illustration to assist researchers in implementing multiple imputation in their own data.
KW - Mental health services
KW - Missing at random
KW - Missing data
KW - Multiple imputation
UR - http://www.scopus.com/inward/record.url?scp=65249094801&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=65249094801&partnerID=8YFLogxK
U2 - 10.1093/aje/kwp026
DO - 10.1093/aje/kwp026
M3 - Article
C2 - 19318618
AN - SCOPUS:65249094801
SN - 0002-9262
VL - 169
SP - 1133
EP - 1139
JO - American Journal of Epidemiology
JF - American Journal of Epidemiology
IS - 9
ER -