TY - JOUR
T1 - Inference in semiparametric regression models under partial questionnaire design and nonmonotone missing data
AU - Chatterjee, Nilanjan
AU - Li, Yan
N1 - Funding Information:
Nilanjan Chatterjee is Senior Principal Investigator, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Rockville, MD 20852 (E-mail: chattern@mail.nih.gov). Yan Li is Assistant Professor, Department of Mathematics, University of Texas, Arlington, TX 76019. This research was supported by the Intramural Research Program of the National Cancer Institute (NCI), NIH, DHHS. The authors would like to thank Patricia Hartge and Joanne Colt at NCI for making the NHL data available for our analysis.
PY - 2010/6
Y1 - 2010/6
N2 - In epidemiologic studies, partial questionnaire design (PQD) can reduce cost, time, and other practical burdens associated with lengthy questionnaires by assigning different subsets of the questionnaire to different, but overlapping, subsets of the study participants. In this article, we describe methods for semiparametric inference for regression models under PQD and other study settings that can generate nonmonotone missing data in covariates. In particular, motivated by methods for multiphase designs, we develop three estimators, namely mean score, pseudo-likelihood, and semiparametric maximum likelihood, each of which has some unique advantages. We develop the asymptotic theory and a sandwich variance estimator for each of the estimators under the underlying semiparametric model that allows the distribution of the covariates to remain nonparametric. We study the finite-sample performance and relative efficiencies of the methods using simulation studies. We illustrate the methods using data from a case-control study of non-Hodgkin's lymphoma where the data on the main chemical exposures of interest are collected using two different instruments on two different, but overlapping, subsets of the participants. This article has supplementary material online.
KW - Mean score
KW - Multiphase design
KW - Outcome dependent sampling
KW - Pseudo-likelihood
UR - http://www.scopus.com/inward/record.url?scp=78649397774&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649397774&partnerID=8YFLogxK
U2 - 10.1198/jasa.2010.tm08756
DO - 10.1198/jasa.2010.tm08756
M3 - Article
AN - SCOPUS:78649397774
VL - 105
SP - 787
EP - 797
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
SN - 0162-1459
IS - 490
ER -