Abstract
In epidemiologic studies, partial questionnaire design (PQD) can reduce cost, time, and other practical burdens associated with lengthy questionnaires by assigning different subsets of the questionnaire to different, but overlapping, subsets of the study participants. In this article, we describe methods for semiparametric inference for regression models under PQD and other study settings that can generate nonmonotone missing data in covariates. In particular, motivated by methods for multiphase designs, we develop three estimators, namely mean score, pseudo-likelihood, and semiparametric maximum likelihood, each of which has some unique advantages. We develop the asymptotic theory and a sandwich variance estimator for each of the estimators under the underlying semiparametric model, which allows the distribution of the covariates to remain nonparametric. We study the finite-sample performance and relative efficiencies of the methods using simulation studies. We illustrate the methods using data from a case-control study of non-Hodgkin's lymphoma in which the data on the main chemical exposures of interest are collected using two different instruments on two different, but overlapping, subsets of the participants. This article has supplementary material online.
Original language | English (US) |
---|---|
Pages (from-to) | 787-797 |
Number of pages | 11 |
Journal | Journal of the American Statistical Association |
Volume | 105 |
Issue number | 490 |
State | Published - Jun 1 2010 |
Externally published | Yes |
Keywords
- Mean score
- Multiphase design
- Outcome dependent sampling
- Pseudo-likelihood
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty