Inference in semiparametric regression models under partial questionnaire design and nonmonotone missing data

Research output: Contribution to journal › Article

Abstract

In epidemiologic studies, partial questionnaire design (PQD) can reduce the cost, time, and other practical burdens associated with lengthy questionnaires by assigning different subsets of the questionnaire to different, but overlapping, subsets of the study participants. In this article, we describe methods for semiparametric inference in regression models under PQD and other study settings that can generate nonmonotone missing data in covariates. In particular, motivated by methods for multiphase designs, we develop three estimators, namely mean score, pseudo-likelihood, and semiparametric maximum likelihood, each of which has its own advantages. For each estimator, we develop the asymptotic theory and a sandwich variance estimator under the semiparametric model, which leaves the distribution of the covariates nonparametric. We study the finite-sample performance and relative efficiency of the methods using simulation studies. We illustrate the methods with data from a case-control study of non-Hodgkin's lymphoma in which data on the main chemical exposures of interest were collected using two different instruments on two different, but overlapping, subsets of the participants. This article has supplementary material online.
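Why assigning overlapping questionnaire subsets produces nonmonotone missingness, whereas a nested multiphase design does not, can be seen in a small simulation. The following is a minimal sketch that assumes three hypothetical questionnaire modules (A, B, C) and three hypothetical forms. All names (`MODULES`, `PQD_FORMS`, `assign_forms`, `pattern`, `is_monotone`) are illustrative and do not come from the article. The sketch only generates and classifies missingness patterns. It does not implement the mean score, pseudo-likelihood, or semiparametric maximum likelihood estimators.

```python
import random
from collections import Counter

# Hypothetical questionnaire modules; "A" acts as a core module that
# every participant answers.
MODULES = ("A", "B", "C")

# Hypothetical PQD forms: each form is the subset of modules one participant
# receives. The forms overlap through module A, and form 3 asks everything.
PQD_FORMS = ({"A", "B"}, {"A", "C"}, {"A", "B", "C"})


def assign_forms(n_participants, forms, seed=0):
    """Randomly assign one form per participant; return the observed-module sets."""
    rng = random.Random(seed)
    return [rng.choice(forms) for _ in range(n_participants)]


def pattern(observed):
    """Missingness pattern as a 0/1 tuple over MODULES (1 = observed)."""
    return tuple(int(m in observed) for m in MODULES)


def is_monotone(patterns):
    """True if the distinct observed-variable sets form a chain under inclusion,
    i.e. the variables can be ordered so that missingness is nested."""
    sets = {frozenset(i for i, v in enumerate(p) if v) for p in patterns}
    ordered = sorted(sets, key=len, reverse=True)
    # In a chain, each smaller set must be contained in the next larger one.
    return all(smaller <= larger for larger, smaller in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    pqd = [pattern(s) for s in assign_forms(300, PQD_FORMS)]
    print(Counter(pqd))
    print("PQD monotone?", is_monotone(pqd))

    # Contrast: a nested two-phase design in which phase 2 (all modules)
    # is a subsample of phase 1 (module A only).
    two_phase = [pattern({"A"}), pattern({"A", "B", "C"})]
    print("Two-phase monotone?", is_monotone(two_phase))
```

The forms {A, B} and {A, C} are not nested, so no ordering of the modules makes their missingness monotone. This is the kind of pattern that motivates methods beyond those for standard nested multiphase designs.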

Original language: English (US)
Pages (from-to): 787-797
Number of pages: 11
Journal: Journal of the American Statistical Association
Volume: 105
Issue number: 490
Publication status: Published - Jun 2010
Externally published: Yes

Keywords

  • Mean score
  • Multiphase design
  • Outcome-dependent sampling
  • Pseudo-likelihood

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty