Analysis of cohort studies with multivariate and partially observed disease classification data

Nilanjan Chatterjee, Samiran Sinha, W. Ryan Diver, Heather Spencer Feigelson

Research output: Contribution to journal › Article › peer-review

18 Scopus citations


Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.

Original language: English (US)
Pages (from-to): 683-698
Number of pages: 16
Issue number: 3
State: Published - Sep 2010
Externally published: Yes


Keywords

  • Competing-risk
  • Etiologic heterogeneity
  • Influence function
  • Missing cause of failure
  • Partial likelihood
  • Proportional hazard regression
  • Two-stage model

ASJC Scopus subject areas

  • Statistics and Probability
  • Mathematics(all)
  • Agricultural and Biological Sciences (miscellaneous)
  • Agricultural and Biological Sciences(all)
  • Statistics, Probability and Uncertainty
  • Applied Mathematics

