Analysis of cohort studies with multivariate and partially observed disease classification data

Nilanjan Chatterjee, Samiran Sinha, W. Ryan Diver, Heather Spencer Feigelson

Research output: Contribution to journal › Article

Abstract

Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.
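To make the two-stage model more concrete, here is a minimal sketch that rests on several assumptions. It assumes K binary disease traits, so each subtype is a vector s = (y_1, ..., y_K). In the first stage, each subtype has its own Cox log hazard ratio β_s for a single covariate. The second stage is assumed to take the form β_s = θ^(0) + Σ_k θ_k^(1) y_k, optionally with interaction contrasts θ_kk'^(2) y_k y_k'. The function names and parameter values are hypothetical. This is not the authors' code, and it covers only the model parametrization, not the estimating equations or the sandwich variance estimator.

```python
from itertools import combinations, product

import numpy as np


def subtypes(n_traits):
    """All subtypes defined by n_traits binary disease traits, e.g. (0, 1)."""
    return list(product((0, 1), repeat=n_traits))


def second_stage_design(n_traits, order=1):
    """Build L so that beta = L @ theta for one covariate (illustrative).

    Rows follow subtypes(n_traits). Columns are the intercept theta^(0),
    the main-effect contrasts theta^(1)_k and, when order >= 2, the
    interaction contrasts theta^(2)_{kk'} (and so on up to `order`).
    """
    # Each term is a tuple of trait indices; () is the intercept.
    terms = [()]
    for m in range(1, order + 1):
        terms.extend(combinations(range(n_traits), m))
    # Entry is 1 when every trait in the term is at level 1 for that subtype.
    rows = [[float(all(s[k] == 1 for k in term)) for term in terms]
            for s in subtypes(n_traits)]
    return np.array(rows), terms


# Hypothetical example: two binary traits, main-effects second stage.
L, terms = second_stage_design(2, order=1)
theta = np.array([0.25, 0.5, -0.125])  # theta^(0), theta^(1)_1, theta^(1)_2
beta = L @ theta  # first-stage log hazard ratios, one per subtype
print(subtypes(2))    # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(beta.tolist())  # [0.25, 0.125, 0.75, 0.625]
```

In this assumed main-effects form, θ_k^(1) = 0 means the covariate's log hazard ratio does not change with the level of trait k. Tests of such contrasts are one way to examine heterogeneity of covariate effects across disease traits.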

Original language: English (US)
Pages (from-to): 683-698
Number of pages: 16
Journal: Biometrika
Volume: 97
Issue number: 3
DOI: 10.1093/biomet/asq036
State: Published - Sep 2010
Externally published: Yes

Keywords

  • Competing-risk
  • Etiologic heterogeneity
  • Influence function
  • Missing cause of failure
  • Partial likelihood
  • Proportional hazard regression
  • Two-stage model

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Agricultural and Biological Sciences (miscellaneous)
  • Statistics, Probability and Uncertainty
  • Mathematics (all)
  • Applied Mathematics
  • Statistics and Probability

Cite this

Analysis of cohort studies with multivariate and partially observed disease classification data. / Chatterjee, Nilanjan; Sinha, Samiran; Diver, W. Ryan; Feigelson, Heather Spencer.

In: Biometrika, Vol. 97, No. 3, 09.2010, p. 683-698.

Chatterjee, Nilanjan ; Sinha, Samiran ; Diver, W. Ryan ; Feigelson, Heather Spencer. / Analysis of cohort studies with multivariate and partially observed disease classification data. In: Biometrika. 2010 ; Vol. 97, No. 3. pp. 683-698.

@article{5588fdfa438b4592ab5f883bd1c06aa1,
title = "Analysis of cohort studies with multivariate and partially observed disease classification data",
abstract = "Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.",
keywords = "Competing-risk, Etiologic heterogeneity, Influence function, Missing cause of failure, Partial likelihood, Proportional hazard regression, Two-stage model",
author = "Nilanjan Chatterjee and Samiran Sinha and Diver, {W. Ryan} and Feigelson, {Heather Spencer}",
year = "2010",
month = "9",
doi = "10.1093/biomet/asq036",
language = "English (US)",
volume = "97",
pages = "683--698",
journal = "Biometrika",
issn = "0006-3444",
publisher = "Oxford University Press",
number = "3",

}

TY - JOUR

T1 - Analysis of cohort studies with multivariate and partially observed disease classification data

AU - Chatterjee, Nilanjan

AU - Sinha, Samiran

AU - Diver, W. Ryan

AU - Feigelson, Heather Spencer

PY - 2010/9

Y1 - 2010/9

N2 - Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.

AB - Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort.

KW - Competing-risk

KW - Etiologic heterogeneity

KW - Influence function

KW - Missing cause of failure

KW - Partial likelihood

KW - Proportional hazard regression

KW - Two-stage model

UR - http://www.scopus.com/inward/record.url?scp=77955885005&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955885005&partnerID=8YFLogxK

U2 - 10.1093/biomet/asq036

DO - 10.1093/biomet/asq036

M3 - Article

C2 - 22822252

AN - SCOPUS:77955885005

VL - 97

SP - 683

EP - 698

JO - Biometrika

JF - Biometrika

SN - 0006-3444

IS - 3

ER -