Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis

N. E. Breslow, Nilanjan Chatterjee

Research output: Contribution to journal › Article

Abstract

Two-phase stratified sampling is used to select subjects for the collection of additional data, e.g. validation data in measurement error problems. Stratification jointly by outcome and covariates, with sampling fractions chosen to achieve approximately equal numbers per stratum at the second phase of sampling, enhances efficiency compared with stratification based on the outcome or covariates alone. Nonparametric maximum likelihood may result in substantially more efficient estimates of logistic regression coefficients than weighted or pseudolikelihood procedures. Software to implement all three procedures is available. We demonstrate the practical importance of these design and analysis principles by an analysis of, and simulations based on, data from the US National Wilms Tumor Study.
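The abstract names its design and analysis principles without giving formulas. Below is a minimal Python sketch of two of them. The assumptions are ours, not the authors'.

1. A phase-2 allocation that aims for approximately equal numbers per stratum. Any stratum too small to supply its equal share is taken in full.
2. The weighted (Horvitz-Thompson type) logistic regression estimator. Each phase-2 subject is weighted by the inverse of its stratum's sampling fraction.

The function names `balanced_allocation` and `weighted_logistic`, the hypothetical surrogate `s` used for stratification, and all simulation parameters are illustrative inventions. The sketch is not the software mentioned in the abstract. It also does not implement the pseudolikelihood or nonparametric maximum likelihood procedures, which the abstract reports may be substantially more efficient than weighting.

```python
"""Hypothetical sketch: balanced two-phase allocation plus a weighted
(Horvitz-Thompson type) logistic regression fit. Not the authors' software."""
import math
import random


def balanced_allocation(stratum_sizes, n_total):
    """Split n_total phase-2 slots across strata as equally as possible.

    Strata with no more subjects than their equal share are sampled in full;
    their unused share is redistributed over the remaining strata.
    Returns {stratum: phase-2 sample size}.
    """
    alloc = {}
    remaining = dict(stratum_sizes)
    budget = n_total
    while remaining:
        share = budget // len(remaining)
        small = {k: n for k, n in remaining.items() if n <= share}
        if not small:
            # Every remaining stratum can supply its share; hand out the
            # integer remainder one extra subject at a time.
            extra = budget - share * len(remaining)
            for i, k in enumerate(sorted(remaining)):
                alloc[k] = share + (1 if i < extra else 0)
            return alloc
        for k, n in small.items():
            alloc[k] = n  # take the whole stratum
            budget -= n
            del remaining[k]
    return alloc  # budget covered every stratum: phase 2 is a census


def weighted_logistic(x, y, w, iters=25):
    """Maximize sum_i w_i * loglik_i for logit P(y=1|x) = b0 + b1*x.

    Uses Newton-Raphson on the weighted score and information. With
    w_i = N_k / n_k (phase-1 over phase-2 size of subject i's stratum) this
    is the weighted estimator; design-based standard errors are omitted.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted score contribution
            g0 += r
            g1 += r * xi
            v = wi * p * (1.0 - p)     # weighted information contribution
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, with I the 2x2 information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical phase-1 cohort: s is a cheap surrogate of the expensive
    # covariate x (measured at phase 2 only); y is the binary outcome.
    cohort = []
    for _ in range(4000):
        x = 1 if rng.random() < 0.1 else 0
        s = x if rng.random() < 0.9 else 1 - x
        p = 1.0 / (1.0 + math.exp(-(-2.5 + 1.5 * x)))
        y = 1 if rng.random() < p else 0
        cohort.append((s, x, y))
    # Stratify jointly on outcome and surrogate covariate: strata (y, s).
    strata = {}
    for rec in cohort:
        strata.setdefault((rec[2], rec[0]), []).append(rec)
    sizes = {k: len(v) for k, v in strata.items()}
    alloc = balanced_allocation(sizes, 600)
    xs, ys, ws = [], [], []
    for k, recs in strata.items():
        for _s, x, y in rng.sample(recs, alloc[k]):
            xs.append(x)
            ys.append(y)
            ws.append(sizes[k] / alloc[k])  # inverse sampling fraction
    print("phase-2 allocation:", alloc)
    print("weighted estimate (b0, b1):", weighted_logistic(xs, ys, ws))
```

The weights only rescale each stratum's contribution. As a result, multiplying every weight by the same constant leaves the point estimate unchanged. Valid standard errors would still require a variance estimate that accounts for the phase-2 sampling, which this sketch leaves out.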

Original language: English (US)
Pages (from-to): 457-468
Number of pages: 12
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Volume: 48
Issue number: 4
State: Published - 1999
Externally published: Yes

Keywords

  • Design efficiency
  • Logistic regression
  • Nonparametric maximum likelihood
  • Stratified sampling

ASJC Scopus subject areas

  • Mathematics (all)
  • Statistics and Probability

Cite this

@article{b00c8720fab747198b8401636981c966,
title = "Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis",
keywords = "Design efficiency, Logistic regression, Nonparametric maximum likelihood, Stratified sampling",
author = "Breslow, {N. E.} and Nilanjan Chatterjee",
year = "1999",
language = "English (US)",
volume = "48",
pages = "457--468",
journal = "Journal of the Royal Statistical Society. Series C: Applied Statistics",
issn = "0035-9254",
publisher = "Wiley-Blackwell",
number = "4",
}

TY - JOUR

T1 - Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis

AU - Breslow, N. E.

AU - Chatterjee, Nilanjan

PY - 1999

Y1 - 1999

KW - Design efficiency

KW - Logistic regression

KW - Nonparametric maximum likelihood

KW - Stratified sampling

UR - http://www.scopus.com/inward/record.url?scp=0033474613&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033474613&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0033474613

VL - 48

SP - 457

EP - 468

JO - Journal of the Royal Statistical Society. Series C: Applied Statistics

JF - Journal of the Royal Statistical Society. Series C: Applied Statistics

SN - 0035-9254

IS - 4

ER -