Validation data-based adjustments for outcome misclassification in logistic regression: An illustration

Robert H. Lyles, Li Tang, Hillary M. Superak, Caroline C. King, David D Celentano, Yungtai Lo, Jack D. Sobel

Research output: Contribution to journal › Article

Abstract

Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.
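The likelihood the abstract describes can be sketched for the simplest case. The Python below is our own illustration, not the authors' appended code, and it rests on stated assumptions: a single binary covariate x, so the logistic model logit P(Y=1|x) = b0 + b1*x is saturated and exp(b1) is the odds ratio; sensitivity Se(x) and specificity Sp(x) allowed to differ by x (differential misclassification); an internal validation subsample drawn at random from the same cross-sectional population; and hypothetical counts chosen to match known parameters (they are not data from the HIV Epidemiology Research Study). Each main-study subject contributes the Bernoulli probability of the observed Y*, with P(Y*=1|x) = Se(x)P(Y=1|x) + (1 - Sp(x))(1 - P(Y=1|x)); each validation subject contributes P(Y, Y*|x). To keep the example dependency-free, the likelihood is maximized with the EM algorithm, which we swap in for the general-purpose optimizers of standard statistical software.

```python
import math

# Hypothetical counts for one binary covariate x (0/1); NOT data from the study.
# Main study: x -> (n with Y*=1, n with Y*=0); the true outcome Y is unobserved.
MAIN = {0: (220, 780), 1: (370, 630)}
# Internal validation: x -> counts of (Y, Y*) = (1,1), (1,0), (0,1), (0,0).
VALID = {0: (36, 4, 8, 152), 1: (56, 24, 18, 102)}


def stratum_loglik(pi, se, sp, main, valid):
    """Observed-data log-likelihood for one covariate stratum.

    pi = P(Y=1 | x), se = P(Y*=1 | Y=1, x), sp = P(Y*=0 | Y=0, x).
    """
    n1, n0 = main
    m11, m10, m01, m00 = valid
    q = se * pi + (1 - sp) * (1 - pi)  # P(Y*=1 | x)
    main_part = n1 * math.log(q) + n0 * math.log(1 - q)
    valid_part = (m11 * math.log(pi * se) + m10 * math.log(pi * (1 - se))
                  + m01 * math.log((1 - pi) * (1 - sp))
                  + m00 * math.log((1 - pi) * sp))
    return main_part + valid_part


def loglik(params):
    """Total log-likelihood; params maps x -> (pi, se, sp)."""
    return sum(stratum_loglik(*params[x], MAIN[x], VALID[x]) for x in params)


def em_stratum(pi, se, sp, main, valid, tol=1e-12, max_iter=5000):
    """EM for one stratum: impute the missing Y of main-study subjects."""
    n1, n0 = main
    m11, m10, m01, m00 = valid
    for _ in range(max_iter):
        q = se * pi + (1 - sp) * (1 - pi)
        w1 = se * pi / q              # P(Y=1 | Y*=1, x)
        w0 = (1 - se) * pi / (1 - q)  # P(Y=1 | Y*=0, x)
        # Expected complete-data cell counts (validation cells are observed).
        e11 = m11 + n1 * w1
        e10 = m10 + n0 * w0
        e01 = m01 + n1 * (1 - w1)
        e00 = m00 + n0 * (1 - w0)
        # M-step: closed-form proportions for the saturated model.
        new = ((e11 + e10) / (e11 + e10 + e01 + e00),
               e11 / (e11 + e10),
               e00 / (e01 + e00))
        done = max(abs(a - b) for a, b in zip(new, (pi, se, sp))) < tol
        pi, se, sp = new
        if done:
            break
    return pi, se, sp


def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Naive analysis: treat Y* as if it were Y (main and validation subjects pooled).
naive = {x: (MAIN[x][0] + VALID[x][0] + VALID[x][2])
         / (sum(MAIN[x]) + sum(VALID[x])) for x in (0, 1)}

# Crude starting values, then EM separately in each stratum of x.
start = {x: (naive[x], 0.9, 0.9) for x in (0, 1)}
fit = {x: em_stratum(*start[x], MAIN[x], VALID[x]) for x in (0, 1)}

or_naive = odds_ratio(naive[1], naive[0])
or_adj = odds_ratio(fit[1][0], fit[0][0])
print(f"naive OR = {or_naive:.3f}, adjusted OR = {or_adj:.3f}")
# prints: naive OR = 2.082, adjusted OR = 2.667
```

With these hypothetical counts, treating Y* as the true outcome attenuates the odds ratio from 8/3 to about 2.08, and the likelihood-based estimate recovers 8/3 because the counts equal their expected values under the chosen parameters. With continuous covariates, the EM M-step for the outcome model would become a weighted logistic regression, or one would maximize the same log-likelihood directly.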

Original language: English (US)
Pages (from-to): 589-598
Number of pages: 10
Journal: Epidemiology
Volume: 22
Issue number: 4
DOI: 10.1097/EDE.0b013e3182117c85
State: Published - Jul 2011

Fingerprint

Research Design
Logistic Models
Likelihood Functions
Bayes Theorem
Validation Studies
Research
Epidemiology
Software
Odds Ratio
Research Personnel
HIV

ASJC Scopus subject areas

  • Epidemiology

Cite this

Validation data-based adjustments for outcome misclassification in logistic regression: An illustration. / Lyles, Robert H.; Tang, Li; Superak, Hillary M.; King, Caroline C.; Celentano, David D; Lo, Yungtai; Sobel, Jack D.

In: Epidemiology, Vol. 22, No. 4, 07.2011, p. 589-598.

Lyles, Robert H. ; Tang, Li ; Superak, Hillary M. ; King, Caroline C. ; Celentano, David D ; Lo, Yungtai ; Sobel, Jack D. / Validation data-based adjustments for outcome misclassification in logistic regression: An illustration. In: Epidemiology. 2011 ; Vol. 22, No. 4. pp. 589-598.
@article{744763be04c946d59af980747de23ff6,
title = "Validation data-based adjustments for outcome misclassification in logistic regression: An illustration",
author = "Lyles, {Robert H.} and Li Tang and Superak, {Hillary M.} and King, {Caroline C.} and Celentano, {David D} and Yungtai Lo and Sobel, {Jack D.}",
year = "2011",
month = "7",
doi = "10.1097/EDE.0b013e3182117c85",
language = "English (US)",
volume = "22",
pages = "589--598",
journal = "Epidemiology",
issn = "1044-3983",
publisher = "Lippincott Williams and Wilkins",
number = "4",

}

TY - JOUR

T1 - Validation data-based adjustments for outcome misclassification in logistic regression: An illustration

AU - Lyles, Robert H.

AU - Tang, Li

AU - Superak, Hillary M.

AU - King, Caroline C.

AU - Celentano, David D

AU - Lo, Yungtai

AU - Sobel, Jack D.

PY - 2011/7

Y1 - 2011/7

UR - http://www.scopus.com/inward/record.url?scp=80051550370&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051550370&partnerID=8YFLogxK

U2 - 10.1097/EDE.0b013e3182117c85

DO - 10.1097/EDE.0b013e3182117c85

M3 - Article

C2 - 21487295

AN - SCOPUS:80051550370

VL - 22

SP - 589

EP - 598

JO - Epidemiology

JF - Epidemiology

SN - 1044-3983

IS - 4

ER -