Validation data-based adjustments for outcome misclassification in logistic regression: An illustration

Robert H. Lyles; Li Tang; Hillary M. Superak; Caroline C. King; David D. Celentano; Yungtai Lo; Jack D. Sobel

doi:10.1097/EDE.0b013e3182117c85

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

50 Scopus citations

Abstract

Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.
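The kind of likelihood the abstract refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's appended code: it assumes a main/internal validation design under random cross-sectional sampling, a logistic model for the true outcome, and logistic models (an assumption made here) for the covariate-dependent probabilities of a positive error-prone result given each true outcome status. All names (`neg_log_lik`, `theta1`, `theta0`, `validated`) and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def neg_log_lik(params, X, y_star, y_true, validated):
    """Negative log-likelihood for a main/internal validation design.

    X         : (n, p) design matrix including an intercept column
    y_star    : (n,) error-prone outcome, observed for every subject
    y_true    : (n,) true outcome; only used where `validated` is True
    validated : (n,) boolean flag marking the internal validation subsample
    params    : beta (true-outcome model), theta1 (model for
                P(Y*=1 | Y=1, X), the sensitivity) and theta0 (model for
                P(Y*=1 | Y=0, X), one minus the specificity), each length p
    """
    p = X.shape[1]
    beta, theta1, theta0 = params[:p], params[p:2 * p], params[2 * p:]
    pi = expit(X @ beta)      # P(Y = 1 | X)
    se = expit(X @ theta1)    # P(Y* = 1 | Y = 1, X)
    fp = expit(X @ theta0)    # P(Y* = 1 | Y = 0, X)

    # Main-study records: only Y* is observed, so sum over the unknown Y.
    p_star = se * pi + fp * (1.0 - pi)
    main = np.where(y_star == 1, p_star, 1.0 - p_star)

    # Validation records: P(Y*, Y | X) = P(Y* | Y, X) * P(Y | X).
    p_pos = np.where(y_true == 1, se, fp)
    p_obs = np.where(y_star == 1, p_pos, 1.0 - p_pos)
    p_y = np.where(y_true == 1, pi, 1.0 - pi)
    val = p_obs * p_y

    contrib = np.where(validated, val, main)
    return -np.sum(np.log(np.clip(contrib, 1e-300, None)))


# Simulated data (hypothetical values) with differential misclassification:
# the error rates depend on a binary covariate.
rng = np.random.default_rng(0)
n, p = 4000, 2
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])
beta_true = np.array([-1.0, 0.8])
theta1_true = np.array([1.5, -0.7])
theta0_true = np.array([-2.5, 0.9])
y = rng.binomial(1, expit(X @ beta_true))
y_star = rng.binomial(
    1, np.where(y == 1, expit(X @ theta1_true), expit(X @ theta0_true))
)
validated = rng.random(n) < 0.25          # 25% internal validation subsample
y_obs = np.where(validated, y, -1)        # -1 marks "true outcome not observed"

fit = minimize(neg_log_lik, np.zeros(3 * p),
               args=(X, y_star, y_obs, validated), method="BFGS")
beta_hat = fit.x[:p]                      # adjusted log odds ratio is beta_hat[1]
print("adjusted log-OR estimate:", beta_hat[1])
```

Fixing `theta1` and `theta0` at assumed values rather than estimating them would turn the same function into a simple sensitivity analysis; dropping the covariate from those two models corresponds to assuming nondifferential misclassification.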

Original language	English (US)
Pages (from-to)	589-598
Number of pages	10
Journal	Epidemiology
Volume	22
Issue number	4
DOIs	https://doi.org/10.1097/EDE.0b013e3182117c85
State	Published - Jul 2011

ASJC Scopus subject areas

Epidemiology

Cite this

@article{744763be04c946d59af980747de23ff6,

title = "Validation data-based adjustments for outcome misclassification in logistic regression: An illustration",

abstract = "Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.",

author = "Lyles, {Robert H.} and Li Tang and Superak, {Hillary M.} and King, {Caroline C.} and Celentano, {David D.} and Yungtai Lo and Sobel, {Jack D.}",

year = "2011",

month = jul,

doi = "10.1097/EDE.0b013e3182117c85",

language = "English (US)",

volume = "22",

pages = "589--598",

journal = "Epidemiology",

issn = "1044-3983",

publisher = "Lippincott Williams and Wilkins",

number = "4",

}

TY - JOUR

T1 - Validation data-based adjustments for outcome misclassification in logistic regression

T2 - An illustration

AU - Lyles, Robert H.

AU - Tang, Li

AU - Superak, Hillary M.

AU - King, Caroline C.

AU - Celentano, David D.

AU - Lo, Yungtai

AU - Sobel, Jack D.

PY - 2011/7

Y1 - 2011/7

N2 - Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.

AB - Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.

UR - http://www.scopus.com/inward/record.url?scp=80051550370&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051550370&partnerID=8YFLogxK

U2 - 10.1097/EDE.0b013e3182117c85

DO - 10.1097/EDE.0b013e3182117c85

M3 - Article

C2 - 21487295

AN - SCOPUS:80051550370

SN - 1044-3983

VL - 22

SP - 589

EP - 598

JO - Epidemiology

JF - Epidemiology

IS - 4

ER -
