Collinearity and causal diagrams

Enrique F. Schisterman; Neil J. Perkins; Sunni L. Mumford; Katherine A. Ahrens; Emily M. Mitchell

doi:10.1097/EDE.0000000000000554

Research output: Contribution to journal (Article, peer-reviewed)

32 Scopus citations

Abstract

Background: Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data.

Methods: We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios, we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression.

Results: For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models, the variance increased to a lesser extent or decreased.

Conclusion: Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.
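The abstract's central claim, that a correctly specified model stays unbiased under near-perfect correlation while the variance of the adjusted exposure estimate grows, can be illustrated with a small Monte Carlo run for a continuous outcome. This is a sketch under stated assumptions, not the authors' code or their closed-form derivation: the three data-generating models, the unit effect sizes, Gaussian variables, n = 500, 200 replicates, and the helper names (`ols_slopes`, `confounder`, `intermediate`, `collider`, `simulate`) are all hypothetical choices made for illustration. Each scenario returns the exposure X, the outcome Y, the covariate correlated with X at level `rho`, and the true total effect of X on Y.

```python
"""Hypothetical Monte Carlo sketch: bias and spread of the exposure
coefficient when a covariate is highly correlated with the exposure X.
Assumptions (not from the paper): standard-normal variables, unit effects,
Gaussian errors, and the data-generating models below."""
import math
import random
import statistics


def ols_slopes(x, y, z=None):
    """OLS coefficient on x from y ~ x (z is None) or y ~ x + z."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    sxx = sum(a * a for a in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    if z is None:
        return sxy / sxx
    mz = statistics.fmean(z)
    zc = [v - mz for v in z]
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    szy = sum(a * b for a, b in zip(zc, yc))
    # Solve the 2x2 normal equations on centered data (Cramer's rule).
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz * sxz)


def confounder(n, rho, rng):
    """Z -> X, Z -> Y, X -> Y; corr(X, Z) = rho; total effect of X = 1."""
    s = math.sqrt(1 - rho**2)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [rho * zi + s * rng.gauss(0, 1) for zi in z]
    y = [xi + zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, y, z, 1.0


def intermediate(n, rho, rng):
    """X -> M -> Y plus X -> Y; corr(X, M) = rho; total effect = 1 + rho."""
    s = math.sqrt(1 - rho**2)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [rho * xi + s * rng.gauss(0, 1) for xi in x]
    y = [xi + mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, y, m, 1.0 + rho


def collider(n, rho, rng):
    """X -> Y, and both X and Y -> C; corr(X, C) = rho; total effect = 1."""
    k = math.sqrt((1 - rho**2) / 2)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    # C = (rho - k) X + k Y + k U = rho X + k (e_Y + U), so Var(C) = 1.
    c = [(rho - k) * xi + k * yi + k * rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return x, y, c, 1.0


def simulate(scenario, rho, n=500, reps=200, seed=1):
    """Mean and SD of the X coefficient, unadjusted vs adjusted for the covariate."""
    rng = random.Random(seed)
    unadj, adj = [], []
    truth = None
    for _ in range(reps):
        x, y, z, truth = scenario(n, rho, rng)
        unadj.append(ols_slopes(x, y))
        adj.append(ols_slopes(x, y, z))
    return {
        "truth": truth,
        "unadj_mean": statistics.fmean(unadj),
        "unadj_sd": statistics.stdev(unadj),
        "adj_mean": statistics.fmean(adj),
        "adj_sd": statistics.stdev(adj),
    }


if __name__ == "__main__":
    for scenario in (confounder, intermediate, collider):
        for rho in (0.5, 0.95):
            r = simulate(scenario, rho)
            print(f"{scenario.__name__:12s} rho={rho:.2f} truth={r['truth']:.2f} "
                  f"unadj={r['unadj_mean']:.3f} (sd {r['unadj_sd']:.3f}) "
                  f"adj={r['adj_mean']:.3f} (sd {r['adj_sd']:.3f})")
```

Under these assumptions, only the confounder scenario calls for adjustment; there the adjusted standard deviation should be roughly sqrt(1 / (n(1 - ρ²))), the usual OLS variance-inflation result, so it grows as ρ approaches 1 while the mean stays near the truth. In the intermediate and collider scenarios the unadjusted model is the correctly specified one for the total effect, and adjusting instead recovers the direct effect or introduces collider bias.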

Original language	English (US)
Pages (from-to)	47-53
Number of pages	7
Journal	Epidemiology
Volume	28
Issue number	1
DOIs	https://doi.org/10.1097/EDE.0000000000000554
State	Published - Jan 1 2017
Externally published	Yes

ASJC Scopus subject areas

Epidemiology

Cite this

@article{c481d784ef1745d6b5235020b94257e9,
  title = "Collinearity and causal diagrams",
  abstract = "Background: Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data. Methods: We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios, we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression. Results: For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models, the variance increased to a lesser extent or decreased. Conclusion: Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.",
  author = "Schisterman, {Enrique F.} and Perkins, {Neil J.} and Mumford, {Sunni L.} and Ahrens, {Katherine A.} and Mitchell, {Emily M.}",
  note = "Publisher Copyright: {\textcopyright} 2017 Wolters Kluwer Health, Inc.",
  year = "2017",
  month = jan,
  day = "1",
  doi = "10.1097/EDE.0000000000000554",
  language = "English (US)",
  volume = "28",
  pages = "47--53",
  journal = "Epidemiology",
  issn = "1044-3983",
  publisher = "Lippincott Williams and Wilkins",
  number = "1",
}

TY  - JOUR
T1  - Collinearity and causal diagrams
AU  - Schisterman, Enrique F.
AU  - Perkins, Neil J.
AU  - Mumford, Sunni L.
AU  - Ahrens, Katherine A.
AU  - Mitchell, Emily M.
PY  - 2017/1/1
Y1  - 2017/1/1
N2  - Background: Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data. Methods: We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios, we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression. Results: For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models, the variance increased to a lesser extent or decreased. Conclusion: Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.
AB  - Background: Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data. Methods: We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each of these scenarios, we evaluated the consequences of increasing correlation between the given variable and the exposure on the bias and variance for the total effect of the exposure on the outcome using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and empirically present our findings for binary outcomes using logistic regression. Results: For models properly specified, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure in the adjusted models increased, while in the unadjusted models, the variance increased to a lesser extent or decreased. Conclusion: Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take into consideration the causal structure may lead to biased effect estimation for the original question of interest, even under high correlation.
UR  - http://www.scopus.com/inward/record.url?scp=84988695200&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84988695200&partnerID=8YFLogxK
U2  - 10.1097/EDE.0000000000000554
DO  - 10.1097/EDE.0000000000000554
M3  - Article
C2  - 27676260
AN  - SCOPUS:84988695200
SN  - 1044-3983
VL  - 28
SP  - 47
EP  - 53
JO  - Epidemiology
JF  - Epidemiology
IS  - 1
ER  - 