Logic Regression

Ingo Ruczinski; Charles Kooperberg; Michael Leblanc

doi:10.1198/1061860032238

Bloomberg School of Public Health

Research output: Contribution to journal › Review article › peer-review

221 Scopus citations

Abstract

Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as "X₁, X₂, X₃, and X₄ are true," or "X₅ or X₆, but not X₇ are true." In more specific terms: we try to fit regression models of the form g(E[Y]) = b₀ + b₁L₁ + ⋯ + bₙLₙ, where Lⱼ is any Boolean expression of the predictors. The Lⱼ and bⱼ are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.
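The model form above can be made concrete with a small sketch. The Python below is not the authors' implementation. The tree classes `Var` and `Op` and the functions `evaluate`, `linear_predictor`, `rss_single_tree`, `random_move`, and `anneal` are hypothetical names introduced only for illustration. The sketch encodes the abstract's two example predictors as Boolean trees and evaluates b₀ + b₁L₁ + ⋯ + bₙLₙ. It then runs a toy simulated-annealing search over a single tree. For simplicity it assumes an identity link g scored by residual sum of squares. It also uses a deliberately reduced move set (change a leaf's variable, flip a negation, swap and/or) on a tree of fixed shape. The article's own scoring functions, move set, and cooling schedule are not reproduced here.

```python
import math
import random
from dataclasses import dataclass
from typing import Mapping, Sequence, Union


@dataclass(frozen=True)
class Var:
    """Leaf: covariate X_index, optionally negated (e.g. "not X7")."""
    index: int
    negated: bool = False


@dataclass(frozen=True)
class Op:
    """Internal node combining two subtrees with "and" or "or"."""
    kind: str
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Op]


def evaluate(expr: Expr, x: Mapping[int, int]) -> int:
    """Return 1 if expr is true for 0/1 covariates x (keyed by covariate number), else 0."""
    if isinstance(expr, Var):
        return int((x[expr.index] == 1) != expr.negated)
    a, b = evaluate(expr.left, x), evaluate(expr.right, x)
    return a & b if expr.kind == "and" else a | b


def linear_predictor(b0: float, coefs: Sequence[float], exprs: Sequence[Expr],
                     x: Mapping[int, int]) -> float:
    """Right-hand side of g(E[Y]) = b0 + b1*L1 + ... + bn*Ln for one observation."""
    return b0 + sum(b * evaluate(L, x) for b, L in zip(coefs, exprs))


def rss_single_tree(expr: Expr, xs, ys) -> float:
    """RSS of the least-squares fit y = b0 + b1*L(x); fitted values are group means."""
    groups = {0: [], 1: []}
    for x, y in zip(xs, ys):
        groups[evaluate(expr, x)].append(y)
    rss = 0.0
    for g in groups.values():
        if g:
            mean = sum(g) / len(g)
            rss += sum((y - mean) ** 2 for y in g)
    return rss


def random_move(expr: Expr, variables: Sequence[int], rng: random.Random) -> Expr:
    """Simplified neighbour: change one leaf's variable or negation, or swap one operator."""
    if isinstance(expr, Var):
        if rng.random() < 0.5:
            return Var(rng.choice(variables), expr.negated)
        return Var(expr.index, not expr.negated)
    r = rng.random()
    if r < 1 / 3:
        return Op("or" if expr.kind == "and" else "and", expr.left, expr.right)
    if r < 2 / 3:
        return Op(expr.kind, random_move(expr.left, variables, rng), expr.right)
    return Op(expr.kind, expr.left, random_move(expr.right, variables, rng))


def anneal(start: Expr, xs, ys, variables, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing with Metropolis acceptance; returns the best tree seen and its RSS."""
    rng = random.Random(seed)
    current, current_score = start, rss_single_tree(start, xs, ys)
    best, best_score = current, current_score
    temp = t0
    for _ in range(steps):
        candidate = random_move(current, variables, rng)
        score = rss_single_tree(candidate, xs, ys)
        delta = score - current_score
        # Always accept improvements; accept a worse tree with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, score
            if score < best_score:
                best, best_score = candidate, score
        temp *= cooling  # geometric cooling schedule (an assumption, not the article's)
    return best, best_score


if __name__ == "__main__":
    # The abstract's examples: L1 = X1 and X2 and X3 and X4; L2 = (X5 or X6) and not X7.
    L1 = Op("and", Op("and", Var(1), Var(2)), Op("and", Var(3), Var(4)))
    L2 = Op("and", Op("or", Var(5), Var(6)), Var(7, negated=True))
    x = {1: 1, 2: 1, 3: 1, 4: 1, 5: 0, 6: 1, 7: 0}
    print(linear_predictor(-1.0, [2.0, 0.5], [L1, L2], x))  # -1 + 2*1 + 0.5*1 = 1.5

    # Toy noise-free data on X5..X7 generated from L2; start from a wrong tree of the same shape.
    xs = [{5: a, 6: b, 7: c} for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    ys = [float(evaluate(L2, xi)) for xi in xs]
    start = Op("or", Op("and", Var(5), Var(7)), Var(6))
    best, best_score = anneal(start, xs, ys, variables=[5, 6, 7])
    print(best, best_score)
```

Because `anneal` keeps the best tree it has visited, the returned score can never exceed the starting tree's score. Whether it recovers (X₅ ∨ X₆) ∧ ¬X₇ exactly depends on the seed and the cooling schedule.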

Original language	English (US)
Pages (from-to)	475-511
Number of pages	37
Journal	Journal of Computational and Graphical Statistics
Volume	12
Issue number	3
DOIs	https://doi.org/10.1198/1061860032238
State	Published - Sep 2003

Keywords

Adaptive model selection
Binary variables
Boolean logic
Interactions
Simulated annealing
SNP data

ASJC Scopus subject areas

Discrete Mathematics and Combinatorics
Statistics and Probability
Statistics, Probability and Uncertainty

Cite this

@article{850a971c6a0d490faa391fc1abc8866a,

title = "Logic Regression",

abstract = "Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as {"}$X_1$, $X_2$, $X_3$, and $X_4$ are true,{"} or {"}$X_5$ or $X_6$, but not $X_7$ are true.{"} In more specific terms: we try to fit regression models of the form $g(E[Y]) = b_0 + b_1 L_1 + \cdots + b_n L_n$, where $L_j$ is any Boolean expression of the predictors. The $L_j$ and $b_j$ are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.",

keywords = "Adaptive model selection, Binary variables, Boolean logic, Interactions, Simulated annealing, SNP data",

author = "Ingo Ruczinski and Charles Kooperberg and Michael Leblanc",

note = "Funding Information: We thank Richard Kronmal and Robyn McClelland for providing us with the data used in Section 5.2 and for permission to use those data in this article. We thank the GAW organizers for permission to use their data in this article. Ingo Ruczinski and Charles Kooperberg were supported in part by NIH grant CA74841. Michael LeBlanc was supported by NIH grant CA90998. All authors were supported in part by NIH grant CA53996. The GAW workshop is supported by NIH grant GM31575.",

year = "2003",

month = sep,

doi = "10.1198/1061860032238",

language = "English (US)",

volume = "12",

pages = "475--511",

journal = "Journal of Computational and Graphical Statistics",

issn = "1061-8600",

publisher = "American Statistical Association",

number = "3",

}

TY - JOUR

T1 - Logic Regression

AU - Ruczinski, Ingo

AU - Kooperberg, Charles

AU - Leblanc, Michael

N1 - Funding Information: We thank Richard Kronmal and Robyn McClelland for providing us with the data used in Section 5.2 and for permission to use those data in this article. We thank the GAW organizers for permission to use their data in this article. Ingo Ruczinski and Charles Kooperberg were supported in part by NIH grant CA74841. Michael LeBlanc was supported by NIH grant CA90998. All authors were supported in part by NIH grant CA53996. The GAW workshop is supported by NIH grant GM31575.

PY - 2003/9

Y1 - 2003/9

N2 - Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as "X1, X2, X3, and X4 are true," or "X5 or X6, but not X7 are true." In more specific terms: we try to fit regression models of the form g(E[Y]) = b0 + b1L1 + ... + bnLn, where Lj is any Boolean expression of the predictors. The Lj and bj are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.

AB - Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as "X1, X2, X3, and X4 are true," or "X5 or X6, but not X7 are true." In more specific terms: we try to fit regression models of the form g(E[Y]) = b0 + b1L1 + ... + bnLn, where Lj is any Boolean expression of the predictors. The Lj and bj are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.

KW - Adaptive model selection

KW - Binary variables

KW - Boolean logic

KW - Interactions

KW - Simulated annealing

KW - SNP data

UR - http://www.scopus.com/inward/record.url?scp=0141872478&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0141872478&partnerID=8YFLogxK

U2 - 10.1198/1061860032238

DO - 10.1198/1061860032238

M3 - Review article

AN - SCOPUS:0141872478

SN - 1061-8600

VL - 12

SP - 475

EP - 511

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

IS - 3

ER -