TY - JOUR
T1 - Logic Regression
AU - Ruczinski, Ingo
AU - Kooperberg, Charles
AU - LeBlanc, Michael
N1 - Funding Information:
We thank Richard Kronmal and Robyn McClelland for providing us with the data used in Section 5.2 and the permission to use those data in this article. We thank the GAW organizers for permission to use their data in this article. Ingo Ruczinski and Charles Kooperberg were supported in part by NIH grant CA74841. Michael LeBlanc was supported by NIH grant CA90998. All authors were supported in part by NIH grant CA53996. The GAW workshop is supported by NIH grant GM31575.
PY - 2003/9
Y1 - 2003/9
N2 - Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as "X1, X2, X3, and X4 are true," or "X5 or X6, but not X7 are true." In more specific terms: we try to fit regression models of the form g(E[Y]) = b0 + b1L1 + ... + bnLn, where Lj is any Boolean expression of the predictors. The Lj and bj are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.
AB - Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as "X1, X2, X3, and X4 are true," or "X5 or X6, but not X7 are true." In more specific terms: we try to fit regression models of the form g(E[Y]) = b0 + b1L1 + ... + bnLn, where Lj is any Boolean expression of the predictors. The Lj and bj are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.
KW - Adaptive model selection
KW - Binary variables
KW - Boolean logic
KW - Interactions
KW - Simulated annealing
KW - SNP data
UR - http://www.scopus.com/inward/record.url?scp=0141872478&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0141872478&partnerID=8YFLogxK
U2 - 10.1198/1061860032238
DO - 10.1198/1061860032238
M3 - Review article
AN - SCOPUS:0141872478
SN - 1061-8600
VL - 12
SP - 475
EP - 511
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 3
ER -