Prediction via orthogonalized model mixing

Merlise Clyde, Heather DeSimone, Giovanni Parmigiani

Research output: Contribution to journal › Article › peer-review

72 Scopus citations

Abstract

We introduce an approach and algorithms for model mixing in large prediction problems with correlated predictors. We focus on the choice of predictors in linear models, and mix over possible subsets of candidate predictors. Our approach is based on expressing the space of models in terms of an orthogonalization of the design matrix. Advantages are both statistical and computational. Statistically, orthogonalization often leads to a reduction in the number of competing models by eliminating correlations. Computationally, large model spaces cannot be enumerated; recent approaches are based on sampling models with high posterior probability via Markov chains. Based on orthogonalization of the space of candidate predictors, we can approximate the posterior probabilities of models by products of predictor-specific terms. This leads to an importance sampling function for sampling directly from the joint distribution over the model space, without resorting to Markov chains. Compared to the latter, orthogonalized model mixing by importance sampling is faster in sampling models and is also more efficient in finding models that contribute significantly to the prediction. Further advantages are in the speed of convergence and the availability of more reliable convergence diagnostic tools. We illustrate these in practice, using a data set on prediction of crime rates. The model space is small enough so that enumeration of all models is available for comparison and convergence checks. Also, we demonstrate the feasibility of orthogonalized model mixing in a large-size problem, which is very difficult to attack by other methods. The data set is from a designed experiment dealing with predicting protein activity under different storage conditions. The model space is large (the rank of the design matrix is 88) and very difficult to explore if expressed in terms of the original variables. 
We obtain prediction intervals and a probability distribution of the setting that produces the highest response.
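The core idea in the abstract — orthogonalize the design matrix so that posterior model probabilities approximately factor into predictor-specific terms, then sample models directly from that product-form proposal — can be illustrated with a rough sketch. This is not the authors' implementation: the Zellner-style g-prior, the plug-in variance estimate, and the 1:1 prior inclusion odds below are assumptions chosen for illustration only.

```python
import numpy as np

def orthogonalized_mixing_sample(X, y, n_samples=1000, g=None, seed=0):
    """Sketch of orthogonalized model mixing via importance sampling.

    Orthogonalizes the design matrix with a QR decomposition, scores
    each orthogonal predictor independently, and samples predictor
    subsets directly from the resulting product-form proposal.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if g is None:
        g = float(n)  # unit-information g-prior (assumption)

    # Orthogonalize: columns of Q span the same space as centered X
    Q, _ = np.linalg.qr(X - X.mean(axis=0))
    yc = y - y.mean()

    # Because Q'Q = I, each orthogonal predictor's contribution to the
    # fit separates into z_j = q_j' y.
    z = Q.T @ yc
    rss0 = yc @ yc

    # Per-predictor log Bayes factor under a g-prior (sketch):
    #   log BF_j = -0.5 log(1+g) + g z_j^2 / (2 sigma^2 (1+g))
    # with a crude plug-in estimate of the error variance.
    sigma2 = (rss0 - z @ z) / max(n - p - 1, 1)
    log_bf = -0.5 * np.log1p(g) + (g * z**2) / (2.0 * sigma2 * (1.0 + g))

    # With 1:1 prior inclusion odds, the posterior inclusion
    # probability of each orthogonal predictor factorizes:
    incl_prob = 1.0 / (1.0 + np.exp(-log_bf))

    # Importance sampling: draw whole models (subsets) independently
    # from the joint product proposal -- no Markov chain needed.
    models = rng.random((n_samples, p)) < incl_prob
    return incl_prob, models
```

The independence of the draws is what gives the speed and convergence-diagnostic advantages the abstract describes: each sampled model is an i.i.d. draw from the proposal, so standard importance-sampling diagnostics apply directly.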

Original language: English (US)
Pages (from-to): 1197-1208
Number of pages: 12
Journal: Journal of the American Statistical Association
Volume: 91
Issue number: 435
State: Published - Sep 1996
Externally published: Yes

Keywords

  • Bayesian linear models
  • Importance sampling
  • Model uncertainty
  • Variable selection

ASJC Scopus subject areas

  • General Mathematics
  • Statistics and Probability
