Key Results of Interaction Models With Centering

David Afshartous
Vanderbilt University

Richard A. Preston
University of Miami

Journal of Statistics Education, Volume 19, Number 3 (2011)
Copyright © 2011 by David Afshartous and Richard A. Preston, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.
Key Words: Beta coefﬁcients; Introductory statistics; Medical statistics; Misspeciﬁcation bias; Multicollinearity; Multiple regression.
Abstract

We consider the effect on estimation of simultaneous variable centering and interaction effects in linear regression. We technically define, review, and amplify many of the statistical issues for interaction models with centering in order to create a useful and compact reference for teachers, students, and applied researchers. In addition, we investigate a sequence of models that have an interaction effect and/or variable centering and derive expressions for the change in the regression coefficients between models from both an intuitive and mathematical perspective. We demonstrate how these topics may be employed to motivate discussion of other important areas, e.g., misspecification bias, multicollinearity, design of experiments, and regression surfaces. This paper presents a number of results also given elsewhere but in a form that gives a unified view of the topic. The examples cited are from the area of medical statistics.
1. Introduction

We consider the case of simultaneous variable centering and interaction effects in linear regression. The goal is to create a useful reference for teachers and students of statistics, as well as applied researchers. Thus, we technically define, review, and amplify many of the statistical issues for interaction models with centering and provide a comprehensive summary and discussion of key points. While many of the points we raise have been made elsewhere, they are somewhat scattered across a voluminous literature. The examples cited are from the area of medical statistics.
By the term variable centering we mean subtracting either the mean value or a meaningful constant from an independent variable. It is well-known that variable centering can often increase the interpretability of regression coefﬁcients as well as reduce multicollinearity between lower and higher-order predictor variables.
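Both effects of centering on the fitted line are easy to verify numerically. The following is a minimal Python sketch with simulated data; the variable names and values (weight-like X, blood-pressure-like Y) are purely illustrative, not from the paper:

```python
import numpy as np

# Simulated data: X might be a physiological variable such as weight (kg),
# Y a response such as systolic blood pressure; values are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(80, 10, size=200)
Y = 100 + 0.5 * X + rng.normal(0, 5, size=200)

def fit_line(x, y):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b * x.mean(), b

a, b = fit_line(X, Y)                 # raw predictor
a_c, b_c = fit_line(X - X.mean(), Y)  # mean-centered predictor

assert np.isclose(b, b_c)                 # slope is unchanged
assert np.isclose(a_c, a + b * X.mean())  # intercept shifts by b * mean(X)
assert np.isclose(a_c, Y.mean())          # centered intercept = mean response
```

The final assertion illustrates the interpretability gain: after mean-centering, the intercept is the predicted response for an average subject rather than for an impossible zero-weight subject.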
To discuss characteristics of interaction effects, we consider a model with two predictors and their cross-product term. For ease of illustration we assume that continuous predictors are linearly related to the dependent variable.1 Interaction effects arise when the effects of predictor variables are not additive, i.e., the effect of one predictor variable depends on the value of another variable. For example, consider Figure 1 in the context of a potassium challenge experiment2 where Y represents urinary potassium excretion, X1 represents serum potassium level, and X2 represents glomerular ﬁltration rate (GFR). As GFR is a measure of kidney function one might expect that the slope of the response Y against serum potassium level X1 would increase for higher GFR levels X2. This is often referred to as a reinforcement or synergistic interaction, whereas an offsetting or interference interaction effect occurs when the slope of the response decreases for higher GFR levels X2. Moreover, centering of GFR and serum potassium could enhance the interpretability of the regression coefﬁcients given that it is not meaningful to consider a subject with a zero value for GFR or serum potassium.
When adding centering to a model that includes an interaction term, the magnitude and standard error of certain estimated coefficients change. Indeed, as researchers often sift through several different models, many of which yield the same fitted values merely under different parameterizations, the potential for confusion is high. In this paper we attempt to provide a compact guide to help reduce such confusion. In Section 2, we provide separate overviews of variable centering and interaction effects. In Section 3, we consider simultaneous centering and interaction effects via a sequence of models. We derive expressions for the change in the regression coefficients for the new models from both an intuitive and mathematical perspective. In Section 4, we provide a list of key points to guide both teaching and applied work with interaction models and centering.3 We conclude with a brief summary in Section 5.

Figure 1. Illustration of reinforcement and interference interaction effects. In the additive model (a), the relationship between Y and X1 does not depend on the value of X2. In a reinforcement interaction effect (b), the slope between Y and X1 increases for higher X2 values, while in an interference interaction effect (c) the slope between Y and X1 decreases for higher X2 values.

1 For a discussion of relaxing the linearity assumption see Harrell (2001), p. 16.
2 Potassium challenge experiments involve the administration of a potassium load to experimental subjects in order to investigate the physiology of potassium handling.
2. Variable Centering and Interaction Effects
2.1 Variable Centering

Motivations for employing variable centering include enhanced interpretability of coefficients and reduced numerical instability for estimation associated with multicollinearity.
Consider the standard bivariate linear regression model where scalars Xi and Yi represent the predictor and response variables, respectively, for the ith observation, and scalar εi represents the corresponding random error term with the standard assumption εi ∼ N(0, σ²). Omitting the subscript without loss of generality, the “true” population model is:4

Y = α + βX + ε,    (1)

3 For assessments of the methodology to detect interaction effects in certain fields (that also attempt to identify key points) see Carte and Russell (2003); Champoux and Peters (1987).
4 Throughout the paper only scalar notation is employed. Greek letters are employed for population parameters while the corresponding English lower-case letter represents the corresponding estimator, e.g., (α, β) versus (a, b).
Similarly, we may write the corresponding centered population model as:

Y = α* + β*(X − k) + ε,    (2)

where one may consider this as the regression of Y on the transformed predictor variable X* = X − k. For instance, consider k = µX = E(X), the population mean of X.5 Although this change of location of the predictor variable shifts the 0 point to µX, changes of location to any other meaningful value k are possible as well. Since E(Y) = α* + β*(X − µX), the new intercept α* represents the expected value of Y when X = µX, i.e., the expected value of Y for the average predictor value. If the X variable is a physiological variable such as weight or blood pressure, the centered model provides a much more meaningful intercept. Since both population models must yield the same expected values for the same given X values, it follows that α* = α + βµX and β* = β. For instance, E(Y|X = µX) = α + βµX = α* and E(Y|X = 0) = α = α* − β*µX, from which both results follow. Since correlation properties between variables do not change under linear transformations, it is also intuitive that the estimated slope does not change. It also follows that centering (or any linear transformation) does not alter the coefficient of determination R² (Arnold and Evans 1979; Allison 1977).
In practice, the population parameters are unknown and must be estimated via sampled data (Xi, Yi), i = 1, ..., n, yielding the analogous equations for the estimated regression coefficients, e.g., a = a* − bX̄ and b* = b. Note that centering predictors by their sample mean also has the beneficial effect of making the estimate of the intercept independent of the estimate of the slope.6 In multiple regression, variable centering is often touted as a potential solution to reduce the numerical instability associated with multicollinearity, and a common cause of multicollinearity is a model with an interaction term X1X2 or other higher-order terms such as X² or X³. For the case of two predictor variables X1 and X2, when X1 and X2 are uncorrelated in the sample data the estimated regression coefficient b1 is the same regardless of whether X2 is included in the model or not (similarly for b2 and X1). This may be seen from the following algebraic expression for b1 in the standard multiple regression model with two predictors:

b1 = [(rY1 − rY2 r12) / (1 − r12²)] (sY / s1),    (3)

where rY1 and rY2 represent the sample correlation coefficients between Y and X1 and between Y and X2, respectively, r12 represents the sample correlation coefficient between X1 and X2,7 and sY and s1 represent the sample standard deviations of Y and X1. If the predictor variables are uncorrelated in the sample we have r12 = 0 and it immediately follows that

b1 = rY1 (sY / s1),    (4)

which by definition is the estimated slope in the bivariate regression of Y on X1 alone.

5 Asterisks are employed to denote corresponding parameters and estimators in a transformed model versus the original model, e.g., α* is the intercept in the centered model while α is the intercept in the original model.
6 This result no longer holds if one centers via k where k is not the sample mean.
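As a numerical check of this identity, the Python sketch below (simulated data; the coefficient values are our own choices) computes b1 both from the correlation expression and from a full least-squares fit:

```python
import numpy as np

# Simulated data with correlated predictors; all values are illustrative.
rng = np.random.default_rng(1)
n = 500
X2 = rng.normal(size=n)
X1 = 0.6 * X2 + rng.normal(size=n)          # predictors correlated in sample
Y = 2.0 + 1.5 * X1 - 1.0 * X2 + rng.normal(size=n)

rY1 = np.corrcoef(Y, X1)[0, 1]
rY2 = np.corrcoef(Y, X2)[0, 1]
r12 = np.corrcoef(X1, X2)[0, 1]
sY, s1 = Y.std(ddof=1), X1.std(ddof=1)

# b1 from the correlation expression ...
b1_formula = (rY1 - rY2 * r12) / (1 - r12**2) * (sY / s1)

# ... and from the least-squares fit of Y on an intercept, X1, and X2.
design = np.column_stack([np.ones(n), X1, X2])
b1_lstsq = np.linalg.lstsq(design, Y, rcond=None)[0][1]

assert np.isclose(b1_formula, b1_lstsq)  # the two computations agree
```

Setting the cross-term rY2·r12 to zero in the formula recovers the bivariate slope rY1·sY/s1, which is why the coefficient is unchanged when the predictors are uncorrelated.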
Note that predictors are often correlated, except in designed experiments where the experimenter may choose the levels of the predictor variables.
When predictor variables are perfectly correlated, infinitely many sets of estimated coefficients provide the same predicted values and fit to the data. Perfect correlation, however, is not as troublesome as near-perfect correlation. Under perfect correlation, the simple solution is to remove one of the variables, since doing so does not remove any information. On the other hand, if |Cor(X1, X2)| < 1 but close to 1, removing one of the variables entails a loss of information.
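The rank deficiency under perfect correlation can be seen directly. In the hypothetical sketch below, shifting the coefficient vector along the null space of the design matrix changes the individual coefficients but not a single fitted value:

```python
import numpy as np

# Perfectly correlated predictors: X2 is an exact multiple of X1.
rng = np.random.default_rng(2)
X1 = rng.normal(size=100)
X2 = 2.0 * X1
Y = 1.0 + 3.0 * X1 + rng.normal(size=100)

design = np.column_stack([np.ones(100), X1, X2])
coef, _, rank, _ = np.linalg.lstsq(design, Y, rcond=None)
assert rank == 2  # rank-deficient: only 2 of the 3 columns are independent

# (0, 2, -1) is in the null space, since 2*X1 - 1*(2*X1) = 0; adding it
# to the coefficients alters b1 and b2 but leaves every prediction intact.
alt = coef + np.array([0.0, 2.0, -1.0])
assert np.allclose(design @ coef, design @ alt)
```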
Practically, the estimated coefficient b1 changes depending on whether the predictor variable X2 is included in the model or not. This change may be quantified and is commonly referred to as specification bias. Specifically, if Cov(X1, X2) = σ12 and one estimates a model without X2 when the model should include X2, one may show that the resulting estimate for the regression coefficient of X1 has E(b1) = β1 + β2 σ12/σ1², i.e., the expected bias in b1 is thus β2 σ12/σ1² (Goldberger 1964). Even if both variables are included, inference becomes more difficult in the presence of inflated standard errors, i.e., estimation uncertainty, where a small change to the data can result in a large change to the estimated coefficients. The more advanced reader may find further details regarding multicollinearity in Christensen (2002).
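A small Monte Carlo simulation makes the bias formula concrete. In this Python sketch (parameters are our own choices), with β1 = 1, β2 = 2, σ12 = 0.5, and σ1² = 1, the misspecified slope should center on β1 + β2σ12/σ1² = 2 rather than on β1 = 1:

```python
import numpy as np

# Omitted-variable bias: regress Y on X1 alone when the true model has X2.
rng = np.random.default_rng(3)
beta1, beta2 = 1.0, 2.0
var1, cov12 = 1.0, 0.5
cov = np.array([[var1, cov12], [cov12, 1.0]])

slopes = []
for _ in range(2000):
    X = rng.multivariate_normal([0.0, 0.0], cov, size=200)
    Y = beta1 * X[:, 0] + beta2 * X[:, 1] + rng.normal(size=200)
    # Misspecified model: bivariate slope of Y on X1, omitting X2.
    b1 = np.cov(X[:, 0], Y, ddof=1)[0, 1] / np.var(X[:, 0], ddof=1)
    slopes.append(b1)

expected = beta1 + beta2 * cov12 / var1   # = 2.0, not beta1 = 1.0
assert abs(np.mean(slopes) - expected) < 0.05
```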
2.2 Interaction Effects

Consider multiple regression with two predictor variables. An interaction effect may be modeled by including the product term X1 × X2 as an additional variable in the regression, known as a two-way interaction term. If there are k predictor variables in the multiple regression, there are k!/(2!(k − 2)!) potential two-way interactions, and analogously for three-way and higher-order interactions. For a simple model with two-way interactions only, the population model is:

Y = β0 + β1X1 + β2X2 + β3X1X2 + ε    (6)
  = β0 + (β1 + β3X2)X1 + β2X2 + ε    (7)
  = β0 + β1X1 + (β2 + β3X1)X2 + ε    (8)

7 Note that there exists a distinction between the population correlation and the sample correlation: correlated in sample does not necessarily imply correlated in population, and vice versa.
The re-arrangement of terms in Equations 6–8 demonstrates the meaning of an interaction effect: the slope associated with X1 is no longer simply a constant β1, but rather (β1 + β3X2), which clearly depends on the value of X2, and similarly the slope associated with X2 is now (β2 + β3X1). The coefficient β1 now represents the effect of X1 on Y when X2 = 0, whereas β1 in a model without interaction represents the effect of X1 on Y for all levels of X2. The effect of X1 on Y for non-zero values of X2 is affected by the magnitude and sign of β3, e.g., if β3 < 0, the effect of X1 on Y is less for higher values of X2 and greater for smaller values of X2 (interference or offsetting interaction, Figure 1), and vice versa for β3 > 0 (synergistic or reinforcing interaction, Figure 1).
For instance, for X2 = 0, 1, 2, we have three different lines for the effect of X1 on Y:

X2 = 0:  Y = β0 + β1X1 + ε
X2 = 1:  Y = (β0 + β2) + (β1 + β3)X1 + ε
X2 = 2:  Y = (β0 + 2β2) + (β1 + 2β3)X1 + ε
and the bivariate relationship between Y and X1 depends on X2. Note that β3 in isolation lacks information about the relative strength of the interaction. For instance, β1 may be so large that even for a seemingly large β3 there is not a substantial impact over the range of X2 values considered.
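The dependence of the conditional slope on X2 is easy to recover from a fitted model. Below is a Python sketch with simulated data; the true coefficients (0.5 for β1, 0.8 for β3) are arbitrary illustrative choices:

```python
import numpy as np

# Fit Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 and read off the conditional
# slope of X1 at each level of X2, namely b1 + b3*X2.
rng = np.random.default_rng(4)
n = 400
X1 = rng.uniform(0, 10, n)
X2 = rng.choice([0.0, 1.0, 2.0], n)
Y = 1.0 + 0.5 * X1 + 2.0 * X2 + 0.8 * X1 * X2 + rng.normal(0, 0.5, n)

design = np.column_stack([np.ones(n), X1, X2, X1 * X2])
b0, b1, b2, b3 = np.linalg.lstsq(design, Y, rcond=None)[0]

for x2 in (0.0, 1.0, 2.0):
    # three different lines for the effect of X1 on Y, one per X2 level
    print(f"X2 = {x2}: intercept {b0 + b2 * x2:.2f}, slope {b1 + b3 * x2:.2f}")
```

The printed slopes increase with X2 because the simulated β3 is positive, i.e., a reinforcing interaction in the sense of Figure 1.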
Interaction effects are sometimes called joint effects, where the focus (instead of the conditional focus above) is more on how the two variables interact in accounting for the variance in Y over and above the contributions of the individual additive effects. Indeed, the interaction term does not assess the combined effect, e.g., a positive interaction coefficient β3 > 0 only provides slope-change information: higher values of X2 correspond to a greater slope between Y and X1. On the other hand, β3 > 0 provides no information whatsoever regarding whether Y achieves its highest values for the highest values of X1 and X2 (Hartmann and Moers 1999). For example, in Figure 2a and Figure 2b the sign and magnitude of the interaction coefficient β3 are the same. However, for the range of X1 shown in Figure 2a, Y is higher when both predictors are high, while in Figure 2b we have Y higher when X1 is high and X2 is low.8

Figure 2. Interaction coefficient does not provide information with respect to where the dependent variable is higher. In both a) and b), the sign and magnitude of the interaction is the same. In a), Y is higher when both predictors are high, while in b) Y is higher when X1 is high and X2 is low.
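This point can be checked numerically. In the sketch below (the coefficient values are invented to mimic the contrast between panels a and b), both response surfaces share the same positive interaction coefficient β3 = 0.5, yet the response is highest in different corners of the predictor range:

```python
import numpy as np

def surface(b0, b1, b2, b3, x1, x2):
    """Expected response b0 + b1*x1 + b2*x2 + b3*x1*x2 (no error term)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1 = np.linspace(0, 10, 101)

# (a) modest additive effects: Y is highest when X1 and X2 are both high.
ya_low, ya_high = (surface(0, 1.0, 1.0, 0.5, x1, x2) for x2 in (0.0, 1.0))
assert ya_high.max() > ya_low.max()

# (b) same beta3 = 0.5, but a strong negative X2 effect: over this range
# of X1, Y is highest when X1 is high and X2 is low.
yb_low, yb_high = (surface(0, 1.0, -20.0, 0.5, x1, x2) for x2 in (0.0, 1.0))
assert yb_low.max() > yb_high.max()
```

The interaction coefficient is identical in both cases; only the other coefficients, together with the predictor ranges, determine where the surface peaks.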