econstor

www.econstor.eu

The Open Access Publication Server of the ZBW – Leibniz Information Centre for Economics

Buchen, Teresa; Wohlrabe, Klaus

Conference Paper

Assessing the Macroeconomic Forecasting Performance of Boosting: Evidence for the United States, the Euro Area, and Germany

Beiträge zur Jahrestagung des Vereins für Socialpolitik 2014: Evidenzbasierte Wirtschaftspolitik - Session: Macroeconomic Forecasts, No. E14-V1

**Provided in Cooperation with:**

Verein für Socialpolitik / German Economic Association

**Suggested Citation:** Buchen, Teresa; Wohlrabe, Klaus (2014) : Assessing the Macroeconomic Forecasting Performance of Boosting: Evidence for the United States, the Euro Area, and Germany, Beiträge zur Jahrestagung des Vereins für Socialpolitik 2014: Evidenzbasierte Wirtschaftspolitik - Session: Macroeconomic Forecasts, No. E14-V1

**This Version is available at:**

http://hdl.handle.net/10419/100626


Assessing the Macroeconomic Forecasting Performance of Boosting: Evidence for the United States, the Euro Area, and Germany

Teresa Buchen∗ and Klaus Wohlrabe†

The use of large datasets for macroeconomic forecasting has received a great deal of interest recently. Boosting is one possible method of using high-dimensional data for this purpose. It is a stagewise additive modelling procedure, which, in a linear specification, becomes a variable selection device that iteratively adds the predictors with the largest contribution to the fit. Using data for the United States, the euro area and Germany, we assess the performance of boosting when forecasting a wide range of macroeconomic variables.

Moreover, we analyse to what extent its forecasting accuracy depends on the method used for determining its key regularisation parameter, the number of iterations. We find that boosting mostly outperforms the autoregressive benchmark, and that K-fold cross-validation works much better as a stopping criterion than the commonly used information criteria.

Keywords: Macroeconomic forecasting, component-wise boosting, large datasets, variable selection, model selection criteria.

∗ Ifo Institute. E-mail address: buchen@ifo.de.

† Corresponding author. Mailing address: Ifo Institute, Poschingerstr. 5, 81679 Munich, Germany. E-mail address: wohlrabe@ifo.de. Tel.: +49(0)89/9224-1229. Fax: +49(0)89/9224-1463.

## 1 Introduction

There has been a recent upswing of interest in using large datasets for macroeconomic forecasting. An increasing number of time series describing the state of the economy are available that could be useful for forecasting. Computational power to handle immense amounts of data has also risen steadily over time. Thus, researchers now attempt to improve their forecasting models by exploiting a broader information base.

Conventional econometric methods are not well suited to incorporating a large number of predictors; depending on the number of time-series observations, it is either impossible or inefficient to estimate the respective forecasting model. To overcome these problems without losing relevant information, new forecasting methods were developed. Eklund and Kapetanios (2008) classify the methods for forecasting a time series into three broad, partly overlapping, categories. The first group includes methods that use the whole dataset for forecasting, such as Bayesian regression and factor methods. The second group consists of forecast combination methods that use subsets of the data to produce multiple forecasts, which are then averaged. Component-wise boosting belongs to the third category, which comprises variable selection methods (LASSO and least angle regression are other examples) that also use subsets of the data but produce only one forecast based on the optimal set of variables. More specifically, component-wise boosting is a stage-wise additive modelling procedure that sequentially adds the predictor with the largest contribution to the fit without adjusting the previously entered coefficients.

Boosting has attracted much attention in machine learning and statistics because it can handle large datasets in a computationally efficient manner and because it has proven excellent prediction performance in a wide range of applications (Bühlmann and Hothorn, 2010). However, only recently has the method found its way into the macroeconometric literature. Apart from several financial applications (Audrino and Barone-Adesi, 2005; Gavrishchaka, 2006; Audrino and Trojani, 2007; Andrada-Félix and Fernández-Rodríguez, 2008), there are only a few macroeconometric studies on the forecasting performance of boosting (Bai and Ng, 2009; Shafik and Tutz, 2009; Buchen and Wohlrabe, 2011; Robinzonov, Tutz, and Hothorn, 2012; Kim and Swanson, 2014). Results with respect to the predictive accuracy of boosting are promising. However, most of these studies are confined to U.S. data and use only a few target variables.1 We add to this literature by analysing the performance of boosting when forecasting a wide range of macroeconomic variables using three datasets for the United States, the euro area, and Germany. Moreover, we investigate to what extent the forecasting performance of boosting depends on the specification of the boosting algorithm concerning the stopping criterion for the number of iterations.

Careful choice of the stopping criterion of boosting is crucial, since the number of iterations M is the key parameter regularising the tradeoff between bias and variance, on which the forecasting performance hinges. Small values of M yield a parsimonious model with a potentially large bias. The larger M becomes, the more one approaches a perfect fit, increasing the variance of the forecasting model. There are several methods for estimating the optimal number of iterations. The information criteria proposed by Bühlmann (2006) are widespread because they are computationally attractive,2 but they tend to lead to overfitting (Hastie, 2007). Alternatively, resampling methods, such as K-fold cross-validation, can be applied. We evaluate whether the various stopping criteria result in relevant differences in the predictive performance of boosting when forecasting macroeconomic aggregates.
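To make the K-fold cross-validation stopping rule concrete, the sketch below runs a simple component-wise L2-boosting path on each training split, records the held-out squared error at every iteration, and returns the iteration count with the lowest average error. This is a stylised pure-Python illustration under our own simplifying choices (interleaved folds, a one-variable least-squares learner, hypothetical function names); it is not the paper's implementation.

```python
import random

def boost_path(X_tr, y_tr, X_te, M, nu):
    """Run component-wise L2-boosting on the training data and return, for
    each iteration m = 1..M, a snapshot of the predictions for X_te."""
    n, p = len(X_tr), len(X_tr[0])
    mean_y = sum(y_tr) / n
    fit = [mean_y] * n                     # in-sample model values
    pred = [mean_y] * len(X_te)            # held-out predictions
    path = []
    for _ in range(M):
        resid = [y_tr[t] - fit[t] for t in range(n)]
        best = None                        # (k, beta, ssr) of the best learner
        for k in range(p):
            xk = [row[k] for row in X_tr]
            sxx = sum(v * v for v in xk) or 1e-12
            beta = sum(xk[t] * resid[t] for t in range(n)) / sxx
            ssr = sum((resid[t] - beta * xk[t]) ** 2 for t in range(n))
            if best is None or ssr < best[2]:
                best = (k, beta, ssr)
        k, beta, _ = best
        for t in range(n):                 # shrunken update, train and test
            fit[t] += nu * beta * X_tr[t][k]
        for i in range(len(X_te)):
            pred[i] += nu * beta * X_te[i][k]
        path.append(list(pred))
    return path

def kfold_cv_stop(X, y, M=100, nu=0.1, K=5):
    """Choose the stopping iteration M* minimising the average held-out
    squared error across K folds (interleaved folds for simplicity)."""
    n = len(X)
    folds = [list(range(i, n, K)) for i in range(K)]
    cv_err = [0.0] * M
    for fold in folds:
        tr = [t for t in range(n) if t not in fold]
        path = boost_path([X[t] for t in tr], [y[t] for t in tr],
                          [X[t] for t in fold], M, nu)
        for m in range(M):
            cv_err[m] += sum((y[t] - path[m][i]) ** 2
                             for i, t in enumerate(fold)) / len(fold)
    return min(range(M), key=lambda m: cv_err[m]) + 1   # 1-based M*
```

For actual time series, the fold design matters: interleaved folds ignore serial dependence, and blocked or rolling-origin schemes are common alternatives.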

The remainder of this paper is organised as follows. Section 2 explains the boosting algorithm, especially how it handles the tradeoff between bias and variance. Section 3 sums up our empirical analysis. Section 4 concludes.

1 An exception is Carriero, Kapetanios, and Marcellino (2011), who compare different methods that can be used in a VAR framework for forecasting the whole dataset consisting of 52 macroeconomic variables, including several reduced-rank models, factor models, Bayesian VAR models, and multivariate boosting. The latter is an extension of the standard boosting method developed by Lutz and Bühlmann (2006), where the predictors are selected according to a multivariate measure of fit. The results indicate that the forecasting performance of multivariate boosting is somewhat worse than that of the standard boosting approach.

2 They are used, for instance, by Bai and Ng (2009), Shafik and Tutz (2009), and Kim and Swanson (2014).

## 2 The Boosting Algorithm

Boosting was originally designed as a classification scheme (Freund and Schapire, 1995, 1996) and later extended to regression problems (Friedman, Hastie, and Tibshirani, 2000; Friedman, 2001).3 It is based on the machine learning idea, meaning that it is a computer programme that "learns from the data" (Hastie, Tibshirani, and Friedman, 2009). Instead of estimating a "true" model, as is traditionally done in statistics and econometrics, it starts with a simple model that is iteratively improved or "boosted" based on its performance on training data. As Bühlmann and Yu (2003) put it, "for large dataset problems with high-dimensional predictors, a good model for the problem is hard to come by, but a sensible procedure is not."

The boosting estimate takes the form of an additive expansion of simple functions,

$$\hat{f}_M(x_t) = \sum_{m=1}^{M} b(x_t; \hat{\beta}_m),$$

where $m = 1, 2, \ldots, M$ denote the iteration steps, $y_t$ is the dependent variable, and $b(x_t; \hat{\beta}_m)$ is called the learner, which is a simple function of the input vector $x_t$ depending on the parameter vector $\hat{\beta}_m$. The fitting method used to determine $b(x_t; \hat{\beta}_m)$ is also part of the learner.

More specifically, boosting performs forward stage-wise modelling: it starts with the intercept and in each iteration m adds to the model the learner that most improves the fit, without modifying the parameters of those previously entered. The learners are selected according to a loss function $L(y_t, \hat{f}_m(x_t))$, given the current model $\hat{f}_{m-1}(x_t)$. Since in each iteration only the parameters of the last learner need to be estimated, the algorithm is computationally feasible even for high-dimensional data. Generally, a forward stage-wise modelling procedure can be summarised as follows.

Note that in a time-series context the predictor vector $x_t$ contains $p$ lags of the target variable $y_t$ as well as $p$ lags of the exogenous variables $z_{j,t}$, where $j = 1, \ldots, N$:

$$x_t = (y_{t-1}, y_{t-2}, \ldots, y_{t-p}, z_{1,t-1}, z_{1,t-2}, \ldots, z_{1,t-p}, \ldots, z_{N,t-1}, z_{N,t-2}, \ldots, z_{N,t-p}).$$

(The loss function is scaled by the factor 1/2 in order to ensure a convenient representation of the first derivative.)

Thus, component-wise boosting simultaneously selects variables and lags.
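The stacked-lag structure of $x_t$ can be made concrete with a small helper that builds the design matrix from the target series and the exogenous series. This is an illustrative sketch; the function and argument names are hypothetical, not taken from the paper.

```python
def build_lagged_design(y, Z, p):
    """Build rows x_t = (y_{t-1},...,y_{t-p}, z_{1,t-1},...,z_{N,t-p})
    together with the aligned targets y_t. Z is a list of N exogenous
    series; rows start at t = p so that every lag exists."""
    n = len(y)
    X, target = [], []
    for t in range(p, n):
        row = [y[t - l] for l in range(1, p + 1)]      # p lags of the target
        for z in Z:                                    # p lags of each z_j
            row += [z[t - l] for l in range(1, p + 1)]
        X.append(row)
        target.append(y[t])
    return X, target
```

Each row then has $p(1+N)$ entries, matching the pool of candidate predictors from which boosting selects.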

From all potential predictor variables $x_{k,t}$, where $k = 1, \ldots, p(1+N)$, it selects in every iteration $m$ one variable $x_{k_m^*,t}$ (not necessarily a different one in each iteration) which yields the smallest sum of squared residuals (SSR).

The algorithm for component-wise boosting with L2-loss can be summarised as follows.
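A minimal pure-Python sketch of the procedure just described: start from the intercept, fit every single-predictor least-squares learner to the current residuals, and add the winner, shrunk by a factor nu, to the ensemble. The shrinkage parameter is included here because it is standard practice; names and defaults are our own illustrative choices, not the authors' code.

```python
import random

def componentwise_l2_boost(X, y, M=300, nu=0.1):
    """Component-wise L2-boosting sketch: at each of M iterations, pick the
    single predictor whose OLS fit to the residuals gives the smallest SSR,
    and add nu times that learner to the model."""
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    fit = [intercept] * n                  # current model values f_m(x_t)
    coef = [0.0] * p                       # accumulated coefficient per predictor
    for _ in range(M):
        resid = [y[t] - fit[t] for t in range(n)]
        best_k, best_beta, best_ssr = None, 0.0, float("inf")
        for k in range(p):
            xk = [row[k] for row in X]
            sxx = sum(v * v for v in xk)
            if sxx == 0.0:
                continue
            beta = sum(xk[t] * resid[t] for t in range(n)) / sxx  # OLS slope
            ssr = sum((resid[t] - beta * xk[t]) ** 2 for t in range(n))
            if ssr < best_ssr:
                best_k, best_beta, best_ssr = k, beta, ssr
        if best_k is None:
            break
        coef[best_k] += nu * best_beta     # shrunken update of the winner
        for t in range(n):
            fit[t] += nu * best_beta * X[t][best_k]
    return intercept, coef
```

On simulated data with a single relevant predictor, the accumulated coefficient for that predictor approaches its least-squares value while the coefficients of the irrelevant predictors stay comparatively small.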

The parameter ν was introduced by Friedman (2001), who showed that the prediction performance of boosting is improved when the learner is shrunk toward zero. The final function estimate is then the sum of the M learners multiplied by the shrinkage parameter ν:

$$\hat{f}_M(x_t) = \hat{f}_0 + \nu \sum_{m=1}^{M} b(x_t; \hat{\beta}_m).$$

### 2.3 Controlling the Bias-Variance Tradeoff

Both the number of iterations M and the shrinkage parameter ν regulate the tradeoff between bias and variance that emerges when fitting a model and that influences its forecasting performance. Suppose the data arise from the true but unknown model $Y = f(X) + \varepsilon$, where Y is a random target variable and X is the vector of random predictors. Under the assumption that the error has $\mathrm{E}(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$, we can derive the expected forecast error $\mathrm{Err}(x_t)$ at an arbitrary predictor vector $x_t$ of a forecasting model $\hat{f}(x_t)$, using squared error loss:

$$\mathrm{Err}(x_t) = \sigma_\varepsilon^2 + \left[\mathrm{E}\,\hat{f}(x_t) - f(x_t)\right]^2 + \mathrm{E}\left[\hat{f}(x_t) - \mathrm{E}\,\hat{f}(x_t)\right]^2.$$

The first term of this decomposition of the expected forecast error is the noise, that is, the variance of the target series around its true mean $f(x_t) = \mathrm{E}(Y \mid X = x_t)$. It is irreducible, even if we knew the true model. The second term is the squared bias, the amount by which the average model estimate differs from the true mean. In contrast to a simple OLS regression, where one assumes that the true model is known, so that $\mathrm{E}[\hat{f}(x_t)] = f(x_t)$, this term is not zero but depends on the model complexity. Typically, it will be larger if the model is not complex enough, so that important variables are omitted. The third term is the variance of the forecasting model, the expected squared deviation of $\hat{f}(x_t)$ around its mean. This term increases with model complexity. If we fit the training data harder, the model will generalise less well to unseen data and the forecasting performance deteriorates. Thus, the model must be chosen such that bias and variance are balanced to minimise the expected forecast error (Hastie, Tibshirani, and Friedman, 2009).
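The decomposition can be checked numerically in a deliberately simple setting: let the forecast be the mean of n noisy training draws of Y at a fixed x, so the bias term is zero and the model variance is $\sigma_\varepsilon^2/n$, giving an expected forecast error of $\sigma_\varepsilon^2(1 + 1/n)$. This toy Monte-Carlo sketch is our own illustration, not part of the paper's analysis.

```python
import random

def expected_forecast_error(n_train=20, sigma=1.0, reps=20000, seed=0):
    """Monte-Carlo estimate of Err(x) for the estimator fhat = mean of
    n_train noisy draws of Y at a fixed x. Here bias = 0 and model
    variance = sigma^2 / n_train, so Err should be close to
    sigma^2 * (1 + 1/n_train)."""
    rng = random.Random(seed)
    f_x = 3.0                        # true conditional mean f(x)
    sq_err = 0.0
    for _ in range(reps):
        # fit on a fresh training sample, then forecast an unseen draw
        fhat = sum(f_x + rng.gauss(0, sigma) for _ in range(n_train)) / n_train
        y_new = f_x + rng.gauss(0, sigma)
        sq_err += (y_new - fhat) ** 2
    return sq_err / reps
```

With sigma = 1 and n_train = 20 the theoretical value is 1.05, and the simulation reproduces the noise-plus-variance sum to within Monte-Carlo error.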

One way of avoiding overfitting with boosting is to employ a weak learner, that is, one that involves few parameters and has low variance relative to bias (Bühlmann and Yu, 2003). This can be achieved, for instance, by shrinking the learner toward zero, because doing so reduces its variance. The other way of controlling the bias-variance tradeoff is to restrict the number of boosting iterations. The shrinkage parameter ν and the number of iterations M are connected; the smaller ν, the more iterations are needed to achieve a given prediction error (Hastie, Tibshirani, and Friedman, 2009). Empirical work finds that the exact size of the shrinkage parameter is of minor importance as long as it is "sufficiently small", i.e., 0 < ν ≤ 0.1 (Friedman, 2001; Bühlmann and Hothorn, 2007a; Hastie, Tibshirani, and Friedman, 2009). Thus, the optimal number of iterations $M^*$ is the main regularisation parameter of boosting.
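The link between ν and M admits a closed form in the stylised one-predictor case: each boosting step fits a fraction ν of the remaining residual slope, so after M steps the accumulated coefficient equals $1 - (1-\nu)^M$ times the OLS coefficient, and reaching a share q of the full fit takes $M = \log(1-q)/\log(1-\nu)$ iterations. The helper below is a back-of-the-envelope calculation under that simplifying assumption.

```python
import math

def iterations_to_reach(q=0.99, nu=0.1):
    """Iterations needed for one-predictor L2-boosting to fit the share q
    of the OLS coefficient, from 1 - (1 - nu)^M = q."""
    return math.ceil(math.log(1 - q) / math.log(1 - nu))
```

For example, reaching 99% of the full fit takes 44 iterations with ν = 0.1 but several hundred with ν = 0.01, which is exactly the inverse relationship between ν and M described above.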