Working Paper 11-34 Departamento de Estadística

Statistics and Econometrics Series 026 Universidad Carlos III de Madrid

October 2011

Calle Madrid, 126

28903 Getafe (Spain)

Fax (34) 91 624-98-49

BOOTSTRAP FORECAST OF MULTIVARIATE VAR MODELS WITHOUT

USING THE BACKWARD REPRESENTATION

Lorenzo Pascual♦, Esther Ruiz♠ and Diego Fresoli♣

Abstract

In this paper, we show how to simplify the construction of bootstrap prediction densities in

multivariate VAR models by avoiding the backward representation. Bootstrap prediction

densities are attractive because they incorporate the parameter uncertainty and do not rely on

any particular assumption about the error distribution. What is more, the construction of

densities for more than one-step-ahead is possible even in situations when these densities are

unknown asymptotically. The main advantage of the new procedure is that it is computationally

simple without losing the good performance of bootstrap procedures. Furthermore, by avoiding

a backward representation, its asymptotic validity can be proved without relying on the

assumption of Gaussian errors as needed by alternative procedures. Finally, the new procedure

proposed in this paper can be implemented to obtain prediction densities in models without a

backward representation as, for example, models with MA components or GARCH

disturbances. By comparing the finite sample performance of the proposed procedure with those

of alternatives, we show that nothing is lost when using it. Finally, we implement the procedure

to obtain prediction regions for US quarterly future inflation, unemployment and GDP growth.

Keywords: Non-Gaussian VAR models, Prediction cubes, Prediction density, Prediction

regions, Prediction ellipsoids, Resampling methods.

♦

EDP-Energías de Portugal, S.A., Unidade de Negócio de Gestão da Energía, Director Adjunto.

♠ Corresponding author: Dpt. Estadística and Instituto Flores de Lemus, Universidad Carlos III de Madrid, C/ Madrid

126, 28903 Getafe, Spain. Tel: 34 916249851, Fax: 34 916249849, e-mail: ortega@est-econ.uc3m.es.

♣Dpt. de Estadística, Universidad Carlos III de Madrid, C/ Madrid 126, 28903 Getafe (Madrid), Spain, e-mail:

dfresoli@est-econ.uc3m.es.

Acknowledgements. The last two authors are grateful for financial support from project ECO2009-08100 of the Spanish

Government. We are also grateful to Gloria González-Rivera for her useful comments. The usual disclaimers apply.

1 Introduction

Bootstrap procedures are known to be useful when forecasting time series because they allow

the construction of prediction densities without imposing particular assumptions on the error

distribution and simultaneously incorporating the parameter uncertainty. Note that when the

errors are non-Gaussian the prediction densities are usually unknown when the prediction hori-

zon is larger than one-step-ahead. However, the bootstrap can be implemented in these cases

to obtain the corresponding prediction densities. They are also attractive because of their computational simplicity and wide applicability. However, these advantages are limited by the use of

the backward representation that many authors advocate after the seminal paper of Thombs and

Schuchany (1990). In particular, Kim (1999) extends the procedure of Thombs and Schuchany

(1990) to stationary VAR(p) models. Later, Kim (2001, 2004) considers bias-corrected prediction regions by employing a bootstrap-after-bootstrap approach. On the other hand, Grigoletto

(2005) proposes two further alternative procedures based on Kim (1999) that take into account

not only the uncertainty due to parameter estimation but also the uncertainty attributable to

model specification. In any case, the bootstrap procedures conceived by Kim (1999, 2001, 2004)

and Grigoletto (2005) use the backward representation to generate the bootstrap samples used to

obtain replicates of the estimated parameters. Using the backward representation has three main

drawbacks. First, the resulting procedure is computationally complicated and time-consuming.

Second, given that the backward residuals are not independent it is necessary to use the relation-

ship between the backward and forward representation of the model in order to resample from

independent residuals; see Kim (1997, 1998) for this relationship, which can be rather complicated

for high order models. Consequently, Kim (1999) resamples directly from the dependent back-

ward residuals. However, the asymptotic validity of the bootstrap resampling can only be proved

by imposing that these residuals are i.i.d. and, as a result, it requires assuming Gaussian errors. Finally, these bootstrap

alternatives can only be applied to models with a backward representation which excludes their

implementation in, for example, multivariate models with Moving Average (MA) components or

with GARCH disturbances.

In a univariate framework, Pascual et al. (2004a) show that the backward representation can be avoided without losing the good properties of the bootstrap prediction densities. When

dealing with multivariate systems it is even more important to avoid the backward representation given its greater complexity; see Kim (1997, 1998). In this paper, we propose an extension of the

bootstrap procedure proposed by Pascual et al. (2004a) for univariate ARIMA models to obtain

joint prediction densities for multivariate VAR(p) models avoiding the backward representation

and, consequently, overcoming its limitations. We prove the asymptotic validity of the proposed

procedure without relying on particular assumptions about the prediction error distribution.

We focus on the construction of multivariate prediction densities from which it is possible to

obtain marginal prediction intervals for each of the variables in the system and joint prediction

regions^1 for two or more variables within the system. Monte Carlo experiments are carried out

to study the finite sample performance of the marginal prediction intervals obtained by the new bootstrap procedure and compare them with those of alternative procedures available in the literature. We

also compare their corresponding elliptical and Bonferroni regions. We show that, although the

bootstrap procedure proposed in this paper is computationally simpler, its finite sample properties are similar to those of previous, more complicated bootstrap approaches and clearly better than

those of the standard and asymptotic prediction densities. We also show that when the errors

are non-Gaussian, the bootstrap elliptical regions are inappropriate with the Bonferroni regions

having better properties. The procedures are illustrated with an empirical application which

consists of predicting future inflation, unemployment and growth rates in the US.

The rest of the paper is organized as follows. Section 2 describes the asymptotic and bootstrap

prediction intervals and regions previously available in the literature. In Section 3, we propose a

new bootstrap procedure, derive its asymptotic distribution and analyze its performance in finite

samples. We compare the new bootstrap densities and corresponding prediction intervals and

regions with the standard, asymptotic and alternative bootstrap procedures. Section 4 illustrates

the results with an empirical application. Finally, Section 5 concludes the paper with suggestions

for further research.

2 Asymptotic and bootstrap prediction intervals and regions for VAR models

In this section, we describe the construction of prediction regions in stationary VAR models based

on assuming known parameters and Gaussian errors. We also describe how the parameter uncer-

tainty can be incorporated by using asymptotic and bootstrap approximations of the finite sample

distribution of the parameter estimator. The bootstrap procedures can also be implemented to

deal with non-Gaussian errors.

^1 Previous papers focus on the Bonferroni regions and not on the bootstrap densities themselves.

2.1 Asymptotic prediction intervals and regions

Consider the following multivariate VAR(p) model

\Phi(L) Y_t = \mu + a_t, \qquad (1)

where Y_t is the N x 1 vector of observations at time t, \mu is an N x 1 vector of constants and \Phi(L) = I_N - \Phi_1 L - \ldots - \Phi_p L^p, with L being the lag operator and I_N the N x N identity matrix. The N x N parameter matrices \Phi_i, i = 1, \ldots, p, satisfy the stationarity restriction. Finally, a_t is a sequence of N x 1 independent white noise vectors with nonsingular contemporaneous covariance matrix given by \Sigma_a.

It is well known that if a_t is an independent vector white noise sequence then the point predictor of Y_{T+h} that minimizes the Mean Square Error (MSE) is its conditional mean, which depends on the model parameters. In practice, these parameters are unknown and the predictor of Y_{T+h} is obtained with the parameters substituted by consistent estimates as follows

\widehat{Y}_{T+h|T} = \widehat{\mu} + \widehat{\Phi}_1 \widehat{Y}_{T+h-1|T} + \ldots + \widehat{\Phi}_p \widehat{Y}_{T+h-p|T}, \qquad (2)

where \widehat{Y}_{T+j|T} = Y_{T+j} for j = 0, -1, \ldots. Furthermore, the MSE of \widehat{Y}_{T+h|T} is usually estimated as follows

\widehat{\Sigma}_{\widehat{Y}}(h) = \sum_{j=0}^{h-1} \widehat{\Psi}_j \widehat{\Sigma}_a \widehat{\Psi}_j', \qquad (3)

where \widehat{\Psi}_j are the estimated matrices of the MA representation of Y_t and \widehat{\Sigma}_a = \frac{1}{T-Np-1} \sum_{t=1}^{T} \widehat{a}_t \widehat{a}_t' with \widehat{a} = (\widehat{a}_1, \ldots, \widehat{a}_T) and

\widehat{a}_t = Y_t - \widehat{\mu} - \widehat{\Phi}_1 Y_{t-1} - \ldots - \widehat{\Phi}_p Y_{t-p}. \qquad (4)
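The LS estimation behind (2) and (4) can be sketched in a few lines. The sketch below is illustrative, not the paper's code; the function names `fit_var_ls` and `forecast_var` are our own.

```python
import numpy as np

def fit_var_ls(Y, p):
    """LS estimation of a VAR(p) with intercept: Y_t = mu + Phi_1 Y_{t-1} + ... + Phi_p Y_{t-p} + a_t.
    Y is a (T, N) array; returns (mu, [Phi_1, ..., Phi_p], residuals as in equation (4))."""
    T, N = Y.shape
    # Regressor matrix Z: a constant plus p lags of Y, one row per observation t = p, ..., T-1.
    Z = np.column_stack([np.ones(T - p)] + [Y[p - i:T - i] for i in range(1, p + 1)])
    B = np.linalg.lstsq(Z, Y[p:], rcond=None)[0]   # (1 + Np, N) coefficient matrix
    mu = B[0]
    Phis = [B[1 + i * N:1 + (i + 1) * N].T for i in range(p)]
    resid = Y[p:] - Z @ B                          # hat a_t, equation (4)
    return mu, Phis, resid

def forecast_var(mu, Phis, Y, h):
    """Recursive h-step-ahead point forecasts, equation (2): forecasts feed later steps."""
    hist = list(Y[-len(Phis):])                    # last p observations, oldest first
    out = []
    for _ in range(h):
        y = mu + sum(P @ hist[-i] for i, P in enumerate(Phis, start=1))
        hist.append(y)
        out.append(y)
    return np.array(out)
```

The recursion replaces unobserved future values with their own forecasts, exactly as the convention \widehat{Y}_{T+j|T} = Y_{T+j} for j \leq 0 requires.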

If a_t is further assumed to be Gaussian, then the marginal prediction density of the nth variable in the system is also Gaussian and the standard practice is to construct the (1-\alpha)100% prediction interval for the nth variable in the system as follows

GI_{T+h} = \{ y_{n,T+h} \mid y_{n,T+h} \in [\, \widehat{y}_{n,T+h|T} \pm z_{\alpha/2} \widehat{\sigma}_{n,h} \,] \}, \qquad (5)

where \widehat{y}_{n,T+h|T} is the nth component of \widehat{Y}_{T+h|T}, \widehat{\sigma}_{n,h} is the square root of the nth diagonal element of \widehat{\Sigma}_{\widehat{Y}}(h) and z_{\alpha} is the \alpha-quantile of the standard Gaussian distribution. The Gaussianity of the forecast errors can also be used to obtain the following (1-\alpha)100% joint ellipsoid for all the variables in the system^2

GE_{T+h} = \{ Y_{T+h} \mid [Y_{T+h} - \widehat{Y}_{T+h|T}]' \widehat{\Sigma}_{\widehat{Y}}(h)^{-1} [Y_{T+h} - \widehat{Y}_{T+h|T}] < \chi^2_{\alpha}(N) \}, \qquad (6)

where \chi^2_{\alpha}(N) is the \alpha-quantile of the \chi^2 distribution with N degrees of freedom. Constructing the ellipsoids in (6) can be quite demanding when N is larger than two or three. Consequently, Lütkepohl (1991) proposes using the Bonferroni method to construct the following prediction cubes with coverage at least (1-\alpha)100%

GC_{T+h} = \{ Y_{T+h} \mid Y_{T+h} \in \times_{n=1}^{N} [\, \widehat{y}_{n,T+h|T} \pm z_{\alpha^*} \widehat{\sigma}_{n,h} \,] \}, \qquad (7)

where \alpha^* = 0.5(\alpha/N). However, note that the prediction intervals and regions above have

two main drawbacks. First, they are constructed using the MSE in (3), which does not incorporate the parameter uncertainty. As a consequence, if the sample size is small, the uncertainty associated with \widehat{Y}_{T+h|T} is underestimated and the corresponding intervals and regions will have lower coverages than the nominal. The second problem is related to the Gaussianity assumption. When this assumption does not hold, the quadratic form in (6) is not adequate, nor is the width of the intervals in (5) and (7). Moreover, when the prediction errors are not Gaussian, the shape of their densities for h \geq 2 is in general unknown.
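Given a point forecast and an MSE matrix, the Gaussian interval (5) and the Bonferroni cube (7) reduce to a few lines. This is a minimal illustrative sketch (function names are ours); it uses the standard library `NormalDist` for the Gaussian quantile.

```python
import numpy as np
from statistics import NormalDist

def gaussian_interval(y_hat, mse, n, alpha=0.05):
    """Marginal (1-alpha)100% interval (5) for variable n: forecast +/- z_{alpha/2} * sigma_n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s = np.sqrt(mse[n, n])
    return y_hat[n] - z * s, y_hat[n] + z * s

def bonferroni_cube(y_hat, mse, alpha=0.05):
    """Prediction cube (7): each marginal interval uses alpha* = 0.5*(alpha/N),
    so the joint coverage is at least (1-alpha)100% by the Bonferroni inequality."""
    N = len(y_hat)
    z = NormalDist().inv_cdf(1 - 0.5 * alpha / N)
    s = np.sqrt(np.diag(mse))
    return np.column_stack([y_hat - z * s, y_hat + z * s])
```

Because the Bonferroni quantile is taken further in the tail, each side of the cube is strictly wider than the corresponding unadjusted marginal interval.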

As an illustration, consider the following VAR(2) bivariate model

\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 0.9 & 0 \\ 0.4 & 0 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} 0.5 & 0.7 \\ 0.8 & 0.1 \end{bmatrix} \begin{bmatrix} y_{1,t-2} \\ y_{2,t-2} \end{bmatrix} + \begin{bmatrix} a_{1,t} \\ a_{2,t} \end{bmatrix}, \qquad (8)

where a_t = (a_{1,t}, a_{2,t})' is an independent white noise vector with contemporaneous covariance matrix given by vec(\Sigma_a) = (1, 0.8, 0.8, 1)', where vec denotes the column stacking operator. The distribution of a_t is a \chi^2(4). Panel (a) of Figures 1 and 2 display the joint one-step-ahead and eight-steps-ahead densities of y_{1,t} and y_{2,t}, respectively, which have clear asymmetries, although more pronounced in the former. After generating a time series of size T = 100, the VAR(2)

parameters are estimated by Least Squares (LS). Panel (b) of Figures 1 and 2 plot kernel estimates

of the joint density obtained as usual after assuming that the prediction errors are jointly Gaussian

2The same argument can be applied when the interest lies on only a subset of components of Y .t

For example, if the focus is on the rst J components of Y , the prediction ellipsoid is given byt h i h i0 1

0 0 2b b bfY j Y Y C C (h)C C Y Y < (J)g, where C = [I 0].T+h T+h b T+h JT+hjT T+hjT Y

with zero mean and covariance matrix given by (3).^3 Comparing panels (a) and (b), it is obvious

that the Gaussian approach fails to capture the asymmetry of the error distribution. It is usual in

practice to construct prediction regions for y_{1,t} and y_{2,t}. From the joint Gaussian density plotted

in panel (b) of Figure 1, it is possible to obtain the corresponding 95% one-step-ahead ellipsoids

and Bonferroni regions. They are shown in Figure 3 together with a realization of Y_{T+1}. We can

observe that the shape of both regions is not appropriate to construct a satisfactory prediction region for Y_{T+1}. Finally, as we may also be interested in forecasting only one variable in

the system, Figure 4 displays the true marginal one-step-ahead density of y_{1,t} together with the

Gaussian approximation. Once more, it is clear that the Gaussian approach fails to capture the

skewness of the prediction density.
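A series with these features can be generated by a short simulation. The sketch below is illustrative only: the paper does not spell out how the correlated \chi^2(4) errors are generated, so we center and rescale independent \chi^2(4) draws to unit variance and correlate them with the Cholesky factor of \Sigma_a; moreover, since signs in coefficient matrices are easily lost in typesetting, the stationary matrices below are hypothetical stand-ins rather than the exact values of model (8).

```python
import numpy as np

# Hypothetical stationary VAR(2) coefficients (stand-ins, not the exact values of (8))
# and the error covariance vec(Sigma_a) = (1, 0.8, 0.8, 1)'.
PHI1 = np.array([[0.5, 0.1], [0.1, 0.4]])
PHI2 = np.array([[0.2, 0.0], [0.0, 0.2]])
SIGMA_A = np.array([[1.0, 0.8], [0.8, 1.0]])

def simulate_bivariate_var2(T, burn=200, seed=0):
    """Simulate a bivariate VAR(2) with skewed errors.
    A chi^2(4) draw has mean 4 and variance 8; centering and dividing by sqrt(8)
    gives unit variance, and multiplying by the Cholesky factor of Sigma_a gives
    the required contemporaneous covariance (one plausible scheme, assumed here)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(SIGMA_A)
    e = (rng.chisquare(4, size=(T + burn, 2)) - 4) / np.sqrt(8)
    a = e @ L.T                      # errors with covariance Sigma_a and positive skew
    Y = np.zeros((T + burn, 2))
    for t in range(2, T + burn):
        Y[t] = PHI1 @ Y[t - 1] + PHI2 @ Y[t - 2] + a[t]
    return Y[burn:]                  # drop the burn-in so start-up values do not matter
```

The positive skewness of the \chi^2 errors carries over to the multi-step predictive densities, which is exactly the asymmetry the Gaussian approximation fails to capture.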

Consider first the problem of incorporating the parameter uncertainty. As pointed out above,

the MSE in (3) underestimates the true prediction uncertainty and, consequently, the resulting intervals and regions can be inappropriate in small sample sizes. Granted that a good estimator is used, the importance

of taking into account the parameter uncertainty could be small in systems consisting of few variables; see Riise and Tjøstheim (1984). But, in empirical applications we often find VAR(p)

models fitted to large systems; see, for example, Simkins (1995) for a VAR(6) model for a system of 5 macroeconomic variables, Waggoner and Zha (1999) who fit a VAR(13) model to a system of 6 macroeconomic variables, Chow and Choy (2006) who fit a VAR(5) model to a system of 5 variables related with the global electronic system, Gómez and Guerrero (2006) for a VAR(3) model fitted to a system of 6 macroeconomic variables and Chevillon (2009) for a VAR(2) model

for a system of 4 macroeconomic variables just to mention a few empirical applications. Addi-

tionally, as these examples show, when dealing with real systems of time series, their adequate

representation often requires a rather large order p. If the number of variables in the system

and/or the number of lags of the model are relatively large, the estimation precision in finite

samples could be rather low and predictions based on VAR(p) models with estimated parameters may suffer severely from the uncertainty in the parameter estimation. In these cases, it is

important to construct prediction intervals and regions that take into account this uncertainty;

see, for instance, Schmidt (1977), Lütkepohl (1991), West (1996), West and McCracken (1998),

Sims and Zha (1998, 1999) for existing evidence on the importance of taking into account param-

eter uncertainty in unconditional forecasts and Waggoner and Zha (1999) for the same result in

conditional forecasts.

^3 The smoothed density is obtained by applying a Gaussian kernel density estimator with a diagonal bandwidth matrix with elements given by the Gaussian "rule of thumb".

To incorporate the parameter uncertainty, Lütkepohl (1991) suggests approximating the sample distribution of the estimator by its asymptotic distribution.^4 In this case, the MSE of \widehat{Y}_{T+h|T} that incorporates the parameter uncertainty can be approximated by

\widehat{\Sigma}^l_{\widehat{Y}}(h) = \widehat{\Sigma}_{\widehat{Y}}(h) + \frac{1}{T} \widehat{\Omega}(h), \qquad (9)

where

\widehat{\Omega}(h) = \sum_{i=0}^{h-1} \sum_{j=0}^{h-1} \mathrm{tr}\left[ (\widehat{B}')^{h-1-i} \widehat{\Gamma}^{-1} \widehat{B}^{h-1-j} \widehat{\Gamma} \right] \widehat{\Psi}_i \widehat{\Sigma}_a \widehat{\Psi}_j', \qquad (10)

with \widehat{\Gamma} = \frac{ZZ'}{T} and \widehat{B} the following (Np+1) x (Np+1) matrix

\widehat{B} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ \widehat{\mu} & \widehat{\Phi}_1 & \widehat{\Phi}_2 & \cdots & \widehat{\Phi}_{p-1} & \widehat{\Phi}_p \\ 0 & I_N & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & I_N & 0 \end{bmatrix}.

In order to assess the effect of the parameter uncertainty on the MSE given by (9), consider the case of one-step-ahead predictions, i.e. when h = 1. In this situation, \widehat{\Omega}(1) = (Np+1) \widehat{\Sigma}_a, and \widehat{\Sigma}^l_{\widehat{Y}}(1) can be approximated by

\widehat{\Sigma}^l_{\widehat{Y}}(1) = \frac{T + Np + 1}{T} \widehat{\Sigma}_a.

This expression shows that the contribution of the parameter uncertainty to the one-step-ahead MSE matrix depends on the dimension of the system, N, the VAR order, p, and the sample size, T. As long as N and/or p, or both, are big enough compared to the sample size T, the effect of parameter uncertainty can be substantial. Obviously, as the sample size gets larger, \lim_{T\to\infty} \frac{T+Np+1}{T} = 1 and the parameter uncertainty contribution to the MSE in (9) vanishes.

Once the MSE is computed as in (9), the corresponding prediction intervals, ellipsoids and cubes are constructed using the Gaussianity assumption as follows

AI_{T+h} = \{ y_{n,T+h} \mid y_{n,T+h} \in [\, \widehat{y}_{n,T+h|T} \pm z_{\alpha/2} \widehat{\sigma}^l_{n,h} \,] \}, \qquad (11)

AE_{T+h} = \{ Y_{T+h} \mid [Y_{T+h} - \widehat{Y}_{T+h|T}]' \widehat{\Sigma}^l_{\widehat{Y}}(h)^{-1} [Y_{T+h} - \widehat{Y}_{T+h|T}] < \chi^2_{\alpha}(N) \}, \qquad (12)

AC_{T+h} = \{ Y_{T+h} \mid Y_{T+h} \in \times_{n=1}^{N} [\, \widehat{y}_{n,T+h|T} \pm z_{\alpha^*} \widehat{\sigma}^l_{n,h} \,] \}, \qquad (13)

where \widehat{\sigma}^l_{n,h} is the square root of the nth diagonal element of \widehat{\Sigma}^l_{\widehat{Y}}(h).

^4 Alternatively, some authors propose using Bayesian methods, which could be rather complicated from a computational point of view; see, for example, Simkins (1995) and Waggoner and Zha (1999), who need the Gaussianity assumption to derive the likelihood and posterior distribution.
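The size of the one-step-ahead correction is easy to tabulate: the MSE is inflated by the factor (T + Np + 1)/T. A quick illustrative check (the VAR(13)/6-variable combination echoes the Waggoner and Zha example cited below):

```python
def one_step_mse_inflation(T, N, p):
    """Factor (T + Np + 1)/T multiplying Sigma_a in the one-step-ahead MSE with
    parameter uncertainty: Sigma^l(1) = ((T + Np + 1)/T) * Sigma_a."""
    return (T + N * p + 1) / T

print(one_step_mse_inflation(100, 2, 2))     # 1.05  -> small bivariate VAR(2): a 5% inflation
print(one_step_mse_inflation(100, 6, 13))    # 1.79  -> VAR(13), 6 variables: almost 80% inflation
print(one_step_mse_inflation(10000, 6, 13))  # 1.0079 -> the correction vanishes as T grows
```

With T = 100 observations, a richly parameterized system nearly doubles the one-step MSE, which is why ignoring parameter uncertainty produces undercoverage in small samples.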

Panel (c) of Figure 1, which plots the density of y_{1,T+1} and y_{2,T+1} for the same example as above, constructed assuming that the forecast errors are Gaussian with zero mean and covariance

matrix given by (9), shows that this density is not very different from that plotted in panel (b)

and, obviously, it is not able to capture the asymmetries of the error distribution. Similarly,

the joint density of y_{1,T+8} and y_{2,T+8} in panel (c) of Figure 2 does not look different from that

in panel (b). The similarity between the standard and the asymptotic densities is even more

clear in Figure 3 that plots the elliptical and Bonferroni regions constructed using (12) and (13),

respectively. As we can observe they are slightly larger than the standard but still located very

close to them. They cannot cope with the lack of symmetry of the prediction error distribution.

This similarity could be expected as we are estimating 8 parameters with T = 100. Similar comments apply to Figure 4, where we can observe that the asymptotic marginal density for the first component of the system, y_{1,T+1}, only differs from the standard density in the variability,

which is slightly larger in the former.

Note that the asymptotic approximation of the distribution of the LS estimator can be inad-

equate in small samples depending on the number of parameters to be estimated and the true

distribution of the innovations.

2.2 Bootstrap procedures for prediction intervals and regions

To overcome the limitations of the Gaussian densities described before, Kim (1999, 2001, 2004)

and Grigoletto (2005) propose using bootstrap procedures which incorporate the parameter un-

certainty even when the sample size is small and do not rely on the Gaussianity assumption.

In order to take into account the conditionality of VAR forecasts on past observations, Kim (1999) proposes to obtain bootstrap replicates of the series based on the following backward recursion

Y^*_t = \widehat{\omega} + \widehat{\Pi}_1 Y^*_{t+1} + \ldots + \widehat{\Pi}_p Y^*_{t+p} + \widehat{u}^*_t, \qquad (14)

where Y^*_{T-i} = Y_{T-i} for i = 0, 1, \ldots, p-1 are p starting values which coincide with the last values of the original series, \widehat{\omega}, \widehat{\Pi}_1, \ldots, \widehat{\Pi}_p are LS estimates of the parameters of the backward representation, and \widehat{u}^*_t are obtained by resampling from the empirical distribution function of the centered and rescaled backward residuals. Then, bootstrap LS estimates of the parameters of the forward representation are obtained by estimating the VAR(p) model in (1) using \{Y^*_1, \ldots, Y^*_T\}.

Denote these estimates by \widehat{B}^* = (\widehat{\mu}^*, \widehat{\Phi}^*_1, \ldots, \widehat{\Phi}^*_p). The bootstrap forecast for period T+h is then given by

\widehat{Y}^*_{T+h|T} = \widehat{\mu}^* + \widehat{\Phi}^*_1 \widehat{Y}^*_{T+h-1|T} + \ldots + \widehat{\Phi}^*_p \widehat{Y}^*_{T+h-p|T} + \widehat{a}^*_{T+h}, \qquad (15)

where \widehat{a}^*_{T+h} are random draws from the empirical distribution function of the centered and rescaled forward residuals. Having obtained R bootstrap replicates of \widehat{Y}^*_{T+h|T}, Kim (2001) defines the bootstrap (1-\alpha)100% prediction interval for the nth variable in the system as follows

KI_{T+h} = \{ y_{n,T+h} \mid y_{n,T+h} \in [\, q^*_K(\alpha/2),\ q^*_K(1-\alpha/2) \,] \}, \qquad (16)

where q^*_K(\cdot) is the empirical quantile of the bootstrap distribution of the nth component of \widehat{Y}^*_{T+h|T}, approximated by G^*_{n,K}(x) = \#(\widehat{y}^*_{n,T+h|T} < x)/R. Similarly, Kim (1999) proposes to

construct bootstrap prediction ellipsoids with probability content (1-\alpha)100%, which are given by

KE_{T+h} = \{ Y_{T+h} \mid [Y_{T+h} - \overline{Y}^*]' S^*_{\widehat{Y}}(h)^{-1} [Y_{T+h} - \overline{Y}^*] < Q^*_K \}, \qquad (17)

where \overline{Y}^* is the sample mean of the R bootstrap replicates \widehat{Y}^{*(r)}_{T+h|T} and S^*_{\widehat{Y}}(h) is the corresponding sample covariance.^5 The quantity Q^*_K in (17) is the (1-\alpha)100% percentile of the bootstrap distribution of the following quadratic form

[\widehat{Y}^*_{T+h|T} - \overline{Y}^*]' S^*_{\widehat{Y}}(h)^{-1} [\widehat{Y}^*_{T+h|T} - \overline{Y}^*]. \qquad (18)

Furthermore, Kim (1999) proposes using the Bonferroni approximation to obtain prediction cubes with nominal coverage of at least (1-\alpha)100%, which are given by

KC_{T+h} = \{ y_{n,T+h} \mid y_{n,T+h} \in \times_{n=1}^{N} [\, q^*_K(\alpha^*/2),\ q^*_K(1-\alpha^*/2) \,] \}, \qquad (19)

where^6 \alpha^* = \alpha/N.

^5 Kim (1999) does not explicitly show how S^*_{\widehat{Y}}(h) should be defined. Alternatively, one can obtain S^*_{\widehat{Y}}(h) by substituting the parameters in (9) by their bootstrap estimates and computing the average through all bootstrap replicates. By calculating it with the sample covariance or by substituting the bootstrap parameters in the corresponding expressions we get similar results.

^6 Actually, what Kim (1999) defines as KC is slightly different from (19) as he uses the percentile and percentile-t methods of Hall (1992). Here we prefer to use the Bonferroni prediction regions in (19) because they are better suited to deal with potential asymmetries of the error distribution; see Hall (1992).
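Given the R bootstrap replicates of the h-step-ahead forecasts, the percentile interval (16) and the Bonferroni cube (19) are just empirical quantiles. The sketch below is illustrative (function names are ours) and assumes the replicates are stacked in an (R, N) array:

```python
import numpy as np

def bootstrap_percentile_interval(boot, n, alpha=0.05):
    """Percentile interval (16) for variable n from an (R, N) array of bootstrap
    h-step-ahead forecasts: the empirical alpha/2 and 1-alpha/2 quantiles."""
    return np.quantile(boot[:, n], [alpha / 2, 1 - alpha / 2])

def bootstrap_bonferroni_cube(boot, alpha=0.05):
    """Bonferroni cube (19): per-variable percentile intervals at level alpha* = alpha/N,
    giving joint coverage of at least (1-alpha)100%; returns an (N, 2) array of bounds."""
    R, N = boot.shape
    a = alpha / N
    return np.column_stack([np.quantile(boot, a / 2, axis=0),
                            np.quantile(boot, 1 - a / 2, axis=0)])
```

Because the bounds are empirical quantiles rather than Gaussian critical values, any asymmetry in the bootstrap forecast distribution is inherited by the intervals, which is the property exploited throughout the paper.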
