The nowcasting package provides the tools to make forecasts of monthly or quarterly economic variables using dynamic factor models. The objective is to help the user at each step of the forecasting process, starting with the construction of a database, all the way to the interpretation of the forecasts. The dynamic factor model adopted in this package is based on the articles from Giannone et al. (2008) and Banbura et al. (2011). Although there exist several other dynamic factor model packages available for R, ours provides an environment to easily forecast economic variables and interpret results.
Important economic decisions are made based on current and future conditions. Oftentimes, the variables used to measure such conditions are not available even for the recent past. This is, for instance, the case with US GDP that is published 45 days after the end of the quarter. Similarly, Brazilian GDP is published with a 60-day lag. There is therefore a need for forecasting the current value of given variables. To this end, (Giannone et al. 2008) proposed a statistical model that allows quarterly variables, such as US GDP, to be forecast using a large set of monthly variables released with different lags. GDP forecasts for the current quarter are, furthermore, updated whenever new information is available. Different central banks have shown interest in this methodology, among them the European Central Bank (Angelini et al. 2008; Van Nieuwenhuyze et al. 2008; Bańbura and Rünstler 2011), and the central banks of Ireland (D’Agostino et al. 2008), New Zealand (Matheson 2010) and Norway (Aastveit and Trovik 2012).
Factor models are designed to summarize the variation contained in a large dataset into only a few variables (Stock and Watson 2006). In (Giannone et al. 2008), the authors show how to reduce the information contained in dozens of monthly time series into only two dynamic factors. These two estimated factors, which are initially monthly, are then transformed into quarterly factors and used in a regression against GDP. Various other authors, such as (Chauvet 2001; Marcellino et al. 2003; Forni et al. 2004; Boivin and Ng 2006; D’Agostino et al. 2006; Banbura et al. 2011; Dahlhaus et al. 2015; Stock and Watson 2016), have explored Dynamic Factor Models (DFMs) in time series forecasting and found promising results.
Given the publication lag of many variables, such as GDP, we can either
forecast past, current or future values. In order to differentiate
between those types of forecasts we adopt the terminology used in
(Giannone et al. 2008) and (Banbura et al. 2011). Backcasting refers to
forecasting the value of a yet unpublished variable for a past period,
while nowcasting will be with respect to the current period. By way of
illustration, suppose we want to forecast the GDP for the
The aim of the package nowcasting is to offer the tools for the R user to implement dynamic factor models. The different steps in the forecasting process and the associated functions within the package are based on the literature. We have chosen to divide the process into 4 main steps: 1) constructing a dataset; 2) defining the model’s initiation parameters; 3) forecasting; 4) presenting results. This particular division will be maintained in most sections.
This brings us to the article’s sections that are organized as follows: 1) the theoretical framework is introduced; 2) the functions of our package are presented; 3) working examples of how to nowcast Brazilian GDP and of the New York FED nowcasting are given; 4) and finally the last section concludes with some considerations.
Let
In equation (1), the variables
In the so-called exact dynamic factor model, the error components from
equation ((1)) are assumed to be mutually uncorrelated at
all lags, i.e.,
where
Following is an example, in matrix form, of equation ((2))
of the model for orders
In order to predict a quarterly variable using monthly data, we
construct a partially observed monthly counterpart of the quarterly
variable as proposed in (Mariano and Murasawa 2003). This allows, for
instance, quarterly GDP to be explained by monthly variables. Continuing
with this example, let
The above accounting rule states that the quarterly GDP flow is equal to
the sum of the monthly flows. Looking at the quarterly change,
Suppose that the variable of interest is a quarterly rate of change,
Stating the approximation between the arithmetic and geometric means we
have:
Combining equations (5) and
(6) we obtain the approximation
from (Mariano and Murasawa 2003) that expresses the quarterly growth rate of
GDP as a function of the unobservable monthly growth rates
Suppose that the unobserved monthly growth rate
where
We follow the papers by Bai and Ng (2002) and Bai and Ng (2007) to respectively
define 1) the number
Let
The chosen number of factors
The number of shocks q
can be lower than the number of factors r
.
Once the number of factors is determined, we use an information
criterion from Bai and Ng (2007) to estimate the number of shocks
Then for some
where the estimated number of shocks to the factors will be
We will describe two methodologies for estimating dynamic factors: Two-Stage and Expectation-Maximization.
Two-Stage: This approach is described in (Giannone et al. 2008) and
refers to the exact DFM. In the first stage, the parameters of the
matrices
The estimator for the variance and covariance matrix for
According to (Stock and Watson 2011), the solution to
((12)) is to set
In the second stage, Kalman smoothing (Durbin and Koopman 2012) is used to
re-estimate the factors for the unbalanced panel
y
of lower frequency than the explanatory
variables. Factors are estimated using the monthly explanatory
variables The parameters of equation (13) are estimated by OLS, and
the forecast for
Expectation-Maximization: This estimation method is able to deal
with arbitrary patterns of missing values as shown in
(Bańbura and Modugno 2014). It is therefore less restrictive than the
Two-Stage method with regards to the frequencies of the variables
and allows for a mixed frequency database. Following
(Banbura et al. 2011), factors can be defined for different subgroups
of variables and no longer all need to be global as in the
Two-Stage estimation method. Below, we illustrate a case where
three factors are partitioned into three groups (global, real and
nominal) as in (Banbura et al. 2011). Rewriting equation
(1) accordingly gives equation
(14). As opposed to the Two-Stage estimation
method that builds on an exact dynamic factor model, the error term
is defined as an AR(1) process. A more restrictive assumption than
the Two-Stage method is that the number of shocks to the factors
q
is set equal to the number of factors r
.
where
The global factor is estimated considering all the explanatory
variables, while the estimates of the nominal and real factors only
consider variable classified, respectively, as nominal and real. The
parameter
In this model, the parameters, the unobserved common factors and the missing values are estimated through the Expectation-Maximization algorithm, which uses the following recursive structure:
Convergence is achieved when the absolute change in the value of the
log-likelihood function is less than
The first step in the nowcasting process is to prepare the data in a way
that is compatible with the proposed models and estimation methods. One
of the motivations of the presented models is the forecasting
improvements that can be achieved by using higher frequency variables.
More specifically, the gains that can be obtained in using monthly
variables to forecast quarterly series. Hence, all functions require
monthly mts
objects. In practice, the quarterly variables are usually
represented as monthly variables for which the last month of the quarter
is observed. As illustrated in the working examples, such
straightforward transformations from one frequency representation to
another can be achieved by using the functions qtr2month()
or
month2qtr()
.
With regards to the estimation methods, different inputs may have to be
provided. As a matter of fact, the Two-Stage method is more
restrictive on the format of the variables as it depends on principal
components in the first stage. This requires a strategy to deal with
missing values, which are not part of the jagged edge, beforehand.
(Giannone et al. 2008) propose to replace such missing values with the
median of the series that are then smoothed with a moving average. Since
such a strategy assigns a value that is independent of the information
contained in other contemporaneous variables, it is advisable to exclude
series with many missing values. The EM algorithm, however, is able to
deal with missing values in a way that uses the information contained in
other variables and might therefore not require discarding such
variables. Finally, independently of the estimation method, stationary
series are required. The usual transformations for making time series
stationary and the different strategies to deal with missing values have
been included in the function Bpanel()
that prepares the database for
the nowcasting function. Since these choices require careful attention,
the function Bpanel()
is explained in further detail.
Bpanel(base, trans, NA.replace = TRUE, aggregate = FALSE, k.ma = 3, na.prop = 1/3, h = 12)
trans
is a vector indicating the transformations to be applied to the
variables. For most cases, the available transformations are
sufficient to make economic variables stationary. The transformation
must be specified by using one of the following values for the
argument trans
:
trans = 0
: the observed series is preserved;
trans = 1
: monthly rate of change:
trans = 2
: monthly difference:
trans = 3
: monthly difference in year-over-year rate of
change:
trans = 4
: monthly difference in year-over-year difference:
trans = 5
: year difference:
trans = 6
: year-over-year rate of change:
trans = 7
: quarterly rate of change
NA.replace
is a boolean to determine whether missing values should be replaced
(NA.replace = TRUE
) or not (NA.replace = FALSE
).
aggregate
is a boolean to indicate whether to aggregate the monthly variables
to represent quarterly quantities. If TRUE
the aggregation is made
following the approximation of (Mariano and Murasawa 2003).
k.ma
is a numeric representing the degree of the moving average
correction if NA.replace = TRUE
.
na.prop
is a number between 0 and 1 indicating the ratio of missing observations to the total number of observations beyond which series will be discarded. The default is 1/3, meaning that if more than 1/3 of the observations are missing the series will be discarded from the database.
h
indicates how many periods should be added to the database. Default
is 12. Those missing values will be predicted with the function
nowcast()
.
As explained in the section on parameter estimation, the package offers
different functions to estimate the number of factors
Function ICfactors()
estimates the number of factors x
is a
balanced panel and rmax
is an integer representing the maximum
number of factors for which the information criterion should be
calculated. The default value is 20. type
indicates which of the
information criterion from Bai and Ng (2002) to use. type
x
is not a balanced panel, the function will delete rows with
missing values in order to use principal components.
ICfactors(x, rmax = 20, type = 2)
Function ICshocks()
estimates the number of idiosyncratic shocks
given a number x
is a balanced
panel. delta
and m
are parameters of the information criterion,
where r
is not specified it will be defined according to
ICfactors(x, rmax = 20, type = 2)
. p
is the number of lags in
the VAR of equation ((2)). If not specified, the default
is the lowest most occurring value from the information criteria
used within the function VARselect()
from the package vars.
ICshocks(x, r = NULL, p = NULL, delta = 0.1, m = 1)
An important feature of factor models is the dimensionality reduction of
(many) original variables into a few common factors. Hence, the target
variable y
will be expressed as a function of a few factors extracted
from the explanatory variables. This motivated the choice of the inputs
for the nowcast()
function. The formula format, which is well known to
R users, captures this idea as formula = y
.
can
be understood as the projection of y
on the information contained in
the dataset. The model’s parameters are estimated according to the
selected method (2s
, 2s_agg
and EM
, which correspond,
respectively, to “two-stage”, “two-stage with factor aggregation” and
“Expectation-Maximization algorithm”) described in the section on
estimation. The number r
of dynamic factors, the number q
of shocks
to the factors, and the lag order p
of the factors are determined
beforehand as shown in the previous subsection. The argument blocks
can be used with the EM
method to estimate factors for different
subgroups of variables. Finally, the argument frequency
is necessary
for all methods in order to identify the frequency of the variables.
nowcast(formula, data, q = NULL, r = NULL, p = NULL, method = 'EM', blocks = NULL,
frequency = NULL)
In the first two methods (2s
and 2s_agg
), the factors are calculated
based on the monthly variables, on which the dependent variable y
will
be regressed. The difference between 2s
and 2s_agg
is that for the
latter the monthly factors are transformed into quarterly quantities
while in the former no such aggregation is used. A linear regression
(bridge equation if y
is quarterly) of y
on the factors allows the
former to be forecast.
In the third method (EM
) no bridge equation is needed, as opposed to
the Two-Stage method. In practice, the algorithm will estimate all the
missing values respecting the restrictions imposed by equation
(7). The forecasts of quarterly time series are
defined as the estimated values of the third month of the out of sample
quarters. As opposed to the Two-Stage method, the number of common
shocks q
can not be specified and is assumed to be equal to r
, the
number of factors in each block.
The function nowcast.plot()
allows to plot several outputs from the
function nowcast()
.
nowcast.plot(out, type = "fcst")
The argument out
is the output from the function nowcast()
. The
argument type
can be chosen from the list
{"fcst","factors","eigenvalues","eigenvectors"}:
"fcst"
: shows the y
variable and its forecasts in sample and out
of sample."factors"
: shows all the estimated factors."eigenvalues"
: indicates how much of the variability in the
dataset is explained by each factor."eigenvectors"
: shows the importance of each variable in the first
factor.In this example we showcase how to nowcast Brazilian GDP using the
Two-Stage estimation method. Most of the variables of interest can be
downloaded from the Brazilian central bank using the function
BETSget()
from the package BETS. The variables and the associated
codes can be found on the Brazilian central bank’s website
> library(nowcasting)
> data(BRGDP)
For this example we will construct a pseudo real-time dataset, using the
function PRTDB()
. Some variables, such as GDP, suffer revisions over
time. Since we do not take revisions into account, we refer to such
datasets as pseudo real-time (as opposed to vintages). The (approximate)
delays in days are included in the BRGDP
object and will be used to
define if observations were available at a specific moment in time. The
dataset is then treated for outliers and missing values that are not
part of the jagged edges of the data, i.e., that are not due to the
different publication lags of the variables. This is achieved through
the function Bpanel()
. Unless otherwise specified by the user, the
function will also discard series with over 1/3 missing values.
> vintage <- PRTDB(mts = BRGDP$base, delay = BRGDP$delay, vintage = "2015-06-01")
> base <- window(vintage, start = c(2005,06), frequency = 12)
> x <- Bpanel(base = base, trans = BRGDP$trans)
The function month2qtr()
transforms monthly time series into quarterly
ones. In this case we want to use the value of the third month as the
quarterly value.
> GDP <- base[,which(colnames(base) == "PIB")]
> window(GDP, start = c(2015,1))
Jan Feb Mar Apr May Jun
2015 NA NA 170.68 NA NA NA
> GDP_qtr <- month2qtr(x = GDP, reference_month = 3)
> window(GDP_qtr, start = c(2015,1))
Qtr1 Qtr2
2015 170.68 NA
The quarterly GDP indicator, in this example, is an index representing
the seasonal quarterly product.
> y <- diff(diff(GDP_qtr,4))
> y <- qtr2month(y)
The dataset x
, which now only posses jagged edges, is well suited for
the information criteria that make use of principal components. The
estimated number of factors is given by the function ICfactors()
. As
explained in the previous section, the information criteria might give
different results for finite samples.
> ICR1 <- ICfactors(x = x, type = 1)
> ICR2 <- ICfactors(x = x, type = 2)
Finally, given the chosen number of factors for our model, we can use an information criterion for determining the number of shocks to the factors.
> ICQ1 <- ICshocks(x = x, r = 2, p = 2)
> ICQ1$q_star
[1] 2
Let the object data
be a monthly mts
object where the first column
is a partially observable stationary GDP series (y
) and the remaining
columns a balanced panel of stationary time series (x
). The
frequency
vector will be determined by the quarterly GDP series and
the remaining monthly series. In this example the factors will be
aggregated to obtain quarterly quantities by setting
method = "2s_agg"
.
> data <- cbind(y,x)
> frequency <- c(4,rep(12,ncol(x)))
> now <- nowcast(formula = y~., data = data, r = 2, q = 2 , p = 2, method = "2s_agg",
frequency = frequency)
> summary(now$reg)
Call:
stats::lm(formula = Y ~ ., data = Balanced_panel)
Residuals:
Min 1Q Median 3Q Max
-3.0248 -0.5679 0.1094 0.5835 1.8912
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.19526 0.16940 -1.153 0.258
Factor1 0.22610 0.01456 15.528 < 2e-16 ***
Factor2 0.06135 0.01174 5.228 1.02e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.002 on 32 degrees of freedom
Multiple R-squared: 0.8995, Adjusted R-squared: 0.8932
F-statistic: 143.1 on 2 and 32 DF, p-value: < 2.2e-16
The function nowcast.plot()
enables the user to visualize some of the
results. Say, for instance, that we want to look at fitted values and
out-of-sample forecasts. This can be achieved by setting the type to
"fcst"
. We might also want to look at the eigenvalues of the
normalized variance-covariance matrix of our balanced panel or at how
variables enter the first factor.
> nowcast.plot(now, type = "fcst")
> nowcast.plot(now, type = "eigenvalues")
> nowcast.plot(now, type = "eigenvectors")
Up until now, we have been forecasting GDP after transforming it into a stationary variable. We might want to transform the former back into a level variable in order to forecast the actual growth rate. Remember that we transformed GDP according to
that can be rewritten as
Equation ((15)) gives us the forecast of the new quarter GDP
level. The variable BRGDP$GDP
is the non-stationary GDP.
> level_forecast <- na.omit(now$yfcst[,3])[1] - tail(na.omit(GDP_qtr),5)[1] +
+ + tail(na.omit(GDP_qtr),5)[5] + tail(na.omit(GDP_qtr),5)[2]
> level_forecast
[1] 170.4783
> position_q2_2015 <- which(time(BRGDP$GDP) == 2015.25)
> BRGDP$GDP[position_q2_2015]
[1] 169.24
In this example we work with the data the Federal Reserve of New York
made available to reproduce its weekly nowcasting report
> library(nowcasting)
> data(NYFED)
> NYFED$legend$SeriesName
[1] "Payroll Employment" "Job Openings"
[3] "Consumer Price Index" "Durable Goods Orders"
[5] "Retail Sales" "Unemployment Rate"
[7] "Housing Starts" "Industrial Production"
[9] "Personal Income" "Exports"
[11] "Imports" "Construction Spending"
[13] "Import Price Index" "Core Consumer Price Index"
[15] "Core PCE Price Index" "PCE Price Index"
[17] "Building Permits" "Capacity Utilization Rate"
[19] "Business Inventories" "Unit Labor Cost"
[21] "Export Price Index" "Empire State Mfg Index"
[23] "Philadelphia Fed Mfg Index" "Real Consumption Spending"
[25] "Real Gross Domestic Product"
Similarly to the previous working example, the object NYFED
contains
all the necessary information to run the nowcast()
function. The time
series, the block structure, the transformations to make the variables
stationary and the variables’ frequencies can be loaded as illustrated
below.
> base <- NYFED$base
> blocks <- NYFED$blocks$blocks
> trans <- NYFED$legend$Transformation
> frequency <- NYFED$legend$Frequency
> delay <- NYFED$legend$delay
The dataset data
can be prepared by using the function Bpanel()
.
Using the EM algorithm, there is no need to replace missing values that
are not part of the jagged edges, as was the case with the Two-Stage
method. This can be achieved by setting NA.replace
to FALSE
. In this
case we do not want to discard series based on a particular ratio of
missing values to total observations as was the case in the Two-Stage
method. This is done by setting na.prop = 1
, where 1 indicates that
only series with more than 100% missing values will be discarded.
> data <- Bpanel(base = base, trans = trans, NA.replace = FALSE, na.prop = 1)
The model´s specifications are the same as those used by the NY FED. We
therefore limit the number of factors, r
, per block to one and define
the factor process as a VAR(1), i.e., p = 1
. The convergence of the
log-likelihood function is displayed every 5 iterations.
> nowEM <- nowcast(formula = GDPC1~., data = data, r = 1, p = 1, method = "EM",
blocks = blocks, frequency = frequency)
5th iteration:
The loglikelihood went from -2418.5983 to -2406.1482
...
65th iteration:
The loglikelihood went from -2354.084 to -2353.8435
Combining the functions nowcast()
and PRTB()
within a loop, we
illustrate how a pseudo out-of-sample end-of-quarter nowcast can be
made. The vector fcst_dates
defines the last month of the quarters for
which quarterly GDP growth will be nowcast. The vector delay
contains
approximate delays, in days, with which variables are published. This
enables us to construct a pseudo real-time dataset for a given day.
> fcst_dates <- seq.Date(from = as.Date("2013-03-01"),to = as.Date("2017-12-01"),
by = "quarter")
> fcst_results <- NULL
> for(date in fcst_dates)\{
+
+ vintage <- PRTDB(data, delay = delay, vintage = date)
+ nowEM <- nowcast(formula = GDPC1~., data = vintage, r = 1, p = 1, method = "EM",
blocks = blocks, frequency = frequency)
+ fcst_results <- c(fcst_results,tail(nowEM$yfcst[,3],1))
+
+ \}
The results of this out-of-sample nowcast example, as well as the results of an out-of-sample ARIMA, are displayed below.
The root mean square prediction error can easily be calculated for the 2013-2016 period. For this given example, when compared to one-period-ahead projections given by an ARIMA model, a Theil’s U statistic of 0.70 is obtained, signaling a 30% improvement over the benchmark.
The package nowcasting was developed in order to facilitate the use of
dynamic factor models for large datasets as set out in Giannone et al. (2008)
and Banbura et al. (2011). The package offers functions at each step of the
forecasting process to help the user treat data, choose and estimate the
value of parameters, as well as interpret results. We provided a working
example for nowcasting Brazilian GDP, illustrating each step and showing
how to implement the various functions available. We also used the New
York FED nowcasting exercise to illustrate the EM algorithm. We will, in
the future, work on adding new tools for the user to better leverage the
EM
method by identifying the source of forecast revisions. As shown by
the New York FED nowcasting report, this is an interesting policy
instrument that helps contextualizing forecast updates.
We thank Daniel Mesquita for revising some of the codes and our colleagues from FGV-IBRE for helpful inputs. We also thank an anonymous referee and the R journal editor Olivia Lau for constructive comments. The authors are responsible for any errors in this paper. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Valk, et al., "Nowcasting: An R Package for Predicting Economic Variables Using Dynamic Factor Models", The R Journal, 2019
BibTeX citation
@article{RJ-2019-020, author = {Valk, Serge de and Mattos, Daiane Marcolino de and Ferreira, Pedro Guilherme Costa}, title = {Nowcasting: An R Package for Predicting Economic Variables Using Dynamic Factor Models}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {1}, issn = {2073-4859}, pages = {230-244} }