This article explains the usage of R package tvReg, publicly available for download from the Comprehensive R Archive Network, via its application to economic and finance problems. The six basic functions in this package cover the kernel estimation of semiparametric panel data, seemingly unrelated equations, vector autoregressive, impulse response, and linear regression models whose coefficients may vary with time or any random variable. Moreover, this package provides methods for the graphical display of results, forecast, prediction, extraction of the residuals and fitted values, bandwidth selection and nonparametric estimation of the time-varying variance-covariance matrix of the error term. Applications to risk management, portfolio management, asset management and monetary policy are used as examples of these functions usage.
A very popular research area has been brewing in the field of kernel smoothing statistics applied to linear models with time-varying coefficients. In econometrics, Robinson (1989) was the first to analyse these models for linear regressions with time-varying coefficients and stationary variables. Since then, this literature has extended to models with fewer restrictions in the dependence of the variables to models with time dependence in the error term and to multi-equation models. Although these models are potentially applicable to a large number of areas, no comprehensive computational implementation is, to our knowledge, formally available in any of the commercial programming languages. The package tvReg contains the aforementioned functionality, input and output interface, and user-friendly documentation.
Parametric multi-equation linear models have increased in popularity in the last decades due to an increase in access to multiple datasets. Their application extends to, perhaps, every field of quantitative research. Just to mention some, they are found in biostatistics, finance, economics, business, climate, linguistics, psychology, engineering and oceanography. Panel linear models (PLM) are widely used to account for the heterogeneity in the cross-section and time dimensions. Seemingly unrelated equations (SURE) and vector autoregressive models (VAR) are the extensions of linear regressions and autoregressive models to the multi-equation framework. Programs with these algorithms are found in all major programming languages. Particularly in R, the package plm (Croissant and Millo 2008, 2018) contains a comprehensive functionality for panel data models. The package systemfit (Henningsen and Hamann 2007) allows the estimation of coefficients in systems of linear regressions, both with equation error terms correlated among equations (SURE) or uncorrelated. Finally, the package vars (Pfaff 2008) provides the tools to fit VAR models and impulse response functions (IRF). All these functions assume that the coefficients are constant. This assumption might not be true when a time series runs for a long period, and the relationships among variables do change. The package tvReg is relevant in this case.
In comparison to parametric models, the appeal of nonparametric models is their flexibility and robustness to functional form misspecification, with spline-based and kernel-based regression methods being the two main nonparametric estimation techniques, (e.g. Eubank 1999). However, fully nonparametric models are not appropriate when many regressors are in play, as their rate of convergence decreases with the number of regressors, the infamous “curse of dimensionality”. In the case of cross-section data, a popular alternative to avoid this problem are the generalised additive models (GAM), introduced by Hastie and Tibshirani (1993). The GAM is a family of semiparametric models that extends parametric linear models by allowing for non-linear relationships of the explanatory variables and still retaining the additive structure of the model. In the case of time-series data, the most suitable alternative to nonparametric models is the linear models whose coefficients change over time or follow the dynamics of another random variable. This functionality is coded in R, within the single-equation framework, in packages mgm (Haslbeck and Waldorp 2020), and MARSS (Holmes et al. 2012). Package tvReg uses the identical kernel smoothing estimation as package mgm when using a Gaussian kernel to estimate a VAR model with varying coefficients (TVVAR). However, the interpretation of their results is different because they are aimed at different audiences. The mgm focuses in the field of network models, producing network plots to represent relationships between current variables and their lags. Whereas the tvReg focuses in the field of economics where a direct interpretation of the TVVAR coefficients is not meaningful and may be done via the time-varying impulse response function (TVIRF) instead. Models with coefficients varying over time can also be expressed in state space form, which assumes that the coefficients change over time in a determined way for example, as a Brownian motion. These models can be estimated using the Kalman filter or Bayesian techniques, for instance (Primiceri 2005; Liu and Guo 2020). Packages MARSS and bvarsv (Krueger 2015) implement this approach based on the Carter and Kohn (1994) algorithm to estimate the TVVAR. On top of all this and as far as we can tell, the tvReg is the only package containing tools to estimate time-varying coefficients seemingly unrelated equation (TVSURE) and panel linear models (TVPLM) in R.
Simply, the main objective of the tvReg is to provide tools to estimate and forecast linear models with time-varying coefficients in the framework of kernel smoothing estimation, which may be difficult for the nonspecialised end-user to code. For completion, the tvReg also implements methods for the time-varying coefficients linear model (TVLM) and the time-varying coefficients autoregressive (TVAR) model. Often, these can be estimated using packages gam (Hastie 2022) and mgcv (Wood 2017), which combine (restricted) marginal likelihood techniques in combination with nonparametric methodologies. However, the advantage of using the tvReg is that it can handle dependency and any kind of distribution in the error term because it combines least squares techniques with nonparametric methodologies. An example of this is shown in Section 8.
Summing up, this paper presents a review of the most common time-varying coefficient linear models studied in the econometrics literature during the last two decades, their estimation using kernel smoothing techniques, the usage of functions and methods in the package tvReg, and their latest applications. Along these lines, Table 1 offers a glimpse at the tvReg full functionality, displaying a summary of its methods, classes and functions.
Function | Class | Function and Methods for class | Based on |
---|---|---|---|
tvPLM |
"tvPLM" |
tvRE , tvFE , coef , confint , fitted , forecast , plot , predict , print , resid , summary |
plm::plm |
tvSURE |
"tvsure" |
tvGLS , bw , coef , confint , fitted , forecast , plot , predict , print , resid , summary |
systemfit::systemfit |
tvVAR |
"tvvar" |
tvAcoef , tvBcoef , tvIRF , tvOLS , tvPhi , tvPsi , bw , coef , confint , fitted , forecast , plot , predict , print , resid , summary |
vars::VAR |
tvIRF |
"tvirf" |
coef , confint , plot , print , summary |
vars::irf |
tvLM |
"tvlm" |
tvOLS , bw , coef , confint , fitted , forecast , plot , predict , print , resid , summary |
stats::lm |
tvAR |
"tvar" |
tvOLS , bw , coef , confint , fitted , forecast , plot , predict , print , resid , summary |
stats::ar.ols |
A multi-equation model formed by a set of linear models is defined when each equation has its own dependent variable and possible different regressors. Seemingly unrelated equations, panel data models and vector autoregressive models are included in this category.
The SURE was proposed by Zellner (1962) and is referred to as the
seemingly unrelated equations model (SURE). The SURE model is useful to
exploit the correlation structure between the error terms of each
equation. Suppose that there are
It is important to differentiate between two types of smoothing
variables: 1)
The estimation of system ((1)) may be done separately for
each equation as if there is no correlation in the error term across
equations, i.e. system ((1)) has a total of
A second option is to use the correlation matrix of the error term in
the estimation of system ((1)). This is called the
time-varying generalised least squares (TVGLS) estimation. Its
mathematical expression is the same as ((2)) with the
following matrix components:
The TVGLS assumes that the error variance-covariance matrix is known. In practice, this is unlikely and it must be estimated, resulting in the Feasible TVGLS estimator (TVFGLS). This estimator consists of two steps:
Estimate
Estimate the coefficients of the TVSURE by plugging in
To ensure a good estimation of
Panel data linear models (PLM) are a particular case of SURE models with
the same variables for each equation but measured for different
cross-section units, such as countries, and for different points in
time. All equations have the same coefficients apart from the intercept
which can be different for different cross-sections. Therefore, the data
from all cross-sections can be pooled together. The individual effects,
Coefficient dynamics can be added to classical PLM models using a
time-varying coefficients panel data model, TVPLM. Recent developments
in this kind of models can be found in Sun et al. (2009; Dong, C. Jiti Gao, J. and Peng, B. 2015; Casas et al. 2021; Dong et al. 2021) among others, with
general model,
The time-varying pooled ordinary least squares (TVPOLS) has the same
expression than estimator ((2)) with the following
terms:
The time-varying random effects (TVRE) estimator is also given by
Equation ((5)) with a non-identity
The time-varying fixed effects (TVFE) estimator. Unfortunately, the
transformation for the within estimation does not work in the
time-varying coefficients model because the coefficients depend on
time (Sun et al. 2009 explain the issue in detail). Therefore, it is
necessary to make the assumption that
Macroeconomic econometrics experienced a revolution when Sims (1980)
presented the vector autoregressive (VAR) model: a new way of
summarising relationships among several variables while getting around
the problem of endogeneity of structural models. The VAR model has
lagged values of the dependent variable,
The TVVAR(
In the macroeconomic literature, the Bayesian estimation of process
((8)) has attracted a lot of attention in recent years
driven by results in Cogley and Sargent (2005; Primiceri 2005) and
Kapetanios et al. (2012). In their approach, the coefficients are assumed to
follow a random walk. Recently, Kapetanios et al. (2017) studied the
inference of the local constant estimator of a TVVAR(
tvSURE
The main argument of this function is a list of formulas, one for each
equation. The formula
follows the format of formula
in the package
systemfit, which implements estimators of parametric multi-equation
models with constant coefficients. The tvSURE
wraps the tvOLS
and
tvGLS
methods to estimate the coefficients of system
((1)). The tvOLS
method is used by default, calculating
estimates for each equation independently with different bandwidths,
bw
. The user is able to enter a set of bandwidths or a single
bandwidth to be used in the estimation instead. The tvGLS
method has
argument Sigma
where a known variance-covariance matrix of the error
can be entered in Equation ((3)). Otherwise, if
Sigma = NULL
, the variance-covariance matrix tvCov
, which is discussed in Section 6.
In addition to formula
, function tvSURE
has other arguments to
control and choose the desired estimation procedure:
All methods assume by default that the coefficients are unknown
functions of z
is set to
NULL
. The user can modify this setting by entering a numeric
vector in argument z
with the values of the random smoothing
variable over the corresponding time period. Note that the current
version only allows one single smoothing random variable, z
,
common for all equations; and balanced panels.
When argument bw
is set to NULL
, it is automatically selected by
leave-one-out cross-validation. It is possible to select it by
leave-cv.block = k
(k=0
by default). This minimisation can be slow for
large datasets, and it should be avoided if the user knows an
appropriate value of the bandwidth for the required problem.
The three choices for this argument are tkernel = "Triweight"
(default), tkernel = "Epa"
and tkernel = "Gaussian"
. The first
two options refer to the Triweight and Epanechnikov kernels, which
are compact in [-1, 1]. The authors recommend the use of either of
those two instead of the Gaussian kernel which, in general, requires
more calculations.
The default estimation methodology is the Nadaraya-Watson or local
constant, which is set as (est = "lc"
) and it fits a constant at
each interval defined by the bandwidth. The argument est = "ll"
can be chosen to perform a local linear estimation (i.e., to fit a
polynomial of order 1).
The tvOLS
method used in the estimation wraps the lm.wfit
method, which at default allows the fitting of a low-rank model, and
the estimation coefficients can be NAs. The user can change the
argument singular.ok
to FALSE
, so that the program stops in case
of a low-rank model.
The user can restrict certain coefficients in the TVSURE model using
arguments R
and r
. Note that the restriction is done by setting
those coefficients to a constant. Furthermore, argument method
defines
the type of estimator to be used. The possible choices in argument
method
are:
"tvOLS"
for a line by line estimation, i.e, with
"tvGLS"
to estimate the coefficients of the system using
Sigma
.
Argument Sigma
takes either a symmetric matrix or an array. If
Sigma
is a matrix (constant over time) then it must have
dimensions neq Sigma
is an array of dimension neq method ="tvGLS"
is entered but Sigma = NULL
, then tvSURE
is
fitted as if method = "tvOLS"
and a warning is issued.
"tvFGLS"
to estimate the coefficients of the system using an
estimate of control
indicates otherwise.
The user can choose the maximum number of iterations or the level of
tolerance in the estimation of
The package systemfit contains the Kmenta
dataset, which was first
described in Kmenta (1986), to show the usage of the function systemfit
to fit SURE models. This example has two equations: i) a demand
equation, which explains how food consumption per capita, consump
,
depends on the ratio of food price, price
; and disposable income,
income
; and ii) a supply equation, which shows how consumption depends
on price
, ratio prices received by farmers to general consumer prices,
farmPrice
; and a possible time trend, trend
. Mathematically, this
SURE model is
formula
calls
which are put into a "list"
.
> data("Kmenta", package = "systemfit")
> eqDemand <- consump ~ price + income
> eqSupply <- consump ~ price + farmPrice + trend
> system <- list(demand = eqDemand, supply = eqSupply)
Two parametric models are fitted to the data using the function
systemfit
: one assuming that there is no correlation of the errors
setting (the default), OLS.fit
below; and another one assuming the
existence of correlation in the system error term setting
method = "SUR"
, FGLS1.fit
below. Arguing that the coefficients in
((10)) may change over time, the corresponding TVSUREs are
fitted by using the the function tvSURE
with the default in the
argument method
and by method = "tvFGLS"
, respectively. They are
denoted by TVOLS.fit
and TVFGLS1.fit
.
> OLS.fit <- systemfit::systemfit(system, data = Kmenta)
> FGLS1.fit <- systemfit::systemfit(system, data = Kmenta, method = "SUR")
> TVOLS.fit <- tvSURE(system, data = Kmenta)
> TVFGLS1.fit <- tvSURE(system, data = Kmenta, method = "tvFGLS")
In the previous chunk, the FGLS and TVFGLS estimators use only one
iteration. However, the user can choose the iterative FGLS and the
iterative TVFGLS models, which estimate the coefficients iteratively
until convergence. The convergence level can be chosen with the argument
tol
(1e-05 by default) and the argument maxiter
with the maximum
number of iterations. The following chunk illustrates its usage:
> FGLS2.fit <- systemfit::systemfit(system, data = Kmenta, method = "SUR",
+ maxiter = 100)
> TVFGLS2.fit <- tvSURE(system, data = Kmenta, method = "tvFGLS",
+ control = list(tol = 0.001, maxiter = 100))
Some of the coefficients can be restricted to have a certain constant
value in tvSURE
. This can aid statistical inference to test certain
conditions. See an example of this below. Matrix R
has as many rows as
restrictions in r
and as many columns as regressors in the model. In
this case, Model ((10)) has 7 coefficients which are ordered
as they appear in the list of formulas. Note that the time-varying
coefficient of the variable trend
is redundant when an intercept is
included in the second equation of the TVSURE. Therefore, we want to
restrict its coefficient to zero. For illustration, we also impose
> Rrestr <- matrix(0, 2, 7)
> Rrestr[1, 7] <- 1; Rrestr[2, 2] <- 1; Rrestr[2, 5] <- -1
> qrestr <- c(0, 0.5)
> TVFGLS.rest <- tvSURE(system, data = Kmenta, method = "tvFGLS",
+ R = Rrestr, r = qrestr,
+ bw = TVFGLS1.fit$bw, bw.cov = TVFGLS1.fit$bw.cov)
Several studies have argued that the three-factor model by
Fama and French (1993) does not explain the whole variation in average returns.
In this line, Fama and French (2015) added two new factors that measure the
differences in profitability (robust and weak) and investment
(conservative and aggressive), creating their five-factor model (FF5F).
This model has been applied in Fama and French (2017) to analyse the
international markets. A time-varying coefficients version of the FF5F
has been studied in Casas et al. (2019), whose dataset is included in the
tvReg under the name of FF5F
. The TVFF5F model is
The FF5F
dataset has been downloaded from the Kenneth R. French (2016) data
library. It contains the five factors from four different international
markets: North America (NA), Japan (JP), Europe (EU), and Asia Pacific
(AP). For the dependent variable, the excess returns of portfolios
formed on size and book-to-market have been selected. The period runs
from July 1990 to August 2016 and it has a monthly frequency. The data
contains the Small/Low, Small/High, Big/Low and Big/High portfolios. The
factors in the TVFF5F model explain the variation in returns well if the
intercept is statistically zero. The lines of code below illustrate how
to fit a TVSURE to the Small/Low portfolio.
> data("FF5F")
> eqNA <- NA.SMALL.LoBM - NA.RF ~ NA.Mkt.RF + NA.SMB + NA.HML + NA.RMW + NA.CMA
> eqJP <- JP.SMALL.LoBM - JP.RF ~ JP.Mkt.RF + JP.SMB + JP.HML + JP.RMW + JP.CMA
> eqAP <- AP.SMALL.LoBM - AP.RF ~ AP.Mkt.RF + AP.SMB + AP.HML + AP.RMW + AP.CMA
> eqEU <- EU.SMALL.LoBM - EU.RF ~ EU.Mkt.RF + EU.SMB + EU.HML + EU.RMW + EU.CMA
> system2 <- list(NorthA = eqNA, JP = eqJP, AP = eqAP, EU = eqEU)
> TVFF5F <- tvSURE(system2, data = FF5F, method = "tvFGLS",
+ bw = c(0.56, 0.27, 0.43, 0.18), bw.cov = 0.12)
The package tvReg also includes the functionality to compute
confidence intervals for the coefficients of class attributes "tvlm"
,
"tvar"
, "tvplm"
, "tvsure"
and "tvirf"
by extending the confint
method. The algorithm in Fan and Zhang (2000) and Chen et al. (2017) to calculate
bootstrap confidence intervals has been adapted for all these class
attributes. Argument level
is set to 0.95 (95% confidence interval) by
default. Argument runs
(100 by default) is the number of resamples
used in the bootstrapping calculation. Note that the calculation using
runs = 100
can take long, so we suggest to try a small value in runs
first to get an initial intuition of the results. Because coefficients
are time-varying, only wild bootstrap residual resampling is
implemented. Two choices of wildbootstrap are allowed in argument
tboot
: the default one proposed in Mammen (1993) (tboot = "wild"
); and
the standard normal (tboot = "wild2"
).
In the backend code, coefficient estimates from all replications are
stored in the BOOT
variable. In this way, calculations are not done
again if the user chooses a different level
for the same object. In
the chunk below, the confint
method calculates the 90% confidence
interval of the object TVFF5F
. Posteriorly, the 95% interval is
calculated quickly because the resample calculations in the first
interval are re-used for the second.Thus, the 90% confidence interval
calculation takes around 318 seconds with a 2.2 GHz Intel Core i7
processor and the posterior 95% confidence interval takes only around
0.7 seconds.
> TVFF5F.90 <- confint(TVFF5F, level = 0.90)
> TVFF5F.95 <- confint(TVFF5F.90)
The plot
method is implemented for each of the six class attributes in
tvReg. For example, the 95% confidence intervals of the intercept for
the North American, Japanese, Asia Pacific and European markets
Figure 1 are with plot
statement below, that produces
four independent plots of the first variable (the intercept in this
case) in each equation due to argument vars = 1
.
> plot(TVFF5F.95, vars = 1)
The user can also choose to plot the coefficients of several variables and/or equations. Plots will be grouped by equation, with a maximum of three variables per plot. The piece of code below show how to plot the coefficients of the second and third variables from the Japan market equation, which results can be seen in Figure 2.
> plot(TVFF5F.95, vars = c(2, 3), eqs = 2)
tvPLM
The tvPLM
method is inspired by the plm
method from the package
plm. It converts data
into an object of the class attribute
"pdata.frame"
using argument index
to define the cross-section and
time dimensions. If index = NULL
(default), the two first columns of
data
define the dimensions. The tvPLM
wraps the tvRE
and tvFE
methods to estimate the coefficients of time-varying panel data models.
The user can provide additional optional arguments to modify the default
estimation. See section 3 for details on arguments z
,
bw
, est
and tkernel
. Furthermore, argument method
defines the
estimator used. The possible choices based on package plm choices are:
"pooling"
(default), "random"
and "within"
.
The income elasticity of healthcare expenditure is defined as the
percentage change in healthcare expenditure in response to the
percentage change in income per capita. If this elasticity is greater
than one, then healthcare expenditure grows faster than income, as
luxury goods do, and is driven by market forces alone. The heterogeneity
of health systems among countries and time periods have motivated the
use of panel data models, for example in Gerdtham et al. (1992) who use a FE
model. Recently, Casas et al. (2021) have investigated the problem from the
time-varying panel models perspective using the TVFE estimation. In
addition to the income per capita, measured by the log GDP, the authors
use the proportion of population over 65 years old, the proportion of
population under 15 years old and the share of public funding of
healthcare. The income elasticity estimate with a FE implemented in the
plm
is greater than 1, a counter-intuitive result. This issue is
resolved using the TVFE implemented in the tvReg. The code below
estimates coefficients with the parametric and semiparametric models:
> data("OECD")
> elast.fe <- plm::plm(lhe ~ lgdp + pop65 + pop14 + public, data = OECD,
+ index = c("country", "year"), model = "within")
> elast.tvfe <- tvPLM (lhe ~ lgdp + pop65 + pop14 + public, data = OECD,
+ index = c("country", "year"), method = "within",
+ bw = 0.67)
> elast.fe <- confint(elast.fe)
> elast.tvfe <- confint(elast.tvfe)
Figure 3 shows the elasticity estimates using the FE and TVFE estimators. The constant coefficients model (dashed line) suggests that healthcare is a luxury good (over 1), while the time-varying coefficients (solid line) model suggests it is a value under 0.8.
tvVAR
and tvIRF
A TVVAR(y
, is of the class attribute
"matrix"
or "data.frame"
with as many columns as equations.
Regressors are the same for all equations and they contain an intercept
if the argument type = "const"
(default) or not if type = "none"
;
lagged values of y
; and other exogenous variables in exogen
.
Econometrically, the tvOLS
method is called to calculate the estimates
for each equation independently using one bandwidth per equation. The
user can choose between automatic bandwidth selection; or entering a one
value in bw
, meaning that all equations will be estimated with the
same bandwidth; or a vector of bandwidths, one for each equation. The
tvVAR
returns a list
of the class attribute tvvar
, which can be
used to estimate the TVIRF model with the function tvIRF
.
The assessment and forecast of the effects of monetary policy on
macroeconomic variables, such as inflation, economic output and
employment is commonly modelled using the econometric framework of VAR
and interpreted by the IRF. In recent years, scholars of
macroeconometrics have searched intensely for a way to include time
variation in the coefficients and covariance matrix of the VAR model.
The reason for this is that the macroeconomic climate evolves over time
and effects of monetary policy must be identified locally rather than
globally. In the Bayesian framework, Primiceri (2005) used the
Carter and Kohn (1994) algorithm to fit the TVP-VAR to this monetary policy
problem. Results of the latter can be replicated with the functions in
the package bvarsv and compared with results in the tvReg that fits
the following TVVAR(4):
Central banks commonly regulate the money supply by changing the
interest rates to keep a stable inflation growth. The R code below uses
macroeconomic data from the United States, exactly the one used in
Primiceri (2005), with the following three variables: inflation rate
(inf
), unemployment rate (une
) and the three months treasury bill
interest rate (tbi
). For illustration, a VAR(4) model is estimated
using the function VAR
from the package vars, a TVVAR(4) model is
estimated using the function tvVAR
from the package tvReg and a
TVP-VAR(4) model is estimated using the function bvar.sv.tvp
from the
package bvarsv. Furthermore, their corresponding impulse response
functions with horizon 20 are calculated to forecast how the inflation
responds to a positive shock in interest rates. The TVVAR(4) can also be
estimated with function tvmvar
from R package mgm, which will give
the same coefficient estimates than the tvVAR
for the Gaussian kernel
and same bandwidth. However, package mgm does not have an impulse
response function and, for this reason, it is left out of the example.
> data(usmacro, package = "bvarsv")
> VAR.usmacro <- vars::VAR(usmacro, p = 4, type = "const")
> TVVAR.usmacro <- tvVAR(usmacro, p = 4, bw = c(1.14, 20, 20), type = "const")
> TVPVAR.usmacro <- bvarsv::bvar.sv.tvp(usmacro, p = 4, pdrift = TRUE, nrep = 1000,
+ nburn = 1000, save.parameters = TRUE)
The user can provide additional optional arguments to modify the default
estimation. See Section 3 to understand the usage of
arguments bw
, tkernel
, est
and singular.ok
. In addition, the
function tvVAR
has the following arguments:
The number of lags is given by the model order set in the argument
p
.
Other exogenous variables can be included in the model using the
argument exogen
, which accepts a vector or a matrix with the same
number of rows as the argument y
.
The default model contains an intercept (i.e., it has a mean
different from zero). The user can set argument type = "none"
, so
the model has mean zero.
The variance-covariance matrix from the residuals of a TVVAR(plot
method for object of
class attribute "tvvar"
displays as many plots as equations, each plot
with the fitted and residuals values as it is shown in
Figure 4 obtained with:
> plot(TVVAR.usmacro)
Figure 4 shows the residuals of the inflation equation that has a mean close to zero and the fitted values are fitting the observed values closely.
TVVAR.usmacro
for the inflation equation. The dots in the
top plot represent the observed values and the red line represents the
fitted values, while the black line in the bottom plot represents the
returns of the estimation. The model fits the observed values well and
the returns appear to have zero mean and constant variance.
Function tvIRF
estimates the TVIRF with main argument, x
, which is
an object of class attribute "tvvar"
returned by the function tvVAR
.
The user can provide additional optional arguments to modify the default
estimation as explained below.
The user has the option to pick a subset of impulse variables and/or
response variables using arguments impulse
and response
.
The horizon of the TVIRF coefficients can be chosen by the user with
argument n.ahead
, the default is 10.
The orthogonalised impulse response function is computed by default
(ortho = TRUE
). In the orthogonal case, the estimation of the
variance-covariance matrix of the errors is estimated as
time-varying (ortho.cov = "tv"
) by default (see Section
6 for theoretical details). Note that the user can
enter a value of the bandwidth for the variance-covariance matrix
estimation in bw.cov
. It is possible to use a constant
variance-covariance matrix by setting ortho.cov = "const"
.
If the user desires to obtain the cumulative TVIRF values, then
argument cumulative
must be set to TRUE
.
Following the previous example, the lines of code below estimate the IRF using the package vars, the TVP-IRF using the package bvarsv and the TVIRF using the package tvReg.
> IRF.usmacro <- vars::irf(VAR.usmacro, impulse = "tbi", response = "inf", n.ahead = 20)
> TVIRF.usmacro <- tvIRF(TVVAR.usmacro, impulse = "tbi", response = "inf", n.ahead = 20)
> TVPIRF.usmacro <- bvarsv::impulse.responses(TVPVAR.usmacro, impulse.variable = 3,
+ response.variable = 1, draw.plot = FALSE)
A comparison of impulse response functions from the three estimations is plotted in Figure 5, whose R code is shown below:
> irf1 <- IRF.usmacro$irf[["tbi"]]
> irf2 <- TVIRF.usmacro$irf[["tbi"]]
> irf3 <- TVPIRF.usmacro$irf
> ylim <- range(irf1, irf2[150,,], irf3[50,])
> plot(1:20, irf1[-1], ylim = ylim, main = "Impulse variable: tbi from 1990Q2",
+ xlab ="horizon", ylab ="inf", type ="l", lwd = 2)
> lines(1:20, irf2[150,,-1], lty = 2, lwd = 2)
> lines(1:20, irf3[50,], lty = 3, lwd = 2)
Figure 5 displays the IRF, the TVIRF and the TVP-IRF
(the two latter at time 150 in our database, which corresponds to the
second quarter of 1990) for horizons 1 to 20. The IRF and TVIRF follow a
similar pattern: a positive shock of one unit in the short-term interest
rates (tbi
) during 1990Q2 results in an initial drop in inflation
during the first three months, followed by an increase for two or three
months and finally in a steady decrease until it plateaus one year
after. The left plot shows an increase in inflation during the first
three months and a drop after.
The confint
method is also implemented for the class attribute
"tvirf"
. Remember that the TVIRF model contains one impulse response
function for each data time record. So, the full plot of TVIRF would
have as many lines as the number of rows in the dataset. Instead, the
plot
method displays only one line by default, the mean value of all
those impulse response functions and it issues a warning. The user can
enter one or several values into argument obs.index
to plot the IRF at
the desire point(s) in time.
The time-varying variance-covariance matrix of two or more series is
estimated nonparametrically in tvReg. Given a random process
The user must be aware that the local linear estimator can return
non-positive definite matrices for small samples. Although the local
constant estimator, calculated when tvCov
.
The function tvCov
is called by the function tvIRF
to calculate the
orthogonal TVIRF, and by the function tvSURE
for method = "tvFGLS"
to estimate the variance-covariance matrix of the error term. The
function tvCov
can generally be used to estimate the time-varying
covariance matrix of any two or more series.
Aslanidis and Casas (2013) consider a portfolio of daily US dollar exchange
rates of the Australian dollar (AUS), Swiss franc (CHF), euro (EUR),
British pound (GBP), South African rand (RAND), Brazilian real (REALB)
and Japanese yen (YEN), over the period from January 6, 1999 until May
7, 2010 (T = 2855 observations). This dataset contains the standarised
rates after “devolatilisation”; i.e., after standarising the rates using
the GARCH(1,1) estimates of the volatility and it is available in the
tvReg under the names of CEES. A portfolio consisting of these
currencies is well diversified containing some safe haven currencies,
active and liquid currencies and currencies that perform well in times
of high interest rates. The estimation of the correlation matrix among
these currencies is essential for portfolio management. The model is
tvCov
and then used in risk management, for example
to calculate the Value-at-risk, denoted by VaR in the financial
literature. The VaR, not be confused with the VAR, measures the level of
financial risk of a portfolio, asset or firm. The VaR of an asset omega
in the code below). The portfolio weights are
the percentage of the total portfolio investment in each asset and can
be chosen to be constant or changing over time. In the code below,
weights are calculated by minimum variance at each point in time. The
estimated VaR of this example portfolio is shown in
Figure 6.
> data(CEES)
> VaR <- numeric(nrow(CEES))
> Ht <- tvCov(CEES[, -1], bw = 0.12)
> e <- rep (1, ncol(CEES)-1)
> for (t in 1:nrow(CEES)){
+ omega <- solve(Ht[,,t])%*%e/((t(e)%*%solve(Ht[,,t])%*%e)[1])
+ VaR[t] <- abs(qnorm(0.05))*sqrt(max(t(omega)%*%Ht[,,t]%*%omega,0))
+ }
> plot(as.Date(CEES[, "Date"]), VaR, type ="l", xlab = "year",
+ ylab = expression(VaR[t]), main="VaR of CEES over time")
A varying coefficients linear model (TVLM) is generally expressed by
The case of
tvLM
The function tvLM
fits a TVLM using the tvOLS
method. The tvLM
follows the standards of the function lm
with main arguments formula
and data
. The only mandatory argument is formula
, which should be a
single formula for a single-equation model. This arguments follows the
standard regression formula
in R. The function tvLM
returns an
object of the class attribute tvlm
. This model is in some cases a
GAM-type model which is implemented in the comprehensive and
well-established mgcv package. The mgcv uses a methodology different
from kernel smoothing to estimate the varying coefficients, involving
splines and quasi-maximum likelihood estimation. The advantage of using
kernel smoothing techniques to estimate the TVLM is that it can handle
dependency and any kind of distribution in the error term. For
illustration of this difference between the two packages in relation to
the TVLM, the following model is generated:
lm
are
constant and lie around the average of all tvLM
and gam
follow the
dynamics of the varying coefficients. Besides the estimates of gam
fit
tvLM
,
although it requires for a longer computation time, it is able to fit
both coefficients well.
> tau <- seq(1:1000)/1000
> d <- data.frame(tau, beta1 = sin(2 * pi * tau), beta2 = 2 * tau,
+ x1 = rt(1000, df = 2), x2 = rchisq(1000, df = 4))
> error.cov <- exp(-as.matrix(dist(tau))/10)
> error <- t(chol(error.cov)) %*% rchisq(N, df = 2)
> d <- transform(d, y = x1 * beta1 + x2 * beta2 + error)
> lm1 <- stats::lm(y ~ x1 + x2, data = d)
> TVLM1 <- tvLM(y ~ x1 + x2, data = d, bw = 0.05, est ="ll")
> gam1 <- mgcv::gam(y ~ s(tau, by = x1) + s(tau, by = x2), data = d)
lm
,
tvLM
and gam
estimates of β1t and β2t. The true
values are plotted in black, the red lines represent the lm
estimates, the green lines refer to the tvLM
estimates and
the blue lines represent the gam
estimates. This result
suggests that the TVLM is preferable for modelling non-linear varying
coefficients under strong dependency.
In addition to formula
, the function tvLM
has the arguments
described in Section 3 above. Also methods confint
,
fitted
, print
, plot
, residuals
and summary
are implemented for
class "tvlm"
.
The summary
method displays: (i) a summary of all coefficient values
over the whole time period, (ii) the value of the bandwidth(s), and
(iii) a measure of goodness-of-fit, pseudo-R"tvsure"
, "tvplm"
, "tvvar"
, "tvlm"
and
"tvar"
and it is calculated with the classical equation,
tvAR
A TVAR model is a particular case of TVLM whose regressors contain
lagged values of the dependent variable, y
. The number of lags is
given by the model order set in the argument p
. Other exogenous
variables can be included in the model using the argument exogen
,
which accepts a vector or a matrix with the same number of rows as the
argument y
. An intercept is included by default unless the user enters
type = "none"
into the function call. Econometrically, this function
also wraps the tvOLS
estimator, which needs a bandwidth bw
that is
automatically selected when the user does not enter any number. An
object of the class attribute "tvar"
is returned by the function
tvAR
.
The user can provide additional optional arguments to modify the default
estimation of the function tvAR
. See Section 3 to
understand the usage of arguments bw
, tkernel
, est
and
singular.ok
and Section 5 to understand the usage of
argument type
. In addition, the function tvAR
has the following
argument:
An autoregressive process of order p does not necessarily contain
all the previous fixed
, with the same
format as in the function arima
from the package
stats, permits to
impose these restrictions. The order of variables in the model is:
intercept (if any), lag 1, lag 2, fixed
is a vector of
NAs with length the number of coefficients in the model. The user
can enter a vector in the argument fixed
with zeros in the
positions corresponding to the restricted coefficients.
The realized variance (RV) model was popularised in the financial
literature by Andersen and Bollerslev (1998), who show that the use of intraday data
can offer an accurate forecast of daily variance. It is defined as
Here,
It is likely that changes in the business cycles affect the coefficients
in ((15)). Chen et al. (2017) coined the time-varying coefficient
HAR, whose coefficients are functions of the rescaled time period. The
RV
dataset contains daily variables running from January 3, 1990 until
December 19, 2007 that have been computed from 5 minute intraday data
from Store (2017). This period coincides with the period in
Bollerslev et al. (2009). The variable names in this dataset are RV
,
RV_lag
, RV_week
, RV_month
and RQ_lag_sqrt
and correspond to the
> data("RV")
> RV2 <- head(RV, 2000)
> HAR <- with(RV2, arima(RV, order = c(1, 0, 0), xreg = cbind(RV_week, RV_month)))
> TVHAR<- with(RV2, tvAR(RV, p = 1, bw = 0.8, exogen = cbind(RV_week, RV_month)))
Bollerslev et al. (2016) extended the Model ((15)) to control for the effect of the realized quarticity (RQ) on the relationship between the future RV and its near past values. They present the HARQ model,
tvAR
or with
the function tvLM
as it is shown in the chunk below.
> HARQ <- with(RV2, lm(RV ~ RV_lag + I(RV_lag*RQ_lag_sqrt) + RV_week + RV_month))
> TVHARQ <- with(RV2, tvAR(RV, p = 1, exogen = cbind(RV_week, RV_month),
+ z = RQ_lag_sqrt, cv.block = 10))
Estimation is a useful tool to understand the patterns and processes hidden in known data. Prediction and forecast are the mechanisms to extend this understanding to unknown data. Although the two terms are often used indistinctively, the term prediction is broader than the term forecast which is reserved for time-series models and consists on using historical data to infer the future. For example, we speak of predicting values from a linear regression fitted to cross-sectional data and of forecasting future values from an AR(p) model.
The prediction of the dependent variable at time
In the tvReg, we refer to prediction when predict
and forecast
methods in tvReg
are slightly different.
The forecast
method is implemented for the class attributes
"tvsure"
, "tvplm"
, "tvar"
, "tvlm"
and "tvar"
. As an example,
the three days ahead forecast of model TVHAR
, evaluated in Section
9.1 using the first 2000 values of the dataset RV
, is
provided in the lines of code below. This is a TVAR(1) model with two
exogenous variables, RV_week
and RV_month
. The argument newexogen
requires three values of these exogenous variables and variable
n.ahead = 3
.
> newexogen <- cbind(RV$RV_week[2001:2003], RV$RV_month[2001:2003])
> forecast(TVHAR, n.ahead = 3, newexogen = newexogen)
[1] 2.200921e-05 2.566854e-05 2.466637e-05
The forecast
method requires the argument object
. In addition, other
arguments are necessary, some of them depending on the class attribute
of object
.
The argument n.ahead
is a scalar with the forecast horizon. By
default, it is set to 1.
It is possible to run either an increasing window forecast
(default), when the argument winsize = 0
or a rolling window
forecast with a window size defined in the argument winsize
.
These arguments belong to the forecast
methods and it is a
"vector"
, "data.frame"
or "matrix"
containing the new values
of the regressors in the model. It is not necessary to enter the
intercept. Note that this newdata
does not refer to the variables
in exogen
which might be part of the "tvar"
and "tvvar"
objects. Those must be included in newexogen
, if needed.
This argument appears in the forecast
method for the class
attributes "tvar"
and "tvvar"
and it must be entered when the
initial model contains exogen
variables. It is a "vector"
,
"data.frame"
or "matrix"
.
The predict
method is implemented for the same class attributes than
the forecast
. It does not require arguments n.ahead
and winsize
,
but arguments newdata
and newexogen
are defined as in forecast
. In
addition, new values of the smoothing variable must be entered into the
argument newz
. This must be of the class attribute "vector"
or
"numeric"
. The code below, predicts three future values of the TVHAR
model fitted above.
> newdata <- RV$RV_lag[2001:2003]
> newexogen <- cbind(RV$RV_week[2001:2003], RV$RV_month[2001:2003])
> newz <- RV$RQ_lag_sqrt[2001:2003]
> predict(TVHARQ, newdata, newz, newexogen = newexogen)
[1] 1.741663e-05 2.402516e-05 2.088794e-05
The example below shows the usage of the forecast
and predict
methods for the class attribute "tvsure"
.
The lines of code below forecast three values for model TVOLS.fit
evaluated in Section 3. The method needs a set of new
values in the argument newdata
, which must have the same number of
columns as the original dataset.
> newKmenta <- data.frame(consump = c(95, 100, 102), price = c(90, 100, 103),
+ farmPrice = c(70, 95, 103), income = c(82, 94, 115),
+ trend = c(21:23))
> forecast(TVOLS.fit, newdata = newKmenta, n.ahead = 3)
demand supply
[1,] 97.92300 95.32852
[2,] 98.94076 103.48589
[3,] 105.36951 106.26576
In case the smoothing variable in the model is a random variable, the
predict
method for the class attribute "tvsure"
requires also a new
set of values in argument newz
. The chunk below first fits a TVSURE
model, tvOLS.z.fit
, to the Kmenta
data with the same system of
equations as in the TVOLS.fit
, but with random variable as the
smoothing variable, which is generated as an ARMA(2,2) process. Three
values of the dependent variable are predicted with the predict
method. In addition to new values in the argument newdata
, it requires
a set of new smoothing values in the argument newz
. It returns the
predicted values as a matrix with as many columns as equations in the
system.
> nobs <- nrow (Kmenta)
> smoothing <- arima.sim(n = nobs + 3, sd = sqrt(0.1796),
+ list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)))
> smoothing <- as.numeric(smoothing)
> tvOLS.z.fit <- tvSURE(system, data = Kmenta, z = smoothing[1:nobs])
> newSmoothing <- tail(smoothing, 3)
> predict(tvOLS.z.fit, newdata = newKmenta, newz = newSmoothing)
demand supply
[1,] 100.0195 96.50136
[2,] 100.3919 105.29293
[3,] 106.1426 107.97822
The forecast
and predict
methods for the rest of the class
attributes in the package follow similar patterns, and further examples
can be found in the documentation of the tvReg.
Research of time-varying coefficient linear models and their estimation using kernel smoothing methods has seen a great theoretical development during the last two decades. Our own work in this field has served as inspiration to code the tvReg because we encounter a lack of computer applications with this functionality. Indeed, we expect that this package empowers empirical researchers working with regression and time series models with a computing tool that allows for more flexible models.
Within the R framework: (i) the tvReg extends functions in the R
packages systemfit, plm and vars; (ii) it extends functions lm
,
ar.ols
and arima
to allow for varying coefficients; (iv) it
complements R packages mgcv and gam for the linear regression model
by providing a consistent estimator of this model for in case of
dependency and a general distribution in the error term; (v) it
complements R package mgm by adding the time-varying impulse response
(TVIRF) function which is commonly used in macroeconomics; and (vi) it
complements R package bvarsv and MARSS which estimate the TVVAR and
TVIRF within the state-space framework. In addition, the confint
,
fitted
, forecast
, plot
, predict
, print
, resid
and summary
methods are implemented for all class attributes in the tvReg and will
allow the user to conveniently produce their research output. In any
case, the user is able to produce customised plots and summaries from
the returns of the functions, whose elements are accessible in the same
manner as other R "list"
objects.
Finally, the tvReg shows multiple applications in economics and finance. Specifically in asset management, portfolio management, risk management, health policy and monetary policy. The methods and datasets permit to verify results in (Aslanidis and Casas 2013; Casas et al. 2018, 2019, 2021). Models in this paper are used in other fields too. For example, (Reikard 2009) uses the TVLM to forecast the wave energy flux and (Haslbeck et al. 2021) uses the TVVAR in different applications in psychology. The tvReg is therefore not only relevant and original but also timely.
We thank the unknown referee for the constructive suggestions and comments on an earlier version of this paper. The first author would also like to thank her co-authors of related papers: Nektarios Aslanidis, Eva Ferreira, Jiti Gao, Stefano Grassi, Xiuping Mao, Susan Orbe, Bin Peng, Helena Veiga and Shangyu Xie whose comments, suggestions and collaboration have helped to shape this code. The first author acknowledges financial support from the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant no. 657182) and from the Ministerio de Economía y Competitividad (grant no. ECO2014-51914-P). The second author acknowledges financial support from the MINECO (grant no. MTM2017-82724-R), MICINN (grant no. PID2020-113578RB-I00), and from the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14 and Centro de Investigación del Sistema Universitario de Galicia ED431G 2019/01), all of them through the ERDF.
tvReg, plm, systemfit, vars, mgm, MARSS, bvarsv, gam, mgcv, stats
Bayesian, CausalInference, Econometrics, Environmetrics, Finance, GraphicalModels, MixedModels, Psychometrics, Spatial, SpatioTemporal, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Casas & Fernández-Casal, "tvReg: Time-varying Coefficients in Multi-Equation Regression in R", The R Journal, 2022
BibTeX citation
@article{RJ-2022-002, author = {Casas, Isabel and Fernández-Casal, Rubén}, title = {tvReg: Time-varying Coefficients in Multi-Equation Regression in R}, journal = {The R Journal}, year = {2022}, note = {https://rjournal.github.io/}, volume = {14}, issue = {1}, issn = {2073-4859}, pages = {79-100} }