betategarch: Simulation, estimation and forecasting of Beta-Skew-t-EGARCH Models

This paper illustrates the usage of the betategarch package, a package for the simulation, 1 estimation and forecasting of Beta-Skew-t-EGARCH models. The Beta-Skew-t-EGARCH model is 2 a dynamic model of the scale or volatility of financial returns. The model is characterised by its 3 robustness to jumps or outliers, and by its exponential specification of volatility. The latter enables 4 richer dynamics, since parameters need not be restricted to be positive to ensure positivity of volatility. 5 In addition, the model also allows for heavy tails and skewness in the conditional return (i.e. scaled 6 return), and for leverage and a time-varying long-term component in the volatility specification. More 7 generally, the model can be viewed as a model of the scale of the error in a dynamic regression. 8


Introduction
It is well known that financial returns are characterised by volatility clustering: Large returns in absolute value are likely to be followed by other large returns in absolute value, and small returns in absolute value are likely to be followed by other small returns in absolute value.This characteristic is usually modelled by specifications in the Autoregressive Conditional Heteroscedasticity (ARCH) class of models by Engle (1982).If y t denotes the financial return at t such that then the scale or volatility σ t > 0 is said to follow an ARCH process.Arguably, the most popular ARCH specification is the first order Generalised ARCH (GARCH) of Bollerslev (1986), where σ 2 t is modelled in an ARMA(1,1)-like manner, Francq and Zakoïan (2010) for a recent survey of GARCH models.If the financial return in question is predictable, then y t can be interpreted as de-meaned return, i.e. the unpredictable part of return.However, more generally, y t can be viewed as the error-term in a dynamic regression.Three characteristics that are often exhibited by financial returns are leverage (i.e.volatility asymmetry), conditional fat-tailedness and conditional skewness.The former means volatility tends to be higher after negative returnsthis is typically attributed to leverage (hence the name), whereas conditional fat-tailedness means the standardised conditional return (i.e.ε t ) is more fat-tailed than the Gaussian.Conditional skewness means the standardised return is not symmetric.For stock returns, the skewness is typically negative, which means the probability of a large negative return is greater than a large positive return, even after controlling or adjusting for the recent level of volatility.
Several R packages provide facilities for the estimation and forecasting of univariate GARCH models that contains one or more of these features.Arguably, the three most important packages are tseries by Trapletti and Hornik (2013), fGarch by Wuertz et al. (2013) and rugarch by Ghalanos (2013).The tseries package probably has the fastest GARCH optimiser, but does not offer state-of-the-art specifications with leverage and fat-tailed skewed densities.This, by contrast, is provided by the fGarch and rugarch packages.The former has been considered by many -including myself -as the benchmark package in R for quite a while, since it provides a wide range of GARCH models coupled with a variety of densities.However, unfortunately, fGarch does not offer the possibility to estimate Exponential GARCH (EGARCH) models, i.e. models where the dynamics is specified in terms of ln σ 2 t rather than in terms of σ 2 t .EGARCH models are attractive, since they enable richer and more flexible dynamics (parameters can be negative), and since the autocorrelation function of returns depends on the conditional density (this is not the case for non-exponential ARCH models).The rugarch package, which despite its relative recent origin (available on CRAN since September 2011) already offers an impressive range of GARCH specifications and variations of these, fills this gap to some extent by providing Nelson's (1991) EGARCH model.Another package that partially fills this gap is AutoSEARCH by Sucarrat (2012), which offers estimation and automated General-to-Specific model selection of log-ARCH-X models.However, to the best of my knowledge, there are no other R packages that provide facilities for additional EGARCH models.The betategarch package thus contributes to the R world by offering utilities for the simulation, estimation and forecasting of Beta-t-EGARCH models.
The Beta-t-EGARCH model was first proposed by Harvey (2013) and Harvey and Chakravarty (2008), but it can also be viewed as an unrestricted version of the Generalised Autoregressive Score The R Journal Vol.5/2, December ISSN 2073-4859 (GAS) model of Creal et al. (2013).The code upon which betategarch is based was originally developed for Harvey and Sucarrat (2013), which extends the Beta-t-EGARCH model to the skewed case.The Beta-Skew-t-EGARCH model has a number of attractions.First, the model is robust to jumps or outliers, and fares very well empirically for a variety of financial returns when compared with other GARCH models, see Harvey and Sucarrat (2013).Second, the model accommodates the most important characteristics of time-varying financial volatility: Leverage, conditional fat-tailedness, conditional skewness and a decomposition of volatility into a short-term and a long-term component.Third, the unconditional moments of return (i.e.y t ) exist (if the conditional moments exist), which is important in long-term forecasting and for the computation of the autocorrelation function of returns.By contrast, this is generally not the case for Nelson's (1991) EGARCH when coupled with a t density: A necessary condition for the unconditional moments of the first-order specification to exist when ε t is t is that the ARCH parameter is negative, see condition (A1.6) and the subsequent discussion in Nelson (1991, p. 365).Moreover, in the presence of leverage the ARCH parameter must be even more negative for the unconditional moments to exist.This is why Nelson (1991) proposed his model with a Generalised Error Distribution (GED) instead of a t.Fourth, the asymptotic properties are much easier to derive than those of Nelson's (1991) EGARCH, see Straumann and Mikosch (2006) and Wintenberger (2012) for a detailed description of the difficulties.Finally, since the conditional score drives the dynamics of the model, the Beta-t-EGARCH acquires some attractive theoretical properties.
In particular, a simple transformation of the score is Beta-distributed (hence the name).
The two main functions of the betategarch package (version 3.1) are tegarchSim and tegarch.The first simulates from a Beta-Skew-t-EGARCH, whereas the latter estimates one.The tegarch function returns an object (a list) of the tegarch class, and a collection of S3 methods developed for this class can be applied to such objects: coef, fitted, logLik, predict, print, residuals, summary and vcov.The rest of the functions in the package are either auxiliary functions called by the two main functions, or a dataset, nasdaq, which is included for illustration purposes (see empirical example below).Finally, it is worth noting that the objects returned by tegarchSim function and the fitted and predict methods are of the class zoo defined in package zoo, see Zeileis and Grothendieck (2005) and Zeileis et al. (2013).This means a large range of useful time-series methods and facilities can be applied to these objects, including plotting and printing methods.

The one-component Beta-Skew-t-EGARCH
The martingale difference version of the first order one-component Beta-Skew-t-EGARCH model (see Sections 4 and 6 in Harvey and Sucarrat, 2013) is given by The σ t is the conditional scale or volatility, which need not equal the conditional standard deviation.
In other words, ε t is not standardised to have variance one.The conditional standard deviation is obtained as σ t σ ε , where σ 2 ε is the variance of ε t .The conditional error ε t is distributed as a Skewed t with zero mean, scale σ 2 ε , degrees of freedom parameter ν and skewness parameter γ.The conditional error is defined as ε t = ε * t − µ ε * , where ε * t is an uncentred (i.e.mean not necessarily equal to zero) Skewed t variable with ν degrees of freedom, skewness parameter γ and mean µ ε * .A centred and symmetric (i.e.ordinary) t-distributed variable with mean zero is obtained when γ = 1, in which µ ε * = 0, whereas a left-skewed (right-skewed) t-variable is obtained when γ < 1 (γ > 1).More details on the distribution are given below.The ω is the log-scale intercept and can be interpreted as the long-term log-volatility, φ 1 is the persistence parameter (the bigger, the more clustering), κ 1 is the ARCH parameter (the bigger in absolute value, the greater the response to shocks), u t is the conditional score (i.e. the derivative of the log-likelihood of y t at t with respect to λ t ) and κ * is the leverage parameter.A sufficient condition for stability in λ t is |φ 1 | < 1.
Let * denote an ordinary (i.e.centred and symmetric) t-distributed variable (with unit scale), and let f ( * ) denote its density.By means of the skewing method of Fernández and Steel (1998), the density of an uncentred Skewed t variable can be written as Computation of the density values and of the mean of an uncentred Skewed t variable, and random number generation, can be undertaken with dST, STmean and rST, respectively.For example, the following code compares the empirical average of ten thousand random numbers with the analytical mean: The R Journal Vol.5/2, December ISSN 2073-4859 In addition, the functions STvar, STskewness and STkurtosis return the analytical values of the variance, skewness (the standardised 3rd moment) and kurtosis (the standardised 4th moment), respectively, of an uncentered Skewed t variable.
Leverage, however, cannot be turned off in the two-component model for identifiability reasons.

Forecasting volatility
Forecasts of volatility -i.e.either the conditional standard deviation or the conditional scale -can be generated with the predict method applied to a tegarch object.The formula for n-step ahead forecasts of conditional scale σ t is for the one-component model, where g t is an IID term equal to κ 1 u t + κ * sgn(−ε)(u t + 1).The t subscript in the conditional expectation operator E t (•) means the set of conditioning information contains all values up to and including period t.Accordingly, for i = 1 the value of E t [exp(φ n−i 1 g t+i−1 )] is exp(φ n−1 1 g t ).For i > 1, however, the expectations are not available in explicit form, so they are estimated by computing the sample mean of a large number of simulated IID variates.The default number of IID variates is 10 000, but this can be changed via the n.sim option.Another default option is that the predict method only returns forecasts of the conditional standard deviation E t (σ t+n )σ ε . (11) However, if the verbose option is changed to TRUE, then forecasts of the conditional scale are also returned.Similarly, the initial values used are by default those of the last observation in the estimation sample.This can be altered via the initial.valuesoption, e.g. for counterfactual or scenario analysis purposes.
The scale formula for the two-component specification is where g 1,t = κ 1 u t and g 2,t = κ 2 u t + κ * sgn(−ε)(u t + 1).Again, forecasts of conditional standard deviations are given by E t (σ t+n )σ ε , and the expectations in the last term on the right are estimated by simulation for i > 1.
As an example, the following code generates forecasts up to 7-period ahead for both scale and standard deviation for the one-component model estimated above: set.seed(123 The returned object is of class zoo, so a convenient time-series plotting method is readily available.Similarly, the command predict(twocompmod,n.ahead=7,verbose=TRUE) generates a corresponding set of forecasts for the two-component model estimated above.

Computational challenges
Estimation of the Beta-Skew-t-EGARCH model is by exact Maximum Log-likelihood (ML).The expressions of the first and second derivatives are not available in explicit form, so the procedure is all numerical.This leads to four computational challenges.The first is simply that the model is highly nonlinear, which is readily apparent by simply looking at the expression for the score (equation ( 5)).Moreover, the degree of non-linearity is compounded in the two-component specification.However, as no positivity constraints on the ARCH, GARCH and leverage parameters (i.e.ω, φ 1 , φ 2 , κ 1 , κ 2 , κ) are needed, the numerical challenges are in fact not as large as those of, say, the two-component GARCH model of Engle and Lee (1999).There, positivity constraints on all parameters are necessary.As in all highly nonlinear estimation problems a set of good initial values is essential.By default, these are 0.02, 0.95, 0.05, 0.01, 10, 0.98 for one-component specifications, and 0.02, 0.95, 0.9, 0.001, 0.01, 0.005, 10, 0.98 for two-component specifications.However, if the user wishes to do so they can be changed via the initial.valuesoption in the tegarch function.The summary method with option verbose=TRUE returns (amongst other) the initial values used in estimation.
The second computational challenge is that dynamic stability or stationarity -in practice that |φ 1 | < 1 (and |φ 2 | < 1 in the two-component case) -is required for estimation, whereas empirically φ 1 (and φ 2 ) is often close to, but just below, 1.In order to avoid explosive recursions during estimation it is therefore desirable to restrict φ 1 (and φ 2 ) such that |φ 1 | < 1 (and |φ 2 | < 1).My experience suggests the nlminb function is an excellent choice for this type of problems.The optim function with the L-BFGS-U option provides a similar bounding facility, but in practice it does not work as well as nlminb (it is less precise and fails more often in my experience).As for bounds, the skewing parameter γ is only defined on strictly positive values, and on theoretical grounds the degrees of freedom parameter ν must be greater than 2. For these reasons the default lower bounds on φ 1 , φ 2 , ν and γ are -1+.Machine$double.eps,-1+.Machine$double.eps,2+.Machine$double.epsand .Machine$double.eps, and their default upper bounds are 1-.Machine$double.eps,1-.Machine$double.eps,+Inf and +Inf (i.e.ν and γ are unbounded from above).The other parameters are unbounded (i.e.their default upper and lower bounds are +Inf and -Inf, respectively).If the user wishes to do so the bounds can be changed via the lower and upper options in the tegarch function.Additional control options can also be passed on to the nlminb optimiser (see nlminb documentation) by introducing them as arguments in the tegarch function.
A third computational challenge is due to the presence of the sign function sgn in the skewed density (4), which means the log-likelihood is not differentiable in γ at the skewness change point.The log-likelihood is continuous, so the problem is likely to be numerical only and not theoretical.In fact, consistency and asymptotic normality results often hold regardless (see e.g.Zhu and Galbraith, 2010).Most of the time the user will not encounter problems due to this characteristic, neither in simulation nor in empirical practice.Occasionally, however, the numerically estimated gradients (analytical ones are not available in explicit form) may explode when they iterate towards the proximity of the skewness change point.The nlminb function together with its bounding facility resolves this problem to a large extent.For extra protection against NA and Inf values the log-likelihood functions (tegarchLogl and tegarchLogl2) check for NA and Inf values at each iteration: Whenever such values are produced, then a small value on the log-likelihood is returned.By default this small value is the log-likelihood associated with the initial values, but this can be changed via the logl.penaltyoption in the tegarch function.
The fourth computational challenge is easily resolved.The expressions for the density function of the t-distribution and the moments of a t-distributed variable contain the gamma function in both the numerator and the denominator.When the argument of the gamma function is 172 or larger (i.e.344 degrees of freedom or higher), then a value of Inf/Inf = NaN is returned, which can be detrimental to estimation.This issue is not particular to R but a consequence of how the Gamma function is implemented numerically in computer languages in general.Luckily, the issue can be easily resolved by noting that beta(x,y) = gamma(x) * gamma(y)/gamma(x+y).The t-density and the moments of t-variables are thus readily re-written in terms of the beta function.The fGarch package has been using this parametrisation of the t-distribution for some time (always?), and I suspect it is for the same reason.
The R Journal Vol.5/2, December ISSN 2073-4859 Empirical example: Daily volatility of the Nasdaq 100 stock market index This section illustrates the use of the package in an empirical example (it is available as a demo by typing demo(betategarch)).The financial return in question is the daily log-return (in percent) of the Nasdaq 100 composite index (adjusted closing values).The study by Harvey and Sucarrat (2013) suggests stock market indices are particularly prone to be characterised by leverage, conditional fat-tailedness, skewness and a long-term component, so a stock market index is therefore specially suited to illustrate the usefulness of the Beta-Skew-t-EGARCH model.Also, the Nasdaq index was not included in Harvey and Sucarrat (2013) study.The source of the data is http://finance.yahoo.com/and the period in question is 3 January 2001 to 15 October 2013, i.e. a total of 3215 observations.
The following code loads the data, defines y to equal daily return in terms of a 'zoo' object and plots y: data(nasdaq) y <-zoo(nasdaq[,"nasdaqret"], order.by=as.Date(nasdaq[,"day"], "%Y-%m-%d")) plot(y, main="The Nasdaq 1 index (daily)", xlab="", ylab="Log-return in %") The plot appears as the upper graph in Figure 1 and shows that the return series is clearly characterised by time-varying volatility.
To estimate a one-component Beta-Skew-t-EGARCH, to extract its fitted values and to plot the fitted conditional standard deviations, the following code can be used: nasdaq1comp <-tegarch(y) nasdaq1stdev <-fitted(nasdaq1comp) plot(nasdaq1stdev, main="", ylab="1-comp: St.dev.",xlab="") The degrees of freedom in the Skewed t is estimated to be 10.3, a reasonably fat-tailed conditional t density, whereas the skewness is estimated to be about 0.86, which corresponds to pronounced negative skewness in ε t .Also, the test statistic (0.8567 − 1)/0.0193= −7.4249 is significant at all conventional significance levels.For a model without skewness, the command tegarch(y,skew=FALSE) can be used.For a closer look at the standardised residuals εt / σε in the estimated model above, the residuals method can be used for extraction.For the unstandardised residuals εt , the standardised argument must be set to FALSE.BIC is the Schwarz (1978) information criterion in terms of the average log-likelihood.By default, the fitted method returns the fitted conditional standard deviation.However, more output is available by using the verbose option, i.e. by typing fitted(nasdaq1comp, verbose=TRUE).This returns a matrix with, amongst other, the fitted scale and log-scale, the estimated score and the residuals.The returned matrix is a 'zoo' object that is automatically matched with the dates -if any -of returns y t .The middle graph in Figure 1 contains the plot of the fitted conditional standard deviations of the one-component model.
To estimate a two-component Beta-Skew-t-EGARCH model, to extract its fitted values and to plot its fitted conditional standard deviation, the following code can be used: nasdaq2comp <-tegarch(y, components=2) nasdaq2stdev <-fitted(nasdaq2comp) plot(nasdaq2stdev, main="", ylab="2-comp: St.dev.",xlab="") The plot of the fitted conditional standard deviations appears as the lower graph in Figure 1 3.487262 Comparison of the BIC values of the two models suggests the latter provides a better fit.This provides further evidence in favour of two-component models in modelling the volatility of financial stockmarket returns.
To generate out-of-sample forecasts up to 5-days ahead, the following code can be used: set.seed( 123) predict(nasdaq1comp, n.ahead=5) 1 2 3 4 5 .81214 1 .817946 .821867 .8256776.8295533 In other words, the predicted conditional standard deviation 5 trading days after the 15th of October 2013 is about 0.8296.By default, the fitted values of λ t and λ † t for the last day of the estimation sample are used as initial values.However, alternative initial values on λ t and λ † t can be specified by the user via the initial.valuesoption.
Conceptually the Beta-Skew-t-EGARCH model can appear complicated.So one may ask whether it performs better than conceptually simpler models like, say, an ordinary GARCH model with leverage and the two-component model of Engle and Lee (1999).An ordinary GARCH(1,1) model with leverage of the Glosten et al. (1993) type, sometimes referred to as the GJR-GARCH, coupled with a standardised Skewed t conditional density is given by y = σ t ε t , ε t ∼ st(0, 1, ν, γ), (13) where the parameters have similar interpretations to those of the Beta-Skew-t-EGARCH specification.3.482173 3.493511 3.482166 3.486237 In the table alpha1, gamma1 and beta1 correspond to κ 1 , κ * and φ 1 , respectively.The skewness and degrees of freedom estimates are relatively similar to those of the Beta-Skew-t-EGARCH above.However, the BIC value is lower, which means the Beta-Skew-t-EGARCH provides a better fit according to this criterion.
In the rugarch package the first order two-component model of Engle and Lee (1999)

Figure 1 :
Figure 1: Nasdaq log-returns in %, fitted conditional standard deviation of the one-component model and fitted conditional standard deviation of the two-component model.
By default, the tegarchSim function returns the values of y t only.However, for the full set of output one may use the verbose option.For example, the following code generates 100 observations using the default parameter values, stores the output in the matrix mY that is of class zoo and returns the first six observations: Harvey and Sucarrat, 2013)volatility accommodate the long-memory property by decomposing The R Journal Vol.5/2, December ISSN 2073-4859 volatility into one long-term component and one short-term component.The role of the latter is to pick up temporary changes following a shock.The martingale difference version of the first order two-component Beta-Skew-t-EGARCH model (see Sections 2.5 and 6 inHarvey and Sucarrat, 2013)is given by y The estimation results are stored in nasdaq1comp, so typing nasdaq1comp yields The model can be estimated with the fGarch package by means of library(fGarch) nasdaqgarch <-garchFit(data=y, cond.dist="sstd",include.mean=FALSE,include.skew=TRUE,leverage=TRUE)Next, summary(nasdaqgarch) yields (amongst other)