garchx: Flexible and Robust GARCH-X Modeling

The garchx package provides a user-friendly, fast, flexible, and robust framework for the estimation and inference of GARCH(p, q, r)-X models, where p is the ARCH order, q is the GARCH order, r is the asymmetry or leverage order, and ’X’ indicates that covariates can be included. Quasi Maximum Likelihood (QML) methods ensure estimates are consistent and standard errors valid, even when the standardized innovations are non-normal or dependent, or both. Zero-coefficient restrictions by omission enable parsimonious specifications, and functions to facilitate the non-standard inference associated with zero-restrictions in the null-hypothesis are provided. Finally, in the formal comparisons of precision and speed, the garchx package performs well relative to other prominent GARCH-packages on CRAN.


Introduction
In the Autoregressive Conditional Heteroscedasticity (ARCH) class of models proposed by Engle (1982), the variable of interest t is decomposed multiplicatively as where σ t > 0 is the standard deviation of t , and η t is a real-valued standardised innovation with mean zero and unit variance (e.g. the standard normal). Originally, Engle (1982) interpreted t as the error-term of a dynamic regression of inflation, so that σ t is the uncertainty of the inflation forecast. However, ARCH-models have also proved to be useful in many other areas. The field in which they have become most popular is finance. There, t is commonly interpreted as a financial return, either raw or mean-corrected (i.e. t has mean zero), so that σ t is a measure of the variability or volatility of return. In Engle and Russell (1998) it was noted that the ARCH framework can also be used to model non-negative variables, say, the trading volume of financial assets, the duration between financial trades, and so on. Specifically, suppose y t denotes a non-negative variable, say, volume, and µ t is the conditional expectation of y t . Engle and Russell (1998) noted that, in the expression 2 t = σ 2 t η 2 t implied by the ARCH model, if you replace 2 t with y t and σ 2 t with µ t , then it follows straightforwardly that µ t is the conditional expectation of y t . This is justified theoretically, since the underlying estimation theory does not require that t has mean zero. The observation made by Engle and Russell (1998) spurred a new class of models, which is known as the Multiplicative Error Model (MEM), see Brownlees et al. (2012) for a survey. The practical implication of all this is that ARCH-software can in fact be used to estimate MEMs by simply feeding the package in question with √ y t rather than t . The fitted values of σ 2 t become the fitted values of µ t , the error term is defined by y t /µ t , and the inference theory and other statistical results usually carry over straightforwardly. In conclusion, the ARCH-class of models provides a flexible framework that can be used in a very wide range of empirical applications.

Prominent GARCH-packages on CRAN
Although a large number of specifications of σ t have been proposed, the most common in empirical applications are variants of the Generalised ARCH (GARCH) proposed by Bollerslev (1986). In particular, the plain GARCH(1,1) is ubiquitous: By analogy with an ARMA(1,1), the conditional variance σ 2 t is modelled as a function of the recent past, where the 2 t−1 is referred to as the ARCH term, and σ 2 t−1 is referred to as the GARCH term. The non-negativity of 2 t , together with the constraints on the parameters ω, α 1 and β 1 , ensure σ 2 t is strictly positive. Another way of ensuring that σ 2 t is strictly positive is by modelling its logarithm, ln σ 2 t , as for example in the log-ARCH class of models proposed by Geweke (1986), Pantula (1986) and Milhøj (1987). Here, however, the focus is exclusively on non-logarithmic specifications of σ t . Also, multivariate GARCH-specifications are not covered.
The most prominent packages on CRAN that are commonly used to estimate variants of (2) are tseries (Trapletti and Hornik, 2019), fGarch (Wuertz et al., 2020) and rugarch (Ghalanos, 2020). In tseries, the function garch() enables estimation of the GARCH(p, q) specification Notable features of garch() include simplicity and speed. With respect to simplicity, it is appealing that a plain GARCH(1,1) can be estimated by the straightforward and simple command garch(eps), where eps is the vector or series in question, i.e. t in (1). As for speed, it is the fastest among the packages compared here, and outside the R universe it is also likely to be one of the fastest. Indeed, a formal speed comparison (see Section 4) reveals the relative speed provided by garch can be important -in particular if the number of observations is large, or if many models has to be estimated (as in simulations). A notable limitation of (3) is that it does not allow for asymmetry terms, e.g. I { t−1 <0} 2 t−1 , also known as 'leverage', or covariates. Asymmetry effects are particularly common in daily stock returns, where its presence implies that volatility in day t is higher if return on the previous day, t−1 , is negative. Often, such asymmetry effects are attributed to leverage.
In fGarch asymmetry effects are possible. Specifically, the function garchFit() enables estimation of the Asymmetric Power GARCH (APARCH) specification where δ > 0 is the power parameter, and the γ k 's are the asymmetry parameters. The power parameter δ is rarely different from 2 in empirical applications, but it does provide the added flexibility of modelling, say, the conditional standard deviation (δ = 1) directly, if the user wishes to do so. Another feature of garchFit() is that other densities than the normal can be used in the ML estimation, for example the skewed normal or the skewed Student's t. In theory, this provides more efficient estimates asymptotically if η t is skewed or more heavy-tailed than the normal. In finite samples, however, the actual efficiency may be more dependent on how estimation is carried out numerically. Also, additional density parameters may increase the possibility of numerical problems. To alleviate this potential problem, the package offers a non-normality robust coefficient-covariance along the lines of Bollerslev and Wooldridge (1992) in combination with normal ML. The coefficient-covariance of Bollerslev and Wooldridge (1992) does not, however, provide robustness to dependence of the η t 's. Finally, fGarch also offers the possibility of specifying the mean equation as an ARMA model. That is, t = y t − µ t , where µ t is the ARMA-specification. Theoretically, joint estimation of µ t and σ t may improve the asymptotic efficiency compared with, say, a two-step estimation approach, where µ t is estimated in the first step, and σ t is estimated in the second using the residuals from the first step. In practice, however, joint estimation may in fact reduce the actual efficiency. The reasons for this are the increase in the number of parameters to be estimated, and the increased possibility of numerical problems due to the increase in the number of parameters to be estimated.
A limitation of fGarch is that it does not allow for additional covariates ('X') in (4). This can be a serious limitation, since additional conditioning variables like high − low, realised volatility, interest rates and so on may help to predict or explain volatility in substantial ways. The rugarch package remedies this. Most of the non-exponential specifications offered by rugarch are contained in where the x l,t−1 's are the covariates. However, it should be mentioned that the package also enables the estimation of additional models, e.g. the Component GARCH model and the Fractionally Integrated GARCH model, amongst other. These additional models are not the focus here. Note that the covariates in (5) need not enter as lagged of order 1. That is, x l,t−1 may denote a variable that is lagged of order 2, say, w t−2 , and so on. A variable may also enter as unlagged, w t . However, it is not clear what the teoretical requirements are for consistent estimation in this case due to simultaneity issues. Just as in fGarch, the rugarch package also enables a non-normality robust coefficient-covariance, ML estimation with non-normal densities, and the joint estimation of an ARMA-specification in the mean together with σ t . To the best of my knowledge, no other CRAN-package offers more univariate GARCH-specifications than rugarch.
What does garchx offer that is not already available?
The garchx package 1 aims at providing a simple, fast, flexible and robust -both theoretically and numerically -framework for GARCH-X modelling. The specifications that can be estimated are all contained within While this implies a restriction of δ = 2 compared with the rugarch package, garchx enables several additional features that are not available in the above-mentioned packages: i) Robustness to dependence. Normal ML estimation is usually consistent when the η t 's are dependent over time, see e.g. Escanciano (2009), and Francq and Thieu (2018). This is useful, for example, when the conditional skewness, conditional kurtosis or conditional zero-probability of η t is time-varying and dependent on the past in unknown ways. In these cases, however, the non-normality robust coefficient-covariance of Bollerslev and Wooldridge (1992) is not valid. Optionally, garchx offers the possibility of using the dependence (and non-normality) robust coefficient-covariance derived by Francq and Thieu (2018).
ii) Inference under nullity. In applied work it is frequently of interest to test whether a coefficient differs from zero. The permissible parameter-space of GARCH-models, however, is bounded from below by zero. Accordingly, non-standard inference is required when the value of a null-hypothesis lies on the zero-boundary, see Francq and Thieu (2018). The garchx package offers functions to facilitate such tests, named ttest0() and waldtest0(), respectively, based on the results by Francq and Thieu (2018).
iii) Zero-constrained coefficients by omission. If one or more coefficients are indeed zero, then it may be desirable to obtain estimates under zero-constraints on these coefficients. For example, if t is the error-term in a regression of quarterly inflation, then it may be desirable to estimate a GARCH(4,4) model in which the parameters associated with orders 1, 2 and 3 are restricted to zero. That is, it is desirable to estimate Another example is the non-exponential Realised GARCH of Hansen et al. (2012), which is simply a GARCH(0,1)-X. That is, the ARCH(1) coefficient is set to zero. Zero-constrained coefficients do not only provide a more parsimonious characterisation of the process in question. They may also make estimation more efficient and stable numerically, since fewer parameters need to be estimated. rugarch offers a feature in which coefficients can be fixed to zero. However, its approach is not by omission. In other words, using coef in rugarch to extract the coefficients in the GARCH(4,4) example above will return a vector of length 9 rather than of length 3, while the coefficient-covariance returned by rugarch will be 3 × 3. This makes multiple hypothesis testing with Wald-tests tedious in constrained models. In garchx, by contrasts, Wald-tests in constrained models are straightforward, since the zeros are due to omission: The vector returned by coef is of length 3 and the coefficient-covariance is 3 × 3. iv) Computation of the asymptotic coefficient-covariance. Knowing the value of the theoretical, asymptotic coefficient-covariance matrix is needed for a formal evaluation of an estimator. For GARCH-models, these expressions are not available in explicit form. The garchx offers a function, garchxAvar(), that compute them by combining simulation and numerical differentiation. To illustrate the usage of garchxAvar(), a small Monte Carlo study is undertaken. While the results of the study suggest all four packages return approximately unbiased estimates in large samples, they also suggest tseries and rugarch are less robust numerically than fGarch and garchx under default options. In addition, the simulations reveal the standard errors of tseries can be substantially biased downwards when η t is non-normal. A bias is expected, since tseries does not offer a non-normality robust coefficient-covariance. However, the bias is larger than suggested by the underlying estimation theory. Table 1 provides a summary of the features offered by the four packages.
The rest of the article is organised as follows. 2 The next section provides an overview of the garchx package and its usage. Thereafter, the garchxAvar() function is illustrated by means of a Monte Carlo study of the large sample properties of the packages. Next, a speed comparison of the packages is undertaken. While tseries is the fastest for the specifications it can estimate, garchx is notably faster than fGarch and rugarch in all the experiments that are conducted. Finally, the last section concludes.

The garchx package Estimation theory
Let F t−1 denote the sigma-field generated by past observables. Formally, in the garchx package, t is expected to satisfy 2 t = σ 2 t η 2 t , (6) and The conditional unit variance assumption in (7) is very mild, since it does not require that the distribution of η t is identical over time, nor that η t is independent of the past. In particular, the assumption is compatible with a time-varying conditional skewness E(η 3 t |F t−1 ) that depends on the past in unknown ways, a time-varying conditional kurtosis E(η 4 t |F t−1 ) that depends on the past in unknown ways, and even a time-varying conditional zero-probability Pr(η t = 0|F t−1 ) that depends on the past in unknown ways. Empirically, such forms of dependence are common, see e.g. Hansen (1994), and Sucarrat and Grønneberg (2020). GARCH models in which the η t 's are dependent are often referred to as semi-strong after Drost and Nijman (1993).
Subject to suitable regularity conditions, the normal ML estimator provides consistent and asymptotically normal estimates of semi-strong GARCH-models, see Francq and Thieu (2018). Specifically, they show that where is the (normal) Quasi ML (QML) estimate of the true parameter ϑ 0 . If the η t 's are independent of the past, then This is essentially the univariate version of the non-normality robust coefficient-covariance of Bollerslev and Wooldridge (1992). It is easily estimated, since the standardised residuals can be used to obtain an estimate of E(η 4 t ), and a numerical estimate of the Hessian J is returned by the optimiser. In the garchx package, the estimate of (10) is referred to as the "ordinary" coefficient-covariance. Of course, the expression returned by the software is the estimate of the finite sample counterpart Σ/T, where T is the sample size. In other words, the standard errors are equal to the square root of the diagonal of the estimate Σ/T. If, instead, the η t 's are not independent of the past, then In the garchx package, the estimate of this expression is referred to as the "robust" coefficientcovariance. Again, the expression returned by the software is the estimate of the finite sample counterpart Σ/T. It should be noted that the estimation of (11) is computationally much more demanding than (10), since an estimate of ∂σ 2 t (ϑ 0 )/∂ϑ in I is computed at each t. More details about how this is implemented is contained in the Appendix.

Basic usage of garchx
For illustration the spyreal dataset in the rugarch package is used, which contains two daily financial time series: The SPDR SP500 index open-to-close return, and the realized kernel volatility. The data are from Hansen et al. (2012), and goes from 2002-01-02 to 2008-08-29. The following code loads the data, and stores the daily return -in percent -in an object named eps: library(xts) data(spyreal, package = "rugarch") eps <-spyreal[,"SPY_OC"]*100 Note that the data spyreal is an object of class xts (Ryan and Ulrich, 2014). Accordingly, the object eps is also of class xts. The series returned by fitted, quantile and residuals are of class zoo (Zeileis and Grothendieck, 2005).
To control the ARCH, GARCH and asymmetry orders, the argument order -which takes a vector of length 1, 2 or 3 -can be used in a similar way to as in the garch function of tseries: • order[1] controls the GARCH order • order[2] controls the ARCH order • order[3] controls the asymmetry order For example, the following code estimate, respectively, a GARCH(1,1) with asymmetry and a GARCH(2,1) without asymmetry: garchx(eps, order = c(1,1,1)) #garch(1,1) w/asymmetry garchx(eps, order = c(1,2)) #garch(2,1) To illustrate how covariates can be included via the xreg argument, the lagged realised volatility from the spyreal dataset can be used: x <-spyreal[,"SPY_RK"]*100 xlagged <-lag(x) #this lags, since x is an xts object xlagged[1] <-0 #replace NA-value with 0 The code garchx(eps, xreg = xlagged) estimates a GARCH(1,1) with the lagged realised volatility as covariate, i.e.  -likelihood: -1970.247 The estimates suggest the ARCH parameter α 1 is 0. In a t-test with α 1 = 0 as null hypothesis the parameter lies on the boundary of the permissible parameter space under the null. Accordingly, inference is non-standard, and below I illustrate how this can be carried out with the ttest0() function. Note that, if α 1 is indeed 0, then the specification reduces to the non-exponential Realised GARCH of Hansen et al. (2012). Below I illustrate how it can be estimated by simply omitting the ARCH term, i.e. by imposing a zero-coefficient restriction via omission.

Inference under nullity
If the value of a parameter is zero under the null hypothesis, then it lies on the boundary of the permissible parameter-space. In these cases non-standard inference is required, see Francq and Thieu (2018). The garchx package offers two functions to facilitate such nonstandard tests, ttest0() and waldtest0().
Recall that ϑ 0 denotes the d-dimensional vector of true parameters. In a plain GARCH(1,1), for example, d = 3. Next, let e k denote a d × 1 vector whose elements are all 0 except element k, which is 1. The function ttest0() undertakes the following t-test of parameter k ≥ 2: H 0 : e k ϑ 0 = 0 and e l ϑ 0 > 0 ∀l = k against H A : e k ϑ 0 > 0.
Note that, in this test, all parameters -except parameter k -are assumed to be greater than 0 under the null. While the test-statistic is the usual one, the p-value is obtained by only considering the positive part of the normal distribution. To illustrate the usage of ttest0, let us re-visit the GARCH(1,1)-X model in (12): mymod <-garchx(eps, xreg = xlagged) In this model, the non-exponential Realised GARCH of Hansen et al. (2012) is obtained when the ARCH(1)-parameter α 1 is 0. This is straightforwardly tested with ttest0(mymod,k = 2), which yields coef std.error t-stat p-value arch1 0 0.03427413 0 0.5 In other words, at the most common significance levels the result supports the claim that α 1 = 0. Finally, note that if the user does not specify k, then the code ttest0() returns a t-test of all the coefficients except the intercept ω.
The function waldtest0() can be used to test whether one or more coefficients are zero. Let r denote the restriction vector of dimension r 0 × 1, and let R denote the combination matrix of dimension r 0 × d. Assuming that R has full row-rank, the null and alternative hypotheses in the Wald-test are given by The associated Wald test-statistic has the usual form, but the distribution is non-standard (Francq and Thieu, 2018): Critical values are obtained by parametric Bootstrap. First the sequence ||R Z i || 2 , i = 1, . . . , n is simulated, where the Z i 's are independent and identically distributed N(0, Σ) vectors. In waldtest0(), the default is n = 20000. Next, the critical value associated with significance level α ∈ (0, 1) is obtained by computing the empirical (1-α)-quantile of the simulated values. To illustrate the usage of waldtest0(), let us re-consider the GARCH(1,1)-X model in (12). Specifically, let us test whether both the ARCH and GARCH coefficients are zero: H 0 : α 1 = 0 and β 1 = 0. This means r <-cbind(c(0,0)) R <-rbind(c(0,1,0,0),c(0,0,1,0)) Next, the command waldtest0(mymod,r = r,R = R) performs the test, and returns a list with the statistic and critical values: In other words, the Wald-statistic is 72.96, and the critical values associated with the 10%, 5% and 1% levels, respectively, are 41.80, 57.97 and 97.15. So H 0 is rejected at the 10% and 5% levels, but not at 1% level. If the user wishes to do so, the significance levels can be changed via the level argument.

Zero-coefficient restrictions via omission
The ARCH, GARCH and asymmetry orders can be specified in two ways. Either via the order argument as illustrated above, or via the arch, garch and asym arguments whose defaults are all NULL. If any of their values is not NULL, then it takes precedence over the corresponding component in order. For example, the code garchx(eps, order = c(0,0), arch = 1, garch = 1) estimates a GARCH(1,1), since the values of arch and garch override those of order[2] and order[1], respectively. Similarly, garchx(eps,asym = 1) estimates a GARCH(1,1) with asymmetry, and garchx(eps,garch = 0) estimates a GARCH(1,0) model.
To estimate higher order models with the arch, garch and asym arguments, the lags must be provided explicitly. For example, to estimate the GARCH(2,2) model σ 2 The returned print shows that the ARCH(1) term has not been included during the estimation.
Finally, a caveat is in order. The flexibility provided by the arch, garch and asym arguments are not always warranted by the underlying estimation theory. For example, if the ARCH-parameter α 1 in a plain GARCH(1,1) model is restricted to zero, then the normal ML estimator is invalid. The garchx function nevertheless tries to estimate it if the user provides the code garchx(eps,arch = 0). Currently, the function garchx() does not undertake any checks of whether the zero-coefficient restrictions are theoretically valid.

Numerical optimisation
The two optimisation algorithms in base R that work best for GARCH estimation are, in my experience, the "Nelder-Mead" method in the optim() function and the nlminb() function. The latter enables bounded optimisation, so it is the preferred algorithm here, since the parameters of the GARCH-model must be non-negative. The "L-BFGS-B" method in optim() also enables bounded optimisation, but it does not work as well in my experience. When using the garchx() function, the call to nlminb() can be controlled and tuned via the arguments initial.values, lower, upper and control. In nlminb, the first argument is named start, whereas the other three are equal.
Suitable initial parameter values are important for numerical robustness. In the garchx() function, the user can set these via the initial.values argument. If not, then they are automatically determined internally. In the case of a GARCH(1,1), the default initial values are ω = 0.1, α 1 = 0.1 and β 1 = 0.7. For numerical robustness, it is important that they are not too close to the lower boundary of 0, and that β 1 is not too close to instability, i.e. β 1 ≥ 1. The choice c(0.1,0.1,0.7) works well across a range of problems. Indeed, the Monte Carlo simulations of the large sample properties of the packages (see Section 3) reveals that the numerical robustness of tseries improves when these initial values are used instead of the default initial values. In the list returned by garchx(), the item named initial.values contains the values used. For example, the following code extracts the initial values used in the estimation of a GARCH(1,1) with asymmetry: mymod <-garchx(eps, asym = 1) mymod$initial.values In each iteration nlminb() calls the function garchxObjective() to evaluate the objective function. For additional numerical robustness, checks of the parameters and fitted conditional variance are conducted within garchxObjective() at each iteration. The first check is for whether any of the parameter values at the current iteration are equal to NA. The second check is for whether any of the fitted conditional variances are Inf, 0 or negative. If either of these checks fail, then garchxObjective() returns the value of the logl.penalty argument in the garchx() function, whose default value is that produced by the initial values. To avoid that the term ln σ 2 t in the objective function explodes to minus infinity, the fitted values of σ 2 t are restricted to be equal or greater than the value provided by the sigma2.min argument in the garchx() function.
A drawback with nlminb() is that it does not return an estimate of the Hessian at the optimum, which is needed to compute the coefficient-covariance. To obtain such an estimate the optimHess() function is used. In garchx(), the call to optimHess() can be controlled and tuned via the optim.control argument. Next, the inverse of the estimated Hessian is computed with solve, whose tolerance for detecting linear dependencies in the columns is determined by the solve.tol argument in the garchx function.

Checking the large sample properties
The function garchxAvar() returns the asymptotic coefficient-covariance of a GARCH-X model. Currently (version 1.1), only non-normality robust versions are available. The aim of this section is to illustrate how it can be used to check whether the large sample properties of the packages correspond to those of the underlying asymptotic estimation theory. Specifically, the aim is to explore whether large sample estimates from Monte Carlo simulations are unbiased, whether the empirical standard errors correspond to the asymptotic ones, and whether the estimate of the non-normality robust coefficient-covariance is unbiased.

The garchxAvar() function
To recall, the non-normality robust asymptotic coefficient-covariance is given by when the η t 's are independent of the past. In general, the expression for J is not available in closed form. Accordingly, numerical methods are needed. The garchxAvar() function combines simulation and numerical differentiation to compute Σ. In short, the function proceeds by first simulating n values of t (the default is n = 10 million), and then the Hessian of the criterion function n −1 ∑ n t=1 l t (ϑ) about the true value ϑ 0 is obtained by numerical differentiation to compute an estimate of J. Internally, the garchxAvar() function conducts the simulation with garchxSim(), and the differentiation with optimHess(). If we denote the numerically obtained Hessian asJ, then the corresponding finite sample counterpart of the asymptotic coefficient-covariance associated with a sample of size T is given by In other words, the square root of the diagonal of this expression is the asymptotic standard error associated with sample size T.

Bias and standard errors of estimates
To illustrate how garchxAvar() can be used to study the large sample properties of the packages, a Monte Carlo study is undertaken. The DGP in the study is a plain GARCH(1,1) with either η t ∼ N(0, 1) or η t ∼ standardised t(5), and the sample size is T = 10000: (5), t = 1, . . . , T = 10000, The values of (ω, α 1 , β 1 ) are similar to those that are usually found in empirical studies of financial returns. The code n <-10000000 pars <-c(0.2, 0.1, 0.8) set.seed(123) AvarNormal <-garchxAvar(pars, arch=1, garch=1, Eeta4=3, n=n) eta <-rt(n, df=5)/sqrt(5/3) Avart5 <-garchxAvar(pars, arch=1, garch=1, Eeta4=9, n=n, innovations=eta) computes and stores the asymptotic coefficient-covariances in objects named AvarNormal and Avart5, respectively.    (Wuertz et al., 2020), rugarch version 1.4-2 (Ghalanos, 2020) and garchx version 1.1 (Sucarrat, 2020). m(·), sample average of estimates. se(·), sample standard deviation of estimates. ase(·), asymptotic standard error. n(NA), the number of times estimation failed due to numerical issues. arch1 3.216076 2.483018 -3.647237 garch1 -11.313749 -3.647237 9.239820 Next, the asymptotic standard errors associated with sample size T = 10000 are obtained with sqrt( diag(AvarNormal/10000) ) sqrt( diag(Avart5/10000) ) These values are contained in the columns labelled ase(·) in Table 2. Table 2 contains the estimation results of the Monte Carlo study (1000 replications). For each package, normal ML estimation is undertaken with default options on initial parameter values, initial recursion values and numerical control. The columns labelled m(·) contain the sample average across the replications, and se(·) contains the sample standard deviation. Apart from tseries, the simulations suggest the packages produce asymptotically unbiased estimates, and empirical standard errors that correspond to the asymptotic ones. Closer examination suggests the biases and faulty empirical standard errors of tseries are due to outliers. Additional simulations, with non-default initial parameter values, produce results similar to those of the other packages. 3 This underlines the importance of suitable initial parameter values for numerical robustness. The package rugarch ran into numerical problems twice for η t ∼ t(5), and thus failed to returned estimates in these two cases. Additional simulations confirmed rugarch is less robust numerically than the other packages under its default options when η t ∼ t(5): It always failed at least once. Changing the initial parameter values to those of garchx did not resolve the problem. Also, changing the optimiser to a non-default algorithm, nlminb, which is the default algorithm in fGarch and the only option available in garchx, produced more failures and substantially biased results by rugarch. 4

Coefficient-covariance estimate
In each of the 1000 replications of the Monte Carlo study, the estimate of the asymptotic coefficientcovariance is recorded. For fGarch, rugarch and garchx the estimate is of the non-normality robust type. For tseries, which does not offer the non-normality robust option, the estimate is under the assumption of normality. Note also that, for tseries, the results reported here are with the numerically more robust non-default initial parameter values alluded to above.
Let Σ i denote the estimate produced by a package in replication i = 1, . . . , 1000 of the simulations. The relative bias in replication i is given by the ratio Σ i /Σ, a 3 × 3 matrix, which is obtained by dividing the row i column j component in Σ i by the corresponding component in Σ. The average relative bias, m( Σ/Σ), is obtained by taking the average across the 1000 replications for each of the 9 entries. When η t ∼ N(0, 1), this produces the following averages: Three general characteristics are clear. First, the ratios are all greater than 1. In other words, all packages tend to return estimated coefficient-covariances that are too large in absolute terms. In particular, standard errors tend to be too high. Second, the size of the biases are similar across packages. Those of rugarch are slightly higher than those of the other packages, but the difference may disappear if a larger number of replications is used. Third, the magnitude of the relative bias is fairly low, since they all lie between 1.26% and 8.69%.
When η t ∼ t(5), the simulations produce the following averages: The downwards relative bias of about 90% produced by tseries, simply reflects that a non-normality robust option is not available in that package. However, the size of the bias is larger than expected. If it were simply due to E(η 4 t ) in the expression for Σ being erroneous (3 instead of 9 in the simulations), then the ratios should have been in the vicinity of (3 − 1)/(9 − 1) = 0.29. Instead, they are substantially lower, since they all lie in the vicinity of 0.10. In other words, the way estimation is implemented by tseries induces a downward bias in the standard errors that can be substantially larger than expected when η t is fat-tailed. The relative bias produced by fGarch, rugarch and garchx are more moderate, since they all lie less than 18% away from the true values. While the relative bias of rugarch is slightly larger than those of fGarch and garchx, the general tendency of the three packages is that the bias is downwards.

Comparison of speed
In nominal terms all four packages are fairly fast. On an average contemporary laptop, for example, estimation of a plain GARCH(1,1) usually takes less than a second if the number of observations is 10 000 or less. The reason is that all four packages use compiled C/C++ or Fortran code in the recursion, i.e. the computationally most demanding part. While the nominal speed difference is almost unnoticeable in simple models with small T, the relative difference among the packages is significant. In other words, when T is large or when a large number of models are estimated (as in Monte Carlo simulations), then the choice of package can be important.
The comparison is undertaken with the microbenchmark (Mersmann, 2019) package version 1.4-7, and the average estimation-time of four GARCH-models are compared: 3 GARCH(1,1) w/asymmetry: The parameters of the Data Generating Processes (DGPs) are (ω, α 1 , β 1 , α 2 , β 2 , γ 1 , λ 1 ) = (0.2, 0.1, 0.8, 0.00, 0.00, 0.05, 0.3), and x t in DGP number 4 is governed by the AR(1) process The comparison is made for sample sizes T = 1000 and T = 2000. Table 3 contains the results of the comparison in relative terms. A value of 1.0 means the package is the fastest on average for the experiment in question. A value of 7.15 means the average estimation time of the package is 7.15 times larger than the average of the fastest. And so on. The entry is empty if the GARCH-specification cannot be estimated by the package. The overall pattern of the results is clear: tseries is the fastest among the models it can estimate, garchx is the second fastest, fGarch is the third fastest and rugarch is the slowest. Another salient feature is how much faster tseries is relative to the other packages. This is particluarly striking for the GARCH(2,2), where the second fastest packagegarchx -is about 5 to 6 times slower, and the slowest packagerugarch -is about 28 to 30 times slower. A third notable characteristic is that the relative differences tend to fall as the sample size T increases. For example, rugarch is about 17 times slower than tseries when T = 1000, but only about 13 times slower when T = 2000. As for the specifications that tseries are not capable of estimating, garchx is the fastest. Notably so compared with fGarch, and even more so compared with rugarch.

Conclusions
This paper provides an overview of the package garchx, and compares it with three prominent CRANpackages that offers GARCH estimation routines: tseries, fGarch and rugarch. While garchx does not offer all the GARCH-specifications available in rugarch, it is much more flexible than tseries, and it also offers the important possibility of including covariates. This feature is not available in fGarch. The package garchx also offers additional features that are not available in the other packages: i) A dependence-robust coefficient covariance, ii) functions that facilitate hypothesis testing under nullity, iii) zero-coefficient restrictions by omission and iv) a function that computes the asymptotic coefficient-covariance of a GARCH-model.
In a Monte Carlo study of the packages the large sample properties of the normal Quasi ML (QML) estimator was studied. There, it was revealed that fGarch and garchx are numerically more   (Wuertz et al., 2020), rugarch version 1.4-2 (Ghalanos, 2020) and garchx version 1.1 (Sucarrat, 2020). A value of 1.00 means the package is the fastest on average for the experiment in question. A value of 7.15 means the average estimation time of the package is 7.15 times larger than the average of the fastest. And so on. The entry is empty if the GARCH-specification cannot be estimated by the package.
robust (under default options) than tseries and rugarch. However, in the case of tseries the study also revealed how its numerical robustness can be improved straightforwardly by simply changing the initial parameter values. In the case of rugarch, it is less clear how the numerical robustness can be improved. The study also revealed that the standard errors of tseries can be substantially biased downwards when η t is non-normal. A bias is expected, since tseries does not offer a non-normality robust coefficient-covariance. However, the bias is larger than suggested by the underlying estimation theory.
In a relative speed comparions of the packages, it emerged that the least flexible packagetseriesis notably faster than the other packages. Next, garchx is the second fastest (1.85 to 6.27 times slower in the experiments), fGarch is the third fastest and rugarch is the slowest. The experiments also revealed that the difference can be larger in higher order models. For example, in the estimation of a GARCH(2,2), rugarch was about 28 times slower than tseries. In estimating a plain GARCH(1,1), by contrast, it was only 13 to 17 times slower. Another finding was that the difference seems to fall in sample size: The larger the sample size, the smaller the difference in speed.

Bibliography
The computationally challenging part to estimate in I is ∂σ 2 t (ϑ 0 )/∂ϑ, since it entails the computation of a numerically differentiated gradient of a recursion at each t. In garchx, this is implemented with numericDeriv in the vcov.garchx function.