gk: An R Package for the g-and-k and Generalised g-and-h Distributions

Abstract:

The g-and-k and (generalised) g-and-h distributions are flexible univariate distributions which can model highly skewed or heavy tailed data through only four parameters: location and scale, and two shape parameters influencing the skewness and kurtosis. These distributions have the unusual property that they are defined through their quantile function (inverse cumulative distribution function) and their density is unavailable in closed form, which makes parameter inference complicated. This paper presents the gk R package to work with these distributions. It provides the usual distribution functions and several algorithms for inference of independent identically distributed data, including the finite difference stochastic approximation method, which has not been used before for this problem.


Published: Sept. 10, 2020

Received: Jun 22, 2017

Citation: Prangle, 2020

Volume: 12, Issue: 1

Pages: 7-20


1 Introduction

Statisticians have long sought a simple extension of the normal distribution which can model data subject to skew, heavy tails or both. One approach is to transform a standard normal random variable Z ~ N(0,1) to

(1)   X = A + B G(Z) H(Z),

where A and B are location and scale parameters, G(·) introduces asymmetry, and H(·) elongates the tails of the distribution while having little effect near the mode. This paper considers two such distributions, the g-and-k and generalised g-and-h distributions. These distributions can model many types of behaviour through just a small number of parameters.

Defining random variables as transformations of Z is equivalent to specifying the distribution's quantile function (defined in the next section), and distributions of this type are known as quantile distributions. Work on quantile distributions goes back at least to Hastings et al. (1947). See Gilchrist (2000) for a book-length treatment of their history and use in statistics. Tukey (1977) proposed the form (1) and a distribution using it: the original g-and-h distribution. Haynes et al. (1997) were the first to use the two distributions considered in this paper: the g-and-k distribution and a generalised form of the g-and-h distribution. For brevity, henceforth "g-and-h distribution" will refer to their generalised form. Thorough reviews of these and other distributions based on (1) are available in the literature.

Applications of the g-and-k and g-and-h distributions have included environmental data, financial returns and insurance risk (Peters et al., 2016). There has also been considerable methodological work on inference for these distributions. This is because it is not possible to express the densities of quantile distributions in closed form beyond some special cases, which makes it difficult to apply standard likelihood-based inference methods.

This paper presents the gk R package to work with the g-and-k and g-and-h distributions. The remaining sections cover: definitions of the distributions (Section 2); the distribution functions provided by the package (Section 3); the range of valid parameter values (Section 4); the inference functions (Section 5); an illustrative analysis of exchange rate data (Section 6); and a discussion (Section 7).

2 Definitions

The cumulative distribution function (cdf) of a univariate random variable X, F_X : R → [0,1], is defined as F_X(x) = Pr(X ≤ x). (Later we will often drop subscripts where they are clear from the context.) The cdf suffices to completely specify the probability distribution of X. It is often the case that the cdf is not available in closed form but is implicitly defined through its derivative, the probability density function (pdf). An example of this is the normal distribution.

For quantile distributions, the cdf is implicitly defined through its inverse, the quantile function F_X^{-1}(u), where F_X^{-1} : [0,1] → R. The g-and-k and g-and-h distributions use a quantile function of the form F^{-1}(u; θ) = Q(z(u); θ), where z(·) is the N(0,1) quantile function and θ is a vector of parameters. The Q functions are:

(2)   Q_gk(z; A, B, g, k, c) = A + B (1 + c tanh[gz/2]) z (1 + z^2)^k

(3)   Q_gh(z; A, B, g, h, c) = A + B (1 + c tanh[gz/2]) z exp(hz^2/2).

It is possible to sample from the distributions using the inversion method, that is, by simulating U ~ U(0,1) and substituting it into the quantile function. Equivalently, one can sample Z ~ N(0,1) and substitute it into Q_gk or Q_gh, i.e. the process described in the introduction based on Equation (1). In terms of (1), G(z) = 1 + c tanh(gz/2) produces asymmetry and H(z) = z(1 + z^2)^k or z exp(hz^2/2) elongates tails.
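As a concrete illustration of this sampling recipe, the following minimal base-R sketch draws g-and-k samples by passing standard normal draws through (2). The helper Qgk_sketch is ours for illustration only; the package's rgk function provides this functionality.

Qgk_sketch = function(z, A, B, g, k, c = 0.8) {
    A + B * (1 + c * tanh(g * z / 2)) * z * (1 + z^2)^k   ## equation (2)
}
z = rnorm(1000)                                  ## Z ~ N(0,1) draws
x = Qgk_sketch(z, A = 0, B = 1, g = 2, k = 0.5)  ## g-and-k sample via inversion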

Each distribution has four main parameters: A (location), B (scale), g (shape parameter mainly affecting skewness), and k or h (shape parameter mainly affecting kurtosis). The remaining parameter c is discussed below. When both shape parameters are zero, the distribution is simply N(A, B^2). An illustration of the flexible shapes that the g-and-k density can take is given in Figure 1. The g-and-h distribution can produce similar shapes, with the following exception: the g-and-k distribution allows negative values of k, which can produce lighter tails than a normal distribution, but also bimodal distributions of potentially limited usefulness.

Well-defined continuous distributions result from parameter values producing strictly increasing quantile functions. Determining when this is true is complicated, so discussion is postponed to a later section. For now note that it is standard to take B > 0 and fix c = 0.8 (which will be assumed throughout unless mentioned otherwise), and in this case k ≥ 0 or h ≥ 0 guarantees a valid distribution.

Figure 1: Example g-and-k densities. The first panel fixes g=0 and varies k, mainly altering kurtosis. The second fixes k=0 and varies g, mainly altering skewness. The third shows two examples with k<0.

3 Distribution functions

The gk package provides the standard suite of R functions for the g-and-k and g-and-h distributions i.e. random sampling and calculation of the pdf, cdf and quantile functions. This section describes how these functions are implemented. It is assumed that parameters have been chosen such that the quantile function is strictly increasing. No warning is given when this is not the case as checking validity is time consuming (see next section).

Quantile function

The qgk and qgh functions calculate the quantile function F1(u). Their implementation is straightforward. First z(u) is calculated using qnorm, then this is passed to an internal function, z2gk or z2gh, which computes Qgk or Qgh.

Random sampling

The rgk and rgh functions perform random sampling. This is done by the method described earlier of sampling N(0,1) draws and substituting them into Qgk or Qgh, via the function z2gk or z2gh.

Cumulative distribution function

The pgk and pgh functions calculate the cdf F(x) given input x. They numerically solve Q(z) - x = 0, which is guaranteed to have a unique root for z. The required final output is then u = Φ(z), where Φ is the N(0,1) cdf. An alternative approach would be to directly solve Q(z(u)) - x = 0 for u. However we found this was less numerically stable for u close to 0 or 1.

Our code finds the root for z using R’s uniroot command and z2gk or z2gh for Q(z) evaluations. The need to run a root finding algorithm means this function is slow relative to cdf calculations of standard distributions - see Table 1.
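The following sketch illustrates this root-finding approach for the g-and-k cdf. It is a simplified stand-in for the package's implementation (which uses the internal z2gk function and more careful bracketing); the starting bounds here are chosen only for illustration.

pgk_sketch = function(x, A, B, g, k, c = 0.8) {
    Qgk = function(z) A + B * (1 + c * tanh(g * z / 2)) * z * (1 + z^2)^k
    root = uniroot(function(z) Qgk(z) - x, lower = -20, upper = 20,
                   extendInt = "upX")            ## solve Q(z) = x for z
    pnorm(root$root)                             ## u = Phi(z)
}
pgk_sketch(0.5, A = 0, B = 1, g = 2, k = 0.5)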

The functions include an argument zscale. Setting this to TRUE outputs the z value which is found rather than u. This is used in the density functions below, and more generally is also useful to retain numerical precision when z has large magnitude.

Probability density function

The dgk and dgh functions calculate the pdf f(x), or the log pdf if the argument log=TRUE is supplied. The method is based on the standard probability result that if A has density f_A(a) and t(a) is a differentiable one-to-one transformation, then the density of B = t(A) is f_B(b) = f_A(a) / t'(a) where a = t^{-1}(b), and t' denotes the first derivative of t.

For quantile distributions we have Z ~ N(0,1) and X = Q(Z) for some Q function. So the pdf of X is f(x) = φ(z) / Q'(z) where z = Q^{-1}(x), and φ(z) is the N(0,1) pdf.

Our code to calculate the pdf first finds z = Q^{-1}(x) using pgk or pgh with zscale=TRUE. Then the pdf or its log is calculated using formulae (4) and (5) (see Appendix A) for Q'(z). The reliance on performing root finding within pgk and pgh means that dgk and dgh are slow relative to pdf calculations for standard distributions - see Table 1.
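To make this calculation explicit, here is a sketch for the g-and-k pdf which combines pgk (with zscale = TRUE) with formulae (4) and (6) from Appendix A. It illustrates the method rather than reproducing the package's internal code, and assumes the standard c = 0.8.

library(gk)
dgk_sketch = function(x, A, B, g, k) {
    c = 0.8                                        ## standard choice of c
    z = pgk(x, A, B, g, k, zscale = TRUE)          ## z = Q^{-1}(x) by root finding
    R = (1 + c * tanh(g * z / 2)) * (1 + (2 * k + 1) * z^2) / (1 + z^2) +
        c * g * z / (2 * cosh(g * z / 2)^2)        ## equation (6)
    Qdash = B * (1 + z^2)^k * R                    ## equation (4)
    dnorm(z) / Qdash                               ## f(x) = phi(z) / Q'(z)
}
dgk_sketch(0.5, A = 0, B = 1, g = 2, k = 0.5)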

Note that an alternative representation of f(x) is 1/q'(u), where q(u) denotes the quantile function F^{-1}(u) and u = F(x). Density calculations based on this approach are described in the literature (see, e.g., Rayner and MacGillivray, 2002). However we found that calculating the u values required for this approach was occasionally numerically unstable, as mentioned above.

Cost

Table 1 compares the time to execute gk’s distributional functions to those for the normal distribution. It illustrates that random sampling and quantile function calculation are reasonably efficient, but calculating the cdf and pdf are expensive.

Table 1: Mean times to perform various distributional operations, evaluated using the microbenchmark package (Mersmann, 2015). For example the random sampling row compares rnorm(N), rgk(N,1,2,3,4) and rgh(N,1,2,3,4) for N = 100. We also tried N = 1, which gave qualitatively similar results but slightly better relative efficiency of the gk functions.
                          Time (microseconds)        Ratio vs normal
                    Normal   g-and-k   g-and-h    g-and-k   g-and-h
Quantile function      175       972       445       5.56      2.55
Random sampling        150       921       436       6.15      2.91
cdf                    313    143151    116928        457       374
pdf                    369    138381    111279        375       302
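Timings in the same spirit as Table 1 can be produced along the following lines (results will of course vary by machine):

library(gk)
library(microbenchmark)
N = 100
x = rgk(N, 1, 2, 3, 4)
microbenchmark(rnorm(N), rgk(N, 1, 2, 3, 4), rgh(N, 1, 2, 3, 4),
               pnorm(x), pgk(x, 1, 2, 3, 4), pgh(x, 1, 2, 3, 4))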

4 Range of valid parameters

Recall that a valid continuous distribution requires the quantile function to be strictly increasing. Clearly this property is unaffected by the choice of A and B > 0. This section discusses the effects of g, h, k and c.

Several theoretical results on valid parameters can be derived. It is convenient to concentrate on c ≥ 0. In this case h < 0 or k < -1/2 is invalid. Taking k ≥ 0 or h ≥ 0 produces valid distributions when 0 ≤ c < c*, where c* ≈ 0.83. This is the reason for taking c = 0.8 as standard: it maintains this property while allowing the skewness factor in (2) and (3) to have a large effect. For justification of all these results, see Appendix B.

When c = 0.8, the above results completely characterise the range of valid parameters for the g-and-h distribution. For the g-and-k distribution, there is still some uncertainty for -0.5 ≤ k < 0, which, as mentioned earlier, corresponds to light tails. For both distributions, the case where c > c* is less clear: even positive values of k or h do not guarantee validity. Therefore we provide the function isValid to test parameter validity numerically.

Validity can be checked by testing whether the minimum derivative of (2) or (3) is positive. Appendix A shows that it is equivalent to test whether the functions (6) or (7) are positive. isValid uses numerical optimisation to minimise these and returns whether the minimum value is positive. To reduce the possibility of finding local minima, multiple optimisation starting points can be supplied as a vector to the argument initial_z. However it is still not guaranteed that the global minimum is found, so there remains a possibility that the function may produce false positives.

The function can be used as follows to illustrate the region of valid g-and-k parameter values for c=0.8. The results are plotted as Figure 2.

gk_grid = expand.grid(g = seq(-10, 10, 0.1), k = seq(-0.6, 0.1, 0.01))
v = isValid(gk_grid$g, gk_grid$k)
Figure 2: Validity of parameter values for the g-and-k distribution when c = 0.8, calculated using isValid. Also shown is a quadratic function k̃(g) near the curved part of the boundary between the regions.

We do not test validity automatically within the package's other functions. This is because isValid is relatively computationally expensive and not guaranteed to be correct. Therefore particular care should be taken for k < 0 or c > c*, as the distribution functions will not provide warnings when invalid parameters are used. A reasonable region of g and k values to use in practice with c = 0.8 can be derived from Figure 2. It shows that for |g| < 7 some -0.5 ≤ k < 0 values are invalid. Apart from a narrow strip near g = 0, this invalid region's boundary is roughly quadratic, as illustrated by the curve k̃(g) = 0.045 - 0.01 g^2 on the figure. Based on this analysis, k ≥ max(-0.5, k̃(g)) seems a reasonable sufficient condition for parameter validity to use in practice.
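For a quick practical check of this condition, a small helper like the following can be used. This is our own rough rule of thumb derived from Figure 2, not a package function.

k_lower = function(g) pmax(-0.5, 0.045 - 0.01 * g^2)  ## approximate validity boundary
valid_rule = function(g, k) k >= k_lower(g)           ## sufficient condition in practice
valid_rule(g = 3, k = -0.02)  ## TRUE: just above the quadratic boundary
valid_rule(g = 1, k = -0.2)   ## FALSE: k too negative for small |g|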

5 Inference functions

The package provides three inference methods for data x_1, x_2, …, x_n which are assumed to be independent and identically distributed (IID) draws from a g-and-k or g-and-h distribution with unknown parameters. This section describes these methods. An illustration of their use is provided in the next section. See the gk help files for a full description of all the arguments available.

MCMC inference

The mcmc function implements inference using Markov chain Monte Carlo (MCMC). This samples from a Markov chain whose stationary distribution is the Bayesian posterior of interest for the parameters θ. We use a Metropolis-Hastings algorithm, in which a proposed new state of the chain θ* is sampled by adding a N(0, Σ) increment to the current state θ_{t-1}. A decision to accept or reject θ* is made based on the prior and likelihood values at θ_{t-1} and θ*, and a random variable (see steps 4 and 5 of Algorithm 1).

Tuning Σ can be difficult. Haynes and Mengersen (2005), who first used MCMC for the g-and-k distribution, did this manually. Instead we use the adaptive Metropolis (AM) algorithm of Haario et al. (2001), which tunes Σ automatically during its operation. The resulting θ_t values no longer form a Markov chain, but it has been proved that, under suitable conditions, calculations using them still converge to posterior quantities as the length of the chain increases (Saksman and Vihola, 2010). The AM algorithm is presented as Algorithm 1. Step 2 states the proposal matrix used in terms of the empirical variance of the past MCMC states. To calculate this empirical variance efficiently, the code updates it each time a new state is observed. As a default we specify the tuning choices ε = 10^-6 and t_0 = 100.

Like other Bayesian methods, MCMC requires a prior density for the parameters, π(θ), to be specified. This must be supplied by the user. For computational convenience this should be supplied in the form of a function get_log_prior which takes a vector of parameters as input and returns the log prior density. We allow the user to reparameterise θ, using logB rather than B, via the logB argument. This can improve MCMC efficiency when the posterior for B is concentrated on values close to zero.

For IID data the likelihood is L(θ) = ∏_{i=1}^{n} f(x_i; θ), the product of each observation's pdf. Evaluating this for the g-and-k or g-and-h distributions requires n calls to dgk or dgh, each involving numerical root finding. Therefore MCMC becomes computationally expensive for even moderately large datasets.
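For example, an IID g-and-k log-likelihood can be evaluated along the following lines (the data and parameter values here are arbitrary illustrative choices):

library(gk)
x = rgk(1000, 0, 1, 2, 0.5)                     ## simulated data with A=0, B=1, g=2, k=0.5
loglik = sum(dgk(x, 0, 1, 2, 0.5, log = TRUE))  ## one root-finding call per observation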

  1. Loop over 1 ≤ t ≤ N:

  2. If t ≤ t_0 let Σ_t = Σ_0. Otherwise let Σ_t = (1/4)(2.4)^2 (Σ̂_{t-1} + εI), where Σ̂_{t-1} is the variance of θ_1, θ_2, …, θ_{t-1}.

  3. Sample θ* ~ N(θ_{t-1}, Σ_t).

  4. Sample u ~ U(0,1) and let r = [π(θ*) L(θ*)] / [π(θ_{t-1}) L(θ_{t-1})].

  5. If u < r let θ_t = θ*. Otherwise let θ_t = θ_{t-1}.

Algorithm 1: The Adaptive Metropolis MCMC algorithm
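A minimal sketch of Algorithm 1 is given below, assuming a user-supplied function log_post returning log[π(θ)L(θ)]. MASS::mvrnorm is used for the multivariate normal proposal, the acceptance test is carried out on the log scale, and the empirical variance is simply recomputed at every iteration (the package instead updates it incrementally). This is not the package's internal code.

am_sketch = function(log_post, theta0, Sigma0, N, t0 = 100, eps = 1e-6) {
    d = length(theta0)
    out = matrix(NA, N, d)
    theta = theta0
    lp = log_post(theta)
    for (t in 1:N) {
        if (t <= t0) {
            Sigma = Sigma0                       ## step 2, early iterations
        } else {
            ## (2.4)^2 / d equals (1/4)(2.4)^2 for the four parameters here
            Sigma = (2.4^2 / d) * (var(out[1:(t - 1), , drop = FALSE]) + eps * diag(d))
        }
        prop = theta + MASS::mvrnorm(1, rep(0, d), Sigma)   ## step 3
        lp_prop = log_post(prop)
        if (is.finite(lp_prop) && log(runif(1)) < lp_prop - lp) {   ## steps 4-5
            theta = prop
            lp = lp_prop
        }
        out[t, ] = theta
    }
    out
}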

ABC inference

The abc function implements inference by approximate Bayesian computation (ABC). This is a method for approximate Bayesian inference which avoids evaluating the likelihood function. It is especially useful when the likelihood function is unavailable or, as for quantile distributions, is expensive to compute. ABC is based instead on finding parameter values which produce simulated data similar to the observations. The abc function implements a simple version of ABC, Algorithm 2. Here a simulation is accepted if it has one of the M smallest distances to the observations. Distance refers to a weighted version of Euclidean distance between vectors of simulated and observed summary statistics. Details of the weighting are given in the algorithm's description. (For large n, abc avoids high memory requirements by running several batches of Algorithm 2. Each batch uses N = 10^4 and returns the M best simulations. The overall best M simulations are then found and returned. The v_j weights calculated in the first batch are reused in the others.)

Like mcmc, abc is a Bayesian method and requires a prior distribution for θ to be provided. It is convenient for this to be provided in a different form to the mcmc case. A function rprior should be supplied which takes a single numeric input and returns that many samples from the prior distribution as the rows of a matrix.

  1. Calculate observed summaries s_0 = s(x).

  2. For 1 ≤ i ≤ N sample parameters θ_i from the prior.

  3. For 1 ≤ i ≤ N simulate summary statistics s(x_i) given parameters θ_i. Let s_ij denote the jth component of s(x_i).

  4. For 1 ≤ j ≤ q (where q = dim(s_0)) calculate the empirical variance v_j of the s_ij values (1 ≤ i ≤ N).

  5. For 1 ≤ i ≤ N let d_i = Σ_{j=1}^{q} (s_ij - s_0j)^2 / v_j.

  6. Find the M smallest d_i values and return the corresponding θ_i values.

Algorithm 2: Approximate Bayesian computation (ABC)

ABC produces samples from an approximation to the Bayesian posterior distribution. The quality of the approximation depends in a complex way on the choice of summary statistics and the tuning parameters N and M. For more background on ABC see the review paper of Marin et al. (2012) and the handbook of Sisson et al. (2017). Two general R packages for ABC which implement more advanced methods are abc (Csilléry et al., 2012) and EasyABC (Jabot et al., 2013).

Using ABC for the g-and-k and g-and-h distributions was proposed by Allingham et al. (2009) and has been investigated in many subsequent papers. Following Drovandi and Pettitt (2011) we offer three choices of summary statistics, which can be selected through the sumstats argument: (1) the full order statistics; (2) octiles of the observations, E_1, E_2, …, E_7; (3) robust estimates of the moments based on the octiles: S_A = E_4, S_B = E_6 - E_2, S_g = (E_6 + E_2 - 2 E_4) / S_B, S_k = (E_7 - E_5 + E_3 - E_1) / S_B. Many more sophisticated approaches to choosing ABC summary statistics have been proposed (see, e.g., Fearnhead and Prangle, 2012; Blum et al., 2013), but these are a simple starting point.
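For reference, the octile-based summaries in choice (3) can be computed from a data vector as follows. This is a small helper of our own and is not necessarily identical to the package's internal code.

octile_summaries = function(x) {
    E = quantile(x, probs = (1:7) / 8, names = FALSE)  ## octiles E_1, ..., E_7
    SB = E[6] - E[2]
    c(SA = E[4],
      SB = SB,
      Sg = (E[6] + E[2] - 2 * E[4]) / SB,
      Sk = (E[7] - E[5] + E[3] - E[1]) / SB)
}
octile_summaries(rnorm(1000))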

For summaries (2) or (3) we follow Drovandi and Pettitt (2011) and speed up step 3 of Algorithm 2 by using the fact that the octiles (or close approximations) can be simulated quickly without the need to simulate a full dataset. Suppose X_1, X_2, …, X_n are g-and-k or g-and-h variables, and let X_(1) < X_(2) < … < X_(n) denote the order statistics. We replace E_i with E'_i = X_(r(in/8)), where r(·) rounds to the nearest integer. Now we only need to simulate 7 order statistics from the g-and-k or g-and-h distribution. To do so we simulate the corresponding order statistics of the U(0,1) distribution using the exponential spacings method (Ripley, 1987). This is implemented by the orderstats function. The uniform order statistics are then substituted into F^{-1}(u).
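The following sketch shows the idea: selected uniform order statistics are simulated via exponential (gamma) spacings and plugged into the quantile function. The helper sim_octiles is ours for illustration and is not necessarily how the package's orderstats function is implemented.

library(gk)
sim_octiles = function(n, A, B, g, k) {
    ranks = round((1:7) * n / 8)                 ## target ranks r(in/8)
    shapes = diff(c(0, ranks, n + 1))            ## gamma spacings between successive ranks
    gaps = rgamma(length(shapes), shape = shapes)
    u = cumsum(gaps)[1:7] / sum(gaps)            ## U(0,1) order statistics at these ranks
    qgk(u, A, B, g, k)                           ## substitute into F^{-1}(u)
}
sim_octiles(1000, 0, 1, 2, 0.5)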

FDSA inference

The fdsa function performs inference using finite difference stochastic approximation (FDSA). FDSA, originally due to Kiefer and Wolfowitz (1952), attempts to find θ minimising a loss function by iteratively calculating estimates θ_1, θ_2, … Each iteration moves the estimate in the opposite direction to an estimate of the loss gradient, based on finite difference calculations.

We use FDSA for maximum likelihood estimation of IID observations. In this setting the loss can be taken to be the negative log-likelihood, -log L(θ) = -Σ_{i=1}^{n} log f(x_i; θ). The gradient of the loss can be estimated using only a small subset of the data, so FDSA has the potential to scale up to large datasets better than MCMC, while avoiding the approximation error of ABC. Unlike ABC and MCMC, we are not aware of FDSA having previously been used for the g-and-k and g-and-h distributions.

The g-and-k and g-and-h distributions have some parameter constraints (e.g. B > 0, h ≥ 0). Also, we found that setting further constraints based on preliminary analyses sometimes helps FDSA behave well. Therefore we use a projected version of FDSA for bounded minimisation (see, e.g., Kushner and Yin, 2003), presented as Algorithm 3.

  1. Loop over 0 ≤ t ≤ N - 1.

  2. Calculate ĝ_t by performing the following steps for i = 1, 2, 3, 4.

    (a) Let Δ_i be a 4-dimensional vector whose ith component is 1 and whose other components are zero.

    (b) Let φ^+ = P(θ_t + c_t Δ_i) and φ^- = P(θ_t - c_t Δ_i).
      (Here P(φ) is a projection operator. Its output φ' is such that φ'_i is the closest value to φ_i in [θ_i^-, θ_i^+]. The i subscripts represent ith components.)

    (c) Let ĝ_it = [L̂(φ^+) - L̂(φ^-)] / |φ^+_i - φ^-_i|.

  3. Let θ_{t+1} = P(θ_t - a_t ĝ_t).

  4. Output: final estimate θ_N.

Algorithm 3: Finite difference stochastic approximation (FDSA)
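The projected finite difference update in Algorithm 3 can be sketched as follows, assuming a function loss_hat returning the unbiased loss estimate L̂ (discussed next, where it is also noted that the same data subsample should be reused for every L̂ evaluation within an iteration) and vectors lower and upper holding the parameter bounds. This is our illustration, not the package's internal code.

fdsa_step = function(theta, loss_hat, a_t, c_t, lower, upper) {
    proj = function(x) pmin(pmax(x, lower), upper)   ## projection operator P
    d = length(theta)
    ghat = numeric(d)
    for (i in 1:d) {
        delta = numeric(d)
        delta[i] = 1                                 ## step 2(a)
        phi_plus = proj(theta + c_t * delta)         ## step 2(b)
        phi_minus = proj(theta - c_t * delta)
        ghat[i] = (loss_hat(phi_plus) - loss_hat(phi_minus)) /
            abs(phi_plus[i] - phi_minus[i])          ## step 2(c)
    }
    proj(theta - a_t * ghat)                         ## step 3
}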

The unbiased estimate of the loss required by Algorithm 3, L̂(θ), can be taken to be the sum of a random sample of m negative log-likelihood terms, multiplied by n/m. Hence for a vector y containing a random subsample of m observations (sometimes referred to as a batch), L̂(θ) can be calculated using -sum(dgk(y, A, B, g, k, log=TRUE)) * n/m (or similarly for the g-and-h distribution). Variance reduction in step 2(c) of Algorithm 3 is possible by coupling the two estimates (Kleinman et al., 1999). Hence we use the same random subsample of data for all L̂ calculations within an iteration of step 2.

FDSA convergence requires that the gain sequences a_t and c_t satisfy certain conditions. We take a_t = a_0 (A + t + 1)^{-α} and c_t = c_0 (t + 1)^{-γ}, a standard form in the stochastic approximation literature. This leaves several tuning choices, which can be selected by the user or left at the default values we provide. We use default values α = 1 and γ = 0.49. Our default for c_0 is an estimate of the standard deviation of L̂(θ_0) using some preliminary simulations. We provide defaults a_0 = 1 and A = 100, but it is recommended to manually tune these to produce rapid convergence. This may require several short pilot runs of the algorithm. The fdsa function allows a_0 and c_0 to be vectors, in which case operations in Algorithm 3 are interpreted elementwise where necessary. This allows the user to tune the gain sequences differently for each parameter. As for mcmc, we allow the user to reparameterise θ, using log B rather than B, via the logB argument, which can improve FDSA efficiency when the MLE value of B is close to zero.

Under weak assumptions, FDSA converges to a local minimum of the loss (see, e.g., Kushner and Yin, 2003). In our experience the likelihood for the g-and-k and g-and-h distributions is usually unimodal, so there is little danger of converging to an incorrect mode. Nonetheless it may be a useful check on the results to rerun the algorithm from various starting points, or to compare with the output of another algorithm.

An alternative to FDSA is simultaneous perturbation stochastic approximation (SPSA; Spall, 1998). Here each iteration makes a finite difference estimate of the derivative of the loss function when moving in a random direction from θ_t. An update moves θ_t a distance (negatively) proportional to this estimated derivative in the selected direction. Each SPSA iteration requires fewer likelihood estimates than FDSA, and it is asymptotically more efficient. However, we found in exploratory work that for our application the SPSA updates were dominated by improving the A and B estimates, and the remaining parameters were learned very slowly.

6 Illustration

We illustrate gk’s inference methods on the Garch exchange rate dataset from the Ecdat package . This consists of 1967 daily US dollar exchange rates against other currencies from 1980 to 1987. We concentrate on the exchange rate with Canadian Dollars. Let xt denote the exchange rate on day t. The log return is defined as log(xt+1/xt). Figure 3 is a time series plot of the log returns. Figure 7 shows a histogram and a quantile-quantile plot indicating that the tails are heavier than those of a normal density.

We focus on using the g-and-k distribution to model the log returns under an IID assumption. For models which also include time series structure see, for example, Drovandi and Pettitt (2011). The full code for the analysis below can be run via the fx function.

The ABC and MCMC analyses which follow are Bayesian and require specification of a prior. We use a uniform prior for ease of comparison to the maximum likelihood results from FDSA. For MCMC we are able to use an improper uniform prior. For ABC a proper prior is required, so we bound the parameters as follows: -1 < A < 1, 0 < B < 1, -5 < g < 5, 0 < k < 10. We restrict A and B to a magnitude of at most 1, as we believe log returns of this magnitude are highly unlikely. The g and k parameters are given wider support, which can capture a broad range of distributional shapes.

Figure 3: Log returns for US dollar / Canadian Dollar exchange rates.

ABC

We ran ABC as follows:

rprior = function(i) {
    cbind(runif(i, -1, 1), runif(i, 0, 1), runif(i, -5, 5), runif(i, 0, 10))}
abc_out = abc(log_return, N = 1E7, rprior = rprior, M=200,
    sumstats = 'moment estimates')

This simulates 10^7 parameter vectors and accepts the best 200. We used the moment estimator summary statistics, described earlier, which can be simulated quickly without the need to simulate an entire dataset. As a result this analysis took only 6 minutes.

The resulting approximate posterior samples are shown in Figure 6. Figure 7 shows density and quantile-quantile plots under the mean parameter values. These reveal a very poor fit to the data. However this short ABC analysis does provide reasonable tuning choices for the other methods.

FDSA

We ran FDSA as follows:

abc_out_tf = abc_out[, 1:4]
abc_out_tf[, 2] = log(abc_out_tf[, 2])
abc_est_tf = colMeans(abc_out_tf)
fdsa_out_pilot = fdsa(log_return, N = 1E4, logB = TRUE, theta0 = abc_est_tf,
    batch_size = 100, a0 = 2E-4)
a0 = c(1E-6, 1E-2, 1E-2, 1E-2)
fdsa_out = fdsa(log_return, N = 1E4, logB = TRUE, theta0 = abc_est_tf,
    batch_size = 100, a0 = a0)

We found that using the original parameterisation caused high variance in our gradient estimates. This is because the log-likelihood surface becomes extremely steep for B close to 0. Therefore we reparameterised B to log B. The initial FDSA state was set to equal the ABC means. The FDSA step sizes a_0 were tuned by trial and error.

Figure 4 shows a trace plot of the FDSA algorithm output. A pilot run with a_0 = 2 × 10^-4 is shown in black. Parameters log B, g and k do not converge over 10,000 iterations. However they have smooth curves, indicating that there is relatively little noise in their gradient estimates and so larger steps could be taken. In contrast, A converges quickly and then oscillates noisily. This indicates that a smaller step size could be used to average out this noise more effectively without endangering convergence. Therefore for the final run we used a_0 = (10^-6, 10^-2, 10^-2, 10^-2).

The final FDSA analysis took 17 minutes. The final states were A = 9.1 × 10^-5, B = 1.7 × 10^-3, g = 2.0 × 10^-2 and k = 0.35. Figure 7 shows density and quantile-quantile plots under these parameter values. These are a much better fit to the data than the ABC results.

Next we use the FDSA results to help tune an MCMC algorithm, which quantifies the uncertainty in the parameter values.

Figure 4: Output from the FDSA algorithm to infer g-and-k parameters for exchange rate log returns. Black shows output from a pilot run with a_0 = 2 × 10^-4. Red shows output from the final run with a_0 = (10^-6, 10^-2, 10^-2, 10^-2).

MCMC

We ran MCMC as follows:

fdsa_est_tf = fdsa_out[nrow(fdsa_out), 1:4]
Sigma0 = var(fdsa_out[nrow(fdsa_out) - (0:1000), 1:4])
log_prior = function(theta) {
    if (theta[4] < 0) return(-Inf)
    return(theta[2])
}
mcmcout_tf = mcmc(log_return, N = 1E4, logB = TRUE, get_log_prior = log_prior,
    theta0 = fdsa_est_tf, Sigma0 = Sigma0)

Again we used a log reparameterisation for B. To achieve an improper uniform prior on the original parameterisation, we used a prior density proportional to B 1(k > 0) on (A, log B, g, k), where 1 represents an indicator function; the factor B is the Jacobian of the transformation from B to log B. Our initial parameter vector was the final FDSA state. We used the variance matrix of the last 1000 FDSA states to select the initial MCMC proposal variance Σ_0.

Figure 5 shows a trace plot of the MCMC algorithm output. For the first few hundred iterations small proposals are made, at least for log B, g and k, but the proposal variance quickly adapts and the remainder of the output appears to have converged. Exploratory work showed that taking a poor initial state meant MCMC was very slow to converge, because the variance matrix adapted to the transient phase of the algorithm. Hence tuning based on the FDSA output is very useful.

The MCMC analysis took 39 minutes. Figure 6 shows parameter histograms and Figure 7 shows density and quantile-quantile plots. These are similar to the FDSA fit. Note that the density plot is based on the mean parameter values from the MCMC output (after discarding the first half of the output as burn-in and transforming log B values back to the original parameterisation).

Figure 5: States of an MCMC algorithm to infer g-and-k parameters for exchange rate log returns.
Figure 6: Parameter inference for fitting the g-and-k distribution to exchange rate log returns. The top row shows the ABC posterior sample and the bottom row the MCMC posterior sample, which requires much more concentrated parameter scales. FDSA estimates of the MLEs are shown by crosses on the x-axis.
Figure 7: (Left) Histogram of exchange rate log returns, and fitted g-and-k densities. (Right) Quantile-quantile (QQ) plots of fitted g-and-k densities. QQ plots are shown for 30 vectors of parameters sampled from the second half of the MCMC output.

Summary

The ABC analysis is quick but produces a poor fit. However it helps tune the FDSA method, which finds a good estimate of the MLE in a reasonable time. Further computational effort using MCMC provides a Bayesian fit. Figure 7 shows that the g-and-k distribution fits the data better than a normal distribution, but still does not fit the most extreme observations. Further improvements might be possible by using more flexible distributions, for example allowing different k parameters for the upper and lower tails.

7 Discussion

This paper has reviewed the g-and-k and g-and-h distributions, and introduced the gk package to work with them. The package includes the usual distributional functions, although the pdf and cdf functions are slow due to relying on numerical root finding. Another function tests the validity of different parameter combinations, and this was used to produce a novel result on which parameters are valid for the g-and-k distribution (i.e. it appears to be sufficient that k ≥ max(-0.5, 0.045 - 0.01 g^2)). The package also provides several methods for inference of IID data under these distributions, and their use has been illustrated above. The methods include an FDSA algorithm which can find MLEs for large datasets in a reasonable time and has not been applied to this problem before.

8 Appendix A: Formulae

The derivatives of the Q functions are as follows:

(4)   Q'_gk(z; A, B, g, k, c) = B (1 + z^2)^k R_gk(z; g, k, c),

(5)   Q'_gh(z; A, B, g, h, c) = B exp(hz^2/2) R_gh(z; g, h, c),

where

(6)   R_gk(z; g, k, c) = [1 + c tanh(gz/2)] (1 + (2k+1) z^2) / (1 + z^2) + cgz / (2 cosh^2(gz/2)),

(7)   R_gh(z; g, h, c) = [1 + c tanh(gz/2)] (1 + hz^2) + cgz / (2 cosh^2(gz/2)).

Observe that each Q' function has the same sign and roots as the corresponding R function.

9 Appendix B: Range of valid parameters - theory

This appendix proves theoretical results quoted earlier about which parameter values produce valid g-and-k and g-and-h distributions.

First note that the defining functions in (2) and (3) both have the property that Q(z; A, B, g, k, c) = Q(z; A, B, -g, k, -c). Therefore any behaviour produced by c < 0 can be replicated with c > 0 and a different choice of g. So for simplicity it suffices to concentrate on c ≥ 0.

For the remainder of this appendix, distributional validity will correspond to a strictly increasing quantile function. This property is generally violated if c > 1, as there are then two solutions to Q(z) = A: z = 0 and a solution to 1 + c tanh(gz/2) = 0. (The only exception is the special case g = 0.) Also, taking h < 0 or k < -1/2 is invalid: in either case Q is continuous with a positive gradient at z = 0, but Q(z) - A tends to zero as z → ±∞, which is incompatible with Q being strictly increasing.

Finally, it can be shown that non-negative values of k or h produce valid distributions provided that 0 ≤ c < c* ≈ 0.83. From Appendix A it suffices to derive the values of c such that R(z) – representing either R_gk(z; g, k, c) or R_gh(z; g, h, c) – is guaranteed to be positive for k ≥ 0 or h ≥ 0. Note that R(z) is a continuous function of z, and R(0) > 0. So a sufficient condition for validity is that no solution to R(z) = 0 exists. Rearranging R(z) = 0 using (6) and (7) gives

(8)   1/c = u v sech^2(u) + tanh(u),   where u = -gz/2, and v = (1 + z^2) / (1 + (2k+1) z^2) for the g-and-k distribution, or v = 1 / (1 + hz^2) for the g-and-h distribution.

For k ≥ 0 or h ≥ 0, v can only take values in (0, 1], with 1 attained at z = 0. Hence (8) gives c > 0 if and only if u > 0, and we concentrate on this case from now on. We wish to find the minimum positive solution for c. Since 1/c is increasing in v, it suffices to concentrate on its largest value, v = 1. The problem reduces to minimising (u sech^2(u) + tanh(u))^{-1} for u > 0. Numerically this gives c* ≈ 0.83, as shown in Figure 8.

Figure 8: Solutions to (8) for v=1 and u>0.

10 Acknowledgements

Thanks to Kieran Peel who wrote a helpful undergraduate dissertation on this topic.

CRAN packages used

gk, microbenchmark, abc, EasyABC, Ecdat

CRAN Task Views implied by cited packages

Bayesian, Distributions, Econometrics, TimeSeries



    References

    D. Allingham, R. A. R. King and K. L. Mengersen. Bayesian estimation of quantile distributions. Statistics and Computing, 19(2): 189–201, 2009. URL https://doi.org/10.1007/s11222-008-9083-x.
    M. G. B. Blum, M. A. Nunes, D. Prangle and S. A. Sisson. A comparative review of dimension reduction methods in approximate bayesian computation. Statistical Science, 28(2): 189–208, 2013. URL https://doi.org/10.1214/12-sts406.
    Y. Croissant. Ecdat: Data sets for econometrics. 2016. URL https://CRAN.R-project.org/package=Ecdat. R package version 0.3-1.
    K. Csilléry, O. François and M. G. B. Blum. Abc: An R package for approximate Bayesian computation (ABC). Methods in ecology and evolution, 3(3): 475–479, 2012. URL https://doi.org/10.1111/j.2041-210X.2011.00179.x.
    C. C. Drovandi and A. N. Pettitt. Likelihood-free Bayesian estimation of multivariate quantile distributions. Computational Statistics & Data Analysis, 55(9): 2541–2556, 2011. URL https://doi.org/10.1016/j.csda.2011.03.019.
    P. Fearnhead and D. Prangle. Constructing summary statistics for approximate Bayesian computation: Semi-automatic ABC. Journal of the Royal Statistical Society, Series B, 74: 419–474, 2012. URL https://doi.org/10.1111/j.1467-9868.2011.01010.x.
    W. Gilchrist. Statistical modelling with quantile functions. CRC Press, 2000. URL https://doi.org/10.1201/9781420035919.
    H. Haario, E. Saksman and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 223–242, 2001. URL https://doi.org/10.2307/3318737.
    C. Hastings Jr., F. Mosteller, J. W. Tukey and C. P. Winsor. Low moments for small samples: A comparative study of order statistics. The Annals of Mathematical Statistics, 413–426, 1947. URL https://doi.org/10.1214/aoms/1177730388.
    M. A. Haynes, H. L. MacGillivray and K. L. Mengersen. Robustness of ranking and selection rules using generalised g-and-k distributions. Journal of Statistical Planning and Inference, 65(1): 45–66, 1997. URL https://doi.org/10.1016/s0378-3758(97)00050-5.
    M. Haynes and K. Mengersen. Bayesian estimation of g-and-k distributions using MCMC. Computational Statistics, 20(1): 7–30, 2005. URL https://doi.org/10.1007/BF02736120.
    F. Jabot, T. Faure and N. Dumoulin. EasyABC: Performing efficient approximate Bayesian computation sampling schemes using R. Methods in Ecology and Evolution, 4(7): 684–687, 2013. URL https://doi.org/10.1111/2041-210x.12050.
    J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 23(3): 462–466, 1952. URL https://doi.org/10.1007/978-1-4613-8505-9_4.
    N. L. Kleinman, J. C. Spall and D. Q. Naiman. Simulation-based optimization with stochastic approximation using common random numbers. Management Science, 45(11): 1570–1578, 1999. URL https://doi.org/10.1287/mnsc.45.11.1570.
    H. J. Kushner and G. G. Yin. Stochastic approximation and recursive algorithms and applications. Springer, 2003. URL https://doi.org/10.1007/b97441.
    P. L’Ecuyer and P. W. Glynn. Stochastic optimization by simulation: Convergence proofs for the GI/G/1 queue in steady-state. Management Science, 40(11): 1562–1578, 1994. URL https://doi.org/10.1287/mnsc.40.11.1562.
    J.-M. Marin, P. Pudlo, C. P. Robert and R. J. Ryder. Approximate Bayesian computational methods. Statistics and Computing, 22(6): 1167–1180, 2012. URL https://doi.org/10.1007/s11222-011-9288-2.
    O. Mersmann. Microbenchmark: Accurate timing functions. 2015. URL https://CRAN.R-project.org/package=microbenchmark. R package version 1.4-2.1.
    G. W. Peters, W. Y. Chen and R. H. Gerlach. Estimating quantile families of loss distributions for non-life insurance modelling via L-moments. Risks, 4(2): 14, 2016. URL https://doi.org/10.2139/ssrn.2739417.
    G. D. Rayner and H. L. MacGillivray. Numerical maximum likelihood estimation for the g-and-k and generalized g-and-h distributions. Statistics and Computing, 12(1): 57–75, 2002. URL https://doi.org/10.1023/A:1013120305780.
    B. Ripley. Stochastic simulation. Wiley, 1987. URL https://doi.org/10.1002/9780470316726.
    E. Saksman and M. Vihola. On the ergodicity of the adaptive Metropolis algorithm on unbounded domains. The Annals of Applied Probability, 20(6): 2178–2203, 2010. URL https://doi.org/10.1214/10-aap682.
    S. A. Sisson, Y. Fan and M. Beaumont, eds. Handbook of Approximate Bayesian Computation. Chapman & Hall/CRC, 2017. URL https://doi.org/10.1201/9781315117195.
    J. C. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on aerospace and electronic systems, 34(3): 817–823, 1998. URL https://doi.org/10.1109/7.705889.
    J. W. Tukey. Modern techniques in data analysis. In Proceedings of the NSF-sponsored regional research conference, 1977. Southern Massachusetts University.

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Prangle, "gk: An R Package for the g-and-k and Generalised g-and-h Distributions", The R Journal, 2020

    BibTeX citation

    @article{RJ-2020-010,
      author = {Prangle, Dennis},
      title = {gk: An R Package for the g-and-k and Generalised g-and-h Distributions},
      journal = {The R Journal},
      year = {2020},
      note = {https://rjournal.github.io/},
      volume = {12},
      issue = {1},
      issn = {2073-4859},
      pages = {7-20}
    }