statmod: Probability Calculations for the Inverse Gaussian Distribution

The inverse Gaussian distribution (IGD) is a well known and often used probability distribution for which fully reliable numerical algorithms have not been available. Our aim in this article is to develop software for this distribution for the R programming environment. We develop fast, reliable basic probability functions (dinvgauss, pinvgauss, qinvgauss and rinvgauss) that work for all possible parameter values and which achieve close to full machine accuracy. The most challenging task is to compute quantiles for given cumulative probabilities and we develop a simple but elegant mathematical solution to this problem. We show that Newton's method for finding the quantiles of a IGD always converges monotonically when started from the mode of the distribution. Simple Taylor series expansions are used to improve accuracy on the log-scale. The IGD probability functions provide the same options and obey the same conventions as do probability functions provided in the standard R stats package. The IGD functions are part of the statmod package available from the CRAN repository.


Introduction
The inverse Gaussian distribution (IGD) [20,8] is widely used in a variety of application areas including reliability and survival analysis [23,4,2,5,21,1]. It is more generally used for modeling non-negative positively skewed data because of its connections to exponential families and generalized linear models [15,3,17,7].
Our aim in this article is to develop reliable software for this distribution for the R programming environment (http://www.r-project.org). Basic probability functions for the IGD have been implemented previously in James Lindsey's R package rmutil [10] and in the CRAN packages SuppDists [22] and STAR [13]. We have found however that none of these IGD functions work for all parameter values or return results to full machine accuracy. Bob Wheeler remarks in the SuppDists documentation that the IGD "is an extremely difficult distribution to treat numerically". The rmutil package was removed from CRAN in 1999 but is still available from Lindsey's webpage (http://www.commanster.eu/rcode.html). SuppDists was orphaned in 2013 but is still available from CRAN. The SuppDists code is mostly implemented in C while the other packages are pure R as far as the IGD functions are concerned.
The probability density of the IGD has a simple closed form expression and so is easy to compute. Care is still required though to handle infinite parameter values that correspond to valid limiting cases. The cumulative distribution function (cdf) is also available in closed form via an indirect relationship with the normal distribution [16,6]. Considerable care is nevertheless required to compute probabilities accurately on the log-scale, because the formula involves a sum of two normal probabilities on the un-logged scale. Random variates from IGDs can be generated using a combination of chisquare and binomial random variables [12]. Most difficult is the inverse cdf or quantile function, which must be computed by some iterative numerical approximation.
Two strategies have been used to compute IGD quantiles. One is to solve for the quantile using a general-purpose equation solver such as the uniroot function in R. This is the approach taken by the qinvgauss functions in the rmutil and STAR packages. This approach can usually be relied on to converge satisfactorily but is computationally slow and provides only limited precision. The other approach is to use Newton's method to solve the equation after applying an initial approximation [9]. This approach was taken by one of the current authors when developing inverse Gaussian code for S-PLUS [18]. It is also the approach taken by the qinvGauss function in the SuppDists package. This approach is fast and accurate when it works but can fail unpredictably when the Newton iteration diverges. Newton's method cannot in general be guaranteed to converge, even when the initial approximation is close to the required value, and the parameter values for which divergence occurs are hard to predict.
We have resolved the above difficulties by developing a Newton iteration for the IGD quantiles that has guaranteed convergence. Instead of attempting to find a starting value that is close to the required solution, we instead use the convexity properties of the cdf function to approach the required quantiles in a predictable fashion. We show that Newton's method for finding the quantiles of an IGD always converges when started from the mode of the distribution. Furthermore the convergence is monotonic, so that backtracking is eliminated. Newton's method is eventually quadratically convergent, meaning that the number of decimal places corrected determined tends to double with each iteration [14]. Although the starting value may be far from the required solution, the rapid convergence means the starting value is quickly left behind. Convergence tends to be rapid even when the required quantile in the extreme tails of the distribution.
The above methods have been implemented in the dinvgauss, pinvgauss, qinvgauss and rinvgauss functions of the statmod package [19]. The functions give close to machine accuracy for all possible parameter values. They obey similar conventions to the probability functions provided in the stats package that is bundled with R. Tests The densities are unimodal with mode between 0 and µ. As µ/λ increases the distribution becomes more right skew and the mode decreases relative to the mean. Note that λ = 1/φ.
show that the functions are faster, more accurate and more reliable than existing functions for the IGD. Every effort has to made to ensure that the functions return results for the widest possible range of parameter values.

Density function
The inverse Gaussian distribution, denoted IG(µ,φ), has probability density function (pdf) for x > 0, µ > 0 and φ > 0. The mean of the distribution is µ and the variance is φµ 3 . In generalized linear model theory [11,17], φ is called the dispersion parameter. Another popular parametrization of the IGD uses λ = 1/φ, which we call the shape parameter. For best accuracy, we compute d(x; µ, φ) on the log-scale and then exponentiate if an unlogged value is required. Note that the mean µ can be viewed as a scaling parameter: if X is distributed as IG(µ,φ), then X/µ is also inverse Gaussian with mean 1 and dispersion φµ. The skewness of the distribution is therefore determined by φµ, and in fact φµ is the squared coefficient of variation of the distribution.

Description
Parameter values log-pdf pdf cdf Left limit The IGD is unimodal with mode at where κ = 3φµ/2 [8]. The second factor in the mode is strictly between 0 and 1, showing that the mode is strictly between 0 and µ. Figure 1 shows the pdf of the IGD for various choices of µ and λ. Care needs to be taken with special cases when evaluating the pdf (Table 1). When φµ is large, a Taylor series expansion shows that the mode becomes dependent on φ only: .
Under the same conditions, the peak value of the density can be seen to converge to φ(2π/27) −1/2 × exp(−3/2). This shows that the distribution has a spike at 0 whenever φ is very large, regardless of µ. It is also known that [16]. Amongst other things, this implies that 1/(Xφ) ∼ χ 2 1 asymptotically for µ large. For infinite µ, the density becomes .
The pdf is always NA if x is NA. Missing values for φ lead to NA values for the pdf except when x < 0 or x = ∞. Missing values for µ lead to NA values for the pdf except when Next we give some code examples. We start by loading the packages that we will compare. Note that statmod is loaded last and is therefore first in the search path. All the existing functions rmutil::dinvgauss, SuppDist::dinvGauss and STAR::dinvgauss return errors for the above calls; they do not tolerate NA values, or infinite parameter values, or x values outside the support of the distribution.

Cumulative distribution function
Let p(q; µ, φ) = P (X ≤ q) be the left tail cdf, and writep(q; µ, φ) for the right tail probability P (X > q) = 1 − p(q; µ, φ). The formula developed by [16] for the cdf is where q m = q/µ, φ m = φµ, r = (qφ) 1/2 and p norm is the cdf of the standard normal distribution. The right tail probability can be written similarly: wherep norm is the right tail of the standard normal. The fact that this formula is additive on the unlogged scale poses some numerical problems. The p norm () evaluations are subject to floating underflow, the exp() evaluation is subject to overflow, and there is the danger of subtractive cancellation when computing the right tail probability. It is possible to derive an asymptotic expression for the right tail probability. If q is very large then: See the Appendix for the derivation of this approximation. This approximation is very accurate when φ −1/2 m (q m − 1) > 10 5 , but only gives 2-3 significant figures correctly for more modest values such as φ To avoid or minimize the numerical problems described above, we convert the terms in the cdf to the log-scale and remove a common factor before combining the two term terms to get log p. Given a quantile value q, we compute the corresponding log p as follows: where log p norm () is computed by pnorm with lower.tail=TRUE and log.p=TRUE. Note also that log1p() is an R function that computes the logarithm of one plus its argument avoiding subtractive cancellation for small arguments. The computation of the right tail probability is similar but with Because of this careful computation, statmod::pinvgauss function is able to compute correct cdf values even in the far tails of the distribution: SuppDists::pinvGauss returns non-zero right tail probabilities, but these are too large by a factor of 10: The use of log-scale computations means that statmod::pinvgauss can accurately compute log-probabilities that are too small to be represented on the unlogged scale: > pinvgauss(0.0001, mean = 1.5, disp = 0.7, log.p = TRUE) [1] -7146.914 None of the other packages can compute log-probabilities less than about −700.
pinvgauss handles special cases similarly to dinvgauss (Table 1). Again, none of the existing functions do this: We can test the accuracy of the cdf functions by comparing to the cdf of the χ 2 1 distribution. For any q 1 < µ, let q 2 > µ be that value satisfying From equation 4, we can conclude that the upper tail probability for the χ 2 1 distribution at z should be the sum of the IGD tail probabilities for q 1 and q 2 , i.e., The following code implements this process for an illustrative example with µ = 1.5, φ = 0.7 and q 1 = 0.1. First we have to solve for q 2 : > options(digits = 4) > mu <-1.5 > phi <-0.7 > q1 <-0.1 > z <-(q1 -mu)^2 / (phi * mu^2 * q1) > polycoef <-c(mu^2, -2 * mu -phi * mu^2 * z, 1) > q <-Re(polyroot(polycoef)) > q [1] 0. 1 22.5 The chisquare cdf value corresponding to the left hand size of equation 6 is: It can be seen that the statmod function is the only one to agree with pchisq to 15 significant figures, corresponding to a relative error of about 10 −15 . The other three packages give 12 significant figures, corresponding to relative errors of slightly over 10 −12 .

Inverting the cdf
Now consider the problem of computing the quantile function q(p; µ, φ). The quantile function computes q satisfying P (X ≤ q) = p.
If q n is an initial approximation to q, then Newton's method is a natural choice for refining the estimate. Newton's method gives the updated estimate as q n+1 = q n + p − p(q n ; µ, φ) d(q n ; µ, φ) .
For right-tail probabilities, the Newton step is almost the same: where now P (X > q) = p. Newton's method is very attractive because it is quadratically convergent if started sufficiently close to the required value. It is hard however to characterize how close the starting value needs to be to achieve convergence and in general there is no guarantee that the Newton iteration will not diverge or give impossible values such as q < 0 or q = ∞. Our approach is to derive simple conditions on the starting values such that the Newton iteration always converges and does so without any backtracking. We call this behavior monotonic convergence.
Recall that the IGD is unimodal for all parameter values with mode m given previously. It follows that the pdf d(q; µφ) is increasing for all q < m and decreasing for all q > m and the cdf p(q; µ, φ) is convex for q < m and concave for q > m. In other words, the cdf has a point of inflexion at the mode of the distribution.
Suppose that the required q satisfies q ≥ m and suppose that the working estimate satisfies m ≤ q n ≤ q. It can be seen that the cdf is concave in the interval [q n , q], the Newton step will be positive and the updated estimate q n+1 will still satisfy m ≤ q n+1 ≤ q (Figure 2). Suppose instead that q < m and suppose that the working estimate satisfies q ≤ q n ≤ m. In this case it can be seen that the cdf is convex in the interval [q n , q], the Newton step will be negative and the updated estimate q n will still satisfy q ≤ q n+1 ≤ m ( Figure 2). It follows that Newton's method is always monotonically convergent provided that the starting value lies between the mode m and the required value q. In fact the mode m itself can be used as the starting value. Note that to compute the mode m accurately without subtractive cancellation we use equation 3 when κ is large and use equation 2 otherwise.
We use q 0 = m as the starting value for the Newton iteration unless the left or right tail probability is very small. When the left tail probability is less than 10 −5 , we use instead q 0 = µ φq 2 norm where q norm is the corresponding quantile of the standard normal distribution. When the right tail probability is less than 10 −5 , we use  where q gamma is the corresponding quantile of the gamma distribution with the same mean and variances as the IGD. These starting values are closer to the required q than is m but still lie between m and the required q and so are in the domain of monotonic convergence. We use the alterative starting values only for extreme tail probabilities because in other cases the computational cost of computing the starting value is greater than the saving enjoyed by reducing the number of Newton iterations that are needed. The term p − p(q n ; µ, φ) in the Newton step could potentially suffer loss of floating point precision by subtractive cancellation when p and p(q n ; µ, φ) are nearly equal or if p is very close to 1. To avoid this we work with p on the log-scale and employ a Taylor series expansion when p and p(q n ; µ, φ) are relatively close. Let δ = log p − log p(q n ; µ, φ). When |δ| < 10 −5 , we approximate Here log p(q n ; µ, φ) is computed by pinvgauss with log.p=TRUE and log1p(−δ/2) is computed using the log1p function.

Random deviates
The functions statmod::rinvgauss, SuppDists::rinvGauss and STAR::rinvgauss all use the same algorithm to compute random deviates from the IGD. The method is to generate chisquare random deviates corresponding to (X − µ) 2 /(φXµ 2 ), and then choose between the two possible X values leading to the same chisquare value with probabilities worked out by [12]. The SuppDists function is faster than the others because of the implementation in C. Nevertheless, the pure R statmod and STAR functions are acceptably fast. The statmod function generates a million random deviates in about a quarter of a second of elapsed time on a standard business laptop computer while STAR takes about half a second.
The rmutil::rinvgauss function generates random deviates by running qinvgauss on random uniform deviates. This is far slower and less accurate than the other functions.

Discussion
Basic probability calculations for the IGD have been available in various forms for some time but the functions described here are the first to work for all parameter values and to return close to full machine accuracy.
The statmod functions achieve good accuracy by computing probabilities on the log-scale where possible. Care is given to handle special limiting cases, including some cases that have not been previously described. The statmod functions trap invalid parameter values, provide all the standard arguments for probability functions in the R and preserve argument attributes on output.
A new strategy has been described to invert the cdf using a monotonically convergent Newton iteration. It may seem surprising that we recommend starting the iteration from the same value regardless of the quantile required. Intuitively, a starting value that is closer to the required quantile might have been expected to be better. However using an initial approximation runs the risk of divergence, and convergence of Newton's method from the mode is so rapid that the potential advantage of a closer initial approximation is minimized. The statmod qinvgauss function is 40 times faster than the quantile functions in the rmutil or STAR packages, despite returning 16 rather than 6 figures of accuracy. It is also 3 times faster than SuppDists, even though SuppDists::qinvGauss is written in C, uses the same basic Newton strategy and has a less stringent stopping criterion. The starting values for Newton's method used by SuppDists::qinvGauss are actually closer to the final values than those used by statmod::qinvgauss, but the latter are more carefully chosen to achieve smooth convergence without backtracking. SuppDists::qinvGauss uses the log-normal approximation of [24] to start the Newton iteration and the STAR::qinvgauss uses the same approximation to setup the interval limits for uniroot. Unfortunately the log-normal approximation has much heavier tails than the IGD, meaning that the starting values are more extreme than the required quantiles and are therefore outside the domain of monotonic convergence.
As well as the efficiency gained by avoiding backtracking, monotonic convergence has the advantage that any change in sign of the Newton step is a symptom that the limits of floating point accuracy have been reached. In the statmod qinvgauss function, the Newton iteration is stopped if this change of sign occurs before the convergence criterion is achieved.
The current statmod functions could be made faster by reimplementing in C, but the pure R versions have benefits in terms of understandability and easy maintenance, and they are only slightly slower than comparable functions such as qchisq and qt.
This strategy used here to compute the quantile could be used for any continuous unimodal distribution, or for continuous distribution that can be transformed to be unimodal. Appendix: asymptotic right tail probabilities Here we derive an asymptotic expression for the right tail probability,p(q; µ, φ), when q is large. Without loss of generality, we will assume µ = 1. First, we drop the 1/x term in the exponent of the pdf (1), leading to: Integrating the pdf gives the right tail probability as: for q large. Transforming the variable of integration gives: Finally, we approximate the integral using for q large.