Generalized Hermite Distribution Modelling with the R Package hermite

The Generalized Hermite distribution (and the Hermite distribution as a particular case) is often used for fitting count data in the presence of overdispersion or multimodality. Despite this, to our knowledge, no standard software packages have implemented specific functions to compute basic probabilities and make simple statistical inference based on these distributions. We present here a set of computational tools that allows the user to face these difficulties by modelling with the Generalized Hermite distribution using the R package hermite. The package can also be used to generate random deviates from a Generalized Hermite distribution and to use basic functions to compute probabilities (density, cumulative density and quantile functions are available), to estimate parameters using the maximum likelihood method and to perform the likelihood ratio test for Poisson assumption against a Generalized Hermite alternative. In order to improve the density and quantile functions performance when the parameters are large, Edgeworth and Cornish-Fisher expansions have been used. Hermite regression is also a useful tool for modeling inflated count data, so its inclusion to a commonly used software like R will make this tool available to a wide range of potential users. Some examples of usage in several fields of application are also given.


Introduction
The Poisson distribution is without a doubt the most common when dealing with count data.There are several reasons for that, including the fact that the maximum likelihood estimate of the population mean is the sample mean and the property that this distribution is closed under convolutions (see Johnson et al. 2005).However, it is very common in practice that data presents overdispersion or zero inflation, cases where the Poisson assumption does not hold.In these situations it is reasonable to consider discrete distributions with more than one parameter.The class of all two-parameter discrete distributions closed under convolutions and satisfying that the sample mean is the maximum likelihood estimator of the population mean are characterized in Puig (2003).One of these families is just the Generalized Hermite distribution.Several generalizations of the Poisson distribution have been considered in the literature (see, for instance, Gurland 1957;Lukacs 1970;Kemp andKemp 1965, 1966), that are compound-Poisson or contagious distributions.They are families with probability generating function (PGF) defined by P(s) = exp(λ( f (s) − 1)) = exp(a 1 (s − 1) + a 2 (s 2 − 1) where f (s) is also a PGF and ∑ m i=1 a i = λ.Many well known discrete distributions are included in these families, like the Negative Binomial, Polya-Aeppli or the Neyman A distributions.The Generalized Hermite distribution was first introduced in Gupta and Jain (1974) as the situation where a m is significant compared to a 1 in (1), while all the other terms a i are negligible, resulting in the PGF P(s) = exp(a 1 (s − 1) + a m (s m − 1)). (2) After fixing the value of the positive integer m ≥ 2, the order or degree of the distribution, the domain of the parameters is a 1 > 0 and a m > 0. Note that when a m tends to zero, the distribution tends to a Poisson.Otherwise, when a 1 tends to zero it tends to m times a Poisson distribution.It is immediate to see that the PGF in (2) is the same than the PGF of X 1 + mX 2 , where X i are independent Poisson distributed random variables with population mean a 1 and a m respectively.From here, it is straightforward to calculate the population mean, variance, skewness and excess kurtosis of the Generalized Hermite distribution: A useful expression for the probability mass function of the Generalized Hermite distribution in terms of the population mean µ and the population index of dispersion d = σ 2 /µ is provided in Puig (2003).
The R Journal Vol.7/2, December 2015 ISSN 2073-4859 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0.0 PMF q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0.25 where Note that m can be expressed as m = d−1 1+log(p 0 )/µ .Because the denominator is a measure of zero inflation, m can be understood as an index of the relationship between the overdispersion and the zero inflation.
The probabilities can be also written in terms of the parameters a 1 , a m using the identities given in (3).
The case m = 2 in (2) is covered in detail in Kemp and Kemp (1965) and Kemp and Kemp (1966) and the resulting distribution is simply called Hermite distribution.In that case, the probability mass function, in terms of the parameters a 1 and a 2 , has the expression The probability mass function and the distribution function for some values of a 1 and a 2 are shown in Figure 1.Gupta and Jain (1974) also develop a recurrence relation that can be used to calculate the probabilities in a numerically efficient way: where p k = P(Y = k) and the first values can be computed as Although overdispersion or multimodality are common situations when dealing with count data and the Generalized Hermite distribution provides an appropriate framework to face these situations, the use of techniques based on this distribution was not easy in practice as they were not available in any standard statistical software.A description of the hermite package's main functionalities will be given in Section Package hermite.Several examples of application in different fields will be discussed in Section Examples, and finally some conclusions will be commented in Section Conclusions.

Package hermite
Like the common distributions in R, the package hermite implements the probability mass function (dhermite), the distribution function (phermite), the quantile function (qhermite) and a function for random generation (rhermite) for the Generalized Hermite distribution.It also includes the function glm.hermite, which allows to calculate, for a univariate sample of independent draws, the maximum likelihood estimates for the parameters and to perform the likelihood ratio test for a Poisson null hypothesis against a Generalized Hermite alternative.This function can also carry out Hermite regression including covariates for the population mean, in a very similar way to that of the well known R function glm.

Probability mass function
The probability mass function of the Generalized Hermite distribution is implemented in hermite through the function dhermite.A call to this function might be dhermite(x, a, b, m = 2) The R Journal Vol.7/2, December 2015 ISSN 2073-4859 The description of these arguments can be summarized as follows: • x: Vector of non-negative integer values.
• a: First parameter for the Hermite distribution.
• b: Second parameter for the Hermite distribution.
• m: Degree of the Generalized Hermite distribution.Its default value is 2, corresponding to the classical Hermite distribution introduced in Kemp and Kemp (1965).
The recurrence relation ( 6) is used by dhermite for the computation of probabilities.For large values of any parameter a or b (above 20), the probability of Y taking x counts is approximated using an Edgeworth expansion of the distribution function (7), i.e.P(Y = x) = F H (x) − F H (x − 1).The Edgeworth expansion does not guarantee positive values for the probabilities in the tails, so in case this approximation returns a negative probability, the probability is calculated by using the normal approximation P(Y = x) = Φ(x + ) − Φ(x − ) where Φ is the standard normal distribution function and The normal approximation is justified taking into account the representation of any Generalized Hermite random variable Y, as Y = X 1 + mX 2 where X i are independent Poisson distributed with population means a, b.Therefore, for large values of a or b, the Poissons are well approximated by normal distributions.

Distribution function
The distribution function of the Generalized Hermite distribution is implemented in hermite through the function phermite.A call to this function might be phermite(q, a, b, m = 2, lower.tail= TRUE) The description of these arguments can be summarized as follows: • q: Vector of non-negative integer quantiles.
All remaining arguments are defined as specified for dhermite.
If a and b are large enough (a or b > 20), X 1 and X 2 are approximated by N(a, √ a) and N(b, √ b) respectively, so Y can be approximated by a normal distribution with mean a + mb and variance a + m 2 b.This normal approximation is improved by means of an Edgeworth expansion (Barndorff-Nielsen and Cox, 1989), using the following expression where Φ and φ are the typified normal distribution and density functions respectively, He n (x) are the nth-degree probabilists' Hermite polynomials (Barndorff-Nielsen and Cox, 1989) x * is the typified continuous correction of x considered in Pace and Salvan (1997) and γ 1 and γ 2 are respectively the skewness and the excess kurtosis of Y expressed in (3).

Quantile function
The quantile function of the Generalized Hermite distribution is implemented in hermite through the function qhermite.The description of these arguments can be summarized as follows: • p: Vector of probabilities.
All remaining arguments are defined as specified for phermite.The quantile is right continuous: qhermite(p,a,b,m) is the smallest integer x such that P(Y ≤ x) ≥ p, where Y follows an m-th order Hermite distribution with parameters a and b.
When the parameters a or b are over 20, a Cornish-Fisher expansion is used (Barndorff-Nielsen and Cox, 1989) to approximate the quantile function.The Cornish-Fisher expansion uses the following expression where u p is the p quantile of the typified normal distribution.

Random generation
The random generation function rhermite uses the relationship between Poisson and Hermite distributions detailed in Sections Introduction and Probability mass function.A call to this function might be rhermite(n, a, b, m = 2) The description of these arguments can be summarized as follows: • n: Number of random values to return.
All remaining arguments are defined as specified for dhermite.

Maximum likelihood estimation and Hermite regression
Given a sample X = x 1 , . . ., x n of a population coming from a generalized Hermite distribution with mean µ, index of dispersion d and order m, the log-likelihood function is where The maximum likelihood equations do not always have a solution.This is due to the fact that this is not a regular family of distributions because its domain of parameters is not an open set.The following result gives a sufficient and necessary condition for the existence of such a solution (Puig, 2003): Proposition 1 Let x 1 , . . ., x n be a random sample from a generalized Hermite population with fixed m.Then, the maximum likelihood equations have a solution if and only if µ (m)   xm > 1, where x is the sample mean and µ (m)  is the m-th order sample factorial moment, If the likelihood equations do not have a solution, the maximum of the likelihood function ( 8) is attained at the border of the domain of parameters, that is, μ = x, d = 1 (Poisson distribution), or μ = x, d = m (m times a Poisson distribution).The case μ = x, d = m corresponds to the very improbable situation where all the observed values were multiples of m.Then, in general, when the condition of Proposition 1 is not satisfied, the maximum likelihood estimators are μ = x, d = 1.This means that the data is fitted assuming a Poisson distribution.
The package hermite allows to estimate the parameters µ and d given an univariate sample by means of the function glm.hermite: glm.hermite(formula, data, link = "log", start = NULL, m = NULL) The description of the arguments can be summarized as follows: • formula: Symbolic description of the model.A typical predictor has the form response ∼ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response.
The R Journal Vol.7/2, December 2015 ISSN 2073-4859 • data: An optional data frame containing the variables in the model.
• link: Character specification of the population mean link function: "log" or "identity".By default link = "log".
• start: A vector containing the starting values for the parameters of the specified model.Its default value is NULL.
• m: Value for parameter m.Its default value is NULL, and in that case it will be estimated as m, more details below.
The returned value is an object of class glm.hermite, which is a list including the following components: • coefs: The vector of coefficients.
• loglik: Log-likelihood of the fitted model.
• vcov: Covariance matrix of all coefficients in the model (derived from the Hessian returned by the maxLik() output).
• fitted.values:The fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
If the condition given in Proposition 1 is not met for a sample x, the glm.hermite function provides the maximum likelihood estimates μ = x and d = 1 and a warning message advising the user that the MLE equations have no solutions.
The function glm.hermite can also be used for Hermite regression as described below and as will be shown through practical examples in Sections Giles (2007) and DiGiorgio et al. (2004).
Covariates can be incorporated into the model in various ways (see, e.g., Giles 2007).In function glm.hermite, the distribution is specified in terms of the dispersion index and its mean, which is then related to explanatory variables as in linear regression or other generalized linear models.That is, for Hermite regression, we assume Y i follows a generalized Hermite distribution of order m, where we retain the dispersion index d (> 1) as a parameter to be estimated and let the mean µ i for the i-th observation vary as a function of the covariates for that observation, i.e., µ i = h(x i x i x i t β), where x i x i x i is a vector of covariates, t denotes the transpose vector, β is the corresponding vector of coefficients to be estimated and h is a link function.Note that because the dispersion index d is taken to be constant, this is a linear mean-variance (NB1) regression model.
The link function provides the relationship between the linear predictor and the mean of the distribution function.Although the log is the canonical link for count data as it ensures that all the fitted values are positive, the choice of the link function can be somewhat influenced by the context and data to be treated.For example, the identity link is the accepted standard in biodosimetry as there is no evidence that the increase of chromosomal aberration counts with dose is of exponential shape (IAEA, 2011).Therefore, function glm.hermite allows both link functions.
A consequence of using the identity-link is that the maximum likelihood estimate of the parameters obtained by maximizing the log-likelihood function of the corresponding model may lead to negative values for the mean.Therefore, in order to avoid negative values for the mean, constraints in the domain of the parameters must be included when the model is fitted.In function glm.hermite, this is carried out by using the function maxLik from package maxLik (Henningsen and Toomet, 2011), which is used internally for maximizing the corresponding log-likelihood function.This function allows constraints which are needed when the identity-link is used.
It should also be noted that results may depend of the starting values provided to the optimization routines.If no starting values are supplied, the starting values for the coefficients are computed/fixed internally.Specifically, when the log-link is specified, the starting values are obtained by fitting a standard Poisson regression model through a call to the internal function glm.fit from package stats.If the link function is the identity and no initial values are provided by the user, the function takes 1 as initial value for the coefficients.In both cases, the initial value for the dispersion index d is taken to be 1.1.
Regarding the order of the Hermite distribution, it can be fixed by the user.If it is not provided (default option), when the model includes covariates, the order m is selected by discretized maximum likelihood method, fitting the coefficients for each value of m between 1 (Poisson) and 10, and selecting the case that maximizes the likelihood.In addition, if no covariates are included in the model and no initial values are supplied by the user, the naïve estimate m = s 2 / x−1 1+log(p0)/ x , where p0 is the proportion The R Journal Vol.7/2, December 2015 ISSN 2073-4859 of zeros in the sample is also considered.In the unlikely case the function returns m = 10, we recommend to check the likelihood of the next orders (m = 11, 12, . ..) fixing this parameter in the function until a local minimum is found.
When dealing with the Generalized Hermite distribution it seems natural to wonder if data could be fitted by using a Poisson distribution.Because the Poisson distribution is included in the Generalized Hermite family, this is equivalent to test the null hypothesis H 0 : d = 1 against the alternative H 1 : d > 1.To do this, an immediate solution is to use the likelihood ratio test, which test-statistic is given by W = 2 l X; μ, d − l (X; μ, 1) , where l is the log-likelihood function.
Under the null hypothesis W is not asymptotically χ 2 1 distributed as usual, because d = 1 is on the border of the domain of the parameters.Using the results of Self and Liang (1987) and Geyer (1994) it can be shown that in this case the asymptotic distribution of W is a 50:50 mixture of a zero constant and a χ 2 1 distribution.The α percentile for this mixture is the same as the 2α upper tail percentile for a χ 2 1 (Puig, 2003).The likelihood ratio test is also performed through glm.hermite function, using the maximum likelihood estimates μ and d.
A summary method for objects of class glm.hermite is included in the hermite package, giving a summary of relevant information, including the residuals minimum, maximum, median and first and third quartiles, the table of coefficients including the corresponding standard errors and significance tests based on the Normal reference distribution for regression coefficients and the likelihood ratio test against the Poisson distribution for the dispersion index.The AIC value for the proposed model is also reported.

Examples
Several examples of application of the package hermite in a wide range of contexts are discussed in this section, including classical and recent real datasets and simulated data.

Hartenstein (1961)
This example by Hartenstein (1961)  This dataset was analyzed in Puig (2003) with a Generalized Hermite distribution of order m = 3.The maximum likelihood estimation gave a mean of μ = x = 0.75, and an index of dispersion of d = 1.8906.
Using glm.hermite we calculate the parameter estimates: We can see that these parameter estimates are equivalent to those reported in Puig (2003).
The estimated expected frequencies are shown in  The frequencies in Table 2 have been obtained running the following code and using the transfor- The R Journal Vol.7/2, December 2015 ISSN 2073-4859 > exp <-round(dhermite(seq(0,5,1), a, b, m = 3)*200,2) Note that the null hypothesis of Poisson distributed data is strongly rejected, with a likelihood ratio test statistic W = 48.66494and its corresponding p-value= 1.518232e − 12, as we can see with

Giles (2010)
In Giles (2010), the author explores an interesting application of the classical Hermite distribution (m = 2) in an economic field.In particular, he proposes a model for the number of currency and banking crises.The reported maximum likelihood estimates for the parameters were â = 0.936 and b = 0.5355, slightly different from those obtained using glm.hermite, which are â = 0.910 and b = 0.557.The actual and estimated expected counts under Hermite and Poisson distribution assumptions are shown in Table 3  In this example, the likelihood ratio test clearly rejects the Poisson assumption in favor of the Hermite distribution (W = 40.08,p-value=1.22e− 10).

Giles (2007)
In Giles (2007), the author proposes an application of Hermite regression to the 965 number 1 hits on the Hot 100 chart over the period January 1955 to December 2003.The data were compiled and treated with different approaches by Giles (see Giles 2006 for instance), and is available for download at the author website http://web.uvic.ca/~dgiles/.For all recordings that reach the number one spot, the number of weeks that it stays at number one was recorded.The data also allow for reentry into the number one spot after having being relegated to a lower position in the chart.The actual and predicted counts under Poisson and Hermite distributions are shown in Table 4.
Several dummy covariates were also recorded, including indicators of whether the recording was by Elvis Presley or not, the artist was a solo female, the recording was purely instrumental and whether the recording topped the charts in nonconsecutive weeks.

DiGiorgio et al. (2004)
In diGiorgio et al. ( 2004) the authors perform an experimental simulation of in vitro whole body irradiation for high-LET radiation exposure, where peripheral blood samples were exposed to 10 different doses of 1480MeV oxygen ions.For each dose, the number of dicentrics chromosomes per blood cell were scored.The corresponding data is included in the package hermite, and can be loaded into the R session by

> data(hi_let)
In Puig and Barquinero (2011) the authors apply Hermite regression (to contrast the Poisson assumption) for fitting the dose-response curve, i.e. the yield of dicentrics per cell as a quadratic function of the absorbed dose linked by the identity function (which is commonly used in biodosimetry).This model can be fitted using the glm.hermite function in the following way: > fit.hlet.id<-glm.hermite(Dic~Dose+Dose2-1, data = hi_let, link = "identity") Note that the model defined in fit.hlet.idhas no intercept.
A summary of the most relevant information can be obtained using the summary() method as in We can see the maximum likelihood estimates and corresponding standard errors in the output from the summary output.Note also that the likelihood ratio test rejects the Poisson assumption (W = 107.82,p-value=1.47e− 25).

Higueras et al. (2015)
In the first example in Higueras et al. (2015) the Bayesian estimation of the absorbed dose by Cobalt-60 gamma rays after the in vitro irradiation of a sample of blood cells is given by a density proportional to the probability mass function of a Hermite distribution taking 102 counts whose mean and variance are functions of the dose x, respectively µ(x) = 45.939x 2 + 5.661x and v(x) = 8.913x 4 − 22.553x 3 + 69.571x 2 + 5.661x.
The reparametrization in terms of a and b as a function of the dose is given by the transformation This density only makes sense when a(x) and b(x) are positive.The dose x > 0 and consequently b(x) is always positive and a(x) is positive for x < 3.337.Therefore, the probability density outside (0, 3.337) is 0.
The following code generates the plot of the resulting density,

Random number generation
A vector of random numbers following a Generalized Hermite distribution can be obtained by means of the function rhermite.This generates a multimodal distribution, as can be seen in Figure 3.The probability that this distribution take a value under 5 can be computed using the function phermite: In this case, we have µ (4) z4 = 0.987 and therefore the condition of Proposition 1 is not met.

Conclusions
Hermite distributions can be useful for modeling count data that presents multi-modality or overdispersion, situations that appear commonly in practice in many fields.In this article we present the computational tools that allow to overcome these difficulties by means of the Generalized Hermite distribution (and the classical Hermite distribution as a particular case) compiled as an R package.
The hermite package also allows the user to perform the likelihood ratio test for Poisson assumption and to estimate parameters using the maximum likelihood method.Hermite regression is also a useful tool for modeling inflated count data, and it can be carried out by the hermite package in a flexible framework and including covariates.Currently, the hermite package is also used by the radir package (Moriña et al., 2015) that implements an innovative Bayesian method for radiation biodosimetry introduced in Higueras et al. (2015).

Figure 1 :
Figure 1: Hermite probability mass and distribution functions for the indicated parameter values.

Table 1 :
describes the counts of Collenbola microarthropods in 200 samples of forest soil.The frequency distribution is shown in Table1.Frequency distribution of Collenbola microarthropods.

Table 3 :
Observed and expected frequency distributions of currency and banking crises.

Table 4 :
For instance, we can obtain the predicted value for the average number of weeks that an Elvis record hits the number one spot.According to the model, we obtain a predicted value of 3.9370, whereas the Observed and expected frequency distribution of Hot 100 data.observed corresponding value is 3.9375.The likelihood ratio test result justifies the fitting through a Hermite regression model instead of a Poisson model: