DGLMExtPois: Advances in Dealing with Over and Under-dispersion in a Double GLM Framework

In recent years the use of regression models for under-dispersed count data, such as COM-Poisson or hyper-Poisson models, has increased. In this paper the DGLMExtPois package is presented. DGLMExtPois includes a new procedure to estimate the coefficients of a hyper-Poisson regression model within a GLM framework. The estimation process uses a gradient-based algorithm to solve a nonlinear constrained optimization problem. The package also provides an implementation of the COM-Poisson model, proposed by Huang (2017), to make it easy to compare both models. The functionality of the package is illustrated by fitting a model to a real dataset. Furthermore, an experimental comparison is made with other related packages, although none of these packages allow you to fit a hyper-Poisson model.


Introduction
The Poisson regression model is the most common framework for modeling count data, such as the number of accidents on a stretch of road, the number of visits to the doctor, the number of recreational or shopping trips, etc (Cameron and Trivedi, 2013;Hilbe, 2011;Winkelmann, 2008).This model is constrained by its equi-dispersion assumption, that is, the mean is equal to the variance.However, this constraint is not usually met in real applications due to the existence of over-dispersion (the variance is greater than the mean) or under-dispersion (the variance is less than the mean).As over-dispersion is more common than under-dispersion, the former has received more attention from the research community.
Although the number of probabilistic models for count data affected by under-dispersion is not as high as that for over-dispersion, in recent years a significant effort has been made to provide suitable alternatives for applied researches to fit the data when the variance is below the mean.Generalized Poisson (Consul and Famoye, 1992;Wang and Famoye, 1997;Famoye et al., 2004;Zamani and Ismail, 2012;Grover et al., 2015;Sellers and Morris, 2017), Double Poisson (Efron, 1986;Zou et al., 2013;Sellers and Morris, 2017), Gamma count (Winkelmann, 1995;Oh et al., 2006;Lord et al., 2010;Daniels et al., 2010Daniels et al., , 2011;;Zeviani et al., 2014;Sellers and Morris, 2017), discrete Weibull (McShane et al., 2008;Ong et al., 2015;Klakattawi et al., 2018;Yoo, 2019) or Extended Poisson Process (Faddy and Smith, 2011;Smith and Faddy, 2016) are examples of models that may facilitate, in a regression context, a flexible dispersion structure wherein the Poisson assumption of equi-dispersion cannot be admitted because of the existence of factors that determine an excess or a shortfall of variability.In any case, Conway-Maxwell-Poisson, from now on CMP, is probably the most widely used model to fit data in the presence of under-dispersion.Sellers and Shmueli (2010) presented a formulation of CMP in a regression context and demonstrated that the model could provide good fits to data not only affected by over-dispersion but also by under-dispersion, in the presence of covariates.The number of subsequent papers applying the model in different contexts is noteworthy (Sellers et al., 2012;Sellers and Morris, 2017;Chatla and Shmueli, 2018;Abdella et al., 2019).
Nevertheless, the use of this CMP model has also been subject to criticism because of some limitations.First, there are some convergence problems in the calculation of the normalizing constant, the mean and the variance (since there are no closed expressions for them) (Francis et al., 2012).Second, and this is in our opinion its most important drawback, the original implementation of Sellers and Shmueli (2010) and, therefore, all other applications based on it, consider a log-linear relationship between the covariates and the location parameter instead of the mean, so that the regression coefficients cannot be interpreted in terms of the effect size.In fact, some authors have considered a reparameterization looking for a central tendency parameter (Guikema and Coffelt, 2008;Lord et al., 2010;Francis et al., 2012;Chanialidis et al., 2018), although it is not really the mean.
Fortunately, Huang (2017) has recently implemented a new CMP parametrization where covariates are log-linearly related to the mean and the dispersion parameter, which has been continued in subsequent works (Huang and Kim, 2019;Ribeiro et al., 2019;Forthmann et al., 2019).The mpcmp package is based on the work of Huang (2017).The hyper-Poisson regression model (Saez-Castillo and Conde-Sanchez, 2013;Khazraee et al., 2015;Sáez-Castillo and Conde-Sánchez, 2017), hereafter hP, provided an alternative that also presented both characteristics: on the one hand, covariates are introduced in the equation of the mean, commonly using a log-linear link, with regression The R Journal Vol.14/4, December 2022 ISSN 2073-4859 The remainder of this paper is structured as follows.First, we theoretically describe the hP µ,γ model and the main estimation results for fitting data.Second, we do the same for the CMP θ,ν and CMP µ,ν models.After that, we describe the DGLMExtPois package, illustrating the fit of a real dataset.Next, we include a comparison of fits of hP µ,γ , CMP θ,ν and CMP µ,ν models with different well-known datasets.Finally, we carry out a simulation to assess the performance of the estimates and standard errors of the hP µ,γ model and discuss the optimization process.

The hP µ,γ model
Let us consider Y as count variable of a hyper-Poisson distribution, whose probability mass function is (Saez-Castillo and Conde-Sanchez, 2013) where and (γ) y = Γ(γ + y)/Γ(γ).The parameter γ is known as the dispersion parameter because for γ > 1 the distribution is over-dispersed and for γ < 1 it is under-dispersed (for γ = 1 we have the Poisson distribution).
The hP distribution belongs to the exponential family only for a given γ value.This justifies that the hP µ,γ model can be considered as a member of the GLM family.Moreover, since the model also allows you to introduce covariates in the dispersion parameter, we refer to it as double GLM , although we highlight that the hP distribution does not properly belong to the exponential family for unknown γ.
The previous definition of the hP µ,γ model (Saez-Castillo and Conde-Sanchez, 2013) was based on an alternative equation, different from (2).Concretely, the authors proposed a closed equation that involves the mean, the variance and the normalizing constant.The substitution of the previous equation by equation ( 2) and the change of the optimization algorithm have been the key points in the great reduction of the computational time needed to fit real data provided by the DGLMExtPois package.
To this end, we consider maximum likelihood for the estimation of the regression coefficients.Letting (Y i , X 1i , . . ., X q 1 i , Z 1i , . . ., Z q 2 i ), i = 1, ..., n, be a random sample, the log-likelihood is given by being x ′ i and z ′ i the ith row of the design matrices and each λ i = λ (µ i , γ i ) is a solution of (2) for a given µ i and γ i .This converts the maximum likelihood procedure into a nonlinear constrained optimization problem where the objective function is (n + q 1 + q 2 + 2)-dimensional, depending on n nuisance λ i parameters, q 1 + 1 regression coefficients for the mean, β, and q 2 + 1 regression coefficients for the dispersion parameter, δ, and it is constrained by n non-linear equations from (2).We have implemented it using a sequential quadratic programming (SQP) gradient-based algorithm (Kraft, 1994); concretely, the NLOPT_LD_SLSQP algorithm included in the nloptr package (Ypma et al., 2022).
In the implementation, it was necessary to compute the gradient of the objective function and the gradient of restriction (2), in which the reparameterization of λ ′ = log λ has been considered to ensure the positivity of λ.These expressions have been included in the appendix.
Moreover, the following results are verified: Result 1 For a sample Y 1 , . . ., Y n from a hyper-Poisson the score function with respect to β is where V (µ i , γ i ) is the variance of an hP with a probability mass function given by (1), which depends on µ i and γ i .This way, the MLE of β can be characterized as the solution to the usual GLM score equations.
Result 2 For a sample of size n from a hyper-Poisson it is verified that where ψ() is the digamma function.
The proof is included in the appendix.We have obtained estimates of the standard errors of the maximum likelihood estimators of β and δ based on Cramer-Rao bound, which requires a closed expression of the Fisher information matrix.
The R Journal Vol.14/4, December 2022 ISSN 2073-4859 Result 3 The Fisher information matrix in the sample can be approximated by are evaluated in the maximum likelihood estimates, β and δ.The proof is included in the appendix.
Standard error estimates would appear in the diagonal of the inverse of I n β, δ .We have not experienced any difficulties in this computation, nor in the inversion of ∑ n i=1 E S β S ′ β .In contrast, ∑ n i=1 E S δ S ′ δ is sometimes nearly singular, so the standard errors of δ are not always estimated and, sometimes, are quite large.That is the reason why we recommend assessing the significance of covariates included in Z using the likelihood ratio test (LRT) instead of the Wald test, since it requires a convenient estimation of δ standard errors.
3 The CMP θ,ν and CMP µ,ν models The probabilities of the CMP distribution are given by with Here, ν < 1 corresponds to over-dispersion and ν > 1 to under-dispersion.The distribution belongs to the bi-parametric exponential family (Huang, 2017).In this distribution, θ is a location parameter with a similar role to λ for the hP distribution.It is worth emphasizing that θ is not a mean, see for example Francis et al. (2012) for the exact relationship between θ and µ.
If Y is again the response count variable and X and Z are defined as above, then we denote as CMP θ,ν the model wherein log θ = Xβ and log ν = Zδ.This is the model with constant ν defined by Sellers and Shmueli (2010), later extended to consider covariates in the ν equation (Sellers et al., 2012;Chatla and Shmueli, 2018).In contrast, the model defined by Huang (2017) is given as where additionally equation (2) must be verified for each observation, and the probabilities in (6).We denote this model as CMP µ,ν .
Estimation and inference for the CMP θ,ν is described, for example, in Sellers and Shmueli (2010).The COMPoissonReg package (Sellers et al., 2018) provides adequate methods and functions to estimate such a model.Huang (2017) also provides theoretical results describing the procedure to obtain estimates of β and δ using maximum-likelihood and related inference results for CMP µ,ν , which are included in the mpcmp package.

The DGLMExtPois package
At present, packages such as COMPoissonReg provide what a practitioner commonly requires to fit real data.Our main criticism, as we have mentioned in the introduction section, is that β regression coefficients cannot be interpreted in terms of the effect size.In this sense, mpcmp is, in our opinion, a better proposal for practitioners looking for a clearer interpretation of the β regression coefficients in relation to the effect of each explanatory variable on the mean of the response variable.
In this context, what the DGLMExtPois package provides is, first, a much faster implementation of the hP µ,γ model than the previous existing R code and, second, our own implementation of CMP µ,ν to facilitate the comparison of the models.
DGLMExtPois has been created by trying to reproduce the syntax of GLM fits.Thus, functions glm.hP and glm.CMP provide the corresponding fits for hP µ,γ and CMP µ,ν , respectively.There are also print and summary methods to visualize the fitted models and AIC, confint, predict, residuals and plot methods to evaluate the goodness of fit, obtain confidence intervals of β regression coefficients, predictions, residuals and some associated plots, such as QQ-plots and simulated envelopes.The package additionally incorporates expected, a function which calculates the marginal probabilities of the Y variable, and lrtest to get the likelihood ratio test in nested models.Other functions are provided, for example, to work with probabilities of CMP and hP distributions.A brief description of the main functions in the package is given in Table 2.

Installation
As usual, you can install the DGLMExtPois package from CRAN with the code:

install.packages("DGLMExtPois")
You may encounter problems on Windows operating systems if you are not logged in with administrator rights due to the strong dependency on the nloptr package.

Numerical examples
In this section the main methods and functions in the package are illustrated by fitting a real dataset.Also, several well-known datasets in regression on count variables have been used to compare COMPoissonReg and DGLMExtPois in terms of goodness of fit, computational times and reliability.Finally, some guidelines are given on the optimization process for estimating the models.

Example of application
The dataset originates from research on customer profiles for a household supplies company.An observation corresponds to census tracts within 10-mile radius around a certain store.variable is the number of customers of each census tract who visit the store and the explanatory variables are related to relevant demographic information for each tract.The dataset was analyzed in Neter et al. (1996) as an example of a Poisson regression model, since it is a classic instance of equidispersed count data.They consider the number of housing units (nhu), the average income in dollars (aid), the average housing unit age in years (aha), the distance to the nearest competitor in miles (dnc) and the distance to store in miles (ds) as covariates.Here, we consider this dataset to illustrate the performance of DGLMExtPois fitting models with all the explanatory variables in the mean equation and one of them, dnc, in the dispersion parameter equation.The dataset is also analyzed in (Bonat et al., 2018) considering the Extended Poisson-Tweedie distribution, but they also obtain an equidispersed distribution.The functions glm.hP and glm.CMP return S3 objects of classes "glm_hP" and "glm_CMP", respectively, with information on the fitted models.The most suitable model according to AIC is Poisson.The fits for these models, shown in Table 3, indicate that there is equidispersion.In fact, if the LRT is carried out to determine whether the dispersion parameter is equal to 1 (Poisson model), the hypothesis is not rejected: However, if we introduce covariates in the dispersion parameter we find that the equidispersion hypothesis is rejected, hence the importance of being able to include variables that determine a different variability in the model.We have checked that none of the explanatory variables lead to a statistically significant improvement in the likelihood of the CMP µ,ν model when they are introduced as covariates for the dispersion parameter.That is why, from now on, we concentrate on how to use the package with the hP µ,γ model.The Wald test assesses whether the dnc variable is significant or not for the dispersion parameter.In any case, as previously mentioned, we also used the LRT to confirm the adequacy of hp1 versus the hp0 model, with a significance level of 0.05, resulting in the rejection of the constant dispersion hypothesis: lrtest(hp0, hp1) Statistics: 5.668851 Degrees of freedom: 1 p-value: 0.01726877 Since the dispersion parameter in the hp1 model depends on dnc, its values may determine overor under-dispersion.It is interesting to note that low values of the dnc variable show over-dispersion while high values show under-dispersion.Specifically, the point at which the change occurs is 1.7069.There are 24 out of the 110 cases with γ i > 1, and thus over-dispersed, underlying hP(µ i , γ i ) distributions, see Figure 1.The plot suggests a certain correlation between under-dispersion and high response variable values.The plot method provides two classical plots for the model diagnostics: residuals against fitted values and a normal QQ-Plot.In relation to the residuals, Pearson, deviance and response residuals are offered, although here we only calculate the Pearson type (top row of Figure 2).The package also provides the possibility to compare in a barplot the observed and expected marginal frequencies of the response variable with the hP_expected and the CMP_expected functions (Figure 2c).Finally, the diagnostic can be completed by a simulated envelope for the residuals to assess the accuracy of the model on the entire dataset (Atkinson, 1981;Garay et al., 2011).The residuals method provides this diagnostic tool, and in Figure 2d the cases for which the model assumptions are rejected are highlighted in red.In this case there are 5 observations that fall outside the envelope, therefore there is no evidence against the adequacy of the model.paste(expression(chi ^2), "=", round(expec_hp$chi2, 2))), pos = 4) # Envelope r1 <-residuals(hp1, envelope = TRUE) envelope_down <-apply(r1$sim_residuals, 1, min) envelope_up <-apply(r1$sim_residuals, 1, max) weirds <-which(r1$residuals < envelope_down | r1$residuals > envelope_up) score_weirds <-qnorm(weirds / 111) points(score_weirds, r1$residuals[weirds], col = "red")

Performance comparison
In this section eight datasets, previously used in different works, have been used to compare the performance of fitting hP µ,γ and CMP µ,ν models with the DGLMExtPois package (version 0.2.1, with nloptr version 2.0.0),CMP θ,ν models with the COMPoissonReg package (version 0.7.0) and CMP µ,ν models with the mpcmp package (version 0.3.6).We think that the selected datasets cover a wide range of examples, with low, medium and large sample sizes, and under-and over-dispersion in their variability structure.
For each dataset we have tried to fit hP µ,γ , CMP µ,ν and CMP θ,ν models, and we have computed some goodness of fit criteria (Lord et al., 2010), such as AIC, mean prediction bias (MPB), mean absolute deviance (MAD), mean squared predictive error (MSPE) and the Pearson statistic.Additionally, we have calculated the time needed to fit the models with these R packages.The experimentation has been carried out on a computer equipped with an Intel Core i7 processor (3.20GHz) and 64 GB of RAM.Table 4 shows the AICs and the computational times obtained by the different models for the different datasets.The first column of the table lists the names of the datasets and the second column lists the number of observations of each dataset.The best AICs results are highlighted in bold.Unfortunately, the COMPoissonReg and mpcmp packages crash with some datasets, which is indicated by a dash in the table.In relation to the crashes, we clarify that all the fits have been carried out with the default optional arguments.To obtain a better approximation of the execution time needed to estimate a model, we have estimated each model 10 times and computed the average time required for the estimate.These average times are presented in Table 4 as follows.A value of 1.0 means the model has the fastest average estimate time and this fastest average time, in seconds, is shown in the last column.A value of, e.g., 41.4 means the average estimate time of the model is 41.4 times larger than the average of the fastest.Table 5 shows the results of the hP µ,γ and CMP µ,ν models, fitted with the DGLMExtPois package, according to other goodness of fit criteria.Next, the datasets used in the comparison are described, together with some comments about the fits for the different models:

AIC
• Takeover bids.The response variable is the number of bids received by 126 US firms that were successful targets of tender offers between 1978-1985.We consider 9 explanatory variables on the defensive actions taken by the management of the target firm, firm-specific characteristics and the intervention taken for both the mean and the dispersion parameter with a log-linear link.The main difference between the fits of hP µ,γ and CMP µ,ν is that the former identifies 108 out of 126 of the cases as being under-dispersed, whereas the latter identifies only 88.In general, the goodness of fit measures are quite similar, being somewhat smaller for the hP µ,γ model.In addition, the CMP µ,ν model obtained with the DGLMExtPois package is slower, but with a lower AIC, than the one obtained with the mpcmp package.
• Days absent.The sample refers to 314 students sampled from two urban high schools.The response variable is the number of days absent from high school, and the explanatory variables for both the mean and the dispersion parameter are gender, math's score (standardized score out of 100) and academic program (General, Academic and Vocational).The dataset has been analyzed, among others, by Huang (2017).The COMPoissonReg package crashes again with the CMP θ,ν model, as does the mpcmp package with the CMP µ,ν model.CMP µ,ν achieves a slightly better AIC than hP µ,γ , but again requires more computational time and the goodness of fit measures are quite similar.In relation to the dispersion structure, both models identify all the cases as being over-dispersed.
• Cotton bolls.Here, the response variable is the number of bolls produced by 125 cotton plants.
The explanatory variables, again for both the mean and the dispersion parameter, are the growth stage (vegetative, flower-bud, blossom, fig and cotton boll), and the defoliation level (with 5 levels).The model also includes the square of the defoliation level.It reproduces the analysis carried out by Zeviani et al. (2014).This is one of the cases where COMPoissonReg provides a CMP θ,ν fit, although the one provided by the CMP µ,ν model obtains a lower AIC (very similar to both the DLGMExtPois and mpcmp packages).Here hP µ,γ performs worse and obtains a higher AIC.Another interesting point in relation to the CMP µ,ν fit is that, regardless of the strong underdispersion of the complete dataset, the introduction of covariates in the dispersion parameter allows us to identify 5 cases that are actually described with over-dispersed distributions.Note also that hP µ,γ is the fastest approach.
• Korea crashes.The dataset contains crash count data, which is the response variable, collected at 162 railway-highway crossings in Korea.The explanatory variables include the average daily vehicle traffic (in logarithmic scale) and 7 characteristics of the crossing.See, for example, Khazraee et al. (2015) for more details.For this dataset both the COMPoissonReg and mpcmp packages crash.The CMP µ,ν fit obtains the best AIC, although the rest of the goodness of fit measures are quite similar to those provided by the hP µ,γ fit.Once the effect of the covariates is taken into account, both fits detect over 50 cases with under-dispersed underlying distributions.
• Toronto crashes.The dataset provides Toronto intersection crash data.The response variable is the number of crashes at 868 intersections, and the explanatory variables for the mean and the dispersion parameter are the average annual daily traffic in the major and minor approaches to the intersection.These data have been widely used as an example of over-dispersed crash data.See, for example, Khazraee et al. (2015).Whereas COMPoissonReg once again leads to a crash in the fit of a CMP θ,ν model, CMP µ,ν and hP µ,γ provide similar fits in terms of goodness of fit measures.The CMP µ,ν model obtained with the DGLMExtPois package has a slightly lower AIC than the model obtained with the mpcmp package.In this case, the mpcmp package is the fastest estimating the model.There are 2 under-dispersed cases for the hP µ,γ fit.
• Insurance claims.Data consist of the numbers of policyholders of an insurance company who were exposed to risk, and the number of car insurance claims made by those policyholders in the third quarter of 1973.The response variable is the number of claims, with the number of policyholders being an offset for the mean.Apart from this offset, there are 3 explanatory The R Journal Vol.14/4, December 2022 ISSN 2073-4859 variables for the mean and the dispersion parameter: district of residence of the policyholder, car group and an ordered factor with the age of the insured in 4 groups.The CMP µ,ν model of the mpcmp package obtains the best AIC, but it is the slowest in terms of computational time.
On this occasion the COMPoissonReg package does not fail and is the fastest, however its AIC is clearly the highest among the estimated models.Most of the cases show under-dispersion (52 and 44 for hP µ,γ and CMP µ,ν respectively).
• Credit card reports.The response variable is the number of major derogatory reports.Three explanatory variables are considered: age in years plus twelfths of a year, yearly income (in USD 10,000) and average monthly credit card expenditure.For this dataset the only model that can be estimated is hP µ,γ .This is a case of over-dispersed data, but as a result of introducing covariates in the dispersion parameter, 5 under-dispersed cases appear in the hP µ,γ model.In any case a negative binomial model fits better due to the strong over-dispersion of data.
• Children.In this dataset the response variable is the number of children a woman has.The explanatory variables are: age of the woman in years, years of education, nationality of the woman (German or not), belief in God (with 6 categories) and attended university (yes or no).This is an example of equidispersed data (Tutz, 2011), although our models detected underdispersion (1164 and 1238 cases for hP µ,γ and CMP µ,ν respectively).In this case the mpcmp package provides a very fast CMP µ,ν fit, but with a slightly higher AIC.
In summary, the hP µ,γ model is the only one that can provide fits for all the datasets.Regarding the AIC, the CMP µ,ν model seems to be better than the hP µ,γ model, although the latter usually requires less estimation time.As for the comparison of the CMP µ,ν model in the DGLMExtPois and mpcmp packages, the CMP µ,ν model of the mpcmp package fails in more datasets and provides better results in terms of AIC for only one dataset.

Simulation
In order to assess the performance of the estimates and standard errors in finite samples of the hP µ,γ model a simulation has been performed.The simulation process consists of the following steps: 1. 500 values of two covariates X and Z are generated from a uniform distribution on the interval (0, 1).
3. The λ parameter is obtained as solution of (2) by means of the base R optimize function.
4. Each value of Y in the sample is randomly generated using the rhP function.
5. Once the dataset (Y, X, Z) is available, the coefficients of the hP µ,γ model are estimated.
This process was repeated 1000 times and the average of the estimates and standard errors, as well as the standard deviation of the estimates, were calculated.The standard deviation of the estimates allows us to check whether the standard errors have been calculated correctly, while the average of the estimates allows us to determine whether there is a bias in the estimation of the coefficients.The results obtained for different values of the coefficients are shown in Table 6.Comb.I: δ = (0.5, −1) 0.461 0.4237 0.4232 -1.041 0.8408 0.8164 Comb.II: δ = (0.5, 0.25) 0.456 0.3834 0.3839 0.273 0.6625 0.6567 Comb.III: δ = (0, −0.5) -0.058 0.4943 0.4866 -0.531 0.9329 0.8980 Table 6: Simulation process for hP µ,γ models.For each combination of coefficients the mean and standard deviation (sd) of the estimates were calculated, as well as the mean of the standard errors (s.e.).
The R Journal Vol.14/4, December 2022 ISSN 2073-4859 The first combination of coefficients (β = (1, 0.5) and δ = (0.5, −1)) allows for under-and overdispersion situations.The coefficients in the second combination (β = (1, 0.5) and δ = (0.5, 0.25)) determine a model with over-dispersion only.And finally, the third combination of coefficients (β = (1, 0.5) and δ = (0, −0.5)) corresponds to an under-dispersed model.It can be seen that the estimates of the β coefficients are practically unbiased and very precise, while those of the δ coefficients are more biased and much less precise.In any case the mean standard errors are very close to the standard deviations of the 1000 estimates, so the calculation of these standard errors seem to be correct.
A similar simulation has been performed for the CMP µ,ν model in the DGLMExtPois package, and the results are shown in Table 7.It can be seen that, although the estimates of the δ coefficients are still less precise than those of the β coefficients, these estimates are less biased and more precise than those of the hP µ,γ model.In this sense, the CMP µ,ν model leads to better estimates than the hyper-Poisson model.7: Simulation for CMP µ,ν models.For each combination of coefficients the mean and standard deviation (sd) of the estimates were calculated, as well as the mean of the standard errors (s.e.).

The optimization process
As previously mentioned, the estimation of hP µ,γ and CMP µ,ν models in the DGLMExtPois package was performed using the SLSQP optimizer included in the nloptr package.Because the SLSQP algorithm uses dense-matrix methods, it requires O(m 2 ) storage and O(m 3 ) time in m dimensions (where m is the number of estimated parameters, i.e., the number of observations plus the number of model coefficients), which makes it less practical for optimizing more than a few thousand parameters.Table 8 shows the datasets from the above performance comparison sorted by the dimension of the objective function when fitting an hP µ,γ model.Also, the average estimation time for 10 estimations and the number of iterations taken by the optimizer are listed.Clearly, the dimension of the objective function has the greatest impact on optimization time, while the number of iterations plays a less important role.The nloptr optimization function from the nloptr package was used to fit the models.This function allows several termination conditions for the optimization process, of which the DGLMExtPois package uses, by default, the following: The R Journal Vol.14/4, December 2022 ISSN 2073-4859 • A maximum number of iterations of 1000 and, • a fractional tolerance (xtol_rel) is used in the parameters x, so that the optimizer stops when |∆x i |/|x i | < xtol_rel, with a default value for xtol_rel of 0.01.
Information on the optimization process can be obtained through the component nloptr, of class "nloptr", included in the S3 object returned by the glm.hP function: library(mpcmp) formula <-daysabs ~gender + math + prog fit <-glm.hP(formula,formula, data = attendance) fit$nloptr$message # termination condition [1] "NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached."fit$nloptr$iterations # number of iterations [1] 558 The user can modify the termination conditions.For example, in the following code the second call to glm.hP removes the termination condition on fractional tolerance used in the parameters and set the maximum number of iterations: fit1 <-glm.hP(formula,formula, data = attendance) # default termination conditions AIC(fit1) [1]

Conclusion
In this paper we have presented DGLMExtPois, the only package on CRAN that allows fitting count data using the hyper-Poisson model.This model is able to fit data with over-or under-dispersion, or both for different levels of covariates.Additionally, DGLMExtPois can also fit COM-Poisson models, considering covariates in the mean and the dispersion parameter.The mpcmp package also presents this feature, so the results obtained by both packages can be compared.We have found that the mpcmp The R Journal Vol.14/4, December 2022 ISSN 2073-4859 package fails to fit some datasets and the fits obtained are, in general, somewhat worse than those obtained with the DGLMExtPois package.The COMPoissonReg package allows covariates in the location parameter, but it models a location parameter instead of the mean.In addition, this latter package fails to fit many of the datasets used in the experimentation.
The two models implemented by the DGLMExtPois package achieved similar results in the experimentation, with the COM-Poisson model obtaining, in general, slightly lower AICs and needing more computational time to fit the models.We have also verified, by means of a simulation, that the COM-Poisson model is more accurate in estimating the coefficients of the covariates included in the dispersion parameter.But, overall, it cannot be said that one model is clearly better than the other.Thus, two competitive models are available to deal with count data.As shown in the example, the possibility of including covariates in the dispersion parameter is suitable for situations where there may be both over-and under-dispersion for different levels of the covariates, which makes these models especially attractive compared to those where the dispersion is constant.In fact, in the example with a constant dispersion parameter, equidispersion was obtained, whereas if covariates are considered in the dispersion parameter over-or under-dispersion can be found.
It should also be noted that the algorithm used to estimate the parameters of the hyper-Poisson model is much faster than the previous algorithms (Huang, 2017;Saez-Castillo and Conde-Sanchez, 2013).

Acknowledgment
We would like to thank the reviewers; their comments helped improve and clarify this manuscript and the package.In memory of our friend Antonio José, this work would not have been possible without your talent and enthusiasm.
Furthermore, since where we have used Thus, you get that

Proof of result 1
Using the chain rule: The R Journal Vol.14/4, December 2022 ISSN 2073-4859 so that the score function for µ i is: Let us start with the first of the previous derivatives: where the result (7) has been used.
The second of the above derivatives is obtained using implicit differentiation, taking into account that λ(µ i , γ i ) is solution of equation ( 2).Thus, wherein subscript i has been omitted, and by differentiating with respect to µ and clearing ∂λ ∂µ Therefore, plugging ( 10) and ( 11) into the score function, we obtain The only thing left to consider to achieve expression (4) is that

Proof of result 2
Using the chain rule we get ∂l hP ∂δ = ∂l hP ∂γ i ∂γ i ∂δ .
We develop the first of these derivatives (score for γ) wherein we have used (9).
The derivative of λ i with respect to γ i is also obtained by explicit differentiation from equation (2) The R Journal Vol.14/4, December 2022 ISSN 2073-4859 and clearing: Therefore, replacing (13) in the score of γ we obtain The only thing left to consider to get the expression ( 5) is that

Proof of result 3
The Fisher information matrix can be calculated by blocks: The R Journal Vol.14/4, December 2022 ISSN 2073-4859

Gradient of the objective function
Considering the log-likelihood function (3) as a function of β, λ ′ i = log λ i and δ, the gradient of the objective function is: The proof is straightforward from expressions ( 7) and ( 8).

Gradient of the restriction (2)
Restriction ( 2) can be expressed as whose gradient is given by: where For this, you have to take into account that: where expression (7) has been used.Considering expressions ( 8) and ( 9), it is verified that: from which it is straightforward to obtain the gradient for δ.

Figure 1 :
Figure1: Value of the dispersion parameter according to the response variable.The points above the dashed line are greater than 1, so the distribution is over-dispersed.These points occur for low values of the response variable.

Figure 2 :
Figure 2: Regression diagnostics.The plots do not show evidence against the adequacy of the model.

Table 2 :
Main functions in the DGLMExtPois package.

Table 3 :
Estimation of coefficients and standard errors for fitted models with a constant dispersion parameter.The last row is the estimation of δ 0 and its standard error.
Let us now consider a new hP µ,γ model with dnc in the dispersion parameter equation: summary method provides, as usual, the regression coefficient estimates, their standard errors, the corresponding t-statistics and p-values and other details about the model: formula.disp1 <-ncust ~dnc # New dispersion parameter formula hp1 <-glm.hP(formula.loc,formula.disp1,data=CustomerProfile)The R Journal Vol.14/4, December 2022 ISSN 2073-4859The

Table 4 :
Comparison of models based on AIC (the smallest value is highlighted in bold) and relative average estimate time.For the estimate times a value of 1.0 means the model is the fastest on average for the dataset.A value of, e.g., 41.4 means the average estimate time of the model is 41.4 times larger than the average of the fastest.The fastest time is expressed in seconds.The symbol -means that the model could not be estimated.
See Saez-Castillo and Conde-Sanchez (2013)for more details about these explanatory variables and other references where the dataset was analyzed.For this dataset the COMPoissonReg package crashes with the CMP θ,ν model.The lowest AIC is obtained by the CMP µ,ν model.

Table 8 :
Optimization process for hP µ,γ models.For each dataset, the dimension of the objective function, the average optimization time (in seconds) and the number of iterations are shown.