The R function glm uses step-halving to deal with certain types of convergence problems when using iteratively reweighted least squares to fit a generalized linear model. This works well in some circumstances but non-convergence remains a possibility, particularly with a non-standard link function. In some cases this is because step-halving is never invoked, despite a lack of convergence. In other cases step-halving is invoked but is unable to induce convergence. One remedy is to impose a stricter form of step-halving than is currently available in glm, so that the deviance is forced to decrease in every iteration. This has been implemented in the glm2 function available in the glm2 package. Aside from a modified computational algorithm, glm2 operates in exactly the same way as glm and provides improved convergence properties. These improvements are illustrated here with an identity link Poisson model, but are also relevant in other contexts.
It is not too uncommon for iteratively reweighted least squares (IRLS) to exhibit convergence problems when fitting a generalized linear model (GLM). Such problems tend to be most common when using a non-standard link function, such as a log link binomial model or an identity link Poisson model. Consequently, most commonly used statistical software has the provision to invoke various modifications of IRLS if non-convergence occurs.
In the stats package of R, IRLS is implemented in the glm function via its workhorse routine glm.fit. This routine deals with specific types of convergence problems by switching to step-halving if iterates display certain undesirable properties. That is, if a full Fisher scoring step of IRLS will lead to either an infinite deviance or predicted values that are invalid for the model being fitted, then the increment in parameter estimates is repeatedly halved until the updated estimates no longer exhibit these features. This is achieved through repeated application of the call

start <- (start + coefold)/2

where coefold and start contain estimates from the previous and current iterations, respectively.
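The following is a minimal, self-contained sketch of this step-halving idea, expressed as standalone R code rather than the actual glm.fit source; the function half_step_back, the helper dev_of and the max_halvings guard are illustrative names only.

half_step_back <- function(start, coefold, x, y, family, max_halvings = 25) {
    ## Deviance of a candidate coefficient vector for the given GLM family.
    dev_of <- function(beta) {
        mu <- family$linkinv(drop(x %*% beta))
        sum(family$dev.resids(y, mu, rep(1, length(y))))
    }
    dev <- dev_of(start)
    halvings <- 0
    ## If the candidate update gives a non-finite deviance, repeatedly halve
    ## the increment back towards the previous estimate coefold.
    while (!is.finite(dev) && halvings < max_halvings) {
        start <- (start + coefold)/2
        dev <- dev_of(start)
        halvings <- halvings + 1
    }
    list(coefficients = start, deviance = dev, halvings = halvings)
}

For an identity link Poisson model, for example, a candidate estimate giving negative fitted means produces a NaN deviance, and the loop above pulls the update back towards coefold until the deviance is finite.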
Although this approach works well in some contexts, it can be prone to
fail in others. In particular, although the step-halving process in
glm.fit
will throw an errant iterative sequence back into the desired
region, the sequence may repeatedly try to escape that region and never
converge. Furthermore, it is even possible for the IRLS iterative
sequence to be such that step-halving is never invoked in glm.fit, yet
the sequence does not converge. Such behavior is typically accompanied
by a deviance sequence that increases in one or more of the iterations.
This suggests a modification to glm.fit
which has been implemented in
the glm2 package
(Marschner 2011).
As motivation for the proposed modification, we begin by discussing the
potential for non-convergence using some numerical examples of the above
types of behavior. The glm2
package is then discussed, which consists of a main function glm2
and
a workhorse routine glm.fit2. These are modified versions of glm and glm.fit, in which step-halving is used to force the deviance to
decrease from one iteration to the next. It is shown that this
modification provides improved convergence behavior.
We start by providing illustrations of the two types of non-convergence
alluded to above. Specifically, we will consider situations in which
standard IRLS does not converge, and for which either: (i) the
step-halving in glm.fit
is invoked but cannot induce convergence; or
(ii) the step-halving in glm.fit
is never invoked despite the
non-convergence of IRLS. These two types of behavior will be illustrated
using an identity link Poisson regression model, which can be prone to
convergence problems as the link function does not automatically respect
the non-negativity of the Poisson means. This context is useful for
studying the convergence properties of algorithms based on IRLS, because
for this particular GLM a reliable alternative algorithm exists which is
not based on IRLS, as described by Marschner (2010). While we focus on the
identity link Poisson model here, the same behavior can occur in other
models and the documentation for the
glm2 package includes a log
link binomial example.
In Table 3.2 of Agresti (2007) a data set is presented that is amenable to analysis using a linear Poisson model. The data set consists of a count response variable \(y_i\), \(i=1,\dots, 173\), called Sa in Agresti (2007), with observed values ranging from 0 through 15 and a mean of 2.9. Also presented are a number of potential covariates. Here we define three covariates, \(x_{i1}\), \(x_{i2}\) and \(x_{i3}\), \(i=1,\dots, 173\), similarly to Marschner (2010). The first two of these covariates, \(x_{i1}\) and \(x_{i2}\), are defined by dichotomizing the covariates called C and S in Agresti (2007), such that \(x_{i1}=1\{\textrm{C}>3\}\) and \(x_{i2}=1\{\textrm{S}<3\}\). This yields two 0/1 binary covariates with means 0.4 and 0.3, respectively. The third covariate, \(x_{i3}\), is a continuous covariate called W in Agresti (2007), which is shifted here by subtracting the smallest value, so that it ranges from 0 through 12.5 with a mean of 5.3.
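As an illustration, the covariates can be constructed along the following lines, assuming a data frame crab holding the variables Sa, C, S and W from Table 3.2 of Agresti (2007); the object and column names here are assumptions made for illustration only.

y  <- crab$Sa                 # count response, observed range 0 to 15
x1 <- as.numeric(crab$C > 3)  # dichotomized covariate based on C
x2 <- as.numeric(crab$S < 3)  # dichotomized covariate based on S
x3 <- crab$W - min(crab$W)    # W shifted to start at zero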
Assuming \(y_i\) is an observation from a Poisson distribution with mean
\(\mu_i\), the identity link model is
\[\mu_i=\alpha_0+\alpha_1x_{i1}+\alpha_2x_{i2}+\alpha_3x_{i3}\] with
parameter vector \(\theta=(\alpha_0,\alpha_1,\alpha_2,\alpha_3)\). This
model can be fitted in R by first defining a response vector y with y[i] \(=y_i\), and corresponding covariate vectors x1, x2 and x3 with c(x1[i],x2[i],x3[i]) \(=(x_{i1},x_{i2},x_{i3})\), along with a data frame poisdata through the assignment poisdata <- data.frame(y,x1,x2,x3). The call to fit the model is then
glm(y ~ x1 + x2 + x3, data = poisdata,
family = poisson(link = "identity"),
start = rep(1,4))
which includes an initial estimate through the start
argument because
the default choice is invalid for this model. Below we also use the
control
argument to monitor the iterative behavior.
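For example, one way to monitor the iterations is to pass a glm.control object with trace = TRUE; the increased iteration limit and the object name fit below are illustrative choices.

fit <- glm(y ~ x1 + x2 + x3, data = poisdata,
           family = poisson(link = "identity"),
           start = rep(1, 4),
           control = glm.control(trace = TRUE, maxit = 100))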
As discussed later in the paper, the above call produces satisfactory
convergence for these data; however, a bootstrap analysis using the same
call leads to quite severe numerical instability. Using the sample
function, bootstrap replications were generated by sampling with
replacement from the original data, after which glm
was applied to
each replication. This process readily produced replications for which
glm
failed, and a collection of 100 such replications was generated in
which non-convergence occurred. Two of these replications are discussed
as examples in this section, while the full collection of 100
replications is discussed further in the next section.
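A sketch of this type of bootstrap loop is given below; the number of resamples, the seed and the helper names boot_fit and failed are illustrative and are not the exact code used to generate the replications discussed here.

set.seed(1)                                    # illustrative seed only
boot_fit <- function(data) {
    idx <- sample(nrow(data), replace = TRUE)  # resample rows with replacement
    suppressWarnings(
        glm(y ~ x1 + x2 + x3, data = data[idx, ],
            family = poisson(link = "identity"), start = rep(1, 4))
    )
}
fits <- replicate(500, try(boot_fit(poisdata), silent = TRUE), simplify = FALSE)
failed <- vapply(fits, function(f) inherits(f, "try-error") || !f$converged,
                 logical(1))
sum(failed)                                    # replications with convergence problems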
Figure 1 displays the lack of convergence using glm
for the two
illustrative replications. Increasing the maximum number of iterations
does not alleviate the problem. Also plotted in Figure 1 is the deviance
achieved by the maximum likelihood estimate (MLE), calculated using the
non-IRLS linear Poisson method of Marschner (2010). For Figure 1(a) this
minimum possible deviance is 604.0 with an MLE of
\(\hat{\theta}=(-0.095,-0.385,0.618,0.530)\), while for Figure 1(b) the
minimum possible deviance is 656.3 with an MLE of
\(\hat{\theta}=(0.997,-1.344,-0.169,0.524)\). Inspection of the score
functions reveals both MLEs are stationary and in the interior of the
parameter space.
These two examples illustrate the two scenarios of non-convergence
described at the beginning of this section. In Figure 1(a), step-halving
was invoked in 28 of the 100 iterations, showing that glm
can fail to
converge even with step-halving. In Figure 1(b) step-halving was not
invoked, showing that glm
can fail to converge without ever making use
of step-halving. The latter example is indicative of a potential problem
with Newton-type algorithms, which can have a so-called attracting
periodic cycle. In this case IRLS is attracted to a cycle of two iterate
values, with deviances of 673.2 and 691.1, and then subsequently
oscillates between those values.
Although these results use a specific starting value, the
non-convergence cannot be remedied with better initial estimates. This
is illustrated in Figure 1(a), where the iterates get very close to the
optimal value. Only when the initial estimate is identical to the MLE
does glm
converge (in one iteration).
An important feature of Figure 1 is that the iterative sequences display
instances of increasing deviance from one iteration to the next. It is
well known that step-halving can be used in Newton-type algorithms to
force the objective function to improve monotonically (Lange 2010, p. 251). This approach is used to improve the convergence properties of
IRLS in other standard statistical software. Here we discuss its
implementation in the glm2
function available within the
glm2 package, and the
associated improvements in convergence properties.
The source code for glm.fit
has two step-halving blocks called inner
loops 1 and 2. These occur within the main loop immediately prior to the
test of convergence, and they use step-halving to rectify a divergent
deviance and invalid predicted values, respectively. The main change
implemented in the glm2
package is that the modified routine glm.fit2
has a further
step-halving block, called inner loop 3, which tests whether the
deviance is lower than in the previous iteration. If not, step-halving
is invoked until the deviance is lowered, leading to an iterative
sequence that monotonically decreases the deviance.
Convergence in glm.fit occurs if abs(rel) < control$epsilon where rel is the relative change (dev - devold)/(0.1 + abs(dev)) from the current deviance devold to the updated deviance dev, and control$epsilon is a positive tolerance. In glm.fit2 the same convergence test is used, and if it is not satisfied because rel <= -control$epsilon, then iterations proceed as in glm.fit. However, if convergence is not satisfied because rel >= control$epsilon, then inner loop 3 invokes step-halving until rel <= -control$epsilon. Thus, as well as decreasing the deviance, the step-halving takes rel from one side of the convergence region to the other, therefore never causing false convergence.
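The following standalone sketch expresses this additional check in R; it is not the glm.fit2 source, and dev_fun is a hypothetical function returning the deviance of a candidate coefficient vector.

enforce_decrease <- function(start, coefold, devold, dev_fun,
                             epsilon = 1e-8, max_halvings = 25) {
    dev <- dev_fun(start)
    rel <- (dev - devold)/(0.1 + abs(dev))
    if (rel >= epsilon) {                      # deviance has increased: not converged
        halvings <- 0
        while (rel > -epsilon && halvings < max_halvings) {
            start <- (start + coefold)/2       # same halving call as in glm.fit
            dev <- dev_fun(start)
            rel <- (dev - devold)/(0.1 + abs(dev))
            halvings <- halvings + 1
        }
    }
    list(coefficients = start, deviance = dev, rel = rel)
}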
The glm2 function is essentially identical to glm, except the default fitting method is glm.fit2. This allows glm2 to be called with the same calling sequence as glm. Alternatively, in R version 2.12.1 and later, it should be possible to achieve the same effect by passing glm.fit2 to glm through the method argument, instead of the default method glm.fit. Indeed, existing scripts calling glm should work unchanged with the new fitting function, after first executing

glm <- function(..., method = glm2::glm.fit2) {
  stats::glm(..., method = method)
}

which makes glm.fit2 the default fitting method when glm is called.
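For example, the identity link Poisson model of the previous section can be fitted directly with glm2 using the identical calling sequence; the object name fit2 is illustrative.

library(glm2)
fit2 <- glm2(y ~ x1 + x2 + x3, data = poisdata,
             family = poisson(link = "identity"),
             start = rep(1, 4))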
In the previous section we discussed a data set for which 100 bootstrap
replications were generated where glm
failed to converge. When glm2
was used, convergence was achieved for all 100 replications with an
initial value of 1 for each of the four parameters. This included the
two illustrative examples plotted in Figure 1, which both converged to
the MLEs produced by the non-IRLS method of Marschner (2010). This
convergence is displayed in Figure 2, together with the path that glm
would take.
Using glm2, Figure 3 provides medians and inter-quartile ranges of the estimates from the 100 replications that did not converge in glm. Also
shown is the same information for 100 replications where glm
did
converge. In each case the two sets of estimates are relatively
consistent. This would seem to suggest that estimates from the
replications in which glm
failed to converge are not particularly
unusual compared to those for which it did converge. In particular, as
demonstrated in Figure 1, non-convergence of glm
can occur even when
the MLE is a stationary point in the interior of the parameter space.
As well as the functions glm2
and glm.fit2, the
glm2 package provides data
sets and example code illustrating the usefulness of the proposed
change. This allows reproduction of the behavior described here for the
identity link Poisson model, and also provides an example for the log
link binomial model.
This paper describes a modification that improves the convergence
properties of glm, and which is available in the glm2
function in
the glm2 package. Following
the recommendation of the editors, the modification has been presented
as a separate package rather than as a code snippet. This has the
advantage of facilitating thorough testing of the proposed change, and
also allows some example analyses to be made available through the
package. These examples are promising, but additional testing would be
required before considering a change to glm
and its ancillaries in the
standard R stats package.
While glm2
should converge whenever glm
converges, it may take a
different path to convergence. This will occur whenever a convergent
glm
sequence has an iteration where the deviance increased. The fact
that glm2
may take a different path to convergence should be
inconsequential in practice, particularly as it will be a more stable
monotone path.
Although we have found cases in which glm
has convergence problems, in
other contexts it copes very well with numerical difficulties. The data
set from which the bootstrap replications were generated is an example.
Using an identity link Poisson model, the existing step-halving in glm
allows it to converge to the non-stationary MLE
\(\hat{\theta}=(0.578,-0.626,0.048,0.484)\). Since the fitted value for
the observed covariate pattern \((x_{i1},x_{i2},x_{i3})=(1,1,0)\) is
\(0.578-0.626+0.048=0\), and all other fitted values are positive, the
estimate produced is on the boundary of the parameter space. Not all GLM
software would be able to converge to a non-stationary boundary point
such as this.
Finally, we end with an observation on predicted values when using a
non-standard link function. When glm (or glm2) converges with a link
function that does not respect the natural parameter restrictions of the
error distribution, such as in the example here, the predicted values
will meet these restrictions for all observed covariate combinations.
However, predicted values for other covariate combinations may not meet
these restrictions. This is true not only for extreme covariate values
outside the observed ranges, but also for unobserved combinations that
are within the observed ranges. For example, the values \(x_{i1}=1\),
\(x_{i2}=0\) and \(x_{i3}=0\) are all observed in the above data set, but
the combination \((x_{i1},x_{i2},x_{i3})=(1,0,0)\) was not observed, and
has a negative predicted value of \(0.578-0.626=-0.048\). Although not
available in glm or glm2, in principle it is possible to use a
smaller parameter space that only has valid predicted values for
covariate combinations that are within the observed ranges of the
individual covariates. For the above data set this leads to the estimate
\(\hat{\theta}=(0.640,-0.640,0.000,0.476)\) (Marschner 2010), which is
slightly different to the one produced by glm
and has a valid
predicted value for the combination \((1,0,0)\). A discussion of the
relative merits of these parameter spaces is not attempted here, but it
is important to understand which one is being used when one fits a GLM
with a non-standard link.
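For instance, the predicted value for the unobserved combination \((1,0,0)\) can be checked with predict applied to a fitted model object, here assumed to be the object fit from the earlier sketch.

predict(fit, newdata = data.frame(x1 = 1, x2 = 0, x3 = 0), type = "response")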