When a linear model is rank-deficient, then predictions based on that model become questionable because not all predictions are uniquely estimable. However, some of them are, and the estimability package provides tools that package developers can use to tell which is which. With the use of these tools, a model object’s predict
method could return estimable predictions as-is while flagging non-estimable ones in some way, so that the user can know which predictions to believe. The estimability package also provides, as a demonstration, an estimability-enhanced epredict
method to use in place of predict
for models fitted using the stats package.
Consider a linear regression or mixed model having fixed component of the matrix form \(\mathbf{X}\pmb{\beta}\). If \(\mathbf{X}\) is not of full column rank, then there is not a unique estimate \(\mathbf{b}\) of \(\pmb{\beta}\). However, consider using \(\pmb{\lambda}'\mathbf{b}\) to estimate the value of some linear function \(\pmb{\lambda}'\pmb{\beta}= \sum_j\lambda_j\beta_j\). (We use \(\mathbf{x}'\) to denote the transpose of a vector \(\mathbf{x}\).) For some \(\pmb{\lambda}\)s, the prediction depends on the solution \(\mathbf{b}\); but for others—the estimable ones—it does not.
An example will help illustrate the issues. In the following commands,
we create four predictors x1
–x4
and a response variable y
:
> x1 <- -4 : 4
> x2 <- c(-2, 1, -1, 2, 0, 2, -1, 1,-2)
> x3 <- 3 * x1 - 2 * x2
> x4 <- x2 - x1 + 4
> y <- 1 + x1 + x2 + x3 + x4 + c(-.5, .5, .5, -.5, 0, .5, -.5, -.5, .5)
Clearly, x3
and x4
depend linearly on x1
and x2
and the
intercept. Let us fit two versions of the same model to these data,
entering the predictors in different orders, and compare the regression
coefficients:
> mod1234 <- lm(y ~ x1 + x2 + x3 + x4)
> mod4321 <- lm(y ~ x4 + x3 + x2 + x1)
> zapsmall(rbind(b1234 = coef(mod1234), b4321 = coef(mod4321)[c(1, 5:2)]))
(Intercept) x1 x2 x3 x45 3 0 NA NA
b1234 -19 NA NA 3 6 b4321
Note that in each model, two regression coefficients are NA
. This
indicates that the associated predictors were excluded due to their
linear dependence on the others.
The problem that concerns us is making predictions on new values of the predictors. Here are some predictor combinations to try:
> testset <- data.frame(
+ x1 = c(3, 6, 6, 0, 0, 1),
+ x2 = c(1, 2, 2, 0, 0, 2),
+ x3 = c(7, 14, 14, 0, 0, 3),
+ x4 = c(2, 4, 0, 4, 0, 4))
And here is what happens when we make the predictions:
> cbind(testset,
+ pred1234 = predict(mod1234, newdata = testset),
+ pred4321 = suppressWarnings(predict(mod4321, newdata = testset)))
x1 x2 x3 x4 pred1234 pred43211 3 1 7 2 14 14
2 6 2 14 4 23 47
3 6 2 14 0 23 23
4 0 0 0 4 5 5
5 0 0 0 0 5 -19
6 1 2 3 4 8 14
:
Warning messagepredict.lm(mod1234, new = testset) :
In -deficient fit may be misleading prediction from a rank
Note that the two models produce identical predictions in some cases (rows 1, 3, and 4), and different ones in the others. There is also a warning message (we suppressed this the second time) telling us that this could happen.
It happens that cases 1, 3, and 4 are estimable, and the others are not.
It would be helpful to know which predictions to trust, rather than a
vague warning. The epredict
function in
estimability
(Lenth 2015) accomplishes this:
> require("estimability")
> rbind(epred1234 = epredict(mod1234, newdata = testset),
+ epred4321 = epredict(mod4321, newdata = testset))
1 2 3 4 5 6
14 NA 23 5 NA NA
epred1234 14 NA 23 5 NA NA epred4321
The results for non-estimable cases are indicated by NA
s. Note that in
both models, the same new-data cases are identified as estimable, and
they are the ones that yield the same predictions. Note also that
estimability was determined separately from each model. We do not need
to compare two different fits to determine estimability.
It is a simple matter to add estimability checking to the predict
method(s) in a new or existing package.
The package should import the estimability package.
Use estimability::nonest.basis
to obtain the basis for
non-estimable functions of the regression coefficients, preferably
in the model-fitting code.
In the predict
method, call estimability::is.estble
on the model
matrix for new data. (It is not necessary to check estimability of
predictions on the original data.)
This is implemented in code in a manner something like this:
<- function(...) {
fitmodel
...:
code to create'X' (the fixed-effects model matrix),
'QR' (QR decomp of 'X'),
'object' (the fitted model object)
...$nonest <- estimability::nonest.basis(QR)
object
object
}
<- function(object, newdata, tol = 1e-8, ...) {
predict.mymodel
...if(!missing(newdata)) {
... :
code to create'newX' (the model matrix for 'newdata')
...<- estimability::is.estble(newX, object$nonest, tol)
estble
... -estimable predictions
code to flag non
...
}
... }
The nonest.basis
function returns a matrix whose columns span the null
space of the model matrix \(\mathbf{X}\). If, for some reason, the QR
decomposition is not used in model fitting, there is also a
nonest.basis
method that can be called with \(\mathbf{X}\) itself. But
if the QR decomposition of \(\mathbf{X}\) is available, it is most
efficient to use it.
The is.estble
function tests each row of its first argument for
estimability against the null basis provided in its second argument, and
returns a logical vector. A third argument may be added to adjust the
tolerance used in this test. It is important to remember that all
columns of the model matrix be included. In particular, do not exclude
the columns corresponding to NA
s in the coefficient estimates, as they
are needed for the estimability testing. Typically, the results of
is.estble
would be used to skip any non-estimable predictions and
replace them with NA
.
If there is not a rank deficiency in the model, nonest.basis
returns
estimability::all.estble
, which is a trivial null basis (still of
‘matrix
’ class) with which is.estble
will return all TRUE
. The
nonest.basis
function, when called with a ‘qr
’ object, can
immediately detect if there is no rank deficiency and will return
all.estble
. If your model-fitting algorithm does not use the QR
decomposition, it may be worth including additional code to check for
non-singular models, in which case it sets the null basis to
estimability::all.estble
, rather than have nonest.basis
perform the
additional computation to figure this out.
The sample code above is typical for the S3 object model. If S4 objects
are being used, you may want to instead include a slot of class
‘matrix
’ for nonest
; or incorporate nonest
as part of some other
slot.
To illustrate briefly, consider the previous example. The null basis for
mod1234
is obtained as follows:
> (nb <- nonest.basis(mod1234$qr))
1] [,2]
[,1,] 0.29555933 -0.9176629
[2,] 0.76845426 0.2294157
[3,] -0.48767290 -0.2294157
[4,] -0.28078136 0.0000000
[5,] -0.07388983 0.2294157 [
and we can test estimability for the data in testset
by converting it
to a matrix and appending a column for the intercept:
> newX <- cbind(1, as.matrix(testset))
> is.estble(newX, nb)
1] TRUE FALSE TRUE TRUE FALSE FALSE [
The theory of estimability in linear models is well established, and can be found in almost any text on linear models, such as classics like (Searle 1997) and (Seber and Lee 2003). In this discussion, I will make specific references to portions of a more recent reference, (Monahan 2008).
Before delving in, though, it is worth noting (based on what is said in
some contributed packages’ documentation and code) that there seems to
be some confusion in the R community between non-estimability of a
linear function \(\pmb{\lambda}'\pmb{\beta}\) and the placement of nonzero
\(\lambda_j\) coefficients relative to the positions of NA
values in the
estimate \(\mathbf{b}\). It is not possible to assess estimability with
this information. Note, for instance, in the example in the
1, we obtained two different \(\mathbf{b}\)s. Between
them we can find an NA
in the estimates for every predictor except the
intercept. In fact, while NA
usually refers to unknown values, that is
not the case here. An NA
regression coefficient signals a coefficient
that is constrained to zero in order to obtain an estimate. And there is
nothing wrong with multiplying some of the \(\lambda_j\) by zero. To
assess estimability of \(\pmb{\lambda}'\pmb{\beta}\), we must look at
all of the elements of \(\pmb{\lambda}\), even those corresponding to
NA
s in the estimates.
When \(\mathbf{X}\) is not full-rank, the solution \(\mathbf{b}\) to the normal equations is not unique. However, the predicted values, \(\mathbf{X}\mathbf{b}\), are uniquely estimable; that is, for any two solutions \(\mathbf{b}_1\) and \(\mathbf{b}_2\), \(\mathbf{X}\mathbf{b}_1=\mathbf{X}\mathbf{b}_2\). Note that the \(i\)th element of \(\mathbf{X}\mathbf{b}\) is \(\mathbf{x}_i'\mathbf{b}\) where \(\mathbf{x}'_i\) is the \(i\)th row of \(\mathbf{X}\). Thus, \(\mathbf{x}_i'\pmb{\beta}\) is estimable for each \(i\), and so are any linear combinations of these. Indeed, \(\pmb{\lambda}'\pmb{\beta}\) is estimable if and only if \(\pmb{\lambda}'\) is in the row space of \(\mathbf{X}\)—or equivalently, \(\pmb{\lambda}\) is in the column space of \(\mathbf{X}'\) (Monahan 2008 Result 3.1).
From the above result, we can establish estimability of \(\pmb{\lambda}'\pmb{\beta}\) either by showing that \(\pmb{\lambda}\) is in the column space of \(\mathbf{X}'\), or by showing that it is orthogonal to the null space of \(\mathbf{X}\) (Monahan 2008 Methods 3.2 and 3.3, respectively). One way to implement the second idea is to note that if \((\mathbf{X}'\mathbf{X})^-\) is any generalized inverse of \(\mathbf{X}'\mathbf{X}\), then \(\mathbf{P}= (\mathbf{X}'\mathbf{X})(\mathbf{X}'\mathbf{X})^-\) is a projection onto the column space of \(\mathbf{X}'\), and so \(\mathbf{I}-\mathbf{P}\) is a projection onto the null space of \(\mathbf{X}\). The columns of \(\mathbf{I}-\mathbf{P}\) thus comprise a basis for this null space. Accordingly, estimability of \(\pmb{\lambda}'\pmb{\beta}\) can be determined by showing that \(\pmb{\lambda}'(\mathbf{I}-\mathbf{P})=\mathbf{0}'\).
SAS uses \(\mathbf{I}-\mathbf{P}\), as described above, to test
estimability. Of course, in computation we need to set a tolerance for
being close enough to \(\mathbf{0}\). SAS deems
\(\pmb{\lambda}'\pmb{\beta}\) non-estimable if
\(\pmb{\lambda}\ne\mathbf{0}\) and
\(\max|\pmb{\lambda}'(\mathbf{I}-\mathbf{P})| > \psi\cdot\max|\pmb{\lambda}|\),
where \(\psi\) is a tolerance factor with a default value of \(10^{-4}\).
For details, see (SAS Institute Inc. 2013), the chapter on “Shared Concepts,”
documentation for the ESTIMATE
statement and its SINGULAR
option.
In the estimability package, we obtain the null basis in a different
way than shown above, in part because the QR decomposition of
\(\mathbf{X}\) is already (usually) available in an ‘lm
’ object. Suppose
that \(\mathbf{X}\) is \(n\times p\), then we can write
\(\mathbf{X}=\mathbf{Q}\mathbf{R}\), where \(\mathbf{Q}\) (\(n\times p\)) has
orthonormal columns, and \(\mathbf{R}\) (\(p\times p\)) is upper triangular
with nonzero diagonal elements. If \(\mathbf{X}\) is of rank \(r<p\), then
columns are pivoted so that the linearly dependent ones come last, and
we only need \(r\) columns of \(\mathbf{Q}\) and the top \(r\) rows of
\(\mathbf{R}\), along with the pivoting information. Call these pivoted,
down-sized matrices \(\tilde Q\) and \(\tilde R\) respectively; and let
\(\tilde X\) (still \(n\times p\)) denote \(\mathbf{X}\) with its columns
permuted according to the pivoting. We now have that
\(\tilde X=\tilde Q\tilde R\), and \(\tilde R\) comprises the first \(r\) rows
of an upper triangular matrix. Observe that each row of \(\tilde X\) is
thus a linear combination of the rows of \(\tilde R\)—that is, the row
space of \(\tilde X\) is the same as the row space of \(\tilde R\). So the
much smaller matrix \(\tilde R\) has everything we need to know to
determine estimability.
Now, let \(d=p-r\) denote the rank deficiency, and define \[\mathbf{S}= \left[\begin{array}{c|c} \tilde R'_{p\times r} & \begin{array}{c} \mathbf{0}_{r\times d} \\ \mathbf{I}_{d\times d} \end{array} \end{array}\right]\] Then \(\mathbf{S}\) is an upper triangular matrix with nonzero diagonal—hence it is of full rank \(p = r + d\). Let \(\mathbf{S}= \mathbf{T}\mathbf{U}\) be the QR decomposition of \(\mathbf{S}\). We then have that \(\mathbf{T}\) has orthonormal columns, and in fact it is a Gram-Schmidt orthonormalization of \(\mathbf{S}\). Accordingly, the first \(r\) columns of \(\mathbf{T}\) comprise an orthonormalization of the columns of \(\tilde R'\), and thus they form a basis for the row space of \(\tilde R\) and hence of \(\tilde X\). Letting \(\tilde N\) denote the last \(d\) columns of \(\mathbf{T}\), we have that \(\tilde N\) has orthonormal columns, all of which are orthogonal to the first \(r\) columns of \(\mathbf{T}\) and hence to the row space of \(\tilde R\) and \(\tilde X\). It follows that \(\tilde N\) is an orthonormal basis for the null space of \(\tilde X\). After permuting the rows of \(\tilde N\) according to the original pivoting in the QR decomposition of \(\mathbf{X}\), we obtain a basis \(\mathbf{N}\) for the null space of \(\mathbf{X}\).
To test estimability of \(\pmb{\lambda}'\pmb{\beta}\), first compute \(\mathbf{z}' = \pmb{\lambda}'\mathbf{N}\), which is a \(d\)-vector. Theoretically, \(\pmb{\lambda}'\pmb{\beta}\) is estimable if and only if \(\mathbf{z}'\mathbf{z}=\mathbf{0}\); but for computational purposes, we need to set a tolerance for its being suitably small. Again, we opt to deviate from SAS’s approach, which is based on \(\max|z_i|\). Instead, we deem \(\pmb{\lambda}'\pmb{\beta}\) non-estimable when \(\pmb{\lambda}\ne\mathbf{0}\) and \(\mathbf{z}'\mathbf{z}\ge \tau\cdot\pmb{\lambda}'\pmb{\lambda}\), where \(\tau\) is a tolerance value. Our default value is \(\tau = (10^{-4})^2 = 10^{-8}\). The rationale for this criterion is that it is rotation-invariant: We could replace \(\mathbf{N}\leftarrow \mathbf{V}\mathbf{N}\) where \(\mathbf{V}\) is any \(p\times p\) orthogonal matrix, and \(\mathbf{z}'\mathbf{z}\) will be unchanged. Such rotation invariance seems a desirable property because the assessment of estimability does not depend on which null basis is used.
The estimability package’s epredict
function serves as a
demonstration of adding estimability checking to the predict
methods
in the stats package. It works for lm
, aov
, glm
, and mlm
objects. It also provides additional options for the type
argument of
predict
. When newdata
is provided, type = "estimability"
returns a
logical vector showing which rows of newdata
are estimable; and
type = "matrix"
returns the model matrix for newdata
.
An accompanying enhancement is eupdate
, which runs update
on a model
object and also adds a nonest
member. Subsequent epredict
calls on
this object will use that as the null basis instead of having to
reconstruct it. For example,
> mod123 <- eupdate(mod1234, . ~ . - x4)
> mod123$nonest
1]
[,1,] 0.0000000
[2,] -0.8017837
[3,] 0.5345225
[4,] 0.2672612 [
Now let’s find out which predictions in testset
are estimable with
this model:
> epredict(mod123, newdata = testset, type = "estimability")
1 2 3 4 5 6
TRUE TRUE TRUE TRUE TRUE FALSE
Thus, more of these test cases are estimable when we drop \(x_4\) from the
model. Looking at testset
, this makes sense because two of the
previously non-estimable rows differ from estimable ones only in their
values of \(x_4\).
In rank-deficient linear models used for predictions on new predictor sets, it is important for users to know which predictions are to be trusted, and which of them would change if the same model had been specified in a different way. The estimability package provides an easy way to add this capability to new and existing model-fitting packages.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
V. Lenth, "Estimability Tools for Package Developers", The R Journal, 2015
BibTeX citation
@article{RJ-2015-016, author = {V. Lenth, Russell}, title = {Estimability Tools for Package Developers}, journal = {The R Journal}, year = {2015}, note = {https://rjournal.github.io/}, volume = {7}, issue = {1}, issn = {2073-4859}, pages = {195-199} }