The Liu regression estimator is now a commonly used alternative to the conventional ordinary least squares estimator; it avoids the adverse effects that arise when there is a considerable degree of multicollinearity among the regressors. Only a few software packages are available for estimating the Liu regression coefficients, and these offer limited methods for estimating the Liu biasing parameter without addressing testing procedures. Our liureg package can be used to estimate the Liu regression coefficients using a range of existing biasing parameters, to test these coefficients with more than 15 Liu related statistics, and to present different graphical displays of these statistics.
For data collected either from a designed experiment or from an
observational study, the ordinary least squares (OLS) method does not
provide precise estimates of the effect of any explanatory variable
(regressor) when regressors are interdependent (collinear with each
other). Consider a multiple linear regression (MLR) model,
The OLS estimator (OLSE) of
 | lrmest (1) | ltsbase (2) | liureg |
---|---|---|---|
Standardization of regressors | | | |
Estimation and testing of Liu coefficient | | | |
Estimation | | | |
Testing | | | |
SE of coeff. | | | |
Liu related statistics | | | |
Adj-R2 | | | |
Variance | | | |
Bias | | | |
MSE | | | |
F-test | | | |
CL | | | |
Effective df | | | |
Hat matrix | | | |
Var-Cov matrix | | | |
VIF | | | |
Residuals | | | |
Fitted values | | | |
Predict values | | | |
Liu model selection | | | |
GCV | | | |
AIC & BIC | | | |
PRESS | | | |
Liu related graphs | | | |
Liu trace | | | |
Bias, Var, MSE | | | |
AIC, BIC | | | |
Researchers may be tempted to eliminate the regressor(s) causing problems,
either by deliberately removing regressors from the model or by using a
screening method such as stepwise or best subset regression. However, these
methods may destroy the usefulness of the model by removing relevant
regressor(s). To control the variance and instability of the
OLS estimates, one may instead regularize the coefficients with a
regularization method such as ridge regression (RR), Lasso
regression, or Liu regression (LR), as an alternative to the
OLS. Computationally, the RR (
We have developed the liureg (Imdadullah and Aslam 2017) package to provide
the functionality of Liu related computations. The package provides the
most complete suite of tools for the LR available in R.
Table 1 provides a comparison with other alternatives. For
package development and R documentation, we followed
Leisch (2008), R Core Team (2015), and Wickham (2015). The ridge package by
Cule and De Iorio (2012), lmridge by Imdadullah and Aslam (2016a), and
lm.ridge() from MASS by Venables and Ripley (2002) also provided guidance
in coding.
In the available literature, there are only two R packages capable of
estimating and/or testing the Liu coefficients. The R packages
mentioned in Table 1 are compared with our liureg
package. The lrmest package (Dissanayake and Wijekoon 2016) computes
different estimates such as the OLS, ordinary ridge regression (ORR),
Liu estimator (LE), LE type-1, 2, 3, adjusted Liu estimator (ALTE), and
their type-1, 2, 3 etc. Moreover, lrmest provides the scalar mean square
error (MSE) and prediction residual error sum of squares (PRESS) values of
some of the estimators. The testing of ridge coefficients is performed
only on a scalar k; for a vector of d, however, the function liu()
of the lrmest package returns only the MSE along with the value of the
biasing parameter used. The ltsbase package (Kan et al. 2013) computes
ridge and Liu estimates based on the least trimmed squares (LTS) method.
The MSE values from four regression models can be compared graphically if
the argument plot=TRUE is passed to the ltsbase() function. There
are three main functions: (i) ltsbase() computes the minimum MSE
values for six methods (OLS, ridge, ridge based on LTS, LTS, Liu, and
Liu based on LTS) for sequences of biasing parameters ranging from
0 to 1; (ii) the ltsbaseDefault() function returns the fitted values
and residuals of the model having minimum MSE; and (iii) the
ltsbaseSummary() function returns the regression coefficients and the
biasing parameter for the best MSE among the four regression models.
It is important to note that the ltsbase package displays these statistics for models having minimum MSE (bias and variance are not displayed in their output), while our package, liureg, computes these and all other statistics not only for scalar but also for vector biasing parameter.
This paper outlines the collinearity detection methods available in the
existing literature and uses the mctest (Imdadullah and Aslam 2016b)
package through an illustrative example. To
overcome the issues of the collinearity effect on regressors a thorough
introduction to Liu regression, properties of the Liu estimator,
different methods for the selecting values of
Diagnosing collinearity is important to many researchers. It consists of two related but separate elements: (1) detecting the existence of a collinear relationship among regressors and (2) assessing the extent to which this relationship has degraded the parameter estimates. Many diagnostic measures for detecting collinearity have been proposed in the existing literature (Klein 1962; Farrar and Glauber 1967; Marquardt 1970; Theil 1971; Gunst and Mason 1977; Koutsoyiannis 1977; Belsley et al. 1980; Kovács et al. 2005; Curto and Pinto 2011; Fox and Weisberg 2011). These diagnostic methods assist in determining whether and where some corrective action is necessary (Belsley et al. 1980). The most widely used and suggested diagnostics are the pair-wise correlations, the variance inflation factor (VIF)/tolerance (TOL) (Marquardt 1970), the eigenvalues and eigenvectors (Kendall 1957), the condition number (CN) and condition indices (CI) (Belsley et al. 1980; Maddala 1988; Chatterjee and Hadi 2006), Leamer’s method (Greene 2002), Klein’s rule (Klein 1962), the tests proposed by Farrar and Glauber (1967), the Red indicator (Kovács et al. 2005), the corrected VIF (Curto and Pinto 2011), and Theil’s measures (Theil 1971); see also Imdadullah et al. (2016). All of these diagnostic measures are implemented in the R package mctest (Imdadullah and Aslam 2016b).
Below, we use the Hald dataset (Hald 1952) for testing collinearity among regressors. We then use the liureg package to compute the Liu regression coefficients and different Liu related statistics, and to apply methods for selecting the Liu biasing parameter. For the optimal choice of the biasing parameter, a graphical representation of the Liu coefficients is considered, along with a bias-variance trade-off plot. In addition, model selection criteria are computed. The Hald data, also used by Hoerl et al. (1975), record the heat generated during the setting of 13 cement mixtures of 4 basic ingredients.
Each ingredient percentage appears to be rounded down to a full integer. The data set is included in both the mctest and liureg packages.
R > data(Hald)
R > x <- Hald[, -1]
R > y <- Hald[, 1]
R > mctest(x, y)
Call:
omcdiag(x = x, y = y, Inter = TRUE, detr = detr, red = red, conf = conf,
theil = theil, cn = cn)
Overall Multicollinearity Diagnostics
MC Results detection
Determinant |X'X|: 0.0011 1
Farrar Chi-Square: 59.8700 1
Red Indicator: 0.5414 1
Sum of Lambda Inverse: 622.3006 1
Theil's Method: 0.9981 1
Condition Number: 249.5783 1
1 --> COLLINEARITY is detected
0 --> COLLINEARITY is not detected by the test
===================================
Eigenvalues with INTERCEPT
Intercept X1 X2 X3 X4
Eigenvalues: 4.1197 0.5539 0.2887 0.0376 0.0001
Condition Indices: 1.0000 2.7272 3.7775 10.4621 249.5783
The results from all overall collinearity diagnostic measures indicate the existence of collinearity among the regressors. These results do not tell which regressor(s) cause the collinearity. The individual collinearity diagnostic measures can be obtained through:
> mctest(x = x, y, all = TRUE, type = "i")
Call:
imcdiag(x = x, y = y, method = method, corr = FALSE, vif = vif,
tol = tol, conf = conf, cvif = cvif, leamer = leamer, all = all)
All Individual Multicollinearity Diagnostics in 0 or 1
VIF TOL Wi Fi Leamer CVIF Klein
X1 1 1 1 1 0 0 0
X2 1 1 1 1 1 0 1
X3 1 1 1 1 0 0 0
X4 1 1 1 1 1 0 1
1 --> COLLINEARITY is detected
0 --> COLLINEARITY in not detected by the test
X1 , X2 , X3 , X4 , coefficient(s) are non-significant may be due to multicollinearity
R-square of y on all x: 0.9824
* use method argument to check which regressors may be the reason of collinearity
The results from most of the individual collinearity diagnostics suggest
that all of the regressors contribute to the collinearity. The last line
of the imcdiag() function’s output suggests that the method argument
should be used to check which regressors may be the reason for the
collinearity. These findings suggest that one should use a
regularization method such as LR.
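Two of the diagnostics reported above, the VIF and the condition indices, can be sketched in base R without mctest. This is only an illustrative sketch: the data frame `hald` is typed in here from the standard Hald cement data, and the unit-length column scaling used for the condition indices is an assumption, chosen because the resulting eigenvalues sum to the number of columns, as the eigenvalues in the output above do.

```r
# Hald cement data, typed in so the sketch needs neither mctest nor liureg
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])

# VIF_j is the j-th diagonal element of the inverse correlation matrix
vif <- diag(solve(cor(X)))

# Condition indices (assumed scaling): append the intercept column, scale
# every column to unit length, then compare the largest eigenvalue of the
# cross-product matrix with each eigenvalue
Xi <- cbind(Intercept = 1, X)
Xu <- sweep(Xi, 2, sqrt(colSums(Xi^2)), "/")
ev <- eigen(crossprod(Xu), symmetric = TRUE)$values
cond_index <- sqrt(max(ev) / ev)

print(round(vif, 2))
print(round(cond_index, 2))
```

A common rule of thumb treats a VIF above 10 or a condition index above 30 as a sign of harmful collinearity; the thresholds applied by mctest may differ.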
To deal with multicollinear data, (Liu 1993) formulated a new class of
biased estimators that has combined benefits of ORR by
(Hoerl and Kennard 1970) and the Stein type estimator (Stein 1956),
and other statistical areas, the LE has produced a number of new techniques and ideas, see for example (Kaçiranlar et al. 1999; Akdeniz and Kaçiranlar 2001; Kaçiranlar and Sakalhoğlu 2001; Hubert and Wijekoon 2006; Torigoe and Ujiie 2006; Jahufer and Chen 2009, 2011, 2012).
However, (Liu 2011) and (Druilhet and Mom 2008) have made statements that the
biasing parameter
the main interest of LE lies in the suitable selection of
The design matrix
The fitted values of the LE can be found using Eq. (1),
As
The intercept term for the LE (
Like linear RR, Liu regression is among the most popular biased estimation methods because of its relation to OLS. Its statistical properties have been studied by (Akdeniz and Kaçiranlar 1995, 2001), (Arslan and Billor 2000), (Kaçiranlar and Sakalhoğlu 2001), (Kaçiranlar et al. 1999) and (Sakalhoğlu et al. 2001) among many others. The comprehensive properties of the LE have attracted researchers to this area.
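The relation of the LE to the OLSE can be made concrete with a few lines of base R. The sketch below assumes the usual form of the Liu estimator, (X'X + I)^{-1}(X'y + d b_OLS), applied to centered regressors; the function name `liu_coef` and the data frame `hald` are hypothetical names introduced here, not part of liureg.

```r
# Hald cement data, typed in so the sketch does not depend on liureg
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
y <- hald$y

# Hypothetical helper: Liu coefficients on the original scale for one d
liu_coef <- function(X, y, d) {
  Xc    <- scale(X, center = TRUE, scale = FALSE)  # centre regressors only
  yc    <- y - mean(y)
  XtX   <- crossprod(Xc)
  Xty   <- crossprod(Xc, yc)
  b_ols <- solve(XtX, Xty)
  # assumed Liu form: (X'X + I)^{-1} (X'y + d * b_ols)
  b_d   <- solve(XtX + diag(ncol(Xc)), Xty + d * b_ols)
  intercept <- mean(y) - sum(colMeans(X) * b_d)
  c(Intercept = intercept, drop(b_d))
}

# Coefficients for a few values of d, one column per value
round(sapply(c(0, 0.5, 1), function(d) liu_coef(X, y, d)), 5)
```

Under this form, d = 1 reproduces the OLS coefficients exactly, and the coefficients change linearly in d.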
For
Let
Sr.# | Property | Formula |
---|---|---|
1) | Linear transformation | The LE is a linear transformation of the OLSE ( |
2) | Wide range |
Wide range of |
3) | Optimal |
An optimal |
4) | Mean | |
5) | Bias | |
6) | Var-Cov matrix | |
7) | MSE | |
8) | Effective DF (EDF) | |
9) | Larger regression coeff. | |
10) | Inflated RSS |
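The bias, variance, and MSE properties listed in the table can be evaluated numerically in the canonical (eigenvalue) form of the model. The following is a sketch under stated assumptions, not the liureg implementation: regressors are centered, the unknown coefficients are replaced by their OLS estimates (a plug-in estimate of the bias), and the error variance uses the divisor n - p, which is an assumption chosen to agree with the Sigma2 value printed for d = 1 later in the paper. The names `liu_var`, `liu_bias2`, and `liu_mse` are hypothetical.

```r
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X  <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- hald$y - mean(hald$y)
n  <- nrow(Xc)
p  <- ncol(Xc)

# Canonical form: Z'Z is diagonal with the eigenvalues of X'X
eig    <- eigen(crossprod(Xc), symmetric = TRUE)
lambda <- eig$values
Z      <- Xc %*% eig$vectors
alpha  <- drop(crossprod(Z, yc)) / lambda          # OLS in canonical form
sigma2 <- sum((yc - Z %*% alpha)^2) / (n - p)      # assumed divisor n - p

# Total variance, squared bias, and scalar MSE of the Liu estimator at d
liu_var   <- function(d) sigma2 * sum((lambda + d)^2 / (lambda * (lambda + 1)^2))
liu_bias2 <- function(d) (d - 1)^2 * sum(alpha^2 / (lambda + 1)^2)
liu_mse   <- function(d) liu_var(d) + liu_bias2(d)

d_grid <- c(0, 0.5, 1)
round(t(sapply(d_grid, function(d)
  c(d = d, VAR = liu_var(d), Bias2 = liu_bias2(d), MSE = liu_mse(d)))), 4)
```

At d = 1 the squared bias vanishes and the variance is that of OLS; moving d away from 1 trades bias for variance.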
Theoretically and practically, LR is used to propose new methods for the
choice of the biasing parameter
The existing methods to select biasing parameter in the LR may not fully
address the problem of ill-conditioning when there exists severe
multicollinearity, while the appropriate selection of biasing parameter
The optimal value of
We classified estimation methods as (i) Subjective or (ii) Objective
In these methods, the selection of
Objective methods, to some extent, are similar to judgmental methods for
selection of biasing parameter
Sr.# | Formula | Reference |
---|---|---|
1) | (Liu 1993) | |
2) | (Liu 1993) | |
3) | (Liu 2011) | |
4) | Özkale and Kaçiranlar (2007) | |
5) | (Mallows 1973) | |
6) | (Liu 1993) | |
7) |
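As an illustration of an objective choice of d, the sketch below computes a closed-form minimizer of the plug-in scalar MSE and checks it against a numerical search. The closed form is obtained here by setting the derivative of the assumed MSE expression to zero; the centering of the regressors, the divisor n - p for the error variance, and the names `d_opt` and `liu_mse` are assumptions of this sketch rather than the package's documented computation.

```r
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X  <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- hald$y - mean(hald$y)

eig    <- eigen(crossprod(Xc), symmetric = TRUE)
lambda <- eig$values
Z      <- Xc %*% eig$vectors
alpha  <- drop(crossprod(Z, yc)) / lambda
sigma2 <- sum((yc - Z %*% alpha)^2) / (nrow(Xc) - ncol(Xc))  # assumed divisor

# Plug-in scalar MSE of the Liu estimator as a function of d
liu_mse <- function(d) {
  sigma2 * sum((lambda + d)^2 / (lambda * (lambda + 1)^2)) +
    (d - 1)^2 * sum(alpha^2 / (lambda + 1)^2)
}

# Closed-form minimizer of liu_mse (derivative set to zero)
d_opt <- sum((alpha^2 - sigma2) / (lambda + 1)^2) /
  sum((sigma2 + lambda * alpha^2) / (lambda * (lambda + 1)^2))

# Numerical check over an assumed search interval
num_opt <- optimize(liu_mse, interval = c(-10, 1))$minimum
c(closed_form = d_opt, numerical = num_opt)
```

Because the MSE expression is quadratic in d, the closed form and the numerical search should agree up to the tolerance of optimize().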
Testing of the Liu coefficients is performed by following (Aslam 2014)
and (Halawa and El-Bassiouni 2000). For testing
The statistics
For testing overall significance of vector of LE (
The standard error of
Our R package liureg contains functions related to fitting of the LR
model and provides a simple way of obtaining the estimates of LR
coefficients, testing the Liu coefficients, and the computation of
different Liu related statistics, which prove helpful for selection of
optimal biasing parameter
The liureg objects support a set of standard methods such as print(),
summary(), plot(), and predict(). Therefore, inferences can be made easily
using the summary method for assessing the estimates of regression
coefficients, their standard errors, t-values and their respective
p-values. The default function liu() calls liuest() to perform the
required computations and estimation for given
values of non-stochastic biasing parameter
liu(formula, data, scaling = c("centered", "sc", "scaled"), d, ...)
The four arguments of the liu() function are described in Table 4.
Argument | Description |
---|---|
formula | Symbolic representation for LR model of the form, response |
data | Contains the variables that have to be used in LR model. |
d | The biasing parameter, may be a scalar or vector. If a |
scaling | The methods for scaling of predictors. The centered option (the default), suggested by Liu (1993), centers the predictors; the sc option scales the predictors in correlation form as described in (Belsley 1991; Draper and Smith 1998); and the scaled option standardizes the predictors to have zero mean and unit variance. |
The liu() function returns an object of class "liu". The functions
summary(), dest(), and lstats() etc., are used to compute and
print a summary of the LR results, the list of biasing parameters by
(Liu 1993; Liu 2011) and Liu related statistics such as estimated
squared bias, "liu"
is a list, the components
of which are described in Table 5.
Object | Description |
---|---|
coef | A named vector of fitted Liu coefficients. |
lfit | Matrix of Liu fitted values for each biasing parameter |
mf | Actual data used. |
xm | A vector of means of design matrix |
y | The centered response variable. |
xscale | The scales used to standardize the predictors. |
xs | The scaled matrix of the predictors. |
scaling | The method of scaling used to standardize the predictors. |
d | The LR biasing parameter(s). |
Inter | Whether an intercept is included in the model or not. |
call | The matched call. |
terms | The terms object used. |
Table 6 lists the functions and methods available in liureg package.
Functions | Description |
---|---|
Liu coefficient estimation and testing | |
liuest() | The main model fitting function for implementation of LR models in R. |
coef() | Displays de-scaled Liu coefficients. |
liu() | Generic function and default method that calls liuest() and returns an object of S3 class "liu" with different set of methods to standard generics. It has a print method for display of Liu de-scaled coefficients. |
summary() | Standard LR output (coefficient estimates, scaled coefficient estimates, standard errors, t-value and p-values); returns an object of class "summary.liu" containing the relative summary statistics. Has a print method. |
Residuals, fitted values and prediction | |
predict() | Produces predicted value(s) by evaluating liuest() in the frame newdata. |
fitted() | Displays Liu fitted values for observed data. |
residuals() | Displays Liu residuals values. |
press() | Generic function that computes prediction residuals error sum of squares (PRESS) for Liu coefficients. |
Methods to estimate | |
dest() | Displays various |
Liu statistics | |
vcov() | Displays associated Var-Cov matrix with matching Liu parameter |
hatl() | Generic function that displays hat matrix from LR. |
infoliu() | Generic function that computes information criteria AIC and BIC. |
lstats() | Generic function that displays different statistics of LR such as MSE, squared bias, |
Liu plots | |
plot() | Liu coefficient trace plot against biasing parameter |
plot.biasliu() | Bias, variance, and MSE plot as a function of |
plot.infoliu() | Plot of AIC and BIC against |
The use of liureg is explained through examples using the Hald dataset.
> library(liureg)
> mod <- liu(y ~ X1 + X2 + X3 + X4, data = as.data.frame(Hald),
+ scaling = "centered", d = seq(0, 1, 0.01) )
The output of the linear LR from the liu() function is assigned to an
object mod. The first argument of the function is formula, which is used
to specify the required LR model for the data provided as second argument.
The print method for mod, an object of class "liu", will display
the de-scaled coefficients. The output (de-scaled coefficients) from the
above command is shown only for a few selected biasing parameter values.
Call:
liu.default(formula = y ~ ., data = as.data.frame(Hald), d = c(0,
0.01, 0.49, 0.5, 0.9, 1))
Intercept X1 X2 X3 X4
d=0 75.01755 1.41348 0.38190 -0.03582 -0.27032
d=0.01 74.89142 1.41486 0.38318 -0.03445 -0.26905
d=0.49 68.83758 1.48092 0.44475 0.03167 -0.20845
d=0.5 68.71146 1.48229 0.44603 0.03304 -0.20719
d=0.9 63.66659 1.53734 0.49734 0.08814 -0.15669
d=1 62.40537 1.55110 0.51017 0.10191 -0.14406
To obtain the Liu scaled coefficients, mod$coef can be used:
> mod$coef
d=0 d=0.01 d=0.49 d=0.5 d=0.9 d=1
X1 1.41348287 1.41485907 1.48091656 1.48229276 1.53734067 1.5511026
X2 0.38189878 0.38318147 0.44475049 0.44603318 0.49734070 0.5101676
X3 -0.03582438 -0.03444704 0.03166517 0.03304251 0.08813603 0.1019094
X4 -0.27031652 -0.26905396 -0.20845133 -0.20718877 -0.15668658 -0.1440610
Objects of class "liu" contain components such as lfit, d, and
coef etc. For a fitted Liu model, the generic method summary is used
to investigate the Liu coefficients. The parameter estimates of the Liu
model are summarized using a matrix of 5 columns, namely estimates,
estimates (Sc), StdErr (Sc), t-values (Sc), and P(>|t|). The
following results are shown only for d=-1.47218, which produces the
minimum MSE among the values specified in the argument.
> summary(mod)
Call:
liu.default(formula = y ~ ., data = as.data.frame(Hald), d = -1.47218)
Coefficients for Liu parameter d= -1.47218
Estimate Estimate (Sc) StdErr (Sc) t-val (Sc) Pr(>|t|)
Intercept 93.5849 93.5849 15.6226 5.990 2.09e-09 ***
X1 1.2109 1.2109 0.2711 4.466 7.97e-06 ***
X2 0.1931 0.1931 0.2595 0.744 0.4568
X3 -0.2386 -0.2386 0.2671 -0.893 0.3717
X4 -0.4562 -0.4562 0.2507 -1.820 0.0688 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Liu Summary
R2 adj-R2 F AIC BIC MSE
d=-1.47218 0.9819 0.8372 127.8 23.95 59.18 0.7047
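The standard errors and t-values in a summary of this kind are derived from the Var-Cov matrix of the Liu coefficients. A minimal base R sketch of that step follows, assuming centered regressors and the sandwich form sigma^2 F_d (X'X)^{-1} F_d' with F_d = (X'X + I)^{-1}(X'X + dI). The helper `liu_inference` is hypothetical, and the error variance is simply taken from the OLS fit for every d, which is an assumption: the Sigma2 values reported by lstats() vary with d, so the numbers will not match the package output exactly.

```r
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X  <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- hald$y - mean(hald$y)

fit_ols <- lm(y ~ X1 + X2 + X3 + X4, data = hald)
sigma2  <- summary(fit_ols)$sigma^2   # assumption: OLS error variance for all d

# Hypothetical helper: slope estimates, standard errors, and t-values at d
liu_inference <- function(Xc, yc, d, sigma2) {
  XtX  <- crossprod(Xc)
  XtXi <- solve(XtX)
  Id   <- diag(ncol(Xc))
  Fd   <- solve(XtX + Id, XtX + d * Id)          # F_d = (X'X + I)^{-1}(X'X + dI)
  b_d  <- drop(Fd %*% XtXi %*% crossprod(Xc, yc))  # F_d times the OLS slopes
  V    <- sigma2 * Fd %*% XtXi %*% t(Fd)           # assumed Var-Cov matrix
  se   <- sqrt(diag(V))
  cbind(Estimate = b_d, StdErr = se, t_value = b_d / se)
}

round(liu_inference(Xc, yc, d = 0.5, sigma2 = sigma2), 4)
```

With d = 1, F_d is the identity matrix and the standard errors reduce to the OLS slope standard errors reported by summary(lm()).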
The summary()
function also displays Liu related liu()
.
The dest()
function, which works with Liu fitted models, computes
different biasing parameters developed by researchers, see
Table 3. The list of different
> dest(mod)
Liu biasing parameter d
d values
dmm -5.91524
dcl -5.66240
dopt -1.47218
dILE -0.83461
min GCV at 1.00000
The lstats() function can be used to compute different statistics for
a given Liu biasing parameter specified in a call to liu(). The Liu
statistics are MSE, squared bias, F-statistics, Liu variance, degrees
of freedom (df) by (Hastie and Tibshirani 1990), and lstats()
for some
> lstats(mod)
Liu Regression Statistics:
EDF Sigma2 CL VAR Bias^2 MSE F R2 adj-R2
d=-1.47218 9.4135 5.2173 5.0880 0.2750 0.4297 0.7047 127.8388 0.9819 0.8372
d=-0.06 9.0760 5.2989 5.5077 1.0195 0.0790 1.0985 125.8693 0.9823 0.8406
d=0 9.0677 5.3010 5.5315 1.0625 0.0703 1.1328 125.8194 0.9823 0.8407
d=0.1 9.0548 5.3043 5.5722 1.1362 0.0569 1.1931 125.7427 0.9823 0.8408
d=0.5 9.0169 5.3139 5.7488 1.4561 0.0176 1.4737 125.5157 0.9824 0.8412
d=1 9.0000 5.3182 6.0000 1.9119 0.0000 1.9119 125.4141 0.9824 0.8414
minimum MSE occurred at d= -1.47218
The lstats()
also displays the value of liu()
function.
The residuals, fitted values from the LR, and predicted values of the
response variable can be obtained with residuals(), fitted(), and
predict(), respectively. To obtain the Var-Cov and Hat matrices, the
functions vcov() and hatl() can be used. The df are computed by
following (Hastie and Tibshirani 1990). The results for Var-Cov
and diagonal elements of the hat matrix from the vcov() and hatl()
functions are given below for
> vcov(liu(y ~ ., as.data.frame(Hald), d = -1.47218))
$`d=-1.47218`
X1 X2 X3 X4
X1 0.07351333 0.04805778 0.06567391 0.04874902
X2 0.04805778 0.06732869 0.05192626 0.06412284
X3 0.06567391 0.05192626 0.07134433 0.05149914
X4 0.04874902 0.06412284 0.05149914 0.06284562
> diag(hatl(liu(y ~ ., as.data.frame(Hald), d = -1.47218)))
1 2 3 4 5 6 7
0.43522319 0.22023015 0.21341231 0.18535953 0.27191765 0.04296839 0.28798591
8 9 10 11 12 13
0.30622895 0.15028900 0.59103231 0.30392765 0.14087610 0.18778716
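A hat matrix of this kind can be sketched directly from its definition. The code below assumes centered regressors and the form X (X'X + I)^{-1}(X'X + dI)(X'X)^{-1} X'; `liu_hat` is a hypothetical helper, not the hatl() implementation.

```r
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X  <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
Xc <- scale(X, center = TRUE, scale = FALSE)

# Hypothetical helper: Liu hat matrix for centred regressors at a given d
liu_hat <- function(Xc, d) {
  XtX <- crossprod(Xc)
  Id  <- diag(ncol(Xc))
  Xc %*% solve(XtX + Id, XtX + d * Id) %*% solve(XtX, t(Xc))
}

H_ols <- liu_hat(Xc, 1)          # d = 1: ordinary projection matrix
H_liu <- liu_hat(Xc, -1.47218)   # d used in the output above

# The trace measures the effective number of fitted parameters
c(trace_ols = sum(diag(H_ols)), trace_liu = sum(diag(H_liu)))
round(diag(H_liu), 4)
```

For d = 1 the matrix is idempotent with trace equal to the number of regressors; for other values of d it is no longer a projection and its trace changes accordingly.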
Following are possible uses of some functions to compute different Liu related statistics. For a detailed description of these functions/commands, see the liureg package documentation.
> hatl(mod)
> hatl(mod)[[1]]
> diag(hatl(mod)[[1]])
> vcov(mod)
> residuals(mod)
> fitted(mod)
> predict(mod)
> lstats(mod)$lEDF
> lstats(mod)$var
For given values of predict()
:
> predict(mod, newdata = as.data.frame(Hald[1 : 5, -1]))
d=-1.47218 d=-0.06 d=0 d=0.1 d=0.5 d=1
1 78.27798 78.40208 78.40736 78.41615 78.45130 78.49524
2 73.09404 72.91968 72.91227 72.89992 72.85053 72.78880
3 106.68373 106.27656 106.25926 106.23043 106.11510 105.97094
4 89.54007 89.41842 89.41325 89.40463 89.37017 89.32710
5 95.61470 95.63443 95.63527 95.63667 95.64226 95.64924
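Predictions of this kind are the intercept plus the new regressor values multiplied by the de-scaled coefficients. The sketch below illustrates the idea in base R for the first five rows of the Hald data, assuming centered regressors and the usual Liu form (X'X + I)^{-1}(X'y + d b_OLS); `liu_coef` is a hypothetical helper and not the predict() method of liureg.

```r
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
y <- hald$y

# Hypothetical helper: de-scaled Liu coefficients (intercept first)
liu_coef <- function(X, y, d) {
  Xc    <- scale(X, center = TRUE, scale = FALSE)
  yc    <- y - mean(y)
  XtX   <- crossprod(Xc)
  Xty   <- crossprod(Xc, yc)
  b_ols <- solve(XtX, Xty)
  b_d   <- solve(XtX + diag(ncol(Xc)), Xty + d * b_ols)
  c(Intercept = mean(y) - sum(colMeans(X) * b_d), drop(b_d))
}

# Predict the response for new regressor values, one column per d
newX   <- X[1:5, , drop = FALSE]
d_vals <- c(0, 0.5, 1)
pred <- sapply(d_vals, function(d) {
  b <- liu_coef(X, y, d)
  drop(b[["Intercept"]] + newX %*% b[-1])
})
colnames(pred) <- paste0("d=", d_vals)
round(pred, 5)
```

The d=1 column contains the OLS predictions, since the Liu estimator coincides with OLS at d = 1.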
The model selection criteria AIC and BIC can be computed using the
infoliu()
function for each value of liu()
.
For some
> infoliu(liu(y ~ ., as.data.frame(Hald), d = c(-1.47218, -0.06, 0.5, 1)))
AIC BIC
d=-1.47218 23.95378 59.18349
d=-0.06 24.43818 59.88178
d=0.5 24.69007 60.21849
d=1 24.94429 60.54843
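Information criteria of this kind can be sketched from the residual sum of squares and the trace of the hat matrix. The forms used below, AIC = n log(RSS/n) + 2 df and BIC = n log(RSS) + df log(n) with df equal to the trace of the Liu hat matrix, are assumptions chosen because they agree with the d = 1 row printed above; they are not taken from the package source, and `liu_ic` is a hypothetical helper.

```r
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
y <- hald$y

# Hypothetical helper: AIC and BIC for a Liu fit at a given d
liu_ic <- function(X, y, d) {
  Xc  <- scale(X, center = TRUE, scale = FALSE)
  yc  <- y - mean(y)
  XtX <- crossprod(Xc)
  Id  <- diag(ncol(Xc))
  H   <- Xc %*% solve(XtX + Id, XtX + d * Id) %*% solve(XtX, t(Xc))
  rss <- sum((yc - H %*% yc)^2)   # residuals of the centred fit
  df  <- sum(diag(H))             # assumed model degrees of freedom
  n   <- length(y)
  c(AIC = n * log(rss / n) + 2 * df,   # assumed form
    BIC = n * log(rss) + df * log(n))  # assumed form
}

round(t(sapply(c(0.5, 1), function(d) liu_ic(X, y, d))), 5)
```

Smaller values indicate a preferred biasing parameter under either criterion.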
The effect of multicollinearity on the coefficient estimates can be
identified by using different graphical displays such as the Liu trace
(see Figure 1); the plotting of bias, variance, and MSE
against
> mod <- liu(y ~ ., as.data.frame(Hald), d = seq(-5, 5, .001) )
> plot(mod)
> plot.biasliu(mod)
> plot.infoliu(mod)
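A trace plot of this type can be approximated with base graphics. The sketch below assumes centered regressors and the usual Liu form (X'X + I)^{-1}(X'y + d b_OLS), and plots the slope coefficients against a grid of d values; it is an illustration only and does not reproduce the styling of plot() for "liu" objects. The object names are hypothetical.

```r
hald <- data.frame(
  X1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  X2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  X3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  X4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12),
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1,
         115.9, 83.8, 113.3, 109.4)
)
X   <- as.matrix(hald[, c("X1", "X2", "X3", "X4")])
Xc  <- scale(X, center = TRUE, scale = FALSE)
yc  <- hald$y - mean(hald$y)
XtX <- crossprod(Xc)
Xty <- crossprod(Xc, yc)
b_ols <- solve(XtX, Xty)

# One column of slope coefficients per value of d
d_grid <- seq(-5, 5, by = 0.1)
paths <- sapply(d_grid, function(d) {
  drop(solve(XtX + diag(ncol(Xc)), Xty + d * b_ols))
})

# Liu trace: coefficients against d, with OLS (d = 1) marked
matplot(d_grid, t(paths), type = "l", lty = 1, col = 1:4,
        xlab = "d", ylab = "Liu coefficient")
abline(v = 1, lty = 2)
legend("topleft", legend = rownames(paths), col = 1:4, lty = 1)
```

Under this form each coefficient is a linear function of d, so the traces are straight lines that pass through the OLS estimates at d = 1.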
The liureg package provides the most complete suite of tools for LR
available in R; the alternatives are compared in
Table 1. We have implemented functions to compute the Liu
coefficients, the testing of these coefficients, the computation of
different Liu related statistics and the computation of the biasing
parameter for different existing methods by various authors (see
Table 3). We have greatly increased the Liu related
statistics and different graphical methods for the selection of the
biasing parameter
Up to now, a complete suite of tools for LR was not available in either open source or commercial statistical software packages, resulting in reduced awareness and use of the developed Liu related statistics. The package liureg provides a complete open source suite of tools for Liu coefficient estimation, testing, and the computation of different statistics. We believe the availability of these tools will lead to increased utilization and better Liu related practices.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Imdadullah, et al., "liureg: A Comprehensive R Package for the Liu Estimation of Linear Regression Model with Collinear Regressors", The R Journal, 2017
BibTeX citation
@article{RJ-2017-048, author = {Imdadullah, Muhammad and Aslam, Muhammad and Altaf, Saima}, title = {liureg: A Comprehensive R Package for the Liu Estimation of Linear Regression Model with Collinear Regressors}, journal = {The R Journal}, year = {2017}, note = {https://rjournal.github.io/}, volume = {9}, issue = {2}, issn = {2073-4859}, pages = {232-247} }