We develop an R package SIQR that implements single-index quantile regression (SIQR) models via the efficient iterative local linear approach in (Wu et al. 2010). Single-index quantile regression models are important tools in semiparametric regression that provide a comprehensive view of the conditional distribution of a response variable. They are especially useful when the data are heterogeneous or heavy-tailed. The package provides functions that allow users to fit SIQR models, predict, obtain standard errors of the single-index coefficients via bootstrap, and visualize the estimated univariate function. We apply the R package SIQR to the well-known Boston Housing data.
Single-index quantile regression (Wu et al. 2010) generalizes the
seminal work of linear quantile regression of (Koenker and Bassett 1978)
by projecting the
Single-index quantile regression (SIQR) is a flexible semiparametric
quantile regression model for analyzing heterogeneous data. The SIQR
model has some appealing features: (i) It can provide a comprehensive
view of the conditional distribution of a response variable given
We present a package SIQR in R that implements the iterative local linear approach to the single-index quantile regression in (Wu et al. 2010). The unknown univariate function is estimated by local linear estimation. The key algorithm can be decomposed into two efficient estimation steps on augmented data through local linear approximation and some equivalent formulation of the expected loss. Essentially, it iterates between two linear quantile regressions utilizing the state-of-the-art R package quantreg.
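For readers less familiar with quantile regression, each of the linear quantile regressions mentioned above minimizes the check loss rho_tau(u) = u * (tau - I(u < 0)). The base-R sketch below is only an illustration: the helper name check_loss and the toy data are ours and are not part of SIQR or quantreg. It shows that minimizing the average check loss over a constant lands on the sample tau-quantile.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0)).
# Illustrative helper, not a function from SIQR or quantreg.
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y   <- rexp(500)   # toy right-skewed sample (assumption, for illustration)
tau <- 0.75

# Average check loss as a function of a constant fit q.
obj <- function(q) mean(check_loss(y - q, tau))

# One-dimensional minimization over the observed range of y.
q_hat <- optimize(obj, interval = range(y))$minimum

# The minimizer should sit very close to the empirical 0.75 quantile.
c(minimizer = q_hat, sample_quantile = unname(quantile(y, tau)))
```

Positive residuals are weighted by tau and negative residuals by 1 - tau, which is what shifts the fit from the median toward other quantiles.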
We apply our R package, SIQR, to the well-known Boston Housing data (1978) that is available in the R default library. The data has a total of 506 observations, and the response variable of interest is the median price of owner-occupied homes on the census tracts in suburban Boston from the 1970 census. The response variable and some covariates are left-skewed. Clearly, quantile regression is a natural tool to analyze the data (e.g., Chaudhuri et al. (1997); Yu and Lu (2004); Wu et al. (2010); Kong and Xia (2012)). We organize the rest of the paper as follows. In the next section, we review the SIQR models. Next, we discuss the estimation algorithms implemented in this package. The section following describes the main features of the functions provided. Section “Real Data Analysis and Simulation” illustrates the use of SIQR in R for Boston housing data and a simulation study. The last section concludes the paper.
We develop an R package for the single-index quantile regression for
semiparametric estimation with
We implement the local linear estimation for single-index quantile
regression (1) (Wu et al. 2010). For notational
convenience, we omit the subscript
We adopt a local linear approximation. In particular, for
Now, we can minimize the sample analogue of (2) below
as in (Yu and Jones 1998) with respect to
We further average (4) over
Bandwidth is a critical smoothing parameter that tunes the smoothness of
the fitted function in local estimation. We implement the choice of the
optimal bandwidth
We present the main algorithm for fitting the single-index quantile regression (SIQR) with local linear estimation in detail as follows:
Input: Quantile level
Output: The estimated quantile single-index parameter
Obtain an initial estimate
Given
Given
Repeat Steps 2 and 3 until convergence.
Finally, we estimate
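The steps above can be sketched in a few lines of R. This is a simplified illustration under our own assumptions, not the code used inside siqr: the function name siqr_sketch, the fixed bandwidth h, the Gaussian kernel, and the stopping rule are all hypothetical choices, and the final re-estimation of the univariate function is omitted. Only rq() from quantreg is used, and each pass performs the two linear quantile regression steps described above.

```r
library(quantreg)  # provides rq(); assumed to be installed

# Illustrative sketch only; names and tuning choices are assumptions.
siqr_sketch <- function(y, X, tau = 0.5, h = 0.3, maxiter = 10, tol = 1e-4) {
  n <- nrow(X)
  # Scale to unit norm with a positive first component.
  normalize <- function(b) {
    b <- b / sqrt(sum(b^2))
    if (b[1] < 0) -b else b
  }
  # Step 1: initial estimate from a linear quantile regression (intercept dropped).
  beta <- normalize(coef(rq(y ~ X, tau = tau))[-1])
  pairs <- expand.grid(i = seq_len(n), j = seq_len(n))
  for (iter in seq_len(maxiter)) {
    index <- drop(X %*% beta)
    # Step 2: given beta, local linear quantile fit (a_j, b_j) at each index value.
    ab <- t(sapply(index, function(u0) {
      d <- index - u0
      coef(rq(y ~ d, tau = tau, weights = dnorm(d / h)))
    }))
    # Step 3: given (a_j, b_j), one weighted linear quantile regression
    # through the origin on the n^2 augmented observations.
    y_star <- y[pairs$i] - ab[pairs$j, 1]
    X_star <- ab[pairs$j, 2] *
      (X[pairs$i, , drop = FALSE] - X[pairs$j, , drop = FALSE])
    w <- dnorm((index[pairs$i] - index[pairs$j]) / h)
    beta_new <- normalize(coef(rq(y_star ~ X_star - 1, tau = tau, weights = w)))
    converged <- sum((beta_new - beta)^2) < tol
    beta <- beta_new
    if (converged) break
  }
  unname(beta)
}

# Toy usage with a monotone link (all values here are illustrative).
set.seed(123)
n <- 60
X <- matrix(runif(n * 2), n, 2)
beta_true <- c(1, 2) / sqrt(5)
y <- drop(exp(X %*% beta_true)) + 0.1 * rnorm(n)
beta_hat <- siqr_sketch(y, X, tau = 0.5, h = 0.2)
beta_hat
```

The augmented regression in Step 3 has n^2 rows, so the sketch keeps n small; rq() may emit warnings about non-unique solutions, which are harmless here.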
The above algorithm effectively decomposes (5)
into two steps that can be achieved by two standard linear quantile
regression procedures in Steps 2 and 3. In Step 3, we further note that
(9) can be written as
We can see that (9) is an alternative to
(8). Adopting (9) yields some advantages:
(i) It uses all the data and is more efficient in estimation; (ii) The
double sum in (9) effectively increases the "augmented"
sample size to
The R package SIQR consists of one core estimation function siqr and some supporting functions such as the visualization tool plot.siqr and the summary function summary.siqr. The R package SIQR depends on the R packages stats, quantreg, and KernSmooth.
The main estimation function siqr implements the iterative local linear approach to the single-index quantile regression in (Wu et al. 2010).
The usage and input arguments of the main fitting function siqr are summarized as follows:
siqr(y, X, tau=0.5, h=NULL, beta.initial=NULL, se.method = NULL, maxiter=30, tol=1e-8)
This function takes two required arguments: the response variable y in vector format and the covariate matrix X. Please note that all input covariates must be numeric.
This function also takes several optional arguments for finer control. The optional argument tau is the quantile index, which specifies the left-tail probability. The default value of tau is 0.5, which corresponds to a single-index median regression. The optional argument h is the bandwidth in local linear quantile regression. Users can either provide a bandwidth or let the algorithm choose the optimal bandwidth as advocated in (Wu et al. 2010) by leaving this argument at its default, NULL. The optional argument beta.initial is a numeric vector of the same length as the number of covariates. Users can use this argument to pass in any appropriate user-defined initial single-index coefficients based on prior information or domain knowledge. The default value is NULL, which instructs the function to estimate the initial single-index coefficients by linear quantile regression. The optional argument se.method is a character string that specifies the method used to obtain the standard errors of the estimated single-index coefficients. The default value NULL skips the calculation of standard errors, while the bootstrap-based method is available with "bootstrap". The optional arguments maxiter and tol are control parameters that specify the criteria for terminating the iteration. The algorithm normally converges quickly; the defaults for maxiter and tol are 30 and 1e-8, respectively.
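As a rough illustration of how a quantile-specific bandwidth can be obtained when h is not supplied, the sketch below applies the rule of thumb of Yu and Jones (1998), which rescales a mean-regression bandwidth h_mean by the factor [tau(1 - tau) / phi(Phi^{-1}(tau))^2]^(1/5). Whether siqr uses exactly this rule internally is an assumption on our part; the function name quantile_bandwidth and the value h_mean = 0.2 are hypothetical.

```r
# Rule-of-thumb quantile bandwidth (Yu and Jones 1998):
#   h_tau = h_mean * { tau (1 - tau) / dnorm(qnorm(tau))^2 }^(1/5)
# Illustrative helper; not a function exported by SIQR.
quantile_bandwidth <- function(h_mean, tau) {
  stopifnot(h_mean > 0, tau > 0, tau < 1)
  h_mean * (tau * (1 - tau) / dnorm(qnorm(tau))^2)^(1 / 5)
}

h_mean <- 0.2                    # hypothetical mean-regression bandwidth
taus   <- c(0.25, 0.50, 0.75)

# The adjustment is symmetric in tau and grows toward the tails.
h_tau <- sapply(taus, function(t) quantile_bandwidth(h_mean, t))
setNames(h_tau, paste0("tau=", taus))
```

In practice h_mean would itself come from a data-driven selector for mean regression; a plug-in selector such as the one in KernSmooth is one option.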
We also provide several supporting functions:
summary.siqr(siqr.object)
print.summary.siqr(siqr.object)
The functions summary.siqr and print.summary.siqr provide detailed information related to the fitted model and summarize the results as illustrated in the next section. These two functions can be called directly by applying the functions print and summary to the siqr.object.
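The dispatch mechanism behind this convenience is R's S3 system: calling summary() on an object looks up a method named after the object's class, and printing the result looks up the matching print method. The toy class "toyfit" below is hypothetical and stands in for a fitted SIQR object; only the naming pattern mirrors summary.siqr and print.summary.siqr.

```r
# Hypothetical toy object; the fields and class name are illustrative only.
fit <- structure(list(beta = c(0.6, 0.8), tau = 0.5), class = "toyfit")

# summary() on a "toyfit" object dispatches here.
summary.toyfit <- function(object, ...) {
  structure(list(tau = object$tau, beta = object$beta),
            class = "summary.toyfit")
}

# print() on the summary object dispatches here.
print.summary.toyfit <- function(x, ...) {
  cat("Quantile level:", x$tau, "\n")
  cat("Coefficients:", format(x$beta), "\n")
  invisible(x)
}

s <- summary(fit)  # calls summary.toyfit
print(s)           # calls print.summary.toyfit
```

Because of this dispatch, users rarely need to type the full method name; the generic call is enough.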
plot.siqr(siqr.object, data.points = TRUE, bootstrap.interval=FALSE)
This function plots the fitted quantiles against the single-index term from an SIQR-fitted model object. By default, it also plots the observed data points alongside the fitted quantiles to help visualize the goodness of fit. The data points can be removed by setting the optional argument data.points to FALSE. Pointwise confidence intervals are added to the plot if the optional argument bootstrap.interval is set to TRUE.
simulation_data <- generate.data(n, true.theta=NULL, sigma=0.1,
setting="setting1", ncopy=1)
To help perform simulation studies, the function generate.data
generates a size true.beta
and the noise level via sigma
. If no
true.beta
was provided, the function will use
ncopy
generates multiple copies of data for Monte
Carlo simulations.
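To make the simulation design more concrete, here is a minimal base-R sketch of a sine-bump generator of the kind commonly used in the single-index literature. It is not the package's generate.data: the function name sine_bump_data, the uniform covariates, and the constants A and C are our assumptions, and the returned list only mimics the data$X and data$Y[[m]] layout used in this article.

```r
# Hypothetical stand-in for a sine-bump data generator (not generate.data).
sine_bump_data <- function(n, theta = c(1, 1, 1) / sqrt(3),
                           sigma = 0.1, ncopy = 1) {
  p <- length(theta)
  X <- matrix(runif(n * p), n, p)          # covariates on [0, 1] (assumption)
  index <- drop(X %*% theta)               # single-index values
  # Constants that center and scale the bump for the default theta (assumption).
  A <- sqrt(3) / 2 - 1.645 / sqrt(12)
  C <- sqrt(3) / 2 + 1.645 / sqrt(12)
  signal <- sin(pi * (index - A) / (C - A))
  # One response vector per Monte Carlo copy, sharing the same design X.
  Y <- replicate(ncopy, signal + sigma * rnorm(n), simplify = FALSE)
  list(X = X, Y = Y, index = index)
}

set.seed(2021)
sim <- sine_bump_data(n = 400, ncopy = 2)
str(sim, max.level = 1)
```

Keeping the design matrix fixed across copies isolates the Monte Carlo variation that comes from the noise alone.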
We consider the Boston housing data to demonstrate a real data application of the proposed R package SIQR. This dataset contains the median value of houses (in $1000’s), medv, in 506 tracts in Boston and 13 other socio-demographic variables. These data have been investigated in many studies. Previous researchers have found heterogeneity and some non-linear dependence of medv on the predictor variables. The dataset is maintained at the StatLib library of Carnegie Mellon University and is available in the R package MASS.
We focus on the following four covariates: RM, the average number of rooms per dwelling; TAX, the full-value property tax (in $) per $10,000; PTRATIO, the pupil-teacher ratio by town; and LSTAT, percentage of the lower status of the population as in (Opsomer and Ruppert 1998), (Yu and Lu 2004), and (Wu et al. 2010). Following previous studies, we take logarithmic transformations on TAX and LSTAT and center the dependent variable medv around zero.
We use the following code to load the data from MASS and pre-process it as
discussed above. We fit a single-index quantile regression with
library(SIQR)
# Load the Boston housing data from MASS
library(MASS)
medv <- Boston$medv
RM <- Boston$rm
logTAX <- log(Boston$tax)
PTRATIO <- Boston$ptratio
logLSTAT <- log(Boston$lstat)
X <- cbind(RM, logTAX, PTRATIO, logLSTAT)
# Center the response around zero
y0 <- medv - mean(medv)
beta0 <- NULL
tau.vec <- c(0.25, 0.50, 0.75)
# One row per quantile level: tau followed by the four estimated coefficients
est.coefficient <- matrix(NA, nrow = length(tau.vec), ncol = 5)
est.coefficient[, 1] <- tau.vec
for (i in seq_along(tau.vec)) {
  est <- siqr(y0, X, beta.initial = beta0, tau = tau.vec[i], maxiter = 30, tol = 1e-8)
  est.coefficient[i, 2:5] <- est$beta
}
colnames(est.coefficient) <- c("quantile tau", colnames(X))
est.coefficient
#> quantile tau RM logTAX PTRATIO logLSTAT
#> [1,] 0.25 0.3358285 -0.5243025 -0.06856117 -0.7795033
#> [2,] 0.50 0.3129182 -0.4294159 -0.06640472 -0.8445558
#> [3,] 0.75 0.2385613 -0.1933015 -0.07860687 -0.9484429
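Single-index coefficients are identified only up to scale, so estimates are typically reported on a normalized scale. As a quick sanity check, the base-R snippet below re-enters the printed estimates by hand (the matrix est is our own transcription of the output above, not an object returned by siqr) and computes the Euclidean norm of each row, which should be close to 1.

```r
# Printed estimates transcribed from the output above (rows: tau = 0.25, 0.50, 0.75).
est <- rbind(
  c(0.3358285, -0.5243025, -0.06856117, -0.7795033),
  c(0.3129182, -0.4294159, -0.06640472, -0.8445558),
  c(0.2385613, -0.1933015, -0.07860687, -0.9484429)
)
dimnames(est) <- list(c("tau=0.25", "tau=0.50", "tau=0.75"),
                      c("RM", "logTAX", "PTRATIO", "logLSTAT"))

# Euclidean norm of each coefficient vector.
row_norms <- sqrt(rowSums(est^2))
row_norms
```

Because every row has essentially the same norm, the coefficients can be compared across quantile levels as relative contributions to the index.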
The estimated 0.25, 0.50, and 0.75 quantiles and their 95% pointwise confidence bounds are plotted with the following code.
est.tau25 <- siqr(y0,X,beta.initial = NULL, tau=0.25)
plot.siqr(est.tau25,bootstrap.interval = TRUE)
Figure: plot.siqr with estimated 0.25 quantiles and the 95% pointwise confidence bounds.
est.tau50 <- siqr(y0,X,beta.initial = NULL, tau=0.50)
plot.siqr(est.tau50,bootstrap.interval = TRUE)
Figure: plot.siqr with estimated 0.50 quantiles and the 95% pointwise confidence bounds.
est.tau75 <- siqr(y0,X,beta.initial = NULL, tau=0.75)
plot.siqr(est.tau75,bootstrap.interval = TRUE)
Figure: plot.siqr with estimated 0.75 quantiles and the 95% pointwise confidence bounds.
As the estimated single-index function curves are almost monotonically
increasing across different quantiles, variables that contribute
positively to the single index affect the response variable (medv)
positively. Based on the estimated coefficients and above plots, we
found that the number of rooms per house (rm) positively affects
different quantiles. This matches the intuition that people value large
spaces and multi-functional rooms. The property tax rate ln(tax) has a
negative impact on housing prices across different quantiles. However,
the influence of the tax rate is not significant at higher quantile
We consider two simulation settings. In the first simulation example, we
use a sine-bump model with homoscedastic errors:
Estimate | ||||
---|---|---|---|---|
mean | 0.5782 | 0.5727 | 0.5725 | |
s.e. | 0.0131 | 0.0281 | 0.0293 | |
bias | 0.0009 | -0.0046 | -0.0048 | |
mean | 0.5787 | 0.5755 | 0.5774 | |
s.e. | 0.0115 | 0.0105 | 0.0111 | |
bias | 0.0014 | -0.0018 | 0.0003 | |
mean | 0.5803 | 0.5756 | 0.5757 | |
s.e. | 0.0119 | 0.0110 | 0.0118 | |
bias | 0.0029 | -0.0017 | -0.0016 |
The single-index coefficients are estimated via a series of quantile
regressions with
For demonstration purposes, we show code to generate data from
(11) and fit the SIQR model using
library(foreach)  # provides foreach() and %do%
n <- 400
beta0 <- c(1, 1, 1)/sqrt(3)
n.sim <- 200
tau <- 0.50
data <- generate.data(n, true.theta=beta0, setting = "setting1",ncopy = n.sim)
# Fit the SIQR model to each simulated copy and stack the estimates by row
sim.results.50 <- foreach(m = 1:n.sim,.combine = "rbind") %do% {
  X <- data$X
  Y <- data$Y[[m]]
  est <- siqr(Y, X, beta.initial = c(2,1,0), tau=tau,maxiter = 30,tol = 1e-8)
  est$beta
}
Note that this process has been repeated for the cases with
boxplot(data.frame(sim.results.25), outline=TRUE, notch=TRUE, range=1,
main = "Boxplots of Coefficient Estimates, tau = 0.25", horizontal = FALSE)
boxplot(data.frame(sim.results.50), outline=TRUE, notch=TRUE, range=1,
main = "Boxplots of Coefficient Estimates, tau = 0.50", horizontal = FALSE)
boxplot(data.frame(sim.results.75), outline=TRUE, notch=TRUE, range=1,
main = "Boxplots of Coefficient Estimates, tau = 0.75", horizontal = FALSE)
Next, we consider a location-scale model as simulation example 2, where
both the location and the scale depend on a common index
The simulated data are generated with the following code. The sample
size
n <- 400
beta0 <- c(1, 2)/sqrt(5)
n.sim <- 100
tau <- 0.5
data <- generate.data(n, true.theta=beta0, setting = "setting3",ncopy = n.sim)
# Fit the SIQR model to each simulated copy and stack the estimates by row
sim.results <- foreach(m = 1:n.sim,.combine = "rbind") %do% {
  X <- data$X
  Y <- data$Y[[m]]
  est <- siqr(Y, X, beta.initial = NULL, tau=tau,maxiter = 30,tol = 1e-8)
  est$beta
}
# Monte Carlo average of the estimated coefficients
est.mean <- c(tau,apply(sim.results,2,mean))
names(est.mean) <- c("tau","beta1.hat","beta2.hat")
est.mean
#> tau beta1.hat beta2.hat
#> 0.5 0.4515909 0.8917233
The average estimated single-index coefficients shown above are close to
the true single-index parameter
# Monte Carlo standard deviation of the estimated coefficients
est.se <- c(tau,apply(sim.results,2,sd))
names(est.se) <- c("tau","beta1.se.hat","beta2.se.hat")
est.se
#> tau beta1.se.hat beta2.se.hat
#> 0.5 0.02682211 0.01359602
Meanwhile, the following box plots show that the estimated single-index coefficients are close to the true parameters with small deviations.
boxplot(data.frame(sim.results), outline=TRUE, notch=TRUE, range=1,
main = "Boxplots of Coefficient Estimates (100 replications)", horizontal = FALSE)
Similarly, we plot the estimated quantiles and their 95% pointwise confidence bounds with the provided plot function plot.siqr. The observed data points are also plotted.
est.sim.50 <- siqr(data$Y[[1]],data$X,beta.initial = NULL, tau=0.5)
plot.siqr(est.sim.50,bootstrap.interval = TRUE)
Figure: plot.siqr with estimated 0.50 quantiles and the 95% pointwise confidence bounds from example 2.
In this paper, we present the R package SIQR for the local linear approach to single-index quantile regression models in (Wu et al. 2010). We demonstrate the package on the popular Boston housing data and two simulation studies. We hope that the package will be useful in a variety of applications, especially for complex heterogeneous data where flexible quantile regression modeling is desirable.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Zu & Yu, "SIQR: An R Package for Single-index Quantile Regression", The R Journal, 2021
BibTeX citation
@article{RJ-2021-092, author = {Zu, Tianhai and Yu, Yan}, title = {SIQR: An R Package for Single-index Quantile Regression}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {460-470} }