The R package quantreg.nonpar implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models. quantreg.nonpar obtains point estimates of the conditional quantile function and its derivatives based on series approximations to the nonparametric part of the model. It also provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices for the same functions using analytical and resampling methods. This paper serves as an introduction to the package and displays basic functionality of the functions contained within.
Let
Conditional quantile function itself:
Partial first and second derivative functions with respect to
Average partial first and second derivative functions with respect
to
Both pointwise or uniform inference over a region of quantile indices and/or covariate values are implemeted.
The coefficient vector
A challenge to perform inference in this setting is that
Pivotal: analytical method based on the pivotal coupling.
Gradient bootstrap: resampling method based on the pivotal coupling.
Gaussian: analytical method based on the Gaussian coupling.
Weighted bootstrap: resampling method based on the Gaussian coupling.
The quantreg.nonpar package implements all these methods.
Additionally, the linear functionals of interest might be naturally
monotone in some of their arguments. For example, the conditional
quantile function
Several existing R packages are available to estimate conditional
quantile models. The package
quantreg
(Koenker 2016) includes multiple commands for parametric and
nonparametric quantile regression. The command rqss
estimates
univariate and bivariate local nonparametric smoothing splines, and the
command rearrange
implements the rearrangement method to tackle the
quantile crossing problem. The package
QuantifQuantile
(Charlier et al. 2015) estimates univariate conditional quantile models
using a local nonparametric method called optimal quantization or
partitioning. The nonparametric methods implemented in the previous
packages are local or kernel-type, whereas our methods are global or
series-type. Finally, the command gcrq
in the package
quantregGrowth
(Muggeo et al. 2013) implements a univariate
We illustrate the functionality of the package with an empirical
application based on data from (Koenker 2011) for childhood malnutrition
in India, where we model the effect of a child’s age and other
covariates on the child’s height. Here,
First, we load the data and construct the variables that will be used in
the analysis. Note that the variable prefixes “c” and “m” refer to
“child” and “mother”. For each factor variable (csex
, ctwin
,
cbirthorder
, munemployed
, mreligion
, mresidence
, wealth
,
electricity
, radio
, television
, refrigerator
, bicycle
,
motorcycle
, and car
), we generate a variable “facvar
” which is the
factor version of the variable “var
”. For each quadratic variable
(mbmi, breastfeeding,
and mage
), we generate a variable “varsq
”
which is the variable squared. For example:
R> data <- india
R> faccsex <- factor(csex)
R> mbmisq <- mbmi^2
We also construct the formula to be used for the linear part of the
model,
R> form.par <- cheight ~ mbmi + mbmisq + breastfeeding + breastfeedingsq + mage +
+ magesq + medu + edupartner + faccsex + facctwin + faccbirthorder +
+ facmunemployed + facmreligion + facmresidence + facwealth + facelectricity +
+ facradio + factelevision + facrefrigerator + facbicycle + facmotorcycle + faccar
Note that this formula does not contain a term for our variable of
interest
R> basis.bsp <- create.bspline.basis(breaks = quantile(cage, c(0:10)/10))
Finally, we set the values of some of the other parameters. For the
purposes of this example, we use 500 simulations for the pivotal and
Gaussian methods, and 100 repetitions for the weighted and gradient
bootstrap methods. The set of analyzed quantile indices will be
npqr
print
only results for quantile indices contained in the set
R> B <- 500
R> B.boot <- 100
R> taus <- c(1:24)/25
R> print.taus <- c(1:4)/5
R> alpha <- 0.05
Initially, we will focus on the average growth rate, i.e., the average
first derivative of the conditional quantile function with respect to
child’s age
taus
. We specify the
average first derivative with the options nderivs = 1
and
average = 1
. Inference will be performed uniformly over
We first construct the four inference processes based on the npqr
generates output similar to that seen below.
In this example, output is suppressed in each call following the first.
Instead of invoking a particular process, we may also set
process = "none"
. In that case, inference will not be performed, and
only point estimates will be reported.
R> piv.bsp <- npqr(formula = form.par, basis = basis.bsp, var = "cage", taus = taus,
+ nderivs = 1, average = 1, print.taus = print.taus, B = B, uniform = TRUE)
R> gaus.bsp <- update(piv.bsp, process = "gaussian", printOutput = FALSE)
R> wboot.bsp <- update(gaus.bsp, process = "wbootstrap", B = B.boot)
R> gboot.bsp <- update(wboot.bsp, process = "gbootstrap")
The output for the pivotal method (which is generated whenever
printOutput = TRUE
) is given in Figure 1.
The point estimates represent the average derivative of the conditional
quantile function with respect to the variable of interest: the child’s
age. In other words, each value represents the average rate of growth
(in centimeters per month) at each quantile of the height distribution.
They are reported, along with their standard errors and respective
two-sided and one-sided confidence intervals, at each quantile for which
output was requested using print.taus
. The null hypotheses on which
hypothesis testing is performed state that the average growth rate is
negative, positive, and equal to zero, respectively, at all quantiles of
the distribution. We reject, at the 5% level, the null hypotheses that
the growth rate is negative and that the growth rate is equal to zero.
We cannot reject, at the 5% level, the null hypothesis that the growth
rate is positive.
Additionally, the following results are saved in piv.bsp
:
piv.bsp$CI
: a 1 length(taus)
taus
.
piv.bsp$CI.oneSided
: a 1 length(taus)
taus
.
piv.bsp$point.est
: a 1 length(taus)
matrix: each entry
is the point estimate for the average derivative of the conditional
quantile function at each quantile index in taus
.
piv.bsp$std.error
: a 1 length(taus)
matrix: each entry
is the standard error of the estimator of the average derivative of
the conditional quantile function at each quantile index in taus
(here, unconditional on the sample).
piv.bsp$pvalues
: a three item vector containing the taus
); the second tests the null hypothesis that the average
derivative is everywhere greater than zero; the third tests the null
hypothesis that the average derivative is everywhere equal to zero.
piv.bsp$taus
: the input vector taus
, i.e.,
piv.bsp$coefficients
: a list of length length(taus)
: each
element of the list contains the estimates of the QR coefficient
vector
piv.bsp$var.unique
: a vector containing all values of the
covariate of interest,
piv.bsp$load
: the input vector or matrix load
. If load
is not
input (as in this case), the output load
is generated based on
average
and nderivs
. Here, it is a vector containing the average
value of the derivative of the regression equation with respect to
the variable of interest, not including the estimated coefficients.
Using piv.bsp$taus
, piv.bsp$CI
, and piv.bsp$point.est
, as well as
the corresponding objects for the Gaussian, weighted bootstrap, and
gradient bootstrap methods, we construct plots containing the estimated
average quantile derivatives, as well as 95% uniform confidence bands
over the quantile indices in taus
:
R> par(mfrow = c(2, 2))
R> yrange <- c(.65, .95)
R> xrange <- c(0, 1)
R> plot(xrange, yrange, type = "n", xlab = "Quantile Index",
+ ylab = "Average Growth (cm/month)", ylim = yrange)
R> lines(piv.bsp$taus, piv.bsp$point.est)
R> lines(piv.bsp$taus, piv.bsp$CI[1, , 1], col = "blue")
R> lines(piv.bsp$taus, piv.bsp$CI[1, , 2], col = "blue")
R> title("Pivotal")
R> plot(xrange, yrange, type = "n", xlab = "Quantile Index", ylab = "", ylim = yrange)
R> lines(gaus.bsp$taus, gaus.bsp$point.est)
R> lines(gaus.bsp$taus, gaus.bsp$CI[1, ,1], col="blue")
R> lines(gaus.bsp$taus, gaus.bsp$CI[1, ,2], col="blue")
R> title("Gaussian")
R> plot(xrange, yrange, type = "n", xlab = "Quantile Index",
+ ylab = "Average Growth (cm/month)", ylim = yrange)
R> lines(wboot.bsp$taus, wboot.bsp$point.est)
R> lines(wboot.bsp$taus, wboot.bsp$CI[1, , 1], col = "blue")
R> lines(wboot.bsp$taus, wboot.bsp$CI[1, , 2], col = "blue")
R> title("Weighted Bootstrap")
R> plot(xrange, yrange, type = "n", xlab = "Quantile Index", ylab = "", ylim = yrange)
R> lines(gboot.bsp$taus, gboot.bsp$point.est)
R> lines(gboot.bsp$taus, gboot.bsp$CI[1, , 1], col = "blue")
R> lines(gboot.bsp$taus, gboot.bsp$CI[1, , 2], col = "blue")
R> title("Gradient Bootstrap")
R> title("Average Growth Rate with 95% CI", outer = TRUE)
As we can see in Figure 2, the confidence bands generated are roughly similar. Note that the point estimates are the same for all the methods.
We can compare the computation times of each of the approximations using
the command Sys.time
. Additionally, we compare the npqr
does not make use of parallel computing).
R> pval.dimnames <- vector("list", 2)
R> pval.dimnames[[1]] <- c("Pivotal", "Gaussian", "Weighted Bootstrap",
+ "Gradient Bootstrap")
R> pval.dimnames[[2]] <- c("H0: Growth Rate <= 0", "H0: Growth Rate >= 0",
+ "H0: Growth Rate = 0", "Computation Minutes")
R> pvals <- matrix(NA, nrow = 4, ncol = 4, dimnames = pval.dimnames)
R> pvals[1, ] <- c(round(piv.bsp$pvalues, digits = 4), round(piv.time, digits = 0))
R> pvals[2, ] <- c(round(gaus.bsp$pvalues, digits = 4), round(gaus.time, digits = 0))
R> pvals[3, ] <- c(round(wboot.bsp$pvalues, digits = 4), round(wboot.time, digits = 0))
R> pvals[4, ] <- c(round(gboot.bsp$pvalues, digits = 4), round(gboot.time, digits = 0))
R> pvals
H0: Growth Rate <= 0 H0: Growth Rate >= 0 H0: Growth Rate = 0
Pivotal 0 1 0.0237
Gaussian 0 1 0.0234
Weighted Bootstrap 0 1 0.0221
Gradient Bootstrap 0 1 0.0221
Computation Minutes
Pivotal 0.9
Gaussian 0.6
Weighted Bootstrap 30.0
Gradient Bootstrap 346.0
As expected, we reject at the 5% level the null hypothesis that the
growth rate is negative and the null hypothesis that the growth rate is
equal to zero in all cases, and we fail to reject the null hypothesis
that the growth rate is positive in all cases. For the one-sided tests,
the relevant null hypothesis is that the average growth rate is less
than or equal to zero (greater than or equal to zero) at all the
quantile indices in taus
. For the two-sided test, the relevant null
hypothesis is that the average growth rate is equal to zero at all the
quantile indices in taus
. Additionally, note that the pivotal and
Gaussian methods are substantially faster than the two bootstrap
methods.
Another option is to take advantage of the variety of bases available in
the quantreg.nonpar package. Here, we consider three bases: the
R> basis.poly <- poly(cage, degree = 12)
R> basis.four <- create.fourier.basis(rangeval = range(data$cage), nbasis = 9,
+ period = 200)
In this section, we focus on the pivotal method for inference. We run
npqr
for the orthogonal polynomial basis and the Fourier basis,
mimicking the analysis run above for the
R> piv.poly <- update(piv.bsp, basis = basis.poly)
R> piv.four <- update(piv.bsp, basis = basis.four)
Similar to Section 2.2, we plot the point estimates with
their uniform 95% confidence bands for each basis. Figure
3 shows that, given the parameters of the chosen
bases, the type of basis does not have an important impact on the
estimation and inference on the growth rate charts. A table containing
the
R> pval2.dimnames <- vector("list", 2)
R> pval2.dimnames[[1]] <- c("B-spline", "Polynomial", "Fourier")
R> pval2.dimnames[[2]] <- c("H0: Growth Rate <= 0", "H0: Growth Rate >= 0",
+ "H0: Growth Rate = 0")
R> pvals2 <- matrix(NA, nrow = 3, ncol = 3, dimnames = pval2.dimnames)
R> pvals2[1, ] <- round(piv.bsp$pvalues, digits = 4)
R> pvals2[2, ] <- round(piv.poly$pvalues, digits = 4)
R> pvals2[3, ] <- round(piv.four$pvalues, digits = 4)
R> pvals2
H0: Growth Rate <= 0 H0: Growth Rate >= 0 H0: Growth Rate = 0
B-spline 0 1 0.0239
Polynomial 0 1 0.0334
Fourier 0 1 0.0386
For all bases, the tests’ conclusions are identical: at the 5% level, we reject the null hypothesis that the average growth rate is negative, fail to reject the null hypothesis that the average growth rate is positive, and reject the null hypothesis that the average growth rate is equal to zero.
Now, we illustrate two additional options available to the user. First,
to perform inference pointwise over a region of covariate values and/or
quantile indices instead of uniformly, and second, to estimate the
standard errors conditional on the values of the covariate
We will use only the pivotal method with a npqr
for each combination of options
mentioned above:
R> piv.bsp <- npqr(formula = form.par, basis = basis.bsp, var = "cage", taus = taus,
+ B = B, nderivs = 1, average = 1, alpha = alpha, process = "pivotal",
+ uniform = TRUE, se = "unconditional", printOutput = FALSE)
R> piv.bsp.cond <- update(piv.bsp, se = "conditional")
R> piv.bsp.point <- update(piv.bsp, uniform = FALSE, se = "unconditional")
R> piv.bsp.point.cond <- update(piv.bsp, uniform = FALSE, se = "conditional")
We obtain Figure 4 using the graphing techniques described in Sections 2.2 and 2.3. As is visible in this figure, usage of conditional standard errors changes the confidence bands only minimally in our example. As expected, the pointwise confidence bands are narrower than the uniform confidence bands.
We can also compare how much of the differences (or lack thereof) in the confidence bands are driven by differences in the standard errors versus the test statistics. Here, we compare the estimated standard errors at the median for conditional versus unconditional inference:
R> piv.bsp.med <- npqr(formula = form.par, basis = basis.bsp, var = "cage", taus = 0.5,
+ B = B, nderivs = 1, average = 1, alpha = alpha, process = "pivotal", uniform = TRUE,
+ se = "unconditional", printOutput = FALSE)
R> piv.bsp.cond.med <- update(piv.bsp.med, se = "conditional")
R> stderr.dimnames <- vector("list", 2)
R> stderr.dimnames[[1]] <- c("Unconditional", "Conditional")
R> stderr.dimnames[[2]] <- c("Standard Error")
R> stderr <- matrix(NA, nrow = 2, ncol = 1, dimnames = stderr.dimnames)
R> stderr[1, ] <- piv.bsp.med$std.error[1]
R> stderr[2, ] <- piv.bsp.cond.med$std.error[1]
R> stderr
Standard Error
Unconditional 0.008104
Conditional 0.007663
Finally, we compare
R> pval3.dimnames <- vector("list", 2)
R> pval3.dimnames[[1]] <- c("Uniform, Unconditional", "Uniform, Conditional",
+ "Pointwise, Unconditional", "Pointwise, Conditional")
R> pval3.dimnames[[2]] <- c("H0: Growth Rate <= 0", "H0: Growth Rate >= 0",
+ "H0: Growth Rate = 0")
R> pvals3 <- matrix(NA, nrow = 4, ncol = 3, dimnames = pval3.dimnames)
R> pvals3[1, ] <- round(piv.bsp$pvalues, digits = 4)
R> pvals3[2, ] <- round(piv.bsp.cond$pvalues, digits = 4)
R> pvals3[3, ] <- round(piv.bsp.point$pvalues, digits = 4)
R> pvals3[4, ] <- round(piv.bsp.point.cond$pvalues, digits = 4)
R> pvals3
H0: Growth Rate <= 0 H0: Growth Rate >= 0 H0: Growth Rate = 0
Uniform, Unconditional 0 1 0.0239
Uniform, Conditional 0 1 0.0243
Pointwise, Unconditional 0 1 0.0267
Pointwise, Conditional 0 1 0.0222
In this example, where the sample size is large, about 38,000
observations, conditional versus unconditional standard errors and
uniform versus pointwise inference have little impact on the estimated
Finally, we illustrate how to estimate and make uniform inference on linear functionals of the conditional quantile function over a region of covariate values and quantile indices. These functionals include the function itself and derivatives with respect to the covariate of interest. The quantreg.nonpar package is able to perform estimation and inference on the conditional quantile function, its first derivative, and its second derivative over a region of covariate values and/or quantile indices. We also illustrate how to report the estimates using three dimensional plots.
First, we consider the first and second derivatives of the conditional
quantile function. In the application they correspond to the growth rate
and growth acceleration of height with respect to age as a function of
age (from 0 to 59 months) and the quantile index. To do so, we use the
output of npqr
called var.unique
, which contains a vector with all
the distinct values of the covariate of interest (cage
here). To
generate this output, we estimate the first and second derivatives of
the conditional quantile function using a var.unique
and the quantile
indices in taus
:
R> piv.bsp.firstderiv <- npqr(formula = form.par, basis = basis.bsp, var = "cage",
+ taus = taus, nderivs = 1, average = 0, print.taus = print.taus, B = B,
+ process = "none", printOutput = FALSE)
R> piv.bsp.secondderiv <- update(piv.bsp.firstderiv, nderivs = 2)
Next, we generate vectors containing the region of covariate values and quantile indices of interest:
R> xsurf1 <- as.vector(piv.bsp.firstderiv$taus)
R> ysurf1 <- as.vector(piv.bsp.firstderiv$var.unique)
R> zsurf1 <- t(piv.bsp.firstderiv$point.est)
R> xsurf2 <- as.vector(piv.bsp.secondderiv$taus)
R> ysurf2 <- as.vector(piv.bsp.secondderiv$var.unique)
R> zsurf2 <- t(piv.bsp.secondderiv$point.est)
Finally, we create the three dimensional plots for:
R> par(mfrow = c(1, 2))
R> persp(xsurf1, ysurf1, zsurf1, xlab = "Quantile Index", ylab = "Age (months)",
+ zlab = "Growth Rate", ticktype = "detailed", phi = 30, theta = 120, d = 5,
+ col = "green", shade = 0.75, main = "Growth Rate (B-splines)")
R> persp(xsurf2, ysurf2, zsurf2, xlab = "Quantile Index", ylab = "Age (months)",
+ zlab = "Growth Acceleration", ticktype = "detailed", phi = 30, theta = 120,
+ d = 5, col = "green", shade = 0.75, main = "Growth Acceleration (B-splines)")
These commands produce Figure 5. Here, we see
that the growth rate is positive at all ages and quantile indices. The
growth rate decreases in the first few months of life and stabilizes
afterwards, which can also be seen in the graph of growth acceleration.
Growth acceleration is negative at young ages but stabilizes around zero
at about 15 months. Both growth rate and growth acceleration are
relatively homogeneous across quantiles at all ages. Saved in
piv.bsp.firstderiv$pvalues
and piv.bsp.secondderiv$pvalues
are the
Order of Derivative H0: Growth Rate <= 0 H0: Growth Rate >= 0 H0: Growth Rate = 0
First Deriviative 0 1 0.042
Second Derivative 0 1 0.061
Thus, we reject at the 5% level the null hypotheses that growth rate is negative, that growth rate is equal to zero, and that growth acceleration is positive over all the first five years of the children’s lifes at all the quantiles of interest. We come close to rejecting at the 5% level the null hypothesis that growth acceleration is equal to zero over all the first five years of the children’s lifes at all the quantiles of interest.
Similarly, we estimate the conditional quantile function over a region of covariate values and quantile indices, which corresponds to a growth chart in our application. Here, we use a fully saturated indicator basis for the series approximation to the nonparametric part of the model. We also compare the original estimates of the resulting growth chart to rearranged estimates that impose that the conditional quantile function of height is monotone in age and the quantile index. In this example, the conditional quantile function estimated using all data is nearly monotone without rearrangement. To illustrate the power of rearrangement when estimates are not monotone, we use a subset of the data containing the first 1,000 observations:
R> data.subset <- data[1:1000, ]
R> detach(data)
R> attach(data.subset)
Now, we create the fully saturated indicator basis for cage
:
R> faccage <- factor(cage)
To perform estimation using this basis, we input faccage
for basis
:
R> piv.fac.fun <- npqr(formula = form.par, basis = faccage, var = "cage", taus = taus,
+ print.taus = print.taus, B = B, nderivs = 0, average = 0, alpha = alpha,
+ process = "none", rearrange = FALSE, rearrange.vars = "both", se = "conditional",
+ printOutput = FALSE, method = "fn")
We also obtain the rearranged estimates with respect to age and the
quantile index using the options of the command npqr
. Note that we
input "both"
for rearrange.vars
. This option performs rearrangement
over quantile indices and age. Other allowable options are "quantile"
(for monotonization over quantile indices only) and "var"
(for
monotonization over the variable of interest only).
R> piv.fac.fun.re <- update(piv.fac.fun, rearrange.vars = "both")
Now, we construct three dimensional plots for the estimates of the
conditional quantile function:
mbmi
, breastfeeding
, mage
, medu
, and edupartner
) and the
sample mode for unordered factor variables (faccsex
, facctwin
,
faccbirthorder
, facmunemployed
, facmreligion
, facmresidence
,
facwealth
, facelectricity
, facradio
,
factelevision
, facrefrigerator
, facbicycle
, facmotorcycle
, and
faccar
).
R> xsurf <- as.vector(piv.fac.fun$taus)
R> ysurf <- as.vector(piv.fac.fun$var.unique)
R> zsurf.fac <- t(piv.fac.fun$point.est)
R> zsurf.fac.re <- t(piv.fac.fun.re$point.est)
R> par(mfrow = c(1, 2))
R> persp(xsurf, ysurf, zsurf.fac, xlab = "Quantile Index", ylab = "Age (months)",
+ zlab = "Height", ticktype = "detailed", phi = 30, theta = 40, d = 5,
+ col = "green", shade = 0.75, main = "Growth Chart (Indicators)")
R> persp(xsurf, ysurf, zsurf.fac.re, xlab = "Quantile Index", ylab = "Age (months)",
+ zlab = "Height", ticktype = "detailed", phi = 30, theta = 40, d = 5,
+ col = "green", shade = 0.75, main = "Growth Chart (Indicators, Rearranged)")
Figure 6 shows that the rearrangement fixes the non-monotonic areas of the original estimates.
In this paper we introduced the R package quantreg.nonpar, which implements the methods of (Belloni et al. 2011) to estimate and make inference on partially linear quantile models. The package allows the user to obtain point estimates of the conditional quantile function and its derivatives based on a nonparametric series QR approximation. Using pivotal, gradient bootstrap, Gaussian, and a weighted bootstrap methods, the user is also able to obtain pointwise and uniform confidence intervals. We apply the package to a dataset containing information on child malnutrition in India, illustrating the ability of quantreg.nonpar to generate point estimates and confidence intervals, as well as output that allows for easy visualization of the computed values. We also illustrate the ability of the package to monotonize estimates by the variable of interest and by quantile index.
We wish to thank Jim Ramsay for assistance with the fda package (Ramsay et al. 2014), Roger Koenker for sharing the data used in (Koenker 2011), and an anonymous referee for insightful comments. We gratefully acknowledge research support from the NSF.
quantreg.nonpar, quantreg, QuantifQuantile, quantregGrowth, fda
Econometrics, Environmetrics, FunctionalData, Optimization, ReproducibleResearch, Robust, Survival
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Lipsitz, et al., "quantreg.nonpar: An R Package for Performing Nonparametric Series Quantile Regression", The R Journal, 2016
BibTeX citation
@article{RJ-2016-052, author = {Lipsitz, Michael and Belloni, Alexandre and Chernozhukov, Victor and Fernández-Val, Iván}, title = {quantreg.nonpar: An R Package for Performing Nonparametric Series Quantile Regression}, journal = {The R Journal}, year = {2016}, note = {https://rjournal.github.io/}, volume = {8}, issue = {2}, issn = {2073-4859}, pages = {370-381} }