QuantifQuantile: an R package for performing quantile regression through optimal quantization

Quantile regression makes it possible to assess the impact of a covariate X on a response Y. An important application is the construction of reference curves and conditional prediction intervals for Y. Recently, Charlier et al. (2015a) developed a new nonparametric quantile regression method based on the concept of optimal quantization. As shown in Charlier et al. (2015b), this method competes very well with its classical nearest-neighbor and kernel competitors. In this paper, we describe an R package, called QuantifQuantile, that performs quantization-based quantile regression. We describe the various functions of the package and provide examples.


Introduction
In numerous applications, quantile regression is used to evaluate the impact of a d-dimensional covariate X on a (scalar) response variable Y. Quantile regression is an interesting alternative to standard regression whenever the conditional mean does not provide a satisfactory picture of the conditional distribution. Denoting by F(·|x) the conditional distribution of Y given X = x, the conditional quantile functions

q_α(x) = inf{y ∈ ℝ : F(y|x) ≥ α}, α ∈ (0, 1),   (1)

indeed always yield a complete description of the conditional distribution. For our purposes, it is useful to recall that the conditional quantiles in (1) can be equivalently defined as

q_α(x) = arg min_{a ∈ ℝ} E[ρ_α(Y − a) | X = x],   (2)

where ρ_α(z) = αz 𝟙[z ≥ 0] − (1 − α)z 𝟙[z < 0] is the so-called check function.
For fixed α, the quantile functions x ↦ q_α(x) provide reference curves (when d = 1), one for each value of α. For fixed x, they provide conditional prediction intervals of the form I_α = [q_α(x), q_{1−α}(x)] (α < 1/2). Such reference curves and prediction intervals are widely used, e.g. in economics, ecology, or lifetime analysis. In medicine, they are used to provide reference growth curves for children's height and weight given their age.
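The equivalence between the α-quantile and the minimizer of the expected check loss in (2) is easy to verify numerically. The following base-R sketch (sample size and search grid are arbitrary choices) compares the minimizer of the empirical check loss with the usual empirical quantile:

```r
# Check function rho_alpha(z) = alpha*z*1[z >= 0] - (1 - alpha)*z*1[z < 0]
rho <- function(z, alpha) (alpha - (z < 0)) * z

set.seed(1)
y <- rnorm(1000)
alpha <- 0.75

# Minimize the mean check loss over a fine grid of candidate values a
grid <- seq(-3, 3, by = 0.01)
loss <- sapply(grid, function(a) mean(rho(y - a, alpha)))
q_check <- grid[which.min(loss)]

# The minimizer is (up to grid resolution) the empirical alpha-quantile
q_emp <- quantile(y, alpha, names = FALSE)
```

The two values agree up to the grid resolution, which illustrates why quantile regression can be cast as a minimization problem.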
Many approaches have been developed to estimate conditional quantiles. After the seminal paper of Koenker and Bassett (1978) that introduced linear quantile regression, much effort has been made to consider nonparametric quantile regression. The most classical procedures in this vein are the nearest-neighbor estimators (Bhattacharya and Gangopadhyay, 1990), the (kernel) local linear estimators (Yu and Jones, 1998) and the spline-based estimators (Koenker et al., 1994; Koenker and Mizera, 2004). For related work, we also refer to, e.g., Fan et al. (1994), Gannoun et al. (2002), Muggeo et al. (2013) and Yu et al. (2003). There also exists a wide variety of R functions/packages dedicated to the estimation of conditional quantiles. Among them, let us cite the functions rqss (only for d ≤ 2) and gcrq (only for d = 1) from the packages quantreg (Koenker, 2013) and quantregGrowth (Muggeo, 2014), respectively.
Recently, Charlier et al. (2015a) proposed a nonparametric quantile regression method based on the concept of optimal quantization. Optimal quantization replaces the (typically continuous) covariate X with a discretized version X̃^N obtained by projecting X on a collection of N points (these N points, which form the quantization grid, are chosen to minimize the L_p-norm of X − X̃^N; see below). As shown in Charlier et al. (2015b), the resulting conditional quantile estimators compete very well with their classical competitors.
The goal of this paper is to describe an R package, called QuantifQuantile (Charlier et al., 2015c), that performs quantization-based quantile regression. This includes the data-driven selection of the grid size N (which plays the role of a tuning parameter), the construction of the corresponding quantization grid, the computation of the resulting sample conditional quantiles and, for d ≤ 2, their graphical representation.
The paper is organized as follows. The first section briefly recalls the construction of quantization-based quantile regression introduced in Charlier et al. (2015a,b) and explains the various steps needed to obtain the resulting estimators. The second section lists the main functions of QuantifQuantile and describes their arguments and outputs, while the last section illustrates their use on simulated and real data.
The R Journal Vol. XX/YY, AAAA ISSN 2073-4859

Approximating population conditional quantiles through quantization
Let γ_N ∈ (ℝ^d)^N be a grid of size N, that is, a collection of N points in ℝ^d. For any x ∈ ℝ^d, we will denote by x̃^{γ_N} = Proj_{γ_N}(x) the projection of x onto this grid, that is, the point of γ_N that is closest to x (an absolute continuity assumption on X makes ties unimportant in the sequel). This makes it possible to approximate the d-dimensional covariate X by its quantized version X̃^{γ_N}.
Obviously, the choice of the grid has a significant impact on the quality of this approximation. Under the assumption that ‖X‖_p := (E[|X|^p])^{1/p} < ∞ (throughout, | · | denotes the Euclidean norm), optimal quantization selects the grid γ_N that minimizes the L_p-quantization error ‖X − X̃^{γ_N}‖_p. Such an optimal grid exists under the assumption that the distribution P_X of X does not charge any hyperplane, i.e. that P_X[H] = 0 for any hyperplane H (Pagès, 1998). In practice, an optimal grid is constructed using a stochastic gradient algorithm (see the following section). For more details on quantization, the reader may refer to Pagès (1998) and Graf and Luschgy (2000).
Based on optimal quantization of X, we can approximate the conditional quantile q_α(x) in (2) by q_α^N(x), defined as the α-quantile of Y conditional on X̃^N = x̃^N,   (3)
where X̃^N (resp., x̃^N) denotes the projection of X (resp., x) onto an optimal grid. It is shown in Charlier et al. (2015a) that, under mild assumptions, q_α^N(x) converges to q_α(x) as N → ∞, uniformly in x.

Obtaining an optimal N-grid
As we will see below, whenever independent copies (X_1, Y_1), ..., (X_n, Y_n) of (X, Y) are available, the first step to obtain a sample version of (3) is to compute an optimal N-grid (we assume here that N is fixed). As already mentioned, this can be done through a stochastic gradient algorithm. This algorithm, called Competitive Learning Vector Quantization (CLVQ) when p = 2, is an iterative procedure that operates as follows:
• First, an initial grid, γ̂_{N,0} say, is chosen by sampling randomly without replacement among the X_i's.
• Second, n iterations are performed (one for each observation available). The grid γ̂_{N,t} = (γ̂^1_{N,t}, ..., γ̂^N_{N,t}) at step t is obtained by moving the point of γ̂_{N,t−1} that is closest to X_t towards X_t; for p = 2, writing i_t for the index of that closest point,

γ̂^{i_t}_{N,t} = γ̂^{i_t}_{N,t−1} − δ_t (γ̂^{i_t}_{N,t−1} − X_t), and γ̂^i_{N,t} = γ̂^i_{N,t−1} for i ≠ i_t,

where (δ_t), t ∈ ℕ_0, is a deterministic sequence in (0, 1) such that ∑_t δ_t = ∞ and ∑_t δ_t² < ∞. At the t-th iteration, only one point of the grid moves, namely the one that is closest to X_t.
The optimal grid provided by this algorithm is then γ̂_{N,n}.
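A minimal one-dimensional sketch of this algorithm for p = 2 may help fix ideas. The step sizes δ_t = a/(b + t) and the constants below are illustrative choices satisfying the two conditions on (δ_t); this is a toy version, not the package's implementation:

```r
# Minimal CLVQ sketch (p = 2, d = 1): one grid point moves per iteration,
# with step sizes delta_t = a / (b + t), so that sum(delta_t) diverges
# while sum(delta_t^2) converges.
clvq <- function(X, N, a = 1, b = 10) {
  grid <- sample(X, N)                # initial grid: sampled without replacement
  for (t in seq_along(X)) {
    i <- which.min(abs(grid - X[t]))  # index of the point closest to stimulus X_t
    delta_t <- a / (b + t)
    grid[i] <- grid[i] - delta_t * (grid[i] - X[t])  # move it towards X_t
  }
  sort(grid)
}

set.seed(2)
X <- runif(2000, -2, 2)
gamma_hat <- clvq(X, N = 15)  # approximately optimal 15-point grid
```

For a uniform sample on (−2, 2), the resulting grid points spread roughly evenly over the support, as optimal quantization theory predicts.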

Estimating conditional quantiles
Assume now that a sample (X_1, Y_1), ..., (X_n, Y_n) as above is indeed available. The sample analog of (3) is then defined as follows:
(S1) First, we compute the optimal grid γ̂_{N,n} through the stochastic gradient algorithm just described, and we write X̂_i^N = Proj_{γ̂_{N,n}}(X_i), i = 1, ..., n.
(S2) Then, the conditional quantiles are estimated by q̂_α^{N,n}(x), where x̂^N = Proj_{γ̂_{N,n}}(x). In practice, q̂_α^{N,n}(x) is simply evaluated as the sample α-quantile of the Y_i's for which X̂_i^N = x̂^N.   (4)
It is shown in Charlier et al. (2015a) that, under mild assumptions, q̂_α^{N,n}(x), for any fixed N and x, converges in probability to q_α^N(x) as n → ∞, provided that quantization is based on p = 2.
When the sample size n is small to moderate (n ≤ 300, say), the estimated reference curves x ↦ q̂_α^{N,n}(x) typically are not smooth. To improve on this, Charlier et al. (2015a,b) introduced the following bootstrapped version of the estimator in (4). For some positive integer B, generate B samples of size n from the original sample {(X_i, Y_i)}_{i=1,...,n} with replacement. From each of these bootstrap samples, the stochastic gradient algorithm provides an "optimal" grid, using the bootstrapped sample to perform the iterations. The bootstrapped estimator of the conditional quantile is then

q̄_{α,B}^{N,n}(x) = (1/B) ∑_{b=1}^{B} q̂_α^{(b)}(x),   (5)

where q̂_α^{(b)}(x) = q̂_α^{(b),N,n}(x) denotes the estimator in (4) computed on the basis of the b-th optimal grid.
We stress that, when computing q̂_α^{(b)}(x), the original sample is used in (S2); the bootstrapped samples are only used to provide the B different grids. As shown in Charlier et al. (2015a,b), the bootstrapped reference curves are much smoother than the original ones. Of course, B should be chosen large enough to make the bootstrap useful, but also small enough to keep the computational burden under control. For d = 1, we usually choose B = 50.
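The steps (S1), (S2) and the averaging in (5) can be sketched as follows. For brevity, each "optimal" grid below is replaced by a random subsample of a bootstrap sample rather than by the output of the stochastic gradient algorithm, so this is a toy stand-in, not the package's implementation:

```r
# Toy sketch of the bootstrapped quantization-based estimator in (5).
# Each "optimal" grid is approximated by a random subsample of a bootstrap
# sample (a crude stand-in for the stochastic gradient algorithm).
cond_quantile <- function(X, Y, x, alpha, N = 20, B = 50) {
  qb <- sapply(seq_len(B), function(b) {
    Xb   <- sample(X, replace = TRUE)   # bootstrap sample: used for the grid only
    grid <- sample(Xb, N)               # stand-in for the b-th optimal N-grid
    proj <- function(u) grid[which.min(abs(grid - u))]
    cell_x <- proj(x)
    # (S2): sample alpha-quantile of the Y_i whose X_i falls in the cell of x,
    # computed on the ORIGINAL sample
    in_cell <- vapply(X, proj, numeric(1)) == cell_x
    quantile(Y[in_cell], alpha, names = FALSE)
  })
  mean(qb)                              # average over the B grids, as in (5)
}

set.seed(3)
X <- runif(300, -2, 2)
Y <- X^2 + rnorm(300)
q50 <- cond_quantile(X, Y, x = 0, alpha = 0.5)  # estimate of q_0.5(0)
```

Note how the bootstrap samples only determine the grids, while the quantile in (S2) is always computed from the original observations, exactly as stressed above.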

Selecting the grid size N
Both for the original estimators q̂_α^{N,n}(x) and for their bootstrapped versions q̄_{α,B}^{N,n}(x), an appropriate value of N should be identified. If N is too small, then the reference curves will have a large bias, while if N is too large, then they will show much variability. This leads to the usual bias/variance trade-off to be achieved when selecting the value of a smoothing parameter in nonparametric statistics.
Charlier et al. (2015b) proposed the following data-driven method to choose N. Let {x_1, ..., x_J} be a set of x-values at which we want to estimate q_α(x) (the x_j's are for instance chosen equispaced on the support of X) and let 𝒩 be a finite collection of N-values (this represents the values of N one allows for and should typically be chosen according to the sample size n). Ideally, we would like to select the optimal value of N as the minimizer over 𝒩 of

ISE_α(N) = (1/J) ∑_{j=1}^{J} ( q̄_{α,B}^{N,n}(x_j) − q_α(x_j) )².   (6)

This, however, is infeasible, since the population quantiles q_α(x_j) are unknown. This is why we draw B̃ extra bootstrap samples (still of size n) from the original sample and consider

N̂_{α;opt} = arg min_{N ∈ 𝒩} ÎSE_α(N), with ÎSE_α(N) = (1/J) ∑_{j=1}^{J} (1/B̃) ∑_{b̃=1}^{B̃} ( q̂_α^{(b̃)}(x_j) − q̄_{α,B}^{N,n}(x_j) )²,   (7)

where q̂_α^{(b̃)}(x_j), for b̃ = 1, ..., B̃, makes use of the b̃-th new bootstrap sample; more precisely, each such bootstrap sample is still only used to perform the iterations of the algorithm, whereas the original sample is used both in the initial grid and in (S2).
As shown in Charlier et al. (2015b) through simulations, both N ↦ ISE_α(N) and N ↦ ÎSE_α(N) are essentially convex in N and lead to roughly the same minimizers. This therefore provides a feasible way to select a reasonable value of N for the estimator q̄_{α,B}^{N,n}(x) in (5). Note that this also applies to q̂_α^{N,n}(x) by simply taking B = 1 in the procedure above.
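In the same toy setting as before, the selection rule (7) can be sketched as follows, with B = 1 (as allowed by the remark above) and with a fixed quantile-based grid standing in for the optimal grids; all numerical choices are illustrative:

```r
# Toy sketch of the data-driven choice of N in (7), with B = 1.
# hat_q bins X on an N-point grid and returns cell-wise sample quantiles;
# the grid of empirical quantiles is a stand-in for an optimal grid.
hat_q <- function(X, Y, xs, alpha, grid) {
  proj <- function(u) which.min(abs(grid - u))
  cells <- vapply(X, proj, integer(1))
  vapply(xs, function(x) {
    ys <- Y[cells == proj(x)]
    if (length(ys) == 0) NA_real_ else quantile(ys, alpha, names = FALSE)
  }, numeric(1))
}

select_N <- function(X, Y, xs, alpha, testN, tildeB = 20) {
  hatISE <- sapply(testN, function(N) {
    grid <- quantile(X, ppoints(N), names = FALSE)  # stand-in N-point grid
    q0 <- hat_q(X, Y, xs, alpha, grid)              # estimator, original sample
    mean(sapply(seq_len(tildeB), function(b) {
      idx <- sample(length(X), replace = TRUE)      # extra bootstrap sample
      qb <- hat_q(X[idx], Y[idx], xs, alpha, grid)
      mean((qb - q0)^2, na.rm = TRUE)               # squared deviations, as in (7)
    }))
  })
  testN[which.min(hatISE)]                          # feasible minimizer of (7)
}

set.seed(4)
X <- runif(300, -2, 2)
Y <- X^2 + rnorm(300)
xs <- seq(-1.5, 1.5, length.out = 20)
N_opt <- select_N(X, Y, xs, alpha = 0.5, testN = c(5, 10, 20, 40))
```

Plotting the vector of ÎSE values against testN (as the package's plot method does with ise = TRUE) would reveal the approximately convex shape discussed above.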
If quantiles are to be estimated at various orders α, (7) will provide an optimal N-value for each α.It may then happen, in principle, that the resulting reference curves cross, which is of course undesirable.
One way to guarantee that no such crossings occur is to identify a common N-value for the various α's.
In such a case, N will be chosen as

N̂_opt = arg min_{N ∈ 𝒩} ∑_α ÎSE_α(N),   (8)

where the sum is computed over the various α-values considered.
Charlier et al. (2015b) performed extensive comparisons between the quantization-based estimators in (5), based on the efficient data-driven selection method for N just described, and some of their main competitors, namely estimators obtained from spline, k-nearest-neighbor, and kernel methods. This revealed that the quantization-based estimators compete well in all cases, and often dominate their competitors (in terms of integrated squared errors), particularly so for complex link functions; see Charlier et al. (2015b) for details.
Unlike the local linear and local constant estimators from Yu and Jones (1998), which are usually based on a global-in-x bandwidth, our quantization-based estimators are locally adaptive in the sense that, when estimating q_α(x), the "working bandwidth", that is, the size of the quantization cell containing x, depends on x. The k-nearest-neighbor (kNN) estimator is closer in spirit to quantization-based estimators but always selects k neighbors, irrespective of the x-value considered, whereas the number of X_i's in the quantization cell of x may depend on x. This subtle local-in-x behavior may explain the good empirical performances of quantization-based estimators over kernel and nearest-neighbor competitors. Finally, spline methods (implemented in R with the rqss and qss functions) tend to perform poorly for complex link functions, since they always provide piecewise linear reference curves (Koenker et al., 1994). Moreover, the current implementation of the rqss function only supports dimensions 1 and 2, whereas our package allows the computation of quantization-based estimators in any dimension d.

The QuantifQuantile package
This section provides a description of the various functions offered in the R package QuantifQuantile. We first detail the three functions that estimate conditional quantiles through quantization. Then we describe a function computing optimal quantization grids.

Conditional quantile estimation
QuantifQuantile is composed of three main functions that each provide the estimated conditional quantiles in (4)-(5). These functions work in a similar way but address different values of d (recall that d is the dimension of the covariate vector X):
• The function QuantifQuantile is suitable for d = 1.
• The function QuantifQuantile.d2 addresses the case d = 2.
• Finally, QuantifQuantile.d can deal with an arbitrary value of d.
Combined with the plot function, the first two functions provide reference curves and reference surfaces, respectively. No graphical output can be obtained from the third function if d > 2.
The three functions share the same arguments, but not necessarily the same default values. For each function, the various arguments and their default values can be displayed with args(). We now give more details on these arguments.
• X: a d × n real array (required by all three functions; a vector of length n for QuantifQuantile). The columns of this matrix are the X_i's, i = 1, ..., n.
• Y: an n × 1 real array (required by all three functions). The entries of this vector are the responses Y_i, i = 1, ..., n.
• alpha: an r × 1 array with components in (0, 1) (optional for all three functions). This vector contains the orders α for which q_α(x) should be estimated.
• x: a d × J real array (optional for QuantifQuantile and QuantifQuantile.d2, required by QuantifQuantile.d). The columns of this matrix are the x_j's at which the quantiles q_α(x_j) are to be estimated. If x is not provided when calling QuantifQuantile, then it is set to a vector of J = 100 equispaced values between the minimum and the maximum of the X_i's. If this argument is not provided when calling QuantifQuantile.d2, then the default for x is a matrix whose J = 20² = 400 column vectors are obtained as follows: 20 equispaced values are considered between the minimum and maximum values of the (X_i)_1's, and similarly for the second component. The 400 column vectors of the default x are obtained by considering all combinations of those 20 values for the first component with the 20 values for the second one.
• testN: an m × 1 vector of pairwise distinct positive integers (optional for all three functions). The entries of this vector are the elements of the set 𝒩 in (7)-(8), hence the N-values for which the ÎSE_α quantity considered will be evaluated. The default is (35, 40, ..., 55), but it is strongly recommended to adapt it to the sample size n at hand.
• p: a real number larger than or equal to one (optional for all three functions).This is the parameter p to be used when performing optimal quantization in L p -norm.
• B: a positive integer (optional for all three functions).This is the number of bootstrap replications B to be used in (5).
• tildeB: a positive integer (optional for all three functions). This is the number B̃ of bootstrap replications to be used when determining N̂_{α;opt} or N̂_opt.
• same_N: a boolean variable (optional for all three functions). If same_N=TRUE, then a common value of N (that is, N̂_opt in (8)) is selected for all α's. If same_N=FALSE, then optimal values of N are chosen independently for the various values of α (which provides several N̂_{α;opt}, as in (7)).
• ncores: the number of cores to use. These functions can exploit parallel computation to save time by increasing this parameter. Parallel computation relies on mclapply from the parallel package, and is hence not available on Windows.
All three functions return the following list of objects, which is of class "QuantifQuantile":
• hatq_opt: an r × J real array (where r is the number of α-values considered). If same_N=TRUE, then the entry (i, j) of this matrix is q̄_{α_i,B}^{N̂_opt,n}(x_j). If same_N=FALSE, then it is rather q̄_{α_i,B}^{N̂_{α_i;opt},n}(x_j). This object can also be retrieved with the usual fitted.values function.
• N_opt: depending on same_N, this provides the value of N̂_opt or the vector (N̂_{α_1;opt}, ..., N̂_{α_r;opt}).
• hatISE_N: an r × m real array. The entry (i, j) of this matrix is ÎSE_{α_i}(N_j). Plotting this for fixed α, or plotting its average over the various α's, in both cases against the values in testN, makes it possible to assess the global convexity of these ISEs. Hence, it can be used to check whether the choice of testN was adequate. This will be illustrated in the examples below.
• hatq_N: an r × J × m real array. The entry (i, j, ℓ) of this array is q̄_{α_i,B}^{N_ℓ,n}(x_j), where N_ℓ is the ℓ-th entry of the argument testN. From this output, it is easy, by fixing the third index, to get the matrix of the q̄_{α_i,B}^{N,n}(x_j) values for any N in testN.
• The arguments X, Y, x, alpha, and testN are also reported in this response list.
Moreover, when the selected optimal value N_opt is on the boundary of testN, which means that testN was most likely not well chosen, a warning message is printed.
The "QuantifQuantile" class response can be used as an argument of the functions plot (only for d ≤ 2), summary and print. The plot function draws the observations and plots the estimated conditional quantile curves (d = 1) or surfaces (d = 2); for d = 2, the rgl package is used (Adler et al., 2014), which allows the user to change the perspective in a dynamic way. In order to illustrate the selection of N, the function plot also has an optional argument ise. Setting this argument to TRUE (the default is FALSE), this function, which can be used for any dimension d, provides the plot (against N) of the ÎSE_α quantities in (7) or of their sum in (8), depending on the choice same_N=FALSE or same_N=TRUE, respectively; see the examples below for details. If d ≤ 2, it also returns the fitted quantile curves or surfaces.
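The typical workflow combining these arguments and outputs can be sketched as follows; this assumes the QuantifQuantile package is installed, and the data-generating code and argument values are purely illustrative:

```r
# Illustrative workflow (assumes the QuantifQuantile package is installed).
library(QuantifQuantile)

set.seed(1)
X <- runif(300, -2, 2)          # covariate (d = 1)
Y <- X^2 + rnorm(300)           # response

res <- QuantifQuantile(X, Y,
                       alpha = c(0.05, 0.25, 0.5, 0.75, 0.95),
                       testN = seq(10, 30, by = 5),
                       B = 50, same_N = TRUE)

summary(res)                    # "QuantifQuantile" class methods
head(fitted.values(res))        # the hatq_opt matrix
plot(res, ise = TRUE)           # ISE curve against N, then the fitted curves
```

If the call prints a warning about N_opt lying on the boundary of testN, the testN grid should be enlarged and the call repeated.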

Computing optimal grids
Besides the functions that estimate conditional quantiles and plot the corresponding reference curves/surfaces, QuantifQuantile provides a function that computes optimal quantization grids. This function, called choice.grid, admits the following arguments:
• X: a d × n real array (required). The columns of this matrix are the X_i's, i = 1, ..., n, for which the optimal quantization grid should be determined. Each point of X is used as a stimulus in the stochastic gradient algorithm to obtain an optimal grid.
• N: a positive integer (required).The size of the desired quantization grid.
• ng: a positive integer (optional).The number of desired quantization grids.The default is 1.
• p: a real number larger than or equal to one (optional).This is the parameter p used in the quantization error.The default is 2.
In some cases, it may be necessary to have several quantization grids, as in (5), where B + B̃ grids are needed. The three functions computing quantization-based conditional quantiles then call the function choice.grid with ng > 1. In that case, the various grids are obtained using as stimuli a resampled version of X (the X_t's in the previous section).
The output is a list containing the following elements : • init_grid: a d×N×ng real array.The entry (i, j, ) of this matrix is the i th component of the j th point of the th initial grid.
• opti_grid: a d×N×ng real array.The entry (i, j, ) of this matrix is the i th component of the j th point of the th optimal grid provided by the algorithm.
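For instance, an optimal grid for a two-dimensional uniform sample could be computed as follows (the sample and the argument values are illustrative, and the call assumes the package is installed):

```r
# Computing optimal quantization grids with choice.grid (illustrative values).
library(QuantifQuantile)

set.seed(1)
X <- matrix(runif(2 * 1000, -2, 2), nrow = 2)  # d = 2, n = 1000 stimuli
out <- choice.grid(X, N = 15, ng = 1, p = 2)

dim(out$init_grid)  # 2 x 15 x 1: the initial grid
dim(out$opti_grid)  # 2 x 15 x 1: the optimal grid after the n iterations
```

Comparing init_grid and opti_grid shows how the stochastic gradient iterations spread the grid points over the support of X.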

Illustrations
In this section, we illustrate the use of the functions described above on several examples. Examples 1-3 restrict to QuantifQuantile/QuantifQuantile.d2 and provide graphical representations in each case.
Example 4 deals with a three-dimensional covariate, without graphical representation.An illustration of the function choice.grid is given in the Appendix.

Example 1: simulated data with one-dimensional covariate
We generate a random sample (X_i, Y_i), i = 1, ..., n = 300, where the X_i's are uniformly distributed over the interval (−2, 2) and where the Y_i's are obtained by adding to X_i² a standard normal error term that is independent of X_i. We test the number N of quantizers between 10 and 30 by steps of 5 and we do not change the default values of the other arguments. We then run the function QuantifQuantile with these arguments and store the result in res.

testN <- seq(10, 30, by = 5)
res <- QuantifQuantile(X, Y, testN = testN)

No warning message is printed, which means that this choice of testN was adequate. To assess this in a graphical way, we use the function plot with the ise argument set to TRUE, which plots hatISEmean_N (obtained by taking the mean of hatISE_N over alpha) against the various N-values in testN.
Figure 1a provides the resulting graph, which confirms that testN was well chosen, since hatISEmean_N is larger for values of N smaller and larger than N_opt. We then plot the corresponding estimated conditional quantile curves in Figure 1b. The default colors of the points and of the curves are changed by using the col.plot argument. This argument is a vector of size 1+length(alpha), whose first component fixes the color of the data points and whose remaining components determine the colors of the various reference curves.
col.plot <- c("grey", "red", "orange", "green", "orange", "red")
plot(res, col.plot = col.plot, xlab = "X", ylab = "Y")

It is natural to make the grid testN finer. Of course, the more N-values we test, the longer the computation takes. This is why we adopted a two-stage approach, where the goal of the first stage is to get a rough approximation of the optimal N-value. In the second stage, we then refine the grid only in the vicinity of the value N_opt obtained in the first stage.

Example 2: simulated data with two-dimensional covariate
The sample considered here is made of n = 1,000 independent realizations of a random vector (X, Y), where X = (X_1, X_2) is uniformly distributed on the square (−2, 2)² and where Y is obtained by adding to X_1² + X_2² an independent standard normal error term.
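The sample and the call can be reproduced along the following lines; the seed and the reduced values of B and tildeB are our own choices, as the text does not specify them:

```r
# Example 2 sketch: 2-d covariate (seed and B/tildeB values are assumptions).
library(QuantifQuantile)

set.seed(2)
n <- 1000
X <- matrix(runif(2 * n, -2, 2), nrow = 2)   # uniform on the square (-2, 2)^2
Y <- X[1, ]^2 + X[2, ]^2 + rnorm(n)

res2 <- QuantifQuantile.d2(X, Y,
                           testN = seq(40, 90, by = 10),
                           B = 20, tildeB = 15)  # reduced to save time
plot(res2, ise = TRUE)   # check graphically whether testN was well chosen
```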
We test N between 40 and 90 by steps of 10. We change the values of B and tildeB to reduce the computational burden, which is heavier for d = 2 than for d = 1. We keep the default values of all other arguments when running the function QuantifQuantile.d2. Here, a warning message is printed, informing us that testN was not well chosen. We confirm this with the function plot with the ise argument set to TRUE.

Example 3: real data study and comparison with some competitors
This example aims at illustrating the proposed estimated reference curves on a real data set and at comparing them with some competitors. In this example, the ncores parameter of the QuantifQuantile function was set to the number of cores detected by R minus 1. The data used here, which are included in the QuantifQuantile package, involve several variables related to employment, housing and environment for n = 542 towns/villages in Gironde, France. For the present illustration, we restrict to the regressions R_1 and R_2 involving (X, Y) = (percentage of owners living in their primary residence, percentage of buildings area) and (X, Y) = (percentage of middle-range employees, population density), respectively. In both cases, n = 542 observations are available and we are interested in the estimation of reference curves for α = 0.05, 0.25, 0.50, 0.75 and 0.95. For both R_1 and R_2, we tested the number N of quantizers between 5 and 15 by steps of 1, using the methodology described in Example 1.
Of course, the computational burden is also an important issue. Therefore, Table 1 gathers, for each method and each regression problem, the average and standard deviation of the computing times over a collection of 50 runs (these 50 runs were considered to make the results more reliable). In each case, our method is faster than its spline-based competitor.

Example 4: real data with three-dimensional covariate
To treat an example with d > 2, we reconsider the data set of Example 3, this time with the response Y = population density and the three covariates X_1 = percentage of farmers, X_2 = percentage of unemployed workers, and X_3 = percentage of managers. In this setup, no graphical output is available. We therefore restrict to a finite collection of x-values at which conditional quantiles are to be estimated. Denoting by M_j and X̄_j, j = 1, 2, 3, the maximal value and the average of the X_ij, i = 1, ..., n = 542, respectively, we consider the following eight values of x: The function QuantifQuantile.d is then evaluated for the response and covariates indicated above, with the arguments alpha = c(0.25, 0.5, 0.75), testN = c(5, 6, 7, 8, 9, 10), x being the 3 × 8 matrix whose columns are the vectors x_1, x_2, ..., x_8 just defined, and ncores being the number of cores detected by R minus 1.
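A sketch of the corresponding call is given below. The covariate construction uses hypothetical column names for the Gironde data set, and the eight x-values (two candidate values per coordinate, combined with expand.grid) are only one plausible reading of the text; both are assumptions, not the paper's exact choices:

```r
# Example 4 sketch (column names and the x-value construction are assumptions).
library(QuantifQuantile)

# X: 3 x 542 covariate matrix, Y: response; the 'gironde' names are hypothetical.
X <- rbind(gironde$farmers, gironde$unemployed, gironde$managers)
Y <- gironde$density

# Two candidate values per coordinate, combined into 8 = 2^3 column vectors.
vals <- lapply(1:3, function(j) c(mean(X[j, ]), max(X[j, ]) / 2))
x <- t(as.matrix(expand.grid(vals)))           # 3 x 8 matrix of x-values

res4 <- QuantifQuantile.d(X, Y, x = x,
                          alpha = c(0.25, 0.5, 0.75), testN = 5:10,
                          ncores = parallel::detectCores() - 1)
res4$hatq_opt                                  # 3 x 8 matrix of estimates
```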

Conclusion
In this paper, we described the package QuantifQuantile, which implements the quantization-based quantile regression method introduced in Charlier et al. (2015a,b). The package is simple to use, as the function QuantifQuantile and its multivariate versions essentially only require providing the covariate and response as arguments. Since the choice of the tuning parameter N is crucial, a warning message is printed if it is not well chosen, and the function plot can also be used as a guide to adequately change the value of the testN argument in the various functions. Moreover, a graphical illustration is directly provided by the same plot function when the dimension of the covariate is smaller than or equal to 2. Finally, this package also contains a function that provides optimal quantization grids, which might be useful in other contexts, too.

Figure 4 :
Figure 4: For the sample considered in Example 2, this figure plots (with two different views) the estimated conditional quantile surfaces obtained with the plot function for α = 0.05, 0.25, 0.50, 0.75 and 0.95.

Figure 7 :
Figure 7: For n = 2,000 (left) and n = 20,000 (right), an initial grid of size 15 obtained by sampling without replacement from a random sample of size n drawn from the uniform distribution over (−2, 2) (top), and the corresponding optimal grid returned by choice.grid (bottom).

Table 1 :
Average computing times (in seconds), over 50 runs, needed to obtain the conditional quantile curves in Example 3 with QuantifQuantile and rqss, respectively; standard deviations are reported in parentheses.