Linear regression with stationary errors: the R package slm

This paper introduces the R package slm which stands for Stationary Linear Models. The package contains a set of statistical procedures for linear regression in the general context where the error process is strictly stationary with short memory. We work in the setting of Hannan (1973), who proved the asymptotic normality of the (normalized) least squares estimators (LSE) under very mild conditions on the error process. We propose different ways to estimate the asymptotic covariance matrix of the LSE, and then to correct the type I error rates of the usual tests on the parameters (as well as confidence intervals). The procedures are evaluated through different sets of simulations, and two examples of real datasets are studied.


Introduction
We consider the usual linear regression model $Y = X\beta + \varepsilon$, where Y is the n-dimensional vector of observations, X is a (possibly random) n × p design matrix, β is a p-dimensional vector of parameters, and $\varepsilon = (\varepsilon_i)_{1 \le i \le n}$ is the error process (with zero mean and independent of X). The standard assumptions are that the $\varepsilon_i$'s are independent and identically distributed (i.i.d.) with zero mean and finite variance.
In this paper, we propose to modify the standard statistical procedures (tests, confidence intervals, . . . ) of the linear model in the more general context where the $\varepsilon_i$'s are obtained from a strictly stationary process $(\varepsilon_i)_{i \in \mathbb{N}}$ with short memory. To be more precise, let $\hat{\beta}$ denote the usual least squares estimator of β. Our approach is based on two papers: the paper by Hannan (1973), who proved the asymptotic normality of the normalized least squares estimator $D(n)(\hat{\beta} - \beta)$ ($D(n)$ being the usual normalization) under very mild conditions on the design and on the error process; and a recent paper by Caron (2019), who showed that, under Hannan's conditions, the asymptotic covariance matrix of $D(n)(\hat{\beta} - \beta)$ can be consistently estimated.
Let us emphasize that Hannan's conditions on the error process are very mild and are satisfied by most short-memory processes (see the discussion in Section 4.4 of Caron and Dede (2018)). Putting together the two results above, we can develop a general methodology for tests and confidence regions on the parameter β, which should be valid for most short-memory processes. This is of course directly useful for time series regression, but also in the more general context where the residuals of the linear model appear to be strongly correlated. More precisely, when checking the residuals of the linear model, if the autocorrelation function of the residuals shows significant correlations, and if the residuals can be suitably modeled by an ARMA process, then our methodology is likely to apply. We shall give an example of such a situation on the "Shanghai pollution" dataset at the end of the paper.
Hence, the tools presented in the present paper can be seen from two different points of view:
- as appropriate tools for time series regression with a short-memory error process;
- as a way to robustify the usual statistical procedures when the residuals are correlated.
Let us now describe the organisation of the paper. In the next section, we recall the mathematical background, the consistent estimator of the asymptotic covariance matrix introduced in Caron (2019), and the modified Z-statistics and chi-square statistics for testing hypotheses on the parameter β. Next, we present the slm package and the different ways to estimate the asymptotic covariance matrix: by fitting an autoregressive process on the residuals (default procedure); by means of the kernel estimator described in Caron (2019) (theoretically valid), with a bootstrap method to choose the bandwidth (Wu and Pourahmadi (2009)); by using alternative choices of the bandwidth for the rectangular kernel (Efromovich (1998)) and the quadratic spectral kernel (Andrews (1991)); and by means of an adaptive estimator of the spectral density via histograms (Comte (2001)). In the section on numerical experiments, we estimate the level of a chi-square test for a linear model with random design, with different kinds of error processes and for different estimation procedures. In the last section, we apply the package to the "Shanghai pollution" dataset, and we compare the summary output of slm with the usual summary output of lm. An extended version of this paper is available as an arXiv preprint.

Asymptotic results for the kernel estimator
We start this section with a short presentation of linear regression with stationary errors; more details can be found, for instance, in Caron (2019). Let $\hat{\beta}$ be the usual least squares estimator of the unknown vector β. The aim is to provide hypothesis tests and confidence regions for β in the non-i.i.d. context.
Let γ be the autocovariance function of the error process ε: for any integers k and m, let $\gamma(k) = \mathrm{Cov}(\varepsilon_m, \varepsilon_{m+k})$. We also introduce the covariance matrix of the error vector:
$$\Gamma_n := \left[ \gamma(j-l) \right]_{1 \le j,l \le n}.$$
Hannan (1973) has shown a Central Limit Theorem for $\hat{\beta}$ when the error process is strictly stationary, under very mild conditions on the design and the error process. Let us notice that the design can be random or deterministic. We introduce the normalization matrix D(n), a diagonal matrix with diagonal terms $d_j(n) = \|X_{\cdot,j}\|_2$ for j in {1, . . . , p}, where $X_{\cdot,j}$ is the jth column of X. Roughly speaking, Hannan's result says in particular that, given the design X, the vector $D(n)(\hat{\beta} - \beta)$ converges in distribution to a centered Gaussian distribution with covariance matrix C. As usual, in practice the covariance matrix C is unknown, and it has to be estimated. Hannan also showed the convergence of the second-order moments:
$$\mathbb{E}\left[ D(n)(\hat{\beta} - \beta)(\hat{\beta} - \beta)^t D(n) \right] \xrightarrow[n \to \infty]{} C,$$
where $X^t$ denotes the transpose of a matrix X. In this paper, we propose a general plug-in approach: for some given estimator $\widehat{\Gamma}_n$ of $\Gamma_n$, we introduce the plug-in estimator
$$\widehat{C} := \widehat{C}(\widehat{\Gamma}_n) = D(n) (X^t X)^{-1} X^t \widehat{\Gamma}_n X (X^t X)^{-1} D(n),$$
and we use $\widehat{C}$ to standardize the usual statistics considered for the study of linear regression.
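To fix ideas, here is a minimal base-R sketch of this plug-in computation, for an arbitrary design matrix X and an arbitrary estimate Gamma_hat of $\Gamma_n$ (the AR(1)-shaped Gamma_hat below is a placeholder of our own, not an output of the package):

# Plug-in estimator C_hat = D(n) (X'X)^{-1} X' Gamma_hat X (X'X)^{-1} D(n).
set.seed(1)
n <- 200; p <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))   # design with intercept
Dn <- diag(sqrt(colSums(X^2)))                        # d_j(n) = ||X_.j||_2
Gamma_hat <- toeplitz(0.5^(0:(n - 1)))                # placeholder estimate of Gamma_n
XtX_inv <- solve(crossprod(X))
C_hat <- Dn %*% XtX_inv %*% t(X) %*% Gamma_hat %*% X %*% XtX_inv %*% Dn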
Let us illustrate this plug-in approach with the kernel estimator proposed in Caron (2019). For some kernel K and a bandwidth h, the kernel estimator $\widehat{\Gamma}_{n,h}$ is defined by
$$\widehat{\Gamma}_{n,h} = \left[ K\!\left(\frac{j-l}{h}\right) \tilde{\gamma}_{j-l} \right]_{1 \le j,l \le n}, \qquad (1)$$
where the residual-based empirical covariance coefficients are defined, for 0 ≤ |k| ≤ n − 1, by
$$\tilde{\gamma}_k = \frac{1}{n} \sum_{j=1}^{n-|k|} \hat{\varepsilon}_j \hat{\varepsilon}_{j+|k|}, \qquad (2)$$
with $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n$ the residuals of the least squares regression. For a well-chosen kernel K and under mild assumptions on the design and the error process, it has been proved in Caron (2019) that the plug-in estimator $\widehat{C}_n := \widehat{C}(\widehat{\Gamma}_{n,h_n})$ is consistent for estimating C, for some suitable sequence of bandwidths $(h_n)$.
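A minimal sketch of Equations (1) and (2) in base R, with a simulated series standing in for the residuals and a bandwidth fixed by hand:

# Residual-based empirical covariances (Equation (2)) and the tapered
# Toeplitz matrix (Equation (1)), here with a triangle kernel.
set.seed(2)
eps_hat <- as.numeric(arima.sim(list(ar = 0.6), n = 200))  # stand-in for residuals
n <- length(eps_hat)
gamma_tilde <- sapply(0:(n - 1), function(k)
  sum(eps_hat[1:(n - k)] * eps_hat[(1 + k):n]) / n)
h <- 6                                          # bandwidth, fixed by hand here
triangle_k <- function(x) pmax(1 - abs(x), 0)   # compactly supported, K(0) = 1
Gamma_hat <- toeplitz(triangle_k((0:(n - 1)) / h) * gamma_tilde)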
More generally, in this paper we say that an estimator $\widehat{\Gamma}_n$ of $\Gamma_n$ is consistent for estimating the covariance matrix C if $\widehat{C}(\widehat{\Gamma}_n)$ is positive definite and if it converges in probability to C. Note that such a property requires assumptions on the design, see Caron (2019). If $\widehat{\Gamma}_n$ is consistent for estimating the covariance matrix C, then $\widehat{C}(\widehat{\Gamma}_n)^{-1/2} D(n)(\hat{\beta} - \beta)$ converges in distribution to a standard Gaussian vector.
To conclude this section, let us make some additional remarks. The interest of Caron's recent paper is that the consistency of the estimator $\widehat{C}(\widehat{\Gamma}_n)$ is proved under Hannan's condition on the error process, which is known to be optimal with respect to the convergence in distribution (see for instance Dedecker (2015)), and which allows one to deal with most short-memory processes. But the natural estimator of the covariance matrix of $\hat{\beta}$ based on $\widehat{\Gamma}_n$ has been studied by many other authors in various contexts. For instance, let us mention the important line of research initiated by Newey and West (1987, 1994), and the related papers by Andrews (1991) and Andrews and Monahan (1992), among others. In the paper by Andrews (1991), the consistency of the estimator based on $\widehat{\Gamma}_n$ is proved under general conditions on the fourth-order cumulants of the error process, and a data-driven choice of the bandwidth is proposed. Note that these authors also considered the case of heteroskedastic processes. Most of these procedures, known as HAC (Heteroskedasticity and Autocorrelation Consistent) procedures, are implemented in the package sandwich by Zeileis, Lumley, Berger and Graham, and presented in great detail in the paper by Zeileis (2004). We shall rely on the sandwich package for the data-driven procedure described by Andrews (1991).

Tests and confidence regions
We now present tests and confidence regions based on an arbitrary consistent estimator $\widehat{\Gamma}_n$. The complete justifications are available for kernel estimators, see Caron (2019).

Z-Statistics.
We introduce the following univariate statistics: for j in {1, . . . , p},
$$Z_j = \frac{d_j(n)\, \hat{\beta}_j}{\sqrt{\widehat{C}_{j,j}}},$$
where $\widehat{C} = \widehat{C}(\widehat{\Gamma}_n)$. If $\widehat{\Gamma}_n$ is consistent for estimating the covariance matrix C and if β_j = 0, then the distribution of $Z_j$ converges to a standard normal distribution as n tends to infinity. We directly derive an asymptotic hypothesis test for testing $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$, as well as an asymptotic confidence interval for β_j.
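Continuing the base-R sketch above, the Z-statistics and their two-sided asymptotic p-values can be formed directly from $\widehat{C}$ (the simulated response and its coefficients are our own illustrative choices):

# Z-statistics and two-sided asymptotic p-values, given beta_hat, Dn, C_hat.
Y <- X %*% c(1, 2) + as.numeric(arima.sim(list(ar = 0.5), n = n))
beta_hat <- XtX_inv %*% crossprod(X, Y)          # least squares estimator
Z <- diag(Dn) * as.numeric(beta_hat) / sqrt(diag(C_hat))
p_values <- 2 * pnorm(-abs(Z))                   # compare to N(0, 1)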
Chi-square statistics. Let A be a k × p matrix, with k ≤ p and rank(A) = k. Under Hannan (1973)'s conditions, $A D(n)(\hat{\beta} - \beta)$ converges in distribution to a centered Gaussian distribution with covariance matrix $ACA^t$. If $\widehat{\Gamma}_n$ is consistent for estimating the covariance matrix C, then $A \widehat{C}(\widehat{\Gamma}_n) A^t$ converges in probability to $ACA^t$. The matrix $A \widehat{C}(\widehat{\Gamma}_n) A^t$ being symmetric positive definite, this yields
$$W_n := \left( A \widehat{C}(\widehat{\Gamma}_n) A^t \right)^{-1/2} A D(n)(\hat{\beta} - \beta) \xrightarrow[n \to \infty]{\mathcal{D}} \mathcal{N}_k(0, I_k).$$
This last result provides asymptotic confidence regions for the vector Aβ. It also provides an asymptotic test for testing the hypothesis $H_0: A\beta = 0$ against $H_1: A\beta \neq 0$. Indeed, under the $H_0$-hypothesis, the distribution of $\|W_n\|_2^2$ converges to a $\chi^2(k)$-distribution. The test can be used to simplify a linear model by testing that several linear combinations of the parameters β_j are zero, as we usually do for ANOVA and regression models. In particular, with $A = I_p$, the test corresponds to the test of overall significance.
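The chi-square statistic is just a quadratic form in $D(n)\hat{\beta}$; continuing the same sketch, with A = I_p for the test of overall significance:

# Chi-square (Wald-type) test of H0: A beta = 0, continuing the sketch above.
A <- diag(p)                                  # A = I_p: overall significance
x <- A %*% Dn %*% beta_hat
W2 <- as.numeric(t(x) %*% solve(A %*% C_hat %*% t(A), x))  # ||W_n||_2^2
p_value <- pchisq(W2, df = nrow(A), lower.tail = FALSE)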

Introduction to linear regression with the slm package
Using the slm package is very intuitive because the arguments and the outputs of slm are similar to those of the standard functions lm, glm, etc. The output of the main function slm is an object of class "slm", a specific class that has been defined for linear regression with stationary processes. The "slm" class has methods plot, summary, confint and predict, see the extended version of this paper for more details. Moreover, the class "slm" inherits from the "lm" class and thus provides the output of the classical lm function.
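For instance, once a model has been fitted as in the examples below, the usual generics apply to the resulting object; a quick sketch (regslm denotes a fitted object of class "slm"):

# Methods available for objects of class "slm".
summary(regslm)                  # coefficient table with corrected tests
confint(regslm, level = 0.95)    # asymptotic confidence intervals
predict(regslm)                  # predictions, as for an "lm" object
plot(regslm)                     # diagnostic plots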
The statistical tools available in slm strongly depend on the choice of the covariance plug-in estimator $\widehat{C}(\widehat{\Gamma}_n)$ used for estimating C. All the estimators $\widehat{\Gamma}_n$ proposed in slm are residual-based estimators, but they rely on different approaches. In this section, we present the main functionality of slm, together with the different covariance plug-in estimators.
For illustrating the package, we simulate synthetic data according to the linear model
$$Y = \beta_1 + \beta_2 X_1 + \beta_3 X_2 + \varepsilon,$$
where the design variables $X_1$ and $X_2$ are generated from a Gaussian autoregressive process Z of order 1, and ε is the Nonmixing process described further in the paper. We use the functions generative_model and generative_process respectively to simulate observations according to this regression design and with this specific stationary process.
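Since the exact signatures of generative_model and generative_process are not reproduced here, the following base-R sketch shows one way to simulate data of this shape (the AR coefficients and the transformation of Z are our own illustrative choices, with a simple AR(1) standing in for the error process):

# Illustrative simulation of the regression design and a dependent error
# process (stand-ins for generative_model / generative_process).
set.seed(42)
n <- 500
Z <- as.numeric(arima.sim(list(ar = 0.5), n = n))    # Gaussian AR(1) process
design <- data.frame(X1 = Z, X2 = Z^2)               # design built from Z
eps <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # dependent errors
design$Y <- 3 + 2 * design$X1 + 1 * design$X2 + eps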

Linear regression via AR fitting on the residuals
A large class of stationary processes with continuous spectral density can be well approximated by AR processes, see for instance Corollary 4.4.2 in Brockwell and Davis (1991). Since the covariance structure of an AR process has a closed form, it is easy to derive an approximation $\widehat{\Gamma}_{AR(p)}$ of $\Gamma_n$ by fitting an AR process on the residual process. The AR-based method for estimating C is the default version of slm. This method proceeds in four main steps (a hand-made version is sketched after this list):
1. Fit an autoregressive process on the residual process $\hat{\varepsilon}$;
2. Compute the theoretical covariances of the fitted AR process;
3. Plug the covariances into the Toeplitz matrix $\widehat{\Gamma}_{AR(p)}$;
4. Compute the plug-in estimator $\widehat{C}(\widehat{\Gamma}_{AR(p)})$.
The slm function fits a linear regression of the vector Y on the design X and then fits an AR process on the residual process using the ar function from the stats package. The output of the slm function is an object of class "slm". The order p of the AR process is set in the argument model_selec:
R> regslm = slm(Y ~ X1 + X2, data = design, method_cov_st = "fitAR",
+               model_selec = 3)
The estimated covariance is recorded as a vector in the attribute cov_st of regslm, which is an object of class "slm". The estimated covariance matrix can be computed by taking the Toeplitz matrix of cov_st, using the toeplitz function.
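The four steps are easy to reproduce by hand with the stats package only; in the sketch below, scaling the theoretical autocorrelations by the empirical variance of the residuals is our own shortcut for step 2:

# Manual version of the fitAR steps: fit an AR model on the residuals,
# compute its theoretical autocovariances, and build the Toeplitz matrix.
eps_hat <- residuals(lm(Y ~ X1 + X2, data = design))
fit <- ar(eps_hat, aic = TRUE, order.max = 20)        # step 1 (order > 0 assumed)
rho <- ARMAacf(ar = fit$ar, lag.max = length(eps_hat) - 1)  # step 2: theoretical ACF
cov_st <- as.numeric(rho) * var(eps_hat)              # ... scaled to covariances
Gamma_AR <- toeplitz(cov_st)                          # step 3: Toeplitz matrix
# step 4: plug Gamma_AR into the plug-in formula for C_hat given above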
AR order selection. The order p of the AR process can be chosen by hand by setting model_selec = p, or automatically with the AIC criterion by setting model_selec = -1.

Linear regression via kernel estimation of the error covariance
The second method for estimating the covariance matrix C is the kernel estimation method (1) studied in Caron (2019). In short, this method estimates C via a smooth approximation of the covariance matrix $\Gamma_n$ of the residuals. This estimation of $\Gamma_n$ corresponds to the so-called tapered covariance matrix estimator in the literature, see for instance Xiao and Wu (2012), or also to the "lag-window estimator" defined in Brockwell and Davis (1991), page 330. It applies in particular to non-negative symmetric kernels with compact support, with an integrable Fourier transform and such that K(0) = 1. Table 1 gives the list of the kernels available in the package slm. It is also possible for the user to define their own kernel and to use it in the argument kernel_fonc of the slm function. Below we use the triangle kernel, which ensures that the covariance matrix is positive definite. The support of the kernel K in Equation (1) being compact, only the terms $\tilde{\gamma}_{j-l}$ for small enough lags j − l are kept and weighted by the kernel in the expression of $\widehat{\Gamma}_{n,h}$. Rather than setting the bandwidth h, we select the number of $\tilde{\gamma}_k$'s that should be kept (the lag) with the argument model_selec in the slm function. The bandwidth h is then calibrated accordingly, that is, set equal to model_selec + 1.
R> regslm = slm(Y ~ X1 + X2, data = design, method_cov_st = "kernel",
+               model_selec = 5, kernel_fonc = triangle, plot = TRUE)
The plot output by the slm function is given in Figure 1.
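Since kernel_fonc accepts a user-supplied function, passing a hand-written kernel is straightforward; a sketch (the Epanechnikov-style kernel below is our own illustrative choice, not one of the package's built-ins listed in Table 1, and it may require the positive definite projection described later):

# A user-defined kernel passed through kernel_fonc: symmetric, compactly
# supported, with K(0) = 1.
epanechnikov <- function(x) pmax(1 - x^2, 0)
regslm <- slm(Y ~ X1 + X2, data = design, method_cov_st = "kernel",
              model_selec = 5, kernel_fonc = epanechnikov, plot = TRUE)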
Order selection via bootstrap. The order parameter can be chosen by hand as before, or automatically by setting model_selec = -1. The automatic order selection is based on the bootstrap procedure proposed by Wu and Pourahmadi (2009) for banded covariance matrix estimation. The block_size argument sets the size of the bootstrap blocks, and the block_n argument sets the number of blocks. The final order is the one with minimal estimated risk. Figure 2 gives the plots of the estimated risk for the estimation of $\Gamma_n$ (left) and the final estimated ACF (right).
R> regslm = slm(Y ~ X1 + X2, data = design, method_cov_st = "kernel",
+               model_selec = -1, kernel_fonc = triangle, model_max = 30,
+               block_size = 100, block_n = 100, plot = TRUE)
The selected order is recorded in the model_selec attribute of the slm object output by the slm function:
R> regslm@model_selec
[1] 10
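To make the idea concrete, here is a compact stand-alone rendering of this bootstrap risk minimization (a simplified sketch of the Wu and Pourahmadi (2009) idea, not the package's exact implementation):

# For each candidate lag K, estimate the risk of the banded covariance
# estimate by comparing bootstrap-block estimates (truncated at K) with
# the full-sample empirical covariances, then keep the minimizing K.
select_lag <- function(eps, lag_max = 30, block_size = 100, block_n = 100) {
  n <- length(eps)
  emp_cov <- function(x, kmax) sapply(0:kmax, function(k)
    sum(x[1:(length(x) - k)] * x[(1 + k):length(x)]) / length(x))
  gamma_full <- emp_cov(eps, lag_max)
  risk <- numeric(lag_max + 1)
  for (b in 1:block_n) {
    start <- sample.int(n - block_size + 1, 1)
    gamma_b <- emp_cov(eps[start:(start + block_size - 1)], lag_max)
    for (K in 0:lag_max) {
      banded <- gamma_b * (0:lag_max <= K)        # truncate at lag K
      risk[K + 1] <- risk[K + 1] + sum((banded - gamma_full)^2)
    }
  }
  which.min(risk) - 1                             # selected lag
}
select_lag(eps_hat)   # e.g. on the residuals from the AR-fitting sketch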
Order selection by Efromovich's method (rectangular kernel). An alternative method for choosing the bandwidth in the case of the rectangular kernel has been proposed in Efromovich (1998). For a large class of stationary processes with an exponentially decaying autocovariance function (mainly the ARMA processes), Efromovich proved that the rectangular kernel is asymptotically minimax, and he proposed a spectral density estimator $\hat{f}_{J_{nr}}$ truncated at a lag $J_{nr}$, where r is a regularity index of the autocovariance function. In practice this index is unknown, and it is estimated by the algorithm proposed in Section 4 of Efromovich (1998). As for the other methods, we use the residual-based empirical covariances $\tilde{\gamma}_k$ to compute $\hat{f}_{J_{nr}}(\lambda)$.
R> regslm = slm(Y ~ X1 + X2, data = design, method_cov_st = "efromovich",
+               model_selec = -1)
Order selection by Andrews's method. Another method for choosing the bandwidth has been proposed by Andrews (1991) and implemented in the package sandwich by Zeileis, Lumley, Berger and Graham (see the paper by Zeileis (2004)). For the slm package, the automatic choice of the bandwidth proposed by Andrews can be obtained as follows:
R> regslm = slm(Y ~ X1 + X2, data = design, method_cov_st = "hac")
The procedure is based on the function kernHAC of the sandwich package. This function directly computes the covariance matrix estimator of $\hat{\beta}$, which is recorded in the slot Cov_ST of the slm object. Here we take the quadratic spectral kernel, as suggested by Andrews (see Section 2 in Andrews (1991), or Section 3.2 in Zeileis (2004)), but other kernels could be used, such as Bartlett, Parzen, or Tukey-Hamming, among others (see Zeileis (2004)).

Positive definite projection.
Depending on the method used, the matrix $\widehat{C}(\widehat{\Gamma}_n)$ may not always be positive definite. This is the case for the kernel method with the rectangular or trapeze kernels. To overcome this problem, we project $\widehat{C}(\widehat{\Gamma}_n)$ onto the cone of positive definite matrices by applying a hard thresholding on the spectrum of the matrix: we replace all eigenvalues lower than or equal to zero with the smallest positive eigenvalue of $\widehat{C}(\widehat{\Gamma}_n)$. Note that this projection is useless for the triangle and quadratic spectral kernels, because their Fourier transform is non-negative (leading to a positive definite matrix $\widehat{C}(\widehat{\Gamma}_n)$). Of course, it is also useless for the fitAR and spectralproj methods.
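This thresholding takes a few lines of base R; a sketch (assuming at least one eigenvalue is positive):

# Hard-threshold the spectrum: replace eigenvalues <= 0 with the smallest
# positive eigenvalue, as described above.
project_pd <- function(C) {
  e <- eigen(C, symmetric = TRUE)
  lambda <- e$values
  lambda[lambda <= 0] <- min(lambda[lambda > 0])
  e$vectors %*% diag(lambda) %*% t(e$vectors)
}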

Linear regression via projection spectral estimation
The projection method relies on the ideas of Comte (2001), where an adaptive nonparametric method has been proposed for estimating the spectral density of a stationary Gaussian process. We use the residual process as a proxy for the error process, and we compute the projection coefficients with the residual-based empirical covariance coefficients $\tilde{\gamma}_k$, see Equation (2). For some $d \in \mathbb{N}^*$, the spectral density of the error process is estimated by computing the projection estimators for the residual process on the basis of the histogram functions
$$\phi_j^{(d)} = \sqrt{\frac{d}{\pi}}\, \mathbf{1}_{[\pi j/d,\, \pi (j+1)/d)}, \qquad j = 0, 1, \ldots, d-1.$$
The estimator is defined by
$$\hat{f}_d(\lambda) = \sum_{j=0}^{d-1} \hat{a}_j^{(d)}\, \phi_j^{(d)}(\lambda),$$
where the projection coefficients are
$$\hat{a}_j^{(d)} = \sqrt{\frac{d}{\pi}} \left( \frac{\tilde{\gamma}_0}{2d} + \frac{1}{\pi} \sum_{k=1}^{n-1} \frac{\tilde{\gamma}_k}{k} \left[ \sin\!\left(\frac{\pi k (j+1)}{d}\right) - \sin\!\left(\frac{\pi k j}{d}\right) \right] \right).$$
The Fourier coefficients of the spectral density are equal to the covariance coefficients. Thus, for k = 1, . . . , n − 1, it yields
$$\hat{\gamma}_k = \frac{2}{k} \sqrt{\frac{d}{\pi}} \sum_{j=0}^{d-1} \hat{a}_j^{(d)} \left[ \sin\!\left(\frac{\pi k (j+1)}{d}\right) - \sin\!\left(\frac{\pi k j}{d}\right) \right],$$
and for k = 0:
$$\hat{\gamma}_0 = 2 \sqrt{\frac{\pi}{d}} \sum_{j=0}^{d-1} \hat{a}_j^{(d)}.$$
This method can be used in the slm function by setting method_cov_st = "spectralproj":
R> regslm = slm(Y ~ X1 + X2, data = design, method_cov_st = "spectralproj",
+               model_selec = 10, plot = TRUE)
The graph of the estimated spectral density can be plotted by setting plot = TRUE in the slm function, see Figure 3.
Model selection. The Gaussian model selection method proposed in Comte (2001) follows the ideas of Birgé and Massart, see for instance Massart (2007). It consists in minimizing the $l^2$ penalized criterion, see Section 5 in Comte (2001):
$$\hat{d} = \arg\min_{d} \left\{ -\sum_{j=0}^{d-1} \left( \hat{a}_j^{(d)} \right)^2 + c\, \frac{d}{n} \right\},$$
where c is a multiplicative constant that in practice can be calibrated using the slope heuristic method, see Birgé and Massart (2007); Baudry et al. (2012) and the R package capushe.
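The formulas above translate directly into R; a sketch, reusing the gamma_tilde vector computed in the kernel section:

# Projection coefficients a_hat_j on the histogram basis and recovery of
# the covariances gamma_hat_k from the projected spectral density.
spectralproj_cov <- function(gamma_tilde, d) {
  n <- length(gamma_tilde)                   # gamma_tilde[k + 1] = tilde gamma_k
  j <- 0:(d - 1); k <- 1:(n - 1)
  S <- outer(j, k, function(j, k)            # sin increments over the bins
    sin(pi * k * (j + 1) / d) - sin(pi * k * j / d))
  a_hat <- sqrt(d / pi) *
    (gamma_tilde[1] / (2 * d) + (1 / pi) * as.vector(S %*% (gamma_tilde[-1] / k)))
  gamma0 <- 2 * sqrt(pi / d) * sum(a_hat)
  gammak <- (2 / k) * sqrt(d / pi) * as.vector(t(S) %*% a_hat)
  c(gamma0, gammak)                          # estimated covariances, lags 0..n-1
}
cov_spectral <- spectralproj_cov(gamma_tilde, d = 10)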

Linear regression via masked covariance estimation
This method is a fully manual method for estimating the covariance matrix C by selecting covariance terms among the residual covariances $\tilde{\gamma}_k$ defined by (2). Let I be a set of positive integers; we consider the masked covariance vector
$$\tilde{\gamma}^I_k = \tilde{\gamma}_k\, \mathbf{1}_{\{k \in I \cup \{0\}\}}, \qquad 0 \le k \le n-1,$$
and we define the estimated covariance matrix $\widehat{\Gamma}_I$ by taking the Toeplitz matrix of the vector $\tilde{\gamma}^I$. This estimator is a particular example of a masked sample covariance estimator, as introduced by Chen et al. (2012), see also Levina and Vershynin (2012). Finally, we derive from $\widehat{\Gamma}_I$ an estimator $\widehat{C}(\widehat{\Gamma}_I)$ of C. The next instruction selects the covariances of order 0, 1, 2 and 4 from the residual covariance terms:
R> regslm = slm(Y ~ X1 + X2, data = design, method_cov_st = "select",
+               model_selec = c(1,2,4))
The positive lags of the selected covariances are recorded in the model_selec argument. Let us notice that the variance $\tilde{\gamma}_0$ is automatically selected.
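The masking itself is a one-liner in base R; a sketch, again with the gamma_tilde vector from above:

# Keep only the covariances at lags {0} union I and build the Toeplitz matrix.
I <- c(1, 2, 4)
gamma_I <- gamma_tilde * (0:(length(gamma_tilde) - 1) %in% c(0, I))
Gamma_I <- toeplitz(gamma_I)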
As for the kernel method, the resulting covariance matrix may not be positive definite. In that case, the positive definite projection method described above is applied.

Linear regression via manual plugged covariance matrix
This last method is a direct plug-in method. The user provides their own vector estimator $\hat{\gamma}$ of γ; the Toeplitz matrix $\widehat{\Gamma}_n$ of the vector $\hat{\gamma}$ is then used for estimating C via $\widehat{C}(\widehat{\Gamma}_n)$.
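For instance, one may plug in the theoretical covariances of a model fitted separately. The sketch below assumes that the vector can be passed to slm through an argument named cov_st, in line with the cov_st attribute mentioned earlier (to be checked against the package documentation):

# Covariance vector of an MA(1) model with parameter theta, plugged manually.
theta <- 0.5; sigma2 <- 1
my_cov <- c((1 + theta^2) * sigma2, theta * sigma2, rep(0, nrow(design) - 2))
regslm <- slm(Y ~ X1 + X2, data = design, cov_st = my_cov)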

Numerical experiments and method comparisons
This section summarizes an extensive study which has been carried out to compare the performances of the different approaches presented above, in the context of a linear model with short-range dependent stationary errors.

Description of the generative models
We first present the five generative models for the errors that we consider in the paper. We choose different kinds of processes to reflect the diversity of short-memory processes.
• AR1 process. The AR1 process is a Gaussian AR(1) process whose innovations $W_i$ are i.i.d. standard Gaussian random variables $\mathcal{N}(0, 1)$.
• AR12 process. The AR12 process is a seasonal AR(12) process whose innovations $W_i$ are i.i.d. standard Gaussian random variables $\mathcal{N}(0, 1)$. When studying monthly datasets, one usually observes a seasonality of order 12. For example, when looking at climate data, the data are often collected per month, and the same phenomenon tends to repeat every year. Even if the design integrates the deterministic part of the seasonality, a correlation at lag 12 usually remains present in the residual process.
• MA12 process. The MA12 process is also a seasonal process, of moving-average type, where the $(W_i)$'s are i.i.d. random variables following Student's t distribution with 10 degrees of freedom.
• Nonmixing process. The three processes described above are standard ARMA processes, whose innovations have absolutely continuous distributions; in particular, they are strongly mixing in the sense of Rosenblatt (1956), with a geometric decay of the mixing coefficients (in fact, the MA12 process is even 12-dependent, which means that the mixing coefficient α(k) = 0 if k > 12). Let us now describe a more complicated process: let $(Z_1, \ldots, Z_n)$ satisfy the AR(1) equation
$$Z_{i+1} = \frac{1}{2}\left( Z_i + \eta_{i+1} \right),$$
where $Z_1$ is uniformly distributed over [0, 1] and the $\eta_i$'s are i.i.d. random variables with the Bernoulli distribution B(1/2), independent of $Z_1$. The process $(Z_i)_{i \ge 1}$ is a strictly stationary Markov chain, but it is not α-mixing in the sense of Rosenblatt (see Bradley (1986)). Let now $Q_{0,\sigma^2}$ be the inverse of the cumulative distribution function of a centered Gaussian distribution with variance $\sigma^2$ (for the simulations below, we choose $\sigma^2 = 25$). The Nonmixing process is then defined by $\varepsilon_i = Q_{0,\sigma^2}(Z_i)$.
The sequence $(\varepsilon_i)_{i \ge 1}$ is also a stationary Markov chain (as an invertible function of a stationary Markov chain). By construction, $\varepsilon_i$ is $\mathcal{N}(0, \sigma^2)$-distributed, but the sequence $(\varepsilon_i)_{i \ge 1}$ is not a Gaussian process (otherwise it would be mixing in the sense of Rosenblatt). Although it is not obvious, one can prove that the process $(\varepsilon_i)_{i \ge 1}$ satisfies Hannan's condition (see Caron (2019), Section 4.2). A simulation sketch for this process is given after this list.
• Sysdyn process. The four processes described above have the property of "geometric decay of correlations", which means that the γ(k)'s tend to 0 at an exponential rate. However, as already pointed out in the introduction, Hannan's condition is valid for most short-memory processes, even for processes with a slow decay of correlations (provided that the γ(k)'s are summable). Hence, our last example is a non-mixing process (in the sense of Rosenblatt), with an arithmetic decay of the correlations.
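For readers who want to experiment outside the package (generative_process provides this process directly), the Nonmixing process announced above is easy to simulate from its definition; a stand-alone sketch:

# Simulate the Nonmixing process: a non-alpha-mixing AR(1) chain on [0,1]
# pushed through the Gaussian quantile function Q_{0, sigma^2}.
simulate_nonmixing <- function(n, sigma = 5) {
  Z <- numeric(n)
  Z[1] <- runif(1)                         # uniform law = stationary distribution
  eta <- rbinom(n, size = 1, prob = 0.5)   # i.i.d. Bernoulli(1/2) innovations
  for (i in 2:n) Z[i] <- (Z[i - 1] + eta[i]) / 2
  qnorm(Z, mean = 0, sd = sigma)           # eps_i = Q_{0, sigma^2}(Z_i)
}
eps <- simulate_nonmixing(1000)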
The linear regression models simulated in the experiments all have the following form:
$$Y_i = \beta_1 + \beta_2 X_{i,1} + \beta_3 X_{i,2} + \varepsilon_i, \qquad (5)$$
where the design variables are generated from a Gaussian autoregressive process Z of order 1, and ε is one of the stationary processes defined above. For the simulations, β_1 is always equal to 3. All the error processes presented above can be simulated with the slm package with the generative_process function. The design can be simulated with the generative_model function.

Automatic calibration of the tests
It is of course of primary importance to provide hypothesis tests with correct significance levels, or at least with correct asymptotic significance levels, which is possible if the estimator $\widehat{\Gamma}_n$ of the covariance matrix $\Gamma_n$ is consistent for estimating C. For instance, the results of Caron (2019) show that it is possible to construct statistical tests with correct asymptotic significance levels. However, in practice, such asymptotic results are not sufficient, since they do not indicate how to tune the bandwidth on a given dataset. This situation makes the practice of linear regression with dependent errors much more difficult than linear regression with i.i.d. errors. This problem arises for several of the methods given before: order choice for the fitAR method, bandwidth choice for the kernel method, and dimension selection for the spectralproj method.
Designing a data-driven procedure for choosing test parameters in order to obtain the correct Type I error rate is a tricky issue. Note that, unlike supervised learning and density estimation problems, hypothesis tests cannot be calibrated in practice using cross-validation approaches. We thus propose to calibrate the tests using well-founded statistical procedures for risk minimization: the AIC criterion for the fitAR method, bootstrap procedures for the kernel method, and the slope heuristics for the spectralproj method. These procedures are implemented in the slm function with the model_selec = -1 argument, as detailed in the previous section.
Let us first illustrate the calibration problem with the AR12 process. For T = 1000 simulations, we generate an error process of size n under the null hypothesis $H_0: \beta_2 = \beta_3 = 0$. We then use the fitAR method of the slm function with orders between 1 and 50, and we perform the test of overall model significance. The true level of the test is estimated, for each order, by averaging the rejection rates over the 1000 simulations. The results are given in Figure 4 for n = 1000. A boxplot is also displayed to visualize the distribution of the order selected by the automatic criterion (AIC).
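As a point of comparison, the miscalibration of the uncorrected test is easy to reproduce; the short Monte Carlo sketch below estimates the level of the usual Fisher test under seasonal AR(12) errors (the design, the AR coefficient of Z and the lag-12 coefficient 0.6 are our own illustrative choices):

# Estimate the true level of the (uncorrected) Fisher test of
# H0: beta2 = beta3 = 0 when the errors are a seasonal AR(12) process.
set.seed(3)
T_sim <- 1000; n <- 1000; reject <- logical(T_sim)
for (t in 1:T_sim) {
  Z <- as.numeric(arima.sim(list(ar = 0.5), n = n))
  eps <- as.numeric(arima.sim(list(ar = c(rep(0, 11), 0.6)), n = n))
  Y <- 3 + eps                               # model (5) under H0
  fstat <- summary(lm(Y ~ Z + I(Z^2)))$fstatistic
  reject[t] <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE) < 0.05
}
mean(reject)                                 # estimated level, well above 0.05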

Figure 4:
Estimated level of the test according to the order of the fitted AR process on the residuals (top), and boxplot of the orders selected by AIC, over 1000 simulations. The data have been simulated according to Model (5), with $\beta_1 = 3$ and $\beta_2 = \beta_3 = 0$, and n = 1000.

Non-seasonal errors
We first study the case of non-seasonal error processes. We simulate an error process of size n according to the AR1, Nonmixing or Sysdyn processes. We then simulate realizations of the linear regression model (5) under the null hypothesis $H_0: \beta_2 = \beta_3 = 0$. We use the automatic selection procedures for each method (model_selec = -1). The simulations are repeated 1000 times in order to estimate the true level of the model significance test for each procedure. We simulate either small samples (n = 200) or larger samples (n = 1000, 2000, 5000). The results of these experiments are summarized in Table 2. For n large enough (n ≥ 1000), all methods work well, and the estimated level is around 0.05. However, for small samples (n = 200), we observe that the fitAR and hac methods perform better than the others. The kernel method is slightly less effective. With this method, we must choose the size of the bootstrap blocks as well as the number of blocks, and the test results are quite sensitive to these parameters. In these simulations, we have chosen 100 blocks of size n/2. The results are expected to improve with a larger number of blocks.
Let us notice that, for all methods and all sample sizes, the estimated level is much better than when no correction is made (usual Fisher tests).

Seasonal errors
We now study the case of linear regression with seasonal errors. The experiment is exactly the same as before, except that we simulate AR12 or MA12 processes. The results of these experiments are summarized in Table 3.
We directly see that the case of seasonal processes is more difficult than that of non-seasonal processes, especially for the AR12 process. For small sample sizes, the estimated level is between 0.17 and 0.24, which is clearly too large. It is however much better than the estimated level of the usual Fisher test, which is around 0.45. The fitAR method is the best method here for the AR12 process: for n ≥ 1000, its estimated level is between 0.06 and 0.07. For the efromovich and kernel methods, a level less than 0.10 is reached, but only for large samples. The spectralproj and hac methods do not seem to work well for the AR12 process, although they remain much better than the usual Fisher test (around 19% rejection instead of 45%).
The case of the MA12 process seems easier to handle. For n large enough (n ≥ 1000), the estimated level is between 0.04 and 0.07 whatever the method, except for hac (around 0.15 for n = 5000). The methods are less effective for small sample sizes (n = 200), with an estimated level around 0.115 for the fitAR, spectralproj and efromovich methods.

I.I.D. errors
To be complete, we consider the case where the $\varepsilon_i$'s are i.i.d., to see how the five automatic methods perform in that case. We simulate n i.i.d. centered random variables, defined as a function of a variable W that follows Student's t distribution with 10 degrees of freedom. Note that the distribution of the $\varepsilon_i$'s is not symmetric and has no exponential moments. Except for the kernel method, the estimated levels are close to 5% for n large enough (n ≥ 300). The levels are slightly worse for small samples, but they remain quite good for the fitAR, efromovich and hac methods.
As a general conclusion of this section about numerical experiments and method comparison, we see that the fitAR method performs quite well in a wide variety of situations, and should therefore be used as soon as the user suspects that the error process can be modeled by a stationary short-memory process.

Application to the PM2.5 pollution Shanghai Dataset
This dataset comes from a study about fine particle pollution in five Chinese cities. The data are available on the following website: https://archive.ics.uci.edu/ml/datasets/PM2.5+Data+of+Five+Chinese+Cities#. We are interested here in the city of Shanghai. We study the regression of PM2.5 pollution in the Xuhui District on other pollution measurements in neighboring districts and on meteorological variables. The dataset contains hourly observations between January 2010 and December 2015. More precisely, it contains 52584 records of 17 variables: date, time of measurement, pollution and meteorological variables. More information on these data is available in the paper of Liang et al. (2016).
The autocorrelation function of the residuals decreases quite fast, and the partial autocorrelation function suggests that fitting an AR process on the residuals should be an appropriate method in this case. The automatic fitAR method of slm selects an AR process of order 28. The residuals of this AR fitting look like a white noise, as shown in Figure 6. Consequently, we propose to perform a linear regression with the slm function, using the fitAR method on the complete model:
R> regslm = slm(shan_complete$PM_Xuhui ~ . , data = shan_complete,
+               method_cov_st = "fitAR", model_selec = -1)
R> summary(regslm)
Call:
"slm(formula = myformula, data = data, x = x, y = y)"
[summary table omitted]
Note that the variables show globally larger p-values than with the lm procedure, and more variables have no significant effect than with lm. After performing a backward selection, we obtain the following results:
R> shan_slm = shan[1:5000, c(7,8,9,10,11,13)]
R> regslm = slm(shan_slm$PM_Xuhui ~ . , data = shan_slm,
+               method_cov_st = "fitAR", model_selec = -1)
R> summary(regslm)
Call:
"slm(formula = myformula, data = data, x = x, y = y)"
[summary table omitted]
The backward selection with slm only keeps 5 variables.