BNSP: an R Package for Fitting Bayesian Semiparametric Regression Models and Variable Selection

The R package BNSP provides a unified framework for semiparametric location-scale regression and stochastic search variable selection. The statistical methodology that the package is built upon utilizes basis function expansions to represent semiparametric covariate effects in the mean and variance functions, and spike-slab priors to perform selection and regularization of the estimated effects. In addition to the main function that performs posterior sampling, the package includes functions for assessing convergence of the sampler, summarizing model fits, visualizing covariate effects and obtaining predictions for new responses or their means given feature/covariate vectors.


Introduction
There are many approaches to non-and semi-parametric modeling. From a Bayesian perspective, Müller & Mitra (2013) provide a review that covers methods for density estimation, modeling of random effects distributions in mixed effects models, clustering and modeling of unknown functions in regression models.
Our interest is on Bayesian methods for modeling unknown functions in regression models. In particular, we are interested in modeling both the mean and variance functions non-parametrically, as general functions of the covariates. There are multiple reasons why allowing the variance function to be a general function of the covariates may be important (Chan et al., 2006). Firstly, it can result in more realistic prediction intervals than those obtained by assuming constant error variance, or as Müller & Mitra (2013) put it, it can result in more honest representation of uncertainties. Secondly, it allows the practitioner to examine and understand which covariates drive the variance. Thirdly, it results in more efficient estimation of the mean function (Seber, 1977), and lastly, it produces more accurate standard errors of unknown parameters.
In the R (R Core Team, 2017) package BNSP (Papageorgiou, 2018) we have implemented Bayesian regression models with Gaussian errors and with mean and log-variance functions that can be modeled as general functions of the covariates. Covariate effects may enter the mean and log-variance functions parametrically or non-parametrically, with the nonparametric effects represented as linear combinations of basis functions. The strategy that we follow in representing unknown functions is to utilize a large number of basis functions. This allows for flexible estimation and for capturing true effects that are locally adaptive. Potential problems associated with large numbers of basis functions, such as over-fitting, are avoided in our implementation, and efficient estimation is achieved, by utilizing spike-slab priors for variable selection. The particular flavor of spike-slab prior that we implement is described in section 2. A review of variable selection methods is provided by O'Hara & Sillanpää (2009).
The methods described here belong to the general class of models known as generalized additive models for location, scale and shape (GAMLSS) (Rigby & Stasinopoulos, 2005;Stasinopoulos & Rigby, 2007) or the Bayesian analog termed as BAMLSS (Umlauf et al., 2018) and implemented in package bamlss (Umlauf et al., 2017). However, due to the nature of the spike-and-slab priors that we have implemented, in addition to flexible modeling of the mean and variance functions, the methods described here can also be utilized for selecting promising subsets of predictor variables in multiple regression models. The implemented methods fall in the general class of methods known as stochastic search variable selection (SSVS). SSVS has received considerable attention in the Bayesian literature and its applications range from investigating factors that affect individual's happiness (George & McCulloch, 1993), to constructing financial indexes (George & McCulloch, 1997) and to gene mapping (O'Hara & Sillanpää, 2009). These methods associate each regression coefficient, either a main effect or the coefficient of a basis function, with a latent binary variable that indicates whether the corresponding covariate is needed in the model or not. Hence, the joint posterior distribution of the vector of these binary variables can identify the models with the higher posterior probability.
Important work that is related to package BNSP includes the R package spikeSlabGAM (Scheipl, 2016) that also utilizes SSVS methods (Scheipl, 2011). A major difference between the two packages, however, is that whereas spikeSlabGAM utilizes spike-and-slab priors for function selection, BNSP utilizes spike-and-slab priors for variable selection. In addition, Bayesian distributional regression models can also be fit with R package brms (Bürkner, 2018) using normal prior (Bürkner, 2017). Further, important is also the R package gamboostLSS (Hofner et al., 2018) that includes a frequentist GAMLSS implementation based on boosting that can handle high-dimensional data (Mayr et al., 2012;Hofner et al., 2016). Lastly, the R package mgcv (Wood, 2018) can also fit generalized additive models with Gaussian errors and integrated smoothness estimation, with implementations that can handle large datasets.
In BNSP we have implemented functions for fitting such semi-parametric models, summarizing model fits, visualizing covariate effects and predicting new responses or their means. The main functions are mvrm, mvrm2mcmc, print.mvrm, summary.mvrm, plot.mvrm and predict.mvrm. Below we provide a quick description of these functions. The first one, mvrm, returns samples from the posterior distributions of the model parameters, and it is based on an efficient Markov chain Monte Carlo (MCMC) algorithm in which we integrate out the coefficients in the mean function, generate the variable selection indicators in blocks (Chan et al., 2006) and choose the MCMC tuning parameters adaptively (Roberts & Rosenthal, 2009). In order to minimize random-access memory utilization, posterior samples are not kept in memory, but instead written in files in a directory that must be supplied by the user. The second function, mvrm2mcmc, reads-in the samples from the posterior of the parameters of the users choice and it creates an object of class 'mcmc'. This enables users to easily utilize functions from package coda (Plummer et al., 2006), including functions plot and summary, for assessing convergence and for summarizing posterior distributions. Further, functions print.mvrm and summary.mvrm provide summaries of model fits, including models and priors specified, marginal posterior probabilities of term inclusion in the mean and variance models and models with the highest posterior probabilities. Function plot.mvrm, creates plots of parametric and nonparametric terms that appear in the mean and variance models. The function can create two-dimensional plots by calling functions from package ggplot2 (Wickham, 2009). It can also create static or interactive three-dimensional plots, by calling functions from packages plot3D (Soetaert, 2016) and threejs (Lewis, 2016). Lastly, function predict.mvrm provides predictions either for new responses or their means given feature/covariate vectors.
The remainder of this paper is divided in four sections. In section 2 we provide a detailed model description. Section 3 is the largest of this paper and it is devoted to the usage of the package and the options it provides to the users. Technical details on the implementation of the MCMC algorithm are provided in section 4. We conclude with a brief summary in section 5.
2 Mean-variance nonparametric regression models Let y = (y 1 , . . . , y n ) denote the vector of responses and let X = [x 1 , . . . , x n ] and Z = [z 1 , . . . , z n ] denote design matrices. The models that we consider express the vector of responses utilizing where 1 n is the usual n-dimensional vector of ones, β 0 is an intercept term, β 1 is a vector of regression coefficients and = ( 1 , . . . , n ) is an n-dimensional vector of independent random errors. Each i , i = 1, . . . , n, is assumed to have a normal distribution, i ∼ N (0, σ 2 i ), with variances that are modeled in terms of covariates. Let σ 2 = (σ 2 1 , . . . , σ 2 n ) . We model the vector of variances utilizing where α 0 is an intercept term and α 1 is a vector of regression coefficients. Equivalently, the model for the variances can be expressed as where σ 2 = exp(α 0 ). Let D(α) denote an n-dimensional, diagonal matrix with elements exp(z i α 1 /2), i = 1, . . . , n. Then, the model that we consider may be expressed more economically as where β = (β 0 , β 1 ) and X * = [1 n , X].
In the next subsections we describe how, within model (2), both parametric and nonparametric effects of explanatory variables on the mean and variance functions can be captured utilizing regression splines and variable selection methods. We begin by considering the special case where there is a single covariate entering the mean model and a single covariate entering the variance model, both in a flexible way.

Locally adaptive models with a single covariate
Suppose that the observed dataset consists of triplets (y i , u i , w i ), i = 1, . . . , n, where explanatory variables u and w enter flexibly the mean and variance models, respectively. To model the nonparametric effects of u and w we consider the following formulations of the mean and variance models In the preceding x i = (φ 11 (u i ), . . . , φ 1q 1 (u i )) and z i = (φ 21 (w i ), . . . , φ 2q 2 (w i )) are vectors of basis functions and β 1 = (β 1 , . . . , β q 1 ) and α 1 = (α 1 , . . . , α q 2 ) are the corresponding coefficients. In package BNSP we implemented two sets of basis functions. Firstly, radial basis functions where ||u|| denotes the Euclidean norm of u and ξ 1 , . . . , ξ q−1 are the knots that within package BNSP are chosen as the quantiles of the observed values of explanatory variable u, with ξ 1 = min(u i ), ξ q−1 = max(u i ) and the remaining knots chosen as equally spaced quantiles between ξ 1 and ξ q−1 . Secondly, we implemented thin plate splines where (a) + = max(a, 0) and the knots ξ 1 , . . . , ξ q−1 are determined as above.
Locally adaptive models for the mean and variance functions are obtained utilizing the methodology developed by Chan et al. (2006). Considering the mean function first, local adaptivity is achieved by utilizing a large number of basis functions q 1 . Over-fitting, and problems associated with it, is avoided by allowing positive prior probability that the regression coefficients are exactly zero. The latter is achieved by defining binary variables γ j , j = 1, . . . , q 1 , that take value γ j = 1 if β j = 0 and γ j = 0 if β j = 0. Hence, vector γ = (γ 1 , . . . , γ q 1 ) determines which terms enter the mean model. The vector of indicators δ = (δ 1 , . . . , δ q 2 ) for the variance function is defined analogously.
Given vectors γ and δ, the heteroscedastic, semiparametric model that we described in this section can be written in a form that is similar to (2) as where β γ consisting of all non-zero elements of β 1 and X * γ consists of the corresponding columns of X * . Subvector α δ is defined analogously.
We note that, as was suggested by Chan et al. (2006), we work with mean corrected columns in the design matrices X and Z, both in this paper and in the BNSP implementation. We remove the mean from all variables in the design matrices but the categorical variables.
2.2 Prior specification for models with a single covariate LetX = D(α) −1 X * . The prior for β γ is specified as (Zellner, 1986) Further, the prior for α δ is specified as Independent priors are specified for the indicators variables γ j as P (γ j = 1|π µ ) = π µ , j = 1, . . . , q 1 , from which the joint prior is obtained as Similarly, for the indicators δ j we specify independent priors P (δ j = 1|π σ ) = π σ , j = 1, . . . , q 2 . It follows that the joint prior is where N (δ) = q 2 j=1 δ j . We specify inverse Gamma priors for c β and c α and Beta priors for π µ and π σ Lastly, for σ 2 we consider inverse Gamma and half-normal priors

Extension to bivariate covariates
It is straightforward to extend the methodology described in subsections 2.1 and 2.2 in order to allow fitting of flexible mean and variance surfaces. In fact, the only modification required is in the basis functions and knots. For fitting surfaces, in package BNSP we implemented radial basis functions We also note that the prior specification presented in subsection 2.2 for fitting flexible functions remains unchained for fitting flexible surfaces.

Extension to additive models
In the presence of multiple covariates, the effects of which may be modeled parametrically or semiparametrically, the mean model in (3) is extended to the following where, u ip includes the covariates the effects of which are modeled parametrically, β denotes the corresponding effects, and f µ,k (u ik ), k = 1, . . . , K 1 , are flexible functions of univariate or bivariate covariates expressed, similarly to subsection 2.1, as where φ 1kj , j = 1, . . . , q 1k are the basis functions used in the kth component, k = 1, . . . , K, and defined in subsections 2.1 and 2.3.
Similarly to the mean model, the variance model in the presence of multiple covariates is expressed as the mean model in (3) is extended to the following For the more general case of additive models, local adaptivity is achieved using a similar strategy as in the single covariate case. That is, we utilize a potentially large number of knots or basis functions in the flexible components that appear in the mean model, f µ,k , k = 1, . . . , K 1 , and for those that appear in the variance model, f σ,k , k = 1, . . . , K 2 . To avoid over-fitting, we allow removal of the unnecessary ones utilizing the usual indicator variables, γ k = (γ k1 , . . . , γ kq 1k ) , k = 1, . . . , K 1 , and δ k = (δ k1 , . . . , δ kq 2k ) , k = 1, . . . , K 2 . Here, the q 1k -dimensional vector γ k determines which of the basis functions should appear in f µ,k . Likewise, δ k determines which of the basis functions should appear in f σ,k .
The model that we implemented in package BNSP specifies independent priors for the indicators variables γ kj as P (γ kj = 1|π µ k ) = π µ k , j = 1, . . . , q 1k . From these the joint prior follows where N (γ k ) = q 1k j=1 γ kj . Similarly, for the indicators δ kj we specify independent priors P (δ kj = 1|π σ k ) = π σ k , j = 1, . . . , q 2k . It follows that the joint prior is where N (δ k ) = q 2k j=1 δ kj . We specify the following independent priors The rest of the priors are the same as those specified as in section 2.2.

Usage
In this section we provide results on simulations studies and real data analyses. The purpose is twofold: firstly we want to point out that the package works well and provides the expected results (in simulation studies) and secondly we want to illustrate the options that the users of BNSP have and how these options are specified. We first note that the main function, mvrm, takes as input an extended formula argument (Zeileis & Croissant, 2010) that specifies the mean and variance models. For instance, letting y denote the response variable and x1 and x2 two covariates, modeling the mean function in terms of x1 and the variance function in terms of x2 is achieved by specifying formula = y x1 | x2. Furthermore, formulas for specifying the mean and variance functions follow the standard R syntax, with smooth terms specified by function sm. Function sm can take as input one or two covariates. When one covariate is provided as the argument of sm, it has to be continuous. When two covariates are provided, they can be both continuous or one continuous and one discrete. For instance, letting y denote the response variable, and assuming two continuous covariates, x1 and x2, and two discrete ones, d1 and d2, formula specifications for the mean and variance functions can include any of: d1*d2 + sm(x1,x2), d1 + x1 + sm(x2,d2), or d1 + d2 + sm(x1) + x2.

Simulation studies
Here we present results from three simulations studies, involving one, two, and multiple covariates. For the majority of these simulation studies, we utilize the same data-generating mechanisms as those presented by Chan et al. (2006).
To the generated dataset we fit a special case of the model that was presented in sections 2.1 and 2.2. For the mean and variance models in (3) and (4) we choose with φ the radial basis functions presented in (5). Further, we choose q 1 = q 2 = 21 basis functions or, equivalently, 20 knots. Hence, we have φ 1j (u) = φ 2j (u), j = 1, . . . , 21, which results in identical design matrices for the mean and variance models, X = Z. In R, the two models are specified using R> model <-y~sm(u, nknots = 20, bs = "rd") | sm(u, nknots = 20, bs = "rd") The above formula specifies the response, mean and variance models. Function sm takes as input the covariate u, the selected number of knots and the selected type of basis functions. Next we specify the hyper-parameter values for the priors in (6) and (7). The default prior for c β is inverse Gamma with a β = 0.5, b β = n/2 (Liang et al., 2008). For parameter c α the default prior is a non-informative but proper inverse Gamma with a α = b α = 1.1. Concerning π µ and π σ , the default priors are uniform, obtained by setting c µ = d µ = 1 and c σ = d σ = 1. Lastly, the default prior for the error standard deviation is the the normal with variance φ 2 σ = 2, |σ| ∼ N (0, 2). We choose to run the MCMC sampler for 10, 000 iterations and discard the first 5, 000 as burn-in. Of the remaining 5, 000 samples we retain 1 every 2 samples, resulting in 2, 500 posterior samples. Further, as mentioned above, we set the seed of the MCMC sampler equal to one.
R> sweeps <-10000 R> burn <-5000 R> thin <-2 R> seed <-1 Obtaining posterior samples is achieved by a function call of the form R> m1 <-mvrm(formula = model, data = data, sweeps = sweeps, burn = burn, + thin = thin, seed = seed, StorageDir = DIR, + c.betaPrior = "IG(0.5,0.5*n)", c.alphaPrior = "IG(1.1,1.1)", + pi.muPrior = "Beta(1,1)", pi.sigmaPrior = "Beta(1,1)", sigmaPrior = "HN(2)") Samples from the posteriors of the model parameters {β, γ, α, δ, c β , c α , σ 2 } are written in seven separate files which are stored in the directory specified by argument StorageDir. If a storage directory is not specified by the user, then function mvrm returns an error message as without these files there will be no output to process. Furthermore, the last two lines of the above function call show the specified priors, which are c β ∼ IG(0.5, n/2), c α ∼ IG(1.1, 1.1), π µ ∼ Beta(1, 1), π σ ∼ Beta(1, 1) and |σ| ∼ N (0, 2), respectively. As we mentioned above, these priors are the default ones, and hence the same function call can be achieved without specifying the last two lines. Here we display the priors in order to describe how users can specify their own priors. For parameters c β and c α only inverse Gamma priors are available, with parameters that can be specified by the user in the intuitive way. For instance, the prior c β ∼ IG(1.01, 1.01) can be specified in function mvrm by using c.betaPrior = "IG(1.01,1.01)". The second parameter of the prior for c β can be a function of the sample size n (but only symbol n would work here), so for instance c.betaPrior = "IG(1,0.4*n)" is another specification that is allowed by function mvrm. Further, Beta priors are available for parameters π µ and π σ with parameters that can be specified by the user again in the intuitive way. Lastly, two priors are available for the error variance. These are the default half-normal and the inverse Gamma. For instance, sigmaPrior = "HN(5)" defines |σ| ∼ N (0, 5) as the prior while sigmaPrior = "IG(1.1,1.1)" defines σ 2 ∼ IG(1.1, 1.1) as the prior. Function mvrm2mcmc reads in posterior samples from the files that the call to function mvrm generated and creates an object of class 'mcmc'. Hence, for summarizing posterior distributions and for assessing convergence, functions summary and plot from package coda (Plummer et al., 2006) can be used. As an example, here we read in the samples from the posterior of β R> beta <-mvrm2mcmc(m1, "beta") and summarize the posterior using summary. For the sake of economizing space, only the part of the output that describes the posteriors of β 0 , β 1 , β 2 , and β 3 is shown below R> summary(beta) 0.0000 0.0000 0.0000 0.0000 0.0000 sm(u).2 0.0000 0.0000 0.0000 0.0000 0.0000 Further, we may obtain a plot using R> plot(beta) Figure 1 shows the first four of the plots created by function plot. These are the plots of the samples from the posteriors of coefficients β 0 , β 1 , β 2 and β 3 . As we can see from both the summary and Figure 1, only the first two coefficients have posteriors that are not centered around zero.
Whereas the output of function print focuses on marginal inclusion probabilities, the output of function summary focuses on the most frequently visited models. It takes as input an object of class 'mvrm' and the number of (most frequently visited) models to be displayed, which by default is set to nModels = 5. Here to economize space we set nModels = 2. The information returned by functions summary is shown below > summary(m1, nModels = 2) Specified model for the mean and variance: y~sm(u, nknots = 20, bs = "rd") | sm(u, nknots = 20, bs = "rd") Specified priors: [1] c.beta = IG(0.5,0.5*n) c.alpha = IG(1. Firstly, the function provides the specified mean and variance models and the specified priors. This is followed by information about the MCMC chain and the directory where files have been stored. In addition, the function provides the null and the mean posterior deviance. Finally, the function provides the specification of the joint mean/variance models that were visited most often during MCMC sampling. This specification is in terms of a vector of indicators, each consisting of zeros and ones that show which terms are in the mean and variance model. To make clear which terms pertain to the mean and which to the Variables function, we have preceded the names of the models terms by 'mean.' or a 'var.'. In the above output we see that the most visited model specifies a linear mean model (only the linear term in included in the model) while the variance model includes twelve terms. See also Figure 2.
We next describe function plot.mvrm which creates plots of terms in the mean and variance functions. Two calls to function plot can be seen in the code below. Argument x expects an object of class 'mvrm', as created by a call to function mvrm. Argument model may take on one of three possible values: 'mean', 'stdev' or 'both', specifying the model the user is interested in visualizing. Further, argument term determines the term to be plotted. In the current example there is only one term in each of the two models which leaves us with only one choice, term = 'sm(u)'. For creating two-dimensional plots, as in the current example, function plot utilizes package ggplot2 (Wickham, 2009). Users of BNSP may add their own options to plots via argument plotOptions. The code below serves as an example.
The plot in Figure 2 panel (b) displays the posterior mean of the standard deviation function σ(u) = σ exp{ q 2 j=1 α j φ 2j (u)/2}. The details are the same as for the plot of the mean function, so here we briefly mention a difference. For the plot of the standard deviation function, option intercept = TRUE specifies that σ is included in the calculation, and again it may be removed by setting intercept = FALSE, which will result in plots of σ(u) * = exp{ q 2 j=1 α j φ 2j (u)/2}. Results from examples 2.-4. are displayed in Figure 3. The R code for obtaining these results is precisely the same as the code that generated the results in Figure 2, and hence omitted.
We conclude the current section by describing the function predict.mvrm. The function provides predictions and posterior credible or prediction intervals for given feature vectors. The two types of intervals differ in the associated level of uncertainty: prediction intervals attempt to capture a future response and are usually much wider than credible intervals that attempt to capture a mean response.
The following code shows how credible and prediction intervals can be obtained for a sequence of 30 covariate values x1 <-seq(0, 1, length.out = 30) p1 <-predict(m1, newdata = data.frame(u = x1), interval = "credible") p2 <-predict(m1, newdata = data.frame(u = x1), interval = "prediction") where the first argument in function predict is a fitted mvrm model, the second one is a data frame containing the feature vectors at which predictions are to be obtained and the last one defines the type of interval to be created. We applied the predict function using the data that were obtained from the first two data-generating mechanisms introduced earlier. To those two datasets we fitted two models: the first one is the one we showed earlier, were both the mean and variance are modeled in terms of covariates while the second one ignores the dependence of the variance on the covariate and it is specified in R using R> model <-y~sm(u, nknots = 20, bs = "rd") | 1 Results from are displayed in Figure 4. Each of the two figures displays a credible interval and two prediction intervals. The figure emphasizes a point that was discussed in the introductory section, namely that modeling the variance in terms of covariates can result in more realistic prediction intervals. The same point was recently discussed by Mayr et al. (2012).

Bivariate covariate case
Interactions between two predictors can be modeled by appropriate specification of the sm function. The function can take up to two covariates, both of which may be continuous or one continuous and one discrete. Interactions between discrete covariates may also be handled by function mvrm, but this is done in the usual way, without a call to function sm, as it is done, for instance, for fitting general linear models with function lm. Here we consider two examples, one involving interactions between two continuous covariates and one involving interactions between a continuous and a discrete covariate.
We conclude this section by presenting plots of the four terms in the mean and variance models. The plots are presented in Figure 7. We provide a few details on how function plot works in the presence of multiple terms, and how the comparison between true and estimated effects is made. Starting with the mean function, to create the relevant plots, that appear on the left panels of Figure 7) function plot considers only the part of the mean function µ(u) that is related to the chosen term while leaving all other terms out. For instance, in the code below we choose term = 'sm(u1)' and hence we plot the posterior mean and a posterior credible interval for 21 j=1 β 1j φ 1j (u 1 ), where the intercept β 0 is left out by option intercept = FALSE. Further, comparison is made with a centered version of the true curve, represented by the dashed (red color) line and obtained by the first three lines of code below.

Data analyses
In this section we present two empirical applications. The first one represents the simplest applied scenario, where one observes a continuous response and a single continuous covariate. The second one involves multiple covariates, including both continuous and discrete. The datasets that we utilize in these applications can be found in the R package np. The authors of the R package np also present results from the analysis of these datasets utilizing kernel-based estimators (Hayfield & Racine, 2008). The results concerning the mean function that we present below agree with those presented by Hayfield & Racine (2008). Our contribution here is to model the variance function too, in addition to the mean function. The descriptions of the datasets can be found in Hayfield & Racine (2008), however, for the sake of completeness, here we provide a brief description too.

Univariate case
In the first empirical application, we analyze a dataset from Pagan & Ullah (1999). The dataset consists of n = 205 observations on dependent variable logwage, the logarithm of the individual's wage, and covariate age, the individual's age. The dataset comes from the 1971 Census of Canada Public Use Sample Tapes and the sampling units it involves are males of common education. Hence, the investigation of the relationship between age and the logarithm of wage is carried out controlling for the two potentially important covariates education and gender. We utilize the following R code to specify flexible models for the mean and variance functions, and to obtain 5, 000 posterior samples, after a burn-in period of 25, 000 samples and a thinning period of 5.
R> data(cps71) R> sweeps <-50000; burn <-25000; thin <-5; seed <-1 R> DIR <-getwd() R> model <-logwage~sm(age, nknots = 30, bs = "rd") | sm(age, nknots = 30, bs = "rd") R> m4 <-mvrm(formula = model, data = cps71, sweeps = sweeps, burn = burn, thin = thin, seed = seed, StorageDir = DIR) After checking convergence, we use the following code to create the plots that appear in Figure 8. wagePlotOptions <-list(geom_point(data = cps71, aes(x = age, y = logwage))) plot(x = m4, model = "mean", term = "sm(age)", plotOptions = wagePlotOptions) plot(x = m4, model = "stdev", term = "sm(age)") Figure 8 (a) shows the posterior mean and an 80% credible interval for the mean function and it suggests a quadratic relationship between age and logwage. Figure 8 (b) shows the posterior mean and an 80% credible interval for the standard deviation function. It suggest a complex relationship between age and the variability in logwage. The relationship suggested by Figure 8 (b) is also suggested by the spread of the data-points around the estimated mean in Figure 8 (a). At ages around 20 years the variability in logwage is high. It then reduces until about age 30, to start increasing again until about age 45. From age 45 to 60 it remains stable but high, while for ages above 60, Figure 8 (b) suggests further increase in the variability, but the wide credible interval suggests high uncertainty over this age range.

General case
In the second empirical application, we analyze a dataset from Wooldridge (2003). The response variable here is the logarithm of the individual's hourly wage (lwage) while the covariates include the years of education (educ), the years of experience (exper), the years with the current employer (tenure), the individual's gender (named as female within the dataset, with levels Female and Male), and marital status (named as married with levels Married and Notmarried). The dataset consists of n = 526 independent observations. We analyze the first three covariates as continuous and the last two as discrete.
As the variance function is modeled in terms of an exponential, see (1), to avoid potential numerical errors, before we attempt to fit the described model to the data, we transform the three continuous variables to have range in the interval [0,1]. This is achieved with the R code below R> data(wage1) R> wage1$ntenure <-wage1$tenure / max(wage1$tenure) R> wage1$nexper <-wage1$exper / max(wage1$exper) R> wage1$neduc <-wage1$educ / max(wage1$educ) We choose to fit the following mean and variance models to the data We note that, as it turns out, an interaction between variables nexper and female is not necessary for the current data analysis. However, we choose to add this term in the mean model in order to illustrate how interaction terms can be specified. We illustrate further options below.
R> DIR <-getwd() R> m5 <-mvrm(formula = model, data = wage1, sweeps = sweeps, + burn = burn, thin = thin, seed = seed, StorageDir = DIR)) After summarizing results and checking convergence, we create plots of posterior means, along with 95% credible intervals, for functions f 1 , . . . , f 4 . These are displayed in Figure 9. As it turns out, variable married does not have an effect on the mean of lwage. For this reason, we do not provide further results on the posterior of the coefficient of covariate married, β 1 . However, in the code below we show how graphical summaries on β 1 can be obtained, if needed.

Appendix: MCMC algorithm
In this section we present the technical details of how the MCMC algorithm is designed for the case where there is a single covariate in the mean and variance models. We first note that to improve mixing of the sampler, we integrate out vector β from the likelihood of y, as was done by Chan et al. (2006) f (y|α, c β , γ, δ, σ 2 ) ∝ |σ 2 D 2 (α δ )| − 1 2 (c β + 1) − N (γ)+1 where, withỹ = D −1 (α δ )y, we have S = S(y, α, c β , γ, δ) =ỹ ỹ − c β 1+c βỹ X γ (X γX γ ) −1X γỹ . The six steps of the MCMC sampler are as follows 1. Similar to Chan et al. (2006), the elements of γ are updated in blocks of randomly chosen elements. The block size is chosen based on probabilities that can be supplied by the user or be left at their default values. Let γ B be a block of random size of randomly chosen elements from γ. The proposed value for γ B is obtained from its prior with the remaining elements of γ, denoted by γ B C , kept at their current value. The proposal pmf is obtained from the Bernoulli prior with π µ integrated out where L(γ B ) denotes the length of γ B i.e. the size of the block. For this proposal pmf, the acceptance probability of the Metropolis-Hastings move reduces to the ratio of the likelihoods in (10) min    1, (c β + 1) − N (γ P )+1 2 exp(−S P /2σ 2 ) where superscripts P and C denote proposed and currents values respectively.
2. Vectors α and δ are updated simultaneously. Similarly to the updating of γ, the elements of δ are updated in random order in blocks of random size. Let δ B denote a block. Blocks δ B and the whole vector α are generated simultaneously. As was mentioned by Chan et al. (2006), generating the whole vector α, instead of subvector α B , is necessary in order to make the proposed value of α consistent with the proposed value of δ.
Generating the proposed value for δ B is done in a similar way as was done for γ B in the previous step. Let δ P denote the proposed value of δ. Next, we describe how the proposed vale for α δ P is obtained. The development is in the spirit of Chan et al. (2006) who built on the work of Gamerman (1997).
Letβ C γ = {c β /(1 + c β )}(X γX γ ) −1X γỹ denote the current value of the posterior mean of β γ . Define the current squared residuals e C i = (y i − (x * iγ ) β C γ ) 2 , i = 1, . . . , n. These will have an approximate σ 2 i χ 2 1 distribution, where σ 2 i = σ 2 exp(z i α). The latter defines a Gamma generalized linear model (GLM) for the squared residuals with mean E(σ 2 i χ 2 1 ) = σ 2 i = σ 2 exp(z i α), which, utilizing a log-link, can be thought of as Gamma GLM with an offset term: log(σ 2 i ) = log(σ 2 ) + z i α. Given the proposed value of δ, denoted by δ P , the proposal density for α P δ P is derived utilizing the one step iteratively reweighted least squares algorithm. This proceeds as follows. First define the transformed observations where superscript C denotes current values. Further, let d C denote the vector of d C i . Next we define ∆(δ P ) = (c −1 α I + Z δ P Z δ P ) −1 andα(δ P , α C ) = ∆ δ P Z δ P d C , where Z is the design matrix. The proposed value α P δ P is obtained from a multivariate normal distribution with meanα(δ P , α C ) and covariance h∆(δ P ), denoted as N (α P δ P ;α(δ P , α C ), h∆(δ P )), where h is a free parameter that we introduce and select its value adaptively (Roberts & Rosenthal, 2009) in order to achieve an acceptance probability of 20% − 25% (Roberts & Rosenthal, 2001).
Let N (α C δ C ;α(δ C , α P ), h∆(δ C )) denote the proposal density for taking a step in the reverse direction, from model δ P to δ C . Then the acceptance probability of the pair (δ P , α P δ P ) is min    1, |D 2 (α P δ P )| − 1 2 exp(−S P /2σ 2 ) We note that the determinants that appear in the above ratio are equal to one when utilizing centered explanatory variables in the variance model and hence can be left out of the calculation of the acceptance probability.

Summary
We have presented a tutorial on several functions from the R package BNSP. These functions are used for specifying, fitting and summarizing results from regression models with Gaussian errors and with mean and variance functions that can be modeled flexibly in terms of covariates. Function sm is utilized to specify smooth terms in the mean and variance functions of the model. Function mvrm calls an MCMC algorithm that obtains samples from the posteriors of the model parameters. Samples are converted into an object of class 'mcmc' by function mvrm2mcmc which facilitates the use of multiple functions from package coda. Functions print.mvrm and summary.mvrm provide summaries of fitted 'mvrm' objetcs. Further, function plot.mvrm provides graphical summaries of parametric and nonparametric terms that enter the mean or variance function. Lastly, function predict.mvrm provides predictions for a future response or a mean response along with the corresponding prediction/credible intervals.