RobustGaSP: Robust Gaussian Stochastic Process Emulation in R

Gaussian stochastic process (GaSP) emulation is a powerful tool for approximating computationally intensive computer models. However, estimation of parameters in the GaSP emulator is a challenging task. No closed-form estimator is available and many numerical problems arise with standard estimates, e.g., the maximum likelihood estimator (MLE). In this package, we implement a marginal posterior mode estimator, for special priors and parameterizations, an estimation method that meets the robust parameter estimation criteria discussed in \cite{Gu2016thesis,Gu2016robustness}; mathematical reasons are provided therein to explain why robust parameter estimation can greatly improve predictive performance of the emulator. The package also allows inert inputs (inputs that have almost no effect on the variability of a function) to be identified from the marginal posterior mode estimation, at no extra computational cost. The package can be operated in a default mode, but also allows numerous user specifications, such as the capability of specifying trend functions and noise terms. Examples are studied herein to highlight the performance of the package in terms of out-of-sample prediction.


Introduction
A GaSP emulator is a fast surrogate model used to approximate the outcomes of a computer model (Sacks, Welch, Mitchell, and Wynn (1989); Bayarri, Berger, Paulo, Sacks, Cafeo, Cavendish, Lin, and Tu (2007); Paulo, García-Donato, and Palomo (2012)). The prediction accuracy of the emulator often depends strongly on the quality of the parameter estimates in the GaSP model. Although the mean and variance parameters in the GaSP model are relatively easy to deal with, estimation of parameters in the correlation functions is difficult (Kennedy and O'Hagan (2001)). Standard methods of estimating these parameters, such as maximum likelihood estimation (MLE), often produce unstable results leading to inferior prediction. It has been shown that the GaSP emulator is unstable when the correlation between any two different inputs is estimated to be close to one or to zero. The former case causes a near singularity when inverting the covariance matrix (this can partially be addressed by adding a small nugget (Andrianakis and Challenor (2012))), while the latter problem happens more often and has no easy fix.
There are several packages on the Comprehensive R Archive Network (CRAN, https://CRAN.R-project.org/) which implement the GaSP model based on the MLE, including DiceKriging (Roustant, Ginsbourger, and Deville (2012)), GPfit (MacDonald, Ranjan, and Chipman (2015)), mlegp (Dancik (2013)), spatial (Venables and Ripley (2002)), and fields (Nychka, Furrer, and Sain (2016)). In these packages, bounds on the parameters in the correlation function are typically implemented to overcome the numerical problems with the MLE estimates. Predictions are, however, often quite sensitive to the choice of bound, which is essentially arbitrary, so this is not an appealing fix to the numerical problems.

arXiv:1801.01874v2 [stat.CO] 11 Sep 2018
In Gu (2016), marginal posterior modes based on several objective priors are studied. It has been found that certain parameterizations result in more robust estimators than others, and, more importantly, that some parameterizations which are in common use should clearly be avoided. Marginal posterior modes with the robust parameterization are mathematically stable, as the posterior density is shown to be zero at the two problematic cases, namely when the correlation is nearly equal to one or to zero. This motivates the RobustGaSP package; examples also indicate that the package results in more accurate out-of-sample predictions than previous packages based on the MLE. We use the DiceKriging package in these comparisons, because it is a state-of-the-art implementation of the MLE methodology.
The RobustGaSP package (Gu, Palomo, and Berger (2016)) for R builds a GaSP emulator with robust parameter estimation. It provides a default method with regard to a specific correlation function, a mean/trend function and an objective prior for the parameters. Users are allowed to specify them, e.g., by using a different correlation and/or trend function, another prior distribution, or by adding a noise term with either a fixed or estimated variance. Although the main purpose of the RobustGaSP package is to do emulation/approximation of a complex function, this package can also be used in fitting the GaSP model for other purposes, such as nonparametric regression, modeling spatial data and so on. For computational purposes, most of the time-consuming functions in the RobustGaSP package are implemented in C++.
We highlight several contributions of this work. First of all, computing the derivative of the reference prior with a robust parameterization is computationally expensive; however, this information is needed to find the posterior mode by the low-storage quasi-Newton optimization method (Nocedal (1980)). We introduce a robust and computationally efficient prior, called the jointly robust prior (Gu (2018)), to approximate the reference prior in the tail rates of the posterior. This has been implemented as a default setting in the RobustGaSP package.
Furthermore, the use of the jointly robust prior provides a natural shrinkage for sparsity and thus can be used to identify inert/noisy inputs (if there are any), implemented in the findInertInputs function in the RobustGaSP package. A formal approach to Bayesian model selection requires comparing $2^p$ models for $p$ variables, whereas in the RobustGaSP package, only the posterior mode of the full model has to be computed. Eliminating mostly inert inputs in a computer model is similar to not including regression coefficients that have a weak effect, since the noise introduced in their estimation degrades prediction. However, as the inputs have a nonlinear effect on the output, variable selection in GaSP is typically much harder than in linear regression. The findInertInputs function in the RobustGaSP package can be used, as a fast pre-experimental check, to separate the influential inputs and inert inputs in highly nonlinear computer model outputs.
the standardized residuals are implemented and will be introduced in this work.
The rest of the paper is organized as follows. In the next section, we briefly review the statistical methodology of the GaSP emulator and the robust posterior mode estimation. In Section 3, we describe the structure of the package and highlight the main functions implemented in this package. In Section 4, several numerical examples are provided to illustrate the behavior of the package under different scenarios. In Section 5, we present conclusions and briefly discuss potential extensions. Examples will be provided throughout the paper for illustrative purposes.
2. The statistical framework

2.1. GaSP emulator
Prior to introducing specific functions and usage of the RobustGaSP package, we first review the statistical formulation of the GaSP emulator. Let $x \in \mathcal{X}$ denote a $p$-dimensional vector of inputs for the computer model, and let $y(x)$ denote the resulting simulator output, which is assumed to be real-valued in this section. The simulator $y(\cdot)$ is viewed as an unknown function modeled by the stationary GaSP model, meaning that for any inputs $\{x_1, \ldots, x_n\}$ from $\mathcal{X}$, the likelihood is a multivariate normal distribution,
$$(y(x_1), \ldots, y(x_n))^\top \mid \mu, \sigma^2, R \sim \mathcal{MN}\big((\mu(x_1), \ldots, \mu(x_n))^\top, \sigma^2 R\big); \qquad (1)$$
here $\mu(\cdot)$ is the mean function, $\sigma^2$ is the unknown variance parameter and $R$ is the correlation matrix. The mean function is typically modeled via regression,
$$\mu(x) = h(x)\theta = \sum_{t=1}^{q} h_t(x)\theta_t, \qquad (2)$$
where $h(x) = (h_1(x), h_2(x), \ldots, h_q(x))$ is a vector of specified mean basis functions and $\theta_t$ is the unknown regression parameter for basis function $h_t(\cdot)$. In the default setting of the RobustGaSP package, a constant basis function is used, i.e., $h(x) = 1$; alternatively, a general mean structure can be specified by the user (see Section 3 for details).
The $(i, j)$ element of $R$ in (1) is modeled through a correlation function $c(x_i, x_j)$. The product correlation function is often assumed in the emulation of computer models (Santner, Williams, and Notz (2003)),
$$c(x_i, x_j) = \prod_{l=1}^{p} c_l(x_{il}, x_{jl}), \qquad (3)$$
where $c_l(\cdot, \cdot)$ is a one-dimensional correlation function for the $l$-th coordinate of the input vector. Some frequently chosen correlation functions are implemented in the RobustGaSP package, listed in Table 1. In order to use the power exponential correlation function, one needs to specify the roughness parameter $\alpha_l$, which is often set to be close to 2; e.g., $\alpha_l = 1.9$ is advocated in Bayarri, Berger, Calder, Dalbey, Lunagomez, Patra, Pitman, Spiller, and Wolpert (2009), which maintains an adequate smoothness level yet avoids the numerical problems with $\alpha_l = 2$.
The Matérn correlation is commonly used in modeling spatial data (Stein (2012)) and has recently been advocated for computer model emulation; one benefit is that the roughness parameter of the Matérn correlation directly controls the smoothness of the process. For example, the Matérn correlation with $\alpha_l = 5/2$ results in sample paths of the GaSP that are twice differentiable, a smoothness level that is usually desirable. Obtaining this smoothness with the more common squared exponential correlation comes at a price, however, as, for large distances, the correlation drops quickly to zero. For the Matérn correlation with $\alpha_l = 5/2$, the natural logarithm of the correlation only decreases linearly with distance, a feature which is much better for emulation of computer models. For these reasons, the Matérn correlation with $\alpha_l = 5/2$ is the default correlation function in RobustGaSP. It is also the default correlation function in some other packages, such as DiceKriging (Roustant et al. (2012)).

Table 1: Correlation functions currently implemented in RobustGaSP. $\gamma$ is the range parameter and $d$ is the distance between two points in each dimension. For the power exponential correlation, $0 < \alpha \le 2$. For simplicity, the subscript $l$ in Equation (3) has been dropped.
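As a minimal sketch of the correlation functions discussed above, the base R code below writes out the Matérn ($\alpha = 5/2$) and power exponential correlations in their standard textbook forms and combines them across dimensions with the product rule of Equation (3). The function names `matern_5_2`, `pow_exp` and `product_corr` are hypothetical and chosen for illustration; they are not the package's internal C++ routines.

```r
# Standard one-dimensional correlation functions; d is a distance, gamma a range.
matern_5_2 <- function(d, gamma) {
  z <- sqrt(5) * abs(d) / gamma
  (1 + z + z^2 / 3) * exp(-z)
}

pow_exp <- function(d, gamma, alpha = 1.9) {
  stopifnot(alpha > 0, alpha <= 2)
  exp(-(abs(d) / gamma)^alpha)
}

# Product correlation between the rows of X1 and the rows of X2:
# multiply the one-dimensional correlations over the p coordinates.
product_corr <- function(X1, X2, gamma, corr = matern_5_2) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  R <- matrix(1, nrow(X1), nrow(X2))
  for (l in seq_len(ncol(X1))) {
    D <- abs(outer(X1[, l], X2[, l], "-"))
    R <- R * corr(D, gamma[l])
  }
  R
}

# Toy design with n = 3 points in p = 2 dimensions (illustrative values only)
X <- matrix(c(0.1, 0.4, 0.9, 0.2, 0.8, 0.5), ncol = 2)
R <- product_corr(X, X, gamma = c(0.5, 1))
```

Comparing `matern_5_2(3, 1)` with `pow_exp(3, 1, alpha = 2)` illustrates the tail behaviour described in the text: at large distances the squared exponential correlation is much closer to zero than the Matérn correlation.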
Since the simulator is expensive to run, we will at most be able to evaluate $y(x)$ at a set of design points. Denote the chosen design inputs as $x^D = \{x^D_1, x^D_2, \ldots, x^D_n\}$. The resulting outcomes of the simulator are denoted as $y^D = (y^D_1, y^D_2, \ldots, y^D_n)^\top$. The design points are usually chosen to be "space-filling"; examples include the uniform design and lattice designs. The Latin hypercube (LH) design is a "space-filling" design that is widely used. It is defined in a rectangle whereby each sample is the only one in each axis-aligned hyperplane containing it. LH sampling for a 1-dimensional input space is equivalent to stratified sampling, and an estimator based on stratified sampling has smaller variance than one based on simple random sampling (Santner et al. (2003)); for a multi-dimensional input space, the projection of the LH samples on each dimension spreads out more evenly compared to simple stratified sampling. The LH design is also often used along with other constraints, e.g., the maximin Latin hypercube maximizes the minimum Euclidean distance among the LH samples. It has been shown that the GaSP emulator based on maximin LH samples has a clear advantage compared to the uniform design in terms of prediction (see, e.g., Chen, Loeppky, Sacks, and Welch (2016)). For these reasons, we recommend the use of the LH design, rather than the uniform design or lattice designs.
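To make the LH construction concrete, here is a small base R sketch: a random Latin hypercube on $[0, 1]^p$ plus a crude maximin search that keeps the best of several random LH designs. This is a simple stand-in, not the algorithm used by the lhs package; the names `random_lhs`, `min_dist` and `maximin_lhs_search`, and the number of tries, are assumptions made for illustration.

```r
# Random Latin hypercube on [0, 1]^p: each column places exactly one point
# in each of the n equal-width strata.
random_lhs <- function(n, p) {
  X <- matrix(0, n, p)
  for (l in seq_len(p)) {
    X[, l] <- (sample(n) - runif(n)) / n
  }
  X
}

# Smallest pairwise Euclidean distance of a design
min_dist <- function(X) min(dist(X))

# Crude maximin search (illustrative): keep the random LH design whose
# minimum pairwise distance is largest.
maximin_lhs_search <- function(n, p, n_try = 50) {
  best <- random_lhs(n, p)
  for (k in seq_len(n_try - 1)) {
    cand <- random_lhs(n, p)
    if (min_dist(cand) > min_dist(best)) best <- cand
  }
  best
}

set.seed(1)
design <- maximin_lhs_search(n = 15, p = 2)
```

Projecting `design` on either axis gives one point per stratum, which is the property that distinguishes the LH design from plain uniform sampling.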

Robust parameter estimation
The parameters in a GaSP emulator include mean parameters, a variance parameter and range parameters, denoted as $(\theta_1, \ldots, \theta_q, \sigma^2, \gamma_1, \ldots, \gamma_p)$. The objective prior implemented in the RobustGaSP package has the form
$$\pi(\theta, \sigma^2, \gamma) \propto \frac{\pi(\gamma)}{\sigma^2}, \qquad (4)$$
where $\pi(\gamma)$ is an objective prior for the range parameters. After integrating out $(\theta, \sigma^2)$ by the prior in (4), the marginal likelihood is
$$\mathcal{L}(y^D \mid \gamma) \propto |R|^{-1/2}\, |h^\top(x^D) R^{-1} h(x^D)|^{-1/2}\, (S^2)^{-(n-q)/2}, \qquad (5)$$
where $S^2 = (y^D)^\top Q\, y^D$ with $Q = R^{-1}P$ and $P = I_n - h(x^D)\{h^\top(x^D) R^{-1} h(x^D)\}^{-1} h^\top(x^D) R^{-1}$, with $I_n$ being the identity matrix of size $n$.

Table 2: Different priors for the parameters in the correlation function implemented in RobustGaSP.
  Prior                 Form
  $\pi^R(\gamma)$       $|I^*(\gamma)|^{1/2}$
  $\pi^R(\xi)$          $|I^*(\xi)|^{1/2}$, with $\xi_l = \log(1/\gamma_l)$, for $l = 1, \ldots, p$
  $\pi^{JR}(\beta)$     $(\sum_{l=1}^{p} C_l \beta_l)^{a} \exp(-b \sum_{l=1}^{p} C_l \beta_l)$, with $\beta_l = 1/\gamma_l$, for $l = 1, \ldots, p$
Here $I^*(\cdot)$ is the expected Fisher information matrix, after integrating out $(\theta, \sigma^2)$. The default choice of the prior parameters in $\pi^{JR}(\beta)$ is $a = 0.2$, $b = n^{-1/p}(a + p)$, and $C_l$ equal to the mean of $|x^D_{il} - x^D_{jl}|$, for $1 \le i, j \le n$, $i \ne j$.
The reference prior $\pi^R(\cdot)$ and the jointly robust prior $\pi^{JR}(\cdot)$ for the range parameters with robust parameterizations implemented in the RobustGaSP package are listed in Table 2. The package incorporates, by default, the jointly robust prior with the prior parameters $(C_1, \ldots, C_p, a, b)$ (whose values are given in Table 2). The jointly robust prior is used mainly because its derivatives are easy to compute, while the derivatives of the reference prior are computationally hard. The jointly robust prior also approximates the reference prior reasonably well with the specified prior parameters (Gu (2018)). In addition, the posterior modes of the jointly robust prior can usually identify the inert inputs, as discussed in Section 2.4. Users can also specify a different prior distribution if they wish.
When another parameterization is used, parameters are first estimated by the posterior mode and then transformed back to obtain $(\hat\gamma_1, \ldots, \hat\gamma_p)$.
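The objective that the posterior mode maximizes can be sketched in base R as follows. The code assumes a constant mean, the Matérn correlation with roughness 5/2, the inverse range parameterization $\beta_l = 1/\gamma_l$, and the default jointly robust prior parameters described with Table 2; the marginal likelihood is taken in its standard form after integrating out $(\theta, \sigma^2)$, up to an additive constant. All function names are hypothetical, and the package itself evaluates these quantities in C++.

```r
# Matern 5/2 product correlation matrix for design X (n x p), inverse ranges beta
corr_matrix <- function(X, beta) {
  R <- matrix(1, nrow(X), nrow(X))
  for (l in seq_len(ncol(X))) {
    z <- sqrt(5) * abs(outer(X[, l], X[, l], "-")) * beta[l]
    R <- R * (1 + z + z^2 / 3) * exp(-z)
  }
  R
}

# Log marginal likelihood, up to a constant, with mean basis matrix H (n x q)
log_marg_lik <- function(beta, X, y, H = matrix(1, nrow(X), 1)) {
  n <- nrow(X)
  q <- ncol(H)
  R <- corr_matrix(X, beta)
  Rinv_H <- solve(R, H)
  Rinv_y <- solve(R, y)
  HtRinvH <- crossprod(H, Rinv_H)
  HtRinvy <- crossprod(H, Rinv_y)
  # S^2 = y' R^{-1} y - y' R^{-1} H (H' R^{-1} H)^{-1} H' R^{-1} y
  S2 <- sum(y * Rinv_y) - sum(HtRinvy * solve(HtRinvH, HtRinvy))
  ldet_R <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
  ldet_H <- as.numeric(determinant(HtRinvH, logarithm = TRUE)$modulus)
  -0.5 * ldet_R - 0.5 * ldet_H - (n - q) / 2 * log(S2)
}

# Log jointly robust prior with a = 0.2, b = n^(-1/p) (a + p), and
# C_l = mean absolute pairwise difference of the l-th design coordinate
log_jr_prior <- function(beta, X, a = 0.2) {
  n <- nrow(X)
  p <- ncol(X)
  b <- n^(-1 / p) * (a + p)
  C <- apply(X, 2, function(col) mean(dist(col)))
  s <- sum(C * beta)
  a * log(s) - b * s
}

log_post <- function(beta, X, y) log_marg_lik(beta, X, y) + log_jr_prior(beta, X)

# Illustrative data: 15 equally spaced points on [0, 10]
X <- as.matrix(seq(0, 10, length.out = 15))
y <- sin(2 * pi * X / 10) + 0.2 * sin(2 * pi * X / 2.5)
vals <- sapply(c(0.3, 0.57, 3), function(b) log_post(b, X, y))
```

Evaluating `log_post` on a grid of `beta` values is a quick way to see where the mode lies before handing the function to an optimizer.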

Prediction
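After obtaining $\hat\gamma$, the predictive distribution of the GaSP emulator (after marginalizing $(\theta, \sigma^2)$ out) at a new input point $x^*$ follows a Student t distribution with $n - q$ degrees of freedom,
$$y(x^*) \mid y^D, \hat\gamma \sim t\big(\hat y(x^*), \hat\sigma^2 c^{**}, n - q\big), \qquad (7)$$
where
$$\hat y(x^*) = h(x^*)\hat\theta + r^\top(x^*) R^{-1}\big(y^D - h(x^D)\hat\theta\big),$$
$$\hat\sigma^2 = (n - q)^{-1}\big(y^D - h(x^D)\hat\theta\big)^\top R^{-1}\big(y^D - h(x^D)\hat\theta\big),$$
$$c^{**} = c(x^*, x^*) - r^\top(x^*) R^{-1} r(x^*) + \big(h^\top(x^*) - h^\top(x^D) R^{-1} r(x^*)\big)^\top \big(h^\top(x^D) R^{-1} h(x^D)\big)^{-1} \big(h^\top(x^*) - h^\top(x^D) R^{-1} r(x^*)\big),$$
with $\hat\theta = \big(h^\top(x^D) R^{-1} h(x^D)\big)^{-1} h^\top(x^D) R^{-1} y^D$ being the generalized least squares estimator for $\theta$ and $r(x^*) = (c(x^*, x^D_1), \ldots, c(x^*, x^D_n))^\top$. The emulator interpolates the simulator at the design points, because $r^\top(x^D_i) R^{-1} = e_i^\top$, where $e_i$ is the $n$-dimensional vector with the $i$-th entry being 1 and the others being 0. At other inputs, the emulator not only provides a prediction of the simulator (i.e., $\hat y(x^*)$) but also an assessment of prediction accuracy. It also incorporates the uncertainty arising from estimating $\theta$ and $\sigma^2$, since this was developed from a Bayesian perspective.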
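The interpolation property can be checked numerically. The base R sketch below computes the generalized least squares estimate $\hat\theta$ and the predictive mean for a constant mean and a Matérn 5/2 correlation, assuming the usual form of the predictive mean (GLS trend plus a correlation-weighted residual correction). The range value `gamma_hat = 2` is a hypothetical stand-in for an estimate, and the function names are illustrative.

```r
matern_5_2 <- function(d, gamma) {
  z <- sqrt(5) * abs(d) / gamma
  (1 + z + z^2 / 3) * exp(-z)
}

# Design, outputs, and a hypothetical estimated range parameter
X <- seq(0, 10, length.out = 8)
y <- sin(2 * pi * X / 10) + 0.2 * sin(2 * pi * X / 2.5)
gamma_hat <- 2

R <- matern_5_2(outer(X, X, "-"), gamma_hat)
H <- matrix(1, length(X), 1)              # constant mean basis h(x) = 1

# Generalized least squares estimator of theta
theta_hat <- solve(crossprod(H, solve(R, H)), crossprod(H, solve(R, y)))

# Predictive mean at new inputs x_star
predict_mean <- function(x_star) {
  r <- matern_5_2(outer(X, x_star, "-"), gamma_hat)   # n x m matrix of c(x*, x_i)
  h_star <- matrix(1, length(x_star), 1)
  as.numeric(h_star %*% theta_hat +
               crossprod(r, solve(R, y - H %*% theta_hat)))
}

at_design <- predict_mean(X)                  # reproduces y up to rounding
at_new    <- predict_mean(c(1.3, 4.7, 8.2))
```

At the design points the correction term returns exactly the residual, so `at_design` matches `y` to numerical precision; at other inputs the prediction is a smooth blend of nearby residuals.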
We now provide an example in which the input has one dimension, ranging over $[0, 10]$ (Higdon et al. (2002)). Estimation of the range parameters using the RobustGaSP package can be done through the following code:

Call:
rgasp(design = input, response = output)
Mean parameters:  0.03014553
Variance parameter:  0.5696874
Range parameters:  1.752277
Noise parameter:  0

The first line of the code generates 15 LH samples on $[0, 10]$ through the maximinLHS function of the lhs package (Carnell (2016)). The function higdon.1.data, provided within the RobustGaSP package, has the form $y(x) = \sin(2\pi x/10) + 0.2\sin(2\pi x/2.5)$. The third line fits a GaSP model with the robust parameter estimation by marginal posterior modes.
The plot function in the RobustGaSP package implements leave-one-out cross validation for an object of class rgasp after the GaSP model is built (see Figure 1 for its output):
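The listings for this example did not survive in the text above. The following is a sketch of what they plausibly looked like, reconstructed from the printed `Call:` line and the surrounding description; the seed and the exact line order are assumptions, and the printed numbers will only match the output above if the original seed and package versions are used. The code is guarded so that it is skipped when RobustGaSP or lhs is not installed.

```r
# Sketch only: set.seed(1) is an assumed seed, not taken from the text.
if (requireNamespace("RobustGaSP", quietly = TRUE) &&
    requireNamespace("lhs", quietly = TRUE)) {
  library(RobustGaSP)
  library(lhs)
  set.seed(1)
  input  <- 10 * maximinLHS(n = 15, k = 1)   # 15 maximin LH samples on [0, 10]
  output <- higdon.1.data(input)             # test function shipped with RobustGaSP
  model  <- rgasp(design = input, response = output)
  show(model)                                # prints the fitted parameters

  # Leave-one-out cross validation plot for the fitted rgasp object
  plot(model)
}
```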

Identification of inert inputs
Some inputs have little effect on the output of a computer model. Such inputs are called inert inputs (Linkletter, Bingham, Hengartner, Higdon, and Kenny (2006)). To quantify the influence of a set of inputs on the variability of the outputs, functional analysis of the variance (functional ANOVA) can be used, often implemented through Sobol's Indices (Sobol' (1990); Sobol (2001)). Methods for numerical calculation of Sobol's Indices have been implemented in the sensitivity R package (Pujol, Iooss, with contributions from Khalid Boumhaout, Veiga, Fruth, Gilquin, Guillaume, Le Gratiet, Lemaitre, Ramos, Touati, and Weber (2016)).
The identification of inert inputs through the posterior modes with the jointly robust prior ($\pi^{JR}(\cdot)$) for the range parameters is discussed in Gu (2018). The package discussed here implements this idea, using the estimated normalized inverse range parameters,
$$\hat P_l = \frac{p\, C_l \hat\beta_l}{\sum_{i=1}^{p} C_i \hat\beta_i}, \qquad (9)$$
for $l = 1, \ldots, p$. The involvement of $C_l$ (defined in Table 2) is to account for the different scales of different inputs. The denominator $\sum_{i=1}^{p} C_i \hat\beta_i$ reflects the overall size of the estimator and $C_l \hat\beta_l$ gives the contribution of the $l$-th input. The average of the $\hat P_l$ is 1 and their sum is $p$. When $\hat P_l$ is very close to 0, the $l$-th input might be an inert input. In the RobustGaSP package, the default threshold is 0.1, i.e., when $\hat P_l < 0.1$, the input is suggested to be inert. The threshold can also be specified by users through the argument threshold in the function findInertInputs.
For demonstration purposes, we build a GaSP emulator for the borehole experiment (Worley (1987); Morris, Mitchell, and Ylvisaker (1993); An and Owen (2001)), a well-studied computer experiment benchmark which models water flow through a borehole. The output $y$ is the flow rate through the borehole in $m^3/\text{year}$ and it is determined by the equation
$$y = \frac{2\pi T_u (H_u - H_l)}{\ln(r/r_\omega)\left[1 + \dfrac{2 L T_u}{\ln(r/r_\omega)\, r_\omega^2 K_\omega} + \dfrac{T_u}{T_l}\right]},$$
where $r_\omega$, $r$, $T_u$, $H_u$, $T_l$, $H_l$, $L$ and $K_\omega$ are the 8 inputs constrained in a rectangular domain with the following ranges: $r_\omega \in [0.05, 0.15]$, $r \in [100, 50000]$, $T_u \in [63070, 115600]$, $H_u \in [990, 1110]$, $T_l \in [63.1, 116]$, $H_l \in [700, 820]$, $L \in [1120, 1680]$, $K_\omega \in [9855, 12045]$.
We use 40 maximin LH samples to find inert inputs for the borehole function through the following code.
The estimated normalized inverse range parameters are : 3.440765 8.13156e-09 4.983695e-09 0.844324 4.666519e-09 1.31081 1.903236 0.5008652
The inputs 2 3 5 are suspected to be inert inputs

The function findInertInputs of the RobustGaSP package indicates that the 2nd, 3rd and 5th inputs are suspected to be inert inputs. Figure 4 presents the plots of the borehole function when varying one input at a time. This analyzes the local sensitivity of an input when the others are held fixed. Indeed, the output of the borehole function changes very little when the 2nd, 3rd and 5th inputs vary.
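The listing for this step is not present in the text. As a hedged sketch, the code below first defines the borehole function in its standard benchmark form in base R (the name `borehole_fn` is hypothetical; RobustGaSP ships its own `borehole` function), then, only if RobustGaSP and lhs are installed, draws 40 maximin LH samples, rescales them to the input ranges given above, fits the model without the bound on the range parameters, and calls findInertInputs. The seed is an assumption, so the printed values need not match those in the text.

```r
# Standard form of the borehole benchmark; x = (r_w, r, T_u, H_u, T_l, H_l, L, K_w)
borehole_fn <- function(x) {
  rw <- x[1]; r <- x[2]; Tu <- x[3]; Hu <- x[4]
  Tl <- x[5]; Hl <- x[6]; L  <- x[7]; Kw <- x[8]
  lr <- log(r / rw)
  2 * pi * Tu * (Hu - Hl) / (lr * (1 + 2 * L * Tu / (lr * rw^2 * Kw) + Tu / Tl))
}

# Lower and upper bounds of the 8 inputs, in the order listed in the text
LB <- c(0.05,   100,  63070,  990,  63.1, 700, 1120,  9855)
UB <- c(0.15, 50000, 115600, 1110, 116.0, 820, 1680, 12045)

if (requireNamespace("RobustGaSP", quietly = TRUE) &&
    requireNamespace("lhs", quietly = TRUE)) {
  set.seed(1)                                          # assumed seed
  unit_design <- lhs::maximinLHS(n = 40, k = 8)        # design on [0, 1]^8
  input  <- sweep(sweep(unit_design, 2, UB - LB, "*"), 2, LB, "+")
  output <- as.matrix(apply(input, 1, borehole_fn))
  m <- RobustGaSP::rgasp(design = input, response = output,
                         lower_bound = FALSE)          # no bound, as suggested for inert inputs
  P <- RobustGaSP::findInertInputs(m)                  # default threshold 0.1
}
```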
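The reported values can be checked against the definition of the normalized inverse range parameters. The short base R sketch below verifies that the eight printed numbers sum to $p = 8$ (up to rounding) and that the default threshold of 0.1 flags inputs 2, 3 and 5; the helper `normalized_inverse_range` and its example arguments are hypothetical.

```r
# Values printed by findInertInputs in the text
P_hat <- c(3.440765, 8.13156e-09, 4.983695e-09, 0.844324,
           4.666519e-09, 1.31081, 1.903236, 0.5008652)

p <- length(P_hat)
total <- sum(P_hat)                 # should be close to p = 8
inert <- which(P_hat < 0.1)         # indices below the default threshold

# Definition: P_l = p * C_l * beta_l / sum_i(C_i * beta_i)
normalized_inverse_range <- function(beta_hat, C) {
  length(beta_hat) * C * beta_hat / sum(C * beta_hat)
}

# Hypothetical inverse ranges and scale constants, for illustration only
P_toy <- normalized_inverse_range(beta_hat = c(2, 1e-8, 0.5), C = c(0.3, 100, 1))
```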

Noisy outputs
The ideal situation for a computer model is that it produces noise-free data, meaning that the output will not change at the same input. However, there are several cases in which the outputs are noisy. First of all, the numerical solution of the partial differential equations of a computer model could introduce small errors. Secondly, when only a subset of inputs are analyzed, the computer model is no longer deterministic given only the subset of inputs. For example, if we only use the 5 influential inputs of the Borehole function, the outcomes of this function are no longer deterministic, since the variation of the inert inputs still affects the outputs a little. Moreover, some computer models might be stochastic or have random terms in the models.
For these situations, the common adjustment is to add a noise term to account for the error, such as $\tilde y(\cdot) = y(\cdot) + \epsilon$, where $y(\cdot)$ is the noise-free GaSP and $\epsilon$ is an i.i.d. mean-zero Gaussian white noise (Ren, Sun, and He (2012)). To allow for marginalizing out the variance parameter, the covariance function for the new process $\tilde y(\cdot)$ can be parameterized as follows:
$$\sigma^2 \tilde c(x_l, x_m) = \sigma^2\{c(x_l, x_m) + \eta\,\delta_{lm}\}, \qquad (10)$$
where $\eta$ is defined to be the nugget-variance ratio and $\delta_{lm}$ is an indicator (Kronecker delta) with $\delta_{lm} = 1$ when $l = m$ and $\delta_{lm} = 0$ otherwise. After adding the nugget, the covariance matrix becomes $\sigma^2 \tilde R = \sigma^2 (R + \eta I_n)$.
Although we call η the nugget-variance ratio parameter, the analysis is different than when a nugget is directly added to stabilize the computation in the GaSP model. As pointed out in Roustant et al. (2012), when a nugget is added to stabilize the computation, it is also added to the covariance function in prediction, and, hence, the resulting emulator is still an interpolator, meaning that the prediction will be exact at the design points. However, when a noise term is added, it does not go into the covariance function and the prediction at a design point will not be exact (because of the effect of the noise).
Objective Bayesian analysis for the proposed GaSP model with the noise term can be done by defining the prior
$$\tilde\pi(\theta, \sigma^2, \gamma, \eta) \propto \frac{\tilde\pi(\gamma, \eta)}{\sigma^2},$$
where $\tilde\pi(\gamma, \eta)$ is now the prior for the range and nugget-variance ratio parameters $(\gamma, \eta)$. The reference prior and the jointly robust prior can also be extended to $\tilde\pi^R(\cdot)$ and $\tilde\pi^{JR}(\cdot)$, with the robust parameterizations listed in Table 2. Based on the computational feasibility of the derivatives and the capacity to identify noisy inputs, the proposed default setting is to use the jointly robust prior with the prior parameters specified in Table 2.
After obtaining $\hat\gamma$ and $\hat\eta$, the predictive distribution of the GaSP emulator is almost the same as in Equation (7); simply replace $c(\cdot, \cdot)$ by $\tilde c(\cdot, \cdot)$ and $R$ by $\tilde R$.
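The difference between a noise term and a stabilizing nugget can be seen in a few lines of base R. In the sketch below, both predictors use $\tilde R = R + \eta I_n$ when weighting the residuals; the noise version keeps the cross-correlation vector free of $\eta$, while the nugget version also adds $\eta$ at the design points. The data, `gamma_hat` and `eta` are hypothetical values chosen for illustration, not estimates from the package.

```r
matern_5_2 <- function(d, gamma) {
  z <- sqrt(5) * abs(d) / gamma
  (1 + z + z^2 / 3) * exp(-z)
}

set.seed(1)
X <- seq(0, 10, length.out = 8)
y <- sin(2 * pi * X / 10) + rnorm(length(X), sd = 0.05)   # noisy outputs
gamma_hat <- 2        # hypothetical range parameter
eta <- 0.01           # hypothetical nugget-variance ratio

R       <- matern_5_2(outer(X, X, "-"), gamma_hat)
R_tilde <- R + eta * diag(length(X))
H       <- matrix(1, length(X), 1)

theta_hat <- solve(crossprod(H, solve(R_tilde, H)),
                   crossprod(H, solve(R_tilde, y)))
w <- solve(R_tilde, y - H %*% theta_hat)     # weighted residuals

# Noise term: cross-correlations at the design points are the columns of R,
# so the predictive mean smooths the data instead of interpolating it.
pred_noise  <- as.numeric(H %*% theta_hat + R %*% w)

# Nugget kept in the covariance used for prediction: the columns of R_tilde
# are used, and the predictive mean reproduces the data exactly.
pred_nugget <- as.numeric(H %*% theta_hat + R_tilde %*% w)
```

Here `pred_nugget` equals `y` up to rounding, whereas `pred_noise` differs from `y` by `eta * w`, which is the smoothing effect of the noise term.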
Using only the influential inputs of the Borehole function, we construct the GaSP emulator with a nugget based on 30 maximin LH samples through the following code. To compare the performance of the emulator with and without a noise term, we perform some out-of-sample testing. We build the GaSP emulator by the RobustGaSP package and the DiceKriging package using the same mean and covariance. In RobustGaSP, the parameters in the correlation functions are estimated by marginal posterior modes with the robust parameterization, while in DiceKriging, parameters are estimated by MLE with upper and lower bounds. We first construct these four emulators with the following code.

R> dim_inputs <- dim(input)[2]
R> num_testing_input <- 100

Since the DiceKriging package seems not to have implemented a method to estimate the noise parameter, we only compare it with the nugget case.
The absolute errors of these 4 emulators (all with the same correlation and mean function) at 100 held-out points are shown in Figure 5. The performance of the RobustGaSP package based on the full set of inputs or only influential inputs with a noise is similar, and they are both better than the predictions from the DiceKriging package. This supports our belief that robust parameter estimation by the marginal posterior mode improves the prediction accuracy, compared to use of the MLE with bounds on the range parameters.

Main functions
The main purpose of the RobustGaSP package is to predict a function at unobserved points based on only a limited number of evaluations of the function. The uncertainty associated with the predictions is obtained from the predictive distribution in Equation (7), which is implemented in two steps. The first step is to build a GaSP model through the rgasp function. This function allows users to specify the mean function, correlation function, prior distribution for the parameters and to include a noise term or not. In the default setting, these are all specified. The mean and variance parameters are handled in a fully Bayesian way, and the range parameters in the correlation function are estimated by their marginal posterior modes. Moreover, users can also fix the range parameters, instead of estimating them, change/replace the mean function, add a noise term, etc. The rgasp function returns an object of the rgasp S4 class with all needed estimated parameters, including the mean, variance, noise and range parameters to perform predictions.

Figure 5: Absolute out-of-sample prediction errors at 100 random samples by different emulators of the Borehole function based on $n = 30$ maximin LH samples. The blue squares and green circles are from RobustGaSP, with the full set of inputs and with only influential inputs (and a noise term), respectively; the red triangles and the brown crosses are from DiceKriging, with the full set of inputs and with only influential inputs (and a nugget), respectively.
The second step is to compute the predictive distribution of the previously created GaSP model through the predict function, which produces the predictive means, the 95% predictive credible intervals, and the predictive standard deviations at each test point. As the predictive distribution follows a student t distribution in (7) for any test points, any quantile/percentile of the predictive distribution can be computed analytically. The joint distribution at a set of test points is a multivariate t distribution whose dimension is equal to the number of test points. Users can also sample from the posterior predictive distribution by using the simulate function.
The identification of inert inputs can be performed using the findInertInputs function. As it only depends on the inverse range parameters through Equation (9), there is no extra computational cost in their identification (once the robust GaSP model has been built through the rgasp function). We suggest using the jointly robust prior by setting the argument prior_choice="ref_approx" in the rgasp function before calling the findInertInputs function, because the penalty given by this prior is close to an $L_1$ penalty for the logarithm of the marginal likelihood (with the choice of default prior parameters) and, hence, it can shrink the parameters for those inputs with small effect.

The rgasp function
The rgasp function is the most important function, as it performs the parameter estimation for the GaSP model. In this section, we briefly review the implementation of the rgasp function and its optimization algorithm.
The $n \times p$ design matrix $x^D$ and the $n \times 1$ output vector $y^D$ are the only two required arguments (without default values) in the rgasp function. The default setting in the argument trend is a constant function, i.e., $h(x^D) = 1_n$. One can also set zero.mean="Yes" in the rgasp function to assume the mean function in the GaSP model is zero. By default, the GaSP model is defined to be noise-free, i.e., the noise parameter is 0. However, a noise term can be added with estimated or fixed variance. As the noise is parameterized following the form (10), the variance is marginalized out explicitly and the nugget-variance parameter $\eta$ is left to be estimated. This can be done by specifying the argument nugget.est = T in the rgasp function; when the nugget-variance parameter $\eta$ is known, it can be specified; e.g., $\eta = 0.01$ indicates the nugget-variance ratio is equal to 0.01 in rgasp, and $\eta$ will not be estimated with such a specification.
Two classes of priors of the form (4), with several different robust parameterizations, have been implemented in the RobustGaSP package (see Table 3 for details). The prior that will be used is controlled by the argument prior_choice in the rgasp function. The reference prior $\pi^R(\cdot)$ with the $\gamma$ parameterization (the conventional parameterization of the range parameters for the correlation functions in Table 1) and with the $\xi = \log(1/\gamma)$ parameterization can be specified through the arguments prior_choice="ref_gamma" and prior_choice="ref_xi", respectively. The jointly robust prior $\pi^{JR}(\cdot)$ with the $\beta = 1/\gamma$ parameterization can be specified through the argument prior_choice="ref_approx"; this is the default choice used in rgasp, for the reasons discussed in Section 2.
The correlation functions implemented in the RobustGaSP package are shown in Table 1, with the default setting being kernel_type = "matern_5_2" in the rgasp function. The power exponential correlation function requires the specification of a vector of roughness parameters $\alpha$ through the argument alpha in the rgasp function; the default value is $\alpha_l = 1.9$ for $l = 1, \ldots, p$, as suggested in Bayarri et al. (2009).
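The arguments described in this section can be combined as in the following sketch. The toy data are hypothetical, the argument names are those mentioned in the text (plus `testing_trend` in `predict`, which is assumed here), and defaults or accepted values may differ between package versions, so the package documentation should be checked before relying on any of them. The code is skipped when RobustGaSP is not installed.

```r
if (requireNamespace("RobustGaSP", quietly = TRUE)) {
  library(RobustGaSP)
  set.seed(1)
  n <- 30
  input  <- matrix(runif(n * 2), n, 2)                          # toy design (hypothetical)
  output <- as.matrix(sin(2 * pi * input[, 1]) + input[, 2])    # toy response

  # Default settings written out explicitly: constant trend,
  # Matern 5/2 correlation, jointly robust prior
  m_default <- rgasp(design = input, response = output,
                     trend = matrix(1, n, 1),
                     kernel_type = "matern_5_2",
                     prior_choice = "ref_approx")

  # Power exponential correlation with roughness 1.9 and an estimated
  # nugget-variance ratio
  m_noise <- rgasp(design = input, response = output,
                   kernel_type = "pow_exp", alpha = rep(1.9, 2),
                   nugget.est = TRUE)

  # Linear trend h(x) = (1, x); the same basis is supplied at the test inputs
  m_trend <- rgasp(design = input, response = output,
                   trend = cbind(1, input))
  testing_input <- matrix(runif(10 * 2), 10, 2)
  pred <- predict(m_trend, testing_input,
                  testing_trend = cbind(1, testing_input))
}
```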

The optimization algorithm
Estimation of the range parameters $\gamma$ is implemented through numerical search for the marginal posterior modes in Equation (6). The low-storage quasi-Newton optimization method (Nocedal (1980); Liu and Nocedal (1989)) is used for optimization, through the lbfgs function in the nloptr package (Ypma (2014)). The closed-form marginal likelihood, prior and their derivatives are all coded in C++. The maximum number of iterations and the tolerance can be chosen by users, with the default settings being max_eval=30 and xtol_rel=1e-5, respectively.
Although marginal posterior mode estimation with the robust parameterization eliminates the problems of the correlation matrix being estimated to be either $I_n$ or $1_n 1_n^\top$, the correlation matrix could still be close to these singularities in some scenarios, particularly when the sample size is very large. In such cases, we also utilize an upper bound for the range parameters $\gamma$ (equivalent to a lower bound for $\beta = 1/\gamma$). The derivation of this bound is discussed in the Appendix. This bound is implemented in the rgasp function through the argument lower_bound=T, and this is the default setting in RobustGaSP. As use of the bound is a somewhat ad hoc fix for numerical problems, we encourage users to also try the analysis without the bound; this can be done by specifying lower_bound=F. If the answers are essentially unchanged, one has more confidence that the parameter estimates are satisfactory. Furthermore, if the purpose of the analysis is to detect inert inputs, we also suggest that users specify lower_bound=F in the rgasp function.
Since the marginal posterior distribution could be multi-modal, the package allows for different initial values in the optimization by setting the argument multiple_starts=T in the rgasp function. The first default initial value for each inverse range parameter is set to be 50 times its default lower bound, so the starting value will not be too close to the boundary. The second initial value for each of the inverse range parameters is set to be half of the mean of the jointly robust prior. The two initial values of the nugget-variance parameter are set to be $\eta = 0.0001$ and $\eta = 0.0002$, respectively.
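The multiple-starts strategy can be illustrated with a base R sketch. Note that this swaps in `optim` with `method = "L-BFGS-B"` from base R instead of the lbfgs routine of nloptr used by the package, and it maximizes a toy bimodal function standing in for the log marginal posterior; the function names, starting values and bounds are all hypothetical.

```r
# Toy bimodal objective on the log inverse-range scale (illustrative only):
# a local mode near -2 and the global mode near 1.5.
log_post_toy <- function(log_beta) {
  log(0.4 * dnorm(log_beta, mean = -2,  sd = 0.5) +
      0.6 * dnorm(log_beta, mean = 1.5, sd = 0.5))
}

# Run a bounded quasi-Newton search from each starting value and keep the best
multi_start <- function(fn, starts, lower, upper, max_iter = 30) {
  fits <- lapply(starts, function(s) {
    optim(par = s, fn = function(z) -fn(z), method = "L-BFGS-B",
          lower = lower, upper = upper, control = list(maxit = max_iter))
  })
  values <- sapply(fits, function(f) -f$value)
  best <- which.max(values)
  list(par = fits[[best]]$par, value = values[best], all_values = values)
}

res <- multi_start(log_post_toy, starts = c(-2.5, 2), lower = -5, upper = 5)
```

The search started at -2.5 stops at the local mode, while the one started at 2 reaches the global mode, which is why keeping the best of several starts matters when the posterior is multi-modal.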

Examples
In this section, we present further examples of the performance of the RobustGaSP package, and include comparison with the DiceKriging package in R. We will use the same data, trend function and correlation function for the comparisons. The default correlation function in both packages is the Matérn correlation with $\alpha = 5/2$ and the default trend function is a constant function. The only difference is thus the method of parameter estimation, as discussed in Section 2.

The modified sine wave function
It is expected that, for a one-dimensional function, both packages will perform well with an adequate number of design points, so we start with the function called the modified sine wave discussed in Gu (2016). It has the form $y = 3\sin(5\pi x)\,x + \cos(7\pi x)$, where $x \in [0, 1]$. We first perform emulation based on 12 equally spaced design points on $[0, 1]$.

R> sinewave <- function(x) {
+   3 * sin(5 * pi * x) * x + cos(7 * pi * x)
+ }
R> input <- as.matrix(seq(0, 1, 1/11))
R> output <- sinewave(input)

The GaSP model is fitted by both the RobustGaSP and DiceKriging packages, with the constant mean function.

Figure 6: Emulation of the modified sine wave function with 12 design points equally spaced in $[0, 1]$. The black curve is the graph of the function and the outputs at the design points are the black circles. The blue curve is the predictive mean and the grey region is the 95% posterior credible interval obtained by the RobustGaSP package. The red curve is the predictive mean produced by the DiceKriging package.

A big difference between the two packages is the estimated range parameter, which is found to be around 0.04 in the RobustGaSP package, whereas it is found to be very close to zero in the DiceKriging package. To see which estimate is better, we perform prediction on 100 test points, equally spaced in $[0, 1]$.

R> testing_input <- as.matrix(seq(0, 1, 1/99))
R> m.predict <- predict(m, testing_input)
R> dk.predict <- predict(dk, testing_input, type = 'UK')

The emulation results are plotted in Figure 6. Note that the red curve from the DiceKriging package degenerates to the fitted mean with spikes at the design points. This unsatisfying phenomenon, discussed in Gu et al. (2018), happens when the estimated correlation matrix is close to an identity matrix, i.e., $\hat R \approx I_n$, or equivalently $\hat\gamma$ tends to 0. Repeated runs of the DiceKriging package under different initializations yielded essentially the same results.
The predictive mean from the RobustGaSP package is plotted as the blue curve in Figure 6 and is quite accurate as an estimate of the true function. Note, however, that the uncertainty in this prediction is quite large, as shown by the wide 95% posterior credible regions.
In this example, adding a nugget is not helpful in DiceKriging, as the problem is that $\hat{R} \approx I_n$; adding a nugget is only helpful when the correlation estimate is close to a singular matrix (i.e., $\hat{R} \approx 1_n 1_n^T$). However, increasing the sample size is helpful for the parameter estimation. Indeed, emulation of the modified sine wave function using 13 equally spaced design points in $[0, 1]$ was successful for one run of DiceKriging, as shown in the right panel of Figure 7. However, the left panel in Figure 7 gives another run of DiceKriging for the same data, and this one converged to the problematic $\hat{\gamma} \approx 0$. The predictive mean from RobustGaSP is stable. Interestingly, the uncertainty produced by RobustGaSP decreased markedly with the larger number of design points.
It is somewhat of a surprise that even emulation of a smooth one-dimensional function can be problematical. The difficulties with a multi-dimensional input space can be considerably greater, as indicated in the next example.

The Friedman function
The Friedman function was introduced in Friedman (1991) and is given by

$$y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5,$$

where $x_i \in [0, 1]$ for $i = 1, ..., 5$. Forty design points are drawn from maximin Latin hypercube (LH) samples. A GaSP model is fitted using the RobustGaSP package and the DiceKriging package with the constant mean basis function (i.e., $h(x) = 1$). Prediction on 200 test points, uniformly sampled from $[0, 1]^5$, is then performed.
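For readers who want to reproduce the test set, the Friedman function and the RMSE criterion can be written in a few lines of base R. This is a sketch under stated assumptions: the helper names `friedman` and `rmse` are hypothetical, the seed is arbitrary, and the test inputs are drawn with `runif` as described in the text (the maximin LH design itself is not reproduced here).

```r
# Friedman function; x is either a length-5 vector or an n x 5 matrix.
friedman <- function(x) {
  x <- matrix(x, ncol = 5)
  10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] + 5 * x[, 5]
}

# Root mean squared error between predictions and held-out outputs.
rmse <- function(pred, truth) sqrt(mean((pred - truth)^2))

# 200 test points sampled uniformly from [0, 1]^5 (seed chosen arbitrarily).
set.seed(1)
num_testing_input <- 200
testing_input <- matrix(runif(num_testing_input * 5), ncol = 5)
testing_output <- friedman(testing_input)

# A trivial baseline: predicting the average output everywhere.
rmse(rep(mean(testing_output), num_testing_input), testing_output)
```

Any emulator's predictive mean on `testing_input` can be passed to `rmse` in place of the constant baseline.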
R> testing_output <- matrix(0,num_testing_input,1)
R> for (

Thus the RMSE from RobustGaSP is 0.28, while the RMSE from DiceKriging is 0.89. The predictions versus the real outputs are plotted in Figure 8. The black circles correspond to the predictive means from the RobustGaSP package and are closer to the real output than the red circles produced by the DiceKriging package. Since both packages use the same correlation and mean function, the only difference lies in the method of parameter estimation, especially estimation of the range parameters $\gamma$. The RobustGaSP package seems to do better, leading to much smaller RMSE in out-of-sample prediction.

The Friedman function has a linear trend associated with the 4th and the 5th inputs (but not the first three), so we use this example to illustrate specifying a trend in the GaSP model. For realism (one rarely actually knows the trend for a computer model), we specify a linear trend for all variables; thus we use $h(x) = (1, x)$, where $x = (x_1, ..., x_5)$, and investigate whether or not adding this linear trend to all inputs is helpful for the prediction.
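The mean basis $h(x) = (1, x)$ corresponds to a matrix with an intercept column followed by the five input columns, built once for the design points and once for the test points. The base-R sketch below only constructs these matrices; the sizes follow the text, the uniform draws are a stand-in for the maximin LH design, and the variable names `trend` and `testing_trend` are illustrative. How the matrices are handed to each package (for example through a trend argument of the fitting and prediction functions) should be checked against the package documentation.

```r
# Sizes from the text: 40 design points, 200 test points, 5 inputs.
set.seed(1)
num_obs <- 40
num_testing_input <- 200

# Stand-in inputs (uniform draws, NOT the maximin LH design used in the paper).
input <- matrix(runif(num_obs * 5), ncol = 5)
testing_input <- matrix(runif(num_testing_input * 5), ncol = 5)

# Mean basis h(x) = (1, x): a column of ones followed by the inputs.
trend <- cbind(1, input)
testing_trend <- cbind(1, testing_input)

# The constant-mean case h(x) = 1 is just the first column.
constant_trend <- trend[, 1, drop = FALSE]

dim(trend)
dim(testing_trend)
```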
[1] 0.8468056

Adding a linear trend does improve the out-of-sample prediction accuracy of the RobustGaSP package; the RMSE decreases to 0.13, which is less than half of the RMSE of the previous model with the constant mean. However, the RMSE using the DiceKriging package with a linear mean increases to 0.85, more than 6 times larger than that for RobustGaSP.
(That the RMSE actually increased for DiceKriging is likely due to the additional difficulty of parameter estimation, since now the additional linear trend parameters needed to be estimated; in contrast, for RobustGaSP, the linear trend parameters are effectively eliminated through objective Bayesian integration.) The predictions against the real output are plotted in Figure 9. The black circles correspond to the predictive means from the RobustGaSP package, and are an excellent match to the real outputs.
Figure 9: The black circles are the predictive means obtained from RobustGaSP, and the red circles are the predictive means obtained from the DiceKriging package. In both cases, linear terms are assumed for the mean basis function, i.e., $h(x) = (1, x)$.

In addition to point prediction, it is of interest to evaluate the uncertainties produced by the emulators, through study of the out-of-sample coverage of the resulting credible intervals and their average length,

$$P_{CI}(95\%) = \frac{1}{n^*}\sum_{i=1}^{n^*} 1\{y(x_i^*) \in CI_i(95\%)\}, \qquad L_{CI}(95\%) = \frac{1}{n^*}\sum_{i=1}^{n^*} \mathrm{length}\{CI_i(95\%)\},$$

where $x_i^*$ is the $i$th of the $n^*$ test inputs and $CI_i(95\%)$ is the 95% posterior credible interval at that input. An ideal emulator would have $P_{CI}(95\%)$ close to the 95% nominal level and a short average length. We first show $P_{CI}(95\%)$ and $L_{CI}(95\%)$ for the case of a constant mean basis function.
R> prop.m <- length(which((m.predict$lower95<=testing_output)
+    &(m.predict$upper95>=testing_output)))/num_testing_input
R> length.m <- sum(m.predict$upper95-m.predict

The $P_{CI}(95\%)$ obtained by RobustGaSP is 97%, which is close to the 95% nominal level; and $L_{CI}(95\%)$, the average length of the 95% credible intervals, is 1.02. The coverage of the credible intervals from DiceKriging is also 97%, but this is achieved by intervals that are, on average, about three times longer than those produced by RobustGaSP. When linear terms are assumed in the basis function of the GaSP emulator, $h(x) = (1, x)$,

R> prop.m.trend <- length(which((m.trend.predict$lower95<=testing_output)
+    &(m.trend.predict$upper95>=testing_output)))/num_testing_input
R> length.m.trend <- sum(m.trend.predict$upper95-
+    m.trend.predict$lower95)/num_testing_input
R> prop.m.trend

the $P_{CI}(95\%)$ for RobustGaSP is 100% and $L_{CI}(95\%) = 0.823$, a significant improvement over the case of a constant mean. (The coverage of 100% is too high, but at least is conservative and is achieved with quite small intervals.) For DiceKriging, the coverage is 98.5% with a linear mean, but the average interval length is now around 4 times that of the intervals produced by RobustGaSP.
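The coverage and average-length computations used above can be collected in one small helper. The sketch below is base R only; `ci_summary` is a hypothetical name, and the four-point data set is made up purely to show the arithmetic. The arguments mirror the `lower95` and `upper95` components used in the text.

```r
# Empirical coverage P_CI and average length L_CI of a set of intervals.
ci_summary <- function(lower95, upper95, truth) {
  inside <- (lower95 <= truth) & (upper95 >= truth)
  c(P_CI = mean(inside),               # proportion of held-out outputs covered
    L_CI = mean(upper95 - lower95))    # average interval length
}

# Toy example: the second interval misses its target.
truth   <- c(1.0, 2.0, 3.0, 4.0)
lower95 <- c(0.5, 2.1, 2.0, 3.5)
upper95 <- c(1.5, 2.9, 3.5, 5.5)
res <- ci_summary(lower95, upper95, truth)
res
```

In the toy example three of the four intervals cover the true value, so the coverage is 0.75, and the interval lengths 1.0, 0.8, 1.5 and 2.0 average to 1.325.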
To see whether or not the differences in performance persist when the sample size increases, the same experiment was run on the two emulators with sample size $n = 80$. When the constant mean function is used, the RMSEs obtained by the RobustGaSP package and the DiceKriging package were 0.05 and 0.32, respectively. With $h(x) = (1, x)$, the RMSEs were 0.04 and 0.34, respectively. Thus the performance difference remains and is even larger, in a proportional sense, than when the sample size is 40.

Concluding remarks
Computer models are widely used in many applications in science and engineering. The Gaussian stochastic process emulator provides a fast surrogate for computationally intensive computer models. The difficulty of parameter estimation in the GaSP model is well-known, as there is no closed-form, well-behaved estimator for the correlation parameters; and poor estimation of the correlation parameters can lead to seriously inferior predictions. The RobustGaSP package implements marginal posterior mode estimation of these parameters, for parameterizations that satisfy the "robustness" criteria from Gu et al. (2018). Part of the advantage of this method of estimation is that the posterior has zero density for the problematic cases in which the correlation matrix is an identity matrix or the matrix of all ones. Some frequently used estimators, such as the MLE, do not have this property. Several examples have been provided to illustrate the use of the RobustGaSP package. Results of out-of-sample prediction suggest that the estimators in RobustGaSP, with small to moderately large sample sizes, perform considerably better than the MLE.
Although the main purpose of the RobustGaSP package is to emulate computationally intensive computer models, several functions could be useful for other purposes. For example, the findInertInputs function utilizes the posterior modes to find inert inputs at no computational cost beyond that of fitting the GaSP model. A noise term can be added to the GaSP model, with fixed or estimated variance, allowing RobustGaSP to analyze noisy data from either computer models or, say, spatial experiments.
While posterior modes are used for estimating the correlation parameters in the current software, it might be worthwhile to implement posterior sampling for this Bayesian model. In GaSP models, the usual computational bottleneck for such sampling is the evaluation of the likelihood, as each evaluation requires inverting the covariance matrix, which is a computation of order $O(n^3)$, with $n$ being the number of observations. As discussed in Gu and Xu (2017), however, exact evaluation of the likelihood for the Matérn covariance is only $O(n)$ for the case of a one-dimensional input, using the stochastic differential equation representation of the GaSP model. If this could be generalized to multi-dimensional inputs, posterior sampling would become practically relevant.
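To make the $O(n^3)$ bottleneck concrete, the sketch below evaluates a Gaussian log-likelihood through a Cholesky factorization in base R. It is an illustration only, not the package's internal code: the function name `gp_loglik` is hypothetical, the mean is assumed to be zero, and the exponential kernel in the example is chosen simply because it gives a well-conditioned matrix. The single call to `chol` is the cubic-cost step that every likelihood evaluation in a posterior sampler would have to repeat.

```r
# Log-likelihood of y ~ N(0, K), computed via the Cholesky factor of K.
gp_loglik <- function(y, K) {
  n <- length(y)
  U <- chol(K)                              # K = U'U; the O(n^3) step
  z <- backsolve(U, y, transpose = TRUE)    # solves U' z = y, O(n^2)
  # y' K^{-1} y = sum(z^2) and log det K = 2 * sum(log(diag(U)))
  -0.5 * sum(z^2) - sum(log(diag(U))) - 0.5 * n * log(2 * pi)
}

# Example: 50 one-dimensional inputs with an exponential kernel (assumed).
x <- seq(0, 1, length.out = 50)
K <- exp(-abs(outer(x, x, "-")) / 0.2)
y <- sin(2 * pi * x)
gp_loglik(y, K)
```

Timing `gp_loglik` for increasing `n` is a simple way to observe the cubic growth that motivates the linear-cost alternative mentioned above for one-dimensional inputs.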