SMR: An R Package for Computing the Externally Studentized Normal Midrange Distribution

The main purpose of this paper is to present the main algorithms underlining the construction and implementation of the SMR package, whose aim is to compute studentized normal midrange distribution. Details on the externally studentized normal midrange and standardized normal midrange distributions are also given. The package follows the same structure as the probability functions implemented in R. That is: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR) and the random number generating function (rSMR). Pseudocode and illustrative examples of how to use the package are presented.


Introduction
The SMR package was created to provide an infrastructure for the studentized midrange distribution.This is a new distribution that was inspired by the externally studentized range distribution, which has been largely applied in multiple comparison procedures to identify the best treatment level and has been extensively studied theoretically.Several algorithms to compute the probability density, cumulative distribution, and quantile functions were published by Lund and Lund (1983) and Copenhaver and Holland (1988).Recently, Batista and Ferreira (2014) developed the theory of the externally studentized normal midrange distribution, which is new in the scientific literature to the best knowledge of the authors.The cumulative distribution, the probability density, and the quantile functions were obtained by them analytically.
Computations of the required multidimensional integrations should be done numerically.Therefore, Batista and Ferreira (2014) applied Gaussian quadrature for this task.In particular, they chose the Gauss-Legendre quadrature for solving numerical integrations, because it obtains more accurate results when compared with other Gaussian quadrature methods.The quantile function of the externally studentized normal midrange was computed by the Newton-Raphson method.Based on these numerical methods, the SMR package was built and released.
The package name was chosen to identify the Studentized MidRange distribution.The package follows the same structure as the probability functions implemented in R. The following functions were implemented in the package: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR), and the random number generating function (rSMR).
Therefore, the main purpose of this paper is to present the main algorithms underlining the construction and implementation of the SMR package, showing pseudocode of its functions and providing the fundamental ideas for the appropriate use of the package through illustrative examples.
First, details on the externally studentized normal midrange and standardized normal midrange distributions are given.Second, the algorithms for the construction of the package and their respective pseudocodes are presented.Third, details of the SMR package functions are showed.Finally, illustrative examples of the package are presented.

The externally studentized normal midrange distribution
Let X 1,n , X 2,n , . .., X n,n be the order statistics of a random sample X 1 , X 2 , . .., X n of size n from a population with cumulative distribution function (c.d.f.) F(x) and probability density function (p.d.f.) f (x) arranged in order of magnitude of values.
The studentized midrange Q for a sample X 1 , X 2 , . .., X n is defined as see, e.g., Batista (2012).When S and R are independent, Q corresponds to the externally studentized midrange, otherwise Q is the internally studentized midrange.The former occurs for example, when R is computed from one normal random sample of size n, and S is obtained from another random sample of size m.It also occurs when R/S is computed from the treatment means of an experimental factor with n levels and S = (QME), where QME is the associated experimental mean square error with m − 1 degrees of freedom (Searle, 1987).When considering the externally studentized midrange, it is relevant to compute Q as the ratio of W = R/σ the standardized midrange distribution and X = S/σ.
Considering the particular case of the normal distribution with mean µ = 0, without loss of generality, and variance σ 2 the p.d.f. of X = S/σ is given by (Newman, 1939) (3) An important distribution for this study, well documented in Batista (2012), Gumbel (1958) and Pillai (1950), is the standardized normal midrange distribution, defined by and the c.d.f. is both results found in David and Nagaraja (2003), where Φ(.) and φ(.) are the c.d.f. and p.d.f. of the standard normal distribution N(0, 1), respectively.
Therefore, according to Batista and Ferreira (2014), the p.d.f. and c.d.f. of Q, in the particular case of the normal distribution with mean µ = 0 and variance σ 2 , are respectively and where ν is the number of degrees of freedom.
The p.d.f (4) and c.d.f. ( 5) are very important to the externally studentized normal midrange algorithms implementation in the SMR package.

Algorithms used in the SMR package
The functions implemented in the SMR package are dependent on specific functions of R which are: pnorm, to obtain the cumulative distribution function of the standard normal, Φ(x); dnorm, to obtain the standard normal probability density function, φ(x); lgamma, to obtain the logarithm of the gamma function.To compute the nodes and weights of the Gauss-Legendre quadrature, an R function based on the method presented by Hildebrand (1974) was implemented.In the following subsections the algorithms and pseudocodes used in the construction of each routine of SMR package are presented.
The R Journal Vol.6/2, December 2014 ISSN 2073-4859 The Gauss-Legendre quadrature and the Newton-Raphson method The basic idea of Gauss-Legendre quadrature of a function f (x), according to Davis and Rabinowitz (1984), is to write it as follows: 1 where f (x) ≡ w(x)g(x) and w(x) is the weight function in the Gaussian quadrature, w i and x i are the nodes and weights, respectively, in an m-point Gaussian quadrature rule, for i = 1, 2, . .., m.The weight function is w(x) = 1 in the Gauss-Legendre quadrature, thus f (x) = g(x).
For computing the quadrature weights and nodes, the eigenvalues and eigenvectors of J were obtained Hildebrand (1974).The nodes for the Gaussian quadrature were the eigenvalues of this tridiagonal matrix J and the weights can be computed from the corresponding normalized eigenvectors φ (i) associated to the eigenvalue x i .The weight w i can be computed from the first element φ (i) 1 of the ith corresponding eigenvector by where µ 0 = 2 for the Gauss-Legendre quadrature (Gil et al., 2007).
The set {x i , w i } is determined such that expression (8) yields an exact result for polynomials of degree 2m − 1 or less (Chihara, 1978).For non-polynomial functions the Gauss-Legendre quadrature error is defined by However, the externally studentized normal midrange distribution depends on integrals over infinite intervals.The integral over an infinite range should be changed into an integral over [-1,1] by using the following transformations (Davis and Rabinowitz, 1984 Therefore, the integrals were computed by applying the Gauss-Legendre quadrature rule on these The R Journal Vol.6/2, December 2014 ISSN 2073-4859 transformed variables by For more details, see Olver et al. (2010).
The Newton-Raphson approximation aims to find the roots of a real function, that is, Thus, iteratively, the Newton-Raphson method follows where f (y) is the first derivative of the function f (y), see, e.g., Miller (2014).
The process starts with an initial arbitrary value y 0 , which should be reasonably close to the true root, then the method will usually converge in a few iterations, considering f (y k ) = 0.
In this study the main objective was to find the quantile q, given that the cumulative probability is p, 0 < p < 1.Hence, considering the cumulative distribution function, F(q), and the probability density, f (q), functions of the random variable Q, the desired solution is The solution is obtained by that should be computed sequentially until a certain convergence criterion is reached.

Standardized normal midrange probability density function
For the standardized normal midrange probability density function (4), the integration interval (−∞; q] was divided into two subintervals and w was replaced in the algorithm to q, as done before to the cumulative distribution function.Hence, the subintervals (−∞, q − 8] and [q − 8, q] were considered. For the first subinterval, the following transformation of variable was used y = b + (1 + t)/(t − 1), resulting in an integration interval given by [−1, 1], as required to apply the Gauss-Legendre quadrature.With the same purpose, the transformation z = (a − b)t/2 + (a + b)/2 was applied to the second subinterval.The algorithm was denoted by dNMR.The approximation of (4) was computed by The choice of the value 8 was motivated by the same choice made by Copenhaver and Holland (1988) to the normal range distribution.Since the midrange distribution is more peaked than the range distribution, this value seems to be a good choice for the standardized normal midrange distribution.Further investigations could be undertaken to optimize the choice of these subintervals.

The standardized normal midrange cumulative distribution function
The algorithm for computing the cumulative distribution function of standardized normal midrange, expression (5), was developed using the Gauss-Legendre quadrature.This will be essential in the construction of the externally studentized normal midrange.First, the integration interval was divided into two subintervals to achieve higher accuracy as suggested by Quarteroni et al. (2007) and the following division was considered: In these two parts, different variable transformations were considered.In the first part, the transformation y = (q − 8) + (1 + t)/(t − 1) was applied to obtain an integration interval [−1, 1], needed to solve this integral by Gauss-Legendre quadrature.Considering y i obtained by the above transformation, where t = x i is the ith Gauss-Legendre quadrature node, i = 1, 2, . .., m, and m is the number of quadrature points, then the first part of the integral (18) was approximated by using the transformations ( 11) and ( 14).Hence, For the second part, the transformation y = (b − a)t/2 + (a + b)/2 was used, with the same goal presented for the first transformation.These changes were implemented implicitly as a numerical device, as shown above.Therefore, the second part of the integral (18) was approximated by using the transformations ( 12) and (15).It was given by where b = q and a = q − 8.
The R Journal Vol.6/2, December 2014 ISSN 2073-4859 Considering the expressions ( 19) and ( 20) the cumulative distribution function ( 18) was approximated by in the same way as the numerical approximation adopted by McBane (2006) and by Verma and Suarez (2014) for obtaining the probability density and cumulative distribution functions and critical values for the range ratio statistics.The main difference between those works and the present results is the use of the Gauss-Legendre quadrature, besides the distribution functions where the quadrature was applied.All other integrals computed in this work will be based on this type of numerical approximation.
In the algorithm pseudocode, the standardized normal midrange denoted by w, is replaced by q, following the notation of expression (18), avoiding confusion with the quadratures weights.Hence, the pseudocode using the Gauss-Legendre quadrature to obtain the cumulative distribution function of the standardized normal midrange (21), which is denoted by pNMR, is given by: 1. Input n > 1, q, and m ≥ 2; 2. Compute m nodes (x i ) and weights (w i ), with i = 1, 2, . . ., m; 11.The quantities in steps 3-10, should be computed for each node

Externally studentized normal midrange probability density function
For computing the externally studentized normal midrange probability density function, given in (6), note that the innermost integral is the probability density function of the standardized normal midrange, given by ( 4) and ( 17).Thus, the dNMR algorithm presented above was reused for computing the probability density function of interest.Also, the variable x (standardized standard deviation) is replaced by s in the implemented algorithm, avoiding confusion with the quadrature nodes.The outermost integral was divided into two parts and a Gauss-Legendre quadrature was also applied in each part.The integration intervals are given by [0, 1] and [1, ∞) for the same reasons mentioned above.Thus, where f (sq) was approximated by expression ( 17), with q substituted by sq, and computed by the algorithm dNMR, described in the previous section.
The innermost integral multiplied by the probability density function of the variable s, f (s; ν), was computed by The auxiliary algorithm dSMR_aux was constructed to compute (23).Its pseudocode is given by: 1. Input s (quadrature node at the ith step), q, n > 1 and m ≥ 2; 2. Υ = dNMR(s × q, n, m); Each integration subinterval of ( 22) was appropriately transformed enabling the Gauss-Legendre quadrature to be applied for approximating the integral (6).For this purpose, in the first integration subinterval, [0, 1], the transformation y = (a − b)t/2 + (a + b)/2 was adopted.Hence, where d(q; y i , n, ν) is computed by (23), with corresponding pseudocode given by dSMR_aux.
Quadratures of 64 points are sufficient for most circumstances, although it is possible to increase the number of quadrature points and achieve the desired accuracy.The Newton-Raphson method was used to compute quantiles, since the probability density function, the first derivative, was available.The following subsections describe the algorithms and show their pseudocodes.

The externally studentized normal midrange cumulative distribution function
Note that the innermost integral of the externally studentized normal midrange cumulative distribution function, given in ( 7), is the cumulative distribution function of standardized normal midrange, given by ( 5) and ( 18).Thus, the pNMR algorithm presented above was reused for computing the cumulative distribution function of interest.Also, the variable x (standardized standard deviation) is replaced by s in the implemented algorithm, avoiding confusion with the quadrature nodes.The outermost integral was divided into two parts and a Gauss-Legendre quadrature was also applied in each part.The integration intervals are given by [0, 1] and [1, ∞).The limit 1 was chosen since it is the modal value of the standardized normal standard deviation probability density function f (x; ν), given in expression (3), following the ideas of Copenhaver and Holland (1988).Thus where F Q (sq) was given by expression (18), with q substituted by sq, and computed by the algorithm pNMR, described in the previous section.
In the second subinterval of [1, ∞), the transformations ( 10) and ( 13), with a = 1, was applied and it is given by y = 1 + (1 + t)/(1 − t).Thus, considering the quadrature nodes x i 's, the integral was approximated by The integration (7) was computed by using the results of ( 29) and ( 30).

Externally studentized normal midrange quantile function
For the externally studentized normal midrange quantile function, the qMR algorithm was constructed.

The random number generator
To generate random sample of size N from the normal midrange distribution, the algorithm rMR was constructed, with parameter n and ν.First, a sample from the standard normal distribution of size n, X 1 , X 2 , . .., X n , was simulated, where X i iid ∼ N(0, 1).A chi-square variable U ∼ χ 2 ν , with degrees of freedom ν, was also simulated, and the following transformation was computed: .
With infinite degrees of freedom, then the following transformation was considered instead The process was repeated N times and the values of Q or W, depending on the considered case, are stored in a vector of size N.
This distribution can be used to obtain cumulative probabilities and quantiles.Thus, for example, given a quantile q or w, the cumulative probability function, ensuring that N is large, is computed in an approximated way by: The random number generator rMR returns the vector of N realizations of Q or W and its pseudocode is presented below.The rMR generator is dependent of the R functions rnorm() and rchisq(), for obtaining a random vector of standard normal independently variables and a chi-square realization, respectively.The matrix(x,p,q) function, presented below, creates a matrix of dimension p × q filled by columns with the elements given by the vector x.
The value of the argument df can be finite or infinity.If df=Inf, values of the probability density, cumulative distribution and quantile functions of the normal midrange (standardized normal midrange) are computed.If the argument df is not specified in the rSMR function, the default value Inf is used and random samples from the normal midrange distribution are drawn.The other functions presented earlier in the previous section are internal algorithms of the SMR package.

Concluding remarks
The importance of the externally studentized normal midrange distribution can be enormous in the analysis of experiments, since the midrange estimator is more efficient than the sample mean in platykurtic distributions.Therefore, this distribution could be useful in the proposition of multiple comparison procedures, that could potentially show better results (more robust and powerful) than the traditional tests based on the externally studentized normal range.This package is easy to use and shows very high accuracy.The accuracy of critical values is estimated through the computation of the difference between two results using different numbers of quadrature points according to the expression (9).The number of quadrature points can be chosen and for most cases, 32 points are enough to achieve high accuracy.Monte Carlo simulations were also used to evaluate the accuracy of the proposed methods and the consistency of the SMR package.The algorithms were proposed and implemented using Gauss-Legendre quadrature and the Newton-Raphson method in R software, resulting in the SMR package, that is available for download from CRAN.An important aspect is that the SMR package is written in R. Code in Fortran or C can make the SMR functions faster, a feature that is planned for future releases of the package.Another important aspect is the possibility of using in future releases two-dimensional interpolations when dSMR, pSMR and qSMR are called with long vectors of arguments x, q, and p.In this case, two-dimensional grids for {x, q, and p} and {size} can be computed offline and the approx function used to interpolate.

Table 1 :
Steel and Torrie (1980)rformed in the same way of the Tukey test.The results of the proposed midrange and Tukey tests are shown in Table1.The proposed midrange test show no ambiguous results in this example, as happens with the Tukey test (two or more letters per level).The midrange and Tukey multiple comparison procedures applied to the nitrogen contents of 6 red clover plants presented inSteel and Torrie (1980), page 180.Means with the same letter are not significantly different. *