The main purpose of this paper is to present the main algorithms underlying the construction and implementation of the SMR package, whose aim is to compute the studentized normal midrange distribution. Details on the externally studentized normal midrange and standardized normal midrange distributions are also given. The package follows the same structure as the probability functions implemented in R, that is: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR) and the random number generating function (rSMR). Pseudocode and illustrative examples of how to use the package are presented.
The SMR package was created to provide an infrastructure for the studentized midrange distribution. This new distribution was inspired by the externally studentized range distribution, which has been widely applied in multiple comparison procedures to identify the best treatment level and has been extensively studied theoretically. Several algorithms to compute the probability density, cumulative distribution, and quantile functions were published by Lund and J. R. Lund (1983) and Copenhaver and B. Holland (1988). Recently, Batista and D. F. Ferreira (2014) developed the theory of the externally studentized normal midrange distribution, which, to the best of the authors' knowledge, is new in the scientific literature. The cumulative distribution, probability density, and quantile functions were obtained analytically by these authors.
The required multidimensional integrations must be computed numerically. Therefore, Batista and D. F. Ferreira (2014) applied Gaussian quadrature to this task. In particular, they chose the Gauss-Legendre quadrature for solving the numerical integrations, because it yields more accurate results than other Gaussian quadrature methods. The quantile function of the externally studentized normal midrange was computed by the Newton-Raphson method. Based on these numerical methods, the SMR package was built and released.
The package name was chosen to identify the Studentized MidRange distribution. The package follows the same structure as the probability functions implemented in R. The following functions were implemented in the package: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR), and the random number generating function (rSMR).
Therefore, the main purpose of this paper is to present the main algorithms underlying the construction and implementation of the SMR package, showing pseudocode of its functions and providing the fundamental ideas for the appropriate use of the package through illustrative examples.
First, details on the externally studentized normal midrange and standardized normal midrange distributions are given. Second, the algorithms for the construction of the package and their respective pseudocodes are presented. Third, details of the SMR package functions are shown. Finally, illustrative examples of the package are presented.
Let \(X_{1,n}\), \(X_{2,n}\), \(\ldots\), \(X_{n,n}\) be the order statistics of a random sample \(X_1\), \(X_2\), \(\ldots\), \(X_n\) of size \(n\) from a population with cumulative distribution function (c.d.f.) \(F(x)\) and probability density function (p.d.f.) \(f(x)\) arranged in order of magnitude of values.
The midrange \(R\) for a sample \(X_1\), \(X_2\), \(\ldots\), \(X_n\) is defined as \[\begin{aligned} R&=&(X_{1,n}+X_{n,n})/2, \end{aligned}\] see, e.g., Rider (1957).
The p.d.f. and c.d.f. of \({R}\) are
\[\begin{aligned} f_{{R}}({r})&= \int^{{r}}_{-\infty}2n(n-1)f(y) f(2{r}-y)[F(2{r}-y)-F(y)]^{n-2}dy \label{eq:fdpRbar} \end{aligned} \tag{1} \]
and
\[\begin{aligned} F_{{R}}({r}) &= n\int^{{r}}_{-\infty}f(y)[F(2{r}-y)-F(y)]^{n-1}dy, \label{eq:func_dist_r} \end{aligned} \tag{2} \]
see, e.g., David and H. N. Nagaraja (2003).
The studentized midrange \(Q\) for a sample \(X_1\), \(X_2\), \(\ldots\), \(X_n\) is defined as \[\begin{aligned} Q&=&\frac{R/\sigma}{S/\sigma}=\frac{R}{S}=\frac{W}{X}, \end{aligned}\] see, e.g., Batista (2012). When \(S\) and \(R\) are independent, \(Q\) corresponds to the externally studentized midrange; otherwise \(Q\) is the internally studentized midrange. The former occurs, for example, when \(R\) is computed from one normal random sample of size \(n\) and \(S\) is obtained from another random sample of size \(m\). It also occurs when \(R/S\) is computed from the treatment means of an experimental factor with \(n\) levels and \(S=\sqrt{QME}\), where \(QME\) is the associated experimental mean square error with \(m-1\) degrees of freedom (Searle 1987). When considering the externally studentized midrange, it is convenient to express \(Q\) as the ratio of \(W = R/\sigma\), the standardized midrange, to \(X=S/\sigma\).
Considering the particular case of the normal distribution with mean \(\mu=0\), without loss of generality, and variance \(\sigma^2\), the p.d.f. of \(X=S/\sigma\) is given by (Newman 1939)
\[\label{fxv} f(x;\nu)=\frac{\nu^{\nu/2}}{\Gamma(\nu/2)2^{\nu/2-1}}x^{\nu-1}e^{-\nu x^2/2}\textrm{, } x\geq 0. \tag{3} \]
An important distribution for this study, well documented in Batista (2012), Gumbel (1958) and Pillai (1950), is the standardized normal midrange distribution, defined by \({W}= {R}/\sigma\). The p.d.f. of \({W}\) is
\[\label{eq:distw1} f_{ {W}}( {w})=\int^{ {w}}_{-\infty}2 n(n-1)\phi(y) \phi(2 {w}-y)[\Phi(2 {w}-y)-\Phi(y)]^{n-2} dy. \tag{4} \]
and the c.d.f. is
\[\label{eq:acumw} F_{ {W}}( {w})=\int^{ {w}}_{-\infty} n\phi(y)[\Phi(2 {w}-y)-\Phi(y)]^{n-1}dy, \tag{5} \]
both results found in David and H. N. Nagaraja (2003), where \(\Phi(.)\) and \(\phi(.)\) are the c.d.f. and p.d.f. of the standard normal distribution \(N(0,1)\), respectively.
Therefore, according to Batista and D. F. Ferreira (2014), the p.d.f. and c.d.f. of \(Q\), in the particular case of the normal distribution with mean \(\mu=0\) and variance \(\sigma^2\), are respectively
\[\begin{aligned} \label{eq:densq} f( {q};n,\nu) =&\int^{\infty}_{0} \int^{x {q}}_{-\infty}2n(n-1)x\phi(y) \phi(2x {q}-y) [\Phi(2x {q}-y)-\Phi(y)]^{n-2}f(x;\nu)dydx, \end{aligned} \tag{6} \]
and
\[\label{eq:acumr} F( {q};n,\nu)=\int^{\infty}_{0} \int^{x {q}}_{-\infty}n\phi(y)[\Phi(2x {q}-y)-\Phi(y)]^{n-1}f(x;\nu)dydx, \tag{7} \]
where \(\nu\) is the number of degrees of freedom.
The p.d.f. ((4)) and c.d.f. ((5)) are central to the implementation of the externally studentized normal midrange algorithms in the SMR package.
The functions implemented in the SMR package depend on specific R functions, namely: pnorm, to obtain the cumulative distribution function of the standard normal, \(\Phi(x)\); dnorm, to obtain the standard normal probability density function, \(\phi(x)\); and lgamma, to obtain the logarithm of the gamma function. To compute the nodes and weights of the Gauss-Legendre quadrature, an R function based on the method presented by Hildebrand (1974) was implemented. In the following subsections the algorithms and pseudocodes used in the construction of each routine of the SMR package are presented.
The basic idea of Gauss-Legendre quadrature of a function \(f(x)\), according to Davis and P. Rabinowitz (1984), is to write it as follows:
\[\label{eq:quadgauss2} \int\limits_{-1}^{1}f(x)dx=\int\limits_{-1}^{1}w(x)g(x)dx\approx \sum\limits_{i=1}^{m}w_ig(x_i), \tag{8} \]
where \(f(x)\equiv w(x)g(x)\) and \(w(x)\) is the weight function in the Gaussian quadrature, and \(x_i\) and \(w_i\) are the nodes and weights, respectively, in an \(m\)-point Gaussian quadrature rule, for \(i\) \(=\) \(1\), \(2\), \(\ldots\), \(m\). The weight function is \(w(x)=1\) in the Gauss-Legendre quadrature, thus \(f(x)=g(x)\).
Let the symmetric tridiagonal matrix \(\mathcal{J}\), with elements \(\mathcal{J}_{i,i}=a_{i-1},\, i=1,\ldots,m\) and \(\mathcal{J}_{i-1,i}=\mathcal{J}_{i,i-1}=\sqrt{J_{i,i-1}J_{i-1,i}}= \sqrt{b_{i-1}},\, i=2,\ldots,m\), be the Jacobi matrix, given by \[\mathcal{J}=\left( \begin{array}{llllll} a_0 & \sqrt{b_1} & 0 & \ldots & \ldots & \ldots\\ \sqrt{b_1} & a_1 & \sqrt{b_2} & 0 & \ldots & \ldots \\ 0 & \sqrt{b_2} & a_2 & \sqrt{b_3} & 0 & \ldots \\ 0 & \ldots & \ldots & \ldots & \ldots & 0 \\ \ldots & \ldots & 0 & \sqrt{b_{m-2}} & a_{m-2} & \sqrt{b_{m-1}} \\ \ldots & \ldots & \ldots & 0 & \sqrt{b_{m-1}} & a_{m-1} \end{array} \right),\] where \(a_{i-1}=0\), for \(i=1,\ldots,m\) and \(\sqrt{b_{i-1}}=(i-1)/\sqrt{4(i-1)^2-1}\), for \(i=2,\ldots,m\), considering the Gauss-Legendre quadrature.
For computing the quadrature weights and nodes, the eigenvalues and eigenvectors of \(\mathcal{J}\) were obtained (Hildebrand 1974). The nodes of the Gaussian quadrature are the eigenvalues of this tridiagonal matrix \(\mathcal{J}\), and the weights can be computed from the corresponding normalized eigenvectors \(\mathbf{\phi}^{(i)}\) associated with the eigenvalues \(x_i\). The weight \(w_i\) can be computed from the first element \(\phi_1^{(i)}\) of the \(i\)th corresponding eigenvector by \[w_i=\mu_0 \left(\phi_1^{(i)}\right)^2,\] where \(\mu_0=2\) for the Gauss-Legendre quadrature (Gil, J. Segura, and N. Temme 2007).
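As an illustration, this Golub-Welsch construction can be written in a few lines of R. The sketch below is not the package's internal implementation, and the helper name gauss_legendre is ours; it only mirrors the Jacobi-matrix recipe just described:

gauss_legendre <- function(m) {
  # off-diagonal elements sqrt(b_{i-1}) = (i-1)/sqrt(4(i-1)^2 - 1), i = 2, ..., m
  b <- (1:(m - 1)) / sqrt(4 * (1:(m - 1))^2 - 1)
  J <- diag(0, m)                          # diagonal elements a_{i-1} = 0
  J[cbind(1:(m - 1), 2:m)] <- b
  J[cbind(2:m, 1:(m - 1))] <- b
  e <- eigen(J, symmetric = TRUE)
  list(nodes   = e$values,                 # quadrature nodes x_i
       weights = 2 * e$vectors[1, ]^2)     # mu_0 = 2 for Gauss-Legendre
}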
The set \(\{x_i,\, w_i\}\) is determined such that expression ((8)) yields an exact result for polynomials of degree \(2m-1\) or less (Chihara 1978). For non-polynomial functions the Gauss-Legendre quadrature error is defined by
\[\label{eq:erroquad} \varepsilon_m \approx \left| \sum^{q}_{i=1} w_ig(x_i)-\sum^{m}_{i=1}w_i g(x_i)\right|,\ \ \ q>m. \tag{9} \]
However, the externally studentized normal midrange distribution depends on integrals over infinite intervals. The integral over an infinite range should be changed into an integral over [-1,1] by using the following transformations (Davis and P. Rabinowitz 1984)
\[\begin{aligned} \int\limits_{a}^{\infty}f(y)dy&=&\int\limits_{-1}^{1} f\left(a+\frac{1+t}{1-t}\right)\frac{2}{(1-t)^2}dt, \label{eq:transf1p1} \end{aligned} \tag{10} \]
\[\begin{aligned} \int\limits_{-\infty}^{b}f(y)dy&=&\int\limits_{-1}^{1} f\left(b+\frac{1+t}{t-1}\right)\frac{2}{(t-1)^2}dt, \label{eq:transf2p1} \end{aligned} \tag{11} \]
\[\begin{aligned} \int\limits_{a}^{b}f(y)dy&=&\frac{b-a}{2} \int\limits_{-1}^{1}f\left(\frac{(b-a)}{2}t+\frac{(b+a)}{2} \right) dt. \label{eq:transf3p1} \end{aligned} \tag{12} \]
Therefore, the integrals were computed by applying the Gauss-Legendre quadrature rule on these transformed variables by
\[\begin{aligned} \int\limits_{a}^{\infty}f(y)dy &\approx& \sum_{i=1}^m w_i \frac{2}{(1-t_i)^2}f\left(a+\frac{1+t_i}{1-t_i}\right), \label{eq:transf1} \end{aligned} \tag{13} \]
\[\begin{aligned} \int\limits_{-\infty}^{b}f(y)dy &\approx& \sum_{i=1}^m w_i \frac{2}{(t_i-1)^2}f\left(b+\frac{1+t_i}{t_i-1}\right),\label{eq:transf2} \end{aligned} \tag{14} \]
\[\begin{aligned} \int_a^b f(y)dy &\approx& \frac{b-a}{2} \sum_{i=1}^m w_i f\left(\frac{b-a}{2}t_i + \frac{a+b}{2}\right).\label{eq:transf3} \end{aligned} \tag{15} \]
For more details, see Olver, D. W. Lozier, R. F. Boisvert, and C. W. Clark, editors (2010).
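A quick check of these transformed rules, assuming the gauss_legendre() helper sketched above, is to approximate a semi-infinite integral with a known value, here \(\int_{-\infty}^{0}\phi(y)dy=0.5\) via transformation ((14)):

gl <- gauss_legendre(32)
t  <- gl$nodes
b  <- 0
y  <- b + (1 + t) / (t - 1)                 # maps [-1, 1] onto (-Inf, b]
sum(gl$weights * 2 / (t - 1)^2 * dnorm(y))  # approximately 0.5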
The Newton-Raphson approximation aims to find the roots of a real function, that is, \[y:f(y)=0.\] Thus, iteratively, the Newton-Raphson method follows
\[\begin{aligned} y_{k+1}=y_k-\frac{f(y_k)}{f'(y_k)}, \ \ k=0,1,2,\ldots \end{aligned}\]
where \(f'(y)\) is the first derivative of the function \(f(y)\), see, e.g., Miller (2014).
The process starts with an arbitrary initial value \(y_0\); if it is reasonably close to the true root, the method will usually converge in a few iterations, provided \(f'(y_k)\neq 0\).
In this study the main objective was to find the quantile \(q\), given that the cumulative probability is \(p\), \(0<p<1\). Hence, considering the cumulative distribution function, \(F(q)\), and the probability density, \(f(q)\), functions of the random variable \(Q\), the desired solution is \[q: F(q)-p = 0.\]
The solution is obtained by
\[\begin{eqnarray} q_{k+1}=q_k-\frac{F(q_k)-p}{f(q_k)}, \ \ k=0,1,2,\ldots \label{eq:newtonraphson2} \end{eqnarray} \tag{16} \]
that should be computed sequentially until a certain convergence criterion is reached.
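The iteration ((16)) is generic: it only needs a cumulative distribution function and its density. A minimal R sketch (the name newton_quantile is ours, not a package function) is:

newton_quantile <- function(p, cdf, pdf, q0 = 0, eps = 1e-13, maxIt = 5000) {
  for (it in seq_len(maxIt)) {
    q1 <- q0 - (cdf(q0) - p) / pdf(q0)   # Newton-Raphson step of (16)
    if (abs(q1 - q0) <= eps) return(q1)
    q0 <- q1
  }
  stop("iterative process did not achieve convergence in maxIt steps")
}
# e.g., newton_quantile(0.975, pnorm, dnorm) agrees with qnorm(0.975)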
For the standardized normal midrange probability density function ((4)), the integration interval \((-\infty, \, q]\) was divided into two subintervals and \({w}\) was replaced by \(q\) in the algorithm, as done for the cumulative distribution function. Hence, the subintervals \((-\infty,q-8]\) and \([q-8,q]\) were considered. For the first subinterval, the transformation of variable \(y=b+(1+t)/(t-1)\) was used, resulting in an integration interval \([-1,1]\), as required to apply the Gauss-Legendre quadrature. With the same purpose, the transformation \(z=(b-a)t/2+(a+b)/2\) was applied to the second subinterval. The algorithm was denoted by dNMR. The approximation of ((4)) was computed by
\[\begin{aligned} f_{Q}(q)=&2 n(n-1)\Bigg[\int^{q-8}_{-\infty}\phi(y) \phi(2q-y)[\Phi(2q-y)-\Phi(y)]^{n-2}dy \nonumber\\ & \hphantom{2 n(n-1)} +\int_{q-8}^{q}\phi(y) \phi(2q-y)[\Phi(2q-y)-\Phi(y)]^{n-2}dy\Bigg]\nonumber\\ \approx & 2 n(n-1) \Bigg[\sum_{i=1}^m w_i\frac{2}{(x_i-1)^2} \phi(y_i) \phi(2q-y_i)[\Phi(2q-y_i)-\Phi(y_i)]^{n-2}\nonumber\\ & \hphantom{2 n(n-1)} + 4\sum_{i=1}^m w_i\phi(z_i) \phi(2q-z_i)[\Phi(2q-z_i)-\Phi(z_i)]^{n-2}\Bigg]. \label{eq:distw1ap} \end{aligned} \tag{17}\]
The choice of the value \(8\) was motivated by the same choice made by Copenhaver and B. Holland (1988) to the normal range distribution. Since the midrange distribution is more peaked than the range distribution, this value seems to be a good choice for the standardized normal midrange distribution. Further investigations could be undertaken to optimize the choice of these subintervals.
The pseudocode to compute the standardized normal midrange probability density function is given by:
1. Input \(q\), \(n>1\), \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1\), \(2\), \(\ldots\), \(m\);
3. Transform \(x_i\) from \([-1,1]\) to \((-\infty,\ q-8]\), using \(y_i=(q-8)+(1+x_i)/(x_i-1)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
4. Compute \(aux_i=\Phi(2q-y_i)-\Phi(y_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
5. Compute \(aux1_i=\phi(y_i)\) and \(aux2_i=\phi(2q-y_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
6. If \(aux_i\leq 0\), then \(aux_i=\epsilon\), for \(i=1\), \(2\), \(\ldots\), \(m\);
7. Compute \(fy_i=\ln(aux1_i)+(n-2)\ln(aux_i)+\ln(aux2_i)+\ln(2)+\ln(n)+\ln(n-1)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
8. Apply \(fy_i=\exp(fy_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
9. Compute \(fy_i=[2/(x_i-1)^2]fy_i\), for \(i=1\), \(2\), \(\ldots\), \(m\);
10. Compute \(I=\sum\limits_{i=1}^{m}w_i fy_i\);
11. Transform \(x_i\) from \([-1,1]\) to \([q-8,q]\), using \(a=q-8\) and \(b=q\), where \(y_i=[(b-a)/2]x_i+(a+b)/2\), for \(i=1\), \(2\), \(\ldots\), \(m\);
12. Repeat steps 4 to 8, with the \(y_i\) from step 11, for \(i=1\), \(2\), \(\ldots\), \(m\);
13. Compute \(fy_i=[(b-a)/2]fy_i\), for \(i=1\), \(2\), \(\ldots\), \(m\);
14. Compute \(I=I+\sum\limits_{i=1}^{m}w_ify_i\);
15. Return \(I\).
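For concreteness, a direct R transcription of this pseudocode (a sketch only, not the SMR internals; it assumes the gauss_legendre() helper defined earlier) could read:

dNMR <- function(q, n, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  integrand <- function(y) {
    aux <- pnorm(2 * q - y) - pnorm(y)
    aux[aux <= 0] <- .Machine$double.eps
    exp(log(dnorm(y)) + (n - 2) * log(aux) + log(dnorm(2 * q - y)) +
        log(2) + log(n) + log(n - 1))
  }
  y1 <- (q - 8) + (1 + x) / (x - 1)          # maps [-1, 1] onto (-Inf, q - 8]
  I  <- sum(w * 2 / (x - 1)^2 * integrand(y1))
  y2 <- 4 * x + (q - 4)                      # maps [-1, 1] onto [q - 8, q]
  I + sum(w * 4 * integrand(y2))
}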
The algorithm for computing the cumulative distribution function of standardized normal midrange, expression ((5)), was developed using the Gauss-Legendre quadrature. This will be essential in the construction of the externally studentized normal midrange. First, the integration interval was divided into two subintervals to achieve higher accuracy as suggested by Quarteroni, R. Sacco, and F. Saleri (2007) and the following division was considered:
\[\label{eq:inftyw} F_Q(q)=\int^{q-8}_{-\infty}n\phi(y)[\Phi(2 q-y)-\Phi(y)]^{n-1}dy+\int_{q-8}^{q}n\phi(y)[\Phi(2q-y)-\Phi(y)]^{n-1}dy. \tag{18} \]
In these two parts, different variable transformations were considered. In the first part, the transformation \(y=(q-8)+(1+t)/(t-1)\) was applied to obtain an integration interval \([-1,1]\), needed to solve this integral by Gauss-Legendre quadrature. Considering \(y_i\) obtained by the above transformation, where \(t=x_i\) is the \(i\)th Gauss-Legendre quadrature node, \(i=1\), \(2\), \(\ldots\), \(m\), and \(m\) is the number of quadrature points, then the first part of the integral ((18)) was approximated by using the transformations ((11)) and ((14)). Hence,
\[\begin{aligned} \label{eq:inftywp1} I_1(q)=&\int^{q-8}_{-\infty}n\phi(y)[\Phi(2 q-y)-\Phi(y)]^{n-1}dy \nonumber\\ & \approx n \sum_{i=1}^m w_i \frac{2}{(x_i-1)^2}\phi(y_i)[\Phi(2 q-y_i)-\Phi(y_i)]^{n-1}. \end{aligned} \tag{19} \]
For the second part, the transformation \(y=(b-a)t/ 2+(a+b)/2\) was used, with the same goal as the first transformation. These changes were implemented implicitly, as a numerical device only. Therefore, the second part of the integral ((18)) was approximated by using the transformations ((12)) and ((15)), and is given by
\[\begin{aligned} \label{eq:inftywp2} I_2(q)=&\int_{q-8}^{q}n\phi(y)[\Phi(2 q-y)-\Phi(y)]^{n-1}dy \nonumber\\ & \approx \frac{n(b-a)}{2}\sum_{i=1}^m w_i \phi(y_i)[\Phi(2 q-y_i)-\Phi(y_i)]^{n-1}, \end{aligned} \tag{20} \]
where \(b=q\) and \(a=q-8\).
Considering the expressions ((19)) and ((20)) the cumulative distribution function ((18)) was approximated by
\[\label{eq:inftyw3} F_Q(q)\approx I_1(q)+I_2(q), \tag{21} \]
in the same way as the numerical approximation adopted by McBane (2006) and by Verma and M. C. Suarez (2014) for obtaining the probability density and cumulative distribution functions and critical values for the range ratio statistics. The main difference between those works and the present results is the use of the Gauss-Legendre quadrature, in addition to the distribution functions to which the quadrature was applied. All other integrals computed in this work are based on this type of numerical approximation.
In the algorithm pseudocode, the standardized normal midrange, denoted by \({w}\), is replaced by \(q\), following the notation of expression ((18)) and avoiding confusion with the quadrature weights. Hence, the pseudocode using the Gauss-Legendre quadrature to obtain the cumulative distribution function of the standardized normal midrange ((21)), which is denoted by pNMR, is given by:
1. Input \(n> 1\), \(q\), and \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1,\ 2,\ldots,\ m\);
3. Transform \(x_i\) from the interval \([-1,1]\) to \((-\infty,\ q-8]\): \(y_i=(q-8)+(1+x_i)/(x_i-1);\)
4. Compute \(aux1_i=\Phi(y_i)\);
5. Compute \(aux2_i=\phi(y_i)\);
6. Compute \(aux_i=\Phi(2q-y_i)-aux1_i\);
7. If \(aux_i\leq 0\), then \(aux_i=\epsilon\);
8. Compute \(fy_i=\log(aux2_i)+(n-1)\log(aux_i);\)
9. Compute \(fy_i=\exp(fy_i);\)
10. Compute \(fy_i=n(2/(x_i-1)^2)fy_i;\)
11. The quantities in steps 3 to 10 should be computed for each node \(x_i\), \(i=1\), \(2\), \(\ldots\), \(m\). Therefore, compute \(I=\sum\limits_{i=1}^{m}w_ify_i\);
12. Set \(a=q-8\) and \(b=q\);
13. Transform \(x_i\) from the interval \([-1,1]\) to \([q-8,q]\): \(y_i=[(b-a)/2]x_i+(a+b)/2\), for \(i=1\), \(2\), \(\ldots\), \(m\);
14. Repeat steps 4 to 9, with the \(y_i\) of step 13, for \(i=1\), \(2\), \(\ldots\), \(m\);
15. Compute \(fy_i=n(b-a)/2\times fy_i\), for \(i=1\), \(2\), \(\ldots\), \(m\);
16. Compute \(I=I+\sum\limits_{i=1}^{m}w_ify_i;\)
17. Return \(I\).
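An analogous R sketch of pNMR (again assuming the gauss_legendre() helper, and not the package's internal code) is:

pNMR <- function(q, n, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  integrand <- function(y) {
    aux <- pnorm(2 * q - y) - pnorm(y)
    aux[aux <= 0] <- .Machine$double.eps
    n * exp(log(dnorm(y)) + (n - 1) * log(aux))
  }
  y1 <- (q - 8) + (1 + x) / (x - 1)          # (-Inf, q - 8]
  I  <- sum(w * 2 / (x - 1)^2 * integrand(y1))
  y2 <- 4 * x + (q - 4)                      # [q - 8, q]
  I + sum(w * 4 * integrand(y2))
}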
The algorithm qNMR was derived to compute the standardized normal midrange quantile function. The pseudocode makes use of the Newton-Raphson method, expression ((16)), and is given by:
1. Input \(n>1\), \(m\geq 2\), and \(0<p<1\);
2. Set \(eps=1\times 10^{-13}\), for the error tolerance, and \(maxIt=5000\), for the maximum number of iterations;
3. Get an initial estimate of \(q\), called \(q_0\), by:
3a. If \(p<0.5\), then \(q_0=-0.5\), else go to step 3b;
3b. If \(p>0.5\), then \(q_0=0.5\), else set \(q_0=0\);
4. \(it=0\);
5. While (\(it\leq maxIt\)) do
5a. \(cdf=\textrm{\texttt{pNMR}}(q_0,\ n,\ m)\);
5b. \(pdf=\textrm{\texttt{dNMR}}(q_0,\ n,\ m)\);
5c. \(q_1=q_0-(cdf-p)/pdf\);
5d. go to step 6;
6. If \(|q_1-q_0|\leq eps\), then return \(q_1\) and exit; otherwise, update the Newton-Raphson iteration counter, \(it=it+1\), set \(q_0=q_1\), and go to step 5. In step 5, if the counter exceeds the limit \(maxIt\), go to step 7;
7. Print the error message: “iterative process did not achieve convergence in \(maxIt\) steps.”
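In terms of the sketches above, qNMR reduces to a call to the generic Newton-Raphson helper (again an illustration, not the package code):

qNMR <- function(p, n, m = 64) {
  q0 <- if (p < 0.5) -0.5 else if (p > 0.5) 0.5 else 0   # initial estimate
  newton_quantile(p,
                  cdf = function(q) pNMR(q, n, m),
                  pdf = function(q) dNMR(q, n, m),
                  q0  = q0)
}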
For computing the externally studentized normal midrange probability density function, given in ((6)), note that the innermost integral is the probability density function of the standardized normal midrange, given by ((4)) and ((17)). Thus, the dNMR algorithm presented above was reused for computing the probability density function of interest. Also, the variable \(x\) (standardized standard deviation) is replaced by \(s\) in the implemented algorithm, avoiding confusion with the quadrature nodes. The outermost integral was divided into two parts and a Gauss-Legendre quadrature was also applied in each part. The integration intervals are given by \([0,1]\) and \([1,\infty)\), the split point \(1\) being the modal value of \(f(x;\nu)\), as discussed in the next subsection. Thus,
\[\begin{eqnarray}\label{eq:densq2} f(q; n, \nu) &=& \int^{\infty}_{0} s f_Q(s q) f(s;\nu) ds\nonumber\\ &=& \int^{1}_{0} s f_Q(s q) f(s;\nu) ds + \int^{\infty}_{1} s f_Q(s q) f(s;\nu) ds, \end{eqnarray} \tag{22} \]
where \(f_Q(sq)\) was approximated by expression ((17)), with \(q\) substituted by \(sq\), and computed by the algorithm dNMR, described in the previous section.
The innermost integral multiplied by the probability density function of the variable \(s\), \(f(s;\nu)\), was computed by
\[\begin{aligned} \label{eq:InnerMostfq} d(q; s, n, \nu) =& s\,\textrm{\texttt{dNMR}}(sq, n, m)\times \frac{\nu^{\nu/2}}{\Gamma(\nu/2) 2^{\nu/2-1}}s^{\nu-1}e^{-\nu s^2/2}. \end{aligned} \tag{23} \]
The auxiliary algorithm dSMR_aux was constructed to compute ((23)). Its pseudocode is given by:
1. Input \(s\) (quadrature node at the \(i\)th step), \(q\), \(n>1\), \(nu>0\), and \(m\geq 2\);
2. \(\Upsilon=\textrm{\texttt{dNMR}}(s\times q, \ n, \ m)\);
3. Compute \[dsln=(nu/2)\ln(nu)-\ln\Gamma(nu/2)-(nu/2-1)\ln(2)+(nu-1)\ln(s)-nu\times s^2/2;\]
4. Compute \(fx=s\times\exp(dsln)\);
5. Compute \(d=fx \times \Upsilon\);
6. Return \(d\).
Each integration subinterval of ((22)) was appropriately transformed, enabling the Gauss-Legendre quadrature to be applied for approximating the integral ((6)). For this purpose, in the first integration subinterval, \([0,1]\), the transformation \(y=(b-a)t/2+(a+b)/2\) was adopted. Hence,
\[\begin{aligned} \label{eq:fq2p1} I_1(q; n, \nu) =& \int^{1}_{0} s f_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{1}{2} d(q; y_i, n, \nu), \end{aligned} \tag{24} \]
where \(d(q; y_i, n, \nu)\) is computed by ((23)), with corresponding pseudocode given by dSMR_aux.
In the second subinterval, \([1,\infty)\), the transformation ((13)), with \(a=1\), was applied, given by \(y=1+(1+t)/(1-t)\). Thus, considering the quadrature nodes \(x_i\), the integral was approximated by
\[\begin{aligned} \label{eq:fq2p2} I_2(q; n, \nu) =& \int^{\infty}_{1} s f_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{2}{(1-x_i)^2} d(q; y_i, n, \nu). \end{aligned} \tag{25} \]
The integration ((6)) was approximated by
\[\label{eq:densr2} f(q;n,\nu) \approx I_1(q; n, \nu)+I_2(q; n, \nu), \tag{26} \]
where \(I_1(q; n, \nu)\) and \(I_2(q; n, \nu)\) were computed by ((24)) and ((25)), respectively.
Again, all transformations were implicitly applied, as a numerical integration device only. Expression ((26)) was used to obtain the pseudocode for the computation of the probability density function of the externally studentized normal midrange, denoted dMR. It is given by:
1. Input \(q\), \(n>1\), \(nu>0\), and \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1\), \(2\), \(\ldots\), \(m\);
3. Transform the variable \(x_i\) from \([-1,1]\) to \([0,1]\), using \(y_i=0.5x_i+0.5\), for \(i=1\), \(2\), \(\ldots\), \(m\);
4. Compute \(fy_i=\frac{1}{2}\textrm{\texttt{dSMR\_aux}}(y_i,\ q,\ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
5. Compute \(I=\sum\limits_{i=1}^{m} w_ify_i\);
6. Transform \(x_i\) from \([-1,1]\) to \([1,\infty)\): \(y_i=1+(1+x_i)/(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
7. Compute \(fy_i=\textrm{\texttt{dSMR\_aux}}(y_i,\ q, \ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
8. Compute \(fy_i=\ln(fy_i)+\ln(2)-2\ln(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
9. Compute \(fy_i=\exp(fy_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
10. Compute \(I=I+\sum\limits_{i=1}^{m}(fy_i \times w_i)\);
11. Return \(I\).
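A compact R sketch of dSMR_aux and dMR, built on the dNMR() and gauss_legendre() sketches above (an illustration under those assumptions, not the SMR internals), is:

dSMR_aux <- function(s, q, n, nu, m) {
  dsln <- (nu / 2) * log(nu) - lgamma(nu / 2) - (nu / 2 - 1) * log(2) +
          (nu - 1) * log(s) - nu * s^2 / 2
  s * exp(dsln) * dNMR(s * q, n, m)            # s * f(s; nu) * f_Q(sq)
}
dMR <- function(q, n, nu, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  y1 <- 0.5 * x + 0.5                          # [0, 1]
  I1 <- sum(w * 0.5 * sapply(y1, dSMR_aux, q = q, n = n, nu = nu, m = m))
  y2 <- 1 + (1 + x) / (1 - x)                  # [1, Inf)
  I2 <- sum(w * 2 / (1 - x)^2 * sapply(y2, dSMR_aux, q = q, n = n, nu = nu, m = m))
  I1 + I2
}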
Quadratures with \(64\) points are sufficient for most circumstances, although it is possible to increase the number of quadrature points to achieve the desired accuracy. The Newton-Raphson method was used to compute quantiles, since the probability density function, i.e., the first derivative of the cumulative distribution function, is available. The following subsections describe the algorithms and show their pseudocodes.
Note that the innermost integral of the externally studentized normal midrange cumulative distribution function, given in ((7)), is the cumulative distribution function of the standardized normal midrange, given by ((5)) and ((18)). Thus, the pNMR algorithm presented above was reused for computing the cumulative distribution function of interest. Also, the variable \(x\) (standardized standard deviation) is replaced by \(s\) in the implemented algorithm, avoiding confusion with the quadrature nodes. The outermost integral was divided into two parts and a Gauss-Legendre quadrature was also applied in each part. The integration intervals are given by \([0,1]\) and \([1,\infty)\). The limit \(1\) was chosen since it is the modal value of the standardized normal standard deviation probability density function \(f(x;\nu)\), given in expression ((3)), following the ideas of Copenhaver and B. Holland (1988). Thus
\[\begin{aligned} \label{eq:Fq2} F(q; n, \nu) =& \int^{\infty}_{0} F_Q(s q) f(s;\nu) ds\nonumber\\ =& \int^{1}_{0} F_Q(s q) f(s;\nu) ds + \int^{\infty}_{1} F_Q(s q) f(s;\nu) ds, \end{aligned} \tag{27} \]
where \(F_Q(sq)\) was given by expression ((18)), with \(q\) substituted by \(s q\), and computed by the algorithm pNMR, described in the previous section.
The innermost integral multiplied by the probability density function of the variable \(s\), \(f(s;\nu)\), was computed by
\[\begin{aligned} \label{eq:InnerMostFq} I(q; s, n, \nu) =& \textrm{\texttt{pNMR}}(sq, n, m)\times \frac{\nu^{\nu/2}}{\Gamma(\nu/2) 2^{\nu/2-1}}s^{\nu-1}e^{-\nu s^2/2} \end{aligned} \tag{28} \]
Therefore, an auxiliary algorithm, denoted by pNMR_aux, was constructed to compute ((28)). The pseudocode is given by:
1. Input \(q\), \(s\), \(n>1\), \(nu>0\), and \(m\geq 2\);
2. \(I=\textrm{\texttt{pNMR}}(s\times q, \ n, \ m)\);
3. Compute \[fx=(nu/2)\ln(nu)-\ln\Gamma(nu/2)-(nu/2-1)\ln(2)+(nu-1)\ln(s)-nu\times s^2/2;\]
4. Apply \(fx=\exp(fx)\);
5. Compute \(I=fx \times I\);
6. Return \(I\).
Each subinterval of ((27)) was appropriately transformed, enabling the Gauss-Legendre quadrature to be applied for solving the integral ((7)). For this purpose, in the first integration subinterval, \([0,1]\), the transformation \(y=(b-a)t/2+(a+b)/2\), given by ((12)) and ((15)), was adopted. Hence, considering all quadrature nodes \(x_i\), it follows that
\[\begin{aligned} \label{eq:Fq2p1} I_1(q; n, \nu) =& \int^{1}_{0} F_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{1}{2} I(q; y_i, n, \nu), \end{aligned} \tag{29} \]
where \(I(q; y_i, n, \nu)\) is given by ((28)).
In the second subinterval, \([1,\infty)\), the transformations ((10)) and ((13)), with \(a=1\), were applied, giving \(y=1+(1+t)/(1-t)\). Thus, considering the quadrature nodes \(x_i\), the integral was approximated by
\[\begin{aligned} \label{eq:Fq2p2} I_2(q; n, \nu) =& \int^{\infty}_{1} F_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{2}{(1-x_i)^2} I(q; y_i, n, \nu). \end{aligned} \tag{30} \]
The integration ((7)) was computed by
\[\label{eq:acumr2} F(q;n,\nu) \approx I_1(q; n, \nu)+I_2(q; n, \nu), \tag{31} \]
using the results of ((29)) and ((30)).
All transformations were implicitly applied, i.e., they were only a numerical integration device. The pseudocode for the computation of the cumulative distribution function of the externally studentized normal midrange, denoted pMR, applies the ideas of expression ((31)) and is given by:
1. Input \(q\), \(nu> 0\), \(n>1\), and \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1\), \(2\), \(\ldots\), \(m\);
3. For the first part of the split integral, transform: \(y_i=0.5x_i+0.5\), for \(i=1\), \(2\), \(\ldots\), \(m\);
4. Compute \(fy_i=\frac{1}{2}\textrm{\texttt{pNMR\_aux}}(q,\ y_i,\ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
5. Compute \(I=\sum\limits_{i=1}^{m}w_i fy_i\);
6. For the second part of the split integral, transform: \(y_i=1+(1+x_i)/(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
7. Compute \(fy_i=\textrm{\texttt{pNMR\_aux}}(q,\ y_i,\ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
8. Compute \(fy_i=\ln(fy_i)+\ln(2)-2\ln(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
9. Compute \(fy_i=\exp(fy_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
10. Compute \(I=I+\sum\limits_{i=1}^{m}w_ify_i\);
11. Return \(I\).
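The corresponding sketch of pNMR_aux and pMR mirrors dMR() above, using pNMR() for the inner cumulative distribution function (again only an illustration, not the package internals):

pNMR_aux <- function(s, q, n, nu, m) {
  fx <- (nu / 2) * log(nu) - lgamma(nu / 2) - (nu / 2 - 1) * log(2) +
        (nu - 1) * log(s) - nu * s^2 / 2
  exp(fx) * pNMR(s * q, n, m)                  # f(s; nu) * F_Q(sq)
}
pMR <- function(q, n, nu, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  y1 <- 0.5 * x + 0.5                          # [0, 1]
  I1 <- sum(w * 0.5 * sapply(y1, pNMR_aux, q = q, n = n, nu = nu, m = m))
  y2 <- 1 + (1 + x) / (1 - x)                  # [1, Inf)
  I2 <- sum(w * 2 / (1 - x)^2 * sapply(y2, pNMR_aux, q = q, n = n, nu = nu, m = m))
  I1 + I2
}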
For the externally studentized normal midrange quantile function, the qMR algorithm was constructed. The algorithm applies the Newton-Raphson method and depends on the pMR and dMR algorithms. Its pseudocode is:
1. Input \(n>1\), \(nu>0\), \(m\geq 2\), and \(0<p<1\);
2. Set \(eps=1\times 10^{-13}\), for the error tolerance, and \(maxIt=5000\), for the maximum number of iterations;
3. Get an initial estimate of \(q\), called \(q_0\), by:
3a. If \(p<0.5\), then \(q_0=-0.5\), else go to step 3b;
3b. If \(p>0.5\), then \(q_0=0.5\), else set \(q_0=0\);
4. \(it=0\);
5. While (\(it\leq maxIt\)) do
5a. \(cdf=\textrm{\texttt{pMR}}(q_0,\ n,\ nu,\ m)\);
5b. \(pdf=\textrm{\texttt{dMR}}(q_0,\ n,\ nu,\ m)\);
5c. \(q_1=q_0-(cdf-p)/pdf\);
5d. go to step 6;
6. If \(|q_1-q_0|\leq eps\), then return \(q_1\) and exit; otherwise, update the Newton-Raphson iteration counter, \(it=it+1\), set \(q_0=q_1\), and go to step 5. In step 5, if the counter exceeds the limit \(maxIt\), go to step 7;
7. Print the error message: “iterative process did not achieve convergence in \(maxIt\) steps.”
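As with qNMR, the quantile routine can be expressed through the generic Newton-Raphson helper and the pMR()/dMR() sketches above:

qMR <- function(p, n, nu, m = 64) {
  q0 <- if (p < 0.5) -0.5 else if (p > 0.5) 0.5 else 0
  newton_quantile(p,
                  cdf = function(q) pMR(q, n, nu, m),
                  pdf = function(q) dMR(q, n, nu, m),
                  q0  = q0)
}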
To generate a random sample of size \(N\) from the normal midrange distribution, the algorithm rMR was constructed, with parameters \(n\) and \(\nu\). First, a sample of size \(n\) from the standard normal distribution, \(X_1\), \(X_2\), \(\ldots\), \(X_n\), was simulated, where \(X_i\stackrel{\mathrm{iid}}{\sim}N(0,1)\). A chi-square variable \(U\sim \chi_\nu^2\), with \(\nu\) degrees of freedom, was also simulated, and the following transformation was computed: \[{Q}=\frac{[\max(X_i)+\min(X_i)]/2}{\sqrt{U/\nu}}.\] With infinite degrees of freedom, the following transformation was considered instead: \[{W}=\frac{\max(X_i)+\min(X_i)}{2}.\] The process was repeated \(N\) times and the values of \({Q}\) or \({W}\), depending on the case considered, were stored in a vector of size \(N\).
This distribution can be used to obtain cumulative probabilities and quantiles. Thus, for example, given a quantile \({q}\) or \({w}\), the cumulative probability, provided \(N\) is large, is approximated by \[P( {Q}\leq {q})=\frac{\sum\limits_{i=1}^{N}I( {Q}_i\leq {q})}{N}, \ \nu < \infty,\] or \[P( {W}\leq {w})=\frac{\sum\limits_{i=1}^{N}I( {W}_i\leq {w})}{N}, \ \nu=\infty.\]
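This Monte Carlo approximation is easy to reproduce with the package's own generator; for instance, the empirical probability below should be close to the value of pSMR(2, 5, 3, 32) reported in the examples later in the paper (the sample size and seed here are ours):

library(SMR)
set.seed(1)
Q <- rSMR(100000, size = 5, df = 3)
mean(Q <= 2)   # Monte Carlo estimate of P(Q <= 2)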
The random number generator rMR returns the vector of \(N\) realizations of \({Q}\) or \({W}\) and its pseudocode is presented below. The rMR generator depends on the R functions rnorm() and rchisq(), for obtaining a random vector of independent standard normal variables and a chi-square realization, respectively. The matrix(x, p, q) function, used below, creates a matrix of dimension \(p\times q\) filled by columns with the elements given by the vector \(x\).
1. Input \(N > 1\), \(n > 1\), and \(\nu > 0\);
2. \(x = \textrm{\texttt{rnorm}}(N\times n)\);
3. \(X = \textrm{\texttt{matrix}}(x,N,n)\) of dimension \((N\times n)\);
4. \(u = \textrm{\texttt{rchisq}}(N, \nu)\);
5. \(X=X/\sqrt{u/\nu}\);
6. For \(i=1\) to \(N\) do: \(\textrm{\texttt{rMR}}[i] = [\max(X[i,])+\min(X[i,])]/2\);
7. Return \(\textrm{\texttt{rMR}}\).
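A direct R translation of this pseudocode (a sketch only; with infinite degrees of freedom the chi-square scaling step is skipped, as described above) is:

rMR <- function(N, n, nu = Inf) {
  X <- matrix(rnorm(N * n), N, n)                         # N samples of size n
  if (is.finite(nu)) X <- X / sqrt(rchisq(N, nu) / nu)    # studentize each row
  (apply(X, 1, max) + apply(X, 1, min)) / 2               # midrange of each row
}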
The package SMR provides the following functions, where np is the number of nodes and weights of the Gauss-Legendre quadrature:

dSMR(x, size, df, np=32): computes values of the probability density function, given in ((6)) or ((4));

pSMR(q, size, df, np=32): computes values of the cumulative distribution function, given in ((7)) or ((5));

qSMR(p, size, df, np=32): computes quantiles of the externally studentized normal midrange;

rSMR(n, size, df=Inf): draws a random sample of size \(n\) from the externally studentized normal midrange.
The value of the argument df can be finite or infinite. If df=Inf, values of the probability density, cumulative distribution and quantile functions of the normal midrange (standardized normal midrange) are computed. If the argument df is not specified in the rSMR function, the default value Inf is used and random samples from the normal midrange distribution are drawn. The other functions presented in the previous section are internal algorithms of the SMR package.
As an illustration, consider the following examples:
library(SMR)
set.seed(10701)
x <- 2       # quantile
q <- 2       # quantile
p <- 0.9     # probability
n <- 10      # sample size to be simulated
size <- 5    # normal sample size
df <- 3      # degrees of freedom
np <- 32     # number of points of the Gaussian quadrature
dSMR(x, size, df, np) # SMR pdf
[1] 0.01926172
pSMR(q, size, df, np) # SMR cdf
[1] 0.9851739
qSMR(p, size, df, np) # SMR quantile
[1] 0.8350065
rSMR(n, size, df) # random sample of the SMR distribution
 [1]  0.35108979  0.33786356 -0.13753510 -0.58741681 -0.40358907
 [6] -0.72528615  0.45845331  0.08906021 -1.64157684  0.07022362
In the case \(\nu=\infty\), the result of the standardized normal midrange distribution is returned, as illustrated by the following example:
library(SMR)
x <- 2       # quantile
q <- 2       # quantile
p <- 0.9     # cumulative probability
n <- 10      # sample size to be simulated
size <- 5    # normal sample size
df <- Inf    # degrees of freedom
np <- 32     # number of points of the Gaussian quadrature
dSMR(x, size, df, np) # normal MR pdf
[1] 0.0004487675
pSMR(q, size, df, np) # normal MR cdf
[1] 0.9999408
qSMR(p, size, df, np) # normal MR quantile
[1] 0.6531507
rSMR(n, size, df) # random sample of the normal MR distribution
 [1] -0.52475079  0.10198842 -0.38647236  0.18939367  0.17756023
 [6] -1.03384242  0.35608349  1.00629514  0.06360581  0.70835452
A concrete application of the SMR package to a real dataset is now considered. We use the data on nitrogen contents of red clover plants presented in Steel and J. Torrie (1980), page 180. A completely randomized design with one factor with \(6\) levels and \(r=5\) replications was carried out. The mean squared error (MSE) was \(11.7887\), with \(\nu=24\) degrees of freedom. The factor means are presented in Table 1. A multiple comparison procedure was suggested by Batista (2012). The suggested procedure is similar to the Tukey test (Tukey 1953). First, the least significant difference of the proposed test is computed by
\[\begin{aligned}
\Delta=& 2q_{1-\alpha/2;n,\nu}\sqrt{\dfrac{MSE}{r}},
\end{aligned}\] where \(q_{1-\alpha/2;n,\nu}\) is the \(100(1-\alpha/2)\%\) quantile of the externally studentized normal midrange. This value can be computed with the function qSMR(0.975, 6, 24, np=32)=1.0049 of the SMR package, for the significance level \(\alpha=0.05\).
The \(\Delta\) value in this case is \[\begin{aligned} \Delta=& 2\times 1.0049\times\sqrt{\dfrac{11.7887}{5}}= 3.0859. \end{aligned}\]
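This computation is straightforward to reproduce with the package (the object names below are ours):

library(SMR)
q975  <- qSMR(0.975, 6, 24, np = 32)   # approximately 1.0049
Delta <- 2 * q975 * sqrt(11.7887 / 5)  # approximately 3.0859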
Finally, the test is performed in the same way as the Tukey test. The results of the proposed midrange and Tukey tests are shown in Table 1. The proposed midrange test shows no ambiguous results in this example, unlike the Tukey test, for which some levels receive two or more letters.
Table 1: Treatment means with the results of the Tukey and midrange tests.

Levels | Means | Tukey\(^*\) | Midrange\(^*\)
---|---|---|---
5 | 13.26 | c | d
3 | 14.64 | c | d
6 | 18.70 | cb | c
4 | 19.92 | cb | c
2 | 23.98 | ba | b
1 | 28.82 | a | a

\(^*\) Means with the same letter are not significantly different.
The importance of the externally studentized normal midrange distribution can be enormous in the analysis of experiments, since the midrange estimator is more efficient than the sample mean in platykurtic distributions. Therefore, this distribution could be useful in the proposition of multiple comparison procedures, that could potentially show better results (more robust and powerful) than the traditional tests based on the externally studentized normal range.
This package is easy to use and shows very high accuracy. The accuracy of critical values is estimated through the computation of the difference between two results using different numbers of quadrature points, according to expression ((9)). The number of quadrature points can be chosen and, for most cases, \(32\) points are enough to achieve high accuracy. Monte Carlo simulations were also used to evaluate the accuracy of the proposed methods and the consistency of the SMR package. The algorithms were proposed and implemented using Gauss-Legendre quadrature and the Newton-Raphson method in R, resulting in the SMR package, which is available for download from CRAN. An important aspect is that the SMR package is written in R. Code in Fortran or C could make the SMR functions faster, a feature that is planned for future releases of the package. Another important aspect is the possibility of using, in future releases, two-dimensional interpolations when dSMR, pSMR and qSMR are called with long vectors of arguments \(x\), \(q\), and \(p\). In this case, two-dimensional grids for \(\{x\), \(q\), \(p\}\) and \(\{size\}\) can be computed offline and the approx function used to interpolate.
We would like to thank CNPq and CAPES for their financial support. We would also like to thank two anonymous referees and the Editor for helpful comments which have resulted in an improved manuscript and package.