The main purpose of this paper is to present the main algorithms underlying the construction and implementation of the SMR package, whose aim is to compute the studentized normal midrange distribution. Details on the externally studentized normal midrange and standardized normal midrange distributions are also given. The package follows the same structure as the probability functions implemented in R, that is: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR) and the random number generating function (rSMR). Pseudocode and illustrative examples of how to use the package are presented.
The SMR package was created to provide an infrastructure for the studentized midrange distribution. This new distribution was inspired by the externally studentized range distribution, which has been widely applied in multiple comparison procedures to identify the best treatment level and has been extensively studied theoretically. Several algorithms to compute the probability density, cumulative distribution, and quantile functions were published by Lund and J. R. Lund (1983) and Copenhaver and B. Holland (1988). Recently, Batista and D. F. Ferreira (2014) developed the theory of the externally studentized normal midrange distribution, which, to the best of the authors' knowledge, is new in the scientific literature. The cumulative distribution, probability density, and quantile functions were obtained analytically by these authors.
The required multidimensional integrations must be computed numerically. Therefore, Batista and D. F. Ferreira (2014) applied Gaussian quadrature to this task. In particular, they chose the Gauss-Legendre quadrature for solving the numerical integrations, because it yields more accurate results than other Gaussian quadrature methods. The quantile function of the externally studentized normal midrange was computed by the Newton-Raphson method. Based on these numerical methods, the SMR package was built and released.
The package name was chosen to identify the Studentized MidRange distribution. The package follows the same structure as the probability functions implemented in R. The following functions were implemented in the package: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR), and the random number generating function (rSMR).
Therefore, the main purpose of this paper is to present the main algorithms underlying the construction and implementation of the SMR package, showing pseudocode of its functions and providing the fundamental ideas for the appropriate use of the package through illustrative examples.
First, details on the externally studentized normal midrange and standardized normal midrange distributions are given. Second, the algorithms for the construction of the package and their respective pseudocodes are presented. Third, details of the SMR package functions are shown. Finally, illustrative examples of the package are presented.
Let \(X_{1,n}\), \(X_{2,n}\), \(\ldots\), \(X_{n,n}\) be the order statistics of a random sample \(X_1\), \(X_2\), \(\ldots\), \(X_n\) of size \(n\) from a population with cumulative distribution function (c.d.f.) \(F(x)\) and probability density function (p.d.f.) \(f(x)\) arranged in order of magnitude of values.
The midrange \(R\) for a sample \(X_1\), \(X_2\), \(\ldots\), \(X_n\) is defined as \[\begin{aligned} R&=&(X_{1,n}+X_{n,n})/2, \end{aligned}\] see, e.g., Rider (1957).
The p.d.f. and c.d.f. of \({R}\) are
\[\begin{aligned} f_{{R}}({r})&= \int^{{r}}_{-\infty}2n(n-1)f(y) f(2{r}-y)[F(2{r}-y)-F(y)]^{n-2}dy \label{eq:fdpRbar} \end{aligned} \tag{1} \]
and
\[\begin{aligned} F_{{R}}({r}) &= n\int^{{r}}_{-\infty}f(y)[F(2{r}-y)-F(y)]^{n-1}dy, \label{eq:func_dist_r} \end{aligned} \tag{2} \]
see, e.g., David and H. N. Nagaraja (2003).
The studentized midrange \(Q\) for a sample \(X_1\), \(X_2\), \(\ldots\), \(X_n\) is defined as \[\begin{aligned} Q&=&\frac{R/\sigma}{S/\sigma}=\frac{R}{S}=\frac{W}{X}, \end{aligned}\] see, e.g., Batista (2012). When \(S\) and \(R\) are independent, \(Q\) corresponds to the externally studentized midrange; otherwise \(Q\) is the internally studentized midrange. The former occurs, for example, when \(R\) is computed from one normal random sample of size \(n\) and \(S\) is obtained from another random sample of size \(m\). It also occurs when \(R/S\) is computed from the treatment means of an experimental factor with \(n\) levels and \(S=\sqrt{QME}\), where \(QME\) is the associated experimental mean square error with \(m-1\) degrees of freedom (Searle 1987). When considering the externally studentized midrange, it is convenient to express \(Q\) as the ratio of \(W = R/\sigma\), the standardized midrange, to \(X=S/\sigma\).
Considering the particular case of the normal distribution with mean \(\mu=0\), without loss of generality, and variance \(\sigma^2\), the p.d.f. of \(X=S/\sigma\) is given by (Newman 1939)
\[\label{fxv} f(x;\nu)=\frac{\nu^{\nu/2}}{\Gamma(\nu/2)2^{\nu/2-1}}x^{\nu-1}e^{-\nu x^2/2}\textrm{, } x\geq 0. \tag{3} \]
An important distribution for this study, well documented in Batista (2012), Gumbel (1958) and Pillai (1950), is the standardized normal midrange distribution, defined by \({W}= {R}/\sigma\). The p.d.f. of \({W}\) is
\[\label{eq:distw1} f_{ {W}}( {w})=\int^{ {w}}_{-\infty}2 n(n-1)\phi(y) \phi(2 {w}-y)[\Phi(2 {w}-y)-\Phi(y)]^{n-2} dy. \tag{4} \]
and the c.d.f. is
\[\label{eq:acumw} F_{ {W}}( {w})=\int^{ {w}}_{-\infty} n\phi(y)[\Phi(2 {w}-y)-\Phi(y)]^{n-1}dy, \tag{5} \]
both results found in David and H. N. Nagaraja (2003), where \(\Phi(.)\) and \(\phi(.)\) are the c.d.f. and p.d.f. of the standard normal distribution \(N(0,1)\), respectively.
Therefore, according to Batista and D. F. Ferreira (2014), the p.d.f. and c.d.f. of \(Q\), in the particular case of the normal distribution with mean \(\mu=0\) and variance \(\sigma^2\), are respectively
\[\begin{aligned} \label{eq:densq} f( {q};n,\nu) =&\int^{\infty}_{0} \int^{x {q}}_{-\infty}2n(n-1)x\phi(y) \phi(2x {q}-y) [\Phi(2x {q}-y)-\Phi(y)]^{n-2}f(x;\nu)dydx, \end{aligned} \tag{6} \]
and
\[\label{eq:acumr} F( {q};n,\nu)=\int^{\infty}_{0} \int^{x {q}}_{-\infty}n\phi(y)[\Phi(2x {q}-y)-\Phi(y)]^{n-1}f(x;\nu)dydx, \tag{7} \]
where \(\nu\) is the number of degrees of freedom.
The p.d.f. ((4)) and c.d.f. ((5)) are central to the implementation of the externally studentized normal midrange algorithms in the SMR package.
The functions implemented in the SMR package depend on specific R functions, namely: pnorm, to obtain the cumulative distribution function of the standard normal, \(\Phi(x)\); dnorm, to obtain the standard normal probability density function, \(\phi(x)\); and lgamma, to obtain the logarithm of the gamma function. To compute the nodes and weights of the Gauss-Legendre quadrature, an R function based on the method presented by Hildebrand (1974) was implemented. In the following subsections the algorithms and pseudocodes used in the construction of each routine of the SMR package are presented.
The basic idea of Gauss-Legendre quadrature of a function \(f(x)\), according to Davis and P. Rabinowitz (1984), is to write it as follows:
\[\label{eq:quadgauss2} \int\limits_{-1}^{1}f(x)dx=\int\limits_{-1}^{1}w(x)g(x)dx\approx \sum\limits_{i=1}^{m}w_ig(x_i), \tag{8} \]
where \(f(x)\equiv w(x)g(x)\) and \(w(x)\) is the weight function in the Gaussian quadrature, and \(x_i\) and \(w_i\) are the nodes and weights, respectively, in an \(m\)-point Gaussian quadrature rule, for \(i\) \(=\) \(1\), \(2\), \(\ldots\), \(m\). The weight function is \(w(x)=1\) in the Gauss-Legendre quadrature, thus \(f(x)=g(x)\).
Let the symmetric tridiagonal matrix \(\mathcal{J}\), with elements \(\mathcal{J}_{i,i}=a_{i-1},\, i=1,\ldots,m\) and \(\mathcal{J}_{i-1,i}=\mathcal{J}_{i,i-1}=\sqrt{J_{i,i-1}J_{i-1,i}}= \sqrt{b_{i-1}},\, i=2,\ldots,m\), be the Jacobi matrix, given by \[\mathcal{J}=\left( \begin{array}{llllll} a_0 & \sqrt{b_1} & 0 & \ldots & \ldots & \ldots\\ \sqrt{b_1} & a_1 & \sqrt{b_2} & 0 & \ldots & \ldots \\ 0 & \sqrt{b_2} & a_2 & \sqrt{b_3} & 0 & \ldots \\ 0 & \ldots & \ldots & \ldots & \ldots & 0 \\ \ldots & \ldots & 0 & \sqrt{b_{m-2}} & a_{m-2} & \sqrt{b_{m-1}} \\ \ldots & \ldots & \ldots & 0 & \sqrt{b_{m-1}} & a_{m-1} \end{array} \right),\] where \(a_{i-1}=0\), for \(i=1,\ldots,m\) and \(\sqrt{b_{i-1}}=(i-1)/\sqrt{4(i-1)^2-1}\), for \(i=2,\ldots,m\), considering the Gauss-Legendre quadrature.
For computing the quadrature weights and nodes, the eigenvalues and eigenvectors of \(\mathcal{J}\) were obtained (Hildebrand 1974). The nodes of the Gaussian quadrature are the eigenvalues of this tridiagonal matrix \(\mathcal{J}\), and the weights can be computed from the corresponding normalized eigenvectors \(\mathbf{\phi}^{(i)}\) associated with the eigenvalues \(x_i\). The weight \(w_i\) can be computed from the first element \(\phi_1^{(i)}\) of the \(i\)th corresponding eigenvector by \[w_i=\mu_0 \left(\phi_1^{(i)}\right)^2,\] where \(\mu_0=2\) for the Gauss-Legendre quadrature (Gil, J. Segura, and N. Temme 2007).
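As an illustration, this Golub-Welsch construction can be written in a few lines of R. The sketch below is not the package's internal implementation, and the helper name gauss_legendre is ours; it only mirrors the Jacobi-matrix recipe just described:

gauss_legendre <- function(m) {
  # off-diagonal elements sqrt(b_{i-1}) = (i-1)/sqrt(4(i-1)^2 - 1), i = 2, ..., m
  b <- (1:(m - 1)) / sqrt(4 * (1:(m - 1))^2 - 1)
  J <- diag(0, m)                          # diagonal elements a_{i-1} = 0
  J[cbind(1:(m - 1), 2:m)] <- b
  J[cbind(2:m, 1:(m - 1))] <- b
  e <- eigen(J, symmetric = TRUE)
  list(nodes   = e$values,                 # quadrature nodes x_i
       weights = 2 * e$vectors[1, ]^2)     # mu_0 = 2 for Gauss-Legendre
}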
The set \(\{x_i,\, w_i\}\) is determined such that expression ((8)) yields an exact result for polynomials of degree \(2m-1\) or less (Chihara 1978). For non-polynomial functions the Gauss-Legendre quadrature error is defined by
\[\label{eq:erroquad} \varepsilon_m \approx \left| \sum^{q}_{i=1} w_ig(x_i)-\sum^{m}_{i=1}w_i g(x_i)\right|,\ \ \ q>m. \tag{9} \]
However, the externally studentized normal midrange distribution depends on integrals over infinite intervals. The integral over an infinite range should be changed into an integral over [-1,1] by using the following transformations (Davis and P. Rabinowitz 1984)
\[\begin{aligned} \int\limits_{a}^{\infty}f(y)dy&=&\int\limits_{-1}^{1} f\left(a+\frac{1+t}{1-t}\right)\frac{2}{(1-t)^2}dt, \label{eq:transf1p1} \end{aligned} \tag{10} \]
\[\begin{aligned} \int\limits_{-\infty}^{b}f(y)dy&=&\int\limits_{-1}^{1} f\left(b+\frac{1+t}{t-1}\right)\frac{2}{(t-1)^2}dt, \label{eq:transf2p1} \end{aligned} \tag{11} \]
\[\begin{aligned} \int\limits_{a}^{b}f(y)dy&=&\frac{b-a}{2} \int\limits_{-1}^{1}f\left(\frac{(b-a)}{2}t+\frac{(b+a)}{2} \right) dt. \label{eq:transf3p1} \end{aligned} \tag{12} \]
Therefore, the integrals were computed by applying the Gauss-Legendre quadrature rule on these transformed variables by
\[\begin{aligned} \int\limits_{a}^{\infty}f(y)dy &\approx& \sum_{i=1}^m w_i \frac{2}{(1-t_i)^2}f\left(a+\frac{1+t_i}{1-t_i}\right), \label{eq:transf1} \end{aligned} \tag{13} \]
\[\begin{aligned} \int\limits_{-\infty}^{b}f(y)dy &\approx& \sum_{i=1}^m w_i \frac{2}{(t_i-1)^2}f\left(b+\frac{1+t_i}{t_i-1}\right),\label{eq:transf2} \end{aligned} \tag{14} \]
\[\begin{aligned} \int_a^b f(y)dy &\approx& \frac{b-a}{2} \sum_{i=1}^m w_i f\left(\frac{b-a}{2}t_i + \frac{a+b}{2}\right).\label{eq:transf3} \end{aligned} \tag{15} \]
For more details, see Olver, D. W. Lozier, R. F. Boisvert, and C. W. Clark, editors (2010).
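A quick check of these transformed rules, assuming the gauss_legendre() helper sketched above, is to approximate a semi-infinite integral with a known value, here \(\int_{-\infty}^{0}\phi(y)dy=0.5\) via transformation ((14)):

gl <- gauss_legendre(32)
t  <- gl$nodes
b  <- 0
y  <- b + (1 + t) / (t - 1)                 # maps [-1, 1] onto (-Inf, b]
sum(gl$weights * 2 / (t - 1)^2 * dnorm(y))  # approximately 0.5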
The Newton-Raphson approximation aims to find the roots of a real function, that is, \[y:f(y)=0.\] Thus, iteratively, the Newton-Raphson method follows
\[\begin{aligned} y_{k+1}=y_k-\frac{f(y_k)}{f'(y_k)}, \ \ k=0,1,2,\ldots \end{aligned}\]
where \(f'(y)\) is the first derivative of the function \(f(y)\), see, e.g., Miller (2014).
The process starts with an arbitrary initial value \(y_0\); if it is reasonably close to the true root, the method will usually converge in a few iterations, provided \(f'(y_k)\neq 0\).
In this study the main objective was to find the quantile \(q\), given that the cumulative probability is \(p\), \(0<p<1\). Hence, considering the cumulative distribution function, \(F(q)\), and the probability density, \(f(q)\), functions of the random variable \(Q\), the desired solution is \[q: F(q)-p = 0.\]
The solution is obtained by
\[\begin{eqnarray} q_{k+1}=q_k-\frac{F(q_k)-p}{f(q_k)}, \ \ k=0,1,2,\ldots \label{eq:newtonraphson2} \end{eqnarray} \tag{16} \]
that should be computed sequentially until a certain convergence criterion is reached.
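The iteration ((16)) is generic: it only needs a cumulative distribution function and its density. A minimal R sketch (the name newton_quantile is ours, not a package function) is:

newton_quantile <- function(p, cdf, pdf, q0 = 0, eps = 1e-13, maxIt = 5000) {
  for (it in seq_len(maxIt)) {
    q1 <- q0 - (cdf(q0) - p) / pdf(q0)   # Newton-Raphson step of (16)
    if (abs(q1 - q0) <= eps) return(q1)
    q0 <- q1
  }
  stop("iterative process did not achieve convergence in maxIt steps")
}
# e.g., newton_quantile(0.975, pnorm, dnorm) agrees with qnorm(0.975)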
For the standardized normal midrange probability density function ((4)), the integration interval \((-\infty, \, q]\) was divided into two subintervals and \({w}\) was replaced by \(q\) in the algorithm, as done for the cumulative distribution function. Hence, the subintervals \((-\infty,q-8]\) and \([q-8,q]\) were considered. For the first subinterval, the transformation of variable \(y=b+(1+t)/(t-1)\) was used, resulting in an integration interval \([-1,1]\), as required to apply the Gauss-Legendre quadrature. With the same purpose, the transformation \(z=(b-a)t/2+(a+b)/2\) was applied to the second subinterval. The algorithm was denoted by dNMR. The approximation of ((4)) was computed by
\[\begin{aligned} f_{Q}(q)=&2 n(n-1)\Bigg[\int^{q-8}_{-\infty}\phi(y) \phi(2q-y)[\Phi(2q-y)-\Phi(y)]^{n-2}dy \nonumber\\ & \hphantom{2 n(n-1)} +\int_{q-8}^{q}\phi(y) \phi(2q-y)[\Phi(2q-y)-\Phi(y)]^{n-2}dy\Bigg]\nonumber\\ \approx & 2 n(n-1) \Bigg[\sum_{i=1}^m w_i\frac{2}{(x_i-1)^2} \phi(y_i) \phi(2q-y_i)[\Phi(2q-y_i)-\Phi(y_i)]^{n-2}\nonumber\\ & \hphantom{2 n(n-1)} + 4\sum_{i=1}^m w_i\phi(z_i) \phi(2q-z_i)[\Phi(2q-z_i)-\Phi(z_i)]^{n-2}\Bigg]. \label{eq:distw1ap} \end{aligned} \tag{17}\]
The choice of the value \(8\) was motivated by the same choice made by Copenhaver and B. Holland (1988) to the normal range distribution. Since the midrange distribution is more peaked than the range distribution, this value seems to be a good choice for the standardized normal midrange distribution. Further investigations could be undertaken to optimize the choice of these subintervals.
The pseudocode to compute the standardized normal midrange probability density function is given by:
1. Input \(q\), \(n>1\), \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1\), \(2\), \(\ldots\), \(m\);
3. Transform \(x_i\) from \([-1,1]\) to \((-\infty,\ q-8]\), using \(y_i=(q-8)+(1+x_i)/(x_i-1)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
4. Compute \(aux_i=\Phi(2q-y_i)-\Phi(y_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
5. Compute \(aux1_i=\phi(y_i)\) and \(aux2_i=\phi(2q-y_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
6. If \(aux_i\leq 0\), then \(aux_i=\epsilon\), for \(i=1\), \(2\), \(\ldots\), \(m\);
7. Compute \(fy_i=\ln(aux1_i)+(n-2)\ln(aux_i)+\ln(aux2_i)+\ln(2)+\ln(n)+\ln(n-1)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
8. Apply \(fy_i=\exp(fy_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
9. Compute \(fy_i=[2/(x_i-1)^2]fy_i\), for \(i=1\), \(2\), \(\ldots\), \(m\);
10. Compute \(I=\sum\limits_{i=1}^{m}w_i fy_i\);
11. Transform \(x_i\) from \([-1,1]\) to \([q-8,q]\), using \(a=q-8\) and \(b=q\), where \(y_i=[(b-a)/2]x_i+(a+b)/2\), for \(i=1\), \(2\), \(\ldots\), \(m\);
12. Repeat steps 4 to 8, with the \(y_i\) from step 11, for \(i=1\), \(2\), \(\ldots\), \(m\);
13. Compute \(fy_i=[(b-a)/2]fy_i\), for \(i=1\), \(2\), \(\ldots\), \(m\);
14. Compute \(I=I+\sum\limits_{i=1}^{m}w_ify_i\);
15. Return \(I\).
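For concreteness, a direct R transcription of this pseudocode (a sketch only, not the SMR internals; it assumes the gauss_legendre() helper defined earlier) could read:

dNMR <- function(q, n, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  integrand <- function(y) {
    aux <- pnorm(2 * q - y) - pnorm(y)
    aux[aux <= 0] <- .Machine$double.eps
    exp(log(dnorm(y)) + (n - 2) * log(aux) + log(dnorm(2 * q - y)) +
        log(2) + log(n) + log(n - 1))
  }
  y1 <- (q - 8) + (1 + x) / (x - 1)          # maps [-1, 1] onto (-Inf, q - 8]
  I  <- sum(w * 2 / (x - 1)^2 * integrand(y1))
  y2 <- 4 * x + (q - 4)                      # maps [-1, 1] onto [q - 8, q]
  I + sum(w * 4 * integrand(y2))
}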
The algorithm for computing the cumulative distribution function of standardized normal midrange, expression ((5)), was developed using the Gauss-Legendre quadrature. This will be essential in the construction of the externally studentized normal midrange. First, the integration interval was divided into two subintervals to achieve higher accuracy as suggested by Quarteroni, R. Sacco, and F. Saleri (2007) and the following division was considered:
\[\label{eq:inftyw} F_Q(q)=\int^{q-8}_{-\infty}n\phi(y)[\Phi(2 q-y)-\Phi(y)]^{n-1}dy+\int_{q-8}^{q}n\phi(y)[\Phi(2q-y)-\Phi(y)]^{n-1}dy. \tag{18} \]
In these two parts, different variable transformations were considered. In the first part, the transformation \(y=(q-8)+(1+t)/(t-1)\) was applied to obtain an integration interval \([-1,1]\), needed to solve this integral by Gauss-Legendre quadrature. Considering \(y_i\) obtained by the above transformation, where \(t=x_i\) is the \(i\)th Gauss-Legendre quadrature node, \(i=1\), \(2\), \(\ldots\), \(m\), and \(m\) is the number of quadrature points, then the first part of the integral ((18)) was approximated by using the transformations ((11)) and ((14)). Hence,
\[\begin{aligned} \label{eq:inftywp1} I_1(q)=&\int^{q-8}_{-\infty}n\phi(y)[\Phi(2 q-y)-\Phi(y)]^{n-1}dy \nonumber\\ & \approx n \sum_{i=1}^m w_i \frac{2}{(x_i-1)^2}\phi(y_i)[\Phi(2 q-y_i)-\Phi(y_i)]^{n-1}. \end{aligned} \tag{19} \]
For the second part, the transformation \(y=(b-a)t/ 2+(a+b)/2\) was used, with the same goal as the first transformation. These changes were implemented implicitly, as a numerical device only. Therefore, the second part of the integral ((18)) was approximated by using the transformations ((12)) and ((15)), and is given by
\[\begin{aligned} \label{eq:inftywp2} I_2(q)=&\int_{q-8}^{q}n\phi(y)[\Phi(2 q-y)-\Phi(y)]^{n-1}dy \nonumber\\ & \approx \frac{n(b-a)}{2}\sum_{i=1}^m w_i \phi(y_i)[\Phi(2 q-y_i)-\Phi(y_i)]^{n-1}, \end{aligned} \tag{20} \]
where \(b=q\) and \(a=q-8\).
Considering the expressions ((19)) and ((20)) the cumulative distribution function ((18)) was approximated by
\[\label{eq:inftyw3} F_Q(q)\approx I_1(q)+I_2(q), \tag{21} \]
in the same way as the numerical approximation adopted by McBane (2006) and by Verma and M. C. Suarez (2014) for obtaining the probability density and cumulative distribution functions and critical values for the range ratio statistics. The main difference between those works and the present results is the use of the Gauss-Legendre quadrature, in addition to the distribution functions to which the quadrature was applied. All other integrals computed in this work are based on this type of numerical approximation.
In the algorithm pseudocode, the standardized normal midrange, denoted by \({w}\), is replaced by \(q\), following the notation of expression ((18)) and avoiding confusion with the quadrature weights. Hence, the pseudocode using the Gauss-Legendre quadrature to obtain the cumulative distribution function of the standardized normal midrange ((21)), which is denoted by pNMR, is given by:
1. Input \(n> 1\), \(q\), and \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1,\ 2,\ldots,\ m\);
3. Transform \(x_i\) from the interval \([-1,1]\) to \((-\infty,\ q-8]\): \(y_i=(q-8)+(1+x_i)/(x_i-1);\)
4. Compute \(aux1_i=\Phi(y_i)\);
5. Compute \(aux2_i=\phi(y_i)\);
6. Compute \(aux_i=\Phi(2q-y_i)-aux1_i\);
7. If \(aux_i\leq 0\), then \(aux_i=\epsilon\);
8. Compute \(fy_i=\log(aux2_i)+(n-1)\log(aux_i);\)
9. Compute \(fy_i=\exp(fy_i);\)
10. Compute \(fy_i=n(2/(x_i-1)^2)fy_i;\)
11. The quantities in steps 3 to 10 should be computed for each node \(x_i\), \(i=1\), \(2\), \(\ldots\), \(m\). Therefore, compute \(I=\sum\limits_{i=1}^{m}w_ify_i\);
12. Set \(a=q-8\) and \(b=q\);
13. Transform \(x_i\) from the interval \([-1,1]\) to \([q-8,q]\): \(y_i=[(b-a)/2]x_i+(a+b)/2\), for \(i=1\), \(2\), \(\ldots\), \(m\);
14. Repeat steps 4 to 9, with the \(y_i\) of step 13, for \(i=1\), \(2\), \(\ldots\), \(m\);
15. Compute \(fy_i=n(b-a)/2\times fy_i\), for \(i=1\), \(2\), \(\ldots\), \(m\);
16. Compute \(I=I+\sum\limits_{i=1}^{m}w_ify_i;\)
17. Return \(I\).
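An analogous R sketch of pNMR (again assuming the gauss_legendre() helper, and not the package's internal code) is:

pNMR <- function(q, n, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  integrand <- function(y) {
    aux <- pnorm(2 * q - y) - pnorm(y)
    aux[aux <= 0] <- .Machine$double.eps
    n * exp(log(dnorm(y)) + (n - 1) * log(aux))
  }
  y1 <- (q - 8) + (1 + x) / (x - 1)          # (-Inf, q - 8]
  I  <- sum(w * 2 / (x - 1)^2 * integrand(y1))
  y2 <- 4 * x + (q - 4)                      # [q - 8, q]
  I + sum(w * 4 * integrand(y2))
}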
The algorithm qNMR was derived to compute the standardized normal midrange quantile function. The pseudocode makes use of the Newton-Raphson method, expression ((16)), and is given by:
1. Input \(n>1\), \(m\geq 2\), and \(0<p<1\);
2. Set \(eps=1\times 10^{-13}\), for the error tolerance, and \(maxIt=5000\), for the maximum number of iterations;
3. Get an initial estimate of \(q\), called \(q_0\), by:
3a. If \(p<0.5\), then \(q_0=-0.5\), else go to step 3b;
3b. If \(p>0.5\), then \(q_0=0.5\), else set \(q_0=0\);
4. \(it=0\);
5. While (\(it\leq maxIt\)) do
5a. \(cdf=\textrm{\texttt{pNMR}}(q_0,\ n,\ m)\);
5b. \(pdf=\textrm{\texttt{dNMR}}(q_0,\ n,\ m)\);
5c. \(q_1=q_0-(cdf-p)/pdf\);
5d. go to step 6;
6. If \(|q_1-q_0|\leq eps\), then return \(q_1\) and exit; otherwise, update the Newton-Raphson iteration counter, \(it=it+1\), set \(q_0=q_1\), and go to step 5. In step 5, if the counter exceeds the limit \(maxIt\), go to step 7;
7. Print the error message: “iterative process did not achieve convergence in \(maxIt\) steps.”
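In terms of the sketches above, qNMR reduces to a call to the generic Newton-Raphson helper (again an illustration, not the package code):

qNMR <- function(p, n, m = 64) {
  q0 <- if (p < 0.5) -0.5 else if (p > 0.5) 0.5 else 0   # initial estimate
  newton_quantile(p,
                  cdf = function(q) pNMR(q, n, m),
                  pdf = function(q) dNMR(q, n, m),
                  q0  = q0)
}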
For computing the externally studentized normal midrange probability density function, given in ((6)), note that the innermost integral is the probability density function of the standardized normal midrange, given by ((4)) and ((17)). Thus, the dNMR algorithm presented above was reused for computing the probability density function of interest. Also, the variable \(x\) (standardized standard deviation) is replaced by \(s\) in the implemented algorithm, avoiding confusion with the quadrature nodes. The outermost integral was divided into two parts and a Gauss-Legendre quadrature was also applied in each part. The integration intervals are given by \([0,1]\) and \([1,\infty)\), the split point \(1\) being the modal value of \(f(x;\nu)\), as discussed in the next subsection. Thus,
\[\begin{eqnarray}\label{eq:densq2} f(q; n, \nu) &=& \int^{\infty}_{0} s f_Q(s q) f(s;\nu) ds\nonumber\\ &=& \int^{1}_{0} s f_Q(s q) f(s;\nu) ds + \int^{\infty}_{1} s f_Q(s q) f(s;\nu) ds, \end{eqnarray} \tag{22} \]
where \(f_Q(sq)\) was approximated by expression ((17)), with \(q\) substituted by \(sq\), and computed by the algorithm dNMR, described in the previous section.
The innermost integral multiplied by the probability density function of the variable \(s\), \(f(s;\nu)\), was computed by
\[\begin{aligned} \label{eq:InnerMostfq} d(q; s, n, \nu) =& s\,\textrm{\texttt{dNMR}}(sq, n, m)\times \frac{\nu^{\nu/2}}{\Gamma(\nu/2) 2^{\nu/2-1}}s^{\nu-1}e^{-\nu s^2/2}. \end{aligned} \tag{23} \]
The auxiliary algorithm dSMR_aux was constructed to compute ((23)). Its pseudocode is given by:
1. Input \(s\) (quadrature node at the \(i\)th step), \(q\), \(n>1\), \(nu>0\), and \(m\geq 2\);
2. \(\Upsilon=\textrm{\texttt{dNMR}}(s\times q, \ n, \ m)\);
3. Compute \[dsln=(nu/2)\ln(nu)-\ln\Gamma(nu/2)-(nu/2-1)\ln(2)+(nu-1)\ln(s)-nu\times s^2/2;\]
4. Compute \(fx=s\times\exp(dsln)\);
5. Compute \(d=fx \times \Upsilon\);
6. Return \(d\).
Each integration subinterval of ((22)) was appropriately transformed, enabling the Gauss-Legendre quadrature to be applied for approximating the integral ((6)). For this purpose, in the first integration subinterval, \([0,1]\), the transformation \(y=(b-a)t/2+(a+b)/2\) was adopted. Hence,
\[\begin{aligned} \label{eq:fq2p1} I_1(q; n, \nu) =& \int^{1}_{0} s f_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{1}{2} d(q; y_i, n, \nu), \end{aligned} \tag{24} \]
where \(d(q; y_i, n, \nu)\) is computed by ((23)), with corresponding pseudocode given by dSMR_aux.
In the second subinterval, \([1,\infty)\), the transformation ((13)), with \(a=1\), was applied, given by \(y=1+(1+t)/(1-t)\). Thus, considering the quadrature nodes \(x_i\), the integral was approximated by
\[\begin{aligned} \label{eq:fq2p2} I_2(q; n, \nu) =& \int^{\infty}_{1} s f_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{2}{(1-x_i)^2} d(q; y_i, n, \nu). \end{aligned} \tag{25} \]
The integration ((6)) was approximated by
\[\label{eq:densr2} f(q;n,\nu) \approx I_1(q; n, \nu)+I_2(q; n, \nu), \tag{26} \]
where \(I_1(q; n, \nu)\) and \(I_2(q; n, \nu)\) were computed by ((24)) and ((25)), respectively.
Again, all transformations were implicitly applied, as a numerical integration device only. Expression ((26)) was used to obtain the pseudocode for the computation of the probability density function of the externally studentized normal midrange, denoted dMR. It is given by:
1. Input \(q\), \(n>1\), \(nu>0\), and \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1\), \(2\), \(\ldots\), \(m\);
3. Transform the variable \(x_i\) from \([-1,1]\) to \([0,1]\), using \(y_i=0.5x_i+0.5\), for \(i=1\), \(2\), \(\ldots\), \(m\);
4. Compute \(fy_i=\frac{1}{2}\textrm{\texttt{dSMR\_aux}}(y_i,\ q,\ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
5. Compute \(I=\sum\limits_{i=1}^{m} w_ify_i\);
6. Transform \(x_i\) from \([-1,1]\) to \([1,\infty)\): \(y_i=1+(1+x_i)/(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
7. Compute \(fy_i=\textrm{\texttt{dSMR\_aux}}(y_i,\ q, \ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
8. Compute \(fy_i=\ln(fy_i)+\ln(2)-2\ln(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
9. Compute \(fy_i=\exp(fy_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
10. Compute \(I=I+\sum\limits_{i=1}^{m}(fy_i \times w_i)\);
11. Return \(I\).
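A compact R sketch of dSMR_aux and dMR, built on the dNMR() and gauss_legendre() sketches above (an illustration under those assumptions, not the SMR internals), is:

dSMR_aux <- function(s, q, n, nu, m) {
  dsln <- (nu / 2) * log(nu) - lgamma(nu / 2) - (nu / 2 - 1) * log(2) +
          (nu - 1) * log(s) - nu * s^2 / 2
  s * exp(dsln) * dNMR(s * q, n, m)            # s * f(s; nu) * f_Q(sq)
}
dMR <- function(q, n, nu, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  y1 <- 0.5 * x + 0.5                          # [0, 1]
  I1 <- sum(w * 0.5 * sapply(y1, dSMR_aux, q = q, n = n, nu = nu, m = m))
  y2 <- 1 + (1 + x) / (1 - x)                  # [1, Inf)
  I2 <- sum(w * 2 / (1 - x)^2 * sapply(y2, dSMR_aux, q = q, n = n, nu = nu, m = m))
  I1 + I2
}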
Quadratures with \(64\) points are sufficient for most circumstances, although it is possible to increase the number of quadrature points to achieve the desired accuracy. The Newton-Raphson method was used to compute quantiles, since the probability density function, i.e., the first derivative of the cumulative distribution function, is available. The following subsections describe the algorithms and show their pseudocodes.
Note that the innermost integral of the externally studentized normal midrange cumulative distribution function, given in ((7)), is the cumulative distribution function of the standardized normal midrange, given by ((5)) and ((18)). Thus, the pNMR algorithm presented above was reused for computing the cumulative distribution function of interest. Also, the variable \(x\) (standardized standard deviation) is replaced by \(s\) in the implemented algorithm, avoiding confusion with the quadrature nodes. The outermost integral was divided into two parts and a Gauss-Legendre quadrature was also applied in each part. The integration intervals are given by \([0,1]\) and \([1,\infty)\). The limit \(1\) was chosen since it is the modal value of the standardized normal standard deviation probability density function \(f(x;\nu)\), given in expression ((3)), following the ideas of Copenhaver and B. Holland (1988). Thus
\[\begin{aligned} \label{eq:Fq2} F(q; n, \nu) =& \int^{\infty}_{0} F_Q(s q) f(s;\nu) ds\nonumber\\ =& \int^{1}_{0} F_Q(s q) f(s;\nu) ds + \int^{\infty}_{1} F_Q(s q) f(s;\nu) ds, \end{aligned} \tag{27} \]
where \(F_Q(sq)\) was given by expression ((18)), with \(q\) substituted by \(s q\), and computed by the algorithm pNMR, described in the previous section.
The innermost integral multiplied by the probability density function of the variable \(s\), \(f(s;\nu)\), was computed by
\[\begin{aligned} \label{eq:InnerMostFq} I(q; s, n, \nu) =& \textrm{\texttt{pNMR}}(sq, n, m)\times \frac{\nu^{\nu/2}}{\Gamma(\nu/2) 2^{\nu/2-1}}s^{\nu-1}e^{-\nu s^2/2} \end{aligned} \tag{28} \]
Therefore, an auxiliary algorithm, denoted by pNMR_aux, was constructed to compute ((28)). The pseudocode is given by:
1. Input \(q\), \(s\), \(n>1\), \(nu>0\), and \(m\geq 2\);
2. \(I=\textrm{\texttt{pNMR}}(s\times q, \ n, \ m)\);
3. Compute \[fx=(nu/2)\ln(nu)-\ln\Gamma(nu/2)-(nu/2-1)\ln(2)+(nu-1)\ln(s)-nu\times s^2/2;\]
4. Apply \(fx=\exp(fx)\);
5. Compute \(I=fx \times I\);
6. Return \(I\).
Each subinterval of ((27)) was appropriately transformed, enabling the Gauss-Legendre quadrature to be applied for solving the integral ((7)). For this purpose, in the first integration subinterval, \([0,1]\), the transformation \(y=(b-a)t/2+(a+b)/2\), given by ((12)) and ((15)), was adopted. Hence, considering all quadrature nodes \(x_i\), it follows that
\[\begin{aligned} \label{eq:Fq2p1} I_1(q; n, \nu) =& \int^{1}_{0} F_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{1}{2} I(q; y_i, n, \nu), \end{aligned} \tag{29} \]
where \(I(q; y_i, n, \nu)\) is given by ((28)).
In the second subinterval, \([1,\infty)\), the transformations ((10)) and ((13)), with \(a=1\), were applied, giving \(y=1+(1+t)/(1-t)\). Thus, considering the quadrature nodes \(x_i\), the integral was approximated by
\[\begin{aligned} \label{eq:Fq2p2} I_2(q; n, \nu) =& \int^{\infty}_{1} F_Q(s q) f(s;\nu) ds\nonumber \\ \approx & \sum_{i=1}^m w_i\frac{2}{(1-x_i)^2} I(q; y_i, n, \nu). \end{aligned} \tag{30} \]
The integration ((7)) was computed by
\[\label{eq:acumr2} F(q;n,\nu) \approx I_1(q; n, \nu)+I_2(q; n, \nu), \tag{31} \]
using the results of ((29)) and ((30)).
All transformations were implicitly applied, i.e., they were only a numerical integration device. The pseudocode for the computation of the cumulative distribution function of the externally studentized normal midrange, denoted pMR, applies the ideas of expression ((31)) and is given by:
1. Input \(q\), \(nu> 0\), \(n>1\), and \(m\geq 2\);
2. Compute \(m\) nodes (\(x_i\)) and weights (\(w_i\)), with \(i=1\), \(2\), \(\ldots\), \(m\);
3. For the first part of the split integral, transform: \(y_i=0.5x_i+0.5\), for \(i=1\), \(2\), \(\ldots\), \(m\);
4. Compute \(fy_i=\frac{1}{2}\textrm{\texttt{pNMR\_aux}}(q,\ y_i,\ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
5. Compute \(I=\sum\limits_{i=1}^{m}w_i fy_i\);
6. For the second part of the split integral, transform: \(y_i=1+(1+x_i)/(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
7. Compute \(fy_i=\textrm{\texttt{pNMR\_aux}}(q,\ y_i,\ n,\ m)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
8. Compute \(fy_i=\ln(fy_i)+\ln(2)-2\ln(1-x_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
9. Compute \(fy_i=\exp(fy_i)\), for \(i=1\), \(2\), \(\ldots\), \(m\);
10. Compute \(I=I+\sum\limits_{i=1}^{m}w_ify_i\);
11. Return \(I\).
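The corresponding sketch of pNMR_aux and pMR mirrors dMR() above, using pNMR() for the inner cumulative distribution function (again only an illustration, not the package internals):

pNMR_aux <- function(s, q, n, nu, m) {
  fx <- (nu / 2) * log(nu) - lgamma(nu / 2) - (nu / 2 - 1) * log(2) +
        (nu - 1) * log(s) - nu * s^2 / 2
  exp(fx) * pNMR(s * q, n, m)                  # f(s; nu) * F_Q(sq)
}
pMR <- function(q, n, nu, m = 64) {
  gl <- gauss_legendre(m)
  x <- gl$nodes; w <- gl$weights
  y1 <- 0.5 * x + 0.5                          # [0, 1]
  I1 <- sum(w * 0.5 * sapply(y1, pNMR_aux, q = q, n = n, nu = nu, m = m))
  y2 <- 1 + (1 + x) / (1 - x)                  # [1, Inf)
  I2 <- sum(w * 2 / (1 - x)^2 * sapply(y2, pNMR_aux, q = q, n = n, nu = nu, m = m))
  I1 + I2
}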
For the externally studentized normal midrange quantile function, the qMR algorithm was constructed. The algorithm applies the Newton-Raphson method and depends on the pMR and dMR algorithms. Its pseudocode is:
1. Input \(n>1\), \(nu>0\), \(m\geq 2\), and \(0<p<1\);
2. Set \(eps=1\times 10^{-13}\), for the error tolerance, and \(maxIt=5000\), for the maximum number of iterations;
3. Get an initial estimate of \(q\), called \(q_0\), by:
3a. If \(p<0.5\), then \(q_0=-0.5\), else go to step 3b;
3b. If \(p>0.5\), then \(q_0=0.5\), else set \(q_0=0\);
4. \(it=0\);
5. While (\(it\leq maxIt\)) do
5a. \(cdf=\textrm{\texttt{pMR}}(q_0,\ n,\ nu,\ m)\);
5b. \(pdf=\textrm{\texttt{dMR}}(q_0,\ n,\ nu,\ m)\);
5c. \(q_1=q_0-(cdf-p)/pdf\);
5d. go to step 6;
6. If \(|q_1-q_0|\leq eps\), then return \(q_1\) and exit; otherwise, update the Newton-Raphson iteration counter, \(it=it+1\), set \(q_0=q_1\), and go to step 5. In step 5, if the counter exceeds the limit \(maxIt\), go to step 7;
7. Print the error message: “iterative process did not achieve convergence in \(maxIt\) steps.”
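As with qNMR, the quantile routine can be expressed through the generic Newton-Raphson helper and the pMR()/dMR() sketches above:

qMR <- function(p, n, nu, m = 64) {
  q0 <- if (p < 0.5) -0.5 else if (p > 0.5) 0.5 else 0
  newton_quantile(p,
                  cdf = function(q) pMR(q, n, nu, m),
                  pdf = function(q) dMR(q, n, nu, m),
                  q0  = q0)
}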
To generate a random sample of size \(N\) from the normal midrange distribution, the algorithm rMR was constructed, with parameters \(n\) and \(\nu\). First, a sample of size \(n\) from the standard normal distribution, \(X_1\), \(X_2\), \(\ldots\), \(X_n\), was simulated, where \(X_i\stackrel{\mathrm{iid}}{\sim}N(0,1)\). A chi-square variable \(U\sim \chi_\nu^2\), with \(\nu\) degrees of freedom, was also simulated, and the following transformation was computed: \[{Q}=\frac{[\max(X_i)+\min(X_i)]/2}{\sqrt{U/\nu}}.\] With infinite degrees of freedom, the following transformation was considered instead: \[{W}=\frac{\max(X_i)+\min(X_i)}{2}.\] The process was repeated \(N\) times and the values of \({Q}\) or \({W}\), depending on the case considered, were stored in a vector of size \(N\).
This distribution can be used to obtain cumulative probabilities and quantiles. Thus, for example, given a quantile \({q}\) or \({w}\), the cumulative probability, provided \(N\) is large, is approximated by \[P( {Q}\leq {q})=\frac{\sum\limits_{i=1}^{N}I( {Q}_i\leq {q})}{N}, \ \nu < \infty,\] or \[P( {W}\leq {w})=\frac{\sum\limits_{i=1}^{N}I( {W}_i\leq {w})}{N}, \ \nu=\infty.\]
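This Monte Carlo approximation is easy to reproduce with the package's own generator; for instance, the empirical probability below should be close to the value of pSMR(2, 5, 3, 32) reported in the examples later in the paper (the sample size and seed here are ours):

library(SMR)
set.seed(1)
Q <- rSMR(100000, size = 5, df = 3)
mean(Q <= 2)   # Monte Carlo estimate of P(Q <= 2)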
The random number generator rMR returns the vector of \(N\) realizations of \({Q}\) or \({W}\) and its pseudocode is presented below. The rMR generator depends on the R functions rnorm() and rchisq(), for obtaining a random vector of independent standard normal variables and a chi-square realization, respectively. The matrix(x, p, q) function, used below, creates a matrix of dimension \(p\times q\) filled by columns with the elements given by the vector \(x\).
1. Input \(N > 1\), \(n > 1\), and \(\nu > 0\);
2. \(x = \textrm{\texttt{rnorm}}(N\times n)\);
3. \(X = \textrm{\texttt{matrix}}(x,N,n)\) of dimension \((N\times n)\);
4. \(u = \textrm{\texttt{rchisq}}(N, \nu)\);
5. \(X=X/\sqrt{u/\nu}\);
6. For \(i=1\) to \(N\) do: \(\textrm{\texttt{rMR}}[i] = [\max(X[i,])+\min(X[i,])]/2\);
7. Return \(\textrm{\texttt{rMR}}\).
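A direct R translation of this pseudocode (a sketch only; with infinite degrees of freedom the chi-square scaling step is skipped, as described above) is:

rMR <- function(N, n, nu = Inf) {
  X <- matrix(rnorm(N * n), N, n)                         # N samples of size n
  if (is.finite(nu)) X <- X / sqrt(rchisq(N, nu) / nu)    # studentize each row
  (apply(X, 1, max) + apply(X, 1, min)) / 2               # midrange of each row
}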
The package SMR provides the following functions, where np is the number of nodes and weights of the Gauss-Legendre quadrature:

dSMR(x, size, df, np=32): computes values of the probability density function, given in ((6)) or ((4));

pSMR(q, size, df, np=32): computes values of the cumulative distribution function, given in ((7)) or ((5));

qSMR(p, size, df, np=32): computes quantiles of the externally studentized normal midrange;

rSMR(n, size, df=Inf): draws a random sample of size \(n\) from the externally studentized normal midrange.
The value of the argument df can be finite or infinite. If df=Inf, values of the probability density, cumulative distribution and quantile functions of the normal midrange (standardized normal midrange) are computed. If the argument df is not specified in the rSMR function, the default value Inf is used and random samples from the normal midrange distribution are drawn. The other functions presented in the previous section are internal algorithms of the SMR package.
As an illustration, consider the following examples:
library(SMR)
set.seed(10701)
x <- 2       # quantile
q <- 2       # quantile
p <- 0.9     # probability
n <- 10      # sample size to be simulated
size <- 5    # normal sample size
df <- 3      # degrees of freedom
np <- 32     # number of points of the Gaussian quadrature
dSMR(x, size, df, np) # SMR pdf
[1] 0.01926172
pSMR(q, size, df, np) # SMR cdf
[1] 0.9851739
qSMR(p, size, df, np) # SMR quantile
[1] 0.8350065
rSMR(n, size, df) # random sample of the SMR distribution
 [1]  0.35108979  0.33786356 -0.13753510 -0.58741681 -0.40358907
 [6] -0.72528615  0.45845331  0.08906021 -1.64157684  0.07022362
In the case \(\nu=\infty\), the result of the standardized normal midrange distribution is returned, as illustrated by the following example:
library(SMR)
x <- 2       # quantile
q <- 2       # quantile
p <- 0.9     # cumulative probability
n <- 10      # sample size to be simulated
size <- 5    # normal sample size
df <- Inf    # degrees of freedom
np <- 32     # number of points of the Gaussian quadrature
dSMR(x, size, df, np) # normal MR pdf
[1] 0.0004487675
pSMR(q, size, df, np) # normal MR cdf
[1] 0.9999408
qSMR(p, size, df, np) # normal MR quantile
[1] 0.6531507
rSMR(n, size, df) # random sample of the normal MR distribution
 [1] -0.52475079  0.10198842 -0.38647236  0.18939367  0.17756023
 [6] -1.03384242  0.35608349  1.00629514  0.06360581  0.70835452
A concrete application of the SMR package to a real dataset is now considered. We use the data on nitrogen contents of red clover plants presented in Steel and J. Torrie (1980), page 180. A completely randomized design with one factor with \(6\) levels and \(r=5\) replications was carried out. The mean squared error (MSE) was \(11.7887\), with \(\nu=24\) degrees of freedom. The factor means are presented in Table 1. A multiple comparison procedure was suggested by Batista (2012). The suggested procedure is similar to the Tukey test (Tukey 1953). First, the least significant difference of the proposed test is computed by
\[\begin{aligned}
\Delta=& 2q_{1-\alpha/2;n,\nu}\sqrt{\dfrac{MSE}{r}},
\end{aligned}\] where \(q_{1-\alpha/2;n,\nu}\) is the \(100(1-\alpha/2)\%\) quantile of the externally studentized normal midrange. This value can be computed with the function qSMR(0.975, 6, 24, np=32)=1.0049 of the SMR package, for the significance level \(\alpha=0.05\).
The \(\Delta\) value in this case is \[\begin{aligned} \Delta=& 2\times 1.0049\times\sqrt{\dfrac{11.7887}{5}}= 3.0859. \end{aligned}\]
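This computation is straightforward to reproduce with the package (the object names below are ours):

library(SMR)
q975  <- qSMR(0.975, 6, 24, np = 32)   # approximately 1.0049
Delta <- 2 * q975 * sqrt(11.7887 / 5)  # approximately 3.0859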
Finally, the test is performed in the same way as the Tukey test. The results of the proposed midrange and Tukey tests are shown in Table 1. The proposed midrange test shows no ambiguous results in this example, unlike the Tukey test, for which some levels receive two or more letters.
Table 1: Treatment means with the results of the Tukey and midrange tests.

Levels | Means | Tukey\(^*\) | Midrange\(^*\)
---|---|---|---
5 | 13.26 | c | d
3 | 14.64 | c | d
6 | 18.70 | cb | c
4 | 19.92 | cb | c
2 | 23.98 | ba | b
1 | 28.82 | a | a

\(^*\) Means with the same letter are not significantly different.
The importance of the externally studentized normal midrange distribution can be enormous in the analysis of experiments, since the midrange estimator is more efficient than the sample mean in platykurtic distributions. Therefore, this distribution could be useful in the proposition of multiple comparison procedures, that could potentially show better results (more robust and powerful) than the traditional tests based on the externally studentized normal range.
This package is easy to use and shows very high accuracy. The accuracy of critical values is estimated through the computation of the difference between two results using different numbers of quadrature points, according to expression ((9)). The number of quadrature points can be chosen and, for most cases, \(32\) points are enough to achieve high accuracy. Monte Carlo simulations were also used to evaluate the accuracy of the proposed methods and the consistency of the SMR package. The algorithms were proposed and implemented using Gauss-Legendre quadrature and the Newton-Raphson method in R, resulting in the SMR package, which is available for download from CRAN. An important aspect is that the SMR package is written in R. Code in Fortran or C could make the SMR functions faster, a feature that is planned for future releases of the package. Another important aspect is the possibility of using, in future releases, two-dimensional interpolations when dSMR, pSMR and qSMR are called with long vectors of arguments \(x\), \(q\), and \(p\). In this case, two-dimensional grids for \(\{x\), \(q\), \(p\}\) and \(\{size\}\) can be computed offline and the approx function used to interpolate.
We would like to thank CNPq and CAPES for their financial support. We would also like to thank two anonymous referees and the Editor for helpful comments which have resulted in an improved manuscript and package.