The main purpose of this paper is to present the main algorithms underlying the construction and implementation of the SMR package, whose aim is to compute the studentized normal midrange distribution. Details on the externally studentized normal midrange and standardized normal midrange distributions are also given. The package follows the same structure as the probability functions implemented in R. That is: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR) and the random number generating function (rSMR). Pseudocode and illustrative examples of how to use the package are presented.
The SMR package was created to provide an infrastructure for the studentized midrange distribution. This is a new distribution, inspired by the externally studentized range distribution, which has been widely applied in multiple comparison procedures to identify the best treatment level and has been extensively studied theoretically. Several algorithms to compute the probability density, cumulative distribution, and quantile functions of the studentized range were published by Lund and Lund (1983) and Copenhaver and Holland (1988). Recently, Batista and Ferreira (2014) developed the theory of the externally studentized normal midrange distribution, which, to the best of the authors' knowledge, is new in the scientific literature. They obtained the cumulative distribution, probability density, and quantile functions analytically.
The required multidimensional integrals must be computed numerically. Therefore, Batista and Ferreira (2014) applied Gaussian quadrature for this task. In particular, they chose the Gauss-Legendre quadrature because it obtains more accurate results when compared with other Gaussian quadrature methods. The quantile function of the externally studentized normal midrange was computed by the Newton-Raphson method. Based on these numerical methods, the SMR package was built and released.
The package name was chosen to identify the Studentized MidRange distribution. The package follows the same structure as the probability functions implemented in R. The following functions were implemented in the package: the probability density function (dSMR), the cumulative distribution function (pSMR), the quantile function (qSMR), and the random number generating function (rSMR).
Therefore, the main purpose of this paper is to present the main algorithms underlying the construction and implementation of the SMR package, showing pseudocode of its functions and providing the fundamental ideas for the appropriate use of the package through illustrative examples.
First, details on the externally studentized normal midrange and standardized normal midrange distributions are given. Second, the algorithms for the construction of the package and their respective pseudocodes are presented. Third, details of the SMR package functions are shown. Finally, illustrative examples of the package are presented.
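Let $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$ be the order statistics of a random sample $X_1, X_2, \dots, X_n$ of size $n$ from a population with cumulative distribution function (c.d.f.) $F(x)$ and probability density function (p.d.f.) $f(x)$.
The midrange $\bar{R}$ is defined as
$$\bar{R} = \frac{X_{1:n} + X_{n:n}}{2}.$$
The p.d.f. and c.d.f. of $\bar{R}$ are given by
$$f_{\bar{R}}(\bar{r}) = 2n(n-1)\int_{-\infty}^{\bar{r}} f(x)\, f(2\bar{r}-x)\,[F(2\bar{r}-x)-F(x)]^{n-2}\,dx$$
and
$$F_{\bar{R}}(\bar{r}) = n\int_{-\infty}^{\bar{r}} f(x)\,[F(2\bar{r}-x)-F(x)]^{n-1}\,dx,$$
see, e.g., David and Nagaraja (2003).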
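The studentized midrange $Q$ is defined as $Q = \bar{R}/S$, where $\bar{R}$ is the midrange and $S$ is an estimator of the population standard deviation $\sigma$ with $\nu$ degrees of freedom, independent of $\bar{R}$.
Considering the particular case of the normal distribution with mean zero and variance $\sigma^2$, $Q$ can be written as $Q = W/X$, where $W = \bar{R}/\sigma$ and $X = S/\sigma$, with $\nu X^2$ following a chi-square distribution with $\nu$ degrees of freedom.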
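An important distribution for this study, well documented in Batista (2012), Gumbel (1958) and Pillai (1950), is the standardized normal midrange distribution, defined by $W = \bar{R}/\sigma$, where $\bar{R}$ is the midrange of a normal sample of size $n$ and $\sigma$ is the population standard deviation. Its p.d.f. is
$$f_W(w) = 2n(n-1)\int_{-\infty}^{w} \phi(y)\,\phi(2w-y)\,[\Phi(2w-y)-\Phi(y)]^{n-2}\,dy \tag{4}$$
and the c.d.f. is
$$F_W(w) = n\int_{-\infty}^{w} \phi(y)\,[\Phi(2w-y)-\Phi(y)]^{n-1}\,dy, \tag{5}$$
both results found in David and Nagaraja (2003), where $\phi(\cdot)$ and $\Phi(\cdot)$ are the p.d.f. and c.d.f. of the standard normal distribution, respectively.
Therefore, according to Batista and Ferreira (2014), the p.d.f. and c.d.f. of the externally studentized normal midrange $Q$ are
$$f(q; n, \nu) = \int_{0}^{\infty}\int_{-\infty}^{xq} 2n(n-1)\,x\,\phi(y)\,\phi(2xq-y)\,[\Phi(2xq-y)-\Phi(y)]^{n-2}\,f(x;\nu)\,dy\,dx \tag{6}$$
and
$$F(q; n, \nu) = \int_{0}^{\infty}\int_{-\infty}^{xq} n\,\phi(y)\,[\Phi(2xq-y)-\Phi(y)]^{n-1}\,f(x;\nu)\,dy\,dx, \tag{7}$$
where
$$f(x;\nu) = \frac{\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}}\,x^{\nu-1}e^{-\nu x^2/2}, \quad x \ge 0,$$
is the p.d.f. of $X = S/\sigma$, with $S$ an independent estimator of $\sigma$ based on $\nu$ degrees of freedom.
The p.d.f. (4) and c.d.f. (5) are central to the implementation of the externally studentized normal midrange algorithms in the SMR package.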
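As a cross-check of the single integrals (4) and (5), the following minimal sketch evaluates them directly in R. It assumes the integral forms of the standardized normal midrange given in David and Nagaraja (2003) and uses the base function integrate() rather than the Gauss-Legendre quadrature adopted in the package. The names d_nmr and p_nmr are hypothetical and are not part of SMR.

```r
# Sketch only: d_nmr and p_nmr are illustrative names, not SMR functions.

# p.d.f. (4): 2n(n-1) * int_{-Inf}^{w} phi(y) phi(2w-y) [Phi(2w-y)-Phi(y)]^(n-2) dy
d_nmr <- function(w, n) {
  integrand <- function(y) {
    dnorm(y) * dnorm(2 * w - y) * (pnorm(2 * w - y) - pnorm(y))^(n - 2)
  }
  2 * n * (n - 1) * integrate(integrand, -Inf, w, rel.tol = 1e-8)$value
}

# c.d.f. (5): n * int_{-Inf}^{w} phi(y) [Phi(2w-y)-Phi(y)]^(n-1) dy
p_nmr <- function(w, n) {
  integrand <- function(y) dnorm(y) * (pnorm(2 * w - y) - pnorm(y))^(n - 1)
  n * integrate(integrand, -Inf, w, rel.tol = 1e-8)$value
}

p_nmr(0, 5)  # the distribution is symmetric about zero
d_nmr(2, 5)
p_nmr(2, 5)
```

Because the standardized normal midrange is symmetric about zero, p_nmr(0, n) should return 0.5 up to integration error for any sample size n, which gives a quick sanity check of the c.d.f.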
The functions implemented in the SMR package depend on specific R functions: pnorm, to obtain the cumulative distribution function of the standard normal; dnorm, to obtain the standard normal probability density function; and lgamma, to obtain the logarithm of the gamma function. To compute the nodes and weights of the Gauss-Legendre quadrature, an R function based on the method presented by Hildebrand (1974) was implemented. In the following subsections, the algorithms and pseudocodes used in the construction of each routine of the SMR package are presented.
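The basic idea of Gauss-Legendre quadrature of a function $f(x)$ is to approximate its integral over $[-1, 1]$ by a weighted sum,
$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{m} w_i f(x_i),$$
where $x_i$ and $w_i$, $i = 1, 2, \dots, m$, are the nodes and weights of the quadrature.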
Let the symmetric tridiagonal matrix
For computing the quadrature weights and nodes, the eigenvalues and
eigenvectors of
The set
However, the externally studentized normal midrange distribution depends on integrals over infinite intervals. An integral over an infinite range should be changed into an integral over [-1,1] by using the following transformations (Davis and Rabinowitz 1984)
Therefore, the integrals were computed by applying the Gauss-Legendre quadrature rule on these transformed variables by
For more details, see Olver, Lozier, Boisvert, and Clark (2010).
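The eigenvalue construction of the nodes and weights can be sketched in a few lines of R. This is a minimal illustration and not the package's internal routine: the function names are hypothetical, and the map $x = (1+t)/(1-t)$ used for the semi-infinite interval is one common choice, assumed here for illustration only.

```r
# Gauss-Legendre nodes and weights on [-1, 1] from the eigen-decomposition
# of the symmetric tridiagonal Jacobi matrix (illustrative helper).
gauss_legendre <- function(m) {
  stopifnot(m >= 2)
  k <- seq_len(m - 1)
  b <- k / sqrt(4 * k^2 - 1)            # off-diagonal entries; diagonal is zero
  J <- matrix(0, m, m)
  J[cbind(k, k + 1)] <- b
  J[cbind(k + 1, k)] <- b
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  list(nodes = e$values[ord],
       weights = 2 * e$vectors[1, ord]^2) # 2 is the integral of 1 over [-1, 1]
}

gl <- gauss_legendre(32)

# Finite interval: the integral of x^2 over [-1, 1] equals 2/3
sum(gl$weights * gl$nodes^2)

# Semi-infinite interval with the assumed map x = (1 + t) / (1 - t):
# int_0^Inf g(x) dx = int_{-1}^{1} g((1 + t)/(1 - t)) * 2 / (1 - t)^2 dt
gl_semi_infinite <- function(g, gl) {
  t <- gl$nodes
  sum(gl$weights * g((1 + t) / (1 - t)) * 2 / (1 - t)^2)
}
gl_semi_infinite(function(x) exp(-x), gl)  # the exact value is 1
```

The nodes are the eigenvalues of the matrix and each weight is twice the squared first component of the corresponding normalized eigenvector, so the weights always sum to 2.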
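The Newton-Raphson approximation aims to find the roots of a real function, that is, the values $x$ such that
$$g(x) = 0,$$
where $g$ is a real differentiable function with derivative $g'$.
The process starts with an initial arbitrary value $x_0$, which is successively updated by $x_{k+1} = x_k - g(x_k)/g'(x_k)$, $k = 0, 1, 2, \dots$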
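In this study the main objective was to find the quantile $q$ such that $F(q) = p$, for a given cumulative probability $p$, $0 < p < 1$, where $F$ is the c.d.f. of interest and $f$ is its p.d.f.
The solution is obtained by
$$q_{k+1} = q_k - \frac{F(q_k) - p}{f(q_k)}, \quad k = 0, 1, 2, \dots, \tag{16}$$
that should be computed sequentially until a certain convergence criterion is reached.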
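The iteration can be sketched generically in R for any pair of c.d.f. and p.d.f. functions. The function name, starting value, tolerance and iteration limit below are assumptions made for illustration; the standard normal functions pnorm and dnorm stand in for the midrange functions.

```r
# Newton-Raphson search for the quantile q with cdf(q) = p (illustrative).
newton_quantile <- function(p, cdf, pdf, q0 = 0, eps = 1e-10, max_it = 100) {
  stopifnot(p > 0, p < 1)
  q <- q0
  for (it in seq_len(max_it)) {
    step <- (cdf(q) - p) / pdf(q)   # F(q_k) - p divided by f(q_k)
    q <- q - step
    if (abs(step) < eps) return(q)  # convergence criterion on the update size
  }
  stop("iterative process did not achieve convergence in ", max_it, " steps")
}

# Stand-in example with the standard normal distribution
newton_quantile(0.9, pnorm, dnorm)
qnorm(0.9)  # reference value from base R
```

Stopping on the size of the update is only one possible criterion; a check on the residual cdf(q) - p would work in the same loop.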
For the standardized normal midrange probability density function
((4)), the integration interval dNMR
. The
approximation of ((4)) was computed by
The choice of the value
The pseudocode to compute the standardized normal midrange probability density function is given by:
Input
Compute
Transform
Compute
Compute
If
Compute
for
Apply
Compute
Compute
Transform
Repeat steps 4 to 8, with the
Compute
Compute
Return
The algorithm for computing the cumulative distribution function of the standardized normal midrange, expression (5), was developed using the Gauss-Legendre quadrature. This algorithm is essential in the construction of the externally studentized normal midrange algorithms. First, the integration interval was divided into two subintervals to achieve higher accuracy, as suggested by Quarteroni, Sacco, and Saleri (2007), and the following division was considered:
In these two parts, different variable transformations were considered.
In the first part, the transformation
For the second part, the transformation
where
Considering the expressions (19) and (20), the cumulative distribution function (18) was approximated by
in the same way as the numerical approximation adopted by McBane (2006) and by Verma and Suarez (2014) for obtaining the probability density and cumulative distribution functions and critical values for the range ratio statistics. The main differences between those works and the present results are the use of the Gauss-Legendre quadrature and the distribution functions to which the quadrature was applied. All other integrals computed in this work are based on this type of numerical approximation.
The pseudocode of the algorithm for the standardized normal midrange cumulative distribution function, denoted by pNMR, is given by:
Input
Compute
Transform
Compute
Compute
Compute
If
Compute
Compute
Compute
The quantities in steps 3-10, should be computed for each node
Set
Transform
Repeat steps
Compute
Compute
Return
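The split-and-transform idea behind this algorithm can be sketched in R as follows. This is not the package's pNMR routine: the function names, the split point `cut = 8` and the two variable maps are assumptions chosen for illustration, and the c.d.f. integrand is the form given in David and Nagaraja (2003).

```r
# Illustrative Gauss-Legendre rule on [-1, 1] (eigenvalue construction).
gauss_legendre <- function(m) {
  k <- seq_len(m - 1)
  J <- matrix(0, m, m)
  J[cbind(k, k + 1)] <- J[cbind(k + 1, k)] <- k / sqrt(4 * k^2 - 1)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = 2 * e$vectors[1, ]^2)
}

# Sketch of the standardized normal midrange c.d.f. by quadrature.
# 'cut' is a hypothetical split point: (-Inf, w] = (-Inf, w - cut] + [w - cut, w].
p_nmr_gl <- function(w, n, np = 32, cut = 8) {
  gl <- gauss_legendre(np)
  t <- gl$nodes
  g <- function(y) n * dnorm(y) * (pnorm(2 * w - y) - pnorm(y))^(n - 1)
  a <- w - cut
  # Part 1, infinite subinterval: assumed map y = a - (1 + t) / (1 - t)
  part1 <- sum(gl$weights * g(a - (1 + t) / (1 - t)) * 2 / (1 - t)^2)
  # Part 2, finite subinterval: linear map y = a + (cut / 2) * (t + 1)
  part2 <- (cut / 2) * sum(gl$weights * g(a + (cut / 2) * (t + 1)))
  part1 + part2
}

p_nmr_gl(0, 5)  # symmetry about zero implies a value close to 0.5
p_nmr_gl(2, 5)
```

Splitting the range keeps most of the quadrature nodes where the integrand has its mass, while the transformed infinite part only contributes a very small tail.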
The algorithm qNMR was derived to compute the standardized normal midrange quantile function. The pseudocode makes use of the Newton-Raphson method, expression (16), and is given by:
Input
Set
Get initial estimate of
If
If
While (
go to step
If
print the error message: “iterative process did not achieve
convergence in
For computing the externally studentized normal midrange probability density function, given in (6), note that the innermost integral is the probability density function of the standardized normal midrange, given by (4) and (17). Thus, the dNMR algorithm presented above was reused for computing the probability density function of interest. Also, the variable
where dNMR
, described
in the previous section.
The innermost integral multiplied by the probability density function of
the variable
The auxiliary algorithm dSMR_aux was constructed to compute (23). Its pseudocode is given by:
Input
Compute
Compute
Compute
Return
Each integration subinterval of (22) was appropriately transformed, enabling the Gauss-Legendre quadrature to be applied for approximating the integral (6). For this purpose, in the first integration subinterval,
where dSMR_aux
.
In the second subinterval of
The integration ((6)) was approximated by
where
Again, all transformations were implicitly applied as a numerical integration device only. The expression (26) was used to obtain the pseudocode for the computation of the probability density function of the externally studentized normal midrange, denoted dMR. It is given by:
Input
Compute
Transform the variable
Compute
Compute
Transform
Compute
Compute
Compute
Compute
Return
Quadratures of
Note that the innermost integral of the externally studentized normal midrange cumulative distribution function, given in (7), is the cumulative distribution function of the standardized normal midrange, given by (5) and (18). Thus, the pNMR algorithm presented above was reused for computing the cumulative distribution function of interest. Also, the variable
where pNMR
, described in
the previous section.
The innermost integral multiplied by the probability density function of
the variable
Therefore, an auxiliary algorithm, denoted by pNMR_aux, was constructed to compute (28). The pseudocode is given by:
Input
Compute
Apply
Compute
Return
Each subinterval of (27) was appropriately transformed, enabling the Gauss-Legendre quadrature to be applied for solving the integral (7). For this purpose, in the first integration subinterval,
where
In the second subinterval of
The integration ((7)) was computed by
using the results of ((29)) and ((30)).
All transformations were implicitly applied, i.e., they were only a numerical integration device. The pseudocode for the computation of the cumulative distribution function of the externally studentized normal midrange, denoted pMR, applies the ideas of expression (31) and is given by:
Input
Compute
For the first part of the split integral, transform:
Compute
Compute
For the second part of the split integral, transform:
Compute
Compute
Hence, compute
Compute
Return
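The reuse of the standardized normal midrange functions inside the outer integral over the studentizing variable can be sketched with nested calls to the base function integrate(). This is a cross-check under stated assumptions, not the package's dMR or pMR routines: all names are hypothetical, the inner range is truncated to [-10, 10], the outer range is truncated at an extreme chi-square quantile, and the density of $X = S/\sigma$ is obtained from dchisq under the assumption that $\nu X^2$ is chi-square with $\nu$ degrees of freedom.

```r
# Illustrative helpers; none of these names belongs to the SMR package.

# Standardized normal midrange p.d.f. and c.d.f. on a truncated inner range;
# the neglected normal tails are negligible in double precision.
d_nmr <- function(w, n) {
  if (w <= -10) return(0)
  g <- function(y) dnorm(y) * dnorm(2 * w - y) * (pnorm(2 * w - y) - pnorm(y))^(n - 2)
  2 * n * (n - 1) * integrate(g, -10, min(w, 10), rel.tol = 1e-10)$value
}
p_nmr <- function(w, n) {
  if (w <= -10) return(0)
  g <- function(y) dnorm(y) * (pnorm(2 * w - y) - pnorm(y))^(n - 1)
  n * integrate(g, -10, min(w, 10), rel.tol = 1e-10)$value
}

# Density of X = S / sigma, assuming nu * X^2 is chi-square with nu d.f.
d_x <- function(x, nu) 2 * nu * x * dchisq(nu * x^2, df = nu)

# Outer integral over x, truncated where the density of X is negligible.
outer_integral <- function(inner, nu) {
  x_max <- sqrt(qchisq(1e-14, df = nu, lower.tail = FALSE) / nu)
  h <- function(x) vapply(x, inner, numeric(1)) * d_x(x, nu)
  integrate(h, 0, x_max, rel.tol = 1e-7)$value
}

# p.d.f.: int x f_W(xq) f(x; nu) dx and c.d.f.: int F_W(xq) f(x; nu) dx
d_smr_sketch <- function(q, n, nu) outer_integral(function(x) x * d_nmr(x * q, n), nu)
p_smr_sketch <- function(q, n, nu) outer_integral(function(x) p_nmr(x * q, n), nu)

d_smr_sketch(2, 5, 3)
p_smr_sketch(2, 5, 3)
```

Nested adaptive integration is slow compared with a fixed quadrature rule, so this sketch is only meant for spot checks at a few points.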
For the externally studentized normal midrange quantile function, the qMR algorithm was constructed. The algorithm applies the Newton-Raphson method and depends on the pMR and dMR methods. Its pseudocode is:
Input:
Set
Get initial estimate of
If
If
While (
go to step
If
print the error message: “iterative process did not achieve
convergence in
To generate random sample of size rMR
was constructed, with parameter
This distribution can be used to obtain cumulative probabilities and
quantiles. Thus, for example, given a quantile
The random number generator rMR returns the vector of generated values. The rMR generator depends on the R functions rnorm() and rchisq() to obtain a random vector of independent standard normal variables and a chi-square realization, respectively. The matrix(x, p, q) function, presented below, creates a matrix of dimension p × q.
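A simulation along these lines can be sketched in R as follows. The function name and its arguments are hypothetical and this is not the package's rMR routine; it assumes that the studentized midrange is the midrange of a standard normal sample divided by the square root of an independent chi-square variable over its degrees of freedom.

```r
# Illustrative simulator of the externally studentized normal midrange.
# N: number of values to generate; n: normal sample size; nu: degrees of freedom.
r_smr_sketch <- function(N, n, nu = Inf) {
  x <- matrix(rnorm(N * n), nrow = N, ncol = n)   # one normal sample per row
  w <- (apply(x, 1, min) + apply(x, 1, max)) / 2  # standardized normal midrange
  if (is.infinite(nu)) return(w)
  s <- sqrt(rchisq(N, df = nu) / nu)              # independent estimate of S / sigma
  w / s
}

set.seed(1)
sim <- r_smr_sketch(1e5, n = 5, nu = 3)
mean(sim <= 2)       # Monte Carlo estimate of the c.d.f. at q = 2
quantile(sim, 0.9)   # Monte Carlo estimate of the 90% quantile
```

The empirical proportion and the empirical quantile give simple Monte Carlo approximations to the cumulative probabilities and quantiles of the distribution, with precision governed by the number of simulated values.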
The package SMR provides the following functions, where np is the number of nodes and weights of the Gauss-Legendre quadrature:
dSMR(x, size, df, np=32): computes values of the probability density function, given in (6) or (4);
pSMR(q, size, df, np=32): computes values of the cumulative distribution function, given in (7) or (5);
qSMR(p, size, df, np=32): computes quantiles of the externally studentized normal midrange;
rSMR(n, size, df=Inf): draws a random sample of size n.
The value of the argument df can be finite or infinite. If df=Inf, values of the probability density, cumulative distribution and quantile functions of the normal midrange (standardized normal midrange) are computed. If the argument df is not specified in the rSMR function, the default value Inf is used and random samples from the normal midrange distribution are drawn. The other functions presented in the previous sections are internal algorithms of the SMR package.
As an illustration, consider the following examples:
library(SMR)
set.seed(10701)
q <- 2 # quantile
x <- 2 # quantile
p <- 0.9 # probability
n <- 10 # sample size to be simulated
size <- 5 # normal sample size
df <- 3 # degrees of freedom
np <- 32 # number of points of the Gaussian quadrature
dSMR(x, size, df, np) # SMR pdf
[1] 0.01926172
pSMR(q, size, df, np) # SMR cdf
[1] 0.9851739
qSMR(p, size, df, np) # SMR quantile
[1] 0.8350065
rSMR(n, size, df) # random sample of the SMR distribution
[1] 0.35108979 0.33786356 -0.13753510 -0.58741681 -0.40358907
[6] -0.72528615 0.45845331 0.08906021 -1.64157684 0.07022362
In the case df = Inf, the standardized normal midrange distribution is used:
library(SMR)
q <- 2 # quantile
x <- 2 # quantile
p <- 0.9 # cumulative probability
n <- 10 # sample size to be simulated
size <- 5 # normal sample size
df <- Inf # degrees of freedom
np <- 32 # number of points of the Gaussian quadrature
dSMR(x, size, df, np) # normal MR pdf
[1] 0.0004487675
pSMR(q, size, df, np) # normal MR cdf
[1] 0.9999408
qSMR(p, size, df, np) # normal MR quantile
[1] 0.6531507
rSMR(n, size, df) # random sample of the normal MR distribution
[1] -0.52475079 0.10198842 -0.38647236 0.18939367 0.17756023
[6] -1.03384242 0.35608349 1.00629514 0.06360581 0.70835452
A concrete application of the SMR package to a real dataset is now considered. We use the data on nitrogen contents of red clover plants
presented in Steel and J. Torrie (1980), page qSMR(0.975, 6, 24, np=32)=1.0049
of
the SMR package, for the significance level
The
Finally, the test is performed in the same way as the Tukey test. The results of the proposed midrange and Tukey tests are shown in Table 1. In this example, the proposed midrange test shows no ambiguous results, unlike the Tukey test, which assigns two or more letters to some levels.
Levels | Means | Tukey | Midrange |
---|---|---|---|
5 | 13.26 | c | d |
3 | 14.64 | c | d |
6 | 18.70 | cb | c |
4 | 19.92 | cb | c |
2 | 23.98 | ba | b |
1 | 28.82 | a | a |
The externally studentized normal midrange distribution can be important in the analysis of experiments, since the midrange estimator is more efficient than the sample mean in platykurtic distributions. Therefore, this distribution could be useful for proposing multiple comparison procedures that could potentially show better results (more robust and powerful) than the traditional tests based on the externally studentized normal range.
This package is easy to use and shows very high accuracy. The accuracy of critical values is estimated through the computation of the difference between two results using different numbers of quadrature points according to the expression (9). The number of
quadrature points can be chosen and for most cases, R
.
Code in Fortran or C can make the SMR functions faster, a feature that is planned for future releases of the package. Another important aspect is the possibility of using two-dimensional interpolations in future releases when dSMR, pSMR and qSMR are called with long vectors of arguments, with the R approx function used to interpolate.
We would like to thank CNPq and CAPES for their financial support. We would also like to thank two anonymous referees and the Editor for helpful comments which have resulted in an improved manuscript and package.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Batista & Ferreira, "SMR: An R Package for Computing the Externally Studentized Normal Midrange Distribution", The R Journal, 2015
BibTeX citation
@article{RJ-2014-029, author = {Batista, Ben Dêivide de Oliveira and Ferreira, Daniel Furtado}, title = {SMR: An R Package for Computing the Externally Studentized Normal Midrange Distribution}, journal = {The R Journal}, year = {2015}, note = {https://rjournal.github.io/}, volume = {6}, issue = {2}, issn = {2073-4859}, pages = {123-136} }