Statistical tolerance intervals are used for a broad range of applications, such as quality control, engineering design tests, environmental monitoring, and bioequivalence testing. tolerance is the only R package devoted to procedures for tolerance intervals and regions. Perhaps the most commonly employed functions of the package involve normal tolerance intervals. A number of new procedures for this setting have been included in recent versions of tolerance. In this paper, we discuss and illustrate the functions that implement these normal tolerance interval procedures, including a novel type of operating characteristic curve.
Statistical tolerance intervals provide limits that are claimed, with confidence level 1 − α, to bound at least a specified proportion P (the content level) of the sampled population; such an interval is often called a (1 − α, P) tolerance interval.
tolerance (Young 2010) is a popular R package for constructing exact and approximate tolerance intervals and regions. Since its initial release in 2009, the package has grown to include tolerance interval procedures for a large number of parametric distributions, nonparametric settings, and regression models. There are also tolerance region procedures for the multivariate normal and multivariate regression settings. Procedures for more specific settings are also included, such as one-sided tolerance limits for the difference between two independent normal random variables (Hall 1984) and fiducial tolerance intervals for functions of parameters from discrete distributions (Mathew and Young 2013). The package also includes some graphical capabilities for visualizing the tolerance intervals (regions) by plotting the limits (regions) on histograms, scatterplots, or control charts of the data.
tolerance has been used for a broad range of applications, including cancer research (Heck et al. 2014), wildlife biology (Pasquaretta et al. 2012), assessing the performance of genetic algorithms (Van der Borght et al. 2014), ratio editing in surveys (Young and Mathew 2015), air quality assessment (Hafner et al. 2013), and instrumentation testing (Burr and Gavron 2010). General interest in tolerance can be gauged by the cranlogs package (Csardi 2015), which pulls download logs of the RStudio (RStudio Team 2015) CRAN mirror. Figure 1 shows the daily number of downloads of tolerance from the beginning of 2013 to the beginning of 2016. There is clearly a general increasing trend over the years as the average number of daily downloads per year is approximately 5, 10, and 15 in 2013, 2014, and 2015, respectively.
Capabilities of tolerance have been discussed in Young (2010, 2014). Even with those varied capabilities, perhaps the most
commonly used methods involve the normal distribution. Normal tolerance
intervals are often required during design verification or process
validation. The utility of normal tolerance intervals is further
highlighted in documents published by the EPA (Environmental Protection Agency 2006), the IAEA (International Atomic Energy Agency 2008),
and standard 16269-6 of the ISO (International Organization for Standardization 2014). In this paper, we discuss new
capabilities in tolerance specifically involving normal tolerance
intervals. This includes the calculation of exact and equal-tailed
normal tolerance intervals, Bayesian normal tolerance intervals,
tolerance intervals for fixed-effects ANOVA, and sample size
determination strategies. We also introduce novel pseudo-operating
characteristic (OC) curves that illustrate how the tolerance factor behaves as a function of the sample size, confidence level, and content level.
As noted earlier, tolerance also includes a function for constructing
multivariate normal tolerance regions. The mvtol.region()
function was
included with the initial release of tolerance. mvtol.region()
includes several Monte Carlo procedures developed in Krishnamoorthy and Mathew (1999) and
Krishnamoorthy and Mondal (2006) for finding the factors of multivariate normal tolerance regions. The plottol()
function can also be used to plot
tolerance ellipses over bivariate normal data and tolerance ellipsoids
over trivariate normal data. The latter is accomplished using plot3d()
from the rgl package (Adler and Murdoch 2014).
We will not discuss the mvtol.region()
function further since it is
already well-documented (Young 2010, 2014) and our present focus
is on newer capabilities in tolerance for normal tolerance intervals.
For our discussion, we assume that the reader has already installed and loaded tolerance using the usual commands:
> install.packages("tolerance")
> library(tolerance)
Let X1, ..., Xn be an independent and identically distributed sample from a normal distribution with unknown mean μ and unknown variance σ², and let x̄ and s denote the sample mean and sample standard deviation, respectively. One-sided (1 − α, P) tolerance limits and two-sided tolerance intervals take the form x̄ ± ks, where the tolerance factor k is computed so that the limits bound at least a proportion P of the population with confidence level 1 − α.
Owen (1964) was the first to propose equal-tailed tolerance intervals for
the normal distribution. Equal-tailed normal tolerance intervals still
take the same form, but the tolerance factor
is computed so that no more than a proportion (1 − P)/2 of the population falls in each tail, a more stringent requirement than simply bounding the overall content.
The normtol.int()
function in tolerance is able to calculate all of
the one-sided tolerance limits, two-sided tolerance intervals, and
equal-tailed tolerance intervals discussed above. In the past,
challenges with computing noncentral distributions necessitated the use
of approximations for two-sided intervals. The approximation methods available through the method
argument and their specific formulas are
outlined in Young (2010), which utilized tolerance version 0.2.2. Since
then, exact two-sided and exact equal-tailed procedures have been implemented through method = "EXACT"
and method = "OCT",
respectively. Both of these methods use adaptive quadrature via the
integrate()
function as well as box-constrained optimization via the
optim()
function. The original approximation methods are still
available primarily to retain all functionality of previous versions of
tolerance.
The dataset that we will use to illustrate most of the procedures in our
discussion is a quality control dataset from Krishnamoorthy and Mathew (2009). The data are
from a machine that fills plastic containers with a liter of milk. At
the end of a particular shift, a sample of n = 20 containers is selected and the amount of milk in each is measured:
> milk <- c(0.968, 0.982, 1.030, 1.003, 1.046,
+ 1.020, 0.997, 1.010, 1.027, 1.010,
+ 0.973, 1.000, 1.044, 0.995, 1.020,
+ 0.993, 0.984, 0.981, 0.997, 0.992)
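For reference, every tolerance limit computed below has the form x.bar ± k·s for some tolerance factor k, so it helps to note the sample statistics first (a base-R aside, not part of the package workflow):

```r
# Sample statistics for the milk data; all limits below are x.bar +/- k * s.
milk <- c(0.968, 0.982, 1.030, 1.003, 1.046,
          1.020, 0.997, 1.010, 1.027, 1.010,
          0.973, 1.000, 1.044, 0.995, 1.020,
          0.993, 0.984, 0.981, 0.997, 0.992)
x.bar <- mean(milk)  # 1.0036, matching the x.bar column in the output below
s <- sd(milk)        # approximately 0.0221
```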
A quick check with the Shapiro-Wilk test shows no evidence against the normality assumption:
> shapiro.test(milk)
Shapiro-Wilk normality test
data: milk
W = 0.96364, p-value = 0.6188
For the milk data, the one-sided (95%, 90%) tolerance limits, as well as the exact and equal-tailed two-sided tolerance intervals, are computed as follows:
> normtol.int(x = milk, alpha = 0.05, P = 0.90, side = 1)
alpha P x.bar 1-sided.lower 1-sided.upper
1 0.05 0.9 1.0036 0.9610333 1.046167
> normtol.int(x = milk, alpha = 0.05, P = 0.90, side = 2, method = "EXACT", m = 50)
alpha P x.bar 2-sided.lower 2-sided.upper
1 0.05 0.9 1.0036 0.9523519 1.054848
> normtol.int(x = milk, alpha = 0.05, P = 0.90, side = 2, method = "OCT", m = 50)
alpha P x.bar 2-sided.lower 2-sided.upper
1 0.05 0.9 1.0036 0.9471414 1.060059
Note that the equal-tailed tolerance interval is wider than the
corresponding two-sided tolerance interval due to the more stringent
requirement of controlling the proportions in the tails. For the
two-sided tolerance intervals, the additional argument m is used to
control the number of subintervals to use for performing the numerical
integration via integrate(). While not illustrated above, there is an
additional argument that can be used if one wishes to construct
log-normal tolerance intervals. The argument log.norm is a logical
argument set to FALSE by default. If set to TRUE, log-normal
tolerance intervals are calculated using the fact that if X is log-normally distributed, then log(X) is normally distributed. The normtol.int()
function simply takes the logarithm of the
data in the x
argument, constructs the desired normal tolerance
limits, and then takes the anti-log of those limits.
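That log-normal logic can be sketched with base R alone, using simulated data and the standard one-sided tolerance factor from the noncentral t distribution rather than normtol.int() itself:

```r
# One-sided (95%, 90%) log-normal tolerance limits: transform to the log
# scale, compute normal tolerance limits, and back-transform.
set.seed(101)
x <- exp(rnorm(30, mean = 0, sd = 0.25))  # simulated log-normal sample
lx <- log(x)
n <- length(lx); alpha <- 0.05; P <- 0.90
# Exact one-sided normal tolerance factor via the noncentral t quantile.
k1 <- qt(1 - alpha, df = n - 1, ncp = sqrt(n) * qnorm(P)) / sqrt(n)
limits <- exp(mean(lx) + c(-1, 1) * k1 * sd(lx))  # anti-log of normal limits
```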
Users of normal tolerance intervals are often interested in summarizing
the tolerance factors k for a variety of possible values of n, 1 − α, and P. The K.table() function allows
the user to specify a vector of possible values for each of these three
quantities. A list is then returned whose elements are summarized
according to the by.arg
argument. For example, suppose we are interested
in the one-sided tolerance factors for n = 10 and 20, confidence levels 0.99 and 0.95, and content levels 0.95 and 0.99, with the output summarized by content level:
> K.table(n = c(10, 20), alpha = c(0.01, 0.05), P = c(0.95, 0.99),
+ side = 1, by.arg = "P")
$`0.95`
10 20
0.99 3.738315 2.807866
0.95 2.910963 2.396002
$`0.99`
10 20
0.99 5.073725 3.831558
0.95 3.981118 3.295157
For example, the first entry of the first matrix is the tolerance factor for n = 10, 1 − α = 0.99, and P = 0.95. K.table() can also be called with side = 2, which requires the user to specify the method
argument;
e.g., "EXACT" for exact two-sided factors or "OCT" for equal-tailed factors. The
by.arg
argument can also be set to "alpha"
or
"n"
depending on which quantity you want to index the elements of
the outputted list.
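For the one-sided case, these factors come from the noncentral t distribution, so the table entries can be reproduced with base R alone (a sketch of the standard formula, not the package's internal code):

```r
# Exact one-sided tolerance factor: k = t'_{1-alpha; n-1}(sqrt(n) * z_P) / sqrt(n),
# where t' is a quantile of the noncentral t distribution.
k.one.sided <- function(n, alpha, P) {
  qt(1 - alpha, df = n - 1, ncp = sqrt(n) * qnorm(P)) / sqrt(n)
}
k.one.sided(10, 0.05, 0.95)  # matches the 2.910963 entry above
k.one.sided(20, 0.05, 0.95)  # matches the 2.396002 entry above
```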
Bayesian tolerance intervals were first presented in Aitchison (1964). For the
Bayesian set-up, let x be a vector of realizations of a random sample from a normal distribution with mean μ and variance σ².
The parameters μ and σ² are treated as random, with prior knowledge expressed through a joint prior distribution.
Similar to the classical setting, Bayesian tolerance limits are sample-based limits whose factors are computed so that the desired content and confidence levels hold with respect to the posterior distribution.
Finally, if one considers the non-informative prior distribution, the Bayesian tolerance limits reduce to the classical tolerance limits.
The bayesnormtol.int()
function for computing Bayesian normal
tolerance intervals is new as of tolerance version 1.1.1. It was
composed to closely mirror the normtol.int()
function. For the milk
data, suppose we use the conjugate prior structure with hyperparameters μ0, σ²0, m0, and n0.
Assuming we have some historical knowledge about the milk filling
process, the following hyperparameter values are used:
> bayesnormtol.int(x = milk, alpha = 0.05, P = 0.90, side = 1,
+ hyper.par = list(mu.0 = 1.000, sig2.0 = 0.001,
+ m.0 = 20, n.0 = 20))
alpha P 1-sided.lower 1-sided.upper
1 0.05 0.9 0.9551936 1.048406
> bayesnormtol.int(x = milk, alpha = 0.05, P = 0.90, side = 2, method = "EXACT",
+ m = 50, hyper.par = list(mu.0 = 1.000, sig2.0 = 0.001,
+ m.0 = 20, n.0 = 20))
alpha P 2-sided.lower 2-sided.upper
1 0.05 0.9 0.9453603 1.05824
> bayesnormtol.int(x = milk, alpha = 0.05, P = 0.90, side = 2, method = "OCT",
+ m = 50, hyper.par = list(mu.0 = 1.000, sig2.0 = 0.001,
+ m.0 = 20, n.0 = 20))
alpha P 2-sided.lower 2-sided.upper
1 0.05 0.9 0.9407625 1.062838
The bayesnormtol.int()
function has the arguments x, alpha, P, side, method, and m, just as in normtol.int(). However, here
we also have hyper.par, which is a list with elements for the four
hyperparameters. The output is structured identically to the output
obtained using normtol.int(), which allows for easy comparison between
the classical and Bayesian results.
The approach for classical normal tolerance intervals can be easily
extended for the balanced fixed-effects ANOVA model
Let the errors be independent normal with common variance. Tolerance limits are then constructed for each level of each factor, centered at the level mean and scaled by the pooled standard deviation from the fitted model.
The anovatol.int() function implements this procedure.
We analyze the well-known dataset that resulted from an experimental
design regarding the effects of wool type and the amount of tension
applied to a loom of yarn on the number of warp breaks that occur on
that loom of yarn (Tippett 1950). The first factor is wool type, which has
two levels: A or B. The second factor is tension level, which has three
levels: low (L), medium (M), or high (H). Each of the 54 looms of yarn is randomly assigned one of the six treatments
(i.e., factor level combinations), giving nine replicates per treatment. We fit the main-effects model and compute 90%/85% equal-tailed tolerance intervals as follows:
> lm.out <- lm(breaks ~ wool + tension, data = warpbreaks)
> out <- anovatol.int(lm.out, data = warpbreaks, alpha = 0.10,
+ P = 0.85, side = 2, method = "OCT")
These are 90%/85% 2-sided tolerance intervals.
> out
$wool
mean n k 2-sided.lower 2-sided.upper
A 31.03704 27 1.886857 9.117165 52.95691
B 25.25926 27 1.886857 3.339387 47.17913
$tension
mean n k 2-sided.lower 2-sided.upper
L 36.38889 18 1.948567 13.7521219 59.02566
M 26.38889 18 1.948567 3.7521219 49.02566
H 21.66667 18 1.948567 -0.9701003 44.30343
In the anovatol.int()
function, we have similar arguments as in the
normtol.int()
function, except now we input an object of class "lm"
and we also tell the function the name of the original dataset using the
data
argument. The output is a list summarizing the tolerance interval
results for each factor level. For example, the 90%/85% equal-tailed tolerance interval for wool type A is (9.117, 52.957).
We can also produce a figure of the above output by using the
plottol()
function as follows:
> plottol(out, x = warpbreaks)
The above produces the plot in Figure 2. This figure has a
separate panel for each factor, with the tolerance limits for each factor level displayed over the corresponding data.
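Since warpbreaks ships with base R, the numbers reported by anovatol.int() can be reconstructed by hand: each limit appears to be the factor-level mean plus or minus the printed k times the root mean squared error of the fitted model (a sanity-check sketch, not the package's internals):

```r
# Reconstruct the wool = "A" interval from the anovatol.int() output above.
fit <- lm(breaks ~ wool + tension, data = warpbreaks)
s <- summary(fit)$sigma                                # root mean squared error
mA <- mean(warpbreaks$breaks[warpbreaks$wool == "A"])  # 31.03704
k <- 1.886857                                          # k for wool, from the output above
c(lower = mA - k * s, upper = mA + k * s)              # approx (9.117, 52.957)
```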
As noted in Faulkenberry and Weeks (1968), an important question for statistical
practitioners is "What sample size should be used to determine the
tolerance limits?" Those same authors addressed this problem by
developing an approach to ensure that the calculated tolerance intervals
are "close" to the quantiles that result in a content level at least
as large as P. The degree of closeness is controlled through two precision quantities, δ (the delta argument) and P′ (the P.prime argument).
The norm.ss
function for sample size determination of normal tolerance
limits (intervals) is new as of tolerance version 1.1.1. The function
finds the minimum sample size n satisfying these precision criteria. For example, suppose a quality engineer wants the minimum sample size needed for a 95%/90% two-sided tolerance interval with δ = 0.10 and P′ = 0.97:
> norm.ss(alpha = 0.05, P = 0.90, delta = 0.10, P.prime = 0.97,
+ side = 2, m = 50, method = "FW")
alpha P delta P.prime n
1 0.05 0.9 0.1 0.97 60
Thus, the engineer would need a minimum sample size of n = 60.
In the norm.ss()
function, the argument method is set to "FW" for this Faulkenberry-Weeks approach.
There are two additional sample size determination strategies that can
be calculated, which are controlled through the method
argument. Both
of these strategies assume there is some historical data and
specification limits for the process at hand. We briefly illustrate
these strategies below and refer the reader to Young et al. (2016) for further
details.
The first alternative strategy is a simple “back-of-the-envelope”
calculation. We consider the problem of designing a study to demonstrate
that a process or product falls within the specification limits
For our example, suppose that the quality engineer is overseeing the
launch of a new process for filling the one-liter containers of milk,
which is intended to be more accurate than the previous process. The
company set specification limits at 0.990 and 1.010 liters, and the engineer assumes, based on historical knowledge, a process mean of 1.004 liters and a variance of 0.001:
> norm.ss(alpha = 0.05, P = 0.90, side = 2, spec = c(0.990, 1.010),
+ method = "DIR", hyper.par = list(mu.0 = 1.004, sig2.0 = 0.001))
alpha P delta P.prime n
1 0.05 0.9 5
Thus, the minimum sample size is n = 5. This calculation is performed by setting method = "DIR", entering the specification limits in the
spec
argument, and entering the assumed mean and variance through hyper.par.
The second alternative strategy presented in Young et al. (2016) is a method for
providing data-dependent values of the precision quantities δ and P′.
For the milk filling process, suppose the engineer has historical
measurements, which have a combined mean 0.994 liters and variance
0.002. Suppose the specification limits on the original process are 0.900 and 1.100 liters. Then the engineer can call the norm.ss()
function as follows:
> norm.ss(x = milk, alpha = 0.05, P = 0.90, side = 2, spec = c(0.900, 1.100),
+ method = "YGZO", hyper.par = list(mu.0 = 0.994, sig2.0 = 0.002))
alpha P delta P.prime n
1 0.05 0.9 0.1807489 0.9733307 42
Thus, the engineer would need a minimum sample size of n = 42, based on the data-dependent precision quantities reported in the output.
Sometimes, engineers and industrial statisticians are interested in
understanding how the confidence level or content level changes as a
function of the sample size for a fixed tolerance factor k. The norm.OC() function constructs pseudo-OC curves for this purpose.
The first type of OC curve is used when one specifies a range of values
of the sample size n and a fixed tolerance factor k, along with one or more content levels P; the curves then display the confidence level achieved at each n.
Suppose a company is designing a product and the engineer needs to
collect enough data so that the resulting two-sided tolerance interval
will have a tolerance factor of k = 4. The engineer considers content levels of 0.90, 0.95, and 0.99 for sample sizes between 10 and 20:
norm.OC(k = 4, alpha = NULL, P = c(0.90, 0.95, 0.99), n = 10:20,
side = 2, method = "EXACT", m = 25)
The resulting plot is given in Figure 3. For example, if
the engineer chooses a candidate sample size, the confidence level achieved at each content level can be read directly from the corresponding curves.
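The values traced by such curves can be sketched for the one-sided analogue with base R, since for a fixed factor k the achieved confidence level is a noncentral t probability (an illustrative formula; norm.OC() handles the two-sided case through K.factor()):

```r
# One-sided achieved confidence for fixed k: P(T' <= sqrt(n) * k), where T'
# is noncentral t with n - 1 df and noncentrality sqrt(n) * qnorm(P).
conf.level <- function(k, n, P) {
  pt(sqrt(n) * k, df = n - 1, ncp = sqrt(n) * qnorm(P))
}
# Confidence achieved by k = 4 at content level P = 0.99 for n = 10, ..., 20:
sapply(10:20, function(n) conf.level(4, n, 0.99))
```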
In the norm.OC()
function, the arguments side, method, and m
are, again, passed down to the underlying K.factor()
function. In
order to generate Figure 3, we need to specify a single
value for the k argument and at least one value for P.
Since we are constructing curves where the sample size is on the
horizontal axis, we specify a range of values for n. Note that alpha
must be
left at its default NULL
value.
Suppose now that the same engineer considers confidence levels of 0.99, 0.95, and 0.90 and wishes to see the content level achieved by k = 4 at each sample size:
norm.OC(k = 4, alpha = c(0.01, 0.05, 0.10), P = NULL, n = 10:20,
side = 2, method = "EXACT", m = 25)
The resulting plot is given in Figure 4. For example, if the
engineer chooses a candidate sample size, the content level achieved at each confidence level can be read directly from the curves. The call to the norm.OC()
function is similar to the previous
example, except that now we specify at least one value for alpha
and
leave P at its default NULL
value.
Finally, the norm.OC()
function can also be used to construct an
OC-curve where the tolerance factor k is on the vertical axis. This is accomplished by specifying values for both alpha and P in the norm.OC()
function
while leaving the k
argument at its default NULL
value:
norm.OC(k = NULL, P = c(0.90, 0.95, 0.99), alpha=c(0.01, 0.05, 0.10),
n = 10:20, side = 2, method = "EXACT", m = 25)
The resulting plot is given in Figure 5. This OC-curve
allows the user to assess how the tolerance factor, and thus the width of the tolerance interval, shrinks as the sample size increases for each combination of confidence and content levels.
tolerance is the only R package devoted to the calculation of tolerance intervals and regions. Since its earlier versions (Young 2010), there have been many updates to the package that include additional parametric tolerance interval procedures, improved nonparametric tolerance interval procedures, and some multivariate tolerance region procedures.
In this paper, we focused on the varied capabilities of tolerance
pertaining to tolerance intervals for the normal distribution. Many of
these procedures have been added to the package since the discussion
presented in Young (2010). We discussed the calculation of one-sided
normal tolerance limits, exact and equal-tailed normal tolerance
intervals, Bayesian normal tolerance intervals, tolerance intervals for
fixed-effects ANOVA, and sample size determination strategies. We also
introduced novel operating characteristic (OC) curves that illustrate
how the tolerance factor, confidence level, and content level behave as functions of the sample size.
The tolerance package continues to expand the functions available for constructing tolerance intervals and regions. We note that some of the updates over the years have been a direct result of requests by end users of the package. Thus, one can expect additional capabilities in future versions of tolerance, both for the normal setting and other data settings.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Young, "Normal Tolerance Interval Procedures in the *tolerance* Package", The R Journal, 2017
BibTeX citation
@article{RJ-2016-041,
  author = {Young, Derek S.},
  title = {Normal Tolerance Interval Procedures in the *tolerance* Package},
  journal = {The R Journal},
  year = {2017},
  note = {https://rjournal.github.io/},
  volume = {8},
  issue = {2},
  issn = {2073-4859},
  pages = {200-212}
}