In practical applications of control charts the in-control state and the corresponding chart parameters are usually estimated from past in-control data. The estimation error then needs to be accounted for. In this paper we present an R package, spcadjust, which implements a bootstrap-based method for adjusting monitoring schemes to take the estimation error into account. By bootstrapping the past data this method guarantees, with a certain probability, a specified conditional performance of the chart. In spcadjust the method is implemented for various types of Shewhart, CUSUM and EWMA charts, various performance criteria, and both parametric and non-parametric bootstrap schemes. In addition to the basic charts, charts based on linear and logistic regression models for risk-adjusted monitoring are included, and it is easy for the user to add further charts. Use of the package is demonstrated by examples.
Control charts for statistical process monitoring are commonly used in a variety of areas such as industrial process control, medicine, finance, insurance and environmental science; see for instance (Stoumbos et al. 2000), (Woodall 2006), (Frisén 2008), (Schmid 2007a), and (Schmid 2007b) for an overview.
A challenge in most practical applications of control charts is that the in-control state of the process to be monitored needs to be estimated. This introduces estimation error which needs to be accounted for. A common convention in many applications has been to assume the in-control distribution to be known and ignore the estimation error (e.g. Grigg and Farewell 2004; Biswas and Kalbfleisch 2008; Bottle and Aylin 2008; Fouladirad et al. 2008; Gandy et al. 2010). However, there is an increasing awareness that the estimation error might have a detrimental effect on the performance of control charts and many authors have addressed this for various specific charts (e.g. Jones 2002; Albers and Kallenberg 2004, 2005; Jones et al. 2004; Albers et al. 2005; Jensen et al. 2006; Champ and Jones-Farmer 2007; Capizzi and Masarotto 2009; Chatterjee and Qiu 2009; Zhang et al. 2011; Saleh et al. 2015; Zhang et al. 2016).
(Gandy and Kvaløy 2013) presented a bootstrap based method for adjusting for estimation error which applies to a wide variety of control charts and different performance measures. The method can in particular be used to give a guaranteed conditional in-control performance of the chart. A typical application is to calculate an adjusted signal limit of a chart to guarantee with high probability that the in-control average run length or hitting probability is not below/above a specified value. Theoretical properties and conditions needed for the method to apply are worked out in (Gandy and Kvaløy 2013). A similar bootstrap approach was also briefly mentioned in (Jones and Steiner 2012) in a study of risk-adjusted CUSUM charts.
We have developed an R package called spcadjust (Gandy and Kvaløy 2016) which implements the bootstrap method for a number of different charts, with different performance measures and both parametric and non-parametric bootstrapping procedures. The package covers the most common set-ups for Shewhart, CUSUM and EWMA charts, including risk-adjusted charts. Moreover, it is easy for the user to add further charts, data models and estimation procedures.
There exist several other R packages for control charts. The surveillance package (Salmon et al. 2016) provides a variety of methods for monitoring, simulation and visualization of temporal and spatio-temporal data. The packages spc (Knoth 2016), qcc (Scrucca 2004), IQCC (Scrucca 2014) and qcr (Flores 2016) provide functions for calculating various performance measures, signal limits, graphical displays etc. for a selection of classical control charts. The edcc package (Zhu and Park 2013) has functions for economic design of control charts, while the MSQC package (Santos-Fernández 2013) provides a tool kit for multivariate process monitoring. However, none of these packages include methods for taking into account the impact of estimation error on the performance of the charts, which is the main novelty of the spcadjust package.
In the next section, we give a brief introduction to the problem and describe the bootstrap approach for adjusting for estimation error. In the two sections thereafter we first illustrate basic use of the spcadjust package, then we describe details of the package and illustrate more advanced use. In the last section, a real data example is provided.
Stable versions of the package spcadjust are available on CRAN. Development versions are available in an open git repository (https://bitbucket.org/agandy/spcadjust/). We welcome contributions to the functionality of the package.
Control charts are a set of statistical techniques for monitoring a stream of data over time. A typical application is to monitor whether a stream of measurements follows a certain distribution, often called the in-control distribution, over time. If the distribution of the measurements deviates from the in-control distribution in a certain way the control chart should quickly detect this. In more advanced situations, regression adjustments are needed, and the monitoring is based on detecting deviations from the regression model.
In most common control charts like Shewhart, CUSUM and EWMA charts, a function of the observations is plotted for each new observation and a signal is given if this function crosses a certain threshold. Both the parameters of the function and the threshold are in most cases calculated from estimates of the in-control distribution. This implies that estimation error will affect the performance of the control charts; for instance, estimation errors might lead to control charts that give false alarms far too often.
We first give a motivating example, and then our bootstrap method for handling estimation error is described.
We consider an example with a CUSUM chart for monitoring changes in the mean of a stream of normally distributed data. Assume first that we know that the stream consists of independent observations $X_1, X_2, \ldots \sim N(\mu, \sigma^2)$ when the process is in control. A CUSUM chart for detecting a change in the mean can then be defined by $S_0 = 0$ and
$$S_t = \max\left(0,\; S_{t-1} + \frac{X_t - \mu - \Delta/2}{\sigma}\right), \quad t = 1, 2, \ldots, \tag{1}$$
where $\Delta$ is the change in the mean that we want to detect quickly. The chart signals the first time $S_t$ crosses a threshold $c$.
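To make recursion (1) concrete, here is a minimal base-R sketch, independent of the package (the function and its argument names are ours):

## CUSUM recursion (1): S_t = max(0, S_{t-1} + (X_t - mu - Delta/2)/sigma)
cusum <- function(x, mu, sigma, Delta) {
  S <- numeric(length(x))
  s <- 0
  for (t in seq_along(x)) {
    s <- max(0, s + (x[t] - mu - Delta/2)/sigma)
    S[t] <- s
  }
  S
}
S <- cusum(rnorm(100), mu = 0, sigma = 1, Delta = 1)  # in-control stream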
However, in practice, the true in-control distribution is usually unknown, and the parameters $\mu$ and $\sigma$ have to be replaced by estimates $\hat\mu$ and $\hat\sigma$ based on past in-control data. Since the future in-control data come from the true in-control distribution, while the chart is run with the estimated parameters, the actual performance of the chart will deviate from its nominal performance.
A typical application of our bootstrap method described in the next subsections will be to calculate an adjustment to the naive threshold estimate $\hat c$ such that a given in-control performance, for instance a lower bound on the in-control ARL, is guaranteed with high probability.
As an illustration we generated 100 past in-control observations from a standard normal distribution and estimated $\mu$ and $\sigma$. Assuming (incorrectly) that the estimated parameters are the true parameters would lead to a threshold of 4.1 for a nominal in-control ARL of 500, while the bootstrap adjustment described below yields the threshold 5.5. The result of running the CUSUM chart with these estimated parameters on a stream of 100 new in-control observations is shown in the top row of Figure 1.
In the bottom row of Figure 1 the same CUSUM chart is run on data which are out of control from observation 81 onwards. There is only a slight delay in the detection of the out-of-control situation with the adjusted threshold, while the unadjusted threshold gives two false alarms among the first 80 (in-control) observations.
In this section we describe our bootstrap procedure for situations with homogeneous observations. Assume that in-control we have a stream of independent observations $X_1, X_2, \ldots$ from an in-control distribution $P$, and that a chart statistic $S_t$ is computed for each new observation, with a signal the first time $S_t$ crosses a threshold $c$. To run such a chart, certain parameters, which we denote $\xi = \xi(P)$, need to be estimated from past in-control data, yielding $\hat\xi = \xi(\hat P)$, where $\hat P$ is the estimated in-control model.

Common performance measures for control charts are the ARL and the hitting probability of the chart within a certain number of steps. These depend both on the unknown in-control distribution $P$ and on the estimated chart parameters $\hat\xi$. The signal limit $c$ is chosen to achieve a nominal value of the selected performance measure; again, both $P$ and $\hat\xi$ enter this computation.

Let $q = q(P, \xi)$ denote the quantity of interest, for instance the threshold needed to achieve a given in-control ARL when the chart is run with parameters $\xi$ and the observations follow $P$. The ideal but unobservable choice is then $q(P, \hat\xi)$, while the naive choice $q(\hat P, \hat\xi)$ ignores the estimation error.
The suggestion in (Gandy and Kvaløy 2013) is to calculate, by bootstrapping, an adjusted threshold which with high probability will guarantee that the in-control performance is not worse than the nominal value. For instance, to calculate an adjustment to the naive threshold, the unknown distribution of $q(P, \hat\xi)$ is approximated by the bootstrap distribution of $q(\hat P, \hat\xi^\ast)$, where $\hat\xi^\ast = \xi(\hat P^\ast)$ is re-estimated from past data resampled from $\hat P$; a one-sided bootstrap confidence bound then serves as the adjusted threshold. To make a unified presentation of the bootstrap we let $\hat P$ stand for whichever estimated model is resampled from: in a parametric bootstrap new past data sets are generated from a fitted parametric distribution, while in a non-parametric bootstrap they are drawn from the empirical distribution of the past data.
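Schematically, the adjustment can be sketched as follows (a simplified illustration of the scheme, not the package internals; the arguments estimateP, xiofP, resample and q stand for the model-specific ingredients introduced above, and the package additionally works on a transformed scale):

## Bootstrap a one-sided confidence bound for q(P, xihat)
adjusted_quantity <- function(data, B, covprob, estimateP, xiofP, resample, q) {
  Phat <- estimateP(data)                  # estimated in-control model
  qstar <- replicate(B, {
    bootdata <- resample(Phat)             # bootstrap sample of past data
    xistar <- xiofP(estimateP(bootdata))   # re-estimated chart parameters
    q(Phat, xistar)                        # e.g. threshold needed under Phat
  })
  quantile(qstar, covprob)                 # e.g. the 90% quantile
}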
If we return to the example in the previous section and consider the threshold needed to get a certain in-control ARL, then $q(P, \hat\xi)$ is exactly that threshold, and the adjusted threshold of 5.5 reported there is the 90% bootstrap confidence bound for it.
In many applications of control charts the units being monitored are heterogeneous, for instance when monitoring data from human beings. To make reasonable monitoring systems in such situations the explainable part of the difference between units should be accounted for by regression models. Charts based on regression models are often called risk-adjusted; an overview of such charts can be found in (Grigg and Farewell 2004).
For risk-adjusted charts the regression model has to be estimated based on past data, and the impact of estimation error thus has to be taken into account. The bootstrap procedure outlined in the previous section also applies to risk-adjusted charts. Let the stream of observations now be denoted $(Y_1, x_1), (Y_2, x_2), \ldots$, where $Y_i$ is the response and $x_i$ the covariate vector of unit $i$.
As an example consider a CUSUM chart for a linear regression model. Suppose that in-control $Y_i = x_i^T\beta + \epsilon_i$ with independent $\epsilon_i \sim N(0, \sigma^2)$. A CUSUM for detecting a change of size $\Delta$ in the mean of the response can then be based on the regression residuals, with updates of the form $Y_i - x_i^T\beta - \Delta/2$. To account for the estimation error we can use the bootstrap procedure from the previous section with, e.g., non-parametric resampling of the past observations $(Y_i, x_i)$.
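A minimal self-contained sketch of such a residual-based CUSUM (for illustration only; the update form above is assumed, and the package function SPCModellm used later wraps the corresponding steps):

## Residual CUSUM for a linear model, detecting an upward shift of size Delta
set.seed(2)
past <- data.frame(x = runif(200))
past$y <- 1 + 2*past$x + rnorm(200)
fit <- lm(y ~ x, data = past)               # estimated in-control model
newd <- data.frame(x = runif(50))
newd$y <- 1 + 2*newd$x + rnorm(50)
r <- newd$y - predict(fit, newdata = newd)  # residuals on new data
Delta <- 1
S <- Reduce(function(s, e) max(0, s + e - Delta/2), r,
            init = 0, accumulate = TRUE)[-1]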
So far we have focused on how to adjust the signal limit of the control chart to achieve a certain performance with a high probability. In practice that will be a typical application, but it is easy to change focus to other quantities. For instance, instead of adjusting the threshold to obtain a certain ARL, we could fix the threshold and calculate which ARL we will, with high probability, at least achieve. For instance, with Shewhart charts it is very common to use a threshold of 3 standard deviations; we can then compute the in-control ARL that this conventional threshold guarantees with high probability.
Another variant, which we consider further in Section Implementing a new type of chart, is Shewhart charts with non-symmetric signal limits for skew distributions. For such charts the signal limit can be defined in terms of the quantiles corresponding to a certain tail probability $\alpha$: the chart signals when an observation falls below the $\alpha/2$-quantile or above the $(1-\alpha/2)$-quantile of the in-control distribution.
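For example, for a skewed in-control distribution such as a gamma distribution, the signal limits for a given $\alpha$ are simply the corresponding quantiles (illustrative parameter values):

alpha <- 0.01
qgamma(c(alpha/2, 1 - alpha/2), shape = 2, scale = 3)  # lower and upper limit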
In this section we discuss how the package can be used for pre-implemented chart types and data models. The framework provided in the package can also easily be extended to work with other charts, data models and/or estimation procedures as will be explained in Section Details of the package and advanced usage.
An important basic structure of the package is that chart types and data models are implemented in separate objects, in such a way that they can be flexibly combined. The implemented chart types are the Shewhart chart (SPCShew), the CUSUM chart (SPCCUSUM) and the EWMA chart (SPCEWMA). Table 1 lists the implemented data models.
Calculation of charts and chart properties like thresholds, ARLs and hitting probabilities is defined in the chart type object. Estimation of chart parameters, the form and cdf of the updates, and the bootstrap procedure are defined in the data model object. By updates we mean the quantity added to the chart in each step. Parametric bootstrapping is used in the normal model, non-parametric bootstrapping in all the other pre-implemented models.
The parameter Delta is set to 0 for Shewhart and EWMA charts.
| class name | data model |
|---|---|
| "SPCModelNormal" | normally distributed updates of the form $(X_t - \mu - \Delta/2)/\sigma$ (Sections CUSUM chart with estimated in-control state and Shewhart chart with estimated in-control state) |
| "SPCModelNonparCenterScale" | updates $(X_t - \mu - \Delta/2)/\sigma$, no distributional assumptions (Section CUSUM chart with estimated in-control state) |
| "SPCModelNonpar" | user defined updates, no distributional assumptions |
| "SPCModellm" | linear regression model with updates $Y_t - x_t^T\beta - \Delta/2$ (Sections CUSUM chart with linear regression model and EWMA chart with linear regression model) |
| "SPCModellogregLikRatio" | logistic regression model, likelihood ratio updates (Section Application to cardiac surgery data) |
| "SPCModellogregOE" | logistic regression model, observed minus expected updates |
The main steps for basic usage of the package are to define a chart object with the new() function and to calculate the properties of interest with the SPCproperty() function. A generic description of these functions is given below.
To define a chart object we need to specify the combination of chart type and data model as follows:
chartgeneric <- new("Charttype", model = Datamodel(Delta = x),...)
Here "Charttype"
should be one of "SPCShew"
, "SPCCUSUM"
or
"SPCEWMA"
and "Datamodel"
should be one of the class names listed in
Table 1. Finally x
should be 0 for Shewhart
and EWMA chart and set to the desired value for CUSUM charts.
The following function invokes the bootstrap procedure and calculates the property of interest for the chart:
SPCproperty(data, nrep, property = "specifyproperty",chart = chartgeneric,
params = list(specifyparameters), covprob=0.9, parallel=1,...)
Here data contains the past observations (usually a vector) and nrep is the number of bootstrap replications. Further, specifyproperty specifies the property of interest, with choices calARL, calhitprob, ARL and hitprob. The first two choices calculate a calibrated threshold to achieve, with high probability, a desired ARL or a desired hitting probability. Based on a specified choice of threshold, the last two choices calculate the smallest ARL or the largest hitting probability that is attained with high probability. Necessary parameters are given in specifyparameters (depending on the property this includes the desired ARL, the desired hitting probability or the threshold). Finally, covprob gives the desired coverage probability (with 0.9 as the default), and the bootstrap computations can be distributed over several cores with the parallel option.
Further functions and details are explained in the examples below. In the first two subsections we consider situations with homogeneous observations; in the subsequent subsections we consider risk-adjusted charts.
We now return to the motivating example in Section Motivating example. Recall that we want to run a CUSUM chart of the form (1) to monitor for a change in the mean of a stream of normally distributed data. We first define the chart object by specifying chart type and data model.
> library(spcadjust)
> chart <- new("SPCCUSUM", model = SPCModelNormal(Delta = 1))
Here "SPCCUSUM"
specifies that it should be a CUSUM chart, and
SPCModelNormal(Delta = 1)
specifies that it is a model with normally
distributed updates of the form
Next we generate 100 past observations and use the function xiofdata to calculate the estimated chart parameters:
> X <- rnorm(100)
> xihat <- xiofdata(chart, X)
> str(xihat)
List of 3
$ mu: num -0.0284
$ sd: num 0.921
$ m : int 100
Now we can use the function SPCproperty()
to calculate the naive and
adjusted threshold:
> cal <- SPCproperty(data = X, nrep = 50, property = "calARL",
+ chart = chart, params = list(target = 500), covprob = 0.9,quiet = TRUE)
> cal
90 # CI: A threshold of 5.513 gives an in-control ARL of at least 500.
Unadjusted result: 4.101
Based on 50 bootstrap repetitions.
Here property = "calARL"
specifies that the chart should be calibrated
to achieve a certain ARL, this ARL is specified to be 500
(target = 500
) and the probability of attaining at least this ARL
specified to be 90% (covprob = 0.9
). The adjusted threshold of 5.5 is
calculated using parametric bootstrapping with nrep
replications
assuming normality of the observations. For real applications nrep
should of course be more than 50. If nonparametric bootstrapping is
preferred the model specification in the definition of the chart should
be replaced by SPCModelNonparCenterScale()
.
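For instance, the chart definition would then read (everything else stays unchanged):

chartnp <- new("SPCCUSUM", model = SPCModelNonparCenterScale(Delta = 1))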
If we would rather calibrate the chart according to a certain hitting probability, for instance a hitting probability of 0.05 within 100 steps, this is achieved by specifying property = "calhitprob" and params = list(target = 0.05, nsteps = 100):
> SPCproperty(data = X, nrep = 50, property = "calhitprob",
+ chart = chart, params = list(target = 0.05, nsteps=100), covprob = 0.9,
+ quiet = TRUE)
90 # CI: A threshold of 7.137 gives an in-control false alarm probability
of at most 0.05 within 100 steps.
Unadjusted result: 5.285
Based on 50 bootstrap repetitions.
The function runchart
is used to run the chart on future data, and the
option xi
specifies which parameters to use when running the chart:
> newX <- rnorm(100)
> S <- runchart(chart, newdata = newX, xi = xihat)
The following code produces the plots in the first row of Figure 1, using the ARL calibrated threshold calculated above:
> par(mfrow = c(1, 2), mar = c(4, 5, 0, 0))
> plot(newX, xlab = "t")
> plot(S, ylab = expression(S[t]), xlab = "t", type = "b",
+ ylim = range(S, cal@res+2, cal@raw))
> lines(c(0,100), rep(cal@res, 2), col = "red")
> lines(c(0,100), rep(cal@raw, 2), col = "blue", lty = 2)
> legend("topleft", c("Adjusted threshold","Unadjusted threshold"),
+ col = c("red", "blue"), lty = 1:2)
Next we consider a two-sided Shewhart chart, assuming that all observations are normally distributed. The in-control mean and standard deviation are estimated from 250 past in-control observations.
We first define the chart by
> chartShew <- new("SPCShew", model = SPCModelNormal(), twosided = TRUE)
and then generate 250 past observations and estimate the chart parameters:
> X <- rnorm(250)
> xihat <- xiofdata(chartShew, X)
> str(xihat)
List of 3
$ mu: num 0.0251
$ sd: num 1.05
$ m : int 250
If the Shewhart chart is run with the standard threshold of 3, the in-control ARL that this guarantees with high probability can be computed by specifying property = "ARL" and params = list(threshold = 3):
> SPCproperty(data = X, nrep = 50, property = "ARL", chart = chartShew,
+ params = list(threshold = 3), quiet = TRUE)
90 # CI: A threshold of 3 gives an in-control ARL of at least 213.1.
Unadjusted result: 370.4
Based on 50 bootstrap repetitions.
A two-sided Shewhart chart for normally distributed data with true parameters and a threshold of 3 gives an in-control ARL of 370.4. To find the threshold which, with high probability, gives an in-control ARL of at least 370 when the chart is run with estimated parameters, we use property = "calARL" with target = 370:
> cal <- SPCproperty(data = X, nrep = 50, property = "calARL", chart = chartShew,
+ params = list(target = 370), quiet = TRUE)
> cal
90 # CI: A threshold of 3.209 gives an in-control ARL of at least 370.
Unadjusted result: 3
Based on 50 bootstrap repetitions.
Finally we run the chart with new observations. The simulated new observations are in control for the first 100 observations; from observation 101 onwards there is a shift in the mean. The corresponding plot is given in Figure 2.
> newX <- rnorm(150, mean = c(rep(0, 100), rep(2, 50)))
> S <- runchart(chartShew, newdata = newX, xi = xihat)
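The plot in Figure 2 can be produced analogously to the Figure 1 code; a sketch (our code, mirroring the earlier plotting commands, with symmetric limits for the two-sided chart):

par(mfrow = c(1, 2), mar = c(4, 5, 0, 0))
plot(newX, xlab = "t")
plot(S, ylab = expression(S[t]), xlab = "t", type = "b",
     ylim = range(S, cal@res, -cal@res))
abline(h = c(-1, 1)*cal@res, col = "red")            # adjusted threshold
abline(h = c(-1, 1)*cal@raw, col = "blue", lty = 2)  # unadjusted threshold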
The set-up is as described in Section Risk-adjusted charts, with the regression coefficients $\beta$ replaced by estimates $\hat\beta$ computed from past data.
The following generates a data set of past observations from the model $Y_i = 2 + x_{i1} + x_{i2} + x_{i3} + \epsilon_i$ with standard normal errors $\epsilon_i$:
> n <- 500
> Xlinreg <- data.frame(x1 = rbinom(n, 1, 0.4), x2 = runif(n, 0, 1), x3 = rnorm(n))
> Xlinreg$y <- 2 + Xlinreg$x1 + Xlinreg$x2 + Xlinreg$x3 + rnorm(n)
Next, we initialize the chart
> chartlinregCUSUM <-
+ new("SPCCUSUM", model = SPCModellm(Delta = 1, formula = "y~x1+x2+x3"))
where SPCModellm() uses non-parametric bootstrapping as explained in Section Risk-adjusted charts. The estimated parameters for running the chart, i.e. the fitted regression model, are computed with xiofdata:
> xihat <- xiofdata(chartlinregCUSUM, Xlinreg)
> xihat
Call:
lm(formula = formula, data = P)
Coefficients:
(Intercept) x1 x2 x3
2.0222 1.0360 1.0350 0.9711
Next we find the threshold that with roughly 90% probability results in an average run length of at least 100 in control.
> cal <- SPCproperty(data = Xlinreg, nrep = 50, property = "calARL",
+ chart = chartlinregCUSUM, params = list(target = 100), quiet = TRUE)
> cal
90 # CI: A threshold of 3.138 gives an in-control ARL of at least 100.
Unadjusted result: 2.745
Based on 50 bootstrap repetitions.
Finally, we run the chart with new observations that are in control for the first 100 observations and then switch to out-of-control. A plot of the resulting CUSUM is given in Figure 3.
> n <- 120
> newXlinreg <- data.frame(x1 = rbinom(n, 1, 0.4), x2 = runif(n, 0, 1),
+ x3 = rnorm(n))
> outind <- c(rep(0, 100), rep(1, n-100))
> newXlinreg$y <-
+ 2 + newXlinreg$x1 + newXlinreg$x2 + newXlinreg$x3 + rnorm(n) + outind
> S <- runchart(chartlinregCUSUM, newdata = newXlinreg, xi = xihat)
An EWMA chart based on the residuals of a linear regression model can be defined by specifying the chart type "SPCEWMA", setting Delta = 0 and choosing a value of the smoothing parameter $\lambda$:
> chartlinregEWMA <- new("SPCEWMA", model = SPCModellm(Delta = 0,
+ formula = "y~x1+x2+x3"), lambda = 0.1)
> calEWMA <- SPCproperty(data = Xlinreg, nrep = 50, property = "calARL",
+ chart = chartlinregEWMA, params = list(target = 100), quiet = TRUE)
> calEWMA
90 # CI: A threshold of +/- 0.5337 gives an in-control ARL of at least 100.
Unadjusted result: 0.496
Based on 50 bootstrap repetitions.
> xihat <- xiofdata(chartlinregEWMA, Xlinreg)
> M <- runchart(chartlinregEWMA, newdata = newXlinreg, xi = xihat)
A plot of the resulting EWMA chart is given in Figure 4.
Further usage of risk-adjusted charts will be demonstrated in Section Application to cardiac surgery data where a CUSUM for logistic regression will be explained and used.
A basic structure of the package is, as explained in the introduction of Section Basic usage of the package, a definition of two types of objects. We will now look further into the details of these two objects and how they can be used to add new charts, new data models and other estimation procedures.
One object is an S3 class of type "SPCDataModel"
that implements how
observed data are used to fit the model and how updates for the chart
are being computed. The second object is an S4 class of type
"SPCchart"
which implements how these updates are converted into
charts and how the charts are being calibrated. The main advantage of
this separation into two different objects is that it reduces the amount
of redundancy in the code.
The package was originally developed with S4 classes only, to take advantage of the more flexible method dispatch. However, to improve performance, the data model classes, whose methods are called very frequently, were switched to S3 classes.
We first focus on how to generate a bespoke data model. For this one needs to implement a class of type "SPCDataModel". Every element of the class has to consist of a list of the following functions: updates, getcdfupdates, Pofdata, resample, xiofP, which have to be of a specific form. The arguments generally have the following meaning: xi denotes the parameter vector needed to create updates for running the chart from observed data, data is observed data, and P is an (estimated) probability model.
- updates(xi, data): Returns updates for the chart using the parameter xi and the observed data data.
- Pofdata(data): Estimates a probability model from the data.
- xiofP(P): Computes the parameter xi needed to compute updates from an (estimated) probability model P.
- resample(P): Generates a new data set from the probability model P.
- getcdfupdates(P, xi): Returns the cumulative distribution function (CDF) of updates with data generated from the probability model P and updates computed using the parameter xi.
In the following we give some examples.
Consider again the example discussed in Sections Motivating example and CUSUM chart with estimated in-control state of a CUSUM chart which assumes a normal distribution of the observations with unknown mean and standard deviation. We now demonstrate how to change the estimators to use the median and the median absolute deviation (MAD) instead of the mean and the sample standard deviation. Using these robust estimators can be desirable if outliers may be present in the past in-control data.
For this we only need to override one function of the existing data model "SPCModelNormal": the method Pofdata that estimates the parameters. The format of the input and output of the new Pofdata function needs to be unchanged. The following code first lists the old function and then overrides it with the new one.
> model <- SPCModelNormal(Delta = 1)
> model$Pofdata
function (data)
{
list(mu = mean(data), sd = sd(data), m = length(data))
}
> model$Pofdata <- function(data){
+ list(mu = median(data), sd = mad(data), m = length(data))
+ }
Properties of this chart can then be computed as before:
> X <- rnorm(100)
> chartrobust <- new("SPCCUSUM", model = model)
> SPCproperty(data = X, nrep = 50, property = "calARL",
+ chart = chartrobust, params = list(target = 100), quiet = TRUE)
90 # CI: A threshold of 4.162 gives an in-control ARL of at least 100.
Unadjusted result: 2.987
Based on 50 bootstrap repetitions.
In this example we illustrate how to construct a CUSUM chart that assumes that the observations come from an exponential distribution with unknown rate $\lambda$. We need to define all five functions of the data model: updates, Pofdata, xiofP, resample and getcdfupdates. The basic CUSUM chart class "SPCCUSUM" will be used without changes.
The updates for a CUSUM chart can in general situations be based on the log likelihood ratio between an out-of-control model and the in-control model (Hawkins and Olwell 1998), i.e. the CUSUM can be written as $S_t = \max(0, S_{t-1} + R_t)$ with updates $R_t = \log(f_{\text{out}}(X_t)/f_{\text{in}}(X_t))$. With in-control rate $\lambda$ and out-of-control rate $\Delta\lambda$ for some $\Delta \neq 1$, this gives updates $R_t = \log\Delta - \lambda(\Delta - 1)X_t$, which are implemented in updates below.
To define the data model, the CDF of these updates must also be computed. This can be done in closed form, but requires distinguishing the cases $\Delta < 1$ and $\Delta > 1$ (getcdfupdates). We decide to use parametric resampling of the data (resample), and we estimate the rate $\lambda$ by its maximum likelihood estimate $\hat\lambda = 1/\bar X$ (Pofdata). The following code implements this.
> SPCModelExponential = function(Delta = 1.25){
+ structure(list(
+ Pofdata = function(data){
+ list(lambda = 1/mean(data), n = length(data))
+ },
+ xiofP = function(P) P,
+ resample = function(P) rexp(P$n, rate = P$lambda),
+ getcdfupdates = function(P, xi) {
+ if (Delta<1)
+ function(x)
+ pmax(0, 1-exp(-P$lambda*(x-log(Delta))/(xi$lambda*(1-Delta))))
+ else
+ function(x)
+ pmin(1, exp(-P$lambda*(log(Delta)-x)/(xi$lambda*(Delta-1))))
+ },
+ updates = function(xi, data) log(Delta)-xi$lambda*(Delta-1)*data
+ ), class = "SPCDataModel")
+ }
Next, we put this into practice. First we initiate the chart.
> ExpCUSUMchart <- new("SPCCUSUM", model = SPCModelExponential(Delta = 1.25))
The following creates some past observations and computes the threshold needed to achieve an in-control ARL of 1000.
> X <- rexp(500)
> cal <- SPCproperty(data = X, nrep = 50, property = "calARL", chart = ExpCUSUMchart,
+ params = list(target = 1000), covprob = 0.9, quiet = TRUE)
> cal
90 # CI: A threshold of 4.054 gives an in-control ARL of at least 1000.
Unadjusted result: 3.165
Based on 50 bootstrap repetitions.
Finally, we generate some new data and make a CUSUM plot with thresholds which is displayed in Figure 5.
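This can be done following the same pattern as for Figure 1; a sketch (our code):

newX <- rexp(100)
S <- runchart(ExpCUSUMchart, newdata = newX, xi = xiofdata(ExpCUSUMchart, X))
plot(S, ylab = expression(S[t]), xlab = "t", type = "b",
     ylim = range(S, cal@res, cal@raw))
abline(h = cal@res, col = "red")             # adjusted threshold
abline(h = cal@raw, col = "blue", lty = 2)   # unadjusted threshold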
We now discuss the chart model and how to implement new charts. Every chart is an S4 class derived from the class "SPCchart". It has one slot, model, which contains the data model to be used with the chart. The main method that needs to be implemented is getq, which computes desired properties of a given control chart. Besides the chart itself, it receives two arguments: which property to report (a string, e.g. ARL, hitprob, calARL or calhitprob) and additional parameters for this property, e.g. a threshold when computing the ARL (property ARL), a threshold and a number of steps when computing hitting probabilities (property hitprob), or a desired ARL when calibrating the threshold (property calARL).
We now give one example of how to implement a new chart. Suppose the observations are assumed to have a continuous in-control distribution with cdf $F$, and consider the Shewhart-type chart with non-symmetric signal limits mentioned earlier: for a tail probability $\alpha$, the chart signals when an observation falls below the $\alpha/2$-quantile or above the $(1-\alpha/2)$-quantile of the in-control distribution.
The main work in implementing a chart is implementing functions that
compute properties of the chart (e.g. ARL in control, threshold needed
to give a certain ARL or hitting probabilities within certain steps).
These properties need to be computed given the parameter that is used
for running the chart (xi
) and given the distribution of the
observations (P
).
In this example we implement two properties: one for computing the ARL for a given tail probability $\alpha$ (property ARL) and one for computing the $\alpha$ needed to achieve a desired ARL (property calARL). When the chart is run with estimated quantiles, the per-observation signal probability is $p = F(\hat q_{\alpha/2}) + 1 - F(\hat q_{1-\alpha/2})$, where $F$ is the true in-control cdf, so the ARL for the Shewhart chart with asymmetric control limits run with estimated parameters will be $1/p$. For the property calARL, the $\alpha$ attaining the target ARL is found by a root search (uniroot), and the bootstrap operates on the logit scale of $\alpha$ (the q and trafo components below).
> setClass("SPCShewAsym", contains = c("SPCchart"))
> setMethod("getq", signature = "SPCShewAsym", function(chart, property, params){
+ if (property == "calARL"){
+ list(
+ q = function(P, xi){
+ pobs <- function(alpha)(
+ getcdfupdates(chart, xi = xi, P = P)(xi$quant(alpha/2))
+ +(1-getcdfupdates(chart, xi = xi, P = P)(xi$quant(1-alpha/2))))
+ res <- uniroot(function(x) params$target-(1/pobs(x)),
+ lower = 1e-7,upper = 0.4)$root
+ as.double(log(res/(1-res)))
+ },
+ trafo = function(x) exp(x)/(1+exp(x)),
+ lowerconf = FALSE,
+ format = function(res)
+ paste("A threshold of alpha=", format(res, digits = 4),
+ " gives an in-control ARL of at least ",
+ params$target, ".", sep = "", collapse = "")
+ )
+ }else if (property == "ARL"){
+ list(
+ q = function(P, xi){
+ -log(getcdfupdates(chart, xi = xi, P = P)(xi$quant(params$alpha/2))
+ +(1-getcdfupdates(chart, xi = xi, P = P)(xi$quant(1-params$alpha/2)))
+ )},
+ trafo = function(x) exp(x),
+ lowerconf = FALSE,
+ format = function(res)
+ paste("A threshold defined by alpha=", params$alpha,
+ " gives an in-control ARL of at least ",
+ format(res, digits = 4), ".", sep = "",collapse = "")
+ )
+ }else stop("property ", property, " not implemented.")
+ })
[1] "getq"
Now we want to use this chart for the example of a gamma distribution. For this we need to implement a basic data model, which uses the observations directly as updates. We estimate the parameters of the gamma distribution via the method of moments (Pofdata): a gamma distribution with shape $k$ and scale $\theta$ has mean $k\theta$ and variance $k\theta^2$, so matching moments gives $\hat\theta = s^2/\bar X$ and $\hat k = \bar X^2/s^2$. To run the chart we also need the quantile function of the estimated distribution to compute the estimated quantiles (xiofP). Resampling is again parametric, from the fitted gamma distribution (resample).
> X <- rgamma(100, scale = 3, shape = 2)
> modGammaBasic = structure(
+ list(
+ Pofdata = function(data){
+ list(scale = var(data)/mean(data),
+ shape = mean(data)^2/var(data),
+ n = length(data))
+ },
+ xiofP = function(P){
+ res <- P;
+ res$quant <- function(alpha)
+ qgamma(alpha, shape = P$shape, scale = P$scale);
+ res
+ },
+ resample = function(P) {
+ rgamma(P$n, shape = P$shape, scale = P$scale)
+ },
+ getcdfupdates = function(P, xi) {
+ function(x) pgamma(x, shape = P$shape, scale = P$scale)
+ },
+ updates = function(xi, data) data
+ ),
+ class = "SPCDataModel")
> chartAsym <- new("SPCShewAsym", model = modGammaBasic)
> SPCproperty(data = X, nrep = 50, chart = chartAsym,
+ property = "ARL", params = list(alpha = 0.01), quiet = TRUE)
90 # CI: A threshold defined by alpha=0.01 gives an in-control ARL of at
least 34.54.
Unadjusted result: 100
Based on 50 bootstrap repetitions.
> SPCproperty(data = X, nrep = 50,
+ property = "calARL", chart = chartAsym,
+ params = list(target = 100), quiet = TRUE)
90 # CI: A threshold of alpha=0.002869 gives an in-control ARL of at least
100.
Unadjusted result: 0.009998
Based on 50 bootstrap repetitions.
To show the advantage of the modular set-up, we now modify the data model to assume that the data come from an exponential distribution, in other words that the shape parameter of the gamma distribution is 1. We just need to redefine the function Pofdata to accomplish this.
> modExp = modGammaBasic
> modExp$Pofdata <- function(data){
+ list(scale = mean(data),
+ shape = 1,
+ n = length(data))
+ }
> chartAsymExp <- new("SPCShewAsym", model = modExp)
> X <- rexp(100)
> SPCproperty(data = X, nrep = 50, chart = chartAsymExp,
+ property = "ARL", params = list(alpha = 0.01), quiet = TRUE)
90 # CI: A threshold defined by alpha=0.01 gives an in-control ARL of at
least 84.08.
Unadjusted result: 100
Based on 50 bootstrap repetitions.
> SPCproperty(data = X, nrep = 50,
+ property = "calARL", chart = chartAsymExp,
+ params = list(target = 100), quiet = TRUE)
90 # CI: A threshold of alpha=0.007553 gives an in-control ARL of at least
100.
Unadjusted result: 0.009998
Based on 50 bootstrap repetitions.
In this section we illustrate use of the package with an application to a data set on the outcomes of cardiac surgery from a UK centre for cardiac surgery over the period 1992-1998. These data were first analysed by (Steiner et al. 2000) and have later been used for illustration by several authors (e.g. Sego and Woodall 2009; Jones and Steiner 2012; Zhang et al. 2016). A random subset of these data, with some random noise added, is available in the data frame cardiacsurgery in spcadjust. This data frame contains the date of surgery, a surgeon number, the time until death if the patient died during the follow-up time, and the Parsonnet score of the patient. The Parsonnet score is a well-established scoring system in cardiac surgery which combines a number of risk factors into a risk score for the patient. The data frame contains 5595 cases.
> data(cardiacsurgery)
Like (Steiner et al. 2000) we will focus on the 30-day post-operative mortality rate and use a logistic regression model with Parsonnet score as covariate for taking into account the differences in risk between patients, and use a CUSUM for monitoring.
We first describe the general set-up for CUSUM monitoring with logistic regression models and then return to the cardiac surgery example. Assume we have a binary response $Y_i$ with covariate vector $x_i$, and that in control $\text{logit}(P(Y_i = 1)) = \beta^T x_i$. For detecting a change to an out-of-control model with $\text{logit}(P(Y_i = 1)) = \Delta + \beta^T x_i$, the CUSUM updates are based on the corresponding likelihood ratio.
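Concretely, a standard computation gives the log-likelihood ratio update for observation $i$ (stated here as background, not as the package's literal internal code):

$$R_i = \Delta Y_i - \log\frac{1 + \exp(\Delta + \beta^T x_i)}{1 + \exp(\beta^T x_i)}.$$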
The first two years of data, containing 1769 cases, are used for estimating the parameters of the logistic regression model. The effect of the Parsonnet score turns out to be non-linear on the logit scale; applying a square-root transform of the score sorts this out. We thus set up data for estimating the chart parameters (phase I sample) and for running the chart (phase II sample) as follows:
> #Use dead within 30 days as response
> dead30 <- as.numeric(cardiacsurgery$time <= 30)
> #Use the two first years of data as phase I sample
> phaseone <- cardiacsurgery$date <= 730
> estdata <-data.frame(y = dead30[phaseone],
+ x = sqrt(cardiacsurgery$Parsonnet[phaseone]))
> #Use the five last years of data as phase II sample
> phasetwo <- !phaseone
> rundata <- data.frame(y = dead30[phasetwo],
+ x = sqrt(cardiacsurgery$Parsonnet[phasetwo]),
+ z = cardiacsurgery$surgeon[phasetwo],
+ year = (cardiacsurgery$date[phasetwo]-730)/365)
Next we set up charts for monitoring against roughly a doubling and a halving of the mortality rate, respectively. With a baseline mortality rate of 6.1% this corresponds to $\Delta = 0.75$ and $\Delta = -0.75$ on the logit scale:
> chartlogregd <-
+ new("SPCCUSUM", model = SPCModellogregLikRatio(Delta = 0.75, formula = "y~x"))
> chartlogregh <-
+ new("SPCCUSUM", model = SPCModellogregLikRatio(Delta = -0.75, formula = "y~x"))
For calculating the thresholds we specify an in-control ARL of 10000:
> cald <- SPCproperty(data = estdata, chart = chartlogregd, property = "calARL",
+ nrep = 50, params = list(target = 10000, gridpoints = 250),
+ parallel = Inf)
> cald
90 # CI: A threshold of 6.157 gives an in-control ARL of at least 10000.
Unadjusted result: 5.065
Based on 50 bootstrap repetitions.
> calh <- SPCproperty(data = estdata, chart = chartlogregh, property = "calARL",
+ nrep = 50, params = list(target = 10000, gridpoints = 250),
+ parallel = Inf)
> calh
90 # CI: A threshold of 6.271 gives an in-control ARL of at least 10000.
Unadjusted result: 4.469
Based on 50 bootstrap repetitions.
The option parallel = Inf
speeds up the bootstrap, but this option
must be skipped if the code is run in an environment which does not
support parallel processing.
Assuming that the distribution of Parsonnet scores is roughly the same for the patients each surgeon receives, and that the distribution remains approximately the same in the remainder of the period as in the first two years, the thresholds calculated above can be used for running individual charts for each of the surgeons. Notice that in such a setting where several charts are run with the same estimated parameters a threshold adjustment which achieves a guaranteed conditional performance is particularly relevant (Gandy and Kvaløy 2013).
The resulting CUSUM plots for four of the surgeons are shown in Figures 6 and 7. In Figure 6 the CUSUM for the second surgeon starts to increase after a while and passes both the unadjusted and the adjusted threshold. This could, e.g., be due to this surgeon starting to receive more difficult cases, not sufficiently accounted for by the adjustment for Parsonnet score.
For the monitoring against decreased mortality in Figure 7 there is a signal for one of the surgeons, indicating better survival than explained by the adjustment for Parsonnet score. The CUSUM for the third surgeon crosses the unadjusted threshold, but not the adjusted one, and this is thus not regarded as a true signal.
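The charts in Figures 6 and 7 can be produced by running the calibrated charts separately on each surgeon's cases; a sketch for the chart monitoring against increased mortality (our code, using the objects defined above):

## One CUSUM per surgeon, all using the same estimated model and thresholds
xihat <- xiofdata(chartlogregd, estdata)
par(mfrow = c(2, 2))
for (s in unique(rundata$z)[1:4]) {
  d <- rundata[rundata$z == s, ]
  S <- runchart(chartlogregd, newdata = d, xi = xihat)
  plot(d$year, S, type = "l", xlab = "year", ylab = expression(S[t]))
  abline(h = cald@res, col = "red")            # adjusted threshold
  abline(h = cald@raw, col = "blue", lty = 2)  # unadjusted threshold
}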