This article introduces the R package ctmcd, which provides an implementation of methods for estimating the parameters of a continuous-time Markov chain when data are only available on a discrete-time basis. These data consist of partial observations of the state of the chain, made without error at discrete times; this estimation task is also known as the embedding problem for Markov chains. The functions provided comprise matrix logarithm based approximations as described in Israel et al. (2001) and Kreinin and Sidelnikova (2001), as well as an expectation-maximization algorithm and a Gibbs sampling approach, both introduced by Bladt and Sørensen (2005). For the expectation-maximization algorithm, Wald confidence intervals based on the Fisher information estimation method of Oakes (1999) are provided. For the Gibbs sampling approach, equal-tailed credibility intervals can be obtained. In order to visualize the parameter estimates, a matrix plot function is provided. The methods described are illustrated by Standard and Poor’s discrete-time corporate credit rating transition data.
The estimation of the parameters of a continuous-time Markov chain (see, e.g., Norris (1998) or Ethier and Kurtz (2005); also referred to as a Markov process) when only discrete-time observations are available is a widespread problem in the statistical literature. Dating back to Elfving (1937), this issue is also known as the embedding problem for discrete-time Markov chains. The problem occurs in the modeling of dynamical systems when, for various reasons such as a difficult measurement procedure, only discrete-time observations are available. This is the case in a wide range of applications, e.g., in the analysis of gene sequence data (see, e.g., Hobolth and Stone (2009), Verbyla et al. (2013) or Chen et al. (2014)), for causal inference in epidemiology (see, e.g., Zhang and Small (2012)), for describing the dynamics of open quantum systems (see, e.g., Cubitt et al. (2012)), or in rating based credit risk modeling (see, e.g., Dorfleitner and Priberny (2013), Yavin et al. (2014) or Hughes and Werner (2016)), to name only a few.
In the following, an explicit statement of the missing data setting shall be given and the notation used in this manuscript shall be introduced: Consider that realizations of a continuous-time Markov chain, i.e., paths of states $s = \{s(t) : 0 \le t \le T\}$ on a discrete state space $\{1, \dots, S\}$, are given. In the missing data situation described in this paper, these paths are however not completely observed, but only at points in time $t_0 < t_1 < \dots < t_n$. A continuous-time Markov chain has the parameter set $Q = (q_{ij})_{1 \le i,j \le S}$, the generator matrix, whose off-diagonal entries $q_{ij} \ge 0$ are the transition intensities between states $i$ and $j$ and whose diagonal entries are given by $q_{ii} = -\sum_{j \ne i} q_{ij}$, so that each row sums to zero.
Besides the R package msm, see Jackson et al. (2011), which only provides functions for direct likelihood optimization, there is no other publicly accessible implementation available that allows for estimating the parameters of a continuous-time Markov chain given that data have been observed only on a discrete-time basis. Against this background, this paper introduces the R package ctmcd, a continuously extended, improved and documented implementation based on what started as supplementary R code to Pfeuffer (2016). The functions of the package are explained and illustrated by Standard and Poor’s corporate rating transition data. The outline of the paper is as follows: first, three matrix logarithm adjustment approaches are explained. Second, likelihood inference is illustrated for an instance of the expectation-maximization algorithm. Third, the implementation of a Gibbs sampler is presented to facilitate Bayesian inference. Numerical properties of the different approaches are evaluated and examples for more complex applications of the methods are shown. Finally, the results of the paper are summarized.
A basic approach to estimate generator matrices from discrete-time observations is to inversely use the matrix exponential relationship $P(t) = \exp(Qt)$ between a generator matrix $Q$ and the conditional discrete-time transition matrix $P(t)$ for time horizon $t$, i.e., to take the candidate matrix $\tilde Q = \frac{1}{t}\log(P(t))$, where $\log$ denotes the matrix logarithm.
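For illustration, here is a minimal sketch of this inverse relationship using the expm package and a hypothetical 2x2 one-year transition matrix:

library(expm)  # matrix exponential expm() and matrix logarithm logm()
p <- matrix(c(0.90, 0.10,
              0.05, 0.95), 2, 2, byrow=TRUE)  # hypothetical P(t), t = 1
qcand <- logm(p) / 1  # candidate generator: matrix logarithm scaled by 1/t
expm(qcand)           # recovers p up to numerical error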
However, the matrix logarithm approach has two shortcomings. First, the matrix logarithm is not a bijective function: as, e.g., shown by Speakman (1967), a transition matrix can have more than one valid generator. For a certain subset of discrete-time transition matrices, however, it can be shown that there exists a unique generator; for criteria (on discrete-time transition matrices) under which this is the case, see, e.g., Cuthbert (1972), Cuthbert (1973), Singer and Spilerman (1976) or Israel et al. (2001). Second, the method requires that the derived matrix $\tilde Q = \frac{1}{t}\log(P(t))$ actually is a valid generator, i.e., that all of its off-diagonal entries are non-negative, which is not guaranteed in general.
In this context, Israel et al. (2001) introduce two approaches. On the one hand, diagonal adjustment (DA) works by forcing negative off-diagonal elements of $\tilde Q$ to zero, i.e., $\hat q_{ij} = \max(\tilde q_{ij}, 0)$ for $i \ne j$, and recomputing the diagonal entries so that the rows again sum to zero. A diagonal adjustment estimate is obtained by passing method="DA" to gm(), the generic generator matrix estimation function of the package. The method requires the specification of a discrete-time transition matrix tm, which in the case of the matrix logarithm adjustment approaches is a matrix of relative transition frequencies and corresponds to the matrix $P(t)$ above.
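To make this concrete, the following is a minimal sketch of the DA step; da_adjust is a hypothetical helper, not the package's internal implementation:

# Diagonal adjustment sketch: negative off-diagonal entries of a
# candidate matrix logarithm q are set to zero, and the diagonal is
# recomputed so that each row again sums to zero.
da_adjust <- function(q) {
  off <- row(q) != col(q)
  q[off] <- pmax(q[off], 0)
  diag(q) <- 0
  diag(q) <- -rowSums(q)
  q
}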
In order to illustrate the methods, we employ Standard and Poor’s (2000) global corporate credit ratings data. The rating categories in this data set have the commonly known symbols AAA, AA, A, BBB, BB, B, C and D. The package ships tm_abs, a matrix of discrete-time absolute transition frequencies from the first to the last day of the fiscal year 2000. We have to take into account that the default category D is absorbing, which is why the corresponding row of the generator matrix contains only zeros. A matrix of relative transition frequencies tm_rel is derived from tm_abs by standardizing the row entries to sum to 1, with the absorbing default row set accordingly (see the code below). We then specify tm=tm_rel and the time horizon of this discrete-time matrix as te=1, because tm_abs and tm_rel refer to credit rating changes for a single-year time interval (fiscal year 2000).
library(ctmcd)
data(tm_abs)
tm_rel <- rbind((tm_abs / rowSums(tm_abs))[1:7,], c(rep(0, 7), 1))
gmda <- gm(tm=tm_rel, te=1, method="DA")
On the other hand, Israel et al. (2001) also describe the weighted adjustment (WA) of the non-negative off-diagonal entries as an alternative: negative off-diagonal elements are again set to zero, and the cut-off jump probability mass $B_i = \sum_{j \ne i} \max(-\tilde q_{ij}, 0)$ of row $i$ is redistributed among the remaining entries according to their absolute values, i.e., $\hat q_{ij} = \tilde q_{ij} - B_i |\tilde q_{ij}| / G_i$ with $G_i = |\tilde q_{ii}| + \sum_{j \ne i} \max(\tilde q_{ij}, 0)$. In analogy to diagonal adjustment, a weighted adjustment estimate can be derived by using method="WA" as follows:
gmwa <- gm(tm=tm_rel, te=1, method="WA")
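For intuition, a minimal sketch of the WA redistribution step following the description above; wa_adjust is again a hypothetical helper:

# Weighted adjustment sketch: the cut-off negative mass of each row is
# redistributed among the remaining entries in proportion to their
# absolute values, so that the rows again sum to zero.
wa_adjust <- function(q) {
  for (i in seq_len(nrow(q))) {
    off <- setdiff(seq_len(ncol(q)), i)
    bi <- sum(pmax(-q[i, off], 0))                # cut-off jump probability mass
    gi <- abs(q[i, i]) + sum(pmax(q[i, off], 0))  # total absolute mass kept
    q[i, off] <- pmax(q[i, off], 0)               # force negatives to zero
    if (gi > 0) q[i, ] <- q[i, ] - bi * abs(q[i, ]) / gi
  }
  q
}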
The third matrix logarithm adjustment approach is the quasi-optimization (QO) procedure of Kreinin and Sidelnikova (2001). This method finds the valid generator matrix that is closest (in Euclidean distance) to the candidate matrix $\frac{1}{t}\log(P(t))$:
gmqo <- gm(tm=tm_rel, te=1, method="QO")
By specifying method="QO"
, we can then get a quasi-optimization
approach result for our data. Despite the possibility to just show the
parameter estimates for the different methods in the console using,
e.g., print(gmDA)
or simply calling gmDA()
, ctmcd also provides a
matrix plot function plotM()
that especially allows the visualization
of generator matrix estimates and can be easily accessed by the generic
plot()
function:
plot(gmda)
plot(gmwa)
plot(gmqo)
The results can be seen in figure 2.
Given that complete continuous-time data are available, the likelihood function for a generator matrix is given by $L(Q) = \prod_{i=1}^{S} \prod_{j \ne i} q_{ij}^{N_{ij}(T)} \exp(-q_{ij} R_i(T))$, where $N_{ij}(T)$ denotes the number of transitions from state $i$ to state $j$ and $R_i(T)$ the cumulative time spent in state $i$ up to time $T$. The difficulty is now that when data are only observed at the discrete times $t_0 < t_1 < \dots < t_n$, these sufficient statistics are not available. The EM algorithm of Bladt and Sørensen (2005) therefore alternates between computing the conditional expectations of $N_{ij}(T)$ and $R_i(T)$ given the discrete-time observations and the current parameter estimate (E-step) and maximizing the complete-data likelihood evaluated at these expectations (M-step). The algorithm is initialized with gm0, an arbitrarily chosen generator matrix where all off-diagonal entries are 1 and the row of the absorbing default state is zero:
gm0 <- matrix(1, 8, 8)
diag(gm0) <- 0
diag(gm0) <- -rowSums(gm0)
gm0[8,] <- 0
The maximum likelihood estimate can then be obtained by using the gm() function, providing a matrix of absolute numbers of state changes (in the credit rating example: tm=tm_abs), specifying the method argument as method="EM" and setting an initial guess (here: gmguess=gm0).
gmem <- gm(tm=tm_abs, te=1, method="EM", gmguess=gm0)
plot(gmem)
plot(gmem$ll, main="Expectation Maximization Algorithm\nLog Likelihood Path",
xlab="Iteration", ylab="Log-Likelihood")
The result of this estimate can be seen in figure 3, together with a plot of the log-likelihood path over the individual iteration steps of the EM algorithm.
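For reference, the complete-data log-likelihood stated above can be written down in a few lines; a sketch, where NijT and RiT denote hypothetical continuous-time sufficient statistics (transition counts and cumulative holding times):

# log L(Q) = sum_ij N_ij * log(q_ij) + sum_i q_ii * R_i,
# since q_ii = -sum_{j != i} q_ij.
loglik_complete <- function(q, NijT, RiT) {
  pos <- row(q) != col(q) & NijT > 0
  sum(NijT[pos] * log(q[pos])) + sum(diag(q) * RiT)
}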
The function actually being optimized by the EM algorithm is the marginal likelihood of the discrete-time observations, $L_m(Q) = \prod_{i=1}^{S} \prod_{j=1}^{S} \big(\exp(Q\,t_\Delta)\big)_{ij}^{\,n_{ij}}$, where $n_{ij}$ denotes the number of observed discrete-time transitions from state $i$ to state $j$ over the time horizon $t_\Delta$. This marginal likelihood can alternatively be maximized directly, as implemented in the R package msm by means of general-purpose optimization (opt.method="optim"), Newton optimization (opt.method="nlm"), a bounded optimization by a quadratic approximation approach (opt.method="bobyqa") introduced by Powell (2009), and a Fisher scoring technique by Kalbfleisch and Lawless (1985) (opt.method="fisher").
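A direct sketch of this marginal log-likelihood, mirroring what the package function ctmcdlogLik(), used in the benchmark below, evaluates (the expm package is assumed):

library(expm)
loglik_marginal <- function(q, tmabs, te) {
  p <- expm(q * te)  # implied discrete-time transition matrix
  sum(tmabs[tmabs > 0] * log(p[tmabs > 0]))
}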
In order to benchmark the EM algorithm, the different techniques shall
be compared using the derived maxima and the time needed to perform the
estimation.
### Data transformation for msm function
mig <- NULL
id <- 0
for(i in 1:7) {
for(j in 1:8) {
if(tm_abs[i,j] > 0) {
for(n in 1:tm_abs[i,j]) {
id <- id + 1
mig <- rbind(mig, c(id, 0, i), c(id, 1, j))
}
}
}
}
mig_df <- data.frame(id=mig[,1], time=mig[,2], state=mig[,3])
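As a quick plausibility check (optional), every observed transition should have produced one synthetic subject with exactly two rows, the state at time 0 and the state at time 1:

# Row count must equal twice the number of observed transitions
# out of the non-absorbing states 1 to 7.
stopifnot(nrow(mig_df) == 2 * sum(tm_abs[1:7, ]))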
### Comparing estimates
library(msm)
gmem <- gm(tm_abs, te=1, method="EM", eps=1e-7, gmguess=gm0)
ctmcdlogLik(gmem$par, tm_abs, 1)
q0 <- rbind(matrix(1, 7, 8), 0)
msm_est1 <- msm(state ~ time, id, data=mig_df, qmatrix=q0,
opt.method="optim", gen.inits=TRUE)
ctmcdlogLik(qmatrix.msm(msm_est1)[[1]], tm_abs, 1)
msm_est2 <- msm(state ~ time, id, data=mig_df, qmatrix=q0,
opt.method="nlm", gen.inits=TRUE)
ctmcdlogLik(qmatrix.msm(msm_est2)[[1]], tm_abs, 1)
msm_est3 <- msm(state ~ time, id, data=mig_df, qmatrix=q0,
opt.method="bobyqa", gen.inits=TRUE)
ctmcdlogLik(qmatrix.msm(msm_est3)[[1]], tm_abs, 1)
msm_est4 <- msm(state ~ time, id, data=mig_df, qmatrix=q0,
opt.method="fisher", gen.inits=TRUE)
The marginal log-likelihood for a given generator matrix, discrete-time interval and matrix of absolute transition frequencies can be evaluated with the function ctmcdlogLik(). Optimization with the previously employed remote initial value gm0 fails for all msm optimization methods; with the closer built-in parameter initialization gen.inits=TRUE, we obtain the results presented in table 1. Optimization also fails for the method of Kalbfleisch and Lawless (1985). However, it is already mentioned in the help files for the msm() function that optimization using this approach lacks stability. Thus, the advantages of the EM algorithm in this specific data setting, where no covariates are included in the calculation, are its numerical performance and its stability.
| Method | EM | optim | nlm | bobyqa |
|---|---|---|---|---|
| Log-Likelihood | -3194.255 | -3194.255 | -3194.255 | -3194.259 |
| Time Elapsed [s] | 0.46 | 4.65 | 28.33 | 167.52 |
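The timings reported above are machine dependent; comparable numbers can be taken with base R's system.time(), e.g., for the EM column:

# Elapsed time of one EM estimation run on the example data
system.time(gm(tm=tm_abs, te=1, method="EM", gmguess=gm0))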
The package ctmcd also provides a function for deriving a confidence interval based on the asymptotic normality of the maximum likelihood estimate. As, however, in a partially observed data setting the maximum likelihood estimate is based on the likelihood function of the complete observations, the observed Fisher information cannot be evaluated directly. Oakes (1999) shows that it can nevertheless be obtained from the derivatives of the conditional expectation of the complete-data log-likelihood given the discrete-time observations. Bladt and Sørensen (2009) then concretize this expression for the generator matrix estimation framework and derive explicit formulas for the diagonal and off-diagonal elements of the information matrix. Wald confidence intervals based on these results are implemented in ciEM(), which can be easily accessed using the generic gmci() command, which takes as arguments an EM algorithm estimate object and a significance level alpha.
ciem <- gmci(gmem, alpha=.05)
plot(ciem)
By default, the derivatives for the information matrix are calculated using the analytical expressions of Smith and Reis (2017) (cimethod="SdR"); numerical derivatives as suggested in Bladt and Sørensen (2009) can be accessed by cimethod="BS".
The matrix plot function can also be applied to "gmci" interval estimate objects, see, e.g., figure 4. One can see in this example that interval estimates are not provided for all generator matrix entries. This is because the numerical evaluation of the above described expressions for deriving the Fisher information matrix and inverting it becomes unstable when the parameter estimates are small. Thus, a lower limit eps can be specified so that no interval estimates are derived for generator matrix elements smaller than eps. By default, eps=1e-04.
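For instance, a coarser threshold could be passed along when constructing the interval estimate; a sketch, assuming eps is forwarded to the interval routine as described above:

# Wald confidence interval with a larger stability threshold
ciem2 <- gmci(gmem, alpha=0.05, eps=1e-3)
plot(ciem2)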
Bladt and Sørensen (2005) show that the Gamma distribution is the conjugate prior for the off-diagonal entries of a generator matrix, i.e., a priori $q_{ij} \sim \Gamma(\alpha_{ij}, \beta_i)$. In the package, this prior is specified as a "list" object named prior. Thereby, the first element of the list has to be an $S \times S$ matrix containing the parameters $\alpha_{ij}$ and the second element a vector containing the parameters $\beta_i$; for an absorbing state, the corresponding entries are set to 0 and Inf, respectively:
pr <- list()
pr[[1]] <- matrix(1, 8, 8)
pr[[1]][8,] <- 0
pr[[2]] <- c(rep(5, 7), Inf)
We use pr as a simple example. Furthermore, for the Gibbs sampler the matrix of absolute transition frequencies has to be provided (tm=tm_abs) and the length of the burn-in period must be chosen (here: burnin=1000).
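The conjugacy behind this prior choice can be sketched as follows, where NijT and RiT denote hypothetical sufficient statistics (numbers of transitions and holding times) simulated in a single Gibbs iteration:

# Conjugate Gamma update for one off-diagonal entry q_ij (sketch):
# posterior shape = prior alpha_ij + simulated transition count,
# posterior rate  = prior beta_i   + simulated holding time.
qij_draw <- function(i, j, NijT, RiT)
  rgamma(1, shape = pr[[1]][i, j] + NijT[i, j],
            rate = pr[[2]][i] + RiT[i])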
Convergence of the algorithm is evaluated by the approach of
Heidelberger and Welch (1981), which is implemented in the R package
coda, see Plummer et al. (2006).
The advantage of this method is that it can be applied to single chains;
a shortcoming is that, as for similar methods, evaluation with multiple
parameters is time consuming. Thus, besides specifying a p-value for the convergence test by the argument conv_pvalue, one can also set a frequency criterion conv_freq determining how often, at equidistant numbers of iterations, convergence shall be checked. By default, conv_pvalue=0.05
and conv_freq=10
. One should notice that as the method of
Heidelberger and Welch (1981) is a two sample location test for comparing
the stability of the parameter estimates at the beginning and the end of
the Markov chain, the hypotheses are set so that an increasing p-value
implies a stricter convergence criterion. Another stopping rule is the
maximum number of iterations niter
, which by default is set as
niter=1e04
. If convergence according to the method of
Heidelberger and Welch (1981) is not given before the maximum number of
iterations is reached, a warning is displayed.
Setting the method argument to the value "GS" will then lead the generic generator matrix estimation function gm() to provide a posterior mean estimate of the generator matrix:
gmgs <- gm(tm=tm_abs, te=1, method="GS", prior=pr, burnin=1000)
plot(gmgs)
The result can be seen in figure 5.
Central to the Gibbs sampling algorithm is the sampling of realizations from the missing data full conditional distribution given the current parameters and the discrete-time observations, i.e., the simulation of endpoint-conditioned sample paths of a continuous-time Markov chain with the current generator matrix estimate. Two sampling schemes are implemented: on the one hand, the modified rejection sampling approach of Nielsen (2002), which can be selected by setting sampl_method="ModRej"; on the other hand, the uniformization sampling scheme of Fearnhead and Sherlock (2006), which is set as the default method and can be manually employed by setting sampl_method="Unif". As the simulation of trajectories of the process is in practice often very time consuming, the methods are implemented in C++, based on the source code of Fintzi (2016), which itself builds upon the supplementary R code of Hobolth (2008). Figure 6 shows the time (in seconds) needed for sampling 100,000 trajectories for each of the two methods and every combination of initial and end states.
library(expm)
speedmat_modrej <- matrix(0, 8, 8)
speedmat_unif <- matrix(0, 8, 8)
tpm <- expm(gmgs$par)
for(i in 1:7){
for(j in 1:8){
elem <- matrix(0, 8, 8)
elem[i,j] <- 1e5
t0 <- proc.time()
rNijTRiT_ModRej(elem, 1, gmgs$par)
speedmat_modrej[i,j] <- (proc.time() - t0)[3]
t0 <- proc.time()
rNijTRiT_Unif(elem, 1, gmgs$par, tpm)
speedmat_unif[i,j] <- (proc.time() - t0)[3]
}
}
plotM(speedmat_modrej,
main="Time for Simulation of 100,000 Paths\nModified Rejection Sampling",
xnames=rownames(tm_abs), ynames=colnames(tm_abs))
plotM(speedmat_unif,
main="Time for Simulation of 100,000 Paths\nUniformization Sampling",
xnames=rownames(tm_abs), ynames=colnames(tm_abs))
Although in this example, the uniformization sampling approach clearly
outperforms modified rejection sampling, it has to be mentioned that the
computing time for deriving discrete time state transitions in the
uniformization approach (which is carried out by using the matrix
exponential) is not included, because it is only calculated once in each
Gibbs sampling iteration step. Moreover, rejection rates of the approach
of Nielsen (2002) might be lower if trajectories are sampled for a whole row
of a transition matrix and not single initial and end state combinations
because then more than one possible path ending can be accepted in a
single draw. Therefore, to provide the option to combine the individual
strengths of both approaches, it is possible to set
sampl_method="Comb"
and provide a matrix with entries of either "M"
or "U"
for the optional argument combmat
, specifying which algorithm
shall be used for simulating trajectories for the specific start and
endpoint combination. Moreover, it is possible to include one's own external sampling algorithm implementation by specifying sampl_method="Ext" and an argument sampl_func with a sampling function of the format
sf <- function(tmabs, te, gmest) {
### Derive Expected Holding Times (RiT) and Number of State Transitions (NijT)
return(list(RiT=..., NijT=...))
}
Thereby, the matrix of absolute transition frequencies tmabs, the time interval of the discrete-time transitions te and the current parameter estimate of the Gibbs sampler gmest are input variables to this function, and a vector of cumulative simulated holding times RiT and a matrix of continuous-time state changes NijT need to be returned.
As the numerical performance of the Gibbs sampling algorithm is severely
dependent on the performance of the endpoint conditioned path sampling
algorithm, we would like to briefly point out that the whole method can
also be run in parallel. This can be achieved by setting up a number of nco independent chains with N/nco iterations each, where N denotes the total number of iterations and nco the number of parallel threads. As long as the initial states of the chains are forgotten, the single generator matrix draws can be seen as independent realizations. Thus, a burn-in period must be considered in every thread, and conv_pvalue has to be set to 1 so that the single-chain convergence check does not stop any thread early:
has to be set to
library(foreach)
library(doParallel)
N <- 1e5
nco <- detectCores()
cl <- makeCluster(nco)
registerDoParallel(cl)
gspar <- foreach(i=1:nco, .packages=c("ctmcd", "expm")) %dopar%
gm(tm=tm_abs, te=1, method="GS", burnin=1000, prior=pr,
conv_pvalue=1, niter=N / nco)
stopCluster(cl)
### Derive Estimate
parlist <- lapply(gspar, function(x) x$par)
parest <- Reduce('+', parlist) / nco
In this multiple chain setting, convergence can be analyzed by the
potential scale reduction factor diagnostic of Gelman and Rubin (1992). The potential
scale reduction factor describes by what fraction the variance of the
draws in the chain can be reduced if the chain is extended to an
infinite length. Convergence is assumed when the factor is close to 1.
### Check Convergence
library(coda)
chainlist <- as.mcmc.list(lapply(gspar, function(x) {
as.mcmc(do.call(rbind, lapply(x$draws, as.vector)))
}))
parchainlist <- lapply(chainlist,
function(x) x[,as.vector(parest) > 0])
gelman.diag(parchainlist)
Employing the implementation of this diagnostic in the coda package, we derive a multivariate potential scale reduction factor close to 1, i.e., convergence of the parallel chains can be assumed.
After having discussed various aspects of point estimation, an example for Bayesian interval estimation shall be presented as well. Bladt and Sørensen (2009) show that equal-tailed credibility intervals can be easily obtained from samples of the joint posterior distribution by taking the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of the single generator matrix entry draws. Calling gmci(), the generic function for generator matrix interval estimates, on a Gibbs sampling object (here: gmgs) will automatically invoke the ciGS() subroutine and yield an equal-tailed credibility interval as outlined above. Setting alpha=0.05 will then yield the estimate which can be seen in figure 7.
cigs <- gmci(gm=gmgs, alpha=0.05)
plot(cigs)
The approaches "DA"
, "WA"
, "EM"
and "GS"
are suitable for state
spaces of size "QO"
requires a state space
of at least size
Figure 8 shows how the methods scale in terms of numerical performance. The simulation study reports the average time in seconds needed to perform an estimate with each of the methods outlined in this manuscript. Thereby, the EM algorithm and the Gibbs sampler are run for 100 iterations each, and the data on which the estimates are performed are generated by uniformly distributing 1000 and 10000 discrete-time transitions, respectively, over a single unit time horizon and over matrices of increasing dimension.
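As a hypothetical sketch of this data-generating step (sim_tmabs is not part of the package), the discrete-time transition counts could be produced as follows:

# Distribute n discrete-time transitions uniformly over the cells of
# an S x S matrix of absolute transition frequencies.
sim_tmabs <- function(n, S) {
  cells <- sample(S * S, n, replace = TRUE)
  matrix(tabulate(cells, nbins = S * S), S, S)
}
tm_sim <- sim_tmabs(1000, 8)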
Based on the derived parameter estimates, predictions about discrete-time transitions into absorbing states of the Markov chain can be made and, moreover, summarized as a function of the length of the discrete-time interval. In the credit rating example shown in this paper, such functions are discrete-time probabilities of default for a given initial rating category depending on the time horizon (in years) and shall be called probability of default profiles in the following. Thereby, an advantage of the Bayesian approach is that, in contrast to the other methods outlined, interval estimates for such profiles can be easily derived. This involves the computation of empirical quantiles of the discrete-time default probabilities $(\exp(\hat Q^{(k)} t))_{i,S}$ over the single Gibbs sampling draws $\hat Q^{(k)}$. For the seven possible initial states in the credit rating example, probability of default profiles can then be obtained using the following code; the results are shown in figure 9.
tmax <- 20
for(cat in 1:7){
absStvec <- sapply(1:tmax, function(t) expm(gmgs$par * t)[cat,8])
quantMat <- matrix(0, 4, tmax + 1)
for(t in 1:tmax){
dtdraws <- lapply(gmgs$draws, function(x) expm(t * x))
drawvec <- sapply(1:length(gmgs$draws), function(x) dtdraws[[x]][cat,8])
quantMat[,t + 1] <- quantile(drawvec, c(.025, .05, .95, .975))
}
plot(0:tmax, c(0, absStvec), t="l", lwd=3, ylim=c(0, max(quantMat)),
main=paste0("Absorbing State Profiles\nInitial Rating Category ",
rownames(tm_abs)[cat]),
xlab="Time [Years]", ylab="Probability of Default")
for(i in 1:4)
lines(0:tmax, quantMat[i,], lty=c(3, 2, 2, 3)[i])
legend("topleft", lty=c(3, 2, 1), c("95%", "90%", "Median"))
}
Complementary to the solid line, which represents the discrete-time probability of default predictions based on the posterior mean estimate, pointwise credibility intervals for the confidence levels 0.90 and 0.95 are shown as dashed and dotted lines, respectively.
The problem of estimating the parameters of a continuous-time Markov chain from discrete-time data occurs in a wide range of applications and plays an especially important role in gene sequence data analysis and rating based credit risk modeling. This paper introduces and illustrates the ctmcd package, which provides an implementation of different approaches to derive such estimates. It supports matrix logarithm based methods with diagonal and weighted adjustment as well as a quasi-optimization procedure. Moreover, maximum likelihood estimation is implemented by an instance of the EM algorithm, and Bayesian estimates can be derived by a Gibbs sampling procedure. For the latter two approaches, interval estimates can also be obtained. The Bayesian approach can be used to derive pointwise credibility intervals of discrete-time transition probabilities and systematic profiles of discrete-time transition probabilities into absorbing states given the corresponding time horizon. In addition, a matrix plot function is provided, which can be used to visualize both point and interval estimates.
The author thanks two anonymous referees for their thorough review and highly appreciates the comments and suggestions which significantly contributed to increasing the quality of the R package ctmcd and this manuscript.