survidm: An R package for Inference and Prediction in an Illness-Death Model

Multi-state models are a useful way of describing a process in which an individual moves through a number of finite states in continuous time. The illness-death model plays a central role in the theory and practice of these models, describing the dynamics of healthy subjects who may move to an intermediate "diseased" state before entering into a terminal absorbing state. In these models one important goal is the modeling of transition rates which is usually done by studying the relationship between covariates and disease evolution. However, biomedical researchers are also interested in reporting other interpretable results in a simple and summarized manner. These include estimates of predictive probabilities, such as the transition probabilities, occupation probabilities, cumulative incidence functions and the sojourn time distributions. The development of survidm package has been motivated by recent contributions that provides answers to all these topics. Illustration of the software usage is included using real data.


Introduction
Multi-state models are very useful for describing complex event history data with multiple endpoints. These models may be considered a generalization of survival analysis where survival is the ultimate outcome of interest but where information is available about intermediate events which individuals may experience during the study period. For instances, in most biomedical applications, besides the 'healthy' initial state and the absorbing 'dead' state, one may observe intermediate (transient) states based on health conditions (e. g. diseased), disease stages (e. g. , stages of cancer or HIV infection), clinical symptoms (e. g. , bleeding episodes), biological markers (e. g. , CD4 T-lymphocyte cell counts; serum immunoglobulin levels) or they can represent a non-fatal complication in the course of the illness (e. g. , cancer recurrence, transplantation, etc.). Graphically, these models may be illustrated using diagrams with boxes representing the states and with arrows between the states representing the possible transitions. The complexity of the multi-state model greatly depends on the number of states and also on the possible transitions. The illness-death model is probably the most popular one in the medical literature. The irreversible version of this model (Figure 1), describes the pathway from an initial state to an absorbing state either directly or through an intermediate state. Many event-history data sets from biomedical studies with multiple endpoints can be reduced to this generic structure. There exists an extensive literature on multi-state models. Main contributions include books by Andersen et al. (1993) and Hougaard (2000) (Chapter 5 and 6). Recent reviews on this topic may be found in the papers by Putter et al. (2007), Meira-Machado et al. (2009) and Meira-Machado and Sestelo (2019). One important goal in multi-state modeling is to relate the individual characteristics with the intensity rates through a covariate vector but biomedical researchers are also interested in reporting interpretable results in a simple and summarized manner. These include estimates of predictive probabilities, such as the transition probabilities, occupation probabilities, cumulative incidence functions and the sojourn time distributions. The development of survidm R package has been motivated by several recent contributions that account for these problems; in particular the newly developed methods based on subsampling (see Meira-Machado and Sestelo (2019) for further details). The current version of the package provides seven different approaches to estimate the transition probabilities, two methods for the sojourn distributions and two approaches for the cumulative incidence functions. In addition, these probabilities can also be estimated conditionally on covariate measures. The package also allows the user to perform multi-state regression where the estimation of the covariate effects is achieved using Cox regression in which different effects of the covariates are assumed for different transitions.
Several researchers have recently developed software for multi-state survival analysis. A comprehensive list of the available packages in the Comprehensive R Archive Network (CRAN) can be seen in the CRAN task view 'Survival Analysis' (Allignol and Latouche, 2018). In R, several packages provide functions for estimating the transition probabilities (e. g. , the package p3state.msm (Meira-Machado and Roca-Pardiñas, 2011), TPmsm (Araújo et al., 2014), etm (Allignol et al., 2011), mstate (de Wreede et al., 2011) and TP.idm (Balboa and de Uña-Álvarez, 2018)), but none implements all the methods addressed by survidm which includes all newly developed methods based on the subsampling approach (see  and references therein). In addition, not all allow the users to obtain estimates of the transition probabilities conditional to covariates. The cmprsk and the timereg R packages can be used to estimate the cumulative incidence functions in a competing risks model. The package survival (via survfit and coxph functions) can also be used for competing risks data. The msSurv can be used to estimate the state occupation probabilities and the sojourn distributions for multi-state models subject to right-censoring (possibly state-dependent) and left-truncation. The package provides also matrices of transition probabilities between any two states. However, none of the available software provides an encompassing package which can be used to estimate all these quantities. Finally, the use of different packages for estimating these quantities separately is rather difficult because each of the current programs requests its own data structure. This paper introduces survidm (available from the Comprehensive R Archive Network at https://cran.r-project.org/web/packages/survidm/), a software application for R which performs inference in a progressive illness-death model. It describes the capabilities of the program for estimating semiparametric regression models and for implementing nonparametric estimators for all quantities mentioned above.
The remainder of this paper is organized as follows. The following section provides a brief introduction to the methodological background. Then, a detailed description of the package is presented, and its usage is illustrated through the analysis of a real data set; and finally, the last section contains the main conclusions of this work.

Methodology background
The mathematical background underlying the survidm package is briefly introduced in this Section. A more detailed introduction can be found in Meira-Machado and Sestelo (2019). The present contribution builds on this article by offering guidelines for the use of the software to implement the proposed methods.

Notation
A multi-state model is a model for a time continuous stochastic process (Y(t), t ≥ 0) which at any time occupies one of a few possible states. In this paper we consider the progressive illness-death model depicted in Figure 1 and we assume that states are numbered as 0 − healthy, 1 − illness and 2 − death. We also assume that all subjects enter the study in State 0 and that they may either visit State 1 at some time point; or not, going directly to the absorbing state (State 2). This model is characterized by the joint distribution of (Z, T) where Z denote the sojourn time in the initial State 0 and T is the total survival time of the process. As usual with survival data, individuals are generally followed over a certain period of time, providing right-censored observations which are modeled by considering a censoring variable C, which we assume to be independent of of (Z, T). Due to censoring, rather than (Z, T) we observe Z = min(Z, C) and T = min(T, C), and ∆ 1 = I(Z ≤ C) and ∆ = I(T ≤ C) for the respective censoring indicators of Z and T. Finally, the available data is

Regression models for transitions intensities
One important goal in multi-state modeling is to study the relationships between the different predictors and the outcome. To relate the individual characteristics to the intensity rates several models have been used in literature. A common simplifying strategy is to decouple the whole process into various survival models, by fitting separate intensities to all permitted transitions using semi-parametric Cox proportional hazard regression models (Cox, 1972), while making appropriate adjustments to the risk set. The most common models are characterized through one of the two model assumptions that can be made about the dependence of the transition intensities and time. The transition intensities may be modeled using separated Cox models assuming the process to be Markovian (also known as the clock forward modeling approach), which states that past and future are independent given the present state. They can also be modeled using a semi-Markov model in which the future of the process does not depend on the current time but rather on the duration in the current state. Semi-Markov models are also called 'clock reset' models, because each time the patient enters a new state the time is reset to 0. The package survidm is restricted to these two semiparametric multi-state models, but other models are possible for the analysis of multi-state survival data. For example, time-homogeneous markov models and model with piecewise constant intensities are implemented in the msm R package (Jackson, 2011). Aalen additive model (Aalen et al., 2001) and accelerated failure time models (Wei, 1992) are another class of regression models that can be an alternative to the Cox proportional hazards model.

Transition probabilities
For two states h, j and two time points s < t, introduce the so-called transition probabilities p hj (s, t) = P(Y(t) = j|Y(s) = h). In the progressive illness-death model there are five different transition probabilities to estimate: p 00 (s, t), p 01 (s, t), p 02 (s, t), p 11 (s, t) and p 12 (s, t). Since p 00 (s, t) + p 01 (s, t) + p 02 (s, t) = 1 and p 11 (s, t) + p 12 (s, t) = 1, in practice one only needs to estimate three of these quantities. The state occupation probabilities are defined as p j (t) = P(Y(t) = j). If we assume that all subjects are in State 0 at time t = 0 then p j (t) = p 0j (0, t) and, therefore, the occupation probabilities can be seen as a particular case of the transition probabilities. Estimating these quantities is interesting, since they allow for long-term predictions of the process.
The standard nonparametric method to estimate a transition probability matrix is the time-honored Aalen-Johansen (AJ) estimator (Aalen and Johansen, 1978). This estimators benefits from the assumption of Markovianity on the underlying stochastic process extending the time-honored Kaplan-Meier estimator (Kaplan and Meier, 1958) to Markov chains. Explicit formulae of the Aalen-Johansen estimator for the illness-death model are available (Borgan, 1988). Moreira et al. (2013) propose a modification of the Aalen-Johansen estimator in the illness-death model, based on a preliminary smoothing (also known as presmoothing, Dikta (1998); Cao et al. (2005)) of the censoring probability for the total time (respectively, of the sojourn time in State 0), given the available information. The presmoothed Aalen-Johansen (PAJ) estimator proposed by Moreira et al. (2013) is obtained by replacing the censoring indicators (in the transition probabilities p 00 (s, t) and p 11 (s, t)) by an estimator of a binary (logistic) regression function. The authors verified through simulations that the use of presmoothing can lead to improved estimators with less variability.
The Markov assumption may be violated in practice; for example, for the progressive illnessdeath model, the arrival time to the intermediate state of the process often influences the subsequent transition hazard, leading to non-Markov structures. If the Markov property is violated, then the consistency of the time-honored Aalen-Johansen estimator and of its presmoothed versions can not be ensured in general. Exceptions to this are the estimators for p 00 (s, t) or for the so-called occupation probabilities, p 0j (0, t) (Datta and Satten, 2001).
Estimators for the transition probabilities in the progressive illness-death model which do not rely on the Markov assumption were introduced for the first time by Meira-Machado et al. (2006). The proposed estimators were defined in terms of multivariate Kaplan-Meier integrals with respect to the marginal distributions of Z and T. These authors showed the practical superiority of their estimators relative to the Aalen-Johansen in situations in which the Markov condition is strongly violated. However, their proposal have the drawback of requiring that the support of the censoring distribution contains the support of the lifetime distribution, otherwise they only report valid estimators for truncated transition probabilities. To avoid this issue, corrected estimators (labeled in this paper as LIDA, the acronym of Lifetime Data Analysis, the journal in which this estimator was published for the first time) were proposed by de Uña-Álvarez and Meira-Machado (2015) for p 01 (s, t) and p 11 (s, t).
The paper by de Uña-Álvarez and Meira-Machado (2015) also introduce estimators based on subsampling. The idea behind subsampling, also referred to as landmarking (Van Houwelingen, 2007), is to consider the subset of individuals observed in State h by time s. To be specific, given the time point s, to estimate p 0j (s, t) for j = 0, 1, 2 the landmark analysis is restricted to the individuals observed in State 0 at time s; whereas, to estimate p 1j (s, t), j = 1, 2, the landmark analysis proceeds from the sample restricted to the individuals observed in State 1 at time s. The procedure is then based on (differences between) Kaplan-Meier estimators derived from these subsets of the data. These estimators are termed as LM in the present paper as well as in the survidm package.
In some cases, subsampling leads to small sample sizes which may result in estimators with high variability. To avoid this problem, a valid approach is to consider a modification of the landmark estimator based on presmoothing (Meira-Machado, 2016). The presmoothed landmark estimators (PLM) are a good alternative in these situations, since they give mass to all the event times, including the censored observations. Subsampling, was later used by Putter and Spitoni (2018) to derive a landmark Aalen-Johansen estimator (LMAJ) of the transition probabilities. The idea behind the proposed estimator is to use the Aalen-Johansen estimator of the state occupation probabilities derived from those subsets (consisting of subjects occupying a given state at a particular time) for which consistency have already been proved in multi-state models that are not necessarily Markov (Datta and Satten, 2001). In this latter approach, the application of presmoothed estimators (PLMAJ) is possible too.
Also of interest is the estimation of the transition probabilities given a covariate (or a vector of covariates) that is observed for an individual before the individual makes a particular transition of interest. One standard method, particularly well-suited to the setting with multiple covariates, is to consider estimators based on a Cox's regression model (Cox, 1972) fitted marginally to each transition, with the corresponding baseline hazard function estimated by the Breslow's method (Breslow, 1972). One alternative and flexible nonparametric approach is to consider local smoothing by means of kernel weights based on local constant (Nadaraya-Watson) regression. Right censoring is handled by applying inverse probability of censoring weighting. This is a fully nonparametric approach which provides flexible effects of the continuous covariates Rodríguez-Álvarez et al., 2016;Meira-Machado and Sestelo, 2019). The two possible approaches are implemented in the survidm package and labeled as breslow and IPCW, respectively.

Cumulative incidence functions
Another quantity of interest in multi-state modeling is the cause-specific cumulative incidence function, as defined by Kalbfleisch and Prentice (1980). In the illness-death model two cumulative incidence functions are of particular interest: the cumulative incidence of the illness and the cumulative incidence of dying without the disease. These quantity represents the probability of an individual being or having been diseased at time t. One possible estimator for the cause-specific cumulative incidence function in a competing risks setting can be performed using the estimator proposed by Geskus (2011). This estimator based on the subdistribution hazard is obtained by applying the Nelson-Aalen estimator and the product-limit estimator of the disease-free survival. This estimator can also be expressed in terms of the Kaplan-Meier weights of the distribution of Z, the sojourn time in State 0, as introduced in the paper by Meira-Machado and Sestelo (2019). A modification of this estimator based on presmoothing can be introduced to reduce its variability. Both methods are implemented in the survidm package. Estimation methods for the cumulative incidence function conditionally on covariate measures based on local constant (Nadaraya-Watson) regression are also implemented in the package.

Sojourn distributions
The estimation of the marginal distributions in multi-state modeling is an interesting topic too. In the context of the illness-death model, if the independence assumption between the censoring variable C and the vector of times (Z, T) is assumed, the marginal distribution of the sojourn time in State 0, Z, can be consistently estimated by the Kaplan-Meier estimator based on the ( Z i , ∆ 1i )'s. Similarly, the distribution of the total time may be consistently estimated by the Kaplan-Meier estimator based on the ( T i , ∆ i )'s. However, the estimation of the marginal distribution of the sojourn time in State 1 is not such a simple issue. Nonparametric estimates for this marginal distribution allowing for state and path dependent censoring were proposed by Satten and Datta (2002).

survidm in practice
This section introduces an overview of how the package is structured.
This software enables both numerical and graphical outputs to be displayed for all methods described in the previous section. This software is intended to be used with the R statistical program (R Core Team, 2019). Our package is composed of 17 functions that allow users to obtain estimates for all proposed methods. Details on the usage of the functions (described in Table 1) can be obtained with the corresponding help pages.
It should be noted that to implement the methods described in the methodology section one needs the following variables of data: time1, event1, Stime and event. Covariates can also be included. The variable time1 represents the sojourn time in State 0 and Stime the total time, whereas event1 and event are the respective censoring indicators. This means that event1 will take the value 1 if the subject leaves State 0 and 0 otherwise; event takes value 1 if the subject reaches State 2 and 0 otherwise.

NWW
Returns a vector with the Nadaraya-Watson weights. For illustration, we apply the proposed methods to data from a large clinical trial on Duke's stage III patients, affected by colon cancer, that underwent a curative surgery for colorectal cancer (Moertel et al., 1990). This data set is freely available as part of the R survival package. The data is also available as part of the R package survidm. Besides the two event times (disease free survival time and death time) and the corresponding indicator statuses, a vector of covariates including rx (treatment: Obs(ervation), Lev(amisole), Lev(amisole)+5FU), sex (1 -male), age (years), nodes (number of lymph nodes with detectable cancer), surge (time from surgery to registration: 0 = short, 1 = long), adhere (adherence to nearby organs) are also available. The covariate 'recurrence' is the only timedependent covariate, while the other covariates included are fixed. Recurrence can be considered as an intermediate transient state and modeled using the progressive illness-death model with transient states 'alive and disease-free' and 'alive with recurrence', and the absorbing state 'dead'. In the following, we will demonstrate the package capabilities using this data. Bellow is an excerpt of the data.frame with one row per individual. Individuals were chosen in order to represent all possible combinations of movements among the three states. Individual represented in the first line experienced a recurrence of the tumor and have died. In such cases event1 = 1 and time1 = Stime indicates that the individual observed a direct transition from State 0 to State 1 (with event1 = 1). Individual represented in line 2 remain alive and without recurrence at the end of follow-up (event1 = 0 and event = 0). Individual represented in line 16 of the original data set, with event1 = 1 and event = 0, corresponds to an individual with an observed recurrence that remain alive at the end of the follow-up. Note that in this case the disease-free survival time is equal to the death time (time1 = Stime). Finally, individual represented in line 21 of the original data set have died without observing a recurrence. We note that event1 = 1 and event = 0 corresponds to individuals with an observed recurrence that remain alive at the end of the follow-up.
From the total of 929 patients, 468 developed a recurrence and among these 414 died; 38 patients died without developing a recurrence. A summary of the data with the number of the undergoing transitions can be obtained through the nevents function. The colums of the data set must include at least the four columns named time1, event1, Stime and event according to the requirements of the survIDM function presented in the help file. Parameter state.names enables to change the default values of states, 'healthy', 'illness' and 'death'.

Regression models for transitions intensities
To relate the individual characteristics to the intensity rates semiparametric multi-state regression models are used. Specifically, separated Cox models assuming the process to be Markovian (i.e., the transition intensities only depend on the history of the process through the current state) or using a semi-Markov model in which the future of the process does not depend on the current time but rather on the duration in the current state. It is therefore of practical interest to determine whether the Markov property holds within a particular data set in order to determine whether a Markov model or a semi-Markov model is more appropriate.

The Markov assumption
The Markov assumption may be checked, among others, by including covariates depending on the history. For the progressive illness-death model, the Markov assumption is only relevant for mortality transition after recurrence. We can examine whether the time spent in the initial state "Alive and disease-free" (i.e. the past) is important on the transition from the recurrence state to death (i.e. the future). For doing that, let Z be the time spent in State 0, and t the current time. Fitting a model α 12 (t; Z) = α 12,0 (t)exp{βZ}, we now need to test the null hypothesis, H 0 : β = 0, against the general alternative, H 1 : β = 0. This would assess the assumption that the transition rate from the disease state into death is unaffected by the time spent in the previous state.
> library(survival) > fit <-coxph(Surv(time1, Stime, event)~time1, data = colonIDM, subset=c(time1 < Stime)) > fit coef exp(coef) se(coef) z p time1 -0.0002475 0.9997526 0.0001737 -1.424 0.154 Likelihood ratio test=2.04 on 1 df, p=0.1533 n= 468, number of events= 414 Following this procedure, we verified that the effect of time spent in State 0 reported a p value of 0.154 (regression coefficient: -0.0002475) revealing no evidences against the Markov model for the colon data. Results from this test can also be obtained through the function markov.test, which has an output fairly similar to those obtained from coxph function. The transition intensities characterize the hazard for movement from one state to another, revealing how the different covariates affect the various permitted transitions. The results obtained indicate that none of the covariates were found to have a strong effect for all three transitions. Save for covariates age and sex, all the remaining predictors were considered important for recurrence transition. Interestingly, age displayed a strong linear effect on mortality transition without recurrence, whereas all the other covariates failed to show relevant association on this transition. Finally, save for covariates surg and adhere, all the remaining predictors were considered important for the mortality transition after recurrence. The coxidm function also returns the analysis of the deviance for each Cox model. In this case, only an overall p-value is presented for categorical variables. To obtain the outputs, we have to indicate type='anova' in summary function. The effect of the continuous covariates on the log-hazards is often assumed to have a linear functional form in all intensities. To introduce flexibility into the Cox Markov model, several smoothing methods may be applied, but P-splines (Eilers and Marx, 1996) are being most frequently considered in this context. Results showed the presence of a strong nonlinear effect for nodes (checked through a formal test) when using a Cox model on the recurrence transition. Figure 2 returns a centered set of predictions on a log hazard scale. The average predicted value is zero with a mean value of nodes as the reference (see the vignette 'Splines, plots, and interactions' in (Therneau, 2021)). The main curve depicts the smooth curve for nodes, on a log hazard scale, indicating that the risk of recurrence increases rapidly until about 6 nodes. The apparent decrease after 23 nodes is not significant due to the wide confidence intervals.

> ggplotly(nonlinear)
The proportional hazards assumption can be tested formally using the summary function. The output can be obtained putting type= ph in summary function.

Occupation probabilities and transition probabilities
The occupation probabilities and the transition probabilities are key quantities of interest in multi-state models. They offer interpretable results in a simple and summarized manner.
Estimates and plots of the transition probabilities for all methods introduced in Section 2.2 can be obtained using the tprob function. The default method is the Aalen-Johansen estimator (AJ) which assumes the process to be Markovian. The presmoothed version of the Aalen-Johansen estimator (PAJ) assumes also the process to be Markovian while the remaining methods (LIDA, LM, PLM, LMAJ and PLMAJ) are free of the Markov condition.
When one is confident of the Markov assumption, the Aalen-Johansen is preferred over the non-Markovian estimators since it reports a smaller variance in estimation. Estimates and plot for the Aalen-Johansen method can be obtained through the following input commands: Besides being consistent regardless the Markov condition, the landmark non-Markov estimators (LM, PLM, LMAJ and PLMAJ) can be preferable in many situations due to their greater accuracy (smaller bias). When comparing the original nonparametric landmark estimator (LM) and the Aalen-Johansen estimator, some discrepancies are observed for t = 730 and t = 1095 (2 and 3 years, respectively). In addition to the aforementioned discrepancy between the two estimates, the plots for the two methods ( Figure 3) also show that the confidence bands are narrower in the case of the Aalen-Johansen revealing less variability for this method. Since the landmark estimators of the transition probabilities are free of the Markov assumption, they can also be used to introduce such tests (at least in the scope of the illness-death model) by measuring their discrepancy to Markovian estimators. The function markov.test performs a graphical local test for the Markov condition. This graphical test is based on a PP-plot which compares the estimations reported by the Aalen-Johansen transition probabilities to their non-Markov counterparts. The corresponding plot for a local test of Markovianity (s = 365) can be obtained though the following input command: > mk <-markov.test(survIDM(time1, event1, Stime, event)~1, s = 365, data = colonIDM) > autoplot(mk) The plot shown in Figure 4 compares the Aalen-Johansen estimator and the landmark non-Markovian estimator for p 01 (s, t), p 02 (s, t) and p 12 (s, t), for s = 365. Existing deviations of the plots with respect to the straight line y = x reveals some evidence on the lack of Markovianity of the underlying process beyond one year after surgery. For further illustration, this figure jointly display the landmark non-Markovian estimator and the Aalen-Johansen estimator for p 12 (s = 365, t). In this plot the differences between both estimators are clearly seen. Thus, in principle the application of the Aalen-Johansen method is not recommended here, due to possible biases.
The variability of the nonparametric landmark estimator (LM) may be successfully reduced using presmoothing ideas (Dikta, 1998;Cao et al., 2005). The presmoothed landmark estimator is implemented in the same function through method PLM. The same ideas can be used to reduce the variability of the Markovian Aalen-Johansen estimator and the (non-Markov) Landmark Aalen-Johansen estimator through methods PAJ and PLMAJ, respectively.
The package survidm also allows for the computation of the above quantities conditional on covariates that are observed for an individual before the individual makes a particular transition of interest. For continuous covariates, one possible and flexible nonparametric approach is to consider local smoothing by means of kernel weights based on local constant (Nadaraya-Watson: NW) regression. This estimator is implemented in our package through function tprob using the method = IPCW. Below are the input commands to obtain the estimates of the transition probabilities at time s = 365 for an individual of 48 years old. For the bandwidth in the estimator we use dpik function which is available from the R KernSmooth package. This is the data based bandwidth selector of Wand and Jones (1997). The curves depicted in Figure 5, which are purely nonparametric, enable a flexible modeling of the data providing flexible and robust effects of the covariate that can be used at least as a preliminary attempt, providing insights on the data being analyzed. Such methods can be used to capture nonstandard data features that may not be detected through parametric or semiparametric proposals. A general problem in multivariate nonparametric regression estimation is the so-called curse of dimensionality. In higher dimensions the observations are sparsely distributed even for large sample sizes, and consequently estimators based on local averaging (like those based on kernel smoothing) perform unsatisfactorily in this situation.

Sojourn distribution
Another interesting quantity is the sojourn time in each state. Estimates for the distribution function of the sojourn time in the recurrence state can be obtained using the estimator by Satten and Datta (2002) through function sojourn.
> soj <-sojourn(survIDM(time1, event1, Stime, event)~1, data = colonIDM, method = "Satten-Datta", conf = FALSE) > summary(soj, time=365*1:6) Estimation of sojourn ( The estimates for the distribution function of the sojourn time in the recurrence state, corresponding to the time between entry in recurrence and death, reveal that the distribution function increases to a value near 49% and 78% for a time of one and two years, respectively, revealing a high risk of death shortly after relapse. The methods for implementing some of the proposed methods can be computationally demanding. In particular, the use of bootstrap resampling techniques are time-consuming processes because it is necessary to estimate the model a great number of times. In such cases we recommend the use of parallelization (cluster = TRUE). This should considerably increase performance on multi core/ multi threading machines.

Conclusions
There has been several recent contributions for the inference in the context of multi-state models. Many of these contributions were made for the illness-death model. One important and perhaps undervalued aspect of multi-state models is the possibility to apply it to obtain predictions of the clinical prognosis. This is usually achieved using estimates of the transition probabilities and survival estimates. However, there are several other quantities that could also be used in the analysis of these data, such as the state occupation probabilities, the sojourn time distributions and the cumulative incidence functions. To provide the biomedical researchers with an easy-to-use tool for obtaining predictive estimates for all these quantities we develop an R package called survidm. This package can be used to implement several nonparametric and semiparametric estimators for the transition probabilities. In addition, estimators are also implemented that account for the influence of covariates. Bootstrap confidence bands are provided for all methods. The software can also be used to perform multi-state regression (using type-specific Cox models).
One limitation of the survidm R package is that it can only be used in the progressive illness-death model. However, this turns out to be an advantage for those users that only wish to analyze data from a progressive illness-death model. For such cases, the survidm package is ideal since it is user-friendly (as illustrated in the real data analysis as well as in the help files of the main functions tprob, sojourn and CIF) with strong resemblance to the well-known and widely used survival package.