Discrete Time Markov Chains with R by

The markovchain package aims to provide S4 classes and methods to easily handle Discrete Time Markov Chains (DTMCs), filling the gap with what is currently available in the CRAN repository. In this work, I provide an exhaustive description of the main functions included in the package, as well as hands-on examples. Introduction DTMCs are a notable class of stochastic processes. Although their basic theory is not overly complex, they are extremely effective to model categorical data sequences (Ching et al., 2008). To illustrate, notable applications can be found in linguistic (see Markov’s original paper Markov (1907)), information theory (Google original algorithm is based on Markov Chains theory, Lawrence Page et al. (1999)), medicine (transition across HIV severity states, Craig and Sendi (2002)), economics and sociology (Jones (1997) shows an application of Markov Chains to model social mobility). The markovchain package (Spedicato, Giorgio Alfredo, 2016) provides an efficient tool to create, manage and analyse Markov Chains (MCs). Some of the main features include the possibility to: validate the input transition matrix, plot the transition matrix as a graph diagram, perform structural analysis of DTMCs (e.g. classification of transition matrices and states, analysis of the stationary distribution, etc . . . ), perform statistical inference (such as fitting transition matrices from various input data, simulating stochastic processes trajectories from a given DTMC, etc..). The author believes that no R package provides a unified infrastructure to easily manage DTMCs as markovchain does at the time this paper is being drafted. The package targets data scientists using DTMC, Academia members, supporting faculty instructors, as well as students of undergraduate courses on Stochastic Processes. The paper will be organized as follows: Section 2.2 gives a brief overview on R packages and alternative software that provide similar functionalities, Section 2.3 reviews DTMC basic theory, Section 2.4 discusses the package design and structure, Section 2.5 shows how to create and manipulate homogeneous DTMCs, Section 2.6 and Section 2.7 respectively present the functions created to perform structural analysis, and statistical inference on DTMCs. A brief overview of the functionalities written to deal with non homogeneous discrete dime Markov chains (NHDTMCs) is provided in Section 2.8. A discussion on numerical reliability and computational performance is provided in Section 2.9. Finally, Section 2.10 draws final conclusions and briefly discusses future potential developments of the package. Analysis of existing DTMC-related software As reviewed later in more details, a DTMC is defined by a stochastic matrix known as transition matrix (TM), which is a square matrix satisfying Equation 1. { Pij ∈ [0, 1] ∀i, j ∑i Pij = 1 (1) Although defining a stochastic matrix is trivial in any mathematical or statistical software, a DTMC dedicated infrastructure can provide object oriented programmed methods to verify the validity of the input data (i.e. if the input matrix is a stochastic one) , as well as to perform structural analysis on DTMC objects. Various packages mention MCs related models in the CRAN repository, whereby a few of them will be now reviewed. The clickstream package (Scholz, 2016), on CRAN since 2014, aims to model websites click stream using higher order Markov Chains. It provides a MarkovChain S4 class that is similar to the markovchain class. Further, DTMCPack (Nicholson, William, 2013) and MTCM (Bessi, Alessandro, 2015) also work with DTMCs but provide even more limited functions: the first one focuses on creating simulations from a given DTMC, whilst the second contains only one function for estimating the underlying transition matrix for a given categorical sequence. Moreover, none of them appears to have been updated since 2015. The coverage of functionalities provided by markovchain package for analysing DTMCs appears to be more complete than the above mentioned packages, since none of them provides methods for importing or coercing transition matrices from other objects, such as R matrices or data.frames. Furthermore, markovchain is the only package providing a quick graph plotting facility for DTMC objects. The same applies when considering the functionalities used The R Journal Vol. 9/2, December 2017 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLE 85 to perform structural analysis of transition matrices and to fit DTMCs from various kind of input data. More interestingly, the FuzzyStatProb package (Pablo J. Villacorta and José L. Verdegay, 2016) gives an alternative approach for estimating the parameters of DTMCs using "fuzzy logic". This review voluntarily omits discussing packages that are not specifically focused on DTMC. Nonetheless, the depmixS4 (Visser and Speekenbrink, 2010) and the HMM (Himmelmann, 2010) packages deal with Hidden Markov Models (HMMs). In addition, the number of R packages focused on the estimation of statistical models using the Markov Chain Monte Carlo simulation approach is sensibly bigger. Finally, the msm (Jackson, 2011), heemod (Antoine Filipovi et al., 2017) and the TPmsm packages (Artur Araújo et al., 2014) focus on health applications of multi state analysis using different kinds of models, including Markov-related ones among them. Finally, among other well known software used in Mathematics and Statistics, only Mathematica (Wolfram Research, Inc., 2013) provides routines specifically written to deal with Markov processes at the author’s knowledge. Nevertheless, the analysis of DTMCs could be easily handled within the Matlab programming language (MATLAB, 2017) due to its well known linear algebra capabilities. Review of underlying theory In this section a brief review of the theory of DTMCs is presented. Readers willing to dive deeper can inspect Cassandras (1993) and Grinstead and Snell (2012). A DTMC is a stochastic process whose domain is a discrete set of states, {s1, s2, . . . , sk}. The chain starts in a generic state at time zero and moves from a state to another by steps. Let pij be the probability that a chain currently in state si moves to state sj at the next step. The key characteristic of DTMC processes is that pij does not depend upon the previous state in the chain. The probability pij for a (finite) DTMC is defined by a transition matrix previously introduced (see Equation 1). It is also possible to define the TM by column, under the constraint that the sum of the elements in each column is 1. To illustrate, a few toy examples on transition matrices are now presented; the "Land of Oz" weather Matrix, Kemeny et al. (1974). Equation 2 shows the transition probability between (R)ainy, (N)ice and (S)now weathers.  R N S R 0.5 0.25 0.25 N 0.5 0 0.5 S 0.25 0.25 0.5  (2) Further, the Mathematica Matrix 3, taken from the Mathematica 9 Computer Algebra System manual (Wolfram Research, Inc., 2013), that will be used when discussing the analysis the structural proprieties of DTMCs, is as follows:  A B C D A 0.5 0.5 0 0 B 0.5 0.5 0 0 C 0.25 0.25 0.25 0.25 D 0 0 0 1  (3) Simple operations on TMs allow to understand structural proprieties of DTMCs. For example, the n− th power of P is a matrix whose entries represent the probabilities that a DTMC in state si at time t will be in state sj at time t + n. In particular, if ut is the probability vector for time t (that is, a vector whose j− th entries represent the probability that the chain will be in the j− th state at time t), then the distribution of the chain at time t + n is given by un = u ∗ Pn. Main properties of Markov chains are now presented. A state si is reachable from state sj if ∃n→ pn ij > 0. If the inverse is also true then si and sj are said to communicate . For each MC, there always exists a unique decomposition of the state space into a sequence of disjoint subsets in which all the states within each subset communicate. Each subset is known as a communicating class of the MC. It is possible to link this decomposition to graph theory, since the communicating classes represent the strongly connected components of the graph underlying the transition matrix (Jarvis and Shier, 1999). A state sj of a DTMC is said to be absorbing if it is impossible to leave it, meaning pjj = 1. An absorbing Markov chain is a chain that contains at least one absorbing state which can be reached, not necessarily in a single step. Non absorbing states of an absorbing MC are defined as transient states . In addition, states that can be visited more than once by the MC are known as recurrent states . If a DTMC contains r ≥ 1 absorbing states, it is possible to re-arrange their order by separating The R Journal Vol. 9/2, December 2017 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLE 86 transient and absorbing states such that the t transient states come before the r absorbing ones. Such re-arranged matrix is said to be in canonical form (see Equation 4), where its composition can be represented by sub matrices. ( Qt,t Rt,r 0r,t Ir,r ) (4) Such matrices are: Q (a t-square sub matrix containing the transition probabilities across transient states), R (a nonzero t-by-r matrix containing transition probabilities from non-absorbing to absorbing states), 0 ( an r-by-t zero matrix), and Ir (an r-by-r identity matrix). It is possible to use these matrices to calculate various structural proprieties of the DTMC. Since limn→∞ Qn = 0, it can be shown that in every absorbing matrix the probability to be eventually absorbed is 1, regardless of the state where the MC is initiated. Further, in Equation 5 the fundamental matrix is presented, where the generic nij entry expresses the expected number of times the process will transit in state sj, given that it started in state si. Also, the i-th entry of vector t = N ∗ 1̄, being 1̄ a t-sized vector of ones, expresses the expected number of steps before an absorbing DTMC, started in state


Introduction
DTMCs are a notable class of stochastic processes.Although their basic theory is not overly complex, they are extremely effective to model categorical data sequences (Ching et al., 2008).To illustrate, notable applications can be found in linguistic (see Markov's original paper Markov (1907)), information theory (Google original algorithm is based on Markov Chains theory, Lawrence Page et al. (1999)), medicine (transition across HIV severity states, Craig and Sendi (2002)), economics and sociology (Jones (1997) shows an application of Markov Chains to model social mobility).
The markovchain package (Spedicato, Giorgio Alfredo, 2016) provides an efficient tool to create, manage and analyse Markov Chains (MCs).Some of the main features include the possibility to: validate the input transition matrix, plot the transition matrix as a graph diagram, perform structural analysis of DTMCs (e.g.classification of transition matrices and states, analysis of the stationary distribution, etc . . .), perform statistical inference (such as fitting transition matrices from various input data, simulating stochastic processes trajectories from a given DTMC, etc..).The author believes that no R package provides a unified infrastructure to easily manage DTMCs as markovchain does at the time this paper is being drafted.
The package targets data scientists using DTMC, Academia members, supporting faculty instructors, as well as students of undergraduate courses on Stochastic Processes.The paper will be organized as follows: Section 2.2 gives a brief overview on R packages and alternative software that provide similar functionalities, Section 2.3 reviews DTMC basic theory, Section 2.4 discusses the package design and structure, Section 2.5 shows how to create and manipulate homogeneous DTMCs, Section 2.6 and Section 2.7 respectively present the functions created to perform structural analysis, and statistical inference on DTMCs.A brief overview of the functionalities written to deal with non -homogeneous discrete dime Markov chains (NHDTMCs) is provided in Section 2.8.A discussion on numerical reliability and computational performance is provided in Section 2.9.Finally, Section 2.10 draws final conclusions and briefly discusses future potential developments of the package.

Analysis of existing DTMC-related software
As reviewed later in more details, a DTMC is defined by a stochastic matrix known as transition matrix (TM), which is a square matrix satisfying Equation 1.
Although defining a stochastic matrix is trivial in any mathematical or statistical software, a DTMC dedicated infrastructure can provide object oriented programmed methods to verify the validity of the input data (i.e. if the input matrix is a stochastic one) , as well as to perform structural analysis on DTMC objects.
Various packages mention MCs -related models in the CRAN repository, whereby a few of them will be now reviewed.The clickstream package (Scholz, 2016), on CRAN since 2014, aims to model websites click stream using higher order Markov Chains.It provides a MarkovChain S4 class that is similar to the markovchain class.Further, DTMCPack (Nicholson, William, 2013) and MTCM (Bessi, Alessandro, 2015) also work with DTMCs but provide even more limited functions: the first one focuses on creating simulations from a given DTMC, whilst the second contains only one function for estimating the underlying transition matrix for a given categorical sequence.Moreover, none of them appears to have been updated since 2015.The coverage of functionalities provided by markovchain package for analysing DTMCs appears to be more complete than the above mentioned packages, since none of them provides methods for importing or coercing transition matrices from other objects, such as R matrices or data.frames.Furthermore, markovchain is the only package providing a quick graph plotting facility for DTMC objects.The same applies when considering the functionalities used The R Journal Vol.9/2, December 2017 ISSN 2073-4859 to perform structural analysis of transition matrices and to fit DTMCs from various kind of input data.More interestingly, the FuzzyStatProb package (Pablo J. Villacorta and José L. Verdegay, 2016) gives an alternative approach for estimating the parameters of DTMCs using "fuzzy logic".
This review voluntarily omits discussing packages that are not specifically focused on DTMC.Nonetheless, the depmixS4 (Visser and Speekenbrink, 2010) and the HMM (Himmelmann, 2010) packages deal with Hidden Markov Models (HMMs).In addition, the number of R packages focused on the estimation of statistical models using the Markov Chain Monte Carlo simulation approach is sensibly bigger.Finally, the msm (Jackson, 2011), heemod (Antoine Filipovi et al., 2017) and the TPmsm packages (Artur Araújo et al., 2014) focus on health applications of multi -state analysis using different kinds of models, including Markov-related ones among them.
Finally, among other well known software used in Mathematics and Statistics, only Mathematica (Wolfram Research, Inc., 2013) provides routines specifically written to deal with Markov processes at the author's knowledge.Nevertheless, the analysis of DTMCs could be easily handled within the Matlab programming language (MATLAB, 2017) due to its well known linear algebra capabilities.

Review of underlying theory
In this section a brief review of the theory of DTMCs is presented.Readers willing to dive deeper can inspect Cassandras (1993) and Grinstead and Snell (2012).
A DTMC is a stochastic process whose domain is a discrete set of states, {s 1 , s 2 , . . . ,s k }.The chain starts in a generic state at time zero and moves from a state to another by steps.Let p ij be the probability that a chain currently in state s i moves to state s j at the next step.The key characteristic of DTMC processes is that p ij does not depend upon the previous state in the chain.The probability p ij for a (finite) DTMC is defined by a transition matrix previously introduced (see Equation 1).It is also possible to define the TM by column, under the constraint that the sum of the elements in each column is 1.
To illustrate, a few toy -examples on transition matrices are now presented; the "Land of Oz" weather Matrix, Kemeny et al. (1974).Equation 2shows the transition probability between (R)ainy, (N)ice and (S)now weathers.(2) Further, the Mathematica Matrix 3, taken from the Mathematica 9 Computer Algebra System manual (Wolfram Research, Inc., 2013), that will be used when discussing the analysis the structural proprieties of DTMCs, is as follows: Simple operations on TMs allow to understand structural proprieties of DTMCs.For example, the n − th power of P is a matrix whose entries represent the probabilities that a DTMC in state s i at time t will be in state s j at time t + n.In particular, if u t is the probability vector for time t (that is, a vector whose j − th entries represent the probability that the chain will be in the j − th state at time t), then the distribution of the chain at time t + n is given by u n = u * P n .Main properties of Markov chains are now presented.
A state s i is reachable from state s j if ∃n → p n ij > 0. If the inverse is also true then s i and s j are said to communicate.For each MC, there always exists a unique decomposition of the state space into a sequence of disjoint subsets in which all the states within each subset communicate.Each subset is known as a communicating class of the MC.It is possible to link this decomposition to graph theory, since the communicating classes represent the strongly connected components of the graph underlying the transition matrix (Jarvis and Shier, 1999).
A state s j of a DTMC is said to be absorbing if it is impossible to leave it, meaning p jj = 1.An absorbing Markov chain is a chain that contains at least one absorbing state which can be reached, not necessarily in a single step.Non -absorbing states of an absorbing MC are defined as transient states.
In addition, states that can be visited more than once by the MC are known as recurrent states.
If a DTMC contains r ≥ 1 absorbing states, it is possible to re-arrange their order by separating The R Journal Vol.9/2, December 2017 ISSN 2073-4859 transient and absorbing states such that the t transient states come before the r absorbing ones.Such re-arranged matrix is said to be in canonical form (see Equation 4), where its composition can be represented by sub -matrices.
Such matrices are: Q (a t-square sub -matrix containing the transition probabilities across transient states), R (a nonzero t-by-r matrix containing transition probabilities from non-absorbing to absorbing states), 0 ( an r-by-t zero matrix), and I r (an r-by-r identity matrix).It is possible to use these matrices to calculate various structural proprieties of the DTMC.Since lim n→∞ Q n = 0, it can be shown that in every absorbing matrix the probability to be eventually absorbed is 1, regardless of the state where the MC is initiated.
Further, in Equation 5 the fundamental matrix is presented, where the generic n ij entry expresses the expected number of times the process will transit in state s j , given that it started in state s i .Also, the i-th entry of vector t = N * 1, being 1 a t-sized vector of ones, expresses the expected number of steps before an absorbing DTMC, started in state s i , is absorbed.The b ij entries of matrix B = N * R are the probabilities that a DTMC started in state s i will eventually be absorbed in state s j .Finally, the probability of visiting the transient state j when starting from the transient state i is the h ij entry of the matrix H = (N − I t ) * N −1 dg , being dg the diagonal operator.
A DTMC is said to be ergodic if there exist a number N such that it is possible to reach every state in at most N steps.If P n > 0 for some n, then P is a regular DTMC.
Fixed row vectors w, also known as steady state vectors, are vectors such that wP = w.Mathematically, they correspond to eigenvectors associated to unitary eigenvalues of the TM.It can be shown that lim n→∞ v * P n = w and that lim n→∞ P n = W, where v is a generic stochastic vector and w is a matrix where all rows are w.
The mean first passage time m ij is the expected the number of steps needed to reach state s j starting from state s i , where m ii = 0 by convention.For ergodic MCs, r i is the mean recurrence time, that is the expected number of steps to return to s i from s i .It is possible to prove that r i = 1 w i , where w i is the i-th entry of w.Further, let D be a diagonal matrix, in which the diagonal elements come from r i , and let C be a matrix filled with ones.It is then possible to get the mean first passage matrix M from Equation 6.
Let Z = (I − P + M) −1 be the fundamental matrix for an ergodic MC.It is possible to write m ij as a function of Z and w, as Equation 7shows.
A further topic in structural analysis of irreducible DTMCs is periodicity.The period of a state s i , denoted as d (i), is the greatest common divisor of n for which p n ii > 0. If the period is 1, the state is aperiodic, while if the period is greater than 2, the state is periodic; all states in the same class share the same period.
Given a generic DTMC, it is possible to simulate stochastic trajectories following the underlying MC from the TM.Given an initial state s(t) = j, the s(t + 1) state is sampled from the multinomial distribution whose probabilities are expressed by the j-th row.The sampled state indicates from which row the probabilities to sample s(t + 2) are taken from.Also, given a sample sequence, it is possible to estimate the TM of the underlying DTMC.Equation 8shows the maximum likelihood estimator (MLE) of the TM p ij entry, being the n ij elements the number of sequences X t = s i , X t+1 = s j counted in the sample, that is: Equation 10 shows asymptotic confidence intervals for p ij .The bootstrap approach allows to define non -parametric ones.
The R Journal Vol.9/2, December 2017 ISSN 2073-4859 The mode of the X t+1 conditional distribution given X t = s j represents the prediction from a given DTMC and the current chain state X t = s j .
In conclusion, the markovchain package allows to perform statistical analysis on NHDTMCs, in the special case where they can be treated as sequential lists of DTMCs.

Implementation design and details
The markovchain package has been originally written in "native" R. Most functions have been therefore ported in Rcpp (Eddelbuettel, Dirk, 2013) since 2015, yielding sensible improvements in computational time.Other dependencies of markovchain are: igraph Csardi, Gabor and Nepusz, Tamas (2006), matlab Roebuck (2014), Matrix Bates and Maechler (2016) and expm Goulet et al. (2015) ( for operation on matrices ), and the method package for defining S4 classes.
Homogeneous DTMCs are defined by a dedicated S4 class, "markovchain".Such class is defined by the following slots: 1. states: a character vector, listing the states for which transition probabilities are defined.
2. byrow: a logical variable, indicating whether transition probabilities are shown by row or by column.
3. transitionMatrix: a matrix variable defining the TM.
4. name: an optional character variable to name the DTMC.
A "markovchain" S4 class has been designed based on Chambers, J.M. (2008) suggestions.For example, a S4 setValidity method checks the coherence of any newly created markovchain object, by verifying that either the rows or columns of the transition matrix sum to one, and that all elements are bounded between 0 and 1.Another S4 class,"markovchainList", has been created for handling non -homogeneous DTMCs.Finally, the package provides functions and S4 to analyse continuous MCs, as well as higher order MCs, although their discussion is beyond the scope of this paper.
Three vignettes documents the markovchain package.The first one broadly describes the functionalities of the package and it also presents real -world applications of DTMCs using the package.The second one, written using knit and rmarkdown, is a beamer presentation that quickly introduces the key functionalities of the package.The third one presents experimental functions for higher order and multivariate MCs.Finally, the www.github.com/spedygiorgio/markovchainGitHub page hosts the package's wiki as well as its development version.

Creating and manipulating markovchain objects
The package is loaded within R as follows:

library("markovchain")
Creating a markovchain object is easy, and can be done with provided code.
A power operator also exists, ^, and it is based on the expm package (Goulet et al., 2015), providing efficient matrix exponentiation.
#after two days (by square power) mcWeather^2 Weather^2 A 3 -dimensional discrete MC defined by the following states: rainy, nice, sunny The transition matrix (by rows) is defined as follows: rainy nice sunny rainy 0.4375 0.1875 0.3750 nice 0.3750 0.2500 0.3750 sunny 0.3750 0.1875 0.437 Finally, logical operators have been defined as well.

#logical equality and inequality mcWeather==mcWeather [1] TRUE mcWeather!=mathematicaMc [1] TRUE
Both the algebraic and logical operators have been defined by overriding standard R operators, providing a more concise and "natural" code, which can bring the advantage of being more appealing to a novice user, by executing certain operations on TM in an efficient way.Such approach has been stressed in both the class help file and the package vignette code to make the final user fully aware of any potential drawbacks of such choice.
Various convenience S4 methods have been defined to easily manipulate and manage markovchain objects.In the following examples, some of the implemented methods in the "markovchain" class are presented, allowing to: get and set names, return the MC dimension, transpose the transition matrix, and directly access the transition probabilities.

Structural properties of finite Markov chains
The markovchain package embeds functions to analyse the structural proprieties of DTMC.For example, it is possible to find the stationary distribution, as well as classify the states.Feres, Renaldo (2007) and Montgomery, James (2009) provide a full description of the algorithms underlying these functions, whilst a more theoretical perspective can be found in Brémaud, Pierre (1999).The Mathematica MC will be used to illustrate such features.
The summary method provides an overview of the structural properties of the DTMC process underlying the markovchain object.
#plotting and summarizing plot(mathematicaMc) summary(mathematicaMc) Mathematica Markov Chain Markov chain that is composed by: Closed classes: s1 s2 s4 Recurrent classes: {s1,s2},{s4} Transient classes: {s3} The Markov chain is not irreducible The absorbing states are: s4 In the above example, closed and transient classes are identified, irreducibility checks are executed, and a list of absorbing states is returned.Further, it is known that a finite MC has at least one steady-state distribution, and the steadyStates method can be used to obtain it.To illustrate, for the mcWeather matrix there exist a one -dimensional solution, since the underlying TM is irreducible.A higher dimensional solution is given when the irreducibility property does not hold, as of the second example.
We now illustrate the Canonical Form and the Fundamental Matrix concepts using another example taken from classical theory: The Flipping Coin problem.Specifically, consider repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears; it is possible to model such process using a DTMC with four states: "E" empty initial sequence, "H" head, "HT" head followed by tail, "HTH" head followed by tail and head.
Periodicity analysis is shown in the following last example, in which the output shows that the DTMC has a period of 2.
Next, the function createSequenceMatrix is used to obtain the sequence matrix, that is the empirical transition matrix between the preceding and subsequent state, for a given sequence, whilst the function markovchainFit fits DTMCs.We will exemplify the use of such functions on the rain data set (recorded daily rainfall volume in Alofi island) bundled within the package.

Non homogeneous Markov chains
Non homogeneous DTMCs (NHDTMCs) can be handled using the "markovchainList" S4 class, which consists in a list of markovchain objects.

Markovchain 3 Unnamed Markov chain
A 2 -dimensional discrete Markov Chain defined by the following states: s1, s2 The transition matrix (by rows) is defined as follows: s1 s2 s1 0.1 0.9 s2 0.2 0.8 The example above shows that creating a markovchainList S4 object is very simple.Moreover, the rmarkovchain function also works on objects from the "markovchainList" class.
#simulating a NHDTMC mysim<-rmarkovchain(n=100, object=mcList,include.t0=TRUE,what="matrix")head(mysim,n = 10) Finally, it is possible to infer a non -homogeneous sequence of DTMC, that is a markovchainList object from a given matrix, where each row represents a single trajectory and each column stands for a different period.

Numerical reliability
Finding the stationary distribution is a computational -intensive task that could raise numerical issues.The markovchain package relies on the R linear algebra facilities (built on LAPACK routines) when the eigen function is called to find the stationary distribution.An initial analysis of the numerical stability of the markovchain matrix computation has been performed estimating the error rate when calculating the stationary distribution on a large sample of simulated DTMC of a given size k (range set between 2 and 32).Initially, dense matrices were simulated.The following algorithm was used for a given k: 1. generate N random k-sized DTMCs, where each row r has been independently sampled from a Dirichlet distribution, r ∼ Dir( ᾱ).The Dirichlet parameters' vector, ᾱ is itself assumed to follow an Uniform distribution (sampled independently for each row).
2. try to compute the steady -state distribution for the simulated DTMC.
The R Journal Vol.9/2, December 2017 ISSN 2073-4859  The figure shown above displays the success rate observed by TM size.The success rate is higher than 95% for matrices no greater than 10 unit, then it decreases markedly and becomes lower than 50% for matrices bigger then 22.A deeper analysis allowed to identify that the failure reason was due to inaccuracy in the Dirichlet sampling function (row sums numerically different from zero).The TM simulation process was therefore revised normalizing the sum of each row to be numerically equal to one.The experiment was repeated at 2 3 , 2 4 , . . ., 2 8 TM sizes (wider matrices were not tested due to computational timing issues).The observed success rate was always 100% for the sampled TM sizes.
The first example deserves few more words, even if it does not demonstrate any shortcomings in the computational part of the package.Instead, it shows how easy it is to analyze numericallyincorrect TMs as the size of the problems dealt with increases.Various posts have been raised on this topic on the package Github address since the package was published on CRAN.A final test has been performed using TMs with a sparsity factor of 75%.The observed success rate is 100% for matrices wider than 2 5 , inexplicably lower (around 90%) for smaller matrices matrices.
The previous examples are clearly far to exhaustively assess the numerical reliability of the implemented algorithms that would require an much deeper analysis and beyond the scope of The R Journal Vol.9/2, December 2017 ISSN 2073-4859 the paper.In fact, the numerical reliability is likely to be significantly affected by particular TM structures.Nevertheless they can provide an initial insight about the dimension of the problems that the markovchain R package can "safely" handle.The R code used to generate the numerical reliability assessment herewith discussed is available in the "reliability.R" file within the demo folder of markovchain R package.

Computational performance
The computation time needed to estimate the TM from input data sequence depends by the size of input data, as the following example displays: # } plot(sizes, microseconds,type="o",col="steelblue", xlab="character sequence size",ylab="microseconds", main="Computational time vs size") q q q q q q 0 200 400 600 800 1000 The plot shows that the computation time increases linearly with the size of input data sequence, as expected.
The R Journal Vol.9/2, December 2017 ISSN 2073-4859 The last numeric example presented in the section discussing NHDTMCs shows the computational advantages of rewriting the kernel of core functions using Rcpp and RcppParallel snippets generated by (Allaire et al., 2016).The rmarkovchain function allows the final user to choose whether to use the C++ implementation and a parallel backend, by setting the boolean parameters useRcpp and parallel respectively.microbenchmark( rmarkovchain(n=100,object=nhmcFit$estimate,what = "matrix",useRCpp = F), rmarkovchain(n=100,object=nhmcFit$estimate,what = "matrix",useRCpp = T,parallel = F), rmarkovchain(n=100,object=nhmcFit$estimate,what = "matrix",useRCpp = T,parallel = T) ) The omitted output of the code snippet shown above demonstrates that the joint use of Rcpp and RcppParallel fastens the simulations around 10x with respect to the pure R sequential implementation.

Conclusions, discussion and acknowledgements
The markovchain package has been designed in order to make common operations on DTMCs as easy as possible for statisticians.The package allows to create, manipulate, import and export DTMCs.Further, the author believes that the current version of the package fully satisfies standard needs such as inference of underlying TM from empirical data, and states classification of a given DTMC.
The author believes that no other R package provides a set of classes, methods, and functions as wide as the one provided in markovchain, as of May 2017.The package's main vignette gives a complete descriptions of its capabilities, including bayesian estimation, statistical tests, classes and methods for continuous time MCs.Also, a separate vignette describes the functions designed to deal with higher order and multivariate MCs, and should still be considered experimental.In fact, such techniques are generally less used than standard DTMCs, and consequently much less literature, applied examples, and coded algorithms are available.
Clearly, an expanded version of the package's capabilities in that area is expected to be released in the future.Current development efforts target optimizing computation speed and reliability, and increasing the analysis capabilities regarding statistical inference.Rewriting core functions using Rcpp gave a major boost in terms of computing speed, as exemplified in previous sections.Moreover, the rewriting of the internal core parts of the code affected many functions, such as markovchainFit and markovchainFitList.Feedbacks provided by the users of the package at https://github.com/spedygiorgio/markovchain/issueshave been extremely useful for improving the package.To illustrate, bugs due to numerical issues have been found when analyzing relatively big MCs and have led to revising the steadyStates function to be computationally more robust.A known limitation of the package is the lack of a deep assessment of the performance of the package's routines for a relatively large TM.In fact, improving the numerical reliability of the package for large DTMCs is an area on which efforts will be certainly allocated in the near future.At this regard, the implementation of numerical methods methods shown in Stewart (1994) will be explored.
Finally, the package has been available on CRAN since Summer 2013.Notably, it has been granted a funding slot in both 2015, 2016 and 2017 Google Summer of Code (GSOC) editions.In particular, during 2015 GSOC a material part of R code has been ported in Rcpp coding, yielding considerable fastening in computational time.The author is extremely grateful to Tae Seung Kang, Sai Bhargav Yalamanchi and Deepak Yadav for their contribution in improving the package.A special thank should be given to the RJournal referees for their constructive comments.The R Journal Vol.9/2, December 2017 ISSN 2073-4859

Figure 2 :
Figure 2: Plot of the Mathematica MC DTMC process.

Figure 3 :
Figure 3: The communicating classes are the strongly connected components of the graph underlying the DTMC.

Figure 6 :
Figure 6: Computation time by size of input data sequence.
3. calculate the success rate as the relative frequency of previous step non -failures at size k .