ProjectManagement: an R Package for Managing Projects

Project management is an important body of knowledge and practices that comprises the planning, organisation and control of resources to achieve one or more pre-determined objectives. In this paper, we introduce ProjectManagement, a new R package that provides the necessary tools to manage projects in a broad sense, and illustrate its use by examples.


Introduction
Project management is an important body of knowledge and practices that comprises the planning, organisation and control of resources to achieve one or more pre-determined objectives. The most commonly used methods for project planning are PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). PERT/CPM analyses the tasks involved in completing a project, especially the time needed to complete each task, and computes the minimum time needed to complete the whole project. From the data obtained in the analysis of the project, PERT/CPM identifies the critical activities, which are those for which any disturbance in their duration modifies the minimum execution time of the project. It also obtains the times that can be assigned to non-critical activities, called slacks, in addition to their fixed durations, to give them flexibility. Project management often deals with the problem of redistributing resources. Sometimes it is convenient to reduce the time of an activity by increasing the assigned costs. At other times, when the availability of resources is limited in a period of time, it may be necessary to level the use of those resources. These situations require a re-planning of the project.
Even with good project management, once the project has been carried out and the actual durations of the activities are known, there can be a delay in the completion time of the project. When the delay generates an additional cost, ways are needed to distribute the cost of the delay among the different tasks involved. To solve this problem we can use cooperative game theory and rules based on bankruptcy problems.
The essential elements related to project management can be found in Castro et al. (2007) or in Hillier and Lieberman (2001). Project management techniques have been widely used in all fields of engineering. Hall (2012) reviews the impact that such techniques have in various fields and their broad business opportunities. Their fields of application vary from classical construction and engineering to information technology and software development, including modern agile methods. Schmitz et al. (2019) also argue the usefulness of traditional project management techniques in the context of agile methodology. Evdokimov et al. (2018) include a case study that shows the current relevance of project management techniques in software development. Özdamar and Ulusoy (1995) present a survey of the problem of resource constraints. To distribute the delay cost of the project among the activities, Brânzei et al. (2002) provide two rules using, respectively, a game theoretical and a bankruptcy-based approach, and Bergantiños et al. (2018) introduce and analyse a consistent rule based on the Shapley value.
A well-known project management tool is Microsoft Project. It is designed to create and control a project through the allocation of resources to tasks, the management of budgets and workloads, and the monitoring of developments. Microsoft Project is not open source and its license is fee-based. Other project management applications have been created as free software, such as OpenProj, PpcProject or ProMes (Gregoriou et al., 2013). Salas-Morera et al. (2013) provide a useful comparison of these applications.
The aforementioned tools are written in Java or Python. To the best of our knowledge, there are only two packages in R available for project management. PlotPrjNetworks (Muñoz, 2015) and plan (Kelley, 2018) are packages that let the user create a Gantt diagram to visualize the project structure. In our opinion, a tool was missing to manage a project from its development to its control. We believe that such a tool would be useful for the user community because it could be integrated with other tools developed in R, it could be easily modified to suit the specific needs of each user, and it could be wrapped into a graphical interface.
In this paper, we introduce ProjectManagement (Gonçalves-Dosantos et al., 2020b), a new R package that provides the necessary tools to manage projects in a broad sense. It calculates the critical activities, the slack of each activity, the minimum duration of the project and the early and last times of each activity. It plots a graph of the project and the schedule. The package also allows cost management to reduce the minimum project time, as well as resource management. Once the actual durations of the activities are known, it is possible to distribute the delay generated in the project among the different activities. When activity durations are considered random variables, the package provides additional functionality. In particular, it calculates the average duration of the project and the criticality index of each activity, and it plots a representation of the distributions of the project duration and of the early and last times of the activities. It also calculates several allocation proposals of the delay cost when the project has been completed and the actual durations of the activities are known.
The paper is organized as follows. First, we recall the basic definitions of project management and present different ways to distribute the delay cost when durations are assumed to be known and when they are random variables. Then, we provide a description of ProjectManagement. Finally, we illustrate the use of the package by way of examples.

Project management
In this section we discuss the basic concepts of deterministic and stochastic projects with a special focus on allocating the delay cost among the project activities. The aim of this section is to provide a brief (and quick) survey of the methodologies implemented in the R package ProjectManagement that we introduce later, as well as to indicate the main bibliographical sources in which interested readers can deepen their knowledge of each of these methodologies.
Let $X$ be a finite non-empty set and $N$ be a set of ordered pairs $(x_1, x_2)$, with $x_1, x_2 \in X$ and $|N| = n$. A directed graph is a pair $G = (X, N)$, where $X$ is the set of nodes and $N$ is the set of arcs. We say that an arc $i = (x_{i,1}, x_{i,2}) \in N$ starts at node $x_{i,1} \in X$ and ends at $x_{i,2} \in X$. A node $x_s \in X$ is a source node if there is no arc $i \in N$ such that $x_{i,2} = x_s$. A node $x_e \in X$ is a sink node if there is no arc $i \in N$ such that $x_{i,1} = x_e$. A cycle is a set of arcs $i_0, i_1, \ldots, i_m \in N$ such that $x_{i_j,2} = x_{i_{j+1},1}$, with $j \in \{0, \ldots, m-1\}$, and $x_{i_m,2} = x_{i_0,1}$.
To illustrate the concept of directed graph consider the following example. Take the graph $G = (X, N)$ given by $X = \{a, b, c, d\}$ and $N = \{1 = (a, b),\ 2 = (a, c),\ 3 = (b, d),\ 4 = (c, d)\}$. The diagram representing this graph is depicted in Figure 1. This graph has one source ($a$) and one sink ($d$). Arc 3, for instance, starts at node $b$ and ends at node $d$. This graph has no cycles. However, if we add an arc $5 = (d, a)$, the resulting graph has two cycles: $1, 3, 5$ and $2, 4, 5$.
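As a minimal sketch of these definitions, the example graph can be stored in base R as an edge list. The endpoints of arcs 1 to 4 are those implied by the description of the example (source a, sink d, arc 3 from b to d, and the two cycles created by arc 5); the helper `has_cycle` is a hypothetical name introduced here for illustration and is not part of any package.

```r
# Arcs of the example graph as an edge list (row k is arc k).
arcs <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "c", "d", "d"),
  stringsAsFactors = FALSE
)

nodes   <- union(arcs$from, arcs$to)
sources <- setdiff(nodes, arcs$to)    # nodes at which no arc ends
sinks   <- setdiff(nodes, arcs$from)  # nodes at which no arc starts

# A directed graph has no cycles if and only if repeatedly deleting the
# nodes without incoming arcs eventually deletes every node.
has_cycle <- function(arcs) {
  nodes <- union(arcs$from, arcs$to)
  repeat {
    free <- setdiff(nodes, arcs$to)
    if (length(free) == 0) break
    nodes <- setdiff(nodes, free)
    arcs  <- arcs[!(arcs$from %in% free), , drop = FALSE]
  }
  length(nodes) > 0
}

# Adding arc 5 = (d, a) closes the graph into cycles.
arc5 <- data.frame(from = "d", to = "a", stringsAsFactors = FALSE)
has_cycle(arcs)
has_cycle(rbind(arcs, arc5))
```

The first call reports that the original graph is acyclic, whereas the second detects the cycles introduced by arc 5.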
Figure 1: Diagram of the directed graph $G = (X, N)$. The circles represent the nodes and the arrows represent the arcs. This is the standard way of depicting a graph.

A deterministic project $P$ is a tuple $P = (G, x^0)$, where $G = (X, N)$ is a directed graph without cycles, with one source node and one sink node, and $x^0 \in \mathbb{R}^n_+$ is the vector of non-negative planned durations. In this context, $N$ represents the set of activities in the project. We denote by $\mathcal{P}^N$ the family of all deterministic projects with set of activities $N$, and by $\mathcal{P}$ the family of all deterministic projects.
In a deterministic project $P = (G, x^0) \in \mathcal{P}^N$, we can calculate the minimum duration of $P$, denoted by $D(G, x^0)$, i.e. the minimum time the project needs to complete all activities taking into account the structure of the graph. This time can be obtained as the solution of a linear programming problem and, thus, can be easily computed. Alternatively, $D(G, x^0)$ can be calculated using a project planning methodology like PERT (see, for instance, Hillier and Lieberman (2001) for details on project planning).
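Given a node $x \in X$, we define the set of immediate predecessors of $x$ as the set of activities ending in $x$, $Pred(x) = \{i \in N : x_{i,2} = x\}$, and the immediate successors of $x$ as $Suc(x) = \{i \in N : x_{i,1} = x\}$. We define the earliest time $D^E_i(G, x^0)$ of an activity $i \in N$ as the minimum time required to complete all immediate predecessor activities of $x_{i,1}$, i.e. the earliest time at which activity $i$ can start taking into account the graph:
$$D^E_i(G, x^0) = \begin{cases} \max_{j \in Pred(x_{i,1})} \{ D^E_j(G, x^0) + x^0_j \} & \text{if } Pred(x_{i,1}) \neq \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$
The latest completion time $D^L_i(G, x^0)$ of an activity $i \in N$ is the latest point in time when the activity can end without delaying the project:
$$D^L_i(G, x^0) = \begin{cases} \min_{j \in Suc(x_{i,2})} \{ D^L_j(G, x^0) - x^0_j \} & \text{if } Suc(x_{i,2}) \neq \emptyset, \\ D(G, x^0) & \text{otherwise.} \end{cases}$$
It is easy to see that $D^E_i(G, x^0) \le D^L_i(G, x^0)$ for all $i \in N$. Also, we can calculate the minimum duration of a project, using the earliest start times, as $D(G, x^0) = \max_{i \in N} \{ D^E_i(G, x^0) + x^0_i \}$.
We define the slack $S_i(G, x^0)$ of an activity $i \in N$ as the maximum time, in addition to $x^0_i$, that $i$ can use to complete its task without delaying the project:
$$S_i(G, x^0) = D^L_i(G, x^0) - D^E_i(G, x^0) - x^0_i.$$
If the slack of an activity is equal to 0, then this activity is critical, i.e. any perturbation in its time modifies the duration of the project. We can also define two other types of slack. The free slack of an activity is the maximum amount of time that this activity can be delayed without causing a delay in the project or in the earliest time of the other activities. The free slack of an activity can be calculated as
$$\min_{j \in Suc(x_{i,2})} D^E_j(G, x^0) - D^E_i(G, x^0) - x^0_i,$$
where the minimum is replaced by $D(G, x^0)$ if $Suc(x_{i,2}) = \emptyset$. The independent slack of an activity is the maximum time that the activity duration can be increased without affecting the times of other activities:
$$\max \Big\{ 0,\ \min_{j \in Suc(x_{i,2})} D^E_j(G, x^0) - \max_{j \in Pred(x_{i,1})} D^L_j(G, x^0) - x^0_i \Big\},$$
with the minimum replaced by $D(G, x^0)$ if $Suc(x_{i,2}) = \emptyset$ and the maximum replaced by 0 if $Pred(x_{i,1}) = \emptyset$. Given the slack of an activity, we define the latest start time as the latest time that an activity can start without delaying the project, $D^E_i(G, x^0) + S_i(G, x^0)$, and the earliest completion time as the earliest time at which an activity can end if it starts at its earliest start time, $D^L_i(G, x^0) - S_i(G, x^0) = D^E_i(G, x^0) + x^0_i$.
Besides the schedule of a project, we can manage the resources allocated to the activities.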
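The earliest times, latest completion times and slacks can be computed with a forward and a backward pass over the nodes. The base R sketch below assumes a small four-activity project whose durations are made-up numbers, and it takes a topological order of the nodes as given; it is an illustration of the definitions, not the package's implementation.

```r
# Hypothetical project: arcs are activities, nodes are events.
from <- c("a", "a", "b", "c")   # start node of each activity
to   <- c("b", "c", "d", "d")   # end node of each activity
x0   <- c(2, 1, 3, 1)           # planned durations (illustrative values)
topo <- c("a", "b", "c", "d")   # nodes listed in a topological order

# Forward pass: earliest time at which each node can be reached.
node_early <- setNames(numeric(length(topo)), topo)
for (v in topo) {
  pred <- which(to == v)        # activities ending at node v
  if (length(pred) > 0) {
    node_early[v] <- max(node_early[from[pred]] + x0[pred])
  }
}
D <- max(node_early)            # minimum duration of the project

# Backward pass: latest time of each node that does not delay the project.
node_late <- setNames(rep(D, length(topo)), topo)
for (v in rev(topo)) {
  suc <- which(from == v)       # activities starting at node v
  if (length(suc) > 0) {
    node_late[v] <- min(node_late[to[suc]] - x0[suc])
  }
}

early    <- unname(node_early[from])  # earliest start time of each activity
late     <- unname(node_late[to])     # latest completion time of each activity
slack    <- late - early - x0
critical <- which(abs(slack) < 1e-9)  # activities with zero slack
```

With these numbers the path formed by activities 1 and 3 determines the project duration, so both are critical, while activities 2 and 4 have positive slack.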
The minimal cost expediting or MCE method (Kelley, 1961) considers that the duration of some activities can be reduced by increasing the resources allocated to them and, thus, the implementation costs. An MCE problem is a tuple $(P, \bar{x}^0, c, D)$, where $P$ is a deterministic project; $\bar{x}^0 \in \mathbb{R}^n_+$ is the vector of minimum durations, that is, for each activity $i \in N$, $\bar{x}^0_i$ is the minimum duration that the activity can take if the resources allocated to carry it out are increased; $c \in \mathbb{R}^n$ is the vector of unit costs, that is, for each activity $i \in N$, $c_i$ is the cost of shortening the duration of $i$ by one unit of time; and $D$ is the project duration we are trying to achieve, with $D < D(G, x^0)$. This problem can be solved as a linear programming problem.
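One standard way of writing an MCE problem as a linear program is sketched below. It is offered as an illustration under assumptions, not necessarily the formulation solved by the package: $\bar{x}^0$ denotes the vector of minimum durations, the decision variables $y_i$ (the new duration of activity $i$) and $t_x$ (the time at which node $x$ is reached) are notation introduced here, and $x_s$ and $x_e$ are the source and sink nodes.

```latex
\begin{align*}
\min_{y,\,t} \quad & \sum_{i \in N} c_i \, (x^0_i - y_i) \\
\text{s.t.} \quad  & t_{x_{i,2}} - t_{x_{i,1}} \ge y_i, && i \in N, \\
                   & \bar{x}^0_i \le y_i \le x^0_i,     && i \in N, \\
                   & t_{x_e} - t_{x_s} \le D.
\end{align*}
```

The first family of constraints forces every activity to fit between its start and end nodes, the second bounds how much each activity can be shortened, and the last one imposes the target duration $D$.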
Two other interesting problems that arise from the management of resources are levelling and allocation (Hegazy, 1999). These problems take into account that, in order for activities to be carried out in the estimated time, a certain level of resources must be used. The resource levelling problem is to find a schedule that completes the project in its minimum duration $D(G, x^0)$ while keeping the use of resources as uniform as possible over time. In the resource allocation problem, the level of resources available in each period of time is limited. The aim is to find the minimum duration and a schedule for the execution of the project taking into account this resource constraint. Given the complex nature of these problems, solving them exactly is computationally demanding. The most common practice is to use heuristic methods to solve them.
Once the project is completed, we know the actual (observed) durations of the activities and, therefore, whether there has been a delay in the project, that is, whether the actual duration of the project has been different from the expected one. We define a deterministic project with delays as a tuple $CP = (G, x^0, x, C)$, where $(G, x^0)$ is a deterministic project, $x \in \mathbb{R}^n_+$ is the vector of actual durations of the activities, and $C : \mathbb{R}_+ \to \mathbb{R}$ is the delay cost function. We assume that $C$ only depends on the duration of the project, that it is a non-decreasing function, and that $C(D(G, x^0)) = 0$. A function commonly used in practice, for a vector $y \in \mathbb{R}^n_+$, is
$$C(D(G, y)) = \begin{cases} 0 & \text{if } D(G, y) \le D(G, x^0), \\ D(G, y) - D(G, x^0) & \text{otherwise.} \end{cases}$$
We denote by $\mathcal{CP}^N$ the family of all deterministic projects with delays with set of activities $N$, and by $\mathcal{CP}$ the family of all deterministic projects with delays.
In a deterministic project with delays $CP \in \mathcal{CP}^N$, we may need to allocate $C(D(G, x))$ among the activities. This can be useful for several reasons. For example, it can serve as an incentive for those responsible for the activities that have been delayed to be more diligent in similar projects that we may carry out with them in the future; or it can be a mechanism to distribute among those responsible for the delayed activities the financial penalty that the project manager has contractually guaranteed. Brânzei et al. (2002) propose two rules based on bankruptcy problems to address this problem: the Proportional rule and the Truncated Proportional rule. Although they define these rules for the case $x_i \ge x^0_i$, we do not consider this restriction. These rules are only defined when the sum of the individual delays is not zero.
The Proportional rule for deterministic scheduling problems with delays, $\varphi$, is defined, for each $i \in N$, by
$$\varphi_i(CP) = \frac{x_i - x^0_i}{\sum_{j \in N} (x_j - x^0_j)} \cdot C(D(G, x)).$$
The Truncated Proportional rule for deterministic scheduling problems with delays, $\bar{\varphi}$, is defined, for each $i \in N$, by
$$\bar{\varphi}_i(CP) = \frac{\min \{ x_i - x^0_i, C(D(G, x)) \}}{\sum_{j \in N} \min \{ x_j - x^0_j, C(D(G, x)) \}} \cdot C(D(G, x)).$$
In Bergantiños et al. (2018), the problem of allocating the delay costs is addressed in the context of cooperative game theory using a Shapley rule. As we illustrate later in an example, the Shapley rule allocates the delay costs in a more sensible way than the proportional rules, at least in some cases. It is much more costly to compute but, in general, the extra effort is worthwhile.
A TU-game is a pair $(N, v)$ where $N$ is a finite non-empty set, and $v$ is a map from $2^N$ to $\mathbb{R}$ with $v(\emptyset) = 0$. We say that $N$ is the player (activity) set of the game and $v$ is the characteristic function of the game, and we usually identify $(N, v)$ with its characteristic function $v$. The Shapley value, an allocation rule in cooperative game theory, is a map $\Phi$ that associates to each TU-game $(N, v)$ a vector $\Phi(N, v) \in \mathbb{R}^N$, providing a fair allocation of $v(N)$ among the players in $N$. The explicit formula of the Shapley value for every TU-game $(N, v)$ and every $i \in N$ is given by
$$\Phi_i(N, v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \, \big( v(S \cup \{i\}) - v(S) \big).$$
Since its introduction by Shapley (1953), the Shapley value has proved to be one of the most important rules in cooperative game theory and to have applications in many practical problems (see, for instance, Moretti and Patrone (2008)). Bergantiños et al. (2018) define the Shapley rule for deterministic projects with delays, $Sh$, as $Sh(CP) = \Phi(N, v^{CP})$, where $v^{CP}(S) = C(D(G, (x_S, x^0_{N \setminus S})))$ for all $S \subseteq N$, and $(x_S, x^0_{N \setminus S})$ denotes the vector whose $i$-th component is $x_i$ if $i \in S$ and $x^0_i$ otherwise.
The calculation of the Shapley value has, in general, exponential complexity. In this context, its exact calculation is impossible in practice, even for a moderate number of activities. As an alternative to exact calculation, Castro et al. (2009) proposed an estimate of the Shapley value in polynomial time using a sampling process. In practical terms, this estimate is a reasonable solution.
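To make the difference between the rules concrete, the following base R sketch computes a proportional allocation and a Shapley allocation on a toy project with two paths (activities 1 then 3, and activities 2 then 4). Everything in it is an assumption for illustration: the durations are made up, the delay cost is the excess over the planned duration, the game gives each coalition the delay cost obtained when its members take their observed durations and the rest their planned ones, and the function names `shapley` and `shapley_sample` are hypothetical. The sampling estimator follows the general idea of averaging marginal contributions over random orderings.

```r
x0 <- c(2, 1, 3, 1)   # planned durations (illustrative)
x  <- c(3, 3, 4, 1)   # observed durations (illustrative)
project_duration <- function(y) max(y[1] + y[3], y[2] + y[4])
delay_cost <- function(d) max(d - project_duration(x0), 0)
# Characteristic function: S is a logical vector marking the coalition.
v <- function(S) delay_cost(project_duration(ifelse(S, x, x0)))

# Exact Shapley value by enumerating all coalitions not containing i.
shapley <- function(n, v) {
  phi <- numeric(n)
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    for (mask in 0:(2^(n - 1) - 1)) {
      S <- rep(FALSE, n)
      S[others] <- (mask %/% 2^(seq_along(others) - 1)) %% 2 == 1
      s <- sum(S)
      w <- factorial(s) * factorial(n - s - 1) / factorial(n)
      S_i <- S
      S_i[i] <- TRUE
      phi[i] <- phi[i] + w * (v(S_i) - v(S))
    }
  }
  phi
}

# Sampling estimate: average marginal contributions over m random orders.
shapley_sample <- function(n, v, m = 1000) {
  phi <- numeric(n)
  for (k in seq_len(m)) {
    S <- rep(FALSE, n)
    prev <- v(S)
    for (i in sample(n)) {
      S[i] <- TRUE
      cur <- v(S)
      phi[i] <- phi[i] + cur - prev
      prev <- cur
    }
  }
  phi / m
}

total        <- v(rep(TRUE, 4))               # delay cost to be shared
proportional <- (x - x0) / sum(x - x0) * total
exact        <- shapley(4, v)
set.seed(1)
estimate     <- shapley_sample(4, v, m = 200)
```

In this toy case activity 2 is the most delayed one, so the proportional allocation charges it the largest share, but its path never becomes the longest one and the Shapley allocation charges it nothing.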
Next, we introduce a generalization of the model and the rules described above. It follows the results in Gonçalves-Dosantos et al. (2020a). If instead of $x^0_i$, the planned duration of activity $i \in N$, we consider the non-negative random variable $X^0_i$ describing the duration of $i$, we can define a stochastic project $SP$ as a tuple $SP = (G, X^0)$. Unlike in the deterministic setting, the durations of the activities, the duration of the project, as well as the early and last times are now random variables instead of fixed numbers.
A stochastic project with delays is a tuple $SCP = (G, X^0, x, C)$, where $(G, X^0)$ is a stochastic project, $x$ is the vector of actual durations, and $C : \mathbb{R}_+ \to \mathbb{R}$ is the delay cost function. We assume that $C$ is non-decreasing and $C(D(G, 0)) = 0$, where $0 \in \mathbb{R}^n$ is the vector with all components equal to zero. Proportional rules can be extended to stochastic projects with delays in a straightforward way.
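Since the project duration is now a random variable, its mean, its percentiles and the share of runs in which each activity is critical can be approximated by simulation. The base R sketch below is a minimal illustration under assumptions: a toy project with two paths (activities 1 then 3, and activities 2 then 4), triangular durations with made-up parameters, and a hypothetical helper `rtri` that samples by inverting the distribution function.

```r
set.seed(123)

# Inverse-CDF sampler for a triangular distribution
# with minimum a, maximum b and mode c.
rtri <- function(n, a, b, c) {
  u <- runif(n)
  ifelse(u < (c - a) / (b - a),
         a + sqrt(u * (b - a) * (c - a)),
         b - sqrt((1 - u) * (b - a) * (b - c)))
}

n_sim <- 5000
# Hypothetical parameters, one row per activity: min, max, mode.
tri_par <- rbind(c(1, 3, 2), c(0.5, 2, 1), c(2, 5, 3), c(1, 4, 2))
sims <- sapply(1:4, function(i) {
  rtri(n_sim, tri_par[i, 1], tri_par[i, 2], tri_par[i, 3])
})

path13 <- sims[, 1] + sims[, 3]
path24 <- sims[, 2] + sims[, 4]
dur    <- pmax(path13, path24)     # simulated project durations

avg_duration <- mean(dur)
pct95        <- quantile(dur, 0.95)
# Percentage of runs in which each activity lies on the longest path.
on_path <- c(mean(path13 >= path24), mean(path24 >= path13))
crit    <- 100 * on_path[c(1, 2, 1, 2)]
```

Each row of `sims` is one simulated realisation of the project, so any statistic of the duration (mean, percentile, density estimate) can be read off the vector `dur`.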
The Stochastic Proportional rule $\varphi$ and the Stochastic Truncated Proportional rule $\bar{\varphi}$ for stochastic projects with delays are defined, for each $i \in N$, analogously to their deterministic counterparts; their explicit expressions can be found in Gonçalves-Dosantos et al. (2020a). The Shapley rule is extended to this setting through a TU-game $v^{SCP}$ and, as an alternative to that rule, the Shapley rule in two steps for stochastic projects with delays, $SSh2$, is defined from two TU-games $v^{SCP}_1$ and $v^{SCP}_2$. In general, the calculation of $v^{SCP}$, $v^{SCP}_1$ and $v^{SCP}_2$ is very complex. In our package, we use simulations to approximate these characteristic functions.

ProjectManagement package
ProjectManagement is a new R package that allows the user to address different tasks in project management. The user can obtain the duration of a project and a schedule of activities, and can plot this schedule for a better understanding of the problem. When the actual durations of each activity are observed, the package proposes several allocations of the delay cost, if there was any, among the activities. In the stochastic context, the package estimates the average duration of the project and plots the density functions of the following random variables: duration of the project, and early and last times of the activities. As in the deterministic case, it can make an allocation of the delay cost, if any.
The following dependencies of the package must be taken into account: triangle (Carnell, 2019), plotly (Sievert, 2020), igraph (Csardi and Nepusz, 2006), kappalab (Grabisch et al., 2015), GameTheory (Cano-Berlanga, 2017) and lpSolveAPI (lp_solve et al., 2020). The first one is used for calculations with triangular distributions, the second one to plot interactive graphics, the third one to plot graphs, the next two are related to game-theoretic concepts and the last one to solve linear programming problems.
The functions incorporated in the package can be seen in Table 1. Note that for projects of more than 10 activities, functions delay.pert and delay.stochastic.pert will approximate the Shapley value through a sampling process. Table 2 describes the complete list of parameters used by the functions. Tables 3, 4, 5, and 6 state which arguments each function uses.

Function               Description
dag.plot               Plots the AON graph of a project.
delay.pert             Calculates the delay cost of a deterministic project and allocates it among the activities.
delay.stochastic.pert  Calculates the delay cost of a stochastic project and allocates it among the activities.
early.time             Calculates the earliest start time for each activity.
last.time              Calculates the latest completion time for each activity.
levelling.resources    Calculates the schedule of the project so that the consumption of resources is as uniform as possible.
mce                    Calculates the costs per activity needed to accelerate the project.
organize               Relabels the activities of a project (if i precedes j then i ≤ j).
rebuild                Builds a type 1 precedence matrix.
resource.allocation    Calculates the project schedule so that resource consumption does not exceed the maximum available per time period.
schedule.pert          Calculates the duration of a project and the schedule of each activity, and plots the schedule and the AON graph.
stochastic.pert        Calculates the average duration of a stochastic project, the criticality index of each activity, and the density functions of the duration of the project, early times and last times.

ProjectManagement allows the user to plot the activity-on-node (AON) graph of the project. Originally, in the PERT methodology, projects are represented by activity-on-arc (AOA) graphs. This is the representation we have used in this paper up to now. Both AON and AOA representations are widely used in the literature, each having some advantages over the other in particular circumstances. For automatically drawing the network of a project, the AON representation is more appropriate because it is computationally much more efficient. This is why we have incorporated it into the dag.plot function. This representation is useful mainly for users to check that they have entered the precedence matrices correctly, since these matrices are what really characterize the project.
ProjectManagement also allows the user to choose from four different types of immediate precedences between the activities.
• Type 1: Finish to start (FS). If an activity i ∈ N precedes type 1 to j ∈ N, then j cannot start until activity i has finished.
• Type 2: Start to start (SS). If an activity i ∈ N precedes type 2 to j ∈ N, then j cannot start until activity i has started.
• Type 3: Finish to finish (FF). If an activity i ∈ N precedes type 3 to j ∈ N, then j cannot finish until activity i has finished.
• Type 4: Start to finish (SF). If an activity i ∈ N precedes type 4 to j ∈ N, then j cannot finish until activity i has started.
The relationships between the types of dependencies are as follows: Type 1 implies type 2, type 2 implies type 4, type 1 implies type 3 and finally type 3 implies type 4. Considering these relations, if one activity precedes another by more than one type, it is only necessary to indicate the one with the strongest character.
The user can indicate types 1 or 2 in the "prec1and2" parameter (see Table 2) using the values 1 or 2 respectively, and types 3 or 4 in "prec3and4" using 3 or 4 respectively. Note that the precedences cannot form cycles.
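As a minimal sketch of how such precedence matrices might be built and checked in base R, consider a hypothetical four-activity project. The convention that entry (i, j) holds the type with which activity i precedes activity j is an assumption made here for illustration, and `is_acyclic` is a hypothetical helper, not a function of the package.

```r
# Assumed convention: entry (i, j) = k means that activity i precedes
# activity j with precedence type k; 0 means no precedence.
prec1and2 <- matrix(0, nrow = 4, ncol = 4)
prec1and2[1, 3] <- 1   # type 1 (FS): 3 cannot start until 1 has finished
prec1and2[2, 4] <- 2   # type 2 (SS): 4 cannot start until 2 has started

prec3and4 <- matrix(0, nrow = 4, ncol = 4)
prec3and4[3, 4] <- 3   # type 3 (FF): 4 cannot finish until 3 has finished

# Check that the precedences, taken together, contain no cycle by
# repeatedly removing the activities that have no remaining predecessor.
is_acyclic <- function(...) {
  adj <- Reduce(`|`, lapply(list(...), function(m) m != 0))
  remaining <- seq_len(nrow(adj))
  while (length(remaining) > 0) {
    sub  <- adj[remaining, remaining, drop = FALSE]
    free <- remaining[colSums(sub) == 0]
    if (length(free) == 0) return(FALSE)
    remaining <- setdiff(remaining, free)
  }
  TRUE
}

is_acyclic(prec1and2, prec3and4)

# Making activity 4 precede activity 1 closes the loop 1 -> 3 -> 4 -> 1.
bad <- prec1and2
bad[4, 1] <- 1
is_acyclic(bad, prec3and4)
```

A check of this kind can be run before calling any scheduling routine, since a cycle makes the project impossible to schedule.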

Parameter          Description
duration           Vector with the expected duration for each activity.
prec1and2          Matrix indicating precedence type 1 or type 2 between the activities (Default=matrix(0)).
prec3and4          Matrix indicating precedence type 3 or type 4 between the activities (Default=matrix(0)).
observed.duration  Vector with the actual duration for each activity.
max.resources      Value indicating the maximum number of resources that can be used in each period of time.

We start by introducing the data set characterizing the project. We use the function dag.plot for depicting its AON graph. Figure 2 shows it; the green blocks contain the activities and the precedences are represented by arrows. The blocks S and E are the source and sink nodes, respectively. Note that the precedences type 1 are arrows without label, precedences type 2 are labeled as SS, precedences type 3 as FF, and precedences type 4 as SF.
> minimum.durations<-c(1,1,0.5,1,1,2,2,3,1,3)
> activities.costs<-c(1,2,1,1,3,2,1,2,3,5)
> mce(duration,minimum.durations,prec1and2,prec3and4,activities.costs,duration.project=NULL)

The parameter duration.project=NULL means that we do not indicate a target duration for the project, so the function asks us for the amount by which the duration of the project is decreased at each step, and then obtains all possible solutions. We have chosen a decrease of 0.5 units of time. As a result, the minimum duration of the project can be reduced to 10, 9.5, 9, 8.5, 8, 7.5 and 7. For each possible duration of the project, we obtain the durations of each activity (duration per column and activity per row), as well as the cost needed to reduce their times to these durations.
Suppose now that, to complete the project, the activities need the amounts of resources (6,6,6,3,4,2,1,2,3,1), and we are interested in obtaining a new schedule with a uniform consumption of resources over time.
As we can see, the function returns the new earliest start times of the activities and the resources consumed in each period with the new schedule, where time periods start at 0 and end at 10.5 with an increase of 0.5 time units. Figure 6 represents the resources required in each period of time, before and after the readjustment.
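The resources consumed in each period follow directly from a schedule, which makes it easy to compare two schedules of the same project. The base R sketch below uses a hypothetical four-activity project (not the data of the example) and a hypothetical helper `usage_profile`; the sum of squared consumptions is used here as one possible measure of how uneven a profile is.

```r
# Resources in use at the start of each period of length `step`.
usage_profile <- function(start, duration, resources, step = 0.5) {
  horizon <- max(start + duration)
  periods <- seq(0, horizon - step, by = step)
  sapply(periods, function(t) {
    sum(resources[start <= t & t < start + duration])
  })
}

toy_duration  <- c(2, 1, 3, 1)   # illustrative durations
toy_resources <- c(6, 3, 4, 2)   # illustrative resource needs

# Earliest-start schedule versus a schedule in which the two
# non-critical activities (2 and 4) are postponed within their slack.
before <- usage_profile(c(0, 0, 2, 1), toy_duration, toy_resources)
after  <- usage_profile(c(0, 2, 2, 3), toy_duration, toy_resources)

sum(before^2)   # unevenness of the original profile
sum(after^2)    # unevenness after postponing activities 2 and 4
```

Both schedules finish at the same time and consume the same total amount of resources, but the second one spreads the consumption more evenly over the periods.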
To conclude with the analysis of resources, consider that the maximum amount of resources available in each period is 10. We use the resource.allocation function in this situation.
Continuing the example, we now analyse the allocation of delays. The function delay.pert shows whether there has been a delay in the project and, in that case, allocates it among the activities. Let us see it using the delay cost function
$$C(D(G, y)) = \begin{cases} 0 & \text{if } D(G, y) \le 10.5, \\ D(G, y) - 10.5 & \text{otherwise,} \end{cases}$$
and the (observed) actual durations $x = (2.5, 1.5, 2, 2, 2, 6, 4, 6, 3, 5.5)$.

> observed.duration<-c(2.5,1.5,2,2,2,6,4,6,3,5.5)
> cost.function<-function(x){return(max(x-10.5,0))}
> delay.pert(duration,prec1and2,prec3and4,observed.duration,delta=NULL,cost.function)
There has been a delay of = 3

The output shows that there is a delay in the project of 3 units. As there is a delay, we proceed to make the allocation using three rules: proportional, truncated proportional and Shapley. We can see the differences between the three rules, especially in activities 1, 4, 7 and 9. While the proportional and truncated proportional rules assign a positive payment, the Shapley rule does not assign costs to these activities. This is due to the fact that, although they fall behind the planned duration, they do not affect the overall delay of the project. Note that if the project has more than ten activities, delay.pert does not calculate the Shapley rule; instead, it asks the user whether an estimate of its value should be calculated.
> values<-matrix(c(1,3,2,2/3,0,0,1/2,5/4,5/4,1/4,5/2,7/4,1,3,2,1,5,3/2,
1,7,1,3,5,4,1/2,3,5/2,1,8,6),nrow=10,ncol=3,byrow=TRUE)
> distribution<-c("TRIANGLE","EXPONENTIAL",rep("TRIANGLE",8))
> stochastic.pert(prec1and2,prec3and4,distribution,values,percentile=0.95,
plot.activities.times=c(7,8))
Average duration of the project = 10.64242
Percentile duration of the project = 14.21999
                                1  2    3 4   5    6   7  8 9   10
Criticality index by activity 0.6 88 11.4 2 0.1 11.3 2.6 86 4 93.4

In the output we can see the average duration of the project and the percentile duration of the project. The percentile duration of the project is the value d such that the probability that the duration of the project is smaller than d equals the variable percentile introduced by the user (see Table 2); in this case percentile=0.95. In addition, we obtain the criticality index by activity, that is, the percentage of times that an activity is critical. An activity is critical when it has zero slack. Figure 7 plots estimations of the density function of the project duration, the earliest start time and the latest completion time of activities 7 and 8.
We proceed now to the allocation of the delay cost in the stochastic model using the function delay.stochastic.pert. To be able to compare the results, we use the same delay cost function as in the deterministic case. As expected, there are noticeable differences in the allocations between the two models, as the stochastic model makes use of more complex information.

> delay.stochastic.pert(prec1and2,prec3and4,distribution,values,observed.duration,percentile=NULL,delta=NULL,cost

Finally, to illustrate the runtime of the previously used functions, Table 8 shows the time (in seconds) needed to compute several problems. We have selected a variety of projects with 2, 4, 6, 8 and 10 activities, and we have run the different routines on a computer with an Intel Core i5-7200U processor and 12 GB of RAM.