Analyzing Dependence between Point Processes in Time Using IndTestPP

The need to analyze the dependence between two or more point processes in time appears in many modeling problems related to the occurrence of events, such as the occurrence of climate events at different spatial locations or synchrony detection in spike train analysis. The package IndTestPP provides a general framework for all the steps in this type of analysis, and one of its main features is the implementation of three families of tests to study independence given the intensities of the processes, which are not only useful to assess independence but also to identify factors causing dependence. The package also includes functions for generating different types of dependent point processes, and implements computational statistical inference tools using them. An application to characterize the dependence between the occurrence of extreme heat events in three Spanish locations using the package is shown.


Introduction
A point process in time (PP in short) is a random collection of points in a space in R + where each point usually represents the time of an event. Examples of events that can be modelled as point processes in time include the occurrence of earthquakes, heat waves or the arrivals of insurance claims. Many real problems involve not one but two or more PPs, and they should be studied in a multivariate framework where a description of the independence or dependence structure between them is required. Examples of this situations are the timing of the trades and mid-quote changes in stock exchange, the occurrence of temperature extremes or other climate events at different spatial locations, or the synchrony detection in spike train analysis.
In those situations, statistical tests are required to assess the independence between two or more PPs. If we can assume that the processes are independent, their modelling is much simpler, since it can be carried out separately for each process without any loss of information. The tests are also useful to identify the type of dependence, and to select the type of vector of point processes which can be used to model them. The need of testing independence between PPs appears in climate and environmental sciences, (Cronie and van Lieshout, 2016;, in neuroscience (Tuleau-Malot et al., 2014;Albert et al., 2015), in biology (Myllymäki et al., 2017) and many other fields.
Two types of independence between PPs may be of interest, general independence (Rubin-Delanchy and Heard, 2014a) and independence given the intensities of the processes. The election of the type of independence as null hypothesis depends on the aim of the study, but the second type is more useful in modelling problems based on PPs. In effect, the most frequent approach to model systematic dependence structures caused by common factors, is the use of nonhomogeneous processes with intensities which are functions of the same or dependent covariates. To analyse if the dependence is well represented by those covariates, the null hypothesis of independence given the intensities has to be checked. When the existing dependence cannot be explained by the available covariates, models taking into account that dependence should be considered to model the vectors of PPs.

Vectors of point processes in time
A point process in time, N, is defined in R + and can be described in different equivalent ways. Here, we will mainly use the sequence of occurrence times, T 1 , T 2 , ..., T n , and also the set of random variables N(A) representing the number of points in A, for each A ∈ R + . The notation for A = (0, t] is N(t) = N((0, t]). The intensity measure of the process Λ gives the expected number of points in a set, so that Λ((0, t]) = E(N((0, t])). Its derivative function, provided it exists, is the intensity function, If the intensity is constant, the process is homogeneous, and nonhomogeneous otherwise. The most known PP is the Poisson process, where N(A) has a Poisson(µ A ) distribution, with µ A = A λ(t)dt, and N(A 1 ), . . . N(A k ) are independent variables provided that A i ∩ A j = ∅, ∀i = j.
Herein, we will consider vectors of point processes N = (N 1 , . . . , N d ) observed in the same space Ω = (0, T] ⊂ R + . This definition must not be mixed up with a multivariate point processes defined as a process of random points X i = (X i1 , . . . , X id ) in a d-dimensional space V ∈ R d , whose simplest example with d = 2 is a spatial point processes. Figure 1 shows the differences between the two concepts with d = 3. A vector of PPs can be seen as a marked point process with discrete marks, and can be represented by a countable collection of pairs (T i , D i ) where T i ∈ R + are the occurrence times and D i ∈ {1, . . . , d} are the component indexes.
Most of the results in this work are developed for vectors of d = 2 processes, denoted by (N x , N y ) with intensities λ x (t) and λ y (t). If the results can be extended to higher values of d, it is specified. The n x and n y points in each observed process are denoted t 1 , . . . , t n x and s 1 , . . . , s n y respectively.
Many types of dependence structures can appear between the marginal processes of a vector. The most direct way of modelling it is to use models to represent the dependence between the occurrence times of the processes, such as the common Poisson shock processes, the queue processes, the Poisson processes with dependent marks or the multivariate Neyman-Scott processes, described later.

Point processes in R
There exist many packages in R devoted to the analysis of spatial point processes: the extensive spatstat (Baddeley et al., 2015), whose main functionalities include exploratory data analysis, modelfitting, and simulation, stpp (Gabriel et al., 2020), splancs (Rowlingson and Diggle, 2017), and many others. IDSpatialStats (Giles et al., 2019) provides spatial dependence measures, and future directions include the extension to the spatio-temporal case. However, the number of packages dealing with the analysis of point processes in time is not so high, and most of them deal with univariate analysis of the processes. NHPoisson  provides a global framework for the modelling and diagnosis of Poisson processes in time, PtProcess (Harte, 2010) fits and analyses time dependent marked point processes with an emphasis on earthquake modelling, and mmpp (Hino et al., 2017) offers various similarity and distance metrics for marked point processes.
The aim of IndTestPP is the analysis of vectors of point processes in time, in particular of its dependence, and it provides a general framework for all the steps involved in this type of analysis: data processing, estimation of the marginal intensities of the processes, analysis of independence given the intensity, identification of factors causing dependence, and inference tools based on computational statistics. mppa (Rubin-Delanchy and Heard, 2014b) provides a test for dependence between point processes on the real line, but with a different aim since it tests general independence. The three families of tests implemented in IndTestPP are more general since they are not restricted to Poisson processes, and they test independence given the marginal intensities. This type of conditional independence is more useful in statistical modelling of vectors of point processes, since it helps to identify the factors that cause the dependence. An example of how all the modelling process of a vector of point processes can be carried out using IndTestPP is shown in the application section.

Testing independence between point processes in time
Most of the analysis of independence between point processes in the literature involve spatial processes, but few works deal with the study of independence between processes in time. IndTestPP includes the three families of tests to assess independence between PPs in time by , the POISSON, the CLOSE, and the CROSS families, and a graphical tool, the Dutilleul plot. In all of them, the null hypothesis is the independence between the point processes, given their marginal intensities, and the alternative the existence of any type of random dependence between them. All the tests in these families are constructed by keeping fixed the first observed process, a common approach to test independence given the marginal structure. However, each test is based on different assumptions, and all together they cover a wide range of types of processes appearing in real problems.

POISSON family
The family of tests POISSON is implemented in the function CondTest and it includes two tests to assess the independence between two homogeneous or nonhomogeneous processes, based on the conditional distribution of N y |N x . The assumptions of the tests are that N y is a Poisson process with intensity function λ y (t), specified in the vector argument lambday.
This family is based on the following property. If N x and N y are independent, and a point t i occurs in N x , the distribution of N y does not change. Then, Y i , the number of points in N y in intervals l i of length 2r around t i , follows a Poisson(µ i ) distribution with µ i = l i λ y (t)dt.
Two options are available to perform a test. The test implemented with the argument type= Poisson is based on the fact that, under independence between N x and N y and if not overlapping intervals are used, the statistic Normal approximation is implemented with the argument type= Normal . Again, under the null and with not overlapping intervals, the variables (Y i − µ i )/ √ µ i are zero mean independent variables with standard deviation equal to 1, and the asymptotic distribution of the statistic is N(0, 1). If the argument is type= All , both tests are calculated.
The intervals where the number of points are counted, l i , are centred intervals around points t i of radius r specified by the argument r. If the argument changer=TRUE, when two intervals overlap, their lengths are shortened by half of the intersection period, so that the resulting intervals are disjoint, and independent Y i are obtained.
The power study by  shows that the Normal test performs better provided that conditions to guarantee the normal approximation are fulfilled. These conditions are quite weak, even with a complex intensity, mean values of µ i around 0.6 points per interval lead to a valid approximation with n x = 50, and around 0.3 with n x = 100.

CLOSE family
The CLOSE family includes two tests, the parametric bootstrap (PaB) and the Lotwick-Silverman (LoS) tests, implemented in the functions TestIndNH and TestIndLS, respectively. The LoS test can only be applied to homogeneous processes, but the PaB test also to nonhomogeneous ones. On the other hand, the LoS does not require any assumption to be applied, while PaB requires that N y follows a parametric model with a generation algorithm, such as a Poisson or a Neyman-Scott cluster process. Although both tests allow to check the independence between d processes in general, the calculation of the statistic is only implemented for d = 2 and d = 3.
These tests are based on the close point distance and they aim to compare the behaviour of the sets of close points in a vector of observed processes and in vectors with the same marginal distributions but independent components . A point s j in N y is a close point of t i in N x , if the intervals to their previous points, s j−1 and t i−1 , overlap. The set behaviour is summarized bydx i , the mean of the distances between t i and its close points, |s j − t i |. The set of close points and the mean distance for each point t i are calculated by the functions uniogentri and DistObs respectively.
Given the complexity of the test statistic, its distribution has to be obtained by computational statistical methods. These methods require approaches to generate a sample of r processes N * j = (N x , N * y,j ), where the observed process N x is fixed, N * y,j has the same distribution than N y , and N x and N * y,j are independent. The PaB and LoS tests result from two different generation approaches. Parametric bootstrap test. In this test the N * y,j processes are generated independently from N x using a parametric model. Two types of marginal models are implemented in TestIndNH: Poisson processes (type="Poisson") and Neyman-Scott cluster processes (type="PoissonCluster").The generation of Poisson processes in a given period uses the function simNHPc, based on a two-step algorithm which generates homogeneous occurrence times, and transform them into the points of a NH process with intensity λ(t) (Ross, 2006). Neyman-Scott cluster processes are obtained by the function IndNHNeyScot. Details about these processes are explained later.

Lotwick-Silverman test (LoS).
TestIndLS generates processes using a Monte Carlo method conditional on the observed marginal structure (Lotwick and Silverman, 1982). The steps are the following, 1. The observed processes (N x , N y ) are wrapped onto a circumference by identifying the opposite sides of the time interval where they are observed.
2. Fixing N x , a new N * y is generated by translating N y a random uniform amount on the circumference. This breaks any dependence between the processes and keeps the marginal distributions, provided they do not change over time.
The mean distances of the close point sets in the generated vectors of processes in the PaB and the LoS tests are calculated by the functions DistSim and DistShift, respectively. The calculation of the p-value requires the generation of processes in two steps of the algorithm, to calculate the expectation of the mean distancesdx ij and to estimate the distribution of the statistic in a correlated sample. This calculation is implemented so that the same generated processes are used in the two steps and the computing time is kept low; otherwise, it would be multiplied by the number of simulations. Moreover, parallel computation is implemented.
According to the power study by , both LoS and PaB tests have a high power, but LoS performs slightly better in homogeneous processes with small samples and low dependence.

CROSS family
The CROSS family includes two tests based on the cross K and the cross J spatial functions adapted to the case of PPs in time, implemented in the functions NHK and NHJ, respectively. These functions also provide estimators of the cross functions. They do not require any assumption about the distribution of the marginal processes, only to know their intensities, and the p-values are calculated using a LoS approach. The tests can be applied to two homogeneous or nonhomogeneous processes, and more generally to two sets of processes, C = (N x1 , N x2 , . . . , N xl C ) and D = (N y1 , N y2 , . . . , N yl D ). The information about the processes is provided by arguments posC, a vector containing all the occurrence times in the processes in C, and typeC, a vector containing the code j of the process N xj where each point in posC occurs; D is specified analogously by posD and typeD. For the sake of simplicity, the results are expressed for the case, C = (N x ) and D = (N y ) with intensities λ x (t) and λ y (t). K-function. K xy (r) is the expected value of the number of points in N y within a distance r of a randomly chosen point in N x , adjusted for the possible time varying intensity. NHK calculates two different estimators of K xy (r) at a given grid of r distances, and the corresponding test statistics based on them, K = 1 R ∑ r R r=r 1K xy (r)/2r. The estimator calculated by default (typeEst = 2) performs better in terms of size and power .

J-function.
It compares the functions D xy (r) (distribution function of the distances from a point in N x to the nearest point in N y ) and F y (r) (distribution function of the distances from a point in the space to the nearest point in N y ), in terms of the ratio J xy (r) = (1 − D xy (r))/(1 − F y (r)), if F y (r) < 1.
The estimators of D xy (r) and F y (r), calculated by NHD, and NHF, are used in NHJ to estimate J xy (r). To estimate F y (r), a grid of L values is required. It can be provided in argument L or an automatic selection is calculated otherwise. In homogeneous PPs, the previous estimators are equal to the empirical distribution functions, and the calculation algorithms are changed to reduce the computational cost. The test statistic, which summarizes the deviations of the function from 1, is J = 1 R ∑ r R r=r 1 |Ĵ xy (r) − 1|. In both functions NHK and NHJ, the grid of r distances where the estimators are evaluated are provided in argument r. If it is NULL, an automatic selection based on length T is carried out. The test statistics are also evaluated at that grid and, since dependence often appears between close observations, the addends with high r can make more difficult to discriminate between dependent and independent processes. To avoid that effect, the statistic can be calculated using only the addends with r < r0 by using the argument rTest=r0. To identify and adequate value of r0, valuesK xy (r) or K xy (r)/2r can be optionally plotted.
Computation of the p-values. The calculation of the p-value in CROSS tests is based on a LoS approach for nonhomogenous processes. First, the observed processes (N x , N y ) are wrapped onto a circumference. Then, fixing N x , a new N * y is generated by translating N y and its intensity λ y (t), a random uniform amount. This breaks any dependence but, in nonhomogeneous PPs, it changes the distribution of the marginal processes. However, since the cross functions are adjusted for the time-varying intensity, which is also translated, valid samples of K xy (r) (or J xy (r)) under independence are obtained. Using the empirical distribution of those samples, the p-value and confidence envelopes for K xy (r), or J xy (r), are obtained. Parallel computation is implemented for these calculations.

Dutilleul plot
The function DutilleulPlot carries out the Diggle's randomization testing procedure extended by Dutilleul (2011), which graphically assesses the independence between two homogeneous or nonhomogeneous Poisson processes, given their marginal structure. The idea is to plot the cumulative relative frequency of the nearest neighbour distances between the points in the two observed processes, and to analyse the independence using a confidence band calculated from simulated independent Poisson processes with the observed marginal intensities.

Dependence measures
Unfortunately, there does not exist a general definition to quantify the dependence between two PPs. However, we suggest some measures implemented in IndtestPP which can be useful to describe the level of dependence between many types of processes.
Correlation between the counting variables of two PPs. CountingCor calculates a sample estimator of ρ L xy = Cor(X i , Y i ), the correlation coefficient between X i and Y i the number of points in processes N x and N y , in an interval l i of length L, using a partition of the observed period. Given the discrete character of X i and Y i , and since the usual aim is to quantify any type of dependence, not only linear correlation, Spearman or Kendall coefficients are often more adequate. Kendall should be preferred with short intervals, since there will be a high number of 0 or 1 occurrences per interval, and the Kendall Tau-b coefficient implemented in the function makes an adjustment for ties.
In nonhomogenous processes, variables X i (and Y i ) in intervals measured at different times are not i.d. In the case of Poisson processes, CountingCor can calculate a standardised version of the measure, so that all the variables have the same mean and variance, and if Λ x,i = ∑ t∈l i λ x (t) and Λ y,i = ∑ t∈l i λ y (t) are high enough, they are also i.d., This coefficient measures the correlation given the marginal intensities. That means that the coefficient measures the correlation, once that dependence captured by the intensities (through common covariates for example) has been removed.

Percentage of concordant intervals.
A simpler descriptive measure is the percentage of concordant intervals, that is the percentage of intervals with occurrences in both processes. It is calculated by BinPer as n x,y /(n x,y + n x,0 + n 0,y ), where n x,y is the number of intervals with at least one point in both processes, and n 0,y and n x,0 are the number of intervals with at least one point in one process and 0 in the other. This percentage will tend to be zero in short intervals of independent processes, while positive values will suggest positive dependence. An adequate length of interval, that depends on the marginal intensities, has to be selected to obtain useful interpretations.
Extremal dependence coefficients. In the case of PPs resulting from a Peak over threshold (POT) approach, another interesting measure is the extremal dependence between the variables X and Y where the POT approach is applied. The extremal dependence is the tendency for one variable to be large, given that the other one is large. The extremal dependence coefficient χ of X given Y is defined as and (U, V) are the transformed uniform marginals of X and Y. Another extremal coefficient is χ x|y is on the scale [0, 1], with the set (0, 1] corresponding to asymptotic dependence, and the measureχ x|y falls within the range [−1, 1], with [−1, 1) corresponding to asymptotic independence. Thus, (χ x|y > 0,χ x|y = 1) signifies asymptotic dependence, and the value of χ determines the strength of dependence within that class; (χ x|y = 0,χ x|y < 1) signifies asymptotic independence, andχ x|y determines the strength of dependence within that class. Full details can be found in Coles et al. (1999).
The function depchi estimates the functions χ x|y (u) andχ x|y (u) in a given grid of values u. Both χ x|y (u) and χ y|x (u) (orχ x|y (u) andχ y|x (u)) are calculated and optionally plotted. These graphs are useful to estimate the limit of the function and to obtainχ x|y (and χ x|y ). In the plot of χ x|y (u), the expected behaviour under independence is also plotted.

Generating point processes with different dependence structures
The generation of vectors of PPs with a given dependence structure is necessary to implement Monte Carlo, parametric bootstrap or other inference methods based on simulation, such as those described in section Inference based on computational statistical methods. There are different approaches to model the dependence between the marginal processes in a vector of PPs, but the most direct way is to model the dependence between the occurrence times of the processes. IndTestPP includes functions for the generation of four types of vectors of homogeneous or nonhomogeneous PPs, which will be described later in this section: common Poisson shock processes, multivariate Neyman-Scott processes, queue processes and marked Poisson processes. These types of vectors allow to model three dependence structures frequently observed in real problems.
• Dependence between two or more PPs provoked by the same shock triggering event. This is the most common dependence structure, and can be modelled by a common Poisson shock process (CPSP), or by a multivariate Neyman-Scott process (MNSP). Both models show a short term and positive dependence, generated by common shocks, but in each one the shocks yield points in the processes in a different way. This dependence appears in the spike trains of two neurons, in climate and environmental processes or in financial problems, for example when a political crisis provokes the occurrence of large decreases in different economical indexes.
• Dependence between shifted processes. This is a point to point dependence, where the occurrence of an event in a process triggers an event in the other, so that the points in N x are shifted a positive random amount in N y . It can be modelled by a queue or a network of queues (QUE). Examples with this type of dependence are the processes of the reporting and resolution times of insurance claims, or the occurrence times of floods provoked by an event of intense rainfall.
• Dependence between neighbour points in different processes. It appears when the occurrence of an event in one process boots or blocks the occurrence of an event in the others. It can be modelled by a marked Poisson process with dependent marks, generated for example by a Markov chain (MPP). This model yields medium or long term dependence, since given that the process of all the points is a Poisson process, a model of rare events, the distance between consecutive points tends to be large. An example with this type of dependence is the process of the growth of a specie of plant which favours or avoid the growth of another plant during a period of time.

Common Poisson shock processes
The function DepNHCPSP generates d dependent Poisson processes, which are the marginal processes of a CPSP. A CPSP, see  for full details, is a multivariate PP with an underlying Poisson process of shocks, N 0 , which may yield a point in one or more of d marginal processes N j . These marginal processes are dependent Poisson processes, where the dependence between N i and N j only comes from the occurrence of simultaneous points in those processes.
Generation algorithm. The CPSPs show a property which straightforwardly leads to a generation algorithm: they can be decomposed into m independent Poisson processes, with m = ∑ d k=1 ( d k ). The m indicator processes are the processes of the points occurring only in N 1 , ..., only in N d , simultaneously only in N 1 and N 2 , ..., simultaneously only in N 1 , N 2 and N 3 ,..., and finally, simultaneously in N 1 , N 2 ,... and N d . For example, a CPSP with d = 2 is decomposed into three independent indicator processes, N (1) , N (2) and N (12) , with intensities λ (1) , λ (2) and λ (12) . Each marginal process N j can be expressed as the sum of all the indicator processes including the index j, and its intensity is the sum of the indicator intensities. In the case d = 2, N 1 = N (1) + N (12) , N 2 = N (2) + N (12) , λ 1 = λ (1) + λ (12) and λ 2 = λ (2) + λ (12) . Then, d dependent Poisson processes can be generated in two steps.
2. Each N j is obtained as the union of the points in the indicator processes with index j, N (.j.) .
The intensity of the processes to be generated with DepNHCPSP is specified in argument lambdaiM, a matrix whose columns are the intensity vectors of the indicator processes. Independent Poisson processes in th same period time cannot be generated using DepNHCPSP, but with IndNHPP.
Estimation. It is simple since it reduces to the identification of the indicator processes and the estimation of m independent Poisson processes. In the case d = 2, CPSPpoints identifies the three indicator processes, using as input the points in the two marginal processes. The related function CPSPPOTevents calculates the occurrence times, length, maximum and mean intensity of the extreme events of the indicator processes of the CPSP resulting from a POT approach. The marginal and indicator processes of a CPSP are plotted by the functions PlotMCPSP and PlotICPSP respectively. Poisson processes can be fitted to the indicator processes using the pacakge NHPoisson.

Multivariate Neyman-Scott processes
The function DepNHNeyScot generates d dependent PPs which are the marginal processes of a MNSP. A Neyman-Scott process (NSP) is a process of clusters of points such that the cluster centers C i are a Poisson process, the number of points in each cluster, Z i , are independent Poisson variables possibly with different means µ i , and the distances of each point t j in a cluster to its cluster centre, D ij , are i.i.d. variables. We call a multivariate NSP to a vector of NSP with the same cluster centres. The marginal processes are dependent processes, but they are not Poisson.
Generation algorithm. The previous definition leads to the following generation algorithm. 1. A Poisson process with a given intensity is generated to obtain the cluster centres C i .
2. Given the number of generated cluster centres J, independent series of the number of points in each cluster, (Z l i ) for i = 1, . . . , J, for each marginal process l with l = 1, . . . , d, are generated using Poisson distributions.
3. Given the series (Z l i ), independent distances D l ij to the cluster centre C i for j = 1, . . . , Z l i , i = 1, . . . , J are generated for each marginal process l. The points in each marginal process are obtained as C i + D l ij .
DepNHNeyScot implements two common distributions to model the distances from the points to the cluster centre, N(0, σ) or U(min, max). High values of σ or the range (max − min) lead to a high variability around the centre and to a lower dependence between the processes. Independent NSP in the same period time cannot be generated using DepNHNeyScot, but with IndNHNeyScot.
This is the only model whose estimation is not easy, since the cluster centres are usually unobserved, and they are required to estimate both the underlying Poisson process and the distances of the points in each cluster to its cluster centre.

Queue processes
DepNHPPqueue generates d dependent Poisson processes using d − 1 queues in a tandem. A queue models the input and output times of a customer in a waiting line. In a tandem of queues, the servers are lined up one behind the other, so that an arriving customer undergoes each server before leaving the system. DepNHPPqueue generates all the intermediate processes in a tandem where the first queue can be M(t) \ G \ 1 or M(t) \ G \ ∞, that is a queue where the input is a nonhomogeneous Poisson process, service times have a general distribution G, and there are one or infinity servers.

Generation algorithm. The generation of homogeneous PPs is based on Burke's theorem, that states that if the input of a queue is a homogeneous Poisson process, the output is a dependent
Poisson processes with the same intensity λ. Keilson and Servi (1994) stated that if the input is a nonhomogeneous Poisson process, the output is a Poissonn process whose intensity is the convolution λ out (t) = λ(t) * g(t), where g(t) is the density function of G. Then, the generation algorithm is,

Generation of the input process using a Poisson process with intensity λ(t).
2. Generation of independent time services s k with distribution G for each point i k in the input process.
3. Generation of the output process using the generated input points and time services. If there is only one server, the output times o k depend on the state of the queue: it is o k = o k−1 + s k if the queue is not empty (that is if o k−1 > i k ), and o k = i k + s k otherwise. If there are infinity servers, there are no queues, and o k = i k + s k .
4. The resulting output process is the input process of the following queue.
Steps 2 to 4 are repeated up to obtain d dependent Poisson processes.
The distribution G may be any distribution in stats, see Distributions. The length of the argument lambda fixes T, the length of the observed period, although in homogeneous processes, we can fix instead the number of points to be generated (argument nEv). The vector of the output intensities λ out (t) is part of the output of the function. It is expected that low λ(t) values and mean serving times lead to short queues and, consequently, high dependence between processes.
Estimation. Since the marginal processes are Poisson, they can be fitted and modelled using the package NHPoisson. Additionally, if the connection between the input and output points is known, the service times are the difference between them, and their distribution can be also easily estimated.

Marked Poisson processes with dependent marks
DepNHPPMarked generates d dependent processes which are the marginal processes of a MPP with marks generated by a d-state Markov chain. A MPP is a Poisson process in which a variable, called a mark, is attached to each point. In this case, the marks are discrete variables taking values in {1, . . . , d}, which determine in which of the d marginal processes N j , a point occurs. Given the Markov chain structure, defined by a transition matrix P = (p ij ), only adjacent marks are dependent. The marginal processes are Poisson if and only if the marks are independent.
Generation algorithm. Applying the previous definition, the generation algorithm is simple.

Generation of the points in a Poisson process with a given intensity λ(t).
2. Generation of marks by a Markov chain. It implies an iterative generation of values in 1, . . . , d, given the previous mark, using a multinomial distribution with probabilities given by P.
3. Each marginal process N j includes the points in the Poisson process with marks j.
SpecGap calculates the spectral gap, a measure of the dependence generated by a Markov chain which assesses the convergence speed of the transition matrix to a matrix with the same stationary distribution and equal rows (that is with independent marks). Processes with lower spectral gap yield more dependent marginal processes. Independent Poisson processes can be generated using IndNHPP or a transition matrix with equal rows in DepNHPPMarked.

Estimation.
Given that the process of all the points in the marginal processes is Poisson, λ(t) can be estimated using NHPoisson. TranM estimates the transition matrix of the Makov chain using the MLE based on count data. Then, the estimators of the marginal intensities areλ j (t) =λ(t) ∑ d i=1p ij .

Inference based on computational statistical methods
There are many parameters of potential interest in a vector of point processes, where inference tools based on exact or asymptotic distributions are not available. Inference based on computational statistical methods such as Monte Carlo (MC) or parametric bootstrap are a useful alternative in those cases. IntMPP uses these methods to implement point estimation and calculation of confidence intervals and envelopes of a parameter, or vector of parameters, related to a vector of PPs. The only requirement for the parameters of interest is that it must be possible to estimate them from the observed processes. Some examples are the vector of the number of points in each process in a given time period, or the time of occurrence of the k-th point in the vector.
The idea to construct confidence intervals or envelopes using computational statistical methods is simple when the distribution of the vector of processes is completely known (Monte Carlo approach). In real problems, the parameters of the distribution of the vector of processes are rarely known, and parametric bootstrap methods, where the parameters are estimated from the sample, have to be used. The basic idea is to generate a sample of n s vectors of processes with the distribution. A value of the statistic of interest is calculated from each generated vector, so that a sample of size n s of values of the statistic is obtained. The lower and upper bounds of an interval with a (1 − α)% confidence level are the α/2 and 1 − α/2 quantiles of the generated sample, and the point estimator, the sample mean. Standard tests of hypothesis can be implemented, using those intervals in the usual way.
The two main arguments of IntMPP are fun.name, a function to define the estimator of the parameter, and funMPP.name, a model to generate the vectors of processes. The estimator in fun.name must be a function of the points in the vector of PPs (defined as a list which must be the first argument of the function) and any number of additional arguments, provided by argument fun.args. The models in funMPP.name can be DepNHCPSP, DepNHPNeyScot, DepNHqueue and DepNHPPMarked, or any other implemented by the user. The only requirement of those models is that the first element in the output has to be a list with d elements defining the vector of PPs. Additional arguments for the models are given in funMPP.args. Parallel computation is implemented in this function.

Analysis of the occurrence of extreme heat events in three locations using IndTestPP
This section illustrates how the package IndTestPP can be used to carry out all the steps in the analysis of the occurrence of the extreme heat events (EHEs) in three Spanish locations, Barcelona (B), Zaragoza (Z) and Huesca (H), using a vector of point processes.

Data
The series TxB, TxH and TxZ are the daily maximum temperature in Celsius degrees, during the warm season (May to September) from 1951 to 2016, at Barcelona, Huesca and Zaragoza respectively. The series were provided by the Spanish Meteorological Office (AEMET) and they are stored in the data frame TxBHZ in the data set TxBHZ, available in the package. The days which are not observed in the three series are considered as missing observations, so that three series with 8262 complete observations are available. The date (day, month and year) and day of the year (dayyear) of the observations, and some variables representing the general temperature evolution are also available in the data set. The three locations are sited in a triangle where Barcelona is in the East, around 250km away from the others, and Huesca is 67 km to the North of Zaragoza.
Using the peak over threshold (POT) approach, an EHE is defined as a run of consecutive days where the temperature is over an extreme threshold, and its occurrence point is the day of maximum temperature in the run. The threshold is the 95 th percentile of the series in a reference period, (months of June, July and August in 1981-2010) being 31.3, 36.4 and 37.8 o C in Barcelona, Huesca and Zaragoza respectively. The series are recorded at a discrete time scale, but given that the time unit is short compared with the length of the observed period and that the occurrence intensity of EHEs is quite low, the use of a continuous point processes to model the occurrence of EHEs in a series is justified. The EHEs may affect or not the three locations depending on the type of atmospheric situations which caused them. Then, our aim is to model the occurrence of the EHEs in the three series using a vector of point processes which take into account the dependence between them and to identify the factors causing that dependence.

Processing data and preliminary analysis
To identify the occurrence times of the EHEs in a series using the POT approach, the function POTevents.fun in NHPoisson is used. The case of Zaragoza is shown as an example, and their 104 occurrence times are stored in posZ. Analogously, the 106 and 121 occurrences in Barcelona and Huesca are stored in posB and posH. The temperature series are highly correlated, with Pearson coefficients ρ BH = 0.76, ρ BZ = 0.73, and ρ HZ = 0.94, but to measure their extremal dependence, a more specific measure, such as the extremal dependence coefficients are used. The functions for the analysis between TxZ and TxB are shown as an example, but the pairwise dependence between the three locations is analysed.  Figure 3: Plots from depchi ofχ x|y (u) andχ x|y (u) to estimate the extremal dependence coefficients.
The functions CountingCor and BinPer calculate another extremal dependence measures, the correlation coefficient between the number of EHEs in intervals of a given length ll and the percentage of concordant intervals, The dependence given the empirical intensity of one process (obained by function emplambda.fun in NHPoisson) can be graphically analysed using the Dutilleul plot. Figure 4 shows the plot for Zaragoza-Barcelona, resulting from the following commands, and the plots for Barcelona-Huesca and Zaragoza-Huesca. All the previous results and the plots show that there exist a pairwise dependence between the three locations, and that it is stronger between Zaragoza and Huesca.

Testing independence and analysing dependence factors
Our next aim is to identify the factors which cause the dependence. To that end, the independence tests given the marginal intensities are applied. The first step is to model individually each process. This has a twofold objective, first to identify the factors that have an influence on the occurrence of EHEs in each series, and that may cause the dependence, and second to estimate the marginal intensities of the processes. The second step is to check if the occurrence processes are independent given the fitted intensities. If the tests do not reject the null hypothesis, it can be concluded that the dependence between the EHE processes is explained by the considered covariates, since once its effect is removed, the processes are independent. The rejection of independence gives evidences that there are other non identified factors causing dependence, which have not been included as predictors in the intensities. In those cases, a multivariate model allowing dependence should be used.
Step 2. The independence tests are used to study the pairwise independence given the fitted intensities. Since it can be assumed that the marginal processes are Poisson, the three families of tests, POISSON, CLOSE and CROSS can be applied. In the CLOSE family only the PaB test is applied since the processes are nonhomogeneous, while in the others the most powerful test according to  is selected, that is the Normal test and the K test. Only the functions for the analysis between TxZ and TxB are shown, but all the pairwise comparisons are summarized in Table 1. POISSON family. The Normal test is applied using an interval length r = 15 that guarantees the Normal approximation of the statistic.  24 17 28 20 27 20 17 13 17 27 24 23 23 18 27 5 28 10 7 22 27 27 28  PBZB<-TestIndNH(posZ, posB, nsim = 5000, type = "Poisson",lambdaMarg =cbind(lambdaB),fixed.seed=35) PBZB$pv ## p-value ## 0.2107578 CROSS family. The K test is implemented using an r-grid with values from 1 to 15, value selected with the help of the plot of the estimated K(r). Both the p-value and Figure 5 suggest the independence between Z-B given the intensities, since all the valuesK(r) lie inside the confidence band. On the other hand, the plot for Z-H, also shown in Figure 5, rejects independence for short-term dependence.
auxZB<-NHK(lambdaZ, lambdaB, posC=posZ, posD=posB, r=c(1:15), typePlot= Kfun ,cores=2,fixed.seed=36) auxZB$pv ## p-value ## 0.1558442   the pair B-H, the K test rejects the null while all the other tests do not. It is found that the high value of the K statistic is only due to the occurrence of a point in Huesca in t = 901 and in Barcelona in t = 902, when the intensity in both locations is low. In order to analyse the influence of this event, the point in Barcelona is removed, and the resulting p-value, 0.29, does not reject the independence any more. This suggests that the K test is more sensitive than the others to the existence of an influential point. Given that all the tests are built by conditioning on the occurrences of the first process, the tests are also applied, changing the order of the locations, and the same conclusions are obtained.
These results are graphically confirmed by the Dutilleul plots given the fitted intensities, where only the plot between Zaragoza and Huesca gives evidence of dependence.  The PaB test can also be used to test independence between the three processes simultaneously, PBBHZ<-TestIndNH(posB, posH, posZ, nsim = 1000, type = "Poisson",lambdaMarg=cbind(lambdaH, lambdaZ), fixed.seed=65, cores=2) PBBHZ$pv ## p-value ## 0.002997003 Then we conclude that, given the fitted intensities, the occurrence of the EHEs in Zaragoza-Barcelona and in Barcelona-Huesca are independent, while there exists dependence not explained by the covariates in Zaragoza-Huesca, which are the closest locations. Given these results, the best model for Barcelona is the previously fitted model, while the occurrence processes of Huesca and Zaragoza should be modelled by a vector of PPs taking into account the dependence between them. A model that allows us to include that dependence is a CPSP. The occurrences of the three indicator processes, the process of the events only in Huesca, only in Zaragoza and the simultaneous events, are obtained by the function CPSPPOTevents. Then, the CPSP can be estimated by fitting a Poisson process to each of the three indicator processes using fitPP.fun, see  for some examples.

Inference based on computational statistical methods
This section shows two examples of inference based on computational statististical tools, using the function IntMPP. The first example uses the CPSP which models the occurrence of EHEs in Huesca and Zaragoza taking into account the dependence between them. It is fitted using NHPoisson and the estimated intensities of the three indicator processes are the three last elements of the data.frame TxBHZ, lambdaOZ, lambdaOH and lambdaZH.
In the first example, we calculate the point estimation and a confidence interval of the time of the first EHE in Zaragoza or Huesca. We need the function firstt, whose output is the minimum occurrence time in a vector of processes.
firstt<-function(posNH){minpos<-min(unlist(posNH))} lambdaiZH<-cbind(lambdaOZ,lambdaOH,lambdaZH) aux<-IntMPP(funMPP.name="DepNHCPSP", funMPP.args=list(lambdaiM=lambdaiZH, d=2, dplot=FALSE), fun.name="firstt", fun.args = NULL, clevel = 0.95, cores = 2, fixed.seed = 125) ## Lower bound of CI: 50.4648 ## Point estimator: 116.7493 ## Upper bound of CI: 233.4015 This type of inference also allows us to obtain confidence bands for two or more values. For example, the number of EHEs in Huesca and in Zaragoza in a given time interval I. To that end, we use the function NumI, included in the package, whose output is a vector containing the number of points in an interval I in each marginal process of a vector of processes. To see the evolution of the number of extremes, we consider two intervals, the three first and the three last years of the period. A clear increase of the number of EHEs is observed in the two locations.

Simulating and characterising vectors of processes
In this section some of the tools to generate vectors of processes in IndTestPP are used to characterize the effect of the dependence in the distribution of the nearest distances between two point processes. To that end, two dependent processes with a given dependence structure and two independent processes with the same marginal distribution that the previous ones are generated. The distribution of the samples of nearest distances are compared using histograms and qqplots.
We generate two dependent Neyman-Scott processes using DepNHNeyScot, with mean cluster size equal to 3 y 4 respectively, and N(0, 3) and N(0, 2) distributions for the distances to the centre. The independent processes with the same marginal distribution are generated using IndNHNeyScot. The distribution of the nearest distances is very different in the two cases, as the qqplot shows. In the dependent processes, it is concentrated in low values, while in the independent ones the density decreases more smoothly.

Dependent processes
Nearest dist

Conclusions
Many modelling problems related to the occurrence of events require to analyse the dependence between two or more point processes in time. However, not many tools to carry out this type of analyses are available. IndTestPP provides a useful general framework for applications based on the modelling of a vector of point processes in time, since it includes functions for processing data, estimating the marginal intensities of the processes, testing independence, identifying factors causing dependence, and making inference. In particular, the three families of independence tests by  are implemented. They are useful in different types of modelling problems since they cover a wide variety of processes, homogeneous and nonhomogeneous, Poisson processes, processes with a parametric marginal model, point processes with known marginal intensities, etc. The package also provides functions to generate four different types of vectors of point processes, Common Poisson Shock processes, multivariate Neyman-Scott cluster processes, Poisson processes from queues in a tandem, and vectors of processes resulting from a marked Poisson process with discrete marks from a Markov chain. These generation functions are used to carry out inference based on computational statistical methods. The applicability of the package in real modelling problems is shown by analysing the dependence between the occurrence of extreme temperature events in three Spanish locations, Zaragoza, Barcelona and Huesca.