The need to analyze the dependence between two or more point processes in time appears in many modeling problems related to the occurrence of events, such as the occurrence of climate events at different spatial locations or synchrony detection in spike train analysis. The package IndTestPP provides a general framework for all the steps in this type of analysis, and one of its main features is the implementation of three families of tests to study independence given the intensities of the processes, which are not only useful to assess independence but also to identify factors causing dependence. The package also includes functions for generating different types of dependent point processes, and implements computational statistical inference tools using them. An application to characterize the dependence between the occurrence of extreme heat events in three Spanish locations using the package is shown.
A point process in time (PP in short) is a random collection of points
in a space in
In those situations, statistical tests are required to assess the independence between two or more PPs. If we can assume that the processes are independent, their modeling is much simpler, since it can be carried out separately for each process without any loss of information. The tests are also useful to identify the type of dependence and to select the type of vector of point processes used to model them. The need to test independence between PPs appears in climate and environmental sciences (Abaurrea et al. 2015; Cronie and Lieshout 2016), in neuroscience (Tuleau-Malot et al. 2014; Albert et al. 2015), in biology (Myllymäki et al. 2017), and in many other fields.
Two types of independence between PPs may be of interest: general independence (Rubin-Delanchy and Heard 2014a) and independence given the intensities of the processes. The choice of the type of independence as the null hypothesis depends on the aim of the study, but the second type is more useful in modeling problems based on PPs. Indeed, the most frequent approach to model systematic dependence structures caused by common factors is the use of nonhomogeneous processes with intensities that are functions of the same or dependent covariates. To analyze whether the dependence is well represented by those covariates, the null hypothesis of independence given the intensities has to be checked. When the existing dependence cannot be explained by the available covariates, models taking into account that dependence should be considered to model the vectors of PPs.
The R package IndTestPP (Cebrián 2020) provides a general framework for all the steps to analyze the dependence in a vector of point processes in time: from data processing and tests of independence to inference tools for parameters of interest. That makes it a useful tool for applications based on the modeling of a vector of point processes. As far as we know, there is no other software for this type of analysis. One of the main features of the package is the implementation of the three families of independence tests by Cebrián et al. (2020), which cover a wide variety of homogeneous and nonhomogeneous processes appearing in real problems: Poisson processes, processes with a parametric marginal model, point processes with known marginal intensities, etc. The package also provides functions to generate four different models of dependent PPs, and two types of independent PPs, which are useful to develop inference tools based on computational statistical methods.
The outline of the paper is as follows. The first two sections, Vector of point processes in time and Point processes in R, introduce some properties of vectors of point processes and some R packages related to this topic. The three following sections describe the implementation in R of the tests of independence, the measures of dependence, and the tools for generating PPs. The final section shows an illustrative example of an analysis to characterize the dependence between the occurrence of extreme heat events in three Spanish locations using IndTestPP.
A point process in time,
If the intensity is constant, the process is homogeneous and
nonhomogeneous otherwise. The best-known PP is the Poisson process,
where
Herein, we will consider vectors of point processes
Most of the results in this work are developed for vectors of
Many types of dependence structures can appear between the marginal processes of a vector. The most direct way of modeling it is to use models to represent the dependence between the occurrence times of the processes, such as the common Poisson shock processes, the queue processes, the Poisson processes with dependent marks, or the multivariate Neyman-Scott processes, described later.
There exist many packages in R devoted to the analysis of spatial point processes: the extensive spatstat (Baddeley et al. 2015), whose main functionalities include exploratory data analysis, model-fitting, and simulation, stpp (Gabriel et al. 2020), splancs (Rowlingson and Diggle 2017), and many others. IDSpatialStats (Giles et al. 2019) provides spatial dependence measures, and future directions include the extension to the spatio-temporal case. However, the number of packages dealing with the analysis of point processes in time is not so high, and most of them deal with univariate analysis of the processes. NHPoisson (Cebrián et al. 2015) provides a global framework for the modeling and diagnosis of Poisson processes in time, PtProcess (Harte 2010) fits and analyses time-dependent marked point processes with an emphasis on earthquake modeling, and mmpp (Hino et al. 2017) offers various similarity and distance metrics for marked point processes.
The aim of IndTestPP is the analysis of vectors of point processes in time, in particular of their dependence, and it provides a general framework for all the steps involved in this type of analysis: data processing, estimation of the marginal intensities of the processes, analysis of independence given the intensity, identification of factors causing dependence, and inference tools based on computational statistics. mppa (Rubin-Delanchy and Heard 2014b) provides a test for dependence between point processes on the real line, but with a different aim since it tests general independence. The three families of tests implemented in IndTestPP are more general since they are not restricted to Poisson processes, and they test independence given the marginal intensities. This type of conditional independence is more useful in statistical modeling of vectors of point processes since it helps to identify the factors that cause the dependence. An example of how all the steps of the modeling of a vector of point processes can be carried out using IndTestPP is shown in the application section.
Most analyses of independence between point processes in the literature involve spatial processes, and few works deal with the study of independence between processes in time. IndTestPP includes the three families of tests to assess independence between PPs in time by Cebrián et al. (2020), i.e., the POISSON, the CLOSE, and the CROSS families, and a graphical tool, the Dutilleul plot. In all of them, the null hypothesis is the independence between the point processes, given their marginal intensities, and the alternative is the existence of any type of random dependence between them. All the tests in these families are constructed by keeping fixed the first observed process, a common approach to test independence given the marginal structure. However, each test is based on different assumptions, and all together they cover a wide range of types of processes appearing in real problems.
The family of tests POISSON is implemented in the function CondTest, and it includes two tests to assess the independence between two homogeneous or nonhomogeneous processes, based on the conditional
distribution of lambday
.
This family is based on the following property. If
Two options are available to perform a test. The test implemented with the argument type='Poisson' is based on the fact that under the
independence between type='Normal'
. Again,
under the null and with non-overlapping intervals, the variables
type='All'
, both tests are calculated.
The intervals where the number of points is counted, r
. If changer=TRUE
, when two intervals overlap, their
lengths are shortened by half of the intersection period; in this way
the resulting intervals are disjoint and, consequently, the
corresponding variables
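The shortening rule applied when changer=TRUE can be illustrated with a minimal base-R sketch. It assumes that the intervals are centered on the sorted points of the first process and that r is their half-length; the helper shorten_intervals is hypothetical and is not part of IndTestPP, whose implementation may handle details such as the limits of the observed period differently.

```r
# Hypothetical helper (not part of IndTestPP): build intervals of half-length r
# around sorted centres and shorten overlapping neighbours by half the overlap.
shorten_intervals <- function(centres, r) {
  centres <- sort(centres)
  left  <- centres - r
  right <- centres + r
  n <- length(centres)
  if (n > 1) {
    # Overlap between interval i and interval i + 1 (zero when disjoint)
    overlap <- pmax(right[-n] - left[-1], 0)
    # Each of the two intervals gives up half of the intersection period
    right[-n] <- right[-n] - overlap / 2
    left[-1]  <- left[-1]  + overlap / 2
  }
  data.frame(left = left, right = right, length = right - left)
}

ints <- shorten_intervals(c(10, 20, 60, 65, 72), r = 15)
print(ints)
```

After the shortening, two neighbouring intervals that overlapped share only the midpoint between their centres, so the counts in different intervals refer to disjoint periods.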
The power study by Cebrián et al. (2020) shows that the Normal test performs better
provided that conditions to guarantee the normal approximation are
fulfilled. These conditions are quite weak, even with a complex
intensity, mean values of
The CLOSE family includes two tests, the parametric bootstrap (PaB) and the Lotwick-Silverman (LoS) tests, implemented in the functions TestIndNH and TestIndLS, respectively. The LoS test can only be applied to homogeneous processes, while the PaB test can also be applied to nonhomogeneous ones. On the other hand, the LoS test does not require any assumption to be applied, while the PaB test requires that
These tests are based on the close point distance, and they aim to
compare the behavior of the sets of close points in a vector of observed
processes and in vectors with the same marginal distributions but
independent components (Abaurrea et al. 2015). A point uniogentri
and
DistObs
, respectively.
Given the complexity of the test statistic, its distribution has to be
obtained by computational statistical methods. These methods require
approaches to generate a sample of
Parametric bootstrap test. In this test, the type="Poisson"
) and Neyman-Scott cluster processes
(type="PoissonCluster"
). The generation of Poisson processes in a given period uses the function simNHPc, based on a two-step algorithm which generates homogeneous occurrence times and transforms them into the
points of a NH process with intensity IndNHNeyScot
. Details about these processes are explained later.
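One standard way of implementing a two-step generation of this kind is the inversion of the cumulative intensity. The following base-R sketch illustrates that idea under the assumption that the intensity is given as a vector with one value per unit time step; sim_nhpp is a hypothetical helper, not the source of simNHPc, and the actual implementation in the package may differ.

```r
# Hypothetical helper (not the simNHPc source): time-transformation sketch.
# lambda is a vector of intensity values, one per unit time step 1..T.
sim_nhpp <- function(lambda) {
  Lambda <- cumsum(lambda)               # cumulative intensity at t = 1..T
  total  <- Lambda[length(Lambda)]
  # Step 1: homogeneous unit-rate process on (0, total)
  n <- rpois(1, total)
  u <- sort(runif(n, 0, total))
  # Step 2: map each homogeneous time to the first t with Lambda(t) >= u
  # (with unit time steps, two points may fall in the same time unit)
  findInterval(u, Lambda, left.open = TRUE) + 1
}

set.seed(5)
lambda <- 0.05 + 0.04 * sin(2 * pi * (1:1000) / 365)   # seasonal intensity
pts <- sim_nhpp(lambda)
```

Periods with a higher intensity accumulate more of the total cumulative intensity and therefore receive more of the transformed points.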
Lotwick-Silverman test (LoS). TestIndLS generates processes using a Monte Carlo method conditional on the observed marginal structure (Lotwick and Silverman 1982). The steps are the following:
The observed processes
Fixing
The mean distances of the close point sets in the generated vectors of processes in the PaB and the LoS tests are calculated by the functions DistSim and DistShift, respectively. The calculation of the p-value requires the generation of processes in two steps of the algorithm, to calculate the expectation of the mean distances
According to the power study by Cebrián et al. (2020), both LoS and PaB tests have high power, but LoS performs slightly better in the homogeneous processes with small samples and low dependence.
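The idea behind a Lotwick-Silverman type test, wrapping the observed period onto a circle and shifting one process by a random amount while the other is kept fixed, can be sketched in base R as follows. The helpers torus_shift, mean_nearest, and shift_pvalue are hypothetical, and the statistic used here (the mean distance to the nearest point of the other process) is a simplifying assumption, not the close-point statistic computed by TestIndLS.

```r
# Hypothetical helpers for illustration; they are not IndTestPP functions.
# Toroidal shift: the period (0, T_len] is wrapped onto a circle and all the
# points are translated by the same random amount.
torus_shift <- function(pos, T_len, shift = runif(1, 0, T_len)) {
  sort((pos + shift) %% T_len)
}

# Simple summary statistic (an assumption, not the close-point statistic):
# mean distance from each point of x to its nearest point of y.
mean_nearest <- function(x, y) mean(sapply(x, function(t) min(abs(t - y))))

# Monte Carlo p-value keeping x fixed and shifting y
shift_pvalue <- function(x, y, T_len, nsim = 199) {
  obs <- mean_nearest(x, y)
  sim <- replicate(nsim, mean_nearest(x, torus_shift(y, T_len)))
  (1 + sum(sim <= obs)) / (nsim + 1)   # small distances suggest dependence
}

set.seed(1)
x <- sort(runif(50, 0, 1000))
y <- sort((x + rnorm(50, sd = 1)) %% 1000)   # y clusters around x
p_dep <- shift_pvalue(x, y, T_len = 1000)
```

The shift preserves the distances between the points of the shifted process, so its marginal structure is kept while any alignment with the fixed process is broken.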
The CROSS family includes two tests based on the cross K and the cross J spatial functions adapted to the case of PPs in time, implemented in the functions NHK and NHJ, respectively. These functions also provide estimators of the cross functions. They do not require any assumption about the distribution of the marginal processes, only knowledge of their intensities, and the p-values are calculated using a LoS approach. The tests can be applied to two homogeneous or nonhomogeneous processes, and more generally to two sets of processes,
posC
, a vector containing all
the occurrence times in the processes in typeC
, a
vector containing the code posC
occurs; posD
and
typeD
. For the sake of simplicity, the results are expressed for the
case,
NHK
calculates two different
estimators of typeEst = 2
) performs better in terms
of size and power (Cebrián et al. 2020).
It compares the functions
The estimators of NHD
and
NHF
, are used in NHJ
to estimate L
or an automatic selection is calculated otherwise. In the
homogeneous PPs, the previous estimators are equal to the empirical
distribution functions, and the calculation algorithms are changed to
reduce the computational cost. The test statistic, which summarizes the
deviations of the function from 1, is
In both functions, NHK
and NHJ
, the grid of r
. If it is NULL, an
automatic selection based on length rTest=r0
. To
identify an adequate value of
The calculation of the p-value in CROSS tests is based on a LoS
approach for nonhomogeneous processes. First, the observed processes
The function DutilleulPlot carries out Diggle’s randomization testing procedure extended by Dutilleul (2011), which graphically assesses the independence between two homogeneous or nonhomogeneous Poisson processes, given their marginal structure. The idea is to plot the cumulative relative frequency of the nearest neighbor distances between the points in the two observed processes and to analyze the independence using a confidence band calculated from simulated independent Poisson processes with the observed marginal intensities.
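The construction of such a band can be sketched in base R for the simplest case of two homogeneous Poisson processes. The helper nearest_dist, the rates, and the grid of distances are assumptions chosen for illustration; the code does not reproduce DutilleulPlot, which also covers nonhomogeneous intensities.

```r
# Hypothetical helper: distance from each point of x to its nearest point of y
nearest_dist <- function(x, y) sapply(x, function(t) min(abs(t - y)))

set.seed(60)
T_len <- 1000
x <- sort(runif(rpois(1, 0.05 * T_len), 0, T_len))
y <- sort(runif(rpois(1, 0.05 * T_len), 0, T_len))   # independent of x

# Cumulative relative frequency of the observed nearest neighbor distances
grid <- seq(0, 60, by = 1)
obs_cdf <- ecdf(nearest_dist(x, y))(grid)

# Band: keep x fixed and simulate the second process as a homogeneous
# Poisson process with the same intensity
sim_cdf <- replicate(199, {
  ysim <- runif(rpois(1, 0.05 * T_len), 0, T_len)
  ecdf(nearest_dist(x, ysim))(grid)
})
band <- apply(sim_cdf, 1, quantile, probs = c(0.025, 0.975))

# Proportion of grid values where the observed curve lies inside the band
inside <- mean(obs_cdf >= band[1, ] & obs_cdf <= band[2, ])
```

An observed curve that leaves the band for small distances would suggest that the points of the two processes are closer than expected under independence.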
Unfortunately, there is no general definition to quantify the dependence between two PPs. However, we suggest some measures implemented in IndTestPP which can be useful to describe the level of dependence between many types of processes.
Correlation between the counting variables of two PPs. CountingCor
calculates a sample estimator of
In nonhomogeneous processes, variables CountingCor
can calculate a standardized version of the
measure, so that all the variables have the same mean and variance, and
if
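As an illustration of the basic (non-standardized) measure, the following base-R sketch counts the points of each process in disjoint intervals of a given length and computes the correlation between the two series of counts. The helper counting_cor and its arguments are hypothetical and only mimic the role of ll and T; the standardized version for nonhomogeneous processes is not covered.

```r
# Hypothetical helper (not the CountingCor source): correlation between the
# numbers of points of two processes in disjoint intervals of length ll.
counting_cor <- function(posx, posy, ll, T_len, method = "pearson") {
  breaks <- seq(0, T_len, by = ll)     # a final partial interval is dropped
  nx <- table(cut(posx, breaks))       # counts of the first process
  ny <- table(cut(posy, breaks))       # counts of the second process
  cor(as.numeric(nx), as.numeric(ny), method = method)
}

set.seed(2)
x <- sort(runif(200, 0, 1000))
# Half of the points of y are close to points of x, the rest are independent
y <- sort(c(x[1:100] + rnorm(100), runif(100, 0, 1000)))
rho <- counting_cor(x, y, ll = 10, T_len = 1000, method = "kendall")
```

The choice of the interval length matters: very short intervals give mostly zero counts, while very long ones leave few intervals to estimate the correlation.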
Percentage of concordant intervals. A simpler descriptive measure is
the percentage of concordant intervals, that is, the percentage of
intervals with occurrences in both processes. It is calculated by
BinPer
as
Extremal dependence coefficients. In the case of PPs resulting from
a Peak over threshold (POT) approach, another interesting measure is the
extremal dependence between the variables
The function depchi
estimates the functions
The generation of vectors of PPs with a given dependence structure is necessary to implement Monte Carlo, parametric bootstrap, or other inference methods based on simulation, such as those described in section Inference based on computational statistical methods. There are different approaches to model the dependence between the marginal processes in a vector of PPs, but the most direct way is to model the dependence between the occurrence times of the processes. IndTestPP includes functions for the generation of four types of vectors of homogeneous or nonhomogeneous PPs, which will be described later in this section: common Poisson shock processes, multivariate Neyman-Scott processes, queue processes, and marked Poisson processes. These types of vectors allow modeling three dependence structures frequently observed in real problems.
The function DepNHCPSP
generates
Generation algorithm. The CPSPs show a property which
straightforwardly leads to a generation algorithm: they can be
decomposed into
Generation of simNHPc
.
Each
The intensity of the processes to be generated with DepNHCPSP is specified in argument lambdaiM, a matrix whose columns are the intensity vectors of the indicator processes. Independent Poisson processes in the same period of time cannot be generated using DepNHCPSP but with IndNHPP.
Estimation. It is simple since it reduces to the identification of
the indicator processes and the estimation of CPSPpoints
identifies the three indicator processes, using as input the points in the two marginal processes. The related function CPSPPOTevents calculates the occurrence times, length, maximum, and mean intensity of the extreme events of the indicator processes of the CPSP resulting from a POT approach. The marginal and indicator processes of a CPSP are plotted by the functions PlotMCPSP and PlotICPSP, respectively. Poisson processes can be fitted to the indicator processes using the package NHPoisson.
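The decomposition of a bivariate CPSP into three indicator processes, and the identification of those processes from the marginal points, can be sketched in base R. The sketch assumes homogeneous indicator processes with arbitrary rates, and the helper sim_hpp is hypothetical; it does not call DepNHCPSP or CPSPpoints.

```r
set.seed(10)
T_len <- 1000
# Hypothetical helper: homogeneous Poisson process on (0, T_len)
sim_hpp <- function(rate, T_len) sort(runif(rpois(1, rate * T_len), 0, T_len))

# Independent indicator processes of a bivariate CPSP: events only in
# process 1, only in process 2, and simultaneous in both
only1 <- sim_hpp(0.02, T_len)
only2 <- sim_hpp(0.03, T_len)
both  <- sim_hpp(0.01, T_len)

# Marginal processes: union of the relevant indicator processes
pos1 <- sort(c(only1, both))
pos2 <- sort(c(only2, both))

# Identification of the indicator processes from the two marginal processes
rec_both  <- intersect(pos1, pos2)
rec_only1 <- setdiff(pos1, pos2)
rec_only2 <- setdiff(pos2, pos1)
```

The dependence between the two marginal processes comes only from the simultaneous events, so a higher rate of the common indicator process relative to the other two yields stronger dependence.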
The function DepNHNeyScot
generates
Generation algorithm. The previous definition leads to the following generation algorithm.
A Poisson process with a given intensity is generated to obtain the
cluster centers
Given the number of generated cluster centers
Given the series
DepNHNeyScot
implements two common distributions to model the
distances from the points to the cluster center, DepNHNeyScot
, but with IndNHNeyScot
.
This is the only model whose estimation is not easy, since the cluster centers are usually unobserved, and they are required to estimate both the underlying Poisson process and the distances of the points in each cluster to its cluster center.
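A minimal base-R sketch of a generation scheme of this type is shown below. It assumes homogeneous cluster centers shared by all the marginal processes, a Poisson number of points per cluster in each process, and normally distributed distances to the center; the parameter values mirror the roles of lambdaNumP and sigmaC but the code is only an illustration and does not reproduce DepNHNeyScot.

```r
set.seed(20)
T_len <- 2000
d <- 2                       # number of marginal processes
mean_size <- c(3, 4)         # mean cluster size in each process
sigma <- c(3, 2)             # sd of the normal distances to the centre

# Step 1: cluster centres from a homogeneous Poisson process
centres <- sort(runif(rpois(1, 0.05 * T_len), 0, T_len))

# Steps 2-3: for each process, a Poisson number of points per centre,
# located at normally distributed distances from that centre
pos <- lapply(seq_len(d), function(j) {
  n_off <- rpois(length(centres), mean_size[j])
  pts <- rep(centres, times = n_off) + rnorm(sum(n_off), sd = sigma[j])
  sort(pts[pts > 0 & pts <= T_len])   # keep points inside the period
})
names(pos) <- paste0("N", seq_len(d))
```

Since the marginal processes share the cluster centers, their points tend to occur close together in time, which is the source of the dependence in this model.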
DepNHPPqueue
generates DepNHPPqueue
generates all the
intermediate processes in a tandem where the first queue can be
Generation algorithm. The generation of homogeneous PPs is based on
Burke’s theorem stating that if the input of a queue is a homogeneous
Poisson process, the output is a dependent Poisson process with the same
intensity
Generation of the input process using a Poisson process with
intensity
Generation of independent time services
Generation of the output process using the generated input points
and time services. If there is only one server, the output times
The resulting output process is the input process of the following queue.
Steps 2 to 4 are repeated up to obtain
The distribution Distributions
. The length of the argument lambda
fixes nEv
).
The vector of the output intensities
Estimation. Since the marginal processes are Poisson, they can be fitted and modeled using the package NHPoisson. Additionally, if the connection between the input and output points is known, the service times are the difference between them, and their distribution can be also easily estimated.
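The relation between input points, service times, and output points can be sketched in base R for a single-server queue with a first-come, first-served discipline. The rates and the exponential service distribution are assumptions chosen for illustration, and the code is not the source of DepNHPPqueue.

```r
set.seed(30)
rate <- 0.05; mu <- 0.2; T_len <- 5000   # arrival and service rates (mu > rate)

# Step 1: input process (homogeneous Poisson)
arrivals <- sort(runif(rpois(1, rate * T_len), 0, T_len))
# Step 2: independent exponential service times
service <- rexp(length(arrivals), mu)
# Step 3: output times of a single-server queue; the service of a customer
# starts when it arrives or when the previous customer leaves
departures <- numeric(length(arrivals))
prev <- 0
for (i in seq_along(arrivals)) {
  departures[i] <- max(arrivals[i], prev) + service[i]
  prev <- departures[i]
}

# With a known input-output connection, the time spent in the system by
# each customer is the difference between output and input times
sojourn <- departures - arrivals
```

The output points can then be used as the input of a following queue, which is how a tandem of dependent processes is built.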
DepNHPPMarked
generates
Generation algorithm. Applying the previous definition, the generation algorithm is simple.
Generation of the points in a Poisson process with a given intensity
Generation of marks by a Markov chain. It implies an iterative
generation of values in
Each marginal process
SpecGap calculates the spectral gap, a measure of the dependence generated by a Markov chain, which assesses the convergence speed of the transition matrix to a matrix with the same stationary distribution and equal rows (that is, with independent marks). Processes with a lower spectral gap yield more dependent marginal processes. Independent Poisson processes can be generated using IndNHPP or a transition matrix with equal rows in DepNHPPMarked.
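The generation of a vector of processes from a Poisson process with marks driven by a Markov chain, and a count-based estimate of the transition matrix, can be sketched in base R. The transition matrix, the rate, and the initial state are arbitrary assumptions, and the code does not reproduce DepNHPPMarked or TranM.

```r
set.seed(40)
T_len <- 1000
P <- matrix(c(0.8, 0.2,
              0.3, 0.7), nrow = 2, byrow = TRUE)   # transition matrix of marks

# Step 1: points of a Poisson process (homogeneous here for simplicity)
pts <- sort(runif(rpois(1, 0.1 * T_len), 0, T_len))
# Step 2: marks generated iteratively by the Markov chain
# (the chain is started, arbitrarily, from state 1)
marks <- integer(length(pts))
state <- 1L
for (i in seq_along(pts)) {
  state <- sample.int(nrow(P), 1, prob = P[state, ])
  marks[i] <- state
}
# Step 3: marginal process j = points with mark j
pos <- split(pts, factor(marks, levels = seq_len(nrow(P))))

# Count-based estimate of the transition matrix from consecutive marks
counts <- table(factor(head(marks, -1), levels = 1:2),
                factor(tail(marks, -1), levels = 1:2))
P_hat <- prop.table(counts, margin = 1)
```

A transition matrix with equal rows would make each mark independent of the previous one, which corresponds to the independent case mentioned above.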
Estimation. Given that the process of all the points in the marginal
processes is Poisson, TranM
estimates the transition matrix of the Markov chain using the
MLE based on count data. Then, the estimators of the marginal
intensities are
There are many parameters of potential interest in a vector of point processes, where inference tools based on exact or asymptotic distributions are not available. Inference based on computational statistical methods such as Monte Carlo (MC) or parametric bootstrap is a useful alternative in those cases. IntMPP uses these methods to implement point estimation and calculation of confidence intervals and envelopes of a parameter, or vector of parameters, related to a vector of PPs. The only requirement for the parameters of interest is that it must be possible to estimate them from the observed processes. Some examples are the vector of the number of points in each process in a given time period or the time of occurrence of the k-th point in the vector.
The idea to construct confidence intervals or envelopes using
computational statistical methods is simple when the distribution of the
vector of processes is completely known (Monte Carlo approach). In real
problems, the parameters of the distribution of the vector of processes
are rarely known, and parametric bootstrap methods, where the parameters
are estimated from the sample, have to be used. The basic idea is to
generate a sample of
The two main arguments of IntMPP are fun.name, a function to define the estimator of the parameter, and funMPP.name, a model to generate the vectors of processes. The estimator in fun.name must be a function of the points in the vector of PPs (defined as a list which must be the first argument of the function) and any number of additional arguments provided by argument fun.args. The models in funMPP.name can be DepNHCPSP, DepNHNeyScot, DepNHPPqueue, and DepNHPPMarked, or any other implemented by the user. The only requirement of those models is
that the first element in the output has to be a list with funMPP.args
. Parallel computation is implemented in this
function.
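The general scheme of this type of inference can be sketched in base R: a model generates vectors of processes, an estimator is evaluated on each generated vector, and the point estimate and the interval are obtained from the resulting sample. The function mc_interval, the use of the sample mean as point estimate, and the example model are assumptions for illustration; the sketch does not reproduce the internals of IntMPP.

```r
# Hypothetical sketch of simulation-based inference (not the IntMPP source).
# gen_fun generates a vector of processes as a list; est_fun maps that list
# to a scalar parameter estimate.
mc_interval <- function(gen_fun, est_fun, nsim = 1000, clevel = 0.95) {
  est <- replicate(nsim, est_fun(gen_fun()))
  alpha <- 1 - clevel
  c(lower = unname(quantile(est, alpha / 2)),
    point = mean(est),
    upper = unname(quantile(est, 1 - alpha / 2)))
}

set.seed(50)
# Example model: two independent homogeneous Poisson processes on (0, 1000)
gen_two_hpp <- function() {
  lapply(c(0.01, 0.02),
         function(rate) sort(runif(rpois(1, rate * 1000), 0, 1000)))
}
# Example estimator: time of the first point in the vector of processes
first_time <- function(pos) min(unlist(pos))
res <- mc_interval(gen_two_hpp, first_time, nsim = 500)
```

In a parametric bootstrap setting, the generator would use parameters estimated from the observed processes instead of known values.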
This section illustrates how the package IndTestPP can be used to carry out all the steps in the analysis of the occurrence of the extreme heat events (EHEs) in three Spanish locations, Barcelona (B), Zaragoza (Z), and Huesca (H) using a vector of point processes.
The series TxBHZ
, available in
the package. The days which are not observed in the three series are
considered as missing observations so that three series with 8262
complete observations are available. The date (
Using the peak over threshold (POT) approach, an EHE is defined as a run
of consecutive days where the temperature is over an extreme threshold,
and its occurrence point is the day of maximum temperature in the run.
The threshold is the 95
To identify the occurrence times of the EHEs in a series using the POT approach, the function POTevents.fun in NHPoisson is used. The case of Zaragoza is shown as an example, and its 104 occurrence times are
stored in
R> library(NHPoisson)
R> library(IndTestPP)
R> data(TxBHZ)
R> attach(TxBHZ)
R> auxZ<-POTevents.fun(TxZ, thres=37.8)
Number of events: 104
Number of excesses over threshold 37.8 : 176
R> posZ<-auxZ$Px
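The definition of an event used in the POT approach (a run of consecutive values over the threshold, located at the position of its maximum) can be illustrated with a short base-R sketch. The helper pot_events and the toy temperature series are hypothetical; the code is not the source of POTevents.fun, and its treatment of missing values (an NA ends a run) is an assumption.

```r
# Hypothetical helper (not the POTevents.fun source): occurrence times of
# extreme events defined as runs of consecutive values above a threshold.
pot_events <- function(x, thres) {
  above <- !is.na(x) & x > thres
  runs <- rle(above)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- runs$values                  # runs of values over the threshold
  # Occurrence point of each run: position of its maximum value
  mapply(function(s, e) s + which.max(x[s:e]) - 1L, starts[keep], ends[keep])
}

temp <- c(30, 38, 39, 36, 35, 40, 41, 40.5, 33, NA, 39)
ev <- pot_events(temp, thres = 37.8)
```

In this toy series, there are three runs over the threshold (positions 2 to 3, 6 to 8, and 11), and each one contributes a single occurrence point.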
Then, PlotMargP is used to plot the points in the three processes:
R> T<-length(TxZ)
R> PlotMargP(list(posB, posH, posZ), T=T, cex.axis=0.6,cex=0.6,
cex.main=0.7, cex.lab=0.7)
PlotMargP: Point processes of the occurrence times of the EHEs in Barcelona, Huesca, and Zaragoza.
The temperature series are highly correlated, with Pearson coefficients
R> aux<-depchi(TxB,TxZ,indgraph=FALSE,xlegend='topright',
thresval=c(9000:9975)/10000)
depchi
of The estimators
The functions CountingCor and BinPer calculate other extremal dependence measures, the correlation coefficient between the number of EHEs in intervals of a given length
R> aux<-CountingCor(posB,posZ, ll=10, T=T, method='kendall')
R> aux
tau
0.3554213
R> aux<-BinPer(posB,posZ, ll=10, T=T)
Percentage of concordant intervals: 0.272
with
The dependence given the empirical intensity of one process (obtained by function emplambda.fun in NHPoisson) can be graphically analyzed using the Dutilleul plot. Figure 4 shows the plot for Zaragoza-Barcelona, resulting from the following commands, and the plots for Barcelona-Huesca and Zaragoza-Huesca. All the previous results and the plots show that there exists a pairwise dependence between the three locations and that it is stronger between Zaragoza and Huesca.
R> lambdaEB<-emplambda.fun(posE=posB, t=c(1:T), lint=100, plot=F)$emplambda
R> aux<-DutilleulPlot(posZ, posB, lambdaEB, main="Zaragoza-Barcelona")
Our next aim is to identify the factors which cause the dependence. To that end, the independence tests given the marginal intensities are applied. The first step is to model each process individually. This has a twofold objective: first, to identify the factors that influence the occurrence of EHEs in each series, which may cause the dependence, and second, to estimate the marginal intensities of the processes. The second step is to check if the occurrence processes are independent given the fitted intensities. If the tests do not reject the null hypothesis, it can be concluded that the dependence between the EHE processes is explained by the considered covariates since, once their effect is removed, the processes are independent. The rejection of independence gives evidence that there are other non-identified factors causing dependence, which have not been included as predictors in the intensities. In those cases, a multivariate model allowing dependence should be used.
Step 1. To model the occurrence of the EHEs in each series, we consider
a nonhomogeneous Poisson process with an intensity that is a function of
a harmonic term (to model the seasonal behavior) and the available
covariate, which represents the local atmospheric situation
(Abaurrea et al. 2015). After the modeling process, based on a likelihood ratio
test, the harmonic term, the covariate, and the squared covariate are
selected in Zaragoza. The same terms are included in Huesca, and the
same plus interaction between the covariate and the harmonic in
Barcelona. These models are fitted using fitPP.fun in NHPoisson. The fit of Zaragoza is shown as an example, and the others are carried out
analogously to obtain
R> ss<-sin(2*pi*dayyear/366)
R> cc<-cos(2*pi*dayyear/366)
R> covZ<-cbind(ss,cc, Txm15Z, Txm15Z**2 )
R> dimnames(covZ)<-list(NULL, c("Sin", "Cos", "Txm15", "Txm152"))
R> ModZ<-fitPP.fun(covariates = covZ, posE = posZ, inddat = auxZ$inddat,
dplot=F, tit = "Sin+Cos+Txm15+Txm152",
start = list(b0 = 1, b1=-1,b2=1, b3=0, b4=0))
Number of observations not used in the estimation process: 72
Total number of time observations: 8262
Number of events: 104
Convergence code: 0
Convergence attained
Loglikelihood: -430.087
Estimated coefficients:
b0 b1 b2 b3 b4
-54.209 0.190 -2.496 2.434 -0.029
Full coefficients:
b0 b1 b2 b3 b4
-54.209 0.190 -2.496 2.434 -0.029
attr(,"TypeCoeff")
[1] "Fixed: No fixed parameters"
R> lambdaZ<-ModZ@lambdafit
The three fitted models are satisfactorily validated using globalval.fun in NHPoisson.
Step 2. The independence tests are used to study the pairwise
independence given the fitted intensities. Since it can be assumed that
the marginal processes are Poisson, the three families of tests POISSON,
CLOSE, and CROSS can be applied. In the CLOSE family, only the PaB test
is applied since the processes are nonhomogeneous; in the others, the
most powerful test, according to Cebrián et al. (2020), is selected, that is the
Normal test and the
POISSON family. The Normal test is applied using an interval length
R> aux<-CondTest(posZ, posB, lambday=lambdaB, r=15)
WARNING: there are overlapping intervals. The independence hypothesis
is not guaranteed.
The intervals have been shortened to obtain disjoint intervals.
The length of the intersection priods are:
[1] 23 21 28 19 11 12 27 26 20 22 15 27 22 18 18 22 28 16 26 25 17 26 12 26 6
[26] 8 24 17 28 20 27 20 17 13 17 27 24 23 23 18 27 5 28 10 7 22 27 27 28 23
[51] 27 26 24 21 19 26 14 14
The shortest length of the considered intervals is: 3
The median of the mui values is: 0.5
R> aux$pvN
Normal p-value
0.6859921
CLOSE family. In the PaB test, the parametric marginal model of the second process, the Poisson process fitted to Barcelona in this case, has to be specified.
R> PBZB<-TestIndNH(posZ, posB, nsim = 5000, type = "Poisson",
lambdaMarg =cbind(lambdaB), fixed.seed=35)
R> PBZB$pv
p-value
0.2107578
CROSS family. The
R> auxZB<-NHK(lambdaZ, lambdaB, posC=posZ, posD=posB, r=c(1:15),
typePlot='Kfun', cores=2,fixed.seed=36)
R> auxZB$pv
p-value
0.1558442
NHK
: Estimation of the K
function and confidence band under independence for Zaragoza-Barcelona
(left) and Zaragoza-Huesca (right).
Table 1 summarizes the three pairwise comparisons. The three
tests lead to the non-rejection of independence between the occurrences
in B-Z, and to the rejection between Z-H. On the other hand, in pair
B-H, the
Pair | Z-B | Z-B | Z-B | B-H | B-H | B-H | Z-H | Z-H | Z-H
Test | Normal | PaB | K | Normal | PaB | K | Normal | PaB | K
pv | .69 | .21 | .16 | .44 | .25 | .00 (0.29) | .03 | .00 | .03
These results are graphically confirmed by the Dutilleul plots given the fitted intensities, where only the plot between Zaragoza and Huesca gives evidence of dependence.
R> aux<-DutilleulPlot(posZ, posB, ModB@lambdafit,main="Barcelona-Zaragoza",
cex.main=0.9)
The PaB test can also be used to test independence between the three processes simultaneously:
R> PBBHZ<-TestIndNH(posB, posH, posZ, nsim = 1000, type = "Poisson",
lambdaMarg =cbind(lambdaH, lambdaZ), fixed.seed=65, cores=2)
R> PBBHZ$pv
p-value
0.002997003
Then, we conclude that, given the fitted intensities, the occurrences of the EHEs in Zaragoza-Barcelona and Barcelona-Huesca are independent, while there is dependence not explained by the covariates in Zaragoza-Huesca, which are the closest locations. Given these results, the best model for Barcelona is the previously fitted model, while the occurrence processes of Huesca and Zaragoza should be modeled by a vector of PPs taking into account the dependence between them. A model that allows us to include that dependence is a CPSP. The occurrences of the three indicator processes (the events only in Huesca, the events only in Zaragoza, and the simultaneous events) are obtained by the function CPSPPOTevents. Then, the CPSP can be estimated by fitting a Poisson process to each of the three indicator processes using fitPP.fun; see Cebrián et al. (2015) for some examples.
This section shows two examples of inference based on computational statistical tools using the function IntMPP. The first example uses the CPSP, which models the occurrence of EHEs in Huesca and Zaragoza, taking into account the dependence between them. It is fitted using NHPoisson, and the estimated intensities of the three indicator processes are the last three elements of the data.frame TxBHZ: lambdaOZ, lambdaOH, and lambdaZH.
In the first example, we calculate the point estimate and a confidence interval of the time of the first EHE in Zaragoza or Huesca. We need the function firstt, whose output is the minimum occurrence time in a vector of processes.
R> firstt<-function(posNH){min(unlist(posNH))}
R> lambdaiZH<-cbind(lambdaOZ,lambdaOH,lambdaZH)
R> aux<-IntMPP(funMPP.name="DepNHCPSP",
funMPP.args=list(lambdaiM=lambdaiZH, d=2, dplot=F),
fun.name="firstt", fun.args=NULL, clevel=0.95, cores=2, fixed.seed=125)
Lower bound of CI: 50.4648
Point estimator: 116.7493
Upper bound of CI: 233.4015
This type of inference also allows us to obtain confidence bands for two
or more values, for example, the number of EHEs in Huesca and in
Zaragoza in a given time interval NumI
, included in the package, whose output is a vector containing the
number of points in an interval
R> aux<-IntMPP(funMPP.name="DepNHCPSP",
funMPP.args=list(lambdaiM=lambdaiZH, d=2, dplot=F),
fun.name="NumI", fun.args=list(I=c(1,459)), fixed.seed=125)
Lower bound of CI: 1 1
Point estimator: 3.058 3.765
Upper bound of CI: 6 7
R> aux<-IntMPP(funMPP.name="DepNHCPSP",
funMPP.args=list(lambdaiM=lambdaiZH, d=2, dplot=FALSE),
fun.name="NumI", fun.args=list(I=c(7803,8262)), fixed.seed=125)
Lower bound of CI: 9 10
Point estimator: 15.269 16.952
Upper bound of CI: 22 24
In this section, some of the tools to generate vectors of processes in IndTestPP are used to characterize the effect of the dependence on the distribution of the nearest distances between two point processes. To that end, two dependent processes with a given dependence structure and two independent processes with the same marginal distribution as the previous ones are generated. The distributions of the samples of nearest distances are compared using histograms and qqplots.
We generate two dependent Neyman-Scott processes using DepNHNeyScot
,
with mean cluster size equal to 3 and 4, respectively, and IndNHNeyScot
. The distribution of the nearest distances is very
different in the two cases, as the qqplot shows. In the dependent
processes, it is concentrated in low values, while in the independent
ones the density decreases more smoothly.
R> set.seed(123)
R> lambdaParent<-runif(2000)/10
R> aux<-DepNHNeyScot(lambdaParent=lambdaParent, d=2, lambdaNumP=c(3,4),
dist="normal", sigmaC=c(3,2),fixed.seed=123, dplot=F)
R> posxd<- aux$posNH$N1
R> posyd<- aux$posNH$N2
R> aux<-IndNHNeyScot(lambdaParent=lambdaParent, d=2, lambdaNumP=c(3,4),
dist = "normal", sigmaC=c(3,2), fixed.seed=123, dplot=F)
R> posxi<- aux$N1
R> posyi<- aux$N2
R> par(mfrow=c(1,3))
R> distxyd<-nearestdist(posxd , posyd)
R> hist(distxyd , main='Dependent processes', xlab='Nearest dist',
xlim=c(0,60), ylim=c(0,270),breaks=seq(0,60, by=4) )
R> distxyi<-nearestdist(posxi , posyi)
R> hist(distxyi , main='Independent processes', xlab='Nearest dist',
xlim=c(0,60), ylim=c(0,270),breaks=seq(0,60, by=4) )
R> qqplot(distxyi, distxyd, xlab='Independent processes',
ylab='Dependent processes')
R> lines(distxyd, distxyd, col="red")
Many modeling problems related to the occurrence of events require analyzing the dependence between two or more point processes in time. However, not many tools to carry out this type of analysis are available. IndTestPP provides a useful general framework for applications based on the modeling of a vector of point processes in time since it includes functions for processing data, estimating the marginal intensities of the processes, testing independence, identifying factors causing dependence, and making inference. In particular, the three families of independence tests by Cebrián et al. (2020) are implemented. They are useful in different types of modeling problems since they cover a wide variety of processes: homogeneous and nonhomogeneous, Poisson processes, processes with a parametric marginal model, point processes with known marginal intensities, etc. The package also provides functions to generate four different types of vectors of point processes: common Poisson shock processes, multivariate Neyman-Scott cluster processes, Poisson processes from queues in a tandem, and vectors of processes resulting from a marked Poisson process with discrete marks from a Markov chain. These generation functions are used to carry out inference based on computational statistical methods. The applicability of the package in real modeling problems is shown by analyzing the dependence between the occurrence of extreme temperature events in three Spanish locations, Zaragoza, Barcelona, and Huesca.
The authors are members of the research group Modelos Estocásticos (Gobierno de Aragón) and the project MTM2017-83812-P. They acknowledge J. Abaurrea and AEMET for the data and their advice.
IndTestPP, spatstat, stpp, splancs, IDSpatialStats, NHPoisson, PtProcess, mmpp, mppa, stats
ExtremeValue, Spatial, SpatioTemporal, Survival
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Cebrián & Asín, "Analyzing Dependence between Point Processes in Time Using IndTestPP", The R Journal, 2021
BibTeX citation
@article{RJ-2021-049, author = {Cebrián, Ana C. and Asín, Jesús}, title = {Analyzing Dependence between Point Processes in Time Using IndTestPP}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {1}, issn = {2073-4859}, pages = {444-460} }