Linear fractional stable motion is a type of a stochastic integral driven by symmetric alpha-stable Lévy motion. The integral could be considered as a non-Gaussian analogue of the fractional Brownian motion. The present paper discusses R package rlfsm created for numerical procedures with the linear fractional stable motion. It is a set of tools for simulation of these processes as well as performing statistical inference and simulation studies on them. We introduce: tools that we developed to work with that type of motions as well as methods and ideas underlying them. Also we perform numerical experiments to show finite-sample behavior of certain estimators of the integral, and give an idea of how to envelope workflow related to the linear fractional stable motion in S4 classes and methods. Supplementary materials, including codes for numerical experiments, are available online. rlfsm could be found on CRAN and gitlab.
The linear fractional stable motion (shortly, lfsm)
We proceed with introduction to existing software, with interest towards study of numerical properties of statistical estimators for lfsm as the main motivation. So far, there is no standard approach for software development to operating the general class of stochastic processes driven by Lévy processes. Moreover, there was no systematic indexed and pier-reviewed software for simulating sample paths of lfsm and related estimators prior to rlfsm. There is a particularly simple and useful numerical algorithm for simulating lfsms developed by Stoev and Taqqu (2004). Other methods for simulation of the processes can be found in (Wu et al. 2004) and (Biermé and Scheffler 2008). The paper (Stoev and Taqqu 2004) contains a minimalistic implementation of lfsm generator as a MATLAB function. However, some useful packages, that could be used in numerical routines with Lévy-driven processes (e.g. to create lfsm generator and perform unit testing), exist and have been implemented in R. For instance, R package somebm (Huang 2013) contains functions for generation of fractional Brownian motion (fBM). Currently archived by CRAN dvfBm (Coeurjolly 2009) has routines for generation of fBm and estimator of the Hurst parameter of the latter. stabledist (Wuertz et al. 2016) and stable (Swihart et al. 2017) contain different functions for stable distributions and random variables. A generator of random variables of the kind has been also implemented in MATLAB (see the code in Chapter 1.7 in (Samorodnitsky and Taqqu 1994)).
The paper is organized as follows. In Section 2
we present the simulation method for sample paths of lfsm and its
implementation in our path
function. Then, we present functions for
finite sample studies of statistical estimators, and some other
functions. Section 3 describes
implementations of the high- and the low-frequency parameter estimators
and discusses reasons behind their numerical behavior. Finally, in
Section 4 we suggest an object oriented system that
simplifies software programming of Lévy-driven integrals.
The latest version of the package (1.0.0) suggests that we work with two
types of sample paths. In the low-frequency setting we only use points
spaced 1 temporal index apart from each other, coordinates
and point_num
in the example in
Section 2.2) which varies from 0 to
In this section, we start with a discussion on the simulation method of
the lfsm proposed by Stoev and Taqqu (2004) which is implemented in R by us. In
particular, simulation of sample paths is done via Riemann-sum
approximations of its symmetric path
function
that creates sample paths of the lfsm. The idea underlying this sample
path generator is that it should be always possible not only to obtain
lfsm path, but also the underlying Lévy motion, generated during the
procedure, and since the core function of lfsm is deterministic it
should allow for lfsm path generation based on a given Lévy motion, and,
in theory, otherwise (not always). For this reason generators of both
processes were separated into independent parts (see Figure
1).
The function path
can be used by
path(N,m,M,alpha,H,sigma,freq='L',disable_X=FALSE,
levy_increments=NULL,seed=NULL)
Parameters N, m, M
regard to the index of the process, or time, if
applicable. m
and M
are the only means to control precision of the
integral computation. N
is a number of points of the lfsm to generate.
m
is a discretization parameter that corresponds to the number of
points where Lévy motion is sampled between two nearby indexes (e.g. M
is the truncation parameter, i.e. number of points after
which the integrated function is set to zero; freq
stands for the
frequency of the motion which can take two values: H
for
high-frequency and L
for the low-frequency setting. This is the switch
between the two data types. disable_X
is needed to disable computation
of X
, the default value is FALSE
, when it is TRUE
, only a Lévy
motion is returned, which in turn reduces the computation time. seed
is a parameter that performs seeding of the lfsm generator. Technically,
in the path
the seed is set just before Lévy increments are generated.
The path
function returns a list containing the lfsm, the underlying
Lévy motion, the point number of the motions from 0 to point_num
)
and the corresponding coordinate which depends on the frequency, the
parameters (
Generation of symmetric rstable
from package stabledist with S0
parametrization based on the Zolotarev’s representation for an
S0
is used in
order to make sigma
a scale parameter of the motion and to get exempt
from computing the normalization constant
As it was mentioned in the beginning of Section
2.2, one of the features of path
is the
ability to operate on a pair lfsm - Lévy motion and to switch between
them. We recall that direct computation of the sum approximating the
integral in the definition of lfsm ((1)) would involve
number of operations proportional to
Let us consider an example which will recur and evolve throughout this
section. Consider computing sum ((3)) where
A method based on the discrete convolution theorem is used to obtain
Definition 1. For any sequence
The reverse transform, IDTFT, is defined as
Definition 2. Discrete convolution of two infinite sequences
There is a convolution theorem for discrete sequences which says that the discrete convolution of two sequences is equal to the Inverse Discrete Fourier Transform (IDFT) of the multiplication of the direct transforms of the sequences:
Theorem 3. For any discrete sequences
Definition 4. Let
It is straightforward that the periodic summation in the definition
above has period
DTFT is not directly useful for simulation purpose, that is why we need a special case of Theorem 3, Circular Convolution Theorem which reduces DTFT to DFT.
Definition 5. The DFT of a finite sequence
The IDFT is
Theorem 6.
Returning to the task of computing the sum in ((3)), we
consider two vectors:
Although the setup of the example as is on Figure 4 is
fastest, it is impossible to use it directly, because in some situations
truncation parameter path
function performs an index shift using the following property:
path
always keeps the first
In the next example, we show how one can use the above function to generate a sample path and to provide its visualization. Compare the procedure with the similar one from Section 4.1.1.
# Path generation
List<-path(N=2^10-600,m=256,M=600,alpha=1.8,H=0.8,
sigma=1,freq='L',disable_X=FALSE,seed=3)
str(List)
List of 7
$ point_num : int [1:425] 0 1 2 3 4 5 6 7 8 9 ...
$ coordinates : int [1:425] 0 1 2 3 4 5 6 7 8 9 ...
$ lfsm : num [1:425] 0 -1.3969 0.0159 1.6487 1.87 ...
$ levy_motion : num [1:425] 0 -21.8 28.3 42.1 38.1 ...
$ levy_increments: num [1:262144] -0.292 -0.708 -1.49 0.517 0.803 ...
$ pars : Named num [1:3] 1.8 0.8 1
..- attr(*, "names")= chr [1:3] "alpha" "H" "sigma"
$ frequency : chr "L"
# Normalized paths
Norm_lfsm<-List[['lfsm']]/max(abs(List[['lfsm']]))
Norm_oLm<-List[['levy_motion']]/max(abs(List[['levy_motion']]))
# Visualization of the paths
plot(Norm_lfsm, col=2, type="l", ylab="coordinate")
lines(Norm_oLm, col=3)
leg.txt <- c("lfsm", "oLm")
legend("topright", legend = leg.txt, col =c(2,3), pch=1)
The result of the chart rendering is shown on Figure
6. The following example shows how to switch path
function in order to alter between simulation of lfsm from scratch and
computing based on an existing sample path of the Lévy motion.
m<-256; M<-600; N<-2^12-M
alpha<-1.8; H<-0.8; sigma<-1.8
seed<-2
# Creating Levy motion
levyIncrems<-path(N=N, m=m, M=M, alpha, H, sigma, freq='L',
disable_X=T, levy_increments=NULL, seed=seed)
# Creating lfsm based on the levy motion
lfsm_full<-path(m=m, M=M, alpha=alpha,
H=H, sigma=sigma, freq='L',
disable_X=F,
levy_increments=levyIncrems$levy_increments,
seed=seed)
sum(levyIncrems$levy_increments==
lfsm_full$levy_increments)==length(lfsm_full$levy_increments)
[1] TRUE
In the example the Lévy motion is generated without computing the lfsm,
which was done by setting disable_X=TRUE
, and saved to variable
levyIncrems
. After that, path
was given the obtained Lévy increments
and, basing on them, generated an lfsm path. As one can observe, the
Lévy increments from the both objects produced by path
are identical.
The same holds when we obtain an lfsm path from the above procedure and
one-step simulation of lfsm with seeding. These two facts are used in
automated tests provided for rlfsm package.
In order to study numerical properties of the estimation procedures
developed in (Mazur et al. 2020), we created a technique, that could be used in
solving this problem for any pair stochastic process and an estimator.
The approach was implemented in MCestimLFSM
function (Figure
9). The main motivation here is that for some
estimators we have limit theorems, but we do not have theory which
describes estimator behavior when the length of a path is relatively
small, and thus, for instance, we cannot use closed-form expressions to
obtain confidential intervals. In the following examples we show how to
use functions MCestimLFSM
, PLot_vb
, and Plot_dens
for studying
empirical variance, bias and a density function of an estimator. In the
first example, we study GenLowEstim
estimator, and its bias and
variance dependencies on the length of the sample paths. In particular,
one would be able to determine starting from which path length the
estimator loses significant bias influence.
library(rlfsm)
library(gridExtra)
registerDoParallel()
m<-25; M<-55
p<-.4; p_prime<-.2
t1<-1; t2<-2
k<-2
NmonteC<-5e2
alpha<-1.8; H<-0.8; sigma<-0.3
S<-seq(from = 100, to = 2e3, by =50)
tilda_ests<-MCestimLFSM(s=S, fr='L', Nmc=NmonteC, m=m, M=M,
alpha=alpha,H=H,sigma=sigma,
GenLowEstim,t1=t1,t2=t2,p=p)
# Structure of tilda_ests
names(tilda_ests)
[1] "data" "data_nor" "means" "sds" "biases" "Inference" "params" "freq"
# Structure of BSdM is as follows
head(round(tilda_ests$means,2))
alpha H sigma s
1 1.76 0.67 0.25 100
2 1.81 0.70 0.27 150
3 1.81 0.71 0.27 200
4 1.82 0.73 0.28 250
5 1.83 0.74 0.28 300
6 1.83 0.75 0.29 350
head(round(tilda_ests$biases,2))
alpha H sigma s
1 -0.04 -0.13 -0.05 100
2 0.01 -0.10 -0.03 150
3 0.01 -0.09 -0.03 200
4 0.02 -0.07 -0.02 250
5 0.03 -0.06 -0.02 300
6 0.03 -0.05 -0.01 350
head(round(tilda_ests$sds,2))
alpha H sigma s
1 0.19 0.23 0.09 100
2 0.14 0.20 0.08 150
3 0.13 0.19 0.08 200
4 0.13 0.19 0.07 250
5 0.10 0.17 0.06 300
6 0.11 0.17 0.06 350
Plot_vb(tilda_ests)
Figure 7 shows that when
GenLowEstim
could
be considered unbiased starting approximately from 1000 points.
The second example compares empirical standardized densities of
estimates, obtained by GenLowEstim
with the limiting standard normal
ones, Figure 8.
S<-c(1e2,1e3,1e4)
tilda_ests<-MCestimLFSM(s=S, fr='L', Nmc=NmonteC ,m=m, M=M,
alpha=alpha, H=H, sigma=sigma,
GenLowEstim,t1=t1,t2=t2,p=p)
l_plot<-Plot_dens(par_vec=c('sigma','alpha','H'), MC_data=tilda_ests,
Nnorm=1e7)
ggg<-grid.arrange(l_plot[[1]],l_plot[[2]],l_plot[[3]],nrow=1,ncol=3)
In short, in these examples for different path lengths s
, NmonteC
lfsm paths are simulated. To each path we apply tilde-statistic (see
Section 3.2), therefore obtaining NmonteC
estimates
It is important to notice that generation of lfsm is numerically heavy
routine and also a large number of estimates is needed to compare their
empirical distributions with the limiting ones. The latter task gave
MCestimLFSM
its name. Thus, in order to make computations feasible in
terms of time and memory use, the architecture of MCestimLFSM
must be
well-optimized. Apparently, a multi-core setup is crucial for dealing
with the task.
Having fixed a path length, the whole procedure behind MCestimLFSM
could be split in two parts. First, we need to obtain samples for each
estimator. Second, we obtain statistics of these samples (see Figure
9). Once finished, MCestimLFSM
proceeds to the next
length value until reaches the end of the vector of lengths.
In the first part, we generate s[i]
via path_fast
function. To each of the paths we apply
all the estimators to obtain foreach
-based parallel loop, where each
node simulates a path, computes and returns the statistics removing the
path from memory. path_fast
is an unavailable for users version of
path
with significantly reduced functionality for the sake of saving
execution time. The further desired enlargement of the node task by
adding generation of the whole set of paths instead of just one, making
the loop over s[i]
parallel, leads to extreme memory consumption as
well as unequal distribution of load among nodes. The number of numeric
values in the set of paths equals to MCestimLFSM
the loop over s
is sequential, and the one
over NmonteC
is parallel.
During the second part, averages and standard deviations of the samples
are computed, and subsequently used to compute the standardized
empirical distributions. So that, the three characteristics naturally
come together within the same numerical procedure. So far there is no
empirical evidence that parallel execution in this section makes
MCestimLFSM
more efficient.
Such architecture is of great use when the number of nodes available for
computations exceeds the number of path length, and the length
In this part, we will describe aspects of some of the other R functions implemented in the package.
These increments are the main building block for all statistics we use
(see Section 3). They are defined as
increment()
and increments()
. The former accepts a vector of
points at which a user wants to evaluate higher-order increments, and
computes them using formula
increments()
computes increments iteratively on the whole set of path
points. The first iteration gives increments()
is faster on sample paths with large
number of points, but slower when the increment order is high. As we
will show later, orders greater than increments()
or its hidden “relatives”.
We introduce a pair of functions which makes a panel plot of sample
paths produced by processes with different parameters. Path_array
takes a set of Plot_list_paths()
takes the data frame as an argument and
plots the sample paths on different panels based on their facet_wrap()
from
ggplot2 (Wickham 2016).
For discontinuous paths Plot_list_paths()
draws an overlapping
semitransparent line joining neighbouring points in order to highlight
jumps.
l=list(H=c(0.2,0.5,0.8), alpha=c(0.5,1,1.5), freq="H")
arr<-Path_array(N=300, m=30, M=100, l=l, sigma=0.3)
head(arr)
n X alpha H freq
1 1 0.0000000 0.5 0.2 H
2 2 0.2329891 0.5 0.2 H
3 3 1.1218238 0.5 0.2 H
4 4 -6.1284620 0.5 0.2 H
5 5 -2.2450357 0.5 0.2 H
6 6 3.4979978 0.5 0.2 H
str(arr)
'data.frame': 2709 obs. of 5 variables:
$ n : num 1 2 3 4 5 6 7 8 9 10 ...
$ X : num 0 0.233 1.122 -6.128 -2.245 ...
$ alpha: Factor w/ 3 levels "0.5","1","1.5": 1 1 1 1 1 1 1 1 1 1 ...
$ H : Factor w/ 3 levels "0.2","0.5","0.8": 1 1 1 1 1 1 1 1 1 1 ...
$ freq : Factor w/ 1 level "H": 1 1 1 1 1 1 1 1 1 1 ...
Plot_list_paths(arr)
In this section, we describe estimators for the parameters
First, we consider the case
Now, we consider consistent estimators for the self-similarity parameter
Let us define the following two statistics
Estimators for the scale parameter
Here, we consider general case when an explicit lower bound for
Next, we consider two-stage estimation procedure in the general case in
high-frequency setting which is the same as in the low-frequency
setting. For
We introduce function ContinEstim
for performing statistical inference
according to Section 3.1 when
ContinEstim(t1, t2, p, k, path, freq)
The function is basically comprised by simpler functions alpha_hat
,
H_hat
and sigma_hat
responsible for retrieving the corresponding
parameters. sigma_hat
is called using tryCatch
as the former may
return an error due to numerical integration in Norm_alpha
.
General low-frequency estimation technique, described in Section
3.2 is implemented in GenLowEstim
.
GenLowEstim(t1, t2, p, path, freq = "L")
This estimator first sets a preliminary 2+floor(alpha_0(̂-1))
, and then the new ContinEstim
. This approach induces an
effect, which does not exist in the case when ContinEstim
is applied.
When
High-frequency estimator from the same section was implemented in
GenHighEstim
.
GenHighEstim(p, p_prime, path, freq, low_bound = 0.01, up_bound = 4)
Although the general high- and low-frequency estimators presented in
Section 3.2 have important advantages, namely closed form
expressions for distribution functions and non-suboptimal convergence
rates, they also reveal two drawbacks in performance. Due to condition
and error handling, the time performances of the general estimators are
much worse than those of the continuous ones. On top of that, the
plug-in estimators (because of their nature) have much less probability
of obtaining an estimate at all. The main idea is as follows: the more
statistics are used in a plug-in estimator, the higher the probability
to stumble upon a numerical error during the estimation procedure. We
illustrate this effect by the following experiment, wherein the general
high- and low-frequency estimators are compared to the corresponding
continuous ones. For each pair from a set of parameters NmonteC
sample paths of the both frequencies were generated, and to
each of them the relevant procedures ContinEstim
, GenLowEstim
and
GenHighEstim
were applied (see “Estimate deterioration experiment” in
the supplementary materials). Then, the rates of successful computation
results were computed. The result of estimation was considered
“successful” if during the procedure all three parameters were obtained,
no error occurred, and the estimates are meaningful, namely
|
|
Comparison of success rates for ContinEstim and GenLowEstim. Low frequency case. Path length N=200, number of sample paths NmonteC=300. | Comparison of success rates for ContinEstim and GenHighEstim. High frequency case. Path length N=200, number of sample paths NmonteC=300. |
This experiment shows (Figures 11a and
[fig:Estim_Succ_Compar_high]b) that in
both high- and low-frequency cases ContinEstim
gives much better
precision than the corresponding general estimator. The outcome is
rigorous in low-frequency technique since ContinEstim
and
GenLowEstim
have the same set of tuning parameters. On the other hand,
the high-frequency estimators have non-coinciding parameter sets, and
thus, without fine tuning, the result is merely intuitive. One could
observe that in general estimation near the boundaries of the interval
Errfilter
function in the experiment.
In order to show how the general low-frequency estimation works in
practice, we peform a numerical experiment whose code could be found in
section “Zones with different convergence” of the accompanying .R file.
We set a constant k_new<-2+floor(alpha_0(̂-1))
where alpha_0
is obtained via
alpha_hat
with parameters k=1
, freq=``L
plugged-in. This provides
us simulated distribution of k_ind = seq(1,8,by=1)
and,
given a path, for each of these k’s extract statistics
Three regimes of performance of GenLowEstim
(read, the general
low-frequency estimator
Due to small variance of GenLowEstim
and low
frequency ContinEstim
are the same. At the same time, statistic
ContinEstim
and GenLowEstim
at such length
of the sample path.
When
At
The way
Here we describe a simple S4 system (a short introduction to S4 classes is given in (Wickham 2014), Chapter OO field guide) that could be used to simplify manipulations with the two types of observations of the linear fractional stable motion. Additionally, we present a possible way to extend the system so that it encompasses more general stochastic processes. The system aims to be helpful in
Here we describe the least general classes- “SimulatedLfsmLow” and “SimulatedLfsmHigh”, objects of which are obtained by simulating low- and high-frequency linear fractional stable motions.
Figure 15 shows their internal structure. Roughly
speaking, these classes were designed to contain minimum information
that could fully describe a simulated LFSM path. Indicators of frequency
and a process type are included in the name of a class, which is
supposed to make a method dispatch more straightforward, without
additional condition blocks. Moreover, all generic functions distinct
high- and low-frequency schemes of all types with the help of class
names. The same holds for motion types. Parameters
In the following example we see how an instance of class
“SimulatedLfsmLow” is created and then plotting and inference is
performed using generic functions plot
and ContinInfer
. First, we
register classes, methods and one generic from “S4 classes examples” in
the supplementary materials.
N<-3000; m<-65; M<-300
sigma<-0.3; alpha<-1.8; H<-0.8
p<-.4; t1<-1; t2<-2; k<-2
# Make an object of S4 class SimulatedLfsmLow
List <- path(N,m,M,alpha,H,sigma,freq='L',disable_X=FALSE,seed=3)
# Make an object of parameters
prmts<-new("AlpaHSigma",alpha=List$pars[['alpha']],
H=List$pars[['H']],sigma=List$pars[['sigma']])
X_sim <- new("SimulatedLfsmLow", Process = List$lfsm,
coordinates = List$coordinates, pars = prmts,
levy_motion = List$levy_motion)
# structure of the instance
str(X_sim)
Formal class 'SimulatedLfsmLow' [package ".GlobalEnv"] with 4 slots
..@ pars :Formal class 'AlpaHSigma' [package ".GlobalEnv"] with 3 slots
.. .. ..@ alpha: num 1.8
.. .. ..@ H : num 1.8
.. .. ..@ sigma: num 1.8
..@ levy_motion: num [1:3497] 0 -15 -19.8 -21.2 -24.1 ...
..@ Process : num [1:3497] 0 -0.542 -0.912 -1.12 -1.276 ...
..@ coordinates: int [1:3497] 0 1 2 3 4 5 6 7 8 9 ...
# plot the motion
plot(X_sim)
ContinInfer(x=X_sim,t1=t1,t2=t2,k=k,p=p)
$alpha
[1] 1.870217
$H
[1] 0.8314528
$sigma
[1] 0.3227219
In this example, the plot function takes almost no effort, compared to
the similar one from Section 2.2, which is
due to the fact, that there has been a method defined for generic plot
and object “SimulatedLfsmLow”. The last function, ContinInfer
, is a
generic which has a registered method for class “StochasicProcLow”,
general stochastic processes in low-frequency setting. Since
“SimulatedLfsmLow” inherits from “StochasicProcLow”, the generic
dispatched this method and performed statistical inference.
ContinInfer
was designed to perform inference according to Theorem 3.1
from (Mazur et al. 2020) and is based on R function ContinInfer
. One can see that
plot
(and, less obviously, ContinInfer
) used “Low” from the name of
the class to perform computations.
The authors acknowledge financial support from the project “Ambit fields: probabilistic properties and statistical inference” funded by Villum Fonden. Stepan Mazur acknowledges financial support from the internal research grants at Örebro University, and from the project “Models for macro and financial economics after the financial crisis” (Dnr: P18-0201) funded by Jan Wallander and Tom Hedelius Foundation. The authors would like to thank Prof. Mark Podolskij for significant discussions. Dmitry Otryakhin thanks Dr. Firuza Mamedova for valuable remarks on the draft of this paper.
rlfsm, somebm, stabledist, stable, ggplot2
Distributions, Phylogenetics, Spatial, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Mazur & Otryakhin, "Linear Fractional Stable Motion with the rlfsm R Package", The R Journal, 2020
BibTeX citation
@article{RJ-2020-008, author = {Mazur, Stepan and Otryakhin, Dmitry}, title = {Linear Fractional Stable Motion with the rlfsm R Package}, journal = {The R Journal}, year = {2020}, note = {https://rjournal.github.io/}, volume = {12}, issue = {1}, issn = {2073-4859}, pages = {386-405} }