We introduce and showcase mvpd (an acronym for multivariate product distributions), a package that uses a product-distribution approach to calculate multivariate subgaussian stable distribution functions. The multivariate subgaussian stable distributions are the elliptically contoured multivariate stable distributions, and they contain the multivariate Cauchy and the multivariate normal distribution. These distributions can be useful in modeling data and phenomena that have heavier tails than the normal distribution (more frequent occurrence of extreme values). Application areas include log returns for stocks, signal processing for radar and sonar data, astronomy, and hunting patterns of sharks.
Multivariate subgaussian stable distributions are the elliptically contoured subclass of general multivariate stable distributions. To begin a brief introduction to multivariate subgaussian stable distributions, we start with univariate stable distributions, which may be more readily familiar and accessible. Univariate stable distributions are a flexible family with four parameters: a tail index alpha in (0, 2], a skew beta in [-1, 1], a scale gamma > 0, and a location delta. Univariate symmetric stable distributions are achieved by setting the skew parameter beta = 0.
Outside of a few special cases (such as the normal and Cauchy distributions), stable densities have no closed form. Therefore, numerically computing the densities is especially important for applications. For univariate stable distributions, there is open-source software to compute modes, densities, distributions, quantiles, and random variates, including a number of R packages (stabledist, stable, and libstableR, for example; see the CRAN Task View: Probability Distributions for more).
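As a small illustration of what such software provides, the two closed-form special cases can be checked numerically with the CRAN package stabledist (one of several options named above):

```r
## Check two closed-form special cases of the stable family:
## alpha = 1, beta = 0 is the Cauchy distribution, and
## alpha = 2 is a normal distribution with sd = sqrt(2) * gamma.
library(stabledist)

dstable(0, alpha = 1, beta = 0)  # equals dcauchy(0) = 1/pi
dstable(0, alpha = 2, beta = 0)  # equals dnorm(0, sd = sqrt(2))
```

For all other values of alpha, `dstable` falls back on numerical evaluation, which is exactly why such packages are needed.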
As generalizations of the univariate stable distribution, multivariate stable distributions are a very broad family encompassing many complicated distributions (e.g., support in a cone, star-shaped level curves, etc.). A subclass of this family is the multivariate subgaussian stable distributions, which are symmetric and elliptically contoured. Similar to the aforementioned univariate symmetric stable distributions, the value alpha = 2 yields the multivariate normal and alpha = 1 yields the multivariate Cauchy distribution. For multivariate subgaussian stable distributions, the parameters are the tail index alpha, a positive definite shape matrix Q, and a location vector delta.
If one wants R functions to interact with multivariate subgaussian stable distributions, there are three R package options, including mvpd, the focus of this paper. These packages are compared in Table 1 and detailed below:
Functionality | alphastable | stable | mvpd
---|---|---|---
random variates | | |
parameter estimation | | |
density | | |
cumulative distribution (monte carlo) | x | |
cumulative distribution (integrated) | x | x |
multivariate subgaussian stable | | |
multivariate independent stable | x | x |
multivariate isotropic stable | x | x |
multivariate discrete-spectral-measure stable | x | x |
While the lack of a tractable density and distribution function impedes
directly calculating multivariate subgaussian stable distributions, it
is possible to represent them in terms of a product distribution for
which each of the two distributions in the product has known numerical
methods developed and deployed (in R
packages on CRAN). This paper
utilizes this approach. The next section covers some product
distribution theory.
This section reviews some known results of product distributions and describes our notation. Let A be a univariate positive random variable, independent of a multivariate normal random vector G. From Nolan (2013), a multivariate subgaussian stable random vector can be represented as a product distribution: the square root of a positive stable random variable A multiplying the normal vector G.
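The key identity can be sketched as follows (our rendering; here A is a totally skewed, positive (alpha/2)-stable random variable independent of G ~ N(0, Q), and the scale constant for A, which depends on the parametrization convention, is omitted):

```latex
% Product-distribution representation of a subgaussian stable vector:
%   A : positive (alpha/2)-stable, totally skewed (beta = 1),
%       scale constant set by the chosen parametrization;
%   G ~ N(0, Q), independent of A.
X \stackrel{d}{=} \delta + \sqrt{A}\, G ,
\qquad
f_X(x) \;=\; \int_0^\infty f_{N(\delta,\, s Q)}(x)\, f_A(s)\, \mathrm{d}s .
```

Conditioning on A = s reduces X to a multivariate normal with covariance s Q, which is what makes the mixture integral above tractable numerically.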
Using the aforementioned theory of product distributions, we can arrive
at functions for a multivariate subgaussian stable density and
distribution function thanks to established functions for univariate
stable and multivariate normal distributions. A key package in the
implementation of multivariate subgaussians in R is
mvtnorm
(Genz and Bretz 2009; Genz et al. 2020). In the basic product-distribution approach of mvpd, the multivariate normal density and distribution function are computed with mvtnorm::dmvnorm and mvtnorm::pmvnorm, respectively, and the density of the positive stable amplitude is computed using stable::dstable or libstableR::stable_pdf (Royuela-del-Val et al. 2017). Presented as pseudo-code:
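A minimal sketch of the product-distribution density is below. This is an illustration, not mvpd's internals: it uses stabledist for the positive stable amplitude density (mvpd uses stable or libstableR), and the amplitude scale constant `2 * cos(pi * alpha / 4)^(2 / alpha)` is an assumption based on one common parametrization.

```r
## Sketch of the product-distribution density
##   f_X(x) = integral over s of dmvnorm(x; s * Q) * f_A(s) ds
## Assumptions: stabledist supplies the positive stable density f_A,
## and the scale constant for A follows one common parametrization;
## mvpd's internals may differ.
library(mvtnorm)
library(stabledist)

dmvss_sketch <- function(x, alpha, Q) {
  integrand <- function(s) {
    sapply(s, function(si)
      mvtnorm::dmvnorm(x, sigma = si * Q) *
        stabledist::dstable(si, alpha = alpha / 2, beta = 1,
                            gamma = 2 * cos(pi * alpha / 4)^(2 / alpha),
                            delta = 0, pm = 1))
  }
  stats::integrate(integrand, 0, Inf)$value
}
```

With alpha = 1 this recovers the multivariate Cauchy; e.g., `dmvss_sketch(0, alpha = 1, Q = matrix(1))` returns 1/pi, the standard Cauchy density at 0.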
The (outermost) univariate integral is numerically evaluated with stats::integrate.
We present an outline of the [dpr]mvss
(multivariate subgaussian
stable) functions, and walk through the code in the subsequent
sections. As an overview, we generate 5000 5-dimensional subgaussian
variates with rmvss
. We then recover the parameters with an illustrative call to
fit_mvss
. We can calculate the density (dmvss
) at the center of the
distribution and get a quick estimate of the distribution between -2 and
2 for each member of the 5-dimensional variate using pmvss_mc
. We
investigate a refinement of that quick distribution estimate using
pmvss
.
rmvss
We'll generate 5000 5-dimensional random variates with alpha = 1.7 and an exchangeable shape matrix:
R> library(mvpd)
## set the seed for reproducible research
R> set.seed(10)
## specify a known 5x5 shape matrix
R> shape_matrix <- structure(c(1, 0.9, 0.9, 0.9, 0.9,
0.9, 1, 0.9, 0.9, 0.9,
0.9, 0.9, 1, 0.9, 0.9,
0.9, 0.9, 0.9, 1, 0.9,
0.9, 0.9, 0.9, 0.9, 1),
.Dim = c(5L, 5L))
## generate 5000 5-dimensional random variables
## with alpha = 1.7 and shape_matrix
R> X <- mvpd::rmvss(n = 5000, alpha = 1.7, Q = shape_matrix)
## plot all pairwise scatterplots (Figure 2)
R> copula::pairs2(X)
The ability to simulate from a distribution is useful for running simulations to test different scenarios about the phenomena being modeled, and, in this case, for generating a dataset with a known shape matrix and alpha so that our fitting software (next section) can be shown to recover these parameters. Our quick start code begins with generating a dataset from a known alpha and shape matrix. In practice, however, a practitioner might start with a dataset, estimate parameters from it, and then generate random samples from the distribution specified by those parameters to learn more about the data-generating distribution and the behavior of the phenomena.
fit_mvss
If you have data in an n-by-d matrix X, fit_mvss will return estimates of the parameters using the method outlined in Nolan (2013). The method involves fitting univariate stable distributions for each column with libstableR::stable_fit_mle2d(W, parametrization = 1L) and assessing the resulting estimates: the diagonal elements of the shape matrix follow from the column-wise scale estimates, and the off-diagonal elements can be found due to the properties of univariate stable distributions (see Nolan (2013)). For your convenience, the univariate column-wise estimates of alpha, beta, gamma, and delta are returned, along with both the raw estimate of the shape matrix and its nearest positive definite approximation (Matrix::nearPD applied to the raw estimate).
## take X from previous section and estimate
## parameters for the data generating distribution
R> fitmv <- mvpd::fit_mvss(X)
R> fitmv
$univ_alphas
[1] 1.698617 1.708810 1.701662 1.696447 1.699372
$univ_betas
[1] -0.02864287 -0.04217262 -0.08444540 -0.06569907 -0.03228573
$univ_gammas
[1] 1.016724 1.000151 1.008055 1.012017 1.002993
$univ_deltas
[1] -0.03150732 -0.06525291 -0.06528644 -0.07730645 -0.04539796
$mult_alpha
[1] 1.700981
$mult_Q_raw
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0337276 0.9034599 0.8909654 0.8937814 0.8647089
[2,] 0.9034599 1.0003026 0.9394846 0.9072368 0.8535091
[3,] 0.8909654 0.9394846 1.0161748 0.8929937 0.9037467
[4,] 0.8937814 0.9072368 0.8929937 1.0241777 0.9281714
[5,] 0.8647089 0.8535091 0.9037467 0.9281714 1.0059955
$mult_Q_posdef
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0337276 0.9034599 0.8909654 0.8937814 0.8647089
[2,] 0.9034599 1.0003026 0.9394846 0.9072368 0.8535091
[3,] 0.8909654 0.9394846 1.0161748 0.8929937 0.9037467
[4,] 0.8937814 0.9072368 0.8929937 1.0241777 0.9281714
[5,] 0.8647089 0.8535091 0.9037467 0.9281714 1.0059955
An alternative for fitting this distribution is alphastable::mfitstab.elliptical(X, 1.70, shape_matrix, rep(0,5)), which takes 8 minutes (and requires initial values for alpha, the shape matrix, and delta). This analysis with fit_mvss(X) took under 2 seconds. For a run of n = 1e6, d = 20, fit_mvss scales well, taking 60 minutes.
Once the distribution has been fitted, fitmv$mult_alpha, fitmv$mult_Q_posdef, and fitmv$univ_deltas can be used as the alpha, Q, and delta arguments, respectively, in calls to dmvss to calculate densities and pmvss_mc or pmvss to calculate probabilities. They could also be passed to rmvss to generate random variates for simulations.
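For example, a parametric-bootstrap-style workflow (fit, then simulate from the fitted model) might look like the following self-contained sketch; the small n and diagonal Q here are arbitrary choices to keep the run quick:

```r
## Fit a dataset, then feed the fitted parameters back into rmvss
## (the alpha, Q, and delta arguments described in the text).
## Small n and a diagonal shape matrix keep this example fast.
library(mvpd)

set.seed(1)
X     <- mvpd::rmvss(n = 500, alpha = 1.7, Q = diag(3))
fitmv <- mvpd::fit_mvss(X)
Xnew  <- mvpd::rmvss(n     = 500,
                     alpha = fitmv$mult_alpha,
                     Q     = fitmv$mult_Q_posdef,
                     delta = fitmv$univ_deltas)
```

Simulating from the fitted model in this way is a standard check that the estimated parameters reproduce the gross features of the observed data.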
dmvss
We can calculate the density at the center of the distribution.
## density calculation
R> mvpd::dmvss(x = fitmv$univ_deltas,
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas)[1]
$value
[1] 0.1278952
pmvss_mc
The method of calculating the distribution by Monte Carlo relies on the ability to produce random variates quickly and then calculate what proportion of them fall within the specified bounds. To generate multivariate subgaussian stable variates, a scalar A is drawn from a positive stable distribution and its square root multiplies a multivariate normal vector.
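That generation scheme (a positive stable scalar A whose square root scales a multivariate normal vector) can be sketched as follows. This is a simplified illustration rather than mvpd's internals, and the amplitude scale constant, drawn here with stabledist::rstable, is an assumption based on one common parametrization:

```r
## Simplified Monte Carlo sketch of P(lower < X < upper) for a
## subgaussian stable vector X = delta + sqrt(A) * G.
## Not mvpd's exact internals; the scale constant for A is an
## assumption tied to one parametrization convention.
library(mvtnorm)
library(stabledist)

pmvss_mc_sketch <- function(lower, upper, alpha, Q, delta, n = 10000) {
  A <- stabledist::rstable(n, alpha = alpha / 2, beta = 1,
                           gamma = 2 * cos(pi * alpha / 4)^(2 / alpha),
                           delta = 0, pm = 1)
  G <- mvtnorm::rmvnorm(n, sigma = Q)
  X <- sweep(sqrt(A) * G, 2, delta, "+")   # row i scaled by sqrt(A[i])
  mean(apply(X, 1, function(x) all(x > lower & x < upper)))
}
```

As a sanity check, with alpha = 1 and Q = matrix(1) the variates are standard Cauchy, so the estimate of P(-1 < X < 1) should be close to 0.5.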
## first-run of pmvss_mc
R> mvpd::pmvss_mc(lower = rep(-2,5),
+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ n = 10000)
[1] 0.6820
We run it again and the answer changes:
## second-run of pmvss_mc
R> mvpd::pmvss_mc(lower = rep(-2,5),
+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ n = 10000)
[1] 0.6742
With the Monte Carlo method, precision is not specified and no error is calculated. The next section introduces the integrated distribution function, for which precision can be specified.
pmvss
There are three inexact entities involved in the distribution calculation of pmvss: the numerically evaluated outer integral, the numerically calculated multivariate normal distribution, and the numerically calculated univariate stable density.
The outer integral by integrate
assumes the integrand is calculated
without error, but this is not the case. See the supplementary materials
section “Thoughts on error propagation in pmvss
” for justification and
guidance for specifying the values of abs.tol.si
, abseps.pmvnorm
,
and maxpts.pmvnorm
. The first of these three arguments is passed to
the abs.tol
argument of stats::integrate
and controls the absolute
tolerance of the numerically evaluated outer 1-dimensional integral. The
remaining two are passed to maxpts
and abseps
of
mvtnorm::GenzBretz
and control the accuracy of mvtnorm::pmvnorm
.
Briefly, our experience suggests that, to be able to treat abs.tol.si as the error of the result, abseps.pmvnorm should be 1e-2 times smaller than the specified abs.tol.si, which may require a multiple of the default 25000 for maxpts.pmvnorm. This leads to more computational intensity and longer computation times, as demonstrated below (as conducted on a MacBook with an Intel Core i7 chip at 2.2 GHz):
## abs.tol.si abseps.pmnvorm maxpts Time
## 1e-01 1e-03 25000
## 1e-02 1e-04 25000*10 3 sec
## 1e-03 1e-05 25000*100 22 sec
## 1e-04 1e-06 25000*1000 4 min
## 1e-05 1e-07 25000*10000 26 min
## 1e-06 1e-08 25000*85000 258 min
With this in mind, the output from the Quick Start code is:
## precision specified pmvss
R> mvpd::pmvss(lower = rep(-2,5),
+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ abseps.pmvnorm = 1e-4,
+ maxpts.pmvnorm = 25000*10,
+ abs.tol.si = 1e-2)[1]
$value
[1] 0.6768467
Both pmvss and pmvss_mc take infinite limits. Since pmvss_mc calculates how many random variates fall within the bounds, pmvss might be preferred to pmvss_mc when calculating the tails of the distribution, unless a very large number of variates is generated.
We provide a sense of accuracy and computational time trade-offs with a modest simulation experiment (Figure 3; see supplementary materials for code). Estimating these distributions is inherently difficult, in the sense that expecting accuracy farther out than the 5th decimal place for distribution functions is unreasonable. Therefore, we will define our "gold standard" targets for accuracy evaluation as the numerical density produced by Robust Analysis' dstable integrated by cubature::hcubature() with tolerance tol = 1e-5.
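Since Robust Analysis' stable package requires a license, a similar gold-standard computation can be sketched with mvpd's own density as an illustrative stand-in. The 2-dimensional case, the parameter values, and the looser tolerance (1e-3, to keep the run short) are all choices made for this sketch, not the settings used for Figure 3:

```r
## Illustrative stand-in for the gold-standard computation:
## numerically integrate a subgaussian stable density over a
## hypercube with cubature. Uses mvpd::dmvss instead of Robust
## Analysis' dstable, and tol = 1e-3 instead of the paper's 1e-5.
library(cubature)
library(mvpd)

gold <- cubature::hcubature(
  function(x) mvpd::dmvss(x, alpha = 1.7, Q = diag(2))[[1]],
  lowerLimit = rep(-2, 2),
  upperLimit = rep( 2, 2),
  tol = 1e-3)$integral
gold  # probability mass in the (-2, 2) x (-2, 2) square
```

Tightening tol toward 1e-5 reproduces the gold-standard idea at the cost of many more density evaluations.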
We will time three functions using bench::mark()
in different
scenarios. The first function is Robust Analysis’ pmvstable.MC()
(abbreviated as RAMC, below) and the other two are mvpd::pmvss_mc()
(abbreviated as PDMC, below) and mvpd::pmvss()
(abbreviated as PD,
below). Fixing
We calculate the distributions in the hypercube bounded by (-2,2) in all
four dimensions. The gold standard for the
From Figure 3, some high-level conclusions can be drawn: higher pairwise associations require more computational resources and time, increasing the precision requires more computational resources and time, and sometimes the Monte Carlo methods are faster than PD, sometimes not. PD seems to be quite precise and possibly underestimating the gold standard.
Of course, we cannot test every possible instance of alpha and shape matrices for all bounds.

Figure 3 caption: Panels A) and B) display results for pmvstable.MC (RAMC) and pmvss (PD), respectively. Analogously, panels C) and D) display results for an exchangeable shape matrix with pmvstable.MC (RAMC) and pmvss_mc (PDMC), with enough simulated variates to produce a 95% CI width that matches the precision. Each point is an independent call, and the calculated distribution is on the Y-axis vs. the median benchmark time on the X-axis. There are 20 calls per function per scenario. The dotted line is the 1e-3 boundary of the gold standard and the dashed line is the 1e-4 boundary.

QRSVN algorithm

The derivation of the univariate Student's t distribution is commonly
motivated with a ratio of two quantities, each involving random variables: a standard normal divided by the square root of an independent chi-squared random variable over its degrees of freedom. A similar construction underlies mvtnorm::pmvt. The reordering and rotational methodology that makes mvtnorm::pmvt so fast is independent of the part that generates the chi-squared random variable; replacing that part with a positive stable random variable means the algorithm would produce not multivariate Student's t distributions but multivariate subgaussian stable distributions. We implement a modified QRSVN algorithm for multivariate subgaussian stable distributions in a separate package, mvgb, in honor of Genz and Bretz.
mvgb::pmvss
For a given precision, mvgb::pmvss may be much faster than mvpd::pmvss, such as 10 seconds vs. 500 seconds for 4 digits of precision in the following example:
R> set.seed(321)
R> library(mvgb)
R> tictoc::tic()
## probability calculated by mvgb takes about 10 seconds
R> gb_4digits <-
+ mvgb::pmvss(lower = rep(-2,5),
+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ abseps = 1e-4,
+ maxpts = 25000*350)
R> tictoc::toc()
9.508 sec elapsed
R> gb_4digits
[1] 0.6768
## now calculate same probability with similar precision
## in mvpd
R> tictoc::tic()
## probability calculated by mvpd takes about 10 MINUTES
R> pd_4digits <-
+ mvpd::pmvss(lower = rep(-2,5),
+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ abseps.pmvnorm = 1e-6,
+ maxpts.pmvnorm = 25000*1000,
+ abs.tol.si = 1e-4)
R> tictoc::toc()
518.84 sec elapsed
R> pd_4digits[1]
[1] 0.6768
Although currently on CRAN, we include mvgb::pmvss
here as a
proof-of-concept and as an area of future work. More research is needed
into its computational features and accuracy, and preliminary results are promising. Additionally, more research may be
warranted for other R package methodologies that use a multivariate
Gaussian, Cauchy, or Holtsmark distribution to generalize to a
multivariate subgaussian stable distribution (a helpful reviewer
suggested generalizing the multivariate distributions as used in
fHMM
(Oelschläger and Adam 2021) and generalizing the normally mixed probit
model in RprobitB). For
more about elliptically contoured multivariate distributions in general,
consult (Fang and Anderson 1990; Fang et al. 2018).
This work utilized the computational resources of the NIH HPC Biowulf
cluster (http://hpc.nih.gov). We thank Robust Analysis for providing
their stable R
package via a software grant.
mvpd, stabledist, stable, libstableR, alphastable, mvtnorm, mvgb, fHMM, RprobitB
Distributions, Econometrics, Finance
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Swihart & Nolan, "Multivariate Subgaussian Stable Distributions in R", The R Journal, 2022
BibTeX citation
@article{RJ-2022-056,
  author  = {Swihart, Bruce J. and Nolan, John P.},
  title   = {Multivariate Subgaussian Stable Distributions in R},
  journal = {The R Journal},
  year    = {2022},
  note    = {https://rjournal.github.io/},
  volume  = {14},
  issue   = {3},
  isssn   = {2073-4859},
  pages   = {293-302}
}