We introduce and showcase *mvpd* (an acronym for *m*ulti*v*ariate *p*roduct *d*istributions), a package that uses a product distribution approach to calculating multivariate subgaussian stable distribution functions. The family of multivariate subgaussian stable distributions are elliptically contoured multivariate stable distributions that contain the multivariate Cauchy and the multivariate normal distribution. These distributions can be useful in modeling data and phenomena that have heavier tails than the normal distribution (more frequent occurrence of extreme values). Application areas include log returns for stocks, signal processing for radar and sonar data, astronomy, and hunting patterns of sharks.

Multivariate subgaussian stable distributions are the elliptically
contoured subclass of general multivariate stable distributions. To
begin a brief introduction to multivariate subgaussian stable
distributions, we start with univariate stable distributions which may
be more readily familiar and accessible. Univariate stable distributions
are a flexible family and have four parameters, \(\alpha\), \(\beta\),
\(\gamma\), and \(\delta\), and at least eleven parameterizations (!) which
has led to much confusion (Nolan 2020). Here we focus on the
1-parameterization of the Nolan style. Location is controlled by
\(\delta\), scale by \(\gamma \in (0,\infty)\), while \(\alpha \in (0,2]\) and
\(\beta \in [-1,1]\) can be considered shape parameters. Being a
location-scale family, a “standard" stable distribution will be when
\(\gamma=1\) and \(\delta=0\). A solid introduction to univariate stable
distributions can be found in the recent textbook *Univariate Stable
Distributions* (Nolan 2020) and its freely available Chapter 1
online (https://edspace.american.edu/jpnolan/stable/).

Univariate symmetric stable distributions are achieved by setting the
skew parameter \(\beta=0\), which gives symmetric distributions that are
bell-shaped like the normal distribution. A way to remember that these
are called subgaussian is to see that as \(\alpha \in (0,2]\) increases
from 0 it looks more and more normal until it **is** normal for
\(\alpha=2\) (Figure 1). The *sub* in subgaussian
refers to the tail behavior in that the rate of decrease in the tails is
*less* than that of a gaussian – note how the tails are above the
gaussian for \(\alpha<2\) in Figure 1. Equivalently, as
\(\alpha\) decreases, the tails get heavier. A notable value of \(\alpha\)
for subgaussian distributions is \(\alpha=1\) which is the Cauchy
distribution. The Cauchy and Gaussian distribution are most well-known
perhaps because they have closed-form densities, which all other
univariate symmetric stable distributions lack.

Therefore, numerically computing the densities is especially important
for application. For univariate stable distributions, there is
open-source software to compute modes, densities, distributions,
quantiles and random variates, including a number of R packages
(*stabledist*,
*stable*,
*libstableR*, for
example – see CRAN Task View: Probability
*Distributions* for
more).

As generalizations of the univariate stable distribution, multivariate stable distributions are a very broad family encompassing many complicated distributions (e.g. support in a cone, star shaped level curves, etc.). A subclass of this family is the multivariate subgaussian stable distributions. Multivariate subgaussian stable distributions are symmetric and elliptically contoured. Similar to the aforementioned univariate symmetric stable distributions, the value \(\alpha=2\) is the multivariate gaussian and \(\alpha=1\) is the multivariate Cauchy. Being that they are elliptically contoured and symmetric makes them applicable to finance where joint returns have an (approximately) elliptical joint distribution (Nolan 2020). Signal processing, such as with radar and sonar data, tasks itself with filtering impulsive noise from a signal of interest and linear filters in the presence of extreme values tend to underperform, whereas using multivariate stable distributions have been fruitful (Tsakalides and Nikias 1998; Nolan 2013). The (multivariate) Holtsmark distribution is a multivariate subgaussian stable distribution (\(\alpha=1.5\)) that has applications in astronomy, astrophysics, and plasma physics. Lévy flights, which are random walks with steps having a specific type of multivariate subgaussian stable distribution, are used to model interstellar turbulence as well as hunting patterns of sharks (Boldyrev and Gwinn 2003; Sims et al. 2008).

For multivariate subgaussian stable distributions, the parameter \(\alpha\) is a scalar as in the univariate family, while \(\boldsymbol{\delta}\) (location) becomes a \(d\)-dimensional vector and the analogue for the scale parameter is a \(d \times d\) shape matrix \(Q\). The shape matrix \(Q\) needs to be semi-positive definite and is neither a covariance matrix nor covariation matrix. An introduction to multivariate subgaussian stable distributions can be found in Nolan (2013).

Including *mvpd*, the focus
of this paper, if one wanted `R`

functions to interact with multivariate
subgaussian stable distributions they have three `R`

package options.
These packages are compared in Table 1 and detailed
below:

*alphastable*provides random univariate and multivariate generation, density calculation, and parameter fitting (albeit for modest sample sizes) via an EM algorithm method (Teimouri et al. 2018, 2019).*stable*provides support for all stable univariate distributions and multivariate subgaussian stable distributions. Other cases are handled, to varying degrees, such as isotropic, independent, and spectral measure. For the purposes of this paper, we will note that the*stable*package provides random variate generation, density calculation, parameter fitting, distribution calculations via Monte Carlo methods for multivariate subgaussian stable distributions \(2 \leq d \leq 100\). The*stable*package is developed by Robust Analysis and is available for purchase or through a software grant at http://www.robustanalysis.com/. It is distinct from the univariate*stable*package on CRAN authored by Jim Lindsey.*mvpd*provides random variate generation, density calculation, parameter fitting, distribution function calculations via Monte Carlo methods, as well as an integrated method for distribution calculations that allows tolerance specification.

Functionality | `alphastable` |
`stable` |
`mvpd` |
---|---|---|---|

random variates | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

parameter estimation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

density | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

cumulative distribution (monte carlo) | x | \(\checkmark\) | \(\checkmark\) |

cumulative distribution (integrated) | x | x | \(\checkmark\) |

multivariate subgaussian stable | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

multivariate independent stable | x | \(\checkmark\) | x |

multivariate isotropic stable | x | \(\checkmark\) | x |

multivariate discrete-spectral-measure stable | x | \(\checkmark\) | x |

While the lack of a tractable density and distribution function impedes
directly calculating multivariate subgaussian stable distributions, it
is possible to represent them in terms of a product distribution for
which each of the two distributions in the product has known numerical
methods developed and deployed (in `R`

packages on CRAN). This paper
utilizes this approach. The next section covers some product
distribution theory.

This section reviews some known results of *product distributions* and
describes our notation. Allow univariate positive random variable \(A\)
with density \(f_A(x)\) and \(d\)-dimensional random variable G to have
density \(f_G(\mathbf{x})\) and distribution function
\(F_G(\mathbf{v}, \mathbf{w}) = P( \mathbf{v} < \mathbf{x} \leq \mathbf{w})\).
Consider the \(d-\)dimensional product \(H=A^{1/2} G\). From standard
product distribution theory we know the density \(f_H\) is represented by
1-dimensional integral:

\[\begin{aligned} f_{H}(\mathbf{x}) &=& \int_0^{\infty} f_B(u) f_G(\mathbf{x} / u) \frac{1}{|u|^d} du \,, \label{eq:density:general} \end{aligned} \tag{1}\] where \(B=A^{1/2}\) so that \(f_B(x) \coloneqq 2 x f_A(x^2)\). Consequently the distribution function \(F_H\) (with lower bound \(\mathbf{v}\) and upper bound \(\mathbf{w}\)) of the r.v. \(H\) is represented by

\[\begin{aligned} F_{H}(\mathbf{v}, \mathbf{w}) &=& \int_0^{\infty} f_B(u) \int_{v_1}^{w_1} \dots \int_{v_d}^{w_d} f_G(\mathbf{t} / u) \frac{1}{|u|^d} dt_1 \dots dt_d du \,, \\ &=& \int_0^{\infty} f_B(u) \int_{v_1 / u}^{w_1 / u} \dots \int_{v_d / u}^{w_d / u} f_G(\mathbf{t}) dt_1 \dots dt_d du \,, \nonumber \\ &=& \int_0^{\infty} f_B(u) F_G(\mathbf{v} / u, \mathbf{w}/u) du. \label{eq:distribution:general} \end{aligned} \tag{2}\] Take note of the representation in ((1)) and ((2)). From a practical standpoint, if we have a (numerical) way of calculating \(f_A\), \(f_G\), and \(F_G\) we can calculate \(f_H\) and \(F_H\). Different choices can be made for \(f_A\), \(f_G\), and \(F_G\) in this general setup. The choices required for multivariate subgaussian stable distributions are covered in the following Implementation section.

From Nolan (2013), \(H\) is a \(d\)-dimensional subgaussian stable distribution if \(A\) is a positive univariate stable distribution \[A \sim S\left( \frac{\alpha}{2}, 1, 2 \cos \left( \frac{\pi \alpha}{4} \right)^{\left(\frac{2}{\alpha}\right)} , 0; 1\right)\] and \(G\) is a \(d\)-dimensional multivariate normal \(G \sim MVN( 0, Q)\), where the shape matrix \(Q\) is positive semi-definite. This corresponds to Example 17 in Hamdan (2000).

Using the aforementioned theory of product distributions, we can arrive
at functions for a multivariate subgaussian stable density and
distribution function thanks to established functions for univariate
stable and multivariate normal distributions. A key package in the
implementation of multivariate subgaussians in R is
*mvtnorm*
(Genz and Bretz 2009; Genz et al. 2020). In the basic
product-distribution approach of
*mvpd*, \(f_G\) and \(F_G\) are
`mvtnorm::dmvnorm`

and `mvtnorm::pmvnorm`

respectively. Allow the
density of \(A\), \(f_A\) (to be numerically calculated in `R`

) using
`stable::dstable`

or `libstableR::stable_pdf`

(Royuela-del-Val et al. 2017).
Presented as pseudo-code:

- \(f_A(x, \alpha) \coloneqq \texttt{libstableR::stable\_pdf}(x, \texttt{pars = } \left(\frac{\alpha}{2}, 1, 2 \cos \{ \frac{\pi \alpha}{4} \}^{\left(\frac{2}{\alpha}\right)} , 0\right); \texttt{pm = 1})\)
- \(f_B(x, \alpha) \coloneqq 2 x f_A(x^2, \alpha)\)
- \(f_{H}(\mathbf{x}, \alpha, Q) = \int_0^{\infty} f_B(u; \alpha) \times \texttt{mvtnorm::dmvnorm}(\mathbf{x} / u, {\rm sigma}= Q) \frac{1}{|u|^d} du\)
- \(F_{H}(\mathbf{v}, \mathbf{w}, \alpha, Q) = \int_0^{\infty} f_B(u; \alpha) \times \texttt{mvtnorm::pmvnorm}({\rm lower} =\mathbf{v}/u, {\rm upper}=\mathbf{w}/u, {\rm sigma}=Q) du\)

The (outermost) univariate integral is numerically evaluated with
`stats::integrate`

.

We present an outline of the `[dpr]mvss`

(*m*ulti*v*ariate *s*ubgaussian
*s*table) functions, and walk through the code in the subsequent
sections. As an overview, we generate 5000 5-dimensional subgaussian
variates with \(\alpha=1.7\) and an “exchangeable” shape matrix using
`rmvss`

. We then recover the parameters with an illustrative call to
`fit_mvss`

. We can calculate the density (`dmvss`

) at the center of the
distribution and get a quick estimate of the distribution between -2 and
2 for each member of the 5-dimensional variate using `pmvss_mc`

. We
investigate a refinement of that quick distribution estimate using
`pmvss`

.

`rmvss`

We’ll generate 5000 \(5\)-dimensional subgaussian random variates with a specified \(\alpha\) and shape matrix. They are pictured in Figure 2. In the next section we will fit a distribution to these.

```
> library(mvpd)
R## reproducible research sets the seed
> set.seed(10)
R## specify a known 5x5 shape matrix
> shape_matrix <- structure(c(1, 0.9, 0.9, 0.9, 0.9,
R0.9, 1, 0.9, 0.9, 0.9,
0.9, 0.9, 1, 0.9, 0.9,
0.9, 0.9, 0.9, 1, 0.9,
0.9, 0.9, 0.9, 0.9, 1),
.Dim = c(5L, 5L))
## generate 5000 5-dimensional random variables
## with alpha = 1.7 and shape_matrix
> X <- mvpd::rmvss(n = 5000, alpha = 1.7, Q = shape_matrix)
R## plot all pairwise scatterplots (Figure 2)
> copula::pairs2(X) R
```

The ability to simulate from a distribution is useful for running simulations to test different scenarios about the phenomena being modeled by the distribution, as well as in this case, to generate a dataset with a known shape matrix and alpha to show our fitting software (next section) can recover these parameters. Our quick start code begins with generating a dataset from a known alpha and shape matrix. However, often a practitioner might start with a dataset from which parameters are estimated and then random samples can be generated from the distribution specified with those parameters to learn more about the data generating distribution and the behavior of the phenomena.

`fit_mvss`

If you have data in a \(n \times d\) matrix \(\boldsymbol{X}\) and want to fit a
\(d\)-dimensional multivariate subgaussian distribution to those data,
then `fit_mvss`

will return estimates of the parameters using the method
outlined in (Nolan 2013). The method involves fitting
univariate stable distributions for each column and assessing the
resulting \(\alpha\), \(\beta\) and \(\delta\) parameters. The column-wise
\(\alpha\) estimates should be similar and the column-wise \(\beta\)
estimates close to 0. This column-wise univariate fitting is carried out
by `libstableR::stable_fit_mle2d(W, parametrization = 1L)`

and the off
diagonal elements can be found due to the properties of univariate
stable distributions (see (Nolan 2013)). For your
convenience, the univariate column-wise estimates of \(\alpha\), \(\beta\),
\(\gamma\) and \(\delta\) are returned in addition to the raw estimate of
the shape matrix and the nearest positive definite shape matrix (as
computed by `Matrix::nearPD`

applied to the raw estimate).

```
## take X from previous section and estimate
## parameters for the data generating distribution
> fitmv <- mvpd::fit_mvss(X)
R> fitmv
R$univ_alphas
1] 1.698617 1.708810 1.701662 1.696447 1.699372
[
$univ_betas
1] -0.02864287 -0.04217262 -0.08444540 -0.06569907 -0.03228573
[
$univ_gammas
1] 1.016724 1.000151 1.008055 1.012017 1.002993
[
$univ_deltas
1] -0.03150732 -0.06525291 -0.06528644 -0.07730645 -0.04539796
[
$mult_alpha
1] 1.700981
[
$mult_Q_raw
1] [,2] [,3] [,4] [,5]
[,1,] 1.0337276 0.9034599 0.8909654 0.8937814 0.8647089
[2,] 0.9034599 1.0003026 0.9394846 0.9072368 0.8535091
[3,] 0.8909654 0.9394846 1.0161748 0.8929937 0.9037467
[4,] 0.8937814 0.9072368 0.8929937 1.0241777 0.9281714
[5,] 0.8647089 0.8535091 0.9037467 0.9281714 1.0059955
[
$mult_Q_posdef
1] [,2] [,3] [,4] [,5]
[,1,] 1.0337276 0.9034599 0.8909654 0.8937814 0.8647089
[2,] 0.9034599 1.0003026 0.9394846 0.9072368 0.8535091
[3,] 0.8909654 0.9394846 1.0161748 0.8929937 0.9037467
[4,] 0.8937814 0.9072368 0.8929937 1.0241777 0.9281714
[5,] 0.8647089 0.8535091 0.9037467 0.9281714 1.0059955 [
```

An alternative for fitting this distribution is
`alphastable::mfitstab.elliptical(X, 1.70, shape_matrix, rep(0,5))`

and
takes 8 minutes (and requires initial values for alpha, the shape
matrix, and delta). This analysis with `fit_mvss(X)`

took under 2
seconds. For a run of `n=1e6, d=20`

, `fit_mvss`

scales well, taking 60
minutes.

Once the distribution has been fitted, `fitmv$mult_alpha`

,
`fitmv$mult_Q_posdef`

, and `fitmv$univ_deltas`

, can be used as the
`alpha`

, `Q`

, and `delta`

arguments, respectively, in calls to `dmvss`

to calculate densities and `pmvss_mc`

or `pmvss`

to calculate
probabilities. They could also be passed to `rmvss`

to generate random
variates for simulations.

`dmvss`

We can calculate the density at the center of the distribution.

```
## density calculation
> mvpd::dmvss(x = fitmv$univ_deltas,
R+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas)[1]
$value
1] 0.1278952 [
```

`pmvss_mc`

The method of calculating the distribution by Monte Carlo relies on the ability to produce random variates quickly and then calculate what proportion of them fall within the specified bounds. To generate multivariate subgaussian stable variates, a scalar A is drawn from \[\texttt{libstableR::stable\_rnd}(n, \texttt{pars = } \left(\frac{\alpha}{2}, 1, 2 \cos \{ \frac{\pi \alpha}{4} \}^{\left(\frac{2}{\alpha}\right)} , 0\right); \texttt{pm = 1})\] and then the square-root of \(A\) multiplied by a draw \(G\) from \[\texttt{mvtnorm::rmvnorm}(n, {\rm sigma}=Q).\] This allows for quick calculations but to increase precision requires generating larger number of random variates. For instance, if we wanted the distribution between -2 and 2 for each dimension, we could generate 10,000 random variates and then see how many of them fall between the bounds. It looks like 6,820 variates were within the bounds:

```
## first-run of pmvss_mc
> mvpd::pmvss_mc(lower = rep(-2,5),
R+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ n = 10000)
1] 0.6820 [
```

We run it again and the answer changes:

```
## second-run of pmvss_mc
> mvpd::pmvss_mc(lower = rep(-2,5),
R+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ n = 10000)
1] 0.6742 [
```

With the Monte Carlo method, precision is not specified and no error is calculated. The next section introduces how to use the integrated distribution function \(F_H\) from product theory and specify precision.

`pmvss`

There are three inexact entities involved in the distribution
calculation \(F_H\) as found in `pmvss`

: the numerically calculated \(F_G\),
the numerically calculated \(f_A\), and the outer numerical integration.

The outer integral by `integrate`

assumes the integrand is calculated
without error, but this is not the case. See the supplementary materials
section “Thoughts on error propagation in `pmvss`

” for justification and
guidance for specifying the values of `abs.tol.si`

, `abseps.pmvnorm`

,
and `maxpts.pmvnorm`

. The first of these three arguments is passed to
the `abs.tol`

argument of `stats::integrate`

and controls the absolute
tolerance of the numerically evaluated outer 1-dimensional integral. The
remaining two are passed to `maxpts`

and `abseps`

of
`mvtnorm::GenzBretz`

and control the accuracy of `mvtnorm::pmvnorm`

.

Briefly, our experience suggests that to be able to treat `abs.tol.si`

as the error of the result, `abseps.pmvnorm`

should be 1e-2 times
smaller than the specified `abs.tol.si`

which may require a multiple of
the default 25000 default of `maxpts.pmvnorm`

– which will lead to more
computational intensity and longer computation times as demonstrated
below (as conducted on Macbook Intel Core i7 chip with 2.2 GHz):

```
## abs.tol.si abseps.pmnvorm maxpts Time
## 1e-01 1e-03 25000
## 1e-02 1e-04 25000*10 3 sec
## 1e-03 1e-05 25000*100 22 sec
## 1e-04 1e-06 25000*1000 4 min
## 1e-05 1e-07 25000*10000 26 min
## 1e-06 1e-08 25000*85000 258 min
```

With this in mind, the output from the Quick Start code is:

```
## precision specified pmvss
> mvpd::pmvss(lower = rep(-2,5),
R+ upper = rep( 2,5),
+ alpha = fitmv$mult_alpha,
+ Q = fitmv$mult_Q_posdef,
+ delta = fitmv$univ_deltas,
+ abseps.pmvnorm = 1e-4,
+ maxpts.pmvnorm = 25000*10,
+ abs.tol.si = 1e-2)[1]
$value
1] 0.6768467 [
```

Both `pmvss`

and `pmvss_mc`

take infinite limits. Since `pmvss_mc`

calculates how many random variates \(H_i,~ i \in \{1,\dots,n\}\) are
within the bounds, `pmvss`

might be preferred to `pmvss_mc`

when
calculating the tails of the distribution, unless \(n\) is made massively
large.

We provide a sense of accuracy and computational time trade-offs with a
modest simulation experiment (Figure 3, see
supplementary materials for code). Estimating these distributions is
inherently difficult – difficult in the sense that expecting accuracy
farther out than the 5th decimal place for distribution functions is
unreasonable. Therefore, we will define our “gold standard" targets for
accuracy evaluation as the numerical density produced by Robust
Analysis’ `dstable`

integrated by `cubature::hcubature()`

with tolerance
`tol=1e-5`

.

We will time three functions using `bench::mark()`

in different
scenarios. The first function is Robust Analysis’ `pmvstable.MC()`

(abbreviated as RAMC, below) and the other two are `mvpd::pmvss_mc()`

(abbreviated as PDMC, below) and `mvpd::pmvss()`

(abbreviated as PD,
below). Fixing \(\alpha=1.7\) and dimension \(d=4\), the different test
scenarios will involve a low level of pairwise association vs. a high
level in a shape matrix of the form:

- \(Q_{exch}= \begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}\) for \(\rho = 0.1\) and \(\rho=0.9\).

We calculate the distributions in the hypercube bounded by (-2,2) in all four dimensions. The gold standard for the \(\rho=0.1\) case was 0.5148227 and 0.7075104 for the \(\rho=0.9\) case. The numerical integration of the former took 3 minutes whereas the latter took 1 hour – which portends that higher associations involve more computational difficulty. We back-calculated the number of samples needed to give the methods involving Monte Carlo (RAMC and PDMC) a 95% CI width that would fall within 0.001 and 0.0001 of the gold standard, and display the scatter plots of estimate and computational time in (Figure 3).

From Figure 3, some high-level conclusions can be drawn: higher pairwise associations require more computational resources and time, increasing the precision requires more computational resources and time, and sometimes the Monte Carlo methods are faster than PD, sometimes not. PD seems to be quite precise and possibly underestimating the gold standard.

Of course, we cannot test every possible instance of alpha and shape matrices for all \(d\) dimensions, integration limits, and specified precision. In our experience, the computational intensity is an interplay between alpha, the integration limits, the shape matrix structure, delta, and the requested precision. We provide the code that we used for our simulation study and encourage the readers who need to explore these issues for their particular integral to edit the code accordingly.