Convenient and easy-to-use programs are readily available in R to simulate data from and probability calculations for several common multivariate distributions such as normal and
A
The objective of our work is to enrich the existing R packages for
supporting simulation and computations for nonnormal continuous
multivariate distributions. To the best of our knowledge and also from
the list of packages for multivariate distributions in CRAN
(https://cran.r-project.org/web/views/Distributions.html), there is no
package that provides both simulation and probability computations for
multivariate Pareto distribution. In our package, we provide functions
for doing so to both Lomax (Pareto type II) and Mardia’s Pareto type I
distributions. For multivariate logistic distribution, package
VGAM (Yee 2019) implements
the bivariate logistic distribution while we support
This paper is organized as follows. In Section 2, we
provide simulation algorithms for generating data from
Multivariate Lomax distribution can be derived as the probability
distribution of a
Theorem 2.1: Conditioned on fixed mixing parameter
In view of the above result, we implement the simulation from
Algorithm-ML
Generate a random number
With
To obtain a random sample of size
Nayak (1987) also generalized this distribution by mixing conditionally
independent
Algorithm-GML
Generate a random number
With
To obtain a random sample of size
Both algorithms are easily implemented using R
stats (R Core Team 2019) functions
rexp()
and rgamma()
, respectively, for generating univariate
exponential and gamma random variates. In the following, we describe
approaches to generate other distributions related to multivariate Lomax
and generalized multivariate Lomax distributions.
Nayak (1987) has also discussed the inter-relationships between many
other multivariate distributions and generalized multivariate Lomax
distribution. In view of these inter-relationships, the above algorithm
can accordingly be amended to simulate data from these distributions - a
task which can be quite difficult to accomplish directly. These
inter-relationships are described in Table 1. For
convenience, we assume
Multivariate | Transformation/ | Probability Density Function |
Distribution | Parameter Substitutions | |
Lomax | ||
Mardia’s | ||
Pareto Type I | ||
Logistic | ||
Burr | ||
Cook-Johnson’s | ||
uniform | ||
with degrees of freedom | ||
Inverted Beta | ||
The multivariate
Figure 1 pictorially summarizes the relationships between (generalized) Lomax and other distributions.
Here, we give details of computations of cumulative probability
distribution function (
The multivariate Lomax distribution has an explicit closed-form
expression for the joint survival function given by ((2)).
The survival or cumulative distribution function of other related
distributions can be obtained either directly or through appropriate
transformations. We summarize these explicit expressions of cumulative
distribution function
Multivariate Distribution | Cumulative Distribution Function | Survival Function |
---|---|---|
Lomax | No closed form | |
Mardia’s Pareto Type I | No closed form | |
Logistic | No closed form | |
Burr | No closed form | |
Cook-Johnson’s Uniform | No closed form | |
For the cumulative distribution functions or survival functions with no
closed-form expressions, we rely on the following useful formulas
(Joe 1997):
The equation
uniroot()
, which is used for finding one
dimensional root.
For generalized multivariate Lomax distribution and its related
distributions, explicit expressions of the cumulative distribution
function and survival function are not available. Thus, we obtain the
cumulative probabilities through multiple integral in
((8)) below over the unit cube hcubature()
in package
cubature
(Narasimhan et al. 2018).
The following result is used for the computation of the cumulative
distribution function for the generalized Lomax distribution.
Property 3.1: Let
Proof of the above is straightforward by making the substitutions
Through parameter substitutions, the cumulative distribution functions
of multivariate
Distribution | Parameter | Cumulative Distribution Function |
Substitutions | ||
Multivariate |
||
with degrees of freedom | ||
Multivariate | ||
Inverted Beta |
For the above method, the run-time consumption rapidly increases as
Algorithm - CDF Computation using Monte Carlo Method:
Generate
Compute
Step 1 above is readily carried out by the random numbers generation as
described in Section 2 using the package NonNorMvtDist.
Since
We also add in our package the calculations of joint probability density
function - Being self-explanatory with all dmv*()
.
We will illustrate here the functions and corresponding arguments for
NonNorMvtDist. The calling sequences include probability density
calculation (dmv*
), cumulative distribution calculation (pmv*
),
equicoordinate quantile calculation (qmv*
), random numbers generation
(rmv*
), and survival function calculation (smv*
) for each of the
multivariate distributions introduced in Section 2. For
each distribution, we consider the bivariate case
For example, for the bivariate Lomax distribution
dmvlomax(x, parm1 = 5, parm2 = (0.5,1))
pmvlomax(q, parm1 = 5, parm2 = (0.5,1))
qmvlomax(p, parm1 = 5, parm2 = (0.5,1))
rmvlomax(n, parm1 = 5, parm2 = (0.5,1))
smvlomax(q, parm1 = 5, parm2 = (0.5,1))
It may be mentioned that our approach may be more efficient than the NORTA method (Ghosh and Henderson 2002) for simulation in that NORTA always first requires simulation from multivariate normal, which are then transformed to multivariate uniform. Only often this step, one could subsequently transform the simulated data to the desired distribution. Consequently, for large dimensions, the approach requires more computing power and time (in fact, to simulate data from the normal distribution, many programs themselves first require random number generations from uniform distributions, from which normal random numbers are obtained).
For each bivariate distribution, we provide the density surface plots
along with contours using the function persp3D()
from the package
plot3D (Soetaert 2017).
To illustrate, we define function dplot2()
, and we pass appropriate
density functions to the density function argument dfun
to create the
density surfaces. These are summarized in the digital complement along
with corresponding resulting plots. For the Lomax distribution with
parameters indicated previously, the statements will be
library(plot3D)
dplot2 <- function(dfun, x1, x2, zlim) {
zmat <- matrix(0, nrow = length(x1), ncol = length(x2))
for (i in 1:length(x1)) {
for (j in 1:length(x2)) {
zmat[i, j] = dfun(x = c(x1[i], x2[j]))
}
}
persp3D(z = zmat, x = x1, y = x2, theta = -60, phi = 10, ticktype = "detailed",
zlim = zlim, contour = list(nlevels = 30, col = "red"),
facets = FALSE, image = list(col = "white", side = "zmin"),
xlab = "X1", ylab="X2", zlab = "Density", expand = 0.5, d = 2)
}
dplot2(dfun = function(x) dmvlomax(x, parm1 = 5, parm2 = c(0.5, 1)), x1 = seq(0, 4, 0.1),
x2 = seq(0, 4, 0.1), zlim = c(-5, 13))
The plot that results is shown in Figure 2.
The following code illustrates the use of the function rmvlomax*()
with a bivariate sample of size set.seed(2019)
in advance. The digital complement explicitly provides
the code as well as output for all of the probability distributions
discussed here. In the output, each row represents a bivariate
observation.
Bivariate Lomax:
> set.seed(2019)
> rmvlomax(n = 2, parm1 = 5, parm2 = c(0.5, 1))
[,1] [,2]
[1,] 1.0174406 0.7076480
[2,] 0.3686253 0.7826978
The applications of pmv*()
, survival function smv*()
, and
equicoordinate quantile function qmv*()
are straightforward and follow
the same pattern earlier. See the digital complement for computation
details. In the following, we give code as well as output only for Lomax
distribution (
Bivariate Lomax:
> pmvlomax(q = c(1, 0.5), parm1 = 5, parm2 = c(0.5, 1))
[1] 0.7678755
> smvlomax(q = c(1, 0.5), parm1 = 5, parm2 = c(0.5, 1))
[1] 0.03125
> qmvlomax(p = 0.5, parm1 = 5, parm2 = c(0.5, 1))
[1] 0.3928917
We assess the run-times for the computation of probability (pmv*
),
equicoordinate quantile function (qmv*
) for multivariate Lomax, and
generalized multivariate Lomax distributions for reference. The survival
function (smv*
) of multivariate Lomax distribution has a closed-form
expression, and hence the assessment of computation time is omitted. We
have used the computer with Intel Core i5-8250U CPU and 8.00 GB RAM. The
results for pmvlomax()
are quit short, even for the dimension qmvlomax()
requires a considerable longer time when
pmvlomax() |
qmvlomax() |
pmvlomax() |
qmvlomax() |
|||
---|---|---|---|---|---|---|
1 | 0.01695895 | 0.03702188 | 11 | 0.04889703 | 0.7992568 | |
2 | 0.01795197 | 0.02293706 | 12 | 0.09025002 | 1.879766 | |
3 | 0.01794696 | 0.02789807 | 13 | 0.1545889 | 3.087925 | |
4 | 0.01795197 | 0.03290296 | 14 | 0.245369 | 5.950396 | |
5 | 0.01976085 | 0.04189205 | 15 | 0.4657819 | 11.37162 | |
6 | 0.01995182 | 0.06582808 | 16 | 0.853502 | 22.04591 | |
7 | 0.02094102 | 0.098768 | 17 | 1.708418 | 45.51909 | |
8 | 0.02197218 | 0.146611 | 18 | 2.803703 | 90.70764 | |
9 | 0.02396512 | 0.2393882 | 19 | 5.453189 | 181.55772 | |
10 | 0.04587412 | 0.4296861 | 20 | 10.40903 | 351.58896 |
The results for generalized pmvglomax()
and
smvglomax()
require relatively much longer time when qmvglomax()
takes longer when MC
for larger dimensions (e.g.,
when
pmvglomax() |
smvglomax() |
||||
numerical |
MC |
numerical |
MC |
||
1 | 0.03395414 | 4.600386 | 0.03084397 | 3.977374 | |
2 | 0.02397585 | 6.536522 | 0.05378199 | 5.510087 | |
3 | 0.101758 | 8.396577 | 0.208086 | 7.229721 | |
4 | 0.9758801 | 10.64278 | 2.13447 | 8.741386 | |
5 | 16.43411 | 19.30024 | 40.11997 | 9.762063 | |
6 | 305.50092 | 19.57221 | 1344.1908 | 11.63218 |
qmvglomax() |
||
numerical |
MC |
|
1 | 0.4144461 | 12.5594 |
2 | 6.383504 | 18.97528 |
3 | 96.86238 | 27.23917 |
4 | 2929.5018 | 30.62036 |
We also include the maximum likelihood estimation to estimate the
parameters for various Lomax related distributions (except for the
bivariate optim()
,
constrOptim()
, and optimize()
in this work. The functionality of
these optimization functions is described in Table 7.
Function | Number of parameters | Usage |
---|---|---|
optim() |
Multiple | General-purpose Optimization |
constrOptim() |
Multiple | Optimization with linear constraints |
optimize() |
Single | One Dimensional Optimization |
By default, all these functions perform the task of minimization of a
function. To maximize ((9)), we only need to add argument
control = list(fnscale = -1)
in functions optim()
and
constrOptim()
, and set maximum = TRUE
in function optimize()
.
For example, for the multivariate Lomax distribution, we define the
log-likelihood function loglik.lomax()
by using the following code:
loglik.lomax <- function(data, par) {
ll <- sum(dmvlomax(data, parm1 = par[1], parm2 = par[-1], log = TRUE))
}
The R stats function constrOptim()
is chosen to obtain the maximizer
of the log-likelihood function (or equivalently, loglik.lomax()
). The
linear constraints imposed on the parameters ui
is set as an
identity matrix diag(3)
, and
constraint vector ci
is set as a zero vector rep(0, 3)
. For our
illustration, let the data be the one simulated by the methods described
earlier, each with
set.seed(1)
bvtlomax <- rmvlomax(n = 300, parm1 = 5, parm2 = c(0.5, 1))
The starting initial values for rep(10, 3)
to the argument theta
in function
constrOptim()
. The gradient argument grad
is optional, and we have
chosen this to be NULL
.
> est = constrOptim(theta = rep(10, 3), f = loglik.lomax, grad = NULL, data = bvtlomax,
+ ui = diag(3), ci = rep(0, 3), control = list(fnscale = -1))
> est$convergence
[1] 0
> est$par
[1] 5.0555691 0.4468724 0.9036692
The output consists of two important pieces of information, (i) whether
convergence is successfully achieved (est$convergence = 0
) or not
(est$convergence = 1
) and (ii) the values of the final maximum
likelihood estimates. In our illustration, the convergence is
successfully achieved, and we have
We summarize the optimization methods, constraints, and the resulting
outputs for all the bivariate distributions (except for bivariate
Multivariate | Parameters | Optimization | Constraints | Estimated Parameters |
Distribution | Method | |||
Lomax | constrOptim() | |||
Mardia’s | constrOptim() | |||
Pareto Type I | ||||
Logistic | optim() | N/A | ||
Burr | constrOptim() | |||
Cook-Johnson’s Uniform | optimize() | |||
Generalized | constrOptim() | |||
Lomax | ||||
Inverted Beta | constrOptim() | |||
In this last section, we give two brief applications, which not only demonstrate the use but also confirm the accuracy and verify the correctness of our work.
Cook-Johnson’s multivariate uniform distribution is a family of
distributions that can be used for modeling nonelliptical symmetric
data. Further, in view of uniform distribution for marginal, it has been
as one of the useful choices for modeling through copula (in fact,
Cook-Johnson’s uniform distribution is indeed a Clayton copula
(Nelsen 2006)). The value of parameter
set.seed(1)
biv.unif <- rmvunif(8000, parm = 2, dim = 2)
biv.norm <- as.data.frame(apply(biv.unif, 2, qnorm))
The sample correlation coefficient biv.norm
is
computed by the following code.
> cor(biv.norm$V1, biv.norm$V2)
[1] 0.3180119
We create a bivariate scatter plot using the function ggplot()
in
package ggplot2
(Wickham 2016) for data set biv.norm
. This is shown in Figure
3 (a).
library(ggplot2)
ggplot(biv.norm, aes(x = V1, y = V2)) + xlim(c(-4, 4)) +
ylim(c(-4, 4)) + xlab("X1") + ylab("X2") + geom_point()
To assess the behavior as a function of
|
|
|
|
|
|
|
|
Multivariate qmvf()
by setting the argument corresponding to df = c(m, n, ..., n)
. The following code
gives algorithm = "numerical"
). With Monte Carlo algorithm
(algorithm = "MC"
with nsim=1,000,000
), we obtain
> qmvf(0.95, df = c(5, 1, 1))
[1] 9.551505
> qmvf(0.95, df = c(5, 1, 1), algorithm = "MC")
[1] 9.550944
For further demonstration and also to further affirm our trust in the
calculations, we compare the output of quantile function qmvf()
using
both adaptive multivariate integration and Monte Carlo methods with the
values given in Armitage and Krishnaiah (1964). These three calculations are
reported in Table 9 for a few choices of
df |
qmvf() Output |
qmvf() Output |
Tabulated Values in | |
(algorithm = "numerical" ) |
(algorithm = "MC" ) |
Armitage and Krishnaiah (1964, pp. 33-42) | ||
9.551505 | 9.550944 | 9.55 | ||
7.879999 | 7.881698 | 7.88 | ||
7.136473 | 7.165361 | 7.14 | ||
6.702224 | 6.715759 | 6.70 | ||
6.412372 | 6.399167 | 6.41 | ||
3.899335 | 3.898442 | 3.90 | ||
3.768494 | 3.767915 | 3.77 | ||
3.665646 | 3.661366 | 3.67 | ||
3.582271 | 3.583923 | 3.58 | ||
3.513163 | 3.514059 | 3.51 |
We have developed a new R package, NonNorMvtDist, for generating
multivariate random numbers from Lomax (Pareto type II), generalized
Lomax, Mardia’s Pareto type I, logistic, Burr, Cook-Johnson’s uniform,
The fact that nonnormal and skewed multivariate distributions are common in the real world but are rarely pursued for analysis due to the lack of ready-to-use computational support underscores the importance of this package. Possibilities of the use of these distributions are practically limitless and yet unforeseen in a variety of areas, starting from the biomedical sciences, reliability, and engineering as well as in statistical finance in the contexts of volatility estimation. Simulations, probability calculations, as well as calculations of quantiles, and the maximum likelihood estimation of parameters are the natural first set of computations in such studies. We have addressed all of these in this work.
The calculations of probabilities of hypercubes (for example, of
pmv*()
can be suitably modified for this
purpose. The probability density surface plots for
Content of this article constitutes a part of the PhD dissertation of Zhixin Lun at Oakland University. Financial research support from Oakland University during the summer of Year 2019 is acknowledged. We would like to thank the anonymous referees for their helpful comments and remarks to improve both package and this paper.
NonNorMvtDist, VGAM, stats, cubature, plot3D, ggplot2
Distributions, Econometrics, Environmetrics, ExtremeValue, NumericalMathematics, Phylogenetics, Psychometrics, Spatial, Survival, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Lun & Khattree, "An R package for Non-Normal Multivariate Distributions: Simulation and Probability Calculations from Multivariate Lomax (Pareto Type II) and Other Related Distributions", The R Journal, 2021
BibTeX citation
@article{RJ-2021-090, author = {Lun, Zhixin and Khattree, Ravindra}, title = {An R package for Non-Normal Multivariate Distributions: Simulation and Probability Calculations from Multivariate Lomax (Pareto Type II) and Other Related Distributions}, journal = {The R Journal}, year = {2021}, note = {https://rjournal.github.io/}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {427-440} }