An R package for Non-Normal Multivariate Distributions: Simulation and Probability Calculations from Multivariate Lomax (Pareto Type II) and Other Related Distributions

Convenient and easy-to-use programs are readily available in R to simulate data from and probability calculations for several common multivariate distributions such as normal and t. However, functions for doing so from other less common multivariate distributions, especially those which are asymmetric, are not as readily available, either in R or otherwise. We introduce the R package NonNorMvtDist to generate random numbers from multivariate Lomax distribution, which constitutes a very flexible family of skewed multivariate distributions. Further, by applying certain useful properties of multivariate Lomax distribution, multivariate cases of generalized Lomax, Mardia’s Pareto of Type I, Logistic, Burr, Cook-Johnson’s uniform, F, and inverted beta can be also considered, and random numbers from these distributions can be generated. Methods for the probability and the equicoordinate quantile calculations for all these distributions are then provided. This work substantially enriches the existing R toolbox for nonnormal or nonsymmetric multivariate probability distributions.


Introduction
A k-dimensional multivariate Lomax (Pareto Type II) probability distribution was first introduced by (Nayak, 1987) as a joint distribution of k skewed nonnegative random variables X 1 , · · · , X k with joint probability density function given by f (x 1 , . . . , x k ) = ∏ k i=1 θ i a(a + 1) · · · (a + k − 1) We will denote above density function by ML k (a; θ 1 , . . . , θ k ). Prior to Nayak (1987), the bivariate case of multivariate Lomax distribution was studied by Lindley and Singpurwalla (1986). Nayak (1987) indicated that the k-dimensional multivariate Lomax distribution could be obtained by mixing k independent univariate exponential distributions with different failure rates with the mixing parameter η that has a gamma distribution with certain shape parameter a and the scale parameter 1. This fact readily provides an approach to simulate the multidimensional random vectors from the multivariate Lomax distribution. The multivariate Lomax distribution is also transformable to many other useful multivariate distributions, and therefore, simulations from these distributions are also easily accomplished. Similarly, with appropriate transformations or reparameterizations (or otherwise directly from the probability density function (pd f )), we can also accomplish the cumulative probability calculations as well as the calculation of equicoordinate quantiles. The objective of this work is to formalize all of the above and to provide a ready-to-use R package titled NonNorMvtDist for practitioners to efficiently execute the same. See Lun and Khattree (2020).
numbers generation from Mardia's multivariate Pareto Type I, Logistic, Burr, Cook-Johnson's uniform, and F are achieved. Based on the remarks from Balakrishnan and Lai (2009), we then extend Nayak's work to simulate data from the multivariate inverted beta distribution. In Section Probability computations, we discuss numerical computations of cumulative distribution functions of equicoordinate quantiles and survival functions for the above distributions. In Section Illustrations of simulations and probability calculations, we illustrate the use of respective functions as implemented in R for each of the above distributions for the bivariate (k = 2) case. In Section Computation times, we provide a run-time study to assess the computation times for functions as the dimension of data increases. In Section Maximum likelihood estimation of parameters, we implement maximum likelihood estimation of parameters for these distributions. In Section Two applications, we give two applications of package NonNorMvtDist, namely, (i) data generation from certain nonelliptical symmetric multivariate distributions with univariate normal marginals and (ii) computation of critical values of the multivariate F distribution. Section Concluding remarks includes some concluding remarks, pointing out other applications.

Multivariate Lomax and related distributions
Multivariate Lomax distribution can be derived as the probability distribution of a k-component system where k independent exponential random variables have a common environment or mixing parameter following a gamma distribution with shape parameter a and scale parameter b. Let the corresponding random vector be X = (X 1 , · · · , X k ) ′ . The probability density function of X is given by (1), and the joint survival function of X is Specifically, the pivotal result that we use is given by the following theorem (see Nayak (1987)), Theorem 2.1: Conditioned on fixed mixing parameter η, representing the environment effect, let X 1 , · · · , X k be independent exponentially distributed random variables with failure rates ηλ 1 , · · · , ηλ k , respectively. Let the environment effect η be distributed as a Gamma random variate with probability density Then, the unconditional joint density of X 1 , · · · , X k is given by (1), where θ i = λ i /b, i = 1, · · · , k. Clearly, without loss of generality, b can be taken as 1, in which case θ i = λ i , i = 1, · · · , k. In view of the above result, we implement the simulation from k-dimensional multivariate Lomax distribution by adopting the following algorithm.
Both algorithms are easily implemented using R stats (R Core Team, 2019) functions rexp() and rgamma(), respectively, for generating univariate exponential and gamma random variates. In the following, we describe approaches to generate other distributions related to multivariate Lomax and generalized multivariate Lomax distributions. Nayak (1987) has also discussed the inter-relationships between many other multivariate distributions and generalized multivariate Lomax distribution. In view of these inter-relationships, the above algorithm can accordingly be amended to simulate data from these distributions -a task which can be quite difficult to accomplish directly. These inter-relationships are described in Table  1. For convenience, we assume X = (X 1 , · · · , X k ) ′ ∼ ML k (a; θ 1 , · · · , θ k ) and T = (T 1 , · · · , T k ) ′ ∼ GML k (a; θ 1 , · · · , θ k ; l 1 , · · · , l k ).
The multivariate F distribution can also be obtained by considering where S 0 , S 1 , . . . , S k are independent Chi-square variables with 2a, 2l 1 , . . . , 2l k degrees of freedom respectively; see Johnson and Kotz (1972). It is the joint distribution of the ratios of mean squares under certain linear hypotheses on treatments as discussed in Krishnaiah (1965) in the context of simultaneous ANOVA and MANOVA tests where S 0 is a residual sum of squares and S i 's are various effect sums of squares. The density given in Table 1 is a special case of generalized multivariate F distribution defined by Krishnaiah (1965) when S i 's, i = 1, · · · , k, are all independent. This fact is useful in that the Tables given by Armitage and Krishnaiah (1964) make use of this in constructing the statistical tables for certain linear hypotheses. We use these tables to confirm our calculation as done by our R programs. The multivariate Inverted Beta distribution also called the multivariate inverted Dirichlet distribution, is essentially a special case of multivariate F distribution when l 1 = l 2 = · · · = l k = a. Mardia's Pareto Type I (a, θ)

Probability computations
Here, we give details of computations of cumulative probability distribution function (cd f ), survival function, equicoordinate quantile function for each distribution introduced in Section Multivariate Lomax and related distributions. Depending on the situation, the calculation may sometimes be simpler for joint cd f or joint survival function.

Distributions transformable from Lomax distribution
The multivariate Lomax distribution has an explicit closed-form expression for the joint survival function given by (2). The survival or cumulative distribution function of other related distributions can be obtained either directly or through appropriate transformations. We summarize these explicit expressions of cumulative distribution function F(·) and survival function S(·) in Table 2.
For the cumulative distribution functions or survival functions with no closed-form expressions, we rely on the following useful formulas (Joe, 1997): where is the joint cd f (joint survival function) of x j where the subscripts belong to the set C, which is a subset of {1, 2, · · · , k}. Clearly, C ∈ C where C is the powerset of {1, 2, · · · , k}. Also, |C| represents the cardinality of C.

Cumulative Distribution Function Survival Function
Lomax for a given p does not have a unique solution (q 1 , · · · , q k ). We thus provide the quantile computations only for the equicoordinate quantile, obtained by solving the following equation for q, where 0 < p < 1 is a (given) cumulative probability. We make use of the R stats function uniroot(), which is used for finding one dimensional root.

Distributions related to generalized multivariate Lomax distribution
For generalized multivariate Lomax distribution and its related distributions, explicit expressions of the cumulative distribution function and survival function are not available. Thus, we obtain the cumulative probabilities through multiple integral in (8) below over the unit cube [0, 1] k by using the adaptive multivariate integration function hcubature() in package cubature (Narasimhan et al., 2018).
The following result is used for the computation of the cumulative distribution function for the generalized Lomax distribution.
Proof of the above is straightforward by making the substitutions Through parameter substitutions, the cumulative distribution functions of multivariate F and the inverted beta distribution can be found. These are summarized in Table 3. For the above method, the run-time consumption rapidly increases as k becomes large. Thus as an alternative, we also provide the option of computation of cumulative distribution function via Monte Carlo method. The corresponding algorithm is
Step 1 above is readily carried out by the random numbers generation as described in Section Multivariate Lomax and related distributions using the package NonNorMvtDist. Since cd f is computable using adaptive multivariate integration over unit cube [0, 1] k or via the Monte Carlo method, it follows that the survival function can also be calculated (by using (6)). The equicoordinate quantile is computed by using (7).
We also add in our package the calculations of joint probability density function -Being selfexplanatory with all pd f s available in closed form, it needs no further elaboration. The corresponding function is dmv*().

Illustrations of simulations and probability calculations
We will illustrate here the functions and corresponding arguments for NonNorMvtDist. The calling sequences include probability density calculation (dmv*), cumulative distribution calculation (pmv*), equicoordinate quantile calculation (qmv*), random numbers generation (rmv*), and survival function calculation (smv*) for each of the multivariate distributions introduced in Section Multivariate Lomax and related distributions. For each distribution, we consider the bivariate case (k = 2). This choice enables us to also succinctly and graphically present the probability density plots. The detailed description of the calling sequence for each of the several cases has been moved into a digital complement of this paper.

Random number generation
The following code illustrates the use of the function rmvlomax*() with a bivariate sample of size n = 2. Sampling is done by setting set.seed(2019) in advance. The digital complement explicitly provides the code as well as output for all of the probability distributions discussed here. In the output, each row represents a bivariate observation.

CDF, survival function and equicoordinate quantile
The applications of cd f pmv*(), survival function smv*(), and equicoordinate quantile function qmv*() are straightforward and follow the same pattern earlier. See the digital complement for computation details. In the following, we give code as well as output only for Lomax distribution (a = 5, θ 1 = 0.5, θ 2 = 1) for specified coordinates (x 1 , x 2 ) and for the cumulative probability p = 0.5.

Computation times
We assess the run-times for the computation of probability (pmv*), equicoordinate quantile function (qmv*) for multivariate Lomax, and generalized multivariate Lomax distributions for reference. The survival function (smv*) of multivariate Lomax distribution has a closed-form expression, and hence the assessment of computation time is omitted. We have used the computer with Intel Core i5-8250U CPU and 8.00 GB RAM. The results for p-variate Lomax distribution are summarized in Table 4. As we can see, the run-times for pmvlomax() are quit short, even for the dimension p = 20. However, qmvlomax() requires a considerable longer time when p ≥ 17, which seems to double for every extra dimension added to the size of the random vector. The computation times in Table 4 can also be used as a reference for the distributions related to multivariate Lomax, which are in Table 2 since we apply the same approach for probability computations there as well.  Table 4: Runtimes (in seconds) for functions pmvlomax() and qmvlomax() as functions of p.
The results for generalized p-variate Lomax distribution are summarized in Tables 5 and 6. Both functions pmvglomax() and smvglomax() require relatively much longer time when p > 4, and qmvglomax() takes longer when p > 2. Based on the run-time consumption, we recommend algorithm MC for larger dimensions (e.g., when p > 5). Similarly, this run-time study can also be used as a reference for the related distributions (to generalized Lomax distribution) as listed in Table 3

Maximum likelihood estimation of parameters
We also include the maximum likelihood estimation to estimate the parameters for various Lomax related distributions (except for the bivariate F distribution). Although many of the density functions in Section Multivariate Lomax and related distributions have complicated forms, maximum likelihood estimation can be easily accomplished by using the built-in optimization functions in R stats. The log-likelihood function for a given sample x 1 , . . . , x n is given by where n is the sample size, θ is a vector of parameters to be estimated, and x j = (x j1 , . . . , x jk ) ′ is the jth observation for the random vector X = (X 1 , . . . , X k ) ′ , respectively. The maximizerθ of the log-likelihood function given in (9), namely, is obtained using an appropriate optimization method. The parameter space Θ in each case must be appropriately constrained, and these constraints must be taken into account during the optimization process. We have thus made use of three R stats functions, namely, optim(), constrOptim(), and optimize() in this work. The functionality of these optimization functions is described in Table 7.

Number of parameters Usage
Optimization with linear constraints optimize() Single One Dimensional Optimization Table 7: Use of optimization functions in R stats.
By default, all these functions perform the task of minimization of a function. To maximize (9), we only need to add argument control = list(fnscale = -1) in functions optim() and constrOptim(), and set maximum = TRUE in function optimize().
We summarize the optimization methods, constraints, and the resulting outputs for all the bivariate distributions (except for bivariate F distribution) in Table 8. The detailed illustrations and codes for the remaining distributions are included in the digital complement. Observe that these estimates are reasonably close to the true parameter values, thereby confirming that the program is functioning as it is expected to.

Multivariate Parameters Optimization Constraints Estimated Parameters Distribution Method
Lomax

Two applications
In this last section, we give two brief applications, which not only demonstrate the use but also confirm the accuracy and verify the correctness of our work.

Generating data from the nonelliptical symmetric distributions with univariate normal marginals
Cook-Johnson's multivariate uniform distribution is a family of distributions that can be used for modeling nonelliptical symmetric data. Further, in view of uniform distribution for marginal, it has been as one of the useful choices for modeling through copula (in fact, Cook-Johnson's uniform distribution is indeed a Clayton copula (Nelsen, 2006)). The value of parameter a impacts the common correlation coefficient ρ among variates in that ρ → 0 as a → ∞, and ρ → 1 as a → 0 (Cook and Johnson, 1981). An interesting application of Cook-Johnson's multivariate uniform distribution is to obtain new joint distributions by marginal transformation. Specifically, we consider the problem of generating random numbers from a multivariate distribution that is not elliptically symmetric but has univariate normal marginals. Let U i 's, i = 1, 2, be two random variables corresponding to the Cook-Johnson's bivariate uniform distribution. The following code yields the pairs of random numbers, each having the normal marginals by the transformation X i = Φ −1 (Ui), where Φ −1 (·) is the quantile function of a standard normal distribution. Clearly, the joint distribution of X 1 and X 2 is not bivariate normal. To begin with, the parameter a is taken to be a = 2.

Creating tables for simultaneous MANOVA hypothesis tests
Multivariate F distribution arises naturally as the distribution of test statistics in several testing problems in simultaneous MANOVA. Let s 2 1 , · · · , s 2 k be k independent sums of squares, and s 2 0 be the sum of squares due to error in the classical ANOVA model. Also, let H 1 , · · · , H k be certain individual linear hypotheses with the corresponding sum of squares s 2 1 , · · · , s 2 k . Assume that under H 1 , · · · , H k , respectively, the sums of squares s 2 1 , · · · , s 2 k are all χ 2 random variables each with n degrees of freedom and s 2 0 is a χ 2 random variable with m degrees of freedom and is independent of s 2 1 , · · · , s 2 k . Armitage and Krishnaiah (1964) defined the critical values F α at α level of significance for simultaneously testing hypotheses H 1 , · · · , H k by the probability statement, The quantities (ms 2 i )/(ns 2 0 ), i = 1, · · · , k jointly follow multivariate F distribution if the overall null hypothesis H 0 = k i=1 H i is true. In this case, the critical value F α can be readily computed using the equicoordinate quantile function qmvf() by setting the argument corresponding to k + 1 values of the degrees of freedom as df = c(m,n,...,n). The following code gives F 0.05 = 9.551505 for the bivariate F case when m = 5 and n = 1 with default algorithm by using adaptive multiple integration over unit cube (algorithm = "numerical"). With Monte Carlo algorithm (algorithm = "MC" with nsim=1,000,000), we obtain F 0.05 = 9.550944. Note that the Monte Carlo method is seed dependent, so the output from different runs may slightly differ from each other.
> qmvf(0.95, df = c(5, 1, 1)) [1] 9.551505 > qmvf(0.95, df = c(5, 1, 1), algorithm = "MC") [1] 9.550944 For further demonstration and also to further affirm our trust in the calculations, we compare the output of quantile function qmvf() using both adaptive multivariate integration and Monte Carlo methods with the values given in Armitage and Krishnaiah (1964). These three calculations are reported in Table 9 for a few choices of m and n. The agreement among the three columns shows that the package NonNorMvtDist provides a convenient way to obtain percentage points for the hypothesis testing problems considered by Armitage and Krishnaiah (1964) and Krishnaiah (1965). Clearly, unlike the tables in Armitage and Krishnaiah, the choices of α and degrees of freedom are not restricted, and in that sense, our package is very comprehensive and exhaustive in this respect. α df (m, n, n) qmvf() Output qmvf() Output Tabulated Values in (algorithm = "numerical") (algorithm = "MC") Armitage and Krishnaiah (1964, pp. 33-42) 0.05 Table 9: Comparison between the output of qmvf() with the values given in Armitage and Krishnaiah (1964).

Concluding remarks
We have developed a new R package, NonNorMvtDist, for generating multivariate random numbers from Lomax (Pareto type II), generalized Lomax, Mardia's Pareto type I, logistic, Burr, Cook-Johnson's uniform, F, and inverted beta distributions. Detailed examples of each distribution are given to illustrate data simulation, probability calculations, and statistical modeling.
The fact that nonnormal and skewed multivariate distributions are common in the real world but are rarely pursued for analysis due to the lack of ready-to-use computational support underscores the importance of this package. Possibilities of the use of these distributions are practically limitless and yet unforeseen in a variety of areas, starting from the biomedical sciences, reliability, and engineering as well as in statistical finance in the contexts of volatility estimation. Simulations, probability calculations, as well as calculations of quantiles, and the maximum likelihood estimation of parameters are the natural first set of computations in such studies. We have addressed all of these in this work.
The calculations of probabilities of hypercubes (for example, of P[a 1 < X 1 < b 1 , a 2 < X 2 < b 2 , a 3 < X 3 < b 3 ]) can be easily implemented by appropriately combining several cd f calculations. Alternatively, our codes for pmv*() can be suitably modified for this purpose. The probability density surface plots for any bivariate marginal can be easily constructed since, for the multivariate Lomax distribution, the marginal distributions of any subset of random variables also follow the multivariate Lomax distribution in the appropriate dimension. Further, our work provides a way to generate data from, probability calculations for, as well as modeling for, the data which are marginally distributed as normal but jointly are not.