In this article we present tmvtnorm, an R package implementation for the truncated multivariate normal distribution. We consider random number generation with rejection and Gibbs sampling, computation of marginal densities as well as computation of the mean and covariance of the truncated variables. This contribution brings together latest research in this field and provides useful methods for both scholars and practitioners when working with truncated normal variables.
The package mvtnorm is
the first choice in R for dealing with the Multivariate Normal
Distribution (Genz et al. 2009). However, frequently one or more variates in a
multivariate normal setting
The probability density function for
> library(tmvtnorm)
> mu <- c(0.5, 0.5)
> sigma <- matrix(c(1, 0.8, 0.8, 2), 2, 2)
> a <- c(-1, -Inf)
> b <- c(0.5, 4)
The following tasks for dealing with the truncated multinormal distribution are described in this article:
generation of random numbers
computation of marginal densities
computation of moments (mean vector, covariance matrix)
estimation of parameters
The first idea to generate variates from a truncated multivariate normal
distribution is to draw from the untruncated distribution using
rmvnorm()
in the
mvtnorm package and to
accept only those samples inside the support region (i.e., rejection
sampling). The different algorithms used to generate samples from the
multivariate normal distribution have been presented for instance in
(Mi et al. 2009) and in (Genz and Bretz 2009).
The following R code can be used to generate
> X <- rtmvnorm(n=10000, mean=mu,
> sigma=sigma, lower=a, upper=b,
> algorithm="rejection")
The parameter algorithm="rejection"
is the default value and may be
omitted.
This approach works well, if the acceptance rate rtmvnorm()
first calculates the acceptance rate
> alpha <- pmvnorm(lower=a, upper=b, mean=mu,
> sigma=sigma)
However, if only a small fraction of samples is accepted, as is the case in higher dimensions, the number of draws per sample can be quite substantial. Then rejection sampling becomes very inefficient and finally breaks down.
The second approach to generating random samples is to use the Gibbs
sampler, a Markov Chain Monte Carlo (MCMC) technique. The Gibbs sampler
samples from conditional univariate distributions
algorithm="gibbs"
, users can sample with the
Gibbs sampler.
> X <- rtmvnorm(n=10000, mean=mu,
> sigma=sigma, lower=a, upper=b,
> algorithm="gibbs")
Gibbs sampling produces a Markov chain which finally converges to a
stationary target distribution. Generally, the convergence of MCMC has
to be checked using diagnostic tools (see for instance the
coda package from
(Plummer et al. 2009)). The starting value of the chain can be set as an
optional parameter start.value
. Like all MCMC methods, the first
iterates of the chain will not have the exact target distribution and
are strongly dependent on the start value. To reduce this kind of
problem, the first iterations are considered as a burn-in period which a
user might want to discard. Using burn.in.samples=100
, one can drop
the first 100 samples of a chain as burn-in samples.
The major advantage of Gibbs sampling is that it accepts all proposals
and is therefore not affected by a poor acceptance rate acf()
or ccf()
in the
stats package).
> acf(X)
Taking only a nonsuccessive subsequence of the Markov chain, say every
thinning=k
argument:
> X2 <- rtmvnorm(n=10000, mean=mu,
> sigma=sigma, lower=a, upper=b,
> algorithm="gibbs", burn.in.samples=100,
> thinning = 5)
> acf(X2)
While the samples X
generated by Gibbs sampling exhibit
cross-correlation for both variates up to lag 1, this correlation
vanished in X2
.
Table 1 shows a comparison of rejection sampling and Gibbs
sampling in terms of speed. Both algorithms scale with the number of
samples thinning
and requires extensive loops which are slow in an
interpreted language like R. From package version 0.8, we reimplemented
the Gibbs sampler in Fortran. This compiled code is now competitive with
rejection sampling. The deprecated R code is still accessible in the
package via algorithm="gibbsR"
.
Algorithm | Samples | ||||||
---|---|---|---|---|---|---|---|
Rejection Sampling | 5600 | 0.19 sec | 2670 | 0.66 sec | 0 | ||
56000 | 1.89 sec | 26700 | 5.56 sec | 1 | |||
Gibbs Sampling (R code, no thinning) | 10000 | 0.75 sec | 10000 | 3.47 sec | 10000 | 3.44 sec | |
100000 | 7.18 sec | 100000 | 34.58 sec | 100000 | 34.58 sec | ||
Gibbs Sampling (Fortran code, no thinning) | 10000 | 0.03 sec | 10000 | 0.13 sec | 10000 | 0.12 sec | |
100000 | 0.22 sec | 100000 | 1.25 sec | 100000 | 1.21 sec | ||
Gibbs Sampling (Fortran code and thinning=10 ) |
10000 | 0.22 sec | 10000 | 1.22 sec | 10000 | 1.21 sec | |
100000 | 2.19 sec | 100000 | 12.24 sec | 100000 | 12.19 sec |
The joint density function
dtmvnorm()
. Figure
2 shows a contour plot of the bivariate
density in our example.
However, more often marginal densities of one or two variables will be
of interest to users. For multinormal variates the marginal
distributions are again normal. This property does not hold for the
truncated case. The marginals of truncated multinormal variates are not
truncated normal in general. An explicit formula for calculating the
one-dimensional marginal density dtmvnorm.marginal()
which, for convenience, is wrapped in dtmvnorm()
via a margin=i
argument. Figure 3 shows the
Kernel density estimate and the one-dimensional marginal for both dtmvnorm(..., margin=i)
> x <- seq(-1, 0.5, by=0.1)
> fx <- dtmvnorm(x, mu, sigma,
> lower=a, upper=b, margin=1)
The bivariate marginal density dtmvnorm.marginal2()
which, again for
convenience, is just as good as dtmvnorm(..., margin=c(q,r))
:
> fx <- dtmvnorm(x=c(0.5, 1),
> mu, sigma,
> lower=a, upper=b, margin=c(1,2))
The help page for this function in the package contains an example of how to produce a density contour plot for a trivariate setting similar to the one shown in figure 2.
The computation of the first and second moments (mean vector
mtmvnorm()
, which can be used
as
> moments <- mtmvnorm(mean=mu, sigma=sigma,
> lower=a, upper=b)
The moment calculation for our example results in
> colMeans(X)
> cov(X)
It can be seen in this example that truncation can significantly reduce the variance and change the covariance between variables.
Unlike our example, in many settings
The package tmvtnorm provides methods for simulating from truncated multinormal distributions (rejection and Gibbs sampling), calculations from marginal densities and also calculation of mean and covariance of the truncated variables. We hope that many useRs will find this package useful when dealing with truncated normal variables.
tmvtnorm, mvtnorm, coda, stats
Bayesian, Distributions, Finance, GraphicalModels
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Wilhelm & Manjunath, "tmvtnorm: A Package for the Truncated Multivariate Normal Distribution", The R Journal, 2010
BibTeX citation
@article{RJ-2010-005, author = {Wilhelm, Stefan and Manjunath, B. G.}, title = {tmvtnorm: A Package for the Truncated Multivariate Normal Distribution}, journal = {The R Journal}, year = {2010}, note = {https://rjournal.github.io/}, volume = {2}, issue = {1}, issn = {2073-4859}, pages = {25-29} }