In recent years, composite models based on the lognormal distribution have become popular in actuarial sciences and related areas. In this short note, we present a new R package for computing the probability density function, cumulative density function, and quantile function, and for generating random numbers of any composite model based on the lognormal distribution. The use of the package is illustrated using a real data set.
Two-piece composite distributions arise in many areas of the sciences. The first two-piece composite distribution with each piece described by a normal distribution appears to have been used by (Gibbons and Mylroie 1973). Recently, two-piece composite distributions with the first piece described a lognormal distribution (which we refer to as composite lognormal distributions) have proved popular in insurance and related areas. (Cooray and Ananda 2005) introduced such distributions recently, but one of the referees has pointed out that composite lognormal distributions have been known before at least as early as (Barford and Crovella 1998) in areas like modeling workloads of web servers.
(Cooray and Ananda 2005) showed that composite lognormal distributions can give better fits than standard univariate distributions. They illustrate this fact for the Danish fire insurance data by introducing the composite lognormal-Pareto distribution, a distribution obtained by piecing together lognormal and Pareto probability density functions (pdfs). (Scollnik 2007) improved the composite lognormal-Pareto distribution by using mixing weights as coefficients for each pdf replacing the constant weights applied earlier by (Cooray and Ananda 2005). (Scollnik 2007) also employed generalized Pareto distribution in place of Pareto distribution used by (Cooray and Ananda 2005). (Pigeon and Denuit 2011) provided further extensions of composite lognormal distributions by choosing the cutoff point (the point at which the two pdfs are pieced together) as a random variable. The most recent extensions of composite lognormal distributions are provided in (Nadarajah and Bakar 2012).
Composite lognormal distributions have attracted considerable attention in spite of being introduced only in 2005. Some applications in the last three years include: estimation of insurance claim cost distributions (Bolancé et al. 2010), inflation and excess insurance (Fackler 2010), large insurance claims in case reserves (Lindblad 2011), statistical mechanics (Eliazar and Cohen 2012), and modeling of mixed traffic conditions (Dubey et al. 2013).
The aim of this short note is to present a new contributed package CompLognormal for R that computes basic properties for any composite lognormal distribution. The properties considered include the pdf, cumulative distribution function (cdf), quantile function, and random numbers. Explicit expressions for these properties are given in Section “Properties”. Two illustrations of the practical use of the new R package are given in Section “Illustrations”.
Let \(f_i (\cdot)\), \(i = 1, 2\) be valid pdfs. Let \(F_i (\cdot)\), \(i = 1, 2\) denote the corresponding cdfs. In general, the pdf of a two-piece composite distribution is given by \[\displaystyle f(x) = \left\{ \begin{array}{ll} \displaystyle a_{1} f_{1}^{*}(x), & \mbox{if} \quad 0 < x \leq \theta, \\ \displaystyle a_{2} f_{2}^{*}(x), & \mbox{if} \quad \theta < x < \infty, \end{array} \right. \tag{1} \] where \(f_1^{*}(x) = f_1 (x) / F_1 (\theta)\), \(f_2^{*}(x) = f_2 (x) / \{ 1 - F_2 (\theta) \}\), \(\theta\) is the cutoff point, and \(a_1, a_2\) are non-negative weights summing to one. As suggested in (Nadarajah and Bakar 2012), we take \(a_{1}= \frac {1}{1+\phi}\) and \(a_{2}=\frac {\phi}{1+\phi}\) for \(\phi>0\).
The cdf corresponding to ((1)) is \[\displaystyle F(x) = \left\{ \begin{array}{ll} \displaystyle \frac {\displaystyle 1}{\displaystyle 1+\phi} \frac {\displaystyle F_{1}(x)}{\displaystyle F_{1}(\theta)}, & \mbox{if } \quad 0 < x \leq \theta, \\ \displaystyle \frac {\displaystyle 1}{\displaystyle 1+\phi} \left[ 1 + \phi \frac {\displaystyle F_{2}(x)-F_{2}(\theta)}{\displaystyle 1-F_{2}(\theta)} \right], & \mbox{if } \quad \theta < x < \infty. \end{array} \right. \label{eq:cdfcomp} \tag{2} \] The quantile function corresponding to ((1)) is \[\displaystyle Q(u) = \left\{ \begin{array}{ll} \displaystyle F_1^{-1} \left( u (1+\phi) F_{1}(\theta) \right), & \mbox{if } \quad 0 < u \leq \frac {\displaystyle 1}{\displaystyle 1+\phi}, \\ \displaystyle F_2^{-1} \left( F_{2}(\theta) + \left( 1 - F_{2}(\theta) \right) \left( \frac {\displaystyle u(1+\phi)-1}{\displaystyle \phi} \right) \right), & \mbox{if } \quad \frac {\displaystyle 1}{\displaystyle 1+\phi} < u < \infty. \end{array} \right. \label{eq:qfcomp} \tag{3} \] This quantile function can be used to generate random numbers from the two-piece composite distribution.
We are interested in a two-piece composite distribution, where the first piece is specified by the lognormal distribution. So, we take \[\displaystyle f_{1}(x)= \frac {\displaystyle 1}{\displaystyle x \sigma} \psi \left( \frac {\displaystyle \ln x - \mu}{\displaystyle \sigma} \right) \qquad {\rm and} \qquad \displaystyle F_{1}(x)= \Phi \left( \frac {\displaystyle \ln x - \mu}{\displaystyle \sigma} \right),\] where \(\psi(\cdot)\) and \(\Phi(\cdot)\) denote the standard normal pdf and the standard normal cdf, respectively. For this choice, ((1)), ((2)), and ((3)) reduce to \[\displaystyle f(x) = \left\{ \begin{array}{ll} \displaystyle \frac {\displaystyle \psi \left( (\ln x - \mu) / \sigma \right)} {\displaystyle (1 + \phi) \sigma x \Phi \left( (\ln \theta - \mu) / \sigma \right)}, & \mbox{if} \quad 0 < x \leq \theta, \\ \displaystyle \frac {\displaystyle \phi f_2 (x)}{\displaystyle (1 + \phi) \left( 1 - F_2 (\theta) \right)}, & \mbox{if} \quad \theta < x < \infty, \end{array} \right. \label{eq:pdf2} \tag{4} \]
\[\displaystyle F(x) = \left\{ \begin{array}{ll} \displaystyle \frac {\displaystyle \Phi \left( (\ln x - \mu) / \sigma \right)} {\displaystyle (1 + \phi) \Phi \left( (\ln \theta - \mu) / \sigma \right)}, & \mbox{if } \quad 0 < x \leq \theta, \\ \displaystyle \frac {\displaystyle 1}{\displaystyle 1+\phi} \left[ 1 + \phi \frac {\displaystyle F_{2}(x) - F_{2}(\theta)}{\displaystyle 1 - F_{2}(\theta)} \right], & \mbox{if } \quad \theta < x < \infty, \end{array} \right. \label{eq:cdf2} \tag{5} \] and \[\displaystyle Q(u) = \left\{ \begin{array}{ll} \displaystyle \exp \left\{ \mu + \sigma \Phi^{-1} \left[ u (1+\phi) \Phi \left( \frac {\displaystyle \ln \theta - \mu}{\displaystyle \sigma} \right) \right] \right\}, & \mbox{if } \quad 0 < u \leq \frac {\displaystyle 1}{\displaystyle 1+\phi}, \\ \displaystyle F_2^{-1} \left( F_{2}(\theta) + \left( 1 - F_{2}(\theta) \right) \left( \frac {\displaystyle u(1+\phi)-1}{\displaystyle \phi} \right) \right), & \mbox{if } \quad \frac {\displaystyle 1}{\displaystyle 1+\phi} < u < \infty, \end{array} \right. \label{eq:qf2} \tag{6} \]
respectively. We shall refer to the distribution given by ((4)), ((5)), and ((6)) as the composite lognormal distribution. Random numbers from the composite lognormal distribution can be generated as
\[x_i = Q \left( u_i \right) \label{eq:r2} \tag{7} \] for \(i = 1, 2, \ldots, n\), where \(u_i\), \(i = 1, 2, \ldots, n\) are random numbers from a uniform \([0, 1]\) distribution. Further statistical properties of the composite lognormal distribution including moment properties can be found in (Bakar 2012).
The pdf of two-piece composite distributions are in general not continuous or differentiable at the cutoff point \(\theta\). To have these properties satisfied, we impose the conditions \(a_1 f_1^{*} (\theta) = a_2 f_2^{*} (\theta)\) and \(a_1 d f_1^{*} (\theta) / d \theta = a_2 d f_2^{*} (\theta) / d \theta\). (Nadarajah and Bakar 2012) have shown that these conditions are equivalent to the following for any composite lognormal distribution: $$ = + ^{2} + ^{2} , = .
\tag{8} $$ The smoothness conditions in ((8)) are imposed for technical reasons—the respective parametric models then become more tractable.
Maximum likelihood estimation is a common method for estimation. Suppose we have a random sample \(x_1, x_2, \ldots, x_n\) from ((1)). Let \({\mathbf \lambda} = (\lambda_1\), \(\lambda_2\), \(\ldots\), \(\lambda_q)\) be the parameters of \(f_2 (\cdot)\). Suppose also that \(\mu\) and \(\phi\) can be expressed as \(\mu = \mu (\sigma, \theta, {\mathbf \lambda})\) and \(\phi = \phi (\sigma, \theta, {\mathbf \lambda})\), respectively. Then the log-likelihood function is
\[\begin{aligned} \ln L \left( \sigma, \theta, {\mathbf \lambda} \right) &= -n \ln (1 + \phi) + \sum_{x_i \leq \theta} \ln \psi \left( \frac {\ln x_i - \mu}{\sigma} \right) - \sum_{x_i \leq \theta} \ln \left( \sigma x_i \right) \nonumber \\ & -M \ln \Phi \left( \frac {\ln \theta - \mu}{\sigma} \right) + \sum_{x_i > \theta} \ln f_2 \left( x_i \right) \nonumber \\ & -m \ln \left[ 1 - F_2 (\theta) \right] + m \ln \phi, \label{eq:nlogl2} \end{aligned} \tag{9} \] where \(M = \sum_{i = 1}^n I \{ x_i \leq \theta \}\) and \(m = \sum_{i = 1}^n I \{ x_i > \theta \}\). It is clear that the maximum likelihood estimators of \((\sigma, \theta, {\mathbf \lambda})\) cannot be obtained in closed form. They have to be obtained numerically.
The new package CompLognormal, available from CRAN, computes ((4)), ((5)), and ((6)) for any given composite lognormal distribution. It also generates random numbers from the specified composite lognormal distribution. Table 1 summarizes the functions implemented in the package CompLognormal along with their arguments.
Quantity | Calling sequence |
---|---|
\(f(x)\) in ((4)) | dcomplnorm(x, spec, sigma=1, theta=1, ...) |
\(F(x)\) in ((5)) | pcomplnorm(x, spec, sigma=1, theta=1, ...) |
\(Q(u)\) in ((6)) | qcomplnorm(p, spec, sigma=1, theta=1, ...) |
\(x_i\) in ((7)) | rcomplnorm(n, spec, sigma=1, theta=1, ...) |
The input argument spec
(a character string) specifies the
distribution of the second piece of the composite lognormal
distribution. The distribution should be one that is recognized by R. It
could be one of the distributions implemented in the R base package or
one of the distributions implemented in an R contributed package or one
freshly written by a user. In any case, there should be functions
dspec
, pspec
, qspec
and rspec
, computing the pdf, cdf, qf and
random numbers of the distribution.
Some examples of spec
are: spec = "norm"
, meaning that
\(f_2 (x) = (1 / \sigma) \psi ((x - \mu) / \sigma)\) and
\(F_2 (x) = \Phi ((x - \mu) / \sigma)\); spec = "lnorm"
, meaning that
\(f_2 (x) = \left\{ 1 / (\sigma x) \right\} \psi ((\ln x - \mu) / \sigma)\)
and \(F_2 (x) = \Phi ((\ln x - \mu) / \sigma)\); spec = "exp"
, meaning
that \(f_2 (x) = \lambda \exp (-\lambda x)\) and
\(F_2 (x) = 1 - \exp (-\lambda x)\).
dcomplnorm
, pcomplnorm
, qcomplnorm
and rcomplnorm
can also take
additional arguments in the form of ...
. These arguments could give
inputs (for example, parameter values) for the distribution of the
second piece of the composite lognormal distribution. For example, if
spec = "norm"
then ...
can include mean = 1, sd = 1
to mean that
\(f_2 (x) = \psi (x - 1)\) and \(F_2 (x) = \Phi (x - 1)\); if
spec = "lnorm"
then ...
can include meanlog = 1, sdlog = 1
to mean
that \(f_2 (x) = \psi (\ln x - 1)\) and \(F_2 (x) = \Phi (\ln x - 1)\); if
spec = "exp"
then ...
can include rate = 1
to mean that
\(f_2 (x) = \lambda \exp (-x)\) and \(F_2 (x) = 1 - \exp (-x)\). Further
details about the calling sequences can be seen from the documentation
for the CompLognormal package.
The code for dcomplnorm
, pcomplnorm
and qcomplnorm
currently uses
constructs of the form
do.call(paste("d", spec, sep = ""), list(z}, \code{...))
This technique was necessary as in the R base package distributions are
no data type but rather are given by the respective four constituent
functions <prefix><name>
, where <prefix>
is one of r
, d
, p
,
q
and <name>
is the name of the distribution. A way to circumvent
such constructions would be a data type “distribution”. This has been
available on CRAN for quite a while within the distr
family of
packages, with corresponding S4 classes for distributions. Recently,
another approach based on S5-classes has been pursued in the package
poweRlaw (Gillespie 2013). We
leave these as future work.
We describe two simple illustrations of the new package CompLognormal. The following packages should be loaded (after installing them if necessary) in advance:
library(CompLognormal)
library(actuar)
library(SMPracticals)
library(evd)
library(fitdistrplus)
library(stats4)
The illustration presented here plots the pdf and the cdf of the composite lognormal-loglogistic distribution for varying parameter values.
curve(dcomplnorm(x, "llogis", 0.4, 0.5, shape = 1, scale = 0.8), xlim = c(0, 5),
ylim = c(0, 0.7), xlab = "x", ylab = "f(x)", n = 250, col = "black", lty = 1)
<- dcomplnorm(0.5, "llogis", 0.4, 0.5, shape = 1, scale = 0.8)
d1 segments(0.5, 0, 0.5, d1, col = "black", lty = 2)
curve(dcomplnorm(x, "llogis", 0.3, 0.2, shape = 0.2, scale = 0.5), add = TRUE,
col = "red", lty = 1)
<- dcomplnorm(0.2, "llogis", 0.3, 0.2, shape = 0.2, scale = 0.5)
d2 segments(0.2, 0, 0.2, d2, col = "red", lty = 2)
curve(dcomplnorm(x, "llogis", 0.5, 0.4, shape = 0.5, scale = 0.5), add = TRUE,
col = "blue", lty = 1)
<- dcomplnorm(0.4, "llogis", 0.5, 0.4, shape = 0.5, scale = 0.5)
d3 segments(0.4, 0, 0.4, d3, col = "blue", lty = 2)
legend(1.5, 0.5, legend =
c(expression(paste(sigma==0.4, ",", theta==0.5, ",", shape==1, ",", scale==0.8)),
expression(paste(sigma==0.3, ",", theta==0.2, ",", shape==0.2, ",", scale==0.5)),
expression(paste(sigma==0.5, ",", theta==0.4, ",", shape==0.5, ",", scale==0.8))),
col = c("black", "red", "blue"), lty = 1)
curve(pcomplnorm(x, "llogis", 0.4, 0.5, shape = 1, scale = 0.8), xlim = c(0, 5),
ylim = c(0, 1), xlab = "x", ylab = "F(x)", n = 250, col = "black", lty = 1)
<- pcomplnorm(0.5, "llogis", 0.4, 0.5, shape = 1, scale = 0.8)
d1 segments(0.5, 0, 0.5, d1, col = "black", lty = 2)
curve(pcomplnorm(x, "llogis", 0.3, 0.2, shape = 0.2, scale = 0.5), add = TRUE,
col = "red", lty = 1)
<- pcomplnorm(0.2, "llogis", 0.3, 0.2, shape = 0.2, scale = 0.5)
d2 segments(0.2, 0, 0.2, d2, col = "red", lty = 2)
curve(pcomplnorm(x, "llogis", 0.5, 0.4, shape = 0.5, scale = 0.5), add = TRUE,
col = "blue", lty = 1)
<- pcomplnorm(0.4, "llogis", 0.5, 0.4, shape = 0.5, scale = 0.5)
d3 segments(0.4, 0, 0.4, d3, col = "blue", lty = 2)
Figure 1 shows the pdfs and cdfs of the composite
lognormal-loglogistic distribution. The parameters, \(\sigma\) and
\(\theta\), the scale parameter of the lognormal distribution and the
cutoff point, respectively, are as defined in Section “Properties”. The
parameters, shape
and scale
, are the shape and scale parameters of
the loglogistic distribution, as defined in the R base package. The
vertical lines in the figures correspond to the values for \(\theta\), the
cutoff point.
The illustration presented here fits the composite lognormal-Fréchet distribution to the Danish fire insurance data by the method of maximum likelihood, see ((9)). The data were obtained from the R package SMPracticals (Davison 2012).
<- danish[1:2492]
x nlm(function(p) {
-sum(dcomplnorm(x, "frechet", sigma = exp(p[1]),
theta = exp(p[2]), scale = exp(p[3]), shape = exp(p[4]), log = TRUE))
p = c(0, 0, 0, 0)) },
The output will be
$minimum
1] 3859.293
[
$estimate
1] -1.7178497 0.1176224 -0.2872701 0.4130180
[
$gradient
1] -0.0008301586 0.0031436684 0.0001900844 -0.0004574758
[
$code
1] 1
[
$iterations
1] 23 [
The number \(2492\) refers to the actual length of the Danish data set.
This output shows that the maximized log-likelihood is \(-3859.293\), the
estimated \(\sigma\) is \(\exp(-1.7178497)=0.1794516\), the estimated
\(\theta\) is \(\exp(0.1176224)=1.124819\), the estimated scale parameter of
the Fréchet distribution is \(\exp(-0.2872701)=0.750309\), and the
estimated shape parameter of the Fréchet distribution is
\(\exp(0.4130180)=1.511372\). The standard errors of these estimates can
be computed by adding hessian = TRUE
to the nlm
command.
We have chosen the composite lognormal-Fréchet distribution for simplicity of illustration. The purpose of this illustration was not to find the “best fitting” model for the Danish data. Of course, many other composite lognormal distributions can be expected to give better fits than the composite lognormal-Fréchet distribution. (Nadarajah and Bakar 2012) modeled the Danish data using a large class of composite lognormal distributions, including the composite lognormal-Burr, the composite lognormal-inverse Burr, the composite lognormal-\(F\), the composite lognormal-Fréchet, the composite lognormal-generalized Pareto, the composite lognormal-inverse Pareto, and the composite lognormal-loglogistic distributions. The composite lognormal-Burr distribution was shown to give the best fit.
In Illustration 2, we used nlm
for optimization of the likelihood
function. R has many other model fitting routines like fitdistr
from
the package MASS (Venables and Ripley 2002),
fitdist
from the package
fitdistrplus
(Delignette-Muller et al. 2013), mle
from the package stats4, and MLEstimator
,
MDEstimator
from the package
distrMod (Kohl and Ruckdeschel 2013). For
instance, fitdist
can be used to estimate the parameters of the
composite lognormal-Fréchet distribution as follows:
<- function(x, logsigma, logtheta, logscale, logshape) {
dclnormf dcomplnorm(x, spec = "frechet", sigma = exp(logsigma), theta = exp(logtheta),
scale = exp(logscale), shape = exp(logshape))
}<- function(q, logsigma, logtheta, logscale, logshape) {
pclnormf pcomplnorm(q, spec = "frechet", sigma = exp(logsigma), theta = exp(logtheta),
scale = exp(logscale), shape = exp(logshape))
}<- function(p, logsigma, logtheta, logscale, logshape) {
qclnormf qcomplnorm(p, spec = "frechet", sigma = exp(logsigma), theta = exp(logtheta),
scale = exp(logscale), shape = exp(logshape))
}fitdist(danish[1:2492], "clnormf", start = list(logsigma = -1.718,
logtheta = 0.118, logscale = -0.287, logshape = 0.413))
The output will be
' clnormf ' by maximum likelihood
Fitting of the distribution :
Parameters
estimate Std. Error-1.7180280 0.05737326
logsigma 0.1176365 0.02413085
logtheta -0.2877565 0.16564834
logscale 0.4130318 0.03704017 logshape
Also mle
can be used to estimate the parameters of the composite
lognormal-Fréchet distribution as follows:
<- function(p1, p2, p3, p4) {
nllh -sum(dcomplnorm(danish[1:2492], spec = "frechet",
sigma = exp(p1), theta = exp(p2), scale = exp(p3), shape = exp(p4), log = TRUE))
}mle(nllh, start = list(p1 = -1.718, p2 = 0.118, p3 = -0.287, p4 = 0.413))
The output will be
:
Callmle(minuslogl = nllh, start = list(p1 = -1.718, p2 = 0.118, p3 = -0.287,
p4 = 0.413))
:
Coefficients
p1 p2 p3 p4-1.7178735 0.1176081 -0.2870383 0.4130586
We have developed a new R package for computing quantities of interest for any composite lognormal distribution. The computed quantities include the pdf, cdf, qf, and random numbers. Although the package is specifically designed for composite lognormal distributions, it can be easily altered for any other composite distribution.
The authors would like to thank the Editor and the two referees for careful reading and for their comments which greatly improved the paper.
CompLognormal, CRAN, poweRlaw, SMPracticals, MASS, fitdistrplus, distrMod
ActuarialScience, Distributions, Econometrics, Environmetrics, MixedModels, NumericalMathematics, Psychometrics, Robust, Survival, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Nadarajah & Bakar, "CompLognormal: An R Package for Composite Lognormal Distributions", The R Journal, 2013
BibTeX citation
@article{RJ-2013-030, author = {Nadarajah, S. and Bakar, S. A. A.}, title = {CompLognormal: An R Package for Composite Lognormal Distributions}, journal = {The R Journal}, year = {2013}, note = {https://rjournal.github.io/}, volume = {5}, issue = {2}, issn = {2073-4859}, pages = {97-103} }