WLinfer: Statistical Inference for Weighted Lindley Distribution

New distributions are still being suggested for better fitting of a distribution to data, as it is one of the most fundamental problems in terms of the parametric approach. One of such is weighted Lindley (WL) distribution (Ghitany et al. 2011). Even though WL distribution has become increasingly popular as a possible alternative to traditional distributions such as gamma and log normal distributions, fitting it to data has rarely been addressed in existing R packages. This is the reason we present the WLinfer package that implements overall statistical inference for WL distribution. In particular, WLinfer enables one to conduct the goodness of fit test, point estimation, bias correction, interval estimation, and the likelihood ratio test simply with the WL function which is at the core of this package. To assist users who are unfamiliar with WL distribution, we present a brief review followed by an illustrative example with R codes.

Yu-Hyeong Jang (Korea University) , SungBum Kim (Konkuk University) , Hyun-Ju Jung (Konkuk University) , Hyoung-Moon Kim (Corresponding author) (Konkuk University)
2023-01-17

1 Introduction

Weighted Lindley (WL) distribution has recently received considerable attention since it provides a more flexible fit to data from various fields than traditional widely-used distributions such as exponential, log normal, and gamma distributions (Ghitany et al. 2011; Mazucheli et al. 2013). The probability density function (pdf) of WL distribution is given by \[f(x) = \frac{\lambda^{\phi+1}}{(\lambda+\phi)\Gamma(\phi)}{x}^{\phi-1}(1+{x})\exp(-\lambda {x}), x>0,\lambda>0,\phi>0,\] which can be interpreted as a mixture of two gamma distributions \[\begin{aligned} &f(x) = \frac{\lambda}{\lambda+\phi}f_{1}(x)+\frac{\phi}{\lambda+\phi}f_{2}(x),\quad x>0,\lambda>0,\phi>0, \end{aligned}\] where \[\begin{aligned} &f_{i}(x) = \frac{\lambda^{\phi+i-1}}{\Gamma(\phi+i-1)}x^{\phi+i-2} \exp (-\lambda x),\quad x>0,\lambda>0,\phi>0,\quad i=1,2. \end{aligned}\] Due to its nature as a gamma mixture, however, statistical inference for WL distribution, such as parameter estimation, bias correction, interval estimation, and statistical test, is overall more cumbersome and tedious than those of the aforementioned distributions.

Despite this difficulty, there is no existing R package that implements this comprehensive process for WL distribution. To the best of our knowledge, mle.tools (Mazucheli et al. 2017) is the only R package that enables one to obtain maximum likelihood (ML) estimates for WL distribution with asymptotic variance and bias correction. However, they are just limited to ML estimates. An R package fitdistrplus (Delignette-Muller and Dutang 2015) which fits certain univariate distributions to data sets is not applicable to WL distribution, while LindleyR package (Mazucheli et al. 2016) is exclusively for the use of pdf, cumulative distribution function (cdf), quantile, and random number generation, not statistical inference itself.

Based on this motivation, we present an R package WLinfer by providing various estimation methods in addition to maximum likelihood estimator (MLE), such as method of moment estimator (MME), modified method of moment estimator (MME\(_m\)), and closed form MLE-like estimator (MLE\(_c\)). For bias correction, WLinfer encompasses Firth’s method and the bootstrap method which were not considered in mle.tools, as well as Cox and Snell’s method. Furthermore, WLinfer provides the goodness of fit and likelihood ratio tests.

The remainder of this paper is organized as follows. First, we briefly review the theoretical results for each inferential step ranging from the goodness of fit test to the likelihood ratio test, while introducing relevant arguments of the WL function. We then provide an example to illustrate the whole output of WL function and how to use WL function which implements the whole statistical inference at once. Finally, we conclude with summarizing remarks.

The WLinfer package is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=WLinfer. R code for the examples demonstrated herein has been provided as supplementary material. The supplementary code has been tested with WLinfer version 1.0.0, and results presented herein have been produced with this version.

2 Goodness of fit test

The first step in fitting a distribution to data is to check whether the data can be assumed as generated from the distribution. For this purpose, many goodness of fit tests have been developed. Among them, WLinfer includes three popular tests: the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Cramér-von Mises test. Uncertainty that could occur from using estimates instead of true values are not considered here. The argument dist\(\_\)test of WL function is specified by one of either "ks", "cvm", "ad", or "all". The default is dist\(\_\)test="ks" while dist\(\_\)test=all" returns the results of all three tests.

3 Point estimation

WLinfer package considers four estimators: MLE, MLE\(_{c}\), MME and MME\(_{m}\). To the best of our knowledge, these are the only estimators of which asymptotic distribution has been analytically proven. In this study, we only review the estimation formulas. Please refer to (Mazucheli et al. 2013; Ghitany et al. 2017; Kim and Jang 2020) for more specific theoretical results.

Each estimation method can be implemented by choosing est\(\_\)method of WL function, among "MLE", "MLEc", "MME", and "MMEm". The default refers to est\(\_\)method = "MLEc". If point estimation is solely of interest, one can separately obtain point estimates with the following codes:

    data(fail_fiber)  # fail_fiber is included in WLinfer package.
    
    MLEc_WL(fail_fiber)  # Closed form MLE-like estimator
    MME_WL(fail_fiber)  # Method of moment estimator
    MMEm_WL(fail_fiber)  # Modified method of moment estimator
    MLE_WL(fail_fiber,init =1)  # Initial value for phi is required.

We use fail\(\_\)fiber data set (Bader and Priest 1982) which is included in the WLinfer package. This data set consists of 65 observations of the strength measurement of carbon fiber.

4 Bias correction

Unless sample size is sufficient for achieving consistency, bias correction is highly recommended since all the aforementioned estimators are upwardly biased in the case of small samples (Mazucheli et al. 2013; Wang and Wang 2017; Kim and Jang 2020). To this end, the WLinfer package provides three bias correction methods: Cox and Snell’s method (Cox and Snell 1968; Cordeiro and Klein 1994) for MLE and MLE\(_c\), Firth’s method (Firth 1993) for MLE, and the bootstrap method (Efron 1979; Efron and Tibshirani 1986) for MLE\(_c\). After a series of tedious calculations, correction formulas are given as follows:

For bias correction, the ‘bias\(\_\)cor’ argument should be specified since the default is bias\(\_\)cor=NULL. Note, unlike Cox and Snell’s method that works with both MLE and MLE\(_c\), Firth’s method and the bootstrap method are only applicable to MLE and MLE\(_c\), respectively. If other estimators are chosen with these correction methods, a default setting would be automatically adopted. The number of bootstrap iterations can be specified with boot_iter, while the default is 1000.

    WL(fail_fiber, est_method = "MLE",bias_cor = "firth")   # Firth's method
    WL(fail_fiber, est_method = "MLEc",bias_cor = "coxsnell") # Cox and Snell's method
    WL(fail_fiber, est_method = "MLEc",bias_cor = "boots") # The bootstrap method

5 Interval estimation

Asymptotic confidence intervals

MLE, MLE\(_{c}\), MME, and MME\(_{m}\) for the parameters of WL distribution achieve consistency and asymptotic normality. Thus, if the sample size is large enough, confidence intervals can be straightforwardly obtained with estimated asymptotic variances. \[\widehat{\lambda}\pm z_{\alpha/2}\cdot \frac{\widehat{\sigma_{1}}}{\sqrt{n}}\quad\text{and}\quad\widehat{\phi}\pm z_{\alpha/2}\cdot \frac{\widehat{\sigma_{2}}}{\sqrt{n}}\]

One problem is that the lower bound of estimated intervals can be negative. A confidence interval with negative lower bound may not be useful since the parameters for WL distribution must be positive. Therefore, in case of negative lower bounds, confidence intervals for \(\log (\boldsymbol{\theta})\) are first obtained by the delta method and then scaled back. That is, \[\widehat{\lambda}\cdot\operatorname{exp}\left(\pm z_{\alpha/2}\cdot\frac{\widehat{\sigma_{1}}}{\sqrt{n}\cdot\widehat{\lambda}}\right)\quad\text{and} \quad \widehat{\phi}\cdot\operatorname{exp}\left(\pm z_{\alpha/2}\cdot\frac{\widehat{\sigma_{2}}}{\sqrt{n}\cdot\widehat{\phi}}\right).\]

Bootstrap confidence intervals

When asymptotic distribution is not available or the sample size is insufficient, the bootstrap confidence interval can be a good alternative. The basic bootstrap confidence interval is simply given by \[\left(2\widehat{\lambda}-\widehat{\lambda}^{*}_{(1-\alpha/2)}\quad,\quad 2\widehat{\lambda}-\widehat{\lambda}^{*}_{(\alpha/2)}\right)\quad \text{and} \quad \left(2\widehat{\phi}-\widehat{\phi}^{*}_{(1-\alpha/2)}\quad,\quad 2\widehat{\phi}-\widehat{\phi}^{*}_{(\alpha/2)}\right),\] where \(\hat{\theta}^{*}_{(p)}\) denotes \(\left(100\cdot p\right)\%\) percentile of bootstrap realizations. Readers are referred to Chapter 5 of (Davison and Hinkley 1997) for more details. As in the asymptotic case, if the lower confidence limit is negative, confidence interval for \(\log (\boldsymbol{\theta})\) is first computed and scaled back.

The default arguments of WL function for interval estimation are CI\(\_\)method="asymp", CI\(\_\)scale="normal", CI\(\_\)side="two", and CI\(\_\)alpha=0.05. If any bias correction method is used, CI\(\_\)method="boot" is automatically chosen. CI\(\_\)scale="exp" is recommended in the case of a negative lower confidence limit and one-sided intervals can be obtained by choosing CI\(\_\)side="one".

    WL(fail_fiber,est_method="MLEc")$CI_list  #asymptotic CI for MLEc
    WL(fail_fiber,est_method="MLE")$CI_list  #asymptotic CI for MLE
    
    # Bootstrap CI for MLEc corrected by Cox and Snell's method
    WL(fail_fiber,est_method = "MLEc", bias_cor = "coxsnell")$CI_list

6 Likelihood ratio test (LRT)

Let \(X_{1}, \cdots, X_{n}\) be a random sample from \(f(x \mid \theta), \boldsymbol{\theta} \in \Theta,\) where \(\boldsymbol{\theta}=\left(\lambda,\phi\right)^{t}\), \(\Theta \subset \mathbb{R}^{+}\times\mathbb{R}^{+}\) and \(f(\boldsymbol{x}\mid\theta)\) is the pdf of WL distribution. To test \(H_{0}: \boldsymbol{\theta} \in \Theta_{0}\) vs. \(H_{1}: \boldsymbol{\theta} \in \Theta_{0}^{c},\) we reject \(H_{0}\) if \[2 \log (R_{n}) =2 \log \left( \frac{\sup _{\Theta} l_{n}(\boldsymbol{\theta} \mid \boldsymbol{x})}{\sup _{\Theta_{0}} l_{n}(\boldsymbol{\theta} \mid \boldsymbol{x})} \right) > \chi_{\alpha}^{2}(2),\text{ where } l_{n}(\boldsymbol{\theta} \mid \boldsymbol{x}) = \prod\limits_{i} f(x_{i}\mid\boldsymbol{\theta}).\] The LRT is automatically implemented in the WL function unless wilks\(\_\)test="FALSE" is selected. The significance level and types of null hypotheses are set by wilks\(\_\)alpha and wilks\(\_\)side. The default setting is wilks\(\_\)alpha=0.05 and wilks\(\_\)side="two".

The LRT can be separately conducted with the wilks.test function apart from other inferences. This function is especially useful when testing several possible values or regions. By specifying arguments estimator and side, both simple and composite null hypotheses can be tested.

7 Illustration - WL function

In this section, we provide an example for illustrating how to conduct the aforementioned steps of statistical inference from the beginning using the WL function. The WL function returns an S3 object of class ‘WL’ which is assigned to fiber in this example. For this object of class ‘WL’, summary and plot functions are provided.

    > data("fail_fiber")
    > fiber = WL(fail_fiber,dist_test = "ks", est_method ="MLEc",
                 wilks_alpha=0.05, wilks_side="two")
    > summary(fiber)
    
    Data: fail_fiber 
    
    Data summary:
    Mean: 2.244, Variance: 0.1728 
    Min    1st Qu Median 3rd Qu Max
    1.339  1.931  2.272  2.558  3.174
    
    Kolmogorov-Smirnov test for alpha=0.05 
    D = 0.0723, p-value: 0.8864
    >> This data follows the weighted Lindley distribution with estimated parameters.
    
    Estimation method (bias correction): MLEc(None) 
    lambda: 12.9318, phi: 28.3329 
    
    Variance of lambda & phi:
    Var(lambda)= 5.1126, Var(phi)= 25.2938 
    
    Two-sided confidence interval for 95% 
    (Asymptotic method & Normal scaled ) 
    CI for lambda: (8.5001, 17.3635) 
    CI for phi: (18.4757, 38.1902) 
    
    Two-sided Wilks' theorem test for estimated parameters 
    X = 0.0024, p-value: 0.9988
    >>The null hypothesis cannot be rejected.

The simple descriptive statistic is provided by summary(fiber), followed by the result of the goodness of fit test. This data set can be assumed to originate from WL distribution since the null hypothesis of the KS test is not rejected. We chose MLE\(_c\) without bias correction since the sample size of 65 is moderate, which allows us to use asymptotic variance to construct the confidence interval. Variance of lambda & phi in the output means estimated asymptotic variances obtained by replacing \(\left(\lambda,\phi\right)\) with \(\left(\widehat{\lambda},\widehat{\phi}\right)\) in the variance formulas. Finally, the result of LRT with \(H_{0}:\text{ }\lambda = \widehat{\lambda}_{MLE_{c}}\text{ and }\phi = \widehat{\phi}_{MLE_{c}}\) is given.

graphic without alt textgraphic without alt text

graphic without alt textgraphic without alt text

Figure 1: Histogram with estimated density function, boxplot for detecting outliers, weighted Lindley QQ plot, and contour plot of log-likelihood function with point estimates for a real data fail_fiber.

To obtain some helpful plots, we use plot function. Four plots are returned by plot(fiber) as per Figure 1: the histogram with estimated density function, the boxplot for detecting outliers, QQ plot, and contour plot with point estimates. The first three plots provide insight on how much WL distribution is suitable for the given data, along with the goodness of fit tests. The last contour plot provides the surface of log likelihood near point estimates. Note that the density function in the histogram and the quantiles for the QQ plot are calculated based on estimated parameters.

8 Summary

We presented an R package WLinfer that implements a goodness of fit test, several types of point estimation, bias correction, interval estimation, and the likelihood ratio test, and provide some useful plots. We supply a set of simple codes and an illustrative example of how to apply this package in practice. This package could practically assist practitioners, removing the need to make codes from scratch.

9 Acknowledgments

The corresponding author’s research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education(2018R1D1A1B07045603) and a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2021R1A4A5032622).

CRAN packages used

WLinfer, mle.tools, fitdistrplus, LindleyR

CRAN Task Views implied by cited packages

ActuarialScience, Distributions, Survival

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

M. Bader and A. Priest. Statistical aspects of fibre and bundle strength in hybrid composites. in Progress in Science and Engineering of Composites, T. Hayashi , K. Kawata and S. Umekawa , eds., ICCM-IV, Tokyo, 1982.
G. M. Cordeiro and R. Klein. Bias correction in ARMA models. Statistics and Probability Letters, 19(3): 169–176, 1994.
D. R. Cox and E. J. Snell. A general definition of residuals. Journal of the Royal Statistical Society. Series B (Methodological), 30(2): 248–275, 1968.
A. C. Davison and D. V. Hinkley. New bootstrap methods and their application. 1997.
M. Delignette-Muller and C. Dutang. Fitdistrplus: An r package for fitting distributions. Journal of Statistical Software, Articles, 64(4): 1–34, 2015.
B. Efron. Bootstrap methods: Another look at the jackknife. Annals of Statistics, (7): 1–26, 1979.
B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci., 1(1): 54–75, 1986.
D. Firth. Bias reduction of maximum likelihood estimates. Biometrika, 80(1): 27–38, 1993.
M. E. Ghitany, F. Alqallaf, D. K. Al-Mutairi and H. A. Husain. A two-parameter weighted lindley distribution and its applications to survival data. Mathematics and Computers in Simulation, 81(6): 1190–1201, 2011.
M. E. Ghitany, P. Song and S. Wang. New modified moment estimators for the two-parameter weighted lindley distribution. Journal of Statistical Computation and Simulation, 87(16): 3225–3240, 2017. URL https://doi.org/10.1080/00949655.2017.1363205 .
H.-M. Kim and Y.-H. Jang. New closed-form estimators for weighted lindley distribution. submitted, 2020.
J. Mazucheli, L. Fernandes and R. de Oliveira. Package. 2016. URL https://cran.r-project.org/web/packages/LindleyR/LindleyR.pdf.
J. Mazucheli, F. Louzada and M. E. Ghitany. Comparison of estimation methods for the parameters of the weighted lindley distribution. Applied Mathematics and Computation, 220: 463–471, 2013.
J. Mazucheli, A. Menezes and S. Nadarajah. Mle.tools: An r package for maximum likelihood bias correction. The R Journal, 9(2): 268–290, 2017.
M. Wang and W. Wang. Bias-corrected maximum likelihood estimation of the parameters of the weighted lindley distribution. Communications in Statistics - Simulation and Computation, 46(1): 530–545, 2017. URL https://doi.org/10.1080/03610918.2014.970696 .

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Jang, et al., "WLinfer: Statistical Inference for Weighted Lindley Distribution", The R Journal, 2023

BibTeX citation

@article{RJ-2022-042,
  author = {Jang, Yu-Hyeong and Kim, SungBum and Jung, Hyun-Ju and author), Hyoung-Moon Kim (Corresponding},
  title = {WLinfer: Statistical Inference for Weighted Lindley Distribution},
  journal = {The R Journal},
  year = {2023},
  note = {https://rjournal.github.io/},
  volume = {14},
  issue = {3},
  issn = {2073-4859},
  pages = {13-19}
}