We discuss the implementation of a profile likelihood method for estimating a Pearson correlation coefficient from bivariate data with censoring and/or missing values. The method is implemented in an R package, clikcorr, which calculates maximum likelihood estimates of the correlation coefficient when the data are modeled with either a Gaussian or a Student $t$ distribution.
Partially observed data present a challenge in statistical estimation. Simple approaches like complete case analysis allow traditional estimators to be used, but sacrifice precision and may introduce bias. More sophisticated approaches that incorporate information from partially observed cases have the potential to achieve greater power, but can be much more difficult to implement. As a case in point, with fully observed data the Pearson correlation coefficient is easily estimated by the familiar moment-based formula
$$\hat{\rho} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}},$$
but this formula cannot be applied directly to partially observed bivariate data, in which some values are censored or missing.
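As a simple illustration of the complete case approach (a sketch with simulated data, not part of clikcorr), the moment-based estimator can only use the fully observed pairs, discarding whatever information the partially observed cases carry:

```r
## Complete case analysis: drop every pair with a missing component and apply
## the usual moment-based estimator to the pairs that remain.
set.seed(1)
x <- rnorm(50)
y <- 0.6 * x + 0.8 * rnorm(50)
y[sample(50, 15)] <- NA           # 15 of the 50 pairs are partially observed

cor(x, y, use = "complete.obs")   # uses only the 35 complete pairs
```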
Here we discuss a likelihood-based method for estimating the correlation coefficient from partially-observed bivariate data, and present an R package implementing the method, clikcorr (“Censored data likelihood based correlation estimation”) (Li et al. 2016). clikcorr calculates maximum likelihood estimates of the parameters in a model, in particular the correlation coefficient, in the presence of left-, right-, and interval-censored data and missing values. clikcorr also has functions to help visualize bivariate datasets that contain censored and/or missing values.
The package emphasizes the role of the profile likelihood function for the correlation parameter. It provides functions to visualize the profile likelihood function, and uses likelihood ratio test inversion to provide confidence intervals for the correlation coefficient. This circumvents difficulties that arise when using Wald-type confidence intervals for a parameter that lies in a bounded domain, and that may have an asymmetric sampling distribution.
This project was motivated by the need to calculate correlation coefficients with left-censored chemical assay data. Such data often arise in the context of toxicology, chemistry, environmental science and medical laboratory applications, where some measurements fall below a limit of detection (LOD) (Jaspers et al. 2013). The limit of detection must be known or approximated. Estimating the correlation coefficient from bivariate censored data has been studied (Li et al. 2005), but issues in both methodology and software remain.
The paper is organized as follows. Section 1 develops model-based frameworks for correlation estimation based on Gaussian and Student-t models. Section 2 describes clikcorr, including the data input format, the output format, and program features such as the plotting functions. Section 3 presents simulations addressing the performance of clikcorr under a range of scenarios, focusing on run time and coverage probability of the confidence intervals. In Section 4 we use clikcorr to analyze dioxin data from both Jaspers et al. (2013) and the National Health and Nutrition Examination Survey (NHANES). Some further details on using the R package clikcorr are provided in Section 5.
Working in a model-based framework of a bivariate distribution with a correlation parameter opens up several paths for handling partially observed data. A likelihood for the observed data can be calculated by integrating a “complete data” likelihood over all data configurations that are consistent with the observed data. The resulting “observed data likelihood” is seen as preserving full information from the data, and can be used with any standard likelihood-based inference approach, including maximum likelihood (ML) estimation, Wald-type inference, likelihood ratio testing, and profile likelihood analysis.
The bivariate Gaussian distribution is a natural starting point for model-based estimation of a correlation coefficient. The distribution has five parameters: two marginal means ($\mu_X$, $\mu_Y$), two marginal standard deviations ($\sigma_X$, $\sigma_Y$), and the correlation coefficient $\rho$.
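For reference, in this notation the bivariate Gaussian density takes the standard form
$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\}.$$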
The fact that the Gaussian distribution maximum likelihood estimates coincide with the standard method of moments estimates implies some desirable properties. In particular, in the complete data setting, the Gaussian ML estimates will be consistent whenever the law of large numbers can be invoked to give consistency of the sample moments, regardless of whether the data are actually Gaussian. This makes it an appealing foundation for developing an approach that accommodates partially observed cases.
However, inference based on the Gaussian distribution may be misleading if the true distribution is far from Gaussian. In the complete data setting, traditional maximum likelihood (ML) techniques such as likelihood ratio testing and Wald-type inference are seldom performed, with approaches based on Fisher’s (Fisher 1915) variance stabilizing transformation being much more common. An approach to estimation and inference for the correlation parameter in the setting of partially observed data based on Fisher’s transformation has been developed (Li et al. 2005). All these approaches may produce misleading results if the data are strongly non-Gaussian, or if the conditions under which Fisher’s transformation actually stabilizes the variance do not hold.
The approach we implement here is fully model-based and does not rely on variance stabilizing transformations. Instead, we use standard maximum-likelihood approaches for estimation and inference. The model-based framework discussed here can be used with any model for bivariate data that includes an explicit correlation parameter, including any “elliptic distribution” with density of the form $f(z) \propto g\big((z-\mu)^\top \Sigma^{-1} (z-\mu)\big)$, where the scale matrix $\Sigma$ determines the correlation parameter.
To explore the sensitivity of the estimates and inferences to the model specification, we also implemented our approach for the setting where the data follow a bivariate Student $t$ distribution. This distribution has the same location, scale, and correlation parameters as the Gaussian model, together with a degrees-of-freedom parameter $\nu$ that controls the heaviness of the tails.
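In the notation used above for the Gaussian model, the standard form of the bivariate Student $t$ density with $\nu$ degrees of freedom is
$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\left[1 + \frac{1}{\nu(1-\rho^2)}\left(\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right)\right]^{-(\nu+2)/2},$$
where $\mu_X$, $\mu_Y$ are location parameters, $\sigma_X$, $\sigma_Y$ are scale parameters, and $\rho$ is the correlation parameter of the scale matrix (the Pearson correlation equals $\rho$ whenever $\nu > 2$).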
Our main goal here is to accommodate partially observed data. In the presence of left-, right-, and/or interval-censored data, the univariate likelihood function can be written in terms of density functions (for exactly observed values), cumulative distribution functions (CDFs, for left-censored values), survival functions (for right-censored values), and differences of CDF values (for interval-censored values).
We adopt a convention that treats all three censoring types and exactly observed values in a unified representation, with all being special cases of interval censoring. Specifically, every case is known to lie in an interval $[a, b]$: an exactly observed value $x$ has $a = b = x$, a left-censored value has $a = -\infty$ and $b$ equal to the censoring point, a right-censored value has $a$ equal to the censoring point and $b = \infty$, and an interval-censored value has finite bounds $a < b$.
For the bivariate setting, each pair contributes a factor to the likelihood that is determined by whether each of its two components is exactly observed, left-, right-, or interval-censored, or missing.
In the Gaussian model, for pairs in which one component is exactly observed and the other is censored, we can take advantage of the fact that the conditional distributions are also Gaussian, with means and variances that are easily calculated (Rao 1973). The factor for such a pair is therefore the marginal density of the observed component times a conditional normal probability. The bivariate rectangle probabilities needed for pairs in which both components are censored are calculated with the mvtnorm R package.
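As a small numerical illustration (with assumed parameter values, not clikcorr's internal code), the factor for a pair with $X$ exactly observed and $Y$ left censored follows directly from the Gaussian conditional distribution:

```r
## Likelihood factor for "X observed at x, Y left censored at b" under a
## bivariate normal model: f_X(x) * P(Y <= b | X = x).
mux <- 0; muy <- 0; sx <- 1; sy <- 1; rho <- 0.5   # assumed parameter values
x <- 1.2                                           # exactly observed X
b <- 0.7                                           # Y known only to be <= b

cond_mean <- muy + rho * sy / sx * (x - mux)       # E(Y | X = x)
cond_sd   <- sy * sqrt(1 - rho^2)                  # SD(Y | X = x)
dnorm(x, mux, sx) * pnorm(b, cond_mean, cond_sd)   # likelihood factor
```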
Besides censoring, missing values can also be present in bivariate data (Little and Rubin 2002). When both members of a pair are missing, the likelihood contribution factor is constant and can be taken to be 1; when only one member is missing, the pair contributes the corresponding marginal factor for the non-missing member. The likelihood function is the product over all pairs of the factors listed in Table 1, where $f(u, v)$ denotes the bivariate Gaussian density, $f_X$, $f_Y$ the marginal densities, $F_X$, $F_Y$ the marginal distribution functions, $x$ and $y$ exactly observed values, and $(a_X, b_X)$ and $(a_Y, b_Y)$ the censoring interval endpoints for $X$ and $Y$. In the tables, o. denotes exactly observed, lc. left censored, rc. right censored, ic. interval censored, and m. missing.
Pair type | Factor to the likelihood | Pair type | Factor to the likelihood
---|---|---|---
X o., Y o. | $f(x, y)$ | X o., Y lc. | $\int_{-\infty}^{b_Y} f(x, v)\,dv$
X o., Y rc. | $\int_{a_Y}^{\infty} f(x, v)\,dv$ | X o., Y ic. | $\int_{a_Y}^{b_Y} f(x, v)\,dv$
X o., Y m. | $f_X(x)$ | X lc., Y o. | $\int_{-\infty}^{b_X} f(u, y)\,du$
X lc., Y lc. | $\int_{-\infty}^{b_X}\int_{-\infty}^{b_Y} f(u, v)\,dv\,du$ | X lc., Y rc. | $\int_{-\infty}^{b_X}\int_{a_Y}^{\infty} f(u, v)\,dv\,du$
X lc., Y ic. | $\int_{-\infty}^{b_X}\int_{a_Y}^{b_Y} f(u, v)\,dv\,du$ | X lc., Y m. | $F_X(b_X)$
X rc., Y o. | $\int_{a_X}^{\infty} f(u, y)\,du$ | X rc., Y lc. | $\int_{a_X}^{\infty}\int_{-\infty}^{b_Y} f(u, v)\,dv\,du$
X rc., Y rc. | $\int_{a_X}^{\infty}\int_{a_Y}^{\infty} f(u, v)\,dv\,du$ | X rc., Y ic. | $\int_{a_X}^{\infty}\int_{a_Y}^{b_Y} f(u, v)\,dv\,du$
X rc., Y m. | $1 - F_X(a_X)$ | X ic., Y o. | $\int_{a_X}^{b_X} f(u, y)\,du$
X ic., Y lc. | $\int_{a_X}^{b_X}\int_{-\infty}^{b_Y} f(u, v)\,dv\,du$ | X ic., Y rc. | $\int_{a_X}^{b_X}\int_{a_Y}^{\infty} f(u, v)\,dv\,du$
X ic., Y ic. | $\int_{a_X}^{b_X}\int_{a_Y}^{b_Y} f(u, v)\,dv\,du$ | X ic., Y m. | $F_X(b_X) - F_X(a_X)$
X m., Y o. | $f_Y(y)$ | X m., Y lc. | $F_Y(b_Y)$
X m., Y rc. | $1 - F_Y(a_Y)$ | X m., Y ic. | $F_Y(b_Y) - F_Y(a_Y)$
X m., Y m. | 1 | |
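A rough sketch (not clikcorr's internal code, and using assumed parameter values) of the purely rectangular cases of Table 1, in which neither component is exactly observed, computes the factor as a bivariate normal rectangle probability with mvtnorm:

```r
## Factor for a pair in which neither component is exactly observed: the
## probability that (X, Y) falls in the rectangle implied by the bounds,
## with NA bounds treated as infinite (censoring or missingness).
library(mvtnorm)

rect_factor <- function(lx, ux, ly, uy, mu, Sigma) {
  lw <- c(ifelse(is.na(lx), -Inf, lx), ifelse(is.na(ly), -Inf, ly))
  up <- c(ifelse(is.na(ux),  Inf, ux), ifelse(is.na(uy),  Inf, uy))
  pmvnorm(lower = lw, upper = up, mean = mu, sigma = Sigma)[1]
}

mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2)
rect_factor(NA, 0.7,  0.3,  NA, mu, Sigma)  # X left censored, Y right censored
rect_factor(NA,  NA, -1.0, 1.0, mu, Sigma)  # X missing, Y interval censored
```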
When data arise from an underlying heavy tailed bivariate distribution, estimates based on the bivariate normal model may yield confidence intervals with poor coverage probabilities. This is especially a concern when partially observed data are present. We provide the bivariate Student-t model as an alternative approach to estimation in this setting.
The bivariate $t$ model is handled analogously: Table 2 lists the likelihood factor contributed by each pair type, writing $t(u, v)$ for the bivariate $t$ density, $t_X$, $t_Y$ for its marginal densities, and $T_X$, $T_Y$ for the marginal distribution functions. The rectangle probabilities appearing in these factors have no convenient closed form, so we use the mvtnorm package in R to approximate these values.
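For example (a sketch with assumed parameter values, not clikcorr's internal code), a rectangle probability of the kind appearing in Table 2 can be approximated with pmvt after shifting the bounds by the location vector:

```r
## P(a1 < X < b1, a2 < Y < b2) under a bivariate t model with location mu,
## scale matrix S, and nu degrees of freedom, approximated with mvtnorm.
library(mvtnorm)

mu <- c(0, 0)
S  <- matrix(c(1, 0.5, 0.5, 1), 2)
nu <- 5
lw <- c(-Inf, 0.3)   # X left censored at 0.7, Y right censored at 0.3
up <- c( 0.7, Inf)

pmvt(lower = lw - mu, upper = up - mu, sigma = S, df = nu)[1]
```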
Pair type | Factor to the likelihood | Pair type | Factor to the likelihood
---|---|---|---
X o., Y o. | $t(x, y)$ | X o., Y lc. | $\int_{-\infty}^{b_Y} t(x, v)\,dv$
X o., Y rc. | $\int_{a_Y}^{\infty} t(x, v)\,dv$ | X o., Y ic. | $\int_{a_Y}^{b_Y} t(x, v)\,dv$
X o., Y m. | $t_X(x)$ | X lc., Y o. | $\int_{-\infty}^{b_X} t(u, y)\,du$
X lc., Y lc. | $\int_{-\infty}^{b_X}\int_{-\infty}^{b_Y} t(u, v)\,dv\,du$ | X lc., Y rc. | $\int_{-\infty}^{b_X}\int_{a_Y}^{\infty} t(u, v)\,dv\,du$
X lc., Y ic. | $\int_{-\infty}^{b_X}\int_{a_Y}^{b_Y} t(u, v)\,dv\,du$ | X lc., Y m. | $T_X(b_X)$
X rc., Y o. | $\int_{a_X}^{\infty} t(u, y)\,du$ | X rc., Y lc. | $\int_{a_X}^{\infty}\int_{-\infty}^{b_Y} t(u, v)\,dv\,du$
X rc., Y rc. | $\int_{a_X}^{\infty}\int_{a_Y}^{\infty} t(u, v)\,dv\,du$ | X rc., Y ic. | $\int_{a_X}^{\infty}\int_{a_Y}^{b_Y} t(u, v)\,dv\,du$
X rc., Y m. | $1 - T_X(a_X)$ | X ic., Y o. | $\int_{a_X}^{b_X} t(u, y)\,du$
X ic., Y lc. | $\int_{a_X}^{b_X}\int_{-\infty}^{b_Y} t(u, v)\,dv\,du$ | X ic., Y rc. | $\int_{a_X}^{b_X}\int_{a_Y}^{\infty} t(u, v)\,dv\,du$
X ic., Y ic. | $\int_{a_X}^{b_X}\int_{a_Y}^{b_Y} t(u, v)\,dv\,du$ | X ic., Y m. | $T_X(b_X) - T_X(a_X)$
X m., Y o. | $t_Y(y)$ | X m., Y lc. | $T_Y(b_Y)$
X m., Y rc. | $1 - T_Y(a_Y)$ | X m., Y ic. | $T_Y(b_Y) - T_Y(a_Y)$
X m., Y m. | 1 | |
For large samples, twice the difference between the maximized log likelihood and the profile log likelihood evaluated at the true value of the correlation coefficient approximately follows a $\chi^2_1$ distribution. A profile likelihood confidence interval is obtained by inverting the corresponding likelihood ratio test: the interval consists of all correlation values that are not rejected at the chosen significance level.
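The following sketch (using simulated complete data, not clikcorr's implementation) illustrates the inversion: the profile log likelihood for the correlation is evaluated on a grid, and the confidence interval collects the values whose likelihood ratio statistic falls below the $\chi^2_1$ critical value.

```r
## Profile likelihood confidence interval for rho by likelihood ratio test
## inversion, for complete bivariate normal data.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.6 * x + 0.8 * rnorm(n)

## Bivariate normal log likelihood with the means fixed at the sample means
## (their MLEs) and the standard deviations parameterized on the log scale.
loglik <- function(par, rho, x, y) {
  sx <- exp(par[1]); sy <- exp(par[2])
  zx <- (x - mean(x)) / sx; zy <- (y - mean(y)) / sy
  q  <- (zx^2 - 2 * rho * zx * zy + zy^2) / (1 - rho^2)
  sum(-log(2 * pi) - log(sx) - log(sy) - 0.5 * log(1 - rho^2) - 0.5 * q)
}

## Profile log likelihood: maximize over the nuisance parameters for fixed rho.
profile_ll <- function(rho, x, y) {
  optim(c(log(sd(x)), log(sd(y))), loglik, rho = rho, x = x, y = y,
        control = list(fnscale = -1))$value
}

rho_grid <- seq(-0.95, 0.95, by = 0.01)
pll      <- sapply(rho_grid, profile_ll, x = x, y = y)

## Keep every rho whose likelihood ratio statistic is below the 95% cutoff.
keep <- 2 * (max(pll) - pll) <= qchisq(0.95, df = 1)
c(estimate = rho_grid[which.max(pll)],
  lower = min(rho_grid[keep]), upper = max(rho_grid[keep]))
```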
Profile likelihood based confidence intervals have some advantages over other methods such as the commonly used Wald-type confidence intervals obtained from the limiting distribution of the MLE. Wald-type intervals may give poor performance when the parameter is close to the boundary of its domain, in which case the parameter estimate assumes a skewed distribution (Stryhn and Christensen 2003). The profile likelihood based approach is robust to this skewness by providing an asymmetric confidence interval estimate. Zhou et al. (2012) compared confidence interval estimates for a logistic regression setting using (1) the Wald method, (2) the Bootstrap method and (3) the profile likelihood based method. In our setting, the profile likelihood based method is more computationally efficient than the bootstrap method and is more robust to cases of parameter estimates with markedly skewed distributions than the Wald method.
We implemented an R package, clikcorr (Li et al. 2016), to facilitate inference on the correlation coefficient from bivariate data in the presence of censored and/or missing values. The package handles all combinations of observed, censored (left-, right-, or interval-censored), or missing values on each of the two components. Most of the currently available software for estimating the correlation coefficient from bivariate censored data focuses on left-truncated and right-censored data (Li et al. 2005; Schemper et al. 2013). To the best of our knowledge, clikcorr is the first available software that can handle data with an arbitrary mixture of censoring types and missing values. The package also has graphical functions to help visualize bivariate censored data and the shape of the profile log likelihood function.
We note the large set of routines for left-censored data given in the NADA (Nondetects and Data Analysis) package (Helsel 2005, 2012). NADA currently has no function for the Pearson correlation coefficient with left-censored data, although it does include a function for Kendall's τ with censored data.
clikcorr requires a special input data format. Each input variable is represented by two columns giving the lower and upper bounds of the interval discussed in Section 1.3, with NA used for an unknown (infinite) bound. This coding parallels the interval specification used by the LIFEREG procedure of SAS and by the “Surv” class of the survival package (Therneau 2015) in R when type=interval2. Table 3 gives an example dataset formatted for input into clikcorr, with a pair of variables, Variable1 and Variable2, each given with lower and upper interval endpoints. For the first sample ID, both variables are exactly observed (at 10.9 and 37.6), so the lower and upper bounds coincide; an NA lower bound indicates left censoring, an NA upper bound indicates right censoring, two NA bounds indicate a missing value, and distinct finite bounds indicate interval censoring. The functions rightLeftC2DF and intervalC2DF can transform input data from “Surv” class formats into the clikcorr input format: rightLeftC2DF transforms right- or left-censored data in the “Surv” class, and intervalC2DF transforms interval-censored data (either interval or interval2) in the “Surv” class.
Sample ID | Lower bound of the 1st variable | Upper bound of the 1st variable | Lower bound of the 2nd variable | Upper bound of the 2nd variable
---|---|---|---|---
1 | 10.9 | 10.9 | 37.6 | 37.6 |
2 | 12.8 | 12.8 | NA | 7.6 |
3 | 8.7 | 8.7 | 26.7 | NA |
4 | 13.2 | 13.2 | 11.7 | 20.9 |
5 | 9.8 | 9.8 | NA | NA |
6 | NA | 10.0 | NA | 6.9 |
7 | 11.4 | 16.3 | 10.8 | 18.7 |
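As a minimal sketch of assembling data in this format (the column names below are arbitrary and would be passed to clikcorr through its lower/upper arguments), the rows of Table 3 could be constructed directly as a data frame:

```r
## Interval-bound representation of the data in Table 3: equal bounds mean an
## exactly observed value, an NA lower bound means left censored, an NA upper
## bound means right censored, two NAs mean missing, and distinct finite
## bounds mean interval censored.
dat <- data.frame(
  Var1_L = c(10.9, 12.8,  8.7, 13.2,  9.8,   NA, 11.4),
  Var1_U = c(10.9, 12.8,  8.7, 13.2,  9.8, 10.0, 16.3),
  Var2_L = c(37.6,   NA, 26.7, 11.7,   NA,   NA, 10.8),
  Var2_U = c(37.6,  7.6,   NA, 20.9,   NA,  6.9, 18.7)
)
```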
clikcorr has three main functions: one estimating function and two graphical functions. The estimating function calculates the correlation coefficient estimate, its confidence interval, and the p-value from a likelihood ratio test of zero correlation.
Below is an example of calling the estimation function, together with the resulting output, assuming (by default) the bivariate normal distribution. Here ND is an example dataset contained in the clikcorr package; see Li et al. (2016) for details about the ND dataset. ("t1_OCDD", "t1_HxCDF_234678") and ("t2_OCDD", "t2_HxCDF_234678") are the column names of the lower and upper interval bounds for the input variables "OCDD" and "HxCDF_234678", respectively. cp is the user-specified coverage probability (confidence level) of the confidence interval, with a default of 0.95. The print method of the "clikcorr" class outputs the coefficients matrix, which contains the estimated correlation coefficient (coefficients), the lower and upper bounds of the confidence interval (CI.lower and CI.upper), and the p.value from the likelihood ratio test of the null hypothesis that the correlation is zero.
R> data(ND)
R> logND <- log(ND)
R> clikcorr(data = logND, lower1 = "t1_OCDD", upper1 = "t2_OCDD",
+ lower2 = "t1_HxCDF_234678", upper2 = "t2_HxCDF_234678", cp=.95)
Call:
clikcorr.default(data = logND, lower1 = "t1_OCDD", upper1 = "t2_OCDD",
lower2 = "t1_HxCDF_234678", upper2 = "t2_HxCDF_234678", cp = 0.95)
coefficients 95%CI.lower 95%CI.upper p.value
0.472657898 -0.006538307 0.769620179 0.053083585
The summary method additionally outputs the vector of estimated means ($mean), the estimated variance-covariance matrix ($Cov), and the log likelihood value at the MLE ($loglik).
R> obj <- clikcorr(data=logND, lower1="t1_OCDD", upper1="t2_OCDD",
+ lower2="t1_HxCDF_234678", upper2="t2_HxCDF_234678", cp=.95)
R> summary(obj)
$call
clikcorr.default(data = logND, lower1 = "t1_OCDD", upper1 = "t2_OCDD",
lower2 = "t1_HxCDF_234678", upper2 = "t2_HxCDF_234678", cp = 0.95)
$coefficients
coefficients 95%CI.lower 95%CI.upper p.value
0.472657898 -0.006538307 0.769620179 0.053083585
$mean
[1] 5.3706036 -0.4655811
$Cov
[,1] [,2]
[1,] 0.5938656 0.3100190
[2,] 0.3100190 0.7244269
$loglik
[1] -37.97112
attr(,"class")
[1] "summary.clikcorr"
As we will illustrate in a later section, when data are generated from a heavy tailed distribution, inference assuming the bivariate normal distribution can perform poorly. The option dist="t" can be used to specify a bivariate $t$ model instead, and the option df specifies the degrees of freedom (d.f.) of the bivariate $t$ distribution. Below is an example call using the ND data, together with the output assuming a bivariate $t$ distribution with 10 d.f.
R> obj <- clikcorr(ND, lower1="t1_OCDD", upper1="t2_OCDD", lower2="t1_HxCDF_234678",
+ upper2="t2_HxCDF_234678", dist="t", df=10)
R> summary(obj)
$call
clikcorr.default(data = logND, lower1 = "t1_OCDD", upper1 = "t2_OCDD",
lower2 = "t1_HxCDF_234678", upper2 = "t2_HxCDF_234678", dist = "t",
df = 10, nlm = TRUE)
$coefficients
coefficients 95%CI.lower 95%CI.upper p.value
0.77229746 -0.09201948 0.72893732 0.10812589
$mean
[1] 5.5240929 -0.3338772
$Cov
[,1] [,2]
[1,] 0.5996139 0.4886447
[2,] 0.4886447 0.6676448
$loglik
[1] -104.0848
attr(,"class")
[1] "summary.clikcorr"
clikcorr also provides two graphical functions. Visualizing the data
and the analytic outputs is often useful, and is sometimes invaluable
for understanding the results. A barrier to graphing censored data is
finding a way to plot the range of possible data values. The function
splot
provides an effective way of visualizing bivariate censored
and/or missing data. With splot
, censored values can be identified
with different colors or plotting symbols, depending on the data types,
as described in sections 1.2 and 1.3.
For example, the plotting symbol can be an arrow showing the direction of uncertainty. Since the ND dataset in the clikcorr package contains only right censored data points, in the following examples we illustrate the clikcorr plotting functions using a simulated dataset, to demonstrate that clikcorr can provide a visual representation for all types of censored and/or missing data. Denote by SimDat an input data matrix generated from a tri-variate normal distribution with a common pairwise correlation coefficient, and let "L1" and "U1" be the column names of the lower and upper interval bounds for input variable 1, with analogous names for variables 2 and 3. A single scatterplot of Variable 2 versus Variable 1, that is, ("L2", "U2") versus ("L1", "U1"), can be produced by calling splot2(SimDat, c("L1","L2"), c("U1","U2")). The following syntax calls splot and produces a scatterplot matrix with all combinations of the three input variables, as shown in Figure 1. Note that a censored observation has its position indicated by the tail of an arrow, with the arrow head pointing in the censoring direction.
R> splot(SimDat, c("L1", "L2", "L3"), c("U1", "U2", "U3"))
The S3 function plot for the clikcorr class produces a plot of the profile log likelihood function. The following syntax calls the plot function using the bivariate normal and the bivariate $t$ models, respectively.
R> modN <- clikcorr(SimDat, "L1", "U1", "L2", "U2", cp=.95)
R> plot(modN)
R> modT <- clikcorr(SimDat, "L1", "U1", "L2", "U2", dist="t", df=3)
R> plot(modT)
The clikcorr main estimating function requires the following additional R functionality: the mvtnorm package (Genz and Bretz 2009; Genz et al. 2012; Hothorn et al. 2013), the optim function, and the integrate function when using the bivariate $t$ model.
Extensive simulations have been done to demonstrate different aspects of the performance of clikcorr under various settings of sample size, percent censoring, and underlying distribution. We show these results in the following subsections.
First, we investigate the coverage probability of the confidence intervals with two types of right censoring, termed “fixed point” and “random”, as described further below. Data were generated from the bivariate normal distribution, and inference assumed the same distribution.
In the fixed point right censoring setting, all observations were censored at the same value. Different fixed points were calculated to give expected censoring rates of 25% and 75%, and the completely observed (0% censoring) case was included for comparison.
It can be seen from Tables 4 and 5 that clikcorr gives satisfactory confidence interval coverage probabilities in both the fixed and random censoring settings. In the completely observed cases, coverage probabilities for both the profile likelihood and the Fisher transformation based confidence intervals are very close to the nominal level of 0.95.
Table 6 shows the run time of clikcorr under different settings of sample size and censoring proportion configuration, using the bivariate normal distribution for both data generation and inference. A censoring proportion configuration such as “(X 15%; XY 15%; Y 15%)” gives the expected percentages of pairs with only X censored, with both X and Y censored, and with only Y censored, respectively.
Table 7 shows the estimates of the bias and of the MSE (in parentheses) for inference on the correlation coefficient under the random censoring settings of Table 5.
When the underlying marginal distributions are heavy tailed rather than normal, basing inference on the bivariate normal model can result in poor confidence interval coverage probabilities. This is illustrated in Table 8, where the data were generated from correlated heavy-tailed ($t$-distributed) margins with the degrees of freedom given in the table. The coverage probabilities within parentheses in Table 8 are for confidence intervals constructed using the bivariate $t$ model. The clikcorr package does not provide a way to test whether the data are drawn from a heavy tailed distribution, nor does it estimate the best value of the d.f. to use; users can refer to Bryson (1974) or Resnick (2007), or use QQ plots, to guide their choice of model.
In this section, we used clikcorr to analyze two real world datasets and compared the results to conventional approaches.
The first dataset was analyzed in Jaspers et al. (2013), which used an early version of clikcorr to estimate correlations for left-censored (below detection limit) data. The data contain perfluorooctane sulfonate (PFOS) concentrations in feathers and soft tissues (liver, muscle, adipose tissue and preen glands) of barn owl road-kill victims collected in the province of Antwerp, Belgium. Figure 3 gives an example scatter plot (upper panel) and an example plot of the profile log likelihood of the correlation between PFOS in liver tissue and feathers, produced with clikcorr. Figure 4 shows the scatter plot matrix across tail feathers and the tissues, generated by clikcorr. The asymmetric shape of the profile likelihood reflects the asymmetric sampling distribution, and leads to a confidence interval that extends further to the left of the point estimate than to the right.
In the second analysis using clikcorr, we analyzed dioxin data from the 2002-2003 National Health and Nutrition Examination Survey (NHANES). NHANES is one of the most widely-used data sources describing the health and nutritional status of people residing in the US. The data we analyzed contained measurements of dioxins (indexed d1-d7 in Table 9), furans (f1-f9), and polychlorinated biphenyls (PCBs, p1-p6).
The pairwise correlation coefficients calculated from clikcorr are given in the upper-right triangle of Table 9. For each pair, the MLE of the correlation coefficient and the associated LRT p-value (in parentheses) are reported.
The pairs of chemicals with the highest correlation in each group are shown in Figure 6, together with the MLEs of the correlation coefficients calculated from clikcorr.
Some chemicals in the dataset appear to have highly skewed marginal distributions. It is common to apply a symmetrizing transformation in such a setting. The correlation coefficient between transformed values estimates a different population parameter, but this parameter may be of as much or greater interest than the correlation computed on the original scale. To demonstrate the effect of such a transformation, we log-transformed the data for each chemical and recalculated the correlation coefficient using the bivariate normal model. The estimation and inference results from the transformed data are given in the lower-left triangle of Table 9.
Using the bivariate normal model for transformed data is an example of using bivariate Gaussian copulas to allow a wide range of distributions to be analyzed using software developed for Gaussian models. A general approach of this type would be to transform each margin toward normality, fit the Gaussian model to the transformed values, and interpret the estimated correlation as the copula correlation parameter.
Figure 5 demonstrates the difference between using data on the original and the log-transformed scales. When the observed marginal distributions are highly skewed, assuming a bivariate normal distribution can produce misleading results; in particular, the correlation estimate can be dominated by outlying points in the tails of either marginal distribution. As shown in Figure 5(c), the original values of chemicals OCDD (d7) and HxCDF234678 (f6) were not significantly correlated, due to the influence of points in the marginal tails. After the log transformation, the marginal sample distributions are closer to normal, making the bivariate normal assumption more plausible, and the two chemicals show a significant, moderate correlation.
This section gives technical details regarding function maximization and gives some examples of additional plotting options.
We note that clikcorr depends on the R function optim, which is used to find the maximum of the log likelihood or the profile log likelihood. For some starting values, optim stops with the error message “function cannot be evaluated at initial parameters”. In such cases, manually adjusting the initial parameter values is recommended.
Figure 7 illustrates two example profile log likelihood functions for parameters of the bivariate distribution of PFOS in tail feather and adipose tissue in the owl feather dataset. Each plot shows the log likelihood for certain parameters with the other parameters fixed. In the upper panel, the log likelihood is nearly flat over part of the parameter range; starting values in such a region can cause the optim function to fail and give the above error message.
It is worth the effort to make the starting parameter values as close to the true parameters as possible, if these are known. By default, clikcorr uses the sample means, the log-transformed sample variances, and the sample correlation based on the complete observations as the starting values. It also allows the user to specify the starting values and pass them to the optim function through the option sv. This option is especially useful when the number of exact observations is small. For example, if a dataset contains only one complete observation, then the default starting values cannot be computed reliably from the complete cases, and the user can supply them directly:
R> clikcorr(owlData, "feather_L", "feather_U", "AT_L", "AT_U", cp=.95, dist="t",
+ df=5, sv=c(23, 240, 6, 1300, 12))
The default optimization method for searching for the MLE in clikcorr is the default method used in optim, the Nelder-Mead method (R Core Team 2013a). clikcorr also allows the user to select other optimization methods implemented in optim, or to use the non-linear minimization function nlm (R Core Team 2013b). For example:
R> clikcorr(owlData, "feather_L", "feather_U", "AT_L", "AT_U", cp=.95, dist="t",
+ df=3, method="BFGS")
R> clikcorr(owlData, "feather_L", "feather_U", "AT_L", "AT_U", cp=.95, dist="t",
+ df=3, nlm=TRUE)
We note here that the default optimization method in the optim function does not use derivative (gradient) information, but instead performs a simplex search over the function values, which makes it relatively slow. In future work, we plan to implement gradient-directed optimization algorithms in clikcorr to speed up convergence. This would be especially helpful when searching for the confidence interval bounds of the correlation coefficient.
clikcorr also accepts standard R plotting options such as xlab, xlim, bg, etc., in the splot and plot functions.
For example, in Figure 6 the individual panels are
generated respectively by
R> splot2(NHANESD, "t1_TCDD", "t2_TCDD", "t1_PeCDD", "t2_PeCDD", xlab="d1=TCDD",
+ ylab="d2=PeCDD", bg="#FFB90F", cex.lab=2.0)
R> splot2(NHANESD, "t1_HxCDF_123478", "t2_HxCDF_123478", "t1_HxCDF_123678",
+ "t2_HxCDF_123678", xlab="f4=HxCDF123478", ylab="f5=HxCDF123678", bg="#FFB90F",
+ cex.lab=2.0)
R> splot2(NHANESD, "t1_PCB_156", "t2_PCB_156", "t1_PCB_157", "t2_PCB_157",
+ xlab="p3=PCB156", ylab="p4=PCB157", bg="#FFB90F", cex.lab=2.0)
R> splot2(NHANESD, "t1_PCB_105", "t2_PCB_105", "t1_PCB_118", "t2_PCB_118",
+ xlab="p1=PCB105", ylab="p2=PCB106", bg="#FFB90F", cex.lab=2.0).
Sample size | 0% Censoring rate | 25% Censoring rate | 75% Censoring rate | ||||||
---|---|---|---|---|---|---|---|---|---|
n=50 | 0.938 | 0.962 | 0.946 | 0.938 | 0.946 | 0.968 | 0.958 | 0.954 | 0.956 |
(0.956) | (0.968) | (0.954) | – | – | – | – | – | – | |
n=200 | 0.944 | 0.948 | 0.942 | 0.954 | 0.960 | 0.948 | 0.972 | 0.964 | 0.960 |
(0.948) | (0.954) | (0.950) | – | – | – | – | – | – | |
n=500 | 0.932 | 0.934 | 0.952 | 0.946 | 0.948 | 0.964 | 0.952 | 0.944 | 0.964 |
(0.938) | (0.948) | (0.954) | – | – | – | – | – | – |
Sample size | Censoring distribution N(-2,1), average censoring rate | | | Censoring distribution N(0,1), average censoring rate | | | Censoring distribution N(2,1), average censoring rate | | |
---|---|---|---|---|---|---|---|---|---|
n=50 | 0.934 | 0.936 | 0.942 | 0.908 | 0.946 | 0.958 | – | – | – |
n=200 | 0.958 | 0.968 | 0.948 | 0.942 | 0.962 | 0.958 | 0.942 | 0.946 | 0.930 |
n=500 | 0.946 | 0.948 | 0.956 | 0.944 | 0.945 | 0.954 | 0.947 | 0.938 | 0.962 |
Sample size | Censoring proportion configuration | ||||||||
---|---|---|---|---|---|---|---|---|---|
(X 0%; XY 30%; Y 0%) | (X 15%; XY 15%; Y 15%) | (X 30%; XY 0%; Y 30%) | |||||||
n=50 | 36 | 50 | 59 | 22 | 25 | 32 | 1.2 | 1.4 | 1.8 |
n=200 | 166 | 206 | 233 | 89 | 103 | 110 | 1.5 | 2.1 | 2.2 |
n=500 | 267 | 515 | 728 | 176 | 275 | 374 | 1.8 | 2.4 | 3.5 |
Sample size | Censoring distribution N(-2,1), average censoring rate | | | Censoring distribution N(0,1), average censoring rate | | | Censoring distribution N(2,1), average censoring rate | | |
---|---|---|---|---|---|---|---|---|---|
n=50 | 0.006 | -0.021 | -0.018 | 0.031 | 0.007 | 0.008 | – | – | – |
(0.019) | (0.030) | (0.053) | (0.035) | (0.064) | (0.076) | (–) | (–) | (–) | |
n=200 | -0.005 | -0.005 | -0.026 | 0.020 | 0.003 | -0.037 | -0.017 | 0.053 | 0.021 |
(0.008) | (0.012) | (0.014) | (0.012) | (0.010) | (0.014) | (0.024) | (0.034) | (0.029) | |
n=500 | 0.005 | -0.008 | -0.002 | 0.008 | -0.001 | -0.010 | -0.006 | -0.018 | -0.010 |
(0.002) | (0.003) | (0.005) | (0.003) | (0.006) | (0.007) | (0.010) | (0.009) | (0.013) |
Degrees of freedom | Sample size | | | Sample size | | | Sample size | | |
---|---|---|---|---|---|---|---|---|---|
df=3 | 0.95 | 0.81 | 0.69 | 0.96 | 0.76 | 0.55 | 0.96 | 0.70 | 0.54 |
(0.93) | (0.93) | (0.97) | (0.97) | (0.95) | (0.96) | ||||
df=5 | 0.95 | 0.91 | 0.86 | 0.94 | 0.86 | 0.81 | 0.96 | 0.88 | 0.76 |
(0.97) | (0.95) | (0.96) | (0.95) | (0.97) | (0.95) | ||||
df=10 | 0.94 | 0.93 | 0.89 | 0.93 | 0.96 | 0.89 | 0.95 | 0.94 | 0.91 |
(0.93) | (0.94) | (0.94) | |||||||
df=20 | 0.95 | 0.92 | 0.93 | 0.94 | 0.94 | 0.92 | 0.96 | 0.94 | 0.94 |
Chemical index | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
d1 | d2 | d3 | d4 | d5 | d6 | d7 | f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | p1 | p2 | p3 | p4 | p5 | p6 | |
d1 | – | .81 | .68 | .67 | .61 | .50 | .22 | .22 | .30 | .66 | .65 | .62 | .36 | .08 | .05 | .01 | .44 | .47 | .30 | .39 | .57 | .36 |
– | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (1) | (0) | (0) | (0) | (0) | (.01) | (.46) | (.82) | (0) | (0) | (0) | (0) | (0) | (0) | |
d2 | .79 | – | .78 | .80 | .70 | .52 | .56 | .46 | .29 | .73 | .71 | .71 | .40 | .10 | .08 | .03 | .42 | .47 | .36 | .49 | .61 | .38 |
(0) | – | (0) | (0) | (0) | (0) | (0) | (0) | (.01) | (0) | (0) | (0) | (0) | (0) | (.17) | (.45) | (0) | (0) | (0) | (0) | (0) | (0) | |
d3 | .69 | .79 | – | .71 | .75 | .58 | .50 | .19 | .19 | .60 | .62 | .61 | .41 | .07 | .13 | .08 | .32 | .35 | .32 | .36 | .49 | .39 |
(0) | (0) | – | (0) | (0) | (0) | (1) | (0) | (.02) | (0) | (0) | (0) | (0) | (.01) | (.18) | (1) | (0) | (0) | (0) | (0) | (0) | (0) | |
d4 | .65 | .73 | .83 | – | .66 | .58 | .69 | .15 | .13 | .75 | .79 | .77 | .36 | .13 | .05 | .00 | .40 | .46 | .46 | .51 | .60 | .39 |
(0) | (0) | (0) | – | (0) | (0) | (0) | (.01) | (1) | (0) | (0) | (0) | (0) | (0) | (1) | (1) | (0) | (0) | (0) | (0) | (0) | (0) | |
d5 | .60 | .66 | .75 | .74 | – | .55 | .73 | .30 | .50 | .56 | .60 | .63 | .47 | .13 | .42 | .18 | .32 | .34 | .24 | .28 | .42 | .27 |
(0) | (0) | (0) | (0) | – | (0) | (0) | (0) | (1) | (0) | (0) | (0) | (0) | (0) | (1) | (1) | (0) | (0) | (0) | (0) | (0) | (0) | |
d6 | .54 | .49 | .65 | .56 | .59 | – | .76 | .14 | .15 | .45 | .57 | .51 | .31 | .15 | .12 | .06 | .39 | .41 | .22 | .26 | .41 | .21 |
(0) | (0) | (0) | (0) | (0) | – | (0) | (.08) | (1) | (0) | (0) | (0) | (0) | (0) | (1) | (1) | (0) | (0) | (0) | (0) | (0) | (0) | |
d7 | .60 | .58 | .69 | .66 | .65 | .77 | – | .19 | .33 | .59 | .70 | .60 | -.05 | .14 | -.53 | .03 | .38 | .41 | .28 | .36 | .56 | .25 |
(0) | (0) | (0) | (0) | (0) | (0) | – | (.01) | (1) | (0) | (0) | (0) | (1) | (0) | (1) | (.61) | (0) | (0) | (0) | (0) | (0) | (0) | |
f1 | .29 | .45 | .20 | .23 | .35 | .19 | .24 | – | .83 | .37 | .33 | .36 | .76 | -.04 | -.27 | .24 | .15 | .15 | .09 | .10 | .19 | .61 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | – | (0) | (0) | (0) | (0) | (0) | (.39) | (1) | (1) | (0) | (0) | (.34) | (.05) | (0) | (1) | |
f2 | .22 | .20 | .16 | -.01 | .21 | .17 | .20 | .76 | – | .46 | .36 | .43 | .82 | .06 | .46 | .21 | .09 | .08 | .05 | .07 | .11 | .57 |
(0) | (0) | (.05) | (1) | (.02) | (.04) | (.02) | (0) | – | (0) | (0) | (0) | (0) | (.50) | (.04) | (1) | (.12) | (.16) | (.44) | (.19) | (.09) | (1) | |
f3 | .67 | .69 | .70 | .74 | .62 | .47 | .59 | .52 | .55 | – | .78 | .81 | .53 | .12 | .05 | .02 | .49 | .54 | .45 | .51 | .65 | .38 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | – | (0) | (0) | (0) | (0) | (.48) | (1) | (0) | (0) | (0) | (0) | (0) | (0) | |
f4 | .63 | .62 | .67 | .74 | .61 | .57 | .67 | .34 | .41 | .73 | – | .85 | .50 | .18 | -.07 | .03 | .44 | .48 | .34 | .41 | .56 | .31 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | – | (0) | (0) | (0) | (.30) | (1) | (0) | (0) | (0) | (0) | (0) | (0) | |
f5 | .62 | .67 | .66 | .75 | .65 | .55 | .62 | .42 | .49 | .77 | .79 | – | .58 | .18 | .001 | .14 | .38 | .42 | .34 | .40 | .51 | .29 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | – | (0) | (0) | (.99) | (1) | (0) | (0) | (0) | (0) | (0) | (0) | |
f6 | .39 | .42 | .43 | .52 | .53 | .48 | .38 | .53 | .78 | .62 | .59 | .67 | – | .12 | .23 | .36 | .19 | .33 | .11 | .13 | .23 | .26 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | – | (0) | (.02) | (1) | (0) | (1) | (0) | (0) | (0) | (0) | |
f7 | .18 | .20 | .27 | .30 | .34 | .28 | .31 | .21 | .19 | .26 | .39 | .42 | .38 | – | 0 | .13 | .06 | .06 | .05 | .05 | .06 | .04 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (.01) | (0) | (0) | (0) | (0) | – | (1) | (.13) | (.03) | (.02) | (.06) | (.05) | (.05) | (.25) | |
f8 | .08 | .12 | .14 | .02 | .31 | .06 | .03 | .31 | .32 | .08 | -.02 | .05 | .29 | .13 | – | .28 | .01 | .02 | .02 | -.05 | .06 | -.15 |
(.19) | (.06) | (.03) | (.75) | (0) | (.27) | (.54) | (0) | (0) | (.18) | (.77) | (.44) | (0) | (.02) | – | (0) | (.91) | (1) | (1) | (.40) | (1) | (.05) | |
f9 | .02 | .03 | .04 | .01 | .08 | .01 | .05 | .08 | .11 | .06 | .07 | .08 | .09 | .20 | .42 | – | .05 | .01 | .06 | .09 | .05 | .02 |
(.56) | (.47) | (.37) | (.84) | (.05) | (.85) | (.17) | (.30) | (.32) | (.11) | (.05) | (.03) | (.19) | (0) | (0) | – | (1) | (.55) | (1) | (1) | (.61) | (1) | |
p1 | .58 | .53 | .53 | .50 | .48 | .51 | .54 | .31 | .36 | .59 | .54 | .51 | .34 | .15 | .02 | .08 | – | .97 | .42 | .44 | .67 | .27 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (1) | (0) | (0) | (0) | (0) | (0) | (.68) | (.03) | – | (0) | (0) | (0) | (0) | (0) | |
p2 | .64 | .59 | .58 | .60 | .52 | .56 | .62 | .32 | .17 | .67 | .61 | .57 | .35 | .15 | .01 | .08 | .96 | – | .52 | .54 | .79 | .31 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (.02) | (0) | (0) | (0) | (0) | (0) | (.85) | (.02) | (0) | – | (0) | (0) | (0) | (0) | |
p3 | .64 | .68 | .72 | .69 | .53 | .39 | .55 | .32 | .21 | .73 | .63 | .64 | .48 | .13 | -.05 | .05 | .60 | .71 | – | .99 | .83 | .69 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (.02) | (0) | (0) | (0) | (0) | (0) | (.39) | (.15) | (0) | (0) | – | (0) | (0) | (0) | |
p4 | .65 | .67 | .67 | .68 | .47 | .38 | .55 | .30 | .26 | .71 | .62 | .60 | .35 | .12 | -.08 | .02 | .63 | .72 | .96 | – | .88 | .72 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (.16) | (.64) | (0) | (0) | (0) | – | (0) | (0) | |
p5 | .63 | .65 | .67 | .65 | .48 | .48 | .59 | .44 | .22 | .69 | .62 | .59 | .31 | .10 | -.06 | .03 | .75 | .83 | .89 | .88 | – | .56 |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (.05) | (0) | (0) | (0) | (0) | (0) | (.35) | (.40) | (0) | (0) | (0) | (0) | – | (0) | |
p6 | .39 | .39 | .41 | .43 | .29 | .23 | .30 | .23 | .17 | .44 | .33 | .33 | .29 | .11 | -.17 | -.07 | .35 | .41 | .62 | .67 | .59 | – |
(0) | (0) | (0) | (0) | (0) | (0) | (0) | (0) | (.15) | (0) | (0) | (0) | (0) | (0) | (.02) | (.10) | (0) | (0) | (0) | (0) | (0) | – |
For attribution, please cite this work as
Li, et al., "Profile Likelihood Estimation of the Correlation Coefficient in the Presence of Left, Right or Interval Censoring and Missing Data", The R Journal, 2018
BibTeX citation
@article{RJ-2018-040,
  author  = {Li, Yanming and Gillespie, Brenda and Shedden, Kerby and Gillespie, John},
  title   = {Profile Likelihood Estimation of the Correlation Coefficient in the Presence of Left, Right or Interval Censoring and Missing Data},
  journal = {The R Journal},
  year    = {2018},
  note    = {https://rjournal.github.io/},
  volume  = {10},
  issue   = {2},
  issn    = {2073-4859},
  pages   = {159-179}
}