In this paper, we describe an R package named coxphMIC, which implements the sparse estimation method for Cox proportional hazards models via approximated information criterion (Su et al. 2016). The developed methodology is named MIC which stands for “Minimizing approximated Information Criteria". A reparameterization step is introduced to enforce sparsity while at the same time keeping the objective function smooth. As a result, MIC is computationally fast with a superior performance in sparse estimation. Furthermore, the reparameterization tactic yields an additional advantage in terms of circumventing post-selection inference (Leeb and Pötscher 2005). The MIC method and its R implementation are introduced and illustrated with the PBC data.
Time to event (survival time) is often a primary outcome of interest in many research areas, especially in medical research such as time that takes to respond to a particular therapy, time to death, remission, or relapse. Survival times are typically right skewed and subject to censoring due to study termination, loss of follow ups, or withdrawals. Moreover, covariates may vary by time.
Cox Proportional Hazards (PH) model (Cox 1972) is commonly used to
model survival data. Given a typical survival data set that consists of
Since the true
In BSS, the penalty function is set to
A new method, named MIC for “Minimizing approximated Information
Criteria", is developed to conduct sparse estimation of Cox PH models.
MIC borrows strength from both BSS and regularization. The main issue
with BSS is the indicator function,
As shown in Figure 1(a),
The above reparameterization offers several important conveniences:
Sparsity now becomes achievable in estimating
In terms of practical optimization, it is preferable to consider
One consequence of post-selection inference is that no standard
error formula is available for zero estimates of
The R package coxphMIC implements MIC on the basis of R package
survival
(Therneau and Grambsch 2000) and is hosted at CRAN. Type the following
command in R console in order to install the package:
> install.packages("coxphMIC")
To summarize, MIC can be simply formulated as the following optimization
problem
SANN
method combined with the BFGS
method in R function optim()
is fast
and quite effective. The simulated annealing (SA) implemented by SANN
helps locate a nearly minimum point globally. Then the quasi-Newton BFGS
method makes sure that the algorithm stops at a critical point.
There are two functions included in the coxphMIC package: an
internal function LoglikPen()
that computes the partial log-likelihood
and a wrapper function coxphMIC()
that does the MIC sparse estimation.
The function coxphMIC()
has the following usage:
coxphMIC(formula = Surv(time, status) ~ ., data, method.beta0 = "MPLE",
beta0 = NULL, theta0 = 1, method = "BIC", lambda0 = NULL, a0 = NULL,
scale.x = TRUE, maxit.global = 300, maxit.local = 100,
rounding.digits = 4, zero = sqrt(.Machine$double.eps),
compute.se.gamma = TRUE, compute.se.beta = TRUE,
CI.gamma = TRUE, conf.level = 0.95,
details = FALSE)
We briefly explain some of the important options. The formula
argument
is a formula object similar to that in survival
, with the response on
the left of the operator being a survival object as returned by the
Surv
function, and the terms on the right being predictors. The
arguments method.beta0
, beta0
, and theta0
pertains to the initial
starting values. By default, the maximum partial likelihood estimator
with the option MPLE
is used. Otherwise, one can use the ridge
estimator with option ridge
. The theta0
corresponds to the tuning
parameter in ridge estimation. User defined starting values can also be
used such as beta0
. By default, the approximated BIC (Vollinsk and Raftery 2000)
is recommended. However, one can use AIC
(Akaike 1974).
Alternatively, user-specified penalty is allowed by specifying
lambda0
. The default value for maxit.global
allows for specification of the maximal iteration steps in SANN
while
maxit.local
specifies the maximal iteration steps for BFGS
. MIC
computes the standard errors (SE) for both maxit.global
asks whether the user wants to output the
confidence intervals for conf.level
(with 95% as default).
The output of Function coxphMIC()
is an object of S3 class coxphMIC
,
which is essentially a list of detailed objects that can be used for
other purposes. In particular, the item result
presents the most
important results, where one can see the selected model and inference
based on testing print
and
plot
, are made available for exploring a coxphMIC
object.
Several other R packages are available for variable selection of Cox PH
models. The best subset selection (BSS) is available in the R Package
glmulti (Calcagno and de Mazancourt 2010) with AIC only, but it is very slow owing to
the intensive computation involved. For large
Censoring | Method | |||||||
n | p | Rate | Full | Stepwise | MIC | LASSO | ALASSO | SCAD |
200 | 10 | 25% | 0.007 | 0.307 | 0.067 | 0.157 | 0.163 | 5.923 |
40% | 0.000 | 0.320 | 0.060 | 0.150 | 0.170 | 4.900 | ||
50 | 25% | 0.027 | 18.957 | 0.063 | 0.397 | 0.417 | 5.587 | |
40% | 0.027 | 18.107 | 0.057 | 0.453 | 0.480 | 5.480 | ||
100 | 25% | 0.060 | 189.040 | 0.057 | 1.450 | 1.387 | — | |
40% | 0.067 | 181.147 | 0.057 | 2.053 | 1.897 | — | ||
2000 | 10 | 25% | 0.020 | 0.907 | 0.243 | 0.903 | 0.837 | 415.097 |
40% | 0.017 | 0.880 | 0.240 | 0.893 | 0.823 | 328.150 | ||
50 | 25% | 0.110 | 81.380 | 0.243 | 1.590 | 1.153 | — | |
40% | 0.093 | 72.887 | 0.237 | 1.613 | 1.163 | — | ||
100 | 25% | 0.333 | 894.607 | 0.223 | 2.383 | 2.103 | — | |
40% | 0.240 | 673.503 | 0.187 | 2.073 | 1.357 | — |
We illustrate the usage of coxphMIC
via an analysis of the PBC
(primary biliary cirrhosis) data, available from the survival
package (Therneau and Grambsch 2000).
To proceed, some minor data preparation is needed. First of all, we want to make sure that the censoring indicator is 0-1 binary.
> library(survival); data(pbc);
> dat <- pbc; dim(dat);
[1] 418 20
> dat$status <- ifelse(pbc$status == 2, 1, 0)
Next, we explicitly created dummy variable for categorical variables.
The factor()
function could be used instead. Also, grouped sparsity
could be used to handle these dummy variables so that they are either
all selected or all excluded. We plan to explore this possibility in
future research.
> dat$sex <- ifelse(pbc$sex == "f", 1, 0)
Another necessary step is to handle missing values. This current version does not automatically treat missings. Here, the listwise deletion is used so that only the 276 subjects with complete records are used for further analysis.
> dat <- na.omit(dat);
> dim(dat);
[1] 276 20
> head(dat)
id time status trt age sex ascites hepato spiders edema bili chol
1 1 400 1 1 58.76523 1 1 1 1 1.0 14.5 261
2 2 4500 0 1 56.44627 1 0 1 1 0.0 1.1 302
3 3 1012 1 1 70.07255 0 0 0 0 0.5 1.4 176
4 4 1925 1 1 54.74059 1 0 1 1 0.5 1.8 244
5 5 1504 0 2 38.10541 1 0 1 1 0.0 3.4 279
7 7 1832 0 2 55.53457 1 0 1 0 0.0 1.0 322
albumin copper alk.phos ast trig platelet protime stage
1 2.60 156 1718.0 137.95 172 190 12.2 4
2 4.14 54 7394.8 113.52 88 221 10.6 3
3 3.48 210 516.0 96.10 55 151 12.0 4
4 2.54 64 6121.8 60.63 92 183 10.3 4
5 3.53 143 671.0 113.15 72 136 10.9 3
7 4.09 52 824.0 60.45 213 204 9.7 3
The data set now contains 20 variables. Except id
, time
, and
status
, there are a total of 17 predictors.
To apply coxphMIC
, one simply proceeds in the usual way of using
coxph
formula. By default, all predictors are standardized; the
approximated BIC (
> fit.mic <- coxphMIC(formula = Surv(time, status)~.-id, data = dat, CI.gamma = FALSE)
> names(fit.mic)
[1] "opt.global" "opt.local" "min.Q" "gamma" "beta" "VCOV.gamma"
[7] "se.gamma" "se.beta" "BIC" "result" "call"
The output of coxphMIC
contains the minimized result
. In order for the user to be able to
inspect the convergence and other detailed info of the optimization
algorithms, we also output two objects opt.global
and opt.local
,
which result from the global (SANN
by default) and local optimization
(BFGS
by default) algorithms.
The output fit.mic
is a S3 object of coxphMIC
class. Two generic
functions, print
and plot
, are available. The print
function
provides a summary table as below:
> print(fit.mic)
beta0 gamma se.gamma z.stat p.value beta.MIC se.beta.MIC
trt -0.0622 0.0000 0.1071 0.0000 1.0000 0.0000 NA
age 0.3041 0.3309 0.1219 2.7138 0.0067 0.3309 0.1074
sex -0.1204 0.0000 0.1086 -0.0002 0.9998 0.0000 NA
ascites 0.0224 0.0000 0.0991 0.0000 1.0000 0.0000 NA
hepato 0.0128 0.0000 0.1259 0.0000 1.0000 0.0000 NA
spiders 0.0460 0.0000 0.1118 -0.0001 1.0000 0.0000 NA
edema 0.2733 0.2224 0.1066 2.0861 0.0370 0.2224 0.0939
bili 0.3681 0.3909 0.1142 3.4237 0.0006 0.3909 0.0890
chol 0.1155 0.0000 0.1181 0.0002 0.9999 0.0000 NA
albumin -0.2999 -0.2901 0.1248 -2.3239 0.0201 -0.2901 0.1103
copper 0.2198 0.2518 0.1050 2.3986 0.0165 0.2518 0.0868
alk.phos 0.0022 0.0000 0.0837 0.0000 1.0000 0.0000 NA
ast 0.2308 0.2484 0.1128 2.2023 0.0276 0.2484 0.1025
trig -0.0637 0.0000 0.0858 0.0000 1.0000 0.0000 NA
platelet 0.0840 0.0000 0.1129 0.0000 1.0000 0.0000 NA
protime 0.2344 0.2293 0.1046 2.1917 0.0284 0.2293 0.1022
stage 0.3881 0.3692 0.1476 2.5007 0.0124 0.3692 0.1243
The above results are presented as Table 4 in Su et al. (2016). In this example,
MIC started with MPLE given by the first column named beta0
. Columns
2–5 present estimation of age
, edema
, bili
,
albumin
, copper
, ast
, protime
, and stage
.
The plot
function provides error bar plots based on the MIC estimator
of both
> plot(fit.mic, conf.level = 0.95)
as shown in Figure 2. Essentially, the 95% confidence
intervals (CI) are plotted. One can modify the confidence level with the
conf.level
option. To compare two plots conveniently, they are made
with the same range on the vertical y-axis. Note that CI is not
available for any zero
Trying out multiple starting point is a common strategy in facing global
optimization problems. We may consider starting with the beta0 =
method.beta0
is neither ‘MPLE
’ nor ‘ridge
’ and a specific value for beta0
is
not given, i.e., setting beta0 = NULL
.
> fit0.mic <- coxphMIC(formula = Surv(time, status)~.-id, data = dat,
+ method = "BIC", scale.x = TRUE, method.beta0 = "zero")
> c(fit.mic$min.Q, fit0.mic$min.Q)
[1] 974.3340 978.1232
We can compare the minimized objective function min.Q
to decide which
fitting result is preferable (i.e., the smaller one). The above result
suggests that the fit with MPLE as starting point remains preferable.
Concerning sparse estimation, the vectors with 0/+1/-1 values obtained
by applying a threshold to the MPLE
> beta.MPLE <- fit.mic$result[, 1]
> beta0 <- sign(beta.MPLE)*sign(abs(beta.MPLE) > .06);
> cbind(beta.MPLE, beta0)
beta.MPLE beta0
[1,] -0.0622 -1
[2,] 0.3041 1
[3,] -0.1204 -1
[4,] 0.0224 0
[5,] 0.0128 0
[6,] 0.0460 0
[7,] 0.2733 1
[8,] 0.3681 1
[9,] 0.1155 1
[10,] -0.2999 -1
[11,] 0.2198 1
[12,] 0.0022 0
[13,] 0.2308 1
[14,] -0.0637 -1
[15,] 0.0840 1
[16,] 0.2344 1
[17,] 0.3881 1
In the above example, we applied a threshold of 0.06 to the MPLE to obtain a 0/+1/-1 valued vector. To start with this user-supplied starting point, one proceeds as follows.
> fit1.mic <- coxphMIC(formula = Surv(time, status)~.-id, data = dat,
+ method = "BIC", scale.x = TRUE, method.beta0 = "user-supplied", beta0 = beta0)
> c(fit.mic$min.Q, fit0.mic$min.Q, fit1.mic$min.Q)
[1] 974.3340 978.1232 979.6826
Again, the fitting starting at MPLE seems the best in this example, by giving the smallest minimized value.
We may consider obtaining the regularization path with respect to
We try out a spread of A0
.
> set.seed(818)
> n <- NROW(dat); n0 <- sum(dat$status == 1)
> A0 <- 10:200
> p <- NCOL(dat)-3
> BETA <- matrix(0, nrow = length(A0), ncol = p) # USE ARRAY
> for (j in 1:length(A0)){
su.fit <- coxphMIC(formula = Surv(time, status)~.-id, data = dat, a0 = A0[j],
method = "BIC", scale.x = TRUE)
BETA[j, ] <- su.fit$beta
}
> BETA <- as.data.frame(BETA)
> colnames(BETA) <- colnames(dat)[-(1:3)]
> row.names(BETA) <- A0
> head(BETA, n = 5)
trt age sex ascites hepato spiders edema bili chol albumin copper alk.phos
10 0 0.2983 0 0 0 0 0.2024 0.4135 0 -0.2799 0.2495 0
11 0 0.2987 0 0 0 0 0.2015 0.4159 0 -0.2799 0.2491 0
12 0 0.2992 0 0 0 0 0.2006 0.4181 0 -0.2799 0.2487 0
13 0 0.3000 0 0 0 0 0.1998 0.4200 0 -0.2801 0.2482 0
14 0 0.3009 0 0 0 0 0.1992 0.4216 0 -0.2804 0.2478 0
ast trig platelet protime stage
10 0.1937 0 0 0.1912 0.3583
11 0.1924 0 0 0.1895 0.3612
12 0.1914 0 0 0.1878 0.3642
13 0.1906 0 0 0.1862 0.3672
14 0.1901 0 0 0.1847 0.3701
A plot of the regularization path with respect to
> par(mar = rep(5,4), mfrow = c(1,1))
> x.min <- min(A0); x.max <- max(A0)
> plot(x = c(x.min, x.max), y = c(min(BETA), max(BETA)), type = "n",
+ xlab = "a", cex.lab = 1.2, las = 1, ylab = expression(tilde(beta)))
> for (j in 1:ncol(BETA)){
+ lines(x = A0, y = BETA[,j], col = "red", lty = 1, lwd = 1)
+ points(x = A0, y = BETA[,j], col = "red", pch = j, cex = .3)
+ vname <- colnames(BETA)[j]
+ if (abs(BETA[nrow(BETA),j]) > .00001) {
# text(x.max+5, BETA[nrow(BETA),j], labels = vname, cex = 1, col = "blue")
+ mtext(text = vname, side = 4, line = 0.5, at = BETA[nrow(BETA),j], las = 1,
+ cex = 1, col = "blue", font = 1)
+ }
+ }
> abline(h = 0, col = "gray25", lwd = 2)
> abline(v = n0, col = "gray45", lwd = 1.5)
> text(n0+5, -0.2, expression(paste("a = ", n[0], " = ", 111, sep = "")), cex = 1.2,
+ col = "gray35")
From Figure 3, it can be seen that the regularization path is
essentially flat with respect to
The paper presents the coxphMIC
package to implement the MIC method
for Cox proportional hazards models. Compared to several other
competitive methods, MIC has three main advantages by offering a
superior empirical performance for it aims to minimize BIC (albeit
approximated) without reducing the search space, great computational
efficiency since it does not involve selection of any tuning parameter,
and a leeway to perform significance testing that is free of the
post-selection inference.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Nabi & Su, "coxphMIC: An R Package for Sparse Estimation of Cox Proportional Hazards Models via Approximated Information Criteria", The R Journal, 2017
BibTeX citation
@article{RJ-2017-018, author = {Nabi, Razieh and Su, Xiaogang}, title = {coxphMIC: An R Package for Sparse Estimation of Cox Proportional Hazards Models via Approximated Information Criteria}, journal = {The R Journal}, year = {2017}, note = {https://rjournal.github.io/}, volume = {9}, issue = {1}, issn = {2073-4859}, pages = {229-238} }