The receiver operating characteristic (ROC) curve is a graphical tool commonly used to depict the binary classification accuracy of a continuous marker in terms of its sensitivity and specificity. The standard ROC curve assumes a monotone relationship between the marker and the response, inducing classification subsets of the form \((c,\infty)\) with \(c \in \mathbb{R}\). However, in non-standard cases, the involved classification regions are not so clear, highlighting the importance of tracking the decision rules. This paper introduces the R package movieROC, which provides visualization tools for understanding the ability of markers to identify a characteristic of interest, complementing the ROC curve representation. This tool accommodates multivariate scenarios and generalizations involving different decision rules. The main contribution of this package is the visualization of the underlying classification regions, with the associated gain in interpretability. Adding the time (videos) as a third dimension, this package facilitates the visualization of binary classification in multivariate problems. It constitutes a good tool to generate graphical material for presentations.
The use of data to detect a characteristic of interest is a cornerstone of many disciplines such as medicine (to diagnose a pathology or to predict a patient outcome), finance (to detect fraud) or machine learning (to evaluate a classification algorithm), among others. Continuous markers are surrogate measures for the characteristic under study, or predictors of a potential subsequent event. They are measured in subjects, some of whom have the characteristic (positive), and some without it (negative). In addition to reliability and feasibility, a good marker must have two relevant properties: interpretability and accuracy (Mayeux 2004). High binary classification accuracy can be achieved if there exists a strong relationship between the marker and the response. The latter is assessed by a gold standard for the presence or absence of the characteristic of interest. Interpretability refers to the decision rules or subsets considered in the classification process. This piece of research seeks to elucidate both desirable properties for a marker by the implementation of a graphical tool in R language. We propose a novel approach involving the generation of videos as a solution to effectively capture the classification procedure for univariate and multivariate markers. Graphical analysis plays a pivotal role in data exploration, interpretation, and communication. Its burgeoning potential is underscored by the fast pace of technological advances, which empower the creation of insightful graphical representations.
A usual practice when the binary classification accuracy of a marker is of interest involves the representation of the Receiver Operating Characteristic (ROC) curve, summarized by the Area Under the Curve (AUC) (Hanley and McNeil 1982). The resulting plot reflects the trade-off between the sensitivity and the complement of the specificity. Sensitivity and specificity are probabilities of correctly classifying subjects, either positive or negative, respectively. Mathematically, let \(\xi\) and \(\chi\) be the random variables modeling the marker values in the positive and the negative population, respectively, with \(F_\xi(\cdot)\) and \(F_\chi(\cdot)\) their associated cumulative distribution functions. Assuming that the expected value of the marker is larger in the positive than in the negative population, the standard ROC curve is based on classification subsets of the form \(s = (c, \infty)\), where \(c\) is the so-called cut-off value or threshold in the support of the marker \(X\), \(\mathcal{S}(X)\). One subject is classified as a positive if its marker value is within this region, and as a negative otherwise. This type of subsets has two important advantages: first, their interpretability is clear; second, for each specificity \(1-t \in [0,1]\), the corresponding \(s_t = (c_t, \infty)\) is univocally defined by \(c_t = F_\chi^{-1}(1-t)\) for absolutely continuous markers.
When differences in marker distribution between the negative and the positive population are only in location but not in shape, then \(F_\chi(\cdot) < F_\xi(\cdot)\), and the classification is direct by using these decision rules. However, when this is not the case, the standard ROC curve may cross the main diagonal, resulting in an improper curve (Dorfman et al. 1997). This may be due to three different scenarios:
the behavior of the marker in the two studied populations is different but it is not possible to determine the decision rules. Notice that the binary classification problem goes further than distinction between the two populations: the classification subsets should be highly likely in one population and highly unlikely in the other one (Martı́nez-Camblor 2018);
there exists a relationship between the marker and the response with a potential classification use, but this is not monotone;
there is no relationship between the marker and the response at all (main diagonal ROC curve).
In the second case, we have to define classification subsets different from standard \(s_t=(c_t,\infty)\). Therefore, the use of the marker becomes more complex. With the aim of accommodating scenarios where both higher and lower values of the marker are associated with a higher risk of having the characteristic, Martı́nez-Camblor et al. (2017) proposed the so-called generalized ROC (gROC) curve. This curve tracks the highest sensitivity for every specificity in the unit interval resulting from subsets of the form \(s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)\) with \(x_t^L \leq x_t^U \in \mathcal{S}(X)\).
Although final decisions are based on the underlying classification subsets, they are typically not depicted. This omission is not a shortcoming in standard cases, as for each specificity \(1-t \in [0,1]\), there is only one rule of the form \(s_t = (c_t, \infty)\) with such specificity. Particularly, \(s_t\) is univocally defined by \(c_t = 1 - F_\chi^{-1}(1-t)\); and the same applies if we fix a sensitivity. Nevertheless, if the gROC curve is taken, there are infinite subsets of the form \(s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)\) resulting in \(\mathbb{P}(\chi \in s_t) = t\). This loss of univocity underlines the importance of reporting (numerically and/or graphically) the decision rules actually proposed for classifying. This gap is covered in the presented package.
An alternative approach to assess the classification performance of a marker involves considering a transformation of it. This transformation \(h(\cdot)\) aims to capture differences in distribution between the two populations in the ROC sense. Once \(h(\cdot)\) is identified, the standard ROC curve for \(h(X)\) is represented, resulting in the efficient ROC (eROC) curve (Kauppi 2016). Arguing as before, for a fixed specificity, the classification subsets \(s_t=(c_t, \infty)\) in the transformed space are univocally defined, where a subject is classified as positive if \(h(x) \in s_t\) and negative otherwise (with \(x\) representing its marker value). However, they may have any shape in the original space, depending on the monotonicity of the functional transformation \(h(\cdot)\) (Martı́nez-Camblor et al. 2019). Emphasizing the importance of tracking the decision rules underlying the eROC curve, this monitoring process enables an assessment of whether the improved accuracy of the marker justifies the potential loss in interpretability.
The ROC curve is defined for classification accuracy evaluation of univariate markers. To deal with multivariate markers, the usual practice is to consider a transformation \(\boldsymbol{h}(\cdot)\) to reduce it to a univariate one, and then to construct the standard ROC curve. Same considerations as before apply when a functional transformation is taken. In the proposed R library, we consider methods from the literature to define and estimate \(\boldsymbol{h}(\cdot)\) in the multivariate case (Kang et al. 2016; Meisner et al. 2021).
Focusing on the classification subsets underlying the decision rules, the movieROC package incorporates methods to visualize the construction process of ROC curves by presenting the classification accuracy of these subsets. For univariate markers, the library includes both the classical (standard ROC curve) and the generalized (gROC curve) approach. Besides, it enables the display of decision rules for various transformations of the marker, seeking to maximize performance and allowing for flexibility in the final shape of the subsets (eROC curve). For multidimensional markers, the proposed tool visualizes the evolution of decision subsets when different objective functions are employed for optimization, even imposing restrictions on the underlying regions. In this case, displaying the decision rules associated with every specificity in a single static image is no longer feasible. Therefore, dynamic representations (videos) are implemented, drawing on time as an extra dimension to capture the variation in specificity.
Much software available in R could be discussed here covering diverse topics related to ROC curves: the pROC package is a main reference including tools to visualize, estimate and compare ROC curves (Robin et al. 2011); ROCnReg explicitly considers covariate information to estimate the covariate-specific and the covariate-adjusted ROC curves (Rodrı́guez-Álvarez and Inácio 2021); smoothROCtime implements smooth estimation of time-dependent ROC curves based on the bivariate kernel density estimator for \((X, \textit{time-to-event})\) (Dı́az-Coto 2020); OptimalCutpoints includes point and interval estimation methods for optimal thresholds (López-Ratón et al. 2014); and nsROC performs non-standard analysis such as gROC estimation (Pérez-Fernández et al. 2018); among others.
This paper introduces and elucidates the diverse functionalities of the newly developed movieROC package, aimed at facilitating the visualization and comprehension of the decision rules underlying the binary classification process, encompassing various generalizations. Despite the availability of numerous R packages implementing related analyses, we have identified the main gaps covered in this library: tracking the decision rules underlying the ROC curve, including multivariate markers and non-standard scenarios (i.e. non-monotonic). The rest of the paper is structured as follows. In Section 2, we introduce the main R functions and objects implemented, and briefly explain the dataset employed throughout this manuscript to demonstrate the utility of the R library. Section 3 is devoted to reconsidering the definition of the standard ROC curve from the perspective of classification subsets, including an extension to multivariate scenarios. Sections 4 and 5 revisit the gROC curve and the eROC curve, respectively, covering various methods to capture the potential classification accuracy of the marker under study. Each of these sections begins with a state-of-the-art overview, followed by the main syntax of the corresponding R functions. In addition, examples of implementation using the dataset presented in Section 2.3 are provided. Finally, the paper concludes with a concise summary and computational details regarding the implemented tool.
Sections 2.1 and 2.2 provide a detailed description of the main objectives of the implemented R functions. To reflect the practical usage of the developed R package, we employ a real dataset throughout this manuscript, which is introduced in Section 2.3.
A graphical tool was developed to showcase static and dynamic graphics
displaying the classification subsets derived from maximizing diagnostic
accuracy under certain assumptions, ensuring the preservation of the
interpretability. The R package facilitates the construction of the ROC
curve across various specificities, providing visualizations of the
resulting classification regions. The proposed tool comprises multiple R
functions that generate objects with distinct class attributes (see
function names where red arrows depart from and red nodes in
Figure 1, respectively). Once the object of
interest is created, different methods may be used, in order to plot the
underlying regions (plot_regions(), plot_funregions()), to track the
resulting ROC curve (plot_buildROC(), plot()), to predict decision
rules for a particular specificity, and to print relevant information,
among others. The main function of the package, movieROC(), produces
videos to exhibit the classification procedure.
Figure 1: R functions of the movieROC package. The blue nodes include the names of the R functions and the red nodes indicate the different R objects that can be created and worked with. The red arrows depart from those R functions engaged in creating R objects and the black arrows indicate which R functions can be applied to which R objects. The grey dashed arrows show internal dependencies.
It includes algorithms to visualize the regions that underlie the binary classification problem, considering different approaches:
make the classification subsets flexible in order to cover
non-standard scenarios, by considering two cut-off values (gROC()
function); explained in
Section 4;
transform the marker by a proper function \(h(\cdot)\) (hROC()
function); introduced in
Section 5;
when dealing with multivariate markers, consider a functional
transformation with some fixed or dynamic parameters resulting from
different methods available in the literature (multiROC()
function); covered in
Section 3.1.
By using the gROC(), the multiROC() or the hROC() function, the
user obtains an R object of class ‘groc’, ‘multiroc’ or ‘hroc’,
respectively. These will be called
movieROC objects.
Once the object of interest is created, the implemented package includes
many functions (methods) to pass to it. Some of them are generic methods
(print(), plot() and predict()), commonly used in R language over
different objects according to their class attributes. The rest of the
functions are specific for this library and therefore only applicable to
movieROC objects.
The following outline summarizes all these functions and
provides their target and main syntax (with default input parameters).
print(): Print some relevant information.plot(): Plot the ROC curve estimate.predict(): Print the classification subsets corresponding to a particular false-positive rate (FPR) introduced by the user. For a ‘groc’ object, the user may specify a cut-off value C (for the standard ROC curve) or two cut-off values XL and XU (for the gROC curve).plot_regions()Applicable to a ‘groc’ or a ‘hroc’ object. Plot two graphics in the same figure: left, classification subsets for each false-positive rate (grey color by default); right, \(90^\circ\) rotated ROC curve.
plot_regions(obj, plot.roc = TRUE, plot.auc = FALSE, FPR = 0.15, ...)
If the input parameter FPR is specified, the corresponding classification region reporting such false-positive rate and the point in the ROC curve are highlighted in blue color.
plot_funregions()Applicable to a ‘groc’ or a ‘hroc’ object.
Plot the transforming function and the classification subsets reporting the false-positive rate(s) indicated in the input parameter(s) FPR and FPR2.
plot_funregions(obj, FPR = 0.15, FPR2 = NULL, plot.subsets = TRUE, ...)
plot_buildROC()Applicable to a ‘groc’ or a ‘multiroc’ object.
groc’ object: Plot four (if input reduce is FALSE) or two (if reduce is TRUE, only those on the top) graphics in the same figure: top-left, density function estimates for the marker in both populations with the areas corresponding to FPR and TPR colored (blue and red, respectively) for the optional input parameter FPR, C or XL, XU; top-right, the empirical ROC curve estimate; bottom-left, boxplots in both groups; bottom-right, classification subsets for every FPR (grey color).plot_buildROC(obj, FPR = 0.15, C, XL, XU, h = c(1,1),
histogram = FALSE, breaks = 15, reduce = TRUE,
build.process = FALSE, completeROC = FALSE, ...)
If build.process is FALSE, the whole ROC curve is displayed; otherwise, if completeROC is TRUE, the portion of the ROC curve until the fixed FPR is highlighted in black and the rest is shown in gray, while if completeROC is FALSE, only the first portion of the curve is displayed.
multiroc’ object: Plot two graphics in the same figure: right, the ROC curve highlighting the point and the threshold for the resulting univariate marker; left, scatterplot with the marker values in both positive (red color) and negative (blue color) subjects. About the left graphic: for \(p=2\), over the original/feature bivariate space; for \(p>2\), projected over two selected components of the marker (if display.method = "OV" with components selection in displayOV, c(1,2) by default) or the first two principal components from PCA (if display.method = "PCA", default). The classification subset reporting the FPR selected by the user (FPR \(\neq\) NULL) is displayed in gold color.Main syntax:
for \(p=2\):plot_buildROC(obj, FPR = 0.15,
build.process = FALSE, completeROC = TRUE, ...)
If build.process is FALSE, the whole ROC curve is displayed; otherwise, if completeROC is TRUE, the portion of the ROC curve until the fixed FPR is highlighted in black and the rest is shown in gray, while if completeROC is FALSE, only the first portion of the curve is shown.
movieROC()Applicable to a ‘groc’ or a ‘multiroc’ object. Save a video as a GIF illustrating the construction of the ROC curve.
groc’ object:movieROC(obj, fpr = NULL,
h = c(1,1), histogram = FALSE, breaks = 15,
reduce = TRUE, completeROC = FALSE, videobar = TRUE,
file = "animation1.gif", ...)
For each element in vector fpr (optional input parameter), the function executed is plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, ...). The vector of false-positive rates illustrated in the video is NULL by default: if length of output parameter t for gROC() function is lower than 150, such vector is taken as fpr; otherwise, an equally-spaced vector of length 100 covering the range of the marker values is considered.
multiroc’ object:Main syntax:
for \(p=2\):movieROC(obj, fpr = NULL,
file = "animation1.gif", save = TRUE,
border = TRUE, completeROC = FALSE, ...)
The video is saved by default as a GIF with the name indicated in argument file (extension .gif should be added). A border for the classification subsets is drawn by default.
For each element in vector fpr (optional input parameter), the function executed is
plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, completeROC, ...)
plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, completeROC,
display.method, displayOV, ...)
Same considerations about the input fpr as those for movieROC() over a ‘groc’ object.
In order to illustrate the functionality of our R package, we consider
the HCC data. This dataset is derived from gene expression arrays of
tumor and adjacent non-tumor tissues of 62 Taiwanese cases of
hepatocellular carcinoma (HCC). The goal of the original study
(Shen et al. 2012) was to identify, with a genome-wide approach, additional
genes hypermethylated in HCC that could be used for more accurate
analysis of plasma DNA for early diagnosis, by using Illumina
methylation arrays (Illumina, Inc., San Diego, CA) that screen 27,578
autosomal CpG sites. The complete dataset was deposited in NCBI’s Gene
Expression Omnibus (GEO) and it is available through series accession
number GSE37988
(www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37988).
It is included in the presented package (HCC dataset), selecting 948
genes with complete information.
The following code loads the R package and the HCC dataset (see the
vignette
for main structure).
R> library(movieROC)
R> data(HCC)We selected the genes 20202438, 18384097, and 03515901. On the one hand, we chose the gene 03515901 as an example of a monotone relationship between the marker and the response, reporting a good ROC curve. On the other hand, relative gene expression intensities of the genes 20202438 and 18384097 tend to be more extreme in tissues with tumor than in those without it. These are non-standard cases, so if we limit ourselves to detect “appropriate” genes on the basis of the standard ROC curve, they would not be chosen. However, extending the decision rules by means of the gROC curve, those genes may be considered as potential biomarkers (locations) to differ between the two groups. The R code estimating and displaying the density probability function for gene expression intensities of the selected genes in each group (Figure 2) is included in the vignette.
Figure 2: Density histograms and kernel density estimations (lighter) for gene expression intensities of the genes 20202438, 18384097 and 03515901 in negative (non-tumor) and positive (tumor) tissues.
Assuming that there exists a monotone relationship between the marker and the response, the regular, right-sided or standard ROC curve associated with the marker \(X\) considers classification subsets of the form \(s_t=(c_t,\infty)\). For each specificity \(1-t=\mathbb{P}(\chi \notin s_t) \in [0,1]\), also called true-negative rate, there exists only one subset \(s_t\) reporting such specificity and thus a particular sensitivity, also called true-positive rate, \(\mathbb{P}(\xi \in s_t)\). This results in a simple correspondence between each point of the ROC curve \(\mathcal{R}_r(t) = 1-F_\xi \big(F_\chi^{-1}(1-t)\big)\) and its associated classification region \(s_t \in \mathcal{I}_r(t)\), where \[\mathcal{I}_r(t) = \Big\{ s_t = (c_t, \infty) : c_t \in \mathcal{S}(X) , \mathbb{P}(\chi \in s_t) = t \Big\}\] is the right-sided family of eligible classification subsets. The definition of this family captures the shape of the decision rules and the target specificity.
If higher values of the marker are associated with a higher probability of not having the characteristic (see gene 03515901 in Figure 2), the ROC curve would be defined by the left-sided family of eligible classification subsets (Martı́nez-Camblor et al. 2017), \(\mathcal{I}_l(t)\), similarly to \(\mathcal{I}_r(t)\) but with the form \(s_t = (\infty, c_t]\). It results in \(\mathcal{R}_l(t) = F_\xi \big(F_\chi^{-1}(t) \big)\), \(t\in[0,1]\), and the decision rules are also univocally defined in this case.
The ROC curve and related problems were widely studied in the literature; interested readers are referred to the monographs of Zhou et al. (2002), Pepe (2003), and Nakas et al. (2023), as well as the review by Inácio et al. (2021). By definition, the ROC curve is confined within the unit square, with optimal performance achieved when it approaches the left-upper corner (AUC closer to 1). Conversely, proximity to the main diagonal (AUC closer to 0.5) means diminished discriminatory ability, resembling a random classifier.
In practice, let \((\xi_1, \xi_2, \dots, \xi_n)\) and
\((\chi_1, \chi_2, \dots, \chi_m)\) be two independent and identically
distributed (i.i.d.) samples from the positive and the negative
population, respectively. Different estimation procedures are
implemented in the
movieROC package,
such as the empirical estimator (Hsieh and Turnbull 1996) (by default in the gROC()
function), accompanied by its summary indices: the AUC and the Youden
index (Youden 1950). Alternatively, semiparametric approaches based on
kernel density estimation for the involved distributions may be
considered (Zou et al. 1997). The plot_densityROC() function provides plots
for both right- and left-sided ROC curves estimated by this method. On
the other hand, assuming that the marker follows a gaussian distribution
in both populations, that is,
\(\xi \sim \mathcal{N}(\mu_\xi, \sigma_\xi)\) and
\(\chi \sim \mathcal{N}(\mu_\chi, \sigma_\chi)\), parametric approaches
propose plug-in estimators by estimating the unknown parameters while
using the known distributions (Hanley 1988). This parametric estimation
is included in the gROC_param() function, which works similarly to
gROC().
Main syntax:
gROC(X, D, side = "right", ...)
gROC_param(X, D, side = "right", ...)
Table 1 in the
vignette
provides the main input and output parameters of these R functions,
which estimate the regular ROC curve (right-sided or left-sided with
side = "right" or "left", respectively) and associated decision
rules. Its output is an R object of class ‘groc’, to which the
functions listed in Section 2.2 can be applied. Most of them are
visualization tools, but the user may also print() summary information
and predict() classification regions for a particular specificity.
Figure 3 graphically represents the empirical estimation of the standard (gray line) and generalized (black line) ROC curves for each gene in Figure 2. To construct the standard ROC curve for the first two genes (20202438 and 18384097), the right-sided ROC curve is considered; and the left-sided curve for the third one (03515901). As expected following the discussion about Figure 2, the standard and gROC curves are similar for the third gene because there exists a monotone relationship between the marker and the response. However, these curves differ for the first two genes due to the lack of monotonicity in those scenarios. The empirical gROC curve estimator is explained in detail in Section 4.
Next chunk of code generates the figure, providing an example of the use
of gROC() function, plot() and how to get access to the AUC.
R> for(gene in c("20202438", "18384097", "03515901")){
+ roc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor,
+ side = ifelse(gene == "03515901", "left", "right"))
+ plot(roc, col = "gray50", main = paste("Gene", gene), lwd = 3)
+ groc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor, side = "both")
+ plot(groc, new = FALSE, lwd = 3)
+ legend("bottomright", paste(c("AUC =", "gAUC ="), format(c(roc$auc, groc$auc),
+ digits = 3)), col = c("gray50", "black"), lwd = 3, bty = "n", inset = .01)}Figure 3: Standard ROC curve (in gray) and gROC curve (in black) empirical estimation for the capacity of genes 20202438, 18384097 and 03515901 to differ between tumor and non-tumor group.
The following code snippet estimates the standard ROC curve for gene
20202438, prints its basic information, and predicts the classification
region and sensitivity resulting in a specificity of 0.9. It provides an
illustrative example of utilizing the print() and predict()
functions.
R> roc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "right")
R> roc_selg1Data was encoded with nontumor (controls) and tumor (cases).
It is assumed that larger values of the marker indicate larger confidence that a
given subject is a case.
There are 62 controls and 62 cases.
The specificity and sensitivity reported by the Youden index are 0.855 and 0.403,
respectively, corresponding to the following classification subset: (0.799, Inf).
The area under the right-sided ROC curve (AUC) is 0.547.R> predict(roc_selg1, FPR = .1)$ClassSubsets $Specificity $Sensitivity
[1] 0.8063487 Inf [1] 0.9032258 [1] 0.3064516The following line of code displays the whole construction of the empirical standard ROC curve for gene 20202438. The video is saved by default as a GIF with the name provided.
R> movieROC(roc_selg1, reduce = FALSE, file = "StandardROC_gene20202438.gif")In practice, many cases may benefit from combining information from different markers to enhance classification accuracy. Rather than assessing univariate markers separately, taking the multivariate marker resulting from merging them can yield relevant gain. However, note that the ROC curve and related indices are defined only for univariate markers, as they require the existence of a total order. To address this limitation, a common approach involves transforming the \(p\)-dimensional multivariate marker \(\boldsymbol{X}\) into a univariate one through a functional transformation \(\boldsymbol{h}: \mathbb{R}^p \longrightarrow \mathbb{R}\). This transformation \(\boldsymbol{h}(\cdot)\) seeks to optimize an objective function related to the classification accuracy, usually the AUC (Su and Liu 1993; McIntosh and Pepe 2002; Martı́nez-Camblor et al. 2019).
We enumerate the methods included in the proposed R tool by the
multiROC() function (with the input parameter method), listed
according to the objective function to optimize. Recall that the output
of multiROC() is an object of class ‘multiroc’, containing
information about the estimation of the ROC curve and subsets for
multivariate scenarios. Table 2 in the
vignette
includes the usage of this function.
Main syntax:
multiROC(X, D, method = "lrm",
formula = 'D ~ X.1 + I(X.1^2) + X.2 + I(X.2^2) + I(X.1*X.2)', ...)
multiROC() function, fixing input parameters
method = "fixedLinear" and methodLinear to one from "SuLiu"
(Su and Liu 1993), "PepeThompson" (Pepe and Thompson 2000), or "minmax" (Liu et al. 2011).
The R function also admits quadratic combinations when \(p=2\), i.e.
\(\mathcal{Q}_{\boldsymbol{\beta}}(\boldsymbol{X}) = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2\),
by fixing method = "fixedQuadratic" for a particular
coefQuadratic \(= \boldsymbol{\beta} = (\beta_1, \dots, \beta_5)\).The risk score function
logit\(\left\{ \mathbb{P}(D = 1 \, | \, \boldsymbol{X}) \right\}\): Our
package allows the user to fit a logistic regression model
(method = "lrm") considering any family of functions (linear,
quadratic, whether considering interactions or not...) by means of
the input parameter formula.lrm. A stepwise regression model is
fitted if stepModel = TRUE. Details are explained in
Section 5.
The sensitivity for a particular specificity:
multiROC() function integrates the estimation
procedures for the bandwidth matrix developed by Duong (2007), by
fixing method = "kernelOptimal" and choosing a proper method
to estimate the bandwidth ("kernelOptimal.H").method = "dynamicEmpirical", only implemented for \(p=2\)), and
may result in overfitting. Instead, Meisner et al. (2021) method is
recommended (method = "dynamicMeisner").Once the classification subsets for a multivariate marker are
constructed by the multiROC() function, several R methods may be used
for the output object (see Section 2.2). They include print relevant
information or plot the resulting ROC curve. The main contribution of
the package is to plot the construction of the ROC curve together with
the classification subsets in a static figure for a particular FPR
(plot_buildROC() function), or in a video for tracking the whole
process (movieROC() function).
Figure 4
illustrates the videos resulting from movieROC() function.
Particularly, classification accuracy of the
bivariate marker (cg20202438, cg18384097) was studied by using four
different approaches indicated on the captions, considering linear
combinations (top) and nonlinear transformations (bottom). This figure
was implemented by the code below, integrating multiROC() and
movieROC() functions. Four videos are saved as GIF files with names
"PepeTh.gif" (a), "Meisner.gif" (b), "LRM.gif" (c), and
"KernelDens.gif" (d).
R> X <- HCC[ ,c("cg20202438", "cg18384097")]; D <- HCC$tumor
R> biroc_12_PT <- multiROC(X, D, method = "fixedLinear", methodLinear = "PepeThompson")
R> biroc_12_Meis <- multiROC(X, D, method = "dynamicMeisner", verbose = TRUE)
R> biroc_12_lrm <- multiROC(X, D)
R> biroc_12_kernel <- multiROC(X, D, method = "kernelOptimal")
R> list_biroc <- list(PepeTh = biroc_12_PT, Meisner = biroc_12_Meis,
+ LRM = biroc_12_lrm, KernelDens = biroc_12_kernel)
R> lapply(names(list_biroc), function(x) movieROC(list_biroc[[x]],
+ display.method = "OV", xlab = "Gene 20202438", ylab = "Gene 18384097",
+ cex = 1.2, alpha.points = 1, lwd.curve = 4, file = paste0(x, ".gif")))