easyROC: An Interactive Web-tool for ROC Curve Analysis Using R Language Environment

ROC curve analysis is a fundamental tool for evaluating the performance of a marker in a number of research areas, e.g., biomedicine, bioinformatics and engineering, and is frequently used for discriminating cases from controls. There are a number of analysis tools which are used to guide researchers through their analysis. Some of these tools are commercial and provide basic methods for ROC curve analysis while others, such as the R environment, offer advanced analysis techniques and a command-based user interface. The R environment includes comprehensive tools for ROC curve analysis; however, using a command-based interface might be challenging and time consuming when a quick evaluation is desired, especially for non-R users such as physicians. Hence, a quick, comprehensive, free and easy-to-use analysis tool is required. For this purpose, we developed a user-friendly web-tool based on the R language. This tool provides ROC statistics, graphical tools, optimal cutpoint calculation, comparison of several markers, and sample size estimation to support researchers in their decisions without writing R code. easyROC can be used via any device with an internet connection independently of the operating system. The web interface of easyROC is constructed with the R package shiny. This tool is freely available through www.biosoft.hacettepe.edu.tr/easyROC.


Introduction
The receiver operating characteristics (ROC) curve is a graphical approach used to visualize and assess the performance of a binary classifier system. This unique feature of ROC curve analysis makes it one of the most extensively used methods in various fields of science. It was originally developed during World War II to detect whether a signal on the radar screen represented an object or noise (Egan, 1975; Swets et al., 2000; Fan et al., 2006) and today it is widely used in medicine, radiology, biometrics, bioinformatics and various applications of machine learning and data mining research (Fawcett, 2006; Sonego et al., 2008). ROC curve analysis can be implemented for several reasons: (i) to assess the overall performance of a classifier using several performance measures, (ii) to compare the performances of classifiers, and (iii) to determine the optimal cutpoint for a given classifier, diagnostic test or marker/biomarker. For simplicity of language, we will use the terms classifier and diagnostic test interchangeably throughout the manuscript. The performance of a classifier can be summarized using the point estimates and confidence intervals of several basic performance measures such as sensitivity and specificity, or combined measures of sensitivity and specificity such as likelihood ratios, accuracy and area under the ROC curve (AUC).
A ROC curve is basically a plot of a classifier's true positive rates (TPR: sensitivity) versus false positive rates (FPR: 1 − specificity) where each point is generated by a different threshold value, i.e., cutpoint. For simplicity, we will use the terms TPR and FPR in the equations. One of the major tasks is to determine the optimum cutpoint value, which corresponds to reasonable TPR and FPR values. The determination of an optimum value is usually a trade-off between performance measures. The ROC curve is often used to find the optimal cutpoint as the point on the curve closest to the top-left corner. However, finding the "optimum" cutpoint is not always based on maximizing the sensitivity and specificity. It is also reasonable to select an optimum cutpoint value using alternative selection criteria such as maximization of predictive values, diagnostic odds ratio, etc.
There are a number of commercial (e.g., IBM SPSS, MedCalc, Stata, etc.) and open-source (R) software packages which are used to guide researchers through their ROC curve analysis. Some of these software packages provide basic features for ROC curve analysis while others, such as R, offer advanced features but also a command-based user interface. The R environment includes comprehensive tools for ROC curve analysis, such as ROCR (Sing et al., 2005), pROC (Robin et al., 2011), ROC (Carey and Redestig, 2015) and OptimalCutpoints (Lopez-Raton et al., 2014).
All of the R packages mentioned above perform ROC curve analysis using the related package functions. Although these packages are comprehensive and flexible, they require a good programming knowledge of the R language. Working with a command-based interface might be challenging and time consuming when a quick evaluation is desired, especially for non-R users such as physicians and other health care professionals. Fortunately, the R package shiny (Chang et al., 2015) allows users to create interactive web-tools with a nicely designed, user-friendly and easy-to-use interface. In this context, we developed a web-tool, easyROC, for ROC curve analysis.

Theory behind ROC analysis
Let us consider the binary classification problem where $X$ denotes the value of the classifier for cases and controls. Consider the values of controls distributed as $X_0 \sim G_0(\cdot)$ and cases as $X_1 \sim G_1(\cdot)$.
Let $\hat{Y} \in \{0, 1\}$ be the estimated class label of a subject for a given threshold value $c$, as given in Equation 1.
Parametric ROC curve. The parametric ROC curve is plotted using the FPR (1 − Specificity) and TPR (Sensitivity) values given in Equation 2 for all possible cutpoints of a classifier.
When the distribution of the classifier is Normal, the parametric ROC curve is fitted using binormal ROC properties. Suppose $X_0 \sim \mathrm{Normal}(\mu_0, \sigma_0^2)$ and $X_1 \sim \mathrm{Normal}(\mu_1, \sigma_1^2)$. The ROC curve is then a function of the FPR, as in Equation 3.
Fitting the ROC curve by using Equation 3 has two major drawbacks: (i) incorrect ROC curves may arise when the underlying distribution is not normal, and (ii) ROC curves are improper when within-class variations are not similar, i.e., under heteroscedasticity. An example of improper ROC curves is given in Figure 1. To overcome these problems, one may fit the ROC curve nonparametrically without distributional assumptions or use parametric/semiparametric alternatives to the binormal model (Gönen and Heller, 2010).
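The binormal curve and its improper behaviour can be illustrated with a short Python sketch. It assumes the textbook binormal form, TPR = Φ(a + b Φ⁻¹(FPR)) with a = (µ1 − µ0)/σ1 and b = σ0/σ1, which may differ in notation from Equation 3; the function names and the numeric inputs are illustrative only.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def binormal_roc(fpr, mu0, sd0, mu1, sd1):
    """TPR of the binormal ROC curve at a given FPR (0 < fpr < 1).

    Assumes higher marker values indicate cases.
    """
    a = (mu1 - mu0) / sd1  # standardized separation of the class means
    b = sd0 / sd1          # ratio of the class standard deviations
    return _STD.cdf(a + b * _STD.inv_cdf(fpr))


def binormal_auc(mu0, sd0, mu1, sd1):
    """Closed-form area under the binormal ROC curve."""
    a = (mu1 - mu0) / sd1
    b = sd0 / sd1
    return _STD.cdf(a / (1 + b * b) ** 0.5)


# Equal variances: the curve stays above the chance diagonal.
proper_tpr = binormal_roc(0.99, 0.0, 1.0, 1.5, 1.0)

# Very unequal variances (made-up values): the curve dips below the
# diagonal at high FPR, which is what "improper" means here.
improper_tpr = binormal_roc(0.99, 0.0, 1.0, 0.5, 3.0)
```

With equal variances `proper_tpr` lies above 0.99, whereas in the heteroscedastic setting `improper_tpr` falls below it, so the fitted curve crosses the diagonal.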
Nonparametric ROC curve. Consider the estimated class labels in Equation 1. The FPR and TPR given in Equation 2 are estimated as given in Equation 5.
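As a rough illustration of the empirical estimates, the sketch below labels a subject as a case when its value exceeds the cutpoint and counts the resulting proportions. The decision rule (strictly greater than), the function names and the toy data are assumptions for the example, not taken from Equations 1 and 5.

```python
def empirical_rates(values, labels, cutpoint):
    """Return (FPR, TPR) when subjects with value > cutpoint are called cases."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    tp = sum(1 for x, y in zip(values, labels) if x > cutpoint and y == 1)
    fp = sum(1 for x, y in zip(values, labels) if x > cutpoint and y == 0)
    return fp / n_controls, tp / n_cases


def empirical_roc(values, labels):
    """(FPR, TPR) points at every observed cutpoint, from (0, 0) to (1, 1)."""
    # Start at the largest value (nobody is positive) and end below the
    # smallest one (everybody is positive).
    cutpoints = sorted(set(values), reverse=True) + [float("-inf")]
    return [empirical_rates(values, labels, c) for c in cutpoints]


# Toy data: two controls (label 0) and two cases (label 1).
toy_values = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_points = empirical_roc(toy_values, toy_labels)
```

For markers where lower values indicate higher risk, the same sketch applies after negating the values.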
The empirical ROC curve is plotted using $FPR_c$ and $TPR_c$, and the area under the curve, given in Equation 6, is estimated by summing the trapezoids enclosed by the points of the ROC curve. The nonparametric AUC is related to the Mann-Whitney statistic of the rank-sum test (Bamber, 1975; Hanley and McNeil, 1982).
Performance measures and optimal cutpoints. The predicted and actual classes, i.e., gold standard test results, can be shown with a 2 × 2 cross table, as seen in Table 1. The performance of a classifier is basically measured using the total proportion of true positive (TP) and true negative (TN) cases. By using Table 1, several performance measures are also calculated. Among these performance measures, we focused on the measures given in Table 1, which are widely used and well-known. The optimal cutpoint is determined by using one or more performance measures together. An ideal cutpoint, for example, might be selected by maximizing the sensitivity and specificity of a classifier. A classifier with perfect discriminative ability would have sensitivity and specificity measures equal to 1. Hence, the area under the curve for a perfect separation will be equal to 1.
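The link between the trapezoidal area and the Mann-Whitney statistic mentioned above can be checked numerically. The following is a minimal sketch with made-up data; it assumes higher values indicate cases and that ties between a case and a control count as one half.

```python
def roc_points(cases, controls):
    """Empirical (FPR, TPR) points, calling a subject positive when value > cutpoint."""
    cutpoints = sorted(set(cases) | set(controls), reverse=True) + [float("-inf")]
    return [
        (sum(x > c for x in controls) / len(controls),
         sum(x > c for x in cases) / len(cases))
        for c in cutpoints
    ]


def trapezoid_auc(points):
    """Sum of the trapezoids under consecutive ROC points."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def mann_whitney_auc(cases, controls):
    """Proportion of case-control pairs ranked correctly; ties count as 1/2."""
    total = 0.0
    for x1 in cases:
        for x0 in controls:
            if x1 > x0:
                total += 1.0
            elif x1 == x0:
                total += 0.5
    return total / (len(cases) * len(controls))


# Made-up values, including one tie between a case and a control.
demo_cases = [0.35, 0.8, 0.6]
demo_controls = [0.1, 0.4, 0.6]
auc_trapezoid = trapezoid_auc(roc_points(demo_cases, demo_controls))
auc_rank = mann_whitney_auc(demo_cases, demo_controls)
```

Both routes give 13/18 for this toy data set, since a tie produces a diagonal segment whose trapezoid contributes exactly half a pair.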
Although researchers are usually interested in the overall diagnostic performance of a classifier, it is sometimes useful to focus on a portion of the ROC curve and compute the partial AUC (pAUC). pAUC is an extension of the AUC measure which considers only the area within a given interval $[t_1, t_2]$ of the ROC curve. It is computed either by integration as in Equation 7 (parametric) or by summing the trapezoids within the interval (nonparametric).
As the interval $[t_1, t_2]$ converges to $[0, 1]$, the pAUC will converge to the overall AUC. The best classifier can be selected using either AUC or pAUC values.
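A nonparametric pAUC over an FPR interval can be sketched by clipping each trapezoid to the interval. The function below is an illustrative assumption about how such a computation might look, not the routine used by easyROC or pROC; it expects ROC points sorted by increasing FPR.

```python
def partial_auc(points, t1, t2):
    """Area under a piecewise-linear ROC curve for FPR in [t1, t2]."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        lo, hi = max(x1, t1), min(x2, t2)
        if x2 == x1 or hi <= lo:
            continue  # vertical segment or outside the interval: no area
        slope = (y2 - y1) / (x2 - x1)
        y_lo = y1 + slope * (lo - x1)  # curve height at the clipped left edge
        y_hi = y1 + slope * (hi - x1)  # curve height at the clipped right edge
        area += (hi - lo) * (y_lo + y_hi) / 2
    return area


# Toy ROC points (FPR, TPR), not taken from the case study.
toy_curve = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
pauc_low_fpr = partial_auc(toy_curve, 0.0, 0.2)
full_auc = partial_auc(toy_curve, 0.0, 1.0)
```

Setting the interval to [0, 1] recovers the full trapezoidal AUC, which mirrors the convergence statement above.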
Identification of the optimal cutpoint is an important task to avoid incorrect conclusions. Various methods are available in the literature to determine the optimal cutpoint. Most of these methods are based on the sensitivity and specificity measures. However, other methods are also available based on cost-benefit, prevalence, predictive values and diagnostic likelihood ratios. Two popular methods are, for example, the Youden index and the minimization of the distance of the point on the curve to the top-left corner, i.e., the point indicating perfect discrimination.
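The two popular criteria can be compared on a small table of candidate cutpoints. In this sketch the Youden index is taken as sensitivity + specificity − 1 and the second criterion as the squared Euclidean distance to the point (FPR, TPR) = (0, 1); the candidate rows are made up for illustration.

```python
def youden_cutpoint(rows):
    """Row (cutpoint, sensitivity, specificity) maximizing sens + spec - 1."""
    return max(rows, key=lambda r: r[1] + r[2] - 1)


def closest_topleft_cutpoint(rows):
    """Row minimizing the distance to perfect discrimination (sens = spec = 1)."""
    return min(rows, key=lambda r: (1 - r[1]) ** 2 + (1 - r[2]) ** 2)


# Hypothetical candidates: (cutpoint, sensitivity, specificity).
candidates = [
    (1.0, 1.00, 0.60),
    (2.0, 0.78, 0.80),
    (3.0, 0.40, 0.95),
]
best_youden = youden_cutpoint(candidates)
best_topleft = closest_topleft_cutpoint(candidates)
```

Here the Youden index selects the first candidate while the distance criterion selects the second, which shows that the two methods need not agree.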
Table 1 gives the list of optimal cutpoint methods we consider in easyROC. For detailed information and mathematical background, see Lopez-Raton et al. (2014).

Statistical inference.
A common subject of interest in ROC analysis is to compare the performances of several classifiers to select the best one to discriminate cases from controls. For a classifier with random chance discrimination ability, the equation TPR = FPR holds. In that case, the area under the curve is 0.50. Hence, the discrimination ability of a classifier is mostly tested against the value 0.50.
Under the large sample theory, the significance of AUC is tested using the Wald test statistic as given in Equation 9.
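A Wald-type test of the AUC against 0.50 can be sketched as follows. The sketch assumes the usual form z = (AUC − 0.5) / SE with a standard normal reference distribution and reports a two-sided p-value; whether Equation 9 is one- or two-sided is not visible here, so that choice is an assumption, as are the function name and inputs.

```python
from statistics import NormalDist


def wald_test_auc(auc, se, null_value=0.5):
    """Return (z, two-sided p-value) for H0: AUC = null_value."""
    z = (auc - null_value) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical estimate and standard error.
z_stat, p_value = wald_test_auc(0.70, 0.10)
```

The standard error passed in can come from any of the variance estimators discussed in this section.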
When the parametric approach is used, the variance of AUC is estimated using Equation 10 (McClish, 1989; Zhou et al., 2002).
The R Journal Vol. 8/2, December 2016 ISSN 2073-4859
and the estimated variances for a and b as follows: The estimated values of a and b are used in Equation 11. A number of methods have been proposed for the estimation of the variance of AUC when the nonparametric approach is used. In this paper, we will focus on the methods described below:
1. Mann-Whitney version of rank-sum test: Hanley and McNeil (1982) propose the variance estimation given in Equation 13. This method estimates the variance using an approximation based on the exponential distribution. The Mann-Whitney version might underestimate the variance when the area is nearly 0.5 and overestimate it when the area is close to 1 (Hanley and McNeil, 1982; Hanley and Hajian-Tilaki, 1997; Obuchowski, 1994). This estimate is mostly used in sample-size estimation.

2. DeLong et al. (1988)'s estimate:
Since the exponential distribution approximation in Equation 13 gives biased variance estimates, DeLong et al. (1988) suggest an alternative method which is free from distributional assumptions. Define the components $T_{1i}$ for the ith subject from cases and $T_{0j}$ for the jth subject from controls as in Equation 14. Using Equation 14, the variance of AUC is estimated as in Equation 15, where $S^2_{T_1}$ and $S^2_{T_0}$ are variance estimates of $T_1$ and $T_0$ as in Equation 16.
3. Normal approximation of binomial proportion: Another alternative for variance estimation is to use the binomial approximation under the large sample theory, as given in Equation 17. For small samples, this method may give biased estimates.
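The first two estimators in the list above can be sketched in a few lines of Python. The Hanley and McNeil function uses the commonly published form with Q1 = AUC/(2 − AUC) and Q2 = 2AUC²/(1 + AUC), and the DeLong function uses per-subject placement components with sample variances; both are written on the assumption that Equations 13 to 16 follow these standard forms, and the data are made up.

```python
def hanley_mcneil_variance(auc, n_cases, n_controls):
    """Variance of the AUC from the exponential-model approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    numerator = (
        auc * (1 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    )
    return numerator / (n_cases * n_controls)


def delong_variance(cases, controls):
    """Return (AUC, variance) from DeLong-style placement components.

    Assumes higher values indicate cases; ties count as 1/2.
    """
    n1, n0 = len(cases), len(controls)

    def psi(x1, x0):
        return 1.0 if x1 > x0 else 0.5 if x1 == x0 else 0.0

    # Component for each case: share of controls it outranks.
    t1 = [sum(psi(x, y) for y in controls) / n0 for x in cases]
    # Component for each control: share of cases that outrank it.
    t0 = [sum(psi(x, y) for x in cases) / n1 for y in controls]
    auc = sum(t1) / n1
    s2_t1 = sum((t - auc) ** 2 for t in t1) / (n1 - 1)
    s2_t0 = sum((t - auc) ** 2 for t in t0) / (n0 - 1)
    return auc, s2_t1 / n1 + s2_t0 / n0


# Made-up data with one tie.
demo_auc, demo_var = delong_variance([0.35, 0.8, 0.6], [0.1, 0.4, 0.6])
hm_var = hanley_mcneil_variance(0.86, 20, 20)
```

At AUC = 0.5 the Hanley and McNeil expression reduces to (n1 + n0 + 1) / (12 n1 n0), the null variance of the Mann-Whitney statistic, which is a convenient sanity check.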
The estimated variance derived from one of the methods described above is used to construct the confidence intervals of the AUC. A common method is to use the large sample approximation given in Equation 18. When the area under the curve is close to 1 or the sample size is relatively small, this approximation produces improper confidence intervals since the upper limit may exceed 1. To solve this problem, Agresti and Coull (1998) proposed the score confidence interval that guarantees the upper limit is less than or equal to 1. Another alternative is to construct the binomial exact confidence intervals given in Equation 19 using the relationship between the binomial and F distributions (Morisette and Khorram, 1998), where p = x/n is the binomial proportion, such as sensitivity, specificity and AUC.
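The contrast between the large sample interval and the exact interval can be sketched as below. The exact limits are found here by bisection on the binomial tail probabilities, which is equivalent to the F-distribution formulation but avoids needing an F quantile function; this substitution, the function names and the example inputs are assumptions made for the illustration.

```python
from math import comb
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Large sample interval: estimate +/- z * se (may leave [0, 1])."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target):
    """Bisection for g(p) = target, with g increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, level=0.95):
    """Exact (Clopper-Pearson type) interval for a proportion x / n."""
    alpha = 1 - level
    # Lower limit solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - _binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - _binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


wald_low, wald_high = wald_ci(0.97, 0.03)             # hypothetical AUC and SE
exact_low, exact_high = exact_binomial_ci(15, 20)     # hypothetical 15 of 20
```

In this example the large sample upper limit exceeds 1, while the exact interval for 15 successes out of 20 is roughly 0.51 to 0.91 and stays inside [0, 1].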

Sample size calculation.
In most studies, determining the required sample size is an important step for the research to be able to detect significant results. Sample size determination is required both for constructing the confidence interval of an unknown population parameter and for testing a research hypothesis. Obuchowski (1998) reviewed sample size determination for several study designs. In this paper, we cover the sample size determination for three types of studies based on AUCs. In addition, the following sample size calculations can be extended to other performance measures such as sensitivity, specificity, etc.
The variance estimates of AUCs can be obtained using one of the Equations 13, 15 and 17. While Equation 13 is a good approximation for a variety of underlying distributions, the variance will be underestimated if the test results are in a discrete rating format. To overcome this problem, Obuchowski (1998) and Obuchowski et al. (2004) suggest an alternative variance estimation method for rating data using the variance function given in Equation 20, which is based on an underlying binormal distribution. In this section, we focused on sample size calculation for discrete scale data. However, the same formulas are valid for continuous scale diagnostic tests since the only difference lies in estimating the variance of diagnostic test accuracy.
where $a = \sqrt{2}\,\Phi^{-1}(AUC)$ and $R = n_0/n_1$ is the allocation ratio, i.e., the ratio of the number of controls to the number of cases. The estimated variance is then $\mathrm{Var}(\widehat{AUC}) = V(\widehat{AUC})/n_1$. The total sample size is equal to $n = n_1(1 + R)$. One of the variance estimations from Equations 13, 15, 17 and 20 is used for the sample size calculations. The selection of the appropriate variance estimation method is based on the variable type of the test results and the underlying distributions.
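For reference, the variance function usually attributed to Obuchowski for this setting can be written as below. This is a sketch of the commonly published form, offered on the assumption that Equation 20 follows it; the constant 0.0099 and the arrangement of the terms come from that literature rather than from the text above.

```latex
V(\widehat{AUC}) = 0.0099 \, e^{-a^2/2}
  \left[ \left(5a^2 + 8\right) + \frac{a^2 + 8}{R} \right],
\qquad a = \sqrt{2}\,\Phi^{-1}(AUC), \quad R = n_0 / n_1 .
```

Under this form, AUC = 0.5 gives $a = 0$ and hence $V = 0.0099\,(8 + 8/R)$, which equals 0.1584 for a balanced design with $R = 1$.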

Hypothesis test to determine the AUC of a single classifier:
In most of the studies with a single classifier, the aim of the study is to determine whether the diagnostic test performs well for discriminating diseased patients from controls. Consider the hypotheses $H_0: AUC = 0.5$ versus $H_1: AUC > 0.5$ (i.e., a one-sided test). The required number of cases is determined using Equation 21 (Obuchowski et al., 2004).
where $Var_0$ and $Var_1$ are the variance estimations under the null and alternative hypotheses using Equation 20. $z_{1-\alpha}$ and $z_{1-\beta}$ are lower-tailed percentile values of the cumulative standard normal distribution. Finally, the total sample size is obtained using $n = n_1 + n_1 \times R$.
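The single-classifier calculation can be sketched in Python. The sketch assumes the commonly published forms: the binormal variance function V = 0.0099 exp(−a²/2) [(5a² + 8) + (a² + 8)/R] and the case count n1 = (z(1−α) √V0 + z(1−β) √V1)² / (AUC − 0.5)². These are assumptions about Equations 20 and 21, and the design values in the example are hypothetical.

```python
from math import ceil, exp, sqrt
from statistics import NormalDist

_STD = NormalDist()


def variance_function(auc, ratio):
    """Binormal variance function for a given AUC and allocation ratio R."""
    a = sqrt(2) * _STD.inv_cdf(auc)
    return 0.0099 * exp(-a * a / 2) * ((5 * a * a + 8) + (a * a + 8) / ratio)


def sample_size_single(auc_alt, alpha=0.05, power=0.80, ratio=1.0):
    """Return (cases, total) for H0: AUC = 0.5 versus H1: AUC > 0.5."""
    v0 = variance_function(0.5, ratio)      # variance under the null
    v1 = variance_function(auc_alt, ratio)  # variance under the alternative
    z_alpha = _STD.inv_cdf(1 - alpha)
    z_beta = _STD.inv_cdf(power)
    n_cases = (z_alpha * sqrt(v0) + z_beta * sqrt(v1)) ** 2 / (auc_alt - 0.5) ** 2
    n_cases = ceil(n_cases)
    # Total = cases + controls, with controls = R * cases.
    return n_cases, n_cases + ceil(n_cases * ratio)


# Hypothetical design: expected AUC of 0.80, equal allocation.
cases_needed, total_needed = sample_size_single(0.80)
```

Rounding up to whole subjects is a design choice of the sketch; a tool may round differently.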

Comparing the AUCs of two classifiers:
When the aim of a study is to compare two classifiers, one may test the null hypothesis $H_0: AUC_1 = AUC_2$, under which the two classifiers perform equally well. The required number of cases is calculated using Equation 22,
where $Var_0$ and $Var_1$ are the variance estimations under the null and alternative hypotheses, as given in Equation 23 (Zhou et al., 2002; Obuchowski et al., 2004).
The total sample size is calculated using the allocation ratio. When the two classifiers are applied to the same subjects, the design is paired and the covariance term is a nonzero (usually positive) quantity. However, the covariance term will be zero (i.e., independent classifiers) if each test is performed on different subjects. Detailed information on the calculation of the covariance term can be found in Zhou et al. (2002).

Non-inferiority of a new classifier to a standard one:
In addition to comparing two classifiers, some studies are designed to compare the performance of a new classifier with that of a standard one. The new classifier should perform as well as, but not necessarily better than, the standard test (Obuchowski et al., 2004). The hypotheses are $H_0: AUC_{std} - AUC_{new} \geq \Delta$ versus $H_1: AUC_{std} - AUC_{new} < \Delta$. The required number of cases is calculated using Equation 24, where $\Delta$ is the non-inferiority margin, i.e., the minimum acceptable difference between the AUCs of the standard and new classifiers.

Current ROC analysis tools and easyROC
ROC curve analysis is one of the standard procedures included in most statistical analysis tools such as IBM SPSS, Stata, MedCalc and R. Each tool offers different features within ROC curve analysis. IBM SPSS, one of the most widely used commercial software packages, plots the ROC curve and computes some basic statistics such as AUC and its standard error, confidence interval and statistical significance. However, it does not provide any method for sample size calculation or cutpoint determination. Stata offers a variety of calculations for ROC curve analysis including partial AUC, multiple comparisons of ROC curves, optimal cutpoint determination using the Youden index and several performance measures. Another commercial software alternative for ROC curve analysis is MedCalc, which has comprehensive features compared to most of the other available commercial software packages and is especially developed for biomedical research. MedCalc provides sample size estimation for a single diagnostic test, but it does not have an option for pAUC calculation.
Unlike commercial software packages, R is a free and open-source software environment that offers all the features of the commercial software packages, and more, through several packages such as ROC, ROCR, pROC and OptimalCutpoints. ROC is an R/Bioconductor package which can plot the ROC curve and calculate the AUC. It also calculates pAUCs based on false positive rates. This package was originally developed for ROC analysis with DNA microarrays. ROCR is a comprehensive R package providing over 25 different performance measures (based on package version 1.0-7). It allows users to create two-dimensional performance curves. Although ROCR is one of the most comprehensive packages for assessing performance measures, it provides limited options to select the optimum cutpoint. One may use any of the two-dimensional performance graphs to determine the optimal cutpoint graphically. It computes the AUC and its confidence interval; however, it does not provide a statistical test for performance measures. pROC, on the other hand, offers more comprehensive and flexible features than its free and commercial counterparts. It performs statistical tests for the comparison of ROC curves using DeLong et al. (1988), Venkatraman and Begg (1996) and Venkatraman (2000) for AUC, and Hanley and McNeil (1983) and Pepe et al. (2009) for both AUC and pAUC. It also calculates the confidence intervals for the sensitivity, specificity, ROC curves, pAUC, and smoothed ROC curves. The confidence intervals are computed using DeLong et al. (1988)'s method for AUCs and using bootstrap for pAUCs, sensitivity and specificity at given threshold(s). Bootstrap confidence intervals and pAUC regions are shown in the ROC curve plot. Several diagnostic measures, such as sensitivity, specificity, negative and positive predictive values, are computed for a given threshold. Like ROCR, pROC also offers limited features for detecting the optimal cutpoint. Two methods, i.e., the Youden index and the closest point to the top-left corner, are available to find the optimal cutpoint. In addition, pROC is an alternative among the ROC packages on CRAN to find the required sample size for a single diagnostic test or the comparison of two diagnostic tests. Two versions of pROC are available: (i) for the R programming language and (ii) with a graphical user interface for the S-PLUS statistical software package.
There are several packages providing optimal cutpoint calculations through R. OptimalCutpoints is a sophisticated R package specifically developed to determine the optimal cutpoint of a test or biomarker (Lopez-Raton et al., 2014). It includes 34 different cutpoint calculation methods based on sensitivity/specificity measures, cost-benefit analysis, predictive values, diagnostic likelihood ratios, prevalences and p-values. A brief description of these methods is given in Supplementary 1. Although these R packages, especially pROC, seem to be a perfect match for ROC curve analysis, none of them has a graphical user interface and all require coding knowledge, which makes them hard to use, especially for non-R users.
Another R package worth mentioning is plotROC (Sachs, 2016), which is available on CRAN and also for shiny platforms. plotROC is a flexible and sophisticated R package which can be used to create nice-looking and interactive ROC graphs. Unlike the packages described above, plotROC has a web-based user interface which is very useful for non-R users. Researchers can use its web service to create ROC graphics and download the figures to their local computer. However, it does not provide any statistical tests or sample size calculations.
easyROC aims to extend the features of several ROC packages in R and allows researchers to conduct their ROC curve analysis through a single and easy-to-use interface without writing any R code. This tool is a web-based application created via shiny and HTML programming. easyROC makes use of the R packages plyr (Wickham, 2011), pROC and OptimalCutpoints for conducting ROC analysis. plyr is used for manipulating data while pROC is used for estimation and hypothesis testing of pAUCs. easyROC has comprehensive options for ROC curve analysis which other tools lack or offer only in part. The ROC curve can be estimated using parametric or nonparametric approaches. It offers four different methods for the calculation of the standard error and confidence interval of the AUC. Researchers can calculate the pAUCs based on sensitivity and specificity, if necessary. One may perform pairwise comparisons to find the classifiers which have similar or different discrimination ability. However, pairwise comparisons should be carried out carefully since the type I error increases with the number of comparisons. easyROC offers multiple test corrections in order to keep the type I error at a given level. Multiple comparisons of diagnostic tests can be applied using either Bonferroni or false discovery rate correction. Furthermore, the optimal cutpoints are determined using the methods from OptimalCutpoints, and the corresponding measures at a given cutpoint, including sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios, are also returned. One can determine the desired sample size for ROC curve analysis using this tool for three different cases. All these comprehensive features are accessible through a graphical user interface, which makes the analysis process easier for all users. The comparison with other tools is given in Table 2 and the features of each module are given in Table 3.
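The two multiple test corrections mentioned above can be sketched as p-value adjustments. The false discovery rate procedure shown is the Benjamini-Hochberg step-up adjustment, which is an assumption since the text does not name the procedure; the p-values are made up.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four pairwise comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
p_bonferroni = bonferroni(raw_p)
p_fdr = benjamini_hochberg(raw_p)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two, so its adjusted p-values are never smaller than the false discovery rate ones.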

Case study on non-alcoholic fatty liver disease
To illustrate our application, we used the non-alcoholic fatty liver disease (NAFLD) dataset of Celikbilek et al. (2014). This study was designed to identify the non-invasive miRNA biomarkers of NAFLD.
The authors obtained the serum samples of 20 healthy and 20 NAFLD observations and quantified the expression levels of eight miRNAs using quantitative Real-Time PCR (qPCR) technology. After performing the necessary statistical analysis, the authors revealed that miR-197, miR-146b, miR-181d and miR-99a may be potential biomarkers in identifying NAFLD. The normalized expression values of these miRNAs and the class information (the column named "Group", where 0 refers to controls and 1 refers to cases) of each observation are given in Supplementary 2. This file can be directly used as input to the easyROC web-tool and users can arrange their own data based on this file. Two example datasets, Mayo and PBC (Murtaugh et al., 1994), are also available in the web-tool for users to practice the application. In our example, the aim is to investigate the discriminative performance of each miRNA, compare the miRNAs with each other and identify the optimal cutpoint for each miRNA in identifying NAFLD.

Implementation of easyROC web-tool
The data are uploaded to the easyROC interface using the Data upload tab (Figure 2). easyROC accepts a delimited text file with variable names in the first row. The status variable is also set by the same tab panel. easyROC automatically detects the variable names and exports them into related fields. When data are correctly uploaded, researchers may proceed with ROC curve analysis, cutpoint estimations or sample size calculations. The area under the curve, confidence intervals and significance tests for AUC, multiple comparisons (if multiple markers are selected) and pAUCs are calculated with the ROC curve tab (Figures 3 and 4). The ROC curve is estimated using the nonparametric approach. The advanced option allows researchers to select a method for standard error estimation and confidence intervals. easyROC selects the DeLong et al. (1988) method by default.
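The expected input layout, a delimited text file with variable names in the first row and a status column, can be sketched as follows. The tab delimiter, the helper name `read_markers` and every value in the sample are hypothetical; only the column name "Group" and the 0/1 coding follow the case study description.

```python
import csv
import io

# Hypothetical file content; the expression values are made up.
SAMPLE = (
    "Group\tmir197\tmir181d\n"
    "0\t0.42\t0.31\n"
    "0\t0.10\t0.05\n"
    "1\t-0.35\t-0.48\n"
    "1\t-0.02\t-0.20\n"
)


def read_markers(text, status="Group", delimiter="\t"):
    """Split each marker column into control (0) and case (1) values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    markers = [name for name in reader.fieldnames if name != status]
    data = {name: {0: [], 1: []} for name in markers}
    for row in reader:
        group = int(row[status])
        for name in markers:
            data[name][group].append(float(row[name]))
    return data


marker_data = read_markers(SAMPLE)
```

Arranging the data this way mirrors what the Data upload tab does when it detects the variable names and the status variable.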
Here, we select the mir197, mir146b, mir181d and mir99a miRNAs to assess their performances and to compare them with each other in identifying NAFLD. Since all miRNAs are underexpressed in the NAFLD group, lower values will indicate higher risk and therefore we should uncheck the "Higher values indicate higher risks" box. Using DeLong et al. (1988) standard error estimations, we obtained the ROC curves for each miRNA biomarker and AUC values of 0.86 (0.75-0.97), 0.77 (0.61-0.92), 0.76 (0.60-0.93) and 0.75 (0.59-0.91) for mir181d, mir197, mir99a and mir146b, respectively. The results revealed that the predictive performances of all miRNAs are significantly higher than random chance in identifying NAFLD (Figure 3). After controlling the type I error using Bonferroni correction, all pairwise comparisons showed non-significant results (p > 0.05). This may be due to the small sample size of the data. Increasing the sample size, and thus the statistical power of the test, may establish more firmly the predictive ability of mir181d as compared to the other miRNAs.
Finding a suitable cutpoint is one of the aims of ROC curve analysis. We made use of the OptimalCutpoints package from R (Lopez-Raton et al., 2014), which has 34 different methods, to calculate cutpoints for each marker. An optimal cutpoint can be computed via the Cut point tab by selecting a marker and a method. Then, the application will calculate an optimal cutpoint and performance measures such as sensitivity, specificity, positive and negative predictive value, and positive and negative likelihood ratio based on the corresponding cutpoint value. The "ROC01" method, for example, determines the optimal cutpoint as −0.12977 for mir181d. Using this cutpoint, a new test observation with a mir181d expression level lower than this value can be assigned as an NAFLD patient. Based on the identified cutpoint, we obtained statistical diagnostic measures with 95% confidence intervals (Figure 5). We obtain a sensitivity of 0.75 (0.51-0.91) and specificity of 0.80 (0.56-0.94). If users select the "Include plots" option, four plots will appear under the statistics results. The first plot in the upper-left corner displays the optimal cutpoint on the ROC curve. Users can observe the change of sensitivity and specificity measures based on the value of the marker on the plot placed in the upper-right corner. The density and scatter of the expression values in each group are displayed in the bottom-left and bottom-right corners. The plots can be modified through the "More plot options" section. All the results and figures can be downloaded using the related "Download" buttons in each tab panel.
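The decision rule and the measures reported at a cutpoint can be sketched as follows. The counts used (15 of 20 cases and 16 of 20 controls classified correctly) are inferred from the reported sensitivity of 0.75 and specificity of 0.80 with 20 subjects per group; they are an assumption, as are the function names.

```python
CUTPOINT = -0.12977  # ROC01 cutpoint reported for mir181d


def predict_nafld(expression, cutpoint=CUTPOINT):
    """Return 1 (NAFLD) when the expression is lower than the cutpoint."""
    return 1 if expression < cutpoint else 0


def diagnostic_measures(tp, fn, fp, tn):
    """Standard measures from a 2 x 2 table of predicted versus actual class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


# Inferred counts: 15 true positives, 5 false negatives,
# 4 false positives, 16 true negatives.
measures = diagnostic_measures(tp=15, fn=5, fp=4, tn=16)
```

The predictive values computed this way reflect the 50% case proportion of the sample and would change in a population with a different prevalence.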

Conclusion
Since ROC curve analysis is one of the principal statistical analysis methods, it is used by a wide range of the scientific community. Both commercial and free software tools are available for users to perform it. Generally, easy-to-use and nicely-designed interfaces are offered by commercial software packages whereas flexible and comprehensive tools are available in free, open-access, code-based software packages, such as R. The first novelty of our tool is that it allows the user to use free and open-access software with an easy-to-use interface. In other words, we combine the power of an open-source and free language with a nicely designed and easily accessible interface. This tool offers more comprehensive features and a wider variety of implementations for ROC curve analysis than its commercial and free counterparts, which is another novelty of this application. It is specifically constructed for ROC curve analysis, unlike the commercial software packages, such as IBM SPSS, Stata and MedCalc.
This web-based application is intended for research purposes only, not for clinical or commercial use. Since it is a non-profit service to the scientific community, it comes with no warranty and no data security. However, since this web server uses the R package shiny, each user performs his/her analyses in a new R session. After uploading data, the application only saves responses within its R session and prints the results instantly. After a user has quit the application, the corresponding R session will be closed and any uploaded data, responses or outputs will not be saved locally or remotely.

Figure 4: Multiple comparison of the diagnostic tests.
The interface of easyROC is constructed via shiny and HTML code. easyROC combines several R packages for ROC curve analysis. This tool has three main parts: ROC statistics, cutpoint calculations and sample size estimation. Detailed information about easyROC and the related methods, together with the mathematical background, is given in Section Material and methods. easyROC is freely available at http://www.biosoft.hacettepe.edu.tr/easyROC and all the source code is on GitHub.

Table 2: Comparison of easyROC with other tools.