vivid: An R package for Variable Importance and Variable Interactions Displays for Machine Learning Models

Alan Inglis; Andrew Parnell; Catherine Hurley

doi:10.32614/RJ-2023-054

1 Introduction

Our motivation behind the creation of the vivid package is to investigate machine learning models in a way that is simple to understand while also offering helpful insights into how variables affect the fit. We do this through the use of heatmaps, network graphs, and both a generalized pairs plot style partial dependence plot (PDP) (Friedman 2000) and a space saving PDP based on key variable subsets. While the techniques and fundamental goals of these visualizations have been discussed in Inglis et al. (2022b), we focus here on the implementation details of the package by providing a complete listing of the functions and arguments included in the vivid package with further examples indicating advanced usage beyond that previously shown. In this work we examine the decisions made when designing the package and provide an in-depth look at the package functions and features with the intention of making the work applicable to a larger readership. This article outlines the general architectural principles implemented in vivid, such as the data structures we use and data formatting, function design, filtering techniques, and more. We illustrate each function by way of a practical example. Our package is available on the Comprehensive R Archive Network at https://cran.r-project.org/web/packages/vivid or on GitHub at https://github.com/AlanInglis/vivid.

In recent years machine learning (ML) algorithms have emerged as a valuable tool for both industry and science. However, due to the black-box nature of many of these algorithms it can be challenging to communicate the reasoning behind the algorithm’s decision-making processes. With the need for transparency in ML growing it is important to gain understanding and clarity about how these algorithms are making predictions (Antunes et al. 2018; Felzmann et al. 2019). Many R packages are now available that aid in creating interpretable machine learning (IML) models such as iml (Molnar et al. 2018), DALEX (Biecek 2018), and lime (Hvitfeldt et al. 2022). For a comprehensive review of IML, see Molnar (2022) and Biecek and Burzykowski (2021).

How we choose to visualize aspects of the model output is of vital importance to how a researcher can interpret and communicate their findings. Consequently, model summaries such as variable importance and variable interactions (VImp and VInt; together we term these VIVI) are frequently used in various fields to comprehend and explain the hidden structure in an ML fit. In ecology they are employed to determine the causes of ecological phenomena (e.g. Murray and Conner 2009); in meteorology VImp measures and partial dependence plots are used to examine air quality (e.g. Grange et al. 2018); in bioinformatics, understanding gene-environment interactions have made these measures an important tool for genomic analysis (e.g. Chen and Ishwaran 2012).

In Table 1 we summarize VIVI measures and visualizations provided by a selection of R packages. VIVI measures are categorized as global or local, depending on whether they refer to the entire predictor space or a specific sub-region. For instance, Individual Conditional Expectation (ICE) curves (Goldstein et al. 2015) and Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al. 2016) are considered local methods (implementations of which can be found in the ICEbox package (Goldstein et al. 2015) and the lime package (Hvitfeldt et al. 2022)), whereas partial dependence and permutation importance represent global methods. Packages that incorporate local or global methods can be found under the column ‘Method’ in Table 1. Additionally, VIVI measures can be broken down into model specific (embedded) methods or model agnostic methods. The packages listed in Table 1 are grouped by whether they can compute model specific or agnostic measures and can be found under the column `Measure’. In embedded methods the variable importance is incorporated into the ML algorithm. For example, random forests (RF) (Breiman 2001) and gradient boosting machines (GBM) (Friedman 2000) use the tree structure to evaluate the performance of the model. Bayesian additive regression tree models (BART) (Chipman et al. 2010) also use an embedded method to obtain VIVI measures by looking at the proportion of splitting rules for variables or variable pairs used in the trees. The package randomForestExplainer (Paluszynska et al. 2020) provides a set of tools to understand what is happening inside a random forest and uses the concept of minimal depth (Ishwaran et al. 2010) to assess both importance and interaction strength by examining the position of a variable within the trees. Additionally the varImp (Probst 2020) package can be used to compute importance scores for the conditional inference random forest of the party package (Strobl et al. 2008). For gradient boosted machines, the EIX (Maksymiuk et al. 2021) package can be used to measure and identify VIVI and visualize the results.

Model-agnostic methods are techniques that can, in principle, be applied to any ML algorithm. Agnostic methods not only provide flexibility in relation to model selection but are also useful for comparing different fitted ML models. An example of a model agnostic approach for evaluating VImps is permutation importance (Breiman 2001). This method calculates the difference in a model’s predictive performance following a variable’s permutation; implementations are available in the packages iml, flashlight (Mayer 2023), DALEX and vip (Greenwell and Boehmke 2020). Each package provides options for specifying the performance metric to use in computing the model performance as well as providing options to select the number of permutations, with flashlight and DALEX packages additionally providing an option to select the number of observations that should be sampled for the calculation of variable importance. The vip package also provides a PDP/ICE-based variable importance method, which quantifies the variability in PDP and ICE plots. For VInts, Friedman’s $H$-statistic (Friedman and Popescu 2008) is an agnostic interaction measure derived from the partial dependence by comparing a pair of variables’ joint effects with the sum of their marginal effects. Packages iml and flashlight provide implementations.

Partial dependence plots (PDPs) were first introduced by Friedman (2000) as a model agnostic way to visualize the relationship between a specified predictor variable and the fit, averaging over other predictors’ effects. Similar to PDPs, individual conditional expectation curves (ICE) (Goldstein et al. 2015) show the relationship between a specified predictor and the fit, fixing the levels of other predictors at those of a particular observation. PDP curves are then the average of the ICE curves over all observations in the dataset. R packages offering PDPs include iml, DALEX (which calls the ingredients package (Biecek and Baniecki 2023)), flashlight, and pdp (Greenwell 2017); the package ICEbox (Goldstein et al. 2015) provides ICE curves and variations. Each package provides options for selecting the size of the grid for evaluating the predictions, with flashlight and DALEX providing options to select the number of observations to consider in calculating the partial dependence.

In vivid we provide a suite of functions (see Table 2) for calculating and visualizing variable importance, interactions and the partial dependence. Our displays conveniently show (either model specific and agnostic) VImp and VInt jointly using heatmaps and network graphs. Through the use of seriation techniques, we group together variables with the greatest impact on the response. For our network plot, additional filtering to remove less influential variables and clustering options to group highly interacting variables are provided, thus providing a more informative picture identifying relevant features. Additionally, our implementation makes it possible to apply our methods to subsets of data, which leads to locally-based importance measures. Our generalized PDP (GPDP) displays partial dependence plots in a matrix layout combining univariate and bivariate partial dependence plots with variable scatterplots. This layout, coupled with seriation, allows for quick assessment of of how pairs of variables have an impact on the model fit. We furthermore provide a more compact version of the GDPD, the so-called zen-partial dependence plot (ZPDP) consisting only of those bivariate partial dependence plots with high VInt. All of our displays are designed to quickly identify how variables, both singly and jointly, affect the fitted response and can be used for regression or classification fits. As the output of our displays are ggplot2 objects (Wickham 2016), they are easily customizable and provide the flexibility to create custom VIVI visualizations.

This paper is structured as follows. First we introduce a dataset and fit models that will be used as examples throughout this paper. Following this, we describe vivid functionality for calculating VIVI. We then move on to visualizations and focus on the functionality provided by the two functions viviHeatmap and viviNetwork for displaying VIVI, and two functions for displaying PDPs namely, pdpPairs and pdpZen. Finally we provide some concluding discussion.

2 Example: Data and Models

The well-known Boston housing data (Harrison Jr and Rubinfeld 1978) from the R package MASS (Venables and Ripley 2002) concerns prices of 506 houses and 14 predictor variables including property attributes such as number of rooms and social attributes including crime rate and pollution levels. The response is the median value of owner-occupied homes in $1000s (medv).

We first fit a random forest (using the randomForest package). In order to access all the available embedded variable importance scores, the importance argument must be TRUE when calling the randomForest function. This allows any of the provided importance metrics to be used in vivid. However, in our following examples, we use an agnostic VImp measure supplied by the vivid package, which allows us to directly compare VImp values across different fits.

library("randomForest") 
library("MASS")
set.seed(1701)  
data("Boston")

rf <- randomForest(medv ~., data = Boston)

Next we fit a gradient boosted machine (using the xgboost package). For the GBM we set the maximum number of boosting iterations, nrounds, to 100 as no default is provided in xgboost.

library("xgboost") 
gbst <- xgboost(data = as.matrix(Boston[,1:13]), 
                label =  as.matrix(Boston[,14]),
                nrounds = 100,
                verbose = 0)

In the following sections we will explain how aspects of the two fits can be compared with vivid software. We will also explain aspects of our software design with reference to these fits.

3 Calculating VIVI

The first step in using vivid is to calculate variable importance and interactions for a model fit. The vivi function calculates both of these, creating a square, symmetric matrix containing variable importance on the diagonal and variable interactions on the off-diagonal. Required inputs are a fitted ML model, a data frame on which the model was trained, and the name of the response variable for the fit. The returned matrix has importance and interaction values for all variables in the supplied data frame, excluding the response. Our visualizations functions viviHeatmap and viviNetwork are designed to show the results of a vivi calculation, but will work equally well for any square matrix with identical row and column names. (Note, the symmetry assumption is not required for viviHeatmap and viviNetwork uses interaction values from the lower-triangular part of the matrix only.)

The code snippet below shows the creation of a vivid matrix for the random forest fit. For clarity, we include all of the vivi function arguments for the random forest fit, though only the first three are required. Other inputs will be described in the section vivi function additional arguments.

library("vivid")

set.seed(1701)
viviRf <- vivi(fit = rf,
               data = Boston,
               response = "medv",
               reorder = FALSE,
               normalized = FALSE,
               importanceType = "agnostic",
               gridSize = 50,
               nmax = 500,
               class = 1,
               predictFun = NULL,
               numPerm = 4)

In the absence of any model-specific importance measure we use an agnostic permutation method described by Fisher et al. (2019) to obtain the variable importance scores. In this method a model error score (root mean square error) is calculated, then each feature is randomly permuted and the model error is re-calculated. The difference in performance is considered to be the variable importance score for that feature. By default the permutation is set to be replicated four times to account for variability. However, we provide an option to select a desired number of permutations (via the numPerm argument)).

The vivi function calculates both the importance and interactions using S3 methods. By default, the agnostic importance and interaction scores in vivid are computed using the generalized predict function from the condvis2 package (Hurley et al. 2022). Consequently, vivid can be used out-of-the-box with any model type that works with condvis2 predict (see CVpredict from condvis2 for a full list of compatible model types). To allow vivid to run with other model fits, a custom predict function must be passed to the predictFun argument (as discussed below).

The S3 method used to obtain the importance is called vividImportance. vivid relies on the flashlight package to calculate agnostic importance via flashlight::light_importance which currently works for numeric and numeric binary responses only. For model-specific variable importance, we provide individual methods to access importance scores for some of the most popular model fitting R packages, namely; ranger (Wright and Ziegler 2017), randomForest, mlr (Bischl et al. 2016), mlr3 (Lang et al. 2019), and parsnip (Kuhn and Vaughan 2022) (however, more could be added via additional methods). To access any available model-specific variable importance from the aforementioned packages, the importanceType argument must be set to equal the selected importance metric. For example, to select the percent increase in mean square error importance score from the randomForest package, the importanceType argument must be set to equal this measure as it is called within the randomForest package, that is; importanceType = "%IncMSE". If the importanceType is not set or is set to equal agnostic, then the agnostic importance is calculated. It should be noted that when comparing different model fits, using model-specific variable importance will result in importance measures that are not directly comparable (however, comparing model specific scores could be useful when comparing the same fitting procedure evaluated using different parameters). In the case that a practitioner may wish to use our visualizations to compare different model fits, we recommend using the agnostic permutation approach supplied by vivid to make a direct comparison of the importance measures.

For variable interactions, we use the model-agnostic Friedman’s $H$-statistic to identify any pairwise interactions. As discussed in Inglis et al. (2022b), we recommend the unnormalized version of the $H$-statistic which prevents detection of spurious interactions which can occur when the bivariate partial dependence function (used in the construction of the $H$-statistic) is flat. In the case of a binary response classification model, we follow Hastie et al. (2009) and compute the $H$-statistic and partial dependence on the logit scale.

The vivi function calculates interactions using an S3 method called vividInteraction, which again relies on the flashlight package to calculate Friedman’s $H$-statistic via flashlight::light_interaction. Friedman’s $H$-statistic is the only interaction measure currently available in vivid, though the method of Greenwell et al. (2018) could also be used for this purpose. Embedded interaction measures could easily be incorporated via S3 methods in future.

flashlight simplifies the calculation of VIVI values as it allows a custom predict function to be supplied for the calculation of agnostic importance and the $H$-statistic; this flexibility means importance and the $H$-statistic can be calculated for any ML model. Note that as flashlight importance and interaction functions act in a model agnostic way, they will give a VIVI of zero for variables in the dataset (except the response) that are not used by the supplied ML fit. We supply an internal custom predict function called CVpredictfun to both flashlight::light_importance and flashlight::light_interaction. CVpredictfun is a wrapper around CVpredict from the condvis2 package, which adds an option for the classification to select (via the class argument to vivi) the class to be used for prediction and calculates predictions on the logit scale by default. CVpredict accepts a broad range of fit classes thus streamlining the process of calculating VIVI.

In situations where the fit class is not handled by CVpredict (as is the case for the GBM model created from xgboost), supplying a custom predict function to the vivi function by way of the predictFun argument allows the agnostic VIVI values to be calculated. In the code snippet below, we build the vivid matrix for the GBM fit using a custom predict function, which must be of the form given in the code snippet. For brevity we omit some of the optional vivi function arguments. By default, the agnostic variable importance is used to allow for a direct comparison of the importance measures for both model fits.

# predict function for GBM
pFun <- function(fit, data, ...) predict(fit, as.matrix(data[,1:13]))

set.seed(1701) 
viviGBst <- vivi(fit = gbst,
                 data = Boston,
                 response = "medv",
                 reorder = FALSE,
                 normalized = FALSE,
                 predictFun = pFun)

3.1 `vivi` function additional arguments

The vivi function has 11 arguments. Some of these have been discussed above, including fit, data, response, importanceType, and predictFun. Here we provide a summary of the remaining arguments. First, the normalized argument determines if Friedman’s $H$-statistic should be normalized or not (see Inglis et al. (2022b), for the pros and cons of each version). The arguments gridSize and nmax are used to set the size of the grid for evaluating the predictions and maximum number of data rows to consider, respectively, in the calculation of the $H$-statistic. Lowering the grid size can provide a significant speed boost, though at the expense of predictive accuracy. Additionally, sampling the data via nmax offers a speed boost. The default values for gridSize and nmax are 50 and 500, respectively.

3.2 Speed tests

A drawback of using Friedman’s $H$-statistic as a measure of interaction is that it is a computationally expensive calculation, and may be especially time-consuming for models where prediction is slow. Figure 1 shows the build time (rounded to the nearest second) averaged over five runs for the creation of a vivid matrix with default parameters for different ML algorithms using the Boston Housing data. As the Boston housing data has 13 predictor variables, Friedman’s $H$-statistic is computed for 91 predictor pairs. The ML algorithms are: GBM, random forest, support vector machine (SVM), neural network (NN), and k-nearest neighbors (KNN). The SVM, NN, and KNN were built using the e1071 (Meyer et al. 2021), nnet (Venables and Ripley 2002), and kknn (Schliep and Hechenbichler 2016) packages, with the KNN being built through the mlr3 (Lang et al. 2019) framework. Each of the models were built using their default settings and, for each model fit, the agnostic VIVI was measured. The speed tests were performed on both a 2017 Mac-book Pro 2.3 GHz Dual-Core Intel Core i5 with 8GB of RAM and a 2021 32GB Mac-book M1 Pro. Here we are essentially comparing predict times for the various fits. The NN fit created using the nnet package was the fastest, followed by GBM. Both random forests are the slowest, and surprisingly, the older Mac beats its higher spec cousin for the randomForest fit.

Mean time over five runs, on two MacBooks, for the creation of a vivid matrix for different models. Times are highly dependent on the model fit, with NN the fastest and random forests the slowest.

Figure 1: Mean time over five runs, on two MacBooks, for the creation of a vivid matrix for different models. Times are highly dependent on the model fit, with NN the fastest and random forests the slowest.

3.3 Alternative construction of a `vivid` matrix

A vivid matrix may also be obtained from variable importance and interaction values calculated elsewhere. The package vip offers these, and evaluates interactions using a method called the feature importance ranking measure (FIRM, see Greenwell et al. (2018), for more details). The vip2vivid function we provide in vivid takes VIVI values created in vip and turns them into a vivid matrix, that can be subsequently used with our plotting tools. For example, in the code below, model-specific VImp and FIRM VInt scores are calculated for the random forest fit, and subsequently arranged into a vivid matrix with the VImps on the diagonal and VInts on the off-diagonal.

library("vip")
# get model specific VImps using vip package
vipVImp <- vi(rf, method = 'model')
# get VInts using vip package
vipVInt <- vint(rf, feature_names = names(Boston[-14]))

# turn into vivi-matrix
vipViviMat <- vip2vivid(importance = vipVImp, interaction = vipVInt)

4 Heatmap of Variable Importance and Variable Interactions

The viviHeatmap function constructs a heatmap displaying both importance and interactions, with importance on the diagonal and interactions on the off-diagonals. A vivid matrix is the only required input, which does not necessarily need to be symmetric (for example, the interaction measures from the randomForestExplainer package are asymmetric and could be visualized using our heatmap. Color palettes for the importance and interactions are optionally provided via impPal and intPal arguments. For the default color palette we choose single-hue, color-blind friendly sequential color palettes from Zeileis et al. (2020), where low and high VIVI values are represented by low and high luminance color values respectively, aiding in highlighting values of interest.

The ordering of the heatmap is taken from the ordering of the input matrix. As reorder was set to FALSE when building both the random forest and GBM fit vivid matrix, the ordering of the heatmaps matches the variable order in the dataset. This is useful for directly comparing multiple heatmaps, however it does not necessarily lend itself for easy identification of the largest VIVI values. If we were to seriate both vivi matrices separately, we would end up with different optimal orderings for each matrix. An alternative is to create a common ordering by averaging over the two vivid matrix objects and applying the vividReorder function to the result. (This function uses a seriation algorithm based on the techniques of Earle and Hurley (2015) designed to place high interaction variables adjacently and to pull high VIVI variables towards the top-left; see Inglis et al. (2022b) for details.) Both VIVI matrices are then re-ordered using the newly obtained variable order.

# average over matrices and seriate to get common ordering
viviAvg <- (viviRf + viviGBst) / 2
viviAvgReorder <- vividReorder(viviAvg)

# reorder vivi-matrices 
ord <- colnames(viviAvgReorder)
viviRf <- viviRf[ord,ord]
viviGBst <- viviGBst[ord,ord]

Arguments impLims and intLims specify the range of importance and interaction values to be mapped to colors. Default values are calculated from the maximum and minimum VIVI values in the vivid matrix. Importance and interaction values falling outside the supplied limits are squished to the closest limit. It can be useful to specify these limits in the situation where there is an extremely large VIVI value that dominates the display, or where we wish two or more plots to have the same limits for comparison purposes, as in the example below. The angle argument is used to rotate the x-axis labels.

Figure 2 shows our improved ordering so that variables with high VIVI values are pushed to the top left of the plots. Filtering can also be applied to the input matrix to display a subset of variables. When compared to the GBM fit in (b), the random forest fit in (a) has weaker interactions and lower importance scores. Both plots identify $lstat$ as being the most important. Both fits also show that $lstat$ (the percentage of lower status of the population) interacts with several other variables, though the interactions are much stronger for the GBM. Notably, the strongest interaction in both fits are different. These are $lstat:crim$ (where $crim$ is the per capita crime rate by town) for the random forest fit and $lstat:nox$ (where $nox$ is parts per 10 million nitrogen oxides concentration) for the GBM fit.

viviHeatmap(viviRf, angle = 45, intLims = c(0,1), impLims = c(0,8))
viviHeatmap(viviGBst, angle = 45, intLims = c(0,1), impLims = c(0,8))

Agnostic variable importance and variable interaction scores for a random forest fit in (a) and GBM fit in (b) on the Boston housing data displayed as a heatmap. The random forest fit has weaker interactions and lower importance scores than the GBM fit. Both fits identify $lstat$ as the most important followed by $rm$. In both fits, $lstat$ has numerous interactions with other variables, notably $crim$ in the random forest fit and $nox$ in the GBM fit.

Figure 2: Agnostic variable importance and variable interaction scores for a random forest fit in (a) and GBM fit in (b) on the Boston housing data displayed as a heatmap. The random forest fit has weaker interactions and lower importance scores than the GBM fit. Both fits identify $lstat$ as the most important followed by $rm$. In both fits, $lstat$ has numerous interactions with other variables, notably $crim$ in the random forest fit and $nox$ in the GBM fit.

5 Network of Variable Importance and Variable Interactions

The viviNetwork function constructs a network graph displaying both importance and interactions. Similarly to viviHeatmap , this function takes a vivid matrix as the only required input and provides a visual representation of the magnitude of the importance and interaction values through the size of the nodes and edges in the graph, in addition to color. In the plot each variable is represented as a node, with its importance being represented through size and color such that larger, darker nodes indicate a higher importance. Each pairwise interaction is represented by a connecting edge, where larger interaction values get thicker, darker edges; Figure 3 provides an example. This type of plot enables the user to quickly identify the magnitude of the importance and interactions of the variables that have the most impact on the response. The viviNetwork function optional arguments follows the same conventions as viviHeatmap: custom color palettes for importance and interactions are provided via the impPal and intPal, and the range of VIVI values to be mapped to the colors are specified via the impLims and intLims.

By default, we choose a circular layout to display the graphs, as when coupled with the seriated vivid matrix, variables with high VIVI are grouped in a clock-wise arrangement starting at the top. This arrangement allows for easy identification of variables with high VIVI. Custom layouts are possible by providing a numeric matrix with two columns and one row per node to the layout argument. Additionally, any of the layouts available in the igraph package (Csardi and Nepusz 2006) can be specified. The subject of network graph layouts has been extensively studied (for examples see Purchase (1997), Herman et al. (2000), Freeman (2000)). It has been shown that certain layouts of network graphs can significantly aid in cognitive interpretation. For example, Purchase (1997) note that reducing the number of edge crossovers is by far the most important aesthetic (even for small amounts of data), while maximizing symmetry has a lesser effect. Several of the layouts provided by the igraph package can aid interpretation (such as the Sugiyama layout algorithm (Sugiyama et al. 1981), which tries to minimize edge crossover.)

We provide options to filter the graph via the intThreshold and removeNode arguments. This helps to highlight variables with high VIVI scores, which is useful in settings with many predictors. The intThreshold argument filters edges with weight (i.e., VInt value) below a specified value and removeNode removes nodes with no connecting edges after thresholding interaction values. We can optionally cluster similar variables together with respect to their VIVI scores via the cluster argument, thereby aiding in the process of highlighting variables of interest. The cluster argument can take either a vector of cluster memberships for nodes or an appropriate igraph clustering function.

We demonstrate network plots displaying VIVI values for the GBM fit. In Figure 3, we show both a default network plot including all variables in (a) and a filtered and clustered network plot in (b). For the filtered plot we select VIVI values in the top decile. This selection allows us to focus only on the variables with the most impact on the response. The variables that remain are $lstat$, $nox$, $rm$, $crim$, $dis$ (weighted mean of distances to five Boston employment centers), $tax$ (full-value property-tax rate per $10,000), and $ptratio$ (pupil-teacher ratio by town). We then perform a hierarchical clustering treating variable interactions as similarities, with the goal of grouping together high-interaction variables. Here we manually select the number of groups we want to show via the cutree function (which cuts clustered data into a desired number of groups). Finally we rearrange the layout using igraph. Here, igraph::layout_as_star places the first variable (deemed most relevant using the VIVI seriation process above) at the center, which in Figure 3 (b) emphasizes its key role as the most important predictor which also has the strongest interactions.

# default network plot for GBM
viviNetwork(viviGBst)

# clustered and filtered network for GBM
intVals <- viviGBst
diag(intVals) <- NA 

# select VIVI values in top 10%
impTresh <- quantile(diag(viviGBst),.9)
intThresh <- quantile(intVals,.9,na.rm=TRUE)
sv <- which(diag(viviGBst) > impTresh |
              apply(intVals, 1, max, na.rm=TRUE) > intThresh)
              
h <- hclust(-as.dist(viviGBst[sv,sv]), method = "single")

viviNetwork(viviGBst[sv,sv],
            intLims = c(0,1),
            impLims = c(0,8),
            cluster = cutree(h, k = 3), # specify number of groups
            layout = igraph::layout_as_star)

Network plots showing VIVI scores obtained from a GBM fit on the Boston housing data. In (a) we display the all values in a circle. In (b) we use a hierarchical clustering to group variable with high VIVI together and rearrange the layout using an igraph function.

Figure 3: Network plots showing VIVI scores obtained from a GBM fit on the Boston housing data. In (a) we display the all values in a circle. In (b) we use a hierarchical clustering to group variable with high VIVI together and rearrange the layout using an igraph function.

In Figure 3(a), when displaying all the variables, we can clearly identify which variables have the highest VIVI values. The large darker nodes of $lstat$ and $rm$ indicate their importance and the dark, thick connecting edge between $lstat$ and $nox$ tell us that these two variables strongly interact. In (b), after applying a hierarchical clustering, we can see the strongest mutual interactions have been grouped together for the GBM fit. Namely, $lstat$, $nox$, $crim$, $rm$, and $dis$ are all grouped together. The remaining variables are individually clustered.

We provide a conversion of vivid matrix objects to a data frame via an as.data.frame method, as demonstrated below. This facilitates plotting with base R and ggplot2, for example a barplot of either VImp or VInt values. Note that while vivi returns a matrix of class vivid, the class attribute was dropped when the matrix was re-ordered.

class(viviRf)<- c("vivid", class(viviRf)) 
head(as.data.frame(viviRf), 4)

  Variable_1 Variable_2      Value Measure Row Col
1      lstat      lstat 4.80970237    Vimp   1   1
2        nox      lstat 0.06387693    Vint   2   1
3         rm      lstat 0.15226649    Vint   3   1
4       crim      lstat 0.44728494    Vint   4   1

6 Partial Dependence and Individual Conditional Expectation Curves

6.1 Univariate Partial Dependence Plot

The pdpVars function constructs a grid of univariate PDPs with ICE curves for selected variables. We use ICE curves to assist in the identification of linear or non-linear effects. The fit, data frame used to train the model, and the name of the response variable are required inputs. In the code below, we show an example of the partial dependence and ICE curves for the first five features from the GBM vivid matrix, with output shown in Figure 4. We use the custom GBM predict function given previously.

top5 <- colnames(viviGBst)[1:5]
pdpVars(data = Boston,
        fit = gbst,
        response = "medv",
        vars = top5,
        predictFun = pFun)

Partial dependence plots (black line) with individual conditional expectation curves (colored lines) of a GBM fit on the Boston housing data. The changing partial dependence and ICE curves of $lstat$ and $rm$ indicate that these variables have some impact on the response.

Figure 4: Partial dependence plots (black line) with individual conditional expectation curves (colored lines) of a GBM fit on the Boston housing data. The changing partial dependence and ICE curves of $lstat$ and $rm$ indicate that these variables have some impact on the response.

All of our PDP variants handle categorical responses and predictors. The color palette is customized via the pal argument. In all of our PDPs, this defaults to a diverging palette which accentuates fitted values that differ from the average. Dark red and dark blue are used to indicate high and low values of $\hat{y}$ respectively. The middle values are displayed in yellow. The nIce argument specifies the number of ICE curves to be drawn. This is either a single number specifying the number of observations to be sampled for the ICE curves, or a vector of row indices, an option that is useful for example to display ICE curves from particular classes. The default value for nIce is 30, which allows individual curves to be seen.

The ordering of the PDPs is taken from the ordering of variables in the data set, or may be specified via the vars argument. In Figure 4, the ordering is taken directly from our seriated vivid matrix, thereby showing the top five most influential variables. As with the construction of the vivid matrix, the gridSize and nmax arguments determine the number of predictions.

In Figure 4 we can see from the changing PDP and ICE curves that $lstat$ and $rm$ have the clearest impact on the response, with the predicted median house price being higher for low values of $lstat$ and high values of $rm$. Additionally, the predicted median house price appears to be higher for low values of $dis$ before leveling off at around 2.5. The remaining variables have generally flat partial dependence and ICE curves.

6.2 Generalized Pairs Partial Dependence Plot

The pdpPairs function creates a generalized pairs partial dependence plot (GPDP). In our GPDP, we use a matrix layout and plot the univariate partial dependence (with ICE curves) on the diagonal, bivariate partial dependence on the upper diagonal and a scatterplot of raw variable values on the lower diagonal, where all colors are assigned to points and ICE curves by the predicted $\hat{y}$ value. In the case of categorical predictors, the partial dependence for each factor level is shown in the upper-diagonal (for an example of this, see Inglis et al. (2022b)). As with the univariate PDP, the fit, data frame used to train the model, and the name of the response variable are required inputs.

set.seed(1701)
pdpPairs(data = Boston,
         fit = gbst,
         response = "medv",
         gridSize = 20,
         nIce = 50,
         vars = top5,
         convexHull = TRUE,
         fitlims = "pdp",
         predictFun = pFun)

$Filtered generalized pairs partial dependence plot for a GBM fit on the Boston housing data. From both the univariate and bivariate PDPs, we can see that $lstat$ and $rm$ have an impact on the response. As $lstat$ decreases and $rm$ increases, predicted median house price value goes up. The bivariate PDP of $lstat:nox$ shows that as $nox$ increases, the predicted value decreases.$

Figure 5: Filtered generalized pairs partial dependence plot for a GBM fit on the Boston housing data. From both the univariate and bivariate PDPs, we can see that $lstat$ and $rm$ have an impact on the response. As $lstat$ decreases and $rm$ increases, predicted median house price value goes up. The bivariate PDP of $lstat:nox$ shows that as $nox$ increases, the predicted value decreases.

As with the univariate PDP, the ordering can be controlled via the var argument. By default, the ordering is taken from the order of the data. In the code above, we display only the interesting variables seen in previous plots by selecting the first five variables from our seriated vivid matrix. We also chose to display 50 ICE curves. As with pdpVars, additional arguments specify the color palette and number of ice curves, while gridSize and nmax determine the number of predictions.

For our GPDP, we follow the general design choices in vivid and specify the range of predicted values to be mapped to the colors via the fitlims argument. We set the default fit range for the color map for the GPDP to the range of the collection of PDP surfaces with fitlims = "pdp". The setting of this argument at its default value allows for maximum resolution of the bivariate PDPs. Since predictions for specific observations and ICE curves would likely exceed these bounds, the closest value within the color map’s bounds is used to allocate colors. Alternatively fitlims = 'all" specifies that limits are calculated as the full range of predictions shown.

In the upper diagonals we exclude extrapolated areas from the bivariate PDPs to prevent interpretation of the PDPs in areas where there is no data. The removal of extrapolated areas can be prevented by specifying convexHull = FALSE.

In Figure 5, in addition to the univariate PDPs, we capture the effects of the variables on the response via the bivariate PDP on the upper-diagonal and the distribution of the data in the lower-diagonal. The scatterplots are useful for determining if variables are highly correlated, as highly correlated variables may spuriously affect the partial dependence and give erroneous results (Apley and Zhu 2020). Of note are the variables $lstat$ and $rm$. We can clearly see that when the number of rooms ($rm$) is high and the percentage of lower status of the population ($lstat$) is low, the predicted $\hat{y}$ median house price value is high. This is exemplified in the changing bivariate PDP.

7 Zen Partial Dependence Plots

The pdpZen function creates partial dependence plots utilizing a space-saving method based on graph Eulerians (Hierholzer and Wiener 1873). An Eulerian path, also known as an Eulerian trail, is a route that traverses each edge of a graph exactly once. When this path forms a closed loop, the traversal is referred to as an Eulerian tour. We call this display zen-partial dependence plots (ZPDP). The display is based on the zigzag expanded navigation plots, known as zenplots , available in the zenplots package (Hofert and Oldford 2020). Zenplots were created to display paired graphs of high-dimensional data focusing on the most important 2D displays. In our adaptation we show bivariate PDPs that focus on the variables with the largest interaction values in a compact zigzag layout, which is helpful when predictor space is high-dimensional.

The code below illustrates pdpZen, here displaying the first five variables from GBM’s vivid matrix. Later we show an example focusing on high-interacting pairs of variables. We use the same convention as our previous PDPs with regard to color palette and limits, grid size, and the number of rows considered for evaluation. The ZPDP also has a variable rug plot on each axis to avoid interpretation problems that may occur in the presence of skewness.

pdpZen(data = Boston,
       fit = gbst,
       response = "medv",
       convexHull = TRUE,
       zpath = top5,
       predictFun = pFun)

Zen partial dependence plot for the GBM fit on the Boston data. Here we display first five variables from the GBM's `vivid` matrix. Only plots for consecutive variables are shown.

Figure 6: Zen partial dependence plot for the GBM fit on the Boston data. Here we display first five variables from the GBM’s vivid matrix. Only plots for consecutive variables are shown.

The argument zpath specifies the variables to be plotted, defaulting to all dataset variables aside from the response. In the code above, zpath is the vector $lstat$, $nox$, $rm$, $crim$ and $dis$. The resulting plot shown in Figure 6 presents the bivariate PDP for every consecutive pair of variables in a zigzag layout.

7.1 `Zen-paths`

ZPDP are most useful when the bivariate PDPs plotted are selected to be an interesting subset of all pairwise plots. To obtain this subset, we consider a network graph displaying VIVI values, such as that in Figure 3 (a). We then filter the edges below a selected interaction value, leaving only highly interacting variable pairs, as in Figure 3(b). Our goal is to then build a ZPDP consisting of the bivariate plots represented by each edge of the thresholded graph. The zPath function creates a sequence or sequences of variable paths for use in pdpZen.

The zPath function takes four arguments. These are: viv - a matrix of interaction values, cutoff - exclude interaction values below this threshold, method - a string indicating which method to use to create the path, and connect - a logical value indicating if separate Eulerians should be connected

Two methods are provided, either "greedy.weighted" or "strictly.weighted". The first option uses the greedy Eulerian path algorithm (Hurley and Oldford 2011, 2022) for connected graphs. This visits each edge at least once, beginning at the edge with the highest weight and traversing through the remaining edges, giving priority to the highest-weighted edge. Some edges may be visited more than once or additional edges may be visited if the number of nodes in the graph is not even. The second method "strictly.weighted" (provided by zenplot) visits edges strictly in decreasing order by weight (here the interaction values). If connect is TRUE the sequences obtained by the strictly weighted method are concatenated to form a single path.

In the code below, we provide two examples of creating zen-paths, from the top 10% of interaction scores in viviGBst.

intThresh <- quantile(intVals, .9, na.rm=TRUE)
# set zpaths with different parameters
zpGw  <- zPath(viv = viviGBst, cutoff = intThresh, method = "greedy.weighted")
zpGw

 [1] "nox"     "lstat"   "dis"     "ptratio" "lstat"   "rm"     
 [7] "crim"    "lstat"   "tax"     "rm"      "nox"

zpSw  <- zPath(viv = viviGBst, cutoff = intThresh, connect = FALSE, method = "strictly.weighted")
zpSw

[[1]]
[1] "nox"   "lstat" "dis"  

[[2]]
[1] "lstat" "rm"    "nox"  

[[3]]
[1] "lstat" "crim"  "rm"   

[[4]]
[1] "ptratio" "lstat"   "tax"

Our first created zen-path object, zpGw, uses the greedy.weighted method and visits each edge exactly once. The second zen-path, zpSw, uses the strictly.weighted method with connect = FALSE. zpSw consists of four unconnected paths. The zenplots for two of these paths are constructed below.

pdpZen(data = Boston,
       fit = gbst,
       response = "medv",
       zpath = zpGw,
       convexHull = TRUE,
       predictFun = pFun) 

pdpZen(data = Boston,
       fit = gbst,
       response = "medv",
       zpath = zpSw,
       convexHull = TRUE,
       predictFun = pFun)

ZPDP for a GBM fit on the Boston data. In (a) the zpath is defined by the `greedy.weighted` sorting method. In (b), the sorting method is defined by the `strictly.weighted` method and is unconnected. For low values of $lstat$ and and high values of $rm$, predicted median house price value increases.

Figure 7: ZPDP for a GBM fit on the Boston data. In (a) the zpath is defined by the greedy.weighted sorting method. In (b), the sorting method is defined by the strictly.weighted method and is unconnected. For low values of $lstat$ and and high values of $rm$, predicted median house price value increases.

Note that there are 7 different variables involved in high interactions, which could be displayed in a $7 \times 7$ GPDP, showing a total of 21 bivariate PDPs. But only 8 of these have VInt values above the 90% quantile, and Figure 7(b) using the strictly.weighted path shows just these bivariate PDPs’ compact layout. Using the greedy.weighted sorting method in (a) produces a smaller, neater plot but at the expense of including some plots that are not particularly interesting (for example the pair $dis:ptratio$). However, it should be noted that any of the arguments from the zenplot function from the zenplots package can be used with the pdpZen function. These include multiple options for different or custom layouts.

8 Summary

We have presented a detailed exposition of our R package vivid which contains a suite of integrated functions implementing algorithms and novel visualizations for exploring variable importance and variable interactions in machine learning models. Our techniques are intuitive, adaptable, easy to customize and facilitate model comparison. When building the vivid matrix to use in our heatmap and network visualizations, VIVI metrics that are model specific or model-agnostic may be employed. For measuring interactions we currently only provide the option to use the agnostic Friedman’s $H$-statistic. However, as outlined in the Calculating VIVI section, the inclusion of different VIVI measures is easily possible.

Our vivid package is a useful addition to the other packages in the area of model visualization, such as those discussed in the Introduction section. Our heatmap and network plots efficiently determine which variables have the greatest impact on the response. When coupled with the seriation, filtering, and clustering techniques, these visualizations enhance the interpretation of ML predictions. Our GPDP and ZPDP can be used to provide a thorough examination of the behavior of a fitted ML model by examining the individual variable effects and their pairwise interactions. These plots combine the bivariate PDP, ICE curves, and scatterplots of the raw variable values. They further allow focusing on subsets of variables with high VInt, and so allow us to efficiently explore a fitted ML model by focusing attention to only the most crucial aspects.

It has been noted that the presence of correlated variables can lead to biased VIVI measures. For example, Friedman and Popescu (2008) note that when subsets of variables are highly correlated, it becomes difficult to distinguish between low and higher order interactions among them in a straightforward manner when using the $H$-statistic. Similarly, Fisher et al. (2019) state that for permutation variable importance, the presence of highly correlated variables can affect the measured variable importance and propose using a method of conditional importance, such as that implemented for conditional random forests in the party package. Our GPDP can be used to identify any evident correlations by observing the plot of the data and the PDP with convex hull. However, careful consideration should be employed when interpreting the VIVI measures from vivid and we recommend evaluating any potential correlation between variables in conjunction with our proposed visualizations. Several R packages are available for assessing and visualising correlations, such as corrplot (Wei and Simko 2021) and corrgrapher (Morgen and Biecek 2020).

For future work, the inclusion of other model summaries could be incorporated into vivid, such as the interaction statistics described in Greenwell et al. (2018) or the use of accumulated local effects (ALE) of Apley and Zhu (2020). This latter method was created to address bias problems with partial dependency functions and could be used in place of the bivariate PDPs seen in both the GPDP and ZPDP. However the calculation of an agnostic, easily interpretable variable interaction measure that accounts for correlated variables remains an ongoing research goal. Allowing a user to view the variation across replicates when performing permutation importance could also be incorporated. This could be added into the heatmap graphic by way of a value suppressing uncertainty palette (Correll et al. 2018), where the uncertainty in included in the visualization (for example, see Inglis et al. (2022a)). Additionally, providing alternative metrics such as AUC or accuracy for computation of permutation importance in classification, as well as computations and visualizations of higher order interactions, interactive graphics and incorporating multivariate response variables are interesting areas for future research.

8.1 Supplementary materials

Supplementary materials are available in addition to this article. It can be downloaded at RJ-2023-054.zip

8.2 CRAN packages used

vivid, vip, ggplot2, iml, flashlight, DALEX, lime, pdp, ICEbox, randomForestExplainer, randomForest, EIX, varImp, party, bartMachine, ingredients, MASS, xgboost, condvis2, ranger, mlr, mlr3, parsnip, e1071, nnet, kknn, igraph, zenplots, zenplot, corrplot, corrgrapher

8.3 CRAN Task Views implied by cited packages

Bayesian, Cluster, Distributions, Econometrics, Environmetrics, GraphicalModels, HighPerformanceComputing, MachineLearning, MissingData, MixedModels, ModelDeployment, NumericalMathematics, Optimization, Phylogenetics, Psychometrics, Robust, Spatial, Survival, TeachingStatistics

N. Antunes, L. Balby, F. Figueiredo, N. Lourenco, W. Meira and W. Santos. Fairness and transparency of machine learning for trustworthy cloud services. In 2018 48th annual IEEE/IFIP international conference on dependable systems and networks workshops (DSN-w), pages. 188–193 2018. IEEE.

D. W. Apley and J. Zhu. Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(4): 1059–1086, 2020.

P. Biecek. DALEX: Explainers for complex predictive models in R. Journal of Machine Learning Research, 19(84): 1–5, 2018. URL https://jmlr.org/papers/v19/18-416.html.

P. Biecek and H. Baniecki. Ingredients: Effects and importances of model ingredients. 2023. URL https://CRAN.R-project.org/package=ingredients. R package version 2.3.0.

P. Biecek and T. Burzykowski. Explanatory model analysis: Explore, explain and examine predictive models. Chapman; Hall/CRC, 2021.

B. Bischl, M. Lang, L. Kotthoff, J. Schiffner, J. Richter, E. Studerus, G. Casalicchio and Z. M. Jones. mlr: Machine learning in R. Journal of Machine Learning Research, 17(170): 1–5, 2016. URL https://jmlr.org/papers/v17/15-066.html.

L. Breiman. Random forests. Machine Learning, 45(1): 5–32, 2001. DOI 10.1023/A:1010933404324.

X. Chen and H. Ishwaran. Random forests for genomic data analysis. Genomics, 99(6): 323–329, 2012.

H. A. Chipman, E. I. George and R. E. McCulloch. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1): 266–298, 2010.

M. Correll, D. Moritz and J. Heer. Value-Suppressing Uncertainty Palettes. In Proceedings of the 2018 CHI conference on human factors in computing systems, pages. 1–11 2018.

G. Csardi and T. Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems: 1695, 2006. URL https://igraph.org.

D. Earle and C. Hurley. Advances in dendrogram seriation for application to visualization. Journal of Computational and Graphical Statistics, 24: 2015. DOI 10.1080/10618600.2013.874295.

H. Felzmann, E. F. Villaronga, C. Lutz and A. Tamò-Larrieux. Transparency you can trust: Transparency requirements for artificial intelligence between legal norms and contextual concerns. Big Data & Society, 6(1): 2053951719860542, 2019.

A. Fisher, C. Rudin and F. Dominici. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res., 20(177): 1–81, 2019.

L. C. Freeman. Visualizing social networks. Journal of social structure, 1(1): 4, 2000.

J. H. Friedman. Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29: 2000. DOI 10.1214/aos/1013203451.

J. H. Friedman and B. E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics., 916–954, 2008.

A. Goldstein, A. Kapelner, J. Bleich and E. Pitkin. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1): 44–65, 2015. DOI 10.1080/10618600.2014.907095.

S. K. Grange, D. C. Carslaw, A. C. Lewis, E. Boleti and C. Hueglin. Random forest meteorological normalisation models for swiss PM 10 trend analysis. Atmospheric Chemistry and Physics, 18(9): 6223–6239, 2018.

B. M. Greenwell. pdp: An R package for constructing partial dependence plots. The R Journal, 9(1): 421–436, 2017. URL https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

B. M. Greenwell and B. C. Boehmke. Variable importance plots—an introduction to the vip package. The R Journal, 12(1): 343–366, 2020. URL https://doi.org/10.32614/RJ-2020-013.

B. M. Greenwell, B. C. Boehmke and A. J. McCarthy. A simple and effective model-based variable importance measure. arXiv preprint arXiv:1805.04755, 2018.

D. Harrison Jr and D. L. Rubinfeld. Hedonic housing prices and the demand for clean air. Journal of environmental economics and management, 5(1): 81–102, 1978.

T. Hastie, R. Tibshirani and J. Friedman. The elements of statistical learning: Data mining, inference, and prediction. 2nd ed Springer Series in Statistics. Springer-Verlag., 2009.

I. Herman, G. Melançon and M. S. Marshall. Graph visualization and navigation in information visualization: A survey. IEEE Transactions on visualization and computer graphics, 6(1): 24–43, 2000.

C. Hierholzer and C. Wiener. Über die möglichkeit, einen linienzug ohne wiederholung und ohne unterbrechung zu umfahren. Mathematische Annalen, 6(1): 30–32, 1873.

M. Hofert and W. Oldford. Zigzag expanded navigation plots in R: The R package zenplots. Journal of Statistical Software, 95(4): 1–44, 2020. DOI 10.18637/jss.v095.i04.

C. B. Hurley and R. W. Oldford. Eulerian tour algorithms for data visualization and the PairViz package. Computational Statistics, 26: 613–633, 2011. DOI 10.1007/s00180-011-0229-5.

C. B. Hurley and R. W. Oldford. PairViz: Visualization using graph traversal. 2022. URL https://CRAN.R-project.org/package=PairViz.

C. Hurley, M. OConnell and K. Domijan. condvis2: Interactive conditional visualization for supervised and unsupervised models in shiny. 2022. https://cbhurley.github.io/condvis2, https://github.com/cbhurley/condvis2.

E. Hvitfeldt, T. L. Pedersen and M. Benesty. Lime: Local interpretable model-agnostic explanations. 2022. URL https://CRAN.R-project.org/package=lime. R package version 0.5.3.

A. Inglis, A. Parnell and C. Hurley. Visualizations for bayesian additive regression trees. arXiv preprint arXiv:2208.08966, 2022a.

A. Inglis, A. Parnell and C. B. Hurley. Visualizing variable importance and variable interaction effects in machine learning models. Journal of Computational and Graphical Statistics, 31(3): 766–778, 2022b. DOI 10.1080/10618600.2021.2007935.

H. Ishwaran, U. B. Kogalur, E. Z. Gorodeski, A. J. Minn and M. S. Lauer. High-dimensional variable selection for survival data. Journal of the American Statistical Association, 105(489): 205–217, 2010.

M. Kuhn and D. Vaughan. Parsnip: A common API to modeling and analysis functions. 2022. URL https://CRAN.R-project.org/package=parsnip. R package version 1.0.3.

M. Lang, M. Binder, J. Richter, P. Schratz, F. Pfisterer, S. Coors, Q. Au, G. Casalicchio, L. Kotthoff and B. Bischl. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 2019. URL https://joss.theoj.org/papers/10.21105/joss.01903.

S. Maksymiuk, E. Karbowiak and P. Biecek. EIX: Explain interactions in ’XGBoost’. 2021. URL https://CRAN.R-project.org/package=EIX. R package version 1.2.0.

M. Mayer. flashlight: Shed light on black box machine learning models. 2023. URL https://CRAN.R-project.org/package=flashlight. R package version 0.9.0.

D. Meyer, E. Dimitriadou, K. Hornik, A. Weingessel and F. Leisch. e1071: Misc functions of the department of statistics, probability theory group (formerly: E1071), TU wien. 2021. URL https://CRAN.R-project.org/package=e1071. R package version 1.7-8.

C. Molnar. Interpretable machine learning: A guide for making black box models explainable. 2nd ed 2022. URL https://christophm.github.io/interpretable-ml-book.

C. Molnar, B. Bischl and G. Casalicchio. iml: An R package for interpretable machine learning. JOSS, 3(26): 786, 2018. URL https://joss.theoj.org/papers/10.21105/joss.00786.

P. Morgen and P. Biecek. Corrgrapher: Explore correlations between variables in a machine learning model. 2020. URL https://CRAN.R-project.org/package=corrgrapher. R package version 1.0.4.

K. Murray and M. M. Conner. Methods to quantify variable importance: Implications for the analysis of noisy ecological data. Ecology, 90(2): 348–355, 2009.

A. Paluszynska, P. Biecek and Y. Jiang. randomForestExplainer: Explaining and visualizing random forests in terms of variable importance. 2020. URL https://CRAN.R-project.org/package=randomForestExplainer. R package version 0.10.1.

P. Probst. varImp: RF Variable Importance for Arbitrary Measures. 2020. URL https://CRAN.R-project.org/package=varImp. R package version 0.4.

H. Purchase. Which aesthetic has the greatest effect on human understanding? In Graph drawing, pages. 248–261 1997.

M. T. Ribeiro, S. Singh and C. Guestrin. " why should i trust you?" explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages. 1135–1144 2016.

K. Schliep and K. Hechenbichler. kknn: Weighted k-nearest neighbors. 2016. URL https://CRAN.R-project.org/package=kknn. R package version 1.3.1.

C. Strobl, A.-L. Boulesteix, T. Kneib, T. Augustin and A. Zeileis. Conditional variable importance for random forests. BMC Bioinformatics, 9(307): 2008. DOI 10.1186/1471-2105-9-307.

K. Sugiyama, S. Tagawa and M. Toda. Methods for visual understanding of hierarchical system structures. IEEE Transactions on Systems, Man, and Cybernetics, 11(2): 109–125, 1981.

W. N. Venables and B. D. Ripley. Modern applied statistics with s. Fourth New York: Springer, 2002. URL https://www.stats.ox.ac.uk/pub/MASS4/. ISBN 0-387-95457-0.

T. Wei and V. Simko. R package ’corrplot’: Visualization of a correlation matrix. 2021. URL https://github.com/taiyun/corrplot. (Version 0.92).

H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.

M. N. Wright and A. Ziegler. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1): 1–17, 2017. DOI 10.18637/jss.v077.i01.

A. Zeileis, J. C. Fisher, K. Hornik, R. Ihaka, C. D. McWhite, P. Murrell, R. Stauffer and C. O. Wilke. Colorspace: A toolbox for manipulating and assessing colors and palettes. Journal of Statistical Software, Articles, 96(1): 1–49, 2020. DOI 10.18637/jss.v096.i01.

vivid: An R package for Variable Importance and Variable Interactions Displays for Machine Learning Models

1 Introduction

2 Example: Data and Models

3 Calculating VIVI

3.1 `vivi` function additional arguments

3.2 Speed tests

3.3 Alternative construction of a `vivid` matrix

4 Heatmap of Variable Importance and Variable Interactions

5 Network of Variable Importance and Variable Interactions

6 Partial Dependence and Individual Conditional Expectation Curves

6.1 Univariate Partial Dependence Plot

6.2 Generalized Pairs Partial Dependence Plot

7 Zen Partial Dependence Plots

7.1 `Zen-paths`

8 Summary

8.1 Supplementary materials

8.2 CRAN packages used

8.3 CRAN Task Views implied by cited packages

References

Reuse

Citation

vivid: An R package for Variable Importance and Variable Interactions Displays for Machine Learning Models

1 Introduction

2 Example: Data and Models

3 Calculating VIVI

3.1 vivi function additional arguments

3.2 Speed tests

3.3 Alternative construction of a vivid matrix

4 Heatmap of Variable Importance and Variable Interactions

5 Network of Variable Importance and Variable Interactions

6 Partial Dependence and Individual Conditional Expectation Curves

6.1 Univariate Partial Dependence Plot

6.2 Generalized Pairs Partial Dependence Plot

7 Zen Partial Dependence Plots

7.1 Zen-paths

8 Summary

8.1 Supplementary materials

8.2 CRAN packages used

8.3 CRAN Task Views implied by cited packages

References

Reuse

Citation

3.1 `vivi` function additional arguments

3.3 Alternative construction of a `vivid` matrix

7.1 `Zen-paths`