The tourr package in R
has several algorithms and displays for showing multivariate data as a
sequence of low-dimensional projections. It can display the tour as a movie but
has no capacity for interaction, such as stop/go, change tour type,
drop/add variables. The tourrGui package
provides these sorts of controls, but the interface is programmed with
the dated RGtk2 package.
This work explores using custom messages to pass data from R to D3 for
viewing, using the Shiny framework. This is an approach that can be
generally used for creating all sorts of interactive graphics.
Did you know you can run any javascript you like in a Shiny application and you can pass whatever you want including JSON back and forth? This massively widens the scope of what you can do with Shiny, and generating a tour of multivariate data with this approach is a really good example of what is possible.
The tour algorithm (Asimov 1985) is a way of systematically generating and displaying projections of high-dimensional spaces in order for the viewer to examine the multivariate distribution of data. It can do this either randomly, or by picking projections judged interesting according to some criterion or index function. The tourr package (Wickham et al. 2011) provides the computing and display in R to make several types of tours: grand, guided, little and local. The projection dimension can be chosen between one and the number of variables in the data. The display, though, has no capacity for interaction. The viewer can watch the tour like a movie, but not pause it and restart, or change tour type, or number of variables.
These interactive controls were provided with the tourrGui package (Huang et al. 2012), which was programmed with the RGtk2 package (Lawrence and Temple Lang 2010). This is not the toolkit of choice today, and has been superseded primarily by web-capable tools, like Shiny (Chang et al. 2017). Displaying dynamic graphics with these tools, though, is not straightforward. This paper explains how to use D3 (Bostock et al. 2011) as the display engine in a Shiny graphical user interface (GUI), using custom message passing between server and client.
The tourr package (Wickham et al. 2011) is an R implementation of the tour algorithms discussed in Cook et al. (2007). It includes methods for geodesic interpolation and basis generation, as well as an implementation of the simulated annealing algorithm to optimise projection pursuit indices for the guided tour. The tour can be displayed directly in the R graphics device. For example, the code below generates a 1D density tour. Figure 1 shows snapshots.
library(tourr)
# quartz() # to display on a Mac; X11() # For windows; The Rstudio graphics
# device is not advised
animate_dist(flea[, 1:6], center = TRUE)
A tour path is a smooth sequence of projection matrices, \(p\times d\),
that when combined with a matrix of \(n\) data points, \(n\times p\), and a
rendering method, produces a steady stream of \(d\)-dimensional views of
the data. Each tour is initialised with the new_tour() method, which
instantiates a tour object and takes as arguments the data \(X\), the tour
method, e.g. guided_tour(), and the starting basis. Once initialised,
a new target plane is chosen, and a series of steps along a geodesic
path from the starting plane to the target plane is generated by
interpolation.
This requires a series of calls to the tour object, each producing the
next projection. The steps are discrete, of size \(\omega/\Delta\), where
\(\omega\) denotes the angular velocity of the geodesic interpolation, and
\(\Delta\) denotes the number of frames per second, reflecting the
rendering speed of the device in use. Thus \(\omega\) controls the speed
at which the tour moves through the projection space. For our purposes,
\(\Delta\) (fps in the code) is set at 25, while \(\omega\) can be
adjusted by the user.
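The relationship between angular velocity, frame rate and step size can be checked with a little arithmetic. The sketch below is illustrative only: the helper names stepSize and framesToTarget are hypothetical and not part of tourr, and the angular velocity and distance are made-up example inputs.

```javascript
// Hypothetical helpers, not part of tourr: they only restate the
// omega / Delta relationship described in the text.

// Angle moved per rendered frame, given angular velocity omega
// (radians per second) and the frame rate fps (Delta in the text).
function stepSize(omega, fps) {
  if (fps <= 0) throw new RangeError("fps must be positive");
  return omega / fps;
}

// Number of frames needed to cover a geodesic of length `distance`
// (radians) when each frame advances by stepSize(omega, fps).
function framesToTarget(distance, omega, fps) {
  return Math.ceil(distance / stepSize(omega, fps));
}

const fps = 25;   // value used in the paper
const omega = 1;  // assumed example value: 1 radian per second

console.log(stepSize(omega, fps));                    // angle per frame
console.log(framesToTarget(Math.PI / 2, omega, fps)); // frames for a quarter turn
```

Doubling the angular velocity halves the number of frames needed to reach the same target plane, while the frame rate itself stays fixed.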
sendCustomMessage
D3.js (Data-Driven Documents) (Bostock et al. 2011) is a JavaScript library for manipulating documents based on data. The advantages of D3 are similar to those provided by Shiny: namely, an industry standard with a rich array of powerful, easy-to-use methods and widgets that can be displayed on a wide variety of devices, with a large user base. D3 works on data objects in the JavaScript Object Notation (JSON) format, which are then parsed and used to display customisable data visualisations.
The new implementation of the tour interface uses D3 to render each projection step returned by R, focusing on 2D projections as a test case. It does this by drawing and re-drawing a scatterplot with dots (or circles in D3 language) and providing SVG objects for the web browser to render. Figure 2 shows the new GUI.
The Shiny functions session$sendCustomMessage() and
Shiny.addCustomMessageHandler() are provided to transport data between
R and JavaScript. Whenever the former is executed in R, the latter
function will execute a code block in JS. While there are many examples of
such functions being used to pass arbitrary data from an R app to a JS
front-end, few examples exist of using this basic functionality to update
a D3 animation in real time.
To set up the interface for the app, we need to load the relevant
scripts into the Shiny app and assign a section for the resulting plots.
This is done when setting up the user interface. We import D3 and our
plotting code via tags$script (for web links) and includeScript
(for reading from a full path). We use tags$div to assign an id for
the output section that can be accessed in the D3 code.
tags$script(src = "https://d3js.org/d3.v4.min.js"),
includeScript(system.file("js/d3anim.js", package = "tourrGUID3")),
tags$div(id = "d3_output")
On the D3 side we can access the id defined in Shiny, and for example assign it to a scalable vector graphics (svg) object to be filled in D3 and rendered onto the Shiny app.
var svg = d3.select("#d3_output")
    .append("svg")
    .attr("width", w)
    .attr("height", h);
D3 expects data in JSON format, which combines two basic programming paradigms: a collection of name/value pairs, and an ordered list of values. R’s preferred data formats include data frames, vectors and matrices. Every time a new projection has been calculated with the tour path, the resulting matrix needs to be converted to JSON and sent to D3. Using a named list we can send multiple JSON datasets to D3, e.g. to draw both the data points (stored in dataframe d) and the projection axes (stored in dataframe a). Converting dataframes will pass the column names to JSON. The code to send the D3 data looks like this:
session$sendCustomMessage(type = "data", message = list(d = toJSON(d), a = toJSON(a)))
This code is from the observe environment in the server.R file. It
converts the matrix of projected data points to JSON format, and sends
it to JavaScript with the id data. The list entries of the “message” can
be parsed in D3 by its data() method, e.g. data(message.d) to access
the projected data points, and we can access each column through the
column names assigned in the original dataframe, and loop over all rows
for rendering. All of the code required to render the scatterplots and
legends, along with colours, is JavaScript code in the file d3anim.js.
In particular, the data from R is handled with the following code:
Shiny.addCustomMessageHandler("data",
  function(message) {
    /* D3 scatterplot is drawn and re-drawn using the
       data sent from the server. */
  });
Every time the message is sent (25 times per second), the code-block is run.
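What happens inside such a handler can be sketched without D3: each row of the message is mapped from data coordinates to pixel coordinates before being bound to circle elements. The code below is a minimal sketch, not the contents of d3anim.js. The names linearScale and toPixels, the 400 by 400 plotting area and the symmetric data limit are all assumptions, as is the row-wise layout of message.d (one object per data point, with fields x and y).

```javascript
// Sketch only: `linearScale` and `toPixels` are hypothetical names, and the
// message layout (row-wise objects with fields x and y) is an assumption
// about how the data frame arrives after JSON conversion.

// Linear map from a data interval [d0, d1] to a pixel interval [r0, r1].
function linearScale(d0, d1, r0, r1) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// Turn one message into circle centres for a w-by-h plotting area,
// assuming the projected data lie within [-lim, lim] on both axes.
function toPixels(message, w, h, lim) {
  // Accept either an already-parsed array or a JSON string.
  const rows = typeof message.d === "string" ? JSON.parse(message.d) : message.d;
  const sx = linearScale(-lim, lim, 0, w);
  const sy = linearScale(-lim, lim, h, 0); // flipped: SVG y grows downwards
  return rows.map((row) => ({ cx: sx(row.x), cy: sy(row.y) }));
}

// Example payload, shaped like one frame of projected points.
const frame = { d: [{ x: 0, y: 0 }, { x: 1, y: -1 }, { x: -1, y: 1 }] };
console.log(toPixels(frame, 400, 400, 1));

// In the browser, the same function could be called from the handler.
if (typeof Shiny !== "undefined") {
  Shiny.addCustomMessageHandler("data", (message) => {
    const circles = toPixels(message, 400, 400, 1);
    // Bind `circles` to SVG circle elements with D3 here.
  });
}
```

Keeping the coordinate mapping in a plain function makes it possible to check the scaling outside the browser, which helps given that Shiny does not report JavaScript errors.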
The observeEvent Shiny method defines a code block to be run whenever
some input value changes. The following code snippet restarts a tour
using a random basis:
observeEvent(input$restart_random, {
  p <- length(input$variables)
  b <- matrix(runif(2*p), p, 2)
  rv$tour <- new_tour(as.matrix(rv$d[input$variables]),
                      choose_tour(input$type,
                                  input$guidedIndex,
                                  c(rv$class[[1]])), b)
})
The projections are calculated using the tour object in an observe()
environment, which re-executes the code whenever it is invalidated. The
invalidation is triggered either by a change in a reactive value inside
the code block, or by scheduling a re-execution, explicitly invalidating
the observer after a selected interval using invalidateLater(). The
projections are calculated using the following code block:
observe({
  if (length(rv$mat[1, ]) < 3) {
    session$sendCustomMessage(type = "debug",
                              message = "Error: Need >2 variables.")
  }
  aps <- rv$aps
  tour <- rv$tour
  step <- rv$tour(aps / fps)
  invalidateLater(1000 / fps)
  j <- center(rv$mat %*% step$proj)
  j <- cbind(j, class = rv$class)
  colnames(j) <- NULL
  session$sendCustomMessage(type = "data",
    message = list(d = toJSON(data.frame(pL=rv$pLabel[,1], x=j[,2],
                                         y=j[,1], c=j[,3])),
                   a = toJSON(data.frame(n=rv$vars, y=step$proj[,1],
                                         x=step$proj[,2]))))
})
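The core numerical step in this block is the product of the \(n\times p\) data matrix with the \(p\times d\) projection matrix, followed by centring. The sketch below restates that step in plain JavaScript for illustration. The helpers project and centerColumns are hypothetical, and it is an assumption here that centring means subtracting each column's mean.

```javascript
// Sketch only; `project` and `centerColumns` are hypothetical helpers that
// mirror, in plain JavaScript, the matrix product and centring step.

// Multiply an n x p data matrix by a p x d projection matrix.
function project(data, proj) {
  return data.map((row) =>
    proj[0].map((_, k) => row.reduce((sum, v, i) => sum + v * proj[i][k], 0))
  );
}

// Subtract each column's mean (an assumed reading of "centring").
function centerColumns(m) {
  const means = m[0].map((_, k) => m.reduce((s, row) => s + row[k], 0) / m.length);
  return m.map((row) => row.map((v, k) => v - means[k]));
}

// Three points in 3D projected onto the first two coordinate axes.
const X = [[1, 2, 3], [3, 4, 5], [5, 6, 7]];
const A = [[1, 0], [0, 1], [0, 0]];
console.log(centerColumns(project(X, A)));
```

Each row of the result is one point in the \(d\)-dimensional view, which is what gets converted to JSON and sent to the browser on every frame.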
You can try the app yourself using this code:
devtools::install_github("uschiLaa/tourrGUID3")
library(tourrGUID3)
launchApp(system.file("extdata", "geozoo.csv", package = "tourrGUID3"))
Fixing bugs in the JavaScript code can be cumbersome, as R and Shiny will not report any errors. JavaScript errors can be traced using the JavaScript console in the web browser. For example, in Google Chrome the console can be accessed via the “Developer Tools” option found under “More Tools” in the control menu. Typical errors that we encountered were version-dependent syntax in D3, e.g. for axis definitions or scaling.
The D3 canvas makes for smooth drawing and re-drawing of the data projections. Adding a GUI around the display is straightforward with the Shiny package, e.g. control elements such as stop/go, increase/decrease speed, change tour type, add/remove variables from the mix.
The main disadvantage is that the speed is inconsistent, as server and client play tag to keep up with each other, and the display cannot handle many observations. Noticeable slowdown was observed with 2000 points, the main reason being the rendering time required for the large number of SVG circle elements. The situation can be improved by using a single HTML5 canvas element to draw the scatter points, which significantly reduces the rendering time.
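A canvas-based renderer could look roughly like the following. This is a sketch under stated assumptions rather than code from the app: drawPoints is a hypothetical helper, the element id tour_canvas is invented for illustration, and ctx is expected to behave like a standard 2D canvas context (clearRect, beginPath, moveTo, arc, fill).

```javascript
// Sketch only: `drawPoints` is a hypothetical helper. `ctx` is expected to
// behave like a CanvasRenderingContext2D.
function drawPoints(ctx, points, radius, width, height) {
  ctx.clearRect(0, 0, width, height);  // wipe the previous frame
  ctx.beginPath();
  for (const p of points) {
    ctx.moveTo(p.cx + radius, p.cy);   // start a new sub-path per point
    ctx.arc(p.cx, p.cy, radius, 0, 2 * Math.PI);
  }
  ctx.fill();                          // one fill call for all points
}

// In a browser, assuming a <canvas id="tour_canvas"> element exists
// (the id is an assumption, not taken from the app):
if (typeof document !== "undefined") {
  const canvas = document.getElementById("tour_canvas");
  if (canvas) {
    drawPoints(canvas.getContext("2d"), [{ cx: 50, cy: 50 }], 3,
               canvas.width, canvas.height);
  }
}
```

All points are accumulated into one path and filled once per frame, so the browser does not have to maintain one document element per observation as it does with SVG circles.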
Another disadvantage is that the displays need to be coded anew. D3 provides mostly primitives, and example code, to make scatterplots and contours, but the data displays all need to be coded again.
The custom message tools from Shiny provide a way to share a tour path with the D3 renderer, and embed it in a Shiny GUI providing controls such as stop/go, increase/decrease speed, change tour type, add/remove variables. However, the approach doesn’t provide the smooth motion that is needed for easy display of projections, and is slow for large numbers of observations.
The code is available at https://github.com/uschiLaa/tourrGUID3, and the source material for this paper is available at https://github.com/dicook/paper-tourrd3.
Thanks to Yihui Xie for pointing out the custom message tools.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kipp, et al., "Connecting R with D3 for dynamic graphics, to explore multivariate data with tours", The R Journal, 2019
BibTeX citation
@article{RJ-2019-002, author = {Kipp, Michael and Laa, Ursula and Cook, Dianne}, title = {Connecting R with D3 for dynamic graphics, to explore multivariate data with tours}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {1}, issn = {2073-4859}, pages = {245-249} }