Pipe notation is popular with a large league of R users, with magrittr being the dominant realization. However, this should not be enough to consider piping in R as a settled topic that is not subject to further discussion, experimentation, or possibility for improvement. To promote innovation opportunities, we describe the wrapr R package and “dot-pipe” notation, a well behaved sequencing operator with S3 extensibility. We include a number of examples of using this pipe to interact with and extend other R packages.
Using pipes to sequence operations has a number of advantages. Piping is analogous to representing function composition as a left to right flow of values, which is a natural direction for Western readers, and is much more legible than composition represented as nesting.
Pipe notation is a popular topic in the R community. Related work includes:
][
, which is essentially as an example of piping or method
chaining.This article will discuss using the operator %.>%
from the package
wrapr (Mount and Zumel 2018)
(colloquially called “dot-pipe” or “dot-arrow”). Dot-pipe is compatible
with many other meta-programming paradigms, and is directly S3
extensible.
There are a number of important pipe notations in and out of R:
a
\(\circ\)b
to denote a(b)
.process1 | process2
streams results from process1
as
input to process2
.a |> b
means b a
, using F#’s
partial application feature.a %>% b(...)
is most commonly used to denote
{. <- a; b(., ...)}
(with dot side effects hidden).%.>%
(the topic of this article), where
a %.>% b
is intended to approximately mean {. <- a; b}
.%.>%
to sequence operationsIn this section, we demonstrate the use of “dot-pipe” %.>%
from
wrapr and some of its merits. The intended semantics of %.>%
are:
a %.>% b
is nearly equivalent to{. <- a; b}
where a
and b
are taken to be R expressions, presumably with the dot
occurring as a unbound (or free) symbol in b
. For example:
> library("wrapr")
> 5 %.>% sin(.)
1] -0.9589243
[
> print(.)
1] 5 [
Notice the wrapr dot-pipe leaves the most recent left-hand side value in the variable named dot. While this is a visible side-effect of the pipe which can conflict with other uses of dot, we feel these explicit semantics are sensible, easy to teach, and easy to work with.
We can also write 5 %.>% sin
, as the dot-pipe looks up functions by
name (even for qualified names such as base::sin
) as a user
convenience. This function lookup is a non referentially transparent
special case, as names are deliberately treated differently than values.
However, it is an important capability that we will discuss later and
greatly expand using S3 object oriented dispatch. Dot-pipe’s default
service does not work with the expression 5 %.>% sin()
and throws an
informative error message. Maintaining an explicit distinction between
sin
(a name), sin()
(an expression with no free-use of dot), and
sin(.)
(an expression with free-use of dot) has benefits, some of
which we will demonstrate in the 4 section. For
dot-pipe usage in general, the explicit expression sin(.)
is preferred
to sin
under the rubric “dot-pipe has lots of dots.”
Additional dot-pipe examples include:
> 5 %.>% {1 + .}
1] 6
[
> 5 %.>% (1 + .)
1] 6 [
Notice dot-pipe treated the last two statements similarly. We warn the
reader that in R the expression 5 %.>% 1 + .
is read as
(5 %.>% 1) + .
, since special operators (those using %
) have higher
operator precedence than binary arithmetic operators.
The dot-pipe works well with many packages, including dplyr:1
> library("dplyr")
> disp <- 4
> mtcars %.>%
+ filter(., .data$cyl == .env$disp) %.>%
+ nrow(.)
1] 11 [
Dot-pipe’s primary dispatch is user extensible; by default, it treats
a %.>% b
as {. <- a; b}
. However, it does this via S3 dispatch
through a method of signature apply_left(a, b, –more–)
. User or
package code can override this method to add custom effects. For
example, one can extend dot-pipe to be a
ggplot2 Wickham et al. (2018) layer
compositor as we show below.
> library("ggplot2")
> apply_left.gg <- function(pipe_left_arg,
+ pipe_right_arg,
+ pipe_environment,
+ left_arg_name,
+ pipe_string,
+ right_arg_name) {
+ pipe_right_arg <- eval(pipe_right_arg,
+ envir = pipe_environment,
+ enclos = pipe_environment)
+ pipe_left_arg + pipe_right_arg
+ }
We have defined an implementation of apply_left.gg
, as this is the
class used by ggplot2 to recognize its own objects (i.e., ggplot2
works by defining ‘+‘.gg
). Essentially apply_left.gg(a, b)
is
implemented as a + b
, the only detail being b
is passed as a
un-evaluated language argument, so it must be evaluated before being
used as a regular value, a detail discussed in the package
documentation.
We can now easily write a pipeline that combines sequencing dplyr transformation steps and combining ggplot2 geom objects, producing figure 1.
> data.frame(x = 1:20) %.>%
+ mutate(., y = cos(3*x)) %.>%
+ ggplot(., aes(x = x, y = y)) %.>%
+ geom_point() %.>%
+ geom_line() %.>%
+ ggtitle("piped ggplot2",
+ subtitle = "wrapr")
Notice how we can use the same pipe notation for both the initial
dplyr data processing steps and for the later ggplot2 layer
aggregation steps. As before, the data processing steps (e.g., mutate
)
require dot as a free symbol to specify where the piped values go.
However, the ggplot2 steps do not use such a dot argument, as these
functions do not expect previous steps as arguments.
Dot-pipe was able to add capabilities to the ggplot2 package without requiring any changes to the ggplot2 package. This extension capability is important.
If an object on the right hand side of a dot-pipe stage is an R language
name (or a qualified name such as base::sin
), then that object is
retrieved. If the result is a function, the function is applied. If the
result is a more general object, then S3 dispatch is used on the class
of this second or right hand side argument. That is, a %.>% b
is
treated as b(a)
or \(\text{f}_{\text{class(b)}}\)(a, b)
.
A good example using this capability is extending the
rquery package (Mount 2018)
to allow relational operator trees to be used both as inspectable
objects and as functions that can be applied directly to data. In the
following example, we create an operator tree that adds the column y
to a data frame, d
.
> library("rquery")
> optree <- mk_td(table_name = "d", columns = "x") %.>%
+ extend_nse(., y = cos(2*x))
We can treat optree
as an object as we show below.
> class(optree)
1] "relop_extend" "relop"
[
> print(optree)
1] "table(d; x) %.>% extend(., y := cos(2 * x))"
[
> column_names(optree)
1] "x" "y"
[
> columns_used(optree)
$d
1] "x" [
Or we can pipe into it, as we now demonstrate.
> # get a database connection
> db = DBI::dbConnect(RSQLite::SQLite(),
+ ":memory:")
> # make our db connection available to rquery package
> options(list("rquery.rquery_db_executor" = list(db = db)))
> data.frame(x = 1:3) %.>% optree # apply optree to d
x y1 1 -0.4161468
2 2 -0.6536436
3 3 0.9601703
In this example, the rquery package defined a surrogate S3 method for
the right hand side pipe argument: apply_right.relop
. Any user or
package can extend the dot-pipe to suit their needs, just as we have
shown here. The rquery package defines apply_right.relop
allowing
new data to be applied to existing pipelines as we saw above.
We provide wrapr::apply_right_S4()
as an S4 dispatch interface. This
flexibility can be used to define special effects such as “same class to
same class” ideas. For example, we can arrange for data frames to
automatically call rbind
when piped into each other. Note that it
usually does not make sense to pipe into a non-expression or
non-function object.
> d1 <- data.frame(x = 1)
> d2 <- data.frame(x = 2)
> tryCatch(
+ d1 %.>% d2,
+ error = function(e) { invisible(cat(format(e))) })
::apply_right_S4 default called with classes:
wrapr
d1 data.frame
d2 data.frame
must have a more specific S4 method defined to dispatchNULL
If one sets a generic signature for apply_right_S4
this can be made a
sensible and useful operation.
> setMethod(
+ "apply_right_S4",
+ signature = c("data.frame", "data.frame"),
+ definition = function(pipe_left_arg,
+ pipe_right_arg,
+ pipe_environment,
+ left_arg_name,
+ pipe_string,
+ right_arg_name) {
+ rbind(pipe_left_arg, pipe_right_arg)
+ })
> d1 %.>% d2
x1 1
2 2
However, the apply_right
execution path is only active when the right
pipe argument is a name. The rbind
effect would not work if piped
directly into a value. The default apply_right
implementation is an S3
dispatch on the right pipe argument.
> d1 %.>% data.frame(x = 2)
x1 2
In this case data.frame(x = 2)
was evaluated an an expression where
the dot had the value data.frame(x = 1)
, which was in turn ignored.
We have been describing dot-pipe semantics by introducing transformed code that we consider equivalent to the dot-pipe pipeline. Think of that as the specification. Dot-pipe’s implementation is not by code substitution but through execution rules we outline here.
In R, special operators (those written with %
) are left to right
associative (meaning a %.>% b %.>% c
is taken to mean
(a %.>% b) %.>% c
) with fairly high operator precedence (meaning they
are applied earlier than some other operators).
The dot-pipe semantics are realized by the following processing rules.
a %.>% b
is processed as follows:
Def.- We choose the default “control on the left case” (or “L case”) if
the second or right hand side argument b
is not a R language name
or other dereferencable entity, otherwise we take the “control on
right case” (or “R case”). We then continue with one of these two
cases.
L case.- S3 dispatch is performed on `apply_left(a, b, env, nm)`, with
`class(a)` being the method-determining argument, and `b` an
un-evaluated R language object. The default implementation of
`apply_left(a, b, env, nm)` is `. <- a ; eval(b)` (performed in
the calling environment).
R case.- We look-up the second or right hand side argument `b` and then
branch as follows.
1. If `b` is now a function, the value `b(a)` is returned.
2. Otherwise, S3 dispatch is performed on
`apply_right(a, b, env, nm)` with `class(b)` as the method
determining argument. If `b` is an S4 object,
`apply_right.default(a, b, env, nm)` in turn dispatches to
`apply_right_S4(a, b, env, nm)`.
This may seem involved, but it is in fact quite regular with only one exception: a dereference triggers right-dispatch. Roughly, the rule is: “treat the second or right hand side argument as an expression, unless it is a name.” The intent is for dot-pipe to have simple semantics that are capable of being combined many ways to allow rich emergent behavior.
The magrittr package supplies a very popular R pipe operators, so it is worth a bit of discussion.
The magrittr package works by capturing entire (possibly more than one step) pipelines un-evaluated and then inspecting the captured code for its own piping symbols. This can be confirmed by looking at the implementation and also by attempting to re-name the magrittr pipe.
> library("magrittr")
> 5 %>% sin
1] -0.9589243
[
> `%userpipe%` <- magrittr::`%>%`
> tryCatch(
+ 5 %userpipe% sin,
+ error = function(e) {e})
<simpleError in pipes[[i]]: subscript out of bounds>
The wrapr pipe executes by looking only at the arguments it is given, holding the right argument un-evaluated until the left value is available. Multiple stage wrapr pipes are just an effect of running stages one after the other.
> `%userpipe%` <- wrapr::`%.>%`
> 5 %userpipe% sin
1] -0.9589243 [
There are also differences in how magittr and wrapr handle functions
and function arguments. With magittr, one can not reliably pipe into
substitute
with %>%
. Note that the word value
below is the result,
not an input.
> library("magrittr")
> 5 %>% substitute
value
Also, %>%
does not work with qualified names unless one uses the more
general expression notation base::sin(.)
.
> tryCatch(
+ 5 %>% base::sin,
+ error = function(e) {e})
<simpleError in .::base: unused argument (sin)>
In contrast, %.>%
behaves closer to common user expectations.
> library("wrapr")
> 5 %.>% substitute
1] 5
[
> 5 %.>% base::sin
1] -0.9589243 [
Note that the magrittr package can in fact trigger S3 dispatch through
its exposition pipe operator, %$%
pipe, but this is only because this
operator calls the method with()
, which is itself S3 overridable. This
should not be treated as an intended feature of the magrittr.
Both R users and package developers can achieve a great number of useful
effects by adding S3 implementations for apply_left()
or for
apply_right()
. Some possibilities include:
%.>%
as a layering function for ggplot2 (as a
replacement for +
, as we demonstrated).apply_right.model_class
to the appropriate predict
method.> d <- data.frame(x = 1:5, y = c(1, 1, 0, 1, 0))
> model <- glm(y~x, family = binomial, data = d)
> apply_right.glm <-
+ function(pipe_left_arg,
+ pipe_right_arg,
+ pipe_environment,
+ left_arg_name,
+ pipe_string,
+ right_arg_name) {
+ predict(pipe_right_arg,
+ newdata = pipe_left_arg,
+ type = 'response')
+ }
> data.frame(x = c(1, 3)) %.>% model
1 2
0.9428669 0.6508301
Notice we can pipe new data directly into the model for prediction.
The S3 `apply_right` extensions give us a good opportunity to
regularize model predictions functions to take the same arguments
and have the desired default behaviors.
> # get a database connection
> db = DBI::dbConnect(RSQLite::SQLite(),
+ ":memory:")
> apply_right.SQLiteConnection <-
+ function(pipe_left_arg,
+ pipe_right_arg,
+ pipe_environment,
+ left_arg_name,
+ pipe_string,
+ right_arg_name) {
+ DBI::dbGetQuery(pipe_right_arg, pipe_left_arg)
+ }
> "SELECT * FROM sqlite_temp_master" %.>% db
1] type name tbl_name rootpage sql
[<0 rows> (or 0-length row.names)
Here we piped SQL code directly into the database connection.
> apply_left.character <- function(pipe_left_arg,
+ pipe_right_arg,
+ pipe_environment,
+ left_arg_name,
+ pipe_string,
+ right_arg_name) {
+ pipe_right_arg <- eval(pipe_right_arg,
+ envir = pipe_environment,
+ enclos = pipe_environment)
+ paste0(pipe_left_arg, pipe_right_arg)
+ }
> "a" %.>% "b" %.>% "c"
1] "abc" [
One can, of course, define a string concatenation operator directly,
but this is a good example of the use of the dot-pipe as a sort of
compound constructor.
> apply_left.formula <- function(pipe_left_arg,
+ pipe_right_arg,
+ pipe_environment,
+ left_arg_name,
+ pipe_string,
+ right_arg_name) {
+ pipe_right_arg <- eval(pipe_right_arg,
+ envir = pipe_environment,
+ enclos = pipe_environment)
+ pipe_right_arg <- paste(pipe_right_arg, collapse = " + ")
+ update(pipe_left_arg, paste(" ~ . +", pipe_right_arg))
+ }
> (y~a) %.>% c("b", "c", "d") %.>% "e"
~ a + b + c + d + e y
We anticipate motivated package authors can find many special cases that the dot-pipe can streamline for their users. The value will be when many packages add effects on the same pipe, so users know by using that pipe they will simultaneously have many powerful features made available.
We have found it profitable to roughly think of apply_left()
as a
“programmable comma” and apply_right()
as “automatic execution”
(usually achieved by overriding print()
).
There are limitations to the class-driven pipe dispatch approach. The
class of the left item (driving apply_left()
) is often uninformative,
as in R it will very often be a data frame. The class of the right item
(used by apply_right()
) is not available until the right item has been
evaluated, which is too late for the most common pipe effect (evaluating
the right item with the left available as a dot). However, the authors
feel this system is more R like as it leaves more of the execution to
the R interpreter and tries to minimize the pipe operator itself being a
type of replacement interpreter implementation.
We have demonstrated a predictable, well-behaved, S3-extensible tool for
sequencing or pipe-lining operations in R. The left-dispatch of
apply_left()
method is useful in assembling composite structures such
as building a ggplot2 plot up from pieces. The right-dispatch
apply_right()
is unusual, but a natural extension of the “pipes write
functions on the right” idea. The goal of dot-pipe is to supply simple
semantics that can be composed into powerful specific applications. The
dot-pipe can be used to extend packages, or to add user desired effects.
We would like the wrapr dot-pipe to be a testing ground both for
pipe-aware package extensions and for experimenting with the nature of
piping in R itself.
The authors would like to thank the wrapr users for their feedback. In particular we would like to thank the R Journal editors and reviewers who contributed a number of important points, including the idea that S4 evaluation was a good possibility and the natural way to discuss dispatching on a right argument.
data.table, magrittr, dplyr, future, rmonad, pipeR, backpipe, drake, wrapr, ggplot2, rquery
Databases, Finance, HighPerformanceComputing, ModelDeployment, Phylogenetics, ReproducibleResearch, Spatial, TeachingStatistics, TimeSeries, WebTechnologies
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Mount & Zumel, "Dot-Pipe: an S3 Extensible Pipe for R", The R Journal, 2018
BibTeX citation
@article{RJ-2018-042, author = {Mount, John and Zumel, Nina}, title = {Dot-Pipe: an S3 Extensible Pipe for R}, journal = {The R Journal}, year = {2018}, note = {https://rjournal.github.io/}, volume = {10}, issue = {2}, issn = {2073-4859}, pages = {309-316} }