The rmonad package presents a monadic pipeline toolset for chaining functions into stateful, branching pipelines. As functions in the pipeline are run, their results are merged into a graph of all past operations. The resulting structure allows downstream computation on node documentation, intermediate data, performance stats, and any raised messages, warnings or errors, as well as the final results. rmonad is a novel approach to designing reproducible, well-documented, and maintainable workflows in R.
Pipeline programming is common practice in the R community, with
magrittr,
pipeR, and
wrapr packages offering
infix pipe operators (Bache and Wickham 2014; Ren 2016; Mount and Zumel 2018). The
value on the left of the pipe operator is passed as the first argument
to the right-hand function. This style of programming simplifies code by
removing the need to name intermediate values or write deeply nested
function calls. For example, using the
magrittr pipe operator,
%>%
, the expression x %>% f %>% g
is equivalent to g(f(x))
. These
pipelines are equivalent to applied function compositions and termed
function composition pipelines.
A monadic (Wadler 1990) pipeline extends composition pipelines by allowing context to be threaded through the pipeline. Each function call in the pipeline produces both a new value (assuming successful evaluation) and a computational context surrounding that new value. This new value and context is then merged with the context of the prior node in the pipeline, allowing past context to be stored. In this way, monadic pipelines can be automatically self-describing by returning both the result and a description of the process that created it.
In this paper, we present rmonad, the first explicitly monadic pipeline program developed for the R language. rmonad captures the history of a pipeline as a graph of all past operations. Each node in the graph represents either an input or a function. These nodes store the source code, documentation, any raised messages/warnings/errors, benchmarking info, and arbitrary additional metadata. rmonad also generalizes the standard linear pipeline to a directed graph with support for branching and looping pipelines.
rmonad is one of many graph-based workflow tools available to R programmers. The drake package (Landau 2017) allows specification of R workflows using Make-family semantics (Stallman et al. 2002). The R packages tidycwl (Koc et al. 2020) and sevenbridges (Xiao and Yin 2020) wrap the Common Workflow Language which allows specification of DAG-based workflows that can be easily run on high-performance platforms. Many build systems allow execution of R code snippets, such as Snakemake (Köster and Rahmann 2012), Nextflow (Di Tommaso et al. 2017) and Cuneiform (Brandt et al. 2017). Like these programs, rmonad specifies a graph of dependent operations and can handle large, complex projects. However, rmonad offers a lighter solution, with no dependencies outside R. In the simplest case, rmonad has no more syntactic complexity than a composition pipeline like magrittr.
Since rmonad can annotate and summarize intermediate data, it can serve as a provenance tracking tool. Provenance tracking of data generated through a pipeline is critical for research reproducibility (Gentleman and Lang 2007). For example, the provenance manager VisTrails builds directed acyclic graphs (DAG) of workflows and stores intermediate data objects as external XML files in an external database (Silva et al. 2010). It also provides a visualization of the workflow (or provenance trail) as it is being run. By visualizing the workflow in a DAG-like structure, the user can perform exploratory analysis and retooling on the fly. The R provenance tracking packages archivist (Biecek and Kosinski 2017), trackr (Becker et al. 2019), and adapr (Gelfond et al. 2018) store manual annotations (metadata) of data objects as hooks to an external binary or JSON database.
In the following sections, we introduce the rmonad monadic pipeline operator, show how rmonad generalizes linear pipelines to support branching and nesting, describe how rmonad evaluation allows pipeline debugging and annotation, tie these ideas together with a case study, and provide an overview of the application of rmonad to a large-scale project.
A pipeline consists of a series of expressions that are evaluated using
upstream data as input. The context that is passed through an
rmonad pipeline is stored
as an “Rmonad” S4 object. This object consists of a directed graph of
the relationships between nodes in the pipelines, a list containing the
information about each node (including the output if it is cached), and
a unique identifier for the head node—the node whose output will be
passed to the next operation in the pipeline. Each expression in the
pipeline is evaluated by the special
rmonad function,
evalwrap
, that takes an R expression and returns an “Rmonad” object.
After each new expression in a pipeline is evaluated, the past “Rmonad”
object is merged with the new one (see Algorithm 1).
The rmonad function
evalwrap
evaluates an R expression and returns an “Rmonad” object. The
type signature of evalwrap
is:
\[\text{evalwrap} :: R \rightarrow M\ a\]
The evalwrap
function takes the R expression, \(R\), and returns \(M\ a\),
which is the “Rmonad” object \(M\) wrapping the value returned from the
evaluation of \(R\). On success, the returned value has type \(a\). Thus,
whereas a composition pipeline would consist of chained functions of
type \(a \rightarrow b\), \(b \rightarrow c\), \(c \rightarrow d\), etc, an
rmonad pipeline consists
of \(a \rightarrow M\ b\), \(b \rightarrow M\ c\), \(c \rightarrow M\ d\).
Each evaluation step in an rmonad pipeline creates a contextualized object. However, including the context in the output causes a type conflict. For example, suppose there are functions \(f\) and \(g\) with types (\(a \rightarrow M\ b\)) and (\(b \rightarrow M\ c\)), respectively. Function \(f\) produces an output of type \(M\ b\), but function \(g\) requires an input of type \(b\). This conflict is resolved through the special evaluation performed within the monadic pipe operator.
The monadic pipe operator, or the bind operator, has the type signature (Wadler 1990):
\[bind\ :: \underbrace{m\ b}_{\text{output of $f$}} \rightarrow \underbrace{(b \rightarrow m\ c)}_{\text{the function $g$}} \rightarrow \underbrace{m\ c}_{\text{output of $g$}}\]
where \(m\) is a generic monad. The function bind
takes an input of type
\(m\ b\) and the function \(g\) of type (\(b \rightarrow m\ c\)). It returns
the output of \(g\) which has type \(m\ c\). Many functions of the general
type \(a \rightarrow m\ b\) can be chained together using this bind
function.
For example, the call bind(bind(f(x), g),h)
would chain the
contextualized results of \(f\) through \(g\) and then \(h\). The
implementation of the bind function defines how context from \(m\ b\) is
passed through the monadic chain to \(m\ c\).
The simplest possible implementation of the bind
function passes no
state and is identical to applied functional composition (e.g., as done
in magrittr):
The monadic pipeline operator of
rmonad, %>>%
, has the
type signature:
\[\underbrace{M\ a}_{lhs} \rightarrow \underbrace{(a \rightarrow \ b)}_{rhs} \rightarrow \underbrace{M\ b}_{output}\]
%>>%
is a binary operator where the left hand side (lhs
) is an
“Rmonad” object (\(M\)) wrapping a value of type \(a\). The right hand side
(rhs
) is a normal R function that takes an input of type \(a\) and, if
successful, returns a value of type \(b\). If lhs
stores a failing state
(i.e., a prior node in the pipeline raised an error), then the rhs
function is not evaluated and the failed state is propagated. Otherwise,
the value is extracted from lhs
and evalwrap
then evaluates the
rhs
function with the lhs
value as its first argument yielding a new
“Rmonad” object. Finally, this new object is merged with the prior,
lhs
“Rmonad” object. Merging involves joining the node graphs of the
old and new “Rmonad” objects, setting the head of the resulting graph to
the head of the new graph, and removing the value stored in the prior
head (see Algorithm 3). The “head” of a graph is critical
for branching pipelines (see the Branching and Nesting section).
The difference between %>>%
and a true monadic bind
operator is that
the rhs
of a monadic bind operator is a function (\(a \rightarrow M\
b\)), whereas the rhs
of %>>%
is a normal R function. The %>>%
operator essentially transforms the rhs
R function into a function
that yields the monadic object. This is carried out within the monadic
bind function through the special evaluation offered by evalwrap
.
While the primary rmonad
operator is the monadic pipe operator, %>>%
, several additional
operators are provided for operating on “Rmonad” objects using pipeline
syntax (listed in Table 1).
Operator | Description |
---|---|
%>>% |
pass lhs as initial argument of rhs function |
%v>% |
like %>>% but caches the lhs value |
%*>% |
pass list of arguments from lhs to rhs |
%__% |
rhs starts a new chain that preserves lhs history |
%||% |
use rhs value if lhs is failing |
%|>% |
call rhs on lhs if lhs failed |
The %>>%
operator by itself can only create linear chains of
operations. Mechanisms for lifting this limitation are introduced in the
next section.
In a linear pipeline, the output of each internal function is piped to just one downstream function. In contrast, rmonad allows branching to be formed in one of two main ways: 1) the pipeline’s head may be reset to an internal node and the pipeline can continue growing from there or 2) multiple pipelines may be merged.
The first branching method uses the tag
function to attach a label to
the current head node and the view
function to change the head node to
a previously tagged node. An example of a branched pipeline using these
function is shown in Figure 1. A node may be
associated with one or more tags.
The second branching method allows multiple pipelines to merged into
one. The most direct merge method uses the %*>%
operator to pass the
head value from each “Rmonad” object in the left-hand side list as
arguments to the right-hand side function.
rmonad also offers a
dedicated loop
function that takes an “Rmonad” object containing a
list of values, passes each into monadic function, and connects the
histories and final results of each pipeline into a new “Rmonad” node.
The example below demonstrates a loop where nodes where individual elements are dynamically tagged for later access:
<- loop(
m evalwrap(letters[1:3]),
function(x){ x %>>% paste0("!") %>% tag(c("letters", x)) }
%*>% paste0
) get_value(m, tag="letters")
#> $`letters/c`
#> [1] "c!"
#>
#> $`letters/b`
#> [1] "b!"
#>
#> $`letters/a`
#> [1] "a!"
get_value(m, tag="letters/b")[[1]]
#> "b!"
The elements of the first argument to the loop
function (the letters
‘a’, ‘b’, and ‘c’) are passed to loop
‘s second argument. The second
argument is an anonymous function that adds an exclamation mark to the
input and tags the resulting value. The tags are hierarchical, thus
get_value(m, tag="letters")
returns all values with the initial tag
’letters’. Specific values can be accessed like files in a path (e.g.,
"letters/b").
Since rmonad pipelines
are branched, there is in general no single output value of the
pipeline. Rather, the data contained in the “Rmonad” object is queried
using a family of vectorized getter functions. For example, get_value
will return a list containing the value stored in each node (or NULL
if no value is stored); get_error
returns a list of all error
messages, get_warning
returns a list of all warnings, get_code
returns a list of all code strings, etc. The code below fails on the
‘sqrt’ call and the failing node can be found by searching for code
blocks that were not successfully executed.
<- "a" %>>% paste("cat") %>>% sqrt
m get_code(m)[!get_OK(m)]
#> [[1]]
#> [1] "sqrt"
|
|
plot
functions visualize
the graph with values in nodes if the values are cached and “-”
otherwise. The layout of the plots was modified in the vector editor
Inkscape. Top: A linear rmonad
pipeline that ends in an error. The pipeline begins at node 1 with the
value “x”. This is piped into the paste
function which
concatenates the letter “a”. Since the paste
is successful,
the result is stored in node 2 and the value in node 1 is deleted to
save memory. The value in node 2 is piped into paste
again,
concatenating the letter “b” and storing it in node 3. The value in node
3 is piped into the log
function, where an error is raised,
terminating this branch, and storing the final failing value, “x a b”,
and the error message. The value is only stored at the end node to avoid
storing all intermediate values across a pipeline. That way, values are
stored when there are errors or where explicitly tagged by the user.
Bottom: A branched rmonad
pipeline and its resulting graph. From node 2, the “Rmonad” object is
piped into the tag
function which annotates the head node
(node 2) with the tag “a1” and sets a flag that ensures the value will
be cached for later use. After function 4, the “Rmonad” object is piped
into view
, which sets the head of the graph to node 2.
Lastly, the value in node 2 is piped into the final paste
function that concatenates “c”.
In addition to branching, rmonad allows complex pipelines to be built from smaller nested pipelines defined in normal R functions (see Figure 2). When data is piped into a function that wraps a nested rmonad pipeline, the input values will be linked to the nodes in the nested pipeline that use the input. In this way, rmonad enables multilevel debugging. Storing the input to each failed function at each nest level allows a programmer to step through the code in the failed node using the input data, without having to rerun the entire pipeline.
|
f
. Next, within the scope of f
, node 4 starts
a new pipeline with the value “<”, node 5 pastes “<” from node 4
to the local x variable (which
is the value passed from node 2), and finally node 6 appends the closing
“>” character. The function f
returns an “Rmonad” object
to node 3. The value of node 6 is transferred to node 3 (thus node 6 is
empty, “-”). Finally, node 7 appends “C” and the pipeline finishes
successfully.
In this section, we expound on how errors are handled in rmonad, how nodes are documented and annotated, and how post-processing functionality is added to specify log messages, summarize node output and clean up raised messages.
The core functionality of
rmonad is the stateful
data piping provided by the monadic operator %>>%
. Linear chains of
operations can be constructed with this operator, where each successful
node stores information about the function and results. In the case of
an error, rmonad provides
access to the traceback and to the inputs to each failing function.
Knowing the error messages and the function inputs allows the programmer
to step through the failed function and easily diagnose the problem. All
information is stored within the “Rmonad” object, rather than in the
ephemeral state of an R session.
Here is a concrete example:
<- "a cat" %>>% log %>>% sqrt
m get_error(m)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> [1] "non-numeric argument to mathematical function"
get_code(m)[[2]]
#> "log"
get_value(m)[[2]]
#> [1] "a cat"
Here an illegal value is passed into the natural log function.
rmonad catches this error
and saves the first failing input and error message. The node index and
error message of the failing function can be found with get_error(m)
,
the failing expression can be accessed with get_code
, and the inputs
to the failing function can be retrieved with get_value
. This approach
scales cleanly to large and deeply nested pipelines.
rmonad leverages R non-standard evaluation to parse the abstract syntax tree of pipeline functions at runtime, prior to evaluation of the functions. rmonad extracts 1) the function’s code as a string, 2) an optional documentation string, and 3) an optional list of metadata. All three items are stored in the “Rmonad” node. For example:
<- function(x){
foo "This is a docstring"
list(sysinfo = sessionInfo())
return(x)
}
The first two lines in the function body are the docstring and metadata
list, respectively. Each must 1) be of the appropriate type (string and
list, respectively), 2) not be assigned to a variable, and 3) not be the
final line in the function body. Thus foo
is a legal R function that
can be used naturally outside of the
rmonad context. The
docstring and metadata would be “dead” lines of code that are evaluated
but that are not assigned to any variable or returned. When
rmonad parses the
function before evaluation, the first two lines will be removed and
stored, yielding the following function for evaluation:
function(x){
return(x)
}
The docstring and the function code are stored as simple strings. The metadata list is evaluated within the function environment, giving it access to function input, and then stored.
The metadata is any list associated with a node. It can be used to store static data such as the author’s name, a version for the function, arbitrary notes. It can also store report generation parameters (like code chunks in knitr) (Xie 2015). Because the list is evaluated, its contents are dynamic, allowing, for example, session info to be stored or knitr parameters to be a function of the input. Whereas knitr nests code chunks and their parameters in a text document, rmonad nests text and parameters within the code.
The metadata can be modified freely even after the pipeline is run, to enable the user to store notes that are a function of the pipeline results, as well as personal annotations, reminders, or comments on the results.
A built-in use of the metadata is to add formatters, summarizers, and loggers, which are executed automatically after a node is run. For example, a pipeline developer might write the following wrapper around a base 10 log function:
<- function(x){
fancy_log10 list(
format_warnings = function(x, xs) {
sprintf("%s NaNs produced", sum(is.na(x)))
},format_log = function(x, passing) {
if(passing){
cat("pass\n")
else {
} cat("fail\n")
}
},summarize = list(len = length)
)log10(x)
}
When run, the captured warnings are processed by format_warnings
and
log messages by format_log
, with the following result:
"a cat" %>>% fancy_log10 -> m
#> fail
c(-2,-1,0,1,2) %>>% fancy_log10 -> m
#> pass
get_warnings(m)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> [1] "2 NaNs produced"
> get_summary(m)[[2]]$len
#> 5
In the first case, an illegal value is passed to the fancy_log10
function. This leads to a failure in the second node, and the logger
prints “fail”. In the second case, the user passes the integers between
-2 and 2, storing the result in m
. Since these are legal values (from
R’s perspective), the logger prints the message “pass” after evaluation.
When the returned object is printed, the post-processed warning message
“2 NaNs produced” is shown. The result of the summarizing function is
accessed through the get_summary
function.
As an example of a simple branching rmonad pipeline with error, warning and run time handling we analyzed the Iris dataset (Anderson 1936; Fisher 1936). The Iris dataset is often used for case studies of statistics and machine learning workflows, and consists of features of three species of flowers: Iris setosa, Iris virginia, and Iris versicolor. Among these features is petal length. We used three statistical methods, (1) ANOVA, (2) Kruskal-Wallis, and (3) t-test, to determine if petal length is significantly different across the three Iris species. Some statistical methods are not appropriate for this dataset without data pre-processing. This case study provides an example of running multiple methods using a branching rmonad pipeline, while comparing the output and running times of each method.
Normally, a programmer would run the three methods separately using an R script similar to the following:
# === Load data
data(iris)
# === 3 Statistical Tests (run one at a time)
# (1) Anova
<- aov(Petal.Length ~ Species, data = iris)
res.aov summary(res.aov)
# (2) Kruskal-Wallis
<- kruskal.test(Petal.Length ~ Species, data = iris)
res.kr
res.kr
# (3) T-Test
t.test(Petal.Length~Species, data=iris)
Using rmonad tags, data
can be branched out to encompass the three statistical tests. Here, the
R variable \(m\) stores the output “Rmonad” S4 object. We must initially
tag the branch point node (in this case, the original Iris dataset).
Since we gave the first node the tag (“indata”), its value will be
cached and can be accessed with the command
get_value(m, tag="indata")
. From here, we can access and pipe (%>>%
)
the viewed “indata” tag into the different statistical tests, as
scripted below and visualized in Figure 3.
# === rmonad (run together)
<- {
m "iris dataset"
evalwrap(iris, tag="indata")
%>>% {
} "anova"
<- aov(Petal.Length ~ Species, data = .)
res.aov summary(res.aov)
}
<- {
m view(m, "indata")
%>>% {
} "Kruskal-Wallis"
<- kruskal.test(Petal.Length ~ Species, data = iris)
res.kr
res.kr
}
<- {
m view(m, "indata")
%>>% {
} "t-test"
t.test(Petal.Length~Species, data=iris)
}
The above code could have been chained together using
%>% get_value(tag="indata") %>%
commands, but instead was separately
added to the \(m\) rmonad object for ease of reading. From the \(m\) rmonad
objects, we can plot the pipeline. In the following command we label the
nodes by node id, documentation, running time, and any errors if they
exists.
plot(m, label = function(m){paste(get_id(m),
get_doc(m),
get_time(m),
gsub("character\\(0\\)", "", get_error(m)),
sep=":")})
In Figure 3, the center node is the iris dataset and has three arrows going outwards toward one red and two green nodes. Of those, the red node near the top represents the t-test and shows the expected error “grouping factor must have exactly 2 levels”. Since we are testing the petal length among the three species, this error is expected. Any errors of the pipeline can also be obtained in a table:
missues(m)
#> id type issue
#> 1 4 error grouping factor must have exactly 2 levels
Going clockwise, ANOVA and Kruskal-Wallis are represented by nodes 2 and 3. The green nodes indicate that both ran although their running times were different. From their node labels, Kruskal-Wallis ran in 0.001 ms, slightly faster than ANOVA (0.002). Also note that green nodes only indicate that the method ran successfully, not the results of that method or statistical significance. The results of the ANOVA and Kruskal-Wallis test can be pulled out of the pipeline using their Node ID number and the following commands.
> id=c(2,3) # place id(s) of end result(s) here
> get_value(m)[id]
#> [[1]]
#> Df Sum Sq Mean Sq F value Pr(>F)
#> Species 2 437.1 218.55 1180 <2e-16 ***
#> Residuals 147 27.2 0.19
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> [[2]]
#> Kruskal-Wallis rank sum test
#>
#> data: Petal.Length by Species
#> Kruskal-Wallis chi-squared = 130.41, df = 2, p-value < 2.2e-16
Both tests agree that there is a significant difference between
Petal.Length
across the three Iris species. ANOVA ran on the dataset,
which means that petal length follows a normal distribution within each
species. Kruskal-Wallis does not assume a normal distribution. The
analyst can decide which method to use; in this case the conclusion is
the same. Figure 3 is an example of a
branched rmonad pipeline
comparing three different statistical methods applied to the iris
dataset to test a hypothesis.
An example of a large and complex pipeline that uses
rmonad is the orphan gene
classification R pipeline, fagin
(Arendsee et al. 2019) (Figure
4). This pipeline compares genes from one species
of interest (the focal species) to genomes of several related species.
The first step in the pipeline is to store the user’s session
information, which can be used in debugging if needed. Next, the
pipeline loops across each species, where, for each species, genomes and
annotation data are loaded and validated. Then secondary data (e.g.,
protein sequences) are derived, and diagnostic summaries are produced
and stored. Next, each of the orphan genes in the focal species is
compared to each of the related species genomes to create 12 features
that are used to classify potential evolutionary relatives of each the
focal gene in the target species. Finally, all data for each focal gene
is compiled into a description.
The output of this pipeline is a single “Rmonad” object. Further analysis of the pipeline entails a series of queries against this returned object. Warnings and messages are tabulated into an HTML report. Tagged summary data is extracted and used to build diagnostic figures. The primary results are extracted as tabular data and visualized in the final report. Issues with a pipeline can be identified by searching through the raised warnings stored in the “Rmonad” object. Debugging consists of identifying the node of failure, extracting the stored inputs to the failing node, and then stepping through the failing code.
We implemented a monadic pipeline in R via the rmonad package. rmonad provides an infrastructure for data analysis and report generation. rmonad stores pipeline results and metadata that can be easily explored interactively and collated into reports using tools such as the literate programming package knitr (Xie 2015) or the HTML report generator Nozzle.R1 (Gehlenborg et al. 2013).
rmonad integrates a simple profiler into the workflows by automatically capturing the runtime and memory usage of each node. This feature makes it easier for the pipeline developer to identify bottlenecks in the code or potential culprits of memory overflow. Often, a coder must add benchmarking code to key locations in a pipeline. rmonad has built-in benchmarking, such that all locations in the pipeline are automatically tested and performance can be checked post-run.
rmonad provides a
powerful tool for creating and resolving issue reports. If an
rmonad pipeline fails,
the resulting object will store all failing functions, their raised
error/warning messages and also their inputs. This object can be used to
find the error messages, load all inputs to the failing function, and
proceed to step through the code until the bug is found. If the user
prepends a node that stores the local session data (e.g.,
sessionInfo()`` %__% ...
), the debugger gains access to the state of
the user’s machine (an often-requested item in a bug report). An
“Rmonad” object with session info attached contains everything needed to
debug the issue. This streamlines issue resolution by improving
automation and simplifying submission.
Performance has not been a focus of rmonad up to this point. The package currently lacks support for the re-use of cached values when pipelines are re-run. Also each evaluation step has a high overhead cost relative to lighter pipeline tools like magrittr. rmonad pipelines tend to be memory intensive, since they store many intermediate results and metadata in the “Rmonad” objects. Addressing these performance issues is a major goal for future work.
In summary, rmonad integrates the concepts of a pipeline, a build system, a data structure, and an low-level report-generating engine. An rmonad project consists of incremental piped operations (like a pipeline program), supports complex branching projects (like a build system), and produces a data structure that can be computed on to generate dynamic reports.
rmonad is published under the GPL-3 license and is available on the Comprehensive R Archive Network (CRAN) and on GitHub at https://github.com/arendsee/rmonad. Systematic documentation of the features with simple examples can be found in the vignettes, available through CRAN.
This material is based upon work supported by the National Science Foundation under Grant No. IOS 1546858.
rmonad, magrittr, pipeR, wrapr, drake, tidycwl, sevenbridges, archivist, trackr, adapr, knitr, Nozzle.R1
HighPerformanceComputing, ReproducibleResearch
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Arendsee, et al., "rmonad: pipelines you can compute on", The R Journal, 2022
BibTeX citation
@article{RJ-2022-007, author = {Arendsee, Zebulun and Chang, Jennifer and Wurtele, Eve}, title = {rmonad: pipelines you can compute on}, journal = {The R Journal}, year = {2022}, note = {https://rjournal.github.io/}, volume = {14}, issue = {1}, issn = {2073-4859}, pages = {22-33} }