The RWiener Package: an R Package Providing Distribution Functions for the Wiener Diffusion Model

Abstract:

We present the RWiener package that provides R functions for the Wiener diffusion model. The core of the package are the four distribution functions dwiener, pwiener, qwiener and rwiener, which use up-to-date methods, implemented in C, and provide fast and accurate computation of the density, distribution, and quantile function, as well as a random number generator for the Wiener diffusion model. We used the typical Wiener diffusion model with four parameters: boundary separation, non-decision time, initial bias and drift rate parameter. Beyond the distribution functions, we provide extended likelihood-based functions that can be used for parameter estimation and model selection. The package can be obtained via CRAN.

Cite PDF Tweet

Published

April 19, 2014

Received

Aug 23, 2013

Citation

Wabersich & Vandekerckhove, 2014

Volume

Pages

6/1

49 - 56


1 Introduction

The diffusion model is one of the most successful models of choice reaction time in cognitive psychology (see , for an overview) and, while software packages that make diffusion model analyses possible have been made available , a diffusion model R package was so far missing. Due to the nature of the model, no closed analytical forms for the computation of the distribution functions exist, which has generated a need for more involved computational approximations of the various functions .

Here, we present an R package, RWiener, that provides the four functions R uses to represent a distribution, implemented for the Wiener diffusion model: the density function d to compute the probability density function (PDF) at a given quantile for a given parameter set; the probability function p to compute the cumulative distribution (CDF) at a given quantile for a given parameter set; the inverse CDF or quantile functionImplemented using a reverse lookup algorithm of the CDF function. q to compute the quantile at a given p-value (CDF-value) for a given parameter set; and a random number generator (RNG) r to generate random samples for a given parameter set.

In addition, we extended the package to include several likelihood-based functions that can be used in combination with R’s built-in estimation routines, like the nlm or optim function, to estimate the model parameters for an observed data set, assuming a Wiener diffusion process as the underlying process that created the data.

First, we will present the standard Wiener diffusion model with four parameters: the boundary separation α, the non-decision time τ, the bias parameter β and a drift rate parameter δ. Next, we will show the standard functionality of the RWiener package and how it can be used for parameter estimation. To conclude, we add a small discussion about the utility and extensibility of the here presented methods.

2 The Wiener diffusion model

For two-choice reaction time (2CRT) data coming from experiments in which subjects are asked to give one of two responses on a decision task, the Wiener diffusion model and its extensions have provided useful modeling approaches. In its simplest complete form, the Wiener diffusion model includes four parameters. The decision process is thought of as a continuous random walk, or diffusion, process that starts somewhere between two boundaries and ends as soon as it hits or crosses one or the other. The time of the first passage, and which boundary is hit first (and therefore which decision will be made) is probabilistic with a probability determined by the parameter set.

Table 1: The four main parameters of the Wiener diffusion model, with their substantive interpretations. Reprinted with permission from . Copyright 2009, Joachim Vandekerckhove and Department of Psychology and Educational Sciences, University of Leuven, Belgium.
Symbol Parameter Interpretation
α Boundary separation Speed-accuracy trade-off
(high α means high accuracy)
β Initial bias Bias for either response
(β>0.5 means bias towards response ‘A’)
δ Drift rate Quality of the stimulus
(close to 0 means ambiguous stimulus)
τ Nondecision time Motor response time, encoding time
(high means slow encoding, execution)

The standard Wiener diffusion model incorporates the following four parameters, a summary of which is provided in Table 1: (1) the boundary separation α which defines the distance between the two boundaries, with the first boundary being at 0 and the second being at α, (2) the non-decision time τ that defines the time that passes without any diffusion process going on (e.g., this incorporates the time needed to encode a stimulus and to execute a motor response), (3) the bias parameter β defining the relative starting point of the diffusion process between the two boundaries (the absolute starting point ζ0 can be obtained by ζ0=βα) and (4) the drift rate parameter δ which captures the tendency of the diffusion process to drift towards the upper boundary. Given these parameters, the latent information accumulation process is modeled with the stochastic process ddtXtNormal(δ,s2), with initial condition X0=αβ. The decision time is modeled as the first time at which Xt0 or Xtα, and the response time is the sum of the decision time and the non-decision time τ. Figure 1 shows an illustration of the Wiener diffusion model as described.

graphic without alt text
Figure 1: A graphical illustration of the Wiener diffusion model for two-choice reaction times. An evidence counter starts at value αβ and evolves with random increments. The mean increment is δ. The process terminates as soon as the accrued evidence exceeds α or deceeds 0. The decision process starts at time τ from the stimulus presentation and terminates at the reaction time.

The parameters have the following restrictions: 0<β<1, α>0, τ>0. Often, the variance parameter s is fixed at 0.1, whereas we fixed it at 1, yielding a more convenient interpretation of the drift rate parameter and greater computational efficiency. Conversion to a scale with the variance parameter fixed at any s can be done by multiplying the α and δ parameters by s.

The upper boundary may be defined as corresponding to correct trials, with the lower boundary corresponding to error trials. However, it often also makes sense to map qualitatively different responses to the two boundaries (e.g., in a same-different discrimination task, the upper bound could represent the ‘different’ responses). For generality, we will always speak simply of upper and lower boundaries. An overview of the computational strategy for the PDF and CDF functions is provided in the Appendix.

3 Usage of the package

The package can be obtained via CRAN for R version 2.15.0 or higher, and includes documentation that provides useful information and examples. As a guide through this demonstration of the RWiener package functionality, we use an example data set created by the random function of the package (using a RNG seed of 0 for exact reproducibility):

set.seed(0)
dat <- rwiener(n=100, alpha=2, tau=.3, beta=.5, delta=.5) 

The arguments for this function call are explained in the next paragraph. This command will generate a random data set, consisting of 100 observations, each generated from a random Wiener diffusion model process with the given four parameters. We used a drift rate parameter δ of 0.5, meaning that the drift diffusion process slightly tends towards the upper boundary. There was no initial bias towards either of the two boundaries, indicated by β=0.5. The boundaries are separated by a value of α=2 and we used a non-decision time of τ=0.3, so there will never be a RT smaller than 0.3.

The created dat object is a dataframe with 100 observations and two variables: dat$q contains the RTs, whereas dat$resp contains a factor to indicate the response made (upper vs. lower). Note that the upper bound is assigned to the first level and the lower bound to the second one.

The four main functions of the package are described in the following subsection. All four functions are implemented in C, using the R library for C functions. These C functions are called inside end-user R functions, so that the end-user never calls the C functions directly, only indirectly through the provided R functions.

Standard distribution functions

To get the density of a specific quantile, one can use the density function dwiener:

dwiener(dat$q[1], alpha=2, tau=.3, beta=.5, delta=.5, resp=dat$resp[1], give_log=FALSE)

The function takes seven arguments, the first argument is the RT at which the density shall be evaluated, the next four arguments are the model parameters α, τ, β, and δ. The next argument, resp takes any of three valid values: upper, lower, or both, meaning that the function shall return the density at the given quantile for the upper boundary, the lower boundary or the sum of both densities together, respectively. Its default value is set to upper. Lastly, the function can be given an additional argument give_log that takes TRUE or FALSE as values and is set to FALSE as a default. When this argument is TRUE, the function will return the logarithm of the density instead of the density itself.

This function can also be used to draw simple plots. For example, the following code results in the plot in the left panel of Figure 2:

curve(dwiener(x, 2, .3, .5, .5, rep("upper", length(x))), 
      xlim=c(0,3), main="Density of upper responses", 
      ylab="density", xlab="quantile") 
graphic without alt text
Figure 2: Example of plots made with function dwiener and pwiener.

To calculate the cumulative distribution, we can use the CDF function pwiener as follows:

pwiener(dat$q[1], alpha=2, tau=.3, beta=.5, delta=.5, resp=dat$resp[1])

The arguments are the same as for the previously described function, without the give_log argument. Further, this function can be used in the same manner to draw plots, like the one shown in the right panel of Figure 2.

We can also find the appropriate quantile for a given probability via the inverse CDF function, implemented as qwiener:

# lookup of the .2 quantile for the CDF of the lower boundary 
qwiener(p=.2, alpha=2, tau=.3, beta=.5, delta=.5, resp="lower")

Again, the last five arguments of this function have the same meaning as for the two previously described functions. The first argument, however, is different: instead of a quantile, the first argument needs to be a probability, indicating the CDF-value to be looked up.

To round out the four standard distribution functions, the RWiener package further provides a function named rwiener that generates random values. Again, the last four input variables correspond to the model parameters and are the same as in the other distribution functions. The first argument n takes a number, indicating the desired number of randomly generated variates.

Plot function

To easily visualize data from the Wiener diffusion model, we can use the plot function wiener_plot as follows:

wiener_plot(dat)

This draws the two densities of the observed (or, in this case, generated) lower and upper responses. Figure 3 shows an example of this plot.

graphic without alt text
Figure 3: Plot drawn with wiener_plot function.

Parameter estimation with R’s general-purpose optimization algorithms

In addition to the four distribution functions dwiener, pwiener, qwiener, rwiener and the plot function wiener_plot, the RWiener package provides four more functions designed for use in parameter estimation and model selection: (1) the wiener_likelihood function, which calculates the logarithm of the likelihood for given parameter values and data; (2) the wiener_deviance function, which calculates the model deviance 2; (3) the wiener_aic function, which calculates the model AIC (Akaike’s Information Criterion) 2+8; and (4) the wiener_bic function, which calculates the model BIC (Bayesian Information Criterion) 2+4ln(N).

x <- c(2, .3, .5, .5) 
wiener_likelihood(x=x, dat=dat) 
wiener_deviance(x=x, dat=dat) 
wiener_aic(x=x, dat=dat) 
wiener_bic(x=x, dat=dat) 

The first argument of these functions, x, is a vector containing the four parameter values of the model in the order: α, τ, β, δ. The second argument of these functions, dat, is a dataframe that contains the data points and the values of the two dependent variables: the RT and the boundary that was hit. This dataframe has to have the same shape as the dataframe dat as generated previously with the rwiener function. The first column of the dataframe has to contain the RTs and the second column of the dataframe has to contain the response type (upper vs. lower).

Estimating parameters for a single condition

In order to estimate model parameters for a given data set, one can use R’s optim or nlm function:Note that the wiener_likelihood function will return Inf for certain values, causing the nlm function to warn that the Inf was replaced by the maximum positive value.

# using optim, first with Nelder-Mead algorithm, then with BFGS 
optim1 <- optim(c(1, .1, .1, 1), wiener_deviance, dat=dat, method="Nelder-Mead") 
optim2 <- optim(optim1[["par"]], wiener_deviance, dat=dat, method="BFGS", hessian=TRUE)

# using nlm, which uses a Newton-type algorithm 
nlm1 <- nlm(p=c(1, .1, .1, 1), f=wiener_deviance, dat=dat) 

The obtained model parameter estimates can then be used to draw further inferences or create predictions. The help pages of the functions presented here show additional information and examples of usage.

Estimating parameters for multiple conditions

The deviance function can be combined to compute a single loss value for multiple conditions. To do so, first create a new inline function:

many_drifts <- function(x, datlist) {
  l = 0
  for (c in 1:length(datlist)) {
    l = l + wiener_deviance(x[c(1, 2, 3, c+3)], datlist[[c]])
  }
  return(l)
}

# create a second data set and a list containing both data sets 
dat2 <- rwiener(n=100, alpha=2, tau=.3, beta=.5, delta=1) 
datlist <- list(dat, dat2) 

# use nlm to estimate parameters 
nlm1 <- nlm(p=c(1, .1, .1, 1, 1), f=many_drifts, dat=datlist)

If the input x is a parameter vector containing the first three parameters (α, τ, β) and then C drift rates, and dat is a list with C data sets, then the result will contain point estimates of three joint parameters in addition to C condition-specific drift rate estimates.

An alternative function can be defined that does not allow the drift rates to differ between conditions:

one_drift <- function(x, datlist) { 
  l = 0 
  for (c in 1:length(datlist)) { 
    l = l + wiener_deviance(x, datlist[[c]]) 
  } 
  return(l) 
} 

nlm2 <- nlm(p=c(1, .1, .1, 1), f=one_drift, dat=datlist)

Finally, the two competing models can be compared using the AIC criterion for model selection.The input argument loss, with which a custom deviance function can be passed to compute the AIC for more complex models, is only available from RWiener v1.2.

AIC1 <- wiener_aic(x=nlm1$estimate, dat=datlist, loss=many_drifts) 
AIC2 <- wiener_aic(x=nlm2$estimate, dat=datlist, loss=one_drift) 

For more details on the interpretation of the AIC criterion, see .

4 Discussion

We presented RWiener, an R package that provides efficient algorithms to compute Wiener diffusion model functions and use these for further inference. This is the first package in R that provides full Wiener diffusion model functionality for R. We demonstrated how to use these functions and how to extend their usage to combine them with other R functions in order to do parameter estimation.

The presented functions can help researchers who work with 2CRT data to analyze their data in the Wiener diffusion model framework.

5 Authors’ note

Correspondence concerning this article may be addressed to DW () or JV (). This project was supported by grant #1230118 from the National Science Foundation’s Measurement, Methods, and Statistics panel to JV, and a travel grant from German Academic Exchange Service (PROMOS) to DW.

6 Appendix

Mathematical descriptions of the density and distribution functions

Probability density function

The PDF of the diffusion model is that of the first-passage times of the accumulation process over the boundaries. That is, it is the expected distribution of the time until the process first hits or crosses one or the other boundary. This results in a bivariate distribution, over responses x and hitting times t. The PDF is approximated by Equation (1):

(1)d(t,x=0|α,τ,β,δ)=1α2exp[αβδ12δ2(tτ)]f(tτα2|β).

The first passage time distribution for hits at the opposite boundary is d(t,x=1|α,τ,β,δ)=d(t,x=0|α,1β,τ,δ). In Equation (1), f can be either the large-time representation

(2)fL(u|β)=πk=1+kexp(k2π2u2)sin(kπβ)

or the small-time representation

(3)fS(u|β)=12πu3k=+(2k+β)exp[(2k+β)22u].

Either approximation of f can be more computationally efficient (in terms of the number of terms required to have the infinite sum converge below an error bound ε=1010), depending on the parameter and data values. Specifically, provide a decision rule to determine the more efficient formula: Equation (3) should be used if and only if

(4)2+2ulog(2ε2πu)2log(πuε)π2u<0,

where u=tτα2. The derivation of this decision rule, as well as demonstrations of its efficiency, are given in .

Cumulative distribution function

A similar rule for the efficient computation of the CDF can be found in —there exists a large-time representation

(5)pL(t,x=0|α,τ,β,δ)=P02πα2exp(αβδδ2(tτ)2)×k=1+ksin(πkβ)δ2+(kπ/α)2exp[12(kπα)2(tτ)],

and a small-time representation

(6)pS(t,x=0|α,τ,β,δ)=P0sgnδk=+{exp(2δαk2δαβ)Φ[sgnδ2αk+αβδ(tτ)tτ]exp(2δαk)Φ[sgnδ2αkαβδ(tτ)tτ]},

where sgn is the signum function and Φ denotes the cumulative normal distribution function. In both cases, P0 is the probability of a hit at the lower boundary,

P0={1exp[2δα(1β)]exp(2δαβ)exp[2δα(1β)]if δ0,1βif δ=0.

In order to reach an approximation with absolute error not exceeding ε=1010, the infinite sums in Equations (5) and (6) are approximated with KL and KS terms, respectively . KL and KS are the smallest integers that satisfy the following constraints: {KL21tτ(απ)2KL22tτ(απ)2{log[12επ(tτ)(δ2+π2α2)]+δαβ+12δ2(tτ)} and {KSβ1+12δαlog{ε2[1exp(2δα)]}KS12α[0.5352(tτ)+δ(tτ)+αβ]KS12β12αtτΦ1{εα0.32π(tτ)exp[12δ2(tτ)+δαβ]}. Due to the larger computational effort of the small-time representation of the CDF, the large-time representation is selected if KL/KS<10.

CRAN packages used

RWiener

CRAN Task Views implied by cited packages

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

Footnotes

  1. Implemented using a reverse lookup algorithm of the CDF function.[↩]
  2. Note that the wiener_likelihood function will return Inf for certain values, causing the nlm function to warn that the Inf was replaced by the maximum positive value.[↩]
  3. The input argument loss, with which a custom deviance function can be passed to compute the AIC for more complex models, is only available from RWiener v1.2.[↩]

References

S. P. Blurton, M. Kesselmeier and M. Gondan. Fast and accurate calculations for cumulative first-passage time distributions in Wiener diffusion models. Journal of Mathematical Psychology, 56(6): 470–475, 2012.
D. J. Navarro and I. G. Fuss. Fast and accurate calculations for first-passage times in Wiener diffusion models. Journal of Mathematical Psychology, 53(4): 222–230, 2009.
F. Tuerlinckx, E. Maris, R. Ratcliff and P. De Boeck. A comparison of four methods for simulating the diffusion process. Behavior Research Methods, 33(4): 443–456, 2001.
J. Vandekerckhove. Extensions and applications of the diffusion model for two-choice response times. 2009.
J. Vandekerckhove and F. Tuerlinckx. Diffusion model analysis with MATLAB: A DMAT primer. Behavior Research Methods, 40(1): 61–72, 2008.
J. Vandekerckhove and F. Tuerlinckx. Fitting the Ratcliff diffusion model to experimental data. Psychonomic Bulletin & Review, 14(6): 1011–1026, 2007.
J. Vandekerckhove, F. Tuerlinckx and M. D. Lee. Hierarchical diffusion models for two-choice response times. Psychological Methods, 16(1): 44, 2011.
A. Voss and J. Voss. fast-dm: A free program for efficient diffusion model analysis. Behavior Research Methods, 39(4): 767–775, 2007.
E.-J. Wagenmakers. Methodological and empirical developments for the Ratcliff diffusion model of response times and accuracy. European Journal of Cognitive Psychology, 21(5): 641–671, 2009.
E.-J. Wagenmakers and S. Farrell. AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1): 192–196, 2004.
E.-J. Wagenmakers, H. L. J. van der Maas and R. P. P. P. Grasman. An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14(1): 3–22, 2007.
T. V. Wiecki, I. Sofer and M. J. Frank. HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7(14): 2013.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Wabersich & Vandekerckhove, "The RWiener Package: an R Package Providing Distribution Functions for the Wiener Diffusion Model", The R Journal, 2014

BibTeX citation

@article{RJ-2014-005,
  author = {Wabersich, Dominik and Vandekerckhove, Joachim},
  title = {The RWiener Package: an R Package Providing Distribution Functions for the Wiener Diffusion Model},
  journal = {The R Journal},
  year = {2014},
  note = {https://rjournal.github.io/},
  volume = {6},
  issue = {1},
  issn = {2073-4859},
  pages = {49-56}
}