quickpsy is a package to parametrically fit psychometric functions. In comparison with previous R packages, quickpsy was built to easily fit and plot data for multiple groups. Here, we describe the standard parametric model used to fit psychometric functions and the standard estimation of its parameters using maximum likelihood. We also provide examples of usage of quickpsy, including how allowing the lapse rate to vary can sometimes eliminate the bias in parameter estimation, but not in general. Finally, we describe some implementation details, such as how to avoid the problems associated to round-off errors in the maximisation of the likelihood or the use of closures and non-standard evaluation functions.
The response of humans, other animals and neurons in a classification task with a binary response variable and a stimulus level as explanatory variable is often binomially modelled as (Watson 1979; O’Regan and Humbert 1989; Klein 2001; Wichmann and Hill 2001a; Macmillan and Creelman 2004; Gold and Shadlen 2007; Kingdom and Prins 2009; Knoblauch and Maloney 2012; Gold and Ding 2013; Lu and Dosher 2013)
where
where
The model assumes that a given classification response does not depend on previous classifications. This is an idealisation, given the known order effects such as adaptation, fatigue, learning or serial dependence (Kingdom and Prins 2009; Fründ et al. 2011; Van der Burg et al. 2013; Fischer and Whitney 2014; Summerfield and Tsetsos 2015).
Light detection. To measure the ability of an observer to detect
light, a dim flash of light selected at random from 5 different light
intensities (
Criterion-independent light detection. In the previous procedure,
Light detection with lapses. Sometimes, the observer will miss the
flash (because of a blink, for example) or will make an error reporting
the response (pressing the wrong response button, for example). To
account for these response lapses,
The point estimation of the parameters
Maximising
The confidence intervals CI for
The observer’s behaviour in classification tasks is often summarised by
a threshold
ML estimates of the parameters are often obtained using non-linear optimisation (Nash 2014), a method that might produce unsuitable estimates when the data suffer from lapses as those described above. Future studies might elucidate the possible problems of non-linear optimisation to fit psychometric functions with lapses and whether Bayesian approaches might be preferred (Kuss et al. 2005).
quickpsy (Linares and López-Moliner 2016) is
a package to estimate and plot the parameters, the thresholds, the
bootstrap confidence intervals for the parameters and the thresholds,
and the associated psychometric functions for the model in (1).
quickpsy also allows the comparison of the parameters and the
thresholds for different groups using the bootstrap. It is build to
easily analyse data for multiple groups, which is common in
classification experiments (Gescheider 1997; Lu and Dosher 2013): multiple
observers, for example, might be tested under different conditions, such
as flashes of different size in the light detection experiments. For
that purpose, quickpsy incorporates, for example, a function to easily
create a data frame from multiple files (quickreadfiles
) or uses the
dimensions of the groups in the data to produce standard plots for the
parameters, the thresholds and the associated psychometric functions.
As an alternative to quickpsy, classification data can also be
analysed in R using the base function glm
for fitting generalized
linear models and tools from package
psyphy
(Knoblauch and Maloney 2012; Knoblauch 2014). Fitting psychometric functions could be
considered a special case of fitting generalised linear models (GLM)
(Moscatelli et al. 2012; Knoblauch and Maloney 2012), which is done in R using glm
(Knoblauch and Maloney 2012). With glm
, one can estimate the parameters of the
linear predictor associated with the GLM and use them to estimate the
parameters of the model in (1) (Knoblauch and Maloney 2012). Also by
applying confint
to the glm
output (Knoblauch and Maloney 2012), the
confidence intervals can be calculated using the profile likelihood
method (Venables and Ripley 2002). To calculate the parameters and confidence
intervals for multiple groups, the user needs to manually or
automatically loop by group. Alternatively, if one is not interested
in the individual values of the parameters for each group, but on
fitting a single model to the multiple groups, glm
allows to do that
using advanced statistical methods
(Moscatelli et al. 2012; Knoblauch and Maloney 2012).
To estimate the parameters in (1) using glm
is straightforward
when glm
in action for the
HSP
dataset, see Knoblauch and Maloney (2012)), but becomes more complicated when
glm
when
psyfun.2asym
function, which by
iterating glm
calls, allows the estimation of
The specific shape of
In other languages, the current available tools to calculate the parameters in (1) and its confidence intervals are psignifit (Fründ et al. 2011) for Python (free), psycophysica (Watson and Solomon 1997) for Mathematica (commercial) and palamedes (Kingdom and Prins 2009) for MATLAB (commercial). From them, palamedes can also fit models for multiple groups. Furthermore, palamedes and psignifit can fit psychometric functions using Bayesian methods, which is something not implemented in quickpsy.
A classic experiment on light detection (similar to the one first described in the Introduction) was conducted by Hecht et al. (1942). The data from the experiment is available in MPDiR, a package that includes material from the book Modeling Psychophysical Data in R (Knoblauch and Maloney 2012).
The data is included in the data frame HSP
, which has a tidy
structure (Wickham 2014), that is, each column corresponds to a
variable and each row is an observational unit. The variables in HSP
are the intensity level measured in quanta Q
(what we named stimulus
level N
(what we named Obs
and the run for each observer Run
. In
the original dataset, the number of Yes responses for each stimulus
level p
of responding Yes was
given instead. But, calculating the number of Yes responses from p
is trivial:
library(MPDiR)
library(dplyr)
library(quickpsy)
HSP <- HSP %>% mutate(k = round(p * N / 100)) # adding a column with the number of Yes
We used the round
function because, curiously, there was some error in
the original data set (see (Knoblauch and Maloney 2012)).
To fit the model in (1) to the HSP
dataset, we call the
quickpsy
function from package quickpsy
fit <- quickpsy(HSP, Q, k, N, grouping = .(Run, Obs), B = 1000)
where B
indicates the number of bootstrap samples (which can be
reduced to shorten the computation time). In this call to quickpsy
,
many arguments were not explicitly specified but left to the default
values: guess = 0
and lapses = 0
;
fun = cum_normal_fun
;
the probability to calculate the threshold set to 0.5:
prob = guess + 0.5 * (1 - guess)
; the confidence intervals calculated
using parametric bootstrap: bootstrap = "parametric"
.
quickpsy
returns a list with all the information from the fitting.
Most elements of the list are data frames. For example, the parameters
fit$par
(where p1
corresponds to p2
corresponds to fit$parci
. The thresholds and the confidence
intervals for the thresholds are located in fit$thresholds
and
fit$thresholdsci
.
quickpsy
also returns the data frame fit$parcomparisons
, which
includes for each parameter, paired comparisons between groups for all
possible pairs of groups using the bootstrap (Efron and Tibshirani 1994).
To compare two given groups, the difference between the bootstrap
estimations of the parameter is calculated for all samples and from the
distribution of differences, given the significance level set by the
user (default: .95), the percentile confidence intervals are calculated.
It is considered that the parameter differs between the two groups if
the confidence intervals do not include zero. Paired comparisons for the
thresholds performed using the same method are available in
fit$thresholdcomparisons
.
HSP
data.To plot the fitted psychometric functions, we call the quickpsy
function plotcurves
including the fitted model as an argument
plotcurves(fit)
To plot the parameters and the thresholds, we use functions
plotpar(fit)
and plotthresholds(fit)
from package quickpsy,
respectively.
Hecht et al. (1942), indeed, used quickpsy
with log = TRUE
fit <- quickpsy(HSP, Q, k, N, grouping = .(Run, Obs), B = 1000, log = TRUE)
The plots associated to the fit above (Figure 1), using package gridExtra (Auguie 2015) for the plot arrangements, can be obtained as follows
library(gridExtra)
grid.arrange(plotcurves(fit), arrangeGrob(plotpar(fit), plotthresholds(fit), ncol = 2))
HSP
is a data frame that contains summarised data: the counts of Yes
responses. quickpsy
, however, can fit the data frames containing more
raw data in which each row corresponds to the result of one
classification. In that case, the data frame should contain a response
column with 1
s indicating Yes responses and 0
s or -1
s indicating
No responses and quickpsy
should be called with the name of the
response column as the k
argument (without the argument n
corresponding to the number of trials).
We hope that this example has illustrated that quickpsy requires little coding to perform typical fits and plots in a classification task with multiple groups.
Following the second example in the Introduction, consider some hypothetical data in which an observer needs to decide on which from two intervals a flash of light was presented.
Q <- c(80, 160, 240, 320, 400) # luminance
n <- 100 # number of classifications per stimulus level
k <- c(59, 56, 69, 84, 96) # number of correct classifications
dat2IFC <- data.frame(Q, k, n)
To fit the model in (1) to this data, we call quickpsy
with
guess = 0.5
)
fit <- quickpsy(dat2IFC, Q, k, n, guess = .5)
Consider some hypothetical data from a light detection experiment, in which the observer commits a lapse (third example of the Introduction).
Q <- seq(0, 420, 60)
n <- 20
k <- c(0, 0, 4, 18, 20, 20, 19, 20) # lapse in the second to last intensity
datLapse <- data.frame(Q, k, n)
Suppose that we suspect that the observer might have committed lapses.
Accordingly, we fit the model in (1) allowing lapses = TRUE
).
fit <- quickpsy(datLapse, Q, k, n, lapses = TRUE, prob = .75)
The fitted psychometric function obtained with plotcurves(fit)
is
shown in Figure 2 on the left.
Typically, we are interested in the values of
fit <- quickpsy(datLapse, Q, k, n, lapses = 0, prob = .75)
Figure 2 on the right shows that
Allowing
parweibull <- c(10, 3)
create_xs <- function(i, f) data.frame(scheme = i, y = f,
x = inv_weibull_fun(f, parweibull))
s <- list()
s[[1]] <- create_xs(1, c(.3, .4, .48, .52, .6, .7))
s[[2]] <- create_xs(2, c(.1, .3, .4, .6, .7, .9))
s[[3]] <- create_xs(3, c(.3, .44, .7, .8, .9, .98))
s[[4]] <- create_xs(4, c(.1, .2, .3, .4, .5, .6))
s[[5]] <- create_xs(5, c(.08, .18, .28, .7, .85, .99))
s[[6]] <- create_xs(6, c(.3, .4, .5, .6, .7, .99))
s[[7]] <- create_xs(7, c(.34, .44, .54, .8, .9, .98))
s <- do.call("rbind", s)
s$scheme <- factor(s$scheme)
Next, Wichmann and Hill simulated binomial responses using the Weibull
function with several possible values for
create_sim_dat <- function(d) {
psychometric_fun <- create_psy_fun(weibull_fun, .5, d$lambda)
ypred <- psychometric_fun(d$x, parweibull)
k <- rbinom(length(d$x), d$n, ypred)
data.frame(x = d$x, k = k, n = d$n , y = k/d$n)
}
library(dplyr)
simdat <- merge(s, expand.grid(n = 160, sample = 1:100, lambda = seq(0,.05, .01))) %>%
group_by(scheme, n, sample, lambda) %>% do(create_sim_dat(.))
To fit the simulated data, we use quickpsy
bounding the possible
values of
fit_lapses <- quickpsy(simdat, x, k, n, within = .(scheme, lambda, sample),
fun = weibull_fun, bootstrap = "none", guess =.5, lapses = TRUE,
parini = list(c(1, 30), c(1, 10), c(0, .06)))
Then, we average the threshold estimation across simulations
thre_lapses <- fit_lapses$thresholds %>% group_by(scheme, lambda) %>%
summarise (threshold = mean(thre))
and plot them including the non-biased threshold for comparison
real_threshold <- inv_weibull_fun((.75 - .5) / (1 - .5 - 0), parweibull)
library(ggplot2)
ggplot(thre_lapses) + geom_point(aes(x = lambda, y = threshold, color = scheme)) +
geom_hline(yintercept = real_threshold, lty = 2)
Figure 3 shows, consistent with Prins (2012), that the
thresholds are biased for all of the sampling schemes and that the bias
increases with the
In the previous sections, we exemplified the use of quickpsy for
performance-based procedures (light detection), but fitting psychometric
functions is also common for appearance-based procedures
(Kingdom and Prins 2009). For example, psychometric functions are often fitted
to data measuring visual illusions such as the flash-lag (e.g.,
(López-Moliner and Linares 2006)). In those cases,
quickpsy
, the main function of quickpsy, is a non-standard
evaluation NSE function and, as such, the names of the arguments and not
only their values can be accessed (Wickham 2014). NSE functions,
which are common in R, are useful for example to label the axes of a
plot using the name of the arguments (Wickham 2014). As calling NSE
functions from other functions is difficult (Wickham 2014),
quickpsy also incorporates quickspsy_
which is the standard
evaluation SE version of quickpsy
. To call SE functions, some of the
arguments need to be quoted. The following code exemplifies the use of
quickspsy_
to fit the HSP
dataset from the first example of usage
fit <- quickpsy_(HSP, "Q", "k", "N", grouping = c("Run", "Obs"))
The NSE high-level plotting functions of quickpsy plotcurves
,
plotpar
and plotthresholds
also have associated SE versions:
plotcurves_
, plotpar_
and plotthresholds_
.
One of the main features of quickpsy is the possibility of fitting
multiple groups. To handle the data analysis and plotting for multiple
groups, quickpsy relies on
dplyr (Wickham and Francois 2015) and
ggplot2 (Wickham 2009)
respectively. In particular, quickpsy makes extensive use of the
function do
from dplyr, which splits input data frames by group,
applies a function to each group and returns an output data frame.
quickpsy
, for example, calls the functions curves
and thresholds
,
which basically contain a do
function that, in turn, calls one_curve
and one_threshold
, which are the functions that calculate the
psychometric curve and the threshold for a group of data (this method of
function X
calling function one_X
, which performs the computations
for a group of data, is used for some other functions called from
quickpsy
).
Closures or function factories are functions written by functions
(Wickham 2014). When estimating the maximum likelihood parameters
for the model in (1), two procedures can be executed naturally
using closures. One is the creation of create_psy_fun
closure. The other is the creation of the negative log
likelihood function from create_nll
closure.
Consider that we want to manually fit a cumulative normal psychometric function to the following data
x <- c(-0.056, 0.137, 0.331, 0.525, 0.719, 0.912, 1.100)
k <- c(0, 5, 11, 12, 12, 12, 12)
n <- c(12, 12, 12, 12, 12, 12, 12)
by direct minimisation of the negative log likelihood. In quickpsy we build the negative log likelihood using a function similar to
nll <- function(p) {
phi <- pnorm(x, p[1], p[2])
-sum(k * log(phi) + (n - k) * log(1 - phi))
}
and use optim
to find the parameters that minimise it
optim(c(0.5, 0.1), nll)
optim
requires initial values for the parameters that we arbitrarily
chose as c(0.5, 0.1)
. But suppose that we choose another set of
initial parameters
optim(c(0.1, 0.1), nll)
This time optim
returns an error:
Error in optim(c(0.1, 0.1), nll) :
function cannot be evaluated at initial parameters
The problem is that for the 1.100
stimulus level,
pnorm(1.100, 0.1, 0.1)
is rounded to 1
and therefore the term
log(1 -
phi)
in the negative log likelihood is -Inf
.
To avoid this problem, the coding of the negative log likelihood in quickpsy incorporates the following lines
phi[phi < .Machine$double.eps] <- .Machine$double.eps
phi[phi > (1 - .Machine$double.eps)] <- 1 - .Machine$double.eps
That is, values that are smaller than the machine accuracy
(.Machine$double.eps
) are replaced by .Machine$double.eps
and values
that are larger than 1 -
.Machine$double.eps
are replaced by 1 -
.Machine$double.eps
. We can verify that quickpsy
does not produce
errors when using the problematic initial values with the following
code
dat <- data.frame(x, k, n)
fit <- quickpsy(dat, x, k, n, parini = c(0.1, 0.1))
quickpsy allows to use the cumulative normal function, but also the
cumulative logistic, cumulative Weibull or any other cumulative
distribution function defined by the user for modeling the probability
of responding Yes. The function nll
evaluating the log likelihood
was thus defined in quickpsy similar to the way above in order to
encompass all these variants. Note, however, that if the aim was to fit
only a cumulative normal function, the negative log likelihood function
could be defined in the following way in R to cover a larger range of
possible values of p
without problems by directly evaluating the
cumulative distribution function on the log scale
nll <- function(p) {
logPhi <- pnorm(x, p[1], p[2], log.p = TRUE)
log1mPhi <- pnorm(x, p[1], p[2], lower.tail = FALSE, log.p = TRUE)
-sum(k * logPhi + (n - k) * log1mPhi)
}
By default, quickpsy
searches the minimum of the negative log
likelihood using optim
, which requires initial values for the
parameters. To free the user from the necessity of providing initial
parameters, quickpsy
uses as initial values the parameters obtained by
linear modelling of the probit-transformed data
(McKee et al. 1985; Gescheider 1997). Linear modelling
probit-transformed data is a poor technique to estimate the parameters
of the psychometric function (McKee et al. 1985), but our tests suggest
that they are good enough to be used as initial parameters.
The user can overwrite the initial parameters calculated using
probit-transformed data by providing a vector of initial parameters to
the argument parini
of quickpsy
. A list of vectors can also be fed
when the user wants to set up some bounds for the parameters (see the
last example of usage).
We thank Kenneth Knoblauch for providing helpful comments on a draft version of the manuscript and helping us with the problem associated to the rounding-off errors in the calculation of the likelihood. The research group is supported by Grant 2014SGR79 from the Catalan government. J. López-Moliner was supported by an ICREA Academic Distinguished Professorship award and grant PSI2013-41568-P from MINECO.
quickpsy, psyphy, modelfree, MPDiR, gridExtra, dplyr, ggplot2
Databases, ModelDeployment, Phylogenetics, Psychometrics, Spatial, TeachingStatistics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Linares & López-Moliner, "quickpsy: An R Package to Fit Psychometric Functions for Multiple Groups", The R Journal, 2015
BibTeX citation
@article{RJ-2016-008, author = {Linares, Daniel and López-Moliner, Joan}, title = {quickpsy: An R Package to Fit Psychometric Functions for Multiple Groups}, journal = {The R Journal}, year = {2015}, note = {https://rjournal.github.io/}, volume = {8}, issue = {1}, issn = {2073-4859}, pages = {122-131} }