This article describes the R package rdrobust, which provides data-driven graphical and inference procedures for RD designs. The package includes three main functions: rdrobust
, rdbwselect
and rdplot
. The first function (rdrobust
) implements conventional local-polynomial RD treatment effect point estimators and confidence intervals, as well as robust bias-corrected confidence intervals, for average treatment effects at the cutoff. This function covers sharp RD, sharp kink RD, fuzzy RD and fuzzy kink RD designs, among other possibilities. The second function (rdbwselect
) implements several bandwidth selectors proposed in the RD literature. The third function (rdplot
) provides data-driven optimal choices of evenly-spaced and quantile-spaced partition sizes, which are used to implement several data-driven RD plots.
The regression-discontinuity (RD) design is a widely employed quasi-experimental research design in social, behavioral and related sciences; for reviews see Imbens and T. Lemieux (2008) and Lee and T. Lemieux (2010). In this design, units are assigned to treatment based on whether their value of an observed covariate is above or below a known cutoff, and the probability of receiving treatment conditional on this covariate jumps discontinuously at the cutoff. This jump induces “variation” in treatment assignment that may be regarded, under appropriate assumptions, as being unrelated to potential confounders. Thus, inference in RD designs is typically conducted using only observations near the cutoff or threshold, where the discontinuous change in the probability of treatment assignment occurs. Due to its local nature, RD average treatment effects estimators are usually constructed using local-polynomial nonparametric regression, and statistical inference is based on large-sample approximations.
This article gives an introduction to the R package rdrobust (Calonico et al. 2015b), which offers an array of data-driven local-polynomial and partitioning-based inference procedures for RD designs. We introduce three main functions implementing several data-driven nonparametric point and confidence intervals estimators, bandwidth selectors, and plotting procedures useful for RD empirical applications:
rdrobust
(). This function implements the bias-corrected robust (to
“large” bandwidth choices) inference procedure proposed by
Calonico et al. (2014, CCT hereafter), as well as
many other RD inference procedures employing local-polynomial
regression. The function rdrobust
offers bias-corrected robust
confidence intervals for average treatment effects at the cutoff for
sharp RD, sharp kink RD, fuzzy RD and fuzzy kink RD designs.
rdbwselect
(). This function implements several data-driven
bandwidth selectors for RD designs based on the recent work of
Imbens and K. Kalyanaraman (2012, IK hereafter) and CCT. Although
this command may be used as a stand-alone bandwidth selector in RD
applications, its main purpose is to provide fully data-driven
bandwidth choices to be used by our main function rdrobust()
.
rdplot
(). This function implements several data-driven optimal
choices of evenly-spaced and quantile-spaced bins, which are useful
to produce RD plots that either approximate the regression function
by local sample averages or represent the overall variability of the
data in a disciplined way. These optimal choices are based on an
integrated mean squared error expansion of appropriately constructed
partitioning estimators, as discussed in
Calonico et al. ({in press}); see also
Cattaneo and Farrell (2013) for related results. These binned sample
means and partition size chosen are used to construct the popular RD
plots commonly found in RD applications in a fully automatic way.
We first provide a brief review of all the methods implemented in
rdrobust, and then discuss an empirical illustration using some of the
features of our functions rdrobust()
, rdbwselect()
and rdplot()
. A
full description of the capabilities of the package rdrobust is
available in its manual and help files. A companion Stata (StataCorp. 2013)
package is described in Calonico, M. D. Cattaneo, and R. Titiunik (2014).
We now present the basic RD framework, describe the population parameters of interest, introduce the local-polynomial based estimators, review different inference procedures, and briefly summarize the popular RD plots. Regularity conditions and other technical aspects underlying the estimands and estimators may be found in the references given throughout. Recent results and further details on RD designs are discussed in IK and CCT (see also the supplemental appendix), and references therein.
We adopt the potential outcomes framework commonly employed in the treatment effects literature (e.g., Heckman and E. J. Vytlacil (2007) and Imbens and J. M. Wooldridge (2009)). Let \(\{(Y_i(0),Y_i(1),T_i(0),T_i(1),\) \(X_i)' : i=1,2,\ldots,n\}\) be a random sample from \((Y(0),Y(1),T(0),T(1),X)'\), where \(Y(1)\) and \(Y(0)\) denote the potential outcomes with and without treatment, respectively, \(T(0)\) and \(T(1)\) denote potential treatment status, and the scalar regressor \(X_i\in\mathbb{R}\) is the so-called “running variable” or “score” determining treatment assignment based on whether it exceeds a known cutoff. In particular, unit \(i\) is assigned treatment if \(X_i>\bar{x}\) and not assigned treatment if \(X_i<\bar{x}\), for some known fixed value \(\bar{x}\in\mathbb{R}\). \(f(x)\) denotes the continuous (Lebesgue) density of \(X_i\).
This setup allows for imperfect compliance, which in the RD literature is known as the fuzzy RD design. The case of perfect treatment compliance is usually called the sharp RD design. In either case, the observed outcome and treatment status are, respectively, \[Y_i =\left\{ \begin{array}{ccl} Y_i(0) && \text{if } X_i<\bar{x} \\ Y_i(1) && \text{if } X_i\ge\bar{x}\\ \end{array} \right.\qquad \text{and} \qquad T_i =\left\{ \begin{array}{ccl} T_i(0) && \text{if } X_i<\bar{x}\\ T_i(1) && \text{if } X_i\ge\bar{x}\\ \end{array} \right..\]
The observed data is \(\{(Y_i,T_i,X_i)' : i=1,2,\ldots,n\}\), a random sample from a large population. In sharp RD designs, \(T_i=\mathbb{1}(X_i\ge\bar{x})\) with \(\mathbb{1}(\cdot)\) denoting the indicator function, which leads to \(\mathbb{P}[T_i=0|X_i<\bar{x}]=1=\mathbb{P}[T_i=1|X_i\ge\bar{x}]\). More generally, in fuzzy RD designs, treatment assignment and treatment status may differ (imperfect compliance). Therefore, for each unit \(i\), the scalar random variable \(Y_i\in\mathbb{R}\) denotes the outcome of interest, and \(T_i\in\{0,1\}\) denotes actual treatment take-up (\(T_i=1\) treatment taken, \(T_i=0\) treatment not taken).
We introduce some additional notation. For \(\nu\in\mathbb{Z}_+=\{0,1,2,\ldots\}\), define \[\mu_{Y+}^{(\nu)}(\bar{x})=\lim_{x\to\bar{x}^+}\frac{\partial^\nu}{\partial x^\nu} \mu_{Y+}(x),\qquad \mu_{Y-}^{(\nu)}(\bar{x})=\lim_{x\to\bar{x}^-}\frac{\partial^\nu}{\partial x^\nu} \mu_{Y-}(x),\] with \(\mu_{Y+}(x)=\mathbb{E}[Y(1)|X=x]\) and \(\mu_{Y-}(x)=\mathbb{E}[Y(0)|X=x]\), and \[\mu_{T+}^{(\nu)}(\bar{x})=\lim_{x\to\bar{x}^+}\frac{\partial^\nu}{\partial x^\nu} \mu_{T+}(x),\qquad \mu_{T-}^{(\nu)}(\bar{x})=\lim_{x\to\bar{x}^-}\frac{\partial^\nu}{\partial x^\nu} \mu_{T-}(x),\] with \(\mu_{T+}(x)=\mathbb{E}[T(1)|X=x]\) and \(\mu_{T-}(x)=\mathbb{E}[T(0)|X=x]\). Whenever there is no ambiguity, we drop the subindex denoting the dependent variable or the point of evaluation in the conditional expectations and other functions.
We focus on average treatment effects at the cutoff in the sharp RD, fuzzy RD, sharp kink RD and fuzzy kink RD designs, although the results cover other possibilities. For further details on the interpretation of these estimands and regularity conditions see, among others, (Hahn, P. Todd, and W. van der Klaauw 2001), Porter (2003), Lee (2008), IK, CCT, (Card et al. 2014), and references therein.
Sharp RD designs. Two popular parameters of interest are the sharp RD average treatment effect at the threshold, denoted by \(\tau_0\), and the sharp kink RD average treatment effect at the threshold, denoted by \(\tau_1/\kappa\), with \(\kappa\) a known scale constant, and using the generic notation: \[\tau_\nu := \tau_{Y,\nu}(\bar{x}) = \left.\frac{\partial^\nu}{\partial x^\nu}\mathbb{E}[Y_i(1)-Y_i(0)|X_i=x]\right|_{x=\bar{x}} = \mu^{(\nu)}_{Y+} - \mu^{(\nu)}_{Y-},\qquad \nu\in\mathbb{Z}_+,\] where the definition drops the subindex denoting the dependent random variable and evaluation point for notational simplicity. The last equality is a nonparametric identification result that holds under mild continuity conditions: the parameter \(\tau_\nu\) can be written as a function of observed data because \(\mu^{(\nu)}_{Y+}=\mu^{(\nu)}_{Y+}(\bar{x})\) and \(\mu^{(\nu)}_{Y-}=\mu^{(\nu)}_{Y-}(\bar{x})\) are estimable from the observed data; \(\tau_\nu=\tau_{Y,\nu}\) is a difference of two (one-sided) nonparametric regression functions at \(\bar{x}\).
Fuzzy RD designs. The parameters of interest take the form \[\varsigma_\nu := \varsigma_\nu(\bar{x}) = \frac{\left.\frac{\partial^\nu}{\partial x^\nu}\mathbb{E}[Y_i(1)-Y_i(0)|X_i=x]\right|_{x=\bar{x}}} {\left.\frac{\partial^\nu}{\partial x^\nu}\mathbb{E}[T_i(1)-T_i(0)|X_i=x]\right|_{x=\bar{x}}} = \frac{\tau_{Y,\nu}}{\tau_{T,\nu}} = \frac{\mu^{(\nu)}_{Y+} - \mu^{(\nu)}_{Y-}}{\mu^{(\nu)}_{T+} - \mu^{(\nu)}_{T-}}, \qquad \nu\in\mathbb{Z}_+,\] where the last two equalities represent a nonparametric identification result that holds under mild continuity conditions. Note that \(\mu^{(\nu)}_{Y+}=\mu^{(\nu)}_{Y+}(\bar{x})\), \(\mu^{(\nu)}_{Y-}=\mu^{(\nu)}_{Y-}(\bar{x})\), \(\mu^{(\nu)}_{T+}=\mu^{(\nu)}_{T+}(\bar{x})\) and \(\mu^{(\nu)}_{T-}=\mu^{(\nu)}_{T-}(\bar{x})\) are all estimable from the observed data, since \(\tau_{Y,\nu}\) and \(\tau_{T,\nu}\) are each a difference of two (one-sided) regression functions at \(\bar{x}\), and \(\varsigma_\nu\) is just their ratio. In the RD literature, the two main parameters of interest are \(\varsigma_0\), the fuzzy RD average treatment effect at the cutoff, and \(\varsigma_1\), the fuzzy kink RD average treatment effect at the cutoff.
Statistical inference in the RD design reduces to nonparametric regression inference at the induced boundary point \(\bar{x}\), employing observations at either side of the threshold separately. Local-polynomial estimators have become the preferred choice of nonparametric estimator in the RD literature because of their excellent boundary properties (Fan and Gijbels 1996). The package rdrobust implements local polynomial estimators of various orders for the two variables \(Y_i\) and \(T_i\) depending on the RD design considered, and also includes different bandwidths selectors and alternative confidence intervals estimators. All these features are briefly reviewed in the following subsections, but first we introduce the local polynomial RD estimators of order \(p\) in general. To reduce repetition, we describe the estimators using a generic outcome variable \(Z\) which either takes the value \(Y\) or \(T\), depending on the outcome variable under consideration. For \(Z\in\{Y,T\}\) and \(\nu,p\in\mathbb{Z}_+\) with \(\nu\le p\), \[\hat\tau_{Z,\nu,p}(x;h_n) = \hat\mu^{(\nu)}_{Z+,p}(x;h_n)-\hat\mu^{(\nu)}_{Z-,p}(x;h_n),\] \[\hat\mu^{(\nu)}_{Z+,p}(x;h_n)=\mathbf{e}_\nu'\hat{\boldsymbol{\beta}}_{Z+,p}(x;h_n) \qquad\text{and}\qquad \hat{\mu}^{(\nu)}_{Z-,p}(x;h_n)=\mathbf{e}_\nu'\hat{\boldsymbol{\beta}}_{Z-,p}(x;h_n),\] \[\hat{\boldsymbol{\beta}}_{Z+,p}(x;h_n)=\arg\min_{\boldsymbol{\beta}\in\mathbb{R}^{p+1}}\sum_{i=1}^n\mathbb{1}(X_i\ge x)(Z_i-\mathbf{r}_p(X_i-x)'\boldsymbol{\beta})^2 K_{h_n}(X_i-x),\] \[\hat{\boldsymbol{\beta}}_{Z-,p}(x;h_n)=\arg\min_{\boldsymbol{\beta}\in\mathbb{R}^{p+1}}\sum_{i=1}^n\mathbb{1}(X_i<x)(Z_i-\mathbf{r}_p(X_i-x)'\boldsymbol{\beta})^2 K_{h_n}(X_i-x),\] where here \(\mathbf{r}_p(x)=(1,x,\ldots,x^p)'\), \(\mathbf{e}_\nu\) is the conformable (\(\nu+1\))-th unit vector (e.g., \(\mathbf{e}_1=(0,1,0)'\) if \(p=2\)), \(K_h(u)=K(u/h)/h\) with \(K(\cdot)\) a kernel function and \(h_n\) is a positive bandwidth sequence.
Using the generic notation above, the sharp RD estimators are: \[\hat\tau_{\nu,p}(h_n):=\hat\tau_{Y,\nu,p}(\bar{x};h_n), \qquad \nu\le p.\] Similarly, for the fuzzy RD designs we have the RD estimators: \[\hat\varsigma_{\nu,p}(h_n):=\frac{\hat\tau_{Y,\nu,p}(h_n)}{\hat\tau_{T,\nu,p}(h_n)},\qquad \hat\tau_{Y,\nu,p}(h_n):=\hat\tau_{Y,\nu,p}(\bar{x};h_n),\qquad \hat\tau_{T,\nu,p}(h_n):=\hat\tau_{T,\nu,p}(\bar{x};h_n), \qquad \nu\le p.\]
Assuming the bandwidth \(h_n\to 0\) and other regularity conditions hold, consistency of these estimators follows easily from well-known properties of local polynomial estimators. In applications, the most common choices are \(p=1\) for \(\tau_0\) (local-linear sharp RD estimator), \(p=2\) for \(\tau_1\) (local-quadratic sharp kink RD estimator), \(p=1\) for \(\varsigma_0\) (local-linear fuzzy RD estimator), and \(p=2\) for \(\varsigma_1\) (local-quadratic fuzzy kink RD estimator).
The function rdrobust()
implements the above RD point estimators with
options c
to set \(\bar{x}\), deriv
to set \(\nu\), p
to set \(p\),
kernel
to set \(K(\cdot)\), and h
to set \(h_n\), among others.
The main obstacle in the practical implementation of RD local polynomial estimators is bandwidth selection. The package rdrobust implements the main two approaches for bandwidth selection available in the literature: (i) plug-in rules based on mean squared error (MSE) expansions, and (ii) cross validation. IK provide a comprehensive review of these approaches.
Direct plug-in rules. Direct plug-in (DPI) approaches to bandwidth selection are based on a mean squared error (MSE) expansion of the sharp RD estimators, leading to the MSE-optimal choice \[h_{\mathtt{MSE},\nu,p} = C_{\mathtt{MSE},\nu,p} \; n^{-\frac{1}{2p+3}},\qquad C_{\mathtt{MSE},\nu,p} = \left(\frac{(1+2\nu)\mathsf{V}_{\nu,p}}{2(p+1-\nu)\mathsf{B}_{\nu,p}^2}\right)^{\frac{1}{2p+3}},\] where \(\mathsf{B}_{\nu,p}\) and \(\mathsf{V}_{\nu,p}\) are the leading asymptotic bias and variance of the RD estimator, respectively.
The package rdrobust implements two data-driven versions of this
MSE-optimal bandwidth, one proposed by IK (available for \(\nu=0\)
only) and the other proposed by CCT (valid for all choices of
\(\nu\)). Both implementations include regularization as originally
discussed in IK, although the option scalereg
allows users to
remove it. The IK implementation, denoted by
\(\hat{h}_{\texttt{IK},0,p}\), may be viewed as a nonparametric
first-generation plug-in rule (e.g., Wand and M. Jones 1995),
sometimes called a DPI-1 (direct plug-in of order 1) selector. The
CCT implementation, denoted by \(\hat{h}_{\texttt{CCT},\nu,p}\), may
be viewed as a second-generation plug-in bandwidth selection
approach. Further details on implementation of the bandwidth
selectors may be found in IK, CCT and their supplemental materials.
Cross validation. This bandwidth choice is implemented as follows: \[\hat{h}_{\texttt{CV},p}=\arg \min_{h>0}\mathsf{CV}_\delta(h), \qquad \mathsf{CV}_\delta(h)=\sum_{i=1}^{n}\mathbb{1}(X_{-,[\delta]}\leq X_i \leq X_{+,[\delta]})\left(Y_i-\hat\mu_p\left(X_i;h\right)\right)^2,\] where
\[\hat\mu_p(x;h) =\left\{\begin{eqnarray} \mathbf{e}_0'\hat{\boldsymbol{\beta}}_{Y+,p}(x,h) && \text{if } x>\bar{x}\\ \mathbf{e}_0'\hat{\boldsymbol{\beta}}_{Y-,p}(x,h) && \text{if } x<\bar{x} \end{eqnarray} \right. ,\]
and, for \(\delta \in (0,1)\), \(X_{-,[\delta]}\) and \(X_{+,[\delta]}\) denote the \(\delta\)-th quantile of \(\{X_i:X_i<\bar{x}\}\) and \(\{X_i:X_i>\bar{x}\}\), respectively. See IK for further discussion on this alternative approach, which is valid only for sharp RD designs (\(\nu=0\)).
The function rdbwselect()
implements the above bandwidth selectors.
We briefly review the main asymptotic properties of the local polynomial RD estimators, with particular emphasis on the properties of the associated confidence interval estimators. Specifically, we discuss three types of confidence intervals (CI) based on Gaussian approximations: (i) conventional CI (assuming “small” bias), (ii) bias-corrected CI (not necessarily requiring undersmoothing), and (iii) robust bias-corrected CI (not necessarily requiring undersmoothing).
Optimal point estimators. The package rdrobust implements the following data-driven RD treatment effect point estimators. \[\begin{aligned} \text{Sharp RD:} &\qquad \hat\tau_{\nu,p}(\hat{h}_{\texttt{CCT},\nu,p}), \qquad \hat\tau_{0,p}(\hat{h}_{\texttt{IK},0,p}), \qquad \hat\tau_{0,p}(\hat{h}_{\texttt{CV},p}), \qquad \nu\le p.\\ \text{Fuzzy RD:} &\qquad \hat\varsigma_{\nu,p}(\hat{h}_{\texttt{CCT},\nu,p}), \qquad \hat\varsigma_{0,p}(\hat{h}_{\texttt{IK},0,p}), \qquad \hat\varsigma_{0,p}(\hat{h}_{\texttt{CV},p}),\qquad \nu\le p. \end{aligned}\] These estimators are constructed employing MSE-optimal bandwidth choices for the sharp RD case, which means that \(\hat\tau_{\nu,p}(\hat{h}_{\texttt{CCT},\nu,p})\), \(\hat\tau_{0,p}(\hat{h}_{\texttt{IK},0,p})\) and \(\hat\tau_{0,p}(\hat{h}_{\texttt{CV},p})\) may be interpreted as consistent and (asymptotically) MSE-optimal point estimators of \(\tau_\nu\). For the fuzzy RD cases, the bandwidth choices employed are technically optimal only for the numerator of the estimators, but since the rate of the MSE-optimal bandwidth choice does not differ from the sharp RD case, the estimators \(\hat\varsigma_{\nu,p}(\hat{h}_{\texttt{CCT},\nu,p})\), \(\hat\varsigma_{0,p}(\hat{h}_{\texttt{IK},\nu,p})\) and \(\hat\varsigma_{0,p}(\hat{h}_{\texttt{CV},p})\) may also be viewed as consistent and (asymptotically) MSE-optimal point estimators of \(\varsigma_\nu\).
Sharp RD confidence intervals. Confidence intervals accompanying the point estimators discussed above rely on the following distributional approximation:
\[\label{eq:clt} \sqrt{nh^{1+2\nu}_n}\left(\hat\tau_{\nu,p}(h_n)-\tau_\nu-h_n^{p+1-\nu}\;\mathsf{B}_{\nu,p}\right) \to_{\text{d}}\mathcal{N}(0,\mathsf{V}_{\nu,p}), \qquad \nu\le p, \tag{1} \]
where \(\mathsf{B}_{\nu,p}\) and \(\mathsf{V}_{\nu,p}\) denote, respectively, the asymptotic bias and variance of the RD estimator.
Conventional confidence intervals. An asymptotic \(100(1-\alpha)\)-percent confidence interval for \(\tau_\nu\) is \[\mathsf{CI}_{1-\alpha}(h_n) = \left[ \; \hat\tau_{\nu,p}(h_n)\pm\Phi^{-1}_{1-\alpha/2} \; \sqrt{\frac{\mathsf{V}_{\nu,p}}{nh^{1+2\nu}_n}} \; \right],\] where \(\Phi^{-1}_a\) denotes the appropriate quantile of the Gaussian distribution (e.g., \(1.96\) for \(a=.975\)). This approach is valid only if the leading bias in ((1)) is “small”. This smoothing bias is ignored by relying on an “undersmoothing” argument, that is, by assuming the bandwidth chosen is “small” enough so that the bias is negligible. In practice, however, this procedure may be difficult to implement because most bandwidth selectors, such as \(h_{\mathtt{MSE},\nu,p}\), will not satisfy the conditions required for undersmoothing. This fact implies that most empirical bandwidth selectors could in principle lead to a non-negligible leading bias in the distributional approximation, which in turn will bias the associated confidence intervals.
Bias-corrected confidence intervals. As an alternative to undersmoothing, we can directly bias-correct the estimator by constructing an estimator of \(\mathsf{B}_{\nu,p}\), which is then subtracted from the RD point estimate in an attempt to eliminate the leading bias in ((1)). The resulting asymptotic \(100(1-\alpha)\)-percent confidence interval for \(\tau_\nu\) is \[\mathsf{CI}^\texttt{bc}_{1-\alpha}(h_n,b_n) = \left[\left(\hat\tau_{\nu,p}(h_n)-h_n^{p+1-\nu}\;\hat{\mathsf{B}}_{\nu,p,q}\right)\pm\Phi^{-1}_{1-\alpha/2}\;\sqrt{\frac{\mathsf{V}_{\nu,p}}{nh^{1+2\nu}_n}} \right],\] where \(\hat{\mathsf{B}}_{\nu,p,q}\) denotes the bias estimate, which is constructed using a possibly different bandwidth \(b_n\). To implement this approach, CCT propose a MSE-optimal bandwidth choice of \(b_n\) for the bias estimator \(\hat{\mathsf{B}}_{\nu,p,q}\) taking the form \(b_{\texttt{MSE},\nu,p,q}= \mathsf{C}_{\nu,p,q}^{1/(2q+3)} \; n^{-1/(2q+3)}\), where \(\mathsf{C}_{\nu,p,q}\) denotes a constant depending on the data generating process. CCT also discuss an implementation procedure of \(b_{\texttt{MSE},\nu,p,q}\), leading to the data-driven estimator denoted by \(\hat{b}_{\texttt{CCT},\nu,p,q}\).
Robust bias-corrected confidence intervals. The confidence intervals discussed so far have some unappealing properties that may affect their performance in applications. On the one hand, the confidence intervals \(\mathsf{CI}_{1-\alpha}(h_n)\) require undersmoothing (or, alternatively, a “small” bias), which may lead to coverage distortions in cases where the bias is important. On the other hand, the bias-corrected confidence intervals \(\mathsf{CI}^{\texttt{bc}}_{1-\alpha}(h_n,b_n)\), while theoretically justified for a larger range of bandwidths, are usually regarded as having poor performance in empirical settings, also leading to potentially large coverage distortions in applications.
CCT propose an alternative, more robust confidence interval formula based on the bias-corrected RD treatment effect estimators, but employing a different variance for studentization purposes. Intuitively, the bias-corrected RD estimator does not perform well in finite-samples because the bias estimate introduces additional variability in the statistic, which is not accounted for when forming the associated confidence intervals \(\mathsf{CI}^{\texttt{bc}}_{1-\alpha}(h_n,b_n)\). Thus, CCT propose the asymptotic \(100(1-\alpha)\)-percent confidence interval for \(\tau_\nu\) given by \[\mathsf{CI}^{\texttt{rbc}}_{1-\alpha}(h_n,b_n) = \left[\left(\hat\tau_{\nu,p}(h_n)-h_n^{p+1-\nu}\;\hat{\mathsf{B}}_{\nu,p,q}\right)\pm\Phi^{-1}_{1-\alpha/2}\;\sqrt{\mathsf{V}^{\texttt{bc}}_{n,\nu,p,q}} \right],\] which includes the alternative variance formula \(\mathsf{V}^{\texttt{bc}}_{n,\nu,p,q}(h_n,b_n)\) accounting for the additional variability introduced by the bias estimate. See CCT and Calonico et al. (2015a) for further details.
Sharp RD variance estimation. To construct fully feasible confidence intervals the unknown asymptotic variance is replaced by a consistent estimator thereof. The asymptotic variance formulas introduced above have a “sandwich” structure coming from the weighted least-squares structure of local polynomials. The package rdrobust offers two distinct valid variance estimators, employing either “plug-in estimated residuals” or “fixed-matches estimated residuals”.
Plug-in estimated residuals. In this approach, the unknown residuals are estimated directly using the RD local polynomial estimator. Usually the same bandwidth \(h_n\) is employed, although his choice may not be optimal and could lead to poor finite-sample performance of the variance estimator. \(\check{\mathsf{V}}_{n,\nu,p}\) and \(\check{\mathsf{V}}^{\texttt{bc}}_{n,\nu,p,q}\) denote the resulting estimators of \(\mathsf{V}_{n,\nu,p}\) and \(\mathsf{V}^{\texttt{bc}}_{n,\nu,p,q}\), respectively.
Fixed-matches estimated residuals. CCT propose an alternative variance estimator employing a different construction for the residuals, motivated by the work of (Abadie and Imbens 2006). This estimator is constructed using a simple nearest-neighbor (or fixed-matches) estimator for the residuals. \(\hat{\mathsf{V}}_{n,\nu,p}\) and \(\hat{\mathsf{V}}^{\texttt{bc}}_{n,\nu,p,q}\) denote the resulting estimators of \(\mathsf{V}_{n,\nu,p}\) and \(\mathsf{V}^{\texttt{bc}}_{n,\nu,p,q}\), respectively.
Extensions to Fuzzy RD confidence intervals. All the ideas and results presented above extend to the case of fuzzy RD designs, which we omit here to conserve space. See IK and CCT for further details. The package rdrobust also implements fuzzy RD estimators and confidence intervals.
The function rdrobust()
may be used to conduct inference in all RD
settings; by default, this function employs the function rdbwselect()
for bandwidth selection. Specifically, assuming \(y\) is the output
variable, \(t\) is the treatment status variable, \(x\) is the running
variable and the cutoff is \(c=0\), we have the following generic cases
(with default options and bandwidth selection procedure):
Sharp RD:
rdrobust(y = y, x = x)
Sharp Kink RD:
rdrobust(y = y, x = x, deriv = 1)
Fuzzy RD:
rdrobust(y = y, x = x, fuzzy = t)
Fuzzy Kink RD:
rdrobust(y = y, x = x, fuzzy = t, deriv = 1)
Because of the simplicity of the RD design, it is customary (and advisable) to summarize the main features of the RD design in a graphical way. The package rdrobust also implements the results in Calonico et al. ({in press}), which offer several optimal data-driven choices of tuning parameters useful to produce several versions of the popular RD plots. These plots present global and local estimates of the regression functions, \(\mu_{Y-}(x)\) and \(\mu_{Y+}(x)\), in an attempt to describe their shapes for control (\(X_i<\bar{x})\) and treated (\(X_i\ge\bar{x}\)) units. The plot typically includes two smooth “global” polynomial regression curve estimates, for control and treatment units separately, as well as binned sample means of the outcome, which are included either to capture the local behavior of the underlying regression functions or to represent the overall variability of the raw data in a disciplined way. These binned sample means are used to either (i) investigate whether other discontinuities are present, and (ii) depict an informative “cloud of points” around the smooth global polynomial fits.
The first ingredient of the RD plot (two regression curves estimated for \(X_i<\bar{x}\) and \(X_i\ge\bar{x}\) separately) is easy to construct because it requires only estimating a polynomial regression on the data; typical choices are \(4\)-th and \(5\)-th order global polynomials. The second ingredient of the RD plot requires computing sample means over non-overlapping regions of the support of the running variable \(X_i\) for control and treatment units, separately. For implementation the researcher needs to choose the number of bins to be used, denoted by \(J_{-,n}\) (control) and \(J_{+,n}\) (treatment), and the length of each bin; evenly-spaced bins is the most common partitioning scheme used. Calonico et al. ({in press}) study the problem of selecting an optimal number of bins \(J_{-,n}\) and \(J_{+,n}\) under two partitioning schemes, evenly-spaced (ES) and quantile-spaced (QS), and propose several consistent nonparametric selectors using spacings and polynomial regression estimators. In particular, they propose two approaches to select the number of bins: (i) IMSE-optimal (tailored to approximate the underlying regression functions), and (ii) Mimicking Variance (tailored to represent the overall variability of the data in a disciplined way). These two proposed choices take the form, respectively, \[\text{IMSE-optimal : }\quad J_{-,n}^* = \left\lceil\mathscr{C}_-\;n^{1/3}\right\rceil \quad\text{and} \quad J_{+,n}^* = \left\lceil\mathscr{C}_+\;n^{1/3}\right\rceil,\] and \[\text{Mimicking Variance : }\quad J_{-,n}^* = \left\lceil\mathscr{C}_-\;\frac{n}{\log(n)^2}\right\rceil\quad\text{and} \quad J_{+,n}^* = \left\lceil\mathscr{C}_+\;\frac{n}{\log(n)^2}\right\rceil,\] where \(\lceil\cdot\rceil\) denotes the ceiling function, and the unknown constants take different values depending on the targeted method and partitioning scheme used. Once the partitioning scheme is selected, the optimal choices \(J_{-,n}^*\) and \(J_{+,n}^*\) can be estimated using preliminary plug-in estimators of \(\mathscr{C_-}\) and \(\mathscr{C_+}\).
The function rdplot()
offers eight distinct automatic implementations
for RD plots depending on (i) the choice of partitioning (ES or QS),
(ii) the goal of the plot (IMSE-optimal or Mimicking Variance), and
(iii) the estimation approach used (spacings or polynomial regression).
Specifically, the function rdplot()
covers the following.
Population quantities:
\(J_{\mathtt{ES}\text{-}\mu,\cdot,n}=\) IMSE-optimal choice with
evenly-spaced (ES) bins.
\(J_{\mathtt{ES}\text{-}\vartheta,\cdot,n}=\) Mimicking Variance
choice with evenly-spaced (ES) bins.
\(J_{\mathtt{QS}\text{-}\mu,\cdot,n}=\) IMSE-optimal choice with
quantile-spaced (QS) bins.
\(J_{\mathtt{QS}\text{-}\vartheta,\cdot,n}=\) Mimicking Variance
choice with quantile-spaced (QS) bins.
Estimators:
\(\hat{J}_{\mathtt{ES}\text{-}\mu,\cdot,n}\text{ and }\hat{J}_{\mathtt{QS}\text{-}\mu,\cdot,n}=\)
spacings implementations of \(J_{\mathtt{ES}\text{-}\mu,\cdot,n}\) and
\(J_{\mathtt{QS}\text{-}\mu,\cdot,n}\), respectively.
\(\check{J}_{\mathtt{ES}\text{-}\mu,\cdot,n}\text{ and }\check{J}_{\mathtt{QS}\text{-}\mu,\cdot,n}=\)
polynomial regression implementations of
\(J_{\mathtt{ES}\text{-}\mu,\cdot,n}\) and
\(J_{\mathtt{QS}\text{-}\mu,\cdot,n}\), respectively.
\(\hat{J}_{\mathtt{ES}\text{-}\vartheta,\cdot,n}\text{ and }\hat{J}_{\mathtt{QS}\text{-}\vartheta,\cdot,n}=\)
spacings implementations of
\(J_{\mathtt{ES}\text{-}\vartheta,\cdot,n}\) and
\(J_{\mathtt{QS}\text{-}\vartheta,\cdot,n}\), respectively.
\(\check{J}_{\mathtt{ES}\text{-}\vartheta,\cdot,n}\text{ and }\check{J}_{\mathtt{QS}\text{-}\vartheta,\cdot,n}=\)
polynomial regression implementations of
\(J_{\mathtt{ES}\text{-}\vartheta,\cdot,n}\) and
\(J_{\mathtt{QS}\text{-}\vartheta,\cdot,n}\), respectively.
Further details on implementation and syntax are given in the help file
of the function rdplot()
. For other technical and methodological
details see Calonico et al. ({in press}).
We employ an extract of the dataset constructed by
Cattaneo et al. (2015) to illustrate some of the features
of our R package rdrobust. This dataset contains information on
elections for the U.S. Senate during the period 1914–2010. We focus on
the RD effect of the Democratic party winning a U.S. Senate seat on the
vote share obtained in the following election for that same seat,
mimicking the analysis conducted in (Lee 2008) for the U.S. House.
The dataset rdrobust_RDsenate
contains two variables: vote and
margin. The variable vote records the state-level vote share of the
Democratic party in a given statewide election for a Senate seat, while
the variable margin records the margin of victory of the Democratic
party in the previous election for the same Senate seat (i.e., six years
prior).
First, we load the database and present basic summary statistics. The functions included in the R package rdrobust allow for missing values, which are automatically excluded for estimation purposes.
> library(rdrobust)
> data(rdrobust_RDsenate)
> vote <- rdrobust_RDsenate$vote
> margin <- rdrobust_RDsenate$margin
> summary(vote)
1st Qu. Median Mean 3rd Qu. Max. NA's
Min. 0.00 42.67 50.55 52.67 61.35 100.00 93
> summary(margin)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-100.000 -12.210 2.166 7.171 22.770 100.000
This data set has a total of \(1,297\) complete observations. The variable margin ranges from \(-100\) to 100, and records the margin of victory in a given election for a given U.S. Senate seat, defined as the vote share of the Democratic party minus the vote share of the strongest opponent. When margin is above zero, the Democratic party wins the election for that seat, otherwise it loses. The variable vote ranges from 0 to 100 because it records the outcome of the (two-periods ahead) election for that given seat. Thus, observations for years 2008 and 2010 have missing vote. As it is usual in the literature, we exploit the discontinuity in incumbency status that occurs at 0 on margin to employ an RD design.
We use rdplot()
to construct an automatic plot of the RD design.
> (rdplot(y = vote, x = margin, title = "RD Plot - Senate Elections Data",
+ y.label = "Vote Share in Election at time t+1",
+ x.label = "Vote Share in Election at time t"))
:
Callrdplot(y = vote, x = margin, title = "RD Plot - Senate Elections Data",
x.label = "Vote Share in Election at time t",
y.label = "Vote Share in Election at time t+1")
: mimicking variance evenly-spaced method using spacings estimators
Method
Left Right 595 702
Number of Obs. 4 4
Polynomial Order 1 1
Scale
15 35
Selected Bins 6.6614 2.8561
Bin Length
-optimal bins 8 9
IMSE15 35
Mimicking Variance bins
-optimal:
Relative to IMSE1.8750 3.8889
Implied scale 0.1317 0.0167
WIMSE variance weight 0.8683 0.9833 WIMSE bias weight
Figure 1 is constructed using the default options in the
command rdplot
, which produce an RD plot with evenly-spaced bins
selected to mimic the underlying variability of the data and is
implemented using spacings estimators. Using the notation introduced
above, the number of optimal bins for control and treatment units are
\(\hat{J}_{-,n}=15\) and \(\hat{J}_{+,n}=35\), respectively, implying bin
lengths of 6.661 and 2.856 percentage points, respectively. The global
polynomial is constructed using a \(4th\) degree polynomial (\(p=4\) for
\(\hat{\mu}_{-.p,1}(x)\) and \(\hat{\mu}_{+.p,1}(x)\)).
Next, we construct an alternative RD plot using evenly-spaced bins selected to trace out the underlying regression function (i.e., using the IMSE-optimal selector), also implemented using spacings estimators. The resulting plot is given in Figure 2.
> (rdplot(y = vote, x = margin, binselect = "es",
+ title = "RD Plot - Senate Elections Data",
+ y.label = "Vote Share in Election at time t+1",
+ x.label = "Vote Share in Election at time t"))
:
Callrdplot(y = vote, x = margin, binselect = "es", x
title = "RD Plot - Senate Elections Data",
x.label = "Vote Share in Election at time t",
y.label = "Vote Share in Election at time t+1")
: IMSE-optimal evenly-spaced method using spacings estimators
Method
Left Right 595 702
Number of Obs. 4 4
Polynomial Order 1 1
Scale
8 9
Selected Bins 12.4901 11.1071
Bin Length
-optimal bins 8 9
IMSE15 35
Mimicking Variance bins
-optimal:
Relative to IMSE1.0000 1.0000
Implied scale 0.5000 0.5000
WIMSE variance weight 0.5000 0.5000 WIMSE bias weight
While providing a good approximation to the underlying regression
function (taking the global polynomial fit as benchmark), the
IMSE-optimal number of bins will usually be too small in applications.
This happens because the optimal formulas seek to balance squared bias
and variance in order to approximate the underlying regression function
globally. To obtain a visual “cloud of points” we need to increase the
number of bins, that is, undersmooth the estimator. In other words, in
order to increase the overall variability of the plotted points, we may
reduce the bin-length – which is done by increasing the total number of
bins used. This may be easily done using the option scale
as follows:
> (rdplot(y = vote, x = margin, binselect = "es", scale = 5,
+ title = "RD Plot - Senate Elections Data",
+ y.label = "Vote Share in Election at time t+1",
+ x.label = "Vote Share in Election at time t"))
:
Callrdplot(y = vote, x = margin, binselect = "es", scale = 5,
title = "RD Plot - Senate Elections Data",
x.label = "Vote Share in Election at time t",
y.label = "Vote Share in Election at time t+1")
: IMSE-optimal evenly-spaced method using spacings estimators
Method
Left Right 595 702
Number of Obs. 4 4
Polynomial Order 5 5
Scale
40 45
Selected Bins 2.4980 2.2214
Bin Length
-optimal bins 8 9
IMSE15 35
Mimicking Variance bins
-optimal:
Relative to IMSE5.0000 5.0000
Implied scale 0.0079 0.0079
WIMSE variance weight 0.9921 0.9921 WIMSE bias weight
Figure 3 shows the resulting (undersmoothed) RD plot, where now the number of bins used is five times larger than the optimal choice in an integrated mean squared error sense. The resulting estimator is naturally more variable than before.
Next, we conduct fully data-driven RD treatment effect estimation and
inference. The function rdrobust()
using its default options leads to
the following output:
> rdrobust(y = vote, x = margin)
:
Callrdrobust(y = vote, x = margin)
:
Summary
1297
Number of Obs 3
NN Matches
BW Type CCT
Kernel Type Triangular
Left Right 343 310
Number of Obs Poly (p) 1 1
Order Loc Bias (q) 2 2
Order Poly (h) 16.7936 16.7936
BW Loc Bias (b) 27.4372 27.4372
BW rho (h/b) 0.6121 0.6121
:
Estimates>|z| CI Lower CI Upper
Coef Std. Err. z P7.4253 1.4954 4.9656 0.0000 4.4944 10.3562
Conventional 0.0000 4.0697 10.9833 Robust
These results contain a variety of information, which is organized in three panels. The first two contain a summary of the main choices selected to construct the RD treatment effect estimators, while the lower panel includes the main estimation results. Specifically, using the notation introduced above, this table shows:
The total number of observations is \(1,297\), with effective \(343\) control and \(310\) treated units (given the bandwidth \(h_n\) chosen; see below). The estimation is conducted using a local-linear (\(p=1\)) estimator with a local-quadratic (\(q=2\)) bias-correction estimate, with a triangular kernel. The standard error estimators are the ones proposed by CCT, computed using 3 nearest-neighbors.
The bandwidth selection procedure is the one proposed by CCT, leading to \(\hat{h}_{\mathtt{CCT},\nu,p}=16.7936\) with \(p=1\), and \(\hat{b}_{\mathtt{CCT},\nu,p,q}=27.4372\) with \(q=2\). Recall \(\hat{h}_{\mathtt{CCT},\nu,p}\) and \(\hat{b}_{\mathtt{CCT},\nu,p,q}\) are the estimated bandwidths used to construct the RD point estimator and the RD bias-correction, respectively, and \(\nu=0\) in this illustration.
The RD point estimator is \(\hat\tau_p(\hat{h}_{\mathtt{CCT},\nu,p}) = 7.4253\) and the RD robust confidence intervals is \(\hat{\mathsf{CI}}^{rbc}_{1-\alpha}(\hat{h}_{\mathtt{CCT},\nu,p},\hat{b}_{\mathtt{CCT},\nu,p,q}) = [\;4.0697\;,\;10.9833\;]\), with a default choice of \(\alpha=0.05.\)
The function rdrobust()
also offers a more detailed output, which
includes all the point estimators, standard errors and confidence
intervals discussed previously. These results are retrieved using the
all = TRUE
option. The corresponding output is as follows:
> rdrobust(y = vote, x = margin, all = TRUE)
:
Callrdrobust(y = vote, x = margin, all = TRUE)
:
Summary
1297
Number of Obs 3
NN Matches
BW Type CCT
Kernel Type Triangular
Left Right 343 310
Number of Obs Poly (p) 1 1
Order Loc Bias (q) 2 2
Order Poly (h) 16.7936 16.7936
BW Loc Bias (b) 27.4372 27.4372
BW rho (h/b) 0.6121 0.6121
:
Estimates>|z| CI Lower CI Upper
Coef Std. Err. z P7.4253 1.4954 4.9656 0.0000 4.4944 10.3562
Conventional -Corrected 7.5265 1.4954 5.0333 0.0000 4.5957 10.4574
Bias7.5265 1.7637 4.2675 0.0000 4.0697 10.9833 Robust
Finally, we illustrate all the bandwidth selection procedures contained
in our package. To this end, we employ our companion function
rdbwselect()
to compare the CCT bandwidth selectors with the IK and CV
approaches. We have:
> rdbwselect(y = vote, x = margin, all = TRUE)
:
Callrdbwselect(y = vote, x = margin, all = TRUE)
BW Selector All 1297
Number of Obs 3
NN Matches
Kernel Type Triangular
Left Right595 702
Number of Obs Poly (p) 1 1
Order Loc Bias (q) 2 2
Order
h b16.79357 27.43722
CCT 15.66761 16.48524
IK 35.42113 NA CV
In this case we employed the option all = TRUE
, which computes the
three bandwidth selectors briefly discussed above. Notice that the
option CV
is currently not available for derivative estimation. To
further understand the performance of the CV approach, we include a
graph of the CV objective function over the grid being considered. This
is done using the option cvplot
as shown next (in this example we also
changed the grid features to obtain a better plot, and to show this
additional functionality in action as well).
> rdbwselect(y = vote, x = margin, bwselect = "CV",
+ cvgrid_min = 10, cvgrid_max = 80, cvplot = TRUE)
:
Callrdbwselect(y = vote, x = margin, bwselect = "CV", cvgrid_min = 10,
cvgrid_max = 80, cvplot = TRUE)
BW Selector CV 1297
Number of Obs 3
NN Matches
Kernel Type Triangular
Left Right595 702
Number of Obs Poly (p) 1 1
Order Loc Bias (q) 2 2
Order
h b34.5 34.5
Figure 4 shows the CV objective function as a function of a grid of bandwidth. In this example, the cross validation approach delivers a minimum at \(\hat{h}_{\mathtt{CV},p}=34.5\).
Our functions contained in the R package rdrobust have many other
options. For instance, for the main function rdrobust()
we have the
following additional examples (output is not provided to conserve
space).
Estimation using uniform kernel:
rdrobust(y = vote, x = margin, kernel = "uniform")
Estimation using the IK bandwidth selector:
rdrobust(y = vote, x = margin, bwselect = "IK")
Estimation using the CV bandwidth selector:
rdrobust(y = vote, x = margin, bwselect = "CV")
Estimation using \(h_n = 15\) and \(\rho = h_n/b_n = 0.8\):
rdrobust(y = vote, x = margin, h = 15, rho = 0.8)
Estimation using \(p = 2\) and \(q = 4\):
rdrobust(y = vote, x = margin, p = 2, q = 4)
Estimation using plug-in residuals estimates:
rdrobust(y = vote, x = margin, vce = "resid")
We introduced the main features of the R package rdrobust, which
includes the functions rdrobust()
, rdbwselect()
, and rdplot()
designed to conduct data-driven nonparametric robust inference in RD
designs. This implementation covers average RD treatment effects at the
cutoff in the sharp RD, sharp kink RD, fuzzy RD and fuzzy kink RD
designs. A full description of this package may be found in its manual
and help files. A companion Stata package offering the same structure
and capabilities is described in Calonico, M. D. Cattaneo, and R. Titiunik (2014).
We thank the Editor Deepayan Sarkar, two anonymous reviewers, Richard Anderson, Devin Caughey and Marko Klašnja for comments and suggestions on previous versions of our R package rdrobust. The authors gratefully acknowledge financial support from the National Science Foundation (SES 1357561).
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Calonico, et al., "rdrobust: An R Package for Robust Nonparametric Inference in Regression-Discontinuity Designs", The R Journal, 2015
BibTeX citation
@article{RJ-2015-004, author = {Calonico, Sebastian and Cattaneo, Matias D. and Titiunik, Rocío}, title = {rdrobust: An R Package for Robust Nonparametric Inference in Regression-Discontinuity Designs}, journal = {The R Journal}, year = {2015}, note = {https://rjournal.github.io/}, volume = {7}, issue = {1}, issn = {2073-4859}, pages = {38-51} }