The *revisit* package, developed as a collaborative tool for scientists, also serves as a tool for teaching statistics, in a manner that can be highly motivating for students. Using either the included case studies or datasets/code provided by the instructor, students can explore several alternate paths of analysis, such as the effects of including/excluding certain variables, employing different types of statistical methodology and so on. The package includes features that help students follow modern statistical standards and avoid various statistical errors, such as “p-hacking” and lack of attention to outlier data.

With this issue of the R Journal, there will be a section for teaching R and teaching using R as well as for empirical research on teaching R.

A GitHub repository will be set up where the teaching material can be accessed. All material should be under a Creative Commons licence to allow re-use. Please prepare a short text describing your material. For a sample, see the description by Norman Matloff et al. (below) of classroom material he is using.

Some research considers the relative success of different approaches to teaching R and teaching statistics with R. For example, see some contributions at the recent useR! 2017 conference in Brussels (Gert Janssenwill et al., The analysis of R learning styles with R; Matthias Gehrke and Karsten Luebke, A quasi-experiment for the influence of the user interface on the acceptance of R). The R Journal is interested in receiving such submissions.

This column will appear before this GitHub repository is established, so only links to an existing cooperating repository and mailing list can be provided at this time. Subsequent columns will provide full technical details or links to such details.

The R package *revisit* was developed in response to the recent concern
over *reproducibility* in research, especially problems related to
statistical analysis (Baker 2016). The motivation and methods are
detailed in Matloff, R. Davis, L. Beckett, and P. Thompson (2017), but in this paper we turn to teaching.

The package has both text and GUI versions, the latter being based on the RStudio integrated development environment for R. It is currently at an early stage of development, with more features and refinements being added continuously. It may be downloaded from https://github.com/matloff/revisit.

One can obtain a quick introduction to the package by starting it and
running one of the case studies. To start the package in GUI form, click
Addins in RStudio, and choose “Revisit”. The text version is not
colorful, but is more flexible. To run it, simply run
`library(revisit)`. Use of the case studies will be introduced later in
this paper.

The package includes a number of examples for student examination and
participation. One of the case studies involves the controversial paper
by Harvard economists C. Reinhart and K. Rogoff (Cassidy 2016). They had
found that nations with high budget deficits average a \(-0.1\)% growth
rate in GDP. This finding had major impact on policymakers, with the
paper attracting particular interest from the “deficit hawks” in the
U.S. Congress. The *Washington Post* took to describing the finding as
“consensus among economists.”

But when later researchers tried alternate analysis, the picture changed
entirely. Some data had been excluded from the initial published
analysis, for what arguably were weak reasons. The original analysis
also had the flaw of giving equal weights to all nations, regardless of
size, as well as other aspects that some researchers considered flaws.
When the alternate analysis was run, the figure \(-0.1\)% changed to
+2.2%. Thus the original findings on deficit spending now seem
questionable. This leads to the theme in the present paper, use of
*revisit* for teaching.

At a talk given by one of the authors of the present paper (Matloff 2017),
the author was stunned at student reaction to the Reinhart/Rogoff
example. The students, all doing graduate work in science and
engineering, were captivated by the fact that the study, which had had
such influence on policy, may have been seriously flawed, with alternate
analyses of the same data yielding starkly different results. The
potential of *revisit* as a teaching tool had been suggested earlier by
one of the other authors of the present paper, but the strong student
reaction here dramatized the point.

The package is structured as follows:

- A set of case studies for students to explore and modify (and to which instructors can add their own examples).

- “Statistical audit” warnings/advice given to students regarding statistical best practices as they proceed in their analyses.

- A code management infrastructure to facilitate exploration of alternative analyses.

The package includes various case studies, consisting of data and code. The latter comprises a complete, though possibly brief, analysis of the data from start to finish — the code may include data configuration, data cleaning, preliminary graphical analysis, predictor variable/model selection, and so on. Students can then try their own alternative analyses. Most of the case studies are not as dramatic as Reinhart and Rogoff, but each illustrates important concepts in data analysis.

There are just a few case studies included in the package currently, but more are being added, and instructors will often prefer to use their own data anyway. In many cases, the code will be minimal, affording the students even more opportunity for nonpassive thought and exploration.

Assignments centered on the use of *revisit* can be very specific or
more open-ended, according to the instructor’s preference. Here are
possible samples:

- “This assignment involves the Reinhart and Rogoff data, included in the *revisit* package. Revisit the analysis with different weightings for the nations based on various factors.”

- “This assignment involves the famous Pima diabetes study. Some later inspection suggested that the data includes a number of erroneous values, such as impossible 0 values. Investigate this, say using graphical means. Run a logistic regression model, with and without the suspect data points, and compare the results. Also, the authors of the study used a neural network approach. Try using random forests, and compare to the authors’ level of predictive ability.”

- “This assignment analyzes the famous Forest Cover dataset, in which one of seven cover types is predicted from variables such as Hillside Shade at Noon. The given code tries prediction using, of course, the random forests method. The first 10 predictors are used. In your alternate analysis, try using all 54 predictors. That is quite a bit, and even with over 500,000 observations, one must always worry about overfitting; investigate that here.”

- “This assignment analyzes the Fong-Ouliaris currency data. The supplied code fits a linear regression model, with a respectable \(R^2\) value. However, one can do better. First, explore this with some of the graphical methods in the *regtools* package, and then try adding some quadratic and interaction terms to the model, and/or try a nonparametric regression approach.”

- “Here you will work with data from the 2000 Census, involving salaries of programmers and engineers in Silicon Valley. One aspect of interest is the gender issue: are women paid less than comparable men? A preliminary regression analysis indicates a difference of over $10,000, for fixed age and educational level. But much more needs to be done. For instance, what happens when occupation is factored in, and when the non-monotonic relation of wage and age is accounted for? Investigate such questions.”

We are continuing to add case studies. At this early stage, the list includes:

- UCI Pima diabetes data (Lichman 2017)

- Zavodny guestworker data (Zavodny 2011)

- Reinhart and Rogoff (Cassidy 2016)

- MovieLens (Harper and J. A. Konstan 2015)

- Fong-Ouliaris currency data (Fong and S. Ouliaris 1995)

- UCI forest cover data (Lichman 2017)

- Salary data on programmers and engineers, Silicon Valley, 2000 Census

We anticipate that instructors will typically develop their own case
studies. We encourage them to contribute these to *revisit*.

To load a case study in the GUI, just select the desired one from the Case Studies window. The code will then be loaded, ready to run. For the text version, at present this is done as in this example, for the Currency data:

```
fname <- system.file('CaseStudies/Currency/currency.R',package='revisit')
loadb(fname)
```
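To see which case studies an installed copy of the package ships, one could list the contents of the `CaseStudies` directory used in the path above. This is only a sketch using base R's `system.file()` and `list.files()`; it assumes nothing about *revisit* beyond that directory name, and it falls back to a message if the package is not installed.

```r
# Locate the CaseStudies directory of an installed revisit package.
# system.file() returns "" when the package or directory cannot be found.
case_dir <- system.file("CaseStudies", package = "revisit")

if (nzchar(case_dir)) {
  # Each entry is expected to correspond to one case study (e.g. Currency)
  print(list.files(case_dir))
} else {
  message("revisit does not appear to be installed in this R library")
}
```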

One of the most important features of *revisit* is that it plays the
role of a “statistical audit,” in much the same way that tax preparation
software might warn of questionable claims in a tax return.

Much of this is accomplished by wrapper functions provided by *revisit*.
For instance, if the student wishes to form a confidence interval for a
mean in R, she might use `t.test()`. But in *revisit* she can instead
use `t.test.rv()`. Instead of `lm()` she can call `lm.rv()`, a wrapper
for `lm()` that adds “statistical audit” functionality.

As noted, this function calls `lm()` and returns an object of class
`"lm"`, but with additional functionality. At present this takes on two
forms:

It checks whether the response variable takes on only two values, in which case the function suggests a logistic model using `glm()`:

```
> y <- c(1,0,1)
> x <- 1:3
> d <- data.frame(y,x)
> lm.rv(y ~ x,d)
...
Coefficients:
(Intercept)            x
     0.6667       0.0000

Warning message:
In lm.rv(y ~ x, user.data = d) :
  only 2 distinct Y values; consider a logistic model
```

It runs a parallel analysis, i.e. with the same formula and data as the `lm()` call, but with median (Minimum Absolute Deviation) regression, implemented with `rq()` from the *quantreg* package. For instance, with the salary case study, one might run:

```
> lm.rv(wageinc ~ age+sex+ms+phd,user.data=pe)
max. prop. difference, linear median regression: 0.3221592
larger values may indicate outlier or model fit issues

Call:
lm(formula = formula, data = user.data)

Coefficients:
(Intercept)          age          sex           ms          phd
    53286.4        441.4     -12343.6      18363.6      27770.3
```

There is more than a 32% difference in the two model fits, which turns out to be in the age coefficient. (The `rq()` coefficients are available as a component `$rqc` of the `"lm"` object returned by `lm.rv()`.) As noted, this may indicate issues with outliers or model fit.
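The two audits can be approximated outside the package as follows. This is a sketch, not the `lm.rv()` source: the helper names `binary_response_check()` and `max_prop_diff()` are hypothetical, and the definition of the "maximum proportional difference" (largest absolute coefficient gap relative to the `lm()` coefficient) is an assumption about what the reported figure measures. The median regression step runs only if *quantreg* is installed, and it uses the built-in `mtcars` data rather than the salary data.

```r
# Audit 1 (assumed logic): flag a response with exactly two distinct values.
binary_response_check <- function(formula, data) {
  y <- model.response(model.frame(formula, data))
  two_valued <- length(unique(y)) == 2
  if (two_valued)
    warning("only 2 distinct Y values; consider a logistic model")
  invisible(two_valued)
}

# Audit 2 (assumed definition): largest proportional gap between the
# least-squares coefficients and those of an alternative fit.
max_prop_diff <- function(lm_coefs, alt_coefs) {
  max(abs(alt_coefs - lm_coefs) / abs(lm_coefs))
}

# The two-valued response from the example above triggers the first audit.
d <- data.frame(y = c(1, 0, 1), x = 1:3)
flagged <- suppressWarnings(binary_response_check(y ~ x, d))

# Median regression needs quantreg; skip the comparison if it is absent.
if (requireNamespace("quantreg", quietly = TRUE)) {
  fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
  fit_rq <- quantreg::rq(mpg ~ wt + hp, data = mtcars)  # tau = 0.5 by default
  print(max_prop_diff(coef(fit_lm), coef(fit_rq)))
}
```

A large value from the comparison is a prompt to look at the data, not a verdict; the two fits can also differ simply because the conditional mean and median differ under a skewed error distribution.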

The package aims to reduce p-hacking by monitoring the number of inference actions — p-values, confidence intervals — the student has accumulated in his/her analysis, and may issue a warning that the student should consider employing multiple-inference methods. For the latter, at present the package offers just Bonferroni’s Method, but more will be added (Hsu 1996).
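The counting idea can be sketched with an environment that holds a running tally. Everything here is illustrative: `audit_env`, `note_inference()`, and the threshold of 5 are hypothetical names and values, not part of *revisit*.

```r
# A private environment holding the running count of inference actions.
audit_env <- new.env()
audit_env$pcount <- 0

# Record one inference action (a p-value or confidence interval) and warn
# once the count exceeds a chosen threshold (hypothetical default of 5).
note_inference <- function(max_inferences = 5) {
  audit_env$pcount <- audit_env$pcount + 1
  if (audit_env$pcount > max_inferences)
    warning("many inference actions so far; consider multiple-inference methods")
  invisible(audit_env$pcount)
}

# Three analyses later, the tally stands at 3 and no warning has been issued.
for (i in 1:3) note_inference()
audit_env$pcount
```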

Moreover, the package responds to the dramatic 2016 announcement by the
American Statistical Association (Wasserstein and N. A. Lazar 2016), which warned against the overuse of
p-values. Though this problem had been common knowledge for many years
(Freedman, R. Pisani, and R. Purves 1978; Jones and N. Matloff 1986; Ziliak and D. McCloskey 2008), the ASA announcement gave new urgency to
the issue. The *revisit* package takes an active role in encouraging
students not to rely much on p-values in the first place. Confidence
interval-based analysis is preferred.

Here is the code for `t.test.rv()`:

```
> t.test.rv
function (x, y, alpha = 0.05, bonf = 1)
{
    alpha <- alpha/bonf
    tout <- t.test(x, y, conf.level = 1 - alpha)
    muhat1 <- tout$estimate[1]
    muhat2 <- tout$estimate[2]
    tout$p.value <- tout$p.value * bonf
    rvenv$pcount <<- rvenv$pcount + 1
    if (tout$p.value < alpha && muhat1 != 0) {
        if (abs(muhat1 - muhat2)/abs(muhat1) < rvenv$smalleffect)
            warning(paste("small p-value but effect size",
                "could be of little practical interest"))
    }
    tout
}
```

Here the argument `bonf` is a multiplicative factor used to expand
confidence interval widths for Bonferroni corrections, as seen in the
above code.
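The effect of dividing `alpha` by `bonf` can be seen with plain `t.test()`. The sketch below uses the built-in `mtcars` data and a hypothetical `bonf = 8` (as if eight intervals were being formed simultaneously); it only illustrates that the adjusted interval is wider.

```r
# Two groups from a built-in dataset: mpg for manual vs. automatic cars.
x <- mtcars$mpg[mtcars$am == 1]
y <- mtcars$mpg[mtcars$am == 0]

alpha <- 0.05
bonf  <- 8   # hypothetical number of simultaneous intervals

# Unadjusted interval, and the interval at level 1 - alpha/bonf
ci_plain <- t.test(x, y, conf.level = 1 - alpha)$conf.int
ci_bonf  <- t.test(x, y, conf.level = 1 - alpha / bonf)$conf.int

# The Bonferroni-adjusted interval is the wider of the two.
widths <- c(plain = diff(ci_plain), bonferroni = diff(ci_bonf))
print(widths)
```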

Note the incrementing of `rvenv$pcount`. This is the global count (which
includes potential confidence intervals) alluded to earlier, to be used
in the warning that the user should consider multiple inference
methods.^{1} Note too that the code may warn the student that there was
a “small p-value but effect size could be of little practical interest.”
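A simulated example shows why such a warning is useful. With very large samples, a difference in means of 0.02 on a scale of about 100 is "highly significant" yet negligible in relative terms. The sample size, means, and the threshold `smalleffect = 0.01` are all assumptions chosen for illustration; the relative effect is computed in the same way as in the code above.

```r
set.seed(1)
n <- 1e6
x <- rnorm(n, mean = 100.00)   # group 1
y <- rnorm(n, mean = 100.02)   # group 2: a tiny true difference

tt <- t.test(x, y)

# Relative effect size, as in t.test.rv(): |muhat1 - muhat2| / |muhat1|
muhat <- unname(tt$estimate)
rel_effect <- abs(muhat[1] - muhat[2]) / abs(muhat[1])

smalleffect <- 0.01   # hypothetical threshold for "little practical interest"
flag <- tt$p.value < 0.05 && rel_effect < smalleffect

# The confidence interval makes the small size of the difference visible.
print(tt$conf.int)
print(flag)
```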

Steering students (and their instructors) to confidence intervals
instead of p-values can be difficult not only in terms of breaking
habits, but also in technical terms. Consider the log-linear model, for
instance. Most packages perform log-linear analysis solely from a
hypothesis testing point of view. R’s `stats::loglin()` function, for
instance, will not provide standard errors, and only provides point
estimates on request. Our package will go further, offering point
estimates and standard errors for estimated cell probabilities. As the
user steps through the model hierarchy, at a certain point it will
become clear that the estimates are not changing in important ways, and
one can stop the model selection process. This will be accomplished by
the “Poisson trick” (Christensen 2013), in conjunction with R’s `glm()`
function.
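The idea behind the Poisson trick can be sketched with base R alone: treat the cell counts of a contingency table as Poisson responses and fit the log-linear model with `glm()`, which reports standard errors for the coefficients. The example uses the built-in `UCBAdmissions` table and is not the planned *revisit* interface. Standard errors for the cell probabilities themselves would need a further step (for example the delta method), which is not shown.

```r
# Flatten the 3-way table into one row per cell, with a Freq column.
tab <- as.data.frame(UCBAdmissions)

# Mutual-independence log-linear model, fitted as a Poisson regression.
fit1 <- glm(Freq ~ Admit + Gender + Dept, family = poisson, data = tab)
coef_table <- summary(fit1)$coefficients[, c("Estimate", "Std. Error")]
print(coef_table)

# Estimated cell probabilities: fitted counts divided by the table total.
p1 <- fitted(fit1) / sum(tab$Freq)

# One step up the model hierarchy: add a Gender-by-Dept interaction.
fit2 <- update(fit1, . ~ . + Gender:Dept)
p2 <- fitted(fit2) / sum(tab$Freq)

# How much did the estimated cell probabilities move between the models?
max_change <- max(abs(p2 - p1))
print(max_change)
```

When successive models change the estimated cell probabilities only slightly, that is the kind of evidence the text describes for stopping the model selection process.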

After a student selects an example from the Case Study menu, the package
loads the desired code and data, and enters the code into the package’s
visual text editor. The student can now edit and run the revised code,
*including just a partial run, up to a given line*.

The latter capability is especially useful. Say the code consists of 32
lines, and line 11 is of interest to the student. The student can direct
*revisit* to run lines 1-10 of the code, then pause. At that point the
student can try executing an alternative to line 11, by executing the
alternative line in the Console box, which provides direct access to the
RStudio R console in which execution takes place. The student can then
resume execution of the code starting from line 12. This allows the
student to quickly try alternative code without actually changing the
contents of the text editor.
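The run-pause-substitute-resume workflow can be imitated in plain R by evaluating a range of script lines in an environment. This is a sketch of the idea only, not of how *revisit* implements it; `run_lines()` and the three-line script are hypothetical, and it assumes each requested range contains complete R expressions.

```r
# A tiny stand-in for a case study script, one statement per line.
code <- c(
  "x <- c(2, 4, 6, 100)",
  "center <- mean(x)",
  "result <- center * 2"
)

# Evaluate lines from..to of the script in the given environment.
run_lines <- function(lines, from, to, env) {
  eval(parse(text = lines[from:to]), envir = env)
  invisible(env)
}

env <- new.env()
run_lines(code, 1, 1, env)              # run up to, but not including, line 2
eval(quote(center <- median(x)), env)   # try an alternative to line 2
run_lines(code, 3, 3, env)              # resume from line 3

env$result   # uses the median, so the outlier 100 has less influence
```

The script itself is left unchanged, which mirrors the point made above about trying alternatives without editing the contents of the text editor.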

In generating various alternative analyses, the student can save the
interesting ones, each version in a different file, but all managed
conveniently by *revisit*. Borrowing from software engineering
terminology, each of these files is called a *branch*.

Let’s start with a simple and quite well-known example, the Pima diabetes data. The authors (Baker 2016) used a form of neural networks to predict diabetes from variables such as BMI and insulin. To keep things simple, though, we will not engage in prediction analysis here.

Upon launching *revisit* in the GUI, a new window associated with the
RStudio session then pops up, and the screen then looks like Figure
1. Pima is the first case study listed.

The included code here consists of just forming confidence intervals comparing diabetic and nondiabetic groups, on each of the eight predictor variables. The student can choose to run the entire code or just a part; say she chooses the former. The confidence intervals can then be seen in the R console portion of the RStudio window (Figure 2).

At this point, the student may ask, “What if we make the Bonferroni
correction here?” Although she could edit line 15 in the *revisit* text
editor window to

`tmp <- t.test.rv(diab[ ,i], nondiab[ ,i])$conf.int`

and then re-run, if this is just a tentative change, she could avoid changing the code, by entering the above into the Console box, as seen in Figure 3. As expected, the confidence intervals become wider (not shown). If the student wishes to make further changes, she may now wish to make the above change in the text editor, and possibly save the new code into a new branch.

After the Pima dataset was curated, there were reports of erroneous
values in some data points. To investigate this, one might run the
`discparcoord()` function from the package *cdparcoord*, included
in *revisit*:

`discparcoord(pima,k=769)`

Here the second argument specifies forming the graph on all 769 records
in the data.^{2}

The resulting graph will be displayed in the Viewer pane in the RStudio window. In the text version, the student would run the above directly, in which case the graph appears in the student’s Web browser.

The result is shown in Figure 4. This is a *parallel
coordinates* plot (Inselberg 2009). Each data point is displayed as a
segmented line connecting dots at heights given by the values of the
variables in that data point.^{3}

The results are rather striking. We see that there are data points having the impossible value of 0 for variables such as glucose and blood pressure. Indeed, there are some data points with multiple 0s. Clearly the student will need to remove some of the data points, and re-run the analyses.
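A numerical check can complement the plot. The sketch below counts zeros per variable and drops the affected rows, using a small made-up data frame in place of the real data; the column names and values are hypothetical. Note that in the actual Pima data a zero is legitimate for some variables (such as number of pregnancies), so the check should be restricted to variables where zero is impossible.

```r
# Hypothetical stand-in for a few Pima-style measurements.
pima_like <- data.frame(
  glucose = c(148, 85, 0, 89, 137),
  bp      = c(72, 66, 64, 0, 40),
  bmi     = c(33.6, 26.6, 23.3, 28.1, 0)
)

# Number of zero entries in each variable
zero_counts <- colSums(pima_like == 0)
print(zero_counts)

# Keep only rows with no zero in any of these variables.
has_zero <- rowSums(pima_like == 0) > 0
cleaned  <- pima_like[!has_zero, ]
print(nrow(cleaned))
```

Analyses run on `pima_like` and on `cleaned` could then be saved as separate versions of the code and compared.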

The student may wish to create multiple versions of the code, e.g. in
this case, code with and without the erroneous points. The *revisit*
package facilitates this, creating branches 0, 1, 2 and so on of the
code.

As mentioned, a number of additional wrapper functions are planned, along with expanded “statistical audit” features for the existing wrappers. Much more graphical analysis is slated.

The number of case studies will continue to grow.

The *revisit* package facilitates nonpassive, hands-on exploration of
statistical and data analytic methodology on real data. The students
learn that even published data is not sacrosanct, and that alternate
analysis can yield additional insight into the phenomena under study.

Statistical methodology pervades almost every conceivable aspect of our
world today. In addition to the students’ possible future usage of
statistics in professional roles, educators should prepare them to act
as informed, critical thinkers in their roles as citizens and consumers.
We hope that *revisit* can play a part in achieving such goals.

It is also hoped that instructors and students will contribute suggestions for improvement, including pull requests on GitHub. These, and of course new datasets, would be highly appreciated.

We are grateful to Bohdan Khomtchouk for inviting one of the authors to
speak on *revisit* at the inaugural meeting of the Stanford R Group, and
to Karen Wu for a careful reading of the manuscript.