Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions

Methodology extending nonparamet-ric goodness-of-fit tests to discrete null distributions has existed for several decades. However, modern statistical software has generally failed to provide this methodology to users. We offer a revision of R's ks.test() function and a new cvm.test() function that fill this need in the R language for two of the most popular nonpara-metric goodness-of-fit tests. This paper describes these contributions and provides examples of their usage. Particular attention is given to various numerical issues that arise in their implementation .


Introduction
Goodness-of-fit tests are used to assess whether data are consistent with a hypothesized null distribution.The χ 2 test is the best-known parametric goodnessof-fit test, while the most popular nonparametric tests are the classic test proposed by Kolmogorov and Smirnov followed closely by several variants on Cramér-von Mises tests.
In their most basic forms, these nonparametric goodness-of-fit tests are intended for continuous hypothesized distributions, but they have also been adapted for discrete distributions.Unfortunately, most modern statistical software packages and programming environments have failed to incorporate these discrete versions.As a result, researchers would typically rely upon the χ 2 test or a nonparametric test designed for a continuous null distribution.For smaller sample sizes, in particular, both of these choices can produce misleading inferences.This paper presents a revision of R's ks.test() function and a new cvm.test()function to fill this void for researchers and practitioners in the R environment.This work was motivated by the need for such goodness-of-fit testing in a study of Olympic figure skating scoring (Emerson and Arnold, 2011).We first present overviews of the theory and general implementation of the discrete Kolmogorov-Smirnov and Cramér-von Mises tests.We discuss the particular implementation of the tests in R and provide examples.We conclude with a short discussion, including the state of existing continuous and two-sample Cramérvon Mises testing in R.

Kolmogorov-Smirnov test Overview
The most popular nonparametric goodness-of-fit test is the Kolmogorov-Smirnov test.Given the cumulative distribution function F 0 (x) of the hypothesized distribution and the empirical distribution function F data (x) of the observed data, the test statistic is given by When F 0 is continuous, the distribution of D does not depend on the hypothesized distribution, making this a computationally attractive method.Slakter (1965) offers a standard presentation of the test and its performance relative to other algorithms.The test statistic is easily adapted for one-sided tests.For these, the absolute value in ( 1) is discarded and the tests are based on either the supremum of the remaining difference (the 'greater' testing alternative) or by replacing the supremum with a negative infimum (the 'lesser' hypothesis alternative).Tabulated p-values have been available for these tests since 1933 (Kolmogorov, 1933).
The extension of the Kolmogorov-Smirnov test to non-continuous null distributions is not straightforward.The formula of the test statistic D remains unchanged, but its distribution is much more difficult to obtain; unlike the continuous case, it depends on the null model.Use of the tables associated with continuous hypothesized distributions results in conservative p-values when the null distribution is discontinuous (see Slakter (1965), Goodman (1954), andMassey (1951)).In the early 1970's, Conover (1972) developed the method implemented here for computing exact one-sided p-values in the case of discrete null distributions.The method developed in Gleser (1985) is used to provide exact p-values for two-sided tests.

Implementation
The implementation of the discrete Kolmogorov-Smirnov test involves two steps.First, the particular test statistic is calculated (corresponding to the desired one-sided or two-sided test).Then, the p-value for that particular test statistic may be computed.
The form of the test statistic is the same as in the continuous case; it would seem that no additional work would be required for the implementation, but this is not the case.Consider two non-decreasing functions f and g, where the function f is a step function with jumps on the set {x 1 , . . .x N } and g is continuous (the classical Kolmogorov-Smirnov situation).In The R Journal Vol.3/2, December 2011 ISSN 2073-4859 order to determine the supremum of the difference between these two functions, notice that Computing the maximum over these 2N values (with f equal to F data (x) and g equal to F 0 (x) as defined above) is clearly the most efficient way to compute the Kolmogorov-Smirnov test statistic for a continuous null distribution.When the function g is not continuous, however, equality (3) does not hold in general because we cannot replace lim x→x i g(x) with the value g(x i ).
If it is known that g is a step function, it follows that for some small , where the discontinuities in g are more than some distance apart.This, however, requires knowledge that g is a step function as well as of the nature of its support (specifically, the break-points).As a result, we implement the Kolmogorov-Smirnov test statistic for discrete null distributions by requiring the complete specification of the null distribution.
Having obtained the test statistic, the p-value must then be calculated.When an exact p-value is required for smaller sample sizes, the methodology in Conover (1972) is used in for one-sided tests.For two-sided tests, the methods presented in Gleser (1985) lead to exact two-sided p-values.This requires the calculation of rectangular probabilities for uniform order statistics as discussed by Niederhausen (1981).Full details of the calculations are contained in source code of our revised function ks.test() and in the papers of Conover and Gleser.For larger sample sizes (or when requested for smaller sample sizes), the classical Kolmogorov-Smirnov test is used and is known to produce conservative p-values for discrete distributions; the revised ks.test() supports estimation of p-values via simulation if desired.

Cramér-von Mises tests Overview
While the Kolmogorov-Smirnov test may be the most popular of the nonparametric goodness-of-fit tests, Cramér-von Mises tests have been shown to be more powerful against a large class of alternatives hypotheses.The original test was developed by Harald Cramér and Richard von Mises (Cramér, 1928;von Mises, 1928) and further adapted by Anderson andDarling (1952), andWatson (1961).The original test statistic, W 2 , Anderson's A 2 , and Watson's U 2 are: (5) As with the original Kolmogorov-Smirnov test statistic, these all have test statistic null distributions which are independent of the hypothesized continuous models.The W 2 statistic was the original test statistic.The A 2 statistic was developed by Anderson in the process of generalizing the test for the two-sample case.
Watson's U 2 statistic was developed for distributions which are cyclic (with an ordering to the support but no natural starting point); it is invariant to cyclic reordering of the support.For example, a distribution on the months of the year could be considered cyclic.
It has been shown that these tests can be more powerful than Kolmogorov-Smirnov tests to certain deviations from the hypothesized distribution.They all involve integration over the whole range of data, rather than use of a supremum, so they are best-suited for situations where the true alternative distribution deviates a little over the whole support rather than having large deviations over a small section of the support.Stephens (1974) offers a comprehensive analysis of the relative powers of these tests.
Generalizations of the Cramér-von Mises tests to discrete distributions were developed in Choulakian et al. (1994).As with the Kolmogorov-Smirnov test, the forms of the test statistics are unchanged, and the null distributions of the test statistics are again hypothesis-dependent. Choulakian et al. (1994) does not offer finite-sample results, but rather shows that the asymptotic distributions of the test statistics under the null hypothesis each involve consideration of a weighted sum of independent chi-squared variables (with the weights depending on the particular null distribution).

Implementation
Calculation of the three test statistics is done using the matrix algebra given by Choulakian et al. (1994).The only notable difficulty in the implementation of The R Journal Vol.3/2, December 2011 ISSN 2073-4859 the discrete form of the tests involves calculating the percentiles of the weighted sum of chi-squares, where p is the number of elements in the support of the hypothesized distribution.Imhof (1961) provides a method for obtaining the distribution of Q, easily adapted for our case because the chi-squared variables have only one degree of freedom.The exact formula given for the distribution function of Q is given by for continuous functions θ(•, x) and ρ(•) depending on the weights λ i .
There is no analytic solution to the integral in ( 9), so the integration is accomplished numerically.This seems fine in most situations we considered, but numerical issues appear in the regime of large test statistics x (or, equivalently, small p-values).The function θ(•, x) is linear in x; as the test statistic grows the corresponding periodicity of the integrand decreases and the approximation becomes unstable.As an example of this numerical instability, the red plotted in Figure 1 shows the non-monotonicity of the numerical evaluation of equation ( 9) for a null distribution that is uniform on the set {1, 2, 3}.Bound given by (11) Bound given by ( 13) Figure 1: Plot of calculated p-values for given test statistics using numerical integration (red) compared to the conservative chi-squared bound (dashed blue) and the Markov inequality bound (dashed green).
The null distribution is uniform on the set {1, 2, 3} in this example.The sharp variations in the calculated p-values are a result of numerical instabilities, and the true p-values are bounded by the dashed curves.
We resolve this problem by using a combination of two conservative approximations to avoid the numeri-cal instability.First, consider the following inequality: The values for the weighted sum can be bounded using a simple transformation and a chi-squared distribution of a higher degree of freedom.Second, consider the Markov inequality: where the bound can be minimized over t ∈ (0, 1/2λ max ).The upper bounds for the p-value given by ( 11) and ( 13) are both calculated and the smaller is used in cases where the numerical instability of ( 9) may be a concern.The original formulation, numerical integration of (9), is preferable for most p-values, while the upper bound described above is used for smaller p-values (smaller than 0.001, based on our observations of the numerical instability of the original formulation).Figure 1 shows the bounds with the blue and green dashed lines; values in red exceeding the bounds are a result of the numerical instability.Although it would be preferable to determine the use of the bound based on values of the test statistic rather than the p-value, the range of "extreme" values of the test statistic varies with the hypothesized distribution.

Examples
Consider a toy example with observed data of length 2 (specifically, the values 0 and 1) and a hypothesized null distribution that places equal probability on the values 0 and 1.With the current ks.test() function in R (which, admittedly, doesn't claim to handle discrete distributions), the reported p-value, 0.5, is clearly incorrect: > stats::ks.test(c(0,1), ecdf(c(0, 1))) One-sample Kolmogorov-Smirnov test data: c(0, 1) D = 0.5, p-value = 0.5 alternative hypothesis: two-sided Instead, the value of D given in equation ( 1) should be 0 and the associated p-value should be 1.Our revision of ks.test() fixes this problem when the user provides a discrete distribution: > library(dgof) > dgof::ks.test(c(0,1), ecdf(c(0, 1))) One-sample Kolmogorov-Smirnov test data: c(0, 1) D = 0, p-value = 1 alternative hypothesis: two-sided Next, we simulate a sample of size 25 from the discrete uniform distribution on the integers {1, 2, . . ., 10} and show usage of the new ks.test()implementation.The first is the default two-sided test, where the exact p-value is obtained using the methods of Gleser (1985).

Discussion
This paper presents the implementation of several nonparametric goodness-of-fit tests for discrete null distributions.In some cases the p-values are known to be exact.In others, conservativeness in special cases with small p-values has been established.Although we provide for Monte Carlo simulated p-values with the new ks.test(),no simulations may be necessary necessary for these methods; they were generally developed during an era when extensive simulations may have been prohibitively expensive or timeconsuming.However, this does raise the possibility that alternative tests relying upon modern computational abilities could provide even greater power in certain situations, a possible avenue for future work.
In the continuous setting, both of the Kolmogorov-Smirnov and the Cramér-von Mises tests have twosample analogues.When data are observed from two processes or sampled from two populations, the hypothesis tested is whether they came from the same (unspecified) distribution.With the discrete case, however, the null distribution of the test statistic depends on the underlying probability model, as discussed by Walsh (1963).Such an extension would require the specification of a null distribution, which generally goes unstated in two-sample goodness-offit tests.We note that Dufour and Farhat (2001) explored two-sample goodness-of-fit tests for discrete distributions using a permutation test approach.
Further generalizations of goodness-of-fit tests for discrete distributions are described in the extended study of de Wet and Venter (1994).There are existing R packages for certain type of Cramér-von Mises goodness-of-fit tests for continuous distributions.Functions implemented in package nortest (Gross, 2006) focus on the composite hypothesis of normality, while package ADGofTest (Bellosta, 2009) provides the Anderson-Darling variant of the test for general continuous distributions.Packages CvM2SL1Test (Xiao and Cui, 2009a) and CvM2SL2Test (Xiao and Cui, 2009b) provide two-sample Cramér-von Mises tests with continuous distributions.Package cramer (Franz, 2006) offers a multivariate Cramér test for the two-sample problem.Finally, we note that the discrete goodness-of-fit tests discussed in this paper do not allow the estimation of parameters of the hypothesized null distributions (see Lockhart et al. (2007) for example).
Functions ks.test() and cvm.test() are provided for convenience in package dgof, available on CRAN.Function ks.test() offers a revision of R's Kolmogorov-Smirnov function ks.test() from recommended package stats; cvm.test() is a new function for Cramér-von Mises tests.The revised ks.test() function supports onesample tests for discrete null distributions by allowing the second argument, y, to be an empirical cumulative distribution function (an R function with class "ecdf") or an object of class "stepfun" specifying a discrete distribution.As in the original version of ks.test(), the presence of ties in the data (the first argument, x) generates a warning unless y describes a discrete distribution.If the sample size is less than or equal to 30, or when exact=TRUE, exact p-values are The R Journal Vol.3/2, December 2011 ISSN 2073-4859 provided (a warning is issued when the sample size is greater than 30 due to possible numerical instabilities).When exact = FALSE (or when exact is unspecified and the sample size is greater than 30) the classical Kolmogorov-Smirnov null distribution of the test statistic is used and resulting p-values are known to be conservative though imprecise (see Conover (1972) for details).In such cases, simulated p-values may be desired, produced by the simulate.p.value=TRUE option.The function cvm.test() is similar in design to ks.test().Its first two arguments specify the data and null distribution; the only extra option, type, specifies the variant of the Cramér-von Mises test: x a numerical vector of data values.y an ecdf or step-function (stepfun) for specifying the null model type the variant of the Cramér-von Mises test; W2 is the default and most common method, U2 is for cyclical data, and A2 is the Anderson-Darling alternative.