npordtests: An R Package of Nonparametric Tests for Equality of Location Against Ordered Alternatives

Ordered alternatives are an important statistical problem in many situations, such as the increased risk of congenital malformation caused by excessive alcohol consumption during pregnancy, life-test experiments, drug-screening studies, dose-finding studies, dose-response studies, and age-related responses. There are numerous other examples of this nature. In this paper, we present the npordtests package for testing the equality of locations against ordered alternatives. The package includes the Jonckheere-Terpstra, Beier and Buning's Adaptive, Modified Jonckheere-Terpstra, Terpstra-Magel, Ferdhiana-Terpstra-Magel, KTP, S, and Gaur's Gc tests. A simulation study is conducted to determine which test is most appropriate in which scenario and to offer guidance to researchers.


Introduction
Ordered alternative tests are used to evaluate whether a quantitative feature is linked to an ordinal trait, as in the association between ammonia levels and the severity of hepatic encephalopathy (Ong et al., 2003), the association of abnormal MRI findings with bone-marrow-related disease (Bredella et al., 2006), and the association between single nucleotide polymorphisms in human genes and quantitative phenotypes (Hoffmeyer et al., 2000; Cheng et al., 2005; Kawaguchi et al., 2012; Uchiyama et al., 2012; Tan et al., 2014; Yorifuji et al., 2018). Both parametric and nonparametric methods exist for testing ordered alternatives. The statistical validity of parametric methods, however, depends on distributional assumptions such as normality or equality of variances, whereas nonparametric tests do not require the data to follow a specific distribution and are robust to outliers and influential values (Lin et al., 2017b).
Several nonparametric tests have been developed for testing the equality of locations against ordered alternatives. They can be grouped under three headings: linear combinations of two-sample statistics, linear rank statistics, and statistics based on k-tuplets.
Linear rank statistics combine rank scores computed from the combined data with regression constants. Hogg et al. (1975) originally proposed the Left-Skewed (LS) and Right-Skewed (RS) scores, and Gastwirth (1965), Buning and Kossler (1996), and Beier and Buning (1997) proposed the Short-Tailed (ST), Long-Tailed (LT), and Wilcoxon (WS) scores, respectively. Beier and Buning (1997) also proposed a nonparametric Adaptive Test (AT) that chooses suitable scores based on the underlying distribution.
The k-tuplet tests use information obtained simultaneously across all samples. Their statistics are sums of a function evaluated over all $N^* = n_1 \times n_2 \times \cdots \times n_k$ k-tuplets, where each k-tuplet contains one observation from each group. Terpstra and Magel (2003) proposed a k-tuplet test statistic (TM) based on an indicator function. Ferdhiana et al. (2008) proposed a test statistic (FTM) that can be viewed as a generalization of the TM test. The FTM test uses the Kendall correlation coefficient computed from the data $(1, X_{1i_1}), (2, X_{2i_2}), \ldots, (k, X_{ki_k})$, where $X_{ij}$, $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, n_i$, are the sample data. Here, $k$ is the number of groups and $n_i$ denotes the number of observations in the ith group. Similarly, Terpstra et al. (2011) proposed the KTP test, which uses the Spearman correlation coefficient instead of the Kendall correlation coefficient.
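The k-tuplet construction can be sketched in a few lines of base R. This is an illustrative sketch, not the npordtests code: `k_tuplets()`, `tm_count()` and `tm_null_mean()` are hypothetical names, `expand.grid()` enumerates the $N^*$ tuplets, and the ordering indicator uses strict inequalities. The null mean $N^*/k!$ follows from the fact that, for continuous data under $H_0$, all $k!$ orderings of a tuplet are equally likely; the null variance is more involved and is not attempted here.

```r
# Hypothetical helper: all N* = n_1 * ... * n_k k-tuplets, one observation
# per group. `groups` is a list of numeric vectors in the hypothesised order.
k_tuplets <- function(groups) {
  as.matrix(expand.grid(groups, KEEP.OUT.ATTRS = FALSE))
}

# TM-type count: number of k-tuplets with X_{1 i_1} < X_{2 i_2} < ... < X_{k i_k}
tm_count <- function(groups) {
  tup <- k_tuplets(groups)
  sum(apply(tup, 1, function(x) all(diff(x) > 0)))
}

# With continuous data under H0, each of the k! orderings of a k-tuplet is
# equally likely, so the expected count is N* / k!.
tm_null_mean <- function(groups) {
  prod(lengths(groups)) / factorial(length(groups))
}

# Usage
g <- list(c(1.2, 2.0, 2.9), c(2.5, 3.1, 4.0), c(3.8, 4.4, 5.6))
nrow(k_tuplets(g))   # N* = 3 * 3 * 3 = 27
tm_count(g)
tm_null_mean(g)      # 27 / 3! = 4.5
```

Enumerating every tuplet is simple but grows as $N^*$, so this brute-force form is only practical for small samples.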
However, tests more efficient than the classic Jonckheere-Terpstra (JT) test may exist for particular data scenarios; nonetheless, a review of the literature reveals no comprehensive simulation study comparing ordered alternative tests across various scenarios. Nonparametric ordered alternative tests have recently been adapted to big-data settings such as gene data and machine learning (Lin et al., 2017b), which makes such a simulation study all the more relevant.
The R Journal Vol.12/1, June 2020 ISSN 2073-4859
This study contributes to the related literature in two ways. 1) It includes most of the ordered alternative tests in the literature, namely the JT, Modified JT, LS, RS, ST, LT, WS, AT, TM, FTM, KTP, S, and Gaur's Gc tests, implemented as open-source code in the R package npordtests (Altunkaynak and Gamgam, 2019), which is publicly available on CRAN.
2) This study presents a comprehensive simulation study that compares ordered alternative tests in terms of power, which helps researchers choose the most appropriate test for a given scenario.
The paper is organized as follows. After this introduction, we first give theoretical background on the nonparametric tests for ordered alternatives included in this study. Second, we introduce the npordtests package and demonstrate its use on two benchmark datasets. Third, we conduct a simulation study to determine which test is most appropriate in which scenario and to offer advice to researchers. The results of this simulation study and general comments are given in the final section.

Ordered alternative tests
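Let $X_{i1}, X_{i2}, \ldots, X_{in_i}$, $i = 1, \ldots, k$, be independent random samples of sizes $n_i$ from $k$ populations with continuous cumulative distribution functions $F_i(x) = F((x - \theta_i)/\sigma_i)$, where $-\infty < \theta_i < +\infty$ and $\sigma_i > 0$ are location and scale parameters, respectively. The null hypothesis that the populations have a common continuous cumulative distribution function can be expressed as

$$H_0: F_1(x) = F_2(x) = \cdots = F_k(x) \quad \text{for all } x. \qquad (1)$$

A number of test statistics have been proposed to test the null hypothesis in (1) under certain assumptions and for different forms of $H_1$. The ordered alternative states that the distributions are stochastically ordered, i.e.,

$$H_1: F_1(x) \ge F_2(x) \ge \cdots \ge F_k(x) \quad \text{for all } x, \text{ with at least one strict inequality for some } x. \qquad (2)$$

Under $H_1$, $X_i$ tends to be smaller than $X_{i+1}$, $i = 1, 2, \ldots, k-1$, since $F_i(x) \ge F_{i+1}(x)$ implies that $P(X_i \le X_{i+1}) \ge 1/2$. For the special case of the location model, (2) is equivalent to (Terpstra et al., 2011)

$$H_1: \theta_1 \le \theta_2 \le \cdots \le \theta_k, \quad \text{with at least one strict inequality}. \qquad (3)$$

Similarly, the ordered alternative hypothesis stating that $X_i$ tends to be larger than $X_{i+1}$ is

$$H_1: F_1(x) \le F_2(x) \le \cdots \le F_k(x) \quad \text{for all } x, \text{ with at least one strict inequality for some } x. \qquad (4)$$

For the location model, (4) is equivalent to

$$H_1: \theta_1 \ge \theta_2 \ge \cdots \ge \theta_k, \quad \text{with at least one strict inequality}. \qquad (5)$$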

Jonckheere-Terpstra test
This classic nonparametric test for ordered alternatives was proposed by Terpstra (1952) and Jonckheere (1954). The Mann-Whitney statistic for the ith and jth samples is defined as

$$U_{ij} = \sum_{s=1}^{n_i} \sum_{t=1}^{n_j} I(X_{is} < X_{jt}),$$

where $n_i$ and $n_j$ are the sample sizes of the ith and jth populations, respectively, and $I(\psi) = 1$ if $\psi$ is true and 0 otherwise. The test statistic JT is the sum of the $k(k-1)/2$ Mann-Whitney statistics, i.e.,

$$JT = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} U_{ij}.$$

The statistic JT is approximately normally distributed under $H_0$. In the absence of ties, its mean and variance are

$$E(JT) = \frac{N^2 - \sum_{i=1}^{k} n_i^2}{4}, \qquad V(JT) = \frac{N^2(2N+3) - \sum_{i=1}^{k} n_i^2 (2n_i + 3)}{72},$$

where $N = \sum_{i=1}^{k} n_i$.
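The definitions above translate directly into a few lines of base R. The following is a minimal sketch, not the npordtests implementation: `jt_test()` is a hypothetical name, ties are counted as 1/2 (a common convention that the strict indicator above does not cover), and the variance has no tie correction, so it assumes continuous data.

```r
# Hypothetical helper (not the npordtests implementation): Jonckheere-Terpstra
# statistic with its large-sample normal approximation.
# `groups` is a list of numeric vectors in the hypothesised increasing order.
jt_test <- function(groups) {
  k <- length(groups)
  jt <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # U_ij: pairs with X_is < X_jt; ties are counted as 1/2 (assumption)
      d <- outer(groups[[i]], groups[[j]], "-")
      jt <- jt + sum(d < 0) + 0.5 * sum(d == 0)
    }
  }
  n <- lengths(groups)
  N <- sum(n)
  mean_jt <- (N^2 - sum(n^2)) / 4
  # Null variance without a tie correction (continuous data assumed)
  var_jt <- (N^2 * (2 * N + 3) - sum(n^2 * (2 * n + 3))) / 72
  z <- (jt - mean_jt) / sqrt(var_jt)
  list(statistic = jt, mean = mean_jt, variance = var_jt, z = z,
       p.value = pnorm(z, lower.tail = FALSE))  # upper tail: increasing trend
}

# Usage: three perfectly ordered groups
jt_test(list(c(1, 2), c(3, 4), c(5, 6)))
```

For the usage data every pair is ordered, so each $U_{ij} = 4$ and $JT = 12$, against a null mean of $(36 - 12)/4 = 6$.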

Beier and Buning's Adaptive test
This test is a two-step method that selects the weight coefficients (scores) of a linear rank statistic according to the shape of the distribution (Beier and Buning, 1997). A linear rank statistic $L_N$ combines regression constants $c_N(\cdot)$ with scores $a_N(\cdot)$ evaluated at the ranks $R_{ij}$ of the observations $X_{ij}$ in the combined sample, where $N$ is the combined sample size. For ordered alternatives, a specific form of this statistic is used. Under $H_0$, the mean $E(L_N)$ and variance $V(L_N)$ of a linear rank statistic are known, and its distribution converges to a normal distribution with mean $E(L_N)$ and variance $V(L_N)$ (Hogg and Craig, 2013; Beier and Buning, 1997).
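To show how a linear rank statistic is standardized, here is a minimal base-R sketch. It uses the textbook permutation moments of $L_N = \sum_m c_m a_N(R_m)$ under $H_0$, namely $E(L_N) = N\bar{c}\bar{a}$ and $V(L_N) = \sum_m (c_m - \bar{c})^2 \sum_m (a_N(m) - \bar{a})^2 / (N - 1)$, where $\bar{a}$ is the mean of $a_N(1), \ldots, a_N(N)$. The regression constants (the group index $1, \ldots, k$) and the default Wilcoxon-type scores $a_N(i) = i$ are assumptions for illustration; the constants and scores actually used by the LS, RS, ST, LT, WS and AT tests are those of Beier and Buning (1997). The helper name `linear_rank_test()` is hypothetical.

```r
# Hypothetical helper: standardised linear rank statistic
#   L_N = sum_m c_m * a_N(R_m)
# with the permutation mean and variance under H0 (no ties assumed).
# Regression constants c_m = group index of observation m (assumption);
# default scores a_N(i) = i (Wilcoxon-type; assumption).
linear_rank_test <- function(x, group, score = function(i, N) i) {
  stopifnot(length(x) == length(group))
  N <- length(x)
  cc <- as.numeric(group)           # regression constants, groups coded 1..k
  r <- rank(x)                      # ranks in the combined sample
  a_obs <- score(r, N)              # scores at the observed ranks
  a_all <- score(seq_len(N), N)     # scores for ranks 1..N (for the moments)
  L <- sum(cc * a_obs)
  mean_L <- N * mean(cc) * mean(a_all)
  var_L <- sum((cc - mean(cc))^2) * sum((a_all - mean(a_all))^2) / (N - 1)
  z <- (L - mean_L) / sqrt(var_L)
  list(statistic = L, mean = mean_L, variance = var_L, z = z,
       p.value = pnorm(z, lower.tail = FALSE))
}

# Usage: six observations in three ordered groups
linear_rank_test(x = c(1, 2, 3, 4, 5, 6), group = c(1, 1, 2, 2, 3, 3))
```

Passing the score function as an argument keeps the standardization separate from the choice of scores, which is the part an adaptive procedure varies.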
Several choices of the scores $a_N(\cdot)$, tailored to the shape of the distribution, have been suggested in the literature:
• Left-skewed (LS) scores are efficient for detecting shifts in distributions that are skewed to the left (Beier and Buning, 1997).
• Short-tailed (ST) scores, proposed by Gastwirth (1965), are particularly good for detecting shifts in short-tailed distributions.
• Long-tailed (LT) scores, proposed by Buning and Kossler (1996), are efficient for detecting shifts in long-tailed distributions.
• Right-skewed (RS) scores are efficient for detecting shifts in distributions that are skewed to the right (Hogg et al., 1975).
Each distribution-free test in this framework is denoted by the index of its scores. For example, the test based on the scores $a_{ST}(\cdot)$ of Gastwirth (1965), which is particularly good for detecting shifts in short-tailed distributions, is denoted by ST. The adaptive test AT of Beier and Buning (1997) then selects one of these tests according to the estimated skewness $\hat{S}_1$ and tailweight $\hat{S}_2$ of the combined data,

$$\hat{S}_1 = \frac{x_{0.975} - x_{0.5}}{x_{0.5} - x_{0.025}}, \qquad \hat{S}_2 = \frac{x_{0.975} - x_{0.025}}{x_{0.875} - x_{0.125}},$$

where $x_p$ is the $p$-quantile of the combined data.
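The two selector statistics are simple quantile ratios, sketched below in base R. The helper name `shape_estimates()` is hypothetical, and using R's default quantile definition (`type = 7`) for $x_p$ is an assumption: other quantile definitions give slightly different values in small samples, so results may not match npordtests exactly. The rule that maps $(\hat{S}_1, \hat{S}_2)$ to a particular score function is not reproduced here.

```r
# Hypothetical helper: estimated skewness (S1) and tailweight (S2) of the
# combined sample, following the quantile ratios given in the text.
# Quantile type 7 (R's default) is an assumption.
shape_estimates <- function(x, type = 7) {
  q <- quantile(x, probs = c(0.025, 0.125, 0.5, 0.875, 0.975),
                names = FALSE, type = type)
  s1 <- (q[5] - q[3]) / (q[3] - q[1])  # (x_.975 - x_.5) / (x_.5 - x_.025)
  s2 <- (q[5] - q[1]) / (q[4] - q[2])  # (x_.975 - x_.025) / (x_.875 - x_.125)
  c(S1 = s1, S2 = s2)
}

# Usage: an evenly spaced (hence symmetric) sample gives S1 = 1
shape_estimates(1:101)
```

Values of $\hat{S}_1$ above 1 point to right skewness and values below 1 to left skewness; larger $\hat{S}_2$ indicates heavier tails.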
Since the statistic selected by the adaptive test is a linear rank statistic, its distribution converges to a normal distribution with mean $E(L_N)$ and variance $V(L_N)$.

Modified Jonckheere-Terpstra test
Tryon and Hettmansperger (1973) proposed the modified JT (MJT) statistic, which is built from the Mann-Whitney statistics $U_{ij}$ computed from the samples of the ith and jth populations, to test $H_0$ against ordered alternatives. Neuhäuser et al. (1998) recommended using this test in place of the JT test because it often has higher power.
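The paragraph above does not spell out how the $U_{ij}$ are combined. As a hedged illustration only, the sketch below computes a weighted sum of Mann-Whitney counts with weights $w(i, j) = j - i$, an assumed choice that gives more weight to comparisons between groups further apart; the exact MJT definition and its null variance should be taken from Tryon and Hettmansperger (1973) or the npordtests source. Only the null mean, which follows from $E(U_{ij}) = n_i n_j / 2$, is computed, and `weighted_jt()` is a hypothetical name.

```r
# Hypothetical helper: weighted sum of pairwise Mann-Whitney counts,
#   sum_{i<j} w(i, j) * U_ij,  U_ij = #{(s, t): X_is < X_jt}.
# The default weight w(i, j) = j - i is an assumption for illustration;
# with w(i, j) = 1 the statistic reduces to JT.
weighted_jt <- function(groups, weight = function(i, j) j - i) {
  k <- length(groups)
  n <- lengths(groups)
  stat <- 0
  null_mean <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      u_ij <- sum(outer(groups[[i]], groups[[j]], "<"))  # no ties assumed
      stat <- stat + weight(i, j) * u_ij
      # Under H0 with continuous data, E(U_ij) = n_i * n_j / 2
      null_mean <- null_mean + weight(i, j) * n[i] * n[j] / 2
    }
  }
  c(statistic = stat, null_mean = null_mean)
}

# Usage: compare the weighted and unweighted versions on the same data
g <- list(c(1, 2), c(3, 4), c(5, 6))
weighted_jt(g)                               # weights j - i
weighted_jt(g, weight = function(i, j) 1)    # plain JT
```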
The MJT statistic is approximately normally distributed under $H_0$, and the test standardizes it by its null mean and variance.

Terpstra-Magel test
Terpstra and Magel (2003) proposed a test statistic that does not focus on pairwise information. Instead, it uses the information in the $N^* = n_1 \times n_2 \times \cdots \times n_k$ k-tuplets, where a k-tuplet includes one observation from each treatment group. More specifically, the Terpstra-Magel (TM) test is based on the statistic

$$TM = \sum_{i_1=1}^{n_1} \cdots \sum_{i_k=1}^{n_k} I(X_{1i_1} < X_{2i_2} < \cdots < X_{ki_k}),$$

where the indicator function equals one when $X_{1i_1} < X_{2i_2} < \cdots < X_{ki_k}$ and zero otherwise. The statistic TM is approximately normally distributed under $H_0$; its mean and variance are given by Terpstra and Magel (2003).

Ferdhiana-Terpstra-Magel test
Ferdhiana et al. (2008) proposed the FTM test statistic, which can be viewed as a generalization of the TM test.
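Because FTM generalizes TM, and KTP follows the same recipe with Spearman's coefficient, the sketch below scores every k-tuplet by a rank correlation between the group index and the tuplet values, and recovers the TM-type count as the number of tuplets whose Kendall coefficient equals 1. It is an illustration under stated assumptions: `tuplet_correlation_sums()` is a hypothetical helper, and the published FTM and KTP statistics may scale or centre these sums differently, with their own null moments.

```r
# Hypothetical helper: per-k-tuplet correlations between the group index
# (1, ..., k) and the tuplet values, summed over all N* tuplets.
# FTM builds on Kendall's coefficient and KTP on Spearman's; any scaling
# or centring used by the published statistics is NOT reproduced here.
tuplet_correlation_sums <- function(groups) {
  tup <- as.matrix(expand.grid(groups, KEEP.OUT.ATTRS = FALSE))
  idx <- seq_len(ncol(tup))
  tau <- apply(tup, 1, function(x) cor(idx, x, method = "kendall"))
  rho <- apply(tup, 1, function(x) cor(idx, x, method = "spearman"))
  c(n_tuplets = nrow(tup),
    # Kendall's tau is 1 exactly for strictly increasing tuplets, so this
    # reproduces the TM-type count (no ties assumed)
    ordered_tuplets = sum(abs(tau - 1) < 1e-12),
    kendall_sum = sum(tau),
    spearman_sum = sum(rho))
}

# Usage
g <- list(c(1.2, 2.0, 2.9), c(2.5, 3.1, 4.0), c(3.8, 4.4, 5.6))
tuplet_correlation_sums(g)
```

Where TM gives a tuplet credit only when it is perfectly ordered, the correlation-based versions also give partial credit to partially ordered tuplets, which is one way to see FTM as a generalization of TM.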

KTP test
Under $H_0$, the statistic KTP is approximately normally distributed; its mean and variance are derived in Terpstra et al. (2011). In the KTP test, Spearman's rank correlation coefficient $r_s$ for a k-tuplet $(1, y_1), (2, y_2), \ldots, (k, y_k)$ is given by

$$r_s = 1 - \frac{6 \sum_{i=1}^{k} d_i^2}{k(k^2 - 1)},$$

where $d_i$ is the difference between the rank of $i$ and the rank of $y_i$. This formula applies when there are no tied observations. When there are tied observations, $r_s$ is computed as the Pearson correlation of the ranks, with tied values receiving average ranks $R(y_i)$:

$$r_s = \frac{\sum_{i=1}^{k} (i - \bar{r})(R(y_i) - \bar{R})}{\sqrt{\sum_{i=1}^{k} (i - \bar{r})^2 \sum_{i=1}^{k} (R(y_i) - \bar{R})^2}},$$

where $\bar{r} = (k+1)/2$ and $\bar{R}$ is the mean of the $R(y_i)$. If all observations in a k-tuplet are tied, the ranks $R(y_i)$ have zero variance, so the denominator is zero and $r_s$ is undefined. The same holds for the Kendall correlation coefficient. Therefore, the FTM and KTP tests cannot be applied to this type of data; see Lehmann's data in the demonstration of the npordtests package.

S test
Shan et al. (2014) proposed a new rank-based nonparametric test, S, that incorporates the actual differences between observations. Under $H_0$, the statistic S is approximately normally distributed, with mean and variance given by Shan et al. (2014).

Gaur's Gc test
Let $(w_1, w_2, \ldots, w_{k-1})$ be suitably selected positive real constants. Gaur (2017) proposed the $G_c$ statistic to test $H_0$ against ordered alternatives. Its components are defined for $g < h$, $h = 1, 2, \ldots, k$, using the sum $\sum_0$ over all combinations $(\alpha_1, \ldots, \alpha_c)$ of $c$ integers selected from $(1, \ldots, n_g)$ and over all combinations $(\beta_1, \ldots, \beta_c)$ of $c$ integers selected from $(1, \ldots, n_h)$. Under $H_0$, the distribution of $G_c$ converges to a normal distribution with zero mean, and its variance is obtained from $w = (w_1, w_2, \ldots, w_{k-1})$ and the variance-covariance matrix $\Sigma = [\sigma_{gh}]$ (Gaur, 2017). The $G_c$ test is recommended with $c = 2$ for light-tailed and moderate-tailed distributions, and with large values of $c$ for heavy-tailed and long-tailed distributions. The optimum weights $w_g$ for the $G_c$ test are also given by Gaur (2017).

Demonstration of the npordtests package
The npordtests package includes thirteen tests and six datasets for ordered alternatives. In this section, we first introduce the datasets included in the package and then demonstrate the package on two of them. All examples in this section should run exactly as printed, provided that the npordtests package is both installed and loaded into your current search path. It is loaded by entering

R> library(npordtests)
at the command prompt.

Datasets

Jonckheere's data: jdata
These hypothetical data, given by Jonckheere (1954), are used to test the hypothesis that the four samples come from the same population against the alternative that the values from samples I, II, III, and IV are in an expected order of increasing value.

Lehmann's data: lehmann
This dataset was used by Lehmann (1975) to assess whether a particular diagnostic test can be interpreted successfully without psychological training. It later became one of the classical datasets for investigating ordered alternatives (Beier and Buning, 1997). The data consist of the assessment scores of 72 evaluators (21 staff members, 23 trainees, and 28 undergraduate psychology majors) for the diagnostic test. If training and experience have any effect, the staff members could be expected to perform most accurately, the trainees next, and the undergraduates least accurately.

Chicks' weight data: chicks
These data, given by Desu and Raghavarao (2004), are used to examine the hypothesis that the chicks' mean weight increases with the amount of protein in their diet. Eighteen chicks were randomly assigned to three treatments, six chicks each, giving balanced data. Treatment 1 had the diet with the lowest level of protein, treatment 2 a medium level, and treatment 3 the highest level. After six weeks of feeding, weight gains were recorded, and the aim is to test whether mean weight gain increases with the amount of protein (Chang and Yen, 2011).

Hepatic vein waveform index data: hvwi
These data were collected by Pedersen et al. (2008) from Doppler waveforms of 66 patients scheduled for a percutaneous liver needle biopsy. The waveforms were characterized using a hepatic vein waveform index (HVWI), and the biopsy specimens were grouped according to the degree of fibrosis. The hypothesis of interest was that HVWI values tend to decrease as the degree of fibrosis increases (Terpstra et al., 2011).

Hypertension data: hypertension
These data, presented by Dmitrienko et al. (2006), concern the effect of different drug doses on diastolic blood pressure. Patients with hypertension were randomized into four groups receiving 0, 10, 20, and 40 mg/day, where the 0 mg/day group was the placebo group. The groups contained 17, 17, 18, and 16 patients, respectively. The complete data can be found in Dmitrienko et al. (2006) or Shan et al. (2014).

Neuhauser's data: neuhauser
These synthetic data were reported by Neuhäuser et al. (1998) and consist of 4 groups with 10 observations each.
To compare the distributions of the groups in each dataset, boxplots are given in Figure 1. As the figure shows, each dataset displays a pattern consistent with an ordered alternative.

Tests
The tests are demonstrated below using the jdata and lehmann datasets.
The following arguments are available in all functions for ordered alternatives:
• alpha is the level of significance used to assess statistical differences. The default is alpha = 0.05.
• na.rm is a logical value indicating whether NA values should be removed before the computation proceeds. The default is na.rm = TRUE.
• verbose is a logical value controlling whether output is printed to the R console. The default is verbose = TRUE.
Users who would like to use the returned statistics in their own programs can store the output of a test function, as follows.
R> res <- JtTest(Y ~ X, jdata, alpha = 0.05, na.rm = TRUE, verbose = FALSE)
Here, the Jonckheere-Terpstra test result is stored in res, from which its statistics can be extracted. Since all ordered alternative tests return similar outputs, this code is not repeated for the other tests. For all tests, the level of significance is set to 0.05.

Beier and Buning's Adaptive test: AtTest(...)
The AtTest function in the npordtests package is used to perform the Adaptive test.The LS, RS, ST, WS and LT tests are also available as functions in the package.

Terpstra-Magel test: TmTest(...)

R> TmTest(Y~X,jdata)
In the output, Statistic is calculated from Equation (10), and Z is computed as $(TM - E(TM))/\sqrt{V(TM)}$. p-value is the p-value of the TM test, here 0.00000002205097 (about $2.2 \times 10^{-8}$). Thus, the null hypothesis of equal locations is rejected at α = 0.05.

KTP test: KtpTest(...)
The KtpTest function in the npordtests package is used to perform the KTP test.

R> KtpTest(Y~X,jdata)
Here, Statistic is the calculated value of the test statistic, and p-value is the p-value of the MJT test, which is 0.0007331448. Since this p-value is smaller than α = 0.05, the null hypothesis is rejected in favor of the ordered alternative.

R> FtmTest(Values~Group,lehmann)
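As seen in the output, a "standard deviation is zero" error is encountered. It occurs because the values 68.5, 69.0, 70.5, 71.5, 73.0, 74.0, and 74.5 appear in all groups, so some k-tuplets consist entirely of tied values, for which the correlation coefficients used by the FTM test are undefined.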
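To see where this comes from, the following base-R sketch reproduces the underlying problem for a single fully tied 3-tuplet, such as one formed by drawing the value 70.5 from each of the three Lehmann groups. It only mimics the tuplet-level computation; the exact messages printed by FtmTest may differ, and `safe_spearman()` is a hypothetical guard, not part of npordtests.

```r
# A k-tuplet whose values are all tied has ranks with zero variance,
# so Spearman's r_s is undefined.
tied_tuplet <- c(70.5, 70.5, 70.5)

# R's cor() returns NA here and signals a warning; capture and print it.
r_s <- withCallingHandlers(
  cor(seq_along(tied_tuplet), tied_tuplet, method = "spearman"),
  warning = function(w) {
    message("cor() warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(r_s)

# Hypothetical guard: return NA without a warning for fully tied tuplets.
safe_spearman <- function(x) {
  if (length(unique(x)) == 1L) return(NA_real_)
  cor(seq_along(x), x, method = "spearman")
}
```

A single NA among the per-tuplet correlations makes their sum NA, which is why one fully tied tuplet is enough to stop the whole test.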

Simulation study
The sample size patterns used in this simulation study are shown in Table 1. We used log-F(v1, v2) distributions to generate the random variables $X_{ij} = \theta_i + \varepsilon_{ij}$, where the $\varepsilon_{ij}$ are iid log-F(v1, v2) errors and $\theta_i$ is the location parameter of the ith population. The log-F distribution is symmetric when $v_1 = v_2$, right skewed when $v_1 > v_2$, and left skewed when $v_1 < v_2$ (Terpstra et al., 2011). To evaluate the performance of the tests, we consider $(v_1, v_2) = (5, 5)$, $(1, 10)$, and $(10, 1)$ for symmetric, left-skewed, and right-skewed populations, respectively.
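A minimal sketch of the data-generating step follows, assuming that log-F(v1, v2) denotes the natural logarithm of an F(v1, v2) random variable, generated here with base R's `rf()`. The helper `simulate_logf_groups()` is hypothetical, and the location shifts `theta = c(0, 0.5, 1)` are illustrative values for a linear ordered alternative, not the shifts used in the study. The last lines check empirically that the three error distributions have the skewness directions stated above.

```r
# Hypothetical helper: simulate X_ij = theta_i + eps_ij with iid log-F(v1, v2)
# errors, assuming log-F(v1, v2) is the natural log of an F(v1, v2) variate.
simulate_logf_groups <- function(sizes, theta, v1, v2) {
  stopifnot(length(sizes) == length(theta))
  mapply(function(n, th) th + log(rf(n, df1 = v1, df2 = v2)),
         sizes, theta, SIMPLIFY = FALSE)
}

# Simple moment-based sample skewness
sample_skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

set.seed(2020)
# Progressive pattern for k = 3 and average group size 5 (Table 1),
# with hypothetical equally spaced shifts (a linear ordered alternative)
groups <- simulate_logf_groups(sizes = c(4, 5, 6), theta = c(0, 0.5, 1),
                               v1 = 5, v2 = 5)
lengths(groups)

# Skewness of the (5, 5), (1, 10) and (10, 1) error distributions
skews <- sapply(list(c(5, 5), c(1, 10), c(10, 1)),
                function(v) sample_skewness(log(rf(2e4, v[1], v[2]))))
skews
```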
When the location parameters of the populations are equal, simulated Type I error rates are calculated; when they are not equal, the simulated powers of the tests are computed. To assess the robustness of the tests in terms of Type I error rate, we used the liberal robustness criterion recommended by Bradley (1978), under which a test is robust if its simulated Type I error rate lies within ±0.5α of the nominal alpha level. For instance, at the alpha level of .05, a test is considered robust when its simulated Type I error rates fall between .025 and .075.
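Bradley's criterion and the simulated Type I error rate are easy to express directly. In this sketch the helper names `bradley_robust()` and `type1_rate()` are hypothetical, and treating the interval end points as included is an assumption. The usage feeds in uniform p-values, which is what an exactly calibrated test produces under $H_0$.

```r
# Bradley's (1978) liberal criterion: a test is robust at nominal level alpha
# if its simulated Type I error rate lies in [0.5 * alpha, 1.5 * alpha].
# Including the end points is an assumption.
bradley_robust <- function(rate, alpha = 0.05) {
  rate >= 0.5 * alpha & rate <= 1.5 * alpha
}

# Simulated Type I error rate: share of p-values below alpha across
# replications generated under H0 (equal location parameters).
type1_rate <- function(p_values, alpha = 0.05) mean(p_values < alpha)

# Usage with hypothetical p-values from an exactly calibrated test under H0,
# which are uniform on (0, 1)
set.seed(1978)
rate <- type1_rate(runif(10000))
bradley_robust(rate)
bradley_robust(c(0.020, 0.050, 0.070, 0.080))
```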

Results
Figure 2 presents boxplots of the simulated Type I error rates for all scenarios at the nominal alpha level of .05. As shown in Figure 2, although all tests satisfy Bradley's liberal criterion, the JT, MJT, and FTM tests control the nominal Type I error best across all simulation scenarios. The TM test, by contrast, shows a wider range of simulated Type I error rates than the others.
The simulated powers of the tests for the scenarios above are given in Tables 2-4. The results in these tables can be interpreted as follows:
• As seen in Table 2, when the data are generated from the symmetric log-F(5, 5) distribution, the most powerful test depends on the shape of the ordered alternative. When the shape is linear, the MJT test is more powerful than the other tests for all sample size patterns. When the shape is convex, the S test has the highest power among all tests considered for all sample size patterns. When the shape is concave, the KTP test has higher simulated power than the other tests for the progressive and one-extreme sample size patterns, whereas the S test is the most powerful for the equal pattern. When the average sample size is large (e.g., 50), the simulated powers of all tests are close to 1.
• When the data are generated from the right-skewed log-F(10, 1) distribution, Table 4 shows that the AT test generally gives the best results for ordered alternatives with a linear shape, except that the TM test is the most powerful when the average sample size is 5 and k = 3. As seen in Table 4, when the ordered alternative has a convex shape, the S test generally yields the highest power; however, when the sample size pattern is progressive or equal and the average sample size is 20, 30, or 50, the AT test is more powerful than the others. The results in Table 6 show that when the ordered alternative has a concave shape, the TM test is the most powerful of all the tests.
Table 5 gives decision rules indicating which test is more appropriate for which design.
When the ordered alternative has a linear shape and the distribution is symmetric, the MJT test should be preferred. When the shape is linear and the distribution is skewed to the left, the TM test has a clear power advantage over the others for an average sample size of 5 or 10, whereas the AT test has a clear power advantage for an average sample size of 20, 30, or 50.
When the ordered alternative has a convex shape, the AT test is recommended for distributions skewed to the left and the S test for symmetric distributions. If the distributions are skewed to the right, the MJT test is recommended when the sample size pattern is equal, and the S test when the pattern is progressive or one extreme.
When the ordered alternative has a concave shape and the distribution is symmetric, the S test is recommended for the equal sample size pattern and the KTP test for the progressive or one-extreme patterns. If the distributions are skewed to the left, the TM test is recommended for an average sample size of 5, and the KTP test for average sample sizes of 10, 20, 30, or 50. Finally, if the distributions are skewed to the right, the TM test is recommended.
Table 5: Rules, based on the simulation results, for choosing a test. For example, when the ordered alternative has a linear shape and the distribution is symmetric, the MJT test should be preferred.

Figure 1: Boxplots for the datasets. Each boxplot shows the median (the bold line dividing the box into two parts), the lower and upper quartiles (the bottom and top of the box), and the minimum and maximum values (the horizontal lines outside the box). Outliers appear as circles.

Figure 2: Distributions of simulated Type I error rates across all simulation scenarios when the nominal alpha is .05. Each boxplot shows the median (the bold line dividing the box into two parts), the lower and upper quartiles (the bottom and top of the box), and the minimum and maximum values (the horizontal lines outside the box). Outliers appear as circles.

Table 1: Simulation study sample size patterns. k is the number of samples and n is the average number of observations per group. The values in the table are sample sizes. For example, for k = 3, n = 5, and the progressive pattern, the group sample sizes are 4, 5, and 6, respectively.