rankFD: An R Software Package for Nonparametric Analysis of General Factorial Designs

Many experiments can be modeled by a factorial design which allows statistical analysis of main factors and their interactions. A plethora of parametric inference procedures have been developed, for instance based on normality and additivity of the effects. However, often, it is not reasonable to assume a parametric model, or even normality, and effects may not be expressed well in terms of location shifts. In these situations, the use of a fully nonparametric model may be advisable. Nevertheless, until very recently, the straightforward application of nonparametric methods in complex designs has been hampered by the lack of a comprehensive R package. This gap has now been closed by the novel R-package rankFD that implements current state of the art nonparametric ranking methods for the analysis of factorial designs. In this paper, we describe its use, along with detailed interpretations of the results.


Introduction
Nonparametric methods, and in particular rank-based methods, are commonly used for the analysis of experiments when it cannot be assumed that the observations derive from a normal population distribution. In online discussion fora regarding the application of statistical methods one can often find questions such as: "Does anybody know whether there is a nonparametric analog of ANOVA?". The common response is: "You may use rank methods", which usually prompts the next question: "Does anybody know a software package performing the computations for a nonparametric ANOVA / rank ANOVA?". The answers to this question vary: some list more or less popular statistical software packages, others give the heuristic advice of simply replacing the observations by their ranks and then performing regular ANOVA on the ranks. This suggests that there is a lack of clear advice on not just how to implement rank-based methods, but also how to interpret and understand the theoretical background. As such, the goal of the present article is to both explain when and how to use the procedures implemented in rankFD, and also provide the reader with enough of the theoretical background so that they can interpret the results correctly.
In order to provide a more precise answer regarding the nonparametric analog of ANOVA, one has to discuss the quantities by which a potential effect in a trial can be intuitively described. Such effects may be the differences or ratios of the means of the observations or of some other parameter or estimand defined in a semi-parametric model. To compare the differences of means in semi-parametric models where the normal distribution cannot be assumed, the so-called studentized permutation procedures (Janssen, 1997; Pauly et al., 2015; Smaga, 2015) are appropriate. These procedures provide quite accurate results even in case of small to moderate sample sizes, depending on the type of the data and the underlying population distribution. However, there are several situations where differences or linear combinations of means may not be appropriate to describe intuitive treatment effects, for example if the data have floor and ceiling effects or if the distributions have completely different shapes. In case of ordinal data, means are not even defined, and using a numerical encoding of the ordered categories as seemingly metric data may lead to incorrect conclusions (Kahler et al., 2008). In such cases, treatment effects can reasonably be described by the so-called relative effect, which was introduced by Mann and Whitney (1947) and Putter (1955). For independent observations $X \sim F_1$ and $Y \sim F_2$, the relative effect is defined as $\theta = P(X < Y) + \tfrac{1}{2} P(X = Y)$, which can be equivalently written as $\theta = \int F_1 \, dF_2$. It may be noted that this effect has been known under many different names in the literature, for example Wilcoxon functional (Janssen, 1999a), Mann-Whitney type effect (Dobler et al., 2019), stochastic superiority (D'Agostino et al., 2006), or probabilistic index (Acion et al., 2006; Thas et al., 2012). We prefer the expression "relative effect" or "nonparametric relative treatment effect" with reference to Birnbaum and Klose (1957).
The relative effect $\theta$ can be estimated by replacing the distribution functions $F_1$ and $F_2$ with their empirical counterparts $\hat F_1$ and $\hat F_2$, the so-called empirical distribution functions. This leads to the estimator
$$\hat\theta = \int \hat F_1 \, d\hat F_2 = \frac{1}{n_1}\left(\bar R_{2\cdot} - \frac{n_2 + 1}{2}\right),$$
where $\bar R_{2\cdot} = \frac{1}{n_2}\sum_{k=1}^{n_2} R_{2k}$ denotes the mean of the overall ranks $R_{2k}$ of the observations $X_{21}, \dots, X_{2n_2}$ among all $N = n_1 + n_2$ observations in the experiment. It is well-known that $\hat\theta$ is an unbiased and $L_2$-consistent estimator of the relative effect $\theta$ and thus, the mean of the ranks provides the basis for estimating $\theta$ and for statistical inference regarding $\theta$.
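To make the link between ranks and the relative effect concrete, the following minimal sketch (plain Python, not code from rankFD; the function names and the toy data are our own illustrative assumptions) computes the estimate once from the mean of the mid-ranks of the second sample and once by directly counting pairs, with ties counted as one half. Both routes should give the same value.

```python
from fractions import Fraction


def midranks(values):
    """Mid-ranks of `values` (tied observations share the average position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [Fraction(0)] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        average = Fraction(start + 1 + end + 1, 2)
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def relative_effect_by_ranks(x, y):
    """Estimate theta from the mean overall rank of the second sample."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    mean_rank_y = sum(ranks[n1:]) / n2
    return (mean_rank_y - Fraction(n2 + 1, 2)) / n1


def relative_effect_by_counting(x, y):
    """Estimate theta as the proportion of pairs with x < y, ties counted half."""
    total = Fraction(0)
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1
            elif xi == yj:
                total += Fraction(1, 2)
    return total / (len(x) * len(y))


# Hypothetical toy data containing ties.
x = [1, 2, 2, 5]
y = [2, 3, 6]
theta_ranks = relative_effect_by_ranks(x, y)
theta_count = relative_effect_by_counting(x, y)
print(theta_ranks, theta_count)
```

Exact fractions are used so that the agreement of the two computations is not blurred by floating-point rounding.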
The R Journal Vol. 15/1, March 2023, ISSN 2073-4859
For two random variables $X$ and $Y$, a relative effect $\theta > 1/2$ indicates a tendency that $X$ takes smaller values than $Y$, while $\theta < 1/2$ means that $X$ tends to have larger values than $Y$. No tendency in either direction corresponds to a relative effect of $\theta = 1/2$. Crucially, the presence of a relative effect does not translate to a difference in means, and likewise, the absence of a relative effect does not suggest that the means are the same. In other words, if $X$ has a mean $\mu_x$ and $Y$ has a mean $\mu_y$, then we may have $\theta \neq 1/2$ when $\mu_x = \mu_y$, or $\theta = 1/2$ when $\mu_x \neq \mu_y$. Analogously, for the medians $\tilde\mu_x$ and $\tilde\mu_y$, it is possible that $\theta \neq 1/2$ and $\tilde\mu_x = \tilde\mu_y$, or that $\theta = 1/2$ and $\tilde\mu_x \neq \tilde\mu_y$. Thus, from a significant result of a rank test it cannot be concluded that $\mu_x \neq \mu_y$ or $\tilde\mu_x \neq \tilde\mu_y$. In this sense, rank tests based on $\theta$ (e.g., the Wilcoxon-Mann-Whitney test, the Fligner-Policello test, or the Brunner-Munzel test) are not tests of the equality of means or medians, and therefore not simply nonparametric analogs of the t-test, since the hypotheses and consistency regions of these tests are not identical. Note that the consistency region contains all distribution functions for which the power of the test tends to 1 as sample sizes tend to $\infty$. In most parametric models, the sets of distribution functions contained in the hypothesis and in the consistency region are complementary. In some nonparametric models, however, this is in general not the case, which may lead to difficulties interpreting "significant" results obtained by rank-based tests (Brunner et al., 2020). Some details will be explained in Section 2.2. Similar remarks apply to rank tests for multiple samples or even in factorial designs. This is ultimately the reason why the heuristic approach of replacing the observations by their ranks may lead to invalid procedures in general (Conover and Iman, 1981). Especially in factorial designs, linear combinations of means may have different meanings than linear combinations of relative effects. With this in mind, users of the R-package for rank tests described in this paper should know that they might get different results than obtained by using a common ANOVA package.
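The statement that relative effects and mean differences answer different questions can be checked on small discrete distributions. The sketch below (plain Python; the two pairs of distributions are hypothetical examples chosen by us, not taken from the package or its data sets) computes means and the relative effect exactly.

```python
from fractions import Fraction


def mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def relative_effect(dist_x, dist_y):
    """theta = P(X < Y) + 0.5 * P(X = Y) for independent discrete X and Y."""
    theta = Fraction(0)
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            if x < y:
                theta += px * py
            elif x == y:
                theta += px * py / 2
    return theta


# Case 1: equal means, but X tends to be smaller than Y.
x1 = {1: Fraction(2, 3), 4: Fraction(1, 3)}
y1 = {2: Fraction(1)}
theta_equal_means = relative_effect(x1, y1)

# Case 2: different means, but no tendency in either direction.
x2 = {0: Fraction(1, 2), 10: Fraction(1, 2)}
y2 = {1: Fraction(1)}
theta_unequal_means = relative_effect(x2, y2)

print(mean(x1), mean(y1), theta_equal_means)
print(mean(x2), mean(y2), theta_unequal_means)
```

In the first case both means equal 2 while the relative effect is 2/3; in the second case the means are 5 and 1 while the relative effect is exactly 1/2.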
The second question often read in discussion fora, "what software package should I use", can be answered more easily. Most statistical software packages provide options for the classical nonparametric rank-based methods; however, these can still be quite limited, and more contemporary and/or appropriate methods may not be available. For example, most statistical software packages offer the Wilcoxon-Mann-Whitney and the Kruskal-Wallis test for independent observations, as well as some particular procedures from the literature. However, more modern nonparametric rank-based methods developed during the last decades (Ruymgaart, 1980; Akritas and Arnold, 1994; Akritas et al., 1997; Brunner and Puri, 1996; Konietschke et al., 2012; Brunner et al., 2017, 2019) are not implemented in most packages. Moreover, in software tools following a more classical paradigm, ties (i.e., two or more different observations with exactly the same value, as frequently is the case in ordinal or count data) are often considered in the form of "corrections" that are added to the case of no ties, instead of considering the situation of no ties as a special case of a general model allowing for arbitrary ties (only the trivial case of one-point distributions should generally be excluded). Also, quick algorithms (Streitberg and Röhmel, 1986; Mehta et al., 1988) for the computation of exact p-values for permutation-based procedures are rarely used, and general methods for purely nonparametric effects in factorial designs are not provided in standard implementations. However, exactly such procedures are often needed in applications. Researchers are then tempted to use heuristic procedures as described above, although the conclusions drawn from them might be misleading.
Finally, confidence intervals for purely nonparametric effects, such as the relative effect $\theta$, are not provided in standard software, in spite of the fact that appropriate confidence intervals for the effect measures being used in the analysis have been required by the pertinent guidelines for decades. Instead, some software packages offer confidence intervals for location shift effects, which in general may be neither compatible with the decisions of the rank tests nor justified regarding the types of alternatives or the scales of the measurements in the experiment. Recall that the relative effect is not a measure of mean or median differences, and therefore confidence intervals for mean or median shifts are not congruent with hypothesis tests based on the relative treatment effect, such as the Wilcoxon-Mann-Whitney and the Kruskal-Wallis tests, among others.
The R package rankFD intends to close these gaps. It includes the classical rank tests for continuous observations as special cases, allows for situations with arbitrary ties, and extends these procedures to factorial designs. The hypotheses tested in factorial designs are expressed as linear hypotheses in terms of the distribution functions as introduced in Akritas et al. (1997) or as linear combinations of the relative effects as discussed in Brunner et al. (2017). Ranking procedures for testing equalities of distribution functions in factorial longitudinal data (repeated measures) and multivariate data are implemented in the R packages nparLD (Noguchi et al., 2012), npmv and nparMD (Burchett et al., 2017; Kiefel et al., 2022), respectively. Semiparametric methods for testing null hypotheses in general factorial designs in means are implemented in the R packages GFD (Friedrich et al., 2017) and MANOVA.RM (Friedrich et al., 2019b).
In any case, it must be clearly noted that rank methods, especially in factorial designs, answer different questions than those considered by the ANOVA in common factorial designs. The relations between linear combinations of the expectations of the observations and their respective counterparts expressed in terms of rank or pseudo-rank means depend on the underlying distribution functions. Questions investigated by parametric factorial designs are related to the expected values of the observations, while questions investigated by using rank- and pseudo-rank-based methods are related to relative effects. The latter compare the distributions in the different treatment groups to an average distribution. Thus, it should not be a surprise to obtain different answers if different questions are posed. This must be kept in mind when responding to the seemingly simple question: "Does anybody know whether there is a nonparametric analog of ANOVA?".
The paper is organized as follows. Section 2.2 discusses the statistical models and explains the concepts and methodology underlying the inferential procedures provided by the package rankFD, while the corresponding test statistics are described in Section 2.3. Section 2.4 lists and explains the different functions used in this package, as well as examples demonstrating the usage of these functions on real-life data. The paper closes with a discussion of the meaning and interpretation of these methods and their relations to some procedures implemented in other R packages.

Statistical models, effects, and hypotheses
First, we consider a simple experimental design involving only one factor A with $a$ levels and $n_i$ independent observations in each level $i$. These are modeled as
$$X_{ik} \sim F_i, \quad i = 1, \dots, a; \; k = 1, \dots, n_i. \tag{1}$$
Throughout, we assume that the observations $X_{ik}$ are measured at least on an ordinal scale, whereas $F_i$ denotes an arbitrary distribution (or its cdf), with the exception of one-point distributions. In total, there are $N = \sum_{i=1}^{a} n_i$ observations in the trial. This statistical model does not involve any explicit parameters or parametrization that could be used to describe appropriate treatment effects. To describe effects in such a general model, we therefore define weighted and unweighted relative effects
$$\theta_i = \int H_N \, dF_i \tag{2}$$
and
$$\psi_i = \int G \, dF_i. \tag{3}$$
In this general definition of a relative treatment effect, each distribution function $F_i$ is compared either to the weighted average $H_N = \frac{1}{N}\sum_{i=1}^{a} n_i F_i$ or to the unweighted average $G = \frac{1}{a}\sum_{i=1}^{a} F_i$ of the distribution functions. This can be regarded as comparing each observation $X_{ik} \sim F_i$ with either an artificial independent observation $Y \sim H_N$ of the weighted mean distribution or $Z \sim G$ of the unweighted mean distribution. The former leads to the weighted relative effect $\theta_i$, while the latter leads to the unweighted relative effect $\psi_i$. In case of equal sample sizes, both effects coincide.
The unweighted relative effects $\psi_i$ can be interpreted as follows: If $\psi_i < 1/2$, then the observations in group $i$ tend to be smaller than those coming from the average distribution $G$. If $\psi_i = \psi_j$, then in relation to the average distribution $G$, the observations coming from distributions $F_i$ and $F_j$ have the exact same tendency towards smaller or larger observations. Thus, it is reasonable to consider the case of $\psi_i = \psi_j$ as no (relative) treatment effect between levels $i$ and $j$. The relations and interpretations for the weighted effects $\theta_i$ and $\theta_j$ follow analogously. In the following, we collect all distribution functions and relative effects in the vectors $F = (F_1, \dots, F_a)^\top$ and $\psi = (\psi_1, \dots, \psi_a)^\top$ or $\theta = (\theta_1, \dots, \theta_a)^\top$, respectively.
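Estimators of the weighted relative effects $\theta_i$ defined in (2) can be obtained using the ranks $R_{ik}$ of the observations $X_{ik}$. In fact,
$$\hat\theta_i = \int \hat H_N \, d\hat F_i = \frac{1}{N}\left(\bar R_{i\cdot} - \frac{1}{2}\right),$$
where $\bar R_{i\cdot} = \frac{1}{n_i}\sum_{k=1}^{n_i} R_{ik}$, and $R_{ik}$ denotes the rank of $X_{ik}$ among all $N = \sum_{i=1}^{a} n_i$ observations. In case of ties, mid-ranks must be used. Formally, the mid-rank $R_{ik}$ is obtained from the empirical weighted average distribution function $\hat H_N(x)$ by $R_{ik} = \frac{1}{2} + N \hat H_N(X_{ik})$. In the same way, the unweighted relative effects $\psi_i$ defined in (3) are estimated using the so-called pseudo-ranks $R^{\psi}_{ik} = \frac{1}{2} + N \hat G(X_{ik})$, where $\hat G(x)$ denotes the empirical unweighted average distribution function. An unbiased and consistent estimator $\hat\psi_i$ of $\psi_i$ is given by
$$\hat\psi_i = \int \hat G \, d\hat F_i = \frac{1}{N}\left(\bar R^{\psi}_{i\cdot} - \frac{1}{2}\right), \tag{4}$$
where $\bar R^{\psi}_{i\cdot} = \frac{1}{n_i}\sum_{k=1}^{n_i} R^{\psi}_{ik}$. Thus, rank tests are related to the weighted relative effects $\theta_i$ in (2), while pseudo-rank tests are related to the unweighted relative effects $\psi_i$ in (3).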
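The difference between ranks and pseudo-ranks becomes visible as soon as the sample sizes are unequal. The following sketch (plain Python, not the rankFD implementation; function names and toy data are illustrative assumptions) builds both kinds of ranks from the normalized empirical distribution functions, i.e. as 1/2 plus N times the weighted or unweighted average empirical distribution function, and returns the resulting effect estimates.

```python
from fractions import Fraction


def normalized_ecdf(sample, x):
    """Share of `sample` below x plus half the share equal to x."""
    below = sum(1 for v in sample if v < x)
    equal = sum(1 for v in sample if v == x)
    return (Fraction(below) + Fraction(equal, 2)) / len(sample)


def rank_effects(groups):
    """Return (theta_hat, psi_hat): weighted and unweighted effect estimates."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    theta_hat, psi_hat = [], []
    for g in groups:
        ranks, pseudo_ranks = [], []
        for x in g:
            ecdfs = [normalized_ecdf(h, x) for h in groups]
            # Weighted average uses the sample sizes, unweighted average does not.
            H = sum(len(h) * F for h, F in zip(groups, ecdfs)) / N
            G = sum(ecdfs) / a
            ranks.append(Fraction(1, 2) + N * H)         # usual mid-rank
            pseudo_ranks.append(Fraction(1, 2) + N * G)  # pseudo-rank
        theta_hat.append((sum(ranks) / len(g) - Fraction(1, 2)) / N)
        psi_hat.append((sum(pseudo_ranks) / len(g) - Fraction(1, 2)) / N)
    return theta_hat, psi_hat


# Hypothetical unbalanced design: n_1 = 2, n_2 = 4.
theta_unbalanced, psi_unbalanced = rank_effects([[1, 2], [3, 4, 5, 6]])
# Hypothetical balanced design: both estimates coincide.
theta_balanced, psi_balanced = rank_effects([[1, 2], [3, 4]])
print(theta_unbalanced, psi_unbalanced)
```

In the unbalanced example the two samples are completely separated, yet the weighted estimates are 1/6 and 2/3 while the unweighted ones are 1/4 and 3/4; the weighted values would change if the group sizes changed, whereas the unweighted values would not.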

Hypotheses formulated in terms of distribution functions
Classical rank-based methods for a one-way layout (e.g., the Kruskal-Wallis test, Kruskal (1952); Kruskal and Wallis (1952); or the Hettmansperger-Norton test, Hettmansperger and Norton (1987)) can be used to test null hypotheses formulated in terms of the distribution functions, such as
$$H_0^F: F_1 = \dots = F_a, \tag{5}$$
where obviously, equal distribution functions imply equal variances if $H_0^F$ in (5) is true (if second moments exist). Two- and higher-way layouts are covered within model (1) by sub-indexing the index $i$, similar to the theory of linear models. For instance, a two-way design involving a factor A with $a$ levels and a factor B with $b$ levels, respectively, can be written as
$$X_{ijk} \sim F_{ij}, \quad i = 1, \dots, a; \; j = 1, \dots, b; \; k = 1, \dots, n_{ij}, \tag{6}$$
and the distribution functions and relative effects are then collected in the structured vectors $F = (F_{11}, \dots, F_{ab})^\top$ and $\psi = (\psi_{11}, \dots, \psi_{ab})^\top$ or $\theta = (\theta_{11}, \dots, \theta_{ab})^\top$, respectively.
Consequently, Akritas and Arnold (1994), Brunner and Puri (1996), and Akritas et al. (1997) suggested formulating null hypotheses in two- and higher-way layouts in a similar way as in linear models, with the expected values being replaced by the corresponding distribution functions. In a two-way layout, for example, hypotheses of no (distribution-)main effects A or B and no (distribution-)interaction (AB) are written as
$$H_0^F(A): \bar F_{1\cdot} = \dots = \bar F_{a\cdot}, \qquad H_0^F(B): \bar F_{\cdot 1} = \dots = \bar F_{\cdot b}, \qquad H_0^F(AB): F_{ij} - \bar F_{i\cdot} - \bar F_{\cdot j} + \bar F_{\cdot\cdot} = 0 \ \text{for all } i, j, \tag{7}$$
where $\bar F_{i\cdot}$, $\bar F_{\cdot j}$, and $\bar F_{\cdot\cdot}$ denote the unweighted means of the $F_{ij}$ over the respective indices. In order to extend the hypotheses in (5) or (7) to higher-way layouts, general hypotheses are written using matrix notation as
$$H_0^F: CF = 0, \tag{8}$$
where $C$ denotes an appropriate hypothesis matrix, in the same way as in linear models, only replacing means with the respective distribution functions. Note that $0$ is here understood to be a vector of functions which are identically 0. Testing these hypotheses $H_0^F$ of no distribution effects can be performed using the argument hypothesis="H0F" in the rankFD function. More details are provided in Section 2.4.
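As in linear models, hypothesis matrices for a two-way layout can be assembled from Kronecker products of a centering matrix and an averaging vector. The sketch below (plain Python; the helper names are hypothetical, and the construction is the common linear-model one, which we assume rather than quote from rankFD) builds matrices for the main effects and the interaction and applies them to a numeric vector that stands in for the cell quantities in the order (11, 12, 21, 22).

```python
from fractions import Fraction


def centering(n):
    """Centering matrix P_n = I_n - (1/n) J_n."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def averaging(n):
    """Averaging row vector (1/n) 1_n^T as a 1 x n matrix."""
    return [[Fraction(1, n)] * n]


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in A for row_b in B]


def apply(C, v):
    """Matrix-vector product C v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]


a, b = 2, 2
C_A = kron(centering(a), averaging(b))   # main effect A
C_B = kron(averaging(a), centering(b))   # main effect B
C_AB = kron(centering(a), centering(b))  # interaction AB

# Hypothetical additive cell values: no interaction, but both main effects.
cells = [Fraction(v) for v in (1, 2, 3, 4)]
effect_A = apply(C_A, cells)
effect_B = apply(C_B, cells)
effect_AB = apply(C_AB, cells)
print(effect_A, effect_B, effect_AB)
```

For the additive cell values, the interaction contrast vanishes while both main-effect contrasts are non-zero, which mirrors the reading of the three hypotheses in (7).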

Hypotheses formulated in terms of relative effects
In general, researchers may not be interested in detecting the somewhat abstract alternative $H_1^F: CF \neq 0$ that $H_0^F$ in (8) is not true, but instead they want to detect whether a tendency to smaller or larger values exists between treatment levels. In a one-way layout, for example, the latter corresponds to the testing problem
$$H_0^P: \psi_1 = \dots = \psi_a \quad \text{vs.} \quad H_1^P: \psi_i \neq \psi_j \ \text{for some } i \neq j, \tag{9}$$
formulated in terms of the relative effects $\psi_i$. Here, the symbol $H_0^P$ refers to the probabilities $\psi_i$ in (3). Remark: Of course, one can also state the hypothesis
$$H_0^P: \theta_1 = \dots = \theta_a, \tag{10}$$
but it must be kept in mind that the hypothesis (10) depends on the relative sample sizes $n_i/N$ in groups $i = 1, \dots, a$. Thus, the rejection region of such a test is not invariant, but it changes with the ratios $n_i/N$ of the sample sizes. In extreme cases, this might lead to surprising results when compared to the results obtained in designs with equal sample sizes. For details we refer to Brunner et al. (2020) and Brunner et al. (2019). The unweighted mean distribution is, however, one reference distribution of choice that helps in reducing the issues obtained with the weighted version. Whether the unweighted version is the "best" one cannot be answered or guaranteed in general (Zimmermann et al., 2022).
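In a two-way layout, for example, the hypotheses of no main effects or no interactions in terms of the relative effects $\psi_{ij} = \int G \, dF_{ij}$ are written as
$$H_0^P(A): \bar\psi_{1\cdot} = \dots = \bar\psi_{a\cdot}, \qquad H_0^P(B): \bar\psi_{\cdot 1} = \dots = \bar\psi_{\cdot b}, \qquad H_0^P(AB): \psi_{ij} - \bar\psi_{i\cdot} - \bar\psi_{\cdot j} + \bar\psi_{\cdot\cdot} = 0 \ \text{for all } i, j, \tag{11}$$
where $\bar\psi_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b}\psi_{ij}$, $\bar\psi_{\cdot j} = \frac{1}{a}\sum_{i=1}^{a}\psi_{ij}$, and $\bar\psi_{\cdot\cdot} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\psi_{ij}$. The matrix notation of these hypotheses is, analogously to (7) and (8),
$$H_0^P: C\psi = 0, \tag{12}$$
where $\psi$ denotes the vector of unweighted relative effects. For a detailed explanation of using matrix notation in factorial designs we refer to, e.g., Brunner et al. (2017) or Brunner et al. (2019), Sect. 5.2 and Sect. 8.7.1.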
In a similar way as in the one-way layout, the hypotheses involving the weighted relative effects $\theta_{ij}$ in the two-way layout can be stated by replacing $\psi_{ij}$, $\bar\psi_{i\cdot}$, and $\bar\psi_{\cdot j}$ in (11) with $\theta_{ij}$, $\bar\theta_{i\cdot}$, and $\bar\theta_{\cdot j}$, respectively. It may be noted, however, that (unlike in the one-way layout) in two- or higher-way layouts surprising results may already be obtained in case of moderately unequal sample sizes in simple shift-effect models. These basic models cannot be considered "extreme cases". This means that unequal sample sizes in two- or higher-way layouts constitute a serious challenge for rank tests, while this is not the case for pseudo-rank tests. For more details we refer to Brunner et al. (2019), Chapter 5 and Brunner et al. (2020), Section 4.
Note that $H_0^P$ in (12) neither implies variance homogeneity nor equal shapes of the distributions. In the case of two samples, this situation is also known as the nonparametric Behrens-Fisher problem (Fligner and Policello, 1981; Brunner and Munzel, 2000; Konietschke et al., 2012). In general, it is easier to estimate the covariance matrix of the empirical relative effects under the stronger null hypothesis $H_0^F$ than under $H_0^P$. Therefore, statements about the sampling distribution of test statistics based on ranks have traditionally been formulated under $H_0^F$, even though it is well-known that those test statistics can only detect alternatives of the form $H_1^P: C\theta \neq 0$ or $H_1^P: C\psi \neq 0$. Remark: The rankFD package implements the current state-of-the-art methods for testing $H_0^P$ (using ranks as well as pseudo-ranks) in general factorial designs (Konietschke et al., 2012; Brunner et al., 2017), and it allows for the computation of a wide range of nonparametric test statistics. It explicitly also includes the classical tests based on weighted relative effects $\theta_i$ (using ranks) and on unweighted relative effects $\psi_i$ (using pseudo-ranks). Both types of ranking procedures are included in rankFD. A reason for including the former tests is that it allows users to reproduce findings that have been obtained by other researchers using rank tests. Also, it offers the possibility to directly compare procedures, which may facilitate a transparent discussion in that regard.

Multiple comparisons
So far, both null hypotheses $H_0^F$ and $H_0^P$ have been written as global null hypotheses. If they get rejected, one may only conclude that some factor level differs from the others (at the corresponding significance level $\alpha$). However, it still remains unknown specifically which one differs. Therefore, testing global null hypotheses often does not answer the particular research question of interest to scientists applying statistical methods, namely the specific localization of those treatment groups that are "driving" the significant results. In order to accomplish this goal, testing linear contrasts using a $q \times a$ contrast matrix $C = (c_1, \dots, c_q)^\top$ in terms of multiple null hypotheses
$$H_0^{(\ell)}: c_\ell^\top \psi = 0, \quad \ell = 1, \dots, q,$$
is the key. Here, each row vector $c_\ell^\top$ describes one of $q$ different contrasts reflecting the researcher's particular question. For instance, in a one-way layout with $a = 4$ levels, many-to-one (Dunnett-type) (Dunnett, 1955) or all pairwise (Tukey-type) comparisons are performed with the contrast matrices
$$C_{\text{Dunnett}} = \begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}, \qquad C_{\text{Tukey}} = \begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{pmatrix}.$$
Note: the left matrix is the many-to-one (Dunnett-type) and the right matrix the all-pairwise (Tukey-type) contrast matrix. Which contrast to use depends on the respective research question of interest. Bretz et al. (2001) provide a broad overview of different contrast matrices, which are numerically available within the contrMat function of the multcomp package in R (Hothorn et al., 2008). In general factorial designs involving more than one factor, multiple comparisons in terms of means of the levels of the main effects are a meaningful and valuable asset of a fundamental data analysis. For instance, in a $2 \times 4$ two-way design, many-to-one comparisons to the control group ($j = 1$ of factor B) are expressed as
$$C = \tfrac{1}{2}\mathbf{1}_2^\top \otimes C_{\text{Dunnett}} = \frac{1}{2}\begin{pmatrix} -1 & 1 & 0 & 0 & -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 & -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 \end{pmatrix}.$$
The rankFD function implements a broad list of pre-defined contrasts as well as flexible options allowing for user-defined contrast matrices for making multiple comparisons of the levels of the main or interaction effects. We provide computational details in Section 2.4.
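For readers who want to see how such contrast matrices are generated for an arbitrary number of levels, here is a small sketch in plain Python. The function names are hypothetical; in R, the contrMat function of multcomp mentioned above serves this purpose, and rankFD's own pre-defined contrasts are not reproduced here.

```python
def dunnett_contrasts(a, control=0):
    """Many-to-one contrasts: every level is compared with the control level."""
    rows = []
    for level in range(a):
        if level == control:
            continue
        row = [0] * a
        row[control] = -1
        row[level] = 1
        rows.append(row)
    return rows


def tukey_contrasts(a):
    """All-pairwise contrasts: one row per pair (i, j) with i < j."""
    rows = []
    for i in range(a):
        for j in range(i + 1, a):
            row = [0] * a
            row[i] = -1
            row[j] = 1
            rows.append(row)
    return rows


dunnett = dunnett_contrasts(4)
tukey = tukey_contrasts(4)
for row in dunnett:
    print(row)
print(len(tukey), "pairwise contrasts")
```

Each row sums to zero, so a contrast applied to a vector of equal effects always yields zero, which is what makes the rows suitable for expressing local null hypotheses.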

Confidence intervals
To comply with the basic principle "no test without a confidence interval", the rankFD package also provides confidence intervals for the nonparametric quantities upon which the test is based. Two-sided $(1 - \alpha)$-confidence intervals for $\psi_i$ and $\theta = \psi_2 - \psi_1$ are obtained from the asymptotic distribution of the estimators $\hat\psi_i$ in (4) by
$$\hat\psi_i \pm \frac{z_{1-\alpha/2}}{\sqrt{N}} \, s_i, \tag{13}$$
where $z_{1-\alpha/2}$ denotes the $(1 - \alpha/2)$ quantile of the standard normal distribution. Here, the variance estimator $s_i^2$ is a quite involved linear combination of different quadratic forms obtained from different rankings of the observations $X_{ik}$. For details we refer to Brunner et al. (2019), Sect. 4.6.1.
The confidence intervals in (13) may suffer from poor coverage probability if $\psi_i$ is close to the limits 0 or 1 and, moreover, the limits of the confidence interval may exceed the boundaries 0 or 1. In this case, so-called range-preserving intervals can be obtained by using the logit transformation. The limits thus obtained are then "back-transformed" using the expit transformation. For details we refer to Brunner et al. (2019), Sect. 4.6.2.
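The range-preserving idea can be illustrated with a generic delta-method sketch in plain Python. This is an assumption-laden illustration, not the formula used inside rankFD: we take a point estimate and its standard error as given inputs (both numbers below are hypothetical), build the interval on the logit scale, and transform the limits back with the expit function.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_ci(estimate, std_error, alpha=0.05):
    """Plain normal-approximation interval; its limits may leave [0, 1]."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def logit_ci(estimate, std_error, alpha=0.05):
    """Interval built on the logit scale and back-transformed with expit."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Delta method: d/dp logit(p) = 1 / (p * (1 - p)).
    se_logit = std_error / (estimate * (1.0 - estimate))
    centre = logit(estimate)
    return expit(centre - z * se_logit), expit(centre + z * se_logit)


# Hypothetical estimate close to the upper boundary.
plain = normal_ci(0.95, 0.04)
preserved = logit_ci(0.95, 0.04)
print(plain)
print(preserved)
```

With these inputs, the upper limit of the plain interval exceeds 1, whereas both limits of the back-transformed interval stay strictly inside (0, 1) and the interval is asymmetric around the estimate.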
In rankFD, these confidence intervals are computed by the function rankFD() using the options CI.method = "normal" for the limits in (13) or CI.method = "logit" for the range-preserving confidence intervals obtained by the logit transformation. By default, rankFD() provides confidence intervals for both $\psi_i$ and $\theta_i$. Regarding the confidence intervals for $\theta_i$, the same remarks as in Sect. 2.2.2 apply. Furthermore, since the Wilcoxon-Mann-Whitney test (and related methods) use variance estimators that are only consistent under the respective null hypothesis $H_0^F$ formulated in terms of the distribution functions, the tests cannot be inverted into confidence intervals for $\psi_i$.

Test statistics
The rankFD package implements a broad class of different test statistics for testing the general null hypotheses $H_0^F: CF = 0$, $H_0^P: C\psi = 0$, and $H_0^P: C\theta = 0$, respectively. They include global test procedures (quadratic forms) and multiple contrast tests (linear statistics) for the analysis of data from general factorial designs, as well as methods specifically designed for the evaluation of two independent samples including the classical rank tests.
In the following, we will briefly explain these procedures. They are all based on the (asymptotic) distribution of standardized vectors of point estimators $\hat\theta = (\hat\theta_1, \dots, \hat\theta_d)^\top$ or $\hat\psi = (\hat\psi_1, \dots, \hat\psi_d)^\top$ of the weighted or unweighted relative effects as defined in (2) and (3), respectively. Since both of them denote the probabilities (appropriately weighted) that an observation from the joint sample is smaller than an observation from group $i$, estimators can be constructed using the (usual) ranks $R_{ik}$ or the so-called pseudo-ranks $R^{\psi}_{ik}$ (Happ et al., 2020). In rankFD, these point estimators are obtained by
• effect = "weighted": (scaled) mean of the ranks $R_{ik}$
• effect = "unweighted": (scaled) mean of the pseudo-ranks $R^{\psi}_{ik}$
For more details, we refer to Brunner et al. (2019, Section 2.3.2). Besides the vectors of point estimators $\hat\theta$ or $\hat\psi$, their (estimated) covariance matrices are needed for the computation of test statistics. In the general nonparametric setup considered here, we can take advantage of the type of hypothesis we aim to test. Assuming $H_0^F$ to hold, the covariance matrices of $\sqrt{N}(\hat\theta - \theta)$ and of $\sqrt{N}(\hat\psi - \psi)$ have (much) simpler structures than under $H_0^P$ (Konietschke et al., 2012). This property carries over to their estimation, and therefore the estimators used in the statistics for testing $H_0^F$ or $H_0^P$ are different. However, for ease of notation, we denote their estimators by $\hat V_N$ in a general way, having both versions in mind. In the following, we provide the statistics using $\hat\psi$ (and in turn the pseudo-ranks) for convenience only. For more details we refer to Brunner et al. (2020).

Global test procedures
In order to test the null hypothesis $H_0^F$ as given in (8), the rankFD package implements the Wald-type statistic
$$W_N(C) = N \, \hat\psi^\top C^\top \left[ C \hat V_N C^\top \right]^{+} C \hat\psi. \tag{14}$$
Here, the matrix $[A]^{+}$ denotes the Moore-Penrose inverse of the matrix $A$. Under the hypothesis $H_0^F$, the statistic $W_N(C)$ follows, for large sample sizes, a $\chi^2_w$-distribution with $w = \operatorname{rank}(C \hat V_N C^\top)$ degrees of freedom. Since the statistic involves the estimators and the known contrast matrix only, its numerical computation is feasible. However, very large sample sizes ($n_i \geq 50$; depending on the actual design) are necessary for an accurate type-I error rate control. Therefore, Akritas et al. (1997) and Brunner et al. (2017) propose the so-called ANOVA-type statistic
$$A_N(C) = \frac{N}{\operatorname{tr}(M \hat V_N)} \, \hat\psi^\top M \hat\psi, \qquad M = C^\top \left[ C C^\top \right]^{+} C,$$
and approximate its distribution by an F-distribution with $f_1$ and $f_2$ degrees of freedom (obtained via a Box-type approximation as derived by Brunner et al. (1997)). In comparison with the Wald-type statistic $W_N(C)$ in (14), the ANOVA-type statistic $A_N(C)$ controls the type-I error much better in small sample sizes ($n_i \geq 15$, depending on the design and hypothesis of interest). Moreover, the approximation of the distribution of $A_N(C)$ is also valid under the more general hypothesis $H_0^P$. We note that, basically, both statistics can also be computed using the ranks $R_{ik}$ instead of the pseudo-ranks $R^{\psi}_{ik}$. But the general remarks in Section 2.2.2 regarding the usual ranks $R_{ik}$ must be carefully considered. We also note that the asymptotic distribution of the Wald-type statistic $W_N(C)$ under the more general hypothesis $H_0^P$ is not the $\chi^2_w$-distribution with $w = \operatorname{rank}(C \hat V_N C^\top)$ in general. This would require an additional assumption on the sequence of the empirical covariance matrices $\hat V_N$ which cannot be verified in practice.
The preceding comments and discussion might appear somewhat difficult to understand, but they are necessary to explain the different options in the printout of rankFD. At this point, it becomes evident that the question "Does anybody know whether there is a nonparametric analog of ANOVA?" cannot be answered by some simple statements, and that the heuristic technique of replacing observations by their ranks and then performing an "ANOVA on the ranks" may lead to invalid procedures and incorrect conclusions in general.
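As a numerical illustration of an ANOVA-type quadratic form, the following sketch (plain Python, not rankFD code) treats the one-way hypothesis with the centering matrix as hypothesis matrix. We assume the commonly used form N times the quadratic form in M divided by the trace of M times the covariance estimate, with M the projection built from the hypothesis matrix; for the symmetric and idempotent centering matrix, M is the centering matrix itself. The effect estimates, the covariance matrix, and N below are hypothetical inputs, and the F-approximation with its degrees of freedom is not computed.

```python
from fractions import Fraction


def centering(a):
    """Centering matrix P_a = I_a - (1/a) J_a (symmetric and idempotent)."""
    return [[Fraction(int(i == j)) - Fraction(1, a) for j in range(a)]
            for i in range(a)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def quad_form(v, M):
    """v^T M v for a vector v and a square matrix M."""
    return sum(v[i] * M[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))


def anova_type_statistic(psi_hat, V, N):
    """One-way ANOVA-type statistic with M = P_a (assumed form, see lead-in)."""
    M = centering(len(psi_hat))
    return N * quad_form(psi_hat, M) / trace(matmul(M, V))


# Hypothetical inputs: three groups, diagonal covariance estimate, N = 30.
psi_hat = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
V = [[Fraction(1, 10) if i == j else Fraction(0) for j in range(3)]
     for i in range(3)]
ats = anova_type_statistic(psi_hat, V, 30)
print(ats)
```

The numerator only depends on the squared deviations of the effect estimates from their unweighted mean, which makes explicit that the statistic reacts to differences among the estimated relative effects.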

Multiple contrast test procedures
Both the Wald-type and ANOVA-type statistics are global tests, i.e., if the respective hypothesis $H_0^F$ or $H_0^P$ is rejected, the only available information is that at least some of the factor levels (or their combinations) differ at the pre-assigned significance level $\alpha$. The identification of the factor levels which are responsible for the difference is, however, often of major interest and a key research question. Local test decisions in terms of adjusted p-values and simultaneous confidence intervals are of primary importance and key elements of a complete data evaluation. These can be obtained using Multiple Contrast Test Procedures (MCTP) (Bretz et al., 2001; Hothorn et al., 2008; Konietschke et al., 2012), which are also known as max-t-test type procedures in parametric models (Konietschke et al., 2021). In order to test the local null hypothesis $H_0^{(\ell)}: c_\ell^\top \psi = 0$, we use the test statistic
$$T_\ell = \sqrt{N} \, \frac{c_\ell^\top \hat\psi}{\sqrt{c_\ell^\top \hat V_N c_\ell}}, \quad \ell = 1, \dots, q,$$
where the contrast vector $c_\ell$ reflects the researcher's particular question. Typical contrast vectors are discussed by Bretz et al. (2001).
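Since the statistics $T_\ell$ and $T_{\ell'}$ are not necessarily independent when $\ell \neq \ell'$, we collect them in the vector $T = (T_1, \dots, T_q)^\top$, which follows, asymptotically as $N \to \infty$, a multivariate normal distribution with expectation $0$ and correlation matrix $R$. Since $R$ is unknown, we replace it with the estimator $\hat R$ obtained from standardizing $C \hat V_N C^\top$, see Konietschke et al. (2012). For large sample sizes, we reject the individual null hypothesis $H_0^{(\ell)}: c_\ell^\top \psi = 0$ at multiple level $\alpha$ if $|T_\ell| \geq z_{1-\alpha}(\hat R)$, where $z_{1-\alpha}(\hat R)$ denotes the two-sided $(1-\alpha)$ equicoordinate quantile of the multivariate normal distribution with expectation $0$ and correlation matrix $\hat R$. Finally, the global null hypothesis $H_0^P$ (or $H_0^F$) is rejected if $\max_{\ell} |T_\ell| \geq z_{1-\alpha}(\hat R)$. For small sample sizes, Konietschke et al. (2012) suggest using t quantiles rather than normal quantiles, and the Fisher transformation for the computation of range-preserving confidence intervals. The rankFD function implements all of the different procedures.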
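The building blocks of a multiple contrast test can be sketched in a few lines of plain Python. The code below is illustrative only (hypothetical function name and hypothetical inputs, not rankFD internals): it assumes the studentized form of the contrast statistics, i.e. the square root of N times the estimated contrast divided by the square root of its estimated variance, and obtains the correlation matrix by standardizing the covariance matrix of the contrasts. The equicoordinate quantile requires multivariate normal probabilities and is deliberately left out.

```python
import math


def contrast_statistics(psi_hat, V, C, N):
    """Return (T, R): studentized contrast statistics and their correlations."""
    a, q = len(psi_hat), len(C)
    # Covariance matrix of the contrasts: S = C V C^T (q x q).
    CV = [[sum(C[l][k] * V[k][j] for k in range(a)) for j in range(a)]
          for l in range(q)]
    S = [[sum(CV[l][j] * C[m][j] for j in range(a)) for m in range(q)]
         for l in range(q)]
    estimates = [sum(C[l][j] * psi_hat[j] for j in range(a)) for l in range(q)]
    T = [math.sqrt(N) * estimates[l] / math.sqrt(S[l][l]) for l in range(q)]
    R = [[S[l][m] / math.sqrt(S[l][l] * S[m][m]) for m in range(q)]
         for l in range(q)]
    return T, R


# Hypothetical inputs: three groups, many-to-one contrasts against group 1.
psi_hat = [0.3, 0.5, 0.7]
V = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
C = [[-1, 1, 0], [-1, 0, 1]]
T, R = contrast_statistics(psi_hat, V, C, N=40)
max_abs_T = max(abs(t) for t in T)
print(T, R, max_abs_T)
```

The off-diagonal correlation of 0.5 arises because both contrasts share the control group, which is exactly the dependence that the joint multivariate normal approximation accounts for.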

Software and examples
In the following, we will analyze different data sets to illustrate the application of the implemented functions in rankFD. They differ in their complexity and cover two and several samples as well as a factorial design, respectively. We note that the wrapper function rankFD() determines the actual statistical design from the given formula argument. However, a few of the statistical methods are available for two independent samples only, and we therefore implemented the function rank.two.samples for their exclusive analysis. First, we will explain the syntax of the two functions and then illustrate their application using real data sets.
and Munzel, 2000) with t-approximation, normal uses the standard normal quantiles, and range-preserving confidence intervals are obtained by the logit or probit transformation functions (Pauly et al., 2016).
• permu indicates whether additional studentized permutation tests shall be computed (Janssen, 1999b; Neubert and Brunner, 2007; Pauly et al., 2016).
• alternative specifies whether two-sided or one-sided tests and confidence intervals are computed.

• wilcoxon
gives the option to compute additional Wilcoxon-Mann-Whitney tests for testing the equality of the two distributions H F 0 : F 1 = F 2 of the two samples.We use the coin package for these computations (Zeileis et al., 2008).Both the asymptotic as well as exact distribution of the test is available.
• shift.int can be used to compute a confidence interval for the shift effect (Hodges-Lehmann).
• nperm, conf.level, info and rounds are optional arguments specifying the number of permutations, the coverage probability, whether an explanation of the output is printed, and the number of decimals.
Applying the plot() function to a rank.two.samples object displays a plot of the confidence interval for θ.
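To make the two target quantities of the two-sample analysis concrete, the following base-R sketch computes a point estimate of the relative effect θ from the mid-ranks of the pooled sample and the Hodges-Lehmann estimate of the shift effect. It only illustrates the definitions and does not call rank.two.samples(); the data values are made up for the example.

```r
# Made-up observations for two independent groups (illustration only).
x1 <- c(2.1, 2.5, 2.5, 3.1, 2.3, 2.9, 2.5, 3.3, 2.7, 2.1)
x2 <- c(2.9, 3.5, 4.1, 2.5, 5.2, 3.3, 6.0, 4.4, 3.9, 7.1)
n1 <- length(x1)
n2 <- length(x2)

# Relative effect theta = P(X1 < X2) + 0.5 * P(X1 = X2), estimated from the
# mid-ranks of the pooled sample (mid-ranks take care of ties).
r <- rank(c(x1, x2))
theta_hat <- (mean(r[n1 + seq_len(n2)]) - (n2 + 1) / 2) / n1

# The same estimate obtained by counting over all pairs (x1_i, x2_j).
theta_pairs <- mean(outer(x1, x2, "<") + 0.5 * outer(x1, x2, "=="))

# Hodges-Lehmann estimate of the shift: median of all pairwise differences.
shift_hat <- median(outer(x2, x1, "-"))
```

An estimate of θ above 1/2 indicates that observations in the second group tend to be larger than those in the first group.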
• CI.method specifies the computational method of the confidence intervals, either using the normal approximation or the logit transformation function.
• effect defines the effect to be estimated. In particular, effect = "weighted" or effect = "unweighted" estimates the weighted or unweighted relative effect, respectively. As explained above, this choice leads to using either traditional ranks (weighted) or pseudo-ranks (unweighted).
• hypothesis defines the null hypothesis of interest (either $H_0^F$ or $H_0^P$, formulated in terms of distribution functions or relative effects, respectively).
• contrast is specified to perform multiple contrast tests. The argument must be given as a list() specifying the factor level and, optionally, the kind of contrast. The user can choose from a pre-implemented list of possible contrasts or supply a user-specific contrast matrix.
• sci.method defines the computational method of the simultaneous confidence intervals.
• Factor.Information is a logical argument indicating whether descriptive information (effect estimators, standard errors and confidence intervals) for each factor and interaction effect shall be displayed.
• info and rounds are optional arguments specifying whether an explanation of the output is printed and the number of decimals.
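The difference between the weighted and the unweighted relative effects selected by the effect argument can be sketched in base R. The code below is an illustration under stated assumptions, not the package's code: the data are made up, the helper norm_ecdf() is a hypothetical name, and pseudo-ranks are computed as N times the unweighted mean of the group-wise normalized empirical distribution functions plus 1/2.

```r
# Made-up unbalanced one-way layout (illustration only).
y <- c(1.2, 2.5, 3.1, 0.8, 2.5, 4.0, 3.7, 5.2, 4.4)
g <- factor(c("A", "A", "A", "A", "A", "B", "B", "C", "C"))
N <- length(y)

# Normalized empirical distribution function of one sample, evaluated at x
# (hypothetical helper, counts ties with weight 1/2).
norm_ecdf <- function(x, sample) {
  sapply(x, function(v) mean((sample < v) + 0.5 * (sample == v)))
}

# Pseudo-ranks: N * G(y) + 1/2, where G is the unweighted mean of the
# group-wise normalized empirical distribution functions.
F_by_group <- sapply(split(y, g), function(s) norm_ecdf(y, s))  # N x a matrix
pseudo_rank <- N * rowMeans(F_by_group) + 0.5
mid_rank <- rank(y)

# Relative effect estimates per group from ranks and from pseudo-ranks.
p_weighted <- (tapply(mid_rank, g, mean) - 0.5) / N
p_unweighted <- (tapply(pseudo_rank, g, mean) - 0.5) / N
```

In a balanced design the pseudo-ranks coincide with the ordinary mid-ranks, so both estimators agree; in the unbalanced layout above they differ because the larger group no longer dominates the reference distribution.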

Plot options:
In order to visualize the results of the analysis, the confidence intervals can be plotted using the generic plot() function applied to a rankFD object. In two-way and higher-way layouts, the user is asked to type the name of the main or interaction effect for which the confidence intervals should be drawn. All standard font, width and color arguments apply (lwd, pch, cex, etc.). Furthermore, the argument cex.ci sets the "cex" (a number indicating the amount by which plotting text and symbols should be scaled relative to the default) of the confidence interval limits.

Two independent samples
As an illustrative example, we use a part of the reaction time data provided by Shirley (1977). In this animal experiment, N = 40 mice were randomized to a = 4 dose groups (n = 10 animals per group).

A one-way factorial design
As an example of a one-way factorial design, we use the data set EEG that is included in the package MANOVA.RM (Friedrich et al., 2019a, 2021). The data set contains EEG measurements of 160 patients who were diagnosed with either Alzheimer's Disease (AD), mild cognitive impairments (MCI), or subjective cognitive complaints without clinically significant deficits (SCC), based on neuropsychological diagnostics (Bathke et al., 2018). For demonstration purposes, we restrict our analysis to the measurement of Hjorth complexity (which represents change in frequency) obtained at central electrode positions. The question of interest is whether this EEG value tends to be larger or smaller than the mean Mann-Whitney effect across the different diseases; therefore, the relative effects defined in (3) are used for the analysis.
The EEG data are analyzed using the function rankFD(). Here, we calculate confidence intervals with the logit approach and estimate the unweighted relative treatment effects to test the null hypothesis $H_0^P$. Moreover, we specify a multiple contrast test based on Tukey-type contrasts for the pairwise comparisons of the three diagnosis groups. The output consists of several parts: First, a brief description of the methods is given. B$Descriptive returns the sample sizes, the estimated relative effects as well as their standard errors and confidence intervals for the factor levels. B$Wald.Type.Statistic and B$ANOVA.Type.Statistic return the results of the Wald-type and ANOVA-type tests as described in Section 2.3, respectively. Since we specified our null hypothesis in terms of $H_0^P$, the Kruskal-Wallis test is not performed. Finally, the part B$MCTP contains the results of the multiple contrast test: the contrast matrix (Tukey-type) and the local test results $T_\ell$ as well as the global result $T_0$; in addition, the t-quantile and the corresponding degrees of freedom (Konietschke et al., 2012) are reported, see Section 2.3.2 for details. The significant difference between the diagnosis groups and the results of the post-hoc tests reveal that SCC patients differ significantly from the other two groups, see also Figure 2.
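The reason for preferring the logit approach for the confidence intervals can be illustrated with a few lines of base R. The sketch assumes a delta-method interval on the logit scale that is transformed back to the unit interval; the estimate and its standard error are made-up values, and the package may differ in details such as the quantiles used.

```r
# Made-up estimate of a relative effect and its standard error, chosen to
# lie close to the upper boundary 1 (illustration only).
p_hat <- 0.95
se <- 0.04
z <- qnorm(0.975)

# Plain normal approximation: the limits may leave the interval [0, 1].
ci_normal <- p_hat + c(-1, 1) * z * se

# Logit approach: build the interval on the logit scale using the delta
# method (standard error divided by p_hat * (1 - p_hat)), then transform
# the limits back with the inverse logit.
se_logit <- se / (p_hat * (1 - p_hat))
ci_logit <- plogis(qlogis(p_hat) + c(-1, 1) * z * se_logit)
```

With these values the upper limit of the plain normal interval exceeds 1, whereas the back-transformed interval stays inside (0, 1) by construction.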

A two-way factorial design
As an illustrative example of a two-way factorial design, we choose the Irritation of the Nasal Mucosa trial provided by Brunner et al. (2019, Chapter B.3.2) and included in the package. In this trial, the researchers investigated the damage caused by two gaseous substances (factor A) to the nasal mucous membrane of mice. Both substances were given in three different concentrations (1 ppm, 2 ppm and 5 ppm; factor B) to 25 mice each. The degree of irritation and damage was histopathologically assessed using an ordinal score ranging from 0 to 4 with 0 = "no irritation", 1 = "mild irritation", 2 = "strong irritation", 3 = "severe irritation" and 4 = "irreversible damage". The outcome is displayed in Figure 3. The code to analyze these data is similar to that provided above, but we additionally include an interaction term in the formula. In this example, we formulate the null hypothesis in terms of the distribution functions to show the R code for testing this hypothesis. Note that due to the balanced design, both weighted and unweighted estimators give the same results. The right plot in Figure 4 shows that the relative effects increase at a similar rate in both levels of the factor substance, suggesting no qualitative interaction between substance and concentration.

Summary
The rankFD package implements current state-of-the-art rank methods for nonparametric inference in general factorial designs with independent observations. It comprises functions for computing various test statistics for testing null hypotheses formulated either in distribution functions or in relative effects using ranks or pseudo-ranks, respectively. Up until now, no other software package for testing null hypotheses in relative effects in general factorial designs has existed. Besides global procedures (Wald-type and ANOVA-type statistics) using quadratic forms, rankFD implements multiple contrast tests and simultaneous confidence intervals for relative effects. The possibility of testing contrasts between the main and interaction effects makes rankFD a powerful tool for the application of nonparametric methods in data analysis and a useful addition to nparcomp (Konietschke et al., 2015). Besides the inference methods discussed above, rankFD furthermore implements formulas for computing sample sizes using the functions WMWSSP() and noether() (Happ et al., 2019). Since these methods apply to two independent samples only, we did not discuss them in the present manuscript.
We designed the package and its functions to be similar to the well-known R functions lm() and aov() for the analysis of linear models and the glht() function of the multcomp package for the computation of multiple contrast tests in means. Both rankFD and multcomp use the mvtnorm package (Genz et al., 2021) for the computation of critical values. However, as explained in detail in the Introduction, the effect measures used in multcomp and mvtnorm are different from those used in rankFD. In plain terms, this means that the parametric and nonparametric methods are not directly comparable. We plan to update rankFD frequently with novel procedures. For instance, various international research groups are currently investigating rank-based methods for the analysis of clustered data (see also the package clusrank (Jiang et al., 2020) for the analysis of two samples), sample size planning, as well as analysis of covariance methods. We plan to add these methods in the future. The package rankFD is available online on CRAN.

Figure 1: Boxplots (left) and 95%-confidence interval (right) for the relative effect of the reaction time data.

Figure 4: Local 95%-confidence intervals for the (relative) main and interaction effects of the nasal mucosa irritation data.