Ranked set sampling (RSS) is an advanced data collection method for situations where the exact measurement of an observation is difficult and/or expensive, and it is used in a number of research areas, e.g., environment, bioinformatics, and ecology. In this method, random sets are drawn from a population and the units in each set are ranked with a ranking mechanism based on visual inspection or a concomitant variable. Because of the importance of working with a good design and easy analysis, there is a need for a software tool which provides sampling designs and statistical inferences based on RSS and its modifications. This paper introduces an R package as a free and easy-to-use analysis tool for both sampling processes and statistical inferences based on RSS and its modified versions. For researchers, the RSSampling package provides a sample with RSS, extreme RSS, median RSS, percentile RSS, balanced groups RSS, double versions of RSS, L-RSS, truncation-based RSS, and robust extreme RSS under both perfect and imperfect judgment rankings. Researchers can also use this new package to make parametric inferences for the population mean and the variance where the sample is obtained via classical RSS. Moreover, this package includes applications of nonparametric methods, namely the one sample sign test, Mann-Whitney-Wilcoxon test, and Wilcoxon signed-rank test procedures. The package is available as RSSampling on CRAN.
Data collection is a crucial part of all types of scientific research. Ranked set sampling (RSS) is one of the advanced data collection methods, which provides representative sample data by using the ranking information of the sample units. It was first proposed by (McIntyre 1952), and the term "ranked set sampling" was introduced in the study of (Halls and Dell 1966) on the estimation of forage yields in a pine hardwood forest. (Takahasi and Wakimoto 1968) theoretically studied the efficiency of the mean estimator based on RSS, which is unbiased for the population mean. They found that its variance is always smaller than the variance of the mean estimator based on simple random sampling (SRS) with the same sample size when the ranking is perfect. Some other results on the efficiency of RSS can be found in (Dell and Clutter 1972), (David and Levine 1972), and (Stokes 1980b). (Stokes 1977) studied the use of concomitant variables for ranking the sample units in the RSS procedure and found that the ranking procedure was allowed to be imperfect. In another study, she constructed the estimator for the population variance in the presence of ranking error (Stokes 1980a). For some examples and results on regression estimation based on RSS, see (Yu and Lam 1997) and (Chen 2001). The estimation of a distribution function with various settings of RSS can be found in (Stokes and Sager 1988), (Kvam and Samaniego 1993), and (Chen 2000). Other results on distribution-free test procedures based on RSS can be found in (Bohn and Wolfe 1992, 1994), and (Hettmansperger 1995). Additional results for inferential procedures based on RSS can be found in the recent works of (Zamanzade and Vock 2015), (Zhang et al. 2016), and (Ozturk 2018). For more details on RSS, we refer to the review papers by (Kaur et al. 1995), (Chen et al. 2003), and (Wolfe 2012).
The RSS method and its modified versions have come into prominence recently due to their efficiency, and therefore new software tools or packages for quick evaluation are required. A free software tool called Visual Sample Plan (VSP), created by Pacific Northwest National Laboratory, offers many sampling designs, including the classical RSS method, for developing environmental sampling plans under balanced and unbalanced cases. It provides the calculation of the required sample size and cost information with the location to be sampled. Also, the R package NSM3 by (Schneider 2015) has two functions related to the classical RSS method. It only provides Monte Carlo samples and computes a statistic for a nonparametric procedure. Both VSP and the NSM3 package include only the classical RSS method as a sampling procedure and provide limited methods for inference. Therefore, no available software package offers extensive support for sampling and statistical inference using both classical and modified RSS methods. In this study, we propose a pioneering package, named RSSampling, for sampling procedures based on the classical RSS and the modified RSS methods in both perfect and imperfect ranking cases. Also, the package provides the estimation of the mean and the variance of the population and allows the use of the one sample sign, Mann-Whitney-Wilcoxon, and Wilcoxon signed-rank test procedures under classical RSS. The organization of the paper is as follows: in the following section, we give some brief information about classical RSS and modified RSS methods. Then, we introduce the details of the RSSampling package and give some illustrative examples with a real data analysis. In the last section, we give the conclusion of the study.
RSS and its modifications are advanced sampling methods using the rank information of the sample units. The ranking of the units can be done by visual inspection of a human expert or a concomitant variable. The procedure for the RSS method is as follows:
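Select m units at random from the population.
Rank these m units by judgment or by a concomitant variable, without measuring the variable of interest.
Keep the smallest judged unit from the ranked set.
Select a second set of m units at random, rank them, and keep the second smallest judged unit.
Continue the process until m ranked units have been selected, one from each of m sets.
The first five steps are referred to as a cycle. Then, the cycle is repeated r times, giving a ranked set sample of size n = mr.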
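The procedure above can be sketched in a few lines of base R. This is a minimal illustration under perfect ranking, not the package's implementation: `rss_sketch` is a hypothetical helper, and the names `m` (set size) and `r` (number of cycles) are chosen to match the arguments used later in the paper.

```r
# Base-R sketch of classical RSS under perfect ranking (illustrative only;
# rss_sketch is a hypothetical helper, not a function of RSSampling).
rss_sketch <- function(x, m, r) {
  sample_x <- matrix(NA_real_, nrow = r, ncol = m)
  for (j in seq_len(r)) {               # each pass of the outer loop is one cycle
    for (i in seq_len(m)) {
      set_i <- sample(x, m)             # draw a random set of m units
      sample_x[j, i] <- sort(set_i)[i]  # keep the i-th smallest unit of the i-th set
    }
  }
  sample_x
}

set.seed(1)
population <- rnorm(10000, mean = 10, sd = 5)
s <- rss_sketch(population, m = 4, r = 2)
dim(s)   # r cycles by m ranks, so n = m * r measured units
mean(s)  # RSS estimate of the population mean
```

Each cycle draws m * m units but measures only m of them, which is where the saving in measurement cost comes from.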
RSS design obtains more representative samples and gives more precise estimates of the population parameters relative to SRS (EPA 2012). The main difference between the RSS method and the other modified methods is the selection procedure of the sample units from the ranked sets. For example, (Samawi et al. 1996) suggested extreme RSS using the minimum or maximum units from each ranked set. (Muttlak 1997) introduced median RSS using only median units of the random sets. (Jemain et al. 2008) suggested balanced groups RSS which is defined as the combination of extreme RSS and median RSS. For additional examples of modified methods, see (Muttlak 2003b), (Al-Saleh and Al-Kadiri 2000), and for robust methods see, (Al-Nasser 2007), (Al-Omari and Raqab 2013), and (Al-Nasser and Mustafa 2009). In literature, the studies for modified RSS methods are generally interested in obtaining a sample more easily or making a more robust estimation for a population parameter. Such studies are made for the investigation of properties (for example, bias and mean squared error) of a proposed estimator and they have generally focused on the comparisons of SRS and RSS methods. Note that the true comparisons of the modified RSS methods to the others are difficult to present in general terms. Because the advantages of the sampling methods, when compared to each other, may vary according to the situations such as the parameter to be estimated, underlying distribution, the presence of ranking error, etc. For more detailed information on the modifications of RSS, see (Al-Omari and Bouza 2014) and references therein. In the following, the modified RSS methods which are considered in RSSampling are introduced.
Extreme RSS (ERSS) is the first modification of RSS suggested by
(Samawi et al. 1996) to estimate the population mean only using the minimum or
maximum ranked units from each set. The procedure for ERSS can be
described as follows: select
Median RSS (MRSS) was suggested by (Muttlak 1997). In this method, only
median units of the random sets are chosen as the sample for estimation
of the population mean. For the odd set sizes, the
(Muttlak 2003b) suggested another modification for the RSS, percentile RSS
(PRSS), where only the upper and lower percentiles of the random sets
are chosen as the sample for selected value of
Balanced groups RSS (BGRSS) can be defined as the combination of ERSS
and MRSS. (Jemain et al. 2008) suggested using BGRSS for estimating the
population mean with a special sample size
(Al-Saleh and Al-Kadiri 2000) introduced another modification of RSS, namely double
RSS (DRSS), as the beginning of a multistage procedure. Several researchers
also extended the DRSS method to modified versions such as double
extreme RSS (DERSS) by (Samawi 2002), double median RSS (DMRSS)
by (Samawi and Tawalbeh 2002), and double percentile RSS (DPRSS) by
(Jemain and Al-Omari 2006). The DRSS procedure is described as follows:
L-RSS, which is a robust RSS procedure, is based on the idea of the L
statistic and was introduced by (Al-Nasser 2007) as a generalization
of different types of RSS methods. The first step of the L-RSS procedure is
selecting
When
The truncation-based RSS (TBRSS) was presented by (Al-Omari and Raqab 2013).
This procedure can be summarized as follows: select randomly
Note that when
The robust extreme RSS (RERSS) scheme was introduced by (Al-Nasser and Mustafa 2009).
This method can be described as follows: identify
If
The package RSSampling is available on CRAN and can be installed and loaded via the following commands:
> install.packages("RSSampling")
> library("RSSampling")
The package depends on the stats package and uses a function from the non-standard package LearnBayes (Albert 2018) for random data generation in the Examples section. The proposed package consists of two main parts which are the functions for sampling methods described in Table 1 and the functions for inference procedures described in Table 2 based on RSS. The sampling part of the package includes perfect and imperfect rankings with a concomitant variable allowing researchers to sample with classical RSS and the modified versions. The functions for inference procedures provide estimation for parameters and some hypothesis testing procedures based on RSS.
In this part, we introduce a core function, called rankedsets, to obtain s ranked sets consisting of randomly chosen sample units with the set size m. The rankedsets function can also be used for studies based on other modified RSS methods which are not mentioned in this paper.
Function | Description |
rss | Performs classical RSS method |
Mrss | Performs modified RSS methods (MRSS, ERSS, PRSS, BGRSS) |
Rrss | Performs robust RSS methods (L-RSS, TBRSS, RERSS) |
Drss | Performs double RSS methods (DRSS, DMRSS, DERSS, DPRSS) |
con.rss | Performs classical RSS method by using a concomitant variable |
con.Mrss | Performs modified RSS methods (MRSS, ERSS, PRSS, BGRSS) by using a concomitant variable |
con.Rrss | Performs robust RSS methods (L-RSS, TBRSS, RERSS) by using a concomitant variable |
obsno.Mrss | Determines the observation numbers of the units which will be chosen for the sample for classical and modified RSS methods by using a concomitant variable |
The function rss provides the ranked set sample with perfect ranking from a specific data set. Setting sets = TRUE (default sets = FALSE) also returns the randomly chosen ranked sets with the set size m. The function Mrss provides a sample from MRSS, ERSS, PRSS, and BGRSS, which are represented by "m", "e", "p", and "bg", respectively. The type = "r", defined as the default, represents the classical RSS. For the sampling procedure PRSS, there is an additional parameter p which defines the percentile. We note that, when p = 0.25 in PRSS, one can obtain a sample with quartile RSS given by (Muttlak 2003a). The function Rrss provides samples from the L-RSS, TBRSS, and RERSS methods, which are represented by "l", "tb", and "re", respectively. The parameter alpha is the common parameter for these methods and defines the cutting value. The Drss function is for the double versions of RSS, MRSS, ERSS, and PRSS under perfect ranking. type = "d" is defined as the default, which represents the double RSS. The values "dm", "de", and "dp" are defined for the DMRSS, DERSS, and DPRSS methods, respectively.
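To make the effect of the type argument more concrete, the following base-R sketch shows the selection rule behind median RSS for an odd set size, where the middle-ranked unit is kept from every set. It is an illustration under perfect ranking only: `mrss_sketch` is a hypothetical helper and not the code behind Mrss, and the even set size case is deliberately left out.

```r
# Hypothetical helper (not part of RSSampling): median RSS for an odd set size m.
mrss_sketch <- function(x, m, r) {
  stopifnot(m %% 2 == 1)                 # this sketch covers odd set sizes only
  med_rank <- (m + 1) / 2                # rank of the median unit within a set
  sample_x <- matrix(NA_real_, nrow = r, ncol = m)
  for (j in seq_len(r)) {
    for (i in seq_len(m)) {
      set_i <- sample(x, m)              # draw a random set of m units
      sample_x[j, i] <- sort(set_i)[med_rank]  # keep the median of every set
    }
  }
  sample_x
}

set.seed(2)
population <- rexp(10000)
mrss_sketch(population, m = 3, r = 5)
```

Compared with classical RSS, only the rank that is kept changes; the way the sets are drawn and ranked stays the same.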
In the literature, most of the theoretical inferences and numerical
studies are conducted based on perfect ranking. However, in real life
applications, the ranking process is done with an expert judgment or a
concomitant variable. Let us consider RSS with a concomitant variable
The functions con.rss, con.Mrss, and con.Rrss provide methods to obtain a sample under imperfect ranking. With the con.rss function, a researcher can obtain a classical ranked set sample from a specific data set using a concomitant variable. The functions con.Mrss and con.Rrss have similar usage to the con.rss function except for the selection method, which is defined by the type parameter. Also, these functions are simply extensions of Mrss and Rrss for concomitant variable cases.
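Ranking with a concomitant variable can be sketched in base R as follows. The sets are ordered by the concomitant values only, and the variable of interest is then read off for the unit judged to hold the required rank. `con_rss_sketch` is a hypothetical helper written for this illustration and is not the code behind con.rss.

```r
# Hypothetical helper: classical RSS where each set is ranked by a concomitant y.
con_rss_sketch <- function(x, y, m, r) {
  stopifnot(length(x) == length(y))
  sample_x <- matrix(NA_real_, nrow = r, ncol = m)
  for (j in seq_len(r)) {
    for (i in seq_len(m)) {
      idx <- sample(length(x), m)     # indices of a random set of m units
      ranked <- idx[order(y[idx])]    # rank the set using y only
      sample_x[j, i] <- x[ranked[i]]  # measure x on the unit judged i-th smallest
    }
  }
  sample_x
}

set.seed(1)
y <- rnorm(5000)
x <- 2 * y + rnorm(5000, sd = 0.5)    # x is correlated with y, so the ranking is informative
con_rss_sketch(x, y, m = 4, r = 2)
```

When x and y are perfectly correlated the sketch reduces to perfect ranking; as the correlation weakens, the judged ranks of x become less reliable.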
In real-world research, the values of the variable of interest are unknown before sampling and are measured only on the units selected for the sample. The obsno.Mrss function provides the code for this kind of application when researchers prefer to use RSS methods. After determining the sampling frame and the concomitant variable to be used for ranking, the code provides the numbers of the units to be selected according to the values of the concomitant variable. Then, the researcher can easily obtain the observation numbers of the units which will be chosen for the sample. type = "r" is defined as the default, which represents the classical RSS. MRSS, ERSS, PRSS, and BGRSS are represented by "m", "e", "p", and "bg", respectively.
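The idea of returning observation numbers instead of measured values can be sketched in base R as below, for classical RSS only. `obsno_sketch` is a hypothetical helper, not the code behind obsno.Mrss, and its row and column labels merely imitate the layout used in the examples of this paper.

```r
# Hypothetical helper: observation numbers to be measured under classical RSS,
# determined from the concomitant variable y alone.
obsno_sketch <- function(y, m, r) {
  obs <- matrix(NA_integer_, nrow = r, ncol = m,
                dimnames = list(paste("r =", seq_len(r)),
                                paste("m =", seq_len(m))))
  for (j in seq_len(r)) {
    for (i in seq_len(m)) {
      idx <- sample(length(y), m)         # a random set of m frame positions
      obs[j, i] <- idx[order(y[idx])][i]  # position of the unit judged i-th smallest
    }
  }
  obs
}

set.seed(5)
y <- rexp(10000)
obsno_sketch(y, m = 3, r = 5)
```

Only the units at the returned positions need to be measured for the variable of interest.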
Statistical inference refers to the process of drawing conclusions about, and obtaining information on, the population of interest. Researchers are generally interested in fundamental inferences for parameters such as the mean and variance. The RSSampling package provides an easy way to estimate the parameters of the population of interest and to use some distribution-free tests, namely the sign, Mann-Whitney-Wilcoxon, and Wilcoxon signed-rank tests, for nonparametric inference when the sampling procedure is RSS.
Function | Description |
meanRSS | Performs mean estimation and hypothesis testing with classical RSS method |
varRSS | Performs variance estimation with classical RSS method |
regRSS | Performs regression estimation for the mean of the population of interest with classical RSS method |
sign1testrss | Performs one sample sign test with classical RSS method |
mwwutestrss | Performs Mann-Whitney-Wilcoxon test with classical RSS method |
wsrtestrss | Performs Wilcoxon signed-rank test with classical RSS method |
The meanRSS function provides point estimation, confidence interval estimation, and asymptotic hypothesis testing for the population mean based on RSS; see (Chen et al. 2003). For the variance estimation based on RSS, we define the varRSS function, whose type parameter takes two values: "Stokes" and "Montip". (Stokes 1980a) proved that the estimator of variance is asymptotically unbiased regardless of the presence of ranking error. For the "Montip" type estimation, (Tiensuwan and Sarikavanij 2003) showed that there is no unbiased estimator of variance for one cycle, but they proposed an unbiased estimator of variance for more than one cycle. With the regRSS function, a regression estimator for the mean of the population of interest can be obtained based on RSS. The regression coefficient ("B" in the regRSS function) is calculated under the assumption of a known population mean for the concomitant variable.
Finally, for nonparametric inference, the sign1testrss, mwwutestrss, and wsrtestrss functions implement, respectively, the sign test, the Mann-Whitney-Wilcoxon test, and the Wilcoxon signed-rank test based on RSS. The normal approximation is used to construct the test statistics and approximate confidence intervals. For detailed information on these test methods, see the book of (Chen et al. 2003).
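As a rough illustration of the point estimators discussed above, the sketch below computes the RSS mean and a Stokes-type variance estimate from an r by m sample matrix. It assumes the commonly cited form of the estimator of (Stokes 1980a), the sum of squared deviations from the RSS mean divided by mr - 1; whether varRSS computes exactly this quantity is not confirmed here, and both function names are hypothetical.

```r
# s is an r x m matrix: rows are cycles, columns are judgment ranks.
# Both helpers are hypothetical and only illustrate the formulas.
rss_mean <- function(s) {
  mean(s)                          # average of all m * r measured units
}

stokes_var <- function(s) {
  n <- length(s)                   # n = m * r
  sum((s - mean(s))^2) / (n - 1)   # assumed Stokes-type estimator
}

s <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # toy sample with r = 2, m = 3
rss_mean(s)
stokes_var(s)
```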
In this section, we present examples illustrating the RSSampling package.
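This example shows the process of obtaining a sample with the TBRSS method for the variable of interest by using a concomitant variable, where both variables are generated from a multivariate normal distribution. The set size m is 4 and the cycle size r is 2. The ranked sets are obtained with con.Rrss. Thus, the resultant sample for the variable of interest is returned in $sample.x.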
##Loading packages
library("RSSampling")
library("LearnBayes")
## Imperfect ranking example for interested (X) and concomitant (Y) variables
## from multivariate normal dist.
set.seed(1)
mu <- c(10, 8)
variance <- c(5, 3)
a <- matrix(c(1, 0.9, 0.9, 1), 2, 2)
v <- diag(variance)
Sigma <- v%*%a%*%v
x <- rmnorm(10000, mu, Sigma)
xx <- as.numeric(x[,1])
xy <- as.numeric(x[,2])
## Selecting a truncation-based ranked set sample
con.Rrss(xx, xy, m = 4, r = 2, type = "tb", sets = TRUE, concomitant = FALSE,
         alpha = 0.25)
$corr.coef
[1] 0.9040095
$var.of.interest
[,1] [,2] [,3] [,4]
[1,] 12.332134 13.116611 15.675967 21.72312
[2,] 11.350275 8.846237 10.164005 17.07950
[3,] 4.143757 9.608573 8.708221 11.57671
[4,] 2.284106 9.535388 12.709489 14.11595
[5,] 3.212739 8.089833 11.430411 14.53190
[6,] 6.556222 12.759335 13.210037 11.02219
[7,] 3.337564 -0.864634 12.800243 13.47315
[8,] 5.988893 8.850680 13.208956 15.82731
$concomitant.var.
[,1] [,2] [,3] [,4]
[1,] 8.034720 10.398398 11.800919 13.754743
[2,] 8.003575 8.118947 11.136804 12.149531
[3,] 4.733177 7.377396 8.866563 11.658837
[4,] 4.027061 8.008146 9.977435 10.912382
[5,] 3.909958 6.220087 7.564130 8.739562
[6,] 5.893001 8.760754 10.067927 10.244593
[7,] 2.119661 2.813413 10.651769 10.775596
[8,] 5.406154 7.722866 8.602551 10.874853
$sample.x
m = 1 m = 2 m = 3 m = 4
r = 1 12.332134 8.846237 8.708221 14.11595
r = 2 3.212739 12.759335 12.800243 15.82731
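The relation between the ranked sets and the final sample in the output above can be checked directly. In this particular output, each row of the sample is the diagonal of the corresponding 4 x 4 block of ranked sets of the variable of interest. The sketch below re-enters the printed values by hand; it is an observation about these printed numbers only, not a description of how con.Rrss selects units for other values of m and alpha.

```r
# Ranked sets of the variable of interest, copied from the output above
ranked_x <- matrix(c(
  12.332134, 13.116611, 15.675967, 21.72312,
  11.350275,  8.846237, 10.164005, 17.07950,
   4.143757,  9.608573,  8.708221, 11.57671,
   2.284106,  9.535388, 12.709489, 14.11595,
   3.212739,  8.089833, 11.430411, 14.53190,
   6.556222, 12.759335, 13.210037, 11.02219,
   3.337564, -0.864634, 12.800243, 13.47315,
   5.988893,  8.850680, 13.208956, 15.82731
), ncol = 4, byrow = TRUE)

m <- 4
r <- 2
# For each cycle, take the diagonal of its m x m block of ranked sets
sample_x <- t(sapply(seq_len(r), function(j) {
  rows <- ((j - 1) * m + 1):(j * m)
  diag(ranked_x[rows, ])
}))
sample_x
```

The columns of the ranked sets are ordered by the concomitant variable, which is why the values of the variable of interest within a row are not always increasing.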
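Random determination of the sample units is an important task for practitioners. The function obsno.Mrss is for practitioners who have the frame of the population with an unknown variable of interest and a known concomitant variable. In the following example, the observation numbers for MRSS are obtained with set size m = 3 and cycle size r = 5.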
## Loading packages
library("RSSampling")
## Generating concomitant variable (Y) from exponential dist.
set.seed(5)
y = rexp(10000)
## Determining the observation numbers of the units which are chosen to sample
obsno.Mrss(y, m = 3, r = 5, type = "m")
m = 1 m = 2 m = 3
r = 1 "Obs. 2452" "Obs. 6417" "Obs. 3227"
r = 2 "Obs. 9094" "Obs. 1805" "Obs. 9877"
r = 3 "Obs. 1333" "Obs. 9252" "Obs. 3219"
r = 4 "Obs. 6397" "Obs. 7038" "Obs. 5019"
r = 5 "Obs. 446" "Obs. 9663" "Obs. 10"
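In order to illustrate the usage of the package, we give a simulation study with 10,000 repetitions for mean estimation of the variable of interest with set size m = 5 and cycle size r = 10, assuming that the variable of interest and the concomitant variable follow a multivariate normal distribution. The correlation between them is varied from 0 to 0.9 in steps of 0.1, and the MSE of the RSS mean estimator is plotted against the correlation level for 0.1 to 0.9.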
## Loading packages
library("RSSampling")
library("LearnBayes")
## Imperfect ranking example for interested (X) and concomitant (Y) variables
## from multivariate normal dist.
mu <- c(10, 8)
variance <- c(5, 3)
rho = seq(0, 0.9, 0.1)
se.x = mse.x = numeric()
repeatsize = 10000
for (i in seq_along(rho)) {
  set.seed(1)
  ## Correlation matrix for the current correlation level
  a <- matrix(c(1, rho[i], rho[i], 1), 2, 2)
  v <- diag(variance)
  Sigma <- v %*% a %*% v
  x <- rmnorm(10000, mu, Sigma)
  xx <- as.numeric(x[, 1])
  xy <- as.numeric(x[, 2])
  for (j in 1:repeatsize) {
    set.seed(j)
    samplex = con.Mrss(xx, xy, m = 5, r = 10, type = "r", sets = FALSE,
                       concomitant = FALSE)$sample.x
    ## Squared error of the RSS mean estimate in repetition j
    se.x[j] = (mean(samplex) - mu[1])^2
  }
  mse.x[i] = sum(se.x) / repeatsize
}
plot(rho[-1], mse.x[-1], type = "o", lwd = 2,
main = "MSE values based on increasing correlation levels",
xlab = "corr.coef.", ylab = "MSE", cex = 1.5, xaxt = "n")
axis(1, at = seq(0.1, 0.9, by = 0.1))
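In this real data example, we used the abalone data set, which is freely
available at
https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data.
The data consist of 4177 observations on nine variables: sex, length, diameter, height, whole weight, shucked weight, viscera weight, shell weight, and rings.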
## The URL is built in two pieces so that the string is not broken across lines
abalone_url <- paste0("https://archive.ics.uci.edu/ml/machine-learning-databases",
                      "/abalone/abalone.data")
abaloneData <- read.csv(url(abalone_url), header = FALSE,
  col.names = c("sex", "length", "diameter", "height", "whole.weight",
                "shucked.weight", "viscera.weight", "shell.weight", "rings"))
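Suppose that we aim to estimate the mean viscera weight and to test whether it equals 0.18, using whole weight as the concomitant variable for ranking. The two variables are highly correlated:
cor(abaloneData$viscera.weight, abaloneData$whole.weight)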
[1] 0.9663751
set.seed(50)
sampleRSS = con.rss(abaloneData$viscera.weight, abaloneData$whole.weight, m = 5, r = 5,
                    sets = TRUE, concomitant = FALSE)$sample.x
meanRSS(sampleRSS, m = 5, r = 5, alpha = 0.05, alternative = "two.sided", mu_0 = 0.18)
$mean
[1] 0.17826
$CI
[1] 0.1293705 0.2271495
$z.test
[1] -0.06975604
$p.value
[1] 0.9443878
varRSS(sampleRSS, m = 5, r = 5, type = "Stokes")
[1] 0.0135364
[1] 0.0135364
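The results from our sample data indicate that the estimated mean and the variance are 0.17826 and 0.0135364, respectively. Since the p.value (0.9443878) is greater than 0.05, there is no strong evidence to reject the null hypothesis that the population mean is 0.18.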
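The printed test statistic and p-value can be cross-checked from the printed mean and confidence interval alone. The sketch below assumes that the interval is the usual normal-approximation interval, mean plus or minus the 0.975 normal quantile times the standard error, which is consistent with the normal approximation described earlier; the variable names are illustrative.

```r
# Values copied from the meanRSS output above
mean_hat <- 0.17826
ci <- c(0.1293705, 0.2271495)
mu_0 <- 0.18

# Back out the standard error from the width of the 95% interval
se_hat <- diff(ci) / (2 * qnorm(0.975))

# Two-sided z test of H0: mu = mu_0
z <- (mean_hat - mu_0) / se_hat
p_value <- 2 * pnorm(-abs(z))
c(z = z, p.value = p_value)
```

Under this assumption the recomputed values agree with the printed z.test and p.value up to the rounding of the printed inputs.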
RSS is an efficient data collection method compared to SRS, especially in situations where the measurement of a unit is expensive but the ranking is less costly. In this study, we propose a package which obtains samples from RSS and its modifications and provides functions for some inferential procedures based on RSS. We create a set of functions for sampling under both perfect and imperfect rankings with a concomitant variable. For the inferential procedures, we consider the mean, variance, and regression estimators, and the sign, Mann-Whitney-Wilcoxon, and Wilcoxon signed-rank tests as distribution-free tests. The proposed functions in the package are illustrated with examples, and an analysis of a real data set is given. Future improvements of the package may be provided by adding new inference procedures based on RSS methods.
The authors thank two anonymous referees and the associate editor for their helpful comments and suggestions which improved the presentation of the paper. This study is supported by the Scientific and Technological Research Council of Turkey (TUBITAK-COST Grant No. 115F300) under ISCH COST Action IS1304.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Sevinc, et al., "RSSampling: A Pioneering Package for Ranked Set Sampling", The R Journal, 2019
BibTeX citation
@article{RJ-2019-039, author = {Sevinc, Busra and Cetintav, Bekir and Esemen, Melek and Gurler, Selma}, title = {RSSampling: A Pioneering Package for Ranked Set Sampling }, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {1}, issn = {2073-4859}, pages = {401-415} }