In this paper, we describe the R package rTableICC that provides an interface for random generation of 2
Random generation of contingency tables is essential for simulation
studies conducted over categorical data. The main characteristic of a
contingency table is determined by the assumed sampling plan and the
correlation structure between categorical variables constituting the
table. There are three main sampling plans: Poisson, multinomial, and
product multinomial. In the Poisson plan, each cell is independently
Poisson distributed and there is no restriction on the total sample
size. In the multinomial plan, total sample size is fixed while row and
column totals are not fixed. When one of the margins of the table is
fixed and the rest are set free, we have a product multinomial plan
(Bishop, S. Fienberg, and P. Holland 1975; Agresti 2002). If both margins are naturally fixed, the
sampling plan becomes hypergeometric, which is seldom used in practice
(Agresti 2002). There are numerous ways in R to generate contingency
tables of various dimensions. The function r2dtable()
in the base
package stats generates random two-way tables with given marginals
using Patefield’s algorithm under product-multinomial sampling
(Patefield 1981). Alternatively, one can generate a random contingency
table over log-linear models with a predetermined association structure.
However, there is no package in R for random generation of
2
It is highly possible to have intraclass correlations (ICCs) in surveys
conducted over sampling units with more than one observation unit if
these units are correlated. Familial data also include ICCs. In a public
health survey, if data are collected over families, intraclass
correlations arise due to the within family dependence. Presence of
intraclass correlations can invalidate results of classical categorical
models or chi-square tests (Demirhan 2013). Therefore, use and further
developments of methods specific to the cases with ICCs are essential.
In the literature, Cohen (1976) and Altham (1976) introduced categorical
analyzes under the presence of ICCs. (Borkowf 2000) proposed an ICC
statistic for contingency tables with the empirical multivariate
quantile-partitioned distributions. Nandram and J. Choi (2006) proposed Bayesian
analysis of R
Monte Carlo simulation studies are essential in the development of new
statistical methods to handle ICCs. However, there is neither a Monte
Carlo approach nor an R package to implement random generation of
contingency tables under intraclass-correlated individuals. In this
article, we propose a simple approach for the generation of
2
The package rTableICC includes four functions for random generation of
2
In the following sections, the approach for the generation of random tables in the presence of ICCs is described, details of data generation processes under considered sampling plans are mentioned, input and output structures of rTableICC are demonstrated, and use of the package is illustrated by several examples. We also provide a performance analysis regarding the mean running times of the functions in the package rTableICC. Then, we conclude with a brief summary.
Altham (1976) introduced two probabilities to deal with ICCs over an
R
and
where
We utilize equations ((1)) and ((2)) to incorporate
ICCs into the data generation process. We work over clusters to generate
data. For all sampling plans, the total sample size either entered or
obtained over randomly generated data is distributed across the
clusters. Then, for the clusters with only one individual, because there
is no ICC affecting the individual, we randomly assign it to one of the
cells of the table taking the input vector of cell probabilities into
account,
Algorithm 1.
Input
Set
Generate all possible compositions of order
Write generated compositions to an
For each composition
Normalize the series of probabilities,
Randomly select one of the compositions based on the probability function obtained at step 6.
Write selected composition to an
If
In Algorithm 1, compositions
from the
package partitions
(Hankin 2006) to generate all possible compositions at the step 3 of
Algorithm 1. Each composition represents one of the possible allocations
of individuals in a cluster into target cells. For example, let us have
4 cells to distribute 5 individuals in a cluster. We run the following
code to get the
> N <- t(compositions(5, 4, include.zero = TRUE))
The resulting output looks like
[1,] 5 0 0 0 [2,] 4 1 0 0 [3,] 3 2 0 0 [4,] 2 3 0 0 ...
The vector
The package rTableICC consists of four main functions:
rTableICC.RxC
, rTableICC.2x2xK
, rTable.RxC
and rTable.2x2xK
; and
an auxiliary function rDiscrete
, which is also suitable for use
individually. In the general functioning of the package, first, main
inputs are checked by an initial layer according to the presence of ICCs
and used sampling plan; and then the related function is called. In
addition to general checks, specific checks are done by the related
function itself. Below, we describe the processing of each function
after the general check.
The function rTableICC.RxC
is called to generate an RrTableICC.RxC
.
Algorithm 2.
Input sampling plan,
If sampling plan is multinomial goto step 3, product-multinomial goto step 7, and Poisson goto step 15;
If any of inputs
Distribute rmultinom(1, N, rep(1/M,
M))
;
If the maximum number of individuals in one of the clusters is greater than the maximum allowed cluster size then stop;
Employ Algorithm 1 with joint probabilities for all clusters and goto step 21;
If any of inputs
Determine the fixed margin according to input parameters
col.margin
or row.margin
and set
Calculate conditional probabilities regarding the fixed margin;
If conditional probabilities calculated over entered row margins and
Distribute individuals in the
If the maximum number of individuals in one of the clusters is greater than the maximum allowed cluster size then stop;
Employ Algorithm 1 with calculated conditional probabilities for all
clusters and set
If
If input
Generate number of individuals in each cell by
rpois(R * C, t(lambda))
;
Calculate cell probabilities and total number of individuals
Distribute rmultinom(1, N, rep(1/M,
M))
;
If the maximum number of individuals in one of the clusters is greater than the maximum allowed cluster size then stop, else goto step 20;
Employ Algorithm 1 with probabilities calculated at step 17 for all clusters;
Calculate desired output forms of generated table.
Suitability checks at steps 3, 7, and 15 are made on minimum and maximum
values and dimensions of input vectors. Because the total sample size,
which is entered by the user for multinomial sampling, randomly
generated for Poisson sampling, and entered as a fixed row (column)
margin for product-multinomial sampling, is randomly distributed into
the clusters, it is coincidentally possible to have clusters with more
individuals than the allowed maximum cluster size. In this case, the
following error message is generated:
Maximum number of individuals in one of the clusters is 14, which is
greater than maximum allowed cluster size. (1) Re-run the function, (2)
increase maximum allowed cluster size by increasing the number of
elements of theta, (3) increase total number of clusters, or (4)
decrease total number of individuals!
and execution is stopped at steps 5, 12, and 19 of Algorithm 2.
For the product-multinomial sampling, suppose that row totals are fixed
and
where
Let rTableICC.RxC
include two arrays in addition to the generated table.
The first one,
The function rTableICC.2x2xK
is called to generate a rTableICC.2x2xK
. We assume that we have rTableICC.2x2xK
generates a
Algorithm 3.
Input sampling plan,
If sampling plan is multinomial goto step 3, product-multinomial goto step 9, and Poisson goto step 16;
If any of inputs
Distribute rmultinom(1, N,
rep(1/sum(num.cluster), sum(num.cluster)))
and store the results
in a
If the maximum number of individuals in one of the clusters is
greater than the maximum allowed cluster size then stop, else set
Scale joint probabilities of the
Employ Algorithm 1 with scaled joint probabilities for all clusters
of center
If
If any of inputs
Calculate conditional probabilities regarding the fixed centers and
set
Scale conditional probabilities of step 10 under the
Distribute individuals in the rmultinom(1,
N[k], rep(1/num.cluster[k], num.cluster[k]))
;
If the maximum number of individuals in one of the clusters is greater than the maximum allowed cluster size then stop;
Employ Algorithm 1 with scaled conditional probabilities for all
clusters of center
If
If input
Generate number of individuals in each cluster by
rpois(num.cluster[k], lambda[k])
;
Calculate total number of individuals
Scale joint probabilities of the
If the maximum number of individuals in one of the clusters is greater than the maximum allowed cluster size then stop;
Employ Algorithm 1 with probabilities calculated at step 19 for all clusters;
Calculate desired output forms of generated table.
Suitability checks at steps 3, 9, and 16 are made on minimum and maximum
values and dimensions of input vectors. For the incompatibility between
generated and allowed maximum cluster sizes, the same situation as the
RrTableICC.2x2xK
proceeds over each center.
For product-multinomial sampling plan, suppose that center totals are
denoted by
where
Arrays rTableICC.2x2xK
. Here,
The function rTable.RxC
is used to generate an RrTable.RxC
is described in Algorithm 4.
Algorithm 4.
Input sampling plan,
If sampling plan is multinomial goto step 3, product-multinomial goto step 5, and Poisson goto step 11;
If any of inputs
Distribute rmultinom(1, N, pi)
and goto step 12;
If any of inputs
Determine the fixed margin according to input parameters
col.margin
or row.margin
and set
Calculate conditional probabilities regarding the fixed margin;
If conditional probabilities calculated over entered row margins and
Distribute individuals in the
If
If input
Generate number of individuals in each cell by
rpois(R * C, t(lambda))
;
Calculate desired output forms of generated table.
Suitability checks at steps 3, 5, and 11 are made on minimum and maximum
values and dimensions of input vectors. For the product-multinomial
sampling plan, the multinomial form in equation ((3)) is used.
Raw data corresponding to each individual are also generated among
outputs of rTable.RxC
.
The function rTable.2x2xK
is employed to generate a rTable.2x2xK
is described in Algorithm 5. Assume that we have rTableICC.2x2xK
, rTable.2x2xK
generates a
Algorithm 5.
Input sampling plan,
If sampling plan is multinomial goto step 3, product-multinomial goto step 5, and Poisson goto step 10;
If any of inputs
Distribute rmultinom(1, N, pi)
and goto step 12;
If any of inputs
Calculate conditional probabilities for center
Scale conditional probabilities of step 6 under the
Distribute individuals in the
If
If input
Generate number of individuals in each cell of rpois(2 * 2 * K, lambda)
;
Calculate desired output forms of generated table.
Suitability checks at steps 3, 9, and 16 are made on minimum and maximum
values and dimensions of input vectors.The multinomial form in equation
((4)) is used for product-multinomial sampling plan. It is
possible to enter a mean number of individuals for each cell under
Poisson sampling plan at step 11 of Algorithm 5 by entering an array for
lambda
. Raw data corresponding to each individual are also generated
among outputs of rTable.2X2XK
.
The function rDiscrete
is used to generate a random value from an
empirical probability distribution. This function is called by both
rTableICC.RxC
and rTableICC.2x2xK
. Implementation of rDiscrete
is
explained by Algorithm 6.
Algorithm 6.
Input empirical probability function (pf) with
Check whether input probabilities sum to one and number of
observations
Calculate cumulative distribution function (cdf),
Set
Generate a random value
If
If
Return the generated values.
rDiscrete
returns an array of generated values and calculated cdf at
step 3 of Algorithm 6.
To generate random R
> library(rTableICC)
Then, the relevant function is called with proper inputs.
In the first example, we illustrate two important cases that generate
errors and stop execution of functions rTableICC.RxC
and
rTableICC.2x2xK
. In the second and third examples, we demonstrate
outputs of rTableICC.2x2xK
and rTableICC.RxC
. In the fourth example,
we exemplify rTable.RxC
, rTable.2x2xK
, and rDiscrete
functions.
In this example, we illustrate two incompatibilities between generated
and allowed maximum cluster sizes and total number of individuals and
number of clusters for functions rTableICC.RxC
and rTableICC.2x2xK
.
When a user enters the value of intraclass correlation for each cluster
size, the maximum allowed cluster size is correspondingly defined.
However, because rTableICC.RxC
and rTableICC.2x2xK
distribute total
sample size, which is entered or generated, among the given number of
clusters, we would have clusters with number of individuals greater than
the maximum allowed cluster size. This case should be regarded while
entering the values of intraclass correlations, total or mean number of
individuals, and total number of clusters.
The following code attempts to generate a max.cluster.size
) is defined to specify the size
of array including ICCs. In this setting, it is highly possible to
allocate more than 4 individuals in one of the clusters.
> num.centers <- 3
> sampl <- "Multinomial"
> max.cluster.size <- 4
> num.cluster <- 25
> num.obs <- 500
> ICCs <- array(0.1, dim = max.cluster.size)
> ICCs[1] <- 0
> cell.prob <- array(1/12, dim = c(num.centers, 4))
> x <- rTableICC.2x2xK(p = cell.prob, theta = ICCs, M = num.cluster, sampling = sampl,
+ N = num.obs)
When 500 individuals are distributed across 25 clusters, the maximum
cluster size is realized as max.cluster.size
, as expected. Then,
execution is stopped with the following error message:
Error in rtableICC.2x2xK.main(p, theta, M, sampling, N, lambda, print.regular, :
Maximum number of individuals in one of the clusters is 14, which is greater
than maximum allowed cluster size.
(1) Re-run the function,
(2) increase maximum allowed cluster size by increasing the number of
elements of theta,
(3) increase total number of clusters, or
(4) decrease total number of individuals!
Now, we change the settings to eliminate the error. rTableICC.2x2xK
generates the desired table when the total number of observations is
decreased to 50, the total number of clusters is increased to 250, or
the maximum cluster size is increased to 15 with the same inputs for the
rest of the arguments.
User should ensure compatibility between the number of individuals and
the total number of clusters. When we run the code given above with
num.obs <- 50
and zero.clusters <- FALSE
, rTableICC.2x2xK
tries to
distribute 50 individuals to 75 clusters; and hence, the following error
message is generated:
Error in rtableICC.2x2xK.main(p, theta, M, sampling, N, lambda, zero.clusters, :
Because number of individuals is less than the total number of clusters, it is
impossible to allocate an individual to each cluster! Set zero.clusters = TRUE
and re-run the function.
The problem is eliminated when zero.clusters
is set to TRUE
.
In this example, the output structure of rTableICC.2x2xK
is
illustrated. We run the code in Example 1 with num.centers <- 2
,
num.obs <- 50
, and zero.clusters <- TRUE
and call print(x)
. The
following part presents the summary information on the data generation
process.
Call:
rTableICC.2x2xK.default(p = cell.prob, theta = ICCs, M = num.cluster,
sampling = sampl, N = num.obs, zero.clusters = TRUE, print.regular = TRUE,
print.raw = FALSE)
Process summary:
----------------
100 observations in 2 centers were successfully generated under Multinomial
sampling plan! Number of clusters for each center is as the following:
25 for Center 1
25 for Center 2
17 clusters include no individual.
21 clusters include one individual.
12 clusters include more than one individual.
Because the multinomial distribution is used to distribute the total sample size across the clusters, there are some clusters with no individuals, as reported in the process summary. Because probabilities used to represent intraclass correlations in equations ((1)) and ((2)) change according to cluster size, we report the number of clusters containing one and more than one individuals in the process summary.
The following part of the output includes
The number of t sized clusters in the set of clusters in which all individuals fall
in cell (j,k) for j,k=1,2:
g.t =
, , Cluster of size 2
C- 1 C- 2
Center- 1 R- 1 0 2
Center- 1 R- 2 1 2
Center- 2 R- 1 1 0
Center- 2 R- 2 0 1
, , Cluster of size 3
C- 1 C- 2
Center- 1 R- 1 1 1
Center- 1 R- 2 0 0
Center- 2 R- 1 0 1
Center- 2 R- 2 0 0
, , Cluster of size 4
C- 1 C- 2
Center- 1 R- 1 0 0
Center- 1 R- 2 0 0
Center- 2 R- 1 0 0
Center- 2 R- 2 0 1
The number of clusters of size t outside the set of clusters in which all individuals
fall in a single cell: g.tilde = ( 0 1 0 )
Generated random table in two dimensions :
R1C1 R1C2 R2C1 R2C2
Center- 1 4 10 7 7
Center- 2 3 5 5 9
Generated random table in three dimensions :
, , Center- 1
C- 1 C- 2
R- 1 4 10
R- 2 7 7
, , Center- 2
C- 1 C- 2
R- 1 3 5
R- 2 5 9
To illustrate the output raw data format, we run the following code:
> num.centers <- 3
> num.cluster <- 5
> num.obs <- 10
> ICCs <- array(0.1, dim = 4)
> ICCs[1] <- 0
> cell.prob <- array(1/12, dim = c(num.centers, 4))
> x <- rTableICC.2x2xK(p = cell.prob, theta = ICCs, M = num.cluster,
+ sampling = "Multinomial", N = num.obs)
The resulting raw data output given below is printed as a three
dimensional array. The first dimension includes observations, the second
dimension has
Generated random table in raw data format =
, , 1
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 0 0 0 0 0
[2,] 0 0 0 0 0 0
[3,] 1 0 0 0 0 0
...
[10,] 0 0 0 0 0 0
, , 2
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 1 0 0 0 0 0
[3,] 0 0 0 0 0 0
...
[10,] 0 0 0 0 1 0
The output structure of rTableICC.RxC
is similar to that of
rTableICC.2x2xK
. We run the following code to generate a
> num.cluster <- 12
> ICCs <- array(0.1, dim = 9)
> ICCs[1] <- 0
> num.obs <- 24
> zeros <- FALSE
> sampl <- "Product"
> row <- c(12, 12)
> cell.prob <- array(0, dim = c(2, 3))
> cell.prob[1, 1:2] <- 0.2
> cell.prob[1, 3] <- 0.1
> cell.prob[2, 1:2] <- 0.1
> cell.prob[2, 3] <- 0.3
> y <- rTableICC.RxC(p = cell.prob, theta = ICCs, row.margins = row, M = num.cluster,
+ sampling = sampl, zero.clusters = zeros, print.regular = TRUE,
+ print.raw = FALSE)
> print(y)
In the output of rTableICC.RxC
, first the following summary table is
generated. Coincidentally, there is no cluster with more than one
individual. Clusters are enforced to contain at least one individual.
Call:
rTableICC.RxC.default(p = cell.prob, theta = ICCs, M = num.cluster,
row.margins = row, sampling = sampl, zero.clusters = zeros,
print.regular = TRUE, print.raw = FALSE)
Process summary:
----------------
24 observations in 12 12 clusters were successfully generated under Product
multinomial sampling plan!
Each cluster includes at least one individual.
12 clusters include one individual.
0 clusters include more than one individual.
In the output, the vector
The number of t sized clusters in the set of clusters in which all individuals fall in
cell (j,k) for j=1,...,R and k=1,...,C: g.t =
, , Cluster of size 2
C- 1 C- 2 C- 3
R- 1 0 0 0
R- 2 0 0 0
...
, , Cluster of size 9
C- 1 C- 2 C- 3
R- 1 0 0 0
R- 2 0 0 0
The number of clusters of size t outside the set of clusters in which all individuals
fall in a single cell: g.tilde = ( 0 0 0 0 0 0 0 0 )
Generated random table in row format = ( 5 4 3 3 3 6 )
Generated random table in RxC format =
C- 1 C- 2 C- 3
R- 1 5 4 3
R- 2 3 3 6
In this example, we run a couple of codes to illustrate random
contingency table generation without ICCs. Besides, we show outputs of
the function rDiscrete
.
The following code generates and prints a random
> num.row <- 5
> num.col <- 7
> sampl <- "Multinomial"
> cell.prob <- array(1/35, dim = c(num.row, num.col))
> num.obs <- 124
> x <- rTable.RxC(p = cell.prob, sampling = sampl, N = num.obs)
> print(x)
The corresponding output of rTable.RxC
is as follows. After a brief
summary, the generated table is printed.
Call:
rTable.RxC.default(p = cell.prob, sampling = sampl, N = num.obs)
Process summary:
----------------
124 observations across 5 rows and 7 columns were successfully generated under
Multinomial sampling plan!
Generated random table in RxC format =
C- 1 C- 2 C- 3 C- 4 C- 5 C- 6 C- 7
R- 1 4 2 3 4 5 4 4
R- 2 4 5 5 5 5 5 4
R- 3 4 1 5 3 3 2 3
R- 4 2 1 1 6 4 3 3
R- 5 2 1 4 5 7 2 3
The following code is run to randomly generate a
> num.centers <- 3
> sampl <- "Poisson"
> cell.mean <- array(3, dim = c(2, 2, num.centers))
> z <- rTable.2x2xK(sampling = sampl, lambda = cell.mean)
> print(z)
Consequently, 31 observations were generated under 3 centers.
Call:
rTable.2x2xK.default(sampling = sampl, lambda = cell.mean)
Process summary:
----------------
31 observations in 3 centers were successfully generated under Poisson sampling plan!
Generated random table in 2x2xK format =
, , Center- 1
C- 1 C- 2
R- 1 1 0
R- 2 2 4
, , Center- 2
C- 1 C- 2
R- 1 6 3
R- 2 2 4
, , Center- 3
C- 1 C- 2
R- 1 2 3
R- 2 2 2
To generate random values from an empirical probability function, we
call rDiscrete
. We run the following code to generate two random
values from a given probability function:
> p <- c(0.23, 0.11, 0.05, 0.03, 0.31, 0.03, 0.22, 0.02)
> rDiscrete(n = 2, pf = p)
Consequently, the generated random values and corresponding cdf are printed.
$rDiscrete
[1] 1 7
$cdf
[1] 0.00 0.23 0.34 0.39 0.42 0.73 0.76 0.98 1.00
The package rtableICC is intended to be used in combinaion with other
code that implements Monte Carlo simulation. Therefore, the
computational performance of rtableICC is of importance. We
investigate running times of functions in rtableICC under various
combinations of table structure, sample size, and sampling plan.
Tables 1 and 2 show test conditions of each function
of rtableICC related with rTableICC.2x2xK
and
rTableICC.RxC
functions, the argument zero.clusters
was set to
TRUE
to allow clusters with no individuals. Note that when
zero.clusters
is set to FALSE
, we get shorter mean running times.
All the combinations were run on a MAC-Pro computer equipped with 6
Intel(R) Xenon(R) CPU E5-1650 v2 at 3.5GHz, 16 GB of RAM, and Windows
8.1 operating system.
Plan | N. of Observations | N. of Centers | N. of Clusters | Cell Mean | Center Margins | |
---|---|---|---|---|---|---|
rTableICC.2x2xK |
Mult. | — | — | |||
rTable.2x2xK |
Mult. | — | — | — | ||
rTableICC.2x2xK |
Poi. | — | — | |||
rTable.2x2xK |
Poi. | — | — | — | ||
rTableICC.2x2xK |
Pro. | — | — | |||
rTable.2x2xK |
Pro. | — | — | — |
N: Number; Mult: Multinomial; Poi: Poisson; Pro: Product Multinomial; Plan: Sampling plan.
Plan | N. of Obs. | N. of Rows | N. of Columns | N. of Clusters | Cell Mean | Row Margins | |
---|---|---|---|---|---|---|---|
rTableICC.RxC |
Mult. | — | — | ||||
rTable.RxC |
Mult. | — | — | — | |||
rTableICC.RxC |
Poi. | — | — | ||||
rTable.RxC |
Poi. | — | — | — | |||
rTableICC.RxC |
Pro. | — | — | ||||
rTable.RxC |
Pro. | — | — | — |
N: Number; Mult: Multinomial; Poi: Poisson; Pro: Product Multinomial; Obs: Observations; Plan: Sampling plan.
For multinomial, Poisson, and product multinomial sampling plans,
scatter plots representing the mean running times of rTableICC.2x2xK
according to some of the considered factors are given in
Figure 1. Due to the small variances within repetitions, plots
are drawn only for the mean running times.
|
(a) |
|
(b) |
|
(c) |
rTableICC.2x2xK
function under multinomial, Poisson, and product multinomial sampling
plans. Panels (a) and (c) represent mean running time versus number of
observations colored according to number of centers for the multinomial
and product multinomial sampling plans, respectively. Panel (c)
represents mean running time versus mean number of observations in each
cell colored according to number of clusters for the Poisson sampling
plan.
For the multinomial sampling plan, the scatter plot of mean implementation time versus number of observations colored according to number of clusters is very similar to the one given in panel (a) of Figure 1. For the Poisson sampling plan, the scatter plot of mean running time versus mean number of observations in each cell colored according to number of centers is very similar to the one given in panel (b) of Figure 1. For the product multinomial sampling plan, the scatter plot of mean running time versus fixed row totals colored according to number of centers is very similar to the one given in panel (c) of Figure 1. Therefore, these plots are omitted here.
Under the multinomial sampling plan, the mean running time for
rTableICC.2x2xK
is equally affected by number of clusters and number
of centers. The number of observations has the primary effect on mean
running time. We have long mean running times even for small number of
clusters or number of centers if the number of observations is large.
Smaller mean running times with high number of centers were recorded for
small number of clusters and vice versa. Due to high running times in a
small portion of test combinations, the overall distribution of times is
right-skewed. The overall median of mean running times is 0.589 seconds
with overall median variance of 0.002 and rTableICC.2x2xK
increases along with the mean number of observations in each cell. We
have high running times for greater number of clusters. The same case is
also seen for greater number of centers. The mean number of observations
in each cell is the dominant factor on implementation time. The overall
distribution of mean running times is right-skewed. The overall median
of mean running times is 1.109 seconds with overall median variance of
0.016 and rTableICC.2x2xK
increases with increasing number of observations in each fixed margin.
Also, we have longer running times for both greater number of centers
and number of clusters. Rarely, it is also possible to have long running
times for a moderate number of clusters or a moderate number of centers.
The number of observations in the fixed margins has the primary effect
on the mean running time. The overall distribution of mean running times
is highly right-skewed due to the outlier value seen in panel (c) of
Figure 1. The overall median of mean running times is 0.528
seconds with overall median variance of 0.002 and
When the function rTable.2x2xK
was run under the multinomial sampling
plan with corresponding test combinations given in Table 1,
all of the mean running times were less than rTable.2x2xK
; and
hence, no plots are provided for the mean running times of
rTable.2x2xK
. It is possible to record higher running times with a
greater number of observations or number of centers. However, setting
these parameters to such large values is unreasonable. For the Poisson
sampling plan, the maximum mean implementation time over all of the
corresponding test combinations in Table 1 is 0.013 seconds.
The effect of the number of centers on running time is unobservable. The
overall median of mean running times is less than
For the function rTableICC.RxC
, plots of mean implementation time
versus number of observations and number of clusters colored by number
of rows under multinomial, Poisson, and product multinomial samplings
are given in Figure 2. Corresponding plots colored by number
of columns are very similar to those seen in Figure 2; hence,
they are omitted here. For the multinomial sampling plan, mean running
times are severely affected by both increasing number of observations
and increasing number of rows. However, this is not seen for an
increasing number of clusters. We have long mean running times for
moderate and small number of clusters. Number of rows (columns) and
number of observations are mainly impactful on the running time of
rTableICC.RxC
under the multinomial sampling plan. For the multinomial
sampling plan, the overall average of mean running times is 21.312
seconds with median variance of 0.015. The overall median of mean
running times is 0.515 seconds, their distribution is highly
right-skewed, and
|
|
(a) | (b) |
|
|
(c) | (d) |
|
|
(e) | (f) |
rTableICC.RxC
function under considered sampling plans. Panel (a) shows mean running
time versus number of observations colored by number of rows for the
multinomial sampling plan. Panel (c) shows mean running time versus mean
number of observations in each cell colored by number of rows for the
Poisson sampling plan. Panel (e) shows mean running time versus fixed
row counts colored by number of rows for the product multinomial
sampling plan. Panels (b), (d), and (f) represent mean running time
versus number of clusters colored by number of rows for the multinomial,
Poisson, and product multinomial sampling plans,
respectively.
For the function rTable.RxC
, we have similar results than for
rTable.2x2xK
. Under multinomial, Poisson, and Product multinomial
sampling plans, the overall averages of mean running times are 0.00007,
0.001, and 0.001 with overall median variances less than rTable.RxC
.
In conclusion, the performance of the functions generating tables
without ICC is better than those generating tables with ICCs. Running
times of both rTable.2x2xK
and rTable.RxC
are not notably affected
by the values of their arguments and short enough to be used in
combination with other Monte Carlo simulation algorithms. Running times
of both rTableICC.2x2xK
and rTableICC.RxC
are severely affected by
the process carried out by the function compositions
of the package
partitions. Therefore, their running times are sensitive to inputs
and, in general, affected by the total number of individuals to be
generated. If generation of a table with a very large total number of
individuals is intended, a smaller number of individuals can be
generated by a proper scaling on the number of individuals in each cell.
In this article, we introduced the R package rTableICC to generate
2rTableICC.2x2xK
and rTableICC.RxC
for the generation of tables under the presence of intraclass
correlations between individuals. Also, we described the function
rDiscrete
for random number generation from empirical probability
functions. We provided detailed algorithms working behind the functions
and illustrated use and input-output structures of functions in
rTableICC by numerical examples. Then, we conducted a detailed
performance analysis over mean running times of functions
rTableICC.2x2xK
, rTable.2x2xK
, rTable.RxC
, and rTableICC.RxC
. In
the performance analysis, we obtained very short running times for the
functions rTable.2x2xK
and rTable.RxC
, and reasonable running times
for the functions rTableICC.2x2xK
and rTableICC.RxC
.
As a limitation, when there is ICCs between individuals and the number
of rows or columns is greater than 5, functions rTableICC.2x2xK
and
rTableICC.RxC
may require long running times based on the total number
of individuals to be generated. The cause of this situation is the
execution time required by the compositions
function of the package
partitions. To overcome this limitation, we are planning to decrease
complexity of some inner loops of both rTableICC.2x2xK
and
rTableICC.RxC
functions in forthcoming versions of rTableICC.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Demirhan, "rTableICC: An R Package for Random Generation of 2x2xK and RxC Contingency Tables", The R Journal, 2016
BibTeX citation
@article{RJ-2016-004, author = {Demirhan, Haydar}, title = {rTableICC: An R Package for Random Generation of 2x2xK and RxC Contingency Tables}, journal = {The R Journal}, year = {2016}, note = {https://rjournal.github.io/}, volume = {8}, issue = {1}, issn = {2073-4859}, pages = {48-63} }