CryptRndTest: An R package for testing the cryptographic randomness

In this article, we introduce the R package CryptRndTest, which performs eight statistical randomness tests on cryptographic random number sequences. The purpose of the package is to provide software for the implementation of recently proposed cryptographic randomness tests utilizing goodness-of-fit tests superior to the usual chi-square test in terms of statistical performance. Most of the tests included in CryptRndTest are not conducted by available software such as the R package RDieHarder or the C library TestU01. Chi-square, Anderson-Darling, Kolmogorov-Smirnov, and Jarque-Bera goodness-of-fit procedures are applied along with cryptographic randomness tests. CryptRndTest utilizes multiple-precision floating-point numbers for sequences longer than 64 bits through the package Rmpfr. In this way, the included tests are applied precisely for higher bit-lengths. CryptRndTest provides a user-friendly interface for eight existing and recently proposed cryptographic randomness tests. As an illustrative application, CryptRndTest is used to test the random number generators available in R.


Introduction
Cryptographic random numbers constitute the heart of ciphering processes. The security of transmitted information is mostly based on the quality of the random numbers used to cipher the information. For efficiency reasons, pseudo random numbers that ensure some hard-to-achieve properties are used for ciphering in practice. There is a considerable number of pseudo random number generators (RNGs) in the literature of cryptography. The suitability of these RNGs for use in cryptographic applications is evaluated by using statistical randomness tests that are specifically designed to test randomness at the level required for ciphering processes.
In a cryptographic randomness test, first, the empirical distribution of a test statistic is obtained over a random number sequence by various data manipulations. Then, a statistical goodness-of-fit test is applied to evaluate the significance of the difference between the empirical distribution and its theoretical counterpart at a predetermined level of significance. The need for a certain level of randomness to ensure unpredictability in the cryptography context makes the procedures used to check cryptographic and classical randomness different from each other. The manipulations of random number sequences are required to make the cryptographic randomness tests more sensitive to small deviations from exact randomness than their classical counterparts. The null hypothesis of the test is "$H_0$: Sequences generated by the RNG of interest are random." There are more than a hundred alternative tests for the evaluation of cryptographic randomness (L'Ecuyer and Hellekalek, 1998).
In the literature, some of these tests are grouped into test batteries or test suites (L'Ecuyer and Simard, 2007; Marsaglia and Tsang, 2002). A detailed review of test batteries is given by Demirhan and Bitirim (2015a). To be qualified as suitable, an RNG should be identified as random in a predetermined portion or all of the tests in a test battery. The basic test battery was introduced by Knuth (1998, 1981, 1969). Then, Marsaglia (1996) introduced the Diehard test battery composed of 12 randomness tests. The disadvantages of the Diehard test battery were overcome by another test battery called Dieharder, which includes 26 cryptographic randomness tests. It is an improvement of the Diehard battery and provides a user-friendly interface and a useful open source toolset for users of random numbers. The Dieharder test battery is implemented by the R package RDieHarder. At the time of writing, Windows and OS X binaries are not available for this package. The US National Institute of Standards and Technology developed the NIST battery composed of 16 tests (Sýs and Říha, 2014; Rukhin et al., 2010; Rukhin, 2001; Soto, 1999). The NIST battery is still used as a straightforward tool for formal certifications and is accepted as a standard test battery. Sadique et al. (2012) reviewed the tests included in the NIST test battery. A suite of test batteries, TestU01, was introduced by L'Ecuyer and Simard (2007, 2014). TestU01 is a C library that combines most of the available randomness tests and RNGs in six test batteries (McCullough, 2006; L'Ecuyer and Simard, 2007). There are also test batteries that are smaller in scope. ENT, proposed by Walker (2014), contains 5 statistics and tests. The Helsinki test battery, proposed by Vattulainen et al. (1995), is based on the Ising model and random walks on lattices. The Crypt-X test battery, which includes 6 tests, was developed by the Information Security Research Center at Queensland University of Technology (Sýs and Říha, 2014; Soto, 1999). The SPRNG test battery includes some tests from the battery of Knuth (Mascagni and Srinivasan, 2000). Ruetti (2004) combined the Knuth, Helsinki, Diehard, and SPRNG batteries and proposed a test battery consisting of 37 statistical and physical tests.
In addition to the tests included in test batteries, there are recently proposed cryptographic randomness tests that are not performed by test batteries. Maurer (1992) proposed a statistical test for random bit generators. Hernandez et al. (2004) proposed a new test called Strict Avalanche Criterion (SAC). Ryabko et al. (2004) proposed an adaptation of the well-known chi-square test. This test is more efficient than the usual chi-square test in small samples. The "Book Stack" and "Order" tests were proposed by Ryabko and Monarev (2005) for testing binary random bit sequences. Doganaksoy et al. (2006) proposed three randomness tests based on the random walk process. An advantage of these tests is that it is possible to calculate exact probabilities corresponding to the test statistics. The "Topological Binary Test" was introduced by Alcover et al. (2013) to test randomness in bit sequences. It counts the different bit patterns of pre-determined length in a sequence of random bits.
The availability of software for the implementation of a test battery, or even of an individual cryptographic randomness test, is critical to the usefulness of the related test or battery. The library TestU01 is developed in ANSI C; hence, it is compiled with GNU tools rather than with today's C compilers. Although TestU01 performs a wide variety of tests and their combinations, it lacks flexibility of implementation. Because the battery Dieharder is implemented by an R package, namely RDieHarder, it is more applicable and user-friendly than TestU01. However, the unavailability of Windows and OS X binaries can be seen as a disadvantage that decreases its accessibility. A package for the implementation of the NIST battery was prepared on a SUN workstation using ANSI C (Rukhin et al., 2010). Rukhin et al. (2010) provide a user guide for setting up the package and running the included tests. The ease of implementation of the NIST battery is similar to that of TestU01. For the implementation of individual randomness tests, there are also numerous R packages such as randtests or DescTools. Although some of the tests included in these packages are also used to evaluate cryptographic randomness, they cover neither recently proposed tests nor those developed specifically to test cryptographic randomness.
The usual chi-square test is applied with nearly all of the cryptographic randomness tests in the literature. The mentioned implementations, including those covered by R, automatically apply the chi-square test. However, there are numerous alternatives to the chi-square goodness-of-fit test, such as Kolmogorov-Smirnov, Anderson-Darling, or Jarque-Bera. Because the statistical qualities of these tests are better than those of the chi-square test, a gain in the performance of cryptographic randomness tests can be expected when they are applied with better goodness-of-fit tests. Thus, we need software that is capable of conducting actual cryptographic randomness tests such as topological binary, book stack, etc. with goodness-of-fit tests that are better than the usual chi-square test in statistical performance. When the range and variety of cryptographic randomness tests implemented by software and the practicability of available software are considered, the new software should effectively implement new tests with various goodness-of-fit tests and have a user-friendly interface. The package CryptRndTest contributes to satisfying this need.
The aim of this article is to describe and illustrate the use of the R package CryptRndTest (currently in version 1.2.2), which performs some of the recently proposed and basic cryptographic randomness tests. The article is mainly based on the paper of Demirhan and Bitirim (2015b). The package includes the functions adaptive.chi.square, birthday.spacings, book.stack, GCD.test, topological.binary, and random.walk.tests to perform the adaptive chi-square, birthday spacings, book stack, greatest common divisor, and topological binary tests, and three tests based on the random walk process, respectively. To the best of our knowledge, the adaptive chi-square test, the topological binary test, and the tests based on the random walk process are implemented in software for the first time by CryptRndTest. In addition to the chi-square procedure, these functions apply the Anderson-Darling, Kolmogorov-Smirnov, and Jarque-Bera procedures when suitable. Because the statistical performances of goodness-of-fit tests differ under various conditions, the application of different goodness-of-fit procedures is a beneficial feature. This is another important utility of CryptRndTest. In addition, it has the following auxiliary functions: GCD, GCD.q, GCD.big, Strlng2, toBaseTwo, toBaseTen, and TBT.criticalValue to compute the greatest common divisor under different conditions of inputs, approximately calculate the Stirling number of the second kind when the inputs are large, make base conversions precisely with large inputs, and calculate critical values for the topological binary test.
The paper is organized as follows: in the next section, methodologies of the tests included in CryptRndTest are briefly given. Details of algorithms used to manipulate integer and bit sequences are mentioned, and applications of goodness-of-fit procedures performed by CryptRndTest are clarified. Parameter settings and limitations for each test are mentioned. Finally, as an illustrative application of CryptRndTest, random number generators available in R are tested by using the proposed package under different sequence and bit-length conditions. By this application, implementation performance of the package is analyzed, recently proposed tests are evaluated, and usage of CryptRndTest is illustrated.

Performed tests

Adaptive chi-square
The adaptive chi-square test was introduced by Ryabko et al. (2004). It is empirically demonstrated by Ryabko et al. (2004) that the adaptive chi-square test is more efficient than the classical chi-square test in the identification of non-random patterns in samples smaller than those required by the chi-square test. For example, when we work with 64-bit numbers the length of the alphabet is $2^{64}$; hence, we need to have a sequence of length greater than $5 \cdot 2^{64}$ to apply the classical chi-square test safely. The logic behind the test is to divide the alphabet into subsets and perform the chi-square test over subsets instead of individual elements of the sample. In this way, subsets are considered as a new alphabet, and a new null hypothesis and its alternative are formed over the subsets. Because the number of categories required to test the new hypotheses is equal to the number of subsets, the chi-square test is applied with much smaller samples. To conclude randomness, the input numbers are expected to be distributed uniformly across the subsets. Deviations from this uniformity are detected by the adaptive chi-square test.
The function adaptive.chi.square() is called to apply the test. It implements the following pseudo-code algorithm:

Algorithm 1.
1. Input data as a matrix of bits or a vector of integers, the number of subsets (S) that the alphabet will be divided into, and the proportion of the training data set;
2. If data is represented by bits, transform data to base-10;
3. Divide the whole data set into training and testing subsets according to the input weights;
4. Identify the numbers that are seen in the sequence of interest at least once;
5. Find the frequency of occurrences for each element of the alphabet in the training and testing subsets;
6. For i = 1, . . . , S, find the frequency of elements that are seen i times in the training and testing subsets;
7. Apply the two-sample chi-square test with the expected and observed counts obtained at the previous step over the training and testing subsets, respectively;
8. Return the value of the test statistic, the corresponding p-value, and the decision on the null hypothesis.
While working with integers, the alphabet corresponds to the range of considered numbers. For instance, if 32-bit numbers are being tested, the alphabet in Algorithm 1 includes the numbers between 0 and $2^{32} - 1$. At step 4, we do not form the whole alphabet; instead, we count the numbers (words) that are seen at least once, and hence the rest of the numbers have zero count. At step 7, the degrees of freedom of the test is $S - 1$.
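The grouping idea behind Algorithm 1 can be sketched in a few lines. The following Python sketch is an illustration only and is not the code of adaptive.chi.square(): it assumes one common reading of the scheme, in which words are grouped by how often they occur in the training part and the testing part is then checked for uniformity across those groups. The function name, its arguments, and the choice to skip empty groups are all assumptions made for this example.

```python
from collections import Counter

def adaptive_chi_square(seq, alphabet_size, S, r=0.5):
    """Hypothetical sketch: chi-square statistic over frequency-based subsets.

    seq           -- list of integers in [0, alphabet_size)
    alphabet_size -- number of possible words (e.g. 2**32 for 32-bit numbers)
    S             -- number of subsets (must be at least 2)
    r             -- proportion of the data used as the training part
    """
    if S < 2:
        raise ValueError("S must be at least 2")
    split = int(len(seq) * r)
    train, test = seq[:split], seq[split:]

    # Count only the words that actually occur; unseen words have count zero.
    seen = Counter(train)

    # Subset i holds the words seen i times in training (the last subset
    # collects everything seen S-1 times or more).
    sizes = [0] * S
    for count in seen.values():
        sizes[min(count, S - 1)] += 1
    sizes[0] = alphabet_size - len(seen)

    # Observed hits of the testing part in each subset.
    observed = [0] * S
    for x in test:
        observed[min(seen.get(x, 0), S - 1)] += 1

    # Under uniformity, a subset is hit in proportion to its size.
    stat, used = 0.0, 0
    for obs, size in zip(observed, sizes):
        if size == 0:          # assumption: empty subsets are skipped
            continue
        expected = len(test) * size / alphabet_size
        stat += (obs - expected) ** 2 / expected
        used += 1
    return stat, used - 1      # statistic and degrees of freedom
```

A constant sequence such as `[7] * 100` over an alphabet of 256 words yields a very large statistic, because every testing value falls into the single-word subset. The statistic would then be compared with a chi-square quantile for the returned degrees of freedom.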
The parameters of the adaptive chi-square test are the weight of the training and testing samples (r), the length of the considered number sequence (n), and the number of subsets (S) that the alphabet is divided into. Ryabko et al. (2004) do not give strict rules for the determination of the values of these parameters. They suggest running some experiments to find the parameter values that provide the highest statistical performance, such as power and specificity. Because such a study would not be cost-effective for an individual application of the test, the user may at least evaluate the sensitivity of the test results to the values of S and r. In the function adaptive.chi.square(), we set r = 0.5 by default. The value of S is set by the user, and that of n is determined by the length of the input data. Because the input data is a random sample from the RNG of interest, the value of n should be increased with increasing bit-length to successfully represent the range of numbers that will be generated by the RNG. When the bit-length is greater than 64, we utilize the package Rmpfr to work with higher precision.
The algorithm complexity of the function adaptive.chi.square() is $O(n^2)$ in the worst case. The required memory is directly related to the length of the input sequence. Due to the algorithm complexity of the function used to identify unique numbers at step 4, the implementation time of the function adaptive.chi.square() increases quadratically with the length of the input sequence.

Birthday Spacings
The Birthday Spacings test was given by Marsaglia and Tsang (2002). It focuses on the number of duplicated values of spacings between ordered birthdays among a year of pre-determined length. The observed duplication patterns in input numbers are compared with the patterns that should be observed under randomness. Thus, birthday spacings test detects deviations from randomness by focusing on repetition frequency of numbers to ensure uniformity. Marsaglia and Tsang (2002) propose that the number of duplicated values is approximately distributed according to the Poisson distribution. They also derive an expression for the mean rate of the Poisson distribution.
The function birthday.spacings() is employed to run the test. It implements the following pseudo-code algorithm:

Algorithm 2.
1. Input data as a vector of integers of size n, the number of birthdays (m), the length of year (N), the mean rate of the theoretical Poisson distribution (λ), and the number of classes (k) that is constructed for goodness-of-fit tests;
2. Reshape the first $m \cdot \lfloor n/m \rfloor$ elements of the input vector as a matrix of $\lfloor n/m \rfloor$ rows and m columns;
3. Sort each row of the matrix of step 2 according to the values in columns;
4. For each row, find the distance between columns of the sorted matrix by subtracting the values in adjacent columns obtained at the previous step;
5. Count duplicated values among the distances obtained at step 4;
6. Calculate class probabilities over the Poisson distribution with mean rate λ for x = 0, . . . , k, and assign the rest of the probability mass to the (k + 1)-th class;
7. Calculate expected frequencies corresponding to the probabilities obtained at the previous step;
8. Replicate the expected counts to form the corresponding sample;
9. Apply the Anderson-Darling test to compare goodness-of-fit of the samples obtained at steps 5 and 8;
10. Apply the Kolmogorov-Smirnov test to compare goodness-of-fit of the samples obtained at steps 5 and 8;
11. Construct the frequency table of the counts obtained at step 5;
12. Apply the chi-square test over the frequency tables obtained at steps 7 and 11;
13. Return the values of the test statistics, the corresponding p-values, and the decisions on the null hypothesis.
At step 2 of Algorithm 2, each row of the reshaped matrix includes birthdays in columns. The total number of rows determines the size of the sample that is used in the goodness-of-fit tests applied at steps 9, 10, and 12. Manipulation of the input vector according to the birthday spacings test is completed at step 5. This manipulation produces the empirical sample for testing the goodness-of-fit to the Poisson distribution. The Anderson-Darling test at step 9 is applied by using the function ad.test from the package kSamples. The Kolmogorov-Smirnov test at step 10 is applied by using the function ks.test from the package stats. Marsaglia and Tsang (2002) give some insight into the optimal values of the parameters. The mean rate is $\lambda = m^3/(4N)$, where N is the length of the year. They state that for an RNG, it is harder to pass this test for increasing values of either m or N. Specifically, the case with $m = 4096$ and $N = 2^{32}$ is qualified as a compelling setting for 32-bit generators. The length of the input sequence is another important parameter. Because the size of the sample used in testing the goodness-of-fit is equal to $\lfloor n/m \rfloor$, the length of the input sequence (n) should be chosen large enough to apply the goodness-of-fit tests appropriately.
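Steps 2 to 5 of Algorithm 2 and the mean rate of the Poisson distribution can be illustrated with a short Python sketch. It is not the code of birthday.spacings(); the function names are hypothetical, and counting duplicates as "number of spacings minus number of distinct spacings" is an assumption about how duplicated values are tallied.

```python
def poisson_rate(m, year_length):
    """Mean rate m^3 / (4 * year_length) of the limiting Poisson distribution."""
    return m ** 3 / (4 * year_length)

def duplicated_spacings(birthdays):
    """Number of duplicated spacings between sorted birthdays (steps 3 to 5)."""
    ordered = sorted(birthdays)
    spacings = [b - a for a, b in zip(ordered, ordered[1:])]
    # Assumption: every repeat of an already-seen spacing counts once.
    return len(spacings) - len(set(spacings))

def duplicate_counts(values, m):
    """Split the input into floor(n/m) rows of m birthdays and count per row."""
    rows = len(values) // m          # leftover values are ignored (step 2)
    return [duplicated_spacings(values[i * m:(i + 1) * m]) for i in range(rows)]
```

For example, `poisson_rate(4096, 2**32)` gives 4.0, the rate quoted for 32-bit generators, and `duplicate_counts([1, 3, 5, 10, 0, 1, 2, 3, 9], 4)` gives `[1, 2]`: the spacings of the first row are 2, 2, 5 and those of the second row are 1, 1, 1. The resulting counts form the empirical sample that is compared with the Poisson distribution.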
The algorithm complexity of the function birthday.spacings() is $O(n^2)$ in the worst case. The limitation of birthday.spacings() is directly related to the value of m. For all combinations of m and N suggested by Marsaglia and Tsang (2002), λ is equal to 4. Following this logic, when $N = 2^{64}$ the value of m giving λ = 4 is 6,658,043. In this case, for a reliable application of the goodness-of-fit tests at steps 9, 10, and 12, we need at least 133,160,860 integers and correspondingly 8,522,295,040 bits. For bit-lengths higher than 32, the value of λ can be taken as 2. For instance, when $N = 2^{64}$, the corresponding value of m is 5,284,492. Thus, decreasing the value of λ does not overcome the need for a huge data set for reliable testing. Note that using a huge data set for testing is a memory-consuming operation.
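The figures quoted above can be reproduced by solving $\lambda = m^3/(4N)$ for m, where N is the year length of Algorithm 2. The Python sketch below does this with integer arithmetic. The helper name is hypothetical, and the factor of 20 rows is an assumption inferred from the quoted sample size (133,160,860 = 20 · 6,658,043), not a rule stated in the text.

```python
def min_birthdays(lam, year_length):
    """Smallest integer m with m^3 >= 4 * lam * year_length."""
    target = 4 * lam * year_length
    m = int(round(target ** (1 / 3)))   # floating-point first guess
    while m ** 3 < target:              # correct the guess with exact integers
        m += 1
    while (m - 1) ** 3 >= target:
        m -= 1
    return m

m4 = min_birthdays(4, 2 ** 64)          # lambda = 4 with a year of 2^64 days
m2 = min_birthdays(2, 2 ** 64)          # lambda = 2 with a year of 2^64 days

rows = 20                               # assumed minimum number of rows
integers_needed = rows * m4
bits_needed = integers_needed * 64
```

With these settings `m4` is 6658043, `m2` is 5284492, `integers_needed` is 133160860, and `bits_needed` is 8522295040, which matches the values in the paragraph above.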

Book Stack
The Book Stack test was proposed by Ryabko and Monarev (2005). Positions of the numbers on a stack are taken into consideration. In this test, randomness implies that frequency of finding each number at each position is equally likely. Departures from this equality mean that some of the words are seen more frequently in contrast to the nature of randomness. The book stack test focuses on non-uniform patterns and frequent repetitions of input numbers to detect deviations from randomness by means of unexpected autocorrelation patterns and non-uniformity.
The function book.stack() implements the following pseudo-code algorithm to run the test:

Algorithm 3.
1. Input data as a matrix of bits or a vector of integers and the number of subsets (k) that the alphabet will be divided into;
2. If data are represented by bits, transform data to base-10;
3. Form an array that includes the numbers from 1 to the number of unique words in the input sequence;
4. Write each element of the input vector in place of the first element of the array formed at the previous step, and move the other elements except the one written to the first cell of the array one step right;
5. Record the array obtained at the previous step;
6. Go back to step 4 until all elements of the input vector are taken into account;
7. Divide the whole alphabet into k non-overlapping subsets ($A_1, A_2, \ldots, A_k$);
8. For each subset of the alphabet, find the frequency of occurrences of the number corresponding to the position of each element of the input vector in the arrays formed at steps 4 and 5;
9. Apply the chi-square test with expected counts equal to $n \cdot A_i$, where $i = 1, \ldots, k$ and n is the length of the input vector or the number of columns of the input matrix;
10. Return the value of the test statistic, the corresponding p-value, and the decision on the null hypothesis.
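The stack manipulation in steps 3 to 8 is a move-to-front update. The Python sketch below is a toy illustration for small alphabets and is not the code of book.stack(): it assumes the stack starts in natural order, that the k subsets are equal-length ranges of stack positions, and that the expected count of a subset is proportional to its share of the alphabet. All names are hypothetical.

```python
def book_stack_counts(seq, alphabet_size, k):
    """Count how often the looked-up word sits in each block of stack positions."""
    stack = list(range(alphabet_size))     # position 0 is the top of the stack
    subset_len = alphabet_size // k        # assumed: k divides the alphabet size
    counts = [0] * k
    for word in seq:
        pos = stack.index(word)            # current position of the word
        counts[min(pos // subset_len, k - 1)] += 1
        stack.pop(pos)                     # move the word to the top;
        stack.insert(0, word)              # everything above it shifts down
    return counts

def chi_square_statistic(observed, expected):
    """Plain Pearson statistic; no p-value is computed in this sketch."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For the repetitive input `[0, 0, 0, 0]` with four words and two subsets, the word is always found on top, so the counts are `[4, 0]` and the statistic against the uniform expectation `[2, 2]` is 4.0. Frequent repetitions therefore pile up in the first subset, which is the pattern the test is designed to detect.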
To obtain an integer number of subsets, the length of the input vector should be chosen so that the length of the subsets is an integer. The optimal value for the length of the input vector is given as $n \approx B \cdot 2^{B/2}$, where B is the bit-length of the considered RNG (Ryabko and Monarev, 2005). For an appropriate determination of the number of subsets, k, Ryabko and Monarev (2005) suggest performing an empirical study. As for an appropriate bit-length, it is mentioned by Ryabko and Monarev (2005) that it is hard to set up a sensible test with much higher bit-lengths.
The algorithm complexity of the function book.stack() is $O(n^2)$ in the worst case. The limitation of the Book Stack test is based on the bit-length of the considered RNG. For example, for B = 64 the length of the input vector is calculated as $1.37 \cdot 10^{11}$ and we need 1 terabyte of memory, whereas the memory requirement is 4 megabytes for B = 32. Due to both memory and sensibility issues, it is not appropriate to work with high bit-lengths such as 64.

Greatest Common Divisor
Two tests proposed by Marsaglia and Tsang (2002) are based on the number of required iterations (k) and the value of greatest common divisor (GCD) obtained in the GCD operation. When perceived as random variables, both k and GCD are independently and identically distributed and their distributions can be obtained under randomness. Marsaglia and Tsang (2002) derived distributions of k with an empirical study and that of GCD theoretically under the null hypothesis of randomness. Departures from randomness imply nonconformity between empirical and theoretical distributions of k and GCD. Thus, these tests focus on the deviations from independence and uniformity.
The function gcd.test() is called to apply the test. The following pseudo-code algorithm is implemented by gcd.test() when all of the goodness-of-fit tests are set to TRUE:

Algorithm 4.
1. Input data as an $N \times 2$ matrix of integers, and the mean and standard deviation of the theoretical normal distribution of k;
2. Constitute a pair of numbers from each row of the input matrix;
3. Apply the GCD operation to each pair formed at the previous step;
4. Store the values of k for N pairs;
5. If the obtained GCD is less than 3, store it as 3, and if it is greater than 35, store it as 35;
6. Generate a random sample of size N from the normal distribution with the input values of mean and standard deviation;
7. If the tests based on k will be conducted, go to the next step, otherwise go to step 13;
8. Apply the two-sample Kolmogorov-Smirnov test in a two-sided setting to the samples obtained at steps 4 and 6;
9. Apply the chi-square test to the samples obtained at steps 4 and 6;
10. Standardize the values of k by using its empirical mean and standard deviation;
11. Apply the Jarque-Bera test to the standardized sample of step 10;
12. Apply the Anderson-Darling test to the samples obtained at steps 4 and 6;
13. If the tests based on GCD will be conducted, go to the next step, otherwise go to step 19;
14. Construct the cumulative distribution function (cdf) of the probability function (pf) of GCD given by Marsaglia and Tsang (2002);
15. Obtain theoretical frequencies for GCD over the cdf of step 14. Specifically, if the theoretical frequency of GCD is less than 3, store it as 3, and if it is greater than 35, store it as 35;
16. Replicate the expected counts to form the corresponding sample;
17. Apply the two-sample Kolmogorov-Smirnov test in a two-sided setting to the samples obtained at steps 5 and 16;
18. Apply the chi-square test to the samples obtained at steps 5 and 16;
19. Return the values of the calculated test statistics, the corresponding p-values, and the decisions on the null hypothesis.
The mean and standard deviation of the theoretical normal distribution for bit-lengths other than 32 are not given by Marsaglia and Tsang (2002). We conducted extensive empirical studies, details of which are mentioned in the following sections, to obtain these parameters and tabulated the obtained values in Table 3.
When bit-length is increased, corresponding value of GCD mostly becomes greater than 35; hence, the operation at step 15 of Algorithm 4 gets unreasonable. Thus, we observe that it is not appropriate to conduct tests based on GCD for high bit-lengths such as 128.
The Kolmogorov-Smirnov tests at steps 8 and 17 and the chi-square tests at steps 9 and 18 are applied by using the functions ks.test and chisq.test from the package stats, respectively. The Jarque-Bera test at step 11 is implemented by using the function jarque.bera.test from the package tseries. The Anderson-Darling test is applied by using the function ad.test from the package kSamples.
Calculating the number of required iterations and the value of GCD is time consuming for bit-lengths greater than 64. To overcome this difficulty, we prepared three functions to calculate GCD-related variables. The first function, GCD.q, computes the number of required iterations, the value of GCD, and the sequence of partial quotients by using the Euclidean algorithm. The function GCD is the recursive version of the Euclidean algorithm and only provides the number of required iterations and the value of GCD. The function GCD.big applies the Euclidean algorithm over multiple-precision floating-point numbers using the package Rmpfr and provides all three outputs related to the GCD operation. While GCD is the fastest one, GCD.big gives the most precise results. It is also possible to use the binary GCD algorithm to decrease the implementation time. However, in this case it is not possible to apply tests over the number of required iterations of the Euclidean algorithm. When the GCD operation is done recursively, the algorithm complexity of gcd.test() is $O(\log(a))$, where a is the maximum initial input to the recursive algorithm. The memory requirement for the GCD tests is directly related to the value of N.
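The three outputs of the GCD operation discussed above can be made concrete with a short Python sketch of the Euclidean algorithm. It illustrates the quantities only and is not the code of GCD(), GCD.q(), or GCD.big(); the function names are hypothetical. Python integers have arbitrary precision, so no separate multiple-precision library is needed in this sketch.

```python
def gcd_with_trace(a, b):
    """Euclidean algorithm returning (gcd, number of iterations, partial quotients)."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)    # a = q * b + r
        quotients.append(q)
        a, b = b, r
    return a, len(quotients), quotients

def clamp_gcd(g, low=3, high=35):
    """Pool small and large GCD values as in step 5 of Algorithm 4."""
    return max(low, min(g, high))
```

For the pair (1071, 462) the divisions are 1071 = 2 · 462 + 147, 462 = 3 · 147 + 21, and 147 = 7 · 21, so `gcd_with_trace(1071, 462)` returns `(21, 3, [2, 3, 7])`. The iteration count plays the role of k in the tests above, and `clamp_gcd` maps the GCD into the range from 3 to 35.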

Random walk tests
In the literature, binary sequences are analyzed in detail by using random walk process. Doganaksoy et al. (2006) proposed three tests based on the random walk stochastic process. In a random walk process, magnitude or direction of each change is determined by chance; hence, a random walk is random if increment and decrement probabilities are equal to each other. Therefore, random walk processes provide a good basis for randomness. In a random walk, a part of sequence that intersects the x-axis with two successive points is called excursion, and over all excursions, the maximum distance from the x-axis is defined as height, and the vertical distance between minimum and maximum points over y-axis is called expansion. Thus, we have three characteristics of random walk process to observe deviations from randomness. The corresponding tests are called Random Walk Excursion, Random Walk Height, and Random Walk Expansion. If there is a trend in the process, input sequence fails in the excursion test. The height test focuses on the moves with very low or high magnitude to detect non-randomness. The expansion test focuses on the anomalies in amplitude of the walk to identify non-random patterns. Because the exact probabilities corresponding to test statistics are calculated, the tests proposed by Doganaksoy et al. (2006) are also applicable for small sample sizes.
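The three characteristics defined above can be computed in a single pass over a bit sequence. The Python sketch below follows the verbal definitions in this paragraph under stated assumptions: a 1 bit moves the walk up and a 0 bit moves it down, and an excursion is counted each time the walk returns to the x-axis. It is not the code of random.walk.tests(), and the function name is hypothetical.

```python
def random_walk_stats(bits):
    """Return (excursions, height, expansion) of the walk defined by the bits."""
    position = 0
    lowest = highest = 0
    excursions = 0
    for bit in bits:
        position += 1 if bit else -1      # assumed mapping: 1 is up, 0 is down
        lowest = min(lowest, position)
        highest = max(highest, position)
        if position == 0:                 # the walk is back on the x-axis
            excursions += 1
    height = max(highest, -lowest)        # maximum distance from the x-axis
    expansion = highest - lowest          # vertical span of the walk
    return excursions, height, expansion
```

For the bits `[1, 0, 1, 1, 0, 0, 0, 1]` the walk visits 1, 0, 1, 2, 1, 0, -1, 0, so the function returns `(3, 2, 3)`. In the tests, such values are collected over many sequences and compared with their exact distributions under randomness.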
The function random.walk.tests() is called to apply the three tests selectively. The following pseudo-code algorithm is implemented by random.walk.tests() when all of the tests are to be applied: The Anderson-Darling test at step 9 is applied by using the function ad.test from the package kSamples. The Kolmogorov-Smirnov test at step 10 is applied by using the function ks.test from the package stats. The chi-square test at step 11 is the classical application of the test without using a predefined function. If one of the tests is not applied, all the results related to that test in the output are set to -1.
Algorithm complexities of expansion, excursion, and height tests are O(B), O(B k · B ), and O(B k · B ), respectively. The limitation of the tests is unavailability of theoretical cdf's for bit-lengths other than 32, 64, 128, and 256. Therefore, using the information given by Doganaksoy et al. (2006) the excursion is applied for bit-lengths of 16, 32, 64, 128, and 256; the height test is applied for bit-lengths of 64, 128, 256, 512, and 1024; and the expansion test is applied for bit-lengths of 32, 64, and 128. Although the size of required memory increases along with the length of input sequence, it is possible to apply the tests with reasonable sequence lengths without causing memory pressure.

Topological binary
The topological binary test was proposed by Alcover et al. (2013) to test the randomness in bit sequences. The logic behind the test is based on the number of different fixed-length bit patterns in a bit sequence. Frequency of distinct non-overlapping bit patterns over the sequence of interest is influential on the test result. In case of randomness, we expect to have many different bit patterns in the input sequence. The main strength of the topological binary test is that it focuses on the number of bit patterns rather than frequency of occurrence of numbers. Because the exact distribution of test statistic is derived, it is possible to apply the test for short bit sequences.
The function topological.binary() implements the following pseudo-code algorithm to run the test:

Algorithm 6.
1. Input data as a $B \times k$ matrix of bits, where B is the bit-length and k is the length of the considered number sequence, and the critical value;
2. Find and store non-overlapping blocks of length B;
3. Count the number of different B-bit patterns that appear across all the k blocks;
4. If the result of step 3 is less than one, then reject the null hypothesis;
5. else if the result of step 3 is greater than $\min(k, 2^B)$, then do not reject the null hypothesis;
6. else if the result of step 3 is less than the input critical value, then reject the null hypothesis;
7. else do not reject the null hypothesis;
8. Return the result of step 3 as the value of the test statistic and the decision on the null hypothesis.
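The steps of Algorithm 6 translate almost directly into code. The Python sketch below is an illustration and not the code of topological.binary(): it takes a flat list of bits instead of a $B \times k$ matrix, and the function name and return format are assumptions made for this example.

```python
def topological_binary(bits, B, critical_value):
    """Return (number of distinct B-bit patterns, reject flag)."""
    k = len(bits) // B                                   # number of full blocks
    # Steps 2 and 3: collect the distinct non-overlapping B-bit blocks.
    patterns = {tuple(bits[i * B:(i + 1) * B]) for i in range(k)}
    distinct = len(patterns)

    # Steps 4 to 7: decision on the null hypothesis of randomness.
    if distinct < 1:
        reject = True
    elif distinct > min(k, 2 ** B):
        reject = False
    elif distinct < critical_value:
        reject = True
    else:
        reject = False
    return distinct, reject
```

For the bits `[0, 1, 0, 1, 1, 1, 0, 0]` with B = 2, the blocks are 01, 01, 11, and 00, so three distinct patterns are found. With a critical value of 4 the null hypothesis is rejected, and with a critical value of 3 it is not.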
Although the exact distribution of the test statistic is derived by Alcover et al. (2013), the calculation of the cdf of the test statistic requires Stirling numbers of the second kind with large inputs when the bit-length is greater than 16. Therefore, it is hard to obtain the critical value of the test for large bit-lengths by using available functions in R packages such as the function Stirling2 of copula. This is a limitation of the function topological.binary(). To overcome this limitation of the test, we prepared the function TBT.CriticalValue to calculate the required critical values for testing. The algorithm complexity of the function topological.binary() is $O(n^2)$ in the worst case. The memory required to run the topological binary test is related to the value of k.

Auxiliary functions
The package CryptRndTest has seven auxiliary functions, namely Strlng2(), GCD(), GCD.q(), GCD.big(), toBaseTwo(), toBaseTen(), and TBT.CriticalValue(). These functions are also suitable for individual use. Strlng2() is used by TBT.CriticalValue() to calculate critical values for the topological binary test. GCD() and GCD.q() are called to calculate the greatest common divisor in the GCD test implemented by gcd.test(). The three possible outcomes of the greatest common divisor operation are the number of iterations, the sequence of partial quotients, and the value of the greatest common divisor. GCD.q() provides all of these outcomes for any pair of integers excluding zero. The functions toBaseTwo() and toBaseTen() are used to convert large integers to base 2 and to base 10, respectively.
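The base conversions mentioned above amount to the following elementary operations. The Python sketch below is an illustration with hypothetical names and is not the code of toBaseTwo() or toBaseTen(). It relies on Python's arbitrary-precision integers, whereas the package uses multiple-precision arithmetic in R for large inputs.

```python
def to_base_two(x, width=None):
    """Bits of a non-negative integer, most significant bit first."""
    bits = []
    while x > 0:
        bits.append(x & 1)     # take the lowest bit
        x >>= 1
    bits.reverse()
    if not bits:
        bits = [0]
    if width is not None and len(bits) < width:
        bits = [0] * (width - len(bits)) + bits   # left-pad to a fixed width
    return bits

def to_base_ten(bits):
    """Integer value of a list of bits, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value
```

For example, `to_base_two(10)` gives `[1, 0, 1, 0]` and `to_base_ten([1, 0, 1, 0])` gives 10. The round trip stays exact for values far beyond 64 bits, such as `2**100 + 12345`.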
The function Strlng2() approximately computes the natural logarithm of Stirling numbers of the second kind for large inputs by the approaches of Bleick and Wang (1974) and Temme (1993). In this approach, Lambert W functions are employed on the log scale to avoid overflow.
Due to the large factorials in the calculation of Stirling numbers of the second kind, it is nearly impossible to compute the exact cdf of the topological binary test statistic for higher bit-lengths without overflow in R. The function TBT.CriticalValue() implements an approach for the calculation of the cdf and approximately computes the required critical value for the topological binary test at a given level of α. Because TBT.CriticalValue() utilizes Strlng2(), the accuracy of the results decreases with increasing bit-lengths and number of words under consideration. It is also possible to make exact calculations with TBT.CriticalValue(). In this case, the function Stirling2 from the package gmp is employed instead of Strlng2(). Because gmp uses multiple precision arithmetic, the implementation time of TBT.CriticalValue() increases considerably. The user should evaluate the trade-off between implementation time and precision.
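For small inputs, the exact calculation can be sketched directly. The Python example below assumes that, under the null hypothesis, the test statistic is the number of distinct values among k independent uniform draws from N = 2^B patterns, whose classical occupancy distribution is P(D = d) = N(N-1)...(N-d+1) S(k, d) / N^k with S(k, d) a Stirling number of the second kind. Whether this matches the exact form used by Alcover et al. (2013) and the rejection convention of TBT.CriticalValue() is an assumption; all function names are hypothetical.

```python
from fractions import Fraction


def stirling2_row(n):
    """Return [S(n, 0), ..., S(n, n)] via S(n, j) = j * S(n-1, j) + S(n-1, j-1)."""
    row = [1]  # S(0, 0)
    for i in range(1, n + 1):
        new = [0] * (i + 1)
        for j in range(1, i + 1):
            keep = j * row[j] if j < len(row) else 0
            new[j] = keep + row[j - 1]
        row = new
    return row


def distinct_count_pmf(k, N):
    """Exact pmf of the number of distinct values among k uniform draws from N values."""
    s = stirling2_row(k)
    pmf = {}
    falling = 1  # N * (N - 1) * ... * (N - d + 1)
    for d in range(1, min(k, N) + 1):
        falling *= N - d + 1
        pmf[d] = Fraction(falling * s[d], N ** k)
    return pmf


def critical_value(k, N, alpha):
    """Largest c with P(D < c) <= alpha, assuming rejection when the statistic is below c."""
    cumulative = Fraction(0)
    c = 1
    for d, p in sorted(distinct_count_pmf(k, N).items()):
        cumulative += p
        if cumulative > alpha:
            break
        c = d + 1
    return c


print(critical_value(k=20, N=2 ** 4, alpha=0.05))
```

Exact rational arithmetic avoids overflow but becomes slow as k and N grow, which mirrors the trade-off between implementation time and precision described above.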
Arguments of main and auxiliary functions of CryptRndTest package are summarized in Table 1.

A numerical illustration
As a numerical illustration of the package, we employed CryptRndTest to test the randomness of the RNGs available in R. In this way, we aim to obtain results of tests that have not yet been applied to the RNGs of interest, assess the implementation performance of CryptRndTest under various scenarios, and illustrate some issues in determining the parameters of the tests for the considered scenarios. Note that it is impossible to observe the ability of the tests to control type-I error (rejection of the randomness hypothesis while it is actually true) with an empirical study such as the one conducted in this section. [...] while the rest of the tests take integers as input. The BDS and RWT tests are applied separately with each of the Anderson-Darling, Kolmogorov-Smirnov, and chi-square goodness-of-fit tests, and the GCD test is applied separately with each of the Anderson-Darling, Kolmogorov-Smirnov, Jarque-Bera, and chi-square goodness-of-fit tests. The total number of applied randomness tests is 21. All the tests are applied at both the 0.01 and 0.05 levels of significance and at 8, 16, 32, 64, and 128-bit lengths. The considered lengths of random number sequences for each bit-length are given in Table 2. Because we get unreasonable implementation times for longer sequences at the 128-bit level, the same sequence lengths as for 64-bit are considered for 128-bit numbers.
To conduct the adaptive chi-square test, we need to determine the value of the argument S and the proportions of the training and testing samples. The two proportions are taken to be equal. As for the value of S, in pilot runs with medium sequence lengths we did not detect a significant change in the test results for S = 2, 3, 4 at any bit-length. Values greater than 4 increase the implementation time, whereas small values decrease resolution. Thus, S is taken as 4 for all bit-lengths to work with reasonable degrees of freedom in the chi-square test. The adaptive chi-square test is applied for all bit-lengths.
The arguments of the birthday spacings test are the number of birthdays (m), the length of the year (n), the mean rate of the theoretical Poisson distribution (lambda), and the number of classes (num.class), which is used for the goodness-of-fit tests. In the experiments, the argument m was taken as 8, 128, and 4096 for 8, 16, and 32-bit lengths, respectively. The argument n was set to 2^B, where B is the bit-length. The argument lambda was calculated by the formula given by Marsaglia and Tsang (2002). The argument num.class was set to 5 for 8-bit and to 10 for 16-bit and higher lengths.
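As a small worked example of these settings, the Python sketch below computes the Poisson rate for the three (m, n) pairs used in the experiments, assuming that the formula referred to is the usual birthday spacings rate lambda = m^3 / (4n). The spacing count is one simple convention (distinct spacing values that occur more than once among consecutive sorted birthdays) and may differ in detail from the package; both function names are hypothetical.

```python
from collections import Counter


def bds_lambda(m, n):
    """Poisson rate of the birthday spacings test, assuming lambda = m^3 / (4 n)."""
    return m ** 3 / (4 * n)


def repeated_spacings(birthdays):
    """Number of distinct spacing values occurring more than once (assumed convention)."""
    ordered = sorted(birthdays)
    spacings = [b - a for a, b in zip(ordered, ordered[1:])]
    counts = Counter(spacings)
    return sum(1 for c in counts.values() if c > 1)


# Settings from the text: m = 8, 128, 4096 with n = 2^B for B = 8, 16, 32.
for bits, m in [(8, 8), (16, 128), (32, 4096)]:
    print(bits, m, bds_lambda(m, 2 ** bits))
```

Under this assumption the three settings give rates of 0.5, 8, and 4, respectively.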
For the book stack test, the length of the sample (n) should be determined, and the data should be prepared according to the value of n. The number of subsets into which the alphabet will be divided (k) should also be determined. The formula proposed by Ryabko and Monarev (2005) is used to calculate the value of n, and we set k = n/B.
In the GCD test procedure, tests are conducted for two outputs of the GCD operation, namely the number of iterations required to find the GCD (k) and the GCD itself (g). The population distribution of k is well approximated by a normal distribution, and the parameters of this normal distribution are given by Marsaglia and Tsang (2002) for 32-bit integers after an extensive numerical study. We observed that the parameters of the population distribution differ across bit-lengths and conducted a numerical study to determine the values of the parameters for the considered bit-lengths. For this study, 10^6 30-bit true random numbers were obtained from the web service "www.random.org." Then, they were converted to 8, 16, 32, 64, and 128-bit numbers. The GCD operation was applied, and the mean (mu.GCD) and standard deviation (sd.GCD) of k were obtained as given in Table 3, after checking the normality of the empirical distribution by means of descriptive statistics and the Anderson-Darling goodness-of-fit test. The values obtained for 32-bit are very close to those obtained by Marsaglia and Tsang (2002). As expected, the mean of k increases with the bit-length and approaches 35, as noted by Marsaglia and Tsang (2002). The mild increase in the standard deviations is due to the increasing range of the numbers that can be generated with a given bit-length. The GCD test is applied for all bit-lengths. However, for nearly all 128-bit random numbers, g > 35. Due to the operation done at step 15 of Algorithm 4, it is unreasonable to conduct the GCD test over g for 128-bit numbers.
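The kind of numerical study described above can be sketched as follows. This Python example estimates the mean and standard deviation of k for a given bit-length from pairs of integers drawn with Python's pseudo random generator, which is an assumption made for reproducibility and differs from the true random numbers used in the study; the function names and the sample size are hypothetical, and the estimates are not meant to reproduce Table 3.

```python
import random
import statistics


def euclid_iterations(a, b):
    """Number of division steps the Euclidean algorithm needs for (a, b)."""
    k = 0
    while b != 0:
        a, b = b, a % b
        k += 1
    return k


def estimate_k_moments(bit_length, pairs, seed=1):
    """Estimate mean and standard deviation of k over random bit_length-bit pairs."""
    rng = random.Random(seed)  # pseudo random stand-in for true random input
    upper = 2 ** bit_length
    ks = [
        euclid_iterations(rng.randrange(1, upper), rng.randrange(1, upper))
        for _ in range(pairs)
    ]
    return statistics.mean(ks), statistics.stdev(ks)


for bits in (8, 16, 32, 64, 128):
    mu, sd = estimate_k_moments(bits, pairs=2000)
    print(bits, round(mu, 3), round(sd, 3))
```

Estimates of this kind would play the role of the mu.GCD and sd.GCD values passed to the GCD test.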
The topological binary test is also applied for all bit-lengths. Critical values for the topological binary test are calculated by using the function TBT.CriticalValue() for each combination of bit-length and sequence length and are presented in Table 4. Because the length of the sequence being tested cannot be longer than 2^m − 1, where m is the bit-length, critical values for medium and long sequences at the 8-bit level and for long sequences at the 16-bit level are not available in Table 4. In the application, random numbers were generated by Function 1 given in the R codes of this vignette. Experiments were carried out by Function 2 of the related R codes. In both functions, RNG is the number indicating the employed pseudo random number generator, m is the bit-length, and len is the length of the random number sequence. In the function experiments(), cv.TBT is the critical value of the topological binary test, and mu.GCD and sd.GCD are the mean and standard deviation of the true distribution of k, respectively.
Random number sequences used for the performance analysis are of the medium length given in Table 2 and are generated by the WH generator at each bit level. Five replications were made for each test. Mean implementation times calculated over the five replications are shown in Table 5 in seconds. All variances of the implementation times are less than 0.01. The BDS, RWT, and BS tests were not applied at all bit-lengths for the reasons explained in the relevant sections. Implementation times of all tests from the 8-bit to the 64-bit level are acceptable. For 128 bits, most of the implementation time of the Achi and GCD tests is taken by finding unique values in a sequence composed of multiple precision floating-point (mpf) numbers at step 4 of Algorithm 1 and by computing the gcd of mpf numbers at step 3 of Algorithm 4, respectively. For these operations, mpf numbers are used via the package Rmpfr. The package Rmpfr is based on the GNU GMP library and provides an interface from R to C (Maechler, 2011a,b). Due to the use of mpf numbers via the package Rmpfr, there is a considerable increase in the implementation time of the Achi and GCD tests at the 128-bit level. However, the gain in precision is worth the delay in the implementation of these tests. Performances of the tests working with binary numbers are all acceptable at the 128-bit level. The implementation time of the BS test increases exponentially with the bit-length. Although it is reasonable for 32 bits, applying the test for higher bit-lengths requires an unreasonable amount of time.
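The timing protocol described above (five replications, then the mean and variance of the elapsed times) can be sketched generically in Python. The helper name time_replications is hypothetical, and the timed function below is only a stand-in for a randomness test.

```python
import statistics
import time


def time_replications(test_function, data, replications=5):
    """Run test_function(data) several times; return mean and variance of elapsed seconds."""
    elapsed = []
    for _ in range(replications):
        start = time.perf_counter()
        test_function(data)
        elapsed.append(time.perf_counter() - start)
    return statistics.mean(elapsed), statistics.variance(elapsed)


# Stand-in workload: counting distinct values, similar in spirit to the step
# that dominates the running time of the Achi test for long integers.
sample = [i * 7919 % 65536 for i in range(100000)]
mean_time, var_time = time_replications(lambda x: len(set(x)), sample)
print(mean_time, var_time)
```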
All the tests were applied at both the 0.01 and 0.05 levels of significance. The null hypothesis is "H0: Sequences generated by the RNG of interest are random" for all tests. At the 0.05 level of significance, test results for all generators of interest are given in Tables 6-12. Due to the similarity between the results at both levels of significance, those for the 0.01 level are omitted. For both levels of significance, success rates of the RNGs over the total number of applied tests are given in Table 13. The total number of applied tests is given in the last row of Table 6 for each test scenario. For example, because the birthday spacings test is not applied for the 64-bit length, the total number of applied tests is 17 for all sequence lengths. Note that the values given in Table 13 should not be confused with issues related to the statistical performance of the tests, such as type-I error or power. Table 13 represents the proportion of RNGs that did not fail in the given number of tests. In addition, because each test is applied individually, the information presented in Table 13 should not be perceived as the result of applying a test battery.
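A success rate of this kind is simply the number of passed tests divided by the number of applied tests, with unavailable tests left out of the denominator. The following Python sketch shows the arithmetic; the function name and the encoding (1 for pass, 0 for fail, None for a test that was not applied) are assumptions made for the example.

```python
def proportion_of_success(outcomes):
    """Share of passed tests among applied tests; None marks a test that was not applied."""
    applied = [o for o in outcomes if o is not None]
    if not applied:
        return None  # no test was applicable in this scenario
    return sum(applied) / len(applied)


# Hypothetical scenario: five tests applied, one not available, four passed.
print(proportion_of_success([1, 1, 0, None, 1, 1]))  # 0.8
```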
In general, the proportion of success decreases with increasing sequence and bit-lengths. According to the proportions of success, the performance of the WH generator is satisfactory for 16 and 32-bit numbers for all sequence lengths. The success rate decreases with increasing bit-length because the random walk tests with all goodness-of-fit tests and the GCD test with the Jarque-Bera goodness-of-fit test reject the randomness hypothesis, while the rest of the tests mostly accept the hypothesis for bit-lengths greater than 32. In detail, the WH generator passes both the TBT and Achi tests in nearly all combinations of bit-length and sequence length. Results of the AD and KS goodness-of-fit tests applied under both the BDS and GCD tests (with k) are similar, and the CS test is more likely to decide in favor of the randomness of the WH generator. The WH generator does not pass the random walk tests for high bit-lengths. The BS test concludes that WH is random under all of the test conditions. The GCD test with the JB goodness-of-fit test rejects the null hypothesis of randomness under all test conditions but the first one. At the 0.01 level of significance, there is nearly no change in the results. The WH generator passes the GCD test with the CS goodness-of-fit test for k at the (8, I), (8, II), and (32, I) scenarios, and the BDS test with the AD goodness-of-fit test at (16, II).
According to the proportions of success, the SD generator mostly passes the tests for 16 and 32-bit integers for all sequence lengths, and for 8-bit integers for short and long sequences. Detailed test results for the SD generator at the 0.05 level of significance are similar to those of the WH generator for the TBT, Achi, BDS, RWT, and BS tests. It performs better in the GCD test with the JB goodness-of-fit test for k.
At 0.01 level of significance, CS [...]

[Tables 6-12, surviving rows]

Bit-length        8          16          32          64         128
Test          I  II III   I  II III   I  II III   I  II III   I  II III

TBT           1   1   1   1   1   0   1   0   1   1   1   1   1   1   1
Achi          1   0   0   1   1   1   1   1   1   1   1   1   1   1

TBT           0   1   1   0   0   0   1   0   1   1   1   1   1   1   1
Achi          1   0   0   1   1   1   1   1   1   1   1   1   1   1

TBT           1   1   1   1   1   0   1   0   1   1   1   1   1   1   1
Achi          1   0   0   1   1   1   1   1   1   1   1   1   1   1

TBT           1   1   1   1   1   0   1   1   1   1   1   1   1   1   1
Achi          1   0   0   1   1   1   1   1   1   1   1   1   1   1   1
BDS.AD        1   0   0   1   1   1   -   -   -   -   -   -

TBT           1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
Achi          1   0   0   1   1   1   1   1   1   1   1   1   1   1

TBT           1   1   1   1   1   0   1   1   1   1   1   1   1   1   1
Achi          1   0   0   1   1   1   1   1   1   1   1   1   1   1   1
BDS.AD        1   0   0   1   1   1   -   -   -   -   -   -   -   -   -
BDS.KS        1   0   0   1   1   1   -   -   -   -   -   -   -   -   -
BDS.CS        0   1   1   0   1   1   -   -   -   -   -   -   -   -   -

I: short sequence for given bit-length, II: medium sequence for given bit-length, III: long sequence for given bit-length, AD: Anderson-Darling, CS: chi-square, KS: Kolmogorov-Smirnov, JB: Jarque-Bera, Prop: proportion of success in applied tests, -: not available.