FarmTest: An R Package for Factor-Adjusted Robust Multiple Testing

Abstract:

We provide a publicly available library FarmTest in the R programming system. This library implements the factor-adjusted robust multiple testing procedure proposed by Fan et al. (2019) for large-scale simultaneous inference on mean effects. We use a multi-factor model to explicitly capture the dependence among a large pool of variables. Three types of factors are considered: observable, latent, and a mixture of observable and latent factors. The non-factor case, which corresponds to standard multiple mean testing under weak dependence, is also included. The library implements a series of adaptive Huber methods, integrated with fast data-driven tuning schemes, to estimate model parameters and to construct test statistics that are robust against heavy-tailed and asymmetric error distributions. Extensions to two-sample multiple mean testing problems are also discussed. The results of simulation experiments and a real data analysis are reported.


Published

Jan. 15, 2021

Received

Jun 3, 2020

Citation

Fan, et al., 2021

Volume

12, Issue 2

Pages

389–402


1 Introduction

In the era of big data, large-scale multiple testing problems arise from a wide range of fields, including biological sciences such as genomics and neuroimaging, social science, signal processing, marketing analytics, and financial economics. When testing multitudinous statistical hypotheses simultaneously, researchers seek statistically significant evidence against the null hypotheses with a guarantee of controlled false discovery rate (FDR). Since the seminal work of Benjamini and Hochberg (1995), multiple testing with FDR control has been extensively studied and successfully applied in many fields. Most existing testing procedures are tailored to independent or weakly dependent hypotheses or tests; see Storey (2002), Genovese and Wasserman (2004), and Lehmann and Romano (2005), to name a few. The independence assumption, however, is restrictive in real applications, as correlation effects are ubiquitous in high-dimensional measurements. Ignoring such strong dependence and directly applying standard FDR-controlling procedures can lead to inaccurate false discovery control, loss of statistical power, and unreliable scientific conclusions.

Over the past decade, multi-factor models have proven to be an effective tool for modeling cross-sectional dependence, with applications in genomics, neuroscience, and financial economics. Related references in the context of multiple testing include Leek and Storey (2008), Friguet et al. (2009), Fan et al. (2012), Desai and Storey (2011), Fan and Han (2017), and Lan and Du (2019). A common thread of the aforementioned works is that the construction of test statistics and p-values relies heavily on an assumed joint normality of factors and noise, an assumption that is frequently called into question for high-dimensional data. It is therefore imperative to develop large-scale multiple testing tools that properly adjust for cross-sectional dependence and, at the same time, are robust to heavy-tailedness.

Recently, Fan et al. (2019) developed a Factor-Adjusted Robust Multiple Test (FarmTest) procedure for large-scale simultaneous inference with highly correlated and heavy-tailed data. Their emphasis is on achieving robustness against both strong cross-sectional dependence and heavy-tailed sampling distributions. Specifically, let $X = (X_1, \ldots, X_p)^\top$ be a random vector with mean $\mu = (\mu_1, \ldots, \mu_p)^\top$. We are interested in testing the $p$ hypotheses $H_{0j}: \mu_j = 0$, and wish to find a multiple comparison procedure that tests the individual hypotheses while controlling the FDR. The FarmTest method models the dependence among the $X_j$'s through an approximate multi-factor model, namely $X_j = \mu_j + b_j^\top f + u_j$, where $f$ is a zero-mean random vector capturing the dependence structure of $X$. The method applies to either observable or unobservable factors $f$. The former includes the non-factor case, which corresponds to the standard multiple mean testing problem. For the latter, we estimate the factors in a data-driven way. Test statistics are then calculated by subtracting out the realized common factors, and multiple comparisons are applied to these weakly dependent factor-adjusted test statistics. Moreover, adjusting for the factors before testing improves the signal-to-noise ratio, which enhances statistical power. Since a data-driven eigenvalue ratio method is used to estimate the number of (latent) factors, the testing procedure still works when the dependence is weak and is therefore rather flexible.

This article describes an R library named FarmTest, which implements the FarmTest procedures developed in Fan et al. (2019). It is a user-friendly tool for conducting large-scale hypothesis testing, especially when one or several of the following scenarios are present: the dimensionality is far larger than the available sample size; the data are heavy-tailed and/or asymmetric; there is strong cross-sectional dependence among the data. FarmTest is implemented using the Armadillo library (Sanderson and Curtin, 2016) with Rcpp interfaces (Eddelbuettel and Francois, 2011; Eddelbuettel and Sanderson, 2014). A simple call to the FarmTest package only requires the input of a data matrix and the null hypotheses to be tested. It outputs the hypotheses that are rejected, along with the p-values and some estimated parameters which may be of use in further analysis. Testing can be carried out for both one-sample and two-sample problems.

Another key feature of our package is that it implements several recently developed robust methods for fitting regression models and estimating covariance matrices (Sun et al., 2020; Ke et al., 2019). When data are generated from a heavy-tailed distribution, test statistics based on the least-squares method are sensitive to outliers, which often causes significant false discoveries and suboptimal power. The effect of heavy-tailedness is amplified by high dimensionality: even moderately heavy-tailed distributions can generate very large outliers by chance, making it difficult to separate true signals from spurious variables. As a result, large-scale multiple testing based on non-robust statistics may engender an excessive false discovery rate, which is arguably one of the causes of the current reproducibility crisis in science. Moreover, to choose the multiple tuning parameters in robust regression and covariance estimation, we employ the recently developed data-driven procedures of Ke et al. (2019) and Wang et al. (2020), which are particularly designed for adaptive Huber regression and are considerably faster than the cross-validation method used in Fan et al. (2019).

We further remark that most existing multiple testing R packages do not address robustness against both heavy-tailed distributions and strong dependence. The built-in hypothesis testing function t.test neither adjusts for strong dependence in the data nor estimates the parameters of interest robustly. The built-in function p.adjust and the package qvalue only adjust user-supplied p-values for multiple testing and do not address the problem of estimating the p-values themselves. The package multcomp provides simultaneous testing tools for general linear hypotheses in parametric models under the assumption that the central limit theorem holds. The package multtest implements nonparametric bootstrap and permutation resampling-based multiple testing procedures. multtest can calculate test statistics based on ranked data, which are robust against outliers but yield biased mean estimates; in addition, multtest cannot explicitly model the dependence structure in the data. The package mutoss is designed to apply many existing multiple hypothesis testing procedures with FDR control and p-value correction. Nevertheless, none of the tools in mutoss is suited to handle both strong dependence and heavy-tailedness. Moreover, existing packages are often difficult to navigate, since users need to combine many functions to perform multiple tests.

2 Factor-adjusted robust multiple testing

In this section, we revisit the problem of simultaneous inference on mean effects under a factor model and discuss the main ideas behind the FarmTest method developed by Fan et al. (2019).

Multiple testing with false discovery rate control

Suppose we observe $n$ independent data vectors $X_1, \ldots, X_n$ drawn from a $p$-dimensional random vector $X = (X_1, \ldots, X_p)^\top$. Further, let $\mu = (\mu_1, \ldots, \mu_p)^\top$ and $\Sigma = (\sigma_{jk})_{1 \le j,k \le p}$ denote the mean vector and covariance matrix of $X$, respectively. In the language of hypothesis testing, we are interested in one of the following three types of hypotheses:

$$H_{0j}: \mu_j = h_j^0 \quad \text{versus} \quad H_{1j}: \mu_j \neq h_j^0; \tag{1}$$

$$H_{0j}: \mu_j \le h_j^0 \quad \text{versus} \quad H_{1j}: \mu_j > h_j^0; \tag{2}$$

$$H_{0j}: \mu_j \ge h_j^0 \quad \text{versus} \quad H_{1j}: \mu_j < h_j^0; \tag{3}$$

for $j = 1, \ldots, p$. In the default setting, $h_j^0 = 0$ for all $j$.

Here we take the two-sided test (1) as an example to discuss false discovery rate (FDR) control. For $1 \le j \le p$, let $T_j$ be a generic test statistic for the $j$th hypothesis. Given a prespecified threshold $z > 0$, we reject the $j$th null hypothesis if $|T_j| \ge z$. The FDR is defined as the expected value of the false discovery proportion (FDP): $\mathrm{FDR}(z) = \mathbb{E}\{\mathrm{FDP}(z)\}$ with $\mathrm{FDP}(z) = V(z) / \max\{R(z), 1\}$, where $R(z) = \sum_{j=1}^p \mathbb{1}(|T_j| \ge z)$ is the total number of rejections and $V(z) = \sum_{j: \mu_j = h_j^0} \mathbb{1}(|T_j| \ge z)$ is the number of false discoveries. If $\mathrm{FDP}(z)$ were known, the rejection threshold would be $z_\alpha = \inf\{z \ge 0 : \mathrm{FDP}(z) \le \alpha\}$ in order to achieve FDP control. Notice that $R(z)$ is observable given the data, while $V(z)$ is an unobserved random quantity that needs to be estimated.

Assume that there are $p_0 = \pi_0 p$ true nulls and $p_1 = (1 - \pi_0) p$ true alternatives. Suppose that, for every $j = 1, \ldots, p$, the test statistic $T_j$ is close in distribution to standard normal, and that the test statistics are weakly dependent. Heuristically, the number of false discoveries $V(z)$ is then close to $2 p_0 \Phi(-z)$ for any $z \ge 0$. A conservative approach is to replace $V(z)$ by $2 p \Phi(-z)$: assuming the normal approximation is sufficiently accurate, $2 p \Phi(-z)$ overestimates the number of false discoveries, and hence yields a conservative estimate of $\mathrm{FDP}(z)$. A more accurate method is to estimate the unknown proportion of true nulls $\pi_0 = p_0 / p$ from the data. Let $\{P_j = 2\Phi(-|T_j|)\}_{j=1}^p$ be the approximate p-values. For a predetermined $\lambda \in [0, 1)$, Storey (2002) suggests estimating $\pi_0$ by $\hat\pi_0(\lambda) = \{(1 - \lambda) p\}^{-1} \sum_{j=1}^p \mathbb{1}(P_j > \lambda)$, because larger p-values are more likely to come from true null hypotheses. Consequently, a data-driven rejection threshold is $\hat z_\alpha = \inf\{z \ge 0 : \widehat{\mathrm{FDP}}(z) \le \alpha\}$, where $\widehat{\mathrm{FDP}}(z) = 2 \hat\pi_0(\lambda) p \Phi(-z) / R(z)$.
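To make this concrete, the following minimal R sketch (our own illustration, not code from the package) computes $\hat\pi_0(\lambda)$ and the data-driven threshold $\hat z_\alpha$ from a vector of test statistics assumed to be approximately standard normal under the null:

# Sketch: Storey's estimate of pi0 and the data-driven threshold z.alpha.
# `tstat` is a hypothetical vector of test statistics.
fdp.threshold <- function(tstat, alpha = 0.05, lambda = 0.5) {
  p <- length(tstat)
  pval <- 2 * pnorm(-abs(tstat))                    # two-sided p-values
  pi0 <- sum(pval > lambda) / ((1 - lambda) * p)    # Storey's estimate
  z <- sort(abs(tstat))                             # candidate thresholds
  # R(z) = #{|T_j| >= z}: at the i-th smallest |T|, p - i + 1 rejections
  fdp.hat <- 2 * pi0 * p * pnorm(-z) / (p:1)
  pass <- which(fdp.hat <= alpha)
  z.alpha <- if (length(pass) > 0) z[min(pass)] else Inf
  list(pi0 = pi0, threshold = z.alpha, reject = which(abs(tstat) >= z.alpha))
}
out <- fdp.threshold(c(rnorm(1000), rnorm(50, mean = 4)))  # 1000 nulls, 50 signals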

Factor-adjusted test statistics

In this section, we discuss the construction of test statistics under strong cross-sectional dependence captured by common factors. Specifically, we allow the $p$ coordinates of $X$ to be strongly correlated through an approximate factor model of the form $X = \mu + Bf + u$, where $B = (b_1, \ldots, b_p)^\top \in \mathbb{R}^{p \times K}$ represents the factor loading matrix, $f = (f_1, \ldots, f_K)^\top \in \mathbb{R}^K$ is the common factor, and $u = (u_1, \ldots, u_p)^\top \in \mathbb{R}^p$ denotes a vector of idiosyncratic errors uncorrelated with $f$. The observed samples thus follow

$$X_i = \mu + B f_i + u_i, \quad i = 1, \ldots, n, \tag{4}$$

where the $(f_i, u_i)$'s are independent copies of $(f, u)$. Assume that both $f$ and $u$ have zero means. Further, denote by $\Sigma_f$ and $\Sigma_u = (\sigma_{u,jk})_{1 \le j,k \le p}$ the covariance matrices of $f$ and $u$, respectively.

Our package allows the common factor $f$ to be either observable or unobservable. In the former case, we observe $\{(X_i, f_i)\}_{i=1}^n$, so that model (4) reduces to a multi-response linear regression problem; in the latter, we only observe $\{X_i\}_{i=1}^n$ and therefore need to recover the latent factors. The latent factor model has identifiability issues; see Bai and Li (2012) for a set of possible solutions. For simplicity, we assume that $\Sigma_f = I_K$ and that $B^\top B$ is diagonal.

Robust estimation

As another key feature, the FarmTest method is robust against heavy-tailed sampling distributions, under which ordinary least squares estimators can be suboptimal. Recently, Zhou et al. (2018) and Sun et al. (2020) proposed the adaptive Huber regression method, the core of which is Huber's M-estimator (Huber, 1964) with a properly calibrated robustification parameter that adapts to the sample size, dimensionality, and noise level. They showed that the adaptive Huber estimator admits a sub-Gaussian-type deviation bound under mild moment conditions. This package exploits this approach to estimate the unknown parameters and to construct test statistics.
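As a toy illustration of the idea (ours, not the package's internal routine, which uses the data-driven calibration discussed in Section 3.4), one can minimize the Huber loss with a robustification parameter that grows with the sample size:

# Toy Huber M-estimate of a location parameter with an adaptive tau.
# The rate tau ~ sd(x) * sqrt(n / log n) mimics the cited theory; the
# constant is an arbitrary choice for this sketch.
huber.loss <- function(r, tau) {
  ifelse(abs(r) <= tau, r^2 / 2, tau * abs(r) - tau^2 / 2)
}
adaptive.huber.mean <- function(x) {
  n <- length(x)
  tau <- sd(x) * sqrt(n / log(n))   # robustification parameter
  optimize(function(mu) sum(huber.loss(x - mu, tau)), range(x))$minimum
}
set.seed(1)
x <- rt(100, df = 3)                # heavy-tailed sample
adaptive.huber.mean(x)              # compare with the non-robust mean(x)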

3 Algorithms

In this section, we formally describe the algorithms for the FarmTest procedure. We revisit and discuss the procedures for the two scenarios with observable and latent factors. Notice that the two scenarios are inherently different in terms of estimating unknown parameters and constructing test statistics. Moreover, the selection of tuning parameters is based on the recent methods proposed by Ke et al. (2019) and Wang et al. (2020).

Observable factors

Suppose we observe independent data vectors $\{(X_i, f_i)\}_{i=1}^n$ from model (4). The testing procedure for the hypotheses in (1)–(3) is described in Algorithm 1, which automatically selects the robustification parameters $\{\tau_j, \upsilon_j\}_{j=1}^p$ following the data-driven methods of Ke et al. (2019) and Wang et al. (2020); see Section 3.4 for more details. To enhance the finite-sample performance, we can alternatively use the weighted/multiplier bootstrap to compute p-values for all the marginal hypotheses. For $b = 1, \ldots, B$, we obtain the corresponding bootstrap draw of $(\hat\mu_j, \hat b_j)$ via $(\hat\mu_{b,j}, \hat b_{b,j}) = \operatorname{argmin}_{\mu, b} \sum_{i=1}^n w_{b,ij} \, \ell_{\tau_j}(X_{ij} - \mu - f_i^\top b)$, where the $\{w_{b,ij} : i = 1, \ldots, n, \; j = 1, \ldots, p\}$ are independent and identically distributed (iid) random variables that are independent of the data and satisfy $\mathbb{E}(w_{b,ij}) = 1$ and $\operatorname{var}(w_{b,ij}) = 1$. To retain convexity of the loss function, nonnegative random weights are preferred, such as $w_{b,ij} \sim \mathrm{Exp}(1)$, the exponential distribution with rate 1, or $w_{b,ij} \sim 2 \, \mathrm{Ber}(1/2)$, that is, $\mathbb{P}(w_{b,ij} = 0) = \mathbb{P}(w_{b,ij} = 2) = 1/2$. For two-sided alternatives, the bootstrap p-values are then defined as $P_j = (1/B) \sum_{b=1}^B \mathbb{1}(|\hat\mu_{b,j} - \hat\mu_j| \ge |\hat\mu_j|)$, followed by Steps 5–7 of Algorithm 1.
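The next sketch (again illustrative; it reuses the toy huber.loss function above and omits the factor term for simplicity) shows the mechanics of the multiplier bootstrap p-value for a single coordinate with Exp(1) weights:

# Multiplier-bootstrap p-value for one coordinate j; `xj` is the j-th
# column of the data matrix, huber.loss is the toy loss defined earlier.
boot.pvalue <- function(xj, B = 500) {
  n <- length(xj)
  tau <- sd(xj) * sqrt(n / log(n))
  wfit <- function(w) {             # weighted Huber location estimate
    optimize(function(mu) sum(w * huber.loss(xj - mu, tau)), range(xj))$minimum
  }
  mu.hat <- wfit(rep(1, n))                   # estimate on the original data
  mu.boot <- replicate(B, wfit(rexp(n)))      # Exp(1) weights: mean 1, var 1
  mean(abs(mu.boot - mu.hat) >= abs(mu.hat))  # two-sided bootstrap p-value
}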

Input: Data $\{(X_i, f_i)\}_{i=1}^n$, null hypotheses $\{h_j^0\}_{j=1}^p$, and $\alpha, \lambda \in (0, 1)$

  1. For $j = 1, \ldots, p$, obtain the Huber estimators

    $(\hat\mu_j, \hat b_j) \in \operatorname{argmin}_{\mu, b} \sum_{i=1}^n \ell_{\tau_j}(X_{ij} - \mu - f_i^\top b)$.

  2. Estimate the residual variances $\sigma_{u,jj}$: compute

    (i)

    $\hat\Sigma_f = (1/n) \sum_{i=1}^n f_i f_i^\top$ and $\hat\theta_j \in \operatorname{argmin}_\theta \sum_{i=1}^n \ell_{\upsilon_j}(X_{ij}^2 - \theta)$ for $j = 1, \ldots, p$;

    (ii)

    $\hat\sigma_{u,jj} = \hat\theta_j - \hat\mu_j^2 - \hat b_j^\top \hat\Sigma_f \hat b_j$ if $\hat\theta_j > \hat\mu_j^2 + \hat b_j^\top \hat\Sigma_f \hat b_j$; otherwise $\hat\sigma_{u,jj} = \hat\theta_j$.

  3. Construct the test statistics $T_j = \sqrt{n / \hat\sigma_{u,jj}} \, (\hat\mu_j - h_j^0)$ for $j = 1, \ldots, p$.

  4. Compute the p-values: $P_j = 2\Phi(-|T_j|)$ for (1), $P_j = \Phi(-T_j)$ for (2), or $P_j = \Phi(T_j)$ for (3).

  5. Estimate the proportion of true nulls: $\hat\pi_0(\lambda) = \mathrm{Card}\{j : P_j > \lambda\} / \{(1 - \lambda) p\}$.

  6. Order the p-values as $P_{(1)} \le \cdots \le P_{(p)}$ and compute the rejection threshold $t := \max\{1 \le j \le p : P_{(j)} \le \alpha j / (\hat\pi_0(\lambda) p)\}$.

  7. Reject each hypothesis in the set $\{1 \le j \le p : P_j \le P_{(t)}\}$.

Output: Rejected hypotheses, p-values, other estimated parameters
Algorithm 1: FarmTest with known factors
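In R, Steps 5–7 amount to a Storey-adjusted step-up rule on the p-values; a compact sketch (ours, with hypothetical inputs):

# Sketch of Steps 5-7: Storey-adjusted step-up rejection rule.
reject.farm <- function(pval, alpha = 0.05, lambda = 0.5) {
  p <- length(pval)
  pi0 <- sum(pval > lambda) / ((1 - lambda) * p)      # Step 5
  ord <- order(pval)                                  # Step 6: sort p-values
  ok <- which(pval[ord] <= alpha * seq_len(p) / (pi0 * p))
  if (length(ok) == 0) return(integer(0))             # nothing rejected
  which(pval <= pval[ord][max(ok)])                   # Step 7: reject P_j <= P_(t)
}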

An extension of Algorithm 1 to the two-sample problem is also implemented in the package. Suppose we observe two independent samples $\{(X_i, f_i^X)\}_{i=1}^{n_1}$ and $\{(Y_i, f_i^Y)\}_{i=1}^{n_2}$ from the models

$$X = \mu^X + B^X f^X + u^X \quad \text{and} \quad Y = \mu^Y + B^Y f^Y + u^Y. \tag{5}$$

We are interested in the $p$ hypotheses $H_{0j}: \mu_j^X - \mu_j^Y = h_j^0$ versus $H_{1j}: \mu_j^X - \mu_j^Y \neq h_j^0$, or versus one-sided alternatives. To begin with, we apply Steps 1–2 of Algorithm 1 separately to each sample to obtain the estimates $\{(\hat\mu_j^X, \hat\mu_j^Y)\}_{j=1}^p$ and $\{(\hat\sigma_{u,jj}^X, \hat\sigma_{u,jj}^Y)\}_{j=1}^p$. Next, we define the two-sample counterparts of the test statistics in Step 3 as $T_j = (\hat\mu_j^X - \hat\mu_j^Y - h_j^0) / \sqrt{\hat\sigma_{u,jj}^X / n_1 + \hat\sigma_{u,jj}^Y / n_2}$ for $j = 1, \ldots, p$. We then follow Steps 4–7 of Algorithm 1 to obtain the p-values and rejected hypotheses.
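Given the per-sample estimates, the two-sample statistics reduce to a one-liner; a sketch with hypothetical inputs:

# Two-sample factor-adjusted test statistics, given the estimated means
# and residual variances from each sample (all vectors of length p).
two.sample.stat <- function(muX, muY, varX, varY, n1, n2, h0 = 0) {
  (muX - muY - h0) / sqrt(varX / n1 + varY / n2)
}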

Latent factors

In this section, suppose we are given independent observations $\{X_i\}_{i=1}^n$, where the strong dependence among the coordinates of $X_i$ is captured by latent factors $f_i$. Owing to the need to recover the latent factors from the data, the corresponding testing procedure is more involved. We summarize the major steps in Algorithm 2. All the tuning parameters required by Algorithm 2, $\{\tau_j, \upsilon_j\}_{j=1}^p$ and $\{\upsilon_{jk}\}_{1 \le j < k \le p}$, are automatically selected from the data; see Section 3.4.

Input: Data $\{X_i\}_{i=1}^n$, null hypotheses $\{h_j^0\}_{j=1}^p$, and $\alpha, \lambda \in (0, 1)$

  1. For $j = 1, \ldots, p$, compute

    • $\hat\mu_j = \operatorname{argmin}_\mu \sum_{i=1}^n \ell_{\tau_j}(X_{ij} - \mu)$ and $\hat\theta_j = \operatorname{argmin}_\theta \sum_{i=1}^n \ell_{\upsilon_j}(X_{ij}^2 - \theta)$,
    • $\hat\sigma_{jj} = \hat\theta_j - \hat\mu_j^2$ if $\hat\theta_j > \hat\mu_j^2$, and $\hat\sigma_{jj} = \hat\theta_j$ otherwise.
  2. Define the paired data $\{Y_1, Y_2, \ldots, Y_N\} = \{X_1 - X_2, X_1 - X_3, \ldots, X_{n-1} - X_n\}$, where $N = n(n-1)/2$. For $1 \le j < k \le p$, compute

    • $\hat\sigma_{jk} = \operatorname{argmin}_\theta \sum_{i=1}^N \ell_{\upsilon_{jk}}(Y_{ij} Y_{ik} / 2 - \theta)$, and set $\hat\sigma_{kj} = \hat\sigma_{jk}$.
  3. Define the covariance matrix estimator $\hat\Sigma = (\hat\sigma_{jk})_{1 \le j,k \le p}$.

    • Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ be the ordered eigenvalues of $\hat\Sigma$, and denote by $v_1, v_2, \ldots, v_p$ the corresponding eigenvectors.
    • Calculate $\hat K = \operatorname{argmax}_{1 \le k \le \min(n,p)/2} \lambda_k / \lambda_{k+1}$. This step is omitted if $K$ is user-specified.
    • Calculate $\hat B = (\hat b_1, \ldots, \hat b_p)^\top = (\lambda_1^{1/2} v_1, \ldots, \lambda_K^{1/2} v_K) \in \mathbb{R}^{p \times K}$.
  4. Compute $\bar f = \operatorname{argmin}_{f \in \mathbb{R}^K} \sum_{j=1}^p \ell_\gamma(\bar X_j - \hat b_j^\top f)$, where $\bar X_j = (1/n) \sum_{i=1}^n X_{ij}$.

  5. For $j = 1, \ldots, p$, compute $\hat\sigma_{u,jj} = \hat\sigma_{jj} - \|\hat b_j\|_2^2$ if $\hat\sigma_{jj} > \|\hat b_j\|_2^2$, and $\hat\sigma_{u,jj} = \hat\sigma_{jj}$ otherwise.

  6. Construct the test statistics $T_j = \sqrt{n / \hat\sigma_{u,jj}} \, (\hat\mu_j - \hat b_j^\top \bar f - h_j^0)$, $j = 1, \ldots, p$.

  7. Compute the p-values: $P_j = 2\Phi(-|T_j|)$ for (1), $P_j = \Phi(-T_j)$ for (2), or $P_j = \Phi(T_j)$ for (3).

  8. Estimate the proportion of true nulls: $\hat\pi_0(\lambda) = \mathrm{Card}\{j : P_j > \lambda\} / \{(1 - \lambda) p\}$.

  9. Order the p-values as $P_{(1)} \le \cdots \le P_{(p)}$ and compute the rejection threshold $t := \max\{1 \le j \le p : P_{(j)} \le \alpha j / (\hat\pi_0(\lambda) p)\}$.

  10. Reject each hypothesis in the set $\{1 \le j \le p : P_j \le P_{(t)}\}$.

Output: Rejected hypotheses, p-values, other estimated parameters
Algorithm 2: FarmTest with latent factors
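To see the mechanics of Steps 2–3, here is a non-robust stand-in (ours) that uses plain sample averages in place of the Huber minimizations, applied to a generic data matrix X such as the one generated in Section 4.1:

# Non-robust stand-in for Steps 2-3: covariance from pairwise differences,
# eigenvalue-ratio choice of K, and the loading matrix estimate.
pairwise.cov <- function(X) {
  n <- nrow(X)
  idx <- t(combn(n, 2))                 # all N = n(n-1)/2 pairs (i, i')
  D <- X[idx[, 1], ] - X[idx[, 2], ]    # paired differences Y_1, ..., Y_N
  crossprod(D) / (2 * nrow(D))          # averages of Y_ij * Y_ik / 2
}
Sigma.hat <- pairwise.cov(X)
ev <- eigen(Sigma.hat, symmetric = TRUE)
Kmax <- floor(min(nrow(X), ncol(X)) / 2)
K.hat <- which.max(ev$values[1:Kmax] / ev$values[2:(Kmax + 1)])
B.hat <- ev$vectors[, 1:K.hat, drop = FALSE] %*% diag(sqrt(ev$values[1:K.hat]), K.hat)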

An extension of Algorithm 2 to the two-sample problem is also included in the library. Suppose we observe two independent samples $\{X_i\}_{i=1}^{n_1}$ and $\{Y_i\}_{i=1}^{n_2}$, and wish to test the hypotheses $H_{0j}: \mu_j^X - \mu_j^Y = h_j^0$ versus $H_{1j}: \mu_j^X - \mu_j^Y \neq h_j^0$, or one-sided alternatives. In this case, Steps 1–5 of Algorithm 2 are applied separately to each sample to obtain the estimates $\{(\hat\mu_j^X, \hat\mu_j^Y)\}_{j=1}^p$, $\{(\hat\sigma_{u,jj}^X, \hat\sigma_{u,jj}^Y)\}_{j=1}^p$, $\hat B^X$, $\hat B^Y$, $\bar f^X$, and $\bar f^Y$. After replacing the test statistics in Step 6 with

$$T_j = \frac{(\hat\mu_j^X - \langle \hat b_j^X, \bar f^X \rangle) - (\hat\mu_j^Y - \langle \hat b_j^Y, \bar f^Y \rangle) - h_j^0}{\sqrt{\hat\sigma_{u,jj}^X / n_1 + \hat\sigma_{u,jj}^Y / n_2}}, \quad j = 1, \ldots, p,$$

one can follow Steps 7–10 to obtain the p-values and rejected hypotheses.

Partially observable factors

Motivated by applications to comparative microarray experiments and mutual fund selection (Friguet et al., 2009; Lan and Du, 2019), we further discuss the case where both explanatory variables and latent factors are present. The statistical model is of the form

$$X_i = \mu + B f_i + C g_i + u_i, \quad i = 1, \ldots, n,$$

where $f_i \in \mathbb{R}^K$ is a vector of explanatory variables and $g_i \in \mathbb{R}^L$ represents the latent factors. Here $L \ge 1$ may be user-specified or unknown. For multiple comparison of the mean effects under this model, the FarmTest package can be used in a two-stage fashion. In the first stage, we apply Algorithm 1 to fit model (4) with the observed data $\{(X_i, f_i)\}_{i=1}^n$ and compute the fitted residuals $X_i^{\mathrm{res}} = X_i - \hat B f_i$; in the second stage, we run Algorithm 2 on $\{X_i^{\mathrm{res}}\}_{i=1}^n$ to conduct factor-adjusted multiple testing.
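In code, the two-stage analysis might look as follows (a sketch, with X and FX as in the showcase example of Section 4.1: farm.test and its loadings output are as documented in Table 1 below, while the residual bookkeeping is our own):

# Stage 1: adjust for the observed factors FX and extract the loadings.
fit1 <- farm.test(X, fX = FX)
X.res <- X - FX %*% t(fit1$loadings)   # fitted residuals X_i - B.hat f_i
# Stage 2: factor-adjusted testing on the residuals, with the latent
# factors g_i estimated internally by Algorithm 2.
fit2 <- farm.test(X.res)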

Selection of tuning parameters

The FarmTest procedure involves multiple tuning parameters, including the number of factors $K$ (if not specified by the user) and the robustification parameters for fitting the factor models. For the former, we apply the eigenvalue ratio method (Lam and Yao, 2012; Ahn and Horenstein, 2013) to estimate $K$, that is, $\hat K = \operatorname{argmax}_{1 \le k \le K_{\max}} \lambda_k(\hat\Sigma) / \lambda_{k+1}(\hat\Sigma)$, where $\hat\Sigma$ is a generic covariance matrix estimator with eigenvalues $\lambda_1(\hat\Sigma) \ge \cdots \ge \lambda_p(\hat\Sigma)$, and $K_{\max}$ is a prescribed upper bound. In the library, we take $K_{\max} = \min(n, p)/2$. This method is chosen because it involves no hyperparameters other than $K_{\max}$. When the factors are unobservable, the estimation of $K$ is essentially an unsupervised learning problem; we choose $K$ to be the smallest nonnegative integer such that the residuals $X_i - B f_i$ are weakly correlated, so slight overestimation of $K$ does not much affect the testing results. If $K$ is set to zero, the FarmTest library directly applies a robust multiple testing procedure based on Huber's M-estimation combined with the multiplier bootstrap; see Zhou et al. (2018) and Chen and Zhou (2019) for more details.

The robustification parameter in the Huber loss plays an important role in controlling the bias-robustness tradeoff. According to the theoretical analysis in Fan et al. (2019), the optimal choice of $\tau_j$ in Algorithm 1 depends on the variance of $X_j$. Due to heterogeneity, there are $p$ different $\tau_j$'s that need to be selected from the data. Furthermore, the covariance estimation step in Algorithm 2 entails as many as $p(p-1)/2$ parameters $\upsilon_{jk}$, so cross-validation is computationally expensive when the dimension is large. Recently, Ke et al. (2019) and Wang et al. (2020) proposed fast data-driven methods that estimate the regression coefficients/covariances and calibrate the tuning parameters simultaneously by solving a system of equations. Numerical studies therein suggest that these data-driven methods are considerably faster than cross-validation while performing equally well.

4 Package overview

The FarmTest package is publicly available from the Comprehensive R Archive Network (CRAN) and its GitHub page https://github.com/XiaoouPan/FarmTest. It contains four core functions. The main function farm.test carries out the entire FarmTest procedure and outputs the testing results along with several useful estimated model parameters. User-friendly summary, print, and plot functions that summarize and visualize the test outcome accompany farm.test. The other three functions, huber.mean, huber.cov, and huber.reg, implement data-driven robust methods for estimating the mean vector, the covariance matrix, and regression coefficients, respectively. In particular, the huber.reg function uses the gradient descent algorithm with the Barzilai–Borwein step size (Barzilai and Borwein, 1988). In this section, we focus primarily on introducing the farm.test function and demonstrate its usage with numerical experiments.
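For instance, the three estimation routines can be called directly on simulated heavy-tailed data; a quick sketch with default settings (the exact argument lists are documented on the package's help pages):

library(FarmTest)
set.seed(1)
Z <- matrix(rt(100 * 5, df = 3), 100, 5)   # heavy-tailed design matrix
hm <- huber.mean(Z[, 1])                   # robust mean of one column
hc <- huber.cov(Z)                         # robust covariance estimate
y <- Z %*% rep(1, 5) + rt(100, df = 3)
hr <- huber.reg(Z, y)                      # adaptive Huber regression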

A showcase example

We first present an example by applying the package to a synthetic dataset. To begin with, we use the rstiefel package to simulate a uniformly distributed random orthonormal matrix, which becomes the loading matrix $B$ after rescaling. With sample size $n = 120$, dimension $p = 400$, and number of factors $K = 5$, we generate data vectors $\{X_i\}_{i=1}^n$ from model (4), where the factors $f_i \in \mathbb{R}^K$ follow a standard multivariate normal distribution and the noise vectors $u_i \in \mathbb{R}^p$ are drawn from a multivariate $t_3$ distribution with zero mean and identity covariance matrix. For the mean vector $\mu = (\mu_1, \ldots, \mu_p)^\top$, we set the first $p_1 = 100$ coordinates to 1 and the rest to 0.

library(FarmTest)
library(rstiefel)
library(mvtnorm)
n <- 120
p <- 400
K <- 5
set.seed(100)
B <- rustiefel(p, K) %*% diag(rep(sqrt(p), K))
FX <- rmvnorm(n, rep(0, K), diag(K))
p1 <- 100
strength <- 1
mu <- c(rep(strength, p1), rep(0, p - p1))
U <- rmvt(n, diag(p), 3)
X <- rep(1, n) %*% t(mu) + FX %*% t(B) + U

Function call with default parameters

Using the data generated above, let us call the main function farm.test with all default optional parameters, and then print the outputs.

output <- farm.test(X)
output

One-sample FarmTest with unknown factors
n = 120, p = 400, nFactors = 5
FDR to be controlled at: 0.05
Alternative hypothesis: two.sided
Number of hypotheses rejected: 104

As shown in the snapshot above, the function farm.test correctly estimates the number of factors and rejects 104 hypotheses, among which 4 are false discoveries. For this individual experiment, the FDP and power are 0.038 and 1, respectively, as calculated below. Here, power refers to the ratio between the number of correct rejections and the number of nonnulls $p_1$.

FDP <- sum(output$reject > p1) / length(output$reject)
FDP

[1] 0.03846154

power <- sum(output$reject <= p1) / p1
power

[1]  1

All the outputs are collected in a list, which can be quickly examined with the names() function. See Table 1 for detailed descriptions of the outputs.

names(output)

 [1] "means"       "stdDev"      "loadings"    "eigenVal"    "eigenRatio"  "nFactors"   
 [7] "tStat"       "pValues"     "pAdjust"     "significant" "reject"      "type"
[13] "n"           "p"           "h0"          "alpha"       "alternative"
Table 1: Objects in the output list of the farm.test function, with their meanings, data types, and R classes.

Output       Meaning                                     Data type            R class
means        estimated means                             p-vector             matrix
stdDev       estimated standard deviations               p-vector             matrix
loadings     estimated loading matrix                    (p × K)-matrix       matrix
eigenVal     eigenvalues of estimated covariance         p-vector             matrix
eigenRatio   eigenvalue ratios of estimated covariance   (min{n,p}/2)-vector  matrix
nFactors     (estimated) number of factors               positive integer     integer
tStat        test statistics                             p-vector             matrix
pValues      p-values                                    p-vector             matrix
pAdjust      adjusted p-values                           p-vector             matrix
significant  indicators of significance                  boolean p-vector     matrix
reject       indices of rejected hypotheses              vector               integer
type         whether the factors are known               string               character
n            sample size                                 positive integer     integer
p            data dimension                              positive integer     integer
h0           null hypotheses                             p-vector             numeric
alpha        nominal FDR level                           number               numeric
alternative  alternative hypothesis                      string               character

We can present the testing results using the associated summary function.

head(summary(output))

      means     p-values   p-adjusted significance
1 1.0947056 1.768781e-18 8.936997e-17            1
2 0.8403608 3.131733e-09 1.157817e-08            1
3 0.8668348 1.292850e-11 6.532295e-11            1
4 0.9273998 2.182485e-12 1.350281e-11            1
5 0.7257105 7.699350e-08 2.593465e-07            1
6 0.9473088 1.180288e-13 1.192712e-12            1

To visualize the testing results, in Figure 1 we present several plots based on the outputs. From the histograms of estimated means and test statistics, we see that the data roughly fall into two groups, one of which has $\hat\mu_j$ concentrated around 1 and test statistics bounded away from 0. It is therefore relatively easy to separate the alternatives/signals from the nulls. From the eigenvalue ratio plot, we see that the fifth ratio (highlighted as a red dot) is evidently above the others, thus determining the number of factors. The scree plot, on the other hand, reveals that the top 5 eigenvalues (above the red dashed line) together explain the vast majority of the variance, indicating that the proportion of common variance (due to common factors) is high.


Figure 1: Upper panel: histograms of estimated means and test statistics. Lower panel: eigenvalue ratio plot with the largest ratio highlighted and scree plot of the eigenvalues of the estimated covariance matrix.

Function call with options

In this section, we illustrate the farm.test function with other options that allow it to be called more flexibly. When the factors are observable, we simply pass the $n \times K$ factor matrix to the argument fX, and the output is formatted the same as before. As a remark, among all the items listed in Table 1, eigenVal and eigenRatio, the eigenvalues and eigenvalue ratios of the estimated covariance matrix, are not available in this case; see Algorithm 1.

output <- farm.test(X, fX = FX)
output

One-sample FarmTest with known factors
n = 120, p = 400, nFactors = 5
FDR to be controlled at: 0.05
Alternative hypothesis: two.sided
Number of hypotheses rejected: 101

Consider the one-sided alternatives $H_{1j}: \mu_j > 0$, $j = 1, \ldots, p$, with a nominal FDR level of 1%. We modify the arguments alternative and alpha as follows:

output <- farm.test(X, alternative = "greater", alpha = 0.01)
output

One-sample FarmTest with unknown factors
n = 120, p = 400, nFactors = 5
FDR to be controlled at: 0.01
Alternative hypothesis: greater
Number of hypotheses rejected: 101

Users can specify the null hypotheses by passing any vector of length $p$ to the argument h0. In the next example, we take the null hypotheses to be that all the means are equal to 1, so that the number of true nonnulls becomes 300.

output <- farm.test(X, h0 = rep(1, p), alpha = 0.01)
output

One-sample FarmTest with unknown factors
n = 120, p = 400, nFactors = 5
FDR to be controlled at: 0.01
Alternative hypothesis: two.sided
Number of hypotheses rejected: 300

When the factors are unknown, users can also specify the number of factors on subjective grounds. In this case, Step 3 of Algorithm 2 is skipped. For example, we run the function with the number of factors set to KX = 2, which is less than the true value 5. This misspecification results in a loss of power, with two true alternatives left unidentified.

output <- farm.test(X, KX = 2)
power <- sum(output$reject <= p1) / p1
power

[1] 0.98

As a special case, if we set KX = 0 in the function call, a robust multiple test without factor adjustment is conducted.

output <- farm.test(X, KX = 0)
output

One-sample robust multiple test without factor-adjustment
n = 120, p = 400
FDR to be controlled at: 0.05
Alternative hypothesis: two.sided
Number of hypotheses rejected: 95

Finally, we present an example of the two-sample FarmTest. Using the same sampling distributions for the factor loading matrix, factors, and noise vectors, we generate another sample $\{Y_i\}_{i=1}^m$ from model (5) with $m = 150$.

m <- 150
set.seed(200)
BY <- rustiefel(p, K) %*% diag(rep(sqrt(p), K))
FY <- rmvnorm(m, rep(0, K), diag(K))
uY <- rmvt(m, diag(p), 3)
Y <- FY %*% t(BY) + uY

The farm.test function can then be called with the additional argument Y.

output <- farm.test(X, Y = Y)
output

Two-sample FarmTest with unknown factors
X.n = 120, Y.n = 150, p = 400, X.nFactors = 5, Y.nFactors = 5
FDR to be controlled at: 0.05
Alternative hypothesis: two.sided
Number of hypotheses rejected: 105

The output is formatted similarly to Table 1, except that means, stdDev, loadings, eigenVal, eigenRatio, nFactors, and n now each consist of two items, one for sample X and one for sample Y.

names(output$means)

[1] "X.mean" "Y.mean"

5 Simulations

In this section, we assess and compare the performance of the farm.test function in the FarmTest package with the following methods:

• t-test: the standard t-test, implemented via the t.test function;
• WMW-test: the Wilcoxon–Mann–Whitney test, implemented via the wilcox.test function;
• RmTest: the robust multiple test without factor adjustment, obtained by calling farm.test with KX = 0.

For the t-test and WMW-test, the functions we call produce vectors of p-values, to which the procedure of Storey (2002) is applied; see Steps 5–7 of Algorithm 1 or Steps 8–10 of Algorithm 2.

In all the numerical experiments, we consider two-sided alternatives with a nominal FDR level $\alpha = 5\%$. The true number of factors is 5. Factors and loadings are generated in the same way as in Section 4.1. To add dependence among the idiosyncratic errors, the covariance matrix of $u$, denoted by $\Sigma_u$, is taken to be a block-diagonal symmetric matrix with $5 \times 5$ blocks. Within each block, the diagonal entries are all equal to 3 and the off-diagonal entries are generated from $\mathcal{U}[0, 1]$. In the simulations, we discard any draw for which the generated $\Sigma_u$ is not positive definite. The distribution of $u$ is specified by the two models below:

• Model 1: $u$ follows a multivariate normal distribution with covariance matrix $\Sigma_u$;
• Model 2: $u$ follows a multivariate $t_3$ distribution with covariance structure determined by $\Sigma_u$.

For each model, we consider various combinations of sample size $n$ and dimensionality $p$, specifically $n \in \{60, 80, 100, 120, 140\}$ and $p \in \{200, 400, 600, 800, 1000\}$. The number of true alternatives $p_1$ is taken to be $0.2 p$, and the signal strength is set to $4\sqrt{\log(p)/n}$.
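For reference, the error models above can be generated along the following lines (a sketch of the setup just described; the assembly code and the use of Matrix::bdiag are ours):

# Block-diagonal Sigma_u: 5x5 blocks with diagonal 3 and U[0, 1] off-diagonals.
library(mvtnorm)
make.Sigma.u <- function(p, block = 5) {
  blocks <- replicate(p / block, {
    S <- matrix(runif(block^2), block, block)
    S[lower.tri(S)] <- t(S)[lower.tri(S)]   # symmetrize the block
    diag(S) <- 3
    S
  }, simplify = FALSE)
  as.matrix(Matrix::bdiag(blocks))          # assemble the block diagonal
}
Sigma.u <- make.Sigma.u(p = 200)            # redraw if not positive definite
u1 <- rmvnorm(100, sigma = Sigma.u)         # Model 1: multivariate normal
u2 <- rmvt(100, sigma = Sigma.u, df = 3)    # Model 2: multivariate t3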

Figures 2 and 3 depict the FDR and power curves, for either fixed $n$ with growing $p$ or fixed $p$ with growing $n$, based on 200 simulations. Across the various settings, FarmTest consistently maintains high empirical power with the FDR well controlled around the nominal level. In contrast, the competing methods may lose as much as 10% to 30% in power, which can be ascribed to their not accounting for the common factors. In summary, we conclude that the FarmTest package provides an efficient implementation of the FarmTest method, which carries out multiple testing for multivariate data with heavy-tailed distributions and a strong dependence structure.


Figure 2: Comparison of FarmTest with three other methods in terms of FDR and power under Model 1 (multivariate normal distribution). In the upper panel, p is fixed at 600 and n grows from 60 to 140; in the lower panel, n is fixed at 100 and p ranges from 200 to 1000.


Figure 3: Comparison of FarmTest with three other methods in terms of FDR and power under Model 2 (multivariate t-distribution). In the upper panel, p is fixed at 600 and n grows from 60 to 120; in the lower panel, n is fixed at 100 and p ranges from 200 to 1000.

6 Real data example

In this section, we apply the FarmTest package to test the mean effects of stock returns. In capital asset pricing theory, a stock's risk-adjusted mean return, or "alpha," is a quantity of interest, since it indicates the excess return earned from investing in that particular stock. If the efficient equity market hypothesis holds, we expect the alphas to be zero. Hence, detecting non-zero alphas can help investors identify market inefficiencies, that is, whether certain stocks exhibit an abnormal rate of return or are mispriced. As discussed in Cont (2001), both cross-sectional dependence and heavy-tailedness are salient features of stock returns.

In this study, we test the annual mean effects of stocks in the S&P500 index. The data are available from the COMPUSTAT and CRSP databases. We find that most of the stocks with continuous membership in the S&P500 index from 2008 to 2016 have excess kurtoses greater than zero, indicating tails heavier than those of a normal distribution. Moreover, more than 33% of the stocks are severely heavy-tailed, as their excess kurtoses exceed 6, the excess kurtosis of the $t_5$-distribution. We collect monthly returns of stocks from the S&P500 index over rolling windows: for each month between 2008 and 2016, we collect the monthly returns of stocks that have continuous records over the past year. The average number of stocks collected each year is 598. For each rolling window, we conduct multiple testing using the four methods considered in the previous section, that is, FarmTest, t-test, WMW-test, and RmTest.

The nominal FDR level is set at $\alpha = 1\%$. Within each rolling window, we have $p \approx 600$ and $n = 12$. The numbers of discoveries of each method are depicted chronologically in Figure 4, and Table 2 displays several key summary statistics. Since the t-test barely discovers any stock throughout the whole period, we only present the results for the other three methods in Figure 4. It is interesting to observe that, across the rolling windows, the testing outcomes of the WMW-test are relatively stable and time-insensitive. FarmTest, on the other hand, selects far fewer stocks in 2009, coinciding to some extent with the financial crisis, during which market volatility was much higher. RmTest typically selects the most stocks, which is partly due to its lack of FDR control under strong dependence. A major, noticeable impact of dependence is that it results in clusters of rejections: if one test is rejected, further rejections are likely among tests that are highly correlated with it. This phenomenon is in accord with our simulation results, which show that FarmTest simultaneously controls the FDR and maintains high power, while the other methods either make too many false discoveries or fail to detect true signals.


Figure 4: Stack bar plot of the numbers of discoveries via FarmTest, WMW-test and RmTest from 2008 to 2016, using rolling windows of one year. Within each time window, we report the number of stocks in the S&P500 index that show significant statistical evidence against null hypotheses that there are no excessive returns, with FDR controlled at 1%.
Table 2: Summary statistics of the number of discoveries via FarmTest, WMW-test and RmTest between 2008 and 2016 using rolling windows of size 12 (months).
Method      Mean     Std. Dev.   Median   Min   Max
FarmTest    14.477   11.070      12       0     52
WMW-test    10.991    1.005      11       8     12
RmTest       8.147   14.414       3       0     68

7 Summary

We provide an R package implementing FarmTest, a flexible large-scale multiple testing method that is robust against strongly dependent and heavy-tailed data. The factor-adjustment procedure helps to construct weakly dependent test statistics and also enhances statistical power by improving the signal-to-noise ratio. Moreover, by exploiting the idea of adaptive Huber regression, the testing procedure is robust against heavy-tailed noise. The efficacy of our package is demonstrated on both real and simulated datasets.

CRAN packages used

FarmTest, Rcpp, multcomp, mutoss, rstiefel

CRAN Task Views implied by cited packages

Bayesian, ClinicalTrials, HighPerformanceComputing, NumericalMathematics, Survival

Bioconductor packages used

qvalue, multtest

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

References

    S. C. Ahn and A. R. Horenstein. Eigenvalue ratio test for the number of factors. Econometrica, 81(3): 1203–1227, 2013. URL https://doi.org/10.3982/ECTA8968.
    J. Bai and K. Li. Statistical analysis of factor models of high dimension. The Annals of Statistics, 40(1): 436–465, 2012. URL https://doi.org/10.1214/11-AOS966.
    J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1): 141–148, 1988. URL https://doi.org/10.1093/imanum/8.1.141.
    Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1): 289–300, 1995. URL https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
    X. Chen and W.-X. Zhou. Robust inference via multiplier bootstrap. The Annals of Statistics, 2019. URL https://arxiv.org/abs/1903.07208.
    R. Cont. Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance, 1(2): 223–236, 2001. URL https://doi.org/10.1080/713665670.
    K. H. Desai and J. D. Storey. Cross-dimensional inference of dependent high-dimensional data. Journal of the American Statistical Association, 107(497): 135–151, 2011. URL https://doi.org/10.1080/01621459.2011.645777.
    D. Eddelbuettel and R. Francois. Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8): 1–18, 2011. URL https://doi.org/10.18637/jss.v040.i08.
    D. Eddelbuettel and C. Sanderson. RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics and Data Analysis, 71: 1054–1063, 2014. URL https://doi.org/10.1016/j.csda.2013.02.005.
    J. Fan and X. Han. Estimation of the false discovery proportion with unknown dependence. Journal of the Royal Statistical Society: Series B (Methodological), 79(4): 1143–1164, 2017. URL https://doi.org/10.1111/rssb.12204.
    J. Fan, X. Han and W. Gu. Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association, 107(499): 1019–1035, 2012. URL https://doi.org/10.1080/01621459.2012.720478.
    J. Fan, Y. Ke, Q. Sun and W.-X. Zhou. FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control. Journal of the American Statistical Association, 114(528): 1880–1893, 2019. URL https://doi.org/10.1080/01621459.2018.1527700.
    J. Fan, Q. Li and Y. Wang. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Methodological), 79(1): 247–265, 2017. URL https://doi.org/10.1111/rssb.12166.
    C. Friguet, M. Kloareg and D. Causeur. A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104(488): 1406–1415, 2009. URL https://doi.org/10.1198/jasa.2009.tm08332.
    C. Genovese and L. Wasserman. A stochastic process approach to false discovery control. The Annals of Statistics, 32(3): 1035–1061, 2004. URL https://doi.org/10.1214/009053604000000283.
P. D. Hoff. Simulation of the matrix Bingham–von Mises–Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2): 438–456, 2009. URL https://doi.org/10.1198/jcgs.2009.07177.
    T. Hothorn, F. Bretz and P. Westfall. Simultaneous inference in general parametric models. Biometrical Journal, 50(3): 346–363, 2008. URL https://doi.org/10.1002/bimj.200810425.
    P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1): 73–101, 1964. URL https://doi.org/10.1214/aoms/1177703732.
Y. Ke, S. Minsker, Z. Ren, Q. Sun and W.-X. Zhou. User-friendly covariance estimation for heavy-tailed distributions. Statistical Science, 34(3): 454–471, 2019. URL https://doi.org/10.1214/19-STS711.
    C. Lam and Q. Yao. Factor modeling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics, 40(2): 694–726, 2012. URL https://doi.org/10.1214/12-AOS970.
    W. Lan and L. Du. A factor-adjusted multiple testing procedure with application to mutual fund selections. Journal of Business and Economic Statistics, 37(1): 147–157, 2019. URL https://doi.org/10.1080/07350015.2017.1294078.
    J. T. Leek and J. D. Storey. A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences of the United States of America, 105(48): 18718–18723, 2008. URL https://doi.org/10.1073/pnas.0808709105.
    E. L. Lehmann and J. P. Romano. Generalizations of the familywise error rate. The Annals of Statistics, 33(3): 1138–1154, 2005. URL https://doi.org/10.1214/009053605000000084.
    K. S. Pollard, S. Dudoit and M. J. van der Laan. Multiple testing procedures: The multtest package and applications to genomics. Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Springer, 249–271, 2005. URL https://link.springer.com/chapter/10.1007/0-387-29362-0_15.
    C. Sanderson and R. Curtin. Armadillo: A template-based C++ library for linear algebra. Journal of Open Source Software, 1(2): 26, 2016. URL https://doi.org/10.21105/joss.00026.
    J. D. Storey. A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Methodological), 64(3): 479–498, 2002. URL https://doi.org/10.1111/1467-9868.00346.
    Q. Sun, W.-X. Zhou and J. Fan. Adaptive Huber regression. Journal of the American Statistical Association, 115(529): 254–265, 2020. URL https://doi.org/10.1080/01621459.2018.1543124.
L. Wang, C. Zheng, W. Zhou and W.-X. Zhou. A new principle for tuning-free Huber regression. Statistica Sinica, to appear, 2020. URL https://doi.org/10.5705/ss.202019.0045.
    W.-X. Zhou, K. Bose, J. Fan and H. Liu. A new perspective on robust M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing. The Annals of Statistics, 46(5): 1904–1931, 2018. URL https://doi.org/10.1214/17-AOS1606.

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Fan, et al., "FarmTest: An R Package for Factor-Adjusted Robust Multiple Testing", The R Journal, 2021

    BibTeX citation

    @article{RJ-2021-023,
  author = {Bose, Koushiki and Fan, Jianqing and Ke, Yuan and Pan, Xiaoou and Zhou, Wen-Xin},
      title = {FarmTest: An R Package for Factor-Adjusted Robust Multiple Testing},
      journal = {The R Journal},
      year = {2021},
      note = {https://rjournal.github.io/},
      volume = {12},
      issue = {2},
      issn = {2073-4859},
      pages = {389-402}
    }