Table 1. Overview of combined test rationale and performance.
| Test | Tests combined | Rationale | Average correlationᵃ | Observed strengths | Observed weaknesses | Overall performanceᵇ |
|---|---|---|---|---|---|---|
| CT1 | L(1), L(2), L(3), L(∞) | Assess robustness against non-causal variants and tradeoff with number of tests combined | 0.53 | Minimal | Poor performance with risk-reducing variants | Poor (Min(p): 38.6%) (Fisher's: 47.2%) |
| CT2 | J(1), J(2), J(3), J(∞) | Assess robustness against non-causal variants and tradeoff with number of tests combined | 0.87 | Handles risk-reducing variants | Redundant | Good (Min(p): 8.9%) (Fisher's: 7.9%) |
| CT3 | CMC, L(1) | Assess impact of combining highly correlated tests | 0.92 | Minimal | Poor performance with risk-reducing variants; redundant | Poor (Min(p): 42.7%) (Fisher's: 42.2%) |
| CT4 | SKAT, J(2) | Assess impact of combining highly correlated tests | 0.99 | Handles risk-reducing variants | Redundant | Good (Min(p): 7.4%) (Fisher's: 7.4%) |
| CT5 | SKAT, CMC | Assess a 'standard' combination of tests | 0.46 | Fairly robust | Lacks robustness to high proportion of non-causal variants | Good (Min(p): 7.5%) (Fisher's: 8.1%) |
| CT6 | L(1), L(2), L(3), L(∞), J(1), J(2), J(3), J(∞) | Assess robustness against non-causal variants and tradeoff with number of tests combined | 0.54 | Fairly robust | Some poorly performing tests (eg, length tests) make Fisher's perform suboptimally | Good (Min(p): 5.6%) (Fisher's: 9.2%) |
| CT7 | L(1), L(4), J(1), J(4) | Assess robustness against non-causal variants and tradeoff with number of tests combined | 0.50 | Fairly robust | Fisher's performs slightly worse than Min(p) because of the length tests | Very good (Min(p): 4.9%) (Fisher's: 6.5%) |
| CT8 | SKAT-O, J(∞) | Assess ability to create a more robust SKAT-O | 0.77 | More robust to inclusion of non-causal variants | Slightly lower power than SKAT-O when few non-causal variants | Very good (Min(p): 4.6%) (Fisher's: 3.0%) |
ᵃ Average pairwise correlation across all pairs of tests in the combined test. See Supplementary Figure 1 for the complete matrix of pairwise correlations.
ᵇ Percent of simulations in which the method had at least 5% lower power than the other methods.
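
For readers unfamiliar with the two combination rules named in the table, the following is a minimal sketch of Min(p) and Fisher's method applied to a set of component p-values. It is an illustration only, not the procedure used to produce the results above. Both reference distributions here assume independent component tests; that assumption does not hold for the correlated tests in Table 1, where the combined statistic would instead need to be calibrated, for example by permutation. The function names `fisher_combined` and `minp_combined` are hypothetical.

```python
import math


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i), compared with chi-square on 2k df.

    Assumes the k component tests are independent and each p is in (0, 1].
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    # For even df = 2k the chi-square survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def minp_combined(pvalues):
    """Min(p): smallest component p-value with a Sidak adjustment.

    The adjustment 1 - (1 - min p)^k is valid only under independence.
    """
    k = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** k


if __name__ == "__main__":
    # Made-up p-values for four component tests, for illustration only.
    example = [0.04, 0.20, 0.01, 0.35]
    print("Fisher's:", fisher_combined(example))
    print("Min(p):  ", minp_combined(example))
```

The sketch shows why the two rules can behave differently: Fisher's method pools evidence from every component, so several uninformative components dilute it, whereas Min(p) depends only on the single strongest component.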