In this discussion contribution, we connect the elegant proposal of Cai, Sun and Wang to multiview data, in which multiple sets of variables (or ‘views’) are measured on the same observations. Using ideas from Section 4, we show that we can exploit a secondary view to improve power for testing on the first view.
Consider independent and identically distributed observations of $m$ random variables under two conditions. In condition $l$, the $i$th observation of the $j$th variable is given by (view 1)

$X_{ij}^{(l)} = \mu_j^{(l)} + \epsilon_{ij}^{(l)}, \quad i = 1, \ldots, n_l, \; j = 1, \ldots, m, \; l = 1, 2,$
where $\epsilon_{ij}^{(l)}$ is zero mean, and we suppress the common intercept. The random-mean vectors $\mu^{(1)}$ and $\mu^{(2)}$ are sparse. Furthermore, for the same individuals, we also observe a second view of $p$ variables (view 2):

$Z_{ik}^{(l)} = \theta_k^{(l)} + \delta_{ik}^{(l)}, \quad i = 1, \ldots, n_l, \; k = 1, \ldots, p, \; l = 1, 2.$
The mean vectors $\theta^{(1)}$ and $\theta^{(2)}$ are sparse, $\delta_{ik}^{(l)}$ is zero mean and again we suppress the intercept. Suppose that the two views satisfy a hierarchical sparsity constraint: for $l = 1, 2$ and $j = 1, \ldots, m$,

$\theta_{\sigma(j)}^{(l)} = 0 \;\Longrightarrow\; \mu_j^{(l)} = 0, \qquad (6)$

where $\sigma(\cdot)$ maps the $j$th entry of $\mu^{(l)}$ to its parent entry of $\theta^{(l)}$, as in Fig. 11.
Fig. 11. Schematic diagram of constraint (6) with σ(3) = 1.
Concretely, suppose that $X^{(l)}$ and $Z^{(l)}$ contain protein and gene expression measurements respectively. If the transcripts that encode the $j$th protein are absent (i.e. $\theta_{\sigma(j)}^{(l)} = 0$), then the $j$th protein cannot be present (i.e. $\mu_j^{(l)} = 0$).
Suppose that $\epsilon_{ij}^{(l)}$ is independent of $\delta_{ik}^{(l)}$. Further assume that the random errors are bivariate normal and independent across $j$, $l$ and $i$, and independent of $\mu^{(l)}$ and $\theta^{(l)}$.
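To make the two-view model and constraint (6) concrete, the following sketch simulates data from one hypothetical parent map $\sigma(\cdot)$, in which each transcript is the parent of two proteins. All dimensions, sparsity levels and effect sizes here are illustrative assumptions, not values taken from the paper or this discussion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not taken from the discussion).
n1, n2, m = 50, 60, 1000
p = m // 2                  # number of view-2 (transcript-level) variables
parent = np.arange(m) // 2  # toy sigma(.): two proteins share each transcript

def simulate_condition(n):
    """Draw sparse means satisfying constraint (6): mu_j may be nonzero
    only if its parent theta_{sigma(j)} is nonzero."""
    theta = np.where(rng.random(p) < 0.05, rng.normal(0.0, 2.0, p), 0.0)
    active = (theta[parent] != 0) & (rng.random(m) < 0.5)
    mu = np.where(active, rng.normal(0.0, 2.0, m), 0.0)
    X = mu + rng.normal(size=(n, m))      # view 1: protein-like data
    Z = theta + rng.normal(size=(n, p))   # view 2: transcript-like data
    return mu, theta, X, Z

mu1, th1, X1, Z1 = simulate_condition(n1)
mu2, th2, X2, Z2 = simulate_condition(n2)

# Constraint (6): no nonzero mu_j whose parent theta_{sigma(j)} is zero.
assert not np.any((mu1 != 0) & (th1[parent] == 0))
```

By construction, every nonzero entry of $\mu^{(l)}$ has a nonzero parent in $\theta^{(l)}$, which is the hierarchical structure that the auxiliary statistic below exploits.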
Using the terminology of Cai, Sun and Wang, the 'primary statistic' for testing $H_{0j}: \mu_j^{(1)} = \mu_j^{(2)}$ is

$T_j = (\bar{X}_j^{(1)} - \bar{X}_j^{(2)})/C_j$

for some constant $C_j$. We consider a pair of 'auxiliary statistics',

$R_j = (\bar{X}_j^{(1)} + \bar{X}_j^{(2)})/D_j, \qquad S_j = (\bar{Z}_{\sigma(j)}^{(1)} + \bar{Z}_{\sigma(j)}^{(2)})/E_j,$

for some constants $D_j$ and $E_j$. The statistic $R_j$ is the same as $T_{2j}$ in the paper, whereas $S_j$ is constructed by using the second data view. A small value of $|S_j|$ provides evidence for $\theta_{\sigma(j)}^{(1)} = \theta_{\sigma(j)}^{(2)} = 0$, which by constraint (6) suggests that $\mu_j^{(1)} = \mu_j^{(2)} = 0$. By analogy with proposition 1 in the paper, the oracle statistic is

$T_j^{\mathrm{or}} = \mathrm{Pr}(\mu_j^{(1)} = \mu_j^{(2)} \mid T_j, R_j, S_j).$
Moreover, the procedure based on $T_j^{\mathrm{or}}$ enjoys the properties in theorem 3 of the paper. Detailed proofs are available from https://hugogogo.github.io/paper/cars_discussion_supplement.pdf. If there is not a one-to-one mapping between σ(j) and j, then additional care is needed in estimation.
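As a sketch of how the second view could sharpen inference, the toy example below computes standardized versions of $T_j$, $R_j$ and $S_j$ and screens on $|S_j|$. It assumes, purely for illustration, that $\sigma(\cdot)$ is the identity map and that all error variances equal 1, so a single constant standardizes each statistic; the sizes, effect sizes and screening threshold are hypothetical choices, not the procedure studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-view data; sigma(.) is the identity map here (hypothetical).
n1, n2, m = 50, 60, 200
mu = np.where(rng.random(m) < 0.1, 2.0, 0.0)  # sparse signals, condition 1 only
theta = np.where(mu != 0, 2.0, 0.0)           # parents active exactly at signals
X1 = mu + rng.normal(size=(n1, m))
X2 = rng.normal(size=(n2, m))                 # condition 2: all means zero
Z1 = theta + rng.normal(size=(n1, m))
Z2 = theta + rng.normal(size=(n2, m))

# With unit error variances, Var(Xbar1 - Xbar2) = Var(Xbar1 + Xbar2)
# = 1/n1 + 1/n2, so one constant standardizes all three statistics.
c = np.sqrt(1.0 / n1 + 1.0 / n2)
T = (X1.mean(axis=0) - X2.mean(axis=0)) / c   # primary statistic
R = (X1.mean(axis=0) + X2.mean(axis=0)) / c   # auxiliary, view 1
S = (Z1.mean(axis=0) + Z2.mean(axis=0)) / c   # auxiliary, view 2

# A small |S_j| suggests theta_{sigma(j)} = 0 in both conditions and hence,
# via constraint (6), mu_j = 0 -- so such j can be down-weighted in testing.
screened_in = np.abs(S) > 2.0
print(screened_in.sum(), "of", m, "variables pass the view-2 screen")
```

The view-2 screen concentrates attention on the variables whose parents appear active, which is the mechanism by which $S_j$ improves power for testing on view 1.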
Supplementary Material
Fig. 9. (a) Power comparison and (b) empirical misclassification rates for two classes, based on 500 replications (FDR level α = 0.05; n1 = 50; n2 = 60; m = 1000): method (5) based on Benjamini and Hochberg (1995); method (5) based on CARS; Bayes rule.
Fig. 10. Empirical misclassification rates when the same number of locations is chosen for both methods: Benjamini and Hochberg (1995); CARS; Bayes rule.
Contributor Information
Guo Yu, University of Washington, Seattle.
Jacob Bien, University of Southern California, Los Angeles.
Daniela Witten, University of Washington, Seattle.