Abstract
The largest eigenvalue of a single or a double Wishart matrix, in both cases known as Roy’s largest root, plays an important role in a variety of applications. Recently, via a small noise perturbation approach with fixed dimension and degrees of freedom, Johnstone and Nadler derived simple yet accurate approximations to its distribution in the real valued case, under a rank-one alternative. In this paper, we extend their results to the complex valued case for five common single matrix and double matrix settings. In addition, we study the finite sample distribution of the leading eigenvector. We demonstrate the utility of our results in several signal detection and communication applications, and illustrate their accuracy via simulations.
Keywords: Complex Wishart distribution, Rank-one perturbation, Roy’s largest root, Signal detection in noise
2010 MSC: Primary 60B20, Secondary 62H10, 33C15
1. Introduction
Wishart matrices, both real and complex valued, play a central role in statistics, with numerous engineering applications, specifically signal processing and communications. Of particular interest are the roots of a single Wishart matrix H, and of a double Wishart matrix E−1H, with H and E independent [1]. The latter can be viewed as the multivariate analogue of the univariate F distribution and is also closely related to the multivariate beta distribution [32, Section 3.3]. Here we consider the largest eigenvalue ℓ1 of either the matrix H or the matrix E−1H, a test statistic proposed by Roy [38, 39], known as Roy’s largest root [32, Section 10.6]. Specifically, we focus on the complex-valued case where H, E are independent complex-valued Wishart matrices. Throughout this paper, we consider m × m matrices, where E follows a complex valued central Wishart distribution with nE degrees of freedom and identity covariance matrix ΣE = I, denoted $E \sim \mathcal{CW}_m(n_E, I)$. The distribution of the matrix H will either be central, $\mathcal{CW}_m(n_H, \Sigma_H)$, or non-central, $\mathcal{CW}_m(n_H, \Sigma_H, \Omega)$. For the definition of central and non-central complex valued Wishart matrices, see for example [15] and [19, Section 8].
Obtaining simple expressions, exact or approximate, for the distribution of this top eigenvalue, denoted by ℓ1, in the single or double matrix case has been a subject of intense research for more than 50 years. Khatri [27] derived an exact expression for the distribution of ℓ1 in the single central matrix case with an identity covariance matrix (ΣH = I). His result was generalized to several other settings, such as an arbitrary covariance matrix or a non-centrality matrix [24, 28, 36, 37, 41]. The resulting expressions are, in general, challenging to evaluate numerically. More recently, Zanella et al. [44] derived simpler exact, yet recursive expressions, both for the central case with arbitrary ΣH and for the non-central case but with ΣH = I. Alternative recursive formulas in the real-valued case and in the complex-valued case were derived by Chiani [4–6].
A different approach to derive approximate distributions for the largest eigenvalue when ΣE = ΣH = I is based on random matrix theory. Considering the limit as nH and m (and in the double matrix case also nE) tend to infinity, with their ratios converging to constants, ℓ1 in the single matrix case and ln(ℓ1) in the double matrix case asymptotically follow a Tracy–Widom distribution [20–22]. Furthermore, with suitable centering and scaling, the convergence to these limiting distributions is quite fast [10, 31].
In this paper, motivated by statistical signal detection and communication applications, we consider complex valued Wishart matrices H whose population covariance is a rank-one perturbation of a base covariance matrix. Specifically, in the central case we assume ΣH = I + λvv†, where λ is a measure of signal strength, the unit norm vector v is its direction, and v† denotes the conjugate transpose of v. Similarly, in the non-central case, we assume that H follows a non-central complex Wishart distribution with a rank-one non-centrality matrix Ω = λvv†. Our goal is to study the distribution of ℓ1 and its dependence on λ, which as discussed below is a central quantity of interest in various applications. A classical result in the single-matrix case is that with dimension m fixed, as nH → ∞, the largest eigenvalue of H converges to a Gaussian distribution [1]. In the random matrix setting, as both nH and m tend to infinity with their ratio tending to a constant, Baik et al. [3] and Paul [34] proved that if λ is above the phase transition threshold then ℓ1 still converges to a Gaussian distribution, but with a different variance. In the two-matrix case, the location of the phase transition and the limiting value of the largest eigenvalue of E−1H were recently studied by Nadakuditi and Silverstein [33]. Dharmawansa et al. [8] proved that above the phase transition, ℓ1 converges to a Gaussian distribution and provided an explicit expression for its asymptotic variance.
Whereas the above results assume that dimension and degrees of freedom tend to infinity, in various common applications these quantities are relatively small. In such settings, the above mentioned asymptotic results may provide a poor approximation to the distribution of the largest eigenvalue ℓ1, which can be quite far from Gaussian, see Fig. 1 (left) for an illustrative example. Accurate expressions for the distribution of ℓ1, for small dimension and degrees of freedom, were recently derived for single and double real-valued Wishart matrices by Johnstone and Nadler [23], via a small noise perturbation approach. In this paper, we build upon their work and extend their results to the complex valued case and to the study of the distribution of the leading sample eigenvector, not considered in their work. As discussed below, both are important quantities in various applications.
Fig. 1.

Density of the largest eigenvalue in Case 1 (left) and Case 2 (right). The parameters for Case 1 are nH = m = 5, λ = 5 and σ2 = 0.01. For Case 2 they are nH = m = 5, ω = 1 and σ2 = 0.01. The red solid line corresponds to Propositions 1 and 2.
Propositions 1-5 in Section 2 provide approximate expressions for the distribution of ℓ1 under the five single-matrix and double-matrix cases outlined in Table 1. In Section 3 we study the finite sample fluctuations of the leading eigenvector and its overlap with the population eigenvector. Next, in Section 4 we illustrate the utility of these approximations in signal detection and communication applications. Specifically, Section 4.1 considers the power of Roy’s largest root test under two common signal models, whereas Section 4.2 considers the outage probability in a specific multiple-input and multiple-output (MIMO) communication system [24]. For a rank-one Rician fading channel, we show analytically that to minimize the outage probability it is preferable to have an equal number of transmitting and receiving antennas. This important design property was previously observed via simulations [24].
Table 1.
Five common single-matrix and double-matrix cases. The middle column describes the distribution of the covariance matrices of the observed data. In the first two cases only one sample covariance matrix is computed. The right column describes several relevant applications.
| Case | General Form of Distribution | Application |
|---|---|---|
| 1 | $H \sim \mathcal{CW}_m(n_H, \Sigma + \lambda vv^\dagger)$; $\Sigma$ is known | Signal detection in noise, known noise covariance matrix. |
| 2 | $H \sim \mathcal{CW}_m(n_H, \Sigma, \Omega)$, $\Omega = \omega vv^\dagger$; $\Sigma$ is known | Constant modulus signal detection in noise, known noise covariance matrix. |
| 3 | $H \sim \mathcal{CW}_m(n_H, \Sigma + \lambda ww^\dagger)$, $E \sim \mathcal{CW}_m(n_E, \Sigma)$ | Signal detection in noise, estimated noise covariance matrix. |
| 4 | $H \sim \mathcal{CW}_m(n_H, \Sigma, \Omega)$, $E \sim \mathcal{CW}_m(n_E, \Sigma)$ | Constant modulus signal detection in noise, estimated noise covariance matrix. |
| 5 | $H \mid X \sim \mathcal{CW}_p(q, \Phi, \Omega)$, $E \sim \mathcal{CW}_p(n-q, \Phi)$; $\Omega$ is a rank-one matrix | Canonical correlation analysis between two groups of sizes $p \le q$. |
2. On the Distribution of Roy’s Largest Root
Table 1 outlines five common single matrix and double matrix complex Wishart cases, along with some representative applications. Propositions 1-5 below are the complex analogues of those in [23], and provide simple approximations to the distribution of Roy’s largest root in these cases. As outlined in the appendix, their proofs follow those of [23], with some notable differences. In particular, we present complex valued analogues of some well known results for real valued Wishart matrices. In what follows we denote by $\mathbb{E}[\cdot]$ the expectation operator. We also denote by $\chi^2_k$ the chi-squared distribution with $k$ degrees of freedom and by $\chi^2_k(\eta)$ the non-central chi-squared distribution with non-centrality parameter $\eta$. Throughout the manuscript we follow the standard definition of complex valued multivariate Gaussian random variables, see [15]. Specifically, if $Z \sim \mathcal{CN}(\mu, \sigma^2)$ then it can be written as $Z = A + \iota B$, where $A, B \in \mathbb{R}$ are independent random variables with $A \sim \mathcal{N}(\Re(\mu), \sigma^2/2)$ and $B \sim \mathcal{N}(\Im(\mu), \sigma^2/2)$.
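For concreteness, the following minimal Python sketch (our addition; the helper names `complex_normal` and `complex_wishart` are ours) samples $\mathcal{CN}(\mu, \sigma^2)$ variates and central complex Wishart matrices under exactly this convention.

```python
import numpy as np

def complex_normal(mu, sigma2, size, rng):
    # CN(mu, sigma2): independent real and imaginary parts,
    # each Gaussian with variance sigma2 / 2.
    a = rng.normal(np.real(mu), np.sqrt(sigma2 / 2.0), size)
    b = rng.normal(np.imag(mu), np.sqrt(sigma2 / 2.0), size)
    return a + 1j * b

def complex_wishart(n, Sigma, rng):
    # Central complex Wishart CW_m(n, Sigma): H = X X^dagger,
    # where the n columns of X are i.i.d. CN(0, Sigma).
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L^dagger
    X = L @ complex_normal(0.0, 1.0, (Sigma.shape[0], n), rng)
    return X @ X.conj().T

rng = np.random.default_rng(0)
H = complex_wishart(10, np.eye(3), rng)
print(np.allclose(H, H.conj().T))                 # Hermitian by construction
```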
We start with the simplest Case 1 in Table 1, involving a single central Wishart matrix, $H \sim \mathcal{CW}_m(n_H, \Sigma + \lambda vv^\dagger)$. In various engineering applications the matrix Σ denotes the covariance of the noise measured at m sensors and is often assumed to be known, whereas λ is a measure of the signal strength and the unit norm vector v denotes its direction. Without loss of generality, we thus assume Σ = σ2I, where σ2 then denotes the noise variance. In contrast to previous asymptotic approaches, whereby the number of samples nH → ∞ and possibly also the dimension m → ∞, in the following we keep nH and m fixed, and study the distribution of the largest eigenvalue in the limit of small noise, namely as σ → 0. To emphasize that we study the dependence of the largest eigenvalue of H on the parameter σ, we shall denote it by ℓ1(σ).
Proposition 1. Let $H \sim \mathcal{CW}_m(n_H, \sigma^2 I + \lambda vv^\dagger)$, with ‖v‖ = 1, λ > 0 and let ℓ1(σ) be its largest eigenvalue. Then, with (m, nH, λ) fixed, as σ → 0

$$\ell_1(\sigma) = A + \sigma^2 B + \sigma^4\,\frac{BC}{A} + O_p(\sigma^6), \tag{1}$$

where A, B, C are independent random variables, distributed as $A \sim \tfrac{\sigma^2+\lambda}{2}\chi^2_{2n_H}$, $B \sim \tfrac{1}{2}\chi^2_{2(m-1)}$, and $C \sim \tfrac{1}{2}\chi^2_{2(n_H-1)}$.
Remark 1. Given that ℓ1 is the largest eigenvalue of a Wishart matrix, it has finite mean and variance. Approximate formulas for these quantities follow directly from (1). Since $\mathbb{E}[\chi^2_k] = k$, $\mathrm{Var}[\chi^2_k] = 2k$ and $\mathbb{E}[1/\chi^2_k] = 1/(k-2)$ for k > 2, then for nH > 1

$$\mathbb{E}[\ell_1(\sigma)] \approx n_H(\lambda + \sigma^2) + (m-1)\sigma^2 + \frac{(m-1)\sigma^4}{\lambda + \sigma^2},$$

and similarly,

$$\mathrm{Var}[\ell_1(\sigma)] \approx n_H(\lambda + \sigma^2)^2 + (m-1)\sigma^4.$$
Remark 2. The exact distribution of the largest eigenvalue ℓ1 in the setting of Proposition 1, with number of samples larger than the dimension, has been recently derived by Chiani [6, Theorem 4, part 3]. The result is given in terms of the determinant of an m × m matrix whose entries depend on the generalized incomplete gamma function, with parameters that depend on λ and on σ. In contrast, while (1) is approximate, the dependence on the values of λ and σ is more explicit.
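As a quick numerical illustration (our addition, not part of the original text), the sketch below compares the empirical mean and variance of ℓ1(σ) with samples drawn from the stochastic representation in (1) as reconstructed above, for the parameters of Fig. 1 (left).

```python
import numpy as np

rng = np.random.default_rng(1)
m, nH, lam, sigma = 5, 5, 5.0, 0.1           # parameters of Fig. 1 (left)
Sigma = sigma**2 * np.eye(m)
Sigma[0, 0] += lam                           # Sigma = sigma^2 I + lam v v^dagger, v = e1 WLOG
L = np.linalg.cholesky(Sigma)

T = 20000
ell1 = np.empty(T)
for t in range(T):
    Z = (rng.standard_normal((m, nH)) + 1j * rng.standard_normal((m, nH))) / np.sqrt(2.0)
    X = L @ Z                                # columns i.i.d. CN(0, Sigma)
    ell1[t] = np.linalg.eigvalsh(X @ X.conj().T)[-1]

# Samples from the representation (1), as reconstructed above.
A = (sigma**2 + lam) * rng.chisquare(2 * nH, T) / 2.0
B = rng.chisquare(2 * (m - 1), T) / 2.0
C = rng.chisquare(2 * (nH - 1), T) / 2.0
approx = A + sigma**2 * B + sigma**4 * B * C / A

print(ell1.mean(), approx.mean())            # should agree closely for small sigma
print(ell1.var(), approx.var())
```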
The next proposition considers a non-central single Wishart, Case 2 in Table 1.
Proposition 2. Let $H \sim \mathcal{CW}_m(n_H, \sigma^2 I, \omega vv^\dagger)$ with ‖v‖ = 1, ω > 0 and let ℓ1(σ) be its largest eigenvalue. Then, with (m, nH, ω) fixed, as σ → 0

$$\ell_1(\sigma) = A + \sigma^2 B + \sigma^4\,\frac{BC}{A} + O_p(\sigma^6), \tag{2}$$

where A, B, C are all independent and distributed as $A \sim \tfrac{\sigma^2}{2}\chi^2_{2n_H}(2\omega/\sigma^2)$, $B \sim \tfrac{1}{2}\chi^2_{2(m-1)}$ and $C \sim \tfrac{1}{2}\chi^2_{2(n_H-1)}$.
Remark 3. By definition, $\mathbb{E}[\chi^2_k(\eta)] = k + \eta$ and $\mathrm{Var}[\chi^2_k(\eta)] = 2(k + 2\eta)$. Furthermore, it is easy to show that as η → ∞, $\mathbb{E}[1/\chi^2_k(\eta)] \to 1/\eta$ and $\mathrm{Var}[1/\chi^2_k(\eta)] \to 0$. Note that as σ → 0 the non-centrality parameter 2ω/σ2 which appears in the random variable A in (2) tends to infinity. Hence, for small σ, we can approximate the mean and variance of ℓ1(σ) in (2) by

$$\mathbb{E}[\ell_1(\sigma)] \approx \omega + n_H\sigma^2 + (m-1)\sigma^2 + \frac{(m-1)(n_H-1)\sigma^4}{\omega},$$

and

$$\mathrm{Var}[\ell_1(\sigma)] \approx 2\omega\sigma^2 + n_H\sigma^4 + (m-1)\sigma^4.$$
The next two propositions provide approximations to the distribution of Roy’s largest root in the central and non-central double matrix settings, which correspond to Cases 3 and 4 in Table 1. For Case 3, for example, in principle we need to study ℓ1(E−1H) where $H \sim \mathcal{CW}_m(n_H, \Sigma + \lambda ww^\dagger)$ and $E \sim \mathcal{CW}_m(n_E, \Sigma)$. However, a simplification can be made based on the following observations: (i) the matrix E−1H has the same eigenvalues as $\Sigma^{1/2}E^{-1}H\Sigma^{-1/2}$, which is equal to $(\Sigma^{-1/2}E\Sigma^{-1/2})^{-1}(\Sigma^{-1/2}H\Sigma^{-1/2})$; (ii) the matrix $\Sigma^{-1/2}E\Sigma^{-1/2} \sim \mathcal{CW}_m(n_E, I)$; and (iii) the matrix $\Sigma^{-1/2}H\Sigma^{-1/2} \sim \mathcal{CW}_m(n_H, I + \lambda\|\Sigma^{-1/2}w\|^2\, vv^\dagger)$, where v = Σ−1/2w/‖Σ−1/2w‖ has unit norm. Hence, in the following propositions we assume without loss of generality that the covariance matrix of E is Σ = I.
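Observations (i)-(iii) are easy to verify numerically. The short sketch below (ours) checks that the spectrum of E−1H is unchanged when (H, E) are replaced by their whitened counterparts, for an arbitrary positive definite Σ.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
m = 4
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Sigma = A @ A.conj().T + m * np.eye(m)        # arbitrary positive definite Sigma
G = rng.standard_normal((m, 2 * m)) + 1j * rng.standard_normal((m, 2 * m))
F = rng.standard_normal((m, 2 * m)) + 1j * rng.standard_normal((m, 2 * m))
H, E = G @ G.conj().T, F @ F.conj().T         # arbitrary Hermitian positive definite H, E

S = sqrtm(Sigma)                              # Sigma^{1/2}
Si = np.linalg.inv(S)                         # Sigma^{-1/2}
spec = lambda M: np.sort(np.linalg.eigvals(M).real)
lhs = spec(np.linalg.inv(E) @ H)
rhs = spec(np.linalg.inv(Si @ E @ Si) @ (Si @ H @ Si))
print(np.allclose(lhs, rhs))                  # True: whitening leaves the spectrum unchanged
```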
Proposition 3. Let $H \sim \mathcal{CW}_m(n_H, I + \lambda vv^\dagger)$ and $E \sim \mathcal{CW}_m(n_E, I)$ be independent, with nE > m + 1 and ‖v‖ = 1. Let ℓ1 be the largest eigenvalue of E−1H. Then, with (m, nH, nE) fixed, as λ becomes large

$$\ell_1 \approx c_1\, F_{a_1, b_1} + c_2\, F_{a_2, b_2}, \tag{3}$$
where the two F distributed random variates are independent and
| (4) |
Proposition 4. Suppose that $H \sim \mathcal{CW}_m(n_H, I, \omega vv^\dagger)$ and $E \sim \mathcal{CW}_m(n_E, I)$ are independent, with nE > m + 1, ω > 0, and ‖v‖ = 1. Let ℓ1 be the largest eigenvalue of E−1H. Then, with (m, nH, nE) fixed, as ω becomes large

$$\ell_1 \approx c_1\, F_{a_1, b_1}(\delta) + c_2\, F_{a_2, b_2}, \tag{5}$$

where the two F distributed random variates are independent, the first being non-central with non-centrality parameter δ proportional to ω, and the parameters ai, bi, ci are given in (4).
Remark 4. In the limit as nE → ∞, the two F-distributed random variables in (3) and (5) converge to χ2 distributed random variables, thus recovering the leading order terms in (1) and (2), respectively.
Let us illustrate the accuracy of our approximations via several simulations. Fig. 1 compares the empirical density of the largest eigenvalue, computed from $10^5$ independent Monte Carlo realizations, in Cases 1 and 2 of Table 1, with the approximations of the two corresponding propositions. For reference, we also plot the standard Gaussian density. The accuracy of our propositions for computing tail probabilities of the form Pr(ℓ1 > t) is illustrated in Fig. 2 for Case 1. Similar results (not shown) hold for the other cases. Results for Cases 3 and 4 of Table 1 are shown in Fig. 3. As can be seen, in all cases, due to the small sample size and dimension, the distribution of the largest root deviates significantly from the asymptotic Gaussian one, and our propositions are significantly more accurate.
Fig. 2.

Tail probabilities for largest eigenvalue in Case 1, same parameters as in Fig. 1.
Fig. 3.

Density of the largest eigenvalue in Case 3 (left) and Case 4 (right). In both plots nE = nH = 10, m = 5. In Case 3, λ = 50 and in Case 4 ω = 150. The blue solid line corresponds to Propositions 3 and 4.
2.1. On the leading canonical correlation coefficient
We now consider the fifth Case of Table 1 and study the largest sample canonical correlation coefficient between a first group of p variables and a second group of q variables, in the presence of a single large canonical correlation coefficient in the population. Canonical correlation analysis is widely used in a variety of applications, for example in medical image processing [7, 26, 30], signal processing [2, 35, 40], and array processing [11].
Since the canonical correlations are invariant under unitary transformations within each of the two groups of variables, in the presence of a single large correlation coefficient, without loss of generality we can choose the following form for the matrix Σ:

$$\Sigma = \begin{pmatrix} I_p & \tilde P \\ \tilde P^\dagger & I_q \end{pmatrix}, \qquad \tilde P = \begin{pmatrix} P & 0_{p \times (q-p)} \end{pmatrix}.$$

Here P = diag(ρ, 0, …, 0) ∈ ℝp×p and ρ is the value of the correlation coefficient.
To study the sample canonical correlations, consider n + 1 complex-valued m-dimensional multivariate Gaussian observations $z_i$, $i \in \{1, \ldots, n+1\}$, with covariance Σ, on m = p + q variables, where without loss of generality p ≤ q. The corresponding sample covariance matrix S decomposes as

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix},$$

where the data matrices $Y \in \mathbb{C}^{n \times p}$ and $X \in \mathbb{C}^{n \times q}$ represent the first p variables and the remaining q variables, respectively.
Our interest is in the largest sample canonical correlation coefficient, denoted by r1. Similar to the real valued case [32, Chapter 10], its square is the largest root of the following characteristic equation

$$\det\big(S_{12} S_{22}^{-1} S_{21} - r^2 S_{11}\big) = 0, \tag{6}$$

where $Q = X(X^\dagger X)^{-1}X^\dagger$ denotes the projection onto the column space of X. Introducing the notation H = Y†QY and E = Y†(I − Q)Y, (6) can be rewritten as

$$\det\big(H - r^2 (H + E)\big) = 0.$$

Hence, we may equivalently study the largest root ℓ1 of E−1H, since it is related to $r_1^2$ by $\ell_1 = r_1^2/(1 - r_1^2)$.
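In practice, ℓ1 and r1 are computed directly from the two data matrices. The following sketch (ours; the data-generating mechanism, with a single shared factor, is purely illustrative) implements the projection construction above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 200, 3, 5
# Illustrative data: first p variables (Y) and remaining q variables (X),
# correlated through one shared complex factor f.
f = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
Y = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2.0)
X = (rng.standard_normal((n, q)) + 1j * rng.standard_normal((n, q))) / np.sqrt(2.0)
Y[:, 0] += 2.0 * f
X[:, 0] += 2.0 * f

Q = X @ np.linalg.solve(X.conj().T @ X, X.conj().T)   # projection onto col(X)
H = Y.conj().T @ Q @ Y
E = Y.conj().T @ (np.eye(n) - Q) @ Y
ell1 = np.linalg.eigvals(np.linalg.solve(E, H)).real.max()
r1 = np.sqrt(ell1 / (1.0 + ell1))                     # r1^2 = ell1 / (1 + ell1)
print(r1)
```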
Similar to [23], it can be shown that with Φ = Ip − P2, conditional on X, the two matrices H and E are independent and distributed as

$$H \mid X \sim \mathcal{CW}_p(q, \Phi, \Omega), \qquad E \sim \mathcal{CW}_p(n - q, \Phi), \tag{7}$$

with the non-centrality matrix given by

$$\Omega = \frac{\rho^2}{1 - \rho^2}\,(X^\dagger X)_{11}\; e_1 e_1^\dagger. \tag{8}$$
Since the entries of X are standard complex Gaussians, all diagonal entries of X†X follow a chi-squared distribution; in particular, $(X^\dagger X)_{11} \sim \tfrac{1}{2}\chi^2_{2n}$. The next proposition provides an approximation to the distribution of the largest sample canonical correlation in the presence of a single population canonical correlation. To this end, we introduce the following notation. We denote by $F(a, b, \delta)$ a random variable, which is defined as a function of three other random variables as follows: First, generate a random variable $\gamma \sim \chi^2_\delta$. Next, generate two independent random variables, one distributed as $\chi^2_a(\gamma)$ and the other as $\chi^2_b$. Finally, compute their ratio

$$F(a, b, \delta) = \frac{\chi^2_a(\gamma)/a}{\chi^2_b/b}. \tag{9}$$
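Sampling from F(a, b, δ) is straightforward given its two-stage definition. The sketch below (ours) follows the description above, with the first-stage chi-squared law and the F-type normalization taken as in our reconstruction of (9); both are assumptions where the original display was lost.

```python
import numpy as np

def compound_F(a, b, delta, size, rng):
    # Two-stage sampling: gamma ~ chi2_delta enters as the non-centrality
    # of the numerator, then F = (chi2_a(gamma)/a) / (chi2_b/b).
    gamma = rng.chisquare(delta, size)
    num = rng.noncentral_chisquare(a, gamma, size) / a
    den = rng.chisquare(b, size) / b
    return num / den

rng = np.random.default_rng(4)
samples = compound_F(a=6, b=8, delta=10, size=100000, rng=rng)
print(samples.mean())   # compare with (1 + delta/a) * b / (b - 2)
```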
Proposition 5. Let $\ell_1 = r_1^2/(1 - r_1^2)$, where r1 is the largest sample canonical correlation between two groups of sizes p ≤ q computed from n + 1 i.i.d. observations, with ν = n − p − q > 1. Then in the presence of a single large population correlation coefficient ρ between the two groups, asymptotically as ρ → 1,
where
Remark 5. It can be shown that the probability density of F(a, b, δ) is

where ${}_2F_1(\cdot)$ is the Gauss hypergeometric function and $B(\cdot, \cdot)$ is the beta function. This formula is useful for numerical evaluation at small parameter values.
Fig. 4 illustrates the accuracy of Proposition 5. A good match between the theoretical approximation formula and simulation results is clearly visible, particularly at the right tail of the distribution.
Fig. 4.

Density function of ℓ1(E−1H) in canonical correlation analysis.
3. Distribution of the Leading Sample Eigenvector
Another key quantity of both theoretical and practical importance is the squared dot product between the leading sample eigenvector, denoted $\hat v$, and its corresponding population eigenvector v. Assuming $\|\hat v\| = 1$,

$$R = |\langle \hat v, v \rangle|^2. \tag{10}$$
A practical application where it is important to understand the behavior of R under a rank one spike, involves the design of dominant mode rejection (DMR) adaptive beamformers in array processing [42]. The main purpose of this beamformer is to eliminate interferences from undesired directions other than the steering direction. As shown in [43], an important parameter which determines the performance of the DMR scheme is the correlation between the random sample eigenvectors and the unknown population eigenvectors. Specifically, in the presence of a single dominant interferer, the population covariance matrix takes the form of a rank one spiked model [43, Eq. 17], and the effectiveness of the DMR depends on the quantity R. Another application where the quantity R plays a key role is passive radar detection with digital illuminators having several periodic identical pulses [14]. In a sequence of papers [12–14], the authors developed a new framework for passive radar detection based on the leading eigenvector of the sample covariance matrix. This detection scheme outperforms traditional detectors [14]. Motivated by these and other applications, we now develop stochastic approximations to R. For Case 1 of Table 1, we have:
Proposition 6. Let $H \sim \mathcal{CW}_m(n_H, \sigma^2 I + \lambda vv^\dagger)$, with ‖v‖ = 1 and λ > 0. Let $\hat v$ be the eigenvector corresponding to the largest eigenvalue of H. Then, with (m, nH, λ) fixed, for small σ

$$R \approx \left(1 + \frac{\sigma^2}{\lambda + \sigma^2}\cdot\frac{B}{A}\right)^{-1},$$

where $A \sim \tfrac{1}{2}\chi^2_{2n_H}$ and $B \sim \tfrac{1}{2}\chi^2_{2(m-1)}$ are independent.
The distribution of R in Case 2 of Table 1 is given by the following proposition.
Proposition 7. Let $H \sim \mathcal{CW}_m(n_H, \sigma^2 I, \omega vv^\dagger)$, with ‖v‖ = 1 and ω > 0. Let $\hat v$ be the eigenvector corresponding to the largest eigenvalue of H. Then, with (m, nH, ω) fixed, for small σ

$$R \approx \left(1 + \frac{B}{A}\right)^{-1},$$

where $A \sim \tfrac{1}{2}\chi^2_{2n_H}(2\omega/\sigma^2)$ and $B \sim \tfrac{1}{2}\chi^2_{2(m-1)}$ are independent.
Propositions 6 and 7 can be useful to analyze theoretically various DMR and radar detection schemes, and shed light on their dependence on the relevant system parameters.
For the double-matrix Case 3 in Table 1, we have
Proposition 8. Let $H \sim \mathcal{CW}_m(n_H, I + \lambda vv^\dagger)$ and $E \sim \mathcal{CW}_m(n_E, I)$ be independent, with nE > m + 1 and ‖v‖ = 1. Let $\hat v$ be the eigenvector corresponding to the largest eigenvalue of E−1H. Then, with (m, nH, nE) fixed, for large λ

$$R \approx \frac{X}{X + Y} \sim \mathrm{Beta}\big(n_E - m + 2,\; m - 1\big),$$

where $X \sim \tfrac{1}{2}\chi^2_{2(n_E - m + 2)}$ and $Y \sim \tfrac{1}{2}\chi^2_{2(m-1)}$ are independent.
In the context of array processing, the double matrix Case 3 of Table 1 corresponds to a setting where the noise characteristics of the m sensors are not perfectly known, but rather their covariance matrix is estimated from nE samples that do not contain any signal. Comparing Proposition 8 with Proposition 6 sheds light on the effect of estimating the covariance matrix of the noise. Whereas in Case 1, as signal strength λ → ∞ the quantity R converges to one, in Case 3, the random variable R does not converge to one, but rather to a Beta distribution.
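This contrast is easy to see by direct simulation. The sketch below (ours) estimates R in Cases 1 and 3 for a strong signal; in Case 1 the estimate is close to one, while in Case 3 it stays near the mean (nE − m + 2)/(nE + 1) of the Beta law in Proposition 8 as reconstructed above.

```python
import numpy as np

rng = np.random.default_rng(5)
m, nH, nE, lam = 5, 10, 16, 100.0

def sample_R(case, trials=3000):
    R = np.empty(trials)
    for t in range(trials):
        Z = (rng.standard_normal((m, nH)) + 1j * rng.standard_normal((m, nH))) / np.sqrt(2.0)
        Z[0, :] *= np.sqrt(1.0 + lam)                 # spike along v = e1: Sigma = I + lam e1 e1^dagger
        H = Z @ Z.conj().T
        if case == 1:
            M = H
        else:                                          # Case 3: E^{-1} H with noise-only E
            W = (rng.standard_normal((m, nE)) + 1j * rng.standard_normal((m, nE))) / np.sqrt(2.0)
            M = np.linalg.solve(W @ W.conj().T, H)
        vals, vecs = np.linalg.eig(M)
        vhat = vecs[:, np.argmax(vals.real)]
        R[t] = np.abs(vhat[0])**2 / np.linalg.norm(vhat)**2   # |<vhat, e1>|^2
    return R

print(sample_R(1).mean())                              # close to 1
print(sample_R(3).mean(), (nE - m + 2) / (nE + 1))     # both near 13/17
```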
Figs. 5 and 6 illustrate the accuracy of our approximate distributions of the squared inner product between the leading sample and population eigenvectors.
Fig. 5.

Empirical versus theoretical density of R in Case 1 (left) and Case 2 (right).
Fig. 6.

Comparison of empirical density of R in Case 3 of Table 1 with Proposition 8, for nH = 10, nE = 16, m = 5 and λ = 100.
4. Applications
We now demonstrate the utility of our approximations to Roy’s largest root distribution under a rank-one perturbation in three different engineering applications. The first two concern common problems in signal detection, whereas the third concerns the outage probability of a rank-one Rician fading MIMO channel.
4.1. Signal Detection in Noise
Detecting the presence of a signal in a noisy environment is a fundamental problem in detection theory. Specific examples include spectrum sensing in cognitive radio [17] and target detection in sonar and radar [42]. Assuming additive Gaussian noise, the observed vector at time t is of the form
$$y(t) = \sqrt{\lambda}\, s(t)\, u + n(t), \tag{11}$$

where $s(t) \in \mathbb{C}$ is the time dependent signal, $u \in \mathbb{C}^m$ with ‖u‖ = 1 is its direction, λ ≥ 0 is a measure of the signal strength and the vector $n(t) \in \mathbb{C}^m$ is a zero mean complex valued random noise, assumed to be independent of the signal and distributed as $\mathcal{CN}(0, \Sigma)$. The positive definite Hermitian matrix Σ is thus the population covariance of the additive random noise. In some cases it is assumed to be explicitly known, whereas in others it needs to be estimated. The signal s(t) is often modeled as a random quantity with $\mathbb{E}[|s(t)|^2] = 1$. For example, in multiple antenna spectrum sensing for cognitive radio a common model is that $s(t) \sim \mathcal{CN}(0, 1)$, namely s(t) = s1(t) + ιs2(t) where s1(t) and s2(t) are real valued and independent random variables distributed $\mathcal{N}(0, 1/2)$ [45, 46]. Similarly, in detection of constant modulus signals (e.g., FM signals [18]), s(t) = exp(ιϕ(t)), where ϕ(t) is random.
When the covariance matrix Σ of the noise vector n is assumed known, the observed data used to detect whether a signal is present are often nH i.i.d. observations y1, …, y_{n_H} from (11). A popular approach is to compute the matrix $H = \sum_{i=1}^{n_H} y_i y_i^\dagger$, and declare that a signal is present if some function of its eigenvalues exceeds a suitable threshold. Several such detection tests have been proposed [18, 45, 46], including Roy’s largest root [29]. As discussed below, depending on the model of the signal, this leads precisely to Cases 1 and 2 in Table 1.
In other situations, Σ is unknown, but it is possible to observe both the nH samples yi of (11) as well as an additional set of nE independent realizations n1, …, n_{n_E} of the noise vector n. The latter are measured, for example, in time slots at which it is a priori known that no signals are emitted. Here, a typical approach is to form both the matrix H as above and the matrix $E = \sum_{i=1}^{n_E} n_i n_i^\dagger$, and detect the presence of a signal via some function of the eigenvalues of E−1H. Signal detection based on the largest eigenvalue of E−1H leads to Cases 3 and 4 in Table 1.
As discussed in Section 2, one may assume without loss of generality that Σ = σ2I. Thus, when $s(t) \sim \mathcal{CN}(0, 1)$,

$$H \sim \mathcal{CW}_m\big(n_H, \sigma^2 I + \lambda uu^\dagger\big).$$
In contrast, if s(t) = exp(ιϕ(t)), conditional on ϕ1, …, ϕ_{n_H},

$$H \sim \mathcal{CW}_m\big(n_H, \sigma^2 I, n_H\lambda\, uu^\dagger\big).$$
Propositions 1-4 can thus be used to approximate the detection power of Roy’s largest root test as a function of signal strength λ in both the single matrix cases and the double matrix cases,
$$P_D(\lambda) = \Pr(\ell_1 > \mu), \tag{12}$$
where μ is a given threshold parameter. The accuracy of (12) is illustrated in Fig. 7.
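To show how such a power curve is produced, the sketch below (ours) simulates the double-matrix setting: H is formed from n_H signal-bearing snapshots following (11), E from n_E noise-only snapshots, and the empirical power Pr(ℓ1 > μ) of (12) is evaluated over a grid of thresholds; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
m, nH, nE, sigma, lam = 5, 5, 10, 0.2, 1.0
mus = np.linspace(0.5, 20.0, 40)                        # threshold grid

def largest_roots(trials=3000):
    out = np.empty(trials)
    u = np.zeros(m); u[0] = 1.0                         # signal direction (WLOG)
    for t in range(trials):
        s = (rng.standard_normal(nH) + 1j * rng.standard_normal(nH)) / np.sqrt(2.0)  # s(t) ~ CN(0,1)
        N = sigma * (rng.standard_normal((m, nH)) + 1j * rng.standard_normal((m, nH))) / np.sqrt(2.0)
        Y = np.sqrt(lam) * np.outer(u, s) + N           # snapshots y_i, per (11)
        H = Y @ Y.conj().T
        W = sigma * (rng.standard_normal((m, nE)) + 1j * rng.standard_normal((m, nE))) / np.sqrt(2.0)
        E = W @ W.conj().T                              # noise-only estimate of the covariance
        out[t] = np.linalg.eigvals(np.linalg.solve(E, H)).real.max()
    return out

ell1 = largest_roots()
power = [(ell1 > mu).mean() for mu in mus]              # empirical Pr(ell1 > mu), cf. (12)
print(power[:5])
```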
Fig. 7.

Detection power as a function of the threshold μ, for several signal to noise ratios, with a known noise covariance matrix (left) and an estimated one (right). In both cases λ = 1, nH = 5, m = 5. In the right panel nE = 10. From top to bottom, σ = 1/10, 2/10 and 3/10.
4.2. Rank-One Rician-Fading MIMO Channel
As a last application, consider the outage probability of a MIMO communication channel with nT transmit antennas and nR receive antennas. Here, the transmitted signal x and the received signal y are related as

$$y = Hx + n,$$

where H is the nR × nT channel matrix and n is additive random complex valued noise, assumed to be distributed as $\mathcal{CN}(0, \sigma_n^2 I)$, where $\sigma_n^2$ is its (real-valued) variance. Due to fluctuations in the environment, the channel matrix H is modeled as a random quantity. In particular, under a common Rician fading model [16], H has the form
$$H = \sqrt{\frac{K}{K+1}}\, H_1 + \sqrt{\frac{1}{K+1}}\, H_2, \tag{13}$$

where H1 represents the specular (Rician) component from a direct line-of-sight between transmitter and receiver antennas and H2 represents the scattered Rayleigh-fading component. With fixed sender and receiver locations, the matrix H1 is constant whereas H2 is random with entries modeled as i.i.d. complex Gaussians, $\mathcal{CN}(0, \sigma_H^2)$. Under this normalization, the factor K represents the ratio of deterministic-to-scattered power of the environment.
Under the maximal ratio transmission strategy, where the transmitter sends information along the leading eigenvector of HH†, the channel signal to noise ratio is given by
$$\mathrm{SNR} = \frac{\Omega_D}{\sigma_n^2}\,\ell_1(HH^\dagger), \tag{14}$$

where $\Omega_D$ is the power of the transmitted signal vectors [24]. An important quantity is the channel’s outage probability, defined as the probability of failing to achieve a specified minimal SNR μmin required for satisfactory reception. Based on (14), the outage probability Pout can be written as

$$P_{\mathrm{out}} = \Pr\big(\mathrm{SNR} < \mu_{\min}\big) = \Pr\left(\ell_1(HH^\dagger) < \frac{\mu_{\min}\,\sigma_n^2}{\Omega_D}\right). \tag{15}$$
One particularly interesting case is when the Rician component H1 is assumed to be of rank one, H1 = uv†, where $u \in \mathbb{C}^{n_R}$ with $\|u\|^2 = n_R$ and $v \in \mathbb{C}^{n_T}$ with $\|v\|^2 = n_T$. An important design question is which configuration of antennas minimizes (15), under the constraint that the total number of transmitting and receiving antennas is fixed. Via simulations, [24] showed that it is best to have an equal number of transmitting and receiving antennas. Here we analytically prove this result asymptotically in the limit of small scattering variance (i.e., σH ≪ 1).
Proposition 9. Consider a rank-one Rician fading channel with a fixed total number of antennas, nT + nR = N. Then, for σH ≪ 1, the outage probability is minimized at nT = nR = N/2 for N even (or nT = ⌊N/2⌋, nR = ⌈N/2⌉ for N odd).
Proof. Under the model in (13) and the assumption that H1 = uv† is rank one, the j-th column of H, of dimension nR, is distributed as $\mathcal{CN}\big(\sqrt{K/(K+1)}\,\bar v_j\, u,\ \tfrac{\sigma_H^2}{K+1}\, I\big)$. Therefore,

$$HH^\dagger = \sum_{j=1}^{n_T} h_j h_j^\dagger$$

is non-central Wishart, with

$$HH^\dagger \sim \mathcal{CW}_{n_R}\big(n_T,\ \tilde\sigma^2 I,\ \omega\, \tilde u \tilde u^\dagger\big), \qquad \tilde\sigma^2 = \frac{\sigma_H^2}{K+1}, \quad \tilde u = \frac{u}{\|u\|}, \quad \omega = \frac{K\, n_T n_R}{K+1}.$$
Thus, Proposition 2 implies that for fixed (nT, nR, K),
$$\ell_1 \approx A + \tilde\sigma^2 B + \tilde\sigma^4\,\frac{BC}{A}, \tag{16}$$
where A, B, C are independent random variables distributed as

$$A \sim \frac{\tilde\sigma^2}{2}\chi^2_{2n_T}(c_2), \qquad B \sim \frac{1}{2}\chi^2_{2(n_R-1)}, \qquad C \sim \frac{1}{2}\chi^2_{2(n_T-1)},$$

and

$$c_2 = \frac{2\omega}{\tilde\sigma^2} = \frac{2K\, n_T n_R}{\sigma_H^2}. \tag{17}$$
Since $\tilde\sigma^2 \to 0$ and $c_2 \to \infty$ as σH → 0, we may neglect the third term in (16). Furthermore, since A and B are independent,
Clearly Pout of (15) is minimal when the largest eigenvalue ℓ1 is stochastically as large as possible, or in turn, when its non-centrality parameter c2 is maximal. Since by (17) c2 ∝ nTnR, and under the constraint nT + nR = N the product nTnR is maximized at nT = nR = N/2, the proposition follows. □
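A direct Monte Carlo evaluation of (15) supports Proposition 9. In the sketch below (ours), the Rician weights and the SNR scaling follow (13)-(14) as reconstructed above; the threshold μ_min is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(7)
N, K, sigma_H, sigma_n, Omega_D = 8, 2.0, 0.3, 1.0, 5.0
mu_min = 40.0                                    # illustrative minimal SNR

def outage(nT, trials=4000):
    nR = N - nT
    H1 = np.outer(np.ones(nR), np.ones(nT))      # rank-one specular part: ||u||^2 = nR, ||v||^2 = nT
    cnt = 0
    for _ in range(trials):
        H2 = sigma_H * (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2.0)
        Hc = np.sqrt(K / (K + 1.0)) * H1 + np.sqrt(1.0 / (K + 1.0)) * H2
        ell1 = np.linalg.eigvalsh(Hc @ Hc.conj().T)[-1]
        snr = Omega_D * ell1 / sigma_n**2        # cf. (14)
        cnt += snr < mu_min
    return cnt / trials

print([(nT, outage(nT)) for nT in range(1, N)])  # minimum expected near nT = nR = N/2
```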
Fig. 8.

Outage probability as a function of nT, with nT + nR fixed. Circles represent a Monte-Carlo simulation whereas the solid line is our approximation (which can be computed for any non-integer nT ∈ ℝ+). These graphs support Proposition 9 and demonstrate the accuracy of our approximations. In both graphs, K = 2, σH = 0.3, σn = 1 and ΩD = 5.
Acknowledgments
We thank the editor in chief, the associate editor and the referees for their constructive comments and suggestions. This work was supported in part by grants NIH BIB R01EB1988 (PD) and BSF 2012-159 (PD, BN, OS). B.N. is incumbent of the William Petschek professorial chair of mathematics.
Appendix A. Proofs of main propositions
We prove our main results using the analytical framework developed in [23]. For a complex-valued number z, its real and imaginary parts are denoted ℜ(z) and ℑ(z), respectively, whereas $\bar z$ is its complex conjugate. We begin with the following auxiliary lemma, which describes the analytic structure of the leading eigenvalue and eigenvector of a covariance matrix constructed from vectors all in the same direction, which without loss of generality we choose as the standard vector e1 = (1, 0, …, 0)⊺, corrupted by small perturbations. Its proof is in Appendix B.
Lemma 1. Let $x_1, \ldots, x_n$ be n vectors in $\mathbb{C}^m$ of the form

$$x_j = u_j\, e_1 + \epsilon\, \xi_j, \tag{A.1}$$

where the uj are complex valued scalars, the $\xi_j \in \mathbb{C}^{m-1}$ are the perturbations in orthogonal directions to e1 and ϵ ∈ ℝ is a small parameter. Define z ∈ ℝ, $b \in \mathbb{C}^{m-1}$ and Z ∈ ℂ(m−1)×(m−1) by

$$z = \sum_{j=1}^n |u_j|^2, \qquad b = \frac{1}{\sqrt z}\sum_{j=1}^n \bar u_j\, \xi_j, \qquad Z = \sum_{j=1}^n \xi_j \xi_j^\dagger. \tag{A.2}$$

Let ℓ1(ϵ) be the largest eigenvalue of $H(\epsilon) = \sum_j x_j x_j^\dagger$ with corresponding leading eigenvector v1(ϵ) normalized such that $\langle v_1(\epsilon), e_1 \rangle = 1$. Then ℓ1(ϵ) is an even analytic function of ϵ, whereas v1(ϵ) − e1 is an odd function of ϵ. In particular, the Taylor expansions of ℓ1(ϵ) and v1(ϵ) at ϵ = 0 are given by

$$\ell_1(\epsilon) = z + \epsilon^2\, b^\dagger b + \frac{\epsilon^4}{z}\, b^\dagger\big(Z - bb^\dagger\big)b + O(\epsilon^6), \qquad v_1(\epsilon) = e_1 + \frac{\epsilon}{\sqrt z}\, b + O(\epsilon^3). \tag{A.3}$$
Proof of Propositions 1 and 2. Since the eigenvalues of H do not depend on the direction of the vector v, without loss of generality we assume that v = e1. Then, H may be realized from nH i.i.d. observations of the form (A.1) with ϵ replaced by σ,

$$u_j \sim \mathcal{CN}(0, \lambda + \sigma^2)\;\text{(central case)}, \qquad u_j \sim \mathcal{CN}(\mu_j, \sigma^2)\;\text{(non-central case)}, \qquad \xi_j \sim \mathcal{CN}(0, I_{m-1}), \tag{A.4}$$

where the μj are arbitrary complex numbers satisfying $\sum_j |\mu_j|^2 = \omega$.
For each realization of u = (uk) and $\Xi = [\xi_1, \ldots, \xi_{n_H}]$, Lemma 1 yields the approximation (A.3) for ℓ1(σ). To derive the distributions of the various terms in (A.3) we proceed as follows. Define $o_1 = \bar u/\|u\|$, choose columns o2, …, o_{n_H} so that O = [o1, …, o_{n_H}] is an nH×nH unitary matrix, and consider the (m−1)×nH matrix V = ΞO. Its first column is $v_1 = \Xi o_1 = b$, and thus the O(ϵ2) term in (A.3) is b†b = ‖v1‖2. For the fourth order term, observe that Z = ΞΞ† = VV†, and so the quantity D = b†(Z − bb†)b may be written as

$$D = \sum_{k=2}^{n_H} |v_k^\dagger b|^2.$$

Hence, (A.3) becomes

$$\ell_1(\sigma) = V_0 + \sigma^2 V_2 + \sigma^4 V_4 + O(\sigma^6),$$

where V0 = ‖u‖2, V2 = ‖v1‖2 and V4 = D/V0. To study the distributions of V0, V2, V4, note that by assumption in (A.4), $u_j \sim \mathcal{CN}(\mu_j, \sigma^2)$ with $\sum_j |\mu_j|^2 = \omega$ in the non-central case, and $u_j \sim \mathcal{CN}(0, \lambda + \sigma^2)$ in the central case. Therefore, $V_0 = \|u\|^2$ is a sum of 2nH independent squares of either mean centered or non-centered Gaussian random variables. This in turn gives

$$V_0 \sim \tfrac{\lambda + \sigma^2}{2}\chi^2_{2n_H}\;\text{(central case)}, \qquad V_0 \sim \tfrac{\sigma^2}{2}\chi^2_{2n_H}(2\omega/\sigma^2)\;\text{(non-central case)}.$$

Since given u, O is unitary and fixed, the entries of V = ΞO are i.i.d. $\mathcal{CN}(0, 1)$. Since this distribution is independent of u, $V_2 = \|v_1\|^2 \sim \tfrac{1}{2}\chi^2_{2(m-1)}$, independently of V0. By similar arguments, the columns v2, …, v_{n_H} are i.i.d. $\mathcal{CN}(0, I_{m-1})$, independent of ‖u‖2. Finally, conditioned on (u, v1), we have $v_k^\dagger b \sim \mathcal{CN}(0, \|b\|^2)$ for each k ≥ 2, and hence $D \sim \tfrac{\|b\|^2}{2}\chi^2_{2(n_H-1)}$. Thus,

$$V_4 = \frac{D}{V_0} = \frac{V_2}{V_0}\cdot C, \qquad C \sim \tfrac{1}{2}\chi^2_{2(n_H-1)},$$

where the variate C is independent of (u, v1). We conclude that

$$\ell_1(\sigma) \approx A + \sigma^2 B + \sigma^4\,\frac{BC}{A},$$

with A = V0, B = V2 and C as above.
Since the random variables V0, V2 and C are independent, so are A, B, C in either (1) or (2). This completes the proof of Propositions 1 and 2. □
To prove Propositions 3 and 4, we first introduce some additional notation and two auxiliary lemmas, whose proofs are deferred to Appendix B. For a matrix S, we denote by $S_{jk}$ and $S^{jk}$ the (j, k)-th entries of S and S−1, respectively.
Lemma 2. Let $E \sim \mathcal{CW}_m(n, I)$ and M = [e1, b], with the vector b fixed and orthogonal to e1. Define the 2 × 2 diagonal matrix $D = M^\dagger M = \mathrm{diag}(1, \|b\|^2)$ and set $S = (M^\dagger E^{-1} M)^{-1}$. Then

$$S \sim \mathcal{CW}_2\big(n - m + 2,\, D^{-1}\big),$$

and the two random variables $S^{11}$ and $S_{22}$ are independent with

$$\frac{1}{S^{11}} \sim \tfrac{1}{2}\chi^2_{2(n-m+1)}, \qquad S_{22} \sim \frac{1}{2\|b\|^2}\,\chi^2_{2(n-m+2)}.$$
Lemma 3. Let and let , where Z is an (m − 1) × (m − 1) random matrix independent of E, with . Then
Proof of Propositions 3 and 4. Without loss of generality we may assume that the signal direction is v = e1. Hence
Next, we apply a perturbation approach similar to the one used in the previous proof. To introduce a small parameter, set
The matrix Hϵ = ϵ2H has a representation of the form X†X with X = [x1, …, x_{n_H}], where each xj follows (A.1) but now with
where Σ |μj|2 = ω. In particular,
With b as in (A.2), using the same arguments as in the previous proof, we have that $\|b\|^2 \sim \tfrac{1}{2}\chi^2_{2(m-1)}$, independently of u.
The matrix Hϵ may be written as Hϵ = A0 + ϵA1 + ϵ2A2, where
| (A.5) |
with Z as in (A.2). For future use we define the following quantities
Note that the condition nE ≥ m ensures that E is invertible with probability 1. This follows for example from Theorem 3.2 in [9].
The matrix E−1Hϵ is similar to the Hermitian matrix E−1/2HϵE−1/2. Therefore, all its eigenvalues are real-valued for any value of ϵ. Furthermore, since E−1/2HϵE−1/2 is a holomorphic family of Hermitian matrices in ϵ, it follows from Kato ([25], Theorem 6.1, page 120) that the largest eigenvalue ℓ1 and its eigenprojection are analytic functions of ϵ in some neighborhood of zero, where the largest eigenvalue has multiplicity one. We denote by P(ϵ) the projection onto the corresponding eigenspace of E−1Hϵ. As the matrix E does not depend on ϵ, this projection is also an analytic function in some neighborhood of ϵ = 0.
At ϵ = 0, E−1e1 is an eigenvector with eigenvalue E11z, that is,
from which we obtain
| (A.6) |
Since is an analytic function of ϵ and the inner product is a smooth function, then there exists a neighborhood of ϵ = 0 where , e1 is both analytic in ϵ and strictly positive. In this neighborhood, we may define
| (A.7) |
Clearly v1(ϵ) is the eigenvector corresponding to the eigenvalue ℓ1(ϵ) and it is also analytic. We thus expand

$$\ell_1(\epsilon) = \sum_{r \ge 0} \lambda_r\, \epsilon^r, \qquad v_1(\epsilon) = \sum_{r \ge 0} w_r\, \epsilon^r. \tag{A.8}$$
Inserting these expansions into the eigenvalue-eigenvector equations E−1Hϵv1 = ℓ1v1 gives the following equations: at the O(1) level,
whose solution is
| (A.9) |
By (A.6)-(A.7), w0 = v1(0) = E−1e1, so the above constant is one.
By (A.7), $\langle v_1(\epsilon), e_1 \rangle$ does not depend on ϵ. Hence $\langle w_j, e_1 \rangle = 0$ for all j ≥ 1. Furthermore, since $A_0 = z\, e_1 e_1^\dagger$, then A0wj = 0 for all j ≥ 1. The O(ϵ) equation is thus
| (A.10) |
However, A0w1 = 0. Multiplying this equation by $e_1^\dagger$ and using $\langle w_1, e_1 \rangle = 0$ gives that
| (A.11) |
Inserting the expression for λ1 into (A.10) gives that
The next O(ϵ2) equation is
Multiply this equation by $e_1^\dagger$, and recall that A0w2 = 0 and $\langle w_2, e_1 \rangle = 0$, to obtain
| (A.12) |
Combining (A.9)-(A.12), we obtain the following approximate stochastic representation for the largest eigenvalue ℓ1 of E−1Hϵ
| (A.13) |
Next, to derive the approximate distribution of ℓ1 corresponding to the above equation, we study a 2 × 2 Hermitian matrix S, whose inverse is defined by
where $S^{-1} = M^\dagger E^{-1} M$ is a 2 × 2 matrix, with M = [e1, b] as in Lemma 2. Inverting this matrix gives
Hence in terms of the matrices S and S−1, (A.13) can be written as
| (A.14) |
To establish Propositions 3 and 4, we start from (A.14). We neglect the second term which is symmetric with mean zero, and whose variance is much smaller than that of the first term. We also approximate the last term, denoted by T2, by its mean value, using Lemma 3. We now have
where c(m, n) is the expectation from Lemma 3. Since ℓ1(ϵ) is the largest eigenvalue of E−1Hϵ = ϵ2E−1H, (A.14) should be divided by ϵ2 to obtain the largest eigenvalue of E−1H. Doing so, and inserting the distributions of $S^{11}$ and $S_{22}$ from Lemma 2, gives
Next, by inserting the distributions of ||b||2, z and the relevant value of ϵ, we get that for Proposition 3
and for Proposition 4
From Lemma 2 and the independence of $\|b\|^2$ and z, all of the above χ2 random variables are independent. Finally, since ratios of independent χ2 random variables follow an F distribution, the two propositions follow. □
Proof of Proposition 5. By (8), the non-centrality parameter ω depends on the data only through X†X. Conditioning on X†X, following (7), we invoke Proposition 4 with the parameters m = p, nH = q, and nE = n − q to obtain
Now the final result follows by integrating over the distribution of ω, and using the definition of F(a, b, δ) given in (9). □
Proof of Propositions 6 and 7. Let us assume without loss of generality that v = e1. If $\hat v$ is not normalized, then we can write (10) as $R = |\langle \hat v, e_1 \rangle|^2/\|\hat v\|^2$. From Lemma 1, we have
where
with i.i.d. variables all independent of ,
and ||μ||2 = ω. Therefore,
The result follows from the distribution of these quantities. □
Proof of Proposition 8. Let us rewrite (A.8) as follows
where w0 = E−1e1 and w1 is as in the expansion (A.8). For convenience, decompose the matrices E and E−1 as
| (A.15) |
where $E_{11} \in \mathbb{R}$, $E_{12} \in \mathbb{C}^{m-1}$ and $E_{22} \in \mathbb{C}^{(m-1)\times(m-1)}$. Consequently, $E^{11} = 1/(E_{11} - E_{12}^\dagger E_{22}^{-1} E_{12}) \in \mathbb{R}$ and $E^{12} = -E^{11}\, E_{12}^\dagger E_{22}^{-1}$. The exact form of $E^{22}$ is unimportant as it does not affect our calculations.
Let us now focus on the numerator of R. Since $e_1^\dagger w_1 = 0$, we have
from which we obtain
The denominator of R can be written as
Using the decomposition of E−1 given in (A.15), we get
Now we can conveniently express R as
where
Since PE and QE are zero mean random variables, we neglect them to obtain
where we have used the relation above. It can be shown that the relevant ratio is beta distributed with parameters nE − m + 2 and m − 1. Now the final result follows from the observation that, for independent $X \sim \chi^2_p$ and $Y \sim \chi^2_q$, X/(X + Y) is beta distributed with parameters p/2 and q/2. □
Appendix B. Proof of Auxiliary Lemmas
Proof of Lemma 1. Write the m × n matrix X(ϵ) = [x1, …, xn] and observe that X(−ϵ) = UX(ϵ), where U = diag(1, −1, …, −1) is an orthogonal matrix. Thus, H(−ϵ) = U⊺H(ϵ)U has the same eigenvalues as H(ϵ). In particular, the largest eigenvalue ℓ1 and its corresponding eigenvector v1 satisfy

$$\ell_1(-\epsilon) = \ell_1(\epsilon), \qquad v_1(-\epsilon) = U\, v_1(\epsilon). \tag{B.1}$$
Hence ℓ1 and the first component of v1 are even functions of ϵ whereas the remaining components of v1 are odd.
We decompose the matrix H(ϵ) as

$$H(\epsilon) = A_0 + \epsilon A_1 + \epsilon^2 A_2,$$

with the matrices A0, A1 and A2 given in (A.5). Following arguments similar to those leading to (A.7) and (A.8) with E = I, we can establish that ℓ1(ϵ) and v1(ϵ) are analytic in some neighborhood of zero. Therefore, we have the following Taylor series expansions:

$$\ell_1(\epsilon) = \sum_{r \ge 0} \lambda_r\, \epsilon^r, \qquad v_1(\epsilon) = \sum_{r \ge 0} w_r\, \epsilon^r. \tag{B.2}$$
Also, the eigenprojection P(ϵ) of ℓ1 satisfies
$$P(-\epsilon) = U\, P(\epsilon)\, U^\intercal. \tag{B.3}$$
Inserting the expansions (B.2) into the eigenvalue equation Hv1 = ℓ1v1 gives the following set of equations for r ≥ 0:

$$A_0 w_r + A_1 w_{r-1} + A_2 w_{r-2} = \sum_{s=0}^{r} \lambda_s\, w_{r-s}, \tag{B.4}$$

with the convention that vectors with negative subscripts are zero. From the r = 0 equation, A0w0 = λ0w0, we readily find that

$$\lambda_0 = z, \qquad w_0 = e_1.$$

Eq. (B.3) implies that $P(0) = U P(0) U^\intercal$ and w0 = v1(0) = e1. This implies that wj, for j ≥ 1, is orthogonal to e1, that is, orthogonal to w0.
From the eigenvector remarks following (B.1) it follows that w2j = 0 for j ≥ 1. These remarks allow considerable simplification of (B.4); we use those for r = 1 and r = 3
| (B.5) |
from which we obtain
| (B.6) |
Multiply (B.4) on the left by $e_1^\dagger$ and use the first equation of (B.5) to obtain, for r even,
and hence
Therefore, we can further simplify (B.6) to yield
To prove Lemmas 2 and 3, we shall use the following two claims, which are the complex analogues of Theorems 3.2.10 and 3.2.11 in Muirhead [32]. While their proofs are similar to those in the real valued case, for completeness we present them below.
Claim 1. Suppose $A \sim \mathcal{CW}_m(n, \Sigma)$ with n > m − 1, where A and Σ are partitioned as follows

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

with $A_{11}, \Sigma_{11} \in \mathbb{C}^{k \times k}$, and let $A_{11.2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ and $\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Then, A11.2 is distributed as $\mathcal{CW}_k(n - m + k, \Sigma_{11.2})$ and is independent of A12, A21 and A22.
Claim 2. Let $A \sim \mathcal{CW}_m(n, \Sigma)$ and let M be a k×m matrix of rank k, where M is independent of A. Then $(MA^{-1}M^\dagger)^{-1} \sim \mathcal{CW}_k\big(n - m + k, (M\Sigma^{-1}M^\dagger)^{-1}\big)$.
Proof of Claim 1. Let C = Σ−1. We partition it conformably with A as follows,
| (B.7) |
where , , and with . Consequently, .
Following [15, 19], the density of A is given by
| (B.8) |
where tr(·) denotes the trace operator and
with Γ(·) denoting the classical gamma function.
To prove the claim we shall study the form of det(A) and of tr(Σ−1A). First of all, we have that

$$\det(A) = \det(A_{22})\,\det(A_{11.2}).$$
Next, we introduce a change of variables from the entries of the matrix A to $A_{11.2}$, B12 = A12, B22 = A22. The Jacobian of this transformation is an upper triangular matrix, with all diagonal entries equal to one. Hence, the volume element in (B.8) is dA = dA11 dA12 dA22 = dA11.2 dB12 dB22. Furthermore, using the expansion
along with the fact that yields that
| (B.9) |
Now we may use the decomposition
to rewrite (B.9) as
| (B.10) |
where
| (B.11) |
and
The factorization in (B.10) establishes that A11.2 is independent of A12 and A22. Finally, (B.11) implies that which concludes the proof. □
Proof of Claim 2. Set B = Σ−1/2AΣ−1/2. Now $B \sim \mathcal{CW}_m(n, I)$. For R = MΣ−1/2, (MA−1M†)−1 = (RB−1R†)−1 and (MΣ−1M†)−1 = (RR†)−1. Thus, it is sufficient to prove that $(RB^{-1}R^\dagger)^{-1} \sim \mathcal{CW}_k\big(n - m + k, (RR^\dagger)^{-1}\big)$. Let R = L[Ik : 0]H be the singular value decomposition of R, where L is k × k and nonsingular and H is m × m unitary. Now,
where . Let
where F11 and C11 are k × k. Then , and since , it follows from Claim 1 that . Hence , and since (LL†)−1 = (RR†)−1, the proof is complete. □
Proof of Lemma 2. Note that $M^\dagger M = \mathrm{diag}(1, \|b\|^2) = D$. Then, by Claim 2, $(M^\dagger E^{-1}M)^{-1} \sim \mathcal{CW}_2(n - m + 2, D^{-1})$, meaning $S \sim \mathcal{CW}_2(n - m + 2, D^{-1})$. Next, by definition S = (M†E−1M)−1, with fixed M. Thus, by the same claim applied with the 1 × 1 matrix $M = e_1^\dagger$, $1/S^{11} = (e_1^\dagger E^{-1} e_1)^{-1} \sim \tfrac{1}{2}\chi^2_{2(n-m+1)}$, from which we obtain the stated distributions. Finally, since $D^{-1}$ is diagonal, by Claim 1, (S11)−1 is independent of S22. □
Proof of Lemma 3. First we decompose the expectation as follows:
Next, since A2 is independent of E,
Combining the above two equations gives that
To compute this expectation, consider the matrix S. Since $S_{22} = E_{22}$ and $S^{22} = E_{11}/(E_{11}E_{22} - \|E_{12}\|^2)$, we have
| (B.12) |
Noting that and , we take the expectation of both sides of (B.12) to obtain
which completes the proof. □
References
- [1] Anderson TW, An introduction to multivariate statistical analysis, third edition, Wiley, New York, 2003.
- [2] Asendorf N, Nadakuditi RR, Improved detection of correlated signals in low-rank-plus-noise type data sets using informative canonical correlation analysis (ICCA), IEEE Trans. Inform. Theory 63 (2017) 3451–3467.
- [3] Baik J, Ben Arous G, Péché S, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Ann. Probab. 33 (2005) 1643–1697.
- [4] Chiani M, Distribution of the largest eigenvalue for real Wishart and Gaussian random matrices and a simple approximation for the Tracy–Widom distribution, J. Mult. Anal. 129 (2014) 68–81.
- [5] Chiani M, Distribution of the largest root of a matrix for Roy’s test in multivariate analysis of variance, J. Mult. Anal. 143 (2016) 467–471.
- [6] Chiani M, On the probability that all eigenvalues of Gaussian, Wishart, and double Wishart random matrices lie within an interval, IEEE Trans. Inform. Theory 63 (2017) 4521–4531.
- [7] Correa NM, Adali T, Li YO, Calhoun VD, Canonical correlation analysis for data fusion and group inferences, IEEE Sig. Proc. Magazine 27 (2010) 39–50.
- [8] Dharmawansa P, Johnstone IM, Onatski A, Local asymptotic normality of the spectrum of high-dimensional spiked F-ratios, arXiv preprint arXiv:1411.3875 (2014).
- [9] Edelman A, Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal. Appl. 9 (1988) 543–560.
- [10] El Karoui N, A rate of convergence result for the largest eigenvalue of complex white Wishart matrices, Ann. Probab. 34 (2006) 2077–2117.
- [11] Ge H, Kirsteins IP, Wang X, Does canonical correlation analysis provide reliable information on data correlation in array processing?, in: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2113–2116.
- [12] Gogineni S, Setlur P, Rangaswamy M, Nadakuditi RR, Random matrix theory inspired passive bistatic radar detection of low-rank signals, in: IEEE Radar Conference, pp. 1656–1659.
- [13] Gogineni S, Setlur P, Rangaswamy M, Nadakuditi RR, Comparison of passive radar detectors with noisy reference signal, in: IEEE Statistical Signal Processing Workshop (SSP), pp. 1–5.
- [14] Gogineni S, Setlur P, Rangaswamy M, Nadakuditi RR, Passive radar detection with noisy reference signal using measured data, in: IEEE Radar Conference, pp. 858–861.
- [15] Goodman NR, Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction), Ann. Math. Statist. 34 (1963) 152–177.
- [16] Hansen J, Bölcskei H, A geometrical investigation of the rank-1 Ricean MIMO channel at high SNR, in: Intl. Symp. on Inform. Theory, IEEE, p. 64.
- [17] Haykin S, Cognitive radio: brain-empowered wireless communications, IEEE J. Sel. Areas Commun. 23 (2005) 201–220.
- [18] Haykin S, Moher M, Communication systems, 5th edition, Wiley, New York, 2009.
- [19] James AT, Distributions of matrix variates and latent roots derived from normal samples, Ann. Math. Statist. 35 (1964) 475–501.
- [20] Johansson K, Shape fluctuations and random matrices, Comm. Math. Phys. 209 (2000) 437–476.
- [21] Johnstone IM, On the distribution of the largest eigenvalue in principal components analysis, Ann. Statist. 29 (2001) 295–327.
- [22] Johnstone IM, Approximate null distribution of the largest root in multivariate analysis, Ann. Appl. Statist. 3 (2009) 1616–1633.
- [23] Johnstone IM, Nadler B, Roy’s largest root test under rank-one alternatives, Biometrika 104 (2017) 181–193.
- [24] Kang M, Alouini M-S, Largest eigenvalue of complex Wishart matrices and performance analysis of MIMO MRC systems, IEEE J. Sel. Areas Commun. 21 (2003) 418–426.
- [25] Kato T, Perturbation theory for linear operators, second edition, Springer, Berlin, 1995.
- [26] Khalid MU, Seghouane AK, Improving functional connectivity detection in fMRI by combining sparse dictionary learning and canonical correlation analysis, in: IEEE 10th Intl. Symp. on Biomedical Imaging, pp. 286–289.
- [27] Khatri C, Distribution of the largest or the smallest characteristic root under null hypothesis concerning complex multivariate normal populations, Ann. Math. Statist. 35 (1964) 1807–1810.
- [28] Khatri C, Non-central distributions of the i-th largest characteristic roots of three matrices concerning complex multivariate normal populations, Ann. I. Stat. Math. 21 (1969) 23–32.
- [29] Kritchman S, Nadler B, Non-parametric detection of the number of signals: hypothesis testing and random matrix theory, IEEE Trans. Signal Process. 57 (2009) 3930–3941.
- [30] Lin D, Zhang J, Li J, Calhoun V, Wang YP, Identifying genetic connections with brain functions in schizophrenia using group sparse canonical correlation analysis, in: IEEE 10th Intl. Symp. Biomedical Imaging, pp. 278–281.
- [31] Ma Z, Accuracy of the Tracy–Widom limits for the extreme eigenvalues in white Wishart matrices, Bernoulli 18 (2012) 322–359.
- [32] Muirhead RJ, Aspects of multivariate statistical theory, Wiley, New York, 1982.
- [33] Nadakuditi RR, Silverstein JW, Fundamental limit of sample generalized eigenvalue based detection of signals in noise using relatively few signal-bearing and noise-only samples, IEEE J. Sel. Topics Sig. Proc. 4 (2010) 468–480.
- [34] Paul D, Asymptotics of sample eigenstructure for a large dimensional spiked covariance model, Statist. Sinica 17 (2007) 1617–1642.
- [35] Pezeshki A, Scharf LL, Azimi-Sadjadi MR, Lundberg M, Empirical canonical correlation analysis in subspaces, in: Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, volume 1, pp. 994–997.
- [36] Ratnarajah T, Vaillancourt R, Alvo M, Eigenvalues and condition numbers of complex random matrices, SIAM J. Matrix Anal. Appl. 26 (2004) 441–456.
- [37] Ratnarajah T, Vaillancourt R, Alvo M, Complex random matrices and Rician channel capacity, Problems of Information Transmission 41 (2005) 1–22.
- [38] Roy SN, On a heuristic method of test construction and its use in multivariate analysis, Ann. Math. Statist. 24 (1953) 220–238.
- [39] Roy SN, Some aspects of multivariate analysis, Wiley, New York, 1957.
- [40] Scharf L, Thomas JK, Wiener filters in canonical coordinates for transform coding, filtering, and quantizing, IEEE Trans. Signal Process. 46 (1998) 647–654.
- [41] Sugiyama T, Distributions of the largest latent root of the multivariate complex Gaussian distribution, Ann. I. Stat. Math. 24 (1972) 87–94.
- [42] Van Trees HL, Optimum array processing: Part IV of detection, estimation, and modulation theory, Wiley, New York, 2002.
- [43] Wage KE, Buck JR, Snapshot performance of the dominant mode rejection beamformer, IEEE J. Oceanic Eng. 39 (2014) 212–225.
- [44] Zanella A, Chiani M, Win MZ, On the marginal distribution of the eigenvalues of Wishart matrices, IEEE Trans. Commun. 57 (2009) 1050–1060.
- [45] Zeng Y, Liang Y-C, Eigenvalue-based spectrum sensing algorithms for cognitive radio, IEEE Trans. Commun. 57 (2009) 1784–1793.
- [46] Zeng Y, Liang Y-C, Hoang AT, Zhang R, A review on spectrum sensing for cognitive radio: challenges and solutions, EURASIP J. Adv. Sig. Pr. 2010 (2010) 381465.
