Estimating the Correlation in Bivariate Normal Data with Known Variances and Small Sample Sizes

Bailey K Fosdick; Adrian E Raftery

doi:10.1080/00031305.2012.676329

Author manuscript; available in PMC: 2013 Jan 30.

Published in final edited form as: Am Stat. 2012 Mar 21;66(1):34–41. doi: 10.1080/00031305.2012.676329

PMCID: PMC3558980 NIHMSID: NIHMS433250 PMID: 23378667

Abstract

We consider the problem of estimating the correlation in bivariate normal data when the means and variances are assumed known, with emphasis on the small sample case. We consider eight different estimators, several of them considered here for the first time in the literature. In a simulation study, we found that Bayesian estimators using the uniform and arc-sine priors outperformed several empirical and exact or approximate maximum likelihood estimators in small samples. The arc-sine prior did better for large values of the correlation. For testing whether the correlation is zero, we found that Bayesian hypothesis tests outperformed significance tests based on the empirical and exact or approximate maximum likelihood estimators considered in small samples, but that all tests performed similarly for sample size 50. These results lead us to suggest using the posterior mean with the arc-sine prior to estimate the correlation in small samples when the variances are assumed known.

Keywords: Arc-sine prior, Bayes factor, Bayesian test, Maximum likelihood estimator, Uniform prior, Jeffreys prior

1 INTRODUCTION

Sir Francis Galton defined the theoretical concept of bivariate correlation in 1885, and a decade later Karl Pearson published the formula for the sample correlation coefficient, also known as Pearson’s r (Rodgers and Nicewander, 1988). The sample correlation coefficient is still the most commonly used measure of correlation today as it assumes no knowledge of the means or variances of the individual groups and is the maximum likelihood estimator for the correlation coefficient in the bivariate normal distribution when the means and variances are unknown.

In the event that the variances are known, information is lost by using the sample correlation coefficient. We cannot simply substitute the known variance quantities into the denominator of the sample correlation coefficient since that results in an estimator that is not the maximum likelihood estimator and has the potential to fall outside the interval [−1, 1]. When the variances are known, we seek an estimator that takes advantage of this information.

Kendall and Stuart (1979) noted that conditional on the variances, the maximum likelihood estimator of the correlation is the solution of a cubic equation. Sampson (1978) proposed a consistent, asymptotically efficient estimator based on the cubic equation that avoided the need to solve the equation directly. In a simulation study, we found that when the true correlation is zero and the sample size is small, the variances of these estimators are undesirably large. This led us to search for more stable estimates of the correlation, which condition on the known variances and perform well when sample sizes are small.

Our interest in this problem arose in the context of probabilistic population projections. Alkema et al. (2011) developed a Bayesian hierarchical model for projecting the total fertility rate (TFR) in all countries. This model works well for projecting the TFR in individual countries. However, for creating aggregated regional projections, there was concern that excess correlation existed between the country fertility rates that was not accounted for in the model. To investigate this we considered correlations between the normalized forecast errors in different countries, conditional on the model parameters. Often there were as few as five to ten data points to estimate the correlation. For each pair of countries, these errors were treated as samples from a bivariate normal distribution with means equal to zero and variances equal to one. Determining whether the correlations between the countries are nonzero, and if so estimating them, is necessary to assess the predictive distribution of aggregated projections.

In Section 2 we describe the estimators we consider, in Section 3 we give the results of our simulation study, and in Section 4 we discuss alternative approaches.

2 ESTIMATORS OF CORRELATION

Let (X_i, Y_i), i = 1, …, n be independent and identically distributed observations from a bivariate normal distribution with means equal to zero, variances equal to one, and correlation unknown. We let $SSx = \sum_{i = 1}^{n} X_{i}^{2}, SSy = \sum_{i = 1}^{n} Y_{i}^{2}$ , and $SSxy = \sum_{i = 1}^{n} X_{i} Y_{i}$ and consider eight estimators of the correlation.

The first estimator is the maximum likelihood estimator for bivariate normal data when the variances are unknown. We refer to this as the sample correlation coefficient even though we have conditioned on the means being zero. This estimator is defined as follows:

{\hat{ρ}}^{(1)} = \frac{\frac{\sum_{i = 1}^{n} X_{i} Y_{i}}{n}}{\sqrt{(\frac{\sum_{i = 1}^{n} X_{i}^{2}}{n}) (\frac{\sum_{i = 1}^{n} Y_{i}^{2}}{n})}} = \frac{SSxy}{\sqrt{SSx SSy}} .

The second estimator is a modification of the first estimator, where we assume the variances are known to be equal to one. We name this estimator the empirical estimator with known variances and define it as:

{\hat{ρ}}^{(2)} = \frac{\sum_{i = 1}^{n} X_{i} Y_{i}}{n} = \frac{SSxy}{n} .

This estimator is unbiased yet is not guaranteed to fall in [−1, 1], especially for small samples. This unappealing property motivated us to define the third estimator called the truncated empirical estimator with known variances, ρ̂⁽³⁾, where the second estimator is truncated at −1 if it falls below −1 and at 1 if it falls above 1.
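
The first three estimators can be sketched in a few lines. This is a minimal illustration, assuming plain Python lists as input; the function names (`sums_of_squares`, `sample_corr`, `empirical_known_var`, `truncated_empirical`) and the example data are hypothetical and do not come from the paper.

```python
import math

def sums_of_squares(x, y):
    """Return (SSx, SSy, SSxy) for paired observations with known zero means."""
    ssx = sum(a * a for a in x)
    ssy = sum(b * b for b in y)
    ssxy = sum(a * b for a, b in zip(x, y))
    return ssx, ssy, ssxy

def sample_corr(x, y):
    """Estimator 1: SSxy / sqrt(SSx * SSy), in [-1, 1] by the Cauchy-Schwarz inequality."""
    ssx, ssy, ssxy = sums_of_squares(x, y)
    return ssxy / math.sqrt(ssx * ssy)

def empirical_known_var(x, y):
    """Estimator 2: SSxy / n; unbiased but may fall outside [-1, 1]."""
    return sums_of_squares(x, y)[2] / len(x)

def truncated_empirical(x, y):
    """Estimator 3: estimator 2 clipped to [-1, 1]."""
    return max(-1.0, min(1.0, empirical_known_var(x, y)))

# Illustrative data (not from the paper): two large same-signed pairs
# push estimator 2 above 1, which estimator 3 then truncates.
x = [2.0, -1.5, 0.1]
y = [1.8, -1.7, 0.2]
print(sample_corr(x, y), empirical_known_var(x, y), truncated_empirical(x, y))
```

With these three made-up pairs the empirical estimator exceeds one, which is the small-sample behaviour that motivates the truncated version.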

The maximum likelihood estimator (MLE) when the means are known to be zero and variances are known to be one is the fourth estimator. This estimator is found by solving the cubic equation

0 = ρ^{3} - ρ^{2} \frac{SSxy}{n} - ρ \frac{(n - SSx - SSy)}{n} - \frac{SSxy}{n}, \qquad (1)

which results from setting the derivative of the log-likelihood equal to zero. If we define

\begin{array}{l} ψ \equiv ψ (SSx, SSy, SSxy) = - 3 n (n - SSx - SSy) - {SSxy}^{2}, \text{ and} \\ γ \equiv γ (SSx, SSy, SSxy) = - 36 n^{2} SSxy + 9 n SSx \times SSxy + 9 n SSy \times SSxy - 2 {SSxy}^{3}, \end{array}

then the three roots of this equation can be written fairly compactly, as follows:

\begin{array}{l} ρ_{1}^{(4)} = \frac{SSxy}{3 n} + \frac{2^{1 / 3} (ψ)}{3 n {(γ + \sqrt{4 {(ψ)}^{3} + {(γ)}^{2}})}^{1 / 3}} - \frac{{(γ + \sqrt{4 {(ψ)}^{3} + {(γ)}^{2}})}^{1 / 3}}{3 \times 2^{1 / 3} n}, \\ ρ_{2}^{(4)} = \frac{SSxy}{3 n} - \frac{(1 + i \sqrt{3}) (ψ)}{3 \times 2^{2 / 3} n {(γ + \sqrt{4 {(ψ)}^{3} + {(γ)}^{2}})}^{1 / 3}} + \frac{(1 - i \sqrt{3}) {(γ + \sqrt{4 {(ψ)}^{3} + {(γ)}^{2}})}^{1 / 3}}{6 \times 2^{1 / 3} n}, \\ ρ_{3}^{(4)} = \frac{SSxy}{3 n} - \frac{(1 - i \sqrt{3}) (ψ)}{3 \times 2^{2 / 3} n {(γ + \sqrt{4 {(ψ)}^{3} + {(γ)}^{2}})}^{1 / 3}} + \frac{(1 + i \sqrt{3}) {(γ + \sqrt{4 {(ψ)}^{3} + {(γ)}^{2}})}^{1 / 3}}{6 \times 2^{1 / 3} n} . \end{array}

Kendall and Stuart (1979) noted that at least one of the roots above is real and lies in the interval [−1, 1]. However, it is possible that all three roots are real and in the admissible interval, in which case the likelihood can be evaluated at each root to determine the true maximum likelihood estimate. Based on whether (SSxy/n)² is bigger than 3(SSx/n+SSy/n−1), and whether γ/(2ψ) is bigger than 1, Madansky (1958) specified conditions under which each of the three roots is the maximum likelihood estimate.
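
One way to turn the closed-form roots into an estimate is sketched below. It evaluates the three expressions with complex arithmetic, keeps the real roots inside (−1, 1), and returns the one with the largest likelihood, as described above. The function names, the tolerance `tol`, and the example sums are assumptions made for illustration; Madansky's (1958) case-by-case conditions are not implemented, and the degenerate case in which the cube-root term is exactly zero is left unhandled.

```python
import cmath
import math

def cubic_roots(ssx, ssy, ssxy, n):
    """Three (possibly complex) roots of equation (1) from the closed form in psi and gamma."""
    psi = -3 * n * (n - ssx - ssy) - ssxy ** 2
    gamma = (-36 * n ** 2 * ssxy + 9 * n * ssx * ssxy
             + 9 * n * ssy * ssxy - 2 * ssxy ** 3)
    # Complex arithmetic is needed: the square root is imaginary
    # when the cubic has three distinct real roots.
    c = (gamma + cmath.sqrt(4 * psi ** 3 + gamma ** 2)) ** (1 / 3)
    # If c == 0 (psi == 0 with gamma <= 0) the divisions below fail;
    # this sketch does not treat that case.
    w = 2 ** (1 / 3)
    s3 = 1j * math.sqrt(3)
    base = ssxy / (3 * n)
    r1 = base + w * psi / (3 * n * c) - c / (3 * w * n)
    r2 = base - (1 + s3) * psi / (3 * w ** 2 * n * c) + (1 - s3) * c / (6 * w * n)
    r3 = base - (1 - s3) * psi / (3 * w ** 2 * n * c) + (1 + s3) * c / (6 * w * n)
    return [r1, r2, r3]

def log_lik(rho, ssx, ssy, ssxy, n):
    """Log-likelihood in rho up to an additive constant (means 0, variances 1)."""
    return (-0.5 * n * math.log(1 - rho ** 2)
            - (ssx - 2 * rho * ssxy + ssy) / (2 * (1 - rho ** 2)))

def mle_known_var(ssx, ssy, ssxy, n, tol=1e-8):
    """Estimator 4: the real root inside (-1, 1) with the largest likelihood."""
    real = [r.real for r in cubic_roots(ssx, ssy, ssxy, n)
            if abs(r.imag) < tol and abs(r.real) < 1]
    return max(real, key=lambda r: log_lik(r, ssx, ssy, ssxy, n))

# Illustrative sums for n = 5 (not data from the paper).
ssx, ssy, ssxy, n = 3.15, 3.31, 3.02, 5
print(mle_known_var(ssx, ssy, ssxy, n))
```

Comparing likelihood values directly is a design choice made here for simplicity; it avoids encoding the root-selection conditions at the cost of three extra likelihood evaluations.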

Sampson (1978) acknowledged the effort involved in computing the maximum likelihood estimate when the variances are known and proposed an asymptotically efficient estimator of the correlation based solely on the coefficients in the cubic equation (1). Sampson’s estimator does not necessarily fall in the interval [−1, 1] so he suggested truncating the estimate to lie in the interval, as was done with the empirical estimator with known variances. This less computationally intensive estimator is referred to as Sampson’s truncated MLE approximation, ρ̂⁽⁵⁾, and is the fifth estimator we consider.

The remaining three estimators are Bayesian. Our sixth estimator is the posterior mean assuming a uniform prior, which has the form:

{\hat{ρ}}^{(6)} = E [ρ ∣ X, Y] = \frac{\int_{- 1}^{1} \frac{ρ}{2} {(\frac{1}{2 π \sqrt{1 - ρ^{2}}})}^{n} exp (- \frac{1}{2 (1 - ρ^{2})} [SSx - 2 ρ SSxy + SSy]) d ρ}{\int_{- 1}^{1} \frac{1}{2} {(\frac{1}{2 π \sqrt{1 - ρ^{2}}})}^{n} exp (- \frac{1}{2 (1 - ρ^{2})} [SSx - 2 ρ SSxy + SSy]) d ρ}

where X = (X₁, …, X_n) and Y = (Y₁, …, Y_n). The denominator is the integral of the likelihood of the bivariate normal data multiplied by 1/2, representing the Uniform(−1, 1) prior, while the numerator is the same but with the integrand multiplied by ρ for the expectation.

Jeffreys (1961) described the improper prior, conditional on the variances, as:

λ_{Jeffreys} (ρ) \propto \frac{\sqrt{1 + ρ^{2}}}{1 - ρ^{2}} .

This prior was the basis for the seventh estimator: the posterior mean assuming a Jeffreys prior, ρ̂⁽⁷⁾.

Finally, Jeffreys (1961) noted that the arc-sine prior,

λ_{arc - sine} (ρ) = \frac{1}{π} \frac{1}{\sqrt{1 - ρ^{2}}},

is similar to the Jeffreys prior, but integrable on [−1, 1]. The posterior mean assuming an arc-sine prior, ρ̂⁽⁸⁾, represents the eighth, and final, estimator investigated.
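
The three posterior means differ only in the prior weight inside the integrals, so they can be approximated with one routine. The sketch below uses a simple midpoint rule on (−1, 1) and works on the log scale for stability; the names `PRIORS` and `posterior_mean`, the grid size, and the example sums are assumptions for illustration, and this is not necessarily the integration method used by the authors. Normalizing constants of the priors are omitted because they cancel in the ratio.

```python
import math

def log_lik(rho, ssx, ssy, ssxy, n):
    """Bivariate normal log-likelihood in rho, up to an additive constant."""
    return (-0.5 * n * math.log(1 - rho ** 2)
            - (ssx - 2 * rho * ssxy + ssy) / (2 * (1 - rho ** 2)))

# Unnormalized prior densities on (-1, 1); constants cancel in the posterior mean.
PRIORS = {
    "uniform": lambda r: 1.0,
    "jeffreys": lambda r: math.sqrt(1 + r ** 2) / (1 - r ** 2),
    "arcsine": lambda r: 1.0 / math.sqrt(1 - r ** 2),
}

def posterior_mean(ssx, ssy, ssxy, n, prior="uniform", grid=20000):
    """Posterior mean of rho approximated by the midpoint rule on (-1, 1)."""
    h = 2.0 / grid
    rhos = [-1 + (k + 0.5) * h for k in range(grid)]
    logs = [log_lik(r, ssx, ssy, ssxy, n) + math.log(PRIORS[prior](r))
            for r in rhos]
    top = max(logs)  # subtract the maximum before exponentiating to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    return sum(r * w for r, w in zip(rhos, weights)) / sum(weights)

# Illustrative sums for n = 5 (not data from the paper).
for name in ("uniform", "arcsine", "jeffreys"):
    print(name, posterior_mean(3.15, 3.31, 3.02, 5, prior=name))
```

The midpoint rule never evaluates the integrand at ρ = ±1, which sidesteps the endpoint singularities of the Jeffreys and arc-sine priors in this sketch.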

Each of these priors is shown in Figure 1. The curve for the Jeffreys prior is an approximation since it is not integrable on [−1, 1]. Note that the arc-sine distribution on ρ is equivalent to placing a generalized beta (2, 1, 0.5, 0.5) distribution of the first kind on |ρ| (McDonald, 1984). Similarly, the uniform prior corresponds to a generalized beta (1, 1, 1, 1) distribution of the first kind on |ρ|. Of these estimators, the empirical estimator with known variances and the truncated empirical estimator with known variances are, to our knowledge, proposed here for the first time.

Figure 1.

The densities of the priors for the Bayesian estimators. The Jeffreys curve is an approximation since the prior is not integrable on [−1, 1]. The arc-sine and Jeffreys priors are very similar, but the Jeffreys prior puts more weight on extreme values.

3 SIMULATION STUDY

3.1 Estimating the Correlation

Samples of sizes 5, 10, and 50 were generated from a bivariate normal distribution with means equal to zero, variances equal to one, and a specified correlation value. The estimators were first evaluated for positive and negative values of the correlation and were all found to be symmetric. Thus values of the correlation were sampled uniformly from symmetric intervals on [−1, 1] to analyze how the estimators performed for different magnitudes of correlation. The estimators were compared based on root mean squared error using one million samples. The results are shown in Table 1.

Table 1.

Root mean squared errors multiplied by 1000 are shown for each estimator based on one million simulated data sets (n=sample size). The estimators with the smallest root mean squared error are shown in bold for each sample size and each true correlation interval.

n	Estimator	|ρ| ∈ [0,1]	|ρ| ∈ [0,.25]	|ρ| ∈ [.25,.50]	|ρ| ∈ [.50,.75]	|ρ| ∈ [.75,1]
5	Sample Correlation Coeff	352	442	406	326	172
	Emp w/ Known Var	516	452	479	529	595
	Trunc Emp w/ Known Var	387	419	399	369	358
	MLE	373	464	437	352	161
	Sampson’s MLE Approx	382	462	435	357	232
	Mean w/ Uniform Prior	297	289	315	332	244
	Mean w/ Jeffreys Prior	311	358	354	319	182
	Mean w/ Arc-sine Prior	299	316	330	325	213

10	Sample Correlation Coeff	240	311	280	213	101
	Emp w/ Known Var	365	319	338	373	421
	Trunc Emp w/ Known Var	299	314	312	295	274
	MLE	248	334	295	203	72
	Sampson’s MLE Approx	249	333	295	206	90
	Mean w/ Uniform Prior	216	241	246	227	124
	Mean w/ Jeffreys Prior	222	277	261	208	92
	Mean w/ Arc-sine Prior	217	254	251	219	109

50	Sample Correlation Coeff	104	139	122	88	39
	Emp w/ Known Var	163	143	151	167	188
	Trunc Emp w/ Known Var	150	143	151	161	145
	MLE	100	142	117	75	29
	Sampson’s MLE Approx	100	142	117	75	29
	Mean w/ Uniform Prior	97	129	116	82	33
	Mean w/ Jeffreys Prior	98	135	115	78	30
	Mean w/ Arc-sine Prior	98	131	116	80	32


Numerical issues arose when computing the integrals involved in the posterior mean estimators in cases where the true correlation value was extremely close to one in magnitude. To handle this, a tolerance of 10⁻⁶ × n was put on the value of |SSx + SSy ± 2SSxy| since SSx + SSy ± 2SSxy = 0 signifies a correlation of ∓1, respectively. When this tolerance was satisfied, the correlation estimate was given the appropriate value of 1 or −1. This approximation was used about ten times out of one million in the [0, 1] interval and thirty times out of one million in the [0.75, 1] interval for each sample size.
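
The simulation design of this subsection can be sketched for the three estimators that need no numerical integration. This is a scaled-down illustration under stated assumptions: 20,000 replications instead of one million, a random sign attached to a magnitude drawn uniformly from the chosen interval, and hypothetical names (`draw_pairs`, `rmse_by_estimator`). It is not the authors' code.

```python
import math
import random

def draw_pairs(rho, n, rng):
    """n pairs from a bivariate normal with zero means, unit variances, correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys

def rmse_by_estimator(n, lo, hi, reps, seed=0):
    """RMSE of three estimators when |rho| is uniform on [lo, hi] with a random sign."""
    rng = random.Random(seed)
    sq = {"sample": 0.0, "empirical": 0.0, "truncated": 0.0}
    for _ in range(reps):
        rho = rng.uniform(lo, hi) * rng.choice([-1, 1])
        x, y = draw_pairs(rho, n, rng)
        ssx = sum(a * a for a in x)
        ssy = sum(b * b for b in y)
        ssxy = sum(a * b for a, b in zip(x, y))
        est = {"sample": ssxy / math.sqrt(ssx * ssy), "empirical": ssxy / n}
        est["truncated"] = max(-1.0, min(1.0, est["empirical"]))
        for key in sq:
            sq[key] += (est[key] - rho) ** 2
    return {key: math.sqrt(total / reps) for key, total in sq.items()}

print(rmse_by_estimator(n=5, lo=0.0, hi=1.0, reps=20000))
```

Because truncation to [−1, 1] can only move an estimate toward a true value that itself lies in [−1, 1], the truncated estimator's error never exceeds that of the untruncated one in any replication.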

For the first column, since the correlations were drawn uniformly from the interval [−1, 1], the Bayesian estimator assuming a uniform prior will have the lowest mean squared error according to theory. In samples of size 5, the uniform and arc-sine priors had superior performance over the entire [−1, 1] interval compared to the other estimators, with a root mean squared error of about 0.3. The empirical estimator with known variances performed least well, whereas the maximum likelihood estimator and sample correlation coefficient performed similarly, with the sample correlation coefficient doing slightly better. This suggests that in small sample sizes, knowing the variances yields no improvement when using the maximum likelihood estimator.

However, when the correlations are decomposed by magnitude, a different story is told. For extreme correlation values, the sample correlation coefficient, maximum likelihood estimator and posterior mean assuming a Jeffreys prior had the smallest root mean squared errors. The Jeffreys prior is highly concentrated at extreme correlation values so we would expect it to outperform the other Bayesian estimators in the last interval. The posterior mean assuming an arc-sine prior and that assuming a uniform prior had root mean squared errors 1.3 and 1.5 times as large, respectively, as that for the MLE, the best estimator in this interval. Conversely, at low values of correlation, the uniform and arc-sine posterior mean estimates had substantially lower root mean squared error than all other estimators. The posterior median estimators for each of the priors were also considered. Overall they performed very similarly to the posterior mean estimates and hence are not included here.

In general, one does not know the magnitude of the correlation to be estimated, so an estimator that performs well for all levels of correlation is desired. Both the posterior mean assuming an arc-sine prior and that assuming a uniform prior had routinely low root mean squared error values when compared to the other estimators and were fairly consistent across the different correlation magnitudes. Therefore, we concluded that these should be the methods of choice for small sample sizes. One might argue that if estimating large correlations accurately is of greater interest then the posterior mean assuming the arc-sine prior should be used since it outperforms that with a uniform prior at extreme correlations.

As the sample size increased from 5 to 10 and from 10 to 50, the root mean squared errors decreased for all estimators, as expected. For samples of size 50, the root mean squared errors for correlations on the entire interval [−1, 1] were low and effectively the same for all estimators except the empirical estimators when the variances are known. However, the estimators’ performances by magnitude of the correlation still varied as in the case of samples of size 5.

Sampson’s truncated approximation of the maximum likelihood estimator performed similarly to the maximum likelihood estimator for smaller sample sizes and almost identically for the larger sample sizes. This is because, as the sample size increases, the probability of the cubic equation having more than one real root goes to zero. Thus, large samples make it easier to use properties of cubic equations to pinpoint the correct MLE root.

Figure 2 shows the first 5,000 samples of each estimator’s correlation estimates and the true correlation values for samples of size 5. Notice that the empirical estimate with known variances often lay outside the range [−1, 1]. In addition, for small values of the correlation, the empirical estimates, maximum likelihood estimates and Sampson’s estimates were extremely variable, spanning most of the interval [−1, 1]. The Bayesian estimates showed a closer association overall between the true correlation value and the estimates, especially when the true correlation was small. However, there was some curvature in the tails of the plots for the Bayesian estimators, suggesting that the estimators typically underestimate the magnitude of the correlation when the true correlation is high. This is to be expected, as the Bayesian approach shrinks estimators away from the extremes.

Figure 2.

True and estimated correlation values for each estimator, for the first 5,000 samples of size 5. The dotted lines in the plot for the empirical estimator with known variances mark the admissible interval [−1, 1].

3.2 Hypothesis Tests

Estimating the value of the correlation is important, but often with small sample sizes our interest is not in its actual value but simply in whether or not it is non-zero. We often have knowledge about the sign of the correlation between two variables. Here we consider the case when we are interested in testing if the correlation is positive.

One way of testing this is to look at the confidence bounds of the estimators. A level 0.05 test of whether the true correlation is positive can be derived by generating numerous samples of independent bivariate normal random variables with means equal to zero and variances equal to one, calculating a correlation estimate for each sample, and determining the sample 95% quantile of the correlations. A level 0.05 test then rejects the hypothesis that the correlation is zero in favor of the alternative that it is positive if the estimate obtained is greater than the 95% quantile, i.e. the significance test bound. Table 2 shows the 95% significance test bounds for all non-Bayesian estimators based on one million simulations with ρ = 0. For example, for the sample correlation coefficient, the significance test bound for samples of size 5 is 0.73, indicating that about 5% of the samples resulted in an estimated correlation value greater than 0.73.
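
The bound for a single estimator can be approximated by simulating under ρ = 0, as in the following sketch for the sample correlation coefficient with known zero means. The function name `null_bound`, the seed, and the reduced number of replications (20,000 rather than one million) are assumptions for illustration.

```python
import math
import random

def null_bound(n, reps=20000, level=0.05, seed=1):
    """Simulated upper-tail bound for the sample correlation (zero means) when rho = 0."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        ssxy = sum(a * b for a, b in zip(x, y))
        ssx = sum(a * a for a in x)
        ssy = sum(b * b for b in y)
        stats.append(ssxy / math.sqrt(ssx * ssy))
    stats.sort()
    # Empirical (1 - level) quantile of the simulated estimates.
    return stats[int(math.ceil((1 - level) * reps)) - 1]

print(null_bound(5))
```

The same routine applies to any other estimator by replacing the statistic computed inside the loop.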

Table 2.

95% Significance Test bounds for testing if ρ > 0 for the non-Bayesian estimators when ρ = 0 based on one million simulated data sets.

Sample Size	5	10	50
Sample Correlation Coeff	0.729	0.522	0.233
Emp w/ Known Var	0.731	0.518	0.232
Truncated Emp w/ Known Var	0.731	0.518	0.232
MLE	0.754	0.565	0.241
Sampson’s MLE Approx	0.756	0.566	0.241


For Bayesian tests, Jeffreys (1935, 1961) developed ideas based on Bayes factors for testing/deciding between two models; see also Kass and Raftery (1995). A Bayes factor, B₁₀, is the ratio of the probability of the data under the alternative model to the probability of the data under the null model. Equivalently, it is the ratio of the posterior odds for the alternative against the null model, to its prior odds. A test that rejects the null hypothesis when B₁₀ > 1 minimizes the sum of the probabilities of Type I and Type II errors if the prior odds between the models are equal to one.

However, if we wish to fix the probability of a Type I error at 0.05 for example, we can generate data under the null model and determine the value c such that the probability under the null model that the Bayes factor is greater than c is 0.05. A level 0.05 test is then carried out for the null model against the alternative model by rejecting the null model if the Bayes factor is greater than c. This method was used with ρ = 0 as the null hypothesis and ρ > 0 as the alternative hypothesis to compare the performance of the Bayesian and non-Bayesian methods when the Type I error is fixed at 0.05. Note that the Bayes factor is

B_{10} = \frac{P (X, Y ∣ ρ > 0)}{P (X, Y ∣ ρ = 0)} = \frac{\int_{0}^{1} p (X, Y ∣ ρ) p (ρ ∣ ρ > 0) d ρ}{p (X) p (Y)} = \frac{2 \int_{0}^{1} p (X, Y ∣ ρ) p (ρ) d ρ}{p (X) p (Y)} \qquad (2)

where p(ρ) is one of the three prior distributions for ρ and the denominator is the product of the marginal probabilities assuming ρ = 0, or independence. The factor of two in equation (2) arises because each prior distribution is symmetric about zero, so conditioning on ρ > 0 doubles the prior density on (0, 1). The Bayes factor is undefined for the Jeffreys prior, which is improper, so we do not consider it from here on.
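
Equation (2) and the calibration of c can be sketched together. The code below approximates the Bayes factor by a midpoint rule on (0, 1) and then takes the 95% quantile of Bayes factors simulated under ρ = 0. The names `bayes_factor` and `calibrate_c`, the grid size, the seed, and the small number of replications are assumptions for illustration, so the resulting value of c is only a rough approximation.

```python
import math
import random

def bayes_factor(ssx, ssy, ssxy, n, prior="uniform", grid=400):
    """B10 from equation (2), approximated by the midpoint rule on (0, 1)."""
    h = 1.0 / grid
    total = 0.0
    for k in range(grid):
        rho = (k + 0.5) * h
        one = 1 - rho ** 2
        # log of p(X, Y | rho) / (p(X) p(Y)); the (2 * pi)^n factors cancel
        log_ratio = (-0.5 * n * math.log(one)
                     - (ssx - 2 * rho * ssxy + ssy) / (2 * one)
                     + 0.5 * (ssx + ssy))
        if prior == "uniform":
            dens = 0.5                                # Uniform(-1, 1) density
        else:
            dens = 1 / (math.pi * math.sqrt(one))     # arc-sine density
        total += math.exp(log_ratio) * dens
    return 2 * total * h

def calibrate_c(n, prior="uniform", reps=2000, level=0.05, seed=2):
    """Value c with P(B10 > c | rho = 0) approximately equal to level."""
    rng = random.Random(seed)
    factors = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        factors.append(bayes_factor(sum(a * a for a in x),
                                    sum(b * b for b in y),
                                    sum(a * b for a, b in zip(x, y)),
                                    n, prior))
    factors.sort()
    return factors[int(math.ceil((1 - level) * reps)) - 1]

print(calibrate_c(5, "uniform"), calibrate_c(5, "arcsine"))
```

Both priors are calibrated on the same simulated samples here because the seed is shared, which reduces the noise in comparing the two thresholds.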

Table 3 shows the values of c obtained for the two prior distributions and each sample size. As the sample size increased, the values of c decreased since the amount of evidence for the null increased. Also, the values of c for the arc-sine prior were smaller than those for the uniform prior at every sample size (for example, 2.275 versus 2.701 for n = 5), reflecting the fact that the arc-sine prior places more weight on extreme correlation values, which data generated under the null rarely support.

Table 3.

Value of c such that the Bayes factor has 5% probability of exceeding c if the true value of ρ is 0 (i.e. P(B₁₀ > c|ρ = 0) = 0.05) based on one million simulated data sets.

Sample Size	5	10	50
Uniform Prior	2.701	2.304	1.238
Arc-sine Prior	2.275	1.715	0.817


Table 4 shows the power when the true correlation was uniformly generated from various intervals for each of the non-Bayesian significance tests and the tests based on Bayes factors. In samples of size 5 the Bayesian tests had the greatest power over the entire [0, 1] interval and for the most extreme correlation values. For the smaller correlation values, all tests, except possibly those based on the MLE and Sampson’s MLE, performed about the same. The tests based on the arc-sine prior and uniform prior performed similarly for all correlation values and sample sizes. As sample size increased, the difference between the powers of the tests based on the MLE and Sampson’s MLE and all others decreased.
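
Average power over an interval can be estimated by drawing ρ uniformly from that interval and counting rejections. The sketch below does this for the test based on the sample correlation coefficient, taking the bound 0.729 for n = 5 from Table 2. The function name `power_sample_corr`, the seed, and the 20,000 replications are assumptions for illustration.

```python
import math
import random

def power_sample_corr(n, bound, lo, hi, reps=20000, seed=3):
    """Share of samples with sample correlation > bound when rho is uniform on [lo, hi]."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        rho = rng.uniform(lo, hi)
        scale = math.sqrt(1 - rho ** 2)
        ssx = ssy = ssxy = 0.0
        for _i in range(n):
            x = rng.gauss(0, 1)
            y = rho * x + scale * rng.gauss(0, 1)
            ssx += x * x
            ssy += y * y
            ssxy += x * y
        if ssxy / math.sqrt(ssx * ssy) > bound:
            rejections += 1
    return rejections / reps

# Average power on [0.75, 1] and the rejection rate when rho = 0 (lo = hi = 0).
print(power_sample_corr(5, 0.729, 0.75, 1.0))
print(power_sample_corr(5, 0.729, 0.0, 0.0))
```

Setting both interval endpoints to zero reuses the same routine to check that the test has roughly the intended 0.05 level.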

Table 4.

Average power multiplied by 1000 over intervals for ρ when testing ρ = 0 vs ρ > 0 at the 0.05 significance level based on one million simulated data sets. For the non-Bayesian estimators, the significance test bounds found in Table 2 were used. The Bayesian tests were based on the Bayes factors using the value of c listed in Table 3. The tests with the largest power are shown in bold for each sample size and each correlation interval.

n	Test Based on	ρ ∈ [0,1]	ρ ∈ [0,.25]	ρ ∈ [.25,.50]	ρ ∈ [.50,.75]	ρ ∈ [.75,1]
5	Sample Correlation Coeff	383	81	187	423	839
	Emp w/ Known Var	288	89	198	348	517
	Trunc Emp w/ Known Var	288	89	198	348	517
	MLE	356	69	141	352	862
	Sampson’s MLE Approx	355	69	140	350	861
	Uniform Prior	397	82	196	442	869
	Arc-sine Prior	397	81	192	440	875

10	Sample Correlation Coeff	529	106	326	708	977
	Emp w/ Known Var	441	110	302	562	793
	Trunc Emp w/ Known Var	441	110	302	562	793
	MLE	505	88	262	682	990
	Sampson’s MLE Approx	505	88	260	680	990
	Uniform Prior	534	105	325	722	985
	Arc-sine Prior	534	104	323	723	987

50	Sample Correlation Coeff	770	250	833	998	1000
	Emp w/ Known Var	759	245	800	993	1000
	Trunc Emp w/ Known Var	759	245	800	993	1000
	MLE	768	238	834	999	1000
	Sampson’s MLE Approx	768	238	834	999	1000
	Uniform Prior	772	250	838	998	1000
	Arc-sine Prior	772	250	838	999	1000


As mentioned, tests based on the Bayes factor are optimal in that they minimize the sum of the probabilities of Type I and Type II errors when simulating from the prior. For this reason the uniform prior performs best over the entire interval [0,1] for all sample sizes. Table 5 shows the average value of the Type I and Type II error probabilities when the standard rule of rejecting the null hypothesis when the Bayes factor is greater than one is used. This optimal Bayesian method is compared with the significance test bound procedure for the non-Bayesian estimators via this average error measure. The Bayesian tests had the smallest average error for samples of size 5. The MLE and Sampson’s MLE approximation performed very similarly to the Bayesian tests at the extreme correlation values.

Table 5.

Average error probability, [Type I + Type II]/2, when testing if ρ = 0 versus ρ > 0, multiplied by 1000, based on one million simulated data sets. The error probabilities for the non-Bayesian tests are based on 0.05 level significance tests and the Bayesian test error probabilities are based on rejecting the null hypothesis that ρ = 0 if the Bayes factor is greater than 1. The tests with the smallest average error are shown in bold for each sample size and each correlation interval.

n	Test Based on	ρ ∈ [0,1]	ρ ∈ [0,.25]	ρ ∈ [.25,.50]	ρ ∈ [.50,.75]	ρ ∈ [.75,1]
5	Sample Correlation Coeff	333	485	431	313	106
	Emp w/ Known Var	381	480	426	351	267
	Trunc Emp w/ Known Var	381	480	426	351	267
	MLE	347	490	455	349	94
	Sampson’s MLE Approx	348	491	455	350	95
	Uniform Prior	284	460	351	210	113
	Arc-sine Prior	289	469	374	225	88

10	Sample Correlation Coeff	261	472	362	171	37
	Emp w/ Known Var	304	470	374	244	129
	Trunc Emp w/ Known Var	304	470	374	244	129
	MLE	272	481	394	184	30
	Sampson’s MLE Approx	273	481	395	185	30
	Uniform Prior	235	446	291	128	76
	Arc-sine Prior	240	458	319	132	51

50	Sample Correlation Coeff	140	400	109	26	25
	Emp w/ Known Var	145	402	125	29	25
	Trunc Emp w/ Known Var	145	402	125	29	25
	MLE	141	406	108	25	25
	Sampson’s MLE Approx	141	406	108	25	25
	Uniform Prior	139	389	100	33	32
	Arc-sine Prior	142	411	114	21	20


At larger sample sizes, the tests performed effectively equally well. For the extreme correlation values with samples of size 50, all tests have essentially 100% power so their average error achieves its lower bound at one-half the Type I error rate. Notice again that the tests based on the arc-sine prior had slightly smaller average error than that assuming a uniform prior at extreme correlation values and that its performance on the entire interval [0, 1] was close to the uniform, which was best.
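
The lower bound mentioned above follows from a one-line calculation. Writing α for the Type I error probability and β for the Type II error probability (notation introduced here for illustration), a level 0.05 test with power essentially equal to one has:

```latex
\frac{\alpha + \beta}{2} = \frac{0.05 + (1 - \text{power})}{2} \approx \frac{0.05 + 0}{2} = 0.025 .
```

Multiplied by 1000, this is the value 25 shown in Table 5 for the non-Bayesian tests in the [.75, 1] column at n = 50. The Bayesian tests in Table 5 reject when the Bayes factor exceeds one rather than at a fixed level, so their entries in that column (32 and 20) need not equal 25.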

4 DISCUSSION

We have considered the estimation of the correlation in bivariate normal data when the means and variances are assumed known, with emphasis on the small sample situation. Using simulation, we found that the posterior mean using a uniform prior or an arc-sine prior consistently outperformed several previously proposed empirical and exact and approximate maximum likelihood estimators for small samples. The arc-sine prior performed similarly to the uniform prior for small values of ρ, and better for large values of ρ in small samples. This suggests using the posterior mean with the arc-sine prior for estimation when it is important to identify extreme correlations.

For testing whether the correlation is zero, we carried out a simulation for positive values of ρ within specified intervals, and found that Bayesian tests had smaller average error than the non-Bayesian tests when n = 5. With n = 50, however, all the tests performed similarly.

Spruill and Gastwirth (1982) derived estimators of the correlation when the data are normal but the variables are contained in separate locations and cannot be combined. Their work combines the data into groups based on the value of one variable to obtain an estimate of the correlation. This differs from the more usual situation considered here where both variables are available in their sampled pairs.

Estimation of the sample correlation coefficient with truncation was investigated by Gajjar and Subrahmaniam (1978). However, in their work it is the underlying distribution that is assumed to be truncated, rather than the estimator as here.

Data sets and distributions for which use of the sample correlation coefficient is inappropriate were investigated by Carroll (1961). Norris and Hjelm (1961) considered estimation of correlation when the underlying distribution is not normal, and Farlie (1960) considered it for general bivariate distribution functions. Since we limit ourselves to the bivariate normal distribution, we did not consider these estimators.

Olkin and Pratt (1958) derived unbiased estimates of the correlation in the case when the means are known and the case when all parameters are unknown. These results address situations different from the one we have considered, where the variances are also assumed known.

Others have considered estimating the correlation in a Bayesian framework for the bivariate normal setting. Berger and Sun (2008) addressed this problem using objective priors whose posterior quantiles match up with the corresponding frequentist quantiles. Ghosh et al. (2010) extended these results by considering a probability matching criterion based on highest posterior density regions and the inversion of test statistics. However, in both cases the focus was on matching frequentist probabilities rather than estimation accuracy.

Much of the other Bayesian correlation work relates to estimation of covariance matrices. Barnard et al. (2000) discussed prior distributions on covariance matrices by decomposing the covariance matrix into Σ = SRS where S = diag(σ) is a diagonal matrix of standard deviations and R is the correlation matrix. With this, one can use the prior factorization p(σ, R) = p(σ)p(R|σ) to specify a prior on the covariance matrix. Barnard et al. (2000) suggest some default choices for the prior distribution on R that are independent of σ. Specifically they mention the possibility of placing a uniform distribution on R, p(R) ∝ 1, where R must be positive definite. The marginal distributions of the individual correlations are then not uniform. Alternatively, for a (d × d) matrix R one can specify

p (R ∣ ν) \propto ∣ R ∣^{\frac{1}{2} (ν - 1) (d - 1) - 1} (\prod_{i = 1}^{d} ∣ R_{i i} ∣^{- ν / 2}), ν \geq d,

where R_ii is the ith principal submatrix of R. This is the marginal distribution of R when Σ has a standard inverse-Wishart distribution with ν degrees of freedom and results in the following marginal distribution on the pairwise correlations

f (r_{i j} ∣ ν) \propto {(1 - r_{i j}^{2})}^{\frac{ν - d - 1}{2}}, where ∣ r_{i j} ∣ \leq 1

Uniform marginal distributions for all pairwise correlations come from the choice ν = d + 1. Note that for ν = 2 and d = 2, this prior reduces to the arc-sine prior. This is the boundary case that is the most diffuse prior in the class. Barnard et al. (2000) discussed using these priors for shrinkage estimation of regression coefficients and a general location-scale model for both categorical and continuous variables. Zhang et al. (2006) focused on methods for sampling such correlation matrices.
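
The two special cases can be checked by substituting into the marginal density of a pairwise correlation, written r for r_ij (a worked step added here for illustration):

```latex
\begin{array}{l}
ν = d + 1: \quad f (r ∣ ν) \propto {(1 - r^{2})}^{0} = 1 \quad \text{(uniform on } [-1, 1] \text{)}, \\
ν = d = 2: \quad f (r ∣ ν) \propto {(1 - r^{2})}^{- 1 / 2}, \quad \int_{- 1}^{1} {(1 - r^{2})}^{- 1 / 2} d r = π \;\Rightarrow\; f (r ∣ ν) = \frac{1}{π} \frac{1}{\sqrt{1 - r^{2}}} .
\end{array}
```

The second line is the arc-sine density given in Section 2, and ν = d = 2 sits on the boundary of the constraint ν ≥ d.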

Liechty et al. (2004) considered a model where all correlations have a common truncated normal prior distribution under the constraint that the resulting correlation matrix be positive definite. They also considered the model where the correlations or observed variables are clustered into groups that share a common mean and variance. Chib and Greenberg (1998) assumed a multivariate truncated normal prior in the context of a multivariate probit model, and Liu and Sun (2000) and Liu (2001) assumed a Jeffreys’ prior on R in the context of a multivariate probit and multivariate multiple regression model.

A number of advances have been made with respect to estimation of the covariance matrix treating the variances as unknown, unlike here. Geisser and Cornfield (1963) developed posterior distributions for multivariate normal parameters with an objective prior, and Yang and Berger (1994) focused on estimation with reference priors. Geisser (1965), Tiwari et al. (1989), and Press and Zellner (1978) derived posterior distributions of the multiple correlation coefficient using the prior from Geisser and Cornfield, an informative beta distribution, and diffuse and natural conjugate priors assuming fixed regressors, respectively. It is possible that some of these ideas regarding prior specification of covariance matrices could be applied to the present setting or be used to extend this work to the multivariate setting.

Footnotes

This work was supported by NICHD grant R01 HD54511. Raftery’s research was also partially supported by NIH grant R01 GM084163, and NSF grants ATM0724721 and IIS0534094. The authors thank Sam Clark, Jon Wellner and the Probabilistic Population Projections Group at the University of Washington for helpful comments and discussion.

Contributor Information

Bailey K. Fosdick, Email: bfosdick@u.washington.edu.

Adrian E. Raftery, Email: raftery@u.washington.edu.

References

Alkema L, Raftery AE, Gerland P, Clark SJ, Pelletier F, Buettner T, Heilig G. Probabilistic Projections of the Total Fertility Rate for All Countries. Demography. 2011;48(3):815–839. doi: 10.1007/s13524-011-0040-5.
Barnard J, McCulloch R, Meng X. Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, With Application to Shrinkage. Statistica Sinica. 2000;10:1281–1311.
Berger JO, Sun D. Objective Priors for the Bivariate Normal Model. Annals of Statistics. 2008;36:963–982.
Carroll JB. The Nature of the Data, or How to Choose a Correlation Coefficient. Psychometrika. 1961;26:347–372.
Chib S, Greenberg E. Analysis of Multivariate Probit Models. Biometrika. 1998;85:347–361.
Farlie DJG. The Performance of Some Correlation Coefficients for a General Bivariate Distribution. Biometrika. 1960;47:307–323.
Gajjar AV, Subrahmaniam K. On the Sample Correlation Coefficient in the Truncated Bivariate Normal Population. Communications in Statistics - Simulation and Computation. 1978;7:455–477.
Geisser S. Bayesian Estimation in Multivariate Analysis. The Annals of Mathematical Statistics. 1965;36:150–159.
Geisser S, Cornfield J. Posterior Distributions for Multivariate Normal Parameters. Journal of the Royal Statistical Society Series B (Methodological). 1963;25:368–376.
Ghosh M, Mukherjee B, Santra U, Kim D. Bayesian and Likelihood-based Inference for the Bivariate Normal Correlation Coefficient. Journal of Statistical Planning and Inference. 2010;140:1410–1416.
Jeffreys H. Some Tests of Significance, Treated by the Theory of Probability. Proceedings of the Cambridge Philosophical Society. 1935;31:203–222.
Jeffreys H. Theory of Probability. Oxford University Press; 1961.
Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association. 1995;90:773–795.
Kendall SM, Stuart A. The Advanced Theory of Statistics. 4th ed. Vol. 2. MacMillan Publishing Co., Inc; 1979.
Liechty JC, Liechty M, Muller P. Bayesian Correlation Estimation. Biometrika. 2004;91:1–14.
Liu C. Discussion: Bayesian Analysis of Multivariate Probit Model. Journal of Computational and Graphical Statistics. 2001;10:75–81.
Liu C, Sun DX. Analysis of Interval Censored Data from Fractionated Experiments using Covariance Adjustments. Technometrics. 2000;42:353–365.
Madansky A. On the Maximum Likelihood Estimate of the Correlation Coefficient. Rand Corporation; Santa Monica, Calif: 1958. Report No. P-1355.
McDonald JB. Some Generalized Functions for the Size Distribution of Income. Econometrica. 1984;52(3):647–665.
Norris RC, Hjelm HF. Nonnormality and Product Moment Correlation. The Journal of Experimental Education. 1961;29:261–270.
Olkin I, Pratt JW. Unbiased Estimation of Certain Correlation Coefficients. The Annals of Mathematical Statistics. 1958;29:201–211.
Press SJ, Zellner A. Posterior Distribution for the Multiple Correlation Coefficient with Fixed Regressors. Journal of Econometrics. 1978;8:307–321.
Rodgers JL, Nicewander WA. Thirteen Ways to Look at the Correlation Coefficient. The American Statistician. 1988;42:59–66.
Sampson AR. Simple BAN Estimators of Correlations for Certain Multivariate Normal Models with Known Variances. Journal of the American Statistical Association. 1978;73:859–862.
Spruill NL, Gastwirth JL. On the Estimation of the Correlation Coefficient from Grouped Data. Journal of the American Statistical Association. 1982;77:614–620.
Tiwari RC, Chib S, Jammalamadaka SR. Bayes Estimation of the Multiple Correlation Coefficient. Communications in Statistics - Theory and Methods. 1989;18:1401–1413.
Yang R, Berger JO. Estimation of a Covariance Matrix Using the Reference Prior. Annals of Statistics. 1994;22:1195–1211.
Zhang X, Boscardin WJ, Belin TR. Sampling Correlation Matrices in Bayesian Models With Correlated Latent Variables. Journal of Computational and Graphical Statistics. 2006;15:880–896.
