Abstract
In this paper, we provide a unified framework for the two-sample t-test with partially paired data. We show that many existing two-sample t-tests with partially paired data can be viewed as special members of our unified framework. Some shortcomings of these t-tests are discussed. We also propose the asymptotically optimal weighted linear combination of the test statistics comparing all four paired and unpaired data sets. Simulation studies are used to illustrate the performance of our proposed asymptotically optimal weighted combinations of test statistics and to compare them with some existing methods. It is found that our proposed test statistic is generally more powerful. Three real data sets, about CD4 count, DNA extraction concentrations, and the quality of sleep, are also analyzed using our newly introduced test statistic.
Keywords: Partially paired, optimal weight, two-sample, t-tests, unpaired data
1. Introduction
In medical research, paired data are commonly encountered. A typical example is pre- and post-treatment studies, in which the endpoints for each patient before and after the treatment are measured. In practice, however, the data collected are often only partially paired for various reasons. For instance, some patients may drop out after the first treatment, while other patients may refuse the first treatment but decide to receive the second one. As a result, the data actually observed are only partially paired and consist of two subsamples: one subsample of individuals who have both pre- and post-treatment measurements (the paired subsample), and another subsample of individuals who have only either the pre-treatment or the post-treatment measurement (the independent subsample). Examples of partially correlated data can be found in the literature; see, for instance, Lim et al. [9], Kuan [7], Pazos et al. [13] and Qin et al. [14].
To be precise, let $X_i$ and $Y_i$ denote the pre- and post-treatment measurements, respectively, for the ith patient in a pre–post-treatment study. Suppose that among a total of m + n + l patients under study, m patients have both pre- and post-treatment (i.e. paired) measurements $X_i$ and $Y_i$, $i = 1, \dots, m$; another n patients have only post-treatment measurements $Y_j$, $j = 1, \dots, n$; and an additional l patients have only pre-treatment measurements $X_k$, $k = 1, \dots, l$. Partially paired data occur when we have both paired and unpaired data, either with unpaired observations from one sample only or with unpaired observations from both samples. The former can be considered as a special case of the latter with l = 0. Partially paired data can be viewed as a kind of missing data. Following the literature, the missing mechanism is usually assumed to be missing completely at random (MCAR, Little and Rubin [11]).
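For illustration, here is a minimal sketch (with made-up numbers and hypothetical variable names) of how such a partially paired sample splits into the paired and unpaired subsamples:

```python
import numpy as np

# Pre- and post-treatment measurements for six hypothetical patients;
# NaN marks a missing measurement.
pre  = np.array([5.1, 4.8, np.nan, 6.0, np.nan, 5.5])
post = np.array([5.9, np.nan, 6.2, 6.4, 5.7, np.nan])

paired = ~np.isnan(pre) & ~np.isnan(post)         # the m paired patients
x_pair, y_pair = pre[paired], post[paired]
y_only = post[np.isnan(pre) & ~np.isnan(post)]    # the n post-treatment-only patients
x_only = pre[~np.isnan(pre) & np.isnan(post)]     # the l pre-treatment-only patients
```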
For partially paired data, we want to know whether there is any difference between the pre- and post-treatment measurements. For the mean difference, a t-test can be adopted. One simple approach is to discard the unpaired (or paired) observations and use the paired (or independent) t-test for the remaining paired (or independent) observations. Obviously, this approach is not efficient because some useful data are simply discarded. Another naive approach is to treat the partially paired data as two independent samples and use the two-sample t-test to compare the means. Although all available data are used, this approach ignores the correlation between the paired measurements, and thus it is not valid.
Several alternative methods have been proposed to analyze partially paired data. For instance, Lin and Stivers [10] obtained a modified maximum likelihood estimator (MLE) of the mean difference and formed a t-test based on the asymptotic distribution of the modified MLE; see also Ekbohm [2]. Looney and Jones [12] proposed a corrected z-test that uses all of the observed data and accounts for the correlation between the pre- and post-treatment measurements in the paired sample. Samawi and Vogel [16] proposed a test statistic that pools the pre-post mean difference calculated from the paired sample with that calculated from the unpaired observations. Kim et al. [6] introduced the t3 statistic, which is a linear combination of the paired and unpaired t statistics, with weights depending on the sample sizes. Based on methods from meta-analysis (Zaykin [18]), Kuan and Huang [8] proposed a p-value pooling method that combines the p value from comparing the paired measurements with the p value from comparing the unpaired samples. A recent comparative review of various t-tests for partially paired data can be found in Guo and Yuan [3], who further introduced an optimal pooled t-test. We note that, in their paper as well as in ours, 'optimal' refers to 'asymptotically optimal'.
In this paper, we provide a unified framework for the two-sample t-test with partially paired data. We show that many existing two-sample t-tests with partially paired data are actually special members of our unified framework. In fact, the statistics introduced by Looney and Jones [12], Kim et al. [6], Samawi and Vogel [16] and Guo and Yuan [3] all belong to this unified class. Within this unified framework, we point out the shortcomings of some existing methods. To improve the performance, we further propose the asymptotically optimal weighted linear combination of the test statistics comparing all four paired and unpaired data sets. Simulation studies are used to illustrate the performance of our proposed asymptotically optimal weighted combinations of test statistics and to compare them with some existing methods. Three real data sets, about CD4 count, DNA extraction concentrations, and the quality of sleep, respectively, are also analyzed using our newly introduced test statistic. There are some challenges in analyzing such data. For instance, in the CD4 count example, there are extra unpaired observations from only one sample. Most existing methods, including those proposed by Kim et al. [6], Samawi and Vogel [16] and Guo and Yuan [3], are then not applicable. Our proposed procedure, however, fully exploits the information in the available data and still works in this situation.
The paper is organized as follows. In Section 2, we first introduce our unified framework and then show that many existing test statistics are special members of it; the shortcomings of these existing methods are discussed. In the same section, we also develop the asymptotically optimal weighted linear combination of the test statistics based on all four comparisons. Simulation studies are reported in Section 3, where we compare the performance of our proposed asymptotically optimal weighted combinations of test statistics with that of some existing methods. Three real data examples are reported in Section 4 to illustrate the application of our proposed tests. We end the paper with a discussion in Section 5.
2. Test statistics construction
Before we introduce our unified framework, we first give the following notation:
Define . Here the superscript T denotes the transpose of a vector. In this paper, we consider the null hypothesis $H_0: \mu_X = \mu_Y$, where $\mu_X$ and $\mu_Y$ are the expectations of X and Y, respectively. Under the null hypothesis, we have the following theorem:
Theorem 2.1
Let $(X_i, Y_i)$, $i = 1, \dots, m$, be i.i.d. bivariate observations with marginal distribution functions F and G. Let $Y_j$, $j = 1, \dots, n$, be i.i.d. observations from G and $X_k$, $k = 1, \dots, l$, be i.i.d. observations from F. Further assume that the paired subsample and the two unpaired subsamples are mutually independent, and that the variances of X and Y are finite. Assume and . Then under $H_0$, we have
where $\xrightarrow{d}$ denotes convergence in distribution and
Now we consider the weighted linear combinations of in the following general form. Let , with . Recall that under . Define the test statistic
In the following, we will show that many existing two-sample t-tests with partially paired data can be viewed as special members of the above-defined statistic. In practical applications, V should be replaced by a consistent estimator. In fact, the variance matrix V can be easily estimated. Specifically, , and can be estimated by the sample variance and sample covariances based on the paired sample, while and can be estimated by the sample variances of the pooled samples ( ) and ( ), respectively. By Slutsky's theorem, the asymptotic distributions of and of with V replaced by its estimator coincide. Thus, in theoretical comparisons, we focus on known V.
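To make the construction concrete, here is a minimal numerical sketch. The exact definitions and normalization used in the paper are not fully recoverable from the extracted text, so this sketch assumes that the four comparisons are the differences of the paired and unpaired sample means, $T_1 = \bar X_{\mathrm{pair}} - \bar Y_{\mathrm{pair}}$, $T_2 = \bar X_{\mathrm{pair}} - \bar Y_{\mathrm{unpair}}$, $T_3 = \bar X_{\mathrm{unpair}} - \bar Y_{\mathrm{pair}}$, $T_4 = \bar X_{\mathrm{unpair}} - \bar Y_{\mathrm{unpair}}$, and that the weighted statistic standardizes $\omega^{\mathrm T} T$ by the plug-in estimate of its variance described above; all function and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def partial_paired_z(x_pair, y_pair, y_only, x_only, w):
    """Weighted combination of the four paired/unpaired mean comparisons.

    Illustrative sketch: T = (T1, T2, T3, T4) collects the differences of the
    paired and unpaired sample means, and V_hat is a plug-in estimate of the
    covariance matrix of T built from the paired-sample covariance and the
    pooled-sample variances.
    """
    x_pair = np.asarray(x_pair, float); y_pair = np.asarray(y_pair, float)
    y_only = np.asarray(y_only, float); x_only = np.asarray(x_only, float)
    w = np.asarray(w, float)
    m, n, l = len(x_pair), len(y_only), len(x_only)

    T = np.array([x_pair.mean() - y_pair.mean(),
                  x_pair.mean() - y_only.mean(),
                  x_only.mean() - y_pair.mean(),
                  x_only.mean() - y_only.mean()])

    # Covariance from the paired subsample; variances from the pooled samples.
    s_xy = np.cov(x_pair, y_pair, ddof=1)[0, 1]
    s_x2 = np.var(np.concatenate([x_pair, x_only]), ddof=1)
    s_y2 = np.var(np.concatenate([y_pair, y_only]), ddof=1)

    a, b, c, d, r = s_x2 / m, s_y2 / m, s_y2 / n, s_x2 / l, s_xy / m
    # Entries follow from the variances/covariances of the four mean
    # differences, e.g. Var(T1) = a + b - 2r, Cov(T1, T2) = a - r, Cov(T1, T4) = 0.
    V_hat = np.array([[a + b - 2 * r, a - r, b - r, 0.0  ],
                      [a - r,         a + c, -r,    c    ],
                      [b - r,         -r,    d + b, d    ],
                      [0.0,           c,     d,     d + c]])

    z = w @ T / np.sqrt(w @ V_hat @ w)        # standardized weighted statistic
    p_two_sided = 2 * (1 - norm.cdf(abs(z)))
    return z, p_two_sided, V_hat
```

For example, the weight vector (1, 0, 0, 0) uses only the paired subsample, while (0, 0, 0, 1) compares only the two unpaired subsamples.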
2.1. Corrected z-test
Looney and Jones [12] proposed a corrected z-test to compare the means for partially paired data. The corrected z-test uses all of the observed data and is based on the difference between the pre-treatment and post-treatment means, and . To be precise, here
In other words, and are the pooled mean estimators of X and Y from ( , ), and ( , ), respectively. As a result, we get:
This implies that is a weighted linear combination of with the weight vector being:
2.2. The t3 statistic
Kim et al. [6] introduced a linear combination of the pre-post mean difference calculated from paired observations (i.e. ) with that calculated from unpaired observations, , with weights depending on the sample sizes. Let be the harmonic mean of n and l. The form is as follows:
We have that is a weighted linear combination of with weight vector
2.3. Pooled t-test
Samawi and Vogel [16] proposed another pooled test statistic, which assigns equal weights to the two estimates of the mean difference, and . This pooled statistic is a weighted linear combination of with the weight vector being:
2.4. Weighted t-test
Another way to combine information between paired observations and unpaired observations is to take a weighted average of the paired t-test statistic (for paired observations) and the two-sample t-test statistic (for unpaired observations). Samawi and Vogel [16] proposed the following weighted t-test,
Here . It follows immediately that is also a weighted linear combination of with the weight vector being:
2.5. Optimal pooled t-test
Recently, Guo and Yuan [3] pointed out that the statistic proposed by Samawi and Vogel [16] is not optimal in terms of the statistical power to detect the pre-post difference. They introduced an optimal pooled t-test based on and . They considered the weighted linear combination of with the weight vector being:
2.6. Comparison between the above t-tests
Now we make a short comparison between the above-reviewed t-tests. It is clear that the last four test statistics, , and all rely on and only. Among these four tests, asymptotically, is the most powerful test. On the other hand, explores the information contained not only in and , but also in and . However, its weights used to combine the individual comparisons depend only on the sample sizes and not on the correlation between X and Y, and thus the power of the test is generally not optimal. When there are extra unpaired observations from only one sample, the sample structure can be represented as . Then we can only use and . In this situation, , and are not applicable, but can still work by setting l = 0.
2.7. Optimal weighted linear combination of the four comparisons
In this subsection, we aim to find the optimal weighted linear combination of . Under $H_0$, the statistic is asymptotically standard normal for any weight vector ω, which implies that the test with any ω can control the size well. Thus the tests with different ω differ only in their power performance under alternative hypotheses. To this end, we now consider the following Pitman local alternative hypotheses:
Here . In this situation, by applying the central limit theorem for double arrays (under the stronger conditions given in Section 1.9.3 of Serfling [17]), we have:
Here is a 4-dimensional vector, whose elements are all ones. Under ,
Further recall that
Then under , we have with .
Note that all the tests, including our test, are special members of with different weight vectors. Thus we only present the critical region (CR) of the general test statistic . The form of the CR depends on the form of the alternative hypothesis. For the two-sided alternative hypothesis , the CR of our test (and, in general, of all tests) should be . Here $z_{\alpha/2}$ denotes the upper $\alpha/2$ percentile of the standard normal distribution. If the alternative hypothesis is one-sided, or , then the CR should be or , respectively. Throughout the paper, we focus on the two-sided alternative hypothesis. For the above two-sided Pitman local alternative hypotheses , the CR is . Further note that the asymptotic power function, , under has the following form:
Here $\Phi(\cdot)$ is the distribution function of the standard normal distribution. The derivative of the asymptotic power function with respect to Δ is
Here $\phi(\cdot)$ is the density function of the standard normal distribution. This implies that the asymptotic power becomes larger as increases. As a result, the optimal weight is the one that minimizes .
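For concreteness, here is a hedged reconstruction of the displayed power function and its derivative, assuming (consistently with the surrounding discussion) that the weights satisfy $\omega^{\mathrm T}\mathbf{1}_4 = 1$ and that the asymptotic drift under the local alternatives is $\Delta/\sqrt{\omega^{\mathrm T} V \omega}$, up to normalizing constants that are not recoverable from the extracted text:

$$\beta_{\omega}(\Delta) = 1 - \Phi\!\Big(z_{\alpha/2} - \tfrac{\Delta}{\sqrt{\omega^{\mathrm T} V \omega}}\Big) + \Phi\!\Big(-z_{\alpha/2} - \tfrac{\Delta}{\sqrt{\omega^{\mathrm T} V \omega}}\Big),$$

$$\frac{\partial \beta_{\omega}(\Delta)}{\partial \Delta} = \frac{1}{\sqrt{\omega^{\mathrm T} V \omega}}\Big[\phi\!\Big(z_{\alpha/2} - \tfrac{\Delta}{\sqrt{\omega^{\mathrm T} V \omega}}\Big) - \phi\!\Big(-z_{\alpha/2} - \tfrac{\Delta}{\sqrt{\omega^{\mathrm T} V \omega}}\Big)\Big],$$

which is positive for $\Delta > 0$, so that, for any fixed $\Delta > 0$, the asymptotic power is larger when $\omega^{\mathrm T} V \omega$ is smaller.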
To determine the optimal weight, we apply the Lagrange multiplier method. In fact, this is the problem of minimizing subject to the restriction that . By the Lagrange multiplier method, we have:
Then we get two equations as follows:
From the first equation, we have . Plugging this result into the second equation, we get . Then we obtain:
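Filling in the algebra as a hedged reconstruction (assuming the objective is $\omega^{\mathrm T} V \omega$ and the constraint is $\omega^{\mathrm T}\mathbf{1}_4 = 1$, where $\mathbf{1}_4$ is the 4-dimensional vector of ones mentioned above): the Lagrangian and its stationarity conditions are

$$\mathcal{L}(\omega, \lambda) = \omega^{\mathrm T} V \omega - \lambda\,(\omega^{\mathrm T}\mathbf{1}_4 - 1), \qquad
\frac{\partial \mathcal{L}}{\partial \omega} = 2V\omega - \lambda\mathbf{1}_4 = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = 1 - \omega^{\mathrm T}\mathbf{1}_4 = 0.$$

The first equation gives $\omega = (\lambda/2)\,V^{-1}\mathbf{1}_4$; substituting into the second gives $\lambda/2 = 1/(\mathbf{1}_4^{\mathrm T} V^{-1}\mathbf{1}_4)$, so that

$$\omega_{\mathrm{opt}} = \frac{V^{-1}\mathbf{1}_4}{\mathbf{1}_4^{\mathrm T} V^{-1}\mathbf{1}_4}.$$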
It is noted that a global minimum is in fact achieved, since the objective is a convex quadratic form in ω. With the above-defined weight vector, the resulting test is asymptotically the most powerful among all weighted linear combinations of . In other words, it is asymptotically more powerful than , and , which will be further confirmed by our simulation studies.
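Numerically, the optimal weight can be obtained from an estimated covariance matrix as follows (a minimal sketch under the same $\omega^{\mathrm T}\mathbf{1}_4 = 1$ assumption; `optimal_weight` is a hypothetical helper name):

```python
import numpy as np

def optimal_weight(V):
    """Weight minimizing w' V w subject to w' 1 = 1 (Lagrange solution)."""
    ones = np.ones(V.shape[0])
    v_inv_ones = np.linalg.solve(V, ones)   # V^{-1} 1 without forming V^{-1}
    return v_inv_ones / (ones @ v_inv_ones)

# With V_hat and T estimated from data as in the earlier sketch, the
# asymptotically optimal statistic would be
#   w_opt = optimal_weight(V_hat)
#   z_opt = w_opt @ T / np.sqrt(w_opt @ V_hat @ w_opt)
```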
2.8. Incompleteness in a single sample
When there are extra unpaired observations from only one sample, the sample structure can be represented as . While extensive statistical research exists for paired data with incompleteness in both samples, hardly any recent work can be found on paired data with incompleteness in a single sample. In this situation, the two-sample t-tests , and cannot be used, while our proposed test is still applicable. In this case, and the variance matrix degenerates into
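The degenerate matrix itself is not shown in the extracted text; as a placeholder, here is a sketch under the earlier assumptions, in which only the two comparisons involving the observed unpaired Y's remain and both the weight vector and the covariance matrix reduce to two dimensions (all names are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def partial_paired_z_single(x_pair, y_pair, y_only):
    """Weighted statistic when only extra unpaired Y observations exist (l = 0).

    Only T1 = mean(Xp) - mean(Yp) and T2 = mean(Xp) - mean(Yu) remain, with a
    2 x 2 covariance matrix; the weight used here is the optimal one for the
    reduced matrix.
    """
    x_pair = np.asarray(x_pair, float); y_pair = np.asarray(y_pair, float)
    y_only = np.asarray(y_only, float)
    m, n = len(x_pair), len(y_only)

    T = np.array([x_pair.mean() - y_pair.mean(),
                  x_pair.mean() - y_only.mean()])

    s_xy = np.cov(x_pair, y_pair, ddof=1)[0, 1]
    s_x2 = np.var(x_pair, ddof=1)
    s_y2 = np.var(np.concatenate([y_pair, y_only]), ddof=1)

    V = np.array([[(s_x2 + s_y2 - 2 * s_xy) / m, (s_x2 - s_xy) / m],
                  [(s_x2 - s_xy) / m,            s_x2 / m + s_y2 / n]])

    ones = np.ones(2)
    w = np.linalg.solve(V, ones)
    w = w / (ones @ w)                       # optimal weight for the 2-d case
    z = w @ T / np.sqrt(w @ V @ w)
    return z, 2 * (1 - norm.cdf(abs(z)))
```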
3. Simulation studies
In this section, some simulation studies are conducted to compare our proposed test statistics with suggested by Guo and Yuan [3], and for and for . In all studies, the nominal type I error rate is set at 0.05. The reported results are based on 10,000 Monte Carlo replicates.
Three bivariate distributions are considered to generate . The first one is the bivariate normal distribution with mean and covariance matrix
where ρ is the correlation coefficient between X and Y. We consider three values of δ: δ = 0 (for evaluating the type I error) and and 0.5 (for evaluating power), and three correlation coefficients: 0, 0.5, and 0.8, representing no, moderate, and strong correlation, respectively. The second distribution is a bivariate t distribution with non-centrality parameter , scale matrix , and 3 degrees of freedom. The last one is a gamma distribution, implemented using a normal (Gaussian) copula. The correlation is set to 0, 0.5, and 0.8. The marginal distributions are set as and .
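As a sketch of the three data-generating mechanisms (the mean vector, scale matrix, and gamma marginal parameters below are placeholders, since the specific values are not recoverable from the extracted text; the gamma pair is coupled through a Gaussian copula as described):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def bivariate_normal(m, delta, rho):
    """Paired (X, Y) from a bivariate normal with mean (delta, 0), unit variances."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal([delta, 0.0], cov, size=m)

def bivariate_t(m, delta, rho, df=3):
    """Bivariate t with 3 degrees of freedom: a normal scale mixture, then shifted."""
    z = bivariate_normal(m, 0.0, rho)
    chi = rng.chisquare(df, size=(m, 1))
    return z / np.sqrt(chi / df) + np.array([delta, 0.0])

def bivariate_gamma(m, rho, shape=(2.0, 2.0), scale=(1.0, 1.0)):
    """Gamma marginals coupled through a Gaussian copula with correlation rho."""
    u = stats.norm.cdf(bivariate_normal(m, 0.0, rho))   # Gaussian copula uniforms
    x = stats.gamma.ppf(u[:, 0], a=shape[0], scale=scale[0])
    y = stats.gamma.ppf(u[:, 1], a=shape[1], scale=scale[1])
    return np.column_stack([x, y])
```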
We first consider the tests proposed for . We consider m = 30 or 50 and two different unpaired sample size scenarios: n = l = 20 or 60. The simulation results for the normal, t, and gamma distributions are shown in Tables 1–3, respectively. First, for the empirical sizes, we find that all three test statistics can control the size well. For the empirical powers, performs slightly better than , and both are more powerful than . The powers increase when δ becomes larger or the sample sizes increase. Further, when the correlation between X and Y increases, the powers also become larger. We also observe that the powers of the three test statistics are larger under the normal distribution than under the t distribution, which in turn are larger than those under the gamma distribution.
Table 2. Empirical sizes and powers of , and with t distribution for partially paired data.
n = l | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|---|---|---|
m = 30 | ||||||||||
20 | 0.0549 | 0.0578 | 0.0610 | 0.0510 | 0.0532 | 0.0546 | 0.0527 | 0.0561 | 0.0577 | |
60 | 0.0627 | 0.0638 | 0.0669 | 0.0505 | 0.0498 | 0.0510 | 0.0558 | 0.0567 | 0.0589 | |
m = 30 | ||||||||||
20 | 0.1131 | 0.1528 | 0.2687 | 0.1094 | 0.1295 | 0.1491 | 0.1113 | 0.1457 | 0.2561 | |
60 | 0.1626 | 0.1977 | 0.3154 | 0.1376 | 0.1514 | 0.1688 | 0.1510 | 0.1869 | 0.2934 | |
m = 30 | ||||||||||
20 | 0.3977 | 0.5625 | 0.8347 | 0.3811 | 0.4795 | 0.5840 | 0.3909 | 0.5517 | 0.8222 | |
60 | 0.5760 | 0.7059 | 0.8951 | 0.5459 | 0.6210 | 0.6672 | 0.5657 | 0.6958 | 0.8867 | |
m = 50 | ||||||||||
20 | 0.0514 | 0.0550 | 0.0547 | 0.0508 | 0.0515 | 0.0558 | 0.0519 | 0.0529 | 0.0521 | |
60 | 0.0545 | 0.0560 | 0.0577 | 0.0498 | 0.0514 | 0.0533 | 0.0507 | 0.0510 | 0.0544 | |
m = 50 | ||||||||||
20 | 0.1280 | 0.1806 | 0.3367 | 0.1279 | 0.1627 | 0.2135 | 0.1287 | 0.1780 | 0.3296 | |
60 | 0.1698 | 0.2243 | 0.3812 | 0.1604 | 0.1879 | 0.2108 | 0.1660 | 0.2178 | 0.3665 | |
m = 50 | ||||||||||
20 | 0.4823 | 0.6948 | 0.9303 | 0.4736 | 0.6279 | 0.7832 | 0.4791 | 0.6887 | 0.9232 | |
60 | 0.6490 | 0.8039 | 0.9540 | 0.6276 | 0.7184 | 0.7923 | 0.6441 | 0.7937 | 0.9487 |
Table 1. Empirical sizes and powers of , and with normal distribution for partially paired data.
n = l | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|---|---|---|
m = 30 | ||||||||||
20 | 0.0576 | 0.0595 | 0.0602 | 0.0518 | 0.0527 | 0.0535 | 0.0549 | 0.0557 | 0.0571 | |
60 | 0.0592 | 0.0614 | 0.0647 | 0.0503 | 0.0521 | 0.0533 | 0.0528 | 0.0548 | 0.0574 | |
m = 30 | ||||||||||
20 | 0.1854 | 0.2611 | 0.4724 | 0.1756 | 0.2296 | 0.2849 | 0.1842 | 0.2581 | 0.4671 | |
60 | 0.2820 | 0.3536 | 0.5532 | 0.2728 | 0.3099 | 0.3466 | 0.2763 | 0.3471 | 0.5418 | |
m = 30 | ||||||||||
20 | 0.7133 | 0.8831 | 0.9949 | 0.7091 | 0.8424 | 0.9285 | 0.7124 | 0.8822 | 0.9948 | |
60 | 0.9164 | 0.9718 | 0.9990 | 0.9160 | 0.9582 | 0.9756 | 0.9179 | 0.9724 | 0.9990 | |
m = 50 | ||||||||||
20 | 0.0543 | 0.0556 | 0.0570 | 0.0521 | 0.0532 | 0.0540 | 0.0539 | 0.0515 | 0.0585 | |
60 | 0.0587 | 0.0565 | 0.0559 | 0.0538 | 0.0490 | 0.0510 | 0.0562 | 0.0554 | 0.0539 | |
m = 50 | ||||||||||
20 | 0.2241 | 0.3563 | 0.6473 | 0.2202 | 0.3229 | 0.4420 | 0.2223 | 0.3556 | 0.6444 | |
60 | 0.3289 | 0.4470 | 0.7048 | 0.3267 | 0.3975 | 0.4581 | 0.3276 | 0.4441 | 0.7023 | |
m = 50 | ||||||||||
20 | 0.8374 | 0.9747 | 0.9998 | 0.8381 | 0.9547 | 0.9953 | 0.8374 | 0.9748 | 0.9999 | |
60 | 0.9579 | 0.9940 | 1.0000 | 0.9578 | 0.9889 | 0.9959 | 0.9575 | 0.9947 | 1.0000 |
Table 3. Empirical sizes and powers of , and with gamma distribution for partially paired data.
n = l | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|---|---|---|
m = 30 | ||||||||||
20 | 0.0619 | 0.0643 | 0.0689 | 0.0544 | 0.0532 | 0.0537 | 0.0565 | 0.0550 | 0.0581 | |
60 | 0.0603 | 0.0664 | 0.0688 | 0.0487 | 0.0489 | 0.0500 | 0.0507 | 0.0536 | 0.0563 | |
m = 30 | ||||||||||
20 | 0.1068 | 0.1356 | 0.2094 | 0.0974 | 0.1114 | 0.1320 | 0.1001 | 0.1270 | 0.2019 | |
60 | 0.1363 | 0.1679 | 0.2570 | 0.1207 | 0.1398 | 0.1545 | 0.1265 | 0.1569 | 0.2415 | |
m = 30 | ||||||||||
20 | 0.3294 | 0.4642 | 0.7478 | 0.3166 | 0.4143 | 0.5232 | 0.3232 | 0.4606 | 0.7431 | |
60 | 0.5016 | 0.6155 | 0.8288 | 0.4906 | 0.5630 | 0.6148 | 0.4973 | 0.6138 | 0.8246 | |
m = 50 | ||||||||||
20 | 0.0539 | 0.0580 | 0.0564 | 0.0519 | 0.0538 | 0.0518 | 0.0521 | 0.0559 | 0.0544 | |
60 | 0.0582 | 0.0587 | 0.0593 | 0.0526 | 0.0535 | 0.0508 | 0.0527 | 0.0531 | 0.0545 | |
m = 50 | ||||||||||
20 | 0.1105 | 0.1514 | 0.2811 | 0.1080 | 0.1370 | 0.1848 | 0.1075 | 0.1491 | 0.2782 | |
60 | 0.1467 | 0.1916 | 0.3078 | 0.1383 | 0.1674 | 0.1905 | 0.1419 | 0.1861 | 0.2997 | |
m = 50 | ||||||||||
20 | 0.4151 | 0.6062 | 0.9090 | 0.4100 | 0.5623 | 0.7372 | 0.4118 | 0.6047 | 0.9065 | |
60 | 0.5826 | 0.7196 | 0.9408 | 0.5739 | 0.6684 | 0.7681 | 0.5809 | 0.7211 | 0.9386 |
We also consider the tests designed for . The sample size of the paired subsample is m = 30 or 50, while the sample size of the unpaired subsample is set to be n = 20 or 60. In this situation, is not applicable, and thus we focus on comparing and . The simulation results are presented in Tables 4–6. From these tables, we observe that when there is no correlation between X and Y, the powers of and are almost the same. However, when the correlation is not zero, performs better than .
Table 5. Empirical sizes and powers of and with t distribution for partially paired data.
n | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|
m = 30 | |||||||
20 | 0.0535 | 0.0544 | 0.0562 | 0.0537 | 0.0509 | 0.0523 | |
60 | 0.0558 | 0.0563 | 0.0576 | 0.0523 | 0.0542 | 0.0548 | |
m = 30 | |||||||
20 | 0.0913 | 0.1305 | 0.2536 | 0.0912 | 0.1151 | 0.1583 | |
60 | 0.1147 | 0.1494 | 0.2612 | 0.1057 | 0.1180 | 0.1389 | |
m = 30 | |||||||
20 | 0.3257 | 0.4981 | 0.8070 | 0.3198 | 0.4395 | 0.6038 | |
60 | 0.3770 | 0.5279 | 0.8110 | 0.3606 | 0.4322 | 0.5037 | |
m = 50 | |||||||
20 | 0.0503 | 0.0531 | 0.0512 | 0.05128 | 0.0550 | 0.0487 | |
60 | 0.0508 | 0.0510 | 0.0537 | 0.0494 | 0.0511 | 0.0520 | |
m = 50 | |||||||
20 | 0.1105 | 0.1787 | 0.3430 | 0.1109 | 0.1599 | 0.2377 | |
60 | 0.1327 | 0.1788 | 0.3475 | 0.1274 | 0.1540 | 0.1926 | |
m = 50 | |||||||
20 | 0.4246 | 0.6439 | 0.9169 | 0.4192 | 0.6040 | 0.8044 | |
60 | 0.4830 | 0.6839 | 0.9212 | 0.4730 | 0.6039 | 0.7282 |
Table 4. Empirical sizes and powers of and with normal distribution for partially paired data.
n | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|
m = 30 | |||||||
20 | 0.0563 | 0.0581 | 0.0624 | 0.0532 | 0.0557 | 0.0543 | |
60 | 0.0598 | 0.0611 | 0.0623 | 0.0552 | 0.0554 | 0.0576 | |
m = 30 | |||||||
20 | 0.1506 | 0.2293 | 0.4316 | 0.1501 | 0.2060 | 0.2790 | |
60 | 0.1757 | 0.2423 | 0.4453 | 0.1686 | 0.2020 | 0.2406 | |
m = 30 | |||||||
20 | 0.5877 | 0.8282 | 0.9918 | 0.5844 | 0.7885 | 0.9254 | |
60 | 0.6691 | 0.8506 | 0.9922 | 0.6617 | 0.7790 | 0.8601 | |
m = 50 | |||||||
20 | 0.0544 | 0.0569 | 0.0542 | 0.0530 | 0.0531 | 0.0527 | |
60 | 0.0560 | 0.0568 | 0.0573 | 0.0541 | 0.0548 | 0.0510 | |
m = 50 | |||||||
20 | 0.2083 | 0.3141 | 0.6261 | 0.2054 | 0.2867 | 0.4616 | |
60 | 0.2262 | 0.3390 | 0.6394 | 0.2220 | 0.2962 | 0.3907 | |
m = 50 | |||||||
20 | 0.7785 | 0.9561 | 0.9999 | 0.7777 | 0.9424 | 0.9957 | |
60 | 0.8396 | 0.9655 | 0.9998 | 0.8384 | 0.9410 | 0.9819 |
Table 6. Empirical sizes and powers of and with gamma distribution for partially paired data.
n | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|
m = 30 | |||||||
20 | 0.0605 | 0.0614 | 0.0625 | 0.0589 | 0.0592 | 0.0534 | |
60 | 0.0598 | 0.0637 | 0.0678 | 0.0569 | 0.0574 | 0.0561 | |
m = 30 | |||||||
20 | 0.0956 | 0.1311 | 0.1986 | 0.0986 | 0.1246 | 0.1415 | |
60 | 0.1111 | 0.1352 | 0.2189 | 0.1124 | 0.1226 | 0.1412 | |
m = 30 | |||||||
20 | 0.2678 | 0.4002 | 0.7019 | 0.2737 | 0.3693 | 0.5210 | |
60 | 0.3109 | 0.4324 | 0.7099 | 0.3169 | 0.3874 | 0.4553 | |
m = 50 | |||||||
20 | 0.0564 | 0.0524 | 0.0553 | 0.0557 | 0.0502 | 0.0538 | |
60 | 0.0541 | 0.0558 | 0.0593 | 0.0522 | 0.0527 | 0.0525 | |
m = 50 | |||||||
20 | 0.1090 | 0.1480 | 0.2640 | 0.1106 | 0.1458 | 0.2089 | |
60 | 0.1298 | 0.1601 | 0.2795 | 0.1310 | 0.1499 | 0.1813 | |
m = 50 | |||||||
20 | 0.3568 | 0.5499 | 0.8841 | 0.3647 | 0.5298 | 0.7552 | |
60 | 0.4104 | 0.5786 | 0.8873 | 0.4178 | 0.5369 | 0.6564 |
In sum, from the above simulation studies, we may conclude that the newly introduced test statistic exhibits larger power than its competitors in the cases considered, and is applicable for .
However, we should keep in mind that our proposed test statistic is only asymptotically optimal. When the sample size is very small, it may not be the most suitable one. Now consider m = 6, n = 8, l = 16 for and m = n = 8 for . These sample size settings correspond to Examples 2 and 3 in the next section, respectively. The results are shown in Tables 7 and 8. From these two tables, we see that all three test statistics are liberal for small sample sizes. Among them, the problem is most serious for , followed by , while performs relatively better. This is due to the fact that both our proposed test statistic and Guo and Yuan [3]'s test statistic require estimating the weight vectors, which depend on the covariance matrix V. To confirm this point, we further conduct simulation studies with known V. The results are given in Tables 9 and 10. From Table 9, we can see clearly that the empirical sizes of are now very close to the nominal level 0.05. The empirical sizes of are also under control, but now cannot control the empirical sizes. is more powerful than . In Table 10, the empirical sizes of both and are close to 0.05, and is still more powerful than , especially under strong correlation.
Table 7. Empirical sizes and powers of , and with m = 6, n = 8, l = 16 for partially paired data.
n, l | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|---|---|---|
m = 6, Normal | ||||||||||
8, 16 | 0.1285 | 0.1450 | 0.1461 | 0.0648 | 0.0632 | 0.0585 | 0.0986 | 0.1043 | 0.1071 | |
m = 6, Normal | ||||||||||
8, 16 | 0.1783 | 0.1993 | 0.2627 | 0.1093 | 0.1078 | 0.1138 | 0.1480 | 0.1605 | 0.2143 | |
m = 6, Normal | ||||||||||
8, 16 | 0.3819 | 0.4658 | 0.6707 | 0.3303 | 0.3656 | 0.4070 | 0.3582 | 0.4372 | 0.6447 | |
m = 6, t | ||||||||||
8, 16 | 0.1280 | 0.1392 | 0.1471 | 0.0566 | 0.0536 | 0.0509 | 0.0859 | 0.0927 | 0.0931 | |
m = 6, t | ||||||||||
8, 16 | 0.1570 | 0.1730 | 0.2202 | 0.0811 | 0.0805 | 0.0797 | 0.1180 | 0.1259 | 0.1662 | |
m = 6, t | ||||||||||
8, 16 | 0.2775 | 0.3448 | 0.5061 | 0.1933 | 0.2132 | 0.2311 | 0.2374 | 0.2980 | 0.4459 | |
m = 6, Gamma | ||||||||||
8, 16 | 0.1374 | 0.1606 | 0.1694 | 0.0642 | 0.0597 | 0.0569 | 0.0953 | 0.0988 | 0.1001 | |
m = 6, Gamma | ||||||||||
8, 16 | 0.1461 | 0.1650 | 0.2011 | 0.0723 | 0.0678 | 0.0710 | 0.1080 | 0.1133 | 0.1398 | |
m = 6, Gamma | ||||||||||
8, 16 | 0.2203 | 0.2742 | 0.3881 | 0.1400 | 0.1545 | 0.1666 | 0.1817 | 0.2267 | 0.3350 |
Table 8. Empirical sizes and powers of and with m = n = 8 for partially paired data.
n | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|
m = 8, Normal | |||||||
8 | 0.1004 | 0.1051 | 0.1092 | 0.0812 | 0.0715 | 0.0693 | |
m = 8, Normal | |||||||
8 | 0.1291 | 0.1527 | 0.2157 | 0.1052 | 0.1148 | 0.1142 | |
m = 8, Normal | |||||||
8 | 0.2714 | 0.3880 | 0.6640 | 0.2457 | 0.3185 | 0.4112 | |
m = 8, t | |||||||
8 | 0.0842 | 0.0916 | 0.0951 | 0.0664 | 0.0617 | 0.0627 | |
m = 8, t | |||||||
8 | 0.1048 | 0.1253 | 0.1712 | 0.0849 | 0.0894 | 0.0933 | |
m = 8, t | |||||||
8 | 0.1873 | 0.2755 | 0.4696 | 0.1649 | 0.2087 | 0.2538 | |
m = 8, Gamma | |||||||
8 | 0.0975 | 0.1175 | 0.1141 | 0.0769 | 0.0796 | 0.0717 | |
m = 8, Gamma | |||||||
8 | 0.1130 | 0.1335 | 0.1616 | 0.0989 | 0.1062 | 0.1136 | |
m = 8, Gamma | |||||||
8 | 0.1704 | 0.2305 | 0.3538 | 0.1640 | 0.2005 | 0.2362 |
Table 9. Empirical sizes and powers of , and with m = 6, n = 8, l = 16 and known V for partially paired data.
n, l | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|---|---|---|
m = 6, Normal | ||||||||||
8, 16 | 0.0476 | 0.0565 | 0.0544 | 0.0761 | 0.0891 | 0.0859 | 0.0476 | 0.0344 | 0.0162 | |
m = 6, Normal | ||||||||||
8, 16 | 0.0898 | 0.1165 | 0.1328 | 0.1301 | 0.1539 | 0.1646 | 0.0913 | 0.0801 | 0.0577 | |
m = 6, Normal | ||||||||||
8, 16 | 0.3068 | 0.3963 | 0.4973 | 0.3807 | 0.4553 | 0.5099 | 0.2990 | 0.3258 | 0.3273 | |
m = 6, t | ||||||||||
8, 16 | 0.0465 | 0.0556 | 0.0557 | 0.0677 | 0.0781 | 0.0825 | 0.0458 | 0.0361 | 0.0218 | |
m = 6, t | ||||||||||
8, 16 | 0.0546 | 0.0671 | 0.0761 | 0.0841 | 0.0935 | 0.1037 | 0.0541 | 0.0460 | 0.0289 | |
m = 6, t | ||||||||||
8, 16 | 0.1179 | 0.1561 | 0.1873 | 0.1661 | 0.1999 | 0.2239 | 0.1144 | 0.1081 | 0.0871 | |
m = 6, Gamma | ||||||||||
8, 16 | 0.0507 | 0.0582 | 0.0566 | 0.0786 | 0.0890 | 0.0880 | 0.0507 | 0.0367 | 0.0184 | |
m = 6, Gamma | ||||||||||
8, 16 | 0.0646 | 0.0753 | 0.0786 | 0.0977 | 0.1042 | 0.1092 | 0.0640 | 0.0490 | 0.0284 | |
m = 6, Gamma | ||||||||||
8, 16 | 0.1292 | 0.1728 | 0.2154 | 0.1766 | 0.2212 | 0.2442 | 0.1281 | 0.1251 | 0.1093 |
Table 10. Empirical sizes and powers of and with m = n = 8 and known V for partially paired data.
n | ρ | 0 | 0.5 | 0.8 | 0 | 0.5 | 0.8 |
---|---|---|---|---|---|---|---|
m = 8, Normal | |||||||
8 | 0.0507 | 0.0497 | 0.0488 | 0.0507 | 0.0506 | 0.0522 | |
m = 8, Normal | |||||||
8 | 0.0752 | 0.0917 | 0.1538 | 0.0752 | 0.0876 | 0.1059 | |
m = 8, Normal | |||||||
8 | 0.2104 | 0.3272 | 0.6401 | 0.2104 | 0.2911 | 0.3981 | |
m = 8, t | |||||||
8 | 0.0449 | 0.0459 | 0.0451 | 0.0449 | 0.0464 | 0.0462 | |
m = 8, t | |||||||
8 | 0.0491 | 0.0556 | 0.0676 | 0.0491 | 0.0548 | 0.0591 | |
m = 8, t | |||||||
8 | 0.0872 | 0.1134 | 0.2348 | 0.0872 | 0.1097 | 0.1391 | |
m = 8, Gamma | |||||||
8 | 0.0512 | 0.0578 | 0.0582 | 0.0512 | 0.0543 | 0.0525 | |
m = 8, Gamma | |||||||
8 | 0.0588 | 0.0675 | 0.0865 | 0.0588 | 0.0649 | 0.0717 | |
m = 8, Gamma | |||||||
8 | 0.1006 | 0.1439 | 0.2724 | 0.1006 | 0.1298 | 0.1626 |
In sum, when the sample sizes are relatively large, should be used, while when the sample sizes are small, we may choose . We should also realize that all existing test statistics are liberal for small sample sizes since they all rely on asymptotic normality. Developing suitable test statistics for small sample sizes is therefore strongly needed.
4. Real-data application
In this section, we use the proposed testing procedures to analyze three real data sets about CD4 count, DNA extraction concentrations, and the quality of sleep, respectively. First, consider a real data set collected from an HIV clinical trial. There are 567 male patients who had not received antiretroviral therapy before this trial and received combined antiretroviral therapy in this trial. Let X be the CD4 count at 96±5 weeks post therapy and Y the CD4 count at 20±5 weeks. Due to death and dropout, 199 patients have missing values of X, while Y is observed for all patients. A more detailed description of this data set can be found in Hammer et al. [5] and Guo et al. [4]. We aim to test whether there is any difference between the CD4 count at 96±5 weeks post therapy and the CD4 count at 20±5 weeks.
For this data set, there is incompleteness in only a single response, and thus cannot be used here. We apply our proposed test and the to this data set. The values of and are 7.7123 and 5.7725, respectively, and the p-values are both smaller than . In sum, both test statistics provide strong evidence of a significant mean difference between the CD4 count at 96±5 weeks post therapy and the CD4 count at 20±5 weeks.
Now we turn to our second example. This data set is from a research project conducted by Riordan [15], aiming to compare two methods for extracting DNA from coyote blood samples. One method is the Kit method, and the other is the more traditional chloroform method. The scientific question is whether these extraction methods differ with respect to the mean concentration of DNA. Due to time and cost considerations, DNA was measured using both methods for only 6 of the coyotes. The DNA of 8 coyotes was measured by only the Kit method, and the DNA of the remaining 16 coyotes was measured by only the chloroform method. The values of , and are 0.4851, −0.2855, and 0.0079, respectively, and the corresponding p-values are 0.3138, 0.3876, and 0.4968. Thus there is no statistically significant mean difference between the two techniques with respect to DNA extraction concentration.
Finally, we consider a data set about the quality of sleep, provided by Derrick et al. [1]. The sleep fragmentation index measures the quality of sleep for an individual over one night; the lower the sleep fragmentation score, the better the quality of sleep. In this study, 8 individuals watched only a ‘horror’ movie before sleep on one night, another 8 participants watched only a ‘feel good’ movie before bedtime on one night, and an additional sample of 8 individuals watched a ‘feel good’ movie and a ‘horror’ movie over two separate nights. The question is whether the quality of sleep is the same between individuals watching a ‘horror’ movie and individuals watching a ‘feel good’ movie. The values of , and are 2.4464, 2.4411, and 2.4177, respectively, and the corresponding p-values are 0.0072, 0.0073, and 0.0078. The results suggest that individuals watching a ‘feel good’ movie before bedtime have less disrupted sleep than individuals watching a ‘horror’ movie before bedtime.
5. Conclusions and discussions
In this paper, we provide a unified framework for the two-sample t-test with partially paired data. We show that many existing two-sample t-tests with partially paired data can be viewed as special members of our unified framework, and in this way we discuss the shortcomings of some existing t-tests. We then propose the optimal weighted linear combination of the test statistics comparing all four paired and unpaired data sets. Numerical studies and real data examples are used to illustrate the performance of our proposed optimal weighted combinations of test statistics and to compare them with some existing methods. For practical application, we recommend the newly introduced t-test due to its good power performance and wide range of applicability.
Acknowledgments
The authors are grateful to the Editor, the associate editor, and the two anonymous referees for substantive comments that have significantly improved this manuscript.
Funding Statement
The research described herewith was supported by National Natural Science Foundation of China [grant number 11701034], [grant number 61877049], the Fundamental Research Funds for the Central Universities, and China Postdoctoral Science Foundation [grant number 2017M610058], [grant number 2016M590934], [grant number 2017T100731].
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Derrick B., Toher D., and White P., How to compare the means of two samples that include paired observations and independent observations: A companion to Derrick, Russ, Toher and White (2017), Quant. Methods Psychol. 13 (2017), pp. 120–126. doi: 10.20982/tqmp.13.2.p120
- 2. Ekbohm G., On comparing means in the paired case with incomplete data on both responses, Biometrika 63 (1976), pp. 299–304. doi: 10.1093/biomet/63.2.299
- 3. Guo B.B. and Yuan Y., A comparative review of methods for comparing means using partially paired data, Stat. Methods Med. Res. 26 (2017), pp. 1323–1340. doi: 10.1177/0962280215577111
- 4. Guo X., Wang T., Xu W.L., and Zhu L., Dimension reduction with missing response at random, Comput. Stat. Data Anal. 69 (2014), pp. 228–242. doi: 10.1016/j.csda.2013.08.001
- 5. Hammer S.M., Katzenstein D.A., Hughes M.D., Gundacker H., Schooley R.T., Haubrich R.H., Henry W.K., Lederman M.M., Phair J.P., Niu M., Hirsch M.S., and Merigan T.C., A trial comparing nucleotide monotherapy with combined therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter, N. Engl. J. Med. 335 (1996), pp. 1081–1090. doi: 10.1056/NEJM199610103351501
- 6. Kim B., Kim I., Lee S., Kim S., Rha S., and Chung H., Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer, Bioinformatics 21 (2005), pp. 517–528. doi: 10.1093/bioinformatics/bti029
- 7. Kuan P.F., Propensity score method for partially matched omics studies, Cancer Inform. 13 (2014), pp. 1–10.
- 8. Kuan P.F. and Huang B., A simple and robust method for partially matched samples using the p-values pooling approach, Stat. Med. 32 (2013), pp. 3247–3259. doi: 10.1002/sim.5758
- 9. Lim J., Kim J., Kim S.C., Yu D., Kim K., and Kim B.S., Detection of differentially expressed gene sets in a partially paired microarray data set, Stat. Appl. Genet. Mol. Biol. 11 (2012), Article 5. doi: 10.1515/1544-6115.1610
- 10. Lin P. and Stivers L., On differences of means with incomplete data, Biometrika 61 (1974), pp. 325–334. doi: 10.1093/biomet/61.2.325
- 11. Little R.J.A. and Rubin D.B., Statistical Analysis with Missing Data, John Wiley & Sons, Hoboken, NJ, 2014.
- 12. Looney S.W. and Jones P.W., A method for comparing two normal means using combined samples of correlated and uncorrelated data, Stat. Med. 22 (2003), pp. 1601–1610. doi: 10.1002/sim.1514
- 13. Pazos M., Yang H.L., Gardiner S.K., Cepurna W.O., Johnson E.C., Morrison J.C., and Burgoyne C.F., Expansions of the neurovascular scleral canal and contained optic nerve occur early in the hypertonic saline rat experimental glaucoma model, Exp. Eye Res. 145 (2014), pp. 173–186. doi: 10.1016/j.exer.2015.10.014
- 14. Qin H., Prentice E., and Freeman K., Analyzing partially correlated longitudinal data in community survey research, Soc. Natur. Resour. 31 (2018), pp. 142–149. doi: 10.1080/08941920.2016.1264650
- 15. Riordan B., Northeastern Ohio Coyote Hybridization with Wolves, Honors research project, University of Akron, Akron, 2012.
- 16. Samawi H.M. and Vogel R., Notes on two sample tests for partially correlated (paired) data, J. Appl. Stat. 41 (2014), pp. 109–117. doi: 10.1080/02664763.2013.830285
- 17. Serfling R., Approximation Theorems of Mathematical Statistics, John Wiley, New York, 1980.
- 18. Zaykin D.V., Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis, J. Evol. Biol. 24 (2011), pp. 1836–1841. doi: 10.1111/j.1420-9101.2011.02297.x