Computational and Mathematical Methods in Medicine. 2015 May 18;2015:172918. doi: 10.1155/2015/172918

Estimation of Sensitive Proportion by Randomized Response Data in Successive Sampling

Bo Yu 1,2, Zongda Jin 3, Jiayong Tian 4, Ge Gao 1,*
PMCID: PMC4451016  PMID: 26089958

Abstract

This paper considers the problem of estimating binomial proportions of sensitive or stigmatizing attributes in a population of interest. Randomized response techniques are used to protect the privacy of respondents and to reduce response bias while eliciting information on sensitive attributes. In many sensitive-question surveys, the same population is sampled repeatedly on several occasions. In this paper, we apply a successive sampling scheme to improve the estimation of the sensitive proportion on the current occasion.

1. Introduction

Social surveys sometimes include stigmatizing or sensitive topics, such as habitual tax evasion, sexual behaviour, substance abuse, and excessive gambling, on which it is difficult to obtain valid and trustworthy information. When respondents are asked directly about controversial matters, the result is often refusal or untruthful answers, especially when they have engaged in stigmatized behaviour. To overcome this difficulty, Warner [1] introduced the randomized response technique to estimate the proportion of people bearing a stigmatizing or sensitive characteristic in a given community. The technique allows respondents to answer sensitive questions truthfully without revealing embarrassing behaviour. Following Warner's pioneering work [1], several researchers have made important contributions in this area, such as Christofides [2, 3], Singh [4], Kim and Elam [5], Huang [6, 7], Singh and Sedory [8], Chang and Kuo [9], and Arnab et al. [10]. All of these results are based on a sample drawn on a single occasion, which is not the setting of the present study.

In many sensitive-question surveys, the same population is sampled repeatedly so that changes over time can be followed. In such situations, a successive sampling scheme can be an attractive way to improve estimators of the level at a point in time or to measure the change between two time points. For successive sampling on two occasions, earlier theory [11, 12] aimed at providing the optimum estimator of the mean on the current (second) occasion. Successive sampling has also been discussed in some detail by Narain [13], Raj [14], Singh [15], Ghangurde and Rao [16], Okafor [17], Arnab and Okafor [18], Biradar and Singh [19], G. N. Singh and V. K. Singh [20], Artes et al. [21], and Singh et al. [22], among others. However, no effort has been made to estimate the proportions of sensitive attributes in an infinite population on the current occasion. This motivated the authors to consider the problem of estimating binomial proportions of sensitive or stigmatizing attributes in successive sampling on two occasions. In addition, cluster sampling is usually preferred when the target population is geographically dispersed. In this paper, we use a rotating cluster sample design to construct a class of estimators for randomized response surveys. The rest of the paper is organized as follows. In Section 2, we propose a survey method that combines the Simmons model with cluster rotation sampling. In Section 3, we derive the corresponding estimators and variance formulas. In Section 4, the method and formulas are applied in a survey of premarital sexual behaviour among students at Soochow University. Section 5 contains the conclusion, and Section 6 gives the proofs of the theorems.

2. The Proposed Survey Methods

2.1. Simmons Model

The Simmons model, which builds on Warner's randomized response technique, was put forward by Horvitz et al. [23]. The basic idea is to pair each respondent at random with one of two unrelated questions. The Simmons design consists of two unrelated questions, A and B, answered on a probability basis, where A is "Do you possess the sensitive characteristic?" and B is a nonsensitive question such as "Is your birthday number odd?" Questions A and B are presented to respondents with preset probabilities P and 1 − P, respectively. Simple random sampling with replacement (SRSWR) is assumed. Each selected respondent uses the randomization device to select question A or B and reports "yes" if his/her actual status matches the selected question and "no" otherwise; the interviewer does not learn which question was answered.
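As a rough illustration of why the design still identifies the sensitive proportion, the Python sketch below simulates the randomization device. Under the Simmons design a respondent answers "yes" with probability $\lambda = P\pi + (1-P)R$, where $R$ is the proportion of people whose truthful answer to question B is "yes". The function name `simulate_simmons`, the values of $\pi$ and $R$, and the seed are all hypothetical; only $P = 0.6$ echoes the survey in Section 4.

```python
import random

def simulate_simmons(n, p, pi_sensitive, r_nonsensitive, seed=0):
    """Simulate n Simmons-model answers and return the observed 'yes' rate.

    p: probability that the device selects the sensitive question A.
    pi_sensitive: true share with the sensitive attribute (hypothetical).
    r_nonsensitive: share whose truthful answer to question B is 'yes'.
    """
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        has_a = rng.random() < pi_sensitive    # private status on question A
        has_b = rng.random() < r_nonsensitive  # status on question B
        asked_a = rng.random() < p             # outcome of the device
        # Only the answer is recorded, never which question was drawn.
        yes += (has_a if asked_a else has_b)
    return yes / n

p, pi, r = 0.6, 0.15, 0.5                  # hypothetical design values
observed = simulate_simmons(200_000, p, pi, r)
expected = p * pi + (1 - p) * r            # lambda = P*pi + (1 - P)*R
print(f"observed {observed:.4f} vs theoretical {expected:.4f}")
```

Because the observed "yes" rate converges to $\lambda$, and $P$ and $R$ are known, $\pi$ can be recovered from it even though no individual answer reveals the respondent's status.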

2.2. Simmons Model in Cluster Rotation Sampling

In the following, sampling on two occasions is considered for estimating the population proportion with a sensitive characteristic on the second occasion when the rotation sampling units are clusters. The sampling steps for the Simmons model under partial cluster rotation are as follows.

Firstly, the population is divided into primary sampling units (clusters), and the units within the clusters are the secondary sampling units (persons).

Secondly, on the first occasion a random sample of n clusters is drawn with replacement from the population. Using the Simmons model, the people within the drawn clusters are asked to select question A or B and to report "yes" if his/her actual status matches the selected question and "no" otherwise.

Thirdly, on the second occasion m of the n clusters selected on the first occasion are retained at random, and the remaining u = n − m clusters are replaced by a fresh selection. All people within the n clusters sampled on the second occasion (m retained and u fresh) are surveyed using the Simmons model.
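The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the helper `rotate_clusters` and the population of 200 clusters are hypothetical, while the split n = 12, m = 8 mirrors the survey in Section 4.

```python
import random

def rotate_clusters(num_clusters, n, m, seed=0):
    """Two-occasion partial cluster rotation (hypothetical helper).

    Occasion 1: draw n cluster labels with replacement from 0..num_clusters-1.
    Occasion 2: keep m of those n draws at random and replace the remaining
    u = n - m draws by a fresh with-replacement selection.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    rng = random.Random(seed)
    first = rng.choices(range(num_clusters), k=n)
    kept = set(rng.sample(range(n), m))            # positions of retained draws
    retained = [first[i] for i in range(n) if i in kept]
    rotated = [first[i] for i in range(n) if i not in kept]
    fresh = rng.choices(range(num_clusters), k=n - m)
    second = retained + fresh                      # clusters surveyed on occasion 2
    return first, retained, rotated, fresh, second

first, retained, rotated, fresh, second = rotate_clusters(200, n=12, m=8)
print("occasion 1:", first)
print("retained:", retained, "fresh:", fresh)
```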

3. Derivation of Formulas

3.1. The Estimator of the Population Proportion on the Second Occasion and Its Variance

Consider a random sample of $n$ clusters drawn with replacement from a population of $N$ clusters, where the $i$th cluster contains $M_i$ units ($i = 1, 2, \ldots, N$).

On the second (current) occasion, $m$ of the $n$ clusters selected on the first occasion are retained at random, and the remaining $u = n - m$ clusters are replaced by a fresh selection. Let $a_{1,l}$ be the number of units with the sensitive characteristic in the $l$th retained cluster (containing $M_l$ units) on the first occasion ($l = 1, 2, \ldots, m$), and let $a_{1,m+k}$ be the corresponding number in the $k$th rotated cluster (containing $M_{m+k}$ units) on the first occasion ($k = 1, 2, \ldots, u$). Likewise, $a_{2,l}$ is the number of units with the sensitive characteristic in the $l$th retained cluster on the second (current) occasion ($l = 1, 2, \ldots, m$), and $a_{2,m+j}$ is the corresponding number in the $j$th fresh cluster (containing $M'_{m+j}$ units) on the second occasion ($j = 1, 2, \ldots, u$).

Similarly, $\pi_{1,l}$ and $\pi_{1,m+k}$ denote the proportions with the sensitive characteristic in the $l$th retained cluster and the $k$th rotated cluster on the first occasion, and $\pi_{2,l}$ and $\pi_{2,m+j}$ denote the proportions in the $l$th retained cluster and the $j$th fresh cluster on the second (current) occasion.

Assume that the variance on each occasion and the correlation coefficient between the first and second occasions are constant, and that the finite population correction is ignored.

Define the following:

  • $\pi_1$: the population proportion of the sensitive characteristic on the first occasion;

  • $\pi_2$: the population proportion of the sensitive characteristic on the second occasion;

  • $\pi_{1m}$: the proportion with the sensitive characteristic in the $m$ retained clusters on the first occasion;

  • $\pi_{2m}$: the proportion with the sensitive characteristic in the $m$ retained clusters on the second occasion;

  • $\pi_{1u}$: the proportion with the sensitive characteristic in the $u$ rotated clusters on the first occasion;

  • $\pi_{2u}$: the proportion with the sensitive characteristic in the $u$ fresh clusters on the second occasion.

The following estimators are based on the formulas and results given by Cochran [24].

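The estimator of $\pi_1$ is

$$\hat{\pi}_1 = \frac{\sum_{i=1}^{n} a_{1,i}}{\sum_{i=1}^{n} M_i}. \tag{1}$$

The estimator of $\pi_{1m}$ is

$$\hat{\pi}_{1m} = \frac{\sum_{i=1}^{m} a_{1,i}}{\sum_{i=1}^{m} M_i}. \tag{2}$$

The estimator of $\pi_{1u}$ is

$$\hat{\pi}_{1u} = \frac{\sum_{j=1}^{u} a_{1,m+j}}{\sum_{j=1}^{u} M_{m+j}}. \tag{3}$$

The estimator of $\pi_{2m}$ is

$$\hat{\pi}_{2m} = \frac{\sum_{j=1}^{m} a_{2,j}}{\sum_{j=1}^{m} M_j}. \tag{4}$$

The estimator of $\pi_{2u}$ is

$$\hat{\pi}_{2u} = \frac{\sum_{j=1}^{u} a_{2,m+j}}{\sum_{j=1}^{u} M'_{m+j}}. \tag{5}$$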
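Estimators (1) to (5) share one form: the total number of units with the sensitive characteristic divided by the total number of units over a set of clusters. A minimal Python sketch, with a hypothetical helper `ratio_proportion` and made-up counts, is:

```python
def ratio_proportion(counts, sizes):
    """Cluster ratio estimator used in (1)-(5): sum of a_i over sum of M_i."""
    if len(counts) != len(sizes) or not sizes:
        raise ValueError("counts and sizes must be non-empty and aligned")
    return sum(counts) / sum(sizes)

# Hypothetical first-occasion data for n = 3 clusters:
# the first m = 2 are retained, the last u = 1 is rotated out.
sizes = [45, 40, 50]     # M_i
a1 = [9, 8, 5]           # a_{1,i}: units with the sensitive attribute

pi1_hat = ratio_proportion(a1, sizes)           # (1)
pi1m_hat = ratio_proportion(a1[:2], sizes[:2])  # (2)
pi1u_hat = ratio_proportion(a1[2:], sizes[2:])  # (3)
print(pi1_hat, pi1m_hat, pi1u_hat)
```

Pooling counts rather than averaging the cluster proportions gives larger clusters more weight; the two coincide when all clusters have the same size.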

Consider a general linear estimator $\hat{\pi}_2$ of the population proportion of the sensitive characteristic on the second (current) occasion:

$$\hat{\pi}_2 = a\hat{\pi}_{1u} + b\hat{\pi}_{1m} + c\hat{\pi}_{2u} + d\hat{\pi}_{2m}, \tag{6}$$

where a, b, c, and d are suitable constants.

We have

$$E(\hat{\pi}_2) = (a+b)\pi_1 + (c+d)\pi_2. \tag{7}$$

For $\hat{\pi}_2$ to be an unbiased estimator of $\pi_2$, we require $a + b = 0$ and $c + d = 1$.

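Hence, estimator (6) takes the form

$$\hat{\pi}_2 = a\left(\hat{\pi}_{1u} - \hat{\pi}_{1m}\right) + c\hat{\pi}_{2u} + (1-c)\hat{\pi}_{2m}. \tag{8}$$

The variance of $\hat{\pi}_2$ is

$$\begin{aligned}
\operatorname{Var}(\hat{\pi}_2) &= a^2\operatorname{Var}(\hat{\pi}_{1u}) + a^2\operatorname{Var}(\hat{\pi}_{1m}) + c^2\operatorname{Var}(\hat{\pi}_{2u}) + (1-c)^2\operatorname{Var}(\hat{\pi}_{2m}) - 2a(1-c)\operatorname{cov}(\hat{\pi}_{1m}, \hat{\pi}_{2m}) \\
&= \frac{a^2 S_1^2}{u} + \frac{a^2 S_1^2}{m} + \frac{c^2 S_2^2}{u} + \frac{(1-c)^2 S_2^2}{m} - \frac{2a(1-c)\rho_b S_1 S_2}{m} \\
&= a^2\left(\frac{1}{u} + \frac{1}{m}\right)S_1^2 + \frac{c^2 S_2^2}{u} + \frac{(1-c)^2 S_2^2}{m} - \frac{2a(1-c)\rho_b S_1 S_2}{m}.
\end{aligned} \tag{9}$$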

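All other covariance terms are zero, because the rotated, retained, and fresh clusters come from independent with-replacement draws; only the retained clusters are observed on both occasions.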

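Minimizing the variance of $\hat{\pi}_2$ with respect to $a$ and $c$ when $N$ is sufficiently large gives

$$\frac{\partial \operatorname{Var}(\hat{\pi}_2)}{\partial a} = 2a\left(\frac{1}{u} + \frac{1}{m}\right)S_1^2 - \frac{2(1-c)\rho_b S_1 S_2}{m} = 0, \qquad \frac{\partial \operatorname{Var}(\hat{\pi}_2)}{\partial c} = \frac{2c S_2^2}{u} - \frac{2(1-c) S_2^2}{m} + \frac{2a\rho_b S_1 S_2}{m} = 0. \tag{10}$$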

Then we get

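$$a\left(\frac{1}{u} + \frac{1}{m}\right)S_1^2 = \frac{(1-c)\rho_b S_1 S_2}{m}, \qquad \frac{a\rho_b S_1 S_2}{m} = -\frac{c S_2^2}{u} + \frac{(1-c) S_2^2}{m}. \tag{11}$$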

We derive

$$\frac{1/u + 1/m}{\rho_b/m} = \frac{(1-c)\rho_b/m}{-c/u + (1-c)/m}. \tag{12}$$

One has

$$\frac{(u+m)/(um)}{\rho_b/m} = \frac{(1-c)\rho_b/m}{\left(u - c(u+m)\right)/(um)}, \tag{13}$$

for u + m = n.

We have

$$\frac{n}{u\rho_b} = \frac{(1-c)\rho_b}{(u - cn)/u}. \tag{14}$$

Hence,

$$c = \frac{u/n - (u^2/n^2)\rho_b^2}{1 - (u^2/n^2)\rho_b^2}. \tag{15}$$

Define μ = u/n.

We get

$$c = \frac{\mu - \mu^2\rho_b^2}{1 - \mu^2\rho_b^2}. \tag{16}$$

From the first equation in (11), we derive

$$a = \frac{(1-c)\rho_b S_2}{(m/u + 1)S_1}, \tag{17}$$

for u + m = n and μ = u/n.

One has

$$a = \frac{(1-c)\mu\rho_b S_2}{S_1}. \tag{18}$$

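By (16) and (18), we get

$$a = \frac{S_2}{S_1}\cdot\frac{\mu(1-\mu)\rho_b}{1 - \mu^2\rho_b^2}, \tag{19}$$

where

$$S_h^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\pi_{h,i} - \pi_h\right)^2,\ h = 1, 2; \qquad \mu = \frac{u}{n}; \qquad \rho_b = \frac{\sum_{i=1}^{N}(\pi_{2,i} - \pi_2)(\pi_{1,i} - \pi_1)}{\sqrt{\sum_{i=1}^{N}(\pi_{2,i} - \pi_2)^2 \sum_{i=1}^{N}(\pi_{1,i} - \pi_1)^2}}. \tag{20}$$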
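The closed-form weights can be checked numerically. The sketch below (hypothetical function names and input values) evaluates variance (9) at $a$ from (19) and $c$ from (16), compares the result with the closed form (22) stated in Theorem 1, and confirms that nudging either weight increases the variance, as expected at the minimum of a convex quadratic.

```python
def var_pi2(a, c, mu, n, s1, s2, rho):
    """Variance (9) of the linear estimator, with u = mu*n and m = (1 - mu)*n."""
    u, m = mu * n, (1 - mu) * n
    return (a**2 * (1 / u + 1 / m) * s1**2
            + c**2 * s2**2 / u
            + (1 - c)**2 * s2**2 / m
            - 2 * a * (1 - c) * rho * s1 * s2 / m)

def optimal_coefficients(mu, s1, s2, rho):
    """Closed-form minimizers: c from (16) and a from (19)."""
    denom = 1 - mu**2 * rho**2
    c = (mu - mu**2 * rho**2) / denom
    a = (s2 / s1) * mu * (1 - mu) * rho / denom
    return a, c

# Hypothetical inputs: 40% fresh clusters, n = 12, S1 = 0.08, S2 = 0.07, rho_b = 0.8.
mu, n, s1, s2, rho = 0.4, 12, 0.08, 0.07, 0.8
a, c = optimal_coefficients(mu, s1, s2, rho)
v_opt = var_pi2(a, c, mu, n, s1, s2, rho)
v_22 = (1 - mu * rho**2) / (1 - mu**2 * rho**2) * s2**2 / n   # closed form (22)
# Nudging either weight away from the optimum should only increase (9).
nudged = [var_pi2(a + da, c + dc, mu, n, s1, s2, rho)
          for da, dc in [(0.01, 0), (-0.01, 0), (0, 0.01), (0, -0.01)]]
print(f"a={a:.4f} c={c:.4f} var={v_opt:.3e} closed form={v_22:.3e}")
```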

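Theorem 1.

Under the Simmons model with partial cluster rotation, one has

$$\hat{\pi}_2 = \frac{S_2}{S_1}\cdot\frac{\mu(1-\mu)\rho_b}{1 - \mu^2\rho_b^2}\left(\hat{\pi}_{1u} - \hat{\pi}_{1m}\right) + \frac{\mu - \mu^2\rho_b^2}{1 - \mu^2\rho_b^2}\,\hat{\pi}_{2u} + \frac{1-\mu}{1 - \mu^2\rho_b^2}\,\hat{\pi}_{2m}, \tag{21}$$

and the variance of $\hat{\pi}_2$ is

$$\operatorname{Var}_{\min}(\hat{\pi}_2) = \frac{1 - \mu\rho_b^2}{1 - \mu^2\rho_b^2}\cdot\frac{S_2^2}{n}. \tag{22}$$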

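Remark 2.

In practice, $\rho_b$ and $S_h^2$ are unknown. They can be estimated from the $m$ retained clusters, which are observed on both occasions. The estimator of $\rho_b$ is

$$\hat{\rho}_b = \frac{\sum_{k=1}^{m}(\pi_{2,k} - \hat{\pi}_{2m})(\pi_{1,k} - \hat{\pi}_{1m})}{\sqrt{\sum_{k=1}^{m}(\pi_{2,k} - \hat{\pi}_{2m})^2 \sum_{k=1}^{m}(\pi_{1,k} - \hat{\pi}_{1m})^2}}, \tag{23}$$

and the estimator of $S_h^2$ is

$$s_h^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(\pi_{h,i} - \hat{\pi}_{hm}\right)^2, \quad h = 1, 2. \tag{24}$$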
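A minimal Python sketch of (23) and (24), assuming the cluster-level estimates $\pi_{h,k}$ for the retained clusters are already available (for example from (31)). The function name `retained_stats` and the toy data are hypothetical; when cluster sizes are supplied, the centring values are the ratio estimators (2) and (4) rewritten with $a_k = M_k\pi_k$.

```python
import math

def retained_stats(pi1, pi2, sizes=None):
    """Estimates (23) and (24) from the m retained clusters.

    pi1[k], pi2[k]: estimated proportions for retained cluster k on
    occasions 1 and 2. sizes: optional cluster sizes M_k used for the
    centring values (2) and (4); equal sizes are assumed if omitted.
    Returns (rho_hat, s1_sq, s2_sq).
    """
    m = len(pi1)
    if m != len(pi2) or m < 2:
        raise ValueError("need at least two aligned retained clusters")
    sizes = sizes or [1] * m
    total = sum(sizes)
    # With a_k = M_k * pi_k, (2) and (4) become size-weighted means.
    c1 = sum(w * p for w, p in zip(sizes, pi1)) / total
    c2 = sum(w * p for w, p in zip(sizes, pi2)) / total
    d1 = [p - c1 for p in pi1]
    d2 = [p - c2 for p in pi2]
    sxx = sum(x * x for x in d1)
    syy = sum(y * y for y in d2)
    sxy = sum(x * y for x, y in zip(d1, d2))
    rho_hat = sxy / math.sqrt(sxx * syy)
    return rho_hat, sxx / (m - 1), syy / (m - 1)

# Toy data in which occasion 2 is occasion 1 shifted by 0.05.
rho_hat, s1_sq, s2_sq = retained_stats([0.10, 0.20, 0.30], [0.15, 0.25, 0.35])
print(rho_hat, s1_sq, s2_sq)
```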

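Theorem 3.

Under the Simmons model with partial cluster rotation, the optimum rotation rate is

$$\mu = \frac{1}{1 + \sqrt{1 - \rho_b^2}}, \tag{25}$$

and the optimum variance of $\hat{\pi}_2$ is

$$\operatorname{Var}_{\min}^{(\mathrm{opt})}(\hat{\pi}_2) = \frac{\left(1 + \sqrt{1 - \rho_b^2}\right)S_2^2}{2n}. \tag{26}$$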
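As a numerical sanity check of Theorem 3 under hypothetical inputs ($\rho_b = 0.8$, $S_2^2 = 0.005$, $n = 12$), the sketch below evaluates (22) at the rotation rate from (25), compares it with (26), and confirms that a grid search over $\mu$ does not do better. It also evaluates the weight $c$ from (16) at this $\mu$: substituting (25) into (16) shows that $c = 1/2$ at the optimum rotation rate (for $|\rho_b| < 1$), so the fresh and retained second-occasion estimates receive equal weight. Function names are illustrative.

```python
import math

def optimal_rotation(rho):
    """Optimum fraction of fresh clusters mu = u/n, formula (25)."""
    return 1 / (1 + math.sqrt(1 - rho**2))

def var_min(mu, rho, s2_sq, n):
    """Minimum variance (22) for a given rotation fraction mu."""
    return (1 - mu * rho**2) / (1 - mu**2 * rho**2) * s2_sq / n

rho, s2_sq, n = 0.8, 0.005, 12          # hypothetical inputs
mu_opt = optimal_rotation(rho)          # 1 / (1 + 0.6) = 0.625
v_opt = var_min(mu_opt, rho, s2_sq, n)
v_26 = (1 + math.sqrt(1 - rho**2)) * s2_sq / (2 * n)   # closed form (26)
# Coarse grid over mu in (0, 1): should not beat the closed-form optimum.
grid_best = min(var_min(k / 1000, rho, s2_sq, n) for k in range(1, 1000))
c_opt = (mu_opt - mu_opt**2 * rho**2) / (1 - mu_opt**2 * rho**2)  # weight (16)
print(mu_opt, v_opt, v_26, grid_best, c_opt)
```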

In practice, the cost of a sample survey is often represented by the following simple function (Cochran [24]):

$$C_T = c_0 + c_1 m + c_2 u, \tag{27}$$

where $C_T$ is the total cost of sampling, $c_0$ is the fixed overhead cost of the survey, $c_1$ is the average cost of surveying one retained cluster on the second occasion, and $c_2$ is the average cost of surveying one fresh cluster on the second occasion.

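Theorem 4.

For a given total survey cost $C_T$, one has

$$\operatorname{Var}^{(\mathrm{opt})}(\hat{\pi}_2) = \frac{\left(c_1\sqrt{1 - \rho_b^2} + c_2\right)S_2^2}{2\left(C_T - c_0\right)}, \tag{28}$$

and, with $\bar{M}$ denoting the average number of units per cluster, the required sample size in units (persons) under partial cluster rotation is

$$\bar{M}n = \frac{\bar{M}\left(C_T - c_0\right)\left(1 + \sqrt{1 - \rho_b^2}\right)}{c_1\sqrt{1 - \rho_b^2} + c_2}, \tag{29}$$

where

$$S_2^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\pi_{2,i} - \pi_2\right)^2. \tag{30}$$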
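For a fixed budget, the number of clusters and the attainable variance can be evaluated directly from the cost function (27) and the optimum rate (25). In the hypothetical example below (budget 10 000, overhead 2 000, 300 per retained cluster, 500 per fresh cluster, $\rho_b = 0.8$, $S_2^2 = 0.005$), the sketch also splits $n$ into $m$ and $u$ and checks that the budget is exactly spent. The function name is illustrative.

```python
import math

def cost_optimal_design(total_cost, c0, c1, c2, rho, s2_sq):
    """Number of clusters n and variance (28) for a fixed budget C_T.

    c0: fixed overhead cost; c1, c2: cost per retained / fresh cluster.
    Returns (n, m, u, var); n is real-valued and would be rounded in practice.
    """
    s = math.sqrt(1 - rho**2)
    budget = total_cost - c0
    if budget <= 0:
        raise ValueError("total cost must exceed the overhead c0")
    n = budget * (1 + s) / (c1 * s + c2)         # clusters spending the budget
    mu = 1 / (1 + s)                             # optimum fresh fraction (25)
    m, u = n * (1 - mu), n * mu                  # retained and fresh clusters
    var = (c1 * s + c2) * s2_sq / (2 * budget)   # (28)
    return n, m, u, var

# Hypothetical budget and unit costs
n, m, u, var = cost_optimal_design(10_000, 2_000, 300, 500, rho=0.8, s2_sq=0.005)
print(f"n={n:.2f} clusters (m={m:.2f}, u={u:.2f}), var={var:.3e}")
```

In practice $n$ would be rounded down to an integer; multiplying it by the average cluster size $\bar{M}$ gives the number of persons to survey.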

3.2. The Estimator of $\pi_{h,i}$

Let $R_{h,i}$ be the proportion of the selected $i$th cluster (containing $M_i$ units) with the nonsensitive characteristic on the $h$th occasion, and let $m_{h,i}$ and $\lambda_{h,i}$ denote the number and the proportion of "yes" answers in the $i$th cluster, respectively, where $\hat{\lambda}_{h,i} = m_{h,i}/M_i$ ($h = 1, 2$; $i = 1, 2, \ldots, n$).

By the law of total probability (see [25]), $\lambda_{h,i} = P\pi_{h,i} + (1-P)R_{h,i}$, so

$$\pi_{h,i} = \frac{\lambda_{h,i} - (1-P)R_{h,i}}{P}, \quad h = 1, 2,\ i = 1, 2, \ldots, n. \tag{31}$$

Hence

$$a_{h,i} = M_i\,\pi_{h,i}, \quad h = 1, 2,\ i = 1, 2, \ldots, n. \tag{32}$$
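Equations (31) and (32) translate directly into code. The sketch below is illustrative: the function name is hypothetical, and $R = 0.5$ is an assumed value for the proportion answering "yes" to the nonsensitive question.

```python
def simmons_cluster_estimate(yes_count, cluster_size, p, r):
    """Estimate pi_{h,i} by (31) and a_{h,i} by (32) for one cluster.

    yes_count: number of 'yes' answers m_{h,i}; cluster_size: M_i;
    p: probability P of drawing the sensitive question; r: R_{h,i}.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    lam_hat = yes_count / cluster_size        # lambda_hat = m_{h,i} / M_i
    pi_hat = (lam_hat - (1 - p) * r) / p      # invert lambda = P*pi + (1 - P)*R
    return pi_hat, cluster_size * pi_hat      # a_{h,i} = M_i * pi_{h,i}

# Hypothetical class: 15 'yes' answers out of 45, P = 0.6, R assumed to be 0.5.
pi_hat, a_hat = simmons_cluster_estimate(15, 45, p=0.6, r=0.5)
print(f"pi_hat={pi_hat:.4f}, a_hat={a_hat:.2f}")   # pi_hat=0.2222, a_hat=10.00
```

With small clusters the estimate can fall outside $[0, 1]$ by chance; this sketch returns it untruncated.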

4. Applications

4.1. Survey Design

The survey concerns premarital sexual behavior among students on the Dushu Lake Campus of Soochow University. Each class is regarded as a cluster, with 45 students per class on average. On the first occasion (2011), 12 classes were drawn at random from all classes, and all students in the 12 selected classes were surveyed using the Simmons model for sensitive questions. On the second occasion (2013), 8 of the 12 classes selected on the first occasion were retained at random and the remaining 4 classes were replaced by a fresh selection. All students in the 8 retained and 4 fresh classes were then surveyed using the Simmons model.

In our design, each person drew a ball at random, with replacement, from a bag containing 6 red and 4 white balls, so the probability of drawing a red ball was P = 0.6. A respondent who drew a red ball answered the sensitive question A: "Are you a member of the group having premarital sexual behavior?" A respondent who drew a white ball answered the nonsensitive question B: "Is your student number odd?" The respondent reported "yes" if his/her actual status matched the selected question and "no" otherwise.

All questionnaires from both occasions were checked to ensure that they had been completed independently and that no questions were omitted. The response rate was 100%, with no invalid questionnaires. All data were processed and analyzed with Excel 2003 and SAS 9.13.

4.2. Results

4.2.1. Result of the Survey

According to (31), we obtain the estimated proportion $\pi_{h,i}$ ($h = 1, 2$) of undergraduate students with premarital sexual behavior in each class, as shown in Table 1.

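Table 1.

Estimated proportion of students with premarital sexual behavior in each class on the first and second occasions.

| Class number | π1,i (first occasion) | π2,i (second occasion) |
|---|---|---|
| 1 | 0.2624 | 0.2348 |
| 2 | 0.1631 | 0.1945 |
| 3 | 0.2101 | 0.2264 |
| 4 | 0.2063 | - |
| 5 | 0.1556 | 0.1986 |
| 6 | 0.2390 | - |
| 7 | 0.1783 | - |
| 8 | 0.1970 | 0.1550 |
| 9 | 0.0123 | 0.0114 |
| 10 | 0.0476 | 0.0738 |
| 11 | 0.0455 | - |
| 12 | 0.1185 | 0.1187 |
| 13 | - | 0.2035 |
| 14 | - | 0.1587 |
| 15 | - | 0.1926 |
| 16 | - | 0.1583 |

"-": class not surveyed on that occasion. Classes 4, 6, 7, and 11 were rotated out after the first occasion; classes 13 to 16 are the fresh classes of the second occasion.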

4.2.2. The Estimator of the Population Proportion on the Second Occasion and Its Variance

By (1), the estimated population proportion with premarital sexual behavior on the first occasion is $\hat{\pi}_1 = 0.142933$.

According to (24), (2), and (3), we have, respectively,

$$s_1^2 = 0.005853, \qquad \hat{\pi}_{1m} = 0.1458, \qquad \hat{\pi}_{1u} = 0.1372. \tag{33}$$

From the second-occasion survey of premarital sexual behavior among students on the Dushu Lake Campus of Soochow University and formulas (4) and (5),

$$\hat{\pi}_{2m} = 0.1562, \qquad \hat{\pi}_{2u} = 0.1783. \tag{34}$$

By (23) and (24), we obtain $s_2^2 = 0.004262$ and $\hat{\rho}_b = 0.936$, respectively.

From formula (25), μ = 0.7985; then, according to formula (21), we get

$$\hat{\pi}_2 = 0.2912\,(0.1372 - 0.1458) + 0.5453 \times 0.1783 + (1 - 0.5453) \times 0.1562 = 0.1657. \tag{35}$$

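Using (22), we get $\operatorname{Var}_{\min}^{(\mathrm{opt})}(\hat{\pi}_2) = 0.00024$. Hence, the standard error is

$$\sqrt{\operatorname{Var}_{\min}^{(\mathrm{opt})}(\hat{\pi}_2)} = 0.01549. \tag{36}$$

So an approximate 95% confidence interval for the population proportion of students with premarital sexual behavior, $\hat{\pi}_2 \pm 1.96 \times 0.01549$, is [0.1353, 0.1961].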
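The arithmetic in (35) and (36) can be reproduced from the reported inputs. The sketch below takes $s_1^2$, $s_2^2$, $\hat{\rho}_b$, μ = 0.7985, n = 12, and the four component estimates as given, and evaluates the weights, estimator (21), variance (22), and a normal-approximation 95% interval. Computed from (16), the weight $c$ comes out near 0.5435 (the value printed in (35) is 0.5453), and the interval endpoints can differ in the fourth decimal because (36) rounds the variance to 0.00024 first; $\hat{\pi}_2$ still rounds to 0.1657.

```python
import math

# Inputs as reported in Section 4.2.2, taken as given.
pi1u, pi1m, pi2u, pi2m = 0.1372, 0.1458, 0.1783, 0.1562
s1_sq, s2_sq, rho, mu, n = 0.005853, 0.004262, 0.936, 0.7985, 12

denom = 1 - mu**2 * rho**2
a = math.sqrt(s2_sq / s1_sq) * mu * (1 - mu) * rho / denom  # first weight in (21)
c = (mu - mu**2 * rho**2) / denom                           # weight on pi2u in (21)
pi2_hat = a * (pi1u - pi1m) + c * pi2u + (1 - c) * pi2m      # estimator (21)
var = (1 - mu * rho**2) / denom * s2_sq / n                  # variance (22)
se = math.sqrt(var)
lower, upper = pi2_hat - 1.96 * se, pi2_hat + 1.96 * se      # normal-approximation 95% CI
print(f"a={a:.4f} c={c:.4f} pi2={pi2_hat:.4f} se={se:.5f} CI=[{lower:.4f}, {upper:.4f}]")
```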

5. Discussion and Conclusion

To sum up, in this study we proposed a new sampling method for sensitive-question surveys that are repeated over time; this is the authors' first attempt in this direction. We then derived the estimator of the population proportion with the sensitive characteristic and its variance under the proposed method. In addition, we gave formulas for the optimal rotation rate and for the sample size under a given survey cost.

The method and formulas were applied in a survey of premarital sexual behavior on the Dushu Lake Campus of Soochow University. In short, the proposed sampling method and formulas are of theoretical and practical value for continuing surveys of sensitive questions.

6. Proofs of Theorems

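Proof of Theorem 1.

Substituting the optimum values of $a$ and $c$ given by (16) and (19) into (8), estimator $\hat{\pi}_2$ reduces to (21).

By (9), (16), and (19), we have

$$\operatorname{Var}_{\min}(\hat{\pi}_2) = \frac{1 - \mu\rho_b^2}{1 - \mu^2\rho_b^2}\cdot\frac{S_2^2}{n}. \tag{37}$$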

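Proof of Theorem 3.

The optimum value of $\mu$ is obtained by further minimizing (22) with respect to $\mu$:

$$\frac{\partial \operatorname{Var}_{\min}(\hat{\pi}_2)}{\partial \mu} = 0. \tag{38}$$

This reduces to the quadratic $\rho_b^2\mu^2 - 2\mu + 1 = 0$, whose root in $(0, 1]$ is

$$\mu = \frac{1}{1 + \sqrt{1 - \rho_b^2}}. \tag{39}$$

Substituting (39) into (22), we obtain the optimum variance of $\hat{\pi}_2$:

$$\operatorname{Var}_{\min}^{(\mathrm{opt})}(\hat{\pi}_2) = \frac{\left(1 + \sqrt{1 - \rho_b^2}\right)S_2^2}{2n}. \tag{40}$$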

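Proof of Theorem 4.

By Theorem 3,

$$\mu = \frac{1}{1 + \sqrt{1 - \rho_b^2}}, \qquad \mu = \frac{u}{n}, \qquad u = n - m. \tag{41}$$

Substituting (41) into (27), we obtain

$$n = \frac{\left(C_T - c_0\right)\left(1 + \sqrt{1 - \rho_b^2}\right)}{c_1\sqrt{1 - \rho_b^2} + c_2}. \tag{42}$$

Suppose the average cluster consists of $\bar{M}$ units; then the corresponding number of units (persons) to be surveyed is

$$\bar{M}n = \frac{\bar{M}\left(C_T - c_0\right)\left(1 + \sqrt{1 - \rho_b^2}\right)}{c_1\sqrt{1 - \rho_b^2} + c_2}. \tag{43}$$

Substituting (42) into (26), we have

$$\operatorname{Var}^{(\mathrm{opt})}(\hat{\pi}_2) = \frac{\left(c_1\sqrt{1 - \rho_b^2} + c_2\right)S_2^2}{2\left(C_T - c_0\right)}. \tag{44}$$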

Acknowledgments

The authors thank the referees for carefully reading the paper and for comments that greatly improved it. This paper is supported by a grant from the National Natural Science Foundation of China (no. 81273188 to G. Gao). The authors are grateful to G. Gao (corresponding author) for his invaluable help.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

  • 1. Warner S. L. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association. 1965;60(309):63–69. doi: 10.1080/01621459.1965.10480775.
  • 2. Christofides T. C. A generalized randomized response technique. Metrika. 2003;57(2):195–200. doi: 10.1007/s001840200216.
  • 3. Christofides T. C. Randomized response in stratified sampling. Journal of Statistical Planning and Inference. 2005;128(1):303–310. doi: 10.1016/j.jspi.2003.11.001.
  • 4. Singh G. N. On the use of chain-type ratio to difference estimator in successive sampling. International Journal of Applied Mathematics and Statistics. 2006;6:41–49.
  • 5. Kim J.-M., Elam M. E. A stratified unrelated question randomized response model. Statistical Papers. 2007;48(2):215–233. doi: 10.1007/s00362-006-0327-6.
  • 6. Huang K.-C. Estimation for sensitive characteristics using optional randomized response technique. Quality and Quantity. 2008;42(5):679–686. doi: 10.1007/s11135-007-9082-6.
  • 7. Huang K.-C. Unbiased estimators of mean, variance and sensitivity level for quantitative characteristics in finite population sampling. Metrika. 2010;71(3):341–352. doi: 10.1007/s00184-009-0234-7.
  • 8. Singh S., Sedory S. A. A true simulation study of three estimators at equal protection of respondents in randomized response sampling. Statistica Neerlandica. 2012;66(4):442–451. doi: 10.1111/j.1467-9574.2012.00524.x.
  • 9. Chang H.-J., Kuo M.-P. Estimation of population proportion in randomized response sampling using weighted confidence interval construction. Metrika. 2012;75(5):655–672. doi: 10.1007/s00184-011-0346-8.
  • 10. Arnab R., Singh S., North D. Use of two decks of cards in randomized response techniques for complex survey designs. Communications in Statistics. Theory and Methods. 2012;41(16-17):3198–3210. doi: 10.1080/03610926.2012.682634.
  • 11. Jessen R. J. Statistical investigation of a survey for obtaining farm facts. Iowa Agricultural Experiment Station Research Bulletin. 1942;304:1–104.
  • 12. Yates F. Sampling Methods for Censuses and Surveys. London, UK: Charles Griffin; 1949.
  • 13. Narain R. D. On the recurrence formula in sampling on successive occasions. Journal of the Indian Society of Agricultural Statistics. 1953;5:96–99.
  • 14. Raj D. On sampling over two occasions with probability proportionate to size. Annals of Mathematical Statistics. 1965;36:327–330. doi: 10.1214/aoms/1177700297.
  • 15. Singh D. Estimates in successive sampling using a multi-stage design. Journal of the American Statistical Association. 1968;63:99–112.
  • 16. Ghangurde P. D., Rao J. N. Some results on sampling over two occasions. Sankhya. 1969;31:463–472.
  • 17. Okafor F. C. The theory and application of sampling over two occasions for the estimation of current population ratio. Statistica. 1992;42:137–147.
  • 18. Arnab R., Okafor F. C. A note on double sampling over two occasions. Pakistan Journal of Statistics. 1992;8(3):9–18.
  • 19. Biradar R. S., Singh H. P. Successive sampling using auxiliary information on both the occasions. Calcutta Statistical Association Bulletin. 2001;51(203-204):243–251.
  • 20. Singh G. N., Singh V. K. On the use of auxiliary information in successive sampling. Journal of the Indian Society of Agricultural Statistics. 2001;54(1):1–12.
  • 21. Artes R., Eva M., Garcia L., Amelia V. Estimation of current population ratio in successive sampling. Journal of the Indian Society of Agricultural Statistics. 2001;54(3):342–354.
  • 22. Singh H. P., Tailor R., Singh S., Kim J.-M. Estimation of population variance in successive sampling. Quality & Quantity. 2011;45(3):477–494. doi: 10.1007/s11135-009-9309-9.
  • 23. Horvitz D. G., Shah B. V., Simmons W. R. The unrelated question randomized response model. Proceedings of the Social Statistics Section: American Statistical Association. 1967;326:65–72.
  • 24. Cochran W. G. Sampling Techniques. 3rd ed. New York, NY, USA: John Wiley & Sons; 1977.
  • 25. Du Z. Sampling Techniques and Its Application. 1st ed. Beijing, China: Tsinghua University Press; 2005.
