Heliyon. 2024 Mar 6;10(5):e27488. doi: 10.1016/j.heliyon.2024.e27488

An efficient estimator of population variance of a sensitive variable with a new randomized response technique

Muhammad Azeem, Najma Salahuddin, Sundus Hussain, Musarrat Ijaz, Abdul Salam
PMCID: PMC10943444  PMID: 38495208

Abstract

In sampling theory, most of the available estimators of population variance are designed for use with non-sensitive variables only. Such estimators cannot perform efficiently when the variable of interest is of a sensitive nature, such as drug use, illegal income, abortion, cheating in examinations, the amount of income tax payable, or the violation of rules by employees. The shortage of research on variance estimators for sensitive variables has left a research gap and room for improvement in the efficiency of such estimators. In this paper, a new randomized scrambling technique is proposed, along with a new estimator of population variance. The new estimator achieves improvement in efficiency over the available variance estimators. The proposed estimator is designed for use with simple random sampling and uses the information on an auxiliary variable. The improvement in efficiency is shown for different choices of constants. Besides efficiency, improvement in the unified measure of estimator quality is also achieved with the proposed estimator under the new randomized response model.

Keywords: Auxiliary variable, Efficiency, Mean square error, Randomized response technique, Variance estimator

1. Introduction

Survey researchers are often faced with refusals and false responses in sample surveys on sensitive topics. Respondents often fail to provide truthful information on questions related to sensitive issues such as abortion, drug use, illegal income, and cheating in examinations. Introduced in 1965, randomized response techniques can be a good alternative to the direct-questioning method when collecting information on sensitive variables. Warner's [1] randomized response technique and its modified variants are designed to obtain sensitive information from the respondents while protecting their privacy. Warner [2] introduced the use of a scrambling variable for the purpose of protecting the respondents' privacy. A drawback of Warner's [2] technique was that it forced every respondent to scramble his/her response even if the respondent perceived the question as non-sensitive. To alleviate this problem, Gupta et al. [3] introduced what is called the optional randomized response model. With optional randomization techniques, the respondents have the option to either provide the true response or report a scrambled response. The respondent's decision to report the true or a scrambled response depends on whether he/she perceives the question being asked as sensitive or not. Unlike the usual additive or multiplicative scrambling methods, a recent study by Azeem [4] introduced the idea of an exponential scrambling technique.

In sampling theory, Cochran [5] laid the foundation of using auxiliary information in estimating population parameters by introducing ratio-type estimators. Later on, Das and Tripathi [6] suggested using auxiliary information in the estimation of the variance of a finite population. Isaki [7] introduced a ratio estimator for the variance based on an auxiliary variable. Singh et al. [8] analyzed calibration estimators for the population variance. Kadilar and Cingi [9] used an auxiliary variable to develop improved estimators of the population variance. The new estimators achieved a significant improvement in efficiency over previous estimators. Subramani and Kumarapandiyan [10] suggested a variance estimator based on the median of the auxiliary variable.

Gupta et al. [11] developed a generalization of the estimators of the variance under a linear scrambling model. The findings of the study suggested that the new estimator was more precise than Isaki's [7] estimator under the linear response model. Recently, Saleem et al. [12] developed a new randomized response model and presented a novel estimator of the variance based on two auxiliary variables. The new estimator gained a significant boost in efficiency over the available variance estimators.

Zaman and Bulut [13] utilized the mathematical function of the auxiliary variable parameters to develop new variance estimators. In another study, Zaman and Bulut [14] suggested new estimators of variance by using the minimum covariance determinant. Another recent study of Zaman et al. [15] suggests a randomized technique for improving the efficiency of estimates using group discussion.

Using a new scrambling model, this paper presents an efficient variance estimator based on a non-sensitive auxiliary variable. The suggested estimator is found to achieve a substantial improvement in efficiency over the Gupta et al. [11] estimator and the Isaki [7] estimator under the proposed randomized response model.

This paper is outlined as follows:

Section 2 presents the mathematical notations and assumptions which have been used in the subsequent sections. Section 3 presents the mathematical equations of some of the available estimators from the literature. Section 4 presents the proposed variance estimator and its algebraic properties under the proposed randomized response model. Section 5 provides the performance evaluation measures for the proposed variance estimator. In Section 6, the performance of the suggested estimator is compared with the existing estimators and the results have been presented in tables. Section 7 presents a detailed discussion related to the findings of the analysis and the conclusion of the study.

2. Notations

Suppose the population of interest contains $N$ units $U_1, U_2, \ldots, U_N$, and consider a simple random sample of $n$ units drawn from this population. Let the sensitive variable under consideration be denoted by $Y$, with $X$ denoting an auxiliary variable that is positively correlated with $Y$. Further, $(x_i, y_i)$ denotes the value of $(X, Y)$ corresponding to the $i$th unit of the population. The mathematical expressions for the different parameters and their estimators are provided in Table 1.

Table 1.

Notations for sample and population data.

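| Notation | Meaning |
|---|---|
| $\bar{X}=\frac{1}{N}\sum_{i=1}^{N}x_i$ | Population mean of $X$ |
| $\bar{Y}=\frac{1}{N}\sum_{i=1}^{N}y_i$ | Population mean of $Y$ |
| $\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$ | Sample mean of $X$ |
| $\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$ | Sample mean of $Y$ |
| $S_x^2=\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{X})^2$ | Population variance of $X$ |
| $S_y^2=\frac{1}{N-1}\sum_{i=1}^{N}(y_i-\bar{Y})^2$ | Population variance of $Y$ |
| $s_x^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ | Sample variance of $X$ |
| $s_y^2=\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$ | Sample variance of $Y$ |
| $\lambda_{rs}=\dfrac{\mu_{rs}}{\mu_{20}^{r/2}\,\mu_{02}^{s/2}}$ | Moment ratio, where $r$ and $s$ denote non-negative integers |
| $\mu_{rs}=\frac{1}{N-1}\sum_{i=1}^{N}(y_i-\bar{Y})^r(x_i-\bar{X})^s$ | Cross-product moment, where $r$ and $s$ denote non-negative integers |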
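The moment quantities in Table 1 can be computed directly from population data. The following is a minimal sketch using only the Python standard library; the function names `cross_moment` and `moment_ratio` and the five-unit toy population are illustrative assumptions, not part of the original study.

```python
def cross_moment(y, x, r, s):
    """mu_rs = 1/(N-1) * sum((y_i - Ybar)^r * (x_i - Xbar)^s), as in Table 1."""
    size = len(y)
    y_bar = sum(y) / size
    x_bar = sum(x) / size
    total = sum((yi - y_bar) ** r * (xi - x_bar) ** s for yi, xi in zip(y, x))
    return total / (size - 1)


def moment_ratio(y, x, r, s):
    """lambda_rs = mu_rs / (mu_20^(r/2) * mu_02^(s/2)), as in Table 1."""
    mu_20 = cross_moment(y, x, 2, 0)
    mu_02 = cross_moment(y, x, 0, 2)
    return cross_moment(y, x, r, s) / (mu_20 ** (r / 2) * mu_02 ** (s / 2))


# Hypothetical toy population in which X is an exact multiple of Y.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.0, 4.0, 6.0, 8.0, 10.0]

mu_20 = cross_moment(y, x, 2, 0)   # population variance of Y
lam_20 = moment_ratio(y, x, 2, 0)  # equals 1 by construction
lam_11 = moment_ratio(y, x, 1, 1)  # correlation between Y and X
```

In this toy population `lam_11` equals 1 because the two variables are perfectly correlated; the ratios $\lambda_{40}$, $\lambda_{04}$, and $\lambda_{22}$ used later are obtained by changing `r` and `s`.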

3. Variance estimators in literature

A simple unbiased estimator of the variance may be expressed as:

$$t_0 = s_y^2. \tag{1}$$

The sampling variance of the estimator in equation (1) may be expressed as:

$$\mathrm{Var}(t_0) = \theta S_y^4(\lambda_{40}-1), \tag{2}$$

where $\theta = \frac{1}{n}$, and $\lambda_{40}$ in equation (2) can be obtained from the general moment ratio $\lambda_{rs}$ presented in Table 1.

Diana and Perri [16] proposed a linear model as:

$$Z = TY + S, \tag{3}$$

where $Z$ is the response observed by the interviewer, and $T$ and $S$ denote scrambling (random) variables such that $E(T)=1$ and $E(S)=0$.

Under the scrambling model given in equation (3), the variance of Z can be derived as:

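$$S_z^2 = S_{TY}^2 + S_S^2 = S_T^2S_y^2 + S_T^2\mu_y^2 + S_y^2 + S_S^2,$$

or, writing $\bar{Z}$ for $\mu_y$ (since $E(Z)=\mu_y$ under this model),

$$S_y^2 = \frac{S_z^2 - S_S^2 - S_T^2\bar{Z}^2}{S_T^2+1}. \tag{4}$$

If the variable of interest $Y$ is sensitive, a randomized variant of the variance estimator $t_0$ can be obtained by replacing $S_z^2$ and $\bar{Z}^2$ by $s_z^2$ and $\bar{z}^2$, respectively. From equation (4), we get the following basic estimator of the variance:

$$t_{0(R)} = \frac{s_z^2 - S_S^2 - S_T^2\bar{z}^2}{S_T^2+1}. \tag{5}$$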

The estimator in equation (5) can be used in the development of ratio estimators of population variance.
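Equation (5) translates into a few lines of code. The sketch below is illustrative only: the function name `basic_estimator_linear`, the five scrambled responses, and the chosen values of $S_T^2$ and $S_S^2$ are assumptions made for the example.

```python
from statistics import mean, variance


def basic_estimator_linear(z, var_t, var_s):
    """Equation (5): (s_z^2 - S_S^2 - S_T^2 * zbar^2) / (S_T^2 + 1)."""
    s_z2 = variance(z)  # sample variance with divisor n - 1
    z_bar = mean(z)     # sample mean of the scrambled responses
    return (s_z2 - var_s - var_t * z_bar ** 2) / (var_t + 1)


# Hypothetical scrambled responses reported to the interviewer.
z = [1.0, 2.0, 3.0, 4.0, 5.0]

# Here s_z^2 = 2.5 and zbar = 3, so the estimate is (2.5 - 0.5 - 0.9) / 1.1.
estimate = basic_estimator_linear(z, var_t=0.1, var_s=0.5)

# With no scrambling at all, the estimator reduces to the sample variance.
unscrambled = basic_estimator_linear(z, var_t=0.0, var_s=0.0)
```

The second call is a quick sanity check: setting both scrambling variances to zero recovers $t_0 = s_y^2$ from equation (1).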

Isaki [7] presented a ratio estimator for the population variance, which can be expressed as:

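$$t_1 = s_y^2\left(\frac{S_x^2}{s_x^2}\right). \tag{6}$$

The bias and Mean Square Error (MSE) of the estimator given in equation (6), up to the first order of approximation, can be expressed as:

$$\mathrm{Bias}(t_1) = \theta S_y^2(\lambda_{04}-\lambda_{22}), \tag{7}$$

and

$$\mathrm{MSE}(t_1) = \theta S_y^4(\lambda_{40}+\lambda_{04}-2\lambda_{22}). \tag{8}$$

The moment ratios $\lambda_{40}$, $\lambda_{04}$, and $\lambda_{22}$ in equation (7) and equation (8) can be obtained from the general expression of $\lambda_{rs}$ presented in Table 1.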
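As a small worked illustration of equations (6) to (8), the sketch below evaluates the ratio estimator on a toy sample and plugs assumed moment ratios into the first-order bias and MSE formulas. All numerical inputs and function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import variance


def isaki_ratio_estimator(y, x, pop_var_x):
    """Equation (6): t1 = s_y^2 * (S_x^2 / s_x^2)."""
    return variance(y) * pop_var_x / variance(x)


def isaki_bias(n, pop_var_y, lam04, lam22):
    """Equation (7): theta * S_y^2 * (lambda_04 - lambda_22), theta = 1/n."""
    return (1 / n) * pop_var_y * (lam04 - lam22)


def isaki_mse(n, pop_var_y, lam40, lam04, lam22):
    """Equation (8): theta * S_y^4 * (lambda_40 + lambda_04 - 2*lambda_22)."""
    return (1 / n) * pop_var_y ** 2 * (lam40 + lam04 - 2 * lam22)


# Hypothetical sample: s_y^2 = 2.5, s_x^2 = 10, with a known S_x^2 = 12.
y_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
x_sample = [2.0, 4.0, 6.0, 8.0, 10.0]
t1 = isaki_ratio_estimator(y_sample, x_sample, pop_var_x=12.0)

# Assumed moment ratios, for illustration only.
bias_t1 = isaki_bias(n=50, pop_var_y=2.0, lam04=3.0, lam22=2.0)
mse_t1 = isaki_mse(n=50, pop_var_y=2.0, lam40=3.0, lam04=3.0, lam22=2.0)
```

Because the sample variance of $X$ (10) is below the known population variance (12), the ratio adjustment scales the sample variance of $Y$ upward, from 2.5 to 3.0.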

Using the linear model, Gupta et al. [11] proposed the following generalized variance estimator:

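$$t_{2(R)} = \left[\left(\frac{s_z^2 - S_S^2 - S_T^2\bar{z}^2}{S_T^2+1}\right) + (S_x^2 - s_x^2)\right]\left(\frac{\alpha S_x^2+\beta}{w(\alpha s_x^2+\beta)+(1-w)(\alpha S_x^2+\beta)}\right)^{g}, \tag{9}$$

where $g$, $w$, $\alpha$, and $\beta$ are predetermined constants. For different values of the constants, we can obtain special cases of the estimator given in equation (9).

The bias of the estimator $t_{2(R)}$, up to the first order of approximation, may be derived as:

$$\mathrm{Bias}(t_{2(R)}) = -\theta\,\frac{S_T^2\bar{Z}^2}{S_T^2+1}\,C_z^2 - \frac{\alpha g w\theta S_x^2}{\alpha S_x^2+\beta}\left[\frac{S_z^2(\lambda_{22}-1) - 2S_T^2\bar{Z}^2\lambda_{12}C_z}{S_T^2+1} - S_x^2(\lambda_{04}-1)\right]. \tag{10}$$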

In equation (10), the notation $C_z^2$ can be expressed as:

$$C_z^2 = C_y^2S_T^2 + \frac{S_S^2}{\bar{Y}^2}.$$

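The optimum MSE of $t_{2(R)}$ may be expressed as:

$$\mathrm{MSE}(t_{2(R)})_{\mathrm{opt}} = \frac{\theta}{(S_T^2+1)^2}\left[\left(S_z^4(\lambda_{40}-1) + 4S_T^4\bar{Z}^4C_z^2 - 4S_z^2S_T^2\bar{Z}^2\lambda_{30}C_z\right) - \frac{1}{(\lambda_{04}-1)}\left(S_z^2(\lambda_{22}-1) - 2S_T^2\bar{Z}^2\lambda_{12}C_z\right)^2\right]. \tag{11}$$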

The expression for optimum MSE in equation (11) can be used for the purpose of efficiency comparison. Saleem et al. [12] proposed a generalized estimator of population variance as:

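$$t_g = \left[k_1\left(\frac{s_z^2 - S_S^2 - S_T^2\bar{z}^2}{S_T^2+1}\right) + k_2\left(S_{x_1}^2 - s_{x_1}^2\right) + k_3\left(S_{x_2}^2 - s_{x_2}^2\right)\right]\left[\exp\left(\frac{S_{x_1}^2 - s_{x_1}^2}{S_{x_1}^2 + s_{x_1}^2}\right)\right]^{\alpha_1}\left(\frac{S_{x_2}^2}{s_{x_2}^2}\right)^{\alpha_2}, \tag{12}$$

where $k_1$, $k_2$, $k_3$, $\alpha_1$, and $\alpha_2$ are generalizing constants, and $(S_{x_1}^2, S_{x_2}^2)$ and $(s_{x_1}^2, s_{x_2}^2)$ denote the population and sample variances of the auxiliary variables $X_1$ and $X_2$. Saleem et al. [12] also discussed many special cases of the estimator presented in equation (12). Two special cases of the Saleem et al. [12] generalized estimator are as follows: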

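$$t_{3(R)} = \frac{s_z^2 - S_S^2 - S_T^2\bar{z}^2}{S_T^2+1}\exp\left(\frac{S_x^2 - s_x^2}{S_x^2 + s_x^2}\right), \tag{13}$$

and

$$t_{4(R)} = \left[\frac{s_z^2 - S_S^2 - S_T^2\bar{z}^2}{S_T^2+1} + (S_x^2 - s_x^2)\right]\exp\left(\frac{S_x^2 - s_x^2}{S_x^2 + s_x^2}\right). \tag{14}$$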

Before deriving the mean squared error of each of the available estimators under the proposed model, we first present our proposed model in Section 4. Using our proposed model, we have derived the mean square error of various estimators, including those given in equation (13) and equation (14), in Section 5.

4. Proposed model and variance estimator

To estimate the population variance, the following scrambling model is proposed:

$$Z = \gamma(Y+S) + (1-\gamma)(Y+YS), \tag{15}$$

where $\gamma$ is a constant such that $0 \le \gamma \le 1$, and $S$ is a scrambling variable such that $E(S)=0$ and $\mathrm{Var}(S)=S_S^2$. For the proposed model presented in equation (15), we assume that the sensitive variable $Y$ is uncorrelated with the scrambling variable $S$.
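The respondent-side calculation implied by equation (15) can be sketched as follows. The function name `scramble`, the seed, the true values, and the choice of a standard normal scrambling variable are all assumptions made for illustration; the source does not specify a distribution for $S$.

```python
import random


def scramble(y, s, gamma):
    """Equation (15): Z = gamma*(Y + S) + (1 - gamma)*(Y + Y*S)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * (y + s) + (1.0 - gamma) * (y + y * s)


# Single response: true value 3, scrambling draw 0.5, gamma 0.8.
z_single = scramble(3.0, 0.5, 0.8)

# The two boundary cases recover pure additive and pure multiplicative scrambling.
z_additive = scramble(3.0, 0.5, 1.0)        # Y + S
z_multiplicative = scramble(3.0, 0.5, 0.0)  # Y + Y*S

# Hypothetical survey: each respondent draws S privately (assumed N(0, 1) here).
rng = random.Random(2024)
true_values = [3.0, 2.5, 4.0]
reported = [scramble(y, rng.gauss(0.0, 1.0), gamma=0.8) for y in true_values]
```

Setting `gamma` to 1 or 0 shows that the model interpolates between an additive report $Y+S$ and a report $Y+YS$ that scales the noise by the true value.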

A model's simplicity is one of the criteria for the practical usability of any randomized response model. In almost every sample survey using a randomized response technique, the respondent calculates his/her scrambled response and reports it to the interviewer. The model employed for data collection should therefore be simple enough that the respondent can easily calculate and report the scrambled response. The proposed model uses a single scrambling variable and hence is simpler than many of the available randomized response models in which two scrambling variables are used. Compared to a two-variable model, a one-variable model puts less burden on the respondents when calculating the scrambled response.

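Using the proposed model, the variance of the scrambled response $Z$, from which a simple estimator of the population variance follows, may be written as:

$$S_z^2 = \gamma^2S_{Y+S}^2 + (1-\gamma)^2S_{Y+YS}^2. \tag{16}$$

Equation (16) can be further simplified as:

$$S_z^2 = \gamma^2(S_y^2+S_S^2) + (1-\gamma)^2\left[S_y^2 + E\{(YS)^2\} - \{E(YS)\}^2\right].$$

Using the independence of $Y$ and $S$, further simplification yields:

$$S_z^2 = \gamma^2(S_y^2+S_S^2) + (1-\gamma)^2\left[S_y^2 + (S_y^2+\mu_y^2)S_S^2\right],$$

or

$$S_z^2 = AS_y^2 + \left[\gamma^2 + (1-\gamma)^2\bar{Z}^2\right]S_S^2, \tag{17}$$

where

$$A = \gamma^2 + (1-\gamma)^2 + (1-\gamma)^2S_S^2. \tag{18}$$

Further simplification of equation (17) yields:

$$S_y^2 = \frac{S_z^2 - \left[\gamma^2 + (1-\gamma)^2\bar{Z}^2\right]S_S^2}{A}, \tag{19}$$

where $A$ is defined in equation (18).

In equation (19), replacing $\bar{Z}$ and $S_z^2$ by their unbiased estimators $\bar{z}$ and $s_z^2$, respectively, we obtain the basic estimator of $S_y^2$ as:

$$t_{0(R)} = \frac{s_z^2 - \left[\gamma^2 + (1-\gamma)^2\bar{z}^2\right]S_S^2}{A}. \tag{20}$$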
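A direct implementation of equations (18) and (20) might look like the sketch below. The function names and the numerical inputs ($\gamma = 0.5$, $S_S^2 = 0.4$, five scrambled responses) are hypothetical and picked so that the arithmetic can be followed by hand.

```python
from statistics import mean, variance


def scale_constant(gamma, var_s):
    """Equation (18): A = gamma^2 + (1-gamma)^2 + (1-gamma)^2 * S_S^2."""
    return gamma ** 2 + (1 - gamma) ** 2 + (1 - gamma) ** 2 * var_s


def basic_estimator_proposed(z, gamma, var_s):
    """Equation (20): (s_z^2 - [gamma^2 + (1-gamma)^2 * zbar^2] * S_S^2) / A."""
    a = scale_constant(gamma, var_s)
    z_bar = mean(z)
    correction = (gamma ** 2 + (1 - gamma) ** 2 * z_bar ** 2) * var_s
    return (variance(z) - correction) / a


# Hypothetical scrambled responses: s_z^2 = 2.5 and zbar = 3.
z = [1.0, 2.0, 3.0, 4.0, 5.0]

a_value = scale_constant(0.5, 0.4)               # 0.25 + 0.25 + 0.10
t0_r = basic_estimator_proposed(z, 0.5, 0.4)     # (2.5 - 1.0) / 0.6
```

The estimator first removes the scrambling contribution $[\gamma^2 + (1-\gamma)^2\bar{z}^2]S_S^2$ from the sample variance of the reports and then rescales by $A$.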

The estimator given in equation (20) can be used to develop new estimators of population variance. Motivated by the study of Azeem and Hanif [17], the following estimator of the population variance is proposed:

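$$t_p = s_y^2\,\frac{s_x^{*2}}{S_x^2}, \tag{21}$$

where

$$s_x^{*2} = \frac{(N-1)S_x^2 - (n-1)s_x^2}{N-n}, \tag{22}$$

$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2,$$

and

$$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{X})^2.$$

Using the proposed model and equation (22), the suggested estimator given in equation (21) may be expressed as:

$$t_{p(R)} = \frac{s_z^2 - \left[\gamma^2 + (1-\gamma)^2\bar{z}^2\right]S_S^2}{A}\left[\frac{(N-1)S_x^2 - (n-1)s_x^2}{(N-n)S_x^2}\right]. \tag{23}$$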

In the subsequent theorems, we prove the algebraic properties of our proposed estimator presented in equation (23).
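Before turning to the theorems, the estimator in equation (23) can be sketched in code. This is a minimal illustration under assumed inputs: the function name `proposed_estimator`, the toy sample, and the values of $\gamma$, $S_S^2$, $S_x^2$, and $N$ are hypothetical.

```python
from statistics import mean, variance


def proposed_estimator(z, x, gamma, var_s, pop_var_x, pop_size):
    """Equation (23): basic estimator (20) times s_x*^2 / S_x^2 from (22)."""
    n = len(z)
    a = gamma ** 2 + (1 - gamma) ** 2 + (1 - gamma) ** 2 * var_s
    z_bar = mean(z)
    t0_r = (variance(z) - (gamma ** 2 + (1 - gamma) ** 2 * z_bar ** 2) * var_s) / a
    # Equation (22): variance of X over the units not selected in the sample.
    s_x2_star = ((pop_size - 1) * pop_var_x - (n - 1) * variance(x)) / (pop_size - n)
    return t0_r * s_x2_star / pop_var_x


# Hypothetical scrambled responses and auxiliary values for n = 5 units.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.0, 4.0, 6.0, 8.0, 10.0]   # s_x^2 = 10

# When s_x^2 equals S_x^2 the adjustment factor is exactly 1.
tp_no_adjust = proposed_estimator(z, x, 0.5, 0.4, pop_var_x=10.0, pop_size=100)

# When s_x^2 is below S_x^2 the estimate is nudged slightly upward.
tp_adjusted = proposed_estimator(z, x, 0.5, 0.4, pop_var_x=12.0, pop_size=100)
```

Unlike the ratio form $S_x^2/s_x^2$ in equation (6), the adjustment factor here moves only slightly away from 1 when $n$ is small relative to $N$, because the sample variance enters with weight $(n-1)/(N-n)$.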

Theorem 1

The bias of the proposed estimator may be expressed as:

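$$\mathrm{Bias}(t_{p(R)}) \cong \frac{\theta}{A}\left[-(1-\gamma)^2\bar{Z}^2S_S^2C_z^2 - DS_z^2(\lambda_{22}-1) + 2D(1-\gamma)^2\bar{Z}^2S_S^2\lambda_{12}C_z\right], \tag{24}$$

where

$$A = \gamma^2 + (1-\gamma)^2 + (1-\gamma)^2S_S^2, \quad\text{and}\quad D = \frac{n-1}{N-n}.$$

Proof: In order to obtain the bias, let

$$s_z^2 = S_z^2(1+d_z), \quad s_x^2 = S_x^2(1+d_x), \quad\text{and}\quad \bar{z} = \bar{Z}(1+e_z),$$

where

$$d_z = \frac{s_z^2 - S_z^2}{S_z^2}, \quad d_x = \frac{s_x^2 - S_x^2}{S_x^2}, \quad\text{and}\quad e_z = \frac{\bar{z}-\bar{Z}}{\bar{Z}},$$

so that

$$E(d_z)=E(d_x)=E(e_z)=0, \quad E(d_z^2)=\theta(\lambda_{40}-1), \quad E(d_x^2)=\theta(\lambda_{04}-1), \quad E(e_z^2)=\theta C_z^2,$$

$$E(d_zd_x)=\theta(\lambda_{22}-1), \quad E(d_ze_z)=\theta\lambda_{30}C_z, \quad\text{and}\quad E(d_xe_z)=\theta\lambda_{12}C_z.$$

Using these notations, the proposed estimator may be expressed as:

$$t_{p(R)} = \frac{S_z^2(1+d_z) - \left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2(1+e_z)^2\right\}S_S^2}{A}\left[\frac{(N-1)S_x^2 - (n-1)S_x^2(1+d_x)}{(N-n)S_x^2}\right],$$

or

$$t_{p(R)} = \frac{S_z^2 + S_z^2d_z - \left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2\right\}S_S^2 - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z - (1-\gamma)^2\bar{Z}^2S_S^2e_z^2}{A}\times\left[\frac{(N-n)S_x^2 - (n-1)S_x^2d_x}{(N-n)S_x^2}\right],$$

or

$$t_{p(R)} = \frac{S_z^2 + S_z^2d_z - \left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2\right\}S_S^2 - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z - (1-\gamma)^2\bar{Z}^2S_S^2e_z^2}{A}\,(1-Dd_x),$$

where

$$D = \frac{n-1}{N-n}.$$

Further simplification yields:

$$t_{p(R)} - S_y^2 = \frac{S_z^2d_z - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z - (1-\gamma)^2\bar{Z}^2S_S^2e_z^2}{A} - \frac{Dd_x}{A}\left[S_z^2 + S_z^2d_z - \left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2\right\}S_S^2 - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z - (1-\gamma)^2\bar{Z}^2S_S^2e_z^2\right]. \tag{25}$$

Applying expectation and retaining terms up to the second order yields:

$$E\left[t_{p(R)} - S_y^2\right] = \frac{1}{A}E\left[-(1-\gamma)^2\bar{Z}^2S_S^2e_z^2 - DS_z^2d_zd_x + 2D(1-\gamma)^2\bar{Z}^2S_S^2d_xe_z\right].$$

On further simplification, we get the result given in equation (24) as:

$$\mathrm{Bias}(t_{p(R)}) = \frac{\theta}{A}\left[-(1-\gamma)^2\bar{Z}^2S_S^2C_z^2 - DS_z^2(\lambda_{22}-1) + 2D(1-\gamma)^2\bar{Z}^2S_S^2\lambda_{12}C_z\right].$$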

This completes the proof.

Theorem 2

The MSE of the suggested estimator can be obtained as:

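$$\mathrm{MSE}(t_{p(R)}) = \frac{\theta}{A^2}\left[S_z^4(\lambda_{40}-1) + 4(1-\gamma)^4\bar{Z}^4S_S^4C_z^2 + B^2D^2(\lambda_{04}-1) - 4(1-\gamma)^2\bar{Z}^2S_z^2S_S^2\lambda_{30}C_z - 2BDS_z^2(\lambda_{22}-1) + 4(1-\gamma)^2BD\bar{Z}^2S_S^2\lambda_{12}C_z\right],$$

where

$$B = S_z^2 - \left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2\right\}S_S^2.$$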

Proof:

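Ignoring higher-order terms, equation (25) simplifies to:

$$t_{p(R)} - S_y^2 = \frac{S_z^2d_z - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z}{A} - \frac{Dd_x}{A}\left[S_z^2 - \left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2\right\}S_S^2\right],$$

or

$$t_{p(R)} - S_y^2 = \frac{1}{A}\left[S_z^2d_z - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z - DS_z^2d_x + D\left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2\right\}S_S^2d_x\right]. \tag{26}$$

Squaring both sides of equation (26) and applying expectation, we get:

$$E\left[t_{p(R)} - S_y^2\right]^2 = \frac{1}{A^2}E\left[S_z^2d_z - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z - BDd_x\right]^2,$$

where

$$B = S_z^2 - \left\{\gamma^2 + (1-\gamma)^2\bar{Z}^2\right\}S_S^2.$$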
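As a sketch of the intermediate step, expanding the square inside the expectation gives

$$\left[S_z^2d_z - 2(1-\gamma)^2\bar{Z}^2S_S^2e_z - BDd_x\right]^2 = S_z^4d_z^2 + 4(1-\gamma)^4\bar{Z}^4S_S^4e_z^2 + B^2D^2d_x^2 - 4(1-\gamma)^2\bar{Z}^2S_z^2S_S^2d_ze_z - 2BDS_z^2d_zd_x + 4(1-\gamma)^2BD\bar{Z}^2S_S^2d_xe_z,$$

and each product is then replaced by its expectation from the proof of Theorem 1, for example $E(d_z^2)=\theta(\lambda_{40}-1)$, $E(d_zd_x)=\theta(\lambda_{22}-1)$, and $E(d_xe_z)=\theta\lambda_{12}C_z$.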

Further simplification yields the required result.

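$$\mathrm{MSE}(t_{p(R)}) = \frac{\theta}{A^2}\left[S_z^4(\lambda_{40}-1) + 4(1-\gamma)^4\bar{Z}^4S_S^4C_z^2 + B^2D^2(\lambda_{04}-1) - 4(1-\gamma)^2\bar{Z}^2S_z^2S_S^2\lambda_{30}C_z - 2BDS_z^2(\lambda_{22}-1) + 4(1-\gamma)^2BD\bar{Z}^2S_S^2\lambda_{12}C_z\right]. \tag{27}$$

Differentiating equation (27) with respect to $\gamma$ and then equating to zero yields the optimum value as:

$$\gamma_{\mathrm{opt}} = 1 - \sqrt{\frac{S_S^2\left(S_z^2\lambda_{30} - BD\lambda_{12}\right)}{2\bar{Z}^2S_z^4C_z}}. \tag{28}$$

The optimum value of $\gamma$ from equation (28) may be used in equation (27) to get the optimum mean squared error of the suggested estimator.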

5. Performance evaluation

Yan et al. [18] proposed a novel metric to quantify the degree of the respondents' privacy. In the case of our proposed model, the respondents’ degree of privacy can be calculated as:

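$$\Delta = E(Z-Y)^2 = E\left[\gamma S + (1-\gamma)YS\right]^2$$

$$= \gamma^2E(S^2) + (1-\gamma)^2E(Y^2S^2) + 2\gamma(1-\gamma)E(YS^2)$$

$$= E(S^2)\left[\gamma^2 + (1-\gamma)^2E(Y^2) + 2\gamma(1-\gamma)E(Y)\right],$$

or

$$\Delta = S_S^2\left[\gamma^2 + (1-\gamma)^2(S_y^2+\mu_y^2) + 2\gamma(1-\gamma)\mu_y\right]. \tag{29}$$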

The measure Δ given in equation (29) uses the respondents’ privacy protection level but ignores another important aspect of model quality – its efficiency. A better approach would be to quantify privacy and efficiency into a single metric. For this purpose, a unified metric was proposed by Gupta et al. [19] as:

$$\delta = \frac{\mathrm{MSE}}{\Delta}. \tag{30}$$

The model-evaluation measure given in equation (30) is useful for comparison of models.
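Equations (29) and (30) are simple enough to evaluate by hand, and the short sketch below does so for the parameter values $S_y^2 = 2$ and $\mu_y = 3$ used in the tables. The value $S_S^2 = 1$ is an assumption: it is not stated in the table captions, but it is consistent with the ratio of matching entries in Table 2 and Table 3. The function names are illustrative.

```python
def privacy_level(gamma, var_s, var_y, mu_y):
    """Equation (29): Delta = S_S^2 * [g^2 + (1-g)^2 (S_y^2 + mu_y^2) + 2 g (1-g) mu_y]."""
    return var_s * (
        gamma ** 2
        + (1 - gamma) ** 2 * (var_y + mu_y ** 2)
        + 2 * gamma * (1 - gamma) * mu_y
    )


def unified_measure(mse, delta):
    """Equation (30): delta = MSE / Delta."""
    return mse / delta


# S_y^2 = 2 and mu_y = 3 follow the table captions; S_S^2 = 1 is assumed.
delta_08 = privacy_level(gamma=0.8, var_s=1.0, var_y=2.0, mu_y=3.0)  # 0.64 + 0.44 + 0.96
delta_04 = privacy_level(gamma=0.4, var_s=1.0, var_y=2.0, mu_y=3.0)  # 0.16 + 3.96 + 1.44

# MSE of t1(R) for n = 50, w = 0.3, taken from Table 2.
unified_08 = unified_measure(24.9205, delta_08)
unified_04 = unified_measure(14.7725, delta_04)
```

Under this assumption the two results agree, to four decimals, with the Table 3 entries 12.2159 and 2.6569 for $t_{1(R)}$ at $n = 50$ and $w = 0.3$.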

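Using the proposed model, the mathematical expressions for the bias and the mean squared error of the basic estimator $t_{0(R)}$ may be obtained as:

$$\mathrm{Bias}(t_{0(R)}) = -\frac{\theta\bar{Z}^2(1-\gamma)^2S_S^2C_z^2}{A}, \tag{31}$$

and

$$\mathrm{MSE}(t_{0(R)}) = \frac{\theta}{A^2}\left[S_z^4(\lambda_{40}-1) + 4(1-\gamma)^4\bar{Z}^4S_S^4C_z^2 - 4(1-\gamma)^2\bar{Z}^2S_z^2S_S^2\lambda_{30}C_z\right]. \tag{32}$$

The symbol $A$ in equation (31) and equation (32) has been defined in equation (18). The bias and mean squared error of Isaki's [7] estimator $t_{1(R)}$ under the proposed model may be derived as:

$$\mathrm{Bias}(t_{1(R)}) = -\frac{\theta}{A}\left[(1-\gamma)^2\bar{Z}^2S_S^2C_z^2 + S_z^2(\lambda_{22}-1) - 2(1-\gamma)^2\bar{Z}^2S_S^2\lambda_{12}C_z - B(\lambda_{04}-1)\right], \tag{33}$$

and

$$\mathrm{MSE}(t_{1(R)}) = \frac{\theta}{A^2}\left[S_z^4(\lambda_{40}-1) + 4(1-\gamma)^4\bar{Z}^4S_S^4C_z^2 + B^2(\lambda_{04}-1) - 4(1-\gamma)^2\bar{Z}^2S_S^2S_z^2\lambda_{30}C_z - 2BS_z^2(\lambda_{22}-1) + 4(1-\gamma)^2B\bar{Z}^2S_S^2\lambda_{12}C_z\right]. \tag{34}$$

The bias and MSE of the Gupta et al. [11] estimator $t_{2(R)}$ under the proposed model may be derived as:

$$\mathrm{Bias}(t_{2(R)}) = \frac{\theta}{A}\left[-(1-\gamma)^2\bar{Z}^2S_S^2C_z^2 - gCS_z^2(\lambda_{22}-1) + 2(1-\gamma)^2gC\bar{Z}^2S_S^2\lambda_{12}C_z + \frac{g(g+1)}{2}C^2B(\lambda_{04}-1)\right], \tag{35}$$

and

$$\mathrm{MSE}(t_{2(R)}) = \frac{\theta}{A^2}\left[S_z^4(\lambda_{40}-1) + 4(1-\gamma)^4\bar{Z}^4S_S^4C_z^2 + (AS_x^2+gBC)^2(\lambda_{04}-1) - 4(1-\gamma)^2\bar{Z}^2S_z^2S_S^2\lambda_{30}C_z + 4(1-\gamma)^2\bar{Z}^2(AS_x^2+gBC)S_S^2\lambda_{12}C_z - 2(AS_x^2+gBC)S_z^2(\lambda_{22}-1)\right], \tag{36}$$

where

$$C = \frac{w\alpha S_x^2}{\alpha S_x^2+\beta}.$$

The optimum value of $g$ may be obtained by differentiating equation (36) with respect to $g$ and equating the derivative to zero, which yields:

$$\hat{g}_{\mathrm{opt}} = \frac{S_z^2(\lambda_{22}-1) - 2(1-\gamma)^2\bar{Z}^2S_S^2\lambda_{12}C_z - AS_x^2(\lambda_{04}-1)}{BC(\lambda_{04}-1)}. \tag{37}$$

This optimum value of $g$ from equation (37) may be used in equation (36) to get the optimum MSE of the Gupta et al. [11] estimator $t_{2(R)}$.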
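The optimum in equation (37) can be checked numerically. In the sketch below, $\alpha$, $\beta$, $w$, and $S_x^2$ follow the table captions, while every other input (the moment ratios, $C_z$, $S_S^2$) is a hypothetical value chosen for illustration; the function name `optimum_g` is likewise an assumption.

```python
def optimum_g(a, b, c, s_x2, s_z2, u, c_z, lam04, lam22, lam12):
    """Equation (37); u stands for (1 - gamma)^2 * Zbar^2 * S_S^2."""
    m = s_z2 * (lam22 - 1) - 2 * u * lam12 * c_z
    return (m - a * s_x2 * (lam04 - 1)) / (b * c * (lam04 - 1))


# alpha, beta, w and S_x^2 follow the table captions.
alpha, beta, w, s_x2 = 2.0, 3.0, 0.3, 10.0
c = w * alpha * s_x2 / (alpha * s_x2 + beta)   # constant C below equation (36)

# Hypothetical values for the remaining quantities.
gamma, var_s, z_bar, s_z2, c_z = 0.8, 1.0, 3.0, 8.0, 0.9
lam04, lam22, lam12 = 3.0, 1.5, 0.1

a = gamma ** 2 + (1 - gamma) ** 2 + (1 - gamma) ** 2 * var_s      # equation (18)
b = s_z2 - (gamma ** 2 + (1 - gamma) ** 2 * z_bar ** 2) * var_s   # constant B
u = (1 - gamma) ** 2 * z_bar ** 2 * var_s

g_opt = optimum_g(a, b, c, s_x2, s_z2, u, c_z, lam04, lam22, lam12)

# At the optimum, the combined coefficient A*S_x^2 + g*B*C that appears
# throughout equation (36) equals m / (lambda_04 - 1).
k_at_optimum = a * s_x2 + g_opt * b * c
m = s_z2 * (lam22 - 1) - 2 * u * lam12 * c_z
```

The check on `k_at_optimum` reflects how equation (37) arises: equation (36) is quadratic in $AS_x^2 + gBC$, so minimizing over $g$ amounts to setting that combined coefficient to its stationary value.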

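The MSE of the Saleem et al. [12] estimator $t_{3(R)}$ under the proposed model may be derived as:

$$\mathrm{MSE}(t_{3(R)}) = \frac{\theta}{4A^2}\left[4S_z^4(\lambda_{40}-1) + 16(1-\gamma)^4\bar{Z}^4S_S^4C_z^2 + B^2(\lambda_{04}-1) - 16(1-\gamma)^2\bar{Z}^2S_z^2S_S^2\lambda_{30}C_z - 4BS_z^2(\lambda_{22}-1) + 8(1-\gamma)^2B\bar{Z}^2S_S^2\lambda_{12}C_z\right]. \tag{38}$$

The MSE of the Saleem et al. [12] estimator $t_{4(R)}$ under the proposed model may be derived as:

$$\mathrm{MSE}(t_{4(R)}) = \frac{\theta}{A^2}\left[S_z^4(\lambda_{40}-1) + 4(1-\gamma)^4\bar{Z}^4S_S^4C_z^2 + E^2(\lambda_{04}-1) - 4(1-\gamma)^2\bar{Z}^2S_z^2S_S^2\lambda_{30}C_z - 2ES_z^2(\lambda_{22}-1) + 4(1-\gamma)^2E\bar{Z}^2S_S^2\lambda_{12}C_z\right], \tag{39}$$

where

$$E = AS_x^2 + \frac{B}{2}.$$

We have used the results presented in equations (33)–(35) and equations (38)–(39) for the comparison of the models in Table 2 and Table 3.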
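Comparing equations (27), (32), (34), (36), (38), and (39) term by term, each expression can be read as the same template in which only the coefficient multiplying $d_x$ changes: $0$ for $t_{0(R)}$, $B$ for $t_{1(R)}$, $AS_x^2+gBC$ for $t_{2(R)}$, $B/2$ for $t_{3(R)}$, $E$ for $t_{4(R)}$, and $BD$ for $t_{p(R)}$. The sketch below encodes that reading; it is an observation about the formulas rather than a method stated in the source, and all numerical inputs are hypothetical.

```python
def first_order_mse(k, theta, a, s_z2, u, c_z, lam40, lam04, lam22, lam30, lam12):
    """Shared first-order MSE template; k is the coefficient of d_x and
    u stands for (1 - gamma)^2 * Zbar^2 * S_S^2."""
    return theta / a ** 2 * (
        s_z2 ** 2 * (lam40 - 1)
        + 4 * u ** 2 * c_z ** 2
        + k ** 2 * (lam04 - 1)
        - 4 * u * s_z2 * lam30 * c_z
        - 2 * k * s_z2 * (lam22 - 1)
        + 4 * u * k * lam12 * c_z
    )


# Hypothetical inputs, for illustration only.
n, pop_size = 50, 5000
gamma, var_s, z_bar = 0.8, 1.0, 3.0
s_z2, s_x2, c_z = 8.0, 10.0, 0.9
lam40, lam04, lam22, lam30, lam12 = 3.0, 3.0, 1.5, 0.2, 0.1
g, c = 1.0, 6 / 23

theta = 1 / n
a = gamma ** 2 + (1 - gamma) ** 2 + (1 - gamma) ** 2 * var_s      # equation (18)
b = s_z2 - (gamma ** 2 + (1 - gamma) ** 2 * z_bar ** 2) * var_s   # constant B
d = (n - 1) / (pop_size - n)                                      # constant D
u = (1 - gamma) ** 2 * z_bar ** 2 * var_s

coefficients = {
    "t0": 0.0,
    "t1": b,
    "t2": a * s_x2 + g * b * c,
    "t3": b / 2,
    "t4": a * s_x2 + b / 2,
    "tp": b * d,
}
mse = {
    name: first_order_mse(k, theta, a, s_z2, u, c_z, lam40, lam04, lam22, lam30, lam12)
    for name, k in coefficients.items()
}
```

Writing the formulas this way makes the efficiency comparisons in the next section easier to follow, since any two estimators differ only through the three terms that involve the coefficient `k`.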

Table 2.

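MSEs of various estimators for $S_x^2=10$, $S_z^2=8$, $\mu_y=3$, $S_y^2=2$, $N=5000$, $\alpha=2$, $\beta=3$.

| $\gamma$ | $w$ | $n$ | $t_{1(R)}$ | $t_{2(R)}$ | $t_{3(R)}$ | $t_{4(R)}$ | $t_{p(R)}$ |
|---|---|---|---|---|---|---|---|
| 0.8 | 0.3 | 50 | 24.9205 | 45.1686 | 18.6367 | 36.7023 | 17.0639 |
| 0.8 | 0.3 | 100 | 12.4603 | 22.5843 | 9.3183 | 18.3512 | 8.5251 |
| 0.8 | 0.3 | 200 | 6.2301 | 11.2921 | 4.6592 | 9.1756 | 4.2571 |
| 0.8 | 0.3 | 500 | 2.4921 | 4.5169 | 1.8637 | 3.6702 | 1.7017 |
| 0.8 | 0.3 | 1000 | 1.2460 | 2.2584 | 0.9318 | 1.8351 | 0.8633 |
| 0.8 | 0.8 | 50 | 24.9539 | 103.9532 | 18.6438 | 36.7635 | 17.0453 |
| 0.8 | 0.8 | 100 | 12.4770 | 51.9766 | 9.3219 | 18.3817 | 8.5161 |
| 0.8 | 0.8 | 200 | 6.2385 | 25.9883 | 4.6609 | 9.1909 | 4.2529 |
| 0.8 | 0.8 | 500 | 2.4954 | 10.3953 | 1.8644 | 3.6763 | 1.7003 |
| 0.8 | 0.8 | 1000 | 1.2477 | 5.1977 | 0.9322 | 1.8382 | 0.8630 |
| 0.4 | 0.3 | 50 | 14.7725 | 34.6525 | 11.9102 | 30.2480 | 10.4305 |
| 0.4 | 0.3 | 100 | 7.3862 | 17.3262 | 5.9551 | 15.1240 | 5.2240 |
| 0.4 | 0.3 | 200 | 3.6931 | 8.6631 | 2.9775 | 7.5620 | 2.6216 |
| 0.4 | 0.3 | 500 | 1.4772 | 3.4652 | 1.1910 | 3.0248 | 1.0628 |
| 0.4 | 0.3 | 1000 | 0.7386 | 1.7326 | 0.5955 | 1.5124 | 0.5495 |
| 0.4 | 0.8 | 50 | 14.9552 | 61.6075 | 11.9615 | 30.8018 | 10.3531 |
| 0.4 | 0.8 | 100 | 7.4776 | 30.8038 | 5.9808 | 15.4009 | 5.1867 |
| 0.4 | 0.8 | 200 | 3.7388 | 15.4019 | 2.9904 | 7.7004 | 2.6043 |
| 0.4 | 0.8 | 500 | 1.4955 | 6.1608 | 1.1962 | 3.0802 | 1.0577 |
| 0.4 | 0.8 | 1000 | 0.7478 | 3.0804 | 0.5981 | 1.5401 | 0.5488 |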

Table 3.

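$\delta$ values of various estimators for $S_x^2=10$, $S_z^2=8$, $\mu_y=3$, $S_y^2=2$, $N=5000$, $\alpha=2$, $\beta=3$.

| $\gamma$ | $w$ | $n$ | $t_{1(R)}$ | $t_{2(R)}$ | $t_{3(R)}$ | $t_{4(R)}$ | $t_{p(R)}$ |
|---|---|---|---|---|---|---|---|
| 0.8 | 0.3 | 50 | 12.2159 | 22.1414 | 9.1356 | 17.9913 | 8.3646 |
| 0.8 | 0.3 | 100 | 6.1080 | 11.0707 | 4.5678 | 8.9957 | 4.1790 |
| 0.8 | 0.3 | 200 | 3.0540 | 5.5354 | 2.2839 | 4.4978 | 2.0868 |
| 0.8 | 0.3 | 500 | 1.2216 | 2.2141 | 0.9136 | 1.7991 | 0.8341 |
| 0.8 | 0.3 | 1000 | 0.6108 | 1.1071 | 0.4568 | 0.8996 | 0.4232 |
| 0.8 | 0.8 | 50 | 12.2323 | 50.9575 | 9.1391 | 18.0213 | 8.3555 |
| 0.8 | 0.8 | 100 | 6.1162 | 25.4787 | 4.5696 | 9.0107 | 4.1745 |
| 0.8 | 0.8 | 200 | 3.0581 | 12.7394 | 2.2848 | 4.5053 | 2.0847 |
| 0.8 | 0.8 | 500 | 1.2232 | 5.0957 | 0.9139 | 1.8021 | 0.8335 |
| 0.8 | 0.8 | 1000 | 0.6116 | 2.5479 | 0.4570 | 0.9011 | 0.4230 |
| 0.4 | 0.3 | 50 | 2.6569 | 6.2325 | 2.1421 | 5.4403 | 1.8760 |
| 0.4 | 0.3 | 100 | 1.3285 | 3.1162 | 1.0711 | 2.7201 | 0.9396 |
| 0.4 | 0.3 | 200 | 0.6642 | 1.5581 | 0.5355 | 1.3601 | 0.4715 |
| 0.4 | 0.3 | 500 | 0.2657 | 0.6232 | 0.2142 | 0.5440 | 0.1912 |
| 0.4 | 0.3 | 1000 | 0.1328 | 0.3116 | 0.1071 | 0.2720 | 0.0988 |
| 0.4 | 0.8 | 50 | 2.6898 | 11.0805 | 2.1514 | 5.5399 | 1.8621 |
| 0.4 | 0.8 | 100 | 1.3449 | 5.5402 | 1.0757 | 2.7699 | 0.9329 |
| 0.4 | 0.8 | 200 | 0.6724 | 2.7701 | 0.5378 | 1.3850 | 0.4684 |
| 0.4 | 0.8 | 500 | 0.2690 | 1.1080 | 0.2151 | 0.5540 | 0.1902 |
| 0.4 | 0.8 | 1000 | 0.1345 | 0.5540 | 0.1076 | 0.2770 | 0.0987 |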

6. Comparison of estimators

Our suggested estimator will be more efficient than the basic estimator $t_{0(R)}$ if:

$$\mathrm{MSE}(t_{p(R)}) < \mathrm{MSE}(t_{0(R)}),$$

or

$$BD(\lambda_{04}-1) < 2S_z^2(\lambda_{22}-1) - 4(1-\gamma)^2\bar{Z}^2S_S^2\lambda_{12}C_z.$$

Our suggested estimator will be more efficient than Isaki's [7] estimator $t_{1(R)}$ if:

$$\mathrm{MSE}(t_{p(R)}) < \mathrm{MSE}(t_{1(R)}),$$

or

$$n-1 < N-n.$$

This condition is mild: it is equivalent to $N > 2n-1$, so it always holds if the population size is at least twice the sample size.
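The condition $n-1 < N-n$ is the same as $D < 1$, with $D$ as defined in Theorem 1. A quick check for the population size and sample sizes used in the tables is sketched below; the helper name `d_ratio` is an illustrative assumption.

```python
def d_ratio(n, pop_size):
    """D = (n - 1) / (N - n), as defined in Theorem 1."""
    return (n - 1) / (pop_size - n)


# Population size and sample sizes used in the tables.
pop_size = 5000
d_values = {n: d_ratio(n, pop_size) for n in (50, 100, 200, 500, 1000)}
condition_holds = all(d < 1 for d in d_values.values())

# Boundary case: N = 2n - 1 gives D = 1, so the strict inequality just fails.
d_boundary = d_ratio(50, 99)
```

For $N = 5000$ the largest value is $D = 999/4000 \approx 0.25$ at $n = 1000$, so the condition holds comfortably for every sample size in the tables.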

The proposed estimator $t_{p(R)}$ will be more efficient than the Gupta et al. [11] estimator $t_{2(R)}$ if:

$$\mathrm{MSE}(t_{p(R)}) < \mathrm{MSE}(t_{2(R)}),$$

or

$$AS_x^2 + B(gC-1) > 0.$$

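The proposed estimator $t_{p(R)}$ will be more efficient than the Saleem et al. [12] estimator $t_{3(R)}$ if:

$$\mathrm{MSE}(t_{p(R)}) < \mathrm{MSE}(t_{3(R)}).$$

On simplification, the above condition reduces to:

$$B < \frac{2(D-2)}{(D^2-1)(\lambda_{04}-1)}\left[S_z^2(\lambda_{22}-1) - 2(1-\gamma)^2\bar{Z}^2S_S^2\lambda_{12}C_z\right].$$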

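The proposed estimator $t_{p(R)}$ will be more efficient than the Saleem et al. [12] estimator $t_{4(R)}$ if:

$$\mathrm{MSE}(t_{p(R)}) < \mathrm{MSE}(t_{4(R)}).$$

On simplification, the above condition reduces to:

$$BD + E < \frac{2}{\lambda_{04}-1}\left[S_z^2(\lambda_{22}-1) - 2(1-\gamma)^2\bar{Z}^2S_S^2\lambda_{12}C_z\right].$$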

The mean square errors of the estimators $t_{1(R)}$, $t_{2(R)}$, $t_{3(R)}$, $t_{4(R)}$, and the suggested estimator $t_{p(R)}$ are displayed in Table 2 for various values of $w$, $\gamma$, and the sample size $n$. Table 3 presents the corresponding $\delta$ values for the same choices of $\gamma$, $w$, and $n$. Table 4 presents the root mean square error (RMSE) of the estimators. Examining Tables 2 to 4, the improvement in efficiency and in the $\delta$ values may clearly be observed. Moreover, Table 5 shows the simulated Mean Absolute Error (MAE) of the proposed and other variance estimators, based on 1000 iterations using different sample sizes from an artificial population generated through R code. The improvement in terms of mean absolute error can also be observed from Table 5.

Table 4.

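Root Mean Square Error (RMSE) of various estimators for $S_x^2=10$, $S_z^2=8$, $\mu_y=3$, $S_y^2=2$, $N=5000$, $\alpha=2$, $\beta=3$.

| $\gamma$ | $w$ | $n$ | $t_{1(R)}$ | $t_{2(R)}$ | $t_{3(R)}$ | $t_{4(R)}$ | $t_{p(R)}$ |
|---|---|---|---|---|---|---|---|
| 0.8 | 0.3 | 50 | 4.9920 | 6.7208 | 4.3170 | 6.0582 | 4.1308 |
| 0.8 | 0.3 | 100 | 3.5299 | 4.7523 | 3.0526 | 4.2838 | 2.9198 |
| 0.8 | 0.3 | 200 | 2.4960 | 3.3604 | 2.1585 | 3.0291 | 2.0633 |
| 0.8 | 0.3 | 500 | 1.5786 | 2.1253 | 1.3652 | 1.9158 | 1.3045 |
| 0.8 | 0.3 | 1000 | 1.1163 | 1.5028 | 0.9653 | 1.3547 | 0.9291 |
| 0.8 | 0.8 | 50 | 4.9954 | 10.1957 | 4.3178 | 6.0633 | 4.1286 |
| 0.8 | 0.8 | 100 | 3.5323 | 7.2095 | 3.0532 | 4.2874 | 2.9182 |
| 0.8 | 0.8 | 200 | 2.4977 | 5.0979 | 2.1589 | 3.0316 | 2.0622 |
| 0.8 | 0.8 | 500 | 1.5797 | 3.2242 | 1.3654 | 1.9174 | 1.3040 |
| 0.8 | 0.8 | 1000 | 1.1170 | 2.2798 | 0.9655 | 1.3558 | 0.9290 |
| 0.4 | 0.3 | 50 | 3.8435 | 5.8866 | 3.4511 | 5.4998 | 3.2296 |
| 0.4 | 0.3 | 100 | 2.7178 | 4.1625 | 2.4403 | 3.8890 | 2.2856 |
| 0.4 | 0.3 | 200 | 1.9217 | 2.9433 | 1.7256 | 2.7499 | 1.6191 |
| 0.4 | 0.3 | 500 | 1.2154 | 1.8615 | 1.0913 | 1.7392 | 1.0309 |
| 0.4 | 0.3 | 1000 | 0.8594 | 1.3163 | 0.7717 | 1.2298 | 0.7413 |
| 0.4 | 0.8 | 50 | 3.8672 | 7.8490 | 3.4585 | 5.5499 | 3.2176 |
| 0.4 | 0.8 | 100 | 2.7345 | 5.5501 | 2.4456 | 3.9244 | 2.2774 |
| 0.4 | 0.8 | 200 | 1.9336 | 3.9245 | 1.7293 | 2.7750 | 1.6138 |
| 0.4 | 0.8 | 500 | 1.2229 | 2.4821 | 1.0937 | 1.7550 | 1.0285 |
| 0.4 | 0.8 | 1000 | 0.8647 | 1.7551 | 0.7734 | 1.2410 | 0.7408 |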
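The RMSE values in Table 4 are the square roots of the corresponding MSE values in Table 2. A short check on the first row ($\gamma = 0.8$, $w = 0.3$, $n = 50$) is sketched below; the dictionary and variable names are illustrative.

```python
from math import sqrt

# First row of Table 2 (gamma = 0.8, w = 0.3, n = 50).
mse_row = {"t1": 24.9205, "t2": 45.1686, "t3": 18.6367, "t4": 36.7023, "tp": 17.0639}

# First row of Table 4 for the same settings.
rmse_table = {"t1": 4.9920, "t2": 6.7208, "t3": 4.3170, "t4": 6.0582, "tp": 4.1308}

# RMSE recomputed from the MSE entries.
rmse_computed = {name: sqrt(value) for name, value in mse_row.items()}

# Largest gap between recomputed and tabulated RMSE, due to rounding only.
max_gap = max(abs(rmse_computed[name] - rmse_table[name]) for name in mse_row)
```

Because the square root is monotone, the ranking of the estimators in Table 4 is necessarily the same as in Table 2.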

Table 5.

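Simulated Mean Absolute Error (MAE) of various estimators for $S_x^2=10$, $N=5000$, $\alpha=2$, $\beta=3$, $g=5$.

| $\gamma$ | $w$ | $n$ | $t_{1(R)}$ | $t_{2(R)}$ | $t_{3(R)}$ | $t_{4(R)}$ | $t_{p(R)}$ |
|---|---|---|---|---|---|---|---|
| 0.8 | 0.3 | 50 | 10.4141 | 10.2237 | 10.0333 | 10.0536 | 9.9678 |
| 0.8 | 0.3 | 100 | 10.2323 | 10.1426 | 10.0490 | 10.0620 | 10.0175 |
| 0.8 | 0.3 | 200 | 10.1975 | 10.1536 | 10.1284 | 10.1309 | 10.1215 |
| 0.8 | 0.3 | 500 | 10.1625 | 10.1455 | 10.1287 | 10.1315 | 10.1223 |
| 0.8 | 0.3 | 1000 | 10.1909 | 10.1836 | 10.1798 | 10.1801 | 10.1768 |
| 0.8 | 0.8 | 50 | 10.4141 | 11.9726 | 10.0333 | 10.0536 | 9.9678 |
| 0.8 | 0.8 | 100 | 10.2323 | 10.9741 | 10.0490 | 10.0620 | 10.0175 |
| 0.8 | 0.8 | 200 | 10.1975 | 10.5047 | 10.1284 | 10.1309 | 10.1215 |
| 0.8 | 0.8 | 500 | 10.1625 | 10.3033 | 10.1287 | 10.1315 | 10.1223 |
| 0.8 | 0.8 | 1000 | 10.1909 | 10.2415 | 10.1798 | 10.1801 | 10.1768 |
| 0.4 | 0.3 | 50 | 6.1595 | 6.0622 | 5.9214 | 5.9341 | 5.8770 |
| 0.4 | 0.3 | 100 | 4.7969 | 4.7726 | 4.6920 | 4.7071 | 4.6769 |
| 0.4 | 0.3 | 200 | 4.1433 | 4.1368 | 4.1057 | 4.1104 | 4.1027 |
| 0.4 | 0.3 | 500 | 3.7233 | 3.7169 | 3.7119 | 3.7119 | 3.7113 |
| 0.4 | 0.3 | 1000 | 3.7072 | 3.7047 | 3.7016 | 3.7021 | 3.7000 |
| 0.4 | 0.8 | 50 | 6.1595 | 7.1953 | 5.9214 | 5.9341 | 5.8770 |
| 0.4 | 0.8 | 100 | 4.7969 | 5.3234 | 4.6920 | 4.7071 | 4.6770 |
| 0.4 | 0.8 | 200 | 4.1433 | 4.3700 | 4.1057 | 4.1104 | 4.1027 |
| 0.4 | 0.8 | 500 | 10.4141 | 10.2237 | 10.0333 | 10.0536 | 9.9678 |
| 0.4 | 0.8 | 1000 | 10.2323 | 10.1426 | 10.0490 | 10.0620 | 10.0174 |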

7. Discussion and conclusion

This study introduced a new randomized response model for precise estimation of the variance of a finite population. Additionally, a new estimator of the variance has been developed which outperforms the existing variance estimators. The mathematical properties of the suggested variance estimator under the proposed model have been derived. Table 2 shows the mean square error of the Isaki [7] estimator, the Gupta et al. [11] estimator, the two Saleem et al. [12] estimators, and the suggested estimator for different sample sizes and for various choices of the constants. The corresponding $\delta$ values have been presented in Table 3 for different sample sizes under the proposed model.

Table 2 clearly shows that, under the proposed model, the proposed estimator is the most efficient estimator. It may also be seen from the table that an increase in the sample size $n$ results in a decline in the mean square error of each estimator. It is also clear that Isaki's [7] estimator performs better than the Gupta et al. [11] estimator under the proposed model. It may also be observed that as the value of $\gamma$ changes from 0.8 to 0.4, the mean square error of each estimator decreases.

Turning to the combined measure of estimator quality, $\delta$, presented in Table 3, one may observe that the proposed estimator produces the best $\delta$ values of all five estimators. This makes the suggested variance estimator the most suitable estimator for use with sensitive surveys. Table 3 also shows that Isaki's [7] estimator produces smaller $\delta$ values than the Gupta et al. [11] estimator under the proposed model. Based on the findings of this study, survey researchers are recommended to use the proposed variance estimator in situations where the variable of interest is of a sensitive nature. Table 5 shows that the proposed variance estimator achieves the least mean absolute error of all five estimators.

The proposed variance estimator is designed for use with simple random sampling where the variable of interest is sensitive in nature. It is recommended for future researchers to extend the proposed estimator and/or the proposed model to other sampling schemes, including stratified sampling and systematic sampling. The proposed estimator can also be used in unequal probability sampling, and it is therefore suggested that future researchers analyze its properties under unequal probability sampling.

It may also be interesting if future researchers analyze the properties of the new suggested estimator in the case of measurement error and non-response error. Researchers may also work on modifying the proposed model for even more improvement in efficiency.

Data availability

All relevant data is available within the manuscript.

Funding for the study

The authors received no funding for this study.

CRediT authorship contribution statement

Muhammad Azeem: Writing – original draft, Validation, Supervision, Methodology, Investigation, Formal analysis, Conceptualization. Najma Salahuddin: Writing – review & editing, Visualization, Software, Investigation, Data curation. Sundus Hussain: Writing – review & editing, Software, Methodology, Investigation. Musarrat Ijaz: Validation, Project administration, Data curation. Abdul Salam: Validation, Software, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Warner S.L. Randomized response: a survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965;60(309):63–69. doi: 10.1080/01621459.1965.10480775.
  2. Warner S.L. The linear randomized response model. J. Am. Stat. Assoc. 1971;66(336):884–888. doi: 10.1080/01621459.1971.10482364.
  3. Gupta S., Gupta B., Singh S. Estimation of sensitivity level of personal interview survey questions. J. Stat. Plann. Inference. 2002;100(2):239–247. doi: 10.1016/S0378-3758(01)00137-9.
  4. Azeem M. Using the exponential function of scrambling variable in quantitative randomized response models. Math. Methods Appl. Sci. 2023;46(13):13882–13893. doi: 10.1002/mma.9295.
  5. Cochran W.G. The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce. J. Agric. Sci. 1940;30:262–275. doi: 10.1017/S0021859600048012.
  6. Das A.K., Tripathi T.P. Use of auxiliary information in estimating the finite population variance. Sankhya. 1978;40:139–148.
  7. Isaki C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983;78(381):117–123. doi: 10.2307/2287117.
  8. Singh S., Horn S., Chowdhury S., Yu F. Calibration of the estimators of variance. Aust. N. Z. J. Stat. 1999;41:199–212. doi: 10.1111/1467-842X.00074.
  9. Kadilar C., Cingi H. Improvement in variance estimation using auxiliary information. Hacettepe Journal of Mathematics and Statistics. 2006;35(1):111–115.
  10. Subramani J., Kumarapandiyan G. Variance estimation using median of the auxiliary variable. Int. J. Probab. Stat. 2012;1(3):36–40. doi: 10.5923/j.ijps.20120103.02.
  11. Gupta S., Aloraini B., Qureshi M.N., Khalil S. Variance estimation using randomized response technique. REVSTAT – Statistical Journal. 2020;18(2):165–176.
  12. Saleem I., Sanaullah A., Al-Essa L.A., Bashir S. S. Efficient estimation of population variance of a sensitive variable using a new scrambling response model. Sci. Rep. 2023;13:1–11. doi: 10.1038/s41598-023-45427-2.
  13. Zaman T., Bulut H. A new class of robust ratio estimators for finite population variance. Scientia Iranica. 2023.
  14. Zaman T., Bulut H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theor. Methods. 2023;52(8):2610–2624. doi: 10.1080/03610926.2021.1955388.
  15. Zaman Q., Ijaz M., Zaman T. A randomization tool for obtaining efficient estimators through focus group discussion in sensitive surveys. Commun. Stat. Theor. Methods. 2023;52(10):3414–3428. doi: 10.1080/03610926.2021.1973502.
  16. Diana G., Perri P.F. A class of estimators of quantitative sensitive data. Stat. Pap. 2011;52(3):633–650. doi: 10.1007/s00362-009-0273-1.
  17. Azeem M., Hanif M. Joint influence of measurement error and non-response on estimation of population mean. Commun. Stat. Theor. Methods. 2017;46(4):1679–1693. doi: 10.1080/03610926.2015.1026992.
  18. Yan Z., Wang J., Lai J. An efficiency and protection degree-based comparison among the quantitative randomized response strategies. Commun. Stat. Theor. Methods. 2008;38(3):400–408. doi: 10.1080/03610920802220785.
  19. Gupta S., Mehta S., Shabbir J., Khalil S. A unified measure of respondent privacy and model efficiency in quantitative RRT models. Journal of Statistical Theory and Practice. 2018;12(3):506–511. doi: 10.1080/15598608.2017.1415175.
