Heliyon. 2023 Jun 8;9(6):e17121. doi: 10.1016/j.heliyon.2023.e17121

An improved version of systematic sampling design for use with linear trend data

Muhammad Azeem, Sundus Hussain, Musarrat Ijaz, Najma Salahuddin, Abdul Salam
PMCID: PMC10361301  PMID: 37484426

Abstract

In survey sampling, systematic sampling has attracted researchers in recent years because it is simple to use. We introduce a modified variant of the systematic sampling scheme that improves the efficiency of a recently developed diagonal systematic sampling method. The suggested modification is also more efficient than other popular sampling designs when the population units exhibit an increasing or decreasing, perfect or near-perfect, linear trend. Moreover, under a perfect linear trend the conditions for efficiency hold for practically all admissible choices of the population and sample sizes, which makes the suggested design preferable to the available sampling designs for such populations.

Keywords: Efficiency, Sample surveys, Diagonal systematic sampling, Milk yield data, Linear trend

MSC: Mathematics Subject Classification 2020, 62D05, 65C05

1. Introduction

In recent decades, survey researchers have taken a keen interest in systematic random sampling because it is an easy yet useful procedure for selecting a random sample from a population of interest. Researchers find systematic sampling even simpler than simple random sampling because only the first unit (or the first few units) is selected at random; the remaining sample units are obtained by a pre-defined rule. Systematic random sampling was first introduced by Madow and Madow [1], who proposed selecting units according to a pre-defined pattern, and its various forms have since been used under many real-life circumstances. The method of Madow and Madow [1], however, is applicable only when the population size is a constant multiple of the sample size, which limits its usability. To alleviate this problem, Lahiri presented the circular type of systematic sampling design (see [2]). Later, Chang and Huang [3] suggested remainder linear systematic sampling, which can be applied when the population size cannot be expressed as a multiple of the sample size. Subramani [4] introduced diagonal systematic sampling, in which, as the name suggests, the sample units are selected along a diagonal. Sampath and Varalakshmi [5] and Subramani [6] introduced modified forms of the diagonal systematic sampling design. For odd sample sizes, Subramani [7] introduced a modification of linear systematic sampling that was found to be more efficient than the previous versions of systematic sampling.
Likewise, Subramani and Gupta [8] introduced another efficient modification of linear systematic sampling; their technique is useful because it does not require a particular mathematical relationship between the population and sample sizes. More recently, Azeem and Khan [9] studied the estimation of the mean under a new modification of the diagonal systematic sampling scheme. Several other researchers have also studied different aspects of systematic sampling and its variants in real-life situations [10–21].

Azeem et al. [22] suggested a modified variant of diagonal systematic random sampling and proved that it is more efficient than both linear and diagonal systematic sampling, as well as some other available sampling schemes. Motivated by their study, we introduce a modified form of the Azeem et al. [22] sampling method and show that it is more efficient than the Azeem et al. [22] scheme. We also show its improvement over some other popular sampling schemes in real-life situations where the population units exhibit a linear trend. The sampling variance of the mean under the suggested design is derived, and the improvement in efficiency is demonstrated for real data as well as for populations with a perfect linear trend.

2. Suggested sampling design

Suppose the finite population consists of $N$ units and a random sample of size $n$ is to be drawn, where $N = kn = k \cdot k + (n-k-2)k + 2k$ with $n \ge k + 2$ (so that Set-II below has a non-negative number of rows). The proposed method selects the sample from the finite population in the following steps:

1) Partition the entire population into three non-overlapping and exhaustive sets of units, Set-I, Set-II and Set-III: Set-I receives the first $k \times k = k^2$ units $y_i$ ($i = 1, 2, \dots, k^2$), Set-II receives the next $(n-k-2)k$ units $y_i$ ($i = k^2+1, k^2+2, \dots, (n-2)k$), and Set-III receives the last $2k$ units $y_i$ ($i = (n-2)k+1, (n-2)k+2, \dots, nk$).

2) Arrange the units of Set-I in a $k \times k$ matrix, the $(n-k-2)k$ units of Set-II in a matrix of order $(n-k-2) \times k$, and the last $2k$ units of Set-III in a matrix of order $2 \times k$, as shown in Table 1.

3) Select three random numbers $r_1$, $r_2$ and $r_3$ independently, with $1 \le r_1 \le k$, $1 \le r_2 \le k$ and $1 \le r_3 \le k$. From Set-I, select the $k$ units lying on the (broken) diagonal of the $k \times k$ matrix that starts at column $r_1$ of the first row. From Set-II, select the $n-k-2$ units in column $r_2$. From Set-III, select the two units in column $r_3$. Finally, merge the units selected from the three sets into a single group, which gives the required sample of size $k + (n-k-2) + 2 = n$.
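The three selection steps can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: `msy_select` is a hypothetical helper, the population is a plain list ordered as $y_1, \dots, y_N$ and filled into the $n \times k$ layout row by row, and the Set-I diagonal is read as the broken diagonal that starts at column $r_1$ of the first row and wraps around to column 1, which is the reading consistent with the sample $S_{r_1 r_2 r_3}$ listed later in this section.

```python
from typing import List, Sequence, Tuple


def msy_select(y: Sequence[float], n: int, k: int,
               r1: int, r2: int, r3: int) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical helper: return the Set-I, Set-II and Set-III parts of the sample."""
    if len(y) != n * k:
        raise ValueError("the design assumes N = nk")
    if n < k + 2:
        raise ValueError("Set-II needs n - k - 2 >= 0 rows, i.e. n >= k + 2")
    if not all(1 <= r <= k for r in (r1, r2, r3)):
        raise ValueError("r1, r2 and r3 must lie in 1..k")

    # Steps 1-2: lay the N units out row by row; row r holds units r*k+1 .. (r+1)*k.
    rows = [list(y[r * k:(r + 1) * k]) for r in range(n)]
    set1, set2, set3 = rows[:k], rows[k:n - 2], rows[n - 2:]

    # Step 3, Set-I: broken diagonal starting at column r1 of the first row.
    s1 = [set1[i][(r1 - 1 + i) % k] for i in range(k)]
    # Step 3, Set-II: all n - k - 2 units in column r2.
    s2 = [row[r2 - 1] for row in set2]
    # Step 3, Set-III: the two units in column r3.
    s3 = [row[r3 - 1] for row in set3]
    return s1, s2, s3


if __name__ == "__main__":
    units = list(range(1, 41))   # labels 1..40 stand in for y_1..y_40 (N = 40, n = 10, k = 4)
    s1, s2, s3 = msy_select(units, n=10, k=4, r1=2, r2=3, r3=1)
    print(s1, s2, s3)            # [2, 7, 12, 13] [19, 23, 27, 31] [33, 37]
```

With integer labels standing in for the $y$ values, the printed parts can be compared directly with the positions shown in Table 1.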

Table 1.

Population units organized in different sets.

Set-I
S.No. 1 2 … k
1 y_1 y_2 … y_k
2 y_{k+1} y_{k+2} … y_{2k}
3 y_{2k+1} y_{2k+2} … y_{3k}
⋮ ⋮ ⋮ ⋮
k y_{(k-1)k+1} y_{(k-1)k+2} … y_{kk}

Set-II
S.No. 1 2 … k
k+1 y_{kk+1} y_{kk+2} … y_{kk+k} = y_{(k+1)k}
k+2 y_{(k+1)k+1} y_{(k+1)k+2} … y_{(k+2)k}
k+3 y_{(k+2)k+1} y_{(k+2)k+2} … y_{(k+3)k}
⋮ ⋮ ⋮ ⋮
n-2 y_{(n-3)k+1} y_{(n-3)k+2} … y_{(n-2)k}

Set-III
S.No. 1 2 … k
n-1 y_{(n-2)k+1} y_{(n-2)k+2} … y_{(n-1)k}
n y_{(n-1)k+1} y_{(n-1)k+2} … y_{nk}

The suggested method differs from that of Azeem et al. [22] in that it partitions the finite population into three non-overlapping groups instead of two: the last two rows of Set-II in the Azeem et al. [22] design are moved into a separate group, Set-III. As in the Azeem et al. [22] method, the three sets are mutually exclusive and collectively exhaustive. Section 4 shows that this allocation of units yields a more efficient sampling design.

Dividing the population into three non-overlapping groups leads to a more efficient design than the available schemes based on two groups. The population could be divided into more than three groups to gain further efficiency; however, this would leave very few admissible combinations of $N$ and $n$, which may not be practical. The division into three groups is therefore a good balance between efficiency and practical usefulness.

Because $r_1$, $r_2$ and $r_3$ each take $k$ values, the total number of possible samples under the new design is $k \times k \times k = k^3$, each of size $n$. The first-order inclusion probability under the suggested design is:

$$\pi_i = \frac{1}{k}, \qquad (1)$$

where the subscript ‘i’ denotes the ith unit of the population. Also,

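$$\pi_{ij} = \begin{cases} 1/k, & \text{if units } i \text{ and } j \text{ lie on the same diagonal of Set-I},\\ 1/k, & \text{if units } i \text{ and } j \text{ lie in the same column of Set-II or of Set-III},\\ 1/k^2, & \text{if units } i \text{ and } j \text{ belong to two different sets},\\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$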
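Equations (1) and (2) can be checked by brute force: enumerate all $k^3$ equally likely triples $(r_1, r_2, r_3)$ and accumulate how often each unit and each pair of units is selected. The sketch below does this with exact fractions; `sample_labels` and `inclusion_probabilities` are hypothetical helper names, and units are labelled $1, \dots, N$ row by row as in Table 1 (with $n = 10$ and $k = 4$, unit 17 is the first unit of Set-II and unit 33 the first unit of Set-III).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def sample_labels(n, k, r1, r2, r3):
    """Unit labels (1..N, row by row) of the sample selected by (r1, r2, r3)."""
    s1 = [row * k + (r1 - 1 + row) % k + 1 for row in range(k)]   # Set-I diagonal
    s2 = [row * k + r2 for row in range(k, n - 2)]                 # Set-II column r2
    s3 = [row * k + r3 for row in (n - 2, n - 1)]                  # Set-III column r3
    return s1 + s2 + s3


def inclusion_probabilities(n, k):
    """Exact first- and second-order inclusion probabilities over all k**3 samples."""
    p = Fraction(1, k ** 3)          # every triple (r1, r2, r3) is equally likely
    pi, pij = Counter(), Counter()
    for r in product(range(1, k + 1), repeat=3):
        s = sorted(sample_labels(n, k, *r))
        for i in s:
            pi[i] += p
        for pair in combinations(s, 2):   # pairs (i, j) with i < j
            pij[pair] += p
    return pi, pij


if __name__ == "__main__":
    pi, pij = inclusion_probabilities(n=10, k=4)
    print(set(pi.values()))                                         # {Fraction(1, 4)}
    print(pij[(1, 6)], pij[(17, 21)], pij[(1, 17)], pij[(1, 2)])    # 1/4 1/4 1/16 0
```

Pairs that never occur together (for example units 1 and 2, which lie on different Set-I diagonals) are simply absent from the counter, which matches the "otherwise" case of equation (2).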

Generally, the units selected under the proposed sampling scheme are:

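$$S_{r_1 r_2 r_3} = \begin{cases} \{y_{r_1}, y_{(k+1)+r_1}, \dots, y_{(k-1)(k+1)+r_1},\ y_{kk+r_2}, y_{(k+1)k+r_2}, \dots, y_{(n-3)k+r_2},\ y_{(n-2)k+r_3}, y_{(n-1)k+r_3}\}, & \text{if } r_1 = 1,\\ \{y_{r_1}, y_{(k+1)+r_1}, \dots, y_{t(k+1)+r_1},\ y_{(t+1)k+1}, y_{(t+2)k+2}, \dots, y_{(k-1)k+k-t-1},\ y_{kk+r_2}, y_{(k+1)k+r_2}, \dots, y_{(n-3)k+r_2},\ y_{(n-2)k+r_3}, y_{(n-1)k+r_3}\}, & \text{if } r_1 > 1, \end{cases}$$

where $t = k - r_1$ and $r_1, r_2, r_3 = 1, 2, \dots, k$.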
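The displayed set $S_{r_1 r_2 r_3}$ can be cross-checked against the matrix picture of the selection steps. The sketch below (hypothetical function names, 1-based labels) builds the labels directly from the formula, treating the case $r_1 = 1$ as the case in which the second Set-I run $y_{(t+1)k+1}, \dots$ is empty, and compares them with a row-and-column rule for all small admissible $n$ and $k$.

```python
from itertools import product


def closed_form_labels(n, k, r1, r2, r3):
    """Labels of S_{r1 r2 r3} written as in the displayed formula, with t = k - r1."""
    t = k - r1
    set1 = [l * (k + 1) + r1 for l in range(t + 1)]      # y_{l(k+1)+r1}, l = 0..t
    set1 += [(t + i) * k + i for i in range(1, k - t)]   # y_{(t+i)k+i}, i = 1..k-t-1 (empty if r1 = 1)
    set2 = [l * k + r2 for l in range(k, n - 2)]         # y_{lk+r2}, l = k..n-3
    set3 = [(n - 2) * k + r3, (n - 1) * k + r3]          # y_{(n-2)k+r3}, y_{(n-1)k+r3}
    return set1 + set2 + set3


def row_column_labels(n, k, r1, r2, r3):
    """Same sample from the matrix picture: label = row * k + column (1-based column)."""
    s1 = [row * k + (r1 - 1 + row) % k + 1 for row in range(k)]
    s2 = [row * k + r2 for row in range(k, n - 2)]
    s3 = [row * k + r3 for row in (n - 2, n - 1)]
    return s1 + s2 + s3


def check(max_k=8, extra_rows=6):
    """Compare both constructions for every admissible (n, k) in a small grid."""
    for k in range(1, max_k + 1):
        for n in range(k + 2, k + 2 + extra_rows):
            for r in product(range(1, k + 1), repeat=3):
                labels = closed_form_labels(n, k, *r)
                assert labels == row_column_labels(n, k, *r)
                assert len(set(labels)) == n             # n distinct units
    return True


if __name__ == "__main__":
    print(check())                                       # True
    print(closed_form_labels(10, 4, 2, 3, 1))            # [2, 7, 12, 13, 19, 23, 27, 31, 33, 37]
```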

The mean on the basis of the new suggested sampling design is given by:

$$\bar{y}_{msy} = w_1\bar{y}_1 + w_2\bar{y}_2 + w_3\bar{y}_3, \qquad (3)$$

where the mean of the sample from Set-I is given in equation (4) as:

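$$\bar{y}_1 = \begin{cases} \dfrac{1}{k}\displaystyle\sum_{l=0}^{k-1} y_{l(k+1)+r_1}, & \text{if } r_1 = 1,\\ \dfrac{1}{k}\left(\displaystyle\sum_{i=0}^{t} y_{i(k+1)+r_1} + \sum_{i=1}^{k-t-1} y_{(t+i)k+i}\right), & \text{if } r_1 > 1, \end{cases} \qquad (4)$$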

where $t = k - r_1$. The means of the samples from Set-II and Set-III, together with the weights, are given in equations (5) and (6):

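$$\bar{y}_2 = \frac{1}{n-k-2}\sum_{l=k}^{n-3} y_{lk+r_2}, \qquad w_1 = \frac{k}{n},\quad w_2 = \frac{n-k-2}{n},\quad w_3 = \frac{2}{n},\quad w_1+w_2+w_3 = 1, \qquad (5)$$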

and,

$$\bar{y}_3 = \frac{1}{2}\left(y_{(n-2)k+r_3} + y_{(n-1)k+r_3}\right). \qquad (6)$$
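A small sketch of equations (3) to (6), assuming a population list ordered as $y_1, \dots, y_N$ with $N = nk$ and $n \ge k + 2$; `msy_estimate` is a hypothetical name. Because each weight equals the share of the population in its set, the weighted mean reduces to the ordinary mean of the $n$ sampled values, which is also the Horvitz and Thompson form $(1/N)\sum_{i \in s} y_i/\pi_i$ with $\pi_i = 1/k$ used in the theorem below; the usage lines check both identities on a hypothetical population.

```python
import math
from statistics import fmean


def msy_estimate(y, n, k, r1, r2, r3):
    """Weighted mean of equations (3)-(6) for one sample; also returns the sampled values."""
    if len(y) != n * k or n < k + 2:
        raise ValueError("need N = nk and n >= k + 2")

    def at(row, col):                       # 0-based row and column of the n x k layout
        return y[row * k + col]

    s1 = [at(row, (r1 - 1 + row) % k) for row in range(k)]   # Set-I diagonal
    s2 = [at(row, r2 - 1) for row in range(k, n - 2)]        # Set-II column r2
    s3 = [at(row, r3 - 1) for row in (n - 2, n - 1)]         # Set-III column r3
    w1, w2, w3 = k / n, (n - k - 2) / n, 2 / n
    ybar2 = fmean(s2) if s2 else 0.0        # Set-II is empty when n = k + 2, and then w2 = 0
    estimate = w1 * fmean(s1) + w2 * ybar2 + w3 * fmean(s3)
    return estimate, s1 + s2 + s3


if __name__ == "__main__":
    n, k = 10, 4
    y = [50 - 0.3 * i + (i % 3) for i in range(1, n * k + 1)]   # hypothetical population
    est, sample = msy_estimate(y, n, k, r1=2, r2=3, r3=1)
    ht = sum(v / (1 / k) for v in sample) / (n * k)            # Horvitz-Thompson form, pi_i = 1/k
    print(math.isclose(est, fmean(sample)), math.isclose(est, ht))   # True True
```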

Theorem: The sample mean $\bar{y}_{msy}$ can be written in the form given by Horvitz and Thompson [23] and is an unbiased estimator of the population mean, with variance

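$$\operatorname{Var}(\bar{y}_{msy}) = \frac{1}{N^2}\left[k^4\left\{\frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{1i}-\bar{Y}_1)^2\right\} + (n-k-2)^2k^2\left\{\frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{2i}-\bar{Y}_2)^2\right\} + 4k^2\left\{\frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{3i}-\bar{Y}_3)^2\right\}\right],$$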

where $\bar{y}_{1i}$, $\bar{y}_{2i}$ and $\bar{y}_{3i}$ denote the means of the $i$th possible sample from Set-I, Set-II and Set-III, respectively; $\bar{Y}_1$, $\bar{Y}_2$ and $\bar{Y}_3$ denote the means of all units in Set-I, Set-II and Set-III, respectively; and $k$ is the number of possible samples within each set.

Proof: Since $N = nk$, the weights are $w_1 = k^2/N$, $w_2 = (n-k-2)k/N$ and $w_3 = 2k/N$, so equations (3) to (6) give

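$$\bar{y}_{msy} = \frac{k^2}{N}\bar{y}_1 + \frac{(n-k-2)k}{N}\bar{y}_2 + \frac{2k}{N}\bar{y}_3 = \frac{1}{N}\left(k\sum_{i\in s_1} y_{1i} + k\sum_{i\in s_2} y_{2i} + k\sum_{i\in s_3} y_{3i}\right), \qquad (7)$$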

where s1, s2 and s3 stand for the sample obtained from the units of Set-I, II and III, respectively.

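$$\bar{y}_{msy} = \frac{1}{N}\left(\sum_{i\in s_1}\frac{y_{1i}}{1/k} + \sum_{i\in s_2}\frac{y_{2i}}{1/k} + \sum_{i\in s_3}\frac{y_{3i}}{1/k}\right) = \frac{1}{N}\sum_{i\in s}\frac{y_i}{\pi_i} = \bar{y}_{HT}. \qquad (8)$$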

The symbol $s$ in equation (8) denotes the whole sample drawn from the population. Taking expectations on both sides of equation (7) gives:

$$E(\bar{y}_{msy}) = \frac{k^2}{N}E(\bar{y}_1) + \frac{(n-k-2)k}{N}E(\bar{y}_2) + \frac{2k}{N}E(\bar{y}_3). \qquad (9)$$

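Now, since $r_1$ takes each of its $k$ values with probability $1/k$, $\bar{y}_1$ equals the $i$th diagonal mean $\bar{y}_{1i}$ with probability $1/k$; and because the $k$ diagonals partition Set-I into $k$ groups of $k$ units each,

$$E(\bar{y}_1) = \frac{1}{k}\sum_{i=1}^{k}\bar{y}_{1i} = \bar{Y}_1. \qquad (10)$$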

Similarly,

$$E(\bar{y}_2) = \frac{1}{k}\sum_{i=1}^{k}\bar{y}_{2i} = \bar{Y}_2, \qquad (11)$$

and,

$$E(\bar{y}_3) = \frac{1}{k}\sum_{i=1}^{k}\bar{y}_{3i} = \bar{Y}_3. \qquad (12)$$

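Substituting equations (10), (11) and (12) into equation (9) gives $E(\bar{y}_{msy}) = \{k^2\bar{Y}_1 + (n-k-2)k\bar{Y}_2 + 2k\bar{Y}_3\}/N = \bar{Y}$, since each term in braces is the total of the corresponding set.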

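Since $r_1$, $r_2$ and $r_3$ are selected independently, $\bar{y}_1$, $\bar{y}_2$ and $\bar{y}_3$ are independent, and taking variances on both sides of equation (7) gives: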

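$$\operatorname{Var}(\bar{y}_{msy}) = \frac{k^4}{N^2}\operatorname{Var}(\bar{y}_1) + \frac{(n-k-2)^2k^2}{N^2}\operatorname{Var}(\bar{y}_2) + \frac{4k^2}{N^2}\operatorname{Var}(\bar{y}_3), \qquad (13)$$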

where,

$$\operatorname{Var}(\bar{y}_1) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{1i}-\bar{Y}_1)^2, \qquad (14)$$

since each possible sample in Set-I has probability equal to 1/k. Similarly,

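$$\operatorname{Var}(\bar{y}_2) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{2i}-\bar{Y}_2)^2, \qquad (15)$$
$$\operatorname{Var}(\bar{y}_3) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{3i}-\bar{Y}_3)^2. \qquad (16)$$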

Substituting equations (14), (15) and (16) into equation (13), the variance of $\bar{y}_{msy}$ under the suggested sampling scheme is obtained as:

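$$\operatorname{Var}(\bar{y}_{msy}) = \frac{1}{N^2}\left[k^4\left\{\frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{1i}-\bar{Y}_1)^2\right\} + (n-k-2)^2k^2\left\{\frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{2i}-\bar{Y}_2)^2\right\} + 4k^2\left\{\frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{3i}-\bar{Y}_3)^2\right\}\right]. \qquad (17)$$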
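The theorem can be checked numerically by enumerating the $k^3$ equally likely samples: the average of the sample means should equal the population mean, and their variance should equal equation (17). The sketch below assumes a hypothetical population of $N = 60$ values with a noisy decreasing trend; the function names are illustrative only, and the population variance over the $k$ subsample means of each set plays the role of the braced terms in (17).

```python
import math
from itertools import product
from statistics import fmean, pvariance


def subsample_means(y, n, k):
    """Means of the k possible subsamples in each set: diagonals (Set-I), columns (Set-II, Set-III)."""
    def at(row, col):
        return y[row * k + col]
    diag = [fmean([at(row, (r - 1 + row) % k) for row in range(k)]) for r in range(1, k + 1)]
    if n > k + 2:
        col2 = [fmean([at(row, c) for row in range(k, n - 2)]) for c in range(k)]
    else:
        col2 = [0.0] * k                     # Set-II is empty when n = k + 2; its weight is 0
    col3 = [fmean([at(row, c) for row in (n - 2, n - 1)]) for c in range(k)]
    return diag, col2, col3


def variance_eq17(y, n, k):
    """Equation (17); pvariance averages over the k equally likely subsamples of each set."""
    N = n * k
    diag, col2, col3 = subsample_means(y, n, k)
    return (k ** 4 * pvariance(diag)
            + (n - k - 2) ** 2 * k ** 2 * pvariance(col2)
            + 4 * k ** 2 * pvariance(col3)) / N ** 2


def brute_force(y, n, k):
    """Mean and variance of the sample mean over all k**3 equally likely samples."""
    estimates = []
    for r1, r2, r3 in product(range(1, k + 1), repeat=3):
        s = [y[row * k + (r1 - 1 + row) % k] for row in range(k)]
        s += [y[row * k + r2 - 1] for row in range(k, n - 2)]
        s += [y[row * k + r3 - 1] for row in (n - 2, n - 1)]
        estimates.append(fmean(s))           # equals the weighted mean (3)
    return fmean(estimates), pvariance(estimates)


if __name__ == "__main__":
    y = [100 - 1.5 * i + 4 * math.sin(i) for i in range(1, 61)]   # hypothetical population
    mean_hat, var_hat = brute_force(y, n=12, k=5)
    print(mean_hat, fmean(y))                  # unbiasedness: the two agree
    print(var_hat, variance_eq17(y, 12, 5))    # equation (17) matches the enumeration
```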

Remark 1: Using the approach of Sen [24] and Yates and Grundy [25], the sampling variance in equation (17) can equivalently be written as:

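$$\operatorname{Var}(\bar{y}_{msy}) = \frac{1}{N^2}\left\{\frac{1}{2}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{y_i}{\pi_i}-\frac{y_j}{\pi_j}\right)^2\right\} = \operatorname{Var}_{SYG}(\bar{y}_{HT}). \qquad (18)$$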

Remark 2: The Sen-Yates-Grundy type estimator of the sampling variance in equation (18) is:

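$$\operatorname{var}(\bar{y}_{msy}) = \frac{1}{N^2}\left\{\frac{1}{2}\sum_{i\in s}\sum_{\substack{j\in s\\ j\ne i}}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}\left(\frac{y_i}{\pi_i}-\frac{y_j}{\pi_j}\right)^2\right\} = \operatorname{var}_{SYG}(\bar{y}_{HT}). \qquad (19)$$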

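Substituting $\pi_i$ and $\pi_{ij}$ from equations (1) and (2) into equations (18) and (19) gives the sampling variance and its estimator under the proposed method. Because $\pi_{ij} = 0$ for some pairs of units (for example, two Set-I units on different diagonals), the estimator in equation (19) is not unbiased in general: the Sen-Yates-Grundy estimator is unbiased only when $\pi_{ij} > 0$ for every pair of units.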
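A brute-force sketch of Remarks 1 and 2, using the same row-wise layout (with 0-based labels here) and hypothetical function names: `syg_variance` evaluates equation (18) with $\pi_i$ and $\pi_{ij}$ taken from equations (1) and (2), and `syg_estimate` evaluates equation (19) for one sample. For the hypothetical population used below, equation (18) reproduces the exact design variance, while the average of equation (19) over all $k^3$ samples is smaller, because pairs with $\pi_{ij} = 0$ never appear together in a sample.

```python
from itertools import product
from statistics import fmean, pvariance


def make_pi(n, k):
    """pi_i and pi_ij of equations (1)-(2); units are labelled 0..N-1 row by row."""
    def locate(i):
        row, col = divmod(i, k)
        part = 1 if row < k else (2 if row < n - 2 else 3)
        key = (col - row) % k if part == 1 else col   # diagonal id in Set-I, column otherwise
        return part, key

    def pi_ij(i, j):
        (pa, ka), (pb, kb) = locate(i), locate(j)
        if pa != pb:
            return 1 / k ** 2                          # units from two different sets
        return 1 / k if ka == kb else 0.0              # same diagonal/column, or never together

    return 1 / k, pi_ij


def syg_variance(y, n, k):
    """Equation (18): Sen-Yates-Grundy form of Var(ybar_msy)."""
    N, (pi, pij) = n * k, make_pi(n, k)
    total = sum((pi * pi - pij(i, j)) * (y[i] / pi - y[j] / pi) ** 2
                for i in range(N) for j in range(N) if i != j)
    return total / (2 * N ** 2)


def syg_estimate(y, sample, n, k):
    """Equation (19) for one sample of 0-based labels; pi_ij > 0 for every in-sample pair."""
    N, (pi, pij) = n * k, make_pi(n, k)
    total = sum((pi * pi - pij(i, j)) / pij(i, j) * (y[i] / pi - y[j] / pi) ** 2
                for i in sample for j in sample if i != j)
    return total / (2 * N ** 2)


def all_samples(n, k):
    """Yield the 0-based labels of every sample S_{r1 r2 r3}."""
    for r1, r2, r3 in product(range(1, k + 1), repeat=3):
        s = [row * k + (r1 - 1 + row) % k for row in range(k)]
        s += [row * k + r2 - 1 for row in range(k, n - 2)]
        s += [row * k + r3 - 1 for row in (n - 2, n - 1)]
        yield s


if __name__ == "__main__":
    n, k = 10, 4
    y = [20 + 0.5 * i + (i % 4) for i in range(n * k)]          # hypothetical population
    samples = list(all_samples(n, k))
    exact = pvariance([fmean([y[i] for i in s]) for s in samples])
    print(exact, syg_variance(y, n, k))                         # equal up to rounding
    print(fmean([syg_estimate(y, s, n, k) for s in samples]))  # smaller than the exact variance
```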

3. Linear trend

A linear trend refers to an arithmetic progression in the values of the population units, taken in the order in which the units are arranged; the progression may be increasing or decreasing. A moderate to high degree of linear trend can be observed in many practical cases. As an illustration, educational institutes in many parts of the world offer admission to academic departments on merit, and many universities allocate roll numbers to students on the basis of their quantified academic scores during the admission process. In such cases, the highest-scoring students occupy the top enrollment numbers. Thus, if the examination marks obtained by students after admission are listed in the order of their enrollment roll numbers, one can expect a moderate increasing linear trend, since the top-enrolled students, being the merit toppers, tend to perform better in subsequent examinations.

Likewise, to practically observe a decreasing form of the linear trend, one can think of the daily record of the milk yield data, starting from calving. One can naturally expect that the daily milk yield quantity will tend to decrease with the passage of time, thus resulting in a decreasing type of linear trend.

Suppose the $N = nk = k \cdot k + (n-k-2)k + 2k$ population units exhibit a perfect linear trend, that is,

$$y_i = a + bi, \quad i = 1, 2, 3, \dots, N. \qquad (20)$$

The sampling variance under simple random sampling based on perfect linear trend defined in equation (20) is:

$$\operatorname{Var}(\bar{y}_r) = \frac{(k-1)(N+1)b^2}{12}. \qquad (21)$$

The sampling variance under systematic random sampling design is as follows:

$$\operatorname{Var}(\bar{y}_{sy}) = \frac{(k-1)(k+1)b^2}{12}. \qquad (22)$$

Further, the variance in the case of diagonal systematic sampling method is as follows:

$$\operatorname{Var}(\bar{y}_{dsy}) = \frac{(k-n)[n(k-n)+2]b^2}{12n}, \qquad (23)$$

where N = nk + r. The variance under the modified systematic sampling suggested by Subramani [7] is given as:

$$\operatorname{Var}(\bar{y}_{ssy}) = \left(\frac{(n-1)^2+1}{n^2}\right)\frac{(k-1)(k+1)b^2}{12}. \qquad (24)$$

The variance of the Azeem et al. [22] sampling scheme is given as:

$$\operatorname{Var}(\bar{y}_{mdsy}) = \left(\frac{n-k}{n}\right)^2\frac{(k-1)(k+1)b^2}{12}. \qquad (25)$$

Finally, under complete linear trend in the population units, the variance of the suggested sampling scheme can be derived as:

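$$\operatorname{Var}(\bar{y}_{msy}) = w_1^2\operatorname{Var}(\bar{y}_1) + w_2^2\operatorname{Var}(\bar{y}_2) + w_3^2\operatorname{Var}(\bar{y}_3). \qquad (26)$$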

To obtain the variance of the proposed scheme under a perfect linear trend, we first need the variances of $\bar{y}_1$, $\bar{y}_2$ and $\bar{y}_3$. Since Set-I consists of $k \times k = k^2$ units from which a diagonal sample of size $k$ is drawn, setting $n = k$ in equation (23) gives

$$\operatorname{Var}(\bar{y}_1) = 0. \qquad (27)$$

Moreover, since linear systematic sampling is used within both Set-II and Set-III, and the right-hand side of equation (22) does not depend on $n$,

$$\operatorname{Var}(\bar{y}_2) = \frac{(k-1)(k+1)b^2}{12}. \qquad (28)$$

Similarly,

$$\operatorname{Var}(\bar{y}_3) = \frac{(k-1)(k+1)b^2}{12}. \qquad (29)$$

Substituting equations (27), (28) and (29) into equation (26) and simplifying, the variance of $\bar{y}_{msy}$ is obtained as:

$$\operatorname{Var}(\bar{y}_{msy}) = \left[\frac{(n-k-2)^2+4}{n^2}\right]\frac{(k-1)(k+1)b^2}{12}. \qquad (30)$$
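Equations (27) to (30) can be confirmed exactly with rational arithmetic: build $y_i = a + bi$, compute the variance of each set's subsample means over its $k$ equally likely choices, and enumerate all $k^3$ combinations for $\bar{y}_{msy}$. This is a sketch with a hypothetical function name; it assumes $n > k + 2$ so that Set-II is non-empty.

```python
from fractions import Fraction
from itertools import product


def exact_linear_trend_variances(n, k, a, b):
    """Exact Var(ybar_1), Var(ybar_2), Var(ybar_3), Var(ybar_msy) when y_i = a + b*i, i = 1..nk."""
    if n <= k + 2:
        raise ValueError("this check needs a non-empty Set-II, i.e. n > k + 2")
    y = [Fraction(a) + Fraction(b) * i for i in range(1, n * k + 1)]

    def at(row, col):
        return y[row * k + col]

    def pvar(values):                          # variance over equally likely outcomes
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    diag = [sum(at(row, (r - 1 + row) % k) for row in range(k)) / k for r in range(1, k + 1)]
    col2 = [sum(at(row, c) for row in range(k, n - 2)) / (n - k - 2) for c in range(k)]
    col3 = [sum(at(row, c) for row in (n - 2, n - 1)) / 2 for c in range(k)]
    w1, w2, w3 = Fraction(k, n), Fraction(n - k - 2, n), Fraction(2, n)
    # Combined estimator over all k**3 equally likely (r1, r2, r3).
    est = [w1 * d + w2 * c2 + w3 * c3 for d, c2, c3 in product(diag, col2, col3)]
    return pvar(diag), pvar(col2), pvar(col3), pvar(est)


if __name__ == "__main__":
    n, k, a, b = 12, 5, 40, -3
    v1, v2, v3, v = exact_linear_trend_variances(n, k, a, b)
    base = Fraction((k - 1) * (k + 1) * b * b, 12)
    print(v1, v2 == base, v3 == base, v == Fraction((n - k - 2) ** 2 + 4, n ** 2) * base)  # 0 True True True
```

Exact fractions are used so that the check of equation (27) is an equality test rather than a floating-point tolerance.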

4. Efficiency comparison under perfect linear trend

4.1. Comparison with simple random and systematic random sampling

Under a perfect linear trend, the suggested sampling scheme is more efficient than simple random sampling if

$$\operatorname{Var}(\bar{y}_{msy}) < \operatorname{Var}(\bar{y}_r). \qquad (31)$$

Substituting equations (21) and (30) in equation (31) yields:

$$\left[\frac{(n-k-2)^2+4}{n^2}\right](k+1) < N+1. \qquad (32)$$

Condition (32) always holds: the bracketed factor is less than 1 whenever $n \ge 3$, which is guaranteed by $n \ge k + 2$, and $k + 1 < N + 1$ because $N = nk > k$. Hence the suggested sampling procedure is more efficient than simple random sampling.

Our suggested sampling technique will be more efficient than the systematic random sampling design if

$$\operatorname{Var}(\bar{y}_{msy}) < \operatorname{Var}(\bar{y}_{sy}). \qquad (33)$$

Substituting equations (22), (30) in equation (33) yields:

$$\left[\frac{(n-k-2)^2+4}{n^2}\right]\frac{(k-1)(k+1)b^2}{12} < \frac{(k-1)(k+1)b^2}{12}.$$

The above inequality simplifies to:

$$\frac{(n-k-2)^2+4}{n^2} < 1. \qquad (34)$$

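Since $0 \le n - k - 2 \le n - 3$, the numerator in condition (34) satisfies $(n-k-2)^2 + 4 \le (n-3)^2 + 4 = n^2 - 6n + 13$, which is less than $n^2$ whenever $n \ge 3$. Condition (34) therefore always holds, and the suggested method is more efficient than systematic random sampling.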

4.2. Comparison with Subramani's [7] modified systematic sampling

Under perfect linear trend, the new suggested sampling scheme will be more efficient than the sampling design suggested by Subramani [7] if,

$$\operatorname{Var}(\bar{y}_{msy}) < \operatorname{Var}(\bar{y}_{ssy}). \qquad (35)$$

Using equations (24), (30) in equation (35) yields:

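$$\left[\frac{(n-k-2)^2+4}{n^2}\right]\frac{(k-1)(k+1)b^2}{12} < \left(\frac{(n-1)^2+1}{n^2}\right)\frac{(k-1)(k+1)b^2}{12},$$

Or

$$\frac{(n-k-2)^2+4}{n^2} < \frac{(n-1)^2+1}{n^2},$$

Or

$$(n-k-2)^2+3 < (n-1)^2. \qquad (36)$$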

Since $(n-k-2)^2 + 3 \le (n-3)^2 + 3 = (n-1)^2 - (4n - 11)$ and $4n - 11 > 0$ for $n \ge 3$, condition (36) always holds. This implies that the suggested method is always more efficient than Subramani's [7] modified sampling.

4.3. Comparison with diagonal systematic sampling

Our new proposed sampling design will be more precise than the diagonal sampling design if,

$$\operatorname{Var}(\bar{y}_{msy}) < \operatorname{Var}(\bar{y}_{dsy}). \qquad (37)$$

Substituting equations (23) and (30) into equation (37) gives

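$$\left[\frac{(n-k-2)^2+4}{n^2}\right]\frac{(k-1)(k+1)b^2}{12} < \frac{(k-n)[n(k-n)+2]b^2}{12n},$$

Or

$$\left[\frac{(n-k-2)^2+4}{n}\right](k-1)(k+1) < (k-n)[n(k-n)+2],$$

Or

$$[(n-k-2)^2+4](k-1)(k+1) < n(k-n)[n(k-n)+2]. \qquad (38)$$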

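Since the proposed method requires $n \ge k + 2$, let $m = n - k \ge 2$. The left-hand side of condition (38) is $(m^2 - 4m + 8)(k^2 - 1) \le m^2(k^2 - 1)$, whereas the right-hand side equals $nm(nm - 2) = m^2k^2 + m\{m^2(n+k) - 2n\} > m^2k^2$. Hence condition (38) always holds.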

4.4. Comparison with Azeem et al. [22] Sampling design

Our proposed sampling scheme will be more efficient than Azeem et al. [22] sampling design if,

$$\operatorname{Var}(\bar{y}_{msy}) < \operatorname{Var}(\bar{y}_{mdsy}). \qquad (39)$$

Using equations (25), (30) in equation (39) gives:

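$$\left[\frac{(n-k-2)^2+4}{n^2}\right]\frac{(k-1)(k+1)b^2}{12} < \left(\frac{n-k}{n}\right)^2\frac{(k-1)(k+1)b^2}{12},$$

Or

$$(n-k-2)^2+4 < (n-k)^2,$$

Or

$$(m-2)^2+4 < m^2, \quad \text{where } m = n-k. \qquad (40)$$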

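Condition (40) reduces to $m^2 - 4m + 8 < m^2$, that is, $m > 2$. Hence the proposed design is strictly more efficient than the Azeem et al. [22] design whenever $n > k + 2$, and the two designs have equal variance when $n = k + 2$ (for example, $n = 10$ and $k = 8$ in Table 3).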
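The conditions of Sections 4.1 to 4.4 can also be checked over a grid of admissible designs by evaluating equations (21) to (25) and (30) exactly. The sketch below uses hypothetical helper names, takes $b = 1$ (a common factor of every variance), restricts to $k \ge 2$ and $n \ge k + 2$, and expects equality with the Azeem et al. [22] variance exactly when $n = k + 2$. A finite grid is only a sanity check of the algebra above, not a proof.

```python
from fractions import Fraction


def variances(n, k, b=1):
    """Equations (21)-(25) and (30) under a perfect linear trend with N = nk."""
    N = n * k
    base = Fraction((k - 1) * (k + 1) * b * b, 12)                   # (k-1)(k+1)b^2/12
    return {
        "r": Fraction((k - 1) * (N + 1) * b * b, 12),                  # (21)
        "sy": base,                                                  # (22)
        "dsy": Fraction((k - n) * (n * (k - n) + 2) * b * b, 12 * n),  # (23)
        "ssy": Fraction((n - 1) ** 2 + 1, n ** 2) * base,             # (24)
        "mdsy": Fraction(n - k, n) ** 2 * base,                       # (25)
        "msy": Fraction((n - k - 2) ** 2 + 4, n ** 2) * base,         # (30)
    }


def check_conditions(max_k=30, max_extra=60):
    """Verify conditions (32), (34), (36), (38) and (40) on a grid of admissible (n, k)."""
    for k in range(2, max_k + 1):
        for n in range(k + 2, k + 2 + max_extra):
            v = variances(n, k)
            assert v["msy"] < v["r"] and v["msy"] < v["sy"]        # (32), (34)
            assert v["msy"] < v["ssy"] and v["msy"] < v["dsy"]    # (36), (38)
            if n == k + 2:
                assert v["msy"] == v["mdsy"]                      # equality case of (40)
            else:
                assert v["msy"] < v["mdsy"]                       # (40)
    return True


if __name__ == "__main__":
    print(check_conditions())                                    # True
    print(variances(10, 8)["msy"], variances(10, 8)["mdsy"])     # 21/100 21/100
```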

5. Efficiency comparison using milk-yield data

Efficiency comparison is a useful tool for assessing any statistical technique; for example, Yang et al. [26] used efficiency comparisons for ranked set sampling under the log-extended exponential-geometric distribution. Here, the improvement in efficiency of the proposed method over existing sampling techniques is assessed using milk yield data taken from Pandey and Kumar [27], presented in Table 6 (see Appendix). The data give the milk yield (in liters) of Sahiwal cows over 252 consecutive days, starting from the day of calving, and show a clearly decreasing linear tendency over time. The variances of the various sampling methods for this data set are presented in Table 2, and they indicate that the suggested scheme is more precise than the existing sampling designs discussed in Section 3. Because the population size is 252 and the systematic designs require $N = kn$, a few units were deleted at random for each choice of $n$ and $k$ so that $N$, $n$ and $k$ are compatible. For example, for $n = 25$ and $k = 10$, two units are deleted at random, reducing the population size from 252 to 250 units. This makes the efficiency comparison feasible for the milk yield data.
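The random deletion used to reconcile $N$ with $nk$, followed by the exact design variance of the proposed mean, can be sketched as below. The helper names are hypothetical, and the demonstration uses a hypothetical decreasing 252-value series rather than the Table 6 data; in practice one would pass the milk yield values in day order.

```python
import random
from itertools import product
from statistics import fmean, pvariance


def trim_to_nk(y, n, k, rng):
    """Randomly delete len(y) - n*k units while keeping the remaining units in their original order."""
    extra = len(y) - n * k
    if extra < 0:
        raise ValueError("the series is shorter than n*k")
    drop = set(rng.sample(range(len(y)), extra))
    return [v for i, v in enumerate(y) if i not in drop]


def exact_msy_variance(y, n, k):
    """Exact design variance of the proposed mean by enumerating all k**3 equally likely samples."""
    if len(y) != n * k or n < k + 2:
        raise ValueError("need N = nk and n >= k + 2")
    means = []
    for r1, r2, r3 in product(range(1, k + 1), repeat=3):
        s = [y[row * k + (r1 - 1 + row) % k] for row in range(k)]   # Set-I diagonal
        s += [y[row * k + r2 - 1] for row in range(k, n - 2)]        # Set-II column r2
        s += [y[row * k + r3 - 1] for row in (n - 2, n - 1)]         # Set-III column r3
        means.append(fmean(s))
    return pvariance(means)


if __name__ == "__main__":
    rng = random.Random(2023)
    series = [30 - 0.05 * day for day in range(1, 253)]   # hypothetical 252-day series, not Table 6
    trimmed = trim_to_nk(series, n=25, k=10, rng=rng)     # 252 -> 250 units
    print(len(trimmed), exact_msy_variance(trimmed, 25, 10))
```

Keeping the surviving units in day order matters, since the design relies on the ordering that carries the trend.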

Table 2.

Calculations of variances of various sampling designs using milk yield data set.

n k Var(ȳ_r) Var(ȳ_sy) Var(ȳ_dsy) Var(ȳ_ssy) Var(ȳ_mdsy) Var(ȳ_msy)
83 3 1.7329 2.0410 1.6124 0.9154 0.8653 0.6723
63 4 1.5051 1.1436 1.0070 0.7900 0.5180 0.3874
49 5 1.0628 0.7286 0.6023 0.5826 0.3496 0.2317
42 6 0.9291 0.6542 0.5317 0.4604 0.2604 0.1831
35 7 0.7690 0.5881 0.4781 0.3713 0.2267 0.1426
31 8 0.7157 0.5017 0.4496 0.3418 0.1807 0.1241
28 9 0.6143 0.4210 0.3305 0.2501 0.1268 0.1004
25 10 0.5243 0.3781 0.2851 0.1947 0.1097 0.0872
22 11 0.5034 0.3535 0.2641 0.1693 0.0988 0.0686
21 12 0.4632 0.3130 0.2345 0.1471 0.0923 0.0713
19 13 0.4153 0.3042 0.2160 0.1260 0.0891 0.0638
18 14 0.3613 0.2896 0.1990 0.1063 0.0856 0.0609
16 15 0.3650 0.2775 0.1630 0.0993 0.0833 0.0581

Next, consider populations whose units follow a perfect linear trend. The efficiency comparison was conducted for different choices of $N$, $n$ and $k$, with $N = kn$ and $n \ge k + 2$, and the variances of the sample mean under the suggested and several other popular sampling designs are given in Table 3. Since $b^2$ is a common multiplicative factor in the variance expressions of all the sampling procedures discussed in Section 3, $b = 1$ is used in Table 3. The findings show that the proposed scheme is more efficient than the existing schemes, including that of Azeem et al. [22], with which it ties only when $n = k + 2$ (the row $n = 10$, $k = 8$).

Table 3.

Linear trend – based variances of different sampling designs.

n k Var(ȳ_r) Var(ȳ_sy) Var(ȳ_dsy) Var(ȳ_ssy) Var(ȳ_mdsy) Var(ȳ_msy)
10 4 10.25 1.25 2.90 1.03 0.45 0.25
10 6 25.42 2.92 1.27 2.39 0.47 0.23
10 8 47.25 5.25 0.30 4.31 0.21 0.21
30 5 50.33 2.00 51.94 1.87 1.39 1.18
30 10 225.75 8.25 33.22 7.72 3.67 3.01
30 15 526.17 18.67 18.67 17.46 4.67 3.59
30 20 951.58 33.25 8.28 31.11 3.69 2.51
30 25 1502.00 52.00 2.06 48.65 1.44 0.75
50 10 375.75 8.25 133.20 7.93 5.28 4.78
50 20 1584.92 33.25 74.90 31.95 11.97 10.48
50 30 3627.42 74.92 33.27 71.98 11.99 9.83
50 40 6503.25 133.25 8.30 128.03 5.33 3.62
100 20 3168.25 33.25 533.20 32.59 21.28 20.24
100 40 13003.25 133.25 299.90 130.61 47.97 44.88
100 60 29504.92 299.92 133.27 293.98 47.99 43.43
100 80 52673.25 533.25 33.30 522.69 21.33 17.49
500 100 412508.25 833.25 13333.20 829.92 533.28 527.97
500 200 1658349.92 3333.25 7499.90 3319.94 1199.97 1184.08
500 300 3737524.92 7499.92 3333.27 7469.98 1199.99 1176.23
500 400 6650033.25 13333.25 833.30 13280.02 533.33 512.42
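The entries of Table 3 follow directly from equations (21) to (25) and (30) with $b = 1$. The sketch below recomputes a row with exact fractions and rounds half up to two decimals; the function name `table3_row` is hypothetical, and half-up rounding is an assumption about how the table was rounded (it matters for exact ties such as $1.025$, which plain float rounding would print as $1.02$).

```python
from decimal import ROUND_HALF_UP, Decimal
from fractions import Fraction


def table3_row(n, k):
    """One row of Table 3 (b = 1) from equations (21)-(25) and (30), with N = nk."""
    N = n * k
    base = Fraction((k - 1) * (k + 1), 12)
    exact = [
        Fraction((k - 1) * (N + 1), 12),                  # Var(ybar_r), (21)
        base,                                             # Var(ybar_sy), (22)
        Fraction((k - n) * (n * (k - n) + 2), 12 * n),    # Var(ybar_dsy), (23)
        Fraction((n - 1) ** 2 + 1, n ** 2) * base,         # Var(ybar_ssy), (24)
        Fraction(n - k, n) ** 2 * base,                   # Var(ybar_mdsy), (25)
        Fraction((n - k - 2) ** 2 + 4, n ** 2) * base,    # Var(ybar_msy), (30)
    ]
    # Convert each exact value to Decimal and round half up to two decimals.
    return [(Decimal(f.numerator) / Decimal(f.denominator)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
            for f in exact]


if __name__ == "__main__":
    for n, k in [(10, 4), (10, 6), (10, 8), (30, 5), (50, 40)]:
        print(n, k, *table3_row(n, k))
```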

6. Simulation study

A simulation study was carried out to compare the proposed sampling design with the Azeem et al. [22] design using two data sets: a real data set taken from Pandey and Kumar [27], which exhibits a decreasing trend (Fig. 1), and an artificial population of 400 units with an approximate (non-exact) linear trend (Fig. 2).

Fig. 1. Linear trend in milk yield data.

Fig. 2. Linear trend in artificial data.

The simulated variances of the mean under the proposed and the Azeem et al. [22] designs for the real and artificial data sets are presented in Table 4 and Table 5, respectively. The real data set, milk yield over 420 days taken from Pandey and Kumar [27], is presented in Table 7 (see Appendix). The reported values are averages over 1000 iterations. To reconcile the population and sample sizes, a few values were deleted at random so that the calculations are possible. In every case the proposed design has a smaller variance than the Azeem et al. [22] systematic sampling design.
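A sketch of one way to organise such a simulation for the artificial population is given below. Several details are assumptions, since the text does not specify them: the population is generated as $y_i = a + bi + \varepsilon_i$ with hypothetical values $a = 50$, $b = -0.1$ and normal noise with standard deviation 2; each of the 1000 iterations draws fresh noise and a fresh random deletion of $400 - nk$ units; and the Azeem et al. [22] design is implemented from the description in Section 2, with the same Set-I diagonal followed by a single column of the remaining $n - k$ rows. Each iteration records the exact design variances of both means, which are then averaged.

```python
import random
from statistics import fmean, pvariance


def diag_means(y, k):
    """Means of the k broken diagonals of the k x k Set-I matrix."""
    return [fmean([y[row * k + (r - 1 + row) % k] for row in range(k)]) for r in range(1, k + 1)]


def col_means(y, k, rows):
    """Means of the k columns restricted to the given 0-based rows."""
    return [fmean([y[row * k + c] for row in rows]) for c in range(k)]


def var_msy(y, n, k):
    """Exact variance of the proposed mean: independent selections in Set-I, Set-II and Set-III."""
    v = (k / n) ** 2 * pvariance(diag_means(y, k))
    if n > k + 2:
        v += ((n - k - 2) / n) ** 2 * pvariance(col_means(y, k, range(k, n - 2)))
    return v + (2 / n) ** 2 * pvariance(col_means(y, k, range(n - 2, n)))


def var_mdsy(y, n, k):
    """Assumed Azeem et al. [22] layout: Set-I diagonal, then one column of the remaining n - k rows."""
    return ((k / n) ** 2 * pvariance(diag_means(y, k))
            + ((n - k) / n) ** 2 * pvariance(col_means(y, k, range(k, n))))


def simulate(k, n, reps=1000, pop_size=400, a=50.0, b=-0.1, sigma=2.0, seed=1):
    """Average exact variances over reps synthetic populations with random unit deletions."""
    rng = random.Random(seed)
    v_msy, v_mdsy = [], []
    for _ in range(reps):
        pop = [a + b * i + rng.gauss(0.0, sigma) for i in range(1, pop_size + 1)]
        drop = set(rng.sample(range(pop_size), pop_size - n * k))
        y = [v for i, v in enumerate(pop) if i not in drop]
        v_msy.append(var_msy(y, n, k))
        v_mdsy.append(var_mdsy(y, n, k))
    return fmean(v_msy), fmean(v_mdsy)


if __name__ == "__main__":
    print(simulate(k=10, n=40))
```

Setting `sigma=0.0` with $nk = 400$ removes both the noise and the deletions, and the two functions then reproduce equations (30) and (25) exactly, which is a convenient check of the variance functions before running the full simulation.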

Table 4.

Simulated variances based on milk yield data.

k n Var(μ̂_msy) Var(μ̂_mdsy)
10 40 0.07020625 0.07897375
10 39 0.06506246 0.0687879
10 38 0.0676392 0.07166136
10 37 0.06648035 0.06871658
9 43 0.0623855 0.06300824
9 42 0.0646646 0.06662195
9 41 0.06323439 0.06426209
9 35 0.08303383 0.08589805
8 50 0.01932045 0.02289855
8 47 0.02149887 0.02213083
8 46 0.0197552 0.02069282
8 25 0.072272 0.0769128
7 55 0.01034132 0.01279601
7 53 0.01082763 0.01154477
7 45 0.0120892 0.01298613
7 42 0.01400712 0.01565078
6 65 0.03050977 0.03143663
6 62 0.02805151 0.03172945
6 60 0.02847568 0.02886475
6 55 0.03090053 0.03379904
5 80 0.01470031 0.01726031
5 77 0.01508396 0.01544109
5 65 0.02651124 0.02723342
5 62 0.0301295 0.0331334

Table 5.

Simulated variances based on synthetic data.

k n Var(μ̂_msy) Var(μ̂_mdsy)
10 40 0.00349539 0.004362683
10 39 0.003673637 0.003827209
10 37 0.003856536 0.0038967
10 35 0.003797756 0.004148683
9 43 0.007839106 0.008811967
9 42 0.008341945 0.008632896
9 41 0.008479745 0.008992375
9 35 0.009368519 0.01024975
8 48 0.007535187 0.008435035
8 45 0.00725946 0.008845018
8 43 0.006945541 0.00778288
8 38 0.008121302 0.008583599
7 55 0.003963771 0.005508161
7 53 0.003644271 0.003992082
7 50 0.00321205 0.003999207
7 40 0.004209553 0.004506942
6 65 0.01133966 0.01206911
6 62 0.0102278 0.0116993
6 60 0.01012279 0.01077084
6 55 0.006761836 0.009083443
5 80 0.003063108 0.003224376
5 77 0.002667662 0.002964467
5 70 0.00191757 0.002318078
5 63 0.001259042 0.001458852

7. Conclusion

Unlike the Azeem et al. [22] design, which divides the population into two disjoint and exhaustive subgroups, the suggested scheme partitions the finite population into three mutually exclusive and exhaustive subsets. It selects the sample by diagonal systematic sampling in Set-I and by linear systematic sampling in the other two sets, and a weighting approach is then used to estimate the finite population mean. Efficiency comparisons with other existing schemes, using real data sets as well as populations with a perfect linear trend, show the improvement in efficiency. Based on the theoretical and empirical analysis in this study, the proposed method is recommended in practical circumstances where the population exhibits a strong increasing or decreasing linear trend.

Our analysis showed the improvement in efficiency over the existing sampling schemes for the estimator of the mean. A natural direction for future research is to compare estimators of the population variance under the proposed and the available sampling schemes.

Author contribution statement

Muhammad Azeem, Ph.D.: Conceived and designed the experiments; Performed the experiments; Wrote the paper.

Sundus Hussain: Performed the experiments; Analyzed and interpreted the data.

Musarrat Ijaz: Performed the experiments; Contributed reagents, materials, analysis tools or data.

Najma Salahuddin: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.

Abdul Salam: Conceived and designed the experiments; Contributed reagents, materials, analysis tools or data.

Data availability statement

Data included in article/supp. material/referenced in article.

Funding statement

The authors received no funding for this study.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2023.e17121.

APPENDIX.

Table 6.

Day-wise milk yield data (in liters; values read row by row) for the S-19 brand of cows (Pandey and Kumar [27])

10 10 10 10 10 10 10 10 10 10 10 10 11 12 12 12
12 12 12 12 12 12 12 13 12 12 12 12 12 12 13 11
12 12 12 12 14 13 13 13 13 13 13 12 12 12 12 13
13 12 12 13 12 12 12 12 13 12 12 12 12 12 12 12
12 12 12 12 13 12 12 12 12 12 12 12 12 13 12 12
12 12 13 12 13 12 12 12 12 12 12 12 12 12 12 12
12 11 10 14 15 14 14 13 14 14 12 14 14 14 13 14
12 14 13 14 14 14 13 14 14 14 13 14 13 12 13 13
12 12 12 12 12 12 13 13 15 12 13 12 12 12 12 12
11 11 12 12 12 12 12 12 12 12 13 12 12 12 12 12
12 12 12 13 13 13 14 15 14 14 14 12 14 14 14 12
13 12 12 12 12 13 12 12 13 12 12 12 11 10 11 11
11 11 11 10 10 10 11 11 10 10 10 12 12 12 12 12
12 12 12 13 13 13 14 15 14 14 14 12 14 14 14 12
12 12 12 12 10 11 9 8 8 8 8 8 8 9 8 8
8 8 8 8 8 9 8 8 8 9 8 9 8 8 8 8
8 8 8 8 8 8 8 9 9 9 9 9 9 8 8 8
8 9 8 3 2 2 1 0

Table 7.

Day-wise milk yield data (in liters; values read row by row) for the X-205 brand of cows over 420 days

8 10 8 8 17 18 18 18 18 18 18 18 18 20 24 24
24 26 26 26 26 24 25 26 27 26 27 26 24 24 25 26
25 25 22 21 26 24 24 24 26 26 27 27 27 28 28 27
26 15 25 27 27 26 26 27 27 28 28 26 26 27 26 26
27 28 27 26 28 17 25 27 27 27 29 26 26 26 26 27
25 27 22 25 25 24 24 24 24 23 22 23 24 24 23 22
21 23 23 24 23 23 23 23 22 23 23 22 22 23 23 24
21 20 20 21 21 21 20 20 20 20 21 20 20 20 18 19
19 18 20 20 21 20 21 19 18 19 18 19 19 18 19 18
21 21 21 22 22 20 20 21 22 21 20 20 21 20 20 21
20 22 22 20 19 20 20 20 21 21 20 20 20 21 21 17
19 18 18 19 18 18 17 16 17 19 20 21 20 21 20 20
20 20 19 19 18 19 20 20 18 16 16 17 16 16 17 14
15 16 16 16 16 15 16 16 17 16 15 14 15 15 16 15
16 14 13 12 13 14 13 12 12 11 11 10 16 16 17 17
15 14 15 15 15 15 17 17 18 17 16 16 17 16 16 16
17 16 16 17 17 16 16 16 16 16 17 13 13 13 12 12
12 12 12 13 12 13 12 13 12 12 13 12 12 12 12 14
13 12 12 13 12 12 13 12 13 13 13 14 13 12 12 14
14 14 12 12 12 13 13 12 12 13 13 12 13 12 11 12
10 10 11 11 11 11 11 12 12 12 12 13 12 12 12 12
12 11 13 12 12 12 11 11 11 10 10 11 11 12 11 10
10 11 10 11 11 11 10 12 12 12 12 13 12 12 12 13
12 12 12 12 12 12 12 12 12 12 12 12 13 11 12 11
12 12 12 11 11 14 12 12 12 10 12 12 12 13 12 12
12 12 13 13 12 12 10 11 12 8 8 7 6 7 6 7
6 5 4 1

Appendix A. Supplementary data

The following is the Supplementary data to this article:

Multimedia component 1
mmc1.txt (1.9KB, txt)

References

1. Madow W.G., Madow L.H. On the theory of systematic sampling. Ann. Math. Stat. 1944;15:1–24.
2. Subramani J., Gupta S.N., Prabavathy G. Circular systematic sampling in the presence of linear trend. Am. J. Math. Manag. Sci. 2014;33:1–19.
3. Chang H.J., Huang K.C. Remainder linear systematic sampling. Sankhya: Indian J. Stat. Ser. B. 2000;62:249–256.
4. Subramani J. Diagonal systematic sampling scheme for finite populations. J. Indian Soc. Agric. Stat. 2000;53:187–195.
5. Sampath S., Varalakshmi V. Diagonal circular systematic sampling. Model Assist. Stat. Appl. 2008;3:345–352.
6. Subramani J. Further results on diagonal systematic sampling for finite populations. J. Indian Soc. Agric. Stat. 2009;63:277–282.
7. Subramani J. A modification on linear systematic sampling for odd sample size. Bonfring Int. J. Data Min. 2012;2:32–36.
8. Subramani J., Gupta S.N. Generalized modified linear systematic sampling scheme for finite populations. Hacet. J. Math. Stat. 2014;43:529–542.
9. Azeem M., Khan Z. An improved diagonal-cum-linear systematic sampling scheme. Indian J. Econ. Bus. 2021;20:1197–1210.
10. Madow W.G. On the theory of systematic sampling III: comparison of centered and random start systematic sampling. Ann. Math. Stat. 1953;24:101–106.
11. Yates F. Systematic sampling. Philos. Trans. R. Soc. A. 1948;241:345–377. doi: 10.1098/rsta.1948.0023.
12. Bellhouse D.R., Rao J.N.K. Systematic sampling in the presence of linear trends. Biometrika. 1975;62:694–697.
13. Bellhouse D.R. Systematic sampling. Handb. Stat. 1988:125–145.
14. Fountain R.L., Pathak P.L. Systematic and non-random sampling in the presence of linear trends. Commun. Stat. Theor. Methods. 1989;18:2511–2526.
15. Sampath S., Uthayakumaran N. Markov systematic sampling. Biom. J. 1998;40:883–895.
16. Subramani J. A modification on linear systematic sampling. Model Assist. Stat. Appl. 2013;8:215–227.
17. Ashwood F., Vanguelova E.I., Benham S., Butt K.R. Developing a systematic sampling method for earthworms in and around deadwood. For. Ecosyst. 2019;6:1–12. doi: 10.1186/s40663-019-0193-z.
18. Connor M.J., Miah S., Jayadevan R., Khoo C.C., Eldred-Evans D., Shah T., Ahmed H.U., Marks L. Value of systematic sampling in an mp-MRI targeted prostate biopsy strategy. Transl. Androl. Urol. 2020;9:1501–1509. doi: 10.21037/tau.2019.07.16.
19. Zamanzade E., Mahdizadeh M., Samawi H.M. Efficient estimation of cumulative distribution function using moving extreme ranked set sampling with application to reliability. AStA Adv. Stat. Anal. 2020;104:485–502. doi: 10.1007/s10182-020-00368-3.
20. Mahdizadeh M., Zamanzade E. Estimation of a symmetric distribution function in multistage ranked set sampling. Stat. Pap. 2020;61:851–867. doi: 10.1007/s00362-017-0965-x.
21. Pandey K.K., Shukla D. Stratified linear systematic sampling based clustering approach for detection of financial risk group by mining of big data. Int. J. Syst. Assur. Eng. Manag. 2022;13:1239–1253. doi: 10.1007/s13198-021-01424-0.
22. Azeem M., Asif M., Ilyas M., Rafiq M., Ahmad S. An efficient modification to diagonal systematic sampling for finite populations. AIMS Math. 2021;6:5193–5204.
23. Horvitz D.G., Thompson D.J. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 1952;47:663–685.
24. Sen A.R. On the estimate of variance in sampling with varying probabilities. J. Indian Soc. Agric. Stat. 1953;5:119–127.
25. Yates F., Grundy P.M. Selection without replacement from within strata with probability proportional to size. J. R. Stat. Soc. Ser. B. 1953;15:253–261.
26. Yang R., Chen W., Yao D., Long C., Dong Y., Shen B. The efficiency of ranked set sampling design for parameter estimation for the log-extended exponential–geometric distribution. Iran. J. Sci. Technol. Trans. A-Science. 2020;44:497–507.
27. Pandey T.K., Kumar V. Systematic sampling for non-linear trend in milk yield data. J. Reliabil. Stat. Stud. 2014;7:157–168.



Articles from Heliyon are provided here courtesy of Elsevier
