Skip to main content
Entropy logoLink to Entropy
. 2023 Apr 3;25(4):611. doi: 10.3390/e25040611

Sample Size Calculations in Simple Linear Regression: A New Approach

Tianyuan Guan 1,2, Mohammed Khorshed Alam 2, Marepalli Bhaskara Rao 2,*
Editor: Yuehua Wu
PMCID: PMC10137477  PMID: 37190399

Abstract

The problem tackled is the determination of sample size for a given level and power in the context of a simple linear regression model. The standard approach deals with planned experiments in which the predictor X is observed for a number n of times and the corresponding observations on the response variable Y are to be drawn. The statistic that is used is built on the least squares’ estimator of the slope parameter. Its conditional distribution given the data on the predictor X is utilized for sample size calculations. This is problematic. The sample size n is already presaged and the data on X is fixed. In unplanned experiments, in which both X and Y are to be sampled simultaneously, we do not have data on the predictor X yet. This conundrum has been discussed in several papers and books with no solution proposed. We overcome the problem by determining the exact unconditional distribution of the test statistic in the unplanned case. We have provided tables of critical values for given levels of significance following the exact distribution. In addition, we show that the distribution of the test statistic depends only on the effect size, which is defined precisely in the paper.

Keywords: least squares estimator, level, power, unconditional distribution

1. Introduction

Multiple regression is one of the core methodologies in statistics. Power computation and sample size determination have become integral part of many research proposals submitted for funding. Funding agencies such as UKRI (UK Research and Innovation) and NIH (National Institutes of Health) have been demanding sample size calculations in all prospective proposals. Regression has a long history dating back to Galton [1]. Horton and Switzer [2] reported that 51% of research articles published in the New England Journal of Medicine during May 2004 have Multiple Regression as one of the methods used. The figure for power analysis is at 39%.

In this paper, we focus on power computation in the context of simple linear regression. The current approach in power computations lacks justification. We will point out difficulties in this setting [3].

Simple linear regression is ubiquitous in pediatric clinical diagnostics. The model sets standards for normal growth in children on several metrics [4]. As an illustration, a pediatrician wants to check whether the lung function of a 13-year-old patient is normal. Data is to be collected on healthy subjects in the age range 12–14 years with response,

Y = FEV (Forced Expiratory Volume)

and predictor,

X = Height,

which is an example of an unplanned experiment.

In order to trust the model, we need to decide on the sample size, which in turn, depends on the level of significance, power, and effect size.

First, we investigate the setting under the simple linear regression paradigm. The model has two entities X, the predictor, and Y, the response variable. It is stated as

Y|X~Nβ0+β1X, σ2

for some β0, β1, and σ2>0. The null hypothesis of interest is H0:β1=0 against the alternative H1:β10. What should be the required sample size, n, for a given level of significance α, power 1-β, and at the alternative value A of β1. Let (X1, Y1), (X2, Y2), …, (Xn, Yn), be a potential sample for the testing problem. Let β^1 be the least squares estimator of β1, i.e.,

β^1=SXYSXX

where

SXY=i=1nXiX¯YiY¯

and

SXX=i=1nXiX¯2

Let RSS be the residual sum of squares, i.e.,

RSS=i=1nYiY¯β^1XiX¯2

For testing the null hypothesis H0, the following test statistic is used:

T=β^1SXX/RSS/n2.

Under the null hypothesis, conditioned on the X-data, T has a t-distribution with n − 2 degrees of freedom. Under the alternative value β1 = A, T has a non-central t-distribution with degrees of freedom n − 2, and non-centrality parameter λ=ASxx /σ.

We reject the null hypothesis if and only if T>tn2,1α2 where tn2,1α2 is such that the area to the left of Student’s t-curve on (n − 2) degrees of freedom is 1 − α/2.

The power formula is given by

PowerA=PrReject H0|β1=A=PrT>tn2,1α2|β1=A.

We can set power equal to 1-β and solve for n. This would work as long as we know what λ=ASXX /σ is. This would require knowledge of the alternative value of β1, σ2, and SXX. We will not know what SXX is, prior to data collection, in the unplanned experiments. Equivalently, one should spell out what λ is. This is a tall order. Adcock [5] recognized these problems. Some software and textbooks assume that 1/ni=1n(XiX¯)2 is known. For example, the software PASS [6] and nQuery [7] proceed this way. To overcome these difficulties, we proceed with deriving the exact unconditional distribution of a variant of T. This requires a knowledge of the distribution of X. Let σX2 be the variances of X.

Modify the test statistic.

T=β^1σ^X/σ^,  (1)

where σ^2=RSS/n2 andσ^X2=SXX/n1.

We obtain the unconditional distribution of T under β1 = 0 as well as under β1 = A ≠ 0. We assume X~N (µx, σX2), both parameters unknown. Under this assumption, the distribution of T is derived.

In due course, we will show the distribution of T when β1 = A ≠ 0 depends only on δ = AσX/σ, which we can deem as the effect size.

The five-parameter model now is:

Y|X~Nβ0+β1X, σ2 
X~N(µx, σX2).

Note that the vector (X, Y) has a bivariate normal distribution.

The paper is organized as follows. In Section 2, we provide a literature review. In Section 3, we outline the main results. We derive the unconditional distribution of T under the null hypothesis in Section 3.1. In Section 3.2, we calculate critical values using the main results. In Section 3.3, we lay out the sample sizes required for a given level, power, and effect size δ =AσX/σ. In Section 4, we summarize the results and draw conclusions. The computational details along with the R code [8] are presented in the Supplementary Materials.

2. Literature Review

Ryan [3] has pointed out difficulties in power calculations in the environment of simple linear regression. The problem is how we handle the predictor X. Adcock [5] has looked at some possible scenarios. One scenario is that the investigator knows the Xi-values (deterministic) for every sample size n. In such a case, the test statistic

β^1SXX/RSS/n2 (2)

is eminently usable for power calculations. Its (conditional) null and non-null distributions have been worked out explicitly. The conditional approach is also followed by Dupont et al. [9], Draper et al. [10], Hsieh et al. [11], Maxwell [12], and Thigpen [13].

As an alternative to the test statistic (2), we can build a test based on the sample correlation coefficient ρ^ [14], under the joint normality of X and Y. The null and non-null distributions of the underlying test statistic based on ρ^ have been worked out explicitly. In our consulting work, many researchers prefer to use the test based on β^1. It is a choice between causality and association [3,14,15,16,17,18,19,20,21]. The hypotheses H0: β1 = 0 and H0: ρ  = 0 under bivariate normality are equivalent, but the test statistics are different. It is easy to determine sample size under the correlation context [14]. However, this sample size cannot be offered for testing the hypothesis on the slope. The power is less. In other words, test hopping is not permissible; i.e., they are two different tests with distinct power functions.

3. Outline of Results

We will now derive the unconditional distribution of β^1, which will be instrumental in sample size calculations. We use the test statistic T = β^1σ^X/σ^.

Under the null hypothesis β1 = 0, we show that

T2~n2n1W1W4W2W3,

where W1~χ12, W2~χn12, W3~χn22 and W4~χn12, with the Wi values being mutually independent. It follows implicitly that

T ~ n2n1U1U2/U3

with U1, U2, U3 independently distributed, U1~tn1, U2~χn1, and U3~χn2, and where χn1 is the χ distribution with n1 degrees of freedom

We use this result to obtain the critical values of the test based on T, for given levels. For power and sample size computations, we need the distribution of T for any given value of β1. The distribution depends on the alternative values of β1, σX2 and σ2. It turns out that the distribution depends only on δ = β1σX/σ, which we can deem as the effect size. The specification of δ facilitates computation of power. Despite all these deliberations, no magic explicit formula for power surfaces. Knowing the distribution of T2 when δ is spelled out, the pain is eased a little bit.

3.1. Distributional Results

In this section, we will derive the distribution of T of (1) unconditionally. The following series of steps will give the desired result.

  1. Given X1, X2, …, Xn, β^1 has a normal distribution with mean β1 and variance σ2/SXX and β^1 and RSS are independent.

  2. Uncoditionally, RSS/σ2~χn22

  3. SXX/σX2~χn12.

  4. RSS and SXX are independent.

More generally, we obtain the distribution of T=β^1β1σ^X/σ^ for a given value of β1.

The joint density function of β^1 and SXX:

gβ^1,SXX=SXX2πσexpSXX2σ2β^1β121Гn122n12 expSXX2σX2SXXσX2n1211σX2,<β^1<, 0<SXX<

The (unconditional) marginal density of β^1 is given by:

fβ^1=12122n12πГn12σσX2n120SXXn21expSXX2 β1^β12σ2+1σX2dSXX=Гn2πГn12σσX2n1211σX2+β^1β12σ2n2=σXB12,n12σ11+β^1β12σX2σ2n2,             <β^1<

Some properties of this density are clear to observe. For example, the distribution is symmetric around the true value β1. If n = 2, the distribution is Cauchy. In addition,

σXσβ^1β1n1 ~tn1.

Further, if n > 3, unconditionally,

  1. Eβ^1=β1 and Varβ^1=(σ2/σX2)n31;

  2. In the conditional set-up,
    Eβ^1|X1,X2,,Xn=β1
    Varβ^1|X1,X2,,Xn=σ2/SXX;
  3. The random variable U=β^1β1σX/σ has the probability density function:
    fU=1B12,n1211+U2n2, <U<;
  4. It follows that U2~W1/W2, where W1~χ12 and W2~χn12, with W1 and W2 being independent;

  5. Exact distribution of T2: note that β^1β1/σ and σ^X2 are independent;

  6.  
    T2=β^1β12σ^X2/σ^2=β^1β12σX2/σ2(σ2/σ^2)σ^X2/σX2~(W1/W2)n2/W3W4/n1,
    where W3~χn22 and W4~χn12, and with W1,W2,W3, and W4 being independent.
    In short,
    T2 ~ n2/n1(W1W4)/(W2W3) (3)
  7. It follows that:
    ET2=n2/{n3n4 
  8. An alternative form of the distribution [22]:
    n1/n2T2 ~ (W1W4)/(W2W3)~ BetaII12,n22BetaIIn12,n12,
    where BetaII signifies the beta distribution of the second kind.

3.2. Critical Values

We obtain the critical values of the test based on the test statistic T=β^1σ^X/σ^ for three levels of significance. We denote the critical value by Cn,α. The critical value Cn,α satisfies the equation:

α=Prβ^1σ^Xσ^>Cn,α|H0:β1=0=Prβ^12σ^X2σ^2>Cn,α2|H0:β1=0

Under H0,

T2=β^12σ^X2σ^2~n2n1W1W4W2W3,

where W1~χ12, W2~χn12, W3~χn22 and W4~χn12, with Wi values being independent.

There are two options. One is using the pdf of n2/n1  T2. Following Jambunathan [22], one can write the pdf of the product U*V of the random variables U and V with U~BetaII1/2 ,n2/2, V~ BetaIIn1/2,n1/2, and U and V being independent. The pdf is in the form of a double integral and its evaluation would require the use of a quadrature formula with the attendant errors of approximation. The second option is to determine the distribution of T2 by sampling extensively the components that make up T2 via Monte-Carlo. We have pursued the second option. The critical values are tabulated in File S1.

One can also obtain the critical value Cn,α via the asymptotic distribution of T. One benefit of our derivation of the exact distribution is that if n is large, and null hypothesis is true,

T~Normal (0, n2/{n3(n4)}), approximately.

There are several ways to establish the asymptotic normality of T. The exact unconditional distribution of n1 β^1β1σX/σ is tn−1, which is asymptotically N (0, 1). Then we use the fact that σ^X is consistent for σX and that σ^ is consistent for σ. Since we know the variance of T exactly, we use this variance in the description of the asymptotic distribution of T. We can now calculate the critical values, as well as those coming from the exact distribution, following the asymptotic distribution.

In File S1, we report the average critical values Cn,α along with the critical values stemming from the asymptotic theory. A description of these asymptotic critical values is provided below.

Critical values from the normal approximation:

Level Critical Value Formula Verbal description in File S1
10% 1.645×n2/n3n4 10% normal
5% 1.96×n2/n3n4 5% normal
1% 2.576×n2/n3n4 1% normal

Comments on File S1: The Normal Critical Value column is explained.

  1. Normal critical value 10% = critical value coming from the asymptotic distribution when α = 0.10.

  2. Normal critical value 5% = critical value coming from the asymptotic distribution when α = 0.05.

  3. Normal critical value 1% = critical value coming from the asymptotic distribution when α = 0.01.

  4. Critical value 10% = critical value coming from the exact distribution of T when α = 0.10.

  5. Critical value 5% = critical value coming from the exact distribution of T when α = 0.05.

  6. Critical value 1% = critical value coming from the exact distribution of T when α = 0.01.

  7. When α = 0.10, |Normal critical value 10%: Critical value 10%| ≤ 0.001 for n ≥ 50. One can enjoy the benefit of normal approximation when n ≥ 50.

  8. When α = 0.05, |Normal critical value 5%: Critical value 5%| ≤ 0.001 for n ≥ 89. One can enjoy the benefit of normal approximation when n ≥ 89.

  9. For α = 0.01, Table 1 is not informative when |Normal critical value 1%: Critical value 1%| ≤ 0.001.

3.3. Sample Size and Power

For a given level α, sample size n, and alternative value of β1 = A, power is given by

Power (A)=Pr(β^1σ^X/σ^>Cn,α | β1=A). 

Suppose 1 − β is the specified power. For the sample size, we set

1β=Pr(β^1σ^X/σ^>Cn,α | β1=A).

and solve for n. We will need the distribution of β^1σ^X/σ^, when β1=A. Rewrite

β^1σ^X/σ^=(β^1 β1)σ^X/σ^+β1σ^X/σ^. 

The distribution of (β^1 β1)σ^X/σ^ is described in Section 3.1 and it is free of the parameters of the regression model. Consequently, the random variables (β^1 β1)σ^X/σ^ and β1σ^X/σ^ are independently distributed. Since σ^ and σ^X are independently distributed,

β1σ^X/σ^2 d β12σX2/n1W5n2/σ21/W6=β1σX/σ2n2/n1W5/W6,

with W5~χn12, W6~χn22, and W5 and W6 being independent.

An important fact emerges from these deliberations in that the distribution of (β^1 β1)σ^X/σ^ + β1σ^X/σ^ depends only on δ = β1σX/σ, which we declare as the effect size.

In short, when β1 = A ≠ 0, the key steps are:

  • T = (β^1 β1)σ^X/σ^ + β1σ^X/σ^

  • with {(β^1 β1)σ^X/σ^}2~n2/n1(W1W4)/(W2W3),

  • (β1σ^X/σ^)2~AσX/σ2n2/n1W5/W6,

  • (β^1 β1)σ^X/σ^ and β1σ^X/σ^ are independent,

and the distribution of T depends only on n and effect size δ = AσX/σ.

In spite of all these labors, the distribution of T is not amenable to direct and simple computation of power.

We simulate the regression model for power computations. Simulations are greatly simplified when we exploit the key nature of the alternative distribution, namely, that it depends only on n and δ. Simulations are reported in the Supplementary Materials. Sample sizes are tabulated in Table 1, Table 2 and Table 3.

Table 2.

Sample Size for Given Effect Size, Power, Level of Significance 5%, Mean of Power in the Validation Step, and its Standard Deviation.

α ES = β1 ∗ (σx/σ) Power n Mean Sd
0.05 0.1 80% 790 0.8006 0.0129
90% 1080 0.9054 0.0088
95% 1350 0.9557 0.0067
99% 1850 0.9898 0.0032
0.05 0.2 80% 199 0.797 0.0133
90% 272 0.9039 0.0094
95% 330 0.9497 0.0069
99% 450 0.9891 0.0033
0.05 0.3 80% 91 0.7978 0.0124
90% 123 0.9028 0.0094
95% 150 0.9505 0.0067
99% 220 0.992 0.0028
0.05 0.4 80% 53 0.773 0.0128
90% 70 0.8966 0.0096
95% 87 0.9494 0.0071
99% 121 0.9891 0.0034
0.05 0.5 80% 36 0.8051 0.0124
90% 48 0.9095 0.0091
95% 58 0.95 0.0068
99% 79 0.9888 0.0033
0.05 0.6 80% 26 0.8005 0.0124
90% 34 0.8985 0.0094
95% 43 0.9547 0.0066
99% 59 0.9901 0.0031

Table 3.

Sample Size for Given Effect Size, Power, Level of Significance 1%, Mean of Power in the Validation Step, and its Standard Deviation.

α ES = β1 ∗ (σx/σ) Power n Mean Sd
0.01 0.1 80% 1180 0.8026 0.0124
90% 1500 0.9015 0.0095
95% 1760 0.946 0.0072
99% 2440 0.9906 0.0031
0.01 0.2 80% 301 0.8045 0.0121
90% 388 0.9046 0.0093
95% 458 0.9529 0.0065
99% 620 0.991 0.0031
0.01 0.3 80% 136 0.8044 0.0129
90% 172 0.9012 0.0089
95% 199 0.9432 0.0071
99% 265 0.9872 0.0034
0.01 0.4 80% 78 0.8017 0.0124
90% 95 0.8856 0.0099
95% 118 0.949 0.007
99% 158 0.9892 0.0033
0.01 0.5 80% 51 0.8042 0.0126
90% 64 0.9011 0.0099
95% 77 0.9464 0.007
99% 104 0.9891 0.0032
0.01 0.6 80% 37 0.7975 0.0125
90% 48 0.906 0.0089
95% 56 0.9485 0.007
99% 73 0.9874 0.0035

Table 1.

Sample Size for Given Effect Size, Power, Level of Significance 10%, Mean of Power in the Validation Step, and its Standard Deviation.

α ES = β1 ∗ (σx/σ) Power n Mean Sd
0.1 0.1 80% 620 0.7993 0.013
90% 870 0.9027 0.0095
95% 1120 0.9546 0.0067
99% 1690 0.993 0.0027
0.1 0.2 80% 161 0.8259 0.0123
90% 219 0.90007 0.0096
95% 274 0.949 0.0069
99% 440 0.0039 0.0024
0.1 0.3 80% 73 0.8017 0.0124
90% 100 0.9006 0.00995
95% 124 0.9475 0.0071
99% 195 0.931 0.0026
0.1 0.4 80% 43 0.8031 0.0126
90% 60 0.9073 0.0093
95% 72 0.9518 0.0073
99% 105 0.9896 0.0031
0.1 0.5 80% 29 0.8045 0.0134
90% 39 0.9003 0.0099
95% 48 0.947 0.0068
99% 69 0.989 0.0034
0.1 0.6 80% 21 0.8 0.0129
90% 28 0.8961 0.0096
95% 35 0.9476 0.0069
99% 52 0.9911 0.003

Comments on Table 1, Table 2 and Table 3:

  1. The first column in each table entertains three types of effect sizes: small (0.1, 0.2); medium (0.3, 0.4); and large (0.5, 0.6) [15].

  2. The second column in each table lays out the powers entertained.

  3. The third column in each table spells out the requisite sample size.

  4. The fourth column is the fruit of our effort to validate the sample size. At the ascertained sample size, data are generated under the specifications, power calculated, and power averaged over thousand times.

  5. The fifth column records the standard deviation of the thousand powers calculated.

  6. We are satisfied that the sample sizes laid out are holding true.

4. Discussion

A simple linear regression is a five-parameter model spelling out causality between two quantitative variables Y and X typified by:

Y|X~Nβ0+β1X, σ2X~NµX, σX2 

for some parameters β0, β1, μX, σ2>0, and σX2>0. The goal is to sample (X, Y) for testing H0:β1=0 versus the alternative H1:β10. For determining sample size, we need the level of significance α, power 1 − β, and the effect size δ=AσX/σ, where A is the given alternative value of β1. The test statistic T used here is the one based on the least squares’ estimator β^1 of  β1.

The regression model, as originally formulated, is a conditional model, i.e., Y|X~Nβ0+β1X, σ2. In practice, in a planned experiment, the experimenter selects x1,x2,. . .,xn of X, and observes one or more Ys from the conditional distribution of Y|xi for each i. Thus, the sample size n has already been chosen. The statistic β^1SXX/RSS/n2 is used for testing H0:β1=0 against the alternative H1:β10. The conditional distribution of the test statistic given the data on X is Student’s t with n-2 degree of freedom under H0, and the distribution is non-central Student’s t with n-2 degrees of freedom and non-centrality parameter ASxx/σ under H1: β1=A. The alternative distribution can be used to calculate the power of the test at β1=A, and nothing more. The entities n and SXX are already in place, and σ has to be spelled out. The value of A is provided by the experimenter as the one of clinical significance. From the consulting experience of one of the authors, the experimenter usually comes up with value for σ from his/her pilot study.

In some statistical circles [6,7], the non = null distribution is used to calculate the sample size with the desired power, with SXX remaining the same. This is controversial and discussed in [3,5,13].

We are dealing with unplanned experiments in which both X and Y are sampled together. Unplanned experiments are very common in clinical studies [4]. The effect size, in this context, is the multiple of the alternative value of β1 by the ratio of the two standard deviations of the model.

The current practice demands α, 1 − β, A, σ and SXX, which we do not have. Specification of SXX is avoided by determining the unconditional distribution of

T=β^1σ^X/σ^.

Exploiting the unconditional distribution of T, we calculated the critical values and required sample size. The unconditional distribution under the alternative depends on the effect size δ = β1σX/σ, as well as n and α. As a contrast, popular software such as PASS [6] and nQuery [7] use the conditional distribution of the test statistic T* given the data on X, for calculating sample size.

An additional feature of our paper is that we provide a comprehensive table of critical values and sample sizes, unlike commercial software.

The main result that the non-null distribution of the test statistic T depends only on the effect size δ has an echo in other inference problems. For example, when testing μ1=μ2 under the normality and common variance σ2 assumptions, the non-null distribution of the two-sample t-statistic depends only on the effect size λ=(μ1μ2)/σ. This result, in spirit, is like ours. We have archived our findings for comments and insights [23].

We trust that the tables provided will help researchers to calculate sample size in the context of simple linear regression in unplanned experiments avoiding the controversies that have been problematic till now. We will continue to study how sample sizes are contrasted between one test based on the slope parameter of the model and one based on the correlation coefficient.

Acknowledgments

The authors are immensely indebted to the four reviewers, who exhorted them to bring the paper into a sharp focus highlighting its strength. One of the reviewers identified the true pulse of the paper and commented that the paper is in the ambit of an unplanned simple linear regression domain in contrast to the traditional planned simple linear regression armory.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e25040611/s1.

Author Contributions

The original sample size problem came from the consulting work of M.K.A. The project was designed by M.K.A. The bulk of derivations and computations were done by T.G. This was part of her thesis work. M.B.R. was the mentor. Conceptualization, M.K.A.; methodology, M.B.R.; software, T.G.; simulations, T.G.; writing—original draft preparation, T.G.; writing—review and editing, M.B.R.; supervision, M.B.R. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Galton F. Regression towards Mediocrity in Hereditary Stature. J. Anthropol. Inst. Great Br. Irel. 1886;15:246–263. doi: 10.2307/2841583. [DOI] [Google Scholar]
  • 2.Horton N.J., Switzer S.S. Statistical Methods in the Journal. N. Engl. J. Med. 2005;353:1977–1979. doi: 10.1056/NEJM200511033531823. [DOI] [PubMed] [Google Scholar]
  • 3.Ryan T.P. Sample Size Determination and Power. John Wiley and Sons; Hoboken, NJ, USA: 2013. [Google Scholar]
  • 4.Gripp K.W. Handbook of Physical Measurements. 3rd ed. Oxford University Press; Oxford, UK: 2013. [Google Scholar]
  • 5.Adcock C.J. Sample size determination: A review. J. R. Stat. Soc. D. 1997;46:261–283. doi: 10.1111/1467-9884.00082. [DOI] [Google Scholar]
  • 6.PASS . Power Analysis and Sample Size Software. NCSS, LLC.; Kaysville, Utah: 2021. [(accessed on 1 January 2020)]. Available online: https://www.ncss.com/software/pass/ [Google Scholar]
  • 7.nQuery . Sample Size and Power Calculation. Statsols (Statistical Solutions Ltd.); Cork, Ireland: 2017. [Google Scholar]
  • 8.R Core Team . R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2017. [(accessed on 1 January 2020)]. Available online: https://www.R-project.org/ [Google Scholar]
  • 9.Dupont W.D., Plummer W.D. Power and Sample Size Calculations for Studies Involving Linear Regression. Control. Clin. Trials. 1998;19:589–601. doi: 10.1016/S0197-2456(98)00037-3. [DOI] [PubMed] [Google Scholar]
  • 10.Draper N.R., Smith H. Applied Regression Analysis. 2nd ed. Wiley; New York, NY, USA: 1981. [Google Scholar]
  • 11.Hsieh F., Bloch D., Larsen M. A simple method of sample size calculation for linear and logistic regression. Stat. Med. 1998;17:1623–1634. doi: 10.1002/(SICI)1097-0258(19980730)17:14<1623::AID-SIM871>3.0.CO;2-S. [DOI] [PubMed] [Google Scholar]
  • 12.Maxwell S.E. Sample Size and Multiple Regression Analysis. Psychol. Methods. 2000;5:434–458. doi: 10.1037/1082-989X.5.4.434. [DOI] [PubMed] [Google Scholar]
  • 13.Thigpen C.C. A Sample-Size Problem in Simple Linear Regression. Am. Stat. 1987;41:214–215. [Google Scholar]
  • 14.Gatsonis C., Sampson A.R. Multiple Correlation: Exact Power and Sample Size Calculations. Psychol. Bull. 1989;106:516–524. doi: 10.1037/0033-2909.106.3.516. [DOI] [PubMed] [Google Scholar]
  • 15.Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. L. Erlbaum Associates; Hillsdale, NJ, USA: 1988. [Google Scholar]
  • 16.SAS Analytics Software and Solution—Version 9.4. [(accessed on 1 January 2023)]. Available online: https://support.sas.com/software/94/
  • 17.Krishnamoorthy K., Xia Y. Sample size calculation for estimating or testing a nonzero squared multiple correlation coefficient. Multivar. Behav. Res. 2008;43:382–410. doi: 10.1080/00273170802285727. [DOI] [PubMed] [Google Scholar]
  • 18.Mendoza J.L., Stafford K.L. Confidence Intervals, Power Calculation, and Sample Size Estimation for the Squared Multiple Correlation Coefficient under the Fixed and Random Regression Models: A Computer Program and Useful Standard Tables. Educ. Psychol. Meas. 2001;61:650–667. doi: 10.1177/00131640121971419. [DOI] [Google Scholar]
  • 19.Kelley K. Sample size planning for the squared multiple correlation coefficient: Accuracy in parameter estimation via narrow confidence intervals. Multivar. Behav. Res. 2008;43:524–555. doi: 10.1080/00273170802490632. [DOI] [PubMed] [Google Scholar]
  • 20.Shieh G. A Unified Approach to Power Calculation and Sample Size Determination for Random Regression Models. Psychometrika. 2007;72:347–360. doi: 10.1007/s11336-007-9012-5. [DOI] [Google Scholar]
  • 21.Shieh G. Sample size requirements for interval estimation of the strength of association effect sizes in multiple regression analysis. Psicothema. 2013;25:402–407. doi: 10.7334/psicothema2012.221. [DOI] [PubMed] [Google Scholar]
  • 22.Jambunathan M.V. Some Properties of Beta and Gamma Distributions. Ann. Math. Stat. 1954;25:401–405. doi: 10.1214/aoms/1177728800. [DOI] [Google Scholar]
  • 23.Guan T., Alam M.K., Rao M.B. Sample Size Calculations in Simple Linear Regression: Trials and Tribulations. arXiv. 2019 doi: 10.3390/e25040611.1907.10569 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Data Availability Statement

Not applicable.


Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)

RESOURCES