Author manuscript; available in PMC: 2020 Mar 20.
Published in final edited form as: Am Stat. 2019 Mar 20;73(Suppl 1):129–134. doi: 10.1080/00031305.2018.1518788

Evidence from marginally significant t statistics

Valen E Johnson
PMCID: PMC6527351  NIHMSID: NIHMS1515708  PMID: 31123367

Abstract

This article examines the evidence contained in t statistics that are marginally significant in 5% tests. The bases for evaluating evidence are likelihood ratios and integrated likelihood ratios, computed under a variety of assumptions regarding the alternative hypotheses in null hypothesis significance tests. Likelihood ratios and integrated likelihood ratios provide a useful measure of the evidence in favor of competing hypotheses because they can be interpreted as representing the ratio of the probabilities that each hypothesis assigns to observed data. When they are either very large or very small, they suggest that one hypothesis is much better than the other in predicting observed data. If they are close to 1.0, then both hypotheses provide approximately equally valid explanations for observed data. I find that p-values that are close to 0.05 (i.e., that are “marginally significant”) correspond to integrated likelihood ratios that are bounded by approximately 7 in two-sided tests, and by approximately 4 in one-sided tests.

The modest magnitude of integrated likelihood ratios corresponding to p-values close to 0.05 clearly suggests that higher standards of evidence are needed to support claims of novel discoveries and new effects.

1. Introduction

In a pair of recent articles (Johnson 2013; Benjamin et al. 2017), my co-authors and I recommended that the threshold for declaring “statistical significance” be changed from 0.05 to 0.005. Criticisms of this proposal have focused on comparisons of type 1 and type 2 errors, false negative and false positive rates, and other more sophisticated decision-theoretic quantities. There is also a persistent misunderstanding regarding the amount of statistical evidence contained in p-values, and many scientists are unwilling to adjust their interpretation of p-values based on more direct measures of evidence. For example, Lakens et al. (2018) state, “given that the marginal likelihood is sensitive to different choices for the models compared, redefining alpha levels as a function of the Bayes factor is undesirable.” Indeed, many non-statisticians mistakenly interpret p-values as the probability that a null hypothesis is true, and many more are not aware of the relatively arbitrary manner in which the value of 0.05 was chosen to define statistical significance.

In this article I examine the fundamental question, How much evidence is contained in a t statistic when the p-value is close to 0.05? Ideally, this question would be answered by providing a formula to compute the probability that a null hypothesis is true based on the p-value. That probability is the quantity that scientists are most interested in knowing. Unfortunately, there is no unique mapping from p-values to the probability that a null hypothesis is true, and so this article instead focuses on providing upper bounds on likelihood ratios and integrated likelihood ratios when a p-value of 0.05 is observed. Loosely speaking, a likelihood ratio (LR) represents the ratio of the probability assigned to data under an alternative hypothesis to the probability assigned to data under the null hypothesis. In Bayesian analyses, the LR is directly related to the probability that each hypothesis is true. When the LR is large, the alternative hypothesis provides a better explanation for observed data than the null hypothesis does; when the LR is small, the null provides a better explanation. When the LR is close to 1.0, both hypotheses provide approximately equally valid explanations for observed data. Alternative hypotheses refer to the presence of an effect; the null hypothesis corresponds to no effect. Likelihood ratios can only be computed when all model parameters are completely specified under both hypotheses.
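As a brief illustration of how an LR relates to the probabilities of the hypotheses, Bayes' theorem gives the identity below. The equal prior odds used in the worked numbers are an assumption made here purely for illustration; they are not values proposed in this article.

$$\frac{P(H_1 \mid x)}{P(H_0 \mid x)} = \mathrm{LR} \times \frac{P(H_1)}{P(H_0)}.$$

Under the assumed prior odds of 1, an LR of 4 corresponds to $P(H_1 \mid x) = 4/5 = 0.8$, and an LR of 7 corresponds to $P(H_1 \mid x) = 7/8 = 0.875$.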

When LR’s cannot be computed, integrated likelihood ratios (ILR’s) can be computed instead. Like LR’s, ILR’s reflect the relative probability assigned to the data by alternative and null hypotheses and thus provide a direct measure of evidence regarding the relative validity of two competing hypotheses. The term integrated likelihood ratio (ILR) is used to describe the ratio of marginal densities obtained by integrating out nuisance parameters that define one or both hypotheses. Integrated likelihoods are one of the two main approaches to handling nuisance parameters, the other being maximization (e.g., profile likelihoods). Integrated likelihoods are used in both frequentist and Bayesian settings, and often have desirable properties not possessed by maximization methods (Kalbfleisch and Sprott 1970; Berger et al. 1999). In Bayesian settings ILRs are called Bayes factors, but due to the data-dependent nature of the alternative hypotheses considered here, resulting ILRs are not consistent with standard Bayesian practice and so the term Bayes factor has been avoided.

Most of the alternative hypotheses examined in this article have been chosen to bias LR’s and ILR’s in their favor. Similar to earlier analyses in, for example, Edwards et al. (1963), the alternative hypotheses have been chosen to make a p-value of 0.05 look as “significant” as possible. For p-values close to 0.05, I find that LR’s and ILR’s for two-sided tests are less than 7, and LR’s and ILR’s for one-sided tests are less than 4. When ILR’s are calculated as part of a Bayesian analysis, many statisticians feel that values greater than 10 or 20 are required to provide strong evidence in favor of one hypothesis over another (Jeffreys 1961; Kass and Raftery 1995).

2. One-sided tests

To begin, consider one-sided tests of a normal mean. Let $X_1, \ldots, X_n$ denote independent random variables with $N(\mu, \sigma^2)$ distributions. For simplicity, suppose that the null and alternative hypotheses are specified as follows:

$$H_0: \mu = 0, \qquad H_1: \mu \sim N(a, g\sigma^2). \tag{1}$$

A normal distribution centered on $a$ with variance $g$ times the observational variance is used to represent the alternative hypothesis. When $g = 0$, the alternative hypothesis becomes a simple hypothesis, i.e., a point mass prior centered on $a$ (see footnote 1).

The ILR’s considered here for composite hypotheses (i.e., $g > 0$) are computed under the assumption that the marginal distribution on the variance parameter $\sigma^2$ is proportional to $1/\sigma^2$. This assumption corresponds to an improper, non-informative prior on the variance parameter and is applied to both the null and alternative hypotheses. It also results in certain numerical (but not philosophical) equivalences between standard frequentist and Bayesian analyses. For example, if a non-informative prior density is also imposed on $\mu$, the Bayesian posterior density for $\mu$ is a standard t density. Further discussion of non-informative and improper priors on variance parameters can be found in, for example, Berger and Bernardo (1992).

With these assumptions, the marginal density of the data $X = \{X_1, \ldots, X_n\}$ under the alternative hypothesis, obtained by integrating out $\mu$ and the nuisance parameter $\sigma^2$, can be expressed as

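$$m_1(X) = c\,(ng+1)^{-1/2}\left[1 + \frac{t_a^2}{(ng+1)(n-1)}\right]^{-n/2}, \tag{2}$$

where

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2, \qquad t_a = \frac{\bar{X} - a}{s/\sqrt{n}}, \tag{3}$$

and

$$c = \left[\pi(n-1)s^2\right]^{-n/2}\,\Gamma\!\left(\frac{n}{2}\right).$$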

The value $t_a$ represents the standard t statistic for testing a hypothesis that $\mu = a$, and $s^2$ is the usual sample variance.
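The following is a sketch of how (2) can be obtained, assuming the usual factorization of the normal likelihood and the prior proportional to $1/\sigma^2$ described above. The intermediate symbol $A$ is introduced here for illustration only. The likelihood can be written as

$$f(X \mid \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{(n-1)s^2 + n(\bar{X}-\mu)^2}{2\sigma^2}\right\}.$$

Integrating over $\mu \sim N(a, g\sigma^2)$ replaces the factor involving $\mu$ by

$$(ng+1)^{-1/2}\exp\left\{-\frac{n(\bar{X}-a)^2}{2\sigma^2(ng+1)}\right\},$$

and integrating the result against $1/\sigma^2$ uses the inverse-gamma integral

$$\int_0^\infty (\sigma^2)^{-n/2-1}\,e^{-A/(2\sigma^2)}\,d\sigma^2 = \Gamma\!\left(\frac{n}{2}\right)\left(\frac{A}{2}\right)^{-n/2}, \qquad A = (n-1)s^2 + \frac{n(\bar{X}-a)^2}{ng+1}.$$

Because $n(\bar{X}-a)^2 = s^2 t_a^2$, it follows that $A = (n-1)s^2\left[1 + t_a^2/\{(ng+1)(n-1)\}\right]$, and collecting the constants $(2\pi)^{-n/2}\,2^{n/2} = \pi^{-n/2}$ yields the form in (2).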

The marginal density of the data under the null hypothesis can be obtained by taking a = 0 and g = 0 in (2), yielding

$$f(X; a = g = 0) = c\left[1 + \frac{t_0^2}{n-1}\right]^{-n/2}. \tag{4}$$

The marginal density of the data under the simple alternative hypothesis μ = a is similarly obtained by taking g = 0 in (2), yielding

$$f(X; \mu = a, g = 0) = c\left[1 + \frac{t_a^2}{n-1}\right]^{-n/2}. \tag{5}$$

For composite alternative hypotheses, it follows that the ILR between the hypotheses specified in (1) can be expressed as

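$$\mathrm{ILR} = \frac{m_1(X)}{f(X; a = g = 0)} = \frac{\left[1 + \frac{t_0^2}{n-1}\right]^{n/2}}{\sqrt{ng+1}\,\left[1 + \frac{t_a^2}{(ng+1)(n-1)}\right]^{n/2}}. \tag{6}$$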

For simple hypotheses, the ILR can be expressed as

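$$\mathrm{ILR} = \frac{f(X; a, g = 0)}{f(X; a = g = 0)} = \frac{\left[1 + \frac{t_0^2}{n-1}\right]^{n/2}}{\left[1 + \frac{t_a^2}{n-1}\right]^{n/2}}. \tag{7}$$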

This equation was obtained by integrating out the variance parameter, $\sigma^2$, and setting $g = 0$ in (2). Alternatively, (7) can be obtained directly by considering the sampling distribution of the t statistic. Under the null hypothesis, $t_0$ has a standard t density, while under the alternative hypothesis, $t_a$ has a standard t density. Thus, the ILR defined in (7) can also be regarded from the classical perspective as a simple LR.
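The ratio in (6), with (7) as the special case g = 0, can be evaluated directly from the two t statistics. The sketch below is a minimal illustration; the function name `ilr` and the example inputs are assumptions made here, and the value 1.833 is the one-sided 5% critical value on 9 degrees of freedom taken from a standard t table.

```python
import math

def ilr(t0, ta, n, g=0.0):
    """Integrated likelihood ratio of equation (6); g = 0 reduces it to (7)."""
    nu = n - 1                      # degrees of freedom
    k = n * g + 1.0                 # the factor (ng + 1)
    numerator = (1.0 + t0**2 / nu) ** (n / 2)
    denominator = math.sqrt(k) * (1.0 + ta**2 / (k * nu)) ** (n / 2)
    return numerator / denominator

# Illustrative inputs (assumed): n = 10 observations, t0 = 1.833.
n, t0 = 10, 1.833
print(ilr(t0, ta=0.0, n=n, g=0.0))      # point alternative at the sample mean
print(ilr(t0, ta=0.0, n=n, g=1.0 / n))  # same center, variance sigma^2 / n
print(ilr(t0, ta=t0, n=n, g=0.0))       # alternative identical to the null
```

When the alternative coincides with the null (a = 0, g = 0), the two statistics agree and the ratio is 1, which is a convenient sanity check on any implementation.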

2.1. Maximum integrated likelihood ratios

From (2), it follows that the maximum probability that can be assigned to the data under the alternative hypothesis is obtained by taking $a = \bar{X}$ and $g = 0$. For this choice of $a$ and $g$, the alternative hypothesis becomes a point mass centered on the sample mean, i.e.,

$$H_1: \mu = \bar{X}. \tag{8}$$

This assumption maximizes the marginal density of the data under the alternative hypothesis. For this choice of alternative, the ILR simplifies to

$$\left[1 + \frac{t_0^2}{n-1}\right]^{n/2}. \tag{9}$$

This value represents the maximum value that can be achieved by the ILR for t statistics based on normally distributed data (see Edwards et al. (1963) for further discussion of maximum likelihood ratios).

In actual scientific practice, sampling variation makes it unlikely that $\bar{X}$ would exactly equal the population mean $\mu$. Nonetheless, Figure 1 depicts the maximum ILR obtained under the alternative hypothesis specified in (8) as a function of the degrees of freedom of the t statistic, $\nu$ ($= n - 1$), when $t_0$ yields a p-value of 0.05 (i.e., $t_0 = T^{\nu}_{0.05}$, where $T^{\nu}_{\alpha}$ represents the $(1 - \alpha)$ quantile of a standard t distribution on $\nu$ degrees of freedom). Thus, Figure 1 displays the maximum of the ratio between the marginal probabilities assigned to the data under any alternative hypothesis and the null hypothesis when $t_0 = T^{\nu}_{0.05}$.

Figure 1: Maximum integrated likelihood ratio for one-sided t-test yielding p = 0.05.

From Figure 1, we see that the ILR is less than 5 whenever there are 8 or more degrees of freedom. That is, the data are at least 1/5 as likely under the null hypothesis as they are under any alternative hypothesis regarding the value of μ.

For 1 degree of freedom, the ILR can be as high as 40.9. With 1 degree of freedom ($n = 2$), this value is obtained when the t statistic ($\sqrt{n}\,\bar{X}/s$) is 6.31 and the estimated standardized effect size, $\bar{X}/s$, is 4.46. With 5 degrees of freedom, the maximum ILR of 5.95 is obtained when the estimated standardized effect size is 0.82, and for 12 degrees of freedom the maximum ILR of 4.60 is obtained for an estimated standardized effect size of 0.49.
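The values quoted above can be checked with a few lines of arithmetic. The sketch below evaluates (9) and the estimated standardized effect size, which equals the t statistic divided by the square root of n. The critical values are one-sided 5% points from a standard t table and are supplied as assumed inputs; the helper names are hypothetical.

```python
import math

# One-sided 5% critical values from a standard t table (assumed inputs),
# keyed by degrees of freedom nu = n - 1.
T_CRIT = {1: 6.314, 5: 2.015, 12: 1.782}

def max_ilr(t0, nu):
    """Maximum ILR of equation (9) for a t statistic t0 on nu degrees of freedom."""
    n = nu + 1
    return (1.0 + t0**2 / nu) ** (n / 2)

def effect_size(t0, nu):
    """Estimated standardized effect size X-bar / s = t0 / sqrt(n)."""
    return t0 / math.sqrt(nu + 1)

results = {nu: (max_ilr(t, nu), effect_size(t, nu)) for nu, t in T_CRIT.items()}
for nu, (ratio, d) in results.items():
    print(f"nu={nu:2d}  max ILR={ratio:.2f}  effect size={d:.2f}")
```

Up to rounding of the tabulated critical values, the printed figures agree with the maximum ILRs and effect sizes stated in the text for 1, 5, and 12 degrees of freedom.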

In many studies in the social sciences, the magnitudes of standardized effect sizes (when present) are often smaller than 1.0. For instance, Cohen (1988) classified standardized effect sizes for differences in means as being “small” when near 0.2, “medium” when near 0.5, and “large” when close to 0.8. Sawilowsky (2009) extended these descriptors to “very large” (1.2) and “huge” (2.0) standardized effect sizes. Large effect sizes are often easy to detect, while very small effect sizes may not be of substantive importance. For this reason, hypothesis tests that attempt to detect small to medium effect sizes typically present the greatest challenge and are often of the most substantive interest. If we modify the alternative hypothesis in (8) so that $|\mu|$ is at most $s/2$, that is, so that the hypothesized standardized effect size does not exceed 1/2, then a more realistic alternative hypothesis can be expressed as

$$H_1: \mu = a = \operatorname{sgn}(\bar{X})\,\min\!\left(|\bar{X}|, \frac{s}{2}\right) \quad \text{and} \quad g = 0. \tag{10}$$

The black curve in Figure 2 depicts the ILR under this alternative hypothesis. It shows that the maximum constrained ILR occurs at 9 degrees of freedom and is 4.71. For estimated standardized effect sizes known to be less than 0.5 (or medium effect sizes in Cohen’s terminology), this figure thus shows that the maximum ILR between the t-statistic under the alternative and null hypotheses is less than 5 whenever p = 0.05.
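A minimal sketch of the constrained calculation follows, assuming the alternative in (10) and the ratio in (7). Dividing (10) by s shows that the hypothesized standardized effect is the observed one capped at 1/2, so the statistic for testing μ = a is the excess of the observed effect over the cap, multiplied by the square root of n. The function name and the tabulated one-sided 5% critical values are assumptions supplied for illustration.

```python
import math

def constrained_ilr(t0, nu, max_effect=0.5):
    """ILR of (7) under the capped point alternative of (10)."""
    n = nu + 1
    root_n = math.sqrt(n)
    observed = t0 / root_n                          # X-bar / s
    capped = math.copysign(min(abs(observed), max_effect), observed)  # a / s
    ta = (observed - capped) * root_n               # t statistic for mu = a
    return ((1.0 + t0**2 / nu) / (1.0 + ta**2 / nu)) ** (n / 2)

# One-sided 5% critical values from a standard t table (assumed inputs).
T_CRIT = {8: 1.860, 9: 1.833, 10: 1.812, 11: 1.796, 30: 1.697}

values = {nu: constrained_ilr(t, nu) for nu, t in T_CRIT.items()}
best_nu = max(values, key=values.get)
for nu, v in values.items():
    print(f"nu={nu:2d}  constrained ILR={v:.3f}")
print("largest value in this table at nu =", best_nu)
```

Among the tabulated degrees of freedom, the largest value occurs at 9 and is close to the 4.71 reported in the text. Once the observed effect size falls below 1/2 (for example at 30 degrees of freedom), the cap is inactive and the result coincides with (9).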

Figure 2: ILRs for one-sided tests. The black curve represents the integrated likelihood ratio for one-sided t-tests yielding p = 0.05 under the alternative hypothesis specified in (10). The red curve represents the “average” ILR for a one-sided t-test yielding p = 0.05. The red curve was obtained by replacing the marginal density of the t statistic under the alternative hypothesis by its expectation. The blue curve represents the ILR obtained under the alternative hypothesis corresponding to $a = \bar{X}$ and $g = 1/n$ in (1).

2.2. Accounting for sampling variation

2.2.1. Classical approach

For small to medium estimated standardized effect sizes, the ILRs in the previous section assumed that the true population mean $\mu$ under the alternative hypothesis exactly equaled the observed sample mean $\bar{X}$. Based on this assumption, the marginal density of the data under the alternative hypothesis was computed from (2) by taking $a = \bar{X}$ and $g = 0$. Of course, the probability that the sample mean $\bar{X}$ exactly equals the population mean $\mu$ is zero.

If, however, the true state of nature were known, then the “true” ILR would be obtained by specifying the alternative hypothesis to be this value. In other words, if the data-generating value of $\mu$ were known, we would assume that $a = \mu$ and $g = 0$ in (1). Under this assumption, the ILR would be assigned the value

$$\frac{\left(1 + \frac{t_0^2}{\nu}\right)^{n/2}}{\left(1 + \frac{t_\mu^2}{\nu}\right)^{n/2}}. \tag{11}$$

Unfortunately, the true value of μ is not known, so the quantity in the denominator cannot be computed.

Because we are conditioning on the event p = 0.05, we know that $\bar{X} = T^{\nu}_{0.05}\,s/\sqrt{n}$ or, equivalently, that $t_0 = T^{\nu}_{0.05}$. Under this condition, the numerator in (11) is a fixed and known quantity. However, if we ignore the conditioning on the value of $\bar{X}$, then $t_\mu$, evaluated at the true but unknown value of $\mu$, is known to have exactly a t distribution on $\nu$ degrees of freedom. This makes it possible to calculate the expected value of

$$\frac{1}{\left[1 + \frac{t_\mu^2}{\nu}\right]^{n/2}}. \tag{12}$$

Simple calculations show that this expectation can be expressed as

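$$E_\mu\!\left[\left(1 + \frac{t_\mu^2}{\nu}\right)^{-n/2}\right] = \frac{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{2n-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)\Gamma(n)}. \tag{13}$$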

Thus, even though we don’t know the true state of nature and the data-generating value of μ, we can compute the expectation of (12) under this value.

It follows that an approximation to the “average ILR” that would be obtained under the true but unknown μ can be expressed as

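$$\text{average ILR} \approx \frac{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{2n-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)\Gamma(n)}\left[1 + \frac{t_0^2}{\nu}\right]^{n/2}. \tag{14}$$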

Of course, this expression does not equal the expected value of the ILR because the expectation in (13) ignored the condition that $\bar{X} = T^{\nu}_{0.05}\,s/\sqrt{n}$. Nonetheless, this expression provides an approximation to the average ILR that would be obtained for the “true” alternative hypothesis.
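The gamma-function ratio in (13) and the approximation in (14) are straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the gamma functions are handled on the log scale with `math.lgamma`, and a crude midpoint-rule integral of (12) against the t density on n - 1 degrees of freedom is included only as an independent check of (13).

```python
import math

def expectation_factor(n):
    """Right-hand side of (13), computed via log-gamma for numerical stability."""
    log_value = (math.lgamma(n / 2) + math.lgamma((2 * n - 1) / 2)
                 - math.lgamma((n - 1) / 2) - math.lgamma(n))
    return math.exp(log_value)

def average_ilr(t0, n):
    """Approximation (14): the factor in (13) times the maximum ILR in (9)."""
    nu = n - 1
    return expectation_factor(n) * (1.0 + t0**2 / nu) ** (n / 2)

def numeric_expectation(n, half_width=60.0, steps=200_000):
    """Midpoint-rule integral of (12) times the t density on nu = n - 1 df."""
    nu = n - 1
    const = math.exp(math.lgamma(n / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        t = -half_width + (i + 0.5) * h
        # density (1 + t^2/nu)^(-n/2) times integrand (1 + t^2/nu)^(-n/2)
        total += (1.0 + t * t / nu) ** (-n)
    return const * total * h

print(expectation_factor(6), numeric_expectation(6))
```

For n = 6 the factor in (13) equals 21/32 = 0.65625, and as n grows it approaches the reciprocal of the square root of 2 (about 0.707), so the average ILR in (14) is always a fixed fraction of the maximum in (9).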

The red curve in Figure 2 depicts the average ILR for ν ∈ (5, 50) and t statistics that yield p = 0.05. For medium estimated effect sizes (corresponding to more than 5 degrees of freedom), the average ILR is less than 3 when p = 0.05. As before, ILRs greater than 3 can be obtained for ν < 5, but these ILRs correspond to comparatively large and easily detectable standardized effect sizes.

2.2.2. Bayesian approach

A Bayesian approach can also be taken toward evaluating the uncertainty regarding the true value of $\mu$ under the alternative hypothesis. For instance, the alternative hypothesis for $\mu$ might be assumed to be normally distributed around $\bar{X}$ with variance $\sigma^2/n$ (i.e., $a = \bar{X}$ and $g = 1/n$). If the variance were known a priori, this assumption would correspond to specifying the alternative hypothesis to be the posterior distribution on $\mu$ given the sample mean $\bar{X}$. It leads to the ILRs displayed by the blue curve in Figure 2. This curve produces ILRs that are very similar to the average ILRs obtained in the previous section. Of course, a genuine Bayesian analysis would not be premised on a prior centered on the sample mean, but the similarity between the average ILR and this pseudo-Bayes factor is revealing.

3. Two-sided tests

3.1. Bayesian approach

From a Bayesian perspective, the conduct of a two-sided test suggests that values of μ above and below the null value are possible, which in turn suggests that only alternative hypotheses that are symmetric around the null hypothesis should be considered (Berger and Sellke 1987; Sellke et al. 2001). Under this constraint, an alternative hypothesis of the following form approximately maximizes the ILR against the null hypothesis (Berger and Sellke 1987, p. 116):

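$$H_1: \mu = \begin{cases} \bar{X} & \text{with probability } \tfrac{1}{2}, \\ -\bar{X} & \text{with probability } \tfrac{1}{2}. \end{cases} \tag{15}$$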

For this alternative hypothesis, the ILR can be expressed as

$$\left(1 + \frac{t_0^2}{\nu}\right)^{n/2}\left[\frac{1}{2} + \frac{1}{2}\left(1 + \frac{4t_0^2}{\nu}\right)^{-n/2}\right], \tag{16}$$

where $t_0$ now refers to $T^{\nu}_{0.025}$.

A plot of this ILR against degrees of freedom ν appears as the green curve in Figure 3. This figure suggests that the maximum ILR for marginally significant t statistics under the constraint of a symmetric alternative hypothesis is less than 5 when there are 8 or more degrees of freedom (small to medium estimated effect sizes).

Figure 3: ILRs for two-sided tests. The black curve depicts the maximum ILR for a two-sided t-test yielding p = 0.05. The alternative hypothesis underlying this curve assumes that $\mu = T^{\nu}_{0.025}\,s/\sqrt{n}$, the value of the sample mean that produces a two-sided p-value of 0.05. The green curve represents the ILR for a two-sided t-test yielding p = 0.05 obtained by setting $a = \pm\bar{X}$, each with probability one-half, and $g = 0$. The blue curve was obtained similarly, except that $g = 1/n$ to account for variation in the sample mean. The purple curve was obtained by taking $a = \bar{X}$ and $g = 1/n$ in (1). The red curve represents the “average” ILR for two-sided t-tests yielding p = 0.05. The marginal likelihood for this curve was obtained by replacing the marginal density of the data under the alternative hypothesis with its expected value at the true value of $\mu$.

As in the case of one-sided tests, the alternative hypotheses used to define the ILRs in the Bayesian test can be revised to account for sampling variability in the value of $\bar{X}$. One approach toward accounting for this variability is to assume a symmetric alternative in which mass 1/2 is assigned to each of two normal densities centered on $\pm\bar{X}$ with variance $\sigma^2/n$. This assumption roughly corresponds to taking one-half of the posterior density centered on $\bar{X}$ and re-centering it on $-\bar{X}$. The integrated likelihood ratio that results from this alternative model is

$$\frac{1}{2\sqrt{2}}\left(1 + \frac{t_0^2}{\nu}\right)^{n/2}\left[1 + \left(1 + \frac{2t_0^2}{\nu}\right)^{-n/2}\right]. \tag{17}$$

The blue curve in Figure 3 shows the ILR’s that result from this assumption on the prior distribution. The values in this curve approximately mimic the values of the blue curve in Figure 2, which were based on a similar Bayesian analysis of one-sided tests.

If the symmetry constraint on the alternative hypothesis is removed and the alternative hypothesis is instead defined by taking $a = \bar{X}$ and $g = 1/n$, then the ILR can be expressed as

$$\frac{1}{\sqrt{2}}\left(1 + \frac{t_0^2}{\nu}\right)^{n/2}. \tag{18}$$

Values of the ILR under this assumption are represented by the purple curve in Figure 3 and are approximately twice the value of the blue curve.
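The three two-sided expressions (16), (17), and (18) can be compared numerically. The following sketch is illustrative only: the function names are hypothetical, and the two-sided 5% critical values (upper 2.5% points) are taken from a standard t table as assumed inputs.

```python
import math

def peak(t0, nu):
    """Common factor (1 + t0^2 / nu)^(n/2) with n = nu + 1."""
    return (1.0 + t0**2 / nu) ** ((nu + 1) / 2)

def two_point_ilr(t0, nu):
    """Equation (16): point masses at +X-bar and -X-bar, each with probability 1/2."""
    n = nu + 1
    return peak(t0, nu) * (0.5 + 0.5 * (1.0 + 4.0 * t0**2 / nu) ** (-n / 2))

def two_normal_ilr(t0, nu):
    """Equation (17): normal components at +/- X-bar with variance sigma^2 / n."""
    n = nu + 1
    tail = (1.0 + 2.0 * t0**2 / nu) ** (-n / 2)
    return peak(t0, nu) * (1.0 + tail) / (2.0 * math.sqrt(2.0))

def one_normal_ilr(t0, nu):
    """Equation (18): a single normal centered at X-bar with variance sigma^2 / n."""
    return peak(t0, nu) / math.sqrt(2.0)

# Upper 2.5% critical values from a standard t table (assumed inputs).
T_CRIT = {7: 2.365, 8: 2.306}
for nu, t0 in T_CRIT.items():
    print(nu, two_point_ilr(t0, nu), two_normal_ilr(t0, nu), one_normal_ilr(t0, nu))
```

The ratio of (18) to (17) equals 2 divided by one plus a small tail term, which is why the purple curve is approximately, but never quite, twice the blue curve.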

3.2. Classical approach

Finally, let us examine ILRs for two-sided t-tests that are significant at the 5% level. The maximum bounds in this case are identical to the bounds that would be obtained in a one-sided t-test that yielded p = 0.025, and are obtained by assuming that the alternative hypothesis specifies that $a = \bar{X}$ and $g = 0$ in (1). The sample mean is assumed to equal $T^{\nu}_{0.025}\,s/\sqrt{n}$. The black curve in Figure 3 displays the resulting maximum ILR for 5 or more degrees of freedom, or small to medium estimated standardized effect sizes. Because a two-sided test is performed even though the optimal alternative hypothesis is inherently “one-sided,” the ILRs in this scenario are larger than they were in previous scenarios.

As for one-sided tests, the assumption that the true population mean exactly equals the sample mean is unrealistic. If we account for the sampling variation in the sample mean and instead use the expected value of the t density under the assumption that $\mu$ is known (as in (13)), then the average ILR can be approximated by the red curve in Figure 3. The values depicted in this curve are approximately twice the values of the blue curve, which were obtained by placing one-half mass each on $a = \pm\bar{X}$ with $g = 1/n$, and are very close to the values in the purple curve obtained by taking $a = \bar{X}$ and $g = 1/n$. As noted previously, the factor of two in the former arises from the fact that the alternative split the Bayesian posterior distribution in two, re-centering one-half of the posterior distribution on $-\bar{X}$ in order to maintain a symmetric alternative.

The average ILRs in the case of two-sided tests are between 5 and 8 for small to medium estimated standardized effect sizes and p-values near 0.05.

4. Conclusions

Under a variety of assumptions regarding the values of non-zero effects, ILRs in favor of alternative hypotheses are less than 4 for one-sided t tests based on more than 5 degrees of freedom, and are less than 7 for two-sided t tests based on more than 7 degrees of freedom. For alternative hypotheses that are constrained to be symmetric around the null hypotheses, ILRs are less than about 5 or 6 for medium estimated standardized effect sizes, and less than about 3 or 4 for small estimated effect sizes in two-sided tests.

This range of ILR values is less conservative than the Bayesian analyses of p-values and Bayes factors presented in Sellke et al. (2001), which required alternative hypotheses to be symmetric (and in many cases unimodal) around the null value. That is, Sellke and co-authors estimated ILRs that were even smaller than those reported here.

The difference in evidence reflected by one-sided and two-sided bounds on ILRs illustrates the importance of properly specifying alternative hypotheses. Indeed, it is quite possible that many journals and regulators implicitly impose significance thresholds of p < 0.025 by requiring that two-sided tests be conducted for alternative hypotheses that are inherently one-sided. Of course, this higher standard for declaring statistical significance is only effective when the sign of an effect is known a priori. It offers no additional protection against HARKing (hypothesizing after results are known; Kerr 1998) when the sign of an effect is not specified before data are analyzed.

In my opinion, the best estimate of the evidence provided by t statistics is provided by the average ILR, which is obtained by replacing the marginal density of data under the alternative hypothesis by its (unconditional) expectation. The expectation of the marginal density of the t statistic under the alternative is free of additional assumptions and represents the exact expectation of a t density at the true value of the population mean μ. It is thus insensitive to prior model choices and other modeling assumptions.

For t statistics based on more than 6 degrees of freedom, the average ILR for two-sided tests is less than 6. For one-sided tests with p-values around 0.05, the average ILR is less than about 3. In other words, the data are, on average, only 3 or 6 times more likely under the “true” model than they are under the null hypothesis. Importantly, these values are independent of prior assumptions regarding the value of the population mean under the alternative hypothesis, and apply for all hypothesis tests based on t statistics. They clearly suggest that higher standards of evidence are needed to support claims of novel discoveries and new effects.

5. Acknowledgements

The author thanks an anonymous associate editor for numerous comments that improved this article. Financial support was provided by NIH award R01 CA158113.

Footnotes

1

A simple hypothesis is a hypothesis in which the value of the unknown parameter is completely specified. For composite hypotheses, the value of unknown parameters is only constrained to take values from a specified set, or to be drawn from a specified probability distribution.

References

  • [1] Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers E–J, Berk R, Bollen KA, Brembs B, Brown L, Camerer C, Cesarini D, Chambers CD, Clyde M, Cook TD, De Boeck P, Dienes Z, Dreber A, Easwaran K, Efferson C, Fehr E, Fidler F, Field AP, Forster M, George EI, Gonzalez R, Goodman S, Green E, Green DP, Greenwald A, Hadfield JD, Hedges LV, Held L, Ho T–H, Hoijtink H, Jones JH, Hruschka DJ, Imai K, Imbens G, Ioannidis JPA, Jeon M, Kirchler M, Laibson D, List J, Little R, Lupia A, Machery E, Maxwell SE, McCarthy M, Moore D, Morgan SL, Munafò M, Nakagawa S, Nyhan B, Parker TH, Pericchi L, Perugini M, Rouder J, Rousseau J, Savalei V, Schönbrodt FD, Sellke T, Sinclair B, Tingley D, Van Zandt T, Vazire S, Watts DJ, Winship C, Wolpert RL, Xie Y, Young C, Zinman J, and Johnson VE, “Redefine Statistical Significance,” Nature Human Behaviour, https://www.nature.com/articles/s41562-017-0189-z, 2017.
  • [2] Berger JO and Bernardo JM, “On the development of reference priors (with discussion),” in Bayesian Statistics 4 (Bernardo JM, Berger JO, Dawid AP and Smith AFM, eds), 35–60, Oxford University Press, 1992.
  • [3] Berger JO, Liseo B, and Wolpert RL, “Integrated likelihood methods for eliminating nuisance parameters,” Statistical Science, 14, 1–28, 1999.
  • [4] Berger JO and Sellke T, “Testing a point null hypothesis: The irreconcilability of P values and evidence,” Journal of the American Statistical Association, 82(397), 112–122, 1987.
  • [5] Cohen J, Statistical Power Analysis for the Behavioral Sciences, Routledge, New York, 1988.
  • [6] Edwards W, Lindman H, and Savage L, “Bayesian statistical inference for psychological research,” Psychological Review, 70, 193–242, 1963.
  • [7] Jeffreys H, Theory of Probability (3rd ed.), Oxford, U.K., Oxford University Press, 1961.
  • [8] Johnson VE, “Revised standards for statistical evidence,” Proceedings of the National Academy of Sciences, 110(48), 19313–19317, 2013.
  • [9] Kalbfleisch J and Sprott DA, “Application of Likelihood Methods to Models Involving Large Numbers of Parameters,” Journal of the Royal Statistical Society, Series B, 32(2), 175–208, 1970.
  • [10] Kass R and Raftery AE, “Bayes factors,” Journal of the American Statistical Association, 90(430), 773–795, 1995.
  • [11] Kerr N, “HARKing: Hypothesizing after the results are known,” Personality and Social Psychology Review, 2(3), 196–217, 1998.
  • [12] Lakens D, Adolfi FG, Albers CJ, Anvari F, Apps MAJ, Argamon SE, Baguley T, Becker RB, Benning SD, Bradford DE, Buchanan EM, Caldwell AR, Calster BV, Carlsson R, Chen S-C, Chung B, Colling LJ, Collins GS, Crook Z, Cross ES, Daniels S, Danielsson H, DeBruine L, Dunleavy DJ, Earp BD, Feist MI, Ferrell JD, Field JG, Fox NW, Friesen A, Gomes C, Gonzalez-Marquez M, Grange JA, Grieve AP, Guggenberger R, Grist J, Harmelen A, Hasselman F, Hochard KD, Hoffarth MR, Holmes NP, Ingre M, Isager PM, Isotalus HK, Johansson C, Juszczyk K, Kenny DA, Khalil AA, Konat B, Lao J, Larsen EG, Lodder G, Lukavsky J, Madan CR, Manheim D, Martin SR, Martin AE, Mayo DG, McCarthy RJ, McConway K, McFarland C, Nio AQX, Nilsonne G, Oliveira CL, Xivry J, Parsons S, Pfuhl G, Quinn KA, Sakon JJ, Saribay SA, Schneider IK, Selvaraju M, Sjoerds Z, Smith SG, Smits T, Spies JR, Sreekumar V, Steltenpohl CN, Stenhouse N, Swiatkowski W, Vadillo MA, Van Assen M, Williams MN, Williams SE, Williams DR, Yarkoni T, Ziano I, Zwaan RA, “Justify your alpha,” Nature Human Behaviour, 2, 168–171, 2018.
  • [13] Sawilowsky S, “New effect size rules of thumb,” Journal of Modern Applied Statistical Methods, 8(2), 467–474, 2009.
  • [14] Sellke T, Bayarri MJ, Berger JO, “Calibration of p values for testing precise hypotheses,” The American Statistician, 55(1), 62–71, 2001.
