Synthese. 2023 Feb 2;201(2):53. doi: 10.1007/s11229-022-04025-x

How good is an explanation?

David H Glass

Abstract

How good is an explanation and when is one explanation better than another? In this paper, I address these questions by exploring probabilistic measures of explanatory power in order to defend a particular Bayesian account of explanatory goodness. Critical to this discussion is a distinction between weak and strong measures of explanatory power due to Good (Br J Philos Sci 19:123–143, 1968). In particular, I argue that if one is interested in the overall goodness of an explanation, an appropriate balance needs to be struck between the weak explanatory power and the complexity of a hypothesis. In light of this, I provide a new defence of a strong measure proposed by Good by providing new derivations of it, comparing it with other measures and exploring its connection with information, confirmation and explanatory virtues. Furthermore, Good really presented a family of strong measures, whereas I draw on a complexity criterion that favours a specific measure and hence provides a more precise way to quantify explanatory goodness.

Keywords: Explanation, Explanatory goodness, Explanatory power, Bayesian, Information, Confirmation

Introduction

It would be difficult to overstate the interest among philosophers of science on the topic of explanation. Much of this has focussed on the nature of explanation (Woodward, 2017), with modern discussions stemming from the deductive-nomological model of Hempel and Oppenheim (1948) and going on to consider other models such as statistical relevance (Salmon, 1971), unification (Friedman, 1974; Kitcher, 1989), causal-mechanical (Salmon, 1984), causal (Woodward, 2003) and pragmatic accounts (van Fraassen, 1980). Although not receiving so much attention, there has also been interest in quantifying or comparing explanations using probability (Popper, 1959; Good, 1960, 1968), including a number of recent proposals (Schupbach and Sprenger, 2011; Crupi and Tentori, 2012; Glass, 2007, 2021).

It might be thought that an answer to the question ‘what is an explanation?’ would be needed before attempting to answer questions such as ‘how good is an explanation?’ or ‘when is one explanation better than another?’, but that does not seem to be the case. Discussions about the nature of explanation typically involve pre-theoretical intuitions about explanation and often these extend to intuitions about the goodness of explanations, particularly comparative judgments. One can hold that quantum mechanics provides a very good explanation of blackbody radiation or that thermodynamics provides a better explanation of heat than the caloric theory without settling the question of what exactly constitutes an explanation. Similarly, in the area of medical diagnosis, one might try to formalize comparative judgments about which condition best explains the symptoms without taking a view on the metaphysics of explanation.

Explanatory goodness is clearly very important in the context of inference to the best explanation (IBE) (Lipton, 2004; Douven, 2017). While defenders of IBE need not be committed to a particular view on the nature of explanation, they nevertheless need to be able to give an account of the goodness of an explanation and more specifically how to compare explanations. In this context, discussion of explanatory virtues is important with explanations that do better according to a range of virtues being judged better (Mackonis, 2013). Of particular relevance here are approaches to IBE that seek to evaluate explanations using probability theory (Douven, 1999, 2013; Schupbach, 2018; Glass, 2012, 2021), though one might expect that such approaches should also capture at least some of the explanatory virtues. However, questions about explanatory goodness and comparative judgments, which arise both within science and outside it, are still important irrespective of whether one is committed to IBE as a legitimate mode of scientific inference.

In this paper, I will focus on measures of explanatory power to address the central question. A variety of probabilistic measures of explanatory power in this sense have been proposed in the literature (Good, 1968; Schupbach and Sprenger, 2011; Schupbach, 2011, 2018; Crupi and Tentori, 2012). Arguably, these measures can be seen as attempts to quantify how well a hypothesis would explain a given explanandum if the hypothesis were true. According to a distinction due to Good (1968), which is central to the current paper, these are measures of weak explanatory power whereas strong measures take into account not only how well the hypothesis would account for the explanandum if it were true, but also how likely it is to be true in the first place.

Explanatory power and explanatory goodness

Measures of explanatory power

One way to approach the questions ‘how good is an explanation?’ and the related question ‘when is one explanation better than another?’ would be to appeal to various probabilistic measures of explanatory power. At the outset, it is worth noting that these measures are not intended to define what constitutes an explanation, but only to measure the explanatory power of a hypothesis that has been determined on other grounds to provide an explanation. With that in mind, considering an explanandum e and an explanans or explanatory hypothesis h, these measures quantify—in a sense to be discussed—the extent to which h explains e. For example, after identifying seven adequacy conditions to quantify explanatory power, Schupbach and Sprenger (2011) show that the only measure satisfying their conditions is:

$$E_1(e,h) = \frac{P(h|e) - P(h|\neg e)}{P(h|e) + P(h|\neg e)}. \tag{1}$$

Here and elsewhere, P represents a probability function, which is assumed to be regular (for any contingent proposition q, 0<P(q)<1), e represents the explanandum and h a hypothesis that provides at least a potential explanation of e. Probabilities are taken to represent degrees of belief relative to background knowledge, which is omitted in the notation for convenience. Hence, the current approach should be thought of in Bayesian terms.

Crupi and Tentori (2012) present an axiomatic representation for measures ordinally equivalent to E1 and then, after offering criticisms of some aspects of Schupbach and Sprenger’s approach, they present an alternative axiomatization for measures ordinally equivalent to their preferred measure:

$$E_2(e,h) = \begin{cases} \dfrac{P(e|h) - P(e)}{1 - P(e)} & \text{if } P(e|h) \ge P(e) \\[6pt] \dfrac{P(e|h) - P(e)}{P(e)} & \text{if } P(e|h) < P(e). \end{cases} \tag{2}$$

Cohen (2016) draws attention to another measure that had been proposed by Good (1960), who provided an axiomatization for it, and which was also discussed by McGrew (2003):

$$E_3(e,h) = \log \frac{P(e|h)}{P(e)}. \tag{3}$$

Cohen shows how measures ordinally equivalent to E3 could also be given a much simpler axiomatic representation by drawing on a result from Crupi et al. (2013).

Another measure of explanatory power was proposed by Popper (1959) as follows:

$$E_4(e,h) = \frac{P(e|h) - P(e)}{P(e|h) + P(e)}, \tag{4}$$

though he considered E3 to provide an adequate definition of explanatory power as well. In fact, E4 is ordinally equivalent to E3 so both of these measures produce the same comparative explanatory judgments.

What exactly are these measures intended to quantify? According to Schupbach and Sprenger, the conception of explanatory power they have in mind is that of a ‘hypothesis’s ability to decrease the degree to which we find the explanandum surprising’ (2011, p. 108) and similarly Crupi and Tentori claim that their account captures ‘how the background surprisingness/expectedness of explanandum e is reduced by assuming candidate explanans h’ (2012, p. 375). Plausibly all these measures can be understood as attempts to capture explanatory power in the sense of ‘h reducing surprise in e’ or perhaps ‘h increasing expectedness of e’ (though see Sect. 4.3). How surprising e is will differ from one case to another, but the key factor is the reduction of surprise. We can think of this by comparing P(e|h), the probability of e given h, with P(e), the probability of e given only background knowledge. A low value of P(e) would represent a case where e is surprising and the lower the value of P(e) the greater the extent to which e is surprising. If P(e|h) is greater than P(e), this would represent the situation where h reduces the surprise in e and the greater P(e|h) the greater the reduction. Hence, if one is comparing two hypotheses for a given explanandum, it is the one that increases its probability most that has greater explanatory power (see Sect. 2.3). More importantly, what these measures have in common is that they attempt to quantify how well h would explain e (in the sense just noted) if h were true. That is, they are intended to capture something about the relationship between h and e under the assumption that h is true.1 In Good’s (1968) terminology, they are all measures of weak explanatory power. I will return to his distinction between weak and strong explanatory power in Sect. 3.
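To make the comparison concrete, the following Python sketch (not part of the original paper) computes E1–E4 from an assumed prior P(h) and assumed likelihoods P(e|h) and P(e|¬h); the numbers are hypothetical and chosen only to illustrate a case where h raises the probability of e.

import math

def weak_measures(p_h, p_e_given_h, p_e_given_not_h):
    # Derived quantities via the law of total probability and Bayes' theorem.
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    p_h_given_e = p_e_given_h * p_h / p_e
    p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)
    e1 = (p_h_given_e - p_h_given_not_e) / (p_h_given_e + p_h_given_not_e)
    e2 = ((p_e_given_h - p_e) / (1 - p_e) if p_e_given_h >= p_e
          else (p_e_given_h - p_e) / p_e)
    e3 = math.log(p_e_given_h / p_e)
    e4 = (p_e_given_h - p_e) / (p_e_given_h + p_e)
    return e1, e2, e3, e4

# h raises the probability of e from 0.41 to 0.9, so all four measures are positive.
print(weak_measures(p_h=0.3, p_e_given_h=0.9, p_e_given_not_h=0.2))

All four measures are positive exactly when P(e|h) > P(e), which is the shared ‘reduction of surprise’ idea, though they differ in scale and in how they use the remaining probabilities.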

Before considering the suitability of these measures, it is worth commenting on some concerns about the general approach. If measures of explanatory power are concerned with reduction of surprise in the sense noted above, the problem of old evidence that is posed for Bayesianism is relevant (Glymour, 1980). Essentially, the problem is that if e is old evidence it is included in background knowledge so that P(e)=1, which also means that P(e|h)=1 and hence P(h|e)=P(h), so that e cannot confirm h. Some Bayesians have appealed to variants of Garber’s (1983) approach, which sought to show that what confirms h is not e but the discovery that h entails e. However, this strategy does not help in the current context since if it is accepted that P(e|h)=P(e)=1, then there can be no reduction in surprise. Alternatively, in the counterfactual approach to the problem, the idea is to suppose that e is not known to be true, removing it from background knowledge, and then to consider the impact that learning e would have on h. This approach was defended by Howson (1991), but he later rejected it because of difficulties involved in extracting e from background knowledge and inconsistency with a subjective Bayesian approach. Instead, he argued that ‘a minimalist version of Objective Bayesianism does straightforwardly solve the problem’ (Howson, 2017) and based his approach on earlier work by Rosenkrantz (1983). An important aspect of this approach is that a probability less than one can be legitimately assigned to e in the case of old evidence. If a solution along these lines is viable or if the counterfactual approach can be defended against criticisms, this would undermine the possible concern about the general approach adopted here.2

Another concern is that it might reasonably be doubted whether explanation could be fully analyzed in probabilistic terms. In response, it can be noted that these measures are not intended to define what constitutes an explanation, but only to measure the explanatory power of a hypothesis that has been determined on other grounds to provide an explanation. However, further objections might relate specifically to using probability to measure explanatory goodness. For example, a number of philosophers have argued that explanation is intimately tied to understanding (see, for example, Friedman, 1974; Kitcher, 1989) and, if correct, it might seem questionable whether this could be fully analyzed probabilistically. While this can be acknowledged as a potential limitation, the proposed strategy is to explore the probabilistic approach to see how far it goes. The measures of explanatory power discussed above have had some success in this regard and the hope is to extend that success further. Arguably, the suggestion that the explanatory measures described above capture ‘reduction in surprise’ and the argument that the account proposed here does justice to a number of explanatory virtues (see Sect. 4.5) might go some way to addressing this concern.

A further concern is that in the context of probabilistic explanation some have argued that low probabilities explain just as well as high probabilities (see Jeffrey, 1969; Salmon, 1971; Railton, 1981), a viewpoint known as egalitarianism. Yet according to all the measures discussed above, explanatory power is greater for a hypothesis that confers a higher probability on a given explanandum (see Sect. 2.3). This is consistent with ‘moderate elitism’, a view defended by Strevens (2000, 2014) which does not deny that low probability events can be explained, but maintains that conferring a high probability is better. Consider, for example, a polarizer oriented at angle θ to the vertical. According to quantum theory, there is a probability of cos²(θ) that an incoming, vertically polarized photon will be transmitted. On an egalitarian view, the transmission of the photon is equally well explained irrespective of whether θ is small or large, and hence the probability of transmission large or small (though not zero) respectively. A motivation for the egalitarian view is that in the low probability scenario, there are no further relevant factors that could be cited. However, there also seems to be a clear motivation for saying that the transmission is better explained by small θ (and hence high probability). Suppose we know that θ was either small (oriented very close to the vertical) or large (very close to the horizontal) and that both hypotheses are equally plausible in light of background knowledge. The transmission of the vertically polarized photon would be surprising given large θ, but much less surprising given small θ. Furthermore, the reduction of surprise would be greater given small θ if multiple vertically polarized photons were all transmitted. Hence, thinking about explanation in terms of reduction of surprise (as well as in the context of IBE, see introduction) gives some reason for thinking that the small θ hypothesis provides a better explanation in this case. A detailed discussion of these issues is beyond the scope of this paper, but these points suggest that in at least some cases there is justification for pursuing the current approach.3

Could these measures be used to make judgments about explanatory goodness? According to Schupbach and Sprenger (2011), their goal is to propose a measure of explanatory power that ‘would clarify the conditions under which hypotheses are judged to provide strong versus weak explanations’ (p. 106). They further claim that an appropriate analysis of explanatory power ‘would also clarify the meaning of comparative explanatory judgments such as “hypothesis A provides a better explanation of this fact than does hypothesis B”’ (p. 106). However, they also point out that they ‘take no position on whether our analysis captures the notion of explanatory power generally; it is consistent with our account that there be other concepts that go by this name but which do not fit our measure’ (p. 106). According to Schupbach (private communication), their measure E1 is appropriate for making judgments about explanatory goodness in some cases, such as those where priors are not accessible or whenever agents have knowingly ungrounded subjective priors, but in other cases judgments of explanatory goodness may require other factors to be taken into account. In particular, they may require a trade-off between explanatory power in their sense, which corresponds to Good’s notion of weak explanatory power, and the improbability of the hypothesis since a hypothesis with high explanatory power might not rank so well in terms of overall explanatory goodness if it has a low prior probability. Following Good (1968), I will assume that the probability and complexity of a hypothesis are inversely related, so the more improbable a hypothesis, the greater its complexity.4 Achieving an appropriate trade-off between weak explanatory power and improbability/complexity is the focus of the current paper.

In the rest of this section, I will highlight the need for such a trade-off in a wide range of cases and to that end I will focus on what the four measures identified so far have in common.

Entailment

E1 and E2 are maximal in cases where h entails e, while E3 and E4 take on their greatest values for a given e in cases where h entails e. Although this is appropriate for the specific concept of explanatory power these measures attempt to explicate (reduction of surprise), it seems to be a distinct weakness if one is trying to evaluate the overall goodness of an explanation or to compare explanations with each other. We can often distinguish between how well two hypotheses explain the evidence in cases where both of them entail the evidence. For example, explanationists typically cite simplicity as an explanatory virtue that could discriminate in such cases. If a conspiracy theory is deliberately constructed in such a way that if it were true, it would entail the explanandum in question, it would still be reasonable to think that it is a very poor explanation if it is very unlikely to be true in the first place.

Equal likelihoods and irrelevant conjunction

Closely related to the case of entailment, it turns out that all four measures satisfy the following condition for a given e: E(e,h1)=E(e,h2) if and only if P(e|h1)=P(e|h2). In fact, this condition is enshrined in the principle of positive relevance, which is used in the axiomatization of these measures (Cohen, 2016):

Positive relevance. E(e,h1) ≥ E(e,h2) if and only if P(e|h1) ≥ P(e|h2).

An application of positive relevance gives rise to another important feature of all four measures known as irrelevant conjunction. It says that conjoining an irrelevant hypothesis, h2, to a given hypothesis, h1, has no effect on h1’s (weak) explanatory power:5

Irrelevant conjunction. If h2 is probabilistically independent of e, h1 and their conjunction, then E(e,h1∧h2)=E(e,h1).

It is easy to see that this follows from positive relevance since P(e|h1∧h2)=P(e|h1) when h2 is probabilistically independent of e and h1, and hence given positive relevance that E(e,h1∧h2)=E(e,h1). Schupbach and Sprenger argue for this condition on the grounds that ‘h1∧h2 will not make e any more or less surprising than h1 by itself already does’ (2011, p. 110) and hence has no effect on explanatory power. Crupi and Tentori agree, noting that ‘it does not alter the degree to which e is explained’ (2012, p. 367). While this is appropriate for measures of explanatory power that seek to explicate how well a hypothesis would explain the explanandum if the hypothesis were true, it seems clear that adding an irrelevant hypothesis results in a worse explanation overall. Why? Because once again considerations of simplicity and plausibility come into play. Let e be a description of the bending of light from a distant source by the sun and h1 an explanation of this by Einstein’s theory of general relativity. Let h2 be the hypothesis that I have an identical twin elsewhere in the universe. All four measures judge that Einstein’s theory explains the bending of light to the same extent that the conjunction of Einstein’s theory and the hypothesis about my identical twin explains it. A plausible measure of explanatory goodness should show that this conjunction provides a worse explanation.

The foregoing discussion suggests a satisfactory measure of explanatory goodness should capture the idea that in the case of irrelevant conjunction the more concise explanation is better:

Concise explanation. If h2 is probabilistically independent of e, h1 and their conjunction, then E(e,h1∧h2) ≤ E(e,h1), with equality only in the case where P(h2|h1)=1.

However, the more general point that applies to all cases where two hypotheses have equal likelihoods is that explanatory factors such as simplicity or plausibility can discriminate between them. This motivates the following adequacy condition for a measure of explanatory goodness based on the relevance of the initial plausibility of the hypotheses as measured by their prior probabilities given only background knowledge:

Initial plausibility. If P(e|h1)=P(e|h2), then E(e,h1) ≥ E(e,h2) if and only if P(h1) ≥ P(h2).

Note that the concise explanation condition follows from initial plausibility. I will explore the role of prior probabilities further below.

Probabilistic relevance

Could a hypothesis which is negatively relevant to e provide a better explanation than one which is positively relevant to e? Consider a bag consisting of 99 fair coins and one coin with a bias towards heads such that its objective chance of landing heads is 0.51. A coin is selected at random and, on being tossed, lands heads. Consider the hypotheses: ‘the selected coin is fair’ (h1) and ‘the selected coin is biased’ (h2). Note that h1 is negatively related to the observation since P(e|h1)=0.5<0.5001=P(e) while h2 is positively related to it. Thinking of explanation in terms of how well the hypotheses would account for the explanandum if they were true, which is what the four measures specified earlier seem to explicate, h2 provides the better explanation. However, this does not take into account the prior improbability of h2, which is relevant if we are assessing the overall goodness of the explanations. In this sense, given the very small difference in the likelihoods and the much greater prior probability of h1, it is plausible to think that h1 provides a much better explanation overall. Arguably, a trade-off needs to be made between probabilistic relevance and complexity (in the sense of lower probability) when evaluating an explanation, though how exactly that trade-off should be made is not immediately obvious. I explore this matter in Sects. 2.5 and 3.
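As a quick arithmetic check on this example (a sketch added here, using only the probabilities stated above), the priors are P(h1)=0.99 and P(h2)=0.01, and the posteriors show how strongly the prior favours h1 despite its slightly lower likelihood:

p_h1, p_h2 = 0.99, 0.01            # 99 fair coins, 1 biased coin
p_e_h1, p_e_h2 = 0.5, 0.51         # chance of heads under each hypothesis
p_e = p_e_h1 * p_h1 + p_e_h2 * p_h2
print(p_e)                          # 0.5001, so h1 is (slightly) negatively relevant to e
print(p_e_h1 * p_h1 / p_e)          # P(h1|e) ~ 0.9898
print(p_e_h2 * p_h2 / p_e)          # P(h2|e) ~ 0.0102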

Even though h1 seems to provide a better overall explanation than h2, there is also something deficient about h1 as an explanation due to its negative relevance to the explanandum and this should feature in any plausible account of explanatory goodness. I will return to this point in Sect. 4.4.

Striking the balance

While it is perfectly reasonable to consider explanatory power in the sense explicated by measures E1-E4 (weak explanatory power) as a factor in explanatory goodness, the focus in this section has been on the need for a trade-off between weak explanatory power and complexity. Or to put it another way, a measure of explanatory goodness should combine weak explanatory power and prior probability in an appropriate manner. The initial plausibility condition specifies how the priors can play a role when the likelihoods are equal, but how should they be taken into account more generally?

It might be thought that Bayes’ theorem provides an answer since it essentially combines E3 with the prior probability. In that case, the goodness of an explanation would be identified with its posterior probability. But there are good reasons to reject this approach. First, while priors are relevant to explanatory goodness, arguably this approach gives too much weight to priors via Bayes’ theorem. While explanationists would like to think that the best explanation would often turn out to be the most probable hypothesis, it certainly seems possible that in at least some scenarios this might fail to be the case. Second, it also seems that in some cases a conjunctive explanation that combines two compatible hypotheses, h1∧h2 say, could turn out to be a better explanation than either h1 or h2, yet this is ruled out if explanatory goodness is identified with posterior probability.

One way of putting this is as follows. If h1 and h2 have equal posteriors, then P(e|h1)·P(h1)=P(e|h2)·P(h2), so if we were to treat them as equal in terms of explanatory goodness, we would essentially be giving the priors as much importance as the likelihoods. While I have argued that excluding priors is too extreme in one direction, giving them this much of a role is arguably too extreme in the other direction; a better balance is needed. In light of these considerations, it seems reasonable to use likelihoods to discriminate between hypotheses with equal posterior probabilities. Hence, despite my concerns about the positive relevance condition, it does seem appropriate to apply it in cases where the priors or the posteriors of the hypotheses are equal. This suggests the following restricted version of the positive relevance condition:

Restricted positive relevance. If P(h1)=P(h2) or P(h1|e)=P(h2|e), then E(e,h1) ≥ E(e,h2) if and only if P(e|h1) ≥ P(e|h2).

Now we are in a position to consider how to make the appropriate trade-off between weak explanatory power and improbability/complexity.

A Good approach to good explanation

The mathematician and World War II cryptologist I. J. Good made significant contributions to this topic. I have already drawn attention to his measure in Eq. (3), which is probably the best known measure of explanatory power. Based on the desiderata he set out in his 1960 paper, he argued that this measure was ‘essentially the only possible explicatum for explanatory power’ (Good, 1960, p. 320). However, in another paper in 1968 he distinguished between explanatory power in the weak sense (weak explanatory power) and the strong sense (strong explanatory power) and noted that ‘the double meaning of “explanatory power” has previously been overlooked’ (Good, 1968, p. 124). By weak explanatory power, he meant that the explanatory power of a hypothesis h is ‘unaffected by cluttering up [h] with irrelevancies’, while strong explanatory power ‘is affected by the cluttering’ (Good, 1968, p. 123).

When is a hypothesis ‘cluttered up with irrelevancies’? One of Good’s desiderata (axiom 10 in the 1968 paper) provides the answer and hence the key distinction between weak and strong measures of explanatory power. This desideratum is essentially the irrelevant conjunction condition specified earlier. So irrelevant conjunction must be satisfied by a weak measure of explanatory power since it is unaffected by the inclusion of an irrelevant hypothesis (clutter). However, strong measures do not satisfy irrelevant conjunction, but instead take the prior probability into account to penalize the inclusion of an irrelevant hypothesis.

Note that strong explanatory power is intended to penalize not just the addition of irrelevant hypotheses, but also improbable/complex hypotheses more generally. An analogy with model selection might help to motivate this approach. By adopting a sufficiently complex model, it is possible to obtain an excellent fit to the data, but in doing so one is likely to over-fit the model to noise in the data. Hence, a trade-off between how well the model fits the data and the complexity of the model is sought and this can be achieved by penalizing models for their complexity. In Bayesian model selection, more complex models can be assigned lower probabilities so they are penalized more. This trade-off is closely related to that needed here. For example, in many cases it is possible to come up with an ad hoc hypothesis or conspiracy theory that has been deliberately constructed to entail the explanandum (see Sect. 2.2) even though the hypothesis itself is very improbable. To avoid this, hypotheses need to be penalized for their improbability/complexity. In model selection, this is often expressed in terms of Ockham’s razor and as we will see this is also how Good refers to his approach.

The strong measure advocated by Good is:

$$E_5(e,h) = \log \frac{P(e|h) \cdot P(h)^{\gamma}}{P(e)}, \tag{5}$$

where 0<γ<1 is a constant and so (5) provides a continuum of measures of strong explanatory power. According to Good, ‘the constant γ measures the degree to which the simplicity of the hypothesis is regarded as desirable ... as compared with its weak explanatory power’ (Good, 1968, p. 130). If γ=0 were permitted then E5 would just be Good’s weak measure, E3, so weak explanatory power can be seen as a limiting case of strong explanatory power. Furthermore, requiring γ>0 means that E5 satisfies the concise explanation condition. Also, if γ=1 were permitted then E5 would just be the log of posterior probability and so requiring γ<1 means that E5 satisfies the restricted positive relevance condition.
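Spelling out these two limiting cases (a routine calculation added here for clarity):

$$E_5(e,h)\big|_{\gamma=0} = \log\frac{P(e|h)}{P(e)} = E_3(e,h), \qquad E_5(e,h)\big|_{\gamma=1} = \log\frac{P(e|h)\,P(h)}{P(e)} = \log P(h|e),$$

the second equality following from Bayes’ theorem.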

Relating this to the discussion in Sect. 2, we can see that the positive relevance condition is closely related to weak explanatory power since it entails the irrelevant conjunction condition. By contrast, the initial plausibility condition is closely related to strong explanatory power since it entails the concise explanation condition. And while a strong measure should not satisfy the positive relevance condition, it should nevertheless satisfy the restricted positive relevance condition.

The four measures discussed in Sect. 2 are appropriate if one is making judgments about weak explanatory power and so debates about their relative merits are to be understood in that light. However, if instead one is interested in when one hypothesis provides a better overall explanation of a given explanandum than another hypothesis does, then it seems that something along the lines of Good’s strong sense of explanatory power is needed. In fact, Good proposes what he calls a ‘sharpened version of “Ockham’s razor” which is that if our primary purpose is explanation we should select the hypothesis (among those we know) which has the maximum strong explanatory power’ (1968, p. 123).

A measure motivated by considerations of coherence provides another example of a measure of strong explanatory power (Glass, 2021):

$$E_6(e,h) = P(e|h) \cdot P(h|e) = \frac{P(e|h)^2 \cdot P(h)}{P(e)}. \tag{6}$$

Strictly speaking, E6 was not proposed as a measure of explanatory power as such, but rather as a measure for ranking hypotheses as explanations of an explanandum e. It is easy to show that E6 satisfies the initial plausibility, concise explanation and restricted positive relevance conditions. Furthermore, when comparing h1 and h2 as explanations of e it judges h1 to be better than h2 if and only if:

$$P(e|h_1)^2 \cdot P(h_1) > P(e|h_2)^2 \cdot P(h_2) \iff P(e|h_1) \cdot P(h_1)^{1/2} > P(e|h_2) \cdot P(h_2)^{1/2}, \tag{7}$$

and hence it provides the same ordering of explanations as Good’s strong measure, E5, if we set γ=1/2, which Good describes as the simplest explicatum. I will return to this point in Sect. 4.4.
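The ordinal equivalence can be spelled out in a short calculation (added here): for a fixed explanandum e,

$$\log E_6(e,h) = 2\log P(e|h) + \log P(h) - \log P(e) = 2\,E_5(e,h)\big|_{\gamma=1/2} + \log P(e),$$

so, with P(e) held fixed, log E6 is a strictly increasing function of E5 with γ=1/2, and the two measures rank hypotheses identically.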

The overlap coherence measure was also proposed for ranking explanations. For a hypothesis h and explanandum e the overlap coherence is given by (Glass, 2002; Olsson, 2002):

$$E_7(e,h) = \frac{P(h \wedge e)}{P(h \vee e)}. \tag{8}$$

Like E5 and E6, it also satisfies the initial plausibility, concise explanation and restricted positive relevance conditions and so can be considered as another strong measure of explanatory power. So Good’s strong measure is not the only measure of strong explanatory power and hence further reasons need to be given in its defence. I now turn to that task.

A defence of Good’s strong measure

Deriving Good’s strong measure

In his 1968 paper, Good adopted a two stage strategy to show that a measure of strong explanatory power must be a monotonically increasing function of his E5 measure. First, he drew on his 1960 paper where he showed that a weak measure of explanatory power must be a monotonically increasing function of his E3 measure based on ten axioms or desiderata for such a measure. Then he made some assumptions about a strong measure and its relation to his weak measure in order to derive his result concerning E5.

Here I want to present two new derivations that relate more closely to some of the desiderata for measures of explanatory power found in the recent literature. The first approach does not require establishing E3 as a measure of weak explanatory power, but rather a property of it, which is sufficient to establish E5. The second follows Good’s strategy of first establishing E3, in this case drawing on a result by Crupi et al. (2013), before using Good’s result to establish E5 as a measure of strong explanatory power.

The first condition is based on Crupi and Tentori (2012). It is a formal assumption about measures of weak and strong explanatory power, which I will denote as EW and ES respectively.

(A1) Let L be a propositional language and Lc the contingent formulas in L. Let 𝒫 be the set of regular probability functions that can be defined over L and let EW: Lc × Lc × 𝒫 → ℝ and ES: Lc × Lc × 𝒫 → ℝ. There exist continuous, differentiable functions w and s such that, for any e, h ∈ Lc and any P ∈ 𝒫, EW(e,h)=w[P(e∧h),P(h),P(e)] and ES(e,h)=s[P(e∧h),P(h),P(e)].

In terms of the dependence on P(e∧h), P(h) and P(e), A1 just says that EW and ES are functions of absolute and conditional probabilities of logical combinations of h and e since all of these probabilities are determined by P(e∧h), P(h) and P(e). The requirement of continuity and differentiability enables us to take advantage of part of Good’s proof and ensures that the functions are well-behaved.

The second condition requires that ES depend only on EW and P(h). This is motivated by the distinction between a weak and strong measure of explanatory power since the latter should take into account the simplicity/complexity of the hypothesis in addition to its weak explanatory power.

(A2) ES can be expressed as a function of EW and P(h) so that ES(e,h)=s[P(e∧h),P(h),P(e)]=sW[EW(e,h),P(h)].

A possible objection to this condition is that while it might be accepted that ES should depend on EW and P(h), it might be questioned whether it should only depend on these two factors. However, we need to distinguish conceptually between a measure of strong explanatory power and a measure of overall explanatory goodness. Given Good’s account of strong explanatory power, this condition seems unobjectionable. Whether a strong measure will turn out to provide a plausible measure of overall explanatory goodness will depend on how well it captures various explanatory virtues (see Sect. 4.5).

The third condition says that a weak measure of explanatory power should treat probabilistic independence between e and h as a special case by assigning it a fixed, neutral value. This clearly holds for E1-E4 since they are measures of probabilistic relevance.

(A3) EW has a fixed, neutral point α such that EW(e,h)=α if and only if h and e are probabilistically independent.

Suppose that h1 provides an explanation of e1 and h2 provides an explanation of e2, but that h2 and e2 are irrelevant to h1 and e1. The fourth condition says that the degree to which h1∧h2 explains e1∧e2 is a function of the degree to which h1 explains e1 and the degree to which h2 explains e2 and that this applies for both weak and strong measures of explanatory power. Such a condition is discussed by Good (1968) in the context of strong explanatory power and by Cohen (2016), who presents a generalized version of this condition for an arbitrary number of explanandum-explanans pairs. Formally, it can be stated as follows:

(A4) If h2 and e2 are each probabilistically independent of h1, e1 and their conjunction, then EW(e1∧e2,h1∧h2) can be expressed as a function, wc, of EW(e1,h1) and EW(e2,h2) so that EW(e1∧e2,h1∧h2)=wc[EW(e1,h1),EW(e2,h2)], where wc is strictly increasing in each argument when the other argument is fixed and non-extreme (i.e. neither its maximum nor minimum value) and non-decreasing otherwise. Similarly there is a corresponding function, sc, for ES so that ES(e1∧e2,h1∧h2)=sc[ES(e1,h1),ES(e2,h2)].

In some cases, it seems very appropriate to combine independent explanations in this way. Cohen (2016), for example, highlights its relevance to sets of experiments where each is carried out in a different laboratory and has a separate hypothesis. However, Cohen does not propose this property as a necessary requirement for measures of explanatory power since he sees its virtue as being one of convenience. Certainly, it can be very convenient if a measure decomposes into products or sums, as is the case for Good’s measures. For Good’s weak measure we have:

$$E_3(e_1 \wedge e_2, h_1 \wedge h_2) = \log \frac{P(e_1 \wedge e_2 | h_1 \wedge h_2)}{P(e_1 \wedge e_2)} = \log \frac{P(e_1|h_1)}{P(e_1)} + \log \frac{P(e_2|h_2)}{P(e_2)} = E_3(e_1,h_1) + E_3(e_2,h_2), \tag{9}$$

when the appropriate independence relationships hold and it is easy to show that the corresponding result holds for his strong measure as well. While such a decomposition is convenient, there are good reasons to think that A4 should indeed be a necessary condition for measures of explanatory power.

Suppose a patient reports two symptoms, e1 and e2. Whatever the patient might think, suppose the doctor has good reason to believe that there is no dependence between these symptoms and is able to explain them by conditions h1 and h2 respectively, which again are independent of each other and of the evidence they do not explain. In such a case, it is reasonable to combine these independent hypotheses to explain the symptoms to the patient. Furthermore, how well they explain the symptoms is very plausibly taken to be an increasing function of each explanation. For example, suppose the doctor had two potential explanations, h2 and h3, for e2 and that both satisfied the relevant independence conditions with h1 and e1. It seems clear that if h2 provides a better explanation of e2 than h3 does, then the combined explanatory power of h1 and h2 would be greater than that of h1 and h3.

So the importance of A4 lies not merely in its convenience, but rather in the plausibility of requiring that when explanations are combined and the relevant independence conditions are met, explanatory power should be an increasing function of each explanation. This becomes clear when we see scenarios where measures such as E1 and E2 violate A4.

Example 1

Suppose that (e1,h1), (e2,h2) and (e2′,h2′) are three explanandum-explanans pairs satisfying the relevant independence conditions for A4. These can be thought of as three pairs of symptoms and corresponding conditions that explain them, with each of the three pairs being irrelevant to the other pairs. Suppose P(e1|h1)=1, P(e1)=0.5, P(e2|h2)=0.4, P(e2)=0.2, P(e2′|h2′)=0.8 and P(e2′)=0.75. Clearly, E1(e1,h1)=E2(e1,h1)=1. Also, E1(e2,h2) ≈ 0.455 > 0.143 ≈ E1(e2′,h2′). Combining explanations, we find that E1(e1∧e2,h1∧h2) ≈ 0.714 < 0.739 ≈ E1(e1∧e2′,h1∧h2′). Hence, since E1(e2,h2) > E1(e2′,h2′), but E1(e1∧e2,h1∧h2) < E1(e1∧e2′,h1∧h2′), E1 violates A4. The same is true of E2 since E2(e2,h2)=0.25 > 0.2 = E2(e2′,h2′), while E2(e1∧e2,h1∧h2) ≈ 0.333 < 0.68 = E2(e1∧e2′,h1∧h2′).
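These numbers can be checked with a few lines of Python (added here as a sketch; it uses the fact, obtainable from Bayes’ theorem, that E1 can be written in terms of P(e|h) and P(e) alone since the prior P(h) cancels):

def e1(p_e_h, p_e):
    # Schupbach & Sprenger's measure with the prior cancelled via Bayes' theorem.
    pos = p_e_h / p_e
    neg = (1 - p_e_h) / (1 - p_e)
    return (pos - neg) / (pos + neg)

def e2(p_e_h, p_e):
    # Crupi & Tentori's measure; only the positive-relevance branch is needed here.
    return (p_e_h - p_e) / (1 - p_e) if p_e_h >= p_e else (p_e_h - p_e) / p_e

print(e1(0.4, 0.2), e1(0.8, 0.75))                           # ~0.455 > ~0.143
print(e1(1.0 * 0.4, 0.5 * 0.2), e1(1.0 * 0.8, 0.5 * 0.75))   # ~0.714 < ~0.739
print(e2(0.4, 0.2), e2(0.8, 0.75))                           # 0.25 > 0.2
print(e2(1.0 * 0.4, 0.5 * 0.2), e2(1.0 * 0.8, 0.5 * 0.75))   # ~0.333 < 0.68

The conjunctions use the stated independence conditions, so P(e1∧e2|h1∧h2)=P(e1|h1)·P(e2|h2) and P(e1∧e2)=P(e1)·P(e2), and likewise for the primed pair.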

So although (a) h1 explains e1 and (b) h2 explains e2 better than h2′ explains e2′, E1 and E2 counterintuitively give the result that (c) h1∧h2 explains e1∧e2 less well than h1∧h2′ explains e1∧e2′. In fact, matters are worse than this since, according to E1 and E2, the combination of two poorer explanations can be better than the combination of two better explanations. These results provide good reasons for adopting A4 as a necessary requirement for both weak and strong measures of explanatory power.

In the discussion so far, I have argued for A4 by appealing to examples that are intended to highlight its plausibility. However, since E1 and E2 violate A4, these examples provide counterexamples to these measures. A possible response is to say that even in terms of weak explanatory power E1 and E2 are better thought of as explications of a different concept from the one being proposed here and hence from E3. I think this is a reasonable response and will return to it in Sect. 4.2 where I discuss the fact that A4 leads to a property of E3 and E5 that has been criticized in the literature.

The final two conditions were discussed earlier:

(A5) ES satisfies initial plausibility (see Sect. 2.3).

(A6) ES satisfies restricted positive relevance (see Sect. 2.4).

Based on these assumptions and recalling that ES is a function of EW and P(h) as expressed in A2, we then get the following theorem for Good’s strong measure of explanatory power, E5.7

Theorem 1

If EW and ES are weak and strong measures of explanatory power respectively that satisfy A1 - A6, then ES is a monotonically increasing function of Good’s strong measure, E5.8

For an alternative way to derive Good’s measure, consider the following conditions for a weak measure of explanatory power:

(A7) For any e, h1, h2 ∈ Lc and P ∈ 𝒫, EW(e,h1) ≥ EW(e,h2) if and only if P(e|h1) ≥ P(e|h2), i.e. EW satisfies positive relevance (see Sect. 2.3).

(A8) For any e1, e2, h ∈ Lc and P ∈ 𝒫, EW(e1,h) ≥ EW(e2,h) if and only if P(h|e1) ≥ P(h|e2).

A7 (positive relevance) seems like a very plausible condition for weak explanatory power. Of course, positive relevance was criticized in Sect. 2.3 in the context of overall explanatory goodness, but it is appropriate to retain it as a condition for weak explanatory power. Furthermore, all of the weak measures considered in this paper, E1-E4, satisfy A7.

What about A8? In some particular cases, there is clear justification for A8 if we think of weak explanatory power in terms of reducing surprise. If P(e1)=P(e2), then EW(e1,h)>EW(e2,h) seems to require that P(e1|h)>P(e2|h), from which it follows that P(h|e1)>P(h|e2). Similarly, if P(e1|h)=P(e2|h), then EW(e1,h)>EW(e2,h) seems to require that P(e1)<P(e2), from which it again follows that P(h|e1)>P(h|e2). More generally, however, A8 is widely accepted as a necessary requirement for measures of the degree to which e confirms h. Indeed, so central is it that Crupi and Tentori (2014) include it as part of their definition of confirmation, calling it final probability. In the present context, A8 can then be understood as stating a fundamental relationship between explanation and confirmation. It ensures that if h provides explanations of e1 and e2, then it weakly explains (reduces the surprise of) e1 better than e2 exactly when e1 provides greater confirmation of h than does e2. This seems very plausible indeed since it is precisely the ability to explain otherwise very surprising phenomena that can provide strong confirmation of a hypothesis.9

Using A7 and A8 we then get the following theorem for Good’s strong measure of explanatory power, E5.

Theorem 2

If EW and ES are weak and strong measures of explanatory power respectively that satisfy A1, A2, A4 - A8, then EW is a monotonically increasing function of Good’s weak measure, E3, and ES is a monotonically increasing function of Good’s strong measure, E5.10

Irrelevant evidence

The most significant objection to Good’s weak measure of explanatory power, which applies equally to his strong measure, is the problem of irrelevant evidence due to Schupbach and Sprenger (2011). Let e be a general description of Brownian motion and h be Einstein’s atomic explanation of it. Assuming P(e|h)/P(e) ≫ 1, Good’s weak measure correctly judges this to be a good explanation. However, let e′ be the irrelevant proposition that the mating season for an American green tree frog takes place from mid-April to mid-August. According to Good’s measures (weak and strong) this has no bearing on the explanatory power of Einstein’s account since P(e∧e′|h)/P(e∧e′)=P(e|h)/P(e). By contrast, according to the measures E1 and E2, the addition of e′ reduces the explanatory power.
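A small numerical illustration (with made-up probabilities, added here) shows the contrast: conjoining an irrelevant proposition e′ leaves Good’s measure E3 untouched but lowers E1. As before, E1 is written in terms of P(e|h) and P(e), which is legitimate since the prior cancels via Bayes’ theorem.

import math

def e1(p_e_h, p_e):
    pos = p_e_h / p_e
    neg = (1 - p_e_h) / (1 - p_e)
    return (pos - neg) / (pos + neg)

def e3(p_e_h, p_e):
    return math.log(p_e_h / p_e)

p_e_h, p_e, p_eprime = 0.9, 0.3, 0.5   # e' independent of e, h and their conjunction
print(e3(p_e_h, p_e), e3(p_e_h * p_eprime, p_e * p_eprime))   # unchanged: ~1.10, ~1.10
print(e1(p_e_h, p_e), e1(p_e_h * p_eprime, p_e * p_eprime))   # reduced: ~0.91 -> ~0.65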

Is this consequence of Good’s measures as counterintuitive as Schupbach and Sprenger claim? I will respond by trying to show that there is a very plausible way to make sense of the alleged counterexample. Unlike the measures E1 and E2, Good’s measures place no upper boundary on the degree of explanatory power. If there are two explananda, e1 and e2, Good’s weak measure can be expressed as

$$E_3(e_1 \wedge e_2, h) = \log \frac{P(e_1 \wedge e_2|h)}{P(e_1 \wedge e_2)} = \log \frac{P(e_2|h, e_1)}{P(e_2|e_1)} + \log \frac{P(e_1|h)}{P(e_1)} = E_3(e_2, h|e_1) + E_3(e_1, h), \tag{10}$$

where E3(e2,h|e1) represents the conditional weak explanatory power, i.e. the degree to which h weakly explains e2 after conditioning on e1. Hence, the weak explanatory power of h for e1∧e2 is obtained by adding the degree to which it weakly explains e2 conditional on e1 to the degree to which it weakly explains e1. Good (1960) refers to this as strict additivity of the first kind. Clearly, there is always scope for the explanatory power to be greater when e2 is included than it was in the case of just e1. If the degree to which h explains e2 given e1 is positive, then the explanatory power increases; if it is negative, explanatory power decreases; and if it is zero, explanatory power remains unchanged. Even if h entails e1, the explanatory power could increase further. For example, if h also entailed e2 then it would increase further (provided e1 did not entail e2).

Returning to the earlier example, while Einstein’s atomic account, h, provides an excellent explanation of e, which gives a general description of Brownian motion, its explanatory power would be increased further if it could explain additional relevant evidence. For example, Einstein’s account explained not only the general phenomenon of Brownian motion, but also the much more specific results of Perrin’s 1908 experiments to determine the mean square displacement of particles undergoing Brownian motion and its relation to Avogadro’s number, which further confirmed the atomic theory. By contrast, the explanatory power of Einstein’s account would have decreased had there been additional evidence, e2 say, for which Einstein’s account had negative explanatory power (given e). So conjoining the original evidence e with additional positively relevant evidence explained by h would increase the explanatory power of h, while conjoining e with additional negatively relevant evidence such as e2 would decrease the explanatory power of h. What effect should conjoining e with a proposition about American green tree frogs, which is completely irrelevant to h, have on the explanatory power of h? According to E3, it has no effect whatsoever, which seems very reasonable.

However, measures E1 and E2 are not additive and so give very different results. In fact, this gives rise to a counterintuitive feature of these measures relating to entailment. If a hypothesis h entails evidence e, then conjoining e with further evidence cannot increase the explanatory power of h, no matter how well h explains this further evidence. And so if Einstein’s account entails a general description of Brownian motion, then its explanatory power would not be increased by conjoining this evidence with Perrin’s findings relating to Avogadro’s number.

Furthermore, suppose a hypothesis entails an explanandum that is not at all surprising because it has a high prior probability in light of background knowledge; then its explanatory power cannot be enhanced by entailing a further surprising explanandum. Planetary orbits (e1) that could be derived from Newton’s theory could also be derived from Einstein’s theory (h) and so the explanatory power of Einstein’s theory would be one according to E1 and E2. However, the perihelion precession of Mercury (e2) could also be derived from Einstein’s theory, but according to E1 and E2 its explanatory power for e1∧e2 would not be any greater than it was for e1. We might call this the problem of relevant evidence.

In summary, there is a very plausible way to make sense of the irrelevant evidence issue from the perspective of Good’s weak measure (and hence his strong measure too). Furthermore, I have argued that the non-additive nature of measures such as E1 and E2 can give rise to counterintuitive judgments about explanatory power and, in particular, the problem of relevant evidence. However, maybe there is another way to view these differences. Although E1, E2 and Good’s measure, E3, as well as E4, are all weak measures of explanatory power, they may nevertheless be explicating different concepts. Arguably, measures such as E1 and E2 are better understood as explications of the degree to which h entails e.11 However, if one wants a weak measure of explanatory power that does justice to explanatory scope and so increases appropriately as it explains more evidence, then an additive measure such as Good’s weak measure, E3, is suitable since it satisfies Eq. (10) as well as A4. I will also argue below that E3 has advantages in terms of explicating reduction of surprise.12

Explanatory power and information

Good (1968) considers how his measures of weak and strong explanatory power relate to semantic information. According to one very widely used account, the semantic information or information content of h is given by Bar-Hillel and Carnap (1953):

$$\mathrm{Inf}(h) = -\log P(h) \tag{11}$$

for a probability distribution P, while the information content of h given e is:

$$\mathrm{Inf}(h|e) = -\log P(h|e). \tag{12}$$

The information concerning h provided by e is given by:

$$\mathrm{Inf}(h,e) = \log \frac{P(e|h)}{P(e)}, \tag{13}$$

which Good also calls the mutual information between h and e since it is symmetric in h and e (Good, 1966, 1968). Hence, Good identifies the degree to which h weakly explains e [see Eq. (3)] with the information concerning h provided by e or equivalently, and perhaps more appropriately, the information concerning e provided by h.

Since Inf(e) is a decreasing function of P(e), it could be taken to represent the degree to which e is surprising, in which case Inf(e|h) would represent the degree to which e is surprising given h. Good’s weak measure of explanatory power can then be understood as representing how well h reduces the degree to which e is found to be surprising since it can be expressed as follows:13

$$E_3(e,h) = \log \frac{P(e|h)}{P(e)} = \mathrm{Inf}(e) - \mathrm{Inf}(e|h). \tag{14}$$

Schupbach and Sprenger (2011) also interpret their measure of explanatory power in terms of reducing surprise, but there are a couple of advantages to Good’s measure in this respect. First, as we have just seen, Good’s weak measure can be formulated very straightforwardly in terms of semantic information.

Second, Schupbach and Sprenger’s measure, E1, fails to discriminate appropriately in terms of reduction of surprise for different explananda which are entailed by a hypothesis, and the same is true of Crupi and Tentori’s measure, E2, since both give the maximum value of one in such cases. Suppose that e1 is very surprising in light of background knowledge, while e2 is not surprising at all. Further suppose that h entails e1 and also e2. While E1 and E2 quantify the degree to which h explains e1 to be the same as the degree to which it explains e2, according to Good’s measure, E3, h provides a much better weak explanation of e1 than it does of e2. In fact, since Inf(e1|h)= Inf(e2|h)=0, the degree to which h explains e1 is just Inf(e1) and similarly the degree to which h explains e2 is Inf(e2) according to E3. Since Inf(e1) can be thought of as the degree to which e1 is surprising in light of background knowledge only, it is clearly much greater than Inf(e2). As noted earlier, E1 and E2 are better thought of as measures of the degree to which h entails e.
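To illustrate the point with made-up numbers (not from the original), suppose h entails both a surprising explanandum e1 with P(e1)=0.01 and an unsurprising one e2 with P(e2)=0.9. Then, using base-2 logarithms,

$$E_3(e_1,h) = \mathrm{Inf}(e_1) = \log_2 \frac{1}{0.01} \approx 6.64 \text{ bits}, \qquad E_3(e_2,h) = \mathrm{Inf}(e_2) = \log_2 \frac{1}{0.9} \approx 0.15 \text{ bits},$$

whereas E1 and E2 assign the maximum value of one in both cases.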

Good’s strong measure of explanatory power, E5 [see Eq. (5)], can be expressed in terms of semantic information as follows:

$$E_5(e,h) = \log \frac{P(e|h)}{P(e)} + \gamma \log P(h) = \mathrm{Inf}(e) - \mathrm{Inf}(e|h) - \gamma\,\mathrm{Inf}(h) = \mathrm{Inf}(e,h) - \gamma\,\mathrm{Inf}(h). \tag{15}$$

In light of our discussion, we can then say that strong explanatory power measures how well h reduces the degree to which e is found surprising together with the inclusion of a penalty for the complexity of h.

Making Good’s measure precise

Recall that Good’s measure, E5, has a parameter, γ, which is required to be in the interval (0, 1). Can a particular value for γ be defended? As Good pointed out, E5 can be expressed as follows:

$$E_5(e,h) = (1-\gamma)\,\mathrm{Inf}(h,e) - \gamma\,\mathrm{Inf}(h|e). \tag{16}$$

On the basis of this expression, he suggested γ=1/2 as the simplest explicatum of E5 since it gives equal weighting to (weak) explanatory power and the term -Inf(h|e), which he associates with ‘the avoidance of “clutter”’. However, while Good’s suggestion is not implausible, a more convincing justification is needed.

To address this point, we can draw on a complexity criterion proposed for explanatory goodness (Glass, 2023). The criterion requires that for an explanation h of explanandum e to be a good one, the reduction in complexity of e brought about by h must be greater than the complexity introduced by h in the context of e, where the first of these quantities is represented by Inf(h,e) and the second by Inf(h|e). Expressed in terms of strong explanatory power, it is:

Complexity criterion for strong explanatory power. If ES(e,h) is a measure of strong explanatory power of h for e then:

$$E_S(e,h) \ge 0 \quad \text{if and only if} \quad \mathrm{Inf}(h,e) \ge \mathrm{Inf}(h|e). \tag{17}$$

Note that since Inf(e,h) is Good’s weak measure and Inf(h|e) ≥ 0, this means that a positive value of weak explanatory power is necessary, but not sufficient, for an explanation to be a good one.

In light of (16), E5(e,h) ≥ 0 if and only if (1-γ)Inf(h,e) ≥ γInf(h|e) and hence if E5 is to satisfy the complexity criterion, γ must be 1/2. This provides a strong justification for adopting this specific version of Good’s measure and, as noted earlier, for a given explanandum e this will give the same ordering of hypotheses as measure E6.
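Making the step to γ = 1/2 explicit (a short derivation added here): dividing through by 1-γ > 0 gives

$$E_5(e,h) \ge 0 \iff (1-\gamma)\,\mathrm{Inf}(h,e) \ge \gamma\,\mathrm{Inf}(h|e) \iff \mathrm{Inf}(h,e) \ge \frac{\gamma}{1-\gamma}\,\mathrm{Inf}(h|e).$$

Since Inf(h,e) and Inf(h|e) can vary independently across probability assignments, this threshold condition coincides with Inf(h,e) ≥ Inf(h|e) in all cases only when γ/(1-γ) = 1, i.e. γ = 1/2.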

Let us now return to the example from Sect. 2.4 about a bag containing 99 fair coins and one with an objective chance of 0.51 of landing heads. On being tossed, a randomly selected coin lands heads (e) and we considered the hypotheses ‘the selected coin is fair’ (h1) and ‘the selected coin is biased’ (h2). Using Good’s measure with γ=1/2, we find that E5(e,h1) ≈ log(0.9998)+log(0.9950) ≈ -0.0023, which is greater than E5(e,h2) ≈ log(1.0198)+log(0.1) ≈ -0.9915. So h1 is indeed the better explanation according to E5, whereas h2 would be judged better by weak measures since it is positively relevant to e while h1 is not. According to E5, h2 does not sufficiently reduce the complexity of e to compensate for the complexity introduced by h2. Notice, however, that even though E5 judges h1 to be better, it is clearly deficient in the sense that it has a negative degree of explanatory power, so it might be more accurate to say that h1 is not as bad an explanation as h2.
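These values can be reproduced in a few lines of Python (a sketch added here; base-10 logarithms appear to be used in the figures above):

import math

def e5(p_e_h, p_h, p_e, gamma=0.5):
    # Good's strong measure: log[ P(e|h) * P(h)^gamma / P(e) ], with base-10 logs.
    return math.log10(p_e_h / p_e) + gamma * math.log10(p_h)

p_e = 0.5 * 0.99 + 0.51 * 0.01     # = 0.5001
print(e5(0.5, 0.99, p_e))          # ~ -0.0023  (fair coin, h1)
print(e5(0.51, 0.01, p_e))         # ~ -0.9915  (biased coin, h2)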

Explanatory virtues and inference to the best explanation

We have already seen that Good relates his strong measure of explanatory power to the explanatory virtue of simplicity. According to his version of Ockham’s razor, if two hypotheses have equal likelihoods with respect to the explanandum we should prefer the simpler of the two, which he says is ‘equivalent to the choice of the more probable hypothesis’ (1968, p. 139). Given the discussion in Sects. 2.2 and 2.3, Good’s measure does indeed accommodate simplicity in a way that weak measures do not.

Good’s measure is also able to do justice to other explanatory virtues such as scope and unification. More specifically, it is his weak measure, which is a factor in his strong measure, that is able to capture these virtues. In terms of explanatory scope, we have already seen from Eq. (10) that the explanatory power of a hypothesis increases as it explains more evidence. In terms of unification, Myrvold (2003) develops an account in terms of informational relevance. Expressing a result of Myrvold’s in terms of Good’s measure of weak explanatory power gives:

$$E_3(e_1 \wedge e_2, h) = E_3(e_1, h) + E_3(e_2, h) + U(e_1, e_2; h), \tag{18}$$

where U(e1,e2;h) is the degree to which h unifies e1 and e2 and is given by

$$U(e_1, e_2; h) = I(e_1, e_2 | h) - I(e_1, e_2). \tag{19}$$

Hence h weakly explains e1∧e2 to a degree that is the sum of how well it weakly explains e1 and e2 separately plus the degree to which it unifies them. It follows that if the sum of the weak explanatory power for e1 and e2 is the same for two hypotheses, then the one that unifies e1 and e2 more will have greater weak explanatory power. If they also have the same priors, then the hypothesis that unifies e1 and e2 more will have greater strong explanatory power as well. A similar conclusion can be reached concerning Whewell’s (1847) ‘consilience of inductions’ in terms of the value of diverse evidence (see McGrew, 2016).
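For readers wanting to check Eq. (18), here is the short derivation (added here), on the assumption, not stated explicitly in the excerpt above, that I denotes Myrvold’s measure of informational relevance, I(e1,e2) = log [P(e1∧e2)/(P(e1)P(e2))], with I(e1,e2|h) its analogue conditional on h:

$$E_3(e_1 \wedge e_2, h) = \log\frac{P(e_1 \wedge e_2|h)}{P(e_1 \wedge e_2)} = \log\frac{P(e_1|h)}{P(e_1)} + \log\frac{P(e_2|h)}{P(e_2)} + \log\frac{P(e_1 \wedge e_2|h)}{P(e_1|h)\,P(e_2|h)} - \log\frac{P(e_1 \wedge e_2)}{P(e_1)\,P(e_2)},$$

where the last two terms are I(e1,e2|h) - I(e1,e2) = U(e1,e2;h).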

Does this mean that Good’s strong measure fully captures explanatory goodness? The various weak measures may well capture an aspect of explanatory goodness, but since they fail to accommodate simplicity, I have argued that they are not plausible candidates for measuring explanatory goodness in a general sense. Since Good’s measure incorporates simplicity as well as the other virtues described above, it is a much more plausible candidate. Whether it fully captures explanatory goodness, however, is another matter. As acknowledged in Sect. 2.1, there may be some limitations to what can be captured probabilistically and this could include limitations arising from the fact that the account does not attempt to capture what constitutes an explanation. Also, the current approach does not take into account the potential relevance of manipulations to explanatory goodness. Eva and Stern (2019) have shown how this can be done for Schupbach and Sprenger’s measure of explanatory power, so it would be interesting to explore whether a similar approach might be appropriate for the current measure. Nevertheless, as it stands, Good’s measure does seem to go a long way to capturing key aspects of explanatory goodness.

A related topic concerns the relevance of Good’s strong measure to IBE. Recent work has demonstrated the merits of E6 [Eq. (6)] in this regard (Glass, 2021) and, as we have seen, it produces the same ranking as Good’s strong measure when γ is 1/2. Results showed that using this measure for IBE finds the actual or true hypothesis much more frequently than versions of IBE based on weak measures. There is a lot more that could be said about explanatory virtues and IBE, but this brief discussion suggests that Good’s strong measure does well on both fronts.

Conclusion

Strong measures of explanatory power attempt to strike a balance between how well a hypothesis accounts for the explanandum (weak explanatory power) and the improbability/complexity of the hypothesis. As such, they can be viewed as ways of making Ockham's razor precise. While weak measures seek to capture an important aspect of explanation, I have argued that strong measures are better for quantifying explanatory goodness. In defence of Good's strong measure, I have presented two new derivations of it, explored its connection with information theory and explanatory virtues, shown how it can be made precise, and addressed objections to it. Since Good's strong measure depends on his weak measure, I have also presented several reasons for preferring his weak measure to the other weak measures. In particular, his weak measure is able to differentiate degrees of explanatory power in cases where a given hypothesis entails two explananda, one of which is more surprising than the other.

There are various directions for further work. As noted above, it would be interesting to explore the potential relevance of manipulations to explanatory goodness in the context of Good’s measure. Also, in debates about IBE and Bayesianism, it is usually assumed that a Bayesian approach requires selecting the hypothesis with the highest posterior probability. However, this is not the case for the Bayesian approach to IBE based on Good’s measure. This is particularly relevant in cases where there are multiple compatible hypotheses. Strong measures of explanatory power should shed light on when it is appropriate to accept conjunctive explanations involving two or more hypotheses rather than just a single hypothesis.

Acknowledgements

I would like to thank participants at the conference on ‘Scientific Explanations, Competing and Conjunctive’ at the University of Utah in June 2019 for helpful discussions and Jonah Schupbach for very insightful comments on an earlier draft. I would also like to thank anonymous reviewers for their comments and suggestions. This publication was made possible through the support of a grant from the John Templeton Foundation (Grant No. 61115). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.

Appendix

Proof of theorem 1

Lemma A.1

If $e' \in L_c$ is probabilistically independent of $e, h \in L_c$ and their conjunction, then $E_W(e \wedge e', h) = E_W(e, h)$.

Proof

Let $e, h, e' \in L_c$ and $P \in \mathcal{P}$ be such that $e'$ is independent of $e$, $h$ and their conjunction according to $P$.

Suppose also that $e_2, h_2 \in L_c$. We can construct a probability distribution $P' \in \mathcal{P}$ such that $P'(\pm e \wedge \pm h \wedge \pm e') = P(\pm e \wedge \pm h \wedge \pm e')$, where $\pm p$ denotes either $p$ or $\neg p$, so that all probabilities involving logical combinations of $e$, $h$ and $e'$ are preserved.

Now we can specify $P'$ in such a way that each of $e_2$ and $h_2$ is independent of $e$, $h$, $e \wedge h$ and $e'$, and $e_2$ and $h_2$ are also independent of each other. To obtain such a distribution we can set conditional probabilities as follows:

  • (i)

    $P'(e_2 \mid \pm e \wedge \pm h \wedge \pm e') = a \in (0, 1)$,

  • (ii)

    $P'(h_2 \mid \pm e \wedge \pm h \wedge \pm e' \wedge \pm e_2) = b \in (0, 1)$.

Note that $h_2$ is probabilistically independent of $e$, $h$ and $h \wedge e$, as is $e' \wedge e_2$, since (a) $P'(e' \wedge e_2 \mid e) = P'(e_2 \mid e' \wedge e)P'(e' \mid e) = P'(e_2 \mid e')P'(e') = P'(e' \wedge e_2)$ and so $e' \wedge e_2$ is independent of $e$, (b) $P'(e' \wedge e_2 \mid h) = P'(e_2 \mid h \wedge e')P'(e' \mid h) = P'(e_2 \mid e')P'(e') = P'(e' \wedge e_2)$ and so $e' \wedge e_2$ is independent of $h$, and (c) $P'(e' \wedge e_2 \mid h \wedge e) = P'(e_2 \mid h \wedge e \wedge e')P'(e' \mid h \wedge e) = P'(e_2 \mid e')P'(e') = P'(e' \wedge e_2)$ and so $e' \wedge e_2$ is independent of $h \wedge e$. Hence, given (A3), the relevant conditions for applying (A4) as follows are satisfied (see footnote 14):

$E_W(e \wedge e' \wedge e_2, h \wedge h_2) = w_c[E_W(e, h), E_W(e' \wedge e_2, h_2)] = w_c[E_W(e, h), \alpha].$

Similarly, we can show that the relevant conditions for applying (A4) as below are also satisfied

$E_W(e \wedge e' \wedge e_2, h \wedge h_2) = w_c[E_W(e \wedge e', h), E_W(e_2, h_2)] = w_c[E_W(e \wedge e', h), \alpha].$

Since $w_c$ is strictly increasing in each argument, it follows that $E_W(e \wedge e', h) = E_W(e, h)$. This holds for distribution $P'$, but since it was constructed so as to preserve the probabilities for all logical combinations of $e$, $h$ and $e'$, it also holds in distribution $P$. This establishes lemma A.1.
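As an aside (not part of Good's proof), lemma A.1 is easy to see for the specific log-ratio form of weak explanatory power, $\log[P(e \mid h)/P(e)]$, which is the weak component of Eq. (A2). If $e'$ is independent of $e$, of $h$ and of $e \wedge h$, then

$$\log\frac{P(e \wedge e' \mid h)}{P(e \wedge e')} = \log\frac{P(e \wedge e' \wedge h)/P(h)}{P(e)P(e')} = \log\frac{P(e \mid h)P(e')}{P(e)P(e')} = \log\frac{P(e \mid h)}{P(e)},$$

using the independence of $e'$ from $e \wedge h$ in the numerator and from $e$ in the denominator. The lemma itself, of course, concerns any measure $E_W$ satisfying the stated axioms, not just this particular form.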

Lemma A.1 will be used later in the proof of theorem 1 (in the proof of lemma A.3), but first it will be useful to introduce another lemma. Note that from (A1) it follows that there exist continuous, differentiable functions $w_1$ and $s_1$ such that for any $e, h \in L_c$ and any $P \in \mathcal{P}$, $E_W(e, h) = w_1[P(h \mid e), P(h), P(e)]$ and $E_S(e, h) = s_1[P(h \mid e), P(h), P(e)]$.

To simplify matters we can identify triplets $(x, y, z)$ representing $[P(h \mid e), P(h), P(e)]$ that satisfy the following conditions (see footnote 15):

  1. $0 < y, z < 1$

  2. $0 \le x \le 1$

  3. $x \ge \frac{y + z - 1}{z}$ since $xz = P(e \wedge h) \ge P(e) + P(h) - 1 = y + z - 1$.

  4. $x \le \frac{y}{z}$ since $xz = P(e \wedge h) \le P(h) = y$.

Let us then posit $w_1 : \{(x, y, z) \in [0,1] \times (0,1)^2 \mid \frac{y + z - 1}{z} \le x \le \frac{y}{z}\} \to \mathbb{R}$ and denote the domain of $w_1$ as $D_{w_1}$.

Lemma A.2

For any $x, y, z_1, z_2$ such that $x \in [0, 1]$, $y, z_1, z_2 \in (0, 1)$, $\frac{y + z_1 - 1}{z_1} \le x \le \frac{y}{z_1}$ and $\frac{y + z_2 - 1}{z_2} \le x \le \frac{y}{z_2}$, there exist $e, e', h \in L_c$ and $P \in \mathcal{P}$ such that $P(h \mid e) = P(h \mid e \wedge e') = x$, $P(h) = y$, $P(e) = z_1$ and $P(e \wedge e') = z_2$, where $P(e') = z_2/z_1$ and so $P(e \wedge e') = P(e)P(e')$.

Proof

This can be achieved by means of the following probability assignments:

  • $P(h \wedge e \wedge e') = xz_2$,

  • $P(h \wedge e \wedge \neg e') = x(z_1 - z_2)$,

  • $P(h \wedge \neg e \wedge e') = (y - xz_1)z_2/z_1$,

  • $P(h \wedge \neg e \wedge \neg e') = (y - xz_1)(1 - z_2/z_1)$,

  • $P(\neg h \wedge e \wedge e') = (1 - x)z_2$,

  • $P(\neg h \wedge e \wedge \neg e') = (1 - x)(z_1 - z_2)$,

  • $P(\neg h \wedge \neg e \wedge e') = [(1 - y) - (1 - x)z_1]z_2/z_1$,

  • $P(\neg h \wedge \neg e \wedge \neg e') = [(1 - y) - (1 - x)z_1](1 - z_2/z_1)$.
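These assignments can also be verified numerically. The Python sketch below (purely illustrative) picks admissible values of $x$, $y$, $z_1$ and $z_2$, with $z_2 \le z_1$ so that $P(e') = z_2/z_1$ is a probability, builds the eight-cell distribution just listed, and confirms the probabilities claimed in lemma A.2.

```python
from math import isclose

# Hypothetical admissible values: x in [0,1], y, z1, z2 in (0,1),
# with (y+z-1)/z <= x <= y/z for z = z1 and z = z2, and z2 <= z1.
x, y, z1, z2 = 0.6, 0.5, 0.7, 0.4

# The eight probability assignments from the proof of lemma A.2,
# indexed by the truth values of (h, e, e').
P = {
    (True,  True,  True):  x * z2,
    (True,  True,  False): x * (z1 - z2),
    (True,  False, True):  (y - x * z1) * z2 / z1,
    (True,  False, False): (y - x * z1) * (1 - z2 / z1),
    (False, True,  True):  (1 - x) * z2,
    (False, True,  False): (1 - x) * (z1 - z2),
    (False, False, True):  ((1 - y) - (1 - x) * z1) * z2 / z1,
    (False, False, False): ((1 - y) - (1 - x) * z1) * (1 - z2 / z1),
}

def prob(pred):
    """Probability of the event picked out by pred(h, e, e_prime)."""
    return sum(p for w, p in P.items() if pred(*w))

assert all(p >= 0 for p in P.values()) and isclose(sum(P.values()), 1.0)
assert isclose(prob(lambda h, e, ep: h), y)                      # P(h) = y
assert isclose(prob(lambda h, e, ep: e), z1)                     # P(e) = z1
assert isclose(prob(lambda h, e, ep: e and ep), z2)              # P(e & e') = z2
assert isclose(prob(lambda h, e, ep: ep), z2 / z1)               # P(e') = z2/z1
assert isclose(prob(lambda h, e, ep: h and e) / z1, x)           # P(h|e) = x
assert isclose(prob(lambda h, e, ep: h and e and ep) / z2, x)    # P(h|e & e') = x
print("lemma A.2 construction verified for these values")
```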

Lemma A.3

There is a continuous, differentiable function $s_2$ such that for any $e, h \in L_c$ and any $P \in \mathcal{P}$, $E_S(e, h) = s_2[P(h \mid e), P(h)]$.

Proof

Suppose there exist $(x, y, z_1)$ and $(x, y, z_2) \in D_{w_1}$, the domain of $w_1$, such that $w_1(x, y, z_1) \ne w_1(x, y, z_2)$. Then, by lemma A.2 there exist $e, e', h \in L_c$ and $P \in \mathcal{P}$ such that $P(h \mid e) = P(h \mid e \wedge e') = x$, $P(h) = y$, $P(e) = z_1$ and $P(e \wedge e') = z_2$, where $P(e') = z_2/z_1$. Clearly, $P(e \wedge e') = P(e)P(e')$ so $e'$ is independent of $e$. Similarly, $P(h \wedge e') = xz_2 + (y - xz_1)z_2/z_1 = yz_2/z_1 = P(h)P(e')$ so $e'$ is independent of $h$, and $P(h \wedge e \wedge e') = xz_2 = xz_1 \cdot z_2/z_1 = P(h \wedge e)P(e')$ so $e'$ is also independent of $h \wedge e$. Thus, there exist $e, e', h \in L_c$ and $P \in \mathcal{P}$ such that $E_W(e, h) = w_1(x, y, z_1) \ne w_1(x, y, z_2) = E_W(e \wedge e', h)$ even though $e'$ is independent of $e$, $h$ and their conjunction, contradicting lemma A.1. Conversely, lemma A.1 implies that for any $(x, y, z_1)$ and $(x, y, z_2) \in D_{w_1}$, $w_1(x, y, z_1) = w_1(x, y, z_2)$. Hence, lemma A.1 requires that there must exist $w_2$ such that, for any $e, h \in L_c$ and $P \in \mathcal{P}$, $E_W(e, h) = w_2[P(h \mid e), P(h)]$ and $w_2(x, y) = w_1(x, y, z)$. Hence it follows from (A2) that there is a differentiable function $s_2$ such that, for any $e, h \in L_c$ and $P \in \mathcal{P}$, $E_S(e, h) = s_2[P(h \mid e), P(h)]$ since a differentiable function of differentiable functions is itself differentiable. This establishes lemma A.3.

Given lemma A.3 and (A4), Good shows in his 1968 paper that, up to a differentiable monotonic transformation, $E_S(e, h)$ is given by

$\log[P(h \mid e)] + (\gamma - 1)\log[P(h)],$ (A1)

where $\gamma$ is a constant or, alternatively,

$\log\frac{P(e \mid h)}{P(e)} + \gamma\log[P(h)].$ (A2)
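The equivalence of these two forms can be seen directly from Bayes' theorem: since $\log P(h \mid e) = \log P(e \mid h) + \log P(h) - \log P(e)$, substituting into (A1) gives (A2):

$$\log[P(h \mid e)] + (\gamma - 1)\log[P(h)] = \log\frac{P(e \mid h)}{P(e)} + \gamma\log[P(h)].$$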

(A5) implies that if $P(e \mid h_1) = P(e \mid h_2)$, then $\gamma\log[P(h_1)] \ge \gamma\log[P(h_2)]$ if and only if $P(h_1) \ge P(h_2)$, and so $\gamma > 0$. (A6) implies that if $P(h_1 \mid e) = P(h_2 \mid e)$, then $(\gamma - 1)\log[P(h_1)] \ge (\gamma - 1)\log[P(h_2)]$ if and only if $P(e \mid h_1) \ge P(e \mid h_2)$, but since $P(h_1 \mid e) = P(h_2 \mid e)$ this will be the case if and only if $P(h_1) \le P(h_2)$, and hence $\gamma < 1$.

These conditions also require that any monotonic transformation of this function must be increasing. Suppose that $E_S(e, h)$ were a decreasing function of (A2). Suppose also that $P(e \mid h_1) = P(e \mid h_2)$. Then, if $\gamma\log[P(h_1)] > \gamma\log[P(h_2)]$, and hence $P(h_1) > P(h_2)$, it would follow that $E_S(e, h_1) < E_S(e, h_2)$, which contradicts (A5). This establishes theorem 1.

Proof of theorem 2

The result for $E_W$ follows from (A1), (A7) and (A8), as demonstrated by theorem 3 of Cohen (2016), which was in turn proved in the context of $E_3$ as a confirmation measure by Crupi et al. (2013). Lemma A.1 follows trivially given the result for $E_W$, and lemma A.3 then follows straightforwardly from lemma A.1 and (A2). The result for $E_S$ can then be established from the relevant part of the proof for theorem 1 based on lemma A.3, (A4), (A5) and (A6).

Declarations

Conflict of interest

The author has no competing interests to declare that are relevant to the content of this article.

Footnotes

1

This is clear from Crupi and Tentori’s expression ‘assuming candidate explanans h’.

2

Even if that is not the case, measures of explanatory power may still be applicable in cases where the relevant probabilities are accessible. It would be impossible to do justice to the vast literature on the problem of old evidence here, but for some recent proposals see Sprenger (2015) and Eva and Hartmann (2020).

3

Hitchcock (1999) discusses a similar example to motivate indeterministic contrastive explanation. He considers a person who is puzzled that a photon was transmitted rather than absorbed when the polarizer is believed to be aligned along the horizontal, but suggests that noting that the polarizer was in fact aligned very close to the vertical explains the contrast. For discussion of the relevance of high likelihoods in the context of inference to the best explanation, see Lipton (2004, Chap. 7). Finally, note that in the proposed approach, higher likelihood does not always result in greater explanatory goodness (see Sect. 2.4).

4

It could be questioned whether this does justice to our intuitions about complexity, but as we shall see in Sect. 4, it is based on a widely used account of semantic information and is relevant to some of the formal results presented. Given this approach, the simplicity of a hypothesis can be represented as the negative of its complexity.

5

This is Schupbach and Sprenger's third adequacy condition, CA3. Note that two propositions $p$ and $q$ are said to be independent if $P(p \wedge q) = P(p)P(q)$ and they are conditionally independent given $r$ if $P(p \wedge q \mid r) = P(p \mid r)P(q \mid r)$.

6

One response might be to deny that h1 provides an explanation since it lowers the probability of e (for discussion of this topic, particularly in the context of causal explanation, see Salmon (1980), Hitchcock (2004)). In the current context, h1 does seem to provide an explanation. It might be that h1 is not a very good explanation; just better than h2 (see Sect. 4.4). Note that measures E1-E4 also allow for negative degrees of explanatory power.

7

A1 corresponds closely to one of Good’s axioms, while A2 corresponds to an assumption he states for a measure of strong explanatory power, as does the part of A4 that refers to ES. A5 and A6 do not correspond to Good’s formally stated axioms and assumptions, but they provide a formal way to constrain the form of E5 and more specifically the possible values of γ instead of his informal discussion.

8

That is, EW satisfies A1, A3 and A4, while ES satisfies A1, A2, A4, A5 and A6.

9

Good’s weak measure turns out to have a significant advantage over other measures in terms of how it accounts for reduction or surprise as we shall see in Sect. 4.3.

10

Strictly speaking, EW satisfies A1, A7 and A8 while ES satisfies A1, A2, A4, A5 and A6. Note that only the part of A4 relating to ES is required.

11

Both E1 and E2 are well-known measures of the degree to which h confirms e and can very plausibly be considered as measures of confirmation in the sense of partial entailment since they are maximal when h entails e and minimal when h entails the negation of e. For further discussion, see Fitelson (2006), Crupi and Tentori (2013).

12

E6 and E7 do not face the problems of irrelevant evidence, irrelevant conjunction or relevant evidence. E6 also satisfies A4, but E7 does not.

13

See Crupi and Tentori (2014) and Milne (2014) for similar discussions in the context of confirmation.

14

This part of the proof is based on that of Theorem 4 by Cohen (2016).

15

This part of the proof is similar in style to those used by Crupi and Tentori (2012) in the context of weak measures of explanatory power.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Bar-Hillel Y, Carnap R. Semantic information. The British Journal for the Philosophy of Science. 1953;IV(14):147–157. [Google Scholar]
  2. Cohen MP. On three measures of explanatory power with axiomatic representations. British Journal for the Philosophy of Science. 2016;67(4):1077–1089. [Google Scholar]
  3. Crupi V, Tentori K. A second look at the logic of explanatory power (with two novel representation theorems). Philosophy of Science. 2012;79(3):365–385. [Google Scholar]
  4. Crupi V, Tentori K. Confirmation as partial entailment: A representation theorem in inductive logic. Journal of Applied Logic. 2013;11(4):364–372. [Google Scholar]
  5. Crupi V, Tentori K. State of the field: Measuring information and confirmation. Studies in History and Philosophy of Science Part A. 2014;47:81–90. [Google Scholar]
  6. Crupi V, Chater N, Tentori K. New axioms for probability and likelihood ratio measures. British Journal for the Philosophy of Science. 2013;64(1):189–204. [Google Scholar]
  7. Douven I. Inference to the best explanation made coherent. Philosophy of Science. 1999;66:S424–S435. [Google Scholar]
  8. Douven I. Inference to the best explanation, Dutch books, and inaccuracy minimisation. The Philosophical Quarterly. 2013;63(252):428–444. [Google Scholar]
  9. Douven I. Abduction. In: Zalta EN, editor. The Stanford encyclopedia of philosophy, summer. 2017. Stanford University, Metaphysics Research Lab; 2017. [Google Scholar]
  10. Eva B, Hartmann S. On the origins of old evidence. Australasian Journal of Philosophy. 2020;98(3):481–494. [Google Scholar]
  11. Eva B, Stern R. Causal explanatory power. The British Journal for the Philosophy of Science. 2019;70(4):1029–1050. [Google Scholar]
  12. Fitelson B. Logical foundations of evidential support. Philosophy of Science. 2006;73(5):500–512. [Google Scholar]
  13. Friedman M. Explanation and scientific understanding. Journal of Philosophy. 1974;71(1):5–19. [Google Scholar]
  14. Garber D. Old evidence and logical omniscience in Bayesian confirmation theory. In: Earman J, editor. Minnesota Studies in the Philosophy of Science. University of Minnesota Press; 1983. pp. 99–131. [Google Scholar]
  15. Glass DH, et al. Coherence, explanation and Bayesian networks. In: O’Neill M, Sutcliffe R, Ryan C, et al., editors. Artificial intelligence and cognitive science. Lecture notes in artificial intelligence. Springer; 2002. pp. 177–182. [Google Scholar]
  16. Glass DH. Coherence measures and inference to the best explanation. Synthese. 2007;157:275–296. [Google Scholar]
  17. Glass DH. Inference to the best explanation: Does it track truth? Synthese. 2012;185:411–427. [Google Scholar]
  18. Glass DH. Coherence, explanation, and hypothesis selection. The British Journal for the Philosophy of Science. 2021;72(1):1–26. [Google Scholar]
  19. Glass, D. H. (2023). Information and explanatory goodness. Unpublished manuscript.
  20. Glymour C. Theory and evidence. Princeton University Press; 1980. [Google Scholar]
  21. Good IJ. Weight of evidence, corroboration, explanatory power, information, and the utility of experiments. Journal of the Royal Statistical Society: Series B. 1960;22:319–331. [Google Scholar]
  22. Good IJ. A derivation of the probabilistic explication of information. Journal of the Royal Statistical Society: Series B (Methodological) 1966;28:578–581. [Google Scholar]
  23. Good IJ. Corroboration, explanation, evolving probability, simplicity and a sharpened razor. The British Journal for the Philosophy of Science. 1968;19(2):123–143. [Google Scholar]
  24. Hempel CG, Oppenheim P. Studies in the logic of explanation. Philosophy of Science. 1948;15(2):135–175. [Google Scholar]
  25. Hitchcock C. Contrastive explanation and the demons of determinism. The British Journal for the Philosophy of Science. 1999;50(4):585–612. [Google Scholar]
  26. Hitchcock C. Do all and only causes raise the probabilities of effects? In: Collins J, Hall N, Paul LA, editors. Causation and Counterfactuals. MIT Press; 2004. pp. 403–418. [Google Scholar]
  27. Howson C. The ‘old evidence’ problem. The British Journal for the Philosophy of Science. 1991;42(4):547–555. [Google Scholar]
  28. Howson C. Putting on the Garber style? Better not. Philosophy of Science. 2017;84(4):659–676. [Google Scholar]
  29. Jeffrey RC. Statistical explanation vs statistical inference. In: Rescher N, editor. Essays in honor of Carl G. Hempel. D. Reidel; 1969. pp. 104–113. [Google Scholar]
  30. Kitcher P. Explanatory unification and the causal structure of the world. In: Kitcher P, Salmon W, editors. Scientific Explanation. University of Minnesota Press; 1989. pp. 410–505. [Google Scholar]
  31. Lipton P. Inference to the best explanation. 2. Routledge; 2004. [Google Scholar]
  32. Mackonis A. Inference to the best explanation, coherence and other explanatory virtues. Synthese. 2013;190(6):975–995. [Google Scholar]
  33. McGrew L. Evidential diversity and the negation of H: A probabilistic account of the value of varied evidence. Ergo. 2016 doi: 10.3998/ergo.12405314.0003.010. [DOI] [Google Scholar]
  34. McGrew T. Confirmation, heuristics and explanatory reasoning. British Journal for the Philosophy of Science. 2003;54:553–567. [Google Scholar]
  35. Milne P. Information, confirmation, and conditionals. Journal of Applied Logic. 2014;12(3):252–262. [Google Scholar]
  36. Myrvold WC. A Bayesian account of the virtue of unification. Philosophy of Science. 2003;70(2):399–423. [Google Scholar]
  37. Olsson EJ. What is the problem of coherence and truth? Journal of Philosophy. 2002;99:246–272. [Google Scholar]
  38. Popper K. The logic of scientific discovery. Routledge; 1959. [Google Scholar]
  39. Railton P. Probability, explanation, and information. Synthese. 1981;48:233–256. [Google Scholar]
  40. Rosenkrantz RD. Why Glymour is a Bayesian. In: Earman J, editor. Minnesota studies in the philosophy of science. University of Minnesota Press; 1983. pp. 69–97. [Google Scholar]
  41. Salmon W. Statistical explanation. In: Salmon W, editor. Statistical explanation and statistical relevance. University of Pittsburgh Press; 1971. pp. 29–87. [Google Scholar]
  42. Salmon WC. Probabilistic causality. Pacific Philosophical Quarterly. 1980;61:50–74. [Google Scholar]
  43. Salmon WC. Scientific explanation and the causal structure of the world. Princeton University Press; 1984. [Google Scholar]
  44. Schupbach JN. Comparing probabilistic measures of explanatory power. Philosophy of Science. 2011;78(5):813–829. [Google Scholar]
  45. Schupbach JN. Inference to the best explanation, cleaned up and made respectable. In: McCain K, Poston T, editors. Best explanations: New essays on inference to the best explanation. Oxford University Press; 2018. pp. 39–61. [Google Scholar]
  46. Schupbach JN, Sprenger J. The logic of explanatory power. Philosophy of Science. 2011;78(1):105–127. [Google Scholar]
  47. Sprenger J. A novel solution to the problem of old evidence. Philosophy of Science. 2015;82(3):383–401. [Google Scholar]
  48. Strevens M. Do large probabilities explain better? Philosophy of Science. 2000;67:336–390. [Google Scholar]
  49. Strevens M. Probabilistic explanation. In: Sklar L, editor. Physical theory: Method and interpretation. Oxford University Press; 2014. pp. 40–62. [Google Scholar]
  50. van Fraassen BC. The scientific image. Oxford University Press; 1980. [Google Scholar]
  51. Whewell W. The philosophy of the inductive sciences, founded upon their history. John W. Parker; 1847. [Google Scholar]
  52. Woodward J. Making things happen: A theory of causal explanation. Oxford University Press; 2003. [Google Scholar]
  53. Woodward J. Scientific explanation. In: Zalta EN, editor. The Stanford encyclopedia of Philosophy, Fall 2017 edn. Stanford University, Metaphysics Research Lab; 2017. [Google Scholar]
