Shanghai Archives of Psychiatry. 2018 Apr 25;30(2):139–143. doi: 10.11919/j.issn.1002-0829.218026

Simpson’s Paradox: Examples

Bokai WANG 1, Pan WU 2, Brian KWAN 3, Xin M TU 3, Changyong FENG 1,4,*
PMCID: PMC5936043  PMID: 29736137

Summary

Simpson’s paradox is prevalent in many areas. It characterizes the inconsistency between the conditional and marginal interpretations of the same data. In this paper, we illustrate through examples how Simpson’s paradox can arise in continuous, categorical, and time-to-event data.

Key words: conditional expectation, odds ratio, time-to-event analysis

1. Introduction

Consider the following scenario. Suppose the 4th grade students of two schools, Alpha and Beta, in the DYC school district took a national standardized math test, and we want to compare the average scores of the two schools. Assume we are told that the average scores of both male and female students in Beta are higher than those in Alpha. What can we say about the overall average scores of the two schools? Is it true that School Beta has a higher overall average score than Alpha? Intuitively, the answer seems to be yes. To be more specific, assume the average scores of male and female students in each school are as presented in Table 1.

Table 1.

Average scores of male and female students in two schools

| School (X1) | Male (X2 = 1): n | Male (X2 = 1): Average | Female (X2 = 2): n | Female (X2 = 2): Average |
|---|---|---|---|---|
| Alpha (1) | 80 | 84 | 20 | 80 |
| Beta (2) | 20 | 85 | 80 | 81 |

Both male and female students in School Beta clearly have higher average scores. However, a simple calculation shows that the overall average scores of Alpha and Beta are 83.2 and 81.8, respectively. School Alpha has the higher overall average!
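The pooled averages quoted above can be checked directly from Table 1. The following is a minimal sketch in Python; the dictionary layout and the function name `pooled_average` are illustrative choices, not part of the original analysis.

```python
# Table 1: (number of students, average score) by school and gender.
table1 = {
    "Alpha": {"male": (80, 84), "female": (20, 80)},
    "Beta": {"male": (20, 85), "female": (80, 81)},
}


def pooled_average(groups):
    """Overall average of a school: total score divided by total headcount."""
    total_score = sum(n * avg for n, avg in groups.values())
    total_n = sum(n for n, _ in groups.values())
    return total_score / total_n


for school, groups in table1.items():
    print(school, pooled_average(groups))
```

Each school's overall average is pulled toward the average of its larger gender group, which is why Alpha (mostly male, the higher-scoring group) ends up ahead.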

Suppose the students in School Beta received a more advanced method of instruction that improves on the traditional method (which was adopted by School Alpha). Intuitively, the students in Beta should score better on average. Why is this example so counterintuitive? Is there anything wrong here? Is the average score a reasonable measure of the performance of students in a school? In fact, when we compare two schools, we usually assume implicitly that the proportions of male students in the two schools are approximately the same. It is easy to prove that if the proportions of male students in the two schools are exactly the same, and the average scores of male and female students in Beta are higher than those of their counterparts in Alpha, then the overall average score in Beta is higher. Our example shows that a difference in gender composition may reverse the relation we want to study.
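The claim about equal proportions follows from a one-line calculation. As a sketch, write p for the common proportion of male students (0 ≤ p ≤ 1), and a_m, a_f, b_m, b_f for the male and female average scores in Alpha and Beta; these symbols are introduced here only for illustration and do not appear in the original text.

```latex
\[
\bigl[p\,b_m + (1-p)\,b_f\bigr] - \bigl[p\,a_m + (1-p)\,a_f\bigr]
  = p\,(b_m - a_m) + (1-p)\,(b_f - a_f) > 0
  \quad\text{whenever } b_m > a_m \text{ and } b_f > a_f .
\]
```

With the Table 1 values both differences equal 1, so under any common proportion p Beta would lead by exactly one point.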

The scenario above is an example of the well-known Simpson’s paradox.[1] Loosely speaking, Simpson’s paradox says that a conditional relation (conditional on gender in each school in the example) does not imply the corresponding marginal relation, and vice versa. Although the statistical community had long known about this ‘inconsistency’ between the conditional and marginal interpretations of the same data (see, for example, Yule[2]), the impact of Simpson’s paradox extends well beyond the statistical community. In fact, Simpson’s paradox is prevalent in many areas, from the natural sciences[3] to the social sciences[4] and even philosophy.[5] We can even say that it is an inherent property of data from observational studies.[6]

In this paper, we discuss some examples of Simpson’s paradox in continuous, categorical, and time-to-event data. In Section 2 we give a general statistical interpretation of Simpson’s paradox using conditional expectation. In Sections 3 and 4, we show through examples how Simpson’s paradox can occur in categorical data and in time-to-event data. Section 5 concludes.

2. Simpson’s Paradox and Conditional Expectation

We know that if

\[ \frac{a}{b} = \frac{c}{d}, \]

then

\[ \frac{a+c}{b+d} = \frac{a}{b} = \frac{c}{d} \]

(assuming b+d≠0). Does a similar property hold for inequalities of fractions? Specifically, assume sij, nij (i=1,2, j=1,2) are positive numbers with

\[ \frac{s_{11}}{n_{11}} < \frac{s_{21}}{n_{21}} \quad\text{and}\quad \frac{s_{12}}{n_{12}} < \frac{s_{22}}{n_{22}}. \]

Is it true that

\[ \frac{s_{11}+s_{12}}{n_{11}+n_{12}} < \frac{s_{21}+s_{22}}{n_{21}+n_{22}}\,? \]

Simpson [1] showed that it may not be. For example,

graphic file with name sap-30-139-e005.jpg

However,

graphic file with name sap-30-139-e006.jpg

This means that the pooled data show a reversed relation. This is the original form of ‘Simpson’s paradox’. In this section, we construct a probability model to study why this reversal occurs.
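The figures in Table 1 give one concrete instance of such a reversal. As an illustrative reading (an assumption made here, not notation taken from the original), take each numerator to be the total score of a gender group, that is, group size times group average, and each denominator to be the group size, with Alpha on the left and Beta on the right of each comparison.

```latex
\[
\frac{6720}{80} = 84 < 85 = \frac{1700}{20},
\qquad
\frac{1600}{20} = 80 < 81 = \frac{6480}{80},
\]
\[
\text{but}\qquad
\frac{6720 + 1600}{80 + 20} = 83.2 \;>\; 81.8 = \frac{1700 + 6480}{20 + 80}.
\]
```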

Let Y be a random variable with E|Y|<∞. Suppose X1 and X2 are two random variables with Xi ∈ {1,2,…,ki}, where ki (≥2), i=1,2, are positive integers. Then, for any m ∈ {1,…,k1},

\[ E[Y \mid X_1 = m] = \sum_{j=1}^{k_2} E[Y \mid X_1 = m, X_2 = j]\,\Pr\{X_2 = j \mid X_1 = m\}. \tag{1} \]

Let us connect equation (1) to the example of average scores in Section 1. Let X1=1 or 2 denote School Alpha or Beta, and X2=1 or 2 denote male or female, respectively. Let Y denote the score of a randomly selected 4th grade student in those two schools. Then from Table 1 we have

\[ E[Y \mid X_1 = 1, X_2 = 1] = 84, \quad E[Y \mid X_1 = 1, X_2 = 2] = 80, \quad E[Y \mid X_1 = 2, X_2 = 1] = 85, \quad E[Y \mid X_1 = 2, X_2 = 2] = 81. \]

It is obvious that

\[ E[Y \mid X_1 = 1, X_2 = j] < E[Y \mid X_1 = 2, X_2 = j], \quad j = 1, 2. \tag{2} \]

Equation (2) shows that both male and female students in School Beta have higher average scores. To calculate the average score of each school, however, we need to take the gender composition into account. By (1), the average score of each school is a weighted average of the average scores of its male and female students, with weights given by the gender proportions

\[ \Pr\{X_2 = 1 \mid X_1 = 1\} = 0.8, \quad \Pr\{X_2 = 2 \mid X_1 = 1\} = 0.2, \quad \Pr\{X_2 = 1 \mid X_1 = 2\} = 0.2, \quad \Pr\{X_2 = 2 \mid X_1 = 2\} = 0.8. \]

Using (1), we find that

\[ E[Y \mid X_1 = 1] = 83.2 > 81.8 = E[Y \mid X_1 = 2]. \tag{3} \]

A close look at the data shows that the distribution of gender plays an important role in reversing the inequalities from (2) to (3). If the inequalities in (2) hold and the two schools have the same proportion of male students, the average score in Beta will be higher than that in Alpha.
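One way to see the role of the gender distribution is to recompute both school averages under a common, hypothetical proportion of male students. The sketch below does this using the weighted-average form described for equation (1); the function name `school_average` and the grid of proportions are illustrative assumptions, not part of the original analysis.

```python
# Gender-specific average scores from Table 1.
avg = {
    "Alpha": {"male": 84, "female": 80},
    "Beta": {"male": 85, "female": 81},
}


def school_average(school, p_male):
    """Weighted average of the male and female means, with weight p_male on males."""
    return p_male * avg[school]["male"] + (1 - p_male) * avg[school]["female"]


# Observed proportions of male students: 0.8 in Alpha, 0.2 in Beta.
print("observed mix:", school_average("Alpha", 0.8), school_average("Beta", 0.2))

# Hypothetical common proportion of male students in both schools.
for p in (0.0, 0.2, 0.5, 0.8, 1.0):
    alpha, beta = school_average("Alpha", p), school_average("Beta", p)
    print(f"p_male={p:.1f}: Alpha={alpha:.1f}, Beta={beta:.1f}")
```

Under any common proportion Beta comes out one point ahead, because both gender-specific gaps in Table 1 equal 1; the reversal appears only when the two schools are weighted differently.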

In this example, gender is called a confounder in the causal inference literature.[7] Although the new instruction method increases the scores of both boys and girls, the imbalance in the gender distributions of the two schools can confound the effect of the new instruction method. This has been widely studied in the causal inference literature based on observational studies, especially in epidemiology.[6]

The example above shows how Simpson’s paradox occurs in continuous outcomes. In the following two sections, we illustrate how such a phenomenon can occur in categorical data and time-to-event data.

3. Simpson’s Paradox in Categorical Data Analysis

Suppose a certain disease can be characterized as being less severe or more severe. Patients have the option of going to either of two hospitals for treatment: a better hospital or a normal hospital. The outcome of the treatment is binary: success or failure. Consider the example in Table 2.

We can see that for less severe patients, the success rate in the better hospital (18/20 = 90%) is higher than in the normal hospital (64/80 = 80%). The same holds for more severe patients (32/80 = 40% versus 4/20 = 20%).

We construct three more tables from Table 2. Table 3 is the cross-classification of treatment and outcome. The overall success rates of the better and normal hospitals are 50/100 and 68/100, respectively. This seems to show that the success rate in the normal hospital is higher than in the better hospital, which is not what we would expect.

Table 2.

Success rate of the treatment outcome in different severity of the disease

| Hospital | Severity | Success | Failure | Total |
|---|---|---|---|---|
| Better | Less severe | 18 | 2 | 20 |
| Better | More severe | 32 | 48 | 80 |
| Normal | Less severe | 64 | 16 | 80 |
| Normal | More severe | 4 | 16 | 20 |

Table 3.

Summary of the cross-classification of the treatment and outcome

| Treatment | Success | Failure | Total |
|---|---|---|---|
| Better | 50 | 50 | 100 |
| Normal | 68 | 32 | 100 |

Table 4 is the cross-classification of severity and the outcome. The success rates of less severe and more severe patients are 82/100 and 36/100, respectively. This is reasonable.

Table 4.

Summary of the cross-classification of the severity and outcome

| Severity | Success | Failure | Total |
|---|---|---|---|
| Less severe | 82 | 18 | 100 |
| More severe | 36 | 64 | 100 |

Table 5 is the cross-classification of treatment and severity. We can see that the proportion of more severe patients in the better hospital is much higher than that in the normal hospital.

Table 5.

Summary of the cross-classification of the treatment and severity

| Treatment | Less severe | More severe | Total |
|---|---|---|---|
| Better | 20 | 80 | 100 |
| Normal | 80 | 20 | 100 |
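Tables 3 to 5 are all marginal summaries of Table 2, obtained by summing over one of the three factors. The following Python sketch reproduces them; the data layout and the helper name `collapse` are illustrative assumptions.

```python
from collections import defaultdict

# Table 2: (hospital, severity) -> (successes, failures).
table2 = {
    ("Better", "Less severe"): (18, 2),
    ("Better", "More severe"): (32, 48),
    ("Normal", "Less severe"): (64, 16),
    ("Normal", "More severe"): (4, 16),
}


def collapse(key_index):
    """Sum successes and failures over the factor NOT selected by key_index."""
    out = defaultdict(lambda: [0, 0])
    for key, (succ, fail) in table2.items():
        out[key[key_index]][0] += succ
        out[key[key_index]][1] += fail
    return {k: tuple(v) for k, v in out.items()}


table3 = collapse(0)  # treatment by outcome
table4 = collapse(1)  # severity by outcome
# Table 5: number of patients in each treatment-by-severity cell.
table5 = {key: succ + fail for key, (succ, fail) in table2.items()}

print(table3)
print(table4)
print(table5)
```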

Let O denote the outcome, which has possible values of s (“success”) or f (“failure”), T denote the treatment with possible values b (“better”) or n (“normal”), and S denote the severity with possible values l (“less severe”) or m (“more severe”). Note that

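\[ \Pr\{O = s \mid T = t\} = \Pr\{O = s \mid T = t, S = l\}\,\Pr\{S = l \mid T = t\} + \Pr\{O = s \mid T = t, S = m\}\,\Pr\{S = m \mid T = t\}, \quad t \in \{b, n\}. \]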

Although it is clear from Table 2 that Pr{O=s|T=b,S=l}>Pr{O=s|T=n,S=l} and Pr{O=s|T=b,S=m}>Pr{O=s|T=n,S=m}, Table 3 shows that Pr{O=s|T=b}<Pr{O=s|T=n}. From Tables 4 and 5 we know that the success rate for more severe patients is much lower than that for less severe patients, and that the proportion of more severe patients in the better hospital is much higher than that in the normal hospital. This imbalance reverses the direction of the treatment effect.
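The same reversal can be expressed with odds ratios, which the paper lists as a key word but does not compute. The sketch below is an illustration added here, not the authors' calculation: it compares the stratum-specific success rates and odds ratios from Table 2 with the pooled ones. The function names are hypothetical.

```python
# Table 2: (treatment, severity) -> (successes, failures),
# with b/n for better/normal and l/m for less/more severe.
counts = {
    ("b", "l"): (18, 2),
    ("b", "m"): (32, 48),
    ("n", "l"): (64, 16),
    ("n", "m"): (4, 16),
}


def rate(succ, fail):
    """Success rate of a (successes, failures) cell."""
    return succ / (succ + fail)


def odds_ratio(better, normal):
    """Odds of success in the better hospital relative to the normal one."""
    (s_b, f_b), (s_n, f_n) = better, normal
    return (s_b * f_n) / (f_b * s_n)


def pooled(hospital):
    """Add the two severity strata of one hospital."""
    succ = sum(s for (h, _), (s, _f) in counts.items() if h == hospital)
    fail = sum(f for (h, _), (_s, f) in counts.items() if h == hospital)
    return succ, fail


for sev in ("l", "m"):
    b, n = counts[("b", sev)], counts[("n", sev)]
    print(sev, rate(*b), rate(*n), odds_ratio(b, n))

print("pooled", rate(*pooled("b")), rate(*pooled("n")),
      odds_ratio(pooled("b"), pooled("n")))
```

The stratum-specific odds ratios (2.25 for less severe and about 2.67 for more severe patients) both exceed 1, while the pooled odds ratio (about 0.47) is below 1.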

4. Simpson’s Paradox in Time-to-event Data Analysis

Simpson’s paradox may also occur in time-to-event data.[8] Suppose we have two treatment groups, denoted by X1 (1 for treatment, 0 for control), and two age groups, with X2 = 1 if age is ≤ 65 years and X2 = 0 if age is > 65 years. Suppose the hazard functions of the lifetime T of patients, given the treatment and age categories, are

graphic file with name sap-30-139-e013.jpg

Furthermore, we assume that the distributions of the age categories within the treatment groups are

graphic file with name sap-30-139-e014.jpg

It is obvious that within each age category, the hazard function of the treatment group is always below that of the control group. Figure 1 shows the hazard functions of the two treatment groups within each age category. Within each age category, the treatment clearly does better than the control.

Figure 1.

Hazard functions in different age categories

The marginal hazard functions of the two treatment groups are

graphic file with name sap-30-139-e015.jpg

Figure 2 shows the marginal hazard functions of the two treatment groups after integrating out age. In Figure 1, the hazard ratio of treatment versus control is constant within each age category. However, the marginal hazard ratio is no longer constant. This may cause confusion, especially if the follow-up time is censored at some time point. In that case, the estimated hazard function of the treatment group may be much higher than that of the control group, although this may not be what was expected.

Figure 2.

Marginal hazard functions of two treatment groups
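To make the mechanism concrete, the following Python sketch computes marginal hazards for a mixture of two age strata with constant hazards. All numerical values are HYPOTHETICAL and chosen only for illustration; they are not the hazard functions or age distributions used in the paper, and the labels `young` and `old` simply stand for the two age categories.

```python
import math

# HYPOTHETICAL constant hazards (not the paper's values): within each age
# group the treatment hazard is half the control hazard.
hazard = {
    "treatment": {"young": 0.1, "old": 1.0},
    "control": {"young": 0.2, "old": 2.0},
}
# HYPOTHETICAL age mix: the treatment group is mostly old, the control mostly young.
age_mix = {
    "treatment": {"young": 0.2, "old": 0.8},
    "control": {"young": 0.8, "old": 0.2},
}


def marginal_hazard(group, t):
    """Hazard at time t after integrating out age.

    For strata with survival S_j(t) = exp(-lambda_j * t), the marginal
    density is sum_j p_j * lambda_j * S_j(t) and the marginal survival is
    sum_j p_j * S_j(t); the marginal hazard is their ratio.
    """
    density = sum(
        age_mix[group][a] * hazard[group][a] * math.exp(-hazard[group][a] * t)
        for a in hazard[group]
    )
    survival = sum(
        age_mix[group][a] * math.exp(-hazard[group][a] * t) for a in hazard[group]
    )
    return density / survival


for t in (0.0, 1.0, 2.0, 5.0, 20.0):
    h_trt, h_ctl = marginal_hazard("treatment", t), marginal_hazard("control", t)
    print(f"t={t:5.1f}  treatment={h_trt:.3f}  control={h_ctl:.3f}  ratio={h_trt / h_ctl:.3f}")
```

With these made-up numbers the marginal hazard of the treatment group starts above that of the control group (0.82 versus 0.56 at t = 0) and drops below it only later, even though the conditional hazard ratio is 0.5 in both age groups. A study censored early would therefore see the treatment group doing worse.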

5. Conclusion

Simpson’s paradox is very common in observational studies because of confounding. In this paper, we used examples to show how this phenomenon can occur for continuous, categorical, and survival outcomes. If confounding effects are not addressed appropriately, conclusions obtained from statistical analyses may be totally wrong. The study of Simpson’s paradox, or more generally of the effects of confounders, falls under the rubric of the theory of causal inference, which is especially relevant in the era of big data, as most data are observational in nature and confounders can obscure relationships of interest if not addressed.

Biography

Bokai Wang obtained his BS in Statistics from the Nankai University in 2010 and his MS in Applied Statistics from the Bowling Green State University (Bowling Green, OH) in 2012. He is currently a PhD student in Statistics at the University of Rochester. His research interests include but are not limited to Survival Analysis, Causal Inference, and Variable Selection in Biomedical Research. As of now he has published 7 papers in peer reviewed journals.

Footnotes

Funding statement

This study received no external funding.

Conflict of interest statement

The authors have no conflict of interest to declare.

Authors’ contributions

Bokai Wang, Changyong Feng, and Xin M. Tu: theoretical derivation

Pan Wu and Brian Kwan: manuscript drafting

References


Articles from Shanghai Archives of Psychiatry are provided here courtesy of Shanghai Mental Health Center
