Pairwise difference regressions are just weighted averages

Carlos Góes

doi:10.1038/s41598-021-02096-3

letter

Scientific Reports. 2021 Nov 29;11:23044. doi: 10.1038/s41598-021-02096-3


PMCID: PMC8630001 PMID: 34845244

arising from: R. F. Savaris et al.; Scientific Reports 10.1038/s41598-021-84092-1 (2021).

Savaris et al.¹ aim at “verifying if staying at home had an impact on mortality rates.” This short note shows that the methodology applied in their paper does not allow them to do so. An estimated coefficient $\hat{β} \approx 0$ does not imply that there is no association between the variables in either country. Rather, their pairwise difference regressions compute coefficients that are weighted averages of region-specific time series regressions, so the association can be significant in both regions while the weighted average is close to zero. Therefore, the results do not back up the conclusions of the paper.

Consider two regions, A and B. Suppose that the true relationships between the change in deaths per million $(Δ Y_{t}^{i})$ and the change in an index of staying at home $(Δ X_{t}^{i})$ at epidemiological week $t$ in regions $i = A, B$ are the following:

$$\begin{aligned}
Δ Y_{t}^{A} &= β_{A} Δ X_{t}^{A} + ε_{t}^{A} \\
Δ Y_{t}^{B} &= β_{B} Δ X_{t}^{B} + ε_{t}^{B}
\end{aligned}$$

For simplicity in exposition, assume that $Δ X_{t}^{A}, Δ X_{t}^{B}, ε_{t}^{A}, ε_{t}^{B}$ are all zero mean, iid processes. By subtracting the second equation from the first and defining $Δ Y_{t} \equiv Δ Y_{t}^{A} - Δ Y_{t}^{B}$ and $Δ X_{t} \equiv Δ X_{t}^{A} - Δ X_{t}^{B}$ , we can write:

$$\begin{aligned}
Δ Y_{t}^{A} - Δ Y_{t}^{B} &= β (Δ X_{t}^{A} - Δ X_{t}^{B}) + (β_{A} - β) Δ X_{t}^{A} - (β_{B} - β) Δ X_{t}^{B} + (ε_{t}^{A} - ε_{t}^{B}) \\
Δ Y_{t} &= β Δ X_{t} + η_{t}
\end{aligned} \tag{1}$$

where $η_{t} \equiv (β_{A} - β) Δ X_{t}^{A} - (β_{B} - β) Δ X_{t}^{B} + (ε_{t}^{A} - ε_{t}^{B})$. It is easy to see that, for $β_{i} \neq β$, estimation of $β$ will not be consistent, since, by construction, $\operatorname{cov}(Δ X_{t}, η_{t}) \neq 0$.
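The covariance in question can be written out explicitly. The following is a sketch that assumes, as the proof of Proposition 1 also does, that the regressors are uncorrelated across regions and with both error terms; all variables are zero mean, so covariances equal expected products:

$$\begin{aligned}
\operatorname{cov}(Δ X_{t}, η_{t}) &= E \left[ (Δ X_{t}^{A} - Δ X_{t}^{B}) \left( (β_{A} - β) Δ X_{t}^{A} - (β_{B} - β) Δ X_{t}^{B} + ε_{t}^{A} - ε_{t}^{B} \right) \right] \\
&= (β_{A} - β) E [{(Δ X_{t}^{A})}^{2}] + (β_{B} - β) E [{(Δ X_{t}^{B})}^{2}]
\end{aligned}$$

Under these assumptions the expression is nonzero for a generic $β$. It vanishes only when $β$ equals one particular combination of $β_{A}$ and $β_{B}$ weighted by the second moments of the regressors.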

If one nonetheless estimates (1) by ordinary least squares, what does the regression coefficient $\hat{β}$ converge to? It turns out that it converges to a variance-weighted average of $β_{A}$ and $β_{B}$, as summarized in the following proposition.

Proposition 1

Let $Δ X_{t}^{A}, Δ X_{t}^{B}, ε_{t}^{A}, ε_{t}^{B}, β_{A}, β_{B}, β$ all be as above. Then $\hat{β}$, the ordinary least squares coefficient from regressing $Δ Y_{t}$ on $Δ X_{t}$, converges in probability to:

$$β = w β_{A} + (1 - w) β_{B} \tag{2}$$

with $w \equiv \frac{E [{(Δ X_{t}^{A})}^{2}]}{E [{(Δ X_{t}^{A})}^{2}] + E [{(Δ X_{t}^{B})}^{2}]}$ .

Proof

Under the stated assumptions, $\hat{β} = \frac{\sum_{t=1}^{T} Δ X_{t} Δ Y_{t}}{\sum_{t=1}^{T} Δ X_{t}^{2}} \overset{p}{\to} \frac{E [Δ Y_{t} Δ X_{t}]}{E [Δ X_{t}^{2}]} \equiv β$. One can calculate the population parameter $β$ analytically:

$$\begin{aligned}
β &= \frac{E [Δ Y_{t} Δ X_{t}]}{E [Δ X_{t}^{2}]} \\
&= \frac{E [(Δ Y_{t}^{A} - Δ Y_{t}^{B}) (Δ X_{t}^{A} - Δ X_{t}^{B})]}{E [{(Δ X_{t}^{A} - Δ X_{t}^{B})}^{2}]} \\
&= \frac{E [Δ Y_{t}^{A} Δ X_{t}^{A}] + E [Δ Y_{t}^{B} Δ X_{t}^{B}]}{E [{(Δ X_{t}^{A})}^{2}] + E [{(Δ X_{t}^{B})}^{2}]} \quad \because \; E [Δ X_{t}^{A} Δ X_{t}^{B}] = E [Δ X_{t}^{A} Δ Y_{t}^{B}] = E [Δ X_{t}^{B} Δ Y_{t}^{A}] = 0 \\
&= \frac{E [{(Δ X_{t}^{A})}^{2}]}{E [{(Δ X_{t}^{A})}^{2}] + E [{(Δ X_{t}^{B})}^{2}]} \frac{E [Δ Y_{t}^{A} Δ X_{t}^{A}]}{E [{(Δ X_{t}^{A})}^{2}]} + \frac{E [{(Δ X_{t}^{B})}^{2}]}{E [{(Δ X_{t}^{A})}^{2}] + E [{(Δ X_{t}^{B})}^{2}]} \frac{E [Δ Y_{t}^{B} Δ X_{t}^{B}]}{E [{(Δ X_{t}^{B})}^{2}]}
\end{aligned}$$

Note that $\frac{E [Δ Y_{t}^{A} Δ X_{t}^{A}]}{E [{(Δ X_{t}^{A})}^{2}]} = β_{A}$ and $\frac{E [Δ Y_{t}^{B} Δ X_{t}^{B}]}{E [{(Δ X_{t}^{B})}^{2}]} = β_{B}$ . Using that and the definition of w we arrive at the desired result. $□$

The intuition behind (2) is simple. Whenever the variance of $Δ X_{t}^{A}$ is large relative to that of $Δ X_{t}^{B}$, $w \to 1$ and $β \to β_{A}$. Similarly, if the variance of $Δ X_{t}^{B}$ is large relative to that of $Δ X_{t}^{A}$, $w \to 0$ and $β \to β_{B}$.
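As a small numerical illustration of (2), the sketch below evaluates the weight $w$ and the limit $β$ for a few variance configurations. The function name `pairwise_plim` and the specific variance values are hypothetical choices made for illustration; because the regressors are zero mean, their variances stand in for the second moments in the definition of $w$.

```python
def pairwise_plim(beta_a, beta_b, var_x_a, var_x_b):
    """Return (w, beta) from equation (2).

    var_x_a and var_x_b play the role of E[(dX^A)^2] and E[(dX^B)^2];
    with zero-mean regressors these second moments are the variances.
    """
    w = var_x_a / (var_x_a + var_x_b)
    return w, w * beta_a + (1.0 - w) * beta_b


# Illustrative (assumed) configurations: equal variances, A dominant, B dominant.
for var_a, var_b in [(10.0, 10.0), (90.0, 10.0), (10.0, 90.0)]:
    w, beta = pairwise_plim(10.0, -10.0, var_a, var_b)
    print(f"var_A={var_a:5.1f}  var_B={var_b:5.1f}  ->  w={w:.2f}, beta={beta:+.2f}")
```

With $β_{A} = 10$ and $β_{B} = -10$, equal variances give $w = 1/2$ and a limit of zero, while a variance ratio of 9 to 1 in favor of region A gives $w = 0.9$ and a limit of 8.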

What does this mean for the analysis of Savaris et al.¹? In general, it means that one cannot interpret their estimated $\hat{β}$ without knowing the underlying relative variances. Additionally, one cannot infer that an insignificant (or even numerically zero) $\hat{β}$ implies absence of association in either country.

To see that, suppose countries A and B have identical variance in their independent variables, but $β_{A}$ , $β_{B}$ are different. In country A, the policymaker adjusts stay-at-home orders in response to the increase in deaths, such that the change in the percentage of the public staying at home is positively correlated with the change in deaths. In country B, the policymaker does not act, such that the change in share of population staying at home is negatively correlated with contacts, infections, and deaths.

Consider the case in which $β_{B} = - β_{A}$. Then, since the regions have identical variance, $w = 1 / 2$ and $β = 0$ even though the true association is nonzero in both countries. The regression coefficients in Savaris et al.¹ should therefore not lead one to conclude that, in either country, there is no association between the change in mobility and the change in deaths per million. Figure 1 shows the result of 10,000 simulated $\hat{β}$ in which $β_{A} = 10$ and $β_{B} = - 10$. In this case, $\operatorname{var}(Δ X_{t}^{A}) = \operatorname{var}(Δ X_{t}^{B})$ and the variables are iid and normally distributed. As expected, sample estimates are distributed around the population value of $β = 0$.

Figure 1. In-sample simulated $\hat{β}$ for 10,000 random draws with $Δ X_{t}^{i} \sim N (0, 10)$, $ε_{t}^{i} \sim N (0, 1)$, and $Δ Y_{t}^{i} = β_{i} Δ X_{t}^{i} + ε_{t}^{i}$, for $i = A, B$; $T = 1{,}000$; and $β_{A} = 10$, $β_{B} = - 10$. As expected, the sample values are distributed around the true population value of $β = 0$.
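A simulation along these lines can be sketched with NumPy. This is not the author's code: the function name `simulate_beta_hat`, the seed, and the reduced default of 2,000 draws (chosen to keep memory use modest) are assumptions, and the second argument of $N(0, 10)$ is read here as a variance. Because both regions share the same dispersion under either reading, the population value stays at zero.

```python
import numpy as np


def simulate_beta_hat(n_draws=2_000, T=1_000, beta_a=10.0, beta_b=-10.0,
                      var_x=10.0, seed=0):
    """Simulate pairwise-difference OLS slopes, one per draw.

    Assumption: N(0, 10) is interpreted as variance 10 for both regions.
    """
    rng = np.random.default_rng(seed)
    sd_x = np.sqrt(var_x)
    # Region-specific regressors and outcomes, shape (n_draws, T).
    dx_a = rng.normal(0.0, sd_x, size=(n_draws, T))
    dx_b = rng.normal(0.0, sd_x, size=(n_draws, T))
    dy_a = beta_a * dx_a + rng.normal(0.0, 1.0, size=(n_draws, T))
    dy_b = beta_b * dx_b + rng.normal(0.0, 1.0, size=(n_draws, T))
    # Pairwise differences and the no-intercept OLS slope for each draw.
    dx = dx_a - dx_b
    dy = dy_a - dy_b
    return (dx * dy).sum(axis=1) / (dx ** 2).sum(axis=1)


beta_hats = simulate_beta_hat()
print("mean of simulated slopes:", beta_hats.mean())
print("std of simulated slopes: ", beta_hats.std())
```

The mean of the simulated slopes should sit close to zero, while each region's own slope is $\pm 10$ by construction.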

For $β_{A} \neq β_{B}$, then, region-specific dynamics are heterogeneous and, as shown by Pesaran & Smith², aggregating or pooling slopes can lead to biased estimates, making individual regressions for each group member preferable. If the authors instead assume that $β_{A} = β_{B}$ for each pair in their sample (i.e., homogeneous $β$), then dynamic panels would have many advantages in terms of efficiency and the use of instruments to circumvent endogeneity. In either case, their pairwise approach would not be appropriate.
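The contrast between region-specific and pairwise regressions can be illustrated on a single simulated sample. The sketch below is a hypothetical example under the same data-generating process as Figure 1 (reading 10 as a variance); the helper `ols_slope` and the seed are assumptions, and the no-intercept estimator mirrors the one used in the proof.

```python
import numpy as np


def ols_slope(x, y):
    """No-intercept OLS slope: sum(x * y) / sum(x ** 2)."""
    return float(np.sum(x * y) / np.sum(x ** 2))


rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
T = 1_000
beta_a, beta_b = 10.0, -10.0

# Same data-generating process in each region, with opposite slopes.
dx_a = rng.normal(0.0, np.sqrt(10.0), T)
dx_b = rng.normal(0.0, np.sqrt(10.0), T)
dy_a = beta_a * dx_a + rng.normal(0.0, 1.0, T)
dy_b = beta_b * dx_b + rng.normal(0.0, 1.0, T)

slope_a = ols_slope(dx_a, dy_a)                   # region A on its own
slope_b = ols_slope(dx_b, dy_b)                   # region B on its own
slope_pair = ols_slope(dx_a - dx_b, dy_a - dy_b)  # pairwise difference

print(f"region A slope: {slope_a:+.3f}")
print(f"region B slope: {slope_b:+.3f}")
print(f"pairwise slope: {slope_pair:+.3f}")
```

In this setup the separate regressions recover slopes near $+10$ and $-10$, whereas the pairwise slope falls near zero and reveals neither.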

In order to verify if “staying at home had an impact on mortality rates,” it would be necessary to address many other issues in the analysis, including, but not limited to, omitted variable bias, measurement error, and endogeneity of the regressors. However, as shown above, even in a purely correlational analysis, with no causality claims, the applied methodology will simply deliver a weighted average of coefficients across the two regions. An estimated coefficient $\hat{β} \approx 0$ does not imply that there is no association between the variables in either country. Therefore, their conclusion does not follow from their regressions.


Author contributions

This article is solo authored.

Competing interests

The author declares no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-02096-3.

References

1. Savaris, R. F., Pumi, G., Dalzochio, J. & Kunst, R. Stay-at-home policy is a case of exception fallacy: an internet-based ecological study. Scientific Reports 11, 5313 (2021). issn: 2045-2322. [Retracted]
2. Pesaran, M. H. & Smith, R. Estimating long-run relationships from dynamic heterogeneous panels. J. Econ. 68, 79–113 (1995). doi: 10.1016/0304-4076(94)01644-F
