letter
. 2021 Nov 29;11:23044. doi: 10.1038/s41598-021-02096-3

Pairwise difference regressions are just weighted averages

Carlos Góes
PMCID: PMC8630001  PMID: 34845244

arising from: R. F. Savaris et al.; Scientific Reports 10.1038/s41598-021-84092-1 (2021).

Savaris et al.1 aim at “verifying if staying at home had an impact on mortality rates.” This short note shows that the methodology they have applied in their paper does not allow them to do so. An estimated coefficient $\beta \approx 0$ does not imply that there is no association between the variables in either country. Rather, their pairwise difference regressions compute coefficients that are weighted averages of region-specific time series regressions, such that it is possible that the association is significant in both regions but their weighted average is close to zero. Therefore, the results do not back up the conclusions of the paper.

Consider two regions: A and B. Suppose that the true relationships between the change in deaths per million ($\Delta Y_t^i$) and the change in an index of staying at home ($\Delta X_t^i$) at epidemiological week $t$ in countries $i = A, B$ are the following:

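$$
\begin{aligned}
\Delta Y_t^A &= \beta^A \Delta X_t^A + \varepsilon_t^A \\
\Delta Y_t^B &= \beta^B \Delta X_t^B + \varepsilon_t^B
\end{aligned}
$$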

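For simplicity of exposition, assume that $\Delta X_t^A, \Delta X_t^B, \varepsilon_t^A, \varepsilon_t^B$ are all zero-mean, iid processes. By subtracting the second equation from the first and defining $\Delta Y_t \equiv \Delta Y_t^A - \Delta Y_t^B$ and $\Delta X_t \equiv \Delta X_t^A - \Delta X_t^B$, we can write: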

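$$
\begin{aligned}
\Delta Y_t^A - \Delta Y_t^B &= \beta(\Delta X_t^A - \Delta X_t^B) + (\beta^A - \beta)\Delta X_t^A - (\beta^B - \beta)\Delta X_t^B + (\varepsilon_t^A - \varepsilon_t^B) \\
\Delta Y_t &= \beta \Delta X_t + \eta_t
\end{aligned}
\tag{1}
$$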

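where $\eta_t \equiv (\beta^A - \beta)\Delta X_t^A - (\beta^B - \beta)\Delta X_t^B + (\varepsilon_t^A - \varepsilon_t^B)$. It is easy to see that, for $\beta^i \neq \beta$, estimation of $\beta$ will not be consistent, since, by construction, $\operatorname{cov}(\Delta X_t, \eta_t) \neq 0$.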

If nonetheless one estimates (1) by ordinary least squares, what does the regression coefficient $\beta$ converge to? It turns out that it converges to a variance-weighted average of $\beta^A$ and $\beta^B$, as summarized in the following proposition.

Proposition 1

Let $\Delta X_t^A, \Delta X_t^B, \varepsilon_t^A, \varepsilon_t^B, \beta^A, \beta^B, \beta$ all be as above. Then $\hat{\beta}$, the ordinary least squares coefficient of regressing $\Delta Y_t$ on $\Delta X_t$, converges in probability to:

$$
\beta = w\beta^A + (1 - w)\beta^B \tag{2}
$$

with $w \equiv \dfrac{E[(\Delta X_t^A)^2]}{E[(\Delta X_t^A)^2] + E[(\Delta X_t^B)^2]}$.

Proof

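Under the stated assumptions, $\hat{\beta} = \dfrac{\sum_{t}^{T} \Delta X_t \Delta Y_t}{\sum_{t}^{T} \Delta X_t^2} \xrightarrow{p} \dfrac{E[\Delta Y_t \Delta X_t]}{E[\Delta X_t^2]} \equiv \beta$. One can calculate the population parameter $\beta$ analytically: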

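$$
\begin{aligned}
\beta &= \frac{E[\Delta Y_t \Delta X_t]}{E[\Delta X_t^2]} = \frac{E[(\Delta Y_t^A - \Delta Y_t^B)(\Delta X_t^A - \Delta X_t^B)]}{E[(\Delta X_t^A - \Delta X_t^B)^2]} \\
&= \frac{E[\Delta Y_t^A \Delta X_t^A] + E[\Delta Y_t^B \Delta X_t^B]}{E[(\Delta X_t^A)^2] + E[(\Delta X_t^B)^2]} \qquad \text{(using } E[\Delta X_t^A \Delta X_t^B] = E[\Delta X_t^A \Delta Y_t^B] = E[\Delta X_t^B \Delta Y_t^A] = 0 \text{)} \\
&= \frac{E[(\Delta X_t^A)^2]}{E[(\Delta X_t^A)^2] + E[(\Delta X_t^B)^2]} \cdot \frac{E[\Delta Y_t^A \Delta X_t^A]}{E[(\Delta X_t^A)^2]} + \frac{E[(\Delta X_t^B)^2]}{E[(\Delta X_t^A)^2] + E[(\Delta X_t^B)^2]} \cdot \frac{E[\Delta Y_t^B \Delta X_t^B]}{E[(\Delta X_t^B)^2]}
\end{aligned}
$$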

Note that $\dfrac{E[\Delta Y_t^A \Delta X_t^A]}{E[(\Delta X_t^A)^2]} = \beta^A$ and $\dfrac{E[\Delta Y_t^B \Delta X_t^B]}{E[(\Delta X_t^B)^2]} = \beta^B$. Using that and the definition of $w$, we arrive at the desired result.

The intuition behind equation (2) in the Proposition is simple. Whenever the variance of $\Delta X_t^A$ is large relative to that of $\Delta X_t^B$, $w \to 1$ and $\beta \to \beta^A$. Similarly, if the variance of $\Delta X_t^B$ is large relative to that of $\Delta X_t^A$, $w \to 0$ and $\beta \to \beta^B$.
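The weighting in Proposition 1 can be checked numerically. The following is a minimal sketch, not the author's code: the coefficient values, standard deviations, sample size, seed, and function names are all illustrative assumptions. It draws independent zero-mean normal regressors with unequal variances and compares the no-intercept OLS slope of the pairwise differences with the weighted average in equation (2).

```python
import random


def ols_slope(x, y):
    """No-intercept OLS slope, sum(x*y) / sum(x*x), as in the proof."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def simulate_pairwise(beta_a, beta_b, sd_a, sd_b, T, rng):
    """Draw one sample of size T and regress dY = dY_A - dY_B on dX = dX_A - dX_B."""
    dx, dy = [], []
    for _ in range(T):
        xa = rng.gauss(0.0, sd_a)               # Delta X_t^A
        xb = rng.gauss(0.0, sd_b)               # Delta X_t^B
        ya = beta_a * xa + rng.gauss(0.0, 1.0)  # Delta Y_t^A
        yb = beta_b * xb + rng.gauss(0.0, 1.0)  # Delta Y_t^B
        dx.append(xa - xb)
        dy.append(ya - yb)
    return ols_slope(dx, dy)


# Illustrative (assumed) parameter values: region A has the larger variance.
rng = random.Random(0)
beta_a, beta_b = 2.0, -1.0
sd_a, sd_b = 3.0, 1.0

w = sd_a**2 / (sd_a**2 + sd_b**2)         # weight from equation (2)
beta_pop = w * beta_a + (1 - w) * beta_b  # population value of beta
beta_hat = simulate_pairwise(beta_a, beta_b, sd_a, sd_b, T=200_000, rng=rng)

print(f"w = {w:.2f}, weighted average = {beta_pop:.3f}, OLS estimate = {beta_hat:.3f}")
```

Because region A's regressor has nine times the variance of region B's in this setup, the weight is 0.9 and the pairwise coefficient sits close to region A's coefficient, even though region B's coefficient has the opposite sign.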

What does this mean for the analysis of Savaris et al.1? In general, it means that one cannot interpret their estimated $\hat{\beta}$ without knowing the underlying relative variances. Additionally, one cannot infer that an insignificant (or even numerically zero) $\hat{\beta}$ implies absence of association in either country.

To see that, suppose countries A and B have identical variance in their independent variables, but $\beta^A$ and $\beta^B$ are different. In country A, the policymaker adjusts stay-at-home orders in response to the increase in deaths, such that the change in the percentage of the public staying at home is positively correlated with the change in deaths. In country B, the policymaker does not act, such that the change in share of population staying at home is negatively correlated with contacts, infections, and deaths.

Consider the case in which $\beta^B = -\beta^A$. Then, since the regions have identical variance, $w = 1/2$ and $\beta = 0$ even though the true association is nonzero in both countries. The regression coefficients in Savaris et al.1 should not lead one to conclude that, in either country, there is no association between the change in mobility and the change in deaths per million. Figure 1 shows the result of 10,000 simulated $\hat{\beta}$ in which $\beta^A = 10$ and $\beta^B = -10$. In this case, $\operatorname{var}(\Delta X_t^A) = \operatorname{var}(\Delta X_t^B)$ and the variables are iid and normally distributed. As expected, sample estimates are distributed around the population value of $\beta = 0$.

Figure 1.


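In-sample simulated $\hat{\beta}$ for 10,000 random draws with $\Delta X_t^i \sim N(0, 10)$, $\varepsilon_t^i \sim N(0, 1)$, and $\Delta Y_t^i = \beta^i \Delta X_t^i + \varepsilon_t^i$, for $i = A, B$; $T = 1{,}000$; and $\beta^A = 10$, $\beta^B = -10$. As expected, the sample values are distributed around the true population value of $\beta = 0$.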
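The simulation design described in the caption can be sketched as follows. This is a hedged, scaled-down illustration rather than the code behind Figure 1: it uses 300 replications instead of 10,000 to keep the pure-Python run short, it reads the 10 in N(0, 10) as a variance (an assumption, since the caption does not say whether it is a variance or a standard deviation), and the function name and seed are hypothetical.

```python
import random
import statistics


def pairwise_beta_hat(beta_a, beta_b, sd_x, T, rng):
    """One replication: no-intercept OLS slope of (dY_A - dY_B) on (dX_A - dX_B)."""
    sxy = 0.0
    sxx = 0.0
    for _ in range(T):
        xa = rng.gauss(0.0, sd_x)
        xb = rng.gauss(0.0, sd_x)
        ya = beta_a * xa + rng.gauss(0.0, 1.0)
        yb = beta_b * xb + rng.gauss(0.0, 1.0)
        dx = xa - xb
        dy = ya - yb
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


rng = random.Random(42)
sd_x = 10 ** 0.5  # assumption: N(0, 10) is read as variance 10
n_reps = 300      # assumption: far fewer than the 10,000 draws in the figure

draws = [pairwise_beta_hat(10.0, -10.0, sd_x, T=1000, rng=rng) for _ in range(n_reps)]
mean_hat = statistics.mean(draws)
sd_hat = statistics.stdev(draws)

print(f"mean of beta_hat = {mean_hat:.3f}, std. dev. = {sd_hat:.3f}")
```

The simulated estimates should cluster near zero, while each region's own coefficient is 10 in absolute value.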

For $\beta^A \neq \beta^B$, then, region-specific dynamics are heterogeneous and, as shown by Pesaran & Smith2, aggregating or pooling slopes can lead to biased estimates, making individual regressions for each group member preferable. If the authors assume that $\beta^A = \beta^B$ for each pair in their sample (i.e., homogeneous $\beta$), then dynamic panels would have many advantages in terms of efficiency and use of instruments to circumvent endogeneity. In either case, their pairwise approach would not be appropriate.
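To illustrate why individual regressions are preferable under heterogeneous slopes, the sketch below fits a separate no-intercept OLS regression for each region and compares the results with the pairwise difference regression on the same simulated data. All parameter values, the seed, and the helper name are assumptions chosen for illustration; this is not an implementation of the estimators in Pesaran & Smith2.

```python
import random


def ols_slope(x, y):
    """No-intercept OLS slope: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


rng = random.Random(1)
T = 5000
beta_a, beta_b = 10.0, -10.0  # assumed heterogeneous slopes
sd_x = 10 ** 0.5              # assumed: same regressor variance in both regions

xa = [rng.gauss(0.0, sd_x) for _ in range(T)]
xb = [rng.gauss(0.0, sd_x) for _ in range(T)]
ya = [beta_a * x + rng.gauss(0.0, 1.0) for x in xa]
yb = [beta_b * x + rng.gauss(0.0, 1.0) for x in xb]

# Region-specific regressions estimate each slope separately.
slope_a = ols_slope(xa, ya)
slope_b = ols_slope(xb, yb)

# The pairwise difference regression blends the two slopes into one number.
dx = [a - b for a, b in zip(xa, xb)]
dy = [a - b for a, b in zip(ya, yb)]
slope_pair = ols_slope(dx, dy)

print(f"region A: {slope_a:.2f}, region B: {slope_b:.2f}, pairwise: {slope_pair:.2f}")
```

In this setup the two region-specific slopes come out close to 10 and -10, whereas the pairwise coefficient is close to zero and says nothing about either region on its own.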

In order to verify if “staying at home had an impact on mortality rates,” it would be necessary to address many other issues in the analysis, including, but not limited to, omitted variable bias, measurement error, and endogeneity of the regressors. However, as shown above, even in a purely correlational analysis, with no causality claims, the applied methodology will simply deliver a weighted average of coefficients across the two regions. An estimated coefficient $\beta \approx 0$ does not imply that there is no association between the variables in either country. Therefore, their conclusion does not follow from their regressions.


Author contributions

This article is solo authored.

Competing interests

The author declares no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-02096-3.

References

  • 1. Savaris, R. F., Pumi, G., Dalzochio, J. & Kunst, R. Stay-at-home policy is a case of exception fallacy: an internet-based ecological study. Scientific Reports 11, 5313. issn: 2045-2322 (2021). [Retracted]
  • 2. Pesaran, M. H. & Smith, R. Estimating long-run relationships from dynamic heterogeneous panels. J. Econ. 68, 79–113 (1995). doi: 10.1016/0304-4076(94)01644-F.
