ABSTRACT.
Identifying the effects of environmental change on the transmission of vectorborne and zoonotic diseases is of fundamental importance in the face of rapid global change. Causal inference approaches, including instrumental variable (IV) estimation, hold promise in disentangling plausibly causal relationships from observational data in these complex systems. Valle and Zorello Laporta recently critiqued the application of such approaches in our study of the effects of deforestation on malaria transmission in the Brazilian Amazon on the grounds that key statistical assumptions were not met. Here, we respond to this critique by 1) deriving the IV estimator to clarify the assumptions that Valle and Zorello Laporta conflate and misrepresent in their critique, 2) discussing these key assumptions as they relate to our original study and how our original approach reasonably satisfies the assumptions, and 3) presenting model results using alternative instrumental variables that arguably satisfy key assumptions more strongly, illustrating that our results and original conclusion (that deforestation drives malaria transmission) remain unchanged.
There is substantial and increasing interest in understanding the role that processes of global change are playing in the ecology and transmission of vectorborne and zoonotic diseases.1,2 Although these questions are of fundamental importance given the increasing rate of climate and land use change and the large proportion of emerging infectious diseases that are vectorborne or of zoonotic origin,3 causally linking these two processes is an enormous challenge. Take as an example the case of deforestation impacts on malaria transmission in the Brazilian Amazon, the focus of MacDonald and Mordecai4 and the critique by Valle and Zorello Laporta.5 The gold standard of a randomized controlled trial in which deforestation is experimentally manipulated and randomly assigned to different regions to assess its impact on malaria transmission presents obvious logistical and ethical barriers that make such an approach largely infeasible. As a result, researchers must rely on observational data and use statistical approaches to approximate, as closely as possible, the experimental ideal.
One promising set of statistical techniques, broadly referred to as causal inference methods and including instrumental variable (IV) estimation, is increasingly being leveraged to disentangle plausibly causal relationships from observational data in ecology. Because of the challenges just described, these approaches have been used by researchers assessing global change impacts on infectious disease,6–14 including in another recent study investigating the effects of deforestation on malaria transmission in Brazil,14 with similar results to our own work. Valle and Zorello Laporta5 rightly point out that model assumptions are critically important in such approaches and that causal conclusions should be carefully drawn in these contexts. However, the authors unfortunately conflate the assumptions of IV estimation in their perspective piece. Because IV is a relatively new approach in ecology and environmental science,6 it is important that the underlying assumptions are clear for appropriate application.
IV is a useful approach to overcome what is known as endogeneity bias, which is due to a relationship between the error term and one or more of the explanatory variables (formally, $\mathrm{Cov}(x_i, \varepsilon_i) \neq 0$, where $\varepsilon_i$ and $x_i$ represent the error term and explanatory variable for observation $i$). Such a relationship could be due to bidirectional causality in which, for example, deforestation may drive malaria transmission but malaria burden may also influence rates of deforestation. In IV, a third variable, known as an instrument ($z_i$), is used to isolate exogenous variation in the explanatory variable $x_i$ and recover a statistically consistent estimator for the true relationship between the explanatory variable and the outcome.
The instrument must meet two conditions for IV to be a consistent estimator, which are sometimes termed “relevance” and “exclusion” criteria. In words, the instrument must be statistically associated with the endogenous variable (“relevance”) and must be related to the outcome only through its relationship with the endogenous variable (“exclusion”). Although the wording is easy to remember, it leaves much open to interpretation. For example, does relevance require a causal link? Does exclusion require statistical independence? The derivation makes these key assumptions much more apparent. Before showing the derivation, we first provide brief background to our original study,4 the critique by Valle and Zorello Laporta5 and our response.
In MacDonald and Mordecai,4 we were first interested in predicting annual malaria incidence as a function of annual deforestation and used aerosol optical depth (AOD) in the month of September from MODIS satellite imagery as our “instrument.” We expand on the methodology and terminology later in this article, but set the context of the argument here. Valle and Zorello Laporta5 have two critiques of our IV approach. The first, however, is a misrepresentation of the assumptions of IV—namely, that a valid IV requires that the IV has a causal effect on the endogenous explanatory variable. They state, “However, it is deforestation that causes aerosol pollution … rather than aerosol pollution that causes deforestation. … As a result, [the relevance] assumption is clearly violated.” As we show subsequently, causality is not required.15 Rather, there must be an “association,” or more specifically, the covariance between the instrument and the endogenous variable must not be zero. However, it is possible that an instrumental variable itself introduces endogeneity bias if it does not meet the exclusion criteria, and this can be particularly problematic in the case of “weak instruments” as we show later. This can occur, for example, in cases where the instrument (e.g., AOD) is strongly driven by the endogenous predictor variable (e.g., deforestation). In our case, we chose AOD as an instrument for deforestation because it is an indicator of human activity on the landscape.16 Further, over our study period, AOD was decoupled from deforestation as biomass burning in the Brazilian Amazon—and resulting AOD—was primarily driven by fires intentionally set to keep existing pastures and agricultural lands clear16 and by drought conditions leading to wildfires in already-degraded forests,16–18 rather than by new deforestation activity.
Nevertheless, to explore the extent to which our original IV estimates of the effect of deforestation on malaria may have been affected by potential endogeneity introduced by the use of AOD as an IV, we run additional IV models using 1) last year’s AOD as an instrument for this year’s deforestation and 2) remotely sensed, average municipality soil quality19 processed in Google Earth Engine,20 interacted with annual international soy and beef commodity prices from the World Bank. We chose last year’s AOD because it is correlated with this year’s deforestation (relevance), but this year’s deforestation could not have caused last year’s AOD. Although this addresses the issue of reverse causality, it is plausible that there remain endogeneity issues in this context. For example, if last year’s AOD somehow acts on this year’s malaria through mechanisms beyond deforestation, then the exclusion criteria would fail. To address these potential lingering concerns, we run additional models using soil quality coupled with international agricultural commodity prices for key Brazilian exports, which may influence a landowner’s decision to clear forest for agricultural production (relevance); in this case, deforestation rates do not cause soil quality and are highly unlikely to shift international commodity prices (exclusion). We run these IV models on our interior Amazon sample of municipalities, where active deforestation rates are highest and where we predict forest clearing should have the strongest effect on malaria transmission,4 predicting both total malaria and Plasmodium falciparum malaria incidence, following our original study.4 Results are presented in Supplemental Table 1. In brief, we find significant positive effects of deforestation on malaria transmission in each of these additional model specifications, with coefficients of similar, although slightly larger, magnitude than in our original study.
Our main conclusion, that deforestation increases malaria transmission in the Brazilian Amazon, remains unchanged.
The second goal of MacDonald and Mordecai4 is to understand whether annual malaria burden feeds back to influence annual rates of deforestation, and we use optimal temperature for malaria transmission in the dry season as our instrument for malaria. Optimal temperature was defined as the sum of days falling within a narrow temperature band that is optimal for malaria transmission (24–26°C) based on earlier mosquito and parasite trait-based mechanistic modeling studies.21 Valle and Zorello Laporta’s5 second critique is that the exclusion assumption may be violated in this model because “it is possible that temperature affects deforestation not only through malaria, but also through other causal paths,” particularly the relationship between temperature and agricultural gross domestic production.22 In other words, favorable temperatures for mosquitos and malaria parasites may affect deforestation not just through malaria but by also being favorable for agricultural growing conditions, which increase the potential value of forest clearing. We agree that temperature is important to both agriculture and malaria and that those clearing land may consider the land’s growing potential. However, rather than counting the number of days in a 2°C temperature window during the dry season, we suggest agricultural producers will instead consider the general growing conditions of a region as they relate to commonly grown crops—for example, soil quality, climate, topography, and infrastructure. Because land clearing for agriculture is a large and long-term investment, average growing conditions are much more likely to influence clearing decisions than are small deviations in weather from year to year.
There are two additional primary reasons that our IV, optimal malaria transmission temperature, is highly unlikely to fail the exclusion criteria. First, we specifically use municipality “fixed effects” or dummy variables15 to remove roughly time-invariant characteristics specific to each municipality through differencing. Thus, average characteristics (e.g., soil quality, average precipitation, average temperature) that are likely to influence the evolution of regional agricultural land use and the location of processing plants and other infrastructure are removed, and the model is identified from deviations from the municipality-specific mean. Second, the range of optimal average temperatures for the cultivation and development of soybean, Brazil’s main crop by area and production,23 is from 20°C to 35°C in Brazil.24 Recall that the optimal temperature for malaria transmission is 24°C to 26°C, and we use the number of days in the dry season within this narrow temperature band as our instrument. Thus, an additional day at 25°C relative to 27°C would be expected to lead to increases in malaria transmission. However, this same change in temperature would likely have a trivial impact on soy yields because both temperatures are well within the bounds of optimal soy cultivation. Given the breadth of favorable temperatures for soy, it is unlikely that changes in the number of days between 24°C and 26°C will influence land clearing decisions for agricultural production.
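The fixed-effects logic described above can be sketched with a small simulation. This is an illustration under stated assumptions, not the model from the original study: the data are synthetic, and the names `alpha`, `TRUE_SLOPE`, `N_UNITS`, and `N_YEARS`, along with all parameter values, are hypothetical. A time-invariant municipality trait influences both the regressor and the outcome, so a pooled slope is confounded, whereas a slope estimated from deviations from each municipality's own mean is not.

```python
import random


def slope(x, y):
    """Slope of a bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean(values):
    """Subtract the group mean (the 'within' transformation)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


rng = random.Random(1)
TRUE_SLOPE = 1.5            # hypothetical effect of x on y
N_UNITS, N_YEARS = 500, 10  # hypothetical panel dimensions

pooled_x, pooled_y, within_x, within_y = [], [], [], []
for _ in range(N_UNITS):
    # Time-invariant municipality trait (think: average soil quality).
    alpha = rng.gauss(0, 1)
    # The regressor is correlated with alpha, and alpha also shifts the outcome.
    xs = [alpha + rng.gauss(0, 1) for _ in range(N_YEARS)]
    ys = [TRUE_SLOPE * xv + 3.0 * alpha + rng.gauss(0, 1) for xv in xs]
    pooled_x += xs
    pooled_y += ys
    # Deviations from the municipality-specific mean remove alpha entirely.
    within_x += demean(xs)
    within_y += demean(ys)

pooled_slope = slope(pooled_x, pooled_y)  # confounded by alpha
within_slope = slope(within_x, within_y)  # identified from within-unit deviations
print(f"pooled slope: {pooled_slope:.3f}")
print(f"within slope: {within_slope:.3f} (true value {TRUE_SLOPE})")
```

The slope from the demeaned data is numerically the same as the slope obtained by including one dummy variable per municipality, which is why the two descriptions are used interchangeably.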
We too feel that causal inference approaches hold much promise in disease ecology and agree that researchers interested in exploring the use of such methods should carefully consider model assumptions. Toward that end, we briefly derive the simplest form of IV to illustrate to potential users what is under the hood of the IV approach and how the exclusion and relevance assumptions function in this technique.
DERIVING THE IV ESTIMATOR
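To keep it as intuitive as possible, let us assume a bivariate regression of the form
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \tag{1}$$
where $y_i$ is the outcome variable (e.g., malaria incidence) for observation (e.g., municipality) $i$, $x_i$ is the endogenous explanatory variable (e.g., deforestation), $\varepsilon_i$ is the error term, $\beta_0$ is the intercept, and $\beta_1$ is the coefficient of interest.
To derive the IV estimator, we can take the covariance of each side of Equation (1) with respect to the instrument, $z_i$:
$$\mathrm{Cov}(z_i, y_i) = \mathrm{Cov}(z_i, \beta_0 + \beta_1 x_i + \varepsilon_i) \tag{2}$$
$$\mathrm{Cov}(z_i, y_i) = \mathrm{Cov}(z_i, \beta_0) + \mathrm{Cov}(z_i, \beta_1 x_i) + \mathrm{Cov}(z_i, \varepsilon_i) \tag{3}$$
Because $\beta_0$ is a constant, and the covariance of a variable with a constant is 0, the first term drops out. Similarly, because $\beta_1$ is a constant, it can be removed from the covariance. The exclusion assumption of IV is that the instrument ($z_i$) only affects the outcome through changes in the endogenous variable ($x_i$), which is more formally written as $\mathrm{Cov}(z_i, \varepsilon_i) = 0$. Thus with basic rearranging, we have derived the IV estimator ($\hat{\beta}_{IV}$),
$$\hat{\beta}_{IV} = \frac{\mathrm{Cov}(z_i, y_i)}{\mathrm{Cov}(z_i, x_i)} \tag{4}$$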
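As a rough numerical illustration of the estimator just derived, the sketch below simulates data in which the explanatory variable is endogenous and compares the ordinary least-squares slope with the ratio of covariances, Cov(z, y) / Cov(z, x). The data-generating process, the helper names `cov` and `simulate`, and all coefficient values are assumptions made for this example; none of them come from the original study.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n, beta0=1.0, beta1=2.0, seed=0):
    """Draw (z, x, y) where x is endogenous because it shares u with the error."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                 # instrument, independent of the error
        u = rng.gauss(0, 1)                  # unobserved confounder
        eps = u + rng.gauss(0, 1)            # error term contains u
        xi = 0.8 * zi + u + rng.gauss(0, 1)  # x depends on z and on u
        yi = beta0 + beta1 * xi + eps        # bivariate model with slope beta1
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate(n=50_000)
beta_ols = cov(x, y) / cov(x, x)  # inconsistent here because Cov(x, eps) != 0
beta_iv = cov(z, y) / cov(z, x)   # ratio-of-covariances IV estimator
print(f"OLS slope: {beta_ols:.3f}")
print(f"IV slope:  {beta_iv:.3f} (true value 2.0)")
```

In this toy setup the least-squares slope is pulled away from the true value by the shared confounder, while the IV ratio stays close to it because the simulated instrument is unrelated to the error term by construction.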
CONSISTENCY OF IV
If we then want to illustrate that the IV estimator is consistent (in other words, that as the sample size gets larger and larger the distribution of the estimator converges to the true parameter value), we can plug the right-hand side of Equation (1) into $y_i$ in Equation (4). We substitute $\hat{\beta}_{IV}$ with $\mathrm{plim}\,\hat{\beta}_{IV}$ since we are considering whether the estimated slope from an IV converges in probability to the true slope $\beta_1$.
$$\mathrm{plim}\,\hat{\beta}_{IV} = \frac{\mathrm{Cov}(z_i, \beta_0 + \beta_1 x_i + \varepsilon_i)}{\mathrm{Cov}(z_i, x_i)} \tag{5}$$
Following a similar logic as with Equation (3), Equation (5) becomes:
$$\mathrm{plim}\,\hat{\beta}_{IV} = \beta_1 \frac{\mathrm{Cov}(z_i, x_i)}{\mathrm{Cov}(z_i, x_i)} + \frac{\mathrm{Cov}(z_i, \varepsilon_i)}{\mathrm{Cov}(z_i, x_i)} \tag{6}$$
From Equation (6), the second assumption of IV becomes evident. The second assumption is the relevance assumption, or that the instrument must be statistically associated with the endogenous variable ($x_i$). As can be seen in Equation (6), this means, in mathematical terms, $\mathrm{Cov}(z_i, x_i) \neq 0$. Covariance does not imply a direction to the relationship: whether AOD (our instrument) determines deforestation or deforestation determines AOD (or neither) is irrelevant, as it is the covariance between the two that is important.
By these two assumptions of IV, that $\mathrm{Cov}(z_i, \varepsilon_i) = 0$ and $\mathrm{Cov}(z_i, x_i) \neq 0$, Equation (6) simplifies to $\mathrm{plim}\,\hat{\beta}_{IV} = \beta_1$, illustrating IV is a consistent estimator of the true relationship.
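The consistency argument, and the point that relevance requires association rather than causation, can be checked with a small simulation. In the hedged sketch below the instrument does not cause the explanatory variable at all; both depend on a hypothetical common driver `w`, while the instrument is unrelated to the error term by construction. The data-generating process, function names, and sample sizes are assumptions chosen for illustration only.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def iv_estimate(n, rng, beta1=2.0):
    """IV slope from n simulated observations of a toy model."""
    z, x, y = [], [], []
    for _ in range(n):
        w = rng.gauss(0, 1)        # exogenous driver shared by x and z
        u = rng.gauss(0, 1)        # confounder: enters both x and the error
        xi = w + u                 # endogenous explanatory variable
        zi = w + rng.gauss(0, 1)   # associated with x, but does not cause x
        eps = u + rng.gauss(0, 1)  # Cov(z, eps) = 0 by construction
        z.append(zi)
        x.append(xi)
        y.append(1.0 + beta1 * xi + eps)
    return cov(z, y) / cov(z, x)


rng = random.Random(42)
estimates = {n: iv_estimate(n, rng) for n in (100, 1_000, 10_000, 100_000)}
for n, est in estimates.items():
    print(f"n = {n:>7}: IV estimate = {est:.3f} (true value 2.0)")
```

Estimates from small samples can sit well away from the true slope, but the estimate from the largest sample should be close to it, even though the simulated instrument has no causal effect on the explanatory variable.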
WEAK INSTRUMENTS
Equation (6) also illustrates another important aspect when considering the application of instrumental variables, and that is a problem known as “weak instruments.” The problem occurs if the exclusion criterion, $\mathrm{Cov}(z_i, \varepsilon_i) = 0$, fails. On the basis of the relationship between covariance and correlation (namely, $\mathrm{Corr}(a, b) = \mathrm{Cov}(a, b)/(\sigma_a \sigma_b)$, where $\sigma$ is the standard deviation of each variable) and assuming $\mathrm{Cov}(z_i, x_i) \neq 0$, we can rewrite Equation (6) to illustrate the problem (omitting subscripts for simplicity).
$$\mathrm{plim}\,\hat{\beta}_{IV} = \beta_1 + \frac{\mathrm{Corr}(z, \varepsilon)}{\mathrm{Corr}(z, x)} \cdot \frac{\sigma_{\varepsilon}}{\sigma_x} \tag{7}$$
If there is a small correlation between the instrument and the error, the last term in Equation (7) does not drop out, and the IV estimator is inconsistent ($\mathrm{plim}\,\hat{\beta}_{IV} \neq \beta_1$). If $\mathrm{Corr}(z, \varepsilon)$ is just slightly different from zero and $\mathrm{Corr}(z, x)$ is much different than zero, the last term is of minimal influence. However, if the instrument is only weakly correlated with the endogenous covariate, the last term of Equation (7) can become large. In practice, weak instruments can cause the IV estimator to be severely biased. Because there is no test to validate the exclusion criterion, the strength of the relationship between the instrument and the endogenous variable is important in practice and can be formally tested25 as in the supplementary material from MacDonald and Mordecai.4
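To make the weak-instrument problem concrete, the sketch below holds a small violation of exclusion fixed and varies only the strength of the instrument. It computes the asymptotic bias as the ratio of correlations scaled by the ratio of standard deviations, and compares that with a simulated IV estimate. The toy model, the parameters `pi` (instrument strength) and `delta` (size of the exclusion violation), and the function names are hypothetical choices for this example, not quantities from the original study.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def asymptotic_bias(pi, delta):
    """Corr(z, eps) / Corr(z, x) * sd(eps) / sd(x) for the toy model:
    z, e, v ~ N(0, 1); eps = delta * z + e; x = pi * z + e + v."""
    sd_eps = math.sqrt(delta ** 2 + 1)
    sd_x = math.sqrt(pi ** 2 + 2)
    corr_z_eps = delta / sd_eps  # sd(z) = 1
    corr_z_x = pi / sd_x
    return (corr_z_eps / corr_z_x) * (sd_eps / sd_x)


def simulated_iv(pi, delta, n, seed, beta1=2.0):
    """IV slope estimated from n draws of the same toy model."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, e, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        eps = delta * zi + e    # small violation of exclusion
        xi = pi * zi + e + v    # pi sets the instrument strength
        z.append(zi)
        x.append(xi)
        y.append(1.0 + beta1 * xi + eps)
    return cov(z, y) / cov(z, x)


BETA1, DELTA = 2.0, 0.05
results = {}
for label, pi in (("strong", 1.0), ("weak", 0.1)):
    theory = asymptotic_bias(pi, DELTA)
    observed = simulated_iv(pi, DELTA, n=200_000, seed=7) - BETA1
    results[label] = (theory, observed)
    print(f"{label:>6} instrument: asymptotic bias = {theory:.3f}, simulated bias = {observed:.3f}")
```

With the same small correlation between instrument and error, the bias term is ten times larger for the weak instrument than for the strong one in this setup, which is why instrument strength matters when exclusion cannot be tested directly.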
CONCLUSION
Understanding the effects of environmental change on infectious disease transmission (from diseases long endemic to the tropics like malaria, to novel emerging pathogens we have yet to discover like SARS-CoV-2) is of fundamental and increasing importance. In these complex socioecological systems that are difficult to study experimentally, emerging data sources (e.g., high spatiotemporal resolution earth observation data) and causal inference methods (e.g., IV estimation) represent methodological approaches that can help us achieve a clearer understanding.
ACKNOWLEDGMENTS
We acknowledge Dr. Ashley Larsen and two anonymous reviewers for their thoughtful comments and feedback on this manuscript.
Note: Supplemental table appears at www.ajtmh.org.
REFERENCES
- 1. Plowright RK, Reaser JK, Locke H, Woodley SJ, Patz JA, Becker DJ, Oppler G, Hudson PJ, Tabor GM, 2021. Land use-induced spillover: a call to action to safeguard environmental, animal, and human health. Lancet Planet Health 5: e237–e245.
- 2. Thomas MB, 2020. Epidemics on the move: climate change and infectious disease. PLoS Biol 18: e3001013.
- 3. Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, Daszak P, 2008. Global trends in emerging infectious diseases. Nature 451: 990–993.
- 4. MacDonald AJ, Mordecai EA, 2019. Amazon deforestation drives malaria transmission, and malaria burden reduces forest clearing. Proc Natl Acad Sci USA 116: 22212–22218.
- 5. Valle D, Laporta GZ, 2021. A cautionary tale regarding the use of causal inference to study how environmental change influences tropical diseases. Am J Trop Med Hyg 104: 1960–1962.
- 6. Larsen AE, Meng K, Kendall BE, 2019. Causal analysis in control–impact ecological studies with observational data. Methods Ecol Evol 10: 924–934.
- 7. Bonds MH, Dobson AP, Keenan DC, 2012. Disease ecology, biodiversity, and the latitudinal gradient in income. PLoS Biol 10: e1001456.
- 8. MacDonald AJ, Larsen AE, Plantinga AJ, 2019. Missing the people for the trees: identifying coupled natural–human system feedbacks driving the ecology of Lyme disease. J Appl Ecol 56: 354–364.
- 9. Bauhoff S, Busch J, 2020. Does deforestation increase malaria prevalence? Evidence from satellite data and health surveys. World Dev 127: 104734.
- 10. Jones IJ et al., 2020. Improving rural health care reduces illegal logging and conserves carbon in a tropical forest. Proc Natl Acad Sci USA 117: 28515–28524.
- 11. Garg T, 2019. Ecosystems and human health: the local benefits of forest cover in Indonesia. J Environ Econ Manage 98: 102271.
- 12. Couper LI, MacDonald AJ, Mordecai EA, 2021. Impact of prior and projected climate change on US Lyme disease incidence. Glob Change Biol 27: 738–754.
- 13. Larsen AE, MacDonald AJ, Plantinga AJ, 2014. Lyme disease risk influences human settlement in the wildland–urban interface: evidence from a longitudinal analysis of counties in the northeastern United States. Am J Trop Med Hyg 91: 747–755.
- 14. Santos AS, Almeida AN, 2018. The impact of deforestation on malaria infections in the Brazilian Amazon. Ecol Econ 154: 247–256.
- 15. Wooldridge JM, 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
- 16. Morgan WT, Darbyshire E, Spracklen DV, Artaxo P, Coe H, 2019. Non-deforestation drivers of fires are increasingly important sources of aerosol and carbon dioxide emissions across Amazonia. Sci Rep 9: 16975.
- 17. Aragão LEOC et al., 2018. 21st century drought-related fires counteract the decline of Amazon deforestation carbon emissions. Nat Commun 9: 536.
- 18. Chen Y, Morton DC, Jin Y, Collatz G, Kasibhatla PS, van der Werf GR, DeFries RS, Randerson J, 2013. Long-term trends and interannual variability of forest, savanna and agricultural fires in South America. Carbon Manag 4: 617–638.
- 19. Hengl T, Wheeler I, 2018. Soil organic carbon content in × 5 g/kg at 6 standard depths (0, 10, 30, 60, 100 and 200 cm) at 250 m resolution (Version v0.2). doi: 10.5281/zenodo.2525553.
- 20. Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Thau D, Moore R, 2017. Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sens Environ 202: 18–27.
- 21. Mordecai EA et al., 2012. Optimal temperature for malaria transmission is dramatically lower than previously predicted. Ecol Lett 16: 22–30.
- 22. Burke M, Hsiang SM, Miguel E, 2015. Global non-linear effect of temperature on economic production. Nature 527: 235–239.
- 23. Cattelan AJ, Dall’Agnol A, 2018. The rapid soybean growth in Brazil. OCL 25: D102.
- 24. Viana JS, Gonçalves EP, Silva AC, Matos VP, 2013. Climatic conditions and production of soybean in northeastern Brazil. IntechOpen. doi: 10.5772/52184.
- 25. Olea JLM, Pflueger C, 2013. A robust test for weak instruments. J Bus Econ Stat 31: 358–369.
