Cancer. Author manuscript; available in PMC: 2020 Sep 18.
Published in final edited form as: Cancer. 2019 Feb 27;125(11):1771–1773. doi: 10.1002/cncr.32031

What Can Be Learned From Mapping the Occurrence of Acute Myeloid Leukemia?

Jonathan M Samet 1,2,#, Myles Cockburn 2,#
PMCID: PMC7500563  NIHMSID: NIHMS1626521  PMID: 30811588

Mapping of the occurrence of disease in space and time is a fundamental tool of epidemiology. In a well-known example, John Snow’s “ghost map” of clustered epidemic cases of cholera in nineteenth-century London provided a link to its source, the Broad Street pump, which was supplying cholera-contaminated water.¹ This clustering, a clue to the waterborne transmission of cholera, epitomizes both the historical role of mapping in epidemiology and its power to inform.

Geospatial analysis has long been used as a tool to generate hypotheses about the etiology of cancer, with the rationale that patterns of clustering may point to environmental exposures causing the clustering (eg, see Bartsch et al²). We mention one now-landmark example: the mapping of cancer mortality across the United States by the National Cancer Institute. In 1974, Mason and McKay³ published maps of cancer mortality by cancer site for US counties, spanning 1950 to 1969. A main purpose of the maps was to generate hypotheses for further etiological research based on the geographic patterns of mortality. For example, the high rates of lung cancer mortality in selected coastal counties prompted case-control studies that linked lung cancer to occupational asbestos exposure, confirming prior studies of workers.⁴,⁵

In this issue of Cancer, Ghazawi et al provide an extensive report on spatial-temporal patterns of incidence and mortality from acute myeloid leukemia (AML) in Canada from 1992 to 2010. The findings include the identification of several industrial cities with rates well above the national average. Among these cities, the authors single out Sarnia, an industrial city with an incidence rate 3 times the national average. Some areas had rates significantly below the national average, as would be anticipated by chance. The authors conclude that the analyses confirm “existing risk factors” and “may reveal novel ones for AML.”

The context for interpreting these findings is set by what we already know about the etiology of AML; causal risk factors have been identified, including ionizing radiation, benzene and formaldehyde, tobacco smoking, and receipt of chemotherapeutic agents.⁶ These exposures are unlikely to explain broad spatial patterns of occurrence of AML. Moreover, analyses at the scale of county (as in Mason and McKay) or city (as in Ghazawi et al) are unlikely to be able to separate the impacts of those risk factors from one another, from demographic factors, or from novel risk factors, should they exist. In considering the clustering of AML in Canada, the authors comment that “…pollution from local oil refineries and chemical plants in Sarnia may be implicated as a risk factor for AML in that city.” This conclusion is confirmatory of known risk factors, but the analyses do not lead to new hypotheses.

Large databases on cancer occurrence and software packages readily support analyses of the spatial distribution of AML; nevertheless, there are methodological complications. Ghazawi et al highlight “clusters” that use postal codes to establish the clustering unit (the smallest unit considered in the analysis, but not the smallest unit available for analysis), raising the possibility of misclassification of the location of the cluster, as well as the question of how much of the apparent clustering is produced by the unit of analysis. Postal codes reflect the addresses of people at diagnosis, and the identified clusters represent industrial areas, so the question becomes: Where do people actually live within the “cluster” postal code? Using large areas such as postal codes (with populations ranging from 0 to 139,128 and a median of 18,063⁷) as the unit of analysis biases comparisons of cancer incidence and mortality rates, artifactually generating areas with both higher- and lower-than-expected rates (eg, see Goldberg and Cockburn⁸). Figure 4 in Ghazawi et al introduces further misclassification of the location of “clusters.” In this figure, each “cluster” is represented by a label placed at the centroid of the city or health board boundaries; however, this would accurately represent the “cluster” only if the population were concentrated at the centroid of each area (which is clearly not the case). More importantly, presenting data in this manner misrepresents the spatial proximity of the proposed areas of environmental exposure (heavy industry, in this case) to the area-based rates of AML, producing what is most likely a spurious association.
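
How much apparent clustering the unit of analysis can create is easy to see in a toy example. The sketch below is hypothetical throughout: the block populations and case counts are invented, and `zone_rates` is an illustrative helper, not a method used by Ghazawi et al. The same 14 cases among the same 40,000 people yield one zone at 4 times the rate of the others under one set of boundaries, and two zones at 2.5 times under boundaries shifted by a single block.

```python
# Hypothetical strip of 8 small blocks: (population, cases). All values are invented.
BLOCKS = [
    (5_000, 1), (5_000, 1), (5_000, 4), (5_000, 4),
    (5_000, 1), (5_000, 1), (5_000, 1), (5_000, 1),
]

PER = 100_000  # rates per 100,000 population


def zone_rates(blocks, starts):
    """Aggregate consecutive blocks into zones and return each zone's rate per 100,000.

    `starts` gives the index of the first block in each zone, beginning with 0.
    """
    edges = list(starts) + [len(blocks)]
    rates = []
    for first, stop in zip(edges[:-1], edges[1:]):
        pop = sum(p for p, _ in blocks[first:stop])
        cases = sum(c for _, c in blocks[first:stop])
        rates.append(PER * cases / pop)
    return rates


overall = PER * sum(c for _, c in BLOCKS) / sum(p for p, _ in BLOCKS)
zoning_a = zone_rates(BLOCKS, [0, 2, 4, 6])     # four zones of two blocks each
zoning_b = zone_rates(BLOCKS, [0, 1, 3, 5, 7])  # same blocks, boundaries shifted by one
print(overall)   # 35.0
print(zoning_a)  # [20.0, 80.0, 20.0, 20.0]
print(zoning_b)  # [20.0, 50.0, 50.0, 20.0, 20.0]
```

Neither zoning is "wrong"; the point is that where the boundaries fall helps determine which areas look high, and by how much.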

There are 2 key epidemiologic constructs that were followed in Mason and McKay’s 1974 cancer mortality maps but not in Ghazawi et al’s analysis. First, it appears that the rates provided by Ghazawi et al were not age-adjusted, so geographical variation could be due in part to variation in the age structure of the resident populations (the median ages of the postal codes range from 23.5 to 65.5 years⁷). Second, in the postal code analysis presented in Figure 5 of Ghazawi et al, the values presented are for the entire postal code. The higher rates, based on a limited total number of cases, could be produced by only a few excess cases anywhere in the highlighted area. Misclassification of cases across the boundaries of these areas could produce an excess of cases within one of them; for example, in Figure 5A the “high” area is adjacent to the “low” area, so the “high” area could have been generated simply by a small number of cases being misclassified from the low area to the high area. Finally, the postal codes singled out in the analysis include those with 10 cases, but the actual number could be as low as 6 (due to the random rounding approach used), meaning that rounding alone could overstate the rate by as much as 67% (10/6 ≈ 1.67).
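
Two of these points can be made concrete with simple arithmetic. The sketch below uses invented age-specific populations and case counts, plus a hypothetical standard population, to show that 2 areas with identical age-specific rates can have very different crude rates, a difference that direct age adjustment removes. It also assumes that reported counts were randomly rounded to a multiple of 5 (an assumption, though one consistent with the lower bound of 6 given above), so a reported count of 10 could reflect a true count anywhere from 6 to 14. All names and values are illustrative.

```python
# Hypothetical data only: (population, cases) by age group for two areas
# whose age-specific rates are identical (5, 10 and 40 per 100,000).
AREA_YOUNG = {"0-39": (60_000, 3), "40-64": (30_000, 3), "65+": (10_000, 4)}
AREA_OLD = {"0-39": (20_000, 1), "40-64": (30_000, 3), "65+": (50_000, 20)}

# Hypothetical standard population used as weights for direct adjustment.
STANDARD = {"0-39": 40_000, "40-64": 35_000, "65+": 25_000}

PER = 100_000  # express rates per 100,000 population


def crude_rate(area):
    """All cases divided by all people, ignoring age structure."""
    pop = sum(p for p, _ in area.values())
    cases = sum(c for _, c in area.values())
    return PER * cases / pop


def age_adjusted_rate(area, standard):
    """Direct adjustment: weight each age-specific rate by the standard population."""
    total = sum(standard.values())
    return sum(standard[g] * PER * c / p for g, (p, c) in area.items()) / total


def rounding_bounds(reported, base=5):
    """Range of true counts that random rounding to `base` could report as `reported`."""
    return max(reported - (base - 1), 0), reported + (base - 1)


def max_overstatement(reported, base=5):
    """Largest fractional overstatement of a rate caused by rounding alone."""
    low, _ = rounding_bounds(reported, base)
    if low == 0:
        raise ValueError("true count could be zero; relative error is unbounded")
    return reported / low - 1


print(crude_rate(AREA_YOUNG), crude_rate(AREA_OLD))  # 10.0 24.0
print(age_adjusted_rate(AREA_YOUNG, STANDARD), age_adjusted_rate(AREA_OLD, STANDARD))  # 15.5 15.5
print(rounding_bounds(10), round(max_overstatement(10), 2))  # (6, 14) 0.67
```

Here the older area's crude rate is 2.4 times the younger area's, yet the age-adjusted rates are identical; the rounding helper reproduces the arithmetic behind the 67% figure.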

Our understanding of the etiology of AML remains limited. How could it be advanced? We suggest that the optimal approach would be case-control studies, ideally nested within the population-based frameworks provided by cancer registries. Such studies have the strength of drawing cases and controls from defined populations, and registry-based studies of this kind have been performed.⁹ Analyses like those of Ghazawi et al indicate locations where such studies could be made more informative by being conducted in populations at high and low risk (provided the highlighted areas remain of interest after accounting for age and population size). One further complication is the heterogeneity of AML; its numerous subtypes might be linked to particular etiological agents, but stratifying a case series by subtype will limit statistical power. A further barrier to subtype-specific investigations is the lack of subtype information for the majority of cases, as is true for AML in Canada.
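
The loss of power from stratifying by subtype shows up as a wider confidence interval around the same odds ratio. The sketch below uses invented counts for a hypothetical exposure and the Woolf (logit) approximation for the confidence interval of an odds ratio from a single 2×2 table. A real nested case-control analysis would typically use conditional logistic regression on matched sets, so this only illustrates the sample-size effect.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Hypothetical counts: 400 cases of all AML versus 800 controls ...
all_aml = odds_ratio_ci(a=80, b=320, c=100, d=700)
# ... and one hypothetical subtype with a fifth of the cases and the same exposure prevalence.
one_subtype = odds_ratio_ci(a=16, b=64, c=100, d=700)

for label, (or_, lo, hi) in [("all AML", all_aml), ("one subtype", one_subtype)]:
    print(f"{label}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Both tables give an odds ratio of 1.75, but with these invented counts the subtype-specific interval is much wider and includes 1.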

This report raises the question of the utility of geospatial analyses like those of Ghazawi et al. Cancer registry and mortality data can support geospatial analyses at a much finer scale than that used by Ghazawi et al. The search for clustering in large datasets inevitably leads to the identification of clusters, some indicating an excess of cases and others a deficit. Some will occur by chance, and these cannot readily be distinguished from those reflecting novel risk factors. This problem is well known and has yet to be resolved. Confirmatory findings, as in the report by Ghazawi et al, do not necessarily advance etiological understanding.
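
That chance alone produces both "high" and "low" areas can be shown with a small simulation. The sketch below assumes every area has the same true risk (10 expected cases), Poisson-distributed counts, and an equal-tailed exact Poisson test at α = 0.05. These are illustrative choices, not the method of Ghazawi et al, and the function names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def flag_areas(observed, expected, alpha=0.05):
    """Label each area 'high', 'low', or None using equal-tailed exact Poisson tests."""
    flags = []
    for obs, exp_count in zip(observed, expected):
        p_high = 1.0 - poisson_cdf(obs - 1, exp_count)  # P(X >= obs)
        p_low = poisson_cdf(obs, exp_count)  # P(X <= obs)
        if p_high <= alpha / 2:
            flags.append("high")
        elif p_low <= alpha / 2:
            flags.append("low")
        else:
            flags.append(None)
    return flags


# Hypothetical setting: every area has the SAME true risk (10 expected cases).
N_AREAS, EXPECTED = 2000, 10.0
rng = random.Random(2019)  # fixed seed so the run is reproducible
observed = [poisson_sample(EXPECTED, rng) for _ in range(N_AREAS)]
flags = flag_areas(observed, [EXPECTED] * N_AREAS)
n_high, n_low = flags.count("high"), flags.count("low")
print(f"{n_high} 'high' and {n_low} 'low' areas out of {N_AREAS}, with no true variation in risk")
```

A few percent of areas are flagged (the exact test is somewhat conservative because counts are discrete). The flagged areas are split between apparent excesses and apparent deficits, although by construction none reflects a real difference in risk.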

Finally, we comment on the authors’ conclusion, which overreaches the limited evidence: namely, the need for “ubiquitous installation of high-efficiency particulate air filters in these AML high-incidence cities.” Spatial clustering in particular cities is an insufficient basis for proposing an as-yet unproven control strategy. Given the limits on our ability to estimate the precise location of either the relevant exposures or the affected populations, Ghazawi et al provide insufficient data on where precisely to conduct any relevant prevention interventions. Individual-level data are available for these kinds of analyses, as are methods that capitalize on higher locational precision without compromising patient confidentiality while accounting for age and other known risk factors. If the aim is to find AML’s Broad Street pump, then the analysis presented is not the way to go about it.

Acknowledgments

FUNDING SUPPORT

Myles Cockburn was supported in part by National Institutes of Health grant P30CA046934.

Footnotes

CONFLICT OF INTEREST DISCLOSURES

The authors made no disclosures.

REFERENCES

  • 1. Johnson S. The Ghost Map: The Story of London’s Most Terrifying Epidemic—and How it Changed Science, Cities, and the Modern World. London, UK: Riverhead Books; 2007.
  • 2. Bartsch DC, Springher F, Falk H. Acute nonlymphocytic leukemia. An adult cluster. JAMA. 1975;232:1333–1336.
  • 3. Mason TJ, McKay FW. US Cancer Mortality by County, 1950–1969. Bethesda, MD: National Institutes of Health; 1974.
  • 4. Blot WJ, Harrington JM, Toledo A, Hoover R, Heath CW Jr, Fraumeni JF Jr. Lung cancer after employment in shipyards during World War II. N Engl J Med. 1978;299:620–624.
  • 5. Blot WJ, Fraumeni JF Jr. Geographic patterns of lung cancer: industrial correlations. Am J Epidemiol. 1976;103:539–550.
  • 6. Linet MS, Morton LM, Devesa SS, Dores GM. Leukemias. In: Cancer Epidemiology and Prevention. 4th ed. New York, NY: Oxford University Press; 2017.
  • 7. Statistics Canada. Population and Dwelling Count Highlight Tables, 2016 Census. https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/comprehensive.cfm. Accessed January 10, 2019.
  • 8. Goldberg DW, Cockburn MG. The effect of administrative boundaries and geocoding error on cancer rates in California. Spat Spatiotemporal Epidemiol. 2012;3:39–54.
  • 9. Tsai RJ, Luckhaupt SE, Schumacher P, Cress RD, Deapen DM, Calvert GM. Acute myeloid leukemia risk by industry and occupation. Leuk Lymphoma. 2014;55:2584–2591.
