JAMA Dermatol. 2017 Jun 28;153(9):911–914. doi: 10.1001/jamadermatol.2017.1870

Correlation Among Cancer Incidence and Mortality Rates and Internet Searches in the United States

Mackenzie R Wehner, Kevin T Nead, Eleni Linos
PMCID: PMC5817428  PMID: 28658470

Key Points

Question

Does state-specific internet search volume correlate with incidence and mortality rates of common cancers in the United States?

Findings

By state, relative Google search volume correlated with cancer incidence rates in 5 of 8 commonly diagnosed cancers in the United States and correlated with cancer mortality rates in 4 of those 5 cancers.

Meaning

Population-level internet search behavior may be a valuable tool to estimate cancer incidence and mortality rates, especially for cancers not included in national registries.

Abstract

Importance

Population-level disease metrics are critical to guide the distribution of resources and implementation of public health initiatives. Internet search data reflect population interest in health topics and may be an alternative metric of disease characteristics when traditional sources are lacking, such as in basal and squamous cell carcinomas, which are not included in national cancer registries. However, these data are not yet well validated or understood.

Objective

To evaluate whether state-specific normalized internet search volume correlates with incidence and mortality rates of common cancers in the United States, including melanoma.

Design, Setting, and Participants

This was a cross-sectional analysis of Google search volume index data and US state-level cancer incidence and mortality rates, as recorded by the National Program of Cancer Registries, for 8 of the 10 most incident cancers in the United States from 2009 to 2013. Participants were people performing Google searches and patients diagnosed as having cancers reported to cancer registries.

Main Outcomes and Measures

Correlation between Google search volumes, normalized to total Google search volume, and cancer incidence and mortality rates recorded by the National Program of Cancer Registries.

Results

By state, relative Google search volume statistically significantly correlated with cancer incidence rates in 5 of 8 commonly diagnosed cancers in the United States (colon cancer: R = 0.61; P < .001; lung cancer: R = 0.73; P < .001; lymphoma: R = 0.51; P < .001; melanoma: R = 0.36; P = .01; and thyroid cancer: R = 0.30; P = .03). For 4 of those 5 cancers (colon cancer: R = 0.61; P < .001; lung cancer: R = 0.62; P < .001; lymphoma: R = 0.38; P = .006; and melanoma: R = 0.31; P = .03), relative Google search volume also correlated with mortality rates.

Conclusions and Relevance

Population-level internet search behavior may be a valuable real-time tool to estimate cancer incidence and mortality rates, especially for cancers not included in national registries, such as basal and squamous cell carcinomas.


This cross-sectional analysis of Google search volume index data investigates whether internet search volumes are correlated with registry-recorded incidence and mortality rates of common cancers.

Introduction

Disease registries provide a valuable source of information to guide research and public health initiatives. However, rigorous nationwide registry data are often unavailable for even the most common diseases, such as basal and squamous cell carcinomas. Internet search data are a novel and promising tool to estimate the impact of disease in the absence of existing data sources or where traditional methods are inadequate. We investigated the association of internet search volumes for common cancers, normalized to total search volumes, with published cancer incidence and mortality rates in the United States by state. We hypothesized that internet search volumes would be positively correlated with registry-recorded incidence and mortality rates of common cancers.

Methods

Data Sources and Collection

We used Google search volume data, collected through Google Trends (https://trends.google.com/trends/), to estimate the relative search volume for each included cancer type, by state. Google Trends data are normalized for total Google search volume: “Each data point is divided by the total searches of the geography and time range it represents, to compare relative popularity.” The results are presented as search volume indices (SVIs) on a scale of 0 to 100. For example, the state with the highest number of searches for lung cancer relative to its total number of searches would be assigned an SVI of 100, and states with lower relative search numbers for lung cancer would receive lower SVIs relative to this value.
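The normalization quoted above can be sketched in a few lines of Python. This is an illustration only: Google Trends exposes just the final index, so the sketch assumes hypothetical raw per-state counts, and the function name `search_volume_index`, the integer rounding, and the example numbers are all assumptions rather than Google's actual implementation.

```python
from typing import Dict, Mapping


def search_volume_index(term_searches: Mapping[str, float],
                        total_searches: Mapping[str, float]) -> Dict[str, int]:
    """Hypothetical Google Trends-style index: share of searches, rescaled to 0-100."""
    # Normalize: searches for the term divided by all searches in that state
    shares = {state: term_searches[state] / total_searches[state]
              for state in term_searches}
    peak = max(shares.values())
    # Rescale so the state with the largest share is 100 (rounding is an assumption)
    return {state: round(100 * share / peak) for state, share in shares.items()}


# Hypothetical counts for three states (not real Google Trends data)
svi = search_volume_index(
    term_searches={"A": 30, "B": 90, "C": 10},
    total_searches={"A": 1_000, "B": 6_000, "C": 2_000},
)
print(svi)  # {'A': 100, 'B': 50, 'C': 17}
```

In this toy example, state B has 3 times as many raw searches as state A but a lower index (50 vs 100), because those searches make up a smaller share of B's total search volume.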

We chose to evaluate the 10 most incident cancers in the United States, based on the Centers for Disease Control and Prevention’s National Program of Cancer Registries data for 2009 to 2013 for all 50 US states and the District of Columbia. We collected Google SVIs by state for each cancer using exact search terms, choosing the most common lay term for each cancer (eg, lung cancer, breast cancer, or prostate cancer). For non-Hodgkin lymphoma, we used the more common term lymphoma because it is 200 times more common as a search term than non-Hodgkin lymphoma and approximately 90% of lymphomas are non-Hodgkin lymphomas. For colorectal cancer, we used the more common term colon cancer. We were unable to include cancer of the “corpus and uterus, NOS” or cancer of the “kidney and renal pelvis” because they lacked unifying search terms with adequate data for analysis in Google Trends.

We downloaded Google SVI data in September 2016. We used the National Program of Cancer Registries age-adjusted cancer incidence and mortality rates by state for each cancer for 2009 to 2013. Incidence and mortality rates included both sexes except for breast cancer and prostate cancer, for which the only reported data were for women and men, respectively. We collected data from all 50 US states and the District of Columbia, except for incidence data from Nevada, which were not included in the registry. This study was deemed exempt from institutional review board review by the University of California–San Francisco because all data used were publicly available.

Statistical Analysis

We used Pearson correlation coefficients to evaluate the relationship between Google SVIs and known cancer incidence and mortality rates, by state, for the 8 cancer types. Each relationship was checked visually for outliers; if outliers were present, the Pearson correlation coefficient and P value were compared with the Spearman rank-order correlation coefficient and P value to check for concordance.
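The outlier check described above can be sketched in standard-library Python. The analyses themselves were run in Stata, so this is not the authors' code; it is a minimal sketch that computes Spearman's coefficient as the Pearson coefficient of average ranks. The function names and the example data are hypothetical, chosen so that a single outlying jurisdiction makes the two coefficients disagree.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson r of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data for 10 jurisdictions; the last one is an extreme outlier.
svi_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rates = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

r_pearson = pearson_r(svi_values, rates)
r_spearman = spearman_rho(svi_values, rates)
print(f"Pearson R = {r_pearson:.2f}, Spearman rho = {r_spearman:.2f}")
```

Here the single outlier pulls Pearson R down to about 0.53, while Spearman rho stays at 1.00 because the ranks are perfectly concordant. This is the kind of discordance the rank-based comparison is meant to flag.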

Statistical significance was defined as P < .05. All analyses were performed using Stata statistical software (version 12.0; StataCorp Inc).
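For reference, the usual test of a Pearson coefficient against zero, which we assume underlies the P values reported here, converts R computed from n jurisdictions into a t statistic with n − 2 degrees of freedom. The second line works through melanoma incidence (R = 0.36), assuming n = 50 because Nevada incidence data were unavailable:

```latex
\begin{aligned}
t &= \frac{R\sqrt{n-2}}{\sqrt{1-R^{2}}}, \qquad P = 2\Pr\left(T_{n-2} \ge |t|\right) \\
t_{\text{melanoma}} &= \frac{0.36\sqrt{48}}{\sqrt{1-0.36^{2}}} \approx \frac{2.494}{0.933} \approx 2.67
\;\Rightarrow\; P \approx .01 \quad (48 \text{ df})
\end{aligned}
```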

Results

The Table shows correlation coefficients between actual incidence rates and relative Google search volume for 8 of the 10 most common cancers in the United States: breast, bladder, colorectal, lung, non-Hodgkin lymphoma, melanoma, prostate, and thyroid cancers. We found statistically significant correlations between incidence rates and relative Google search volume for colon cancer (R = 0.61; P < .001), lung cancer (R = 0.73; P < .001), lymphoma (R = 0.51; P < .001), melanoma (R = 0.36; P = .01), and thyroid cancer (R = 0.30; P = .03).

Table. Correlation Coefficients Between Cancer Incidence and Mortality Rates and Relative Google Search Volume Indices, 2009 to 2013.

Cancer Type    Incidence           Mortality
               R       P Value     R       P Value
Bladder        0.16    .26         0.27    .06
Breast         0.09    .52         0.23    .10
Colon          0.61    <.001       0.61    <.001
Lung           0.73    <.001       0.62    <.001
Lymphoma       0.51    <.001       0.38    .006
Melanoma       0.36    .01         0.31    .03
Prostate       0.24    .09         0.10    .47
Thyroid (a)    0.30    .03         NA      NA

Abbreviations: NA, not applicable; R, correlation coefficient.

(a) We do not report results for thyroid cancer mortality because the data contained marked outliers and the Pearson and Spearman methods gave notably different correlation coefficients: the Pearson P value was statistically significant, whereas the Spearman P value was not.

When examining cancer mortality, we noted similar results. There were statistically significant correlations between cancer-specific mortality rates and relative Google search volume for colon cancer (R = 0.61; P < .001), lung cancer (R = 0.62; P < .001), lymphoma (R = 0.38; P = .006), and melanoma (R = 0.31; P = .03) (Table). Relative Google search volume for breast, prostate, and bladder cancer was not statistically significantly correlated with either incidence or mortality rates.
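As a rough consistency check, the P values in the Table can be approximately recomputed from the reported R values with the t test for a Pearson coefficient. The sketch below is a standard-library Python illustration, not the authors' Stata code. It assumes n = 50 jurisdictions for incidence (Nevada excluded) and n = 51 for mortality (50 states plus the District of Columbia). Because R is rounded to 2 decimals, recomputed P values will only approximately match. The incomplete beta routine uses the standard continued-fraction (modified Lentz) approach, and all function names are hypothetical.

```python
import math


def _beta_cf(a: float, b: float, x: float,
             max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300

    def guard(v: float) -> float:
        # Avoid division by zero in Lentz's recurrences
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        num = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / guard(1.0 + num * d)
        c = guard(1.0 + num / c)
        h *= d * c
        # Odd term
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / guard(1.0 + num * d)
        c = guard(1.0 + num / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a) is used where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def pearson_p_value(r: float, n: int) -> float:
    """Two-sided P value for a Pearson R from n pairs (t test, n - 2 df)."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    # Two-sided Student t tail area equals I_{df/(df + t^2)}(df/2, 1/2)
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


# R values as reported in the Table; n values are assumptions (see lead-in).
checks = [
    ("Melanoma incidence", 0.36, 50, ".01"),
    ("Thyroid incidence", 0.30, 50, ".03"),
    ("Lymphoma mortality", 0.38, 51, ".006"),
    ("Bladder mortality", 0.27, 51, ".06"),
]
for label, r, n, reported in checks:
    print(f"{label}: recomputed P = {pearson_p_value(r, n):.3f} (reported P = {reported})")
```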

Representative scatter plots of melanoma incidence and mortality rates vs Google SVIs are presented in Figure 1.

Figure 1. Scatter Plots With Fitted Linear Regression Lines of Incidence Rates and Mortality Rates vs Google SVIs by State for Melanoma.


Linear regression lines are not equivalent to correlation coefficients and are included for visual purposes. SVI indicates search volume index.

Discussion

For several cancers, including colon, lung, lymphoma, and melanoma, state-specific relative Google search volume positively correlates with state-specific cancer incidence and mortality rates recorded by the National Program of Cancer Registries. This proof-of-concept study supports the potential use of internet search data, and of publicly available information on population interest in health topics more broadly, to estimate disease characteristics such as incidence and mortality rates. These types of data sources may be particularly useful when national registry data are unavailable, as for basal and squamous cell carcinomas, or when real-time information is desired, given that cancer registry data are frequently several years old when published.

While most cancers that we examined showed statistically significant correlations, breast, prostate, and bladder cancers did not. This could be partly explained by strong public health campaigns, including screening and awareness initiatives, which may broadly increase search volume independent of disease metrics. For example, it has been previously shown that Google searches for “breast cancer” in the United States increase markedly in October, during Breast Cancer Awareness Month. Similarly, melanoma search volume varies seasonally (Figure 2), as has been previously reported. That previous study used more limited Google SVI and registry data time periods and showed an association between Google search volume and melanoma mortality but not melanoma incidence.

Figure 2. Melanoma Google Search Volume Indices (SVIs) in the United States Over Time, 2009 to 2013.


Limitations

This study has a number of limitations. The use of Google search data to estimate disease metrics may not be completely generalizable because the data are restricted to those with access to the internet who use Google, although this represents most of the US population. Our analysis was also limited to cancer terms with recorded search volume in Google Trends, and therefore the findings of this study may not be generalizable to rare diseases or diseases without a common unifying search term. Because search volume may change independently of disease metrics, such as secondary to public health campaigns targeted to specific cancers, this method may not be appropriate for comparing incidence and mortality rates between diseases.

Conclusion

Population-level disease metrics are critically important to guide the distribution of resources and design of public health initiatives. Internet search data may provide useful estimates of disease, such as incidence, particularly where registry data are insufficient, lagging, or lacking.



Articles from JAMA Dermatology are provided here courtesy of American Medical Association
