Key Points
Question
Does state-specific internet search volume correlate with incidence and mortality rates of common cancers in the United States?
Findings
By state, relative Google search volume correlated with cancer incidence rates in 5 of 8 commonly diagnosed cancers in the United States and correlated with cancer mortality rates in 4 of those 5 cancers.
Meaning
Population-level internet search behavior may be a valuable tool to estimate cancer incidence and mortality rates, especially for cancers not included in national registries.
Abstract
Importance
Population-level disease metrics are critical to guide the distribution of resources and implementation of public health initiatives. Internet search data reflect population interest in health topics and may be an alternative metric of disease characteristics when traditional sources are lacking, such as for basal and squamous cell carcinomas, which are not included in national cancer registries. However, these data are not yet well validated or understood.
Objective
To evaluate whether state-specific normalized internet search volume correlates with incidence and mortality rates of common cancers in the United States, including melanoma.
Design, Setting, and Participants
This cross-sectional analysis compared state-level Google search volume index data with incidence and mortality rates for 8 of the 10 most common cancers in the United States from 2009 to 2013, as reported by the National Program of Cancer Registries. Participants were people performing Google searches and patients diagnosed as having cancers reported to cancer registries.
Main Outcomes and Measures
Correlation between Google search volumes, normalized to total Google search volume, and cancer incidence and mortality rates recorded by the National Program of Cancer Registries.
Results
By state, relative Google search volume statistically significantly correlated with cancer incidence rates in 5 of 8 commonly diagnosed cancers in the United States (colon cancer: R = 0.61; P < .001; lung cancer: R = 0.73; P < .001; lymphoma: R = 0.51; P < .001; melanoma: R = 0.36; P = .01; and thyroid cancer: R = 0.30; P = .03). For 4 of those 5 cancers (colon cancer: R = 0.61; P < .001; lung cancer: R = 0.62; P < .001; lymphoma: R = 0.38; P = .006; and melanoma: R = 0.31; P = .03), relative Google search volume also correlated with mortality rates.
Conclusions and Relevance
Population-level internet search behavior may be a valuable real-time tool to estimate cancer incidence and mortality rates, especially for cancers not included in national registries, such as basal and squamous cell carcinomas.
This cross-sectional analysis of Google search volume index data investigates whether internet search volumes are correlated with registry-recorded incidence and mortality rates of common cancers.
Introduction
Disease registries provide a valuable source of information to guide research and public health initiatives. However, rigorous nationwide registry data are often unavailable for even the most common diseases, such as basal and squamous cell carcinomas. Internet search data are a novel and promising tool to estimate the impact of disease in the absence of existing data sources or where traditional methods are inadequate. We investigated the association of internet search volumes for common cancers, normalized to total search volumes, with published cancer incidence and mortality rates in the United States by state. We hypothesized that internet search volumes would be positively correlated with registry-recorded incidence and mortality rates of common cancers.
Methods
Data Sources and Collection
We used Google search volume data, collected through Google Trends (https://trends.google.com/trends/), to estimate the relative search volume for each included cancer type by state. Google Trends data are normalized for total Google search volume: “Each data point is divided by the total searches of the geography and time range it represents, to compare relative popularity.” The results are presented as Search Volume Indices (SVIs) on a scale of 0 to 100. For example, the state with the highest number of searches for lung cancer relative to its total number of searches would be assigned an SVI of 100, and states with lower relative search numbers for lung cancer would receive lower SVIs scaled relative to this value.
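To make the normalization concrete, the following Python sketch mirrors the two documented steps: divide each state's searches for a term by that state's total searches, then rescale so that the highest state is 100. It is a simplified illustration under stated assumptions. Google does not publish its full procedure (it works from a sample of searches and applies its own thresholds and rounding), the function name `search_volume_index` is hypothetical, and the state labels and counts are invented.

```python
from __future__ import annotations


def search_volume_index(term_searches: dict[str, int],
                        total_searches: dict[str, int]) -> dict[str, int]:
    """Hypothetical helper: 0-100 index per state for one search term.

    Assumption: a simplified version of the documented Google Trends idea,
    not Google's actual (unpublished) implementation.
    """
    # Step 1: relative popularity = searches for the term / all searches in that state.
    share = {state: term_searches[state] / total_searches[state]
             for state in term_searches}
    # Step 2: rescale so the state with the highest share receives 100.
    top = max(share.values())
    if top == 0:
        return {state: 0 for state in share}
    return {state: round(100 * s / top) for state, s in share.items()}


if __name__ == "__main__":
    # Invented counts for one term in three hypothetical states (not real data).
    term = {"A": 900, "B": 4000, "C": 150}
    total = {"A": 1_000_000, "B": 10_000_000, "C": 500_000}
    print(search_volume_index(term, total))  # -> {'A': 100, 'B': 44, 'C': 33}
```

State B has the most raw searches but a lower index than state A, because the index reflects each state's share of all searches rather than the absolute count.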
We chose to evaluate the 10 cancers with the highest incidence in the United States, based on the Centers for Disease Control and Prevention’s National Program of Cancer Registries data for 2009 to 2013 for all 50 US states and the District of Columbia. We collected Google SVIs by state for each cancer using exact search terms, choosing the most common lay term for each cancer (eg, lung cancer, breast cancer, or prostate cancer). For non-Hodgkin lymphoma we used the more common term lymphoma, because this search term is 200 times more common than non-Hodgkin lymphoma and approximately 90% of lymphomas are non-Hodgkin lymphomas. For colorectal cancer we used the more common term colon cancer. We were unable to include cancer of the “corpus and uterus, NOS” or cancer of the “kidney and renal pelvis” because they lacked unifying search terms with adequate data for analysis in Google Trends.
We downloaded Google SVI data in September 2016. We used the National Program of Cancer Registries age-adjusted cancer incidence and mortality rates by state for each cancer for 2009 to 2013. Incidence and mortality rates included both sexes except for breast cancer and prostate cancer, for which the only reported data were for women and men, respectively. We collected data from all 50 US states and the District of Columbia, except for Nevada incidence data, which were not included in the registry. This study was deemed exempt from institutional review board review by the University of California, San Francisco because all data used were publicly available.
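Because Nevada incidence rates are missing, incidence correlations rest on 50 jurisdictions (49 states plus the District of Columbia) and mortality correlations on 51. A minimal sketch of that pairing step follows, assuming the data are held in plain dictionaries keyed by state, with `None` marking a rate the registry did not report. The function name `paired_observations` and all values are hypothetical placeholders, not registry data.

```python
from __future__ import annotations

from typing import Optional


def paired_observations(
    svi: dict[str, float], rates: dict[str, Optional[float]]
) -> tuple[list[float], list[float]]:
    """Hypothetical helper: aligned (SVI, rate) lists for states with both values."""
    xs: list[float] = []
    ys: list[float] = []
    for state in sorted(svi):  # sorted for a reproducible order
        rate = rates.get(state)
        if rate is None:  # unreported or absent rate: state is excluded
            continue
        xs.append(svi[state])
        ys.append(rate)
    return xs, ys


if __name__ == "__main__":
    # Placeholder values; "NV" mimics the missing Nevada incidence data.
    svi = {"A": 55.0, "B": 100.0, "NV": 60.0}
    incidence = {"A": 40.0, "B": 80.0, "NV": None}
    x, y = paired_observations(svi, incidence)
    print(len(x), x, y)
```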
Statistical Analysis
We used Pearson correlation coefficients to evaluate the relationship between registry-recorded cancer incidence and mortality rates for the 8 cancer types and Google SVIs by state. Each relationship was checked visually for outliers; if outliers were present, the Pearson correlation coefficient and P value were compared with the Spearman rank-order correlation coefficient and P value for concordance.
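The outlier check can be illustrated with a short standard-library Python sketch that computes both coefficients on the same data. It is not the authors' Stata code. Spearman rho is computed here as the Pearson coefficient of the average ranks (tied values share the mean of their ranks), and the 8 data points are invented so that a single extreme state drives the Pearson coefficient.

```python
from __future__ import annotations

import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman rank-order correlation as Pearson r on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented data: rates are flat except for one extreme state at the top SVI.
    svi = [10, 20, 30, 40, 50, 60, 70, 80]
    rate = [5.3, 5.1, 5.4, 5.0, 5.2, 5.3, 5.1, 30.0]
    print(f"Pearson r = {pearson_r(svi, rate):.2f}, "
          f"Spearman rho = {spearman_rho(svi, rate):.2f}")
```

On these invented data, Pearson r is roughly 0.57 while Spearman rho is roughly 0.19: one outlying state produces a moderate linear correlation that the rank-based coefficient does not support. Discordance of this kind is the reason given for omitting thyroid cancer mortality from the Table.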
Statistical significance was defined as P < .05. All analyses were performed using Stata statistical software (version 12.0; StataCorp Inc).
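The P values in the Table can be related to the R values through the standard t test for a Pearson correlation, t = R * sqrt((n - 2) / (1 - R^2)) with n - 2 degrees of freedom. The sketch below computes the two-sided P value using only the Python standard library, by integrating the Student t density numerically with Simpson's rule. It is an illustration, not the Stata routine used in the study. It assumes n = 50 for incidence (Nevada excluded) and n = 51 for mortality; under that assumption, R = 0.36 for melanoma incidence should give a P value close to the reported .01.

```python
from __future__ import annotations

import math


def t_density(t: float, df: int) -> float:
    """Student t probability density, with lgamma used for numerical stability."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided P value for a Pearson r under H0: rho = 0 (steps must be even)."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the density from 0 to t.
    h = t / steps
    area = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_density(k * h, df)
    area *= h / 3
    # By symmetry, P(|T| >= t) = 1 - 2 * P(0 <= T <= t).
    return max(0.0, 1.0 - 2.0 * area)


if __name__ == "__main__":
    # Melanoma incidence from the Table, assuming 50 jurisdictions.
    p = two_sided_p(0.36, 50)
    print(f"P = {p:.3f}; significant at .05: {p < 0.05}")
```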
Results
The Table shows correlation coefficients between registry-recorded incidence rates and relative Google search volume for 8 of the 10 most common cancers in the United States: breast, bladder, colorectal, lung, non-Hodgkin lymphoma, melanoma, prostate, and thyroid cancers. We found statistically significant correlations between incidence rates and relative Google search volume for colon cancer (R = 0.61; P < .001), lung cancer (R = 0.73; P < .001), lymphoma (R = 0.51; P < .001), melanoma (R = 0.36; P = .01), and thyroid cancer (R = 0.30; P = .03).
Table. Correlation Coefficients Between Cancer Incidence and Mortality Rates and Relative Google Search Volume Indices, 2009 to 2013.
Cancer Type | Incidence R | Incidence P Value | Mortality R | Mortality P Value |
---|---|---|---|---|
Bladder | 0.16 | .26 | 0.27 | .06 |
Breast | 0.09 | .52 | 0.23 | .10 |
Colon | 0.61 | <.001 | 0.61 | <.001 |
Lung | 0.73 | <.001 | 0.62 | <.001 |
Lymphoma | 0.51 | <.001 | 0.38 | .006 |
Melanoma | 0.36 | .01 | 0.31 | .03 |
Prostate | 0.24 | .09 | 0.10 | .47 |
Thyroid^a | 0.30 | .03 | NA | NA |
Abbreviation: NA, not applicable.
^a We do not report results for thyroid cancer mortality because the data contained marked outliers and the Pearson and Spearman coefficients differed notably: the Pearson P value was statistically significant, whereas the Spearman P value was not.
When examining cancer mortality, we noted similar results. There were statistically significant correlations between cancer-specific mortality rates and relative Google search volume for colon cancer (R = 0.61; P < .001), lung cancer (R = 0.62; P < .001), lymphoma (R = 0.38; P = .006), and melanoma (R = 0.31; P = .03) (Table). Breast cancer, prostate cancer, and bladder cancer did not have statistically significant correlations with incidence or mortality rates.
A representative scatter plot of melanoma incidence and mortality rates vs Google SVIs is presented in Figure 1.
Discussion
For several cancers, including colon, lung, lymphoma, and melanoma, state-specific relative Google search volume positively correlates with state-specific cancer incidence and mortality rates recorded by the National Program of Cancer Registries. This proof-of-concept study supports the potential use of internet search data, and of publicly available information on population interest in health topics more broadly, to estimate disease characteristics such as incidence and mortality rates. These data sources may be particularly useful when national registry data are unavailable, as for basal and squamous cell carcinomas, or when real-time information is needed, because cancer registry data are frequently several years old when published.
Although most cancers that we examined showed statistically significant correlations, breast, prostate, and bladder cancers did not. This may be partly explained by strong public health campaigns, including screening and awareness initiatives, which may increase search volume broadly and independently of disease metrics. For example, Google searches for “breast cancer” in the United States have previously been shown to increase markedly in October, during Breast Cancer Awareness Month. Similarly, melanoma search volume varies seasonally (Figure 2), as has been previously reported. That previous study used more limited time periods of Google SVI and registry data and showed an association between Google search volume and melanoma mortality but not melanoma incidence.
Limitations
This study has several limitations. The use of Google search data to estimate disease metrics may not be fully generalizable, because the data are restricted to people with internet access who use Google, although this group represents most of the US population. Our analysis was also limited to cancer terms with recorded search volume in Google Trends; therefore, the findings may not be generalizable to rare diseases or diseases without a common unifying search term. Because search volume may change independently of disease metrics, for example in response to public health campaigns targeting specific cancers, this method may not be appropriate for comparing incidence and mortality rates between diseases.
Conclusion
Population-level disease metrics are critically important to guide the distribution of resources and design of public health initiatives. Internet search data may provide useful estimates of disease, such as incidence, particularly where registry data are insufficient, lagging, or lacking.
References
1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2015. CA Cancer J Clin. 2015;65(1):5-29.
2. Miller DL, Weinstock MA. Nonmelanoma skin cancer in the United States: incidence. J Am Acad Dermatol. 1994;30(5, pt 1):774-778.
3. Carneiro HA, Mylonakis E. Google trends: a web-based tool for real-time surveillance of disease outbreaks. Clin Infect Dis. 2009;49(10):1557-1564.
4. How Trends data is adjusted. https://support.google.com/trends/answer/4365533?hl=en. Accessed October 20, 2016.
5. US Cancer Statistics Working Group. United States Cancer Statistics: 1999–2013 Incidence and Mortality Web-based Report. Atlanta, GA: US Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute; 2016. https://www.cdc.gov/cancer/npcr/uscs/index.htm. Accessed October 20, 2016.
6. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66(1):7-30.
7. Glynn RW, Kelly JC, Coffey N, Sweeney KJ, Kerin MJ. The effect of breast cancer awareness month on internet search activity: a comparison with awareness campaigns for lung and prostate cancer. BMC Cancer. 2011;11:442.
8. Bloom R, Amber KT, Hu S, Kirsner R. Google Search trends and skin cancer: evaluating the US population’s interest in skin cancer and its association with melanoma outcomes. JAMA Dermatol. 2015;151(8):903-905.
9. comScore Releases February 2016 US Desktop Search Engine Rankings. 2016. http://www.comscore.com/Insights/Rankings/comScore-Releases-February-2016-US-Desktop-Search-Engine-Rankings. Accessed October 20, 2016.
10. Americans’ Internet Access 2000-2015. 2015. http://www.pewinternet.org/2015/06/26/americans-internet-access-2000-2015/. Accessed October 20, 2016.