Journal of General Internal Medicine
Letter. 2021 Mar 19;37(5):1315–1317. doi: 10.1007/s11606-021-06695-8

Data-Tracking on Government, Non-profit, and Commercial Health-Related Websites

Alexander R Zheutlin 1, Joshua D Niforatos 2, Jeremy B Sussman 3,4
PMCID: PMC7978469  PMID: 33742302

INTRODUCTION

Health information privacy is a core value of American healthcare. The privacy of communication between healthcare providers and patients is protected by law, but no equivalent protection exists for individuals performing health-related searches on patient-facing websites. Aggregated consumer data obtained from searches on patient-facing websites can be used to create a health profile, which can even be merged with non-health-related information. These data profiles can be sold to other companies or used to curate targeted advertisements that follow the individual user. While this data collection is acknowledged, users often do not know how much of their data is released or how specifically it can be tied to them as individuals.1 Therefore, we used a privacy inspection tool2 to determine the prevalence and type of data tracking on commonly searched government and non-government health-related websites.

METHODS

Our sample was restricted to US-based websites drawn from the most-trafficked health websites, as measured by the website traffic monitoring service SimilarWeb (www.similarweb.com), and from all sites on the Medical Library Association’s list of recommended websites for health information, as of October 17, 2020.

To identify website monitoring and data collection, each website URL was examined with Blacklight, an internet-based tool that tests how websites surveil their users.3 For each website, we present the number of ad trackers, the number of third-party tracking and identification “cookies,” and whether the site made user data available to Facebook and Google Analytics.
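The letter does not describe how Blacklight decides that a cookie is "third-party." A common rule of thumb is to compare the site that set the cookie with the site being visited. The Python sketch below illustrates that idea under a simplifying assumption: it treats the last two DNS labels of a hostname as its "site," whereas production tools typically consult the Public Suffix List. The helper names and the example.org cookie jar are hypothetical, and this is not Blacklight's implementation.

```python
from urllib.parse import urlparse


def site_of(host: str) -> str:
    """Naively reduce a hostname to its last two DNS labels.

    Assumption: stands in for a Public Suffix List lookup; adequate only for
    simple hosts such as www.nia.nih.gov -> nih.gov.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_url: str, cookie_domain: str) -> bool:
    """Treat a cookie as third-party if its site differs from the visited page's site."""
    page_host = urlparse(page_url).hostname or ""
    return site_of(page_host) != site_of(cookie_domain)


def count_third_party_cookies(page_url: str, cookie_domains: list[str]) -> int:
    """Count cookies (one domain entry per cookie) set by sites other than the page's own."""
    return sum(is_third_party(page_url, d) for d in cookie_domains)


# Hypothetical cookie jar observed while loading a page on example.org.
observed = ["www.example.org", ".example.org", ".doubleclick.net", ".facebook.com"]
print(count_third_party_cookies("https://www.example.org/conditions/diabetes", observed))  # prints 2
```

A subdomain such as nia.nih.gov setting a cookie for .nih.gov is counted as first-party under this rule, which is the intended behavior for same-organization cookies.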

Data were described using frequencies and measures of central tendency. ANOVA and chi-square tests were used to assess differences in means and frequencies, respectively, between government, non-profit, and commercial health websites. All analyses were conducted in R version 4.0.2 (R Foundation).
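The analyses were run in R; as an illustration only, the Python sketch below recomputes a one-way ANOVA F statistic from the group sizes, means, and SDs reported in Table 1. It assumes the classic equal-variance one-way ANOVA and uses the fact that, with three groups, the F statistic has 2 numerator degrees of freedom, for which the upper-tail probability has the closed form (1 + 2F/df_within)^(-df_within/2). The function names are hypothetical, this is not the authors' code, and because the inputs are rounded the results can differ slightly from the original output.

```python
def one_way_anova_from_summary(ns, means, sds):
    """Return (F, df_between, df_within) for a one-way ANOVA from per-group n, mean, and SD."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group sum of squares: size-weighted squared distance of each group mean from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares recovered from sample SDs: (n - 1) * s^2 per group.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_upper_tail(f_stat, df_between, df_within):
    """P(F > f_stat); this closed form is exact only when df_between == 2 (three groups)."""
    if df_between != 2:
        raise ValueError("closed form only covers df_between == 2")
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


ns = [9, 20, 25]  # government, non-profit, commercial (Table 1)
ad_f, df_b, df_w = one_way_anova_from_summary(ns, [2.11, 7.15, 15.84], [0.60, 6.26, 10.29])
cookie_f, _, _ = one_way_anova_from_summary(ns, [1.11, 10.85, 25.08], [1.05, 11.99, 25.45])
print(f"ad trackers: F({df_b}, {df_w}) = {ad_f:.2f}, p = {f_upper_tail(ad_f, df_b, df_w):.2g}")
print(f"third-party cookies: F({df_b}, {df_w}) = {cookie_f:.2f}, p = {f_upper_tail(cookie_f, df_b, df_w):.2g}")
```

With these rounded summaries the sketch gives F(2, 51) of roughly 12.2 for ad trackers and 6.4 for third-party cookies, consistent with the reported p < 0.001 and p = 0.003. With more than three groups, a general F-distribution routine such as `scipy.stats.f.sf` would be needed instead of the closed form.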

RESULTS

The average number of ad trackers was 2.11 (SD 0.60), 7.15 (SD 6.26), and 15.84 (SD 10.29) for government, non-profit, and commercial health websites, respectively (p < 0.001). The average number of third-party cookies was 1.11 (SD 1.05), 10.85 (SD 11.99), and 25.08 (SD 25.45) for government, non-profit, and commercial health websites, respectively (p = 0.003). No government websites (0.0%), 10 (50.0%) non-profit websites, and 15 (60.0%) commercial websites provided user data to Facebook. Six (66.7%) government websites, 14 (70.0%) non-profit websites, and 16 (64.0%) commercial websites provided user data to Google Analytics. Group averages and counts are reported in Table 1, and results for individual websites are reported in Table 2.
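The chi-square comparison of proportions can be illustrated with a short Python sketch. It applies a Pearson chi-square test of independence (no continuity correction) to the Facebook counts reported above; with three groups and two outcomes there are 2 degrees of freedom, for which the chi-square upper-tail probability is exactly exp(-x/2). The letter does not report this statistic, so the output is an illustration of the method rather than a published result, and the function name is hypothetical. The expected count for government websites sharing data with Facebook is about 4.2, below the conventional threshold of 5, so an exact test may be preferable for data like these.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence without continuity correction.

    `table` is a list of rows (groups) of observed counts. Returns (statistic, df, p);
    the p-value uses exp(-x / 2), which is exact only for df == 2.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    if df != 2:
        raise ValueError("closed-form p-value only covers df == 2")
    return stat, df, math.exp(-stat / 2)


# Rows: government, non-profit, commercial; columns: [shared with Facebook, did not share].
facebook = [[0, 9], [10, 10], [15, 10]]
stat, df, p = chi_square_independence(facebook)
print(f"chi-square({df}) = {stat:.2f}, p = {p:.3f}")
```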

Table 1.

Mean and count frequency of data provided to outside organizations by government, non-profit, and commercial websites

Website type | Ad trackers, mean (SD) | Third-party cookies, mean (SD) | Information provided to Facebook, n (%) | Information provided to Google Analytics, n (%)
Reference | 7 | 3 | * | *
Government websites (N=9) | 2.11 (0.60) | 1.11 (1.05) | 0 (0.0%) | 6 (66.7%)
Non-profit websites (N=20) | 7.15 (6.26) | 10.85 (11.99) | 10 (50.0%) | 14 (70.0%)
Commercial websites (N=25) | 15.84 (10.29) | 25.08 (25.45) | 15 (60.0%) | 16 (64.0%)

Percentages based upon row totals

Data abstracted on October 17, 2020

Table 2.

Compiled List of Reviewed Websites

Ad trackers Third-party cookies Information provided to Facebook Information provided to Google Analytics
Reference 7 3 * *
Government websites
Cdc.gov 3 2
Medlineplus.gov 3 0 Yes
NIH.gov 2 2 Yes
NIA.NIH.gov 2 0 Yes
Health.gov 2 2 Yes
Cancer.gov 1 2
NIDDK.nih.gov 2 0
Healthfinder.gov 2 2 Yes
nei.nih.gov 2 0 Yes
Commercial websites
Healthline.com 8 16 Yes
Webmd.com 28 33 Yes
Medicalnewstoday.com 13 18 Yes
Cvs.com 3 3
Walgreens.com 30 54 Yes
Athenahealth.com 20 27 Yes
Myfitnesspal.com 2 0 Yes
Fitbit.com 16 16 Yes Yes
Drugs.com 9 16 Yes
Psychologytoday.com 6 4 Yes
Menshealth.com 17 15 Yes Yes
Womenshealthmag.com 17 17 Yes Yes
Msdmanuals.com 6 3 Yes Yes
Medicinenet.com 15 20 Yes
Medscape.com 20 16 Yes
Healthgrades.com 10 4 Yes Yes
Babycenter.com 38 92 Yes
Practicefusion.com 8 8 Yes Yes
Weightwatchers.com 25 51 Yes Yes
Yahoo.com/lifestyle/tagged/health 38 86
Everydayhealth.com 28 47 Yes Yes
Healthgrades.com 10 4 Yes Yes
Mercola.com 5 4
Health.com 12 53 Yes
Rxlist.com 12 20 Yes
Non-profit
Mayoclinic.org 15 31 Yes
Clevelandclinic.org 4 3 Yes
Kidshealth.org 6 11 Yes Yes
Hivinsite.ucsf.edu 0 1
Familydoctor.org 9 6 Yes Yes
Cancer.org 21 40 Yes Yes
Breastcancer.org 10 4 Yes Yes
Cancercare.org 3 4 Yes
Foundationforwomenscancer.org 3 6 Yes
Oncolink.org 7 17 Yes
Diabetes.org 13 19 Yes Yes
Joslin.org 2 0 Yes
Childrenwithdiabetes.com 0 0
Afb.org 5 6 Yes
w-e-h.org 0 0
Heart.org 16 34 Yes Yes
Thebody.com 16 10 Yes Yes
Americanstroke.org 1 4
Health.harvard.edu 3 3 Yes
Stroke.org 9 18 Yes Yes

Data abstracted on October 17, 2020, using Blacklight, a publicly available service with code made available on GitHub, which traverses individual websites and collects information such as third-party cookies, ad trackers, key logging, session recording, canvas fingerprinting, Facebook tracking, and Google Analytics use.
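Because Table 2 lists per-site counts, the group summaries in Table 1 can be recomputed from it. The minimal Python sketch below does this for the government and non-profit rows, assuming Table 1 reports the sample SD (n - 1 denominator) rounded to two decimals; the variable and function names are illustrative, and the counts are transcribed from Table 2 in the order listed.

```python
from statistics import mean, stdev

# Per-site counts transcribed from Table 2, in the order the sites are listed there.
government_ad_trackers = [3, 3, 2, 2, 2, 1, 2, 2, 2]
government_cookies = [2, 0, 2, 0, 2, 2, 0, 2, 0]
nonprofit_ad_trackers = [15, 4, 6, 0, 9, 21, 10, 3, 3, 7, 13, 2, 0, 5, 0, 16, 16, 1, 3, 9]
nonprofit_cookies = [31, 3, 11, 1, 6, 40, 4, 4, 6, 17, 19, 0, 0, 6, 0, 34, 10, 4, 3, 18]


def summarize(values):
    """Mean and sample SD (n - 1 denominator), each rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


def percent(count, total):
    """Row percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, values in [
    ("government ad trackers", government_ad_trackers),
    ("government third-party cookies", government_cookies),
    ("non-profit ad trackers", nonprofit_ad_trackers),
    ("non-profit third-party cookies", nonprofit_cookies),
]:
    m, sd = summarize(values)
    print(f"{label}: {m:.2f} ({sd:.2f})")

# Six of the nine government websites in Table 2 are marked as sharing data with Google Analytics.
print(f"government websites using Google Analytics: 6 ({percent(6, 9)}%)")
```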

DISCUSSION

Every category of health website studied provided data to ad trackers and used third-party cookies. Popular commercial websites used substantially more third-party cookies and ad trackers than non-profit websites, which in turn used more than government websites.

Health-related websites often serve as a supplemental resource where patients can find answers and further explanations about diseases and treatment options. Data collected this way can be used to construct a profile of personal health information, which can subsequently be provided or sold to other companies to improve advertising targeting, as with the relationships with Google and Facebook observed in the current report. The degree to which this information is used in this way is unknown.4 However, this appears to be a common practice, and a recent study of COVID-19-related websites found similar results regarding the prevalence of third-party tracking.5 Greater clarity about how websites use collected health data may allow better identification of online resources that maximize user privacy while also providing helpful medical information.

This study has several limitations. First, we used a single software program to examine website tracking, and algorithms for monitoring data privacy may vary between tools.6 Second, we limited our search to selected, commonly trafficked websites; a broader sample may provide more granularity among health-related websites. Finally, it is not known exactly what the websites do with this information, or what its overall benefits or harms are.

We found that every category of health-related website examined provided information to ad trackers and set third-party cookies. Furthermore, providing data to Facebook and Google for targeted advertising was relatively common, particularly among the commercial and non-profit health-related websites in our sample. Searching for personal health information is not a private action, and patients and providers must account for this when they search for health information online.

Funding

Alexander Zheutlin is supported by the Utah Stimulating Access to Research in Residency Transition Scholar (StARRTS) Award Number 1R38HL143605-01.

Declarations

Conflict of Interest

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

