Investigative Ophthalmology & Visual Science. 2018 Feb;59(2):910–920. doi: 10.1167/iovs.17-22818

Clinical Age-Specific Seasonal Conjunctivitis Patterns and Their Online Detection in Twitter, Blog, Forum, and Comment Social Media Posts

Michael S Deiner 1,2, Stephen D McLeod 1,2, James Chodosh 3, Catherine E Oldenburg 1,2,4, Cherie A Fathy 5, Thomas M Lietman 1,2,4, Travis C Porco 1,2,4
PMCID: PMC5815847  PMID: 29450538

Abstract

Purpose

We sought to determine whether big data from social media might reveal seasonal trends of conjunctivitis, most forms of which are nonreportable.

Methods

Social media posts (from Twitter, and from online forums and blogs) were classified by age and by conjunctivitis type (allergic or infectious) using Boolean and machine learning methods. Based on spline smoothing, we estimated the circular mean occurrence time (a measure of central tendency for occurrence) and the circular variance (a measure of uniformity of occurrence throughout the year, providing an index of seasonality). Clinical records from a large tertiary care provider were analyzed in a similar way for comparison.

Results

Social media posts machine-coded as being related to infectious conjunctivitis showed similar times of occurrence and degree of seasonality to clinical infectious cases, and likewise for machine-coded allergic conjunctivitis posts compared to clinical allergic cases. Allergic conjunctivitis showed a distinctively different seasonal pattern than infectious conjunctivitis, with a mean occurrence time later in the spring. Infectious conjunctivitis for children showed markedly greater seasonality than for adults, though the occurrence times were similar; no such difference for allergic conjunctivitis was seen.

Conclusions

Social media posts broadly track the seasonal occurrence of allergic and infectious conjunctivitis, and may be a useful supplement for epidemiologic monitoring.

Keywords: infectious conjunctivitis, allergic conjunctivitis, machine learning, social media, Twitter


The use of “big data” to complement traditional clinical case reporting has been explored for a number of conditions,1–16 including conjunctivitis.17–21 It has been argued that epidemiologic application of such data may introduce bias,22,23 but it could also provide information not otherwise available.24 Aside from neonatal cases, no CDC reporting system for conjunctivitis exists, despite the condition's economic25 and public health26 importance. We asked whether social media streams reveal important features of the epidemiology of conjunctivitis, such as age-specific seasonal occurrence.

Previously, we found that temporal patterns of conjunctivitis, whether allergic or infectious, correlated with searches and tweets.20 In this paper, using data from a longer time period, we test whether the seasonality of clinical case counts for both infectious and allergic conjunctivitis is related to the seasonality of social media posts. In addition to tweets, we examine posts and comments on blogs and Internet forums.

Methods

EMR Clinical Data Acquisition

All UCSF electronic medical records (EMR) from June 3, 2012, to July 16, 2016, were queried. Queries were constructed to identify cases of: (1) conjunctivitis (all encounters with a diagnosis name containing “conjunctivi”) and controls; (2) glaucoma (all encounters with a diagnosis name containing “glaucoma”); (3) macular degeneration (ICD9 diagnosis code of 362.5 and ICD10 equivalents); (4) corneal ulcer (ICD9 diagnosis code of 370.x and ICD10 equivalents); (5) common cold (ICD9 diagnosis code of 460.x and ICD10 equivalents); and (6) influenza (ICD9 level 3 group of “Influenza” and ICD10 equivalents). Using ICD9/10 codes and diagnosis names, conjunctivitis was further classified as infectious conjunctivitis, allergic conjunctivitis, or other conjunctivitis (see Supplementary Material for details). These records included diagnoses from all provider specialties. For each diagnosis, at most one encounter per day per patient was included in subsequent analysis. Weekly counts were grouped by overall diagnosis category. Infectious conjunctivitis, allergic conjunctivitis, other conjunctivitis, and influenza were also tabulated in two additional ways: by the most common provider specialties and by age category (0–5 years, 6–17 years, 18–39 years, and 40 or more years of age).
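As a minimal sketch of this grouping and tabulation step in R (the data frame `encounters` and its columns `patient_id`, `encounter_date`, `diagnosis_name`, and `icd_code` are hypothetical, and the keyword/ICD matching is simplified relative to the full ICD9/10 logic given in the Supplementary Material):

```r
library(dplyr)
library(lubridate)

# Hypothetical EMR extract "encounters"; all column names are illustrative.
# ICD10 equivalents are omitted here for brevity.
classify_diagnosis <- function(diagnosis_name, icd_code) {
  case_when(
    grepl("conjunctivi", diagnosis_name, ignore.case = TRUE) &
      grepl("allerg", diagnosis_name, ignore.case = TRUE) ~ "allergic conjunctivitis",
    grepl("conjunctivi", diagnosis_name, ignore.case = TRUE) ~ "infectious/other conjunctivitis",
    grepl("glaucoma", diagnosis_name, ignore.case = TRUE)     ~ "glaucoma",
    grepl("^362\\.5", icd_code)                               ~ "macular degeneration",
    grepl("^370", icd_code)                                   ~ "corneal ulcer",
    grepl("^460", icd_code)                                   ~ "common cold",
    TRUE                                                      ~ "other"
  )
}

weekly_counts <- encounters %>%
  mutate(group = classify_diagnosis(diagnosis_name, icd_code),
         week  = floor_date(encounter_date, unit = "week")) %>%
  # keep at most one encounter per patient per day per diagnosis group
  distinct(patient_id, group, encounter_date, .keep_all = TRUE) %>%
  count(group, week, name = "n")
```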

Social Media Data Acquisition

Using a commercial social media analytics platform, Crimson Hexagon,27,28 we identified social media posts related to a person (or group of people) having any form of infectious or allergic conjunctivitis or eye allergy, using a Boolean query (which refines and limits results using keywords and phrases combined with the operators AND, OR, and NOT). Cinematic or cultural references to pink eye were excluded in the same way. To further separate the data into distinct infectious and allergic groups, we then trained the Crimson Hexagon BrightView classifier to classify posts as “infectious conjunctivitis” (posts concerning conjunctivitis or pink eye but not related to allergy); “allergic conjunctivitis” (posts concerning eye allergy or eyes with allergic conjunctivitis symptoms such as itching, but not related to infectious spread); or “irrelevant/uncertain” (posts concerning cosmetics, humor, other disease, emotional crying, or otherwise unclassifiable). The BrightView classifier is a supervised learning algorithm based in part on stacked regression analysis of a simplified numerical representation of text.29 Posts that potentially addressed infectious or allergic conjunctivitis were collected as a corpus of tweets from Twitter and a separate corpus of pooled posts from forums, blogs, and public comment sections (“forums”). All forums, blogs, and comment sections available through Crimson Hexagon were queried. Within the identified infectious and allergic conjunctivitis Twitter and forum posts, a Boolean classification was then used to characterize posts as related or not related to young people or children, creating “younger” and “older” age-based subgroups. The Boolean queries described above were also used to remove posts containing common terms, phrases, song names, user accounts, and so forth that were obviously not related to the occurrence of conjunctivitis in a person (see Appendix and Supplementary Material for more query details).
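The actual Crimson Hexagon query is given in the Supplementary Material; as a toy illustration only, the include/exclude logic of such a Boolean filter might be sketched in R as follows (the `posts` data frame and both term lists are hypothetical and far shorter than the study's query):

```r
# Toy illustration of Boolean include/exclude filtering; term lists are illustrative only.
include_terms <- c("pink eye", "pinkeye", "conjunctivitis", "eye allergy", "itchy eyes")
exclude_terms <- c("pink eyeshadow", "pink eyeliner")  # cosmetic/cultural false positives

matches_any <- function(text, terms) {
  # TRUE if the post text matches any of the supplied terms (case-insensitive)
  Reduce(`|`, lapply(terms, function(tm) grepl(tm, text, ignore.case = TRUE)))
}

# posts: hypothetical data frame with a "text" column of post contents.
posts$candidate <- matches_any(posts$text, include_terms) &
  !matches_any(posts$text, exclude_terms)
```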

Validation of Machine-Coded Classification

The machine-coded classifications of infectious and allergic conjunctivitis were validated by comparison with human classifications. A random sample of the classified posts, excluding any used in training the classifier, was selected. Two independent human raters were masked to the machine-coded classifications and to each other's classifications. The raters classified posts independently, and a modified Delphi process was then used to arrive at a consensus human rating. The consensus human rating was then compared to the BrightView classification using percent agreement. Details are provided in the Appendix.
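A minimal sketch of the percent-agreement calculation with an exact binomial confidence interval (the count of 118 agreements out of 128 is an assumed example consistent with the 92% Twitter agreement reported in the Results, not a figure taken from the study):

```r
# Percent agreement and exact (Clopper-Pearson) 95% CI.
# 118/128 is an assumed example count, not a value from the paper.
agree <- 118
n     <- 128
agree / n                      # ~0.92
binom.test(agree, n)$conf.int  # approximately 0.86 to 0.96
```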

Statistical Approach

Clinical and social media weekly counts were analyzed using negative binomial regression30 with a cubic polynomial in time to account for secular trends and cyclic cubic splines31 as a flexible model of arbitrary seasonal variation (with eight knots per year). Estimation was conducted by maximum likelihood, yielding detrended and smoothed estimates of the average count over the course of 1 year. Based on these estimates, we then computed the circular mean, the circular variance, and the relative amplitude. The circular mean provides a measure of central tendency for estimated occurrence frequency over a year (e.g., the circular mean week for school graduations in the United States might be the first week of June). The circular variance measures the degree of uniformity of seasonal occurrence, ranging from 0, when all events occur at the same instant every year, to 1, when events are uniformly distributed throughout the year and no seasonal variation exists. Hypothesis testing was conducted using the likelihood ratio χ2 test. We also compared the similarity of time series using the Spearman rank correlation, after first detrending (using negative binomial regression with a cubic polynomial in time, and using a time series bootstrap with a fixed window of length 8). When we found evidence of outliers in the time series, we conducted sensitivity analyses using additional indicator predictors of the form $1\{t_1 \le t \le t_2\}$ (where $t_1$ and $t_2$ are the beginning and end of each sequence of outliers). All computations were conducted in R, version 3.3.1 for Macintosh (R Foundation for Statistical Computing, Vienna, Austria). Details are provided in the Appendix.
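A minimal sketch of this model in R using the mgcv package (Wood's GAM software31); the `weekly_counts` data frame and its column names are assumptions of this sketch, as is holding the trend term at its mean when extracting the seasonal curve:

```r
library(mgcv)

# weekly_counts: hypothetical data frame with one row per week and columns
# n (count), t (week index from study start), and woy (week of year, 1-52).
fit <- gam(n ~ poly(t, 3) +             # cubic polynomial for the secular trend
             s(woy, bs = "cc", k = 8),  # cyclic cubic spline for seasonality (8 knots/year)
           family = nb(),               # negative binomial, fitted by maximum likelihood
           data = weekly_counts)

# Detrended, smoothed seasonal curve: predict over one year with the trend held at its mean.
grid <- data.frame(t = mean(weekly_counts$t), woy = seq(1, 52, length.out = 200))
mu   <- predict(fit, newdata = grid, type = "response")

# Similarity of two detrended series (e.g., EMR vs. Twitter residuals) could then be
# assessed with cor(res_emr, res_twitter, method = "spearman").
```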

IRB Approval

This study followed the tenets of the Declaration of Helsinki, with approval obtained from the UCSF institutional review board prior to commencing this study (IRB approval# 14-14743).

Results

We examined the period June 3, 2012, to July 16, 2016. Distributions of clinical conjunctivitis diagnosis groups for all ages combined, as well as for age-based subgroups, are shown in Table 1. For the clinical data, we extracted encounters for infectious conjunctivitis, allergic conjunctivitis, and other conjunctivitis cases (such as those resulting from chemical exposure). For comparison, we also included five additional conditions, including influenza. The EMR patient data set yielded 108,699 total patient records for these eight conditions. Counting each patient only once per category, a total of 31,037 patients were analyzed. Of these, we analyzed 4713 allergic conjunctivitis visits for 2614 patients, 8036 infectious conjunctivitis visits for 6186 patients, and 810 other conjunctivitis visits for 513 patients. For influenza, we analyzed 3237 visits for 2270 patients. Clinical infectious, allergic, and other conjunctivitis encounters were distributed differently across specialties. Of the infectious conjunctivitis patient encounters, the largest share (27%) was seen in pediatrics, followed by internal medicine (22%) and ophthalmology (14%). For allergic conjunctivitis encounters, the ranks of pediatrics and ophthalmology were reversed: the largest share (32%) was seen in ophthalmology, followed by optometry (17%) and pediatrics (14%). For other conjunctivitis encounters, the largest share (44%) was seen in ophthalmology, followed by optometry (22%) and pediatrics (7%). Emergency medicine contributed 11% of infectious, 2% of allergic, and 6% of other conjunctivitis cases, with conjunctivitis-related sex, age, and seasonal results similar to those found for a national emergency medicine clinical database.32

Table 1.

Distributions of EMR and Social Media Post Disease Groups, by Conjunctivitis Disease Group and Age


The Twitter query resulted in 183,301 infectious conjunctivitis posts and 232,052 allergic conjunctivitis posts. The pooled results from the queried forums, blogs, and comment sections (“forums”), consisting of approximately 89% forums, 9% blogs, and 2% comments, yielded 15,434 infectious conjunctivitis posts and 18,117 allergic conjunctivitis posts overall.

Validation of Social Media Data Machine-Coded Classifications

For Twitter, comparison of the consensus human classification to the machine-coded classifications showed that humans agreed 92% (95% confidence interval [CI]: 86%–96%) of the time with the machine-coded classifications, with human agreement 89% (95% CI: 79%–95%) of the time when the machine classification was infectious, and 95% (95% CI: 87%–99%) of the time when the machine classification was allergic. For Twitter posts where the consensus human classification was infectious conjunctivitis, the machine-coded classification was infectious 100% of the time (57/57) and allergic 0% of the time. For posts where the consensus human classification was allergic conjunctivitis, the machine-coded classification was allergic 100% (61/61) of the time and infectious 0% of the time.

For forum posts, comparison of the consensus human classification to the machine-coded classifications showed that humans agreed 55% (95% CI: 42%–67%) of the time with the machine classifications, with human agreement 59% (95% CI: 41%–76%) of the time when the machine classification was infectious, and 50% (95% CI: 32%–68%) of the time when the machine classification was allergic. Only one case of disagreement between the human consensus and machine-coded classifications was due to opposing human- and machine-assigned conjunctivitis types. Almost all disagreement arose because the consensus human classification found 43% of the machine-coded conjunctivitis forum posts to be uncertain or unrelated to conjunctivitis. For comparison, the consensus human classification found only 8% of the machine-coded conjunctivitis Twitter posts to be uncertain or unrelated to conjunctivitis. This may reflect the longer length and less uniform nature of forum posts. For forum posts where the consensus human classification was infectious conjunctivitis, the machine-coded classification was infectious 100% of the time (19/19) and allergic 0% of the time. For forum posts where the consensus human classification was allergic conjunctivitis, the machine-coded classification was allergic 94% of the time (16/17) and infectious 6% of the time (1/17).

Age-Based Query Results

Table 1 summarizes age-based query results for EMR clinical, Twitter, and forum data. The ratio of infectious to allergic conjunctivitis posts by age subgroup demonstrates a strong similarity among the clinical data, Twitter posts, and forum posts, in that younger ages show a much higher ratio of infectious to allergic cases than older ages. Text content for the conjunctivitis type and age-based subgroups of Twitter posts can be compared in the topic wheel visualizations shown in Figure 1, which depict apparent differences in some main topic and subtopic content for each subgroup.33 Additional content comparisons of Twitter and forum age-based subgroups for infectious conjunctivitis are shown in Figure 2.

Figure 1.


Conversation topics, Twitter. Topic wheels created from a random sample of 1000 posts used for analysis. Left: younger. Right: older. Top: infectious. Bottom: allergic. Source: Crimson Hexagon.

Figure 2.


Conversation topics, all ages. Left: infectious conjunctivitis. Right: allergic. Top: Twitter. Bottom: forums. Source: Crimson Hexagon.

Estimated Seasonal Patterns and Comparisons

Weekly infectious conjunctivitis cases, tweets, and posts are shown from 2012 to 2016 in Figure 3 (gray, smoothed curve in red). The figure also shows allergic conjunctivitis (gray, with smoothed curve in green). Detrending yielded estimated seasonal variation over the course of 1 year, shown in the smoothed curves in Figure 4. From these detrended smoothed curves, we calculated summary statistics representing specific seasonal features (i.e., the circular mean, circular variance, and the relative amplitude; shown in Tables 2, 3; Fig. 5). The estimated circular mean for infectious conjunctivitis clinical data was March 18 (95% CI: March 10–26), providing a measure of occurrence time within a year. The estimated circular variance was 0.79 (95% CI: 0.76–0.82), providing a measure of seasonality.

Figure 3.


Weekly EMR and social media counts and estimated seasonal pattern fitted curves: conjunctivitis groups and age groups for clinical and social media data. Weekly count data from multiple years were analyzed using negative binomial regression to create fitted seasonal curves. Panels show raw weekly data and the corresponding fitted curves; these can be compared across the infectious, allergic, and other conjunctivitis groups for the EMR clinical data as well as for the analogous Twitter and forum conjunctivitis post groups. Rows: clinical cases (row 1); Twitter (row 2); forums (row 3). Columns: all ages combined (column 1); younger ages (column 2); older ages (column 3). Colors: fitted curves for infectious conjunctivitis (red); fitted curves for allergic conjunctivitis (green); observed weekly counts (gray). Each tick on the horizontal axis represents 6 months, and each date listed corresponds to the tick mark centered above it.

Figure 4.


Smoothed detrended seasonal curves. (A) Infectious conjunctivitis for older (thick lines) and younger individuals (thin lines); clinical EMR (solid lines); and Twitter (dashed lines). (B) Allergic conjunctivitis for older (thick lines) and younger (thin lines); clinical EMR (solid lines); and Twitter (dashed lines). (C) Infectious conjunctivitis (red) and allergic conjunctivitis (green), for pediatrics (solid lines) and ophthalmology (dashed lines). (D) Clinical influenza (blue) and infectious conjunctivitis (red), for older (solid lines) and younger individuals (dashed lines). (E) Clinical influenza (blue); allergic conjunctivitis (green); infectious conjunctivitis (red); and corneal ulcers (gray), all ages. X-axis: top tick marks are 10-week intervals starting at week 0, bottom tick marks indicate middle of months.

Table 2.

Seasonal Characteristics of Conjunctivitis in Clinical EMR and Social Media Posts


Table 3.

EMR Seasonal Characteristics of Infectious, Allergic, and Other Conjunctivitis Groups, as Well as Influenza, by Age Group


Figure 5.


Timing and degree of seasonality for selected clinical and social media data. The circular mean week of occurrence is shown on the horizontal axis; the vertical axis displays a measure of degree of seasonality (one minus the circular variance; higher location indicates greater seasonality). Infectious conjunctivitis is shown in red, allergic conjunctivitis in green. Glaucoma and corneal ulcers are shown for comparison.

Overall, similar seasonal features were detected between social media and clinical data, for all ages combined. (Because seasonal occurrence data are angular data, we used the circular mean week to summarize the central tendency.) The circular mean week of infectious conjunctivitis was similar for data from EMR, Twitter, and forums; and the circular mean week of allergic conjunctivitis was similar for data from EMR, Twitter and forums (see Table 2). The circular mean week for infectious conjunctivitis preceded the allergic conjunctivitis mean week by approximately 10 weeks for all three data sources (see Table 2, Figs. 3, 4). Weekly infectious conjunctivitis counts for all three data sources were more strongly correlated with each other than with any allergic conjunctivitis data source (Spearman correlation, please see Table 4). Similarly, counts for allergic conjunctivitis for all three data sources were more strongly correlated with each other than with any infectious conjunctivitis data source (see Table 4).

Table 4.

Spearman Rank Correlation of Detrended Residuals for Clinical Counts, Twitter, and Forum Posts, for Infectious and Allergic Conjunctivitis


For both infectious and allergic conjunctivitis, the seasonality of tweets and forum posts was similar to that of clinical conjunctivitis, in both younger and older age groups. Because the age classification for clinical cases is not the same as for tweets and forum posts, we do not expect identical seasonal patterns. Nevertheless, the mean occurrence day of infectious conjunctivitis for the youngest clinical age group was March 9 (95% CI: March 5–15), and the mean occurrence day of infectious conjunctivitis tweets for younger people was February 26 (95% CI: February 19–March 6); see Table 5 for details. Figure 5 shows the degree of seasonality (one minus the circular variance) as a function of the circular mean occurrence time, by age and condition. Allergic conjunctivitis appears somewhat later in the year than infectious conjunctivitis. Among infectious conjunctivitis cases, children show greater seasonal variation than older individuals. These patterns are apparent in both the clinical data and the social media data. Standard errors are given in Table 5. Broadly speaking, for both the younger and the older age groups, infectious conjunctivitis tweets and forum posts were more strongly correlated (Spearman) with infectious clinical cases than with allergic clinical cases (not shown).

Table 5.

Seasonal Characteristics of Conjunctivitis in Clinical EMR and Social Media Posts, by Age


Comparing seasonal features of infectious conjunctivitis between age groups, for Twitter, forum, and EMR data alike, infectious conjunctivitis at younger ages had a higher degree of seasonality than at older ages (see Fig. 3, column 2 versus 3; Fig. 5, solid red versus open red). As an alternative to patient age groups for comparing younger with older clinical cohorts, we also compared the seasonal characteristics of patients seen in pediatrics with those seen in other specialties: infectious conjunctivitis in pediatrics had a much higher degree of seasonality and an earlier mean week than in ophthalmology (see Fig. 4C) and, to a lesser extent, than in other clinical specialties (data not shown).

Comparing infectious with allergic conjunctivitis by age group, the ratio of infectious to allergic cases was higher at younger ages than at older ages, for both EMR and social media data (see Table 1, “Ratio”; Fig. 3, columns 2–3). At younger ages, infectious conjunctivitis showed stronger seasonality (circular variance) than allergic conjunctivitis for EMR and forums (but not Twitter). Additionally, for all three data sources, at younger ages infectious conjunctivitis had a larger relative amplitude than allergic conjunctivitis (see Table 5; Fig. 5, red closed versus green closed; Figs. 4A, 4B, thin lines). Conversely, at older ages, allergic conjunctivitis had equal or stronger seasonality (circular variance) and relative amplitude than infectious conjunctivitis, for all three data sources (see Table 5; Fig. 5; Figs. 4A, 4B, thick lines).

Not all seasonal features, however, were the same across EMR, Twitter, and forums. EMR infectious conjunctivitis had an earlier circular mean week for younger age groups than for older ones, but this age-related difference was less evident in the Twitter and forum data (see Table 5; Fig. 5). Additionally, the mean week of infectious and allergic conjunctivitis consistently occurred slightly later for EMR than for Twitter and forums (see Tables 2, 5).

We compared the seasonal features of infectious conjunctivitis to those of influenza, which is known to be highly seasonal, as well as to other diseases considered less seasonal. The age-based variation in circular mean week, circular variance, and relative amplitude observed for infectious conjunctivitis was less apparent for influenza, whose seasonal pattern was more pronounced (see Fig. 4D; Table 3). In terms of degree of seasonality, as reflected in circular variance and relative amplitude, infectious conjunctivitis was less seasonal than influenza but more seasonal than allergic conjunctivitis, other conjunctivitis, and nonconjunctivitis eye disease (see Tables 2, 3; Fig. 4E).

Discussion

Social media posts from Twitter or forums, when classified as pertaining to infectious conjunctivitis, had similar seasonal characteristics (mean week, circular variance, and amplitude) to each other and to the seasonal occurrence of clinical infectious conjunctivitis. In the same way, the occurrence of social media posts classified as related to allergic conjunctivitis showed timing similar to the occurrence of clinical cases of allergic conjunctivitis. The mean week of occurrence of infectious conjunctivitis consistently fell approximately 10 weeks before that of allergic conjunctivitis, for every data source (clinical, Twitter, or forum). Broken down by age, social media posts also showed seasonal characteristics similar to those of the corresponding clinical age groups. Our results suggest that social media data regarding conjunctivitis may mirror underlying clinical phenomena.

Despite these seasonal similarities between posts and clinical conjunctivitis data, some differences existed. For example, we found a smaller ratio of infectious to allergic conjunctivitis for posts than for clinical data. We note that the clinical data include the true calendar age, whereas social media analysis may involve inferred ages. Other biases may exist: perhaps individuals posting about allergic conjunctivitis seek health care less often than those posting about infectious conjunctivitis. Clinically, infectious conjunctivitis at younger ages (0–5 years; pediatrics) had an earlier typical occurrence and a more pronounced seasonality (smaller circular variance) than at older ages (6 to 40+ years; ophthalmology).

Several limitations apply to our findings. Boolean and machine-learned classification of posts into disease and age groups introduces unavoidable misclassification. Our human-derived validations of posts indicated that misclassification appears to be uncommon for Twitter posts. They also indicated that, for both Twitter and forum posts, whenever humans agreed that a post was about either infectious or allergic conjunctivitis, the machine almost always agreed. However, as indicated in the Results, for a substantial fraction of those forum posts (but not of Twitter posts) that were machine coded as conjunctivitis, the humans classified the posts as uncertain or not about conjunctivitis. Despite this, the forum data still appear to support the distinct seasonal infectious and allergic conjunctivitis relationships seen in the Twitter and clinical data. Future studies could further refine the forum queries, increasing the agreement rate for forum posts and eliminating nonspecific posts that may have no distinct seasonal pattern. Moreover, posts, as well as our EMR clinical data, may represent limited portions of the national population.34–37

Regarding the clinical data, we compared populations and seasonal patterns of acute conjunctivitis cases for emergency medicine within our dataset to those of the national emergency department sample dataset and found similar populations by age and by conjunctivitis group, suggesting that (at least for emergency medicine) the clinical data used in this study may be at least partially representative of national clinical data (see Fig. 6; figures 1a, 1b, 2b in Ramirez et al.32). Additionally, the clinical data could include diagnoses from multiple specialties, whose providers may differ in their diagnosis and treatment of the same condition.38 For any given episode, the patient's self-diagnosis (as reflected in the subject of the tweet or post) may differ from the clinical diagnosis, and it is possible that patients are more likely to consider an episode “infectious” than allergic. A similar phenomenon may explain the relative overprescribing of antibiotics by non–eye care specialists versus eye specialists in the treatment of conjunctivitis.38 Future studies could consider use of national clinical data representing all specialties as well as other social media, search, and weblog data. Alternatively, a future study comparing specific geographic or demographic sectors, from diverse clinical or social media platforms, could identify important differences in occurrence to potentially guide more targeted eye health care or policy implementations.

Figure 6.


Age-based and monthly UCSF EMR emergency medicine data, for comparison to national clinical emergency department data (see Ref. 32). Top: age distributions stratified by sex (for comparison, see figure 1a of Ref. 32). Center: age distributions stratified by conjunctivitis diagnosis code groups (for comparison, see figure 1b of Ref. 32). Bottom: monthly distributions stratified by conjunctivitis diagnosis code groups (for comparison, see figure 2b of Ref. 32).

Despite these and other limitations, the findings of our study suggest that future use of machine learning and refined Boolean queries could allow an even more granular understanding of the prevalence and seasonal patterns of different conjunctivitis etiologies. This, in turn, may greatly enhance identification of infectious conjunctivitis occurring outside normal seasonal patterns for age or geographic subgroups, potentially allowing improved outbreak detection through combined monitoring and analysis of clinical and Internet-based data.

Supplementary Material

Supplement 1

Acknowledgments

The authors thank Suling Wang for assistance.

Supported in part by Grant 1R01EY024608-01A1 from the National Institutes of Health National Eye Institute (NIH-NEI); Grant EY002162 (Core Grant for Vision Research) from the NIH-NEI; and an unrestricted grant from Research to Prevent Blindness, through the University of California San Francisco Information Technology Enterprise Information Analytics Departments Research Data Browser and Clinical Data Research Consultation Services.

Disclosure: M.S. Deiner, None; S.D. McLeod, None; J. Chodosh, None; C.E. Oldenburg, None; C.A. Fathy, None; T.M. Lietman, None; T.C. Porco, None

Appendix

Statistical Calculation

For observation $i$, denote the observed count (clinical, Twitter, or forums) by $y_i$ and the occurrence time as $t_i$, with corresponding phase angle $\theta_i$ (with 0 radians corresponding to January 1). Negative binomial regression with both temporal trend and seasonal spline basis functions then yields the detrended, smoothed estimate of the expected count at phase $\theta$ as $\hat{\mu}(\theta) = \exp\bigl(\sum_{k=1}^{K}\hat{\beta}_k B_k(\theta)\bigr)$, where $\hat{\beta}_k$ is the estimated regression coefficient for the $k$th spline basis function $B_k$ ($K$ is the number of spline basis functions used). Corresponding to $\hat{\mu}$ is the estimated first circular moment,39 a complex-valued quantity given by $\hat{m}_1 = \int_0^{2\pi}\hat{\mu}(\theta)e^{i\theta}\,d\theta \big/ \int_0^{2\pi}\hat{\mu}(\theta)\,d\theta$ (note: $i = \sqrt{-1}$). The estimated circular mean phase $\hat{\theta}$ was computed from the first circular moment $\hat{m}_1$ according to $\hat{\theta} = \arg(\hat{m}_1)$, and reported in days or weeks (instead of degrees or radians). The circular variance $V$ is defined as $V = 1 - |\hat{m}_1|$, and measures the lack of seasonality. We defined the relative amplitude as simply $\bigl(\max_\theta \hat{\mu}(\theta) - \min_\theta \hat{\mu}(\theta)\bigr)/\max_\theta \hat{\mu}(\theta)$ (i.e., the ratio of the peak-to-trough distance to the peak). Residuals were examined for autocorrelation. Standard errors for the overall seasonal summary statistics (circular mean time of occurrence, circular variance, and relative amplitude) were computed using Monte Carlo simulation based on the estimated standard errors for the regression coefficients of the spline basis functions that yield the fitted seasonal curve (from which the summary statistics were derived). Standard errors for these quantities were inflated based on an autocorrelation-based effective sample size formula.40
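Continuing the hypothetical `grid` and `mu` objects from the Methods sketch above, these summary statistics can be computed numerically as follows (the week-to-angle conversion and the simple grid-based averaging are assumptions of this sketch):

```r
# Convert week-of-year to a phase angle in radians (0 corresponding to January 1),
# then compute the first circular moment of the fitted seasonal curve mu.
theta <- 2 * pi * (grid$woy - 1) / 52
m1    <- sum(mu * exp(1i * theta)) / sum(mu)

circular_mean_week <- (Arg(m1) %% (2 * pi)) / (2 * pi) * 52 + 1  # mean occurrence time, in weeks
circular_variance  <- 1 - Mod(m1)                                # 0 = all at one instant, 1 = uniform
relative_amplitude <- (max(mu) - min(mu)) / max(mu)              # peak-to-trough distance over peak
```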

Validation of Machine-Coded Classifications

Two human raters reviewed materials from the American Academy of Ophthalmology website concerning infectious and allergic conjunctivitis, followed by two training sessions using randomly chosen conjunctivitis posts. We conducted a modified Delphi session as follows, in which each rater classified each post from a common set of randomly chosen posts as allergic ($A$), infectious ($I$), or unsure/other ($U$). Raters were masked to the machine-coded classifications and to the other rater. After all ratings were completed, a moderator identified posts for which disagreement occurred. The moderator was masked to the machine-coded classifications. For each post on which the raters disagreed, the moderator elicited follow-up comments (one to two sentences) from raters in a random order, followed by an opportunity for the raters to update their classifications. A final round was conducted in the same way, after which the data set was locked. From this final human-rated dataset, the human consensus classification $c$ for two ratings $r_1$ and $r_2$ was defined by $c = A$ if $r_1 = r_2 = A$, $c = I$ if $r_1 = r_2 = I$, and $c = U$ otherwise. The sample size for tweets was fixed in advance at 128, which provides a confidence interval half-width of under 0.1 for a proportion of 0.5; half this number of forum posts were scored (since forum posts are, on average, much longer and take more time to assess).
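A quick check of the stated sample-size rationale, using the usual normal approximation (the approximation is an assumption of this sketch rather than a method stated in the paper):

```r
# 95% CI half-width for a proportion of 0.5 at n = 128 (normal approximation).
qnorm(0.975) * sqrt(0.5 * 0.5 / 128)   # ~0.087, i.e., under 0.1 as stated
```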

Social Media Geographic Location

Using Crimson Hexagon's geocoding and language algorithms,41,42 we sought to maximize USA geographic specificity in our query. Twitter results were filtered to include only tweets which contained “Locations: United States of America”. Blogs, forums, and comments results, for which reliable geocoding was not available, were filtered to include just those posts which contained “Language: English” (please see Supplementary Material for additional query details).

References

1. Brownstein JS, Freifeld CC. HealthMap: the development of automated real-time internet surveillance for epidemic intelligence. Euro Surveill. 2007;12:E071129.5.
2. Brownstein JS, Freifeld CC, Madoff LC. Digital disease detection—harnessing the web for public health surveillance. N Engl J Med. 2009;360:2153–2155.
3. Madan A, Cebrian M, Lazer D, Pentland A. Social sensing for epidemiological behavior change. Proc ACM Int Conf Ubiquitous Comput. 2010:291–300.
4. Khan K, McNabb SJN, Memish ZA, et al. Infectious disease surveillance and modelling across geographic frontiers and scientific specialties. Lancet Infect Dis. 2012;12:222–230.
5. Sadilek A, Kautz H, Bigham JP. Modeling the interplay of people's location, interactions, and social ties. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. Beijing, China: AAAI Press; 2013:3067–3071.
6. Hartley DM, Nelson NP, Arthur RR, et al. An overview of internet biosurveillance. Clin Microbiol Infect. 2013;19:1006–1013.
7. Velasco E, Agheneza T, Denecke K, Kirchner G, Eckmanns T. Social media and Internet-based data in global systems for public health surveillance: a systematic review. Milbank Q. 2014;92:7–33.
8. Nuti SV, Wayda B, Ranasinghe I, et al. The use of Google Trends in health care research: a systematic review. PLoS One. 2014;9:e109583.
9. Brownstein JS, Mandl KD. Reengineering real time outbreak detection systems for influenza epidemic monitoring. AMIA Annu Symp Proc. 2006:866.
10. Brownstein JS, Freifeld CC, Madoff LC. Influenza A (H1N1) virus, 2009—online monitoring. N Engl J Med. 2009;360:2156.
11. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature. 2009;457:1012–1014.
12. Barboza P, Vaillant L, Le Strat Y, et al. Factors influencing performance of internet-based biosurveillance systems used in epidemic intelligence for early detection of infectious diseases outbreaks. PLoS One. 2014;9:e90536.
13. Generous N, Fairchild G, Deshpande A, Del Valle SY, Priedhorsky R. Global disease monitoring and forecasting with Wikipedia. PLoS Comput Biol. 2014;10:e1003892.
14. Santillana M, Nguyen AT, Dredze M, Paul MJ, Nsoesie EO, Brownstein JS. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput Biol. 2015;11:e1004513.
15. Hoen AG, Keller M, Verma AD, Buckeridge DL, Brownstein JS. Electronic event-based surveillance for monitoring dengue, Latin America. Emerg Infect Dis. 2012;18:1147–1150.
16. HealthMap. Available at: http://www.healthmap.org/site/about. Accessed July 19, 2017.
17. Leffler CT, Davenport B, Chan D. Frequency and seasonal variation of ophthalmology-related Internet searches. Can J Ophthalmol. 2010;45:274–279.
18. Kang MG, Song WJ, Choi S, et al. Google unveils a glimpse of allergic rhinitis in the real world. Allergy. 2015;70:124–128.
19. McGregor F, Somner JEA, Bourne RR, Munn-Giddings C, Shah P, Cross V. Social media use by patients with glaucoma: what can we learn? Ophthalmic Physiol Opt. 2014;34:46–52.
20. Deiner MS, Lietman TM, McLeod SD, Chodosh J, Porco TC. Surveillance tools emerging from search engines and social media data for determining eye disease patterns. JAMA Ophthalmol. 2016;134:1024–1030.
21. Bousquet J, Caimmi DP, Bedbrook A, et al. Pilot study of mobile phone technology in allergic rhinitis in European countries: the MASK-rhinitis study. Allergy. 2017;72:857–865.
22. Benke KK. Uncertainties in big data when using Internet surveillance tools and social media for determining patterns in disease incidence. JAMA Ophthalmol. 2017;135:402.
23. Sommer A. The utility of “big data” and social media for anticipating, preventing, and treating disease. JAMA Ophthalmol. 2016;134:1030–1031.
24. Deiner MS, Lietman TM, Porco TC. Uncertainties in big data when using Internet surveillance tools and social media for determining patterns in disease incidence—reply. JAMA Ophthalmol. 2017;135:402–403.
25. Smith AF, Waycaster C. Estimate of the direct and indirect annual cost of bacterial conjunctivitis in the United States. BMC Ophthalmol. 2009;9:13.
26. Benzekri R, Belfort R Jr, Ventura CV, et al. Manifestations oculaires du virus Zika: où en sommes-nous? [Ocular manifestations of Zika virus: where do we stand?]. J Fr Ophtalmol. 2017;40:128–145.
27. Crimson Hexagon. Available at: http://www.crimsonhexagon.com. Accessed July 19, 2017.
28. Hopkins D, King G. A method of automated nonparametric content analysis for social science. Am J Pol Sci. 2010;54:229–247.
29. Firat A, Brooks M, Bingham C, Herdagdelen A, King G. Systems and methods for calculating category proportions. 2014.
30. Hilbe JM. Negative Binomial Regression. Cambridge: Cambridge University Press; 2011.
31. Wood SN. Generalized Additive Models: An Introduction with R. 2nd ed. Boca Raton, FL: CRC Press; 2017.
32. Ramirez DA, Porco TC, Lietman TM, Keenan JD. Epidemiology of conjunctivitis in US emergency departments. JAMA Ophthalmol. 2017;135:1119–1121.
33. Crimson Hexagon Topic Wheel. Available at: https://help.crimsonhexagon.com/hc/en-us/articles/203641365-Explore-Tab-Topic-Wheel-Section. Accessed July 19, 2017.
34. Sadah SA, Shahbazi M, Wiley MT, Hristidis V. Demographic-based content analysis of web-based health-related social media. J Med Internet Res. 2016;18:e148.
35. Pew Research. Social media fact sheet. Available at: http://www.pewinternet.org/fact-sheet/social-media/. Accessed July 19, 2017.
36. Sadah SA, Shahbazi M, Wiley MT, Hristidis V. A study of the demographics of web-based health-related social media users. J Med Internet Res. 2015;17:e194.
37. Pew Research. Social media update 2016. Available at: http://www.pewinternet.org/2016/11/11/social-media-update-2016/. Accessed July 19, 2017.
38. Shekhawat NS, Shtein RM, Blachley TS, Stein JD. Antibiotic prescription fills for acute conjunctivitis among enrollees in a large United States managed care network. Ophthalmology. 2017;124:1099–1107.
39. Fisher NI. Statistical Analysis of Circular Data. Cambridge: Cambridge University Press; 1993.
40. Leith CE. The standard error of time-average estimates of climatic means. J Appl Meteorol. 1973;12:1066–1069.
41. Crimson Hexagon Location Methodology. Available at: https://help.crimsonhexagon.com/hc/en-us/articles/203952525-Location-Methodology. Accessed July 19, 2017.
42. Crimson Hexagon Language Methodology. Available at: https://help.crimsonhexagon.com/hc/en-us/articles/202772699-Language-Filter-How-Does-it-Work. Accessed July 19, 2017.
