Abstract
Society’s increasing interactions with technology are creating extensive “digital traces” of our collective human behavior. These new data sources are fuelling the rapid development of the new field of computational social science. To investigate user attention to the Hurricane Sandy disaster in 2012, we analyze data from Flickr, a popular website for sharing personal photographs. In this case study, we find that the number of photos taken and subsequently uploaded to Flickr with titles, descriptions or tags related to Hurricane Sandy bears a striking correlation to the atmospheric pressure in the US state of New Jersey during this period. Appropriately leveraged, such information could be valuable to policy makers and others charged with emergency crisis management.
Steadily increasing quantities of data are being generated through society’s interactions with technology, automatically documenting human actions in a previously unimaginable fashion [1–10]. Analysis of such “big data” is opening up new windows for a more precise quantification of real-world social phenomena. A particularly fruitful area of research has focused on the analysis of Internet users’ search queries, as logged by search engines such as Google. Strong links have been found between changes in the information users are seeking online and events in the real world, ranging from reports of flu infections across the USA [11] to trading volume in the US stock markets [12]. A recent study has also shown that Internet users from countries with a higher per capita GDP are significantly more likely to search for information about years in the future than about years in the past [13]. Preis, Moat and Stanley found that changes in the number of Google searches for financially related terms may have contained early warning signs of stock market moves [14]. Moat et al. presented evidence that increases in the number of views of financially related Wikipedia pages could be detected before stock market falls [15].
Collective human attention to topics can be measured through various indicators of online information flow. While search volume provides insight into the information that people are seeking, other data sources can provide insight into the information that people are distributing. This information can take various forms, from text to multimedia such as photos and videos.
In this case study, we analyze the usage of a prominent photo sharing website, Flickr [16–20]. We investigate whether there is an identifiable relationship between catastrophic events, such as natural disasters, and users’ photo sharing activity on Flickr, in order to gain insight into the dynamics of human attention to such events.
The subject of our case study is Hurricane Sandy, a hurricane that devastated portions of the Caribbean and the Mid-Atlantic and Northeastern United States in late October 2012. Sandy, the eighteenth named storm and tenth hurricane of the 2012 Atlantic hurricane season, made landfall near Atlantic City, New Jersey, at 00:00 Coordinated Universal Time (UTC) on 30 October 2012 [21].
Results
We examine photos uploaded to Flickr that are labeled with the terms “Hurricane”, “Sandy” or “Hurricane Sandy” in their tags, title or description text, and analyze the times at which these photos were taken by users around the world. We normalize the hourly count of photos labeled with these hurricane-related terms by the hourly count of all photos taken. To eliminate daily periodicity in the Flickr data, both the counts of photos labeled with these terms and the counts of all photos taken are replaced, at each hour t, by their average over a surrounding moving window of Δt hours, (t − Δt/2, t + Δt/2]. Visualization of the data reveals that the normalized number of photos taken increased continuously while Sandy was moving toward the coast of the United States (Figure 1A).
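A minimal Python sketch of the smoothing and normalization described above follows. It assumes that the counts are already binned into consecutive hours with no gaps and that Δt is an even number of hours. It also reads the text as smoothing both series before taking their ratio, so the normalized value is the smoothed labeled count divided by the smoothed total count. The function names are hypothetical and are not taken from the original analysis. With integer hours, the half-open window (t − Δt/2, t + Δt/2] covers hours t − Δt/2 + 1 through t + Δt/2, which is exactly Δt hours; hours whose window extends beyond the data are left undefined (None).

```python
from typing import List, Optional, Sequence


def centered_moving_average(counts: Sequence[float], dt: int) -> List[Optional[float]]:
    """Average counts over the window (t - dt/2, t + dt/2] for each hour t.

    Assumes one value per consecutive hour and an even window length dt.
    Hours whose window extends past either end of the series get None.
    """
    if dt <= 0 or dt % 2:
        raise ValueError("dt must be a positive even number of hours")
    half = dt // 2
    n = len(counts)
    smoothed: List[Optional[float]] = []
    for t in range(n):
        first = t - half + 1  # t - dt/2 itself is excluded (open bound)
        last = t + half       # t + dt/2 is included (closed bound)
        if first < 0 or last >= n:
            smoothed.append(None)
        else:
            smoothed.append(sum(counts[first:last + 1]) / dt)
    return smoothed


def normalized_series(term_counts: Sequence[float], all_counts: Sequence[float],
                      dt: int) -> List[Optional[float]]:
    """Smoothed hurricane-term counts divided by smoothed counts of all photos."""
    terms = centered_moving_average(term_counts, dt)
    totals = centered_moving_average(all_counts, dt)
    return [None if a is None or b is None or b == 0 else a / b
            for a, b in zip(terms, totals)]
```

Because a 24-hour window always spans exactly one full day, any purely daily cycle in the counts averages out to a constant. A 12-hour window only partly suppresses such a cycle.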
We compare the normalized number of Hurricane Sandy related Flickr photos taken to a direct measure of the environment during the development of Hurricane Sandy: the atmospheric pressure in the US state of New Jersey between 20 October 2012 and 20 November 2012 (Figure 1B). The pressure series is the average of measurements from 62 stations in New Jersey that form part of the Automated Surface Observing System (ASOS), and is analyzed at hourly granularity.
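The station averaging can be sketched in the same way. This is an illustration under stated assumptions, not the authors’ processing: readings are taken to be (station, hour, pressure) tuples with hours already indexed, and each station’s readings within an hour are averaged first, so that a station filing extra reports in a given hour does not receive extra weight. Stations with no reading in a given hour are simply left out of that hour’s mean. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def hourly_station_mean(readings: Iterable[Tuple[str, int, float]]) -> Dict[int, float]:
    """Mean pressure across reporting stations for each hour index.

    readings: (station_id, hour_index, pressure) tuples in any order.
    Each station's readings within an hour are averaged first, then the
    per-station values are averaged across stations.
    """
    per_station: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for station, hour, pressure in readings:
        per_station[(hour, station)].append(pressure)

    per_hour: Dict[int, List[float]] = defaultdict(list)
    for (hour, _station), values in per_station.items():
        per_hour[hour].append(mean(values))

    return {hour: mean(values) for hour, values in sorted(per_hour.items())}
```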
We find a striking correlation between the moving average of the normalized number of Hurricane Sandy related Flickr photos taken and the atmospheric pressure in New Jersey for Δt = 24 hours (Kendall’s tau = −0.37, z = −15.14, p < 0.001). The negative sign means that hours with lower atmospheric pressure tended to coincide with higher normalized photo counts. Notably, the time of landfall of Hurricane Sandy marks not only the time of lowest air pressure but also the time at which the largest number of Flickr photos labeled with terms related to Hurricane Sandy were taken. We find qualitatively similar results for a moving average window of Δt = 12 hours (Kendall’s tau = −0.36, z = −14.62, p < 0.001). Restricting the analysis to the interval from 48 hours before to 48 hours after the landfall of Hurricane Sandy also yields qualitatively similar results for Δt = 24 hours (Kendall’s tau = −0.83, z = −12.02, p < 0.001) and Δt = 12 hours (Kendall’s tau = −0.73, z = −10.63, p < 0.001).
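The reported z values are consistent with the standard large-sample normal approximation for Kendall’s tau, under which tau has variance 2(2n + 5)/(9n(n − 1)) for n paired observations of independent variables. The Python sketch below computes tau-a (no tie correction) by direct pair counting, together with that z statistic and a two-sided p-value. The text does not state whether the authors used this form, a tie-corrected variant, or a library routine, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def kendall_tau_z(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Kendall's tau-a, its normal-approximation z score, and a two-sided p-value.

    Pairs tied in x or y count as neither concordant nor discordant, and no
    tie correction is applied to the variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    s = 0  # number of concordant pairs minus number of discordant pairs
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                s += 1
            elif product < 0:
                s -= 1
    tau = s / (n * (n - 1) / 2)
    var_tau = 2 * (2 * n + 5) / (9 * n * (n - 1))  # variance of tau under independence
    z = tau / math.sqrt(var_tau)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of N(0, 1)
    return tau, z, p
```

As a check on the arithmetic, assume n = 744 hourly values (31 days of 24 hours; this is an assumption about how the period was counted). Then tau = −0.37 gives z = −0.37 / √(2 · 1493 / (9 · 744 · 743)) ≈ −15.1, in line with the reported z = −15.14, given that tau is rounded to two decimals.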
Discussion
In summary, the number of photos taken and subsequently uploaded to Flickr with labels related to Hurricane Sandy bears a striking correlation to the atmospheric pressure in the US state of New Jersey in the period from 20 October 2012 to 20 November 2012. We propose two possible interpretations of this result. First, users may have taken more photos as the severity of the problem increased, in this case as atmospheric pressure dropped and wind speed therefore increased. This would suggest that, where no external sensors are available, the number of Flickr photos relating to a topic could be used to gauge the current severity of problems of this kind. Second, users may have been well informed about the expected time of landfall through extensive media coverage, so that their attention to the problem, and with it the number of photos taken, increased as the anticipated climax of the disaster approached. This interpretation would equally suggest that increases in the number of Flickr photos with particular labels could reveal notable increases in attention to an issue, making it possible to identify issues that have received less extensive media coverage but may merit further investigation. Future research on other catastrophic events is needed to establish whether the results we find are universal. Such research should also take into account the number of active Flickr users in a country relative to the country’s population.
We suggest that Flickr can be considered as a large-scale system of real-time sensors documenting collective human attention. However, the analysis of other catastrophic events, beyond this case study of Hurricane Sandy, is needed to evaluate whether appropriately leveraging such a system could be of interest to policy makers and others charged with emergency crisis management.
Methods
We retrieved data on photos uploaded to Flickr by accessing the Flickr API (http://www.flickr.com/services/api/flickr.photos.search.html) on 3 December 2012. The search method used, flickr.photos.search, returns a list of photos matching the given criteria. On 28 December 2012, we retrieved data on atmospheric pressure from 62 weather stations in New Jersey that form part of the Automated Surface Observing System (http://www.ncdc.noaa.gov/land-based-station-data/automated-surface-observing-system-asos).
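To make the retrieval step concrete, the sketch below builds a request to Flickr’s REST endpoint for flickr.photos.search and bins the returned taken-dates by hour. The search text, date bounds and page size are illustrative assumptions, not the authors’ actual query; the Methods section does not say how the terms were combined or how results were paged. The sketch relies on the API’s text, min_taken_date, max_taken_date and extras parameters, and on extras=date_taken adding a datetaken field to each returned photo. It only constructs the URL and does not perform the network request. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable
from urllib.parse import urlencode

REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, text: str, min_taken: str, max_taken: str,
                     page: int = 1) -> str:
    """URL for one page of flickr.photos.search results, returned as JSON."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": text,                 # free text matched against title, description and tags
        "min_taken_date": min_taken,  # e.g. "2012-10-20 00:00:00"
        "max_taken_date": max_taken,
        "extras": "date_taken",       # asks for a "datetaken" field on each photo
        "per_page": 500,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,          # plain JSON rather than a JSONP wrapper
    }
    return REST_ENDPOINT + "?" + urlencode(params)


def hourly_counts(photos: Iterable[Dict[str, str]]) -> Counter:
    """Count photos per hour of their "datetaken" value ("YYYY-MM-DD HH:MM:SS")."""
    counts: Counter = Counter()
    for photo in photos:
        taken = datetime.strptime(photo["datetaken"], "%Y-%m-%d %H:%M:%S")
        counts[taken.replace(minute=0, second=0)] += 1
    return counts
```

Binning on the taken date rather than the upload date matches the analysis described in Results, which uses the times at which photos were taken.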
Author Contributions
T.P., H.S.M., S.R.B., P.T. and H.E.S. performed analyses, discussed the results, and contributed to the text of the manuscript.
Acknowledgments
T.P., H.S.M., S.R.B. and P.T. acknowledge the support of Research Councils UK via Grant EP/K039830/1. H.E.S. thanks NSF Grant CMMI 1125290. T.P., H.S.M. and H.E.S. were also supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC00285. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.
References
- Johnson N. et al. Pattern in Escalations in Insurgent and Terrorist Activity. Science 333, 81–84 (2011).
- Lazer D. et al. Computational Social Science. Science 323, 721–723 (2009).
- Conte R. et al. Manifesto of computational social science. Eur. Phys. J. Spec. Top. 214, 325–346 (2012).
- Moat H. S. et al. Using big data to predict collective behavior in the real world. Behav. Brain Sci. (in press).
- Batty M. The Size, Scale, and Shape of Cities. Science 319, 769–771 (2008).
- Preis T., Schneider J. J. & Stanley H. E. Switching processes in financial markets. PNAS 108, 7674–7678 (2011).
- Vespignani A. Predicting the Behavior of Techno-Social Systems. Science 325, 425–428 (2009).
- Stanley H. E. et al. Correlated randomness and switching phenomena. Physica A 389, 2880–2893 (2010).
- Preis T., Kenett D. Y., Stanley H. E., Helbing D. & Ben-Jacob E. Quantifying the behavior of stock correlations under market stress. Sci. Rep. 2, 752 (2012); doi:10.1038/srep00752.
- Feng L. et al. Linking agent-based models and stochastic models of financial markets. PNAS 109, 8388–8393 (2012).
- Ginsberg J. et al. Detecting influenza epidemics using search engine query data. Nature 457, 1012–1014 (2009).
- Preis T., Reith D. & Stanley H. E. Complex dynamics of our economic life on different scales: insights from search engine query data. Phil. Trans. R. Soc. A 368, 5707–5719 (2010).
- Preis T., Moat H. S., Stanley H. E. & Bishop S. R. Quantifying the Advantage of Looking Forward. Sci. Rep. 2, 350 (2012).
- Preis T., Moat H. S. & Stanley H. E. Quantifying trading behavior in financial markets using Google Trends. Sci. Rep. 3, 1684 (2013).
- Moat H. S. et al. Quantifying Wikipedia usage patterns before stock market moves. Sci. Rep. 3, 1801 (2013).
- Goodchild M. F. & Glennon J. A. Crowdsourcing geographic information for disaster response: a research frontier. Int. J. Digital Earth 3, 231–241 (2010).
- Angus E., Thelwall M. & Stuart D. General patterns of tag usage among university groups in Flickr. Online Inform. Rev. 32, 89–101 (2008).
- Rafferty P. & Hidderley R. Flickr and Democratic Indexing: dialogic approaches to indexing. Aslib Proc. 59, 397–410 (2007).
- Cox A. M. Flickr: a case study of Web2.0. Aslib Proc. 60, 493–516 (2008).
- Singla A. & Weber I. Camera Brand Congruence and Camera Model Propagation in the Flickr Social Graph. ACM Trans. Web 5, 20 (2011).
- Brown D. Post-Tropical Cyclone Sandy Discussion Number 31 (http://www.nhc.noaa.gov/archive/2012/al18/al182012.discus.031.shtml). National Hurricane Center. Retrieved 28 December 2012.