PLOS One. 2023 Feb 22;18(2):e0280423. doi: 10.1371/journal.pone.0280423

From sunrise to sunset: Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media

Alexander Dunkel 1,*, Maximilian C Hartmann 2, Eva Hauthal 1, Dirk Burghardt 1, Ross S Purves 2
Editor: Jacinto Estima3
PMCID: PMC9946259  PMID: 36812172

Abstract

Events profoundly influence human-environment interactions. Through repetition, some events manifest and amplify collective behavioral traits, which significantly affect landscapes and their use, meaning, and value. However, the majority of research on reactions to events focuses on case studies based on spatial subsets of data. This makes it difficult to put observations into context and to isolate sources of noise or bias found in data. As a result, inclusion of perceived aesthetic values, for example in cultural ecosystem services, as a means to protect and develop landscapes, remains problematic. In this work, we focus on human behavior worldwide by exploring global reactions to sunset and sunrise using two datasets collected from Instagram and Flickr. By focusing on the consistency and reproducibility of results across these datasets, our goal is to contribute to the development of more robust methods for identifying landscape preference using geo-social media data, while also exploring motivations for photographing these particular events. Based on a four-facet context model, reactions to sunset and sunrise are explored for Where, Who, What, and When. We further compare reactions across different groups, with the aim of quantifying differences in behavior and information spread. Our results suggest that a balanced assessment of landscape preference across different regions and datasets is possible, which strengthens representativity and enables exploring the How and Why in particular event contexts. The process of analysis is fully documented, allowing transparent replication and adaptation to other events or datasets.

1. Introduction

Sunset and sunrise are witnessed and appreciated daily by countless people around the world. The timing and visibility of these ubiquitous events vary according to a host of local anthropogenic and natural environmental factors, including weather, pollution, topography and the built environment. Sunsets and sunrises are valued in many cultures through ceremonies, storytelling and art, and often linked to landscape appreciation through paintings and photographs (c.f. [1]). In this paper we take advantage of social media manifestations of these phenomena to explore global patterns using millions of photographs and their descriptions.

The rationale for choosing this event is twofold. Firstly, there is a general lack of reproducibility in studies of behavior using social media, because samples, populations and the phenomena under observation change between studies [2]. This means that methods and results of case studies are difficult to generalize more broadly. For this reason, we chose an event type with strong temporal and spatial consistency. Sunset and sunrise were among the few events fulfilling these criteria. Secondly, while the use of geo-social media has rapidly increased in recent years, there is a lack of validation of transferability for methods allowing robust identification of value or worth [3]. This issue is commonly known as results reproducibility. Improving results reproducibility ideally requires an experiment with two maximally separated datasets and “finding relationships in the same direction and at similar strength” [3 p38] in both. This is difficult to implement with more ‘newsworthy’ topics. The consistent global and long-term footprint of sunset and sunrise lets us maximize sample size while simultaneously providing a basis for reproducing results using two independent, albeit not universally representative, datasets collected from Instagram and Flickr.

Importantly, the subject of this study is not simply one of abstract interest. Understanding what is collectively valued where, by whom and when is important because an equitable society should take into account diverse ways of perceiving, valuing and appreciating the world [4]. Indeed, as we grapple with the challenge of developing indicators measuring progress towards the Sustainable Development Goals, the importance of capturing this diversity becomes more apparent (un.org/sustainabledevelopment). In natural resource management, planners require information about the spatial distribution and relative importance of collective values related to the landscape, in order to plan and manage scarce resources, for example by identifying regions under pressure, designating protected areas or channeling visitor flows. A common term to refer to these values in natural resource management is ‘landscape preference’, relating to the relative importance and subjective meanings people attribute to landscapes. Landscape preference is expressed through, for example, repetition of activities, personal experiences and meanings attached to places, or the choices and identities that shape behavior [5 p454]. Ilieva & McPhearson emphasize the overarching significance of such information because “human behavior and values […] are affecting, and may even drive, the future of global sustainability” [6 p553]. Concretely, this means that to inform decision makers, it is necessary to identify which parts of the landscape are valued and how they are used.

In recent years, landscape preference researchers have turned to georeferenced social media, or more broadly User Generated Content (UGC), for example in the form of Instagram pictures or Twitter posts [7], as a new data source supplementing traditional approaches. However, these data sources are challenging to work with. They are noisy, biased, difficult to completely sample, and often shared through incompletely documented Application Programming Interfaces (APIs). Not surprisingly, Teles da Mota & Pickering [8] summarize that only five out of 48 papers they surveyed conduct analysis at a global scale.

In landscape preference, comparison with global aspects of a phenomenon is important because it allows putting observations into perspective and validating individual, local patterns. A global perspective could for example allow us to study underlying patterns and fundamental biases of the system, or visualization methods, as a whole. By focusing on one, in principle universally observable event captured in two independent datasets, we aim to:

  • Contribute a workflow template for consistent visualization of global patterns of landscape preference in popular social media platforms (Flickr and Instagram) with respect to a single phenomenon

  • Assess the robustness and applicability of privacy preserving metrics describing variation in a global event

  • Interpret the results of a global study of descriptions of sunrise and sunset in social media from the perspective of landscape preference

1.1 Related work

Natural resource management, which through planning and policy seeks to contribute to sustainable development goals, applies a broad range of methods to understand landscape preference, including self-directed participant photography, surveys or visual landscape questionnaires [9–11]. However, scaling these approaches beyond individual landscapes remains costly and complex, and is an important limiting factor in practice [12, 13]. One approach to filling this gap is the use of publicly available social media and crowdsourced geodata [14–18].

Locally, analysis of these data is not fundamentally different from contemporary approaches. Chen, Parkins & Sherren [19], for example, study landscape values through a collection of 10,000 geotagged Instagram posts to assess the impact of two proposed hydroelectric dams in Canada. Using manual filtering techniques, qualitative content analysis methods, and context, the authors describe social and cultural landscape values in the proximity of these projects. Similarly, Langemeyer, Calcagni & Baró [20] explore landscape aesthetics in a province in Spain. Their approach quantifies preference in space, using a 500m grid resolution and counting posts (photos). Making results transferable requires standardization, for example by counting unique users only once at each location [20 p544]. Tieskens, van Zanten, Schulp & Verburg [21] take this idea one step further, explicitly using user counts as a primary measure, randomly selecting one photo per user per square kilometer, to reduce the bias from more active users.

Beyond individual local case studies, a number of authors have demonstrated that new data sources are not only comparable to traditional landscape preference research, but also reliable and reproducible. Wood et al. [22] study the frequency of Flickr users per month for national parks and find a strong correlation with officially reported visitation counts. They propose “Photo User Days (PUD)” as a measure, the cumulative number of unique user visits per day. PUD is commonly used as a quantitative proxy for people’s choices, behavior and overall valuation (e.g. [17, 23, 24]). As social media and volunteered geographic information (VGI) become increasingly popular data sources, the need to understand the source and nature of differences becomes all the more important. Wartmann & Purves [16], in a study comparing free listing, unstructured text and Flickr tags, find that differing approaches overlap with respect to some elements of landscape (e.g. the biophysical) but give quite different results in other aspects (e.g. sense of place).

Obviously, further examination of these issues is urgent. Several gaps in research prove to be ongoing barriers. A primary challenge is that cross-platform comparison of patterns and integration of data is complex [25, 26]. Understandably, most of the work cited above uses a spatial filter as the main entry point of analysis, extracting data for a specific area or region. Other dimensions, such as who, what, or when [27], are either ignored or subsumed as distributional measures under the spatial filter (where), preventing assessment of fundamental biases [28 p299]. Often, filtering beyond spatial selection only seeks to exclude blunders, such as wrongly geotagged information (e.g. [20]), or to classify data into relatively broad categories such as landcover (e.g. [21]). Consequently, the collected data in these studies may cover “every aspect of the environment and all human environmental experience, recollection and imagination” [13 p270]. The result is an exponential increase of ‘incidental variables’ that must be considered, making it difficult to pinpoint the why and how of patterns [27].

In addition to these core challenges, protecting the privacy of users is increasingly relevant when working with user-generated content [29]. However, in a review of social media studies assessing nature-based tourism, Teles da Mota & Pickering found that “only 12 [out of 48 papers] referred in some way to ethical issues [such as privacy], and mostly briefly” [8 p7]. This is surprising, given that a variety of privacy-preserving methods are well known, for example K-anonymity [30], for estimating and limiting the risk of re-identification in shared datasets, or Differential Privacy [31], providing exact guarantees for privacy-preservation based on carefully calibrated levels of noise added to datasets. These methods particularly focus on privacy-preserving publishing of results. According to Malhotra, Kim & Agarwal, any “act of data collection […] is the starting point of various information privacy concerns” [32 p338]. To reduce the risk of re-identification at data collection time, a promising direction is provided by Probabilistic Data Structures (PDS) [33]. PDS follow the principle of data minimization (ibid.). One such PDS is HyperLogLog (HLL), a data abstraction format for estimating quantities within guaranteed error bounds [34]. A workflow to use HLL with geo-social media was demonstrated by Dunkel et al. [35] studying user frequency of worldwide Flickr posts and quantifying the effects on privacy.

All of the above means that studies regarding global properties of crowdsourced data are rare in landscape research [8]. Among the few exceptions, van Zanten et al. [10] quantified landscape value at a continental-scale, randomly sampling social media data from Panoramio, Flickr, and Instagram, using keywords to filter content for values such as ‘aesthetic enjoyment’ or ‘outdoor recreation’. They emphasize though that the frequency of geotagged photographs strongly correlates with overall frequentation, resulting in overestimates for urban areas and highly frequented, popular sites.

To shift the initial lens of observation from specific locations, we limit this work to a specific sort of event, sunset and sunrise, and thus as a starting point pose a “what” question. This narrow thematic filter allows us to conduct a more focused description and assessment of contextual variables. These events are entirely ephemeral, but have a profound, measurable impact on human-environment perception and interaction. Unlike many other events, the ability to perceive sunset and sunrise is narrowly bound by time, but almost entirely uncoupled from space. In regard to landscape preference, Howard [36] provides a seminal list of 11 ‘lenses’ that may explain differences in preference and perception, from universal preference factors, to nationality, culture, or social status, to more openly defined categories he terms insideness, activity, or medium. Sunset and sunrise appear to be among the few events that could be assigned to the category of ‘universal preference factors’ in Howard’s list. Although we believe care is needed in making claims of universal appreciation, especially from a western perspective [37], sunset and sunrise offer the promise of an indicator for aesthetic values worldwide.

In the context of online sharing on social media, such as Instagram or Flickr, photographs function as evidence for presence in place and time [38]. Individual photographs reflect different memorable experiences and therefore represent different preference contexts. Since mobile phone cameras have become ubiquitous, the barriers to spontaneous photography have decreased and taking a picture of a sunset or sunrise is trivial (Fig 1). Urry linked this behavior to a literary concept, the hermeneutic circle [39 p129] in the context of collectively repeated and mediated behavior. Sunset and sunrise imagery shared online is strongly linked to such behavior [40], and the focus of our analysis. We apply a privacy preserving approach which also greatly improves efficiency, as proposed by Dunkel et al. [35] in a previous scoping study.

Fig 1. Sunrise in the Swiss Alps, photographed by one of the authors.


2. Materials and methods

2.1 Workflow

Our analytical focus was on a workflow utilizing, integrating and improving existing standard methods to explore patterns at a global scale. Our rationale is that using well-known methods allows more attention to be paid to the implementation details, the combination of methods and the analytical process overall. The code used to produce figures in this work is available in Jupyter Notebooks in S1–S9 Files and our workflow is summarized in Fig 2.

Fig 2. Workflow schema for data filtering, transformation, aggregation and visualization.


2.2 Data collection

Data was collected from the public Flickr and Instagram Application Programming Interfaces (APIs) covering periods of 10 years (2007 to 2018) and 5 months (2017/08/01 to 2018/01/04). Our goal was to sample a comparable volume of data for Flickr on the one hand, while reducing incidental variables for Instagram on the other, by covering at least two seasons (Fall and Winter). The lower limit of 2007 for Flickr was chosen based on the year the tagging feature became available. The rationale is that behavior of users and what data could be uploaded is affected by the interface and feature availability (e.g. the tagging field). Therefore, by limiting collection to the time after 2007, we sampled data from a period where the current Flickr feature set was largely consistent and fully developed. Reactions to sunset and sunrise were filtered based on a list of keywords (Table 1) found in the title (Flickr only), the post body, or as explicit hashtags (Instagram) or tags (Flickr). To increase coverage of captured reactions beyond English the keyword list included translations of terms in three European languages: German, Dutch and French. By sampling three additional languages, we aimed to explore the bias which would result if only English was used. Since we also intended to interpret tags and their associations, it was important to use languages which authors of the paper understood, since automatic translations of tags cannot deal with ambiguity and metaphor. Data collection for Instagram was repeated daily using Netlytic [41]. This platform facilitated access to the Instagram API for researchers, but ceased support in December 2018. For the given period, all new posts for each hashtag (Table 1) were retrieved and merged. These hashtag-specific streams include both posts attached to places (‘geotagged’) and those not explicitly georeferenced. 
Conversely, Flickr data was retrospectively queried using the API endpoint flickr.photos.search, to search for georeferenced posts in the period from 2007 to 2018.

Table 1. List of keywords, for the four chosen languages and for sunset (top row) and sunrise (bottom row), whose whole occurrence in title, post body, or (hash-) tags defined the initial set of collected reactions.

Event   | English           | German          | Dutch                   | French
Sunset  | sunset, sunsets   | sonnenuntergang | zonsondergang           | coucherdusoleil, couchersoleil, coucher_soleil, coucher_du_soleil, coucher_de_soleil, coucherdesoleil, coucher AND soleil
Sunrise | sunrise, sunrises | sonnenaufgang   | zonsopgang, zonsopkomst | leverdusoleil, leversoleil, lever_du_soleil, lever_soleil, lever_de_soleil, leverdesoleil, lever AND soleil
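The keyword filter described above can be sketched as follows. This is an illustrative reconstruction, not the collection code used in the study: the term sets are an abbreviated subset of Table 1, and `classify_post` is a hypothetical helper name.

```python
import re

# Hypothetical subset of the Table 1 keyword lists; the full study also
# covers additional Dutch and French variants.
SUNSET_TERMS = {"sunset", "sunsets", "sonnenuntergang", "zonsondergang"}
SUNRISE_TERMS = {"sunrise", "sunrises", "sonnenaufgang", "zonsopgang"}

def classify_post(title, body, tags):
    """Return the set of events a post matches, based on whole-word
    occurrence in the title, the post body, or the (hash-) tags."""
    words = set(re.findall(r"\w+", f"{title} {body}".lower()))
    words |= {t.lower() for t in tags}
    matches = set()
    if words & SUNSET_TERMS:
        matches.add("sunset")
    if words & SUNRISE_TERMS:
        matches.add("sunrise")
    return matches
```

Posts matching both lists (e.g. a caption mentioning sunrise and sunset) fall into both sets, mirroring the overlapping shares reported in §2.3.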

This workflow for collecting data leads to a number of sampling effects. Firstly, user groups on Flickr and Instagram are not universally representative, despite having covered together about one billion users in 2017 [42]. Secondly, filters for space, time and language lead to further intra-dataset biases. These are unavoidable when working with these data sources [43]. Our workflow for examining the consistency of results across different groups (Instagram and Flickr), and in terms of separate partitions for "what" is collectively valued, "where", by "whom" and "when", directly reflects this situation. Nonetheless, our results are not fully representative of all users on Flickr and Instagram. In particular, our language filter introduces biases for specific groups using these languages, which we attempt to quantify in §3.3.

2.3 Preprocessing

Since different social media platforms have different data structures and attributes, we mapped these to a common structure for comparison [44]. We stored data using a data abstraction format called HyperLogLog (HLL) [34]. HLL estimates the number of distinct items in a set by an irreversible approximation, preventing identification of individual users from collected data and significantly improving data processing performance. At data collection time, the computation of a HLL set requires hashing ids (e.g. a user ID, a post ID), as a means to randomize character distribution [35]. The binary version of hashes is then divided into “buckets” of equal bit length. For each bucket, only the maximum number of leading zeroes is memorized, which frequently means that adding new IDs does not change the HLL signature. Based on the maximum number of zeroes observed for all buckets, it is possible to predict how many distinct items must have been added to a single HLL set, by so-called cardinality estimation.

From a practical viewpoint, the functionality and use of HLL sets is akin to conventional sets. For instance, several HLL sets can be merged (a union operation), to compute the combined count of distinct elements of both sets, without losing accuracy. We use this function to sequentially aggregate data to larger spatial levels in our analysis (§2.5). From a conceptual point of view, HLL effectively summarizes values (in contrast to raw or pseudonymized data), since multiple original IDs can randomly produce the same HLL binary signature. Importantly, we were not interested in individual users, only quantities, and HLL allowed us to reduce the data collection footprint to these quantitative measurements early in the process. Consequently, the study illustrated here can be repeated without the need to store raw data, providing both performance and privacy benefits [35]. Since these additional steps are not relevant to the analytical process and results, they are not described in further detail here. All quantities reported in this paper are estimates, with guaranteed error bounds of ±2.30% [35].
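The bucket and leading-zero principle, together with the lossless union operation, can be illustrated with a toy implementation. This is a simplified sketch: the study itself relies on an existing HLL implementation [35], and the bias corrections of the full algorithm [34] are omitted here.

```python
import hashlib

class ToyHLL:
    """Minimal HyperLogLog sketch: buckets memorize only the maximum
    leading-zero rank ever observed, from which cardinality is estimated."""

    def __init__(self, p=10):
        self.p = p                      # 2**p buckets
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Hashing randomizes the character distribution of IDs.
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
        bucket = h & (self.m - 1)       # low p bits select the bucket
        rest = h >> self.p
        rank = 1                        # position of the first 1-bit
        while rest & 1 == 0 and rank < 64:
            rest >>= 1
            rank += 1
        # Adding an ID often changes nothing: only new maxima are kept.
        self.registers[bucket] = max(self.registers[bucket], rank)

    def union(self, other):
        """Lossless merge: element-wise maximum of registers."""
        for i, r in enumerate(other.registers):
            self.registers[i] = max(self.registers[i], r)

    def count(self):
        # Raw harmonic-mean estimator (small/large-range corrections omitted).
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        return int(alpha * self.m * self.m / z)
```

The `union` method is what allows the stepwise spatial aggregation described in §2.6: grid-cell sets can be merged into country and worldwide sets without double-counting users.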

This mode of data collection resulted in two primary datasets. The first contains 3,310,400 Flickr posts for sunset-sunrise reactions between 2007 and 2018 and includes about 2.9 times more sunset (2,545,460) than sunrise (881,320) images. About 3.5% of posts (116,390) contain at least one term from both lists for sunset and sunrise (Table 1). The Instagram dataset, despite covering a much smaller temporal window of only 5 months, contains a larger number of 21,192,990 distinct sunset-sunrise reactions. About 44% (9,462,270) of these posts are geotagged with explicit references to places. With a ratio of 3.7 from sunset (17,660,470) to sunrise (4,741,050) related imagery, this dataset contains a slightly higher proportion of sunset posts. The ratio is consistent with the officially reported number of all posts ever shared on Instagram, for sunset (282,265,750 posts) and sunrise (75,482,380) [45], providing a baseline of validation for the data collection process. About 5.7% (1,208,540) of the Instagram posts contain references to both sunset and sunrise (Table 1).

Two additional datasets were retrieved, with the goal of reflecting a random collection of posts, to be used as the denominator for the chi statistics described in §2.5. For Flickr, the dataset contains all georeferenced posts (302,101,300) over the same period. For Instagram, the dataset samples a random selection of 20 million posts. Finally, we extracted the subset of references (URLs) to Flickr images with Creative Commons licenses, for qualitative exploration. This included 82,850 images for sunrise and 284,990 images for sunset.

2.4 Ethics statement

The use of the datasets was done in compliance with the Flickr, Instagram and Netlytic Terms and Conditions and PLOS ONE requirements for this type of study.

2.5 Transformation

In this study, we wished to represent locations, users, time, and semantics, each reflecting one of the four dimensions (Where, Who, When and What) (see Fig 2). The spatial dimension (where) refers to the spatial references and is represented with geographical coordinates for posts. For Instagram, coordinates are available for a known gazetteer of ‘places’ to which images can be related, while in Flickr, coordinates are directly available either as GPS coordinates or through manual geotagging. The temporal dimension (when) represents the time of reaction (taking a photograph). This is directly available on Flickr as the ‘post create date’, as inferred from the image timestamp. The ‘post publish date’ is used as a substitute for Instagram because the original date of photo taking is not captured. We use user IDs (who) not to identify individuals, but to allow user counts to be measured. We chose user count as our primary measure since this is not biased by individual prolific posters (Results for photocount and Photo User Days [22] are also available, see S1–S9 Files). Finally, the textual content associated with posts in the form of titles, post body and tags is used to allow analysis of the semantics considered worth reporting by users (what).

2.6 Aggregation

This initial data is reduced to a coarser ‘data collection granularity’ (Fig 2), which is sufficient for worldwide analysis. For coordinates, this means that we ‘snap’ points to a grid using a GeoHash precision of 5 (see [46]), corresponding to an average aggregation distance of about four kilometers. Similarly, to explore temporal distributions, dates are grouped to distinct months and years. Distinct terms are selected from the post body (the Flickr photo description and Instagram caption), the post title (Flickr only) and tags (Flickr) or hashtags (Instagram), and used to explore associated semantics (what).
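As a rough illustration of the snapping step, a GeoHash encoder can be sketched in a few lines. At precision 5, cells measure roughly 4.9 × 4.9 km, matching the aggregation distance mentioned above; this is a minimal reconstruction of the standard encoding, not the study's own code.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet

def geohash(lat, lon, precision=5):
    """Encode a coordinate pair as a GeoHash string by repeated
    bisection, interleaving longitude and latitude bits."""
    lat_r, lon_r = [-90.0, 90.0], [-180.0, 180.0]
    result, even, ch, count = [], True, 0, 0
    while len(result) < precision:
        if even:                              # longitude bit
            mid = (lon_r[0] + lon_r[1]) / 2
            if lon >= mid:
                ch, lon_r[0] = (ch << 1) | 1, mid
            else:
                ch, lon_r[1] = ch << 1, mid
        else:                                 # latitude bit
            mid = (lat_r[0] + lat_r[1]) / 2
            if lat >= mid:
                ch, lat_r[0] = (ch << 1) | 1, mid
            else:
                ch, lat_r[1] = ch << 1, mid
        even = not even
        count += 1
        if count == 5:                        # 5 bits -> one base-32 char
            result.append(BASE32[ch])
            ch, count = 0, 0
    return "".join(result)
```

All posts falling in the same precision-5 cell share the same hash and are therefore aggregated together.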

From this initial data collection, measures are stepwise aggregated (1) to a 100x100 km grid, (2) country, and (3) worldwide levels. We chose a 100 km resolution as a balance for the worldwide analysis, after testing with both 50 km and 200 km. Notebooks (S1–S9 Files) allow for exploration of results for arbitrary resolutions and extents. The anonymized data needed to run notebooks and reproduce results are shared in the data repository.

The count of unique elements (i.e. the estimated number of users) is used for visualizing relationships. We chose to use the signed chi value to capture over and under representation of sunset and sunrise, with respect to the overall use of social media, rather than visualizing absolute counts [47–49, p156]. We use a spatial formulation of signed chi values as proposed in an exploratory analysis of social media by Clarke, Wood, Dykes & Slingsby [50]:

chi = (obs × norm − exp) / √exp,  where norm = Σexp / Σobs

The resulting chi values take into account the observed values (obs), as the subset of users taking sunset or sunrise pictures in a spatial unit (e.g. a 100 km grid cell or country), and compare it with a baseline expected frequency (exp, e.g. the total number of users active in the spatial unit). Clarke et al. [50] refer to this as a “chi expectation surface” (p. 1181). A challenge for calculating chi is that it requires collecting a random sample of social media posts, as a measure for the denominator, the expected frequencies. The simplest approach to generating such a sample is to use the entirety of a social media collection, and indeed for Flickr this approach was possible. However, for Instagram this was not the case. A random selection of 20 million posts was sampled, from a query of the first 50,000 posts for each place captured in the core dataset. Since some places have more than 50,000 photos, we note that randomness is potentially skewed for Instagram towards highly populated areas or regions. Finally, chi values are only useful where expected counts are sufficiently large, and we used a critical value of chi of 3.84 (1 degree of freedom, p < 0.05) to exclude differences which were not significant. This automatically excludes places and regions where not enough photos are available, as indicated with gray color in our figures.
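A minimal sketch of this computation, assuming the normalized signed-chi form given above and suppressing values below the critical threshold; function and variable names are illustrative, not taken from the study's notebooks.

```python
import math

def signed_chi(obs, exp, norm):
    """Signed chi for one spatial unit; returns None where the value
    is not significant at p < 0.05 (chi-squared, 1 degree of freedom)."""
    if exp <= 0:
        return None
    chi = (obs * norm - exp) / math.sqrt(exp)
    return chi if chi * chi > 3.84 else None

def chi_surface(observed, expected):
    """observed/expected: dicts mapping grid-cell id -> distinct user count.
    The normalization factor scales observed counts to the expected total."""
    norm = sum(expected.values()) / sum(observed.values())
    return {cell: signed_chi(observed.get(cell, 0), exp, norm)
            for cell, exp in expected.items()}
```

Positive values mark cells where sunset/sunrise users are over-represented relative to overall platform activity; `None` corresponds to the gray cells in the figures.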

To explore semantic patterns, we used two approaches. We ranked the terms for each country using term-frequency inverse document-frequency (TF-IDF) as a function of their global frequency (inverse document frequency) [51]. We define a ‘document’ as the set of all terms used by a single user per country. TF-IDF ranks terms used by many users in a country higher than those that are globally common, and ranked lists therefore reveal terms characterizing a grid cell or a country.

TF(t, c) = (number of users using term t in country c) / (total number of terms in country c)
IDF(t) = log(total number of countries / (1 + number of countries in which term t appears))
TF-IDF(t, c) = TF(t, c) × IDF(t)
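The ranking can be sketched as follows, with hypothetical input structures (user counts per term per country); this is an illustrative reconstruction of the definition above, not the study's notebook code.

```python
import math

def tf_idf(term_users, country_totals):
    """term_users: {country: {term: number of users using that term}};
    country_totals: {country: total number of terms used in that country}.
    Returns {country: {term: TF-IDF score}} per the definition above."""
    n_countries = len(term_users)
    # Document frequency: in how many countries does each term appear?
    df = {}
    for terms in term_users.values():
        for t in terms:
            df[t] = df.get(t, 0) + 1
    scores = {}
    for country, terms in term_users.items():
        total = country_totals[country]
        scores[country] = {
            t: (users / total) * math.log(n_countries / (1 + df[t]))
            for t, users in terms.items()
        }
    return scores
```

A term used by many users in one country but few countries overall (e.g. a local landmark name) thus outranks globally ubiquitous terms such as the event keywords themselves.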

To compare semantics between countries, we calculated binary cosine similarity using vectors of all terms used to describe a country (c.f. [16]). Here, binary means that the terms used per country are considered solely based on occurrence, contrary to weighted vectors for terms calculated from the frequency of use. Binary cosine vectors are easier to interpret and allow better comparison of the similarity of vocabularies between countries. Countries reflecting a very similar use of terms tend towards a cosine similarity of 1, while those with very different terms have lower cosine similarities.

cosine similarity = Σᵢ₌₁ⁿ AᵢBᵢ / (√(Σᵢ₌₁ⁿ Aᵢ²) × √(Σᵢ₌₁ⁿ Bᵢ²))

where Ai and Bi are binary vectors indicating presence of a term for a particular country.
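For binary vectors, the formula reduces to counting shared terms, which can be sketched directly on term sets (an illustrative helper, not the study's code):

```python
import math

def binary_cosine(terms_a, terms_b):
    """Binary cosine similarity between two countries' vocabularies,
    given as sets of terms (presence only, no frequencies)."""
    if not terms_a or not terms_b:
        return 0.0
    shared = len(terms_a & terms_b)  # equals sum(Ai * Bi) for 0/1 vectors
    return shared / (math.sqrt(len(terms_a)) * math.sqrt(len(terms_b)))
```

Identical vocabularies yield 1.0, disjoint vocabularies 0.0, and partial overlap falls in between.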

3. Results

3.1 Spatial patterns

We first asked whether it is possible to synthesize results in a single map, to compare sunset and sunrise reactions. Fig 3 shows the global pattern of locations where significantly more sunset (red) or sunrise (blue) events were photographed in Flickr than would be expected given the underlying distribution of all Flickr images globally. Values are scaled to have the same range (1–1000), and the map therefore illustrates locations where, firstly, sunrise or sunset is often photographed, and secondly, sunrise or sunset events form spatial patterns. The rationale for scaling is that sunset and sunrise are differently valued overall. This makes direct comparison of chi values difficult because more pictures are available on average for sunset (§2.3). Scaling chi values to the same range allows comparison of local event importance, independent of overall valuation. Furthermore, focus is given to positive chi values, which highlights areas featuring over-representation, where either the sunset (red) or the sunrise (blue) attracts significantly more attention. An interactive version is available in S4 File. A graphic that shows under- and overrepresentation individually for sunset and sunrise is shown in S1 Fig.
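The scaling of positive chi values to a common 1–1000 range can be sketched as a simple min-max rescaling (an illustrative reconstruction; the notebooks in S1–S9 Files contain the actual implementation, which additionally classifies the scaled values with Head-Tail-Breaks for display):

```python
def rescale(values, lower=1, upper=1000):
    """Min-max rescale the positive chi values of one surface to a
    common range, so two differently-scaled surfaces become comparable."""
    positive = [v for v in values if v > 0]
    vmin, vmax = min(positive), max(positive)
    span = (vmax - vmin) or 1            # guard against a flat surface
    return [lower + (v - vmin) * (upper - lower) / span for v in positive]
```

Applied separately to the sunset and sunrise surfaces, the highest chi value of each (e.g. 426 for sunrise at Angkor Wat) maps to 1000, making local event importance comparable independent of overall valuation.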

Fig 3. Chi-value merged—Flickr user count for "sunrise" (blue) and "sunset" (red), 2007–2018.


Focus on positive chi values, normalized to 1–1000 range, Head-Tail-Breaks, 100 km grid. Most significant five grid cells highlighted for sunrise (diamond) and sunset (square).

A general trend in Fig 3 is that sunrise events are globally often associated with east coasts (e.g. US, Australia, UK), while sunset events are photographed on west coasts. Southern Europe and many islands appear to be dominated by sunset events, while patterns associated with sunrise also occur in mountainous regions (e.g. the Rockies in the US, Himalayas, and the mountains of Java, Indonesia). As well as these more spatially autocorrelated regions, many individual cells are scattered globally, though with a general tendency away from the global south. Naively, one might expect that these spatial fluctuations are best explained by the visual qualities of the sunset and sunrise. That is, locations will exist where particularly picturesque, vibrant or vivid variants of these two events are more often observed. This might, for example, be the case for regions where weather and atmospheric conditions favor stunning sunrises and sunsets. Indeed, the prevalence of coastal locations from where sunset and sunrise can be viewed on a distant horizon, and mountainous locations, speaks to such effects.

But, making sense of the patterns found is more complex. This is most evident when examining extreme values and inspecting sample images found in these cells. For Flickr and sunrise (Fig 3), the highest chi values are in grid cells related to three themes: religious landmarks, famous rock formations and eastern facing coasts. The highest chi value (1000, as scaled from 426) is found at the cell hosting Angkor Wat, a religious temple structure in Cambodia. Cells with similarly high chi values are found at the Grand and Bryce Canyons in the USA (876); the east coast of Maui, Hawaii, USA (858); Ayers Rock in Australia (847); and at the volcano Mount Batok with Gunung Penanjakan, a crater viewpoint in Indonesia (824). Interestingly, while these extreme values all represent popular areas, they do not include society’s most frequented city centers, the common focus of ecosystem services assessment for human well-being [52].

For sunset (Fig 3), peak values can be observed at the west coasts of Maui (1000, as scaled from 283) and O’ahu (717), Hawaii; at a western beach stretch of Bali in Indonesia, with Pura Tanah Lot, a waterside Hindu temple (658); and at several cells along the west coast of Florida (776, 600 and 594). Compared to sunrise, landmarks and individual locations appear to exert an equally strong influence on preference on Flickr. The city of Oía (Cyclades) in Greece shows the second highest chi value for sunset perception worldwide (931). According to Wikipedia [53], it “provides excellent views of the sunset over the caldera”. Further supporting evidence can be found, for instance, in a travel magazine [54], which offers a ranking of top spots to view the sunset, with Oía taking first place. However, the worldwide peak chi value for sunset, at the west coast of Maui, Hawaii (1000), is not included in that ranking [54], indicating an information bias in the travel magazine’s report that would otherwise be difficult to detect. Since these maps are based on user counts, they emphasize the behavior of all users and reduce possible biases from very active individuals.

The patterns for sunset and sunrise reactions on Instagram (Fig 4) largely follow the general trends observed for Flickr. Particularly extreme values for individual cells in Flickr also show high significance in the Instagram dataset. Individual differences in ranking are not surprising, given that Instagram data were collected for only five months (August to December), whereas Flickr reflects patterns across all months over a 10-year timespan. For instance, Oía in Greece is ranked highest for sunset on Instagram, but superseded by the west coast of Maui for Flickr. Maui is known as a popular northern-winter travel destination, a season only partially covered in our Instagram dataset. Other differences are more striking, such as the Burning Man festival in Nevada ranking second worldwide for sunrise reactions on Instagram: 1,295 (±30) users shared sunrise images on Instagram during the short period of the 2017 Burning Man festival, compared to only 54 (±2) Flickr users at the same location over a 10-year timespan.

Fig 4. Chi-value merged—Instagram, user count for "sunrise" (blue) and "sunset" (red), Aug-Dec 2017.

Focus on positive chi values, normalized to 1–1000 range, Head-Tail-Breaks, 100 km grid. Most significant five grid cells highlighted for sunrise (diamond) and sunset (square).

To explore the patterns at a higher aggregation level, we calculated chi values for countries (Fig 5, code in S5 File). Given the increased spatial aggregation and reduced influence of individual dominant place characteristics, the differences between Instagram and Flickr are amplified. A general tendency can be observed of Instagram dominating over Flickr for both sunrise and sunset in (e.g.) Southern Europe, Africa or Indonesia, which are known tourist destinations. A possible interpretation is that sunset and sunrise viewing for the Instagram community is, overall, more prominently favored during travel, whereas preference for sunrise and sunset photography among Flickr users is also strongly linked to exploration of everyday landscapes and local areas. When focusing on countries where Flickr and Instagram trends confirm each other, single dominant causal factors become easier to identify. For instance, a popular trend for “guided” sunrise tours [40] for tourists in Indonesia likely leads to positive chi values for both Flickr and Instagram. In contrast, the diverging chi values for sunset (positive for Instagram, negative for Flickr) suggest that viewing of this event in this country is affected by a wider variety of ‘incidental variables’, depending on (e.g.) platform incentives and the preferences of certain groups. In other words, while sunset appears to be a significant attraction factor for Instagram users visiting Indonesia, Flickr users’ photo behavior is affected by a larger variety of other aspects not captured in our analysis.
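The chi computation used above can be sketched as follows. This is an illustrative toy computation with hypothetical user counts, not the code from S5 File; it assumes the common form of the signed chi statistic, (O − E)/√E, where the expected count E for a cell (or country) derives from its share of overall activity [47].

```python
import math

def signed_chi(observed_topic, observed_total, sum_topic, sum_total):
    """Signed chi value for one cell: scaled residual (O - E) / sqrt(E),
    where E is the expected topic count given the cell's overall activity."""
    expected = sum_topic * (observed_total / sum_total)
    if expected == 0:
        return 0.0
    return (observed_topic - expected) / math.sqrt(expected)

# Hypothetical cells: (users posting "sunrise", users posting anything)
cells = {"angkor": (120, 300), "city_center": (40, 900), "coast": (15, 60)}
sum_topic = sum(o for o, _ in cells.values())
sum_total = sum(t for _, t in cells.values())

chi = {name: signed_chi(o, t, sum_topic, sum_total)
       for name, (o, t) in cells.items()}
```

Cells with positive chi are overrepresented for the event relative to their overall activity, independent of absolute visitation frequency; negative values indicate underrepresentation, as for the busy but sunrise-poor city center in this toy example.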

Fig 5. Chi expectation surface for Countries for Flickr (top, 2007–2018) and Instagram (bottom, Aug-Dec 2017) and sunrise (left) and sunset (right).

Based on user counts, over- and underrepresentation, pooled quantiles classification applied to all four maps. Countries with non-significant results are shown with hatching.

3.2 Temporal patterns

In addition to the spatial dimension, taking into account temporal and semantic patterns can provide important contextual information supporting our ability to interpret patterns and draw accurate conclusions. To illustrate temporal relationships, monthly statistics for post count are summarized in Fig 6 for Flickr (see code in S7 File). Since Instagram data were only available for five months, we do not plot a temporal series here.

Fig 6. Temporal distribution of collected data for sunset and sunrise from Flickr.

Several observations can be made. Firstly, the overall proportion of sunrise to sunset images is relatively stable over time, which we interpret as an indication of the reliability of the collected data. Secondly, an increasing trend is visible up to 2012, with a continual decrease afterwards. Overall platform popularity is one explanation [55], with a possible peak in 2012–2013. Finally, despite data having been collected worldwide, seasonal trends appear to affect the quantity of shared photos on Flickr. Frequent peaks can be observed during the northern hemisphere summer season (June, July); conversely, lows are common during winter months (December, January). These trends likely reflect biases in the underlying group of users who produced the data. Exploring the spatial results (§3.1), it becomes obvious that urban areas and tourist hot spots in Europe and North America dominate, whereas Africa, South America and large parts of Asia contribute less data. This is not only a consequence of the language filter that was used (see §2.2). Access to these platforms is limited by origin, income levels, technological knowledge and other factors, generally subsumed under the ‘digital divide’ [56]. Consequently, the northern hemisphere summer vacation period may reinforce the observed seasonal patterns in the data, even for data collected outside Europe or North America, due to the effects of tourism.

3.3 Semantic patterns

To assess the use of language amongst different users and groups, we first explored the use of terms across languages in Flickr and Instagram (Table 2). The rankings are very similar, indicating a universal use of terms to describe sunset and sunrise that is not significantly influenced by platform. Furthermore, the two top scoring terms “sunset” and “sunrise” are used in 96.7% (Instagram) and 96.9% (Flickr) of all posts in our dataset, revealing a strong preference for English as the mode of communication. Thus, had we searched using only the terms ‘sunset’ and ‘sunrise’, we would have captured almost all posts in the other three languages as well. The overlap also holds in the other direction: 67% of all posts with German, Dutch or French terms also contained at least one of the English references (sunrise, sunset, sunrises or sunsets). We attribute this result to incentives for reaching the broadest possible audience in social media [57].
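The cross-language overlap reported above amounts to a simple conditional set intersection over per-post tag sets. A minimal sketch with hypothetical tags, not our actual data pipeline (only a subset of the search terms is used for illustration):

```python
# English and (subset of) non-English search terms
ENGLISH = {"sunset", "sunrise", "sunsets", "sunrises"}
OTHER = {"sonnenuntergang", "sonnenaufgang", "zonsondergang", "zonsopgang"}

posts = [  # hypothetical tag sets, one per post
    {"sonnenuntergang", "sunset", "beach"},
    {"sonnenaufgang"},
    {"zonsondergang", "sunset"},
    {"sunset"},
]

# Of the posts carrying a non-English term, which share also carries
# at least one English term?
with_other = [p for p in posts if p & OTHER]
also_english = [p for p in with_other if p & ENGLISH]
share = len(also_english) / len(with_other)
```

In this toy example two of the three non-English posts also carry an English term, giving a share of 2/3; on the real data this conditional share is the reported 67%.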

Table 2. Search terms and comparison of ranking of post count quantities overall for Instagram and Flickr (corresponding code in S9 File).

Instagram                          Rank diff   Flickr
term                post count                 term                post count
sunset              16,992,173         0       sunset               2,431,495
sunrise              4,662,611         0       sunrise                851,468
sunsets              1,443,750         1       sonnenuntergang        112,093
sonnenuntergang        351,388        -1       sunsets                 70,104
sunrises               126,114         4       sonnenaufgang           41,513
sonnenaufgang          103,459        -1       zonsondergang           25,306
zonsondergang           27,318        -1       coucher soleil          19,603
coucherdusoleil         12,977         2       leverdesoleil           10,283
leverdesoleil           11,494        -1       sunrises                 9,627
zonsopkomst              6,769         2       coucherdusoleil          7,035
leverdusoleil            6,080         3       lever soleil             6,750
zonsopgang               3,423         1       zonsopkomst              5,163
lever soleil             3,386        -2       zonsopgang               3,527
coucher_du_soleil        2,556         2       leverdusoleil            2,049
couchersoleil            2,556         0       couchersoleil            1,110
coucher soleil             801        -9       coucher_du_soleil        1,078
leversoleil                525         0       leversoleil                186
coucher_soleil              90         0       coucher_soleil              31
lever_du_soleil             72         0       lever_du_soleil              0
lever_de_soleil             67         0       lever_de_soleil              0
lever_soleil                52         0       lever_soleil                 0

To explore context with respect to our data, we took advantage of the tags associated with pictures. By calculating TF-IDF values at the country level, we can gain a better understanding of the concepts associated with sunrise or sunset. Fig 7 illustrates TF-IDF values for Zambia and Spain. A few points are evident when we explore the term list for Zambia. First, all of the terms are in English. Second, we find a mixture of generic landscape terms (e.g. river, sky, water, clouds), suggesting elements often photographed in conjunction with sunsets in this context. The prominence of water is unsurprising, given the general preference for water as an element of landscape. The few placenames mentioned (zambezi, african, zimbabwe, victoriafalls) likely refer to the Victoria Falls on the border between Zambia and Zimbabwe, an iconic site likely photographed by tourists. Other terms (e.g. cruise) suggest links to tourist activities at this and other locations. Although all of the terms associated with Zambia are English, this need not be the case, as illustrated by the terms retrieved for Spain, where a mixture of English and Spanish is prominent, and the second highest rated term is the Spanish word for sunset (which was not a search term). It is also possible to explore similarities between countries, through the words used to describe sunset or sunrise, using cosine similarity. For Zambia, we see a strip of neighboring east and southern African countries with high cosine similarity, suggesting that the context associated with sunset is more similar in nearby locations. An interactive map allowing exploration of all countries, together with the corresponding code, is found in S6 File.
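The country-level TF-IDF and cosine similarity computations can be sketched as follows. This is an illustrative toy example with made-up term lists, not the code in S6 File; the helper names are hypothetical, and standard TF-IDF weighting (term frequency times log inverse document frequency, with countries as documents) is assumed.

```python
import math
from collections import Counter

docs = {  # hypothetical per-country tag collections for "sunset" posts
    "ZMB": ["river", "zambezi", "sky", "water", "cruise", "water"],
    "ZWE": ["zambezi", "victoriafalls", "water", "sky"],
    "ESP": ["playa", "atardecer", "beach", "sky"],
}

def tfidf(docs):
    """TF-IDF vector per country; terms shared by all countries score zero."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    return {c: {t: (f / len(terms)) * math.log(n / df[t])
                for t, f in Counter(terms).items()}
            for c, terms in docs.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = tfidf(docs)
# Countries sharing distinctive terms (zambezi, water) score higher
# than countries with disjoint vocabularies.
```

In this toy example the Zambia vector is more similar to Zimbabwe's than to Spain's, mirroring the strip of high-similarity neighboring countries visible in Fig 7.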

Fig 7. TF-IDF for top scoring sunset terms in Zambia and Spain and global country similarity ranking based on cosine similarity to Zambia.

3.4 Relationships

As observed earlier, a strong consistency of rank order exists across our datasets. In other words, grid cells ranking relatively higher in Flickr are also likely to rank higher in Instagram (Figs 3 and 4). As a final synthesis, we therefore generate four relationship plots (Fig 8, see code in S8 File), based on absolute sunrise/sunset user count ranks as a function of country and platform. These plots allow us to explore, firstly, in which countries sunrise and sunset are photographed more often by unique users; secondly, the relationship between the frequency of photographing sunrise and sunset; and thirdly, the extent to which behavior depends on the platform.

Fig 8. Relationship between sunset and sunrise (two plots on the left) and Flickr and Instagram (two plots on the right).

Based on ranked (absolute) user count, each dot represents a country (su_a3 codes).

Generally (Fig 8), we find a strong relationship between particular countries (in Europe and North America) and rank for both Flickr and Instagram. Overall, the strongest relationship between the ranks of sunset and sunrise photographs was found for Instagram (ρ = 0.97). However, both Flickr and Instagram show Spearman rank correlations greater than ρ = 0.96, pointing to a generally strong relationship between how often these events are photographed at the level of individual countries in social media. Comparing the two phenomena between the datasets (Fig 8, two plots on the right), we note that sunset has a slightly stronger correlation between Flickr and Instagram (ρ = 0.93), perhaps suggesting that this event is somewhat more universally appreciated, at least in our data sources, than sunrise. Lastly, European countries and the US/Canada are highlighted in red. These countries tend to dominate the upper ranks for user counts, underpinning earlier observations of Western culture dominating on both Instagram and Flickr.
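Spearman's ρ reported above is the Pearson correlation of rank-transformed values. A self-contained sketch with hypothetical per-country user counts (in practice scipy.stats.spearmanr yields the same result):

```python
def rank(values):
    """Ranks 1..n, with average ranks assigned to ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-country sunset user counts on the two platforms
flickr = [2400, 850, 120, 45, 900]
insta = [17000, 4600, 90, 400, 5100]
rho = spearman(flickr, insta)  # one swapped pair of ranks -> rho = 0.9
```

Because only ranks enter the statistic, the very different absolute user counts of the two platforms do not affect the comparison, which is what allows the cross-platform correlations in Fig 8.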

4. Discussion

Much research has been conducted around social media for landscape preference studies, but conceptual and practical challenges still hamper application and further development. One challenge is the variability of data and parameter choices, which complicates comparisons across studies. We explicitly limited the initial set of collected data to a narrow thematic filter—worldwide reactions to sunset and sunrise. This allows us to compare parameter effects in isolation, test the robustness of existing measures and identify opportunities for improvement.

Our results show that it is possible to disconnect the study of landscape preference from overall visitation frequencies. Such a coupling, which appears to be the case in the majority of current studies, may mislead practitioners into overemphasizing landscape preference in urban areas and highly frequented popular places, as highlighted by, for example, Teles da Mota & Pickering [8] and Ghermandi & Sinclair [25]. Instead, by using the signed chi square test, we can identify collectively important places and areas (with respect to reactions to sunset and sunrise) at a global scale, independent of overall user frequencies. Furthermore, we also identified places where high user frequencies coincide with an unusually high visual (and virtually communicated) affinity for these events, such as Angkor Wat in Cambodia, Ayers Rock in Australia, or some national parks in the US. For these places and areas, the relative quantification of landscape preference from social media can provide important clues towards their collective use, meaning, and value for social recreation and well-being. Our results are largely consistent across two independent datasets from Flickr and Instagram, indicating a strong consistency of underlying behavior patterns and preference factors for these two events.

For the application of results in decision making processes, a number of conceptual challenges remain. Firstly, chi maps show relations; understanding causal factors means zooming into the data. For example, the city of Oía attracts many photographs of sunset, while the crater vantage point Gunung Penanjakan in Indonesia (§3.1) is a popular location for photographing sunrise. How much of this popularity can be attributed to the global spread of information? Communication, tourist reports, magazine entries and the like will amplify certain behavior patterns, and it is not possible to fully distinguish between the different motivations for visiting places in hindsight. Such self-reinforcing tourist trends are commonly described through the notion of the hermeneutic circle [39 p129]. Whether it is valid to consider these trends as ‘collective values’ or as indicators of ‘landscape preference’ is an important theoretical question if we are to use social media data in research. In planning, for example, identifying areas under pressure from mass tourism [58] or so-called ‘cybercascades’ [59], warranting action, can be seen as an equally valid outcome as protecting ‘visually’ important areas or developing under-used ones.

We consider it also important to report on negative results and do so in S1–S9 Files. For instance, only user counts proved to be a reliable and robust measurement of collective user attention. User days, as popularized by Wood et al. [22], particularly when measured across several years, as for Flickr, suffer from disadvantages similar to those observed for post (or photo) counts. An example is given in S1 File, where the 100 km grid reveals an unusually high frequency of user days for the grid cell covering Berlin. Using Flickr’s online search, we manually identified a single user who shared more than 50,000 photos for sunrise, by what appeared to be a scripted upload of webcam pictures. Such biases towards individual very active users or bots have been identified by multiple authors [6, 15]. These biases cannot occur when counting unique users, where each user is only counted once (per grid cell, or country). A compromise could be to measure the average number of user days per month, or to consider new measurements such as ‘user years’, taking into account user visits only once per year per location. This may be particularly useful for focusing on repeated attachment to places over longer periods, and for specifically capturing the local population’s preferences. Notwithstanding these future opportunities, it should be emphasized that visualization of absolute frequencies, still used as a measure in many studies, amplifies inherent biases in user generated content and should be avoided.
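The distinction between post counts, user days and user counts can be illustrated with a small sketch over hypothetical posts; our actual implementation operates on abstracted HyperLogLog sets rather than raw identifiers, but the counting logic is the same.

```python
from collections import defaultdict

# Hypothetical posts: (user_id, date, grid_cell); "bot42" mimics the
# scripted webcam uploader observed for the Berlin cell.
posts = [
    ("bot42", "2018-05-01", "berlin"), ("bot42", "2018-05-01", "berlin"),
    ("bot42", "2018-05-02", "berlin"), ("bot42", "2018-05-03", "berlin"),
    ("anna", "2018-05-01", "berlin"),
    ("ben", "2018-05-01", "maui"), ("cara", "2018-05-02", "maui"),
]

post_count = defaultdict(int)
user_days = defaultdict(set)   # distinct (user, day) pairs per cell
user_count = defaultdict(set)  # distinct users per cell

for user, day, cell in posts:
    post_count[cell] += 1
    user_days[cell].add((user, day))
    user_count[cell].add(user)

# Berlin: 5 posts and 4 user days, but only 2 unique users --
# the bot inflates the first two measures, not the third.
```

This is why user counts proved the most robust measure: one hyperactive account can inflate post counts (and, over many days, user days) arbitrarily, but contributes exactly one unique user per cell.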

Despite the fact that both platforms have different user groups, Instagram users’ place-based preference for sunset and sunrise, in essence, resembles that of Flickr users. This confirms the general notion that preference in landscape perception is not random. Turning to classical landscape preference research, the environmental psychologist Stephen Kaplan argues that aesthetics is “an expression of some basic and underlying aspect of the human mind” [60 p242] and later concludes that there are some remarkable commonalities in perception between different people, “perhaps in part because of our common evolutionary heritage” [60 p242]. To capture these characteristics, data collected implicitly, in situ and without conscious reporting is particularly useful [57]. It appears possible to assign at least some of the patterns in our data to such underlying human behavioral traits, for example a collective preference for watching the sun rise and set above water. We sound a note of caution, though. Our results are not representative of all cultures, as evidenced by the dominant use of English to describe images, irrespective of search terms in other languages. The patterns we find reflect the preferences of a particular group of individuals expressed globally, rather than a universal preference.

Other patterns required us to zoom into more individual preference factors, characteristics of particular places, or peculiarities resulting from data collection through social media. The opportunities provided by these data sources are manifold. Commonalities in perception can be spatially quantified and visualized globally, with unprecedented prospects for improving models of visitation and preference. To this effect, the expected frequencies for Instagram and Flickr users per 100 km grid cell are made available in the data repository (abstracted HLL data) and can be used to transfer the methods presented in this paper to the exploration of other events and topics. On the other hand, the underlying vagueness in human conceptualization and communication of spatial knowledge makes systematic use of these data difficult [61]. Furthermore, comparable data rarely exist, which means that verification of quality is limited to internal consistency checks [62]. From a systematic modeling point of view, slicing data into four dimensions (Where, When, What, Who; e.g. [27, 63]) and comparing results across these slices greatly improved our assessment of the fitness of the collected data. Furthermore, the use of abstracted, estimated non-personal data based on HyperLogLog was a practically viable solution, supporting a shift towards privacy-preserving and ethically aware data analytics in research on human preferences [64].

Several caveats apply. A primary challenge of the chi equation is the need to calculate expected values from underlying data whose properties are largely unknown, especially given the growing volumes of these data sources and opaque platform interfaces. Furthermore, we only explored one particular set of reactions, to a narrow selection of two events, sunset and sunrise. Based on these subsets, extrapolation of results to assess overall landscape experience and preference is not possible. Similar to how landscape preference has been selectively studied using ‘landscape inventories’ [65], a possible future opportunity could be to consider ‘event inventories’ as a means to specifically capture temporal landscape preference factors from (e.g.) ephemeral events, such as changing wildlife [66], social events or recurring phenomena and specific light conditions. This will ultimately improve the information basis available for capturing how the world is perceived, valued and appreciated.

5. Conclusions

By explicitly limiting the initial set of collected data to a narrow thematic filter of millions of worldwide reactions to sunset and sunrise on social media, we explore possible directions for more robust analysis and visualization of landscape preference. Using the signed chi square test, the results primarily illustrate how this important indicator can be studied without being tied to overall user frequencies or platform, offering an opportunity for a more balanced consideration of collective values in both popular urban and less frequented rural areas at a global scale.

Our results address an increasing need for reproducible studies focusing on improving integration of existing methods and standardization, rather than developing new methods. Through the isolation of measures, such as user count, user days, or post counts, and the comparison across datasets (Instagram, Flickr) and dimensions (Who, What, When, Where), a number of pitfalls and issues with data were revealed that may easily invalidate similar studies. The illustrated process can be seen as a blueprint, offering a workflow that can be adapted and transferred to other contexts beyond reactions to sunset and sunrise. To this effect, the code for data processing and creation of figures is fully provided in several notebooks, including important considerations for data processing and for compatibility with ethical norms in human research.

More broadly, opportunities exist in widening the types of online reactions captured and explicitly considering event inventories. Here, reactions to the setting and rising of the sun captured through social media can be seen as one of many possible indicators for assessing the temporality of human behavior, ephemeral values and transient landscape meaning. Such information may help to better plan and manage a collectively beneficial and equitable development of the environment.

Supporting information

S1 File

(HTML)

S2 File

(HTML)

S3 File

(HTML)

S4 File

(HTML)

S5 File

(HTML)

S6 File

(HTML)

S7 File

(HTML)

S8 File

(HTML)

S9 File

(HTML)

S1 Fig. Chi expectation surface for Flickr and Instagram and sunrise and sunset per 100 km grid.

Based on user count, over- and underrepresentation, Natural Breaks classification.

(TIF)

Data Availability

All HyperLogLog data used to produce figures and results in this work (see code in Supporting information S1–S9) are made available in a public data repository: https://doi.org/10.25532/OPARA-200.

Funding Statement

This work was supported by the German Research Foundation as part of the priority programme ‘Volunteered Geographic Information: Interpretation, Visualisation and Social Computing’ (VGIscience, SPP 1894) and the Swiss National Science Foundation (Project No 200021E-186389). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Seel M. Eine Aesthetik der Natur. Suhrkamp; 1996. [Google Scholar]
  • 2.Gruebner O, Sykora M, Lowe SR, et al. Big data opportunities for social behavioral and mental health research. Soc Sci Med 2017; 189: 167–169. doi: 10.1016/j.socscimed.2017.07.018 [DOI] [PubMed] [Google Scholar]
  • 3.Laraway S, Snycerski S, Pradhan S, Huitema BE. An Overview of Scientific Reproducibility: Consideration of Relevant Issues for Behavior Science/Analysis. Perspect Behav Sci. 2019;42: 33–57. doi: 10.1007/s40614-019-00193-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Mannix EA, Neale MA, Northcraft GB. Equity, Equality, or Need? The Effects of Organizational Culture on the Allocation of Benefits and Burdens. Organ Behav Hum Decis Process. 1995;63: 276–286. doi: 10.1006/obhd.1995.1079 [DOI] [Google Scholar]
  • 5.Lynch JA, Gimblett RH. Perceptual values in the cultural landscape: A computer model for assessing and mapping perceived mystery in rural environments. Comput Environ Urban Syst. 1992;16: 453–471. doi: 10.1016/0198-9715(92)90005-C [DOI] [Google Scholar]
  • 6.Ilieva RT, McPhearson T. Social-media data for urban sustainability. Nat Sustain. 2018;1: 553–565. doi: 10.1038/s41893-018-0153-6 [DOI] [Google Scholar]
  • 7.See L, Mooney P, Foody G, Bastin L, Comber A, Estima J, et al. Crowdsourcing, citizen science or volunteered geographic information? The current state of crowdsourced geographic information. ISPRS Int J Geo-Information. 2016;5. doi: 10.3390/ijgi5050055 [DOI] [Google Scholar]
  • 8.Teles da Mota V, Pickering C. Using social media to assess nature-based tourism: Current research and future trends. J Outdoor Recreat Tour. 2020;30: 100295. doi: 10.1016/j.jort.2020.100295 [DOI] [Google Scholar]
  • 9.Gobster PH, Ribe RG, Palmer JF. Themes and trends in visual assessment research: Introduction to the Landscape and Urban Planning special collection on the visual assessment of landscapes. Landsc Urban Plan. 2019;191: 103635. doi: 10.1016/j.landurbplan.2019.103635 [DOI] [Google Scholar]
  • 10.van Zanten BT, van Berkel DB, Meentemeyer RK, Smith JW, Tieskens KF, Verburg PH. Continental-scale quantification of landscape values using social media data. Proc Natl Acad Sci U S A. 2016;113: 12974–12979. doi: 10.1073/pnas.1614158113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Salesses P, Schechtner K, Hidalgo CA. The Collaborative Image of The City: Mapping the Inequality of Urban Perception. PLoS One. 2013;8. doi: 10.1371/journal.pone.0068400 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Kaußen L. Landscape perception and construction in social media: An analysis of user-generated content. J Digit Landsc Archit. 2018;2018: 373–379. doi: 10.14627/537642040 [DOI] [Google Scholar]
  • 13.Daniel TC. Whither scenic beauty? Visual landscape quality assessment in the 21st century. Landsc Urban Plan. 2001;54: 267–281. doi: 10.1016/S0169-2046(01)00141-4 [DOI] [Google Scholar]
  • 14.Oteros-Rozas E, Martín-López B, Fagerholm N, Bieling C, Plieninger T. Using social media photos to explore the relation between cultural ecosystem services and landscape features across five European sites. Ecol Indic. 2016. doi: 10.1016/j.ecolind.2017.02.009 [DOI] [Google Scholar]
  • 15.Martí P, Serrano-Estrada L, Nolasco-Cirugeda A. Social Media data: Challenges, opportunities and limitations in urban studies. Comput Environ Urban Syst. 2019;74: 161–174. doi: 10.1016/j.compenvurbsys.2018.11.001 [DOI] [Google Scholar]
  • 16.Wartmann FM, Purves RS. Investigating sense of place as a cultural ecosystem service in different landscapes through the lens of language. Landsc Urban Plan. 2018;175: 169–183. doi: 10.1016/j.landurbplan.2018.03.021 [DOI] [Google Scholar]
  • 17.Ghermandi A, Camacho-Valdez V, Trejo-Espinosa H. Social media-based analysis of cultural ecosystem services and heritage tourism in a coastal region of Mexico. Tour Manag. 2020;77: 104002. doi: 10.1016/j.tourman.2019.104002 [DOI] [Google Scholar]
  • 18.Boy JD, Uitermark J. How to study the city on instagram. PLoS One. 2016;11: 1–16. doi: 10.1371/journal.pone.0158161 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Chen Y, Parkins JR, Sherren K. Using geo-tagged Instagram posts to reveal landscape values around current and proposed hydroelectric dams and their reservoirs. Landsc Urban Plan. 2017; 0–1. doi: 10.1016/j.landurbplan.2017.07.004 [DOI] [Google Scholar]
  • 20.Langemeyer J, Calcagni F, Baró F. Mapping the intangible: Using geolocated social media data to examine landscape aesthetics. Land use policy. 2018;77: 542–552. doi: 10.1016/j.landusepol.2018.05.049 [DOI] [Google Scholar]
  • 21.Tieskens KF, Van Zanten BT, Schulp CJE, Verburg PH. Aesthetic appreciation of the cultural landscape through social media: An analysis of revealed preference in the Dutch river landscape. Landsc Urban Plan. 2018;177: 128–137. doi: 10.1016/j.landurbplan.2018.05.002 [DOI] [Google Scholar]
  • 22.Wood SA, Guerry AD, Silver JM, Lacayo M. Using social media to quantify nature-based tourism and recreation. Sci Rep. 2013;3: 2976. doi: 10.1038/srep02976 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Gosal AS, Ziv G. Landscape aesthetics: Spatial modelling and mapping using social media images and machine learning. Ecol Indic. 2020;117. doi: 10.1016/j.ecolind.2020.106638 [DOI] [Google Scholar]
  • 24.Fisher DM, Wood SA, White EM, Blahna DJ, Lange S, Weinberg A, et al. Recreational use in dispersed public lands measured using social media data and on-site counts. J Environ Manage. 2018;222: 465–474. doi: 10.1016/j.jenvman.2018.05.045 [DOI] [PubMed] [Google Scholar]
  • 25.Ghermandi A, Sinclair M. Passive crowdsourcing of social media in environmental research: A systematic map. Glob Environ Chang. 2019;55: 36–47. doi: 10.1016/j.gloenvcha.2019.02.003 [DOI] [Google Scholar]
  • 26.Mashhadi A, Winder SG, Lia EH, Wood SA. No Walk in the Park: The Viability and Fairness of Social Media Analysis for Parks and Recreational Policy Making. Proc Int AAAI Conf Web Soc Media. 2021;15: 409–420. Available from: https://ojs.aaai.org/index.php/ICWSM/article/view/18071 [Google Scholar]
  • 27.Di Minin E, Tenkanen H, Toivonen T. Prospects and challenges for social media data in conservation science. Front Environ Sci. 2015;3: 1–6. doi: 10.3389/fenvs.2015.00063 [DOI] [Google Scholar]
  • 28.Toivonen T, Heikinheimo V, Fink C, Hausmann A, Hiippala T, Järv O, et al. Social media data for conservation science: A methodological overview. Biol Conserv. 2019;233: 298–315. doi: 10.1016/j.biocon.2019.01.023 [DOI] [Google Scholar]
  • 29.boyd D, Crawford K. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Inf Commun Soc. 2012;15: 662–679. doi: 10.1080/1369118X.2012.678878 [DOI] [Google Scholar]
  • 30.Samarati P, Sweeney L. Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement through Generalization and Suppression. Tech Rep SRI-CSL-98-04, Comput Sci Lab SRI Int. 1998. http://www.csl.sri.com/papers/sritr-98-04/
  • 31.Dwork C. Differential privacy: A survey of results. Lect Notes Comput Sci. 2008;4978 LNCS: 1–19. doi: 10.1007/978-3-540-79228-4_1 [DOI] [Google Scholar]
  • 32.Malhotra NK, Kim SS, Agarwal J. Internet users’ information privacy concerns (IUIPC): The construct, the scale, and a causal model. Inf Syst Res. 2004;15: 336–355. doi: 10.1287/isre.1040.0032 [DOI] [Google Scholar]
  • 33.Singh A, Garg S, Kaur R, Batra S, Kumar N, Zomaya AY. Probabilistic data structures for big data analytics: A comprehensive review. Knowledge-Based Syst. 2020;188. doi: 10.1016/j.knosys.2019.104987 [DOI] [Google Scholar]
  • 34.Flajolet P, Fusy É, Gandouet O. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. 2007; 127–146. [Google Scholar]
  • 35.Dunkel A, Löchner M, Burghardt D. Privacy-aware visualization of volunteered geographic information (VGI) to analyze spatial activity: a benchmark implementation. ISPRS Int J Geo-Information. 2020;9. doi: 10.3390/ijgi9100607
  • 36.Howard P. Perceptual lenses. The Routledge Companion to Landscape Studies. 2013. pp. 43–53. doi: 10.4324/9780203096925-10
  • 37.Fletcher MS, Hamilton R, Dressler W, Palmer L. Indigenous knowledge and the shackles of wilderness. Proc Natl Acad Sci U S A. 2021;118: 1–7. doi: 10.1073/pnas.2022218118
  • 38.Villi M. “hey, im here right now”: Camera phone photographs and mediated presence. Photographies. 2015;8: 3–22. doi: 10.1080/17540763.2014.968937
  • 39.Urry J. The Tourist Gaze (2nd edition). Sage; 2002.
  • 40.Smith SP. Tourism and symbolic power: Leveraging social media with the stance of disavowal. J Socioling. 2021; 1–18. doi: 10.1111/josl.12484
  • 41.Gruzd A. Netlytic: Software for Automated Text and Social Network Analysis. 2016. Available from https://netlytic.org
  • 42.Giordano G, Primerano I, Vitale P. A Network-Based Indicator of Travelers Performativity on Instagram. Soc Indic Res. 2020. doi: 10.1007/s11205-020-02326-7
  • 43.Lu D, Wu R, Sang J. Overlapped user-based comparative study on photo-sharing websites. Inf Sci (Ny). 2017;376: 54–70. doi: 10.1016/j.ins.2016.10.005
  • 44.Open LBSN. A common language independent, privacy-aware and cross-network social-media data scheme. 2020. https://lbsn.vgiscience.org
  • 45.Instagram. #sunset and #sunrise photographs. Accessed 03/03/2021, from https://www.instagram.com/explore/tags/sunset/ &../sunrise/
  • 46.Ruppel P, Küpper A. Geocookie: A space-efficient representation of geographic location sets. J Inf Process. 2014;22: 418–424. doi: 10.2197/ipsjjip.22.418
  • 47.Visvalingam M. The signed Chi-square measure for mapping. Cartogr J. 1978;15: 93–98. doi: 10.1179/caj.1978.15.2.93
  • 48.UK Census Research Unit. People in Britain: a census Atlas. HMSO; 1980.
  • 49.Brown P, Hirschfield A, Marsden J. Analysing Spatial Patterns of Disease: Some Issues in the Mapping of Incidence Data for Relatively Rare Conditions. 1995; 145–163. doi: 10.1007/978-0-585-31560-7_10
  • 50.Clarke K, Wood J, Dykes J, Slingsby A. Interactive Visual Exploration of a Large Spatio-Temporal Dataset: Reflections on a Geovisualization Mashup. IEEE Trans Vis Comput Graph. 2007;13: 1176–1183. doi: 10.1109/TVCG.2007.70570
  • 51.Rattenbury T, Naaman M. Methods for extracting place semantics from Flickr tags. ACM Trans Web. 2009;3: 1–30. doi: 10.1145/1462148.1462149
  • 52.Brilha J, Gray M, Pereira DI, Pereira P. Geodiversity: An integrative review as a contribution to the sustainable management of the whole of nature. Environ Sci Policy. 2018;86: 19–28. doi: 10.1016/j.envsci.2018.05.001
  • 53.Wikipedia. Oia, Greece. Accessed 06/25/2021, from: https://en.wikipedia.org/wiki/Oia,_Greece
  • 54.Accor Travel Magazine. The Top 10 Sunset Locations in the World. Accessed 21/07/2021, from: https://all.accor.com/gb/middle-east/magazine/one-hour-one-day-one-week/top-10-sunset-locations-in-the-world-0e168.shtml
  • 55.Teles da Mota V, Pickering C. Assessing the popularity of urban beaches using metadata from social media images as a rapid tool for coastal management. Ocean Coast Manag. 2021;203: 105519. doi: 10.1016/j.ocecoaman.2021.105519
  • 56.Taubenböck H, Staab J, Zhu XX, Geiß C, Dech S, Wurm M. Are the poor digitally left behind? indications of urban divides based on remote sensing and Twitter data. ISPRS Int J Geo-Information. 2018;7. doi: 10.3390/ijgi7080304
  • 57.Bubalo M, van Zanten BT, Verburg PH. Crowdsourcing geo-information on landscape perceptions and preferences: A review. Landsc Urban Plan. 2019;184: 101–111. doi: 10.1016/j.landurbplan.2019.01.001
  • 58.Øian H, Fredman P, Sandell K, Sæþórsdóttir AD, Tyrväinen L, Søndergaard Jensen F. Tourism, Nature and Sustainability. 2018. doi: 10.6027/TN2018-534
  • 59.Sunstein CR. #Republic: Divided Democracy in the Age of Social Media. 2018. doi: 10.1111/jcom.12344
  • 60.Kaplan S. Perception and landscape: conceptions and misconceptions. Environ Aesthet. 1979; 241–248. Available from: https://www.fs.fed.us/psw/publications/documents/psw_gtr035/psw_gtr035_05_s-kaplan.pdf
  • 61.Hollenstein L, Purves R. Exploring place through user-generated content: Using Flickr to describe city cores. J Spat Inf Sci. 2010;1: 21–48. doi: 10.5311/JOSIS.2010.1.3
  • 62.Senaratne H, Mobasheri A, Ali AL, Capineri C, Haklay M (Muki). A review of volunteered geographic information quality assessment methods. Int J Geogr Inf Sci. 2017;31: 139–167. doi: 10.1080/13658816.2016.1189556
  • 63.Dunkel A, Andrienko G, Andrienko N, Burghardt D, Hauthal E, Purves R. A conceptual framework for studying collective reactions to events in location-based social media. Int J Geogr Inf Sci. 2019;33: 780–804. doi: 10.1080/13658816.2018.1546390
  • 64.Huang H, Yao XA, Krisp JM, Jiang B. Analytics of location-based big data for smart cities: Opportunities, challenges, and future directions. Comput Environ Urban Syst. 2021;90: 101712. doi: 10.1016/j.compenvurbsys.2021.101712
  • 65.Dakin S. There’s more to landscape than meets the eye: Towards inclusive landscape assessment in resource and environmental management. Can Geogr. 2003;47: 185–200. doi: 10.1111/1541-0064.t01-1-00003
  • 66.Edwards T, Jones CB, Perkins SE, Corcoran P. Passive citizen science: The role of social media in wildlife observations. PLoS One. 2021;16: 1–22. doi: 10.1371/journal.pone.0255416

Decision Letter 0

Jacinto Estima

25 Aug 2022

PONE-D-22-16755
From sunrise to sunset: Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media
PLOS ONE

Dear Dr. Dunkel,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 09 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jacinto Estima

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Additional Editor Comments:

The paper investigates human reactions to the sunset and the sunrise using georeferenced photos collected from Instagram and Flickr, aiming to understand the motivations behind taking and sharing photographs of sunset and sunrise. The topic of the paper is interesting but relatively narrow and likely not particularly interesting to a broad audience.

There are some minor points to be addressed and others more substantial, mostly related to methodological decisions and strategy. Please address all the points raised by the reviewers and try to respond to them one by one. That will help improve the paper.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors use Flickr and Instagram data associated with individual users to understand the underlying motivations for taking and sharing sunset and sunrise photography. While the narrative is easily followed and there are some nice figures, the paper is generally colloquial. Overall, I think there are a few issues that need to be addressed and clarified further.

1. I recommend that the authors improve the introduction section, especially how the work contributes to the current state of the art.

2. I recommend that the authors improve the results and discussion sections. Currently, the results remain descriptive where more in-depth explanations are expected.

3. A figure showing the distribution of photos across countries would be helpful. This is needed because if certain countries contribute only a few data points, or the distribution is highly skewed, the results are not sufficient to draw conclusions. In addition, it would be better to add some explanation for the selected time period.

4. There is a lack of consideration of bias in the sample of Flickr and Instagram users and in the keywords used for collecting data. More discussion is needed with respect to data representativeness. Besides, did the authors consider excluding fake users or automated bots, or did the authors conduct an outlier analysis? If yes, please clarify.

5. Why a 100x100 km grid? Did the authors do some form of sensitivity test for different grid sizes? I'd typically do some simple sensitivity testing to check whether the results are sensitive to resolution.

6. Figures are consistently unclear, possibly due to text size. They are hard or impossible to read.

Reviewer #2: The paper explores global human reactions to the sunset and the sunrise, using two datasets of georeferenced photos collected from Instagram and Flickr, and aiming to understand the motivations behind taking and sharing sunset and sunrise photography. The authors attempted to analyze reactions across different groups, and in terms of aspects such as "what" is collectively valued "where", by "whom" and "when". To do this, the authors used relatively simple data analysis methods, specifically (a) using the HyperLogLog algorithm to count the number of distinct users sharing photos related to sunset or sunrise taken in different regions (i.e., different countries, or different cells in a global raster with a resolution of 4 km) and in different months, (b) using TF-IDF heuristics to select discriminative terms associated with different regions, and (c) using a spatial formulation of signed chi values to select regions that over- and under-represent the concepts of sunset and sunrise.

Overall, the paper is both sound and clearly written, presenting the results of what I consider to be an interesting analysis. Still, there are no methodological innovations being proposed (and I do have some questions regarding some of the methodological choices), and the topic addressed in the paper is relatively narrow and likely not particularly interesting to a broad audience.

I have some suggestions in terms of aspects that can perhaps be improved in the manuscript, which I list next.

* The paper should perhaps further justify the choice of languages, besides English, that were considered for analysis based on a translation of the terms used for data collection (e.g., German, Dutch, and French). Although I do not consider this a requirement for acceptance of the paper, perhaps other popular languages could have been considered as well, including Spanish, Mandarin, Portuguese, Arabic, or Russian.

* The authors could perhaps further justify the use of HyperLogLog, ideally also presenting a brief explanation in the paper. It is not entirely clear how/if the actual source data will be shared by the authors (the paper mentions a public repository, but at this time I could only access the pre-prepared maps and the Python notebooks; also, I am not sure if sharing the source data is allowed by Flickr/Instagram), but I would argue that conducting the analysis with privacy-preserving methods would not be the main motivation for using HyperLogLog, although perhaps computational efficiency is an important motivation.

* The explanation associated with the use of TF-IDF should be improved. The authors mention the use of "spatial TF-IDF", but it is not clear what the "spatial" adaptation actually is, nor how it differs from standard TF-IDF. From the descriptions that are provided later, I guess the authors are aggregating the textual descriptions from each photo and considering each of the spatial regions (countries or cells) as "individual documents", computing the TF and IDF components based on these aggregations (and hence they are able to use TF-IDF to get the most "discriminative" terms associated with each spatial region). However, this is not entirely clear, and should be better explained in the paper. The explanations associated with the TF-IDF equation should also be improved, given for instance that "f" is not a variable used in the equation (whereas "df" is).

* For comparing countries, the authors mention the use of "binary cosine similarity." This should also be further explained in the paper (e.g., does "binary" mean that the authors are considering vectors indicating only the presence of particular terms?) and, ideally, also further justified (why not use vectors of TF-IDF weights?).

* Many of the figures also seem to have a relatively low resolution and, for inclusion in a final manuscript, the authors should ideally provide vector versions of the images, instead of PNG files.
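The "regions as documents" reading described in the TF-IDF comment above can be illustrated with a minimal sketch. This is not the authors' actual implementation; the region names and aggregated photo descriptions below are hypothetical:

```python
import math
from collections import Counter

# Hypothetical aggregated photo descriptions, one "document" per spatial region.
regions = {
    "region_a": "sunset beach sunset palm",
    "region_b": "sunrise mountain fog",
    "region_c": "sunset city skyline",
}

def tfidf(term, region):
    """TF-IDF where each region's aggregated text is treated as one document."""
    tf = Counter(regions[region].split())[term]                 # term frequency
    df = sum(term in doc.split() for doc in regions.values())   # document frequency
    return tf * math.log(len(regions) / df) if df else 0.0

# "sunset" occurs in two of three regions, so its idf is low; a region-specific
# term such as "palm" is more discriminative per occurrence.
```

Under this reading, the most "discriminative" terms for a region are simply the highest-scoring terms of that region's aggregated document.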

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Bruno Martins

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Feb 22;18(2):e0280423. doi: 10.1371/journal.pone.0280423.r002

Author response to Decision Letter 0


12 Oct 2022

We would like to thank the editor and the reviewers for their suggestions and insightful comments. We have addressed all suggestions in the revised manuscript. In particular, we have substantially revised the introduction to make clearer our contribution, and carefully revised the paper throughout. We have also provided high resolution versions of all figures, and made these available online and interactively.

Below, we provide detailed point-by-point responses to each reviewer's comments.

> Reviewers' comments:

> Reviewer #1:

> The authors use Flickr and Instagram data associated with individual users to understand the underlying motivations for taking and sharing sunset and sunrise photography. While the narrative is easily followed and there are some nice figures, the paper is generally colloquial. Overall, I think there are a few issues that need to be addressed and clarified further.

Response: Many thanks for these comments. We are pleased that the narrative is easy to follow - we have edited the manuscript to avoid obvious colloquial use of language. However, we don’t intend to deliberately obfuscate, so we have tried to stick with clear, direct sentences without unnecessary use of complex terminology.

> \\1. I recommend that the authors improve the introduction section, especially how the work contributes to the current state-of-the-art.

Response:

We have carefully revised the introduction, making clearer our motivations for choosing sunset and sunrise (and not a more 'newsworthy' event) and how this choice is linked to the state of the art. Specifically, several characteristics of this type of event allow us to significantly reduce the number of 'incidental variables' (while maintaining sample volume) when studying the consistency and reproducibility of our method. So far, research using geo-social media in general struggles with the issue of 'results reproducibility', because samples are either too small or the data is affected by too many superimposed events included in the user-generated content. This usually makes it difficult to generalize methods, which is a central contribution of our article.

Importantly, we plan to share the "expected" frequencies for Instagram and Flickr as Supporting Information (data repository S10), meaning that these datasets can be used to verify reproducibility for other topics in other studies (albeit at a granular 100 km level suitable for global studies). Data variability also partly affects our results, given the global data collection footprint and the general noisiness of geo-social media.

However, our focus on this single event type at the macro scale allowed for both a maximization of volume, for statistical significance testing, and a minimization (as far as possible) of incidental variables. Later in the text, we describe more clearly the possible effects of the remaining variables outwith our control. We have also removed some material from the introduction that duplicated the state of the art section.

> \\2. I recommend that the authors improve the results and discussion sections. Currently, the results remain descriptive where more in-depth explanations are expected.

Response: We think this comment is to some extent a reflection on the introduction. By rewriting the introduction, as detailed above, and making edits to our results and discussion, we think the paper overall is more coherent.

> \\3. A figure showing the distribution of photos across countries would be helpful. This is needed because if certain countries contribute only a few data points, or the distribution is highly skewed, the results are not sufficient to draw conclusions.

Response: This is indeed correct: countries with few observations should be excluded from the analysis. While we did this for other figures, we omitted highlighting non-significant results in the country figure. The issue is addressed in the paper’s revised Fig. 5. While there are a number of non-significant (sub-)countries for both Flickr and Instagram, they are mostly limited to very small countries or islands, which means the effect on the map, albeit noticeable, is not big. For instance, for Instagram and sunset, the total area of countries with significant results is about 143 million km², while the non-significant area covers some 2 million km², or about 2% of the total area. Given the significance test, we believe a separate graphic showing absolute numbers is not strictly necessary.

Nonetheless, we agree that these numbers may indeed be helpful to readers interested in in-depth examination of our data processing and we updated the supplementary jupyter notebook, which now lists these and other absolute numbers for countries in Supporting Information S5 (05_countries.ipynb, see cells 401ff).

> In addition, it would be better to add some explanations for the selected time period.

Response: We agree and added our rationale for limiting the time periods during data collection, p. 9, ll. 178-184:

“Our goal was to sample a comparable volume of data for Flickr on the one hand, while reducing incidental variables for Instagram on the other, by covering at least two seasons (Fall and Winter). The lower limit of 2007 for Flickr was chosen based on the year the tagging feature became available. The rationale is that behavior of users and what data could be uploaded is affected by the interface and feature availability (e.g. the tagging field). Therefore, by limiting collection to the time after 2007, we sampled data from a period where the current Flickr feature set was largely consistent and fully developed.”

> \\4. There is lack of consideration of bias in the sample of Flickr and Instagram users and the used keywords for collecting data. More discussion is needed with respect to the data representatives.

Response: We agree that bias and representativity is a crucial topic, particularly in the context of our article. We appreciate the chance to devote more attention in §2.2 (Data Collection) to sampling effects from data collection, specifically from our language filter. The following new paragraph was added (p. 10, ll. 200-209; see also changes in ll. 188-192):

“This workflow for collecting data leads to a number of sampling effects. Firstly, user groups on Flickr and Instagram are not universally representative, despite having covered together about one billion users in 2017 [42]. Secondly, filters for space, time and language lead to further intra-dataset biases. These are unavoidable when working with these data sources [43]. Our workflow for examining the consistency of results across different groups (Instagram and Flickr), and in terms of separate partitions for "what" is collectively valued, "where", by "whom" and "when” directly reflects this situation. Nonetheless, our results are not fully representative of all users on Flickr and Instagram. Particularly our language filter introduces biases for specific groups using these languages, that we attempt to quantify in §3.3.”

While we improved the description of these biases early in the manuscript, we highlight our investigation of consistency of term use and distribution in §3.3, specifically p.21, ll. 423-430:

“[...] the two top scoring terms “sunset” and “sunrise” in English are used in 96.7% (Instagram) and 96.9% (Flickr) of all posts in our dataset, revealing a strong preference for English as mode of communication. Thus, if we had searched using only the terms ‘sunset’ and ‘sunrise’, we would have been able to capture almost all posts of the other three languages as well. The inverse was also true – of those using language specific terms not in English, 67% of all posts with German, Dutch or French terms *also* contained at least one of the English references (sunrise, sunset, or sunrises or sunsets). We attribute this result to incentives for reaching the broadest possible audience in social media [57].”

In other words, we indeed expected a much stronger effect of our language selection on results, and our work shows, at least for these European languages, that sampling only in English would retrieve a very similar dataset. We provide other researchers with the opportunity to use the intersection capabilities of HyperLogLog to examine further relationships within the data that we share in Supporting Information S10, together with the code in S9 (Jupyter Notebook: 09_statistics.ipynb).
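As a brief illustration of this intersection capability, the following is a minimal, self-contained HyperLogLog sketch. It is not the HyperLogLog implementation used in the study; the user identifiers and register size below are hypothetical, and the intersection is derived by inclusion-exclusion from union and per-set estimates:

```python
import hashlib
import math

class MiniHLL:
    """Minimal HyperLogLog: estimates distinct counts from a fixed-size sketch."""
    def __init__(self, p=12):
        self.p = p                      # 2^p registers (~1.6% relative error)
        self.m = 1 << p
        self.reg = [0] * self.m

    def add(self, item):
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
        idx = h & (self.m - 1)          # low p bits select a register
        rest = h >> self.p
        rank = 1                        # position of the first set bit in the rest
        while rank <= 160 - self.p and not rest & 1:
            rank += 1
            rest >>= 1
        self.reg[idx] = max(self.reg[idx], rank)

    def union(self, other):
        # Register-wise maximum yields the sketch of the union of both sets.
        out = MiniHLL(self.p)
        out.reg = [max(a, b) for a, b in zip(self.reg, other.reg)]
        return out

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.reg)
        zeros = self.reg.count(0)
        if est <= 2.5 * self.m and zeros:   # small-range (linear counting) correction
            est = self.m * math.log(self.m / zeros)
        return est

# Distinct users posting about each topic (hypothetical user ids).
sunset, sunrise = MiniHLL(), MiniHLL()
for u in range(10_000):
    sunset.add(f"user{u}")
for u in range(5_000, 12_000):
    sunrise.add(f"user{u}")

# Intersection by inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.
# The true overlap here is 5,000 users.
both = sunset.count() + sunrise.count() - sunset.union(sunrise).count()
```

Because only register maxima are stored, sketches for different topics, regions, or time slices can be merged and intersected without retaining individual user identifiers, which is the privacy argument made in the paper.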

Lastly, we observe a strong consistency of rank order across our datasets (p.22, ll. 450-469). For instance, the ranking of term use between Instagram and Flickr is largely consistent (Table 2), despite the fact that both networks have a quite different user makeup and language distribution [43], which suggests, from our point of view, a low to moderate data bias from language sampling.

> Besides, did the authors consider excluding fake users or automated bots, or did the authors conduct an outlier analysis? If yes, please clarify.

Response: Thank you for pointing this out; we agree that bot detection is important (though perhaps less so on Flickr). We did not conduct an outlier analysis because this would have made it necessary to process individual users in our dataset, which we felt was ethically unjustified and not compatible with the HyperLogLog approach that we used. We did describe and discuss a single observation that refers to such fake users or bots, namely "a single user who shared more than 50 thousand photos for sunrise, by what appeared to be a scripted upload of webcam pictures" (p.25, ll. 511-512). The effect of this user was only noticeable when studied using post count or user days, not user counts. The reason is that sample volume and the 100x100 km grid effectively prevent individual users (or bots) from gaining a significant impact on results. The impact of individual users is also detectable from the ratio between these quantitative measurements. For instance, for the grid cell in Berlin (Flickr, sunrise), we observe only 444 (estimated) distinct users, but 47,582 user days and a post count of 49,597, an extreme outlier given the worldwide average ratio used (and visualized) by chi. This is also why we decided to focus on user counts in our study and emphasize this in the discussion.

Lastly, a good question is whether a systematic underlying sampling effect from bots/fake users exists that applies to the whole dataset or specific parts of it. We cannot fully answer this question, since we did not study individual users. However, identifying bots is a complex, non-deterministic process (e.g. see Chu et al. 2012) and any weakness in such a process or algorithm would likely have introduced its own sampling effects. Instead, we chose to validate the impact of bots/cyborgs by comparing the consistency of results across two independent datasets (Flickr and Instagram). The consistency of results suggests that our dataset (user count) is not significantly affected by bots or fake users.

Chu, Z., Gianvecchio, S., Wang, H., & Jajodia, S. (2012). Detecting Automation of Twitter Accounts: Are You a Human, Bot, or Cyborg? IEEE Transactions on Dependable and Secure Computing, 9(6), 811–825. DOI: 10.1109/TDSC.2012.75
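To illustrate how such outliers surface through the ratio and the chi measure, the following is a minimal sketch. The Berlin cell figures are taken from the response above, while the worldwide average ratio is an assumed placeholder value, not a figure from the study:

```python
import math

def signed_chi(observed, expected):
    """Signed chi value (cf. Visvalingam 1978): direction and strength of the
    deviation of an observed count from its expected value."""
    return (observed - expected) / math.sqrt(expected)

# Berlin cell (Flickr, sunrise): figures from the response above.
users, user_days = 444, 47_582

# Assumed worldwide average of user days per distinct user (illustrative only).
GLOBAL_DAYS_PER_USER = 3.0
expected_days = users * GLOBAL_DAYS_PER_USER

chi = signed_chi(user_days, expected_days)   # strongly positive for this cell
```

A strongly positive chi for user days, paired with an unremarkable distinct-user count, is exactly the signature of the scripted upload described above.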

> \\5. Why a 100x100 km grid? Did the authors do some form of sensitivity test for different grid sizes? I'd typically do some simple sensitivity testing to check whether the results are sensitive to resolution.

Response: Yes, we did this. We tested the output for a 50x50 km grid and found the cells too small for worldwide analysis (the outputs for 50 km are included in the data repository, S10). We also tested a granularity of 200 km, which, conversely, we found not granular enough for the manner in which we intended to interpret results.

Different resolutions can be explored using the data in Supporting Information S10. The initial parameters are defined in Jupyter notebook S1 (01_grid_agg.ipynb, section "2.2 Parameters"). By adjusting the parameter “GRID_SIZE_METERS = 100000” (cell 5, S1), the subsequent code and notebooks S2-S9 will adapt accordingly and produce figures and results at a different granularity. However, possible parameter settings have a lower limit of ~50 km, which is explained by effects of the MAUP (see de Andrade et al. 2020) and by our initial data sampling at a Geohash precision of 5. It would also be relatively straightforward to adapt these notebooks to visualize higher-resolution grids, using differently captured datasets for specific regions or areas (rather than at the worldwide level). However, we did not test this.
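As a minimal illustration of how such a grid-size parameter drives aggregation (a sketch, not the code from notebook S1; `cell_id` is a hypothetical helper assuming already-projected coordinates in metres):

```python
GRID_SIZE_METERS = 100_000  # adjust to explore other granularities (>= ~50 km)

def cell_id(x_m, y_m, grid_size=GRID_SIZE_METERS):
    """Snap projected (x, y) coordinates in metres to the lower-left
    corner of the containing grid cell."""
    return (int(x_m // grid_size) * grid_size,
            int(y_m // grid_size) * grid_size)
```

All downstream aggregation (user counts, chi, term statistics) is then keyed on these cell identifiers, which is why a single parameter change propagates through the subsequent notebooks.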

de Andrade, S. C., Restrepo-Estrada, C., Nunes, L. H., Rodriguez, C. A. M., Estrella, J. C., Delbem, A. C. B., & Porto de Albuquerque, J. (2020). A multicriteria optimization framework for the definition of the spatial granularity of urban social media analytics. International Journal of Geographical Information Science, 00(00), 1–20. DOI: 10.1080/13658816.2020.1755039

> \\6. Figures are consistently unclear, e.g. due to text size. They are hard or impossible to read.

Response: Thank you for pointing this out (reviewer 2 emphasized this too). We think there was an issue with the way PLOS processed our figures (png). While our original figures were high resolution (300 dpi), the PDF from PLOS looked very pixelated. We apologize for the inconvenience. We made sure that graphics submitted as part of this revision follow the PLOS guidelines for figures and also verified this with PACE (Analysis and Conversion Engine digital diagnostic tool, https://pacev2.apexcovantage.com/). We also increased font sizes in all figures, changed font consistently to Times New Roman, and reformatted figures selectively where we felt improvements were possible. In addition, figures and graphics in the data repository (S10, at this stage available at https://anonymous-peer12345.github.io/) are now available in the following formats:

• Interactive HTML (Bokeh), for zooming into maps, with additional information on hover. We used these interactive maps for examining results for the discussion in section §3.

• SVG (Vector graphics)

• PDF (Vector), for archiving purposes

• Rasterized TIFF files at 600 dpi, for the PLOS ONE submission

• Rasterized PNG files at 300 dpi, for web view

Note: The generated "Approval PDF" from PLOS still includes figures that appear downsampled to low resolution. Since we have no influence over the PDF generation at PLOS, we would like to point out this issue. Full-resolution figures (TIFF) are available through the URLs in the PDF, or from our data repository.

> Reviewer #2:

> The paper explores human global reactions to the sunset and the sunrise, using two datasets of geo-referenced photos collected from Instagram and Flickr, and aiming to understand the motivations behind taking and sharing sunset and sunrise photography. The authors attempted to analyze reactions across different groups, and in terms of aspects such as "what" is collectively valued "where", by "whom" and "when." To do this, the authors used relatively simple data analysis methods, specifically (a) using the HyperLogLog algorithm to count the number of distinct users sharing photos related to sunset or sunrise taken at different regions (i.e., different countries, or different cells in a global raster with a resolution of 4km) and at different months, (b) using TF-IDF heuristics to select discriminative terms associated to different regions, (c) using a spatial formulation of signed chi values to select regions that over and under represent the concepts of sunset and the sunrise.

> Overall, the paper is both sound and clearly written, presenting the results of what I consider to be an interesting analysis. Still, there are no methodological innovations being proposed (and I do have some questions regarding some of the methodological choices), and the topic addressed in the paper is relatively narrow and likely not particularly interesting to a broad audience.

Response: Thank you for these very supportive suggestions and helpful comments. It was indeed not our aim to improve individual methods, but rather to combine and chain existing methods in a way that could serve as a robust "workflow template" for similar studies. We significantly revised the introduction to make our contributions to the existing literature clearer and also added a paragraph that better explains why we explored a “relatively narrow” topic (p. 3, ll. 28-41).

> I have some suggestions in terms of aspects that can perhaps be improved in the manuscript, which I list next.

> * The paper should perhaps further justify the choice of languages, besides english, that were considered for analysis with basis on a translation of the terms used for data collection (e.g., German, Dutch, and French). Although I do not consider this a requirement for acceptance of the paper, perhaps other popular languages could have been considered as well, including Spanish, Mandarin, Portuguese, Arabic, or Russian.

Response: Thank you for this question; we appreciate the opportunity to explain our choice of languages. Indeed, reviewer 1 also highlighted the importance of better explaining and reflecting on the consequences of this decision. We added a new paragraph to this effect (p. 10, ll. 200-209; see also changes in ll. 188-192). In particular, the choice of filter terms was primarily based on the set of languages reasonably well known to the authors of this paper, with the goal of avoiding sampling errors from missed but important semantics in other languages. An example is given with our search terms for French (Tab. 1), where the language evolved to offer a more nuanced and richer set of meanings, one that is not available in the other three languages. "Coucher/Lever de Soleil", for example, is commonly used to reference the exact moment the sun sets or rises. In contrast, "Coucher/Lever du Soleil" typically refers to the overall experience of these passing events, e.g. with their colorful skies and light spectacles. Without knowing these cultural specificities, it is impossible to know which sampling biases are introduced by an inadequate list of (perhaps automatically translated) search terms. Nonetheless, Spanish, Mandarin, Portuguese, Arabic, and Russian are good examples of languages we would have wished to capture as well, since these are spoken in countries where Flickr/Instagram use appears to fluctuate more widely, based on co-existence with other locally popular photo platforms. Please also note our answer to reviewer 1, where we pointed to other paragraphs that are important in the context of search terms and language selection.

> The authors can perhaps further justify the use of HyperLogLog, ideally also presenting a brief explanation on the paper. It is not entirely clear how/if the actual source data will be shared by the authors (the paper mentions a public repository, but at this time I could only access the pre-prepared maps and the Python notebooks. Also, I am not sure if sharing the source data is allowed by Flickr/Instagram), but I would argue that conducting the analysis with privacy-preserving methods would not be the main motivation in using HyperLogLog, although perhaps computational efficiency is an important motivation for this.

Response: Thank you very much for this question. We entirely understand the reviewer's uncertainty regarding the HLL format and the data that is shared. Firstly, we added a paragraph (p. 11, ll. 214-231) explaining how HyperLogLog affects the collected data. Indeed, privacy was initially not our main motivation, but rather incidental. Given the immense data volume, we were originally interested primarily in data minimization, as a means of practically performing the necessary number of iterations to tune parameters for different visualizations. However, data minimization benefits both privacy and performance (see Dunkel et al. 2020). For instance, the dataset that abstracts all 302 million IDs of geotagged posts from Flickr in our study is only 2.55 KB in size (see S9, cell 5). To our knowledge, this is the first applied study based on social media that makes use of HyperLogLog. The attached Jupyter notebooks can serve as a base for transferring tools and methods to other studies.
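For readers unfamiliar with the algorithm, the core idea can be shown with a self-contained toy implementation: distinct users are estimated from hashed values alone, so individual IDs never need to be stored. This sketch is purely illustrative and is not the HLL implementation used in our study:

```python
import hashlib
import math

def hll_estimate(items, p=14):
    """Estimate the number of distinct items with a HyperLogLog sketch.

    p: number of index bits; the sketch uses m = 2**p registers and has a
    typical relative error of about 1.04 / sqrt(m) (~0.8% for p=14).
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # 64-bit hash of the item; the item itself is then discarded
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - p)                # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining bits
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)       # bias correction for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    # small-range correction (linear counting) when many registers are empty
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw
```

Because only register maxima are kept, duplicates do not change the estimate; this is one reason user counts are robust against individual high-volume accounts.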

Regarding the actual data that is referenced as the "data repository" (S10) in our study: it is correct that this repository with the actual notebooks (ipynb) and data (HLL) is not published yet. The referenced repository at https://anonymous-peer12345.github.io/ only contains HTML exports of notebooks and results. In the short timeframe of this revision, further changes were committed as part of the review process and it was not possible to finalize the notebooks. We would like to take the time to make the comments in the notebooks consistent with the current version of the manuscript. It is also correct that we cannot share the raw data, only the abstracted HLL version. However, since we did not use raw data, only the captured HLL data, in our analysis, this is sufficient to reproduce all results and figures in our study using the notebooks (S1-S9). Furthermore, we think that the shared expected frequencies (e.g. Flickr, 300 million) can also be very useful for calculating chi in studies of other phenomena at global scales, for differently sampled data, e.g. based on a different set of search terms.

> The explanation associated to the use of TF-IDF should be improved. The authors mention the use of "spatial TF-IDF", but it is not clear what the "spatial" adaptation actually is, nor what is the difference towards standard TF-IDF. From the descriptions that are provided later, I guess the authors are aggregating the textual descriptions from each photo and considering each of the spatial regions (countries or cells) as "individual documents", computing the TF and IDF components with basis on these aggregations (and hence they are able to use TF-IDF to get the most "discriminative" terms associated to each spatial region). However, this is not entirely clear, and should be better explained in the paper. The explanations associated to the TF-IDF equation should also be improved, given for instance that "f" is not a variable used in the equation (whereas "df" is).

Response: Thanks for this comment - you are correct and we have reformulated the text to make it clearer.

“To explore semantic patterns, we used two approaches. We ranked the terms for each country using term-frequency inverse document-frequency (TF-IDF) as a function of their global frequency (inverse document frequency) [51]. We define a ‘document’ as the set of all terms used by a single user per country. TF-IDF ranks terms used by many users in a country higher than those that are globally common, and ranked lists therefore reveal terms characterizing a grid cell or a country.” (p. 15, ll. 304-309)

We also reformulated the TF-IDF equation to avoid any ambiguity (p. 15), thank you very much for these suggestions.
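The revised definition could be sketched as follows (an illustrative simplification, not the manuscript's implementation; the input structure and function name are hypothetical). Each 'document' is the set of terms used by one user in one country:

```python
import math
from collections import Counter

def tfidf_per_country(country_user_terms):
    """country_user_terms: {country: [term_set_per_user, ...]}.
    Returns {country: {term: tf-idf score}}."""
    docs = [terms for users in country_user_terms.values() for terms in users]
    n_docs = len(docs)
    df = Counter(t for terms in docs for t in terms)  # document frequency
    scores = {}
    for country, users in country_user_terms.items():
        tf = Counter(t for terms in users for t in terms)  # users using term
        scores[country] = {t: (n / len(users)) * math.log(n_docs / df[t])
                           for t, n in tf.items()}
    return scores
```

Globally ubiquitous terms (such as "sunset" itself) receive a near-zero IDF, so the top-ranked terms per country are the characteristic ones.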

> For comparing countries, the authors mention the use of "binary cosine similarity." This should also be further explained in the paper (e.g., does "binary" mean that the authors are considering vectors indicating only the presence of particular terms?) and, ideally, also further justified (why not use vectors of TF-IDF weights?).

Response: Again, thanks, you are quite right. We used the binary formulation, based on term occurrence per country (instead of frequency vectors), because the results were better suited to comparing similarities between vocabularies at the country level (as opposed to identifying prominent terms in individual countries). We did some tests using frequency vectors, but the results were noisy and still included many 'filler' words at the upper ranks, despite the TF-IDF weights. We clarified this in the manuscript (p. 15, ll. 310-316).
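For binary presence vectors, cosine similarity reduces to a simple set expression (the Ochiai coefficient); a minimal sketch, with hypothetical inputs:

```python
import math

def binary_cosine(terms_a, terms_b):
    """Cosine similarity of binary term-presence vectors:
    |A intersect B| / sqrt(|A| * |B|)."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / math.sqrt(len(terms_a) * len(terms_b))

# Two countries sharing half of a two-term vocabulary score 0.5:
binary_cosine({"sunset", "beach"}, {"sunset", "mountain"})  # 0.5
```

Term frequencies drop out entirely, which is what makes the measure insensitive to the 'filler' words that dominated the frequency-based variant.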

> Many of the figures also seem to have a relatively low resolution and, for inclusion in a final manuscript, the authors should ideally provide vector versions of the images, instead of PNG files.

Response: Thank you - there appear to have been some issues with figure rendering, and this is dealt with in our response to reviewer 1 too. For convenience we copy our response to reviewer 1 here:

Response: Thank you for pointing this out (reviewer 2 emphasized this too). We think there was an issue with the way PLOS processed our figures (png). While our original figures were high resolution (300 dpi), the PDF from PLOS looked very pixelated. We apologize for the inconvenience. We made sure that graphics submitted as part of this revision follow the PLOS guidelines for figures and also verified this with PACE (Analysis and Conversion Engine digital diagnostic tool, https://pacev2.apexcovantage.com/). We also increased font sizes in all figures, changed font consistently to Times New Roman, and reformatted figures selectively where we felt improvements were possible. In addition, figures and graphics in the data repository (S10, at this stage available at https://anonymous-peer12345.github.io/) are now available in the following formats:

• Interactive HTML (Bokeh), for zooming into maps, with additional information on hover. We used these interactive maps for examining results for the discussion in section §3.

• SVG (Vector graphics)

• PDF (Vector), for archiving purposes

• Rasterized TIFF files at 600 dpi, for the PLOS ONE submission

• Rasterized PNG files at 300 dpi, for web view

Note: The generated "Approval PDF" from PLOS still includes figures that appear downsampled to low resolution. Since we have no influence over the PDF generation at PLOS, we would like to point out this issue. Full-resolution figures (TIFF) are available through the URLs in the PDF, or from our data repository.

Attachment

Submitted filename: Rebuttal letter.docx

Decision Letter 1

Jacinto Estima

3 Jan 2023

From sunrise to sunset: Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media

PONE-D-22-16755R1

Dear Dr. Dunkel,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jacinto Estima

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Most of the comments from the previous revision were addressed. Please do take into account the low resolution of images mentioned by reviewer 2.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: I have analyzed the revised version of the manuscript, and I was also one of the reviewers for the previous version (i.e., Reviewer 2).

The revised version of the manuscript has addressed most concerns put forward in the previous round of reviews, and the authors did a fine job in terms of further motivating the study and proposed approaches. There remains the fact that the paper is addressing a rather narrow topic, with no particular technical innovations, but the motivation is now clearer and I believe the manuscript can be accepted.

The suggestions for improvement that I had pointed before, e.g. regarding the explanations associated to TF-IDF and to the computation of cosine similarity), have been taken into account and, overall, I believe that the quality of the manuscript has improved.

The figures in the manuscript that was given to me for reviewing remain with some problems in terms of resolution, although the issue is likely due to the way PLOS processed the original files provided by the authors (and hopefully the issue can be corrected in the preparation of a camera ready version).

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (HTML)

    S2 File

    (HTML)

    S3 File

    (HTML)

    S4 File

    (HTML)

    S5 File

    (HTML)

    S6 File

    (HTML)

    S7 File

    (HTML)

    S8 File

    (HTML)

    S9 File

    (HTML)

    S1 Fig. Chi expectation surface for Flickr and Instagram and sunrise and sunset per 100 km grid.

    Based on user count, over- and underrepresentation, Natural Breaks classification.

    (TIF)

    Attachment

    Submitted filename: Rebuttal letter.docx

    Data Availability Statement

    All HyperLogLog data used to produce figures and results in this work (see code in Supporting information S1–S9) is made available in a public data repository: https://doi.org/10.25532/OPARA-200.

