PLOS ONE. 2020 Mar 23;15(3):e0230264. doi: 10.1371/journal.pone.0230264

Migrant mobility flows characterized with digital data

Mattia Mazzoli 1,*, Boris Diechtiareff 2, Antònia Tugores 1, Willian Wives 2, Natalia Adler 3, Pere Colet 1, José J Ramasco 1,*
Editor: Jordi Paniagua 4
PMCID: PMC7089540  PMID: 32203523

Abstract

Monitoring migration flows is crucial to respond to humanitarian crises and to design efficient policies. This information usually comes from surveys and border controls, but limited timeliness and methodological concerns reduce its usefulness. Here, we propose a method to detect migration flows worldwide using geolocated Twitter data. We focus on the migration crisis in Venezuela and show that the calculated flows are consistent with official statistics at country level. Our method is versatile and far-reaching, as it can be used to study different features of migration such as preferred routes, settlement areas, mobility through several countries, spatial integration in cities, etc. It provides finer geographical and temporal resolutions, allowing the exploration of issues not contemplated in official records. It is our hope that these new sources of information can complement official ones, helping authorities and humanitarian organizations to better assess when and where to intervene on the ground.

Introduction

Migration is a ubiquitous phenomenon in human history. People move to improve living conditions or, simply, to escape social distress [1] and natural disasters [2]. The collection of data on migration flows dates back at least to 1871, when the United Kingdom registered the difference in the number of inhabitants over a period of a decade [3]. The data showed that the population changes could not be explained by the number of births and deaths alone, hence another factor had to be involved in the process: migration. Official data on migration flows relies on the comparison of heterogeneous records across countries, which usually have different time scales and time coverage [4]. Data sources include national censuses, which are typically released every 10 years, specific surveys, border control and residence permit requests. Surveys and census records are helpful tools for estimating migratory statistics, but the information is not provided at a multinational scale, they are costly and, consequently, the update times tend to be long. Looking at a single country, the net population variation, besides changes due to births and deaths, can be calculated as the difference between the number of immigrants and emigrants. Migrants' mobility, however, can involve several countries, biasing the single-country statistics. There can be multiple entries into a country by the same individual, or several countries along the trajectory of migrants. This adds inconsistencies to the information from the local perspective, e.g., recurring and returning migrants can affect the number of border crossing events. To capture the complexity of the migratory phenomenon, the concept of interregional flows must be introduced [5]. In this context, another potential data source could be air traffic records, which make it possible to estimate human mobility worldwide [6]. However, they only capture one transportation mode and long or medium distance movements, and there is a bias toward wealthier individuals.

Given this situation, there have been recent calls for new sources providing reliable data on the mobility of migrants and, especially, refugees [7–10]. Data associated with the use of information and communication technologies (ICT) has experienced a large growth in the last decade. ICT data contains temporal and spatial records and it is a useful, fast and inexpensive information source to characterize human mobility at different scales [11]. For example, mobile phone records have been employed to study human mobility with promising results at urban [12–15] and inter-urban level [16–18]. They have also been used to analyze the distribution and integration of migrant communities in cities [19, 20]. Yet the phone sector is too fragmented to cover a global scale since the data is usually restricted to a single country, making cross-border movements hard to observe. Other examples with a similar, yet even more reduced, geographical scope are the GPS tracks left by cars [21].

To expand the focus beyond national borders, we need data coming from services that are genuinely transnational. The widespread adoption of online social networks has finally introduced such a global dimension. For instance, the advertising tool of Facebook has been used as a source of migration data, although the internal user classification of the platform is not very transparent [22]. In a more open context, data from Twitter has been used to uncover worldwide patterns of human mobility [23–25], infer international migration flows [26–28], estimate cross-border movements [29] and study immigrant integration [30, 31]. Twitter data is also available in countries where census data is inaccessible, outdated or only attainable in the local language [32]. Furthermore, Twitter data has been shown to bear mobility information compatible with that provided by cell phone records in cities [33]. On the other hand, the data is sparse (i.e., users are not continuously active) and there are biases towards younger and richer individuals [33–36]. This implies that, when taking into account smaller scales, a thorough validation exercise is necessary. Although geolocated Twitter data is sparser than census, surveys and mobile phone records, the observed level of correlation allows for the interchangeability of these sources to study population density and mobility [33, 37, 38].

In this work, we introduce a method to uncover migration flows using Twitter data. As a proof of concept, we focus on the current Venezuelan migration crisis. At country level, the flows obtained from Twitter have been validated against official estimations released by the International Organization for Migration (IOM) in September 2018, the Federal Police (Policia Federal) of Brazil and by the United Nations High Commissioner for Refugees (UNHCR or ACNUR) in November 2018 and January 2019. Furthermore, our method provides information at finer geographical and temporal resolutions. It also allows the exploration of other issues not contemplated in official records, such as mobility across multiple countries, routes taken by the migrants, as well as the places where they settle down.

Materials and methods

Classical data sources

Traditionally, official statistics come from census surveys, household or labor surveys, population registers, administrative records and border control. For the sake of completeness, we briefly summarize the characteristics of these sources and offer references for further information. Starting with surveys, census surveys focus on migrant stocks or migrant flows, including socio-economic features like the country of birth, citizenship, age, sex, education, occupation and time of arrival of migrants in the new country [39, 40]. Spatial information in census surveys is typically given at regional scale, but modern censuses provide information even at finer scales. The typical limitations of censuses on migration data are their cost, the slow updating (usually every 5 or 10 years), which decreases the validity of the information since it gets outdated fast, and the fact that they do not capture illegal immigration. Secondly, household surveys, apart from measuring migrant stocks or migrant flows, focus on the drivers and the impact of migration, internal displacement, emigration and immigration of a given country. A third source comes from labor force surveys, which produce statistics on migrant stocks in the labor market of the country. These data allow for country by country comparability and they offer detailed data on small population groups. However, many countries do not include crucial migration questions in their census surveys, such as citizenship, country of birth and year of immigration of the person. Surveys may also be infrequent or costly, and can present issues with sample size and coverage.

Population registers are held by local authorities and, like the surveys, provide information on the people living in the area. However, there are differences in how the registers are designed, how they define residence, and in the population coverage, which sometimes excludes foreigners: e.g., citizens of some countries are not required to register inside EU states, and sometimes departing migrants are not required to unregister. As a result, they lack comparability from country to country. Other important and classical sources of migrant flows and stocks, gathered in a different way, come from administrative records, which can include records of citizens changing their usual residence, tourist visas, work visas, study permits, residence permits and work permits. Even if these records are continuous and comprehensive, they lack compatible definitions among different countries, and they often lack coverage and availability in some countries. Further issues may come from the fact that this data often does not cover naturalized residents or illegal residents who overstay their visas. Data may not represent the total number of immigrants in the country, e.g., if the visa granted to the head of the family covers his or her dependents and, as a further example, authorities may not track renewals or changes of citizenship. Lastly, border post records of people leaving or entering keep track of flows between countries. These records often do not discern between migration flows and other kinds of mobility, such as tourism. Moreover, in places where the possibility of evading the border control is high, the records become highly unreliable. A review of these classical methods can be found in [41, 42]. In particular, here we employ official data from border control, permit statistics and registers gathered by the UNHCR [43, 44], the IOM [45] and the Federal Police of Brazil [46, 47] to validate the information obtained from Twitter.

Twitter data for mobility information

Classical data sources have major caveats in terms of comparability, coverage and immediacy of the furnished information [7–10, 48]. ICT data has the potential to complement classical sources by generating estimates of socio-economic phenomena such as transport, mobility, urbanism and migration. Of all the possible ICT data, like mobile phone records, social networks, etc., Twitter is one of the most accessible since the data is openly distributed. Twitter data is not free from biases: several studies have demonstrated that people tweeting are mostly young and wealthy and tend to live in cities [34, 49]. However, more recent studies show that the relevance of some of these biases, like those related to age or geographical origin, is slowly decreasing [35, 49].

Beyond demographics, the impact of biases in the study of aggregated mobility from geolocated tweets has been addressed in [33, 37]. Origin-destination (OD) matrices obtained from Twitter, mobile phone records and mobility surveys were found to be equivalent at scales larger than one square kilometer. This is due to the fact that mobility patterns are not strongly dissimilar across age groups and gender [50] and the demographic biases do not influence so strongly the OD matrices extracted. A similar effect could be expected in refugees and migrant displacements that may be carried out in groups, and thus detecting some members could be enough to describe the mobility patterns.

Accessing and filtering Twitter data

The data was accessed using the publicly available Twitter Streaming API [51], selecting tweets with geographical information (a description of the script employed is included in Availability of data and materials). We have collected data from January 2015 to December 2018. The location has to be provided at the level of place with a bounding box smaller than 40 km on the longest side, otherwise the tweets are not included in the spatial analysis but we still use them to count country-level flows (using the country identification in the tweet). A minimal filter for bots is implemented by neglecting users tweeting more than twenty times per hour on average in their tweeting life span. Multi-thread tweets are counted as a single one.
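The bot filter described above can be sketched as follows. This is a minimal illustration, not the authors' script: the record layout (pairs of user id and timestamp), the function name `keep_non_bots` and the handling of life spans shorter than one hour are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MAX_TWEETS_PER_HOUR = 20  # threshold stated in the text


def keep_non_bots(tweets):
    """Return the ids of users tweeting at most 20 times per hour on average.

    `tweets` is an iterable of (user_id, timestamp) pairs (hypothetical layout).
    """
    stamps_by_user = defaultdict(list)
    for user_id, stamp in tweets:
        stamps_by_user[user_id].append(stamp)

    kept = set()
    for user_id, stamps in stamps_by_user.items():
        span_hours = (max(stamps) - min(stamps)).total_seconds() / 3600.0
        # Assumption: life spans below one hour are treated as one hour,
        # so that a user with a single tweet is not divided by zero.
        span_hours = max(span_hours, 1.0)
        if len(stamps) / span_hours <= MAX_TWEETS_PER_HOUR:
            kept.add(user_id)
    return kept


# Toy data: one user tweeting once a day, one account tweeting every minute.
start = datetime(2016, 1, 1)
sample = [("human", start + timedelta(days=d)) for d in range(3)]
sample += [("bot", start + timedelta(minutes=m)) for m in range(100)]
kept = keep_non_bots(sample)
```

In this toy data the second account posts 100 tweets in under two hours and is therefore dropped, while the first one is kept.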

The analysis focuses on South and Central America and the Caribbean, where over 70% of the Venezuelan migration is concentrated, according to the UN [52]. We divided this extensive territory into a grid of cells of 40 km side, which became our basic spatial units. The position of the users is approximated by the centroid of the cells where they tweet from. With this information, we calculate the radius of gyration of each user's trips:

$$r_g = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_{cm} - r_i\right)^2} \qquad (1)$$

where n is the number of tweets, r_i the cell centroid of tweet i and r_cm stands for the center of mass of each user's movements. We disregard users with r_g > 5000 km (see the distribution of r_g in the Supporting Information) because this implies that all the trips are long distance. These accounts are typically multiuser, institutional or company accounts, and cannot offer any mobility information.
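As an illustration of Eq (1), the sketch below computes the radius of gyration for a list of cell centroids and applies the 5000 km cut. It assumes the centroids are already expressed in planar coordinates in kilometers, which is a simplification introduced here; the function and variable names are hypothetical.

```python
import math

MAX_RG_KM = 5000.0  # users above this radius are disregarded in the text


def radius_of_gyration(points):
    """Radius of gyration of a list of (x_km, y_km) cell centroids, one per tweet.

    Assumption: planar coordinates in km; a real analysis over a continent
    would need geographic distances.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n  # center of mass, x component
    cy = sum(y for _, y in points) / n  # center of mass, y component
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# Two tweets from cells whose centroids are 80 km apart.
example = [(0.0, 0.0), (80.0, 0.0)]
rg = radius_of_gyration(example)
keep_user = rg <= MAX_RG_KM
```

For the two-tweet example the center of mass lies midway between the cells, so each tweet is 40 km away from it and the radius of gyration is 40 km.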

Ethics statement

As a final comment, we adhere to strict data responsibility and ethics principles, ensuring that no personally identifiable information is kept. The metadata of the tweets, which can be considered personal, are deleted before storage and the user ID is irreversibly hashed using the SHA-512 algorithm. We do not track individuals: all the spatial analyses are performed using aggregated trips and numbers of residents at a minimal scale of 1 km² and, in most cases, over 1,600 km². All the figures and tables show aggregated data, and only when the number of individuals in a cell is larger than three.
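The two safeguards mentioned above, hashing of user IDs and suppression of sparsely populated cells, can be sketched with the Python standard library. The function names and the optional `salt` argument are assumptions for illustration and are not taken from the source.

```python
import hashlib

MIN_INDIVIDUALS = 3  # cells are shown only when they hold more than three users


def anonymize_user_id(user_id, salt=b""):
    """Return the SHA-512 hex digest of a user id.

    The `salt` parameter is a hypothetical addition, not described in the text.
    """
    return hashlib.sha512(salt + str(user_id).encode("utf-8")).hexdigest()


def publishable_cells(user_counts):
    """Keep only the cells whose number of distinct users exceeds the minimum."""
    return {cell: n for cell, n in user_counts.items() if n > MIN_INDIVIDUALS}


digest = anonymize_user_id(123456789)
visible = publishable_cells({"cell_a": 3, "cell_b": 4, "cell_c": 120})
```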

Resident classification

There is no unique definition of resident applicable to a situation with data as sparse as ours. Given the lack of a clear definition, we provide several definitions and check whether they lead to differences in the outcomes of the analyses. Every tweet has an associated country code. Users tweeting from Venezuela fewer than five times in the period 2015–2018, or with all their tweets posted in a time window shorter than three months, were excluded from the study as non-representative data. After that, for each user we identified the most common country of origin of the tweets posted every month. By doing so, we created the users' history as a sequence of country flags, with one flag for each month. In particular, if the most common country was Venezuela, we labeled the corresponding month as "Ven". If, for a given month, there were no tweets posted or there was a tie between Venezuela and another country, we labeled the month as "Und" (for 'undetermined'). Then we considered the following criteria to identify Twitter Users who are Venezuelan residents (TUVs), based on filters with a gradual level of restriction:

  • The first criterion is the most restrictive one. We considered as TUVs the users flagged as Venezuela three months in a row at least once in our database. In this way, we obtained approximately 160,000 TUVs.

  • The second criterion is slightly less restrictive. It included the users identified before and those with a sub-sequence Ven-Und-Ven. With this method we estimated around 200,000 TUVs.

  • The third criterion is similar to the previous one. We included all the users coming from the first criterion and added those with a sequence Und-Ven-Ven or Ven-Ven-Und. This yielded around 216,000 TUVs.

  • The fourth criterion includes all the users selected by any of the above. With this method, we obtained around 253,000 TUVs.
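The monthly flags and the four criteria listed above can be sketched as pattern matching over a sequence of flags. This is only an illustration under stated assumptions: the ISO code "VE" for Venezuela, the function names, and the choice to mark every tie (not only ties involving Venezuela) as undetermined are assumptions made here.

```python
from collections import Counter


def monthly_flag(country_codes):
    """Flag one month from the country codes of the tweets posted in it."""
    if not country_codes:
        return "Und"  # no tweets in the month
    ranked = Counter(country_codes).most_common()
    top = ranked[0][1]
    leaders = [code for code, count in ranked if count == top]
    if len(leaders) > 1:
        # The text marks ties with Venezuela as undetermined; extending
        # this to every tie is an assumption of this sketch.
        return "Und"
    return "Ven" if leaders[0] == "VE" else leaders[0]


def has_run(flags, pattern):
    """True if `pattern` appears as consecutive months in `flags`."""
    m = len(pattern)
    return any(tuple(flags[i:i + m]) == pattern for i in range(len(flags) - m + 1))


def satisfied_criteria(flags):
    """Map criterion number (1 to 4) to whether the user qualifies as a TUV."""
    c1 = has_run(flags, ("Ven", "Ven", "Ven"))
    c2 = c1 or has_run(flags, ("Ven", "Und", "Ven"))
    c3 = (c1 or has_run(flags, ("Und", "Ven", "Ven"))
          or has_run(flags, ("Ven", "Ven", "Und")))
    c4 = c2 or c3  # union of all the previous criteria
    return {1: c1, 2: c2, 3: c3, 4: c4}


history = ["Ven", "Und", "Ven", "CO", "CO"]
result = satisfied_criteria(history)
```

The example user has a silent or ambiguous month between two Venezuelan ones, so only the second and fourth criteria accept them.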

Determining the number of TUVs is crucial not only to measure the displacement flows but also to infer the upscaling factors as discussed in the subsection on upscaling factors. These factors allow us to translate the observed TUV numbers into the total flows. We estimated them as the ratio between the population of the country projected from the census and the number of TUVs, updated every year in terms of empirical flows recorded.

Definition of migrants

The UN definition of long-term migrant requires the person to stay in the new country for at least 12 months. However, here we rely on the term "migrant", which is not defined under international law. A migrant is generally defined as a person who leaves the usual place of residence, within the country or crossing a border, in the short or long term and for many reasons [53]. The number of migrants estimated by UN agencies in a single country often relies on classical data, such as records referring to residence permits or flows observed at the border. The objective in this work is to capture flows of people leaving a country in a humanitarian crisis. Definitions based on long time scales are not adequate for describing population mobility under these circumstances, and there is no universal definition of migrant [53]. Therefore, we consider here as a migrant any individual leaving Venezuela during the time window of observation, in accordance with the general definition of migrant given by the IOM [53].

Results

Upscaling factors

The number of TUVs identified with the four criteria and detected as active (posting tweets) every year is shown in Table 1. As shown in the table, independently of the criterion used, the number of TUVs is relatively stable in 2015 and 2016, and then we find a decline of over 20% in 2017 and almost 50% in 2018. The migration crisis had its first peak in the last months of 2016, and a combination of factors can explain the drop in the number of TUVs. The first one is migration to other countries (estimating this is our target), which can dissuade users from using geolocated social networks, but this factor alone does not explain the entire change. We see a drop of 43,000 TUVs with criterion 4 between 2016 and 2017, while the number of TUVs observed leaving the country with the same criterion 4 in 2016 is 11,000. To better assess whether this decline of geolocated tweets is a specific behavior of the Venezuelan population or a general trend in the region, we ran the same analysis with Colombian residents as a baseline. The number of geolocated users residing in Colombia is analyzed using the corresponding criterion 4 for this country. We observe a drop between the same years, 2016 and 2017, from 236,000 to 198,000. A drop in the use of geolocated tweets therefore seems to be a general trend in the region. Factors such as the general use of geolocated Twitter, as well as economic and social stress, must have contributed to such a systematic decline.

Table 1. Number of TUVs detected according to the four criteria discussed in the subsection on resident classification in thousands of individuals.

Year    crit.1   crit.2   crit.3   crit.4
2015    106 K    129 K    134 K    157.5 K
2016    110 K    132 K    137 K    159.5 K
2017    82 K     98 K     100 K    116 K
2018    51 K     60 K     62 K     71 K

The population of Venezuela in 2011, according to the census, was P(2011) = 29 million inhabitants [54]. The United Nations Population Division projects the Venezuelan population at around P(2015) = 30.1 million for 2015 [55]. As a simplifying assumption, we neglect the population changes due to the difference between natality and mortality. Given the short period of time considered, its effects on the total population are much weaker than those introduced by migration.

The number of active TUVs that same year according to criterion 4 was u_4(2015) = 157,500. This gives us an upscaling factor of S_4(2015) = P(2015)/u_4(2015). For the following year, we have a number e_4(2015) of TUVs leaving the country according to the same criterion 4. This implies a population decrease given by P(2016) = P(2015) − S_4(2015) e_4(2015). In general, we can write a recurrent formula for the upscaling factor associated with criterion k (k = 1, …, 4):

$$S_k(t) = \frac{P(t-1) - e_k(t-1)\,S_k(t-1)}{u_k(t)}, \qquad (2)$$

where the first year corresponds to 2015 with P(2015) = 30.1 million, as stated before. Applying these calculations for a given criterion k, we obtain an upscaling factor per year. These factors are the inverse of the population fraction tweeting with geolocation during the corresponding year and, with all the necessary caution regarding possible biases, one can assume that for criterion k one TUV leaving the country represents S_k(t) actual migrants. Measuring the TUVs detected abroad in 2018 and upscaling the flows according to the first exit year, we obtain the numbers shown in Table 2 and in Fig 1.
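The recurrence in Eq (2) can be sketched as a short loop. The active-user counts below are the criterion 4 values of Table 1 and the 2016 number of leaving TUVs (11,000) is quoted in the text; the leaving counts for 2015 and 2017 are hypothetical placeholders, since the source does not list them, and the function name is likewise an assumption. With these inputs, the 2015 factor is P(2015)/u_4(2015) = 30,100,000/157,500, roughly 191 inhabitants per TUV.

```python
def upscaling_factors(p_first, active, leaving):
    """Iterate Eq (2).

    p_first : population in the first year (individuals)
    active  : {year: u_k(year)}, active TUVs per year
    leaving : {year: e_k(year)}, TUVs leaving the country per year
    Returns ({year: S_k(year)}, {year: P(year)}).
    """
    years = sorted(active)
    population = {years[0]: p_first}
    factor = {years[0]: p_first / active[years[0]]}
    for prev, year in zip(years, years[1:]):
        # Population is reduced by the upscaled number of leavers of the previous year.
        population[year] = population[prev] - leaving[prev] * factor[prev]
        factor[year] = population[year] / active[year]
    return factor, population


active_c4 = {2015: 157_500, 2016: 159_500, 2017: 116_000, 2018: 71_000}  # Table 1
leaving_c4 = {
    2015: 5_000,   # hypothetical placeholder
    2016: 11_000,  # value quoted in the text
    2017: 15_000,  # hypothetical placeholder
}
factor, population = upscaling_factors(30.1e6, active_c4, leaving_c4)
```

Because the number of active TUVs falls faster than the estimated population, the factor grows over the years: each remaining geolocated user stands for more inhabitants.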

Table 2. Estimated migration of Venezuelan citizens to the four neighboring countries from which a humanitarian crisis has been reported.

The different lines in “Data” correspond to the different criteria for establishing Venezuelan residents with the Twitter data and the last four lines correspond to official figures from IOM, the United Nations (UNHCR) and the Federal Police of Brazil (PFB). The units are thousands of individuals (K).

2018           Brazil   Peru    Ecuador   Colombia   Total
Data(1st)      168K     589K    247K      1,080K     3,970K
Data(2nd)      163K     588K    242K      1,070K     3,920K
Data(3rd)      163K     593K    246K      1,090K     3,960K
Data(4th)      160K     590K    243K      1,080K     3,920K
IOM Sep18      75K      414K    209K      935K       2,600K
UNHCR Nov18    85K      500K    220K      1,000K     3,000K
UNHCR Jan19    96K      506K    221K      1,100K     3,400K
PFB Dec18      199K     –       –         –          –

Fig 1. Country-level flow validation.


Comparison between the migrant flows estimated from the upscaled Twitter data N_twitter following the different resident criteria and the official numbers N_official from the UNHCR in January 2018. These official numbers in some countries are projections. Each point corresponds to a country. Panel (a) shows all the countries of the considered area, and panel (b) shows in detail the neighboring countries from which a humanitarian crisis has been reported. In both cases, the grey dashed lines are the diagonal. The correlation produces R² = 0.98 for all the countries and R² = 0.99 for Brazil, Colombia, Ecuador and Peru alone. The country codes are Argentina (AR), Aruba (AW), Bolivia (BO), Brazil (BR), Chile (CL), Colombia (CO), Costa Rica (CR), Curaçao (CW), Dominican Republic (DO), Ecuador (EC), Guyana (GY), Panama (PA), Paraguay (PY), Peru (PE) and Trinidad and Tobago (TT). The panels are displayed in log-log scale because the flows span several orders of magnitude. However, the important point here is to verify the identity between expected and measured values and, therefore, the correlation analysis is performed with R² in the original scale.

Validation of external flows

We are interested in estimating the number of Venezuelan migrants in each country because these are the numbers reflected in the statistics of entries and registers, and they constitute the basis for the records used by national and international authorities. Note that the official sources are highly heterogeneous in nature and vary from country to country. To define a previously classified Venezuelan resident as a migrant, we require at least one tweet from them abroad as a proof of border crossing. This means counting, for each year, the number of TUVs appearing for the first time in a different country and determining the migrant flows by upscaling as discussed before. Some of the TUVs can be passing through and continue their travel to third countries where, in turn, they will be counted as well. The comparison of the flows obtained in Table 2 with the numbers provided by the international and national agencies shows a good alignment with our estimation. This impression is further confirmed in Fig 1 for the UNHCR data in most of the countries in the area, with R² over 0.9. The main outlier in Table 2 is Brazil, where the IOM and UNHCR give values well below our estimations. However, the Federal Police of Brazil registered a total number of entries over 199,000 from January 2017 until December 2018 (even though half of them end up returning to Venezuela [46]). Our numbers lie within the range provided by the different sources of information. Given the similarity across the upscaled flows obtained with the different criteria, we continued our analysis with criterion 4, which provides the largest statistics.
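The comparison for the four neighboring countries can be reproduced approximately from Table 2. The sketch below takes the criterion 4 row and the UNHCR Jan19 row and computes a squared Pearson correlation on the original (not logarithmic) scale. The text does not state which definition of R² was used, so the squared Pearson coefficient is an assumption here, as is the function name.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Table 2, in thousands of individuals: Brazil, Peru, Ecuador, Colombia.
n_twitter = [160, 590, 243, 1080]   # Data(4th)
n_official = [96, 506, 221, 1100]   # UNHCR Jan19
r2 = pearson_r2(n_twitter, n_official)
```

The resulting value can be compared with the R² reported in the caption of Fig 1 for these four countries.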

Migration routes

Beyond flow estimations, it is important to identify the preferred routes taken by migrants in order to better provide targeted humanitarian assistance. As mentioned, we took as geographical basis a grid of cells of 40 km side covering the full South American continent, the Caribbean and part of Central America. We counted the cumulative number of TUVs posting messages in each cell. The number of distinct TUVs in the whole period (2015–2018) is plotted as a heat map in Fig 2. The densest areas are located in Venezuela, specifically near Caracas. The relevant users for the analysis are those classified as residents (TUVs) who have been detected abroad. Most of their trips begin in Venezuela, where the density matches the population distribution. Once in other countries, the routes also concentrate in large cities like Buenos Aires, Santiago, São Paulo, Rio de Janeiro and Lima (which can be temporary stops or travel destinations) and around the principal roads and rivers. For example, following a preferred route from Venezuela to Manaus, a split occurs: some TUVs take the ferry navigating the Amazon to the Brazilian city of Belem and subsequently travel along the coast to the cities of Fortaleza and Salvador, while other TUVs travel from Manaus through the forest via the BR-319 highway towards the border with Peru and Bolivia. Overall, the preferred migration route in South America is the Pan-American highway, passing through Colombia, Ecuador and Peru, and continuing all the way to Chile and Argentina. Other minor routes are also detected, such as the one from La Paz in Bolivia to Cordoba in Argentina.

Fig 2. Migrants’ routes.


Number of individuals observed in every 40 × 40 km² cell in the area of study. The heatmap scale is logarithmic. Only cells with more than 3 individuals are displayed. (a) The full South American continent plus the Caribbean and Central America; (b) a zoom in on the Northern area focused on the Caribbean; (c) a zoom in highlighting the Southern Cone; (d) a zoom in on Brazil. Map tiles by Carto, under CC-BY 4.0. Data by OpenStreetMap, under ODbL.

To better understand the potential application of our method, we built a vector representation in each cell. For every TUV tweeting from the cell, we built a unit vector pointing from the present location to the cell of the next consecutive tweet, on the condition that it is closer than 500 km, to exclude as much as possible air travel and noise coming from infrequent users. We separated the unit vectors into those pointing towards and those pointing away from Venezuela, and we summed them in every cell within each category. The resulting outgoing vectors are displayed in red in Fig 3, while the incoming ones are in blue. These maps provide information on the main ground exit routes from Venezuela reported by official agencies and also on the total flow observed in both directions. To go further, we can establish a line intersecting the routes (center of the parallel dashed lines in Fig 3) and consider the upscaled number of TUVs to calculate the number of people crossing the line per year and the direction in which they are going. When calculating these numbers, we are not using the vector representation. To be sure that these users effectively crossed the given line, we impose as a necessary condition that they tweet on both sides of the dashed line within circles of 350 km radius. For example, in Fig 3b, the dashed line is placed halfway between Boa Vista and Manaus, the first circle covers an area of radius 350 km around Boa Vista, and the other one spans the same area around Manaus. In Fig 3a, the same is done between Bogotá and Quito. The results for 2017 are shown in the bottom-right corner of the maps, while the information collected year by year is included in Table 3.
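A minimal sketch of the per-cell vector representation is given below. Several choices are assumptions made for illustration and are not specified in the text: coordinates are planar and in kilometers, a single reference point stands for Venezuela, and a step is labeled "away" when it increases the distance to that reference point. The function name is hypothetical.

```python
import math
from collections import defaultdict

MAX_STEP_KM = 500.0  # consecutive tweets must be closer than this


def cell_vectors(trajectories, reference):
    """Sum unit displacement vectors per origin cell, split by direction.

    trajectories : list of time-ordered lists of (x_km, y_km) cell centroids
    reference    : (x_km, y_km) point standing for Venezuela (assumption)
    Returns {cell: {"away": (vx, vy), "toward": (vx, vy)}}.
    """
    sums = defaultdict(lambda: {"away": [0.0, 0.0], "toward": [0.0, 0.0]})
    for path in trajectories:
        for here, nxt in zip(path, path[1:]):
            step = math.dist(here, nxt)
            if step == 0.0 or step >= MAX_STEP_KM:
                continue  # same cell, or likely air travel
            moving_away = math.dist(nxt, reference) > math.dist(here, reference)
            key = "away" if moving_away else "toward"
            sums[here][key][0] += (nxt[0] - here[0]) / step
            sums[here][key][1] += (nxt[1] - here[1]) / step
    return {cell: {k: tuple(v) for k, v in d.items()} for cell, d in sums.items()}


reference = (0.0, 0.0)
paths = [
    [(40.0, 0.0), (80.0, 0.0)],   # away from the reference point
    [(40.0, 0.0), (120.0, 0.0)],  # away from the reference point
    [(40.0, 0.0), (0.0, 0.0)],    # toward the reference point
    [(40.0, 0.0), (800.0, 0.0)],  # 760 km step, discarded
]
vectors = cell_vectors(paths, reference)
```

In this toy case, the cell at (40, 0) accumulates two outgoing unit vectors and one incoming one, while the long jump is ignored.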

Fig 3. Crossing routes.


Map of two main ground migrant exit routes from Venezuela reported by official agencies: in (a) the Pan-American road with a portion of Colombia, Ecuador and Peru, and in (b) the area of Roraima, Amazonas and Pará states in Brazil. Blue arrows indicate flows toward Venezuela and red ones flows away from it. Only cells with more than three TUVs are shown in the maps. The lightness of the color of the arrows is proportional to the net in- and out-flows, from light to darker colors. The upscaled net flows crossing the dashed lines are displayed in the bottom-right corner of each plot.

Table 3. Upscaled number of TUVs crossing the lines of Fig 3 away (←) and toward (→) Venezuela in the two routes considered.

Route               Direction     2015    2016     2017     2018
Pan-American road   away (←)      6,550   11,300   18,500   36,200
Pan-American road   toward (→)    5,500   9,300    13,500   23,700
Roraima             away (←)      1,200   1,490    2,600    1,070
Roraima             toward (→)    1,200   1,300    1,400    720

It is important to note that the numbers are relatively small. This is due to the fact that we need to see two consecutive tweets, one above and another below the line, to identify a user. However, not all TUVs have these two tweets, as some may travel directly from Venezuela by plane or simply do not tweet so often during their trips. These numbers are, therefore, underestimations, although they are proportional to the total flow. We would need to further upscale them according to the fraction of TUVs active enough to be detected on both sides of the lines. Without further processing, we can, nevertheless, compare results obtained on the same route year by year and between routes with the same methodology. In Table 3, we observe a clear dominance of the outflows over the inflows to Venezuela. On the Pan-American highway, there is a substantial growth in the flows year by year, almost doubling in the last period between 2017 and 2018. The route entering Roraima has, in turn, a flow that is close to a factor of 10 below the Pan-American one, with an initial increase until 2017 followed by a strong decline in the following year. The latter case may be explained by the fact that approximately half of the migrants entering Brazil have been registered as leaving in the following months [46, 47].

Recurrence

Note that 25% of the migrant TUVs travel back and forth from Venezuela (recurrent travelers), as we will show below, while the remaining 75% stay most of the time abroad. To estimate the time spent in a location, we take the following convention: the time between consecutive tweets is assigned to the location of the first one. Applied to countries, this allows us to define the cumulative time spent abroad for each TUV after his/her first travel to another country, t_out. Additionally, we can calculate the time span between the first tweet abroad and the last tweet of the user, t_Tot. In this way, we define a ratio between the time spent abroad and the total time after the first exit from Venezuela for each TUV, R = t_out/t_Tot. There is a wide range of behaviors, as can be seen in the distribution of Fig 4. TUVs detected abroad for the first time in their last tweet are not considered. Note that the precision is limited by the heterogeneity in the user inter-event time distribution and the total time window that we are able to analyze. Some TUVs correspond to the canonical view of migrants, leaving the country and coming back only seldom, while others go back and forth. More recent migrant TUVs may be classified as not recurrent, even though they might be classified differently if observed for longer time periods. Even so, we need to establish a criterion to discern between frequent returners and those staying mostly abroad. The distribution of R shows two clear peaks at the extreme values of the domain with a valley in the region between 0.4 and 0.75, so the threshold is set at 0.5. Results are similar for other values of the threshold provided that they are in the range [0.4, 0.75]. We find 12,518 TUVs classified as recurrent and 22,459 as non-recurrent.
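The ratio R can be sketched as follows for a single user, following the convention that the time between consecutive tweets is assigned to the location of the first one. The input layout (time in days plus a country code), the ISO code "VE" and the function names are assumptions for illustration.

```python
RECURRENCE_THRESHOLD = 0.5  # users with R below this value are recurrent


def abroad_ratio(tweets):
    """Return R = t_out / t_Tot for a time-ordered list of (t_days, country_code).

    Returns None when the user never leaves Venezuela or is first seen
    abroad in the last tweet, since such users are not considered.
    """
    first_abroad = next(
        (i for i, (_, country) in enumerate(tweets) if country != "VE"), None
    )
    if first_abroad is None or first_abroad == len(tweets) - 1:
        return None
    t_tot = tweets[-1][0] - tweets[first_abroad][0]
    if t_tot <= 0:
        return None
    after_exit = tweets[first_abroad:]
    # Each interval is assigned to the country of the tweet that opens it.
    t_out = sum(
        t_next - t_here
        for (t_here, country), (t_next, _) in zip(after_exit, after_exit[1:])
        if country != "VE"
    )
    return t_out / t_tot


def is_recurrent(ratio):
    return ratio < RECURRENCE_THRESHOLD


user = [(0, "VE"), (10, "CO"), (20, "VE"), (50, "CO"), (60, "CO")]
ratio = abroad_ratio(user)
```

For this user, 50 days elapse after the first exit, of which 20 are assigned to tweets posted abroad, giving R = 0.4 and a recurrent classification.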

Fig 4. Time spent abroad.

Probability distribution of the fraction of time spent abroad after the first country exit. TUVs with R lower than 0.5 are classified as recurrent.

Recurrent TUVs stay most of the time in Venezuela, so it can be assumed that they still reside somewhere in the country. Similarly, non recurrent individuals are likely to have a place of residence abroad. We define as place/country of residence the location from which they tweet most in their last month of activity. Out of the 22,459 non recurrent TUVs, 16,292 have a place of residence assigned within the geographical area of our analysis outside Venezuela. To upscale these numbers in each of the countries, we apply the same technique as in previous sections, see Eq (2). The factor is calculated as the ratio between the updated Venezuelan population and the number of TUVs in each year. The number of migrant TUVs in every country is upscaled according to the factor corresponding to their first year of exit. The results per country are shown in Table 4 and compared with the official estimations from international agencies. The numbers of Table 2 refer to entries across the border, so a single individual can contribute to more than one country. In contrast, the records in Table 4 assign one country to each migrant and the numbers contain only distinct individuals in the full study area. Our method provides numbers below the official statistics in Brazil, Ecuador and Colombia, while in Peru it is slightly higher. We see that the new settlement places concentrate in Argentina, Chile, Colombia and Peru, while flows also pass through other countries such as Brazil, where the fraction of migrants settling is lower. We have also performed a systematic comparison between the number of Venezuelan residents in each country provided by our method and that given by international agencies. The result is displayed in Fig 5, where one can see an acceptable agreement, with an R² over 0.97.
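
The residence rule and the per-year upscaling described above can be sketched as follows. This is only an illustration under stated assumptions: the "last month of activity" is taken as the 30 days before the user's last tweet, ties are broken arbitrarily, and the function names and the factors in the example are hypothetical placeholders rather than values from this work.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

DAY = 86400.0  # seconds


def residence_country(tweets: List[Tuple[float, str]],
                      window_days: float = 30.0) -> Optional[str]:
    """Country with most tweets in the last `window_days` of the user's activity."""
    if not tweets:
        return None
    last = max(t for t, _ in tweets)
    recent = Counter(c for t, c in tweets if t >= last - window_days * DAY)
    return recent.most_common(1)[0][0]


def upscaled_residents(migrants: List[Tuple[str, int]],
                       factor_by_year: Dict[int, float]) -> Dict[str, float]:
    """Sum, per residence country, the factor of each migrant's first year of exit."""
    totals: Dict[str, float] = defaultdict(float)
    for country, first_exit_year in migrants:
        totals[country] += factor_by_year[first_exit_year]
    return dict(totals)


# Hypothetical user active for 120 days, tweeting from Peru at the end.
tweets = [(0 * DAY, "VE"), (100 * DAY, "CO"), (110 * DAY, "PE"), (120 * DAY, "PE")]
print(residence_country(tweets))

# Hypothetical factors (population / number of TUVs) for two years.
factors = {2017: 100.0, 2018: 50.0}
print(upscaled_residents([("PE", 2017), ("PE", 2018), ("CO", 2018)], factors))
```

Each migrant contributes to exactly one country, which matches the one-country-per-individual accounting of Table 4.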

Table 4. Estimated number of Venezuelan residents in each of the countries in late 2018.

The units are thousands of individuals (K).

2018          Brazil   Peru   Ecuador   Colombia
Data(4th)     58K      504K   170K      826K
IOM Sep18     75K      414K   209K      935K
UNHCR Nov18   85K      500K   220K      1,000K
UNHCR Jan19   96K      506K   221K      1,100K

Fig 5. Validation of new residents.

Scatter plot comparing the estimations of new Venezuelan residents obtained with our method and the official data from the international agencies in each country. Every circle is a country and the dashed gray line is the diagonal. In this case, the correlation is R² = 0.97. The plot is displayed in log-log scale because the number of residents spans several orders of magnitude. As before, the important point is to verify the identity and, therefore, the correlation analysis is performed in the linear scale.
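
A correlation in the linear scale can be computed as in the sketch below. It uses only the four countries listed in Table 4 and, as an arbitrary choice for this example, the UNHCR January 2019 row as the official reference, so the value it prints is not expected to reproduce the one reported for the figure, which includes more countries. The function name `pearson_r2` is hypothetical.

```python
from typing import Sequence


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient computed on the raw (linear) values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy * s_xy / (s_xx * s_yy)


# Values from Table 4 in thousands: Brazil, Peru, Ecuador, Colombia.
twitter_estimate = [58, 504, 170, 826]   # row "Data(4th)"
unhcr_jan19 = [96, 506, 221, 1100]       # row "UNHCR Jan19"

r2 = pearson_r2(twitter_estimate, unhcr_jan19)
print(f"R^2 on the four countries of Table 4: {r2:.3f}")
```

Working on the raw values rather than on their logarithms gives more weight to the countries with the largest migrant populations.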

Spatial integration of migrants

In addition to large-scale numbers, the data also allows for a local study of the place of residence of the migrant population. As an example, we look at three of the main cities in South America, where the statistics are more reliable: Bogotá in Colombia, São Paulo in Brazil and Lima in Peru. The urban space is divided into a grid of cells of 1 × 1 km² area, and we assign to each migrant the cell from which they tweet most often during night hours (between 8PM and 8AM local time). The resulting heatmaps are displayed in Fig 6. Below, we also show the distribution of the local population obtained from local Twitter users to have a basis for comparison. Besides a visual inspection, we have calculated the segregation index proposed in [31], called h. This metric is the ratio between the entropy of the spatial distribution of the migrant community and that of the local population, with a correction to take into account finite size effects. If h = 1, both populations are similarly distributed, while smaller h indicates segregation. As a complement, we also calculate the normalized mutual information NMI between the distribution of migrants and locals. The NMI is a way to compare the distribution of two variables and it ranges from 0 (the variables are independent) to 1 (they come from the same distribution) [57]. The results for the former Venezuelan residents are shown in Table 5. All the values of h and NMI are low. In these cities, Venezuelan migrants are far from being well integrated from a spatial point of view. There are many causes behind this behavior, ranging from housing prices and availability to the presence of migrant communities from the same country. Moreover, the distribution of locals is not informative about migrants’ places of residence, so it should not be considered a proxy for the migrant distribution in the three cities. Specifically, in the case of São Paulo, both metrics are very low, although this must be taken with some caution because only 50 users were detected there against 1,300 in Bogotá and 570 in Lima.
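
The night-time cell assignment and the entropy ratio behind h can be sketched as follows. This is a simplified illustration under several assumptions: the grid is built with a flat-Earth approximation around a reference corner, the finite-size correction of [31] is omitted (so the ratio below is not the corrected index h), and all function names and coordinates are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

KM_PER_DEG = 111.32  # approximate length of one degree of latitude in km
Cell = Tuple[int, int]


def cell_of(lon: float, lat: float, lon0: float, lat0: float,
            size_km: float = 1.0) -> Cell:
    """Index of the grid cell containing (lon, lat), relative to the corner (lon0, lat0)."""
    x_km = (lon - lon0) * KM_PER_DEG * math.cos(math.radians(lat0))
    y_km = (lat - lat0) * KM_PER_DEG
    return (math.floor(x_km / size_km), math.floor(y_km / size_km))


def is_night(local_hour: int) -> bool:
    """Night hours as defined in the text: between 8PM and 8AM local time."""
    return local_hour >= 20 or local_hour < 8


def home_cell(tweets: List[Tuple[int, float, float]],
              lon0: float, lat0: float) -> Optional[Cell]:
    """Most frequent cell among the night-time tweets (local_hour, lon, lat) of one user."""
    cells = Counter(cell_of(lon, lat, lon0, lat0)
                    for hour, lon, lat in tweets if is_night(hour))
    return cells.most_common(1)[0][0] if cells else None


def entropy(users_per_cell: Dict[Cell, int]) -> float:
    """Shannon entropy of the distribution of users over cells."""
    total = sum(users_per_cell.values())
    return -sum((n / total) * math.log(n / total)
                for n in users_per_cell.values() if n > 0)


def entropy_ratio(migrants: Dict[Cell, int], locals_: Dict[Cell, int]) -> float:
    """Uncorrected ratio between the migrant entropy and the local entropy."""
    h_locals = entropy(locals_)
    return entropy(migrants) / h_locals if h_locals > 0 else float("nan")


# Hypothetical user with two night tweets in the same cell and one daytime tweet elsewhere.
tweets = [(22, -74.0, 4.6), (23, -74.0, 4.6), (12, -74.05, 4.7)]
print(home_cell(tweets, lon0=-74.2, lat0=4.4))
```

With this definition, migrants concentrated in a single cell give a ratio of 0, while migrants spread exactly like the locals give a ratio of 1.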

Fig 6. Residence locations in the main cities.

Heat map of the population distribution in the main cities of the area. The top row shows the numbers of migrant TUVs and the bottom row the local geolocated Twitter users. The scale of the heatmap is logarithmic and the maximum is rescaled in each of the maps. In (a) and (d), results for Bogotá (Colombia). In (b) and (e), for São Paulo (Brazil). In (c) and (f), for Lima (Peru). Data on roads by OpenStreetMap contributors and from MapCruzin, all available under the Open Database License ODbL. For more information check [56].

Table 5. Segregation indicators h and NMI for migrant TUVs.

      Bogotá   Lima   São Paulo
h     0.59     0.62   0.27
NMI   0.05     0.07   0.04

Temporal distribution of upscaled outflows

Considering only the outflow from Venezuela, we can build a time series of the upscaled number of exits per month. The results are shown in Fig 7a. The monthly numbers start to increase in 2015, peak in late 2016, and increase again later until the end of our time window in January 2019, consistent with the data recorded by the Federal Police of Brazil [47]. The histogram shows some peaks and valleys that can be correlated with special events during the crisis. This can be seen in the lower panel, Fig 7b, where we consider only the first exit from Venezuela of each TUV. In this version, the impact of the events is more clearly appreciated since they correlate better with the outflow. In all cases, we are only showing the upscaled outflows. As discussed for the recurrent TUVs, there exists an inflow that partially compensates the exits.
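
The two monthly series described above can be assembled as in the following sketch. It assumes that each detected exit is available as a date and that one upscaling factor per year is known; the function names and the factors in the example are hypothetical placeholders, not values from this work.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Month = Tuple[int, int]  # (year, month)


def monthly_upscaled_exits(exits: List[datetime],
                           factor_by_year: Dict[int, float]) -> Dict[Month, float]:
    """Add, for every exit, the factor of its year to the bin of its month."""
    series: Dict[Month, float] = defaultdict(float)
    for when in exits:
        series[(when.year, when.month)] += factor_by_year[when.year]
    return dict(sorted(series.items()))


def first_exits(exits_by_user: Dict[str, List[datetime]]) -> List[datetime]:
    """Keep only the earliest exit of each user."""
    return [min(times) for times in exits_by_user.values() if times]


# Hypothetical exits of two users and hypothetical factors.
exits_by_user = {
    "user_a": [datetime(2017, 1, 5), datetime(2017, 1, 20)],
    "user_b": [datetime(2018, 3, 1)],
}
factors = {2017: 100.0, 2018: 50.0}

all_exits = [t for times in exits_by_user.values() for t in times]
print(monthly_upscaled_exits(all_exits, factors))                    # all exits
print(monthly_upscaled_exits(first_exits(exits_by_user), factors))   # first exits only
```

Counting only first exits removes the repeated trips of recurrent users, which is the difference between the two panels of Fig 7.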

Fig 7. Exit times distribution.

(a) Total upscaled exits from Venezuela, F_exit. (b) First exits from Venezuela per TUV, upscaled to obtain F_users.

Discussion

According to five UN agencies, massive gaps in data covering refugees, asylum seekers, migrants and internally displaced populations threaten the lives and wellbeing of millions of children on the move. Through a joint Call to Action [58], the agencies confirm the critical need to improve the availability, reliability, timeliness and accessibility of data and evidence to better understand how migration and forcible displacement affect the wellbeing of people. In this work, we have developed a method that contributes to filling these glaring data and knowledge gaps by extracting migrant mobility flows from geolocated Twitter data. We focus on the current crisis in Venezuela, although the method is general and can be transferred to other contexts, limited only by data coverage.

The analysis performed here is subject to the following considerations, restrictions and assumptions:

  • A few previous studies anticipated the potential of Twitter for the study of migration flows [26–28]. The main focus of these works is on the presence of migrants and the estimation of country-level migration flows. Here we show the full potential of Twitter geolocated data to capture mobility patterns along the route, to find the specific destinations of migrants within a country and to study the time series of migration flows. We have also provided a systematic check of different methodologies to define residents and a sound comparison between flows detected from Twitter and numbers offered by international and national agencies.

  • Geolocated information in Twitter. There are two types of tweets according to the field containing the metadata on geolocation: tweets with coordinates and those with place. The ones with coordinates are precise within GPS resolution; they are a minority and Twitter is discontinuing their usage due to privacy concerns, even though the user’s informed consent is required to post them. The tweets with place are a majority among those geolocated [59]. The place refers to a geographical bounding box around the posting position, which may correspond to neighborhoods, municipalities, provinces or states, etc. Since we want to study migration out of a country, most of these place levels are accurate enough for our purposes and they are used in the analysis.

  • Note that in this work the definition of Venezuelan residents comes from temporal tracking of users’ location at country level. We have not used methods based on the content of the tweets, such as that of [60], where the authors manually processed a sample of the tweet texts in order to classify users as refugees or not. That analysis was based on the presence of keywords such as “refugee”, “asylum”, “camp”, etc. Only 5.4% of the users in their text-filtered dataset are classified as refugees, another 16.2% are potential refugees, while the rest are journalists, tourists and others. Our work is free of these biases because we use a completely different approach based on geolocation only. The analysis of the message contents may be of interest as a byproduct of our work with the aim of detecting the main concerns of the migrant population.

  • Here we employ the term “migrant” in accordance with the definition given by the IOM [53], which requires the person to leave the original place of residence for a variety of reasons. The method to define a previously classified Venezuelan resident as a migrant is to detect at least one tweet from them abroad as a proof of border crossing, similarly to a previous study [60]. We could have used a stricter criterion and required two or more tweets abroad to classify migrants (and, likewise, required two tweets per year in order to classify residents as active residents). This would not affect the average flows (as we saw from other methods, the upscaling factors absorb and rescale the estimated flows), although it notably reduces the statistics. Hence we discarded this option. On the other hand, our definition relies on the classification of users as Venezuelan residents. We do not have an unequivocal definition and, therefore, we have studied four different methods based on temporal and quantitative sampling, which are shown to be almost identical after rescaling. This can be observed in the way they give similar results when measuring migration flows with respect to classical data estimations from the UN in Fig 1.

  • Twitter data provides information from a fraction of the total population and, consequently, it may suffer from biases. Previous studies have shown that these biases in terms of age and socio-economic level have a limited impact on the estimation of mobility flows [33, 37]. These studies were conducted in Europe, where the economic inequalities between rich and poor are less pronounced than in Latin America. Wealthy migrants from Venezuela are able to fly, and their destinations can include, besides the Southern Cone, countries in the North of the Americas and Europe. The fact that we detect non-negligible flows of terrestrial movements shows that we are also capturing the least wealthy sections of the population. Additionally, we perform a systematic comparison between flows estimated with our methods and the ones offered by official records. The close agreement proves that the analysis is not ignoring significant parts of the population.

  • Among the countries with the largest flows of Fig 1, we see an overestimation of migrants in countries of the Southern Cone, namely, Chile and Argentina. To our understanding, this distortion may be due to two relevant factors. On the one hand, the penetration rate of Twitter in these countries may be different from that in Venezuela. From a social perspective, Venezuelans moving to these countries may conform to the local culture regarding a different usage of social platforms. This results in an over-weighted estimation of those migrants who end up tweeting from Chile and Argentina. This may be a factor, but it is unlikely that people adapt so fast to local usages. On the other hand, and most importantly, there could be an under-representation of Venezuelan migrants in the official statistics of these countries. For example, this happens in the numbers of Table 2 with respect to immigrants entering Brazil. The official estimates of the UN agencies were almost one half of the inflows registered by the Federal Police. The numbers estimated with our method were much closer to those of the Police.

  • Twitter data has the advantage of being publicly available. Still, before using it in an operational context, a thorough validation exercise like the one we have performed here is necessary. Part of the validation could be done against other private data sources such as mobile phone records, but these data sets are usually constrained to a single country and users can change their phone number/provider at border crossing, while the online social network accounts remain unaltered.
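
The two kinds of geolocation metadata and the one-tweet-abroad criterion discussed in the list above can be illustrated with a short sketch. It assumes tweets stored as dictionaries with the "coordinates" and "place" fields of the Twitter API tweet object; the function names, the use of the bounding-box center as position and the country code "VE" are choices made for this example only.

```python
from typing import Iterable, Optional


def tweet_location(tweet: dict) -> Optional[dict]:
    """Return position, source field and country of a geolocated tweet, or None."""
    place = tweet.get("place") or {}
    country = place.get("country_code")
    point = tweet.get("coordinates")
    if point and point.get("type") == "Point":
        lon, lat = point["coordinates"]  # GeoJSON order: longitude, latitude
        return {"lon": lon, "lat": lat, "source": "coordinates", "country_code": country}
    box = place.get("bounding_box")
    if box and box.get("coordinates"):
        ring = box["coordinates"][0]  # list of [lon, lat] corners
        lons = [corner[0] for corner in ring]
        lats = [corner[1] for corner in ring]
        return {"lon": (min(lons) + max(lons)) / 2,
                "lat": (min(lats) + max(lats)) / 2,
                "source": "place", "country_code": country}
    return None


def is_migrant(country_codes: Iterable[str], home: str = "VE") -> bool:
    """A resident is flagged as a migrant after at least one tweet abroad."""
    return any(code != home for code in country_codes)


# Hypothetical tweet that carries only a place bounding box.
tweet = {
    "coordinates": None,
    "place": {
        "country_code": "CO",
        "place_type": "city",
        "bounding_box": {"type": "Polygon", "coordinates": [[
            [-74.2, 4.5], [-74.2, 4.8], [-74.0, 4.8], [-74.0, 4.5]]]},
    },
}
print(tweet_location(tweet))
print(is_migrant(["VE", "VE", "CO"]))
```

For country-level flows only the country code matters, so the size of the bounding box is rarely a limitation.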

This work has proved useful for humanitarian agencies, such as UNICEF, in better understanding the magnitude of the Venezuelan crisis in a country of continental proportions, such as Brazil, shaping the design of broader interventions beyond the border crossing in the North of the country. In addition, the simplicity of the method, coupled with the open availability and pervasiveness of Twitter data, can bring a new generation of studies to explore migratory crises from a multidisciplinary point of view. Authorities and humanitarian organizations can thus count on this extra data source to implement more informed response protocols in humanitarian contexts.

Availability of data and materials

In this work, we use several data sources: Geolocated Twitter, UNHCR, IOM and Federal Police of Brazil. The access links are included as references in the manuscript. In particular, the Twitter API in Ref. [51], the UNHCR in Refs. [43, 44, 52], the UN population division DESA [55], the IOM at [45], the Federal Police of Brazil at [46, 47] and the Venezuelan census [54].

For Twitter, the geolocated data is downloaded using the streaming API. Every user can access this information by following the instructions of the updated user guide on the Twitter streaming API provided at https://developer.twitter.com/en/docs. We include next a working example of the Python code needed to query the Twitter streaming API in a limited geographical BOX enclosed between latitudes y0 and y1, and longitudes x0 and x1. This code is only illustrative, since the commands can be changed at any moment by the Twitter developers.

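# Written for the tweepy 3.x interface; later tweepy releases changed the streaming classes.
from tweepy import Stream, OAuthHandler
from tweepy.streaming import StreamListener

# Credentials of the Twitter developer account (to be filled in by the user).
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
ACCESS_KEY = ''
ACCESS_SECRET = ''

# Limits of the box: longitudes x0 < x1 and latitudes y0 < y1.
# The values below are placeholders only and must be replaced by the area of interest.
x0, y0, x1, y1 = -73.5, 0.5, -59.5, 12.5
BOX = [x0, y0, x1, y1]


class MyStreamListener(StreamListener):
    def on_status(self, status):
        # Called for every tweet received from the stream; here it is only printed.
        print(status)


if __name__ == '__main__':
    auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
    listen = MyStreamListener()
    stream = Stream(auth, listen, gzip=True)
    # Receive only the tweets geolocated inside BOX.
    stream.filter(locations=BOX)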

Acknowledgments

We thank Riccardo Gallotti and Daniela Paolotti for useful comments and suggestions.

Data Availability

The manuscript contains all the information needed to download the data and to replicate the analysis.

Funding Statement

MM is funded by the Conselleria d’Innovació, Recerca i Turisme of the Government of the Balearic Islands and the European Social Fund with grant code FPI/2090/2018. AT acknowledges financial support from the AEI, Spanish National Research Agency, with grant code PTA2017-13872-I and the Government of the Balearic Islands. MM, AT, PC and JJR also acknowledge funding from the Spanish Ministry of Science, Innovation and Universities, the AEI and FEDER (EU) under the grant PACSS (RTI2018-093732-B-C22) and the Maria de Maeztu program for Units of Excellence in R&D (MDM-2017-0711). We acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).

References

  • 1. Jones D. Conflict resolution: Wars without end. Nature. 2015;519:148. doi:10.1038/519148a
  • 2. López-Carr D, Marter-Kenyon J. Human adaptation: Manage climate-induced resettlement. Nature. 2015;517:265. doi:10.1038/517265a
  • 3. Ravenstein EG. The laws of migration. Journal of the Statistical Society of London. 1885;48:167–235. doi:10.2307/2979181
  • 4. Abel GJ, Sander N. Quantifying global international migration flows. Science. 2014;343:1520–1522. doi:10.1126/science.1248676
  • 5. Rogers A, Willekens F, Ledent J. Migration and settlement: a multiregional comparative study. Environment and planning A. 1983;15:1585–1612. doi:10.1068/a151585
  • 6. Barrat A, Barthelemy M, Pastor-Satorras R, Vespignani A. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences of the USA. 2004;101:3747–3752. doi:10.1073/pnas.0400087101
  • 7. Willekens F, Massey D, Raymer J, Beauchemin C. International migration under the microscope. Science. 2016;352:897–899. doi:10.1126/science.aaf6545
  • 8. Editorial article. Data on movements of refugees and migrants are flawed. Nature. 2017;543:5–6. doi:10.1038/543005b
  • 9. Butler D. What the numbers say about refugees. Nature. 2017;543:22–23. doi:10.1038/543022a
  • 10. Dijstelbloem H. Migration tracking is a mess. Nature. 2017;543:32. doi:10.1038/543032a
  • 11. Barbosa H, Barthelemy M, Ghoshal G, James CR, Lenormand M, Louail T, et al. Human mobility: Models and applications. Physics Reports. 2018;734:1–74. doi:10.1016/j.physrep.2018.01.001
  • 12. Kang C, Ma X, Tong D, Liu Y. Intra-urban human mobility patterns: An urban morphology perspective. Physica A: Statistical Mechanics and its Applications. 2012;391(4):1702–1717. doi:10.1016/j.physa.2011.11.005
  • 13. Calabrese F, Diao M, Di Lorenzo G, Ferreira J Jr, Ratti C. Understanding individual mobility patterns from urban sensing data: A mobile phone trace example. Transportation Research Part C: Emerging Technologies. 2013;26:301–313. doi:10.1016/j.trc.2012.09.009
  • 14. Louail T, Lenormand M, Ros OGC, Picornell M, Herranz R, Frias-Martinez E, et al. From mobile phone data to the spatial structure of cities. Scientific Reports. 2014;4:5276. doi:10.1038/srep05276
  • 15. Yang Y, Tan C, Liu Z, Wu F, Zhuang Y. Urban Dreams of Migrants: A Case Study of Migrant Integration in Shanghai. In: Procs. of The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-2018). AAAI; 2018.
  • 16. Gonzalez MC, Hidalgo CA, Barabasi AL. Understanding individual human mobility patterns. Nature. 2008;453:779–782. doi:10.1038/nature06958
  • 17. Krings G, Calabrese F, Ratti C, Blondel VD. Urban gravity: a model for inter-city telecommunication flows. Journal of Statistical Mechanics: Theory and Experiment. 2009;2009:L07003.
  • 18. Kung KS, Greco K, Sobolevsky S, Ratti C. Exploring universal patterns in human home-work commuting from mobile phone data. PLoS ONE. 2014;9:e96180. doi:10.1371/journal.pone.0096180
  • 19. Bajardi P, Delfino M, Panisson A, Petri G, Tizzoni M. Unveiling patterns of international communities in a global city using mobile phone data. EPJ Data Science. 2015;4:3. doi:10.1140/epjds/s13688-015-0041-5
  • 20. Alfeo AL, Cimino MGCA, Lepri B, Pentland AS, Vaglini G. Assessing Refugees’ Integration via Spatio-temporal Similarities of Mobility and Calling Behaviors. IEEE Transactions on Computational Social Systems (Early Access). 2019; p. 1–13.
  • 21. Gallotti R, Bazzani A, Rambaldi S, Barthelemy M. A stochastic model of randomly accelerated walkers for human mobility. Nature Communications. 2016;7:12600. doi:10.1038/ncomms12600
  • 22. Zagheni E, Weber I, Gummadi K, et al. Leveraging Facebook’s advertising platform to monitor stocks of migrants. Population and Development Review. 2017;43:721–734. doi:10.1111/padr.12102
  • 23. Hawelka B, Sitko I, Beinat E, Sobolevsky S, Kazakopoulos P, Ratti C. Geo-located Twitter as proxy for global mobility patterns. Cartography and Geographic Information Science. 2014;41:260–271. doi:10.1080/15230406.2014.890072
  • 24. Lenormand M, Gonçalves B, Tugores A, Ramasco JJ. Human diffusion and city influence. Journal of The Royal Society Interface. 2015;12:20150473. doi:10.1098/rsif.2015.0473
  • 25. Dredze M, García-Herranz M, Rutherford A, Mann G. Twitter as a source of global mobility patterns for social good. arXiv preprint arXiv:1606.06343. 2016.
  • 26. Zagheni E, Garimella VRK, Weber I, et al. Inferring international and internal migration patterns from twitter data. In: Proceedings of the 23rd International Conference on World Wide Web. ACM; 2014. p. 439–444.
  • 27. Aswad F, Menezes R. Refugee and Immigration: Twitter as a Proxy for Reality. In: The Thirty-First International Florida Artificial Intelligence Research Society Conference (FLAIRS-31). AAAI Publications; 2018. p. 17627.
  • 28. Hausman R, Hinz J, Yildirim MA. Measuring Venezuelan emigration with Twitter. Kiel Working Paper, No. 2106, Kiel Institute for the World Economy (IfW), Kiel; 2018. Available from: http://hdl.handle.net/10419/17912.
  • 29. Blanford JI, Huang Z, Savelyev A, MacEachren AM. Geo-located tweets. Enhancing mobility maps and capturing cross-border movement. PLoS ONE. 2015;10:e0129202. doi:10.1371/journal.pone.0129202
  • 30. Arribas-Bel D. The spoken postcodes. Regional Studies, Regional Science. 2015;2:458–461. doi:10.1080/21681376.2015.1067151
  • 31. Lamanna F, Lenormand M, Salas-Olmedo MH, Romanillos G, Gonçalves B, Ramasco JJ. Immigrant community integration in world cities. PLoS ONE. 2018;13:e0191612. doi:10.1371/journal.pone.0191612
  • 32. Stepanova E. The role of information communication technologies in the “arab spring”. Ponars Eurasia. 2011;15:1–6.
  • 33. Lenormand M, Picornell M, Cantú-Ros OG, Tugores A, Louail T, Herranz R, et al. Cross-checking different sources of mobility information. PLoS ONE. 2014;9:e105184. doi:10.1371/journal.pone.0105184
  • 34. Mislove A, Lehmann S, Ahn YY, Onnela JP, Rosenquist JN. Understanding the demographics of twitter users. In: Fifth international AAAI conference on weblogs and social media; 2011.
  • 35. Bokányi E, Kondor D, Dobos L, Sebők T, Stéger J, Csabai I, et al. Race, religion and the city: twitter word frequency patterns reveal dominant demographic dimensions in the United States. Palgrave Communications. 2016;2:16010. doi:10.1057/palcomms.2016.10
  • 36. Sloan L. Who tweets in the United Kingdom? Profiling the Twitter population using the British social attitudes survey 2015. Social Media & Society. 2017;3:2056305117698981.
  • 37. Lenormand M, Tugores A, Colet P, Ramasco JJ. Tweets on the road. PLoS ONE. 2014;9:e105407. doi:10.1371/journal.pone.0105407
  • 38. Mazzoli M, Molas A, Bassolas A, Lenormand M, Colet P, Ramasco JJ. Field theory for recurrent mobility. in press. 2019;XX:XX.
  • 39. Bilsborrow RE, Hugo G, Oberai AS, et al. International migration statistics: Guidelines for improving data collection systems. Geneva, Switzerland: International Labour Organization; 1997.
  • 40. United Nations, Economic Commission for Europe, Committee on Environmental Policy. Principles and Recommendations for Population and Housing Censuses, Revision 2. Geneva, Switzerland: United Nations Publications; 2008.
  • 41. Hughes C, Zagheni E, Abel GJ, Sorichetta A, Wiśniowski A, Weber I, et al. Inferring Migrations: Traditional Methods and New Approaches based on Mobile Phone, Social Media, and other Big Data: Feasibility study on Inferring (labour) mobility and migration in the European Union from big data and social media data. 2016.
  • 42. Migration Data Portal. Migration data sources; 2019. https://migrationdataportal.org/themes/migration-data-sources.
  • 43. UNHCR website. Number of refugees and migrants from Venezuela reaches 3 million; 2018. https://www.unhcr.org/news/press/2018/11/5be4192b4/number-refugees-migrants-venezuela-reaches-3-million.html?query=venezuela.
  • 44.UNHCR website. R4V América Latina y el Caribe, refugiados y migrantes venezolanos en la región—Enero 2019; 2019. https://data2.unhcr.org/es/documents/details/68070.
  • 45. IOM website. Migration trends in the Americas; 2018. https://www.iom.int/venezuela-migration-trends-americas-september-2018.
  • 46. Federal Police of Brazil 2018; 2018. http://www.casacivil.gov.br/central-de-conteudos/noticias/2018/dezembro/comite-federal-apresenta-balanco-de-acoes-de-acolhimento-de-venezuelanos.
  • 47. Federal Police of Brazil 2019; 2019. http://www.pf.gov.br/servicos-pf/imigracao/apresentcao-policia-federal-ate-abril-de-2019.pdf.
  • 48. Cesare N, Lee H, McCormick T, Spiro E, Zagheni E. Promises and pitfalls of using digital traces for demographic research. Demography. 2018;55:1979–1999. doi:10.1007/s13524-018-0715-2
  • 49. Sloan L, Morgan J. Who tweets with their location? Understanding the relationship between demographic characteristics and the use of geoservices and geotagging on Twitter. PLoS ONE. 2015;10:e0142209. doi:10.1371/journal.pone.0142209
  • 50. Lenormand M, Louail T, Cantú-Ros OG, Picornell M, Herranz R, Arias JM, et al. Influence of sociodemographic characteristics on human mobility. Scientific Reports. 2015;5:10075. doi:10.1038/srep10075
  • 51. Documentation on the Twitter access API. https://developer.twitter.com/en/docs.
  • 52. Joint UNHCR-IOM press release: Venezuelan outflow continues unabated, stands now at 3.4 million. https://www.unhcr.org/ph/15238-venezuelan-outflow-continues-unabated-stands-now-at-3-4-million.html.
  • 53. International Organization for Migration UN. Glossary on Migration. Geneva, Switzerland: IOM; 2019. Available from: https://publications.iom.int/system/files/pdf/iml_34_glossary.pdf.
  • 54. Instituto Nacional de Estadística de Venezuela. Censo de Población y Vivienda de Venezuela 2011; 2011. http://www.redatam.ine.gob.ve/Censo2011/index.html.
  • 55. United Nations Population Division DESA. World Population Prospects 2017; 2017. https://population.un.org/wpp/Download/Standard/Population/.
  • 56. OpenStreetMap; 2019. https://www.openstreetmap.org/copyright.
  • 57. Mackay DJC. Information Theory, Inference and Learning Algorithms. Cambridge, UK: Cambridge University Press; 2003.
  • 58. UNICEF: A Call to Action: Protecting children on the move starts with better data. https://data.unicef.org/resources/call-action-protecting-children-move-starts-better-data/.
  • 59. Bromberg Gaber Y. Collecting by geographic location. https://gwu-libraries.github.io/sfm-ui/posts/2017-04-12-geographic-collecting.
  • 60. Hübl F, Cvetojevic S, Hochmair H, Paulus G. Analyzing refugee migration patterns using geo-tagged tweets. ISPRS International Journal of Geo-Information. 2017;6:302. doi:10.3390/ijgi6100302

Decision Letter 0

Jordi Paniagua

11 Dec 2019

PONE-D-19-28022

Migrant mobility flows characterized with digital data

PLOS ONE

Dear Mr. MAZZOLI,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers consider that your manuscript has merits to be considered for publication. However, they also observe some minor issues that should be addressed. Please respond to all reviewers' comments.

We would appreciate receiving your revised manuscript by Jan 25 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Jordi Paniagua

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please remove your figures from within your manuscript file, leaving only the individual TIFF/EPS image files, uploaded separately. These will be automatically included in the reviewers’ PDF.

3. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.

Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services.  If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

Upon resubmission, please provide the following:

  • The name of the colleague or the details of the professional service that edited your manuscript

  • A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

  • A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

4. Please clarify whether there was any ethical oversight over the study, and whether the authors had access to any identifying information.

5. We note that Figures 2 and 5 in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1.    You may seek permission from the original copyright holder of Figure(s) [#] to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2.    If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper analyzes a relevant social issue, which is migration under humanitarian crisis. It is a valuable contribution from the computational social science perspective, and shows that the usage of data from social media can help to real-time track migration events. The paper is clearly written and the methodology is well described, so I think that it can be a valuable contribution for PLOS ONE.

However, I have a couple of minor issues and doubts which I suggest that the authors address:

1) In Fig. 1 and Fig. 5, I think that the correlation values in term of the R^2 do not provide a clear highlight when the distribution of values seems to obey a power law. This R^2 is probably mainly dominated by the first 2 or 3 countries. I suggest to remove this indications, or either justify their usage, or either compute them in a logarithmic scale.

2) It would be interesting to point out that this methodology does not fit well for some countries in the South Cone which are further from Venezuela, like Chile (~290k "official" migrants vs ~800k? estimated) and Argentina (~130k "official" vs ~600k? estimated). This might be related to higher Twitter penetration in those countries, which promotes its usage by migrants; or to the correlation between distance travelled and socio-economic status, or to other factors that make the upscaling work incorrectly for those countries. I think that this issue deserves to be briefly discussed.

3) I did not understand the following phrase in the Discussion: "We could have used a stricter criterion and request two or more tweets abroad but this does not affect the average flows (the upscaling factors absorb it), although it notably enhances the statistical fluctuations".

As far as I understand, the scaling factor is computed as the ratio between Venezuela population and the amount of residents (TUV's). The criterion for detecting a migration situation does not affect any of the previous quantities. In this sense, if we apply a stricter "migration criterion" then the upscaled migration amount will be affected as well.

Reviewer #2: The authors propose a novel method for assessing and studying the phenomenon of migration using twitter geo-located data. The authors apply the method to the Venezuelan Migration crisis showing they are able to estimate the amounts of migrants in certain years. The estimates are compatible with those found by international organizations. Moreover, they provide a way to study in detail the geographic distribution of routes of migration.

I find the idea of using Twitter data for migrations quite appealing, despite the limitations this kind of data might have (even though the authors provide a discussion of such limitations in the conclusions). Researchers interested in migration patterns do not have always access to private data from mobile phone companies, and surveys made by international organizations might not have the desired level of detail for certain studies.

Hence, I would recommend the article for publication with minor revisions I am certain the authors will be able to address easily:

- in page 9 the authors propose a way to estimate the fluxes crossing the border of the Venezuelan country. However, I was not able to understand precisely how this is done (maybe due to my limited comprehension ability). Is it estimated by counting the number of vectors crossing the line? Are the authors able of following the trajectory of an individual and hence assess whether he crosses the border? Please rephrase it better in the text.

- The authors at page 9 makes distinction between migration patterns on land and by airplane. Of course the second ones belong to less disadvantaged individuals but still the flux might be relevant for migration studies. Do the authors think it would be possible to identify this kind of migrations as well?

- In the discussion section the authors state that the work has proven to be helpful for humanitarian agencies. Do they mean it has already been applied by these agencies for some of their studies? In this case, I would add a reference if available. Otherwise, I would state that the method "could be useful" or "have potential use for" these agencies.

- In general the work is interesting due to the fact the data used is publicly available. I would discuss a little bit about possible comparisons with private data in order to further validate the method.

After having addressed this minor comments, in my opinion the work will be ready for publication.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Mar 23;15(3):e0230264. doi: 10.1371/journal.pone.0230264.r002

Author response to Decision Letter 0


8 Jan 2020

Answers to reviewers' comments:

First of all, we would like to thank the reviewers for the positive and constructive comments.

Reviewer #1:

This paper analyzes a relevant social issue, which is migration under humanitarian crisis. It is a valuable contribution from the computational social science perspective, and shows that the usage of data from social media can help to real-time track migration events. The paper is clearly written and the methodology is well described, so I think that it can be a valuable contribution for PLOS ONE.

However, I have a couple of minor issues and doubts which I suggest that the authors address:

1) In Fig. 1 and Fig. 5, I think that the correlation values in term of the R^2 do not provide a clear highlight when the distribution of values seems to obey a power law. This R^2 is probably mainly dominated by the first 2 or 3 countries. I suggest to remove this indications, or either justify their usage, or either compute them in a logarithmic scale.

The log-log scale in the representation is only a matter of convenience, since the data points span several orders of magnitude. Both figures show an identity relation, as we are comparing expected versus estimated flows. The important question is whether the points fall on the diagonal and, consequently, the analysis of the “goodness” of fit must be done on a linear scale to make sense. We cannot expect any scaling law other than linearity, because that would imply that the model does not reproduce well the numbers observed in the official data and that a systematic bias has been introduced. We added a note to the captions of Fig. 1 and 5 to clarify this point.
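As a rough illustration of this point (not the code used for the figures), the goodness of fit against the identity line can be computed on both the linear and the logarithmic scale. The function name `r2_identity` and all flow values below are hypothetical and serve only to show the calculation.

```python
import math

def r2_identity(observed, estimated):
    """Coefficient of determination of `estimated` against the line y = x."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical flows (number of migrants) spanning several orders of magnitude.
official = [1_000_000, 300_000, 50_000, 5_000, 500]
estimated = [950_000, 330_000, 40_000, 7_000, 300]

# Linear scale: absolute deviations from the diagonal.
r2_linear = r2_identity(official, estimated)

# Logarithmic scale: relative deviations from the diagonal.
r2_log = r2_identity([math.log10(v) for v in official],
                     [math.log10(v) for v in estimated])

print(f"R^2 linear: {r2_linear:.3f}, R^2 log: {r2_log:.3f}")
```

On the linear scale the residuals of the largest flows carry most of the weight, whereas the logarithmic version weights relative errors equally across magnitudes; which of the two is appropriate depends on the question being asked.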

2) It would be interesting to point out that this methodology does not fit well for some countries in the South Cone which are further from Venezuela, like Chile (~290k "official" migrants vs ~800k? estimated) and Argentina (~130k "official" vs ~600k? estimated). This might be related to higher Twitter penetration in those countries, which promotes its usage by migrants; or to the correlation between distance travelled and socio-economic status, or to other factors that make the upscaling work incorrectly for those countries. I think that this issue deserves to be briefly discussed.

We thank the reviewer for noticing this distortion. This is indeed an important observation. We think that two factors may be at play. On the one hand, as the reviewer states, the upscaling factor is probably affected by the different Twitter penetration rate in these countries. From a social perspective, migrants moving to other countries may conform to the local culture in the way social platforms are used. However, this change of habits takes time and, in some cases, may even take generations. On the other hand, the difference could be this large because of a misrepresentation of Venezuelan migrants in the official statistics. Note what happened in Brazil, where the official numbers from the UN organizations are well below those from the Federal Police and ours (Table 2). This is why it is so important to get information from additional sources. As we see from Figure 2, the routes of migration from Venezuela extend up to Santiago and Buenos Aires. We added a paragraph mentioning this issue in the Discussion Section.

3) I did not understand the following phrase in the Discussion: "We could have used a stricter criterion and request two or more tweets abroad but this does not affect the average flows (the upscaling factors absorb it), although it notably enhances the statistical fluctuations". As far as I understand, the scaling factor is computed as the ratio between Venezuela population and the amount of residents (TUV's). The criterion for detecting a migration situation does not affect any of the previous quantities. In this sense, if we apply a stricter "migration criterion" then the upscaled migration amount will be affected as well.

As the reviewer observed it is true that “the scaling factor is computed as the ratio between Venezuela population and the amount of residents (TUV's)”. The sentence "We could have used a stricter criterion and request two or more tweets abroad but this does not affect the average flows (the upscaling factors absorb it), although it notably enhances the statistical fluctuations" is intended to say that if we want to use a stricter criterion to classify people abroad as migrants, we should be consistent on the residents classification as well. In this sense, if we want to check for two consecutive tweets abroad in a specific country, in order not to lose the consistency, we should ask for residents to tweet at least twice in that year to consider them as active. On the other hand, by doing this, one notably reduces the sampling of the data and narrows the statistics, hence we discarded this option. In order to make the text clearer, we added this reflection to the above sentence.
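The consistency argument can be sketched with a toy calculation. The snippet below is only an illustration under stated assumptions: the user identifiers, tweet counts, population value and the helper names `active_users` and `upscaled_migrants` are all hypothetical, and the same minimum number of tweets is applied both to the residents and to the users detected abroad.

```python
def active_users(tweets_per_user, min_tweets=1):
    """Count users with at least `min_tweets` tweets."""
    return sum(1 for n in tweets_per_user.values() if n >= min_tweets)

def upscaled_migrants(population, home_tweets, abroad_tweets, min_tweets=1):
    """Upscale detected migrants using the same activity criterion on both sides."""
    residents = active_users(home_tweets, min_tweets)
    migrants = active_users(abroad_tweets, min_tweets)
    factor = population / residents  # upscaling factor: population per Twitter resident
    return factor * migrants, residents, migrants

# Toy data (hypothetical): tweets per user in the home country and abroad.
POPULATION = 1000
HOME_TWEETS = {"a": 5, "b": 1, "c": 2, "d": 1}
ABROAD_TWEETS = {"a": 3, "c": 1}

loose = upscaled_migrants(POPULATION, HOME_TWEETS, ABROAD_TWEETS, min_tweets=1)
strict = upscaled_migrants(POPULATION, HOME_TWEETS, ABROAD_TWEETS, min_tweets=2)

print(loose)   # (500.0, 4, 2)
print(strict)  # (500.0, 2, 1)
```

The toy numbers are chosen so that the two estimates coincide exactly; with real data they would be expected to agree only on average, while the stricter criterion leaves fewer users in both counts and therefore larger fluctuations.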

Reviewer #2:

The authors propose a novel method for assessing and studying the phenomenon of migration using twitter geo-located data. The authors apply the method to the Venezuelan Migration crisis showing they are able to estimate the amounts of migrants in certain years. The estimates are compatible with those found by international organizations. Moreover, they provide a way to study in detail the geographic distribution of routes of migration. I find the idea of using Twitter data for migrations quite appealing, despite the limitations this kind of data might have (even though the authors provide a discussion of such limitations in the conclusions). Researchers interested in migration patterns do not have always access to private data from mobile phone companies, and surveys made by international organizations might not have the desired level of detail for certain studies. Hence, I would recommend the article for publication with minor revisions I am certain the authors will be able to address easily:

- in page 9 the authors propose a way to estimate the fluxes crossing the border of the Venezuelan country. However, I was not able to understand precisely how this is done (maybe due to my limited comprehension ability). Is it estimated by counting the number of vectors crossing the line? Are the authors able of following the trajectory of an individual and hence assess whether he crosses the border? Please rephrase it better in the text.

We thank the reviewer for pointing out that it was unclear which quantities our methods estimate. In the previous part of the manuscript, a previously classified Venezuelan resident was defined as a migrant when at least one of their tweets was detected in a second country, as proof of border crossing (Fig. 1 and Table 2). We added this explanation in the Validation of external flows subsection, lines 246 and following. From line 291, on the other hand, we introduce a different method, which requires a stricter sampling. We now want to assess whether migrants moving on the ground crossed a specific line along their trajectory. The results of this new measure are depicted in Figure 3 and Table 3. To be sure that they crossed the dashed line, we require at least one tweet on each side of it. The vectorial depiction is a way to characterize the general direction of movement, but it is not used to count the number of crossings. We added a clarification regarding this measure in lines 293 and following.
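The crossing criterion can be sketched as follows. This is a minimal illustration rather than the procedure used for Figure 3 and Table 3: it assumes that the dashed line can be treated as a straight line through two points, that longitude and latitude can be handled as planar coordinates over the area of interest, and the function names and coordinates are hypothetical.

```python
def side_of_line(point, a, b):
    """Return +1 or -1 for the two sides of the line through a and b, 0 if on it."""
    cross = (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])
    return (cross > 0) - (cross < 0)

def crossed_line(tweets, a, b):
    """True if the user has at least one tweet on each side of the line."""
    sides = {side_of_line(p, a, b) for p in tweets}
    return 1 in sides and -1 in sides

# Hypothetical line and tweet locations, given as (longitude, latitude).
LINE_A, LINE_B = (0.0, 0.0), (0.0, 10.0)
user_crossing = [(1.0, 5.0), (0.5, 4.0), (-1.0, 5.0)]
user_staying = [(1.0, 5.0), (2.0, 3.0)]

print(crossed_line(user_crossing, LINE_A, LINE_B))  # True
print(crossed_line(user_staying, LINE_A, LINE_B))   # False
```

Counting the users for whom `crossed_line` is true gives the number of detected crossings; the direction of each crossing could be recovered by comparing the sides of the earliest and latest tweets.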

- The authors at page 9 makes distinction between migration patterns on land and by airplane. Of course the second ones belong to less disadvantaged individuals but still the flux might be relevant for migration studies. Do the authors think it would be possible to identify this kind of migrations as well?

It is possible to detect air trips by finding tweets of the same user in two faraway places, separated by a time interval compatible with the speed of a flight (between 300 and 900 km/h). We found a few cases in our data, but they are not enough to do proper statistics. One can always use more relaxed criteria, such as assuming that tweets posted from distant locations are footprints of air displacements regardless of the time between them. However, this can lead to false positives, such as people who traveled on the ground by car or bus and never tweeted along the route.
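A minimal sketch of the speed criterion is given below. It assumes that each tweet is available as a tuple of Unix timestamp, latitude and longitude, and it uses the great-circle (haversine) distance between the two positions; the function names and the example timestamps are hypothetical, and only the 300 to 900 km/h window comes from the text above.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def looks_like_flight(tweet_a, tweet_b, min_kmh=300.0, max_kmh=900.0):
    """Check whether two tweets (timestamp_s, lat, lon) imply a flight-like speed."""
    hours = (tweet_b[0] - tweet_a[0]) / 3600.0
    if hours <= 0:
        return False
    speed = haversine_km(tweet_a[1], tweet_a[2], tweet_b[1], tweet_b[2]) / hours
    return min_kmh <= speed <= max_kmh

# Approximate coordinates of Caracas and Bogota; the timestamps are made up.
caracas = (0, 10.48, -66.90)
bogota_2h_later = (2 * 3600, 4.71, -74.07)
bogota_1d_later = (24 * 3600, 4.71, -74.07)

print(looks_like_flight(caracas, bogota_2h_later))  # True
print(looks_like_flight(caracas, bogota_1d_later))  # False
```

The second pair illustrates the ambiguity mentioned above: a day between the two tweets is compatible with both a flight and a ground trip, so the strict speed window rejects it.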

- In the discussion section the authors state that the work has proven to be helpful for humanitarian agencies. Do they mean it has already been applied by these agencies for some of their studies? In this case, I would add a reference if available. Otherwise, I would state that the method "could be useful" or "have potential use for" these agencies.

It should be noted that some of the authors belong to UNICEF; specifically, they are based in Brasilia and New York. Some of them are operatives, and our results and data were discussed during the decision-making process regarding the Venezuelan crisis in Brazil and other nearby countries. This statement was included by them during the writing process, and the rest of the authors have no reason to consider it false.

In particular, the insights from this research helped UNICEF to take a broader view of the scale of the migration problem, beyond the border with Venezuela to which it had previously been limited. Based on this, the UNICEF team moved into looking at (i) how to integrate the humanitarian response into our regular program of cooperation, especially our Municipal Seal of Approval; and (ii) expanding the reach of an AI-inspired project on xenophobia beyond the State of Roraima, close to the border.

- In general the work is interesting due to the fact the data used is publicly available. I would discuss a little bit about possible comparisons with private data in order to further validate the method.

We have added a paragraph in the Discussion section commenting on the possible comparisons that one could make with other data sources like private data.

After having addressed this minor comments, in my opinion the work will be ready for publication.

Attachment

Submitted filename: answers_reviewers.pdf

Decision Letter 1

Jordi Paniagua

26 Feb 2020

Migrant mobility flows characterized with digital data

PONE-D-19-28022R1

Dear Dr. MAZZOLI,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Jordi Paniagua

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: All my previous comments have been sufficiently addressed. Therefore, I recommend this article for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Jordi Paniagua

6 Mar 2020

PONE-D-19-28022R1

Migrant mobility flows characterized with digital data

Dear Dr. Mazzoli:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jordi Paniagua

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: answers_reviewers.pdf

    Data Availability Statement

    The manuscript contains all the information needed to download the data and to replicate the analysis.

    In this work, we use several data sources: Geolocated Twitter, UNHCR, IOM and Federal Police of Brazil. The access links are included as references in the manuscript. In particular, the Twitter API in Ref. [51], the UNHCR in Refs. [43, 44, 52], the UN population division DESA [55], the IOM at [45], the Federal Police of Brazil at [46, 47] and the Venezuelan census [54].

    For Twitter, the geolocated data is downloaded using the streaming API. Every user can access this information by following the instructions of the updated user guide on the Twitter streaming API provided at https://developer.twitter.com/en/docs. We include next a working example of the Python code needed to query the Twitter streaming API in a limited geographical BOX enclosed between latitudes y0 and y1, and longitudes x0 and x1. This code is only illustrative; the commands can be changed at any moment by the Twitter developers.

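    # Illustrative only: written against the tweepy 3.x interface, in which
    # StreamListener is a separate class; later releases changed these classes.
    from tweepy import Stream, OAuthHandler
    from tweepy.streaming import StreamListener

    # Credentials of a Twitter developer account (left blank on purpose).
    CONSUMER_KEY = ''
    CONSUMER_SECRET = ''
    ACCESS_KEY = ''
    ACCESS_SECRET = ''

    # Bounding box: replace x0, y0 (south-west longitude, latitude) and
    # x1, y1 (north-east longitude, latitude) with numeric values.
    BOX = [x0, y0, x1, y1]

    class MyStreamListener(StreamListener):
        def on_status(self, status):
            # Called for every tweet received; here it is simply printed.
            print(status)

    if __name__ == '__main__':
        auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
        listen = MyStreamListener()
        stream = Stream(auth, listen, gzip=True)
        stream.filter(locations=BOX)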


    Articles from PLoS ONE are provided here courtesy of PLOS
