PLOS One. 2020 Jun 30;15(6):e0235224. doi: 10.1371/journal.pone.0235224

A method to estimate population densities and electricity consumption from mobile phone data in developing countries

Hadrien Salat 1,2,*, Zbigniew Smoreda 2, Markus Schläpfer 1
Editor: Shihe Fu 3
PMCID: PMC7326166  PMID: 32603345

Abstract

High quality census data are not always available in developing countries. Instead, mobile phone data are becoming a popular proxy to evaluate the density, activity and social characteristics of a population. They offer additional advantages: they are updated in real time, include mobility information and record visitors’ activity. However, we show with the example of Senegal that the direct correlation between the average phone activity and both the population density and the nighttime lights intensity may be too low to provide an accurate representation of the situation. There are reasons to expect this, such as the heterogeneity of the market share or the particular granularity of the distribution of cell towers. In contrast, we present a method based on the daily, weekly and yearly phone activity curves and on the network characteristics of the mobile phone data that allows such information to be estimated more accurately without compromising people’s privacy. This information can be vital for development and infrastructure planning. In particular, this method could help to significantly reduce the logistic costs of data collection in the particularly budget-constrained context of developing countries.

Introduction

Under certain conditions, mobile phone data make it possible to recover a map of the population and can potentially simplify the logistics of census data collection [1–4]. This could prove particularly useful in developing countries where such costs cannot be overlooked. However, these approaches have only been validated in developed countries, where the detailed fine-grained data needed to train the models are comparatively easier to access. Furthermore, a primary objective of population mapping is to inform infrastructure planning. In that respect, mobile phone data have a number of advantages over a simple population count. They represent some notion of intensity of activity, include dynamic real-time usage information and contain mobility patterns. For example, significant results have been obtained for the prediction of short-term population dynamics inside cities [5, 6] and for the prediction of detailed socioeconomic characteristics of users from metadata [7–9]. However, these methods once again require a large amount of fine-grained data to train the models, down to the individual level with its associated privacy concerns, or may require additional sources of data such as satellite images.

Building on these pioneering studies, we propose a new method, based on the daily, weekly and yearly phone activity curves and on the network characteristics of the mobile phone data (such as node centrality or the ratio of incoming to outgoing traffic), that is focused on the specific context of data scarcity in developing countries and that does not require information about identified individuals. Some earlier work has revealed mobile phone data’s particular potential for predicting electricity demand [10, 11]. In sub-Saharan Africa, electrification rates remain extremely low, without much optimism for a rapid improvement of the situation [12–15]. In 2013 in Senegal, when the last census was collected, the average electrification rate in rural areas was as low as 24%. Paradoxically, mobile phones have still found their way into the homes of about 75% of the population in these same rural areas. In fact, some studies praise the large coverage achieved by mobile phones in the entire African region [16, 17]. Our aim is to use the resulting data to guide efficient manual data collection and therefore reduce the logistic costs of gathering information for development and infrastructure planning in developing countries. We test the possibility of estimating census data from mobile phone data, and we evaluate whether electricity demand is better predicted from mobile phone data than from the population count. We first describe our method in detail in the materials and methods section, then validate it on a body of data that we have gathered for Senegal in 2013, including the aggregated call detail records from Sonatel, the leading mobile phone operator in Senegal (with 65% market share).

Materials and methods

The population density for each commune in Senegal is given by the 2013 census. There are 552 communes of irregular sizes according to the division provided (created in December 2013), including urban communes (communes de ville and communes d’arrondissement) and rural communes (communautés rurales). This information was collected using a door-to-door approach over the entire country, rather than by estimation. The distribution of population densities is close to a narrow Poisson distribution, with an average of 2162 inh./km² and a maximum of 54325 inh./km². The distribution is mapped over a rather precise shapefile that includes in particular the boundaries of all big and medium towns. We use mobile phone data provided by Sonatel, the largest Senegalese telecommunication operator, with about 65% market share in 2013. They contain the number of text messages, number of calls and total length of calls made during each hour between each of the operator’s 1666 communication towers during the year 2013. Out of these, 54 towers were inactive and have been removed from the analysis, leaving 1612 towers. To estimate the electricity consumption, we used NOAA’s average nighttime lights intensity for the year 2013. The intensity is given as a number between 0 and 63 for each cell of a 30 arc-second grid. Since Senegal is close to the equator, this grid is regular and its cells measure about 1 km by 1 km. NOAA has cleaned these data of interference from moonlight, clouds, etc. to the best of their ability.

We have produced two different levels of aggregation. The first one is the commune level. It is aimed at keeping the population counts as accurate as possible. The second consists of Voronoi cells around each tower. It is aimed at keeping the mobile phone data as precise as possible. In this case, the population count inside each Voronoi cell has been estimated from the intersection between the Voronoi cell and the communes, assuming a uniform distribution inside each commune. Since the small high and medium density communes are finely separated from the big low density communes in the geographical data, this assumption seems reasonable. The nighttime light intensity is averaged inside each Voronoi cell or commune. If a pixel is only partially included in the cell, its intensity is weighted in the average by the area actually included in the cell. The end result is two tables containing the population count per square kilometre, the average number of texts, average number of calls and average total call length per hour per square kilometre, and the nighttime light intensity per pixel, inside each of the 552 communes and each of the 1612 Voronoi cells.
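The area-weighted aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analysis was performed in R): the function names, the dictionary layout and the toy numbers are assumptions made for illustration, and the overlap areas are taken as given rather than computed from the shapefile.

```python
def cell_population(overlaps, commune_density):
    """Population of one Voronoi cell under the uniform-distribution assumption.

    overlaps: {commune id: area in km2 shared by the cell and that commune}
    commune_density: {commune id: census density in inh./km2}
    """
    return sum(area * commune_density[c] for c, area in overlaps.items())


def weighted_light(pixels):
    """Area-weighted mean nighttime light intensity of a cell.

    pixels: list of (intensity, area of the pixel lying inside the cell).
    """
    total_area = sum(area for _, area in pixels)
    return sum(value * area for value, area in pixels) / total_area


# Hypothetical cell overlapping a rural commune "A" and an urban commune "B".
overlaps = {"A": 2.0, "B": 0.5}
density = {"A": 100.0, "B": 4000.0}

population = cell_population(overlaps, density)      # 2.0*100 + 0.5*4000
cell_density = population / sum(overlaps.values())   # inhabitants per km2
light = weighted_light([(10.0, 1.0), (30.0, 0.5)])   # one full, one half pixel
```

In this toy case the cell density is dominated by the small urban overlap, which is why the uniform-distribution assumption matters most where a cell straddles communes of very different densities.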

For the record, we also computed the total value of each variable inside the cells instead of their density per square kilometre. We found consistently better correlations between densities than between total values. The results for the total values are therefore not shown in this paper. In addition, since nighttime lights and residential locations give a better picture of night activity than of day activity, we isolated the texts and calls made between 7 p.m. and 7 a.m. Finally, the communes have been further divided into 444 “low density” communes and 108 “high density” communes, corresponding to a density lower or higher than 1000 inhabitants per pixel, to distinguish between mainly rural and mixed or purely urban areas. This threshold is arbitrary, but commonly used, for example by the Food and Agriculture Organization (FAO) of the United Nations and the Global Rural Urban Mapping Project (GRUMP). Similarly, the Voronoi cells have been divided into 1027 “low density” cells and 639 “high density” cells according to the same criteria.

The direct correlations shown in the results are simple squared Pearson coefficients. To exploit the data further, we compute curves representing, for each hour of the day, week and year, the average number of texts, number of calls and total call length for each tower site. These 9 types of curves represent the local phone usage profiles. They are used to generate matrices, called distance matrices, based on two measures of how point-by-point “parallel” two curves are: the point-by-point correlation, and the standard deviation of the point-by-point distances between the two curves. Here, each point represents one hour contained in a day, week or year and “point-by-point” means that we are comparing the values of the curves at each hour. This gives a total of 18 distance matrices. We also exploit the characteristics of the data’s network structure. We transform it into weighted directed graphs averaged over the entire year. An edge is created between two cell towers if the daily activity is above a predefined threshold. We use five thresholds: 0, 75, 150, 300 and 600, based on the histogram of the activity between pairs of nodes (approximately an inverse power law of exponent 1.81). We then create feature matrices recording for all nodes their degree, betweenness and closeness centrality measures (both weighted and unweighted), the ratio of self-loops to the total traffic, the ratio of incoming to outgoing traffic and the average distance travelled by a text message or call. With this process, we obtain an additional 15 feature matrices.
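As an illustration of the two curve-based measures, the following Python sketch builds a distance matrix from hourly activity curves. The exact formulas are not given in the text, so this is one plausible reading and an assumption: the correlation-based distance is taken as one minus the Pearson coefficient, and the second measure as the population standard deviation of the hour-by-hour differences. All names and the toy curves are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation between two curves sampled at the same hours."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_distance(a, b):
    """Assumed form: 0 for perfectly correlated curves, 2 for opposite ones."""
    return 1.0 - pearson(a, b)


def spread_distance(a, b):
    """Standard deviation of the point-by-point differences.

    It is 0 when one curve is a vertical shift of the other ("parallel").
    """
    return pstdev([x - y for x, y in zip(a, b)])


def distance_matrix(curves, dist):
    """Square matrix of pairwise distances between tower curves."""
    n = len(curves)
    return [[dist(curves[i], curves[j]) for j in range(n)] for i in range(n)]


# Toy curves with four "hours" each.
tower_a = [1.0, 2.0, 4.0, 3.0]
tower_b = [3.0, 4.0, 6.0, 5.0]  # tower_a shifted up by 2: parallel
tower_c = [4.0, 3.0, 1.0, 2.0]  # tower_a mirrored: opposite profile

matrix = distance_matrix([tower_a, tower_b, tower_c], spread_distance)
```

Applying each measure to each of the 9 curve types is what would yield the 18 distance matrices mentioned above.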

As a reference, the hourly curves for the number of calls aggregated at national level for each day of the year are represented in Fig 1(a). There is one colour per month ranging from reds to yellows to greens to blues. The yearly average of number of texts per hour of the day sent from each tower is shown in random colours in Fig 1(b). The network structure in January limited to edges corresponding to at least 2000 text messages sent is represented in Fig 1(c).

Fig 1. Mobile phone activity profiles and network structure.

(a) Number of calls per hour aggregated at national level for each day of the year. (b) Yearly average of the number of texts per hour of the day sent from each tower. (c) Network structure limited to edges corresponding to at least 2000 text messages sent in January.

Our proposed method consists in trying to rebuild the original dataset from a sample as small as possible using hierarchical clustering of some of the 33 previously created matrices. The working hypothesis is that similar locations will share in particular similar phone activity habits and have a similar place in the communication network. In the first step, we build a dendrogram from the distance and feature matrices using the hclust hierarchical clustering algorithm implemented in R. Assuming that we know the population density or nighttime lights intensity for a number of reference towers, the values of all the other towers are predicted from the proximity of their activity curve or network characteristics to the activity curves or network characteristics of the reference towers. Specifically, the value for a non-reference tower is set to be equal to that of the closest reference tower for the chosen distance. We compare the performance of samples of reference towers chosen randomly with samples chosen according to the clustering tree.
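The prediction rule stated above (each non-reference tower takes the value of its closest reference tower) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' R implementation; the function name, the matrix and the reference values are made up for the example.

```python
def predict_from_references(dist, reference_values):
    """Nearest-reference prediction.

    dist: square distance matrix between towers (list of lists).
    reference_values: {tower index: known density or light intensity}.
    Returns one predicted value per tower.
    """
    refs = sorted(reference_values)
    predicted = []
    for tower in range(len(dist)):
        if tower in reference_values:
            # Reference towers keep their measured value.
            predicted.append(reference_values[tower])
        else:
            nearest = min(refs, key=lambda r: dist[tower][r])
            predicted.append(reference_values[nearest])
    return predicted


# Hypothetical distances between four towers; towers 0 and 3 are references.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.6],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.6, 0.2, 0.0],
]
known = {0: 150.0, 3: 5200.0}
estimates = predict_from_references(dist, known)
```

Here tower 1 inherits the value of tower 0 and tower 2 inherits the value of tower 3, since those are their closest references for this distance.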

To choose a sample according to the clustering tree, we first cut the tree at a chosen depth, then generate the sample by including one randomly chosen leaf per resulting branch. By construction, when the selected depth is increased, the sample size is also increased. In particular, the depth can be chosen to match a desired sample size. To illustrate this process, consider the tree corresponding to the daily number of texts used above. It is shown in Fig 2(a). Each branch is identified by a binary number recording the left turns (indicated by a 0) and right turns (indicated by a 1) that are necessary to reach it while traversing the tree from the top. Five illustrative clusters, identified by a colour code, are plotted over a map of Senegal in Fig 2(b). We can observe that the blue cluster identifies mostly rural areas, the orange one is mixed, while the other clusters identify only cities. For a chosen depth, equivalent to the length of the binary number, we select one random leaf in each induced cluster to populate the sample. Examples are given in Fig 2(c). With a depth of 3 (dark blue), we obtain 0.5% of all towers, with a depth of 7 (medium blue), we obtain 3.3% of all towers, and with a depth of 19 (teal), 44.4% of all towers. Naturally, if a branch is reduced to a single leaf before the chosen depth is reached, this leaf is kept in the sample and the branch is not divided further. There may therefore be fewer elements in the sample than two to the power of the chosen depth.
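The tree-cutting procedure can be sketched as follows. This Python example is an illustration under stated assumptions, not the authors' code: the dendrogram is represented as nested (left, right) tuples with tower labels as leaves, a hypothetical stand-in for the merge structure returned by a hierarchical clustering routine such as R's hclust.

```python
import random


def leaves(tree):
    """All leaf labels below a node."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def cut_tree(tree, depth, path=""):
    """Branches obtained by cutting the tree at `depth`.

    Returns {binary path: leaves of that branch}, with "0" for a left turn
    and "1" for a right turn. A branch reduced to a single leaf before the
    depth is reached is kept as is, so its path is shorter than `depth`.
    """
    if not isinstance(tree, tuple):
        return {path: [tree]}
    if depth == 0:
        return {path: leaves(tree)}
    left, right = tree
    branches = cut_tree(left, depth - 1, path + "0")
    branches.update(cut_tree(right, depth - 1, path + "1"))
    return branches


def tree_guided_sample(tree, depth, rng):
    """One randomly chosen leaf per branch of the cut tree."""
    return {p: rng.choice(members) for p, members in cut_tree(tree, depth).items()}


# Hypothetical dendrogram over six towers.
tree = ((("t1", "t2"), "t3"), ("t4", ("t5", "t6")))
rng = random.Random(0)

branches = cut_tree(tree, 2)
sample = tree_guided_sample(tree, 2, rng)
deeper = tree_guided_sample(tree, 3, rng)
```

At depth 3 this toy tree yields 6 branches rather than 8, because "t3" and "t4" become single-leaf branches at depth 2, which mirrors the remark that the sample can be smaller than two to the power of the depth.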

Fig 2. Description of the tree cutting process.

(a) Tree corresponding to the daily number of texts. The branches are identified by a binary number. (b) Locations of the members of five clusters identified on the tree. (c) Locations of one randomly chosen element per branch of the tree cut at depth 3 (dark blue), 7 (medium blue) and 19 (cyan).

The census data for the year 2013 in Senegal can be directly accessed through the official website. The nighttime lights data can be accessed through NOAA’s open database. The mobile phone data at Voronoi and Commune level and aggregated over the year are available as part of the supplementary material (S1 and S2 Files). The identity of the callers has been removed and the exact location of the communication towers has been slightly modified for confidentiality reasons. In addition, a time series containing the number of calls per hour for the month of January is also available as part of the supplementary material (S3 File). To obtain the dataset over the entire year, one would need to contact Sonatel directly and present the research project that would require the data (contact: Mr El Hadji Birahim Gueye, Direction des Systèmes d’information Sonatel, ebgueye@orange-sonatel.com or post mail: Orange-Sonatel, 46 Boulevard de la République, BP 69 Dakar, Senegal). The analysis was performed using R.

This research uses data from Senegal. It was approved by the Senegalese Commission de Protection des Données Personnelles (private data protection commission) on the 13th of July 2015 as part of the “Data for Development (D4D)” project. This analysis only uses data that was de-identified by the mobile phone operator before accession by the authors.

Results and discussion

Table 1 reports the squared Pearson correlation coefficients (r²) of population density and nighttime lights versus the average daily number of texts, number of calls and total call length per square kilometre at tower and commune levels. The average hourly values of the number of text messages and calls and of the total call length per tower are only moderately correlated with the local population density at tower level in Senegal, with r² values of 0.4–0.6. The results are better at the much coarser commune level, with values between 0.6 and 0.8. The mobile phone variables are also moderately correlated with the nighttime light intensity, with values around 0.4 at tower level and 0.65 at commune level. Reducing the mobile phone activity to nighttime activity does improve the correlations for calls, but not for text messages.

Table 1. Population density (Pop.) and nighttime lights (Elec.) correlations (r²) with number of texts, calls and total call length daily average values (per km²) at tower and commune levels.

(n) means that only the activity between 7 p.m. and 7 a.m. was included. For each case, three results are given: all areas included, only low density areas included and only high density areas included. All p-values are < e−15.

Voronoi cells around towers
                      Texts   Calls   Length   Texts (n)   Calls (n)   Length (n)
Pop. all              0.43    0.45    0.46     0.47        0.62        0.60
Pop. low density      0.26    0.25    0.26     0.27        0.23        0.24
Pop. high density     0.24    0.25    0.26     0.29        0.46        0.43
Elec. all             0.39    0.41    0.44     0.39        0.43        0.45
Elec. low density     0.64    0.33    0.37     0.60        0.27        0.30
Elec. high density    0.20    0.19    0.21     0.20        0.21        0.23

Communes
                      Texts   Calls   Length   Texts (n)   Calls (n)   Length (n)
Pop. all              0.59    0.73    0.72     0.61        0.81        0.79
Pop. low density      0.85    0.93    0.92     0.84        0.92        0.91
Pop. high density     0.50    0.67    0.65     0.53        0.76        0.73
Elec. all             0.59    0.66    0.66     0.58        0.65        0.66
Elec. low density     0.59    0.65    0.65     0.58        0.62        0.62
Elec. high density    0.53    0.61    0.61     0.53        0.60        0.61

Several reasons explain why these relatively low values could be expected. Although Sonatel, the operator curating the mobile phone data, is the market leader, its market coverage is not uniform over the entire country. In addition, the only population data available has a relatively coarse spatial resolution. Unfortunately, these limitations cannot be avoided in the context of Senegal in 2013. Another limitation is the lack of precision of the Voronoi modelling, which does not take into account congestion and handoffs among the towers.

We now show how we can significantly improve these predictions with our method. We focus on the tower level, as this aggregation level is far more interesting for planning endeavours than the coarse commune level. Fig 3(a) reports the r² values of the population density distribution predicted from samples (expressed as a percentage of the entire distribution) guided by the dendrogram produced from the daily texts activity curves’ standard deviation. It is plotted using black dots against predictions from fully random samples of increasing size in grey squares. The boxes (lower and upper quartiles) and whiskers have been computed experimentally by repeating the process 30 times. Fig 3(b) repeats the process with electricity distributions. The maximum r² value achieved for each case by the direct correlations is indicated as a horizontal dashed line. Additional figures based on samples guided by some of the other best performing clustering trees are provided in the supplementary material (S1 Fig of S1 Appendix). A table comparing the performance of each of the 33 prepared clustering matrices is shown in S1 Table of S1 Appendix. Only the curve-based clustering is shown here since it consistently provided slightly better results than the network-based clustering for the particular phone usage habits in Senegal.

Fig 3. Recovering population density and nighttime lights from a sample of reference mobile phone activity curves.

(a) Population density. (b) Nighttime lights intensity. The grey boxes are for a sample randomly chosen among all curves. The red boxes are for a sample selection guided by the daily number of texts profile clustering tree. The best direct correlation from Table 1 for each case is represented as a horizontal dashed line. All four cases have been tested 30 times to build the boxes (lower and upper quartiles) and whiskers.

We can immediately observe that using the tree as a guide has a major impact on the quality of the results, especially for population density predictions. For example, in panel (a), the technique outperforms direct density correlations with samples as small as 30%, and reaches an r² of almost 1 for samples as small as 55%. The difference between the tree-guided sampling and the random sampling is smaller in panel (b), although using the curves rather than the average values to identify similarities significantly outperforms direct correlations even with fully random sampling of reference curves. Finally, note that the direct squared correlation between electricity and population is only 0.51. We can therefore obtain better results from the mobile phone activity than from the population density.

To gain some insight into how the clustering works, we show the average number of text messages sent per hour normalised by the total volume over the day in Fig 4(a) for a 4-cluster partition of the daily text messages standard deviation tree. Panel (b) shows the same content normalised by the phone traffic at 2 pm. We observe two effects: the green curve, corresponding mostly to low density areas, is more impacted than the red and blue curves during work-to-home travel time (4 to 7 pm) and at night. We can indeed hypothesise that the lack of electrification forces people to go to bed earlier in electricity-deprived low density areas. Note that one cluster made only of five odd towers has been omitted in the figure.

Fig 4. Average curve inside the clusters.

(a) Average daily activity curves normalised by the overall daily volume in 3 clusters from the daily texts standard deviation dendrogram. (b) Same average activity curves normalised by the activity at 2 pm.

Finally, potential errors introduced in the clustering by locally different habits in phone usage could be smoothed out by non-parametric methods such as kernel smoothing or weighted averages between several close-by reference towers. Alternatively, potential hidden biases specific to one type of usage (e.g. calls) that might not exist for another type of usage (e.g. texts) could be mitigated by averaging the results obtained from different distance matrices. In practice, we found that combining the results from several distance matrices did not improve the results compared to the most successful distance matrices considered alone. This averaging might still be necessary if no training set can be gathered to identify the most successful distance matrices for a different specific context. Another bias could come from a shift towards platforms such as WhatsApp and Facebook Messenger, or simply different usage among different age groups, leading to a possible under-estimation of certain demographics. However, since we are only establishing similarity of usage between towers, a shift in usage that does not break similarity between places does not impact the results. In particular, smartphones and mobile data are still too expensive for widespread usage in the area [17]. Internet users are therefore only going to be found in sufficiently large numbers in the richest areas, allowing the method to better differentiate these areas from other areas, rather than blurring the results. Similarly, if the variable ‘age distribution inside an area’ has a significant impact on phone usage, then the clustering will group together places with similar age distributions, so that these will be represented by adequate reference towers.

Conclusion

Our first important result is that the average mobile phone activity is not necessarily well correlated with the population density, an idea that became particularly tempting after the seminal work by Lu et al., who predicted population displacements after a natural disaster from mobile phone data [18]. With our methodology, we introduce new perspectives. We have shown that the population count of an entire census can be estimated from a substantially smaller sample of carefully selected locations with the help of the clustering trees and without requiring the call detail records of identified individuals. There are two main practical applications to this: a reduction in data collection costs to about one half in the best case scenarios and the possibility to keep tracking the changes in the population distribution between two census surveys. This technique could complement, for example, the approach by Lai et al., which shares the same aim but uses direct regressions from average values only [19]. Note that contrary to previous methods, the clustering relies solely on mobile phone data, without requiring a population training dataset. In addition, it can accommodate external information known a priori. It should be possible for example to use satellite data to subset potential reference locations into obviously low or obviously high density areas. This is a step in the direction proposed by the director of UN Global Pulse, Robert Kirkpatrick, who asserted that “the next phase in call-records research should be cost–benefit analyses that look at the investment needed to conduct a study, roll out an intervention and appraise the advantages for communities.” [20].

We appreciate that our method requires access to a sufficiently large mobile phone dataset and that Senegal is a leader in sub-Saharan Africa in terms of electrification rate, mobile phone penetration and census data collection. However, since we are not using the full mobile phone penetration rate, but only the 65% market share of Sonatel, we believe that there should be enough underlying data in many other developing countries to apply the method there. Beware, however, that in some cases, using a single operator with an inhomogeneous market share might introduce some important biases (cheaper providers may be chosen more by people who consume less electricity for example).

Supporting information

S1 File. Mobile phone data at Voronoi level and aggregated over the year.

The table contains an id of the Voronoi cell, the longitude and latitude coordinates of each Voronoi centre (slightly modified for privacy reasons), the average population density and nighttime light intensity per km2 inside the cell, and the number of text messages, calls and total call length per km2 for each cell.

(ZIP)

S2 File. Mobile phone data at Commune level and aggregated over the year.

Equivalent to the previous file, but at Commune level.

(ZIP)

S3 File. Time series of outgoing calls for each Voronoi cell in January.

The table contains a Voronoi cell id, a time stamp for each hour of the month and the number of outgoing calls during this hour in the cell.

(ZIP)

S1 Appendix. Additional methods and figures.

Alternative method to estimate nightlights intensity from approximate data, validation of the performance of the clustering process and additional figures.

(PDF)

Acknowledgments

The authors acknowledge Aike Steentoft for his guidance in choosing an adequate methodology to cluster the mobile phone activity curves.

Data Availability

The census data for the year 2013 in Senegal and the nighttime lights data can be accessed directly through the public databases of ANSD and NOAA (links provided in the article). The mobile phone data at Voronoi and Commune levels and aggregated over the year are available as part of the supplementary material. The identity of the callers has been removed and the exact location of the communication towers has been slightly modified for confidentiality reasons. In addition, a time series containing the number of calls per hour for the month of January is also available as part of the supplementary material. To obtain the dataset over the entire year, one would need to contact Sonatel directly and present the research project that would require the data (contact: Mr El Hadji Birahim Gueye, Direction des Systèmes d’information Sonatel, ebgueye@orange-sonatel.com or post mail: Orange-Sonatel, 46 Boulevard de la République, BP 69 Dakar, Senegal).

Funding Statement

H.S. was supported by the Orange Labs-Sonatel-ETH Singapore SEC Research Agreement No. H11283. M.S. acknowledges the Future Cities Laboratory at the Singapore-ETH Centre, which was established collaboratively between ETH Zurich and Singapore’s National Research Foundation (FI 370074016) under its Campus for Research Excellence and Technological Enterprise Programme.

References

  • 1. Ratti C, Frenchman D, Pulselli RM, S W. Mobile Landscapes: Using Location Data from Cell Phones for Urban Analysis. Environ Plann B. 2006;33(5):727–748.
  • 2. Deville P, Linard C, Martin S, Gilbert M, Stevens FR, Gaughan AE, et al. Dynamic population mapping using mobile phone data. Proc Natl Acad Sci USA. 2014;111(45):15888–15893. doi: 10.1073/pnas.1408439111
  • 3. Ricciato F, Craglia M, Widhalm P, Pantisano F, European Commission, Joint Research Centre, et al. Estimating population density distribution from network-based mobile phone data. Luxembourg: Publications Office; 2015.
  • 4. Vanhoof M, Reis F, Ploetz T, Smoreda Z. Assessing the quality of home detection from mobile phone data for official statistics. J Off Stat. 2018;34(4):935–960.
  • 5. Reades J, Calabrese F, Sevtsuk A, Ratti C. Cellular census: explorations in urban data collection. IEEE Pervas Comput. 2007;6(3):30–38.
  • 6. Louail T, Lenormand M, Cantu Ros OG, Picornell M, Herranz R, Frias-Martinez E, et al. From mobile phone data to the spatial structure of cities. Sci Rep UK. 2015;4(1).
  • 7. Blumenstock J, Cadamuro G, On R. Predicting poverty and wealth from mobile phone metadata. Science. 2015;350(6264):1073–1076.
  • 8. Steele JE, Sundsøy PR, Pezzulo C, Alegana VA, Bird TJ, Blumenstock J, et al. Mapping poverty using mobile phone and satellite data. J R Soc Interface. 2017;14(127):20160690. doi: 10.1098/rsif.2016.0690
  • 9. Jahani E, Sundsøy P, Bjelland J, Bengtsson L, Pentland AS, de Montjoye YA. Improving official statistics in emerging markets using machine learning and mobile phone data. EPJ Data Sci. 2017;6(1).
  • 10. Martinez-Cesena EA, Mancarella P, Ndiaye M, Schläpfer M. Using mobile phone data for electricity infrastructure planning; 2015. Available from: http://arxiv.org/abs/1504.03899.
  • 11. Selvarajoo S, Schläpfer M, Tan R. Urban electric load forecasting with mobile phone location data. In: Asian Conference on Energy, Power and Transportation Electrification (ACEPT). Singapore; 2018.
  • 12. Contreras Z. Modèle d’électrification rurale pour localités de moins de 500 habitants au Sénégal. Ministère de l’Energie et des Mines; 2006.
  • 13. Bernard T. Impact analysis of rural electrification projects in sub-Saharan Africa. The World Bank research observer. 2010;27(1):33–51.
  • 14. Sanoh A, Parshall L, Sarr OF, Kum S, Modi V. Local and national electricity planning in Senegal: scenarios and policies. Energy Sustain Dev. 2012;16(1):13–25.
  • 15. Diouf B, Pode R, Osei R. Initiative for 100% rural electrification in developing countries: case study of Senegal. Energy Pol. 2013;59:926–930.
  • 16. Aker JC, Mbiti IM. Mobile phones and economic development in Africa. J Econ Perspect. 2010;24(3):207–232.
  • 17. Houngbonon GV, Le Quentrec E. Access to electricity and ICT usage: a country-level assessment on sub-Saharan Africa. In: 2nd Europe – Middle East – North African Regional Conference of the International Telecommunications Society (ITS). Aswan, Egypt; 2019.
  • 18. Lu X, Bengtsson L, Holme P. Predictability of population displacement after the 2010 Haiti earthquake. Proc Natl Acad Sci USA. 2012;109(29):11576–11581. doi: 10.1073/pnas.1203882109
  • 19. Lai S, Erbach-Schoenberg E, Pezzulo C, Ruktanonchai NW, Sorichetta A, Steele J, et al. Exploring the use of mobile phone data for national migration statistics. Palgrave Com. 2019;5(1).
  • 20. Maxmen A. Can tracking people through phone-call data improve lives? Nature. 2019;569(7758):614–617.

Decision Letter 0

Shihe Fu

14 Apr 2020

PONE-D-20-02661

A method to estimate population densities and electricity consumption from mobile phone data in developing countries

PLOS ONE

Dear Dr. Salat,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers liked your paper but also raised some concerns for revision. Please address their concerns as much as you can in the revision. I myself also have a question. As a careful reader I want to know more about how you make the prediction. Specifically, you wrote in lines 105-107 "the values of all the other towers are predicted from the proximity of their activity curve or network characteristics to the activity curves or network characteristics of the reference towers." How exactly is the prediction implemented? Do you use some nonparametric method such as kernel smoothing or weighted average? I hope you can provide more details on how you generate the prediction.

We would appreciate receiving your revised manuscript by May 29 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Shihe Fu, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper presents a method of estimating population density and electricity consumption using mobile phone usage data, including SMS, call and data usage. Usage characteristics of cell towers are used to generate feature matrices, which are then used to generate a graph representation of cell towers, with similar towers having connecting edges. Network analysis is then applied to the resulting graph in order to extract additional feature matrices for degree, betweenness and closeness. This results in several feature matrices, each describing different pairwise similarity measures of the cell towers. Hierarchical clustering is applied to towers using the computed features. Subsequently, tree cutting of the resulting dendrogram is performed at various depths, with a leaf from each of the resulting branches then being randomly selected.
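
The sampling step summarised above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the average-linkage rule, the function names `cut_into_clusters` and `pick_reference_towers`, and the toy distance matrix are all hypothetical. The sketch stops agglomerative clustering once k clusters remain, which yields the same partition as cutting the dendrogram into k branches, and then draws one tower at random from each branch.

```python
import random


def cut_into_clusters(distance, k):
    """Merge towers bottom-up until k clusters remain.

    Stopping at k clusters gives the partition obtained by cutting the
    dendrogram into k branches. Average linkage is an assumption here.
    """
    if not 1 <= k <= len(distance):
        raise ValueError("k must be between 1 and the number of towers")
    clusters = [[i] for i in range(len(distance))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(distance[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def pick_reference_towers(clusters, seed=0):
    """Randomly select one tower (leaf) from each branch."""
    rng = random.Random(seed)
    return [rng.choice(cluster) for cluster in clusters]


# Toy example: four hypothetical towers placed on a line at 0, 1, 10 and 11.
positions = [0.0, 1.0, 10.0, 11.0]
distance = [[abs(a - b) for b in positions] for a in positions]

clusters = cut_into_clusters(distance, k=2)
references = pick_reference_towers(clusters, seed=42)
print(clusters, references)
```

Changing `seed` changes which tower represents each branch, which is the source of the run-to-run variability discussed in the review.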

Baseline R2 measures are generated by correlating SMS/Call/data volumes with population/electricity usage data. It was found that selecting nodes using the tree cutting process could result in higher correlation scores for both population density and electricity consumption measures compared to the baseline method, with results varying according to the resulting sample size for a given tree cut depth.

To the best of my knowledge, this method, in particular applying tree cutting to the dendrogram as a means to sample cell towers, is a novel approach to the problem of population and electricity usage estimation, with results that appear promising.

The authors draw on existing peer reviewed work when formulating their method, and build on existing and respected work. However, there is a lack of references when describing the exact methods employed during the analysis, which is reflected in the relatively short citation count.

Although the results presented in this paper appear promising, there are some areas I would like to see expanded/improved on. Specifically:

* It was not entirely clear if this method was being proposed as an *alternative* to manual data collection through census, or as a way to guide efficient collection. It may be worth clarifying this point.

* SMS (and to a lesser extent, traditional mobile phone calls) volumes are decreasing in many countries (https://www.statista.com/statistics/271561/number-of-sent-sms-messages-in-the-united-kingdom-uk/), with shifts towards platforms such as WhatsApp and Facebook Messenger. If this is the case in Senegal, then the model is likely to be less effective in 2020 compared to 2013, and may result in the under-estimation of certain demographics (if, for instance, younger people are more likely to use alternatives to SMS). It may be worth addressing this point.

* P-Values are not presented in the evaluation of either the baseline or the proposed model. It would be good to see these, if possible.

* Although error bars are presented (by running the model 30 times with different random seeds), I would be interested in seeing more analysis around the sensitivity wrt. the random selection process.

* Some of the constants chosen appear fairly arbitrary; for instance the five thresholds mentioned on L87 and the 1,000 inhabitant threshold mentioned on L73. Consider explaining how they were chosen.

* A brief discussion on the type of data collected within the Senegal census may be relevant here. The authors claim that "an entire census can be estimated", however only population density and electricity consumption levels are estimated. This may be because the Senegalese census consists exclusively of population count, but this should be explained.

* Several feature descriptors are used for the hierarchical clustering process -- however, these features are not directly used when modelling population density/electricity consumption. This (superficially) seems like a wasted opportunity, it may be worth explaining why?

* A minor point, but the X-ticks for hour of day plots may be slightly more natural as [4, 8, 12, 16, 20]?

Overall, this work has strong potential, but in my opinion requires some additional work before publication.

Reviewer #2: Summary:

"A Method to estimate population densities and electricity consumption from mobile phone data in developing countries" provides a good method to evaluate the population density and activity (electrical usage) based on the mobile phone data. Their proposed method is utilizing a machine learning algorithm – hierarchical clustering to recover an entire census from a very small sample in the census using daily, weekly and yearly mobile activity. The selection of this small sample is based on the clustering algorithm. Since mobile data is relatively easier to get than the high-quality census data, this method has a high potential to utilize mobile data to help construct the census data and at the same time, greatly reduce the costs of census data collection (by utilizing the available mobile data to impute the entire census data, the collecting cost of census data is shrinking to a much smaller training sample). In general, I think it is a very interesting paper and I have much enjoyed reading it.

General comments:

My general comment is regarding mobile phone penetration and the frequency of usage of mobile phone service in developing countries. First of all, I just googled “the percentage of the world has a cell phone in 2019”, it shows 67 percent from statista.com. I do not know how reliable the data I found on google is. But I am interested in knowing whether this method can apply to other developing countries. However, based on the results of this paper, I speculate the mobile phone penetration rate is very high in Senegal. I appreciate the authors mention in the introduction that even in low electrification rural areas, mobile phone penetration in those areas is still 75%. Secondly, mobile phone usage may vary across ages and/or education levels. People of different ages may have quite a different percentage of mobile usage even if they all own a cell phone. In some extreme cases, children or school-age teenagers may not be encouraged to have/use a mobile phone. Then the lack of information from these categories of the population may affect the prediction power of the learning algorithm. Thirdly, regarding the representativeness of the mobile data from one provider. I appreciate the author uses the largest Senegalese telecommunication operator’s data, 65 percent of market share. From the results, I believe in Senegal, the other providers more or less target similar categories of people compared to Sonatel. But in some cases (if this method is extended to another developing country), different providers may target different categories of people. Some people choose to use a cheaper provider and they may choose to consume less amount of electricity, which may lead to bias in the model forecasting. The overall penetration of mobile phones and their frequency of usage in a particular developing country (like Senegal) may be introduced in the introduction. It would be interesting to know if in the case of low mobile penetration and/or high divergence in mobile phone usage in a developing country, how effective this method will be and what is the authors’ recommendation to use their method to uncover census data in the above cases. Besides those, I appreciate the authors take the consideration of tower in the data aggregation.

Minor comments:

Page 1, line 17,

What are the network characteristics you refer to?

Page 3 line 83,

What is the definition of ‘distance matrices’ in your paper? Can you also give more details on the ‘point-by-point’ correlation you refer? What is the definition of ‘point’?

Page 5 line 161

What is the total number of towers? I agree that the authors remove 54 towers with no activity throughout the year in the clustering algorithm approach.

Page 5 Table 1

In table 1, illustrations in the Voronoi cells around towers section (use average mobile data instead of the clustering algorithm), as a comparison, do you also remove the 54 inactive towers? If not, why? I speculate if the inactive towers are included, it will reduce the value of the correlation.

Figure 3

Why are the correlations values of towers - calls(n) and towers-texts or towers-length(n) and towers- texts be the only ones that are selected as the horizontal lines? I can see that calls(n) are the highest correlation value in pop.all and length(n) is the highest one in elec.all (both of them are in terms of aggregating of towers).

Supporting documents – S1_Appendix, Figure S1

It would be interesting if you can also add the performance from the “fully random samples of increasing size” (the grey lines in Figure 3).

Minor typos I found from the manuscript:

Page3 line 83,

Left quotation mark on the top left of the word “parallel”.

Page 4 Line 143

The extra word “calls” after “number of calls”.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Joseph Redfern

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jun 30;15(6):e0235224. doi: 10.1371/journal.pone.0235224.r002

Author response to Decision Letter 0


4 May 2020

Dear Dr. Fu,

We want to thank the Editor and the Reviewers for their useful comments and hope that the changes detailed below will answer all their questions.

Please note that all the lines indicated in the response refer to the marked-up version.

Editor's comments

> As a careful reader I want to know more about how you make the prediction. Specifically, you wrote in lines 105-107 "the values of all the other towers are predicted from the proximity of their activity curve or network characteristics to the activity curves or network characteristics of the reference towers." How is the prediction exactly implemented? Do you use some nonparametric method such as kernel smoothing or weighted average? I hope you can provide more details on how you generate the prediction.

The value for an unknown tower is set equal to that of the reference tower closest to it under the chosen distance. This is preferred over the suggested kernel smoothing or weighted averages for two reasons: first, we want to make the method as simple and accessible as possible; second, since we have a multitude of distance matrices, we can instead smooth potential errors by averaging over different distance matrices rather than over different towers for one matrix (in the style of Table S1). This way, potential hidden biases specific to one type of usage (e.g. calls) can be mitigated. In practice, we did not find any noticeable improvement when averaging in this way.

We have made the following changes:

Added the above clarification of the prediction mechanics (l.118-119).

Added averaging (kernel, different matrices, etc.) as possible extensions (l.219-228).
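
The nearest-reference rule described in this response can be sketched as below. It is a minimal illustration, assuming the distances are held in a plain square matrix; the function names `predict_from_references` and `average_over_matrices`, the two toy matrices and the reference values are hypothetical and do not come from the manuscript. The second function illustrates the averaging over several distance matrices that is mentioned as a possible extension.

```python
def predict_from_references(distance, reference_values):
    """Give every tower the value of its closest reference tower.

    distance: square matrix (list of lists) of tower-to-tower distances.
    reference_values: dict mapping a reference tower index to its known value.
    """
    if not reference_values:
        raise ValueError("at least one reference tower is required")
    predictions = []
    for tower, row in enumerate(distance):
        if tower in reference_values:
            # Reference towers keep their known value.
            predictions.append(reference_values[tower])
        else:
            nearest = min(reference_values, key=lambda ref: row[ref])
            predictions.append(reference_values[nearest])
    return predictions


def average_over_matrices(distances, reference_values):
    """Possible extension: average the predictions from several matrices."""
    runs = [predict_from_references(d, reference_values) for d in distances]
    return [sum(values) / len(values) for values in zip(*runs)]


# Hypothetical data: towers 0 and 3 are references with known densities.
calls = [[0, 1, 8, 9], [1, 0, 7, 8], [8, 7, 0, 2], [9, 8, 2, 0]]
texts = [[0, 6, 2, 9], [6, 0, 5, 3], [2, 5, 0, 4], [9, 3, 4, 0]]
known = {0: 100.0, 3: 10.0}

single = predict_from_references(calls, known)
averaged = average_over_matrices([calls, texts], known)
print(single, averaged)
```

In this toy case the two matrices disagree on towers 1 and 2, so averaging pulls both predictions towards the middle; whether that helps on real data is an empirical question.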

Reviewer #1's comments

> Although the results presented in this paper appear promising, there are some areas I would like to see expanded/improved on. Specifically:

> It was not entirely clear if this method was being proposed as an *alternative* to manual data collection through census, or as a way to guide efficient collection. It may be worth clarifying this point.

This has been clarified in the introduction (l.28).

> SMS (and to a lesser extent, traditional mobile phone calls) volumes are decreasing in many countries (https://www.statista.com/statistics/271561/number-of-sent-sms-messages-in-the-united-kingdom-uk/), with shifts towards platforms such as WhatsApp and Facebook Messenger. If this is the case in Senegal, then the model is likely to be less effective in 2020 compared to 2013, and may result in the under-estimation of certain demographics (if, for instance, younger people are more likely to use alternatives to SMS). It may be worth addressing this point.

One of the strengths of the method is that we are only establishing similarity of usage between towers. As a result, a shift in usage that does not break similarity does not impact the results. In particular, smartphones are still too expensive for widespread usage in the area (see reference [17], where it is reported that the cost of charging a smartphone for a year at a service kiosk alone was estimated at 6% of the GDP per capita in Kenya in 2013). Internet users are even fewer than smartphone owners. Smartphones are therefore only going to be found in (moderately) large numbers in the richest areas. If anything, this will help to differentiate these specific areas from other areas, hence making the model more effective rather than less. This is an interesting remark, so we have added this argument in the discussion (l.228-240).

As a side note, internet communications will, in principle, be visible in xDR, which could be mobilised in future analyses.

> P-Values are not presented in the evaluation of either the baseline or the proposed model. It would be good to see these, if possible.

Since our samples are quite large and the obtained r² values are well above 0, all p-values are mechanically extremely small (<2.2e-16, which is the default practical limit in R). We have added this information to the legend of Table 1 (between l.168 and l.169).
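
The order of magnitude involved can be checked with a short sketch. The sample size and r² below are hypothetical, and the normal tail is used in place of Student's t distribution as a simplifying assumption for large samples, so the printed p-value is only indicative. The 2.2e-16 figure matches double-precision machine epsilon, below which R's default output reports p-values as "< 2.2e-16".

```python
import math
import sys


def correlation_p_value(r_squared, n):
    """Approximate two-sided p-value for a Pearson correlation.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and, as a simplifying
    assumption for large n, a normal tail instead of Student's t.
    """
    if not 0.0 <= r_squared < 1.0 or n < 3:
        raise ValueError("need 0 <= r_squared < 1 and n >= 3")
    t = math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))
    return math.erfc(t / math.sqrt(2.0))


# Hypothetical values: 1,000 towers and r^2 = 0.5.
p = correlation_p_value(0.5, 1000)
machine_epsilon = sys.float_info.epsilon  # about 2.2e-16 for doubles
print(p, p < machine_epsilon)
```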

> Although error bars are presented (by running the model 30 times with different random seeds), I would be interested in seeing more analysis around the sensitivity wrt. the random selection process.

The error bars have been replaced with standard boxes/whiskers to add information about the sensitivity with respect to the selection process. See new figure 3.

> Some of the constants chosen appear fairly arbitrary; for instance the five thresholds mentioned on L87 and the 1,000 inhabitant threshold mentioned on L73. Consider explaining how they were chosen.

The five thresholds are based on the shape of the distribution of daily activities between all pairs of towers (approximately an inverse power law of exponent 1.81). Also, defining what constitutes a rural area is still generally considered an open debate. The 1,000-inhabitant threshold is indeed arbitrary, although it seems to have become somewhat of a "default" value for the UN and others (e.g. fao.org/3/a0310e/A0310E07.htm). We have added these two justifications (l.79-82 & l.97-98).

> A brief discussion on the type of data collected within the Senegal census may be relevant here. The authors claim that "an entire census can be estimated", however only population density and electricity consumption levels are estimated. This may be because the Senegalese census consists exclusively of population count, but this should be explained.

We acknowledge that our phrasing was quite misleading in this instance (changed l.246). For the record, the Senegalese census does encompass many more questions (such as the nature of the roof cover or the type of toilets), none of which appeared particularly suitable for predictions in our case.

> Several feature descriptors are used for the hierarchical clustering process -- however, these features are not directly used when modelling population density/electricity consumption. This (superficially) seems like a wasted opportunity, it may be worth explaining why?

All the features are used in Table S1. Combining the results obtained from different features did not noticeably improve the overall predictions (see response to the editor's comments), as is now explained in l.219-228. As a matter of fact, we found that the results based on activity curves were consistently better than those based on network features, although only by a small margin. We believe that highlighting the network aspects in the main text might still be useful, as the relative performance of the two approaches could be reversed in another context (for example, as the electrification rate grows, the nocturnal characteristics of the curves could disappear). This is now underlined immediately after the reference to Table S1 (l.191-193).

> A minor point, but the X-ticks for hour of day plots may be slightly more natural as [4, 8, 12, 16, 20]

Fixed. See new figures 1 and 4.

Reviewer #2's comments

General comments:

> My general comment is regarding mobile phone penetration and the frequency of usage of mobile phone service in developing countries. First of all, I just googled “the percentage of the world has a cell phone in 2019”, it shows 67 percent from statista.com. I do not know how reliable the data I found on google is. But I am interested in knowing whether this method can apply to other developing countries. However, based on the results of this paper, I speculate the mobile phone penetration rate is very high in Senegal. I appreciate the authors mention in the introduction that even in low electrification rural areas, mobile phone penetration in those areas is still 75%.

The method does require access to a sufficiently large mobile phone dataset, and it is true that Senegal tends to be top of the class for sub-Saharan Africa in terms of electrification rate, mobile phone penetration and census data collection. That being said, the 67% figure is most likely an underestimate, since mobile phones can be shared in poor areas. Note also that we are not using the full penetration rate, but only the 65% market share of Sonatel. As a result, we believe that there should be enough underlying data in many other developing countries, and we are aware of some that are interested in this approach. We acknowledge, however, that obtaining access to the data is not straightforward since it is usually privately owned. We have included these arguments at the end of the conclusion (l.262-270).

> Secondly, mobile phone usage may vary across ages and/or education levels. People of different ages may have quite a different percentage of mobile usage even if they all own a cell phone. In some extreme cases, children or school-age teenagers may not be encouraged to have/use a mobile phone. Then the lack of information from these categories of the population may affect the prediction power of the learning algorithm.

As mentioned above in the response to Reviewer #1, we are only comparing towers among themselves and establishing similarity of usage. Hence, differences in usage within the population that do not break similarity between places do not impact the results. Specifically, if the variable ‘age distribution inside an area’ has a significant impact on phone usage, then the clustering will group together places with similar age distributions, so that these will be represented by adequate reference towers. See text added l.228-240.

> Thirdly, regarding the representativeness of the mobile data from one provider. I appreciate the author uses the largest Senegalese telecommunication operator’s data, 65 percent of market share. From the results, I believe in Senegal, the other providers more or less target similar categories of people compared to Sonatel. But in some cases (if this method is extended to another developing country), different providers may target different categories of people. Some people choose to use a cheaper provider and they may choose to consume less amount of electricity, which may lead to bias in the model forecasting. The overall penetration of mobile phones and their frequency of usage in a particular developing country (like Senegal) may be introduced in the introduction. It would be interesting to know if in the case of low mobile penetration and/or high divergence in mobile phone usage in a developing country, how effective this method will be and what is the authors’ recommendation to use their method to uncover census data in the above cases. Besides those, I appreciate the authors take the consideration of tower in the data aggregation.

We have now emphasised the possible market share bias in a short discussion about possible extensions to other countries at the end of the conclusions (l.262-270), and have also added the market share information in the introduction (l.35). It is our belief that the only way to truly circumvent market share issues is probably not methodological, but rather lies in convincing different operators to release their data jointly. The data we use for this specific project is already aggregated, so we cannot under-sample it to test low penetration rates (and we would not be able to remove targeted age or socio-economic groups anyway, because that information is missing). We nonetheless appreciate this remark and will keep it in mind in case data allowing targeted under-sampling become available.

Minor comments:

> Page 1, line 17, What are the network characteristics you refer to?

Added examples l.18.

> Page 3 line 83, What is the definition of ‘distance matrices’ in your paper? Can you also give more details on the ‘point-by-point’ correlation you refer? What is the definition of ‘point’?

See improved version l.88-93.

> Page 5 line 161, What is the total number of towers? I agree that the authors remove 54 towers with no activity throughout the year in the clustering algorithm approach.

> Page 5 Table 1, In table 1, illustrations in the Voronoi cells around towers section (use average mobile data instead of the clustering algorithm), as a comparison, do you also remove the 54 inactive towers? If not, why? I speculate if the inactive towers are included, it will reduce the value of the correlation.

After verification, Table 1 was indeed computed with the inactive towers removed. The removal of the 54 inactive towers is now mentioned directly at the very beginning (l.50-51).

> Figure 3, Why are the correlations values of towers - calls(n) and towers-texts or towers-length(n) and towers- texts be the only ones that are selected as the horizontal lines? I can see that calls(n) are the highest correlation value in pop.all and length(n) is the highest one in elec.all (both of them are in terms of aggregating of towers).

These lines are in fact the maximum and minimum (at national level), shown to visualise the full range. Since the bottom line is not really necessary and is confusing, we have removed it from the new Fig. 3 and have updated the caption accordingly (between l.193 and l.194).

> Supporting documents – S1_Appendix, Figure S1, It would be interesting if you can also add the performance from the “fully random samples of increasing size” (the grey lines in Figure 3).

Done.

> Minor typos I found from the manuscript:

> Page 3 line 83, Left quotation mark on the top left of the word “parallel”.

> Page 4 Line 143, The extra word “calls” after “number of calls”.

Fixed.

Other

Completed reference [17], published during the review process.

Once again, we would like to thank the Editor and the Reviewers for their time.

Yours sincerely,

Hadrien Salat (corresponding author)

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Shihe Fu

8 Jun 2020

PONE-D-20-02661R1

A method to estimate population densities and electricity consumption from mobile phone data in developing countries

PLOS ONE

Dear Dr. Salat,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers are happy with your revision and recommended acceptance, but Reviewer 2 has a couple of additional minor comments on exposition. I also have one: in the abstract, "underwhelming" seems inappropriate, what does "a correlation is underwhelming" exactly mean? Please consider rephrasing this.

Please submit your revised manuscript by Jul 23 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Shihe Fu, Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have suitably addressed all of my previous concerns/comments in their revised manuscript.

Reviewer #2: Thank you for the revision, which addresses issues I previously raised. This paper reflects scientific soundness. Therefore, I recommend acceptance.

Some minor questions to the author:

1. The author adds sentences "The map was created by the authors using R", what information authors would like to convey?

2. In the description of Figure 3, the best direct correlation from Table ?? for each case is represented..., which Table?

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Joseph Redfern

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jun 30;15(6):e0235224. doi: 10.1371/journal.pone.0235224.r004

Author response to Decision Letter 1


8 Jun 2020

Dear Dr. Fu,

We thank the Editor and the Reviewers for their new comments and positive feedback.

Editor's comment

> In the abstract, "underwhelming" seems inappropriate, what does "a correlation is underwhelming" exactly mean? Please consider rephrasing this.

We have changed "underwhelming" to "insufficiently high to provide an accurate representation of the situation" in the abstract.

Reviewer #2's comments

> 1. The author adds sentences "The map was created by the authors using R", what information authors would like to convey?

During re-submission, the in-house checks revealed that we needed to either "(1) present written permission from the copyright holder to publish [our] figures specifically under the CC BY 4.0 license, or (2) remove the figures from [our] submission". The quoted sentence was added to indicate that the maps do not, in fact, contain any copyrighted material as we created them ourselves. The justification within the submission system is probably enough, so we have removed the two clumsy sentences from the text at this stage.

> 2. In the description of Figure 3, the best direct correlation from Table ?? for each case is represented..., which Table?

Well spotted, thank you! This has now been fixed.

Other

We have changed the named contact at Sonatel to whom data inquiries should be addressed, also requested during the in-house checks, as we are now aware that someone else within Sonatel is a better fit for the role.

We thank the Editor and the Reviewers for their new comments.

Yours sincerely,

Hadrien Salat

(corresponding author)

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 2

Shihe Fu

11 Jun 2020

A method to estimate population densities and electricity consumption from mobile phone data in developing countries

PONE-D-20-02661R2

Dear Dr. Salat,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Shihe Fu, Ph.D.

Academic Editor

PLOS ONE

Acceptance letter

Shihe Fu

15 Jun 2020

PONE-D-20-02661R2

A method to estimate population densities and electricity consumption from mobile phone data in developing countries

Dear Dr. Salat:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Shihe Fu

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 File. Mobile phone data at Voronoi level and aggregated over the year.

    The table contains an id of the Voronoi cell, the longitude and latitude coordinates of each Voronoi centre (slightly modified for privacy reasons), the average population density and nighttime light intensity per km² inside the cell, and the number of text messages, calls and total call length per km² for each cell.

    (ZIP)

    S2 File. Mobile phone data at Commune level and aggregated over the year.

    Equivalent to the previous file, but at Commune level.

    (ZIP)

    S3 File. Time series of outgoing calls for each Voronoi cell in January.

    The table contains a Voronoi cell id, a time stamp for each hour of the month and the number of outgoing calls during this hour in the cell.

    (ZIP)
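As a minimal sketch of how an hourly table such as the one described for S3 File could be reduced to an average daily activity curve per Voronoi cell, the Python example below groups call counts by hour of day. The column names (`cell_id`, `timestamp`, `calls`), the timestamp format and the sample values are assumptions made for illustration; they are not taken from the supplementary file, whose actual layout may differ.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Synthetic sample with hypothetical column names and timestamp format;
# the real S3 File may use a different layout.
SAMPLE = """cell_id,timestamp,calls
1,2013-01-01 08:00:00,10
1,2013-01-01 09:00:00,20
1,2013-01-02 08:00:00,30
2,2013-01-01 08:00:00,4
"""

def daily_profiles(rows):
    """Return {cell_id: [mean calls at hour 0, ..., mean calls at hour 23]}."""
    totals = defaultdict(lambda: [0.0] * 24)
    counts = defaultdict(lambda: [0] * 24)
    for row in rows:
        hour = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S").hour
        totals[row["cell_id"]][hour] += float(row["calls"])
        counts[row["cell_id"]][hour] += 1
    # Hours with no record for a cell are reported as 0.0.
    return {
        cell: [t / c if c else 0.0 for t, c in zip(totals[cell], counts[cell])]
        for cell in totals
    }

profiles = daily_profiles(csv.DictReader(io.StringIO(SAMPLE)))
print(profiles["1"][8])
```

To apply the same function to a real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, after adjusting the column names and timestamp format to match the actual table.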

    S1 Appendix. Additional methods and figures.

    Alternative method to estimate nightlights intensity from approximate data, validation of the performance of the clustering process and additional figures.

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    The census data for the year 2013 in Senegal and the nighttime lights data can be accessed directly through the public databases of ANSD and NOAA (links provided in the article). The mobile phone data at Voronoi and Commune levels and aggregated over the year are available as part of the supplementary material. The identity of the callers has been removed and the exact location of the communication towers has been slightly modified for confidentiality reasons. In addition, a time series containing the number of calls per hour for the month of January is also available as part of the supplementary material. To obtain the dataset over the entire year, one would need to contact Sonatel directly and present the research project that would require the data (contact: Mr El Hadji Birahim Gueye, Direction des Systèmes d’information Sonatel, ebgueye@orange-sonatel.com or post mail: Orange-Sonatel, 46 Boulevard de la République, BP 69 Dakar, Senegal).


    Articles from PLoS ONE are provided here courtesy of PLOS
