Skip to main content
Bulletin of the World Health Organization logoLink to Bulletin of the World Health Organization
. 2014 Jun 17;92(9):690–694. doi: 10.2471/BLT.14.140756

Sampling at community level by using satellite imagery and geographical analysis

Échantillonnage au niveau de la communauté à l'aide de l'imagerie satellite et de l'analyse géographique

El muestreo a nivel comunitario mediante el uso de imágenes de satélite y análisis geográfico

أخذ العينات على صعيد المجتمع عن طريق استخدام التصوير بالأقمار الصناعية والتحليل الجغرافي

使用卫星图像和地理分析的社区层面抽样

Выборка на уровне общин с использованием спутниковых снимков и географического анализа

Veronica Escamilla a,, Michael Emch b, Leonard Dandalo c, William C Miller d, Francis Martinson a, Irving Hoffman a
PMCID: PMC4208578  PMID: 25378761

Abstract

Problem

Traditional random sampling at community level requires a list of every individual household that can be randomly selected in the study community. The longitudinal demographic surveillance systems often used as sampling frames are difficult to create in many resource-poor settings.

Approach

We used Google Earth imagery and geographical analysis software to develop a sampling frame. Every household structure within the catchment area was digitized and assigned coordinates. A random sample was then generated from the list of households.

Local setting

The sampling took place in Lilongwe, Malawi and formed a part of an investigation of the intensity of Plasmodium falciparum transmission in a multi-site Phase III trial of a candidate malaria vaccine.

Relevant changes

Creation of a complete list of household coordinates within the catchment area allowed us to generate a random sample representative of the population. Once the coordinates of the households in that sample had been entered into the hand-held receivers of a global positioning system device, the households could be accurately identified on the ground and approached.

Lessons learnt

In the development of a geographical sampling frame, the use of Google Earth satellite imagery and geographical software appeared to be an efficient alternative to the use of a demographic surveillance system. The use of a complete list of household coordinates reduced the time needed to locate households in the random sample. Our approach to generate a sampling frame is accurate, has utility beyond morbidity studies and appears to be a cost-effective option in resource-poor settings.

Problem

In estimating disease prevalence at community level, a sampling frame is required to ensure that the study sample is representative of the study community. Such a sampling frame typically lists all households that can be randomly selected in the study community and is often based on an existing demographic surveillance system. In the absence of such a system, one option is to create a complete listing of all households in the study area. In many malaria indicator surveys, for example, each household in the study area is visited, information on each head of household is captured and a sketch map showing the location of each household structure is drawn. However, this approach carries substantial field costs and these costs may increase when large census enumeration areas have to be sampled and a single segment has to be randomly selected from each enumeration area. Many resource-poor settings lack the funds and labour needed to create a full household listing in this manner – or to create and maintain a longitudinal demographic surveillance system.

Open-source software packages for geographical analysis are being increasingly employed in public health research, are available to those working in resource-poor settings and can be used to create an affordable sampling frame. We therefore used Google Earth and other geographical analysis software to generate an inexpensive sampling frame for a study on the intensity of Plasmodium falciparum transmission in Lilongwe, Malawi. In this paper, we report our experiences in constructing this sampling frame and using it to generate a random sample of households at the community level.

Local setting

Malaria transmission intensity – which is dependent on several factors, including the parasite, vector, human host, environment and treatment – serves as an important guide for interventions of malaria control. The effects of such interventions on malaria morbidity may vary with transmission intensity.1 Thus, there is a need to assess not only how decreasing parasite exposure – via vaccines and other interventions – will alter malaria transmission but also how varying levels of malaria intensity may alter the efficacy of any intervention.

Lilongwe, Malawi, is one of 11 sites of a Phase III efficacy trial for the candidate malaria vaccine, RTS,S/AS01 (GlaxoSmithKline, Brentford, England). As part of this trial, temporal trends in the prevalence of human infection with P. falciparum in the vaccine catchment area needed to be assessed. The Lilongwe site is not covered by a demographic surveillance system and does not have the resources needed to conduct a complete field-based listing of all of the households in the catchment area. We therefore needed an alternative, inexpensive and low-resource approach to create a valid sampling frame for the investigation of malaria transmission intensity.

Approach

We used two open-source software packages – Google Earth 5.1 (Google, Mountain View, United States of America) and Digipoint 2 (Zonum Solutions, Tucson, USA)2 – to generate a listing that showed the geographical location of each household. We then used two more programs – ArcGIS 9.3 (Environmental Systems Research Institute, Redlands, USA) and Hawth’s Tools for ArcGIS3 – to select a random sample from the complete geographical listing. Hand-held Garmin eTrex 10 global positioning system receivers (Garmin Ltd, Schaffhausen, Switzerland) were subsequently employed to find the sample households in the field.

We first identified the 61 census enumeration areas that defined the catchment area for the Lilongwe vaccine trial. For this, we used the enumeration areas of residence for the malaria patients attending the Malawi Ministry of Health’s area 18 health centre. The boundary files for the enumeration areas used in the 2008 census were obtained from the Malawi National Statistics Office. Community health workers used Garmin eTrex 10 receivers to collect the coordinates of each village within the catchment area. These coordinates and the boundary files were then imported into ArcGIS 9.3, converted into keyhole markup language (KML) files – for compatibility with Google Earth – and then imported into Google Earth so that the boundary of the catchment area could be defined. Village locations were overlaid onto the boundaries of enumeration areas and enumeration areas that included household structures within villages were digitized.

After the catchment area was defined in Google Earth, a list of the coordinates of every household structure in the area was generated. Individual household structures on the Google Earth satellite images of the catchment area were digitized using Digipoint 2 rather than Google Earth because this simplified the saving of multiple digitized points. The satellite images that we used had been recently updated – on 11 December 2009 or 4 June 2010 – and had sufficient resolution to identify individual structures (Fig. 1). However, they could not be used to distinguish houses from local commercial structures that were similar to the local houses in terms of size and construction material.

Fig. 1.

Zoomed Google Earth image of the sampling frame, Lilongwe, Malawi, 2010

Note: The random sample of households to be surveyed are indicated by black dots. The complete list of households is indicated by green dots. The full length of the scale bar – which formed part of the image downloaded from Google Earth – represents a distance of 288 m.

Map data: Google, digital globe.

Fig. 1

All of the digitized points – approximately 18 000 – were converted to Universal Transverse Mercator coordinates in Digipoint 2 and then imported into ArcGIS 9.3. We then used the random selection tool from Hawth’s Tools for ArcGIS to generate a simple random sample of 880 household structures. The size of the sample required had already been defined by the manufacturer of the candidate vaccine. The coordinates of the structures in the random sample were entered into Garmin eTrex 10 receivers so that health workers given the receivers could find and visit the structures. Households were visited in the order of their random selection. In instances where two houses were very closely situated, the house closest to the relevant coordinates was selected. Structures that were included in the initial sample but found to be latrines, kitchens or commercial structures when visited were noted but otherwise ignored, and not replaced by new resampled coordinates. Houses found to be unoccupied or occupied only by children when first visited were revisited at a later date. If an adult was present in a visited household, demographic information was collected and the household members were invited to participate in our malaria survey at a later date.

Evaluation and lessons learnt

By using Google Earth satellite imagery and geographical methods we were able to create a geographical sampling frame and select a representative random sample of the target population for our study. We eventually evaluated P. falciparum transmission during peak season – February to June – in 2011, 2012 and 2013. For each annual survey, we resampled the complete listing of household coordinates. Households that were investigated more than once were easily identified using the spatial join tool within ArcGIS 9.3.

Our method of generating a geographical household list appeared efficient and cost-effective and required relatively few resources other than the Garmin eTrex 10 receivers and an ArcGIS software licence. The digitizing exercise was completed by one research assistant within 3 months.

The geographical listing of households was found to be fairly accurate when used by community health workers on the ground. Fewer than 5.0% of the structures included in our initial list of structures to be visited were found to be latrines, kitchens or commercial structures. The ratio of male to female subjects in our sample – 1.08 – was very similar to that recorded in the Malawi demographic and health survey in 2010 –1.07.4

The use of Google Earth imagery and geographical methods to assemble a household list appears to be an efficient use of time in the field. Households that were targeted for study could be rapidly located on the ground by a community health worker with a global positioning system receiver. The list could be easily updated and could provide a useful framework for the future development of a demographic surveillance system. All survey data could be linked to household coordinates and displayed in ArcGIS – allowing spatial patterns in any household characteristic to be illustrated and investigated. Using our method of generating a sampling frame, all households in a target area can be digitized relatively easily, regardless of the area’s size. There is no need to segment large study areas before sampling. In the future, it may be possible to distinguish households from industrial buildings, even in densely populated areas, by using high-resolution satellite or aerial imagery,5,6 although identification of the use of multi-storey buildings from such imagery is likely to remain a challenge.

Our method of generating a sampling frame could be used for many health investigations. Google Earth has already been employed to identify and sample household clusters,7,8 capture neighbourhood characteristics that influence health outcomes9 and improve disease surveillance.5,6,10 Several different geographical methods – and combinations of such methods – can be used to generate a geographical sampling frame. For example, if the simultaneous export of a large number of points is not a concern, household structures can be digitized directly in Google Earth. Alternative software is available if researchers cannot obtain an ArcGIS license. For example, QGIS11 is open-source and includes a random selection tool that can be used to create a representative sample of households. Our approach – or a variation of it – can provide an accurate and apparently cost-effective sampling frame that has multiple potential applications in resource-poor settings (Box 1).

Box 1. Summary of main lessons learnt.

  • In the development of a geographical sampling frame, the use of Google Earth imagery and geographical software was an efficient alternative to the use of a demographic surveillance system.

  • The use of a complete list of household coordinates reduced the time needed to locate households in the random sample.

  • The approach that we used to generate a sampling frame appears accurate and cost-effective and has utility beyond morbidity studies, even in resource-poor settings.

Acknowledgements

We thank all the individuals who participated in this study. We also thank the members of the UNC Project Malaria RTS,S 055 team, the Community Advisory Board and the Malawi Malaria Control Program, directed by Doreen Ali.

Funding:

This work was supported by the PATH Malaria Vaccine initiative, GlaxoSmithKline Biologicals SA and the Bill and Melinda Gates Foundation. VE is supported by NIH 5 T32AI07001–36, R24 HD050924 and a GlaxoSmithKline postgraduate fellowship.

Competing interests:

None declared

References

  • 1.Carneiro I, Roca-Feltrer A, Griffin JT, Smith L, Tanner M, Schellenberg JA, et al. Age-patterns of malaria vary with severity, transmission intensity and seasonality in sub-Saharan Africa: a systematic review and pooled analysis. PLoS ONE. 2010;5(2):e8988. 10.1371/journal.pone.0008988 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Digipoint 2 [Internet]. Tucson: Zonum Solutions; 2014. Available from: http://www.zonums.com/gmaps/digipoint.phphttp://[cited 2014 May 12].
  • 3.Beyer HL Hawth's analysis tools for ArcGIS [Internet]. Spatial Ecology; 2004. Available from: http://www.spatialecology.com/htools/http://[cited 2014 May 12].
  • 4.Malawi demographic and health survey 2010. Zomba: National Statistical Office; 2011. [Google Scholar]
  • 5.Lozano-Fuentes S, Elizondo-Quiroga D, Arturo Farfan-Ale J, Fernandez-Salas I, Beaty BJ, Eisen L. Use of Google Earth to facilitate GIS-based decision support systems for arthropod-borne diseases. Adv Dis Surveill. 2007;4(1):91. 10.1186/1476-072X-8-49 [DOI] [Google Scholar]
  • 6.Chang AY, Parrales ME, Jimenez J, Sobieszczyk ME, Hammer SM, Copenhaver DJ, et al. Combining Google Earth and GIS mapping technologies in a dengue surveillance system for developing countries. Int J Health Geogr. 2009;8(1):49. 10.1186/1476-072X-8-49 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Wampler PJ, Rediske RR, Molla AR. Using ArcMap, Google Earth, and Global Positioning Systems to select and locate random households in rural Haiti. Int J Health Geogr. 2013;12(1):3. 10.1186/1476-072X-12-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Galway L, Bell N, Sae AS, Hagopian A, Burnham G, Flaxman A, et al. A two-stage cluster sampling method using gridded population data, a GIS, and Google Earth(TM) imagery in a population-based mortality survey in Iraq. Int J Health Geogr. 2012;11(1):12. 10.1186/1476-072X-11-12 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Clarke P, Ailshire J, Melendez R, Bader M, Morenoff J. Using Google Earth to conduct a neighborhood audit: reliability of a virtual audit instrument. Health Place. 2010November;16(6):1224–9. 10.1016/j.healthplace.2010.08.007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Kamadjeu R. Tracking the polio virus down the Congo River: a case study on the use of Google Earth in public health planning and mapping. Int J Health Geogr. 2009;8(1):4. 10.1186/1476-072X-8-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.QGIS. A free and open source geographic information system [Internet]. Open Source Geospatial Foundation Project; 2014. Available from: http://qgis.osgeo.org [cited 2014 May 12].

Articles from Bulletin of the World Health Organization are provided here courtesy of World Health Organization

RESOURCES