PLOS One. 2021 Nov 17;16(11):e0258877. doi: 10.1371/journal.pone.0258877

Representativeness of individual-level data in COVID-19 phone surveys: Findings from Sub-Saharan Africa

Joshua Brubaker 1, Talip Kilic 1, Philip Wollburg 2,*
Editor: Bjorn Van Campenhout
PMCID: PMC8598049  PMID: 34788292

Abstract

The COVID-19 pandemic has created urgent demand for timely data, leading to a surge in mobile phone surveys for tracking the impacts of and responses to the pandemic. Using data from national phone surveys implemented in Ethiopia, Malawi, Nigeria and Uganda during the pandemic and the pre-COVID-19 national face-to-face surveys that served as the sampling frames for the phone surveys, this paper documents the selection biases in individual-level analyses based on phone survey data. In most cases, individual-level data are available only for phone survey respondents, who we find are more likely to be household heads or their spouses and non-farm enterprise owners, and on average, are older and better educated vis-a-vis the general adult population. These differences are the result of uneven access to mobile phones in the population and the way that phone survey respondents are selected. To improve the representativeness of individual-level analysis using phone survey data, we recalibrate the phone survey sampling weights based on propensity score adjustments that are derived from a model of an individual’s likelihood of being interviewed as a function of individual- and household-level attributes. We find that reweighting improves the representativeness of the estimates for phone survey respondents, moving them closer to those of the general adult population. This holds for both women and men and for a range of demographic, education, and labor market outcomes. However, reweighting increases the variance of the estimates and, in most cases, fails to overcome selection biases. This indicates limitations to deriving representative individual-level estimates from phone survey data. Obtaining reliable data on men and women through future phone surveys will require random selection of adult interviewees within sampled households.

1. Introduction

With the onset of the coronavirus disease 2019 (COVID-19) pandemic, governments, academic institutions, and international organizations scrambled to measure and monitor the pandemic’s impacts on livelihoods and tailor policy responses. A global survey of National Statistical Offices (NSOs) showed that, already in May 2020, over 80 percent were involved in collecting data related to the COVID-19 pandemic, focusing predominantly on its socioeconomic and business impacts. However, prompted by lockdowns, travel restrictions and safety concerns, face-to-face (F2F) survey data collection was suspended in most countries at the onset of the pandemic. Since then, the movement to resume F2F surveys, even under strict COVID-19 fieldwork protocols, has been slow and uncertainty regarding the timeline for fully resuming activities under the “new normal” prevails [1]. These developments have led to a proliferation of telephone surveys as the tool of choice for collecting data on COVID-19 impacts among the majority of NSOs [2, 3]. Similarly, the World Bank has launched a global initiative to monitor COVID-19 impacts using phone surveys, as have UN Women, Innovations for Poverty Action and Young Lives, among many others [4–7]. In the meantime, insights derived from these phone surveys have been used widely in published research and to inform policy.

Phone surveys with national coverage had previously been rather uncommon in low-income countries and relatively little was known about their feasibility and best practices. However, the COVID-19 pandemic accelerated a more widespread adoption of phone surveys as an instrument of choice in low-income countries, so that phone surveys are likely to remain commonplace even after the COVID-19 pandemic, complementing F2F surveys [8].

Making effective use of phone survey data for research and to inform policies in low-income countries now and in the future requires addressing selection biases from which phone surveys are more prone to suffer than F2F surveys and which threaten the representativeness of estimates based on phone survey data.

In this paper, we document how selection biases affect individual-level data derived from phone surveys from four Sub-Saharan African countries and assess to what extent these biases can be addressed through reweighting. Individual-level analysis is of special interest in this context because important outcomes such as attitudes towards and knowledge of the COVID-19 pandemic are individual outcomes that are captured for the survey respondent only. For example, a recent study uses individual respondent level phone survey data to examine the acceptance of COVID-19 vaccines in six Sub-Saharan African countries [9]. Moreover, individual-level data are critical to properly understand the heterogenous impacts of COVID-19 by gender, age group, and other subpopulations of interest [10, 11]. Since phone ownership is less common among women and vulnerable populations, surveying these groups in a representative fashion is a particular challenge, especially in the context of the COVID-19 pandemic [12, 13].

Phone surveys are prone to various forms of selection biases. First, phone surveys usually require phone ownership; in low-income countries phone ownership is not universal, which may lead to coverage bias. A review paper of 15 phone-based studies from 11 low- and middle-income countries finds that phone survey samples are skewed towards men and individuals in wealthier, male-headed, urban and better-educated households and therefore under-represent certain parts of the population [14]. Second, response rates in phone surveys are lower than in F2F surveys because respondents do not pick up, refuse participation at higher rates, or phone numbers are disconnected. This leads to non-response bias when responding households are systematically different from households that do not respond. A recent study documents the nature and extent of both coverage and non-response biases at the household level in phone survey data from Ethiopia, Malawi, Nigeria, and Uganda [15]. The study finds that households in phone survey samples are wealthier and less likely to be rural or agricultural than a nationally representative sample of households. A pre-COVID-19 study finds similar patterns in phone surveys in South Sudan, Tanzania, and Honduras [16].

For individual-level data, which we focus on in this paper, a further potential source of bias is related to respondent selection. Most phone surveys are done with just one main respondent and respondent selection protocols often target heads of households or “most knowledgeable” adult household members such that the sample of respondents may not be representative of the general adult population. Selecting the “most knowledgeable” adult as a respondent is a common practice in household surveys, whether face-to-face or telephone, and concerns with individual-level representativeness arise from this choice in all cases. However, interviewing household members other than the main respondent, or asking the main respondent to report information on behalf of other household members, is considerably easier and more common in F2F surveys [17, 18].

The severity of these biases may vary depending on the phone survey mode [19, 20] and on the sampling strategy [14]. Three main sampling strategies have been employed for phone surveys in low- and middle-income countries, both during the COVID-19 emergency and before: first, sampling based on phone numbers collected in a previous F2F survey; second, using a list of phone numbers otherwise obtained, for example from a mobile network operator; and third, random digit dialing (RDD), whereby randomly generated phone numbers are called, which is used widely when no pre-existing list of phone numbers is available [21]. Phone surveys based on RDD in low- and middle-income countries have been found to suffer from significantly higher non-response rates than those based on existing contact information, which may in turn lead to greater non-response bias [14, 21]. However, RDD-based phone surveys typically use individual phone numbers and may face less of a respondent selection problem.

The advantage of a sampling strategy based on an existing list of phone numbers from a representative F2F survey is that there is a wealth of information on each household or individual with a phone number as well as on households or individuals without a phone number. This information in turn can be used to characterize selection biases and recalibrate sampling weights to improve the representativeness of the phone survey data–a feature that we will also make use of in this analysis. A recent study using phone survey data from Ethiopia, Malawi, Nigeria, and Uganda shows that recalibrating survey weights is relatively successful at overcoming coverage and non-response biases at the household-level [15]. Reweighting was also used to improve the representativeness of a study on the impacts of the Ebola crisis in Liberia and Sierra Leone, albeit without a systematic attempt at assessing the relative success of this method [22].

Our analysis leverages data from national high-frequency phone surveys on COVID-19 (HFPS) in Ethiopia, Malawi, Nigeria and Uganda and the nationally-representative F2F surveys that had been implemented prior to the pandemic under the World Bank Living Standards Measurement Study–Integrated Surveys on Agriculture (LSMS-ISA) program and that served as the sampling frames for the phone surveys. The F2F surveys collected the phone numbers of at least one individual per household, and in some cases of all household members, which were then used to contact households for the high-frequency phone surveys. This setup allows us to compare phone survey respondents and the general adult population along a range of individual and household characteristics.

Our analysis confirms that concerns regarding the representativeness of individual-level phone survey data are warranted. Selected phone survey respondents are most often household heads or their spouses, and on average, are older, better educated and more likely to own a non-farm enterprise vis-a-vis the general adult population. To account for these differences and improve the representativeness of individual-level phone survey data, we recalibrate the household-level phone survey sampling weights based on propensity score adjustments that are derived from a cross-country comparable model of an adult individual’s likelihood of being interviewed in a phone survey household as a function of a rich set of individual- and household-level attributes [23] and assess to what extent the recalibrated weights can address selection biases. Reweighting generally improves the representativeness of the individual-level estimates, moving the variable means for phone survey respondents closer to those of the general adult population. This holds for both women and men and for a range of demographic, education, and labor market outcomes. However, reweighting increases the variance of the estimates and fails to fully overcome individual-level selection biases, with differences in means remaining statistically significant for the majority of outcomes–somewhat contrary to what a recent study with the same data sources found for household-level biases [15]. Obtaining reliable individual-level data from these phone surveys, therefore, requires fundamental changes to the individual respondent selection protocols with a focus on random selection of interviewees.

Our paper is part of a growing literature on methodology and best practices for designing and conducting phone surveys in low- and middle-income countries, covering a range of issues including sampling [21, 24]; survey mode [14, 20, 25]; survey cost, non-response, attrition, and use of incentives [16, 26–31]; and questionnaire design [19, 32, 33]. There are also several guidebooks and synthesis reports that summarize best practices and experiences with phone surveys from before the COVID-19 pandemic [16, 26, 29] as well as in the context of the COVID-19 pandemic [8, 14, 32].

The remainder of the paper is structured as follows. Section 2 describes the data and methods we use to assess individual-level biases and the relative success of bias reduction techniques. Section 3 presents the main emerging findings. Section 4 concludes with a discussion of what the results mean for individual level analysis and data collection using phone surveys.

2. Data and methods

2.1. Data sources

The longitudinal survey data informing our analysis originate from (i) the national high-frequency phone survey (HFPS) that was implemented on a monthly basis in Ethiopia, Malawi, Nigeria and Uganda during the COVID-19 pandemic, and (ii) the pre-COVID-19 F2F household survey that served as a sampling frame for each HFPS.

Each pre-COVID-19 F2F survey that was the source of the phone numbers for the respective country had been designed to be representative at the national, regional, and urban/rural levels. These F2F surveys are the Ethiopia Socioeconomic Survey (ESS) 2018/19, the Malawi Integrated Household Panel Survey (IHPS) 2019, the Nigeria General Household Survey (GHS)—Panel 2018/19, and the Uganda National Panel Survey (UNPS) 2019/20. In Ethiopia, Malawi, and Uganda, the HFPS attempted to call all pre-COVID-19 F2F survey households for which at least one phone number was available. The Nigeria HFPS first drew a national sub-sample from the universe of F2F survey households with contact details, based on a balanced sampling approach using the cube method [34], before this sub-sample of households was contacted.

In Ethiopia, we use data from the first round of the HFPS, which was implemented in April-May 2020, covering 3,249 households. In Malawi, we use data from the first and fifth rounds of the HFPS, which were implemented in May-June 2020 and October-November 2020, covering 1,729 and 1,589 households, respectively. Similarly, in Nigeria, we use data from the first and fifth rounds of the HFPS, which were implemented in April-May 2020 and September 2020, covering 1,950 and 1,774 households, respectively. We use the fifth round of the HFPS in the specific cases of Malawi and Nigeria to analyze individual-level employment data, which were collected for all adults in each household in these two countries only. Lastly, in Uganda, we use data from the first round of the HFPS, which was implemented in June 2020, covering 2,227 households.

The implementing agencies for the national phone surveys in Ethiopia, Malawi, Nigeria and Uganda are, respectively, Laterite Ethiopia, the Malawi National Statistical Office, the Nigeria Bureau of Statistics, and the Uganda Bureau of Statistics. The anonymized, unit-record phone survey data are available publicly through the World Bank Microdata Library under the High-Frequency Phone Survey collection [35]. The World Bank Microdata Library is the preferred platform for public dissemination among the NSOs in Ethiopia, Malawi, Nigeria, and Uganda. The approach to the phone survey questionnaire design and sampling was generally comparable across countries, albeit with some scope for contextualization, informed by a set of tools designed for the HFPS on COVID-19, including a template questionnaire, phone survey sampling guidelines, and computer-assisted telephone interviewing (CATI) guidelines [21, 33, 36, 37]. The template questionnaire included a set of core modules which were adopted across countries as well as a set of other modules which countries adopted optionally according to interest and need.

Since the phone surveys build on the F2F surveys, and the phone survey respondent was recorded using unique anonymized household and household member identification numbers, we can link the phone survey data with the pre-COVID-19 F2F survey data at the individual level. This gives us two samples to compare: (i) the phone survey respondents and (ii) the general adult population, derived from the nationally representative pre-COVID-19 F2F sample of which the phone survey populations are a subsample. Individuals aged 15 and above were considered part of the general adult population, as these individuals were eligible to be respondents in the HFPS and the F2F surveys.

Our analysis assesses the differences between phone survey respondents and the general adult population as represented in the pre-COVID-19 F2F surveys and gauges the success in utilizing bias correction techniques to derive general adult population representative estimates for a core set of individual-level variables related to gender, age, marital status, relationship with the household head, education, and employment. S1 Table shows the unweighted means of these variables for the samples of interest.

2.2. Ethics approval

Informed consent was received from all phone survey and F2F survey respondents in each country. The World Bank does not require institutional ethics approval for household surveys that are partly or fully financed by the World Bank, including the national phone surveys in Ethiopia, Malawi, Nigeria, and Uganda that inform our research. Furthermore, each phone survey was implemented by the respective national statistical office (NSO), except for Ethiopia where a private firm was the implementing agency. This means that in the specific cases of Malawi, Nigeria, and Uganda, the NSO conducts the survey as the sole official statistical authority in the country and in accordance with the respective National Statistical Act, which exempts the NSO from institutional ethics approvals. All data sets used were fully anonymized prior to our access, that is, all personal identifying information on households and individuals was removed and households and individuals were given anonymized identification numbers.

2.3. Sampling frames, contact protocols, and respondent selection

Though informed by the same general guidelines [36], the protocols for contacting the sampled households and subsequently selecting the respondent in each household were slightly different in each HFPS, reflecting country-level survey design choices as well as differences in how phone numbers were recorded in the pre-COVID-19 F2F surveys which served as sampling frames. In Malawi, the IHPS 2019 was the sampling frame for the HFPS. During the IHPS 2019, phone numbers were collected from the sampled households in two ways: First, each household member’s phone number was collected during the interview and recorded as part of the household roster, provided that the individual had a phone number. Second, phone numbers for up to three non-household reference contacts, such as neighbors or friends, were noted at the beginning of the interview. Prior to the implementation of the first round of the HFPS, the resulting list of phone numbers for each household was put in random order. During the first round of the HFPS, enumerators then called the phone numbers in accordance with this order in each household. However, the first contact was not necessarily the same person as the main respondent, since being the main respondent required an ability and willingness to respond to survey questions and thus it was possible for first contacts to hand over the phone to another person. In the following rounds, the first phone number to be called was the one that the respondent of the first round indicated as the best number to reach them. The original list of phone numbers was retained in the event that the preferred phone number could not be reached. Of the 3,181 IHPS 2019 households that were interviewed face-to-face, 2,337 provided at least one phone number, and the HFPS attempted to contact all of these households. Of these, 1,729 households were fully interviewed in the first round, a response rate of 74 percent.

In Ethiopia, the ESS 2018/19 was the sampling frame for the HFPS. The ESS 2018/19 interviewed 6,770 households, which were asked to provide phone numbers for the head of household, up to three additional household members, and up to two non-household reference individuals. At least one phone number was obtained for 5,374 ESS 2018/19 households. The enumerators called the available phone numbers for each household in the order in which they were recorded during the ESS 2018/19 interview. During the first round of the HFPS, contact was attempted with all 5,374 households, of which 3,249 were successfully interviewed, for a final response rate of 60 percent.

In Nigeria, the GHS-Panel 2018/19 was the sampling frame for the HFPS. The GHS-Panel 2018/19 interviewed 4,976 households, of which 4,934 provided phone numbers; 3,000 of these were in turn randomly selected to be contacted in the first round of the HFPS. The contact protocol targeted the household head, who was called first if their number was listed, followed by the remaining household members and the reference contacts in the order in which they were captured by the GHS-Panel 2018/19. During the first round of the HFPS, 1,950 households were successfully interviewed out of the 3,000 households attempted, equivalent to a 65 percent response rate.

Finally, in Uganda, the UNPS 2019/20 was the sampling frame for the HFPS. The UNPS 2019/20 interviewed 3,098 households, of which 2,386 provided a phone number for at least one household member or a reference contact. The HFPS attempted to contact all 2,386 households, of which 2,227 were successfully interviewed, markedly the highest response rate in our sample at 93 percent. Like in Nigeria, the Uganda HFPS contact protocol prioritized the household head, followed by other household members and reference contacts, in the order in which they were captured during the UNPS 2019/20.

Table 1 presents a summary of the sampling steps and pertinent sample sizes of the four HFPS used in this paper.

Table 1. Selection of HFPS households.

  Ethiopia Malawi Nigeria Uganda
Sample Households (HHs) N % N % N % N %
Face-to-face (F2F) HH sample 6,770 100 3,181 100 4,976 100 3,098 100
HHs with phone numbers 5,374 79.4 2,337 73.5 4,934 99.2 2,386 77.0
HHs called by HFPS 5,374 79.4 2,337 73.5 3,000 60.3 2,386 77.0
HHs reached by HFPS 3,357 49.6 1,743 54.8 2,057 41.3 2,246 72.5
HHs successfully interviewed by HFPS 3,249 48.0 1,729 54.4 1,950 39.2 2,227 71.9
HHs successfully interviewed by HFPS with the phone survey respondent also appearing in the F2F survey 3,196 47.2 1,701 53.5 1,910 38.4 2,128 68.7

Across all households in the F2F survey database, there are a total of 17,563 adults in Ethiopia, 8,588 in Malawi, 15,230 in Nigeria, and 8,763 in Uganda–irrespective of being contacted or interviewed in one of the HFPS rounds that are used in our analysis. Of these adults, 8,004 in Ethiopia, 4,670 in Malawi, 6,178 in Nigeria, and 6,361 in Uganda belonged to F2F survey households that were also interviewed in the first round of the HFPS.

Table 2 presents unweighted descriptive statistics for (i) individuals that were respondents in successfully interviewed HFPS households in round 1 (i.e. phone survey respondents), and (ii) all adults living in F2F survey households, irrespective of being contacted or interviewed by the HFPS (i.e. the general adult population). In all HFPS rounds that inform our analysis, the majority of respondents were household heads, ranging from 74 percent in Uganda to 83 percent in Ethiopia with Malawi and Nigeria standing at 79 and 82 percent, respectively. This similarity in the share of household heads interviewed across countries is notable because Ethiopia, Nigeria, and Uganda implicitly or explicitly targeted the household head as the HFPS respondent, whereas in Malawi the order of the contacted phone numbers was randomized for each household. One reason for this is that household heads are likely to own phones and as a result are more likely to be called. Another conceivable reason is that individual phone owners other than the household head handed phones to the household head to respond on behalf of the household. Next to household heads, in each country the remaining HFPS respondents were predominantly spouses of the household head.

Table 2. Unweighted descriptive statistics for HFPS respondents and adult population in F2F survey.

    Ethiopia Malawi Nigeria Uganda
    Phone resp. Adult pop. Phone resp. Adult pop. Phone resp. Adult pop. Phone resp. Adult pop.
Gender Women 37.6 52.7 36.9 52.4 27.2 51.7 48.3 51.8
Men 62.4 47.3 63.1 47.6 72.8 48.3 51.7 48.2
Age Group 15–24 12.9 34.3 11.8 39.6 5.7 31.6 5.9 37.7
25–49 66.6 49.6 65.5 44.5 55.0 45.0 59.8 40.8
50+ 20.5 16.1 22.6 15.9 39.3 23.4 34.3 21.5
Relationship to HH Head Head 82.8 38.5 78.7 37.0 82.7 32.7 74.1 35.1
Spouse 9.8 24.8 16.5 26.1 9.2 28.1 20.2 22.0
Child 6.0 26.3 3.1 24.6 6.5 30.3 4.4 32.1
Other 1.5 10.3 1.8 12.3 1.7 9.0 1.4 10.8
Observations 3,196 17,563 1,701 8,588 1,910 15,230 2,128 8,763

Note: Table 2 presents unweighted results. Phone resp. = phone survey respondents; Adult pop. = general adult population as captured in pre-COVID-19 nationally representative household surveys. The samples underlying the estimates in this table exclude individuals that were HFPS respondents but that were not household members at the time of the pre-COVID-19 F2F surveys. In Ethiopia, 98.4 percent of successfully interviewed households in the first HFPS round had a respondent that was also present in the associated F2F survey. This rate was 98.3 percent in Malawi, 97.9 percent in Nigeria, and 93.9 percent in Uganda.

A majority among phone survey respondents was male, ranging from 73 percent in Nigeria to just slightly above the population average in Uganda at 52 percent with Ethiopia and Malawi standing at 62 and 63 percent, respectively. The HFPS respondents were also much less likely to be among the youth (i.e. between the ages of 15 and 24 years) vis-à-vis the general adult population. The gap was most pronounced in Uganda where 6 percent of respondents versus 38 percent of adults fall in the 15–24 age range and was smallest in Ethiopia where 13 percent of respondents versus 34 percent of adults fall in the same age range. This finding is somewhat contrary to previous studies, which often found youth to be overrepresented among phone survey respondents [14].

2.4. Household and individual sampling weights

There are several sampling weights that are used in our analysis. To start with, there are the pre-COVID-19 F2F household survey sampling weights (wb). These sampling weights serve as the starting point for the computation of the HFPS household sampling weights in the public use datasets (w1), which are calibrated versions of wb that address coverage and non-response biases at the household level by leveraging the rich, pre-COVID-19 F2F survey data on (i) households that do not own a mobile phone and are excluded from the sampling frame; (ii) households that participate in the HFPS; and (iii) households that are contacted but cannot be reached. This latter scenario is overwhelmingly due to non-working phone numbers or prospective respondents not answering calls, as opposed to answering the phone call but then refusing to respond to the survey.

The household-level bias adjustment to create w1 follows the methodology proposed in a previous methodological contribution [23] and detailed specifically for the HFPS rounds in Ethiopia, Malawi, Nigeria, and Uganda in a recent paper [38]. This methodology is also commonly used for the computation of sampling weights in longitudinal F2F surveys with tracking of individuals over time. The HFPS household sampling weights are further post-stratified to match the projected population totals at the highest spatial resolution possible, ranging from region to district, based on the data availability in each country.

Yet, w1 does not account for the non-random selection of an individual to be a HFPS respondent. To address this and allow for the analysis of individual-level phone survey data in a way that is more representative of the general adult population, an additional individual-level sampling weight is needed. The objective of this paper is to assess the effectiveness of this recalibrated weight to correct for selection biases at the individual level. In what follows, we detail an approach that can be followed by any potential data user, leveraging solely the publicly available data on successfully interviewed HFPS households and their adult household members—as captured in the pre-COVID-19 F2F surveys and the HFPS.

To create the individual-level weight (w2), we follow an adjustment procedure that is similar to the procedure used to create w1. First, using the sample of all adult members of HFPS households (respondents and non-respondents), we estimate an unweighted logit regression to model the individual-level probability of selection as a HFPS respondent:

$\Pr(\text{respondent} = 1) = F\left(\beta_0 + \sum_{k=1}^{K} \beta_k X_k\right)$ (1)

The dependent variable in this model is a binary variable indicating whether a given individual was the round 1 HFPS respondent. X is a vector containing K independent variables that originate from the F2F survey and that are expected to predict the likelihood of being a HFPS respondent. The sample for Eq 1 is individuals who were household members both in the pre-COVID-19 F2F surveys and in the HFPS. A cross-country consistent set of independent variables is used for Eq 1, including an extensive range of individual and household attributes and spatial fixed effects. Eq 1 is then estimated separately for each country. Since an individual’s relationship to the household head is likely to affect the likelihood of being the respondent due to the HFPS respondent selection protocols, dichotomous variables are included to identify the household head, the spouse of the household head, and a child/adopted child of the household head, with the omitted category being any other relationship to the household head.

Additional dichotomous variables are included to identify (i) men; (ii) married individuals; (iii) those aged 25–49 and, separately, 50+, with individuals in the age range of 15–24 constituting the omitted category; (iv) individuals with completed primary education, completed secondary education, completed post-secondary certificate/training, and completed post-secondary degree, with individuals having less than completed primary education being the omitted category; and (v) individuals that can read and write in any language. Since individuals with different time use may have different incentives and availability to respond to a phone survey, a set of non-exclusive dichotomous variables is included to discern whether the individual had regular wage employment; was the owner of a household enterprise; or participated in casual labor (with the latter being restricted to Ethiopia and Malawi, in view of data availability and the importance of casual labor activities in these contexts). Finally, a dichotomous variable is included to identify an individual’s ownership of a mobile phone, which is expected to increase the likelihood of being a HFPS respondent. The household-level attributes in Eq 1 are (i) household size, which is expected to decrease the probability of any single adult being a HFPS respondent; and (ii) dichotomous variables identifying the household’s total annual per capita consumption expenditure quintile, with the lowest quintile being the excluded category.
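To make the estimation step concrete, the following is a minimal sketch of how Eq 1 could be fitted in Python. The file name and all column names are hypothetical stand-ins for the linked F2F-HFPS adult-level data, and the spatial fixed effects are omitted for brevity; this is an illustration of the approach, not the authors' code.

```python
# Minimal sketch of Eq 1; `adults` and all column names are hypothetical
# stand-ins for the linked F2F/HFPS adult-level file.
import pandas as pd
import statsmodels.api as sm

adults = pd.read_csv("f2f_hfps_linked_adults.csv")  # hypothetical file name

covariates = [
    "head", "spouse", "child",                    # relationship to household head
    "male", "married",
    "age_25_49", "age_50_plus",                   # omitted category: ages 15-24
    "edu_primary", "edu_secondary",
    "edu_certificate", "edu_degree",              # omitted: less than primary
    "literate", "wage_employed", "enterprise_owner",
    "owns_mobile_phone", "hh_size",
    "cons_q2", "cons_q3", "cons_q4", "cons_q5",   # omitted: lowest quintile
]
# Spatial fixed effects (region/district dummies) are omitted here for brevity.

X = sm.add_constant(adults[covariates])
fit = sm.Logit(adults["respondent"], X).fit()     # unweighted logit, per the paper

# Predicted probability of being the round 1 HFPS respondent for every adult
adults["p_hat"] = fit.predict(X)
```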

The size and significance level of the marginal effects associated with the regression coefficients (β) of the binary independent variables can be interpreted as the change in the likelihood of being a phone survey respondent as a result of having the respective individual characteristic. Following the estimation of Eq 1, we predict the probability of being a HFPS respondent across the entire sample of adult household members in successfully interviewed HFPS households. Following guidance from the relevant literature [21, 23, 39], we then create deciles for this variable, compute the average predicted probability within each decile, and take the reciprocal of this average to define the adjustment factor for each decile ($af_{D=d}$):

$af_{D=d} = \dfrac{1}{\frac{1}{N}\sum_{i=1}^{N} \widehat{\text{respondent}_i}}$ (2)

where N is the number of individuals in each decile. The computation of the average probability per decile ensures that there are both respondents and non-respondents assigned to each value of the reweighting adjustment factor, creating covariate balance between respondents and non-respondents which the raw probability variable could not achieve [39]. The adjustment factor is then applied to w1, the HFPS household sampling weight in the public use phone survey dataset:

$w_{i,af} = af_{D=d} \times w_1$ (3)

wi,af is in turn winsorized at the top and bottom 2 percent, in order to deal with extreme outliers, which reduces standard errors and makes estimates more efficient [23]. The winsorized weight is then post-stratified to equal population totals at the highest spatial resolution available, following the approach to the post-stratification of w1. Post-stratification ensures the weights sum up to known population totals and also reduces overall standard errors [23, 40]. In each country, the post-stratification adjustment (wps) is produced at the level of the lowest administrative unit for which population projections are available (typically region or district, depending on the country). It is computed as (i) the weighted total number of households residing in each administrative unit of interest, as measured by the sum of winsorized wi,af values in that unit, divided by (ii) the household population projection in that unit. Once computed, wps for each administrative unit is associated with all surveyed households in that unit, and wi,af is multiplied with wps to derive the final individual weight, w2:

$w_2 = w_{ps} \times w_{i,af}$ (4)
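The weight construction in Eqs 2–4 can be sketched as follows, continuing from the logit sketch above. Here `w1` stands for the HFPS household weight from the public use files, while `admin_unit` and `hh_pop_projection` are hypothetical placeholders for each country's post-stratification level and its projected household totals; the post-stratification ratio is written in the conventional direction, scaling weights so they sum to the projected totals.

```python
# Minimal sketch of Eqs 2-4, continuing from the logit sketch above.
import pandas as pd

# Eq 2: deciles of the predicted probability over all adults; the adjustment
# factor is the reciprocal of the mean predicted probability in each decile.
adults["decile"] = pd.qcut(adults["p_hat"], 10, labels=False)
af = 1.0 / adults.groupby("decile")["p_hat"].mean()

# Eq 3: apply each respondent's decile adjustment factor to the household weight.
resp = adults[adults["respondent"] == 1].copy()
resp["w_af"] = resp["decile"].map(af) * resp["w1"]

# Winsorize at the top and bottom 2 percent to tame extreme weights.
lo, hi = resp["w_af"].quantile([0.02, 0.98])
resp["w_af"] = resp["w_af"].clip(lower=lo, upper=hi)

# Post-stratify so that weights sum to the projected household totals in each
# administrative unit (hypothetical columns; conventional ratio direction).
unit_sums = resp.groupby("admin_unit")["w_af"].transform("sum")
resp["w_ps"] = resp["hh_pop_projection"] / unit_sums

# Eq 4: final individual-level weight.
resp["w2"] = resp["w_ps"] * resp["w_af"]
```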

2.5. Assessing differences between HFPS respondents and general adult population under different sampling weights

To assess the effectiveness of the bias reduction techniques for the individual-level phone survey data analysis, we focus on the individual-level variables that are captured in the pre-COVID-19 F2F survey and that are related to gender, age, marital status, relationship with the household head, education, and employment (see S1 and S2 Tables), which are the individual-level variables included in the logit regression as part of creating the recalibrated weight w2 (see section 2.4). We derive estimates of the mean of these variables using two different samples: (i) all adult household members, as captured in the F2F survey, who are assumed to be representative of the general adult population with the use of F2F household sampling weights (wb), and (ii) HFPS respondents who were also present in the F2F survey (i.e. ii is a subsample of i).

The weighted estimates for the adult household members in the F2F survey, denoted as b, serve as the benchmark to which we compare the sample of HFPS respondents under three different scenarios:

  1. unweighted (w0),

  2. weighted by the HFPS household sampling weights in the public use datasets (w1), and,

  3. weighted by our newly generated HFPS individual sampling weight (w2), which is the recalibrated version of w1, intended to account for the non-random selection to be a HFPS respondent among the adult household members residing in the successfully interviewed HFPS households.

We use two different approaches to assess the effectiveness of HFPS household and individual sampling weights in reducing the bias in estimates for the HFPS respondents vis-à-vis the general adult population (as captured through the F2F survey). First, we take a graphical approach, where the estimates from the F2F and phone surveys are standardized by subtracting the F2F survey mean. This means that the F2F survey mean is always zero and all other estimates are standardized in relation to the F2F survey mean, allowing a comparison across the competing estimates. The graphs then present the weighted mean and 95 percent confidence interval estimated for a range of individual-level variables for the general adult population (b), and the same set of statistics estimated for the HFPS respondents, without the use of any sampling weight (w0) and employing the HFPS household (w1) or individual (w2) sampling weight. This allows us to assess how large the differences in the two populations are at the outset (b vs w0) and how well the HFPS household sampling weights (b vs w1), and the HFPS individual weights (b vs w2) perform in reducing the differences between the HFPS respondents and the general adult population.
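As an illustration of this standardization, the sketch below computes the weighted mean and a simple normal-approximation 95 percent confidence interval for one hypothetical binary outcome under each weighting scheme, expressing all estimates as deviations from the benchmark F2F mean so that the benchmark sits at zero. `f2f_adults` is an assumed DataFrame of all F2F adults with their weight wb, `resp` continues from the sketches above, and the survey design effects are ignored for brevity.

```python
# Minimal sketch of the standardized comparison for one binary outcome.
import numpy as np

def weighted_mean_ci(y, w):
    """Weighted mean with a normal-approximation 95 percent CI
    (complex survey design effects ignored for brevity)."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    m = np.average(y, weights=w)
    v = np.average((y - m) ** 2, weights=w) / len(y)
    half = 1.96 * np.sqrt(v)
    return m, m - half, m + half

# f2f_adults: assumed DataFrame of all F2F adults with weight column "wb"
benchmark, _, _ = weighted_mean_ci(f2f_adults["female"], f2f_adults["wb"])
schemes = [("w0", np.ones(len(resp))), ("w1", resp["w1"]), ("w2", resp["w2"])]
for label, w in schemes:
    m, lo, hi = weighted_mean_ci(resp["female"], w)
    # Subtract the F2F mean so the benchmark sits at zero, as in Figs 1-4.
    print(f"{label}: {m - benchmark:+.3f} [{lo - benchmark:+.3f}, {hi - benchmark:+.3f}]")
```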

Second, we rely on Wald tests to assess whether the HFPS-based estimates obtained under different weights are significantly different vis-à-vis the F2F survey-based estimates for the general adult population. This approach requires constructing an appended dataset containing:

  1. all adult household members in the F2F survey households and the F2F survey household sampling weight (wb),

  2. the HFPS respondents and the HFPS household sampling weight set to 1 (w0),

  3. the HFPS respondents and the HFPS household sampling weight in the public use datasets (w1), and

  4. the HFPS respondents with the HFPS individual sampling weight (w2).

In this setup, the samples (ii) through (iv) are composed of identical individuals that are appended with different sampling weights and that constitute a subset of sample (i). A common name is used for the sampling weight variable across the appended datasets and each appended dataset includes the same set of individual-level variables, as listed in S1 and S2 Tables. Furthermore, a new categorical variable is defined to uniquely identify each appended sample (i through iv). A weighted linear regression is then estimated for each outcome of interest, with an identical set of independent variables that include the dichotomous variables identifying the samples (ii) through (iv), with sample (i) (i.e., all adult household members in the F2F survey households) serving as the comparison category. The sampling weight for each observation is equal either to wb, w0, w1 or w2 in accordance with the sample that the record belongs to. When presenting the results from this regression, the base category is shown on the top row and represents the mean from which all other estimates deviate. The values in rows other than the base category express the difference in mean from the base category.
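A minimal sketch of this appended-dataset regression, using the same hypothetical DataFrames as above, could look as follows; the coefficient on each sample dummy is that sample's weighted deviation from the benchmark mean, and its p-value corresponds to the Wald test of that difference.

```python
# Minimal sketch of the appended-dataset Wald tests: stack the four samples
# with a common `weight` column, then regress the outcome on sample dummies
# with the F2F adult sample ("b") as the base category.
import pandas as pd
import statsmodels.formula.api as smf

stacked = pd.concat([
    f2f_adults.assign(sample="b", weight=f2f_adults["wb"]),  # (i)
    resp.assign(sample="w0", weight=1.0),                    # (ii)
    resp.assign(sample="w1", weight=resp["w1"]),             # (iii)
    resp.assign(sample="w2", weight=resp["w2"]),             # (iv)
], ignore_index=True)

wls = smf.wls("female ~ C(sample, Treatment(reference='b'))",
              data=stacked, weights=stacked["weight"]).fit()
print(wls.summary())
```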

2.6. An application with phone survey data

In the previous section, we presented the approach to (i) understanding the differences in key attributes of HFPS respondents and the general adult population as captured in the pre-COVID-19 F2F surveys, and (ii) assessing how well individual-level weight adjustments can reduce these differences. The analysis in the previous section considers the individual-level variables used in the logit regression as part of creating the individual-level adjusted weights. This analysis is therefore a test of how well the adjustment model discussed in section 2.4 worked on a subset of its own model parameters.

In this section, we expand the analysis beyond these individual attributes from the F2F data to a practical application in using the HFPS phone survey data in a way many analysts might, which serves as a validation of our initial results. Most individual-level analyses using phone survey data face the constraint that the data are only available for the main respondent. However, we leverage the fifth HFPS round in Malawi and Nigeria, where individual-level data on labor market outcomes were collected for all adult household members and not just the main respondent. This special case allows us to create an alternative benchmark for the general adult population, which we use to understand (i) the differences in outcomes measured in the HFPS data between the HFPS respondents and the general adult population and (ii) how well the individual-level adjusted weight can overcome these differences. For this, we weight the individual-level HFPS data on select employment outcomes using the standard HFPS household sampling weights (w1) and assume these to be the alternative benchmark estimates for the general adult population. The HFPS household sampling weights (w1) are calibrated to provide representative estimates for the general household population, as discussed in section 2.4 and demonstrated in a recent study [15]. As such, the assumption is that the HFPS individual-level data on adult household members that are weighted by w1 are again representative of the general adult population. We consider this assumption reasonable for the illustrative purpose of this analysis, but the caveat to consider is that the data for all adult household members are collected through the main HFPS respondent rather than from each household member directly. Collecting individual-level data through a proxy is considered second-best to self-reporting because proxy response may lead to non-sampling error [17], which may not be mitigated through reweighting [21]. Ideally, individual-level analyses would therefore rely on self-reported data for all household members, but this may be prohibitive in the context of a phone survey. In the absence of self-reported data for all household members, we consider the proxy-reported data for all household members a reasonable alternative benchmark against which to test the outcomes for phone survey respondents.

With this setup, we compare (i) individual employment outcomes for all adults as reported by a proxy and weighted by the household sampling weight (w1), the benchmark, to (ii) the same set of employment outcomes for only the sample of HFPS respondents using the household sampling weight (w1), to assess the differences between these two populations, and to (iii) the sample of HFPS respondents using the adjusted individual sampling weight (w2), to assess how well it performs in overcoming the differences. The approach to gauging graphical and statistically significant mean differences between the three competing estimates for each employment outcome is identical to the approach detailed in section 2.5.

3. Results

In the following, we first discuss how phone survey respondents differ from the general adult population and then explore how well the different weight adjustment techniques perform in allowing the data on HFPS respondents to be more representative of the general adult population.

3.1. Phone survey respondents versus the general adult population

Given the respondent selection protocols discussed above, it is expected that the two populations–phone survey respondents and the general adult population–differ along various dimensions. As a reminder, S1 and S2 Tables show a set of descriptive statistics for the individual-level variables of interest for both populations in each of the four countries. Table 3 presents the results (i.e. marginal effects) from the estimation of Eq 1, i.e. the logit regression that models the likelihood of being a HFPS respondent among adults in successfully interviewed HFPS households as a function of a rich set of individual and household attributes. Several overarching results emerge.

Table 3. Marginal effects from logit regressions on being a HFPS respondent in round 1.

Ethiopia Malawi Nigeria Uganda
Household Size -0.015 (.002)*** -0.013 (.002)*** -0.011 (.001)*** -0.012 (.002)***
Head 0.457 (.018)*** 0.397 (.026)*** 0.314 (.019)*** 0.389 (.027)***
Spouse of head 0.128 (.023)*** 0.140 (.033)*** -0.010 (.023) 0.183 (.032)***
Child of head 0.083 (.019)*** -0.006 (.026) 0.026 (.021) 0.000 (.027)
Male -0.005 (.009) -0.040 (.015) *** 0.013 (.013) -0.050 (.012)***
Ages 25–49 0.031 (.011)*** 0.040 (.016) ** 0.079 (.016)*** 0.112 (.019)***
Ages 50+ -0.009 (.014) 0.038 (.019) ** 0.060 (.018)*** 0.094 (.020)***
Married -0.016 (.012) -0.021 (.019) 0.033 (.014)** -0.065 (.017)***
Primary 0.030 (.010)*** 0.005 (.015) 0.021 (.012)* 0.029 (.010)***
Secondary 0.043 (.013)*** 0.014 (.017) 0.031 (.012)*** 0.050 (.033)
Certificate 0.079 (.037)** -0.002 (.016) 0.057 (.017)*** 0.037 (.022)*
Post-Secondary Degree 0.063 (.016)*** 0.002 (.023) 0.036 (.019)* -0.003 (.020)
Employed for a wage/salary -0.007 (.010) -0.005 (.015) 0.039 (.013)*** 0.007 (.012)
Owner of a household enterprise 0.026 (.011)** 0.047 (.012)*** 0.057 (.009)*** 0.029 (.010)***
Casual laborer 0.075 (.020)*** 0.055 (.012)***
Consumption quintile 2 -0.011 (.017) -0.007 (.021) -0.022 (.015) -0.024 (.015)
Consumption quintile 3 -0.018 (.016) -0.017 (.021) -0.034 (.015)** -0.031 (.015)**
Consumption quintile 4 -0.031 (.016)* -0.027 (.020) -0.042 (.016)** -0.048 (.015)***
Consumption quintile 5 -0.043 (.017)*** -0.048 (.021)** -0.041 (.017)** -0.055 (.017)***
Individual owns a mobile phone 0.114 (.009)*** 0.154 (.012)*** 0.077 (.014)*** 0.139 (.010)***
Spatial Fixed Effects Region x Urban District State Subregion
Number of Observations 8535 4959 6183 6647
Pseudo R-squared 0.456 0.437 0.484 0.386

Note: † denotes dichotomous variables. Standard errors are reported in parentheses. ***/**/* denote statistical significance at the 1/5/10 percent level, respectively. For each country, the sample is all F2F survey household members aged 15 and older in the set of households that were successfully interviewed in round 1 of the phone survey.

First, household heads are most likely to be respondents. In all surveys, being the household head has the largest effect on the conditional probability of being the phone survey respondent, increasing that probability by between 31.4 percent in Nigeria and 45.7 percent in Ethiopia (with Malawi- and Uganda-specific impacts being estimated at 39.7 percent and 38.9 percent, respectively). Note that this result already accounts for phone ownership, which is one of the control variables. Being the spouse of the household head also has a large effect in all countries but Nigeria, ranging between 12.8 percent in Ethiopia and 18.3 percent in Uganda. These results are likely driven by the country-specific respondent selection protocols, which tend to favor the household head or their spouse, as discussed in section 2.3. Conditional on household headship and the remaining control variables, men are less likely to be HFPS respondents in Malawi and Uganda, and just as likely as women in Ethiopia and Nigeria. However, men make up the majority of respondents in all four countries (Table 2). This finding is due to household heads being predominantly male combined with the strong effect headship has on being the respondent. The household head effect thus masks the gender dynamics of phone survey response.

Second, it is notable that the household head effect in Malawi is similar in magnitude to that in the other three countries (Malawi: 0.397 vs Ethiopia: 0.457; Nigeria: 0.314; Uganda: 0.389), even though the Malawi survey stands out for not targeting the household head as first contact but rather calling available phone numbers in random order. In spite of this protocol, 79 percent of respondents are household heads in Malawi, not very different from the shares in the other countries (Ethiopia: 83 percent; Nigeria: 82 percent; Uganda: 74 percent). This is due to a combination of factors. On the one hand, phone ownership is skewed towards household heads, so household heads are more likely to be called than other members in the first place.

In the Malawi sample, close to 60 percent of mobile phone owners are household heads and, in a multivariate logit regression, household heads are found to be 32 percent more likely to own a mobile phone, all other things being equal (S3 Table). On the other hand, calling available phone numbers in random order affects who is a household’s first contact; but not all first contacts also ended up being the main respondent. In round 1 of the Malawi HFPS, 66 percent of main respondents were also first contacts. For the remaining 34 percent, the first contact handed the phone to a household member who then became the main respondent. One scenario is when the first contact was not a household member but a reference contact outside of the household because no one in the household owned a mobile phone (see section 2.3). This was the case for about 15 percent of households contacted for round 1 of the Malawi HFPS. Not being a member of the household, the reference contact cannot be the main respondent and so the phone was handed to a member of the household instead. In another scenario, although the first contact was a member of the household, they preferred for another member, often the household head, to be the main respondent.

Third, ownership of a mobile phone increases the probability of being the respondent substantially, ranging from 7.7 percent in Nigeria to 15.4 percent in Malawi (with Ethiopia- and Uganda-specific impacts being estimated at 11.4 percent and 13.9 percent, respectively). This is not surprising in a phone survey context, though the effect is not as strong as the effect of household headship, which suggests that phones are handed over from one household member to another to complete the interview.

Fourth, HFPS respondents are more educated than non-respondents in all countries except for Malawi. In Ethiopia and Nigeria, holding any of a primary, secondary, post-secondary certificate, or post-secondary degree increases the probability of being a HFPS respondent vis-à-vis adults with no degree. In Uganda, there are effects specifically associated with having primary education and with having a post-secondary certificate. The effect sizes range from two to eight percent.

Fifth, being in an age category older than 15–24 increases the probability of being a phone survey respondent in all countries but Ethiopia, where individuals aged 50+ are not any more likely to be selected as HFPS respondents vis-à-vis individuals aged 15–24. The age effects are particularly pronounced in Uganda, where individuals aged 25–49 and those aged 50+ are respectively 11.2 percent and 9.4 percent more likely to be HFPS respondents compared to individuals aged 15–24.

Sixth, owning a household enterprise increases the probability of being a HFPS respondent in all countries, with the effect sizes ranging from 2.6 to 5.7 percent. The data on participation in casual labor are available only for Malawi and Ethiopia; the results show that casual labor increases the likelihood of being a HFPS respondent by 7.5 percent in Ethiopia and 5.5 percent in Malawi. Given the high prevalence of casual labor in Malawi (an estimated 38.6 percent of adults in the F2F survey), this is a relatively strong effect.

Finally, greater household wealth (proxied by household consumption quintiles) leads to a decline in the probability of being a HFPS respondent. However, differences only arise in the third quintile in Nigeria and Uganda, the fourth quintile in Ethiopia, and in the top quintile in Malawi. This suggests that wealthier households are overall less likely to respond to the phone survey, possibly due to higher opportunity cost of their time.

3.2. Assessing bias reduction through weight adjustments

We now turn to assessing how well the various survey weights perform at counteracting the bias associated with phone survey respondent selection. The results of the graphical analysis are shown in Figs 1–4. The effectiveness of the bias reduction is mixed and depends on the outcome of interest. Compared to the estimates obtained under the HFPS household sampling weights, the estimates based on the HFPS individual weights move closer to those for the general adult population for most individual-level outcomes of interest. However, confidence intervals widen as well. Several points stand out.

Fig 1. Graphical inspection of bias adjustment, Ethiopia.

Fig 2. Graphical inspection of bias adjustment, Malawi.

Fig 3. Graphical inspection of bias adjustment, Nigeria.

Fig 4. Graphical inspection of bias adjustment, Uganda.

First, there are instances where the HFPS household weight (w1) increases the difference between the unweighted respondent data and our benchmark wb-weighted F2F survey sample. Notably, the incidence of headship moves further from the mean in all four countries, though the difference is easier to detect in Nigeria and Uganda. The incidence of being a spouse also shows this pattern across all countries but Uganda. Beyond headship, Ethiopia exhibits larger deviations with household weights (w1) than without for the estimates of the dichotomous variables identifying men and women, those in the youngest and oldest age categories and married individuals. The same is true in Malawi for the youngest and oldest age groups, Nigeria for men and women and individuals that own a household enterprise, and in Uganda for individuals in the age group 25–49, those without an educational degree, individuals that are engaged in wage employment, those that own a household enterprise, and individuals that own a mobile phone. This broad set of instances provides evidence that the HFPS household weights (w1) do not adequately support the analysis of individual-level data on HFPS respondents in a way that is representative of the general adult population.

Second, individual weights (w2) substantially reduce the bias in those variables with the largest deviations from the benchmark mean. Specifically, the over-representation of household heads and mobile phone owners among phone survey respondents cannot be corrected by the HFPS household weights (w1) but is addressed more effectively by individual weights (w2). However, the individual weights only partially eliminate the difference from baseline adults and cause the confidence intervals to widen.

Lastly, there are some cases of over-correction where the individual weights move the mean estimates for the HFPS respondents beyond those that are associated with the benchmark sample of adult household members in F2F surveys. This is true particularly for the estimates of being the spouse of the household head in Malawi and being a woman in Uganda. These biases are introduced through reweighting and are not present in the unweighted data.

Tables 4 and 5 present the results from the weighted linear regressions that are detailed in section 2.5. They allow us to study whether differences between the benchmark means for the general adult population from the pre-COVID-19 F2F survey and the unweighted, household-weighted, and individual-weighted estimates for the HFPS respondents are statistically significant. The results show that the differences between the HFPS respondents and the general adult population are not fully addressed by HFPS individual weights (w2). However, there are a few cases where individual weights do succeed in addressing the bias. In Malawi, the individual weights can deal with over-representation of age group 50+ and under-representation of females. In all countries except for Ethiopia, under-representation of respondents without an educational degree is also mitigated. The over-arching result remains that the individual weights applied to the data on the HFPS respondents move the estimates in the right direction, but they do not successfully eliminate bias. These results hold if the sample is broken down by gender and different age groups. Gender- and age-disaggregated results are presented in S4–S7 Tables.

Table 4. Tests of mean differences between face-to-face (F2F) adults and phone respondents: Sex, age, relation to head–as measured in the F2F survey.

  Comparison Group Ethiopia Malawi Nigeria Uganda
Variable Sample Weight Abbrev. Beta p-value Beta p-value Beta p-value Beta p-value
Female Base, All F2F Adults F2F HH Weight b 0.518 0.513 0.515 0.518
Phone respondents Unweighted w0 -0.142 (.000)*** -0.143 (.000)*** -0.243 (.000)*** -0.035 (.003)***
Phone respondents HFPS HH Weight w1 -0.242 (.000)*** -0.118 (.000)*** -0.263 (.000)*** -0.036 (.020)**
Phone respondents HFPS Individual Weight w2 -0.146 (.000)*** 0.035 (.206) -0.068 (.004)*** 0.039 (.028)**
Ages 15–24 Base, All F2F Adults F2F HH Weight b 0.356 0.387 0.313 0.360
Phone respondents Unweighted w0 -0.227 (.000)*** -0.269 (.000)*** -0.256 (.000)*** -0.300 (.000)***
Phone respondents HFPS HH Weight w1 -0.238 (.000)*** -0.300 (.000)*** -0.255 (.000)*** -0.290 (.000)***
Phone respondents HFPS Individual Weight w2 -0.124 (.000)*** -0.193 (.000)*** -0.100 (.000)*** -0.205 (.000)***
Ages 25–49 Base, All F2F Adults F2F HH Weight b 0.478 0.427 0.469 0.450
Phone respondents Unweighted w0 0.188 (.000)*** 0.229 (.000)*** 0.082 (.000)*** 0.148 (.000)***
Phone respondents HFPS HH Weight w1 0.169 (.000)*** 0.208 (.000)*** 0.082 (.000)*** 0.183 (.000)***
Phone respondents HFPS Individual Weight w2 0.068 (.001)*** 0.176 (.000)*** -0.010 (.622) 0.152 (.000)***
Ages 50+ Base, All F2F Adults F2F HH Weight b 0.166 0.186 0.218 0.190
Phone respondents Unweighted w0 0.039 (.000)*** 0.040 (.001)*** 0.175 (.000)*** 0.153 (.000)***
Phone respondents HFPS HH Weight w1 0.069 (.000)*** 0.092 (.000)*** 0.173 (.000)*** 0.107 (.000)***
Phone respondents HFPS Individual Weight w2 0.056 (.001)*** 0.017 (.342)  0.111 (.000)*** 0.053 (.000)***
Head Base, All F2F Adults F2F HH Weight b 0.370 0.341 0.326 0.374
Phone respondents Unweighted w0 0.457 (.000)*** 0.446 (.000)*** 0.501 (.000)*** 0.366 (.000)***
Phone respondents HFPS HH Weight w1 0.486 (.000)*** 0.455 (.000)*** 0.507 (.000)*** 0.369 (.000)***
Phone respondents HFPS Individual Weight w2 0.209 (.000)*** 0.100 (.000)*** 0.120 (.000)*** 0.072 (.000)***
Spouse Base, All F2F Adults F2F HH Weight b 0.259 0.232 0.290 0.238
Phone respondents Unweighted w0 -0.161 (.000)*** -0.068 (.000)*** -0.199 (.000)*** -0.036 (.000)***
Phone respondents HFPS HH Weight w1 -0.192 (.000)*** -0.096 (.000)*** -0.211 (.000)*** -0.032 (.012)**
Phone respondents HFPS Individual Weight w2 -0.069 (.000)*** 0.095 (.001)*** -0.049 (.034)** 0.110 (.000)***

Note: Base row reports the nationally representative mean among all adults in the face-to-face (F2F) survey. Rows other than the base row report the difference from the base and the p-value from a test of significance for that difference. Sample: all adults in F2F surveys, of which phone survey respondents are a sub-sample.

Table 5. Tests of mean differences between face-to-face (F2F) adults and phone respondents: Marital status, education, employment–as measured in the F2F survey.

  Comparison Group Ethiopia Malawi Nigeria Uganda
Variable Sample Weight Abbrev. Beta p-value Beta p-value Beta p-value Beta p-value
Married Base, All F2F Adults F2F HH Weight b 0.549 0.508 0.561 0.525
Phone respondents Unweighted w0 0.114 (.000)*** 0.250 (.000)*** 0.175 (.000)*** 0.204 (.000)***
Phone respondents HFPS HH Weight w1 0.176 (.000)*** 0.213 (.000)*** 0.175 (.000)*** 0.201 (.000)***
Phone respondents HFPS Individual Weight w2 0.116 (.000)*** 0.196 (.000)*** 0.050 (.034)** 0.166 (.000)***
Literate Base, All F2F Adults F2F HH Weight b 0.520 0.747 0.751 0.795
Phone respondents Unweighted w0 0.240 (.000)*** 0.128 (.000)*** 0.070 (.000)*** -0.020 (.046)**
Phone respondents HFPS HH Weight w1 0.060 (.001)*** 0.026 (.206)  0.030 (.026)** 0.007 (.488) 
Phone respondents HFPS Individual Weight w2 0.051 (.027)** 0.019 (.433) 0.021 (.253) -0.018 (.203)
No degree Base, All F2F Adults F2F HH Weight b 0.768 0.676 0.355 0.465
Phone respondents Unweighted w0 -0.282 (.000)*** -0.191 (.000)*** -0.114 (.000)*** 0.011 (.366) 
Phone respondents HFPS HH Weight w1 -0.045 (.003)*** -0.035 (.053)* -0.024 (.076)* -0.029 (.022)**
Phone respondents HFPS Individual Weight w2 -0.053 (.008)*** -0.040 (.124)  -0.025 (.218)  0.010 (.544) 
Wage employment Base, All F2F Adults F2F HH Weight b 0.090 0.089 0.100 0.219
Phone respondents Unweighted w0 0.199 (.000)*** 0.161 (.000)*** 0.113 (.000)*** 0.017 (.056)*
Phone respondents HFPS HH Weight w1 0.050 (.000)*** 0.089 (.000)*** 0.084 (.000)*** 0.076 (.000)***
Phone respondents HFPS Individual Weight w2 0.022 (.029)** 0.039 (.002)*** 0.032 (.011)** 0.035 (.008)***
Enterprise owner Base, All F2F Adults F2F HH Weight b 0.098 0.160 0.281 0.185
Phone respondents Unweighted w0 0.118 (.000)*** 0.181 (.000)*** 0.138 (.000)*** 0.134 (.000)***
Phone respondents HFPS HH Weight w1 0.078 (.000)*** 0.129 (.000)*** 0.179 (.000)*** 0.135 (.000)***
Phone respondents HFPS Individual Weight w2 0.037 (.004)*** 0.063 (.000)*** 0.070 (.001)*** 0.059 (.000)***
Mobile owner Base, All F2F Adults F2F HH Weight b 0.307 0.305 0.797 0.445
Phone respondents Unweighted w0 0.506 (.000)*** 0.518 (.000)*** 0.150 (.000)*** 0.317 (.000)***
Phone respondents HFPS HH Weight w1 0.351 (.000)*** 0.423 (.000)*** 0.133 (.000)*** 0.345 (.000)***
Phone respondents HFPS Individual Weight w2 0.227 (.000)*** 0.262 (.000)*** 0.060 (.001)*** 0.185 (.000)***

Note: Base row reports the nationally representative mean among all adults in the face-to-face (F2F) survey. Rows other than the base row report the difference from the base and the p-value from a test of significance for that difference. Sample: all adults in F2F surveys, of which phone survey respondents are a sub-sample.

3.3. An application with individual-level employment outcomes measured in phone surveys

We now turn to the analysis of individual-level employment outcomes during COVID-19, as measured in the fifth HFPS rounds in Malawi and Nigeria. The objective is to assess the use of individual-level recalibrated weights to analyze HFPS data as would be done in many research applications. Three dichotomous outcomes of interest identify whether, in the past 7 days:

  1. an individual worked to generate income for at least 1 hour, irrespective of type of employment (i.e. any employment),

  2. an individual worked for a wage or salary (i.e. wage employment), and

  3. an individual worked at a household enterprise, as an owner, manager, or a contributing laborer (i.e. self-employment).

The pool of HFPS respondents differs slightly in round 5 vis-à-vis round 1 due to attrition. Therefore, we generate a round 5-specific HFPS individual weight, following the same steps outlined in section 2.2. Fig 5 (HFPS round 5 employment outcomes, with HFPS household versus individual weights) shows the mean and confidence interval for each employment outcome of interest for:

  1. all adults that were interviewed in the F2F survey and that were residing in HFPS households successfully interviewed in round 5, weighted by the round 5 HFPS household sampling weight (w1)–assumed to be representative of the general adult population,

  2. the main HFPS respondents interviewed in round 5, weighted by the round 5 HFPS household sampling weight (w1), and

  3. the main HFPS respondents interviewed in round 5, weighted by the round 5 HFPS individual sampling weight (w2).

We compare (i) an estimate that is assumed to be representative of the general adult population but relies on reports from a proxy, the main HFPS respondent, to (ii) a “naive” estimate applying household survey weights to the sample of HFPS respondents (based on self-reports), and (iii) estimates obtained by weighting that same sample of HFPS respondents with adjusted individual weights.

The mean for (i), which we take as the benchmark estimate in this portion of our analysis, is subtracted from all estimates. Following the approach in Figs 1–4, values for (ii) and (iii) thus reflect the deviation from the benchmark. Similar to the results presented in section 3.2, the HFPS individual weights succeed in moving the estimates for the HFPS respondents closer to those for the general adult population (except for the incidence of self-employment in Malawi), albeit with widened confidence intervals (Fig 5). When weighted with the HFPS household sampling weights (w1), i.e. the "naïve" estimate, the mean differences remain statistically significant between the estimates for the HFPS respondents and the estimates for all adults residing in HFPS households (Table 6). This result holds regardless of the country and employment outcome of interest. Once weighted with the HFPS individual weights (w2), the estimates of wage employment and self-employment for the HFPS respondents in Nigeria are statistically indistinguishable from the benchmark estimates. However, for the overall employment variable in Nigeria and for all three employment variables in Malawi, the mean differences between the w2-weighted estimates for the HFPS respondents and the benchmark estimates for all adults residing in HFPS households remain statistically significant.
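As an illustration of this comparison, a minimal sketch (with hypothetical file and column names) of computing the three estimates and their deviations from the benchmark could look as follows:

```python
# Sketch of the three Fig 5 estimates; file and column names are hypothetical.
import numpy as np
import pandas as pd

def wmean(df: pd.DataFrame, value: str, weight: str) -> float:
    """Weighted mean of `value` using sampling weights in `weight`."""
    return np.average(df[value], weights=df[weight])

adults = pd.read_csv("hfps_round5_adults.csv")      # all adults in round 5 households

benchmark = wmean(adults, "any_employment", "w1")   # (i) all adults, household weight
resp = adults[adults["is_respondent"] == 1]
naive = wmean(resp, "any_employment", "w1")         # (ii) respondents, household weight
reweighted = wmean(resp, "any_employment", "w2")    # (iii) respondents, individual weight

print(f"naive deviation:      {naive - benchmark:+.3f}")
print(f"reweighted deviation: {reweighted - benchmark:+.3f}")
```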

Table 6. Differences in HFPS round 5 employment outcomes between phone survey respondents and adult population.

  Comparison Group   Malawi Nigeria
Variable Sample Weight Abbrev. Beta p-value Beta p-value
Any Employment Adults (base) Final HH Weight w1 0.612 0.720
Respondents Final HH Weight w1 0.204 (.000)*** 0.119 (.000)***
Respondents Individual Weight w2 0.146 (.000)*** 0.048 (.018)**
Wage Employment Adults (base) Final HH Weight w1 0.158 0.086
Respondents Final HH Weight w1 0.066 (.000)*** 0.026 (.003)***
Respondents Individual Weight w2 0.027 (.099)* 0.016 (.220) 
Self-Employment Adults (base) Final HH Weight w1 0.119 0.297
Respondents Final HH Weight w1 0.086 (.000)*** 0.024 (.050)*
Respondents Individual Weight w2 0.094 (.000)*** -0.001 (.975) 

Note: Base row reports the nationally representative mean among all adults present in both the F2F and phone surveys. Rows other than the base row report the difference from the base in the sample of respondents present in both the F2F and phone surveys, and a p-value from a test of significance for that difference. Employment = 1 if the individual spent any time in the last seven days doing the specified work, 0 otherwise. All employment data are from the fifth (post-COVID-19) round of the HFPS in Malawi and Nigeria.

The disaggregated employment results in S8 and S9 Tables are consistent with the findings presented in Table 6. In Nigeria, the individual weights remove the differences in the estimates for the HFPS respondents versus the general adult population for wage employment and for self-employment, except among individuals aged 25–49, where a significant difference remains for wage employment. The differences in the overall employment variable remain significant among the male sub-population and among individuals aged 25–49 in Nigeria, which are the largest sub-populations of HFPS respondents. In Malawi, the individual-weighted estimates for the HFPS respondents are statistically indistinguishable from the benchmark estimates for wage employment among males and for overall and wage employment among individuals aged 25–49. The HFPS household weights also mitigate bias in some sub-populations, particularly in Nigeria, but there are no cases in which the individual weights do not perform at least as well. Overall, these results suggest that while individual weights can be more effective than household weights in reducing the bias in the analysis of individual-level data on the main HFPS respondents, they are still insufficient to eliminate the bias in full.

4. Conclusion

Our analysis has confirmed that phone survey respondents in Ethiopia, Malawi, Nigeria, and Uganda are significantly different from the general adult population in a range of demographic, education, and labor market characteristics. On average, respondents are significantly more likely to be household heads or their spouses, and they tend to be older, more educated, and more likely to own a household enterprise.

We then assess how well reweighting can address these selection biases. For this, we recalibrate the HFPS household sampling weights based on propensity score adjustments derived from a cross-country comparable model of an adult individual's likelihood of being interviewed, as a function of both individual- and household-level attributes. The individual-level reweighting reduces the bias, consistently moving the estimates for the phone survey respondents closer to those for the general adult population for a range of variables. However, individual-level reweighting fails to fully overcome the biases in most cases, as the differences in means remain statistically significant for most of the outcomes of interest.
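For readers wishing to apply a similar recalibration, the sketch below outlines the steps as we understand them from the description above — a logit propensity model, decile-based adjustment factors, and multiplication with the HFPS household weight — using illustrative covariates and column names rather than the paper's exact specification:

```python
# Sketch of the individual-level weight recalibration; the covariate list and
# column names are illustrative, not the paper's exact specification.
import pandas as pd
import statsmodels.api as sm

adults = pd.read_csv("f2f_adults_in_hfps_households.csv")  # hypothetical input

# Step 1: model an adult's probability of being the phone survey respondent.
X = sm.add_constant(adults[["head", "female", "age", "educ_years", "hh_size"]])
logit = sm.Logit(adults["is_respondent"], X).fit(disp=False)
adults["p_hat"] = logit.predict(X)

# Step 2: group the predicted probabilities into deciles and take the inverse
# of the decile-mean probability as the adjustment factor.
adults["decile"] = pd.qcut(adults["p_hat"], 10, labels=False, duplicates="drop")
adults["af"] = 1.0 / adults.groupby("decile")["p_hat"].transform("mean")

# Step 3: recalibrated (raw) individual weight for respondents; winsorization
# and post-stratification would follow (see the response to reviewers below).
adults["w2_raw"] = adults["af"] * adults["w1"]
```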

An application of the individual-level recalibrated weights to the phone survey data serves as a validation of our initial results. Using individual-level phone survey data, we show that respondents’ labor market outcomes during the COVID-19 pandemic differ from the adult population living in phone survey households. Here, too, individual-level reweighting is a step in the right direction but is ultimately insufficient.

Our findings have implications both for the use of existing phone survey data for individual-level analysis and for the design of future phone surveys. Phone surveys have proven to be critical tools for meeting the urgent demand for data on the impacts of the COVID-19 pandemic in low- and middle-income countries, and phone survey data have been used widely to provide insights on a broad range of issues related to the pandemic.

Across Ethiopia, Malawi, Nigeria, and Uganda alone, a total of 42 national phone survey rounds have been implemented from April 2020 to June 2021, amounting to a total of over 81,000 interviews. In the same timeframe, 28 analytical survey reports, several World Bank publications, cross-country journal publications, working papers, and policy briefs were produced, with a total download count of over 21,000 [38, 41–45]. The phone surveys in the four countries we study represent a fraction of the phone surveys fielded in the context of the COVID-19 pandemic. Many national statistical offices reported implementing phone surveys [3], the World Bank supported phone surveys in over 100 countries, and many other organizations rolled out phone surveys of their own.

Our results are relevant for individual-level analyses undertaken with these data. Specifically, where phone surveys are based on a frame of phone numbers from a previous F2F survey, making full use of the available information to recalibrate weights at the individual level is worthwhile to achieve better representativeness. The availability of information on both individuals who participate in the survey and individuals who do not is an important advantage of using phone numbers from a previous F2F survey. There are also reweighting techniques for phone surveys based on random digit dialing (RDD) [21, 29], which was used frequently for COVID-19-related phone surveys in low- and middle-income countries. However, we are not aware of any systematic attempt at assessing their effectiveness in the context at hand.

In any case, in phone surveys in which individual-level data are available only for the main respondent and respondent selection was not based on a probability sampling method, our findings suggest that achieving fully representative individual-level estimates is unlikely to be feasible, and researchers should be aware of these limitations.

The rapid design and successful implementation of high-frequency phone surveys during the COVID-19 pandemic has been an unprecedented learning experience on the part of national statistical offices in low- and middle-income countries and international agencies and donor organizations that have provided financial and technical support to these operations. Phone surveys are therefore expected to be part of the post-pandemic survey landscape in low- and middle-income countries, complementing face-to-face surveys.

In view of our findings regarding the limits of representativeness of individual-level phone survey data in four African countries, survey implementers should think more critically about respondent selection protocols in future phone surveys.

A desirable option is to randomly select an adult household member to be interviewed in each household on topics that are related to individuals and personal experiences. In the context of the on-going HFPS rounds and future phone surveys that use existing household surveys as sampling frames, the interview target can be selected at random (without replacement) in each household following a household roster update. Upon the selection of the interview target, the current phone survey respondent can be asked to (i) either pass the phone to the selected individual if he or she is available, (ii) provide a phone number for the selected individual if a person-specific phone number exists, or (iii) coordinate with the selected individual to converge on a date and time for an interview using the current respondent’s phone. Depending on the objective of the study, the randomly selected household member can ultimately replace or be in addition to the main phone survey respondent. Retaining the ‘most knowledgeable’ household member as one of the respondents may be desirable when collecting reliable household-level data is a priority.
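As a simple illustration of such a protocol, the sketch below draws one adult (age 15 and older) at random per household from an updated roster and records the inverse of the within-household selection probability, which a design-based individual weight could incorporate; the roster column names are hypothetical.

```python
# Sketch of random within-household respondent selection from an updated
# roster; column names are hypothetical.
import pandas as pd

def select_interview_targets(roster: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Randomly select one adult (15+) per household and attach the inverse
    of the within-household selection probability (= number of eligible
    adults), which can multiply onto the household sampling weight."""
    adults = roster[roster["age"] >= 15].copy()
    adults["n_eligible"] = adults.groupby("hh_id")["member_id"].transform("count")
    targets = adults.groupby("hh_id").sample(n=1, random_state=seed)
    targets["selection_weight"] = targets["n_eligible"]  # inverse of 1/n_eligible
    return targets[["hh_id", "member_id", "selection_weight"]]
```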

Attempting to interview all adult members would be yet another option. Conducting several interviews per household would, however, erode the comparatively low costs of phone surveys and likely increase the scope for non-response. Given limited prior experience with such variations in respondent selection protocols, a sensible first step would be to pilot one or several of these improved options in a random subset of households in future phone surveys, to better understand the subsequent impacts on consent, non-response, and attrition.

Supporting information

S1 Table. Descriptive statistics.

Ethiopia and Malawi. Notes: No weights are used. † denotes a dichotomous variable. Respondent identifies whether the individual was a HFPS respondent–set to 1 for all individuals under the Phone Respondents column. These variables originate from the pre-COVID-19 F2F survey in each country.

(PDF)

S2 Table. Descriptive statistics.

Nigeria and Uganda. Notes: No weights are used. † denotes a dichotomous variable. Respondent identifies whether the individual was a HFPS respondent–set to 1 for all individuals under the Phone Respondents column. These variables originate from the pre-COVID-19 F2F survey in each country.

(PDF)

S3 Table. Marginal effects from logit regressions on mobile ownership in the sampled household baseline datasets.

Notes: † denotes dichotomous variables. Standard errors are reported in parentheses. ***/**/* denote statistical significance at the 1/5/10 percent level, respectively. For each country the sample is all F2F survey household members age 15 and older for the set of households that were successfully interviewed in round 1 of the phone survey.

(PDF)

S4 Table. Ethiopia: Tests of difference between face-to-face adults and phone respondents, disaggregated by sex and age group.

Notes: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

(PDF)

S5 Table. Malawi: Tests of difference between face-to-face adults and phone respondents, disaggregated by sex and age group.

Notes: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

(PDF)

S6 Table. Nigeria: Tests of difference between face-to-face adults and phone respondents, by sex and age group.

Note: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

(PDF)

S7 Table. Uganda: Tests of difference between face-to-face adults and phone respondents, by sex and age group.

Notes: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

(PDF)

S8 Table. Difference between adults and phone respondent employment outcomes, by sex.

Notes: Base row reports the HFPS-based nationally representative mean among all adults present in the face-to-face and phone surveys. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference. Employment = 1 if individual spent any time in the last seven days doing specified work, 0 otherwise. All data are from the fifth round of the HFPS in Malawi and Nigeria.

(PDF)

S9 Table. Difference between adults and phone respondent employment outcomes, by age group.

Notes: Base row reports the HFPS-based nationally representative mean among all adults present in the face-to-face and phone surveys. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference. Employment = 1 if individual spent any time in the last seven days doing specified work, 0 otherwise. All data are from the fifth round of the HFPS in Malawi and Nigeria.

(PDF)

Acknowledgments

The authors would like to thank Kathleen Beegle, Calogero Carletto, Isabela Coelho, Kristen Himelein, and Yannick Markhof for their comments on the earlier versions of this paper. We also thank the individuals involved in the design, implementation and dissemination of high-frequency phone surveys on COVID-19, specifically the World Bank LSMS team, and the phone survey managers and interviewers at the Malawi National Statistical Office, the Nigeria Bureau of Statistics, the Uganda Bureau of Statistics and Laterite Ethiopia.

Data Availability

The data are available from the World Bank's Microdata Library, High Frequency Phone Survey Catalog (https://microdata.worldbank.org/index.php/catalog/hfps) as well as LSMS catalog (https://microdata.worldbank.org/index.php/catalog/lsms).

Funding Statement

Funding for data collection and analysis comes from the World Bank Multi-Donor Trust Fund for Integrated Household and Agricultural Surveys in Low and Middle-Income Countries (TF072496). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Decision Letter 0

Bjorn Van Campenhout

21 Jul 2021

PONE-D-21-17113

Representativeness of individual-level data in COVID-19 phone surveys: Findings from Sub-Saharan Africa

PLOS ONE

Dear Dr. Wollburg,

Let me apologize first for taking much more time than expected. All reviewers kept requesting extensions of deadlines, and in the current situation it is hard for editors not to grant these. Even at this point, one review is still outstanding, but I decided we should be able to proceed with the two I have received at this point.

From the assessments of the reviewers, you will see that both reviewers feel this is important research, but also have some important questions on the methodology used (why deciles?) and suggestions on how to make the paper stronger overall. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 04 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Bjorn Van Campenhout, Ph.D.

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer #1: During the COVID-19 pandemic some of the LSMS-ISA household surveys were used as sampling bases for new phone surveys. This paper assesses the reliability of these phone surveys to provide individual-level statistics. One of the main constraints to doing this is that 74% to 83% of the phone calls were made to household heads, so the question arises how you can use this information to say something about the population as a whole. The public use data already include household-level weights that correct for non-response in the phone survey. This paper shows that those weights are not adequate to get accurate individual level representation and shows that a reweighting procedure using individual weights provides some improvement.

My main comments are about (i) framing of the paper and (ii) calculation of the weights and (iii) the tests.

(i) Clearly the main problem the authors have to deal with is that phone numbers were mainly collected from household heads and even where they were not, the interviewers ended up speaking to the heads. It is obviously not going to be easy to say something about the general population if you have such a specific sample of household heads. So the problem is really: with four fifths of the sample household heads, how can we say something about individual responses (on, say, knowledge of COVID-19) of everyone else. Presumably this is going to come about by giving lots of weight to those who are not heads and don’t have the typical head characteristics. There is no methodological innovation in this paper as such reweighing is quite standard, but because this is an important and widely used data source there is, in my opinion, merit in knowing whether we can use LSMS-ISA data for individual level analysis.

The problem is also quite specific to this survey constellation in LSMS-ISA. If the survey designers had known that a phone survey would follow, they very likely would not have set it up this way. So the question is one of how much reweighing can help in this specific survey.

These are important points to highlight when framing the paper.

(ii) There is something odd about the two-step procedure you use. You estimate the probability of each individual to end up being interviewed in equation 1 and use the inverse (of the decile-average) to reweigh. But instead of applying that weight directly you multiply it with the household weight (in equation 3). I don’t understand the logic of doing this. Why not just use the inverse of the probability that each individual ends up in the sample as sampling weight? That is the tried and tested technique used widely to correct for such problems. If you do want to argue in favor of the two-step procedure then you should take account of your nested procedure when estimating equation 1.

(iii) There is something circular about your tests: you use a set X of observed characteristics of individuals to reweigh the observations. Then you see whether after reweighing this same set of characteristics X better reflects the population average. That is setting yourself mechanically up for success. And I think that if you drop the two-step procedure and use the single step procedure you will get a pretty good match. This is something to acknowledge and address in the paper.

Four specific questions:

• Could you explain why you use deciles of the predicted probabilities rather than the probabilities themselves?

• Section 2.6 is not very clearly written; I only understood what you were trying to do after reading the results section, which defeats the purpose of having a section explaining the empirical set-up. I suggest a careful rewrite.

• What do you mean by saying that the winsorized weights are post-stratified (p. 19)? Could you explain this part better?

• Why is winsorization necessary? You have already taken deciles for your weights, so presumably there are no outliers. Could you provide some specific numbers to show what is going on in the extremes?

Reviewer #2: This article analyzes whether unweighted phone survey data are subject to selection bias, and whether the use of household- and individual-level weights correct for such bias. The paper uses household rosters obtained from pre-COVID face-to-face household surveys as nationally representative data, and as a benchmark for what the survey population of phone surveys administered during the pandemic should look like. Obviously, as the phone survey protocols targeted the household head, the unweighted phone survey sample does not resemble the nationally representative adult population. Interestingly, the use of household-level weights does not help address this bias, and in many cases, makes the phone survey sample even less representative of the adult population. Although individual-level weights perform better, this method is not sufficient as a bias correction technique, as differences between the nationally representative adult sample and the phone survey sample remain significant for several variables.

The analyses performed in this paper are interesting and worthwhile publishing, but I believe that the paper needs to be rewritten. First and foremost, I find the question of whether the phone survey data are representative of the adult population not very novel or interesting and would suggest taking that result for granted or as an obvious outcome of the survey protocols, which were not designed to get a representative sample of individuals. The protocols targeted households, and were mostly blind towards selection of respondents within the household.

The real added value of this paper is therefore the analyses of the bias correction methods: when we do our phone surveys with just one member of the household, usually the household head, can we use weighting methods to correct for a potential selection bias? The paper finds that this is not the case, and this has important implications for respondent selection protocols, if one is interested in individual- instead of household-level data. This could be focused on much more throughout the abstract, introduction, and conclusion.

For instance, it is not salient from the abstract that the focus is on selection bias from an individual rather than household point of view, and as a result, the last sentence of the abstract comes as a surprise. The introduction could be shorter and more focused on the issue at hand - that most phone surveys are done with just one person in the household, typically the household head, and thereby not representative of the adult population. The main question, then, is whether bias correction methods can make the data more representative, or whether we should adjust the methods through which we select respondents within a household. To make this salient, it would help if the paper introduced in the last paragraph on page 8 (using the same data) is introduced much earlier in the introduction.

Second, and related to the first comment, the data from Malawi appear underutilized in better understanding the drivers of the selection bias. In all other countries, the protocol was to first call the household head; but in Malawi, phone numbers were called in random order. Despite that, Malawi still has significant selection bias, and it would be good to know whether this is because household members are passing the phone to the household head, or whether certain types of household members are less likely to pick up their phone, and replaced by the next number on the list.

Third, the weighting is an important aspect of the paper, but the discussion of how the weights are constructed is difficult to follow and although I understand that the authors do not want to provide the technical details here, as they are presented in a different paper, the presentation could benefit from more intuition. For instance, why are deciles being created? What is their use (and why deciles as opposed to for instance quintiles or quartiles)? And why is post-stratification applied to the weights?

Fourth, I am not convinced by the added value of the employment outcomes. The supposedly representative data are reported by proxies, except for the data that the respondents provide for themselves. If we find differences between the individual-weighted and benchmark data, is that because the weighting does not work, or because the benchmark data are biased? The authors discuss this in the conclusion, but it raises questions around the added value of this comparison; unless the authors could show that findings are the same regardless of whether we study variables on which there should be less asymmetric information and more accurate reporting by proxies, versus variables where we would expect more error.

Finally, with the large set of variables that the authors look into, times the 4 countries for which the analyses are replicated, there are a lot of numbers to review in order to draw conclusions. Although I see why the weights are estimated at the country level, in presenting the results on bias correction, the regressions could be presented at the aggregate level, combining all countries in one regression, and the country-specific analyses could be presented in an appendix. Alternatively, would there be scope for an aggregation over the different variables, in order to reduce the number of coefficients that need to be interpreted and aggregated by eyeballing before drawing conclusions?

Other comments:

- The first part of the introduction appears negligent of a large body of literature on phone surveys and survey design; but in fact this literature is mentioned towards the end of the introduction. In this case, I would consider integrating the existing literature in the first part of the introduction, and presenting the findings along with their implication as the key contribution, to shorten the introduction and appear less negligent of the work that has already been done in this field.

- The figures were not visible in the manuscript itself, only as the appendices of the submission.

- A bit more explanation on why certain things vary across countries would be useful. For instance, why are the fifth round data for two countries excluded? Did these not include the employment outcomes? But how does that square with all surveys being standardized across the four countries? And why were the selection protocols different in the four countries?

- At times, the paper reads a bit like a promotional brochure for the World Bank (for instance the introduction claims that the World Bank is the prime institute doing and learning around phone surveys, followed by 3-4 other institutions, among others). The fact that the anonymized surveys are published online a few weeks after completion could be a footnote in the methods section and does not require an entire paragraph. The paragraph in the conclusion stating how widely these data are used is irrelevant unless it is used to illustrate that people are using the data for individual-level analyses and therefore using the wrong household-level weights.

- Bottom of page 19 (last sentence), note that there is a small typo: households instead of household.

- IRB: Clarify that the data to which the authors had access were anonymized and that the merging of phone survey and face-to-face data was based on an anonymized household ID. Even if no ethics approval was needed for data collection, for a researcher to work with data that contain personal identifiers and merge data from different sources, IRB approval should be obtained.

Overall, though, the paper presents an extremely useful analysis that needs to be published, since we should be more aware that our household-level weight adjustments often do more harm than good if using individual-level data, and that survey respondent selection protocols need to be adjusted when the objective is individual-level analysis.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Nov 17;16(11):e0258877. doi: 10.1371/journal.pone.0258877.r002

Author response to Decision Letter 0


4 Sep 2021

Responses to Comments from Reviewer #1

Comment: During the COVID-19 pandemic some of the LSMS-ISA household surveys were used as sampling bases for new phone surveys. This paper assesses the reliability of these phone surveys to provide individual-level statistics. One of the main constraints to doing this is that 74% to 83% of the phone calls were made to household heads, so the question arises how you can use this information to say something about the population as a whole. The public use data already include household-level weights that correct for non-response in the phone survey. This paper shows that those weights are not adequate to get accurate individual level representation and shows that a reweighting procedure using individual weights provides some improvement.

My main comments are about (i) framing of the paper and (ii) calculation of the weights and (iii) the tests.

(i) Clearly the main problem the authors have to deal with is that phone numbers were mainly collected from household heads and even where they were not, the interviewers ended up speaking to the heads. It is obviously not going to be easy to say something about the general population if you have such a specific sample of household heads. So, the problem is really: with four fifths of the sample household heads, how can we say something about individual responses (on, say, knowledge of COVID-19) of everyone else. Presumably this is going to come about by giving lots of weight to those who are not heads and don’t have the typical head characteristics. There is no methodological innovation in this paper as such reweighing is quite standard, but because this is an important and widely used data source there is, in my opinion, merit in knowing whether we can use LSMS-ISA data for individual level analysis.

The problem is also quite specific to this survey constellation in LSMS-ISA. If the survey designers had known that a phone survey would follow, they very likely would not have set it up this way. So, the question is one of how much reweighing can help in this specific survey.

These are important points to highlight when framing the paper.

Response: Thank you for these thoughtful comments. We agree that the issue we discuss is specific to a phone survey setup in which the sampling frame is based on phone numbers collected from a previous survey and respondent selection targeted the most knowledgeable adult, as was done in the national phone surveys that have been supported by the Living Standards Measurement Study.

However, we contend that this kind of setup is relevant not only for our phone surveys of interest but for many phone surveys conducted in LMICs. In a research synthesis commissioned by IPA, Henderson and Rosenbaum (2020) describe the most common ways phone survey sampling is conducted in LMICs. These are: (i) using an existing list of phone numbers from a previous survey or program (the LSMS-ISA-supported pre-COVID-19 face-to-face surveys in our case), (ii) using a list of phone numbers provided by, for example, mobile network operators, and (iii) random digit dialing (RDD). The same paper, which gathers the pre-pandemic evidence on phone surveys in LMICs, presents an inventory of 21 surveys using a setup similar to the ones we discuss in our paper (versus 9 surveys using RDD and 3 using mobile network operator lists). In addition, in the surge of phone surveys that the COVID-19 pandemic precipitated, the practice of using phone numbers from pre-COVID-19 face-to-face surveys was quite common. Some examples are the IPA-led Cote d'Ivoire Recovr Survey as well as the World Bank-supported High Frequency Phone Surveys in Cambodia, Chad, Mali, Burkina Faso, Kenya, Sao Tome and Principe, and Zambia. In the latter set of phone surveys supported by the World Bank, the phone survey respondent was also NOT selected randomly among the eligible adult household members (as in the case of the phone surveys that we analyze and that provide the motivation for our work).

Given the relative success of phone surveys during the COVID-19 pandemic, phone surveys are expected to play a bigger role in survey data collection in LMICs going forward. So, we view our paper as also informing these future efforts on how to collect more representative data at the individual level, and we now reflect this reasoning in the Introduction and Conclusion sections.

Comment: (ii) There is something odd about the two-step procedure you use. You estimate the probability of each individual to end up being interviewed in equation 1 and use the inverse (of the decile-average) to reweigh. But instead of applying that weight directly you multiply it with the household weight (in equation 3). I don’t understand the logic of doing this. Why not just use the inverse of the probability that each individual ends up in the sample as sampling weight? That is the tried and tested technique used widely to correct for such problems. If you do want to argue in favor of the two-step procedure then you should take account of your nested procedure when estimating equation 1.

Response: Thank you for this comment. In what follows, we explain our reasoning regarding why we do the sampling weight calibration in the way that we do (and hence, why we prefer to stick to it).

First, we do not have a "nested" model per se when we follow the "two-step" procedure. That is, we start out with the phone survey household sampling weights as given and as disseminated in the public use datasets. These weights had already been subject to (a) adjustments to counteract selection bias at the household level and (b) post-stratification to match estimated population totals. We then augment these weights for the phone survey sample with an additional adjustment based on a cross-country comparable propensity score model that we estimate anew and that provides us with estimated probabilities of being interviewed among adult household members within the phone survey sample.

Second, our objective is to carry out a calibration exercise that an average data user can consider doing with the public use datasets. This was not clearly stated in the original manuscript and the revision rectifies this shortcoming.

Specifically, the phone survey household sampling weights in the public use datasets readily take into account the differential selection into the HFPS sample at the household level. What is not so obvious is that this process starts out by using information that is actually NOT publicly available, specifically the information on whether a phone number is available for at least one household member or a reference individual.

The availability of this information determines the initial HFPS interview targets out of the total pre-COVID-19 survey sample. In turn, the first sampling weight adjustment carried out at the household level by the data producers is to multiply the pre-COVID-19 household sampling weights for the initial HFPS interview targets by the inverse of the probability of selection into the phone survey. The probability for a given household is defined as the total number of households that interviewers attempted to call in the household's region divided by the total number of pre-COVID-19 survey households in that region. This first sampling weight adjustment is then augmented with an additional adjustment to account for non-response among the households targeted for phone interviews.
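For illustration, a minimal sketch of this first household-level adjustment (with hypothetical column names) could look as follows:

```python
# Sketch of the first household-level adjustment; column names hypothetical.
import pandas as pd

hh = pd.read_csv("precovid_households.csv")  # all pre-COVID-19 F2F households
# 'attempted' = 1 if the household was targeted for phone calls, 'region' is
# the stratum, 'w_precovid' is the pre-COVID-19 household sampling weight.

grp = hh.groupby("region")["attempted"]
hh["p_select"] = grp.transform("sum") / grp.transform("count")

targets = hh[hh["attempted"] == 1].copy()
targets["w_adj1"] = targets["w_precovid"] / targets["p_select"]
# A non-response adjustment and post-stratification would follow.
```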

As an average data user, if you sought to carry out the single-step correction suggested by the reviewer, you would still need to apply an initial adjustment to the pre-COVID-19 sampling weight associated with each adult living in a household selected for phone interviews – along the lines of the "first sampling weight adjustment" described in the previous paragraph. You would of course not be able to do this, since the information is not publicly available – that is, you do not know which adults were living in pre-COVID-19 survey households that the phone survey attempted to call. Data users only know the successfully interviewed household sample and the individual household members within it, and they can link them (through the unique identifiers) with their pre-COVID-19 survey data.

In the revised section 2.4, we now note the following: “In what follows, we detail an approach that can be followed by any potential data user, leveraging solely the publicly available data on successfully interviewed HFPS households and their adult household members as captured in the pre-COVID-19 F2F survey and the HFPS.”

Comment: (iii) There is something circular about your tests: you use a set X of observed characteristics of individuals to reweigh the observations. Then you see whether after reweighing this same set of characteristics X better reflects the population average. That is setting yourself mechanically up for success. And I think that if you drop the two-step procedure and use the single step procedure you will get a pretty good match. This is something to acknowledge and address in the paper.

Response: Thank you for this comment. We have now made sure to acknowledge, in sections 2.5 and 2.6, that we are testing the set of individual-level variables that we also use in the reweighting model (alongside household-level variables). We contend that our analysis shows that attempting to adjust this many variables at once is quite complex and may not necessarily lead to a completely successful reweighting outcome.

In any case, we agree with your point on testing the same set of characteristics used in the reweighting model. At the same time, the tests we perform in sections 2.6 and 3.3 with phone survey-based employment outcomes present an application of the individual-level recalibrated weights to variables that were not used in the reweighting model. These tests can be considered a validation exercise that also mimics how analysts would use the phone survey data in their research. Specifically, in round 5 of the Malawi and Nigeria surveys we have data for all adult household members, as opposed to having data only for the main respondent. This allows us to show (acknowledging some scope for proxy respondent bias) how well reweighted estimates for the main respondent align with the estimates based on all adult household members and the existing phone survey household sampling weights (i.e. without additional calibration). The original manuscript did not do a very good job of conveying this idea clearly – as you also pointed out in another comment. We have revised sections 2.6 and 3.3 and feel they are now much clearer and add a validation angle to our analysis.

Comment: Four specific questions:

• Could you explain why you use deciles of the predicted probabilities rather than the probabilities themselves?

Response: Thank you. In using deciles of predicted probabilities, as well as trimming/winsorizing and post-stratifying, we are following the advice and best practices laid out in the sampling literature relevant to these kinds of surveys, specifically Himelein, 2014; Himelein et al., 2020; Little et al., 1997; and Rosenbaum and Rubin, 1984. On the use of deciles: the original idea belongs to Rosenbaum and Rubin (1984) and is picked up by Himelein (2014), which has provided the basis for computing longitudinal sampling weights for the surveys supported by the LSMS-Integrated Surveys on Agriculture Initiative since 2008 – including the pre-COVID-19 household surveys that served as the sampling frames for the phone surveys. The basic idea is to balance treatment (respondent) and control (non-respondent) observations on a large set of observable characteristics. Dividing observations into deciles of the predicted probabilities, rather than using the predicted probabilities themselves, creates subgroups with similar predicted probabilities that contain both treated and control observations. This would not be the case with the 'raw' predicted probability variable, since it is based on so many different variables that each observation would receive its own value. Rosenbaum and Rubin (1984) discuss this issue on p. 517 of their article. We have added a sentence summarizing this reasoning briefly.
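To make the rationale concrete, a small hypothetical check (assuming a data frame holding fitted propensities p_hat and a respondent indicator from a model such as the one in the paper) verifies that each decile cell mixes respondents and non-respondents:

```python
# Hypothetical balance check: every decile of the predicted probability
# should contain both respondents (treated) and non-respondents (controls).
import pandas as pd

adults = pd.read_csv("adults_with_propensities.csv")  # assumed to hold p_hat

adults["decile"] = pd.qcut(adults["p_hat"], 10, labels=False, duplicates="drop")
balance = adults.groupby("decile")["is_respondent"].agg(["mean", "count"])
print(balance)  # respondent share per decile should lie strictly in (0, 1)
```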

Comment: Section 2.6 is not very clearly written; I only understood what you were trying to do after reading the results section, which defeats the purpose of having a section explaining the empirical set-up. I suggest a careful rewrite.

Response: Thank you for this comment. We have carefully revised section 2.6 (as well as section 3.3, the corresponding results section). We feel this has improved its clarity and purpose. We reframed section 2.6 to make clear that it presents a type of validation of our initial results: an application of the individual-level recalibrated weights to an analysis of labor market outcomes, using the phone survey data the way researchers would. The section then assesses how much of a difference it makes to use the recalibrated weights and how successful they are at overcoming differences owed to the selection of respondents.

Comment: What do you mean by saying that the winsorized weights are post-stratified (p. 19)? Could you explain this part better?

Response: Thank you. This means that the raw weights w_{i,af} are winsorized and then, in the next step, the winsorized weights are post-stratified. We winsorize to deal with extreme outliers, which in turn reduces standard errors and makes estimates more efficient. We post-stratify to ensure that the weights sum to known population totals, which also reduces overall standard errors. These steps are anchored in the published literature on the topic (Himelein, 2014) as well as the World Bank sampling design guidance document for high-frequency phone surveys on COVID-19 (Himelein, Kastelic et al. 2020). We have clarified this in the manuscript.
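A minimal sketch of these two steps, with illustrative percentile cut-offs and an assumed stratum variable (neither necessarily matches our exact specification), might look as follows:

```python
# Sketch of winsorizing raw weights and post-stratifying to known totals;
# the 1st/99th percentile cut-offs and 'region' strata are illustrative.
import pandas as pd

def winsorize_and_poststratify(df: pd.DataFrame, w: str, totals: dict) -> pd.Series:
    lo, hi = df[w].quantile([0.01, 0.99])
    wz = df[w].clip(lower=lo, upper=hi)               # cap extreme weights
    stratum_sum = wz.groupby(df["region"]).transform("sum")
    target = df["region"].map(totals)                 # known population totals
    return wz * target / stratum_sum                  # rescale within strata
```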

Comment: Why is winsorization necessary? You have already taken deciles for your weights, so presumably there are no outliers. Could you provide some specific numbers to show what is going on in the extremes?

Response: Thank you for this comment. Winsorization aims to deal with extreme outliers and therefore reduces standard errors and increases the precision of the estimates. Reducing standard errors is important in this context because reweighting always increases standard errors, especially with individual-level reweighting, where we are trying to adjust for quite a large distortion (e.g. a majority of respondents are household heads). We perform winsorization on the variable w_{i,af} = af_{D=d} × w1, which is the household-level weight w1 multiplied by the adjustment factor af. The latter is based on the deciles of the predicted probability, as discussed above. However, after multiplication with w1, outliers are quite possible, and we therefore winsorize. We had also performed a number of tests with trimmed versus untrimmed weights, and while our main conclusions were qualitatively unaltered, we follow the best practices outlined in the literature. For your information, we provide a set of summary statistics of the pre-winsorization and winsorized w2 weight variables below.

            Ethiopia                Malawi                 Nigeria                  Uganda
            pre-wins.   winsorized  pre-wins.  winsorized  pre-wins.    winsorized  pre-wins.  winsorized
mean        15,797      12,647      6,079      4,995       42,988       314,648     10,411     8,976
min         62          134         44         69          1,010        1,393       766        1,007
p1          123         134         61         69          1,179        2,835       915        1,007
p50         3,232       3,232       1,728      1,728       13,362       354,348     3,722      3,722
p99         275,707     133,899     115,144    54,180      513,444      354,348     118,152    59,407
max         977,770     133,899     208,113    54,180      1,682,558    354,348     243,572    59,407

Responses to Comments from Reviewer #2

Comment: This article analyzes whether unweighted phone survey data are subject to selection bias, and whether the use of household- and individual-level weights correct for such bias. The paper uses household rosters obtained from pre-COVID face-to-face household surveys as nationally representative data, and as a benchmark for what the survey population of phone surveys administered during the pandemic should look like. Obviously, as the phone survey protocols targeted the household head, the unweighted phone survey sample does not resemble the nationally representative adult population. Interestingly, the use of household-level weights does not help address this bias, and in many cases, makes the phone survey sample even less representative of the adult population. Although individual-level weights perform better, this method is not sufficient as a bias correction technique, as differences between the nationally representative adult sample and the phone survey sample remain significant for several variables.

The analyses performed in this paper are interesting and worthwhile publishing, but I believe that the paper needs to be rewritten. First and foremost, I find the question of whether the phone survey data are representative of the adult population not very novel or interesting and would suggest taking that result for granted or as an obvious outcome of the survey protocols, which were not designed to get a representative sample of individuals. The protocols targeted households and were mostly blind towards selection of respondents within the household.

The real added value of this paper is therefore the analyses of the bias correction methods: when we do our phone surveys with just one member of the household, usually the household head, can we use weighting methods to correct for a potential selection bias? The paper finds that this is not the case, and this has important implications for respondent selection protocols, if one is interested in individual- instead of household-level data. This could be focused on much more throughout the abstract, introduction, and conclusion.

For instance, it is not salient from the abstract that the focus is on selection bias from an individual rather than household point of view, and as a result, the last sentence of the abstract comes as a surprise. The introduction could be shorter and more focused on the issue at hand - that most phone surveys are done with just one person in the household, typically the household head, and thereby not representative of the adult population. The main question, then, is whether bias correction methods can make the data more representative, or whether we should adjust the methods through which we select respondents within a household. To make this salient, it would help if the paper introduced in the last paragraph on page 8 (using the same data) is introduced much earlier in the introduction.

Response: Thank you for these thoughtful comments. We agree that the analyses of bias correction methods are the most important contribution of the paper. The differences between the sample of respondents and the general adult population provide the backdrop for these analyses. We have edited (and shortened) the abstract, introduction, and conclusion to improve the framing of the paper in line with your feedback. Specifically, in the introduction, we make salient the focus on individual-level data and we focus more directly on the role of bias correction. In the conclusion, we now discuss the implications of our findings for using existing data for individual-level analyses and how respondent selection may be altered going forward to improve individual-level data, if that is compatible with the objectives of the survey.

Comment: Second, and related to the first comment, the data from Malawi appear underutilized in better understanding the drivers of the selection bias. In all other countries, the protocol was to first call the household head; but in Malawi, phone numbers were called in random order. Despite that, Malawi still has significant selection bias, and it would be good to know whether this is because household members are passing the phone to the household head, or whether certain types of household members are less likely to pick up their phone and are replaced by the next number on the list.

Response: Thank you for raising this point. We agree the Malawi data can be utilized to provide additional insights. We therefore addressed this concern by adding a thorough discussion of the Malawi results in section “3.1. Phone survey respondents versus the general adult population”. First, we gathered additional information from the country team to better understand the steps taken between first contact with the household and the first completed interview with the household. Second, we added to the Appendix a table with regressions, similar to Table 3, detailing the determinants of mobile phone ownership. Based on this, we now write in section 3.3:

“[…] it is notable that the household head effect is similar in magnitude in Malawi as in the other three countries (Malawi: 0.397 vs Ethiopia: 0.457; Nigeria: 0.314; Uganda: 0.389), even though the Malawi survey stands out for not targeting the household head as first contact but rather calling available phone numbers in random order. In spite of this protocol, 79 percent of respondents are household heads in Malawi, not very different from the shares in the other countries (Ethiopia: 83 percent; Nigeria: 82 percent; Uganda: 74 percent). This is due to a combination of factors. On the one hand, phone ownership is skewed towards household heads, so household heads are more likely to be called than other members in the first place. In the Malawi sample, close to 60 percent of mobile phone owners are household heads and, in a multivariate logit regression, household heads are found to be 32 percent more likely to own a mobile phone, all other things being equal (S3 Table). On the other hand, calling available phone numbers in random order affects who is a household’s first contact; but not all first contacts also ended up being the main respondent. In round 1 of the Malawi HFPS, 66 percent of main respondents were also first contacts. For the remaining 34 percent, the first contact handed the phone to a household member who then became the main respondent. One scenario is when the first contact was not a household member but a reference contact outside of the household, because no one in the household owned a mobile phone (see section 2.1). This was the case for about 15 percent of households contacted for round 1 of the Malawi HFPS. Not being a member of the household, the reference contact cannot be the main respondent, and so the phone was handed to a member of the household instead. In another scenario, although the first contact was a member of the household, they preferred for another member, often the household head, to be the main respondent.”
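To make the kind of regression reported in S3 Table concrete, the following is a minimal sketch, in Python with statsmodels, of a logit of mobile phone ownership with average marginal effects. The file and variable names (malawi_f2f_roster.csv, owns_phone, is_head, urban, educ_years) are illustrative assumptions, not the paper's actual data or code.

```python
# Minimal sketch (assumed names): logit of mobile phone ownership among
# adults, reporting average marginal effects as in S3 Table.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("malawi_f2f_roster.csv")   # hypothetical roster file
adults = df[df["age"] >= 15]                # sample: household members age 15+

# The marginal effect on is_head corresponds to the "household head effect"
# on phone ownership discussed in the quoted passage.
model = smf.logit("owns_phone ~ is_head + age + urban + educ_years",
                  data=adults).fit()
print(model.get_margeff(at="overall").summary())
```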

Comment: Third, the weighting is an important aspect of the paper, but the discussion of how the weights are constructed is difficult to follow, and although I understand that the authors do not want to provide the technical details here, as they are presented in a different paper, the presentation could benefit from more intuition. For instance, why are deciles being created? What is their use (and why deciles as opposed to, for instance, quintiles or quartiles)? And why is post-stratification applied to the weights?

Response: Thank you for this comment. We have tried to add more intuitive reasoning, especially on the questions of winsorization and post-stratification. Winsorization is done to deal with extreme outliers, which reduces standard errors and makes estimates more efficient. Post-stratification serves the purpose of (i) ensuring the weights sum to known population totals and (ii) further reducing overall standard errors, which is important for this paper since reweighting increases standard errors, especially in this application. As noted in response to similar comments from Reviewer #1, our empirical approach is anchored in the recommendations of the peer-reviewed literature.

As for the use of deciles, please see our response above to a similar comment from Reviewer #1. We added a sentence that summarizes the reasoning, though for a more detailed account, readers should consult the relevant cited literature.
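To give an intuition for how the pieces fit together, here is a minimal sketch of a propensity-score reweighting pipeline of the kind described above: subclassification on propensity score deciles (a finer variant of the five subclasses analyzed in Rosenbaum and Rubin, 1984), winsorization of extreme weights, and post-stratification to a known population total. All file names, variable names, and the control total are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of propensity-score reweighting (assumed names throughout).
import pandas as pd
import statsmodels.formula.api as smf

adults = pd.read_csv("roster_with_weights.csv")    # hypothetical file

# (i) Propensity model: probability that an adult in an interviewed
# household is the phone survey respondent.
ps = smf.logit(
    "respondent ~ is_head + sex + age + educ_years + urban + owns_phone",
    data=adults).fit()
adults["pscore"] = ps.predict(adults)

# (ii) Subclassify on propensity score deciles and inflate respondents'
# base weights by the inverse response rate within each decile.
adults["decile"] = pd.qcut(adults["pscore"], 10, labels=False)
rate = adults.groupby("decile")["respondent"].mean()
resp = adults[adults["respondent"] == 1].copy()
resp["adj_weight"] = resp["base_weight"] / resp["decile"].map(rate)

# (iii) Winsorize extreme weights (here at the 1st/99th percentiles) to
# curb the variance inflation that reweighting introduces.
lo, hi = resp["adj_weight"].quantile([0.01, 0.99])
resp["adj_weight"] = resp["adj_weight"].clip(lo, hi)

# (iv) Post-stratify so the weights sum to a known adult population total.
ADULT_POP = 9_500_000                              # assumed control total
resp["final_weight"] = resp["adj_weight"] * ADULT_POP / resp["adj_weight"].sum()
```

The trade-off behind the choice of deciles is visible in step (ii): finer subclasses reduce residual within-class bias, but at the cost of noisier class-specific response rates, which is one reason the choice interacts with sample size.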

Comment: Fourth, I am not convinced by the added value of the employment outcomes. The supposedly representative data are reported by proxies, except for the data that the respondents provide for themselves. If we find differences between the individual-weighted and benchmark data, is that because the weighting does not work, or because the benchmark data are biased? The authors discuss this in the conclusion, but it raises questions about the added value of this comparison, unless the authors could show that the findings are the same regardless of whether we study variables on which there should be less asymmetric information and more accurate reporting by proxies, versus variables where we would expect more error.

Response: Thank you for this comment. We have revisited the phone survey employment outcomes sections (in 2.6 and 3.3) to decide how these results may best be used to add value to the overall analysis, also in light of a set of comments from Reviewer #1. The write-up of these sections lacked clarity and direction, and so we have decided to reframe it slightly. The added value of this part of the analysis is that we use actual phone survey data to validate the use of individual-level weights in a setup that mimics the kind of analysis for which researchers and analysts would actually use the phone survey data. In the specific case of round 5 in Malawi and Nigeria, we have individual-level data for all adult household members, but in the overwhelming majority of situations, analysts and researchers will have individual-level data only for the main respondent; that is at the heart of the motivation for this paper. As part of the unique set-up of round 5 in Malawi and Nigeria, we can create an alternative benchmark of the general adult population and assess how the differences between the respondents and the general adult population likely affect individual-level analyses with phone survey data when the data are only available for main respondents. We also assess how well the individual-level weights fare at reducing these differences. With this framing, we do now think this part adds value to the overall analysis.

Having said that, potential proxy response bias is an important caveat that we acknowledge in the section. The paper cites Kilic et al. (2020) who, for Malawi, provide suggestive evidence of underreporting of wage- and self-employment in traditional face-to-face survey data collection that allows for proxy respondents and non-private interviews, in comparison to interviewing adult household members in private. However, the effects are less pronounced for 7-day recall (the reference period used in phone surveys) than for 12-month recall.

Ultimately, however, in our set-up it would be speculative to quantify potential proxy response bias. We argue that proxy response is second-best to self-reporting by all individuals, but that it is still preferable to have data on all (adult) individuals, even if reported through a proxy. Given the magnitude of the differences between respondents and the general adult population, we contend that errors stemming from proxy reporting could be a second-order concern. As such, we feel that, as long as we acknowledge the issue of proxy response adequately, it is worthwhile to present the results in sections 2.6 and 3.3 for illustrative purposes.
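As an illustration of the round-5 style comparison described here, the sketch below computes the weighted mean of a 7-day employment indicator for phone survey respondents versus all adults, with a simple difference test. The data file, variable names (any_work_7d, final_weight, respondent), and the independent-samples test are assumptions for illustration; they are not the paper's actual code, and a production analysis would account for the survey design and the overlap between the two groups.

```python
# Illustrative sketch (assumed names): weighted comparison of a 7-day
# employment indicator between phone survey respondents and all adults.
import pandas as pd
from statsmodels.stats.weightstats import DescrStatsW, CompareMeans

adults = pd.read_csv("round5_individual_data.csv")  # hypothetical file
resp = adults[adults["respondent"] == 1]

d_resp = DescrStatsW(resp["any_work_7d"], weights=resp["final_weight"])
d_all = DescrStatsW(adults["any_work_7d"], weights=adults["final_weight"])

print("respondents:", round(d_resp.mean, 3),
      "| all adults:", round(d_all.mean, 3))

# Naive unequal-variance t-test of the difference in weighted means.
tstat, pval, _ = CompareMeans(d_resp, d_all).ttest_ind(usevar="unequal")
print(f"t = {tstat:.2f}, p = {pval:.3f}")
```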

Comment: Finally, with the large set of variables that the authors look into, times the four countries for which the analyses are replicated, there are a lot of numbers to review in order to draw conclusions. Although I see why the weights are estimated at the country level, in presenting the results on bias correction, the regressions could be presented at the aggregate level, combining all countries in one regression, and the country-specific analyses could be presented in an appendix. Alternatively, would there be scope for aggregating over the different variables, in order to reduce the number of coefficients that need to be interpreted and aggregated by eyeballing before drawing conclusions?

Response: The volume of numbers and figures to look at is indeed large. However, we feel it is important to keep the analysis at the country level to allow for an evaluation of how the country-specific contact and selection protocols may have affected the comparisons between the respondents and the general adult population, and the success of our bias correction.

A case in point is the Malawi survey, whose particular contact protocol we now discuss in greater detail (in response to your second comment). We further hope that once the relevant figures appear next to the text in the final published manuscript, it will be easier to draw conclusions from the results. Similarly, we do not believe it would benefit the analysis to aggregate over different variables, as most of them are binary indicators based on underlying categorical variables.

Comment: Other comments:

- The first part of the introduction appears to neglect a large body of literature on phone surveys and survey design; but in fact this literature is mentioned towards the end of the introduction. In this case, I would consider integrating the existing literature in the first part of the introduction and presenting the findings, along with their implications, as the key contribution, to shorten the introduction and appear less negligent of the work that has already been done in this field.

Response: Thank you for this comment. We have integrated the relevant references from the existing literature early on in the Introduction, which we have also refocused and shortened in line with your first comment. We have opted to retain a short paragraph summarizing this literature at the end of the Introduction in an attempt to do justice to the breadth of the literature, some of which is not directly related to the problem we analyze but is still worth mentioning in the paper for readers who would like to dive deeper into related issues.

Comment: The figures were not visible in the manuscript itself, only as the appendices of the submission.

Response: Apologies for this; it was based on our interpretation of PLOS ONE's required formatting. We understand that in the final published article, the figures will again appear alongside the text to which they pertain.

Comment: A bit more explanation on why certain things vary across countries would be useful. For instance, why are the fifth-round data for two countries excluded? Did these not include the employment outcomes? But how does that square with all surveys being standardized across the four countries? And why were the selection protocols different in the four countries?

Response: We have clarified this in the text in section 2.1: questionnaire design was comparable, and a questionnaire working group developed modules that countries could then adopt. A set of core modules was adopted by all countries, while other, optional modules were adopted according to country needs and interest. This is why certain topics are covered in some countries but not in others, including the fifth-round data on individual-level employment outcomes. We added a sentence in section 2.1 explaining that the individual-level employment data were collected only in these two countries.

As for selection protocols, and interview protocols more broadly, these followed the advice provided in Amankwah et al. (2020). At the same time, as these are NSO-owned and World Bank-supported surveys, there is always scope for customization and contextualization, and selection protocols reflected what NSOs and World Bank support staff jointly considered to lead to the best outcomes in terms of high response rates and data quality. Another factor was that each of the face-to-face baseline surveys recorded household contact phone numbers in a slightly different way, ranging from collecting the phone number only for the head of household to collecting the phone numbers of all household members. This then had a bearing on contact protocols and also likely affected the selection of the main respondent. One implication of this paper is that, going forward, survey implementers need to think more critically about recording and curating contact phone numbers, and about selection protocols, than was possible given the operational urgency with which the COVID-19 phone surveys were set up. In any case, we have reflected these points in the manuscript in section 2.3.

Comment: At times, the paper reads a bit like a promotional brochure for the World Bank (for instance, the introduction claims that the World Bank is the prime institution doing and learning about phone surveys, followed by 3-4 other institutions, among others). The fact that the anonymized surveys are published online a few weeks after completion could be a footnote in the methods section and does not require an entire paragraph. The paragraph in the conclusion stating how widely these data are used is irrelevant unless it is used to illustrate that people are using the data for individual-level analyses and are therefore using the wrong household-level weights.

Response: Thank you for this comment. Since the PLOS ONE style guide does not allow footnotes, we had initially moved the information on the publication of the surveys after a few weeks into the body text. However, we have now removed that information, retaining only what we considered relevant to the data collection methodology. The idea of the paragraph in question in the conclusion was to highlight that there is now quite a lot of phone survey data available and that it is being used widely, so that our findings pertain to a growing and increasingly used data type. Arguably, the paragraph did not communicate that idea clearly enough, and we have reframed it to look beyond just the four countries in question.

Comment: Bottom of page 19 (last sentence), note that there is a small typo: households instead of household.

Response: Thank you. We have fixed this.

Comment: IRB: Clarify that the data to which the authors had access were anonymized and that the merging of phone survey and face-to-face data was based on an anonymized household ID. Even if no ethics approval was needed for data collection, for a researcher to work with data that contain personal identifiers, and to merge data from different sources, IRB approval should be obtained.

Response: Thank you for pointing this out. We clarified that the data were anonymized/de-identified and that the household and individual IDs were likewise anonymized.

Comment: Overall, though, the paper presents an extremely useful analysis that needs to be published, since we should be more aware that our household-level weight adjustments often do more harm than good when using individual-level data, and that survey respondent selection protocols need to be adjusted when the objective is individual-level analysis.

Response: Thank you very much.

References

Henderson, S., Rosenbaum, M., 2020. Remote Surveying in a Pandemic: Research Synthesis. Innovation for Poverty Action.

Himelein, K., 2014. Weight Calculations for Panel Surveys with Subsampling and Split-off Tracking. Statistics and Public Policy 1, 40–45. https://doi.org/10.1080/2330443X.2013.856170

Himelein, K., Eckman, S., Kastelic, J., McGee, K., Wild, M., Yoshida, N., Hoogeveen, J., 2020. High Frequency Mobile Phone Surveys of Households to Assess the Impacts of COVID-19. Guidelines on Sampling Design. World Bank, Washington D.C.

Little, R.J.A., Lewitzky, S., Heeringa, S., Lepkowski, J., Kessler, R.C., 1997. Assessment of weighting methodology for the National Comorbidity Survey. American journal of epidemiology 146, 439–449.

Rosenbaum, P.R., Rubin, D.B., 1984. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. Journal of the American Statistical Association 79, 516–524. https://doi.org/10.2307/2288398

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Bjorn Van Campenhout

8 Oct 2021

Representativeness of individual-level data in COVID-19 phone surveys: Findings from Sub-Saharan Africa

PONE-D-21-17113R1

Dear Philip,

I heard back from the two reviewers and they both indicated that all their comments and suggestions were satisfactorily addressed. Therefore, I have decided to accept the article as is. Congratulations!

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Bjorn Van Campenhout, Ph.D.

Academic Editor

PLOS ONE

Acceptance letter

Bjorn Van Campenhout

14 Oct 2021

PONE-D-21-17113R1

Representativeness of individual-level data in COVID-19 phone surveys: Findings from Sub-Saharan Africa

Dear Dr. Wollburg:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Bjorn Van Campenhout

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Descriptive statistics.

    Ethiopia and Malawi. Notes: No weights are used. † denotes a dichotomous variable. Respondent identifies whether the individual was a HFPS respondent; it is set to 1 for all individuals under the Phone Respondents column. These variables originate from the pre-COVID-19 F2F survey in each country.

    (PDF)

    S2 Table. Descriptive statistics.

    Nigeria and Uganda. Notes: No weights are used. † denotes a dichotomous variable. Respondent identifies whether the individual was a HFPS respondent; it is set to 1 for all individuals under the Phone Respondents column. These variables originate from the pre-COVID-19 F2F survey in each country.

    (PDF)

    S3 Table. Marginal effects from logit regressions on mobile ownership in the sampled household baseline datasets.

    Notes: † denotes dichotomous variables. Standard errors are reported in parentheses. ***/**/* denote statistical significance at the 1/5/10 percent level, respectively. For each country the sample is all F2F survey household members age 15 and older for the set of households that were successfully interviewed in round 1 of the phone survey.

    (PDF)

    S4 Table. Ethiopia: Tests of difference between face-to-face adults and phone respondents, disaggregated by sex and age group.

    Notes: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

    (PDF)

    S5 Table. Malawi: Tests of difference between face-to-face adults and phone respondents, disaggregated by sex and age group.

    Notes: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

    (PDF)

    S6 Table. Nigeria: Tests of difference between face-to-face adults and phone respondents, by sex and age group.

    Notes: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

    (PDF)

    S7 Table. Uganda: Tests of difference between face-to-face adults and phone respondents, by sex and age group.

    Notes: Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference.

    (PDF)

    S8 Table. Difference between adults and phone respondent employment outcomes, by sex.

    Notes: Base row reports the HFPS-based nationally representative mean among all adults present in the face-to-face and phone surveys. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference. Employment = 1 if individual spent any time in the last seven days doing specified work, 0 otherwise. All data are from the fifth round of the HFPS in Malawi and Nigeria.

    (PDF)

    S9 Table. Difference between adults and phone respondent employment outcomes, by age group.

    Notes: Base row reports the HFPS-based nationally representative mean among all adults present in the face-to-face and phone surveys. Rows other than the base row report the difference from the base and a p-value from a test of significance for that difference. Employment = 1 if individual spent any time in the last seven days doing specified work, 0 otherwise. All data are from the fifth round of the HFPS in Malawi and Nigeria.

    (PDF)


    Data Availability Statement

    The data are available from the World Bank's Microdata Library, High Frequency Phone Survey Catalog (https://microdata.worldbank.org/index.php/catalog/hfps) as well as LSMS catalog (https://microdata.worldbank.org/index.php/catalog/lsms).

