Public Health Action. 2013 Mar 21;3(1):56–59. doi: 10.5588/pha.12.0071

Assessing and improving data quality from community health workers: a successful intervention in Neno, Malawi

A J Admon, J Bazile, H Makungwa, M A Chingoli, L R Hirschhorn, M Peckarsky, J Rigodon, M Herce, F Chingoli, P N Malani, B L Hedt-Gauthier
PMCID: PMC4354657  NIHMSID: NIHMS665006  PMID: 25767750

Abstract

Setting:

A community health worker (CHW) program was established in Neno District, Malawi, in 2007 by Partners In Health in support of Ministry of Health activities. Routinely generated CHW data provide critical information for program monitoring and evaluation. Informal assessments of the CHW reports indicated poor quality, limiting the usefulness of the data.

Objectives:

1) To establish the quality of aggregated measures contained in CHW reports; 2) to develop interventions to address poor data quality; and 3) to evaluate changes in data quality following the intervention.

Design:

We developed a lot quality assurance sampling (LQAS)-based data quality assessment tool to classify sites as having high or low reporting quality. After the first assessment, we identified challenges and best practices, implemented interventions, and evaluated them with two subsequent assessments.

Results:

At baseline, four of five areas were classified as low data quality. After 8 months, all five areas had achieved high data quality, and the reports generated from our electronic database became consistent and plausible.

Conclusion:

Program changes included improving the usability of the reporting forms, shifting aggregation responsibility to designated assistants and providing aggregation support tools. Local quality assessments and targeted interventions resulted in immediate improvements in data quality.

Keywords: lot quality assurance sampling, supervision, quality improvement


Community health worker (CHW) programs have been employed by Partners In Health (PIH) since the mid-1980s, providing health education and linking populations with poor health care access to local health care resources.1 Abwenzi Pa Za Umoyo (APZU), the PIH sister organization in Malawi, established a CHW program to support primary care service delivery in rural, resource-poor Neno District. This program includes nearly 700 community health workers (referred to as village health workers by APZU), each receiving specialized training to serve as health educators, to accompany patients with human immunodeficiency virus (HIV) and/or tuberculosis (TB) to clinic visits, and to link patients to the formal health system. Along with these activities, about one third of CHWs (n = 240) are also tasked with active case finding for priority diseases and monthly data collection through the Household Chart (HHC) Program.

For the HHC Program, CHWs visit each of their approximately 40 households monthly to collect demographic and health data on nearly 30 indicators. The CHWs generate household reports; Health Surveillance Assistants (HSAs) then aggregate across all household reports to generate reports for each CHW in their area. These CHW reports are then entered into an electronic database from which monthly reports are generated for PIH and Ministry of Health (MoH) management.
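The aggregation step that HSAs carried out by hand amounts to summing each indicator across one CHW's household reports. The Python sketch below is a minimal illustration of that step only, not the program's actual tooling. The one-dictionary-per-household layout and the indicator names (`households`, `children_under5`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def aggregate_household_reports(household_reports: Iterable[Mapping[str, int]]) -> Dict[str, int]:
    """Sum each indicator across one CHW's household reports.

    Hypothetical layout: each household report maps an indicator name to a count.
    The returned totals play the role of the aggregated CHW report.
    """
    totals: Counter = Counter()
    for report in household_reports:
        totals.update(report)  # Counter.update adds counts rather than replacing them
    return dict(totals)


# Usage with two hypothetical household reports
chw_report = aggregate_household_reports([
    {"households": 1, "children_under5": 2},
    {"households": 1, "children_under5": 0},
])
print(chw_report)  # {'households': 2, 'children_under5': 2}
```

With roughly 40 households and nearly 30 indicators per CHW, this is the summation that was error-prone when done as hand tabulation.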

As in other health programs, CHW data could play a critical role in supporting implementation by helping to identify and address programmatic gaps.2–4 However, there were numerous concerns about the quality of the data because of the large data volume and complex aggregation pathways. An informal inspection revealed several instances in which chart data had been aggregated incorrectly, leading to substantial inaccuracies in the final reports presented to MoH/PIH.

In this article, we describe our system for the formal assessment of the quality of CHW reports; we provide detail on the development and implementation of interventions to address poor data quality; and we conclude with a discussion on the impact of improved data quality on the ability to successfully manage the program and suggestions of extending lessons learned to other programs.

STUDY POPULATION, DESIGN AND METHODS

Although health in Malawi has improved considerably over the last 10 years, the country still has high under-five and maternal mortality, at 112 per 1000 live births and 675 per 100 000 live births, respectively.5 Formed in 2003, Neno District is home to approximately 110 000 people living in four ‘Traditional Authority’ areas. The district is served by a district hospital, one community hospital, 11 government health centers and 78 smaller health posts. Each health center's catchment area includes three to 11 health posts, and each health post serves several villages. This study focuses on 21 of the 78 health posts, belonging to three health centers and grouped into five clusters; each cluster contained three to five health posts from a single health center. Each cluster was classified as having high or low data quality using the data quality assessment tool described below.

Data quality assessment tools

Lot quality assurance sampling (LQAS) classifies the performance of a ‘lot’ by evaluating a representative sample.6–8 To implement it, n units are randomly sampled and the number with the trait of interest (here, report reliability) is compared with a predetermined decision rule, d. If d or more of the sampled units have the trait, the area is classified as high with respect to that metric; if fewer than d units have it, the area is classified as low.
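The classification rule just described can be sketched in a few lines of Python. This is an illustrative sketch and not the tool APZU used. The function name, its parameters and the use of Python's `random` module for the simple random sample are assumptions. With reliability as the trait, the study's rule corresponds to n = 25 and d = 21 reliable reports, which is equivalent to classifying a cluster as low at five erroneous reports.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def lqas_classify(
    lot: Sequence[T],
    n: int,
    d: int,
    has_trait: Callable[[T], bool],
    rng: Optional[random.Random] = None,
) -> str:
    """Classify a lot as 'high' or 'low' from a simple random sample of n units.

    The lot is 'high' if at least d sampled units have the trait of interest,
    otherwise 'low'. (Hypothetical helper, for illustration only.)
    """
    if not 0 <= d <= n <= len(lot):
        raise ValueError("require 0 <= d <= n <= len(lot)")
    rng = rng or random.Random()
    sample = rng.sample(list(lot), n)  # sampling without replacement
    count = sum(1 for unit in sample if has_trait(unit))
    return "high" if count >= d else "low"


# Hypothetical lot of 40 CHW reports; True means the report is reliable
reports = [True] * 40
print(lqas_classify(reports, n=25, d=21, has_trait=bool, rng=random.Random(1)))  # high
```

Passing a seeded `random.Random` keeps the selected sample reproducible for audit. In a real assessment, the random selection of CHWs would be documented separately.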

Because the end result is a two-level classification rather than a precise estimate, LQAS can often meet program goals with relatively small sample sizes, which means lower costs and faster feedback loops than other assessment methods. Furthermore, classifying an area as ‘high performing’ or ‘low performing’ on a particular metric links directly to programmatic action. LQAS has been used in over 800 health applications,9 including data quality assessments in other types of health programs10–12 and monitoring of CHW service delivery.13 Given this record of successful use in program monitoring, the ability to classify areas with relatively small samples, the direct link between classifications and action, and the ease of integrating LQAS into ongoing supervision, APZU chose this method to formally assess the quality of CHW reports.

To determine the sample size and decision rule for LQAS, four parameters must be defined: the threshold for high data quality, the threshold for low data quality and the allowable misclassification risk at each of these two thresholds. Based on extensive discussions of the level of data quality needed for reliable program management, we defined high data quality as ≥90% agreement between CHW reports and the corresponding household reports, and low data quality as ≤70% agreement. We limited misclassification risks to <10% at each threshold. In other words, the probability of classifying an area with 90% agreement as low data quality was <10%, and the probability of classifying an area with 70% agreement as high data quality was <10%. Based on these four specifications, the resulting LQAS system had a sample size of 25 and a decision rule of 5 erroneous reports: a cluster with five or more erroneous reports among the 25 sampled was classified as low data quality.14
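The two misclassification risks can be checked using binomial probabilities. The sketch below reads the decision rule as "low quality if 5 or more of 25 sampled reports are erroneous", which matches the early-stopping rule used in data collection. It assumes a binomial model, which is appropriate when clusters are large relative to the sample. It is a verification aid, not the procedure the authors followed; their design was based on published LQAS tables.14

```python
from math import comb
from typing import Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lqas_risks(n: int, d: int, err_high: float, err_low: float) -> Tuple[float, float]:
    """Misclassification risks for the rule 'low quality if >= d of n reports are erroneous'.

    err_high: error rate at the high-quality threshold (0.10 for 90% agreement)
    err_low:  error rate at the low-quality threshold (0.30 for 70% agreement)
    Assumes a binomial model (sampling from a large cluster).
    """
    alpha = 1 - binom_cdf(d - 1, n, err_high)  # high-quality cluster classified as low
    beta = binom_cdf(d - 1, n, err_low)        # low-quality cluster classified as high
    return alpha, beta


alpha, beta = lqas_risks(n=25, d=5, err_high=0.10, err_low=0.30)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")  # alpha = 0.098, beta = 0.090
```

Both risks fall just under the 10% limit. With n = 25, shifting the decision rule by one breaks the design: d = 4 raises alpha to about 0.24, and d = 6 raises beta to about 0.19.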

Sample selection and data collection

To allow for LQAS in each of the five clusters of health posts, 25 CHWs were randomly sampled from each cluster. The sampled CHWs were invited to their nearest health post and asked to bring their household forms and registers. The data for all of their households were re-aggregated for four indicators, and these re-aggregated values were compared with the aggregated values in the CHW report. A sampled report was marked as erroneous if, for any of the four indicators, its value differed by >10% from the re-aggregated value. To conserve resources, early stopping was allowed: once five erroneous reports (or, alternatively, 21 correct reports) were observed, the cluster was classified as low (or high) data quality and the remaining sampled reports were not re-aggregated.
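The per-report error check and the early-stopping review can be sketched as follows. Two points are assumptions that the text does not state. First, the ">10%" difference is measured relative to the re-aggregated value. Second, a re-aggregated value of zero counts as an error if anything non-zero was reported. The function names and indicator keys are hypothetical.

```python
from typing import Iterable, Mapping, Tuple


def report_is_erroneous(
    reported: Mapping[str, float],
    reaggregated: Mapping[str, float],
    tolerance: float = 0.10,
) -> bool:
    """True if any indicator on the CHW report differs from the re-aggregated value by > tolerance.

    Assumption: the difference is relative to the re-aggregated value; a re-aggregated
    value of zero counts as an error if a non-zero value was reported.
    """
    for indicator, true_value in reaggregated.items():
        value = reported[indicator]
        if true_value == 0:
            if value != 0:
                return True
        elif abs(value - true_value) / abs(true_value) > tolerance:
            return True
    return False


def classify_with_early_stopping(error_flags: Iterable[bool], n: int = 25, d: int = 5) -> Tuple[str, int]:
    """Review sampled reports in order and stop once the classification is certain.

    'low' after d erroneous reports; 'high' after n - d + 1 correct reports
    (21 when n = 25 and d = 5). Returns (classification, reports reviewed).
    """
    needed_correct = n - d + 1
    n_err = n_ok = 0
    for reviewed, is_err in enumerate(error_flags, start=1):
        if is_err:
            n_err += 1
        else:
            n_ok += 1
        if n_err >= d:
            return "low", reviewed
        if n_ok >= needed_correct:
            return "high", reviewed
    raise ValueError("ran out of reports before a classification was reached")


# Hypothetical report: 12 children under five reported vs 10 on re-aggregation (20% off)
print(report_is_erroneous({"households": 40, "children_under5": 12},
                          {"households": 40, "children_under5": 10}))  # True

# Hypothetical cluster: the 2nd, 4th, 5th, 7th and 8th sampled reports are erroneous
flags = [False, True, False, True, True, False, True, True] + [False] * 17
print(classify_with_early_stopping(flags))  # ('low', 8)
```

A decision is always reached by the 25th report. At that point either five errors or 21 correct reports must have accumulated, so reviewing stops at the earliest moment further data cannot change the outcome.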

Data quality was assessed at baseline (July 2011) and at two time points after the interventions to improve data quality had been implemented (October 2011 and March 2012). Each assessment used the same LQAS design and evaluated the same four indicators: number of households, total number of individuals not tested for HIV, number of individuals aged >15 years not tested for HIV in the past 6 months and number of children aged <5 years. These indicators were chosen for two reasons. First, they are typically reported as non-zero values for each household and are therefore the most sensitive to errors in aggregation. Second, they are programmatically important: the first and last indicators provide denominators for many other reported indicators, while the HIV indicators help characterize gaps in HIV care in Neno District.

Analysis

Data were collected and analyzed in Excel (Microsoft, Redmond, WA, USA). In each cluster and at each time point, ≥5 erroneous reports led to the cluster being classified as low data quality, and ≥21 reports without errors led to it being classified as high data quality. The percentage of reviewed reports with errors is reported by indicator for each time point. Routinely collected data, aggregated across one health center, are presented for two indicators: the proportion of pregnant women who were tested for HIV and the proportion of pregnant women who had had at least one antenatal care (ANC) visit. Values for these indicators are reported for 15 months (January 2011–March 2012), including a pre-intervention phase (January–May 2011) and a post-intervention phase (August 2011–March 2012).
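The indicator-level percentages reported in the results can be computed from the review records. The analysis itself was done in Excel. The Python sketch below is only an equivalent illustration and assumes a hypothetical record layout: one dictionary per reviewed report, mapping each indicator to whether it differed by >10%. Because of early stopping, the denominator is the number of reports actually reviewed, not always 25 per cluster.

```python
from typing import Dict, List


def percent_erroneous_by_indicator(reviews: List[Dict[str, bool]]) -> Dict[str, float]:
    """Percentage of reviewed reports flagged as erroneous, indicator by indicator.

    Hypothetical layout: each element of `reviews` is one reviewed CHW report,
    mapping an indicator name to True if that indicator differed by >10%.
    """
    if not reviews:
        return {}
    indicators = list(reviews[0])
    return {
        name: 100.0 * sum(review[name] for review in reviews) / len(reviews)
        for name in indicators
    }


# Four hypothetical reviewed reports
reviews = [
    {"households": True, "children_under5": True},
    {"households": False, "children_under5": True},
    {"households": False, "children_under5": False},
    {"households": False, "children_under5": False},
]
print(percent_erroneous_by_indicator(reviews))  # {'households': 25.0, 'children_under5': 50.0}
```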

Ethics statement

The data presented here were generated by a routine program monitoring and evaluation activity. Only data routinely collected as part of the CHW program were reviewed and no identifiable patient information was extracted. There was no direct contact between the investigators and the households for the purposes of this study. As this activity was conducted as a routine quality improvement exercise for APZU, formal ethics review was not required.

RESULTS

In the first assessment, four of five clusters were identified as having poor data quality in the CHW reports (Table 1). The percentage of reviewed reports with >10% error on the first indicator, number of households, was the lowest (25.5%). The other three indicators had errors exceeding 10% in approximately 40–45% of reviewed reports (Table 2).

TABLE 1.

Data quality assessment results pre- and post-intervention per cluster, Neno District, Malawi

Cluster | Number of health posts | Pre-intervention quality (July 2011) | Post-intervention quality (October 2011) | Post-intervention quality (March 2012)
1 | 4 | Low | Low | High
2 | 5 | Low | High | High
3 | 5 | Low | High | High
4 | 4 | Low | High | High
5 | 3 | High | High | High

TABLE 2.

Percentage of reviewed reports with poor data quality, pre- and post-intervention, by indicator, Neno District, Malawi

Values are percentages of reviewed reports with poor data quality.

Indicator | Pre-intervention (July 2011), % | Post-intervention (October 2011), % | Post-intervention (March 2012), %
Indicator 1: number of households | 25.5 | 2.6 | 3.2
Indicator 2: number of individuals tested for HIV | 41.8 | 5.2 | 4.8
Indicator 3: number of individuals aged >15 years tested for HIV | 43.6 | 6.9 | 4.0
Indicator 4: number of children aged <5 years | 41.8 | 7.8 | 7.3

HIV = human immunodeficiency virus.

The second and third assessments followed the implementation of several interventions to improve data quality. In October 2011, four of five clusters had high data quality, including three that had been classified as low quality before the intervention. At the March 2012 assessment, all five clusters were classified as having high data quality. In both post-intervention assessments, the number of households continued to have the lowest error rate among the reviewed reports, and the percentage of reviewed reports judged erroneous fell below 10% for every indicator.

The Figure tracks routinely collected data for two indicators. In the pre-intervention phase (January–May 2011), reported values were highly variable and frequently, and erroneously, exceeded 100%. In the post-intervention phase (August 2011–March 2012), values were consistent and within plausible ranges.
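Values above 100% can be caught automatically before reports reach managers. The sketch below shows a simple plausibility screen over monthly numerator/denominator pairs. It is a hypothetical check that was not part of the intervention described here, and the month labels and counts are invented for illustration.

```python
from typing import Dict, Tuple


def screen_monthly_proportions(counts: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Flag months whose reported proportion cannot be valid.

    `counts` maps a month label to (numerator, denominator), e.g. pregnant women
    not tested for HIV / pregnant women recorded. Returns month -> problem found.
    (Hypothetical check, for illustration only.)
    """
    problems = {}
    for month, (numerator, denominator) in counts.items():
        if denominator <= 0:
            problems[month] = "missing or zero denominator"
        elif not 0 <= numerator <= denominator:
            problems[month] = f"implausible proportion {numerator / denominator:.0%}"
    return problems


# Invented monthly counts
print(screen_monthly_proportions({"2011-01": (12, 8), "2011-08": (3, 10), "2011-09": (0, 0)}))
# {'2011-01': 'implausible proportion 150%', '2011-09': 'missing or zero denominator'}
```

A screen like this only detects impossible values. Plausible but wrong aggregates still require re-aggregation against the household forms.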

FIGURE.

Reported values for two indicators in one health center, January 2011 to March 2012: 1) proportion of pregnant women who did not undergo HIV testing (triangles, blue line) and 2) proportion of pregnant women who did not visit an ANC clinic (squares, red line). HIV = human immunodeficiency virus; ANC = antenatal care.

DISCUSSION

We observed poor CHW report quality at baseline, with four of five areas classified as having low data quality. Within 8 months of implementing a targeted bundle of interventions, all five areas were classified as having high data quality. The strength of this data quality assessment derives from two key processes. The first is classifying specific areas as either meeting or not meeting data quality thresholds. The second is following up with stakeholder interviews to identify best practices as well as obstacles to achieving sufficient data quality. In meetings with multiple groups of CHWs and HSAs from the clusters with poor quality, we learned that competing demands from the MoH and other organizations left little time to support the aggregation process. Moreover, the HSAs were often poorly equipped for large-scale data aggregation, relying primarily on hand tabulation. In contrast, interviews with HSAs from the one area with high data quality revealed that CHW reports there were completed by the CHW site supervisor, an individual with more training and involvement in the CHW program than the typical HSA and fewer demands from outside organizations.

Based on these results and recommendations, we adapted our program by shifting aggregation responsibilities away from HSAs. Relying entirely on CHW site supervisors was not immediately feasible given the heavy workload involved in aggregation. We therefore identified CHWs from each health post to serve as data aggregation assistants who would do the aggregation in collaboration with and under the guidance of the site supervisors. These CHWs and the site supervisors received special training on aggregation and the use of calculators, which were distributed to each health center. In addition to the shift of aggregation responsibility, we modified the reporting forms based on the feedback from HSAs in the poorly performing clusters. These changes included translating the forms into Chichewa (the local language), creating more contrast for the headings to make sections easier to identify and allowing more space for reported numbers. The shift in aggregation responsibilities and changes in forms were all implemented by the end of July 2011.

As a result of these tailored interventions, we observed rapid improvements in data quality, as the Figure shows. Pre-intervention data were highly variable and often outside reasonable bounds, and managers frequently dismissed them rather than using them to inform program implementation. Within 3 months of the first assessment, and immediately after the tailored interventions were implemented, the data became more consistent and plausible, giving managers reliable evidence on program coverage and inefficiencies. Although the specific challenges and appropriate interventions may differ elsewhere, this process of conducting baseline assessments and developing tailored interventions can be repeated in other CHW and health programs.

This study has several limitations. Because of limited resources, we were unable to review the reports of all CHWs in our catchment area. We therefore based our classification of areas as low or high quality on a sample of 25 CHW reports per cluster, and these classifications are subject to error. In designing the LQAS system, we limited the misclassification risk at each threshold to no more than 10%. The actual risk is likely even lower in our setting, because the sample, drawn without replacement, made up a sizable fraction of the CHWs in each cluster. Furthermore, our program defined high data quality as ≥90% of reports consistent with re-aggregated forms and low data quality as ≤70% of reports consistent with re-aggregated forms. The resulting classifications are specific to these definitions and cannot be extended to programs that define high and low data quality differently.
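The claim that risk is lower when a large share of a cluster is sampled can be illustrated with the hypergeometric distribution. This distribution models sampling without replacement from a finite cluster. The article does not report the number of CHWs per cluster, so the cluster size N = 50 below is purely hypothetical. At that size, the 10% and 30% error thresholds correspond to 5 and 15 erroneous reports.

```python
from math import comb


def hypergeom_cdf(k: int, N: int, K: int, n: int) -> float:
    """P(X <= k): X erroneous reports in a sample of n drawn without replacement
    from N reports, K of which are erroneous."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k + 1)) / comb(N, n)


# Hypothetical cluster of N = 50 CHW reports (not a figure from this study)
N, n, d = 50, 25, 5
alpha = 1 - hypergeom_cdf(d - 1, N, K=5, n=n)  # 10% erroneous (high quality) classified as low
beta = hypergeom_cdf(d - 1, N, K=15, n=n)      # 30% erroneous (low quality) classified as high
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

For this hypothetical cluster, alpha falls to about 0.025 and beta to roughly 0.03, compared with about 0.098 and 0.090 under the binomial (large-cluster) model. Larger clusters move the risks back toward the binomial values.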

Another limitation is that the re-aggregation of data for the study could itself be subject to error. To reduce this risk, data collection was carried out by the highly qualified central APZU monitoring and evaluation team and was closely supervised. These results address only report reliability; a separate study would be needed to determine whether the data recorded on the source forms are accurate. Finally, the classifications are based on four of the more than 30 CHW indicators reported monthly. Reviewing more than four indicators at a single time point can be unwieldy. However, as the data quality assessment becomes integrated into routine supervision, we recommend rotating the indicators of focus.

While any health service delivery project can be strengthened when accurate program data inform decision making, resource-limited settings are under particular pressure to maximize the impact of every dollar spent. High data quality and efficient data collection systems are therefore critical to the CHW program in Malawi and to similar programs globally. In many resource-limited settings, funding for the administrative tasks of data collection and aggregation is drawn from the same pool as funding for direct service delivery. This tension frequently limits the resources allocated to data collection, often at the expense of data quality. Sample-based data quality assessments are likely a more efficient means of determining data quality than exhaustive data reviews. In addition, data collection systems should collect only the minimum data elements needed for program management, and no more indicators than can be processed and directly linked to a program response in a timely manner.

Now that our program has achieved high reporting quality in the CHW report, we plan to conduct quarterly assessments to ensure that reporting quality remains high. This activity will be complemented with other assessments, including data quality checks in our other implementation areas to improve data quality across our health programs, ultimately improving our ability to provide appropriate, effective and data-driven health services in Neno District.

Acknowledgments

The authors thank the village health workers, health surveillance assistants and community health worker (CHW) site supervisors for their diligent work in Neno District and their commitment to improving the household chart program. The authors also extend their sincere gratitude to the Neno District Health Office and the Ministry of Health for supporting the CHW and household chart (HHC) programs. Finally, they also thank B Chabwera for his work in supporting the HHC program.

This project was supported in part by funding from Global REACH, University of Michigan Medical School, Ann Arbor, MI, USA. BHG received support from the Department of Global Health and Social Medicine Research Core at Harvard Medical School, Cambridge, MA, USA.

Conflict of interest: none declared.

References

1. Mukherjee J S, Eustache F E. Community health workers as a cornerstone for integrating HIV and primary healthcare. AIDS Care. 2007;19(Suppl 1):S73–S82. doi: 10.1080/09540120601114485.
2. AbouZahr C, Boerma T. Health information systems: the foundations of public health. Bull World Health Organ. 2005;83:578–583.
3. Chan M, Kazatchkine M, Lob-Levyt J, et al. Meeting the demand for results and accountability: a call for action on health data from eight global health agencies. PLoS Med. 2010;7:e1000223. doi: 10.1371/journal.pmed.1000223.
4. Mphatswe W, Mate K S, Bennett B, et al. Improving public health information: a data quality intervention in KwaZulu-Natal, South Africa. Bull World Health Organ. 2012;90:176–182. doi: 10.2471/BLT.11.092759.
5. National Statistical Office. Malawi/ICF Macro. Malawi Demographic and Health Survey 2010. Calverton, MD, USA: ICF Macro; 2011.
6. Dodge H, Romig H. A method of sampling inspection. The Bell System Technical J. 1929;8:613–631.
7. Lwanga S, Lemeshow S. Sample size determination in health studies: a practical manual. Geneva, Switzerland: World Health Organization; 1991.
8. Valadez J J. Assessing child survival programs in developing countries: testing lot quality assurance sampling. Cambridge, MA, USA: Harvard University Press; 1991.
9. Robertson S E, Valadez J J. Global review of health care surveys using lot quality assurance sampling (LQAS), 1984–2004. Soc Sci Med. 2006;63:1648–1660. doi: 10.1016/j.socscimed.2006.04.011.
10. Stewart J C, Schroeder D G, Marsh D R, Allhasane S, Kone D. Assessing a computerized routine health information system in Mali using LQAS. Health Policy Plan. 2001;16:248–255. doi: 10.1093/heapol/16.3.248.
11. Laroche M L, Vergnenègre A, Druet-Cabanac M, Boutros-Toni F, Salamon R, Preux P M. [Quality of medical records in the ‘Medical Information System Program’: application of the lot quality assurance sampling method]. Rev Epidemiol Sante Publique. 2002;50:433–439. [In French]
12. Hedt-Gauthier B L, Tenthani L, Mitchell S, et al. Improving data quality and supervision of antiretroviral therapy sites in Malawi: an application of lot quality assurance sampling. BMC Health Serv Res. 2012;12:196. doi: 10.1186/1472-6963-12-196.
13. Valadez J J, Weld L, Vargas W V. Monitoring community health workers’ performance through lot quality-assurance sampling. Am J Public Health. 1995;85:1165–1166. doi: 10.2105/ajph.85.8_pt_1.1165-a.
14. Lemeshow S, Taber S. Lot quality assurance sampling: single- and double-sampling plans. World Health Stat Q. 1991;44:115–132.

