BMC Digital Health. 2025 Sep 25;3(1):50. doi: 10.1186/s44247-025-00190-4

Malaria RDT (mRDT) interpretation accuracy by frontline health workers compared to AI in Kano state, Nigeria

Sasha Frade 1,✉,#, Shawna Cooper 1,#, Sam Smedinghoff 1,#, David Hattery 1,#, Yongshao Ruan 1,#, Paul Isabelli 1,#, Nirmal Ravi 2,#, Megan McLaughlin 3,#, Lynn Metz 3,#, Barry Finette 3,4,#
PMCID: PMC12460424  PMID: 41018674

Abstract

Background

Although malaria is preventable and treatable, it continues to be a significant cause of illness and death. Early diagnosis through testing is critical in reducing malaria-related morbidity and mortality. Malaria rapid diagnostic tests (mRDTs) are preferred for their ease of use, sensitivity, and rapid results, yet misadministration and misinterpretation errors persist. This study investigated whether pairing an existing application with AI-based software could enhance interpretation accuracy among Frontline Healthcare Workers (FHWs) in Kano State, Nigeria.

Methods

A comparative analysis was conducted, examining mRDT interpretations by FHWs, trained expert mRDT reviewers (Panel Readers), and AI-based computer vision algorithms. The accuracy comparisons included: (1) AI interpretation versus Panel Read interpretation, (2) FHW interpretation versus Panel Read interpretation, (3) FHW interpretation versus AI interpretation, and (4) AI performance on faint positive lines. Accuracy was reported as a weighted F1 score, reflecting the harmonic mean of recall (sensitivity) and precision (positive predictive value).

Results

The AI algorithm demonstrated high accuracy, matching Panel Read interpretations 96.38% of the time for positives and 97.12% for negatives. FHW interpretations agreed with the Panel Read 96.82% of the time on positives and 94.31% on negatives. Comparison of FHW and AI interpretations showed 97.52% agreement on positives and 93.38% on negatives. Overall accuracy was higher for AI (weighted F1 score of 96.4) than for FHWs (95.3). Notably, the AI accurately identified 90.2% of 163 faint positive mRDTs, whereas FHWs correctly identified 76.1%.

Conclusion

AI-based computer vision algorithms performed comparably to trained and experienced FHWs and exceeded FHW performance in identifying faint positives. These findings demonstrate the potential of AI technology to enhance the accuracy of mRDT interpretation, thereby improving malaria diagnosis and reporting accuracy in malaria-endemic, resource-limited settings.

Supplementary Information

The online version contains supplementary material available at 10.1186/s44247-025-00190-4.

Keywords: RDT, Rapid test, Malaria, AI, Artificial intelligence, AI algorithms, ML, Machine learning, CV, Computer vision, Diagnosis

Background

Introduction and background

Although malaria is preventable and treatable [1], it remains a significant cause of illness and death around the world. Globally, it is estimated that there are nearly 250 million cases of malaria [1–3], with a mortality rate between 0.3% and 2.2% [4]. In 2021 alone, 619,000 deaths were recorded from 247 million malaria cases, an increase of 2 million cases over 2020 [5]. Malaria’s global disease burden is best demonstrated by the loss of 46,438,000 Disability Adjusted Life Years (DALYs), a figure which considers both mortality and disability [6].

Sub-Saharan Africa accounts for 92% of the world’s malaria burden [3], and in malaria-endemic regions with tropical climates, the malaria mortality rate ranges from 11% to 30% [4], especially when the infection remains untreated. Focusing on this study’s geography, Nigeria accounts for over a quarter (26.6%) of global malaria deaths [5] and nearly a third (31%) of the global malaria burden [3, 5]. In Kano State specifically, the malaria prevalence is 32% [7, 8].

One key strategy found to decrease malaria prevalence and mortality is early diagnosis and fast-acting treatment to prevent severe disease [4, 7]. Rapid and accurate testing ensures effective diagnosis, leading to effective treatment, which can prevent further spread of infection and potentially curb the development of drug-resistant malaria arising from overuse of artemisinin-based combination therapy [3, 4, 8, 9]. Malaria RDTs (mRDTs), which detect the presence of malaria parasite antigens in a blood sample, require no special equipment, advanced medical training, or specific experience to administer. mRDTs are small and easily transportable, do not require samples to be stored, and can be used with minimal training at the point of care, even in remote settings. Due to their ease of use, sensitivity, rapid results, and cost-effectiveness compared to other available diagnostics, mRDTs are the test of choice for malaria diagnostic testing [8, 10] in most resource-constrained settings.

However, mRDT misadministration and misinterpretation errors made by Frontline Health Workers (FHWs) remain a concern, including the misinterpretation of faint lines. Health workers may not see a faint line or may think they see a line that is not present [11, 12]. Other factors contributing to these errors include a lack of training, regular supervision, feedback, or practice oversight, or simply misunderstanding the meaning of each line [11, 13]. One study found that FHWs with no job aid and/or training interpreted mRDTs correctly only 54% of the time [14, 15], while another found that community health worker interpretation accuracy for faint positive test lines declined over a 12-month period, from 89.7% to 76.7% [14]. Misadministration, defined as improper handling or execution of the RDT, can further impact the accuracy of test results. Factors such as incorrect sample application, improper use or quantity of diluent, or failure to adhere to required processing times contribute to misadministration. Skjefte et al. [16] highlight similar issues, demonstrating that while digital tools can improve test administration, persistent challenges remain, especially in low-resource settings. As such, to gain the full benefit of mRDTs, it is essential to minimize both errors in administration and misinterpretation [16].

A smartphone application that guides the health worker in interpreting the mRDT result is a simple, user-friendly, and accessible way to minimize such errors and misinterpretations, with the potential to increase overall accuracy [11]. A similar application achieved high feasibility and usability scores in South Africa [17], but that study did not investigate whether the app increased the accuracy of mRDT interpretations. Therefore, this study aimed to investigate whether the use of a mobile application with artificial intelligence (AI)-based computer vision technology for interpreting mRDTs could improve the accuracy of mRDT interpretations amongst FHWs in Kano State, Nigeria (Fig. 1). The study focuses specifically on the AI’s ability to minimize misinterpretation of mRDT results, particularly in complex cases involving faint lines or unclear test outcomes. The AI does not address all potential errors related to the administration of mRDTs, such as improper test handling or intentional misinterpretation by health workers; addressing these broader issues requires additional training and supervision, which fall outside the scope of this investigation. While AI may offer additional benefits such as streamlining workflows or providing decision support, its most significant contribution in this context is improving the precision of mRDT interpretation, thereby reducing errors and enhancing diagnostic reliability.

Fig. 1 Map of Kano State, Nigeria, based on work by Uwe Dedering (2010, February 11). Retrieved from https://commons.wikimedia.org/wiki/File:Nigeria_Kano_State_map.png

Methods

Study site and population

The study was conducted in Nasarawa LGA in Kano. Three types of health workers participated: community health workers from a private community health program called REACH, laboratory scientists from a private primary healthcare centre called EHA Clinics, and community health workers from a public primary healthcare centre. Forty-four (44) FHWs were included in the study: 8 REACH community health workers, 6 EHA Clinics (EHAC) health workers, and 30 Kano State government community health workers. REACH community health workers were highly trained health workers mobilised and deployed by EHAC in their respective communities, primarily visiting patients in their homes. EHA Clinics health workers were laboratory scientists who saw patients at the EHA Clinics health facility in Kano. Finally, Kano State government health workers were based within a local clinic which served as a triage point of care, referring internally to clinicians when necessary.

Study design and workflow

This study incorporated HealthPulse AI algorithms into the workflow of THINKMD’s clinical risk assessment platform (THINKMD platform), which was used by all FHWs. None of the FHWs had previously used the THINKMD platform. The platform was used to assess and manage patients in Kano State, Nigeria by providing integrated clinical severity, condition, and disease assessments, triage, treatment, and follow-up recommendations. FHWs ran an SD Bioline mRDT whenever the THINKMD platform recommended testing. The THINKMD platform provided detailed instructional guides that assisted healthcare workers in the proper administration of malaria RDTs. These step-by-step guides helped address common issues related to test performance, such as the correct application of buffer, proper timing for result interpretation, and ensuring test results were read within the appropriate window. After the mRDT was administered, FHWs used the HealthPulse app, accessible from the THINKMD platform via Android Intent protocols, to capture a photo of the mRDT and record the FHW’s mRDT interpretation (Fig. 2).

Fig. 2 THINKMD platform app and HealthPulse AI flow

The HealthPulse solution then used artificial intelligence (AI) algorithms to interpret the result of the mRDT. The AI algorithms included multiple computer vision (CV) and machine learning (ML) components, in addition to Image Quality Assurance (IQA) designed to identify images that would be difficult or even impossible for the CV/ML to evaluate. In this study, IQA was run directly on the mobile devices at the time the photo was taken, providing an opportunity to ask the user to retake images of insufficient quality. AI interpretations were run post hoc in the cloud.

In this study, Audere and THINKMD played distinct but complementary roles. Audere, a global digital health nonprofit, developed HealthPulse AI, the artificial intelligence-based platform responsible for the AI algorithms used to interpret mRDT results. These algorithms are designed to enhance diagnostic accuracy, particularly in cases with faint or ambiguous results. Audere led the design and development of the HealthPulse AI system which was integrated into THINKMD’s clinical risk assessment platform.

THINKMD, a public benefit corporation, developed the clinical decision support tool that was used by health workers in the study. The THINKMD platform provided integrated clinical severity and condition assessments, triage, treatment recommendations, and patient follow-up. Previous studies have validated the THINKMD platform’s diagnostic algorithms, including a notable validation published in The American Journal of Tropical Medicine and Hygiene in 2019, which confirmed the platform’s utility in resource-constrained settings [18]. In this study, HealthPulse AI and THINKMD were used together to guide health workers through the clinical assessment and diagnosis of malaria using RDTs.

Both HealthPulse AI and the THINKMD platform were developed independently and prior to this study; no new software development was conducted specifically for this evaluation. Audere and THINKMD provided support for integration, but their developers did not participate directly in data collection or analysis in this study.

To create the AI algorithm, the HealthPulse AI training pipeline consumed a set of over 12,500 labelled lab-created SD Bioline malaria P.f RDT images (positive, negative, and invalid) and 1,500 labelled field images to produce a set of CV models that work together, including:

  • An IQA pipeline that flags images that do not meet the quality bar, for example due to excess blur, low lighting, or over-exposure.

  • An initial object detector that locates the RDT and its sub-parts within the image and identifies the RDT type (SD Bioline P.f for this study).

  • A second object detector that examines the RDT result window (found by the prior object detector), locating the test and control line regions.

  • A classifier that examines each line region of the result window (found by the second object detector) and outputs line presence, yielding an interpretation of positive, negative, invalid, or uninterpretable.

The AI models used in this study are proprietary and have not been previously published. HealthPulse AI algorithms include IQA, object detection to locate the RDT and its result window, and classifiers to interpret line presence in the result window.
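Because the models themselves are proprietary, the sketch below illustrates only the staged control flow described above: IQA, then RDT detection, then result-window detection, then line classification, with early exit when a stage fails. All names and the callable-injection design are assumptions for illustration, not the actual HealthPulse AI code.

```python
# Illustrative sketch of the staged mRDT interpretation flow described above.
# All names are assumptions; the real HealthPulse AI models are proprietary.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class Interpretation(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    INVALID = "invalid"                  # control line absent
    UNINTERPRETABLE = "uninterpretable"

@dataclass
class LineCall:
    control_present: bool
    test_present: bool

def interpret_rdt_image(
    image: bytes,
    passes_iqa: Callable[[bytes], bool],                       # stage 1: blur/lighting/exposure checks
    detect_rdt: Callable[[bytes], Optional[bytes]],            # stage 2: locate RDT, identify type
    detect_result_window: Callable[[bytes], Optional[bytes]],  # stage 3: find the result window
    classify_lines: Callable[[bytes], LineCall],               # stage 4: per-line presence classifier
) -> Interpretation:
    """Run the four stages in order, bailing out when a stage cannot proceed."""
    if not passes_iqa(image):
        return Interpretation.UNINTERPRETABLE  # in the study, the app would ask for a retake
    rdt = detect_rdt(image)
    if rdt is None:
        return Interpretation.UNINTERPRETABLE  # no RDT found in the photo
    window = detect_result_window(rdt)
    if window is None:
        return Interpretation.UNINTERPRETABLE  # result window not located
    lines = classify_lines(window)
    if not lines.control_present:
        return Interpretation.INVALID          # no control line means an invalid test
    return Interpretation.POSITIVE if lines.test_present else Interpretation.NEGATIVE

# Trivial stand-in stages, wired up so the sketch runs end to end.
result = interpret_rdt_image(
    b"fake-image-bytes",
    passes_iqa=lambda img: True,
    detect_rdt=lambda img: img,
    detect_result_window=lambda rdt: rdt,
    classify_lines=lambda win: LineCall(control_present=True, test_present=True),
)
print(result)  # Interpretation.POSITIVE
```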

Ethical considerations

Ethics approval was received from The University of Vermont Committees on Human Research (Ethics Number MOD00005335) and the Kano State Ministry of Health Research Ethics Committee (Ethics Number MOH/Off/797/T.I/2056).

Informed consent was obtained from all participants involved in the study, including both healthcare workers and subjects. For children, informed consent was obtained from their parents or legal guardians. Given the low-risk nature of the study, the minimal impact on routine clinical procedures, and the absence of collection of personal identifiers, both ethics committees approved the use of verbal informed consent. Specifically, no patient, caregiver, family, or healthcare worker identifiers were collected or uploaded to the HealthPulse server. Minimal personally identifiable information (such as geolocation tied to photo capture) was temporarily stored locally on the mobile devices only, facilitating the accurate capture and quality control of images. This locally stored information was never uploaded remotely or retained beyond temporary local device storage.

The healthcare worker who served as the patient’s provider shared the Research Information Sheet with the subject’s caregiver, explained the study verbally, answered all questions, and sought verbal consent. Amongst other key information, participants were informed of, and verbal consent was sought for, the potential use of de-identified mRDT images, containing no personal identifiers, by Audere for additional research purposes, such as further training and refining of AI algorithms. Participants were informed that their de-identified images would be reviewed by trained personnel for interpretation purposes but were not specifically informed about the international location of these reviewers.

Participants used the THINKMD platform with the AI software (HealthPulse algorithms) integrated into the THINKMD application. HealthPulse AI is proprietary software developed by Audere, a non-profit organization, and is available for integration under specific access policies.

This study was funded by a grant from the Gates Foundation, INV-007492.

Data storage and reporting

The HealthPulse software stored minimal patient personally identifiable information (PII), such as the geolocation of photo capture, temporarily and locally on mobile devices. All patient PII was strictly excluded from data records uploaded to the HealthPulse server hosted on Google Cloud. Thus, no patient or healthcare worker identifiers were remotely accessible, aligning fully with the ethics committee approvals. Data access control rules were configured in a manner compliant with country, study, and partner standards. Reporting was performed from an Amazon Web Services (AWS) cloud instance in the US. mRDT images confirmed not to contain PII may be used by Audere for purposes beyond the study, including training of mRDT artificial intelligence interpretation algorithms.

Result interpretations performed in study

The three interpretations which form the basis of this paper are described below. To investigate the applicability of an automated interpretation by AI algorithms, the Panel Read interpretation and the FHW interpretation were compared to the AI algorithms’ interpretation, known as HealthPulse AI.

FHW interpretation: After taking a photo of the mRDT, FHWs indicated in the HealthPulse app whether they saw a control line only (negative mRDT result), a control and P.f line (positive mRDT result), or no control line (invalid mRDT result) on the actual RDT. FHWs evaluated the physical mRDT in person; they were not directed to select a positive, negative, or invalid result.

Panel Read interpretation: A standard reference interpretation for each mRDT image was performed by an external, independent panel of three readers (Panel Readers) from IndiVillage (Bangalore, India).* Where the Panel Readers agreed with the FHW’s interpretation of the mRDT result, the FHW’s interpretation was deemed correct; where consensus was not achieved, a majority vote among the Panel Readers was used to determine correctness. The Panel Readers evaluated a photo of the mRDT without access to any other patient data, providing an unbiased interpretation.

HealthPulse AI interpretation: To investigate the applicability of an AI-powered interpretation, HealthPulse AI performed an interpretation using only the mRDT photo.

*IndiVillage personnel were trained by Audere to label and interpret RDTs from both good-quality and poor-quality photos, including conditions such as blur, poor lighting, and blood in the result window. Each Panel Reader was evaluated for accuracy before being added to the panel. To date they have labelled and interpreted over 150,000 RDTs, 70,000 of which were mRDTs.

Tuning, in this context, refers to the process of adjusting the AI algorithms to optimize their performance. When tuning the algorithms, the specific use case and the expected malaria prevalence in the study population were taken into consideration. In a population skewed towards negative malaria results, a model tuned to detect very faint positive lines is more likely to classify a negative image as positive. It is important to note that adjusting the tuning and prevalence assumptions can yield a different weighted F1 score, our overall indicator of AI algorithm performance, which is discussed in later sections. An F1 score is defined as the harmonic mean of recall (sensitivity) and precision (positive predictive value), and the weighted F1 score is defined as the weighted average of the F1 scores for each classification class, where the weighting is determined by the number of reference labels in each class.
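To make these definitions concrete, the sketch below computes per-class F1 scores and their support-weighted average from a confusion matrix keyed by (reference label, predicted label). The counts are invented for illustration and are not the study’s data.

```python
# Weighted F1 from a confusion matrix, following the definitions above.
# Counts are invented for illustration; they are not the study's data.
from collections import Counter

CLASSES = ["positive", "negative", "invalid"]

def f1_for_class(cm, cls):
    """F1 = harmonic mean of precision and recall for one class."""
    tp = cm[(cls, cls)]
    fp = sum(cm[(ref, cls)] for ref in CLASSES if ref != cls)     # predicted cls, reference differs
    fn = sum(cm[(cls, pred)] for pred in CLASSES if pred != cls)  # reference cls, predicted differs
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

def weighted_f1(cm):
    """Per-class F1 scores averaged with weights = reference-label counts."""
    support = {c: sum(cm[(c, p)] for p in CLASSES) for c in CLASSES}
    total = sum(support.values())
    return sum(support[c] / total * f1_for_class(cm, c) for c in CLASSES)

# Keys are (reference label, predicted label); Counter returns 0 for absent pairs.
cm = Counter({
    ("positive", "positive"): 480, ("positive", "negative"): 20,
    ("negative", "negative"): 470, ("negative", "positive"): 25,
    ("negative", "invalid"): 5, ("invalid", "invalid"): 4, ("invalid", "positive"): 1,
})
print(f"weighted F1 = {100 * weighted_f1(cm):.1f}")
```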

Images shared internationally for independent interpretation by our partner IndiVillage (Bangalore, India) were anonymized and carefully screened for any identifiable patient data. IndiVillage is B-Corp certified and complies with ISO, SOC, GDPR, and HIPAA standards. Despite rigorous protocols to avoid inclusion of PII, any images inadvertently containing such data were flagged, managed securely, and handled following strict privacy standards.

Analysis methods

We begin the analysis by grouping images of variable quality into three categories: (1) good quality images (Fig. 3), (2) sufficient quality images whose results can be interpreted (Fig. 4), and (3) bad quality images that were uninterpretable by the Panel Read and the AI algorithms (Fig. 5).

Fig. 3 Examples of good quality mRDT images. These images were clear, with no issues related to lighting, blur, or blood interference, enabling straightforward and accurate interpretation by both Panel Readers and AI algorithms. (All identifiable details have been removed, and explicit permission for dissemination was obtained.)

Fig. 4 Examples of sufficient quality but suboptimal mRDT images. These images had minor quality issues, such as slight blur, shadows, or small amounts of blood in the result window. Despite these challenges, the images were interpretable by both Panel Readers and AI algorithms. (All identifiable details have been removed, and explicit permission for dissemination was obtained.)

Fig. 5 Examples of poor-quality mRDT images that were uninterpretable. Major quality issues, including excessive blood in the result window and significant blurriness, prevented both Panel Readers and AI algorithms from accurately interpreting these images. (All identifiable details have been removed, and explicit permission for dissemination was obtained.)

Thereafter, the analysis assesses the accuracy of the (1) AI algorithms’ interpretation compared to the Panel Read interpretation (Table 2); (2) FHW interpretation compared to the Panel Read interpretation (Table 3); and (3) FHW interpretation compared to the AI algorithms’ interpretation (Table 4). Unless otherwise specified, the Panel Read interpretation is considered the reference standard. The comparative result tables highlight positives, negatives, and invalids for each.

Table 2.

Comparison of AI algorithms and panel read interpretation

Rows are Panel Read interpretations; columns are AI algorithms’ interpretations. Percentages are calculated per column.

| Panel read | Invalid % (n) | Negative % (n) | Positive % (n) | Uninterpretable* % (n) | Total n |
| --- | --- | --- | --- | --- | --- |
| Uninterpretable | 28.57 (2) | 1.52 (18) | 0.77 (10) | 11.1 (1) | 31 |
| Invalid | 57.14 (4) | 0.00 (0) | 0.08 (1) | 0.0 (0) | 5 |
| Negative | 14.29 (1) | 97.12 (1148) | 2.77 (36) | 22.2 (2) | 1187 |
| Positive | 0.00 (0) | 1.35 (16) | 96.38 (1251) | 66.7 (6) | 1273 |
| Total | 100 (7) | 100 (1182) | 100 (1298) | 100 (9) | 2496 |

The Total row and column give the overall counts for each interpretation category

*Images where the CV ran but did not output an interpretation, either because no RDT could be found in the photo or because the result window could not be located on the RDT. In these cases the AI could not interpret the images due to adverse conditions such as blur, glare, or improper distance; human readers, however, were able to manually identify and interpret the RDTs in these same images

Table 3.

Comparison of FHW interpretation and panel read interpretation

Rows are Panel Read interpretations; columns are FHW interpretations. Percentages are calculated per column.

| Panel read | Invalid % (n) | Negative % (n) | Positive % (n) | Total n |
| --- | --- | --- | --- | --- |
| Uninterpretable | 0.00 (0) | 1.30 (16) | 1.19 (15) | 31 |
| Invalid | 28.57 (2) | 0.08 (1) | 0.16 (2) | 5 |
| Negative | 57.14 (4) | 94.31 (1160) | 1.83 (23) | 1187 |
| Positive | 14.29 (1) | 4.31 (53) | 96.82 (1219) | 1273 |
| Total | 100 (7) | 100 (1230) | 100 (1259) | 2496 |

The Total row and column give the overall counts for each interpretation category

Table 4.

Comparison of FHW interpretation and AI algorithms’ interpretation

Rows are AI algorithms’ interpretations; columns are FHW interpretations. Percentages are calculated per column.

| AI algorithms | Invalid % (n) | Negative % (n) | Positive % (n) | Total n |
| --- | --- | --- | --- | --- |
| Invalid | 28.57 (2) | 0.25 (3) | 0.08 (1) | 6 |
| Negative | 57.14 (4) | 93.38 (1142) | 2.40 (30) | 1176 |
| Positive | 14.29 (1) | 6.38 (78) | 97.52 (1218) | 1297 |
| Total | 100 (7) | 100 (1223) | 100 (1249) | 2479* |

The Total row and column give the overall counts for each interpretation category

*The AI was not evaluated on uninterpretable images; however, every photo included here had already passed the AI’s photo-quality checks. The 17 images rejected by IQA filters are not included in the assessment

This is followed by the AI algorithms’ performance (Table 5), in which the weighted F1 score is shown for the model as a whole. This analysis also includes a comparison for mRDTs with a faint positive line, showing how many of them were interpreted correctly by the FHWs and by the AI algorithms.

Table 5.

Overall accuracy of the AI algorithms for SD Bioline P.f

[Table 5 is rendered as an image in the original article.]

Results

Quality of captured images

Good quality images are vital for accurate interpretation of mRDT results, but in some instances either the physical mRDT or the image of the mRDT is of poorer quality than the Panel Readers and/or the AI algorithms require to interpret it. In these cases, the Panel Read/AI algorithms will classify the mRDT image as uninterpretable. Examples of quality issues that are not related to how the RDT was performed include blur and lighting issues introduced when the photograph of the RDT was taken.

Most of the images taken by FHWs were of good quality and were interpreted by Panel Readers without any challenges. Figure 3 provides examples of “good quality” images that presented no challenge to mRDT result interpretation, including the image on the far right, which is a faint positive.

Other images, although not of the best quality, were sufficient for an interpretation to be made by both the Panel Readers and the AI algorithms (Fig. 4). Blurriness, shadows, and some blood in the result window were minor challenges that could be overcome when interpreting these mRDT images.

Finally, there were several images for which the quality was so poor that neither the Panel Reader nor the AI algorithms were able to interpret (Fig. 5). Excessive blood in the result window and extreme blurriness were the main reasons these images were uninterpretable.

Since the study concluded, there has been a continuing effort to enhance the photo filters so that poor-quality images are caught at capture time. In the latest version of the AI algorithms, the IQA filters would identify additional images of insufficient quality and request that the user retake the photo.

Comparison of mRDT interpretation accuracy

Out of 2496 mRDTs interpreted by the FHWs, 7 (0.28%) were invalid, 1230 (49.28%) were negative, and 1259 (50.44%) were positive (Table 1). The Panel Readers interpreted 5 (0.20%) mRDT images as invalid, 1187 (47.56%) as negative, and 1273 (51.00%) as positive. The Panel Readers additionally had the option of marking an mRDT image as uninterpretable, an option not extended to the FHWs; they classified 31 (1.24%) of the mRDT images as uninterpretable. Finally, the AI algorithms classified 6 (0.24%) mRDT images as invalid, 1176 (47.11%) as negative, and 1297 (51.96%) as positive.

Table 1.

FHW, panel read and AI algorithms’ interpretations

| Interpretation | No. of mRDTs | Percent |
| --- | --- | --- |
| FHW interpretation | | |
| Invalid | 7 | 0.28 |
| Negative | 1230 | 49.28 |
| Positive | 1259 | 50.44 |
| Total | 2496 | 100.00 |
| Panel read interpretation | | |
| Invalid | 5 | 0.20 |
| Negative | 1187 | 47.56 |
| Positive | 1273 | 51.00 |
| Uninterpretable (“Bad image”) | 31 | 1.24 |
| Total | 2496 | 100.00 |
| AI algorithms’ interpretation | | |
| Invalid | 6 | 0.24 |
| Negative | 1176 | 47.11 |
| Positive | 1297 | 51.96 |
| Uninterpretable (rejected by IQA) | 17 | 0.68 |
| Total | 2496 | 100.00 |

Total rows give the overall counts and percentages for each interpretation source

In Tables 2, 3 and 4, we present the aggregate counts of positive (POS), negative (NEG), and invalid (INV) results as interpreted by the AI, FHWs, and panel readers. These tables summarize the overall distribution of results across the dataset, allowing for a comparative analysis of performance across the three groups. However, these aggregate comparisons do not account for discrepancies at the individual RDT level, as mismatches between interpretations can balance out when viewed in aggregate.

For individual-level comparisons, Table A5.1 (in Appendix 6) focuses on the classification of each RDT by AI, FHWs, and the panel. For example, RDT #1 might be classified as POS by AI, FHWs, and the panel, while RDT #2 might be classified as POS by the AI and FHWs but NEG by the panel. Such discrepancies are not captured in the aggregate counts of Tables 2, 3 and 4 but are critical to understanding the differences in performance across individual RDTs.

By presenting both aggregate and individual-level comparisons, we aim to provide a comprehensive view of the AI’s interpretation performance. This dual approach allows us to highlight overall trends while also identifying specific cases where discrepancies arise between the different interpreting parties.
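The toy example below (invented data, not the study’s) shows how aggregate counts can agree perfectly while individual calls diverge, which is why the per-RDT comparison in Appendix 6 is informative.

```python
# Toy example (invented data): aggregate counts match even though half of the
# individual RDT calls disagree, which aggregate tables cannot reveal.
from collections import Counter

ai_calls  = ["POS", "NEG", "POS", "NEG"]   # AI call for RDTs #1-#4
fhw_calls = ["NEG", "POS", "POS", "NEG"]   # FHW call for the same RDTs

print(Counter(ai_calls) == Counter(fhw_calls))  # True: both have 2 POS and 2 NEG
agreement = sum(a == f for a, f in zip(ai_calls, fhw_calls)) / len(ai_calls)
print(agreement)                                 # 0.5: only RDTs #3 and #4 agree
```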

Table 2 above compares mRDT image interpretations made by the AI algorithms to Panel Read interpretations. The Panel Read interpreted 31 mRDT images as uninterpretable; of these, the AI algorithms interpreted 2 as invalid, 18 as negative, and 10 as positive. Of the 5 mRDT images the Panel Read classified as invalid, the AI algorithms classified 4 as invalid (True Invalids) and 1 as positive. Of the 1187 mRDT images classified as negative by the Panel Read, the AI algorithms correctly interpreted 97.12% as negative (True Negatives) and 2.77% as positive. Finally, the AI algorithms correctly interpreted 96.38% of positive mRDT images (True Positives), whilst interpreting 16 as negative.

Table 3 compares 2496 FHW interpretations to the Panel Read. Of the 31 mRDT images the Panel Read classified as uninterpretable, FHWs interpreted 16 as negative and 15 as positive. Furthermore, of the 5 mRDT images interpreted as invalid by the Panel Read, FHWs interpreted 2 as invalid (True Invalids), 1 as negative, and 2 as positive. Of the 1187 negative mRDTs, FHWs correctly interpreted 94.31% as negative (True Negatives), with 4 interpreted as invalid and 23 as positive. Finally, of the 1273 positive mRDTs, FHWs correctly interpreted 96.82% as positive (True Positives), with 1 interpreted as invalid and 53 as negative. FHWs were able to run another test if the result was invalid.

Table 4 provides a comparison of AI algorithms and FHW interpretations, a real-world example that uses the AI algorithms’ interpretation as the comparator rather than the Panel Read interpretation. The 17 mRDT images rejected by IQA were removed for this analysis of AI accuracy, leaving a total of 2479 mRDTs interpreted by both FHWs and the AI algorithms. The FHWs interpreted 7 mRDTs as invalid; 2 of these were classified as invalid by the AI algorithms (True Invalids), 4 were classified as negative, and 1 was classified as positive. Of the 1223 mRDTs interpreted as negative by the FHWs, 93.38% (True Negatives) were classified as negative by the AI algorithms, whilst 6.38% (False Negatives) were classified as positive. Finally, 97.52% (True Positives) of the mRDTs the FHWs interpreted as positive were classified as positive by the AI algorithms, 2.40% (False Positives) were classified as negative, and 1 mRDT was classified as invalid.

In addition to the overall comparison counts indicated in Tables 2, 3 and 4, an analysis was performed comparing how each individual image was interpreted by the Panel Readers, FHWs, and AI algorithms. These data are presented in Appendix 6.

Accuracy of AI algorithms

Table 5 above shows the performance statistics of the AI algorithms. The three IQA filters employed for this version of the algorithm were (1) object detection, (2) blur and contrast, and (3) darkness. After the IQA filters were applied to the full photo set, 2479 mRDT images remained. Of these, 0.2% were classified as invalid, 47.7% as negative, 51.1% as positive, and 1.0% as uninterpretable. The weighted F1 score for the AI algorithms was 96.4, compared to 95.3 for the FHWs. Lastly, according to the Panel Read, the AI algorithms accurately classified 90.2% of the 163 mRDTs that showed a faint positive line, compared to 76.1% for the FHWs.

Discussion

This paper investigated and compared mRDT interpretation accuracy of FHWs and AI algorithms to a reference Panel Read interpretation, in Kano State, Nigeria. FHWs interpreted the actual mRDT, whilst the AI algorithms’ interpretation was based on a single image of the mRDT that was taken by the FHW. Overall, findings revealed high interpretation accuracy by the FHWs and AI algorithms, as compared to the expert panel.

In some cases, the quality of the image presented interpretation challenges for both the AI algorithms and the Panel Readers. A similar result was found in a study of rapid diagnostic tests for influenza in which images of the RDTs were uploaded to a mobile application by the end user [19]. Image quality is thus a key factor in whether mRDT images can be interpreted successfully. The Panel Readers labelled the reasons some images were uninterpretable, including blood in the result window, blurry images of the mRDT, shadows in the image, the mRDT being skewed in the captured image, and the mRDT being too small in the image. While some of these issues were related to the FHW’s ability to capture a good photo, others were related to the mRDT itself (e.g., presence of blood in the result window). As such, resolving these issues from the outset, and ensuring mRDTs are accurately administered and images are of sufficient quality, will increase the number and accuracy of interpretations by both a person (such as a Panel Reader, healthcare worker, or remote telehealth clinician) and AI algorithms. Even with imperfect images, however, the AI algorithms were found to be quite resilient to several adverse conditions (such as blur and glare).

The high accuracy numbers indicate that the AI algorithms performed well when interpreting both positive and negative mRDTs. Furthermore, the reference Panel Read was compared to the FHW interpretation, which found agreement 96.82% of the time for positive mRDTs and 94.31% for negative mRDTs. A separate study comparing interpretations of physical mRDTs by FHWs and study team researchers identified similar concordance, with FHWs displaying an overall sensitivity of 92% and specificity of 97.3% [20]. The difference between the Panel Read and FHW interpretations demonstrates an opportunity to improve the accuracy of FHW interpretations: in this study, FHW interpretations were incorrect 5.69% of the time for negative mRDTs and 3.18% for positive mRDTs. Lastly, a final comparison between the AI interpretation and the FHW interpretation provides a real-world use case for AI, which would not have access to a reference Panel Read. In this particular study, FHWs performed well overall when interpreting mRDTs. There was uncertainty about the mRDT interpretation competency of the FHWs in this community health program prior to this study; our results provided evidence of the FHWs’ interpretation accuracy, bolstering the team’s confidence. When comparing the AI and FHW interpretations, using the AI as the new reference mechanism, there was agreement on the interpretation of positives 97.52% of the time and on negatives 93.38% of the time. This indicates that, even in a well-trained and experienced group of FHWs, there are opportunities for the AI to identify potential interpretation errors, targeted opportunities for patient follow-up, and insights regarding FHWs who may benefit from additional supervision and/or training.

Furthermore, in this assessment, the AI algorithm was better at interpreting faint positive lines than the FHWs: 90.2% of the 163 mRDTs that showed a faint positive line were interpreted correctly by the AI algorithm, compared to only 76.1% by FHWs. This is similar to the results of a study that found that only 73.8% of FHWs correctly interpreted faint positive line mRDTs [20]. Digital tools such as HealthPulse AI also offer the potential to streamline reporting processes by enabling real-time data capture and transmission. This reduces the lag time associated with manual entry, allowing healthcare providers to upload results when connectivity is available and bypass delays often encountered with traditional reporting systems. Such capabilities enhance data accuracy, ensure timely reporting, and enable faster decision-making in clinical and public health settings. Together, these findings provide strong justification for using AI to support mRDT interpretations, specifically for quality assurance, supportive supervision of FHWs, and public health surveillance. Although this study had only 163 images with faint lines, once a mobile app with AI support moves to scale, a difference of 14.1 percentage points in additional positives identified could be quite significant and lead to proper treatment for many more individuals who otherwise may go untreated.

In a context where FHWs are inexperienced and/or poorly trained or untrained, AI support of this nature could be even more impactful. In a systematic review of the use of malaria RDTs in various health contexts across sub-Saharan Africa, it was found that FHWs were only able to correctly interpret mRDTs around 57% of the time, in the absence of training or job aids. This figure was 90% for those who had undergone training and had job aids available to them [13]. Similarly, another study found that amongst FHWs who did not have training or job aids, FHWs were only able to correctly interpret mRDTs 54% of the time. This increased to 80% for a group who had job aids, and 93% for a group who had undergone training and also had access to job aids [14].

Study design limitations include the fact that the trained FHWs knew they were in a study and knew proof of the mRDT result was being captured as they were taking a photo of it. This may have led FHWs in this study to take greater care when performing interpretations compared to the above-mentioned studies or to normal daily activity. To test this hypothesis, baseline data could be collected before introduction of the app. Additionally, while the FHW was intended to capture a photo right after test processing was complete, there was no way in this study to ensure all photos were captured during the manufacturer’s designated test read window; a study that includes observation of RDT testing and photographing procedures could address this limitation. Furthermore, while this study demonstrates AI’s potential to significantly reduce misinterpretation of mRDT results, particularly in the case of faint positive lines, it does not address errors related to the misadministration of the test itself. Factors such as improper RDT administration, health worker adherence to protocol, or potential intentional misinterpretation of results are beyond the scope of this study. Further research and interventions are needed to address these types of errors, which play an important role in overall diagnostic accuracy and patient outcomes.

This study also demonstrated how HealthPulse can interoperate with a comprehensive mHealth clinical risk assessment platform such as THINKMD, which manages the FHWs’ overall experience to capture detailed symptomatology and accurately administer rapid tests. The THINKMD decision support solution used digital test result data from the HealthPulse system to generate clinical risk assessments and treatment guidance for the patient. The Android Intent interoperability approach used in this study illustrates one in-place solution; deeper API-based integrations could also be used, as sketched below. Collected data can additionally be augmented with on-device or cloud AI interpretations to provide insights for program coordinators and stakeholders, enabling overall program monitoring, surveillance, and quality control at all levels.
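As a sketch of what a deeper API-based integration might look like, the snippet below uploads an mRDT photo to a hypothetical interpretation endpoint. The URL, request fields, and auth scheme are assumptions for illustration only; HealthPulse’s actual API is not public and may differ entirely.

```python
# Hypothetical sketch of an API-based integration (contrast with the Android
# Intent hand-off used in the study). Endpoint, fields, and auth are invented.
import requests

def submit_rdt_photo(image_path: str, encounter_id: str, api_token: str) -> dict:
    """Upload an mRDT photo and return the cloud interpretation as a dict."""
    with open(image_path, "rb") as image_file:
        response = requests.post(
            "https://api.example.org/v1/rdt-interpretations",  # placeholder URL
            headers={"Authorization": f"Bearer {api_token}"},
            data={"encounter_id": encounter_id, "rdt_type": "SD_BIOLINE_PF"},
            files={"image": image_file},
            timeout=30,
        )
    response.raise_for_status()
    return response.json()  # e.g. {"interpretation": "positive", "iqa": "pass"}
```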

The AI algorithms performed well, showing a weighted F1 score of 96.4 compared to 95.3 amongst healthcare workers. When comparing the AI and FHW interpretations, using the AI as the reference mechanism, there was agreement on the interpretation of positives 97.52% of the time and on negatives 93.38% of the time. It is anticipated that the AI accuracy could be further improved by increasing the volume of field images used in the training dataset, especially images showing specific adversarial conditions such as blood, staining, and lighting that causes harsh shadows. Furthermore, during the THINKMD study, the AI system was designed such that an individual set of AI algorithms was tailored to a specific RDT brand, such as the SD Bioline P.f RDT. This approach allowed for accurate interpretation but limited the flexibility of the AI to work across different RDT types. Since the completion of the study, the AI algorithms have advanced significantly, and a single set of AI algorithms is now capable of recognizing and interpreting results from a variety of RDT brands, even across diseases.

Conclusion

As healthcare mRDT needs increase and access to highly trained FHWs decreases, solutions that can engage less experienced workers in the healthcare ecosystem could have a dramatic positive impact on the health sector overall. This analysis demonstrates that the HealthPulse AI algorithms had high performance and interpretation accuracy for images of mRDTs taken by healthcare workers in Kano State, Nigeria, performing similarly to FHWs on easier-to-read mRDTs and better than FHWs on faint positives. The AI algorithms studied were shown to be resilient to a range of mRDT and photo capture conditions that are challenging even for highly trained FHWs. Combining a mobile application that supports mRDT administration and image capture with AI algorithms can elevate the performance of frontline healthcare workers evaluating mRDT results, allowing for an effective point-of-care system to accurately interpret mRDTs in highly malaria-endemic, low-resource settings. Together, the FHW and the AI can form a strong team that supports FHW decision-making and ensures accurate reporting and appropriate treatment, particularly when embedded in digital mHealth clinical decision technology such as THINKMD. Looking ahead, the AI algorithms’ ability to examine the lines present in the mRDT result window could be further enhanced for moderate adversarial cases, including where the mRDT has blood and/or staining in the result window. The AI algorithms could additionally be targeted to support an expanded set of use cases, going beyond quality control and supportive supervision.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (1.8MB, docx)

Acknowledgements

Thank you to the individuals who helped review the results and provided insights for this manuscript, including Sarah Morris, Shyam Pather, Dino Rech, Hana Lee, and Bronte Li. Further, we deeply appreciate the Frontline Health Workers who participated in the study and the patients who were tested for malaria during the study. We also recognize the individuals who designed and developed the HealthPulse application and AI algorithms used in the study, including Ahmed Ibrahim, David Schacht, Griffin Hardy, Krishnam Gupta, Michael Marucheck, Riley Chang, Rob Jarrett, Terri Paik, and Yohann Richard.

Abbreviations

AI

Artificial Intelligence

AWS

Amazon Web Services

CV

Computer Vision

DALY

Disability Adjusted Life Years

GMP

Google’s MediaPipe

FHW

Frontline Healthcare Workers

IQA

Image Quality Assurance

ML

Machine Learning

mRDT

Malaria Rapid Diagnostic Test

PII

Personally Identifiable Information

RDT

Rapid Diagnostic Test

Author contributions

SF and SC were major contributors to the writing of the manuscript. SF and SS analyzed the data. DH, YR, PI, RN, MM, LM and BF provided input, comments and feedback into the manuscript. All authors conceptualized the study and read and approved the final manuscript.

Funding

This study was funded by a grant from the Bill & Melinda Gates Foundation, INV-007492.

Data availability

Data from the THINKMD system cannot be shared publicly because of private patient data and identifiers collected during the study. While images of Rapid Diagnostic Tests (RDTs) are included in this manuscript, these have been carefully reviewed to remove any personally identifiable information (PII). Images used are cropped and anonymized to present only the necessary components of the test results. Explicit consent was provided to use anonymized images internally for product improvement and dissemination of findings. However, broader public sharing of the full dataset of images is restricted due to ethical considerations, consent limitations, and the risk of inadvertent inclusion of background identifiable information. The images of RDTs utilized in Audere’s computer vision analyses are not publicly available due to privacy and ethical considerations. Furthermore, the HealthPulse software is not open source at present but abides by the Gates Foundation policies related to open access for use of software in priority geographies. The software is available for integration on a cost-recovery basis, as Audere is a non-profit organization. Researchers interested in accessing RDT images for further research are encouraged to contact Audere directly; access will be granted in accordance with Audere’s data sharing policies and may require an appropriate data use agreement. Please direct inquiries to info@auderenow.org. Data are available from Audere for researchers who meet the criteria for access to confidential data.

Declarations

Ethics approval and consent to participate

Ethics Approval was obtained from The University of Vermont Committees on Human Research (Ethics Number MOD00005335) and the Kano State Ministry of Health Research Ethics Committee (Ethics Number MOH/Off/797/T.I/2056).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests. THINKMD was the principal investigator of the main study, authoring the study protocol and obtaining IRB approval. As part of an extension of the study that was submitted to and approved by ethics, in partnership with Audere, THINKMD included an augmented version of the technology, which incorporated Audere’s smartphone image capture and machine learning (ML) based rapid diagnostic test (RDT) analysis algorithms for malaria. THINKMD received funding from Audere to incorporate this technology into its existing platform. This funding was part of a grant received by Audere from the Gates Foundation.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally to this work.

References

  • 1.Monroe A, Olapeju B, Moore S, Hunter G, Merritt AP, Okumu F, Babalola S. Improving malaria control by understanding human behaviour. Bull World Health Organ. 2021;99(11):837. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.World Health Organization. World malaria report 2020. Geneva: WHO; 2020.
  • 3.Okunlola OA, Oyeyemi OT. Spatio-temporal analysis of association between incidence of malaria and environmental predictors of malaria transmission in Nigeria. Sci Rep. 2019;9(1):17500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Talapko J, Škrlec I, Alebić T, Jukić M, Včev A. Malaria: the past and the present. Microorganisms. 2019;7(6):179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.World Health Organization. World malaria report 2022. Geneva: WHO; 2022.
  • 6.Global Health Data Exchange. 2019. IHME, filtered on: Location = Global; Year = 2019; Ages = All ages AND 0 to 9; Metric = Number; Measure = DALYs; Sex = Both; Conditions = A.1.1 HIV/AIDS, A.2.1 Tuberculosis, A.4.1 Malaria. http://ghdx.healthdata.org/gbd-results-tool
  • 7.Dhiman S. Are malaria elimination efforts on right track? An analysis of gains achieved and challenges ahead. Infect Dis Poverty. 2019;8:1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Mokuolu OA, Ajumobi OO, Ntadom GN, Adedoyin OT, Roberts AA, Agomo CO, Edozieh KU, Okafor HU, Wammanda RD, Odey FA, Maikore IK. Provider and patient perceptions of malaria rapid diagnostic test use in Nigeria: a cross-sectional evaluation. Malar J. 2018;17:1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Scientific American. A new strain of drug-resistant malaria has sprung up in Africa-here’s how we fight back 2021. https://www.scientificamerican.com/article/a-new-strain-of-drug-resistant-malaria-has-sprung-up-in-africa/. Accessed on 26 October 2022.
  • 10.Batwala V, Magnussen P, Hansen KS, Nuwaha F. Cost-effectiveness of malaria microscopy and rapid diagnostic tests versus presumptive diagnosis: implications for malaria control in Uganda. Malar J. 2011;10:1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Park C, Ngo H, Lavitt LR, Karuri V, Bhatt S, Lubell-Doughtie P, Shankar AH, Ndwiga L, Osoti V, Wambua JK, Bejon P. The design and evaluation of a mobile system for rapid diagnostic test interpretation. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2021;5(1):1–26.
  • 12.World Health Organization. The role of RDTs in malaria diagnosis and treatment. World Health Organization; 2015. https://www.who.int/malaria/publications/atoz/rdt-malaria-diagnosis/en/
  • 13.Boyce MR, O’Meara WP. Use of malaria RDTs in various health contexts across sub-Saharan Africa: a systematic review. BMC Public Health. 2017;17:1–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Harvey SA, Jennings L, Chinyama M, Masaninga F, Mulholland K, Bell DR. Improving community health worker use of malaria rapid diagnostic tests in Zambia: package instructions, job aid and job aid-plus-training. Malar J. 2008;7(1):1–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Maltha J, Gillet P, Jacobs J. Malaria rapid diagnostic tests in travel medicine. Clin Microbiol Infect. 2013;19(5):408–15. 10.1111/1469-0691.12157. [DOI] [PubMed] [Google Scholar]
  • 16.Skjefte M, Cooper S, Poyer S, Lourenço C, Smedinghoff S, Keller B, Wambua T, Oduor C, Frade S, Waweru W. Use of a health worker-targeted smartphone app to support quality malaria RDT implementation in Busia county, Kenya: a feasibility and acceptability study. PLoS ONE. 2024. 10.1371/journal.pone.0295049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Fischer AE, Van Tonder T, Gumede SB, Lalla-Edward ST. Changes in perceptions and use of mobile technology and health communication in South Africa during the COVID-19 lockdown: cross-sectional survey study. JMIR Formative Res. 2021;5(5):e25273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Finette BA, McLaughlin M, Scarpino SV, Canning J, Grunauer M, Teran E, Bahamonde M, Quizhpe E, Shah R, Swedberg E, Rahman KA, Khondker H, Chakma I, Muhoza D, Seck A, Kabore A, Nibitanga S, Heath B. Development and initial validation of a frontline health worker mHealth assessment platform (MEDSINC®) for children 2–60 months of age. Am J Trop Med Hyg. 2019;100(6):1556–65. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6553915/. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Kotnik JH, Cooper S, Smedinghoff S, Gade P, Scherer K, Maier M, Juusola J, Ramirez E, Naraghi-Arani P, Lyon V, Lutz B. Flu@ home: the comparative accuracy of an at-home influenza rapid diagnostic test using a prepositioned test kit, mobile app, mail-in reference sample, and symptom-based testing trigger. J Clin Microbiol. 2022;60(3):e02070–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Boyce MR, Menya D, Turner EL, Laktabai J, Prudhomme-O’Meara W. Evaluation of malaria rapid diagnostic test (RDT) use by community health workers: a longitudinal study in Western Kenya. Malar J. 2018;17:1–1. [DOI] [PMC free article] [PubMed] [Google Scholar]
