Abstract
Introduction
Machine learning (ML) has been used in bio-medical research, and recently in clinical and public health research. However, much of the available evidence comes from high-income countries, where different health profiles challenge the application of this research to low/middle-income countries (LMICs). It is largely unknown what ML applications are available for LMICs that can support and advance clinical medicine and public health. We aim to address this gap by conducting a scoping review of health-related ML applications in LMICs.
Methods and analysis
This scoping review will follow the methodology proposed by Levac et al. The search strategy is informed by recent systematic reviews of ML health-related applications. We will search Embase, Medline and Global Health (through Ovid), Cochrane and Google Scholar; we will present the date of our searches in the final review. Titles and abstracts will be screened by two reviewers independently; selected reports will be studied by two reviewers independently. Reports will be included if they are primary research where data have been analysed, ML techniques have been used on data from LMICs and they aimed to improve health-related outcomes. We will synthesise the information following evidence mapping recommendations.
Ethics and dissemination
The review will provide a comprehensive list of health-related ML applications in LMICs. The results will be disseminated through scientific publications. We also plan to launch a website where ML models can be hosted so that researchers, policymakers and the general public can readily access them.
Keywords: epidemiology, biotechnology & bioinformatics, health informatics, World Wide Web technology
Strengths and limitations of this study.
The search strategy is informed by a solid framework.
The search will be conducted in several information sources.
The screening/review process will be performed by two reviewers independently.
We will not search abstracts or conference proceedings specialised in machine learning.
We will focus on information sources where health-related outcomes are most likely to be found.
Introduction
Machine learning (ML) refers to the process through which computers, models or algorithms learn and improve from data and processes, rather than from specific programmed instructions.1 2 ML can be used in tasks such as classification (eg, whether a tumour is benign or malignant based on patterns or characteristics), clustering (eg, grouping patients with similar profiles for targeted prevention or treatment interventions) or prediction (eg, forecasting the risk or probability of a disease outcome following interventions).2 Of note, these tasks can overlap.
ML is widely used in bio-medical sciences, and more recently in clinical and public health research.3 4 Systematic reviews on health-related applications of ML have explored questions such as the accuracy of ML for diagnosis or outcome prediction,5–7 but most of the research studies included in these reviews have come from high-income countries and the findings may not apply to low/middle-income countries (LMICs) because of the variability in access to healthcare and difference in the disease burden. It is largely unknown what ML applications are available for LMICs that can support and advance clinical medicine and public health.
We aim to address this gap in the evidence by conducting a scoping review of health-related applications of ML in LMICs, to synthesise published evidence and to draw lessons that can inform further research and policy development. The review will provide the first comprehensive list of health-related ML applications in LMICs.
Methods
Overview
This is a scoping review of the published scientific literature. This protocol adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) guidelines (online supplementary material S1),8 and the methodology follows the procedures suggested by Levac et al.9 The final publication of this work will adhere to the PRISMA extension for Scoping Reviews (PRISMA-ScR) recommendations.10
bmjopen-2019-035983supp001.pdf (94.5KB, pdf)
PRISMA-P is a standard method to report review protocols; similarly, PRISMA-ScR is a well-known instrument to guide the reporting of scoping reviews. The methodology proposed by Levac and colleagues is useful for scoping reviews that aim to generate a broad picture of the available evidence on a subject.
Understanding that a key feature of scoping reviews is a broad research question, we aim to answer: what have been the health-related applications of ML techniques in LMICs? This review will summarise scientific evidence on research that has used data from LMICs for ML applications in clinical medicine (eg, risk prediction for clinical decisions) and public health (eg, vector control in an endemic area) to provide solutions to health problems in LMICs.
Definitions
For this work, we will use the following definitions:
LMICs: classification by the World Bank country income grouping (see search strategy in table 1 for a list of countries).11
ML: analytical techniques through which computers learn directly from data, examples and experiences, rather than from a pre-programmed rule.1 2
ML techniques: analytical methods within the ML remit, that is, where machines have learnt from data or processes through ML techniques or algorithms. These include, but are not limited to2 12: support vector machine, support vector regression, decision trees, random forest, neural network, Bayesian network, artificial neural network, computer vision systems, computer-assisted image processing and natural language processing.
Table 1.
Overall search terms
1 | artificial intelligence.mp. |
2 | exp Artificial Intelligence/ |
3 | machine learning.mp. |
4 | exp machine learning/ |
5 | deep learning.mp. |
6 | unsupervised machine learning.mp. |
7 | supervised machine learning.mp. |
8 | computational Intelligence.mp. |
9 | predictive analytic*.mp. |
10 | support vector machine.mp. |
11 | support vector regression.mp. |
12 | decision tree*.mp. |
13 | random forest.mp. |
14 | neural network*.mp. |
15 | exp Neural Networks/ |
16 | bayesian network*.mp. |
17 | artificial neural network*.mp. |
18 | convolutional neural network*.mp. |
19 | computer vision systems.mp. |
20 | exp Image Processing, Computer-Assisted/ |
21 | natural language processing.mp. |
22 | 1 or 2 or 3 …or 21 |
23 | ((“Afghanistan”) or (“Benin”) or (“Burkina Faso”) or (“Burundi”) or (“Central African Republic”) or (“Chad”) or (“Comoros”) or (“Democratic Republic of the Congo”) or (“Eritrea”) or (“Ethiopia”) or (“Gambia”) or (“Guinea”) or (“Guinea-Bissau”) or (“Haiti”) or (“Democratic People's Republic of Korea”) or (“Liberia”) or (“Madagascar”) or (“Malawi”) or (“Mali”) or (“Mozambique”) or (“Nepal”) or (“Niger”) or (“Rwanda”) or (“Senegal”) or (“Sierra Leone”) or (“Somalia”) or (“South Sudan”) or (“Tanzania”) or (“Togo”) or (“Uganda”) or (“Zimbabwe”) or (“Armenia”) or (“Bangladesh”) or (“Bhutan”) or (“Bolivia”) or (“Cape Verde”) or (“Cambodia”) or (“Cameroon”) or (“Congo”) or (“Cote d'Ivoire”) or (“Djibouti”) or (“Egypt”) or (“El Salvador”) or (“Ghana”) or (“Guatemala”) or (“Honduras”) or (“India”) or (“Indonesia”) or (“Kenya”) or (“Micronesia”) or (“Kosovo”) or (“Kyrgyzstan”) or (“Laos”) or (“Lesotho”) or (“Mauritania”) or (“Moldova”) or (“Mongolia”) or (“Morocco”) or (“Myanmar”) or (“Nicaragua”) or (“Nigeria”) or (“Pakistan”) or (“Papua New Guinea”) or (“Philippines”) or (“Samoa”) or (“Atlantic Islands”) or (“Melanesia”) or (“Sri Lanka”) or (“Sudan”) or (“Swaziland”) or (“Syria”) or (“Tajikistan”) or (“Timor-Leste”) or (“Tonga”) or (“Tunisia”) or (“Ukraine”) or (“Uzbekistan”) or (“Vanuatu”) or (“Vietnam”) or (“Middle East”) or (“Yemen”) or (“Zambia”) or (“Albania”) or (“Algeria”) or (“American Samoa”) or (“Angola”) or (“Argentina”) or (“Azerbaijan”) or (“Republic of Belarus”) or (“Belize”) or (“Bosnia and Herzegovina”) or (“Botswana”) or (“Brazil”) or (“Bulgaria”) or (“China”) or (“Colombia”) or (“Costa Rica”) or (“Cuba”) or (“Dominica”) or (“Dominican Republic”) or (“Equatorial Guinea”) or (“Ecuador”) or (“Fiji”) or (“Gabon”) or (“Georgia”) or (“Grenada”) or (“Guyana”) or (“Iran”) or (“Iraq”) or (“Jamaica”) or (“Jordan”) or (“Kazakhstan”) or (“Lebanon”) or (“Libya”) or (“Macedonia (Republic)”) or (“Malaysia”) or (“Indian Ocean Islands”) or (“Mauritius”) or (“Mexico”) 
or (“Montenegro”) or (“Namibia”) or (“Palau”) or (“Panama”) or (“Paraguay”) or (“Peru”) or (“Romania”) or (“Russia”) or (“Serbia”) or (“South Africa”) or (“Saint Lucia”) or (“Saint Vincent and the Grenadines”) or (“Suriname”) or (“Thailand”) or (“Turkey”) or (“Turkmenistan”) or (“Venezuela”) or (developing countr*) or (low-income countr*) or (middle-income countr*) or (low-middle income countr*) or (upper-middle income countr*)) |
24 | 22 and 23 |
25 | exp animals/ not humans.sh. |
26 | 24 not 25 |
27 | Remove duplicates from 26 |
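The way the numbered lines of table 1 combine can be illustrated with a short sketch. It treats each search line as a set of record identifiers and applies the same Boolean steps as lines 22, 24 and 26. The record identifiers and the example sets are invented for illustration; they are not search results, and the actual search runs inside Ovid rather than in Python.

```python
# Illustrative only: each set holds hypothetical record IDs, not real search hits.
ml_term_hits = [
    {"r1", "r2", "r3"},   # e.g. records matching "machine learning.mp."
    {"r3", "r4"},         # e.g. records matching "random forest.mp."
    {"r5"},               # e.g. records matching "neural network*.mp."
]
lmic_hits = {"r2", "r3", "r4", "r6"}    # line 23: country or income-group terms
animal_only_hits = {"r4"}               # line 25: animal studies not indexed as human


def combine_search(ml_hits, lmic, animal_only):
    """Mirror lines 22, 24 and 26 of the search strategy with set operations."""
    any_ml_term = set().union(*ml_hits)    # line 22: 1 or 2 or ... or 21
    ml_in_lmic = any_ml_term & lmic        # line 24: 22 and 23
    return ml_in_lmic - animal_only        # line 26: 24 not 25


final_records = combine_search(ml_term_hits, lmic_hits, animal_only_hits)
print(sorted(final_records))
```

Because sets hold each identifier once, a record matched by several ML terms (such as `r3` here) is counted a single time, which loosely corresponds to the duplicate removal in line 27.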
Eligibility criteria
We will include scientific evidence that meets the following inclusion criteria (figure 1).
Figure 1.
Algorithm to filter search results.
Primary research studies
We will include experimental and observational studies that have used and reported quantitative data. We will also screen relevant systematic reviews or narrative reviews for eligible primary studies and include them in our review if relevant. We will not include qualitative studies, opinions, conference abstracts, letters, editorials or any other scientific work in which data have not been actively analysed within an ML framework.
Machine learning research should have used data from LMICs, that is, machine learning research that used solely data from LMICs
We will focus on ML research based on data from LMICs and applied to LMICs. We will not include ML research that used data from high-income countries, even if it was applied to an LMIC, nor studies that used LMIC data but were applied to high-income countries. This scoping review focuses on the applications of ML techniques in LMICs; these applications should have been developed using LMIC data, because prediction models and other algorithms work better (that is, have better accuracy) in the populations, and on the data, for which they were developed. Conversely, when these models are applied to other populations, settings or data, they need some modification (eg, recalibration). We will also exclude studies that used LMIC data in a consortium or data pooling group if the model or results cannot be separated for the LMIC alone; in other words, we will exclude a report that pooled LMIC data with data from high-income countries if the application cannot be separated for the LMIC alone.
Models developed in sites other than LMICs may not work correctly in these countries. For example, digital imaging projects from high-income countries may reflect a different scenario; that is, images of streets in LMICs may depict objects or features not found in high-income countries. Another example is sound/noise classification: models from high-income countries may not identify the variety of noises commonly heard in LMICs (eg, loud cars or indiscriminate use of car horns). Finally, LMICs still have sizeable rural areas with large populations, and extrapolating models built for highly urbanised cities may not be adequate for rural sites.
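Recalibration, mentioned above as one modification a model may need in a new population, can be illustrated with a small worked example. The sketch below shows one simple form, an intercept update (sometimes called recalibration-in-the-large): every predicted risk is shifted by the same amount on the log-odds scale until the mean predicted risk equals the event rate observed in the new setting. The risks and the event rate are invented numbers, and this is only one of several possible approaches, not a method specified in this protocol.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(pred_risks, observed_rate, tol=1e-9):
    """Find the log-odds shift that makes the mean predicted risk match observed_rate."""
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(expit(logit(p) + mid) for p in pred_risks) / len(pred_risks)
        if mean_risk < observed_rate:
            lo = mid      # predictions still too low: shift upwards
        else:
            hi = mid      # predictions too high: shift downwards
    return (lo + hi) / 2.0


# Invented example: risks from a model developed elsewhere, applied to a setting
# where the outcome is more common than the model expects.
original_risks = [0.05, 0.10, 0.20, 0.40]
observed_event_rate = 0.30

shift = intercept_shift(original_risks, observed_event_rate)
recalibrated = [expit(logit(p) + shift) for p in original_risks]
print(round(sum(recalibrated) / len(recalibrated), 3))
```

The shift changes the overall level of the predictions but keeps the ranking of individuals unchanged, so it corrects miscalibration without altering discrimination.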
Examples of studies of interest include: (1) development of a ‘deep learning-based visual evaluation algorithm’ for the early identification of cervical cancer signs based on data from women in Costa Rica,13 (2) classification of free text (random forest) in emergency department records from nine hospitals in Nicaragua14 and (3) automatic classification (neural networks) of paediatric pneumonia based on ultrasound records from children in Peru.15
Distinguishing between ML applications and more conventional statistical methods can be challenging because in some cases, for instance regression analysis, the definitions are unclear. Nonetheless, the context, aims or overall methodological approach of a paper usually make it possible to judge whether a study used ML techniques or more conventional statistical methods. If needed, we will contact authors for further information.
The outcome of the study/analysis was to improve a health-related outcome
The selected studies should have sought to improve, as their primary outcome, one of the following health-related endpoints along the care cascade: diagnosis, treatment, control, survival, complications and mortality. These outcomes have been selected because of their relevance in healthcare, clinical medicine and public health. As secondary outcomes, we will also include studies that have reported endpoints related to cost, efficiency and productivity in the healthcare process.
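The eligibility criteria above can be read as a sequence of checks applied to each report. The sketch below is a minimal illustration of that logic; the `Report` class, its field names and the order of the checks are assumptions made for this example and are not taken from figure 1.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical screening record; field names are assumptions for this sketch."""
    title: str
    is_primary_quantitative: bool   # primary research in which data were analysed
    used_ml: bool                   # an ML technique was applied
    lmic_data_only: bool            # LMIC data only, or LMIC results separable
    health_outcome: bool            # aimed to improve a health-related outcome


def screen(report):
    """Return (include, reason), checking each criterion in turn."""
    if not report.is_primary_quantitative:
        return False, "not primary quantitative research"
    if not report.used_ml:
        return False, "no ML technique"
    if not report.lmic_data_only:
        return False, "data not solely from LMICs"
    if not report.health_outcome:
        return False, "no health-related outcome"
    return True, "eligible"


# Two made-up reports: one meets every criterion, one pooled data across settings.
eligible = Report("Example report A", True, True, True, True)
pooled = Report("Example report B", True, True, False, True)
print(screen(eligible))
print(screen(pooled))
```

Returning the reason alongside the decision makes it straightforward to tabulate why reports were excluded at each stage.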
Information sources
We will conduct the search in three databases through Ovid: Embase, Medline and Global Health; in addition, we will search Cochrane and Google Scholar (first 10 pages). Apart from Google Scholar, no other grey literature source will be searched. These sources will be used without language or time/year restrictions. We have a diverse team covering many languages, and we will draw on networks at our institutions if we come across a study in a language that none of the authors speaks.
Search strategy
We developed our search strategy (table 1) based on recent systematic reviews of ML applications to health-related endpoints: outcome prediction in gastrointestinal bleeding,7 assessment of physicians' knowledge,16 applications on genomic data to predict outcomes in cancer patients17 and available ML algorithms to improve genomic data analysis.18 We will screen the references of the included reports for any other relevant studies.
Study records
Results will be downloaded into EndNote, where duplicates will be removed, and a second round of deduplication will be conducted using the online tool Rayyan.19 Titles and abstracts will be reviewed by two independent researchers following the selection criteria detailed above (figure 1); discrepancies will be resolved by consensus or by a third party. Selected titles will be sought in full text, which will be assessed by two independent reviewers following the same selection criteria (figure 1). Again, conflicts will be resolved by consensus or by a third party.
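The dual screening step described above amounts to comparing two sets of independent decisions and flagging the records on which the reviewers disagree. The sketch below illustrates that comparison with made-up record identifiers and decisions; it does not represent how EndNote or Rayyan work internally.

```python
def compare_decisions(reviewer_a, reviewer_b):
    """Split records into agreed decisions and conflicts needing resolution."""
    agreed, conflicts = {}, []
    for record_id in sorted(reviewer_a.keys() & reviewer_b.keys()):
        if reviewer_a[record_id] == reviewer_b[record_id]:
            agreed[record_id] = reviewer_a[record_id]
        else:
            conflicts.append(record_id)
    return agreed, conflicts


# Hypothetical decisions: True = include, False = exclude.
reviewer_a = {"rec1": True, "rec2": False, "rec3": True}
reviewer_b = {"rec1": True, "rec2": True, "rec3": True}

agreed, conflicts = compare_decisions(reviewer_a, reviewer_b)
print(agreed)
print(conflicts)
```

Records listed in `conflicts` are the ones that would go to consensus discussion or to a third party.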
Data collection
The reviewers will decide on a list of items that will be extracted to successfully answer the research question. These items will be set up in an Excel spreadsheet before data extraction and will not be modified afterwards. Information of interest includes: country of origin; analytical approach; type of data used; data source; model performance; outcome of interest; number of observations; and whether the model is available for independent use or reproduction. Data from the selected reports will be extracted by two researchers independently; any discrepancies will be resolved by consensus between them or by a third party.
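The extraction items listed above can be thought of as a fixed record structure with one row per included report. The sketch below shows one possible representation; the class name, field names, field types and placeholder values are all assumptions made for illustration and do not describe the actual extraction spreadsheet.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class ExtractionRecord:
    """One row of the extraction sheet; field names are assumptions for this sketch."""
    country: str
    analytical_approach: str
    data_type: str
    data_source: str
    model_performance: str
    outcome: str
    n_observations: int
    model_available: bool


# Placeholder values only; this is not data extracted from any real report.
example = ExtractionRecord(
    country="Country A",
    analytical_approach="random forest",
    data_type="free text",
    data_source="hospital records",
    model_performance="not reported",
    outcome="diagnosis",
    n_observations=1000,
    model_available=False,
)

# Write a header and one row, as they might appear in a spreadsheet export.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ExtractionRecord)])
writer.writeheader()
writer.writerow(asdict(example))
print(buffer.getvalue())
```

Defining the fields once, before extraction starts, mirrors the decision to fix the list of items in advance.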
Risk of bias in individual studies
Because this is a scoping review aiming to summarise available evidence to identify research gaps and potential uses of ML to improve health-related outcomes in LMICs, no risk of bias assessment of individual studies is planned.
Data synthesis
We anticipate large heterogeneity among the selected reports in terms of methodology, outcomes, target population and data sources. Therefore, no meta-analysis is planned and only a qualitative summary will be conducted. Following current recommendations for scoping reviews and evidence mapping,20 we will present the results through tables and figures, for example, a map pointing out where studies have been conducted and summarising key characteristics. As needed, we will consider other figures such as an evidence matrix.20
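An evidence matrix of the kind mentioned above is essentially a cross-tabulation of included studies, for instance by country and outcome. The sketch below shows how such counts could be produced; the country labels, outcomes and counts are placeholders, not results of this review.

```python
from collections import Counter

# Placeholder (country, outcome) pairs; not real review results.
included_studies = [
    ("Country A", "diagnosis"),
    ("Country A", "diagnosis"),
    ("Country A", "mortality"),
    ("Country B", "treatment"),
]

cell_counts = Counter(included_studies)
countries = sorted({country for country, _ in included_studies})
outcomes = sorted({outcome for _, outcome in included_studies})

# Print a simple country-by-outcome matrix of study counts.
print("country".ljust(12) + "".join(o.ljust(12) for o in outcomes))
for country in countries:
    row = "".join(str(cell_counts[(country, o)]).ljust(12) for o in outcomes)
    print(country.ljust(12) + row)
```

Cells with a count of zero show where no studies were found, which is how an evidence map points to research gaps.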
Ethics and dissemination
This scoping review will not require ethical approval because it will not study human subjects; in addition, it will include sources that are or can be made available to the public.
We plan to report our findings in a scientific publication. In addition, and depending on available resources, we aim to produce a website (or a section within an existing website) in which the findings and summarised reports can be easily accessed. Furthermore, we aim to host the ML models so that researchers, policymakers and the general public can readily access them. Where the ML models are open access or can be accessed through the original reports, these will be hosted on the website or a link to the original source will be provided; conversely, where ML models are not open access, we will contact the study authors and ask for the model to be hosted on our website or for a link to their ML model. This dissemination plan aims to increase the visibility of ML research in LMICs and the use of available models, thereby encouraging further research to improve health outcomes. We will engage with the communication offices at our universities to promote this website through relevant channels, including but not limited to social media, newsletters and institutional websites.
Patient and public involvement
No patients will be directly involved in the design, planning and conception of this study.
Footnotes
Contributors: RMC-L conceived the idea and drafted the manuscript. JP-S, TP and RA provided advice to improve the research question and LTC to improve the protocol. JJM and RA edited and provided insights to improve the protocol. All authors approved the submitted version.
Funding: RMC-L has been supported by a Strategic Award, Wellcome Trust-Imperial College Centre for Global Health Research (100693/Z/12/Z) and Imperial College London Wellcome Trust Institutional Strategic Support Fund (Global Health Clinical Research Training Fellowship) (294834/Z/16/Z ISSF ICL). RMC-L is supported by a Wellcome Trust International Training Fellowship (214185/Z/18/Z). The funders had no role in this work and decision to submit for publication.
Competing interests: None declared.
Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Patient consent for publication: Not required.
Provenance and peer review: Not commissioned; externally peer reviewed.
References
1. Davies SC. Annual report of the chief medical officer, 2018 health 2040 – better health within reach. Department of Health and Social Care, 2018.
2. Rebala G, Ravi A, Churiwala S. An introduction to machine learning. Springer International Publishing, 2019.
3. Panch T, Pearson-Stuttard J, Greaves F, et al. Artificial intelligence: opportunities and risks for public health. Lancet Digit Health 2019;1:e13–14. doi:10.1016/S2589-7500(19)30002-0
4. Wahl B, Cossy-Gantner A, Germann S, et al. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health 2018;3:e000798. doi:10.1136/bmjgh-2018-000798
5. de Filippis R, Carbone EA, Gaetano R, et al. Machine learning techniques in a structural and functional MRI diagnostic approach in schizophrenia: a systematic review. Neuropsychiatr Dis Treat 2019;15:1605–27. doi:10.2147/NDT.S202418
6. Bradley A, van der Meer R, McKay C. Personalized pancreatic cancer management: a systematic review of how machine learning is supporting decision-making. Pancreas 2019;48:598–604. doi:10.1097/MPA.0000000000001312
7. Shung D, Simonov M, Gentry M, et al. Machine learning to predict outcomes in patients with acute gastrointestinal bleeding: a systematic review. Dig Dis Sci 2019;64:2078–87. doi:10.1007/s10620-019-05645-z
8. Shamseer L, Moher D, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ 2015;349:g7647. doi:10.1136/bmj.g7647
9. Levac D, Colquhoun H, O'Brien KK. Scoping studies: advancing the methodology. Implement Sci 2010;5:69. doi:10.1186/1748-5908-5-69
10. Tricco AC, Lillie E, Zarin W, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018;169:467–73. doi:10.7326/M18-0850
11. The World Bank. World Bank country and lending groups. Available: https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups
12. Burkov A. The Hundred-page machine learning book, 2019.
13. Hu L, Bell D, Antani S, et al. An observational study of deep learning and automated evaluation of cervical images for cancer screening. J Natl Cancer Inst 2019;111:923–32. doi:10.1093/jnci/djy225
14. Lorenzoni G, Bressan S, Lanera C, et al. Analysis of unstructured Text-Based data using machine learning techniques: the case of pediatric emergency department records in Nicaragua. Med Care Res Rev 2019;20:107755871984412. doi:10.1177/1077558719844123
15. Correa M, Zimic M, Barrientos F, et al. Automatic classification of pediatric pneumonia based on lung ultrasound pattern recognition. PLoS One 2018;13:e0206410. doi:10.1371/journal.pone.0206410
16. Dias RD, Gupta A, Yule SJ. Using machine learning to assess physician competence: a systematic review. Acad Med 2019;94:427–39. doi:10.1097/ACM.0000000000002414
17. Patil S, Habib Awan K, Arakeri G, et al. Machine learning and its potential applications to the genomic study of head and neck cancer-a systematic review. J Oral Pathol Med 2019;48:773–9. doi:10.1111/jop.12854
18. Wu J, Zhao Y. Machine learning technology in the application of genome analysis: a systematic review. Gene 2019;705:149–56. doi:10.1016/j.gene.2019.04.062
19. Ouzzani M, Hammady H, Fedorowicz Z, et al. Rayyan-a web and mobile APP for systematic reviews. Syst Rev 2016;5:210. doi:10.1186/s13643-016-0384-4
20. Evidence and gap maps: a comparison of different approaches. Oslo, Norway: The Campbell Collaboration. Available: www.campbellcollaboration.org/