PLOS One. 2026 Feb 25;21(2):e0342750. doi: 10.1371/journal.pone.0342750

Assessing the national essential medicines list selection processes: Instrument development and testing

Ekanki Saxena 1,*, Jillian C Kohler 2, Kevin E Thorpe 3, Nav Persaud 4
Editor: Muhammad Shahzad Aslam
PMCID: PMC12935209  PMID: 41739779

Abstract

Background

National Medicines Policies use National Essential Medicines Lists (NEMLs) to improve equitable access to medications. An effective list selection process should address a country’s priority healthcare needs and be aligned with WHO guidelines.

Objective

The purpose of this study was to develop and test an instrument to measure the effectiveness of NEML selection process design.

Methods

An instrument consisting of 16 items, along with an associated rating scheme, was created through a literature review and consultation with subject matter experts. Reliability was assessed using intraclass correlation coefficients (ICCs) for absolute agreement and consistency. Validity was evaluated by comparing instrument scores with two proxy measures of NEML selection process effectiveness.

Results

The total score for NEML selection process design effectiveness, based on ratings from 5 raters across 4 countries, demonstrated an absolute agreement ICC of 0.98 (95% confidence interval 0.91 to 0.99) and a consistency ICC of 0.97 (95% confidence interval 0.88 to 0.99). Instrument scores varied correspondingly with two proxy measures of NEML selection process effectiveness, indicating good validity.

Conclusion

The instrument developed in this study measures the construct of NEML selection process design effectiveness, which essentially evaluates the alignment of national policy content with policy intent. The instrument demonstrated good validity and excellent reliability.

Introduction

Improving access to medicines is critical to attaining Universal Health Coverage (UHC), which is one of the World Health Organization’s (WHO) Sustainable Development Goals (SDG 3.8). National Essential Medicines Lists (NEMLs) are a key policy tool under the essential medicines concept. NEMLs are a shortlist of medicines selected to address the priority healthcare needs of a population and form the basis of medicine selection, supply, procurement, production, and donation [1]. The essential medicines (EM) concept is meant to promote health equity and improve global health by making safe, effective medicines accessible to all at affordable prices [2,3]. The WHO created its first model essential medicines list based on global health priorities in the 1970s, which has since been adopted and adapted by more than 150 countries [4–6]. Improving access to medicines is contingent on the NEML addressing the priority healthcare needs of a country, which, in turn, is a direct result of the NEML selection process design that is in place. The NEML selection process design is, therefore, an important component of the policy [2,7].

The WHO serves as both an agency that generates norms and standards and an advocate for the EM concept globally [8]. The WHO publishes a model EML every two years, compiled based on global priority health needs, and provides a starting point for nations to begin their own EML selection process. Additionally, the WHO provides guidance documents on how to design an NEML Selection Process tailored to national needs, priorities, and contexts [1,6,7]. These guidelines serve as a global standard that embodies the intent of the EM concept and are central to preserving its values, principles, and processes.

The concept of effective policy design is critical in linking policy design inputs and outputs [9]. NEMLs are selected at the national level by a government body, while the WHO provides procedural guidance on NEML selection and publishes an up-to-date model EML reflective of global health priorities. Effective policy design at the national level therefore requires aligning WHO policy standards and guidance with the national EML selection process design to ensure that the intent (values, principles, and processes) of the EM policy is preserved [6,7,10,11]. A properly designed selection process can ensure that NEMLs contain medicines that meet the priority healthcare needs of citizens and fulfill national policy objectives.

There appears to be heterogeneity in NEML selection processes across WHO member countries, which often results in NEMLs not reflecting the intent of the EM policy [12]. Evidence from an observational study of 137 countries with an NEML found few associations between variations in the number and types of medicines on the NEMLs and country characteristics representative of healthcare needs [13]. This may indicate that NEML selection process design is based on factors other than priority healthcare needs [13].

The current literature focuses on understanding which features of the NEML selection process may contribute to the selection of an ill-suited NEML [13–20]. Some extant studies have compared NEMLs of WHO member countries and analyzed the differences against possible explanatory factors influencing the selection of medicines at the national level [13], examined how research and the availability of contextualized pharmacoeconomic data impact the selection process [15,16], and explored financial barriers that constrain the selection process [14,15]. These studies highlight the impact of national context on policy implementation and recommend changes to existing selection processes to align them with the WHO policy development guidance. To our knowledge, no studies have specifically investigated the alignment of the NEML selection process design with global standards.

In this work, we seek to develop a new instrument that captures the concept of NEML selection process design effectiveness and allows for the quantitative evaluation of the alignment of the NEML selection process design with global standards.

Methods

In this study we developed an instrument to measure NEML selection process design effectiveness using methods adapted from the literature and assessed the instrument’s inter-rater reliability and validity using data from a pilot study sample [21,22].

Instrument development

NEML selection process design effectiveness was operationalized as the extent to which existing NEML selection process content aligns with WHO NEML selection process development guidance documents, and the WHO EML selection process [3,7,22–28].

A list of items measuring the NEML selection process was generated through a literature review, comparing the process against the values of an ideal selection process and against a framework of ethical priority setting. Documents relevant to the application of the Accountability for Reasonableness (AFR) framework and WHO NEML selection process development guidance were identified. We (ES) conducted a literature review up to August 2024. The following databases were searched: SCOPUS, Medline, Google, and Google Scholar. The terms searched included: NEML, EML, EML selection criteria, NEML selection process, EML selection process, EML assessment, and EML selection process rating. All documents published after the introduction of the EM concept (1975) were considered. All types of English-language literature were included (peer-reviewed articles, WHO repository documents, print articles, electronic articles, websites, books, and policy papers). Titles and abstracts were screened for relevance. Purposive sampling was then applied to select literature and documents relevant to the development of NEMLs and the application of AFR for policy assessment.

Data were analyzed using modified thematic analysis. First, the WHO model EML selection process [3] and the WHO NEML selection process development guidance documents [3] were analyzed to identify ideal process values for an NEML selection process that is fully aligned with EM policy intent. These process values served as categories under which instrument items were generated, similar to methods described previously [29]. The process values were then deductively categorized into four groups: relevance, transparency, revisions & appeals, and enforcement.

The response scale for each instrument item was established through a review of key policy documents from WHO member countries and relevant literature, including current and previous NEMLs and National Pharmaceutical Policies (NPPs), associated legislation, relevant reports, and scholarly articles. All publicly available documents relevant to NPP development and NEML selection process design were included. Where available, the national health authority was contacted to provide official policy documents. WHO member countries with an NEML were sampled alphabetically until response saturation was reached. A modified Likert scale was used to generate a response scheme, with each item having a unique scale to capture the range of responses encountered in the policy document review. For each item, response options were arranged from least to most desirable and assigned a number rating to create a modified Likert rating scale. The highest rating for each item was based on the WHO model EML selection process design, as it is perfectly aligned with EM policy standards and should capture all features of an ideal selection process. The NEML selection process design effectiveness raw score was calculated by summing all item scores.

Instrument testing

Instrument testing used quantitative methods to establish the reliability and validity of the newly developed instrument that measures NEML selection process design effectiveness. A pilot study assessing the NEML selection process design effectiveness of four WHO member countries was used to establish the instrument’s reliability and validity. Four countries representative of different regions and with different types of documentation available were assessed in the pilot (Afghanistan, Papua New Guinea, South Africa, and Barbados) by stratifying countries examined during response scheme generation into income-level classifications and selecting one representative country from each category (see S1 File Table 1). As this instrument examines a selection process, it was deemed important to select countries representative of all resource settings. This also ensures that the instrument and response scheme can capture potential responses across economic groupings. A country’s income level is known to be a key determinant of medication access, as it impacts the affordability and availability of medicines [30]. NEML selection process design for each sample country was constructed through a document review of NEMLs, NEML selection process documents, and NPP documents publicly available via government websites, organization websites, and scholarly literature [31–33].

Table 1. NEML selection process design effectiveness instrument.

Item # NEML Selection Process Design Effectiveness Evaluation
1 Explicit instructions for the selection of an expert committee exist.
2 The names, affiliations, and conflict of interest statements of expert committee members are publicly available.
3 The expert committee responsible for National Essential Medicines List (NEML) selection operates with full scientific independence.
4 Detailed guidelines/principles for the expert committee to establish an essential medicines list exist.
5 Explicit and detailed selection criteria for essential medicines list selection exist.
6 There is explicit direction to base EM selection decisions on scientific evidence of efficacy and safety, as per the selection criteria.
7 The prevalence of health conditions and resistance patterns are considered in EML selection, as per the selection criteria.
8 The selection criteria of EMs explicitly considers financial implications when examining medicines with equal safety and efficacy.
9 The selection criteria of essential medicines assesses the feasibility of uptake (health care setting, personnel etc.).
10 There is clear evidence of a National Medicines Policy (NMP) explicitly emphasizing a focus on communication of NEML and clinical guidelines to the public and healthcare personnel.
11 The documentation associated with the decision-making process, such as meeting minutes, is made publicly available.
12 The selection process used to select EMs is published publicly. (Website, journal, industry paper, etc.)
13 There is a means for the public or other interested parties to question decisions on inclusion/exclusion of Essential Medicine on the NEML.
14 There are clear indications that the EM selection process is reviewed (external/internal review of information).
15 Selected EML revised/reviewed regularly. (There are instructions to review/revise the NEML regularly.)
16 The use and impact of EML implementation is monitored. (There are instructions to monitor the use and impact of the NEML as a policy tool.)
Total Rating
Relative Rating

Four members of the research team (DM, LS, AB, IA) who had backgrounds in health services research and were familiar with NEMLs acted as the raters. All four raters were provided with relevant documents required to evaluate the sample countries and received training through an introductory presentation and job aid. Ratings were performed independently.

NEML selection process design scores were determined by applying the instrument to each sample country by five raters: four recruited raters and the author. The raters were given eight hours per country to independently perform a document review and assess NEML selection process design. Item scores were converted using a conversion factor so that all items were equally weighted out of five. The summation of converted item scores formed the total NEML selection process raw score (See S1 File Table 2 for conversion calculations). Total raw scores were converted to percentage NEML selection process design effectiveness. The mean and standard deviation of NEML selection process design effectiveness scores for each pilot study country were calculated. Statistical analyses were conducted using SPSS by the author.
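The conversion and scoring steps described above can be sketched in code. This is a minimal illustration: the item maxima are read from the response scales in Table 3, and the equal weighting out of 5, the raw total out of 80, and the percentage conversion follow the text; the function names are our own.

```python
# Sketch of the score conversion described in the Methods: each item score
# is rescaled by a conversion factor so all 16 items are equally weighted
# out of 5, then summed into a raw total (maximum 80) and expressed as a
# percentage. Item maxima are read from the response scales in Table 3.

ITEM_MAX = [5, 3, 2, 2, 4, 1, 1, 1, 1, 4, 2, 3, 4, 5, 2, 2]  # 16 items

def convert_items(raw_items):
    """Rescale each raw item score to be out of 5."""
    assert len(raw_items) == len(ITEM_MAX)
    return [score * 5 / max_score
            for score, max_score in zip(raw_items, ITEM_MAX)]

def effectiveness(raw_items):
    """Return (raw total out of 80, percentage effectiveness)."""
    total = sum(convert_items(raw_items))
    return total, 100 * total / 80

# A perfect rating on every item yields the maximum raw score of 80 (100%).
total, percent = effectiveness(ITEM_MAX)
```

A rater’s 16 item scores for one country would be passed in place of `ITEM_MAX`; per-country means are then taken over the five raters’ percentage scores.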

Table 2. AFR conditions mapped to process values and instrument items.

A4R Conditions Definition Process Values Evaluation Criteria Item#
Publicity Decisions and their rationale should be transparent and made publicly accessible. Transparency The selection process used to select essential medicines is published publicly (website/journal/industry paper/etc.). 12
The documentation associated with the decision-making process, such as meeting minutes, is made publicly available. 11
The names, affiliation, and conflict of interest statements of expert committee members are publicly available. 2
Explicit instructions for the selection of an expert committee exist. 1
Consultative There is clear evidence of a National Medicines Policy (NMP) explicitly emphasizing a focus on communication of NEML and clinical guidelines to the public and healthcare personnel. 10
Relevance Decisions should be made on the basis of reasons (i.e., evidence, principles, values, and arguments) that fair-minded people can agree are relevant under the circumstances. Fair-minded people are defined simply as those who seek in principle to cooperate with others to find mutually justifiable solutions to priority-setting problems. Accountability The expert committee responsible for NEML selection operates with full scientific independence. 3
Detailed guidelines/principles for the expert committee to establish an EML exist. 4
Relevant Selection criteria & Evidence-based selection Explicit and detailed selection criteria for essential medicines list selection exist. 5
The selection criteria of essential medicines explicitly considers financial implications when examining medicines with equal safety and efficacy. 8
There is explicit direction to base essential medicine selection decisions on scientific evidence of efficacy and safety, as per the selection criteria. 6
The prevalence of health conditions and resistance patterns are considered in NEML selection, as per the selection criteria. 7
The selection criteria of essential medicines assesses the feasibility of uptake (healthcare setting, personnel, etc.). 9
Revisions & Appeals There should be opportunities to revisit and revise decisions in light of further evidence or arguments, and there should be a mechanism for challenge and dispute resolution. Participation
Process Review & Revisions Selected EML revised/reviewed regularly. (There are instructions to revise/review the NEML regularly.) 15
There are clear indications that the essential medicines selection process is reviewed (external/internal review of information). 14
There is a means for the public or other interested parties to question decisions on inclusion/exclusion of essential medicines on the NEML. 13
Enforcement There should be either voluntary or public regulation of the process to ensure that the first three conditions are met. Process Use & Implementation The use and impact of EML implementation is monitored. (There are instructions to monitor the use and impact of the NEML as a policy tool.) 16

Reliability testing

Interrater reliability refers to the degree of agreement among scores given by multiple raters for identical items [31]. Interrater reliability was determined using the Intraclass Correlation Coefficient (ICC) [31].

Two ICC forms were calculated [32]: two-way, random effects, consistency, multiple rater; and two-way, random effects, absolute agreement, single rater. ICCs and their associated 95% confidence intervals were calculated for total country scores and scores stratified by AFR conditions. Guidelines found in the literature were followed to design and report on interrater reliability for the instrument [31–33].
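For readers who wish to reproduce the two ICC forms outside a statistical package, both can be computed directly from the two-way ANOVA mean squares (McGraw and Wong notation: ICC(A,1) for absolute agreement, single rater; ICC(C,k) for consistency, multiple raters). The sketch below uses a hypothetical ratings matrix, not the study data.

```python
# Sketch of the two ICC forms used here, computed from two-way ANOVA mean
# squares: ICC(A,1) = two-way random effects, absolute agreement, single
# rater; ICC(C,k) = two-way random effects, consistency, k raters.
# The ratings matrix (targets x raters) is hypothetical.

def two_way_iccs(ratings):
    n = len(ratings)                     # targets (countries)
    k = len(ratings[0])                  # raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # Mean squares for rows (targets), columns (raters), and residual error.
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum((ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))
    icc_a1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_ck = (msr - mse) / msr
    return icc_a1, icc_ck

# Rater 2 scores one point higher than rater 1 on every target: rankings
# are perfectly consistent (ICC(C,k) = 1) but absolute agreement is lower.
icc_a1, icc_ck = two_way_iccs([[1, 2], [2, 3], [3, 4]])
```

The example shows why the two forms can diverge: a constant rater bias leaves consistency intact while reducing absolute agreement.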

Validity testing

Face and construct validity were established through iterative item generation, revision, and reduction in conjunction with subject matter experts from the WHO, academic specialists, and researchers.

In the absence of accepted measures of NEML quality, we used two proxy measures: (1) the number of medicines recently added to the WHO model list that were also recently added to the NEML (a larger number indicates a responsive NEML process) and (2) the number of medicines recently removed from the WHO model list that remain on the NEML (a smaller number indicates a responsive NEML process). Criterion validity was assessed by graphically comparing country NEML selection process design effectiveness mean scores against these two proxy measures for the pilot study sample countries.
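Both proxy measures reduce to simple set intersections between recent WHO model list changes and the national list. A minimal sketch, with hypothetical medicine names (the function and argument names are ours, not from the study):

```python
# Sketch of the two proxy measures of NEML selection process responsiveness:
# (1) recent WHO model list additions also recently added to the NEML
#     (a larger count indicates a responsive process), and
# (2) recent WHO model list deletions still present on the NEML
#     (a smaller count indicates a responsive process).
# All medicine names below are hypothetical placeholders.

def proxy_measures(who_added, who_removed, neml_added, neml_current):
    responsive_additions = len(set(who_added) & set(neml_added))
    retained_deletions = len(set(who_removed) & set(neml_current))
    return responsive_additions, retained_deletions

added, retained = proxy_measures(
    who_added={"drug_a", "drug_b", "drug_c"},
    who_removed={"drug_x", "drug_y"},
    neml_added={"drug_a", "drug_c", "drug_d"},
    neml_current={"drug_a", "drug_c", "drug_d", "drug_x"},
)
```

In the study, these counts were plotted against mean instrument scores rather than analyzed statistically.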

Ethics statement

This study used only publicly available policy documents and did not involve human participants, patients, or confidential data. As such, it did not require ethics review under our institution’s policies.

Results

Instrument development

WHO guidance documents were reviewed, and recommended selection process design components, processes, and process values were identified. The identified components and processes formed 28 features of an ideal NEML selection process design. These 28 features were consolidated by logically combining related features into single questions, forming the basis of a preliminary instrument with 25 items (see S1 File Fig 1). In the first iteration, and in consultation with subject matter experts, nine items were removed based on the following: eight items were redundant, and one item focused on implementation rather than selection process design (see S1 File Fig 1). The 16 items of the new instrument (Table 1) were then mapped to WHO guidance process values (Table 2), which were further mapped to the AFR procedural fairness conditions (Table 2).

Fig 1. Mean and standard deviation of total and AFR condition scores for NEML selection process design effectiveness in pilot study countries (Afghanistan, Barbados, Papua New Guinea, and South Africa).


A rating scale was created for each item based on scenarios encountered during a document review of NEML selection process documents for 40 countries. Each of the 16 items had a unique rating scale, ranging between 2 and 6 potential scores. Each point on the rating scale represents a specific situation as defined by the rating scheme. Table 3 provides a detailed description of the rating scale and scheme.

Table 3. Response scale and scheme.

Evaluation Criteria Rating Scale 0 1 2 3 4 5
1 Explicit instructions for the selection of an expert committee exist. 0 - 5 No process or explicit instructions for the selection of an expert committee exist. A process for the selection of an expert committee exists, but is not documented. An expert committee selection process with explicit instructions exists and is documented, but it is only internally available. An expert committee selection process with explicit instructions exists, is documented, and is published (publicly available). An expert committee selection process with explicit instructions exists, is documented, and is published publicly. The process has rigorous oversight. An expert committee selection process with explicit instructions exists, is documented, and is published publicly. The process has rigorous oversight, and there is accountability for selection.
2 The names, affiliations, and conflict of interest statements of expert committee members are publicly available. 0 - 3 No identification of expert committee members. Names of the expert committee members are publicly available. Names and affiliations of the expert committee members are publicly available. Names, affiliations, and conflict of interest statements of the expert committee members are publicly available.
3 The expert committee responsible for National Essential Medicines List (NEML) selection operates with full scientific independence. 0 - 2 No initiative to ensure independence of the expert committee. Guidelines exist that acknowledge the scientific independence of the expert committee. Explicit implementation of scientific independence of the expert committee through legislation/policies/etc.
4 Detailed guidelines/principles for the expert committee to establish an essential medicines list exist. 0 - 2 No selection guidelines/principles exist. General guidelines/principles exist. Detailed guidelines/principles exist.
5 Explicit and detailed selection criteria for essential medicines list selection exist. 0 - 4 No selection criteria exist. An informal NEML selection philosophy exists. A formal but general NEML selection philosophy exists. Formal and detailed selection criteria for EMs exist. Formal and detailed selection criteria for EMs exist and are published for the public.
6 There is explicit direction to base essential medicine selection decisions on scientific evidence of efficacy and safety, as per the selection criteria. 0 - 1 No explicit direction to link essential medicines selection to scientific evidence of efficacy and safety Explicit direction to link essential medicines selection to scientific evidence of efficacy and safety
7 The prevalence of health conditions and resistance patterns are considered in NEML selection, as per the selection criteria. 0 - 1 No process or instruction to examine disease prevalence data during essential medicine selection. A clear process and explicit instructions to examine disease prevalence data during essential medicine selection.
8 The selection criteria of essential medicines explicitly considers financial implications when examining medicines with equal safety and efficacy. 0 - 1 There are no guidelines or methods for assessing/handling the financial implications when examining medicines with equal safety and efficacy. There is a method and guidelines for assessing/handling the financial implications when examining medicines with equal safety and efficacy.
9 The selection criteria of essential medicines assesses the feasibility of uptake (health care setting, personnel, etc.). 0 - 1 No assessment of the feasibility of uptake. The feasibility of uptake is assessed (healthcare setting, personnel available, etc.).
10 There is clear evidence of a National Medicines Policy (NMP) explicitly emphasizing a focus on communication of NEML and clinical guidelines to the public and healthcare personnel. 0 - 4 No clear communication emphasized. The NMP emphasizes a general philosophy to communicate the NEML and clinical guidelines, with few details or little information available. The NMP emphasizes explicit instructions to communicate the NEML and clinical guidelines, with detailed information. The NMP emphasizes explicit instructions to communicate the NEML and clinical guidelines. There is clear evidence of multiple modes of communication, with detailed information. The NMP emphasizes explicit instructions to communicate the NEML and clinical guidelines. There is clear evidence of multiple modes of communication, with detailed information and avenues for general queries.
11 The documentation associated with the decision-making process, such as meeting minutes, is made publicly available. 0 - 2 No documentation of the decision-making process exists. The EM decision-making process is documented, but not made publicly available. The EM decision-making process is documented and made publicly available.
12 The selection process used to select essential medicines is published publicly (website/journal/industry paper/etc.). 0 - 3 No published selection process. Selection process published internally only. Selection process published publicly. Widespread publication and availability of the selection process (website/journal/etc.).
13 There is a means for the public or other interested parties to question decisions on inclusion/exclusion of essential medicines on the NEML. 0 - 4 No means or process exists. Means or process exists to question decisions on essential medicines, however it is not clear/straightforward. Clear and accessible process and means of questioning decisions on essential medicines inclusion/exclusion decisions. Documentation of questions to decision makers and results/answers exists. Documentation of changes made as a result of questioning of decision/ decision makers.
14 There are clear indications that the essential medicines selection process is reviewed (external/internal review of information). 0 - 5 No review of selection process exists. A general process exists for review of selection process. A documented review process exists for internal review of selection process. A documented review process exists for internal review of selection process, with regular process revision. A documented review process exists for internal & external review of selection. A documented review process exists for internal & external review of selection, with regular process revision.
15 Selected EML revised/reviewed regularly. (There are instructions to review/revise the NEML regularly.) 0 - 2 No instructions to review/revise the NEML regularly. Instructions to review/revise the NEML regularly, and evidence that the NEML is intermittently revised. Instructions to review/revise the NEML regularly, and evidence that the NEML is regularly revised.
16 The use and impact of EML implementation is monitored. (There are instructions to monitor the use and impact of the NEML as a policy tool.) 0 - 2 No instructions to monitor use and impact of NEML. There are instructions to monitor the use and impact of the NEML as a document and as a policy tool. (quantitative/qualitative). There are instructions to monitor the use and impact of the NEML as a policy tool (quantitative/qualitative). There exist published reports or articles evaluating the use and impact of NEMLs.

Instrument testing

NEML selection process design effectiveness raw scores for each sample country were derived by summing the 16 converted instrument item scores and are also presented as percentage scores (see Table 4). A perfect score on the instrument is 80. AFR condition scores for each country were calculated by stratifying instrument item scores as outlined in Table 2. The mean and standard deviation of scores for each country are also included in Table 4.

Table 4. Total and AFR condition raw scores (items converted out of 5): means, percentages, and standard deviations for pilot study sample countries.

Measure Mean Score Statistics Afghanistan Barbados Papua New Guinea South Africa
Total Score (%) 49.05 (61%) 49.43 (62%) 30.43 (38%) 68.8 (86%)
Standard deviation 5.5 5.5 4.8 3.9
Publicity Score (%) 13.95 (56%) 15.28 (61%) 3.33 (13%) 22.95 (92%)
Standard deviation 2.3 1.5 1.2 0.5
Relevance Score (%) 25 (71%) 24 (69%) 18.25 (52%) 31.5 (90%)
Standard deviation 5.4 5.7 5.6 2.9
Revisions & Appeals Score (%) 8.1 (54%) 6.65 (44%) 4.85 (32%) 11.45 (76%)
Standard deviation 1.4 2.5 0.9 2.9
Enforcement Score (%) 2 (40%) 3.5 (70%) 4 (80%) 4.5 (90%)
Standard deviation 2.1 2.2 2.2 1.1

The highest-scoring country was South Africa, with a score of 68.8 (86%), followed by Barbados at 49.43 (62%), Afghanistan at 49.05 (61%), and Papua New Guinea at 30.43 (38%). The standard deviation reflects the degree of variation in rater assessments around the mean score. Afghanistan and Barbados had the greatest standard deviations (5.5), followed by Papua New Guinea (4.8), while South Africa had the lowest variation (standard deviation 3.9).

The country scores for the Publicity condition followed a similar trend to the total score: South Africa scored highest (22.95), followed by Barbados (15.28), Afghanistan (13.95), and Papua New Guinea (3.33). Scores for the Relevance and Revisions & Appeals conditions differed slightly, as Afghanistan scored higher than Barbados. The Enforcement condition score was an exception, with Papua New Guinea (4) scoring higher than Barbados (3.5) and Afghanistan (2). The similarities in score trends for the Publicity, Relevance, and Revisions & Appeals conditions with the total score are evident in Figs 1 and 2.

Fig 2. Mean scores for NEML selection process design effectiveness across pilot study countries (Afghanistan, Barbados, Papua New Guinea, and South Africa), aggregated by score type (total and AFR condition scores).


Reliability

Five raters independently assessed the four sample countries in the pilot study using the NEML selection process design effectiveness assessment instrument (see S1 File Table 3). Interrater reliability was assessed using two ICC forms selected based on established recommendations [30]: absolute agreement ICC using a two-way random effects model for single raters, and consistency ICC using a two-way random effects model for multiple raters [30]. Estimates of ICC and their 95% confidence intervals were calculated using SPSS statistical package versions 28 and 29 (SPSS Inc., Chicago, IL).

The NEML selection process design effectiveness total score showed an absolute agreement ICC of 0.98 (95% confidence interval: 0.91–0.99) and a consistency ICC of 0.97 (95% confidence interval: 0.88–0.99). The ICC for the Enforcement condition score was difficult to interpret due to there being only one item (See Table 5).

Table 5. Intra-Class Correlation Coefficient estimates for NEML Assessment Instrument total score and AFR condition scores.

Measure ICC Type Intraclass Correlation 95% Confidence Interval
Lower Bound Upper Bound
Total AA 0.98 0.91 0.99
C 0.97 0.88 0.99
Publicity AA 0.99 0.97 1.00
C 0.99 0.97 1.00
Relevance AA 0.83 0.33 0.99
C 0.84 0.28 0.99
Revisions & Appeals AA 0.89 0.56 0.99
C 0.89 0.53 0.99
Enforcement AA 0.44 −2.04 0.96
C 0.43 −1.90 0.96

Interrater reliability of NEML selection process design effectiveness scores aggregated by AFR conditions varied. The Publicity condition score showed excellent reliability: absolute agreement ICC (0.99, 95% CI: 0.97–1.00) and consistency ICC (0.99, 95% CI: 0.97–1.00). The Relevance condition score showed poor reliability: absolute agreement ICC (0.83, 95% CI: 0.33–0.99) and consistency ICC (0.84, 95% CI: 0.28–0.99), as the lower bounds of the 95% confidence intervals were below 0.5. The Revisions & Appeals condition score showed moderate reliability: absolute agreement ICC (0.89, 95% CI: 0.56–0.99) and consistency ICC (0.89, 95% CI: 0.53–0.99), as the lower bounds of the 95% confidence intervals were below 0.75. The confidence intervals for the Enforcement condition were wide, likely owing to the limited range in ordinal values. Due to varying AFR condition ICC values and wide confidence intervals, AFR subscores should be used and interpreted with caution.
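The reliability labels applied above follow the convention of classifying an ICC by the lower bound of its 95% confidence interval (below 0.5 poor, below 0.75 moderate, below 0.9 good, otherwise excellent); a minimal sketch of that convention, with the lower bounds taken from Table 5:

```python
# Sketch of the ICC interpretation convention applied in the text:
# classification is based on the lower bound of the 95% confidence
# interval (poor < 0.5, moderate < 0.75, good < 0.9, else excellent).

def interpret_icc(ci_lower):
    if ci_lower < 0.5:
        return "poor"
    if ci_lower < 0.75:
        return "moderate"
    if ci_lower < 0.9:
        return "good"
    return "excellent"

# Absolute-agreement ICC lower bounds reported in Table 5.
labels = {name: interpret_icc(lb) for name, lb in
          {"Total": 0.91, "Publicity": 0.97,
           "Relevance": 0.33, "Revisions & Appeals": 0.56}.items()}
```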

Validity

Each component of the NEML selection process design effectiveness assessment instrument was developed in collaboration with specialists from the WHO. Face and construct validity were established by performing item generation, revision, and reduction iteratively in conjunction with subject matter experts (WHO staff, academic specialists, and researchers). This collaborative process resulted in minimal revisions to the instrument items and rating scheme.

Instrument validity was also assessed graphically using scatter plots. Instrument scores were plotted on the x-axis against two external criteria on the y-axis: a proxy measure of an NEML selection process that is up-to-date (added medicines common to both global and national lists; Fig 3) and a proxy measure of an NEML selection process that is not up-to-date (medicines excluded from the WHO model EML but still on the NEML; see S1 File Table 5). Fig 3 shows that countries with higher instrument scores also had a greater number of medicines in common with the most recent additions to the WHO model list. In contrast, Fig 4 shows that as country instrument scores increased, there were fewer medicines on their national lists that had been deleted from the corresponding WHO model list. Both findings align with our hypothesis and indicate that the instrument has good validity.
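The monotone relationships shown in the scatter plots can also be summarized with a rank correlation. The sketch below uses entirely hypothetical values, not the pilot-study data (the actual ratings are in S1 File and S2 File); it implements Spearman's rho via double argsort ranking, assuming no ties.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data via rank-transformed Pearson."""
    rx = np.argsort(np.argsort(x)).astype(float)  # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)  # ranks of y
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

# Hypothetical illustration only -- NOT the pilot-study values.
instrument_scores = [30, 45, 60, 75]    # one score per country
common_additions = [2, 5, 9, 14]        # medicines shared with recent WHO additions
retained_deletions = [10, 7, 4, 1]      # medicines deleted from WHO model list but retained
```

Under the study's hypothesis, scores should correlate positively with the first proxy (rho near +1 for these monotone toy data) and negatively with the second (rho near −1).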

Fig 3. Validity test results showing the relationship between instrument scores and a proxy measure indicating an up-to-date selection process across pilot study countries (Afghanistan, Barbados, Papua New Guinea, and South Africa).


Fig 4. Validity test results showing the relationship between instrument scores and a proxy measure indicating an out-of-date selection process across pilot study countries (Afghanistan, Barbados, Papua New Guinea, and South Africa).


Discussion

The NEML selection process plays a pivotal role in the overall policy process. A critical first step in designing and revising effective essential medicines policies to achieve health equity and improve population health is ensuring that an effective NEML selection process design, aligned with WHO standards, is in place. To this end, we developed and validated a 16-item instrument for measuring NEML selection process design effectiveness. The instrument has good validity and excellent reliability overall.

Improving equitable access to medicines is the target outcome of essential medicines policies, and existing methods aim to monitor progress toward it. A well-established and widely used indicator of access is the WHO/Health Action International index of medicine prices, availability, and affordability [34]. Wirtz et al. propose five priority areas (financing, affordability, quality and safety, rational use, and innovation) for policy evaluation on the path to Universal Health Coverage of essential medicines [30]. Bigdeli et al. proposed a conceptual framework that embeds access to medicines in a health systems perspective, accounting for the supply- and demand-side barriers identified in previous frameworks; it reorganizes the components of the health system that influence access to medicines and accounts for the dynamic relationships among them [35]. To our knowledge, however, this is the first attempt at a method to quantitatively evaluate a component of national medicines policy design based on recommendations provided by the WHO. Currently, no validated instrument quantitatively evaluates NEML selection process design effectiveness based on an assessment of policy content. Additionally, this is the first study to apply the AFR conditions to the evaluation of NEML selection process design. Studies of the NEML selection process predominantly examine the processes of individual countries qualitatively [14–16,18,19], focusing on identifying process deficiencies.
Quantitative inquiry has primarily focused on comparative analysis of NEMLs, which indirectly examines NEML selection process design [13,36]. The IMS Institute for Healthcare Informatics quantitatively compared medicines on the lists of nine countries and described those countries' NEML selection processes [20]. The report classifies the country-level factors affecting list implementation into six categories (pricing, availability, reimbursement, government initiative, patent and licensing issues, and healthcare infrastructure) and assesses the impact of each category on a negative–neutral–positive scale. Another quantitative observational study computed disparities among NEMLs, as well as between NEMLs and the WHO model EML, for 137 countries [13]. The differences between lists were statistically analyzed against country characteristics reasonably expected to represent a population's healthcare needs, in order to identify factors that could explain the differences among national lists and the discrepancies between NEMLs and the WHO model EML [13].

The WHO’s efforts to aid governance and advocacy for the EM concept globally have advanced the medication access agenda [5,8]; however, challenges persist [37]. A selection process design capable of producing a national list that addresses national health priorities is a prerequisite for the success of the EM policy across all four dimensions of access (availability, affordability, drug financing, and adequate supply) [7]. Several qualitative reports survey the components of national policy pertaining to the NEML selection process, alluding to its key role in the policy process [38,39]. This paper takes the evaluation of EM policy further by supporting the design of more effective processes, through an instrument that quantitatively assesses the alignment of national policy design with WHO policy standards. A valid and reliable instrument for evaluating NEML selection process design can be used not only to assess the current design of the NEML selection process, but also to provide direction on how to improve that design in low-scoring AFR categories.

Limitations

Developing a new instrument carries the challenge of establishing the reliability of the measurement and the validity of the instrument. Advantages of a new instrument include customizability to specific samples and data, in both items and range of responses; in this way, one can create a dynamic scale that accommodates the evaluation of real-world policies and responses. The NEML selection process design effectiveness assessment instrument developed here has excellent overall interrater reliability, along with reasonable interrater reliability of items, thereby providing meaningful information about the policy design of WHO member countries. However, country scores must be interpreted carefully, given the low reliability of individual instrument items and of the Accountability for Reasonableness condition sub-scores. Sub-scores should therefore be used only with caution and in the context of the total score, especially for the Enforcement domain, which is based on a single item. In addition, although instrument validity was established by comparison with proxy measures of NEML selection process quality, those proxies mainly assess how quickly national lists are updated to reflect changes to the WHO model list; timeliness is an important dimension of process quality, but it may not capture the full breadth of overall process quality. Additionally, despite efforts to mitigate variability in raters’ interpretation of items and policy documents, such variability can ultimately affect ratings and interrater reliability. Finally, the instrument was piloted in four countries that are not necessarily representative.
Furthermore, the reliance of the ratings on publicly available policy data carries a risk of bias, as countries without publicly available information may differ systematically (economically, geographically), and their situations may not be captured by the response scale.

Conclusion

The instrument developed here measures the construct of NEML selection process design effectiveness, which examines the alignment of national policy content with policy intent. The final instrument has 16 items, aggregated into the four AFR condition categories to help understand the different contributions of the attributes of the NEML selection process design. The instrument has good validity and excellent reliability overall; however, some domains require further refinement. Future research could improve AFR condition sub-score reliability by increasing the number of items in each condition category or by expanding the pilot study to include more countries and reassessing reliability.

This instrument provides a preliminary means to evaluate policy content with respect to NEML selection process design across countries. The data gathered can inform cross-country comparisons, assessments of scores over time, and improvements to policy design based on both total scores and AFR condition sub-scores. Further validation in additional settings is needed to confirm broader applicability.

Supporting information

S1 File. Tables and Figures.

(DOCX)

pone.0342750.s001.docx (47.7KB, docx)
S2 File. Ratings by each rater in pilot study.

(XLSX)

pone.0342750.s002.xlsx (13KB, xlsx)

Acknowledgments

The World Health Organization (WHO) provided subject matter expertise and feedback during the instrument development process. The pilot country ratings were contributed by Darshanand Maraj (DM), Anjali Bali (AB), Itunu Adekoya (IA), and Liane Steiner (LS).

Data Availability

This study was designed as a pilot project. The data used to rate pilot study countries are available in the WHO repository, where NEMLs for each country are publicly available: https://www.who.int/teams/health-product-policy-and-standards/assistive-and-medical-technology/essential-medicines/national-emls. The data underlying the instrument ratings can be found in the supplementary files. The corresponding author or the Unity Health Toronto Research Ethics Board (researchethics@unityhealth.to) can be contacted if additional data are needed.

Funding Statement

The author(s) received no specific funding for this work.

References

Decision Letter 0

Muhammad Shahzad Aslam

4 Sep 2025

Dear Dr. Persaud,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

  • Deposit all data openly (mandatory for journal compliance).

  • Provide ethics waiver/approval details.

  • Revise the abstract, background, and structure as per R1’s points.

  • Strengthen justification for country selection.

  • Clarify and discuss weak reliability sub-domains.

  • Tone down conclusions, add limitations/future work.

  • Reformat appendices and improve figures/tables.

  • Cite updated WHO sources.

Please submit your revised manuscript by Oct 19 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Muhammad Shahzad Aslam, Ph.D.,M.Phil., Pharm-D

Academic Editor

PLOS ONE

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Additional Editor Comments:

Ethics: Need for clarification on waiver vs. approval (R1 #7–8). This is a compliance matter that must be resolved before publication.

Data availability: Both reviewers highlight that current practice (“on request”) does not meet PLOS ONE requirements. Must deposit data in an open repository such as OSF.

Sample size/generalizability: Both reviewers point out only four countries were included, which limits external validity. This cannot be fixed immediately, but must be framed transparently as a limitation.

Reliability: Some domains (e.g., Enforcement) show poor ICC. Both reviewers flag this. Authors must address it in discussion/limitations and ideally adjust the instrument.

Overstatement of conclusions: R1 is correct that claims are too strong given small sample + weak domains.

Appendices: R1 raises concerns that they are confusing, incomplete, or poorly explained. These can be corrected with careful revision.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: I have reviewed the manuscript, but major comments need to be addressed accordingly, the manuscript addresses an important gap by proposing and validating an instrument to assess the design effectiveness of NEML selection processes. However, some methodological, conceptual, and reporting issues require attention before publication:

1. Title of study should be revised for better clarity, it should be catchy and concise

2. Background is long, make it short max 2 lines

3. Purpose/aim/objective of the study should be mentioned separately after the background.

4. The abstract reports ICC values but does not specify the pilot sample size (n=4 countries, 5 raters). Including this would improve transparency.

5. Key words should be mentioned alphabetically

6. Abbreviations should be written in full form first e.g. AFR, EML, NEML, NPPs etc, check this throughout the manuscript.

7. Ethics approval waiver letter is required, mention the details of waiver in revised manuscript and also provide the waiver letter to reviewers and editorial office for their perusal and consideration.

8. While the study used publicly available documents, the authors should clarify whether consultation with WHO experts and recruited raters required institutional ethics approval or exemption (particularly since human raters were involved). Currently, this section is minimal.

9. The authors claim this is the first validated instrument to assess NEML selection process design. While the novelty is clear, the manuscript could benefit from a deeper comparison with existing frameworks/tools for priority setting and policy evaluation e.g., WHO checklist approaches, health policy analysis frameworks. This would better situate the contribution.

10. Why Only four countries (Afghanistan, Papua New Guinea, South Africa, Barbados) were assessed? The rationale for their selection is only briefly explained (income-level stratification). A stronger justification is needed for why these specific countries were chosen and how representative they are. Small sample size also raises concerns about generalizability.

11. While total instrument reliability is excellent (ICC > 0.9), sub-domain reliability (e.g., Relevance, Revisions & Appeals, Enforcement) is weak due to wide CIs and limited items. The authors should discuss how this affects interpretability of subscale scores and whether additional items are needed to improve measurement stability.

12. It was observed that the proxy measures “alignment with WHO model EML additions/removals are reasonable but limited. They primarily capture timeliness rather than overall process quality. The authors should acknowledge this limitation more explicitly and suggest potential stronger gold-standard validity benchmarks.

13. The data availability statement indicates datasets are available upon request. This does not fully align with PLOS ONE’s data policy, which generally requires deposition in an open repository unless there are compelling reasons. Authors should ensure raw scoring data, item-level assessments, and country-level evaluations are made available.

14. The conclusion overstates the robustness of the instrument. Given some domains had low reliability and the pilot sample was limited, claims about wide applicability should be more cautious. The authors should emphasize that further validation in diverse contexts is necessary. Revise the conclusion accordingly.

15. Separately add the limitations and future recommendations section below the conclusion part, clearly mentioning the drawbacks of the current study, and what next can be done, discus this in a logical way.

16. The manuscript is generally clear, but some sections (Methods, particularly instrument development and response scale generation) are overly technical and could be streamlined for readability. A flowchart summarizing instrument development steps would help.

17. Table 3 (response scales) is long and dense; it may be better placed in supplementary materials. Figures 1–3 could use clearer captions that describe the key interpretation points (e.g., what higher/lower scores mean in context).

18. The manuscript uses “AFR” and “Accountability for Reasonableness” interchangeably. Standardize terminology to avoid confusion.

19. Several references are dated (e.g., 2001, 2002 WHO guidelines). More recent WHO documents (2023 model list and guidance) should be cited where applicable.

20. Appendix A, Pilot Study Sample Countries: Only four countries are included, which limits representativeness, the table does not explain why these specific countries (e.g., Afghanistan vs. other low-income countries) were chosen. Explicitly state the selection rationale beyond income-stratification (e.g., data availability, policy diversity). My recommendation is that discuss these points as a footnote or provide a brief explanation in discussion part.

21. Appendix B, Conversion Calculation: No worked example is provided, making it difficult for readers to follow how conversion works in practice. What was the logic behind choosing 42 as the total raw max score and converting to 80 is not sufficiently explained. I recommend to Include a step-by-step example for one item and country, showing raw score.

22. Appendix C, Explanations are superficial

23. Appendix D, Clarify the meaning of rater codes, add a legend, and consider averaging scores in a separate summary table.

24. Appendix E, Adds little beyond what is already in Results.Unclear whether these are averages across raters or consensus scores. State clearly if the totals are mean scores. If redundant with main text, move to supplementary-only.

25. Appendix F, Only percentages are provided; no statistical measures (correlation coefficients, regression outputs) are included. Proxy measures mainly capture timeliness of updates, not full process quality. Include statistical tests (e.g., Pearson/Spearman r, regression slopes). Acknowledge limitations of proxies more strongly.

26. Appendix G, Some domains show poor reliability (e.g., Enforcement ICC = 0.44, CI includes negative values). This is downplayed in the main text. The appendix does not explain why enforcement reliability is so weak (likely due to single-item measure). Add an explicit note on interpretability issues for domains with poor reliability. Consider revising the instrument to include more items under weak domains.

27. The appendices add value by making the methodology transparent, but they currently fall short of journal-quality supplementary material. They require: Better formatting (clearer tables, legends, consistent decimals), Stronger statistical reporting (validity beyond descriptive percentages). More detailed justification for methodological decisions (item reduction, country selection). Removal of duplicated or redundant sections. Without these revisions, the appendices risk confusing rather than supporting readers.

28. Major Revision Required: The manuscript is promising and fills an important methodological gap, but revisions are required to strengthen justification of the study design, address limitations of reliability/validity, clarify ethics and data availability, and moderate conclusions.

Reviewer #2: This study presents the first systematic development and validation of an instrument to evaluate the design effectiveness of National Essential Medicines List (NEML) selection processes, addressing a significant gap in the literature regarding quantitative assessment tools. The application of the Accountability for Reasonableness (AFR) framework to the evaluation of NEML selection processes is particularly innovative from a theoretical perspective. The study is well-designed, methodologically rigorous, and yields convincing results with strong theoretical and practical implications. It is recommended for acceptance after minor revisions, including expanding the sample size, optimizing the data availability statement, and elaborating on methodological details.

Specific Comments:

1.The instrument development process is rigorous, but the sample size is limited. The tool was developed based on WHO guidelines and expert consultation, demonstrating a systematic approach. However, the pilot study included only four countries, which may limit the generalizability of the findings. It is recommended to expand the number of countries in future studies to enhance the representativeness and robustness of the results.

2.Reliability is generally good, but certain sub-dimensions show low reliability. The ICC values for the total score indicate excellent reliability (>0.9). However, the ICC for the "Enforcement" condition is relatively low (0.44) with a wide confidence interval, suggesting poor inter-rater consistency for this dimension. Further refinement of the scoring criteria or additional items for this dimension are recommended.

3.The validity assessment approach is reasonable, but proxy measures require further justification. The use of "alignment with additions/removals on the WHO Model List" as a proxy measure is appropriate. However, the theoretical linkage between these indicators and "selection process effectiveness" should be more explicitly articulated to strengthen the validity argument.

4.The literature review and methods section could be more detailed. The description of the literature search strategy (databases, keywords, screening process) is somewhat brief. A more thorough methodological description is recommended to enhance reproducibility. Additionally, please clarify the rationale for retaining 16 items and indicate whether factor analysis or item response theory (IRT) was applied.

5.The current data availability statement does not comply with PLOS ONE’s requirements. The statement “Data are available from the corresponding author upon reasonable request” does not meet the journal’s requirement for full public data availability. It is recommended to deposit the data in a public repository (e.g., Figshare, Zenodo) and provide a DOI or access link.

6.Results are clearly presented, but figures could be improved. The figures (e.g., Figure 1–3) effectively communicate the main results but lack sufficient annotations and explanatory text. Please consider adding clearer labels, error bar interpretations, and indicators of statistical significance where applicable.

7.The policy implications are strong and practical. The developed instrument has clear value for cross-country comparisons, policy evaluation, and process improvement. It is suggested to further emphasize in the Discussion how the tool could be implemented by national or international organizations (e.g., WHO) and outline plans for future dissemination.

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1: No

Reviewer #2: Yes: Jing Zhang, Ph.D.

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: NEML Review.docx

pone.0342750.s003.docx (18.2KB, docx)
PLoS One. 2026 Feb 25;21(2):e0342750. doi: 10.1371/journal.pone.0342750.r002

Author response to Decision Letter 1


10 Nov 2025

Reviewer comments

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I have reviewed the manuscript, but major comments need to be addressed accordingly, the manuscript addresses an important gap by proposing and validating an instrument to assess the design effectiveness of NEML selection processes. However, some methodological, conceptual, and reporting issues require attention before publication:

1. Title of study should be revised for better clarity, it should be catchy and concise

***AUTHORS’ RESPONSE: We revised the title for clarity.

2. Background is long, make it short max 2 lines

***AUTHORS’ RESPONSE: We shortened the background section in the abstract as suggested.

3. Purpose/aim/objective of the study should be mentioned separately after the background.

***AUTHORS’ RESPONSE: We have added a separate objective section.

4. The abstract reports ICC values but does not specify the pilot sample size (n=4 countries, 5 raters). Including this would improve transparency.

***AUTHORS’ RESPONSE: We have added the sample size to the abstract.

5. Key words should be mentioned alphabetically.

***AUTHORS’ RESPONSE: The keywords have been put in alphabetical order.

6. Abbreviations should be written in full form first e.g. AFR, EML, NEML, NPPs etc., check this throughout the manuscript.

***AUTHORS’ RESPONSE: We have spelled out each term in its first instance.

7. Ethics approval waiver letter is required, mention the details of waiver in revised manuscript and also provide the waiver letter to reviewers and editorial office for their perusal and consideration.

***AUTHORS’ RESPONSE: This study used only publicly available policy documents and did not involve human participants, patients, or confidential data. As such, it did not require ethics review under our institution’s policies. We have revised the ethics statement to clarify this and we also share the waiver letter.

8. While the study used publicly available documents, the authors should clarify whether consultation with WHO experts and recruited raters required institutional ethics approval or exemption (particularly since human raters were involved). Currently, this section is minimal.

***AUTHORS’ RESPONSE: Consulting with experts was deemed by our institution not to require Research Ethics Board approval.

9. The authors claim this is the first validated instrument to assess NEML selection process design. While the novelty is clear, the manuscript could benefit from a deeper comparison with existing frameworks/tools for priority setting and policy evaluation e.g., WHO checklist approaches, health policy analysis frameworks. This would better situate the contribution.

***AUTHORS’ RESPONSE: We have revised the Discussion section to better situate this study in the literature.

10. Why Only four countries (Afghanistan, Papua New Guinea, South Africa, Barbados) were assessed? The rationale for their selection is only briefly explained (income-level stratification). A stronger justification is needed for why these specific countries were chosen and how representative they are. Small sample size also raises concerns about generalizability.

***AUTHORS’ RESPONSE: We have revised the methods and discussion sections to indicate that we selected countries from different regions that we knew to have this type of documentation available. We have discussed this as a potential limitation in the discussion section.

11. While total instrument reliability is excellent (ICC > 0.9), sub-domain reliability (e.g., Relevance, Revisions & Appeals, Enforcement) is weak due to wide CIs and limited items. The authors should discuss how this affects interpretability of subscale scores and whether additional items are needed to improve measurement stability.

***AUTHORS’ RESPONSE: We revised the manuscript to state in the limitations section of the discussion that the subscores should be used only with caution or in the context of the total score.

12. It was observed that the proxy measures “alignment with WHO model EML additions/removals are reasonable but limited. They primarily capture timeliness rather than overall process quality. The authors should acknowledge this limitation more explicitly and suggest potential stronger gold-standard validity benchmarks.

***AUTHORS’ RESPONSE: We have revised the limitation section of the discussion to indicate this.

13. The data availability statement indicates datasets are available upon request. This does not fully align with PLOS ONE’s data policy, which generally requires deposition in an open repository unless there are compelling reasons. Authors should ensure raw scoring data, item-level assessments, and country-level evaluations are made available.

***AUTHORS’ RESPONSE: We have revised this to indicate that the data used to rate the countries is available in the WHO’s repository. We can be contacted if additional data is needed.

14. The conclusion overstates the robustness of the instrument. Given some domains had low reliability and the pilot sample was limited, claims about wide applicability should be more cautious. The authors should emphasize that further validation in diverse contexts is necessary. Revise the conclusion accordingly.

***AUTHORS’ RESPONSE: We have revised the conclusion accordingly and moderated the claims.

15. Add separate limitations and future recommendations sections below the conclusion, clearly stating the drawbacks of the current study and what can be done next, and discuss these points in a logical way.

***AUTHORS’ RESPONSE: We have added a separate limitations section. We have also discussed needed future work.

16. The manuscript is generally clear, but some sections (Methods, particularly instrument development and response scale generation) are overly technical and could be streamlined for readability. A flowchart summarizing instrument development steps would help.

***AUTHORS’ RESPONSE: We have revised the manuscript for clarity.

17. Table 3 (response scales) is long and dense; it may be better placed in supplementary materials. Figures 1–3 could use clearer captions that describe the key interpretation points (e.g., what higher/lower scores mean in context).

***AUTHORS’ RESPONSE: We have amended the captions for Figures 1-3.

18. The manuscript uses “AFR” and “Accountability for Reasonableness” interchangeably. Standardize terminology to avoid confusion.

***AUTHORS’ RESPONSE: We have standardized the terminology in the manuscript.

19. Several references are dated (e.g., 2001, 2002 WHO guidelines). More recent WHO documents (2023 model list and guidance) should be cited where applicable.

***AUTHORS’ RESPONSE: We have updated the manuscript with the most recent reference documents.

20. Appendix A, Pilot Study Sample Countries: Only four countries are included, which limits representativeness, and the table does not explain why these specific countries (e.g., Afghanistan vs. other low-income countries) were chosen. Explicitly state the selection rationale beyond income stratification (e.g., data availability, policy diversity). I recommend discussing these points in a footnote or providing a brief explanation in the Discussion section.

***AUTHORS’ RESPONSE: We have addressed this comment in R1 comment 10 above.

21. Appendix B, Conversion Calculation: No worked example is provided, making it difficult for readers to follow how the conversion works in practice. The logic behind choosing 42 as the maximum raw score and converting it to 80 is not sufficiently explained. I recommend including a step-by-step example for one item and country, showing the raw score.

***AUTHORS’ RESPONSE: We have edited Appendix B to provide a worked example of the score conversion.

22. Appendix C: The explanations are superficial.

***AUTHORS’ RESPONSE: We have revised the explanations in Appendix C and provided examples.

23. Appendix D: Clarify the meaning of the rater codes, add a legend, and consider averaging scores in a separate summary table.

***AUTHORS’ RESPONSE: Appendix D has been amended.

24. Appendix E: This adds little beyond what is already in the Results. It is unclear whether these are averages across raters or consensus scores. State clearly whether the totals are mean scores. If redundant with the main text, move to supplementary material only.

***AUTHORS’ RESPONSE: We have amended the table descriptions/captions to reflect their content.

25. Appendix F: Only percentages are provided; no statistical measures (correlation coefficients, regression outputs) are included. The proxy measures mainly capture timeliness of updates, not full process quality. Include statistical tests (e.g., Pearson/Spearman r, regression slopes), and acknowledge the limitations of the proxies more strongly.

***AUTHORS’ RESPONSE: We have revised the discussion/limitations section to acknowledge the limitations of the proxy measures.

26. Appendix G: Some domains show poor reliability (e.g., Enforcement ICC = 0.44, with a confidence interval that includes negative values). This is downplayed in the main text, and the appendix does not explain why enforcement reliability is so weak (likely due to the single-item measure). Add an explicit note on interpretability issues for domains with poor reliability, and consider revising the instrument to include more items under weak domains.

***AUTHORS’ RESPONSE: We have revised the limitation section to indicate the issue with enforcement being a single item.

27. The appendices add value by making the methodology transparent, but they currently fall short of journal-quality supplementary material. They require better formatting (clearer tables, legends, consistent decimals), stronger statistical reporting (validity beyond descriptive percentages), more detailed justification for methodological decisions (item reduction, country selection), and removal of duplicated or redundant sections. Without these revisions, the appendices risk confusing rather than supporting readers.

***AUTHORS’ RESPONSE: We have revised the appendices accordingly.

28. Major Revision Required: The manuscript is promising and fills an important methodological gap, but revisions are required to strengthen justification of the study design, address limitations of reliability/validity, clarify ethics and data availability, and moderate conclusions.

***AUTHORS’ RESPONSE: We have carefully revised the manuscript, and we believe the suggestions were very helpful.

Reviewer #2: This study presents the first systematic development and validation of an instrument to evaluate the design effectiveness of National Essential Medicines List (NEML) selection processes, addressing a significant gap in the literature regarding quantitative assessment tools. The application of the Accountability for Reasonableness (AFR) framework to the evaluation of NEML selection processes is particularly innovative from a theoretical perspective. The study is well-designed, methodologically rigorous, and yields convincing results with strong theoretical and practical implications. It is recommended for acceptance after minor revisions, including expanding the sample size, optimizing the data availability statement, and elaborating on methodological details.

Specific Comments:

1. The instrument development process is rigorous, but the sample size is limited. The tool was developed based on WHO guidelines and expert consultation, demonstrating a systematic approach. However, the pilot study included only four countries, which may limit the generalizability of the findings. It is recommended to expand the number of countries in future studies to enhance the representativeness and robustness of the results.

***AUTHORS’ RESPONSE: We have revised the manuscript to clearly state that scoring only four countries was a limitation. We have discussed the need for further work.

2. Reliability is generally good, but certain sub-dimensions show low reliability. The ICC values for the total score indicate excellent reliability (>0.9). However, the ICC for the "Enforcement" condition is relatively low (0.44) with a wide confidence interval, suggesting poor inter-rater consistency for this dimension. Further refinement of the scoring criteria or additional items for this dimension are recommended.

***AUTHORS’ RESPONSE: We have revised the limitations section to indicate the issue with subscores in general and the special issue with enforcement that is based on a single item. We have also revised the Conclusions section.

3. The validity assessment approach is reasonable, but proxy measures require further justification. The use of "alignment with additions/removals on the WHO Model List" as a proxy measure is appropriate. However, the theoretical linkage between these indicators and "selection process effectiveness" should be more explicitly articulated to strengthen the validity argument.

***AUTHORS’ RESPONSE: We have revised the Introduction and provided supporting references.

4. The literature review and methods section could be more detailed. The description of the literature search strategy (databases, keywords, screening process) is somewhat brief. A more thorough methodological description is recommended to enhance reproducibility. Additionally, please clarify the rationale for retaining 16 items and indicate whether factor analysis or item response theory (IRT) was applied.

***AUTHORS’ RESPONSE: We have provided additional details about the literature review.

5. The current data availability statement does not comply with PLOS ONE’s requirements. The statement “Data are available from the corresponding author upon reasonable request” does not meet the journal’s requirement for full public data availability. It is recommended to deposit the data in a public repository (e.g., Figshare, Zenodo) and provide a DOI or access link.

***AUTHORS’ RESPONSE: As addressed in R1 comment 13, we have revised the data availability statement to indicate that the data used to rate the countries is available in the WHO’s repository. We can be contacted if additional data is needed.

6. Results are clearly presented, but figures could be improved. The figures (e.g., Figures 1–3) effectively communicate the main results but lack sufficient annotations and explanatory text. Please consider adding clearer labels, error bar interpretations, and indicators of statistical significance where applicable.

***AUTHORS’ RESPONSE: We have revised the figure captions to make them clearer.

7. The policy implications are strong and practical. The developed instrument has clear value for cross-country comparisons, policy evaluation, and process improvement. It is suggested to further emphasize in the Discussion how the tool could be implemented by national or international organizations (e.g., WHO) and outline plans for future dissemination.

***AUTHORS’ RESPONSE: Thank you. We have revised the discussion to discuss future work needed.

Attachment

Submitted filename: Response To Reviewers_2025_10_16.docx

pone.0342750.s004.docx (29.4KB, docx)

Decision Letter 1

Muhammad Shahzad Aslam

28 Jan 2026

Assessing the National Essential Medicines List Selection Processes: Instrument Development and Testing

PONE-D-25-39957R1

Dear Dr. Persaud,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Muhammad Shahzad Aslam, Ph.D.,M.Phil., Pharm-D

Academic Editor

PLOS One

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

**********

Reviewer #2: The authors are to be commended for their thorough and thoughtful responses to the reviewers' comments. They have undertaken extensive revisions to the manuscript, which has significantly improved its clarity, rigor, and overall quality.

All major concerns raised in the previous round of review have been adequately addressed. Key improvements include:

A clearer title and abstract structure.

A more detailed justification for the pilot study country selection.

A revised data availability statement that now aligns with journal policy by directing readers to publicly available WHO repository data and supplementary files.

A standardized use of terminology and updated references.

A significantly strengthened discussion that better contextualizes the study within the existing literature.

A new, stand-alone "Limitations" section that candidly addresses the pilot sample size, the reliability of certain sub-dimensions (notably the single-item 'Enforcement' domain), and the inherent limitations of the proxy measures used for validity.

A more measured and cautious conclusion that accurately reflects the instrument's current stage of development as a promising but preliminary tool.

The revisions to the methods, results, and appendices have enhanced the transparency and reproducibility of the instrument development process. The authors have successfully moderated claims of robustness while highlighting the tool's potential utility for cross-country comparison and policy evaluation.

In summary, the manuscript now presents a well-developed and validated instrument that fills a clear methodological gap in the field. The authors have satisfactorily responded to all points raised, and the study meets the standards for publication in my view.

**********

Do you want your identity to be public for this peer review? If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

**********

Acceptance letter

Muhammad Shahzad Aslam

PONE-D-25-39957R1

PLOS One

Dear Dr. Persaud,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Muhammad Shahzad Aslam

Academic Editor

PLOS One

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Tables and Figures.

    (DOCX)

    pone.0342750.s001.docx (47.7KB, docx)
    S2 File. Ratings by each rater in pilot study.

    (XLSX)

    pone.0342750.s002.xlsx (13KB, xlsx)
    Attachment

    Submitted filename: NEML Review.docx

    pone.0342750.s003.docx (18.2KB, docx)
    Attachment

    Submitted filename: Response To Reviewers_2025_10_16.docx

    pone.0342750.s004.docx (29.4KB, docx)

    Data Availability Statement

    This study was designed as a pilot project. The data used to rate the pilot study countries is available in the WHO repository, where NEMLs for each country are publicly available: https://www.who.int/teams/health-product-policy-and-standards/assistive-and-medical-technology/essential-medicines/national-emls. The data underlying the instrument ratings can be found in the supplementary file. The corresponding author or the Unity Health Toronto Research Ethics Board (researchethics@unityhealth.to) can be contacted if additional data is needed.


    Articles from PLOS One are provided here courtesy of PLOS
