Abstract
Background
The start of the COVID-19 pandemic presented an urgent need for diagnostic decision-making at a time when the evidence was lacking, of low certainty, or constantly changing. Rapid and living guideline development methods were needed and had to be combined with rigorous guideline approaches, such as the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach.
Objectives
To describe the process of developing rapid diagnosis guidelines when the available data are limited and imperfect at a time of crisis.
Sources
A case example from four Infectious Diseases Society of America (IDSA) COVID-19 diagnostic guidelines.
Content
As the world was experiencing the panic of COVID-19, there were serious doubts about the feasibility of following a rigorous guideline development process when timeliness was of extreme value. The IDSA guideline panels, supported by several methodologists, strongly believed that at times of crisis it is more important than ever to follow a rigorous process. The panels adopted a rapid and living systematic review methodology and applied the GRADE approach to four diagnosis guidelines despite the challenges of scarce and dynamic evidence. We describe the methodological details of the rapid and living approach (data extraction, meta-analysis, Evidence to Decision framework, and recommendation development), the challenges of limited resources, scarce evidence, and rapidly changing evidence, as well as ‘wins’ from the IDSA experience.
Implications
Mitigation of pandemics relies on rapid and accurate diagnosis, which is challenged by many knowledge gaps. This necessitates that emerging evidence be rapidly incorporated, in a living fashion, together with several decisional and contextual factors to ensure the best public health strategies and care for patients. This process must be systematic and transparent to produce trustworthy guidelines and should be supported by all stakeholders, including researchers, editors, publishers, professional societies, and policymakers.
Keywords: COVID-19, Crisis, Diagnosis, Evidence, GRADE, Guideline, Living guidelines, Rapid guidelines, Systematic review
Background
The ancient Greek physician Hippocrates once said, “For extreme diseases, extreme methods of cure, as to restriction, are most suitable” [1]. In other words, “desperate times call for desperate measures”. This phrase summarizes the experience that many decision-makers, including policymakers, researchers, and frontline clinicians, faced in the early stages of the COVID-19 pandemic. Efforts to mitigate and manage public health threats such as pandemics, especially those with emerging aetiologies, high transmissibility, and severe health outcomes, necessitate strategies focused on making accurate diagnostic and treatment decisions while dealing with the uncertainties of sparse and rapidly evolving evidence. It is in scenarios such as these that trustworthy guidelines by professional societies are critically needed.
As defined by the Institute of Medicine (now the National Academy of Medicine), clinical practice guidelines (further called guidelines) are “statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options” [2]. Guidelines are intended to support health care professionals and patients in making health-related choices to optimize health outcomes [2] and should follow a rigorous methodology, as summarized by multiple groups in tools such as the Guideline International Network-McMaster guideline development checklist [3] and the AGREE-II instrument [4]. While the typical timeframe for developing a traditional guideline can be 2–3 years [5], in the setting of health emergencies and situations where urgent guidance is needed, such as a pandemic, rapid guidelines are more appropriate. Rapid guidelines provide guidance within shorter timeframes of a few weeks to a few months, depending on the level of urgency and the impact on populations [6]. Rapid guidelines require rapid literature reviews, which have been defined as “a form of knowledge synthesis that accelerates the process of conducting a traditional systematic review through streamlining or omitting a variety of methods to produce evidence for stakeholders in a resource-efficient manner” [7,8]. With the onset of the pandemic, professional health care organizations had to find a way to develop rapid guidelines by utilizing all available resources. Lastly, in cases where the body of evidence is rapidly evolving, living guidelines allow for continual surveillance of new evidence and its incorporation on an ongoing basis, ensuring the best care for patients [9]. This allows for up-to-date guidance and potentially prevents the use of outdated evidence as the basis for recommendations [9,10].
Yet these approaches, rapid and living guidelines, which are informed by rapid and living reviews, are challenging to implement. According to an international survey conducted before the COVID-19 pandemic, guideline developers believed that their organizations did not have adequate resources to develop rigorous guidelines rapidly [11]. Furthermore, living guidelines require the implementation of specific structural and procedural processes (e.g. using dynamic platforms for up-to-date and accessible recommendations) [12]. This challenge may deter guideline developers from using the living approach [13]. Another challenge is the potential risk of producing premature recommendations based on early evidence, which can be biased and is likely to be of low certainty. If future research contradicts the early guidance, this may damage the issuing society's reputation in the health field [14].
In this manuscript, we describe approaches to developing test recommendations in COVID-19 diagnostic guidelines using the best available evidence, which may be imperfect at times, to inform decisions at a time of crisis. We draw from the experience of the Infectious Diseases Society of America (IDSA), which included the development of four rapid and living diagnosis guidelines for molecular, antibody, and antigen testing for COVID-19 [[15], [16], [17], [18]]. We describe the methods, challenges, and wins of adopting this approach. Although organizations other than IDSA may have different established structures and processes for rapid and living guidelines, we believe that there is a level of shared experience that makes these lessons generalizable beyond the scope of IDSA. These lessons would be valuable for systematic reviewers, guideline developers, methodologists, and other stakeholders in guideline development.
Methodological details of the rapid and living approach
The IDSA COVID-19 diagnosis guidelines were developed in a rapid and living manner. Compared with the methodology for developing a regular guideline [3], the steps for developing these guidelines were modified into a design incorporating both rapid and living features. In the rapid/living approach, the conduct of the systematic review is a pivotal step for recommendation development. This process includes framing the question, identifying relevant publications, assessing study quality, summarizing the evidence (data extraction and pooling of results when appropriate), and interpreting the findings [19]. Table 1 provides a list of all questions addressed by the four diagnostic guidelines. Below, we explore the components of the systematic review process and recommendation development that had unique features in the IDSA diagnostic experience.
Table 1. Questions addressed by the four IDSA COVID-19 diagnostic guidelines, grouped as follows: questions addressed in the first IDSA COVID-19 molecular diagnosis guideline (in symptomatic individuals suspected of COVID-19); questions addressed in the IDSA COVID-19 molecular diagnosis guideline update; questions addressed in the IDSA COVID-19 antigen diagnosis guideline; and questions addressed in the IDSA COVID-19 serology diagnosis guideline.
IDSA, Infectious Diseases Society of America; NAAT, nucleic acid amplification test; URTI, upper respiratory tract infection; HCP, healthcare provider; ILI, influenza-like illness; ARDS, acute respiratory distress syndrome; LRTI, lower respiratory tract infection; ICU, intensive care unit; PPE, personal protective equipment; NP, nasopharyngeal.
Data extraction
The data extraction step was iteratively revised and modified across the four diagnostic guidelines. With the emerging and evolving evidence on COVID-19, it was not clear which factors would affect testing accuracy, thus requiring a comprehensive approach. This meant extracting all potentially informative data (e.g. type of swab, swab site, symptomatic status of participants, and order of sample collection). As clinical and methods experts assessed the studies, additional data extraction areas emerged (e.g. the transport medium of the sample, the technique of swab collection, and turnaround time). People's important outcomes that are a consequence of performing a test, such as isolation, quarantine, transmission events, and return to work, were also considered relevant. This required revising the data extraction forms and revisiting all prior studies, because they may have reported this newly relevant information. The result was a living data extraction approach, performed under a tight timeline. Additionally, to allow for efficient data abstraction, a methods expert and a clinical expert reviewed studies, and the main considerations were then discussed among the whole group to ensure consistency.
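To illustrate what such an evolving extraction form might look like, the following is a minimal sketch of a living data-extraction record (our illustration, not the actual IDSA form; all field names are hypothetical). Optional fields mirror how new factors, such as transport medium, were added in later extraction rounds:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AccuracyStudyRecord:
    """One row of a hypothetical living data-extraction form for a
    diagnostic test accuracy study."""
    study_id: str
    index_test: str
    reference_standard: str
    tp: int  # true positives vs. the reference standard
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives
    swab_site: Optional[str] = None          # e.g. 'nasopharyngeal'
    symptomatic: Optional[bool] = None
    transport_medium: Optional[str] = None   # field added in a later round
    turnaround_time_h: Optional[float] = None

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

# Example record; adding a new optional field does not invalidate older rows.
row = AccuracyStudyRecord("study-001", "NAAT", "composite", 45, 2, 5, 148,
                          swab_site="nasopharyngeal", symptomatic=True)
print(f"sensitivity={row.sensitivity:.2f}, specificity={row.specificity:.2f}")
```

Keeping late-added variables optional lets older extraction rows remain valid while prior studies are re-reviewed for the new fields.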
Meta-analysis
A living meta-analysis that rapidly incorporated additional information was of the essence for these guidelines. The constantly evolving data extraction process required a living meta-analysis approach: newer versions of the meta-analysis forest plots incorporated the additionally extracted information. The accumulation of several versions of each meta-analysis required careful version control to avoid accidentally using outdated versions. Another reason for a living meta-analysis was the rapid identification of important subgroups. Examples of subgroups included patient characteristics (e.g. paediatric populations, duration from symptom onset to testing) and test characteristics (e.g. assay technology). Additional analyses were performed during panel meetings with the evidence synthesis team to explore common questions posed by clinicians and patients. By considering only clinically relevant subgroups, we tried to strike a balance and mitigate potentially spurious findings. This dynamic meta-analysis approach was essential for summarizing evidence promptly and moving forward with developing recommendations.
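As a minimal sketch of the kind of pooling such an analysis involves (not the guidelines' actual code; a simplified univariate random-effects model, whereas diagnostic accuracy reviews often use bivariate models), the following pools study sensitivities on the logit scale with DerSimonian-Laird weights:

```python
import numpy as np

def pool_logit(events, totals):
    """Pool a proportion (e.g. sensitivity = TP / (TP + FN)) across studies
    on the logit scale with a DerSimonian-Laird random-effects model."""
    events = np.asarray(events, dtype=float)
    totals = np.asarray(totals, dtype=float)
    p = (events + 0.5) / (totals + 1.0)            # continuity correction
    y = np.log(p / (1.0 - p))                      # logit-transformed proportions
    v = 1.0 / (events + 0.5) + 1.0 / (totals - events + 0.5)  # approx. variances
    w = 1.0 / v                                    # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)             # Cochran's Q heterogeneity
    tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_re = 1.0 / (v + tau2)                        # random-effects weights
    y_re = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))     # back-transform to proportion
    return expit(y_re), (expit(y_re - 1.96 * se), expit(y_re + 1.96 * se))

# Example: pooled sensitivity from three hypothetical studies (TP, TP + FN);
# re-running with newly extracted studies yields the next 'living' version.
sens, (lo, hi) = pool_logit(events=[45, 80, 27], totals=[50, 100, 30])
print(f"pooled sensitivity {sens:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

In a living workflow, each re-run of this script against the updated extraction file would constitute a new analysis version, which is why the version control described above matters.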
Evidence to decision framework and recommendation development
Developing a recommendation incorporates considerations that extend beyond the test accuracy results. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach has an Evidence to Decision (EtD) framework tailored to tests in clinical practice and public health [20].
The GRADE EtD framework summarizes 12 considerations needed for decision-making, which include patients'/public's values and preferences, the balance of desirable and undesirable effects of the test, acceptability, feasibility, equity, and resource utilization [20]. With the lack of evidence around these considerations, we relied on the experience of the clinical experts on the panel to factor them into decision-making. Although the panel considered each of these factors, we did not require separate judgments about each of them, in order to facilitate rapid guideline development.
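As an illustration only (hypothetical judgments, covering just the subset of EtD criteria named above; the full framework has 12 criteria [20]), panel judgments can be captured in a simple structured record so that each recommendation carries an explicit, auditable trail:

```python
from enum import Enum

class Judgment(Enum):
    FAVORS_TESTING = "favors testing"
    FAVORS_NO_TESTING = "favors no testing"
    UNCERTAIN = "uncertain / varies"

# Hypothetical judgments for a subset of GRADE EtD criteria.
etd_judgments = {
    "values_and_preferences": Judgment.UNCERTAIN,
    "balance_of_effects": Judgment.FAVORS_TESTING,
    "acceptability": Judgment.FAVORS_TESTING,
    "feasibility": Judgment.UNCERTAIN,
    "equity": Judgment.UNCERTAIN,
    "resource_use": Judgment.UNCERTAIN,
}

for criterion, judgment in etd_judgments.items():
    print(f"{criterion:24s} -> {judgment.value}")
```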
Challenge of resources
As the world was experiencing the panic of COVID-19, there were serious reservations about following a rigorous guideline development process when timeliness was of extreme value. The belief was that at times like these, it is more important than ever to follow a rigorous process. This meant utilizing all available resources, especially time and human resources.
Time was the most critical element for moving the project forward. The evidence synthesis team and the clinical panel consistently worked after hours. The first guideline iteration was conducted, as part of an urgent response to COVID-19, in a matter of 1–2 weeks [21]. We worked within a tight timeline to develop recommendations without taking critical shortcuts. The evidence team performed the literature search, literature screening, data abstraction, and data analysis, and created evidence profiles, all at a rapid pace.
This required frequent internal meetings within the evidence synthesis team. It also meant frequent meetings between the evidence synthesis team and the clinical panel, which occurred daily in the beginning. Given the urgency of the topic at hand, team members deprioritized other tasks (such as the development of other guidelines or research work) and ‘created’ time to work on these guidelines. The presence of lockdowns at the time might have been an opportunity to ‘create’ this time. However, it is important to note that the feasibility of continuing the work with such intensity had to be addressed, and decisions about decreasing the intensity and changing the timeline had to be made to sustain the process.
Another critical resource was human: a collaborative interdisciplinary team. This included clinical experts, systematic reviewers, methodologists, an information specialist, and a statistical expert. One unique feature was having a high number of experienced methodologists at multiple institutions (specifically the U.S. GRADE Network). All members of this team were in continuous communication. The information specialist collaborated with the systematic reviewers in developing a search strategy and provided frequent search results (weekly, then monthly). The statistical expert continuously updated the analysis according to input from the systematic reviewers and the clinical experts. The methodologists were in contact with all members to ensure rigour and transparency. Although a similar team would be required for any guideline, this very close communication and collaboration for developing the rapid and living guidelines was both a challenge and a necessity. It was achieved by tapping into existing collaborations and utilizing a model of active capacity building that included different levels of expertise within the evidence synthesis team. This was essential to avoid ‘burn out’ among the team, and some team members were more actively involved in some parts of the living guideline update than others. Given the size of an already large interdisciplinary team and the need for rapid guidance, this effort did not formally involve patient/consumer/public representatives as panellists in the development of the guidelines. Instead, we incorporated the patients'/public's values and preferences through the clinical experts' experience. Additionally, patients' views were incorporated as part of a comprehensive public and organizational review process.
Challenge of scarce evidence
The paucity of evidence was one of the biggest challenges of these rapid and living guidelines. With such scarcity, any available evidence was better than none. This meant including preprints that had not undergone the peer-review process and searching package inserts and manufacturer websites for diagnostic test accuracy outcomes. It also meant including potentially underpowered studies with a sample size of at least 30, accepting almost any test that was labelled as the reference standard in a study, and accepting any diagnostic test without restricting to tests that had received Emergency Use Authorization or European Commission approval.
Because evidence was limited, additional quality assurance steps were taken. First, the clinical panel confirmed the data abstraction done by the evidence synthesis team for select studies. This allowed us to identify additional data of clinical relevance (e.g. transport medium of the sample, technique of swab collection, and turnaround time), which in turn had implications for the meta-analysis forest plots, which needed to be modified accordingly. Second, in carefully reviewing the limited evidence, it became evident that the testing settings differed across studies in ways that had implications for the results. This variability included differences in illness severity, the timing of testing relative to exposure or symptom onset, site of the swab sample, training of the collector, assay characteristics (e.g. point of care or laboratory-based), host factors (e.g. age, immune status, and hospitalization status), and viral factors (e.g. viral load and variant of concern). This heterogeneity factored into the inconsistency in sensitivity and specificity results across studies. Another observation with the early evidence was that most studies evaluated diagnostic tests from a single manufacturer. This raised questions about the directness (applicability) of the evidence to the prioritized question, which addressed testing from all manufacturers, and about potential publication bias.
Challenge of rapidly changing evidence
The changing nature of the evidence is important to highlight; we were chasing a moving target. COVID-19 prevalence was not well defined in the beginning, showed considerable geographical differences, and changed during surges and outbreaks and in relation to policy mandates (e.g. social distancing and masking requirements). The pretest probability, on the other hand, varied across symptomatic and asymptomatic individuals, those with known exposure (depending on the type of exposure), and, later in the pandemic, by vaccination status. The evidence was not clear in distinguishing these differences. We therefore considered multiple prevalence values and pretest probabilities for each recommendation, which was important for guiding recommendations that vary by prevalence or pretest probability.
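To see why pretest probability matters so much, consider a minimal worked sketch (illustrative accuracy and pretest values, not numbers from the guidelines) of the expected outcomes per 1000 people tested:

```python
def outcomes_per_1000(pretest: float, sensitivity: float, specificity: float) -> dict:
    """Expected test outcomes per 1000 people tested for a given
    pretest probability and test accuracy (illustrative numbers only)."""
    diseased = 1000 * pretest
    healthy = 1000 - diseased
    return {
        "TP": diseased * sensitivity,
        "FN": diseased * (1 - sensitivity),    # missed cases
        "TN": healthy * specificity,
        "FP": healthy * (1 - specificity),     # false alarms
    }

# The same hypothetical test (85% sensitivity, 99% specificity) has very
# different consequences across plausible pretest probabilities:
for pretest in (0.01, 0.10, 0.50):
    r = outcomes_per_1000(pretest, sensitivity=0.85, specificity=0.99)
    print(f"pretest {pretest:>4.0%}: FN={r['FN']:5.1f}/1000  FP={r['FP']:5.1f}/1000")
```

At a 1% pretest probability, false positives dominate (about 10/1000 vs. 1.5/1000 missed cases); at 50%, missed cases dominate (75/1000), which is why the same test can support different recommendations in different settings.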
With the clear evidence gap, the panel had to make assumptions, particularly around setting a decision threshold for acceptable test accuracy. Setting such a threshold is based on formal or informal modelling of direct and critical people's important outcomes, including natural progression, treatment effects, and people's values, which changed as additional public health measures and information became available [22]. With the lack of evidence linking diagnostic testing to people's important outcomes, it was necessary for the panel to make their assumptions explicit. We applied informal modelling based on the experience of the clinical experts on the panel. During the living process, judgments around the decision threshold changed considerably. In the early stages of the pandemic, when there were higher hospitalization and mortality rates per infection, worse disease severity, no effective treatments, and no vaccine available, a more stringent and conservative test decision threshold was used. For example, the decision threshold was set at 1–2% (10–20/1000) false negatives (FNs), meaning that only tests with an FN rate below 1–2% (10–20/1000) would be considered sufficiently accurate. Later in the pandemic, when some treatments proved efficacious and the disease became less severe with the vaccine and some newer variants, this decision threshold became 20%, or 200/1000. People's important outcome thresholds remained the same, but, because of the change in the linked evidence and the consequences of test results, the test decision threshold(s) became less stringent at later stages of the pandemic, and the panel was willing to accept higher thresholds for FN values.
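Under one simplified reading of that process (treating the threshold as expected false negatives per 1000 people tested, with hypothetical accuracy and pretest values), the threshold check reduces to a one-line computation:

```python
def fn_per_1000(pretest: float, sensitivity: float) -> float:
    """Expected false negatives per 1000 people tested."""
    return 1000 * pretest * (1 - sensitivity)

# Hypothetical test: 85% sensitivity at a 20% pretest probability.
fn = fn_per_1000(pretest=0.20, sensitivity=0.85)   # -> 30.0 FN per 1000
print(fn <= 20)    # False: fails the early-pandemic threshold (10-20/1000)
print(fn <= 200)   # True:  passes the later threshold (200/1000)
```

The same test accuracy evidence can thus flip from unacceptable to acceptable purely because the decision threshold, not the evidence, changed.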
‘Wins’ from the IDSA experience
There were many points that we considered ‘wins’ from the IDSA diagnostic guidelines, several of which became feasible as additional evidence became available. Early in the pandemic, the diagnosis panel had concerns about the different ‘reference standards’ that studies reported, because these could be a reason for downgrading our certainty about the performance of a test. The panel published what they believed to be an appropriate hierarchy of reference standards. Early in the process, we accepted any reference test used, but as more studies reported on additional reference standards, we were able to categorize studies by reference standard and, when feasible, focus on the studies with better reference standards and higher certainty of evidence. Take, for example, the accuracy of the standard laboratory-based nucleic acid amplification test (NAAT) molecular test, which relies on RT-PCR technology. With additional emerging studies, direct comparisons of molecular tests against a composite reference standard became available. Such a reference standard provides better results than using one test as the index test and the other as the reference.
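For illustration, a minimal sketch of classifying index test results against a composite reference standard (here an ‘any positive’ rule across component tests; the rule and data are hypothetical, not the hierarchy the panel published):

```python
def composite_reference(component_results: list[bool]) -> bool:
    """'Any positive' composite rule across component reference tests."""
    return any(component_results)

# (index test result, [reference test A, reference test B]) per sample
samples = [
    (True,  [True, False]),   # index positive, composite positive -> TP
    (False, [False, False]),  # index negative, composite negative -> TN
    (False, [False, True]),   # index negative, composite positive -> FN
]
tp = sum(idx and composite_reference(refs) for idx, refs in samples)
fn = sum(not idx and composite_reference(refs) for idx, refs in samples)
print(f"TP={tp}, FN={fn}")  # -> TP=1, FN=1
```

The third sample shows the point of a composite standard: a case missed by one component test (and by the index test) is still counted as diseased, rather than being misclassified because a single imperfect test served as the reference.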
Other ‘wins’ were the ability to display recommendations visually in a decision tree and the restriction to higher-quality evidence. As additional evidence emerged and more recommendations were developed, we were able to present them using user-friendly infographics. This was an essential piece for the dissemination and uptake of the recommendations by frontline clinicians. We disseminated the decision trees and relevant recommendations both on the IDSA website and as a peer-reviewed publication. In addition, the emerging evidence allowed restricting the evidence base to peer-reviewed publications of diagnostic tests with Emergency Use Authorization or European Commission approval. This was accompanied by a change in literature search frequency to once monthly. These modifications alleviated the burden on the information specialist and the evidence synthesis team.
Finally, the longitudinal dimension of these living guidelines allowed for noteworthy methodologic observations. Decision thresholds changed over time between guideline update iterations because of improved outcomes, better treatments, wider vaccination, and less severe variants. As noted above, values regarding people's important outcome thresholds remained the same, but, because of the change in the linked evidence, the test decision threshold(s) became less stringent at later stages of the pandemic, and the panel was willing to accept a higher threshold for FNs.
Conclusions
When evidence is rapidly emerging and changing, it is critical to adopt the rapid and living approach so that guidance can be rapidly updated with newly available, potentially consequential evidence. At the same time, in those situations, it is important to maintain a rigorous and transparent method for recommendation development. This transparency is crucial because the recommendations have implications for clinical practice and public health decision-making at the individual and societal levels, respectively. Transparency includes highlighting when recommendations are based on low-certainty evidence and documenting major evidence gaps to help guide future research. The diagnostic guidelines were essential in highlighting the evidence gaps around diagnostic tests for COVID-19.
Author contributions
RAM conceptualized the paper and provided the initial outline of the manuscript. IKE contributed to the writing and editing of the text. All authors provided input and supporting text and approved the final manuscript.
Transparency declaration
R.A.M., R.L.M., Y.F.Y., S.S., and H.M. are co-founders of the U.S. GRADE Network and the Evidence Foundation and are methodologists collaborating on the IDSA COVID-19 guidelines. The perspective described in this article is that of the authors.
Funding
This narrative review was written without funding.
Editor: L. Mariska
References
- 1. Hippocrates. The aphorisms of Hippocrates. New York: Collins & Co.; 1817.
- 2. Graham R., Mancher M., Miller Wolman D., Greenfield S., Steinberg E., editors; Institute of Medicine Committee on Standards for Developing Trustworthy Clinical Practice Guidelines. Clinical practice guidelines we can trust. Washington (DC): National Academies Press (US); 2011.
- 3. Schünemann H.J., Wiercioch W., Etxeandia I., Falavigna M., Santesso N., Mustafa R., et al. Guidelines 2.0: systematic development of a comprehensive checklist for a successful guideline enterprise. CMAJ. 2014;186:E123–E142. doi: 10.1503/cmaj.131237.
- 4. Brouwers M.C., Kho M.E., Browman G.P., Burgers J.S., Cluzeau F., Feder G., et al. AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ. 2010;182:E839–E842. doi: 10.1503/cmaj.090449.
- 5. World Health Organization. WHO handbook for guideline development. 2nd ed. Geneva: World Health Organization; 2014.
- 6. Thayer K.A., Schünemann H.J. Using GRADE to respond to health questions with different levels of urgency. Environ Int. 2016;92–93:585–589. doi: 10.1016/j.envint.2016.03.027.
- 7. Garritty C., Gartlehner G., Nussbaumer-Streit B., King V.J., Hamel C., Kamel C., et al. Cochrane Rapid Reviews Methods Group offers evidence-informed guidance to conduct rapid reviews. J Clin Epidemiol. 2021;130:13–22. doi: 10.1016/j.jclinepi.2020.10.007.
- 8. Hamel C., Michaud A., Thuku M., Skidmore B., Stevens A., Nussbaumer-Streit B., et al. Defining rapid reviews: a systematic scoping review and thematic analysis of definitions and defining characteristics of rapid reviews. J Clin Epidemiol. 2021;129:74–85. doi: 10.1016/j.jclinepi.2020.09.041.
- 9. Akl E.A., Meerpohl J.J., Elliott J., Kahale L.A., Schünemann H.J. Living systematic reviews: 4. Living guideline recommendations. J Clin Epidemiol. 2017;91:47–53. doi: 10.1016/j.jclinepi.2017.08.009.
- 10. Vernooij R.W.M., Sanabria A.J., Solà I., Alonso-Coello P., Martínez García L. Guidance for updating clinical practice guidelines: a systematic review of methodological handbooks. Implement Sci. 2014;9:3. doi: 10.1186/1748-5908-9-3.
- 11. Sultan S., Siedler M.R., Morgan R.L., Ogunremi T., Dahm P., Fatheree L.A., et al. An international needs assessment survey of guideline developers demonstrates variability in resources and challenges to collaboration between organizations. J Gen Intern Med. 2022;37:2669–2677. doi: 10.1007/s11606-021-07112-w.
- 12. El Mikati I.K., Khabsa J., Harb T., Khamis M., Agarwal A., Pardo-Hernandez H., et al. A framework for the development of living practice guidelines in health care. Ann Intern Med. 2022;175:1154–1160. doi: 10.7326/M22-0514.
- 13. Alonso-Coello P., Martínez García L., Carrasco J.M., Solà I., Qureshi S., Burgers J.S., et al. The updating of clinical practice guidelines: insights from an international survey. Implement Sci. 2011;6:107. doi: 10.1186/1748-5908-6-107.
- 14. Alahdab F., Farah W., Almasri J., Barrionuevo P., Zaiem F., Benkhadra R., et al. Treatment effect in earlier trials of patients with chronic medical conditions: a meta-epidemiologic study. Mayo Clin Proc. 2018;93:278–283. doi: 10.1016/j.mayocp.2017.10.020.
- 15. Hanson K.E., Altayar O., Caliendo A.M., Arias C.A., Englund J.A., Hayden M.K., et al. The Infectious Diseases Society of America guidelines on the diagnosis of COVID-19: antigen testing. Clin Infect Dis. 2021. doi: 10.1093/cid/ciab557.
- 16. Hanson K.E., Caliendo A.M., Arias C.A., Englund J.A., Hayden M.K., Lee M.J., et al. Infectious Diseases Society of America guidelines on the diagnosis of COVID-19: serologic testing. Clin Infect Dis. 2020:ciaa1343. doi: 10.1093/cid/ciaa1343.
- 17. Hanson K.E., Caliendo A.M., Arias C.A., Hayden M.K., Englund J.A., Lee M.J., et al. The Infectious Diseases Society of America guidelines on the diagnosis of COVID-19: molecular diagnostic testing. Clin Infect Dis. 2021:ciab048. doi: 10.1093/cid/ciab048.
- 18. Hanson K.E., Caliendo A.M., Arias C.A., Englund J.A., Lee M.J., Loeb M., et al. Infectious Diseases Society of America guidelines on the diagnosis of COVID-19. Clin Infect Dis. 2020. doi: 10.1093/cid/ciaa760.
- 19. Khan K.S., Kunz R., Kleijnen J., Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96:118–121. doi: 10.1258/jrsm.96.3.118.
- 20. Schünemann H.J., Mustafa R., Brozek J., Santesso N., Alonso-Coello P., Guyatt G., et al. GRADE guidelines: 16. GRADE evidence to decision frameworks for tests in clinical practice and public health. J Clin Epidemiol. 2016;76:89–98. doi: 10.1016/j.jclinepi.2016.01.032.
- 21. Akl E.A., Morgan R.L., Rooney A.A., Beverly B., Katikireddi S.V., Agarwal A., et al. Developing trustworthy recommendations as part of an urgent response (1–2 weeks): a GRADE concept paper. J Clin Epidemiol. 2021;129:1–11. doi: 10.1016/j.jclinepi.2020.09.037.
- 22. Schünemann H.J., Mustafa R.A., Brozek J., Santesso N., Bossuyt P.M., Steingart K.R., et al. GRADE guidelines: 22. The GRADE approach for tests and strategies-from test accuracy to patient-important outcomes and recommendations. J Clin Epidemiol. 2019;111:69–82. doi: 10.1016/j.jclinepi.2019.02.003.