Abstract
Efforts aimed at increasing the pace of evidence synthesis have been primarily focused on the use of published articles, but these are a relatively delayed, incomplete, and at times biased source of study results data. Compared to those in bibliographic databases, structured results data available in trial registries may be more timely, complete, and accessible, but these data remain underutilized. Key advantages of using structured results data include the potential to automatically monitor the accumulation of relevant evidence and use it to signal when a systematic review requires updating, as well as to prospectively assign trials to already published reviews. Shifting focus to emerging sources of structured trial data may provide the impetus to build a more proactive and efficient system of continuous evidence surveillance.
INTRODUCTION
Systematic reviews have the potential to provide the highest level of evidence to support diagnostic and therapeutic decisions in clinical practice. These reviews efficiently integrate vast amounts of biomedical research to support rational decision-making in developing practice guidelines, identifying research priorities, performing cost analyses, and formulating health policies. However, the production of systematic reviews remains entrenched in inadequate and time-consuming workflows that threaten to limit their value.
One fundamental challenge is that systematic reviews rely primarily on bibliographic databases, which are an incomplete and at times biased source of clinical evidence. Another limitation is a lack of empirical approaches for selecting which reviews to conduct and when. While there is guidance to determine when a systematic review update is warranted,1 the field remains defined by a lack of collaborative approaches and a wide range of motivations for publishing systematic reviews, leading to an abundance of reviews that are unnecessary or redundant.2
Systematic reviews are also enormously resource intensive and have been estimated to take more than 60 weeks to complete.3,4 Approximately a quarter of this effort is spent on searching and screening records in bibliographic databases and another quarter on extracting and collating information from articles.4 Recognizing screening and extraction as familiar problems from other domains, experts in information retrieval and machine learning have set out to automate these manual processes. More than 36 machine learning methods have been developed to tackle the screening task alone,5,6 and a range of solutions for extracting information from abstracts and full-text articles have been developed.7–9
However, rather than pursuing methods to incrementally improve traditional systematic review processes, we think it is time to reconsider the fundamentals of evidence synthesis. Access to different forms of computable trial data is increasingly available and may enable a shift away from time-consuming screening of trial publications to proactive surveillance of clinical trials and evidence accumulation to inform prioritization of systematic review activities. Here we 1) discuss the value of using structured trial data as the basis for evidence surveillance, 2) assess feasibility and gaps to be addressed, and 3) recommend next steps.
HOW CAN STRUCTURED STUDY DATA IMPROVE EVIDENCE SYNTHESIS?
Direct advantages of structured trial data include improvements in efficiency and scalability related to automation of data extraction and analysis. In the traditional life cycle of a trial, information on trial design and conduct became available, together with results, only through publication in journal articles. However, data curation from published articles is time-consuming, requires extensive subject-matter expertise, and cannot yet be effectively mimicked by machines. In the modern trial life cycle, detailed information on a trial’s clinical question and intervention is available via prespecified data elements in trial registries, and results data are presented in standardized, structured formats with consistent ontologies. This infrastructure can substantially simplify trial screening, extraction, and synthesis, and make systematic review processes amenable to automation.
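To make this concrete, the minimal sketch below shows one way a computable trial record could combine prespecified registration fields with structured summary results. The schema and field names are illustrative assumptions for this commentary, not an established standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative (hypothetical) schema for a computable trial record that combines
# prespecified registration fields with structured summary results.

@dataclass
class OutcomeResult:
    measure: str                        # eg "HbA1c change from baseline at 26 weeks"
    arm: str                            # intervention or comparator arm label
    n: int                              # participants analyzed
    estimate: float                     # point estimate (eg mean difference)
    dispersion: Optional[float] = None  # eg standard deviation or CI half-width

@dataclass
class TrialRecord:
    registry_id: str              # eg an NCT number
    condition: str                # clinical condition under study
    interventions: List[str]      # prespecified interventions
    primary_outcomes: List[str]   # prespecified outcome measures
    status: str                   # eg "Recruiting", "Completed"
    results: List[OutcomeResult] = field(default_factory=list)

    def has_results(self) -> bool:
        """True once structured summary results have been posted."""
        return len(self.results) > 0
```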
Structured data sources also have the potential to improve the integrity of evidence synthesis by reducing the impact of reporting biases. For example, registries and data-sharing platforms may provide records for studies that were not published and would otherwise have been inaccessible to reviewers. Results in structured databases also tend to be more comprehensive than those reported in focused trial publications, which reduces biases introduced by limited or selective reporting of results, especially for adverse events.10–12
The ability to track trials as they are registered, completed, and reported presents a new opportunity to monitor the accumulation of evidence relative to published systematic reviews. Traditionally, systematic reviews have relied on reactively searching for and screening published articles related to a given clinical topic. Using data from the clinical questions described in trial registrations, algorithms could be developed to detect relevant trials as they are registered and proactively assign them to systematic reviews. As the trials are completed and results made available, an alert could be triggered indicating to the systematic review community that a systematic review merits examination for a potential update. This approach would reduce redundant and unnecessary systematic reviews by focusing human effort on updating systematic reviews where the accumulation of new results data suggests that a change in clinical practice or policy is most likely warranted.
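The sketch below illustrates how such surveillance logic might work, assuming the illustrative trial record structure sketched earlier and a simple keyword-overlap matching rule. Published approaches use trained classifiers and richer trial and review representations, so this is an illustration of the workflow rather than a proposed method.

```python
# Minimal sketch of proactive surveillance: assign newly registered trials to
# reviews by keyword overlap on condition and interventions, then flag a review
# for a potential update once enough assigned trials have posted results.
# The Review representation, matching rule, and threshold are illustrative.

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Review:
    review_id: str
    condition_terms: Set[str]
    intervention_terms: Set[str]
    assigned_trials: List["TrialRecord"] = field(default_factory=list)

def matches(review: Review, trial: "TrialRecord") -> bool:
    """Crude relevance check: condition and intervention terms both appear."""
    text = " ".join([trial.condition] + trial.interventions).lower()
    return (any(term in text for term in review.condition_terms)
            and any(term in text for term in review.intervention_terms))

def assign_new_trial(trial: "TrialRecord", reviews: List[Review]) -> None:
    """Prospectively attach a newly registered trial to every matching review."""
    for review in reviews:
        if matches(review, trial):
            review.assigned_trials.append(trial)

def reviews_needing_update(reviews: List[Review], min_new_results: int = 2) -> List[str]:
    """Trigger an alert when several assigned trials have posted results."""
    return [
        review.review_id
        for review in reviews
        if sum(1 for t in review.assigned_trials if t.has_results()) >= min_new_results
    ]
```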
The proposed approach could work in conjunction with incentives that promote uptake of such a prioritization system to inform review activities. For example, there could be targeted funding opportunities for prioritized clinical topics. Incentives could also derive from journal editors using the system to assess whether submitted systematic review manuscripts are addressing priority topics.
WHAT ARE THE CURRENT LIMITATIONS?
The Global Trial Bank for reporting structured and computable versions of clinical trial data and results was developed as early as 2005 by Sim and Detmer.13 While this repository was ahead of its time when proposed, structured and computable results data are now increasingly released through a variety of platforms, including public registries, company-based platforms, and patient-level data brokerage initiatives.
The largest registry is the US-based ClinicalTrials.gov, which provides access to structured trial data and results. It currently holds records for 252 248 interventional studies and structured summary results for 37 100 of these, involving more than 13.9 million participants. Where ClinicalTrials.gov provides structured summary data for a substantial and growing proportion of all registered trials, other initiatives provide more detailed information, including patient-level data, but for smaller numbers of trials. Examples include platforms such as the YODA Project,14 which was initiated in 2011 and provides access to patient-level data for 350 trials. Vivli,15 a similar data-sharing program, was launched in 2018 to facilitate sharing of patient-level data and currently manages approximately 4700 trials from over 20 sponsors.
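As an illustration of how such structured data can be retrieved programmatically, the sketch below queries the public ClinicalTrials.gov JSON API (version 2) for trials matching a condition and intervention and notes which have posted structured results. The endpoint, parameter names, and field names reflect the API documentation at the time of writing and should be verified against the current schema; the condition and intervention used in the example are arbitrary.

```python
# Sketch of pulling structured registration and results-availability data from
# ClinicalTrials.gov, assuming its public JSON API (v2).

import requests

API_URL = "https://clinicaltrials.gov/api/v2/studies"

def fetch_trials(condition: str, intervention: str, page_size: int = 50) -> list:
    """Return lightweight records for registered trials matching the query."""
    resp = requests.get(
        API_URL,
        params={
            "query.cond": condition,
            "query.intr": intervention,
            "pageSize": page_size,
        },
        timeout=30,
    )
    resp.raise_for_status()
    records = []
    for study in resp.json().get("studies", []):
        ident = study.get("protocolSection", {}).get("identificationModule", {})
        status = study.get("protocolSection", {}).get("statusModule", {})
        records.append({
            "nct_id": ident.get("nctId"),
            "title": ident.get("briefTitle"),
            "overall_status": status.get("overallStatus"),
            "has_results": study.get("hasResults", False),
        })
    return records

# Example: registered trials of metformin in type 2 diabetes with posted results
with_results = [r for r in fetch_trials("type 2 diabetes", "metformin") if r["has_results"]]
```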
An important current limitation is that access to structured and computable trial data is still far from universal, making it difficult to acquire the requisite data across a cohort of trials. However, access to trial data is increasing at a rapid pace, in part due to initiatives promoting open science and transparency as well as federal mandates requiring prospective trial registration and timely structured results reporting. In the US, these represent sweeping changes that affect all trials funded by the National Institutes of Health and those conducted for products regulated by the Food and Drug Administration. Additionally, the International Committee of Medical Journal Editors has committed to publishing only trials that are compliant with prospective registration practices.16,17 Because of these changes, approximately half of all published trials are now registered,18 and structured trial results are often made available on ClinicalTrials.gov around the same time as the corresponding journal publications.
Another limitation stems from the decentralized development of trial data platforms, which has hindered harmonization of data collection and makes it difficult to link data across sources. Platform structures and data reporting vary, with results data ranging from individual patient-level data to aggregated summary findings. Many platforms do not include all the elements needed to appraise the quality of relevant studies or lack the data standards to support consistent and efficient data extraction and synthesis across trials. In addition, many of the links are missing from the “information scaffolding” that is meant to connect trial information across registries and bibliographic databases,18–20 and create unified datasets within a larger trial reporting system framework.21
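One strand of this scaffolding can be illustrated with a sketch that looks up journal articles linked to a registry record by searching PubMed for the trial's registry identifier via the NCBI E-utilities. This assumes that linked NCT numbers are indexed in PubMed's secondary source ID ([si]) field; in practice many such links are missing, which is precisely the gap described above, and the identifier in the usage comment is purely illustrative.

```python
# Sketch of registry-to-literature linkage: find PubMed records that reference
# a trial's NCT number, using the NCBI E-utilities esearch endpoint.

import requests

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_ids_for_trial(nct_id: str) -> list:
    """Return PubMed IDs whose secondary source ID field contains the NCT number."""
    resp = requests.get(
        EUTILS_ESEARCH,
        params={"db": "pubmed", "term": f"{nct_id}[si]", "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("esearchresult", {}).get("idlist", [])

# Example usage with an illustrative identifier:
# pmids = pubmed_ids_for_trial("NCT01234567")
```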
A related issue is that the proposed system will continue to depend on the integrity and quality of the underlying trial data as well as on compliance by investigators with trial reporting procedures. Frequent problems identified by ClinicalTrials.gov staff during quality-control review of submitted trial information include inconsistent and incomplete data entry, such as internally inconsistent participant counts, and a lack of specificity for outcome measures.22 Overall, industry sponsors consistently surpass academic medical centers in meeting reporting requirements,23,24 pointing to the need for cultural and organizational change to ensure universally high-quality and timely availability of trial data.
WHAT NEEDS TO BE DONE NEXT?
To enable more efficient, complete, and unbiased evidence synthesis, we need to create an environment where use of computable trial results can augment or replace current practices without introducing substantial new time and resource costs. We recommend that informatics-based efforts take advantage of the changing landscape by developing methods to improve and evaluate standards for structured results reporting. Greater standardization would improve interoperability and provide the backbone for an ecosystem of software tools that could be built to support and automate evidence synthesis tasks.25 To the extent that investigators are unlikely to adhere to a single standard, tools will also be needed to map heterogeneous trial descriptions into standardized and computable forms.
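As a toy illustration of such a mapping tool, the sketch below normalizes free-text outcome labels to standardized codes using a hand-curated synonym table. The vocabulary and target codes are purely illustrative; a real system would draw on controlled terminologies and learned mappings rather than a dictionary.

```python
# Toy sketch: map heterogeneous outcome descriptions to a standardized,
# computable form via a synonym lookup. Entries are illustrative only.

OUTCOME_SYNONYMS = {
    "all-cause mortality": "mortality_all_cause",
    "death from any cause": "mortality_all_cause",
    "hba1c": "glycated_hemoglobin",
    "glycated haemoglobin": "glycated_hemoglobin",
}

def normalize_outcome(raw_label: str) -> str:
    """Map a free-text outcome label to a standardized code, if one is known."""
    key = raw_label.strip().lower()
    for synonym, code in OUTCOME_SYNONYMS.items():
        if synonym in key:
            return code
    return "unmapped:" + key

# Example: two differently worded reports of the same outcome map to one code
assert normalize_outcome("All-cause mortality at 12 months") == "mortality_all_cause"
assert normalize_outcome("Death from any cause") == "mortality_all_cause"
```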
To avoid unnecessary duplication of effort, transparency around systematic review projects and the democratization of evidence synthesis must also increase. Despite the substantial gains in transparency for the registration and reporting of trials that have occurred over the last 20 years, systematic reviews are still largely performed in isolation with few advances in cooperative processes. Encouraging advances include the adoption of PROSPERO by the systematic review community to prospectively register review protocols in a public and searchable format. Other innovative projects include the application of crowdsourcing approaches for evidence synthesis tasks26 and the development of novel methods connecting external data sources to bibliographic databases to train classifiers for a broad range of systematic review processes.27–29 We have also released, and continue to develop and improve, a repository of structured systematic review data designed to support assessments of new evidence and decisions around whether a systematic review requires updating.30
If trials were registered and tracked with evidence synthesis in mind, they could be linked to clinical questions and even to systematic review protocols at the time of submission. In the current system, it can take years before a relevant trial appears in a systematic review. By addressing the need for evidence synthesis at the inception of a trial, not only would the arduous tasks of trial searching and screening be reduced, but a more efficient pipeline could be built in which trial results are seamlessly assigned to pertinent reviews, signaling to the systematic review community that an update may be required as soon as a trial is completed and its computable results are made available.
Despite major advances in access to clinical study data over the last decade, evidence synthesis technologies have remained too heavily focused on reactive screening of bibliographic databases. Coordinated efforts are needed to drive the harmonization of trial reporting and to build a collaboratively optimized evidence surveillance system that supports monitoring of trial activity, prioritization of systematic reviews, and automation of results synthesis.
FUNDING
This work was supported by the National Library of Medicine, grant number R01LM012976.
CONFLICT OF INTEREST STATEMENT
None declared.
AUTHOR CONTRIBUTIONS
AD and FB contributed to the conception of the work and the drafting and critical revision of the manuscript. Both authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
REFERENCES
- 1. Garner P, Hopewell S, Chandler J, et al. When and how to update systematic reviews: consensus and checklist. BMJ 2016; 354: i3507. doi: 10.1136/bmj.i3507.
- 2. Page MJ, Moher D. Mass production of systematic reviews and meta-analyses: an exercise in mega-silliness? Milbank Q 2016; 94 (3): 515–9. doi: 10.1111/1468-0009.12211.
- 3. Borah R, Brown AW, Capers PL, Kaiser KA. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 2017; 7 (2): e012545. doi: 10.1136/bmjopen-2016-012545.
- 4. Pham B, Bagheri E, Rios P, et al. Improving the conduct of systematic reviews: a process mining perspective. J Clin Epidemiol 2018; 103: 101–11. doi: 10.1016/j.jclinepi.2018.06.011.
- 5. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 2015; 4 (1): 5. doi: 10.1186/2046-4053-4-5.
- 6. Shekelle PG, Shetty K, Newberry S, Maglione M, Motala A. Machine learning versus standard techniques for updating searches for systematic reviews: a diagnostic accuracy study. Ann Intern Med 2017; 167 (3): 213–5. doi: 10.7326/l17-0124.
- 7. de Bruijn B, Carini S, Kiritchenko S, Martin J, Sim I. Automated information extraction of key trial design elements from clinical trial publications. AMIA Annu Symp Proc 2008; 2008: 141–5.
- 8. Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak 2010; 10 (1): 56. doi: 10.1186/1472-6947-10-56.
- 9. Marshall IJ, Kuiper J, Wallace BC. Automating risk of bias assessment for clinical trials. IEEE J Biomed Health Inform 2015; 19 (4): 1406–12.
- 10. Riveros C, Dechartres A, Perrodeau E, Haneef R, Boutron I, Ravaud P. Timing and completeness of trial results posted at ClinicalTrials.gov and published in journals. PLOS Med 2013; 10 (12): e1001566. doi: 10.1371/journal.pmed.1001566.
- 11. Hartung DM, Zarin DA, Guise JM, McDonagh M, Paynter R, Helfand M. Reporting discrepancies between the ClinicalTrials.gov results database and peer-reviewed publications. Ann Intern Med 2014; 160 (7): 477–83. doi: 10.7326/m13-0480.
- 12. Tang E, Ravaud P, Riveros C, Perrodeau E, Dechartres A. Comparison of serious adverse events posted at ClinicalTrials.gov and published in corresponding journal articles. BMC Med 2015; 13 (1): 189. doi: 10.1186/s12916-015-0430-4.
- 13. Sim I, Detmer DE. Beyond trial registration: a global trial bank for clinical trial reporting. PLOS Med 2005; 2 (11): e365. doi: 10.1371/journal.pmed.0020365.
- 14. Ross JS, Waldstreicher J, Bamford S, et al. Overview and experience of the YODA Project with clinical trial data sharing after 5 years. Sci Data 2018; 5 (1): 180268. doi: 10.1038/sdata.2018.268.
- 15. Bierer BE, Li R, Barnes M, Sim I. A global, neutral platform for sharing trial data. N Engl J Med 2016; 374 (25): 2411–3.
- 16. Zarin DA, Tse T, Williams RJ, Carr S. Trial reporting in ClinicalTrials.gov—the final rule. N Engl J Med 2016; 375 (20): 1998–2004. doi: 10.1056/NEJMsr1611785.
- 17. Zarin DA, Tse T, Williams RJ, Rajakannan T. Update on trial registration 11 years after the ICMJE policy was established. N Engl J Med 2017; 376 (4): 383–91. doi: 10.1056/NEJMsr1601330.
- 18. Trinquart L, Dunn AG, Bourgeois FT. Registration of published randomized trials: a systematic review and meta-analysis. BMC Med 2018; 16 (1): 173. doi: 10.1186/s12916-018-1168-6.
- 19. Huser V, Cimino JJ. Linking ClinicalTrials.gov and PubMed to track results of interventional human clinical trials. PLOS One 2013; 8 (7): e68409. doi: 10.1371/journal.pone.0068409.
- 20. Huser V, Cimino JJ. Evaluating adherence to the International Committee of Medical Journal Editors’ policy of mandatory, timely clinical trial registration. J Am Med Inform Assoc 2013; 20 (e1): e169–74. doi: 10.1136/amiajnl-2012-001501.
- 21. Zarin DA, Tse T. Sharing Individual Participant Data (IPD) within the context of the Trial Reporting System (TRS). PLOS Med 2016; 13 (1): e1001946. doi: 10.1371/journal.pmed.1001946.
- 22. Zarin DA. The culture of trial results reporting at academic medical centers. JAMA Intern Med 2020; 180 (2): 319–20. doi: 10.1001/jamainternmed.2019.4200.
- 23. Anderson ML, Chiswell K, Peterson ED, Tasneem A, Topping J, Califf RM. Compliance with results reporting at ClinicalTrials.gov. N Engl J Med 2015; 372 (11): 1031–9. doi: 10.1056/NEJMsa1409364.
- 24. Gopal AD, Wallach JD, Aminawung JA, et al. Adherence to the International Committee of Medical Journal Editors’ (ICMJE) prospective registration policy and implications for outcome integrity: a cross-sectional analysis of trials published in high-impact specialty society journals. Trials 2018; 19 (1): 448. doi: 10.1186/s13063-018-2825-y.
- 25. Dunn AG, Day RO, Mandl KD, Coiera E. Learning from hackers: open-source clinical trials. Sci Transl Med 2012; 4 (132): 132cm5.
- 26. Mortensen ML, Adam GP, Trikalinos TA, Kraska T, Wallace BC. An exploration of crowdsourcing citation screening for systematic reviews. Res Synth Methods 2017; 8 (3): 366–86. doi: 10.1002/jrsm.1252.
- 27. Bashir R, Surian D, Dunn AG. The risk of conclusion change in systematic review updates can be estimated by learning from a database of published examples. J Clin Epidemiol 2019; 110: 42–9. doi: 10.1016/j.jclinepi.2019.02.015.
- 28. Surian D, Dunn AG, Orenstein L, Bashir R, Coiera E, Bourgeois FT. A shared latent space matrix factorisation method for recommending new trial evidence for systematic review updates. J Biomed Inform 2018; 79: 32–40. doi: 10.1016/j.jbi.2018.01.008.
- 29. Wallace BC, Kuiper J, Sharma A, Zhu MB, Marshall IJ. Extracting PICO sentences from clinical trial reports using supervised distant supervision. J Mach Learn Res 2016; 17: 132.
- 30. Martin P, Surian D, Bashir R, Bourgeois FT, Dunn AG. Trial2rev: combining machine learning and crowd-sourcing to create a shared space for updating systematic reviews. JAMIA Open 2019; 2 (1): 15–22. doi: 10.1093/jamiaopen/ooy062.