BMJ Open. 2025 Nov 5;15(11):e106546. doi: 10.1136/bmjopen-2025-106546

Comparing the accuracy of AI-assisted data extraction versus human double extraction in evidence synthesis: a randomised controlled trial protocol

Zhen Peng 1,2, Shiqi Fan 3, Yuan Tian 3, Yingxia Wang 4, Zongshi Qin 5, Suhail Doi 6, Chang Xu 1,3,7
PMCID: PMC12593456  PMID: 41198207

Abstract

Introduction

Traditional data extraction strategies, such as human double extraction, are both time-consuming and labour-intensive. Artificial intelligence (AI) has emerged as a promising tool for facilitating data extraction. However, it is not yet suitable as a standalone solution. We will conduct a randomised controlled trial (RCT) to compare the efficiency and accuracy of an AI–human data extraction strategy with those of human double extraction.

Methods and analysis

This study is designed as a randomised, controlled, parallel trial. Participants will be randomly assigned to either the AI group or the non-AI group at a 1:2 allocation ratio. The AI group will use a hybrid approach that combines AI extraction followed by human verification by the same participant, while the non-AI group will use human double extraction. Data will be collected for two tasks: event count and group size. Ten RCTs will be selected from an established database that analysed data extraction errors in systematic reviews of sleep medicine. The primary outcome measure will be the percentage of correct extractions by both groups for each data extraction task.

Ethics and dissemination

The trial is approved by the Ethics Council of Anhui Medical University (No. 81250507). We plan to publish the main results as an academic publication in an international peer-reviewed journal in 2026.

Trial registration number

Chinese Clinical Trial Register (Identifier: ChiCTR2500100393).

Keywords: Artificial Intelligence, Randomized Controlled Trial, Information Extraction


STRENGTHS AND LIMITATIONS OF THIS STUDY.

  • This study is designed as a randomised, controlled, parallel trial.

  • Strict randomisation procedure.

  • Strict data quality control.

  • As the participants need to be aware of the data extraction approach in order to extract data, double-blinding is not feasible.

Introduction

Evidence synthesis is a rigorous process that involves collating empirical data from existing research to provide a comprehensive and up-to-date understanding of specific research questions.1 Standardised procedures are essential in evidence synthesis to safeguard against potential bias and error. Therefore, conducting a well-executed evidence synthesis requires meticulous attention to detail at each stage, making the process both time-consuming and labour-intensive.2 3 A critical component of evidence synthesis is the extraction of data from eligible studies identified through systematic reviews.4 However, reproducibility studies have shown that data extraction errors are prevalent, with error rates of 17% at the study level and 66.8% at the meta-analysis level.5 These errors can undermine the credibility of evidence syntheses, diminishing their usefulness in healthcare practice and potentially leading to incorrect conclusions and misguided decisions. Therefore, there is an urgent need for effective methods to address these challenges.

Given the rapid advancement of technology, artificial intelligence (AI), including machine learning, artificial neural networks and deep learning, has demonstrated considerable potential in various domains, particularly in medicine, where it has been used to enhance diagnostics and support drug development.6–8 Among AI approaches, large language models (LLMs) have the potential to streamline data extraction in evidence synthesis, thereby reducing manual effort, minimising human errors and improving efficiency.9 While LLMs exhibit commendable accuracy in extracting data from certain studies because of their ability to capture contextual information and their advanced semantic understanding, their performance varies considerably across different tasks.10–12 Furthermore, the accuracy of various AI tools differs, and the overall performance still falls short of the human double extraction recommended by the current guidelines.13 14 Several challenges also persist, including the need for domain-specific guidance and rigorous validation of the outputs.9 Therefore, the current AI tools for automated data extraction have not been adequately developed for widespread practical application.

Recent studies have indicated that the accuracy of certain AI tools in single data extraction surpasses that of individual human extraction, although it remains inferior to the accuracy achieved through human double extraction.15 This finding suggests that researchers could benefit from a collaborative approach that uses AI tools as data extractors alongside human extractors. However, there is a lack of studies assessing the effectiveness of this AI–human data extraction strategy. To address this gap, we will conduct a randomised controlled trial (RCT) to investigate whether the AI–human data extraction strategy improves extraction accuracy compared with traditional human double extraction.

Methods

Study design

This study is designed as a randomised, controlled, parallel trial, and a detailed flowchart of the trial design is presented in figure 1. The entire trial will be conducted online and is scheduled to run from October 2025 to December 2025. Participants will initially be randomly assigned to one of the two groups for data extraction: Group A (AI group) will use a hybrid approach combining AI-assisted single extraction followed by human verification, and Group B (non-AI group) will use human double extraction.16 In Group A, each participant will use an AI tool (Claude 3.5, developed by Anthropic, San Francisco, California, USA) for data extraction and then verify the results generated by the AI tool to ensure accuracy. In Group B, each pair of participants will independently extract data, followed by a cross-verification process to ensure accuracy. This design aims to minimise bias and enhance the reliability of the data extraction process. The study is designed in line with the Consolidated Standards of Reporting Trials (CONSORT) and the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) statements, together with the CONSORT-AI and SPIRIT-AI extensions, which specifically address clinical trials in which the intervention includes an AI component.17–19

Figure 1. Flowchart of data extraction. AI, artificial intelligence.


Participants

Eligible participants will consist of graduate students, research assistants and undergraduate health sciences students from various universities or hospitals.20 This trial will include individuals who meet the following criteria:

  1. Has published, or is listed as a co-author on, at least one systematic review or meta-analysis.

  2. Has a medical or health science background.

  3. Has proficiency in reading scientific articles in English, as demonstrated by passing the College English Test Band 4 or higher, a nationally recognised English proficiency examination in China.

  4. Has provided informed consent.

Participants will be recruited using various methods, including online promotions and the distribution of offline recruitment advertisements in multiple universities and hospitals. Once a participant is confirmed to meet all eligibility criteria for the trial, they will receive a link to electronically sign an informed consent form (online supplemental file). Random allocation to groups will proceed only on completion of the consent process. The recruitment will be performed using the Wenjuanxing system (Changsha Ranxing Information Technology Co., Ltd., Changsha, China), an online survey and data recording platform. Prior to random grouping, we will provide training to all participants on using Claude for data extraction and the Wenjuanxing system for data recording.

Studies for data extraction

The studies selected for data extraction will be drawn from our established database of meta-analyses on sleep medicine. This database, established in 2021, comprises interventional systematic reviews of RCTs featuring one or more pairwise meta-analyses from 19 sleep medicine journals. Data for these meta-analyses were independently extracted from forest plots or tables by two groups of investigators. To ensure accuracy, the extracted metadata were cross-verified against the original RCTs, and any data extraction errors were identified, recorded and corrected. The finalised database includes error-corrected data from 298 meta-analyses encompassing 648 trials across 48 systematic reviews and serves as the ‘gold standard’ for evaluating the accuracy of data re-extracted in this study. A comprehensive list of journals and a detailed search strategy were documented in our previous study.21

Building on prior meta-epidemiological studies, we established that each group will be given 10 studies for data extraction.22–24 In accordance with the recommendations of the Cochrane Handbook, these data will include information requiring subjective interpretation and details critical to the interpretation of results (eg, outcome data), which Cochrane emphasises should be extracted independently.13 Where a meta-analysis contains more than 10 studies, simple random sampling will be used to select 10 of them.25 Our leading co-investigators will contribute to the selection of reviews and outcomes.

Data extraction tasks

For continuous outcomes, researchers may need to estimate SDs from SEs or IQRs owing to missing values.26 Because several estimation methods exist, such estimates may be inconsistent among human extractors. Therefore, we will concentrate on binary outcomes for data extraction in this study. In standard evidence synthesis, binary outcomes typically involve group sizes and event counts for the intervention and control groups. These data will be categorised into the following two tasks:

  • Task 1: group size for the intervention and control groups for all trials.

  • Task 2: event count for the intervention and control groups for all trials.

Since some studies may have multiple intervention and control groups, in order to minimise the impact of group variations, we will select studies that have only one intervention group and one control group.

AI-powered tool for data extraction

Prompt development for the AI tool will proceed in three steps. First, primary prompts (questions or statements to interact with AI) for each task will be carefully formulated by a researcher and then modified by Claude on the basis of the original prompts, using requests such as “Please design the best prompt for me based on this prompt: …”27 Second, we will conduct iterative testing on five RCTs to further refine the prompts for each task. Third, the leading investigators will review the outputs and provide feedback for prompt refinement in multiple iterations until the results align consistently with those of the experts. The finalised prompt will consist of three components: an introduction outlining the content to be extracted, guidelines detailing the extraction process and specifications for the output format. Finally, these refined prompts will be input into Claude to execute the data extraction tasks. Each prompt will be run in a new session to reduce memory retention bias.

One important element in the prompts is the term for the outcome, as it helps AI tools identify the targeted outcome in the trials and locate the data that need to be extracted. However, terms of outcomes may differ across trials for the same condition; for example, ‘fasting glucose’ may be reported as ‘fasting plasma glucose’ or ‘plasma glucose’ in different studies.28 The AI tool will be asked to determine appropriate synonyms for the term used, enabling the AI to accurately extract the relevant results from the RCTs.

Some studies have demonstrated that the accuracy of data extraction using AI tools is notably lower for image-based data than for textual data.2 To mitigate the influence of data format on extraction accuracy, this study will restrict data extraction to text-based sources exclusively. Consequently, any studies presenting critical information such as event counts or sample sizes in image format will be excluded from the analysis.

Randomisation process

The random sequences and group assignments will be generated by an independent third party using a computer-based random number generator.25 Participants will be randomised into the AI and non-AI groups in a 1:2 allocation ratio using simple randomisation. Participants in the non-AI group will then be randomly matched (1:1) into pairs, based on a random sequence number, for the double-checking process. The third party will communicate the random sequence and group assignments to the enrolled participants via email. This communication will also include a Quick Response (QR) code generated by the Wenjuanxing system, along with a data extraction notice and PDF files of the RCTs intended for data extraction.
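The allocation and pairing steps described above can be sketched as follows. This is a minimal illustration and not the trial's actual procedure: the function name `randomise`, the seed and the participant identifiers are hypothetical, and the handling of an unpaired participant when the non-AI group has an odd size is an assumption, since the protocol does not describe it.

```python
import random


def randomise(participant_ids, seed, p_ai=1 / 3):
    """Simple randomisation at 1:2 (AI : non-AI), then random 1:1 pairing.

    Each participant is assigned independently, so the realised group
    sizes only approximate the 1:2 ratio.
    """
    rng = random.Random(seed)  # seeded so the sequence can be reproduced
    ai_group, non_ai_group = [], []
    for pid in participant_ids:
        if rng.random() < p_ai:
            ai_group.append(pid)
        else:
            non_ai_group.append(pid)

    # Random 1:1 matching within the non-AI group for double extraction.
    shuffled = non_ai_group[:]
    rng.shuffle(shuffled)
    pairs = [(shuffled[i], shuffled[i + 1])
             for i in range(0, len(shuffled) - 1, 2)]
    # Assumption: an odd participant out is reported, not silently dropped.
    unpaired = shuffled[-1] if len(shuffled) % 2 else None
    return ai_group, pairs, unpaired


ids = [f"P{i:03d}" for i in range(1, 82)]  # 81 hypothetical participants
ai_group, pairs, unpaired = randomise(ids, seed=2025)
```

Because simple randomisation assigns each participant independently, neither an exact 1:2 split nor an even-sized non-AI group is guaranteed; a blocked scheme would be needed to enforce either.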

As the participants need to be aware of the data extraction approach to extract data, double-blinding is not feasible. To reduce potential bias, investigators responsible for communicating with participants will not be involved in any other trial-related activities. All other researchers will remain blinded and unaware of the random sequence until it is disclosed by an independent third party. Outcome assessors and statistical analysts will also be blinded, as they will not have access to randomisation details. Participants will be informed of their group assignments; however, allocation concealment will be maintained through a unique, password-protected group assignment notification sent via email. To prevent data sharing among participants within or between groups, the order of the studies in the data extraction form will also be randomised for each participant.

Process of study and data quality control

Detailed information regarding the study process is presented in table 1. Data extraction will be performed online using the Wenjuanxing system (https://www.wjx.cn/). The research team will develop a standardised data extraction form within the Wenjuanxing system. A QR code linked to the data extraction form will be generated and distributed to participants via email. To maintain the integrity of the trial, research assistants will proactively contact the participants from both groups to coordinate a simultaneous data collection at a predetermined time. This QR code will be activated precisely at the designated moment, and any attempt to scan the code before this time will be unsuccessful. Furthermore, to assess the efficiency of the data extraction process, the total time taken by both the AI and the participants to complete the extraction will be recorded. A template spreadsheet will be created, which contains only the study list and the necessary column titles for the extraction process (table 2).

Table 1. Process of data extraction.

Study cycle Screening period Data extraction period Research termination
Single extraction Verification
Study process
 Informed consent X
 Enrolled criteria X
 Demographic data X
Randomisation X
Training
 AI training X
 Wenjuanxing training X
 Study process training X
Strategy of data extraction
 AI+human X X*
 Human double extraction X X
 Data submission and quality control X X X
*Manual check of the data extracted by the AI tool.

Cross-check the data.

AI, artificial intelligence.

Table 2. Data extraction form.

Section 1: basic information
Enrolment number Single extraction/verification period Age
Gender Academic major University
Section 2: data extraction form
Study Intervention group Control group
Event count Sample size Reason for failing extraction Event count Sample size Reason for failing extraction Time for using (min)
No. 1
No. 2
No. 3
No. 4
No. 5
No. 6
No. 7
No. 8
No. 9
No. 10

In Group A (AI group), the participant will initially extract data using Claude and complete the data extraction form within the Wenjuanxing system. On completion of the initial data extraction, this participant will verify the responses from Claude and correct any inaccuracy.20 The extraction process will be considered complete once the participant verifies the data and inputs the confirmed results into the Wenjuanxing system.

In Group B (non-AI group), participants will independently extract data from the assigned studies using the data form in the Wenjuanxing system. On completing their independent extractions, those participants will be randomly matched 1:1 for the double-checking process. The matched participants will notify each other and develop a plan for adjudication (eg, via video call or phone call). They will then compare their extractions and address any discrepancies. Both participants will correct any inaccuracies in their own data form as necessary. Once a consensus is reached on all extracted data, the extraction process will be considered complete and one of the participants will input the confirmed results into the Wenjuanxing system.

To ensure data quality, the extraction process will be conducted under real-time monitoring using Tencent Conference (Tencent Technology (Shenzhen) Co., Ltd., Shenzhen, China), an online video conferencing platform. Participants will be prohibited from communicating with one another during the single data extraction process. This study does not include stopping rules or a data safety and monitoring board, as it does not assess the safety or effectiveness of any intervention on health outcomes. No adverse events are expected to occur during the data extraction process.16 23

Participant follow-up

The follow-up period for the trial will extend from the start of data extraction to its completion. No additional follow-up will be necessary, as the objective of the trial is focused on the data extraction process. Consequently, we anticipate a low dropout rate.

Public involvement

No members of the public will be involved in setting the research question or the outcome measures, nor will they be involved in developing plans for recruitment, design or implementation of the study.

Sample size calculation

This study will use a differential design. The sample size between the two groups is estimated using the formula29:

$$\text{Sample size} = \frac{\left(Z_{1-\beta} + Z_{1-\alpha/2}\right)^{2}\left[\pi_{\lambda}(1-\pi_{\lambda}) + \pi_{0}(1-\pi_{0})\right]}{(\pi_{\lambda} - \pi_{0})^{2}},$$

where Z is the standard score indicating the number of SDs from the mean, α is the significance level and 1−β is the statistical power, reflecting the ability to reject the null hypothesis when a true effect exists. In addition, $\pi_{\lambda}$ represents the accuracy of data extraction in the AI group, and $\pi_{0}$ is the accuracy in the non-AI group. The sample size is estimated at the task level. Total sample size refers to the number of participants (n) multiplied by the number of studies (k), that is, n×k for each task. We consider α=0.05 and a power of 1−β=0.9. Based on previous trials, $\pi_{\lambda}$ is set at 65% and $\pi_{0}$ at 75%.15 Based on these settings, we obtained sample sizes of 22 (436/20) in Group A and 44 (436/10) in Group B. Accounting for a 20% dropout rate, we take 27 as the minimum sample size in Group A and 54 in Group B, thus requiring at least 81 participants in total.
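As a check on the arithmetic, the calculation can be reproduced with a short script. This is a sketch under stated assumptions: the function name `extractions_needed` is ours, the formula is read as a two-sided test with 90% power, and the divisors 20 and 10 are taken directly from the protocol's "436/20" and "436/10" without assuming how they were derived.

```python
import math
from statistics import NormalDist


def extractions_needed(pi_ai, pi_non_ai, alpha=0.05, power=0.90):
    """Extractions per group needed to compare two proportions.

    Uses (Z_{1-beta} + Z_{1-alpha/2})^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 1.28
    variance = pi_ai * (1 - pi_ai) + pi_non_ai * (1 - pi_non_ai)
    return (z_alpha + z_beta) ** 2 * variance / (pi_ai - pi_non_ai) ** 2


n_extractions = round(extractions_needed(0.65, 0.75))  # 436
group_a = math.ceil(n_extractions / 20)                # 22 participants
group_b = math.ceil(n_extractions / 10)                # 44 participants
```

The script reproduces the 436 extractions and the 22 and 44 participants stated above. The dropout adjustment to 27 and 54 is not recomputed here because the protocol does not state the rounding rule used.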

Study outcome

The primary outcome is the percentage of correct extractions by the two groups to identify information for each data extraction task. For instance, in a task requiring the extraction of death events for the intervention and control groups across 10 RCTs, if the number of deaths is correctly extracted in 7 RCTs for the intervention group and 8 RCTs for the control group, the overall accuracy for the task is 75% (15/20).
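The worked example above can be expressed as a small function. This is an illustrative sketch only: the name `task_accuracy`, the dictionary layout and the event counts are hypothetical, and treating a missing extraction as incorrect is an assumption rather than a rule stated in the protocol.

```python
def task_accuracy(extracted, gold):
    """Proportion of gold-standard cells that were extracted correctly.

    Both arguments map (rct_number, arm) to a count. A cell that is
    missing or None in `extracted` is counted as incorrect (assumption).
    """
    correct = sum(1 for cell, value in gold.items()
                  if extracted.get(cell) == value)
    return correct / len(gold)


# Made-up gold-standard death counts for 10 RCTs with two arms each.
gold = {(rct, arm): 2 * rct + (arm == "control")
        for rct in range(1, 11)
        for arm in ("intervention", "control")}

# Simulate 3 wrong intervention cells and 2 missing control cells.
extracted = dict(gold)
for rct in (1, 2, 3):
    extracted[(rct, "intervention")] += 1
for rct in (1, 2):
    extracted[(rct, "control")] = None

accuracy = task_accuracy(extracted, gold)  # 0.75, matching 15/20
```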

The other outcomes to be evaluated include:

  1. The difference in data extraction accuracy between the two groups.

  2. The percentage of each type of error. Based on our recent work, errors during data extraction can arise from mechanisms such as numerical error, ambiguous data error, mismatching error and zero assumption error.21

  3. The time required to complete the data extraction process, including both tasks and any necessary verification or adjudication. Time will be measured using two methods: (1) an automatic timer embedded in the Wenjuanxing system, which records the duration from scanning the QR code to completing and submitting the data sheet, and (2) a self-reported method, in which participants will log the time they spend on data extraction. The primary analysis will rely on the automatically recorded times.16 In addition, the time required for designing and tuning prompts will be recorded for the AI group, and the time spent on pairwise agreement will be recorded for the non-AI group. These times will be documented separately by the research assistant.

To ensure the accuracy of recorded data extraction time, all participants, including those using AI tools for extraction, will be required to scan a QR code with their mobile phones prior to extracting data from each article. Subsequently, they will promptly enter the extracted data for each article into the Wenjuanxing system. Participants will be prohibited from extracting data from all articles first and subsequently entering the data into the system in bulk. This methodology will enable the precise recording of the actual time spent on data extraction. Before the data verification phase, research assistants will distribute a new QR code to the participants in the AI group and to one randomly selected participant from each pair in the non-AI group. At the initiation of the data verification process, the participants who receive the QR code will scan it again and enter the verified data into the Wenjuanxing system. Ultimately, the total data extraction time will be calculated as the sum of the longest duration taken by any participant in each pair during the individual data extraction phase, along with the time spent on data verification.
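The timing rule in the last sentence can be written out explicitly. This is a sketch with hypothetical function names and durations. It assumes that, for the AI group, the total is simply the participant's own extraction time plus verification time, which the protocol implies but does not state.

```python
def total_time_pair(extraction_a_min, extraction_b_min, verification_min):
    """Non-AI pair: the longer individual extraction plus verification."""
    return max(extraction_a_min, extraction_b_min) + verification_min


def total_time_ai(extraction_min, verification_min):
    """AI participant: own extraction plus own verification (assumption)."""
    return extraction_min + verification_min


# Hypothetical durations in minutes.
pair_total = total_time_pair(42.0, 55.5, 12.0)  # 67.5
ai_total = total_time_ai(30.0, 10.0)            # 40.0
```

Taking the maximum rather than the sum within a pair reflects that the two extractors work in parallel, so the pair's elapsed time is set by the slower extractor.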

Data analysis

The baseline characteristics of the participants will be summarised descriptively. For categorical variables, frequencies and proportions will be reported, while continuous variables will be presented as means with SD or medians with IQR, depending on the data distribution.

For the main analysis, the percentage of correct extractions in each group and the accuracy difference between the two groups, each with its 95% CI, will be estimated using a generalised linear model to avoid the potential boundary issue for extreme percentages.30 Comparisons between groups will also be performed using a two-tailed χ² test or Fisher’s exact test. Accuracy differences will be calculated, with relative differences estimated as odds ratios (ORs) to facilitate the generalisation of effects across other studies.30 31 To account for the hierarchical structure of the data, in which accuracy is influenced by task-level and centre-level factors, a generalised linear mixed model will be constructed, treating task as level 1 and centre as level 2. The mean time of data extraction will be compared using the independent-samples t-test or the Wilcoxon rank-sum test. The percentage of each type of error and its 95% CI will also be calculated. All statistical analyses will be performed using Stata/SE V.18.0 (StataCorp LLC, College Station, TX, USA), with a significance level of α=0.05.
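To make the OR comparison concrete, the following sketch computes an unadjusted OR with a Wald 95% CI from a 2×2 table of correct and incorrect extractions. It is not the planned analysis, which will be run in Stata and will use mixed models for the task and centre levels. It relies on the fact that a logistic model with a single binary group indicator gives the same OR as the cross-product ratio. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(correct_a, total_a, correct_b, total_b, level=0.95):
    """Unadjusted OR of a correct extraction (group A vs group B), Wald CI."""
    a, b = correct_a, total_a - correct_a  # group A: correct, incorrect
    c, d = correct_b, total_b - correct_b  # group B: correct, incorrect
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the Wald interval is undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Made-up counts: 300/440 correct in the AI group, 330/440 in the non-AI group.
or_est, ci_low, ci_high = odds_ratio_ci(300, 440, 330, 440)
risk_difference = 300 / 440 - 330 / 440  # absolute accuracy difference
```

This simple interval treats every extraction as independent. Extractions from the same participant or pair are likely to be correlated, which is one reason a mixed model is the more defensible choice.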

We anticipate that some participants may fail to identify targeted events in RCTs, resulting in missing data. Participants who are unable to extract information from the trials, classified as ‘inter-current events’, must record the reason for their failure to extract data. Corresponding blank cells in the data sheet will be marked as ‘None’. The potential impact of missing values will be evaluated using post-hoc sensitivity analyses.

Discussion

This protocol outlines the design, implementation and analysis plan for an upcoming trial aimed at generating robust evidence to support qualified data extraction in evidence synthesis practice.25 The study introduces several innovative elements, including a novel strategy that integrates an LLM with human verification. If proven effective, this strategy has the potential to significantly enhance the efficiency of the current data extraction process. Additionally, we will use a rigorous randomised controlled design to compare this approach with traditional human double extraction, thereby validating the effectiveness of the proposed strategy. These findings are expected to provide robust evidence to inform and improve future data extraction practices.

Several limitations of this study also warrant consideration. First, to ensure the feasibility of the trial, we restrict the participants to those with medical or health-related backgrounds, potentially reducing the representativeness of the sample. Second, although data extraction will be performed online, there remains a risk of participants sharing data with others, which could compromise the dataset and introduce bias. Several measures will be implemented to mitigate this risk, including online monitoring of the extraction process, randomising the study list in the spreadsheet, and using random assignment and allocation concealment strategies to maintain the integrity of the study. Third, the data selected for extraction are restricted to binary outcomes from the field of sleep medicine, which may influence the results. Finally, as this study uses Claude as the AI tool, the continuous evolution of LLMs and the emergence of alternative models will require future evaluations.

In summary, this study is expected to provide valuable evidence regarding the effectiveness of AI assistance in enhancing data extraction accuracy. Furthermore, the findings are intended to inform best practices for data extraction and to contribute to the development of improved strategies for future evidence synthesis.

Ethics and dissemination

The trial has received approval from the Ethics Council of Anhui Medical University (No.81250507). We intend to publish the main findings as an academic article in an international peer-reviewed journal in 2026. Anonymised data will be made accessible to external researchers on conclusion of the project, defined as the completion of publications by the study team.

Supplementary material

online supplemental file 1
bmjopen-15-11-s001.doc (35.5KB, doc)
DOI: 10.1136/bmjopen-2025-106546

Footnotes

Funding: This study will be supported by the National Natural Science Foundation of China (72204003, 72574229), and Hefei Comprehensive National Science Center (0301035204). The funding bodies had no role in any process of the study. The findings herein reflect the work, and are solely the responsibility of the authors.

Prepublication history: Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2025-106546).

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient consent for publication: Not applicable.

Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

References

  • 1. Gurevitch J, Koricheva J, Nakagawa S, et al. Meta-analysis and the science of research synthesis. Nature. 2018;555:175–82. doi: 10.1038/nature25753.
  • 2. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015;4:78. doi: 10.1186/s13643-015-0066-7.
  • 3. Mathes T, Klaßen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC Med Res Methodol. 2017;17:152. doi: 10.1186/s12874-017-0431-4.
  • 4. Li T, Vedula SS, Hadar N, et al. Innovations in data collection, management, and archiving for systematic reviews. Ann Intern Med. 2015;162:287–94. doi: 10.7326/M14-1603.
  • 5. Xu C, Yu T, Furuya-Kanamori L, et al. Validity of data extraction in evidence synthesis practice of adverse events: reproducibility study. BMJ. 2022;377:e069155. doi: 10.1136/bmj-2021-069155.
  • 6. Kufel J, Bargieł-Łączek K, Kocot S, et al. What Is Machine Learning, Artificial Neural Networks and Deep Learning?-Examples of Practical Applications in Medicine. Diagnostics (Basel). 2023;13:2582. doi: 10.3390/diagnostics13152582.
  • 7. Cascella M, Montomoli J, Bellini V, et al. Evaluating the Feasibility of ChatGPT in Healthcare: An Analysis of Multiple Clinical and Research Scenarios. J Med Syst. 2023;47:33. doi: 10.1007/s10916-023-01925-4.
  • 8. Thirunavukarasu AJ, Ting DSJ, Elangovan K, et al. Large language models in medicine. Nat Med. 2023;29:1930–40. doi: 10.1038/s41591-023-02448-8.
  • 9. Schilling-Wilhelmi M, Ríos-García M, Shabih S, et al. From text to insight: large language models for chemical data extraction. Chem Soc Rev. 2025;54:1125–50. doi: 10.1039/d4cs00913d.
  • 10. Gartlehner G, Kahwati L, Hilscher R, et al. Data extraction for evidence synthesis using a large language model: A proof-of-concept study. Res Synth Methods. 2024;15:576–89. doi: 10.1002/jrsm.1710.
  • 11. Ge J, Li M, Delk MB, et al. A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record. Gastroenterology. 2024;166:707–9.e3. doi: 10.1053/j.gastro.2023.12.019.
  • 12. Konet A, Thomas I, Gartlehner G, et al. Performance of two large language models for data extraction in evidence synthesis. Res Synth Methods. 2024;15:818–24. doi: 10.1002/jrsm.1732.
  • 13. Li T, Higgins JPT. Chapter 5: collecting data. In: Higgins JPT, Thomas J, Chandler J, et al., editors. Cochrane handbook for systematic reviews of interventions version 6.3 (updated February 2022). Cochrane; 2022.
  • 14. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi: 10.1136/bmj.n71.
  • 15. Tang L, Wang R, Doi SAR, et al. Effects of double data extraction on errors in evidence synthesis: a crossover, multicenter, investigator-blinded, randomized controlled trial. Postgrad Med J. 2025;101:603–11. doi: 10.1093/postmj/qgae195.
  • 16. Li T, Saldanha IJ, Jap J, et al. A randomized trial provided new evidence on the accuracy and efficiency of traditional vs. electronically annotated abstraction approaches in systematic reviews. J Clin Epidemiol. 2019;115:77–89. doi: 10.1016/j.jclinepi.2019.07.005.
  • 17. Chan A-W, Tetzlaff JM, Altman DG, et al. SPIRIT 2013 statement: defining standard protocol items for clinical trials. Ann Intern Med. 2013;158:200–7. doi: 10.7326/0003-4819-158-3-201302050-00583.
  • 18. Dwan K, Li T, Altman DG, et al. CONSORT 2010 statement: extension to randomised crossover trials. BMJ. 2019;366:l4378. doi: 10.1136/bmj.l4378.
  • 19. Rivera SC, Liu X, Chan AW, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI Extension. BMJ. 2020;370:m3210. doi: 10.1136/bmj.m3210.
  • 20. Saldanha IJ, Schmid CH, Lau J, et al. Evaluating Data Abstraction Assistant, a novel software application for data abstraction during systematic reviews: protocol for a randomized controlled trial. Syst Rev. 2016;5:196. doi: 10.1186/s13643-016-0373-7.
  • 21. Xu C, Doi SAR, Zhou X, et al. Data reproducibility issues and their potential impact on conclusions from evidence syntheses of randomized controlled trials in sleep medicine. Sleep Med Rev. 2022;66:101708. doi: 10.1016/j.smrv.2022.101708.
  • 22. Furuya-Kanamori L, Lin L, Kostoulas P, et al. Limits in the search date for rapid reviews of diagnostic test accuracy studies. Res Synth Methods. 2023;14:173–9. doi: 10.1002/jrsm.1598.
  • 23. Rosenberger KJ, Xu C, Lin L. Methodological assessment of systematic reviews and meta-analyses on COVID-19: A meta-epidemiological study. J Eval Clin Pract. 2021;27:1123–33. doi: 10.1111/jep.13578.
  • 24. Puljak L, Makaric ZL, Buljan I, et al. What is a meta-epidemiological study? Analysis of published literature indicated heterogeneous study designs and definitions. J Comp Eff Res. 2020;9:497–508. doi: 10.2217/cer-2019-0201.
  • 25. Zhu Y, Ren P, Doi SAR, et al. Data extraction error in pharmaceutical versus non-pharmaceutical interventions for evidence synthesis: Study protocol for a crossover trial. Contemp Clin Trials Commun. 2023;35:101189. doi: 10.1016/j.conctc.2023.101189.
  • 26. Munn Z, Peters MDJ, Stern C, et al. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol. 2018;18:143. doi: 10.1186/s12874-018-0611-x.
  • 27. Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198. doi: 10.1371/journal.pdig.0000198.
  • 28. Abud R, Salgueiro M, Drake L, et al. Efficacy of continuous positive airway pressure (CPAP) preventing type 2 diabetes mellitus in patients with obstructive sleep apnea hypopnea syndrome (OSAHS) and insulin resistance: a systematic review and meta-analysis. Sleep Med. 2019;62:14–21. doi: 10.1016/j.sleep.2018.12.017.
  • 29. Julious SA, Campbell MJ. Tutorial in biostatistics: sample sizes for parallel group clinical trials with binary data. Stat Med. 2012;31:2904–36. doi: 10.1002/sim.5381.
  • 30. Doi SA, Furuya-Kanamori L, Xu C, et al. The Odds Ratio is “portable” across baseline risk but not the Relative Risk: Time to do away with the log link in binomial regression. J Clin Epidemiol. 2022;142:288–93. doi: 10.1016/j.jclinepi.2021.08.003.
  • 31. Austin PC. A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications. Int Stat Rev. 2017;85:185–203. doi: 10.1111/insr.12214.

Articles from BMJ Open are provided here courtesy of BMJ Publishing Group