Table 4.
Per-article appraisal time (seconds) for ChatGPT-4.0 and DeepSeek R1 compared with manual evaluation.
| Tool | Model | Minimum | Maximum | Mean | 95% CI |
|---|---|---|---|---|---|
| AMSTAR 2 | ChatGPT-4.0 | 12.83 | 24.00 | 16.94 | 15.98–17.99 |
| | DeepSeek R1 | 31.28 | 111.69 | 65.02 | 56.25–73.78 |
| | Manual | – | – | 1,200.00 | – |
| CASP | ChatGPT-4.0 | 18.08 | 28.21 | 22.26 | 20.98–23.55 |
| | DeepSeek R1 | 28.97 | 157.97 | 58.22 | 43.91–72.52 |
| | Manual | – | – | 750.00 | – |
| PEDro | ChatGPT-4.0 | 13.41 | 25.32 | 18.41 | 17.16–19.66 |
| | DeepSeek R1 | 31.83 | 64.26 | 44.95 | 41.66–48.25 |
| | Manual | – | – | 2,700.00 | – |
| RoB 2 | ChatGPT-4.0 | 15.16 | 25.63 | 19.13 | 17.56–20.70 |
| | DeepSeek R1 | 29.83 | 61.57 | 40.92 | 36.62–45.23 |
| | Manual | – | – | 1,680.00 | – |
| Overall | ChatGPT-4.0 | 12.83 | 28.21 | 19.19 | 18.16–19.59 |
| | DeepSeek R1 | 28.97 | 157.97 | 52.28 | 48.50–57.23 |
| | Manual | – | – | 1,582.50 | – |