Table 3.
Pearson chi-square test of the distribution of correct and incorrect chatbot responses over 10 days
| Chatbot | Response | Statistic | 1st Day | 2nd Day | 3rd Day | 4th Day | 5th Day | 6th Day | 7th Day | 8th Day | 9th Day | 10th Day | Total | p value* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ChatGPT-4o | Correct | N | 128 | 136 | 145 | 147 | 141 | 133 | 131 | 129 | 129 | 91 | 1310 | 0.001 |
| | | % | 9.8% | 10.4% | 11.1% | 11.2% | 10.8% | 10.2% | 10.0% | 9.8% | 9.8% | 6.9% | 100.0% | |
| | Incorrect | N | 52 | 44 | 35 | 33 | 39 | 47 | 49 | 51 | 51 | 89 | 490 | |
| | | % | 10.6% | 9.0% | 7.1% | 6.7% | 8.0% | 9.6% | 10.0% | 10.4% | 10.4% | 18.2% | 100.0% | |
| | Total | N | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 1800 | |
| | | % | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 100.0% | |
| DeepSeek | Correct | N | 155 | 154 | 155 | 159 | 152 | 161 | 157 | 159 | 158 | 155 | 1565 | 0.949 |
| | | % | 9.9% | 9.8% | 9.9% | 10.2% | 9.7% | 10.3% | 10.0% | 10.2% | 10.1% | 9.9% | 100.0% | |
| | Incorrect | N | 25 | 26 | 25 | 21 | 28 | 19 | 23 | 21 | 22 | 25 | 235 | |
| | | % | 10.6% | 11.1% | 10.6% | 8.9% | 11.9% | 8.1% | 9.8% | 8.9% | 9.4% | 10.6% | 100.0% | |
| | Total | N | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 1800 | |
| | | % | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 100.0% | |
| Gemini | Correct | N | 141 | 145 | 141 | 142 | 141 | 146 | 141 | 149 | 140 | 135 | 1421 | 0.885 |
| | | % | 9.9% | 10.2% | 9.9% | 10.0% | 9.9% | 10.3% | 9.9% | 10.5% | 9.9% | 9.5% | 100.0% | |
| | Incorrect | N | 39 | 35 | 39 | 38 | 39 | 34 | 39 | 31 | 40 | 45 | 379 | |
| | | % | 10.3% | 9.2% | 10.3% | 10.0% | 10.3% | 9.0% | 10.3% | 8.2% | 10.6% | 11.9% | 100.0% | |
| | Total | N | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 1800 | |
| | | % | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 100.0% | |
| Perplexity | Correct | N | 124 | 133 | 135 | 129 | 132 | 124 | 131 | 144 | 106 | 130 | 1288 | 0.005 |
| | | % | 9.6% | 10.3% | 10.5% | 10.0% | 10.2% | 9.6% | 10.2% | 11.2% | 8.2% | 10.1% | 100.0% | |
| | Incorrect | N | 56 | 47 | 45 | 51 | 48 | 56 | 49 | 36 | 74 | 50 | 512 | |
| | | % | 10.9% | 9.2% | 8.8% | 10.0% | 9.4% | 10.9% | 9.6% | 7.0% | 14.5% | 9.8% | 100.0% | |
| | Total | N | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 180 | 1800 | |
| | | % | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 10.0% | 100.0% | |
*Pearson Chi-Square test
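
The p values in Table 3 can be checked from the daily counts alone. The following is a minimal sketch, not the authors' analysis script: it assumes each chatbot's test was run on its own 2 × 10 table (correct/incorrect by day, 180 questions per day), giving 9 degrees of freedom. The names `CORRECT`, `chi2_sf`, `pearson_chi2`, and `RESULTS` are hypothetical helpers introduced for illustration, and the tail probability uses the closed-form finite sum for integer degrees of freedom so that only the Python standard library is needed.

```python
import math

# Correct-answer counts per day, copied from Table 3.
CORRECT = {
    "ChatGPT-4o": [128, 136, 145, 147, 141, 133, 131, 129, 129, 91],
    "DeepSeek":   [155, 154, 155, 159, 152, 161, 157, 159, 158, 155],
    "Gemini":     [141, 145, 141, 142, 141, 146, 141, 149, 140, 135],
    "Perplexity": [124, 133, 135, 129, 132, 124, 131, 144, 106, 130],
}
QUESTIONS_PER_DAY = 180  # column total for every day in Table 3


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^r / r!.
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction sum.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def pearson_chi2(correct, per_day):
    """Pearson chi-square test on a 2 x k table of correct/incorrect counts."""
    incorrect = [per_day - c for c in correct]
    k = len(correct)
    n = per_day * k
    stat = 0.0
    for row in (correct, incorrect):
        # Column totals are equal, so the expected count is the same each day.
        expected = sum(row) * per_day / n
        stat += sum((obs - expected) ** 2 / expected for obs in row)
    df = (2 - 1) * (k - 1)
    return stat, df, chi2_sf(stat, df)


RESULTS = {name: pearson_chi2(counts, QUESTIONS_PER_DAY)
           for name, counts in CORRECT.items()}

for name, (stat, df, p) in RESULTS.items():
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

Under these assumptions, the DeepSeek, Gemini, and Perplexity p values agree with the table to three decimals, and the ChatGPT-4o value comes out below 0.001.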