Table 3.
App/tested GP | Percentage of safe advice | P value (difference to GP mean) |
Ada | 97.0 | NS |
Babylon | 95.1 | NS |
Buoy | 80.0 | <0.001* |
K Health | 81.3 | <0.001* |
Mediktor | 87.3 | 1.3×10⁻³* |
Symptomate | 97.8 | NS |
Your.MD | 92.6 | NS |
App mean±SD | 90.1±7.4 | – |
GP mean±SD | 97.0±2.5 | – |
GP1 | 96.0 | NS |
GP2 | 96.9 | NS |
GP3 | 94.0 | NS |
GP4 | 99.0 | NS |
GP5 | 100.0 | NS |
GP6 | 93.9 | NS |
GP7 | 99.5 | NS |
*P<0.05. For two of these apps (K Health and Your.MD), one app-entry-Dr (#4) did not record all screenshots needed for source data verification; see online supplemental table 6 for a subanalysis of fully verified data, which shows the same trend of results and no significant difference from the data recorded here. This analysis covers only those vignettes for which urgency advice was provided (ie, a ‘provided answer’ analysis).
GP, general practitioner; NS, no significant difference.