2019 Feb 11;26(4):324–338. doi: 10.1093/jamia/ocy179

Table 3. Summary of articles including error analyses (n = 29)

Columns: Measure | Medical domain | Articles | Summary of study designs and findings
Percentage of documents with errors (n = 13)
Medical domain and articles:
  • Radiology: McGurk et al (2008) [64]; Pezzullo et al (2008) [24]; Quint et al (2008) [84]; Strahan and Schneider-Kolsky (2010) [41]; Basma et al (2011) [85]; Chang et al (2011) [86]; Luetmer et al (2013) [87]; Hawkins et al (2014) [88]; du Toit et al (2015) [66]; Ringler et al (2015) [89]; Motyer et al (2016) [90]
  • Emergency Department: Goss et al (2016) [91]
  • Multiple: Zhou et al (2018) [132]
Summary of study designs and findings:
  • Study design
    ◦ Retrospective, cross-sectional by input method, with real reports [24, 41, 64, 66, 85, 87]
    ◦ Retrospective, cross-sectional by report type, with real reports [24, 85, 86, 90, 132]
    ◦ Retrospective study with real reports [84, 88, 89, 91]
    ◦ Prospective, cross-sectional study with real reports [24]
  • Number of speakersᵃ
    ◦ Median: 19 [86]
    ◦ Range: 2 [41] to 147 [89]
  • Number of documents evaluated
    ◦ Median: 308 [85]
    ◦ Range: 100 [24, 41, 91] to 584 878 [87]
  • Percentage of finalized documents with errors
    ◦ Median: 26.9% [86]
    ◦ Range: 4.8% [64] to 71% [91]

Mean errors per document (n = 7)
Medical domain and articles:
  • Radiology: Hawkins et al (2012) [26]; Hawkins et al (2014) [88]; Motyer et al (2016) [90]
  • Emergency Department: Zick and Olsen (2001) [44]; Goss et al (2016) [91]
  • Dentistry: Feldman and Stevens (1990) [10]
  • Multiple: Zhou et al (2018) [132]
Summary of study designs and findings:
  • Study design
    ◦ Retrospective study of real reports [88, 90, 91]
    ◦ Retrospective, cross-sectional by report type, with real reports [132]
    ◦ Prospective, cross-sectional by report type, with real reports [26]
    ◦ Controlled lab setting, cross-sectional by input method, with real reports [44]
    ◦ Observational study, cross-sectional by input method, with real reports [10]
  • Number of speakersᵇ
    ◦ Median: 12 [91]
    ◦ Range: 2 [44] to 144 [132]
  • Number of documents
    ◦ Median: 217 [132]
    ◦ Range: 20 [10] to 1173 [26]
  • Mean errors per document
    ◦ Median: 1.3 [91]
    ◦ Range: 0.24 [90] to 2.5 [44]

Accuracyᶜ (n = 6)
Medical domain and articles:
  • Radiology: Herman (1995) [92]; Ramaswamy et al (2000) [33]; Ichikawa et al (2007) [20]
  • Emergency Department: Zick and Olsen (2001) [44]
  • Nursing: Suominen and Ferraro (2013) [93]
  • Unspecified: Zafar et al (1999) [94]
Summary of study designs and findings:
  • Study design
    ◦ Controlled lab setting, cross-sectional by input method, with real reports [20, 44, 94]
    ◦ Controlled lab setting with real reports [92]
    ◦ Controlled lab setting with fictional patient scenarios [93]
    ◦ Retrospective, cross-sectional study with real reports [33]
  • Number of speakersᵈ
    ◦ Median: 1.5
    ◦ Range: 1 [20, 93] to 5 [33]
  • Number of words evaluatedᵉ
    ◦ Median: 6019
    ◦ Range: 7277 [93] to 18 721 [92]
  • Accuracy
    ◦ Median: 96.4%
    ◦ Range: 73% [93] to 98.5% [44], but often varied within studies depending on the configuration of the SR system(s) evaluated

Word error rateᶠ (n = 4)
Medical domain and articles:
  • Emergency Department: Zemmel et al (1996) [59]
  • Internal Medicine: Devine et al (2000) [95]
  • Otorhinolaryngology: Ilgner et al (2006) [19]
  • Multiple: Zhou et al (2018) [132]
Summary of study designs and findings:
  • Study design
    ◦ Controlled lab setting, cross-sectional by report type, with real reports [19, 95]
    ◦ Controlled lab setting, cross-sectional by SR system configuration [59]
    ◦ Retrospective, cross-sectional by report type, with real reports [132]
  • Number of speakersᵍ
    ◦ Median: 12 [95]
    ◦ Range: 7 [59] to 144 [132]
  • Number of documents
    ◦ Median: 46
    ◦ Range: 7 [59] to 217 [132]
  • Number of wordsʰ
    ◦ Median: 60 874
    ◦ Range: 11 568 [95] to 110 180 [132]
  • Word error rate
    ◦ Median: 14.5% with general vocabularies, 11% with specialized vocabularies
    ◦ Range: 7.4% [132] to 38.72% [19] with general vocabularies; 5.21% [19] to 9% [59] with specialized vocabularies

Other (n = 3)
  • Emergency Department: Hodgson et al (2017) [57]
    ◦ Controlled lab setting, cross-sectional by input method; 35 participants were randomly allocated simple and complex clinical tasks
    ◦ 138 errors with minor, moderate, or major potential for patient harm when using SR across simple and complex tasks, vs 32 with keyboard and mouse
  • Internal Medicine: Zafar et al (2004) [96]
    ◦ Retrospective analysis of 148 real reports (104 created by 1 speaker with SR; 44 human-transcribed, from multiple speakers)
    ◦ 9 identified categories of SR errors: enunciation, dictionary, suffix, added words, deleted words, homonym, spelling, nonsense, and critical errors
  • Unspecified: McKoskey and Boley (2000) [97]
    ◦ Unsupervised clustering of 1200 completed dictations from 6 speakers, aligned with their original SR output
    ◦ Identified error clusters: short and function words; vowel destressing and cliticization; vowel syncope; words with sounds affected by telephony interference (eg, fricatives)

SR: speech recognition.

ᵃ 4 studies did not report the number of speakers [66, 87, 88, 90].
ᵇ 4 studies did not report the number of speakers [10, 26, 88, 90].
ᶜ Accuracy = number of correctly recognized words / total number of words dictated.
ᵈ 2 studies did not report the number of speakers [92, 94].
ᵉ 2 studies did not report the number of words evaluated [44, 94].
ᶠ Word error rate = (number of substitutions + number of insertions + number of deletions) / total number of words dictated.
ᵍ 1 study did not report the number of speakers [19].
ʰ 2 studies did not report the number of words evaluated [19, 59].
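The accuracy and word error rate definitions in the footnotes both depend on counting how each dictated word fared in the SR output. The sketch below illustrates one common way to obtain those counts, a minimum-edit (Levenshtein) word alignment, assuming simple whitespace tokenization. The function names (align_counts, word_error_rate, word_accuracy) and the example sentence are hypothetical illustrations, not taken from any of the reviewed studies, which may have tokenized, normalized, or counted errors differently.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Edit counts along one minimum-cost word alignment (hypothetical helper)."""
    correct: int = 0
    substitutions: int = 0
    deletions: int = 0
    insertions: int = 0


def align_counts(reference: list[str], hypothesis: list[str]) -> AlignmentCounts:
    """Align SR output (hypothesis) to the dictated words (reference) by edit distance."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # every dictated word deleted
    for j in range(1, m + 1):
        cost[0][j] = j  # every recognized word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )

    # Walk back from the last cell, classifying each step of one optimal path.
    counts = AlignmentCounts()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                if mismatch:
                    counts.substitutions += 1
                else:
                    counts.correct += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts.deletions += 1
            i -= 1
        else:
            counts.insertions += 1
            j -= 1
    return counts


def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + insertions + deletions) / number of dictated words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    c = align_counts(reference, hypothesis)
    return (c.substitutions + c.insertions + c.deletions) / len(reference)


def word_accuracy(reference: list[str], hypothesis: list[str]) -> float:
    """Accuracy = correctly recognized words / number of dictated words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    c = align_counts(reference, hypothesis)
    return c.correct / len(reference)


if __name__ == "__main__":
    # Hypothetical example: SR splits one dictated word into two.
    ref = "no acute intracranial hemorrhage".split()
    hyp = "no acute intra cranial hemorrhage".split()
    print(align_counts(ref, hyp))
    print(word_error_rate(ref, hyp), word_accuracy(ref, hyp))  # 0.5 0.75
```

In this example the alignment has 3 correct words, 1 substitution, and 1 insertion, so WER is 2/4 = 0.5 while accuracy is 3/4 = 0.75. Because correct words, substitutions, and deletions always sum to the number of dictated words, accuracy equals 1 - (S + D)/N and ignores insertions, so it need not equal 1 - WER; WER can also exceed 100% when the system inserts many words. This is one reason the two measures in the table are summarized separately.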