BMC Med Inform Decis Mak. 2014 Oct 28;14:94. doi: 10.1186/1472-6947-14-94

Table 3.

Summary of speech recognition (SR) review results

Each entry below lists: author, year [reference], country, and study design; aim; setting, sample, and speech technology (ST); outcome measures; and results.
Al-Aynati and Chorneyko 2003 [18] (Canada; experimental)
Aim: To compare SR software with HT for generating pathology reports.
Setting: Surgical pathology. Sample: 206 pathology reports.
ST: IBM ViaVoice Pro version 8 with pathology vocabulary dictionary.
Outcome measures: 1. Accuracy rate. 2. Recognition/transcription errors.
Results:
Accuracy rate (mean %): SR 93.6; HT 99.6.
Mean recognition errors: SR 6.7; HT 0.4.
Mohr et al. 2003 [22] (USA; experimental)
Aim: To compare SR software with HT for clinical notes.
Setting: Endocrinology and psychiatry. Sample: 2,354 reports.
ST: Linguistic Technology Systems LTI with clinical notes application.
Outcome measures: 1. Dictation/recording time + transcription time (minutes) = report turnaround time (RTT; formalised in the note after this entry).
Results:
RTT (min), endocrinology: SR (recording + transcription) 23.7; HT (dictation + transcription) 25.4; SR 87.3% (CI 83.3, 92.3) as productive as HT.
RTT (min), psychiatry, transcriptionists: SR (recording + transcription) 65.2; HT (dictation + transcription) 38.1; SR 63.3% (CI 54.0, 74.0) as productive as HT.
RTT (min), psychiatry, secretaries: SR (recording + transcription) 36.5; HT (dictation + transcription) 30.5; SR 55.8% (CI 44.6, 68.0) as productive as HT.
Author, secretary, and type of notes were predictors of productivity (p < 0.05).
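The RTT outcome used here is a plain sum of stage times; restating the definition from the outcome-measures entry above in LaTeX (the notation is mine, not the authors'):

\[ \mathrm{RTT} = t_{\text{dictation/recording}} + t_{\text{transcription}} \]

So, for example, the endocrinology SR figure of 23.7 min is the combined recording and transcription time per report.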
NSLHD 2012 [29] (Australia; experimental)
Aim: To compare accuracy and time between SR software and HT for producing emergency department reports.
Setting: Emergency department. Sample: 12 reports.
ST: Nuance Dragon voice recognition.
Outcome measures: 1. RTT.
Results:
RTT, mean (range), in minutes: SR 1.07 (46 sec, 1.32); HT 3.32 (2.45, 4.35).
Errors: HT, spelling and punctuation errors; SR, occasional misplaced words.
Alapetite 2008 [30] (Denmark; non-experimental)
Aim: To evaluate the impact of background noise (sounds of alarms, aspiration, metal, people talking, scratch, silence, ventilators) and other factors on SR accuracy when used in operating rooms.
Setting: Simulation laboratory. Sample: 3,600 short anaesthesia commands.
ST: Philips SpeechMagic 5.1.529 SP3 and SpeechMagic InterActive, Danish language, Danish medical dictation adapted by Max Manus.
Outcome measures: 1. Word recognition rate (WRR; see the note after this entry).
Results:
WRR by microphone: headset 83.2%; handset 73.9%.
WRR by recognition mode: command 81.6%; free text 77.1%.
WRR by background noise: scratch 66.4%; silence 86.8%.
WRR by speaker gender: male 76.8%; female 80.3%.
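The table reports WRR without defining it; a conventional definition from the speech recognition literature (an assumption on my part, not taken from Alapetite [30]) is

\[ \mathrm{WRR} = \frac{N - S - D - I}{N} \times 100\% \]

where N is the number of words in the reference command and S, D, and I count substitution, deletion, and insertion errors. Under this reading, the headset figure of 83.2% means roughly 17 in every 100 reference words were misrecognised.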
Alapetite et al. 2009 [31] (Denmark; non-experimental)
Aim: To identify physicians' perceptions, attitudes, and expectations of SR technology.
Setting: Hospital (various clinical settings). Sample: 186 physicians.
Outcome measures: 1. Users' expectations and experience (predominant response noted).
Results:
Overall: Q1 expectation, positive 44%; Q1 experience, negative 46%.
Performance: Q8 expectation, negative 64%; Q8 experience, negative 77%.
Time: Q14 expectation, negative 85%; Q14 experience, negative 95%.
Social influence: Q6 expectation, negative 54%; Q6 experience, negative 59%.
Callaway et al. 2002 [20] (USA; non-experimental)
Aim: To compare off-the-shelf SR software with manual transcription services for radiology reports.
Setting: 3 military medical facilities. Sample: facility 1, 2,042 reports; facility 2, 26,600 reports; facility 3, 5,109 reports.
ST: Dragon Medical Professional 4.0.
Outcome measures: 1. RTT (referred to as TAT). 2. Costs.
Results:
RTT, facility 1: decreased from 15.7 hours (HT) to 4.7 hours (SR); completed in <8 h: SR 25%, HT 6.8%.
RTT, facility 2: decreased from 89 hours (HT) to 19 hours (SR).
Cost: facility 2, $42,000 saved; facility 3, $10,650 saved.
Derman et al. 2010 [32] (Canada; non-experimental)
Aim: To compare SR with existing methods of data entry for the creation of electronic progress notes.
Setting: Mental health hospital. Sample: 12 mental health physicians.
ST: details not provided.
Outcome measures: 1. Perceived usability. 2. Perceived time savings. 3. Perceived impact.
Results:
Usability: 50% preferred SR.
Time savings: no sig. diff. (p = 0.19).
Impact: quality of care, no sig. diff. (p = 0.086); documentation, no sig. diff. (p = 0.375); workflow, no sig. improvement (p = 0.59).
Devine et al. 2000 [33] (USA; non-experimental)
Aim: To compare 'out-of-box' performance of 3 continuous SR software packages for the generation of medical reports.
Sample: 12 physicians from Veterans Affairs facilities, New England.
ST: system 1 (S1), IBM ViaVoice98 with general medicine vocabulary; system 2 (S2), Dragon NaturallySpeaking Medical Suite, v3.0; system 3 (S3), L&H Voice Xpress for Medicine, General Medicine Edition, v1.2.
Outcome measures: 1. Recognition errors (mean error rate). 2. Dictation time. 3. Completion time. 4. Ranking. 5. Preference.
Results:
Recognition errors (mean %): S1 7.0–9.1; S3 13.4–15.1; S2 14.1–15.2. S1 was best with general English and medical abbreviations.
Dictation time: no sig. diff. (p < 0.336).
Completion time (mean): S2 12.2 min; S1 14.7 min; S3 16.1 min.
Ranking: 1. S1; 2. S2; 3. S3.
Irwin et al. 2007 [34] (USA; non-experimental)
Aim: To compare SR features and functionality of 4 dental software application systems.
Setting: Simulated dental setting. Sample: 4 participants (3 students, 1 faculty member).
ST: system 1 (S1), Microsoft SR with Dragon NaturallySpeaking; system 2 (S2), Microsoft SR; systems 3 and 4 (S3, S4), default speech engine.
Outcome measures: 1. Training time. 2. Charting time. 3. Completion. 4. Ranking.
Results:
Training time: S1 11 min 8 sec; S2 9 min 1 sec (no data reported for S3 and S4).
Charting time: S1 5 min 20 sec; S2 9 min 13 sec (no data reported for S3 and S4).
Completion (%): S1 100; S2 93; S3 90; S4 82.
Ranking: 1. S1 (104/189); 2. S2 (77/189).
Kanal et al. 2001 [35] (USA; non-experimental)
Aim: To determine the accuracy of continuous SR for transcribing radiology reports.
Setting: Radiology department. Sample: 72 radiology reports; 6 participants.
ST: IBM MedSpeak/Radiology software version 1.1.
Outcome measures: 1. Error rates.
Results:
Error rates (mean ± SD, %): overall 10.3 ± 3.3; significant errors 7.8 ± 3.4; subtle significant errors 1.2 ± 1.6.
Koivikko et al. 2008 [36] (Finland; non-experimental)
Aim: To evaluate the effect of SR on radiology workflow systems over a period of 2 years.
Setting: Radiology department. Sample: >20,000 reports; 14 radiologists.
ST: Finnish Radiology Speech Recognition System (Philips Electronics); HT, cassette-based reporting; SR1, SR in 2006; SR2, SR in 2007. Training: 10–15 minutes of SR training.
Outcome measures: 1. RTT (referred to as TAT) at 3 collection points: HT, 2005 (n = 6,037); SR1, 2006 (n = 6,486); SR2, 2007 (n = 9,072). 2. Reports completed ≤1 hour.
Results:
RTT (mean ± SD) in minutes: HT 1,486 ± 4,591; SR1 323 ± 1,662; SR2 280 ± 763.
Reports ≤1 hour (%): HT 26; SR1 58.
Langer 2002 [37] (USA; non-experimental)
Aim: To compare the impact of SR on radiologist productivity across 4 workflow systems.
Setting: Radiology departments. Sample: over 40 radiology sites.
Workflow systems: system 1, film, report dictated, HT; system 2, film, report dictated, SR; system 3, PACS + HT; system 4, PACS + SR.
Outcome measures: 1. RTT (referred to as TAT). 2. Report productivity (RP), number of reports per day.
Results (RTT, mean ± SD %, in hours; RP in reports per day):
System 1: RTT 48.2 ± 50; RP 240.
System 2: RTT 15.5 ± 93; RP 311.
System 3: RTT 13.3 ± 119 (t value at 10%); RP 248.
System 4: RTT 15.7 ± 98 (t value at 10%); RP 310.
Singh et al. 2011 [23] (USA; non-experimental)
Aim: To compare accuracy and turnaround times between SR software and a traditional transcription service (TS) when used for generating surgical pathology reports.
Setting: Surgical pathology. Sample: 5,011 pathology reports.
ST: VoiceOver (version 4.1); Dragon NaturallySpeaking software (version 10). Phase 0: 3 years prior to SR use; phase 1: first 35 months of SR use, gross descriptions; phases 2–4: SR used for gross descriptions and final diagnosis.
Outcome measures: 1. RTT (referred to as TAT). 2. Reports completed ≤1 day. 3. Reports completed ≤2 days.
Results:
RTT (days): phase 0, 4; phase 1, 4; phases 2–4, 3.
Reports ≤1 day (%): phase 0, 22; phase 1, 24; phases 2–4, 36.
Reports ≤2 days (%): phase 0, 54; phase 1, 60; phases 2–4, 67.
Zick et al. 2001 [38] (USA; non-experimental)
Aim: To compare accuracy and RTT between SR software and a traditional transcription service (TS) when used for recording patients' charts in the ED.
Setting: Emergency department. Sample: 2 physicians; 47 patients' charts.
ST: Dragon NaturallySpeaking Medical Suite version 4.
Outcome measures: 1. RTT (referred to as TAT). 2. Accuracy. 3. Errors per chart. 4. Dictation and editing time. 5. Throughput.
Results:
RTT (min): SR 3.55; TS 39.6.
Accuracy, % (mean and range): SR 98.5 (98.2–98.9); TS 99.7 (99.6–99.8).
Average errors per chart: SR 2.5 (2–3); TS 1.2 (0.9–1.5).
Average dictation time in minutes (mean and range): SR 3.65 (3.35–3.95); TS 3.77 (3.43–4.10).
Throughput (words/minute): SR 54.5 (49.6–59.4); TS 14.1 (11.1–17.2).

Report productivity (RP): normalises staff output to the daily report volume (one possible formalisation is sketched below).
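No formula is given for this normalisation; one plausible reading, consistent with Langer's "number of reports per day" (an assumption, not the author's stated definition), is

\[ \mathrm{RP} = \frac{\text{reports completed}}{\text{days observed}} \]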

Note: SR = speech recognition; ST = speech technology; HT = human transcription; RTT = report turnaround time; TAT = turnaround time (equivalent to RTT); WRR = word recognition rate; PACS = picture archiving and communication system; RP = report productivity; TS = traditional transcription service; ED = emergency department; sig. = significant; diff. = difference.