Journal of the American Medical Informatics Association (JAMIA). 2020 Apr 24;27(5):770–775. doi: 10.1093/jamia/ocaa027

How accurate is the medical record? A comparison of the physician’s note with a concealed audio recording in unannounced standardized patient encounters

Saul J Weiner 1,2, Shiyuan Wang 3, Brendan Kelly 1, Gunjan Sharma 1, Alan Schwartz 4
PMCID: PMC7647276  PMID: 32330258

Abstract

Objectives

Accurate documentation in the medical record is essential for quality care; extensive documentation is required for reimbursement. At times, these 2 imperatives conflict. We explored the concordance of information documented in the medical record with a gold standard measure.

Materials and Methods

We compared notes from 105 encounters between 36 physicians and unannounced standardized patients to audio recordings covertly collected during those visits, to identify discrepancies and estimate the reimbursement implications of billing each visit based on the note vs the care actually delivered.

Results

There were 636 documentation errors, including 181 charted findings that did not take place and 455 findings that were not charted. Ninety percent of notes contained at least 1 error. In 21 instances, the note justified a higher billing level than the gold standard audio recording, and in 4 it underrepresented the level of service (P = .005), resulting in 40 level 4 notes instead of the 23 justified by the audio, a 74% inflation in the number of level 4 visits.

Discussion

While one cannot generalize about specific error rates based on a relatively small sample of physicians exclusively within the Department of Veterans Affairs Health System, the magnitude of the findings raises fundamental concerns about the integrity of the current medical record documentation process as a representation of care actually delivered, with implications for determining both quality and resource utilization.

Conclusion

The medical record should not be assumed to reflect care delivered. Furthermore, errors of commission—documentation of services not actually provided—may inflate estimates of resource utilization.

Keywords: medical records; quality of health care; health care costs; unannounced standardized patients; medical ethics

INTRODUCTION

Health care is delivered behind closed doors as physician joins patient in a private examination room. We count on the physician (or a scribe) to accurately record in the medical record all salient information following an often complex and multifaceted interaction. The information is utilized to guide future care decisions, to assess and improve the quality of care, and to reimburse for services rendered. To accomplish each, it must be reliable.

Starting in the 1970s, researchers began asking such basic questions as whether physicians’ handwritten notes contain a legible description of the “… problem for which the patient is now being seen?” (ie, the chief complaint).1 In 10% of audits, there was insufficient information “for the rater to determine the nature of the problem or the treatment given,” and, in nearly a third, basic information about the patient’s medical history was missing. These determinations were made by 2 physicians using prospectively established criteria with a 90% interrater agreement.

A limitation of this early research was a lack of a gold standard for what actually occurred during the visit. Were the missing diagnoses and treatment plans just not documented, or were they not addressed during the actual encounter as well (indicating the note was actually accurate)? Was any of the information that did make it into the medical record—such as plans to order tests or new medications—never elicited (indicating the note was actually fictive)?

A subsequent study a few years later examined the medical records of 3 physicians in a pediatrics clinic by comparing them with audio recordings of 51 visits “according to the presence or absence of a variety of items which are present in both the tape and in the record, present in the record but not the tape, present in both but significantly different in content (not merely terminology), or absent from both tape and record.”2 Overall, the research team found relatively few inaccuracies in the medical record but many omissions of clinically significant information. Of note, however, the physicians knew when they were being recorded and that their accuracy would be checked. A similar study in 1981, again comparing charts to audio recordings, arrived at a similar finding of relative accuracy in documentation but with many omissions of clinically relevant information. Again, physicians knew when they were being recorded.3

Since those early studies, 3 major changes affecting documentation have occurred: 1) the advent of the electronic health record, 2) a billing and coding system that determines the level of reimbursement based on what is written in the medical record, and 3) quality reviews that hold physicians accountable for what is or is not in their notes. Every medical professional hears the mantra “If you didn’t document it, it didn’t happen!”4

In theory, such an environment should lead to greater thoroughness. But could it also incline some physicians to write notes that represent how care should be delivered rather than how it is, in fact, delivered? To bill a new patient as 99204 (“Level 4”), for instance, physicians must document 4 elements of the history of present illness, 2 from the social or family history, 10 from the review of systems, and 2 elements from each of 9 organ systems for the physical exam. But does documenting these tasks mean they happened and were done correctly? Or might some physicians document services that they didn’t provide?
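To make these documentation thresholds concrete, here is a minimal sketch in Python of a count-based check along the lines just described; the function name and input format are our own hypothetical illustration, not part of the study or of any actual billing software.

```python
# Hypothetical sketch: does a note's documented element count meet the
# 99204 ("Level 4") new-patient thresholds described in the text?
# The function name and input structure are illustrative only.
def meets_99204_documentation(counts: dict) -> bool:
    exam = counts.get("exam_elements_per_system", [])
    return (
        counts.get("hpi", 0) >= 4         # 4 history of present illness elements
        and counts.get("fh_sh", 0) >= 2   # 2 from the social or family history
        and counts.get("ros", 0) >= 10    # 10 review-of-systems items
        and sum(1 for n in exam if n >= 2) >= 9  # 2 elements x 9 organ systems
    )

# A note can satisfy the counts regardless of whether the services occurred:
print(meets_99204_documentation(
    {"hpi": 4, "fh_sh": 2, "ros": 10, "exam_elements_per_system": [2] * 9}
))  # True
```

The point of the sketch is that such a check operates on what is written, not on what was done, which is exactly the gap this study set out to measure.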

A study employing the novel method of unannounced standardized patients (USPs) in 1996–97 began to shed light on this question.5 The USP is an individual, often a trained actor, who portrays a real patient and strictly adheres to a script (which is what makes them “standardized”). At the time of the encounter, the physician is generally unaware of interacting with a fake patient; the average detection rate is 13%.6 Following the visit, the USP completes a checklist which, when compared by a trained auditor to an audio recording of the encounter, is more than 90% accurate.7 USPs have been considered a credible “gold standard” for measuring physician performance and, specifically, for evaluating the accuracy of “data obtained from other sources, such as medical records …”7

In the 1996–97 study, immediately following each encounter, the USP completed a checklist of 25–30 items documenting whether specific tasks occurred pertaining to eliciting the medical history, completing the physical exam, discussing a diagnosis, and proposing a treatment plan. These were compared to the content of the physician’s note. In addition to the frequent omissions of clinically significant information reported in the older studies, the researchers observed a new kind of error: false positives. False positives occur when information found in the medical record, such as a normal physical exam or an unremarkable review of systems, was never elicited during the encounter. As a proportion of the items on the checklists, false positives were highest for the physical exam (13.5%) and diagnosis (14.6%) but present in all sections of the notes. In their published report of the findings in 2002, the authors concluded: “As the electronic record becomes the standard for physician documentation, new threats to the integrity of the record emerge: templates and other time-saving mechanisms offer new possibilities for embellishing the record and propagating misinformation. The increase in documentation requirements and the growing scrutiny of the medical record only raise the incentives to falsify it.”5

OBJECTIVE

We sought to explore this concordance further, at a more granular level, drawing on data collected by USPs a decade later, this time utilizing concealed audio recordings rather than checklists alone, thereby enabling a comprehensive cataloguing of documentation errors.

MATERIALS AND METHODS

A detailed description of the original data collection process has been previously published.8 In brief, 8 actors were trained at the University of Illinois at Chicago, Dr. Allan L. and Mary L. Graham Clinical Performance Center to portray 4 cases (A–D). The original study for which the cases were developed was designed to assess how effectively physicians avoid clinical decision-making errors caused by overlooking either biomedical or psychosocial information. Participating subjects, all board-certified attending physicians, consented to a protocol in which actors carried concealed audio recorders (the recordings were subsequently transcribed) and completed checklists for each encounter following the visit. Physicians were notified that they had seen a USP via an e-mail message soon after they completed their note. The message contained a postvisit suspicion question; 81% responded that they had believed they were seeing a real patient during the visit.

Case A is a 43-year-old man with persistent asthma symptoms despite taking an expensive brand-name steroid inhaler. Depending on the variant, underlying reasons include an inability to afford the medication, undiagnosed gastroesophageal reflux, or both. Case B is a 47-year-old woman presenting to a primary care doctor for preoperative evaluation for a hip replacement, secondary to damage from a decades-old motor vehicle accident. She is the sole caregiver for a young adult child with a chronic debilitating neurological disease. Case C is a 59-year-old man with diabetes and several near-fainting spells that may be related to hypoglycemia (caused by mild cognitive deficits and loss of social support leading to confusion about how to take his medications), an undiagnosed cardiac condition, or both. Case D is a 72-year-old man with weight loss, which could indicate an underlying malignancy, intermittent homelessness and food insecurity, or both. For all 4 cases, the number of diagnostic and management options, complexity of data, and risks of complications are sufficient to require “moderately complex” medical decision-making for coding and billing a level 4 new office visit (99204), if other criteria are met.

In the original study, a total of 380 encounters with 111 physicians were successfully audio-recorded, and transcripts were made from the audio recordings. Participants were all attending physicians in ambulatory internal medicine clinics across a range of solo, small group, academic, private, safety net, and Veterans Affairs (VA) practice settings. We selected 105 of the encounters (from the 3 participating VA facilities) because these sites did not provide patients with previsit questionnaires to complete, which might otherwise account for information seen in physicians’ notes but not heard on the audio recordings.8 In the original study, there was no association between physicians’ clinical performance and their practice location, providing some evidence that the 36 VA physicians included in this analysis were not more or less error-prone than the 75 non-VA physicians who were excluded.8

All visit notes were entered by physicians in the VA electronic medical record system, VistA CPRS. The system allows copying and pasting of text both from other notes and from templates that physicians can keep on their desktop, so there is considerable variability and flexibility in how providers document encounters. Because all visits were new patient encounters, however, there was no opportunity to copy and paste information from a prior note.

Comparison of the audio recordings and transcripts with the physicians’ notes was carried out by 2 experienced nonmedically trained audio coders, a medical student, and an attending physician. The principal method for analysis of each encounter consisted of the following 3 steps: 1) verify the transcript while listening to the audio recording; 2) look for each element of the transcript in the corresponding physician’s note, noting whether it is present (regardless of its location in the note), missing, or present but inaccurate; and 3) if there are remaining elements in the physician’s note, confirm that they were not overlooked in the transcripts and, if they were, determine whether they are accurate. The construct of “elements” of information in the medical record follows the usage of the term in CPT Coding Guidelines for Office Visits.9 To maximize accuracy, we generally excluded the physical exam, since documented findings could not be verified by audio recording. Exceptions were instances in which the physician was heard discussing physical findings (such as an actor’s arthritic joints) that were not documented in the note, or documented a physical finding (eg, normal foot exam) without being heard asking the USP to expose the body part.

We adopted the nomenclature “medical record error of omission” to describe elements found in the transcripts and audio recordings but omitted from the medical record, and “medical record error of commission” to describe elements documented in the medical record but not present in the transcripts and audio recordings (referred to below simply as errors of omission or commission, respectively, for brevity). Inaccurate information was coded twice, as both an omission (of the correct information) and a commission (since the documented version is not found in the transcripts and audio recordings). These pairings were tagged so they could be counted separately as “inaccuracies” in the medical record. Finally, in order to isolate the most consequential errors, we distinguished between errors related to the chief complaint (category 1) and those related to the patient’s general care (category 2), and whether or not an error was clinically significant. An error was considered not clinically significant if it was unrelated to any of a patient’s medical conditions, symptoms, or signs. For instance, documenting that a patient has 1 grandson when, on the audio recording, they said they are expecting 1 grandson is not clinically significant. Classifications of elements were verified by an attending physician board-certified in internal medicine.
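To summarize the coding scheme, the following is a minimal sketch, with hypothetical names, of the error taxonomy described above; it is our own illustration, not the study’s actual coding instrument.

```python
# Illustrative sketch of the error taxonomy described in the text.
# Class and field names are hypothetical, not the study's instrument.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ErrorType(Enum):
    OMISSION = "omission"      # heard on audio/transcript, absent from the note
    COMMISSION = "commission"  # in the note, absent from audio/transcript

@dataclass
class DocumentationError:
    error_type: ErrorType
    section: str                  # e.g., "HPI", "ROS", "FH/SH"
    category_1: bool              # related to the chief complaint?
    clinically_significant: bool  # tied to conditions, symptoms, or signs?
    inaccuracy_pair: Optional[int] = None  # links the 2 halves of an inaccuracy

# An inaccuracy (a wrong value in the note) is coded twice, as a linked pair:
mischarted_bp = [
    DocumentationError(ErrorType.OMISSION, "Vitals", False, False, inaccuracy_pair=1),
    DocumentationError(ErrorType.COMMISSION, "Vitals", False, False, inaccuracy_pair=1),
]
```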

Analyses are primarily descriptive. We summarize the frequencies and types of errors overall and by the note sections in which they occurred. We also determined the billing (visit complexity) level of each encounter based on the note and based on the audio-recorded events of the visit, and examined the association between the 2 using McNemar’s test of correlated proportions to test whether notes justifying downcoding or upcoding were more prevalent. Applying the Current Procedural Terminology Coding Guidelines for Office Visits,9 we compared the services documented in the note with those heard on the audio, with the former determining the level of billing and the latter representing the true reimbursement value of an encounter. This analysis, as well as the original study on which it was based, was approved by the Institutional Review Board at Jesse Brown Veterans Affairs Medical Center.
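For readers unfamiliar with the test, here is a minimal sketch of McNemar’s test applied to the discordant encounters; it uses the counts reported in the Results below and is our own illustration, not the study’s analysis code. The exact P value depends on which variant of the test is computed.

```python
# Minimal sketch: McNemar's test on paired note-based vs audio-based
# billing determinations. Illustrative only, not the study's analysis code.
from statsmodels.stats.contingency_tables import mcnemar

# The test depends only on the discordant pairs: 21 encounters where the
# note justified a higher level than the audio vs 4 where it justified a
# lower one. Concordant encounters (the diagonal cells) do not affect it.
table = [[0, 21],
         [4, 0]]

result = mcnemar(table, exact=True)  # exact binomial test on 21 vs 4
print(result.pvalue)  # a small P value; the paper reports P = .005, which
                      # may reflect a different variant of the test
```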

RESULTS

Across 105 encounters, 11 (10%) had no documentation errors, and 94 (90%) had at least 1. (One encounter was incomplete due to a recorder malfunction. For this encounter, we included 4 omission errors based on the partial audio recording and 1 commission error based on the USP checklist, which indicated that the patient was not asked about several ROS elements although the note documented a negative ROS. In the analyses of billing discrepancies, we assumed that this encounter would have been billed as performed [ie, neither under- nor overbilled].) Overall, there were 636 documentation errors, referencing the audio recording as the gold standard. Of these, 181 (28.5%) were errors of commission and 455 (71.5%) were errors of omission. Among these were 55 inaccuracies in the medical record which, as noted above, counted as both an omission and a commission. Eighty-three percent of errors were clinically significant and, of these, 47% were category 1 errors. This means there was an overall average of 5 clinically significant errors and 2.3 category 1 errors per encounter.
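The per-encounter averages follow directly from these counts; a quick arithmetic check (our own, using the figures above):

```python
# Quick check of the per-encounter averages reported above.
total_errors = 636
encounters = 105
clinically_significant = 0.83 * total_errors       # about 528 errors
print(clinically_significant / encounters)         # about 5.0 per encounter
print(0.47 * clinically_significant / encounters)  # about 2.4 per encounter;
# matches the reported 2.3 within the rounding of the percentages
```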

As shown in Table 1, nearly half of all errors were found in just 2 sections of the notes: the history of present illness (HPI) and the family history/social history (FH/SH). About 79% of these were errors of omission. Conversely, in the third most error-prone section, the review of systems (ROS), most errors (73%) were commissions. Error counts in 2 sections, vitals and physical exam, are likely substantial underestimates: without video we could not detect most discrepancies, and we included only the few we heard on audio or knew were present because the note did not match the actors’ physical characteristics. The most accurate section of the note was the chief complaint, where there were only a few instances in which the physician did not correctly document the patient’s stated reason for seeking care.

Table 1.

Distribution and types of errors across sections of the physician’s note

| Section of note | Number of errors (% of total errors) | % Commissions by section | % Omissions by section | % Clinically significant by section | % Category 1 by section |
| --- | --- | --- | --- | --- | --- |
| Chief Complaint | 6 (1%) | 17% | 83% | 100% | 100% |
| HPI | 119 (19%) | 21% | 79% | 97% | 85% |
| PMH/PSH | 61 (10%) | 12% | 89% | 92% | 30% |
| Immunizations | 19 (3%) | 21% | 79% | 100% | 0% |
| FH/SH | 184 (29%) | 21% | 79% | 63% | 22% |
| Allergies | 25 (4%) | 24% | 76% | 100% | 0% |
| Meds | 32 (5%) | 28% | 72% | 88% | 78% |
| ROS | 90 (14%) | 73% | 27% | 99% | 32% |
| Vitals | 3 (1%) | 33% | 67% | 33% | 100% |
| Physical Exam | 10 (2%) | 40% | 60% | 70% | 0% |
| Total (all sections) | 636 (100%) | 29% | 72% | 83% | 39% |

Abbreviations: FH/SH, family history/social history; HPI, history of present illness; PMH/PSH, past medical history/past surgical history; ROS, review of systems.

Notes: Sum of percent of total errors in each section and sum of percent commission and omission errors in each section may not add up to 100% due to rounding. “Clinically significant” errors are those related to a patient’s medical conditions, symptoms, or signs. “Category 1” errors are those related to the chief complaint.

Table 2 contains examples of both types of errors from each section of the notes. In each omission example, “Audio” indicates what was heard on the audio recording but not seen in the note; in each commission example, “Note” indicates what was seen in the note but not heard on the audio recording. The letters “A” through “D” in parentheses refer to the cases portrayed by the actors as described above.

Table 2.

Examples of documentation errors of omission and commission in each section of the physician’s note

Chief Complaint

  • Omission (C): Audio: Diabetic patient describes feeling “woozy” with “pounding chest” during presyncopal event. Note: These symptoms not documented.

  • Commission (D): Note: Physician documented patient “has no complaints at this time,” but he was never asked. Audio: Patient reported weight loss later in the visit, which was not documented (also an error of omission).

HPI

  • Omission (B): Audio: Hypothyroid patient mentions her last 3 periods have been heavier than normal. Note: Not documented.

  • Commission (D): Note: “Denies abdominal pain, fevers, or chills” in patient with unexplained weight loss. Audio: Patient was not asked and did not volunteer the information.

PMH/PSH

  • Omission (B): Audio: Patient reported she injured her hip in a car accident in 1972. Note: Not documented in patient presenting for hip replacement preoperative evaluation.

  • Commission (B): Note: “No history of heart or lung disease.” Audio: Patient was never asked despite seeking preoperative assessment for hip replacement.

Immunizations

  • Omission (C): Audio: Patient with diabetes declines pneumococcal vaccine. Note: Not documented.

  • Commission (B): Note: “Up to date on immunizations.” Audio: Patient was never asked.

FH/SH

  • Omission (A): Audio: Patient “stretching” his Pulmicort medication since loss of job. Note: Not documented despite poorly controlled asthma.

  • Commission (A): Note: Documents asthma patient as “a smoker.” Audio: Did not ask patient if he currently smokes; asked if he smoked when younger, and he answered no. (Patient never smoked.)

Allergies

  • Omission (D): Audio: Patient reported penicillin and nuts give him “a blotchy, itchy rash all over.” Note: Not documented.

  • Commission (B): Note: “NKDA” (no known drug allergies). Audio: Did not ask patient about allergies.

Meds

  • Omission (C): Audio: Patient reports he started Novolog insulin 2 weeks before onset of hypoglycemic symptoms. Note: Not documented.

  • Commission (B): Note: “OTC Med: 1 aspirin daily, Tylenol prn, 1 MVI daily.” Audio: Patient did not report taking any of these medications.

ROS

  • Omission (D): Audio: “No fevers, chills, night sweats” heard. Note: Not recorded.

  • Commission (A): Note: “No SOB” (part of an all-negative ROS). Audio: No ROS questions asked; patient reported he was short of breath, which was noted in the HPI.

Vitals

  • Omission (B): Audio: Physician notes patient has normal BP despite reported history of hypertension. Note: BP not documented.

  • Commission (B): Audio: Physician tells patient BP is 120/60 on repeat. Note: 113/69 (not clinically significant).

Physical Exam

  • Omission (D): Audio: Arthritic changes in knee joint (category 2: not related to chief complaint). Note: Not documented.

  • Commission (C): Note: “feet with no CCE” (ie, clubbing, cyanosis, edema). Audio: Patient never instructed to remove shoes.

Plan

  • Omission (C): Audio: Physician twice tells patient he needs to start taking aspirin daily. Note: Not included in plan, which listed other medications to start.

  • Commission (C): Note: “Foot care recommendations given” in patient with diabetes. Audio: Not heard on audio.

Abbreviations: FH/SH, family history/social history; HPI, history of present illness; PMH/PSH, past medical history/past surgical history; prn, as needed; ROS, review of systems.

Table 3 enumerates the payment effect of omission and commission errors, utilizing 2019 Medicare reimbursement rates and assuming the practice assigns a billing level correctly based on documented care.10 Overall, there were 21 instances (20%) in which the note justified billing higher than the gold standard audio and 4 (4%) in which it underrepresented the level of service, a significant difference (McNemar’s test P = .005). The net 16 upcoded instances at the level 3 vs 4 boundary (19 cases of level 3 care billed at level 4, offset by 3 cases of level 4 care billed at level 3) resulted in 40 level 4 notes instead of the 23 justified by the audio, a 74% inflation in the number of level 4 visits. Taking into account all encounters (including correctly documented and downcoded encounters), the notes justified net reimbursement of $1034 above actual services rendered, an average of $9.85 per encounter.

Table 3.

Discrepancies in coding level based on the medical record and services heard on audio

| Level discrepancy | Note > Audio (Justifies upcode) | Audio > Note (Justifies downcode) | Net difference | Percent of 105 visits | Payment effect (2019) |
| --- | --- | --- | --- | --- | --- |
| 4 vs 3 | 19 | 3 | +16 | +15% | +$912 |
| 4 vs 2 | 1 | 0 | +1 | +1% | +$91 |
| 3 vs 2 | 0 | 1 | −1 | −1% | −$34 |
| 3 vs 1 | 1 | 0 | +1 | +1% | +$65 |
| All | 21 | 4 | +17 | +16% | +$1034 |

Notes: In addition to the 25 visits with discrepancies in coding between the medical record and the visit audio recording, 79 visits would be coded at the same level based on both the record and the audio (9 as level 2 or lower, 50 as level 3, 20 as level 4), plus 1 incomplete encounter, which we treated as nondiscrepant.
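As an arithmetic check, the net payment effect and the level 4 inflation figure can be reproduced from the counts in Table 3; the per-discrepancy dollar amounts below are implied by dividing each row’s payment effect by its net count.

```python
# Reproduce Table 3's net payment effect and the level 4 inflation figure.
# Per-discrepancy dollar differences are implied by the table's rows.
net_payment = 16 * 57 + 1 * 91 - 1 * 34 + 1 * 65   # +$912 +$91 -$34 +$65
print(net_payment, round(net_payment / 105, 2))    # $1034 net, ~$9.85/encounter

# Level 4 inflation: 20 concordant level 4 visits (see Notes above), plus
# 19 + 1 upcoded in the note, vs the 20 + 3 justified by the audio.
note_level4, audio_level4 = 20 + 19 + 1, 20 + 3    # 40 vs 23
print(round(100 * (note_level4 - audio_level4) / audio_level4))  # 74 (%)
```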

DISCUSSION

The electronic health record has not resolved the documentation inaccuracies that have likely always been present in patient care records. Although it is difficult to compare past frequencies and types of error with the data collected in this study, the high number of errors of omission in the HPI likely reflects the inherent challenge of capturing all of the pertinent information a patient reveals when describing their current signs and symptoms. This is likely not new. In fact, because most people type faster than they write, the HPI is likely more complete now than in the past. The high frequency of errors in the family and social histories may reflect a tendency of physicians to undervalue this information even when it is clinically relevant.11

Errors of commission, however, appear to be a relatively new documentation problem with potentially far-reaching consequences. Such errors were not reported in studies conducted before billing was based on documentation and before providers began to utilize templates and/or the copy-and-paste function.12 Not surprisingly, the ROS accounts for the largest proportion of errors of commission, given that at least 10 systems must be included to justify a level 4 visit for a new patient. Physicians get credit either by writing that their patient has “a negative ROS” or by checking off at least 10 boxes. This leads to instances of misinformation, such as an ROS indicating “no shortness of breath” in a patient presenting with poorly controlled asthma (Table 2) or “negative weight loss” in a patient presenting with unexplained weight loss. The commission errors in Table 2 illustrate falsely reported information that could lead to inappropriate care, given the actors’ portrayal of the cases.

In addition to quality of care implications, commission errors likely drive up health care costs. As shown in Table 3, they can lead to difficult-to-detect upcoding, when the note exaggerates the level of services that were actually provided.

There are several limitations to the study. First, we assume that the billing level selected by physicians and practices for each encounter matches the billing level justified by the note. That is how providers are required to bill, but not all do so accurately, resulting in both upcoding and downcoding relative to the level justified by the note.13 Second, although USP detection rates were low, it is possible that simply knowing they were in a USP study motivated physicians to document more accurately, which would mean our findings underestimate documentation error rates. Third, the dataset employed for this analysis is based on just 36 VA physicians, 105 encounters, and 4 cases, so the extent to which the findings are generalizable is unknown. It is also unknown how compliance enforcement of accurate documentation of health care delivery compares between VA and non-VA health care settings; both are monitored by federal offices of the inspector general.14 Carrying out a similar analysis in private practices would, however, introduce other confounders: as noted, at many practices, patients complete previsit questionnaires about their health, leaving uncertainty about whether findings in the note not heard on audio are commission errors or were legitimately taken from patient-completed paperwork.

A Centers for Medicare & Medicaid Services (CMS) initiative, Patients Over Paperwork, may reduce the pressures that lead to commission errors through the introduction, in 2021, of a single payment rate for visits currently reported as levels 2–4, with level 2 documentation sufficient to justify payment.15 It may also result in a more accurate record of care. Accurate medical records are not only important for keeping medical costs in line with services provided but also critical for providing high-quality, high-value patient care. Our findings should raise fundamental concerns about the utility and integrity of the current medical record documentation process and reinforce the case for a change in documentation requirements.

FUNDING

This work was supported by the Department of Veterans Affairs, Health Services Research & Development, grant number IIR 04-107-2, and by the University of Illinois at Chicago College of Medicine Dr. C. M. Craig Fellowship for Summer Research.

AUTHOR CONTRIBUTIONS

SJW contributed to the design of the work and drafted the manuscript. SW, BK, and GS contributed to the analysis of the work and substantive revision of the manuscript. AS contributed to the design of the work and substantive revision of the manuscript. All authors approved the final version of the manuscript and are accountable for the work.

ACKNOWLEDGMENTS

This material is based upon work supported by the Department of Veterans Affairs, Veterans Health Administration, Office of Research and Development. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the US Department of Veterans Affairs or the United States government.

CONFLICT OF INTEREST STATEMENT

Two of the authors (SW and AS) are coprincipals of the Institute for Practice and Provider Performance Improvement, founded to employ unannounced standardized patient assessments as a quality improvement service. There are no other potential conflicts of interest or relationships or activities by any of the authors that could appear to have influenced the submitted work.

REFERENCES

