AMIA Annual Symposium Proceedings. 2018 Apr 16;2017:912–920.

Accuracy and Completeness of Clinical Coding Using ICD-10 for Ambulatory Visits

Jan Horsky 1,4, Elizabeth A Drucker 2,3,4, Harley Z Ramelson 1,4,5
PMCID: PMC5977598  PMID: 29854158

Abstract

This study describes a simulation of diagnostic coding using an EHR. Twenty-three ambulatory clinicians were asked to enter appropriate codes for six standardized scenarios with two different EHRs. Their interactions with the query interface were analyzed for patterns and variations in search strategies, and the resulting sets of entered codes were evaluated for accuracy and completeness. Just over half of the entered codes were appropriate for a given scenario and about a quarter were omitted. The Crohn’s disease and diabetes scenarios had the highest rates of inappropriate coding and code variation. The omission rate was higher for secondary than for primary visit diagnoses. Codes for immunization, dialysis dependence and nicotine dependence were the most often omitted. We also found a high rate of variation in the search terms used to query the EHR for the same diagnoses. Changes to clinician training and improved design of EHR query modules may lower the rate of inappropriate and omitted codes.

Introduction

The almost seventy thousand codes that comprise the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10-CM) are far more detailed than those in the preceding version, which clinicians in the United States had been working with since the late 1970s.1 This new level of complexity is expected not only to facilitate documenting and reporting causes of mortality and morbidity but also to extend the ability to identify and manage clinical processes with information technology, for example by identifying changes in medication management and by monitoring data for health maintenance and preventive care.2 Highly granular and more accurate data are also indispensable for rapidly expanding secondary uses such as detecting healthcare fraud, developing patient safety criteria, setting healthcare policy, developing public health initiatives, improving clinical performance and, crucially, enabling large-scale analyses for medical research. Codes are also essential in clinical care for phenotyping and predictive modeling of patient state.3 The transition also has significant implications for reimbursement from health care insurers. Diagnostic codes may be used to determine the severity of illness of a provider’s patient population and affect payment rates under newly adopted payment models.

Codes that previously could not differentiate between several types of diabetes, for example, are now refined to capture important distinctions, but they require clinicians to document underlying causal conditions or whether the disease was drug-induced.4 A more detailed description of laterality and location in the patient’s body is also a newly added specification. The previous emphasis on organs and disease, which prioritized physician-oriented content, is expanded to also cover human responses to disease that are necessary for advanced nursing and long-term care.5

Many electronic health record (EHR) systems integrate clinical documentation and billing information and provide cross-mapping between primarily care-oriented and reimbursement- or report-oriented data. Problem lists, for example, need to conform to standardized vocabularies based on ICD-10 or SNOMED codes for the CMS EHR Incentive Program known as Meaningful Use.6 EHRs may employ their own proprietary reference terminologies that allow clinicians to search for and display diagnostic and other concepts in forms they find customary and meaningful, while maintaining mapped connections in the background to more or less granular codes intended for reporting, financial or automated decision-support purposes. Typically, a clinician adding a coded term to a problem list or a fully qualified ICD-10 code to a billing record starts by typing one or more words, an abbreviation or a string of characters into a free-text query field. A record search engine within the EHR then returns results based on their relevance to the search string and ranks them in a list according to differentiating logic for complete-word or partial-word matching. For example, a search initiated by typing “esrd” may return many diagnostic codes related to end-stage renal disease. Further search through the results may be necessary, either by reading the list or by repeating the search with different terms, to find the exact code appropriate for the intended purpose. More sophisticated systems provide automated assistance with this refinement by using “wizards” or other support interventions to help locate the target diagnosis quickly.
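To make this query-and-rank step concrete, the following minimal Python sketch scores a small, hypothetical in-memory code list by complete- versus partial-word matching. The code list, the scoring rule and the function name are illustrative assumptions only; actual EHR search engines rely on proprietary reference terminologies, abbreviation expansion and far more elaborate relevance logic.

```python
# Minimal sketch of free-text ICD-10 search with complete- vs. partial-word
# matching; the CODES dictionary and scoring weights are illustrative assumptions.
CODES = {
    "N18.6": "End stage renal disease",
    "Z99.2": "Dependence on renal dialysis",
    "I10": "Essential (primary) hypertension",
    "I48.0": "Paroxysmal atrial fibrillation",
}

def search(query: str):
    terms = query.lower().split()
    ranked = []
    for code, description in CODES.items():
        words = description.lower().split()
        score = 0
        for t in terms:
            if t in words:                               # complete-word match
                score += 2
            elif any(w.startswith(t) for w in words):    # partial (prefix) match
                score += 1
        if score:
            ranked.append((score, code, description))
    # Highest-scoring (most relevant) results are listed first.
    return [(code, desc) for score, code, desc in sorted(ranked, reverse=True)]

print(search("renal"))        # both renal-related codes
print(search("atrial fib"))   # partial-word match on "fibrillation"
```

Note that an abbreviation such as “esrd” would match nothing in this naive sketch; a real system would need a synonym or abbreviation map in its reference terminology to return the end-stage renal disease codes described above.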

The almost four-fold increase in the number of diagnoses in the current coding system presents a formidable challenge to computer engineers and designers to develop algorithms and human interfaces that can, in the same short time available to clinicians and medical coders in routine practice, query, compare and select the best descriptive diagnostic or other code in the vastly expanded field. A recent survey of perspectives that coders and physicians had about the practical usefulness of ICD-10 showed that most agreed on the need for computer-assisted coding.7 If the query process is not effective, however, clinicians may find themselves facing a choice between accurate and “close enough” coding when time constraints preclude further refinement of the process. This learned behavior would directly contravene the goal of improved and precise documentation of clinical care made possible with the ICD-10 system.

This study was intended to describe the search behavior of clinicians entering ICD-10 codes with the tools available in large EHR systems. Our objectives were to observe interactive behavior that may contribute to incomplete or inaccurate coding and to analyze variations in coded diagnoses for standardized clinical scenarios. Findings of systematic errors or difficulties in completing the coding task may help inform or revise the training that clinicians currently receive, provide evidence and insight for improving electronic coding tools, or indicate a need to revise the coding system itself.

Methods

The study was designed as a simulation of a clinical documentation task where clinicians used standardized case scenarios to enter diagnostic codes into the EHR. We asked 23 physicians to read short vignettes describing a variety of ambulatory visits and then enter relevant ICD-10 codes into a mock patient record. Seventeen participants completed two sets of three scenarios, using each set for a different EHR; six completed only one set, using EHR 1, due to technical reasons. In total, there were 40 completed sets: 23 on EHR 1 (12 Sets A and 11 Sets B), and 17 on EHR 2 (10 Sets A and 7 Sets B). The order of set completion (A vs. B) alternated to minimize possible learning bias.

EHR 1 was a commercial system and EHR 2 an internally developed clinical information system. Both required an initial entry of a search term into a free-text field and returned a list of ICD-10 codes with descriptions. If the target term was in the results, participants could simply select it or further refine the list by entering a different search string. Decision support interventions were available on both systems and were either triggered automatically for a subset of diagnoses in EHR 1, with an option to disregard them, or were designed as a part of the entry process on EHR 2.8 Participants choosing to use decision support on EHR 1 could click on modification terms in pre-determined sets, and an algorithm would refine the results to a single ICD-10 code accordingly. An initial search on EHR 2 returned a list filtered by patient parameters such as age and gender that could also be modified by selecting answers to term-specific questions (e.g., Laterality? Left, Right, etc.). Both systems used algorithms and branching logic during the guided-search phase to refine result lists and suggest fully specified billable codes. For example, if “otitis media” was the initially entered search term, the support intervention would show subsets of optional terms for laterality, chronicity and recurrence. The visual presentation of these terms, their number and content differed between the two systems.
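The branching refinement described above can be pictured as a small decision tree. The sketch below hard-codes one hypothetical tree for the otitis media example, using codes that appear in Table 3; it is an illustration under that assumption, not either EHR's actual guided-search logic.

```python
# Toy sketch of guided-search refinement: answering modifier questions walks a
# branching structure down to a single billable code. The tree is hand-written
# for illustration only.
OTITIS_TREE = {
    "question": "Laterality?",
    "options": {
        "right": "H65.01 Acute serous otitis media, right ear",
        "left": "H65.02 Acute serous otitis media, left ear",
        "bilateral": {
            "question": "Recurrent?",
            "options": {
                "no": "H65.03 Acute serous otitis media, bilateral",
                "yes": "H65.06 Acute serous otitis media, recurrent, bilateral",
            },
        },
    },
}

def refine(tree, answers):
    """Apply the clinician's answers until a fully specified code remains."""
    node = tree
    for answer in answers:
        if isinstance(node, str):   # already at a leaf (billable code)
            break
        node = node["options"][answer]
    return node

print(refine(OTITIS_TREE, ["bilateral", "yes"]))
# -> H65.06 Acute serous otitis media, recurrent, bilateral
```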

Practicing ambulatory clinicians (22 physicians, 1 physician assistant) were recruited through internal email advertising as a convenience sample. Twenty-one (91%) had ten or more years of professional experience; sixteen (65%) were primary care providers and seven were specialists. Ten (44%) had used EHR 1 for six months or more, and twenty-one (91%) used it daily in practice. All were proficient with EHR 2, as it had served as the primary ambulatory record system prior to an institution-wide transition to the commercial system. The ICD-10 requirement went into full effect in the hospital during the transition period, so clinicians had been entering codes with both EHRs for approximately the same amount of time. Both systems had proprietary interface terminology and participants did not use any other sources, such as web-based ICD-10 references.

Participants completed the task individually in the presence of an experimenter on a single workstation that was connected to both EHR systems. They were instructed to read each scenario and then find relevant codes according to their own clinical judgment, and to use their preferred strategy in order to best simulate authentic behavior. The number of codes expected for each scenario was not explicitly stated, only that more than one may be necessary. Interactions of clinicians with the two systems were recorded into an audiovisual media file using Morae9 screen-capture software running in the background. The recordings of full screens and verbal comments were later analyzed.

Scenarios and accuracy-rating criteria for diagnostic codes

Study scenarios were taken verbatim from interactive case studies made available by the Centers for Medicare & Medicaid Services (CMS) on their Road to Ten website.10 Since the CMS recommendations for the correct ICD-10 codes were only one example of proper coding, we developed rating criteria for the appropriateness of codes in order to accommodate other correct coding options. Two physicians (HZR, EAD) independently reviewed the codes entered by the participants and rated whether they were appropriate in the context of each scenario. They reached consensus on disagreements through a series of discussions. Ratings were based on two criteria: clinical accuracy and completeness. Although the rating was binary (appropriate or not), the reviewers acknowledged the potential range of responses due to variations in clinical judgment and in the complexities of the structure and intent of the ICD-10 coding system. A code rated “appropriate” could be clinically accurate but incomplete in a way that neither altered the diagnosis nor omitted clinically crucial information. For example, indicating that allergic rhinitis was seasonal without explicitly including pollen as the causative agent, or documenting tonsillitis or pharyngitis for a patient with an inflamed pharynx with tonsillar exudate, would still be considered “appropriate”. Codes lacking clinically significant information, however, would not. Examples include omitting streptococcus as the etiology of tonsillitis in a case where the rapid strep test was known to be positive, or not specifying the large intestine in a case of Crohn’s disease complicated by colonic abscess.

We identified one diagnosis in each scenario that was primary and reflected the main reason for visit. All others were categorized as secondary but were still required for complete documentation. They are identified in Table 1. Three of the six scenarios included an itemized clinical assessment in the vignette, as in the example below:

The patient is a three-year-old male brought in by his mother. He has had a low-grade fever of 100.5 for 3 days. He is complaining that his ears hurt with difficulty sleeping. He also has a non-productive cough that started yesterday. Examination of both ears reveals significant redness and fluid in both middle ears with no apparent involvement of the eardrum or tympanic membrane. Further examination of the patient’s breathing and other manifestations indicates an upper respiratory infection. The patient’s parents are chronic heavy smokers and the child is exposed to second-hand smoke in the home environment.

History: The patient has recurrent episodes of middle ear infections.

Assessment:

  • Acute recurrent serous otitis media

  • Upper respiratory infection (presumed of viral origin)

  • Chronic secondary smoke exposure

Table 1.

Accuracy and completeness of ICD-10 coding for each scenario

Scenario Sets and Diagnoses (scenarios with an itemized assessment are marked “Assessment given”) | Primary Dx | Appropriate: Yes (N, %) | Appropriate: No (N, %) | Omitted (N, %)
Set A
Scenario 1
Streptococcal tonsillitis Primary 20 91% 2 9% 0
Immunization 0 2 9% 20 91%
Scenario 2 Assessment given
Moderate persistent asthma Primary 17 77% 5 23% 0
Seasonal allergic rhinitis 17 77% 2 9% 3 14%
Second-hand smoke exposure 14 61% 4 17% 5 22%
Scenario 3
Crohn’s disease of large intestine with abscess Primary 13 48% 14 52% 0
Old myocardial infarction 11 48% 2 9% 10 43%
Personal history of nicotine dependence 6 27% 1 5% 15 68%
Set B
Scenario 1 Assessment given
Acute recurrent otitis media, bilateral Primary 11 61% 7 39% 0
Acute upper respiratory infection 17 94% 0 1 6%
Exposure to environmental tobacco smoke 13 72% 3 17% 2 11%
Scenario 2
Type 1 diabetes with diabetic kidney disease Primary 9 50% 4 22% 5 28%
Type 1 diabetes with diabetic polyneuropathy 4 21% 11 58% 4 21%
Dependence on renal dialysis 4 22% 0 14 78%
End stage renal disease 9 50% 0 9 50%
Scenario 3 Assessment Given
Paroxysmal atrial fibrillation Primary 14 78% 4 22% 0
Essential (primary) hypertension 18 100% 0 0
Underdosing of other hypertensive drugs 8 40% 4 20% 8 40%
Total 206 56% 64 17% 96 26%

In this example, otitis media was the primary diagnosis and the other two were secondary. Although the participants were given this listing, they were not instructed on how many codes to enter. Three scenarios did not include a clinical assessment like the one shown above and contained only the narrative. The full wording of all scenarios and examples of coding are available on the Road to Ten website maintained by the CMS.10

Analysis

Analyses consisted of enumerating the entered codes, evaluating whether or not they were appropriate for the given scenario, and checking whether any codes expected for complete coding of all implicit diagnoses were missing. We also analyzed the free-text search terms entered in the query field for variation, as well as the number of results returned by the algorithm. Group comparisons included type of scenario (with vs. without an included assessment), type of diagnosis (primary vs. secondary) and electronic health record system (EHR 1 vs. EHR 2). Chi-square or Fisher’s exact tests were used where appropriate to compute group differences and statistical significance, using SAS 9.4 software.11
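The analysis itself was run in SAS. As a rough equivalent, the sketch below shows how the same kind of group comparison could be computed in Python with scipy, using the primary-versus-secondary counts reported in Table 2; the 2x2 Fisher's exact example is purely illustrative and is not the exact contrast reported in the paper.

```python
from scipy.stats import chi2_contingency, fisher_exact

# Primary vs. secondary diagnoses by outcome (counts as reported in Table 2):
# columns are appropriate, not appropriate, omitted.
table = [
    [84, 36, 5],    # primary diagnoses
    [121, 29, 91],  # secondary diagnoses
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, dof = {dof}, p = {p:.2g}")

# Fisher's exact test applies to 2x2 tables, e.g., an illustrative comparison
# of appropriate vs. not-appropriate primary codes between the two EHRs.
odds_ratio, p_exact = fisher_exact([[43, 27], [41, 9]])
print(f"Fisher's exact p = {p_exact:.3f}")
```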

Results

We analyzed three aspects of the diagnostic coding process: a) the appropriateness and completeness of codes for each scenario, b) the variation and patterns of similarity in the free-text search terms the participants used to initiate queries, and c) the number of returned search results.

Accuracy and completeness of coding

The unit of observation was a single ICD-10 code in the context of one standardized scenario. The number of diagnostic codes required for a fully qualified (complete) description varied between 2 and 4 per scenario. We therefore expected 356 codes to be entered if all participants coded all scenarios completely (22 completions of Set A x 8 codes plus 18 completions of Set B x 10 codes). Diagnoses for each scenario and the distribution of appropriate and omitted entries are shown in Table 1. Scenarios that included an explicit assessment are identified.

Just over half (56%) of all entered diagnostic codes were rated as appropriate and about one quarter were omitted. Diagnoses with the highest accuracy rates, 90% or above, were essential hypertension, acute upper respiratory infection and streptococcal tonsillitis. Two diagnoses, type 1 diabetes mellitus with diabetic polyneuropathy and Crohn’s disease of large intestine with abscess, had a greater proportion of inappropriate than appropriate codes (58% and 52% inappropriate, respectively). The most often omitted code was for immunization (omission rate 91%), followed by dependence on renal dialysis (78% omitted) and personal history of nicotine dependence, which was omitted by two thirds of participants.

We also compared scenarios in which an itemized assessment was included with the narrative against those without one, since the assessment could serve as a cue to both the specificity of each diagnosis and the number of diagnoses that required coding. We computed their respective rates of accuracy and completeness. Results are shown in Table 2.

Table 2.

Accuracy and completeness by diagnosis type, scenario modality and EHR

Group Comparisons | ICD-10 Codes (N, %) | Appropriate (N, %) | Not Appropriate (N, %) | Omitted (N, %) | Exact X2 Test Prob.
Scenario Diagnosis < .0001
Primary 125 34% 84 67% 36 29% 5 4%
Secondary 241 66% 121 50% 29 12% 91 38%
All Diagnoses (ns)
EHR 1 212 58% 115 54% 44 21% 53 25%
EHR 2 154 42% 90 58% 21 14% 43 28%
Primary Dx Only < .003
EHR 1 73 58% 43 59% 27 37% 3 4%
EHR 2 52 42% 41 79% 9 17% 2 4%
Scenario Assessment < .0001
Given 177 48% 129 73% 29 16% 19 11%
Not given 189 52% 76 40% 36 19% 77 41%

Primary diagnoses were significantly less likely to be omitted than secondary diagnoses. Only 5 of the 125 expected primary diagnoses were not entered, and these omissions were all for the same diagnosis: type 1 diabetes with diabetic kidney disease. The omission rate for secondary diagnoses was 38%, with only half coded appropriately. There was no significant effect of the EHR used to enter the codes on the main outcome measures, and the proportions in each category were similar. However, when we analyzed primary diagnoses separately, we found that significantly more appropriate codes were entered with EHR 2 (79%) than with the commercial EHR 1 (59%). The proportion of omitted codes was equally low for both systems (4%). Scenarios that included a clinical assessment with the narrative showed higher rates of appropriate codes (73% vs. 40%), and their rate of omitted codes was reduced by roughly three quarters (from 41% to 11%).

We next assessed the frequency and distribution of ICD-10 codes for every diagnosis in the scenarios (Table 3). The typical distribution pattern was a majority of entries concentrated in one or two appropriate codes, plus a few other codes with frequencies of one or two. For example, of the 17 codes entered for paroxysmal atrial fibrillation, 13 (76%) were the correct I48.0, two were I48.2 and two were I48.91. Seven diagnoses had between 2 and 4 different codes entered, five had between 5 and 7, and two had nine and ten different codes, respectively. Two diagnoses showed a markedly different distribution: type 1 diabetes with diabetic polyneuropathy and Crohn’s disease of large intestine with abscess. In the diabetes case, only 4 of the 15 codes entered (27%) were appropriate and the rest were distributed over 8 other diagnoses, most entered by only one participant. For Crohn’s disease, 13 (48%) of the entered codes were the appropriate K50.114, and 8 other codes were also entered. Three diagnoses showed complete agreement in coding among all participants who entered them: end-stage renal disease (N18.6), essential (primary) hypertension (I10) and acute upper respiratory infection (J06.9).

Table 3.

Frequency of entered diagnoses (*rated appropriate)

ICD-10 Scenario Set A N
Immunization
Z23 Encounter for immunization 2
Streptococcal tonsillitis
J02.0* Streptococcal pharyngitis 16
J03.00* Acute streptococcal tonsillitis, unspecified 4
J03.01 Acute recurrent streptococcal tonsillitis 1
J03.90 Acute tonsillitis, unspecified 1
Moderate persistent asthma
J45.20 Mild intermittent asthma, uncomplicated 1
J45.31 Mild persistent asthma w. acute exacerbation 1
J45.40* Moderate persistent asthma, uncomplicated 8
J45.41* Moderate persistent asthma w. acute exacerbation 9
J45.50 Severe persistent asthma, uncomplicated 1
J45.901 Unspecified asthma with acute exacerbation 2
Seasonal allergic rhinitis
J30.1* Allergic rhinitis due to pollen 1
J30.2* Other seasonal allergic rhinitis 13
J30.9* Allergic rhinitis, unspecified 3
J31.0 Chronic rhinitis 1
T78.40XA Allergy, unspecified, initial encounter 1
Second-hand smoke exposure
P96.81 Exposure to (parental) (environmental) tobacco smoke in the perinatal period 1
T59.811A Toxic effect of smoke, accidental (unintentional), initial encounter 1
Z57.31 Occupational exposure to environmental tobacco smoke 1
Z77.22* Contact with and (suspected) exposure to environmental tobacco smoke 14
Z77.29 Contact with and (suspected) exposure to other hazardous substances 1
Crohn’s disease of large intestine with abscess
K50.10 Crohn’s disease of large intestine without complications 1
K50.114* Crohn’s disease of large intestine with abscess 13
K50.914 Crohn’s disease, unspecified, with abscess 5
K65.0 Generalized (acute) peritonitis 1
L02.91 Cutaneous abscess, unspecified 1
R10.0 Acute abdomen 2
R10.81 Abdominal tenderness 1
R10.84 Generalized abdominal pain 2
Z87.19 Personal history of other diseases of the digestive system 1
Old myocardial infarction
I21.3 ST elevation (STEMI) myocardial infarction of unspecified site 1
I25.10* Atherosclerotic heart disease of native coronary artery without angina pectoris 8
I25.2* Old myocardial infarction 3
I25.810 Atherosclerosis of coronary artery bypass graft(s) without angina pectoris 1
Personal history of nicotine dependence
F17.211* Nicotine dependence, cigarettes, in remission 2
Z72.0 Tobacco use 1
Z87.891* Personal history of nicotine dependence 3
ICD-10 Scenario Set B N
Acute recurrent serous otitis media, bilateral
H65.0 Acute serous otitis media 1
H65.01 Acute serous otitis media, right ear 1
H65.02 Acute serous otitis media, left ear 1
H65.03 Acute serous otitis media, bilateral 1
H65.06* Acute serous otitis media, recurrent, bilateral 11
H65.07 Acute serous otitis media, recurrent, uns. ear 1
H66.003 Acute suppurative otitis media without spontaneous rupture of ear drum, bilateral 2
Acute upper respiratory infection
J06.9* Acute upper respiratory infection, unspecified 17
Exposure to environmental tobacco smoke
Y26.XXXA Exposure to smoke, fire and flames, undetermined intent, initial encounter 2
Z77.22* Contact with and (suspected) exposure to environmental tobacco smoke 13
Z77.9 Other contact with and (suspected) exposures hazardous to health 1
Type 1 diabetes with diabetic chronic kidney disease
E10.21* Type 1 DM with diabetic nephropathy 3
E10.22* Type 1 DM with diabetic chronic kidney dis. 6
E10.29 Type 1 DM with diabetic other chronic kidney complication 1
E10.9 Type 1 DM without complications 2
E11.9 Type 2 DM without complications 1
Type 1 diabetes with diabetic polyneuropathy
E08.40 DM due to underlying condition with diabetic neuropathy, unspecified 1
E10.40* Type 1 DM with diabetic neuropathy, unspec. 1
E10.42* Type 1 DM with diabetic polyneuropathy 3
E11.40 Type 2 DM with diabetic neuropathy, unspec. 3
E11.43 Type 2 DM with diabetic autonomic (poly) neuropathy 1
E13.42 Other specified DM with diabetic polyneuropathy 1
E34.9 Endocrine disorder, unspecified 1
G62.81 Critical illness polyneuropathy 1
G62.89 Other specified polyneuropathies 1
G62.9 Polyneuropathy, unspecified 2
Dependence on renal dialysis
N18.6* End stage renal disease 1
Z99.2* Dependence on renal dialysis 3
End stage renal disease
N18.6* End stage renal disease 9
Paroxysmal atrial fibrillation
I48.0* Paroxysmal atrial fibrillation 13
I48.2 Chronic atrial fibrillation 2
I48.91 Unspecified atrial fibrillation 2
Essential (primary) hypertension
I10* Essential (primary) hypertension 18
Underdosing of other hypertensive drugs
Z91.128* Patient’s intentional underdosing of medication regimen for other reason 1
Z91.14* Patient’s other noncompliance with medication regimen 7
Z91.19 Patient’s noncompliance with other medical treatment and regimen 4

Search terms and selection of codes

We found no systematic, EHR-specific differences in the types of search terms used, such as a preference for particular abbreviations, character sequences or full query terms. Table 4 shows the combined results for both systems, in descending order of frequency of use. Terms with a frequency of 1 are grouped together and their cumulative proportion is shown.

Table 4.

Frequency and proportion of entered search terms (for N=1, the proportion shown is cumulative).

Search Terms for Set A N %
Scenario 1
strep 7 32%
strep pharyngitis 3 14%
tonsillitis 2 9%
(N=10) acute pharyngitis, pharyn, pharyngi, pharyngit, sore throat, strep pha, strep phar, strep pharyng, strep throat, strep tonsilli 1 45%
(N=8) fluimm, flu shot, immuniz, immunization, immunization influ, infliuenza vaccine requirement, influenza, influenza vaccine 1 100%
Scenario 2
asthma 18 78%
(N= 5) moder pers asthma, moderate persistent, moderate persistent asthma, persistent asthma, rad 1 22%
allergic rhinitis 6 25%
allergic rhin 4 17%
aller rhinit 2 8%
seasonal allergic rhinitis 2 8%
(N=10) alergic rhinitis, all rhin, aller, allergic, allergic rhin seaso, allergy, rhinitis, rhinitis aller, seas alle rhi, seasonal allergic rhin 1 42%
smoke 6 17%
smoke exposure 5 14%
second hand smoke 4 11%
smoke exp 3 9%
second hand 2 6%
second-hand smoke 2 6%
tobacco 2 6%
(N=11) environmental smoke, sec smoke, second hand smo, second hand smoke exposure, second smoke, second-hand, secondhand, smoke expo, smoke seconchand, tobacco exposure, tobacco second 1 31%
Scenario 3
crohn 10 29%
abdominal pain 3 9%
crohns 3 9%
abd pain 2 6%
abscess 2 6%
(N=14) Crohn, abcess, abd abs, abdominal abscess, abscess abdomen, abscess abdominal, chron, crohn diseas, crohn’s, crohn’s abscess, crohn’s disease, crohns abs, crohns disease, h/o croh 1 41%
cad 9 50%
mi 2 11%
myocardial infarction 2 11%
(N=5) coronary artery, coronary artery disease, csd, h/o myocardial infarction, ischemic heart disease 1 22%
tobacco 4 22%
former smoker 2 11%
former tobacco 2 11%
smoker 2 11%
smoking 2 11%
(N=6) ex smok, ex-smoker, h/o smoking, history of tobacco, nicotine, past smoker 1 34%
Search Terms for Set B N %
Scenario 1
Otitis 3 12%
serous otitis 3 12%
(N=10) acute otitis media, acute recurrent serous otitis media, acute serous otitis media, ot med, otit med, otitits media, ottitis med, recurrent serous otitis media, sercus otitis media, serous ototis 1 43%
uri 9 50%
viral uri 4 22%
upper respiratory 2 11%
upper respiratory infection 2 11%
(N=1) upper resp 1 6%
smoke 6 21%
smoke exposure 5 17%
secondary smoke 4 14%
secondary smoke exposure 2 7%
(N=12) chronic secondary smoke, parent smoke, passive smoke, secon smok, second hand smok, second smoke, secondary, secondhand smoke, smoke exposur, smoke second, smoker, tobacc expos 1 40%
Scenario 2
diabetes 6 30%
type 1 diabetes 3 15%
dm 2 10%
dm 1 2 10%
type 1 dm 2 10%
(N=4) diabetes 1, dm1 neph, dm1 rena, dm type1 1 25%
ckd 2 25%
diabetes with complication 2 25%
(N=4) diabetes, dialysis, dm 1, dm 1 ckd 1 50%
neuropathy 6 40%
diabetic neuropathy 2 13%
(N=7) diabet neuro, htn, perip neur, peripheral, peripheral neuropathy, renal failure, type 1 dm 1 47%
esrd 7 70%
(N=3) dialysis, end stage renal, type 1 dm 1 30%
Scenario 3
atrial fib 7 35%
afib 4 20%
atrial fibrillation 3 15%
paf 3 15%
(N=3) atrifib, atrial fibrillation intermittent, paroxysmal atrail fibrillation 1 15%
hypertension 7 39%
htn 6 33%
(N=5) benign hypertension, essen hyper, essential htn, essential hypertension, I10 1 22%
compliance 6 24%
noncompliance 3 12%
adherence 2 8%
medication compliance 2 8%
medication noncompliance 2 8%
(N=10) adher, complian, med com, med comp, med noncom, medication, medication nonadherence, non compliance, nonadherence, poor med compliance 1 40%

Relatively few terms were common to the queries of multiple clinicians. The proportion of common terms was typically 20%-40% and rarely exceeded 50% for any diagnosis, while terms that were unique and used only once comprised the largest group, often 40%-50% of all terms. For example, unique search strings such as ‘secondary’, ‘smoker’, ‘passive smoke’ or ‘parent smoke’ made up 40% of all search terms for the diagnostic code Z77.22 “Contact with and (suspected) exposure to environmental tobacco smoke”. The term ‘smoke’ by itself was used by only 6 participants (21%). Terms that appeared repeatedly in the search queries of multiple clinicians were either single-word disease names such as asthma (used in 78% of all queries for ‘moderate persistent asthma’) or neuropathy (40%), or common abbreviations such as ‘cad’ (50%), ‘uri’ (50%), or ‘esrd’ (70%).
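To make the tallies behind Table 4 concrete, the short Python sketch below counts query strings for one diagnosis and reports the share of terms used only once; the sample list of queries is hypothetical, not the study data.

```python
from collections import Counter

# Hypothetical raw search strings logged for one diagnosis (one per attempt).
queries = ["smoke", "smoke", "smoke exposure", "second hand smoke",
           "passive smoke", "parent smoke", "secondary", "smoke exposure"]

counts = Counter(q.strip().lower() for q in queries)
total = sum(counts.values())
for term, n in counts.most_common():
    print(f"{term:20s} {n:2d}  {n / total:.0%}")

# Cumulative share of terms entered only once (the "(N=...)" rows in Table 4).
singletons = sum(n for n in counts.values() if n == 1)
print(f"terms used only once: {singletons / total:.0%}")
```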

We also examined the interactive behavior of clinicians with respect to the repeated formulation of search queries. The mean number of queries per diagnosis, that is, the number of times a participant entered a new search string into the free-text field, is shown in Table 5.

Table 5.

Average of search queries and returned results per entered diagnostic code

EHR | Queries entered (N) | Queries per diagnosis (µ) | Queries per diagnosis, Max (N) | Diagnoses by number of queries (%): 1 / 2 / >2 | Results returned per query (µ)
EHR 1 157 1.38 6 77% 14% 9% 36.13
EHR 2 114 1.41 9 79% 13% 8% 6.14

Participants typically entered only one initial search term; almost 80% selected a code after the first search attempt. Only a small minority (less than 15%) needed or were willing to change the query term and repeat the search cycle, and fewer still (under 10%) tried more than twice. Several clinicians repeated the search up to nine times, but they acknowledged during the test session that they would not have gone to this length under the time constraints of actual practice. The average number of queries per diagnosis was nearly identical for the two systems, although there was a large difference in the number of returned results: the commercial EHR 1 displayed on average about six times as many results per query as EHR 2.
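As an illustration of how the Table 5 metrics could be tallied from the coded screen recordings, the sketch below computes the mean, maximum and distribution of query counts per diagnosis; the query_counts list is a hypothetical sample, not the study data.

```python
from statistics import mean

# Hypothetical counts of search strings entered per diagnosis attempt.
query_counts = [1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 6, 1]

print(f"queries per diagnosis (mean): {mean(query_counts):.2f}")
print(f"maximum queries for one diagnosis: {max(query_counts)}")

# Distribution of diagnoses coded after 1, 2, or more than 2 queries.
buckets = {"1": 0, "2": 0, ">2": 0}
for n in query_counts:
    buckets["1" if n == 1 else "2" if n == 2 else ">2"] += 1
for label, count in buckets.items():
    print(f"diagnoses with {label} query(ies): {count / len(query_counts):.0%}")
```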

Discussion

We found large variability in the accuracy of ICD-10 diagnostic codes in a simulation of six clinical scenarios by clinicians using two EHRs. Diagnostic accuracy (the use of appropriate codes) appeared to be higher when the code could be found with relatively ubiquitous and intuitive terms and abbreviations. For example, diagnoses with rates of appropriate coding close to 80% or higher, such as streptococcal tonsillitis, asthma, rhinitis, upper respiratory infection or atrial fibrillation (Table 1), also had a high proportion of initial search queries formulated with the same or similar terms by many clinicians (Table 4). Search terms such as ‘strep’, ‘pharyngitis’, ‘asthma’, ‘rhinitis’ or ‘uri’ likely produced a list of results that included the target code. Conversely, the largest variation in search terms was for diagnoses related to nicotine dependence or tobacco smoke exposure, which can be expressed in many ways and do not lend themselves easily to one highly specific word. Terms with relatively low specificity are more difficult to map in reference terminologies to related codes, and clinicians were therefore likely looking at different sets of results when completing the same search task with very different initial query terms.

The large number of results that search queries often returned had to be further refined, and this extra series of steps was a common source of frustration for many participants. For example, some clinicians pointed out that they had to read through a dozen or more variations on gestational hypertension and preeclampsia included in the results for a male patient before they found the appropriate code for primary hypertension. This low efficiency of the query and entry process may lead over time to adopting a satisficing strategy – a decision-making heuristic in which the first option that seems to address most needs is selected rather than the optimal choice.12, 13 The tendency of clinicians to initiate only one or two queries per diagnosis (Table 5) also supports the notion that there is a practical (likely time) limit on every query and that clinicians may need to trade off accuracy or completeness for efficiency.

The omission rate, while minimal for primary diagnoses, reached almost forty percent for secondary diagnoses. The non-coded conditions, if otherwise documented in notes, may not substantially affect the treatment of individual patients but would provide a distorted view of the patient population, with underestimates of disease severity and comorbidity. For example, old myocardial infarction, dependence on dialysis and personal history of nicotine dependence were more often missing (up to almost 80%) than coded (Table 1). Leaving such key clinical information out of the diagnostic codes may have financial implications for clinicians and repercussions for researchers who use the data for investigations.

Conclusion

We demonstrated significant deficiencies in documented diagnostic codes using clinical simulations on two different EHR systems. The promise of improved clinical documentation with ICD-10 may not be quickly realized due to three main factors. First, clinicians entering codes are not adequately trained to understand the requirements and nuances of the ICD-10 coding system; additional training and specialized resources may be necessary at many institutions. Second, the design of query systems in EHRs may negatively affect the code selection process and lead clinicians to choose a less specific or less desirable code. Lastly, accurately understanding the meaning of a particular code, especially one that indicates “unspecified,” is often possible only within the context of the code hierarchy. Clinicians therefore need a good working knowledge of the ICD-10 structure to correctly identify what is unspecified and what may need to be specified. The core objective of decision support interventions is to assist with this comparison by clearly contextualizing the returned candidate entries, as the code set is too large and complex to be learned effectively in its entirety. Inappropriate coding may also affect reimbursement rates, especially with increased adoption of alternate payment models that rely on case-mix analysis to adjust financial reimbursement, and may undermine the potential utility of ICD-10 codes for research and other purposes.

ICD-10 is expected to improve national healthcare initiatives such as Meaningful Use, value-based purchasing, payment reform and quality reporting. Without ICD-10 data, there will be serious gaps in the ability to extract important patient health information needed to support research and public health reporting, and to move to a payment system based on quality and outcomes.14 These goals can be achieved if health information technology better supports clinicians in making adequately informed choices by providing decision support, guidance and effective tools.

References

  • 1.DiSantostefano J. Getting to Know the ICD-10-CM. The Journal for Nurse Practitioners. 2009;6:149–50. [Google Scholar]
  • 2.Topaz M, Shafran-Topaz L, Bowles KH. ICD-9 to ICD-10: evolution, revolution, and current debates in the United States. Perspectives in health information management / AHIMA, American Health Information Management Association. 2013;10:1d. [PMC free article] [PubMed] [Google Scholar]
  • 3.Ritchie MD, Denny JC, Crawford DC, et al. Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record. American journal of human genetics. 2010 Apr 09;86:560–72. doi: 10.1016/j.ajhg.2010.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Leppert MA. ICD-9-CM vs. ICD-10-CM: Examine the Differences in Diabetes Coding. 2012. [cited 2017 March 7]. Available from: http://www.hcpro.com/HIM-282660-8160/ICD9CM-vs-ICD10CM-Examine-the-differences-in-diabetes-coding.html.
  • 5.Saba VK. Why the home health care classification is a recognized nursing nomenclature. Comput Nurs. 1997 Mar-Apr;15:S69–76. [PubMed] [Google Scholar]
  • 6.Office of the Secretary. In: Department of Health and Human Services, editor. Washington, DC: Office of the Federal Register; 2015. 2015 Edition Health Information Technology (Health IT) Certification Criteria, 2015 Edition Base Electronic Health Record (EHR) Definition, and ONC Health IT Certification Program Modifications; Final Rule; pp. 62601–759. [PubMed] [Google Scholar]
  • 7.Butz J, Brick D, Rinehart-Thompson LA, Brodnik M, Agnew AM, Patterson ES. Differences in coder and physician perspectives on the transition to ICD-10-CM/PCS: A survey study. Health Policy and Technology. 2016;5:251–9. [Google Scholar]
  • 8.Cartagena FP, Schaeffer M, Rifai D, Doroshenko V, Goldberg HS. Leveraging the NLM map from SNOMED CT to ICD-10-CM to facilitate adoption of ICD-10-CM. J Am Med Inform Assoc. 2015 May;22:659–70. doi: 10.1093/jamia/ocu042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Morae. Version 3.3. Okemos, MI: TechSmith Corporation; 2012.
  • 10.Centers for Medicare & Medicaid Services. Interactive Case Studies. Road to 10: The Small Physician Practice’s Route to ICD-10. 2015. [cited 2016 March 1]. Available from: http://www.roadto10.org/ics/
  • 11.SAS. Version 9.4. Cary, NC: SAS Institute, Inc.; 2017.
  • 12.Gigerenzer G, Goldstein DG. Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review. 1996 Oct;103:650–69. doi: 10.1037/0033-295x.103.4.650. [DOI] [PubMed] [Google Scholar]
  • 13.Wilkinson SC, Reader W, Payne SJ. Adaptive browsing: Sensitivity to time pressure and task difficulty. International Journal of Human-Computer Studies. 2012;70:14–25. [Google Scholar]
  • 14.AHIMA. What are the benefits of ICD-10? 2015. Available from: http://www.ahima.org/topics/icd10/faqs.
