AMIA Annual Symposium Proceedings. 2020 Mar 4;2019:438–447.

The Use of Inter-terminology Maps for the Creation and Maintenance of Value Sets

Kin Wah Fung 1, Julia Xu 1, Sigfried Gold 1
PMCID: PMC7153132  PMID: 32308837

Abstract

Value sets are essential in activities such as electronic clinical quality measures (eCQM) and patient cohort definition. Creation and maintenance of value sets is labor intensive and error prone. Our method aims to use existing inter-terminology maps to improve the quality of value sets that are defined in more than one terminology. For 197 eCQM value sets defined in SNOMED CT plus ICD-9-CM and/or ICD-10-CM, the map-generated codes showed good overlap with the value set codes. Manual review showed that some new codes identified by mapping should probably be included in the value sets. This could potentially augment the ICD-9-CM codes by 45% (1.5 codes), ICD-10-CM codes by 25% (1.8 codes) and SNOMED CT codes by up to 42% (4.8 codes) per value set on average. The mapping between SNOMED CT and ICD-10-PCS did not perform as well because of the granularity discrepancy in the map.

Introduction

A value set is typically a list of codes taken from a biomedical terminology that collectively defines the scope of a clinical concept. Value sets can be used to identify patient cohorts, define measurement criteria for clinical quality improvement, and specify allowable values for a data element (in a survey instrument or the electronic health record), among other uses. One important function of value sets is to ensure that health information from disparate sources is interoperable, whether to support collective data analytics or continuity of patient care. Towards this goal, value set codes are often derived from commonly used clinical terminology standards such as SNOMED CT, ICD code sets (ICD-9-CM, ICD-10-CM and ICD-10-PCS), LOINC and RxNorm.

Value sets play a crucial but often unappreciated and unnoticed role in the capture, use, and analysis of clinical information. Of all the professionals who interact with and rely on value sets in their work with health records, only a small fraction is likely to know what a value set is. This is true even in the informatics community, which has a deep, foundational appreciation for the role of standardized terminologies in the complex ecosystem of medical care and health sciences.

Creation of value sets is a labor-intensive process that usually involves subject matter experts working with terminologists to identify codes to include (and exclude) for a particular use case. Maintenance of value sets is essential to keep them up-to-date, since terminologies evolve over time. The burden of curation is even bigger for value sets that are defined by more than one terminology, because the same clinical information can be encoded by different code systems, e.g., SNOMED CT and ICD-10-CM for diagnosis. Previous studies have investigated automated methods to assist the authoring and quality assurance of value sets, making use of information such as the hierarchical structure of terminologies and the semantic types of the codes.1–3 This study describes a novel approach of using existing inter-terminology maps to improve the quality of multi-terminology value sets. In an earlier study, we showed that published maps between ICD-9-CM and ICD-10-CM can help in the translation of codes in value sets.4

Keeping value sets high-quality and up-to-date will generally increase their re-use and utility.5 This paper concentrates on one domain where value set reuse is assured by mandate and incentive: as part of eCQM (electronic clinical quality measures).6 The Centers for Medicare & Medicaid Services (CMS) use eCQM in a variety of quality reporting and value-based purchasing programs. The main advantage of electronic quality measurement is that the burden of manual chart pulling, data extraction and reporting can be much reduced. In addition, there is reduced lag time from documentation to reporting and potential access to real-time data for quality measurement.

The National Library of Medicine (NLM) in collaboration with the Office of the National Coordinator for Health Information Technology (ONC) and CMS runs the Value Set Authority Center (VSAC), a repository, website, and set of application programming interfaces (APIs) for public value sets created by external parties. 7,8 The VSAC does not create value set content but provides tools for value set authors to create and maintain their value sets. Through the VSAC, users can download all official, up-to-date versions of eCQM value sets. VSAC also hosts value sets from other sources, such as the HL7 C-CDA (Consolidated Clinical Document Architecture) and CMS Core Clinical Data Elements and Hybrid Measures.

Methods

We downloaded all the eCQM value sets from the VSAC website. Among the eCQM value sets, we identified those that were defined using more than one terminology. One of the terminologies had to be SNOMED CT, and the other could be one or more of ICD-9-CM, ICD-10-CM and ICD-10-PCS. ICD-9-CM included both diagnosis and procedure codes. We excluded the ICD-9-CM procedure codes, which were used in only a small number (2 out of 140) of value sets with ICD-9-CM codes. In addition to the lists of codes, the download file also contained metadata about the value set, including the value set OID, value set name, version and purpose. The purpose described the clinical focus, data element scope, inclusion and exclusion criteria. However, the purpose was not always filled in.

Generation of the reference inter-terminology maps

1. SNOMED CT and ICD-9-CM map

To generate the map between SNOMED CT and ICD-9-CM, we combined two sources of mappings. The first was the ICD-9-CM Diagnostic Codes to SNOMED CT Map published by NLM.9 The purpose of this map is to facilitate translation of ICD-9-CM codes to SNOMED CT. Even though the U.S. moved from ICD-9-CM to ICD-10-CM in 2015, many existing electronic health record systems still contain clinical information encoded in ICD-9-CM. To facilitate migration to SNOMED CT as the primary clinical terminology for patient health problems (diseases and conditions), it is desirable that the legacy ICD-9-CM data be translated to SNOMED CT. This map is updated yearly to synchronize with the latest version of SNOMED CT. This map is divided into two parts: the one-to-one and one-to-many maps. From both parts, we extracted pairs of SNOMED CT and ICD-9-CM codes into our map. The second source of mappings was the SNOMED CT to ICD-9-CM map published by SNOMED International. The original goal of this map was to facilitate the generation of ICD-9-CM codes from SNOMED CT-encoded clinical information. Since most countries using ICD-9-CM have moved to ICD-10-CM, SNOMED International stopped the maintenance of this map in 2016. We retrieved the last version of this map through the MRMAP table in the UMLS (2016AA version). We harvested pairs of SNOMED CT and ICD-9-CM codes from this map, excluding SNOMED CT concepts that have become inactive since the last update of the map. The combined set of code pairs formed our SCT-I9 Map.
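The combination of the two sources can be sketched as follows. This is a minimal illustration, assuming each source has already been parsed into (SNOMED CT code, ICD-9-CM code) pairs; the function name `build_sct_i9_map`, the sample data and the inactive concept ID are hypothetical, and the actual file layouts of the NLM map and the UMLS MRMAP table are not reproduced here.

```python
from typing import Iterable, Set, Tuple

CodePair = Tuple[str, str]  # (SNOMED CT concept ID, ICD-9-CM code)


def build_sct_i9_map(
    nlm_pairs: Iterable[CodePair],
    snomed_intl_pairs: Iterable[CodePair],
    inactive_sct: Set[str],
) -> Set[CodePair]:
    """Union both sources, dropping SNOMED International pairs whose
    SNOMED CT concept has since become inactive."""
    combined = set(nlm_pairs)
    combined.update(p for p in snomed_intl_pairs if p[0] not in inactive_sct)
    return combined


# Illustrative data only. The two real-looking pairs are implied by examples
# in the Results; "999999" is a made-up inactive concept ID.
nlm = {("73431005", "036.81"), ("28651003", "458.0")}
snomed_intl = {("28651003", "458.0"), ("999999", "427.89")}

sct_i9_map = build_sct_i9_map(nlm, snomed_intl, inactive_sct={"999999"})
print(sorted(sct_i9_map))
```

Storing the map as a set of pairs means that a pair present in both sources is kept only once.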

2. SNOMED CT and ICD-10-CM map

We used the SNOMED CT to ICD-10-CM map published by NLM.10 This is a rule-based map that maps all SNOMED CT concepts from three hierarchies (Clinical finding, Event and Situation with explicit context) to ICD-10-CM. The map is rule-based to accommodate the ICD-10-CM coding rules, which sometimes stipulate that the same disease be coded differently according to patient age, gender and co-morbidities. For example, the SNOMED CT concept Failure to gain weight (36440009) may be coded to ICD-10-CM codes Adult failure to thrive (R62.7), Failure to thrive in newborn (P92.6) or Failure to thrive (child) (R62.51) depending on age. In the NLM map, there is always a default map target that is used when no additional information is available. For our study, we harvested all pairs of SNOMED CT concepts and their default ICD-10-CM map targets to create our SCT-I10 Map.

3. SNOMED CT and ICD-10-PCS map

For mapping between SNOMED CT and ICD-10-PCS, we used the mappings contained in the vocabulary resource of the OHDSI Consortium (Observational Health Data Sciences and Informatics). 11,12 The purpose of the OHDSI vocabulary is to enable transparent and consistent content across disparate observational databases to support efficient and reproducible research. The mappings are expressed as ‘is a’ relationships between an ICD-10-PCS code and a SNOMED CT concept. For example, the ICD-10-PCS code Bypass Cerebral Ventricle to Intestine with Synthetic Substitute, Open Approach (00160J5) is a Ventriculostomy (63933000) in SNOMED CT. We harvested all the pairs of SNOMED CT and ICD-10-PCS codes linked by ‘is a’ to form our SCT-PCS Map.

Identification of map-generated code lists for each value set using the reference maps

For each value set, we used the reference maps to find mappings for the original value set codes in each terminology. We called the codes found by mapping the “map-generated codes”. We used the reference maps in either direction. For example, if the value set contained SNOMED CT, ICD-9-CM and ICD-10-CM codes, we would generate the following four lists of map-generated codes:

  • I9-from-SCT codes - ICD-9-CM codes identified based on the original SNOMED CT codes in the value set mapped through the SCT-I9 Map

  • I10-from-SCT codes - ICD-10-CM codes identified based on the original SNOMED CT codes in the value set mapped through the SCT-I10 Map

  • SCT-from-I9 codes - SNOMED CT codes identified based on the original ICD-9-CM codes in the value set mapped through the SCT-I9 Map

  • SCT-from-I10 codes - SNOMED CT codes identified based on the original ICD-10-CM codes in the value set mapped through the SCT-I10 Map
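Using a reference map in either direction can be illustrated with a short sketch. It assumes the map is held as a collection of (SNOMED CT, ICD) code pairs; the helper names `index_map` and `map_generated`, and the placeholder concept IDs `SCT-A` and `SCT-B`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Lookup = Dict[str, Set[str]]


def index_map(pairs: Iterable[Tuple[str, str]]) -> Tuple[Lookup, Lookup]:
    """Index (SNOMED CT, ICD) pairs so the map can be read in both directions."""
    sct_to_icd: Lookup = defaultdict(set)
    icd_to_sct: Lookup = defaultdict(set)
    for sct, icd in pairs:
        sct_to_icd[sct].add(icd)
        icd_to_sct[icd].add(sct)
    return sct_to_icd, icd_to_sct


def map_generated(original_codes: Iterable[str], lookup: Lookup) -> Set[str]:
    """Collect every target code reachable from the original codes via the map."""
    generated: Set[str] = set()
    for code in original_codes:
        generated |= lookup.get(code, set())
    return generated


# Toy SCT-I9 Map and value set (placeholder SNOMED CT IDs, not real concepts).
sct_i9_pairs = [("73431005", "036.81"), ("SCT-A", "377.30"), ("SCT-B", "377.30")]
value_set = {"SNOMEDCT": {"73431005", "SCT-A"}, "ICD9CM": {"377.30"}}

sct_to_i9, i9_to_sct = index_map(sct_i9_pairs)
i9_from_sct = map_generated(value_set["SNOMEDCT"], sct_to_i9)
sct_from_i9 = map_generated(value_set["ICD9CM"], i9_to_sct)

print(sorted(i9_from_sct))  # ICD-9-CM codes reached from the SNOMED CT codes
print(sorted(sct_from_i9))  # SNOMED CT codes reached from the ICD-9-CM codes
```

In this toy case each direction produces one code that is not in the original value set (`036.81` and `SCT-B`); these are the kind of codes that would go to manual review.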

Evaluation of the map-generated codes

For each value set, we evaluated the map-generated codes in two aspects. First, we wanted to know how closely the map-generated codes resembled the original value set codes. Second, we wanted to know whether the map-generated codes could help to identify codes that should have been included in the value set. The evaluation was done as follows:

1. Jaccard similarity coefficient

We used the Jaccard coefficient as an indicator of the similarity between the original value set codes and the map-generated codes for each terminology of each value set. The Jaccard coefficient was defined as the number of codes common to both the original and map-generated code lists divided by the number of codes in the union of the two lists. The Jaccard coefficient generally reflects the degree of overlap between two lists of items in which the number of items can be different. As an example, for a value set with SNOMED CT, ICD-9-CM and ICD-10-CM codes, we would generate four Jaccard scores between:

  1. original ICD-9-CM and I9-from-SCT codes

  2. original ICD-10-CM and I10-from-SCT codes

  3. original SNOMED CT and SCT-from-I9 codes

  4. original SNOMED CT and SCT-from-I10 codes
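The Jaccard calculation defined above can be written in a few lines. The sketch below is illustrative only: the code lists are toy data, and returning 0.0 when both lists are empty is an assumed convention that the text does not specify. The macro-average is taken here as the plain mean of the per-value-set scores.

```python
from statistics import mean, median
from typing import Iterable


def jaccard(original: Iterable[str], generated: Iterable[str]) -> float:
    """Codes common to both lists divided by codes in the union of the lists."""
    a, b = set(original), set(generated)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty lists
    return len(a & b) / len(union)


# Toy example: 2 shared codes out of 4 distinct codes overall.
original_i9 = ["377.30", "377.31", "377.32"]
i9_from_sct = ["377.30", "377.31", "036.81"]
score = jaccard(original_i9, i9_from_sct)
print(score)

# Macro-average and median over per-value-set scores (hypothetical scores).
per_value_set_scores = [score, 0.25, 0.75]
print(mean(per_value_set_scores), median(per_value_set_scores))
```

Because the union is used as the denominator, the score penalizes both original codes that the map fails to reach and map-generated codes that are absent from the value set.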

2. Manual review

We performed manual review of all the codes found by mapping that were not present among the original codes in a value set. For practical purposes, we limited our review to cases in which the total number of codes (both map-generated and original) for the pair of terminologies in focus was less than 100. The reviewer judged whether the new code found by the map was:

  1. Exact match – the map-generated code matched exactly the meaning of one of the original codes in the value set. The map-generated code should probably be included in the value set.

  2. Approximate match, candidate – the map-generated code was similar in meaning to at least one original code in the value set, and should be considered as a candidate for addition to the value set. We judged the suitability of inclusion based on the stated purpose of the value set and the inclusion criteria, if explicitly stated. Otherwise, we would rely on the name of the value set and other codes in the value set to make our judgment.

  3. Approximate match, not a candidate – the map-generated code was similar in meaning to at least one original code in the value set, but was not considered a candidate for addition due to the inclusion and exclusion criteria, if explicitly stated.

  4. Not a match – the map-generated code did not match any of the original value set codes.

In interpreting the review results, we assumed that all codes in a terminology that were within the scope of the value set (as judged by the name, accompanying documentation and the original value set codes) should be included in the value set. We assumed that there was no exclusion for certain special kind of codes (e.g., unspecified codes) unless explicitly stated. We assumed that the code list in one terminology in a value set could be used independently from other code lists in a different terminology. Finally, we also assumed that map-generated codes that did not fall within the scope of the value set and did not match any of the original value set codes should not be included in the value set.
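The selection of codes for manual review can be sketched as below. This is one possible reading of the size limit, assuming that the "total number of codes" is the count of distinct codes across the original and map-generated lists for one terminology; the function name `new_codes_for_review` and the toy inputs are hypothetical.

```python
from typing import Iterable, Optional, Set


def new_codes_for_review(
    original: Iterable[str],
    generated: Iterable[str],
    max_total: int = 100,
) -> Optional[Set[str]]:
    """Return map-generated codes absent from the original list, or None
    when the combined list is too large to be reviewed manually."""
    original_set, generated_set = set(original), set(generated)
    # Assumed interpretation: count distinct codes across both lists.
    if len(original_set | generated_set) >= max_total:
        return None
    return generated_set - original_set


small = new_codes_for_review({"a", "b"}, {"b", "c"})
large = new_codes_for_review(
    {str(i) for i in range(60)}, {str(i) for i in range(50, 110)}
)
print(small, large)
```

Each code returned would then be assigned by the reviewer to one of the four categories listed above.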

Results

The value sets

We downloaded 715 eCQM value sets (version September 2018) from VSAC, of which 221 contained SNOMED CT and one or more of the other target terminologies. Table 1 shows the combination of terminologies in these value sets.

Table 1.

Distribution of terminologies among the value sets

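| SNOMED CT | ICD-9-CM | ICD-10-CM | ICD-10-PCS | Number of value sets with this combination of terminologies |
|---|---|---|---|---|
| x | x | x |  | 137 |
| x |  | x |  | 60 |
| x |  |  | x | 21 |
| x | x |  | x | 2* |
| x | x |  |  | 1 |
| Number of value sets with this terminology: 221 | 138* | 197 | 23 | Total: 221 |

* The 2 value sets using ICD-9-CM procedure codes were excluded from the study.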

Among the 221 value sets, only 83 (38%) had explicitly stated purpose, scope, inclusion and exclusion criteria.

In the following section, we present our findings according to the pair of terminologies covered by the reference maps that we created.

1. SNOMED CT and ICD-9-CM map

The SCT-I9 Map we created could provide mapping to 89,043 SNOMED CT concepts, which represented 72% of concepts in the three relevant hierarchies (Clinical finding, Event and Situation with explicit context). On the ICD-9-CM side, it covered 13,014 (89%) diagnostic ICD-9-CM codes.

Table 2 summarizes the results of using the SCT-I9 Map to generate candidate value set codes. The map was able to suggest some codes in almost all the value sets in either direction of mapping. For the map-generated codes, 62% of the ICD-9-CM codes and 53% of SNOMED CT codes already existed in the value set. For each value set, we measured the similarity between the original codes in the value set and the map-generated codes for the same terminology by calculating the Jaccard coefficient. The macro-average of the Jaccard coefficients for ICD-9-CM and SNOMED CT was 0.5 and 0.38 respectively. We manually reviewed all the map-generated codes that were not among the original value set codes, limiting to value sets with 100 codes or fewer. We reviewed a total of 305 ICD-9-CM and 789 SNOMED CT codes (from 87 and 89 value sets respectively).

Table 2.

Results of mapping between SNOMED CT and ICD-9-CM

| Direction in which the SCT-I9 Map is used | ICD-9-CM from SNOMED CT | SNOMED CT from ICD-9-CM |
|---|---|---|
| Number of value sets | 138 | 138 |
| Value sets with map-generated codes (%) | 136 (99%) | 138 (100%) |
| Number of codes: |  |  |
| Original codes in value sets | 3,162 ICD-9-CM codes | 9,964 SNOMED CT codes |
| Map-generated codes | 3,671 ICD-9-CM codes | 15,378 SNOMED CT codes |
| Map-generated codes found in value sets (%) | 2,287 (62%) | 8,203 (53%) |
| Jaccard score: macro-average (median) | 0.5 (0.5) | 0.38 (0.35) |
| Manual review: |  |  |
| Value sets reviewed | 87 | 89 |
| Original codes in value sets | 284 ICD-9-CM codes | 1,025 SNOMED CT codes |
| Map-generated codes reviewed | 305 ICD-9-CM codes | 789 SNOMED CT codes |
| Exact match (%) | 102 (33%) | 35 (4%) |
| Approximate match, candidate (%) | 50 (16%) | 357 (45%) |
| Approximate match, not candidate (%) | 0 (0%) | 18 (2%) |
| Not a match (%) | 153 (50%) | 379 (48%) |

For the map-generated ICD-9-CM codes, 33% of the codes not present in the original value set were found to be exact matches to another code in the value set. For example, in the Optic Neuritis value set, the ICD-9-CM code Meningococcal optic neuritis (036.81) was found by mapping but not in the original value set. However, the SNOMED CT code Meningococcal optic neuritis (73431005) was in the original value set. A further 16% of the map-generated ICD-9-CM codes were approximate matches and could be considered candidates for addition. For example, in the Uveitis value set, the ICD-9-CM code Syphilitic uveitis, unspecified (091.50) was found by mapping but not in the original value set. This code should probably be added to the value set since it also contained the SNOMED CT code Uveitis due to secondary syphilis (186854007). Overall, 50% of the reviewed ICD-9-CM codes did not match any of the value set codes.

For the map-generated SNOMED CT codes, 4% of the codes not present in the original value set were found to be exact matches. For example, in the Hypotension value set, the SNOMED CT code Orthostatic hypotension (28651003) was found by mapping but not in the original value set. However, the ICD-9-CM code Orthostatic hypotension (458.0) was in the original value set. Almost half (45%) of the map-generated SNOMED CT codes were approximate matches and candidates for addition. For example, in the Bradycardia value set, the SNOMED CT code Symptomatic sinus bradycardia (444605001) was found by mapping but missing from the value set. This code should be considered a candidate to be added to the value set since it also contained the SNOMED CT code Severe sinus bradycardia (49044005). A small number of codes (2%) were approximate matches but not considered candidates for addition. This number could be artificially low because only a third of all value sets had explicitly stated purpose, inclusion and exclusion criteria, which were needed to determine if a code belonged to this category. For example, in the Lupus value set, the SNOMED CT code Systemic lupus erythematosus in remission (698694005) was an approximate match but not a candidate because the exclusion criteria explicitly excluded “systemic lupus erythematosus in childhood or in remission”. Overall, 48% of the reviewed SNOMED CT codes did not match any of the value set codes.

The unmatched codes generally belonged to two types. The first type was map-generated codes based on a part of a composite term that was not relevant to the value set. For example, the Lupus value set contained the SNOMED CT code Pericarditis co-occurrent and due to systemic lupus erythematosus (25380002), which was mapped to the ICD-9-CM code Other acute pericarditis (420.99) that was outside the scope of the value set. The second type was related to the “Not Elsewhere Classified” (residual or catch-all) codes in ICD-9-CM. For example, the Bradycardia value set contained the ICD-9-CM code Other specified cardiac dysrhythmias (427.89), which was mapped to many SNOMED CT codes, including Sinus tachycardia (11092001), that were not within the scope of the value set.

2. SNOMED CT and ICD-10-CM map

The SCT-I10 Map we created could provide mapping to 110,184 SNOMED CT concepts, which represented 89% of concepts in the three relevant hierarchies (Clinical finding, Event and Situation with explicit context). On the ICD-10-CM side, it covered 30,576 (43%) ICD-10-CM codes.

Table 3 summarizes the results of using the SCT-I10 Map to generate candidate value set codes. The map was able to suggest some ICD-10-CM and SNOMED CT codes in 98% and 89% of the value sets respectively. For the map-generated codes, 59% of the ICD-10-CM codes and 37% of SNOMED CT codes already existed in the value set. The macro-average of the Jaccard coefficients for ICD-10-CM and SNOMED CT was 0.3 and 0.32 respectively.

Table 3.

Results of mapping between SNOMED CT and ICD-10-CM

| Direction in which the SCT-I10 Map is used | ICD-10-CM from SNOMED CT | SNOMED CT from ICD-10-CM |
|---|---|---|
| Number of value sets | 197 | 197 |
| Value sets with map-generated codes (%) | 193 (98%) | 176 (89%) |
| Number of codes: |  |  |
| Original codes in value sets | 23,260 ICD-10-CM codes | 18,482 SNOMED CT codes |
| Map-generated codes | 8,102 ICD-10-CM codes | 40,812 SNOMED CT codes |
| Map-generated codes found in value sets (%) | 4,752 (59%) | 14,947 (37%) |
| Jaccard score: macro-average (median) | 0.3 (0.2) | 0.32 (0.29) |
| Manual review: |  |  |
| Value sets reviewed | 112 | 96 |
| Original codes in value sets | 806 ICD-10-CM codes | 1,079 SNOMED CT codes |
| Map-generated codes reviewed | 547 ICD-10-CM codes | 1,124 SNOMED CT codes |
| Exact match (%) | 136 (25%) | 52 (5%) |
| Approximate match, candidate (%) | 127 (23%) | 811 (72%) |
| Approximate match, not candidate (%) | 0 (0%) | 5 (0.4%) |
| Not a match (%) | 284 (52%) | 256 (23%) |

We reviewed 547 ICD-10-CM and 1,124 SNOMED CT codes (from 112 and 96 value sets) that were not present in the original value sets. For the map-generated ICD-10-CM codes, 25% of the codes were found to be exact matches to another code in the value set. For example, in the Chlamydia value set, the ICD-10-CM code Chlamydial peritonitis (A74.81) was found by mapping but not in the value set. However, the SNOMED CT code Chlamydial peritonitis (197172005) was in the original value set. A further 23% of the map-generated ICD-10-CM codes were approximate matches and could be considered candidates for addition. For example, in the Breastfeeding value set, the ICD-10-CM code Hypogalactia (O92.4) was found by mapping but not in the value set. This code should be considered for addition to the value set since it also contained the ICD-10-CM code Suppressed lactation (O92.5). Overall, 52% of the reviewed ICD-10-CM codes did not match any of the value set codes.

For the map-generated SNOMED CT codes, 5% of the codes not present among the original value set codes were found to be exact matches. For example, in the Rubella value set, the SNOMED CT code Rubella meningitis (1092351000119107) was found by mapping but not in the value set. However, the ICD-10-CM code Rubella meningitis (B06.02) was in the value set. Almost three-quarters (72%) of the map-generated SNOMED CT codes were approximate matches and candidates for addition. For example, in the Proteinuria value set, the SNOMED CT code Microalbuminuria due to type 1 diabetes mellitus (18521000119106) was found by mapping but missing from the value set. This code should be considered a candidate to be added to the value set since it also contained the SNOMED CT code Microalbuminuria (312975006). A small number of codes (0.4%) were approximate matches but not candidates for addition. For example, in the Chronic Malnutrition value set, the SNOMED CT code Progressive encephalopathy with severe infantile anorexia (715794009) was an approximate match but not considered a candidate because of the exclusion criteria “all diagnosis codes for feeding disorders of infancy or childhood”. Overall, 23% of the reviewed SNOMED CT codes did not match any of the value set codes.

The unmatched codes could generally be attributed to the same reasons as for the SCT-I9 Map (see above).

3. SNOMED CT and ICD-10-PCS map

The SCT-PCS Map we created could provide mapping to 4,563 SNOMED CT concepts, which represented 8% of concepts in the Procedure hierarchy. On the ICD-10-PCS side, it covered 40,378 (51%) ICD-10-PCS codes.

Table 4 summarizes the results of using the SCT-PCS Map to generate candidate value set codes. The map was able to suggest some ICD-10-PCS codes in 83% of the value sets, and some SNOMED CT codes in all value sets. For the map-generated codes, 47% of the ICD-10-PCS codes and 34% of SNOMED CT codes already existed in the value set. The macro-average of the Jaccard coefficients for ICD-10-PCS and SNOMED CT was 0.38 and 0.08 respectively.

Table 4.

Results of mapping between SNOMED CT and ICD-10-PCS

| Direction in which the SCT-PCS Map is used | ICD-10-PCS from SNOMED CT | SNOMED CT from ICD-10-PCS |
|---|---|---|
| Number of value sets | 23 | 23 |
| Value sets with map-generated codes (%) | 19 (83%) | 23 (100%) |
| Number of codes: |  |  |
| Original codes in value sets | 4,507 ICD-10-PCS codes | 2,259 SNOMED CT codes |
| Map-generated codes | 3,337 ICD-10-PCS codes | 680 SNOMED CT codes |
| Map-generated codes found in value sets (%) | 1,585 (47%) | 228 (34%) |
| Jaccard score: macro-average (median) | 0.38 (0.27) | 0.08 (0.05) |
| Manual review: |  |  |
| Value sets reviewed | 8 | 12 |
| Original codes in value sets | 57 ICD-10-PCS codes | 137 SNOMED CT codes |
| Map-generated codes reviewed | 99 ICD-10-PCS codes | 48 SNOMED CT codes |
| Approximate match, candidate (%) | 15 (15%) | 9 (19%) |
| Not a match (%) | 84 (85%) | 39 (81%) |

We reviewed a total of 99 ICD-10-PCS and 48 SNOMED CT codes (from 8 and 12 value sets) that were not present in the original value sets. For the map-generated ICD-10-PCS codes, 15% were approximate matches and could be considered candidates for addition. For example, in the Hip Fracture Surgery value set, the ICD-10-PCS code Repair Right Hip Joint, Percutaneous Endoscopic Approach (0SQ94ZZ) was found by mapping but not in the original value set. This code should be considered for addition to the value set since it also contained the ICD-10-PCS code Repair Right Hip Joint, Open Approach (0SQ90ZZ). The remaining 85% of the reviewed ICD-10-PCS codes did not match any of the value set codes.

For the map-generated SNOMED CT codes, 19% of the codes not present in the original value set were found to be approximate matches and candidates for addition. For example, in the BH Outpatient Psychotherapy value set, the SNOMED CT code Individual psychotherapy (18512000) was found by mapping but missing from the value set. This code could be considered a candidate to be added to the value set which also contained the SNOMED CT code Group psychotherapy (76168009). The remaining 81% of the reviewed SNOMED CT codes did not match any of the value set codes.

Most of the unmatched codes arose because the majority of the SNOMED CT codes in the SCT-PCS Map were broader than any of the value set codes (see Discussion).

Discussion

Value sets are useful resources and help to ensure the validity and comparability of data collected for various purposes. However, the creation and maintenance of value sets is a non-trivial process. Terminologies are constantly updated to reflect advances in biomedical science and changes in coding rules. While it is relatively straightforward to identify codes that have become obsolete when a terminology is updated, it is more difficult to identify new codes that should be included in a value set. In medical classifications like ICD-9-CM or ICD-10-CM, due to the requirement of a single (as opposed to a poly-) hierarchy, new codes related to the same condition are sometimes added to a different branch from the existing codes (e.g., in ICD-9-CM, hypertension in pregnancy is put under the chapter Complications of pregnancy, childbirth and puerperium), making them more liable to be missed by the value set editors. If value sets are not properly maintained, their quality will deteriorate over time, with deleterious effects on the validity of the data generated using these value sets. The impact can range from inaccurate calculation of payments to providers due to errors in electronic clinical quality reporting, to missing patients from a research cohort due to an omitted code.

Our study shows that existing inter-terminology maps can be useful adjuncts for the creation and maintenance of value sets that involve more than one terminology. Take the example of mapping between SNOMED CT and ICD-9-CM. There are 138 eCQM value sets that are defined by both of these terminologies. The overlap between the original value set codes and map-generated codes is considerable, judging from the Jaccard scores (0.5 and 0.38 for ICD-9-CM and SNOMED CT respectively). This lends support to the general validity of our method: a significant portion of the original value set codes can be found by inter-terminology mapping. More importantly, mapping also uncovers a significant number of codes that are potentially missed by the value set curators. Based on our review, the exact matches are almost certainly missed codes that should be added. We think that a considerable portion of the approximate matches should probably be added as well. If we assume that all the exact matches and half of the approximate match candidate codes are indeed added to the value sets, the SCT-I9 Map would augment the 87 reviewed value sets by a total of 127 ICD-9-CM codes (1.5 codes per value set). This represents a 45% increase over the 284 original ICD-9-CM codes in these value sets (Table 2). Under the same assumptions, the SNOMED CT codes in the reviewed value sets would be augmented by 21% (2.4 codes per value set) on average. Similarly, the SCT-I10 Map would augment the ICD-10-CM codes by 25% (1.8 codes per value set) and SNOMED CT codes by 42% (4.8 codes per value set) on average (Table 3).
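The augmentation estimates follow directly from the manual review counts in Tables 2 and 3. The short calculation below reproduces them under the stated assumption that all exact matches and half of the approximate match candidates are added; the function name `augmentation` and the row labels are introduced here for illustration only.

```python
def augmentation(exact, approx_candidates, original_codes, value_sets,
                 approx_fraction=0.5):
    """Return (codes added, codes added per value set, fractional increase)."""
    added = exact + approx_fraction * approx_candidates
    return added, added / value_sets, added / original_codes


# (exact, approximate candidates, original codes, value sets reviewed),
# taken from the manual review rows of Tables 2 and 3.
review_counts = {
    "ICD-9-CM via SCT-I9 Map": (102, 50, 284, 87),
    "SNOMED CT via SCT-I9 Map": (35, 357, 1025, 89),
    "ICD-10-CM via SCT-I10 Map": (136, 127, 806, 112),
    "SNOMED CT via SCT-I10 Map": (52, 811, 1079, 96),
}

results = {label: augmentation(*counts) for label, counts in review_counts.items()}
for label, (added, per_value_set, increase) in results.items():
    print(f"{label}: +{added:g} codes, "
          f"{per_value_set:.1f} per value set, {increase:.0%} increase")
```

For ICD-9-CM, for example, 102 + 0.5 × 50 = 127 added codes, which is 127 / 87 ≈ 1.5 codes per value set and 127 / 284 ≈ 45% of the original codes.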

The results for the SCT-PCS Map are less satisfactory. The overlap between the map-generated codes and the value set codes is on the low side, especially in the ICD-10-PCS to SNOMED CT direction. Manual review did not find any exact matches. This is not surprising considering the differences between SNOMED CT and ICD-10-PCS. ICD-10-PCS codes are generally more granular and contain information (e.g., surgical approach, use of devices and procedure intent) that is not captured in SNOMED CT concepts. Therefore, it is unusual to find an exact match between the two terminologies. The main purpose of the mappings in the OHDSI vocabulary is to support data aggregation and analytics. Many of the SNOMED CT concepts in the map are very general (e.g., Procedure on skin (118718002)), whereas value sets tend to use more specific SNOMED CT concepts. Moreover, the mappings in the OHDSI vocabulary are generated algorithmically with limited human review, unlike the other mapping resources used in this study, which are all manually validated. It is possible that the performance of the SCT-PCS Map can be improved by propagating the mappings of the broader SNOMED CT concepts to their descendants. We shall explore this option in the future.
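One way the proposed propagation to descendants might look is sketched below. This was not implemented or evaluated in the study; the hierarchy, the concept names `broad`, `mid` and `leaf`, and the ICD-10-PCS placeholder `PCS-1` are all hypothetical, and the SNOMED CT hierarchy is assumed to be available as a parent-to-children dictionary.

```python
from collections import deque
from typing import Dict, Iterable, Set


def descendants(concept: str, children: Dict[str, Iterable[str]]) -> Set[str]:
    """Breadth-first walk collecting every descendant of a concept."""
    seen: Set[str] = set()
    queue = deque(children.get(concept, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a poly-hierarchy can reach the same node twice
        seen.add(node)
        queue.extend(children.get(node, ()))
    return seen


def propagate(pcs_to_sct: Dict[str, Set[str]],
              children: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Let the descendants of each mapped SNOMED CT concept inherit its mapping."""
    return {
        pcs: set(scts).union(*(descendants(s, children) for s in scts))
        for pcs, scts in pcs_to_sct.items()
    }


# Hypothetical hierarchy: broad -> mid -> leaf, and broad -> leaf directly.
children = {"broad": ["mid", "leaf"], "mid": ["leaf"]}
pcs_to_sct = {"PCS-1": {"broad"}}

expanded = propagate(pcs_to_sct, children)
print(sorted(expanded["PCS-1"]))
```

Propagation of this kind would enlarge the list of map-generated SNOMED CT codes, so the suggestions would still need review before being added to a value set.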

Use of inter-terminology maps in the creation and maintenance of value sets is likely to improve their quality (e.g., reducing missing codes) and reduce the curation time and effort. All the maps used in this study are in the public domain and regularly updated by the owners (except for the SNOMED CT to ICD-9-CM map from SNOMED International). To be useful, the codes suggested by the maps should have high accuracy (i.e., high signal-to-noise ratio). In our study, the proportions of map-generated new ICD-9-CM, ICD-10-CM and SNOMED CT codes that are either exact or approximate candidate matches are 49%, 48%, and from 49% (mapped from ICD-9-CM) to 77% (mapped from ICD-10-CM) respectively (Tables 2 and 3). The suggestions based on the SNOMED CT to ICD-10-PCS map are not as useful. In actual implementation, how to present the map-suggested codes to the value set editors in a dynamic and efficient way is an important consideration. We hope to explore this issue in our future research. A potential refinement of our method is to use multiple routes of mapping to home in on the most promising codes. For example, if a value set contains SNOMED CT, ICD-9-CM and ICD-10-CM codes (the most common type in our study), the SNOMED CT codes generated from the ICD-9-CM and ICD-10-CM maps can be used for mutual validation.
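The mutual validation idea can be sketched as a simple set operation. This is only an illustration of the proposed refinement, which was not evaluated in the study; the function name `rank_new_codes` and the placeholder codes are hypothetical.

```python
from typing import Dict, Iterable, Set


def rank_new_codes(original_sct: Iterable[str],
                   sct_from_i9: Iterable[str],
                   sct_from_i10: Iterable[str]) -> Dict[str, Set[str]]:
    """Split new SNOMED CT suggestions by how many mapping routes support them."""
    original = set(original_sct)
    new_i9 = set(sct_from_i9) - original
    new_i10 = set(sct_from_i10) - original
    return {
        "both_routes": new_i9 & new_i10,   # suggested by both maps
        "single_route": new_i9 ^ new_i10,  # suggested by only one map
    }


# Placeholder SNOMED CT codes for illustration.
ranked = rank_new_codes(
    original_sct={"s1", "s2"},
    sct_from_i9={"s2", "s3", "s4"},
    sct_from_i10={"s1", "s3", "s5"},
)
print(ranked)
```

Codes reached through both routes could be shown to value set editors first, on the assumption that agreement between independent maps makes a suggestion more likely to be relevant.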

The VSAC has become the de facto home for dissemination of value sets for various purposes. It is also an authoring platform for value sets. The recent addition of the ability to create intensional value sets is likely to reduce the burden of value set curation. Instead of enumerating the codes in a value set (extensional value set), an intensional value set specifies the rule for inclusion of codes. An example would be using the expression ‘377.3-’ instead of a list of ICD-9-CM codes (‘377.30’, ‘377.31’, ‘377.32’, ‘377.33’, ‘377.34’, ‘377.39’) to represent the value set of Optic neuritis. When a terminology is updated, it is expected that the intensional definition will pick up new codes automatically, provided that they are subsumed by the same parent codes. However, this is more likely to be true in a multi-hierarchical terminology like SNOMED CT. How useful it is for single-hierarchy terminologies like ICD-9-CM and ICD-10-CM remains to be seen. Another observation in our study is that only 38% of the eCQM value sets have explicitly stated purpose and inclusion/exclusion criteria. This kind of information is essential for users to understand the scope of a value set and determine whether it fits their use case. Without this information, the users can only guess by the name of the value set and the codes it contains. For reliable re-use of value sets, it is essential that value set authors provide the necessary value set metadata and documentation.
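The difference between an extensional list and an intensional rule can be illustrated with the Optic neuritis example. The sketch below assumes that a trailing hyphen means "all codes starting with this prefix"; this is a reading of the example expression, not the actual grammar used by VSAC, and the function name `expand` is hypothetical.

```python
from typing import Iterable, List


def expand(expression: str, terminology_codes: Iterable[str]) -> List[str]:
    """Expand an intensional expression against the codes of one terminology.

    Assumed syntax: 'PREFIX-' selects every code that starts with PREFIX;
    any other expression selects that single code if it exists.
    """
    codes = set(terminology_codes)
    if expression.endswith("-"):
        prefix = expression[:-1]
        return sorted(c for c in codes if c.startswith(prefix))
    return [expression] if expression in codes else []


# A small slice of ICD-9-CM codes mentioned in this paper.
icd9_codes = ["377.30", "377.31", "377.32", "377.33", "377.34", "377.39",
              "036.81", "458.0"]

optic_neuritis = expand("377.3-", icd9_codes)
print(optic_neuritis)
```

The rule picks up the six 377.3x codes but not Meningococcal optic neuritis (036.81), which sits in a different branch of ICD-9-CM. This illustrates why intensional definitions may be less effective in single-hierarchy terminologies.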

We recognize the following limitations in our study. The study only focused on the eCQM value sets that are defined in more than one terminology. The results may not be generalizable to other value sets. We only studied maps between four terminologies, although they are among the most commonly used in value sets. We found significant performance differences among different terminologies. Whether this method is applicable to other terminologies remains to be seen. All manual review of the map-generated codes was done by one author (JX) and not independently validated. We only reviewed value sets with 100 codes or fewer.

Conclusion

Inter-terminology maps can potentially be used to assist human curators in the creation and maintenance of value sets that are defined by multiple terminologies. The map-generated codes showed good overlap with the original value set codes, lending support to the validity of the method. For value sets using SNOMED CT in combination with ICD-9-CM and/or ICD-10-CM, the map-generated codes could potentially augment the ICD-9-CM codes by 45% (1.5 codes per value set), ICD-10-CM codes by 25% (1.8 codes per value set) and SNOMED CT codes by up to 42% (4.8 codes per value set) on average. The map between SNOMED CT and ICD-10-PCS did not work as well, probably due to the big difference in granularity between the terminologies and the inexact nature of the source mappings.

Acknowledgement

This work was supported in part by the Intramural Research Program of the National Institutes of Health and the National Library of Medicine.

Figure 1.

Overall schema of study

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association