Abstract
Background:
The Unified Medical Language System (UMLS) maps relationships between and within >100 biomedical vocabularies, including Current Procedural Terminology (CPT) codes, creating a powerful knowledge resource which can accelerate clinical research.
Methods:
Using synonymy and the concept relationships that define the hierarchical structure of CPT codes within the UMLS, we (1) guided surgical experts in expanding the Operative Stress Score (OSS) from 565 originally rated CPT codes to 1,853 additional, related procedures and (2) assessed the validity of the added OSS ratings by their association with 30-day outcomes in the Veterans Affairs Surgical Quality Improvement Program (VASQIP; 2015–2018).
Results:
The UMLS Metathesaurus and Semantic Network were converted into an interactive graph database (https://github.com/dbmi-pitt/UMLS-Graph) delineating ontology relatedness. From this UMLS-Graph, the CPT hierarchy was queried to obtain all paths from each code to the hierarchical apex. Of 1,853 added ratings, 43% had a sibling and 76% had a cousin among the original OSS CPT codes. Of 857,577 VASQIP cases (mean age, 64±11 years; 91% male; 75% White), 786,122 (92%) were rated in the original OSS and 71,455 (8%) in the added OSS. Compared with original OSS cases, added OSS cases included more female (14% vs 9%) and frail (25% vs 19%) patients undergoing high stress procedures (11% vs 8%; all P<.001). Postoperative mortality consistently increased with OSS: mortality was <0.5% after very low stress procedures (original, 0.4% [95% CI, 0.4–0.5%] vs added, 0.9% [95% CI, 0.6–1.2%]) and 3.8% after very high stress procedures (original, 3.5% [95% CI, 3.0–4.0%] vs added, 5.8% [95% CI, 4.6–7.3%]).
Conclusions:
The synonymy and concepts relating biomedical data within the UMLS can be abstracted and efficiently used to expand the utility of existing clinical research tools.
Keywords: Computational biology, Current procedural terminology, Frailty, Unified medical language system, Medical informatics, Medical information computing, Ontologies, Surgical procedures
The Unified Medical Language System (UMLS) is a powerful resource that uses a common structure and disciplined curation to achieve interoperability between diverse structured biomedical vocabularies (e.g., Systematized Nomenclature of Medicine Clinical Terms [SNOMEDCT], Current Procedural Terminology [CPT], International Classification of Diseases [ICD]) and unstructured data (e.g., free-text notes of patient encounters) generated through the documentation of routine patient care. Many of these vocabularies are ontologies: subject areas with a defined set of properties and relationships with other categories. Begun by the National Library of Medicine in 1986, the UMLS has grown to include the concepts, codes, and terms of >100 vocabularies,1 with associated software that exploits the relationships among digital information to find conceptual connections within and across otherwise disparate vocabularies.2 The UMLS connects vocabularies through concept synonymy, which is vitally important for efficient patient care and biomedical research.
The breadth, structure, and complexity of data within the UMLS have enormous potential to aid clinical research.3 However, these strengths also create barriers to use. The intricate networks of data are difficult to conceptualize and require advanced data management skills to manipulate,4 confining use primarily to biomedical informatics experts. Even then, the UMLS is typically applied within specific ontologies rather than for its overall structure and its broad intra- and inter-vocabulary conceptual consistency.
Just as a keen knowledge of anatomy is required for surgeons to perform an operation, a working knowledge of the structure of the UMLS is required to use it. We have therefore defined key terms in Table 1 and designed a unique graph schema (i.e., network) connecting relationships between concepts, as well as terms within and between vocabularies, in the UMLS to facilitate its use by clinical and translational researchers.5 With it, clinical investigators can overcome previously frustrating limitations in administrative datasets, such as research tools defined in one vocabulary (e.g., CPT) but not another (e.g., ICD). Further, our tool enables the UMLS to function as a crosswalk between synonymous variables included in disparate administrative datasets. This crosswalk also links human-curated healthcare concepts (e.g., treatments for appendicitis, such as appendectomy and ileocecectomy) to terms and codes that would otherwise be unrelated in structured data, both within vocabularies (CPT code to CPT code) and between vocabularies (CPT to ICD codes). Through this use case, we describe the structure, organization, and power of the UMLS and its potential in clinical and translational investigation.
Table 1.
Term | Definition | Examples |
---|---|---|
Ontologies | Formalized subject area with a defined set of properties and relationships with other categories | SNOMEDCT |
Vocabulary | Information organized by indexing or categorizing defined terms | CPT, ICD, MeSH |
Vocabulary code | Smallest unit within a vocabulary within the UMLS | CPT Code 5-digit number: 48150 |
Vocabulary term | Text string that corresponds to the vocabulary code | Whipple; with pancreatojejunostomy |
UMLS concept | Concept describing and capturing codes/terms that may be specific to a single vocabulary or span many vocabularies | Resection of pancreas |
UMLS Concept Unique Identifier (CUI) | Alphanumeric code for a UMLS concept | C0372232 |
UMLS preferred term | Text string corresponding to the UMLS concept, preferentially used by the UMLS | Resection of pancreas |
Semantic types | Description of concepts with varying levels of granularity and specificity, organized in a hierarchy | Therapeutic or Preventive Procedure |
Graph schema | Network connecting codes, terms, concepts, and semantic types both with and without a hierarchical structure and relatedness | UMLS-Graph |
Property graph | Graph model in which the connections between codes, terms, concepts, and semantic types themselves have a meaning (i.e., a name) and properties | “branch_of”, “component_of” |
Abbreviations: SNOMEDCT, Systematized Nomenclature of Medicine Clinical Terms; CPT, Current Procedural Terminology; MeSH, Medical Subject Headings; UMLS, Unified Medical Language System.
Methods and Materials
Clinical Context and Research Problem
Despite the ubiquity of CPT codes, most health services research examines patient populations defined by families of similar procedures, treated by groups of specialists, and focused on a subset of physiological diagnoses. The lack of a common framework to group procedures across specialties and clinical diagnoses according to the rate of key outcomes has limited analysis of the entire range of surgical procedures. To address this limitation, our prior work applied a modified Delphi consensus methodology to develop the Operative Stress Score (OSS), which classifies 565 CPT codes according to the degree of physiologic stress imposed by each procedure, from very low to very high (OSS1–5).6,7 Although these original 565 CPT codes represented 90% of the procedures performed in Veterans Affairs (VA) hospitals (2010–2014), the original OSS included only 16% of the 3,337 CPT codes present in the VA sample. We hypothesized that the relationships inherent within CPT, as manifest in the UMLS, permit semi-automated expansion of the original OSS ratings to other conceptually related CPT codes (e.g., anatomically and physiologically similar procedures).
Data Sources
We used two data sources to systematically expand the original OSS. The first was all 2018 English UMLS sources from the standard UMLS distribution files, freely accessible to any individual requesting a UMLS Terminology Services account (https://uts.nlm.nih.gov//license.html),1 including the 2018 Category I CPT curated by the American Medical Association.8 The second was the validated original OSS CPT code ratings.6,7
We validated the added OSS among procedures included in the Veterans Affairs Surgical Quality Improvement Program (VASQIP; 2015–2018) containing CPT code-defined operative procedures and postoperative outcomes.9,10 Our use of VASQIP data was reviewed by the VA Pittsburgh Healthcare System institutional review board and determined to be exempt (PRO3670).
UMLS Semantic graph
The UMLS organizes concepts, codes, and terms from source vocabularies into a unified, human-curated network of conceptual relationships. Individual sources may be hierarchical or non-hierarchical, but they ultimately form a single unified UMLS graph of the combined Metathesaurus and Semantic Network. This network is rendered in a vast series of tables with rows of data consisting of (1) the vocabulary source code, which is the smallest unit within the UMLS; (2) the vocabulary source term, a text string that corresponds to the code; (3) the UMLS concept, defined by a Concept Unique Identifier (CUI); (4) the UMLS concept preferred term; and (5) the UMLS semantic type (Table 1).5
The UMLS has extensive, multidirectional connectivity. From its tables of data, we depicted a simplified, hierarchical conceptual model of the underlying UMLS relationships between codes, terms, concepts, and semantic types for the diagnosis and treatment of appendicitis (Figure 1). In the UMLS-Graph network, each source code can be conceptualized as a terminal branch, with its terms as leaves on a tree. As with most codes, CPT 44960 has multiple synonymous terms (i.e., leaves), including “Appendectomy”, “APPENDEC RPTD APPENDIX ABSC/PRITONITIS”, and “Removal of ruptured infected appendix”. For simplicity, the UMLS also defines a single preferred term for each concept. Related codes exist both within (CPT 44950) and across vocabularies (SNOMEDCT 80146002, ICD9 47.0), each connected by a common UMLS concept (CUI C0003611, termed “Appendectomy”). Other related concepts (e.g., “Laparoscopic Appendectomy”) further aggregate into broader concepts. Each concept then consolidates into at least one semantic type. Here, the “Appendectomy” terms, codes, and concepts all emanate from the “Therapeutic or Preventive Procedure” main branch on the left of the tree. The treatment of appendicitis, however, requires a diagnosis, often achieved with imaging studies. Therefore, on the right side of the tree, “Diagnostic Procedures” expand into the associated “CT of abdomen and pelvis” leaves.
To expose the relationships between terms within and across vocabularies and facilitate research based on the UMLS, the senior author (JS) conceptualized this complex network as a property graph. The UMLS-Graph backend is a single application programming interface built on the Neo4j Graph Database Platform (Neo4j, Inc.) that accepts Cypher queries. All core relationships (e.g., semantic type-concept, concept-code, code-term) are included, creating a directed property graph that allows unified query and extraction of all data elements and the paths between them.
In summary, the UMLS-Graph connects data elements commonly used in clinical research (e.g., CPT 44950, SNOMEDCT 80146002, and ICD9 47.0) which themselves are in separate vocabularies (CPT, SNOMEDCT, ICD9). In the eyes of the computer, each code is completely unrelated (44950, 80146002, 47.0). The UMLS allows these data elements to have a common definition, linking the clinical researcher’s concept (e.g., “Appendectomy”) with all related data elements for the computer (i.e., CUI C0003611).
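To make the crosswalk idea concrete, the following Python sketch models a tiny directed property graph in memory and finds every code that shares a concept with a given code. It is an illustration under stated assumptions, not the UMLS-Graph implementation (which runs on Neo4j and is queried with Cypher): the `EDGES` list, the `"SAB:code"` node keys, the relationship name `CODE`, and the functions `build_index` and `crosswalk` are all hypothetical. Only the codes and the CUI come from the appendicitis example above.

```python
from collections import defaultdict

# Hypothetical, minimal property graph as (source, relationship type, target) triples.
# Concept nodes are CUIs; code nodes are "SAB:code" strings. The relationship name
# "CODE" is an illustrative assumption, not the actual UMLS-Graph schema.
EDGES = [
    ("C0003611", "CODE", "CPT:44950"),
    ("C0003611", "CODE", "SNOMEDCT:80146002"),
    ("C0003611", "CODE", "ICD9:47.0"),
]


def build_index(edges):
    """Index concept -> codes and code -> concepts for CODE relationships only."""
    concept_to_codes = defaultdict(set)
    code_to_concepts = defaultdict(set)
    for src, rel, dst in edges:
        if rel == "CODE":
            concept_to_codes[src].add(dst)
            code_to_concepts[dst].add(src)
    return concept_to_codes, code_to_concepts


def crosswalk(code, edges):
    """Return every other code that shares at least one concept (CUI) with `code`."""
    concept_to_codes, code_to_concepts = build_index(edges)
    related = set()
    for cui in code_to_concepts.get(code, ()):
        related |= concept_to_codes[cui]
    related.discard(code)
    return sorted(related)


print(crosswalk("CPT:44950", EDGES))
```

Because the link runs through the shared concept rather than the code strings, the three numerically unrelated codes are returned together, which is the behavior the paragraph above describes.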
CPT code Hierarchy
CPT codes are 5-digit numbers organized in an indexed codebook in ascending numerical order, with similar procedures numerically clustered and unrelated procedures numerically distanced. For example, the CPT codes for an appendectomy include 44950, 44960, and 44970, and those for an above-knee amputation include 27590, 27591, 27592, 27594, and 27596. However, precise relationships between CPT codes are not apparent in the CPT codebook. Surgeons understand that a laparoscopic appendectomy and an open appendectomy are two separate but related procedures, and within the CPT coding system these two procedures have similar numbers (i.e., 44950 and 44970). Yet transrectal drainage of a pelvic abscess, an unrelated procedure, has a numerically similar code (i.e., 45000). By contrast, the UMLS-Graph organizes digitally functional data elements according to human-curated concepts informed by experience in both clinical care and health services research. The UMLS-Graph thereby allows extraction of the underlying hierarchy of CPT codes, which systematically and explicitly clusters related procedures according to their UMLS concepts and semantic types.
We identified all Category I CPT codes as terminal branches with corresponding terms as leaves in the UMLS-Graph. We then queried UMLS-Graph to define the shortest path (i.e., shortest hierarchical sequence of UMLS Concepts and Semantic Types) from the terminal branch to the single main trunk representing the CPT hierarchical term: “Surgery”. CPT codes (children) sharing their most distal branch (parent) were defined as siblings. CPT codes sharing a common branch one proximal to their most distal branch (grandparent) were defined as cousins.
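The sibling and cousin definitions above can be sketched in Python as a breadth-first search upward from each code to the trunk, followed by a comparison of parent (path position 1) and grandparent (path position 2). The hierarchy (`PARENTS`), node names, and function names are hypothetical placeholders. Counting a sibling only as a sibling, never also as a cousin, is our assumption, and ties between equally short paths are broken by parent order.

```python
from collections import deque

# Hypothetical child -> parents map; a UMLS concept may have several parents.
PARENTS = {
    "code_A1": ["branch_A"], "code_A2": ["branch_A"], "code_A3": ["branch_A"],
    "code_B1": ["branch_B"],
    "code_C1": ["branch_C"],
    "branch_A": ["main_1"], "branch_B": ["main_1"],
    "branch_C": ["main_2"],
    "main_1": ["Surgery"], "main_2": ["Surgery"],
}
TRUNK = "Surgery"


def shortest_path_to_trunk(node, parents=PARENTS, trunk=TRUNK):
    """Breadth-first search upward; returns [node, parent, ..., trunk] or None."""
    queue = deque([[node]])
    seen = {node}
    while queue:
        path = queue.popleft()
        if path[-1] == trunk:
            return path
        for parent in parents.get(path[-1], []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


def relatives(code, all_codes, parents=PARENTS):
    """Siblings share the most distal branch; cousins share the next branch up."""
    paths = {c: shortest_path_to_trunk(c, parents) for c in all_codes}
    mine = paths.get(code)
    if mine is None or len(mine) < 3:
        return [], []
    siblings, cousins = [], []
    for other, path in paths.items():
        if other == code or path is None or len(path) < 3:
            continue
        if path[1] == mine[1]:
            siblings.append(other)
        elif path[2] == mine[2]:  # assumption: siblings are not also counted as cousins
            cousins.append(other)
    return sorted(siblings), sorted(cousins)


codes = [c for c in PARENTS if c.startswith("code_")]
print(shortest_path_to_trunk("code_A1"))
print(relatives("code_A1", codes))
```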
OSS Expansion
The CPT code hierarchy inherent to the UMLS-Graph was exported as a comma-separated values (CSV) file so that the interconnections could be conceptualized and applied as a single table of data in wide format. Each row contained a unique 5-digit CPT code (terminal branch) and its term (leaf) in the last columns, with the preceding columns holding all increasingly broad UMLS concepts (proximal branches of the explicit CPT code hierarchy). These data were then imported into Excel (Microsoft, Redmond, WA) to create the OSS Expansion Tool. In Excel, we used conditional formatting to create visual cues indicating sequential branching, and therefore CPT code relatedness, within the hierarchy: leaves on a common branch, consolidating branches, and main branches consolidating to a common trunk (eTable 1). We then identified and labeled each of the 565 original OSS-rated CPT codes, preserving their location in the hierarchy.
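A minimal Python sketch of the wide-format export, assuming each code's path is already ordered from main branch to most distal branch (trunk excluded) and using the nine-column layout of the OSS Expansion Tool reported in the Results. The column names and the functions `to_wide_row` and `write_wide_csv` are illustrative, not the authors' export code.

```python
import csv
import io

MAX_DISTAL = 5  # main branch plus up to five distal branches (six branch levels in all)

# Nine columns: main branch, five distal branches, CPT code, term, original OSS.
HEADER = (["main_branch"] + [f"branch_{i}" for i in range(2, 2 + MAX_DISTAL)]
          + ["cpt_code", "term", "original_oss"])


def to_wide_row(path_from_trunk, code, term, original_oss=None):
    """path_from_trunk: branches ordered main -> most distal, trunk excluded."""
    main, *distal = path_from_trunk
    if len(distal) > MAX_DISTAL:
        raise ValueError("hierarchy is deeper than the column layout allows")
    padded = distal + [""] * (MAX_DISTAL - len(distal))  # blank cells for shallow paths
    return [main, *padded, code, term, "" if original_oss is None else original_oss]


def write_wide_csv(records):
    """Serialize records (dicts of to_wide_row arguments) to CSV text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    for rec in records:
        writer.writerow(to_wide_row(**rec))
    return buf.getvalue()
```

Padding shallow paths with blank cells keeps every CPT code in the same column regardless of depth, which is what lets spreadsheet conditional formatting compare neighboring rows branch by branch.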
All Category I CPT codes were eligible for expansion. We excluded CPT codes lacking minimal relatedness to an originally rated OSS code, namely (1) codes whose main branch contained no originally OSS-rated CPT code and (2) codes terminating at the secondary branch without any siblings.
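The two exclusion rules can be expressed as a filter. In this hypothetical Python sketch, `paths` maps each code to its branches from main to most distal, siblings are codes sharing the most distal branch (counted regardless of rating), and the function name and inputs are assumptions for illustration.

```python
from collections import defaultdict


def eligible_for_expansion(paths, rated):
    """paths: code -> [main_branch, ..., most_distal_branch]; rated: originally rated codes.
    Returns the unrated codes that pass both exclusion rules."""
    mains_with_rating = {paths[c][0] for c in rated if c in paths}
    codes_by_terminal = defaultdict(list)  # most distal branch -> codes ending there
    for code, path in paths.items():
        codes_by_terminal[path[-1]].append(code)
    eligible = []
    for code, path in paths.items():
        if code in rated:
            continue
        if path[0] not in mains_with_rating:  # rule 1: no rated code on the main branch
            continue
        if len(path) == 2 and len(codes_by_terminal[path[-1]]) == 1:  # rule 2: lone code at second branch
            continue
        eligible.append(code)
    return sorted(eligible)
```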
Three surgical experts (DH, PS, RS) who led the original Delphi-generated OSS rating independently reviewed the CPT codes eligible for expansion using the OSS Expansion Tool. Each reviewer added OSS ratings to sibling CPT codes sharing a common terminal branch and/or, moving up the hierarchy, to first or second cousins. Reviewers' ratings were then assessed for agreement and reconciled with a modified Delphi consensus method mirroring the original methodology.6 Unanimously rated codes were assigned (n=1,853). Codes on which only one rater disagreed, by 1 OSS unit (n=461), were reviewed by that minority rater, who either conceded to the majority rating (n=349) or maintained the disagreement (n=112). Ratings for which the minority rater did not concede, together with disparate ratings (i.e., all three raters disagreed or ratings were >1 unit apart, n=29), were discussed via teleconference until unanimity was reached.
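The triage step of the reconciliation can be sketched as a small Python classifier over the three raters' scores. The category labels and the `triage` function are hypothetical; the later step, in which a minority rater who does not concede is also sent to teleconference, depends on the reviewers and is not modeled.

```python
from collections import Counter


def triage(ratings):
    """Classify three raters' OSS ratings (1-5) for one CPT code.
    Returns (category, majority rating or None)."""
    if len(ratings) != 3 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("expected three OSS ratings between 1 and 5")
    majority, n_major = Counter(ratings).most_common(1)[0]
    if n_major == 3:
        return "unanimous", majority
    if n_major == 2:
        minority = next(r for r in ratings if r != majority)
        if abs(minority - majority) == 1:
            # Single dissenting rater within 1 OSS unit: returned to that rater.
            return "minority_review", majority
    # All three disagree, or the dissent is >1 unit: discussed by teleconference.
    return "teleconference", None
```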
To establish OSS ratings in regions of the CPT hierarchy excluded from the original consensus rating, 10 additional CPT codes were rated according to anatomic and physiologic principles using the methods described above.
OSS Expansion – UMLS-Graph Use Case Validation
We included all cases in VASQIP and excluded those whose principal procedure CPT code was Category III (emerging technology). We categorized cases as having (i) an original OSS or (ii) an added OSS rating, which together comprise the expanded OSS, and excluded cases with no rating. To characterize patients, we compared baseline demographics, preoperative risk factors, and intraoperative data. Postoperative 30-day mortality and complications were stratified by the Risk Analysis Index (RAI), a validated measure of patient frailty that predicts postoperative morbidity and mortality both independently of and synergistically with the OSS.6,11–13
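For the stratified analysis, a Python sketch of RAI grouping and stratum-level mortality follows. The RAI cut points are those used in Table 3 (robust ≤20, normal 21–29, frail 30–39, very frail ≥40) and assume integer RAI scores; the `(oss, rai, died)` tuple layout and the function names are assumptions for illustration.

```python
from collections import defaultdict


def rai_category(rai):
    """RAI groups as reported in Table 3; assumes integer RAI scores."""
    if rai <= 20:
        return "robust"
    if rai <= 29:
        return "normal"
    if rai <= 39:
        return "frail"
    return "very frail"


def mortality_by_stratum(cases):
    """cases: iterable of (oss, rai, died) tuples; returns {(oss, rai_group): mortality rate}."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for oss, rai, died in cases:
        key = (oss, rai_category(rai))
        totals[key] += 1
        deaths[key] += int(died)
    return {key: deaths[key] / totals[key] for key in totals}
```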
Results
The UMLS-Graph, Graphing UMLS Enables Search in Dynamic Trees, is accessible at https://guesdt.com/. The semantic property graph includes 5,051,942 codes, 8,752,504 terms, 4,376,259 concepts, and 127 semantic types, with a total of 49,172,202 relationships across 122 biomedical vocabularies. For example, searching the UMLS-Graph for the code “CPT 44970” returns five terms: “LAPAROSCOPY APPENDECTOMY”, “Laparoscopy, surgical appendectomy”, “removal of appendix using an endoscope”, “Surgical laparoscopy with appendectomy”, and the preferred term “LAPAROSCOPIC APPENDECTOMY”. From the “CPT 44970” code, the graph traverses synonymous concepts for codes both across vocabularies (SNOMED, ICD9, etc.) and within CPT, for example branching to the parent and concept (CUI C0372525) of CPT 44970. A subset of the UMLS-Graph schematic is shown in eFigure 1.
CPT code Hierarchy and OSS expansion
All 5,752 terminal leaf CPT codes in the UMLS-Graph are of the “Therapeutic or Preventive Procedure” semantic type (Figure 2A). The main trunk divided into 19 main branches defining broad concepts (e.g., “Surgical Procedures of the Digestive System”), with branches splitting up to five additional times before reaching the terminal leaves. The 19 main branches split into 102 secondary (CPT codes, n=5,741), 534 tertiary (n=5,650), 1,136 quaternary (n=3,692), 169 quinary (n=670), and 37 senary (n=76) branches, each level conceptually more specific than the last.
Six of the 19 main branches, containing 356 CPT codes, had no originally rated OSS CPT codes, and four secondary branches included only one CPT code. Therefore, of the 5,187 CPT codes without an original OSS, 4,827 (93%) met criteria for potential primary expansion; these spanned 13 main, 86 secondary, 486 tertiary, 1,050 quaternary, 119 quinary, and 35 senary branches (Table 2). At the second branch or beyond, 1,049 CPT codes had at least one originally OSS-rated sibling and 2,619 had at least one originally OSS-rated cousin.
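The eligibility counts can be checked with a few lines of Python arithmetic that use only figures reported above and in Table 2:

```python
total_leaf_codes = 5_752      # terminal leaf CPT codes in the UMLS-Graph
original_oss_codes = 565      # codes rated in the original OSS
excluded_main_branch = 356    # codes on main branches without an original rating
excluded_lone_second = 4      # second-branch codes without siblings

unrated = total_leaf_codes - original_oss_codes
eligible = unrated - excluded_main_branch - excluded_lone_second
print(unrated, eligible, f"{eligible / unrated:.0%}")  # → 5187 4827 93%

# Row totals from Table 2 (second through sixth branch levels)
terminal_counts = [43, 1_659, 2_642, 418, 65]
sibling_counts = [10, 622, 329, 81, 7]
cousin_counts = [10, 1_416, 1_028, 126, 39]
print(sum(terminal_counts), sum(sibling_counts), sum(cousin_counts))  # → 4827 1049 2619
```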
Table 2.
Branch level (no. branches) | Main* (13) | Second* (86) | Third (486) | Fourth (1,050) | Fifth (119) | Sixth (35) |
---|---|---|---|---|---|---|
CPT codes, no. (%) | | | | | | |
Terminal branch | 0 | 43 (1%) | 1,659 (34%) | 2,642 (55%) | 418 (9%) | 65 (1%) |
OSS-rated siblings† | 0 | 10 (1%) | 622 (59%) | 329 (31%) | 81 (8%) | 7 (1%) |
OSS-rated cousins† | 0 | 10 (1%) | 1,416 (54%) | 1,028 (39%) | 126 (5%) | 39 (1%) |
Main branch system description‡, no. CPT codes per branch (%)# | | | | | | |
Auditory | 92 (1.9%) | 92 (1.9%) | 88 (1.8%) | 39 (1.2%) | 0 | 0 |
Cardiovascular | 660 (13.7%) | 660 (13.7%) | 660 (13.8%) | 500 (16.0%) | 173 (35.8%) | 17 (26%) |
Digestive | 757 (15.7%) | 757 (15.7%) | 757 (15.8%) | 464 (14.8%) | 80 (16.6%) | 1 (2%) |
Endocrine | 19 (0.4%) | 19 (0.4%) | 18 (0.4%) | 12 (0.4%) | 0 | 0 |
Female Genital | 207 (4.3%) | 207 (4.3%) | 202 (4.2%) | 89 (2.8%) | 9 (1.9%) | 0 |
Hemic and Lymphatic | 57 (1.2%) | 57 (1.2%) | 56 (1.2%) | 32 (1.0%) | 0 | 0 |
Integumentary | 367 (7.6%) | 367 (7.6%) | 348 (7.3%) | 279 (8.9%) | 37 (7.7%) | 31 (48%) |
Male Genital | 126 (2.6%) | 126 (2.6%) | 124 (2.6%) | 45 (1.4%) | 0 | 0 |
Mediastinum and Diaphragm | 14 (0.3%) | 14 (0.3%) | 14 (0.3%) | 5 (0.2%) | 0 | 0 |
Musculoskeletal | 1,460 (30.2%) | 1,460 (30.2%) | 1,450 (30.3%) | 903 (28.9%) | 42 (8.7%) | 0 |
Nervous | 494 (10.2%) | 494 (10.2%) | 494 (10.3%) | 403 (12.9%) | 108 (22.4%) | 16 (25%) |
Respiratory | 285 (5.9%) | 285 (5.9%) | 285 (6.0%) | 196 (6.3%) | 0 | 0 |
Urinary | 289 (6.0%) | 289 (6.0%) | 288 (6.0%) | 158 (5.1%) | 34 (7.0%) | 0 |
*All CPT codes consolidating to a main branch that did not contain an originally rated CPT code were excluded from the primary expansion (n=356). All CPT codes terminating in the second branch without any siblings, independent of original OSS rating, were excluded (n=4).
†CPT code relatedness is defined only in the most distal, and therefore contextually specific, available branch.
‡Main branches with CPT codes excluded from primary expansion include: Fine needle aspiration (CPT codes, n=2), Intersex surgery (CPT codes, n=2), Operating microscope procedures (CPT codes, n=1), Reproductive system procedures (CPT codes, n=1), Maternity care (CPT codes, n=64), and Eye and ocular (CPT codes, n=286).
#Includes any CPT code consolidating onto each branch, not solely the CPT codes terminating there. For example, 92 CPT codes consolidate onto the main Auditory branch: all traverse a second branch, 88 traverse a third branch (49 terminating there), and the remaining 39 terminate at a fourth branch.
Abbreviations: OSS, Operative Stress Score; CPT, Current Procedural Terminology.
The conceptualized tree and associated relationships between CPT codes generated nine columns in the OSS Expansion Tool: (1) the main branch, (2–6) potential distal branches, (7) the CPT code, (8) the term leaf, and (9) the original OSS rating when applicable. As conceptually depicted in Figure 2B, surgical experts reviewed each terminal leaf procedure term alongside the original OSS ratings of related CPT codes. As appropriate, the OSS ratings were expanded to siblings (e.g., procedures of the appendix) or to siblings and cousins (e.g., procedures of the pancreas). CPT code relatedness guided the expansion, but ratings were not applied automatically or blindly to related codes. For example, consider the sibling CPT codes describing primary laparoscopic repair of a reducible (CPT 49654) and an incarcerated or strangulated (CPT 49655) incisional hernia. In the original OSS rating, reducible repairs were low (OSS2) and incarcerated repairs moderate (OSS3) stress procedures. Accordingly, laparoscopic repair of a recurrent incarcerated or strangulated incisional hernia (CPT 49657) was given an expansion rating of moderate (OSS3), one level above its originally rated, low stress (OSS2) sibling describing a recurrent reducible repair (CPT 49656).
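The hernia example illustrates why related ratings were not copied automatically. A hypothetical Python helper that only proposes a rating, taking the lower median of rated siblings (falling back to cousins) and flagging any spread for discussion, would suggest OSS2 for CPT 49657, whereas the experts assigned OSS3. The helper, its median rule, and treating the four hernia codes as one sibling group are assumptions for illustration, not the procedure the reviewers followed.

```python
from statistics import median_low


def propose_rating(siblings, cousins, original_oss):
    """Return (suggested OSS or None, needs_discussion) for expert review.
    Hypothetical helper: a suggestion is never applied without a reviewer."""
    for group in (siblings, cousins):  # prefer the closest relatives
        rated = [original_oss[c] for c in group if c in original_oss]
        if rated:
            spread = max(rated) - min(rated)
            return median_low(rated), spread > 0
    return None, True  # no rated relatives: rate from first principles


original = {"49654": 2, "49655": 3, "49656": 2}
print(propose_rating(["49654", "49655", "49656"], [], original))
```

The flag, rather than the suggested value, is the useful output here: disagreement among rated relatives marks exactly the codes that required clinical judgment.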
Guided by the OSS Expansion Tool, surgeon expertise, and formal rating review, the final OSS expansion was applied to 1,768 additional CPT codes, of which 775 had siblings and 1,349 had cousins with an original OSS-rated CPT code. Surgical experts rated 10 additional CPT codes by the modified Delphi process without using the OSS Expansion Tool; these described unlisted procedures of the stomach (CPT 43999) and pancreas (CPT 48999) as well as treatment of ectopic pregnancy (CPT 59120, 59121, 59130, 59135, 59136, 59140, 59150, and 59151). Of the 1,778 added codes, 23% (OSS1, n=409) were very low, 36% (OSS2, n=641) low, 31% (OSS3, n=548) moderate, 8% (OSS4, n=146) high, and 2% (OSS5, n=34) very high operative stress procedures.
Validation of the Expanded OSS
VASQIP comprised a total of 886,823 cases, of which 786,122 (89%) were captured by the original OSS and 71,455 (8%) by the added OSS (eFigure 2); these cases included 562 (99%) of the original and 848 (48%) of the added OSS-rated CPT codes. The expanded OSS therefore captured 857,577 (97%) cases (mean age, 63 years [standard deviation (SD), 11]; 91% male; 25% non-White; mean RAI, 24 [SD, 7]). Compared with original OSS cases, added OSS cases included more female (14% vs 9%) and frail (RAI ≥30, 25% vs 19%) patients undergoing more stressful procedures (OSS4 and 5, 11.4% vs 8.2%; all P<.001). Accordingly, added OSS cases included greater proportions of vascular, neurologic, otolaryngologic, gynecologic, plastic, and thoracic procedures, with correspondingly smaller proportions of general, orthopedic, and urologic procedures (Table 3).
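The coverage proportions above follow directly from the reported counts; a short Python check:

```python
total_cases = 886_823
original_cases = 786_122
added_cases = 71_455
expanded_cases = original_cases + added_cases  # 857,577

for label, n in [("original", original_cases), ("added", added_cases), ("expanded", expanded_cases)]:
    print(f"{label}: {n:,} cases ({n / total_cases:.0%} of VASQIP)")

# Share of OSS-rated CPT codes that appeared as principal procedures in VASQIP
print(f"original codes: {562 / 565:.0%}, added codes: {848 / 1_778:.0%}")
```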
Table 3.
Variable | Expanded OSS (n=857,577) | Original OSS (n=786,122) | Added OSS (n=71,455) | |
---|---|---|---|---|
Age, years, mean (SD) | 64 (11) | 64 (11) | 62 (12) | |
Male sex, no. (%) | 777,912 (91%) | 716,507 (91%) | 61,405 (86%) | |
Race, no. (%) | ||||
White | 612,332 (75%) | 563,330 (76%) | 49,002 (73%) | |
Black | 152,797 (19%) | 138,573 (19%) | 14,224 (21%) | |
Asian or Pacific Islander | 3,420 (<1%) | 3,015 (<1%) | 405 (1%) | |
Other | 43,272 (5%) | 39,743 (5%) | 3,529 (5%) | |
Ethnicity, no. (%) | ||||
Not Hispanic | 784,371 (93.2%) | 720,100 (93.3%) | 64,271 (92.1%) | |
Hispanic | 45,307 (5.4%) | 40,976 (5.3%) | 4,331 (6.2%) | |
Risk Analysis Index, no. (%) | ||||
Robust (≤20) | 213,859 (27.3%) | 194,235 (26.9%) | 19,624 (30.6%) | |
Normal (21–29) | 416,963 (53.1%) | 388,675 (53.9%) | 28,288 (44.2%) | |
Frail (30–39) | 123,416 (15.7%) | 110,942 (15.4%) | 12,474 (19.5%) | |
Very frail (≥40) | 30,556 (3.9%) | 26,876 (3.7%) | 3,680 (5.7%) | |
American Society of Anesthesiologists Classification, no. (%) | ||||
1 | 4,595 (0.5%) | 4,020 (0.5%) | 575 (0.8%) | |
2 | 134,518 (15.7%) | 124,584 (15.9%) | 9,934 (14.0%) | |
3 | 599,296 (69.9%) | 552,123 (70.3%) | 47,173 (66.3%) | |
4 | 116,546 (13.6%) | 103,296 (13.1%) | 13,250 (18.6%) | |
5 | 2,090 (0.2%) | 1,862 (0.2%) | 228 (0.3%) | |
Operative Stress Score, no. (%) | ||||
1 | 44,778 (5.2%) | 39,842 (5.1%) | 4,936 (6.9%) | |
2 | 377,023 (44.0%) | 343,954 (43.8%) | 33,069 (46.3%) | |
3 | 363,320 (42.4%) | 337,986 (43.0%) | 25,334 (35.5%) | |
4 | 64,562 (7.5%) | 57,644 (7.3%) | 6,918 (9.7%) | |
5 | 7,894 (0.9%) | 6,696 (0.9%) | 1,198 (1.7%) | |
Procedure type, no. (%) | ||||
General | 225,105 (26.5%) | 213,605 (27.4%) | 11,500 (16.3%) | |
Orthopedic | 249,714 (29.4%) | 237,460 (30.4%) | 12,254 (17.4%) | |
Vascular | 127,549 (15.0%) | 109,198 (14.0%) | 18,351 (26.0%) | |
Gynecologic | 12,528 (1.5%) | 8,795 (1.1%) | 3,733 (5.3%) | |
Urologic | 99,899 (11.7%) | 94,525 (12.1%) | 5,374 (7.6%) | |
Neurologic | 66,068 (7.8%) | 58,809 (7.5%) | 7,259 (10.3%) | |
Otolaryngologic | 23,967 (2.8%) | 20,447 (2.6%) | 3,520 (5.0%) | |
Plastics | 15,948 (1.9%) | 11,357 (1.5%) | 4,591 (6.5%) | |
Thoracic | 29,900 (3.5%) | 25,865 (3.3%) | 4,035 (5.7%) | |
30-day postoperative outcome, no. (%) | ||||
Any complication | 65,313 (7.6%) | 57,913 (7.4%) | 7,400 (10.4%) | |
Mortality | 9,658 (1.1%) | 8,432 (1.1%) | 1,226 (1.7%) |
Abbreviations: no., number; SD, standard deviation; OSS, Operative Stress Score.
Overall, postoperative mortality increased with higher levels of operative stress. Mortality was <0.5% after very low stress procedures (OSS1; original, 0.4% [95% CI, 0.4–0.5%] vs added, 0.9% [95% CI, 0.6–1.2%]) and 3.8% after very high stress procedures (OSS5; original, 3.5% [95% CI, 3.0–4.0%] vs added, 5.8% [95% CI, 4.6–7.3%]) (eTable 3). Further, when stratified by OSS and patient frailty, the risks of postoperative mortality and complications for the added ratings mirrored those for the original ratings (Figure 3).
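The manuscript does not state how the 95% CIs were computed. As one common choice (an assumption, not necessarily the method used), the Wilson score interval for a binomial proportion can be sketched in Python and applied to the overall 30-day mortality in Table 3 (9,658 deaths among 857,577 cases):

```python
import math


def wilson_ci(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (approximate 95% CI when z=1.96).
    Returns (point estimate, lower bound, upper bound)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half


p, lo, hi = wilson_ci(9_658, 857_577)
print(f"{p:.2%} (95% CI, {lo:.2%} to {hi:.2%})")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and behaves reasonably for rare events in small strata, such as the added OSS5 cases.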
Discussion
We used the network inherent in the UMLS data to create an application programming interface that graphically relates over 100 biomedical vocabularies. In this first use case, we used the UMLS-Graph to formalize the relatedness of CPT codes into a single, user-friendly OSS Expansion Tool. We then expanded the OSS rating from the original 565 CPT codes to 1,778 additional CPT codes; together, the 2,343 expanded ratings covered 97% of cases included in VASQIP. Among cases rated by the original, added, and fully expanded OSS alike, increasing operative stress correlated with increasing postoperative morbidity and mortality.
The relationships defined by the UMLS-Graph were used to generate a hierarchical network of CPT codes. Placing the OSS codes assigned by the more labor-intensive modified Delphi process into this framework allowed a systematic and efficient expansion of the OSS that required fewer experts.14 Our review of the hierarchy and the OSS indicated that a purely automated expansion would be fraught with error; the applicability of sibling, parent, and cousin relationships therefore required interpretation by surgical experts. Supported by the conditionally formatted visual cues within the OSS Expansion Tool, the expansion process was accelerated while drawing on the knowledge gained from the original OSS modified Delphi consensus method.
The CPT hierarchy and associated OSS Expansion Tool might help other health services researchers address both simple and complex inefficiencies by facilitating rapid identification, review, and evaluation of CPT codes grouped according to clinical diagnoses (e.g., appendicitis) and specific anatomical systems (e.g., digestive). Further, known regional differences in coding preferences, a frequently cited limitation of national administrative data, can be systematically evaluated by controlling for the relatedness defined within the UMLS-Graph and the associated CPT hierarchy.
The original 565 CPT codes captured approximately 90% of procedures completed in the VA healthcare system (2010–2014). In the more recent VASQIP dataset (2015–2018), the original OSS captured 89% of surgical cases. The OSS expansion increased the number of rated CPT codes more than fourfold (from 565 to 2,343), allowing the physiologic stress of a surgical intervention to be rated for 97% of VASQIP cases. Although we may have functionally reached an inclusion asymptote in VASQIP, 40% of OSS-rated CPT codes did not appear as principal procedures in this primarily male cohort. In a private sector cohort including more women, the OSS expansion has recently been validated in the American College of Surgeons National Surgical Quality Improvement Program, where it was used to demonstrate differences in mortality between men and women while controlling for multiple factors, including physiologically induced operative stress measured with the UMLS-based OSS expansion.15
Medicare claims are determined by multiple code sets, including CPT, ICD, and the Healthcare Common Procedure Coding System, to quantify reimbursement.16 Therefore, patient care activities generate a potentially vast combination of billing codes, in addition to an ever-growing list of healthcare-related structured and unstructured data included within the UMLS.17 In this use case, we employed the UMLS-Graph to create the OSS Expansion Tool table specific to relationships between CPT codes. Investigators commonly face data management issues that seem simple but are at times insurmountable. For this work, the UMLS-Graph quickly and accurately managed a crosswalk between administrative codes across two vocabularies. Other commonly used tools, such as the Elixhauser and Charlson Comorbidity Indices, could similarly be crosswalked to existing biomedical vocabularies or expanded to include evolving and emerging comorbid conditions (e.g., COVID-19).
The analysis presented here was limited to the CPT vocabulary, and it remains to be seen whether the UMLS-Graph can advance surgical research using other datasets and vocabularies. However, the potential applications of the UMLS-Graph are broad, limited largely by investigators' ability to conceptualize the complex nature of the UMLS network. A more translational illustration of the utility of the UMLS-Graph is its ability to expose relationships between anatomy, diagnostic syndromes, and specific genetic mutations. Anatomic locations can be queried with the Foundational Model of Anatomy vocabulary and linked to associated phenotypic and genetic abnormalities in the Online Mendelian Inheritance in Man vocabulary. For example, a search for genetic and phenotypic abnormalities pertinent to the liver returned 28 phenotypic and 27 genotypic liver-related somatic mutations for systematic exploration. The results of this query are available in eTable 5 and include numerous glycogen storage, tumor and metabolic, and familial hypercholesterolemia-related diseases and genes. Investigators interested in exploring the UMLS-Graph can access it freely on GitHub (https://github.com/dbmi-pitt/UMLS-Graph) and are encouraged to use it to expose the ever-expanding and updated relationships within the UMLS.18
Conclusion
In this first use case, we created a UMLS-Graph to systematically and efficiently expand the OSS by >400% and validated the accuracy of our added ratings. We have made available an application programming interface that helps investigators conceptualize the UMLS-Graph, allowing them to harness the power of the UMLS for clinical research and data science applications.
Supplementary Material
Highlights:
Inherent hierarchical relationships in the UMLS are underutilized
We graphically assembled these relationships into the UMLS-Graph
The UMLS-Graph allowed for expansion of the Operative Stress Score by >400%
The powerful UMLS-Graph can be used to similarly accelerate other clinical research
Funding/Support
This research was supported in part by the US Department of Veterans Affairs; the Veterans Health Administration; the Office of Research and Development; grants I21 HX-002345 and XVA 72-909 (Hall); L30 AG064730 (Reitz) from the National Institute on Aging, NIH; grant K12CA090625 (Shinall) from the National Cancer Institute, NIH; grant U01 TR002393 (Hall, Silverstein and Shireman) from the National Center for Advancing Translational Sciences and the Office of the Director, NIH; grant 5T32HL0098036 (Reitz) from the National Heart, Lung, and Blood Institute. These funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit for publication.
Footnotes
COI/Disclosures
No authors report disclosures, conflict of interest or relevant financial interests related to the content of the manuscript. The opinions expressed here are those of the authors and do not necessarily reflect the position of the Department of Veterans Affairs or the US government.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
1. Barnett O. Computers in Medicine. JAMA. 1990;263(19).
2. Bodenreider O. The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Res. 2004;32:267–270. doi: 10.1093/nar/gkh061
3. Humphreys BL, Del Fiol G, Xu H. The UMLS knowledge sources at 30: indispensable to current research and applications in biomedical informatics. J Am Med Inform Assoc. 2020;27(10):1499–1501. doi: 10.1093/jamia/ocaa208
4. Adamusiak T, Shimoyama N, Shimoyama M. Next generation phenotyping using the unified medical language system. J Med Internet Res. 2014;16(3):1–24. doi: 10.2196/medinform.3172
5. National Library of Medicine. Metathesaurus. In: UMLS Reference Manual [Internet]. Bethesda; 2009. https://www.ncbi.nlm.nih.gov/books/NBK9684/.
6. Shinall MC, Arya S, Youk A, et al. Association of Preoperative Patient Frailty and Operative Stress with Postoperative Mortality. JAMA Surg. 2019;15213(1):1–9. doi: 10.1001/jamasurg.2019.4620
7. Shinall MC, Youk A, Massarweh NN, et al. Association of Preoperative Frailty and Operative Stress With Mortality After Elective vs Emergency Surgery. JAMA Netw Open. 2020;3(7):10–13. doi: 10.1001/jamanetworkopen.2020.10358
8. Dotson P. CPT® Codes: What Are They, Why Are They Necessary, and How Are They Developed? Adv Wound Care. 2013;2(10):583–587. doi: 10.1089/wound.2013.0483
9. Massarweh NN, Kaji AH, Itani KMF. Practical guide to surgical data sets: Veterans Affairs Surgical Quality Improvement Program (VASQIP). JAMA Surg. 2018;153(8):768–769. doi: 10.1001/jamasurg.2018.0504
10. Shiloach M, Frencher SK, Steeger JE, et al. Toward Robust Information: Data Quality and Inter-Rater Reliability in the American College of Surgeons National Surgical Quality Improvement Program. J Am Coll Surg. 2010;210(1):6–16. doi: 10.1016/j.jamcollsurg.2009.09.031
11. Hall DE, Arya S, Schmid KK, et al. Development and initial validation of the Risk Analysis Index for measuring frailty in surgical populations. JAMA Surg. 2017;152(2):175–182. doi: 10.1001/jamasurg.2016.4202
12. Arya S, Varley P, Youk A, et al. Recalibration and External Validation of the Risk Analysis Index. Ann Surg. 2019. doi: 10.1097/sla.0000000000003276
13. George EL, Hall DE, Youk A, et al. Association between Patient Frailty and Postoperative Mortality across Multiple Noncardiac Surgical Specialties. JAMA Surg. 2020;94305:1–9. doi: 10.1001/jamasurg.2020.5152
14. McCray AT. The UMLS Semantic Network. Proc - Annu Symp Comput Appl Med Care. 1989:503–507.
15. Yan Q, Kim J, Hall D, et al. Association of Frailty and the Expanded Operative Stress Score with Preoperative Acute Serious Conditions, Complications and Mortality in Males Compared to Females – A Retrospective Observational Study. Ann Surg. 2021; in press.
16. Centers for Medicare & Medicaid Services. Fact Sheet: ICD-10-CM, ICD-10-PCS, CPT, and HCPCS Code Sets. MLN900943; 2020.
17. Milinovich A, Kattan MW. Extracting and utilizing electronic health data from Epic for research. Ann Transl Med. 2018;6(3):42. doi: 10.21037/atm.2018.01.13
18. Bertaud V, Lasbleiz J, Mougin F, Burgun A, Duvauferrier R. A unified representation of findings in clinical radiology using the UMLS and DICOM. Int J Med Inform. 2008;77(9):621–629. doi: 10.1016/j.ijmedinf.2007.11.003