Key Points
Question
Are billing codes in electronic health records from single health care systems valid metrics for counting an individual’s total number of skin cancers?
Findings
In this single-center cohort study that included 6307 patients, there was a strong linear correlation between Current Procedural Terminology codes and the number of histologically verified skin cancers (r > 0.8 in all models).
Meaning
Current Procedural Terminology codes correlate strongly with skin cancer counts but do not capture all histologically verified skin cancers.
Abstract
Importance
Patients can develop multiple skin cancers, and their medical data can be spread over multiple health care systems. This fragmented care, combined with the lack of skin cancer registries, has limited our ability both to provide accurate estimates of incidence and to study the pathogenesis of multiple skin cancers.
Objective
To assess whether standard diagnostic and procedural codes present in the electronic health records at a single health care system are a valid proxy for estimating the number of overall skin cancers.
Design, Setting, and Participants
Retrospective cohort study of patients seen at a single-center tertiary care hospital (ie, Vanderbilt University Medical Center) between July 1, 2008, and June 30, 2018. All patients with at least 1 electronic health record–based diagnostic or procedural code for any skin cancer and at least 1 pathology report of a skin cancer.
Exposure
The number of International Classification of Diseases (ICD) or Current Procedural Terminology (CPT) codes relating to skin cancer.
Main Outcomes and Measures
Pearson correlation coefficient and R² were calculated for the total number of ICD or CPT codes for skin cancer and histologically verified skin cancers.
Results
In this cohort study of 35 901 patients, the mean (SD) age was 70.4 (14.4) years, 20 404 (56.8%) were men, and 31 623 (88.1%) were White individuals. Of these patients, 6307 had at least 1 ICD or CPT code or pathology report for a skin cancer, of whom 5688 patients had both a CPT code related to skin malignancy and a histologically verified skin cancer. There was a strong linear correlation between the number of CPT codes and pathology records (r = 0.87). There was a poor correlation between the number of ICD codes and pathology records (r = 0.22).
Conclusions and Relevance
This cohort study found that the use of ICD codes was a poor proxy measure for the number of skin cancers per patient. In contrast, CPT codes accounted for more than 75% of the variability in the number of skin cancers (R² = 0.76) and were a better proxy measure for the total number of skin cancers per patient.
This cohort study assesses whether standard diagnostic and procedural codes present in the electronic health records (EHRs) at a single health care system are a valid proxy for estimating the number of overall skin cancers.
Introduction
Skin cancers, particularly keratinocyte cancers, are the most common malignant neoplasms among White populations and are often not included in cancer registries.1,2 When data are available, most studies have focused on identifying factors associated with the development of any skin cancer, but many patients develop numerous skin cancers, often with medical records scattered across different health care systems. The number of keratinocyte cancers (hereafter referred to as counts) is clinically important; not only may each one grow and metastasize, but having multiple skin cancers is associated with an increased risk for some internal malignant neoplasms and may be a clinical marker that determines which transplant recipients will respond to skin cancer prevention treatments.3,4 Most cancer registries do not track the number of individual cancers a patient develops, which limits our capacity to learn about the incidence, outcomes, and pathogenesis of the development of multiple skin cancers.
Administrative databases and electronic health records (EHRs) may help address these limitations. International Classification of Diseases (ICD), Current Procedural Terminology (CPT), and Systematized Nomenclature of Medicine (SNOMED) codes can be used to identify individuals with a history of skin cancer.5,6,7 The most widely cited estimates of keratinocyte cancer incidence used CPT codes as a proxy for the total number of skin cancers per person obtained from Medicare data. However, this measure has not been validated in the EHRs from an individual health care system.1,2
A benefit of administrative databases is that they contain the complete record of billing codes for an individual, but they lack other pertinent clinical information, which may limit their usefulness. Electronic health records contain highly granular clinical data, but only for encounters within a specific health care system, which may limit their completeness. Although previous studies have examined the predictive value of having any skin cancer code on identifying skin cancer cases, there is a dearth of information on the association of the number of CPT and ICD codes with the true number of skin cancers. We aimed to assess whether common diagnostic and procedural codes present in the EHR can be used to estimate the overall number of skin cancers for an individual patient.
Methods
Study Design
Following approval by the institutional review board of Vanderbilt University Medical Center (VUMC), we performed a retrospective cohort study examining the association of ICD and CPT codes related to skin cancer diagnosis and treatment in the EHRs with the overall number of histologically verified skin cancers for the individual patient. A waiver of consent was granted because the study involved existing data that had been collected as part of routine clinical practice and was considered minimal risk. The goal of the study was to count the total number of skin cancers per patient. The primary outcome was the number of histologically verified skin cancers as identified in pathology reports. The primary exposure was the number of ICD and CPT codes for skin cancer diagnosis and treatment. CPT codes for the initial diagnostic biopsy and additional Mohs stages and ICD codes for the history of skin cancer were not included in this count.
Our inclusion criteria were based on the overlap of 2 sets of patients (Figure 1). First, we used VUMC’s clinical database, the Research Derivative, to identify all patients seen between July 1, 2008, and June 30, 2018, who had an ICD or CPT code for skin cancer.1,2,8 Then, we used our clinical pathology database to identify any patient who was seen at VUMC during this same period and had at least 1 pathology report showing a skin cancer. We separated our sample into 3 groups based on the presence of CPT codes and pathology records.
VUMC Research Derivative
The Research Derivative contains more than 4 million patients seen at VUMC and contains identified clinical data from the EHRs, including diagnostic and procedure codes (ie, International Classification of Diseases, Ninth Revision [ICD-9], International Statistical Classification of Diseases and Related Health Problems, Tenth Revision [ICD-10], and CPT codes), demographic characteristics, text from notes, laboratory values, radiology reports, and orders.8 Dermatopathology records accessioned after August 16, 2005, are not included in the Research Derivative. We used the Research Derivative to identify patients with any ICD or CPT code for skin cancer (eTables 1 and 2 in the Supplement).
Pathology Records
All dermatopathology records between July 1, 2008, and June 30, 2018, were collected in PDF format, and the diagnoses were abstracted using natural language processing. For billing purposes, the dermatopathology services are affiliated with Pathology Consultants of America, even though the dermatopathologists and all of the slides are physically housed within the VUMC Department of Dermatology. This relationship is set up so that the pathology reports are transmitted electronically by Pathology Consultants of America and entered into the EHRs as unformatted PDFs rather than formatted, searchable data.
Cases that listed a diagnosis of any type of skin cancer were included for review. Cases in which skin cancer could not be excluded (eg, actinic keratosis with involved deep margin as well as all dysplastic nevi) were treated as not being cancers. To avoid double-counting individual lesions, we included only primary biopsies and not excisions, and we used natural language processing to remove excisions that referred to prior accession numbers. All remaining records mentioning “excision” or “exc” were reviewed by hand (L.W.). Records listing “disk excision,” “punch excision,” or “shave excision” were counted as primary biopsies if they did not refer to a previous accession number.
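The filtering rules in the preceding paragraph can be sketched roughly as follows. This is an illustration only, not the authors' natural language processing pipeline: the accession-number pattern, the cue words ("prior," "previous," "see"), and the function name `triage_report` are all assumptions introduced here, and the text passed in is assumed to be the diagnosis body without the report's own accession header.

```python
import re

# Hypothetical accession-number pattern such as "DP18-12345"; the article does
# not describe the real format, so this must be adapted to the local system.
PRIOR_ACCESSION_RE = re.compile(
    r"\b(?:prior|previous|see)\b[^.]*?\b[A-Z]{1,3}\d{2}-\d{3,6}\b",
    re.IGNORECASE,
)
# Matches the two excision keywords named in the text.
EXCISION_RE = re.compile(r"\b(?:excision|exc)\b", re.IGNORECASE)
# Excision wordings that the text says were counted as primary biopsies.
BIOPSY_STYLE_RE = re.compile(r"\b(?:disk|punch|shave)\s+excision\b", re.IGNORECASE)


def triage_report(text: str) -> str:
    """Classify one report body as 'primary', 'drop', or 'manual_review'."""
    if EXCISION_RE.search(text) is None:
        return "primary"        # no excision wording: treat as a primary biopsy
    if PRIOR_ACCESSION_RE.search(text) is not None:
        return "drop"           # excision that points back to an earlier accession
    if BIOPSY_STYLE_RE.search(text) is not None:
        return "primary"        # disk/punch/shave excision without a prior reference
    return "manual_review"      # any other excision mention goes to hand review
```

In the study, the final category was resolved by a human reviewer; the sketch only routes records to that step.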
Patient Groups
We anticipated that there would be 3 distinct groups of patients. Group 1 would be composed of the internal VUMC patients who had more than 1 CPT code and more than 1 pathology report of skin cancer. Group 2 would be composed of outside patients whose pathology is read at VUMC and who would have more than 1 pathology report, more than 1 ICD code, but zero CPT codes relating to skin cancer. Group 3 would be composed of outside patients referred to VUMC for Mohs surgery or subspecialty oncology management with more than 1 ICD code and more than 1 CPT code. Our analyses focused on group 1.
From group 3, we selected a random sample of 100 patients and reviewed these patients’ EHRs by hand to confirm the number of destructions, excisions, and Mohs surgery cases performed as well as to search for outside pathology records scanned into the EHR. Our main analysis for this group was to assess the correlation between the number of CPT codes and the number of histologically verified skin cancers through EHR review. We additionally reviewed the EHR for patients with more than 500 ICD codes to determine the reason for such high code counts.
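As a rough illustration of how patients could be assigned to the 3 groups from per-patient counts, the sketch below uses the "at least 1" thresholds that appear in the Results and in Table 1. The function name `assign_group` and its arguments are hypothetical and are not taken from the study's code.

```python
from typing import Optional


def assign_group(n_pathology: int, n_cpt: int, n_icd: int) -> Optional[int]:
    """Return 1, 2, or 3 for a patient, or None if the patient has no skin cancer codes.

    Assumed definitions, following the Results and Table 1:
      group 1: at least 1 pathology report and at least 1 CPT code
      group 2: at least 1 pathology report, at least 1 ICD code, no CPT codes
      group 3: clinical codes (ICD or CPT) but no pathology reports
    """
    if n_pathology >= 1 and n_cpt >= 1:
        return 1
    if n_pathology >= 1 and n_icd >= 1:
        return 2
    if n_pathology == 0 and (n_cpt >= 1 or n_icd >= 1):
        return 3
    return None  # pathology report without any clinical code, or no data at all
```

For example, a patient with 3 pathology reports and 2 CPT codes falls into group 1 regardless of the number of ICD codes.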
Statistical Analyses
We calculated Pearson r and R² values for the linear association between the total number of ICD or CPT codes and the number of histologically verified skin cancer reports. Secondary analyses examined the correlation for each type of skin cancer and for only those instances in which an ICD code and a CPT code were entered on the same day. The number of skin cancers documented by the available pathology reports was considered the gold standard count. We considered r > 0.7 to be a strong linear correlation and R² > 0.7 to be an acceptable proxy measure.9 We constructed Bland-Altman plots to determine the limits of agreement and to examine the differences between measures, and we considered differences of less than 30% to be acceptable.10 We calculated the positive predictive value (PPV) of having at least 1 histologically verified skin cancer of a specific subtype if the patient had an ICD code related to that subtype or if the patient had same-day CPT and ICD codes of that subtype. A 2-sided P < .05 was considered statistically significant. All statistical analyses were performed using R, version 4.0.2.11
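The correlation measures named above can be illustrated with a minimal sketch. The study's analyses were run in R; the Python below is only an illustration, and the function names and toy counts are invented. Which variable is treated as the predictor in the slope is also an assumption (here, pathology counts predict CPT counts).

```python
import math


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy), the centered cross-product and sums of squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    sxy, sxx, syy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope of the least-squares line predicting y from x."""
    sxy, sxx, _ = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    return sxy / sxx


# Toy per-patient counts (not study data).
pathology_counts = [1, 2, 3, 4, 5]
cpt_counts = [1, 1, 2, 3, 3]

r = pearson_r(pathology_counts, cpt_counts)
r_squared = r ** 2          # equals R² for a one-predictor linear model
slope = ols_slope(pathology_counts, cpt_counts)
```

Under the thresholds stated in the text, the toy data would count as a strong correlation (r > 0.7) and an acceptable proxy (R² > 0.7).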
Results
Pooled Analyses
In the Research Derivative, there were 35 901 patients with at least 1 skin cancer ICD code, 24 981 patients with at least 1 skin cancer CPT code, and 20 483 patients with at least 1 of each during the study period. Of these, there were 6307 patients with at least 1 pathology record for skin cancer and 1 clinical code relating to skin cancer, for a total of 29 041 biopsy-confirmed skin cancers and 130 285 clinical codes. Of the 6307 patients, 5688 had at least 1 CPT code and at least 1 confirmed skin cancer (group 1), whereas the remaining 619 patients had at least 1 ICD code, at least 1 confirmed skin cancer, and no CPT codes (group 2). The patients in group 3 tended to be older than the patients in group 1 and group 2 (mean [SD] age, 70.6 [14.8] years in group 3 vs 69.8 [12.5] years in group 1 and 65.5 [13.6] years in group 2; P < .001), and a higher proportion were male patients in group 1 compared with groups 2 and 3 (3426 of 5688 patients [60.2%] in group 1 vs 267 of 619 patients [43.1%] in group 2 and 16 711 of 29 594 patients [56.5%] in group 3; P < .001) (Table 1).
Table 1. Characteristics of the Cohort.
Characteristic | Group 1: pathology reports and CPT codes | Group 2: pathology reports but no CPT codes | Group 3: no pathology reports | P value^a |
---|---|---|---|---|
No. of patients | 5688 | 619 | 29 594 | |
Age, mean (SD), y | 69.8 (12.5) | 65.5 (13.6) | 70.6 (14.8) | <.001 |
Sex | | | | |
Male | 3426 (60.2) | 267 (43.1) | 16 711 (56.5) | <.001 |
Female | 2262 (39.8) | 352 (56.9) | 12 883 (43.5) | |
Race | | | | |
White | 5558 (97.7) | 599 (96.8) | 25 446 (86.0) | <.001 |
African American | 9 (0.16) | 0 | 424 (1.4) | |
Asian | 10 (0.2) | 1 (0.2) | 74 (0.3) | |
Hispanic | 27 (0.5) | 2 (0.3) | 227 (0.8) | |
Native American | 2 (0.04) | 0 | 9 (<0.1) | |
Other | 1 (0.02) | 0 | 27 (0.1) | |
Unknown | 81 (1.4) | 17 (2.8) | 3423 (11.6) | |
Total pathology records, mean (SD) [range], No. | 4.9 (7.3) [1-129] | 1.5 (1.1) [1-10] | NA | <.001 |
Codes, mean (SD) [range], No. | | | | |
ICD | 18.2 (58.6) [0-1658] | 7.8 (31.0) [1-324] | 14.3 (50.1) [0-1253] | <.001 |
CPT | 3.8 (5.0) [1-69] | 0 | 1.9 (3.3) [0-84] | <.001 |
Values are No. (%) of patients unless otherwise indicated.
Abbreviations: CPT, Current Procedural Terminology; ICD, International Classification of Diseases; NA, not applicable.
^a Determined by use of the t test or the χ² test.
Among the 5688 patients in group 1, there was a strong linear correlation between the number of pathology records and the number of CPT codes (r = 0.87, β = 0.60) (Figure 2). The total number of ICD codes was poorly correlated with the number of pathology records (r = 0.22). Among the 619 patients who had pathology records but no CPT codes (group 2), there was very poor correlation between ICD codes and pathology records (r = 0.06).
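As a consistency check on the reported values (a sketch that assumes a simple linear regression of CPT code counts on pathology record counts, which the text does not state explicitly), the coefficient of determination equals the squared Pearson correlation, and the slope equals the correlation scaled by the ratio of the standard deviations. Using the group 1 SDs from Table 1 (5.0 for CPT codes and 7.3 for pathology records):

```latex
R^2 = r^2 = 0.87^2 \approx 0.76,
\qquad
\beta = r \,\frac{s_{\mathrm{CPT}}}{s_{\mathrm{path}}}
      = 0.87 \times \frac{5.0}{7.3} \approx 0.60
```

Both results agree with the reported R² of 0.76 and β of 0.60, up to rounding of the published inputs.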
There were 3 outliers in group 1 with more than 100 histologically verified skin cancers. We conducted sensitivity analyses by restricting ourselves to only those patients with fewer than 100 confirmed skin cancers. This subset had a marginally stronger correlation between CPT codes and confirmed skin cancers (n = 5685, r = 0.88, β = 0.64). Several patients had very high numbers of ICD codes (ie, 22 patients had >500 ICD codes). After EHR review, they were discovered to have had long treatment courses for metastatic skin cancers with multiple inpatient admissions and other complications leading to frequent entry of a skin cancer ICD code.
The mean (SD) difference between the number of CPT codes and the number of pathology records was −1.06 (3.86). Our limits of agreement were −8.62 and 6.49 (Figure 3). The differences increased with an increasing number of both CPT codes and pathology records, and only 2800 patients (44.4%) had differences in CPT counts that were within 30% of the number of histologically verified skin cancers.
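The agreement statistics above can be illustrated with a short sketch of the usual Bland-Altman calculation, in which the limits of agreement are the mean difference plus or minus 1.96 SDs. The function names and toy counts are invented for illustration, and reading "within 30%" as an absolute difference smaller than 30% of the pathology count is an assumption.

```python
import statistics


def limits_from_summary(mean_diff, sd_diff, z=1.96):
    """Limits of agreement from a mean difference and its SD."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def bland_altman(cpt_counts, path_counts):
    """Return (mean difference, lower limit, upper limit) for CPT minus pathology."""
    diffs = [c - p for c, p in zip(cpt_counts, path_counts)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    lower, upper = limits_from_summary(mean_diff, sd_diff)
    return mean_diff, lower, upper


def share_within(cpt_counts, path_counts, tolerance=0.30):
    """Fraction of patients whose CPT count differs from the pathology count by < tolerance."""
    pairs = list(zip(cpt_counts, path_counts))
    within = sum(1 for c, p in pairs if abs(c - p) < tolerance * p)
    return within / len(pairs)


# Toy per-patient counts (not study data).
toy_cpt = [1, 2, 3, 4]
toy_path = [1, 3, 3, 6]
toy_mean, toy_lower, toy_upper = bland_altman(toy_cpt, toy_path)
toy_share = share_within(toy_cpt, toy_path)

# Limits implied by the rounded summary values reported in the text.
reported_lower, reported_upper = limits_from_summary(-1.06, 3.86)
```

Applying the formula to the rounded mean and SD gives limits of about −8.63 and 6.51, close to the reported −8.62 and 6.49; the small gap is presumably due to rounding of the published summary values.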
Cancer Subtypes
We found a strong correlation between nonmelanoma skin cancer (NMSC) same-day events and pathologically confirmed NMSC (r = 0.86) but a weak correlation for melanoma (r = 0.55) and Merkel cell carcinoma (r = 0.58). Similarly, the PPV was strong for NMSC, squamous cell carcinoma, and basal cell carcinoma (>0.9 for each) but less so for melanoma and Merkel cell carcinoma (Table 2). These PPVs were all greater among those with ICD and CPT codes on the same day (Table 2).
Table 2. Positive Predictive Values of Having a Histologically Verified Subtype of Skin Cancer With an ICD Code or ICD and CPT Codes Related to That Subtype.
Subtype | PPV with ≥1 ICD code only | PPV with ≥1 ICD and CPT codes on the same day |
---|---|---|
Melanoma | 0.58 | 0.79 |
Nonmelanoma skin cancer | 0.98 | 0.99 |
Basal cell carcinoma | 0.95 | 0.96 |
Squamous cell carcinoma | 0.93 | 0.95 |
Merkel cell carcinoma | 0.44 | 0.80 |
Abbreviations: CPT, Current Procedural Terminology; ICD, International Classification of Diseases; PPV, positive predictive value.
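For readers less familiar with the measure in Table 2, the PPV is the share of code-flagged patients who also have a histologically verified cancer of that subtype. A minimal sketch follows; the function name and the toy flags are hypothetical and are not study data.

```python
def positive_predictive_value(has_code, has_pathology):
    """PPV = patients with code and confirmed cancer / all patients with the code.

    has_code and has_pathology are parallel sequences of booleans, one pair per patient.
    """
    confirmed_among_flagged = [p for c, p in zip(has_code, has_pathology) if c]
    if not confirmed_among_flagged:
        raise ValueError("PPV is undefined when no patient has the code")
    return sum(confirmed_among_flagged) / len(confirmed_among_flagged)


# Toy example: 4 patients carry a subtype code, 3 of them have a confirming report.
toy_has_code = [True, True, True, False, True]
toy_has_pathology = [True, False, True, True, True]
toy_ppv = positive_predictive_value(toy_has_code, toy_has_pathology)
```

The fourth patient, who has a pathology report but no code, does not enter the calculation, because PPV conditions on the code being present.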
Patients With No Pathology Reports
Of the 100 patients with CPT codes but no pathology reports randomly selected for EHR review, there was near-perfect correlation between the number of histologically verified skin cancers from outside pathology and the number of CPT codes for unique lesions (r = 0.99).
Discussion
Using a large EHR from a single health care system, we observed that the number of CPT codes was strongly correlated with the total number of histologically verified skin cancers, although most patients had a more than 30% discrepancy in the total counts. The total number of ICD codes was a poor estimate. These findings support the use of CPT codes for estimating incidence of skin cancer and as a basis for using EHR data to estimate numbers of skin cancers per patient, which has been challenging to achieve in the past.
There was a strong correlation between the number of CPT codes and the number of histologically verified skin cancers (r = 0.87). In univariate models, we would anticipate a slope of 1 for perfect concordance. Rather, we observed that each additional histologically verified skin cancer corresponded to an increase of only 0.60 CPT codes. Because this study was not population based, it should not be interpreted that the incidence estimates based on CPT code counts undercounted by 40%. One possible explanation for this finding is that we were not able to count CPT codes for skin cancers that were treated topically, with systemic chemotherapy, or with radiation; those cleared by biopsy; or those for which the patient elected not to be treated. In addition, patients referred for Mohs surgery will sometimes have other cancers that are biopsied and then treated by the referring physician, so these CPT codes would not be captured.
The correlation between same-day CPT or ICD codes and pathology reports was strong both overall (r = 0.85) and for NMSC (r = 0.86) but was weaker for melanoma (r = 0.55) and Merkel cell carcinoma (r = 0.58). These 2 latter skin cancers are often biopsied elsewhere before patients transfer their care to VUMC, and so the initial biopsy is not captured in our pathology database, whereas any subsequent skin cancers and all ICD and CPT codes would be captured. As a result, the correlation between the number of codes and the number of pathology records would be expected to be lower. Much like previous studies, we found a generally high PPV of having an ICD code for the subtype of interest to having a pathology report that confirmed skin cancer of that subtype (Table 2). The PPV was strong for NMSC, squamous cell carcinoma, and basal cell carcinoma (>0.9). We found a higher PPV when considering same-day CPT and ICD codes, instead of ICD codes alone, for all subtypes of skin cancer, with greater increases in melanoma and Merkel cell carcinoma (Table 2).
Because VUMC is a tertiary care medical center, many patients are referred for Mohs surgery or subspecialty oncology management. To assess the correlation between the number of CPT codes and the number of histologically verified skin cancers for these patients, we selected a random sample of 100 patients who had clinical codes but no VUMC pathology records (group 3). We reviewed these patients’ charts by hand to confirm the number of destructions, excisions, and Mohs surgical procedures performed as well as outside pathology records scanned into the chart. The correlation between CPT codes and pathology records was found to be extremely strong (r = 0.99). Based on our findings, in cases such as these where the primary pathology report is not in the EHR, the number of CPT codes will likely undercount the total number of skin cancers. Therefore, any associations using CPT codes as a proxy are likely to be biased toward the null and not toward spurious associations. Importantly, because this was not a population-based cohort study, incidence or prevalence estimates cannot be derived.
Several studies have shown high sensitivity and PPV using codes in administrative data sets to identify skin cancer cases, but not necessarily specific numbers of skin cancers.5,6,12 For example, in the Kaiser Permanente database, using the SNOMED codes for basal cell carcinoma had 99.2% sensitivity when searching pathology records6; however, these studies significantly differ from ours in that the databases used were much more likely to contain a patient’s entire medical history rather than the fragmented care characteristic of much of the US system. More similar to our approach are 2 studies examining ICD-9 and CPT codes as a means of determining skin cancer status.5,12 These showed much poorer sensitivity and PPV when using clinical codes alone, attributing them to miscoding benign or uncertain lesions as malignant.12 The goal of our study, however, was not to identify cases but rather to count the total number of skin cancers in each case.
Although our model was able to explain 75% of the variation in the number of the pathology reports using the CPT counts, the discrepancy between these 2 values increased with an increasing number of skin cancers. This may have led to undercounting of the total number of skin cancers, and indeed our models showed that up to 40% of skin cancers in this sample may not have been treated surgically at VUMC. Conversely, for the patients who did not have any pathology reports at VUMC, we found an extremely high correlation between the number of CPT codes within the VUMC system and the number of outside pathology reports, indicating near-complete capture of the available data for these patients.
Limitations
Our study was limited by obtaining data from a single institution. Phenotyping algorithms are highly portable between medical centers, and others have previously shown the ease of identifying skin cancer cases from single-center studies.7,13 The more granular outcome of counting skin cancers per patient, however, might not be as generalizable. The limitations that we encountered within the EHR data likely are experienced elsewhere, and herein we present approaches to address them. Most of the patients in this study did not have pathology reports. As a result, relying solely on pathology reports and natural language processing would miss this large group. Other institutions likely will experience similar issues of incompleteness in both CPT codes and pathology records, and we recommend assessing the utility of proxy measures prior to their use in other populations, although the use of CPT codes does appear to be the best available measure.
Clinical data on individual patients can be scattered across systems, which may limit the ability to conduct valid studies of many clinical outcomes. The lack of a skin cancer registry that includes keratinocyte carcinomas may further limit our capacity to learn about predictors, outcomes, and the pathogenesis of skin cancers. Currently, nearly all states have their own cancer registry and use electronic reporting. With this framework in place, the addition of skin cancers beyond those currently registered would be a feasible task and help overcome the limitations of determining cancers spread across multiple health care systems.
Conclusions
In this study of more than 6000 patients with data in the EHRs from an open health care system, we found a strong linear correlation between the number of CPT codes and the total number of histologically verified skin cancers. There was a poor correlation between the number of ICD codes and the total number of histologically verified skin cancers. CPT codes accounted for more than 75% of the variability in the number of skin cancers identified, which suggests that this method is an adequate proxy for the number of skin cancers per patient.
References
- 1. Rogers HW, Weinstock MA, Harris AR, et al. Incidence estimate of nonmelanoma skin cancer in the United States, 2006. Arch Dermatol. 2010;146(3):283-287. doi: 10.1001/archdermatol.2010.19
- 2. Rogers HW, Weinstock MA, Feldman SR, Coldiron BM. Incidence estimate of nonmelanoma skin cancer (keratinocyte carcinomas) in the U.S. population, 2012. JAMA Dermatol. 2015;151(10):1081-1086. doi: 10.1001/jamadermatol.2015.1187
- 3. Euvrard S, Morelon E, Rostaing L, et al; TUMORAPA Study Group. Sirolimus and secondary skin-cancer prevention in kidney transplantation. N Engl J Med. 2012;367(4):329-339. doi: 10.1056/NEJMoa1204166
- 4. Cho HG, Kuo KY, Li S, et al. Frequent basal cell cancer development is a clinical marker for inherited cancer susceptibility. JCI Insight. 2018;3(15):122744. doi: 10.1172/jci.insight.122744
- 5. Eide MJ, Tuthill JM, Krajenta RJ, Jacobsen GR, Levine M, Johnson CC. Validation of claims data algorithms to identify nonmelanoma skin cancer. J Invest Dermatol. 2012;132(8):2005-2009. doi: 10.1038/jid.2012.98
- 6. Asgari MM, Eide MJ, Warton EM, Fletcher SW. Validation of a large basal cell carcinoma registry. J Registry Manag. 2013;40(2):65-69.
- 7. Orso M, Serraino D, Abraha I, et al; D.I.V.O. Group. Validating malignant melanoma ICD-9-CM codes in Umbria, ASL Napoli 3 Sud and Friuli Venezia Giulia administrative healthcare databases: a diagnostic accuracy study. BMJ Open. 2018;8(4):e020631. doi: 10.1136/bmjopen-2017-020631
- 8. Danciu I, Cowan JD, Basford M, et al. Secondary use of clinical data: the Vanderbilt approach. J Biomed Inform. 2014;52:28-35. doi: 10.1016/j.jbi.2014.02.003
- 9. Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences. Houghton Mifflin College Division; 2003.
- 10. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307-310. doi: 10.1016/S0140-6736(86)90837-8
- 11. The R Project for Statistical Computing. Accessed November 11, 2020. https://www.R-project.org/
- 12. Eide MJ, Krajenta R, Johnson D, et al. Identification of patients with nonmelanoma skin cancer using health maintenance organization claims data. Am J Epidemiol. 2010;171(1):123-128. doi: 10.1093/aje/kwp352
- 13. Pacheco JA, Rasmussen LV, Kiefer RC, et al. A case study evaluating the portability of an executable computable phenotype algorithm across multiple institutions and electronic health record environments. J Am Med Inform Assoc. 2018;25(11):1540-1546. doi: 10.1093/jamia/ocy101