Abstract
Background
Electronic claims and medical record databases are important sources of information for medical research. However, potential sources of error and bias, including inaccurate diagnoses, incomplete data, incorrect data entry, and misclassification bias, necessitate studies that assess the validity of these databases.
Objective
To assess the validity of the diagnostic code for hidradenitis suppurativa (HS), which is an increasingly studied disease.
Methods
In this retrospective study, the medical records of 1,168 patients in the Massachusetts General Hospital database who had received at least two International Classification of Diseases, Ninth Revision (ICD-9) 705.83 codes were manually screened.
Results
Of the screened patients, 1,046 (89.6%) were confirmed as having HS. Mean age (standard deviation) was 44.0 (15.7) years, median age was 43.0 years, and 748 (71.5%) were female. The majority was white (66.7%), while a significant minority was black (13.9%) or Hispanic (13.4%). An increasing total number of codes and specific terms used to describe HS in the medical record, including “hydradenitis,” “boil,” “draining,” “abscess,” “fistula,” “cyst,” and “nodule,” could be used to improve the positive predictive value of the search.
Conclusions
Our results highlight the importance of establishing the validity of diagnostic codes in electronic databases and allow for refinements of appropriate ways to design future searches. Given the potential for misclassification of HS patients, establishing the validity of diagnostic codes and searching strategies in electronic databases represents a crucial step for subsequent studies utilizing these databases.
Introduction
Clinical databases represent increasingly important sources of information for medical research, including investigation of health outcomes, drug utilization, use of services, policy evaluation, epidemiology, quality of care, physician profiling, and health economics.1 Given the convenient access to a large amount of patient data available through these databases as well as the more widespread utilization of electronic medical records, the use of claims and medical record databases for research will likely increase, ultimately affecting patient care, treatment decisions, and health care policy.2 However, several possible sources of error and bias raise concerns about the validity of data obtained from electronic records, including inaccurate diagnoses, missing or incomplete data, faulty data entry, and misclassification bias.3–5 Consequently, studies assessing the validity of electronic databases are crucial as results obtained from electronic medical record review continue to inform future healthcare decisions.
Current knowledge about the epidemiology, associated comorbidities, and long-term outcomes of the hidradenitis suppurativa (HS) population is limited, and further research will likely rely on large population-based studies using electronic medical records. Recently, several reports have begun to describe the epidemiology of HS based on claims and medical record databases, including the Rochester Epidemiology Project6 and the PharMetrics Integrated Database study.7 Given the considerable variation seen amongst recent epidemiologic reports based on electronic databases,8 and the fact that the International Classification of Diseases, Ninth Revision (ICD-9) code for HS, 705.83, includes other rare diagnoses such as neutrophilic eccrine hidradenitis and recurrent palmoplantar hidradenitis, the need for assessments of the validity of diagnostic codes within such databases becomes evident. Previous analyses of electronic diagnostic databases have been conducted for other dermatologic conditions, such as psoriasis.2,9 In this study, we therefore assessed the validity of the electronically recorded diagnostic code for HS in our medical record database.
Materials and Methods
We conducted a retrospective study using the patient data available through the Longitudinal Medical Record (LMR) and Queriable Patient Inference Dossier (QPID) at the Massachusetts General Hospital (MGH). LMR is an ambulatory-care electronic medical record system used by physicians and other clinical staff for documentation of outpatient medical care. Data captured in LMR include clinic notes, telephone encounters, problem lists, medication lists, emergency room discharge summaries, pathology reports, laboratory data, and imaging studies. Inpatient consult notes are also available; however, inpatient records, including admission, progress, discharge, and nursing progress notes, are not currently available in LMR. QPID is a health intelligence platform incorporating an electronic health record search engine and a programming system of query development that captures information from patients’ complete medical records (inpatient and outpatient records from all healthcare providers).
We conducted a search through the MGH Research Patient Data Registry (RPDR) to identify potential patients of all ages who had received at least one ICD-9 code 705.83 for hidradenitis between January 1, 1980, and October 1, 2013. The complete medical records of all patients who had received at least two ICD-9 codes 705.83 were manually reviewed in LMR and QPID by a single trained medical student and confirmed with the dermatologist co-investigator (A.B.K.) when the diagnosis of HS was questionable. For example, cases were considered questionable if the skin lesions described did not represent a classic presentation of HS, or if the patient had another condition (such as Crohn’s disease) that could potentially explain the skin lesions. To facilitate the search of the medical record, the following terms were systematically searched in QPID for all patients: “abscess,” “acne inversa,” “boil,” “cyst,” “draining,” “fistula,” “hidradenitis,” “hydradenitis,” “HS,” and “nodule.” Positive findings for the above search terms were noted and confirmed through QPID. If no terms were found, the patient’s record in LMR was further examined by reviewing all of the patient’s clinic notes, pathology reports, and emergency room records. The HS diagnoses were validated in the medical record by a dermatologist’s confirmation of HS, description of the HS lesions by the reporting physician, or the results of a pathology report for a skin biopsy, whenever possible. The fact that a code for HS was entered by a dermatologist was not factored into our determination of positive cases of HS.
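The term search described above can be illustrated with a minimal Python sketch. It is not the QPID query used in the study: the function name `find_terms`, the whole-word matching rule, the optional plural "s", and the example note text are all assumptions made for illustration only.

```python
import re

# Search terms listed in the Methods; "hydradenitis" is a common misspelling.
SEARCH_TERMS = [
    "abscess", "acne inversa", "boil", "cyst", "draining",
    "fistula", "hidradenitis", "hydradenitis", "HS", "nodule",
]

def find_terms(note_text):
    """Return the set of search terms found as whole words in a note (hypothetical helper)."""
    found = set()
    for term in SEARCH_TERMS:
        # Assumption: match "HS" case-sensitively so the lowercase
        # abbreviation "hs" (at bedtime) is not counted as a hit.
        flags = 0 if term == "HS" else re.IGNORECASE
        # Word boundaries plus an optional trailing "s" so "boil" also matches "boils".
        pattern = r"\b" + re.escape(term) + r"s?\b"
        if re.search(pattern, note_text, flags):
            found.add(term)
    return found

# Invented example note, not taken from any patient record.
example_note = "Pt with recurrent draining boils in bilateral axillae, c/w hydradenitis."
print(sorted(find_terms(example_note)))
```

Searching for both spellings matters here because the misspelling "hydradenitis" appeared in more than a third of the screened records.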
To help determine the accuracy of our methodology, we verified whether 82 patients who belonged to the co-investigator’s (A.B.K.’s) dermatology clinic and had known HS were included amongst our validated cases of HS. Inter-rater reliability was not assessed since a single rater reviewed all medical records for positive cases of HS. Intra-rater reliability was measured by reassessing a subset of 60 patients (15 patients each from the 2-code, 3-code, 4-code, and 5+-code groups) to see whether the same patients were consistently considered positive cases of HS.
Positive predictive value (PPV) was defined as the number of patients verified as having HS divided by the total number of patients screened for HS in each category. Percentages were used to report the results of our analyses. Confidence intervals (CI) for PPV were calculated using exact binomial methods. Data were analyzed using the statistical software program JMP Pro 11. The study protocol was reviewed and approved by the Institutional Review Board (IRB) at MGH.
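As an illustration of the definitions above, the following Python sketch computes a PPV and an exact binomial confidence interval. The study used JMP Pro 11; this code is not the authors' implementation. It assumes that "exact binomial methods" refers to the Clopper-Pearson interval, and the function names and the bisection solver are choices made for this sketch.

```python
import math

def ppv(confirmed, not_confirmed):
    """PPV = patients verified as having HS / all patients screened in the category."""
    return confirmed / (confirmed + not_confirmed)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def _solve_increasing(f, target, iters=60):
    """Bisection on (0, 1) for a function f that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval (assumed method) for k successes out of n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper

# Two-code group from Table 2: 338 confirmed, 75 not confirmed.
low, high = exact_binomial_ci(338, 338 + 75)
print(f"PPV = {ppv(338, 75):.1%}, 95% CI = ({low:.1%}, {high:.1%})")
```

When every screened patient is confirmed (k = n), the lower bound reduces to (alpha/2)^(1/n), which gives 15.8% for 2 of 2 and 95.8% for 86 of 86, matching the intervals reported for "acne inversa" and "HS" in Table 2.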
Results
Our initial query for all patients who had received at least one ICD-9 code 705.83, which includes HS, neutrophilic eccrine hidradenitis, and recurrent palmoplantar hidradenitis, between January 1, 1980, and October 1, 2013, resulted in a total of 2,292 potential patients. Of these patients, the complete medical records of the 1,168 (51%) patients who had received at least two 705.83 codes were manually screened. We decided a priori not to analyze patients who had received only one ICD-9 code, since the PPV of using one code in studies of other disease populations, such as psoriasis patients, appeared to be suboptimal.2 Of the screened patients, 1,046 (89.6%) were validated as having HS. We expected the PPV to be less than 100% given that the ICD-9 code 705.83 encompasses other diagnoses besides HS, as described above. However, a diagnosis of neutrophilic eccrine hidradenitis was found only once. Among the false positives, the most common actual diagnoses were sebaceous cyst (10 patients), skin abscess (7), breast cyst (4), and cellulitis (3). In addition, all of the 82 patients identified beforehand by the principal investigators as having HS were found within the data set of 2,292 patients generated by our search methodology, confirming that a correctly entered code is reliably captured by our search. For intra-rater reliability, 100% of the patients originally designated as positive cases of HS among the 60 patients re-assessed by the rater were again designated as positive cases upon re-evaluation.
The characteristics of the patients with confirmed HS are reported in Table 1. The mean age (standard deviation [SD]) was 44.0 (15.7) years, median age was 43.0 years, and 748 (71.5%) patients were female. The gender distribution was similar to that reported by Cosmatos et al.7 at 74% women, but we found a higher average age compared to Cosmatos et al., who reported a mean age (SD) of 38.2 (14.73) years based on a patient claims database of 7,927 patients. The majority of our patients with confirmed HS were identified in the database as white (66.7%), while a significant minority was identified as black (13.9%) or Hispanic (13.4%). These results show a different racial distribution from the overall demographics of the total of 2,279,254 MGH patients in our database, 68.7% of whom are white; 7.4%, Hispanic; 5.1%, black; and 3.9%, Asian. In contrast to previous studies, which did not substantiate a racial predilection for HS,10 our study shows a greater prevalence of HS among blacks and Hispanics. Of the confirmed HS patients, 557 (53.5%) were single, 350 (33.5%) were married, and 65 (6.2%) were divorced.
Table 1.
Epidemiology of patients with confirmed hidradenitis suppurativa (n = 1,046).
| Characteristic | Category | Value |
|---|---|---|
| Age (yrs) | Mean (standard deviation) | 44 (15.7) |
| | Median | 43.0 |
| Gender | Female | 748 (71.5%) |
| | Male | 298 (28.5%) |
| Race/Ethnicity | White | 698 (66.7%) |
| | Black | 145 (13.9%) |
| | Hispanic | 140 (13.4%) |
| | Asian | 18 (1.7%) |
| | Other | 13 (1.2%) |
| | Unknown | 32 (3.1%) |
| Marital Status | Single | 557 (53.5%) |
| | Married | 350 (33.5%) |
| | Divorced | 65 (6.2%) |
| | Widow(er) | 30 (2.9%) |
| | Separated | 12 (1.2%) |
| | Other | 5 (0.5%) |
| | Unknown | 27 (2.6%) |
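The comparison between the racial distribution of the confirmed HS cohort and that of the overall MGH patient population can be made explicit with a short calculation. The sketch below divides each group's share of the HS cohort by its share of all MGH patients, using only the percentages reported in the text. The function name and the "representation ratio" label are assumptions of this sketch, and the ratio is a descriptive comparison of proportions rather than an estimate of prevalence or relative risk.

```python
# Percentages reported in the text: share of each group among confirmed HS
# patients and among all 2,279,254 MGH patients in the database.
hs_cohort_pct = {"White": 66.7, "Black": 13.9, "Hispanic": 13.4, "Asian": 1.7}
mgh_overall_pct = {"White": 68.7, "Black": 5.1, "Hispanic": 7.4, "Asian": 3.9}

def representation_ratios(cohort, reference):
    """Share in the cohort divided by share in the reference population (hypothetical helper)."""
    return {group: cohort[group] / reference[group] for group in cohort}

ratios = representation_ratios(hs_cohort_pct, mgh_overall_pct)
for group, ratio in ratios.items():
    # A ratio above 1 means the group is over-represented in the HS cohort.
    print(f"{group}: {ratio:.2f}")
```

A ratio near 1 (as for white patients) indicates a share similar to the hospital population, whereas the ratios for black and Hispanic patients are well above 1.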
The PPV of having two codes consistent with HS was 81.8% (95% CI: 77.8, 85.4); three codes, 85.1% (95% CI: 79.2, 89.9); four codes, 94.5% (95% CI: 89.1, 97.8); and five or more codes, 97.3% (95% CI: 95.3, 98.6) (Table 2). Of the 1,168 patients with at least two HS codes, 449 (38.4%) had received at least one of their codes from a dermatologist. The PPV of having a diagnosis of HS after receiving at least one code from a dermatologist was 96.4% (95% CI: 94.3, 98.0) compared to a PPV of 85.3% (95% CI: 82.5, 87.8) when no code was entered by a dermatologist. If the first HS code that a patient received was entered by a dermatologist, the PPV was 95.0% (95% CI: 91.7, 97.2) compared to a PPV of 87.9% (95% CI: 85.5, 89.9) if the first code was entered by a non-dermatologist.
Table 2.
Positive predictive value (PPV) by number of codes, whether a code was entered by a dermatologist at least once, whether the first code was entered by a dermatologist, and search term finding.
| Category | Subgroup | Confirmed HS: Yes | Confirmed HS: No | PPV (95% CI) |
|---|---|---|---|---|
| No. of Codes | 2 | 338 | 75 | 81.8% (77.8, 85.4) |
| | 3 | 160 | 28 | 85.1% (79.2, 89.8) |
| | 4 | 121 | 7 | 94.5% (89.1, 97.8) |
| | ≥5 | 427 | 12 | 97.3% (95.3, 98.6) |
| Coded by Derm at Least Once | Yes | 433 | 16 | 96.4% (94.3, 98.0) |
| | No | 613 | 106 | 85.3% (82.5, 87.8) |
| First Coded by Derm | Yes | 265 | 14 | 95.0% (91.7, 97.2) |
| | No | 781 | 108 | 87.9% (85.5, 89.9) |
| Search Term Finding | Acne Inversa | 2 | 0 | 100% (15.8, 100) |
| | HS | 86 | 0 | 100% (95.8, 100) |
| | Hidradenitis | 817 | 3 | 99.6% (98.9, 99.9) |
| | Hydradenitis | 433 | 2 | 99.5% (98.4, 99.9) |
| | Boil | 125 | 2 | 98.4% (94.4, 99.8) |
| | Draining | 329 | 12 | 96.5% (93.9, 98.2) |
| | Abscess | 589 | 26 | 95.8% (93.9, 97.2) |
| | Fistula | 95 | 6 | 94.1% (87.5, 97.8) |
| | Cyst | 709 | 50 | 93.4% (91.4, 95.1) |
| | Nodule | 351 | 33 | 91.4% (88.1, 94.0) |
We then assessed the PPV of the frequency of HS codes in 1-, 2-, 3-, and 5-year time periods (Figure 1). For a 1-year window, having two codes for HS had a PPV of 82.1% (95% CI: 78.1, 85.6), which increased to 89.2% (95% CI: 83.5, 93.5) for three codes and 97.0% (95% CI: 94.9, 98.4) for four or more codes. For a 2-year period, the PPV of having two, three, or four or more codes was 83.2% (95% CI: 79.3, 86.6), 86.0% (95% CI: 80.2, 90.7), and 96.4% (95% CI: 94.4, 97.8), respectively. When we examined the frequency of codes over 3 years, two HS codes had a PPV of 82.0% (95% CI: 77.9, 85.6), which increased to 86.9% (95% CI: 81.3, 91.4) in the presence of three codes and 96.4% (95% CI: 94.4, 97.8) in the presence of four or more codes. Lastly, over 5 years, the PPV of having two codes remained high at 81.9% (95% CI: 77.8, 85.5) and increased to 86.2% (95% CI: 80.5, 90.8) for three codes and 96.6% (95% CI: 94.7, 97.9) for four or more codes.
Figure 1.
Positive predictive value (PPV) of frequency of codes for hidradenitis suppurativa.
The results of our search in the medical record for specific HS-related terms are summarized in Table 2. The most commonly used terms were “hidradenitis,” “cyst,” “abscess,” and “hydradenitis,” appearing in 70.2%, 65.0%, 52.7%, and 37.2% of the screened medical records, respectively. The PPV of a successful finding for “acne inversa,” which was rare, and “HS” was 100% (95% CI: 15.8, 100) and 100% (95% CI: 95.8, 100), respectively. The PPV for “hidradenitis” was 99.6% (95% CI: 98.9, 99.9); “hydradenitis,” 99.5% (95% CI: 98.4, 99.9); “boil,” 98.4% (95% CI: 94.4, 99.8); “draining,” 96.5% (95% CI: 93.9, 98.2); “abscess,” 95.8% (95% CI: 93.9, 97.2); “fistula,” 94.1% (95% CI: 87.5, 97.8); “cyst,” 93.4% (95% CI: 91.4, 95.1); and “nodule,” 91.4% (95% CI: 88.1, 94.0).
Discussion
In this study, we analyzed the validity of the ICD-9 code consistent with HS in an electronic medical record database and found that using three codes to determine the patient population is likely necessary to ensure reasonable fidelity in this data set. Compared to having a total of two HS codes, the PPV increased by 12.7 percentage points for four codes and by 15.5 percentage points for five or more total codes. For each time window of 1, 2, 3, or 5 years, an increasing number of HS codes resulted in a greater PPV, which was still as high as 81.9% for two HS codes in a 5-year time period. This value can be compared to a PPV of 76% for any two psoriasis codes in a 5-year time frame reported in a previous study by Icen et al.2 Interestingly, the PPV for the HS code is higher than that for psoriasis codes, a finding that may be explained by the fact that fewer physicians are likely to be aware of and use the HS diagnostic code since the disease may not be part of their regular practice or training. Accordingly, Icen et al.2 found that compared to a general psoriasis code, codes specifying the type of psoriasis, whose use may reflect better knowledge of dermatologic conditions, displayed higher PPV (up to 94.0%).
As our results also show, the PPV of the number of diagnostic codes for HS changes only slightly over time (Figure 1). The PPV of having two codes or having four or more codes remained relatively stable at 82% to 83% and 96% to 97%, respectively, over 1- to 5-year time periods. However, the presence of three codes consistent with HS in a 1-year time window was associated with a PPV of 89% and decreased to 86% at 2 years with no change at 5 years. Therefore, these criteria can be applied largely without regard to timing of the codes.
Unsurprisingly, we found that the PPV was significantly greater if at least one HS code was entered by a dermatologist compared to a non-dermatologist (96.4% vs. 85.3%). Similarly, if the first HS code that a patient received was entered by a dermatologist, the PPV was significantly higher compared to a non-dermatologist (95.0% vs. 87.9%). The initial presentation of HS, which manifests as inflammatory nodules, sinus tracts, comedones, and fibrotic scarring primarily involving intertriginous areas including the axillae, groin, mammary and inframammary region, and buttocks,11 may be misdiagnosed as folliculitis, recurrent cysts, or severe acne. Given the relative challenge of identifying HS, the expertise of a dermatologist appears to make an accurate diagnosis more likely. This is important given that accurate diagnosis facilitates prompt treatment aimed at minimizing the risk of progression to disabling, end-stage disease.
One of the goals of our study was to identify free text terms that may help improve the sensitivity and specificity of future searches. As expected, the PPVs of these terms were relatively high since we started with a population enriched with HS patients (i.e., patients who had already received at least two HS codes). Not surprisingly, the most common search term finding in the medical records of patients coded for HS was “hidradenitis,” but we also frequently observed the misspelling “hydradenitis” in the electronic medical record, especially among non-dermatologists. However, both terms had a similarly high PPV, indicating that healthcare providers were still making the correct diagnosis regardless of spelling. In contrast, less specific terms used to describe HS lesions including “nodule,” “cyst,” “fistula,” and “abscess,” all of which can also be used to describe other conditions such as Crohn’s disease and acne, were, as expected, found to have a PPV lower by as much as 8.2 percentage points. Interestingly, the two nonspecific terms that provided the greatest likelihood that the patient had HS were “boil” and “draining,” with a PPV almost as high as that of the word “hidradenitis” itself.
As we begin to explore the epidemiology and long-term outcomes of the HS population, the PPV of diagnostic codes becomes increasingly important. Since we investigated whether patients with diagnostic codes consistent with HS truly had the disease, all patients included in our study had received HS codes; as a result, “true negatives” and “false negatives” were not applicable. Thus, we assessed PPV as the primary outcome rather than sensitivity or specificity. The implications of a low PPV include overestimation of disease incidence and may explain some of the discrepancies in recent studies of the incidence and prevalence of HS. Another consequence of a low PPV is inaccurate estimations of the comorbidities, medical complications, and impairments in quality of life associated with HS. The inclusion of misclassified patients without a true diagnosis of HS in the study analyses may misleadingly dilute the actual risk for the outcome. However, two important considerations must be made when interpreting PPV. Firstly, PPV depends on the prevalence of HS. In our study of patients with at least two diagnostic codes for HS, the prevalence was likely high, and thus, our PPV was also relatively high. Secondly, a higher PPV does not necessarily imply an advantage. For example, a higher PPV may be achieved at the cost of a lower sensitivity, which would be undesirable in a study that assesses prevalence and incidence of HS. Including patients with a greater number of diagnostic codes may increase PPV, but a significant proportion of patients with only one HS code may actually have HS, resulting in an underestimated incidence as a result of a low sensitivity.
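The trade-off described above, in which a stricter code threshold raises PPV but excludes true cases, can be illustrated with the counts in Table 2. The sketch below pools the per-stratum counts into "at least k codes" thresholds and reports, for each threshold, the pooled PPV and the fraction of the 1,046 confirmed cases that would still be captured. These pooled figures are derived here for illustration and are not results reported by the study; the function name is hypothetical. The retained fraction is not a true sensitivity, because patients with only one code or no code were never reviewed.

```python
# (confirmed HS, not HS) by total number of 705.83 codes, from Table 2.
# The key 5 stands for "five or more codes".
counts_by_codes = {2: (338, 75), 3: (160, 28), 4: (121, 7), 5: (427, 12)}

def threshold_summary(counts, min_codes):
    """Pooled PPV and fraction of confirmed cases retained for >= min_codes codes."""
    confirmed = sum(yes for k, (yes, _) in counts.items() if k >= min_codes)
    not_confirmed = sum(no for k, (_, no) in counts.items() if k >= min_codes)
    all_confirmed = sum(yes for yes, _ in counts.values())
    pooled_ppv = confirmed / (confirmed + not_confirmed)
    retained = confirmed / all_confirmed
    return pooled_ppv, retained

for k in sorted(counts_by_codes):
    pooled_ppv, retained = threshold_summary(counts_by_codes, k)
    print(f">= {k} codes: PPV = {pooled_ppv:.1%}, confirmed cases retained = {retained:.1%}")
```

At the two-code threshold the pooled PPV equals the overall 1,046 of 1,168 (89.6%) with all confirmed cases retained, while requiring five or more codes keeps only the 427 confirmed cases in that stratum.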
Several potential limitations should be considered in the interpretation of the results of our study. The electronic database we analyzed may differ from other claims and medical record databases, limiting the generalizability of our findings. However, our results regarding successful search term findings that increase the PPV may prove more generally helpful in informing future searches even within different databases. Our demographic results do not include patients with true HS who may never have received an ICD-9 code for their disease, an observation that may be a major issue for this condition. Strengths of our study include the large sample size, long study period (over three decades), and essentially complete chart review of all 1,168 patients included in our study.
In conclusion, our results highlight the importance of establishing the validity of diagnostic codes in electronic databases and allow for refinements of appropriate ways to design future searches. Both represent a crucial first step for subsequent studies utilizing these databases.
What’s already known
Current knowledge about the epidemiology and comorbidities of hidradenitis suppurativa (HS) patients is limited. Further research will rely on electronic databases.
What does this study add
An increasing number of codes yielded a greater positive predictive value (PPV), which was as high as 81.9% for two codes in a 5-year period. Specific terms to describe HS in the medical record, including “hydradenitis,” “boil,” “draining,” “abscess,” “fistula,” “cyst,” and “nodule,” could be used to improve the PPV.
Acknowledgments
Funding/Support: None.
This work was conducted with support from Harvard Catalyst | The Harvard Clinical and Translational Science Center (National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health Award 1UL1 TR001102-01 and financial contributions from Harvard University and its affiliated academic health care centers).
Footnotes
Conflict of Interest: None declared.
The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic health care centers, or the National Institutes of Health.
References
1. Wilchesky M, Tamblyn RM, Huang A. Validation of diagnostic codes within medical services claims. J Clin Epidemiol. 2004;57:131–41. doi: 10.1016/S0895-4356(03)00246-4.
2. Icen M, Crowson CS, McEvoy MT, et al. Potential misclassification of patients with psoriasis in electronic databases. J Am Acad Dermatol. 2008;59:981–5. doi: 10.1016/j.jaad.2008.08.034.
3. Khwaja HA, Syed H, Cranston DW. Coding errors: a comparative analysis of hospital and prospectively collected departmental data. BJU Int. 2002;89:178–80. doi: 10.1046/j.1464-4096.2001.01428.x.
4. Peabody JW, Jain S, Bertenthal D, et al. Assessing the Accuracy of Administrative Data in Health. Med Care. 2004;42:1066–72. doi: 10.1097/00005650-200411000-00005.
5. Gorelick MH, Knight S, Alessandrini EA, et al. Lack of agreement in pediatric emergency department discharge diagnoses from clinical and administrative data sources. Acad Emerg Med. 2007;14:646–52. doi: 10.1197/j.aem.2007.03.1357.
6. Vazquez BG, Alikhan A, Weaver AL, et al. Incidence of hidradenitis suppurativa and associated factors: a population-based study of Olmsted County, Minnesota. J Invest Dermatol. 2013;133:97–103. doi: 10.1038/jid.2012.255.
7. Cosmatos I, Matcho A, Weinstein R, et al. Analysis of patient claims data to determine the prevalence of hidradenitis suppurativa in the United States. J Am Acad Dermatol. 2013;68:412–9. doi: 10.1016/j.jaad.2012.07.027.
8. Sung S, Kimball AB. Counterpoint: analysis of patient claims data to determine the prevalence of hidradenitis suppurativa in the United States. J Am Acad Dermatol. 2013;69:818–9. doi: 10.1016/j.jaad.2013.06.043.
9. Huerta C, Rivero E, Rodríguez LAG. Incidence and risk factors for psoriasis in the general population. Arch Dermatol. 2007;143:1559–65. doi: 10.1001/archderm.143.12.1559.
10. Alikhan A, Lynch PJ, Eisen DB. Hidradenitis suppurativa: a comprehensive review. J Am Acad Dermatol. 2009;60:539–61. doi: 10.1016/j.jaad.2008.11.911.
11. Slade DE, Powell B, Mortimer P. Hidradenitis suppurativa: pathogenesis and management. Br J Plast Surg. 2003;56:451–61. doi: 10.1016/s0007-1226(03)00177-2.