Validation of an electronic algorithm for Hodgkin and non-Hodgkin Lymphoma in ICD-10-CM

Mara M Epstein; Sarah K Dutcher; Judith C Maro; Cassandra Saphirak; Sandra DeLuccia; Muthalagu Ramanathan; Tejaswini Dhawale; Sonali Harchandani; Christopher Delude; Laura Hou; Autumn Gertz; Nina DiNunzio; Cheryl N McMahill-Walraven; Mano S Selvan; Justin Vigeant; David V Cole; Kira Leishear; Jerry H Gurwitz; Susan Andrade; Noelle M Cocoros

doi:10.1002/pds.5256

Author manuscript; available in PMC: 2022 Jul 1.

Published in final edited form as: Pharmacoepidemiol Drug Saf. 2021 May 5;30(7):910–917. doi: 10.1002/pds.5256

PMCID: PMC8205565 NIHMSID: NIHMS1707792 PMID: 33899311

Abstract

Purpose:

Lymphoma is a health outcome of interest for drug safety studies. Studies using administrative claims data require the accurate identification of lymphoma cases. We developed and validated an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM)-based algorithm to identify lymphoma in healthcare claims data.

Methods:

We developed a three-component algorithm to identify patients aged ≥15 years who were newly diagnosed with Hodgkin (HL) or non-Hodgkin (NHL) lymphoma from January 2016 through July 2018 among members of four Data Partners within the FDA’s Sentinel System. The algorithm identified potential cases as patients with (1) ≥2 ICD-10-CM lymphoma diagnosis codes on different dates within 183 days; (2) ≥1 procedure code for a diagnostic procedure (e.g. biopsy, flow cytometry) within 90 days of the first lymphoma diagnosis code; and (3) ≥1 procedure code for a relevant imaging study within 90 days of the first lymphoma diagnosis code. Cases identified by the algorithm were adjudicated via chart review and a positive predictive value (PPV) was calculated.

Results:

We identified 8,723 potential lymphoma cases via the algorithm and randomly sampled 213 for validation. We retrieved 138 charts (65%) and adjudicated 134 (63%). The overall PPV was 77% (95% Confidence Interval: 69–84%). Most cases also had subtype information available, with 88% of cases identified as NHL and 11% as HL.

Conclusions:

Seventy-seven percent of lymphoma cases identified by an algorithm based on ICD-10-CM diagnosis and procedure codes and applied to claims data were true cases. This novel algorithm represents an efficient, cost-effective way to target an important health outcome of interest for large-scale drug safety and public health surveillance studies.

Keywords: algorithm, lymphoma, validation

Introduction

Lymphomas are cancers that arise from the proliferation of abnormal T- or B-lymphocytes.¹ Non-Hodgkin lymphoma (NHL) is the most common lymphoma subtype and the seventh most commonly diagnosed cancer among both US men and women, with over 74,000 new cases diagnosed annually.² Hodgkin lymphoma (HL) is less common, accounting for over 8,800 new cancer diagnoses annually.^2,3 NHL risk increases with age, with a median age at diagnosis of 67 years, and an overall five-year survival rate of 72% across the more than 60 histologic subtypes.^2,4 In contrast, HL follows a well-documented bimodal age distribution,⁵ with peaks in incidence in adolescence/young adulthood and in adults aged ≥50 years, and a median age at diagnosis of 39 years.² The five-year survival rate for HL is higher than that for NHL, at 86.6%.²

Few prior studies have validated methods to accurately identify lymphoma in administrative health data,⁶ and no published studies have evaluated International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM)-based algorithms. Only one study has identified and validated algorithms to detect lymphoma in claims data using ICD-9-CM codes, with moderate positive predictive value (PPV) estimates ranging from 34.7% to 62.8%.⁷

The objective of this study was to develop and validate an algorithm to identify all lymphoma cases using ICD-10-CM codes present in administrative health data within the US Food and Drug Administration (FDA) Sentinel System.^8,9 The successful development of the first ICD-10-CM-based algorithm to identify lymphoma in administrative health data would assist in the surveillance for and evaluation of the risk of developing lymphoma associated with use of medications of interest.⁹

Methods

Data Source

This project was conducted among four Sentinel Data Partners: three large national insurers (Aetna, HealthCore (Anthem), Humana Healthcare Research, Inc. (HHR)) and one integrated healthcare delivery system (Kaiser Permanente Northwest). The Meyers Primary Care Institute, a partner organization of Sentinel and a joint endeavor of the University of Massachusetts Medical School, Fallon Health, and Reliant Medical Group, served as the lead analytic site; the Sentinel Operations Center at Harvard Pilgrim Health Care Institute served as the coordinating center for the project. The Sentinel System is a national, distributed, electronic health database network developed for medical product safety surveillance.^8–10 Sentinel Initiative activities are considered public health surveillance under the Federal Policy for the Protection of Human Subjects, and thus this project did not require Institutional Review Board (IRB) approval.¹¹

Study Population

Health plan members aged ≥15 years and enrolled for at least 12 months with medical and drug coverage prior to the appearance of a lymphoma diagnosis code were included. The selected age range included the younger peak in HL incidence based on the epidemiology of the disease.^2,5 To capture the time period following the clinical transition to ICD-10 in the US in October 2015, we identified lymphoma diagnoses occurring between January 1, 2016 and July 31, 2018.

Development of an ICD-10-CM-based algorithm

We developed a three-component algorithm incorporating diagnosis and procedure codes extracted from administrative medical claims data to identify lymphoma cases (Figure 1) based on clinical understanding of the condition. The outcome of lymphoma included all histologic subtypes of both NHL and HL.

Figure 1. Design diagram detailing the study period, eligibility criteria, and algorithm components to identify lymphoma in administrative claims data.¹

¹Light gray: eligibility criteria; dark gray: algorithm components

The first component of the algorithm required at least two diagnosis codes for NHL and/or HL on different dates within a 183-day window during the study period. The index date was defined as the date of the first lymphoma diagnosis code for which the patient had been enrolled in the health plan with medical and drug coverage for the prior 365 days and had no lymphoma-specific diagnosis codes in that pre-index period. Diagnosis code lists were inclusive of all lymphoma subtypes of both NHL and HL to decrease the false negative rate. Because the 365-day pre-index period could extend back before the transition to ICD-10-CM, ICD-9-CM lymphoma codes were also identified for the look-back (see Supplement A for the list of diagnosis codes).¹² We selected a window of 183 days, or approximately six months, assuming that the majority of newly diagnosed lymphoma patients would have follow-up care within 3–6 months of their initial diagnosis. Requiring at least two diagnosis codes minimized the inclusion of codes that were part of an initial differential diagnosis and were later ruled out.

For patients meeting the first algorithm component, we then required the presence of at least one procedure code indicating a relevant diagnostic procedure (e.g. biopsy, flow cytometry) within 90 days before or after the first lymphoma diagnosis code (Supplement B).

The third algorithm component required patients to have at least one procedure code indicating a relevant imaging study (e.g. select computed tomography (CT) or magnetic resonance imaging (MRI) procedures) within 90 days before or after the index date (Supplement C) to capture imaging procedures performed as part of the diagnostic process. Patients with all three components were considered to be algorithm-identified lymphoma cases.
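The three components can be illustrated with a minimal Python sketch. This is not the study's SAS/CIDA program. The `Claim` record, the `kind` labels, and the function names are hypothetical, and the sketch assumes each claim has already been mapped to a category using the code lists in Supplements A–C. It also assumes a single continuous enrollment span and interprets the first component as a second lymphoma diagnosis date falling within 183 days after the index date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

LOOKBACK = timedelta(days=365)     # pre-index enrollment and washout period
DX_WINDOW = timedelta(days=183)    # window for the second diagnosis code
PROC_WINDOW = timedelta(days=90)   # procedures within 90 days before or after index


@dataclass(frozen=True)
class Claim:
    service_date: date
    kind: str  # hypothetical labels: "lymphoma_dx", "diagnostic_proc", "imaging_proc"


def dates_of(claims: List[Claim], kind: str) -> List[date]:
    """Sorted distinct service dates for one claim category."""
    return sorted({c.service_date for c in claims if c.kind == kind})


def index_date(claims: List[Claim], enrolled_since: date,
               study_start: date, study_end: date) -> Optional[date]:
    """First in-study lymphoma diagnosis date with 365 days of prior
    enrollment and no lymphoma diagnosis code in those 365 days."""
    dx_dates = dates_of(claims, "lymphoma_dx")
    for d in dx_dates:
        if not (study_start <= d <= study_end):
            continue
        enrolled = enrolled_since <= d - LOOKBACK
        prior_dx = any(d - LOOKBACK <= p < d for p in dx_dates)
        if enrolled and not prior_dx:
            return d
    return None


def is_algorithm_case(claims: List[Claim], enrolled_since: date,
                      study_start: date, study_end: date) -> bool:
    """True when all three algorithm components are met."""
    idx = index_date(claims, enrolled_since, study_start, study_end)
    if idx is None:
        return False
    # Component 1: a second diagnosis code on a different date within 183 days.
    second_dx = any(idx < d <= idx + DX_WINDOW
                    for d in dates_of(claims, "lymphoma_dx"))

    def near_index(kind: str) -> bool:
        return any(abs(d - idx) <= PROC_WINDOW for d in dates_of(claims, kind))

    # Components 2 and 3: diagnostic procedure and imaging study near the index date.
    return second_dx and near_index("diagnostic_proc") and near_index("imaging_proc")
```

In this sketch, a patient with lymphoma diagnosis codes on two dates six weeks apart, a biopsy shortly before the first code, and a CT scan shortly after it would be flagged, provided the enrollment and washout conditions hold.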

Analytic code was created in SAS v9.4 (SAS Institute, Cary, NC) using the Sentinel Cohort Identification and Descriptive Analysis (CIDA) Module version 5.4.3 and additional programming.¹³ Data Partners executed the distributed SAS code against their local Sentinel databases to identify all potential lymphoma cases.

Algorithm Evaluation: Patient Episode Profile Retrieval (PEPR)

The Patient Episode Profile Retrieval (PEPR) distributed SAS program developed for Sentinel generates a profile of a patient’s claims history relevant to an event of interest.¹⁴ The PEPR profile creates a patient-level summary of data over a specified time. PEPR claims profiles were generated from 90 days prior to 183 days after the first lymphoma diagnosis (index date). We requested 213 claims profiles and subsequent charts to achieve the goal of calculating a final PPV with high precision; we assumed a chart retrieval rate of 60–80% based on prior Sentinel validation studies.¹⁵ For a target PPV of 80% (±10%), a sample size of 150 charts would yield a 95% confidence interval of 72.7–86.1%.

PEPR claims profiles were used in two ways. First, clinician adjudicators with hematology-oncology expertise (MR, TD) ranked one primary and one secondary healthcare encounter for each patient to target for medical chart retrieval and review. The clinicians used their hematology-oncology expertise to rank encounters by their likelihood of containing the relevant diagnostic information, with outpatient charts preferred, since lymphoma is more likely to be diagnosed in an outpatient setting. Second, we examined the ability of the clinician adjudicators to classify lymphoma case status for algorithm-identified cases based on these de-identified, patient-level claims profiles, and ultimately compared these results to the chart review results. The adjudicators reviewed the claims profiles and categorized each patient as likely lymphoma, unlikely lymphoma, or unable to determine based on their clinical judgement. Adjudicators were also asked to determine if the case had NHL or HL.

Algorithm Evaluation: Chart Review

Chart Selection.

Charts were retrieved from encounters identified through the PEPR review as the most likely outpatient encounter to contain the required information on the potential lymphoma case. Outpatient charts were preferred and were requested to contain data from 30 days prior to 90 days following the selected healthcare encounter date, with the expectation that most relevant information would be present in the chart within three months after diagnosis. If an inpatient chart was identified, records were requested for 7 days prior to 7 days following the encounter date since dates in claims data may vary slightly from actual dates of service. Each chart was reviewed prior to abstraction to ensure adequate material for adjudication was present.
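The record request windows described above can be written down compactly. The following is an illustrative sketch only; the function and dictionary names are hypothetical and not part of any Sentinel tooling.

```python
from datetime import date, timedelta
from typing import Tuple

# (days before, days after) the selected encounter date, as stated in the text
REQUEST_WINDOWS = {"outpatient": (30, 90), "inpatient": (7, 7)}


def chart_request_window(encounter_date: date, setting: str) -> Tuple[date, date]:
    """Return the (start, end) dates of records to request for one encounter."""
    before, after = REQUEST_WINDOWS[setting]
    return (encounter_date - timedelta(days=before),
            encounter_date + timedelta(days=after))
```

For an outpatient encounter on March 1, 2017, this gives a request window of January 30 through May 30, 2017.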

Chart Abstraction.

Charts were abstracted by two trained abstractors. Ten charts were abstracted by both abstractors and any discrepancies resolved before continuing with single abstraction of the remaining charts. The abstraction and adjudication forms were developed using REDCap software,¹⁶ and can be found in Supplement D.

Chart Adjudication.

Ten charts were adjudicated by two hematologist-oncologists (MR and SH) and any discrepancies resolved before continuing with single adjudication of the remaining charts. Adjudicators determined the status for each case, defined as definite lymphoma, probable lymphoma, possible or less than 50/50, unlikely or no evidence of lymphoma, or other disorder. A definite case was defined as having evidence in the chart (e.g. biopsy or flow cytometry results) consistent with a lymphoma diagnosis, with information present in a biopsy or pathology report. A probable case was defined as having evidence in the chart to suggest a lymphoma diagnosis supported by clinician notes (e.g. description of biopsy results) if no pathology report was present. If some evidence suggested a lymphoma diagnosis, but key information, such as biopsy results or clinician diagnosis, was missing, that case could be defined as possible or less than 50/50. If no evidence to support a lymphoma diagnosis was present, the case was defined as unlikely. If other disorder was selected, a specific non-lymphoma diagnosis was documented. Definite and probable cases together were considered as true positive lymphoma cases and adjudicators were asked to determine whether the case had NHL or HL. All other cases were defined as false positive lymphoma cases.

Evaluation of Algorithm Performance

The PPV and corresponding 95% Confidence Interval (CI) were calculated to evaluate algorithm performance based on the results of the chart review. All cases confirmed by chart review (i.e. those adjudicated as definite or probable lymphoma) were considered true positives. As a sensitivity analysis, we calculated the PPV using only definite lymphoma cases to define true positives and calculated the PPV separately for each Data Partner. To evaluate the ability of the adjudicators to determine case status based on review of PEPR claims profiles, concordance with chart review was assessed and misclassification was evaluated. We also assessed the ability of adjudicators to identify NHL and HL specifically in the PEPR claims profiles by comparing the percentage of each subtype identified in the claims profile review to the percentage identified in chart review. Clinical data and demographic characteristics of cases were summarized using descriptive statistics by final case status and lymphoma subtype.
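As an illustration of the PPV calculation, the sketch below computes the proportion of adjudicated cases confirmed as true positives together with an exact (Clopper-Pearson) binomial interval. The choice of interval method is an assumption: exact intervals are stated only for the secondary analysis, and the method used for the overall PPV is not specified. The function names are hypothetical, and the interval is found by simple bisection using only the standard library.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f: Callable[[float], float], target: float) -> float:
    """Bisection for f(p) = target, where f is decreasing in p on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # f still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def ppv_with_exact_ci(true_pos: int, reviewed: int,
                      alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (PPV, lower, upper) with a Clopper-Pearson interval."""
    ppv = true_pos / reviewed
    # Lower bound: P(X >= true_pos | p) = alpha/2, i.e. CDF(true_pos - 1) = 1 - alpha/2.
    lower = 0.0 if true_pos == 0 else _solve_decreasing(
        lambda p: binom_cdf(true_pos - 1, reviewed, p), 1 - alpha / 2)
    # Upper bound: P(X <= true_pos | p) = alpha/2.
    upper = 1.0 if true_pos == reviewed else _solve_decreasing(
        lambda p: binom_cdf(true_pos, reviewed, p), alpha / 2)
    return ppv, lower, upper
```

Calling `ppv_with_exact_ci(true_pos, reviewed)` with the number of definite or probable cases and the number of adjudicated charts returns the point estimate and interval bounds as proportions; restricting `true_pos` to definite cases corresponds to the sensitivity analysis.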

Secondary Analysis

As a secondary, exploratory analysis, we evaluated the ability of the algorithm to identify NHL and HL cases separately by calculating subtype-specific PPVs among the 134 patients whose charts were reviewed. We did not modify the algorithm. To be defined as an NHL case, both lymphoma diagnosis codes satisfying the first algorithm component were required to indicate NHL; codes occurring later in time were not evaluated. If a patient had only HL codes for the two qualifying diagnoses, they were classified as an HL case. Patients whose qualifying diagnoses included both NHL and HL codes were classified as mixed subtype. PPV was calculated for each subtype-specific algorithm as described above, considering both probable and definite cases as true cases. Due to small numbers, exact 95% confidence intervals were calculated.
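The subtype assignment rule can be sketched as follows. The ICD-10-CM prefixes used here (C81 for HL; C82 through C86 for NHL) are illustrative assumptions based on the general structure of ICD-10-CM and do not reproduce the study's code lists in Supplement A; the function names are hypothetical.

```python
# Illustrative prefixes only; the study's actual code lists are in Supplement A.
HL_PREFIXES = ("C81",)
NHL_PREFIXES = ("C82", "C83", "C84", "C85", "C86")


def code_family(icd10cm_code: str) -> str:
    """Map one lymphoma diagnosis code to 'HL' or 'NHL'."""
    code = icd10cm_code.upper().replace(".", "")
    if code.startswith(HL_PREFIXES):
        return "HL"
    if code.startswith(NHL_PREFIXES):
        return "NHL"
    raise ValueError(f"not in the illustrative lymphoma code lists: {icd10cm_code}")


def classify_subtype(first_code: str, second_code: str) -> str:
    """Classify a case from its two algorithm-qualifying diagnosis codes only;
    later codes are deliberately ignored, as in the text above."""
    families = {code_family(first_code), code_family(second_code)}
    if families == {"NHL"}:
        return "NHL"
    if families == {"HL"}:
        return "HL"
    return "mixed"
```

Under these assumptions, a patient whose two qualifying codes are C83.30 and C85.90 would be classified as NHL, while a patient with C81.90 followed by C85.90 would be classified as mixed subtype.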

Results

The electronic algorithm identified 8,723 potential lymphoma cases. A random sample of 213 individual claims profiles was selected for review, of which 211 passed quality checks. Based on patients’ claims profiles, adjudicators judged 80% (N=169) of patients likely to be lymphoma cases and 10% (N=20) of patients unlikely to have lymphoma, and were unable to determine the status of 10% (N=22) of patients based on the evidence provided in the claims profiles. Of the 169 likely lymphoma cases, 92% (N=155) were classified as NHL, 8% (N=13) as HL, and subtype could not be determined for one case.

Following the identification and ranking of healthcare encounters in the claims profile data, 210 charts were pursued from the primary medical encounter identified by adjudicators. Of the 210 requested charts, 138 (65%) were retrieved and redacted, including 10 inpatient charts. The most common reasons for unobtainable charts included no patient record (N=22) and unable to locate provider (N=20). Four charts did not pass quality checks; as a result, 63% (N=134) of requested charts were abstracted and adjudicated.

Based on medical chart review, the adjudicators classified 46% (N=61) of cases as definite lymphoma and 31% (N=42) as probable lymphoma, for a total of 103 true positive cases (Table 1). The overall PPV for the algorithm was 77% (95% CI: 69–84%) for definite and probable cases.

Table 1.

Selected demographic characteristics of the subset of potential lymphoma cases identified by a claims-based algorithm that underwent medical chart adjudication, by adjudication status and histologic subtype

Characteristic	Overall	True positive: Definite	True positive: Probable	False positive: Possible or less than 50/50	False positive: No evidence of lymphoma	False positive: Other condition	Subtype¹: NHL	Subtype¹: HL	Subtype¹: Not lymphoma
Total N (%)	134	61 (46%)	42 (31%)	8 (6%)	8 (6%)	15 (11%)	91 (68%)	11 (8%)	31 (23%)
Age, years
Mean	62.2	62.4	63.7	70.0	52.6	58.1	65.6	39.8	59.8
Median	65.5	65.0	66.5	67.5	62.0	66.0	67.0	34.0	66.0
Range	17–94	21–85	17–83	49–94	18–79	18–90	17–85	17–72	18–94
Sex, N (%)
Male	69 (52%)	36 (59%)	21 (50%)	4 (50%)	0	8 (53%)	54 (59%)	3 (27%)	12 (39%)
Female	64 (48%)	25 (41%)	20 (48%)	4 (50%)	8 (100%)	7 (47%)	37 (40%)	8 (73%)	19 (61%)
Missing²	1 (<1%)	0	1 (2%)	0	0	0	1 (1%)	0	0

¹The subtype of one true positive case could not be determined from available medical record documentation. NHL: Non-Hodgkin lymphoma; HL: Hodgkin lymphoma.

²Sex could not be determined from available medical record documentation for one case.

Six percent of cases (N=8) were classified as possible or less than 50/50 chance of being lymphoma, 6% (N=8) had no evidence of lymphoma in the retrieved charts, and 11% (N=15) were classified as having other disorders. Of the charts classified as having other, non-lymphoma disorders, the majority of cases were determined to have other malignancies including solid tumors (N=7; e.g. prostate, lung, and breast cancer), and hematologic malignancies (N=5; e.g. leukemia). Overall, 27 of the adjudicated charts were missing reports on either the diagnostic procedure (33%), the imaging procedure (44%), or both (22%). Of those, 56% were adjudicated as definite or probable lymphoma.

The PPVs stratified by Data Partner ranged from 68% (95% CI: 50–82%) to 100% (95% CI: 79–100%) when considering both definite and probable lymphoma cases as true positives. In the sensitivity analysis where only definite lymphoma cases were considered as true positives, the overall PPV was 46% (95% CI: 37–54%), with Data Partner-specific estimates ranging from 32% (95% CI: 18–50%) to 81% (95% CI: 54–96%).

In the subset of adjudicated cases, 52% of cases were male and 48% were female (Table 1); true positive cases (definite or probable lymphoma) were also more likely to be male (55%). The overall mean age at diagnosis/index date of true positive cases was 63.0 years (range: 17.0–85.0), slightly older than the mean age for false positive cases of 59.8 years (range: 18.0–94.0). Median age was similar across categories of adjudicated case status (range: 62.0–67.5 years). By histologic subtype, the median age at diagnosis among 91 NHL cases was 67.0 years (mean: 65.6; range: 17.0–85.0), and 34.0 years for the 11 HL cases (mean: 39.8; range: 17.0–72.0). NHL cases were more likely to be male (59%), while HL cases were more likely to be female (73%). Histologic subtype for one true positive case could not be determined.

Comparing the chart and patient claims profile adjudications for possible misclassification, adjudicators correctly categorized 87% of lymphoma cases (92/106 cases identified by claims profile adjudication). Fourteen patients (13%) categorized as likely cases based on summary claims data were not confirmed by chart review. Of the 11 patients whom the adjudicators categorized as unlikely lymphoma based on the summary claims data, two (18%) were determined to have lymphoma following chart review.

To evaluate the ability of the algorithm to identify sufficient information to specify lymphoma subtype (NHL or HL), we evaluated subtypes in the medical charts and claims profiles adjudicated to be lymphoma cases. Of the 103 cases classified as definite or probable lymphoma following chart review, 88% (N=91) were NHL, 11% (N=11) were HL, and one subtype could not be determined. Similarly, of the 169 likely lymphoma cases determined by patient claims profiles, 92% (N=155) were NHL, 8% (N=13) were HL, and one subtype could not be determined.

We evaluated the ability of the broad lymphoma algorithm to identify NHL and HL cases separately as a secondary analysis. Of the 134 patients with medical charts abstracted for this study, 123 cases were classified as NHL, and 8 were classified as HL based on their two algorithm-qualifying diagnosis codes. Three patients had a mixture of NHL and HL qualifying diagnosis codes and were not included in the calculations. Following chart review, 91 NHL cases and 7 HL cases were confirmed as definite or probable lymphoma. The resulting PPV for the NHL-only algorithm was 74% (95% CI: 65–81%). Although numbers were very small, the PPV for the HL-only algorithm was 88% (95% CI: 47–100%).

Discussion

Our study is the first to utilize ICD-10-CM codes to identify lymphoma in administrative health data, and incorporates an extensive, oncologist-vetted list of diagnosis and procedure codes relevant to case identification. Our algorithm was applied to the general patient population aged ≥15 years from four Sentinel Data Partners across the US, with the goal of constructing a tool that can be used in future studies with all lymphoma, regardless of histologic subtype, as a health outcome of interest, and is generalizable to an insured, adult population.

Our results suggest that lymphoma can be reliably identified in the Sentinel System, and by extension other administrative claims data sources, using our ICD-10-CM-based algorithm validated by chart review with a PPV of 77% (95% CI: 69–84%). Review of the patient claims profiles also identified the majority (87%) of true lymphoma cases when compared with the gold standard of chart review. Our study did not calculate the sensitivity or specificity of the algorithm, as extensive chart review from the large source population would have been required. Linkage to cancer registry data, considered the gold standard of cancer case ascertainment, was not possible with this geographically diverse population. Given the intent of the study, the PPV is the appropriate measure of validity. Further, this approach is common given the nature of the underlying data source.^17,18

Our study incorporated a combined lymphoma outcome consisting of both NHL and HL endpoints, and we observed that subtype-specific data could be consistently extracted and confirmed by review of both summary claims profiles and medical charts. The observed distribution of NHL and HL (approximately 90% and 10%, respectively) among cases identified by the algorithm, as well as the median ages at diagnosis for both subtypes were in line with the known epidemiology of lymphoma.² However, we were unable to further distinguish the heterogeneous histologic subtypes of lymphoma, as this would require the development of many subtype-specific algorithms, and was beyond the scope of the present study. As a secondary analysis, we examined whether the algorithm could detect NHL and HL subtypes separately, defining case subtype based on the two algorithm-qualifying diagnosis codes used to satisfy the first component of the algorithm. Both the NHL- and HL-only PPVs were promising, but should be interpreted with caution as the algorithm was not designed to explicitly capture these subtypes.

Our results build upon prior Sentinel work to ascertain validated methods for identifying lymphoma in administrative health data,⁶ and update the existing limited literature on this topic. A systematic review of studies that constructed algorithms to identify lymphoma from administrative health data identified 10 studies that reported algorithms, but only one study reported validation results.^6,7 The validated lymphoma algorithms were developed among adults aged ≥65 years enrolled in Medicare from 1997–2000. The algorithm requiring two ICD-9 diagnosis codes within two months achieved the highest PPV (62.8%).⁷ A Canadian study incorporated ICD-10-CA codes to identify lymphoma among patients hospitalized with inflammatory bowel disease, but was not validated outside this population.¹⁹

Medical records were sought from a single provider per patient based on clinician review of the claims profile data. Since the type of provider per medical encounter is not currently captured in the Sentinel data, clinician adjudicators were asked to select primary and secondary encounters of interest. It is possible that due to variability of retrieved charts, essential information on lymphoma may not have been present, despite a lymphoma diagnosis occurring concurrently in a different clinical setting. As a result, such cases may have been incorrectly adjudicated as negative cases. Chart variability likely also contributed to the low PPV of the algorithm when only definite cases were considered, as original documentation of additional information may be in charts from other providers.

Although the number of inpatient charts reviewed was small (N=10), we requested corresponding records for a limited time period (±7 days around the index date) to limit the amount of material to be reviewed, introducing the possibility that lymphoma-specific data were not captured. Similar limitations existed for outpatient charts, with data requested from 30 days prior to 90 days after index date, a time frame that was determined based on early sensitivity analyses to most likely include relevant diagnostic results. A review of more comprehensive medical records could provide additional information to improve case identification, and subsequently, the algorithm PPV. However, the financial and time costs of reviewing typically large inpatient and outpatient charts must also be considered. The overall retrieval rate of medical charts was 65%, within range of historical retrieval percentages for Sentinel chart review.^17,18,20

Upon chart review, 11% of cases originally identified as lymphoma by the algorithm were adjudicated as other, non-lymphoma disorders, including solid tumors and other hematologic malignancies. We may have identified non-lymphoma diagnoses due to the decision to create inclusive lists of diagnosis and procedure codes to minimize the false negative rate, which results in a possible increase in false positives. However, we included both the requirements for a diagnostic procedure and an imaging study since almost all lymphoma cases would be expected to undergo both procedures as part of the diagnostic process. We did not include lymphoma treatment in the algorithm, as patients diagnosed with more indolent lymphoma subtypes may delay primary treatment for several years after diagnosis.

We chose to include patients ≥15 years old in the study because lymphoma is rarely diagnosed in children aged <15 years,^2,5 and we would therefore have had an insufficient number of cases to validate the algorithm in this age group. Restricting the algorithm to a higher risk group, such as patients with certain autoimmune conditions or organ transplant recipients,^21–24 may have increased the PPV by increasing the number of true lymphoma cases; however, it would have decreased the generalizability of the algorithm. The primary goal of the study was to identify all lymphoma cases for the purpose of drug safety studies, and chart review prioritized the combined lymphoma endpoint. It should also be noted that all patients included in this study had health insurance, and thus the results may not be generalizable to populations without insurance.

In conclusion, the overall PPV of an ICD-10-CM-based algorithm designed to identify lymphoma cases in administrative claims data suggests that the algorithm can be used in future studies. Despite the etiological and clinical heterogeneity of the various histologic subtypes of lymphoma, 77% of the lymphoma cases detected by the algorithm were correctly identified. The algorithm constructed in this study improves upon prior lymphoma algorithms and may help facilitate future studies of drug safety including this common and heterogeneous cancer as a potential endpoint using large administrative health datasets.

Supplementary Material

NIHMS1707792-supplement-sm1.docx (79.9 KB)
NIHMS1707792-supplement-sm2.docx (67.1 KB)
NIHMS1707792-supplement-sm3.docx (40.4 KB)
NIHMS1707792-supplement-sm4.docx (193 KB)

Key points:

We developed and validated the first published ICD-10-CM-based algorithm to identify lymphoma in administrative health data.
An algorithm based on ICD-10-CM diagnosis codes and procedure codes and applied to administrative claims data was able to identify lymphoma cases with a positive predictive value of 77% (95% CI: 69–84%).
An exploratory secondary analysis evaluated the ability of the broad algorithm to identify non-Hodgkin (PPV = 74% [95% CI: 65–81%]) and Hodgkin (PPV = 88% [95% CI: 47–100%]) lymphoma subtypes separately with similar accuracy despite small numbers.
Based on data from the initial review of claims profiles, adjudicators correctly classified 87% of potential lymphoma cases when compared to the chart review results.
This validated algorithm to identify patients newly diagnosed with lymphoma in administrative health claims data can be used in future studies of drug safety and surveillance.

Acknowledgements

The authors would like to acknowledge the contributions of Yunping Zhou (Humana Healthcare Research, Inc. (HHR)), Jennifer Kuntz (Kaiser Permanente Northwest), Kevin Haynes, Lauren Parlett, and Shia Kent (HealthCore (Anthem)), and Michael Nguyen (FDA), and thank them for their assistance with this project.

This work was supported by funding from the U.S. Food and Drug Administration under the following contract: FDA HHSF223201400030I. MME is supported in part by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant KL2TR001454.

Conflict of interest statement:

Dr. McMahill-Walraven is employed by Aetna, a CVS Health company. Aetna receives funding for public health and distributed research projects as a subcontractor from Harvard Pilgrim Health Care Institute for FDA Sentinel, the Academy of Managed Care Pharmacy’s (AMCP) Biologics and Biosimilars Collective Intelligence Collaborative (BBCIC), Pfizer, and GSK; and as a contractor from the Patient-Centered Outcomes Research Institute (PCORI), the Reagan-Udall Foundation’s Innovation in Medical Evidence Development and Surveillance (IMEDS), and Pfizer. No other authors have conflicts of interest to declare.

Footnotes

Prior presentations: An abstract was presented as a poster at the 2020 Society for Epidemiologic Research Annual Meeting (virtual).

Publisher's Disclaimer: The views expressed in this manuscript are those of the authors and are not to be construed as conveying either an official endorsement or criticism by U.S. Department of Health and Human Services, U.S. Food and Drug Administration.

References

1. Kaushansky K. Williams Hematology. Ninth edition. New York: McGraw-Hill; 2016.
2. Howlader N, Noone AM, Krapcho M, Miller D, Brest A, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds). SEER Cancer Statistics Review, 1975–2016, National Cancer Institute. Bethesda, MD, https://seer.cancer.gov/csr/1975_2016/, based on November 2018 SEER data submission, posted to the SEER web site, April 2019.
3. American Cancer Society. Cancer Facts & Figures 2021. Atlanta: American Cancer Society; 2021.
4. Swerdlow SH, Campo E, Harris NL, Jaffe ES, Pileri SA, Stein H, Thiele J, Vardiman JW, editors. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. 4th edition. Lyon: International Agency for Research on Cancer; 2008.
5. Clarke CA, Glaser SL, Prehn AW. Age-specific survival after Hodgkin’s disease in a population-based cohort (United States). Cancer Causes Control. 2001;12(9):803–812.
6. Herman RA, Gilchrist B, Link BK, Carnahan R. A systematic review of validated methods for identifying lymphoma using administrative data. Pharmacoepidemiol Drug Saf. 2012;21 Suppl 1:203–212.
7. Setoguchi S, Solomon DH, Glynn RJ, Cook EF, Levin R, Schneeweiss S. Agreement of diagnosis and its date for hematologic malignancies and solid tumors between Medicare claims and cancer registry data. Cancer Causes Control. 2007;18(5):561–569.
8. Behrman RE, Benner JS, Brown JS, McClellan M, Woodcock J, Platt R. Developing the Sentinel System--a national resource for evidence development. N Engl J Med. 2011;364(6):498–499.
9. Platt R, Brown JS, Robb M, et al. The FDA Sentinel Initiative - An Evolving National Resource. N Engl J Med. 2018;379(22):2091–2093.
10. Ball R, Robb M, Anderson SA, Dal Pan G. The FDA’s sentinel initiative--A comprehensive approach to medical product surveillance. Clin Pharmacol Ther. 2016;99(3):265–268.
11. Rosati K, Jorgensen N, Soliz M, Evans B (2018) HIPAA and Common Rule Compliance in the Sentinel Initiative, white paper. https://www.sentinelinitiative.org/sites/default/files/communications/publications-presentations/HIPPA-Common-Rule-Compliance-in-Sentinel-Initiative.pdf [serial online]. Accessed 25 September 2019.
12. Fung KW, Richesson R, Smerek M, et al. Preparing for the ICD-10-CM Transition: Automated Methods for Translating ICD Codes in Clinical Phenotype Definitions. EGEMS (Wash DC). 2016;4(1):1211.
13. Sentinel Operations Center. Sentinel Modular Programs Querying Tools: Overview of Functionality and Technical Documentation. Version 7.0.0. November 1, 2018. Available from: https://www.sentinelinitiative.org/sites/default/files/surveillance-tools/routine-querying/Sentinel-Routine_Querying_System-Documentation_7.0.0.pdf. Accessed September 30, 2020.
14. Cole DV, Kulldorff M, Baker M, et al. Infrastructure for Evaluation of Statistical Alerts Arising from Vaccine Safety Data Mining Activities in Mini-Sentinel. Silver Spring, MD: Sentinel Initiative, US Food and Drug Administration; July 2016. Available from: https://www.sentinelinitiative.org/sites/default/files/Methods/Mini-Sentinel_PRISM_Data-Mining-Infrastructure_Report_0.pdf. Accessed March 1, 2018.
15. Cutrona SL, Toh S, Iyer A, et al. Design for validation of acute myocardial infarction cases in Mini-Sentinel. Pharmacoepidemiol Drug Saf. 2012;21 Suppl 1:274–281.
16. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–381.
17. Cutrona SL, Toh S, Iyer A, et al. Validation of acute myocardial infarction in the Food and Drug Administration’s Mini-Sentinel program. Pharmacoepidemiol Drug Saf. 2013;22(1):40–54.
18. Walsh KE, Cutrona SL, Foy S, et al. Validation of anaphylaxis in the Food and Drug Administration’s Mini-Sentinel. Pharmacoepidemiol Drug Saf. 2013;22(11):1205–1213.
19. Bernstein CN, Nabalamba A. Hospitalization-based major comorbidity of inflammatory bowel disease in Canada. Can J Gastroenterol. 2007;21(8):507–511.
20. Ammann EM, Cuker A, Carnahan RM, et al. Chart validation of inpatient International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) administrative diagnosis codes for venous thromboembolism (VTE) among intravenous immune globulin (IGIV) users in the Sentinel Distributed Database. Medicine (Baltimore). 2018;97(8):e9960.
21. Silverberg MJ, Chao C, Leyden WA, et al. HIV infection, immunodeficiency, viral replication, and the risk of cancer. Cancer Epidemiol Biomarkers Prev. 2011;20(12):2551–2559.
22. Opelz G, Henderson R. Incidence of non-Hodgkin lymphoma in kidney and heart transplant recipients. Lancet. 1993;342(8886–8887):1514–1516.
23. Ekstrom Smedby K, Vajdic CM, Falster M, et al. Autoimmune disorders and risk of non-Hodgkin lymphoma subtypes: a pooled analysis within the InterLymph Consortium. Blood. 2008;111(8):4029–4038.
24. Linet MS, Vajdic CM, Morton LM, et al. Medical history, lifestyle, family history, and occupational risk factors for follicular lymphoma: the InterLymph Non-Hodgkin Lymphoma Subtypes Project. J Natl Cancer Inst Monogr. 2014;2014(48):26–40.

Supplementary Materials

sm3: NIHMS1707792-supplement-sm3.docx (40.4 KB, docx)

sm1: NIHMS1707792-supplement-sm1.docx (79.9 KB, docx)

sm2: NIHMS1707792-supplement-sm2.docx (67.1 KB, docx)

sm4: NIHMS1707792-supplement-sm4.docx (193 KB, docx)
