Abstract
Background
The aim of this retrospective study was to demonstrate that examining immune-related adverse events (irAEs), specifically gastrointestinal and pulmonary irAEs, through International Classification of Disease (ICD) data leads to underrepresentation of true irAEs and overrepresentation of false irAEs, and therefore that ICD claims data are a poor approach to electronic health record (EHR) data mining for irAEs in immunotherapy clinical research.
Methods
This retrospective analysis was conducted in 1,063 cancer patients who received immune checkpoint inhibitors (ICIs) between 2011 and 2017. We identified irAEs by manual review of medical records to determine the incidence of each of our endpoints, namely colitis, hepatitis, pneumonitis, other irAE, or no irAE. We then performed a secondary analysis utilizing ICD claims data alone, using a broad range of symptom and disease-specific ICD codes representative of irAEs.
Results
On manual review, 16% (n = 174/1,063) of the total study population was initially found to have pneumonitis (3%, n = 37), colitis (8%, n = 81) or hepatitis (5%, n = 56). Of these patients, 46% (n = 80/174) did not have ICD code evidence in the EHR reflecting their irAE. Of the patients not found to have any irAE during manual review, 61% (n = 459/748) had ICD codes suggestive of a possible irAE.
Discussion
Examining gastrointestinal and pulmonary irAEs through the International Classification of Disease (ICD) data leads to underrepresentation of true irAEs and overrepresentation of false irAEs.
Supplementary Information
The online version of this article contains supplementary material available at 10.1007/s00262-021-02880-0
Keywords: Immune-related adverse events (irAE), Immune checkpoint inhibitors (ICI), Immunotherapy, ICD codes, Adverse drug events (ADEs), Data-mining
Background
Immunotherapy
Immune checkpoint inhibitors (ICIs) have revolutionized cancer therapy and are clinically associated with durable response in a wide range of cancers. This novel class of agents enhances adaptive immune responses to cancer through inhibition of major T-lymphocyte inhibitory pathways that otherwise block immune escape mechanisms [1]. They primarily work by inhibiting CTLA-4, PD-1 or PD-L1, thereby promoting anti-tumor immune activation.
The US Food and Drug Administration (FDA) approved a total of six immune checkpoint inhibitors between 2011 and 2017 that clinicians may now choose from for the treatment of specific tumors [2]. However, unique autoimmune-like syndromes, termed immune-related adverse events (irAEs), are increasingly being recognized in patients treated with ICIs. Clear risk factors for irAEs have not been identified, nor are there reliable markers for serious or even fatal toxicities [3, 4].
More than two-thirds of patients who receive anti-CTLA-4 therapy develop an irAE [5]. IrAEs may involve any organ system; however, the most common toxicities involve the skin, endocrine, pulmonary, and gastrointestinal systems [1, 2, 6, 7]. We chose to focus on colitis, hepatitis and pneumonitis for several reasons. Severe ICI-related gastroenterocolitis, hepatitis or pneumonitis requiring hospitalization often necessitates acute intervention and forces oncologists to make difficult choices regarding continuation of current treatment and rechallenging with ICIs [8]. As will be outlined below, these irAEs are relatively common, and clinical vigilance is necessary for any patient treated with immunotherapy. In a study of 273 irAEs considered by an immunotherapy toxicity board overseeing cases in France, 4% (n = 11) of irAEs were found to be fatal events [9]. Depending on severity, treatment can include steroids in addition to consideration of withholding or discontinuing immunotherapy [5, 9, 10]. Although many other irAEs may occur, such as dermatologic toxicities or endocrinopathies, these do not usually require treatment discontinuation, nor do they commonly lead to severe or fatal events, and are therefore not as clinically significant [9]. Efforts to identify known or novel irAEs and reduce these events are an oncologic priority. At present, irAE monitoring relies on a combination of pre-market clinical trials, spontaneous reporting systems, biomedical literature reviews, electronic health records claims data, manual review of clinical notes, and text mining.
irAEs: gastrointestinal and pulmonary
Gastrointestinal (GI) tract toxicities are commonly recorded serious irAEs. Additionally, ICI-related GI toxicities are the most common reason for immunotherapy treatment discontinuation [11]. Regimens containing CTLA-4 agents are more likely to cause GI toxicities than those containing PD-1 or PD-L1 agents [11]. More than one-third of patients receiving anti-CTLA-4 therapy experience irAEs of the gastrointestinal tract, such as aphthous ulcers, esophagitis, gastritis, and enterocolitis, usually presenting as diarrhea or abdominal pain [5].
Depending on the targeted pathway, 20–50% of patients receiving ICI monotherapy develop some form of gastroenterocolitis and 2–10% develop severe disease. Combination ICI therapy results in 46–51% of the exposed patients developing ICI-related gastroenterocolitis with 8–18% developing a severe form [1]. Discontinuation due to the severity of irAEs occurs in approximately 10% of patients [12].
Although less frequent than ICI-colitis, ICI-associated hepatotoxicity is also not uncommon. Hepatitis is marked by hepatocellular injury, defined as elevations in serum alanine and aspartate aminotransferases (ALT and AST). In clinical trials of these agents, hepatitis was found to occur in 5–10% of patients treated with a single ICI, of whom 1–2% experienced grade 3–4 hepatitis [13]. However, grade 2–4 ALT elevations were observed in approximately 20% of patients receiving the combination of ipilimumab and nivolumab, consistent with the finding that combination CTLA-4 and PD-1/PD-L1 blockade is associated with more frequent and more severe immune-related toxicities [13]. ICI-associated hepatitis is most often asymptomatic; thus, liver panels are often measured at baseline and before each treatment cycle. ICI-associated hepatitis typically occurs 6–14 weeks after treatment initiation.
A third immune adverse event associated with immunotherapy is pneumonitis which manifests with pulmonary symptoms such as coughing and shortness of breath. Checkpoint inhibitor pneumonitis is rare with an incidence of < 5% in clinical trials utilizing monotherapy and > 5% in combined therapies [14]. Although the incidence is uncommon, it is one of the few irAEs associated with drug-related deaths, necessitating heightened vigilance [14]. Pneumonitis is the most common irAE responsible for the discontinuation of immunotherapy [10]. Pneumonitis is more commonly seen with anti-PD-1 compared to anti-CTLA-4 inhibitors, which is in contrast to the majority of irAEs.
Paucity of clinical data
Efforts to optimize immunotherapy use and reduce adverse drug events are an oncologic priority. Currently, irAE monitoring relies on the analysis of clinical trials, spontaneous reports, review of the biomedical literature, and electronic health records primarily in the form of structured data. Adverse drug events (ADEs) should ideally be captured in randomized controlled trials before a drug ever enters the market [15]. Nonetheless, ADEs that manifest in clinical practice may vary dramatically from those identified in pre-market clinical trials [16]. Pre-market clinical trials may underreport the impact of adverse events due to lack of generalizability, limited timeframes and smaller sample sizes. Post-market awareness may rely on spontaneous reporting systems (SRS), which rely on case reports of adverse drug events being voluntarily submitted by health professionals and pharmaceutical companies to the national pharmacovigilance center [17]. However, adverse events obtained in this manner are typically underreported by physicians and medical personnel [18].
Electronic health records (EHRs) contain a plethora of health data that are generally inexpensive, readily accessible and collected without interfering in the delivery of care. These data sources are more likely to reflect the outcomes experienced by patients in the real-world clinical practice setting. EHRs increasingly use the International Classification of Diseases (ICD) revision 10 system to classify diagnostic health services utilization data and enable coders to document adverse drug events. EHRs contain a longitudinal record of clinical data from routine clinical care and have been widely used for adverse drug event surveillance and detection. While some information is structured, a significant portion of the EHR remains in narrative formats. Much of the information that is critical to uncovering irAE occurrences is only available in narrative text. In comparison, coded diagnoses and claims data have relatively low sensitivity for detecting ADEs, weaker coverage of symptomatology and are vulnerable to inaccuracy as they are oriented toward billing purposes.
Previous literature outlining limitations of ICD codes in ADE detection
In a study at the University of Michigan, ICD codes for all hospital admissions during a 6-month window were screened using a software tool called Data Direct to identify potential cases of drug-induced liver injury (DILI). A total of 489 potential cases were identified using the software; however, only 32 were true DILI after manual review, and of these, only 12 were coded using the code K71.9 for DILI, resulting in a PPV of 2.5% [19]. When the initial ICD code list was later expanded, the PPV increased to 6.5%, suggesting that not all cases are coded with the appropriate ICD code. Even so, it was apparent that the search was overly sensitive in its efforts to identify DILI.
A prospective observational cohort study performed at the University of British Columbia Emergency Departments aimed to compare ADEs diagnosed and recorded at point-of-care with ADEs reflected in the administrative data. Among 1574 visits, 221 were identified as adverse drug events; however, only 15 adverse drug events were documented with ICD-10 codes, indicating a sensitivity of 6.8% [20]. When the ICD codes were broadened, sensitivity rose to 28.1% compared to point-of-care diagnosis, still suggesting a significant deficit in ICD reporting. A strength of this study was the utilization of a literature review to identify adverse drug event codes in the ICD-10 coding system, thereby broadening the sensitivity of their latter search.
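The percentages quoted from these two studies can be reproduced from the reported counts. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative, and the assumption that the cited PPVs were computed as 12/489 and 32/489 is inferred from the numbers rather than stated in the cited studies.

```python
def ppv(true_cases_flagged: int, total_flagged: int) -> float:
    """Positive predictive value: share of flagged cases that are true cases."""
    return true_cases_flagged / total_flagged


def sensitivity(true_cases_flagged: int, total_true_cases: int) -> float:
    """Sensitivity: share of true cases that the codes captured."""
    return true_cases_flagged / total_true_cases


# University of Michigan DILI screen [19]: 489 flagged admissions,
# 32 true DILI on manual review, 12 of them coded as K71.9.
ppv_k719 = ppv(12, 489)
ppv_expanded = ppv(32, 489)

# University of British Columbia cohort [20]: 221 ADEs diagnosed at
# point-of-care, 15 of them documented with ICD-10 codes.
sens_ubc = sensitivity(15, 221)

print(f"PPV (K71.9 only):    {ppv_k719:.1%}")
print(f"PPV (expanded list): {ppv_expanded:.1%}")
print(f"Sensitivity (UBC):   {sens_ubc:.1%}")
```

Both studies point in the same direction: a code list broad enough to catch most events flags many non-events, while the codes actually recorded capture only a small share of clinically diagnosed events.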
Methods
Objective
We present a retrospective analysis demonstrating that irAEs, specifically gastrointestinal and pulmonary, examined through International Classification of Disease (ICD) data lead to underrepresentation of true irAEs and overrepresentation of false irAEs, thereby concluding that ICD claims data are a poor approach to EHR data mining for irAEs in immunotherapy clinical research.
Retrospective descriptive study design on irAE detection
We conducted a retrospective, descriptive, single-center study after obtaining approval from the Institutional Review Board at the Ohio State University (IRB #2016C0070). We investigated adult cancer patients who had received at least one dose of ICI at Ohio State University between January 1, 2011, and June 1, 2017. Adult cancer patients who received PD-L1 inhibitors, PD-1 inhibitors, or CTLA-4 inhibitors, as single agents or as multiple-agent therapy for a malignancy under a clinical trial or as part of standard practice were included. Patients in any line of therapy and any cancer diagnosis were included. The study included both men and women and all racial and ethnic subgroups. Patients were not excluded from this study on the basis of a history of known HIV or HCV positive status.
Chart review process
We extracted patient data from institutional electronic medical charts. Once we identified our study cohort, we collected patient demographic data including date of birth, gender, date of death (if applicable), past medical history including concomitant medications, information about the tumor (date of diagnosis, pathologic features, previous treatments, etc.), information about treatment used (type, date initiated, side effects, etc.), and response to treatment (clinical, tumor markers and radiologic).
We identified irAEs by manual review of all patients who received immunotherapy. For manual chart review, a pharmacy database of patients confirmed to have received at least one dose of ICI was utilized. Manual chart review of all patients during immunotherapy was conducted through assessment of the clinical notes. irAEs were defined by the treating oncologist or multi-disciplinary team based on the following criteria, as per prior studies (PMID: 33119034; PMID: 32140762): (1) timing of irAE occurring after initiation of ICI; (2) pathologic diagnosis, when available; (3) exclusion of alternate etiologies; or (4) clinical improvement with appropriate irAE-directed therapies. Review included history and physicals, progress notes, telephone encounters, laboratory results, radiology studies and pathology. All identification of irAEs relied on documentation from the treating team responsible for the care of the patient, including inpatient, outpatient and emergency department encounters. These data were stored in a password-protected institutional REDCap database [21]. Through our chart review, we determined the incidence of each of our endpoints, specifically colitis, hepatitis, pneumonitis, other irAE or no irAE.
ICD code-based symptoms
After we completed manual identification of all patients with irAEs and their subtypes, we then performed a secondary analysis utilizing ICD claims data alone without the use of manual review or clinical notes. The exhaustive list of ICD codes utilized for screening the three endpoints of interest (colitis, hepatitis, pneumonitis) can be seen in Table of ESM 1. The ICD code list was tailored to be a broad representation of all potential manifestations of the irAEs. Their selection was implemented through the guidance of experienced oncologists in our medical center with significant involvement in the prescribing, monitoring and research of immunotherapy and their adverse events.
A secondary analysis using a narrower range of ICD codes more specific to immunotherapy-related adverse events was utilized as well (Table of ESM 2). This narrow range of ICD codes was compiled by a group of treating oncologists in a subjective manner based on their personal experience with ICD code documentation of immunotherapy.
As mentioned earlier, patients were categorized into the following endpoints: colitis, hepatitis, pneumonitis, other irAE or no irAE. All patients with positive ICD codes were then subcategorized into three categories: (1) irAE ICD codes documented before immunotherapy date, (2) between 0 and 89 days after immunotherapy date, or (3) 90 + days after immunotherapy date (Table of ESM 3). Although irAEs can occur at any time during treatment, most occur during the first 3 months of therapy [1, 22]; therefore, we evaluated ICD codes within the first 90 days of treatment and beyond 90 days of ICI.
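The timing categories described above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's actual pipeline: the code set is a small placeholder (the real lists are in the ESM tables), the function names are hypothetical, and the choice to classify each patient by the earliest matching code is an assumption, since the text does not say how patients with several matching codes were handled.

```python
from datetime import date, timedelta

# Placeholder screening codes for illustration only; the study's actual
# broad and narrow lists are given in the ESM tables, not reproduced here.
SCREENING_CODES = {"K71.9", "K52.1", "J70.4"}


def classify_code_date(ici_start: date, code_date: date) -> str:
    """Assign one ICD code date to a timing category relative to ICI start."""
    days_after = (code_date - ici_start).days
    if days_after < 0:
        return "before immunotherapy"
    if days_after <= 89:
        return "0-89 days after immunotherapy"
    return "90+ days after immunotherapy"


def classify_patient(ici_start: date, claims: list[tuple[str, date]]) -> str:
    """Classify a patient by the earliest screening code (an assumption)."""
    matching_dates = [d for code, d in claims if code in SCREENING_CODES]
    if not matching_dates:
        return "no ICD code found"
    return classify_code_date(ici_start, min(matching_dates))


start = date(2016, 1, 1)
example_claims = [
    ("C34.90", start + timedelta(days=10)),  # not a screening code
    ("K71.9", start + timedelta(days=89)),   # last day of the first window
]
print(classify_patient(start, example_claims))
```

Using the earliest matching code is the more conservative choice for this analysis, because a code that predates immunotherapy cannot reflect an irAE.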
Results
Patient characteristics
A total of 1063 patients were included in the analysis. Demographics and patient characteristics are shown in Table 1.
Table 1.
Patient characteristics
| Characteristic | n | (%) |
|---|---|---|
| Age (median) | 61 | |
| Male | 626 | 59 |
| Female | 435 | 41 |
| Malignancy | | |
| Non-small cell lung cancer | 199 | 19 |
| Melanoma | 340 | 32 |
| Renal cell carcinoma | 118 | 11 |
| Head and neck carcinoma | 67 | 6 |
| Bladder cancer | 43 | 4 |
| Sarcoma | 32 | 3 |
| Other | 262 | 25 |
| Treatment | | |
| Anti-PD-1 monotherapy (nivolumab, pembrolizumab) | 702 | 66 |
| Anti-PD-L1 monotherapy (atezolizumab, durvalumab) | 34 | 3 |
| Anti-CTLA-4 monotherapy (ipilimumab, tremelimumab) | 195 | 18 |
| Combination PD-1 and CTLA-4 | 77 | 7 |
| Other | 53 | 5 |
Initial findings using manual review
On initial manual review, 16% (n = 174/1,063) of the total study population was found to have either pneumonitis, colitis or hepatitis (Table of ESM 4). Specifically, 3% (n = 37) of patients had pneumonitis, 8% (n = 81) had colitis, and 5% (n = 56) had hepatitis. An additional 15% (n = 159) were found to have irAEs other than hepatitis, colitis or pneumonitis. The remaining 70% (n = 748) were not identified to have any irAEs in the initial review.
Subsequent findings using ICD9/10 codes
Of the patients found on manual review to have pneumonitis, hepatitis, or colitis, 28% (n = 49/174) did not have ICD code evidence in the EHR reflecting their irAE, and 18% (n = 31/174) had irAE ICD codes documented before the immunotherapy date (Table 2). Meanwhile, 45% (n = 79/174) had ICD codes documented between 0 and 89 days after the immunotherapy date, and 9% (n = 15/174) had them documented 90 or more days after the immunotherapy date.
Table 2.
Analysis of ICD codes in patients confirmed to have pneumonitis, colitis or hepatitis by manual review
| | Pneumonitis (37) | | Colitis (81) | | Hepatitis (56) | | Total (174) | |
|---|---|---|---|---|---|---|---|---|
| | ICD all | ICD nar | ICD all | ICD nar | ICD all | ICD nar | ICD all | ICD nar |
| ICD code found before immunotherapy | 9 | 0 | 17 | 1 | 5 | 0 | 31 | 1 |
| ICD code found within 0 and 89 days after immunotherapy | 15 | 4 | 40 | 8 | 24 | 8 | 79 | 20 |
| ICD code found 90 + days after immunotherapy | 5 | 2 | 6 | 3 | 4 | 1 | 15 | 6 |
| No ICD code found | 8 | 31 | 18 | 69 | 23 | 47 | 49 | 147 |
Of the patients found on manual review to have other irAEs (not pneumonitis, colitis or hepatitis), 48% (n = 77/159) did not have ICD code evidence in the EHR reflecting their irAE, and 21% (n = 33/159) had irAE ICD codes documented before immunotherapy administration (Table 3). Meanwhile, 14% (n = 23/159) had ICD codes documented between 0 and 89 days after administration, and 16% (n = 26/159) had them documented 90 or more days after administration.
Table 3.
Analysis of ICD codes in patients confirmed to have other irAEs or no-irAE by manual review
| Other irAE(s) (159) | All | ICD nar | No irAE(s) (748) | All | ICD nar |
|---|---|---|---|---|---|
| ICD code found before immunotherapy | 33 | 0 | No ICD code identified | 289 | 721 |
| ICD code found between 0 and 89 days after immunotherapy | 23 | 1 | ICD code found within one year of immunotherapy | 444 | 25 |
| ICD code found 90 + days after immunotherapy | 26 | 2 | ICD code found after one year of immunotherapy | 15 | 2 |
| No ICD code found | 77 | 156 | ICD code found total | 459 | 27 |
Discussion
ICD codes leading to false positive identification of irAEs compared to manual review
In this study, we demonstrate that reliance on a broad range of symptom-associated ICD codes to identify irAEs leads to over-identification and a high false positive rate when compared to manual review alone. The ICD codes flagged 459 patients as possible irAE cases who were not identified as having an irAE on manual review.
Several reasons may explain this over-identification of irAEs by reliance on the ICD record. There is significant overlap between the symptoms manifest in irAEs and other non-irAE related clinical conditions. The disease processes in this patient population, frequently stage IV malignancies, are often indistinguishable from the irAEs assessed. For example, many patients on immunotherapy have pulmonary malignancy as their primary diagnosis and report symptoms of shortness of breath or cough due to their malignancy. Furthermore, their poor health status and frequent hospitalizations put them at high risk for secondary pneumonias. Either of these cases can be virtually indistinguishable from immunotherapy-induced pneumonitis without specialized imaging or appropriate therapy to rule out non-immunotherapy etiologies. In cases where non-irAE diagnoses manifest with nearly identical symptoms, the generic ICD code may be used, while identification of the underlying etiology is reserved for the clinical documentation alone. An example would be progressive post-obstructive pneumonia versus pneumonitis, in which acute hypoxic respiratory failure is used for billing claims data, while the true diagnosis of immunotherapy pneumonitis is later documented in the chart once pneumonia has been ruled out with antibiotics.
This challenge is not as commonly encountered in other examples of adverse drug event research, when compared to immunotherapy research. For example, aspirin is an antiplatelet agent used to treat post-myocardial infarction patients; however, its adverse event of severe gastrointestinal bleeding is clearly distinguishable from the primary disease diagnosis of myocardial infarction. There is no overlap as we see in immunotherapy adverse events. Some may suggest that the solution is to apply a more narrow range of ICD codes targeting the specific adverse event of interest; however, this comes at enormous cost of sensitivity as shall be discussed in a later section.
The cumulative incidence of pneumonitis, hepatitis and colitis irAEs identified in our study by manual review was 16%. However, the percentage reported by other clinical practice studies varies drastically, from as low as 5% to as high as 82% depending on immunotherapy and irAE [1, 5, 9, 11, 14, 23]. This wide variation between studies demonstrates the enormity of the challenge in consistently identifying irAE occurrence in the real-world clinical setting. A study examining the interrater agreement of 2 observers on the occurrence and grade of irAEs found that interobserver agreement was exceedingly poor (K = 0.37–0.64). As a control for data availability and access, observers had a high degree of agreement for the exact start date (98%) and end date (96%) of immunotherapy administration, suggesting that information interpretation rather than identification largely accounted for assessment differences [24]. This limitation in interobserver agreement reinforces the limited ability of clinicians to consistently make the correct diagnosis of irAEs, thereby leading to large variations in reported incidence rates. Furthermore, this variability may also be seen between institutions and oncology practice groups. The heterogeneous presentation and clinical overlap with other conditions contribute to challenges and differences in irAE characterization.
ICD codes leading to false negative identification of irAEs compared to manual review
It is commonly agreed among health researchers that adverse drug events are underreported in administrative data [16], a finding that was replicated by our analysis. Among patients identified as having true irAEs on manual review, ICD codes performed poorly in comparison with manual review. Patients whose ICD codes were absent or present only before ICI administration, 46% (n = 80/174), would have been false negatives had ICD codes been relied on alone without manual review. By the same calculation, the proportion of patients with irAEs confirmed on manual review who were correctly identified through ICD codes alone was 54% (n = 94/174).
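The 46% and 54% figures follow directly from the broad-list ("ICD all") columns of Table 2. The sketch below reproduces that calculation and breaks it down by irAE type; the dictionary layout and function name are illustrative, and the per-irAE percentages are derived here from the table rather than reported in the text.

```python
# Broad-list ("ICD all") counts from Table 2, by timing of the ICD code
# relative to the immunotherapy date.
TABLE2_BROAD = {
    "pneumonitis": {"before": 9, "0-89 days": 15, "90+ days": 5, "none": 8},
    "colitis": {"before": 17, "0-89 days": 40, "90+ days": 6, "none": 18},
    "hepatitis": {"before": 5, "0-89 days": 24, "90+ days": 4, "none": 23},
}


def detection_summary(counts: dict[str, int]) -> dict[str, float]:
    """Treat codes on or after the immunotherapy date as detected; codes
    that are absent or that predate immunotherapy count as missed."""
    total = sum(counts.values())
    detected = counts["0-89 days"] + counts["90+ days"]
    missed = counts["before"] + counts["none"]
    return {
        "total": total,
        "detected": detected,
        "missed": missed,
        "sensitivity": detected / total,
    }


per_irae = {name: detection_summary(c) for name, c in TABLE2_BROAD.items()}

# Pool the three irAE types to recover the Total (174) column.
pooled_counts = {
    key: sum(c[key] for c in TABLE2_BROAD.values())
    for key in ("before", "0-89 days", "90+ days", "none")
}
overall = detection_summary(pooled_counts)

for name, summary in per_irae.items():
    print(f"{name}: {summary['detected']}/{summary['total']} detected "
          f"({summary['sensitivity']:.0%})")
print(f"overall: {overall['detected']}/{overall['total']} detected, "
      f"{overall['missed']} missed ({overall['sensitivity']:.0%})")
```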
Several reasons may explain why ICD codes are not capturing the entirety of the irAE population. Physicians often document clinical events such as irAEs in their clinical notes in order to propagate information to colleagues in the medical record; however, ICD codes are utilized primarily for billing purposes. For instance, in a patient with immunotherapy-induced pneumonitis, it is highly important to document the occurrence of pneumonitis in clinical documentation and to elaborate the impact it has on clinical treatment; however, for billing purposes, it is potentially sufficient to document other aspects of the patient’s care (e.g., C34.90 malignant neoplasm of unspecified part of unspecified bronchus or lung). The ICD code, although accurate and sufficient for billing, does not necessarily need to capture the occurrence of the irAE.
Additionally, ICIs are a relatively new array of treatments in the oncologic repertoire, and ICD codes specific enough for many irAEs often do not exist. For example, at the time of this writing, there is no ICD code for “Immunotherapy-induced pneumonitis.” This lack of an appropriate ICD code may lead clinicians to omit the event entirely from the ICD record after an initial search fails to yield a specific ICD code of interest. A study at the University of Utah Department of Health assessing the performance of ICD code claims data in detecting adverse events confirmed this association: specific adverse event codes performed well on specificity metrics, reaching specificities of 70% [25], but performance decreased for adverse events without specific ICD codes available for clinicians to utilize. Furthermore, because it is up to clinical staff, including physicians, medical assistants, and coders, to update the patient's problem list and code each diagnosis, there exists tremendous room for variability and inconsistency.
The lack of diagnostic certainty and breadth of the differential diagnoses during initial patient presentation complicate matters even further. An irAE is often not immediately diagnosed, but relies on the ruling out of a broad differential through the utilization of second opinions, consultations, diagnostic testing, drug-elimination trials, and follow-ups. Few diagnostic tests are available to specifically diagnose irAEs, and often, the diagnosis is reliant on ruling out other causes first, before suggesting an irAE as a diagnosis of exclusion. The diagnosis is then usually only confirmed if symptoms improve after treatment discontinuation or the administration of steroids or other immunosuppressants. At times, a biopsy through either a bronchoscopy for pneumonitis, colonoscopy for colitis or percutaneous liver sampling for hepatitis may help pinpoint a diagnosis; however, oftentimes, patients are critically ill and unable to undergo such invasive procedures. Even in cases where tissue sampling is obtained, the diagnosis may still not be definitively elucidated. Other medical conditions, such as myocardial infarctions or bacteremia, have more definitive diagnostic tests such as EKGs and blood cultures, which serve to provide immediate diagnostic clarity and straightforward clinical labeling and ICD coding. Ultimately, in the absence of clear diagnostic tests affording clinical certainty, the ability to definitively diagnose and correctly label irAEs in the structured data will continue to remain a challenge.
An additional problem is that the ICD code system is known to have weaker coverage for non-billable symptomatology [26]. For example, it would be unusual to code a complaint of dry mouth due to anticholinergic medication [26]. The adverse drug event may only be recorded if it is severe enough to constitute a chief complaint or major finding; otherwise, it may never find its way into the ICD claims data [26]. This is further compounded in that ICD coding is not always performed by the team during the clinical encounter, but can be encoded in later days by medical records staff for billing purposes. This process is highly error-prone, not only due to the delayed timing of the coding several days after the clinical encounter, but also the relegation of this task to medical records staff unfamiliar with the patient or clinical context. Ultimately, if the adverse event was not a primary diagnosis, it is less likely to be perceived as essential by coding staff post-visit.
Trade-off between sensitivity and specificity when using ICD codes
One solution that other research studies have implemented to improve the specificity of ICD searches is to narrow the selection of ICD codes to include only those with high specificity for the adverse drug event of interest. This undoubtedly improves specificity; however, it comes at a high cost to sensitivity, as more true cases are missed because they were recorded under non-specific ICD codes, for the reasons described previously. Although we used a relatively broad approach to including ICD codes in our search criteria, we still achieved a sensitivity of only 54%. Restricting the search to our narrow list of more specific ICD codes lowered sensitivity even further, to 15% (n = 26/174). As our analysis demonstrates, ICD claims data serve neither as a robust filtering tool nor as an accurate positive predictor. The use of ICD code claims data should not be relied on in the assessment of irAEs.
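The trade-off can be made concrete with the counts in Tables 2 and 3. The sketch below is illustrative only: the specificity values are derived here and are not reported in the text, they treat the 748 patients with no irAE on manual review as the only negatives (the 159 patients with other irAEs are left out), and any screening code in a no-irAE patient is counted as a false positive regardless of timing. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    """Confusion counts for one ICD code list, with manual review as reference."""
    name: str
    true_pos: int    # confirmed irAE, ICD code on or after immunotherapy date
    false_neg: int   # confirmed irAE, code absent or only before immunotherapy
    false_pos: int   # no irAE on manual review, ICD code found
    true_neg: int    # no irAE on manual review, no ICD code identified

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)


# Counts taken from Table 2 (Total columns) and Table 3 (No irAE columns).
broad = ScreenResult("broad", true_pos=79 + 15, false_neg=31 + 49,
                     false_pos=459, true_neg=289)
narrow = ScreenResult("narrow", true_pos=20 + 6, false_neg=1 + 147,
                      false_pos=27, true_neg=721)

for screen in (broad, narrow):
    print(f"{screen.name:>6}: sensitivity {screen.sensitivity:.0%}, "
          f"specificity {screen.specificity:.0%}")
```

Under these assumptions, narrowing the list removes most false positives but retains only a small fraction of the manually confirmed cases, which is the pattern described above.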
Role of text mining
The passing of the FDA Amendments Act of 2007 led to widespread adoption of electronic health records across healthcare systems nationally. This expanded use of EHRs has led to widespread interest in their secondary use for adverse drug event research using both structured and unstructured data [27]. It is well known that EHRs contain rich ADE information and have been widely used for drug safety surveillance [15]. Unlike other resources, EHRs provide real-time dynamic data that can lead to better and more cost-effective patient management [15]. Initial studies focused on the use of structured data stored in relational databases, such as the ICD code claims data used in our analysis. However, our analysis helps to demonstrate that a large proportion of the content documenting adverse drug events is present only in the clinical notes, in the form of unstructured data. Unlike data typically stored in relational databases, free text is subject to the complexities and variability of natural language and is challenging to deal with algorithmically [27]; however, our analysis demonstrates that it must be captured in order to generate comprehensive data for adverse drug event research. Because extracting information using manual review is cumbersome and labor-intensive, this approach is rarely used [28]. To meet the challenges posed by clinical narratives, text mining employs many tools, including statistical analysis, machine learning and linguistic techniques, collectively known as natural language processing (NLP) [27]. NLP may offer a logistical advantage in efficiently identifying adverse events when compared to the laborious process of manual review used in this study [15].
Limitations
Our study is not without limitations. First, ICD codes were subjectively chosen by clinicians with experience diagnosing irAEs and based on our search of the ICD code repository. However, it is probable that our curated list was not sufficiently inclusive and may have missed ICD codes utilized, thereby leading to falsely decreased sensitivities in our analysis. Likewise, the “narrow” ICD codes were also subjectively chosen by treating oncologists without rigorous or objective criteria. A large survey of oncologists to identify the ICD codes commonly chosen when recording irAEs may have led to a more robust and comprehensive listing.
Second, our results may have been influenced by the existing variation and interrater variability in the identification and diagnosis of irAEs. Although our manual chart review followed relatively strict criteria as described previously, interrater discrepancy to an unknown degree is still probable. Furthermore, diagnosis of irAEs relies on the exclusion of all alternate etiologies, which introduces variation based on the degree to which a thorough investigation has been performed to rule out alternate diagnoses.
Third, we did not differentiate irAEs by level of severity, commonly referred to as grading. Grading of adverse events is not always possible or feasible to perform in real time. While some grades rely on measurable physical or laboratory findings, grading of subjective findings requires careful inspection of the clinical record and detailed patient interviewing. Clinicians may therefore neglect to provide grades in the clinical documentation, and low-grade irAEs may not make it into the clinical documentation at all. Furthermore, low-grade irAEs may not always be reported by patients to their clinical providers, and therefore our data may overrepresent higher-grade irAEs. Lastly, our results reflect one institution and may not be generalizable to other institutions.
Future direction
Significant differences in the ICD codes used to document irAEs are likely to continue causing heterogeneity between manual review and ICD data. An ICD listing that specifically identifies irAEs as a separate entity with an independent classification system would standardize physician documentation within claims data. The existing inconsistency in the operational definition of irAEs needs to be addressed before consensus on a common code set can be achieved. Future studies should also standardize the criteria for identifying irAEs so that diagnosis is consistent and reliable. The alternative to relying on ICD code data is data mining of the narrative text, which holds greater promise of approaching the current gold standard of manual review.
Conclusion
Our study showed that examining gastrointestinal and pulmonary irAEs through International Classification of Disease (ICD) data leads to underrepresentation of true irAEs and overrepresentation of false irAEs. We therefore conclude that ICD claims data are a poor approach to EHR data mining for irAEs in immunotherapy clinical research. Taken together, these results suggest that ICD code claims data alone should not be relied on in the assessment of irAEs.
Acknowledgements
Research support was provided by the REDCap project and The Ohio State University Center for Clinical and Translational Science grant support (National Center for Advancing Translational Sciences, Grant UL1TR002733). Dr. Owen and Dr. Presley are Paul Calabresi Scholars supported by the OSU K12 Training Grant for Clinical Faculty Investigators (K12 CA133250).
Abbreviations
- ALT: Alanine aminotransferases
- AST: Aspartate aminotransferases
- DILI: Drug-induced liver injury
- EHR: Electronic health record
- FDA: Food and Drug Administration
- GI: Gastrointestinal tract
- ICIs: Immune checkpoint inhibitors
- ICD: International Classification of Disease
- irAEs: Immune-related adverse events
- NLP: Natural language processing
- SRS: Spontaneous reporting systems
Author contributions
AN developed the initial manuscript draft, incorporated all author comments and edits throughout multiple versions, and completed the final draft for submission. LL and DHO developed the overall concept for this paper, contributed to the initial manuscript draft and provided additional edits and final approval. CWC provided significant input into the construct of the entire manuscript, gathered and formulated data and reviewed edits for inclusion and provided final approval. SZ, MMZ, GO, CP, KK, SP, AJ, ML, MG, and GL contributed significantly to manual extraction of data and logistical support of the REDCap database. All authors critically reviewed the manuscript and approved submission.
Funding
This study was supported by the National Institutes of Health P30CA016058.
Code availability
All analyses were conducted using the SAS system, version 9.4 (SAS Institute Inc., Cary, NC). No custom code was used.
Data availability
In accordance with local and/or U.S. Government laws and regulations, any materials and de-identified data that are reasonably requested by others for the purposes of academic research will be made available in a timely fashion.
Compliance with ethical standards
Conflict of interest
The authors report no conflict of interest.
Ethics approval
This study was approved by the Institutional Review Board at the Ohio State University (IRB Study ID #2016C0070, PI: Dwight H. Owen, MD, MS).
Consent to participate
A waiver of consent was granted by the Institutional Review Board at the Ohio State University for this retrospective study.
Consent for publication
The authors listed have participated in the study, read the final version and given consent for publication.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Hughes MS, Zheng H, Zubiri L, et al. Colitis after checkpoint blockade: a retrospective cohort study of melanoma patients requiring admission for symptom control. Cancer Med. 2019;8(11):4986–4999. doi: 10.1002/cam4.2397.
- 2. Yu Y, Ruddy K, Mansfield A, et al. Detecting and filtering immune-related adverse events signal based on text mining and observational health data sciences and informatics common data model: framework development study. JMIR Med Inform. 2020;8(6):e17353. doi: 10.2196/17353.
- 3. Postow MA, Sidlow R, Hellmann MD. Immune-related adverse events associated with immune checkpoint blockade. N Engl J Med. 2018;378(2):158–168. doi: 10.1056/NEJMra1703481.
- 4. Wang DY, Salem JE, Cohen JV, et al. Fatal toxic effects associated with immune checkpoint inhibitors: a systematic review and meta-analysis. JAMA Oncol. 2018;4(12):1721–1728. doi: 10.1001/jamaoncol.2018.3923.
- 5. Som A, Mandaliya R, Alsaadi D, et al. Immune checkpoint inhibitor-induced colitis: a comprehensive review. World J Clin Cases. 2019;7(4):405–418. doi: 10.12998/wjcc.v7.i4.405.
- 6. Xing P, Zhang F, Wang G, et al. Incidence rates of immune-related adverse events and their correlation with response in advanced solid tumours treated with NIVO or NIVO+IPI: a systematic review and meta-analysis. J Immunotherapy Cancer. 2019. doi: 10.1186/s40425-019-0779-6.
- 7. Osorio JC, Ni A, Chaft JE, et al. Antibody-mediated thyroid dysfunction during T-cell checkpoint blockade in patients with non-small-cell lung cancer. Ann Oncol. 2017;28(3):583–589. doi: 10.1093/annonc/mdw640.
- 8. Abu-Sbeih H, Ali FS, Naqash AR, et al. Resumption of immune checkpoint inhibitor therapy after immune-mediated colitis. J Clin Oncol. 2019;37(30):2738–2745. doi: 10.1200/JCO.19.00320.
- 9. Michot J, Lappara A, Pavec J, et al. The 2016–2019 ImmunoTOX assessment board report collaborative management of immune-related adverse events, an observational clinical study. European Society for Medical Oncology. 2019, Sept 29. Barcelona, Spain.
- 10. Naqash AR, Ricciuti B, Owen DH, et al. Outcomes associated with immune-related adverse events in metastatic non-small cell lung cancer treated with nivolumab: a pooled exploratory analysis from a global cohort. Cancer Immunol Immunother. 2020;69(7):1177–1187. doi: 10.1007/s00262-020-02536-5.
- 11. Wang Y, Abu-Sbeih H, Mao E, et al. Immune-checkpoint inhibitor-induced diarrhea and colitis in patients with advanced malignancies: retrospective review at MD Anderson. J Immunotherapy Cancer. 2018. doi: 10.1186/s40425-018-0346-6.
- 12. Dupont R, Bérard E, Puisset F, et al. The prognostic impact of immune-related adverse events during anti-PD1 treatment in melanoma and non-small-cell lung cancer: a real-life retrospective study. Oncoimmunology. 2019;9(1):1682383. doi: 10.1080/2162402X.2019.1682383.
- 13. Reddy HG, Schneider BJ, Tai AW. Immune checkpoint inhibitor-associated colitis and hepatitis. Clin Transl Gastroenterol. 2018;9(9):180. doi: 10.1038/s41424-018-0049-9.
- 14. Chuzi S, Tavora F, Cruz M, et al. Clinical features, diagnostic challenges, and management strategies in checkpoint inhibitor-related pneumonitis. Cancer Manag Res. 2017;9:207–213. doi: 10.2147/CMAR.S136818.
- 15. Jagannatha A, Liu F, Liu W, Yu H. Overview of the first natural language processing challenge for extracting medication, indication, and adverse drug events from electronic health record notes (MADE 1.0). Drug Saf. 2019;42(1):99–111. doi: 10.1007/s40264-018-0762-z.
- 16. Hohl CM, Karpov A, Reddekopp L, Doyle-Waters M, Stausberg J. ICD-10 codes used to identify adverse drug events in administrative data: a systematic review. J Am Med Inform Assoc. 2014;21(3):547–557. doi: 10.1136/amiajnl-2013-002116.
- 17. World Health Organization. The safety of medicines in public health programmes: pharmacovigilance: an essential tool. Geneva, Switzerland: WHO Publications; 2006.
- 18. Cox AR, Anton C, Goh CH, Easter M, Langford NJ, Ferner RE. Adverse drug reactions in patients admitted to hospital identified by discharge ICD-10 codes and by spontaneous reports. Br J Clin Pharmacol. 2001;52(3):337–339. doi: 10.1046/j.0306-5251.2001.01454.x.
- 19. Coffman C. Using Free-Text Searching and ICD-10 Codes to identify prospective patients with drug-induced liver injury. College of Health and Human Services Eastern Michigan University. Thesis Committee. December 14, 2015.
- 20. Hohl CM, Kuramoto L, Yu E, et al. Evaluating adverse drug event reporting in administrative data from emergency departments: a validation study. BMC Health Serv Res. 2013;13:473. doi: 10.1186/1472-6963-13-473.
- 21. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–381. doi: 10.1016/j.jbi.2008.08.010.
- 22. Owen DH, Wei L, Bertino EM, Edd T, Villalona-Calero MA, He K, Shields PG, Carbone DP, Otterson GA. Incidence, risk factors, and effect on survival of immune-related adverse events in patients with non-small-cell lung cancer. Clin Lung Cancer. 2018;19(6):e893–e900. doi: 10.1016/j.cllc.2018.08.008.
- 23. Cui P, Huang D, Wu Z, Tao H, Zhang S, Ma J, Liu Z, Wang J, Huang Z, Chen S, Zheng X, Hu Y. Association of immune-related pneumonitis with the efficacy of PD-1/PD-L1 inhibitors in non-small cell lung cancer. Ther Adv Med Oncol. 2020;12:1–10. doi: 10.1177/1758835920922033.
- 24. Hsiehchen D, Watters MK, Lu R, Xie Y, Gerber DE. Variation in the assessment of immune-related adverse event occurrence, grade, and timing in patients receiving immune checkpoint inhibitors. JAMA Netw Open. 2019. doi: 10.1001/jamanetworkopen.2019.11519.
- 25. Hougland P, Nebeker J, Pickard S, et al. Using ICD-9-CM Codes in Hospital Claims Data to Detect Adverse Events in Patient Safety Surveillance. In: Henriksen K, Battles JB, Keyes MA, et al., editors. Advances in Patient Safety: New Directions and Alternative Approaches (Vol. 1: Assessment). Rockville (MD): Agency for Healthcare Research and Quality; 2008 Aug. Available from: https://www.ncbi.nlm.nih.gov/books/NBK43647/
- 26. Nadkarni PM. Drug safety surveillance using de-identified EMR and claims data: issues and challenges. J Am Med Inform Assoc. 2010;17(6):671–674. doi: 10.1136/jamia.2010.008607.
- 27. Harpaz R, Callahan A, Tamang S, et al. Text mining for adverse drug events: the promise, challenges, and state of the art. Drug Saf. 2014;37(10):777–790. doi: 10.1007/s40264-014-0218-z.
- 28. Maguire FB, Morris CR, Parikh-Patel A, et al. A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: a study of non-small cell lung cancer in California. PLoS ONE. 2019;14(2):e0212454. doi: 10.1371/journal.pone.0212454.
