Abstract
Evidence-based medicine utilizes research evidence from clinical trials to support treatment decisions. To leverage the advantage of electronic health records and big data analysis methods, we developed a data-driven analytic pipeline that uses 1) agglomerative hierarchical clustering to define different granularity of treatment variation, 2) feature selection and multinomial multivariate logistic regression analysis to identify variables (factors) associated with treatment variation, and 3) prognosis analysis to compare patient outcome across top treatment groups. We tested our approach on the diffuse large B-cell lymphoma patient population from the MIMIC-IV dataset and found that our approach helps determine the optimal granularity of treatment variation and identify factors associated with treatment variation but not realized in randomized controlled trials due to unbalanced patient cohorts. We also found some patient cohorts’ characteristics that could serve to inspire hypothesis generation, such as the influence of ethnicity on the treatment plans and subsequent prognoses.
Introduction
Evidence-based medicine supports clinical decision making for optimal patient care because it features the integration of “individual clinical expertise with the best available external evidence from systematic research”1 and evaluates the treatment benefits or harms on specific patient populations with statistical average effects2. It relies on carefully curated clinical trials to establish optimal treatments for the pre-defined patient cohorts and assumes that the heterogeneity within the cohorts can be dismissed3, assessed2, and validated4 by subgroup analysis. Patients who are enrolled in these clinical trials are usually pre-selected based on their physical robustness, extent of diseases, access to academic centers and personal understanding of the trial regimens, all of which may impact the outcomes of clinical trials. The parameters that are captured and analyzed in clinical trials are likewise pre-defined by trial investigators. Real-world patients, on the other hand, are more heterogeneous. As a result, to “steer patients to the right drug at the right dose at the right time”5, there has been a surge of interest in analyzing real-world data with big data analysis methods to understand the decision-making process and the impact of these decisions in the clinical setting, so as to enhance our knowledge of other factors not realized in randomized controlled trials6.
Treatment variation has been used as an indicator to assess treatment patterns and outcomes of patient cohorts7, or as an approach to reveal factors contributing to clinical decision-making8,12. For example, comorbidities9, genetic mutations10, ethnicity11, geographic locations12, social determinants of health13, and healthcare teams and institutions14 have all been reported as contributing factors to treatment variations in many disciplines and diseases. There has been a significant effort to identify treatment patterns and variations in order to recommend the optimal treatments for individual patients. Sun and colleagues15 clustered similar treatments, constructed the decision trees of patient cohorts based on their demographic and diagnosis information, and then analyzed the outcomes of different treatment clusters in each patient cohort. Chen and colleagues16 included three additional disease severity scores in the construction of the decision trees of patient cohorts. The disease severity scores were calculated from a pre-set list of ordinal variables of demographic information and laboratory test results and had been validated to predict mortality in critically ill patients. However, such approaches relied on highly simplified scores that offered clinicians little insight or room for reasoning. Chen and colleagues17 also introduced a clustering and association analysis framework. They first defined disease-symptom clusters from laboratory test results, imaging results and notes, as well as the schemes of diseases and treatments. Then they performed Apriori-based association analysis to find the strongest associations between disease-symptom or disease-diagnosis/treatment pairs for ranking-based recommendation. However, this approach discarded the less frequent diagnoses and treatments in the patient cohort and did not consider the rationale behind these potential subgroups.
In summary, the previous studies disregarded some information, such as detailed contributing factors and less frequent clinical concepts, or did not report treatment variation with a clinically reasonable number of clusters, which diminished the explainability of the data.
Therefore, in this study, we developed a treatment variation identification pipeline to explore treatment variations by clustering similar treatments and identifying factors from patient characteristics, such as demographics, comorbidities, and laboratory test results. We also worked with a clinical domain expert to compare the results from different levels of clustering to determine the optimal granularity of treatment variation that was informative and meaningful. Finally, we examined patient characteristics in the top treatment groups and assessed their explainability.
Methods
The treatment variation identification pipeline consisted of three steps (Figure 1): 1) unsupervised data-driven and automated annotation of treatment clusters, 2) feature selection and multinomial multivariate logistic regression analysis to identify the responsible variables (patient characteristics) in the top treatment clusters, and 3) prognosis analysis of the treatment clusters. This pipeline was performed iteratively to determine the optimal granularity of treatment variation that is insightful yet explainable by clinicians. In addition, we examined three models with different variable candidate pools: A) only demographics and comorbidities, which assumed that treatment decisions were solely based on the demographics and comorbidities of patients, B) demographics, comorbidities, and laboratory test results of clinicians’ interest picked by a domain expert, which assumed that treatment decisions were made based on demographics, comorbidities and some specific laboratory test results of clinicians’ interest, and C) demographics, comorbidities, and all available laboratory test results, which assumed that treatment decisions were made considering all available information, including demographics, comorbidities, and all laboratory test results.
Figure 1.
The treatment variation identification pipeline and three variable candidate pool models.
DLBCL dataset and population:
The treatment variation identification pipeline was tested using the diffuse large B-cell lymphoma (DLBCL) patient population in the Medical Information Mart for Intensive Care IV (MIMIC-IV) dataset18,19. DLBCL is a typical disease whose treatment plans can depend heavily on molecular features, clinical presentations, and patient characteristics. Since the introduction of rituximab, the remission rate has increased and many treatment plans containing rituximab have been validated to improve the overall prognoses of DLBCL patients20. The treatment plans of DLBCL rely on chemotherapies, which can be composed of multiple drugs administered together in one treatment cycle. The clinical guidelines of DLBCL recommend several first-line drug combinations, such as R-CHOP (Rituximab, Cyclophosphamide, Hydroxydaunorubicin [doxorubicin], Oncovin [vincristine], Prednisone) and R-EPOCH (R-CHOP + Etoposide), with possible dose adjustments. Alternative combinations and finer dose adjustments are provided based on the comorbid conditions and the disease subtypes. For example, R-mini-CHOP is recommended as one of the first-line therapies for very frail patients or patients with comorbidities who are older than 80 years. More second-line and subsequent therapies are recommended if patients have disease progression. Moreover, for patients with relapsed or refractory DLBCL, clinical trials incorporating new therapeutics are commonly offered, especially in academic centers21. These characteristics made DLBCL patients a suitable population to study the treatment variance. The MIMIC-IV dataset contained 221 DLBCL patients with 605 admissions who had 1) been diagnosed with DLBCL, 2) received treatments related to DLBCL and 3) had laboratory test results before any DLBCL-related treatments started in each admission. 44.3% (98/221) of the patients were female, accounting for 43.1% (261/605) of the admissions. The age of the patients ranged from 21 to 91.
Pre-processing:
Treatments (drugs and procedures) used for clustering were selected manually if they were included in any literature or clinical trials of DLBCL. Every inpatient admission was used as a data point for clustering, grouping together only the drug combinations received in a single admission, such as R-CHOP. This allowed for treatment changes between admissions under the assumption that patients are in different conditions and can receive different treatment plans, which is consistent with treatment variations. The generic and brand names, as well as the dosage units, were unified. Treatments were extracted as the mean for each drug and formulation (e.g., liposomal doxorubicin and doxorubicin), or counted as the number of times for each procedure. If methotrexate or cytarabine was used as monotherapy in an admission after other admissions with standard regimens such as R-CHOP, it was merged into the previous admission as an adjuvant treatment. Treatment data were standardized before clustering.
Patient data, or variables, can be categorized into three types: demographics, comorbidities, and laboratory test results. Demographics included age, gender, insurance type, language, ethnicity, and admission types. Comorbidities were coded by both the Charlson Comorbidity Index (CCI) and the Elixhauser Comorbidity Index (ECI) based on patients’ previous diagnoses. The CCI provided the most widely used overall condition estimation and prognosis prediction22, while the ECI provided disease-specific information23. For laboratory test results, only the last values before the treatments started were extracted, because the treatment decisions were based on the patients’ most recent conditions and remained the same even when adverse reactions, such as anemia and pancytopenia, emerged. Categorical variables were dummy-coded, and one of the dummy variables, chosen at random, was dropped to avoid redundancy. Lymphocyte / monocyte ratio (LMR) and neutrophil / lymphocyte ratio (NLR) were calculated as absolute lymphocyte count (ALC) / absolute monocyte count (AMC) and absolute neutrophil count (ANC) / ALC, respectively, because they were reported as predictive for prognosis in DLBCL24. All variables were also standardized.
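The variable preparation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors’ code: the keys `ALC`, `AMC` and `ANC`, the function names, and the choice of population standard deviation are assumptions, and the reference level to drop is passed in by the caller rather than drawn at random.

```python
from statistics import mean, pstdev


def add_ratios(labs):
    """Return a copy of one admission's labs with LMR and NLR added.

    The keys 'ALC', 'AMC' and 'ANC' are hypothetical column names.
    """
    out = dict(labs)
    out["LMR"] = labs["ALC"] / labs["AMC"]  # lymphocyte / monocyte ratio
    out["NLR"] = labs["ANC"] / labs["ALC"]  # neutrophil / lymphocyte ratio
    return out


def standardize(values):
    """Z-score a list of numbers (population standard deviation assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def dummy_code(values, drop):
    """Dummy-code a categorical column, omitting the level given in `drop`."""
    levels = sorted(set(values) - {drop})
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]
```

Dropping one level per categorical variable keeps the dummy columns from summing to a constant, which is the redundancy the text refers to.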
Step 1 – Clustering of treatments:
The clustering step was conducted using agglomerative hierarchical clustering after comparing multiple clustering methods, namely a) centroid-based k-Means clustering, b) Gaussian mixture clustering, c) agglomerative hierarchical clustering, d) density-based spatial clustering of applications with noise (DBSCAN), e) ordering points to identify the clustering structure (OPTICS), and f) affinity propagation clustering. They were evaluated by Silhouette score, Calinski-Harabasz index and Davies-Bouldin index methods for their within-cluster similarity and between-cluster separation. Principal component analysis (PCA) was also used to visualize the clustering results for validation. K-Means clustering was done with multiple random starts to decrease the influence of initial values. The optimal number of clusters was determined by visualizing the elbow point in the Sum of Squared Error (SSE) plot or the peak of Silhouette score plot in k-Means, DBSCAN and OPTICS. Hierarchical clustering showed all possible clustering levels.
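To make the evaluation criterion concrete, the following is a from-scratch sketch of the Silhouette score with Euclidean distance. It is illustrative only: the function name and the convention of scoring singleton clusters as 0 are assumptions, and the study may well have used a library implementation.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well-separated clusters, so comparing the score across candidate numbers of clusters gives the kind of peak described above.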
For each cluster, treatment components and dosages were automatically annotated. A drug or procedure was annotated in a cluster if it was used in no less than 70% of the data points in that cluster. Rituximab, because of its unique status in the treatment of DLBCL, was annotated in parentheses if it was used in no less than 50% but less than 70% of the data points in a cluster. If no annotations could be made automatically, the cluster was inspected manually and then marked as an ‘uncertain cluster’, meaning that the cluster consisted of several subgroups.
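The annotation rule can be expressed as a short function. This is a sketch under stated assumptions: each admission is represented as a set of treatment names, the function and parameter names are hypothetical, and the fallback label only flags the cluster for the manual inspection described above.

```python
def annotate_cluster(admissions, main=0.70, rituximab_low=0.50):
    """Annotate one cluster from its admissions.

    admissions: list of sets, each holding the treatment names used in
    one admission (a hypothetical representation).
    """
    n = len(admissions)
    names = sorted(set().union(*admissions))
    labels = []
    for name in names:
        share = sum(name in adm for adm in admissions) / n
        if share >= main:
            labels.append(name)  # used in at least 70% of admissions
        elif name == "rituximab" and share >= rituximab_low:
            labels.append("(rituximab)")  # 50% to <70%: in parentheses
    # No automatic annotation: flag the cluster for manual inspection
    return labels or ["uncertain cluster"]
```

For example, a cluster in which every admission includes cyclophosphamide and half include rituximab would be annotated as `["cyclophosphamide", "(rituximab)"]`.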
Step 2 – Feature selection and multinomial logistic regression:
Variables with an absolute Pearson’s correlation coefficient greater than 0.7 were removed to avoid collinearity. Highly sparse variables, with more than 30% null values, were also removed before imputation. Mean imputation was applied to the remaining variables.
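A minimal sketch of this filtering step follows. Several details are assumptions rather than facts from the study: the order of operations (sparsity filter, then mean imputation, then correlation filter), the rule of keeping the first variable of each highly correlated pair, and all function and parameter names.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / (sxx * syy) ** 0.5


def filter_and_impute(columns, max_null=0.30, max_corr=0.70):
    """columns: dict mapping a variable name to a list of floats or None."""
    imputed = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if 1 - len(observed) / len(values) > max_null:
            continue  # too sparse: drop before imputation
        mu = mean(observed)
        imputed[name] = [mu if v is None else v for v in values]

    selected = []
    for name in imputed:
        # Keep a variable only if it is not highly correlated with any
        # variable already kept (assumed tie-breaking rule).
        if all(abs(pearson(imputed[name], imputed[m])) <= max_corr
               for m in selected):
            selected.append(name)
    return {name: imputed[name] for name in selected}
```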
Feature selection was done via a two-part process. The first part was recursive feature elimination with cross validation (RFECV) to determine the most appropriate number of variables based on the cross-validation scores (the number of correct classifications with five-fold cross validation). The second part was recursive feature elimination (RFE) to remove the least important variable iteratively until the number of the variables from the RFECV part was achieved. Multinomial multivariate logistic regression analysis was used to identify a set of significant variables (p < 0.05) to explain the between-cluster differences, i.e., the contributing factors for treatment variation. Their corresponding odds ratios were calculated as the exponentiation of coefficients. Only clusters with more than 10 admissions were included in the logistic regression model.
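For reference, the relationship between the regression coefficients and the reported odds ratios can be written out as follows. The softmax parametrization over K treatment clusters is an assumption, since the text states only that odds ratios are the exponentiated coefficients. Because all variables were standardized, each odds ratio can plausibly be read as the change in odds per one standard deviation increase of the variable.

```latex
% Multinomial (softmax) logistic regression over K treatment clusters
\[
P(y = k \mid \mathbf{x}) =
  \frac{\exp\!\left(\beta_{k0} + \boldsymbol{\beta}_k^{\top}\mathbf{x}\right)}
       {\sum_{c=1}^{K} \exp\!\left(\beta_{c0} + \boldsymbol{\beta}_c^{\top}\mathbf{x}\right)}
\]

% Odds ratio of variable j for cluster k
\[
\mathrm{OR}_{kj} = \exp\!\left(\beta_{kj}\right)
\]

% Worked example: an odds ratio of 0.3510 corresponds to
\[
\beta_{kj} = \ln(0.3510) \approx -1.047
\]
```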
Comparison of cluster levels:
In this step, we also examined the performance across multiple hierarchical clustering levels (Step 1). Hierarchical clustering suggested an optimal level of clustering, but at the same time, provided other clustering levels from high to low granularity. A higher granularity showed more clusters (treatment groups) but could be difficult to explain and utilize; a lower granularity showed fewer clusters but might lose ability to differentiate patient characteristics.
As a result, we used measures to assess the explainability of the between-cluster differences to determine a more explainable level of clustering. The measures were a) match, the percentage of admissions or clusters recognized compared with the optimal level, b) uncertainty, the number of admissions that failed the automated annotations and needed manual inspection, c) insights, 100% minus the ratio of the average number of picked significant variables of clinicians’ interest over the average number of all picked significant variables, d) pseudo-R square scores (McFadden’s25, Cox & Snell26, and Nagelkerke’s27), e) the number of clusters with significantly better prognosis, and f) the number of clusters with significantly worse prognosis (Figure 1). In addition, a Sankey diagram was used to visualize how the prognosis information was transferred when clusters were grouped together.
The match and uncertainty measures indicated how much information was lost when the granularity of variance was reduced step by step. The insights measure was calculated to reveal how many of the selected variables were unique to model C. This measure was exploratory. If it dropped when the granularity decreased, the drop could be attributed to the masking of subgroup effects when the clusters were mixed. However, it should be analyzed together with the absolute number of identified variables, because when the granularity decreased, the number of selected variables was expected to increase due to the increase of heterogeneity within the clusters. The three pseudo-R square scores assessed the models’ explainability compared to a null model25,26,27; therefore, by comparing the changes within each score across different levels of granularity, we were able to determine the reasonable level of clustering.
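The three pseudo-R square scores can be computed from the log-likelihoods of the fitted and null (intercept-only) models. The sketch below uses the standard definitions of each score; the function and argument names are hypothetical.

```python
from math import exp


def pseudo_r2(ll_model, ll_null, n):
    """Pseudo-R square scores from log-likelihoods.

    ll_model: log-likelihood of the fitted model
    ll_null:  log-likelihood of the intercept-only (null) model
    n:        number of observations (here, admissions)
    """
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value
    nagelkerke = cox_snell / (1.0 - exp(2.0 * ll_null / n))
    return {
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }
```

All three scores are 0 when the fitted model is no better than the null model, which is why they can be compared across clustering levels as a relative measure of how much the selected variables explain.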
Comparison of three models with variable candidate pools:
The treatment variation identification pipeline was performed in the three models: A) only demographics and comorbidities, B) demographics, comorbidities, and some variables of clinicians’ interest, and C) demographics, comorbidities, and all available lab variables. The explainability of the model was also evaluated and compared to determine the best model.
Step 3 – Prognosis analysis:
In this step, we aimed to compare the prognosis of the top treatment groups identified in Step 2. The goal for newly diagnosed DLBCL patients was to achieve and sustain a complete response (CR) to the treatment. The optimal treatment for relapsed or refractory patients was hematopoietic cell transplant (HCT). Without the results of positron emission tomography with computed tomography (PET/CT) imaging, we were not able to accurately evaluate CR with the MIMIC-IV dataset. Therefore, we assigned patients who received treatments within only one cluster (non-HCT-related clusters) to be ‘responsive’ (R), and the others as ‘non-responsive’ (NR). Patients who ever received any drugs or procedures related to HCT, such as the salvage regimens, were assigned to ‘HCT’, and the others to ‘non-HCT’. In each cluster, a χ2 test was used to determine whether the proportions of R and NR patients, and of HCT and non-HCT patients, differed significantly (p < 0.05) from those of all other patients. If the difference was significant, we then compared the cluster’s R or HCT rate with 50% to determine whether the cluster had a better or worse prognosis.
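The labelling and testing logic can be sketched as follows. The sketch makes several assumptions that the text does not settle: the χ2 test is a Pearson test on a 2×2 table without continuity correction, "better" means a rate above 50% in the cluster, and all function names and cluster identifiers are hypothetical.

```python
from math import erfc, sqrt


def response_labels(patient_clusters, hct_clusters=frozenset()):
    """Label each patient 'R' if treated within exactly one
    non-HCT-related cluster, otherwise 'NR'.

    patient_clusters: dict mapping a patient id to a set of cluster ids.
    """
    labels = {}
    for patient, clusters in patient_clusters.items():
        single = len(clusters) == 1 and not (clusters & hct_clusters)
        labels[patient] = "R" if single else "NR"
    return labels


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]].

    Returns (statistic, p-value). All four margins must be non-zero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(stat / 2))  # chi-square survival function, 1 dof
    return stat, p


def classify_cluster(r_in, nr_in, r_out, nr_out, alpha=0.05):
    """Compare R vs NR counts inside a cluster with all other patients."""
    _, p = chi2_2x2(r_in, nr_in, r_out, nr_out)
    if p >= alpha:
        return "no significant difference"
    return "better" if r_in / (r_in + nr_in) > 0.5 else "worse"
```

The same `classify_cluster` call applies to HCT versus non-HCT counts by passing those counts instead of the R and NR counts.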
Results
Step 1 – Treatment clustering:
The SSE elbow point suggested a data-driven optimal clustering level of 36 clusters (Figure 2a). The heatmap of the agglomerative hierarchical clustering confirmed the proximity of treatment components in the commonly used regimens, such as the close distances within CHOP and between gemcitabine and oxaliplatin (red and yellow boxes in Figure 3a). Six clustering methods were evaluated by PCA and evaluation metrics (Figure 2c). The Silhouette score and the Calinski-Harabasz index reflect the relationship between intra-cluster and inter-cluster dispersion, with higher values being better, while the Davies-Bouldin index focuses more on the inter-cluster separation, with lower values being better. Hierarchical clustering was then used for further analysis because of its superior performance and its ability to provide different clustering granularities.
Figure 2.
The clustering results. a) The SSE plot for k-Means clustering, b) The Silhouette score plot for k-Means clustering, and c) The PCA and evaluation metrics results of six clustering methods.
Figure 3.
The agglomerative hierarchical clustering results and outcome metrics as the performances of the model over different levels of clustering. a) The boxes showing the proximity of commonly used regimens, such as GemOx and R-EPOCH, b) The outcome metrics with the numbers of variables or variables of interest transformed into percentages according to the number of total variables or total variables of interest for illustration.
The results show that each cluster clearly represented a common treatment regimen; for example, at the 36-cluster level, R-EPOCH in cluster #1, (R)-CHOP in cluster #2, (R)-ICE in cluster #12, GDP in cluster #15, R-GemOx in cluster #16, and treatments with methotrexate maintenance in cluster #21. Some clusters included lymphodepletion chemotherapies, such as FC (fludarabine + cyclophosphamide) in cluster #28. Some treatment plans centered on newer drugs that are still undergoing clinical trials were also identified, such as the ibrutinib-centered treatments in cluster #30 and the temozolomide-centered treatments in cluster #33. The only exception was a cluster in which over half of the admission data points received prednisone. It was noted as ‘P’ (in cluster #18) or an ‘uncertain cluster’ and used to assess the explainability as a negative evaluator.
Step 2 – Variables responsible for the treatment variations:
The univariate analysis identified 60 significant variables among the total of 95 available variables. These variables included demographic information, comorbidities, and laboratory test results, such as age, chronic heart failure, alanine aminotransferase, lymphocytes (%), monocytes (%), and neutrophils (%). We also removed collinear variables. For example, ‘Red Blood Cells’ was strongly associated with ‘Hematocrit’ (Pearson’s correlation coefficient 0.92) and ‘Hemoglobin’ (0.88). ‘White Blood Cells, count’ was strongly associated with ‘Absolute Neutrophil Count’ (0.93). ‘Neutrophils, %’ was strongly associated with ‘Lymphocytes, %’ (-0.79). ‘MCH’ was strongly associated with ‘MCV’ (0.91). After removing the collinear variables, the multivariate multinomial logistic regression analysis was able to identify the variables contributing to the clusters (Table 1). For easy comparison and understanding, Table 1 presents the results of only two levels of clustering, namely the data-driven, clustering-recommended 36-cluster level and the less granular 22-cluster level, which was later chosen as the optimal clustering level.
Table 1.
The odds ratios of significant variables from the multivariate multinomial logistic regression analysis.
| # of clusters: 36 | | | # of clusters: 22 | | |
|---|---|---|---|---|---|
| Cluster | Variables | OR | Cluster | Variables | OR |
| (R)-CHOP | None | None | (R)-CHOP | Visit Type, Elective | 0.3959 |
| (R)-EPOCH | None | None | (R)-EPOCH | Comorbid, Neurological Disorders | 0.4863 |
| (R)-ICE | Comorbid, Arthritis | 0.3510 | (R)-ICE | Comorbid, Arthritis | 0.3759 |
| | Lab test, Creatinine | 0.3618 | | | |
| | Lab test, L/M Ratio | 0.3602 | | | |
| | Visit Type, Emergency Room | 0.2952 | | | |
| | Ethnicity, Hispanic/Latino | 0.2954 | | | |
| (R)-E+Cytarabine+MTX | DLBCL, Intrathoracic Lymph Nodes | 0.3720 | Uncertainty, the combination of the left clusters on the 36-cluster level | None | None |
| FC | Comorbid, Coagulopathy | 3.4058 | | | |
| | Comorbid, Obesity | 2.7040 | | | |
| Uncertainty | None | None | | | |
Significant demographic or comorbid variables were directly suggestive of the differences between clusters. Take the results from ‘(R)-ICE’ as an example. The low odds ratio of ‘Visit Type, Emergency Room’ suggested that DLBCL patients were less likely to receive (R)-ICE regimens than any other treatment option when they were admitted from the emergency room. Similarly, the odds ratio of ‘Ethnicity, Hispanic/Latino’ suggested that Hispanic or Latino patients were less likely to receive (R)-ICE. The ORs of laboratory test results were interpreted together with the mean values in the cluster, the mean values in all other clusters, and the normal ranges to reach a meaningful understanding. For example, the means of ‘L/M ratio’ (LMR) in the (R)-ICE cluster and in all other clusters were 1.7619 and 2.4634, respectively. Because the normal range of LMR was from 3.63 to 6.99, the smaller (farther away from normal) mean of the (R)-ICE cluster suggested that patients with the smaller LMR would be less likely to receive (R)-ICE over other treatments, or patients with larger LMR (nearer to normal) would be more likely to receive (R)-ICE treatments. A similar pattern was seen in the (R)-EPOCH cluster, whose patients were less likely to have pre-existing ‘comorbid neurological disorders’ before treatment. This phenomenon was consistent with the neurotoxicity reported in patients previously treated with R-DA-EPOCH regimens, for example, the posterior reversible encephalopathy syndrome28.
Determination of the optimal granularity of variance:
Although the data-driven clustering algorithm suggested 36 as the optimal number of clusters, this level did not provide meaningful insights from a clinical perspective. After consulting with a clinical domain expert, we gradually reduced the number of clusters. A new explainable level of clustering, 22, was chosen (Figure 3b) because it maintained relatively high pseudo-R square scores: relative to the 36-cluster level, the average pseudo-R square scores at the 22-cluster level and at the next, 17-cluster level were 77.26% and 71.05%, respectively.
We found that when the number of clusters dropped gradually, the variables identified by the multivariate multinomial logistic regression analysis increased. Figure 4 shows that more new variables were identified as the number of clusters decreased. The left column shows the significant variables identified first at the 36-cluster level, but they were also picked up at the 22-cluster and 10-cluster levels. The middle and right columns show the newly added significant variables at the 22-cluster and 10-cluster levels respectively. The bright red and bold text marks the variables whose odds ratios >= 3 or <= 0.33, which means that they have a high distinguishing power for the patient cohorts’ characteristics. The dark red text marks the remaining variables with odds ratio >= 2 or <= 0.5, meaning they have a moderate distinguishing power. The remaining variables have a relatively lower distinguishing power. It was clear that the number of newly added variables with a high distinguishing power decreased gradually, while newly added variables with a moderate or low distinguishing power increased (Table 1, Figure 4). This could be attributed to the compromise of the explainability by grouping clusters together from the introduction of more heterogeneity within the clusters. For example, the odds ratio of ‘Comorbid, Coagulopathy’ for FC regimen at the 36-cluster level was 3.4058 (high distinguishing power); but the odds ratio of ‘Comorbid, Neurological disorders’ for (R)-EPOCH was 0.4863 (moderate distinguishing power) at the 22-cluster level (Table 1). This phenomenon confirmed the loss of distinguishing power when we gradually reduced the total number of clusters. Therefore, we decided the clustering level, 22, to be the optimal granularity of treatment variance, as its explainability was not compromised significantly.
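The odds-ratio thresholds used for the color coding in Figure 4 can be captured in a small helper. The function name and the label strings are illustrative assumptions; the cut-offs are those stated above.

```python
def distinguishing_power(odds_ratio):
    """Classify an odds ratio by the thresholds used in Figure 4."""
    if odds_ratio >= 3 or odds_ratio <= 0.33:
        return "high"
    if odds_ratio >= 2 or odds_ratio <= 0.5:
        return "moderate"
    return "low"
```

Applied to Table 1, `distinguishing_power(3.4058)` gives `"high"` for ‘Comorbid, Coagulopathy’ in the FC cluster, and `distinguishing_power(0.4863)` gives `"moderate"` for ‘Comorbid, Neurological Disorders’ in the (R)-EPOCH cluster.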
Figure 4.
The new variables picked up for between-cluster variations. When the number of clusters decreased, the number of variables needed for explainability increased, while their average distinguishing power (denoted by odds ratio) dropped. LNs, lymph nodes. RBC, red blood cells. RDW(-SD), red cell distribution width (standard deviation).
Determination of the optimal model of variable candidate pool:
Three models with different variable candidate pools were compared on the metrics (Table 2) to assess the explainability contributed by laboratory test results, especially those of clinicians’ interest. The optimal model was model C, with demographic information, comorbidities, and all laboratory test results. This suggests that some lab variables, although not of interest to clinicians, may have unknown effects on treatment plans. Table 2 likewise presents only the 36-cluster (data-driven) and 22-cluster (optimal granularity) levels for easy comparison.
Table 2.
The results of model comparisons for the optimal variable candidate pool. #, number.
Step 3 – Prognosis analysis:
In this step, we aimed to analyze the responsive rates of the treatment clusters and to visualize how the prognosis information was transferred when the number of clusters decreased. We used a Sankey diagram to illustrate how treatments merged into fewer clusters with R rates (Figure 5). The χ2 test revealed that (R)-CHOP, (R)-EPOCH and venetoclax + R-EPOCH (in cluster #4) produced significantly better R rates (the red flows). (R)-ICE, treatments with methotrexate maintenance, FC and cytarabine-centered regimens (in cluster #34) showed worse prognoses in R rate (the blue flows). Bortezomib + EPOCH (in cluster #5) and mechlorethamine-centered regimens (in cluster #6) showed no significant difference in R rate (the yellow flows). Regarding the HCT rates, the venetoclax + R-EPOCH, mechlorethamine + R-CHOP, and bortezomib + EPOCH clusters all showed no significant change in prognosis.
Figure 5.
The Sankey diagram showing the clusters with their annotations, and how the clusters were grouped together when the number of clusters was reduced.
Discussion
The identification of treatment variation provides opportunities for us to differentiate treatment plans that are currently being prescribed to patients. With a deeper investigation of patient characteristics, it identified variables associated with these treatment plans. Existing approaches are limited because they grouped treatment plans into only three categories (optimal, suboptimal, and palliative)29 and were not able to provide a high granularity of variance to explain potential contributing variables. A higher granularity of treatment variation is important because the average effects of a low granularity will mask the treatment response heterogeneity, so that the minimum clinically important difference for the disease may be overlooked30. However, a higher granularity of treatment variation could be overwhelming and not meaningful for clinicians. Our treatment variation identification pipeline presents a generalizable approach that can find a reasonable level of clustering that is informative to clinicians.
During our analysis of the DLBCL population, the logistic regression also revealed specific patient characteristics in the treatment groups that aligned with clinicians’ decision rationale. Specifically, some identified variables confirmed the adherence to clinical guidelines. For example, R-ICE is a widely used salvage regimen, with which the patients should be admitted and monitored closely for adverse reactions. This is consistent with its lower likelihood of being prescribed to unstable patients admitted from the emergency room (Table 1).
In addition, we noticed that some findings were not currently explainable but could inform future research opportunities (Table 1). For example, the low OR of Hispanics or Latinos to receive (R)-ICE therapies over other treatments could be inspirational for hypothesis generation. Previous population-based analysis reported disparities in the survival outcomes in the Hispanic or Latino population31 and facilitated research on the clinical and molecular characteristics in these populations. Research in Miami’s Hispanic and Latino population concluded that ethnicity made no contribution to the biological development of DLBCL32. Therefore, whether the use of this commonly used salvage regimen led to the differences in Hispanic/Latino patients’ prognoses could be a potential hypothesis to examine in the future, for example, whether there are cultural influences on the patients’ attitudes towards complex treatments in an advanced stage of diseases and whether there are other factors that keep Hispanic and Latino people from receiving treatments such as (R)-ICE.
Our pipeline also suggested that model C, with all available variables in the EHR, was the best model. This reflects the complexity of clinical scenarios. Under the paradigm of evidence-based medicine, evidence is mostly generated from high quality randomized controlled trials, cohort studies, or case-control studies. These studies provide clean data for analysis and generate trustworthy results. However, real-world data are usually limited in their integrity and accuracy, as not all data are collected consistently. Our pipeline is a generalized approach to wrangle real-world data and provide insights regarding treatment variance and its contributing factors.
Limitations
Firstly, DLBCL has highly varied genetic characteristics and molecular origins, which are associated with their distinct prognoses. But MIMIC-IV dataset currently do not include genetic information. Unfortunately, other datasets with genetic information also do not include laboratory test results or detailed treatment plans. Meanwhile, although the International Prognostic Index (IPI) was widely used for evaluation of DLBCL patient condition and prognosis, it was not introduced because of the lack of data regarding Ann Arbor stage, ECOG performance score and specific location of the lymphoma. Future efforts should consider incorporating multimodal data and multiple analysis methods.
Secondly, the MIMIC-IV dataset provides de-identified patient information with altered timelines. Therefore, we were only able to evaluate prognosis by the presence of DLBCL relapse. However, progression-free survival time and time to relapse are also important prognostic indicators that should be investigated in the future.
Thirdly, our patient population was small (221 patients with 605 admissions). We nevertheless chose this population because the complexity of DLBCL treatment plans and the varied treatment outcomes call for a closer, data-driven inspection of treatment variance. Further validation will require larger-scale real-world data.
Fourthly, the comorbidity analysis was based on the coding in the MIMIC-IV dataset and therefore relies on the accuracy of the documented clinical diagnosis records. This drawback of data-driven approaches has also been reported in other studies using public datasets33. We therefore suggest that an algorithm that automatically recognizes paradoxical diagnoses, or diagnoses inconsistent with the natural progression of diseases, may help solve this problem.
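One very simple form such a consistency check could take is sketched below. It assumes that each patient's admissions can be placed in chronological order and that comorbidities have already been grouped into categories; it then flags any admission where a previously coded chronic condition is no longer coded. The category names, the `CHRONIC` set, and the function `flag_vanishing_chronic` are hypothetical and were not part of our pipeline.

```python
# Hypothetical set of comorbidity categories treated as chronic
# (conditions not expected to resolve between admissions).
CHRONIC = {
    "congestive_heart_failure",
    "chronic_pulmonary_disease",
    "diabetes_with_complications",
}

def flag_vanishing_chronic(admissions):
    """Flag chronic conditions that were coded earlier but are missing later.

    admissions: list of (admission_id, set_of_categories) tuples for one
    patient, assumed to be in chronological order.
    Returns a list of (admission_id, missing_category) tuples.
    """
    seen = set()
    flags = []
    for admission_id, categories in admissions:
        # Chronic categories seen before but absent from this admission.
        for missing in sorted((seen & CHRONIC) - categories):
            flags.append((admission_id, missing))
        seen |= categories
    return flags

# Illustrative (made-up) coding history for a single patient.
history = [
    ("adm1", {"congestive_heart_failure"}),
    ("adm2", set()),
    ("adm3", {"congestive_heart_failure"}),
]
print(flag_vanishing_chronic(history))
```

A flagged admission is only a candidate for review, since a missing code may reflect incomplete documentation rather than a true change in the patient's condition.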
Finally, treatment data cleaning initially relied on mapping to existing drug ontologies, which failed because the ontologies were not up to date with newer drugs. As a result, treatments were matched against a manually curated, pre-defined list of drugs and procedures that are recommended in DLBCL clinical guidelines or have been tested in related clinical trials. In the future, this mapping could be integrated with named entity recognition, clinically pre-defined order sets in EHR systems, query-based synthesized data platforms, or Fast Healthcare Interoperability Resources (FHIR) applications.
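The list-based matching described above can be illustrated with a minimal sketch. It assumes free-text medication orders and a small dictionary of canonical drug names with synonyms; the dictionary contents, the names `DRUG_SYNONYMS`, `LOOKUP`, and `PATTERN`, and the function `match_treatments` are hypothetical and do not reproduce the list used in this study.

```python
import re

# Hypothetical manual list: canonical drug name -> accepted spellings.
DRUG_SYNONYMS = {
    "rituximab": ["rituximab", "rituxan"],
    "cyclophosphamide": ["cyclophosphamide", "cytoxan"],
    "doxorubicin": ["doxorubicin", "adriamycin"],
    "vincristine": ["vincristine", "oncovin"],
    "prednisone": ["prednisone"],
}

# Reverse lookup from each lower-cased spelling to its canonical name.
LOOKUP = {
    name.lower(): canonical
    for canonical, names in DRUG_SYNONYMS.items()
    for name in names
}

# One case-insensitive pattern that matches any spelling as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in LOOKUP) + r")\b",
    re.IGNORECASE,
)

def match_treatments(order_text):
    """Return the sorted canonical drug names mentioned in one order string."""
    return sorted({LOOKUP[m.group(1).lower()] for m in PATTERN.finditer(order_text)})

print(match_treatments("Rituxan 375 mg/m2 IV; CYCLOPHOSPHAMIDE 750 mg/m2"))
```

Exact matching of this kind is transparent and easy to audit, but it misses misspellings and new agents, which is why integration with named entity recognition or standardized order sets is a natural next step.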
Conclusion
We developed a treatment variation identification pipeline that identifies treatment clusters and the variables that contribute to between-cluster differences. We tested the pipeline on DLBCL patients from the MIMIC-IV dataset and used it to determine the optimal granularity of treatment variance, which gave a better understanding of the complexity of real-world clinical data. This methodology, if supported by standardized, well-defined, and accurate data entry, may enable curation of large datasets across different centers or geographical areas and thus has the potential to uncover novel findings that can be further confirmed with dedicated studies.
References
- 1. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996 Jan 13;312(7023):71–2. doi: 10.1136/bmj.312.7023.71.
- 2. Chow N, Gallo L, Busse JW. Evidence-based medicine and precision medicine: Complementary approaches to clinical decision-making. Precis Clin Med. 2018 Sep 1;1(2):60–4. doi: 10.1093/pcmedi/pby009.
- 3. Senn S. Individual response to treatment: is it a valid assumption? BMJ. 2004 Oct 23;329(7472):966–8. doi: 10.1136/bmj.329.7472.966.
- 4. Sun X, Briel M, Busse JW, You JJ, Akl EA, Mejza F, et al. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ. 2012 Mar 15;344(mar15 1):e1553–e1553. doi: 10.1136/bmj.e1553.
- 5. Hamburg MA, Collins FS. The Path to personalized medicine. N Engl J Med. 2010 Jul 22;363(4):301–4. doi: 10.1056/NEJMp1006304.
- 6. Passamonti F, Corrao G, Castellani G, Mora B, Maggioni G, Gale RP, et al. The future of research in hematology: Integration of conventional studies with real-world data and artificial intelligence. Blood Reviews. 2021 Dec. p. 100914.
- 7. Derks MGM, Bastiaannet E, Kiderlen M, Hilling DE, Boelens PG. on behalf of the EURECCA Breast Cancer Group. Variation in treatment and survival of older patients with non-metastatic breast cancer in five European countries: a population-based cohort study from the EURECCA Breast Cancer Group. Br J Cancer. 2018 Jul;119(1):121–9. doi: 10.1038/s41416-018-0090-1.
- 8. Pfeffer MA, Claggett B, Assmann SF, Boineau R, Anand IS, Clausell N, et al. Regional variation in patients and outcomes in the treatment of preserved cardiac function heart failure with an aldosterone antagonist (TOPCAT) trial. Circulation. 2015 Jan 6;131(1):34–42. doi: 10.1161/CIRCULATIONAHA.114.013255.
- 9. Chen R, Ryan P, Natarajan K, Falconer T, Crew KD, Reich CG, et al. Treatment patterns for chronic comorbid conditions in patients with cancer using a large-scale observational data network. JCO Clinical Cancer Informatics. 2020 Sep. pp. 171–83.
- 10. Coate L, Cuffe S, Horgan A, Hung RJ, Christiani D, Liu G. Germline Genetic Variation, Cancer Outcome, and Pharmacogenetics. JCO. 2010 Sep 10;28(26):4029–37. doi: 10.1200/JCO.2009.27.2336.
- 11. Goodman CW, Brett AS. Race and pharmacogenomics—personalized medicine or misguided practice? JAMA. 2021 Feb 16;325(7):625. doi: 10.1001/jama.2020.25473.
- 12. O’Connor GT, Quinton HB, Traven ND, Ramunno LD, Dodds TA, Marciniak TA, et al. Geographic variation in the treatment of acute myocardial infarction: The Cooperative Cardiovascular Project. JAMA. 1999 Feb 17;281(7):627. doi: 10.1001/jama.281.7.627.
- 13. Kolak M, Bhatt J, Park YH, Padron NA, Molefe A. Quantification of neighborhood-level social determinants of health in the continental United States. JAMA Netw Open. 2020 Jan 29;3(1):e1919928. doi: 10.1001/jamanetworkopen.2019.19928.
- 14. Keikes L, Koopman M, Stuiver MM, Lemmens VEPP, van Oijen MGH, Punt CJA. Practice variation on hospital level in the systemic treatment of metastatic colorectal cancer in The Netherlands: a population-based study. Acta Oncologica. 2020 Apr 2;59(4):395–403. doi: 10.1080/0284186X.2020.1722320.
- 15. Sun L, Liu C, Guo C, Xiong H, Xie Y. Data-driven automatic treatment regimen development and recommendation. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. pp. 1865–74. Available from: https://dl.acm.org/doi/10.1145/2939672.2939866.
- 16. Chen J, Guo C, Sun L, Lu M. Mining typical treatment duration patterns for rational drug use from electronic medical records. J Syst Sci Syst Eng. 2019 Oct;28(5):602–20.
- 17. Chen J, Li K, Rong H, Bilal K, Yang N, Li K. A disease diagnosis and treatment recommendation system based on big data mining and cloud computing. Information Sciences. 2018 Apr;435:124–49.
- 18. Johnson A., Bulgarelli L., Pollard T., Horng S., Celi L. A., Mark R. 2021. MIMIC-IV (version 1.0). PhysioNet.
- 19. Goldberger A., Amaral L., Glass L., Hausdorff J., Ivanov P. C., Mark R., ..., Stanley H. E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online] 2000;101(23):e215–e220. doi: 10.1161/01.cir.101.23.e215.
- 20. Roschewski M, Staudt LM, Wilson WH. Diffuse large B-cell lymphoma—treatment approaches in the molecular era. Nat Rev Clin Oncol. 2014 Jan;11(1):12–23. doi: 10.1038/nrclinonc.2013.197.
- 21. Cheson BD, Nowakowski G, Salles G. Diffuse large B-cell lymphoma: new targets and novel therapies. Blood Cancer J. 2021 Apr;11(4):68. doi: 10.1038/s41408-021-00456-w.
- 22. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83. doi: 10.1016/0021-9681(87)90171-8.
- 23. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Medical Care. 1998;36(1):8–27. doi: 10.1097/00005650-199801000-00004.
- 24. Ho C-L, Lu C-S, Chen J-H, Chen Y-G, Huang T-C, Wu Y-Y. Neutrophil/lymphocyte ratio, lymphocyte/monocyte ratio, and absolute lymphocyte count/absolute monocyte count prognostic score in diffuse large B-cell lymphoma: Useful prognostic tools in the rituximab era. Medicine. 2015 Jun;94(24):e993. doi: 10.1097/MD.0000000000000993.
- 25. McFadden D. Zarembka P., editor. (1973) Conditional logit analysis of qualitative choice be. Frontiers in Econometrics. pp. 105–142.
- 26. Cox D.R., Snell E.J. Routledge; 1970. Analysis of binary data (2nd ed.)
- 27. Nagelkerke NJD. A note on a general definition of the coefficient of determination. Biometrika. 1991;78(3):691–2.
- 28. Floeter AE, Patel A, Tran M, Chamberlain MC, Hendrie PC, Gopal AK, et al. Posterior Reversible Encephalopathy Syndrome Associated With Dose-adjusted EPOCH (Etoposide, Prednisone, Vincristine, Cyclophosphamide, Doxorubicin) Chemotherapy. Clinical Lymphoma Myeloma and Leukemia. 2017 Apr;17(4):225–30. doi: 10.1016/j.clml.2016.12.004.
- 29. White Wong Doo, Bassett Martin, Harrison Prince, et al. The use of optimal treatment for DLBCL is improving in all age groups and is a key factor in overall survival, but non-clinical factors influence treatment. Cancers. 2019 Jul 2;11(7):928. doi: 10.3390/cancers11070928.
- 30. Johnson L, Shapiro M, Mankoff J. Removing the mask of average treatment effects in chronic Lyme disease research using big data and subgroup analysis. Healthcare. 2018 Oct 12;6(4):124. doi: 10.3390/healthcare6040124.
- 31. Bakhshi TJ, Georgel PT. Genetic and epigenetic determinants of diffuse large B-cell lymphoma. Blood Cancer J. 2020 Dec;10(12):123. doi: 10.1038/s41408-020-00389-w.
- 32. Harari Turquie M, Alencar AJ. DLBCL in Hispanics: The University of Miami (UM) experience. JCO. 2018;36(15_suppl):e19526–e19526.
- 33. Yang X, Laliberté F, Germain G, Raut M, Duh MS, Sen SS, et al. Real-World Characteristics, Treatment Patterns, Health Care Resource Use, and Costs of Patients with Diffuse Large B-Cell Lymphoma in the U.S. The Oncologist. 2021 May 1;26(5):e817–26. doi: 10.1002/onco.13721.