Neuro-Oncology. 2023 Dec 23;26(6):1163–1170. doi: 10.1093/neuonc/noad249

Developing a computable phenotype for glioblastoma

Sandra Yan 1,#, Kaitlyn Melnick 2,#, Xing He 3,4, Tianchen Lyu 5,6, Rachel S F Moor 7, Megan E H Still 8, Duane A Mitchell 9, Elizabeth A Shenkman 10,11, Han Wang 12, Yi Guo 13,14, Jiang Bian 15,16,3, Ashley P Ghiaseddin 17,3,
PMCID: PMC11145437  PMID: 38141226

Abstract

Background

Glioblastoma is the most common malignant brain tumor, and thus it is important to be able to identify patients with this diagnosis for population studies. However, this can be challenging as diagnostic codes are nonspecific. The aim of this study was to create a computable phenotype (CP) for glioblastoma multiforme (GBM) from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR).

Methods

We used the University of Florida (UF) Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. We performed multiple iterations to refine the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords through manual chart review of patient data. We then evaluated the performances of various possible proposed CPs constructed from the relevant codes and keywords.

Results

We conducted six rounds of manual chart review to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the best F1-score. Overall, the CP rule “if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword” demonstrated the highest F1-score using both structured and unstructured data and was therefore selected as the best-performing CP rule.

Conclusions

We developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance, which minimizes potential bias from misclassification errors.

Keywords: computable phenotype, Electronic Health Records (EHRs), glioblastoma, structured data, unstructured data


Key Points.

  • Nonspecific diagnostic codes make it difficult to identify glioblastoma multiforme (GBM) patients for population studies.

  • Computable phenotypes can be used to identify patients with specific conditions in large electronic health records.

  • We created a computable phenotype for GBM patients using structured and unstructured data.

Importance of the Study.

Glioblastoma is the most common malignant brain tumor, and as such, it is important to conduct population studies that investigate this group. Many studies have been limited in size and biased by a lack of racial and ethnic diversity. Identifying appropriate patients is further complicated by nonspecific diagnostic codes. Computable phenotypes (CPs) can be used to identify patients with glioblastoma for large population studies, so we designed a CP for GBM patients using structured and unstructured data.

Glioblastoma multiforme (GBM) is a devastating illness with a poor prognosis: median survival is 15 months after diagnosis, even with surgical gross total resection followed by adjuvant radiation and chemotherapy.1 GBM is the most common malignant brain tumor, and an estimated 13 000 people in the United States alone received a GBM diagnosis in 2022.2 Much of the current knowledge regarding the natural history of GBM comes from small randomized trials at academic health centers. Unfortunately, such studies typically lack large sample sizes and may have limited generalizability, since patients with poor functional status and/or significant disease burden are often excluded from these trials. There is also a concern for selection bias given the homogeneity of large GBM clinical trials, which are overwhelmingly carried out at tertiary care centers. Furthermore, it is well established that oncological clinical trials have poor racial and ethnic diversity,3 which highlights health disparities faced by racial–ethnic minorities. Blacks, Hispanics, and Asians are underrepresented in brain tumor trials,4 and there are well-documented racial–ethnic disparities in optimal treatment and subsequent survival of brain tumors. Prior studies also lack vital information on patients’ medical history, family history, and social determinants of health, which are important for the diagnosis and prognosis of GBM.

In recent years, hospital systems across the country have widely adopted electronic health record (EHR) systems. The United States Food and Drug Administration (FDA) has called for the use of real-world data (RWD), such as information from EHRs and administrative claims, to generate real-world evidence for the medical literature. These data reflect patients treated in real-world clinical settings, leading to greater generalizability.5–7 Thus, there has been a push to develop larger clinical research networks to answer the FDA’s call for RWD and real-world evidence. The national Patient-Centered Outcomes Research Network is one of the most prominent examples of this type of large clinical research network; it has collated and linked EHR and claims data from more than 80 million Americans.8 RWD, especially from EHRs, contain extensive longitudinal sociodemographic and clinical information on patients, in the form of both structured data (eg, diagnoses, medications, procedures, and laboratory results) and unstructured data (eg, clinicians’ notes, discharge summaries, pathology and radiology reports), which provide an opportunity for clinical research and population studies in the form of real-world evidence. This can be applied to GBM, but we must first be able to accurately identify GBM patients from their EHRs.

Using diagnostic codes alone (ie, International Classification of Diseases—9th/10th revisions—Clinical Modification [ICD-9/10-CM]) to identify patients in RWD leads to substantial misclassification errors.9–11 Identifying patients by ICD-9 or ICD-10 codes alone is also nearly impossible for GBM population-based studies, as no ICD-10 diagnosis code is specific to GBM. The frequently used C71.* codes also cover other malignant brain tumors, such as anaplastic astrocytomas and high-grade oligodendrogliomas. To complicate matters further, secondary brain metastases, which fall under ICD-10 code C79.31, may be miscoded as C71.* due to misdiagnosis or provider coding error. Our group is not the first to discover how challenging it is to use existing diagnosis codes to accurately identify this patient population; Reed-Guy et al. attempted to use ICD-10 codes to identify GBM patients with deep venous thrombosis or pulmonary embolism but found that upon manual chart review, only 68.7% of possible study subjects were actually included in the study.12
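
The nonspecificity problem can be made concrete with a small sketch. The wildcard matcher and the toy patient list below are illustrative assumptions, not the authors' actual query logic; they simply show that a C71.* query flags every malignant primary brain tumor code, not only GBM.

```python
# Hypothetical sketch: why a C71.* wildcard is nonspecific for GBM.
def matches_wildcard(code: str, pattern: str) -> bool:
    """Match an ICD-10-CM code against a 'C71.*'-style wildcard."""
    if pattern.endswith(".*"):
        stem = pattern[:-2]
        return code == stem or code.startswith(stem + ".")
    return code == pattern

# Three patients who all carry a C71.* code, but only one truly has GBM.
patients = {
    "glioblastoma": "C71.9",
    "anaplastic_astrocytoma": "C71.1",
    "miscoded_metastasis": "C71.0",   # should have been coded C79.31
}

flagged = [name for name, code in patients.items()
           if matches_wildcard(code, "C71.*")]
print(flagged)  # all three patients are flagged, not just the GBM case
```

A correctly coded metastasis (C79.31) would not match, but as the text notes, miscoding makes even that boundary unreliable.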

This challenge highlights the need for and impetus behind the creation of the computable phenotype (CP). CPs are clinical conditions, characteristics, or sets of clinical features that can be determined solely from EHRs and ancillary data sources. CPs have been developed for various clinical conditions but have not previously been created for GBM.13–15 Traditionally, CPs only consider structured data elements (eg, coded data such as diagnostic and procedural codes). Leveraging both structured EHR data and unstructured narratives (eg, keywords of a condition) can significantly improve the performance of CPs.9,11 The purpose of this study was to develop and validate a CP for GBM that uses both structured and unstructured EHR data and can then be used to conduct large-scale population and epidemiologic studies.

Methods

Data Source

To develop the CP, we used data from the University of Florida (UF) Health system, which is a large tertiary academic center in north central Florida. The UF Health Integrated Data Repository is a centralized clinical data warehouse that aggregates and stores clinical and research data from various sources within the UF Health system, including the Epic EHR system. This study received approval from the UF Institutional Review Board.

Overall Study Design

Figure 1 demonstrates the 2-step approach that was taken to develop the GBM CP: Step 1—refinement of the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords through manual chart review of patient EHR data, and Step 2—evaluation of the performances of various possible proposed CPs constructed from the GBM-relevant codes and keywords.

Figure 1.

Workflow for the development of the computable phenotype algorithm.

Step 1: Refining the Computable Phenotype

Through literature review and discussions with clinicians, we created an initial list of EHR criteria that could serve as the basis for identifying patients with GBM, including diagnosis codes (eg, ICD-9/10-CM codes), procedure codes (eg, CPT codes), medications (eg, temozolomide), and indicative keywords (eg, “GBM,” “Glioblastoma,” etc.). For these 4 categories, the codes and keywords designed to capture the broadest group of patients (eg, any keyword even remotely related to GBM) were initially used as a wide-net algorithm to identify potential GBM patients. These wide-net criteria had high sensitivity (ie, most patients who truly have GBM were identified) but low positive predictive value (PPV) (ie, many of the identified potential GBM patients may not actually have the disease). Through multiple rounds of manual chart review, we refined the algorithm to exclude any codes and keywords that led to a low PPV.

We used an iterative process to refine the CP algorithm, where each iteration involved 4 steps: (1) preparing or updating the CP elements and their associated codes or keywords; (2) creating rules from these updated CP elements (eg, for the CP element “diagnosis code,” possible CP rules include “≥1 diagnosis code,” “≥2 diagnosis codes,” etc.); (3) generating candidate CP algorithms by combining these CP rules; and (4) sampling patients for chart review using a flexible sampling strategy.

In each iteration, we split the code/keyword sets of the 4 basic CP rules (ie, diagnoses, procedures, medications, and keywords) into 3 groups according to how strongly they indicated GBM: highly likely, low likely, and highly unlikely. For each likelihood group, we created rules and different combinations of these rules to sample patients for chart review. Two reviewers performed manual chart review to determine whether each patient had GBM; a third reviewer resolved any conflicts. Based on the results of the manual chart review, we determined whether specific code/keyword sets performed poorly (low PPV) and needed to be excluded. The updated CP elements and code/keyword sets then served as the starting point for the next round of refinement. We continued refining until no code/keyword sets needed to be excluded and all remaining sets had high PPVs. These stable code/keyword sets became the final sets for the 4 CP elements, from which we created CP rules and different combinations of these rules. More patients were sampled for specific combinations so that the distribution of the number of reviewed patients matched the distribution of the number of patients identified by the CP rule combinations in the patient data.
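
The core of each refinement iteration is computing a per-code PPV against the chart-review labels and dropping poorly performing codes. The sketch below is illustrative only: the codes, review labels, and threshold are hypothetical, and the paper's actual criterion was excluding code sets that showed low (eg, zero) PPV on review.

```python
# Illustrative sketch of one Step-1 refinement iteration: estimate each
# candidate code's PPV from manual chart-review results and keep only
# codes above a threshold. All data here are hypothetical.
from collections import defaultdict

def prune_codes(reviews, threshold=0.0):
    """reviews: list of (code, is_gbm) pairs from chart review.
    Returns the set of codes whose PPV exceeds the threshold."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for code, is_gbm in reviews:
        totals[code] += 1
        hits[code] += int(is_gbm)
    return {code for code in totals
            if hits[code] / totals[code] > threshold}

reviews = [
    ("C71.9", True), ("C71.9", True), ("C71.9", False),  # PPV 0.67 -> keep
    ("C72.0", False), ("C72.0", False),                  # PPV 0.00 -> drop
]
print(prune_codes(reviews))  # {'C71.9'}
```

The surviving set would then seed the next iteration's rules, mirroring the loop described above.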

Step 2: Evaluation of the Computable Phenotype Rules to Derive the Final Computable Phenotype Algorithm

Once the final CP elements and their corresponding code/keyword sets were obtained, the performance of each possible CP rule was evaluated (Table 4). Note that when evaluating the performance of the CP rules, elements not marked with a “+” in Table 4 were treated as unrestricted rather than excluded. Using the chart-reviewed samples as the gold standard, we assessed the specificity, sensitivity, PPV, negative predictive value (NPV), and F1-score of the CP rules under 2 scenarios: (1) the use of structured data alone, and (2) the use of both structured and unstructured data.
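
These five metrics follow directly from the confusion matrix of CP predictions against the chart-review gold standard. A minimal sketch, using hypothetical labels rather than the study's data:

```python
# Minimal sketch of the Step-2 evaluation: specificity, sensitivity,
# PPV, NPV, and F1 from gold-standard labels vs a CP rule's predictions.
# The label vectors below are hypothetical.
def evaluate(gold, pred):
    tp = sum(g and p for g, p in zip(gold, pred))          # true positives
    tn = sum(not g and not p for g, p in zip(gold, pred))  # true negatives
    fp = sum(not g and p for g, p in zip(gold, pred))      # false positives
    fn = sum(g and not p for g, p in zip(gold, pred))      # false negatives
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sens / (ppv + sens)   # harmonic mean of PPV and sensitivity
    return {"sensitivity": sens, "specificity": spec,
            "PPV": ppv, "NPV": npv, "F1": f1}

gold = [True, True, True, False, False, False]   # chart-review result
pred = [True, True, False, True, False, False]   # CP rule output
metrics = evaluate(gold, pred)
print(round(metrics["F1"], 3))
```

Because F1 balances PPV against sensitivity, selecting the final algorithm by best F1 (as done here) penalizes both over-broad and over-narrow rules.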

Table 3.

Final computable phenotype rule elements.

Code/keyword sets
Diagnosis codes
ICD-9-CM 191.* – “Malignant neoplasm of brain” and 192.2 – “Malignant neoplasm of spinal cord”
ICD-10-CM C71.* – “Malignant neoplasm of brain” and C72.0 – “Malignant neoplasm of spinal cord”
Procedure codes (selected examples)
CPT eg, 61140 – “Burr hole(s) or trephine; with biopsy of brain or intracranial lesion,” 61304 – “Craniectomy or craniotomy, exploratory; supratentorial,” 61305 – “Craniectomy or craniotomy, exploratory; infratentorial (posterior fossa),” 61333 – “Exploration of orbit (transcranial approach), with removal of lesion,” etc.
Medication codes (selected examples)
RxNorm eg, 261290, 835948, 261291, 705617, 261289, etc.
Keywords
Glioblastoma,” “gliosarcoma,” “gbm,” “Optune,” “Novocure

Results

GBM Computable Phenotype Development and Refinement

In total, 6 rounds of manual chart review were conducted to refine the CP elements. Table 1 shows the sampling strategy, the number of patients sampled, the number of patients reviewed, and the refinement strategy for each round of manual chart review. A total of 347 patients’ charts were manually reviewed. Table 2 shows the demographics of the included patients.

Table 1.

Six rounds of computable phenotype rule element refinement

Round 1
- Base rules: ≥1 diagnosis code; ≥1 procedure code; ≥1 medication code; ≥1 keyword.
- Sample selection: randomly selected 1% of patients meeting each base rule combination (minimum 3).
- Patients sampled: 124; patients reviewed: 5.
- Refinement strategy: split the code sets for diagnoses, procedures, and medications into 3 groups: (1) highly likely, (2) low likely, and (3) highly unlikely; excluded incorrect codes.

Round 2
- Base rules: ≥1 highly likely diagnosis code; ≥1 low likely diagnosis code; ≥1 highly unlikely diagnosis code; ≥1 highly likely procedure code; ≥1 low likely procedure code; ≥1 highly unlikely procedure code; ≥1 highly likely medication code; ≥1 low likely medication code; ≥1 highly unlikely medication code; ≥1 keyword.
- Sample selection: randomly selected 1% of patients meeting each base rule combination (minimum 3).
- Patients sampled: 375; patients reviewed: 36.
- Refinement strategy: all reviewed patients from the “highly unlikely” groups were negative, and some codes marked “highly likely” or “low likely” were also inaccurate; codes with a PPV of 0 were excluded.

Round 3
- Base rules: same as round 2.
- Sample selection: randomly selected 1% of patients meeting each base rule combination (minimum 3).
- Patients sampled: 285; patients reviewed: 148.
- Refinement strategy: refined the code sets to keep only codes with high PPV.

Round 4
- Base rules: ≥1 diagnosis code; ≥1 procedure code; ≥1 medication code; ≥1 keyword.
- Sample selection: randomly selected 1% of patients meeting each base rule combination (minimum 3); for combinations containing both “≥1 diagnosis code” and “≥1 keyword,” randomly selected 5% (minimum 5).
- Patients sampled: 19; patients reviewed: 19.
- Refinement strategy: none; reviewed more patients to increase the sample size.

Round 5
- Base rules: same as round 4.
- Sample selection: randomly selected 1% of patients meeting each base rule combination (minimum 3); for combinations containing both “≥1 diagnosis code” and “≥1 keyword,” randomly selected 10% (minimum 5).
- Patients sampled: 64; patients reviewed: 64.
- Refinement strategy: none; reviewed more patients to increase the sample size.

Round 6
- Base rules: same as round 4.
- Sample selection: same as round 5.
- Patients sampled: 80; patients reviewed: 80.
- Refinement strategy: none; reviewed more patients to increase the sample size.

Table 2.

Demographics of patients identified as GBM with the CP who had at least 2 encounters within a 2-month period with a diagnosis consistent with glioblastoma between 1/1/2012 and 12/31/2020

| Characteristic | Structured + unstructured, best F1 (N = 642) | Structured + unstructured, best PPV (N = 177) | Structured only, best F1 (N = 1268) | Structured only, best PPV (N = 151) |
|---|---|---|---|---|
| Gender: female | 262 (40.8%) | 69 (39.0%) | 555 (43.8%) | 62 (41.1%) |
| Gender: male | 380 (59.2%) | 108 (61.0%) | 713 (56.2%) | 89 (58.9%) |
| Age <50 | 92 (14.3%) | 28 (15.8%) | 225 (17.7%) | 26 (17.2%) |
| Age ≥50 | 550 (85.7%) | 149 (84.2%) | 1043 (82.3%) | 125 (82.8%) |
| Non-Hispanic White | 539 (84.0%) | 159 (89.8%) | 1005 (79.3%) | 139 (92.1%) |
| Non-Hispanic Black | 40 (6.2%) | 6 (3.4%) | 116 (9.2%) | 6 (4.0%) |
| Asian and Pacific Islander | 4 (0.6%) | 1 (0.6%) | 8 (0.6%) | 0 |
| Hispanic | 22 (3.4%) | 6 (3.4%) | 42 (3.3%) | 4 (2.6%) |
| Others | 18 (2.8%) | 4 (2.3%) | 45 (3.5%) | 2 (1.3%) |
| Unknown | 19 (3.0%) | 1 (0.6%) | 52 (4.1%) | 0 |

Table 3 shows the final refined CP elements (see Supplementary Appendix for the complete list). Table 4 shows different combinations of the 4 base rules, the number of patients who met the criteria for each combination, the number of manual chart review samples, and the number of actual GBM patients in the samples.

Table 4.

Different combinations of 4 base rules, the number of patients who met the criteria for each combination, the number of manual chart review samples, and the number of actual GBM patients in the samples

| Diagnosis ≥1 | Procedure ≥1 | Medication ≥1 | Keyword ≥1 | Total # of patients | # of patients selected | # of GBM patients |
|---|---|---|---|---|---|---|
| + |  |  |  | 3660 | 230 | 98 |
|  | + |  |  | 2860 | 167 | 55 |
|  |  | + |  | 535 | 77 | 35 |
|  |  |  | + | 1348 | 170 | 99 |
| + | + |  |  | 1323 | 119 | 55 |
| + |  | + |  | 513 | 61 | 35 |
| + |  |  | + | 1260 | 141 | 98 |
|  | + | + |  | 384 | 49 | 26 |
|  | + |  | + | 838 | 99 | 55 |
|  |  | + | + | 423 | 43 | 35 |
| + | + | + |  | 380 | 45 | 26 |
| + | + |  | + | 784 | 83 | 55 |
| + |  | + | + | 422 | 42 | 35 |
|  | + | + | + | 324 | 32 | 26 |
| + | + | + | + | 324 | 32 | 26 |

Note that “+” means the patient MUST fulfill this rule element.

Performance Evaluation of Computable Phenotype Rules

The performance of the CP rules was evaluated using the manual chart review results as the gold standard. Inter-rater reliability, as measured by Cohen’s kappa coefficient, was 1.00. Table 5 presents the CP rules that yielded the best specificity, sensitivity, PPV, NPV, and F1-score under the 2 scenarios (structured data only, and both structured and unstructured data). The final CP algorithm for identifying GBM patients was selected based on the best F1-score.
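
Cohen's kappa corrects observed agreement between two raters for the agreement expected by chance. The short sketch below uses hypothetical rating vectors; the study's reported kappa of 1.00 corresponds to the two reviewers agreeing on every chart.

```python
# Illustrative computation of Cohen's kappa for two chart reviewers.
# Rating vectors are hypothetical (1 = GBM, 0 = not GBM).
def cohens_kappa(r1, r2):
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n                 # observed agreement
    labels = set(r1) | set(r2)
    pe = sum((r1.count(l) / n) * (r2.count(l) / n) for l in labels)  # chance agreement
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

kappa = cohens_kappa([1, 0, 1, 1], [1, 0, 1, 1])  # perfect agreement
print(kappa)  # 1.0
```

Kappa of 1.0 only arises when the raters agree on every item, so a reported kappa of 1.00 implies no conflicts needed third-reviewer adjudication.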

As seen in Table 5, for CP rules based on structured data only, the highest specificity was 0.906 with the CP rule of “if the patient had at least 1 relevant diagnosis code, at least 1 relevant procedure code, and at least 1 relevant medication code.” The highest sensitivity was 0.990 with the CP rule of “if the patient had at least 1 relevant diagnosis code.” The highest PPV was 0.578 with the CP rule of “if the patient had at least 1 relevant diagnosis code, at least 1 relevant procedure code, and at least 1 relevant medication code.” The highest NPV was 0.986 with the CP rule of “if the patient had at least 1 relevant diagnosis code.” Overall, when considering structured data alone, the best-performing CP rule was “if the patient had at least 1 relevant diagnosis code” with an F1-score of 0.596.

Table 5.

Best-performing computable phenotype rules and their performance with 95% confidence intervals

Structured data only

| Metric | Best CP rule | Specificity | Sensitivity | PPV | NPV | F1-score |
|---|---|---|---|---|---|---|
| Best specificity | ≥1 diagnosis code, ≥1 procedure code, and ≥1 medication code | 0.906 [0.821, 0.992] | 0.263 [0.134, 0.391] | 0.578 [0.433, 0.722] | 0.716 [0.584, 0.848] | 0.361 [0.221, 0.501] |
| Best sensitivity | ≥1 diagnosis code | 0.350 [0.254, 0.446] | 0.990 [0.970, 1.000] | 0.426 [0.327, 0.526] | 0.986 [0.963, 1.000] | 0.596 [0.497, 0.694] |
| Best PPV | ≥1 diagnosis code, ≥1 procedure code, and ≥1 medication code | 0.906 [0.821, 0.992] | 0.263 [0.134, 0.391] | 0.578 [0.433, 0.722] | 0.716 [0.584, 0.848] | 0.361 [0.221, 0.501] |
| Best NPV | ≥1 diagnosis code | 0.350 [0.254, 0.446] | 0.990 [0.970, 1.000] | 0.426 [0.327, 0.526] | 0.986 [0.963, 1.000] | 0.596 [0.497, 0.694] |
| Best F1-score | ≥1 diagnosis code | 0.350 [0.254, 0.446] | 0.990 [0.970, 1.000] | 0.426 [0.327, 0.526] | 0.986 [0.963, 1.000] | 0.596 [0.497, 0.694] |

Structured and unstructured data

| Metric | Best CP rule | Specificity | Sensitivity | PPV | NPV | F1-score |
|---|---|---|---|---|---|---|
| Best specificity | ≥1 procedure code, ≥1 medication code, and ≥1 keyword | 0.970 [0.912, 1.000] | 0.263 [0.110, 0.415] | 0.813 [0.677, 0.948] | 0.730 [0.576, 0.884] | 0.397 [0.227, 0.566] |
| Best sensitivity | ≥1 diagnosis code and ≥1 keyword | 0.788 [0.673, 0.904] | 0.990 [0.962, 1.000] | 0.695 [0.565, 0.825] | 0.994 [0.972, 1.000] | 0.817 [0.707, 0.926] |
| Best PPV | ≥1 diagnosis code, ≥1 medication code, and ≥1 keyword | 0.966 [0.852, 1.000] | 0.354 [0.057, 0.650] | 0.833 [0.602, 1.000] | 0.754 [0.487, 1.000] | 0.496 [0.187, 0.806] |
| Best NPV | ≥1 diagnosis code and ≥1 keyword | 0.788 [0.673, 0.904] | 0.990 [0.962, 1.000] | 0.695 [0.565, 0.825] | 0.994 [0.972, 1.000] | 0.817 [0.707, 0.926] |
| Best F1-score | ≥1 diagnosis code and ≥1 keyword | 0.788 [0.673, 0.904] | 0.990 [0.962, 1.000] | 0.695 [0.565, 0.825] | 0.994 [0.972, 1.000] | 0.817 [0.707, 0.926] |

For CP rules based on both structured and unstructured data, the highest specificity was 0.970 with the CP rule “if the patient had at least 1 relevant procedure code, at least 1 relevant medication code, and at least 1 relevant keyword.” The highest sensitivity was 0.990 with the CP rule “if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword.” The highest PPV was 0.833 with the CP rule “if the patient had at least 1 relevant diagnosis code, at least 1 relevant medication code, and at least 1 relevant keyword.” The highest NPV was 0.994 with the CP rule “if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword.” Overall, the CP rule “if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword” demonstrated a marked improvement in the F1-score, from 0.596 with structured data only to 0.817, and was therefore selected as the best-performing CP rule for identifying GBM patients.

Discussion

In this study, we developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance, which minimizes potential bias from misclassification errors.

As illustrated in Table 5, most performance measures improved when unstructured data were added to the CP rules; requiring a keyword alongside an otherwise nonspecific ICD-10 code makes the rule markedly more specific. While the best-performing CP algorithm of “at least 1 relevant diagnosis code and at least 1 relevant keyword” had an F1-score of 0.817, the PPV was only 0.695, which indicates a relatively high number of false positives. This is likely due to the inadequacy of current methods for identifying GBM patients, which typically capture patients with other types of cerebral malignancies as well.

The development of a CP algorithm for identifying patients with GBM is, to the authors’ knowledge, the first of its kind. Some of the difficulty in identifying patients with GBM stems from the lack of a standardized diagnosis code, which forces clinicians to manually review thousands of charts to determine whether a patient actually had GBM. The developed CP algorithm is automated and bypasses this intensive manual chart review process. Additionally, because the final CP algorithm is relatively straightforward and only requires a diagnosis code and a keyword, it can be applied, with minimal local refinement effort, in healthcare systems that do not necessarily use the same EHR platforms, supporting multi-institutional collaborative efforts to improve research and optimize patient care.

Prior studies on health outcomes in GBM patients have mostly used data from the Surveillance, Epidemiology, and End Results (SEER) program, which collects data on patient demographics, tumor site and morphology, courses of treatment, and survival. As a result, most analyses of the SEER database have focused on predicting survival rates of GBM patients. Very little is known regarding the social determinants of health, family history, and past medical history of GBM patients and whether these factors affect the development and prognosis of the disease. The GBM CP algorithm and its utility in quickly identifying GBM patients can support future studies investigating additional medical and social factors that may be associated with GBM outcomes. To that end, we plan to utilize this CP algorithm and expand upon existing data by collaborating with other institutions, with the eventual goal of applying this algorithm to large clinical data networks such as the OneFlorida+ Clinical Research Consortium, of which UF Health is a member and which covers ~20 million patients across Florida, Georgia, and Alabama. This would enable researchers to analyze GBM patients’ demographics and medical comorbidities at scale.

Conclusions

In this study, we developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The CP will support future studies that examine medical and social factors that may be associated with GBM outcomes.

Supplementary Material

noad249_suppl_Supplementary_Figure_S1
noad249_suppl_Supplementary_Table_S1

Contributor Information

Sandra Yan, Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA.

Kaitlyn Melnick, Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA.

Xing He, Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.

Tianchen Lyu, Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.

Rachel S F Moor, Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA.

Megan E H Still, Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA.

Duane A Mitchell, Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA.

Elizabeth A Shenkman, Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.

Han Wang, Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.

Yi Guo, Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.

Jiang Bian, Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.

Ashley P Ghiaseddin, Department of Neurosurgery, College of Medicine, University of Florida, Gainesville, Florida, USA.

Funding

Research reported in this publication was supported by the University of Florida Clinical and Translational Science Institute, which is supported in part by the National Institutes of Health National Center for Advancing Translational Sciences under award number UL1TR001427. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Conflict of interest statement

D.A.M. holds patented technologies that have been licensed or have exclusive options to license to Celldex Therapeutics, Annias, Immunomic Therapeutics, and iOncologi. D.A.M. received research funding from Immunomic Therapeutics. D.A.M. serves/served as an advisor/consultant to Bristol-Myers Squibb, Tocagen, Oncorus, and RM Global. D.A.M. is co-founder of iOncologi, Inc., an immuno-oncology biotechnology company. A.P.G. serves/served as an advisor/consultant to Neosoma and Monteris Medical. A.P.G. has received honoraria for advisory board participation from Alexion Pharmaceuticals, Servier, and Aptitude Health. A.P.G. has held stock in Viatris Inc. All other authors have no conflicts to report.

Authorship statement

Each author contributed to the text according to his/her specialty and each author reviewed and approved the final version of the manuscript.

Data Availability

The data generated in this study will be made available upon reasonable request by contacting the corresponding author.
