Abstract
Cancer immunotherapy is a rapidly growing field that is transforming oncology care. Mining this knowledge base for biomedically important information is becoming increasingly challenging, owing to the expanding number of scientific publications and the dynamic evolution of the subject over time. In this study, we employed a literature-mining approach to analyze the cancer immunotherapy-related publications listed in PubMed and quantify emerging trends. A total of 93,033 publications from 5055 journals were retrieved, and 141 meaningful topics were identified and classified into eight distinct categories. Statistical analysis indicates a mean annual increase in the number of published papers of approximately 8% in the last 20 years. The research topics that exhibited the strongest positive trends included “immune checkpoint inhibitors,” “tumor microenvironment,” “HPV vaccination,” “CAR T-cells,” and “gene mutations/tumor profiling.” The top identified cancer types included “lung,” “colorectal,” and “breast cancer,” and a shift in popularity from hematological to solid tumors was observed. As regards clinical research, a transition from early phase clinical trials to randomized control trials was recorded, indicating that the field is entering a more advanced phase of development. Overall, this mining approach provided an unbiased analysis of the cancer immunotherapy literature in a time-conserving and scale-efficient manner.
Electronic supplementary material
The online version of this article (10.1007/s00262-020-02630-8) contains supplementary material, which is available to authorized users.
Keywords: Cancer immunotherapy, Literature mining, Trends analysis, Topic modeling, LDA
Introduction
Cancer immunotherapy is a rapidly growing field that is currently considered the “fifth pillar” of cancer therapy, joining the ranks of surgery, cytotoxic chemotherapy, radiation, and targeted therapy as a promising, innovative approach for combating cancer [1]. This stems from the fact that cancer cells express tumor antigens that can be detected and potentially eliminated by the immune system [2]. This can be achieved actively, by directly targeting tumor antigens, or passively, by enhancing existing anti-tumor responses with the use of monoclonal antibodies, lymphocytes, or cytokines [3].
Our knowledge of the relationship between cancer and the immune system has increased considerably in the last two decades [4]. However, the concept of an interaction between malignant disease and the immune system was postulated long ago [5]. Hallmarks of this scientific tradition include the discovery of dendritic cells [6], the use of Bacillus Calmette–Guerin (BCG) in bladder cancer [7], the administration of interleukin-2 (IL-2), interferon-alpha (IFN-α), and tumor necrosis factor (TNF) against hematological and solid tumors [8–10], and the discovery of the function of Toll-like receptors in immune system evasion, tumor growth, and survival [11].
Schreiber and his colleagues [12] describe the cancer immunoediting concept where the tumor immune system balance shifts among tumor escape, equilibrium, and elimination. Poor antigenic expression, immunosuppressive cytokines, myeloid-derived suppressor cells (MDSC), and expression of negative regulatory receptors on T cells assist in tumor escape. The tumor and the adaptive immune system coexist in the equilibrium phase where the immune system creates a growth-inhibitory environment, and antigenic tumor outgrowths are kept in check. In tumor elimination, which often occurs in early tumor development, highly antigenic tumor clones are recognized and eliminated by both innate and adaptive immune systems.
To date, several different types of cancer immunotherapies have been experimentally tested, are currently undergoing clinical trials, or have been approved by major regulatory organizations around the world [13]. The contemporary immunooncology landscape is teeming with an arsenal of active agents, molecular targets, novel therapy types, and cancer-specific applications [14–16]. This expanding universe of scientific information mirrors the complexity of the underlying cancer biology, the diversity of scientific disciplines and research methods used in the analysis, understanding, and manipulation of the immune system, and its ability to fight cancer [17].
Extracting and analyzing this information, in order to identify relevant subjects, explore topic dynamics, and generate new research hypotheses, is becoming increasingly challenging. It is nearly impossible to grasp the entire spectrum of the published literature and synthesize concepts in a systematic and unbiased way [18]. An alternative and more innovative approach for automatically retrieving information from a vast number of scientific publications is through literature mining [19]. Data mining methods have the unique potential of new knowledge discovery, enabling researchers to integrate various sources of information and capture vital scientific insights, in a time-conserving and scale-efficient manner [20, 21]. These methods have been successfully used in the past to inform current practice, update existing protocols, and drive research initiatives across many scientific disciplines [22–25].
In this study, we use a topic modeling approach to understand the historical evolution of cancer immunotherapy, recapture known facts, and identify new subjects and trends that are associated with this rapidly expanding field. One of the most popular topic modeling algorithms, latent Dirichlet allocation (LDA), has been used to pursue text mining of the large cancer immunotherapy corpus [26, 27], and resulting topics and trends are presented herein.
Materials and methods
The analysis of cancer immunotherapy literature is based on the following three-step approach proposed in Drosatos and Kaldoudi [28]:
1. A PubMed query was created to build the corpus of published literature on cancer immunotherapy.
2. The main topics in the area of cancer immunotherapy were identified via the latent Dirichlet allocation (LDA) unsupervised topic modeling algorithm [26].
3. The trends were derived based on the popularity of each topic per year.
These three steps are presented in detail in the following subsections.
Search strategy
The search strategy of our literature-mining analysis followed the core idea of a systematic review. It differs in that we aimed to identify a representative set of the literature (not necessarily a superset) relevant to the cancer immunotherapy field while minimizing the number of irrelevant articles. The articles resulting from this search strategy were then used for deriving topics and their trends.
Given that LDA uses a probabilistic and unsupervised approach, we decided to limit our search to the PubMed search engine [29], the most popular and comprehensive database for biomedical scientific literature, in order to avoid articles irrelevant to the desired field. The syntax of our search query consists of two parts: the synonym terms of “cancer” and the synonym terms of “immunotherapy.” This method is known as query expansion, and the synonym terms were extracted from the MeSH taxonomy, a comprehensive controlled vocabulary used for indexing articles in PubMed. The exact query was the following:
Each part of the query was designed to retrieve articles classified under the specific MeSH term and articles including any of the synonym terms in the title (TI) or abstract (AB) field. The two parts were intersected using the logical operator “AND” in order to retrieve the final corpus of articles published up to the end of 2018. The PubMed database was searched on October 3, 2019, and the results were downloaded in XML format using the provided export function.
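The two-part structure described above can be illustrated with a short Python sketch. It is not the query used in this study: the helper names `build_block` and `build_query` are hypothetical, and the synonym lists are small illustrative placeholders rather than the full set of MeSH-derived terms. Only the PubMed field tags `[MeSH Terms]` and `[Title/Abstract]` are taken from PubMed's search syntax. The publication-date cutoff is left out of the sketch.

```python
def build_block(mesh_term, synonyms):
    """OR together one MeSH term and its free-text synonyms (title/abstract)."""
    parts = [f'"{mesh_term}"[MeSH Terms]']
    parts.extend(f'"{term}"[Title/Abstract]' for term in synonyms)
    return "(" + " OR ".join(parts) + ")"


def build_query(cancer_block, immunotherapy_block):
    """Intersect the two expanded blocks with the logical operator AND."""
    return f"{cancer_block} AND {immunotherapy_block}"


# Illustrative synonym lists (assumptions, not the terms used in the study).
cancer = build_block("Neoplasms", ["cancer", "tumor", "neoplasm"])
immuno = build_block("Immunotherapy", ["immunotherapy", "immunotherapies"])
query = build_query(cancer, immuno)
print(query)
```

Keeping each synonym group in its own parenthesized block makes it straightforward to extend either side of the intersection without affecting the other.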
Topic modeling
Topic modeling algorithms are probabilistic methods that automatically identify topics from a large and unstructured collection of documents. In this work, we used the LDA algorithm [26, 27], specifically its scalable implementation in the MALLET toolkit (v2.0.8) [30], as proposed in Drosatos and Kaldoudi [28]. In order to avoid any noise in the topic modeling from the free text of articles, we applied the following cleaning process: (1) removed all punctuation and non-Latin characters; (2) excluded all stop words using the list in the Text Categorization Project [31]; (3) converted all words to their lemmas by applying the Krovetz stemming procedure [32]; and (4) excluded articles with no words in their abstracts or with fewer than 3 characters in their titles.
There are several heuristic approaches to tune the LDA algorithm for identifying a meaningful number of topics and iterations [33]. In order to determine the appropriate number of topics and iterations to be used as parameters in the LDA algorithm in this study, we followed the approach proposed in Drosatos and Kaldoudi [28]. Thus, we performed a series of exploration experiments using different numbers of topics (from 50 to 250, with a step of 10) at different numbers of iterations (i.e., 8000 and 10000 iterations). For each combination of number of topics and iterations, we repeated the experiment 10 times and calculated the similarity between successive repetitions using the Jaccard distance [34]. This distance was calculated between the list of the top 10 words defining each topic and the respective list of words in the other repetition. The final number of topics and iterations was selected where the percentage of topics with a distance less than or equal to 0.57143 reached a local maximum [28]. In practice, this means that matched topics share at least 6 of their top 10 words (i.e., ≥ 60%).
After the tuning process of the LDA algorithm, the automatically generated topics, each consisting of a weighted list of words, were screened, manually given a short label (title), and organized in conceptual categories. This process was independently performed by the authors of this paper for the whole list of topics, taking into account the top 20 words of each topic. Then, the researchers discussed their findings and agreed on a unified topic list with the respective categories.
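The four cleaning steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the preprocessing code used in the study: the stop-word set is a tiny placeholder for the Text Categorization Project list, `stem` is an identity placeholder standing in for the Krovetz stemmer, and lowercasing and keeping digits are assumptions that the text does not specify.

```python
import re

# Placeholder stop-word list (assumption); the study used the list from
# the Text Categorization Project.
STOP_WORDS = {"the", "of", "in", "and", "a", "is", "for", "with", "to"}


def stem(word):
    """Identity placeholder; a Krovetz stemmer would be applied here."""
    return word


def clean_text(text):
    """Steps 1-3: strip punctuation/non-Latin characters, drop stop words, stem."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # keeping digits is an assumption
    tokens = text.lower().split()                # lowercasing is an assumption
    return [stem(token) for token in tokens if token not in STOP_WORDS]


def keep_article(title, abstract):
    """Step 4: require a non-empty abstract and a title of at least 3 characters."""
    return len(title.strip()) >= 3 and len(abstract.split()) > 0


tokens = clean_text("The role of PD-1 in T-cells!")
print(tokens)
```

Note that replacing punctuation with spaces splits hyphenated terms such as "PD-1" into separate tokens; whether the original pipeline behaved the same way is not stated in the text.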
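The link between the 0.57143 threshold and the "6 of 10 words" rule can be checked with a short Python sketch. Two top-10 word lists that share 6 words have an intersection of 6 and a union of 14, so their Jaccard distance is 1 − 6/14 ≈ 0.5714. The function `stable_fraction` is a hypothetical helper: matching each topic to its closest counterpart in the other repetition is an assumption about how topics were paired, which the text does not detail.

```python
THRESHOLD = 0.57143  # value quoted in the text


def jaccard_distance(words_a, words_b):
    """1 - |A ∩ B| / |A ∪ B| for two word lists treated as sets."""
    a, b = set(words_a), set(words_b)
    return 1.0 - len(a & b) / len(a | b)


def stable_fraction(run_a, run_b, threshold=THRESHOLD):
    """Share of topics in run_a whose closest topic in run_b is within threshold."""
    stable = 0
    for topic in run_a:
        best = min(jaccard_distance(topic, other) for other in run_b)
        if best <= threshold:
            stable += 1
    return stable / len(run_a)


# Toy top-10 word lists: six shared words versus five shared words.
base = [f"w{i}" for i in range(10)]
six_shared = base[:6] + ["x0", "x1", "x2", "x3"]
five_shared = base[:5] + ["x0", "x1", "x2", "x3", "x4"]

print(jaccard_distance(base, six_shared) <= THRESHOLD)   # six shared words pass
print(jaccard_distance(base, five_shared) <= THRESHOLD)  # five shared words fail
```

With five shared words the distance rises to 1 − 5/15 ≈ 0.667, which exceeds the threshold, so 6 is the smallest overlap that counts as a stable topic.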
Trend analysis
The trend analysis of topics was based on the approach proposed in Drosatos and Kaldoudi [28]. First, the weight of each topic for each document was calculated as the percentage of the document words belonging to a topic. Then, the popularity of the topic defined as the yearly topic contribution estimate P(t,y) of the topic (t) for each year (y) was calculated as the average weight of this topic for all documents published that year Dy:
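$$P(t, y) = \frac{1}{|D_y|} \sum_{d \in D_y} \frac{\left|\{\, w \in d : w \text{ is assigned to topic } t \,\}\right|}{|d|} \tag{1}$$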
where t represents a topic and w is a word in document d of the documents’ collection Dy for year y. Accordingly, the overall popularity of a topic was defined as the overall topic contribution estimate, calculated as the average weight of this topic for all documents included in the corpus.
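A minimal Python sketch of these two estimates is given below. It assumes, for illustration only, that each document is available as a list of per-word topic assignments; in practice the per-document topic weights would come from the output of the topic modeling tool. The function names and the toy corpus are hypothetical.

```python
from collections import defaultdict


def topic_weight(word_topics, topic):
    """Fraction of a document's words assigned to `topic`."""
    return sum(1 for t in word_topics if t == topic) / len(word_topics)


def yearly_contribution(docs, topic):
    """Average topic weight over the documents of each year, i.e. P(t, y)."""
    weights_by_year = defaultdict(list)
    for year, word_topics in docs:
        weights_by_year[year].append(topic_weight(word_topics, topic))
    return {year: sum(w) / len(w) for year, w in weights_by_year.items()}


def overall_contribution(docs, topic):
    """Average topic weight over all documents in the corpus."""
    weights = [topic_weight(word_topics, topic) for _, word_topics in docs]
    return sum(weights) / len(weights)


# Toy corpus: (publication year, topic id assigned to each word).
docs = [
    (2017, [0, 0, 1, 2]),
    (2017, [1, 1, 1, 1]),
    (2018, [0, 1]),
]
per_year = yearly_contribution(docs, topic=0)
overall = overall_contribution(docs, topic=0)
print(per_year, overall)
```

Because every document counts equally in the average, a short abstract contributes as much to the yearly estimate as a long one.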
Finally, we applied a moving average (over a 3-year interval) to smooth out short-term fluctuations. Furthermore, we used the linear regression coefficient to identify the positive or negative trend of each topic. The overall topic modeling, labeling, and trend analysis process used in this paper can also be performed via a web-based platform (TM-Toolkit), which allows biomedical researchers with no experience in data modeling and programming to execute topic modeling and trend analysis of the literature using the PubMed database [35].
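The smoothing and trend estimation can be sketched in Python as follows. Several details are assumptions, since the text does not specify them: the moving average is centered and uses the available values at the edges of the series, and the regression is an ordinary least-squares fit of popularity against year. The function names and the toy series are hypothetical.

```python
def moving_average(values, window=3):
    """Centered moving average; edge points average the values available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def linear_trend(years, values):
    """Least-squares slope (regression coefficient) and R-squared."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, r_squared


# Toy yearly popularity series (illustrative values only).
years = [2014, 2015, 2016, 2017, 2018]
popularity = [1.0, 2.0, 3.0, 4.0, 5.0]

smoothed = moving_average(popularity)
slope, r_squared = linear_trend(years, popularity)
print(smoothed, slope, r_squared)
```

The sign of `slope` determines whether a topic is classed as a positive or negative trend, while `r_squared` indicates how well a straight line describes its trajectory.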
Results
Search results
The PubMed query (performed on October 3, 2019) returned 108,435 publications (total XML file size of 1.37 GB). Preprocessing excluded 15,402 publications with no abstracts (14.2% of all retrieved records). The final corpus included title, abstract, and keywords of 93,033 publications, corresponding to a total of 11,034,162 words and a vocabulary of 193,497 unique words.
The first record that was retrieved and included in the final corpus dates to 1922 and was published in the Journal of Experimental Medicine. The retrieved records were published in 5055 journals, which correspond to 15.5% of the total number of journals currently indexed in PubMed (as retrieved on 14-03-2020 from the online PubMed journal list available at https://www.nlm.nih.gov/bsd/serfile_addedinfo.html). The top 5 journal titles were: Cancer Research, AACR; Cancer Immunology, Immunotherapy, Springer; Journal of Immunology, AAI; Bone Marrow Transplantation, Nature; and Blood, ASH. Overall, the cancer immunotherapy domain corresponds to approximately 0.37% of the entire PubMed corpus (as of the end of 2018).
Figure 1 shows the distribution of publications, per year, that were included in the corpus as an absolute value and a percentage of the total number of articles indexed in PubMed each year. The cancer immunotherapy field shows a mean increase in the number of published papers of approximately 8% per annum during the last 20 years, and almost 16% in the last 5 years.
Topic modeling: resulting topics and categories
Screening of the 150 topics led to the identification of 141 meaningful topics (94% of all topics), which were organized into eight categories and five subcategories as follows:
Targeted immunotherapy: Twenty-two topics discussing key concepts of cancer immunotherapy such as “immune checkpoint inhibitors,” “PD-1/PD-L1,” “CAR T-cells,” “bispecific monoclonal antibodies,” “invariant natural killer T cells (iNKT),” and “dendritic cell-based vaccines.”
Cancer type: Twenty-one topics corresponding to specific cancer types (“breast cancer,” “colorectal cancer,” “melanoma,” “lymphoma,” “lung cancer,” etc.).
Diagnosis: Four topics discussing certain aspects of cancer diagnostics, namely “imaging,” “immunohistochemistry,” “carcinoembryonic antigen (CEA),” and “leukemia-associated phenotypic markers.”
Cancer therapies: Sixteen topics that include classical cancer therapies (e.g., “chemotherapy,” “radiotherapy,” “bone marrow transplantation,” etc.), and modern applications (e.g., “gene therapy,” “biological response modifiers,” etc.).
Clinical research: Eighteen topics addressing various aspects of clinical research such as “early phase clinical studies,” “randomized control trials,” “survival prognostic factors,” and “clinical study outcomes.”
Mechanisms of carcinogenesis: Twelve topics related to the pathogenesis and/or progression of cancer (e.g., “cancer stem cells,” “cell adhesion,” “lymphatic metastasis,” “cell signaling pathways,” “tumor microenvironment,” and “adaptive immune response”).
Translational research: a broad topic further classified into five subcategories, namely (i) animal research including topics such as “murine tumor models” and “rat studies and experiments,” (ii) cell type including “T regulatory cells,” “cytotoxic T lymphocytes,” “dendritic cells,” “TH1/TH2 response,” etc., (iii) methodology including “gene expression studies,” “cell lines/in vitro culture,” “flow cytometry,” “nanoparticles,” etc., (iv) pathways and physiology including “cell metabolism,” “apoptosis,” “microbiome,” “epigenetic regulation,” etc., and (v) protein-based including “tumor antigens,” “chemokine receptors and ligands,” “cytotoxic T lymphocytes epitopes,” etc.
General: Seven topics that include reviews on the development of new cancer therapies, including “systematic reviews and meta-analyses,” “clinical guidelines,” etc.
Table 1 (Appendix) contains the list of identified topics organized in the above categories. The overall popularity of each topic is presented as percentage of the overall topic contribution and is used to calculate the rank of the topic in the entire list. (Most popular topic is ranked first.) Within each category, topics are organized in two groups, corresponding to positive and negative trends, respectively. Within each group, topics are listed in descending order using the absolute value of the corresponding regression coefficient.
Table 1.
Topic label | Trend analysis: Reg. coeff. | Trend analysis: R² (%) | Topic popularity: Contrib. (%) | Topic popularity: Rank |
---|---|---|---|---|
Category: Targeted Immunotherapy | ||||
Positive trends | ||||
Immune checkpoint inhibitors | 0.002007 | 65.99 | 1.26 | 11 |
HPV vaccination | 0.001203 | 78.23 | 1.08 | 17 |
CAR T cells* | 0.000926 | 81.98 | 0.76 | 35 |
PD-1 / PD-L1 | 0.000581 | 57.73 | 0.37 | 86 |
Tyrosine kinase inhibitors* | 0.000376 | 87.56 | 0.35 | 95 |
Toll-like receptor and CpG oligodeoxynucleotides | 0.000244 | 77.61 | 0.34 | 97 |
Adoptive cell transfer (ACT) | 0.000174 | 21.03 | 0.62 | 54 |
Indoleamine 2,3-dioxygenase (IDO)* | 0.000112 | 90.57 | 0.15 | 138 |
Tumor-associated macrophages (TAMs) | 0.000060 | 35.42 | 0.26 | 117 |
Invariant natural killer T cells (iNKT) | 0.000052 | 26.03 | 0.16 | 137 |
Tumor-infiltrating lymphocytes | 0.000017 | 27.36 | 0.21 | 128 |
Vascular endothelial growth factor (VEGF)† | 0.000013 | 03.06 | 0.32 | 103 |
Bispecific monoclonal antibodies† | 0.000004 | 00.08 | 0.64 | 51 |
Negative trends | ||||
Dendritic cell-based vaccines | − 0.000628 | 60.15 | 1.01 | 20 |
Anti-idiotypic monoclonal antibodies* | − 0.000326 | 95.78 | 0.75 | 36 |
IL-2 immunotherapy* | − 0.000325 | 87.08 | 0.66 | 49 |
Carbohydrate tumor antigen* | − 0.000202 | 80.36 | 0.34 | 98 |
Cancer testis antigens (CTA) | − 0.000130 | 37.86 | 0.35 | 94 |
Granulocyte-macrophage colony-stimulating factor (GM-CSF)* | − 0.000128 | 94.38 | 0.21 | 130 |
HER2/neu | − 0.000072 | 47.76 | 0.20 | 133 |
Immunotoxins | − 0.000042 | 42.11 | 0.22 | 126 |
Heat shock protein† | − 0.000021 | 16.54 | 0.20 | 131 |
Category: Cancer Type | ||||
Positive trends | ||||
Lung cancer | 0.000391 | 62.80 | 0.46 | 70 |
Pancreatic & head and neck cancer | 0.000147 | 67.94 | 0.39 | 83 |
Cervical cancer (HPV) | 0.000145 | 32.49 | 0.57 | 60 |
Glioblastoma* | 0.000106 | 81.61 | 0.37 | 87 |
Breast cancer | 0.000094 | 53.78 | 0.36 | 89 |
Colorectal cancer | 0.000084 | 31.42 | 0.43 | 73 |
Prostate cancer† | 0.000080 | 12.58 | 0.35 | 91 |
Ovarian cancer* | 0.000063 | 82.44 | 0.25 | 121 |
Hepatocellular carcinoma | 0.000059 | 64.52 | 0.26 | 119 |
Pediatric cancers | 0.000050 | 70.32 | 0.26 | 116 |
Multiple myeloma | 0.000019 | 42.33 | 0.25 | 120 |
Chronic lymphocytic leukemia (CLL)† | 0.000009 | 02.74 | 0.26 | 115 |
Central nervous system cancer† | 0.000006 | 01.62 | 0.26 | 118 |
Negative trends | ||||
Myeloid leukemia* | − 0.000252 | 91.90 | 0.81 | 31 |
Non-Hodgkin lymphoma† | − 0.000149 | 15.61 | 0.53 | 66 |
Lymphoma* | − 0.000141 | 92.30 | 0.55 | 65 |
Renal cancer | − 0.000115 | 47.29 | 0.46 | 71 |
Melanoma | − 0.000058 | 28.25 | 0.61 | 55 |
Skin lesions of the face | − 0.000030 | 22.08 | 0.38 | 84 |
Endocrine tumors† | − 0.000022 | 19.34 | 0.21 | 129 |
Malignant mesothelioma | − 0.000007 | 26.95 | 0.16 | 136 |
Category: Diagnosis | ||||
Positive trends | ||||
Imaging* | 0.000188 | 89.09 | 0.28 | 111 |
Negative trends | ||||
Immunohistochemistry* | − 0.000314 | 91.44 | 0.94 | 23 |
Leukemia-associated phenotypic markers* | − 0.000234 | 88.79 | 0.71 | 43 |
Carcinoembryonic antigen (CEA)* | − 0.000083 | 94.53 | 0.13 | 141 |
Category: Cancer Therapies | ||||
Positive trends | ||||
Allogeneic stem cell transplantation | 0.000680 | 61.32 | 0.92 | 24 |
Radiotherapy | 0.000108 | 46.44 | 0.35 | 92 |
Photoimmunotherapy* | 0.000105 | 91.27 | 0.19 | 135 |
Biological response modifiers (BRM) | 0.000044 | 23.49 | 0.34 | 96 |
Anti-TNF therapies† | 0.000042 | 17.38 | 0.32 | 102 |
Negative trends | ||||
Radioimmunotherapy* | − 0.000682 | 93.95 | 0.72 | 40 |
Gene therapy* | − 0.000543 | 98.60 | 0.57 | 61 |
Bone marrow transplantation* | − 0.000437 | 81.62 | 0.44 | 72 |
Radioimmunotherapy* | − 0.000399 | 94.02 | 0.33 | 101 |
Adjuvant treatments | − 0.000301 | 77.00 | 1.14 | 15 |
Vaccine adjuvants | − 0.000148 | 27.12 | 0.91 | 25 |
Chemotherapy | − 0.000135 | 79.58 | 0.64 | 52 |
Interferon-based treatments* | − 0.000128 | 91.49 | 0.24 | 122 |
Bladder cancer intravesical therapy | − 0.000096 | 55.92 | 0.56 | 64 |
Clinical nutrition in surgery | − 0.000054 | 52.19 | 0.31 | 107 |
Vaccination† | − 0.000019 | 06.64 | 0.35 | 90 |
Category: Clinical Research | ||||
Positive trends | ||||
Immune-related adverse events | 0.000670 | 60.00 | 0.48 | 68 |
Anti-NMDA receptor encephalitis | 0.000433 | 76.72 | 0.63 | 53 |
Survival prognostic factors* | 0.000412 | 95.72 | 0.94 | 22 |
Progression-free survival (PFS)* | 0.000256 | 93.65 | 0.71 | 44 |
Randomized control trials* | 0.000165 | 90.96 | 0.66 | 48 |
Hypersensitivity reactions† | 0.000009 | 01.42 | 0.22 | 125 |
Quality of life† | 0.000009 | 05.21 | 0.35 | 93 |
Cancer risk factors† | 0.000003 | 00.10 | 0.61 | 56 |
Negative trends | ||||
Early phase clinical studies* | − 0.000501 | 97.15 | 1.16 | 14 |
Cancer treatment reports* | − 0.000473 | 91.49 | 1.62 | 8 |
CD34+ hematopoietic stem cells* | − 0.000450 | 87.27 | 0.52 | 67 |
Serum biomarkers* | − 0.000275 | 83.10 | 0.79 | 32 |
Lymphocyte stimulation* | − 0.000237 | 95.47 | 0.90 | 26 |
Peripheral blood mononuclear cells (PBMC)* | − 0.000232 | 86.80 | 0.70 | 45 |
Serum immunoglobulins* | − 0.000211 | 99.37 | 0.56 | 63 |
Case reports* | − 0.000178 | 81.88 | 1.14 | 16 |
Infections and immunodeficiency | − 0.000077 | 69.14 | 0.33 | 100 |
Clinical studies outcomes† | − 0.000031 | 09.20 | 1.20 | 12 |
Category: Mechanisms of Carcinogenesis | ||||
Positive trends | ||||
Tumor microenvironment* | 0.001795 | 94.88 | 2.17 | 4 |
Cell signaling pathways* | 0.000531 | 97.70 | 0.78 | 33 |
Role of regulatory immune cells | 0.000520 | 67.07 | 1.88 | 6 |
Cancer stem cells* | 0.000255 | 87.71 | 0.29 | 109 |
Negative trends | ||||
Inflammatory mediators* | − 0.000489 | 87.02 | 0.84 | 29 |
Epstein–Barr virus and cytomegalovirus* | − 0.000141 | 88.63 | 0.33 | 99 |
Adaptive immune response† | − 0.000137 | 12.90 | 1.74 | 7 |
Tumor metastasis* | − 0.000125 | 80.53 | 0.42 | 77 |
UV DNA damage (skin cancer) | − 0.000089 | 67.46 | 0.23 | 124 |
Lymphatic metastasis* | − 0.000053 | 80.47 | 0.27 | 112 |
Cell adhesion | − 0.000053 | 36.70 | 0.42 | 76 |
Viral hepatitis | − 0.000033 | 29.08 | 0.31 | 105 |
Category: Translational Research | ||||
Subcategory: Pathways and Physiology | ||||
Positive trends | ||||
Epigenetic regulation* | 0.000130 | 95.89 | 0.15 | 139 |
Oncolytic viruses | 0.000084 | 76.42 | 0.27 | 113 |
Microbiome | 0.000056 | 76.61 | 0.23 | 123 |
Exosomes | 0.000048 | 65.82 | 0.14 | 140 |
Cell metabolism | 0.000042 | 27.30 | 0.42 | 79 |
Negative trends | ||||
Apoptosis* | − 0.000239 | 93.85 | 0.42 | 75 |
Tumor-draining lymph nodes (TDLN)* | − 0.000192 | 90.82 | 0.27 | 114 |
Animal oncogenic viruses* | − 0.000131 | 82.34 | 0.41 | 80 |
Subcategory: Methodology | ||||
Positive trends | ||||
Gene mutations / tumor profiling | 0.000771 | 61.31 | 0.70 | 46 |
Nanoparticles (drug delivery)* | 0.000470 | 88.94 | 0.57 | 62 |
Predictive models* | 0.000160 | 83.11 | 0.36 | 88 |
Test assays | 0.000042 | 19.89 | 0.75 | 37 |
In vivo intratumoral immunotherapy† | 0.000037 | 01.13 | 2.15 | 5 |
Negative trends | ||||
Recombinant DNA vaccines | − 0.000501 | 73.01 | 0.72 | 39 |
Cell lines / in vitro cultures* | − 0.000417 | 92.98 | 1.29 | 10 |
Gene expression studies | − 0.000222 | 66.06 | 0.74 | 38 |
Protein purification (chromatography)* | − 0.000221 | 88.45 | 0.83 | 30 |
Protein/gene sequencing* | − 0.000204 | 91.74 | 0.40 | 82 |
Flow cytometry (CD markers) | − 0.000117 | 63.64 | 0.77 | 34 |
Bacteria-based immunotherapies | − 0.000038 | 41.03 | 0.20 | 132 |
BCG-based immunotherapies | − 0.000035 | 49.22 | 0.31 | 106 |
Gene polymorphisms† | − 0.000017 | 08.55 | 0.21 | 127 |
Subcategory: Protein-based | ||||
Positive trends | ||||
Tumor necrosis factor receptor (TNFR) superfamily | 0.000090 | 75.75 | 0.43 | 74 |
Chemokine receptors and ligands | 0.000056 | 37.71 | 0.19 | 134 |
Negative trends | ||||
Cytotoxic T lymphocytes (CTL) epitopes* | − 0.001005 | 97.16 | 0.98 | 21 |
Tumor antigens* | − 0.000420 | 83.00 | 0.88 | 28 |
Single-chain variable fragment (scFv) antibodies | − 0.000174 | 76.30 | 0.72 | 42 |
T cell receptor (TCR)† | − 0.000002 | 00.16 | 0.32 | 104 |
Subcategory: Cell Type | ||||
Positive trends | ||||
Natural killer cells* | 0.000398 | 90.08 | 0.66 | 50 |
T regulatory cells | 0.000394 | 47.25 | 0.47 | 69 |
Myeloid-derived suppressor cells* | 0.000365 | 83.64 | 0.31 | 108 |
T cell memory (CD4+, CD8+)† | 0.000052 | 01.26 | 1.20 | 13 |
Negative trends | ||||
Cytotoxic T lymphocytes* | − 0.000655 | 98.64 | 1.03 | 19 |
TH1/TH2 response* | − 0.000351 | 90.10 | 0.72 | 41 |
Dendritic cells | − 0.000326 | 57.70 | 0.58 | 58 |
Cytokine-induced killer cells (CIK) | − 0.000037 | 25.48 | 0.42 | 78 |
Subcategory: Animal Models | ||||
Positive trends | ||||
n/a | ||||
Negative trends | ||||
Murine tumor models* | − 0.000928 | 88.31 | 0.68 | 47 |
Mouse studies and experiments* | − 0.000455 | 94.31 | 1.08 | 18 |
Syngeneic mouse models* | − 0.000292 | 80.53 | 1.30 | 9 |
Rat studies and experiments* | − 0.000159 | 86.47 | 0.38 | 85 |
Immunodeficient mouse models | − 0.000071 | 38.50 | 0.58 | 59 |
Category: General | ||||
Positive trends | ||||
Targeted therapies (review)* | 0.001418 | 88.93 | 3.46 | 1 |
Development of new cancer therapies (reviews)* | 0.001266 | 91.81 | 2.64 | 3 |
Cancer immunotherapy (reviews)* | 0.000851 | 88.63 | 3.24 | 2 |
Clinical guidelines* | 0.000333 | 87.83 | 0.60 | 57 |
Systematic review and meta-analysis* | 0.000262 | 92.18 | 0.29 | 110 |
Significant statistical results* | 0.000260 | 88.79 | 0.89 | 27 |
Negative trends | ||||
Anti-tumor immunity* | − 0.000153 | 87.05 | 0.40 | 81 |
Within each category, topics are organized in two groups corresponding to positive and negative trends, respectively; within each group, topics are listed in descending order of the absolute value of the regression coefficient
* Topics with a good linear regression fit, R-squared > 80%
† Topics with nonsignificant reg. coefficient, p value > 0.05
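The ordering and the asterisk flag used in Table 1 can be reproduced with a short Python sketch, shown here for the four rows of the Diagnosis category. The coefficients and R² values are copied from the table; the function names are hypothetical, and the dagger flag is omitted because the p values are not reported in the table.

```python
# (topic label, regression coefficient, R-squared in %) from the Diagnosis rows.
rows = [
    ("Carcinoembryonic antigen (CEA)", -0.000083, 94.53),
    ("Imaging", 0.000188, 89.09),
    ("Leukemia-associated phenotypic markers", -0.000234, 88.79),
    ("Immunohistochemistry", -0.000314, 91.44),
]


def order_category(rows):
    """Split rows by trend sign; sort each group by descending |coefficient|."""
    by_magnitude = lambda row: abs(row[1])
    positive = sorted((r for r in rows if r[1] > 0), key=by_magnitude, reverse=True)
    negative = sorted((r for r in rows if r[1] < 0), key=by_magnitude, reverse=True)
    return positive, negative


def label(row):
    """Append an asterisk when the linear fit is good (R-squared > 80%)."""
    name, _, r_squared = row
    return name + "*" if r_squared > 80 else name


positive, negative = order_category(rows)
print([label(r) for r in positive])
print([label(r) for r in negative])
```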
Synthesis of results
Apart from the three review topics of the General category, which occupy ranks 1 to 3 in Table 1, the research topics with the highest rank (contribution) in the cancer immunotherapy literature included: (i) “tumor microenvironment” (2.17%), (ii) “in vivo intratumoral immunotherapy” (2.15%), (iii) “role of regulatory immune cells” (1.88%), (iv) “adaptive immune response” (1.74%), (v) “cancer treatment reports” (1.62%), (vi) “syngeneic mouse models” (1.30%), (vii) “cell lines/in vitro cultures” (1.29%), (viii) “immune checkpoint inhibitors” (1.26%), (ix) “clinical studies outcomes” (1.20%), and (x) “T-cell memory (CD4+, CD8+)” (1.20%). Representative word clouds of some of the most popular topics are shown in Fig. 2.
The top five research topics with the highest positive trend were: (i) “immune checkpoint inhibitors”; (ii) “tumor microenvironment”; (iii) “HPV vaccination”; (iv) “CAR T-cells”; and (v) “gene mutations/tumor profiling.” Likewise, the top five research topics with the highest negative trend were: (i) “cytotoxic T lymphocytes (CTL) epitopes”; (ii) “murine tumor models”; (iii) “radioimmunotherapy”; (iv) “cytotoxic T lymphocytes”; and (v) “dendritic cell-based vaccines” (Fig. 3).
In the Targeted immunotherapy category, topics with increasing popularity included “immune checkpoint inhibitors,” “CAR T-cells,” “PD-1/PD-L1,” and “tyrosine kinase inhibitors,” all showing a significant increase in the last decade. On the other hand, the popularity of “dendritic cell-based vaccines” peaked in 2003 and has since been declining. Likewise, the popularity of “adoptive cell transfer” peaked in 2010 and has gradually decreased since (Fig. 4).
As regards the Cancer type category, a remarkable increase was observed in the “lung cancer” topic, followed by similar increases in “colorectal cancer” and “breast cancer.” An increase in “prostate cancer” was observed between 2009 and 2012, and its popularity has since been decreasing. In contrast, the contribution of “melanoma,” “central nervous system cancers,” and “chronic lymphocytic leukemia (CLL)” has remained relatively stable over the last 15 years (Fig. 5).
In the Clinical research category, a steady decrease was observed in the “early phase clinical studies” and “case reports” topics, which contrasted with a steady increase in “randomized control trials” and “survival prognostic factors.” Likewise, a sharp increase in “immune-related adverse events” was observed after 2010. The contribution and trend of “clinical studies outcomes,” on the other hand, were relatively stable throughout the study period (Fig. 6).
In the Mechanisms of carcinogenesis category, the most remarkable increase was observed in the topic “tumor microenvironment,” followed to a lesser extent by “cell signaling pathways” and “cancer stem cells.” The contribution of “adaptive immune response” peaked in 2007, and the “role of regulatory immune cells” in 2012, and then, their popularity gradually decreased. The topic “inflammatory mediators” on the other hand has been decreasing in popularity throughout the study period (Fig. 7).
Finally, in the Translational research subcategory (a) pathways and physiology, topics like “apoptosis,” “tumor-draining lymph nodes (TDLN),” and “animal oncogenic viruses” have shown decreasing trends, whereas “epigenetic regulation,” “oncolytic viruses,” “exosomes,” and “microbiome” have been increasing (Fig. 8). As regards subcategory (b) methodology, classical topics such as “cell lines/in vitro cultures” have been decreasing, whereas “recombinant DNA vaccines” and “predictive models” have been increasing in popularity. The most remarkable increase, however, was observed in “gene mutations/tumor profiling,” which was more pronounced in the last 5 years (Fig. 8).
A more exhaustive presentation of all listed categories and topics is provided in the Supplementary material. The General category (Supp. Figure 12) provides proof-of-concept that cancer immunotherapy is an actively growing field of contemporary biomedical research with increasing trends being observed among seminal topics, such as “systematic reviews and meta-analyses” and “clinical guidelines.”
Discussion
Literature analysis is fundamental for understanding the current state of a research field. It can provide new directions and guide further studies and experimentation. The exponential growth of scientific literature, however, makes this task increasingly challenging. The corpus of published papers is vast, and often not accessible due to copyright and other restrictions [36]. Moreover, this analysis can be extremely laborious, time-consuming, and subject to various types of errors [37].
(Semi)-automated methods, such as topic modeling, can be used to retrieve text-based information and apply it to generate meaningful insights in an efficient and unbiased manner [38]. In our study, we used the latent Dirichlet allocation (LDA) algorithm, a Bayesian hierarchical topic modeling algorithm, which can perform this task with minimal (< 5%) loss of relevant studies, while saving up to 70% of the workload of a classical systematic review [18].
Using this approach, we have analyzed over 90,000 publications in the cancer immunotherapy field, comprising more than 190,000 unique words. Topic modeling led to the identification of an abundance of meaningful topics, classified into conceptual categories and subcategories. The cancer immunotherapy domain is relatively new and corresponds to a small fraction of the PubMed-listed scientific literature (< 0.5%). However, its popularity has been increasing in the last 20 years, exhibiting an exponential growth averaging 8% per annum, and 16% in the last 5 years alone.
The category that had the largest number of topics in this study was Translational research, followed by Targeted immunotherapy and Cancer type. The General category consisted mostly of topics related to reviews, clinical guidelines, and meta-analyses and had the highest overall contribution. As regards research topics, remarkable trends were seen among “immune checkpoint inhibitors,” “tumor microenvironment,” “gene mutations/tumor profiling,” and “CAR T-cells.” These topics have dominated the field and currently contribute to approximately 10% of all cancer immunotherapy publications.
With respect to Cancer types, leading trends included “lung,” “breast,” and “colorectal” cancer. This probably reflects the increasing applicability of immunomodulatory checkpoint inhibitors (e.g., atezolizumab, nivolumab, and pembrolizumab) in these cancers, and the high prevalence of the corresponding malignancies [39]. Among hematologic cancers, “myeloid leukemia” has been steadily decreasing as a topic, whereas “non-Hodgkin lymphoma” peaked in 2007 and then gradually decreased. This marked shift in focus can perhaps be explained by the progress in cancer immunotherapy for solid tumors that has been achieved in recent years [40].
In Clinical research, “early phase clinical studies” have been decreasing in popularity, whereas “randomized control trials” have been on the rise. This provides evidence that the field is maturing and is currently advancing to a later stage of development. The hundreds of ongoing Phase III clinical trials listed under cancer immunotherapy support this interpretation (https://clinicaltrials.gov). Moreover, it is confirmed by the sharp increase in interest in immune-related adverse effects. In the Translational research domain, prevailing topics included “in vivo intratumoral immunotherapy,” “gene mutations/tumor profiling,” “nanoparticles (drug delivery),” “epigenetic regulation,” and “predictive models.” Collectively, these trends show a growing interest in personalized cancer therapies, where individual patient characteristics and biomarkers are being used to determine the optimum treatment on a case-by-case basis [41].
Cancer immunotherapy is transforming modern cancer care in an unprecedented way. In this study, we have shown that several ideas have been developed and have evolved, whereas others have been abandoned over the years. Immune system factors, such as antibodies, cytokines, CD4+ and CD8+ T cells, dendritic cells, and macrophages, have all been tested for their capacity to fight cancer, with varied effectiveness. As of today, targeted immunotherapies (e.g., monoclonal antibodies, antibody–drug conjugates, and bispecific antibodies), immunomodulators (e.g., checkpoint inhibitors, cytokines, and adjuvants), preventive and therapeutic cancer vaccines (e.g., HPV, HBV, BCG, and Sipuleucel-T), oncolytic viruses (e.g., T-VEC using modified HSV), and adoptive cell therapies (e.g., CAR T cells) have been approved and marketed as novel anti-neoplastic medications [42].
Currently, there is a growing interest in the dissemination and real-world effectiveness of these medications among patient populations. This is realized as a T3–T4 transition in the translational medicine continuum, suggesting that the field is entering its final and most important stage [43]. From our analysis, we conclude that it is possible to identify these insights using a data-driven approach. LDA-mediated topic modeling provides several advantages over traditional methods and is emerging as an effective, unbiased method for conducting this type of research. A limitation of our analysis is that we used only one database (PubMed) as a source of the entire cancer immunotherapy literature. However, the exclusive use of this database has been shown to achieve a precision of over 80% when compared to combinations of PubMed, Embase, Web of Science, and Google Scholar [44]. The quantitative trends displayed herein can thus be used as a good starting point for further experimentation and can guide new research initiatives. Literature mining is itself a dynamically evolving field that has the capacity to transform evidence generation and will be used more frequently in the future.
Electronic supplementary material
The electronic supplementary material comprises an appendix with the list of identified topics.
Author contributions
All authors have contributed equally. All authors read and approved the final manuscript.
Funding
This research is partially co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning” in the context of the project “Reinforcement of Postdoctoral Researchers - 2nd Cycle” (MIS-5033021), implemented by the State Scholarships Foundation (IKY).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
References
- 1. Oiseth SJ, Aziz MS. Cancer immunotherapy: a brief review of the history, possibilities, and challenges ahead. J Cancer Metastasis Treat. 2017;3(10):250. doi: 10.20517/2394-4722.2017.41.
- 2. Marabelle A, Tselikas L, Baere Td, Houot R. Intratumoral immunotherapy: using the tumor as the remedy. Ann Oncol. 2017;28(Suppl. 12):xii33–xii43. doi: 10.1093/annonc/mdx683.
- 3. Mellman I, Coukos G, Dranoff G. Cancer immunotherapy comes of age. Nature. 2011;480(7378):480–489. doi: 10.1038/nature10673.
- 4. Zhang Y, Quan L, Du L. The 100 top-cited studies in cancer immunotherapy. Artif Cells Nanomed Biotechnol. 2019;47(1):2282–2292. doi: 10.1080/21691401.2019.1623234.
- 5. Dobosz P, Dzieciatkowski T. The intriguing history of cancer immunotherapy. Front Immunol. 2019;10:2965. doi: 10.3389/fimmu.2019.02965.
- 6. Whiteside TL, Odoux C. Dendritic cell biology and cancer therapy. Cancer Immunol Immunother. 2004;53(3):240–248. doi: 10.1007/s00262-003-0468-6.
- 7. Fuge O, Vasdev N, Allchorne P, Green JS. Immunotherapy for bladder cancer. Res Rep Urol. 2015;7:65–79. doi: 10.2147/RRU.S63447.
- 8. Jiang T, Zhou C, Ren S. Role of IL-2 in cancer immunotherapy. OncoImmunology. 2016;5(6):e1163462. doi: 10.1080/2162402X.2016.1163462.
- 9. Kirkwood J. Cancer immunotherapy: the interferon-α experience. Semin Oncol. 2002;29(3, Suppl. 7):18–26. doi: 10.1053/sonc.2002.33078.
- 10. Waters JP, Pober JS, Bradley JR. Tumour necrosis factor and cancer. J Pathol. 2013;230(3):241–248. doi: 10.1002/path.4188.
- 11. Rakoff-Nahoum S, Medzhitov R. Toll-like receptors and cancer. Nat Rev Cancer. 2009;9(1):57–63. doi: 10.1038/nrc2541.
- 12. Dunn GP, Old LJ, Schreiber RD. The immunobiology of cancer immunosurveillance and immunoediting. Immunity. 2004;21(2):137–148. doi: 10.1016/j.immuni.2004.07.017.
- 13. Zhang H, Chen J. Current status and future directions of cancer immunotherapy. J Cancer. 2018;9(10):1773–1781. doi: 10.7150/jca.24577.
- 14. Kohrt HE, Tumeh PC, Benson D, Bhardwaj N, Brody J, Formenti S, Fox BA, Galon J, June CH, Kalos M, Kirsch I, Kleen T, Kroemer G, Lanier L, Levy R, Lyerly HK, Maecker H, Marabelle A, Melenhorst J, Miller J, Melero I, Odunsi K, Palucka K, Peoples G, Ribas A, Robins H, Robinson W, Serafini T, Sondel P, Vivier E, Weber J, Wolchok J, Zitvogel L, Disis ML, Cheever MA, on behalf of the Cancer Immunotherapy Trials Network (CITN). Immunodynamics: a cancer immunotherapy trials network review of immune monitoring in immuno-oncology clinical trials. J Immunother Cancer. 2016;4(1):15. doi: 10.1186/s40425-016-0118-0.
- 15. Marshall HT, Djamgoz MBA. Immuno-oncology: emerging targets and combination therapies. Front Oncol. 2018;8:315. doi: 10.3389/fonc.2018.00315.
- 16. Farkona S, Diamandis EP, Blasutig IM. Cancer immunotherapy: the beginning of the end of cancer? BMC Med. 2016;14(1):73. doi: 10.1186/s12916-016-0623-5.
- 17. Subramanian N, Torabi-Parizi P, Gottschalk RA, Germain RN, Dutta B. Network representations of immune system complexity: immune networks. Wiley Interdiscip Rev Syst Biol Med. 2015;7(1):13–38. doi: 10.1002/wsbm.1288.
- 18. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4(1):5. doi: 10.1186/2046-4053-4-5.
- 19. Mo Y, Kontonatsios G, Ananiadou S. Supporting systematic reviews using LDA-based document representations. Syst Rev. 2015;4(1):172. doi: 10.1186/s13643-015-0117-0.
- 20. Zou C. Analyzing research trends on drug safety using topic modeling. Expert Opin Drug Saf. 2018;17(6):629–636. doi: 10.1080/14740338.2018.1458838.
- 21. Bisgin H, Liu Z, Fang H, Xu X, Tong W. Mining FDA drug labels using an unsupervised learning technique—topic modeling. BMC Bioinform. 2011;12(Suppl. 10):S11. doi: 10.1186/1471-2105-12-S10-S11.
- 22. Andronis C, Sharma A, Virvilis V, Deftereos S, Persidis A. Literature mining, ontologies and information visualization for drug repurposing. Brief Bioinform. 2011;12(4):357–368. doi: 10.1093/bib/bbr005.
- 23. Wang SH, Ding Y, Zhao W, Huang YH, Perkins R, Zou W, Chen JJ. Text mining for identifying topics in the literatures about adolescent substance use and depression. BMC Public Health. 2016;16(1):279. doi: 10.1186/s12889-016-2932-1.
- 24. Zhou X, Peng Y, Liu B. Text mining for traditional Chinese medical knowledge discovery: a survey. J Biomed Inform. 2010;43(4):650–660. doi: 10.1016/j.jbi.2010.01.002.
- 25. Faro A, Giordano D, Spampinato C. Combining literature text mining with microarray data: advances for system biology modeling. Brief Bioinform. 2012;13(1):61–82. doi: 10.1093/bib/bbr018.
- 26. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3(Jan):993–1022.
- 27. Blei DM. Probabilistic topic models. Commun ACM. 2012;55(4):77–84. doi: 10.1145/2133806.2133826.
- 28. Drosatos G, Kaldoudi E. A probabilistic semantic analysis of ehealth scientific literature. J Telemed Telecare. 2019;00:1–19. doi: 10.1177/1357633X19846252.
- 29. PubMed, US National Library of Medicine (2019) PubMed—biomedical literature from MEDLINE. https://www.ncbi.nlm.nih.gov/pubmed/. Accessed 29 Dec 2019
- 30. McCallum AK (2002) Mallet: a machine learning for language toolkit. http://mallet.cs.umass.edu. Accessed 20 Feb 2019
- 31. Text Categorization Project (2011) Lists of stopwords. http://code.google.com/p/text-categorization/. Accessed 20 Feb 2019
- 32. Krovetz R (1993) Viewing morphology as an inference process. In: 16th Annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, SIGIR ’93, pp 191–202. doi: 10.1145/160688.160718
- 33. Agrawal A, Fu W, Menzies T. What is wrong with topic modeling? And how to fix it using search-based software engineering. Inf Softw Technol. 2018;98:74–88. doi: 10.1016/j.infsof.2018.02.005.
- 34. Jaccard P. The distribution of the flora in the alpine zone. New Phytol. 1912;11(2):37–50. doi: 10.1111/j.1469-8137.1912.tb05611.x.
- 35. Kavvadias S, Drosatos G, Kaldoudi E (2018) An online service for topics and trends analysis in medical literature. In: Lhotska L, Sukupova L, Lacković I, Ibbott GS (eds) World congress on medical physics and biomedical engineering, 3–8 June 2018, Prague, Czech Republic, IFMBE proceedings, vol 68/3. Springer Singapore
- 36. Reichman JH, Okediji RL. When copyright law and science collide: empowering digitally integrated research methods on a global scale. Minn Law Rev. 2012;96(4):1362–1480.
- 37. Ahmed I, Sutton AJ, Riley RD. Assessment of publication bias, selection bias, and unavailable data in meta-analyses using individual participant data: a database survey. BMJ. 2012;344:d7762. doi: 10.1136/bmj.d7762.
- 38. Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB. Frontiers of biomedical text mining: current progress. Brief Bioinform. 2007;8(5):358–375. doi: 10.1093/bib/bbm045.
- 39. Raju S, Joseph R, Sehgal S. Review of checkpoint immunotherapy for the management of non-small cell lung cancer. ImmunoTargets Ther. 2018;7:63–75. doi: 10.2147/ITT.S125070.
- 40. Kather JN, Berghoff AS, Ferber D, Suarez-Carmona M, Reyes-Aldasoro CC, Valous NA, Rojas-Moraleda R, Jäger D, Halama N. Large-scale database mining reveals hidden trends and future directions for cancer immunotherapy. OncoImmunology. 2018;7(7):e1444412. doi: 10.1080/2162402X.2018.1444412.
- 41. Dumbrava EI, Meric-Bernstam F. Personalized cancer therapy—leveraging a knowledge base for clinical decision-making. Mol Case Stud. 2018;4(2):a001578. doi: 10.1101/mcs.a001578.
- 42. Tang J, Pearce L, O’Donnell-Tormey J, Hubbard-Lucey VM. Trends in the global immuno-oncology landscape. Nat Rev Drug Discov. 2018;17(11):783–784. doi: 10.1038/nrd.2018.167.
- 43. Klevorn LE, Teague RM. Adapting cancer immunotherapy models for the real world. Trends Immunol. 2016;37(6):354–363. doi: 10.1016/j.it.2016.03.010.
- 44. Bramer WM, Rethlefsen ML, Kleijnen J, Franco OH. Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study. Syst Rev. 2017;6(1):245. doi: 10.1186/s13643-017-0644-y.