Abstract
Background
The intrinsic limitations of studying rare cancers pose challenges to undertaking studies with adequate statistical power. Therefore, efforts are needed to exploit available, high-quality resources. The European Prospective Investigation into Cancer and Nutrition (EPIC) is a large-scale longitudinal cohort with great potential for rare cancer research.
Materials and methods
We used EPIC data, which include lifestyle, diet, and health information on ∼520 000 participants recruited across Europe. Rare cancers were identified according to the RARE CAncer REgistry network (RARECAREnet) classification, which includes incidence-based categorization and detailed morphological and site-specific information.
Results
An interactive R Shiny web application was developed to explore EPIC data and is available at https://epic-rare-cancers-explorer.opendata.iarc.who.int. Among the EPIC participants, 11 450 incident cases of rare cancers were identified, with data currently available for 8851 of them, encompassing a wide range of cancer sites and morphologies. Sex ratios and incidence rates align with previously reported statistics. The R Shiny web application was designed for preliminary data analysis and hypothesis generation, aiding researchers in assessing the feasibility and potential of epidemiological studies. Taking head and neck cancers as a use case, we confirmed the strong association of these tumors with tobacco and alcohol consumption, supporting the suitability of EPIC for identifying risk factors for rare cancers. However, as with all observational studies, the associations reported in this article do not establish causality.
Conclusion
The development of the EPIC rare cancers database, accompanied by the development of an interactive web application, represents a significant step forward in rare cancer research embedded within a large-scale population-based cohort. It is therefore vital to promote awareness of this resource within the research community.
Key words: rare cancers, cancer epidemiology, R Shiny, database, EPIC, head and neck
Highlights
• EPIC is a large-scale longitudinal cohort with great potential for rare cancer research.
• An EPIC rare cancers database was created and an interactive web application developed to explore it.
• The suitability of EPIC for identifying risk factors for rare cancers was confirmed using head and neck cancers as a use case.
• These tools represent a significant step forward in rare cancer research.
Introduction
Rare cancers are defined as cancers with an incidence of fewer than six new cases per 100 000 people per year in Europe.1 Although individually uncommon, rare cancers collectively constitute ∼22% of all cancer diagnoses annually in Europe, represent 24% of total cancer prevalence, and contribute to 25% of cancer-related deaths, constituting a substantial burden of disease.1,2 Due to their low individual incidence, rare cancers are understudied, and economies of scale cannot be made, limiting the market incentive for drug development. Additionally, there are multiple challenges in demonstrating the efficacy of new therapeutic approaches through conventional studies.3 Patients with rare cancers have poorer outcomes than those with more common cancers, with estimated 5-year relative survival rates of 47% versus 65% on average, respectively.1 Indeed, while some rare cancer subtypes have a particularly good individual prognosis, the 5-year relative survival of others, such as those of the hypopharynx, small intestine, and trachea, is still below 30%.4 Epidemiological studies on rare cancers are limited and inconclusive, and the etiology of rare cancers is mostly unknown, which hampers the development of prevention strategies and the identification of therapeutic opportunities. While there are several ongoing efforts in Europe and the United States to identify novel therapeutic options for rare cancers, such as the International Call for Research Proposals on Rare Cancer Drug Development, ATTRACT (https://www.fondation-arc.org/attract), or the collaboration between MD Anderson and the Rare Cancer Research Foundation to accelerate the development of treatments for rare cancers, there are few efforts aimed at unveiling the etiological and epidemiological factors responsible for this heterogeneous group of tumors.
The European Prospective Investigation into Cancer and Nutrition study (EPIC; https://epic.iarc.fr/) has proven to be a powerful tool for epidemiological research on the causes of cancer and other chronic diseases.5,6 EPIC is one of the largest cohort studies globally, with more than half a million participants recruited across 10 Western European countries and followed for >25 years. Detailed information on diet, lifestyle characteristics, anthropometric measurements, and medical history was collected at recruitment, along with biological samples. Currently, >65 000 participants have been diagnosed with cancer since baseline. However, epidemiological studies on rare cancers within EPIC have been quite limited.
The comprehensive and curated list of rare cancers developed by the RARECAREnet working group and of the rare cancer families defined by consensus by the Joint Action on Rare Cancers (JARC) has provided a framework for advancing studies on rare cancers in a systematic and structured manner (https://seer.cancer.gov/seerstat/variables/seer/raresiterecode/).2,7 They categorized rare cancers into three ‘tiers’: tier 3 at the bottom, corresponding to the World Health Organization names of individual entities; tier 2 in the middle, representing categories requiring similar clinical management and research; and tier 1 at the top, encompassing general categories involving the same clinical expertise and patient referral structure. The latter can be grouped into families based on the site of cancers, totaling 12: hematological tumors, childhood cancers, and 10 families of rare adult solid cancers namely head and neck carcinomas, central nervous system cancers, endocrine organs, sarcomas, digestive cancers, neuroendocrine neoplasms, thoracic (including mesothelioma), male genital and urogenital, female genital, and finally skin cancers and non-cutaneous melanoma. Recently, two of these families have been renamed: ‘thoracic cancers’ are now referred to as ‘rare cancers of the chest’, and ‘skin cancers and non-cutaneous melanoma’ have been simplified to ‘skin cancers’. Additionally, the ‘digestive’ and ‘endocrine organs’ families have been combined into the ‘digestive and endocrine systems’, while the ‘male genital and urogenital’ family has been separated into two distinct categories: ‘male genital’ and ‘urinary tracts’.4 Rare cancers are identified among the tier 2 cancers. In this study, we utilized the rare cancers list from the RARECAREnet to generate the EPIC rare cancer database.
Tobacco and alcohol are well-established risk factors for head and neck cancers.8, 9, 10, 11 Tobacco use is estimated to be responsible for 70%-80% of head and neck cancers, and second-hand smoke may also increase the risk of developing this disease. Similarly, moderate to heavy alcohol consumption is associated with higher risks of developing cancer in the mouth, pharynx, larynx, and esophagus.12,13 The combined use of alcohol and tobacco further amplifies this risk.10 Hypertension and diabetes have also been reported as risk factors.14, 15, 16 Other factors, such as high body mass index (BMI) and poor oral hygiene, have been suggested as risk factors,11,17 and studies are ongoing to confirm their role in these diseases.
To demonstrate the utility of EPIC for the study of rare cancers and for the establishment of a rare cancer database, we selected cancers of the head and neck, for which we were able to confirm the role of alcohol and smoking in their development (as an example, see the classification of this family into tiers in Supplementary Material, available at https://doi.org/10.1016/j.esmorc.2025.100014). Moreover, we developed a Shiny application to provide the research community with an accessible and user-friendly tool for exploring the curated EPIC dataset on rare cancers, enabling hypothesis generation and feasibility assessments. The Shiny application represents a key platform to advance the much-needed etiological and epidemiological research on rare cancers using EPIC data.
Materials and methods
The EPIC study
EPIC was launched in the early 1990s and recruited >520 000 participants across 10 European countries. The EPIC study has ethical approval from the International Agency for Research on Cancer (IARC) and all contributing centers, and all participants provided signed informed consent. Upon enrollment, participants provided blood samples, had their physical measures recorded, and completed comprehensive surveys encompassing their diet, lifestyle habits, medical history, physical activity, and other lifestyle and environmental factors. Lifestyle and dietary assessments were also collected during follow-up, depending on the centers. Participants’ cancer occurrence and vital status were monitored during follow-up, along with the occurrence of other diseases, including diabetes, cardiovascular diseases, and Parkinson disease. EPIC recruitment strategies differed from one country to another, which limits how well the cohort represents the general population; some differences in participants’ characteristics and incidence rates are therefore expected. For example, in France, only women were recruited. For the current project on rare cancers, we had access to data from 379 825 individuals from seven countries.
Database development
We combined the list of rare tier 2 entities (i.e. those with an incidence <6/100 000 in Europe) available in Gatta et al.2 with the set of sites (topographies) and morphologies of both rare and common tier 2 entities available on the SEER website (https://seer.cancer.gov/seerstat/variables/seer/raresiterecode/; version from November 2023) to generate an exhaustive list of 31 635 distinct combinations of site and morphology, including information on family and tier 1 and tier 2 categories for each combination (Supplementary Table S1, available at https://doi.org/10.1016/j.esmorc.2025.100014). This list includes only the combinations corresponding to rare tier 2 entities as described by Gatta et al.2 and excludes very general morphologies that cannot be assigned to a specific rare cancer. It was used to extract rare cancer datasets from the EPIC database, in which the site and morphology of each cancer case are coded according to the second and third editions of the International Classification of Diseases for Oncology (ICD-O),18 and it can be re-used to extract rare cancers from any cancer database. In total, our database included 2961 distinct combinations of site and morphology. Of these, 1440 were on our list of rare cancer combinations.
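The extraction step described above amounts to looking up each case's (site, morphology) pair in the reference list of rare combinations. A minimal sketch in Python is given below for illustration only; the authors' own scripts are written in R. The function name, field names, and the few reference entries are assumptions made for this example: the ICD-O codes are real, but the entries are not copied from Supplementary Table S1.

```python
# Hypothetical reference list: (ICD-O site, ICD-O morphology) -> classification.
# Entries are illustrative only and are NOT copied from Supplementary Table S1.
RARE_COMBINATIONS = {
    ("C32.0", "8070/3"): {
        "family": "Head and neck",
        "tier1": "Epithelial tumors of the hypopharynx and larynx",
        "tier2": "Squamous-cell carcinoma with variants of larynx",
    },
    ("C00.1", "8070/3"): {
        "family": "Head and neck",
        "tier1": "Epithelial tumors of the oral cavity and lip",
        "tier2": "Squamous-cell carcinoma with variants of the lip",
    },
}


def extract_rare_cases(cases, reference=RARE_COMBINATIONS):
    """Keep cases whose (site, morphology) pair is in the reference list.

    Each retained case is returned as a new dict annotated with the
    family, tier 1, and tier 2 labels of the matching combination.
    """
    rare = []
    for case in cases:
        match = reference.get((case["site"], case["morphology"]))
        if match is not None:
            rare.append({**case, **match})
    return rare


# Toy cancer cases (hypothetical identifiers).
cases = [
    {"id": 1, "site": "C32.0", "morphology": "8070/3"},
    {"id": 2, "site": "C50.9", "morphology": "8500/3"},  # not in the rare list
    {"id": 3, "site": "C00.1", "morphology": "8070/3"},
]
rare_cases = extract_rare_cases(cases)
print([(c["id"], c["tier2"]) for c in rare_cases])
```

Keying the lookup on the exact pair, rather than on site or morphology alone, mirrors the statement that the list is defined at the level of site and morphology combinations.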
Web interactive database explorer
The EPIC Rare Cancers Explorer was developed as an interactive web application using the following R packages (https://www.r-project.org/): Shiny v1.7.5 (https://cran.r-project.org/web/packages/shiny/), Shinydashboard v0.7.2 (https://cran.r-project.org/web/packages/shinydashboard/index.html), ShinydashboardPlus v2.0.5 (https://cran.r-project.org/web/packages/shinydashboardPlus/index.html), and Plotly v4.10.4 (https://cran.r-project.org/web/packages/plotly/index.html). The primary focus was to facilitate data exploration and preliminary analysis of the EPIC rare cancer dataset while safeguarding the privacy and security of the data. To do so, only aggregated statistics without location/center information were displayed to prevent re-identification of participants, and all summary statistics presented in the application were pre-computed. This process included calculating counts and proportions stratified by various factors such as sex, cancer site, morphology, and RARECAREnet tiers 1 and 2. These pre-computed statistics were then utilized to populate the application, ensuring that no individual-level data were exposed. The Shiny application’s development involved designing an intuitive user interface and server-side logic to handle user inputs, retrieve pre-computed statistics, and generate dynamic visualizations.
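The pre-computation idea can be illustrated with a short sketch: individual-level records are reduced to counts and proportions per stratum, and only that aggregated table is handed to the application. The Python below is an assumption-laden illustration, not the authors' R code; the function name, field names, and toy records are hypothetical.

```python
from collections import Counter


def precompute_counts(records, keys=("family", "sex")):
    """Aggregate individual-level records into counts per stratum.

    Only the stratification keys, the count, and the proportion are kept,
    so identifiers and center information never reach the output.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    total = sum(counts.values())
    return [
        {**dict(zip(keys, stratum)), "n": n, "proportion": n / total}
        for stratum, n in sorted(counts.items())
    ]


# Toy individual-level records (hypothetical values).
records = [
    {"id": "p1", "center": "A", "sex": "Female", "family": "Head and neck"},
    {"id": "p2", "center": "B", "sex": "Male", "family": "Head and neck"},
    {"id": "p3", "center": "A", "sex": "Male", "family": "Head and neck"},
    {"id": "p4", "center": "C", "sex": "Female", "family": "Sarcomas"},
]
summary = precompute_counts(records)
for row in summary:
    print(row)
```

Because the application would read only `summary`, no code path in the user interface has access to the individual-level rows.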
Study population for head and neck use case
For the current analysis we had access to data from 881 head and neck cancer cases (607 males and 274 females) to carry out statistical analyses as a use case. The most common tumors were ‘epithelial tumors of the hypopharynx and larynx’ (n = 321), ‘epithelial tumors of the oral cavity and lip’ (n = 243), and ‘epithelial tumors of the oropharynx’ (n = 177), while the most represented tier 2 categories were ‘squamous-cell carcinoma with variants of larynx’ (n = 268), ‘squamous-cell carcinoma with variants of the oral cavity’ (n = 189), and ‘squamous-cell carcinoma with variants of the oropharynx’ (n = 177) (Supplementary Table S2, available at https://doi.org/10.1016/j.esmorc.2025.100014). We used as non-cases all the other study participants from our dataset who did not develop a head and neck cancer at the time of the follow-up, including those with any other type of cancer whether common or rare (n = 378 944) (Table 1 and Supplementary Table S3, available at https://doi.org/10.1016/j.esmorc.2025.100014).
Table 1.
Baseline characteristics of EPIC rare cancers cases and EPIC head and neck rare cancers cases
| Characteristic | Overall, N = 379 825^a | All rare cancers: cases, n = 8851^a | All rare cancers: controls, n = 370 974^a | Head and neck rare cancers: cases, n = 881^a | Head and neck rare cancers: controls, n = 378 944^a |
|---|---|---|---|---|---|
| Sex | | | | | |
| Male | 113 470 (30%) | 3163 (36%) | 110 307 (30%) | 607 (69%) | 112 863 (30%) |
| Female | 266 355 (70%) | 5688 (64%) | 260 667 (70%) | 274 (31%) | 266 081 (70%) |
| Age at recruitment (years) | 52 (45-58) | 56 (50-61) | 52 (45-58) | 56 (51-61) | 52 (45-58) |
| Vital status | | | | | |
| Alive | 335 724 (88%) | 4357 (49%) | 331 367 (89%) | 498 (57%) | 335 226 (88%) |
| Dead | 39 458 (10%) | 4467 (50%) | 34 991 (9.4%) | 382 (43%) | 39 076 (10%) |
| Withdrew from study | 702 (0.2%) | 9 (0.1%) | 693 (0.2%) | 0 (0%) | 702 (0.2%) |
| Emigrated to another region | 782 (0.2%) | 6 (<0.1%) | 776 (0.2%) | 0 (0%) | 782 (0.2%) |
| Emigrated to another country | 1896 (0.5%) | 10 (0.1%) | 1886 (0.5%) | 0 (0%) | 1896 (0.5%) |
| Unknown | 1263 (0.3%) | 2 (<0.1%) | 1261 (0.3%) | 1 (0.1%) | 1262 (0.3%) |
| Body mass index (computed) | 24.8 (22.3-27.7) | 25.6 (23.0-28.5) | 24.8 (22.3-27.7) | 26.0 (23.3-28.7) | 24.8 (22.3-27.7) |
| Smoking status | | | | | |
| Never | 189 223 (50%) | 3764 (43%) | 185 459 (50%) | 164 (19%) | 189 059 (50%) |
| Former | 103 079 (27%) | 2535 (29%) | 100 544 (27%) | 237 (27%) | 102 842 (27%) |
| Smoker | 80 693 (21%) | 2424 (27%) | 78 269 (21%) | 471 (53%) | 80 222 (21%) |
| Unknown | 6830 (1.8%) | 128 (1.4%) | 6702 (1.8%) | 9 (1.0%) | 6821 (1.8%) |
| Alcohol lifetime pattern | | | | | |
| Never drinkers | 24 031 (6.3%) | 507 (5.7%) | 23 524 (6.3%) | 22 (2.5%) | 24 009 (6.3%) |
| Former light drinkers | 14 159 (3.7%) | 376 (4.2%) | 13 783 (3.7%) | 27 (3.1%) | 14 132 (3.7%) |
| Former heavy drinkers | 1777 (0.5%) | 100 (1.1%) | 1677 (0.5%) | 39 (4.4%) | 1738 (0.5%) |
| Light drinkers | 52 325 (14%) | 1132 (13%) | 51 193 (14%) | 68 (7.7%) | 52 257 (14%) |
| Never heavy drinkers | 208 029 (55%) | 4749 (54%) | 203 280 (55%) | 380 (43%) | 207 649 (55%) |
| Periodically heavy drinkers | 40 893 (11%) | 1147 (13%) | 39 746 (11%) | 229 (26%) | 40 664 (11%) |
| Always heavy drinkers | 2750 (0.7%) | 99 (1.1%) | 2651 (0.7%) | 35 (4.0%) | 2715 (0.7%) |
| Unknown | 35 861 (9.4%) | 741 (8.4%) | 35 120 (9.5%) | 81 (9.2%) | 35 780 (9.4%) |

IQR, interquartile range.
^a n (%); median (IQR).
Statistical analyses
Multivariate Cox proportional hazards regression analyses were carried out to estimate the hazard ratios (HRs) and corresponding 95% confidence intervals (CIs) for the association between smoking, alcohol consumption, hypertension, diabetes (any reported), and the risk of cancers of the head and neck. We used follow-up time as the time scale. Participants were followed up until the first incident head and neck cancer, death, or end of follow-up, whichever came first. All models were adjusted for baseline confounders, including age at recruitment, sex, country, and highest education level (categories include ‘none/primary’ when none or primary school completed, ‘secondary/tech’ when secondary school or technical/professional school completed, ‘longer’ when longer education including university degree completed, and ‘missing’ when the education level was not specified or missing). To ensure robust analyses and avoid instability from small sample sizes, data from countries that had <10 cases were removed for all analyses (France had only two cases because only women were recruited in France). For the analysis of asbestos exposure estimates, the countries ‘The Netherlands’ and ‘UK’, as well as the category ‘missing’ of the highest education level (‘school level’), were removed because not enough data were available for these categories. For the analyses by subtype, we also had to remove a few subsets because they contained too few cases: ‘unknown’ smoking status for the oral cavity, oropharynx, hypopharynx, and lip subtypes; ‘light drinkers’ alcohol pattern for the hypopharynx subtype; and finally ‘longer’ for the highest education level, ‘former heavy drinkers’ alcohol pattern, and ‘do not know’ diabetes status for the lip subtype. These removals changed the number of cases used for the analyses only very slightly for the oral cavity subtype (n = 186 instead of 189) and the lip subtype (n = 51 instead of 54).
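For reference, the quantities reported from these models follow the standard form of the Cox proportional hazards model, written here in generic textbook notation. The symbols are not taken from the article, and the Wald-type interval is an assumption about how the CIs were obtained (it is the usual default in Cox regression software).

```latex
% h_0(t): unspecified baseline hazard; x: covariate vector; beta: log-hazard ratios
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j),
\qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat{\beta}_j \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_j)\right)
```

Under this form, the HR for a covariate is constant over follow-up time, which is the proportional hazards assumption checked with the Schoenfeld residuals.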
Scaled Schoenfeld residuals were used to test the proportional hazards assumption using the ggcoxzph function in the survminer package. Statistical analyses were carried out using R version 4.3.1 and RStudio Server version 2023.1.2.1 Build 402. The following specific packages were used: survival version 3.5.7 for survival analysis and Cox proportional hazards models,19 gtsummary version 2.0.3 for tables, and ggplot2 (version 3.5.1)20 for plots. The 5-year survival rate was estimated using the Kaplan–Meier method with the survfit function from the survival package. A P value < 0.05 was considered significant. The scripts are available on the associated GitHub repository: https://github.com/IARCbioinfo/EPIC-RareCancers.
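To make the survival estimate concrete, the sketch below implements the Kaplan–Meier product-limit estimator in plain Python on toy data. It illustrates the method only and does not reproduce the authors' R analysis with survfit; the function names and the toy follow-up times are assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times:  follow-up duration for each participant
    events: 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all participants whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Toy data: follow-up in years; three deaths and three censored participants.
times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
print(survival_at(curve, 5))
```

Censored participants leave the risk set without changing the survival estimate, which is what distinguishes this estimator from a simple proportion of survivors at 5 years.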
Results
Rare cancers in EPIC
Out of the 48 328 eligible incident cancer cases from our dataset, 8851 were rare cancers. This corresponds to 18.3% of the total cancer cases, which is slightly below but still in line with the 20%-24% prevalence previously reported.7 Indeed, rates of most cancers in EPIC are generally lower than in the source population due to the healthy volunteer effect. The average age at diagnosis was 63.4 versus 64.4 years for all cancers, and 64.3% of the rare cancers were diagnosed in females (n = 5688) versus 63.8% for all cancers (n = 30 849). In our EPIC dataset, the 5-year survival rate was 72% (95% CI 71-72) for common cancers and 55% (95% CI 54-56) for rare cancers, also consistent with previously reported figures (RARECARE project). The 8851 rare cancer cases represent a majority of categories reported by the RARECAREnet working group (57/66 tier 1 and 155/197 tier 2 categories). All families are represented with 1808 hematological cancer cases, 881 head and neck cancer cases, 754 central nervous system cancer cases, 528 sarcomas, 396 endocrine organs cancer cases, 361 digestive cancer cases, 344 neuroendocrine cancer cases, 236 male genital and urogenital cancer cases, 183 thoracic cancer cases, 118 female genital cancer cases, and 100 skin cancers and non-cutaneous melanoma cases (and five embryonal tumor cases). Supplementary Table S4, available at https://doi.org/10.1016/j.esmorc.2025.100014, presents the numbers by sex for each country, center, family, tier 1, tier 2, site, and morphology, with additional information on alcohol consumption, smoking intensity, and asbestos exposure.
In terms of key epidemiological data, we had information on lifetime alcohol consumption patterns and drinking history for 91.6% (n = 8110) of the rare cancer cases, smoking status for 98.6% (n = 8723), smoking intensity for 98.8% (n = 8744), (semi-) quantitative dietary exposure for 98% (n = 8696), and anthropometric characteristics, such as BMI, for 100% (n = 8851). We also have vital status available for almost 100% (n = 8849) of the cases. Vital data on total and cause-specific mortality were collected at the EPIC study centers through mortality registries or active follow-up and death record collection. The data dictionary including numbers of samples having data available for each variable is presented in Supplementary Table S5, available at https://doi.org/10.1016/j.esmorc.2025.100014.
Regarding biological material availability, the IARC EPIC biobank hosts samples for 5177 of our 8851 rare cancer cases including serum, erythrocytes, plasma, blood DNA, and buffy coat. The majority of samples are cancers of the hematological family (n = 1064), cancers of the central nervous system family (n = 514), head and neck family (n = 501), and sarcoma family (n = 341). Among our dataset, only Danish samples (n = 2176) are not hosted in the IARC biobank but are housed at the EPIC Denmark biobank. An overview of the distribution of cases per country is shown in Figure 1. A summary of the baseline characteristics of the cohort is shown in Table 1 and Supplementary Table S3, available at https://doi.org/10.1016/j.esmorc.2025.100014.
Figure 1.
Map of EPIC rare cancers. For each participating country with data included in the rare cancers database, numbers correspond to the number of rare cancer cases (see the Materials and Methods section for their derivation) and dots correspond to cities that recruited patients. Note that although Sweden and Norway are part of EPIC, their data are not included in the current version of the database. EPIC, European Prospective Investigation into Cancer and Nutrition.
The EPIC Rare Cancers Explorer
While providing an overview, the static format of the EPIC rare cancers data presented in Supplementary Table S4, available at https://doi.org/10.1016/j.esmorc.2025.100014, is not ideal for data exploration. Our aim was not only to summarize the data available in EPIC but also to develop a user-friendly tool for data exploration to assess the feasibility of specific research questions. For this purpose, we developed an easy-to-navigate interactive web application using the Shiny R package, named the EPIC Rare Cancers Explorer (Figure 2). It provides de-identified summary information about individuals diagnosed with rare cancers in EPIC, including total cases stratified by sex, site, morphology, tier 1, tier 2, rare cancer families, and the availability of biological samples by type: serum, erythrocytes, plasma, blood DNA, or buffy coat (Supplementary Table S6, available at https://doi.org/10.1016/j.esmorc.2025.100014). The data dictionary tab allows users to check the number of samples with data on civil status, anthropometry, cancer diagnosis and follow-up, socioeconomic status, medical history, smoking and alcohol consumption, reproductive factors, diet, and physical activity. While these data correspond to the baseline questionnaire at recruitment, follow-up data at later time points are also available for some of the centers, as part of the working documents on the EPIC website.
Figure 2.
Screenshot of EPIC Rare Cancers Explorer. Left panel: main menu. Right panel: the rare families page, with a treemap where areas represent the number of cases and colors represent tier 1 and tier 2 families (top), a barplot of the number of cases by sex (red for females and blue for males) per cancer family (middle). EPIC, European Prospective Investigation into Cancer and Nutrition.
Different screens available from the left menu and various search options facilitate navigation through the data tables. Basic summary plots are also provided to aid visualization. For example, to search for specific cases by site and morphology combinations, users can select the corresponding ‘Sites & Morphologies’ tab, enter their search criteria, and view the results in real time. One tab enables the exploration of summary statistics for a specified dataset (i.e. cases from a specific family, tier 1, tier 2, site, or morphology), including biological sample availability. Another tab enables the generation of plots of the main risk factors for a specified dataset and their comparison with plots from control data.
This should help in quickly assessing the feasibility of studying particular rare cancers within the EPIC cohort. Once feasibility has been assessed, researchers can access the full dataset and biological samples by following the procedures outlined on the EPIC website (https://epic.iarc.fr/access/). The rare cancers database and explorer, which can be found at https://epic-rare-cancers-explorer.opendata.iarc.who.int, will be updated and expanded with new information populated from the global EPIC database updates when they occur.
Use case: head and neck cancer
We examined tobacco and alcohol consumption, as well as hypertension and diabetes, in relation to cancers of the head and neck (n = 881) in EPIC to illustrate the utility of the database for such epidemiological analyses. We conducted analyses for all cancers of the head and neck collectively and for each of the five main tier 2 categories in terms of number of cases: squamous-cell carcinoma with variants of the larynx (n = 268), squamous-cell carcinoma with variants of the oral cavity (n = 189), squamous-cell carcinoma with variants of the oropharynx (n = 177), squamous-cell carcinoma with variants of the hypopharynx (n = 53), and squamous-cell carcinoma with variants of the lip (n = 54). The different analyses revealed no violations of the proportional hazards assumption (Supplementary Figure S1, available at https://doi.org/10.1016/j.esmorc.2025.100014). An overview of the study population is shown in Table 1 and Supplementary Tables S2 and S3, available at https://doi.org/10.1016/j.esmorc.2025.100014. In this analysis, the comparison group comprised participants without head and neck cancer at the time of follow-up, including those with any other cancer (n = 378 944).
Despite the low numbers of head and neck cancer cases, and in line with current knowledge,9,10 we were able to confirm positive associations of smoking and alcohol drinking with head and neck cancer risk, after adjusting for sex, age at recruitment, country, educational level, hypertension, and diabetes. As shown in Figure 3, Supplementary Figure S2 and Table S7, available at https://doi.org/10.1016/j.esmorc.2025.100014, for tobacco smoking the risk increases with intensity: the HR for current smokers varies from 2.65 (95% CI 2.04-3.45, P < 0.001) to 8.18 (95% CI 6.07-11.03, P < 0.001) depending on the number of cigarettes per day, indicating a roughly threefold to eight-fold higher risk for current smokers compared with never smokers. The risk decreases with time since quitting, with those who stopped smoking >20 years ago having a risk similar to those who never smoked (HR 1.08, 95% CI 0.77-1.54).
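As a reading aid for the reported intervals, the sketch below recovers the log-hazard ratio and its approximate standard error from a published HR and 95% CI. It assumes Wald-type intervals that are symmetric on the log scale, which is an assumption about how the CIs were computed rather than something stated in the article; the function name is hypothetical.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Back-calculate the log-HR and its standard error from a 95% CI.

    Assumes the interval is exp(beta +/- z * SE), i.e. a Wald interval.
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Lowest and highest HRs reported for current smokers versus never smokers.
for hr, low, high in [(2.65, 2.04, 3.45), (8.18, 6.07, 11.03)]:
    beta, se = log_hr_and_se(hr, low, high)
    # Under the Wald assumption, the HR is the geometric mean of the bounds.
    center = math.sqrt(low * high)
    print(f"HR {hr}: beta={beta:.3f}, SE={se:.3f}, "
          f"z={beta / se:.2f}, geometric mean of CI={center:.2f}")
```

The geometric mean of each pair of bounds falls very close to the reported HR, which is consistent with intervals constructed on the log scale.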
Figure 3.
Forest plot of hazard ratios (HRs) for cancer risk of the fully adjusted multivariate model for all head and neck rare cancers (n = 881). The model includes the following factors: smoking status and intensity, alcohol consumption patterns, hypertension, diabetes, sex, age at recruitment, country, and education level (see the Materials and Methods section). The vertical dashed line represents a HR of 1 (no effect on cancer risk). See Table 1, Supplementary Tables S3, S7 and Figure S3, available at https://doi.org/10.1016/j.esmorc.2025.100014, for number of events and numerical values of HR, Cox proportional hazard model parameters, and confidence intervals. CI, confidence interval.
Similarly, heavy alcohol consumption was associated with a higher risk of these cancers, with an HR of 3.50 (95% CI 1.92-6.37, P < 0.001) for the heavier drinkers compared with never drinkers (Figure 3, Supplementary Figure S2 and Table S7, available at https://doi.org/10.1016/j.esmorc.2025.100014). We noticed that former heavy drinkers had a higher HR (HR 5.54, 95% CI 3.06-10.02, P < 0.001) than ‘always heavy drinkers’ (>60 g/day for men and 30 g/day for women during their whole life starting at 20 years old), most probably because these participants quit drinking for health reasons (reverse causality, or the ‘sick quitter effect’).21 We also observed an association of hypertension with head and neck cancer risk (HR 1.28, 95% CI 1.08-1.51, P = 0.004), but no association with diabetes (HR 0.97, 95% CI 0.67-1.39, P = 0.853).
We further explored other potential factors, adjusting each time for smoking status and alcohol consumption in addition to the baseline confounders. We found a weak inverse association between BMI and head and neck cancer risk (HR 0.98, 95% CI 0.96-1.00, P = 0.02; Supplementary Figure S3, available at https://doi.org/10.1016/j.esmorc.2025.100014), but this association disappeared when the analysis was restricted to never smokers only (HR 1.02, 95% CI 0.98-1.06, P = 0.292) or to never heavy drinkers only (HR 1.00, 95% CI 0.97-1.03, P = 0.958), suggesting that the observed effect may be confounded by smoking and alcohol consumption. We also checked potential associations with physical activity, energy intake, and asbestos exposure, but none of them was statistically significant (Supplementary Figures S4-S6, available at https://doi.org/10.1016/j.esmorc.2025.100014).
The relatively high numbers of cases for anatomic subtypes, including larynx (n = 268), oral cavity (n = 189), oropharynx (n = 177), hypopharynx (n = 53), and lip (n = 54), allowed associations with alcohol and smoking to be examined across anatomical subtypes. Our analyses found a higher risk of tumors in the larynx for smokers (HR 13.25, 95% CI 7.45-23.57, P < 0.001) compared with never smokers, while heavy drinkers had a higher risk of tumors in the oropharynx (HR 19.76, 95% CI 2.43-160.78, P = 0.005 for former heavy drinkers and HR 15.36, 95% CI 1.91-123.84, P = 0.01 for ‘always heavy drinkers’) (Supplementary Figures S7 and S8, available at https://doi.org/10.1016/j.esmorc.2025.100014). Smoking was also positively associated with cancers of the oral cavity and hypopharynx compared with never smokers (HR 3.49, 95% CI 2.29-5.33, P < 0.001 and HR 28.83, 95% CI 3.88-214.17, P = 0.001, respectively; Supplementary Figures S9 and S10, available at https://doi.org/10.1016/j.esmorc.2025.100014). For cancers of the lip, no association was observed with alcohol consumption (HR 1.97, 95% CI 0.40-9.72, P = 0.41 for periodically heavy drinkers compared with never drinkers) or smoking (HR 1.42, 95% CI 0.70-2.89, P = 0.33 for smokers compared with never smokers), but we observed a positive association between hypertension and cancer of the lip, with an HR of 1.98 (95% CI 1.10-3.54, P = 0.022; Supplementary Figure S11, available at https://doi.org/10.1016/j.esmorc.2025.100014).
Discussion
The intrinsic limitation of small sample sizes leads to rare cancers being understudied and neglected, often being classified as orphan diseases.22 Since making treatments for individual rare cancers cost-effective for pharmaceutical companies is challenging, different clinical approaches need to be developed for rare cancers. A better understanding of their etiology can aid in designing primary (preventing cancer development), secondary (early diagnosis), and tertiary (interception, preventing progression to life-threatening stages) strategies.
One of the most puzzling open questions in the field of rare cancers is the reason for their rarity. One potential explanation is the rarity of their cell of origin. For example, in the case of neuroendocrine tumors, the main cell of origin is believed to be a neuroendocrine cell. These cells are dispersed throughout the body but are underrepresented compared with other cell types. In the lung, for instance, a recent single-cell cancer atlas identified just 500 neuroendocrine cells out of 2.4 million lung cells, accounting for only 0.02% of the total.23 However, other factors, such as rare exposures, less intense exposure to known risk factors, or genetic predispositions, may also play a role. Prospective cohorts like EPIC present a unique opportunity to study the potential causes of rare cancers, but they are few and have been underutilized for this purpose. Understanding these underlying potential causes can pave the way for better prevention, diagnosis, and treatment strategies.
With the currently available number of incident cancer cases, we were able to confirm the link between rare head and neck cancers and tobacco and alcohol consumption,9, 10, 11, 12, 13, 24 which serves as a proof-of-principle study on the etiology of rare cancers using EPIC. The choice of this use case was driven by these well-established associations in a heterogeneous family with distinct etiological patterns, including several rarer tier 2 subtypes. Within EPIC, a relatively high number of head and neck cancer cases allowed robust statistical analyses despite their rarity, enabling evaluation of known risk associations across various anatomical subtypes. Indeed, even the rarer tier 2 categories within this family demonstrated the expected epidemiological associations. However, while these analyses validate EPIC’s potential for exploring established carcinogenic exposures, they may not fully represent the cohort’s capacity to identify subtler associations between rare cancers and less potent or less established exposures.
The aim of establishing the EPIC Rare Cancers Explorer was to raise awareness within the scientific community of the possibility of using the EPIC database to study the potential etiologies of rare cancers and to facilitate the assessment of the feasibility of studying a given rare cancer. While low numbers of incident cases will always be an issue for rare cancer research, large-scale cohorts such as EPIC are valuable resources for two reasons: the individuals being followed up are in the typical age range for the diagnosis of these diseases, and the large sample size and long duration of follow-up mean that the number of incident cases will continue to grow. It is important to note that the last update of the full EPIC database was in 2015, with end of follow-up ranging from 2009 to 2013 depending on the individual centers, and that a new update of the EPIC database is anticipated to be released by 2026. The average age at recruitment of the EPIC population was 51.4 years, with recruitment taking place between October 1992 and July 2001. Given that the average age at diagnosis of rare cancers in EPIC is 63.4 years, we expect the current number of cases to increase significantly, likely gaining representation across the 197 tier 2 categories. Moreover, even when the number of cases is still relatively low, one of EPIC’s strengths is the large number of non-cases available, which increases the statistical power to detect robust associations.
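The contribution of non-cases to precision can be illustrated with a simplified calculation. The Python sketch below uses Woolf's standard error for a log odds ratio from a 2x2 table, which is a stand-in for, not a reproduction of, the Cox models used in this study. All counts are hypothetical: 50 cases split evenly by exposure, and an assumed 30% exposure prevalence among non-cases.

```python
import math

def se_log_or(exp_cases, unexp_cases, exp_noncases, unexp_noncases):
    """Woolf standard error of the log odds ratio for a 2x2 table."""
    return math.sqrt(1 / exp_cases + 1 / unexp_cases
                     + 1 / exp_noncases + 1 / unexp_noncases)

# Hypothetical rare cancer: 50 cases, half of them exposed
exposed_cases, unexposed_cases = 25, 25

# Lower bound on the standard error, set by the cases alone
floor = math.sqrt(1 / exposed_cases + 1 / unexposed_cases)

for n_noncases in (50, 500, 5_000, 500_000):
    exposed_nc = round(0.3 * n_noncases)   # assumed 30% exposure prevalence
    unexposed_nc = n_noncases - exposed_nc
    se = se_log_or(exposed_cases, unexposed_cases, exposed_nc, unexposed_nc)
    print(f"{n_noncases:>7} non-cases: SE(log OR) = {se:.3f}")

print(f"floor set by cases alone: {floor:.3f}")
```

In this simplified setting, adding non-cases shrinks the standard error, but only down to a floor determined by the number of cases. This is consistent with the expectation that further gains will come mainly from the accrual of incident cases during extended follow-up.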
While the EPIC cohort provides several methodological strengths, important limitations must be carefully considered when it is used for rare cancer epidemiological research. As with all observational studies, EPIC can identify associations but cannot by itself confirm causality, and residual confounding remains a concern, particularly for occupational or environmental exposures. Isolating the independent effects of correlated exposures is inherently challenging and demands critical attention to potential confounders and effect modifiers, which, together with non-hypothesis-driven analyses, may inflate the chance of type I errors. In addition, the grouping of rare cancers into families and tiers, while practical, may mask underlying heterogeneity in disease pathogenesis, either obscuring genuine associations or artificially inflating weaker ones. Recruitment biases, including variations in country-specific participant selection (e.g. only women recruited in France), may also affect representativeness and generalizability. Lastly, modifiable risk factors were primarily assessed at baseline, limiting the assessment of risk associated with lifetime exposures or trajectories.
Despite these limitations, the findings from this analysis suggest that rare cancers in EPIC are broadly representative of rare cancers in Europe, based on the RARECAREnet working group data, that detailed lifestyle information is available for the majority of cases, and that EPIC could therefore be used to conduct epidemiological studies to identify risk factors for rare cancers. In addition, data from assessments made during follow-up represent a further resource in EPIC that can be accessed for rare cancer research, and longitudinal trends or cumulative burden should be considered whenever possible. The availability of biospecimen samples provides an important opportunity to combine epidemiological and molecular data, potentially enabling researchers to generate hypotheses regarding rare cancer etiology. While our use case demonstrates EPIC’s potential, it also highlights the challenges involved in investigating subtler relationships between other exposures and rare cancers, where smaller effect sizes and residual confounding may complicate interpretation. To address these challenges, efforts should be made to explore other large-scale prospective cohorts such as UK Biobank and to carry out pooled analyses in consortia such as the National Cancer Institute Cohort Consortium (https://epi.grants.cancer.gov/cohort-consortium/). Combining these resources will increase the statistical power of the analyses and generate findings that are robust across populations.
In summary, the EPIC Rare Cancers Explorer is a tool designed to raise awareness of the usefulness of the EPIC study in bridging the gap in understanding the potential etiologies of rare cancers. Its goal is to pave the way for more collaborative efforts to leverage other prospective cohorts to advance knowledge of these overlooked diseases and design strategies for primary, secondary, and tertiary prevention.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work the authors used ChatGPT in order to correct grammatical errors and typos. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Acknowledgements
We thank Thomas Cattiaux for providing the code template to generate the map of EPIC rare cancers, as well as Germain Deroche and Nicolas Tardy for the setup of the server hosting the EPIC Rare Cancers Explorer.
Funding
None declared.
Disclosure
The authors have declared no conflicts of interest.
Data sharing
The raw data underlying the EPIC Rare Cancers Explorer are not publicly available. However, access to these data can be obtained by applying through the EPIC steering committee. Detailed instructions on the application process are available at https://epic.iarc.fr/access/.
Contributor Information
L. Fernandez-Cuesta, Email: fernandezcuestal@iarc.who.int.
M. Foll, Email: follm@iarc.who.int.
Supplementary Material
References
- 1. Gatta G., van der Zwan J.M., Casali P.G., et al. Rare cancers are not so rare: the rare cancer burden in Europe. Eur J Cancer. 2011;47(17):2493–2511. doi: 10.1016/j.ejca.2011.08.008.
- 2. Gatta G., Capocaccia R., Botta L., et al. Burden and centralised treatment in Europe of rare tumours: results of RARECAREnet—a population-based study. Lancet Oncol. 2017;18(8):1022–1039. doi: 10.1016/S1470-2045(17)30445-X.
- 3. Casali P.G., Licitra L., Frezza A.M., Trama A. ‘Rare cancers’: not all together in clinical studies. Ann Oncol. 2022;33(5):463–465. doi: 10.1016/j.annonc.2022.01.077.
- 4. Trama A., Bernasconi A., Cañete A., et al. Incidence and survival of rare adult solid cancers in Europe (EUROCARE-6): a population-based study. Eur J Cancer. 2024;214. doi: 10.1016/j.ejca.2024.115147.
- 5. Riboli E., Hunt K.J., Slimani N., et al. European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr. 2002;5(6B):1113–1124. doi: 10.1079/PHN2002394.
- 6. Molina-Montes E., Ubago-Guisado E., Petrova D., et al. The role of diet, alcohol, BMI, and physical activity in cancer mortality: summary findings of the EPIC study. Nutrients. 2021;13(12):4293. doi: 10.3390/nu13124293.
- 7. Botta L., Gatta G., Trama A., et al. Incidence and survival of rare cancers in the US and Europe. Cancer Med. 2020;9(15):5632–5642. doi: 10.1002/cam4.3137.
- 8. Cancer.Net Editorial Board. Head and neck cancer: risk factors. Cancer.Net. 2022. Available at https://www.cancer.net/cancer-types/head-and-neck-cancer/risk-factors.
- 9. Lubin J.H., Purdue M., Kelsey K., et al. Total exposure and exposure rate effects for alcohol and smoking and risk of head and neck cancer: a pooled analysis of case-control studies. Am J Epidemiol. 2009;170(8):937–947. doi: 10.1093/aje/kwp222.
- 10. Hashibe M., Brennan P., Chuang S.C., et al. Interaction between tobacco and alcohol use and the risk of head and neck cancer: pooled analysis in the International Head and Neck Cancer Epidemiology Consortium. Cancer Epidemiol Biomarkers Prev. 2009;18(2):541–550. doi: 10.1158/1055-9965.EPI-08-0347.
- 11. Gaudet M.M., Olshan A.F., Chuang S.C., et al. Body mass index and risk of head and neck cancer in a pooled analysis of case–control studies in the International Head and Neck Cancer Epidemiology (INHANCE) Consortium. Int J Epidemiol. 2010;39(4):1091–1102. doi: 10.1093/ije/dyp380.
- 12. Bagnardi V., Rota M., Botteri E., et al. Alcohol consumption and site-specific cancer risk: a comprehensive dose-response meta-analysis. Br J Cancer. 2015;112(3):580–593. doi: 10.1038/bjc.2014.579.
- 13. LoConte N.K., Brewster A.M., Kaur J.S., Merrill J.K., Alberg A.J. Alcohol and cancer: a statement of the American Society of Clinical Oncology. J Clin Oncol. 2018;36(1):83–93. doi: 10.1200/JCO.2017.76.1155.
- 14. Seo J.-H., Kim Y.-D., Park C.-S., Han K.-D., Joo Y.-H. Hypertension is associated with oral, laryngeal, and esophageal cancer: a nationwide population-based study. Sci Rep. 2020;10(1). doi: 10.1038/s41598-020-67329-3.
- 15. Stott-Miller M., Chen C., Chuang S.-C., et al. History of diabetes and risk of head and neck cancer: a pooled analysis from the international head and neck cancer epidemiology consortium. Cancer Epidemiol Biomarkers Prev. 2012;21(2):294–304. doi: 10.1158/1055-9965.EPI-11-0590.
- 16. Tseng K.S., Lin C., Lin Y.-S., Weng S.-F. Risk of head and neck cancer in patients with diabetes mellitus: a retrospective cohort study in Taiwan. JAMA Otolaryngol Head Neck Surg. 2014;140(8):746–753. doi: 10.1001/jamaoto.2014.1258.
- 17. Hashim D., Sartori S., Brennan P., et al. The role of oral hygiene in head and neck cancer: results from International Head and Neck Cancer Epidemiology (INHANCE) consortium. Ann Oncol. 2016;27(8):1619–1625. doi: 10.1093/annonc/mdw224.
- 18. Fritz A., Percy C., Jack A., et al. International Classification of Diseases for Oncology. 3rd ed. World Health Organization. 2000. Available at https://iris.who.int/handle/10665/42344.
- 19. Therneau T.M., Grambsch P.M. Modeling Survival Data: Extending the Cox Model. Springer; New York, NY: 2000.
- 20. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; Cham: 2016.
- 21. Sarich P., Gao S., Zhu Y., Canfell K., Weber M.F. The association between alcohol consumption and all-cause mortality: an umbrella review of systematic reviews using lifetime abstainers or low-volume drinkers as a reference group. Addiction. 2024;119(6):998–1012. doi: 10.1111/add.16446.
- 22. Casali P.G., Trama A. Rationale of the rare cancer list: a consensus paper from the Joint Action on Rare Cancers (JARC) of the European Union (EU). ESMO Open. 2020;5(2). doi: 10.1136/esmoopen-2019-000666.
- 23. Sikkema L., Ramírez-Suástegui C., Strobl D.C., et al. An integrated cell atlas of the lung in health and disease. Nat Med. 2023;29(6):1563–1577. doi: 10.1038/s41591-023-02327-2.
- 24. Freedman N.D., Abnet C.C., Caporaso N.E., et al. Impact of changing US cigarette smoking patterns on incident cancer: risks of 20 smoking-related cancers among the women and men of the NIH-AARP cohort. Int J Epidemiol. 2016;45(3):846–856. doi: 10.1093/ije/dyv175.