Abstract
Lymph node metastasis is of major prognostic significance for breast cancer. Lymph node metastasis arises at a very early stage in some patients. Using data downloaded from the TCGA database, we studied the differences between primary tumors with and without lymph node metastasis at the multi-omics level using bioinformatics approaches. We found that low mutation and neoantigen burdens correlated with lymph node metastasis of breast cancer. All three conserved domains in TP53 were mutated in lymph node-negative breast cancers, whereas only one domain was mutated in lymph node-positive samples. Mutations in microtubule-related proteins appear to help immune cells recognize tumors and inhibit their lymph node metastasis. Destroying microtubule-related proteins is therefore a potential therapeutic strategy to inhibit lymph node metastasis of breast cancer. MAPK10, BCL9L, TRIM65, CD93, KITLG, CNPPD1, CPED1, CCDC146, TMEM185A, INO80D, and PSMD11, the source proteins of neoantigens present specifically in lymph node-positive breast cancers, are potential targets for vaccine design. In the tumor microenvironment, reduced numbers of effector immune cells, especially activated memory CD4+ T cells and activated mast cells, facilitate breast cancer metastasis to the lymph nodes. According to transcriptome data, lymph node metastasis was mostly driven by gene mutation rather than by gene expression. Although differential gene expression analysis was based on lymph node metastasis status, many genes were shown to be differentially expressed based on estrogen receptor status.
Introduction
Breast cancer is the most frequently occurring cancer in women and has become a major public health problem. The worldwide incidence of female breast cancer has been predicted to reach approximately 3.2 million new cases per year by 2050 [1].
Lymph node metastasis is of major prognostic significance for breast cancer [2]. The presence and number of lymph node metastases are associated with compromised survival in patients with other types of cancer, such as papillary thyroid cancer [3]. Metastasis is caused by complex interactions that involve many factors, including molecular factors triggered by tumor cell proliferation, cytokine production and expansion, tumor microenvironmental changes, and other mechanical factors inside the tumor and their interactions with host tissues [4].
The traditional view holds that tumor metastasis is the result of an accumulation of mutations, especially mutations in metastasis genes. A study by Simpson et al. showed that the tumor mutation burden increases the presentation of neoantigens that stimulate immune tumor recognition, resulting in improved immunotherapy outcomes in melanoma and other cancers [5]. A higher mutation burden and mutant allele fraction of circulating tumor DNA corresponds to a worse progression-free survival in metastatic breast cancer patients [6]. Mansfield et al. observed a higher mutation burden in metastatic lesions [7]. However, the relationship between the mutation and neoantigen burdens of primary breast cancer and lymph node metastasis is not known.
Tumor-infiltrating lymphocytes (TILs) are associated with the response to neoadjuvant chemotherapy in triple-negative breast cancer (TNBC) and HER2-positive breast cancer [8]. Pan-cancer immunogenomic analyses have revealed that many TILs related to adaptive immunity are associated with a good prognosis, including activated CD8+ T cells, effector memory and central memory CD8+ T cells, and effector memory CD4+ T cells, whereas MDSCs and Tregs are associated with a poor prognosis [9]. Therefore, studying TILs that are highly enriched in breast cancers without lymph node metastasis can provide clues for slowing tumor progression.
In clinical practice, we noted that breast cancer is highly heterogeneous in its pathological characteristics. Some patients have no lymph node metastasis, even when the primary tumors are relatively large, while others have lymph node metastasis at a very early stage. To investigate the mechanism of lymph node metastasis in breast cancer, we downloaded whole exome sequencing data and RNA-seq data for 243 samples from the TCGA project and assessed characteristics of the tumor itself and of the tumor microenvironment, such as the mutation burden, neoantigens, tumor heterogeneity, TILs and gene expression. Interestingly, we noted that high mutation and neoantigen burdens were associated with the absence of lymph node metastasis, suggesting that they can suppress lymph node metastasis of breast cancer. Most of the mutations specific to the lymph node-negative group are in proteins associated with microtubules. In other words, destroying microtubule-related protein structures may help inhibit lymph node metastasis in breast cancer. For TP53, the distribution of mutation hotspots in the lymph node-positive group was clearly distinct from that in the lymph node-negative group. We analyzed the neoantigen origin proteins specifically present in the lymph node metastasis group, which suggested potential targeted therapies for inhibiting breast cancer metastasis. As expected, the fraction of effector TILs is higher in samples with no lymph node metastasis than in samples with lymph node metastasis. In particular, the proportions of activated memory CD4+ T cells and activated mast cells in the lymph node-negative group were both double those in the lymph node-positive group.
Results
Sample demographic statistics
Publicly available clinical information for 1098 BRCA cases in the TCGA database was used as the primary source. Applying the criteria described in the Methods section yielded 128 LN-negative samples and 115 LN-positive samples. The demographic characteristics are shown in Table 1.
Table 1.
Characteristic | LN-negative | LN-positive | p |
---|---|---|---|
N | 128 | 115 | |
Race (%) | | | 0.355 |
NA | 3 (2.3) | 0 (0.0) | |
American Indian or Alaska native | 0 (0.0) | 1 (0.9) | |
Asian | 6 (4.7) | 5 (4.3) | |
Black or African American | 23 (18.0) | 17 (14.8) | |
White | 96 (75.0) | 92 (80.0) | |
Number of positive lymph nodes by HE (mean (sd)) | 0.00 (0.00) | 7.13 (5.71) | <0.001 |
Progesterone receptor status (%) | | | 0.035 |
NA | 9 (7.0) | 10 (8.7) | |
Indeterminate | 0 (0.0) | 1 (0.9) | |
Negative | 54 (42.2) | 29 (25.2) | |
Positive | 65 (50.8) | 75 (65.2) | |
Estrogen receptor status (%) | | | 0.01 |
NA | 8 (6.2) | 10 (8.7) | |
Indeterminate | 0 (0.0) | 1 (0.9) | |
Negative | 46 (35.9) | 20 (17.4) | |
Positive | 74 (57.8) | 84 (73.0) | |
HER2 immunohistochemistry receptor status (%) | | | 0.026 |
NA | 15 (11.7) | 26 (22.6) | |
Equivocal | 28 (21.9) | 12 (10.4) | |
Indeterminate | 0 (0.0) | 1 (0.9) | |
Negative | 69 (53.9) | 57 (49.6) | |
Positive | 16 (12.5) | 19 (16.5) | |
Therapy types (%) | | | 0.019 |
NA | 36 (28.1) | 24 (20.9) | |
Ancillary | 1 (0.8) | 1 (0.9) | |
Chemotherapy | 55 (43.0) | 73 (63.5) | |
Chemotherapy and hormone therapy | 0 (0.0) | 1 (0.9) | |
Hormone therapy | 34 (26.6) | 12 (10.4) | |
Immunotherapy | 1 (0.8) | 1 (0.9) | |
Targeted molecular therapy | 0 (0.0) | 1 (0.9) | |
Other | 1 (0.8) | 2 (1.7) | |
Pathologic stage (%) | | | <0.001 |
Stage II | 3 (2.4) | 1 (0.9) | |
Stage IIA | 106 (82.8) | 8 (7.0) | |
Stage IIB | 15 (11.7) | 26 (22.6) | |
Stage III | 0 (0.0) | 2 (1.7) | |
Stage IIIA | 0 (0.0) | 53 (46.1) | |
Stage IIIB | 4 (3.1) | 3 (2.6) | |
Stage IIIC | 0 (0.0) | 22 (19.1) | |
Age at initial pathologic diagnosis (mean (sd)) | 54.87 (9.19) | 52.43 (9.08) | 0.039 |
Vital status follow up (%) | | | 0.142 |
NA | 2 (1.7) | 3 (2.8) | |
Alive | 111 (91.7) | 89 (83.2) | |
Dead | 8 (6.6) | 15 (14.0) | |
OS time (mean (sd)) | 1108.70 (1069.91) | 1416.06 (1426.87) | 0.091 |
Mutation burden in relation to lymph node metastasis
A malignant tumor, such as one with lymph node metastasis, is generally expected to have a high mutation burden. We first asked whether the non-synonymous mutation burden could distinguish the LN-negative and LN-positive groups. Somatic mutations called by the MuTect2 software were available for 118 LN-negative samples and 99 LN-positive samples. Interestingly, as shown in Fig. 1A, the non-synonymous mutation burden of the LN-negative group (median 47) was significantly higher (Wilcoxon rank-sum test, p < 0.0001) than that of the LN-positive group (median 32). Because a high tumor mutation burden (TMB) may be associated with TNBC, we stratified the data into a TNBC group and a non-TNBC group and compared TMB between the LN-negative and LN-positive groups within each stratum. The Wilcoxon rank-sum test gave p = 0.008 for the TNBC stratum and p = 0.012 for the non-TNBC stratum.
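The stratified comparison described above can be illustrated with a minimal sketch. It is not the original analysis code: the burden values below are invented for illustration, and the test is implemented with the normal approximation (midranks for ties, tie-corrected variance, no continuity correction), which may differ slightly from what a statistics package reports, especially for small groups.

```python
import math
from statistics import median


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Ties receive midranks and the variance is tie-corrected; no continuity
    correction is applied. Returns (U statistic of x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                            # size of this tie block
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                           # all values identical
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Invented values for illustration only: (subtype, LN status, mutation burden)
toy = [
    ("TNBC", "neg", 88), ("TNBC", "neg", 120), ("TNBC", "neg", 75), ("TNBC", "neg", 140),
    ("TNBC", "pos", 52), ("TNBC", "pos", 61), ("TNBC", "pos", 47), ("TNBC", "pos", 80),
    ("non-TNBC", "neg", 44), ("non-TNBC", "neg", 51), ("non-TNBC", "neg", 39), ("non-TNBC", "neg", 60),
    ("non-TNBC", "pos", 30), ("non-TNBC", "pos", 28), ("non-TNBC", "pos", 41), ("non-TNBC", "pos", 25),
]

for subtype in ("TNBC", "non-TNBC"):
    neg = [b for s, ln, b in toy if s == subtype and ln == "neg"]
    pos = [b for s, ln, b in toy if s == subtype and ln == "pos"]
    u, p = rank_sum_test(neg, pos)
    print(subtype, "median neg:", median(neg), "median pos:", median(pos), "p:", round(p, 4))
```

Testing within each stratum, rather than on the pooled data, keeps a subtype imbalance between the two groups from masquerading as a lymph node effect.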
Highly mutated genes with distinct mutation patterns
Mutational patterns of highly mutated genes were distinct between the LN-negative and LN-positive groups. Among the top 10 mutated genes of each group, TP53, PIK3CA, TTN, CDH1, GATA3, and KMT2C were shared (Fig. 1B). CDH1 carried more nonsense (stop-gain) mutations and fewer frame-shift deletions in the LN-negative group than in the LN-positive group. We also noted one nonsense mutation in PIK3CA in the LN-negative group (Fig. 1C). As expected for a tumor suppressor gene, the TP53 mutation sites were dispersed, whereas those of the proto-oncogene PIK3CA were clustered (Fig. 1D). All three conserved domains of TP53 were mutated in the LN-negative group, but only one conserved domain was mutated in the LN-positive group. The distribution of mutation sites on PIK3CA was similar between the two groups (Fig. 1D).
Almost all genes with significantly differential mutation rates are specific to the LN-negative group
We selected genes whose mutation rates differed significantly between the LN-negative and LN-positive groups (Table 2). The second and third columns give the number of samples carrying mutations in each gene. All of the genes except PLD5 were mutated more frequently in the LN-negative group.
Table 2.
Gene | LN-positive (n = 99) | LN-negative (n = 118) | p-value |
---|---|---|---|
DST | 0 | 10 | 0.002187 |
PCNT | 0 | 9 | 0.004294 |
LRP1B | 1 | 11 | 0.00713 |
HERC2 | 0 | 8 | 0.008435 |
RB1 | 0 | 8 | 0.008435 |
ZDBF2 | 0 | 8 | 0.008435 |
PCDH15 | 0 | 7 | 0.01658 |
PREX2 | 0 | 7 | 0.01658 |
TNIK | 0 | 7 | 0.01658 |
XIRP2 | 0 | 7 | 0.01658 |
ZNF536 | 0 | 7 | 0.01658 |
DNAH7 | 1 | 9 | 0.02326 |
VPS13C | 1 | 9 | 0.02326 |
TP53 | 32 | 56 | 0.02681 |
MUC16 | 4 | 15 | 0.02957 |
BIRC6 | 0 | 6 | 0.03265 |
BRCA1 | 0 | 6 | 0.03265 |
CECR2 | 0 | 6 | 0.03265 |
CNTNAP5 | 0 | 6 | 0.03265 |
DUSP27 | 0 | 6 | 0.03265 |
PCDH19 | 0 | 6 | 0.03265 |
PDE4B | 0 | 6 | 0.03265 |
RP1 | 0 | 6 | 0.03265 |
SI | 0 | 6 | 0.03265 |
TPR | 0 | 6 | 0.03265 |
TRPS1 | 0 | 6 | 0.03265 |
PLD5 | 4 | 0 | 0.04189 |
To investigate the functional association of the genes with significantly different mutation rates, we analyzed them with the GeneMANIA plugin in the Cytoscape software (Fig. 2). The yellow genes are query genes, while the gray genes are related to the query genes. Most of the network interactions were physical interactions, genetic interactions, or co-expression. The largest functional group of genes in the network was related to microtubules (shown with a diamond shape in Fig. 2), with annotations such as microtubule cytoskeleton organization, microtubule-associated complex, and microtubule binding. The genes involved included BRCA1, PCNT, BIRC6, RP1, RB1, TPR, DNAH7, PAFAH1B1, DYNC1H1, DISC1, and AGTPBP1.
Neoantigen burden is low in LN-positive samples, but neoantigen origin proteins may be potential vaccine targets
The LN-positive samples had a significantly lower (Wilcoxon rank-sum test, p < 0.005) neoantigen burden (Fig. 3A) and neoantigen origin protein burden (Fig. 3B) than samples from the LN-negative group. The neoantigen origin proteins (N = 11) that occurred only in the LN-positive group were closely connected (Fig. 3C). These LN-positive-specific proteins were MAPK10, BCL9L, TRIM65, CD93, KITLG, CNPPD1, CPED1, CCDC146, TMEM185A, INO80D, and PSMD11. MAPKs are protein kinases involved in directing cellular responses to a diverse array of stimuli, such as mitogens, osmotic stress, heat shock and proinflammatory cytokines [10]. MAPKs regulate cell functions, including proliferation, gene expression, differentiation, mitosis, cell survival, and apoptosis. The BCL9L (B-cell CLL/lymphoma 9 like) protein shares a conserved domain with BCL9, which is related to intestinal tumor progression. TRIM65 can trigger β-catenin signaling via ubiquitylation of Axin1 to promote hepatocellular carcinoma [11].
LN-negative samples have high heterogeneity
Tumor heterogeneity and clonality of mutations within lesions are deemed responsible for relapses in malignancies and present challenges for targeted therapy. Therefore, we compared clonality and neoantigen origin clonal information between the LN-negative and LN-positive groups (Fig. 4A). Across all samples, and within the ER-negative and ER-positive subsets, there were more clonal and subclonal samples in the LN-negative group. The number of neoantigens from clonal and subclonal samples showed the same trend, although many of the differences were not statistically significant. The tumor composition of the ER-negative group was more complex than that of the ER-positive group (Fig. 4A, middle vs bottom). This result could help to explain why the ER-negative samples were more malignant.
Activated memory CD4+ T cells and mast cells heavily infiltrated samples from the LN-negative group
TILs include T cells, B cells, natural killer cells, macrophages, neutrophils, dendritic cells, mast cells, eosinophils, and basophils. Tumor-infiltrating immune cells can often be found in the stroma and within the tumor itself. Their functions can dynamically change throughout tumor progression and in response to anticancer therapy. TILs are implicated in killing tumor cells. The presence of lymphocytes in tumors is often associated with a better clinical outcome.
We classified 22 immune cell types into three groups: effectors, suppressors, and reserves (Table S1 and Fig. 4B). Across all samples, and within the ER-negative and ER-positive subsets, there were more effector immune cells and fewer reserve cells in the LN-negative group. In particular, the activated memory CD4+ T cell fraction in the LN-negative group was approximately 2.4%, double that of the LN-positive group. The fraction of activated mast cells was also higher in the LN-negative group. The fractions of CD8+ T cells and activated dendritic cells in the LN-negative group were slightly higher than those in the LN-positive group. Suppressor cell fractions did not differ noticeably between the two groups. The resting mast cell fraction was higher in the LN-positive group than in the LN-negative group.
Lymph node metastasis of breast cancer is likely driven by mutations but not by changes in gene expression
The DESeq2, TCGAanalyze_DEA and limma methods selected 598, 456 and 866 genes as differentially expressed genes (DEGs), respectively. Forty-eight DEGs were shared by all three methods. Although stringent criteria were used to select DEGs, the heatmap showed that the expression of some genes was not stable within a group. We noted that this instability could mostly be explained by estrogen receptor status. The estrogen receptor status was often the same as the progesterone receptor status but not the HER2 status (Fig. 5A).
Functional clustering shows that the DEG functions are associated with ‘immune response’, ‘defense response’ and ‘cellular response to chemical stimulus’, among others (Fig. 5B).
Discussion
To study the mechanism of lymph node metastasis in breast cancer, we analyzed exome sequencing and RNA sequencing data from more than 200 samples from the TCGA project. Our results revealed a number of associations for breast cancer lymph node metastasis, such as non-synonymous mutation burden, neoantigen burden, significantly different gene mutation rates, mutation hotspot distribution on TP53, tumor heterogeneity, neoantigen origin proteins and differentially expressed genes.
First, we observed that breast cancer samples with lower mutation and neoantigen burdens are more likely to have lymph node metastasis. The cumulative theory of mutations holds that tumors arise from accumulating mutations: as the mutation load increases, originally normal cells gain the ability to proliferate indefinitely and form tumor cells, and the higher the mutation load, the higher the malignancy of the tumor. The tendency to metastasize is an important indicator of malignancy. Cazier et al. found that, in bladder cancer patients, the mutation load was correlated with clinical pathology, and that a high mutation load can help identify lesions with a high risk of invasiveness in early or poorly differentiated tumors [12]. However, we found that breast cancer samples with no lymph node metastasis have a higher mutation burden. One possible explanation is that tumor cells can be regarded as foreign substances, so non-specific immune cells are more likely to target cells with a large number of mutations. In our analysis, the level of activated dendritic cells was higher in the non-lymph node metastasis group than in the lymph node metastasis group, and these cells could stimulate innate immunity. The innate immune system may therefore clear highly mutated cells quickly, before they can metastasize. Our results are consistent with those of Birkbak et al., who found that, in TCGA ovarian cancer samples, a small number of non-synonymous mutations was associated with chemotherapy resistance and shorter progression-free and overall survival, whereas a large number of homozygous mutations predicted a better prognosis [13].
Many genes with a significantly different mutation rate between the LN-positive and LN-negative groups were related to microtubules. Tubulin and microtubule-associated proteins may play a role in a series of cellular stress responses, thereby helping cancer cell survival [14]. The tubulin family is the target of tubulin-based chemotherapeutic drugs, which inhibit the dynamics of the mitotic spindle, causing mitotic arrest and cell death. Changes in microtubule stability and the expression of different tubulin isoforms, as well as altered post-translational modifications, have been reported to be involved in a variety of cancers [15].
Somatic mutation-induced tumor-specific antigens (neoantigens) have become key targets of immunotherapy. Neoantigen burden can be a biomarker in cancer immunotherapy and provide an incentive for the development of novel therapeutic approaches that selectively enhance T cell reactivity against this class of antigens [16]. We found that the neoantigen peptide burden was significantly higher in the non-lymph node metastasis group than in the lymph node metastasis group. Because more than one neoantigen peptide can come from one protein, we also compared the neoantigen origin protein burden, for which the difference was more pronounced. However, the neoantigen origin clonal fraction was almost the same in the two groups. Therefore, if a tumor vaccine were directed against clonal neoantigens, no difference in response rate between the two types of breast cancer would be expected.
Neoantigen-targeting vaccines have shown promise in several preclinical and clinical studies. However, to date, neoantigen vaccine studies have involved only tumors with a high mutation burden. In reality, T cells that specifically target neoantigens do not always recognize tumor cells; in other words, the corresponding mutations do not always produce MHC-presented epitopes [17]. In our study, we filtered neoantigen-associated proteins in lymph node metastasis samples. It can be hypothesized that drugs targeting these proteins can inhibit lymph node metastasis in breast cancer. These proteins include MAPK10, BCL9L, TRIM65, CD93, KITLG, CNPPD1, CPED1, CCDC146, TMEM185A, INO80D, and PSMD11. Using mass spectrometry, Orlandini et al. found that CD93 is the antigen bound by the antibody 4E1 and mapped the recognized epitope. CD93 is a transmembrane protein that is heavily glycosylated and preferentially expressed in the vascular endothelium, and CD93 silencing impairs human endothelial cell proliferation, migration, and sprouting. They showed that 4E1 is a novel antiangiogenic antibody and identified CD93 as a new target suitable for antiangiogenic therapy [18]. That study suggests that the proteins we listed may provide clues to potential immunotherapy targets. The CD93 neoantigen occurred only in breast cancer with lymph node metastasis, which points to a close relationship between angiogenesis and lymph node metastasis.
It is to be expected that the tumor suppressor gene TP53 has a dispersed distribution of mutations and that mutations in the proto-oncogene PIK3CA cluster into hotspots. These two genes are highly mutated in both the lymph node metastasis group and the nonmetastatic group. The study by Kotoula et al. showed that TP53 and PIK3CA mutations appear to have diverse effects on the outcome of early breast cancer patients, according to whether or not these genes were comutated [19]. We found that 14 (13.0%) samples in the nonmetastasis group were comutated in these 2 genes, compared with 9 (9.1%) samples in the lymph node metastasis group; the proportion was low in both groups. Another finding is that, in breast cancer samples with lymph node metastasis, the TP53 mutation sites are distributed only within the p53 (DNA-binding) domain, which is consistent with the previous observation that most cancer somatic mutations are located in the DNA-binding domain [20]. In the non-lymph node metastasis group, mutations are widely distributed over three conserved domains: P53_TAD (natively unfolded amino-terminal transactivation domain), P53 DNA-binding and P53_tetramer (tetramerization). This pattern may be a biomarker for good prognosis.
The TIL status has recently been proposed to predict the clinical outcome of patients with breast cancer. TILs are independent positive prognostic indicators of survival time for neoadjuvant anti-HER2 therapy and chemotherapy in early breast cancer patients [21]. In the future, TILs should be considered a prognostic marker of clinical therapies for HER2-positive breast cancer [22]. We found that the activated memory CD4+ T cell fraction in the LN-negative group was approximately 2.4%, double that of the LN-positive group. Lucas et al. compared primary and metastatic thyroid cancer and noted that LN metastasis is enriched with activated immune cells [23]. Unlike their study, we compared primary cancers, with one group having lymph node metastasis at an early stage and the other group having no lymph node metastasis even though the primary tumor was relatively large. Our results suggest that metastatic ability is not gained through tumor growth and differentiation, and that tumors with innate metastatic ability use a different intrinsic mechanism.
A weakness of this study is the limited sample size per group. Because tumors are highly heterogeneous, the trends suggested by our data may not be observed in every sample. In addition, many of the conclusions may not apply to other types of tumors.
Methods
The publicly available BRCA datasets were downloaded from the TCGA project [24] using the TCGAbiolinks package [25] from Bioconductor. We selected samples from 30- to 70-year-old females with no positive lymph nodes by HE, stage II, IIA, IIB, or IIIB, and TNM categories [26] of T4N0M0, T4bN0M0, T3N0M0, T3N0(i-)M0, T2N0M0 or T2N0(i-)M0 as the LN-negative group. These criteria excluded LN-negative samples at a very early stage or with a very small tumor. Samples from 30- to 70-year-old females with >3 positive lymph nodes, stage IIIA or IIIC, TNM categories of T1cN1M0, T1cN1MX, T2bN1M0, T2N1aM0, T2N1bM0, T2N1M0, T1bN3aM0, T1cN1aM0, T1bN3aMx, T1cN2aM0, T1cN2aMx, T1cN2M0, T1cN3aMx, T1N2M0, T2N2aM0, T2N2aMx, T2N2M0, T2N3aM0, T2N3aMx, T2N3bM0, T2N3bMx, T2N3cM0, T2N3M0, T2N3Mx, T4bN1bM0, T4bN1M0, T3N2M0 and T3N3M0 were classified as the LN-positive group. Samples at a very late stage or with a very large tumor were excluded.
The somatic mutations detected by the MuTect2 software [27] were downloaded from the TCGA project. Mutation data were available for 118 LN-negative samples and 99 LN-positive samples. Synonymous variants and variants in intergenic or noncoding regions were filtered out for the mutation burden analysis. The maftools package [28] was used for mutation spectrum visualization. A chi-square test, using the R chisq.test function [29], was used to compare the sample mutation rates between the LN-negative and LN-positive groups.
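As an illustration of the per-gene comparison, the sketch below computes a Pearson chi-square test on a 2 × 2 table (mutated vs. unmutated samples in each group) in plain Python. It is not the original R code. The counts used in the example are those reported for DST in Table 2; the function name and the optional Yates continuity correction are assumptions made for illustration. R's chisq.test applies a continuity correction to 2 × 2 tables by default, and the text does not state which options were used, so the printed values need not reproduce the p-values in Table 2.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the table [[a, b], [c, d]] (1 degree of freedom).

    Returns (statistic, p-value). With yates=True, the Yates continuity
    correction is applied.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                      # an empty row or column: no test possible
        return 0.0, 1.0
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / denom
    # Survival function of the chi-square distribution with 1 df
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# DST in Table 2: mutated in 0 of 99 LN-positive and 10 of 118 LN-negative samples
ln_pos_mut, ln_pos_total = 0, 99
ln_neg_mut, ln_neg_total = 10, 118

for use_yates in (False, True):
    stat, p = chi_square_2x2(
        ln_pos_mut, ln_pos_total - ln_pos_mut,
        ln_neg_mut, ln_neg_total - ln_neg_mut,
        yates=use_yates,
    )
    print("yates" if use_yates else "plain", round(stat, 3), round(p, 5))
```

With expected counts as small as those for most genes in Table 2, the chi-square approximation is rough, which is one reason the choice of correction matters here.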
To investigate the functional association of the genes with significantly different mutation rates, we utilized the GeneMANIA plugin [30] in the Cytoscape software [31] (version 3.6.0; National Institute of General Medical Sciences, Seattle, WA, USA), which is based on a large set of functional association data, including protein and genetic interactions, co-expression, co-localization, pathways, and protein domain similarity.
The neoantigens for each sample, together with clonality and neoantigen origin clonal information, were downloaded from the TCIA (The Cancer Immunome Atlas, https://tcia.at/) project [9]. In the TCIA pipeline, HLA alleles were determined from RNA-sequencing data using Optitype [32]. Mutated protein peptides of 8–11 amino acids in length were analyzed with NetMHCpan [33] to estimate their binding affinity to the HLA alleles. A peptide was considered to be a neoantigen if the expression of its associated gene exceeded a certain threshold. We investigated the association of neoantigen origin proteins that occurred only in the LN-positive group with GeneMANIA in the Cytoscape software. For clonality information, the ABSOLUTE algorithm [34] was used in the TCIA pipeline to estimate the cancer cell fraction (CCF) of each mutation. A mutation was classified as clonal if the CCF was >0.95 with probability >0.5, and as subclonal otherwise.
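The clonality rule can be written out as a short sketch. The input format is an assumption made for illustration: a discrete posterior over CCF values for one mutation, given as a dictionary from CCF grid point to probability. This is not the actual ABSOLUTE or TCIA output format, and the function name is hypothetical.

```python
def classify_mutation(ccf_posterior, ccf_cutoff=0.95, prob_cutoff=0.5):
    """Label a mutation 'clonal' if P(CCF > ccf_cutoff) > prob_cutoff, else 'subclonal'.

    ccf_posterior: dict mapping a CCF value (0..1) to its posterior probability.
    The probabilities are renormalized, so they need not sum exactly to 1.
    """
    total = sum(ccf_posterior.values())
    if total <= 0:
        raise ValueError("posterior must contain positive probability mass")
    p_clonal = sum(p for ccf, p in ccf_posterior.items() if ccf > ccf_cutoff) / total
    return "clonal" if p_clonal > prob_cutoff else "subclonal"


# Invented posteriors for two hypothetical mutations
mostly_clonal = {0.50: 0.1, 0.90: 0.2, 1.00: 0.7}
mostly_subclonal = {0.30: 0.6, 0.60: 0.3, 1.00: 0.1}

print(classify_mutation(mostly_clonal))
print(classify_mutation(mostly_subclonal))
```

Both cutoffs are strict inequalities, following the wording of the rule, so a mutation with exactly half of its posterior mass above 0.95 is labeled subclonal.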
For gene expression data, we downloaded the level-3 RNA-seq FPKM (fragments per kilobase of transcript per million mapped reads) dataset. The CIBERSORT algorithm [35] was used to infer the TIL proportions of the tumor microenvironment. The LM22 dataset, which was constructed from the gene expression profiles of 22 distinct immune cell types, was downloaded from the CIBERSORT website (https://cibersort.stanford.edu/download.php). DESeq2 [36], limma [37] and the TCGAanalyze_DEA function in TCGAbiolinks were used for DEG analysis. In the DESeq2 and TCGAanalyze_DEA methods, the criteria for DEGs were fold-change >2 (or <0.5) and adjusted p-value <0.01. The cutoff in the limma method was fold-change >1.5 (or <0.67) and adjusted p-value <0.05. We kept the genes selected by all three methods as the final DEGs. DESeq2 and the TCGAanalyze_DEA function in TCGAbiolinks use the count data directly for DEG analysis, whereas in the limma method the voom function was used to transform count data to log2-counts per million (logCPM) for the linear model. The logCPM-transformed expression data were also used for heatmap visualization. The DAVID web service [38] was used for DEG functional annotation and functional clustering. The FGnet R package [39] was used for functional clustering visualization.
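The thresholding and intersection step can be sketched as follows. The gene names and values are invented, the result tables are assumed to hold linear-scale fold changes (not log2) with adjusted p-values, and the helper name is hypothetical; the cutoffs are the ones stated above.

```python
def select_degs(results, fc_up, fc_down, padj_cutoff):
    """Return genes whose fold change is > fc_up or < fc_down and whose
    adjusted p-value is below padj_cutoff.

    results: dict mapping gene name -> (fold_change, adjusted_p_value).
    """
    return {
        gene
        for gene, (fold_change, padj) in results.items()
        if (fold_change > fc_up or fold_change < fc_down) and padj < padj_cutoff
    }


# Invented per-method results: gene -> (fold change, adjusted p-value)
deseq2_res = {"GENE_A": (3.1, 0.001), "GENE_B": (0.40, 0.005),
              "GENE_C": (2.5, 0.020), "GENE_D": (1.2, 0.0001)}
tcga_dea_res = {"GENE_A": (2.8, 0.002), "GENE_B": (0.45, 0.008),
                "GENE_C": (2.2, 0.004)}
limma_res = {"GENE_A": (1.9, 0.010), "GENE_B": (0.80, 0.030),
             "GENE_C": (1.7, 0.040)}

# Cutoffs as stated in the text
deseq2_degs = select_degs(deseq2_res, fc_up=2.0, fc_down=0.5, padj_cutoff=0.01)
tcga_dea_degs = select_degs(tcga_dea_res, fc_up=2.0, fc_down=0.5, padj_cutoff=0.01)
limma_degs = select_degs(limma_res, fc_up=1.5, fc_down=0.67, padj_cutoff=0.05)

# Final DEGs: genes selected by all three methods
final_degs = deseq2_degs & tcga_dea_degs & limma_degs
print(sorted(final_degs))
```

Taking the intersection trades sensitivity for robustness: a gene must pass the cutoffs under all three statistical models, which is why the final list is much shorter than any single method's list.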
Acknowledgements
This work is supported by National Key R&D Program of China (2017YFC0907505), National Population and Health Science Data Sharing Platform Special Project (2016NCMIZX01 to Zhigang), and Peking Union Medical College Youth Research Fund (201407 to Zhigang). We appreciate the language editing of Professor Yongqun He from the University of Michigan Medical School. We also appreciate the suggestions and help of Lifang Xie from Information Technology Center of PUMC, Haitao Luo from Seqchina Co., Ltd., Nana Luo from Allwegene Co., Ltd., and Jiancheng Luo from Aiyi Co., Ltd.
Author Contributions
Y.L. and B.Z. designed the research study. Z.W. performed the analysis. W.L. gave suggestions on the analysis pipeline. W.L., X.Y. and C.C. assisted in the analysis. Z.W., C.C. and Y.L. wrote the paper.
Competing Interests
The authors declare no competing interests.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Yunping Luo, Email: ypluo@ibms.pumc.edu.cn.
Bailin Zhang, Email: bailin_zhang@cicams.ac.cn.
Electronic supplementary material
Supplementary information accompanies this paper at 10.1038/s41598-018-36319-x.
References
- 1.Hortobagyi GN, et al. The global breast cancer burden: variations in epidemiology and survival. Clinical Breast Cancer. 2005;6:391–401. doi: 10.3816/CBC.2005.n.043. [DOI] [PubMed] [Google Scholar]
- 2.Wei J-C, et al. Tumor-associated lymphatic endothelial cells promote lymphatic metastasis by highly expressing and secreting sema4c. Clinical Cancer Research. 2017;23:214–224. doi: 10.1158/1078-0432.CCR-16-0741. [DOI] [PubMed] [Google Scholar]
- 3.Adam MA, et al. Presence and number of lymph node metastases are associated with compromised survival for patients younger than age 45 years with papillary thyroid cancer. Journal of Clinical Oncology. 2015;33:2370–2375. doi: 10.1200/JCO.2014.59.8391. [DOI] [PubMed] [Google Scholar]
- 4.Nathanson SD, Shah R, Rosso K. Sentinel lymph node metastases in cancer: Causes, detection and their role in disease progression. Seminars in Cell & Developmental Biology. 2015;38:106–116. doi: 10.1016/j.semcdb.2014.10.002. [DOI] [PubMed] [Google Scholar]
- 5.Simpson D, et al. Mutation burden as a potential prognostic marker of melanoma progression and survival. Journal of Clinical Oncology. 2017;35:9567–9567. doi: 10.1200/JCO.2017.35.15_suppl.9567. [DOI] [Google Scholar]
- 6.Keenan T, et al. Abstract P2-02-18: Higher mutation burden and mutant allele fraction of circulating tumor dna corresponds to worse progression free survival in metastatic breast cancer patients. Cancer Research. 2018;78:P2–02–18–P2–02–18. doi: 10.1158/1538-7445.SABCS17-P2-02-18. [DOI] [Google Scholar]
- 7.Mansfield AS, et al. Contraction of t cell richness in lung cancer brain metastases. Scientific Reports. 2018;8:2171. doi: 10.1038/s41598-018-20622-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Denkert C, et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. The Lancet Oncology. 2018;19:40–50. doi: 10.1016/S1470-2045(17)30904-X. [DOI] [PubMed] [Google Scholar]
- 9.Charoentong P, et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Reports. 2017;18:248–262. doi: 10.1016/j.celrep.2016.12.019. [DOI] [PubMed] [Google Scholar]
- 10.Pearson G, et al. Mitogen-activated protein (map) kinase pathways: regulation and physiological functions. Endocrine Reviews. 2001;22:153–183. doi: 10.1210/edrv.22.2.0428. [DOI] [PubMed] [Google Scholar]
- 11.Yang Y-F, Zhang M-F, Tian Q-H, Zhang CZ. Trim65 triggers -catenin signaling via ubiquitylation of axin1 to promote hepatocellular carcinoma. Journal of Cell Science. 2017;130:3108–3115. doi: 10.1242/jcs.206623. [DOI] [PubMed] [Google Scholar]
- 12.Cazier J-B, et al. Whole-genome sequencing of bladder cancers reveals somatic CDKN1A mutations and clinicopathological associations with mutation burden. Nature. Communications. 2014;5:3756. doi: 10.1038/ncomms4756. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Birkbak NJ, et al. Tumor mutation burden forecasts outcome in ovarian cancer with brca1 or brca2 mutations. PLOS ONE. 2013;8:e80023. doi: 10.1371/journal.pone.0080023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Parker AL, Kavallaris M, McCarroll JA. Microtubules and their role in cellular stress in cancer. Frontiers in Oncology. 2014;4.
- 15.Wikipedia contributors. Microtubule — Wikipedia, the free encyclopedia (2018). [Online; accessed 01-June-2018].
- 16.Schumacher TN, Schreiber RD. Neoantigens in cancer immunotherapy. Science. 2015;348:69–74. doi: 10.1126/science.aaa4971.
- 17.Martin SD, et al. Low mutation burden in ovarian cancer may limit the utility of neoantigen-targeted vaccines. PLOS ONE. 2016;11:e0155189. doi: 10.1371/journal.pone.0155189.
- 18.Orlandini M, et al. The characterization of a novel monoclonal antibody against CD93 unveils a new antiangiogenic target. Oncotarget. 2014;5:2750–2760. doi: 10.18632/oncotarget.1887.
- 19.Kotoula V, et al. Effects of TP53 and PIK3CA mutations in early breast cancer: a matter of co-mutation and tumor-infiltrating lymphocytes. Breast Cancer Research and Treatment. 2016;158:307–321. doi: 10.1007/s10549-016-3883-z.
- 20.Joerger AC, Fersht AR. The tumor suppressor p53: from structures to drug discovery. Cold Spring Harbor Perspectives in Biology. 2010;2.
- 21.Salgado R, et al. Tumor-infiltrating lymphocytes and associations with pathological complete response and event-free survival in HER2-positive early-stage breast cancer treated with lapatinib and trastuzumab: a secondary analysis of the NeoALTTO trial. JAMA Oncology. 2015;1:448–455. doi: 10.1001/jamaoncol.2015.0830.
- 22.Luen SJ, et al. Tumour-infiltrating lymphocytes in advanced HER2-positive breast cancer treated with pertuzumab or placebo in addition to trastuzumab and docetaxel: a retrospective analysis of the CLEOPATRA study. The Lancet Oncology. 2017;18:52–62. doi: 10.1016/S1470-2045(16)30631-3.
- 23.Cunha L, Nonogaki S, Soares FA, Vassallo J, Ward LS. Immune escape mechanism is impaired in the microenvironment of thyroid lymph node metastasis. Endocrine Pathology. 2017;28:369–372. doi: 10.1007/s12022-017-9495-2.
- 24.Hoadley KA, et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell. 2014;158:929–944. doi: 10.1016/j.cell.2014.06.049.
- 25.Colaprico A, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Research. 2016;44:e71. doi: 10.1093/nar/gkv1507.
- 26.Edge SB, Compton CC. The American Joint Committee on Cancer: the 7th edition of the AJCC cancer staging manual and the future of TNM. Annals of Surgical Oncology. 2010;17:1471–1474. doi: 10.1245/s10434-010-0985-4.
- 27.Cibulskis K, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnology. 2013;31:213–219. doi: 10.1038/nbt.2514.
- 28.Mayakonda A, Koeffler HP. Maftools: efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies. bioRxiv 052662 (2016).
- 29.R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2018).
- 30.Montojo J, et al. GeneMANIA Cytoscape plugin: fast gene function predictions on the desktop. Bioinformatics. 2010;26:2927–2928. doi: 10.1093/bioinformatics/btq562.
- 31.Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
- 32.Szolek A, et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics. 2014;30:3310–3316. doi: 10.1093/bioinformatics/btu548.
- 33.Hoof I, et al. NetMHCpan, a method for MHC class I binding prediction beyond humans. Immunogenetics. 2009;61:1. doi: 10.1007/s00251-008-0341-z.
- 34.Carter SL, et al. Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnology. 2012;30:413–421. doi: 10.1038/nbt.2203.
- 35.Newman AM, et al. Robust enumeration of cell subsets from tissue expression profiles. Nature Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337.
- 36.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 37.Ritchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43:e47. doi: 10.1093/nar/gkv007.
- 38.Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
- 39.Aibar S, Fontanillo C, Droste C, De Las Rivas J. Functional gene networks: R/Bioc package to generate and analyse gene networks derived from functional enrichment and clustering. Bioinformatics. 2015;31:1686–1688. doi: 10.1093/bioinformatics/btu864.