Abstract
Dysbiosis is an imbalance of an organ's microbiome and plays a role in colorectal cancer pathogenesis. Characterizing the bacteria in the microenvironment of a cancer through genome sequencing has advantages compared to culture-based profiling. However, there are notable technical and analytical challenges in characterizing universal features of tumor microbiomes. Colorectal tumors demonstrate microbiome variation among different studies and across individual patients. To address these issues, we conducted a computational study to determine a consensus microbiome for colorectal cancer, analyzing 924 tumors from eight independent RNA-Seq data sets. A standardized meta-transcriptomic analysis pipeline was established with quality control metrics. Microbiome profiles across different cohorts were compared and recurrent microbial shifts specific to colorectal cancer were determined. We identified a cancer-specific set of 114 microbial species associated with tumors that were found among all investigated studies. Firmicutes, Bacteroidetes, Proteobacteria and Actinobacteria were the four most abundant phyla in the colorectal cancer microbiome. Member species of Clostridia were depleted and Fusobacterium nucleatum was one of the most enriched bacterial species in tumors. Associations between the consensus species and specific immune cell types were noted. Our results are available as a web data resource for other researchers to explore (https://crc-microbiome.stanford.edu).
Graphical Abstract
We identified a consensus microbiome for colorectal cancer from 924 tumors across eight RNA-Seq data sets. This microbiome profile comprised 114 microbial species that were associated with tumors across all studies.
INTRODUCTION
Tumors such as colorectal cancer have specific biological interactions with their surrounding commensal microbial species. Humans coexist with a rich diversity of bacteria and viruses living within the confines of specific tissue niches. This collection of microbial organisms, referred to as the microbiome, vastly outnumbers the eukaryotic cells making up our various tissues (1,2). The cellular interactions of specific organ tissue and the microbiome can be beneficial, neutral or pathogenic in terms of non-infectious human diseases. Beneficial microbes play critical roles in maintaining immune function, metabolic homeostasis and overall health (3). Neutral bacteria have no discernible consequences on the host. Pathogenic microorganisms may increase the risk and severity of conditions like inflammation (4), obesity (5), fatty liver disease (6), type 2 diabetes (7) and carcinogenesis (8). An indicator of a microbial influence in disease pathogenesis, dysbiosis is an imbalanced state of the naturally occurring microbiota in which specific pathogenic microbes overgrow other components. This phenomenon leads to a fundamental shift in the contents of the microbiome, and this imbalance has the potential to lead to cancer (4). The microbiome properties of colorectal cancer (CRC) have been of interest given that the colon and the rectum have the most abundant and diverse microbiome of any human organ. Many studies seek to identify specific microbiome properties that are indicators of dysbiosis and influence colorectal cancer development, phenotype and clinical outcomes.
We define two specific environmental niches for the analysis of the colorectal cancer microbiome. Generally, the largest and most diverse niche involves the microbial and viral contents of the fecal material within the colon. This is a high biomass source for microbiomes. The smaller niche, a subset of the fecal material, involves those microbes that are in direct contact with the colorectal tumor. The colon mucosa is composed of a thin layer of epithelial cells (the epithelium), a layer of connective tissue (the lamina propria) and a thin layer of muscle (the muscularis mucosae). The formal pathologic definition of colon carcinoma refers to epithelial cells that have malignant properties and have invaded past multiple layers of the mucosa (9). Therefore, a clinical diagnosis of colorectal cancer requires an adequate biopsy of colon tissue that includes all mucosal cell layers. Per the universally accepted histopathologic criteria, colon adenocarcinoma arises from the mucosal epithelium. This colorectal mucosa-associated microbiome has an important role in colorectal cancer biology given its direct contact with the colon epithelial tumor cells and its interactions with the local tumor microenvironment (TME). Because of its direct contact with the colon cellular microenvironment, this microbiome niche is carried over after a biopsy or surgical resection of a tumor. Thus, the DNA or RNA extracted from a tumor reflects the microbial contents adjacent to and intermingled with the colon mucosa.
For genomic microbial characterization of CRCs, next-generation sequencing (NGS) methods like RNA-Seq have been used for determining the microbiomes of specific tissues. For example, Simon et al. investigated >17 000 samples from publicly available human RNA-Seq data and found that a significant proportion of unmapped reads were of microbial origin (10). Sequencing the 16S rRNA gene is another common method for determining microbiome characteristics. The 16S gene contains nine hypervariable regions (V1-V9) that provide a sequence barcode for identifying microbial species and conducting phylogenetic analysis (11). Depending on the sequencing approach, microbial abundance estimation is represented in operational taxonomic units (OTUs) or amplicon sequence variants (ASVs), which are usually mapped to the genus or species level (12). Each molecular dataset captures different aspects of the patient's microbiota; comparative analysis of data from these two methods may provide insights not possible through a single data type alone.
There is substantive evidence that dysbiosis is associated with the development and progression of CRC (11,13,14). Studies have focused on either (1) the fecal contents from CRC patients or (2) direct analysis of CRC tumors together with the microbiome that is in direct contact with the tumor epithelium. As an example of the former, Sobhani et al. (13) performed one of the first studies to identify cancer-related dysbiosis in CRC from the analysis of fecal material from patients. They found that an elevated representation of the Bacteroides/Prevotella group was present among the majority of CRCs in their sample set. Using a similar approach, Yu et al. (14) performed metagenomic profiling of CRC samples and showed that four microbial species (Parvimonas micra, Solobacterium moorei, Fusobacterium nucleatum and Peptostreptococcus stomatis) were enriched in individuals with CRC compared to normal controls. These studies were limited to fecal samples, which represent a distinct niche from CRC tissue samples.
The direct sequencing analysis of CRC tumors and the tumor-associated mucosa provides insight into the microbiota that are directly associated with the TME. Given their direct contact with the cellular milieu of the tumor, these microbes may play a role in the physiopathology of CRC (15). As the most widely validated example for the mucosa-proximal microbiome of CRC, many studies have demonstrated an enrichment of Fusobacterium nucleatum (hereafter F. nucleatum) in CRC tumors. The initial discoveries were based on identifying microbial sequence reads from genomics studies of CRCs (16). Some studies have shown that F. nucleatum is associated with higher-stage CRC and a lower density of T-cells in the CRC TME. Some of these observations have been borne out experimentally (17). For example, this bacterium activates the WNT signaling pathway in CRC cells and inhibits T-cell-mediated immune responses against tumors (18).
Obtaining a high-quality characterization of cancer microbiomes has challenges. In the case of the mucosa-associated microbiome of colorectal cancer, samples are exposed to contamination across multiple steps as a clinical biopsy is acquired, processed and sequenced. This includes the presence of microbial DNA among the molecular biology reagents used for sequencing and genetic characterization. Complicating any analysis, the use of stringent quality controls has been inconsistent for cancer-based genomic studies of the microbiome (19). These issues can dramatically skew microbiome results. Different sequencing methods such as 16S and RNA-Seq reveal different microbiome features. As an added challenge, microbiomes vary among individuals from different geographic regions and ethnic backgrounds. This fundamental variation among individual microbiomes makes it more difficult to identify common microbial species that may have a universal role in colorectal cancer tumorigenesis.
Addressing these challenges, we developed a scheme to analyze the colorectal cancer microbiome composition for universally shared features and then determine their potential role in modulating cancer and the immune system. Importantly, we sought to identify consensus bacterial species that were consistently observed across multiple independent CRC cohorts; most prior studies have been limited to evaluating the genus level. We utilized different RNA-Seq datasets including the Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) and those available from the Gene Expression Omnibus (GEO) database. In total, 924 CRCs were included in this study to investigate the microbiome profiles across different studies. Among the eight studies, five used total RNA and three used mRNA selection. One of our goals was to determine common bacterial species that were detected and associated with colorectal cancers regardless of the RNA isolation method (Table 1).
Table 1.
RNA-Seq datasets included in the study
Study | Type of tissue | Sample origins | Type of RNA for sequencing | Sample size | Bacterial species |
---|---|---|---|---|---|
IMS3 | Tumor (N = 924) | Western United States | Total RNA | 162 | 731 |
TCGA | Tumor (N = 924) | Multiple countries* | mRNA and total RNA | 564 | 4187 |
GSE107422 | Tumor (N = 924) | South Korea | mRNA per poly-A selection | 109 | 744 |
GSE146889 | Tumor (N = 924) | Midwest United States | Total RNA | 42 | 1293 |
GSE50760 | Tumor (N = 924) | South Korea | mRNA per poly-A selection | 18 | 951 |
GSE95132 | Tumor (N = 924) | Eastern United States | Total RNA | 10 | 1321 |
GSE104836 | Tumor (N = 924) | China | Total RNA | 10 | 1729 |
GSE137327** | Tumor (N = 924) | Eastern United States | Total RNA | 9 | 3378 |
IMS3 | Matched normal colon (N = 298) | Western United States | Total RNA | 162 | 635 |
TCGA | Matched normal colon (N = 298) | Multiple countries* | mRNA and total RNA | 51 | 3763 |
GSE146889 | Matched normal colon (N = 298) | Midwest United States | Total RNA | 38 | 1412 |
GSE50760 | Matched normal colon (N = 298) | South Korea | mRNA per poly-A selection | 18 | 883 |
GSE95132 | Matched normal colon (N = 298) | Eastern United States | Total RNA | 10 | 1395 |
GSE104836 | Matched normal colon (N = 298) | China | Total RNA | 10 | 1673 |
GSE137327** | Matched normal colon (N = 298) | Eastern United States | Total RNA | 9 | 3305 |
*Brazil, Germany, Israel, Poland, Russia, Ukraine, United States, Vietnam
**Sequenced on BGI system. All other studies used Illumina.
With this large number of samples, we processed all data in the same fashion. This analysis pipeline included rigorous quality control to eliminate potential contaminants and reduce the effect of batch bias. To evaluate the quality of our mucosa-associated RNA-Seq data for characterizing microbiomes, we compared these results to a 16S analysis for a subset of overlapping samples. We used the same tumor RNA-Seq data to determine the cellular tumor microenvironment features of each tumor. Finally, we derived a consensus microbiome composition across different CRC cohorts, determined dysbiosis features when examining matched normal-tumor pairs and investigated the association of several microbial species with the immune cellular characteristics of CRC. To facilitate the sharing of this consensus microbiome, our results are available and can be queried through a web data resource (https://crc-microbiome.stanford.edu).
MATERIALS AND METHODS
Colorectal tumor RNA-Seq data
Seven RNA-Seq CRC datasets were downloaded either from NCI’s Genomic Data Commons (GDC) or the Sequence Read Archive (SRA) (Table 1). In addition, we included an internal data set from an independent CRC cohort that we refer to as IMS3. All participants provided written informed consent as part of a study protocol approved by Stanford University. Tumor tissues were collected and preserved on formalin-fixed paraffin-embedded (FFPE) slides. All tumor samples were determined to have >60% cellularity in pathology review.
DNA and RNA sequencing of the IMS3 colorectal tumors
Tumor tissues from 2 mm punches or 5 μm scrolls from FFPE blocks were recovered and processed for nucleic acid extraction. RNA was extracted using the Maxwell 16 LEV RNA FFPE Purification Kit (Promega, Wisconsin, USA) following the manufacturer’s instructions. RNA-Seq libraries were prepared using the KAPA RNA HyperPrep Kit with RiboErase (HMR) (Roche, California, USA) with 8 cycles of PCR. The enriched libraries were quantified by qPCR using the Kapa Library Quantification kit (Roche, California, USA) and subjected to Illumina MiSeq sequencing (100 bp paired-end reads).
DNA was extracted using the Promega AS1030 Maxwell 16 Tissue DNA Purification Kit (Promega, Wisconsin, USA) following the manufacturer’s protocols. The concentration of DNA was quantified with the Qubit system (Thermo-Fisher Scientific, Massachusetts, USA), and DNA integrity was evaluated using LabChip GX (PerkinElmer, Waltham, Massachusetts, USA). Five hundred nanograms of DNA from each sample was sheared using a Covaris E220 sonicator (Covaris, Massachusetts, USA) (microTUBES AFA fibre, 10% duty cycle, 200 cycles per burst, intensity 5 and time 55 s) and purified by a 0.8× AMPure XP (Beckman-Coulter, California, USA) bead cleanup. The hypervariable regions (V3-4) of the 16S rRNA gene from each sample were amplified using Forward primer (5′-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG CCT ACG GGN GGC WGC AG-3′) and Reverse primer (5′-GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GGA CTA CHV GGG TAT CTA ATC C-3′) with Illumina sequencing adaptors (Illumina, California, USA). The purified PCR products were then subjected to a multiplexing process using the Nextera XT Index kit (Illumina, California, USA) in 50 μl reactions. After PCR product cleanup, two batches of libraries were quantified and sequenced using an Illumina MiSeq platform.
Sequence data processing for microbiome characterization
Raw RNA-Seq data were preprocessed to remove adapter sequences and low-quality bases with Cutadapt (v2.4) (20). Trimmed data were then mapped to the human genome (GRCh38) using STAR (v2.5) (21). Uniquely mapped reads were used for subsequent immune cell infiltration analysis. Unmapped reads that passed quality control with the fastp software were used as input data for taxonomic assignment for each OTU.
Reverse reads from 16S amplicon sequencing were removed from the analysis due to low sequence quality. The remaining sequences were processed with DADA2 using the parameters maxN = 0, maxEE = 2 and truncQ = 2 for read filtering and quality checks (22). Reads that passed the quality control were used for taxonomy classification. ASV values were determined for each sample.
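The effect of the three filtering parameters can be illustrated with a minimal Python sketch. This is not DADA2 itself (an R package) but a simplified re-implementation of the general idea: a read is truncated at the first base whose Phred quality is at or below truncQ, and is then discarded if it contains more than maxN ambiguous bases or if its expected number of errors, the sum of 10^(-Q/10) over the bases, exceeds maxEE. The function names are hypothetical, qualities are assumed to be Phred-scaled integers, and dropping reads that become empty after truncation is an assumption of this sketch.

```python
def truncate_at_quality(seq, quals, trunc_q=2):
    """Truncate a read at the first base whose Phred score is <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i], quals[:i]
    return seq, quals


def expected_errors(quals):
    """Expected number of errors: sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(seq, quals, max_n=0, max_ee=2.0, trunc_q=2):
    """Apply truncation, then the ambiguous-base and expected-error checks."""
    seq, quals = truncate_at_quality(seq, quals, trunc_q)
    if not seq:  # assumption: reads emptied by truncation are dropped
        return False
    if seq.upper().count("N") > max_n:
        return False
    return expected_errors(quals) <= max_ee
```

For instance, a four-base read with uniform quality 30 has an expected error of 0.004 and passes, whereas the same read with one ambiguous base fails the maxN = 0 check.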
Taxonomic microbiome classification
Kraken2 was used as the meta-transcriptome classification tool in our study. It relies on exact k-mer matches to assign microbial sequences to specific taxonomic labels (23). Prior studies have used this program to define cancer microbiomes. The unmapped reads from RNA-Seq were queried against a Kraken2 database we created on our server (18 September 2019), which contains taxonomic information (obtained from the NCBI Taxonomy database) and complete genomes in RefSeq for bacterial, archaeal, viral, plasmid and eukaryotic organisms. The microbial relative abundance was calculated as a percentage of the microbiome among the selected top-ranked phyla/genera. Taxonomy classification results were posted to the CRC consensus microbiome website (https://crc-microbiome.stanford.edu). We applied a frequency filter of 1% and an established list to eliminate potential contaminating species.
A portion of sequencing reads may originate from contaminating microbial DNA found in the general environment or introduced by the sequencing assay. This includes contaminating microbes that entered the sample during clinical processing, were present in the sequencing reagents or grew in the fluidic systems of sequencers. To minimize the bias introduced by microbial species not associated with the original CRC, we applied a multi-step filtering process. First, we eliminated taxa that did not have at least one read count in at least 1% of the samples. Thus, rare contaminants that are underrepresented are eliminated. Second, we filtered out taxa on a list of known microbial contaminants compiled by Eisenhofer et al. (24) (Supplementary Table S1). This list is based on a series of negative controls across multiple studies and is part of their ‘RIDE’ minimum standards criteria, which address many of the potential sources of artifacts in genomic-based microbiome characterization (24).
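The two filtering steps can be sketched as follows. This is an illustrative Python example rather than the code used in the study: the function name `filter_taxa`, the toy counts, and the choice to match contaminants by genus (taken as the first word of the species name) are all assumptions made for the sake of the example.

```python
def filter_taxa(counts, contaminant_genera, min_prevalence=0.01):
    """Keep taxa seen in enough samples and not on the contaminant list.

    counts: dict mapping taxon name -> list of read counts, one per sample.
    contaminant_genera: set of genus names to exclude (hypothetical input).
    """
    kept = {}
    for taxon, per_sample in counts.items():
        n_samples = len(per_sample)
        n_present = sum(1 for c in per_sample if c >= 1)
        # Step 1: require at least one read in >= min_prevalence of samples.
        if n_samples == 0 or n_present / n_samples < min_prevalence:
            continue
        # Step 2: drop taxa whose genus is a known contaminant.
        if taxon.split()[0] in contaminant_genera:
            continue
        kept[taxon] = per_sample
    return kept


# Toy data, for illustration only.
toy_counts = {
    "Fusobacterium nucleatum": [3, 0, 7, 0],
    "Cutibacterium acnes": [9, 9, 9, 9],
    "Absent species": [0, 0, 0, 0],
}
filtered = filter_taxa(toy_counts, {"Cutibacterium"})
```

In this toy case only Fusobacterium nucleatum is retained: the second taxon is removed by the contaminant list and the third by the prevalence filter.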
Gene expression quantification and immune cell infiltration analysis
Gene count tables generated from RNA-Seq mapped reads were normalized using TMM (weighted trimmed mean of M-values) with the edgeR package, converted to cpm and log2 transformed (25). A filtering process was also performed to exclude genes without at least 1 cpm in 20% of the samples. We used the program xCell to estimate 64 tumor-infiltrating immune and stromal cell types, together with immune, stroma and tumor microenvironment (TME) scores, from the RNA-Seq data of each tumor or normal colon sample (xcell.ucsf.edu) (26). Multiple testing correction was applied using the p.adjust() function available in R, with the method set as ‘fdr’. The Kruskal–Wallis rank sum test was used to determine differential immune cell infiltration among patient groups using a threshold of multiple testing corrected P < 0.05.
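The cpm conversion, expression filter and log2 transformation can be illustrated with a short Python sketch. It deliberately omits the TMM scaling factors that edgeR applies to library sizes, so it is a simplification rather than a reproduction of the analysis; the function names and the pseudocount of 1 added before the log transform are assumptions.

```python
import math


def counts_per_million(counts):
    """counts: dict gene -> list of raw counts (one entry per sample)."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts per sample (no TMM adjustment here).
    lib_sizes = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    return {g: [1e6 * c[j] / lib_sizes[j] for j in range(n_samples)]
            for g, c in counts.items()}


def filter_and_log(cpm, min_cpm=1.0, min_fraction=0.20, pseudocount=1.0):
    """Keep genes with >= min_cpm in >= min_fraction of samples, then log2."""
    out = {}
    for gene, values in cpm.items():
        n_pass = sum(1 for v in values if v >= min_cpm)
        if n_pass / len(values) >= min_fraction:
            out[gene] = [math.log2(v + pseudocount) for v in values]
    return out


# Toy counts for two samples, for illustration only.
toy = {"A": [999_999, 500_000], "B": [1, 500_000], "C": [0, 0]}
toy_cpm = counts_per_million(toy)
toy_log = filter_and_log(toy_cpm)
```

Here gene C is removed because it never reaches 1 cpm, while genes A and B are retained.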
Differential microbiome analysis
Microbial differential analysis was performed using DESeq2 and phyloseq (27). Statistical tests such as the Chi-squared test and the Wilcoxon rank-sum test were performed to examine the patient grouping information with various clinical variables. Multiple testing correction was applied as previously described. Results were considered significant if the adjusted P-value was <0.05.
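As a reference for the multiple testing correction, the ‘fdr’ method of R's p.adjust() corresponds to the Benjamini-Hochberg procedure. A minimal Python sketch of that procedure is given below; the function name and the example P-values are hypothetical and serve only to show how the adjusted values are obtained.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(pvalues)
    # Visit P-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = n - k  # rank of this P-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values; all four remain below 0.05 after adjustment.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in adj]
```

Each raw P-value is scaled by n divided by its ascending rank, and the running minimum from the largest P-value downward keeps the adjusted values monotone.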
The CRC Microbiome Explorer website
We developed a web-based data resource for our study (https://crc-microbiome.stanford.edu). The microbial abundance data were uploaded to a MySQL (v5.5.62) relational database from Kraken2 output converted to mpa format. The database server has 32 GB RAM and 16 processors running Ubuntu (v16.04). The web application was written using Ruby on Rails (v5.1.7 with Ruby v2.4.2), a framework well suited for use with a backend relational database. The application server uses Ubuntu (v16.04). The application was deployed using Passenger and Apache2. The user interface utilizes Bootstrap (v3.4.1) for responsive sizing to different format clients and browsers. jQuery DataTables provides standard formatting, search and filtering capability for query tables, and Highcharts is used to format and display plots. All queries and plots are produced dynamically from the underlying database tables based on user query parameters.
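To illustrate the kind of record that such a conversion produces, the following Python sketch parses one mpa-style line into its taxonomic ranks and read count. The exact layout is an assumption: the sketch expects a pipe-separated lineage with rank prefixes such as `p__` and `s__`, followed by a tab and a count. The function name is hypothetical, and this is not the loader used for the website.

```python
RANK_PREFIXES = {"d": "domain", "k": "kingdom", "p": "phylum", "c": "class",
                 "o": "order", "f": "family", "g": "genus", "s": "species"}


def parse_mpa_line(line):
    """Split 'p__X|g__Y|s__Z<TAB>count' into a rank dict and an int count."""
    lineage, count = line.rstrip("\n").split("\t")
    ranks = {}
    for token in lineage.split("|"):
        prefix, _, name = token.partition("__")
        # Assumption: underscores in names stand in for spaces.
        ranks[RANK_PREFIXES.get(prefix, prefix)] = name.replace("_", " ")
    return ranks, int(count)


# Hypothetical input line, for illustration only.
ranks, n_reads = parse_mpa_line(
    "d__Bacteria|p__Fusobacteria|g__Fusobacterium|"
    "s__Fusobacterium_nucleatum\t42"
)
```

A record of this shape maps naturally onto a relational table with one column per rank and one for the count.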
RESULTS
CRC microbiome composition estimation from unmapped RNA-Seq
Overall, we analyzed eight primary CRC transcriptomic datasets (28–34) from a variety of sources that included the Cancer Genome Atlas (TCGA), the NIH Gene Expression Omnibus (GEO), the NIH’s Sequence Read Archive (SRA) and an independent dataset (IMS3). The CRC RNA-Seq studies included the TCGA COAD data set, which had the largest number of samples (n = 564) (35,36). In addition, GEO had six different data sets, with the highest number of CRCs coming from GSE107422 (n = 109) and the smallest set being GSE137327 (n = 9) (Table 1). The IMS3 data set contained 162 tumor and matched normal tissues. The total number of CRC samples was 924. An additional 298 matching normal colon samples were available for assessing their microbiome characteristics. Except for GSE137327, which used the BGI sequencing technology, all samples were sequenced with Illumina.
In terms of the type of RNA used for the sequencing, five of the eight studies used total RNA for the RNA-seq libraries (Table 1). Total RNA was used for the following data sets: IMS3, GSE146889, GSE95132, GSE104836 and GSE137327. Two of the data sets originated from formalin fixed paraffin embedded samples for which total RNA extraction is required. For the total RNA from flash frozen tumors, a ribosomal RNA depletion method was used to generate quality RNA-seq data that includes RNA from microbial organisms. The TCGA study and two others used mRNA. We noted that prior studies of the TCGA samples have successfully identified microbiome features from this source of RNA (19).
To process these CRC RNA-Seq cohorts, we removed human genome sequences, low quality reads and adapter sequences. Subsequently, we used the high quality microbial (unmapped) reads from a given CRC sample for taxonomy classification with Kraken2 (Figure 1). We also conducted downstream processing and leveraged an updated database that includes NCBI’s RefSeq sequence data for human, bacteria and viruses (Materials and Methods section). Across this extended tumor cohort, we observed that an average of 83% of reads were uniquely mapped to the human genome per sample. Quality controlled, unmapped RNA-Seq reads averaged 4% per sample. The percentage of unmapped reads for each dataset varied from 0.05% (TCGA) to 19.86% (GSE107422) (Supplementary Figure S1). Variations in the raw sequence data, unmapped reads and unmapped ratios were observed for each dataset. For example, the TCGA and GSE146889 cohorts had the largest number of total and unmapped sequences per sample. The GSE107422 and GSE104836 sample sets had the highest percentages of unmapped reads (Supplementary Figure S1). Thus, we normalized microbial abundances to the median sequencing depth within each cohort. Our results are available for exploration and download at the following URL: https://crc-microbiome.stanford.edu.
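One possible reading of this normalization is sketched below in Python: each sample's microbial counts are rescaled so that its sequencing depth matches the median depth of its cohort. The exact procedure is not specified here, so the scaling rule, the function name and the toy numbers are assumptions for illustration only.

```python
from statistics import median


def normalize_to_median_depth(counts_by_sample, depths):
    """Rescale each sample's counts to the cohort's median depth.

    counts_by_sample: dict sample -> dict taxon -> read count.
    depths: dict sample -> sequencing depth (total reads) for that sample.
    """
    target = median(depths.values())
    return {sample: {taxon: count * target / depths[sample]
                     for taxon, count in taxa.items()}
            for sample, taxa in counts_by_sample.items()}


# Toy cohort with three samples of unequal depth (hypothetical values).
toy_depths = {"s1": 100, "s2": 200, "s3": 400}
toy_counts = {"s1": {"x": 10}, "s2": {"x": 10}, "s3": {"x": 10}}
normalized = normalize_to_median_depth(toy_counts, toy_depths)
```

With a median depth of 200, the same raw count of 10 becomes 20 in the shallow sample and 5 in the deep one, which makes samples within a cohort comparable.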
Figure 1.
Pipeline for microbiome analysis using RNA-Seq data. (A) RNA-Seq data were processed and mapped to the human genome. Mapped data were used for immune cell infiltration profiling. Unmapped reads were quality controlled and used as the input for taxonomy classification. The downstream steps included in the pipeline are microbial abundance, differential analysis and microbe-trait correlation analysis. (B) A heatmap representing the phyla determined among the eight different data sets. The representation of each phylum in the colorectal tumor and normal microbiomes is shown as a relative abundance percentage across each study. Red is indicative of a higher fraction and green indicates a lower fraction.
We identified the highest represented phyla from each cohort and made comparisons of relative percentage abundance (Table 1, Figure 2A–G). Firmicutes, Proteobacteria, Bacteroidetes and Actinobacteria were the four top ranked bacterial phyla identified from the various CRC cohorts. Firmicutes had the highest average relative abundance across the entire cohort (over 29.5%). This phylum was followed by Proteobacteria (22.4%), Actinobacteria (13.8%) and Bacteroidetes (11.5%). Variations of bacterial community composition were observed at the phylum level. For example, Proteobacteria accounted for more than half of the major phyla abundance in GSE137327, whereas this phylum accounted for <10% in GSE104836 (Figure 2F,E). Other noticeable phyla included Fusobacteria and Deinococcus-Thermus (5.46% and 1.17%, respectively), which accounted for a small proportion of the total relative abundance. Overall, variations in bacterial community composition were also observed at the genus level (Supplementary Figure S2).
Figure 2.
CRC microbial composition varies across different studies. The 12 most enriched phyla identified from each cohort: TCGA (A), GSE146889 (B), GSE50760 (C), GSE95132 (D), GSE104836 (E), GSE137327 (F) and GSE107422 (G). In the heatmap, columns correspond to microbes and rows to different datasets. Relative fractional abundances are represented by different colors.
Comparing RNA-Seq versus 16S for identifying and characterizing CRC microbiome
One of the CRC cohorts had overlapping RNA-Seq and 16S data from the same tumors (IMS3, n = 162). We used these data for a comparison study between the two sequencing methods. The processing and analysis of microbial reads derived from RNA-Seq data were described above. The raw 16S sequencing data were processed using the DADA2 and phyloseq pipelines. Adapters, low quality bases and amplification primers were filtered out. Approximately 95% of 16S rRNA sequences passed our quality control measures, yielding an average of 47 000 reads per sample for taxonomy assignments using the Silva v132 annotation. DADA2 detected 531 ASVs after removal of ASVs that did not have at least one read count in at least 1% of the samples (Supplementary Table S2). Fifteen and 172 bacterial taxa were observed among the ASVs at the phylum and genus levels, respectively.
From this data set, we compared the RNA-Seq and 16S methods as a way of evaluating the accuracy of RNA-seq-based microbiome phylum and genus level characterization. There were 12 common phyla identified from these two platforms (Supplementary Figure S3). Actinobacteria, Proteobacteria, Firmicutes and Bacteroidetes were the four most prevalent (Supplementary Figure S3a) and abundant phyla (Supplementary Figure S3b). High Pearson correlation coefficients were observed for phylum-level prevalence (0.977) and abundances (0.962) between 16S and RNA-Seq data (Supplementary Figure S4a,b).
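For readers who wish to reproduce this type of comparison, the Pearson correlation coefficient between two per-phylum vectors (for example, abundance from 16S and from RNA-Seq) can be computed as in the following Python sketch. The function name and the toy vectors are hypothetical and do not correspond to the study data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy per-phylum abundances from two platforms (hypothetical values).
r_same = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_opposite = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Values close to 1 indicate that the two platforms rank and scale the phyla similarly, which is the pattern reported above.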
We determined that microbial diversity, estimated with the Shannon index, was significantly higher for the RNA-Seq data than for the 16S data at the phylum and genus levels (Supplementary Figure S3). Statistical significance was demonstrated using a pairwise Wilcoxon test (P < 2e-16). A total of 89 overlapping genera were evident when comparing these two different methods (Supplementary Table S3). Bacteroides and Faecalibacterium were the two most enriched genera identified from both 16S and RNA-Seq data (Supplementary Figure S3a,b). The differences between the two were largely due to the viral species that were present in the RNA-Seq data. Overall, these results support that RNA-Seq analysis of CRC can determine microbiome features that overlap with 16S sequencing. Thus, we opted to focus on using the RNA-Seq data for the remainder of the study given the large number of colorectal cancers that had this type of publicly available data.
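The Shannon index used for this comparison is defined as H = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to taxon i. A minimal Python sketch is shown below; the function name is hypothetical, and the use of the natural logarithm is an assumption, since the base is not stated here.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with counts > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Toy communities (hypothetical counts): even, single-taxon and skewed.
h_even = shannon_index([10, 10])
h_single = shannon_index([5, 0, 0])
h_skewed = shannon_index([19, 1])
```

An even two-taxon community yields ln 2, a single-taxon community yields 0, and a skewed community falls in between, which shows why detecting additional taxa tends to raise the index.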
The CRC consensus microbiome
From the 924 CRCs and the tumor RNA-Seq data, the high-quality unmapped reads underwent Kraken2 processing and species classification. All tumors were used regardless of whether there was a matched normal or not. The number of species identified prior to consensus filtering ranged from 731 (IMS3) to 4187 (TCGA) (Table 1). To determine the microbiome features that were generalizable across the entire cohort, we created a union matrix representing all samples and all species across the entire cohort. Subsequently, we applied a 1% prevalence filter, retaining only the bacterial and viral species above this frequency threshold. Among the eight studies, 126 microbial species were obtained from the 924 CRC tumors. We conducted an additional level of filtering to identify any species typically associated with contaminant artifacts. There were a few species that are known environmental contaminants, such as microbes belonging to the Cutibacterium and Methylobacterium genera. These contaminants and others were removed, which resulted in a final consensus list of 114 microbial species associated with CRCs (Table 1 and Supplementary Table S4). All species were present in all sample sets included in the study.
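The consensus derivation described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's code: detection is reduced to presence or absence per sample, contaminants are matched by genus (the first word of the species name), and the function name and toy cohorts are hypothetical.

```python
def consensus_species(cohorts, contaminant_genera=frozenset(),
                      min_prevalence=0.01):
    """cohorts: dict cohort -> dict sample -> set of detected species."""
    all_samples = [species for samples in cohorts.values()
                   for species in samples.values()]
    n = len(all_samples)
    union = set().union(*all_samples)  # union of species over all samples
    # Prevalence filter across the pooled samples.
    prevalent = {s for s in union
                 if sum(s in sample for sample in all_samples) / n
                 >= min_prevalence}
    # Require detection in every cohort.
    shared = {s for s in prevalent
              if all(any(s in sample for sample in samples.values())
                     for samples in cohorts.values())}
    # Remove species from contaminant genera.
    return {s for s in shared if s.split()[0] not in contaminant_genera}


# Toy cohorts, for illustration only.
toy_cohorts = {
    "A": {"a1": {"Fusobacterium nucleatum", "Cutibacterium acnes",
                 "Escherichia coli"},
          "a2": {"Bacteroides fragilis"}},
    "B": {"b1": {"Fusobacterium nucleatum", "Cutibacterium acnes",
                 "Bacteroides fragilis"}},
}
consensus = consensus_species(toy_cohorts, {"Cutibacterium"})
```

In the toy example, Escherichia coli is excluded because it is absent from cohort B, and Cutibacterium acnes is excluded as a contaminant, leaving two consensus species.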
We compared the 114 consensus species with the 61 microbial biomarkers identified from a meta-analysis of several fecal cohorts (37). Fourteen species (F. nucleatum, F. prausnitzii, F. plautii, B. longum, G. morbillorum, S. thermophilus, B. fragilis, P. intermedia, P. asaccharolytica, R. intestinalis, E. coli, S. sputigena, P. micra and E. hallii) were commonly observed in both tissue and fecal microbiomes, and half of them belong to the Firmicutes phylum. All remaining species were distinct and specific to either the tissue or the fecal microbiome, suggesting major differences in the microbial compositions between these two habitats.
To determine if this consensus list contained potential contaminants, we used a blacklist of contaminating microbiome genera that was generated by Poore et al. (19). As previously noted, Poore et al. only examined the microbiome at the genus level, and this applies to their contamination list as well. Combining their computationally derived and manually annotated blacklists, we obtained a non-duplicated blacklist of 272 genera (Supplementary Table S5).
Our consensus list of CRC-associated bacteria had 64 non-overlapping genera. We compared the consensus CRC microbiome list with the blacklist genera, and over 95% (109/114) of the species were not on the blacklist. This result shows that our consensus list was not skewed by contaminating genera as described by Poore et al. A total of five bacteria had genera that overlapped with the blacklist. On reviewing the literature, we determined that four of these species had been previously identified in the colon fecal microbiome, including Alcanivorax sp. N3-2A (38), Hungatella hathewayi (39), Janibacter indicus (40) and Variovorax sp. PMC12 (38). The fifth was Pseudoxanthomonas suwonensis, for which we did not identify a previous report.
Bacteroidetes and Firmicutes species account for a significant proportion of the 114 microbial species list (Figure 3). From our consensus CRC microbiome, >33% of these species belong to the class of Clostridia (Figure 3A). This class was the most frequently occurring among our cohort. Species belonging to Bacteroidetes (23.5%), Proteobacteria (16.5%) and Actinobacteria (10.4%) were the second, third and fourth most predominant phyla among the components of the consensus microbiome. Most members of the Clostridia have a commensal relationship with the host and are involved in the maintenance of intestinal health (41). Other well-characterized fecal species, including Bacillus megaterium, Bacteroides fragilis, Escherichia coli, Bacillus cereus, Faecalibacterium prausnitzii, Bacteroides vulgatus and Prevotella intermedia, were among the most abundant species of all the CRC samples across the cohort (Supplementary Table S3). Validating our analysis results, F. nucleatum was common among the tumors, has been previously associated with CRC and has a mechanistic contribution towards colon cancer growth.
Figure 3.
Colorectal cancer’s consensus microbial species. (A) Balloon plot summarizing and comparing the taxa distribution for the 114 species at the phylum (x-axis) and class (y-axis) levels, where the area and color of the dots are proportional to their numerical value (Freq). (B) A correlation plot of the eight CRC cohorts. Red represents positive correlations and blue negative correlations. In this correlogram, the area and color of the dots are proportional to their correlation coefficients. (C) A PCA based on Bray-Curtis dissimilarity was used to estimate the beta diversity of the cohorts. (D) This panel is a correlation heatmap. Panels (B and E) share the same color legend. (E) A correlation heatmap of the 114 species with stroma, immune, and TME scores derived from the same tissue. The rows are specific microbe species found in our colorectal cancer consensus microbiome. The columns are labeled with the cell type summaries derived from the RNA-seq data.
A number of these microbiome species are pathogens, the most prevalent being Clostridium difficile. Infection with this species leads to infectious diarrhea and is also associated with inflammatory bowel disease (IBD) (42). Clostridium perfringens is one of the most common causes of food poisoning in the United States (43). Besides pathogens such as C. difficile and C. perfringens, other potentially pathogenic species were present. Akkermansia muciniphila is a mucin-degrading bacterium. Pasteurella multocida can cause a range of diseases in animals and humans, particularly skin infections.
Other species are commensal elements of normal gut microbiota. Some of them possess probiotic properties, like Bacteroides xylanisolvens and Bacteroides ovatus. Others play important roles in other mammalian species and extrinsic metabolic processes. For example, Lachnospiraceae bacterium can ferment polysaccharides into short-chain fatty acids and alcohols (44). Bacteroides cellulosilyticus, a strictly anaerobic cellulolytic bacterium, metabolizes cellulose to smaller molecules and ferments various carbohydrates (45). Ruthenibacterium lactatiformans is characterized by fermentative metabolism (46). Bacillus megaterium has probiotic potential (47).
Consensus microbiome from matched normal colon tissue
For the 298 matched normal colon tissues and their RNA-Seq data, we applied the same bioinformatic process, quality control filtering and prevalence analysis with a union matrix. The number of species identified prior to consensus filtering ranged from 635 (IMS3) to 3763 (TCGA) (Table 1). From this analysis, 153 species were consistently found among all matched normal tissues (Supplementary Table S6). More than half of these species were shared with the tumors’ 114-species list (Supplementary Figure S5). Interestingly, the remaining consensus mucosa-associated microbiome from normal colon tissue was quite different from that of tumors, with a large proportion of the species coming from the Proteobacteria and Firmicutes phyla. Proteobacteria spp. occur as free-living species that can be identified within the colon microbiota. Members of the Firmicutes phylum, especially the class Clostridia, were enriched in normal tissues, suggesting that Clostridia spp. were potentially beneficial microbes. When compared to the matched normal tissue set, 29 species were present only in tumor tissues; these included two Fusobacteria species and several other known pathogens (Streptococcus spp. and Prevotella spp.).
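The comparison between the tumor and normal consensus lists comes down to set operations. The following is a minimal sketch, assuming each cohort has already been reduced to the set of species that passed quality control and prevalence filtering. The cohort labels, species lists and function name are hypothetical and are not taken from the study's scripts.

```python
from typing import Dict, Set


def consensus_species(cohort_species: Dict[str, Set[str]]) -> Set[str]:
    """Return the species detected in every cohort (set intersection)."""
    if not cohort_species:
        return set()
    species_sets = iter(cohort_species.values())
    consensus = set(next(species_sets))
    for species in species_sets:
        consensus &= species
    return consensus


# Hypothetical toy data: two cohorts per tissue type, a handful of species each.
tumor_cohorts = {
    "cohort_A": {"Fusobacterium nucleatum", "Bacteroides fragilis", "Escherichia coli"},
    "cohort_B": {"Fusobacterium nucleatum", "Bacteroides fragilis", "Prevotella intermedia"},
}
normal_cohorts = {
    "cohort_A": {"Bacteroides fragilis", "Faecalibacterium prausnitzii"},
    "cohort_B": {"Bacteroides fragilis", "Faecalibacterium prausnitzii", "Escherichia coli"},
}

tumor_consensus = consensus_species(tumor_cohorts)
normal_consensus = consensus_species(normal_cohorts)

shared = tumor_consensus & normal_consensus      # found in both consensus lists
tumor_only = tumor_consensus - normal_consensus  # found only in the tumor consensus
```

In this toy example, `tumor_only` plays the role of the 29 species reported as present only in tumor tissues.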
We investigated the geographic associations of the eight datasets using the abundances of the selected 114 microbial species specific to CRC tumors. We conducted correlation studies across the different data sets (Figure 3B–D). Our analysis included correlation plots and a PCA based on Bray-Curtis dissimilarity that estimated the beta diversity of the cohorts. The GSE137327 microbial profile was negatively associated with the other cohorts, suggesting that the choice of sequencing platform may have affected the microbiome profile (Figure 3B,E). This data set was generated on a different sequencing platform, the BGI-Seq system, whereas the remainder of the studies were Illumina-based. Several Asian cohorts from different geographic locations were represented in this study. These included GSE107422 and GSE50760, for which the tumor samples originated from South Korea. For the GSE104836 cohort, the samples originated from mainland China. The CRCs from all three of these studies were part of a distinct cluster with Asian origins (Figure 3B,E). The TCGA samples originated from a variety of countries including Brazil, Germany, Israel, Poland, Russia, Ukraine, the United States and Vietnam (Table 1). Thus, there was an international geographic representation of CRCs from the TCGA data set (Supplementary Table S7).
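For reference, the Bray-Curtis dissimilarity between two abundance profiles x and y is the sum of |x_i - y_i| divided by the sum of (x_i + y_i). The sketch below computes it for every pair of cohorts. It assumes each cohort is summarized as one abundance vector over the same ordered species list. The cohort labels and values are hypothetical, and the ordination step is not shown.

```python
from typing import Dict, Sequence, Tuple


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared species."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same species in the same order")
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def pairwise_bray_curtis(
    profiles: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Dissimilarity for every unordered pair of cohorts, keyed by sorted name pair."""
    names = sorted(profiles)
    return {
        (a, b): bray_curtis(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical abundance vectors over three species.
profiles = {
    "cohort_A": [10, 0, 5],
    "cohort_B": [5, 5, 5],
    "cohort_C": [0, 10, 0],
}
distances = pairwise_bray_curtis(profiles)
```

A matrix of such pairwise values is the usual input to an ordination that places similar cohorts close together.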
TME and immune cell correlations with the CRC consensus microbiome
Using the same RNA-Seq data per sample, we determined the immune cell estimations of each CRC across the cohort. Currently, one can use bulk RNA-Seq data to infer the proportions of individual cell types from tumor samples. This process is generally referred to as cell deconvolution. To conduct this study we used the program xCell to make estimates about the relative cell populations among the CRC RNA-Seq data sets (26). This program deconvolutes gene expression data to identify the relative representation of 64 immune and stromal cell types.
After deconvoluting the gene expression data from our cohort, we determined the associations between the 114 tumor-specific microbial species from the CRC consensus microbiome and the TME components, summarized as the immune, stroma and TME scores. As an aggregate indicator of different types of cellular composition, xCell provided each CRC with an immune score (the sum of all immune cell types), a stroma score (the sum of all stroma cell types) and a TME score (the sum of all immune and stromal cell types) (26). Stroma scores were mostly negatively correlated with the 114 CRC species compared to immune and TME scores (Figure 3E).
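The three aggregate scores follow directly from the definitions given above. The sketch below illustrates the arithmetic only and does not call xCell. The cell type names, the grouping into immune and stroma, and the enrichment values are hypothetical.

```python
from typing import Dict, Iterable


def aggregate_scores(
    enrichment: Dict[str, float],
    immune_types: Iterable[str],
    stroma_types: Iterable[str],
) -> Dict[str, float]:
    """Sum per-cell-type enrichment values into immune, stroma and TME scores."""
    immune = sum(enrichment[cell] for cell in immune_types)
    stroma = sum(enrichment[cell] for cell in stroma_types)
    return {"immune": immune, "stroma": stroma, "tme": immune + stroma}


# Hypothetical enrichment values for a single tumor sample.
sample = {
    "NK cells": 0.25,
    "CD8+ T-cells": 0.5,
    "Fibroblasts": 0.125,
    "Endothelial cells": 0.125,
}
scores = aggregate_scores(
    sample,
    immune_types=["NK cells", "CD8+ T-cells"],
    stroma_types=["Fibroblasts", "Endothelial cells"],
)
```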
Thirty-eight microbial species were selected that had significant associations with specific types of immune cells (Spearman correlations, FDR < 0.05) (Figure 4A). Natural killer (NK) cells had a positive correlation with the majority of the selected microbial species. CD4 T, CD8 T, naïve/pro B and T regulatory cells showed correlation patterns opposite to those of NK cells (Figure 4A). In other words, these cells were negatively correlated with more than half of the CRC consensus microbes. Victivallales bacterium (CCUG447300) was one of the species that had a significant positive correlation with NK cell enrichment in the CRC TME (Spearman’s rho = 0.57; P < 1e-4). This species was significantly negatively correlated with the abundance of CD4 naive T cells in the TME (Spearman’s rho = -0.45; P < 1e-4).
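The selection criterion combines a rank correlation with a false discovery rate correction. The following is a minimal standard-library sketch of both pieces, assuming the Benjamini-Hochberg procedure for the FDR step (the text does not name the method). The p-values are taken as given, since computing them is normally left to a statistics package, and all input values are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranked values."""
    return pearson(average_ranks(x), average_ranks(y))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-sample values: one microbe's abundance and one cell-type score.
abundance = [1.0, 2.0, 3.0, 4.0]
nk_score = [10.0, 20.0, 30.0, 25.0]
rho = spearman_rho(abundance, nk_score)

# Hypothetical raw p-values for four microbe/cell-type pairs.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q < 0.05 for q in fdr]
```

Microbe and cell-type pairs whose adjusted value falls below 0.05 would be retained, which mirrors the FDR < 0.05 threshold stated above.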
Figure 4.
Selected microbial species correlate with immune cells. Heatmaps show the correlation patterns between microbes and immune cells in the tumor microenvironment for (A) tumor and (B) normal samples. In the heatmaps, columns correspond to cell types and rows to microbes. Spearman correlation values are represented by different colors: red indicates higher correlations and green lower ones.
Immune cell-microbe correlations were also investigated in the adjacent normal samples (Figure 4B). NK cells were positively correlated with the selected thirty-eight microbial species. The heatmap shows more negative than positive correlations between immune cells and microbes (Figure 4B). Distinct correlation patterns were found between tumor and normal tissues. For example, macrophages and CD4 memory T cells were generally positively correlated with the selected species in tumor samples; however, the correlations changed to negative in adjacent normal tissues. Similarly, we found that T regulatory, CD4 T, T helper2, dendritic cells and monocytes had different correlation patterns in tumor (positive) and normal (negative) samples. A group of microbial species such as B. helcogenes, L. bacterium, P. cangingivalis, S. sputigena and E. harbinense showed similar trends of correlations with a subset of immune cells (DC, NK and other innate immune cells), suggesting possible microbe-microbe interactions among them.
Comparison of CRC versus matched normal microbiomes
To determine the differences between matched normal and tumor tissue microbiomes, we used the IMS3 and TCGA cohorts. These two sample sets had sufficiently large numbers of matched tumor-normal pairs to perform statistically meaningful differential analysis. We compared the microbial compositions of tumor and adjacent normal tissues at different taxonomic levels (phylum/genus/species).
Variations in the relative abundances of microbial phyla, genera and species were observed between the tumor and normal groups (Figure 5). More specifically, at the phylum level, an increased proportion of Fusobacteria and viruses (adjusted P < 0.01) as well as a depletion of Bacteroidetes (adjusted P < 0.01) were detected in tumors. For example, the average percentage of the viral constituents among the total tumor microbiota was 30.70% compared to 11.43% in the adjacent normal tissues. The relative abundance of Bacteroidetes was higher in normal tissue than in tumor tissue (38.29% versus 22.56%). Significant differences in the abundances of three genera were observed between tumor and adjacent normal tissues. These genera all belonged to the above-mentioned phyla, such as Fusobacteria and Bacteroidetes, and followed the same trend as the fold changes we observed at the phylum level.
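To make the direction of these shifts concrete, the percentages quoted above can be converted into log2 fold changes (tumor over normal). This is a sketch of the arithmetic only. It uses the group averages reported in the text and does not reproduce the statistical test behind the adjusted P values. The function names and the toy counts are hypothetical.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts per taxon into percentages of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: 100.0 * count / total for taxon, count in counts.items()}


def log2_fold_change(tumor_pct: float, normal_pct: float) -> float:
    """Positive values mean enrichment in tumor, negative values mean depletion."""
    return math.log2(tumor_pct / normal_pct)


# Average percentages quoted in the text (tumor, then adjacent normal).
viral_lfc = log2_fold_change(30.70, 11.43)
bacteroidetes_lfc = log2_fold_change(22.56, 38.29)

# Hypothetical read counts, to show how a percentage is obtained from counts.
toy_pct = relative_abundance({"Bacteroidetes": 1, "Firmicutes": 3})
```

With these inputs the viral fraction has a log2 fold change of roughly 1.4 and Bacteroidetes roughly -0.8, consistent with enrichment and depletion in tumors, respectively.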
Figure 5.
Tumor microbial composition is different from that of adjacent normal tissue. The top 12 most abundant phyla (A) and genera (B) distribution plots for tumor and adjacent normal tissue. Differentially enriched/depleted microbial species between tumor and adjacent normal tissue (C); x-axis: log2 fold changes, y-axis: microbial species names, with colors indicating the phylum. Comparison of alpha diversity (Shannon index) between tumor and normal tissue at the phylum (D), genus (E) and species (F) levels.
A total of 13 microbial species were differentially abundant between the tumor and normal groups with an adjusted P < 0.05 (Figure 5C and Supplementary Table S8) in the IMS3 cohort. For example, a high abundance of F. nucleatum and Pasteurella multocida was identified among the tumors compared to matched normal tissue, as noted by fold changes >1. The remaining 11 species were all decreased in tumors. For instance, six members within the order Clostridiales, namely three Lachnospiraceae and three Ruminococcaceae, were depleted in tumors. Lachnospiraceae are generally beneficial microorganisms that work to fight off colon cancer by producing butyric acid (48). Faecalibacterium prausnitzii from the family of Ruminococcaceae was notable as one of the most prominent commensal bacteria in the human gut. The remaining five species that were differentially lower in their tumor presence included Collinsella aerofaciens and four members within the order Bacteroidales (three Bacteroides and one Parabacteroides species). The overall diversity of the microbial community significantly decreased in tumors compared to the matched normal tissue at the species level (Figure 5F). However, the microbial diversity in the tumors relative to their matched normal tissue was not significantly different at the phylum and genus levels.
From the TCGA dataset, we obtained 129 differentially enriched/depleted microbial species with adjusted P < 0.05 in CRC (Supplementary Table S9). Among them, seven species (F. nucleatum, F. prausnitzii, Fusobacterium plautii, Ruthenibacterium lactatiformans, Lachnospiraceae bacterium, Lachnospiraceae bacterium Choco86 and Bacteroides vulgatus) overlapped with the 13 species we identified from the IMS3 cohort. Fusobacterium nucleatum was found to be enriched in the tumor tissues in both the TCGA and IMS3 datasets, whereas the remaining six species were all depleted in tumors.
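The alpha diversity comparison relies on the Shannon index, H = -Σ p_i ln p_i, where p_i is the proportion of reads assigned to taxon i in a sample. The following is a minimal sketch with hypothetical counts. The natural logarithm is an assumption, since the text does not state the base.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon diversity (natural log) of one sample's per-taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no classified reads")
    h = 0.0
    for count in counts:
        if count > 0:  # taxa with zero reads contribute nothing
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical samples: an even community and one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

A community dominated by a few taxa scores lower than an even one, which is the pattern a reduced species-level diversity in tumors would produce.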
A web-based CRC Microbiome Explorer Interface
To enable access to the study’s results, we created an interactive database entitled the CRC Microbiome Explorer (https://crc-microbiome.stanford.edu/). The CRC Microbiome Explorer enables the user to query the database by ‘Study’ or ‘Patient’. The ‘Study’ query displays an overview bar plot of the top 12 microbial phyla across all tumor samples and, when available, normal samples in the queried study. Additionally, users can view a bar plot that displays the microbiome composition of each individual patient in the study. Alternatively, the user can select a specific patient to submit for a ‘Patient’ query that generates a Sankey plot displaying the top 10 microbial genera in each patient sample. The patient-level microbial abundance data are also available as a searchable and sortable table. Kraken2 output files from each study are available for download as tar archives from the ‘Data Download’ tab.
DISCUSSION
The human microbiome is associated with human health, and dysbiosis can lead to a variety of diseases such as colon cancer (49). The colon is the site of one of the most diverse human microbiomes (3,50). CRC is a heterogeneous malignancy with distinct molecular features and clinical outcomes among patients. Besides genetic alterations, the gut microbiome may play a role in CRC initiation and progression (11,13,51,52). Most studies on CRC microbiota so far have been conducted on fecal samples, which are obtained through non-invasive methods and are widely available compared to tissue samples. When comparing the fecal and mucosa-associated microbiomes, the analysis of tissues is more directly related to the microbiota contributions to the cellular physiopathology of CRC (15). Thus, studying the microbiome in direct contact with the CRC’s microenvironment is important for revealing potential interactions and relationships. Moreover, microorganisms in the gut microbiota interact with each other, which changes the representation of any given species. In addition, it is estimated that >60–80% of the microbes are nearly impossible to culture using conventional microbiology techniques (15). Thus, culture-independent analysis using high-throughput sequencing provides an opportunity to identify species that otherwise would be missed. Overall, we conducted this NGS-based microbial study across a series of different CRC tissue cohorts to identify tumor-specific microbial profiles for future clinical use. We also investigated infiltrating immune and stroma components in the TME as the phenotype of interest to link them with the marker microbial species we identified. Our results are available as a genomic web resource for the CRC consensus microbiome (https://crc-microbiome.stanford.edu).
The tumor-promoting effects of the microbiome in CRC occur through a dysbiosis mechanism rather than through infection with specific pathogens (8). This differs from the role of Helicobacter pylori in the pathogenesis of gastric carcinoma (53), where the bacterium is widely recognized as a microbial carcinogen and the most important known risk factor for gastric cancer. Through our analysis, we found that CRC patients are characterized by the enrichment of a set of microbes that can have pathogenic effects in some circumstances, as well as by the depletion of health-related microorganisms. For example, we identified 13 differentially enriched/depleted microbes using 162 paired tumor and normal tissue samples. Among them, F. nucleatum was detected as a predominant species in tumors, which matches previous studies. Several members of Clostridia, which ferment diverse plant polysaccharides and are beneficial to human health, were found to be depleted in CRC tissues.
We defined a consensus CRC microbiota by searching for the most prevalent microbial species across several different cohorts; this consensus can be a valuable resource for future studies. Importantly, this set of microbes is present regardless of the patients’ origins over a diverse range of geographic locations and ethnicities. This consensus represents species that may interact with the cellular tumor microenvironment of CRC. As an additional evaluation of the quality of this consensus microbiome, we determined whether these species had been previously reported in the literature as component species of the normal colon microbiome.
The connection between the microbiome and CRC is likely to be bidirectional: microbiome changes may happen because of CRC development but may also contribute to CRC progression (54). Integrating information across datasets provided key insights into the gut microbiota of CRC patients. In conclusion, we identified tumor-specific bacterial patterns and signatures, which might serve as biomarkers for the prognosis of CRC. Our future work includes identifying prognostic microbial signatures across various cancer types and translating the microbiome biomarkers to the clinic.
DATA AVAILABILITY
All results from this study are available from the following URL: https://crc-microbiome.stanford.edu. Sequence data are available from the TCGA COAD study at the NCI’s Genomic Data Commons website: https://portal.gdc.cancer.gov/projects. Additional data sets are available from the NIH’s GEO website under the following GEO identifiers: GSE107422, GSE146889, GSE50760, GSE95132, GSE104836 and GSE137327. The IMS3 data sets are available at the online repository. The scripts used in this study are available in an online repository (https://github.com/sgtc-stanford/crc-microbiome).
ACKNOWLEDGEMENTS
Additional support came from the Clayville Foundation (S.U.G., S.M.G., H.P.J.).
Author Contributions: L.Z., H.L. and H.P.J. designed the study. M.K. conducted sequencing. G.S. and C.S. optimized the isolation process. L.D.N. oversaw the clinical and sample resources for the IMS samples. L.Z. developed the bioinformatic pipelines and analysis algorithms. L.Z., S.M.G., H.L. and H.P.J. analyzed the data. S.U.G. and S.M.G. developed the website resource. L.Z., S.U.G. and H.P.J. wrote the manuscript.
Contributor Information
Lan Zhao, Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
Susan M Grimes, Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
Stephanie U Greer, Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
Matthew Kubit, Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
HoJoon Lee, Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
Lincoln D Nadauld, Intermountain Precision Genomics Program, Intermountain Healthcare, Saint George, UT 84790, USA.
Hanlee P Ji, Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA; Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Cancer Online.
FUNDING
Innovation in Cancer Informatics (to L.Z., H.L., H.P.J.); National Institutes of Health [2R01HG006137-04 to S.U.G., S.M.G., H.P.J.].
Conflict of interest statement. None declared.
REFERENCES
- 1. Micah H., Claire F.-L., Rob K. The human microbiome project: exploring the microbial part of ourselves in a changing world. Nature. 2007; 449:804–810.
- 2. Knight R., Callewaert C., Marotz C., Hyde E.R., Debelius J.W., McDonald D., Sogin M.L. The microbiome and human biology. Annu. Rev. Genomics Hum. Genet. 2017; 18:65–86.
- 3. Quigley E.M.M. Gut bacteria in health and disease. Gastroenterol. Hepatol. 2013; 9:560–569.
- 4. Carding S., Verbeke K., Vipond D.T., Corfe B.M., Owen L.J. Dysbiosis of the gut microbiota in disease. Microb. Ecol. Health Dis. 2015; 26:26191.
- 5. Turnbaugh P.J., Hamady M., Yatsunenko T., Cantarel B.L., Duncan A., Ley R.E., Sogin M.L., Jones W.J., Roe B.A., Affourtit J.P. et al. A core gut microbiome in obese and lean twins. Nature. 2009; 457:480–484.
- 6. Arslan N. Obesity, fatty liver disease and intestinal microbiota. World J. Gastroenterol. 2014; 20:16452–16463.
- 7. Qin J., Li Y., Cai Z., Li S., Zhu J., Zhang F., Liang S., Zhang W., Guan Y., Shen D. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature. 2012; 490:55–60.
- 8. Schwabe R.F., Jobin C. The microbiome and cancer. Nat. Rev. Cancer. 2013; 13:800–812.
- 9. Fleming M., Ravula S., Tatishchev S.F., Wang H.L. Colorectal carcinoma: Pathologic aspects. J. Gastrointest. Oncol. 2012; 3:153–173.
- 10. Simon L.M., Karg S., Westermann A.J., Engel M., Elbehery A.H.A., Hense B., Heinig M., Deng L., Theis F.J. MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data. Gigascience. 2018; 7:giy070.
- 11. Gao Z., Guo B., Gao R., Zhu Q., Qin H. Microbiota disbiosis is associated with colorectal cancer. Front. Microbiol. 2015; 6:20.
- 12. Ames N.J., Ranucci A., Moriyama B., Wallen G.R. The human microbiome and understanding the 16S rRNA gene in translational nursing science. Nurs. Res. 2017; 66:184–197.
- 13. Sobhani I., Tap J., Roudot-Thoraval F., Roperch J.P., Letulle S., Langella P., Corthier G., Van Nhieu J.T., Furet J.P. Microbial dysbiosis in colorectal cancer (CRC) patients. PLoS One. 2011; 6:e16393.
- 14. Yu J., Feng Q., Wong S.H., Zhang D., Liang Q.Y., Qin Y., Tang L., Zhao H., Stenvang J., Li Y. et al. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut. 2017; 66:70–78.
- 15. Villéger R., Lopès A., Veziant J., Gagnière J., Barnich N., Billard E., Boucher D., Bonnet M. Microbial markers in colorectal cancer detection and/or prognosis. World J. Gastroenterol. 2018; 24:2327–2347.
- 16. Kostic A.D., Gevers D., Pedamallu C.S., Michaud M., Duke F., Earl A.M., Ojesina A.I., Jung J., Bass A.J., Tabernero J. et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 2012; 22:292–298.
- 17. Mima K., Nishihara R., Qian Z.R., Cao Y., Sukawa Y., Nowak J.A., Yang J., Dou R., Masugi Y., Song M. et al. Fusobacterium nucleatum in colorectal carcinoma tissue and patient prognosis. Gut. 2016; 65:1973–1980.
- 18. Rubinstein M.R., Wang X., Liu W., Hao Y., Cai G., Han Y.W. Fusobacterium nucleatum promotes colorectal carcinogenesis by modulating E-cadherin/β-catenin signaling via its FadA adhesin. Cell Host Microbe. 2013; 14:195–206.
- 19. Poore G.D., Kopylova E., Zhu Q., Carpenter C., Fraraccio S., Wandro S., Kosciolek T., Janssen S., Metcalf J., Song S.J. et al. Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature. 2020; 579:567–574. [Retracted]
- 20. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011; 17:10–12.
- 21. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15–21.
- 22. Callahan B.J., McMurdie P.J., Rosen M.J., Han A.W., Johnson A.J.A., Holmes S.P. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods. 2016; 13:581–583.
- 23. Wood D.E., Lu J., Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019; 20:257.
- 24. Eisenhofer R., Minich J.J., Marotz C., Cooper A., Knight R., Weyrich L.S. Contamination in low microbial biomass microbiome studies: issues and recommendations. Trends Microbiol. 2019; 27:105–117.
- 25. Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26:139–140.
- 26. Aran D., Hu Z., Butte A.J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017; 18:220.
- 27. McMurdie P.J., Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One. 2013; 8:e61217.
- 28. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012; 487:330–337.
- 29. Kim S.-K., Kim S.-Y., Kim C.W., Roh S.A., Ha Y.J., Lee J.L., Heo H., Cho D.-H., Lee J.-S., Kim Y.S. et al. A prognostic index based on an eleven gene signature to predict systemic recurrences in colorectal cancer. Exp. Mol. Med. 2019; 51:1–12.
- 30. Kim S.-K., Kim S.-Y., Kim J.-H., Roh S.A., Cho D.-H., Kim Y.S., Kim J.C. A nineteen gene-based risk score classifier predicts prognosis of colorectal cancer patients. Mol. Oncol. 2014; 8:1653–1666.
- 31. Hanley M.P., Hahn M.A., Li A.X., Wu X., Lin J., Wang J., Choi A.H., Ouyang Z., Fong Y., Pfeifer G.P. et al. Genome-wide DNA methylation profiling reveals cancer-associated changes within early colonic neoplasia. Oncogene. 2017; 36:5035–5044.
- 32. Li M., Zhao L.-M., Li S.-L., Li J., Gao B., Wang F.-F., Wang S.-P., Hu X.-H., Cao J., Wang G.-Y. Differentially expressed lncRNAs and mRNAs identified by NGS analysis in colorectal cancer patients. Cancer Med. 2018; 7:4650–4664.
- 33. Wang H., Wang D.H., Yang X., Sun Y., Yang C.S. Colitis-induced IL11 promotes colon carcinogenesis. Carcinogenesis. 2020; 42:557–569.
- 34. DiGuardo M.A., Davila J.I., Jackson R.A., Nair A.A., Fadra N., Minn K.T., Atiq M.A., Zarei S., Blommel J.H., Knight S.M. et al. RNA-Seq Reveals Differences in Expressed Tumor Mutation Burden in Colorectal and Endometrial Cancers with and without Defective DNA-Mismatch Repair. J. Mol. Diagn. 2021; 23:555–564.
- 35. Cancer Genome Atlas N. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012; 487:330–337.
- 36. Cerami E., Gao J., Dogrusoz U., Gross B.E., Sumer S.O., Aksoy B.A., Jacobsen A., Byrne C.J., Heuer M.L., Larsson E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012; 2:401–404.
- 37. Thomas A.M., Manghi P., Asnicar F., Pasolli E., Armanini F., Zolfo M., Beghini F., Manara S., Karcher N., Pozzi C. et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat. Med. 2019; 25:667–678.
- 38. Murphy K., Curley D., O’Callaghan T.F., O'Shea C.A., Dempsey E.M., O’Toole P.W., Ross R.P., Ryan C.A., Stanton C. The composition of human milk and infant faecal microbiota over the first three months of life: A Pilot Study. Sci. Rep. 2017; 7:40597.
- 39. Ohara T. Identification of the microbial diversity after fecal microbiota transplantation therapy for chronic intractable constipation using 16s rRNA amplicon sequencing. PLoS One. 2019; 14:e0214085.
- 40. Hoyles L., Honda H., Logan N.A., Halket G., La Ragione R.M., McCartney A.L. Recognition of greater diversity of Bacillus species and related bacteria in human faeces. Res. Microbiol. 2012; 163:3–13.
- 41. Lopetuso L.R., Scaldaferri F., Petito V., Gasbarrini A. Commensal Clostridia: leading players in the maintenance of gut homeostasis. Gut Pathog. 2013; 5:23.
- 42. Bien J., Palagani V., Bozko P. The intestinal microbiota dysbiosis and Clostridium difficile infection: is there a relationship with inflammatory bowel disease? Therap. Adv. Gastroenterol. 2013; 6:53–68.
- 43. Center for Disease Control and Prevention. Foodborne germs and illnesses. Centers Disease Control Prevent. 2016; https://www.cdc.gov/foodsafety/foodborne-germs.html (last accessed August 14, 2021).
- 44. Boutard M., Cerisy T., Nogue P.-Y., Alberti A., Weissenbach J., Salanoubat M., Tolonen A.C. Functional diversity of carbohydrate-active enzymes enabling a bacterium to ferment plant biomass. PLoS Genet. 2014; 10:e1004773.
- 45. Robert C., Chassard C., Lawson P.A., Bernalier-Donadille A. Bacteroides cellulosilyticus sp. nov., a cellulolytic bacterium from the human gut microbial community. Int. J. Syst. Evol. Microbiol. 2007; 57:1516–1520.
- 46. Shkoporov A.N., Chaplin A.V., Shcherbakova V.A., Suzina N.E., Kafarskaia L.I., Bozhenko V.K., Efimov B.A. Ruthenibacterium lactatiformans gen. nov., sp. nov., an anaerobic, lactate-producing member of the family Ruminococcaceae isolated from human faeces. Int. J. Syst. Evol. Microbiol. 2016; 66:3041–3049.
- 47. Afrilasari W., Meryandini A. Effect of probiotic Bacillus megaterium PTB 1.4 on the population of intestinal microflora, digestive enzyme activity and the growth of catfish (Clarias sp.). HAYATI J. Biosci. 2016; 23:168–172.
- 48. Meehan C.J., Beiko R.G. A phylogenomic view of ecological specialization in the Lachnospiraceae, a family of digestive tract-associated bacteria. Genome Biol. Evol. 2014; 6:703–713.
- 49. Ahn J., Chen C.Y., Hayes R.B. Oral microbiome and oral and gastrointestinal cancer risk. Cancer Causes Control. 2012; 23:399–404.
- 50. Nagpal R., Yadav H., Marotta F. Gut microbiota: the next-gen frontier in preventive and therapeutic medicine? Front. Med. 2014; 1:15.
- 51. Claesson M.J., Jeffery I.B., Conde S., Power S.E., O’Connor E.M., Cusack S., Harris H.M.B., Coakley M., Lakshminarayanan B., O'Sullivan O. et al. Gut microbiota composition correlates with diet and health in the elderly. Nature. 2012; 488:178–184.
- 52. Hughes L.A.E., J. C.C., van den Brandt P.A., van Engeland M., Weijenberg M.P. Lifestyle, diet, and colorectal cancer risk according to (epi)genetic instability: current evidence and future directions of molecular pathological epidemiology. Curr. Colorect. Cancer Rep. 2017; 13:455–469.
- 53. Wroblewski L.E., Peek R.M. Jr, Wilson K.T. Helicobacter pylori and gastric cancer: factors that modulate disease risk. Clin. Microbiol. Rev. 2010; 23:713–739.
- 54. Zitvogel L., Daillère R., Roberti M.P., Routy B., Kroemer G. Anticancer effects of the microbiome and its products. Nat. Rev. Microbiol. 2017; 15:465–478.