Preface
The presence of microbiota in human tumors has been reported by many groups based on bioinformatic searches of DNA sequence databases. The source of these microbial sequences in atypical sites is difficult to determine with certainty, but they could derive from nucleic acids acquired during the sampling, storage/handling and processing of samples, as observed earlier in studies of ancient DNA. Another source of spurious microbial signals is contamination of microbial reference genomes, which causes human reads to be wrongly classified. We present a series of quality controls and validation approaches, summarized in a checklist, that should be considered in any study attempting to find microbes in tumor tissues. This provides a constructive path forward to improve the rigor and standards of studies exploring the role of microbes in human cancers.
Introduction
Multiple sequencing-based studies have reported the presence of bacterial, fungal, viral, and archaeal DNA in human tissues across diverse cancer types1-6. In many of these studies, microbes were detected by searching through the non-human DNA (or RNA) sequences generated by high-throughput sequencing experiments, and comparing those sequences to microbial genome databases using various computational tools7-9. A recent retraction10, together with the discussion it generated, has raised the question of whether some of these findings need to be re-examined, a question that is being debated intensely in the fields of genomics, microbiology, immunology and oncology11-14. In this perspective, we discuss the need for more rigorous standards for reporting the presence of microbes in human cancers, to avoid incorrect findings that undermine the scientific literature and lead to futile investments of time and research effort.
Dramatic reductions in the cost of sequencing over the last decade and the corresponding availability of large cancer genomic datasets15 have enabled studies that have sought to explore the presence of microbes in diverse cancer types. Some of these efforts have helped to shortlist specific microbial species or strains for further molecular characterization of how these microbes could contribute to disease phenotypes, such as pro-inflammatory metabolite production by Fusobacterium nucleatum C216 or PI3K signaling by Peptostreptococcus anaerobius17. However, many cancer microbiome studies rely primarily or solely on the analysis of sequencing data to arrive at their conclusions, often without sufficient controls (reviewed in Cullin et al18 and Knippel et al19). While various higher-quality reports have sought to address concerns about the reliability of findings by sequencing negative controls20-22, detecting microbes with in situ imaging22-24, isolating microbes from tissues20,21, and detecting microbial antigens within tumors25, the application of these ideas is by no means uniform, and studies correspondingly vary widely in their quality and the reliability of their findings. Even some relatively well-controlled investigations can still suffer from one or more of three shortcomings (and often all three): (a) data from positive and negative control experiments were limited or missing24,26, (b) the identified organisms have metabolic requirements and niches that are far removed from the body sites they were reported in23, and (c) the microbial species reported within tumors were not independently confirmed through other assays5, which raises questions about the origin of these non-human DNA sequences.
For example, despite efforts to control for contaminants, Pushalkar et al23 reported high prevalence and abundance of Elizabethkingia (typically found in soil and hospital environments27) in human pancreatic ductal adenocarcinomas (PDA), which was not subsequently observed in a separate larger dataset28. This raises the question: where did the Elizabethkingia DNA come from in the first study? Was it a real signal or simply an environmental contaminant? Overall, such cases underscore the importance of establishing standardized checks and guidelines in the field to enhance the reproducibility and reliability of findings.
Of note, the idea that both viruses and bacteria can contribute to the development of cancers, by triggering DNA mutations or by activating oncogenic signaling pathways in pre-malignant cells, is well known29,30. For bacteria, this is most established in gastric tumors31, with increasing evidence emerging for colorectal cancers32 and gallbladder cancer33, where tissues are exposed to abundantly present microbes in the gastrointestinal tract. Tissue-resident bacteria can thus provide one step in a multi-step process of transformation leading to tumors16. Furthermore, emerging evidence suggests that bacteria such as Bacteroides, Bifidobacterium and Fusobacterium in the colon can infiltrate tumors16,34,35, or produce antigens or metabolites that affect the outcome of immunotherapy36-38. Similarly, gut microbial production of metabolites and cross-talk with the immune system could influence the risk for cancer and other diseases in distant organs39,40.
Although the role of microbial colonization and function in some cancers is clearly important, the notion of cancer microbiomes in human tumor tissues at sites not in contact with the external environment is fundamentally distinct. The term microbiome is generally used to refer to a community of microorganisms (e.g. bacteria, fungi, viruses) that colonize and interact in an environment. Cancer microbiome, in turn, is used to refer to the analogous concept of multi-species communities detected in a tumor tissue that presumably have similar interactions. In the human body, microbiomes are present on epithelial surfaces such as the intestines, the skin, and the oral cavity. However, internal organs and tissues are believed to rarely harbor microbes, with some entirely devoid of them (e.g. brain41 and blood42) except during an acute infection. In fact, preventing infiltration of microbes into tissues is a major task of the immune system, which responds with strong inflammatory responses to eliminate microbes when they appear. Failure to do so may result in sepsis, which is often lethal to the host.
Challenges faced by the emerging field of cancer microbiome studies
In recent years, a series of publications have reported the presence of microbiomes that include a wide range of bacterial, fungal, and viral species in tumor tissues from organs that should be microbiome-free43-45. These studies were primarily based on bioinformatic analyses of DNA/RNA sequences from tumors, and sometimes from matched normal tissues, for microbe discovery. In a typical experiment, human DNA/RNA sequences were removed from shotgun-sequenced data for tissue samples, and the remaining sequences were aligned to databases containing bacterial, fungal, and viral genomes. When source material was collected from preserved or banked samples, as it generally was, the authors were often unable to rigorously test whether the microbes identified computationally could be detected through other means inside the original tissues, without the risk of contamination5,10.
In addition, signatures for the activation of the immune system have rarely been tested, which is concerning because lipopolysaccharides (LPS) and bacterial DNA/RNA are strong inducers of immune responses, a point that is generally ignored in such studies. LPS in the brain can induce the expression of an array of chemokines to attract neutrophils, monocytes and T cells46. LPS can then activate many immune cells including human CD8+ T cells47. Local LPS enrichment in tissue is thus an excellent and specific immune stimulator. Of note, higher doses of LPS can induce septic shock and death due to the massive release of cytokines by activated immune cells. Furthermore, microbes can be easily detected directly by immunohistochemistry, with straightforward histopathological assessment. Few studies have reported immunodetection of LPS in cancer tissues28, and fewer still have sought to replicate it in an independent cohort48, raising the key question: what is the true origin of microbial signals seen in sequencing data and can they be robustly and systematically validated across studies in tumor tissues from organs that are not otherwise known to have microbiomes?
Recent experience illustrates that in most human tissue sequencing assays, the danger of nucleic acid contamination by external sources is high, and any analysis focusing on the non-human reads in these experiments can easily discover mostly or entirely contaminant DNA/RNA sequences. For example, microbiomes have been reported in several low-biomass tissues, including the placenta44, brain49, ocular surface50, and blood51, though the conclusions of these studies have been vigorously challenged41,42,52. Considering how human cancer tissues are prepared, microbial nucleic acids can enter at all points in the process, including surgery, transfer of material between the operating theater and the pathology department, tissue fixation (if conducted), cutting tissues into sections, storage and transport, laboratory processing for nucleic acid extraction, DNA/RNA library preparation and sequencing53. This can yield findings of large numbers of microbes that are simply the result of contamination during one or more of these steps. Many recent papers on the search for cancer microbiomes do not properly control for this possibility and might therefore be reporting on environmental or database contaminants rather than biologically meaningful results12,54,55.
In the recent past, the field of ancient DNA studies faced similar challenges to what cancer microbiome studies face today. Early studies in that field, when processing samples with relatively minor amounts of endogenous DNA, suffered from severe problems of contamination56. As a consequence, a number of extravagant claims were reported as putative DNA findings, only to be revealed later as resulting from contamination57. The field developed strict criteria to overcome such false reports, including independent replication of results in more than one laboratory58. As a result, the field is thriving today, with notable successes such as the sequencing of the Neanderthal genome59, the Denisovan genome60, and many others involving ancient plants, animals, and microbes61-63.
Below, we describe problems and suggest solutions in four distinct areas that will improve the reliability and reproducibility of studies of microbes in cancer tissues. This is critical because confirming microbial signals in cancer samples is more challenging, as certain controls used in ancient DNA studies, such as quantifying DNA damage patterns64, are not always applicable. In addition, while several studies and reviews have highlighted the care that is needed for low-biomass microbiome studies53,65, cancer tissue studies have additional considerations, including the ability to collect specific controls, the routine sequencing of normal tissue or blood samples for tumor-normal comparisons, the availability of clinical infrastructure for histology-based validation (particularly in relation to host cells and their response markers), and the increasing use of large datasets from consortium studies to discover cancer-associated microbiome signatures66. Given the importance of these points for this emerging field, we have developed a specific set of recommendations below, as well as an easy-to-reference checklist (Table 1), to guide future studies.
Table 1:
Checklist for reports on microbial species in human cancers.
| Q# | Checklist question | Rationale | Response | Comments |
|---|---|---|---|---|
| Sample acquisition | | | | |
| 1 | What was done to reduce contamination during sample acquisition? | Important control point; difficult to identify contaminants introduced here | Aseptic techniques post tissue harvest: ☐ | |
| 2 | Were sampling controls collected in the same environment as the sample and were they sequenced? | Samples in each environment likely introduce a unique set of contaminant species | >1 control per sampling environment: ☐ Total: | |
| 3 | Were matched adjacent normal tissues collected and sequenced? | Excellent control to identify shared contaminants with tumor tissues, but may exclude true signals as well | Yes: ☐ | |
| Sample handling and storage | | | | |
| 3 | What was done to reduce contamination during handling and storage? | An aspect that is often ignored; labs where DNA is amplified are a great risk for contaminants | Aseptic techniques: ☐ Physical separation from DNA processing labs: ☐ | |
| 4 | Were handling or storage controls collected and sequenced? | Can help identify contaminant signals | >1 control per storage environment: ☐ Total: | |
| 5 | For FFPE samples, were tissue blanks processed and sequenced? | Good control for FFPE samples | >1 control per FFPE batch: ☐ Total: | |
| Sample processing for nucleic acid analysis | | | | |
| 6 | Were samples processed in a laboratory environment that minimizes nucleic acid contamination? | Laboratory contamination can easily overwhelm the signal in low biomass samples | Clean room: ☐ BSL2 lab w/aseptic techniques: ☐ Physical separation from DNA amplification rooms: ☐ | |
| 7 | Were samples processed in two independent labs for validating signals? | Important sanity check; does not rule out upstream contamination | Yes: ☐ | |
| 8 | Were samples processed in the same lab with two different DNA isolation/library prep kits for validating signals? | Alternative, if access to independent labs was not feasible | Yes: ☐ | |
| 9 | Were blank control samples collected and sequenced for each sequencing run and batch of reagents? | Reagent batches can have unique contaminant signals | ≥1 per library prep & sequencing batch: ☐ Total: | |
| Sample processing for imaging-based analysis | | | | |
| 10 | Were "no probe" controls included? | Signals can be due to autofluorescence | Yes: ☐ | |
| 11 | Were negative control probes included in ISH experiments? | Checks for non-specific binding; probes for known contaminants can reveal distribution of environmental signals | Yes (details in comments): ☐ | |
| Sequencing | | | | |
| 12 | Were all libraries quantified before they were sequenced? | Useful information to identify contaminant species | Data is available: ☐ | |
| 13 | What was done to minimize the risk of barcode hopping? | Can introduce significant read contamination | No multiplexing: ☐ Dual-indexing: ☐ | |
| 14 | Were positive controls used to test the extent of barcode hopping? | Essential to determine the extent to which results may be affected | Estimated rate of barcode hopping: | |
| Batch information | | | | |
| 15 | Has information for sample acquisition centers, date of acquisition, sample processing center and date, library preparation center and date, and sequencing batch been provided? | Key information to analyze control samples, evaluate sources of contamination and potentially account for batch effects on abundance profiles | Yes, in full: ☐ Partial (missing data explained in comments): ☐ | |
| 16 | What was done to minimize the impact of batch variation across samples that are jointly analyzed? | Important for statistical analysis with data from multiple batches | No analysis across batches: ☐ Batch-to-batch variation was assessed to be minimal: ☐ Batch correction was performed w/appropriate positive controls: ☐ | |
| Bioinformatic analysis of sequencing data | | | | |
| 17 | Was taxonomic classification done with databases that include the highest quality reference sequences available for humans (T2T, pangenome) and potential contamination sources? | Conservative approach to reduce false positive assignments to microbial taxa | Yes (details in comments): ☐ | |
| 18 | Was the genomic distribution of reads and quality of alignments used to assess potential identification of microbial taxa? | False-positive signals often cluster unevenly and can be due to low-quality alignments | Yes (details in comments): ☐ | |
| 19 | Were potential contaminant species identified and computationally subtracted from the data? | Important step that leverages control samples to conservatively identify contaminant signals | Yes (details in comments): ☐ | |
| 20 | Is data relevant for contamination removal available? | Essential for reproducibility and reuse of data with newer bioinformatic methods | Taxonomic profiles (tumor and control): ☐ DNA quantification: ☐ Library quantification: ☐ | |
| 21 | Were detected microbes validated based on sequencing data from another lab or another cohort? | Can be a quick sanity check if the data is already available | Yes (details in comments): ☐ | |
| Evidence from non-shotgun-metagenomic sources | | | | |
| 22 | Were detected microbes validated in the same samples based on targeted approaches? | Targeted validation can help rule out bioinformatic artefacts | FISH: ☐ IF: ☐ PCR: ☐ Others (details in comments): ☐ | |
| 23 | Were detected microbes validated to be present in or near tumor cells? | Spatial mapping of microbes in tissue can help clarify the nature of detected signals | FISH: ☐ IF: ☐ IHC: ☐ Others (details in comments): ☐ | |
| 24 | What is the evidence for viable or metabolically active microbial cells? | The viability and/or metabolic activity of cells can help reduce concerns related to contamination by dead cells or nucleic acids | Cell culture: ☐ Targeted RNA detection: ☐ Spatial metabolomics: ☐ | |
| Other considerations | | | | |
| 25 | Were limitations of the measures taken to reduce and identify contamination acknowledged? | Contamination at any stage can impact the ability to draw conclusions from the data | Yes (details in comments): ☐ | |
| 26 | Are there multiple lines of evidence that consistently detect specific microbes in a set of samples? | Orthogonal sources of validation are essential to avoid artefacts of any one assay | Yes (details in comments): ☐ | |
Tissue samples should be consistently handled using aseptic techniques and processed in clean rooms where feasible
Minimize sample contamination during sample collection, handling and storage.
In general, the best way to prevent microbial contaminant signals from confounding and overwhelming the analysis of cancer tissues is to minimize their presence in the first place. As contaminants can enter at all stages of the tissue collection and processing workflow, it is critical to maintain aseptic techniques at all points (Figure 1). For example, while surgery to collect tissue specimens may follow strict aseptic procedures, what happens after the tissue is harvested is equally important, and contamination during transport, storage and handling before the sample enters a genomics lab is often the hardest to account for (Table 1, Q1, Q3). Correspondingly, few studies have tried to minimize contamination in these stages, and for archived samples this is perhaps the greatest concern10,67. Given the exquisite sensitivity of deep sequencing-based analysis, even relatively minor amounts of contamination in these stages can turn a tumor tissue that lacks microbes into one that readily displays signals for diverse bacteria67.
Figure 1.

Possible sources of contamination (yellow) and analysis artefacts (orange) that can confound interpretation of true microbial signatures (green) in tumor tissues. Created in BioRender: https://BioRender.com/9pnlnor
Tissue samples should be processed in specialized clean facilities.
This is a lesson that was learnt the hard way in studies of ancient DNA, which correspondingly employ stringent measures to enable high-fidelity analyses. In particular, tissue sampling, DNA extraction, and library building take place in clean rooms that are physically isolated from laboratories where amplification of DNA takes place68. This is because DNA amplification products pose among the greatest risks for contamination, as they are produced in massive amounts and permeate the laboratory, often floating in the air. For this reason, clean labs for ancient DNA analysis have separate ventilation systems, positive air pressure, and nightly UV irradiation to minimize contamination from the outside. Laboratory surfaces are cleaned daily with bleach, as is lab equipment. Similar precautions are also commonly adopted in molecular pathology labs. Full-body suits and disposable clothing are worn in clean labs (Figure 2), and daily movement of personnel between laboratories is always up the contamination gradient, i.e. from the clean lab towards the more contaminated laboratories56. While such handling of tumor/normal tissues for microbial detection would be ideal, sadly no studies have incorporated this level of stringency. We acknowledge that access to clean rooms may not be available to many groups. Nevertheless, many hospitals have similar capabilities in pathology labs, and working in Biosafety Level 2 (BSL-2) labs with strict aseptic techniques would be an intermediate solution that is affordable and accessible, while still helping to reduce contamination risks (Table 1, Q6).
Figure 2.

Images of a clean lab where ancient DNA samples are processed with appropriate measures to minimize contamination, including full body suits and physical isolation of areas where tissues are processed, and nucleic acids are amplified.
Internal controls should be systematically built in at every stage to assess and account for microbial contamination
Negative controls should be collected at every stage of sample handling.
While reducing the risk of microbial contamination is an important strategy for improving the success rate of identifying genuine microbial signals, internal controls are additionally needed at every stage of sample handling to maximize the chances that contaminants are detected and removed from consideration. These include negative controls for sampling (>1 per sampling environment), storage (>1 per environment), DNA extraction, library preparation and sequencing (>1 per reagent and sequencing batch; Table 1, Q2, Q4, Q9) that are handled and processed under the same conditions as tumor samples, and can be sequenced to identify contaminant signals (replicates can be used to assess consistency). Potential sources for negative controls include: (i) adjacent normal tissues sampled at the same time as the tumor and handled in the same batch (ideal for capturing upstream contamination sources), (ii) blank paraffin blocks as controls for formalin-fixed, paraffin-embedded (FFPE) samples, which are handled and processed in the same way as the tumor tissue, (iii) environmental swabs collected in the sample collection, storage and tissue processing environments, (iv) blank samples collected in the nucleic acid processing environment (e.g. PBS), and (v) tissue blanks from an FFPE block containing the tumor tissue (Table 1, Q3, Q4, Q5, Q9). Several studies have previously highlighted the utility of such negative controls67,69, showing that sequencing of either blank (no tissue) paraffin blocks or extraction controls (no paraffin) readily detects microbial DNA reads originating from common gut, skin and oral microbes70.
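As an illustration of how sequenced negative controls might be used downstream, the following is a minimal sketch of a prevalence-based check: for each taxon, a one-sided Fisher's exact test asks whether it is detected more often in negative controls than in tumor libraries. This is written in the spirit of prevalence-based contaminant detection and is not the algorithm of any specific tool; the function names, the `alpha` threshold, and the example data are all hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a = controls with taxon, b = controls without taxon,
    c = tumors with taxon,   d = tumors without taxon.
    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing the taxon in at least `a` controls by chance.
    """
    n_controls, n_present, n_total = a + b, a + c, a + b + c + d
    upper = min(n_controls, n_present)
    numerator = sum(
        comb(n_controls, k) * comb(n_total - n_controls, n_present - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n_total, n_present)


def flag_contaminants(presence, is_control, alpha=0.05):
    """Return {taxon: p_value} for taxa enriched in negative controls.

    presence:   dict mapping taxon -> list of bools (one per library)
    is_control: list of bools marking which libraries are negative controls
    alpha:      hypothetical significance threshold (an assumption)
    """
    flagged = {}
    for taxon, present in presence.items():
        pairs = list(zip(present, is_control))
        a = sum(1 for p, ctl in pairs if p and ctl)
        b = sum(1 for p, ctl in pairs if not p and ctl)
        c = sum(1 for p, ctl in pairs if p and not ctl)
        d = sum(1 for p, ctl in pairs if not p and not ctl)
        p_value = fisher_one_sided(a, b, c, d)
        if p_value < alpha:
            flagged[taxon] = p_value
    return flagged


# Made-up example: 4 negative controls followed by 8 tumor libraries.
is_control = [True] * 4 + [False] * 8
presence = {
    "taxon_A": [True] * 4 + [True] + [False] * 7,       # all controls, 1 of 8 tumors
    "taxon_B": [False] * 4 + [True] * 6 + [False] * 2,  # no controls, 6 of 8 tumors
}
flagged = flag_contaminants(presence, is_control)
print(flagged)  # only taxon_A is flagged
```

Note that a taxon detected in every control and every tumor library is not flagged by this test, because its prevalence does not differ between groups. A more conservative rule, removing any taxon seen in any negative control, may therefore be preferable when controls are few.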
Independent replication of key results is critical.
It is noteworthy that clean blank controls containing reagents, but no tissue samples, are by themselves not a guarantee against false positive results, because of sample contamination and/or carrier effects. A blank control is a measure of how clean the reagents are, not the sample. Intriguingly, even if the reagents are contaminated, this will not always be apparent in the blank controls, because minute amounts of DNA can bind to plastic surfaces and only be released when sample DNA is added and acts as a carrier that releases the contaminants. The best way to assess whether any positive finding derives from contamination of reagents is thus through independent replication of results by another laboratory, the rationale being that any such contaminant is unlikely to repeat itself across multiple labs (Table 1, Q7). Importantly, however, to avoid the case where a positive finding was due to microbial contamination of the sample itself, any replication study should be conducted on separate samples56. At the very least, replication studies need to be conducted with different DNA isolation and library preparation kits to rule out the strong influence of contaminants from these sources (Table 1, Q8).
Positive controls should be included to assess sensitivity and robustness of data.
While negative controls help reduce noise from contaminant sources, an equally important aspect is having positive controls for (i) assessing the sensitivity of detection methods, and (ii) evaluating the impact of batch effects on the data (Table 1, Q14, Q16). By ensuring that microbial cell lysis methods (particularly for gram-positive bacteria and fungi) are optimized through positive control spike-ins/mock communities, studies can improve their signal-to-noise ratio71. In addition, as large studies typically have multiple batches, and as comparison of data across batches can be significantly affected by sample processing differences (e.g. DNA extraction protocols), positive control samples are essential to evaluate and account for these differences72.
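One simple way to read out a mock-community positive control is to compare the observed relative abundance of each spiked-in taxon with its known input fraction. The sketch below is illustrative only: the function names, the `min_ratio` cutoff, the taxon labels, and the read counts are all made up for the example.

```python
def recovery_ratios(expected_fraction, observed_counts):
    """Observed / expected relative abundance for each mock-community taxon.

    expected_fraction: dict taxon -> known input fraction (sums to 1)
    observed_counts:   dict taxon -> reads assigned to that taxon
    Only reads assigned to mock-community members are used as the total.
    """
    total = sum(observed_counts.get(taxon, 0) for taxon in expected_fraction)
    if total == 0:
        return {taxon: 0.0 for taxon in expected_fraction}
    return {
        taxon: (observed_counts.get(taxon, 0) / total) / expected
        for taxon, expected in expected_fraction.items()
    }


def under_recovered(ratios, min_ratio=0.5):
    """Taxa whose recovery falls below a hypothetical cutoff (an assumption)."""
    return sorted(taxon for taxon, ratio in ratios.items() if ratio < min_ratio)


# Made-up example: an even four-member mock community.
expected = {
    "gram_negative_1": 0.25,
    "gram_negative_2": 0.25,
    "gram_positive_1": 0.25,
    "yeast_1": 0.25,
}
observed = {
    "gram_negative_1": 400,
    "gram_negative_2": 400,
    "gram_positive_1": 150,
    "yeast_1": 50,
}
ratios = recovery_ratios(expected, observed)
print(under_recovered(ratios))  # taxa recovered at less than half the expected level
```

Running the same mock community in every batch and comparing these ratios across batches gives a direct, if coarse, view of how much of the variation between batches is technical.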
Bioinformatics analysis must be conservative and address sources of contamination
Analysis of legacy cancer genomic databases should be approached with caution.
The growing number of whole-genome shotgun (WGS) and RNA sequencing datasets available in public archives, such as those from The Cancer Genome Atlas15 (TCGA), has sparked efforts to mine these datasets for microbial signatures (Table 2). Notable publications from recent years include profiling a large cohort for evidence of a common blood microbiome42, finding cross-cohort microbial signatures in inflammatory bowel disease (IBD)73 and mining sequences from metastatic tumors for microbiomes5. While bioinformatic analysis of existing sequence datasets is an efficient way of potentially gaining valuable insights into microbial signatures on a large scale, care must be taken to ensure that the conclusions are robust and account for possible false positive signals. Investigators must be especially cautious because most publicly available datasets were not collected specifically for microbiome analysis and therefore lack appropriate negative controls, or were handled in ways that could inadvertently introduce contaminants, as discussed above. Failure to account for such artifacts can result in spurious claims of microbiomes in tissues previously thought to be sterile, as has been claimed previously for the placenta44 and for blood from healthy individuals51. Improper analysis of microbial reads in public datasets can also lead to mistaken claims that certain microbial species are associated with cancer. This is especially relevant for studies investigating fungal signatures in tumors, as fungi make up an even smaller proportion of microbial signals than bacteria. In one recent example, Aykut et al6 reported that pancreatic ductal adenocarcinomas (PDACs) had a higher burden of Malassezia spp., a genus of fungi that commonly colonizes the skin due to its metabolic dependence on host-derived lipids74.
However, a subsequent re-examination of their data showed that these associations were inconsistent and based on <10 fungal reads on average in PDACs, suggesting that those findings were confounded by background contamination, compounded by the absence of appropriate negative controls75. In another example, Narunsky-Haziza et al1 utilized TCGA data to define intra-tumoral fungal signatures (“mycotypes”) associated with different survival risks. However, many reads were incorrectly classified as fungal due to the misincorporation of vector or human sequences into fungal genomes within classification databases, or due to incomplete removal of human reads54. We recognize that obtaining internal controls for historical sequencing datasets might be impossible, and we propose adopting the following data quality control steps to reduce the impact of contaminants on biological conclusions.
Table 2:
Publicly available resources for mining microbial signatures associated with cancer and normal tissues.
| No. | Name | Type | Salient Features | Caveats/Recommendations |
|---|---|---|---|---|
| 1 | The Cancer Genome Atlas Program (TCGA)15 | Omics Dataset | WGS, Exome and RNA-seq data for >20,000 tumors and matched normal tissues from 33 cancer types | Lacks negative controls; sample handling was not geared towards minimizing microbial contamination; limited experimental information (e.g. library concentrations) |
| 2 | CRC microbiome explorer97 | Omics Dataset | FFPE RNA-seq for >900 colorectal tumors; contaminant genera filtered | Lacks negative controls; sample handling was not geared towards minimizing microbial contamination; limited experimental information (e.g. library concentrations) |
| 3 | AC-ICAM98 | Omics Dataset | WGS, RNA-seq & 16S rRNA sequencing of 348 fresh-frozen primary colon tumors and matched normal tissue | Lacks negative controls; sample handling was not geared towards minimizing microbial contamination; limited experimental information (e.g. library concentrations) |
| 4 | Battaglia et al5 | Omics Dataset | WGS, RNA-seq & 16S rRNA sequencing of >4000 metastatic tumors from 26 tissue types, with negative controls | Taxonomic profiles may contain false positives due to the use of databases containing MAGs that are not of high quality; limited experimental information (e.g. library concentrations) |
| 5 | Nejman et al28 | Omics Dataset | 16S rRNA sequencing of >1500 tumors and matched normal tissues from 7 cancer types, with negative controls | Species level classification using 16S sequences alone might produce false positives due to sequence similarities99 |
| 6 | Gihawi et al12 | Analysis Results | Taxonomic profiles for 17,625 TCGA samples accounting for computational analysis artefacts | Taxonomic profiles may still contain contaminants due to the absence of negative controls or due to false positive species calls |
| 7 | Eisenhofer et al53 | Analysis Results | Meta-analysis resulting in a list of potential contaminant taxa | Published lists of contaminant genera are non-exhaustive |
| 8 | Salter et al100 | Analysis Results | 16S sequencing of samples and negative controls with batch information, yielding a list of potential contaminant taxa | Published lists of contaminant genera are non-exhaustive |
| 9 | KrakenUniq82 | Software | Taxonomic classifier that identifies likely false positives using unique k-mers | Thresholds for unique k-mers should be empirically determined for different sample types |
| 10 | Decontam101 | Software | Detects likely contaminant taxa based on analysis of taxonomic profiles | Leverages information from DNA/RNA input concentrations and/or matched negative controls |
| 11 | Squeegee102 | Software | De novo identification of contaminant taxa in the absence of negative controls and DNA/RNA input concentrations | Poorer sensitivity for low abundance contaminants102; potentially more computationally expensive compared to alternatives |
| 12 | checkM78 | Software | To assess completeness & contamination in MAGs | Use of high-quality MAGs with no/low chimerism is recommended as genomic references for taxonomic classification |
| 13 | GUNC79 | Software | Detects and quantifies chimerism in MAGs | Use of high-quality MAGs with no/low chimerism is recommended as genomic references for taxonomic classification |
| 14 | MMUPHin73 | Software | Performs batch correction of taxonomic profiles | Batch effects must not be highly confounded with the variable(s) of interest; should be used with care, only when essential, and with appropriate positive controls to assess the impact of batch correction |
Genomes for contamination sources should be included during taxonomic classification.
Sequence databases used for taxonomic classification of microbial reads should contain not only the microbial genomes being searched, but also the host genome (as well as other potential contamination sources, e.g. plant pollen or dust mites), to reduce the false positive assignment of reads to microbial genomes (Table 1, Q17). This might seem obvious, but some published studies have assumed that a pre-filtering step in which reads are aligned to the human genome will be 100% effective at removing human reads, an assumption that is rarely correct. Inclusion of the host genome in the database is particularly important when analyzing samples with relatively low microbial biomass, as is typical for human cancer tissue samples. For example, as shown recently12, failure to include the human genome in a database resulted in millions of human reads being incorrectly classified as microbial in a now-retracted study that claimed to find microbiome signatures in TCGA sequence data collected across 32 cancer types10. In addition, the use of more complete references such as telomere-to-telomere assemblies (T2T CHM13) or the human pangenome76 is likely to be more effective at removing human reads than commonly used references such as hg3877.
High-quality microbial reference genomes should be used.
Microbial sequence databases should be built from high-quality reference genomes, e.g. complete genomes from NCBI’s RefSeq database. This reduces the chance that microbial reads from one species will be misclassified as another due to assembly errors in these genomes. If metagenome-assembled genomes (MAGs) must be used, extra care should be taken to ensure that they are of high quality and not chimeric78,79 (Table 2), and even then, any assignments based on MAGs should be recognized as having lower confidence.
Alignment properties of reads should be used to identify false positive signals.
Microbial signatures of biological importance should be validated by independently aligning microbial reads to their relevant reference genomes, especially if these signatures are unexpected or novel (Table 1, Q18). This approach has previously been used in metagenomic profiling of ancient DNA to distinguish true positive species80, and in the use of metagenomic sequencing to diagnose brain infections81. The central idea is that if a microbial species is truly present, then any reads sequenced from that species should originate from a relatively uniform distribution of locations on the reference genome. This contrasts with false-positive read assignments, which often accumulate unevenly in low-complexity regions, regions with high sequence identity to other genomes, or in regions where the assembly contains a contaminant from another genome. Low-complexity matches can be identified computationally by counting the number of unique k-mers in the matching reads82, and genome coverage metrics can be readily computed after alignment using common bioinformatics software83 (Table 2).
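The coverage-uniformity and low-complexity checks described above can be illustrated with a minimal sketch, assuming that read start positions on a single reference sequence have already been extracted from an alignment. Under random sampling, the expected fraction of a genome covered at mean depth d is approximately 1 - exp(-d), so an observed breadth far below this value suggests that reads are piling up in a few regions. The function names and example inputs are hypothetical, no particular cutoff is implied, and the k-mer helper only mimics the unique k-mer counting idea rather than reproducing KrakenUniq.

```python
import math


def coverage_breadth(read_starts, read_length, genome_length):
    """Fraction of reference positions covered by at least one read."""
    covered = bytearray(genome_length)
    for start in read_starts:
        begin = max(start, 0)
        end = min(start + read_length, genome_length)
        if end > begin:
            covered[begin:end] = b"\x01" * (end - begin)
    return sum(covered) / genome_length


def expected_breadth(n_reads, read_length, genome_length):
    """Expected breadth if reads were placed uniformly at random."""
    depth = n_reads * read_length / genome_length
    return 1.0 - math.exp(-depth)


def breadth_ratio(read_starts, read_length, genome_length):
    """Observed / expected breadth; values far below 1 indicate stacked reads."""
    expected = expected_breadth(len(read_starts), read_length, genome_length)
    if expected == 0:
        return 0.0
    return coverage_breadth(read_starts, read_length, genome_length) / expected


def unique_kmer_count(reads, k):
    """Number of distinct k-mers across reads; low counts hint at low complexity."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return len(kmers)


if __name__ == "__main__":
    # Hypothetical inputs: 50 reads of 100 bp on a 10 kb reference.
    even_starts = list(range(0, 10_000, 200))         # spread across the genome
    stacked_starts = [1_000 + i for i in range(50)]   # piled into one region
    print(breadth_ratio(even_starts, 100, 10_000))
    print(breadth_ratio(stacked_starts, 100, 10_000))
```

In this toy example the evenly spread reads give a ratio above 1, whereas the stacked reads give a ratio far below 1, which is the pattern expected for many false-positive assignments. In practice, the inputs would come from standard alignment and coverage tools (Table 2).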
Environmental contamination signals must be identified and removed.
Environmental contaminants in metagenomic data can be distinguished from biological signals because abundances of the former are often negatively correlated with those of the latter (Table 1, Q12, Q19). As a corollary, signals from contaminant microbes from the same source tend to correlate positively with each other, and abundances of many contaminant microbes vary inversely with library concentration, which allows for their identification and removal (Table 2). This approach of identifying contaminants has been described in various studies and has proven successful in identifying contaminant signatures in low microbial biomass samples such as placenta52, blood42 and bronchoalveolar fluid84. We emphasize that these quality control methods do not substitute for appropriate negative controls, but they can be valuable for metagenomic analyses, particularly when internal controls are not available. Also note that while contamination removal methods typically reduce the number of false positives, they do not guarantee that what remains are all true positives.
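As a rough illustration of the inverse relationship between contaminant abundance and library concentration, the sketch below flags taxa whose relative abundance falls as input concentration rises, using a Spearman rank correlation. This is a deliberately simplified stand-in and not the statistical model used by Decontam or any other published tool; the function names, the example values and the threshold of -0.7 are all assumptions chosen for illustration.

```python
import math


def _ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def flag_candidate_contaminants(rel_abundance, concentration, threshold=-0.7):
    """Return taxa whose abundance is strongly anti-correlated with concentration.

    rel_abundance: dict mapping taxon name -> per-sample relative abundances.
    concentration: per-sample library (DNA/RNA) concentrations, same order.
    threshold: illustrative cutoff only; it should be tuned per dataset.
    """
    return sorted(
        taxon
        for taxon, abundances in rel_abundance.items()
        if spearman(abundances, concentration) <= threshold
    )


if __name__ == "__main__":
    # Hypothetical data for five libraries.
    concentration = [1, 2, 4, 8, 16]
    rel_abundance = {
        "taxon_A": [0.50, 0.25, 0.12, 0.06, 0.03],  # falls as input rises
        "taxon_B": [0.10, 0.30, 0.20, 0.40, 0.35],  # no inverse trend
    }
    print(flag_candidate_contaminants(rel_abundance, concentration))
```

Taxa flagged in this way are only candidates: as noted above, such checks complement negative controls rather than replace them, and taxa that pass are not thereby confirmed as true positives.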
Information relevant to contaminant removal should be reported.
Pertinent details for contamination avoidance and removal should be included in all publications to facilitate reproducibility and critical evaluation of the methods used (Table 1, Q13, Q20). For example, samples are often sequenced in barcoded pools, and incorrect assignment of reads across barcodes (barcode hopping) can readily introduce false signals, an issue that requires particular care85. Additionally, even if human reads cannot be released (e.g. due to privacy concerns), microbial reads should be made freely available with information on the taxa to which they were assigned, and read counts at all reported taxonomic levels should be provided per sample. Details on how the samples were collected and what was done to minimize microbial contamination should be provided. Where possible, extraction and sequencing kit lot numbers should be recorded to help identify experimental batch-specific contamination signals4.
Care should be taken for analysis of data across batches.
Large public datasets that were sequenced for other purposes should be used with caution because cancers and controls are often sequenced in batches, and tissues are also processed in batches, so that batch is frequently confounded with the variable of interest. This confounding is often so strong that computational batch correction techniques cannot alleviate the problem. In fact, applying such techniques might amplify a false signal12. Differences in laboratory procedures, such as reagents or primer sequences that change over time, can also introduce sequencing artifacts. Such artifacts may not align to the human genome and could vary across tissue types or cancer status, mimicking differences in microbial profiles that are technical rather than biological in origin. Careful consideration of batch effects and technical variability is therefore crucial when analyzing these datasets. Ideally, data should be analyzed within batches that are highly similar in how they were collected and processed. If data must be analyzed across clearly distinct batches, batch-correction steps may be needed, but these have to be used with great caution and with positive controls that help assess the success of this procedure (Table 1, Q15, Q16). Typically, despite these sanity checks, the risk that the signals identified are a function of batch correction remains, and careful validation in additional cohorts or through experimental approaches is essential before greater weight is lent to such results.
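One simple way to gauge how strongly batch is confounded with the variable of interest, before any batch correction is attempted, is to measure the association between batch labels and sample labels. The sketch below computes Cramér's V from a contingency table (0 indicates no association, 1 indicates complete confounding). It is offered as an illustrative check under assumed inputs, with hypothetical function and label names, and is not a procedure prescribed by the studies cited here.

```python
import math
from collections import Counter


def cramers_v(batches, labels):
    """Cramér's V between two categorical variables of equal length."""
    n = len(batches)
    joint = Counter(zip(batches, labels))
    batch_totals = Counter(batches)
    label_totals = Counter(labels)

    # Pearson chi-squared statistic over all batch x label cells.
    chi2 = 0.0
    for batch, batch_n in batch_totals.items():
        for label, label_n in label_totals.items():
            expected = batch_n * label_n / n
            observed = joint.get((batch, label), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(batch_totals), len(label_totals)) - 1
    if k == 0:
        return 0.0  # only one batch or one label: association is undefined
    return math.sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Hypothetical design in which every tumor is in batch A
    # and every normal sample is in batch B.
    batches = ["A"] * 10 + ["B"] * 10
    labels = ["tumor"] * 10 + ["normal"] * 10
    print(cramers_v(batches, labels))
```

A value close to 1 indicates that batch and phenotype cannot be separated statistically, in which case analysis within batches or validation in additional cohorts is more appropriate than computational correction.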
We recognize that studies that leverage legacy datasets will necessarily be restricted in being able to adopt several of the solutions proposed here. As a corollary, we propose that the conclusions that can be drawn solely from bioinformatic analysis of such datasets should be limited to ruling out the abundant presence of microbes. Claims of establishing the presence of microbes, or more generally a microbiome, should require extensive validation and internal controls in additional cohorts based on the approaches presented here (Table 1, Q21).
Additional detection techniques must be employed to support and clarify sequencing-based signals for microbes in cancer tissues
Targeted validation is essential for sequencing-based signals.
Given the many confounding sources that can affect sequencing-based signals for microbes in cancer tissues, complementary detection techniques can be essential to validate signals (Table 1, Q22, Q24). For example, reverse transcription quantitative PCR (RT-qPCR) is widely used as a sensitive method for measuring microbial RNA abundances and for indicating the presence of live microbes in tissues86,87. When used with appropriate controls, such as no-template and no–reverse transcriptase controls, qPCR can be a simple and effective method for confirming specific microbial signals even in low-biomass environments. However, qPCR positivity still does not rule out the possibility of contamination from intact microbial cells introduced during surgical resection or laboratory handling. Similar challenges arise for culture-based confirmation, though it provides more direct evidence for live microbes (with appropriate negative controls).
Spatial information can clarify the nature of signals in tissues.
Ideally, microbial signatures detected by sequencing-based methods should be further validated by in situ hybridization (ISH) of fluorescent or chromogenic probes to taxon-specific RNA markers in tissue sections (Table 1, Q10, Q11, Q23). Commonly used RNA markers for microbial detection include 16S/23S88 and 28S89 ribosomal RNAs for bacteria and fungi, respectively, due to their high intracellular abundances and hypervariable regions whose sequences are genus or species specific90,91. Less frequently, microbial mRNAs have also been targeted92 to detect some species (e.g. Mycobacterium tuberculosis); this is more technically challenging but can be useful in cases where 16S rRNAs lack species specificity93. Skepticism about specific microbial signatures is warranted if they are detected by metagenomic sequencing or qPCR but not by RNA-based imaging, or if the RNA signal is not localized to expected areas or cells in the tissue. Imaging-based techniques have the additional advantage that they enable assessment of the frequency and distribution of specific bacteria in a tissue sample, and of whether these patterns resemble contamination (e.g. if signals are peripheral to the tissue), localized infection or generalized dissemination in the tissue.
Detection of non-DNA/RNA microbial signals should be attempted.
Since DNA is the major source of contamination in 16S rRNA and metagenomic sequencing experiments, in situ staining and imaging of other biomolecules of microbial origin, or of signals of microbial infiltration, are powerful complementary methods to further verify microbial signals within tissues or cells (Table 1, Q22, Q23, Q24). Specifically, the immune system is geared to detect microbes in the human body, and bacterial-derived lipopolysaccharides (LPSs) are among the most reliable activators of immune responses94. Consequently, strong immune cell infiltration should be detected when diverse microbes are present in tumor tissues. One possibility is that microbes are intracellular, but bacterial DNA and secreted LPSs nonetheless activate the innate immune system, resulting in local inflammatory responses. Thus a simple check for inflammatory markers or the presence of microbial cell-wall components (e.g. LPS for gram-negative or lipoteichoic acid (LTA) for gram-positive bacteria, as well as others for fungi95) with immunohistochemistry (IHC) can serve as a basic control. Only a few studies have employed this control, such as a recent application detecting LPS-containing bacteria in breast cancer tissues28. Reliable and well-established technologies such as IHC can be readily employed by multiple labs to provide independent (positive or negative) data on the presence of bacteria in the tumor or surrounding tissues, leveraging spatial information to clarify the biological relevance of signals.
Conclusion
The application of ancient DNA analysis techniques, without adequate controls to exclude modern contamination, led to a series of publications spanning at least eight years that had limited or no scientific value. Over time, the research community addressed these issues by implementing stringent quality controls, which significantly enhanced the reliability of subsequent studies and led to recognition of the field's contributions to science in the form of a Nobel Prize. Cancer microbiome research currently faces a similar challenge, and there have been growing calls for more rigorous approaches65,96. Unfortunately, claims are still being published in lower quality studies based primarily on computational analyses of datasets, often lacking appropriate controls, validation (e.g. through qPCR, immunohistochemistry or culture) and thorough consideration of the study's limitations (Table 1, Q25). This raises the concern that many microbial species detected in such studies may be artifacts of environmental or experimental contamination, rather than true biological findings. Such studies can also negatively affect the perception of the whole field (including of higher quality studies that incorporate more controls and validation work) when their poorly supported claims are retracted or fail to be validated in subsequent studies.
Overall, it is important to remember that detecting microbial DNA/RNA does not necessarily indicate the presence of a microbiome or, for that matter, even viable cells, and the relationship between microbes and cancer can be highly complex. It is not always clear whether the detected changes in the microbial profile of a tissue are a cause or consequence of the disease, creating a "chicken and egg" scenario. Nevertheless, even if microbial cells are convincingly shown to infect or infiltrate tissues only after cancer initiation, this could have important consequences for the tumor microenvironment, immune responses and ultimately treatment outcomes. Therefore, while numerous studies on cancer microbiomes have already been published, readers should critically assess the methodologies used when evaluating published conclusions, and future studies should be conducted with greater care to advance this exciting field.
The lack of rigorous quality control in many studies, compounded by inconsistent demands from reviewers and editors, risks wasting valuable time, talent, and resources on follow-up research that may not be reproducible due to flawed or poorly controlled data. To advance the field and derive meaningful biological and clinical insights, we propose several essential quality control measures that should be applied to all reports on microbes in cancer and other human tissues (summarized in a checklist; Table 1). The basic principle is that the field needs to view reports of microbes detected in diverse cancer tissue samples with care and appropriate skepticism, and to require multiple carefully developed lines of evidence in sites where the burden of microbes is expected to be very low (in the best-case scenario; Table 1, Q26). The ancient DNA field recognized nearly 10 years ago that rigorous controls are essential to arrive at reliable conclusions; similar controls are equally essential in the cancer microbiome field. These measures will strengthen the reliability of the findings and support progress in this important area of research.
References
- 1. Narunsky-Haziza L, Sepich-Poore GD, Livyatan I, Asraf O, Martino C, Nejman D, Gavert N, Stajich JE, Amit G, González A, et al. (2022). Pan-cancer analyses reveal cancer-type-specific fungal ecologies and bacteriome interactions. Cell 185, 3789–3806.e17. 10.1016/j.cell.2022.09.005.
- 2. Wang J, Wang Y, Li Z, Gao X, and Huang D. (2021). Global analysis of microbiota signatures in four major types of gastrointestinal cancer. Front. Oncol 11, 685641. 10.3389/fonc.2021.685641.
- 3. Cai M, Kandalai S, Tang X, and Zheng Q. (2022). Contributions of Human-Associated Archaeal Metabolites to Tumor Microenvironment and Carcinogenesis. Microbiol. Spectr 10, e0236721. 10.1128/spectrum.02367-21.
- 4. Dohlman AB, Arguijo Mendoza D, Ding S, Gao M, Dressman H, Iliev ID, Lipkin SM, and Shen X. (2021). The cancer microbiome atlas: a pan-cancer comparative analysis to distinguish tissue-resident microbiota from contaminants. Cell Host Microbe 29, 281–298.e5. 10.1016/j.chom.2020.12.001.
- 5. Battaglia TW, Mimpen IL, Traets JJH, van Hoeck A, Zeverijn LJ, Geurts BS, de Wit GF, Noë M, Hofland I, Vos JL, et al. (2024). A pan-cancer analysis of the microbiome in metastatic cancer. Cell 187, 2324–2335.e19. 10.1016/j.cell.2024.03.021.
- 6. Aykut B, Pushalkar S, Chen R, Li Q, Abengozar R, Kim JI, Shadaloey SA, Wu D, Preiss P, Verma N, et al. (2019). The fungal mycobiome promotes pancreatic oncogenesis via activation of MBL. Nature 574, 264–267. 10.1038/s41586-019-1608-2.
- 7. Wood DE, Lu J, and Langmead B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257. 10.1186/s13059-019-1891-0.
- 8. Hong C, Manimaran S, Shen Y, Perez-Rogers JF, Byrd AL, Castro-Nallar E, Crandall KA, and Johnson WE (2014). PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples. Microbiome 2, 33. 10.1186/2049-2618-2-33.
- 9. Kim D, Song L, Breitwieser FP, and Salzberg SL (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729. 10.1101/gr.210641.116.
- 10. Poore GD, Kopylova E, Zhu Q, Carpenter C, Fraraccio S, Wandro S, Kosciolek T, Janssen S, Metcalf J, Song SJ, et al. (2020). Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature 579, 567–574. 10.1038/s41586-020-2095-1. [Retracted]
- 11. Sepich-Poore GD, McDonald D, Kopylova E, Guccione C, Zhu Q, Austin G, Carpenter C, Fraraccio S, Wandro S, Kosciolek T, et al. (2024). Robustness of cancer microbiome signals over a broad range of methodological variation. Oncogene 43, 1127–1148. 10.1038/s41388-024-02974-w.
- 12. Gihawi A, Ge Y, Lu J, Puiu D, Xu A, Cooper CS, Brewer DS, Pertea M, and Salzberg SL (2023). Major data analysis errors invalidate cancer microbiome findings. MBio 14, e0160723. 10.1128/mbio.01607-23.
- 13. Scott AJ, Alexander JL, Merrifield CA, Cunningham D, Jobin C, Brown R, Alverdy J, O’Keefe SJ, Gaskins HR, Teare J, et al. (2019). International Cancer Microbiome Consortium consensus statement on the role of the human microbiome in carcinogenesis. Gut 68, 1624–1632. 10.1136/gutjnl-2019-318556.
- 14. Robinson KM, Crabtree J, Mattick JSA, Anderson KE, and Dunning Hotopp JC (2017). Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data. Microbiome 5, 9. 10.1186/s40168-016-0224-8.
- 15. The Cancer Genome Atlas Program (TCGA) - NCI. https://www.cancer.gov/ccg/research/genome-sequencing/tcga.
- 16. Zepeda-Rivera M, Minot SS, Bouzek H, Wu H, Blanco-Míguez A, Manghi P, Jones DS, LaCourse KD, Wu Y, McMahon EF, et al. (2024). A distinct Fusobacterium nucleatum clade dominates the colorectal cancer niche. Nature 628, 424–432. 10.1038/s41586-024-07182-w.
- 17. Long X, Wong CC, Tong L, Chu ESH, Ho Szeto C, Go MYY, Coker OO, Chan AWH, Chan FKL, Sung JJY, et al. (2019). Peptostreptococcus anaerobius promotes colorectal carcinogenesis and modulates tumour immunity. Nat. Microbiol 4, 2319–2330. 10.1038/s41564-019-0541-3.
- 18. Cullin N, Azevedo Antunes C, Straussman R, Stein-Thoeringer CK, and Elinav E. (2021). Microbiome and cancer. Cancer Cell 39, 1317–1341. 10.1016/j.ccell.2021.08.006.
- 19. Knippel RJ, Drewes JL, and Sears CL (2021). The cancer microbiome: recent highlights and knowledge gaps. Cancer Discov. 11, 2378–2395. 10.1158/2159-8290.CD-21-0324.
- 20. Fu A, Yao B, Dong T, Chen Y, Yao J, Liu Y, Li H, Bai H, Liu X, Zhang Y, et al. (2022). Tumor-resident intracellular microbiota promotes metastatic colonization in breast cancer. Cell 185, 1356–1372.e26. 10.1016/j.cell.2022.02.027.
- 21. Parhi L, Alon-Maimon T, Sol A, Nejman D, Shhadeh A, Fainsod-Levi T, Yajuk O, Isaacson B, Abed J, Maalouf N, et al. (2020). Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat. Commun 11, 3259. 10.1038/s41467-020-16967-2.
- 22. Riquelme E, Zhang Y, Zhang L, Montiel M, Zoltan M, Dong W, Quesada P, Sahin I, Chandra V, San Lucas A, et al. (2019). Tumor microbiome diversity and composition influence pancreatic cancer outcomes. Cell 178, 795–806.e12. 10.1016/j.cell.2019.07.008.
- 23. Pushalkar S, Hundeyin M, Daley D, Zambirinis CP, Kurz E, Mishra A, Mohan N, Aykut B, Usyk M, Torres LE, et al. (2018). The pancreatic cancer microbiome promotes oncogenesis by induction of innate and adaptive immune suppression. Cancer Discov. 8, 403–416. 10.1158/2159-8290.CD-17-1134.
- 24. Galeano Niño JL, Wu H, LaCourse KD, Kempchinsky AG, Baryiames A, Barber B, Futran N, Houlton J, Sather C, Sicinska E, et al. (2022). Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. Nature 611, 810–817. 10.1038/s41586-022-05435-0.
- 25. Kalaora S, Nagler A, Nejman D, Alon M, Barbolin C, Barnea E, Ketelaars SLC, Cheng K, Vervier K, Shental N, et al. (2021). Identification of bacteria-derived HLA-bound peptides in melanoma. Nature 592, 138–143. 10.1038/s41586-021-03368-8.
- 26. Coker OO, Nakatsu G, Dai RZ, Wu WKK, Wong SH, Ng SC, Chan FKL, Sung JJY, and Yu J. (2019). Enteric fungal microbiota dysbiosis and ecological alterations in colorectal cancer. Gut 68, 654–662. 10.1136/gutjnl-2018-317178.
- 27. Choi MH, Kim M, Jeong SJ, Choi JY, Lee I-Y, Yong T-S, Yong D, Jeong SH, and Lee K. (2019). Risk factors for Elizabethkingia acquisition and clinical characteristics of patients, South Korea. Emerging Infect. Dis 25, 42–51. 10.3201/eid2501.171985.
- 28. Nejman D, Livyatan I, Fuks G, Gavert N, Zwang Y, Geller LT, Rotter-Maskowitz A, Weiser R, Mallel G, Gigi E, et al. (2020). The human tumor microbiome is composed of tumor type-specific intracellular bacteria. Science 368, 973–980. 10.1126/science.aay9189.
- 29. Zapatka M, Borozan I, Brewer DS, Iskar M, Grundhoff A, Alawi M, Desai N, Sültmann H, Moch H, PCAWG Pathogens, et al. (2020). The landscape of viral associations in human cancers. Nat. Genet 52, 320–330. 10.1038/s41588-019-0558-9.
- 30. Rubin H. (2011). The early history of tumor virology: Rous, RIF, and RAV. Proc Natl Acad Sci USA 108, 14389–14396. 10.1073/pnas.1108655108.
- 31. Polk DB, and Peek RM (2010). Helicobacter pylori: gastric cancer and beyond. Nat. Rev. Cancer 10, 403–414. 10.1038/nrc2857.
- 32. Wong SH, and Yu J. (2019). Gut microbiota in colorectal cancer: mechanisms of action and clinical applications. Nat. Rev. Gastroenterol. Hepatol 16, 690–704. 10.1038/s41575-019-0209-8.
- 33. Scanu T, Spaapen RM, Bakker JM, Pratap CB, Wu L, Hofland I, Broeks A, Shukla VK, Kumar M, Janssen H, et al. (2015). Salmonella Manipulation of Host Signaling Pathways Provokes Cellular Transformation Associated with Gallbladder Carcinoma. Cell Host Microbe 17, 763–774. 10.1016/j.chom.2015.05.002.
- 34. Rubinstein MR, Wang X, Liu W, Hao Y, Cai G, and Han YW (2013). Fusobacterium nucleatum promotes colorectal carcinogenesis by modulating E-cadherin/β-catenin signaling via its FadA adhesin. Cell Host Microbe 14, 195–206. 10.1016/j.chom.2013.07.012.
- 35. Serna G, Ruiz-Pace F, Hernando J, Alonso L, Fasani R, Landolfi S, Comas R, Jimenez J, Elez E, Bullman S, et al. (2020). Fusobacterium nucleatum persistence and risk of recurrence after preoperative treatment in locally advanced rectal cancer. Ann. Oncol 31, 1366–1375. 10.1016/j.annonc.2020.06.003.
- 36. Jiang S-S, Xie Y-L, Xiao X-Y, Kang Z-R, Lin X-L, Zhang L, Li C-S, Qian Y, Xu P-P, Leng X-X, et al. (2023). Fusobacterium nucleatum-derived succinic acid induces tumor resistance to immunotherapy in colorectal cancer. Cell Host Microbe 31, 781–797.e9. 10.1016/j.chom.2023.04.010.
- 37. Vétizou M, Pitt JM, Daillère R, Lepage P, Waldschmitt N, Flament C, Rusakiewicz S, Routy B, Roberti MP, Duong CPM, et al. (2015). Anticancer immunotherapy by CTLA-4 blockade relies on the gut microbiota. Science 350, 1079–1084. 10.1126/science.aad1329.
- 38. Mager LF, Burkhard R, Pett N, Cooke NCA, Brown K, Ramay H, Paik S, Stagg J, Groves RA, Gallo M, et al. (2020). Microbiome-derived inosine modulates response to checkpoint inhibitor immunotherapy. Science 369, 1481–1489. 10.1126/science.abc3421.
- 39. Björk JR, Bolte LA, Maltez Thomas A, Lee KA, Rossi N, Wind TT, Smit LM, Armanini F, Asnicar F, Blanco-Miguez A, et al. (2024). Longitudinal gut microbiome changes in immune checkpoint blockade-treated advanced melanoma. Nat. Med 30, 785–796. 10.1038/s41591-024-02803-3.
- 40. Hsu CL, and Schnabl B. (2023). The gut-liver axis and gut microbiota in health and liver disease. Nat. Rev. Microbiol 21, 719–733. 10.1038/s41579-023-00904-3.
- 41. Bedarf JR, Beraza N, Khazneh H, Özkurt E, Baker D, Borger V, Wüllner U, and Hildebrand F. (2021). Much ado about nothing? Off-target amplification can lead to false-positive bacterial brain microbiome detection in healthy and Parkinson’s disease individuals. Microbiome 9, 75. 10.1186/s40168-021-01012-1.
- 42. Tan CCS, Ko KKK, Chen H, Liu J, Loh M, SG10K_Health Consortium, Chia M, and Nagarajan N. (2023). No evidence for a common blood microbiome based on a population study of 9,770 healthy humans. Nat. Microbiol 8, 973–985. 10.1038/s41564-023-01350-w.
- 43. Emery DC, Shoemark DK, Batstone TE, Waterfall CM, Coghill JA, Cerajewska TL, Davies M, West NX, and Allen SJ (2017). 16S rRNA Next Generation Sequencing Analysis Shows Bacteria in Alzheimer’s Post-Mortem Brain. Front. Aging Neurosci 9, 195. 10.3389/fnagi.2017.00195.
- 44. Aagaard K, Ma J, Antony KM, Ganu R, Petrosino J, and Versalovic J. (2014). The placenta harbors a unique microbiome. Sci. Transl. Med 6, 237ra65. 10.1126/scitranslmed.3008599.
- 45. Alonso R, Pisa D, Fernández-Fernández AM, and Carrasco L. (2018). Infection of fungi and bacteria in brain tissue from elderly persons and patients with Alzheimer’s disease. Front. Aging Neurosci 10, 159. 10.3389/fnagi.2018.00159.
- 46. Thomson CA, McColl A, Graham GJ, and Cavanagh J. (2020). Sustained exposure to systemic endotoxin triggers chemokine induction in the brain followed by a rapid influx of leukocytes. J. Neuroinflammation 17, 94. 10.1186/s12974-020-01759-8.
- 47. Komai-Koma M, Gilchrist DS, and Xu D. (2009). Direct recognition of LPS by human but not murine CD8+ T cells via TLR4 complex. Eur. J. Immunol 39, 1564–1572. 10.1002/eji.200838866.
- 48. de Miranda NF, Smit VT, van der Ploeg M, Wesseling J, and Neefjes J. (2023). Absence of Lipopolysccharide (LPS) expression in Breast Cancer Cells. BioRxiv. 10.1101/2023.08.28.555057.
- 49. Branton WG, Ellestad KK, Maingat F, Wheatley BM, Rud E, Warren RL, Holt RA, Surette MG, and Power C. (2013). Brain microbial populations in HIV/AIDS: α-proteobacteria predominate independent of host immune status. PLoS ONE 8, e54673. 10.1371/journal.pone.0054673.
- 50. Willis KA, Postnikoff CK, Freeman A, Rezonzew G, Nichols K, Gaggar A, and Lal CV (2020). The closed eye harbours a unique microbiome in dry eye disease. Sci. Rep 10, 12035. 10.1038/s41598-020-68952-w.
- 51. D’Aquila P, Giacconi R, Malavolta M, Piacenza F, Bürkle A, Villanueva MM, Dollé MET, Jansen E, Grune T, Gonos ES, et al. (2021). Microbiome in Blood Samples From the General Population Recruited in the MARK-AGE Project: A Pilot Study. Front. Microbiol 12, 707515. 10.3389/fmicb.2021.707515.
- 52. de Goffau MC, Lager S, Sovio U, Gaccioli F, Cook E, Peacock SJ, Parkhill J, Charnock-Jones DS, and Smith GCS (2019). Human placenta has no microbiome but can contain potential pathogens. Nature 572, 329–334. 10.1038/s41586-019-1451-5.
- 53. Eisenhofer R, Minich JJ, Marotz C, Cooper A, Knight R, and Weyrich LS (2019). Contamination in low microbial biomass microbiome studies: issues and recommendations. Trends Microbiol. 27, 105–117. 10.1016/j.tim.2018.11.003.
- 54. Ge Y, Lu J, Puiu D, Revsine M, and Salzberg SL (2024). Comprehensive analysis of microbial content in whole-genome sequencing samples from The Cancer Genome Atlas project. BioRxiv. 10.1101/2024.05.24.595788.
- 55. Breitwieser FP, Pertea M, Zimin AV, and Salzberg SL (2019). Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29, 954–960. 10.1101/gr.245373.118.
- 56. Willerslev E, and Cooper A. (2005). Ancient DNA. Proc. Biol. Sci 272, 3–16. 10.1098/rspb.2004.2813.
- 57. Hebsgaard MB, Phillips MJ, and Willerslev E. (2005). Geologically ancient DNA: fact or artefact? Trends Microbiol. 13, 212–220. 10.1016/j.tim.2005.03.010.
- 58. Cooper A, and Poinar HN (2000). Ancient DNA: do it right or not at all. Science 289, 1139. 10.1126/science.289.5482.1139b.
- 59. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. (2010). A draft sequence of the Neandertal genome. Science 328, 710–722. 10.1126/science.1188021.
- 60. Meyer M, Kircher M, Gansauge M-T, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prüfer K, de Filippo C, et al. (2012). A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226. 10.1126/science.1224344.
- 61.Liang R, Li Z, Lau Vetter MCY, Vishnivetskaya TA, Zanina OG, Lloyd KG, Pfiffner SM, Rivkina EM, Wang W, Wiggins J, et al. (2021). Genomic reconstruction of fossil and living microorganisms in ancient Siberian permafrost. Microbiome 9, 110. 10.1186/s40168-021-01057-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Willerslev E, Hansen AJ, Binladen J, Brand TB, Gilbert MTP, Shapiro B, Bunce M, Wiuf C, Gilichinsky DA, and Cooper A. (2003). Diverse plant and animal genetic records from Holocene and Pleistocene sediments. Science 300, 791–795. 10.1126/science.1084114. [DOI] [PubMed] [Google Scholar]
- 63. Ramos-Madrigal J, Smith BD, Moreno-Mayar JV, Gopalakrishnan S, Ross-Ibarra J, Gilbert MTP, and Wales N (2016). Genome Sequence of a 5,310-Year-Old Maize Cob Provides Insights into the Early Stages of Maize Domestication. Curr. Biol. 26, 3195–3201. 10.1016/j.cub.2016.09.036.
- 64. Jónsson H, Ginolhac A, Schubert M, Johnson PLF, and Orlando L (2013). mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684. 10.1093/bioinformatics/btt193.
- 65. de Goffau MC, Lager S, Salter SJ, Wagner J, Kronbichler A, Charnock-Jones DS, Peacock SJ, Smith GCS, and Parkhill J (2018). Recognizing the reagent microbiome. Nat. Microbiol. 3, 851–853. 10.1038/s41564-018-0202-y.
- 66. Sepich-Poore GD, Zitvogel L, Straussman R, Hasty J, Wargo JA, and Knight R (2021). The microbiome and human cancer. Science 371. 10.1126/science.abc4552.
- 67. Cruz-Flores R, López-Carvallo JA, Cáceres-Martínez J, and Dhar AK (2022). Microbiome analysis from formalin-fixed paraffin-embedded tissues: Current challenges and future perspectives. J. Microbiol. Methods 196, 106476. 10.1016/j.mimet.2022.106476.
- 68. Zhu K, He H, Tao L, Ma H, Yang X, Wang R, Guo J, and Wang C-C (2024). Protocol for a comprehensive pipeline to study ancient human genomes. STAR Protocols 5, 102985. 10.1016/j.xpro.2024.102985.
- 69. Lam SY, Ioannou A, Konstanti P, Visseren T, Doukas M, Peppelenbosch MP, Belzer C, and Fuhler GM (2021). Technical challenges regarding the use of formalin-fixed paraffin embedded (FFPE) tissue specimens for the detection of bacterial alterations in colorectal cancer. BMC Microbiol. 21, 297. 10.1186/s12866-021-02359-z.
- 70. CSB5/FFPE_kitome: Metagenomic sequencing conducted on blank paraffin blocks or extraction controls. https://github.com/CSB5/FFPE_kitome.
- 71. Tourlousse DM, Narita K, Miura T, Ohashi A, Matsuda M, Ohyama Y, Shimamura M, Furukawa M, Kasahara K, Kameyama K, et al. (2022). Characterization and demonstration of mock communities as control reagents for accurate human microbiome community measurements. Microbiol. Spectr. 10, e0191521. 10.1128/spectrum.01915-21.
- 72. Leek JT (2014). svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 42, e161. 10.1093/nar/gku864.
- 73. Ma S, Shungin D, Mallick H, Schirmer M, Nguyen LH, Kolde R, Franzosa E, Vlamakis H, Xavier R, and Huttenhower C (2022). Population structure discovery in meta-analyzed microbial communities and inflammatory bowel disease using MMUPHin. Genome Biol. 23, 208. 10.1186/s13059-022-02753-4.
- 74. Vijaya Chandra SH, Srinivas R, Dawson TL, and Common JE (2020). Cutaneous malassezia: commensal, pathogen, or protector? Front. Cell. Infect. Microbiol. 10, 614446. 10.3389/fcimb.2020.614446.
- 75. Fletcher AA, Kelly MS, Eckhoff AM, and Allen PJ (2023). Revisiting the intrinsic mycobiome in pancreatic cancer. Nature 620, E1–E6. 10.1038/s41586-023-06292-1.
- 76. Liao W-W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, et al. (2023). A draft human pangenome reference. Nature 617, 312–324. 10.1038/s41586-023-05896-x.
- 77. Forbes M, Ng DYK, Boggan RM, Frick-Kretschmer A, Durham J, Lorenz O, Dave B, Lassalle F, Scott C, Wagner J, et al. (2025). Benchmarking of human read removal strategies for viral and microbial metagenomics. BioRxiv. 10.1101/2025.03.21.644587.
- 78. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, and Tyson GW (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055. 10.1101/gr.186072.114.
- 79. Orakov A, Fullam A, Coelho LP, Khedkar S, Szklarczyk D, Mende DR, Schmidt TSB, and Bork P (2021). GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178. 10.1186/s13059-021-02393-0.
- 80. Pochon Z, Bergfeldt N, Kırdök E, Vicente M, Naidoo T, van der Valk T, Altınışık NE, Krzewińska M, Dalén L, Götherström A, et al. (2023). aMeta: an accurate and memory-efficient ancient metagenomic profiling workflow. Genome Biol. 24, 242. 10.1186/s13059-023-03083-9.
- 81. Salzberg SL, Breitwieser FP, Kumar A, Hao H, Burger P, Rodriguez FJ, Lim M, Quiñones-Hinojosa A, Gallia GL, Tornheim JA, et al. (2016). Next-generation sequencing in neuropathologic diagnosis of infections of the nervous system. Neurol. Neuroimmunol. Neuroinflamm. 3, e251. 10.1212/NXI.0000000000000251.
- 82. Breitwieser FP, Baker DN, and Salzberg SL (2018). KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 19, 198. 10.1186/s13059-018-1568-0.
- 83. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079. 10.1093/bioinformatics/btp352.
- 84. Willner D, Daly J, Whiley D, Grimwood K, Wainwright CE, and Hugenholtz P (2012). Comparison of DNA extraction methods for microbial community profiling with an application to pediatric bronchoalveolar lavage samples. PLoS ONE 7, e34605. 10.1371/journal.pone.0034605.
- 85. Costello M, Fleharty M, Abreu J, Farjoun Y, Ferriera S, Holmes L, Granger B, Green L, Howd T, Mason T, et al. (2018). Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms. BMC Genomics 19, 332. 10.1186/s12864-018-4703-0.
- 86. Magalhães AP, França Â, Pereira MO, and Cerca N (2019). RNA-based qPCR as a tool to quantify and to characterize dual-species biofilms. Sci. Rep. 9, 13639. 10.1038/s41598-019-50094-3.
- 87. Kapoor V, Pitkänen T, Ryu H, Elk M, Wendell D, and Santo Domingo JW (2015). Distribution of human-specific bacteroidales and fecal indicator bacteria in an urban watershed impacted by sewage pollution, determined using RNA- and DNA-based quantitative PCR assays. Appl. Environ. Microbiol. 81, 91–99. 10.1128/AEM.02446-14.
- 88. Fuchs BM, Syutsubo K, Ludwig W, and Amann R (2001). In situ accessibility of Escherichia coli 23S rRNA to fluorescently labeled oligonucleotide probes. Appl. Environ. Microbiol. 67, 961–968. 10.1128/AEM.67.2.961-968.2001.
- 89. Rickerts V, Khot PD, Myerson D, Ko DL, Lambrecht E, and Fredricks DN (2011). Comparison of quantitative real time PCR with Sequencing and ribosomal RNA-FISH for the identification of fungi in formalin fixed, paraffin-embedded tissue specimens. BMC Infect. Dis. 11, 202. 10.1186/1471-2334-11-202.
- 90. Glöckner FO, Yilmaz P, Quast C, Gerken J, Beccati A, Ciuprina A, Bruns G, Yarza P, Peplies J, Westram R, et al. (2017). 25 years of serving the community with ribosomal RNA gene reference databases and tools. J. Biotechnol. 261, 169–176. 10.1016/j.jbiotec.2017.06.1198.
- 91. McDonald D, Jiang Y, Balaban M, Cantrell K, Zhu Q, Gonzalez A, Morton JT, Nicolaou G, Parks DH, Karst SM, et al. (2024). Greengenes2 unifies microbial data in a single reference tree. Nat. Biotechnol. 42, 715–718. 10.1038/s41587-023-01845-1.
- 92. Fenhalls G, Stevens-Muller L, Warren R, Carroll N, Bezuidenhout J, Van Helden P, and Bardin P (2002). Localisation of mycobacterial DNA and mRNA in human tuberculous granulomas. J. Microbiol. Methods 51, 197–208. 10.1016/s0167-7012(02)00076-3.
- 93. Loukil A, Kirtania P, Bedotto M, and Drancourt M (2018). FISHing Mycobacterium tuberculosis Complex by Use of a rpoB DNA Probe Bait. J. Clin. Microbiol. 56. 10.1128/JCM.00568-18.
- 94. Liu J, Kang R, and Tang D (2024). Lipopolysaccharide delivery systems in innate immunity. Trends Immunol. 45, 274–287. 10.1016/j.it.2024.02.003.
- 95. Oumarou Hama H, Aboudharam G, Barbieri R, Lepidi H, and Drancourt M (2022). Immunohistochemical diagnosis of human infectious diseases: a review. Diagn. Pathol. 17, 17. 10.1186/s13000-022-01197-5.
- 96. Austin GI, and Korem T (2024). Planning and Analyzing a Low-Biomass Microbiome Study: A Data Analysis Perspective. J. Infect. Dis. 10.1093/infdis/jiae378.
- 97. Zhao L, Grimes SM, Greer SU, Kubit M, Lee H, Nadauld LD, and Ji HP (2021). Characterization of the consensus mucosal microbiome of colorectal cancer. NAR Cancer 3, zcab049. 10.1093/narcan/zcab049.
- 98. Roelands J, Kuppen PJK, Ahmed EI, Mall R, Masoodi T, Singh P, Monaco G, Raynaud C, de Miranda NFCC, Ferraro L, et al. (2023). An integrated tumor, immune and microbiome atlas of colon cancer. Nat. Med. 29, 1273–1286. 10.1038/s41591-023-02324-5.
- 99. Barb JJ, Oler AJ, Kim H-S, Chalmers N, Wallen GR, Cashion A, Munson PJ, and Ames NJ (2016). Development of an Analysis Pipeline Characterizing Multiple Hypervariable Regions of 16S rRNA Using Mock Samples. PLoS ONE 11, e0148047. 10.1371/journal.pone.0148047.
- 100. Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, Turner P, Parkhill J, Loman NJ, and Walker AW (2014). Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 12, 87. 10.1186/s12915-014-0087-z.
- 101. Davis NM, Proctor DM, Holmes SP, Relman DA, and Callahan BJ (2018). Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome 6, 226. 10.1186/s40168-018-0605-2.
- 102. Liu Y, Elworth RAL, Jochum MD, Aagaard KM, and Treangen TJ (2022). De novo identification of microbial contaminants in low microbial biomass microbiomes with Squeegee. Nat. Commun. 13, 6799. 10.1038/s41467-022-34409-z.
