Abstract
Mucosal and barrier tissues, such as the gut, lung or skin, are composed of a complex network of cells and microbes forming a tight niche that prevents pathogen colonization and supports host–microbiome symbiosis. Characterizing these networks at high molecular and cellular resolution is crucial for understanding homeostasis and disease. Here we present spatial host–microbiome sequencing (SHM-seq), an all-sequencing-based approach that captures tissue histology, polyadenylated RNAs and bacterial 16S sequences directly from a tissue by modifying spatially barcoded glass surfaces to enable simultaneous capture of host transcripts and hypervariable regions of the 16S bacterial ribosomal RNA. We applied our approach to the mouse gut as a model system, used a deep learning approach for data mapping and detected spatial niches defined by cellular composition and microbial geography. We show that subpopulations of gut cells express specific gene programs in different microenvironments characteristic of regional commensal bacteria and impact host–bacteria interactions. SHM-seq should enhance the study of native host–microbe interactions in health and disease.
Subject terms: Next-generation sequencing, Genomic analysis
Spatial host–microbiome sequencing simultaneously profiles microbes and host transcriptomes from mouse colons.
Main
Mucosal and barrier tissues are ecosystems of multiple host cell types and a complex microbiome that vary in space and time. Antigen recognition and innate immune responses1 in the host, and molecular mechanisms derived from the microbiome, together prevent pathogen colonization and support the establishment of the host–microbiome spatial niche and host–microbial symbiosis2. Conversely, spatial dysregulation3,4 in diseases such as inflammatory bowel disease (IBD) can lead to dysfunction of the gut barrier5,6.
Characterizing and understanding the host–microbiome spatial niche requires detailed measurement of the identity and molecular characteristics of host cells and microbiome species and their interrelations in a spatial context. On the microbiome side, spatial metagenomics methods7 are emerging to map bacteria by either imaging8,9 or metagenomic plot sampling10. However, such studies focused on smaller regions, such as inter-fold, mucosal or lumen regions in the gut, and typically used broad taxonomy assignments, reaching family level at best10,11, with few reports at the level of specific genera or species9,12–14. Moreover, metagenomic plot sampling, so far the only approach for spatial bacterial sequencing in situ10, does not currently profile host gene expression. On the host side, single-cell genomics, including single-cell RNA sequencing (scRNA-seq), has been instrumental to characterize the cellular composition of tissues, for the host15, resident microbes16,17 or joint profiling of host and viral amplicon sequences18, but without spatial information. Spatial transcriptomics methods, either imaging based or sequencing based, enable cell type mapping in situ19–24 but have not yet been applied to simultaneously profile both host and microbiome in a spatial context.
In this study, we bridged this gap by developing spatial host–microbiome sequencing (SHM-seq; Fig. 1), a robust all-sequencing-based technology that leverages previous advancements in spatial transcriptomics25,26 and provides histology, spatial RNA-seq and spatial 16S sequencing using readily available instrumentation to profile the host’s expression responses in relation to microbial biogeography. We applied it in the model system of the mouse colon and show here a roadmap for interrogating spatial gene expression programs in correlation with bacterial presence.
Results
SHM-seq
We developed SHM-seq by adapting spatial transcriptomics25, where mRNA is captured by probes on a glass slide followed by profiling, to enable simultaneous capture of polyadenylated (host) transcripts and hypervariable (V4) regions of the 16S ribosomal RNA (rRNA) (Methods and Fig. 1). Specifically, we first produced solid-phase spatial transcriptomics slides covered with uniquely barcoded and spatially addressable poly(d)T capture probes25–27, with ~1,000 distinct DNA features (that is, spatial spots) deposited and covalently linked to a glass substrate (Methods). We enzymatically modified these spatially barcoded features on the glass array to enable simultaneous capture of polyadenylated transcripts (~50% surface capture probes) and hypervariable (V4) regions of the 16S rRNA28 (~50% surface capture probes) through a hybridization and extension reaction (Methods, Fig. 1 and Supplementary Fig. 1). Next, we placed frozen tissue sections on the optimized glass surface, stained them with hematoxylin and eosin (H&E) and imaged the tissue histology by bright-field microscopy. Finally, after imaging, we permeabilized the cells, allowing capture of host polyadenylated transcripts and bacterial 16S sequences on the array. The result was direct spatial DNA barcoding of host transcripts and bacterial species, which were sequenced by Illumina sequencing26.
To test SHM-seq, we applied it to profile intestinal cross-sections from the colons of C57BL/6 mice raised under standard specific-pathogen-free (SPF) conditions, germ-free (GF) mice or altered Schaedler flora (ASF) mice (Fig. 1 and Methods). GF mice provide a negative control; ASF mice, which carry only a defined microbial community, provide a clear target for validating capture of the expected bacterial species; and regular C57BL/6 SPF mice represent a complex case study with unaltered gut flora. In total, we applied SHM-seq to 124 tissue sections and collected data from 15,321 spatial spots (covered by the tissue) across the three conditions (Supplementary Tables 1 and 2 and Supplementary Fig. 2).
A deep learning approach for taxonomy classification
For the spatial transcriptomics (host) data, we used an established processing pipeline29 (Methods); for the spatial microbiome data, we devised a novel taxonomy assignment pipeline. First, we created a custom, gold standard bacterial genome reference for our experiment (Methods), based on species detected in dedicated shotgun metagenomic sequencing data. It comprised the 65 most abundant species, from 39 genera, present in our bulk reference samples at ≥0.1% abundance, a cutoff chosen because it offered the most accurate mapping metrics (Fig. 2a, Supplementary Fig. 3a and Methods). Next, we compared the performance of taxonomic assignment (by Kraken2 (ref.30)) when using this custom reference versus other bacterial reference databases, including the National Center for Biotechnology Information’s (NCBI) RefSeq31 whole genome and NCBI’s 16S rRNA databases (Methods and Supplementary Fig. 3b–d). Although our customized gold standard (restricted) whole genome reference (SPF: 65 species; ASF: eight species) had higher mapping accuracy and a lower false-positive rate on both real and simulated SHM-seq data, the RefSeq references also performed reasonably well, making them a viable option when dedicated metagenomics data cannot be collected for a customized reference. Simulated data additionally showed that the database type (whole genome versus 16S rRNA), database size and sequencing read length all affect the performance of taxonomic assignments (Supplementary Fig. 3e).
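The abundance cutoff used to build the custom reference can be illustrated with a minimal sketch. It assumes a hypothetical mapping from species name to relative abundance (fractions of a bulk shotgun metagenomics profile); the function name and species names are placeholders, and the sketch does not reproduce the authors' reference-building pipeline.

```python
from typing import Dict, List


def select_reference_species(profile: Dict[str, float], cutoff: float = 0.001) -> List[str]:
    """Keep species at or above `cutoff` relative abundance (0.001 = 0.1%),
    ordered from most to least abundant."""
    kept = [(name, frac) for name, frac in profile.items() if frac >= cutoff]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical bulk profile with placeholder names (not data from this study).
bulk_profile = {"species_A": 0.40, "species_B": 0.25, "species_C": 0.0012, "species_D": 0.0004}
print(select_reference_species(bulk_profile))  # ['species_A', 'species_B', 'species_C']
```

The genomes of the retained species would then form the restricted database passed to the classifier.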
We next devised a taxonomy assignment approach, where spatially captured sequences were first classified using Kraken2 (ref.30) (Methods), and those without a taxonomic classification were then processed by a novel deep learning approach (Methods). Our deep learning model (Supplementary Fig. 4a) is based on convolutional and recurrent neural networks, which process a read from both directions, seek local sequence patterns and their distant interactions and are trained to predict the most likely taxonomic assignment (Methods). We assessed its performance using simulated bacterial reads with attached taxa labels, mimicking data otherwise obtained with SHM-seq (Methods).
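The two-stage routing (Kraken2 first, then the neural network for reads Kraken2 leaves unclassified) can be sketched as follows. The sketch relies only on the first three tab-separated columns of Kraken2's standard per-read output (`C`/`U` status, read ID, taxonomy ID). `route_reads` and `dummy_model` are hypothetical names, and `dummy_model` is a placeholder standing in for the convolutional/recurrent classifier, not an implementation of it. In practice, Kraken2 taxids would be translated to genus names before being merged with the model's labels.

```python
from typing import Callable, Dict, Iterable, List


def route_reads(
    kraken_lines: Iterable[str],
    sequences: Dict[str, str],
    model_predict: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Keep Kraken2 assignments for classified reads ("C") and send
    unclassified reads ("U") to a fallback classifier."""
    labels: Dict[str, str] = {}
    unclassified: List[str] = []
    for line in kraken_lines:
        # Kraken2 standard output: status, read ID, taxid, length, LCA k-mer list.
        status, read_id, taxid = line.rstrip("\n").split("\t")[:3]
        if status == "C":
            labels[read_id] = taxid
        else:
            unclassified.append(read_id)
    if unclassified:
        predictions = model_predict([sequences[r] for r in unclassified])
        labels.update(zip(unclassified, predictions))
    return labels


# Hypothetical stand-in for the trained network: labels every read "genus_X".
def dummy_model(reads: List[str]) -> List[str]:
    return ["genus_X"] * len(reads)


kraken_output = ["C\tread1\t1234\t150\t1234:116", "U\tread2\t0\t150\t0:116"]
reads = {"read1": "ACGT", "read2": "GGCC"}
print(route_reads(kraken_output, reads, dummy_model))  # {'read1': '1234', 'read2': 'genus_X'}
```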
Combining Kraken2 with the deep learning model improved performance compared to using Kraken2 alone (Fig. 2b and Supplementary Fig. 4b,c) or QIIME 2 (ref.32), a commonly used 16S rRNA analysis tool (genus level; average Pearson r: 0.97 (Kraken2 + deep learning model) versus 0.21 (QIIME 2), P ≤ 10⁻⁴; average Bray–Curtis dissimilarity: 0.06 versus 0.46, respectively; Supplementary Fig. 4d). First, the deep learning model (used alone) assigned sequences to genera with 97% accuracy on a held-out test set comprising 20% of the data. Next, we used the simulated data to assess taxonomic assignment metrics by comparing the predicted to the true taxonomic labels for data unseen by the model during training. The deep learning model after Kraken2 significantly outperformed Kraken2 alone on genus-level assignment by several measures, including (1) higher similarity between relative bacterial abundances based on the model’s assignment versus the ground truth (average Pearson r for Kraken2 + deep learning model versus Kraken2 alone: 0.97 versus 0.68, P ≤ 10⁻⁴; Fig. 2b); (2) higher similarity in bacterial composition, evaluated at a resolution of 1,000 randomly assigned spatial spots (average Bray–Curtis dissimilarity 0.06 versus 0.15; Supplementary Fig. 4b); and (3) higher total accuracy (92% versus 84%, P ≤ 10⁻⁴), higher F1 score (89% versus 85%, P ≤ 10⁻⁴) and lower false-positive rate (8% versus 16%, P ≤ 10⁻⁴; Supplementary Fig. 4c), when evaluated as bulk-like samples. Thus, the deep learning model can improve the taxonomic assignments in SHM-seq data.
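The two composition metrics used above can be written out directly. This is a minimal sketch of the standard Bray–Curtis dissimilarity and Pearson correlation applied to genus-level relative abundances. The toy vectors are hypothetical, and the sketch omits the significance testing reported in the text.

```python
import math
from typing import Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray–Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def pearson_r(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Toy genus-level relative abundances (hypothetical numbers).
truth = [0.5, 0.3, 0.2]
predicted = [0.4, 0.4, 0.2]
print(round(bray_curtis(truth, predicted), 3))  # 0.1
print(pearson_r(truth, predicted))
```

A dissimilarity of 0 means identical compositions and 1 means no shared genera, so lower values (0.06 versus 0.15 above) indicate closer agreement with the ground truth.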
Sensitive and specific bacterial rRNA and host mRNA capture
We evaluated SHM-seq by (1) the specificity and sensitivity of bacterial capture in SHM-seq compared to bulk 16S rRNA sequencing and fluorescence in situ hybridization (FISH); and (2) host RNA-seq quality metrics obtained by SHM-seq compared to spatial transcriptomics alone.
To assess specificity (the fraction of sequencing reads mapping to genomic regions in the gold standard reference) and sensitivity (the fraction of expected species detected with SHM-seq), we analyzed profiles from the defined community in ASF33 mice as a positive control and from GF mice as a negative control (Fig. 1 and Methods). On average, 22% of all reads in ASF mouse samples (n = 3) aligned to the bacterial reference, whereas only 0.008036% of reads from GF mouse samples (n = 3) were assigned to any of the 65 species in the reference (Fig. 2c). For ASF samples, bacterial reads mapped to the expected locations in the respective ASF reference genomes, highlighting the specificity of our targeted capture (mean ± s.e.m. of reads in the expected genomic bin: 97.0 ± 1.5%, n = 18 tissue sections; Supplementary Fig. 5), with most reads (85.7 ± 4.5%, mean ± s.e.m., n = 18 tissue sections) mapping to the expected capture region of the 16S rRNA gene (Supplementary Fig. 6). Highlighting the sensitivity of SHM-seq, all of the expected bacterial species were captured in the ASF samples, with ASF519 and ASF502 as the dominant bacteria (Supplementary Fig. 7a), in line with previous bulk RT–qPCR results34 (Pearson r = 0.85; Supplementary Fig. 7b) and with high reproducibility across replicates (Supplementary Fig. 7a). SHM-seq even detected ASF360, which was previously reported to be difficult to detect at low abundance using RT–qPCR35.
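The two definitions above, and the mean ± s.e.m. summaries, reduce to simple ratios. The sketch below follows the definitions as stated in the text. The read counts and the detected set are hypothetical, and only three of the eight ASF members are listed, for brevity.

```python
import math
from typing import Iterable, Set, Tuple


def specificity(reads_in_reference: int, total_reads: int) -> float:
    """Fraction of reads that map to regions of the gold standard reference."""
    return reads_in_reference / total_reads


def sensitivity(detected: Set[str], expected: Set[str]) -> float:
    """Fraction of expected species that were detected."""
    return len(detected & expected) / len(expected)


def mean_sem(values: Iterable[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    vals = list(values)
    n = len(vals)
    m = sum(vals) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in vals) / (n - 1))
    return m, sd / math.sqrt(n)


# Hypothetical example: a subset of the expected community, one member missed.
expected = {"ASF360", "ASF502", "ASF519"}
detected = {"ASF502", "ASF519"}
print(sensitivity(detected, expected))  # 0.6666666666666666
print(specificity(22, 100))            # 0.22
```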
As a more complex case study, we further tested SHM-seq’s performance in bacterial capture in SPF mice (Methods; n = 3). On average, 28% of all reads aligned to the bacterial genome reference (Fig. 2c) and were assigned to 39 genera in our metagenomic reference (22 of which were present at >1% abundance), with Duncaniella, Turicibacter and Muribaculum being the most abundant (Fig. 2d). The genera detected and their relative abundances correlated well with 16S rRNA sequencing (Pearson r = 0.69, P ≤ 10⁻⁴; Fig. 2e), and, on average, 90.7 ± 1.7% (mean ± s.e.m.) of SHM-seq reads (n = 9 tissue sections) mapped to the expected 16S rRNA capture region. Notably, our enzymatic cell permeabilization protocol was as efficient for preparing (bulk) bacterial samples as traditional mechanical extraction of nucleic acids (Pearson r = 0.95, P ≤ 10⁻⁴; Fig. 2f).
To further validate the specificity of spatial capture of bacterial genomes in different regions of interest, we compared the bacterial abundance profiles obtained with SHM-seq in ASF mice with those measured by FISH (Methods) with five fluorescent bacterial detection probes: a positive control to detect all bacterial species, probes targeting three distinct ASF species and a negative control. We detected and quantified the fluorescence signal over three major tissue regions (Methods and Supplementary Fig. 8a–d). The abundances of the overall positive control and of each of the three ASF-specific bacterial species in FISH correlated significantly with the SHM-seq measurements (average Spearman ρ; ASF502: 0.72, ASF360: 0.72, ASF519: 0.55, positive control: 0.75, P ≤ 10⁻⁴; Fig. 2g–i and Supplementary Fig. 8e–g).
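Spearman's ρ compares only the rank order of the FISH and SHM-seq measurements, which suits two assays with different units. A minimal sketch computes it as the Pearson correlation of average ranks (ties share the mean of their positions). The per-region values are hypothetical, and `scipy.stats.spearmanr` would give the same coefficient together with a P value.

```python
import math
from typing import List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-region values: FISH signal and SHM-seq read counts.
fish_signal = [0.8, 2.1, 5.0, 3.3]
shm_counts = [12, 40, 95, 41]
print(round(spearman_rho(fish_signal, shm_counts), 6))  # 1.0
```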
Host RNA-seq quality metrics were similar between SHM-seq and spatial transcriptomics. There were no significant differences in RNA-seq read mapping rates or unique molecular identifier (UMI) counts between spatial transcriptomics and SHM-seq in either SPF or ASF mice (n = 3; Supplementary Fig. 9a–d): 66% and 63% of the spatially captured reads were uniquely mapped, and pseudo-bulk UMI counts correlated highly (Pearson r = 0.95 and 0.92, respectively). Furthermore, there was high agreement in host expression profiles when we used regular spatial transcriptomics arrays (only poly(d)T capture) with the permeabilization method developed solely for disrupting host cells versus the method used for disrupting both host and bacterial cells (Pearson r = 0.94; Supplementary Fig. 9e,f). Thus, the surface treatment, permeabilization method and library preparation used in SHM-seq are comparable in specificity and sensitivity to commonly used methods for assessing bacterial sample composition and for spatial host expression profiling.
Defining spatial patterns of bacterial and host expression
To recover the spatial organization of microbes and host from our data, we quantified the expression of host genes and the abundance of bacterial genera in each spot, mapped those to 16 defined morphological regions of interest (MROIs) (Fig. 3a) to identify characteristic patterns and, finally, visualized our data as overviews of changes in tissue architecture at a coarser (major MROIs) or finer (minor MROIs) level. In brief, we manually assigned each spot in each profiled tissue section to one of 16 MROI categories based on histology (Methods) and then automatically visualized those on rasterized vector representations of tissues for each mouse condition (Methods). In this way, we quantified spatial abundances from 100 colonic mouse sections in SPF and GF mice, spanning 10,924 spatially barcoded spots (covered by gut tissue), each with spatial expression of 17,956 host genes and 39 bacterial genera across the MROIs. On average, we sampled 20 tissue sections, 2,208 spots and ~32,000 nuclear cell segments from each mouse colon (Supplementary Fig. 10). We tested for significant spatial expression differences in the sampled sections using Splotch36,37 (Methods), a hierarchical probabilistic approach that accounts for the relative position of each spot (with four nearest neighbors), differences in sampling (number of spots) between MROIs and the biological batch variables of presence of bacteria in the mice (that is, conditions) and individuals (that is, animals).
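Splotch's use of each spot's four nearest neighbors requires a neighbor structure over tissue-covered spots. The sketch below builds one under the assumption that spots are indexed by integer (column, row) coordinates on a square grid. `four_neighbours` is a hypothetical helper, and the sketch does not reproduce Splotch's probabilistic model.

```python
from typing import Dict, List, Tuple

Spot = Tuple[int, int]  # (column, row) index of a spot on the capture array


def four_neighbours(spots: List[Spot]) -> Dict[Spot, List[Spot]]:
    """Map each tissue-covered spot to its right/left/up/down neighbours
    that are also tissue-covered; edge and isolated spots get fewer."""
    present = set(spots)
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return {
        (x, y): [(x + dx, y + dy) for dx, dy in offsets if (x + dx, y + dy) in present]
        for (x, y) in spots
    }


covered = [(0, 0), (1, 0), (0, 1), (5, 5)]
neighbours = four_neighbours(covered)
print(neighbours[(0, 0)])  # [(1, 0), (0, 1)]
print(neighbours[(5, 5)])  # []
```

Restricting neighbors to tissue-covered spots means spots at the tissue edge borrow information from fewer neighbors than interior spots.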
Spatial co-organization of host and microbe composition
We asked how gene expression in each of 16 MROIs was impacted by overall bacterial presence by comparing SPF versus GF mice (with no bacteria). Although both SPF and GF mice showed similar regional expression of some marker genes (for example, Epcam in the epithelium, Myh11 in the muscularis regions and Cd52 in Peyer’s patches; Fig. 3b), other genes were significantly differentially expressed between them in a region-specific manner (Fig. 3c). For example, Satb2 and Muc2 were, respectively, downregulated and upregulated in the crypt apex of SPF versus GF mice, the tissue layer most proximal to the mucosa and lumen (Fig. 3d). Satb2 helps maintain intestinal homeostasis, and its expression prevents excessive crypt damage and inflammation38. Similarly, Muc2 is key for maintenance of a healthy mucosal layer, and its depletion results in direct contact between epithelial cells and bacteria in the colon, leading to inflammation and cancer39. In other examples, Hnf4a, a gene associated with epithelium renewal40, is more highly expressed in the base of the crypt in GF versus SPF mice, and Gpx2, whose deficiency is related to propagating IBD symptoms41, is induced in the region bordering epithelium and muscularis mucosae tissue in SPF versus GF mice (Supplementary Fig. 11).
Host spatial expression patterns in SPF mice were mirrored by distinct bacterial genera, which Splotch (Methods) detected at different abundances and compositions in six distinct MROIs. The detected bacteria were found in the colonic inter-fold regions (crypt base, crypt mid and crypt apex/mid), the mucosal layers (crypt apex/mucosa and mucosa/pellet) or the lumen (that is, pellet, where they were most abundant, as expected). Inter-fold regions had the lowest diversity, and the pellet had the highest diversity (Fig. 3e). Morphological regions in close proximity to each other shared some highly abundant genera: Pseudobutyrivibrio was shared by the two mucosal regions, and Mediterraneibacter, an obligate anaerobe formerly part of the Ruminococcus genus42, was shared among the inter-fold regions (Fig. 3e,f). Mucosal regions had a preponderance of Oscillibacter (Fig. 3f,g, middle), Pseudobutyrivibrio (Fig. 3f,g, bottom), Ruminococcus and Phocaeicola, with the latter two genera previously associated with the mucosa13,43–45, whereas the pellet had an abundance of commensal bacteria14,46, such as Lactobacillus, Muribaculum and Anaerocolumna, but also Massilistercora, a member of the order Eubacteriales previously reported only in the human gut47 (Fig. 3g, top). These patterns were apparent both in aggregate across samples and in individual sections, with good reproducibility (Fig. 3f–h and Supplementary Fig. 12).
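The text does not state which diversity index underlies the regional comparison. As an assumption for illustration only, the Shannon index (natural log) over genus counts per MROI is a common choice, and it can be sketched as follows. The region names are taken from the text, but the counts and genus labels are hypothetical.

```python
import math
from typing import Dict


def shannon_index(counts: Dict[str, float]) -> float:
    """Shannon diversity H = -sum p_i ln p_i over genera with nonzero counts."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical genus read counts for two regions (not data from this study).
pellet = {"genus_1": 30, "genus_2": 25, "genus_3": 25, "genus_4": 20}
crypt_base = {"genus_1": 90, "genus_2": 10}
print(shannon_index(pellet) > shannon_index(crypt_base))  # True
```

H increases with both the number of genera and the evenness of their abundances, so a lumen-like region with many evenly represented genera scores higher than a region dominated by one genus.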
The mucosal barrier, which otherwise prevents unwanted direct contact between the lumen and host cells in the crypt apex, signals the immune system in a process mediated by epithelial cells48. We hypothesized that the detected bacterial genera, some observed exclusively at the tight-junction mucosal barrier (for example, Pseudobutyrivibrio, Ruminococcus and Oscillibacter; Figs. 3f,g and 4a) and others diffusing into the tissue-specific inter-fold regions (for example, Intestimonas, Coprococcus and Flavonifractor; Figs. 3f and 4a), could influence and be influenced by host expression in close proximity. To systematically investigate significant differences in regional and cell type composition and associate them with the presence of bacteria from different genera, we identified 28 spatial modules of genes that are co-expressed across spots (Supplementary Fig. 13a and Methods). We then partitioned each module into gene submodules by gene co-variation across single-nucleus RNA sequencing (snRNA-seq) profiles (Fig. 4b, Supplementary Fig. 13b and Methods), recovering 203 submodules (Supplementary Table 3 and Methods). We labeled each submodule by its expression in one or more of the 30 cell types identified by snRNA-seq and tested it for enriched KEGG pathways (Fig. 4c and Methods).
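The specific enrichment test is described in Methods and is not reproduced here. As an assumption, a one-sided hypergeometric test is a common way to ask whether a submodule contains more genes from a KEGG pathway than expected by chance, and it can be sketched with exact integer arithmetic. Multiple-testing correction across submodules and pathways is omitted. In SciPy's parameterization the same quantity is `scipy.stats.hypergeom.sf(k - 1, N, K, n)`.

```python
from math import comb


def enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric P(X >= k): probability of seeing at least k
    pathway genes in a submodule of n genes, drawn from N background genes
    of which K belong to the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 15,000 background genes, 80 in a KEGG pathway,
# and a 40-gene submodule that contains 6 pathway genes.
print(enrichment_p(6, 15000, 80, 40))
```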
In the presence of microbiota, specifically Pseudobutyrivibrio, Sodaliphilus and Oscillibacter, colonocytes in the apex of the crypts expressed Ceacam20, a known receptor for Gram-negative bacteria49 and a known colitis suppressor50, whereas goblet cells expressed high levels of Hif1a, a marker of a functioning mucosal barrier that is downregulated in IBD51 (Fig. 4d and Supplementary Table 4). Neurons in the neighboring region (that is, upper mid region of the crypts), in the presence of Intestimonas, expressed Tacr1 and other neuroactive ligands and receptors implicated in regulating gut motility52, whereas macrophages in the same regions and in the presence of the same bacterial genera expressed Fcrl2 and Slamf6, genes that have been shown to modulate neuro-immune signaling upon receptor–microbe binding53,54 (Fig. 4d and Supplementary Table 4). Specialized spatial niches in lower regions of the crypts also contained networks of neurons and myocytes involved in muscle contractility (Camk2a in the presence of Coprococcus), axon guidance (Sema4f in the presence of Flavonifractor) and cholinergic signaling (Chat in the presence of Coprococcus) (Fig. 4d and Supplementary Table 4).
Discussion
Here we presented SHM-seq, a method that relies on solid surface capture of polyadenylated host transcripts and hypervariable (V4) 16S bacterial regions onto spatially barcoded microarrays for joint spatial profiling of bacterial composition, host gene expression and tissue histology. We provided a deep-learning-based approach that improves taxonomic classification of SHM-seq data, with higher detection rates and assignment accuracy, and a roadmap for interrogating coordinated spatial expression programs. Benchmarking against a gold standard custom reference, generated from dedicated metagenomics data in the same system, we show that SHM-seq data are compatible with mapping to different databases, containing either 16S rRNA or full genome bacterial sequences, and that mapping accuracy depends on the quality and size of the respective databases.
We benchmarked the sensitivity and specificity of SHM-seq against traditional 16S sequencing, published RT–qPCR data, FISH and spatial transcriptomics in three mouse conditions: SPF, GF and ASF. SHM-seq was reproducible and robust across a dataset of 124 tissue sections and detected all bacterial genera identified by 16S sequencing in SPF mice as well as all eight species of the ASF community. Previous studies reported variation in bacterial abundance between mice34,35,55. In our study, we also saw differences in abundance obtained with SHM-seq versus external ASF data, although the overall correlation between the datasets was high (Pearson r = 0.85), and SHM-seq was highly reproducible across mice. Future studies can alter the amount and sequence of capture oligonucleotides on the spatial array surface to further tune the recovery rate of bacterial versus host transcripts or introduce other user-defined capture moieties of interest. Additionally, although sequencing only parts of the 16S rRNA gene has been shown to be sufficient to identify bacterial genera56, it has limited resolution at finer taxonomic levels, such as specific bacterial species and strains. Future versions of SHM-seq could address this limitation by modifying the capture sequences and library preparation procedures, preferably to increase the sequencing read length.
Using these data and methods, we show that, in the presence of microbiota, subpopulations of goblet cells and colonocytes formed cell-adhesive layers filled with Muc2 and Ceacam20 for host–microbial communication. Additionally, we observed distinct submodules of genes expressed in specific microenvironments in SPF mice that encode proteins that can regulate intestinal physiological functions and colonic motility, which are disrupted in GF mice57. Thus, our spatial analysis identified spatial expression programs throughout the tissue cross-section characteristic of regional populations that display distinct, mouse-condition-relevant dynamics and may depend on the presence of commensal bacteria and/or impact host–bacteria interactions.
SHM-seq enables robust spatial host–microbiome profiling from a large number of tissues but is currently limited by the resolution of solid-phase capture arrays. To address this, Splotch, our quantitative data model (Methods), combines spatial and experimental parameters to improve probabilistic inference of spatially resolved gene expression from lower-resolution arrays36,37. Moreover, by interrogating tissue contexts through MROIs, the model shares information across tissue sections to detect reproducible spatial changes in the different mouse conditions, to create a common coordinate framework (CCF) guided by the biological question and spatial resolution58 and to simplify visualization of large tissue cohorts. Future studies can further tackle the resolution limitation using higher-density formats23,24,59 and enhanced computational mapping approaches for deconvolving cell–cell inter-species communication networks. Additionally, using 16S rRNA databases restricted to gut microbial species can further reduce the computational burden of mapping SHM-seq data, whereas mapping to large 16S rRNA databases increases the risk of false-positive assignments and of poor representation of the relevant species in these databases. We therefore favor whole genome databases, such as RefSeq, restricted to species detected in matched metagenomic data when such data are available.
SHM-seq paves the way for future work and detailed investigation in larger studies, designed to compare animal models—for example, during colitis-induced changes60 or infection61—and human patients sampled longitudinally or cross-sectionally, where both microbiome and host cells vary, as does host genetics. Such analyses can expand understanding of the relationship between host and microbiome and lead to better understanding of mechanisms sustaining homeostasis in health or onset and persistence of chronic inflammation. Our method should, thus, help in better understanding environmental and microbiome-driven spatial neighborhood heterogeneity in barrier and mucosal tissues.
Methods
SHM-seq data generation
Mice
Adult C57BL/6 SPF mice were purchased from The Jackson Laboratory and maintained in accordance with ethical guidelines monitored by the Institutional Animal Care and Use Committee (IACUC), established by the Division of Comparative Medicine at the Broad Institute of MIT and Harvard, and consistent with the Guide for the Care and Use of Laboratory Animals, National Research Council, 1996 (institutional animal welfare assurance no. A4711-01), with protocol 0122-10-16. Adult C57BL/6 GF mice were obtained from Taconic Biosciences and maintained in a gnotobiotic environment. Some of these mice were randomly selected and inoculated with ASF33 over several generations and used when >6 weeks of age. After colonization, ASF mice were housed in sterile conditions and tested with polymerase chain reaction (PCR) to ensure that sterility was maintained63. Animal housing room temperatures were monitored and always maintained according to species-specific needs. Humidity was maintained at 30–70%. Light intensity and light cycle timing were carefully regulated by Broad Institute animal facilities. To capture material from multiple sections per colonic tube, as well as to maximize the use of a single spatial array (1,007 spatial spots spread over ~42 mm²), we placed 2–3 tissue cross-sections onto one spatial capture area. We sampled ~20 sections from each mouse by sectioning in this fashion across one spatial capture slide containing six active capture areas.
Tissue collection
Colonic tubes from the mid part of the colon were dissected within minutes of killing the mice, blotted to remove excess fluid and embedded in Optimal Cutting Temperature compound (O.C.T., Fisher Healthcare) in large molds (VWR) pre-filled with O.C.T. The molds were then placed on a metal plate pre-chilled on dry ice for 2 min or until completely frozen. Samples were stored at −80 °C until sectioning.
Generation of slides with customized surfaces
Customized surface primers were immobilized to an amine-activated surface area (~40 mm² each) using covalent bioconjugation25,27, as recommended by the manufacturer (Surmodics). Three distinct surfaces were generated for validation: 16S, poly(d)T and a mixed poly(d)T/16S surface. The oligonucleotides immobilized in each case were:
5′-[AmC6]UUUUUGACTCGTAATACGACTCACTATAGGGACACGACGCTCTTCCGATCTNNNNNNNNATCTCGACGACTACHVGGGTATCTAATCC-3′
5′-[AmC6]UUUUUGACTCGTAATACGACTCACTATAGGGACACGACGCTCTTCCGATCTNNNNNNNNTTTTTTTTTTTTTTTTTTTVN-3′ (both Integrated DNA Technologies (IDT)).
All slide incubations took place on a thermal incubator (Eppendorf Thermomixer Option C) with slides mounted into a hybridization chamber (ArrayIt). All in situ reactions performed on spatial arrays were carried out in a class II biosafety cabinet.
Generation of spatial arrays with customized surfaces
All spatial arrays were produced as previously described for the original spatial transcriptomics method25,27. In brief, six spatial microarrays per slide were created using amine-activated CodeLink slides (Surmodics). To ensure covalent binding chemistry to the amine-activated surface, DNA oligonucleotides (IDT) were constructed as follows:
5′-[AmC6]UUUUUGACTCGTAATACGACTCACTATAGGGACACGACGCTCTTCCGATCT-[18mer spatial barcode]-[7mer random UMI]-[20T]-VN.
Printing was performed by ArrayJet LTD by spotting 100 pL of spatially barcoded DNA oligonucleotides (33 µM diluted in 2× CodeLink printing buffer) using inkjet technology to form 100-μm spots with a 200-μm spot-to-spot pitch, resulting in a total of 1,007 different spatially addressable spots printed in a 6.2-mm × 6.6-mm capture area. A complete list of all spatially barcoded DNA oligonucleotides used in this study is available at https://github.com/nygctech/shmseq. After printing the spatial arrays, slides were blocked using a pre-warmed blocking solution (50 mM ethanolamine, 0.1 M Tris, pH 9) at 50 °C for 30 min and washed with 4× saline sodium citrate (SSC) and 0.1% SDS (pre-warmed to 50 °C) for 30 min before rinsing the slides with deionized water and drying.
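The printing parameters above also determine how much of each capture area is actually sampled. The short calculation below treats each spot as a circle 100 µm in diameter and uses only the numbers stated in this paragraph.

```python
import math

spot_diameter_um = 100.0
n_spots = 1007
width_mm, height_mm = 6.2, 6.6

capture_area_mm2 = width_mm * height_mm  # 6.2 mm x 6.6 mm
spot_area_mm2 = math.pi * (spot_diameter_um / 2 / 1000) ** 2  # radius 0.05 mm
coverage = n_spots * spot_area_mm2 / capture_area_mm2
print(f"{capture_area_mm2:.2f} mm^2, spots cover {coverage:.1%}")  # 40.92 mm^2, spots cover 19.3%
```

Roughly a fifth of the tissue overlying a capture area therefore falls on a barcoded spot. The remainder lies in the gaps set by the 200-µm pitch, which is one reason the Discussion frames array resolution as a current limitation.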
Next, capture areas were modified to create a customized surface containing a mixture of poly(d)T and 16S capture sequences. To hybridize the 16S probe onto the spatially barcoded poly(d)T surface probes, 75 µl of the 16S (V4) probe (IDT) with the sequence 5′-GGATTAGATACCCBDGTAGTCGAGATNBAAAAAAAAAAAAAAAAAAAA-3′ (sequence28 modified to enable attachment to the spatial arrays) at 0.8 nM concentration in 2× SSC (Sigma-Aldrich), 20% fresh formamide (Thermo Fisher Scientific) and 0.1% Tween (Sigma-Aldrich) was added to each spatial capture area and incubated for 30 min at room temperature. The probe mix was then removed, and capture areas were washed with 100 µl of 0.1× SSC (Sigma-Aldrich). To covalently attach the hybridized 16S probes onto the spatially barcoded poly(d)T surface probes, an extension reaction was performed with 75 µl of 1× M-MuLV buffer, 2 U µl⁻¹ RNaseOUT, 20 U µl⁻¹ M-MuLV and 0.5 mM dNTPs (all from Thermo Fisher Scientific) and 0.20 µg µl⁻¹ BSA (New England Biolabs (NEB)) added to the wells and incubated at 42 °C for 30 min. The M-MuLV solution was then removed, followed by a wash with 100 µl of 0.1× SSC. To strip the 16S probes used in the hybridization and extension reaction, and make the covalently attached 16S surface probes single stranded, surface capture areas were incubated 3× with 75 µl of 100% formamide for 3 min at room temperature. Capture areas were then washed twice with 100 µl of 0.1× SSC before washing the entire slide for 10 min at 50 °C in 2× SSC/0.1% SDS (Sigma-Aldrich), followed by 1-min washes in 0.2× SSC and then 0.1× SSC, both at 37 °C. This resulted in spatially barcoded capture areas containing a ~1:1 ratio of poly(d)T and 16S capture sequences.
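The hybridization and extension step can be checked with a reverse complement that understands IUPAC degenerate codes. On our reading of the design (an assumption, not a statement from the protocol), the surface probe's T20-VN 3′ end anneals to the 16S probe's poly(A) tail and NB anchor. M-MuLV extension then copies the remaining 16S-specific 5′ portion, so the new 3′ end of the surface probe is that portion's reverse complement. The helper names below are illustrative. The same degeneracy count also gives the number of distinct sequences encoded by an N-run such as NNNNNNNN.

```python
import math

IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "G": "C", "C": "G",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}
# Number of concrete bases each IUPAC code stands for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def reverse_complement(seq: str) -> str:
    """Reverse complement, mapping degenerate codes to their complements."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


def n_variants(seq: str) -> int:
    """Number of distinct concrete sequences a degenerate oligo encodes."""
    return math.prod(IUPAC_DEGENERACY[base] for base in seq.upper())


# 16S-specific 5' portion of the V4 probe (before the NB anchor and poly(A) tail).
probe_16s = "GGATTAGATACCCBDGTAGTCGAGAT"
print(reverse_complement(probe_16s))  # ATCTCGACTACHVGGGTATCTAATCC
print(n_variants(probe_16s))          # 9
```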
Cryosectioning
The entire cryo chamber, including all surfaces and tools used during cryosectioning, was wiped with 70% ethanol before sectioning began to avoid bacterial contamination. Both spatial arrays and O.C.T.-embedded gut tissue blocks were allowed to reach the temperature of the cryo chamber before 10-µm-thick cross-sections of gut tissue were placed on customized spatial arrays. Tissue fixation followed immediately as described below.
Tissue fixation, H&E staining and imaging
The spatial array was warmed at 37 °C for 2.5 min. The entire area of the glass slide was then covered with methacarn solution (60% absolute methanol, 30% chloroform stabilized with ethanol and 10% glacial acetic acid (all from Sigma-Aldrich)) for 10 min at room temperature in a closed space to avoid evaporation. Methacarn was then removed, and the slide was allowed to dry before ~300 µl of isopropanol (Sigma-Aldrich) was added and incubated for 1 min at room temperature. When the slide was completely dry again, it was stained with H&E in an EasyDip Slide Jar Staining system (Weber Scientific). The system included separate containers filled with ~80 ml of Dako Mayer’s hematoxylin, Dako Blueing Buffer (both from Agilent Technologies), 5% Eosin Y in 0.45 M Tris acetate buffer at pH 6 (both from Sigma-Aldrich) and nuclease-free water (Thermo Fisher Scientific). The slide was put in a slide holder and completely immersed in hematoxylin for 6 min, followed by five dips in nuclease-free water and then 10 dips in a beaker filled with ~800 ml of nuclease-free water. The slide holder was then dipped in Dako Blueing Buffer for 5 s, followed by another five dips in nuclease-free water. Finally, the slide holder was placed in the eosin solution for 1 min and washed with five dips in nuclease-free water. The slide was removed from the holder and air dried before being mounted with 85% glycerol and covered with a coverslip (VWR) for imaging. H&E-stained tissue sections on glass arrays were imaged on a Metafer VSlide scanning system (MetaSystems) installed on an Axio Imager Z2 microscope (Carl Zeiss) with an LED transmitted light source and a CCD camera. Using an A-P ×10/0.25 Ph1 objective lens (Carl Zeiss) and a configuration program26, each tissue section on the glass array was focused and scanned automatically. Images were stitched using VSlide (version 1.0.0) with 60-µm overlap and linear blending between fields of view and exported with JPEG compression.
In situ reactions: permeabilization and reverse transcription
Before starting, the hybridization chamber was cleaned with RNaseZap (Thermo Fisher Scientific) and 70% ethanol, followed by at least 30 min in a UV light chamber. After section imaging, the slide was again attached to the hybridization chamber to proceed with the following permeabilization reactions (referred to as ‘bacterial treatment’ below). First, 100 µl of a lysozyme solution containing 0.05 M EDTA (pH 8.0, Thermo Fisher Scientific), 0.1 M Tris HCl, pH 8 (Thermo Fisher Scientific) and 10 µg µl−1 lysozyme (from chicken egg white, lyophilized powder, Sigma-Aldrich) was added to each well and incubated for 30 min at 37 °C, followed by a wash with 100 µl of 0.1× SSC. Second, 75 µl of 10% Triton X-100 (Sigma-Aldrich) was added and incubated for 5 min at 37 °C, followed by a 100-µl wash with 0.1× SSC. Third, a solution of 0.05% SDS and 5 mM DTT (Thermo Fisher Scientific) was added and incubated for 5 min at 37 °C, followed by a 100-µl wash with 0.1× SSC. Fourth, 100 µl of collagenase I (200 U) in 1× HBSS (both from Thermo Fisher Scientific) was added to each well and incubated for 20 min at 37 °C, again followed by a 100-µl wash with 0.1× SSC. Lastly, 75 µl per well of 0.1% pepsin (pH 1, Sigma-Aldrich) was incubated for 10 min at 37 °C, followed by a final 100-µl wash with 0.1× SSC. In situ cDNA synthesis was performed as previously described26. In brief, 75 µl of a reaction mix containing 50 ng µl−1 actinomycin D (Sigma-Aldrich), 0.5 mM dNTPs (Thermo Fisher Scientific), 0.20 µg µl−1 BSA and 1 U µl−1 USER enzyme (both from NEB), 6% v/v Lymphoprep (STEMCELL Technologies), 1 M betaine (B0300-1VL, Sigma-Aldrich), 1× first-strand buffer, 5 mM DTT, 2 U µl−1 RNaseOUT and 20 U µl−1 Superscript III (all from Thermo Fisher Scientific) was added to each well. The reaction was sealed with Microseal ‘B’ PCR Plate Seals (Bio-Rad) and incubated for at least 6 h. After incubation, 70 µl of the released cDNA material from each hybridization chamber well was collected and stored in a 96-well PCR plate (Eppendorf).
Library preparation
Library preparation was performed using the SM-Omics automated library preparation protocol, as previously described26. In brief, released cDNA material was first made double stranded using the nicked RNA template strands as primers for copying the cDNA strand with DNA polymerase I. To avoid overdigestion, the reaction was terminated with EDTA, and ends were blunted using T4 DNA polymerase before linear amplification by in vitro transcription. Amplified material was again transcribed into cDNA, resulting in material ready for PCR indexing as described in the next subsection.
Quantification, indexing and sequencing
qPCR quantification and indexing were performed as previously described64 using TruSeq LT Illumina indexing and a KAPA HotStart HiFi ReadyMix (Roche). Indexed cDNA libraries were cleaned with AMPure XP beads (Beckman Coulter) at a 0.7:1 bead-to-PCR product ratio, according to the manufacturer’s protocol, and eluted in 12 µl of elution buffer (Qiagen). Each sample’s concentration was measured using the DNA HS Qubit assay (Thermo Fisher Scientific), and average fragment length was determined using either a Bioanalyzer HS or a DNA1000 TapeStation (both from Agilent Technologies). Each sample was then diluted to the desired concentration for sequencing (1.08 pM on a NextSeq and 10 pM on a MiSeq, both with ~10% PhiX). Pooled libraries were sequenced (Illumina) with 25 nucleotides (nt) in the forward read and 55 nt (NextSeq) or 150 nt (MiSeq) in the reverse read.
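Loading concentrations such as 1.08 pM and 10 pM are molar, whereas the Qubit assay reports a mass concentration. A common conversion, sketched below, assumes an average mass of 660 g/mol per base pair of double-stranded DNA and uses the measured mean fragment length. The helper names and the example values (2 ng µl−1, 400 bp, 600 µl) are hypothetical, and the denaturation and PhiX spike-in steps of the Illumina loading workflow are not modeled.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)

def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/ul) to molarity (nM) using the mean fragment length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)

def volume_for_target(stock_nm: float, target_pm: float, final_volume_ul: float) -> float:
    """Volume of stock (ul) giving target_pm in final_volume_ul, from C1 * V1 = C2 * V2."""
    return (target_pm / 1000.0) * final_volume_ul / stock_nm

# Illustrative values only: a 2 ng/ul library with a 400-bp mean fragment length.
stock_nm = library_molarity_nm(2.0, 400)
print(f"stock: {stock_nm:.2f} nM")
print(f"{volume_for_target(stock_nm, 10.0, 600.0):.3f} ul of stock per 600 ul at 10 pM")
```

In practice, such small volumes are reached through serial dilution rather than pipetted directly.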
Generation of bacterial validation data
Mechanical extraction of bacterial RNA
An approximately 1-mm-thick tissue section containing pellet was cut from SPF colons in O.C.T. and placed in a Lysing Matrix D tube (MP Biomedicals) chilled on dry ice. Then, 400 µl of RLT buffer (Qiagen) with 1% 2-mercaptoethanol (Sigma-Aldrich) was added to the tube, and the sample was homogenized in a FastPrep-24 instrument (MP Biomedicals) at speed 6 for 40 s. Tubes were then centrifuged for 5 min at 12,000 r.p.m. The supernatant was transferred to a new tube, and RNA was extracted using the RNeasy Mini Kit (Qiagen), according to the manufacturer’s instructions. Extracted RNA was fragmented using the NEBNext Magnesium RNA Fragmentation Module Kit (NEB) with 2 min of heating. Fragmented RNA was cleaned with the MinElute Cleanup Kit (Qiagen), according to the manufacturer’s instructions, and its quality was evaluated with the Bioanalyzer Pico Kit (Agilent Technologies). Next, ~20 ng µl−1 of mechanically extracted RNA was added to a 16S surface probe-coated quality control (QC) array in an in situ cDNA reaction, as described in the ‘In situ reactions: permeabilization and reverse transcription’ subsection. After at least 6 h of incubation at 42 °C, 70 µl of the released material from each well was collected and stored in a new 96-well PCR plate (Eppendorf). Library preparation, quantification, indexing and MiSeq sequencing were performed as described in the ‘Library preparation’ and ‘Quantification, indexing and sequencing’ subsections.
Extraction and metagenomic sequencing of fecal DNA
Pellet was collected from the colon of SPF mice by perforating the colon wall and scraping the pellet and mucus into a 1.5-ml collection tube (Eppendorf). Collected pellet was stored at −80 °C until further processing. DNA was extracted from the pellet using a Lysing Matrix Y tube (MP Biomedicals), according to the manufacturer’s instructions. Extracted DNA concentration was determined using the DNA HS Qubit assay. DNA libraries were prepared using Nextera XT (15031942 v05). The concentration and average fragment length of each library were evaluated using the DNA HS Qubit assay (Thermo Fisher Scientific) and Bioanalyzer HS (Agilent Technologies), respectively. Each sample was diluted to the desired concentration for sequencing (9 pM, ~10% PhiX), and pooled samples were sequenced on a MiSeq (2 × 150 bp, Illumina). Each sample was sequenced to ~5–10 million reads.
FISH
FISH was performed on the same fresh-frozen gut tissue samples from ASF mice. All sections were 10-µm-thick cross-sections collected consecutively: the first sections were placed on the spatial array, the next consecutive sections on a CodeLink amine-activated slide (Surmodics) and the following two sections again on the spatial array. Sections on the spatial array were used for SHM-seq, and sections on the CodeLink slide were prepared for FISH as described below. Slides were warmed at 37 °C for 2.5 min on a thermal incubator before tissue sections were fixed with freshly prepared methacarn, as described in the ‘Tissue fixation, H&E staining and imaging’ subsection. Slides were then placed in a hybridization chamber, and 75 µl of preheated FISH solution (0.9 M NaCl and 20 mM Tris, pH 7 (both from Thermo Fisher Scientific), 0.1% SDS (Sigma-Aldrich) and a FISH oligonucleotide detection probe (0.06 µg µl−1)) was added to each well and incubated for 2 h at 25 °C. The oligonucleotide FISH detection probe (IDT) was chosen according to the target of interest: probe EUB338 (5′-/Cy5/GCTGCCTCCCGTAGGAGT-3′) for all bacteria; probe non-338 (5′-/Cy5/ACTCCTACGGGAGGCAGC-3′) as a negative control; probe Lab158 (5′-/Cy5/GGTATTAGCAYCTGTTTCCA-3′)65–67 to target ASF360; probe Lac435 (5′-/Cy5/TCTTCCCTGCTGATAGA-3′)68,69 to target ASF502; and probe Bac303 (5′-/Cy5/CCAATGTGGGGGACCTT-3′)8,67 to target ASF519. After the 2-h incubation, the FISH solution was removed, and wells were washed with 100 µl of 1× PBS before the hybridization chamber was removed; slides were then dipped 12 times in 50 ml of 1× PBS and air dried. Slides were mounted with 85% glycerol (Sigma-Aldrich) and a coverslip (VWR). Epifluorescence images were acquired on an Axio Imager Z2 microscope using a PhotoFluor LM-75 light source (89North) in combination with a Plan-APOCHROMAT ×63/1.4 oil DIC objective (Carl Zeiss).
Images were processed using VSlide (version 1.0.0, MetaSystems).
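The negative-control probe non-338 is the reverse complement of EUB338, so it has the same length and base composition but no rRNA target. The short Python sketch below checks this and reports probe lengths and GC content; the helper names are illustrative, and ambiguous bases (such as Y in Lab158) are counted as non-GC.

```python
# Complement table covering IUPAC ambiguity codes (S and W are self-complementary and left unchanged).
COMPLEMENT = str.maketrans("ACGTRYKMBDHVN", "TGCAYRMKVHDBN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases; ambiguous bases count as non-GC in this sketch."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

probes = {
    "EUB338": "GCTGCCTCCCGTAGGAGT",
    "non-338": "ACTCCTACGGGAGGCAGC",
    "Lab158": "GGTATTAGCAYCTGTTTCCA",
    "Lac435": "TCTTCCCTGCTGATAGA",
    "Bac303": "CCAATGTGGGGGACCTT",
}

# The negative control is the antisense version of the universal probe.
assert reverse_complement(probes["EUB338"]) == probes["non-338"]
for name, seq in probes.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}")
```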
Processing on H&E imaging data
Image registration and annotation
Image processing and registration of barcoded spots were performed using SpoTteR26. H&E images (RGB) were downscaled to approximately 500 × 500 pixels. For efficient grid spot detection, tissues were masked from the images using quantile thresholding in the red channel. Centroids of spatial array spots were detected by computing the image Hessian. Centroid coordinates were used as probable grid points, and a rectangular grid was fitted to these points using a local optimizer (nlminb, R package stats (R version 3.6.3)). The fit was iterated, each time removing 10% of the probable spots that deviated from the ideal grid structure and refitting, until the target number of grid points per row (here, 35) and column (here, 33) was reached. Final grid points were overlaid on the previously masked tissue section to select only the spatial points lying under the detected tissue area; these points were used in further analysis.
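SpoTteR performs this fit in R with `nlminb`; the Python sketch below only illustrates the same idea of fitting a rectangular grid to candidate centroids while iteratively discarding the worst-fitting 10% of points. It assumes an axis-aligned (unrotated) grid and a rough initial spot pitch (`spacing_guess`, a hypothetical parameter), and it replaces the general-purpose optimizer with linear least squares, which is exact for this simplified model. The stopping rule is a fixed number of rounds rather than the target row and column counts.

```python
import numpy as np

def fit_grid(points, spacing_guess, n_iter=5, drop_frac=0.10):
    """Fit origin (x0, y0) and pitch (dx, dy) of an axis-aligned grid to noisy centroids.

    Each round assigns points to their nearest grid index, solves a linear
    least-squares problem, and drops the worst-fitting `drop_frac` of points.
    """
    pts = np.asarray(points, dtype=float)
    x0, y0 = pts[:, 0].min(), pts[:, 1].min()
    dx = dy = float(spacing_guess)
    for _ in range(n_iter):
        cols = np.rint((pts[:, 0] - x0) / dx)
        rows = np.rint((pts[:, 1] - y0) / dy)
        # x = x0 + col * dx and y = y0 + row * dy are linear in (x0, dx) and (y0, dy).
        ax = np.column_stack([np.ones_like(cols), cols])
        ay = np.column_stack([np.ones_like(rows), rows])
        (x0, dx), *_ = np.linalg.lstsq(ax, pts[:, 0], rcond=None)
        (y0, dy), *_ = np.linalg.lstsq(ay, pts[:, 1], rcond=None)
        resid = np.hypot(pts[:, 0] - (x0 + cols * dx), pts[:, 1] - (y0 + rows * dy))
        # Keep the best-fitting points for the next round.
        pts = pts[resid <= np.quantile(resid, 1.0 - drop_frac)]
    return x0, y0, dx, dy
```

Spurious detections fall far from any grid node, so they dominate the largest residuals and are removed in the first rounds.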
H&E images were annotated using a graphical cloud-based interface24 by manually assigning one or more morphological region tags to each spatial coordinate (x,y) resulting from the grid fitting process. The tags used were epithelium (E), epithelium and muscle and submucosa (ALL), epithelium and mucosae and submucosa (EMMSUB), epithelium and mucosae (EMM), muscle and submucosa (MSUB), crypt base (BASE), externa and interna (MEI), externa (ME), interna (MI), mucosae and interna (MMI), mucosa and pellet (MUPE), crypt mid (MID), crypt apex and mucosa (APEXMU), crypt apex and crypt mid (UPPERMID), Peyer’s patch (PP) and pellet (PE). E, EMMSUB, EMM, BASE, MEI, ME, MI, MUPE, MID, MMI, APEXMU, UPPERMID, PP and PE were visualized in tissue vector representations.
Processing of host reads
Raw reads processing and mapping of host reads
FASTQ files were generated with bcl2fastq2 (version 2.20.0), and reads were trimmed to remove adaptor sequences and the 16S surface probe sequence using BBDuk70 (version 38.33). ST Pipeline (version 1.7.6)29 was used to generate gene-by-barcode matrices. Quality-filtered reverse reads were mapped with STAR (version 2.6.0)71 to the mouse genome reference (GRCm38 primary assembly), and mitochondrial sequences were removed. Mapped reads were annotated using HTSeq-count (version 0.11.4)72 and the GENCODE mouse release M11 annotation (https://www.gencodegenes.org/mouse/release_M11.html). Annotated reads were demultiplexed with TagGD29,73 (version 0.3.6) using a Hamming distance clustering approach (k-mer 6, mismatches 2), which connected transcript information to spatial barcodes. Finally, UMIs were collapsed per transcript and spatial barcode with a naive clustering approach (mismatches 1) similar to that described in UMI-tools74.
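TagGD’s internals are not described here; the sketch below only illustrates the general idea of k-mer-seeded barcode assignment with a Hamming-distance cutoff (k = 6, at most two mismatches, matching the settings above). The positional k-mer index and the rule that ties are discarded as ambiguous are assumptions of this sketch, not documented TagGD behavior; the function names are hypothetical.

```python
from collections import defaultdict

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_kmer_index(barcodes, k=6):
    """Map each (offset, k-mer) pair to the set of barcodes containing it at that offset."""
    index = defaultdict(set)
    for bc in barcodes:
        for i in range(len(bc) - k + 1):
            index[(i, bc[i:i + k])].add(bc)
    return index

def assign_barcode(read_bc, index, k=6, max_mismatches=2):
    """Return the unique closest barcode within max_mismatches, or None if absent or ambiguous."""
    candidates = set()
    for i in range(len(read_bc) - k + 1):
        # Only barcodes sharing at least one exact k-mer are compared in full.
        candidates |= index.get((i, read_bc[i:i + k]), set())
    scored = sorted((hamming(read_bc, bc), bc) for bc in candidates)
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # two equally close barcodes: ambiguous
    return scored[0][1]
```

The k-mer seeding avoids comparing every read against every array barcode while still finding reads with a few errors.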
Processing of bacterial reads
Generation of gold standard mouse gut bacterial reference
FASTQ reads were generated with bcl2fastq2 and quality filtered using KneadData (version 0.7.4) (https://huttenhower.sph.harvard.edu/kneaddata/) (mouse database mouse_C57BL). MEGAHIT75 (version 1.2.9) was used to assemble the filtered reads, and bowtie2 (ref.76) (version 2.3.4.3) was used to map reads to the assembly. MetaBAT2 (ref.77) (version 2.15) was used to bin the assembly, and the command-line version of NCBI BLAST78 (version 2.9.0+) was used to assign taxonomy to contigs with blastn and the ‘nt’ database. MEGAHIT, bowtie2 and MetaBAT2 were all run with default settings. Assignments were filtered (E-value ≤ 10E−6) and sorted (by E-value and then percent identity), and each contig was assigned its top-ranked taxonomy. Contigs assigned to species-level taxa were retained at three abundance cutoffs (>0.1%, >0.05% and >0.01%, corresponding to 65, 121 and 419 species, respectively). For each cutoff, reference genomic sequences (complete genomes, chromosomes or scaffolds, depending on availability for these species) were downloaded from the NCBI RefSeq database31 (release 205), yielding one FASTA sequence database per cutoff covering the taxa found in SPF mice (n = 6); these were used as input to build custom databases in Kraken2 (version 2.0.9)30 following the Kraken2 default instructions, including masking of low-complexity regions. Reference genomes for six species were not found in the RefSeq database (Supplementary Table 5) and were not included in the FASTA sequence databases. The mouse gut bacterial references were also filtered for genera previously found in mice and/or the intestine79–81. A phylogenetic tree of the reference taxa was built using NCBI’s Common Tree and visualized using iTOL (version 6.4.3)82. For mouse gut tissue with a defined flora (ASF), genome sequences according to ref. 83 were downloaded from the NCBI and used to build a custom ASF database in Kraken2.
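The contig-level assignment rule (filter by E-value, rank by E-value and then percent identity, keep the top hit) and the abundance cutoffs can be sketched as below. Assumptions: BLAST hits have already been joined to species names, the E-value threshold is passed in explicitly, and species abundance is measured with a per-contig weight (for example, mapped read counts), which the text does not specify. Function names are hypothetical.

```python
from collections import Counter

def top_hits(hits, max_evalue):
    """Assign each contig the species of its best-ranked BLAST hit.

    hits: iterable of (contig_id, species, percent_identity, evalue) tuples.
    """
    best = {}
    for contig, species, pident, evalue in hits:
        if evalue > max_evalue:
            continue  # fails the E-value filter
        rank_key = (evalue, -pident)  # lower E-value first, then higher identity
        if contig not in best or rank_key < best[contig][0]:
            best[contig] = (rank_key, species)
    return {contig: species for contig, (_, species) in best.items()}

def species_above_cutoff(assignments, contig_weights, cutoff):
    """Species whose share of the total contig weight exceeds `cutoff` (a fraction, e.g. 0.001)."""
    totals = Counter()
    for contig, species in assignments.items():
        totals[species] += contig_weights.get(contig, 0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(s for s, w in totals.items() if w / grand_total > cutoff)
```

Lowering the cutoff from 0.1% to 0.01% enlarges the retained species set, which is how the 65-, 121- and 419-species references differ.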
Generation of simulated data
Two simulated datasets were generated based on the abundance of taxa using cutoffs of 0.1% and 0.01% (as described in the ‘Generation of gold standard mouse gut bacterial reference’ subsection). 16S rRNA FASTA sequences for the taxa found in SPF mice were downloaded from the NCBI (24 July 2021), except for two taxa whose 16S rRNA sequences were missing (Sodaliphilus pleomorphus and Anaerocolumna sedimenticola). Command-line NCBI BLAST78 (version 2.9.0+) was used to align every possible sequence variant of the degenerate 16S surface probe to the 16S rRNA sequences to find the best alignment of the probe per taxon. To mimic spatially captured reads, 2 million paired reads from a real SHM-seq experiment were used as a template for the FASTQ headers, sequences and quality scores of the forward read and for the FASTQ headers and quality scores of the reverse read. The sequences in Read 2 were replaced by 150-bp-long fragments of 16S rRNA sequences from randomly selected taxa. Fragments were created by selecting a region upstream of the best alignment of the 16S surface probe for each randomly selected taxon; each region was assigned a random length drawn from a normal distribution with parameters characteristic of a spatial array (400 ± 44 bp) and trimmed to 150 bp. This resulted in a simulated dataset of 2 million 16S rRNA gene sequences, drawn from the positions where the 16S surface probe was expected to capture and from the taxa in our mouse gut bacterial references, with known true taxa and both forward and reverse reads.
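The fragment-sampling step can be sketched as follows, under stated assumptions: each reference is given as a 16S sequence plus the start coordinate of the best probe alignment (`probe_start`, hypothetical), 400 ± 44 bp is read as mean ± s.d., fragments extend upstream of the probe site and are clipped at the sequence start, and Read 2 is assumed to begin at the probe-distal end of the fragment, so its first 150 bases are kept. Substituting the simulated sequences into template FASTQ records is omitted.

```python
import random

def simulate_fragment(seq_16s, probe_start, rng, mean_len=400.0, sd_len=44.0, read_len=150):
    """Draw one fragment ending at the probe site and return its first read_len bases."""
    frag_len = max(1, int(round(rng.gauss(mean_len, sd_len))))
    start = max(0, probe_start - frag_len)  # clip at the start of the 16S sequence
    fragment = seq_16s[start:probe_start]
    # Assumption: Read 2 starts at the probe-distal end of the fragment.
    return fragment[:read_len]

def simulate_reads(references, n_reads, seed=0):
    """references: dict taxon -> (16S sequence, probe alignment start). Returns (taxon, read) pairs."""
    rng = random.Random(seed)
    taxa = sorted(references)
    reads = []
    for _ in range(n_reads):
        taxon = rng.choice(taxa)  # uniformly random taxon, keeping the true label
        seq, probe_start = references[taxon]
        reads.append((taxon, simulate_fragment(seq, probe_start, rng)))
    return reads
```

Because the label of every simulated read is known, any classifier’s output can be scored exactly against it.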
Deep learning model: data pre-processing
A total of 500,000 DNA sequences were randomly selected from the simulated dataset based on a 0.1% abundance cutoff (described in the ‘Generation of simulated data’ subsection) and uniformly sampled, and single-point mutations were introduced at a 0.1% rate. Sequences were then randomly shortened based on a normal distribution of fragment lengths from a real SHM-seq experiment (143 ± 13 bp, truncated at 150 bp). Reads from each taxon in the mouse gut bacterial reference were represented at least 100 times per genus. Sequences were one-hot encoded, such that each nucleotide (A, C, T, G or N) was represented by a five-dimensional binary vector, and then padded to the maximum length (150 bp). Taxa labels were one-hot encoded into one of N genera. The encoded sequences and taxa labels were used as input for training the model.
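A minimal NumPy sketch of the input encoding, following the A, C, T, G, N channel order listed above and zero-padding to 150 positions. Mapping any unexpected character to N and sorting genus names to fix the label order are choices of this sketch, and the function names are hypothetical.

```python
import numpy as np

ALPHABET = "ACTGN"  # channel order as listed in the text
MAX_LEN = 150

def one_hot_encode(seqs, max_len=MAX_LEN):
    """Encode reads as an array of shape (n_reads, max_len, 5).

    Positions past the end of a read stay all-zero (padding), which a masking
    layer can skip. Characters outside ALPHABET are encoded as N.
    """
    lookup = {base: i for i, base in enumerate(ALPHABET)}
    out = np.zeros((len(seqs), max_len, len(ALPHABET)), dtype=np.float32)
    for r, seq in enumerate(seqs):
        for p, base in enumerate(seq[:max_len].upper()):
            out[r, p, lookup.get(base, lookup["N"])] = 1.0
    return out

def one_hot_labels(labels):
    """Map genus names to one-hot rows; returns (matrix, ordered list of genera)."""
    genera = sorted(set(labels))
    index = {g: i for i, g in enumerate(genera)}
    y = np.zeros((len(labels), len(genera)), dtype=np.float32)
    y[np.arange(len(labels)), [index[g] for g in labels]] = 1.0
    return y, genera
```

Padding with all-zero rows, rather than with an N channel, keeps padded positions distinguishable from genuinely ambiguous bases.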
Deep learning model: architecture
A taxonomic classifier of short reads was implemented using Keras84 with a TensorFlow85 back end (version 2.2.0) in Python (version 3.8.10) (Supplementary Fig. 4a). The model takes one-hot encoded DNA sequences of varying lengths as input and outputs a genus label. First, a masking layer was used to ignore padded entries, followed by four one-dimensional convolutional layers with kernel sizes of 15, 17, 19 and 23 to extract short motifs, a concatenation and a dropout (50% rate) module and two bidirectional long short-term memory network layers, which processed the sequences in both directions. These were followed by another dropout layer (20% rate), a dense layer (ReLU activation), a dropout layer (10% rate), another dense layer (ReLU activation) and, finally, a fully connected layer (softmax activation) that reduces the output size to the number of distinct genera in the input data. In total, the model had 298,760 trainable parameters. The multi-class classifier was trained with cross-entropy loss and the Adam optimization algorithm86. The model architecture was visualized using Netron87.
Deep learning model: training details
Model parameters were optimized by using 80% of sequences for training and 20% for testing. Each epoch started with shuffling the training data and computing the gradient update once for each training data point to obtain unbiased gradient estimates88. During training, categorical accuracy and cross-entropy loss were used to monitor progress. Training was terminated after a maximum of 15 epochs or when the training loss did not decrease in five consecutive epochs. The area under the receiver operating characteristic (ROC) curve and the F1 score were calculated using Scikit-learn (version 0.24.2)89 and used to report the final performance on test data.
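The stopping rule can be written out as below, reading ‘did not decrease in five consecutive epochs’ as five epochs without improving on the best loss so far; this interpretation is an assumption. In Keras, similar behavior is usually obtained with `tf.keras.callbacks.EarlyStopping(monitor="loss", patience=5)` together with `epochs=15`, but the original training script is not shown, and `stopping_epoch` is a hypothetical helper.

```python
def stopping_epoch(losses, max_epochs=15, patience=5):
    """Number of epochs run before stopping, given per-epoch training losses.

    Training stops after max_epochs, or once the loss has failed to improve on
    its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(losses), max_epochs)
```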
Deep learning model: evaluation
One million simulated sequences with corresponding taxa (as in the ‘Generation of simulated data’ subsection) were modified with a sequencing error rate of 1%90 and randomly shortened as described above. Sequences were classified either by Kraken2 alone or by Kraken2 followed by the deep learning model. Performance was evaluated against the ground truth taxa labels by calculating Bray–Curtis dissimilarities and Pearson correlation coefficients of the bacterial relative abundances per spot using Scipy (version 1.1.0)91 spatial.distance.braycurtis and stats.pearsonr, respectively. Higher similarity between the classified and ground truth relative abundances corresponds to lower Bray–Curtis dissimilarity and higher Pearson correlation. Accuracy and F1 score were calculated on the whole dataset using Scikit-learn (version 0.24.2)89 metrics.classification_report.
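The per-spot comparison metrics can be sketched with NumPy alone; the formulas match what `scipy.spatial.distance.braycurtis` and the correlation coefficient returned by `scipy.stats.pearsonr` compute. Converting counts to relative abundances before comparison follows the description above; the helper names are hypothetical.

```python
import numpy as np

def relative_abundance(counts):
    """Normalize a vector of per-genus counts to relative abundances."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u - v| / sum|u + v|."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / np.abs(u + v).sum())

def pearson(u, v):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(u, v)[0, 1])

def compare_spot(predicted_counts, true_counts):
    """Compare one spot's classified genus counts with its ground-truth counts."""
    p, t = relative_abundance(predicted_counts), relative_abundance(true_counts)
    return bray_curtis(p, t), pearson(p, t)
```

A perfect classification gives a dissimilarity of 0 and a correlation of 1 for every spot.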
Comparison of taxonomy assignments
To compare how well Kraken2 performs with RefSeq databases of different types (whole genome versus 16S rRNA) and sizes (restricted versus unrestricted), taxonomy assignments were made by the taxonomy assignment pipeline without the deep learning model (as described in the ‘Raw reads processing and mapping of bacterial data’ subsection). The four databases compared were: the RefSeq Bacteria whole genome database (downloaded from the Kraken2 GitHub, version 2.1.2), supplemented with the whole genomes of all eight ASF species (ref.83) (‘RefSeq whole genomes’); the custom gold standard restricted whole genome database (‘65 species whole genome’, as described in the ‘Generation of gold standard mouse gut bacterial reference’ subsection); the RefSeq Bacteria 16S rRNA database, derived from RefSeq bacterial taxa with 16S rRNA sequences available in the NCBI (‘RefSeq 16S rRNA’, ~3,000 taxa, downloaded on 24 July 2021); and the RefSeq 16S rRNA database restricted to the 65 species in the gold standard restricted whole genome database (‘65 species 16S rRNA’). To compare the impact of read length, simulated datasets were prepared as described in the ‘Generation of simulated data’ subsection using the 0.1% cutoff but with a longer length distribution (650 ± 44 bp), trimmed to 150 bp, 300 bp, 450 bp and 600 bp.
Raw reads processing and mapping of bacterial data
FASTQ reads were generated with bcl2fastq2 and trimmed to remove adaptor sequences using BBDuk70. Trimmed reads were quality filtered using the same quality-filtering step as in the ST Pipeline (version 1.7.6)29, but only reads longer than 100 nt were kept. TagGD73 was used to connect the spatial barcode to each forward read (k-mer 6, mismatches 2, Hamming distance clustering algorithm), and BWA-MEM (version 0.7.17)92 with the mouse genome reference (GRCm39) was used to remove sequences mapping to the host. The remaining reverse reads were mapped to the mouse gut bacterial reference (created as described in the ‘Generation of gold standard mouse gut bacterial reference’ subsection) using Kraken2 (version 2.0.9)30 (confidence 0.01). Reads originating from GF and SPF mice were mapped to the mouse gut bacterial reference, whereas reads originating from ASF mice were mapped to the ASF reference. Taxonomy assignments made by Kraken2 were then refined using the deep learning model. UMIs with identical spatial barcodes and taxonomic assignments were collapsed using UMI-tools (version 1.0.0)74 (UMIClusterer, threshold 1), resulting in a bacteria-by-barcode matrix.
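The final collapsing step turns (spatial barcode, genus, UMI) triples into molecule counts. The sketch below uses a simple greedy rule, in which the most abundant remaining UMI absorbs every UMI within one mismatch; this is a simplification for illustration, not a reimplementation of UMI-tools, and the input format and function names are assumptions.

```python
from collections import Counter, defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))

def count_umi_clusters(umi_counts, threshold=1):
    """Greedy clustering: the most abundant unassigned UMI absorbs all UMIs within `threshold`."""
    ordered = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    assigned = set()
    n_clusters = 0
    for center in ordered:
        if center in assigned:
            continue
        n_clusters += 1
        assigned.add(center)
        for other in ordered:
            if other not in assigned and hamming(center, other) <= threshold:
                assigned.add(other)
    return n_clusters

def bacteria_by_barcode(assigned_reads, threshold=1):
    """assigned_reads: iterable of (spatial_barcode, genus, umi). Returns {(barcode, genus): molecules}."""
    groups = defaultdict(Counter)
    for barcode, genus, umi in assigned_reads:
        groups[(barcode, genus)][umi] += 1
    return {key: count_umi_clusters(umis, threshold) for key, umis in groups.items()}
```

Each resulting (barcode, genus) count is one entry of the bacteria-by-barcode matrix.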
Analysis of bacterial validation data
Spatial analysis of bacterial fluorescence
Bacterial presence in scanned fluorescence images was detected using ilastik (version 1.3.3)93. After ilastik was trained and tested separately on each bacterial fluorescence image, the resulting bacterial detection mask was aligned with the fluorescence image to compute the mean fluorescence intensity per spatial coordinate, which was stored as a matrix. This matrix was then analyzed with Splotch (as described in the ‘Hierarchical probabilistic modeling using Splotch’ subsection). The resulting normalized fluorescence intensity was compared to the normalized bacterial presence as follows: at most three spatial coordinates were randomly selected from each annotated region per sample (considering only annotated regions shared between the normalized fluorescence intensity and normalized bacterial presence data), scaled within each sample and matched to a spatial coordinate in the same region, and the normalized fluorescence intensity and normalized bacterial presence were then compared per spatial coordinate. To limit the region annotated as pellet, pellet spatial coordinates were selected only if they were spatially adjacent to coordinates annotated as mouse tissue. This procedure was repeated 1,000 times to obtain an average spatial correlation between normalized bacterial FISH intensity and normalized sequenced bacterial presence, expressed as a Spearman correlation.
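The resampling scheme can be sketched as follows for a single sample, assuming the FISH and sequencing values have already been matched spot to spot and labeled by region. Within-sample scaling is omitted because a monotone rescaling does not change a rank correlation inside one sample (it matters only when several samples are pooled), and the pellet-adjacency filter is not modeled. Ranks ignore ties, which is adequate for continuous intensities. All names are hypothetical.

```python
import numpy as np

def rank(x):
    """Ranks 0..n-1 (ties broken by position)."""
    x = np.asarray(x)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="stable")] = np.arange(len(x))
    return ranks

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def resampled_spearman(fish, seq, regions, n_rep=1000, per_region=3, seed=0):
    """Mean Spearman correlation over repeated draws of at most per_region spots per region."""
    rng = np.random.default_rng(seed)
    fish, seq, regions = np.asarray(fish), np.asarray(seq), np.asarray(regions)
    by_region = [np.flatnonzero(regions == r) for r in np.unique(regions)]
    rhos = []
    for _ in range(n_rep):
        # Balanced draw: no region can dominate the correlation.
        idx = np.concatenate([
            rng.choice(members, size=min(per_region, len(members)), replace=False)
            for members in by_region
        ])
        rhos.append(spearman(fish[idx], seq[idx]))
    return float(np.mean(rhos))
```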
16S surface probe sensitivity
To evaluate 16S surface probe sensitivity, reference DNA sequence and gene annotation files were downloaded from Ensembl Bacteria94 for the ASF bacteria available in the database (version 104.1) (ASF356, ASF360, ASF457, ASF492, ASF500 and ASF519 (taxonomy ID 1235789)). Reads captured from ASF tissue sections on a spatial transcriptomics QC array with only 16S surface probes on the array surface were separately mapped against each ASF bacteria genome using BWA-MEM (version 0.7.17)92. Gene body coverage over the 16S rRNA genes in respective reference genomes was generated using RSeQC (version 4.0.0)95. Genome binning was done by summarizing the aligned reads in separate bins, each bin representing a hundredth of the respective ASF genome.
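A sketch of the binning step, assuming 0-based alignment start positions have already been extracted from the BWA-MEM output. A second helper reports the fraction of reads starting inside given intervals, such as annotated 16S rRNA gene coordinates; both helpers are illustrative.

```python
import numpy as np

def bin_coverage(read_starts, genome_length, n_bins=100):
    """Count aligned reads per bin, each bin covering 1/n_bins of the genome."""
    starts = np.asarray(read_starts, dtype=np.int64)
    bins = np.minimum(starts * n_bins // genome_length, n_bins - 1)
    return np.bincount(bins, minlength=n_bins)

def fraction_in_intervals(read_starts, intervals):
    """Fraction of reads whose start lies in any [start, end) interval."""
    starts = np.asarray(read_starts)
    hit = np.zeros(len(starts), dtype=bool)
    for s, e in intervals:
        hit |= (starts >= s) & (starts < e)
    return float(hit.mean()) if len(starts) else 0.0
```

A probe that captures 16S rRNA specifically should concentrate reads in the few bins that contain the rRNA operons.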
16S surface probe specificity
Specificity was first evaluated by the proportion of reads aligning to bacteria versus mouse. Tissue sections from SPF, ASF and GF mice were placed on QC arrays with 16S surface probes, and finished libraries were prepared using either bacterial treatment or colon treatment. Each finished library was sequenced to approximately 660,000 reads. Reads were taxonomically annotated using the taxonomy assignment pipeline without the deep learning model. The proportion of reads mapping to the respective bacterial reference (mouse gut bacterial reference for SPF and GF tissue samples and ASF reference for ASF tissue samples) was calculated relative to the number of trimmed reads.
Protocol specificity was also evaluated by comparing the bacterial treatment with a mechanical treatment (see the ‘Mechanical extraction of bacterial RNA’ subsection). Spearman rank and Pearson correlation coefficients were calculated using Scipy’s (version 1.1.0)91 stats.spearmanr and stats.pearsonr.
Bacterial treatment was compared to a bulk 16S rRNA sequencing dataset62 where the 16S libraries were made from material originating from feces of C57BL/6J mice (Sequence Read Archive (SRA) sample references: SRR9212951, SRR9213178 and SRR9213335). The correlation was calculated using Scipy’s (version 1.1.0) Pearson correlation coefficient91.
Comparison of the taxonomy assignment pipeline with QIIME 2
The taxonomy assignment pipeline (as described in the ‘Raw reads processing and mapping of bacterial data’ subsection) was compared to QIIME 2 (ref.32) (version 2022.2) by using the simulated dataset (generated as described in the ‘Generation of simulated data’ subsection). QIIME 2 was run with default settings for single-end sequences, and the Silva 138 99% OTUs full-length sequences classifier was used for taxonomic profiling.
Effect of bacterial treatment on mouse gene expression
To evaluate the effect of the bacterial treatment on measured host (mouse) gene expression, we normalized96 gene counts from samples with and without bacterial treatment (reads downsampled to the same saturation levels) and from samples prepared on a spatial array with customized surface or a standard spatial array (reads downsampled to the same saturation levels). Pearson correlation coefficient was calculated using Scipy’s (version 1.1.0)91 stats.pearsonr.
Spatial modeling and visualization of host–microbiome data
Hierarchical probabilistic modeling using Splotch
Splotch36,37 was used for statistical analysis of spatial data. Splotch is a hierarchical probabilistic model that captures variation in spatial transcriptomics data through three components: study design covariates, such as an individual’s age or mouse condition ($\beta$); spatial variation in array data, modeled with a conditional autoregressive (CAR) prior ($\psi$); and gene expression variation in each independent spatial measurement ($\epsilon$), which accounts for technical artifacts. These components are combined in a linear model. Sequencing depth is accounted for by a size factor $s$, computed as the total number of captured UMI counts per spatial spot divided by the median UMI count across all analyzed spots. The posterior distribution of the parameters is then interrogated, for example, with the model conditioned on bacterial presence in the tissues, to quantify expression changes across both mouse conditions and tissue contexts.
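The size factor described above is a simple per-spot ratio; a minimal NumPy sketch, assuming a genes × spots matrix of raw UMI counts (the function name is hypothetical):

```python
import numpy as np

def size_factors(umi_counts):
    """Per-spot size factors: total UMIs per spot divided by the median total across spots."""
    totals = np.asarray(umi_counts, dtype=float).sum(axis=0)  # sum over genes
    return totals / np.median(totals)
```

A spot with a size factor of 2 was sequenced twice as deeply as the median spot, so its expected counts are doubled for the same underlying expression rate.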
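Genes ($i$), tissue sections ($j$) and independent spatial spots ($k$) were indexed as follows. Gene expression in each spot is approximated from the observed counts $y_{i,j,k}$, which are expected to equal $s_{j,k}\lambda_{i,j,k}$ (ignoring dropouts), where $s_{j,k}$ is the size factor (based on the total number of UMIs observed at spot $k$ of tissue section $j$) and $\lambda_{i,j,k}$ is the rate of gene expression (referred to as normalized counts throughout). Splotch then models the observed counts using the zero-inflated Poisson (ZIP) distribution:

$$y_{i,j,k} \sim \mathrm{ZIP}\left(\theta_i,\; s_{j,k}\lambda_{i,j,k}\right) \tag{1}$$

where $\theta_i$ represents the gene-specific probability of a dropout. Zero-inflated models account for an overabundance of zeros by introducing a second zero-generating process gated by a Bernoulli random variable:

$$P\left(y_{i,j,k} = n\right) = \begin{cases} \theta_i + (1-\theta_i)\,\mathrm{Poisson}\left(0 \mid s_{j,k}\lambda_{i,j,k}\right), & n = 0,\\ (1-\theta_i)\,\mathrm{Poisson}\left(n \mid s_{j,k}\lambda_{i,j,k}\right), & n > 0, \end{cases} \tag{2}$$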
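where the Poisson process can be replaced by a negative binomial (NB) distribution without loss of generality. The gene expression rate parameter $\lambda_{i,j,k}$ is described in terms of a generalized linear model (GLM) with three components:

$$\log \lambda_{i,j,k} = \beta_{i,j,k} + \psi_{i,j,k} + \epsilon_{i,j,k} \tag{3}$$

where $\beta_{i,j,k}$ is the characteristic expression of gene $i$ within the context of spot $k$, taken from a characteristic expression vector $\beta_i$ according to the MROI that spot $k$ comes from. At the top level, the dataset is split along an important covariate (for example, presence of bacteria), and a separate $\beta_{i,l_1}$ is modeled for each unique group ($l_1 \in \{1, \dots, L_1\}$). At the next level, each set $\beta_{i,l_1}$ is further partitioned along another covariate (for example, animal individual). A two-level hierarchical model for $\beta_i$ can, thus, be specified as:

$$\beta_{i,l_1} \sim \mathcal{N}\left(\mu_\beta, \sigma_\beta^2\right), \qquad \beta_{i,l_1,l_2} \sim \mathcal{N}\left(\beta_{i,l_1}, \sigma_{l_2}^2\right) \tag{4}$$

where, in practice, the hyperparameters $\mu_\beta$ and $\sigma_\beta$ are fixed for all $l_1$, and posteriors are inferred over all $\beta_{i,l_1,l_2}$. For convenience, because each tissue section belongs to one covariate group at each level, an inverse mapping function $d(j)$ is introduced that maps tissue section $j$ to the appropriate level indices ($l_1$, $l_2$) for $\beta_i$. With this in hand, $\beta_{i,j,k}$ is formally defined in the non-compositional model as:

$$\beta_{i,j,k} = \beta_{i,d(j)}^{\top} A_{j,k} \tag{5}$$

where $A_{j,k}$ is a one-hot encoding of the MROI annotation of spot $k$, used to index the relevant entry in the characteristic expression vector $\beta_{i,d(j)}$.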
$\psi_{i,j,k}$ describes how the immediate local neighborhood of spot $k$ affects gene $i$ and is modeled using the CAR prior. The observations in each spatial spot are assumed to depend on the spot’s immediate spatial neighborhood, defined as its four nearest neighbors. The vector $\psi_{i,j} = (\psi_{i,j,k})_k$ is defined as a Markov random field over the spots in each array:

$$\psi_{i,j} \sim \mathcal{N}\left(\mathbf{0},\; \left[\tau_i\left(D_j - \alpha_i W_j\right)\right]^{-1}\right) \tag{6}$$

where $\alpha_i$ is a spatial autocorrelation parameter; $\tau_i$ is a conditional precision parameter; $D_j$ is a diagonal matrix containing the number of neighbors of each spot in tissue section $j$; and $W_j$ is the adjacency matrix (with zero diagonal).

$\epsilon_{i,j,k}$ captures variation at the level of individual spots, assuming that spots are independently and identically distributed (i.i.d.), to infer their standard deviations:

$$\epsilon_{i,j,k} \sim \mathcal{N}\left(0, \sigma_i^2\right) \tag{7}$$

where $\sigma_i$ is the inferred level of variability for gene $i$.
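To make the model components concrete, the sketch below builds the four-nearest-neighbor adjacency matrix $W_j$ for a rectangular array, forms the CAR precision $\tau_i(D_j - \alpha_i W_j)$, draws one $\psi$ vector from it and evaluates the ZIP log-probability of equation (2). It is a standalone NumPy illustration, not Splotch’s implementation; the function names are hypothetical and any parameter values used with it are arbitrary.

```python
import numpy as np
from math import exp, lgamma, log

def grid_adjacency(n_rows, n_cols):
    """Adjacency matrix W (zero diagonal) linking each spot to its four nearest neighbors."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c, rr * n_cols + cc] = 1.0
    return W

def car_precision(W, alpha, tau):
    """CAR precision tau * (D - alpha * W), with D = diag(number of neighbors)."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - alpha * W)

def sample_car(Q, rng):
    """Draw psi ~ N(0, Q^{-1}) via the Cholesky factor Q = L L^T."""
    L = np.linalg.cholesky(Q)
    z = rng.standard_normal(Q.shape[0])
    return np.linalg.solve(L.T, z)  # Cov = L^{-T} L^{-1} = Q^{-1}

def zip_logpmf(y, theta, rate):
    """log P(y) under ZIP(theta, rate), where rate = s_{j,k} * lambda_{i,j,k}."""
    if y == 0:
        return log(theta + (1.0 - theta) * exp(-rate))
    return log(1.0 - theta) + y * log(rate) - rate - lgamma(y + 1)
```

For $|\alpha_i| < 1$ the precision matrix is positive definite, so the Cholesky factorization exists; $\alpha_i$ close to 1 produces strongly smoothed $\psi$ fields.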
Data were processed with a two-level model when describing differences in mouse model/condition and morphological region (SPF versus GF mice) or with a one-level model for the ASF mouse analysis. Input data were raw UMI counts (as described above). Posterior sampling was performed by running four independent chains with 200 iterations per chain (100 warmup and 100 sampling). The model was conditioned on 10,924 spots, 16 morphological region tags and two mouse conditions (SPF versus GF) (two-level model) or on 4,397 spots, five morphological region tags and one mouse condition (ASF) (one-level model).
For differential expression analysis, each pairwise comparison of gene expression was denoted as a random variable Δβ that describes the difference between two conditions as β1 − β2. β1 and β2 represent any two conditions arising from any two combinations in the model—for example, any two genes, sample covariates (for example, mouse condition; SPF versus GF) or MROIs regions (for example, crypt apex and mucosa versus crypt base). The null hypothesis presumes that the two posterior distributions over characteristic expression coefficients β1 and β2 estimated by the model are identical and that Δβ is tightly centered around zero. To quantify this similarity, Δβ| (where is the training data) is compared to the prior distribution Δβ using the Savage–Dickey density ratio97 that estimates the Bayes factor (BF) between the conditions:
$$\mathrm{BF} = \frac{p(\Delta\beta = 0)}{p(\Delta\beta = 0 \mid \mathcal{D})} \tag{8}$$
where both probability density functions are evaluated at zero. If expression differs between the two conditions, the posterior $p(\Delta\beta \mid \mathcal{D})$ will not be centered around zero and the estimated BF will be large; the null hypothesis is then rejected, and the gene is denoted as differentially expressed between the two conditions. Hereafter, the Savage–Dickey density ratio is referred to as BF. Upregulated genes (Δβ > 0) with log(BF) > 0.5 were considered differentially expressed between any two conditions and were used in all downstream analyses. Bacterial genera were called as detected in SPF tissue if their weighted mean count per morphological region was greater than the maximal weighted mean count in the corresponding morphological region in GF mice; in addition, the total regional count had to account for more than 2% of the total bacterial count.
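A minimal Python sketch of the Savage–Dickey ratio in equation (8), estimating the posterior density of Δβ at zero from MCMC samples with a Gaussian kernel density estimate (`scipy.stats.gaussian_kde`). For illustration only, it assumes that the prior on Δβ is a zero-mean normal with known standard deviation `prior_sd` (for example, if β1 and β2 have independent N(0, s²) priors, then Δβ ~ N(0, 2s²)), that "Δβ > 0" means a positive posterior mean, and that log is the natural logarithm; these choices and the helper names are assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm


def savage_dickey_bf(delta_samples: np.ndarray, prior_sd: float) -> float:
    """Bayes factor for Δβ != 0 versus Δβ = 0: prior density at 0 / posterior density at 0."""
    posterior_at_zero = gaussian_kde(delta_samples)(0.0)[0]
    # Floor the KDE value so that a posterior far from zero gives a huge but finite BF.
    posterior_at_zero = max(posterior_at_zero, np.finfo(float).tiny)
    # Assumed prior: Δβ ~ N(0, prior_sd^2).
    prior_at_zero = norm.pdf(0.0, loc=0.0, scale=prior_sd)
    return prior_at_zero / posterior_at_zero


def is_upregulated(beta1_samples, beta2_samples, prior_sd, log_bf_threshold=0.5):
    """Call a gene upregulated when Δβ = β1 - β2 has a positive posterior mean
    and log(BF) exceeds the threshold."""
    delta = np.asarray(beta1_samples) - np.asarray(beta2_samples)  # paired posterior draws
    log_bf = np.log(savage_dickey_bf(delta, prior_sd))
    return bool(delta.mean() > 0 and log_bf > log_bf_threshold)
```

The kernel density estimate is a simple, general way to evaluate a posterior density at a single point from samples; with few posterior draws (here 4 × 100), the estimate at zero is noisy, so the threshold on log(BF) should be interpreted with that in mind.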
Visualizing expression and abundances with rasters
To enable spatial data visualization across sections and conditions, a rasterized tissue representation of canonical tissue architecture of the mid part of the colonic tube was created as scalable vector graphics (svg) and annotated with MROI information. Tissue vectors captured the two most common tissue architectures observed in this study (a zoomed-out view of major MROIs (E, EMM, ME, MEI, MI, MMI, PP, MUPE and P) and a zoomed-in view of minor MROIs (APEXMU, BASE, MID, UPPERMID, EMM, EMMSUB, ME, MEI, MI, MMI, PP, MUPE and P)) and used only for visualizations. matplotlib98 was used to automatically plot averaged host gene or bacterial expression from all spatial spots corresponding to each MROI and condition as annotated in the svg files.
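The per-MROI averaging behind these raster plots can be sketched as follows: spot-level values are averaged per (condition, MROI) pair and mapped onto a matplotlib colormap, yielding one fill colour per annotated svg region. The input layout (a list of `(condition, mroi, value)` tuples), the choice of the viridis colormap and the function names are assumptions for illustration; writing the colours into the svg files is not shown.

```python
from collections import defaultdict

from matplotlib import colormaps
from matplotlib.colors import Normalize, to_hex


def mroi_means(spots):
    """Average a per-spot value (host gene expression or bacterial abundance)
    over all spots sharing the same (condition, MROI) annotation."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for condition, mroi, value in spots:
        sums[(condition, mroi)] += value
        counts[(condition, mroi)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def mroi_fill_colours(means, cmap_name="viridis"):
    """Map each (condition, MROI) mean onto one shared colour scale, returned as hex fills."""
    norm = Normalize(vmin=min(means.values()), vmax=max(means.values()))
    cmap = colormaps[cmap_name]
    return {key: to_hex(cmap(norm(value))) for key, value in means.items()}


spots = [("SPF", "E", 1.0), ("SPF", "E", 3.0), ("SPF", "ME", 6.0), ("GF", "E", 0.0)]
fills = mroi_fill_colours(mroi_means(spots))
```

Using a single normalization across conditions keeps SPF and GF rasters directly comparable, at the cost of compressing the colour range for the condition with lower values.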
Host gene expression mapped using cell type signatures
snRNA-seq data processing
Mouse colon snRNA-seq data were obtained from ref. 99, containing 340,461 individual cell profiles across 22,986 expressed genes. In brief, nucleus profiles with >800 genes expressed in a minimum of 10 cells and <30% mitochondrial or rRNA signatures were retained for analysis. Raw counts were normalized to transcripts per 10,000 (TP10K). To select highly variable genes, the mean and the coefficient of variation (CV) of expression of each gene were calculated, and genes were partitioned into 20 equal-frequency bins. LOESS regression was used to estimate the relationship between log(CV) and log(mean), and the 1,500 genes with the highest residuals were sampled equally across these bins. To account for batch differences, this was performed for each sample separately, and a consensus list of the 1,500 genes with the greatest recovery rates was selected. Next, within Scanpy100, Harmony101 was used for further batch correction with 20 neighbors and 40 principal components. Convergence was reached after 10 iterations, and the resulting data were clustered with PhenoGraph102 using 25 nearest neighbors and the Minkowski metric. Cell type labels provided in ref. 99 were used to manually label clusters after PhenoGraph clustering.
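The variable-gene selection step can be sketched in Python as below. Assumptions: genes are binned by their mean expression, the LOESS fit is replaced by a minimal tricube-weighted local linear smoother written out for transparency (not the implementation used by the authors), the per-sample runs and the consensus step are omitted, and all names are hypothetical.

```python
import numpy as np


def tp10k(counts: np.ndarray) -> np.ndarray:
    """Normalize a cells x genes count matrix to transcripts per 10,000."""
    return counts / counts.sum(axis=1, keepdims=True) * 1e4


def local_linear_smooth(x: np.ndarray, y: np.ndarray, frac: float = 0.3) -> np.ndarray:
    """LOESS-style fit: tricube-weighted linear regression on the nearest frac*n points."""
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]
        h = dist[idx].max() or 1.0
        w = np.sqrt((1 - (dist[idx] / h) ** 3) ** 3)  # square root of tricube weights
        A = np.column_stack([np.ones(k), x[idx] - x[i]])
        coef, *_ = np.linalg.lstsq(A * w[:, None], y[idx] * w, rcond=None)
        fitted[i] = coef[0]  # intercept = fitted value at x[i]
    return fitted


def select_variable_genes(norm: np.ndarray, n_top: int = 1500, n_bins: int = 20) -> np.ndarray:
    """Indices of genes with the largest log(CV) residuals, sampled equally across
    equal-frequency bins of mean expression."""
    mean = norm.mean(axis=0)
    sd = norm.std(axis=0)
    genes = np.flatnonzero((mean > 0) & (sd > 0))  # log(CV) is undefined otherwise
    log_mean = np.log(mean[genes])
    log_cv = np.log(sd[genes] / mean[genes])
    residual = log_cv - local_linear_smooth(log_mean, log_cv)
    per_bin = n_top // n_bins
    selected = []
    for bin_members in np.array_split(np.argsort(log_mean), n_bins):
        top = bin_members[np.argsort(residual[bin_members])[::-1][:per_bin]]
        selected.extend(genes[top])
    return np.array(selected)
```

Sampling equally across mean-expression bins prevents the selection from being dominated by lowly or highly expressed genes, whose CV residuals tend to have different spreads. The quadratic-time smoother is only meant for small illustrations.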
Spatial co-expression analysis and definition of modules
All posterior estimates that account for both morphological differences and differences in mouse condition were used as λi,j,k in a sparse matrix format (Nspots × Ngenes), where Nspots = 5,413 and Ngenes = 17,956. The snRNA-seq normalized counts and the SHM-seq posterior mean count tables were standardized separately across cells and spots, respectively, within each gene, considering only the genes common to both datasets (Ncommon genes = 16,525), resulting in two standardized matrices. Finally, the similarity of each cell to each spot was calculated as the Pearson correlation coefficient r between the cell's standardized and imputed expression vector (columns of the standardized snRNA-seq matrix) and the spots' expression vectors (columns of the standardized SHM-seq matrix), resulting in cell-specific similarity vectors. Spots from all morphological region categories except PE and MUPE were used. To find sets of co-expressed genes (that is, genes with similar spatial patterns across spots), the data P were hierarchically clustered with the average linkage method using the L1 norm (Manhattan distance), with a distance threshold set to detect 28 distinct blocks (subsets of genes co-expressed across subsets of spots, hereafter spatial modules) using scipy.cluster.hierarchy.fcluster.
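A Python sketch of the two computations in this paragraph, under stated assumptions: (i) the cell-to-spot Pearson similarity over shared genes after per-gene standardization, and (ii) average-linkage clustering of genes on their standardized spatial profiles with the L1 (cityblock) metric. Here the number of modules is requested directly with `criterion="maxclust"` rather than by tuning a distance threshold, and clustering genes on the standardized spot matrix is only one reading of "the data P"; the function names and toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def zscore_within_genes(M: np.ndarray) -> np.ndarray:
    """Standardize each gene (column) across observations (rows: cells or spots)."""
    mu = M.mean(axis=0)
    sd = M.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by zero
    return (M - mu) / sd


def cell_spot_similarity(Zcells: np.ndarray, Zspots: np.ndarray) -> np.ndarray:
    """Pearson r between every cell and every spot over shared genes.
    Zcells: cells x genes, Zspots: spots x genes (same gene order). Returns cells x spots."""
    A = Zcells - Zcells.mean(axis=1, keepdims=True)
    B = Zspots - Zspots.mean(axis=1, keepdims=True)
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def spatial_modules(Zspots: np.ndarray, n_modules: int) -> np.ndarray:
    """Cluster genes (columns of Zspots) by their spatial profiles using average
    linkage and the L1 (cityblock) distance; returns one module label per gene."""
    Z = linkage(Zspots.T, method="average", metric="cityblock")
    return fcluster(Z, t=n_modules, criterion="maxclust")
```

With `criterion="distance"` and a chosen threshold `t`, `fcluster` instead cuts the tree at a fixed height, which is closer to the thresholding described in the text.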
Using snRNA-seq profiles to partition modules to submodules
Submatrices were created containing the expression of the genes belonging to each spatial co-expression module. To identify which specific cell types underlie expression in each spatial module or submodule, mean expression values were calculated for each gene across the single-cell profiles in each of 30 snRNA-seq clusters (as described in the 'snRNA-seq data processing' subsection) and scaled by dividing each gene's mean expression per cluster by its maximum mean expression across cell type clusters. Genes with an average scaled expression lower than 1 were removed (scaled expressions set to 0). Then, to estimate cell type compositions in each spatial module (or submodule), the expression profiles in each spatial module for the subset of genes from the 30 filtered and averaged snRNA-seq cell type (cluster) signatures were hierarchically subclustered within each spatial module using cosine distance and average linkage. The genes of each module were then grouped into submodules using 0.4× the maximum distance in the linkage matrix as the cutoff. Next, a two-sided Wilcoxon signed-rank test (followed by Benjamini–Hochberg false discovery rate (FDR) correction) was used to compare the enrichment of cell type (cluster) signatures in the co-expression submodules in a one-versus-rest fashion. The cell types used in the enrichment analysis were: neurons, transit amplifying cells (TAs), cycling TAs, myocytes, goblet cells, colonocytes, fibroblasts, glia, lymphatic cells, macrophages, enteroendocrine cells, mesothelial cells, stem cells, T cells, tuft cells, B cells and vascular cells.
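The submodule procedure can be sketched in Python as below: cluster-mean signatures scaled by each gene's maximum, cosine/average-linkage subclustering cut at 0.4× the largest merge distance, and a one-versus-rest Wilcoxon signed-rank test with Benjamini–Hochberg correction. Pairing each cell type's scaled signature with the mean of all other cell types gene by gene is an assumption about how the one-versus-rest comparison was set up; the expression filter described above is omitted, and all names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import wilcoxon


def scaled_signatures(expr, labels):
    """Per-cluster mean expression (clusters x genes), each gene scaled by its maximum across clusters."""
    clusters = np.unique(labels)
    means = np.vstack([expr[labels == c].mean(axis=0) for c in clusters])
    peak = means.max(axis=0)
    peak[peak == 0] = 1.0  # leave all-zero genes at zero
    return clusters, means / peak


def split_submodules(module_sig, frac=0.4):
    """Cluster a module's genes (columns) by cosine distance with average linkage and
    cut the tree at frac times the largest merge distance."""
    Z = linkage(module_sig.T, method="average", metric="cosine")
    return fcluster(Z, t=frac * Z[:, 2].max(), criterion="distance")


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / np.arange(1, len(p) + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    q = np.empty_like(p)
    q[order] = np.minimum(ranked, 1.0)
    return q


def one_vs_rest(submodule_sig):
    """Two-sided Wilcoxon signed-rank test of each cell type's signature against the
    mean of all other cell types, paired by gene; returns BH-adjusted p-values."""
    pvals = []
    for c in range(submodule_sig.shape[0]):
        rest = np.delete(submodule_sig, c, axis=0).mean(axis=0)
        pvals.append(wilcoxon(submodule_sig[c], rest).pvalue)
    return bh_adjust(pvals)
```

Cosine distance compares the shape of each gene's profile across cell types rather than its magnitude, so genes driven by the same cell type fall into the same submodule even when their absolute levels differ.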
KEGG pathway enrichment
KEGG database103 gene sets were tested for enrichment in each cell-type-specific submodule with a one-tailed Fisher exact test followed by Benjamini–Hochberg FDR correction. KEGG pathways with FDR < 0.05 were visualized.
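A minimal sketch of the pathway test, assuming a 2 × 2 contingency table per KEGG pathway (submodule genes versus the remaining genes of a chosen background universe, inside versus outside the pathway) and `scipy.stats.fisher_exact` with `alternative="greater"` for the one-tailed test. The background universe, the toy gene sets and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / np.arange(1, len(p) + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    q = np.empty_like(p)
    q[order] = np.minimum(ranked, 1.0)
    return q


def kegg_enrichment(submodule_genes, pathways, universe, fdr=0.05):
    """One-tailed Fisher exact test per pathway, BH-corrected across pathways;
    returns {pathway: (p, q)} for pathways with q < fdr."""
    universe = set(universe)
    sub = set(submodule_genes) & universe
    names, pvals = [], []
    for name, members in pathways.items():
        members = set(members) & universe
        a = len(sub & members)         # submodule genes in the pathway
        b = len(sub - members)         # submodule genes outside the pathway
        c = len(members - sub)         # pathway genes outside the submodule
        d = len(universe) - a - b - c  # all remaining background genes
        _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
        names.append(name)
        pvals.append(p)
    q = bh_adjust(pvals)
    return {n: (p, qi) for n, p, qi in zip(names, pvals, q) if qi < fdr}
```

The choice of background universe (for example, all genes detected in both datasets) directly changes the d cell of each table and therefore the p-values, so it should match the gene set from which the submodules were drawn.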
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41587-023-01988-1.
Supplementary information
Acknowledgements
We thank A. Hupalowska for help with figure preparation. We thank E. Brown for help with mouse husbandry and H. Vlamakis for fruitful discussions. Work was supported by the Knut and Alice Wallenberg Foundation, the Beijer Laboratory for Gene and Neuro Research, the Royal Swedish Academy of Sciences, the Swedish Society for Medical Research, Science for Life Laboratory and 1RM1 HG011014-01 (S.V.), the Hans Werthén Foundation, Foundation Blanceflor Boncompagni Ludovisi née Bildt (B.L.), the Klarman Cell Observatory, the Manton Foundation and the Howard Hughes Medical Institute (A.R.). S.V. was supported as a Wallenberg Fellow at the Broad Institute of MIT and Harvard and as a Wallenberg Academy Fellow and SciLifeLab Fellow at Uppsala University. A.R. was an investigator of the Howard Hughes Medical Institute.
Author contributions
S.V. conceived and designed the study and experiments, with guidance from A.R. B.L. performed the experiments. S.V. and B.L. analyzed the data, with guidance from A.R. B.L. built the deep learning model, with help from M.S. S.V. annotated the histological sections. S.V., B.L. and A.R. wrote the paper, with input from all authors. All authors discussed the results.
Peer review
Peer review information
Nature Biotechnology thanks Iwijn De Vlaminck and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Funding
Open access funding provided by Uppsala University.
Data availability
All raw data have been deposited to NCBI’s SRA under accession PRJNA999495 (ref.104). All processed data have been deposited in the Single Cell Portal under accession SCP2375 (https://singlecell.broadinstitute.org/single_cell/study/SCP2375).
Code availability
All code is deposited on GitHub at https://github.com/nygctech/shmseq (ref.105).
Competing interests
A.R. is a founder and equity holder of Celsius Therapeutics; is an equity holder in Immunitas Therapeutics; and, until 31 August 2020, was a scientific advisory board member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and Thermo Fisher Scientific. From 1 August 2020, A.R. is an employee of Genentech and an equity holder in Roche. S.V. is an author on patents applied for by Spatial Transcriptomics AB (10x Genomics). S.V. and A.R. are co-inventors on PCT/US2020/015481, related to this work. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Aviv Regev, Email: aviv.regev.sc@gmail.com.
Sanja Vickovic, Email: svickovic@nygenome.org.
Supplementary information
The online version contains supplementary material available at 10.1038/s41587-023-01988-1.
References
- 1. Tlaskalová-Hogenová, H. et al. Commensal bacteria (normal microflora), mucosal immunity and chronic inflammatory and autoimmune diseases. Immunol. Lett. 93, 97–108 (2004). 10.1016/j.imlet.2004.02.005
- 2. Donaldson, G. P. et al. Gut microbiota utilize immunoglobulin A for mucosal colonization. Science 360, 795–800 (2018). 10.1126/science.aaq0926
- 3. Hanahan, D. & Coussens, L. M. Accessories to the crime: functions of cells recruited to the tumor microenvironment. Cancer Cell 21, 309–322 (2012). 10.1016/j.ccr.2012.02.022
- 4. Bodenmiller, B. Multiplexed epitope-based tissue imaging for discovery and healthcare applications. Cell Syst. 2, 225–238 (2016). 10.1016/j.cels.2016.03.008
- 5. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015). 10.1038/ng.3359
- 6. Rivas, M. A. et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat. Genet. 43, 1066–1073 (2011). 10.1038/ng.952
- 7. Tropini, C., Earle, K. A., Huang, K. C. & Sonnenburg, J. L. The gut microbiome: connecting spatial organization to function. Cell Host Microbe 21, 433–442 (2017). 10.1016/j.chom.2017.03.010
- 8. Earle, K. A. et al. Quantitative imaging of gut microbiota spatial organization. Cell Host Microbe 18, 478–488 (2015). 10.1016/j.chom.2015.09.002
- 9. Shi, H. et al. Highly multiplexed spatial mapping of microbial communities. Nature 588, 676–681 (2020). 10.1038/s41586-020-2983-4
- 10. Sheth, R. U. et al. Spatial metagenomic characterization of microbial biogeography in the gut. Nat. Biotechnol. 37, 877–883 (2019). 10.1038/s41587-019-0183-2
- 11. Donaldson, G. P., Lee, S. M. & Mazmanian, S. K. Gut biogeography of the bacterial microbiota. Nat. Rev. Microbiol. 14, 20–32 (2016). 10.1038/nrmicro3552
- 12. Berry, D. et al. Host-compound foraging by intestinal microbiota revealed by single-cell stable isotope probing. Proc. Natl Acad. Sci. USA 110, 4720–4725 (2013). 10.1073/pnas.1219247110
- 13. Png, C. W. et al. Mucolytic bacteria with increased prevalence in IBD mucosa augment in vitro utilization of mucin by other bacteria. Am. J. Gastroenterol. 105, 2420–2428 (2010). 10.1038/ajg.2010.281
- 14. Wu, M. et al. The differences between luminal microbiota and mucosal microbiota in mice. J. Microbiol. Biotechnol. 30, 287–295 (2020). 10.4014/jmb.1908.08037
- 15. Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017). 10.1038/nature21350
- 16. Kuchina, A. et al. Microbial single-cell RNA sequencing by split-pool barcoding. Science 371, eaba5257 (2021). 10.1126/science.aba5257
- 17. Ma, P. et al. Bacterial droplet-based single-cell RNA-seq reveals antibiotic-associated heterogeneous cellular states. Cell 186, 877–891 (2023). 10.1016/j.cell.2023.01.002
- 18. Saikia, M. et al. Simultaneous multiplexed amplicon sequencing and transcriptome profiling in single cells. Nat. Methods 16, 59–62 (2019). 10.1038/s41592-018-0259-9
- 19. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015). 10.1126/science.aaa6090
- 20. Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014). 10.1038/nmeth.2892
- 21. Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568, 235–239 (2019). 10.1038/s41586-019-1049-y
- 22. Lee, J. H. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360–1363 (2014). 10.1126/science.1250212
- 23. Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019). 10.1126/science.aaw1219
- 24. Vickovic, S. et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods 16, 987–990 (2019). 10.1038/s41592-019-0548-y
- 25. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016). 10.1126/science.aaf2403
- 26. Vickovic, S. et al. SM-Omics is an automated platform for high-throughput spatial multi-omics. Nat. Commun. 13, 795 (2022). 10.1038/s41467-022-28445-y
- 27. Vickovic, S. et al. Massive and parallel expression profiling using microarrayed single-cell sequencing. Nat. Commun. 7, 13182 (2016). 10.1038/ncomms13182
- 28. Herlemann, D. P. et al. Transitions in bacterial communities along the 2000 km salinity gradient of the Baltic Sea. ISME J. 5, 1571–1579 (2011). 10.1038/ismej.2011.41
- 29. Navarro, J. F., Sjöstrand, J., Salmén, F., Lundeberg, J. & Ståhl, P. L. ST Pipeline: an automated pipeline for spatial mapping of unique transcripts. Bioinformatics 33, 2591–2593 (2017). 10.1093/bioinformatics/btx211
- 30. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019). 10.1186/s13059-019-1891-0
- 31. O'Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016). 10.1093/nar/gkv1189
- 32. Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019). 10.1038/s41587-019-0209-9
- 33. Dewhirst, F. E. et al. Phylogeny of the defined murine microbiota: altered Schaedler flora. Appl. Environ. Microbiol. 65, 3287–3292 (1999). 10.1128/AEM.65.8.3287-3292.1999
- 34. Sarma-Rupavtarm, R. B., Ge, Z., Schauer, D. B., Fox, J. G. & Polz, M. F. Spatial distribution and stability of the eight microbial species of the altered Schaedler flora in the mouse gastrointestinal tract. Appl. Environ. Microbiol. 70, 2791–2800 (2004). 10.1128/AEM.70.5.2791-2800.2004
- 35. Wymore Brand, M. et al. The altered Schaedler flora: continued applications of a defined murine microbial community. ILAR J. 56, 169–178 (2015). 10.1093/ilar/ilv012
- 36. Maniatis, S. et al. Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science 364, 89–93 (2019). 10.1126/science.aav9776
- 37. Äijö, T., Maniatis, S., Vickovic, S., Kang, K. & Cuevas, M. Splotch: robust estimation of aligned spatial temporal gene expression data. Preprint at bioRxiv 10.1101/757096 (2019).
- 38. Ni, H. et al. SATB2 defect promotes colitis and colitis-associated colorectal cancer by impairing Cl−/HCO3− exchange and homeostasis of gut microbiota. J. Crohns Colitis 15, 2088–2102 (2021). 10.1093/ecco-jcc/jjab094
- 39. Johansson, M. E. V. et al. The inner of the two Muc2 mucin-dependent mucus layers in colon is devoid of bacteria. Proc. Natl Acad. Sci. USA 105, 15064–15069 (2008).
- 40. Cattin, A.-L. et al. Hepatocyte nuclear factor 4α, a key factor for homeostasis, cell architecture, and barrier function of the adult intestinal epithelium. Mol. Cell. Biol. 29, 6294–6308 (2009). 10.1128/MCB.00939-09
- 41. Esworthy, R. S. et al. Mice with combined disruption of Gpx1 and Gpx2 genes have colitis. Am. J. Physiol. Gastrointest. Liver Physiol. 281, G848–G855 (2001). 10.1152/ajpgi.2001.281.3.G848
- 42. Togo, A. H. et al. Description of Mediterraneibacter massiliensis, gen. nov., sp. nov., a new genus isolated from the gut microbiota of an obese patient and reclassification of Ruminococcus faecis, Ruminococcus lactaris, Ruminococcus torques, Ruminococcus gnavus and Clostridium glycyrrhizinilyticum as Mediterraneibacter faecis comb. nov., Mediterraneibacter lactaris comb. nov., Mediterraneibacter torques comb. nov., Mediterraneibacter gnavus comb. nov. and Mediterraneibacter glycyrrhizinilyticus comb. nov. Antonie Van Leeuwenhoek 111, 2107–2128 (2018). 10.1007/s10482-018-1104-y
- 43. Graziani, F. et al. Ruminococcus gnavus E1 modulates mucin expression and intestinal glycosylation. J. Appl. Microbiol. 120, 1403–1417 (2016). 10.1111/jam.13095
- 44. Bell, A. et al. Elucidation of a sialic acid metabolism pathway in mucus-foraging Ruminococcus gnavus unravels mechanisms of bacterial adaptation to the gut. Nat. Microbiol. 4, 2393–2404 (2019). 10.1038/s41564-019-0590-7
- 45. Paone, P. & Cani, P. D. Mucus barrier, mucins and gut microbiota: the expected slimy partners? Gut 69, 2232–2243 (2020). 10.1136/gutjnl-2020-322260
- 46. Swidsinski, A., Loening-Baucke, V., Lochs, H. & Hale, L.-P. Spatial organization of bacterial flora in normal and inflamed intestine: a fluorescence in situ hybridization study in mice. World J. Gastroenterol. 11, 1131–1140 (2005). 10.3748/wjg.v11.i8.1131
- 47. Tall, M. L. et al. Massilistercora timonensis gen. nov., sp. nov., a new bacterium isolated from the human microbiota. New Microbes New Infect. 35, 100664 (2020). 10.1016/j.nmni.2020.100664
- 48. Turner, J. R. Intestinal mucosal barrier function in health and disease. Nat. Rev. Immunol. 9, 799–809 (2009). 10.1038/nri2653
- 49. Kitamura, Y. et al. Regulation by gut commensal bacteria of carcinoembryonic antigen-related cell adhesion molecule expression in the intestinal epithelium. Genes Cells 20, 578–589 (2015).
- 50. Murata, Y. et al. Protein tyrosine phosphatase SAP-1 protects against colitis through regulation of CEACAM20 in the intestinal epithelium. Proc. Natl Acad. Sci. USA 112, E4264–E4271 (2015). 10.1073/pnas.1510167112
- 51. Karhausen, J. et al. Epithelial hypoxia-inducible factor-1 is protective in murine experimental colitis. J. Clin. Invest. 114, 1098–1106 (2004). 10.1172/JCI200421086
- 52. Schwartz, S. et al. A metagenomic study of diet-dependent interaction between gut microbiota and host in infants reveals differences in immune response. Genome Biol. 13, r32 (2012). 10.1186/gb-2012-13-4-r32
- 53. van Driel, B. J., Liao, G., Engel, P. & Terhorst, C. Responses to microbial challenges by SLAMF receptors. Front. Immunol. 7, 4 (2016).
- 54. Viola, M. F. & Boeckxstaens, G. Niche-specific functional heterogeneity of intestinal resident macrophages. Gut 70, 1383–1395 (2021). 10.1136/gutjnl-2020-323121
- 55. Deloris Alexander, A. et al. Quantitative PCR assays for mouse enteric flora reveal strain-dependent differences in composition that are influenced by the microenvironment. Mamm. Genome 17, 1093–1104 (2006). 10.1007/s00335-006-0063-1
- 56. Johnson, J. S. et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat. Commun. 10, 5029 (2019). 10.1038/s41467-019-13036-1
- 57. Anitha, M., Vijay-Kumar, M., Sitaraman, S. V., Gewirtz, A. T. & Srinivasan, S. Gut microbial products regulate murine gastrointestinal motility via Toll-like receptor 4 signaling. Gastroenterology 143, 1006–1016 (2012). 10.1053/j.gastro.2012.06.034
- 58. Rood, J. E. et al. Toward a common coordinate framework for the human body. Cell 179, 1455–1467 (2019). 10.1016/j.cell.2019.11.019
- 59. Liu, Y. et al. High-spatial-resolution multi-omics sequencing via deterministic barcoding in tissue. Cell 183, 1665–1681 (2020). 10.1016/j.cell.2020.10.026
- 60. Chassaing, B., Aitken, J. D., Malleshappa, M. & Vijay-Kumar, M. Dextran sulfate sodium (DSS)-induced colitis in mice. Curr. Protoc. Immunol. 104, 15.25.1–15.25.14 (2014).
- 61. Ahrends, T. et al. Enteric pathogens induce tissue tolerance and prevent neuronal loss from subsequent infections. Cell 184, 5715–5727 (2021). 10.1016/j.cell.2021.10.004
- 62. Rosshart, S. P. et al. Laboratory mice born to wild mice have natural microbiota and model human immune responses. Science 365, eaaw4361 (2019). 10.1126/science.aaw4361
- 63. Brown, E. M. et al. Gut microbiome ADP-ribosyltransferases are widespread phage-encoded fitness factors. Cell Host Microbe 29, 1351–1365 (2021). 10.1016/j.chom.2021.07.011
- 64. Salmén, F. et al. Barcoded solid-phase RNA capture for Spatial Transcriptomics profiling in mammalian tissue sections. Nat. Protoc. 13, 2501–2534 (2018). 10.1038/s41596-018-0045-2
- 65. Shen, X. J. et al. Molecular characterization of mucosal adherent bacteria and associations with colorectal adenomas. Gut Microbes 1, 138–147 (2010). 10.4161/gmic.1.3.12360
- 66. Olszewska, M. A., Kocot, A. M., Nynca, A. & Łaniewska-Trokenheim, Ł. Utilization of physiological and taxonomic fluorescent probes to study Lactobacilli cells and response to pH challenge. Microbiol. Res. 192, 239–246 (2016). 10.1016/j.micres.2016.07.011
- 67. Atherly, T. et al. Helicobacter bilis infection alters mucosal bacteria and modulates colitis development in defined microbiota mice. Inflamm. Bowel Dis. 22, 2571–2581 (2016). 10.1097/MIB.0000000000000944
- 68. Macedonia, M. C. et al. Clinically adaptable polymer enables simultaneous spatial analysis of colonic tissues and biofilms. NPJ Biofilms Microbiomes 6, 33 (2020). 10.1038/s41522-020-00143-x
- 69. Nadkarni, M. A. et al. Lactobacilli are prominent in the initial stages of polymicrobial infection of dental pulp. J. Clin. Microbiol. 48, 1732–1740 (2010). 10.1128/JCM.01912-09
- 70. Bushnell, B. BBTools. Joint Genome Institute. https://jgi.doe.gov/data-and-tools/software-tools/bbtools/
- 71. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013). 10.1093/bioinformatics/bts635
- 72. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015). 10.1093/bioinformatics/btu638
- 73. Costea, P. I., Lundeberg, J. & Akan, P. TagGD: fast and accurate software for DNA Tag generation and demultiplexing. PLoS ONE 8, e57521 (2013). 10.1371/journal.pone.0057521
- 74. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017). 10.1101/gr.209601.116
- 75. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015). 10.1093/bioinformatics/btv033
- 76. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). 10.1038/nmeth.1923
- 77. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019). 10.7717/peerj.7359
- 78. BLAST® Command Line Applications User Manual (National Center for Biotechnology Information, 2008).
- 79. Yang, J., Park, J., Park, S., Baek, I. & Chun, J. Introducing Murine Microbiome Database (MMDB): a curated database with taxonomic profiling of the healthy mouse gastrointestinal microbiome. Microorganisms 7, 480 (2019). 10.3390/microorganisms7110480
- 80. Lagkouvardos, I. et al. The Mouse Intestinal Bacterial Collection (miBC) provides host-specific insight into cultured diversity and functional potential of the gut microbiota. Nat. Microbiol. 1, 16131 (2016). 10.1038/nmicrobiol.2016.131
- 81. Liu, C. et al. The Mouse Gut Microbial Biobank expands the coverage of cultured bacteria. Nat. Commun. 11, 79 (2020).
- 82. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021). 10.1093/nar/gkab301
- 83. Wannemuehler, M. J., Overstreet, A.-M., Ward, D. V. & Phillips, G. J. Draft genome sequences of the altered Schaedler flora, a defined bacterial community from gnotobiotic mice. Genome Announc. 2, e00287-14 (2014). 10.1128/genomeA.00287-14
- 84. Chollet, F. et al. Keras. https://keras.io (2015).
- 85. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proc. of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) 265–283 https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf (2016).
- 86. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv 10.48550/arXiv.1412.6980 (2014).
- 87. Roeder, L. Netron-Visualizer for neural network, deep learning, and machine learning models. https://github.com/lutzroeder/netron (2020).
- 88. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
- 89. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
- 90. Salk, J. J., Schmitt, M. W. & Loeb, L. A. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet. 19, 269–285 (2018). 10.1038/nrg.2017.117
- 91. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020). 10.1038/s41592-019-0686-2
- 92. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
- 93. Berg, S. et al. ilastik: interactive machine learning for (bio)image analysis. Nat. Methods 16, 1226–1232 (2019).
- 94. Howe, K. L. et al. Ensembl Genomes 2020—enabling non-vertebrate genomic research. Nucleic Acids Res. 48, D689–D695 (2019). 10.1093/nar/gkz890
- 95. Wang, L., Wang, S. & Li, W. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184–2185 (2012). 10.1093/bioinformatics/bts356
- 96. Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018). 10.1038/nmeth.4636
- 97. Dickey, J. M. The weighted likelihood ratio, linear hypotheses on normal location parameters. Ann. Math. Statist. 42, 204–223 (1971).
- 98. Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007). 10.1109/MCSE.2007.55
- 99. Drokhlyansky, E. et al. The human and mouse enteric nervous system at single-cell resolution. Cell 182, 1606–1622 (2020). 10.1016/j.cell.2020.08.003
- 100. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018). 10.1186/s13059-017-1382-0
- 101. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019). 10.1038/s41592-019-0619-0
- 102. Levine, J. H. et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015). 10.1016/j.cell.2015.05.047
- 103. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000). 10.1093/nar/28.1.27
- 104. Lötstedt, B., Stražar, M., Xavier, R., Regev, A. & Vickovic, S. Spatial host–microbiome sequencing reveals niches in the mouse gut. Sequence Read Archive. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA999495/ (2023).
- 105. Lötstedt, B., Stražar, M., Xavier, R., Regev, A. & Vickovic, S. Spatial host–microbiome sequencing reveals niches in the mouse gut. GitHub. https://github.com/nygctech/shmseq (2023).