Abstract
The significance of the oral microbiome in the generation of nitric oxide (NO) via the enterosalivary nitrate-nitrite-nitric oxide pathway is increasingly recognised, directly linking the oral microbiome to cardiometabolic outcomes influenced by NO. The objective of this chapter is to outline a strategy for identifying pathway-specific bacterial taxa or predicted genes of interest from 16S rRNA data, specifically in the enterosalivary pathway of nitrate reduction, and for analysing their relationship with cardiometabolic outcomes using multivariable regression models.
1. INTRODUCTION
Nitric oxide (NO) is an important signaling molecule involved in many physiological processes, and its deficiency has been implicated in the pathogenesis of hypertension and insulin resistance1–3. NO bioavailability has therefore garnered much attention, as an increase in NO production can potentially reduce high blood pressure and blood glucose levels. Because NO was thought to be produced only by NO synthases in the endothelium, immune cells, and other tissues, previous research focused on enhancing the bioavailability of NO produced through this synthase pathway.
However, nitrate resulting from NO oxidation metabolism or from dietary nitrate consumption has since been found to provide an important storage pool for NO. The physiological recycling of nitrate to produce NO takes place via the enterosalivary nitrate-nitrite-nitric oxide (NO3-NO2-NO) pathway. In this alternative pathway, oral bacteria play a crucial role by reducing salivary nitrate to nitrite, which is then swallowed and made systemically available for further reduction into NO in the blood vessels and tissues1. The direct role of oral bacteria in this NO3-NO2-NO pathway was demonstrated by several experimental studies that use antibacterial mouthwash to reduce oral bacteria, resulting in decreased nitrate-reducing capacity and a corresponding increase in blood pressure and plasma glucose4–8. Not all oral bacteria contribute to the reduction of salivary nitrate, and specific taxa with nitrate-reduction capacity have been identified9,10. More recently, studies have observed a correlation between baseline nitrate-reducing bacteria abundance and differential blood pressure responses to dietary nitrate supplementation11. Associations of bacteria taxa and genes coding for bacterial enzymes involved along the NO3-NO2-NO pathway with blood pressure levels have also been observed12,13, further emphasizing the importance of oral microbiome composition in NO generation.
The enterosalivary pathway of NO generation has gained significant attention in recent years14, and presents an alternative target for manipulation to improve NO bioavailability-associated cardiometabolic outcomes. While many have examined this pathway through the effects of dietary nitrate supplementation (i.e., increasing the storage pool of nitrate) on cardiometabolic outcomes15,16, fewer population-based studies have explored the association of the oral microbiome involved in the NO3-NO2-NO pathway with cardiometabolic outcomes.
Most microbiome analyses seek to identify differentially abundant taxa between disease states. The highly dimensional oral microbiome, with a large number of taxa to be compared, results in multiple comparisons and an increased false discovery rate17. The identification of a pathway-specific hypothesis linking the oral microbiome and cardiometabolic outcomes, such as the NO3-NO2-NO pathway, allows us to narrow our focus on specific bacteria or bacterial genes of interest, resulting in a priori hypothesis-driven analyses.
While whole genome shotgun metagenomic sequencing can directly yield information on the functional capacity of nitrate metabolism pathways in the oral microbiome, the relatively high cost of metagenomic sequencing and data handling may be prohibitive. Moreover, 16S rRNA sequencing data are already available within large cohorts that will yield valuable prospective data on the incidence of cardiometabolic outcomes, and it would be a missed opportunity not to capitalize fully on those data.
The aim of this chapter is to provide explicit step-by-step examples demonstrating the identification of pathway-specific taxa or genes of interest, and their operationalization from 16S rRNA sequencing data. This exposure construct can then be leveraged in traditional statistical analysis workflows exploring the association between microbial nitrogen metabolism capacity and cardiometabolic outcomes.
2. MATERIALS
2.1. Next-Generation High-Throughput 16S rRNA Sequencing
Sequence-based microbiome analysis consists of several steps: sample collection, storage, DNA extraction, library preparation, next-generation high-throughput microbial sequencing, quality control, sequence identification, and finally statistical analysis18. Briefly, bacteria from the samples collected are lysed and the DNA extracted. In library preparation, the 16S rRNA gene (the most commonly used marker gene in oral microbiome studies and the gold standard for sequence-based bacterial analyses) is amplified from the extracted DNA. These amplicons are then sequenced on the sequencing platform of choice (e.g., Illumina) to produce sequence reads. Quality control is then carried out to filter out short reads or sequences with lower quality scores before taxonomy is assigned to the sequences. Several useful papers and references are available discussing the best practices and considerations at each stage to reduce the potential biases introduced17–22.
2.2. Taxonomic Classification of Sequence Reads
In general, sequence reads obtained from sequencing are referenced against known microbial reference databases (e.g., SILVA, RDP, or Greengenes database) to assign a taxonomic identifier. A description of the different methods for sequence identification is beyond the scope of this chapter, and other papers provide an overview of the process17. Our previous analysis12 utilized the Human Oral Microbiome Identification using Next Generation Sequencing (HOMINGS) methodology specifically designed for the oral microbiome to generate species-level information, which uses a customized BLAST program (ProbeSeq for HOMINGS)23,24. Other methods of taxonomic classification of 16S rRNA sequence reads are available, such as using operational taxonomic units (OTU) clustering25, or the newer DADA2-corrected amplicon sequence variants (ASV)26. The HOMINGS methodology has been shown to be largely equivalent to the tree-based OTU clustering approach27, but increasingly the use of ASVs is recommended as the standard unit of marker-gene analysis and reporting17,26. Researchers have a choice of bioinformatics pipeline and reference database, and are recommended to document the software versions used and all commands run17.
2.3. Required data for this Chapter
For this chapter we will assume the following:
Standard 16S rRNA next-generation sequencing has been carried out on microbial DNA.
The necessary bioinformatics quality controls have been performed (e.g., filtering and trimming)19,22
An appropriate technique for sequence inference and taxonomic alignment has been used
A final table of OTUs/ASVs relating to taxa at the species level and their relative abundances in each sample has been produced. If using predicted gene abundances from 16S rRNA data, we will assume that PICRUSt2 analysis has been successfully conducted, using 16S data to infer KEGG ortholog abundances.
2.4. Required Datasets
Taxonomic table with relative abundances, such as shown in Table 1.
Metadata of participants including the clinical outcomes of interest, e.g., systolic blood pressure, insulin resistance, such as shown in Table 2.
(Optional) Output from PICRUSt2 with KEGG ortholog functional abundance. (See Table 3 example)
Table 1.
Truncated example of OTU/ASV output table after 16S rRNA sequencing and taxonomic alignment in the long format
| Taxa | Relative_abundance | ID |
|---|---|---|
| Actinomyces_johnsonii | 0.00001 | 1 |
| Actinomyces_massiliensis | 0.00389 | 1 |
| Actinomyces_meyeri | 0.00184 | 1 |
| Actinomyces_naeslundii | 0.05540 | 1 |
| Actinomyces_odontolyticus | 0.00001 | 1 |
| ... | ... | ... |
| Actinomyces_johnsonii | 0.000008 | 2 |
| Actinomyces_massiliensis | 0.000823 | 2 |
| ... | ... | 2 |
Note that each row holds a different taxon for the same ID, so each ID has one relative abundance value for every taxon.
Variable Key: Taxa refers to OTUs/ASVs relating to taxa at the species level; ID refers to the unique participant or sample ID. This dataset has one sample per participant. Relative abundance is calculated from absolute counts of that taxa sequence divided by the total counts across all taxa in the individual sample.
Table 2.
Example of participant metadata containing cardiometabolic outcomes of interest in wide format.
| ID | Age | Sex | Glucose | MeanSBP | ... |
|---|---|---|---|---|---|
| 1 | 25 | F | 85 | 95.5 | |
| 2 | 31 | F | 78 | 116.5 | |
| 3 | 41 | F | 94 | 111.0 | |
| 4 | 30 | M | 78 | 123.5 | |
| 5 | 22 | F | 90 | 117.5 | |
| ... | | | | | |
Study participants are not repeated across rows and each new variable (e.g., age, sex, mean systolic blood pressure) is represented as a new column variable.
Summary scores (taxa or predicted gene-based) are added as a new column in the final dataset used in linear regression analyses.
Variable Key: ID refers to the unique participant ID; Glucose refers to the fasting plasma glucose levels of the participant; MeanSBP refers to the mean systolic blood pressure of the participant.
Table 3.
Truncated example of PICRUSt2 output predicted absolute gene abundances using KEGG Orthologs (KOs) conducted on 16S rRNA sequencing data, with KOs in rows and Subject IDs in columns.
| KEGG_orthology | ID1 | ID2 | ID3 | ... |
|---|---|---|---|---|
| K00367 | 34 | 205 | 29 | |
| K00370 | 10141 | 25010 | 31940 | |
| K00371 | 10388 | 7487 | 10450 | |
| ... | | | | |
Variable Key: Column heads have ID numbers, referring to the unique participant ID. Each row has one KEGG Ortholog (KO), defined as functional orthologs containing groups of genes. Cells contain absolute counts of that KO in the participant’s samples. Relative gene/KO abundances will be calculated before summing into a summary score.
2.5. Software
Statistical software to be used for analysis, e.g., SAS and R.
(Optional) PICRUSt2 or Piphillin software packages, if using predicted gene abundance from 16S rRNA sequences.
3. METHODS
An important statistical issue in microbiome analysis is the high dimensionality of microbiome data which includes thousands of taxa. With the enterosalivary nitrate-nitrite-NO pathway of interest, the selection of certain bacteria a priori is possible based on existing knowledge9,10. While individual taxa can be modelled one-by-one in regression models, statistical hypothesis testing may need to be adjusted for the false discovery rate, which reduces statistical power. In addition, since individual taxa analysis may fail to capture the many complex interactions between bacteria co-existing in a microbial community, a summary score can be a useful feature to give an overall picture of the microbiome community’s nitrate-reducing capacity.
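The cost of testing taxa one by one can be illustrated with a minimal R sketch. The p-values below are hypothetical numbers chosen for illustration (not results from any study), and the Benjamini-Hochberg method is used here as one common false discovery rate procedure; the taxon labels are placeholders.

```r
# Hypothetical p-values from five taxon-by-taxon regression models
# (illustrative numbers only, not results from any study)
pvals <- c(taxonA = 0.001, taxonB = 0.012, taxonC = 0.035,
           taxonD = 0.045, taxonE = 0.200)

# Benjamini-Hochberg adjustment, which controls the false discovery rate
qvals <- p.adjust(pvals, method = "BH")

# Number of taxa below a 0.05 threshold before and after adjustment
n_raw <- sum(pvals < 0.05)
n_fdr <- sum(qvals < 0.05)

print(round(qvals, 4))
print(c(unadjusted = n_raw, fdr_adjusted = n_fdr))
```

In this toy example fewer taxa remain below the threshold after adjustment, which is the loss of power that a single a priori summary score is intended to avoid.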
There are two general approaches that can be used to create a summary score from 16S sequencing data: 1) a taxa-based score, using taxa a priori identified to be associated with nitrate-reducing capacity; and 2) a predicted metagenomic (gene)-based score in which scores are based on the estimated number of genes relevant to nitrate reduction.
An advantage of the method of creating a summary score of bacteria taxa previously identified in the literature to be of importance is the specificity of the taxa selected. Numerous oral bacteria are thought to contain nitrate reductase genes, and incorporation of all bacteria containing any nitrate reductase gene into the exposure summary score may result in greater variability and noise, thus masking the effect of the important species and biasing the effect estimate towards the null.
An alternative to using taxa already associated with a pathway of interest is available. In situations where key bacterial species are not in the literature, but a functional gene(s) of interest has been identified (e.g., nitrate reductase), the use of predicted metagenomic content to operationalize the exposure may be useful. It should be noted however, that both methods still rely on taxonomic classification from 16S rRNA marker gene sequencing and that microbial traits, such as horizontal gene transfer between bacteria or strain-level variation within species (e.g., differential nitrate-reduction capacity between strains), make misclassification of the individual’s true nitrate-reducing capacity possible.
3.1. Summary Score of Bacterial Taxa Associated with Nitrate-Reducing Capacity
3.1.1. Identification of bacteria associated with nitrate-reducing capacity
To identify the bacterial species of interest and create a summary score, a literature review was used to identify bacterial species associated with the nitrate-reducing capacity of oral microbiome samples12. Two reference papers were used to identify the bacterial species of interest in nitrate reduction9,10. Doel et al. used culture-based techniques to isolate and identify only nitrate reductase-positive bacteria9, and all bacteria identified, regardless of their rate of nitrate reduction, were included. Additionally, Hyde et al. compared samples with high versus low nitrate-reducing capacity and used next-generation sequencing methods to identify species that were differentially abundant in the highest nitrate-reducing sample10. The latter sought to provide a whole-community picture, including species indirectly helping in nitrate reduction; therefore, while most of the candidate species identified have a nitrate reductase gene, some, such as P. melaninogenica, do not but contain a nitrite reductase gene instead. As our goal was to optimise the measurement of nitrate-reducing capacity, all species identified as candidate species, whether directly or indirectly contributing to nitrate-reducing capacity as “helper” species, were included in the summary score.
3.1.2. Operationalization using the arcsine-square root transformation of taxa relative abundance
From a taxonomic table of OTU/ASVs with absolute counts (i.e., counts of sequence hits), the relative abundance of each taxa is calculated by dividing the number of counts observed for that taxa sequence by the total counts across all taxa in the individual sample. The resulting relative abundance measure is a proportion (i.e., compositional) that is highly skewed and constrained to the range of zero to one with many zeros present (i.e., zero-inflated).
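The calculation described above can be sketched in R as follows. This is a minimal illustration under stated assumptions: the counts are invented toy values, the column names follow Table 1, and `Count` and `Total` are hypothetical helper columns introduced only for this example.

```r
# Toy long-format count table (hypothetical counts; column names follow Table 1)
counts <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Taxa  = rep(c("Actinomyces_naeslundii", "Rothia_mucilaginosa", "Other"),
              times = 2),
  Count = c(50, 150, 800, 0, 20, 180)
)

# Total counts per sample, repeated on every row belonging to that sample
counts$Total <- ave(counts$Count, counts$ID, FUN = sum)

# Relative abundance = taxon count / total count in the same sample
counts$Relative_abundance <- counts$Count / counts$Total

# Within each sample the proportions sum to 1 (the compositional constraint)
print(tapply(counts$Relative_abundance, counts$ID, sum))
print(counts)
```

The zero count for one taxon in sample 2 shows how zeros carry through directly into the relative abundance table.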
The arcsine-square root transformation has been widely used on taxa relative abundance to examine differentially abundant taxa between groups28–31. This transformation reduces the skew of the distribution and stabilizes the variance, creating a more normally distributed continuous variable (bounded between 0 and π/2) that can be used effectively in linear regression models. Unlike studies that use the microbiome as the dependent variable of interest, we use microbial relative abundance as the exposure or independent variable. Therefore, we do not model absolute counts with zero-inflated Poisson32 or negative binomial models33, as others have. The arcsine-square root is a simple transformation that can be performed easily in all software and is often used as a baseline comparison for newly developed methods34,35.
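As a small numerical illustration of this transformation, the R sketch below applies it to a handful of hypothetical proportions (chosen only to span the possible range). It shows that the transformed values run from 0 to π/2 and that very small proportions are spread out relative to the raw scale.

```r
# Hypothetical relative abundances spanning the possible range [0, 1]
p <- c(0, 0.00001, 0.001, 0.05, 0.5, 1)

# Arcsine-square root transformation (result is in radians)
x <- asin(sqrt(p))

# Side-by-side view of raw and transformed values
print(data.frame(p = p, transformed = round(x, 4)))

# The transformed values are bounded by 0 and pi/2
print(range(x))
```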
3.1.3. Standardization and Creation of a Summary Score for Bacteria
Before the selected taxa are summed into a summary score, standardization is first carried out. This gives equal weight to each taxon in the score, preventing taxa with very high relative abundance from dominating it. This is especially important when it is unknown whether the actual nitrate-reducing capacity of each taxon is directly correlated with its relative abundance; for example, it is plausible that nitrate-reducing capacity varies by taxon. A sum of the relative abundances of the taxa of interest without standardization would simply represent the total relative abundance of the selected bacteria in the sample. Therefore, the arcsine-square root transformed relative abundance of each taxon for an individual is standardized across all samples to a z-score (mean 0, standard deviation 1), as implemented in the code in Sections 3.5 and 3.6. The standardized values for the selected bacteria are then summed to create a summary score for each individual.
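The construction of the score can be written compactly as below. The notation is introduced here only for illustration and is not taken from the cited work: $p_{it}$ is the relative abundance of taxon $t$ in sample $i$, $\bar{x}_t$ and $s_t$ are the mean and standard deviation of the transformed values of taxon $t$ across all samples, and $T$ is the set of a priori selected taxa. The z-score form is assumed to match the standardization used in the code of Sections 3.5 and 3.6.

```latex
x_{it} = \arcsin\!\left(\sqrt{p_{it}}\right), \qquad
z_{it} = \frac{x_{it} - \bar{x}_{t}}{s_{t}}, \qquad
S_{i} = \sum_{t \in T} z_{it}
```

Because each $z_{it}$ has unit variance across samples, every taxon in $T$ contributes on the same scale to the summary score $S_i$.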
3.2. Summary score of Predicted Metagenomic genes from 16S rRNA sequencing on the NO3-NO2-NO Pathways
3.2.1. Prediction of metagenomic content from 16S rRNA sequencing
Bioinformatics tools can be used to predict metagenomic content from 16S rRNA marker gene sequencing. Piphillin and PICRUSt2 are the two most well-known tools for inferring metagenomic content36–38.
These tools use the taxonomic identification, the relative abundances of the taxa, and a reference database of known bacteria genomes. The output is a functional-gene-count matrix, providing an estimate of the count of each functional gene in each sample. Comprehensive tutorials are available on how to use these tools39.
3.2.2. Identification of functional gene orthologs on the enterosalivary pathway
Genes of interest can be identified by searching the Kyoto Encyclopaedia of Genes and Genomes (KEGG) Pathways for the pathway of interest, in this case the bacterial nitrogen metabolism pathway (Figure 1). Enzymes involved in each step of the NO3-NO2-NO pathway are mapped out. From Figure 1, nitrate reduction involves nitrate reductase enzymes EC 1.7.7.2, 1.7.5.1, 1.9.6.1, NR and 1.7.99. Selecting the respective enzyme, for example EC 1.7.5.1, will show the associated KEGG Ortholog (KO)s – functional orthologs containing a group of bacterial genes coding for that molecular-level function. These groups of genes include nitrate reductase genes such as narG, narH, narI, napA, napB, narB, nasA, and nasB. More details on the structure, organization and uses of the KEGG encyclopaedia are available elsewhere40.
Figure 1.

KEGG pathway map00910 – Nitrogen metabolism in bacteria. Publicly accessible from https://www.genome.jp/kegg-bin/show_pathway?map00910. Nitrate-reduction enzymes are highlighted in yellow and include EC 1.7.7.2, 1.7.5.1, 1.9.6.1, NR, and 1.7.99. Selecting nitrate reductase enzyme “EC 1.7.5.1” for example brings up K00370, K00371 and K00374 as KO gene groups, containing narG, narZ, nxrA, narH, narY, nxrB, narI, and narV genes. KOs are defined as functional orthologs, and may consist of a single bacterial gene or genes from closely related species.
3.2.3. Operationalization of predicted gene abundance summary score
From bioinformatics tools PICRUSt2 and Piphillin, absolute counts of all possible KEGG KOs in each sample are output. As with taxa, the absolute counts of specific KOs can be converted into relative KO gene abundance by division with the total counts across all possible KOs in that individual sample. The relative gene abundances of interest are then added into a summary score and normalized using the arcsine-square root transformation as performed on the taxa relative abundance in Section 3.1.2.
Examining the NO3-NO2-NO pathway using predicted metagenomic content can be operationalised in several ways. As a start, the summary score containing all nitrate-reducing genes can be calculated. Our ongoing methodological work explores incorporating competition from other bacterial genes, such as nitrite-reductase into the summary score10, and creating summary scores based on bacterial metabolic pathways (e.g., respiratory denitrification), to further examine the different parts of the NO3-NO2-NO enterosalivary pathway in relation to cardiometabolic outcomes. It should however be emphasized that estimating metagenomic content from 16S rRNA sequencing does not directly measure the bacterial genes in the microbiome, and the specific strains present in the samples may not have the same functions as mapped in the bacteria reference database.
3.3. Analysis of the association between the taxa and the outcome of interest
Multiple methods for analysing microbiome data have been developed and the standards for microbiome analyses are rapidly evolving17,41,42. Developments include the use of alternative parameters, such as the change in ratios of taxa to address biases introduced by comparing relative abundances between samples43–45. In this chapter, we analyse microbial relative abundance as the exposure or independent variable in linear regression models, adjusting for potential confounders. Patient selection criteria may be broader in population-based cross-sectional studies. Importantly, those who took antibiotics less than 30 days before recruitment were excluded, and microbiome studies often exclude those who report taking medications such as proton pump inhibitors. The patient selection criteria would likely be more restricted for smaller studies with a different study design, where control of confounding through design is necessary, or where smaller sample sizes limit the power for multivariable regression analysis.
3.4. Examples of datasets used in the analysis
3.5. Code for running bacterial taxa summary score analyses in SAS
In this section, we provide step-by-step instruction for operationalizing a nitrate-reducing taxa summary score as discussed in Section 3.1 above. We also provide the SAS code for analysing the created summary score with a cardiometabolic outcome of interest, as in our previous work using multivariable regression models12,46. We have used CAPS for SAS keywords and mixed case for user-supplied text in keeping with typographical conventions used in SAS textbooks.
-
/* Import the OTU/ASV output from 16S rRNA sequencing and sequencing analysis with relative abundances (see Note 1) */
PROC IMPORT DATAFILE="C:\Users\CG\16SOTUrelativeabundance.csv" OUT=microbiome DBMS=CSV REPLACE;
GETNAMES=YES;
RUN;
-
/*Import the participants metadata including Subject ID, socio-demographics, and clinical outcomes. Example of such a dataset is provided in Table 2 */
PROC IMPORT DATAFILE="C:\Users\CG\Participantsclinicaldata.csv" OUT=metadata DBMS=CSV REPLACE;
GETNAMES=YES;
RUN;
-
/*Arcsine-square root transformation on the relative abundance of each taxon (variable named Relative_abundance as in Table 1)*/
DATA noarc;
SET microbiome;
transRA = ARSIN(SQRT(Relative_abundance));
RUN;
-
/*Creating a variable called ztransRA, which will be standardised in the next step*/
DATA arcsinz;
SET noarc;
ztransRA=transRA;
RUN;
-
/* SAS procedure that carries out the z score standardization, creating a mean of 0 and standard deviation of 1 across all samples. The dataset has to first be sorted by taxa in order for the standardization to be performed correctly. */
PROC SORT DATA= arcsinz;
BY taxa;
RUN;
PROC STANDARD DATA=arcsinz MEAN=0 STD=1 OUT=zscore;
VAR ztransRA;
BY taxa; /*see Note 2*/
RUN;
-
/*Creating the nitrate-reducing taxa summary score “sumarcz” by including only the a priori identified taxa*/
PROC SORT DATA= zscore;
BY id;
RUN;
PROC MEANS data=zscore NOPRINT;
WHERE taxa in ("Actinomyces_naeslundii", "Actinomyces_odontolyticus", "Actinomyces_viscosus", "Capnocytophaga_sputigena",
"Corynebacterium_durum", "Corynebacterium_matruchotii", "Eikenella_corrodens",
"Haemophilus_parainfluenzae",
"Neisseria_flavescens",
"Neisseria_sicca",
"Neisseria_subflava",
"Prevotella_melaninogenica",
"Prevotella_salivae",
"Propionibacterium_acnes",
"Rothia_dentocariosa",
"Rothia_mucilaginosa",
"Selenomonas_noxia",
"Veillonella_dispar",
"Veillonella_parvula",
"Veillonella_atypica"); /*see Note 3*/
VAR ztransRA; OUTPUT OUT=sumzscore (drop= _TYPE_ _FREQ_)
sum(ztransRA)=sumarcz;
BY id;
RUN;
-
/*Merging the datasets to add the nitrate-reducing taxa summary score to the participant metadata */
PROC SORT data=sumzscore; BY id; RUN;
PROC SORT DATA= metadata; BY id; RUN;
DATA final;
MERGE sumzscore (IN=a) metadata (IN=b);
IF a AND b;
BY id;
RUN;
-
/*Multivariable regression models regressing systolic blood pressure outcome (MeanSBP) on the exposure summary score of nitrate-reducing capacity (sumarcz) controlling for other covariates age, sex, ethnicity, body mass index (BMI), smoking status, etc. */
PROC GENMOD DATA=final; /* see Note 4*/
CLASS sex(ref=last) ethnicity(ref="d Hispanic") smoking(ref="never");
MODEL MeanSBP=sumarcz age sex ethnicity bmi smoking /LINK=identity DIST=normal;
RUN;
3.6. Code for running bacterial taxa summary score analyses in R
This section generates results identical to those in Section 3.5, creating the bacterial taxa summary score described in Section 3.1, but uses R; the R code to perform the analyses is provided below.
-
# Import the OTU/ASV output from 16S rRNA sequencing and sequencing analysis with taxa relative abundances (see Note 1).
microbiome <- read.csv("C:/Users/CG/16SOTUrelativeabundance.csv")
-
# Import the participants metadata including Subject ID, socio-demographics, and clinical outcomes.
metadata <- read.csv("C:/Users/CG/Participantsclinicaldata.csv")
-
# Conduct the arcsine-square root transformation on the relative abundance of each taxon (column names are case-sensitive in R and follow Table 1)
microbiome$arsin <- asin(sqrt(microbiome$Relative_abundance))
-
# Conduct taxa-specific z score standardization, creating a mean of 0 and standard deviation of 1 across all samples (see Note 5)
microbiome$taxaZscore <- ave(microbiome$arsin, microbiome$Taxa, FUN = function(v) as.numeric(scale(v)))
-
# Create vector with the selected taxa, and subset to include only these taxa (see Note 3)
taxaNames <- c("Actinomyces_naeslundii",
"Actinomyces_odontolyticus",
"Actinomyces_viscosus",
"Capnocytophaga_sputigena",
"Corynebacterium_durum",
"Corynebacterium_matruchotii",
"Eikenella_corrodens",
"Haemophilus_parainfluenzae",
"Neisseria_flavescens",
"Neisseria_sicca",
"Neisseria_subflava",
"Prevotella_melaninogenica",
"Prevotella_salivae",
"Propionibacterium_acnes",
"Rothia_dentocariosa",
"Rothia_mucilaginosa",
"Selenomonas_noxia",
"Veillonella_dispar",
"Veillonella_parvula",
"Veillonella_atypica")
taxaSubset <- microbiome[microbiome$Taxa %in% taxaNames, ]
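Before summing, it can be worth confirming that every a priori taxon name actually appears in the taxonomic table, since a spelling or naming mismatch would silently drop that taxon from the score. The sketch below is a hypothetical check on toy data; `toyTaxaNames`, `toyMicrobiome`, and the abundance values are invented for illustration, with column names following Table 1.

```r
# Hypothetical a priori list and toy taxonomic table (column names as in Table 1)
toyTaxaNames <- c("Rothia_mucilaginosa", "Neisseria_flavescens", "Veillonella_dispar")
toyMicrobiome <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  Taxa = rep(c("Rothia_mucilaginosa", "Veillonella_dispar", "Streptococcus_mitis"),
             times = 2),
  Relative_abundance = c(0.12, 0.05, 0.30, 0.09, 0.02, 0.25)
)

# Selected taxa that never appear in the table (check spelling and naming)
missingTaxa <- setdiff(toyTaxaNames, unique(toyMicrobiome$Taxa))
print(missingTaxa)

# Number of selected taxa contributing to the score for each participant
toySubset <- toyMicrobiome[toyMicrobiome$Taxa %in% toyTaxaNames, ]
taxaPerSample <- table(toySubset$ID)
print(taxaPerSample)
```

If the per-participant counts differ, the summary scores are being built from different numbers of taxa and may not be comparable across participants.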
-
# Add the z scores for each sample, and create new data set with appropriate column names
sumZscore <- as.data.frame(aggregate(taxaSubset$taxaZscore, by = list(taxaSubset$ID), FUN = sum))
names(sumZscore) <- c("ID", "sumarcz")
-
# Merge dataset with the nitrate-reducing taxa summary score to the metadata
final <- merge(metadata, sumZscore, by = "ID")
-
# Multivariable regression model, regressing systolic blood pressure outcome (MeanSBP) on the exposure nitrate-reducing taxa summary score (sumarcz) controlling for other covariates. Variable names are case-sensitive in R and must match the metadata columns (e.g., Age and Sex in Table 2).
fit <- lm(MeanSBP ~ sumarcz + Age + Sex + ethnicity + bmi + smoking, data = final)
summary(fit)
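To report the association, the coefficient for the summary score and its confidence interval can be pulled from the fitted model. The sketch below uses simulated data only so that it runs on its own; the simulated effect sizes, `toy`, and `fit_toy` are hypothetical and carry no scientific meaning.

```r
# Simulated data for illustration only (no real participants or results)
set.seed(1)
n <- 200
toy <- data.frame(sumarcz = rnorm(n), Age = round(runif(n, 20, 70)))
toy$MeanSBP <- 110 + 0.3 * toy$Age - 1.5 * toy$sumarcz + rnorm(n, sd = 8)

# Same model form as above, with fewer covariates
fit_toy <- lm(MeanSBP ~ sumarcz + Age, data = toy)

# Estimate, standard error, t value and p-value for the summary score
est <- coef(summary(fit_toy))["sumarcz", ]

# 95% confidence interval for the summary score coefficient
ci <- confint(fit_toy, "sumarcz", level = 0.95)

print(est)
print(ci)
```

The coefficient is interpreted as the difference in mean systolic blood pressure per one-unit difference in the summary score, holding the other covariates fixed.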
3.7. Code for running predicted gene abundance summary score analyses on SAS
In this section, we provide step-by-step instructions using SAS software to operationalize the PICRUSt2 output of predicted gene abundance data, based on 16S rRNA data as discussed in Section 3.2 above. A nitrate-reducing gene abundance summary score is created and used as an exposure in multivariable linear regressions with a cardiometabolic outcome of interest, mean systolic blood pressure.
-
/* Import the output from PICRUSt2 analysis using KEGG Orthologs (KOs), conducted on 16S rRNA sequencing data. This data set has KOs in the rows and Subject IDs as columns (see Table 3). */
PROC IMPORT DATAFILE="C:\Users\CG\PICRUSt2OutputKO.csv" OUT=picrust2_wide DBMS=CSV REPLACE;
RUN;
-
/* Transpose PICRUSt2 output using the SAS procedure PROC TRANSPOSE. The NAME=ID option creates a new variable containing all the participant IDs previously listed as columns (see Note 6). */
PROC TRANSPOSE DATA=picrust2_wide OUT=picrust2_long NAME=ID;
ID kegg_orthology;
RUN;
-
/*Create a variable “TotalCounts” that records the total counts of all KOs per sample. Be sure to check the names of the first and last KO variables that appear in the data set (in this example, K00360 and K15876) */
DATA picrust2v2;
SET picrust2_long;
TotalCounts= SUM (OF K00360--K15876); /*see Note 7*/
RUN;
-
/*Create a subset of data keeping only Subject ID, TotalCounts, and KOs related to nitrate reduction genes*/
DATA picrustNO3only;
SET picrust2v2;
KEEP ID TotalCounts
K00367 K00370 K00371 K00374 K02567 K02568 K00372 K00360; /*see Note 8*/
RUN;
-
/* Calculating relative abundances of the individual predicted genes by division with the “TotalCounts” of the sample*/
DATA picrustNO3only;
SET picrustNO3only;
K00367_rel= K00367/TotalCounts;
K00370_rel=K00370/TotalCounts;
K00371_rel=K00371/TotalCounts;
K00374_rel= K00374/TotalCounts;
K02567_rel= K02567/TotalCounts;
K02568_rel= K02568/TotalCounts;
K00372_rel=K00372/TotalCounts;
K00360_rel=K00360/TotalCounts;
RUN;
-
/*Create a NO3 reduction relative gene abundance summary score, “NO3_rel”, by adding the predicted relative abundances of genes involved in nitrate reduction.*/
DATA picrustNO3only;
SET picrustNO3only;
NO3_rel= K00367_rel + K00370_rel + K00371_rel + K00374_rel + K02567_rel + K02568_rel + K00372_rel + K00360_rel;
RUN;
-
/* Arcsin-square root transformation on the NO3 reduction gene abundance summary score*/
DATA picrustNO3only;
SET picrustNO3only;
NO3_arsin=arsin(sqrt(NO3_rel));
RUN;
-
/*Import the participants metadata including Subject ID, socio-demographics, and clinical outcomes. Example of such a dataset is provided in Table 2 */
PROC IMPORT DATAFILE="C:\Users\CG\Participantsclinicaldata.csv" OUT=metadata DBMS=CSV REPLACE;
GETNAMES=YES;
RUN;
-
/*Merging the datasets to add the NO3 reduction gene abundance summary score to the participant metadata */
PROC SORT DATA=picrustNO3only; BY id; RUN;
PROC SORT DATA= metadata; BY id; RUN;
DATA final;
MERGE picrustNO3only (IN=A) metadata (IN=B);
IF A AND B;
BY id;
RUN;
-
/*Multivariable regression model, regressing systolic blood pressure outcome (MeanSBP) on the exposure, NO3 reduction gene abundance summary score (NO3_arsin), controlling for other covariates */
PROC GENMOD DATA=final;
CLASS sex(ref=last) raceethn(ref="d Hispanic") cigcurr(ref="never");
MODEL MeanSBP=NO3_arsin age sex raceethn bmi cigcurr /LINK=identity DIST=normal;
RUN;
3.8. Code for running predicted gene abundance summary score analyses on R
This section generates results identical to those in Section 3.7, using predicted gene abundance data output from PICRUSt2 to create a nitrate-reducing gene abundance summary score as described in Section 3.2, but uses R; the R code to perform the analyses is provided below.
-
# Import the output from PICRUSt2 analysis using KEGG Orthologs (KOs), conducted on 16S rRNA sequencing data. This data set has KOs in the rows and Subject IDs as columns.
PICRUSt2_wide <- read.csv("C:/Users/CG/PICRUSt2OutputKO.csv")
-
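# Reshape the PICRUSt2 output from wide to long format (one row per KO per participant). gather() is provided by the tidyr package.
library(tidyr)
PICRUSt2_long <- gather(data = PICRUSt2_wide, key = ID, value = Count, 2:ncol(PICRUSt2_wide))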
-
# Record total number of reads per sample as a new data set, and label columns appropriately.
Total <- as.data.frame(aggregate(PICRUSt2_long$Count, by = list(PICRUSt2_long$ID), FUN = sum))
names(Total) <- c("ID", "TotalCounts")
-
# Create vector with the KOs of interest. Using the approach outlined above, we select the KOs corresponding to enzymes involved in nitrate reduction to nitrite (see Note 8).
KOs <- c(
"K00367", # Corresponds to the gene narB
"K00370", # Corresponds to the genes narG, narZ, and nxrA
"K00371", # Corresponds to the genes narH, narY, and nxrB
"K00374", # Corresponds to the genes narI and narV
"K02567", # Corresponds to the gene napA
"K02568", # Corresponds to the gene napB
"K00372", # Corresponds to the gene nasA
"K00360") # Corresponds to the gene nasB
-
# Create a subset of data, including only the KOs selected above
NO3 <- PICRUSt2_long[PICRUSt2_long$KEGG_orthology %in% KOs,]
-
# Convert the NO3 data set to wide format, with one row per Subject ID and one column per KO.
NO3 <- spread(NO3, key = KEGG_orthology, value = Count)
-
# Add the total sample counts to the NO3 data set.
NO3 <- merge(NO3, Total, by = "ID")
-
# Transform absolute abundances into relative abundances, by dividing each value by the sample total (see Note 9).
NO3$K00367_rel <- NO3$K00367/NO3$TotalCounts
NO3$K00370_rel <- NO3$K00370/NO3$TotalCounts
NO3$K00371_rel <- NO3$K00371/NO3$TotalCounts
NO3$K00374_rel <- NO3$K00374/NO3$TotalCounts
NO3$K02567_rel <- NO3$K02567/NO3$TotalCounts
NO3$K02568_rel <- NO3$K02568/NO3$TotalCounts
NO3$K00372_rel <- NO3$K00372/NO3$TotalCounts
NO3$K00360_rel <- NO3$K00360/NO3$TotalCounts
-
# Create a NO3 reduction relative gene abundance summary score “NO3_rel”, by adding the predicted relative abundances of genes involved in nitrate reduction
NO3$NO3_rel <- NO3$K00367_rel + NO3$K00370_rel + NO3$K00371_rel + NO3$K00374_rel + NO3$K02567_rel + NO3$K02568_rel + NO3$K00372_rel + NO3$K00360_rel
-
# Conduct the arcsin-square root transformation on the NO3 reduction gene abundance summary score created.
NO3$NO3_arsin <- asin(sqrt (NO3$NO3_rel))
-
# Import the participants metadata including Subject ID, socio-demographics, and clinical outcomes.
metadata <- read.csv("C:/Users/CG/Participantsclinicaldata.csv")
-
# Merge data sets with inferred metagenome proportions to the metadata.
final <- merge(metadata, NO3, by = "ID")
-
# Multivariable regression model, regressing systolic blood pressure outcome (MeanSBP) on the exposure, NO3 reduction gene abundance summary score (NO3_arsin), controlling for other covariates.
fit <- lm(MeanSBP ~ NO3_arsin + age + sex + raceethn + bmi + cigcurr, data = final)
summary(fit)
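The steps above can be sketched end to end on simulated data. The following is a minimal illustration in base R, not the analysis itself: the counts, the two extra KOs (K00001, K00002), the subject IDs, and the outcome values are all made up for the example, and only age is included as a covariate. It checks two points: summing the per-KO relative abundances gives the same score as dividing the summed KO counts by the sample total, and a Gaussian generalized linear model with an identity link (the R counterpart of LINK=identity DIST=normal) returns the same coefficients as lm.

```r
# Toy, simulated data only; all values and object names here are illustrative.
set.seed(1)

kos <- c("K00367", "K00370", "K00371", "K00374",
         "K02567", "K02568", "K00372", "K00360")
all_kos <- c(kos, "K00001", "K00002")  # two extra KOs stand in for the rest of the metagenome
ids <- paste0("S", 1:30)

# Wide table: KOs in rows, subjects in columns (same layout as the PICRUSt2 output).
counts <- matrix(rpois(length(all_kos) * length(ids), lambda = 50),
                 nrow = length(all_kos), dimnames = list(all_kos, ids))

# Per-sample totals across all KOs, and summed counts across the nitrate-reduction KOs.
total_counts <- colSums(counts)
no3_counts   <- colSums(counts[kos, , drop = FALSE])

# Summing per-KO relative abundances equals (sum of KO counts) / (sample total).
no3_rel_by_ko <- colSums(sweep(counts[kos, , drop = FALSE], 2, total_counts, "/"))
stopifnot(isTRUE(all.equal(no3_rel_by_ko, no3_counts / total_counts)))

toy <- data.frame(ID = ids,
                  NO3_rel = no3_counts / total_counts,
                  age = round(runif(length(ids), 20, 60)),
                  stringsAsFactors = FALSE)

# Arcsine-square root transformation of the summary score.
toy$NO3_arsin <- asin(sqrt(toy$NO3_rel))

# Simulated outcome, unrelated to the score by construction.
toy$MeanSBP <- 120 + 0.3 * toy$age + rnorm(length(ids), sd = 8)

fit_lm  <- lm(MeanSBP ~ NO3_arsin + age, data = toy)
fit_glm <- glm(MeanSBP ~ NO3_arsin + age, data = toy,
               family = gaussian(link = "identity"))

# The Gaussian GLM with identity link reproduces the lm coefficients.
stopifnot(isTRUE(all.equal(coef(fit_lm), coef(fit_glm))))
```

Computing the score from column sums avoids writing one line per KO, which may be convenient when the list of KOs is long or changes between analyses.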
Footnotes
1. Often the output after 16S rRNA sequencing and sequence assignment will be in the long format, with multiple rows representing different OTUs/ASVs for the same subject. The code assumes this format as shown in Table 1.
2. It is important to perform the standardization by taxon. The command “BY taxa” tells SAS to use the standard deviation of that particular taxon across all samples for the standardization.
3. From a list of 28 putative taxa associated with nitrate-reducing species derived from prior literature9,10, only 20 were identified in the ORIGINS data12. These earlier studies looked at bacteria from tongue dorsum, supragingival plaque and/or saliva.
4. PROC GENMOD is the SAS procedure that fits a generalized linear model to the data by maximum likelihood estimation of the parameter vector β. The options “LINK=identity DIST=normal” indicate that a linear model with a normally distributed, continuous response variable is being used.
5. There are many different Z-score standardization functions in R, including the scale function in the base package. However, since we needed to conduct a taxon-specific standardization, we use the ave function to apply the scale function to groups of data.
6. More information on PROC TRANSPOSE can be found at https://support.sas.com/resources/papers/proceedings/proceedings/forum2007/046-2007.pdf
7. The expression SUM(OF K00360--K15876) tells SAS to sum over a range of variables in the order they are listed in the dataset. Therefore, K00360 is the first column variable and K15876 the last variable to be included in the count.
8. The KOs included in the nitrate reductase gene abundance summary score, as derived from the KEGG pathway map, are K00367, K00370, K00371, K00374, K02567, K02568, K00372, and K00360. K10534, which corresponds to NR in the KEGG pathway, was not present in our dataset. The KOs present were selected based on their identification in KEGG as being directly involved in the conversion of nitrate to nitrite (see Figure 1).
9. Note that this step was done individually for each taxon, but the “for loop” approach in R can also be used to conduct the standardization.
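The group-wise standardization described in the notes above can be illustrated with a small sketch. It assumes a long-format data frame with hypothetical column names (ID, taxa, abund) and made-up values, and it writes the Z-score formula out explicitly rather than calling scale, which gives the same result for a single vector. The ave version and the for-loop version are shown side by side to confirm they agree.

```r
# Toy long-format data: one row per subject-taxon pair (names and values are illustrative).
long <- data.frame(
  ID    = rep(paste0("S", 1:4), times = 2),
  taxa  = rep(c("TaxonA", "TaxonB"), each = 4),
  abund = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Vectorised version: ave() applies the function within each taxon and
# returns the results in the original row order.
long$z_ave <- ave(long$abund, long$taxa,
                  FUN = function(x) (x - mean(x)) / sd(x))

# Equivalent for-loop version, standardizing one taxon at a time.
long$z_loop <- NA_real_
for (tx in unique(long$taxa)) {
  rows <- long$taxa == tx
  long$z_loop[rows] <- (long$abund[rows] - mean(long$abund[rows])) / sd(long$abund[rows])
}

# Both approaches give the same taxon-specific Z-scores.
stopifnot(isTRUE(all.equal(long$z_ave, long$z_loop)))
```

Because each taxon is centered and scaled with its own mean and standard deviation, TaxonA and TaxonB end up with identical Z-scores in this toy example even though their raw abundances differ tenfold.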
REFERENCES
- 1. Lundberg JO, Weitzberg E, Gladwin MT. The nitrate-nitrite-nitric oxide pathway in physiology and therapeutics. Nat Rev Drug Discov. 2008;7(2):156–167.
- 2. Sansbury BE, Hill BG. Regulation of obesity and insulin resistance by nitric oxide. Free Radic Biol Med. 2014;73:383–399.
- 3. Koch CD, Gladwin MT, Freeman BA, Lundberg JO, Weitzberg E, Morris A. Enterosalivary nitrate metabolism and the microbiome: intersection of microbial metabolism, nitric oxide and diet in cardiac and pulmonary vascular health. Free Radical Biology and Medicine. 2017;105:48–67.
- 4. Beals JW, Binns SE, Davis JL, et al. Concurrent Beet Juice and Carbohydrate Ingestion: Influence on Glucose Tolerance in Obese and Nonobese Adults. J Nutr Metab. 2017;2017:6436783.
- 5. Govoni M, Jansson EA, Weitzberg E, Lundberg JO. The increase in plasma nitrite after a dietary nitrate load is markedly attenuated by an antibacterial mouthwash. Nitric Oxide. 2008;19:333–337.
- 6. Woessner M, Smoliga JM, Tarzia B, Stabler T, Van Bruggen M, Allen JD. A stepwise reduction in plasma and salivary nitrite with increasing strengths of mouthwash following a dietary nitrate load. Nitric Oxide. 2016;54:1–7.
- 7. Kapil V, Haydar SM, Pearl V, Lundberg JO, Weitzberg E, Ahluwalia A. Physiological role for nitrate-reducing oral bacteria in blood pressure control. Free Radic Biol Med. 2013;55:93–100.
- 8. Bescos R, Ashworth A, Cutler C, et al. Effects of Chlorhexidine mouthwash on the oral microbiome. Scientific reports. 2020;10(1):5254.
- 9. Doel JJ, Benjamin N, Hector MP, Rogers M, Allaker RP. Evaluation of bacterial nitrate reduction in the human oral cavity. European journal of oral sciences. 2005;113(1):14–19.
- 10. Hyde ER, Andrade F, Vaksman Z, et al. Metagenomic analysis of nitrate-reducing bacteria in the oral cavity: implications for nitric oxide homeostasis. PLoS One. 2014;9:e88645.
- 11. Vanhatalo A, Blackwell JR, L’Heureux JE, et al. Nitrate-responsive oral microbiome modulates nitric oxide homeostasis and blood pressure in humans. Free Radic Biol Med. 2018;124:21–30.
- 12. Goh CE, Trinh P, Colombo PC, et al. Association Between Nitrate-Reducing Oral Bacteria and Cardiometabolic Outcomes: Results From ORIGINS. Journal of the American Heart Association. 2019;8(23):e013324.
- 13. Tribble GD, Angelov N, Weltman R, et al. Frequency of Tongue Cleaning Impacts the Human Tongue Microbiome Composition and Enterosalivary Circulation of Nitrate. Frontiers in Cellular and Infection Microbiology. 2019;9:39.
- 14. Kapil V, Khambata RS, Jones DA, et al. The Noncanonical Pathway for In Vivo Nitric Oxide Generation: The Nitrate-Nitrite-Nitric Oxide Pathway. Pharmacological Reviews. 2020;72(3):692–766.
- 15. Jackson JK, Zong G, MacDonald-Wicks LK, et al. Dietary nitrate consumption and risk of CHD in women from the Nurses’ Health Study. British Journal of Nutrition. 2019;121(7):831–838.
- 16. Jackson JK, Patterson AJ, MacDonald-Wicks LK, Oldmeadow C, McEvoy MA. The role of inorganic nitrate and nitrite in cardiovascular disease risk factors: a systematic review and meta-analysis of human evidence. Nutrition reviews. 2018;76(5):348–371.
- 17. Knight R, Vrbanac A, Taylor BC, et al. Best practices for analysing microbiomes. Nature Reviews Microbiology. 2018;16(7):410–422.
- 18. Tyler AD, Smith MI, Silverberg MS. Analyzing the human microbiome: a “how to” guide for physicians. Am J Gastroenterol. 2014;109(7):983–993.
- 19. Pollock J, Glendinning L, Wisedchanwet T, Watson M. The Madness of Microbiome: Attempting To Find Consensus “Best Practice” for 16S Microbiome Studies. Applied and Environmental Microbiology. 2018;84(7):e02627–02617.
- 20. Willis JR, Gabaldón T. The Human Oral Microbiome in Health and Disease: From Sequences to Ecosystems. Microorganisms. 2020;8(2):308.
- 21. Morgan XC, Huttenhower C. Chapter 12: Human microbiome analysis. PLoS Comput Biol. 2012;8(12):e1002808.
- 22. Sinha R, Abu-Ali G, Vogtmann E, et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nature Biotechnology. 2017;35(11):1077–1086.
- 23. Gomes BP, Berber VB, Kokaras AS, Chen T, Paster BJ. Microbiomes of Endodontic-Periodontal Lesions before and after Chemomechanical Preparation. J Endod. 2015;41(12):1975–1984.
- 24. Mougeot JL, Stevens CB, Cotton SL, et al. Concordance of HOMIM and HOMINGS technologies in the microbiome analysis of clinical samples. J Oral Microbiol. 2016;8:30379.
- 25. Kopylova E, Navas-Molina JA, Mercier C, et al. Open-Source Sequence Clustering Methods Improve the State Of the Art. mSystems. 2016;1(1).
- 26. Callahan BJ, McMurdie PJ, Holmes SP. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal. 2017;11(12):2639–2643.
- 27. Palmer RJ, Cotton SL, Kokaras A, et al. Analysis of oral bacterial communities: comparison of HOMINGS with a tree-based approach implemented in QIIME. Journal of Oral Microbiology. 2019;In Press.
- 28. Morgan XC, Kabakchiev B, Waldron L, et al. Associations between host gene expression, the mucosal microbiome, and clinical outcome in the pelvic pouch of patients with inflammatory bowel disease. Genome Biol. 2015;16:67.
- 29. Morgan XC, Tickle TL, Sokol H, et al. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biology. 2012;13(9):R79.
- 30. Gevers D, Kugathasan S, Denson LA, et al. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe. 2014;15(3):382–392.
- 31. Zhou W, Sailani MR, Contrepois K, et al. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature. 2019;569(7758):663–671.
- 32. Xu T, Demmer RT, Li G. Zero-inflated Poisson factor model with application to microbiome read counts. Biometrics.n/a(n/a).
- 33. Chen J, King E, Deek R, et al. An omnibus test for differential distribution analysis of microbiome sequencing data. Bioinformatics. 2018;34(4):643–651.
- 34. Ho NT, Li F, Wang S, Kuhn L. metamicrobiomeR: an R package for analysis of microbiome relative abundance data using zero-inflated beta GAMLSS and meta-analysis across studies using random effects models. BMC bioinformatics. 2019;20(1):188–188.
- 35. Chen EZ, Li H. A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics. 2016;32(17):2611–2617.
- 36. Narayan NR, Weinmaier T, Laserna-Mendieta EJ, et al. Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA sequences. BMC Genomics. 2020;21(1):56.
- 37. Langille MG, Zaneveld J, Caporaso JG, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature biotechnology. 2013;31(9):814–821.
- 38. Douglas GM, Maffei VJ, Zaneveld J, et al. PICRUSt2: An improved and extensible approach for metagenome inference. bioRxiv. 2019:672295.
- 39. PICRUSt2 Tutorial (v2.1.4 beta). https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-(v2.1.4-beta). Published 2019. Accessed.
- 40. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research. 2016;45(D1):D353–D361.
- 41. Tsilimigras MC, Fodor AA. Compositional data analysis of the microbiome: fundamentals, tools, and challenges. Ann Epidemiol. 2016;26(5):330–335.
- 42. Li H. Microbiome, metagenomics, and high-dimensional compositional data analysis. Annual Review of Statistics and Its Application. 2015;2:73–94.
- 43. McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. Elife. 2019;8:e46923.
- 44. Williamson BD, Hughes JP, Willis AD. A multi-view model for relative and absolute microbial abundances. bioRxiv. 2019:761486.
- 45. Morton JT, Marotz C, Washburne A, et al. Establishing microbial composition measurement standards with reference frames. Nature Communications. 2019;10(1):2719.
- 46. Demmer RT, Trinh P, Rosenbaum M, et al. Subgingival Microbiota and Longitudinal Glucose Change: The Oral Infections, Glucose Intolerance and Insulin Resistance Study (ORIGINS). Journal of Dental Research. 2019:0022034519881978.
