Author manuscript; available in PMC: 2019 Jun 10.
Published in final edited form as: Methods Mol Biol. 2018;1849:227–242. doi: 10.1007/978-1-4939-8728-3_15

Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

Richard R Rodrigues 1, Natalia Shulzhenko 2, Andrey Morgun 3
PMCID: PMC6557635  NIHMSID: NIHMS1029749  PMID: 30298258

Abstract

Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these “omics” data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of data.

Keywords: omics, transkingdom, network analysis, causal relationships

Introduction

Over the last decade, assessing eukaryotic and prokaryotic genomes and transcriptomes has become much easier. With technologies such as microarrays and next-generation sequencing, investigators now have faster and cheaper access to high-throughput “-omics” data [1]. This in turn has increased the number of analysis methods [2] and made it possible to explore new biological questions, providing insight into hosts, host-microbial systems, and diseases [3–5].

Studies usually focus on identifying differences between “groups” (e.g., healthy versus diseased or treatment versus control) or changes across a time course (e.g., development of an organism or progression of a disease). Depending on the biological question, such studies generate one or more types of omics data [6–8], e.g., host gene expression and gut microbial abundance. Typically, these omics data are analyzed separately, comparing gene expression and microbial abundance between groups or across stages. Although such analyses have been very useful, they do not directly answer the most critical questions about host-microbiota interactions: which microbes affect specific pathways in the host, and which host pathways/genes control specific members of the microbial community? To answer those questions, these analyses are therefore usually followed by literature searches to identify relationships between host genes and microbes.

Different algorithms and methods have been proposed to integrate multi-omics data [9–13]. More recently, a few published studies have not only integrated microbiome and host data but also successfully tested their computational predictions in the laboratory [14–19]. In this chapter we describe our data-driven transkingdom network (TransNet) analysis pipeline (Figure 1), which has allowed us to make experimentally testable computational inferences. We construct networks from correlations between differentially expressed elements (e.g., genes, microbes), integrating high-throughput data from different taxonomic kingdoms (e.g., human and bacteria). In fact, TransNet analysis can be applied to integrate any “Transomics” data, between as well as within taxonomic kingdoms, e.g., miRNA and gene expression, protein and metabolite, bacterial and host gene expression, or copy number, methylation, and gene expression. Interrogating this network allows us to pinpoint important causal relationships in the data. For example, using this method we inferred and validated: 1) microbes and microbial genes controlling a specific mammalian pathway [15]; 2) a microbe that mediates the effect of one host pathway on another [14]; and 3) a host gene that mediates the control of a gut microbe by an upstream master regulator gene [14]. Below we show how TransNet analysis can be used to integrate host gene expression with microbial abundance to create transkingdom networks.

Figure 1: Overview of transkingdom network analysis.


Omics data for multiple data types (e.g. microbial, gene expression, etc.) are analyzed to identify differentially abundant elements (e.g. microbes, genes, etc.). For each group (e.g. treatment or control) co-expression networks are constructed for each data type followed by the identification of dense sub-networks (modules). Calculating correlations between module elements of the different data types creates the “transkingdom” network. Network interrogation of the transkingdom network allows identification of causal members and regulatory relationships.

Materials

Program Availability:

Our transkingdom network analysis pipeline does not depend on a particular programming language or software. However, for ease of access and use, we provide the pipeline as an R package (TransNetDemo) and a supplementary document (File S1) in addition to the description below. Although the following steps can be performed in any programming language or software, we suggest using our R package.

Required R packages:

Install the following packages along with their dependencies: stringr, ProNet, igraph, ggplot2, gplots from CRAN (https://cran.r-project.org/).

Installing TransNetDemo:

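install.packages("devtools")          # provides install_github(); skip if already installed
library(devtools)
install_github("richrr/TransNetDemo") # install TransNetDemo from GitHub
library(TransNetDemo)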

Data Sources:

Because of the variety of data generation technologies, biological questions, and software, describing every possible analysis is beyond the scope of this chapter. We assume that the user has access to tab-delimited file(s) containing measurements of a biological data type (e.g., gene expression, copy number, methylation, miRNA, or microbial abundance) across samples. Depending on the data type, the user can find reviews and protocol papers describing the analyses needed to produce these “abundance” tables [20–24].

The transkingdom network analysis method can be applied to any experimental design (e.g., treated/untreated, control/disease). As an example, we use simulated data from a simple experimental design that investigates the effect of an experimental factor on host-microbial interactions. Consider an experiment with two groups (HFHS and NCD) of 25 samples each. At the end of the experiment, in addition to other phenotypic measurements (e.g., body weight, enzyme levels, hormone levels), gene expression and microbial abundance were measured in the gut (e.g., ileum) of each sample. Resources permitting, higher-confidence and more consistent results can be achieved by increasing the number of samples per group and/or repeating the experiment. The example data contain two such experiments. A brief description of how to generate the abundance tables is given below. The Notes section of this chapter describes how the network analysis protocol can be adapted to answer other biological questions.

Gene expression analysis:

Several technologies, each with its own pros and cons, allow transcript levels in an organism to be measured. Although microarrays were used extensively over the last two decades, the availability of cheap and efficient library preparation kits and sequencing methods now allows the expression of known and novel genes to be measured with RNA-Seq [25].

In the case of RNA-Seq data, sequencing facilities usually provide demultiplexed fastq files containing the raw reads of each sample. Here, the number of reads corresponding to a particular gene is approximately proportional to that gene’s expression level. Software such as FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), PRINSEQ [26], or cutadapt [27] can be used for adapter removal and quality control. Depending on whether a gold-standard reference genome sequence is available for the host, gene expression can be quantified using the reference-based Tuxedo pipeline [21] or the de novo assembly-based Trinity pipeline [28]. Both pipelines can analyze single-end or paired-end reads of different lengths and output a file containing the expression levels (number of reads) of genes (rows) in each sample (columns). The resulting read counts can be normalized by simple methods (e.g., quantile normalization, total read count scaling such as counts per million (CPM), or reads per kilobase per million mapped reads (RPKM) [29]) or by more sophisticated methods (e.g., DESeq [30], edgeR [31]).
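To make the simple normalizations above concrete, here is a minimal base-R sketch of CPM and RPKM. It is an illustration under stated assumptions, not code from TransNetDemo: the helper names `counts_per_million()` and `reads_per_kb_per_million()` are hypothetical, counts are assumed to be a genes × samples matrix, and gene lengths are assumed to be in base pairs and in the same order as the matrix rows. For real analyses, established implementations (e.g., in edgeR) are preferable.

```r
# Minimal sketch (hypothetical helper names): library-size and length normalization.
# counts: genes x samples matrix of raw read counts.
counts_per_million <- function(counts) {
  lib_size <- colSums(counts)              # total reads per sample
  sweep(counts, 2, lib_size, "/") * 1e6    # scale every sample to one million reads
}

# gene_length_bp: gene lengths in base pairs, in the same order as rownames(counts)
reads_per_kb_per_million <- function(counts, gene_length_bp) {
  counts_per_million(counts) / (gene_length_bp / 1e3)  # divide each row by its length in kb
}

# Toy data: 3 hypothetical genes x 2 samples (matrix() fills column by column)
counts <- matrix(c(100, 300, 600,
                   200, 200, 600),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2")))
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 500)

cpm_values  <- counts_per_million(counts)
rpkm_values <- reads_per_kb_per_million(counts, gene_length_bp)
```

CPM only removes differences in sequencing depth between samples; RPKM additionally divides by gene length, which matters when comparing expression of different genes within a sample.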

In the case of microarray data, hybridization facilities usually provide scan files (Affymetrix CEL, Illumina IDAT, or GenePix GPR) containing the probe intensities of each sample. Here, a probe’s intensity is proportional to the expression level of the corresponding gene. Software such as Affymetrix® Expression Console, Illumina’s GenomeStudio, and GenePix® Pro, as well as R/Bioconductor packages such as affy [32] and limma [33], perform background correction, normalization, and probe summarization, and output a file containing the expression levels of genes (rows) in each sample (columns).

Microbial abundance analysis:

The availability of next-generation sequencing technologies has advanced the study of microbial richness and diversity. Scientists no longer need to rely on cultivation methods and can sequence the microbiome directly, which helps to explore previously unknown microbes. Amplicon-based sequencing relies on a marker gene (e.g., the 16S ribosomal RNA gene [34,35] or the internal transcribed spacer [36,37]) to identify microbial presence and abundance. Although relatively cheaper than shotgun metagenomics, amplicon sequencing relies on databases of known marker sequences to identify microbes and rarely provides taxonomy at the species or strain level. Shotgun metagenomic sequencing surveys entire microbial genomes more completely because it does not amplify specific genes. Consequently, it provides finer-grained taxonomic information along with a more accurate representation of microbial community structure and function, including previously unknown “dark matter” microbes [38].

Tools such as QIIME [39] and MOTHUR [40] provide all-in-one toolkits that can demultiplex, quality-control, and analyze amplicon sequences. As with RNA-Seq data, the fastq files obtained from the sequencing facility must first be processed to remove barcodes, adapters, and primers, and then filtered to retain high-quality sequences. The reads are grouped (binned) by sequence similarity (usually at a 97% threshold) into operational taxonomic units (OTUs). Each OTU is assigned the taxonomy of the known microbe closest to its representative sequence (or the ancestral taxonomy shared by the top matches), and this taxonomy applies to all reads in the OTU. The tools output a file containing the abundance (number of reads) of OTUs (rows) in each sample (columns). The resulting read counts can be converted to relative abundances or normalized by cumulative sum scaling (CSS) [41].
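Relativizing, the simpler of the two normalizations mentioned above, divides each OTU count by the total number of reads in its sample. The base-R sketch below assumes an OTUs × samples count matrix; the helper name `relativize()` is hypothetical and the counts are made up. CSS itself is more involved and is available in Bioconductor packages such as metagenomeSeq.

```r
# Minimal sketch (hypothetical helper name): convert OTU read counts to relative abundances.
# otu_counts: OTUs x samples matrix of read counts.
relativize <- function(otu_counts) {
  totals <- colSums(otu_counts)
  if (any(totals == 0)) stop("every sample must contain at least one read")
  sweep(otu_counts, 2, totals, "/")   # each sample (column) now sums to 1
}

otu_counts <- matrix(c(50, 30, 20,
                       10, 60, 30),
                     nrow = 3,
                     dimnames = list(c("OTU_1", "OTU_2", "OTU_3"), c("S1", "S2")))
rel_abund <- relativize(otu_counts)
```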

Shotgun metagenomic data can be analyzed [38] using tools such as MG-RAST [42], MEGAN [43], MetaPhlAn [44], and HUMAnN [45]. Although most of these tools provide taxonomic and functional analyses, they are not standalone: demultiplexing and quality control must be done before the reads are imported. For host-microbe systems in particular, the Processing utility for Metagenomics Analysis (PuMA) (http://blogs.oregonstate.edu/morgunshulzhenkolabs/softwares/puma/) provides an all-inclusive software pipeline that may be more user-friendly. PuMA uses cutadapt for quality control and Bowtie [46] to identify reads that match the host genome, and discards these “contaminating” reads from downstream analysis. The remaining microbial reads are aligned to a database of known protein sequences using DIAMOND [47], followed by taxonomic and functional (e.g., SEED, COG, KEGG) assignment using MEGAN. PuMA outputs a file containing the abundance of microbes and pathways (rows) in each sample (columns). The appropriate normalization techniques from the RNA-Seq or amplicon sequencing workflows can then be applied to this abundance table.

In summary, the user needs at least one of each of the following files before starting network analysis:

  • mapping file: a tab-delimited file giving the group (e.g., treated/untreated, control/disease) of each sample, with the column headers “Factor” (group) and “SampleID” (sample).

  • data files: tab-delimited files containing the abundance of elements (host genes and microbes) per sample, where the elements and samples are rows and columns, respectively. Importantly, each sample must have both types of data available.
    • normalized gene expression file: the column “IdSymbol” contains the unique genes while the remaining columns contain their expression levels across different samples.
    • normalized OTU abundance file: the column “IdSymbol” contains the unique microbes while the remaining columns contain their abundance across different samples.
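The following base-R sketch illustrates this layout and one way to load it. It is an illustration under assumptions, not TransNetDemo code: the file names and values are hypothetical, while the column headers (“SampleID”, “Factor”, “IdSymbol”) follow the description of the files. It also checks that every sample in the mapping file has both data types.

```r
# Minimal sketch of the expected tab-delimited inputs (hypothetical files and values).
tmp <- tempdir()
mapping_path <- file.path(tmp, "mapping.txt")
genes_path   <- file.path(tmp, "genes.txt")
otus_path    <- file.path(tmp, "otus.txt")

writeLines(c("SampleID\tFactor",
             "S1\tHFHS", "S2\tHFHS", "S3\tNCD", "S4\tNCD"), mapping_path)
writeLines(c("IdSymbol\tS1\tS2\tS3\tS4",
             "geneA\t5.1\t5.3\t7.2\t7.0",
             "geneB\t3.3\t3.1\t2.2\t2.4"), genes_path)
writeLines(c("IdSymbol\tS1\tS2\tS3\tS4",
             "OTU_1\t0.20\t0.25\t0.05\t0.04"), otus_path)

# Elements become row names; samples stay as columns
mapping <- read.delim(mapping_path, stringsAsFactors = FALSE)
genes   <- read.delim(genes_path, row.names = "IdSymbol", check.names = FALSE)
otus    <- read.delim(otus_path,  row.names = "IdSymbol", check.names = FALSE)

# Every sample must have both data types available
shared <- Reduce(intersect, list(mapping$SampleID, colnames(genes), colnames(otus)))
stopifnot(setequal(shared, mapping$SampleID))

# Sample IDs of each group, e.g. samples_by_group$HFHS
samples_by_group <- split(mapping$SampleID, mapping$Factor)
```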

Methods

The following steps identify key elements of a system from high-confidence modules of a multi-omics network. We illustrate the first steps on the gene abundance file(s) using code from the GeneDemo.R (GD) file in our package. The same steps can be run on the microbe abundance file(s); for convenience, the corresponding code is provided in the MicrobeDemo.R (MD) file.

  • Start by setting defaults for variables that you will use in the analysis, such as significance thresholds (GD: lines 7–9), groups to be compared (GD: lines 11–13), and headers of relevant columns from the mapping (GD: lines 14–15) and abundance files (GD: line 16).

  • Next, identify the differentially expressed elements (GD: line 29). Although the network analysis can be performed using all elements (genes, microbes, etc.), we suggest first identifying the elements that are differentially abundant between groups, using code from the Compare groups.R (Cg) file; this focuses the analysis on the most important elements and makes it computationally efficient.
    • Read the mapping file to extract the samples in each group (Cg: lines 11–20).
    • Read the gene abundance file (Cg: lines 22–25).
    • Then test for differential abundance using code from the Diff abundance.R (Da) file. This function returns the mean and median of each group along with the fold change and p-value (Da: lines 8–28).
    • Next, account for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) procedure (Cg: lines 38–41).
    • Finally, select the differentially expressed genes using an appropriate FDR cutoff, e.g., < 0.05 (GD: line 33) (Figure 2b).
  • If you have datasets from replicate experiments or different sample cohorts, we highly recommend performing the above steps for each experiment and then doing a meta-analysis [16,18,48,49] (GD: lines 39–47).
    • First, select genes whose fold change has a consistent direction across datasets (Check consistency.R), i.e., upregulated (or downregulated) in all experiments.
    • Second, for these genes, use Fisher’s method to combine the individual p-values (from the comparison test) across experiments into a single p-value (Calc combined.R).
    • Then apply significance thresholds (Apply sign cutoffs.R) to the individual p-value in each dataset (< 0.3), the combined (Fisher’s) p-value across datasets (< 0.05), and the FDR across the combined p-values (< 0.1) to identify consistently differentially abundant elements.
    • Requiring the same direction of regulation in all datasets and capping the individual p-value in each dataset controls for heterogeneity between datasets. Note that simply calculating a Fisher p-value and then applying FDR is not sufficient to accurately identify differential abundance/expression, because a very small p-value in a single dataset can make the combined p-value significant even when the other datasets show no effect.
  • Determining associations between elements (e.g., genes and/or microbes) is central to network reconstruction. The strength and sign of a correlation (GD: line 56) help determine whether two elements (i.e., biological entities represented by nodes in a network) have a positive or negative relationship. This information about potential relationships, computed with code from the Correlation in group.R (Cig) file, is crucial for interrogating and understanding the regulatory mechanisms between elements. Note that correlations are calculated using only the samples of one group (phenotypic class); samples from all groups are never pooled to estimate correlations, because pooling can produce correlations that merely reflect differences between the groups. Therefore, perform the following steps for each group separately.
    • Read the mapping file to extract the samples in a group (Cig: lines 11–18).
    • Read the gene abundance file (Cig: lines 20–23).
    • Then create pairs (Cig: lines 25–32) from the consistently differentially expressed genes obtained in the previous step.
    • Next, test each gene pair for correlation using code from the Calc cor.R (Ccr) file.
    • This function returns the correlation coefficient and p-value (Ccr: lines 8–17).
    • Next, account for multiple testing using the Benjamini-Hochberg FDR procedure (Cig: lines 48–50).
    • Finally, select the significantly correlated gene pairs using an appropriate FDR cutoff, e.g., < 0.1 (GD: line 60).
  • If you have datasets from replicate experiments or different sample cohorts, we highly recommend performing the above steps for each experiment and then doing a meta-analysis (GD: lines 65–72).
    • First, select gene pairs whose correlation has a consistent sign across datasets (Check consistency.R), i.e., positive (or negative) in all experiments.
    • The next steps, combining the individual p-values (from the correlation test) and applying multiple significance cutoffs, are the same as in the meta-analysis of genes.
  • At this point you have a network for a single group in which nodes are genes and edges indicate significant correlations. Next, we compute the proportion of unexpected correlations (PUC) [50] (GD: line 83). If two elements have a regulatory relationship, their correlation and their changes between groups should agree: two positively correlated genes in a group should change in the same direction between the two groups, whereas two negatively correlated genes should change in opposite directions. Edges for which the sign of the correlation does not correspond to the directions of change are unexpected and unlikely to contribute to the process under investigation; they are therefore discarded using code from the Puc compatiable network.R (Pcn) file.
    • First, identify the sign of the correlation for each gene pair (Pcn: lines 47–53).
    • Second, determine whether both genes in the pair have the same direction of regulation, i.e., fold change (Pcn: lines 56–65).
    • Pairs are expected and kept (Pcn: lines 70–80) if they satisfy either of these conditions:
      • positively correlated genes have the same fold change direction
      • negatively correlated genes have opposite fold change directions
  • At this point you have a network whose edges are consistent with the expected regulatory relationships. This network can now be systematically interrogated to answer different biological questions. Most often, network interrogation relies on identifying highly interconnected sets of nodes; such a subnetwork is called a module (or cluster). Identify clusters (GD: line 89) using the MCODE method in the Identify subnetworks.R file (Figure 3b).

  • Repeat the above steps for the microbial (or any other data type) abundance file(s) to obtain a heat map (Figure 2a) and clusters (Figure 3a) for each biological data type (e.g., genes, microbes). Refer to the code in the MicrobeDemo.R file.

  • The next step is to integrate the sub-networks into a transkingdom network using code from the GeneMicrobeDemo.R (GMD) file. Note that at this point you have already identified modules in the gene and microbe networks. As in the steps above, create pairs between nodes from the different modules (GMD: line 29), calculate correlations within a group (GMD: line 32), and identify significant pairs based on single-dataset (GMD: line 36) or meta-analysis (GMD: lines 41–49). Next, apply PUC analysis and remove unexpected edges from this transkingdom (gene-microbe) network, as was done for the gene expression (and microbial abundance) networks (GMD: line 58).

  • Combining the gene-gene correlations (edges from the gene sub-networks) (GMD: line 74), microbe-microbe correlations (edges from the microbe sub-networks) (GMD: line 77), and the gene-microbe correlations (GMD: line 80) creates the full transkingdom network (GMD: line 83) (Figure 4).

  • Finally, identify elements that are crucial for crosstalk between the different modules of the network using bipartite betweenness centrality (BBC) (GMD: lines 92–119). This approach calculates the shortest paths between nodes from different modules using code from the Get shortest paths.R file. The elements with the highest BBC (GMD: lines 123–128) are more likely to be critical in mediating the transfer of signals between the different modules of a network and are therefore candidates for further experimentation.
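The differential-abundance step (a comparison test per element followed by Benjamini-Hochberg correction) can be sketched in base R as follows. This is not the code in Diff abundance.R: the choice of Welch’s t-test, the definition of fold change as the ratio of group means (which assumes positive, non-log-transformed values), the helper name `compare_groups()`, and the toy data are all assumptions; see the Notes for alternative tests.

```r
# Minimal sketch (not the TransNetDemo code): compare two groups element by element,
# then control the false discovery rate with Benjamini-Hochberg.
# abund: elements x samples matrix; samples_a / samples_b: sample IDs of each group.
compare_groups <- function(abund, samples_a, samples_b) {
  stats <- t(apply(abund, 1, function(x) {
    a <- x[samples_a]
    b <- x[samples_b]
    c(mean_a = mean(a), mean_b = mean(b),
      median_a = median(a), median_b = median(b),
      fold_change = mean(a) / mean(b),       # assumes positive, non-log values
      p_value = t.test(a, b)$p.value)        # Welch's t-test (an assumption)
  }))
  res <- as.data.frame(stats)
  res$fdr <- p.adjust(res$p_value, method = "BH")
  res
}

# Toy data: two hypothetical elements measured in 10 samples (S1-S5 vs S6-S10)
abund <- rbind(up_gene   = c(10, 11, 12, 11, 10, 5, 6, 5, 6, 5),
               flat_gene = c(3, 4, 5, 4, 3, 3, 4, 5, 4, 3))
colnames(abund) <- paste0("S", 1:10)

res <- compare_groups(abund, paste0("S", 1:5), paste0("S", 6:10))
diff_elements <- rownames(res)[res$fdr < 0.05]   # FDR cutoff used in the text
```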
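The meta-analysis filters can be sketched as follows, assuming two or more independent experiments whose rows refer to the same elements in the same order. The helper name `meta_select()` and the toy values are hypothetical; the thresholds are those given in the text (individual p < 0.3, Fisher’s combined p < 0.05, FDR < 0.1). Direction consistency is checked on signed effects, so the same function applies to log fold changes (genes) and to correlation coefficients (gene pairs). Fisher’s method uses the statistic -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom for k independent p-values under the null hypothesis.

```r
# Minimal sketch (hypothetical helper name) of the meta-analysis filters.
# effect: elements x experiments matrix of signed effects (log fold changes or
#         correlation coefficients); pv: matching matrix of p-values.
meta_select <- function(effect, pv, p_indiv = 0.3, p_comb = 0.05, fdr_cut = 0.1) {
  # 1) Direction consistency: the same nonzero sign in every experiment
  consistent <- abs(rowSums(sign(effect))) == ncol(effect)
  pv_c <- pv[consistent, , drop = FALSE]
  # 2) Fisher's method: -2 * sum(log p) ~ chi-squared with 2k df (k independent tests)
  fisher_stat <- -2 * rowSums(log(pv_c))
  combined_p <- pchisq(fisher_stat, df = 2 * ncol(pv_c), lower.tail = FALSE)
  fdr <- p.adjust(combined_p, method = "BH")
  # 3) Keep elements passing all three cutoffs
  keep <- apply(pv_c < p_indiv, 1, all) & combined_p < p_comb & fdr < fdr_cut
  rownames(pv_c)[keep]
}

# Toy data: 3 hypothetical genes measured in 2 experiments
effect <- rbind(geneA = c(1.2, 0.8),    # up in both experiments
                geneB = c(1.0, -0.5),   # inconsistent direction
                geneC = c(-0.7, -0.9))  # down in both, but p = 0.4 in experiment 1
pv <- rbind(geneA = c(0.01, 0.02),
            geneB = c(0.001, 0.04),
            geneC = c(0.4, 0.001))
selected <- meta_select(effect, pv)
```

Here geneC has a significant combined p-value but fails the individual-p cutoff, which is exactly the heterogeneity guard the text describes.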
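Within-group correlation testing can be sketched in base R as below. Spearman correlation with asymptotic p-values (`exact = FALSE`), the helper name `correlate_within_group()`, and the toy data are assumptions; the key point the sketch illustrates is that only the samples of a single group enter the correlation, so the extra samples S9 and S10 are ignored.

```r
# Minimal sketch (hypothetical helper name): pairwise correlations within ONE group,
# followed by Benjamini-Hochberg correction.
correlate_within_group <- function(abund, group_samples, elements) {
  x <- abund[elements, group_samples, drop = FALSE]   # never pool samples across groups
  pairs <- t(combn(elements, 2))                      # every unordered element pair
  stats <- t(apply(pairs, 1, function(p) {
    ct <- cor.test(x[p[1], ], x[p[2], ], method = "spearman", exact = FALSE)
    c(r = unname(ct$estimate), p_value = ct$p.value)
  }))
  out <- data.frame(from = pairs[, 1], to = pairs[, 2], stats, stringsAsFactors = FALSE)
  out$fdr <- p.adjust(out$p_value, method = "BH")
  out
}

abund <- rbind(g1 = c(1, 2, 3, 4, 5, 6, 7, 8, 40, 10),
               g2 = c(2, 4, 6, 8, 10, 12, 14, 16, 1, 50),
               g3 = c(8, 7, 6, 5, 4, 3, 2, 1, 9, 9))
colnames(abund) <- paste0("S", 1:10)
hfhs_samples <- paste0("S", 1:8)   # hypothetical group membership

edges <- correlate_within_group(abund, hfhs_samples, c("g1", "g2", "g3"))
significant <- edges[edges$fdr < 0.1, ]   # FDR cutoff used in the text
```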
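The PUC filter reduces to a sign check. In this minimal sketch (helper and variable names hypothetical, values made up), `edges` holds one row per significant correlation and `log_fc` holds each element’s log fold change between the two groups. An edge is kept when a positive correlation joins elements changing in the same direction, or a negative correlation joins elements changing in opposite directions; the fraction of discarded edges is reported as the PUC.

```r
# Minimal sketch (hypothetical helper name) of PUC filtering.
# edges: data frame with columns from, to, r; log_fc: named vector of log fold changes.
puc_filter <- function(edges, log_fc) {
  fc_from <- log_fc[as.character(edges$from)]
  fc_to   <- log_fc[as.character(edges$to)]
  same_direction <- sign(fc_from) == sign(fc_to)
  expected <- (edges$r > 0 & same_direction) | (edges$r < 0 & !same_direction)
  list(puc = mean(!expected),                  # proportion of unexpected correlations
       network = edges[expected, , drop = FALSE])
}

edges <- data.frame(from = c("g1", "g1", "g2"),
                    to   = c("g2", "g3", "g3"),
                    r    = c(0.9, -0.7, 0.6),
                    stringsAsFactors = FALSE)
log_fc <- c(g1 = 1.0, g2 = 0.5, g3 = -0.8)

result <- puc_filter(edges, log_fc)   # the g2-g3 edge is unexpected and dropped
```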
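The text describes BBC only as being computed from shortest paths between nodes in different modules. The sketch below assumes one common reading: for every node, sum over all pairs (s in module A, t in module B) the fraction of shortest s-t paths passing through that node, excluding endpoints and without normalization. It uses plain breadth-first search on an unweighted, undirected graph; it is not the TransNetDemo implementation, and `bfs_counts()`, `bipartite_betweenness()`, and the toy network are hypothetical.

```r
# Minimal sketch (not the TransNetDemo code): bipartite betweenness on an unweighted,
# undirected network given as an edge list with columns `from` and `to`.

# Breadth-first search from one node: shortest-path distances and path counts.
bfs_counts <- function(adj, source) {
  dist <- rep(Inf, nrow(adj))
  sigma <- rep(0, nrow(adj))              # number of shortest paths from `source`
  dist[source] <- 0
  sigma[source] <- 1
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(adj[v, ])) {
      if (is.infinite(dist[w])) {         # w is reached for the first time
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
      if (dist[w] == dist[v] + 1) sigma[w] <- sigma[w] + sigma[v]
    }
  }
  list(dist = dist, sigma = sigma)
}

# For each node, sum over pairs (s in module_a, t in module_b) the fraction of
# shortest s-t paths that pass through it (endpoints excluded, no normalization).
bipartite_betweenness <- function(edges, module_a, module_b) {
  from <- as.character(edges$from)
  to <- as.character(edges$to)
  nodes <- sort(unique(c(from, to, module_a, module_b)))
  i <- match(from, nodes)
  j <- match(to, nodes)
  adj <- matrix(FALSE, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  adj[cbind(i, j)] <- TRUE
  adj[cbind(j, i)] <- TRUE                # undirected edges
  idx_a <- match(module_a, nodes)
  idx_b <- match(module_b, nodes)
  bfs_a <- lapply(idx_a, function(node) bfs_counts(adj, node))
  bfs_b <- lapply(idx_b, function(node) bfs_counts(adj, node))
  score <- setNames(numeric(length(nodes)), nodes)
  for (a in seq_along(idx_a)) {
    for (b in seq_along(idx_b)) {
      src <- idx_a[a]
      dst <- idx_b[b]
      d_st <- bfs_a[[a]]$dist[dst]
      if (src == dst || is.infinite(d_st)) next   # skip self-pairs and disconnected pairs
      # v lies on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t)
      on_path <- bfs_a[[a]]$dist + bfs_b[[b]]$dist == d_st
      on_path[c(src, dst)] <- FALSE
      score[on_path] <- score[on_path] +
        bfs_a[[a]]$sigma[on_path] * bfs_b[[b]]$sigma[on_path] / bfs_a[[a]]$sigma[dst]
    }
  }
  score
}

# Toy network: gene module {g1, g2}, microbe module {m1, m2, m3}
net <- data.frame(from = c("g1", "g2", "m1", "m1", "m2"),
                  to   = c("m1", "m1", "m2", "m3", "m3"),
                  stringsAsFactors = FALSE)
bbc <- bipartite_betweenness(net, module_a = c("g1", "g2"), module_b = c("m1", "m2", "m3"))
top_node <- names(which.max(bbc))
```

In this toy network every gene reaches m2 and m3 only through m1, so m1 collects all of the bipartite betweenness. Published BBC definitions may normalize the score; that choice is left out here.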

Figure 2: Heat map from hierarchical clustering of differentially abundant elements.


Rows indicate (a) microbes and (b) genes, while columns indicate samples. The pink and blue colors indicate samples belonging to groups A (HFHS) and B (NCD), respectively. Green and red indicate an increase and a decrease, respectively, in expression or abundance, and brighter colors indicate larger fold changes.

Figure 3: Clusters obtained from the correlation networks.


The PUC-compatible (a) microbe and (b) gene networks for an individual group (HFHS) are mined to identify densely connected sub-networks. Edges indicate significant correlations between elements.

Figure 4: Transkingdom network.


A full network, for the HFHS group, contains gene-gene, microbe-microbe, and gene-microbe edges. Edges indicate significant correlation between elements. The blue and pink colors indicate gene and microbe nodes, respectively. The labeled node has the highest BBC measurement among microbes and is therefore considered to be important and a potential causal player in the experiment.

Notes

The above protocol is intended as a step-by-step introduction to transkingdom network analysis. Although the experimental setup and analyses above should suffice in most cases, please see the following suggestions for alternative analyses.

  • Data normalization is a crucial step in analysis and network reconstruction [51], so choose a normalization method appropriate for your biological data [52,53]. No normalization method universally outperforms the others. However, if you are unsure which normalization to use, we recommend quantile normalization followed by log transformation, which in our experience works well for most biological data.

  • Depending on the experimental design and biological question, apply appropriate parametric (paired or unpaired t-test, analysis of variance (ANOVA), multivariate ANOVA (MANOVA), etc.) or non-parametric (Mann-Whitney/Wilcoxon rank sum test, Multi-response Permutation Procedures (MRPP), etc.) tests to identify differential abundance.

  • It is common practice to visualize the levels of differentially abundant elements. The code in the Heatmaps.R file can be used to visualize the significant genes and microbes from our example.

  • Depending on sample size, a Pearson or Spearman correlation between two elements measured in the same samples should suffice. However, to reduce indirect associations (e.g., two elements that are correlated only because both depend on a third), use partial correlation [54] or other methods [55].

  • The network analysis can be extended to identify differentially correlated genes in co-expression networks obtained for the different groups and uncover regulatory mechanisms in phenotypic transitions [56,57].

  • CFinder and the Markov cluster algorithm (MCL) [19] are other tools that can help identify modules in networks.

  • Network topology properties such as degree and centrality measures can also be inspected with NetworkAnalyzer in Cytoscape to identify important elements in the full transkingdom network.
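Quantile normalization followed by log transformation, the default recommended in these Notes, can be sketched in base R. This is an illustration only: ties are broken by order of appearance (one of several conventions), the pseudocount of 1 before `log2()` is an assumption, `quantile_normalize()` is a hypothetical name, and the matrix is made-up toy data. limma’s `normalizeQuantiles()` is an established alternative.

```r
# Minimal sketch (hypothetical helper name): quantile normalization of a features x samples matrix.
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")      # rank of each value within its sample
  reference <- unname(rowMeans(apply(mat, 2, sort)))       # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) reference[r])         # replace each value by the reference at its rank
  dimnames(out) <- dimnames(mat)
  out
}

expr <- matrix(c(5, 2, 3, 4,
                 4, 1, 4, 2,
                 3, 4, 6, 8),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), c("S1", "S2", "S3")))
qn <- quantile_normalize(expr)
log_qn <- log2(qn + 1)   # pseudocount of 1 is an assumption to avoid log2(0)
```

After normalization every sample has exactly the same distribution of values; only the ordering of features within each sample is retained.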
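Partial correlation, suggested in these Notes for reducing indirect associations, can be obtained from the inverse of the correlation matrix: the partial correlation of variables i and j given all others is -P_ij / sqrt(P_ii P_jj), where P is the precision (inverse correlation) matrix. The base-R sketch below (hypothetical helper name; simulated data) shows a strong marginal correlation between two genes that share a common driver largely disappearing once the driver is conditioned on. It requires more samples than variables so that the correlation matrix is invertible; CRAN packages such as ppcor and corpcor provide fuller implementations.

```r
# Minimal sketch (hypothetical helper name): full-order partial correlations.
# mat: samples in rows, variables in columns (the transpose of the abundance files).
partial_cor_matrix <- function(mat) {
  r <- cor(mat)
  precision <- solve(r)                 # requires more samples than variables
  d <- sqrt(diag(precision))
  pc <- -precision / outer(d, d)        # -P_ij / sqrt(P_ii * P_jj)
  diag(pc) <- 1
  dimnames(pc) <- dimnames(r)
  pc
}

# Simulated example: geneY and geneZ both depend on a shared driver
set.seed(1)
n <- 500
driver <- rnorm(n)
geneY <- driver + rnorm(n, sd = 0.5)
geneZ <- driver + rnorm(n, sd = 0.5)
dat <- cbind(driver, geneY, geneZ)

marginal <- cor(dat)["geneY", "geneZ"]                 # high: induced by the shared driver
partial  <- partial_cor_matrix(dat)["geneY", "geneZ"]  # near zero once the driver is accounted for
```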

Supplementary Material

File S1

Acknowledgements

The authors thank Karen N. D’Souza, Khiem Lam, and Dr. Xiaoxi Dong for their help in writing the book chapter. This work was supported by the NIH U01 AI109695 (AM) and R01 DK103761 (NS).

Contributor Information

Richard R. Rodrigues, College of Pharmacy, Oregon State University, 1601 SW Jefferson Way, Corvallis, Oregon 97331, USA.

Natalia Shulzhenko, College of Veterinary Medicine, Oregon State University, 105 Dryden Hall, 450 SW 30th Street, Corvallis, Oregon 97331, USA.

Andrey Morgun, College of Pharmacy, Oregon State University, 1601 SW Jefferson Way, Corvallis, Oregon 97331, USA.

References

  • 1.Schuster SC (2008) Next-generation sequencing transforms today’s biology. Nat Methods 5 (1): 16–18. doi: 10.1038/nmeth1156 [DOI] [PubMed] [Google Scholar]
  • 2.Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11 (1):31–46. doi : 10.1038/nrg2626 [DOI] [PubMed] [Google Scholar]
  • 3.Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17 (6):333–351. doi: 10.1038/nrg.2016.49 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24 (3): 133–141. doi: 10.1016/j.tig.2007.12.007 [DOI] [PubMed] [Google Scholar]
  • 5.Morozova O, Marra MA (2008) Applications of next-generation sequencing technologies in functional genomics. Genomics 92 (5):255–264. doi: 10.1016/j.ygeno.2008.07.001 [DOI] [PubMed] [Google Scholar]
  • 6.Erickson AR, Cantarel BL, Lamendella R, Darzi Y, Mongodin EF, Pan C, Shah M, Halfvarson J, Tysk C, Henrissat B, Raes J, Verberkmoes NC, Fraser CM, Hettich RL, Jansson JK (2012) Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn’s disease. PLoS One 7(11):e49138. doi: 10.1371/journal.pone.0049138 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Moreno-Risueno MA, Busch W, Benfey PN (2010) Omics meet networks - using systems approaches to infer regulatory networks in plants. Curr Opin Plant Biol 13 (2):126–131.doi: 10.1016/j.pbi.2009.11.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Imhann F, Vich Vila A, Bonder MJ, Fu J, Gevers D, Visschedijk MC, Spekhorst LM, Alberts R, Franke L, van Dullemen HM, Ter Steege RW, Huttenhower C, Dijkstra G, Xavier RJ, Festen EA, Wijmenga C, Zhernakova A, Weersma RK (2016) Interplay of host genetics and gut microbiota underlying the onset and clinical presentation of inflammatory bowel disease. Gut. doi: 10.1136/gutjnl-2016-312135 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Joyce AR, Palsson BO (2006) The model organism as a system: integrating ‘omics’ data sets. Nat Rev Mol Cell Biol 7 (3):198–210. doi: 10.1038/nrm1857 [DOI] [PubMed] [Google Scholar]
  • 10.Gehlenborg N, O’Donoghue SI, Baliga NS, Goesmann A, Hibbs MA, Kitano H, Kohlbacher O, Neuweger H, Schneider R, Tenenbaum D, Gavin AC (2010) Visualization of omics data for systems biology. Nat Methods 7 (3 Suppl):S56–68. doi: 10.1038/nmeth.1436 [DOI] [PubMed] [Google Scholar]
  • 11.Poirel CL, Rahman A, Rodrigues RR, Krishnan A, Addesa JR, Murali TM (2013) Reconciling differential gene expression data with molecular interaction networks. Bioinformatics 29 (5):622–629. doi: 10.1093/bioinformatics/btt007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Zhang W, Li F, Nie L (2010) Integrating multiple ‘omics’ analysis for microbial biology: application and methodologies. Microbiology 156 (Pt 2):287–301. doi: 10.1099/mic.0.034793-0 [DOI] [PubMed] [Google Scholar]
  • 13.Greer R, Dong X, Morgun A, Shulzhenko N (2016) Investigating a holobiont: Microbiota perturbations and transkingdom networks. Gut Microbes 7 (2): 126–135.doi: 10.1080/19490976.2015.1128625 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Greer RL, Dong X, Moraes AC, Zielke RA, Fernandes GR, Peremyslova E, Vasquez-Perez S, Schoenborn AA, Gomes EP, Pereira AC, Ferreira SR, Yao M, Fuss IJ, Strober W, Sikora AE, Taylor GA, Gulati AS, Morgun A, Shulzhenko N (2016) Akkermansia muciniphila mediates negative effects of IFNgamma on glucose metabolism. Nat Commun 7:13329. doi: 10.1038/ncomms13329 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Morgun A, Dzutsev A, Dong X, Greer RL, Sexton DJ, Ravel J, Schuster M, Hsiao W, Matzinger P, Shulzhenko N (2015) Uncovering effects of antibiotics on the host and microbiota using transkingdom gene networks. Gut 64 (11): 1732–1743. doi: 10.1136/gutjnl-2014-308820 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Mine KL, Shulzhenko N, Yambartsev A, Rochman M, Sanson GF, Lando M, Varma S, Skinner J, Volfovsky N, Deng T, Brenna SM, Carvalho CR, Ribalta JC, Bustin M, Matzinger P, Silva ID, Lyng H, Gerbase-DeLima M, Morgun A (2013) Gene network reconstruction reveals cell cycle and antiviral genes as major drivers of cervical cancer. Nat Commun 4:1806. doi: 10.1038/ncomms2693 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Schirmer M, Smeekens SP, Vlamakis H, Jaeger M, Oosting M, Franzosa EA, Jansen T, Jacobs L, Bonder MJ, Kurilshikov A, Fu J, Joosten LA, Zhernakova A, Huttenhower C, Wijmenga C, Netea MG, Xavier RJ (2016) Linking the Human Gut Microbiome to Inflammatory Cytokine Production Capacity. Cell 167 (4):1125–1136 e1128. doi: 10.1016/j.cell.2016.10.020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Shulzhenko N, Morgun A, Hsiao W, Battle M, Yao M, Gavrilova O, Orandle M, Mayer L, Macpherson AJ, McCoy KD, Fraser-Liggett C, Matzinger P (2011) Crosstalk between B lymphocytes, microbiota and the intestinal epithelium governs immunity versus metabolism in the gut. Nat Med 17 (12): 1585–1593. doi: 10.1038/nm.2505 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Dong X, Yambartsev A, Ramsey SA, Thomas LD, Shulzhenko N, Morgun A (2015) Reverse enGENEering of Regulatory Networks from Big Data: A Roadmap for Biologists. Bioinform Biol Insights 9:61–74. doi: 10.4137/BBI.S12467 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7 (5):335–336. doi: 10.1038/nmeth.f.303 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7 (3):562–578. doi: 10.1038/nprot.2012.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Laird PW (2010) Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet 11 (3): 191–203. doi: 10.1038/nrg2732 [DOI] [PubMed] [Google Scholar]
  • 23.Krumm N, Sudmant PH, Ko A, O’Roak BJ, Malig M, Coe BP, Project NES, Quinlan AR, Nickerson DA, Eichler EE (2012) Copy number variation detection and genotyping from exome sequence data. Genome Res 22 (8): 1525–1532. doi: 10.1101/gr.138115.112 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Perez-Diez A, Morgun A, Shulzhenko N (2007) Microarrays for cancer diagnosis and classification. Adv Exp Med Biol 593:74–85. doi: 10.1007/978-0-387-39978-2_8
  • 25.Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X (2014) Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS One 9 (1):e78644. doi: 10.1371/journal.pone.0078644
  • 26.Schmieder R, Edwards R (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics 27 (6):863–864. doi: 10.1093/bioinformatics/btr026
  • 27.Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17 (1):10. doi: 10.14806/ej.17.1.200
  • 28.Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, Macmanes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, Leduc RD, Friedman N, Regev A (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8 (8):1494–1512. doi: 10.1038/nprot.2013.084
  • 29.Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5 (7):621–628. doi: 10.1038/nmeth.1226
  • 30.Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11 (10):R106. doi: 10.1186/gb-2010-11-10-r106
  • 31.McCarthy DJ, Chen Y, Smyth GK (2012) Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res 40 (10):4288–4297. doi: 10.1093/nar/gks042
  • 32.Gautier L, Cope L, Bolstad BM, Irizarry RA (2004) affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20 (3):307–315. doi: 10.1093/bioinformatics/btg405
  • 33.Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43 (7):e47. doi: 10.1093/nar/gkv007
  • 34.Stackebrandt E, Goebel BM (1994) Taxonomic Note: A Place for DNA-DNA Reassociation and 16S rRNA Sequence Analysis in the Present Species Definition in Bacteriology. International Journal of Systematic and Evolutionary Microbiology 44 (4):846–849. doi: 10.1099/00207713-44-4-846
  • 35.Lane DJ, Pace B, Olsen GJ, Stahl DA, Sogin ML, Pace NR (1985) Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc Natl Acad Sci U S A 82 (20):6955–6959
  • 36.Brookman JL, Mennim G, Trinci AP, Theodorou MK, Tuckwell DS (2000) Identification and characterization of anaerobic gut fungi using molecular methodologies based on ribosomal ITS1 and 18S rRNA. Microbiology 146 (Pt 2):393–403. doi: 10.1099/00221287-146-2-393
  • 37.Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Fungal Barcoding Consortium (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci U S A 109 (16):6241–6246. doi: 10.1073/pnas.1117018109
  • 38.Sharpton TJ (2014) An introduction to the analysis of shotgun metagenomic data. Front Plant Sci 5:209. doi: 10.3389/fpls.2014.00209
  • 39.Kuczynski J, Stombaugh J, Walters WA, Gonzalez A, Caporaso JG, Knight R (2011) Using QIIME to analyze 16S rRNA gene sequences from microbial communities. Curr Protoc Bioinformatics Chapter 10:Unit 10.7. doi: 10.1002/0471250953.bi1007s36
  • 40.Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75 (23):7537–7541. doi: 10.1128/AEM.01541-09
  • 41.Paulson JN, Stine OC, Bravo HC, Pop M (2013) Differential abundance analysis for microbial marker-gene surveys. Nat Methods 10 (12):1200–1202. doi: 10.1038/nmeth.2658
  • 42.Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386. doi: 10.1186/1471-2105-9-386
  • 43.Huson DH, Weber N (2013) Microbial community analysis using MEGAN. Methods Enzymol 531:465–485. doi: 10.1016/B978-0-12-407863-5.00021-6
  • 44.Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C (2012) Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods 9 (8):811–814. doi: 10.1038/nmeth.2066
  • 45.Lindgreen S, Adair KL, Gardner PP (2016) An evaluation of the accuracy and speed of metagenome analysis tools. Sci Rep 6:19233. doi: 10.1038/srep19233
  • 46.Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10 (3):R25. doi: 10.1186/gb-2009-10-3-r25
  • 47.Buchfink B, Xie C, Huson DH (2015) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12 (1):59–60. doi: 10.1038/nmeth.3176
  • 48.Rodrigues RR, Barry CT (2011) Gene pathway analysis of hepatocellular carcinoma genomic expression datasets. J Surg Res 170 (1):e85–92. doi: 10.1016/j.jss.2011.04.004
  • 49.Morgun A, Shulzhenko N, Perez-Diez A, Diniz RV, Sanson GF, Almeida DR, Matzinger P, Gerbase-DeLima M (2006) Molecular profiling improves diagnoses of rejection and infection in transplanted organs. Circ Res 98 (12):e74–83. doi: 10.1161/01.RES.0000228714.15691.8a
  • 50.Yambartsev A, Perlin MA, Kovchegov Y, Shulzhenko N, Mine KL, Dong X, Morgun A (2016) Unexpected links reflect the noise in networks. Biol Direct 11 (1):52. doi: 10.1186/s13062-016-0155-0
  • 51.Saccenti E (2016) Correlation Patterns in Experimental Data Are Affected by Normalization Procedures: Consequences for Data Analysis and Network Inference. J Proteome Res. doi: 10.1021/acs.jproteome.6b00704
  • 52.Hua YJ, Tu K, Tang ZY, Li YX, Xiao HS (2008) Comparison of normalization methods with microRNA microarray. Genomics 92 (2):122–128. doi: 10.1016/j.ygeno.2008.04.002
  • 53.Li P, Piao Y, Shon HS, Ryu KH (2015) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16:347. doi: 10.1186/s12859-015-0778-7
  • 54.de la Fuente A, Bing N, Hoeschele I, Mendes P (2004) Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20 (18):3565–3574. doi: 10.1093/bioinformatics/bth445
  • 55.Weiss S, Van Treuren W, Lozupone C, Faust K, Friedman J, Deng Y, Xia LC, Xu ZZ, Ursell L, Alm EJ, Birmingham A, Cram JA, Fuhrman JA, Raes J, Sun F, Zhou J, Knight R (2016) Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. ISME J 10 (7):1669–1681. doi: 10.1038/ismej.2015.235
  • 56.Thomas LD, Vyshenska D, Shulzhenko N, Yambartsev A, Morgun A (2016) Differentially correlated genes in co-expression networks control phenotype transitions. F1000Research 5:2740. doi: 10.12688/f1000research.9708.1
  • 57.Skinner J, Kotliarov Y, Varma S, Mine KL, Yambartsev A, Simon R, Huyen Y, Morgun A (2011) Construct and Compare Gene Coexpression Networks with DAPfinder and DAPview. BMC Bioinformatics 12:286. doi: 10.1186/1471-2105-12-286

Associated Data

Supplementary Materials

File S1