Abstract
Evaluating changes in metabolic pathway activity is essential for studying disease mechanisms and developing new treatments, with significant benefits for human health. Here, we propose EMPathways2, a maximum likelihood pipeline based on the expectation-maximization algorithm that estimates enzyme expression and metabolic pathway activity levels. We first estimate enzyme expression from RNA-seq data and then use it to simultaneously estimate pathway activity levels and the participation level of each enzyme in each pathway. We apply the pipeline to RNA-seq data from several groups of mice, which provides a deeper look at the biochemical changes occurring as a result of bacterial infection, disease, and immune response. Our results show that the estimated enzyme expression, pathway activity levels, and enzyme participation levels in each pathway are robust and stable across all samples. The estimated activity levels of a significant number of metabolic pathways strongly correlate with the infected and uninfected status of the respective rodent species.
Keywords: enzyme expression, expectation maximization, maximum likelihood, metabolic pathway activity, mice
1. INTRODUCTION
Lyme disease, a significant public health concern, is caused by the tick-borne spirochete Borreliella burgdorferi (B. burgdorferi). Mammalian host species respond differently to B. burgdorferi infection, with varying disease manifestations (Mead, 2015; Kugeler et al., 2021). For instance, certain strains of Mus musculus (M. musculus) develop severe arthritis, while other rodents, such as Peromyscus leucopus (P. leucopus), show no visible disease symptoms after infection (Barthold et al., 1990; Crandall et al., 2006; Schwanz et al., 2011). This study focuses on infection-induced changes in gene expression to understand the potential mechanisms of disease tolerance in P. leucopus mice. Comparing the transcriptomic responses to B. burgdorferi infection of P. leucopus and M. musculus (C3H/HeJ, hereafter referred to as C3H) should shed light on the disease-tolerance capacities of P. leucopus. Mice have been the experimental tool of choice for the vast majority of immunologists, and studying their immune responses has yielded tremendous insight into the inner workings of the human immune system (Masopust et al., 2017). Humans and mice share approximately 70% of the same protein-coding gene sequences (Margolin, 2000). Therefore, analyzing the activity of metabolic pathways in mice with contrasting immune responses is important for gaining a deeper understanding of the human immune system. Measuring the functional activity, enrichment, and interaction of metabolic pathways in rodent groups with contrasting health conditions is essential for understanding the biochemical and metabolic changes that may occur in humans during stress or disease. Despite many advances in using biomolecules (DNA, RNA, enzymes) to assess biochemical changes in mice, it remains challenging to quantify how the expression of individual enzymes contributes to the activity of multi-enzyme metabolic pathways.
In this study, we analyze differentially active metabolic pathways from RNA sequencing data to generate an efficient model for understanding metabolic pathway activity changes (Subramanian et al., 2005; Efron and Tibshirani, 2007; Mitrea et al., 2013; Shen et al., 2019). Even though advances in high-throughput sequencing have aided the exploration of RNA-seq data, it is often challenging to analyze metabolic pathway activity changes in organisms with varying health conditions, notably because existing pathway analysis tools (e.g., MinPath, MetaPathways, MEGAN4) often yield variable conclusions about pathway activity from RNA data (Huson et al., 2011; Konwar et al., 2013; Ye and Doak, 2009; Sharon et al., 2011). To overcome these challenges, we previously developed a workflow that uses a maximum likelihood-based model and annotations from the KEGG database (Kanehisa and Goto, 2000) to estimate transcript frequency, enzyme expression, enzyme participation in pathways, and metabolic pathway activity in microbial communities (Rondel et al., 2020; Rondel et al., 2021).
In this article, we test this model using transcriptomic data from mice infected with B. burgdorferi, an agent of Lyme disease, and their uninfected controls. The data describe infected and uninfected groups of two rodent species, M. musculus and P. leucopus, which allows us to elucidate the complex metabolic pathway activity changes between rodents with inherent tolerance to B. burgdorferi infection (P. leucopus mice) and those that develop Lyme disease [a laboratory strain of C3H/HeJ (C3H) mice]. The proposed methodology uses maximum likelihood estimation to infer pathway activity while accounting for each enzyme's participation. First, we filtered mouse-specific metabolic pathways from the KEGG database and merged the expression of enzymes represented by the same group of genes. We then adapted our expectation-maximization (EM) algorithm-based pipeline (Rondel et al., 2021), improved it by estimating the participation level of each enzyme in each pathway, and used these estimates for more accurate predictions of pathway activity. Our contributions include:
Estimation of metabolic enzyme expression, identification of groups of rodents’ enzymes that are represented by the same group of genes.
Estimation of enzyme-in-pathway coefficients and confirmation that they are more stable than those estimated for microbial communities in Rondel et al. (2021). Additionally, we show that these coefficients do not vary significantly between infected and uninfected mice of each species.
Differential analysis of metabolic pathway activity in P. leucopus and C3H mice, uninfected and infected with B. burgdorferi.
The rest of the article is organized as follows. In the next section, we describe the pipeline of our software framework and several EM-based algorithms for estimating enzyme expression and metabolic pathway activity between two rodent species. Further on, we describe our data including sequencing data and extraction of metabolic enzymes and pathways. Finally, we use our results to provide a statistical validation of the proposed pipeline.
2. MATERIALS AND METHODS
2.1. Data procurement
The data for this study were acquired from a previous experiment conducted by Gaber et al. (2023). They comprise 12 male mice, 6 P. leucopus and 6 C3H, which were split into four groups of three. Half of these mice were subcutaneously inoculated with B. burgdorferi 297, while the remaining mice were injected with a sterile saline solution as a control group. Following inoculation, blood samples were taken from all 12 mice to confirm infection. On day 70 post-inoculation, various tissues were harvested and cultured to examine the presence or absence of viable spirochetes. Spleens were harvested from all mice, preserved, and stored at −80 °C until RNA extraction was performed.
2.2. Pipeline for estimating metabolic pathway activity of C3H and P. leucopus mice
Previously, we created a pipeline for estimating metabolic pathway activity levels in a microbial community (Rondel et al., 2021) and explored differential pathway activity within that community under different conditions. Microbial communities contain diverse species and can be hard to interpret because of the large number of species present in the samples.
Below, we describe our new metabolic pathway activity pipeline, EMPathways2 (see Fig. 1), which estimates pathway activities in mice. Its models are solved using the EM algorithm.
FIG. 1.
EMPathways2 pipeline for metabolic pathway analysis of the rodent samples. RNA obtained from the rodents is sequenced, and the raw reads are mapped to genes. The genes are further mapped into the enzyme–pathway database. Gene expression is obtained using IsoEM2 (Mandric et al., 2017). Enzyme expression is then estimated from gene expression. Finally, the pathway activity levels and enzyme participation coefficients are estimated in the feedback loop.
The entire pipeline EMPathways2 consists of the following five steps:
The first step is the collection of samples from infected and uninfected rodent groups, whose RNA is then sequenced.
RNA-seq reads are mapped to the reference transcriptomes of C3H and P. leucopus mice obtained from the NCBI reference database. The mapped reads are used by IsoEM2 to generate gene expression data (Mandric et al., 2017).
We use KEGG to establish the many-to-many correspondence between genes and enzymes (see Section 2.3) and estimate enzyme expression from gene expression using EM (see Fig. 1).
Unstable enzymes, whose estimates converge inconsistently, are identified, grouped, and collapsed (see Section 2.4).
The feedback loop, based on the inferred enzyme expressions and the metabolic pathway annotation, simultaneously estimates enzyme participation coefficients and metabolic pathway activity levels (see Section 2.5).
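The third step, estimating enzyme expression from gene expression, is where the many-to-many gene–enzyme correspondence enters the model. The likelihood is not spelled out here, so the following is a minimal sketch under an assumed mixture formulation, not the EMPathways2 code: each enzyme $e$ spreads its expression $f_e$ uniformly over its gene set $G_e$, and the observed expression of a gene is divided among the enzymes that share it in proportion to $f_e/|G_e|$. The function name `em_enzyme_expression` and its parameters are hypothetical. Random starting points are used so that repeated runs can reveal enzymes the model cannot tell apart.

```python
import random


def em_enzyme_expression(gene_expr, genes_of_enzyme, max_iter=1000, tol=1e-10, seed=None):
    """Estimate enzyme frequencies f_e from gene expression with a mixture-model EM.

    Assumed (illustrative) model: enzyme e emits expression uniformly over its
    gene set G_e, so a gene receives expression from every enzyme containing it.
    """
    rng = random.Random(seed)
    enzymes = sorted(genes_of_enzyme)

    # Invert the many-to-many mapping once: gene -> enzymes that contain it.
    enzymes_of_gene = {}
    for e in enzymes:
        for g in genes_of_enzyme[e]:
            enzymes_of_gene.setdefault(g, []).append(e)

    # Random positive start normalised to 1; restarts expose unstable enzymes.
    f = {e: rng.random() + 1e-3 for e in enzymes}
    norm = sum(f.values())
    f = {e: v / norm for e, v in f.items()}

    for _ in range(max_iter):
        # E-step: split each gene's expression among its enzymes in
        # proportion to f_e / |G_e|.
        assigned = dict.fromkeys(enzymes, 0.0)
        for g, x in gene_expr.items():
            candidates = enzymes_of_gene.get(g, [])
            weights = [f[e] / len(genes_of_enzyme[e]) for e in candidates]
            z = sum(weights)
            if z == 0:
                continue  # gene not linked to any enzyme
            for e, w in zip(candidates, weights):
                assigned[e] += x * w / z

        # M-step: normalise the expected totals into new frequencies.
        total = sum(assigned.values())
        new_f = {e: assigned[e] / total for e in enzymes}
        delta = sum(abs(new_f[e] - f[e]) for e in enzymes)
        f = new_f
        if delta < tol:
            break
    return f


# Toy data (hypothetical): E1 and E2 use exactly the same genes.
genes = {"E1": {"g1", "g2"}, "E2": {"g1", "g2"}, "E3": {"g3"}}
expr = {"g1": 2.0, "g2": 2.0, "g3": 4.0}
for seed in (1, 2):
    f = em_enzyme_expression(expr, genes, seed=seed)
    print(seed, round(f["E1"], 3), round(f["E2"], 3), round(f["E1"] + f["E2"], 3))
```

In this toy case, different seeds give different individual values for E1 and E2, but their sum is the same in every run, which is the kind of behavior reported in Table 1.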
2.3. Mapping between genes, enzymes, and pathways for C3H and P. leucopus mice
The KEGG metabolic pathway database contains information on metabolic pathways across a wide range of living organisms. However, the scope of EMPathways2 is the analysis of metabolic pathways in rodents, so we concentrate on 152 metabolic pathways and 2386 enzymes that play a significant role in mouse metabolism, as confirmed by literature indexed in PubMed.
To compute metabolic pathway activity levels, EMPathways2 requires as input a correspondence between genes and enzymes and a dictionary of the enzymes participating in each metabolic pathway. Gene–enzyme mappings were extracted from the NCBI Entrez Molecular Sequence Database System, which provides consolidated access to nucleotide, protein sequence, gene-centered, and genomic mapping data, and enzyme–pathway mappings were extracted from the KEGG PATHWAY database. We used the KEGG and NCBI APIs to collect the raw data and produce the gene-to-enzyme and enzyme-to-pathway correspondences. From these data we created, for every enzyme, the set of genes participating in its production and, for every metabolic pathway, the set of enzymes required for its functional activity.
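The two set-valued dictionaries described above can be built from flat link pairs. The following is a minimal sketch that assumes the raw NCBI and KEGG records have already been reduced to `(gene, enzyme)` and `(enzyme, pathway)` tuples; the helper name `build_mappings`, the `keep_pathways` filter (standing in for the curated list of mouse pathways), and the gene identifiers in the example are hypothetical.

```python
from collections import defaultdict


def build_mappings(gene_enzyme_pairs, enzyme_pathway_pairs, keep_pathways=None):
    """Build set-valued lookups from flat (gene, enzyme) and (enzyme, pathway) pairs.

    keep_pathways (optional) restricts the pathway dictionary to a curated list,
    e.g. the mouse pathways retained for the analysis.
    """
    genes_of_enzyme = defaultdict(set)     # enzyme -> genes involved in producing it
    enzymes_of_gene = defaultdict(set)     # gene -> enzymes it contributes to (many-to-many)
    for gene, enzyme in gene_enzyme_pairs:
        genes_of_enzyme[enzyme].add(gene)
        enzymes_of_gene[gene].add(enzyme)

    enzymes_of_pathway = defaultdict(set)  # pathway -> enzymes required for its activity
    for enzyme, pathway in enzyme_pathway_pairs:
        if keep_pathways is None or pathway in keep_pathways:
            enzymes_of_pathway[pathway].add(enzyme)

    return dict(genes_of_enzyme), dict(enzymes_of_gene), dict(enzymes_of_pathway)


# Hypothetical gene identifiers; EC numbers and pathway IDs follow KEGG style.
g2e = [("geneA", "EC:1.1.1.1"), ("geneB", "EC:1.1.1.1"), ("geneB", "EC:2.6.1.52")]
e2p = [("EC:1.1.1.1", "ec00010"), ("EC:1.1.1.1", "ec00620"), ("EC:2.6.1.52", "ec00260")]
genes_of_enzyme, enzymes_of_gene, enzymes_of_pathway = build_mappings(
    g2e, e2p, keep_pathways={"ec00620", "ec00260"}
)
```

Keeping both directions of the gene–enzyme mapping makes the many-to-many structure explicit: `geneB` contributes to two enzymes, and `EC:1.1.1.1` draws on two genes.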
2.4. Enzyme grouping
There is a many-to-many correspondence between genes and enzymes, which complicates the computation of enzyme expression. To resolve this, we use a maximum likelihood EM model to infer enzyme expression from gene expression, which converges consistently in the vast majority of cases. However, some enzymes share genes, and for some enzymes all genes are a subset of the genes used to produce another enzyme. In some of these cases, EM cannot discern an enzyme from its genetic relatives and converges to different values from one run to another. Enzymes that fail to converge consistently are labeled unstable. After a few runs of gene–enzyme EM, we observe clusters of enzymes whose individual expressions vary from run to run, while the sum of their expressions always converges to the same value (see Table 1). Such enzymes are indistinguishable to our algorithm and must be treated as single entities. To establish these groups, we run EM, produce expression values for every enzyme, and identify clusters of enzymes whose individual expressions do not converge consistently but whose summed expression always does. After all unstable enzyme groups are found, each group is collapsed into one enzyme (see Fig. 2A), namely the member with the lowest EC number. The collapsed enzyme is then used to compute the activity levels of all related pathways (see Fig. 2B). In total, we found and collapsed 59 pairs, 3 triplets, and 1 quadruplet of indistinguishable enzymes. Table 2 lists the triplets and the quadruplet found in mice.
We compared the list of collapsed enzymes for microbial communities in Rondel et al. (2021) with the list of collapsed enzymes in rodents and found 28 pairs common to both datasets.
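The grouping criterion can be checked mechanically once EM has been run several times from different starting points. The sketch below is an illustration rather than the EMPathways2 implementation: the name `find_stable_groups`, the absolute tolerance `tol`, and the exhaustive search over groups of up to four enzymes (the largest group size reported here) are assumptions. Its input is one dictionary of enzyme expressions per run, as in Table 1.

```python
from itertools import combinations


def spread(values):
    """Range of a sequence of estimates across EM runs."""
    return max(values) - min(values)


def find_stable_groups(runs, tol=1e-3, max_size=4):
    """Find unstable enzymes and disjoint groups of them whose summed expression is stable.

    runs: one dict (enzyme -> estimated expression) per EM run from a different start.
    """
    enzymes = sorted(runs[0])
    unstable = [e for e in enzymes if spread([r[e] for r in runs]) > tol]
    grouped, groups = set(), []
    # Try the smallest groups first (pairs, then triplets, ...), never reusing an enzyme.
    for size in range(2, max_size + 1):
        for combo in combinations(unstable, size):
            if grouped.intersection(combo):
                continue
            sums = [sum(r[e] for e in combo) for r in runs]
            if spread(sums) <= tol:
                groups.append(combo)
                grouped.update(combo)
    return unstable, groups


# The five runs of Table 1, plus a hypothetical stable enzyme for contrast.
table1 = zip([0.054, 0.311, 0.251, 0.317, 0.12], [0.404, 0.147, 0.207, 0.141, 0.338])
runs = [{"EC:3.1.3.12": a, "EC:2.4.1.15": b, "EC:1.1.1.1": 0.2} for a, b in table1]
print(find_stable_groups(runs))
```

Exhaustive search over combinations is only practical when few enzymes are unstable; restricting candidate groups to enzymes that actually share genes would be a natural refinement.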
Table 1.
A Pair of Individually Unstable Enzymes That Are Stable When Summed into a Group
| Enzymes | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| EC:3.1.3.12 | 0.054 | 0.311 | 0.251 | 0.317 | 0.12 |
| EC:2.4.1.15 | 0.404 | 0.147 | 0.207 | 0.141 | 0.338 |
| Sum | 0.458 | 0.458 | 0.458 | 0.458 | 0.458 |
FIG. 2.
(A) Enzymes that cannot be distinguished from each other must be treated as groups. (B) Enzymes that are unstable are collapsed into a single enzyme with the lowest EC nomenclature number.
Table 2.
Three Triplets and One Quadruplet of Collapsed Enzymes
| Triplet 1 | Triplet 2 | Triplet 3 | Quadruplet |
|---|---|---|---|
| EC:1.1.1.51 | EC:6.3.4.13 | EC:2.1.3.2 | EC:6.3.4.9 |
| EC:1.1.1.213 | EC:6.3.3.1 | EC:6.3.5.5 | EC:6.3.4.10 |
| EC:1.1.1.188 | EC:2.1.2.2 | EC:3.5.2.3 | EC:6.3.4.11 |
|  |  |  | EC:6.3.4.15 |
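Collapsing a group to the enzyme with the lowest EC number requires comparing EC numbers field by field as integers. Compared as strings, EC:1.1.1.188 would sort before EC:1.1.1.51 in Triplet 1, and EC:6.3.4.10 before EC:6.3.4.9 in the quadruplet. A minimal sketch, assuming fully numeric EC numbers (preliminary numbers containing letters are not handled) and hypothetical helper names `ec_key`, `collapse_groups`, and `collapse_pathways`:

```python
def ec_key(ec):
    """'EC:1.1.1.51' -> (1, 1, 1, 51), so EC numbers compare numerically, not as strings."""
    return tuple(int(part) for part in ec.split(":", 1)[-1].split("."))


def collapse_groups(expr, groups):
    """Merge each group of indistinguishable enzymes into its lowest-numbered member.

    Returns the collapsed expression dict (group members' values summed) and the
    enzyme -> representative renaming used.
    """
    rename = {}
    for group in groups:
        representative = min(group, key=ec_key)
        for ec in group:
            rename[ec] = representative
    collapsed = {}
    for ec, value in expr.items():
        rep = rename.get(ec, ec)
        collapsed[rep] = collapsed.get(rep, 0.0) + value
    return collapsed, rename


def collapse_pathways(enzymes_of_pathway, rename):
    """Replace group members by their representative in every pathway's enzyme set."""
    return {w: {rename.get(e, e) for e in es} for w, es in enzymes_of_pathway.items()}


groups = [("EC:1.1.1.51", "EC:1.1.1.213", "EC:1.1.1.188"),
          ("EC:6.3.4.9", "EC:6.3.4.10", "EC:6.3.4.11", "EC:6.3.4.15")]
expr = {"EC:1.1.1.51": 0.1, "EC:1.1.1.213": 0.2, "EC:1.1.1.188": 0.3, "EC:2.6.1.52": 0.4}
collapsed, rename = collapse_groups(expr, groups)
print(collapsed)
```

The expression values in the example are made up; only the group memberships come from Table 2.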
2.5. Feedback loop for pathway activity level estimation
Each enzyme $e$ in pathway $w$ is initially assigned a participation coefficient $p_{ew} = 1/N_w$, where $N_w$ is the total number of enzymes in pathway $w$. The feedback loop for pathway activity updates the enzyme participation levels by fitting expected enzyme expressions to the expressions estimated by the EM for enzyme expression.
The initial estimate of the participation level of an enzyme $e$ in a pathway $w$ may be far from accurate, and more accurate estimates of enzyme participation can lead to more accurate estimates of pathway activity levels. Our algorithm first estimates enzyme expression from gene expression using the EM for enzyme expression, in which the E-step computes the expected expression and the M-step produces the new estimates. After computing enzyme expressions, we separate enzymes with stable expression from those with unstable expression and group the latter (see Section 2.4). Pathway activity levels are then computed using the EM for pathway activity level.
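The EM for pathway activity level is not written out in this section. The following is a minimal sketch under stated assumptions: enzyme frequencies $f_e$ sum to 1, each pathway's participation coefficients $p_{ew}$ sum to 1 over its enzymes, and the expected expression of enzyme $e$ is $\sum_w f_w p_{ew}$. The E-step then attributes $f_{ew} = f_e f_w p_{ew} / \sum_{w'} f_{w'} p_{ew'}$ to each pathway, and the M-step normalizes $\sum_e f_{ew}$ into new activity frequencies $f_w$. The function name is hypothetical, and any rescaling applied to the reported activity levels is not modeled.

```python
def em_pathway_activity(enzyme_expr, participation, max_iter=1000, tol=1e-9):
    """Estimate pathway activity frequencies f_w with participation coefficients held fixed.

    enzyme_expr:   enzyme -> f_e (assumed to sum to 1)
    participation: pathway -> {enzyme: p_ew}, each inner dict summing to 1
    """
    pathways = sorted(participation)
    f_w = {w: 1.0 / len(pathways) for w in pathways}  # uniform starting point
    for _ in range(max_iter):
        attributed = dict.fromkeys(pathways, 0.0)
        for e, f_e in enzyme_expr.items():
            # E-step: split f_e over the pathways containing e in proportion to f_w * p_ew.
            weights = {w: f_w[w] * participation[w][e] for w in pathways if e in participation[w]}
            z = sum(weights.values())
            if z == 0:
                continue  # enzyme not used by any pathway
            for w, weight in weights.items():
                attributed[w] += f_e * weight / z
        # M-step: normalise the attributed expression into new activity frequencies.
        total = sum(attributed.values())
        new_f_w = {w: attributed[w] / total for w in pathways}
        delta = sum(abs(new_f_w[w] - f_w[w]) for w in pathways)
        f_w = new_f_w
        if delta < tol:
            break
    return f_w


# Toy example: e2 is shared by pathways A and B; the enzyme expressions were
# generated from f_A = 0.25, f_B = 0.75 with p_ew = 0.5 throughout.
participation = {"A": {"e1": 0.5, "e2": 0.5}, "B": {"e2": 0.5, "e3": 0.5}}
f = em_pathway_activity({"e1": 0.125, "e2": 0.5, "e3": 0.375}, participation)
print(round(f["A"], 6), round(f["B"], 6))  # 0.25 0.75
```

Because the toy enzyme expressions fit the model exactly, the EM recovers the generating activities; the shared enzyme e2 is split between A and B in proportion to their current activity.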
Next, we estimate how well the computed activities $f_w$ fit the enzyme expressions, using the EM for enzyme participation depicted in Fig. 1.
Together, the EM for enzyme participation and the EM for pathway activity levels make up the feedback loop for pathway activity level estimation. If the fit is not good enough, the feedback loop updates the enzyme participation levels $p_{ew}$ with the EM for enzyme participation, and the $f_w$ are then recomputed from the updated $p_{ew}$.
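The E-step. Compute the expected expressions $f_{ew}$ of each enzyme $e$ in each pathway $w$ that make $\sum_{w} f_{ew} = f_e$ for each enzyme $e$, where $f_e$ is the enzyme expression estimated by the EM for enzyme expression:
$$f_{ew} = f_e \, \frac{f_w \, p_{ew}}{\sum_{w'} f_{w'} \, p_{ew'}}.$$
The M-step. Provide the new estimates $p_{ew}$ by normalization for each pathway $w$:
$$p_{ew} = \frac{f_{ew}}{\sum_{e' \in w} f_{e'w}}.$$
The algorithm halts when the change in estimates between iterations is small enough:
$$\sum_{e,w} \left| p_{ew}^{(t+1)} - p_{ew}^{(t)} \right| < \varepsilon.$$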
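The enzyme participation update can be sketched in Python under the following reading of its steps, which is an assumption: the E-step sets $f_{ew} = f_e f_w p_{ew} / \sum_{w'} f_{w'} p_{ew'}$ (so that $\sum_w f_{ew} = f_e$), the M-step normalizes within each pathway, $p_{ew} = f_{ew} / \sum_{e' \in w} f_{e'w}$, and iteration stops when the total absolute change in $p_{ew}$ falls below a tolerance. Pathway activities $f_w$ and enzyme expressions $f_e$ are held fixed during the update; the names `initial_participation`, `em_participation`, and `eps` are hypothetical.

```python
def initial_participation(enzymes_of_pathway):
    """Uniform start: p_ew = 1 / N_w for every enzyme e of pathway w."""
    return {w: {e: 1.0 / len(es) for e in es} for w, es in enzymes_of_pathway.items()}


def em_participation(enzyme_expr, pathway_activity, participation, eps=1e-9, max_iter=1000):
    """Update p_ew with pathway activities f_w and enzyme expressions f_e held fixed."""
    p = {w: dict(pe) for w, pe in participation.items()}
    for _ in range(max_iter):
        # E-step: f_ew = f_e * f_w p_ew / sum_w' f_w' p_ew', so that sum_w f_ew = f_e.
        f_ew = {w: {} for w in p}
        for e, f_e in enzyme_expr.items():
            weights = {w: pathway_activity[w] * p[w][e] for w in p if e in p[w]}
            z = sum(weights.values())
            if z == 0:
                continue
            for w, weight in weights.items():
                f_ew[w][e] = f_e * weight / z
        # M-step: normalise within each pathway so that sum_e p_ew = 1.
        new_p = {}
        for w, pe in p.items():
            col = sum(f_ew[w].get(e, 0.0) for e in pe)
            new_p[w] = {e: f_ew[w].get(e, 0.0) / col if col > 0 else pe[e] for e in pe}
        # Halting rule: total absolute change in p_ew below eps.
        delta = sum(abs(new_p[w][e] - p[w][e]) for w in p for e in p[w])
        p = new_p
        if delta < eps:
            break
    return p


# Toy example: one pathway with two enzymes whose expressions are 0.2 and 0.6.
p = em_participation({"e1": 0.2, "e2": 0.6}, {"A": 1.0}, initial_participation({"A": {"e1", "e2"}}))
print(round(p["A"]["e1"], 6), round(p["A"]["e2"], 6))  # 0.25 0.75
```

In a full feedback loop, this update would alternate with a re-estimation of $f_w$ until the fit stops improving; that outer loop is omitted here.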
3. RESULTS
We applied the proposed pipeline EMPathways2 to the rodent RNA-seq data. For each group of rodents, we computed the mean and the standard deviation of each pathway activity level. We categorize a metabolic pathway as having significantly (resp. slightly) different activity across conditions if its mean ± standard deviation intervals for the two conditions do not intersect (resp. intersect, but do not each contain the other condition's mean). Note that if a metabolic pathway has significantly (resp. slightly) different activity, then the probability that the activity is the same is below 0.25% (resp. 5%).
The metabolic pathways with significantly different activity across infected and uninfected C3H (resp. P. leucopus) mice are listed in Table 3 (resp. Table 4). We found four C3H metabolic pathways with significantly different activity levels; for example, caffeine metabolism differs significantly between the infected and uninfected groups. Note that the number of P. leucopus metabolic pathways significantly affected by the infection (21) is much higher than for C3H, which may explain why C3H mice become ill after infection while P. leucopus mice show no symptoms.
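The interval rule used to label pathways as significantly or slightly different can be written as a small function. This is a sketch, not the authors' code: `classify` is a hypothetical name, and "do not contain each other's means" is read as "the two intervals do not both contain the other condition's mean", which is the reading consistent with entries such as selenocompound metabolism in Table 5, where the infected interval contains the uninfected mean but not vice versa.

```python
def classify(mean_a, sd_a, mean_b, sd_b):
    """Compare two conditions by their mean ± SD intervals.

    Returns "significant" if the intervals are disjoint, "slight" if they overlap
    but do not both contain the other condition's mean, else "not different".
    """
    lo_a, hi_a = mean_a - sd_a, mean_a + sd_a
    lo_b, hi_b = mean_b - sd_b, mean_b + sd_b
    if hi_a < lo_b or hi_b < lo_a:
        return "significant"
    a_holds_mean_b = lo_a <= mean_b <= hi_a
    b_holds_mean_a = lo_b <= mean_a <= hi_b
    if not (a_holds_mean_b and b_holds_mean_a):
        return "slight"
    return "not different"


# Caffeine metabolism in C3H (Table 3): infected vs uninfected.
print(classify(84.48, 1.069, 82.888, 0.357))  # significant
```

For caffeine metabolism, the infected interval starts at 83.411 and the uninfected interval ends at 83.245, so the intervals are disjoint.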
Table 3.
C3H Pathways with Significantly Different Activity Levels Across Infected and Uninfected Groups
| Pathway name | ID | Infected mice Mean ± Std | Uninfected mice Mean ± Std |
|---|---|---|---|
| Caffeine metabolism | ec00232 | 84.48 ± 1.069 | 82.888 ± 0.357 |
| Mucin type O-glycan biosynthesis | ec00512 | 0.873 ± 0.666 | 2.205 ± 0.656 |
| Pentose and glucuronate interconversions | ec00040 | 273.774 ± 0.896 | 269.624 ± 1.82 |
| Thiamine metabolism | ec00730 | 49.922 ± 0.297 | 59.741 ± 0.205 |
Table 4.
Peromyscus leucopus Pathways with Significantly Different Activity Level Across Infected and Uninfected Groups
| Pathway name | ID | Infected mice Mean ± Std | Uninfected mice Mean ± Std |
|---|---|---|---|
| Arginine and proline metabolism | ec00330 | 108.443 ± 3.567 | 103.845 ± 1.015 |
| D-amino acid metabolism | ec00470 | 218.092 ± 0.626 | 206.601 ± 7.797 |
| Glycerophospholipid metabolism | ec00564 | 78.228 ± 0.336 | 77.621 ± 0.172 |
| Glycine, serine, and threonine metabolism | ec00260 | 49.423 ± 0.728 | 47.543 ± 0.119 |
| One carbon pool by folate | ec00670 | 66.566 ± 0.204 | 67.377 ± 0.301 |
| Selenocompound metabolism | ec00450 | 103.557 ± 25.685 | 137.99 ± 8.249 |
| Starch and sucrose metabolism | ec00500 | 64.353 ± 1.33 | 66.401 ± 0.433 |
| Tryptophan metabolism | ec00380 | 98.223 ± 0.896 | 102.88 ± 0.892 |
| Biotin metabolism | ec00780 | 24.271 ± 0.578 | 25.417 ± 0.049 |
| Ascorbate and aldarate metabolism | ec00053 | 131.871 ± 1.17 | 136.458 ± 0.912 |
| Citrate cycle | ec00020 | 116.276 ± 10.912 | 128.679 ± 0.663 |
| Glycosaminoglycan biosynthesis—heparan sulfate/heparin | ec00534 | 85.392 ± 1.203 | 90.012 ± 1.656 |
| Glycosaminoglycan biosynthesis—keratan sulfate | ec00533 | 351.816 ± 1.994 | 342.511 ± 1.023 |
| GPI-anchor biosynthesis | ec00563 | 348.609 ± 1.349 | 353.073 ± 1.766 |
| Linoleic acid metabolism | ec00591 | 440.035 ± 10.893 | 423.801 ± 1.7 |
| Other glycan degradation | ec00511 | 164.744 ± 2.361 | 135.58 ± 0.722 |
| Pentose phosphate | ec00030 | 103.646 ± 0.475 | 104.649 ± 0.247 |
| Pyrimidine metabolism | ec00240 | 167.062 ± 0.407 | 179.749 ± 11.62 |
| Valine, leucine, and isoleucine biosynthesis | ec00290 | 77.081 ± 2.466 | 83.37 ± 2.5 |
| Valine, leucine, and isoleucine degradation | ec00280 | 113.366 ± 4.269 | 103.142 ± 5.56 |
| Vitamin B6 metabolism | ec00750 | 56.675 ± 0.557 | 52.601 ± 0.395 |
GPI, glycosylphosphatidylinositol.
The metabolic pathways with slightly different activity across infected and uninfected C3H (resp. P. leucopus) mice are listed in Table 5 (resp. Table 6). The lists of these pathways differ markedly between the two species.
Table 5.
C3H Pathways with Slightly Different Activity Level Across Infected and Uninfected Groups
| Pathway name | ID | Infected mice Mean ± Std | Uninfected mice Mean ± Std |
|---|---|---|---|
| Ascorbate and aldarate metabolism | ec00053 | 139.789 ± 0.958 | 142.04 ± 1.581 |
| Drug metabolism—cytochrome P450 | ec00982 | 104.598 ± 0.85 | 105.261 ± 0.518 |
| Glycine, serine, and threonine metabolism | ec00260 | 50.586 ± 0.807 | 48.544 ± 2.094 |
| Glycosaminoglycan degradation | ec00531 | 78.611 ± 0.568 | 77.616 ± 1.778 |
| Glycosphingolipid biosynthesis—globo and isoglobo series | ec00603 | 198.785 ± 8.711 | 202.718 ± 1.443 |
| Selenocompound metabolism | ec00450 | 141.024 ± 23.292 | 159.357 ± 1.326 |
| Amino sugar and nucleotide sugar metabolism | ec00520 | 105.101 ± 0.287 | 104.142 ± 1.246 |
| Arginine and proline metabolism | ec00330 | 102.133 ± 0.884 | 100.602 ± 0.933 |
| Citrate cycle (Krebs cycle) | ec00020 | 116.843 ± 12.089 | 124.87 ± 0.702 |
| Fatty acid biosynthesis | ec00061 | 303.491 ± 5.538 | 307.308 ± 0.489 |
| Fatty acid elongation | ec00062 | 67.066 ± 8.073 | 71.807 ± 0.022 |
| Folate biosynthesis | ec00790 | 302.951 ± 9.635 | 287.446 ± 9.319 |
| Glycolysis | ec00010 | 145.131 ± 6.6 | 138.049 ± 11.634 |
| Lysine degradation | ec00310 | 13.663 ± 3.617 | 8.986 ± 3.171 |
| Mannose type O-glycan biosynthesis | ec00515 | 136.003 ± 20.316 | 152.586 ± 6.335 |
| Metabolism of xenobiotics by cytochrome P450 | ec00980 | 69.32 ± 0.17 | 68.617 ± 0.827 |
| N-glycan biosynthesis | ec00510 | 221.444 ± 2.738 | 227.992 ± 5.498 |
| O-glycan biosynthesis | ec00514 | 162.416 ± 1.829 | 155.666 ± 8.056 |
| Other glycan degradation | ec00511 | 177.914 ± 1.182 | 175.957 ± 4.27 |
| Pantothenate and CoA biosynthesis | ec00770 | 24.598 ± 8.195 | 27.777 ± 0.525 |
| Pentose phosphate | ec00030 | 102.537 ± 0.314 | 97.822 ± 9.699 |
| Propanoate metabolism | ec00640 | 212.329 ± 1.465 | 201.314 ± 9.563 |
| Pyrimidine metabolism | ec00240 | 172.185 ± 4.223 | 181.393 ± 6.125 |
| Sulfur metabolism | ec00920 | 33.591 ± 7.459 | 36.213 ± 2.483 |
Table 6.
P. leucopus Pathways with Slightly Different Activity Level Across Infected and Uninfected Groups
| Pathway name | ID | Infected mice Mean ± Std | Uninfected mice Mean ± Std |
|---|---|---|---|
| Amino sugar and nucleotide sugar metabolism | ec00520 | 104.8 ± 1.365 | 102.262 ± 2.796 |
| Arachidonic acid metabolism | ec00590 | 163.557 ± 0.317 | 162.903 ± 1.17 |
| Nitrogen metabolism | ec00910 | 102.949 ± 0.324 | 101.743 ± 0.897 |
| Folate biosynthesis | ec00790 | 314.768 ± 6.619 | 307.406 ± 1.934 |
| Fructose and mannose metabolism | ec00051 | 30.991 ± 0.403 | 30.493 ± 0.193 |
| Glutathione metabolism | ec00480 | 45.435 ± 0.73 | 44.655 ± 0.569 |
| Glycosphingolipid biosynthesis—lacto and neolacto series | ec00601 | 29.256 ± 6.267 | 41.326 ± 6.14 |
| Glyoxylate and dicarboxylate metabolism | ec00630 | 108.993 ± 11.861 | 120.784 ± 8.979 |
| Inositol phosphate metabolism | ec00562 | 39.927 ± 0.154 | 39.575 ± 0.588 |
| Porphyrin metabolism | ec00860 | 278.62 ± 1.556 | 275.075 ± 6.258 |
| Riboflavin metabolism | ec00740 | 117.214 ± 8.465 | 105.181 ± 5.623 |
| Steroid hormone biosynthesis | ec00140 | 131.347 ± 2.431 | 132.832 ± 0.644 |
| Thiamine metabolism | ec00730 | 58.599 ± 0.158 | 58.15 ± 0.715 |
| Tyrosine metabolism | ec00350 | 70.298 ± 2.634 | 66.036 ± 2.207 |
| Ubiquinone and other terpenoid-quinone biosynthesis | ec00130 | 194.363 ± 4.996 | 201.709 ± 5.371 |
Finally, we check how stable the enzyme participation coefficients are across mouse species (see Table 7). The average relative standard deviation (RSD) is 2.7% for C3H, in contrast to a much higher RSD of 8.9% for P. leucopus. This may be because C3H mice, an inbred laboratory strain, are genetically identical. For comparison, the average RSD of enzyme participation coefficients in the microbial community for the same metabolic pathway (ec00620) is 34.8% (Rondel et al., 2021), substantially higher than the RSD for mice.
Table 7.
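Enzyme Participation Coefficients and Their Relative Standard Deviations (%RSD) in Pathway ec00620
| Enzyme | Infected C3H 1 | Infected C3H 2 | Infected C3H 3 | Uninfected C3H 1 | Uninfected C3H 2 | Uninfected C3H 3 | %RSD (C3H) | Infected P. leucopus 1 | Infected P. leucopus 2 | Infected P. leucopus 3 | Uninfected P. leucopus 1 | Uninfected P. leucopus 2 | Uninfected P. leucopus 3 | %RSD (P. leucopus) |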
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EC:1.1.1.1 | 0.110 | 0.107 | 0.113 | 0.106 | 0.109 | 0.112 | 2.501 | 0.054 | 0.061 | 0.049 | 0.051 | 0.045 | 0.048 | 10.928 |
| EC:1.5.8.3 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:3.1.3.3 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.1.2.10 | 0.034 | 0.034 | 0.035 | 0.035 | 0.034 | 0.034 | 1.504 | 0.028 | 0.030 | 0.027 | 0.027 | 0.025 | 0.025 | 7.027 |
| EC:5.1.1.18 | 0.032 | 0.038 | 0.033 | 0.037 | 0.034 | 0.031 | 8.157 | 0.013 | 0.016 | 0.011 | 0.015 | 0.012 | 0.012 | 14.740 |
| EC:1.4.3.21 | 0.050 | 0.055 | 0.055 | 0.054 | 0.054 | 0.051 | 4.019 | 0.028 | 0.029 | 0.019 | 0.027 | 0.025 | 0.024 | 14.269 |
| EC:2.6.1.52 | 0.059 | 0.058 | 0.060 | 0.059 | 0.061 | 0.060 | 1.763 | 0.047 | 0.050 | 0.042 | 0.043 | 0.041 | 0.040 | 8.826 |
| EC:2.1.1.20 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.7.1.165 | 0.095 | 0.086 | 0.087 | 0.088 | 0.087 | 0.088 | 3.696 | 0.077 | 0.067 | 0.074 | 0.061 | 0.060 | 0.059 | 11.586 |
| EC:1.5.3.1 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.3.1.29 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:4.1.2.48 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:1.1.99.1 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.3.1.37 | 0.055 | 0.052 | 0.055 | 0.055 | 0.056 | 0.055 | 2.499 | 0.072 | 0.066 | 0.074 | 0.067 | 0.070 | 0.077 | 5.909 |
| EC:2.1.2.1 | 0.051 | 0.051 | 0.052 | 0.052 | 0.052 | 0.050 | 1.591 | 0.038 | 0.041 | 0.035 | 0.036 | 0.034 | 0.033 | 8.093 |
| EC:1.1.1.95 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:1.1.1.103 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.1.4.1 | 0.049 | 0.047 | 0.049 | 0.049 | 0.049 | 0.050 | 2.013 | 0.049 | 0.049 | 0.044 | 0.047 | 0.046 | 0.051 | 5.252 |
| EC:4.2.1.22 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:4.4.1.1 | 0.061 | 0.059 | 0.061 | 0.061 | 0.060 | 0.060 | 1.353 | 0.047 | 0.050 | 0.043 | 0.045 | 0.042 | 0.042 | 7.112 |
| EC:4.3.1.17 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:1.4.3.4 | 0.062 | 0.070 | 0.064 | 0.070 | 0.067 | 0.068 | 4.864 | 0.028 | 0.033 | 0.023 | 0.028 | 0.027 | 0.026 | 11.895 |
| EC:1.4.3.3 | 0.046 | 0.053 | 0.047 | 0.052 | 0.049 | 0.045 | 6.711 | 0.019 | 0.023 | 0.015 | 0.021 | 0.017 | 0.017 | 15.771 |
| EC:1.8.1.4 | 0.088 | 0.090 | 0.089 | 0.091 | 0.090 | 0.090 | 1.152 | 0.042 | 0.049 | 0.034 | 0.043 | 0.039 | 0.040 | 12.040 |
| EC:2.1.1.5 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:2.1.1.2 | 0.049 | 0.047 | 0.049 | 0.049 | 0.049 | 0.050 | 2.013 | 0.049 | 0.049 | 0.044 | 0.047 | 0.046 | 0.051 | 5.252 |
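The %RSD columns of Table 7 are consistent with the sample (n - 1) standard deviation of the six per-mouse coefficients divided by their mean. The check below recomputes the EC:1.1.1.1 row; `percent_rsd` is an illustrative helper name. Averaging each %RSD column over all 26 enzymes gives about 2.69% for C3H and 8.90% for P. leucopus, matching the 2.7% and 8.9% quoted in the text.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation in percent, using the sample (n - 1) standard deviation."""
    return 100.0 * stdev(values) / mean(values)


# Participation coefficients of EC:1.1.1.1 in ec00620, copied from Table 7.
c3h = [0.110, 0.107, 0.113, 0.106, 0.109, 0.112]
p_leucopus = [0.054, 0.061, 0.049, 0.051, 0.045, 0.048]

print(f"C3H: {percent_rsd(c3h):.3f}%")                # C3H: 2.501%
print(f"P. leucopus: {percent_rsd(p_leucopus):.3f}%")  # P. leucopus: 10.928%
```

Using the population standard deviation (`statistics.pstdev`) instead would give about 2.283% for the C3H row, which does not match the table, so the sample form appears to be the one used.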
4. DISCUSSION
The results of our study highlight the potential of using RNA-seq data to estimate enzyme expression and metabolic pathway activity in rodent models of disease. Our modified EM-based pipeline, EMPathways2, has demonstrated its ability to estimate enzyme expression, enzyme participation in pathways, and metabolic pathway activity levels in both infected and uninfected mice.
These findings further enhance our understanding of the biochemical changes occurring in the host during bacterial infection. The differences in enzyme expression and pathway activity levels between infected and uninfected mice could provide insights into the immune response mechanisms at the metabolic level. This, in turn, can potentially be used to develop new therapeutic strategies for bacterial infections and other diseases.
The variation in pathway activity levels between C3H and P. leucopus sheds light on the different immune responses of these rodent species. The higher number of pathways significantly affected by infection in P. leucopus compared with C3H may explain why C3H mice develop Lyme disease symptoms while P. leucopus mice do not. This highlights the importance of considering the host species in understanding disease pathogenesis. It is also interesting to note that the enzyme participation coefficients were more stable in C3H than in P. leucopus. This may reflect the genetic uniformity of inbred laboratory mouse strains such as C3H, compared with wild mice.
5. CONCLUSIONS
In this article, we propose an improved maximum likelihood-based pipeline for the estimation of metabolic pathway activity in mice using the KEGG pathway database. Specifically, the proposed approach uses EM-based algorithms to estimate enzyme expression, enzyme participation levels in pathways, and metabolic pathway activity.
The proposed metabolic pathway analysis was applied to RNA-seq data from 12 mice, C3H and P. leucopus, half of which were infected with B. burgdorferi 297. The key findings of the study are as follows:
The infection affects the metabolism of both species, but the effect is more pronounced in P. leucopus than in C3H.
Enzyme participation coefficients vary little in C3H, in contrast to the higher variation in P. leucopus and the much higher variation in microbial communities.
ACKNOWLEDGMENTS
The authors would like to thank Dr. Artem S. Rogovskyy for his invaluable expertise and guidance throughout this project.
DATA AVAILABILITY
The data presented in the study are deposited in the NCBI Sequence Read Archive (SRA); the BioSample accession numbers are SAMN32740077, SAMN32740078, SAMN32740079, SAMN32740080, SAMN32740081, SAMN32740082, SAMN32740083, SAMN32740084, SAMN32740085, SAMN32740086, SAMN32740087, SAMN32740088.
AUTHORS’ CONTRIBUTIONS
F.R. contributed to conceptualization, methodology, and writing of the article. H.F. and R.H. implemented the methodology for metabolic pathway retrieval and contributed to writing the draft. A.J. provided technical assistance with the methodology and formatted the draft. S.K. contributed to conceptualization and prepared the initial implementation of the project. S.M. and A.Z. supervised the project and acquired funding.
AUTHOR DISCLOSURE STATEMENT
The authors state that the research was conducted without any commercial or financial affiliations that could be interpreted as potential conflicts of interest.
FUNDING INFORMATION
The work at Georgia State University (GSU) was partially supported by NIH grant 1R21CA241044-01A1, NSF grant IIS-2212508, by the GSU Molecular Basis of Disease Fellowship, and by the GSU Brain and Behavior Fellowship. S.M. was supported by the National Science Foundation Grants (2041984 and 2135954) and the National Institutes of Health Grant R01AI173172. The work at Texas A&M University was partially supported by the National Institutes of Health (NIH) grant R03AI135159-02, the Department of Veterinary Pathobiology, Texas A&M School of Veterinary Medicine and Biomedical Sciences, and Texas A&M AgriLife.
REFERENCES
- Barthold SW, Beck DS, Hansen GM, et al. Lyme borreliosis in selected strains and ages of laboratory mice. J Infect Dis 1990;162(1):133–138.
- Crandall H, Dunn DM, Ma Y, et al. Gene expression profiling reveals unique pathways associated with differential severity of Lyme arthritis. J Immunol 2006;177(11):7930–7942.
- Efron B, Tibshirani R. On testing the significance of sets of genes. 2007.
- Gaber AM, Mandric I, Nitirahardjo C, et al. Comparative transcriptome analysis of Peromyscus leucopus and C3H mice infected with the Lyme disease pathogen. Front Cell Infect Microbiol 2023;13:1115350.
- Huson DH, Mitra S, Ruscheweyh HJ, et al. Integrative analysis of environmental sequences using MEGAN4. Genome Res 2011;21(9):1552–1560.
- Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000;28(1):27–30.
- Konwar KM, Hanson NW, Pagé AP, et al. MetaPathways: A modular pipeline for constructing pathway/genome databases from environmental sequence information. BMC Bioinformatics 2013;14:202–210.
- Kugeler KJ, Schwartz AM, Delorey MJ, et al. Estimating the frequency of Lyme disease diagnoses, United States, 2010–2018. Emerg Infect Dis 2021;27(2):616–619.
- Mandric I, Temate-Tiagueu Y, Shcheglova T, et al. Fast bootstrapping-based estimation of confidence intervals of expression levels and differential expression from RNA-seq data. Bioinformatics 2017;33(20):3302–3304.
- Margolin J. Of mice, men, and the genome. Genome Res 2000;10(10):1431–1432.
- Masopust D, Sivula CP, Jameson SC. Of mice, dirty mice, and men: Using mice to understand human immunology. J Immunol 2017;199(2):383–388.
- Mead PS. Epidemiology of Lyme disease. Infect Dis Clin 2015;29(2):187–210.
- Mitrea C, Taghavi Z, Bokanizad B, et al. Methods and approaches in the topology-based analysis of biological pathways. Front Physiol 2013;4:278.
- Rondel F, Hosseini R, Sahoo B, et al. Estimating enzyme participation in metabolic pathways for microbial communities from RNA-seq data. In: Bioinformatics Research and Applications: 16th International Symposium, ISBRA 2020, Moscow, Russia, December 1–4, 2020, Proceedings, pp. 335–343. Springer, 2020.
- Rondel FM, Hosseini R, Sahoo B, et al. Pipeline for analyzing activity of metabolic pathways in planktonic communities using metatranscriptomic data. J Comput Biol 2021;28(8):842–855.
- Schwanz LE, Voordouw MJ, Brisson D, et al. Borrelia burgdorferi has minimal impact on the Lyme disease reservoir host Peromyscus leucopus. Vector-Borne Zoonotic Diseases 2011;11(2):117–124.
- Sharon I, Bercovici S, Pinter RY, et al. Pathway-based functional analysis of metagenomes. J Comput Biol 2011;18(3):495–505.
- Shen M, Li Q, Ren M, et al. Trophic status is associated with community structure and metabolic potential of planktonic microbiota in plateau lakes. Front Microbiol 2019;10:2560.
- Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005;102(43):15545–15550.
- Ye Y, Doak TG. A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol 2009;5(8):e1000465.