Journal of Computational Biology. 2025 Feb 12;32(2):188–197. doi: 10.1089/cmb.2024.0564

Estimating Enzyme Expression and Metabolic Pathway Activity in Borreliella-Infected and Uninfected Mice

Filipp Martin Rondel 1, Hafsa Farooq 1, Roya Hosseini 1, Akshay Juyal 1, Sergey Knyazev 2,3, Serghei Mangul 3,4,5, Artem S Rogovskyy 6, Alexander Zelikovsky 1
PMCID: PMC11947643  PMID: 38934087

Abstract

Evaluating changes in metabolic pathway activity is essential for studying disease mechanisms and developing new treatments, with significant benefits extending to human health. Here, we propose EMPathways2, a maximum likelihood pipeline based on the expectation-maximization algorithm that estimates enzyme expression and metabolic pathway activity levels. We first estimate enzyme expression from RNA-seq data; these estimates are then used to estimate pathway activity levels together with enzyme participation levels in each pathway. We apply the pipeline to RNA-seq data from several groups of mice, which provides a deeper look at the biochemical changes occurring as a result of bacterial infection, disease, and immune response. Our results show that estimated enzyme expression, pathway activity levels, and enzyme participation levels in each pathway are robust and stable across all samples. Estimated activity levels of a significant number of metabolic pathways strongly correlate with the infected and uninfected status of the respective rodent types.

Keywords: enzyme expression, expectation maximization, maximum likelihood, metabolic pathway activity, mice

1. INTRODUCTION

Lyme disease, a significant public health concern, is caused by the tick-borne spirochete, Borreliella burgdorferi (B. burgdorferi). Specific mammalian hosts respond differently to B. burgdorferi infection, with varying disease manifestations (Mead, 2015; Kugeler et al., 2021). For instance, certain strains of Mus musculus (M. musculus) exhibit severe arthritis, while others like Peromyscus leucopus (P. leucopus) do not show visible disease symptoms post-infection (Barthold et al., 1990; Crandall et al., 2006; Schwanz et al., 2011). This study focuses on the infection-induced changes in gene expression to understand the potential mechanisms of disease tolerance in P. leucopus mice. Comparing transcriptomic responses to B. burgdorferi infection between P. leucopus and M. musculus (C3H/HeJ, hereafter referred to as C3H) should shed light on the disease-tolerance capacities of P. leucopus mice. Mice have been the experimental tool of choice for the vast majority of immunologists. Studying their immune responses has yielded tremendous insight into the inner workings of the human immune system (Masopust et al., 2017). Humans and mice share approximately 70% of the same protein-coding gene sequences (Margolin, 2000). Therefore, analyzing the activity of metabolic pathways of mice with contrasting immune responses is imperative to gaining a deeper understanding of the human immune system. Measuring the functional activity, enrichment, and interaction of metabolic pathways in rodent groups with diametric health conditions is essential for understanding the biochemical and metabolic changes that may occur in humans during stress or disease. Despite many advances in using biomolecules (DNA, RNA, enzymes) to assess the biochemical changes in mice, it remains challenging to quantify how the expression of individual enzymes contributes to the activity of multi-enzyme metabolic pathways.

In this study, we analyze differentially active metabolic pathways from RNA sequencing data to generate an efficient model for understanding metabolic pathway activity changes (Subramanian et al., 2005; Efron and Tibshirani, 2007; Mitrea et al., 2013; Shen et al., 2019). Even though advances in high-throughput sequencing have aided the exploration of RNA-seq data, it is often challenging to analyze metabolic pathway activity changes in organisms with varying health conditions, particularly because existing pathway analysis tools (e.g., MinPath, MetaPathways, MEGAN4) often yield variable conclusions about the activity of pathways based on RNA data (Huson et al., 2011; Konwar et al., 2013; Ye and Doak, 2009; Sharon et al., 2011). To overcome these challenges, we developed a workflow that uses a maximum likelihood-based model and annotations based on the KEGG database (Kanehisa and Goto, 2000) to estimate transcript frequency, enzyme expression, enzyme participation in pathways, and metabolic pathway activity in microbial communities (Rondel et al., 2020; Rondel et al., 2021).

In this article, we test this model using transcriptomic data of mice infected with B. burgdorferi, an agent of Lyme disease, and their uninfected controls. The data describe infected and uninfected groups of two rodent species, M. musculus and P. leucopus, and allow us to elucidate the complex metabolic pathway activity changes between rodents with inherent tolerance to B. burgdorferi infection (P. leucopus mice) and those that develop Lyme disease (C3H/HeJ mice, a laboratory strain, hereafter C3H). The proposed methodology uses maximum likelihood estimation to infer pathway activity while accounting for each enzyme’s participation. First, we filtered mouse-specific metabolic pathways from the KEGG database and merged the expression of enzymes represented by the same group of genes. We adjusted our expectation-maximization (EM) algorithm-based pipeline and improved it using the enzyme participation level in each pathway and then used these estimates for more accurate predictions of pathway activity (Rondel et al., 2021). Our contributions include:

  • Estimation of metabolic enzyme expression, identification of groups of rodents’ enzymes that are represented by the same group of genes.

  • Estimation of enzyme-in-pathways coefficients and confirmation that they are more stable than for microbial communities in Rondel et al. (2021). Additionally, we show that these coefficients do not significantly vary across species of infected and uninfected mice.

  • Differential analysis of metabolic pathway activity in P. leucopus and C3H mice uninfected and infected with B. burgdorferi.

The rest of the article is organized as follows. In the next section, we describe the pipeline of our software framework and several EM-based algorithms for estimating enzyme expression and metabolic pathway activity between two rodent species. Further on, we describe our data including sequencing data and extraction of metabolic enzymes and pathways. Finally, we use our results to provide a statistical validation of the proposed pipeline.

2. MATERIALS AND METHODS

2.1. Data procurement

The data for this study were acquired from a previous experiment conducted by Gaber et al. (2023). The experiment used 12 male mice, 6 P. leucopus and 6 C3H, which were split into four groups. Half of these mice were subcutaneously inoculated with B. burgdorferi 297, while the remaining mice were injected with a sterile saline solution as a control group. Following inoculation, blood samples were taken from all 12 mice to confirm the infection. On day 70 post-inoculation, various tissues were harvested from the mice and cultured to further examine the presence or absence of viable spirochetes. Spleens were harvested from all the mice, preserved, and stored at −80 °C until RNA extraction was performed.

2.2. Pipeline for estimating metabolic pathway activity of C3H and P. leucopus mice

In the past, we created a pipeline for estimating metabolic pathway activity levels in a microbial community (Rondel et al., 2021). We explored the differential pathway activity inside a microbial community under different conditions. A microbial community contains diverse species and, in some cases, is hard to interpret because of the abundance of species in the samples.

Below, we describe our novel metabolic pathway activity pipeline, EMPathways2 (see Fig. 1), which is used for estimating pathway activities in mice. Its models are solved using the EM algorithm.

FIG. 1.

EMPathways2 pipeline for metabolic pathway analysis of the rodent samples. RNA from the rodent samples is sequenced, and the raw reads are then mapped to genes. The resulting gene-level contigs are further mapped to the enzyme-pathway database. Gene expression is obtained using IsoEM2 (Mandric et al., 2017). Then, we estimate enzyme expression from gene expression. Finally, the pathway activity levels and enzyme participation coefficients are estimated in the feedback loop.

The entire pipeline EMPathways2 consists of the following five steps:

  • The first step is the collection of samples from infected and uninfected rodent groups, which then get sequenced.

  • RNA-seq reads are mapped to the reference transcriptomes of C3H and P. leucopus mice collected from the NCBI reference database. The mapped reads are used by IsoEM2 to generate gene expression data (Mandric et al., 2017).

  • We use KEGG to establish the many-to-many correspondence between genes and enzymes (see Section 2.3). We estimate enzyme expression from gene expression using EM (see Fig. 1).

  • Unstable enzymes that converge inconsistently are identified, grouped, and collapsed (see Section 2.4).

  • The feedback loop is based on inferred enzyme expressions and metabolic pathway annotation. It simultaneously estimates enzyme participation coefficients and metabolic pathways activity levels (see Section 2.5).

2.3. Mapping between genes, enzymes, and pathways for C3H and P. leucopus mice

The KEGG metabolic pathway database has information on all metabolic pathways that occur in living organisms. However, the scope of EMPathways2 is the analysis of metabolic pathways in rodents. We concentrate on 152 metabolic pathways and 2386 enzymes that play a significant role in mouse metabolism, as confirmed by literature referenced in PubMed.

To compute metabolic pathway activity levels, EMPathways2 requires as input a correspondence between genes and enzymes as well as a dictionary of enzymes participating in metabolic pathways. Gene–enzyme and enzyme–pathway mappings were extracted from the NCBI Entrez Molecular Sequence Database System and the KEGG PATHWAY database, respectively, which provide consolidated access to nucleotide, protein sequence, and gene-centered and genomic mapping data. We used KEGG’s and NCBI’s APIs to collect raw data, allowing us to produce a correspondence of genes to enzymes and of enzymes to metabolic pathways. We used the collected data to create sets of genes participating in the production of every enzyme, as well as sets of enzymes required for the functional activity of every metabolic pathway.
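As a minimal sketch of the two inputs described above, the following Python snippet turns lists of (gene, enzyme) and (enzyme, pathway) pairs into the sets of genes per enzyme and sets of enzymes per pathway. The gene identifiers, the helper name `group_members`, and the toy pairs (including the pathway assignments) are hypothetical and chosen only for illustration; the actual records would come from the NCBI and KEGG exports, whose retrieval is not shown here.

```python
from collections import defaultdict


def group_members(pairs):
    """Turn (member, container) pairs into {container: set of members}."""
    grouped = defaultdict(set)
    for member, container in pairs:
        grouped[container].add(member)
    return dict(grouped)


# Hypothetical toy records, for illustration only.
gene_enzyme_pairs = [
    ("geneA", "EC:3.1.3.12"),
    ("geneA", "EC:2.4.1.15"),  # one gene can contribute to several enzymes
    ("geneB", "EC:2.4.1.15"),  # one enzyme can be produced by several genes
    ("geneC", "EC:1.1.1.1"),
]
enzyme_pathway_pairs = [
    ("EC:3.1.3.12", "ec00500"),
    ("EC:2.4.1.15", "ec00500"),
    ("EC:1.1.1.1", "ec00620"),
]

genes_of_enzyme = group_members(gene_enzyme_pairs)
enzymes_of_pathway = group_members(enzyme_pathway_pairs)
```

In this toy input, the gene set of EC:3.1.3.12 is a subset of the gene set of EC:2.4.1.15, which is the kind of overlap that can make two enzymes hard to tell apart from gene expression alone.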

2.4. Enzyme grouping

There is a many-to-many correspondence between genes and enzymes, which makes computing enzyme expression challenging. To resolve this, we use a maximum likelihood EM model to infer enzyme expression from gene expression, which converges consistently in the vast majority of cases. However, some enzymes share genes, and for some enzymes the gene set is entirely a subset of the genes used for production of another enzyme. In some of those cases, EM struggles to discern one such enzyme from its genetic relatives and, in turn, converges inconsistently from one run to another. Enzymes that fail to converge consistently are labeled unstable and grouped into clusters whose expression as a single entity converges consistently in every EM run. After running the gene–enzyme EM several times, we observe clusters of enzymes whose individual expressions vary from one run to another but whose sum converges to the same value in every run (see Table 1). This instability makes the enzymes in such a group indistinguishable to our algorithm. To establish the groups accurately, we run EM and produce expression values for every enzyme. We then form clusters from enzymes whose expressions do not converge consistently individually but whose summed expression always converges to the same value. As a result, such enzymes must be treated as single entities. After all the unstable enzyme groups are found, we collapse each of them into one (see Fig. 2A). Each group is collapsed to the single enzyme with the lowest EC number. The collapsed enzyme is then used to compute the metabolic pathway expression of all related pathways (see Fig. 2B). In total, we found and collapsed 59 pairs, 3 triplets, and 1 quadruplet of indistinguishable enzymes. Table 2 lists the triplets and the quadruplet found in mice.

We compared the list of collapsed enzymes for microbial communities found in Rondel et al. (2021) with the list of collapsed enzymes in rodents and found 28 pairs common to the two datasets.
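The grouping idea can be illustrated with a short Python sketch. It flags enzymes whose estimates differ between runs and then searches for small sets of flagged enzymes whose summed expression is the same in every run. This is only one possible way to detect such groups, not necessarily the procedure used in EMPathways2: the brute-force search, the tolerance `tol`, the size limit `max_size`, and the function names are assumptions. The two unstable enzymes use the values reported in Table 1; the stable enzyme is a hypothetical addition.

```python
from itertools import combinations


def spread(values):
    """Difference between the largest and smallest value across runs."""
    return max(values) - min(values)


def ec_key(ec):
    """Sort key for 'EC:a.b.c.d' identifiers, compared number by number."""
    return tuple(int(part) for part in ec.split(":")[1].split("."))


def find_groups(runs, tol=1e-3, max_size=4):
    """Return tuples of unstable enzymes whose summed expression is stable."""
    unstable = sorted((e for e, v in runs.items() if spread(v) > tol), key=ec_key)
    groups, used = [], set()
    for size in range(2, max_size + 1):  # try pairs first, then larger groups
        free = [e for e in unstable if e not in used]
        for combo in combinations(free, size):
            if used.intersection(combo):
                continue
            sums = [sum(vals) for vals in zip(*(runs[e] for e in combo))]
            if spread(sums) <= tol:
                groups.append(combo)
                used.update(combo)
    return groups


runs = {
    "EC:3.1.3.12": [0.054, 0.311, 0.251, 0.317, 0.12],   # Table 1
    "EC:2.4.1.15": [0.404, 0.147, 0.207, 0.141, 0.338],  # Table 1
    "EC:1.1.1.1": [0.2, 0.2, 0.2, 0.2, 0.2],             # hypothetical stable enzyme
}
groups = find_groups(runs)
# Collapse each group to the member with the lowest EC number.
representatives = [min(group, key=ec_key) for group in groups]
```

With these inputs the two enzymes from Table 1 form a single group, and the group is represented by EC:2.4.1.15, the lower of the two EC numbers.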

Table 1.

A Pair of Individually Unstable Enzymes That are Stable When Summed into a Group

| Enzymes | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
| --- | --- | --- | --- | --- | --- |
| EC:3.1.3.12 | 0.054 | 0.311 | 0.251 | 0.317 | 0.12 |
| EC:2.4.1.15 | 0.404 | 0.147 | 0.207 | 0.141 | 0.338 |
| Sum | 0.458 | 0.458 | 0.458 | 0.458 | 0.458 |

FIG. 2.

(A) Enzymes that cannot be distinguished from each other must be treated as groups. (B) Enzymes that are unstable are collapsed into a single enzyme with the lowest EC nomenclature number.

Table 2.

Three Triplets and One Quadruplet of Collapsed Enzymes

| Triplet 1 | Triplet 2 | Triplet 3 | Quadruplet |
| --- | --- | --- | --- |
| EC:1.1.1.51 | EC:6.3.4.13 | EC:2.1.3.2 | EC:6.3.4.9 |
| EC:1.1.1.213 | EC:6.3.3.1 | EC:6.3.5.5 | EC:6.3.4.10 |
| EC:1.1.1.188 | EC:2.1.2.2 | EC:3.5.2.3 | EC:6.3.4.11 |
| | | | EC:6.3.4.15 |

2.5. Feedback loop for pathway activity level estimation

Each enzyme is initially assigned a participation coefficient of 1/|w|, where |w| is the total number of enzymes in the pathway w. The feedback loop for pathway activity updates the enzyme participation levels by fitting expected enzyme expressions to the expressions estimated by the EM for enzyme expression.

The initial estimate of the participation level of an enzyme e in a pathway w may be far from accurate. However, more accurate estimates of enzyme participation can lead to more accurate estimates for the pathway activity levels. Our algorithm first estimates enzyme expression from gene expression using the EM for enzyme expression. The E-step and M-step are run in order to compute expected expression and compare it with the new estimate, respectively. After computing enzyme expressions, we then filter out enzymes with stable expressions and perform enzyme grouping on enzymes with unstable expressions. Pathway activity levels are, in turn, computed using the EM for pathway activity level.

Next, we estimate how well the computed activities $f_w$ fit the enzyme expressions using the EM for enzyme participation depicted in Figure 1.

Together, the EM for enzyme participation and the EM for pathway activity levels make up the feedback loop for pathway activity level estimation. If the fit is not good enough, the feedback loop is applied to update the enzyme participation levels $p_{ew}$ with the EM for enzyme participation, and the activities $f_w$ are then recomputed according to the updated $p_{ew}$.

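The E-step. Compute the expected $p_{ew}^{exp}$ that will make $f_e = f_e^{exp}$ for each $e \in E$, $w \in W$:

$$p_{ew}^{exp} = p_{ew} \times \frac{f_e}{f_e^{exp}}$$

The M-step. Provide the new estimates by normalization for each $e \in E$, $w \in W$:

$$p_{ew}^{new} = \frac{p_{ew}^{exp}}{\sum_{e \in E} p_{ew}^{exp}}$$

The algorithm halts when the change in estimates between iterations is small enough:

$$\|p^{new} - p\| = \sum_{e \in E,\, w \in W} \left(p_{ew}^{new} - p_{ew}\right)^2 \le \epsilon \ll 1$$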
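The E-step, M-step, and stopping rule above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the EMPathways2 implementation: the pathway activities are held fixed (the EM for pathway activity is not reproduced), and the expected enzyme expression is assumed to be the participation-weighted sum of pathway activities, i.e. f_e^exp = sum over w of p_ew * f_w, which the text does not spell out. The function names, the threshold `eps`, and the toy numbers are hypothetical, and all expressions are assumed to be strictly positive.

```python
def initial_participation(pathways):
    """Start every enzyme in pathway w at 1/|w|."""
    return {w: {e: 1.0 / len(enzymes) for e in enzymes} for w, enzymes in pathways.items()}


def update_participation(p, f_enzyme, f_pathway, eps=1e-12, max_iter=1000):
    """Iterate the E-step and M-step until the squared change is at most eps."""
    for _ in range(max_iter):
        # Expected enzyme expression (assumption: sum of p_ew * f_w over pathways).
        f_exp = {}
        for w, coeffs in p.items():
            for e, c in coeffs.items():
                f_exp[e] = f_exp.get(e, 0.0) + c * f_pathway[w]
        # E-step: rescale each coefficient by observed / expected expression.
        p_exp = {w: {e: c * f_enzyme[e] / f_exp[e] for e, c in coeffs.items()}
                 for w, coeffs in p.items()}
        # M-step: normalize so coefficients within each pathway sum to 1.
        p_new = {}
        for w, coeffs in p_exp.items():
            total = sum(coeffs.values())
            p_new[w] = {e: c / total for e, c in coeffs.items()}
        change = sum((p_new[w][e] - p[w][e]) ** 2 for w in p for e in p[w])
        p = p_new
        if change <= eps:
            break
    return p


# Hypothetical toy example: one pathway with two enzymes.
pathways = {"w1": ["e1", "e2"]}
p0 = initial_participation(pathways)  # both coefficients start at 0.5
p_fit = update_participation(p0, {"e1": 8.0, "e2": 2.0}, {"w1": 10.0})
```

In this toy case the coefficients move from 0.5 and 0.5 to 0.8 and 0.2, at which point the expected enzyme expressions match the observed ones and the loop stops.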

3. RESULTS

We applied the proposed pipeline EMPathways2 to rodent RNA-seq data. For each group of rodents, we compute the mean and the standard deviation of each pathway activity level. We categorize a metabolic pathway as having significantly (resp. slightly) different activity across conditions if its standard deviation intervals (mean ± one standard deviation) for the two conditions do not intersect (resp. intersect but do not contain each other’s means). Note that if a metabolic pathway has significantly (resp. slightly) different activity, then the probability that the activity is the same is below 0.25% (resp. 5%).
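The categorization rule can be written as a small Python function. This is a sketch of one reading of the rule, with a hypothetical function name and labels: "do not contain each other’s means" is interpreted here as "the two intervals do not both contain the other group’s mean", because that reading reproduces the reported classification of pathways such as selenocompound metabolism in C3H mice, where one mean lies inside the other group’s interval.

```python
def classify(mean_a, std_a, mean_b, std_b):
    """Compare two groups using their mean ± one standard deviation intervals."""
    lo_a, hi_a = mean_a - std_a, mean_a + std_a
    lo_b, hi_b = mean_b - std_b, mean_b + std_b
    if hi_a < lo_b or hi_b < lo_a:
        return "significant"  # the intervals do not intersect
    b_mean_in_a = lo_a <= mean_b <= hi_a
    a_mean_in_b = lo_b <= mean_a <= hi_b
    if not (b_mean_in_a and a_mean_in_b):
        return "slight"  # overlap, but at least one mean lies outside the other interval
    return "none"


# Caffeine metabolism in C3H mice (infected vs. uninfected), reported as significant.
caffeine = classify(84.48, 1.069, 82.888, 0.357)
# Ascorbate and aldarate metabolism in C3H mice, reported as slight.
ascorbate = classify(139.789, 0.958, 142.04, 1.581)
```

For caffeine metabolism the intervals are [83.411, 85.549] and [82.531, 83.245], which do not intersect, so the function returns "significant".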

The metabolic pathways with significantly different activity across infected and uninfected C3H (resp. P. leucopus) mice are listed in Table 3 (resp. Table 4). We found that four C3H metabolic pathways have significantly different activity levels. For example, caffeine metabolism has a significant difference in its activity levels between the infected and uninfected groups. Note that the number of metabolic pathways significantly affected by the infection is much higher for P. leucopus than for C3H, which may explain why C3H mice get sick after infection while P. leucopus mice do not show any symptoms.

Table 3.

C3H Pathways with Significantly Different Activity Levels Across Infected and Uninfected Groups

| Pathway name | ID | Infected mice (Mean ± Std) | Uninfected mice (Mean ± Std) |
| --- | --- | --- | --- |
| Caffeine metabolism | ec00232 | 84.48 ± 1.069 | 82.888 ± 0.357 |
| Mucin type O-glycan biosynthesis | ec00512 | 0.873 ± 0.666 | 2.205 ± 0.656 |
| Pentose and glucuronate interconversions | ec00040 | 273.774 ± 0.896 | 269.624 ± 1.82 |
| Thiamine metabolism | ec00730 | 49.922 ± 0.297 | 59.741 ± 0.205 |

Table 4.

Peromyscus leucopus Pathways with Significantly Different Activity Level Across Infected and Uninfected Groups

| Pathway name | ID | Infected mice (Mean ± Std) | Uninfected mice (Mean ± Std) |
| --- | --- | --- | --- |
| Arginine and proline metabolism | ec00330 | 108.443 ± 3.567 | 103.845 ± 1.015 |
| D-amino acid metabolism | ec00470 | 218.092 ± 0.626 | 206.601 ± 7.797 |
| Glycerophospholipid metabolism | ec00564 | 78.228 ± 0.336 | 77.621 ± 0.172 |
| Glycine, serine, and threonine metabolism | ec00260 | 49.423 ± 0.728 | 47.543 ± 0.119 |
| One carbon pool by folate | ec00670 | 66.566 ± 0.204 | 67.377 ± 0.301 |
| Selenocompound metabolism | ec00450 | 103.557 ± 25.685 | 137.99 ± 8.249 |
| Starch and sucrose metabolism | ec00500 | 64.353 ± 1.33 | 66.401 ± 0.433 |
| Tryptophan metabolism | ec00380 | 98.223 ± 0.896 | 102.88 ± 0.892 |
| Ascorbate and aldarate metabolism | ec00780 | 24.271 ± 0.578 | 25.417 ± 0.049 |
| Ascorbate and aldarate metabolism | ec00053 | 131.871 ± 1.17 | 136.458 ± 0.912 |
| Citrate cycle | ec00020 | 116.276 ± 10.912 | 128.679 ± 0.663 |
| Glycosaminoglycan biosynthesis - heparan sulfate/heparin | ec00534 | 85.392 ± 1.203 | 90.012 ± 1.656 |
| Glycosaminoglycan biosynthesis - keratan sulfate | ec00533 | 351.816 ± 1.994 | 342.511 ± 1.023 |
| GPI-anchor biosynthesis | ec00563 | 348.609 ± 1.349 | 353.073 ± 1.766 |
| Linoleic acid metabolism | ec00591 | 440.035 ± 10.893 | 423.801 ± 1.7 |
| Other glycan degradation | ec00511 | 164.744 ± 2.361 | 135.58 ± 0.722 |
| Pentose phosphate | ec00030 | 103.646 ± 0.475 | 104.649 ± 0.247 |
| Pyrimidine metabolism | ec00240 | 167.062 ± 0.407 | 179.749 ± 11.62 |
| Valine, leucine, and isoleucine biosynthesis | ec00290 | 77.081 ± 2.466 | 83.37 ± 2.5 |
| Valine, leucine, and isoleucine degradation | ec00280 | 113.366 ± 4.269 | 103.142 ± 5.56 |
| Vitamin B6 metabolism | ec00750 | 56.675 ± 0.557 | 52.601 ± 0.395 |

GPI, glycosylphosphatidylinositol.

The metabolic pathways with slightly different activity across infected and uninfected C3H (resp. P. leucopus) mice are listed in Table 5 (resp. Table 6). Note that these lists differ substantially between the two mouse species.

Table 5.

C3H Pathways with Slightly Different Activity Level Across Infected and Uninfected Groups

| Pathway name | ID | Infected mice (Mean ± Std) | Uninfected mice (Mean ± Std) |
| --- | --- | --- | --- |
| Ascorbate and aldarate metabolism | ec00053 | 139.789 ± 0.958 | 142.04 ± 1.581 |
| Drug metabolism - cytochrome P450 | ec00982 | 104.598 ± 0.85 | 105.261 ± 0.518 |
| Glycine, serine, and threonine metabolism | ec00260 | 50.586 ± 0.807 | 48.544 ± 2.094 |
| Glycosaminoglycan degradation | ec00531 | 78.611 ± 0.568 | 77.616 ± 1.778 |
| Glycosphingolipid biosynthesis - globo and isoglobo series | ec00603 | 198.785 ± 8.711 | 202.718 ± 1.443 |
| Selenocompound metabolism | ec00450 | 141.024 ± 23.292 | 159.357 ± 1.326 |
| Amino sugar and nucleotide sugar metabolism | ec00520 | 105.101 ± 0.287 | 104.142 ± 1.246 |
| Arginine and proline metabolism | ec00330 | 102.133 ± 0.884 | 100.602 ± 0.933 |
| Citrate cycle (Krebs cycle) | ec00020 | 116.843 ± 12.089 | 124.87 ± 0.702 |
| Fatty acid biosynthesis | ec00061 | 303.491 ± 5.538 | 307.308 ± 0.489 |
| Fatty acid elongation | ec00062 | 67.066 ± 8.073 | 71.807 ± 0.022 |
| Folate biosynthesis | ec00790 | 302.951 ± 9.635 | 287.446 ± 9.319 |
| Glycolysis | ec00010 | 145.131 ± 6.6 | 138.049 ± 11.634 |
| Lysine degradation | ec00310 | 13.663 ± 3.617 | 8.986 ± 3.171 |
| Mannose type O-glycan biosynthesis | ec00515 | 136.003 ± 20.316 | 152.586 ± 6.335 |
| Metabolism of xenobiotics by cytochrome P450 | ec00980 | 69.32 ± 0.17 | 68.617 ± 0.827 |
| N-glycan biosynthesis | ec00510 | 221.444 ± 2.738 | 227.992 ± 5.498 |
| O-glycan biosynthesis | ec00514 | 162.416 ± 1.829 | 155.666 ± 8.056 |
| Other glycan degradation | ec00511 | 177.914 ± 1.182 | 175.957 ± 4.27 |
| Pantothenate and CoA biosynthesis | ec00770 | 24.598 ± 8.195 | 27.777 ± 0.525 |
| Pentose phosphate | ec00030 | 102.537 ± 0.314 | 97.822 ± 9.699 |
| Propanoate metabolism | ec00640 | 212.329 ± 1.465 | 201.314 ± 9.563 |
| Pyrimidine metabolism | ec00240 | 172.185 ± 4.223 | 181.393 ± 6.125 |
| Sulfur metabolism | ec00920 | 33.591 ± 7.459 | 36.213 ± 2.483 |

Table 6.

P. leucopus Pathways with Slightly Different Activity Level Across Infected and Uninfected Groups

| Pathway name | ID | Infected mice (Mean ± Std) | Uninfected mice (Mean ± Std) |
| --- | --- | --- | --- |
| Amino sugar and nucleotide sugar metabolism | ec00520 | 104.8 ± 1.365 | 102.262 ± 2.796 |
| Arachidonic acid metabolism | ec00590 | 163.557 ± 0.317 | 162.903 ± 1.17 |
| Nitrogen metabolism | ec00910 | 102.949 ± 0.324 | 101.743 ± 0.897 |
| Folate biosynthesis | ec00790 | 314.768 ± 6.619 | 307.406 ± 1.934 |
| Fructose and mannose metabolism | ec00051 | 30.991 ± 0.403 | 30.493 ± 0.193 |
| Glutathione metabolism | ec00480 | 45.435 ± 0.73 | 44.655 ± 0.569 |
| Glycosphingolipid biosynthesis - lacto and neolacto series | ec00601 | 29.256 ± 6.267 | 41.326 ± 6.14 |
| Glyoxylate and dicarboxylate metabolism | ec00630 | 108.993 ± 11.861 | 120.784 ± 8.979 |
| Inositol phosphate metabolism | ec00562 | 39.927 ± 0.154 | 39.575 ± 0.588 |
| Porphyrin metabolism | ec00860 | 278.62 ± 1.556 | 275.075 ± 6.258 |
| Riboflavin metabolism | ec00740 | 117.214 ± 8.465 | 105.181 ± 5.623 |
| Steroid hormone biosynthesis | ec00140 | 131.347 ± 2.431 | 132.832 ± 0.644 |
| Thiamine metabolism | ec00730 | 58.599 ± 0.158 | 58.15 ± 0.715 |
| Tyrosine metabolism | ec00350 | 70.298 ± 2.634 | 66.036 ± 2.207 |
| Ubiquinone and other terpenoid-quinone biosynthesis | ec00130 | 194.363 ± 4.996 | 201.709 ± 5.371 |

Finally, we check how stable the enzyme participation coefficients are across the two mouse species (see Table 7). Note that the average relative standard deviation (RSD) for C3H is 2.7%, in contrast to the much higher RSD of 8.9% for P. leucopus. This may be because C3H mice are genetically identical. Note that the average RSD for enzyme participation coefficients in the microbial community for the same metabolic pathway (ec00620) is 34.8% (see Rondel et al., 2021), which is considerably higher than the RSD for mice.
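As a worked example of the RSD, the snippet below recomputes the two %RSD values of the EC:1.1.1.1 row of Table 7 from its six C3H and six P. leucopus coefficients. It assumes that the RSD is the sample standard deviation (n − 1 in the denominator) divided by the mean, taken over all six samples of a species; this assumption is not stated in the text, but it reproduces the tabulated values. The function name is hypothetical.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


# EC:1.1.1.1 participation coefficients in pathway ec00620 (Table 7).
c3h = [0.110, 0.107, 0.113, 0.106, 0.109, 0.112]
p_leucopus = [0.054, 0.061, 0.049, 0.051, 0.045, 0.048]

rsd_c3h = rsd_percent(c3h)                # about 2.50, Table 7 reports 2.501
rsd_p_leucopus = rsd_percent(p_leucopus)  # about 10.93, Table 7 reports 10.928
```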

Table 7.

The Enzyme Expression Coefficients and Relative Standard Deviations (%RSD) for the Enzyme Participation Coefficients in Pathway ec00620

| Enzyme (ec00620) | Infected C3H (1) | Infected C3H (2) | Infected C3H (3) | Uninfected C3H (1) | Uninfected C3H (2) | Uninfected C3H (3) | %RSD (C3H) | Infected P. leucopus (1) | Infected P. leucopus (2) | Infected P. leucopus (3) | Uninfected P. leucopus (1) | Uninfected P. leucopus (2) | Uninfected P. leucopus (3) | %RSD (P. leucopus) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EC:1.1.1.1 | 0.110 | 0.107 | 0.113 | 0.106 | 0.109 | 0.112 | 2.501 | 0.054 | 0.061 | 0.049 | 0.051 | 0.045 | 0.048 | 10.928 |
| EC:1.5.8.3 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:3.1.3.3 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.1.2.10 | 0.034 | 0.034 | 0.035 | 0.035 | 0.034 | 0.034 | 1.504 | 0.028 | 0.030 | 0.027 | 0.027 | 0.025 | 0.025 | 7.027 |
| EC:5.1.1.18 | 0.032 | 0.038 | 0.033 | 0.037 | 0.034 | 0.031 | 8.157 | 0.013 | 0.016 | 0.011 | 0.015 | 0.012 | 0.012 | 14.740 |
| EC:1.4.3.21 | 0.050 | 0.055 | 0.055 | 0.054 | 0.054 | 0.051 | 4.019 | 0.028 | 0.029 | 0.019 | 0.027 | 0.025 | 0.024 | 14.269 |
| EC:2.6.1.52 | 0.059 | 0.058 | 0.060 | 0.059 | 0.061 | 0.060 | 1.763 | 0.047 | 0.050 | 0.042 | 0.043 | 0.041 | 0.040 | 8.826 |
| EC:2.1.1.20 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.7.1.165 | 0.095 | 0.086 | 0.087 | 0.088 | 0.087 | 0.088 | 3.696 | 0.077 | 0.067 | 0.074 | 0.061 | 0.060 | 0.059 | 11.586 |
| EC:1.5.3.1 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.3.1.29 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:4.1.2.48 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:1.1.99.1 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.3.1.37 | 0.055 | 0.052 | 0.055 | 0.055 | 0.056 | 0.055 | 2.499 | 0.072 | 0.066 | 0.074 | 0.067 | 0.070 | 0.077 | 5.909 |
| EC:2.1.2.1 | 0.051 | 0.051 | 0.052 | 0.052 | 0.052 | 0.050 | 1.591 | 0.038 | 0.041 | 0.035 | 0.036 | 0.034 | 0.033 | 8.093 |
| EC:1.1.1.95 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:1.1.1.103 | 0.027 | 0.025 | 0.026 | 0.026 | 0.026 | 0.026 | 2.433 | 0.035 | 0.031 | 0.038 | 0.033 | 0.035 | 0.041 | 10.039 |
| EC:2.1.4.1 | 0.049 | 0.047 | 0.049 | 0.049 | 0.049 | 0.050 | 2.013 | 0.049 | 0.049 | 0.044 | 0.047 | 0.046 | 0.051 | 5.252 |
| EC:4.2.1.22 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:4.4.1.1 | 0.061 | 0.059 | 0.061 | 0.061 | 0.060 | 0.060 | 1.353 | 0.047 | 0.050 | 0.043 | 0.045 | 0.042 | 0.042 | 7.112 |
| EC:4.3.1.17 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:1.4.3.4 | 0.062 | 0.070 | 0.064 | 0.070 | 0.067 | 0.068 | 4.864 | 0.028 | 0.033 | 0.023 | 0.028 | 0.027 | 0.026 | 11.895 |
| EC:1.4.3.3 | 0.046 | 0.053 | 0.047 | 0.052 | 0.049 | 0.045 | 6.711 | 0.019 | 0.023 | 0.015 | 0.021 | 0.017 | 0.017 | 15.771 |
| EC:1.8.1.4 | 0.088 | 0.090 | 0.089 | 0.091 | 0.090 | 0.090 | 1.152 | 0.042 | 0.049 | 0.034 | 0.043 | 0.039 | 0.040 | 12.040 |
| EC:2.1.1.5 | 0.050 | 0.048 | 0.050 | 0.050 | 0.050 | 0.050 | 1.644 | 0.050 | 0.050 | 0.048 | 0.048 | 0.046 | 0.049 | 3.127 |
| EC:2.1.1.2 | 0.049 | 0.047 | 0.049 | 0.049 | 0.049 | 0.050 | 2.013 | 0.049 | 0.049 | 0.044 | 0.047 | 0.046 | 0.051 | 5.252 |

4. DISCUSSION

The results of our study highlight the potential of using RNA-seq data to estimate enzyme expression and metabolic pathway activity in the rodent models of disease. Our modified EM-based pipeline, EMPathways2, has successfully demonstrated its ability to estimate enzyme expression, enzyme participation in pathways, and metabolic pathway activity levels in both infected and uninfected mice.

These findings further enhance our understanding of the biochemical changes occurring in the host during bacterial infection. The differences in enzyme expression and pathway activity levels between infected and uninfected mice could provide insights into the immune response mechanisms at the metabolic level. This, in turn, can potentially be used to develop new therapeutic strategies for bacterial infections and other diseases.

The variation in pathway activity levels between C3H and P. leucopus sheds light on the different immune responses in these rodent species. The higher number of pathways significantly affected by infection in P. leucopus compared with C3H may explain why C3H mice develop Lyme disease symptoms while P. leucopus mice do not. This highlights the importance of considering the host species in understanding the disease pathogenesis. It is also interesting to note that the enzyme participation coefficients were more stable in C3H compared with P. leucopus. This could be owing to the genetic similarities among laboratory mouse strains, as compared with wild mice.

5. CONCLUSIONS

In this article, we propose an improved maximum likelihood-based pipeline for the estimation of metabolic pathway activity in mice using the KEGG pathway database. Specifically, the proposed approach uses EM-based algorithms to estimate enzyme expression, enzyme participation levels in pathways, and metabolic pathway activity.

The proposed metabolic pathway analysis was applied to RNA-seq data from 12 mouse samples collected from C3H and P. leucopus mice, half of which were infected with B. burgdorferi 297. The key findings of the study are as follows:

  • The infection affects the metabolism of both mouse species, and the effect is more significant for P. leucopus than for C3H.

  • The enzyme participation coefficients vary insignificantly for C3H, in contrast to the higher variation for P. leucopus and the much higher variation for microbial communities.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Artem S. Rogovskyy for his invaluable expertise and guidance throughout this project.

DATA AVAILABILITY

The data presented in the study are deposited in the GenBank Sequence Read Archive (SRA) depository and the SRA accession numbers are SAMN32740077, SAMN32740078, SAMN32740079, SAMN32740080, SAMN32740081, SAMN32740082, SAMN32740083, SAMN32740084, SAMN32740085, SAMN32740086, SAMN32740087, SAMN32740088.

AUTHORS’ CONTRIBUTIONS

F.R. did conceptualization and methodology and writing of the article. H.F. and R.H. worked on implementation of methodology of metabolic pathway retrieval and did some writing of the draft. A.J. helped in methodology technically and also did formatting of the draft. S.K. worked on conceptualization and preparing the initial implementation of the project. S.M. and A.Z. supervised the project along with funding acquisition.

AUTHOR DISCLOSURE STATEMENT

The authors state that the research was conducted without any commercial or financial affiliations that could be interpreted as potential conflicts of interest.

FUNDING INFORMATION

The work at Georgia State University (GSU) was partially supported by NIH grant 1R21CA241044-01A1, NSF grant IIS-2212508, by the GSU Molecular Basis of Disease Fellowship, and by the GSU Brain and Behavior Fellowship. S.M. was supported by the National Science Foundation Grants (2041984 and 2135954) and the National Institutes of Health Grant R01AI173172. The work at Texas A&M University was partially supported by the National Institutes of Health (NIH) grant R03AI135159-02, the Department of Veterinary Pathobiology, Texas A&M School of Veterinary Medicine and Biomedical Sciences, and Texas A&M AgriLife.

REFERENCES

  1. Barthold SW, Beck DS, Hansen GM, et al. Lyme borreliosis in selected strains and ages of laboratory mice. J Infect Dis 1990;162(1):133–138.
  2. Crandall H, Dunn DM, Ma Y, et al. Gene expression profiling reveals unique pathways associated with differential severity of Lyme arthritis. J Immunol 2006;177(11):7930–7942.
  3. Efron B, Tibshirani R. On testing the significance of sets of genes. 2007.
  4. Gaber AM, Mandric I, Nitirahardjo C, et al. Comparative transcriptome analysis of Peromyscus leucopus and C3H mice infected with the Lyme disease pathogen. Front Cell Infect Microbiol 2023;13:1115350.
  5. Huson DH, Mitra S, Ruscheweyh HJ, et al. Integrative analysis of environmental sequences using MEGAN4. Genome Res 2011;21(9):1552–1560.
  6. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28(1):27–30.
  7. Konwar KM, Hanson NW, Pagé AP, et al. MetaPathways: A modular pipeline for constructing pathway/genome databases from environmental sequence information. BMC Bioinformatics 2013;14:202–210.
  8. Kugeler KJ, Schwartz AM, Delorey MJ, et al. Estimating the frequency of Lyme disease diagnoses, United States, 2010–2018. Emerg Infect Dis 2021;27(2):616–619.
  9. Mandric I, Temate-Tiagueu Y, Shcheglova T, et al. Fast bootstrapping-based estimation of confidence intervals of expression levels and differential expression from RNA-seq data. Bioinformatics 2017;33(20):3302–3304.
  10. Margolin J. Of mice, men, and the genome. Genome Res 2000;10(10):1431–1432.
  11. Masopust D, Sivula CP, Jameson SC. Of mice, dirty mice, and men: Using mice to understand human immunology. J Immunol 2017;199(2):383–388.
  12. Mead PS. Epidemiology of Lyme disease. Infect Dis Clin 2015;29(2):187–210.
  13. Mitrea C, Taghavi Z, Bokanizad B, et al. Methods and approaches in the topology-based analysis of biological pathways. Front Physiol 2013;4:278.
  14. Rondel F, Hosseini R, Sahoo B, et al. Estimating enzyme participation in metabolic pathways for microbial communities from RNA-seq data. In Bioinformatics Research and Applications: 16th International Symposium, ISBRA 2020, Moscow, Russia, December 1–4, 2020, Proceedings 16, pp. 335–343. Springer, 2020.
  15. Rondel FM, Hosseini R, Sahoo B, et al. Pipeline for analyzing activity of metabolic pathways in planktonic communities using metatranscriptomic data. J Comput Biol 2021;28(8):842–855.
  16. Schwanz LE, Voordouw MJ, Brisson D, et al. Borrelia burgdorferi has minimal impact on the Lyme disease reservoir host Peromyscus leucopus. Vector-Borne Zoonotic Diseases 2011;11(2):117–124.
  17. Sharon I, Bercovici S, Pinter RY, et al. Pathway-based functional analysis of metagenomes. J Comput Biol 2011;18(3):495–505.
  18. Shen M, Li Q, Ren M, et al. Trophic status is associated with community structure and metabolic potential of planktonic microbiota in plateau lakes. Front Microbiol 2019;10:2560.
  19. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005;102(43):15545–15550.
  20. Ye Y, Doak TG. A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol 2009;5(8):e1000465.


Articles from Journal of Computational Biology are provided here courtesy of Mary Ann Liebert, Inc.
