PLOS Computational Biology. 2025 Sep 8;21(9):e1013420. doi: 10.1371/journal.pcbi.1013420

paramix: An R package for parameter discretisation in compartmental models, with application to calculating years of life lost

Lucy Goodfellow 1, Carl A B Pearson 1,2,*,#, Simon R Procter 1,#
Editor: Nic Vega3
PMCID: PMC12425178  PMID: 40920848

Abstract

Compartmental infectious disease models are used to calculate disease transmission, estimate underlying rates, forecast future burden, and compare benefits across intervention scenarios. These models aggregate individuals into compartments, often stratified by characteristics to represent groups that might be intervention targets or otherwise of particular concern. Ideally, model calculation could occur at the most demanding resolution for the overall analysis, but this may be infeasible due to availability of computational resources or empirical data. Instead, detailed population age structure might be consolidated into broad categories such as children, working-age adults, and seniors. Researchers must then discretise key epidemic parameters, like the infection-fatality ratio, for these lower resolution groups. After estimating outcomes for those crude groups, follow-on analyses, such as calculating years of life lost (YLLs), may need to distribute or weight those low-resolution outcomes back to the high resolution. The specific calculation for these aggregation and disaggregation steps can substantially influence outcomes. To assist researchers with these tasks, we developed paramix, an R package which simplifies the transformations between high and low resolution. We demonstrate applying paramix to a common discretisation analysis: using age structured models for health economic calculations comparing YLLs. We compare how estimates vary between paramix and several alternatives for an archetypal model, including comparison to a high resolution benchmark. We consistently found that paramix yielded the most similar estimates to the high-resolution model, for the same computational burden of low-resolution models. In our illustrative analysis, the non-paramix methods estimated up to twice as many YLLs averted as the paramix approach, which would likely lead to a similarly large impact on incremental cost-effectiveness ratios used in economic evaluations.

Author summary

Researchers use infectious disease models to understand trends in disease spread, including predicting future infections under different interventions. Constraints like data availability and numerical complexity drive researchers to group individuals into broad categories; for example, all working-age adults might be represented as a single set of model compartments. Key epidemic parameters can vary widely within such groups. Additionally, model outcomes calculated using these broad categories often need to be disaggregated to a high resolution, for example a precise age at death for calculating years of life lost, a key measure when estimating the cost-effectiveness of interventions. To satisfy these needs, we present a software package, paramix, which provides tools to move between high and low resolution data. In this paper, we demonstrate the capabilities of paramix by comparing various methods of calculating deaths and years of life lost across broad age groups. For an analysis of an archetypal model, we find that paramix best matches a high-resolution model, while the alternatives are substantially different.

Introduction

Mathematical models are essential tools for understanding the transmission of pathogens within populations, for estimating and predicting the associated burden of disease on individuals and health systems, and for helping to inform public health decision-making. One common type of model is the compartmental model, which groups individuals into population compartments reflecting different stages of infection and disease. These compartments are commonly further stratified by other characteristics such as age, location, or risk factors for infection and disease. The resolution for these stratifications is constrained by various considerations, such as data availability and computational resources, and can be broad. In practical terms, such broad groups mean that many individuals who differ meaningfully in epidemiological terms are treated as indistinguishable. As we demonstrate here, these methodological assumptions can substantially impact estimates for decision-making criteria like cost-effectiveness, despite identical empirical inputs.

A notable example of this is age-stratification. Data on model inputs such as population age structure and social contact patterns are often only available in 5-year age brackets and typically use broad open age groups to encompass older ages [1], and higher resolution age brackets require more computational resources, which can be impractical for complex scenario analysis or parameter inference. For simplicity, researchers may elect to align model resolution with interventions under consideration, such as grouping all school-age children when considering the impact of school closures, working-age adults for essential worker programmes, or all elderly individuals for vaccination. Important parameters may vary significantly between individuals within these broad age groups, such as the infection-fatality ratio (IFR), prevalence of co-morbidities and risk factors, or cost of treatment. Modellers must then calculate aggregate values for these key parameters when applying them to discretised age groups. Naive approaches such as using the parameter value at the midpoint or mean age within the group may lead to incorrect results by not accounting for the variation of the parameter within the age group.

Additional issues arise when disaggregating the outcomes of compartmental models to high resolution, such as calculating the distribution of ages at death of individuals within a broad age group. This distribution is useful when calculating the years of life lost (YLLs) in an epidemic, a key measure of premature mortality used in economic evaluations of public health interventions. YLLs, which are calculated as the remaining life expectancy from the age at which a death occurs [2], often contribute a large proportion of disability-adjusted life years (DALYs) in economic evaluations, and drive the cost-effectiveness of interventions. YLLs are therefore key evidence for investment decisions such as funding routine vaccination programmes. The distribution of ages at death across a broad age group is often assumed to be proportional to the age distribution of the underlying population, but this typically leads to an overestimation of YLLs without also accounting for relative mortality rates across the age group, as deaths may be assigned at younger ages than occur in reality.

To address these issues, we present paramix, an R software package which provides functions for modellers to aggregate high resolution data into discrete, correctly weighted model parameters, and disaggregate model outputs into high resolution estimates. The package prioritises practicality, focusing on balancing ease of use with flexibility to support common modelling needs. To demonstrate the impact of different approaches, we compared model outputs using several methods when aggregating IFRs and disaggregating deaths, using an archetypal epidemic model to evaluate vaccination programmes for different underlying populations and pathogens.

Design and implementation

Functions

The paramix workflow can be summarised as 1) gathering parameters and population distribution, either as functional forms or tabulated data, 2) selecting the model and output resolution, 3) providing 1 and 2 to alembic() to create a mixing table for matched aggregation and disaggregation, 4) providing the mixing table to blend() to create compartment parameters, 5) simulating models using those parameters, 6) providing model outputs and the mixing table to distill() to disaggregate outcomes, and then 7) using the disaggregated outcomes in post-simulation analysis. Fig 1 illustrates this workflow.

Fig 1. Functionality of the paramix R package and its incorporated functions.


The alembic() function uses the model and output partitions, e.g. the age groups for the compartments and the age resolution for outcomes, to create a mixing partition. This mixing partition, the union of model age bounds and output age bounds, sets the intervals for calculating weighted parameter integrals (hereafter, weights) and populations (Fig 2). These weights and populations are then combined in different ways for the two stages: according to the model partition for aggregation or according to the output partition for disaggregation.

Fig 2. Example of partitions used in the paramix functions, where the model partition (labelled $a_i$, according to their lower bounds) and output partition (labelled $b_i$, according to their lower bounds) may differ, and the mixing partition (labelled $c_i$, according to their lower bounds) is defined as their union.

When the mixing partitions are subsequently used to form parameter discretisation for the model, each $a_i$ includes all overlapping $c_i$, so for example $a_1$ would combine the values from $c_1$ and $c_2$. After simulating outcomes on the model partition, the mixing partition and weights can be used to apportion outcomes, and the overlapping $c_i$ for both the model source and output target partitions are needed. For example, $b_1$ would also take $c_1$ and $c_2$: $c_1$ matches the output partition, but the source partition $a_1$ needs both $c_1$ and $c_2$. In this example, apportioning outcomes to $b_2$ would actually rely on all of the $c_i$, since $b_2$ intersects both $a_1$ and $a_2$ and will draw outcomes from both.

Let $A=\{a_i\}$ be the boundaries for the model partition, $B=\{b_i\}$ the boundaries for the output partition, and $C=\{c_i\}$ their union, the mixing partition, where all sets are strictly increasing (Fig 2). For brevity, we will denote partitions $[x_i, x_{i+1})$ by their lower bounds, $x_i$ (with $x$ as $a$, $b$, or $c$ as appropriate). The calculations for weights and populations for each mixing partition are then

$$\mathrm{weight}_i = \int_{c_i} \mathrm{parameter}(x)\,\rho(x)\,dx \tag{1}$$

$$\mathrm{population}_i = \int_{c_i} \rho(x)\,dx \tag{2}$$

Here, parameter(x) is the user-provided distribution of the parameter of interest, and $\rho(x)>0$ is the population density per feature of interest, with $\int_C \rho(x)\,dx = 1$. Internally, paramix handles converting from tabular input into functional forms using base R interpolation functions (splinefun() for parameters and approxfun() for density), but users can provide a custom interpolation function. In our demonstration, the partition feature x is age (so ρ is the density of the population by age), but can be any compartment stratification feature (e.g. risk, if compartments represent discretised behaviour groups, and then ρ would be the population density by risk).
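As a minimal base R sketch of Eqs 1 and 2 (not the package's internal code), the mixing partition and its weights and populations might be computed as below. The uniform population density, the output bounds, and all object names are illustrative assumptions; the parameter function is the COVID-like IFR form listed in Table 1.

```r
# Illustrative sketch of Eqs 1-2 in base R; not paramix's own implementation.
# Parameter of interest: the COVID-like IFR form given in Table 1.
ifr <- function(age) 10^(-3.27 + 0.0524 * age) / 100
# Assumed (hypothetical) population density: uniform over ages [0, 101).
dens <- function(age) rep(1 / 101, length(age))

model_bounds  <- c(0, 5, 20, 65, 101)             # A: model partition
output_bounds <- c(0, 1, 5, 20, 40, 65, 80, 101)  # B: hypothetical output partition
mixing_bounds <- sort(unique(c(model_bounds, output_bounds)))  # C: union of A and B

lower <- head(mixing_bounds, -1)
upper <- tail(mixing_bounds, -1)

# Eq 1: integral of parameter(x) * rho(x) over each mixing interval
weight <- mapply(
  function(lo, hi) integrate(function(x) ifr(x) * dens(x), lo, hi)$value,
  lower, upper
)
# Eq 2: integral of rho(x) over each mixing interval
population <- mapply(
  function(lo, hi) integrate(dens, lo, hi)$value,
  lower, upper
)

mixing_table <- data.frame(lower, upper, weight, population)
print(mixing_table)
```

Because the density integrates to one over the whole range, the population column sums to one, and the ratio weight / population gives the mean parameter value within each mixing interval.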

Using the mixing table results, blend() then computes parameters for model partition $a_i$ in terms of the corresponding mixing boundary set $\{c_j, \ldots, c_{j+n}\}$ where $c_j = a_i$ and $c_{j+n+1} = a_{i+1}$:

$$\mathrm{parameter}_i = \frac{\sum_{k=j}^{j+n} \mathrm{weight}_k}{\sum_{k=j}^{j+n} \mathrm{population}_k} \tag{3}$$
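Eq 3 amounts to summing weights and populations within each model group and taking their ratio. The base R sketch below illustrates that calculation under an assumed uniform density and hypothetical mixing bounds; it is not the body of blend(), and all object names are ours.

```r
# Illustrative sketch of Eq 3 in base R; not paramix's own implementation.
ifr  <- function(age) 10^(-3.27 + 0.0524 * age) / 100  # COVID-like IFR (Table 1)
dens <- function(age) rep(1 / 101, length(age))        # assumed uniform density

model_bounds  <- c(0, 5, 20, 65, 101)          # model partition
mixing_bounds <- c(0, 5, 20, 40, 65, 80, 101)  # hypothetical mixing partition

lower <- head(mixing_bounds, -1)
upper <- tail(mixing_bounds, -1)
weight <- mapply(
  function(lo, hi) integrate(function(x) ifr(x) * dens(x), lo, hi)$value,
  lower, upper
)
population <- mapply(
  function(lo, hi) integrate(dens, lo, hi)$value,
  lower, upper
)

# Index of the model group containing each mixing interval
group <- findInterval(lower, model_bounds)

# Eq 3: sum of weights over sum of populations, per model group
blended <- tapply(weight, group, sum) / tapply(population, group, sum)
print(blended)
```

Each blended value is the population-weighted mean of the parameter across its model group, so it does not depend on how finely the mixing partition subdivides that group.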

The distill() function distributes model outcomes (for example, estimated hospitalisations or deaths) to the output partition $b_i$ from intersecting model partition(s) in $A$, i.e. where the lower and/or upper bound of an $a_j$ is within $b_i$. Using Bayes’ theorem:

$$P(\text{in partition } b_i \mid \text{outcome}) = \frac{P(\text{outcome} \mid \text{in partition } b_i)\,P(\text{in partition } b_i)}{P(\text{outcome})} \tag{4}$$

The elements on the right hand side can be composed out of the mixture partition weights. Defining the contribution of model partition $a_j$’s outcomes to $b_i$ as $\omega_{ji}$, we can compute $\omega_{ji}$ as 0 (where $a_j$ and $b_i$ do not overlap) or in terms of $c_k = a_j$ and $c_l = a_{j+1}$ (the whole span of the model partition, proportional to $P(\text{outcome})$), and $c_m = \max(a_j, b_i)$ and $c_n = \min(a_{j+1}, b_{i+1})$ (the span of the intersection, proportional via the same multiplier to the conditioned $P(\text{outcome})$). Or in terms of set boundaries:

$$\omega_{ji} = \frac{\sum_{c_k \in b_i \cap a_j} \mathrm{weight}_k}{\sum_{c_k \in a_j} \mathrm{weight}_k} \tag{5}$$

Then the model outputs $X_j$ from $a_j$ are transformed into the output partition outputs $Y_i$ in $b_i$ (reiterating that $\omega_{ji}$ is 0 where $a_j$ and $b_i$ do not intersect) by summing over the model partitions:

$$Y_i = \sum_j \omega_{ji} X_j \tag{6}$$

Practically, most modelling work will have higher resolution outcome partitions that only subdivide low-resolution model partitions (rather than crossing multiple ones), but paramix supports arbitrary intersection of model and output partition bounds.
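To make Eqs 5 and 6 concrete, the following base R sketch builds the matrix of contributions for an output partition that deliberately crosses model group boundaries. The density, the output bounds, and the death counts are hypothetical, and the code is a sketch of the calculation rather than the body of distill().

```r
# Illustrative sketch of Eqs 5-6 in base R; not paramix's own implementation.
ifr  <- function(age) 10^(-3.27 + 0.0524 * age) / 100  # COVID-like IFR (Table 1)
dens <- function(age) rep(1 / 101, length(age))        # assumed uniform density

model_bounds  <- c(0, 5, 20, 65, 101)        # model partition A
output_bounds <- c(0, 10, 30, 50, 70, 101)   # hypothetical output partition B
mixing_bounds <- sort(unique(c(model_bounds, output_bounds)))

lower <- head(mixing_bounds, -1)
upper <- tail(mixing_bounds, -1)
weight <- mapply(
  function(lo, hi) integrate(function(x) ifr(x) * dens(x), lo, hi)$value,
  lower, upper
)

src <- findInterval(lower, model_bounds)   # model group j of each mixing interval
dst <- findInterval(lower, output_bounds)  # output group i of each mixing interval

# Eq 5: omega[j, i] is the share of model group j's weight falling in output group i
omega <- tapply(weight, list(model = src, output = dst), sum)
omega[is.na(omega)] <- 0            # non-overlapping pairs contribute nothing
omega <- omega / rowSums(omega)     # normalise by each model group's total weight

# Eq 6: apportion hypothetical model deaths X_j to output groups Y_i
deaths_model  <- c(2, 5, 120, 900)
deaths_output <- as.vector(deaths_model %*% omega)
print(round(deaths_output, 2))
```

Each row of omega sums to one, so the total number of outcomes is preserved when moving from the model partition to the output partition.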

To see how these relations are implemented in paramix, users can refer to the documentation for the alembic() function (e.g. via > ?alembic) or view the body of the function (e.g. via > print(alembic)), which includes inline comments explaining the operations. We have also reproduced those elements in Section 2 of S1 Text.

Data format

The paramix package inputs can be either functions or data.frame objects (or any type that extends data.frame, such as data.table or tibble [3,4]), though paramix returns data.table objects. Model and output partitions are provided as vectors. When users provide the parameter or density ‘functions’ as tabular data, these are translated into functions via, by default, base R interpolation methods, but this interpolation can also be user-specified.
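The translation from tabular data to functions can be sketched with the same base R interpolators named above. The tables and the specific arguments below (a piecewise-constant density that is zero outside the tabulated range) are illustrative assumptions, not necessarily the package defaults.

```r
# Illustrative sketch: turning tabulated inputs into functions with base R.
# Hypothetical IFR table, generated here from the COVID-like form in Table 1.
ifr_table <- data.frame(age = seq(0, 100, by = 5))
ifr_table$ifr <- 10^(-3.27 + 0.0524 * ifr_table$age) / 100
f_ifr <- splinefun(ifr_table$age, ifr_table$ifr)  # smooth parameter function

# Hypothetical population counts per age bracket; the final row closes the range.
pop_table <- data.frame(from = c(0, 5, 20, 65, 101), count = c(6, 18, 58, 18, 0))
width <- diff(pop_table$from)
# Per-year density within each bracket, normalised to integrate to one
per_year <- pop_table$count[-nrow(pop_table)] / width / sum(pop_table$count)
f_dens <- approxfun(
  pop_table$from, c(per_year, 0),
  method = "constant", yleft = 0, yright = 0  # assumed choices for this sketch
)

f_ifr(37)                  # interpolated IFR at an untabulated age
f_dens(c(-1, 2, 70, 200))  # density inside and outside the tabulated range
```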

Comparison to alternative approximations

To demonstrate the use of paramix and compare it to alternative approaches, we consider an archetypal infectious disease dynamical model with age stratification. We aggregate an IFR function to create death-per-infection parameters for the age groups and apply them to that archetypal model. We disaggregate the resulting fatalities based on the underlying population and same IFR function, and then compute YLLs averted by different vaccination programmes. For model age groups, we used four broad groups: pre-school age (0-4), school age (5-19), working age (20-64), and elderly (over 65). We also ran a high-resolution model with 101 age groups, i.e. 0, 1, ..., 100+ year olds, to benchmark the estimates for the broad age groups. We considered vaccination programmes targeting one of the 5-19, 20-64, or 65+ age groups. To ensure comparability of the vaccination programmes, we assumed that a fixed number of doses corresponding to vaccinating 75% of over 65s was available. For each scenario, we allocated those doses to the target age group; for the under 65 targets, this typically yields lower than 75% coverage.

We modelled pathogen dynamics using a Susceptible (S), Exposed (E), Infectious (I), and Recovered (R) epidemic model (SEIR model), with the four age groups as stratifications (Fig 3) [5]. For simplicity, we do not include ageing or births and deaths in the model. Fatalities appear in the R compartment; this is equivalent to living individuals reducing their contact rates as the population size shrinks. We assumed that vaccinations were 50% efficacious with an all-or-nought mechanism, with no loss of immunity in the epidemic time period. We represent vaccination by moving effectively vaccinated individuals to the R compartment prior to the start of simulation. We seeded the epidemic with 0.001% of each age group in the E compartment at the start of the epidemic. For the high-resolution model, both vaccination and seeding are distributed proportionally according to population within the low-resolution bounds.
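The dose allocation and initial conditions described above reduce to simple arithmetic, sketched below in base R. The population sizes are hypothetical, and treating the effectively vaccinated as coverage times efficacy times group size is our reading of the all-or-nought mechanism.

```r
# Illustrative sketch of dose allocation and initial conditions.
# Hypothetical population sizes by model age group.
pop <- c("0-4" = 6e5, "5-19" = 2.0e6, "20-64" = 5.8e6, "65+" = 1.8e6)

doses    <- unname(0.75 * pop["65+"])  # fixed supply: 75% coverage of the 65+ group
efficacy <- 0.5                        # all-or-nought vaccine efficacy

targets <- c("5-19", "20-64", "65+")
# Coverage achieved if all doses go to a single target group (capped at 100%)
coverage <- pmin(doses / pop[targets], 1)
# Effectively vaccinated individuals start the simulation in the R compartment
protected <- coverage * efficacy * pop[targets]
# Epidemic seeding: 0.001% of each age group starts in the E compartment
seed_exposed <- 1e-5 * pop

print(round(coverage, 3))
print(protected)
```

With these numbers the same doses give lower coverage in the larger 5-19 and 20-64 groups, while the number of people effectively protected is identical across scenarios.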

Fig 3. Age-stratified SEIR model, where λ is the force of infection, τ is the average rate at which exposed individuals become infectious, γ is the average rate of recovery, v denotes vaccination coverage, and A vaccine efficacy. The force of infection is determined by transmissibility, age-specific contact patterns, and the proportion of each age group in the I compartment. Subscript a denotes age-specificity.

We considered both flu-like and COVID-like pathogens, with assumed differences in transmissibility, length of infectious and latent periods [6,7], and IFR distributions (Table 1).

Table 1. Epidemiological parameters for the transmission model, for a flu-like and COVID-19-like infection.

| Pathogen | Flu-like | COVID-19-like |
| --- | --- | --- |
| Transmissibility, β | 0.15 | 0.1 |
| Latent period (days), 1/τ | 1 | 3 |
| Infectious period (days), 1/γ | 2 | 5 |
| Infection-fatality ratio for age x | Proportional to all-cause mortality at age x, $m_x$ | $10^{-3.27 + 0.0524x}/100$ |

The force of infection for age group i was then calculated as

$$\lambda_i = \beta \times \sum_j \left( c_{i,j} \times \frac{I_j}{S_j + E_j + I_j + R_j} \right)$$

Here, β is the transmissibility of the pathogen of interest, $c_{i,j}$ is the daily number of contacts between age groups $i$ and $j$, and $I_j$ is the number of infectious individuals in age group $j$.
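As a small worked example of the force of infection, the base R sketch below evaluates it for four age groups. The contact matrix and compartment sizes are hypothetical values chosen only for illustration; β is the COVID-19-like value from Table 1.

```r
# Illustrative sketch of the force-of-infection calculation.
beta <- 0.1  # COVID-19-like transmissibility (Table 1)

groups <- c("0-4", "5-19", "20-64", "65+")
# Hypothetical daily contacts: row i is the contacting group, column j the contacted group
contacts <- matrix(
  c(2.0, 1.0, 3.0, 0.5,
    1.0, 8.0, 4.0, 0.5,
    1.0, 2.0, 7.0, 1.0,
    0.5, 0.5, 3.0, 2.0),
  nrow = 4, byrow = TRUE, dimnames = list(groups, groups)
)

# Hypothetical compartment sizes by age group
I <- c(60, 400, 580, 90)
E <- c(30, 200, 290, 45)
R <- c(0, 0, 0, 675000)
S <- c(6e5, 2e6, 5.8e6, 1.8e6) - E - I - R
N <- S + E + I + R

# lambda_i = beta * sum_j c_ij * I_j / N_j
lambda <- beta * as.vector(contacts %*% (I / N))
names(lambda) <- groups
print(lambda)
```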

We present results in two underlying populations, using either a rectangular age structure similar to that of many high-income countries (HICs), or a young age structure resembling that of many low- and middle-income countries (LMICs) (Fig 4a). The populations also experienced life expectancies resembling those in HICs and LMICs, respectively, but the same age-specific IFR. These populations were calculated using World Population Prospects data for the United Kingdom and Afghanistan, respectively [9].

Fig 4. a. Population age distribution. b. Age-specific infection fatality ratio, and the results of aggregating into broad age groups using four different methods, for both a flu-like and COVID-19-like infection. c. Incidence of flu-like and COVID-19-like infections, under either no vaccination program or vaccination of specific age groups. Subpanels on the left use the demography of a low- and middle-income country (LMIC); subpanels on the right use the demography of a high-income country (HIC).

For the IFR, we assumed that the flu-like pathogen IFR follows the all-cause mortality distribution (using mortality data from the HIC setting), while the COVID-like pathogen was associated with a strictly increasing IFR with age [8] (Fig 4b). We scaled the IFRs to produce comparable total mortality for both pathogens. We calculated YLLs as the total sum of remaining life expectancy at death across all fatalities in an epidemic, meaning the results were sensitive to the method of assigning age at death to model fatalities.
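The YLL calculation itself is a weighted sum, shown here as a base R sketch. The life-expectancy function and the deaths by age are hypothetical placeholders, not the life tables or outputs used in the analysis.

```r
# Illustrative sketch of the YLL calculation.
# Hypothetical remaining life expectancy by age (deliberately crude)
life_expectancy <- function(age) pmax(0, 85 - age)

# Hypothetical deaths assigned to single ages
deaths_by_age <- data.frame(
  age    = c(30, 60, 70, 80, 90),
  deaths = c(1, 4, 10, 25, 20)
)

# YLLs: remaining life expectancy at age of death, summed over all deaths
yll <- sum(deaths_by_age$deaths * life_expectancy(deaths_by_age$age))
print(yll)
```

Shifting the same number of deaths towards younger ages increases the total, which is why the method of assigning age at death matters.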

We compared three calculation approaches for aggregating IFR when modelling with broad age groups (Table 2). Briefly: the mid-age approach is the simplest and only accounts for the bounds of the age groups; the mean age approach uses the mean age within those bounds based on the population distribution, and the paramix approach accounts for age structure across the age group when aggregating IFR. We then compared four calculation approaches for the disaggregation of deaths to calculate YLLs (Table 2). Here, the first approach is again dependent on the bounds of the age groups, the next approach assumes that all deaths occur at the mean age within these bounds, the next assumes that deaths occur proportional to the age distribution, and the paramix approach assumes that deaths occur proportional to age and IFR distributions. These approaches represent incremental increases in computational complexity as well as data requirements, but none of them are substantial compared to actual simulation and post-processing demands. The paramix approach would be the most complex to implement by hand, but as encapsulated requires the user only invoke 3 commands. When calculating the midpoint of age groups, we assumed that the open-ended 65+ age group ended at age 101.

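Table 2. Aggregation and disaggregation approaches compared in this example.

Infection-fatality ratio (IFR) aggregation

| Approach name | Calculation approach | Formula |
| --- | --- | --- |
| IFR(mid(Age)) | IFR of midpoint of age group | $\mathrm{IFR}\left(\frac{a+b}{2}\right)$ |
| IFR(E[Age]) | IFR of mean age in age group | $\mathrm{IFR}\left(\int_a^b \mathrm{age}\,\rho(\mathrm{age})\,d\mathrm{age}\right)$ |
| E[IFR(Age)] | Mean of IFR across age group | $\int_a^b \rho(\mathrm{age})\,\mathrm{IFR}(\mathrm{age})\,d\mathrm{age}$ |

Age at death disaggregation

| Approach name | Calculation approach | Formula |
| --- | --- | --- |
| Uniform | Uniform across age group | $\mathrm{Uniform}[a,b)$ |
| Mean age | All occurring at mean age in age group | $\mathbb{1}\left\{x = \int_a^b \mathrm{age}\,\rho(\mathrm{age})\,d\mathrm{age}\right\}$ |
| Prop. to pop. density | Occurring proportional to age distribution | $\frac{\rho(\mathrm{age})}{\int_a^b \rho(\mathrm{age})\,d\mathrm{age}}$ |
| paramix | Occurring proportional to age distribution and mortality | $\frac{\mathrm{IFR}(\mathrm{age})\,\rho(\mathrm{age})}{\int_a^b \mathrm{IFR}(\mathrm{age})\,\rho(\mathrm{age})\,d\mathrm{age}}$ |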
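The approaches in Table 2 can be compared for a single age group with a few lines of base R. The sketch below uses the 65+ group closed at age 101 and the COVID-like IFR from Table 1; the linearly declining density and the life-expectancy curve are hypothetical, so the numbers are indicative only and are not results from the analysis.

```r
# Illustrative sketch comparing the Table 2 approaches for one age group.
ifr <- function(age) 10^(-3.27 + 0.0524 * age) / 100  # COVID-like IFR (Table 1)
a <- 65
b <- 101  # the 65+ group, closed at age 101

# Hypothetical inputs: linearly declining density and a crude life expectancy
dens     <- function(age) b - age
life_exp <- function(age) 20 * exp(-(age - a) / 15)
flat     <- function(age) rep(1, length(age))

# Mean of f over [a, b) under an unnormalised weighting function w
expect <- function(f, w) {
  integrate(function(x) f(x) * w(x), a, b)$value / integrate(w, a, b)$value
}

mean_age <- expect(identity, dens)

# IFR aggregation (upper half of Table 2)
ifr_agg <- c(
  mid      = ifr((a + b) / 2),   # IFR(mid(Age))
  mean_age = ifr(mean_age),      # IFR(E[Age])
  paramix  = expect(ifr, dens)   # E[IFR(Age)]
)

# Mean YLLs per death under each age-at-death rule (lower half of Table 2)
yll_per_death <- c(
  uniform  = expect(life_exp, flat),
  mean_age = life_exp(mean_age),
  pop      = expect(life_exp, dens),
  paramix  = expect(life_exp, function(age) ifr(age) * dens(age))
)

print(signif(ifr_agg, 3))
print(round(yll_per_death, 2))
```

Under these assumptions the mid-age IFR is the largest and the mean-age IFR the smallest, and weighting deaths by population alone gives more YLLs per death than weighting by population and IFR together.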

To review this analysis, readers may use the following commands in R to obtain the precise analysis pipeline:

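# destination folder (must already exist)
path <- "folder/to/copy/to"
# location of the analysis files shipped with the installed package
srcdir <- system.file("analysis", package = "paramix")
srcfiles <- list.files(srcdir, full.names = TRUE, recursive = FALSE,
                       include.dirs = TRUE)
# copy files and sub-folders into the destination
file.copy(from = srcfiles, to = path, recursive = TRUE)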

The analysis is orchestrated using the make tool, with each step documented in the Makefile.

Results

Parameter aggregation

We compared the IFR values for each of the models’ broad age groups calculated using each approach; Fig 4b presents the compartmental aggregate values against the true age-specific IFR for flu-like and COVID-like pathogens. The IFR values were identical across populations when using the mid-age IFR (as this approach considers only the age bounds), but otherwise varied based on the underlying population. The age-specific flu-like IFR increased at very young ages as well as older ages; consequently, the different approaches produced divergent flu-like IFR estimates for the 0-4 model age group. The COVID-like IFR was relatively similar across approaches in the 0-4 and 5-19 age groups, but varied widely in the 20-64 and 65+ age groups.

Incidence per capita varied across underlying populations and between pathogens, as did the comparative effect of vaccinations (Fig 4c). The LMIC-like population experienced a smaller decrease in incidence per capita, due to fewer vaccines used in this example, as the proportion of the population aged over 65 was much smaller than in the HIC-like population (Fig 4a) and we fixed vaccine doses to a matching coverage in the 65+ age group.

In all scenarios, vaccinating those aged over 65 averted the most deaths, but the magnitude of deaths averted and the relative impact of each vaccine programme varied depending on the calculation approach (Fig 5). Using the IFR of the mid-age consistently calculated the most deaths averted, largely due to overemphasis on the eldest in the 65+ age group, while using the IFR of the mean age computed the fewest deaths averted. The relative importance of calculation approaches was the greatest when considering the 65+ vaccination programme: using the mid-age overestimated the number of deaths averted under a COVID-like epidemic in an LMIC-like population compared to the paramix approach by 173% when vaccinating those aged over 65, 96% when vaccinating those aged 5-19, and 20% when vaccinating those aged 20-64. Similar findings occurred for flu-like epidemics, and in the HIC-like population.

Fig 5. Estimated deaths averted under each vaccination scenario, for flu-like and COVID-like pathogens in LMIC-like and HIC-like populations. Shown for three varying approaches of infection-fatality ratio aggregation, including the paramix package, and a high-resolution model with no aggregation.

The estimated number of deaths averted is consistently most similar to the results of the high-resolution model when using the paramix approach (Fig 5). However, the high-resolution model run time is around 1000 times greater than the low-resolution models.

Outcome disaggregation

The YLLs averted again give qualitatively consistent results for all approaches in all settings, with different quantitative outcomes; however, the preferred intervention is no longer the same for flu-like and COVID-like pathogens (Fig 6). For comparability, each YLL computation was based on the same numbers of deaths in each of the four broad model age groups (those estimated by the paramix approach). This may correspond to research where figures such as the number of deaths in each age group have been provided to researchers who are planning on conducting further economic analysis. Assigning deaths at the mean age, or in proportion to the age distribution alone, neglects that, for these IFR trends, older individuals are more likely to die if exposed; paramix accounts for this. Those approaches therefore assign more deaths to relatively younger individuals than paramix does and thus estimate higher YLLs per death. The effect is most extreme when assigning age at death proportional to age distribution only, an approach frequently used by researchers: for example, YLLs averted in an HIC-like population by vaccinating elderly individuals are 99% and 76% higher in COVID- and flu-like epidemics, respectively, compared to the paramix approach. The corresponding figures are 65% and 52% in an LMIC-like population. Again, the estimated number of YLLs saved is consistently most similar to the results of the high-resolution model when using the paramix approach (Fig 6).

Fig 6. Estimated years of life saved under each vaccination scenario, for flu-like and COVID-like pathogens in LMIC-like and HIC-like populations. Shown for four varying methods of age at death disaggregation, including the paramix package, and directly from the high-resolution model.

This demonstration has shown that computational approaches for aggregation or disaggregation can drastically change the magnitude of effect of interventions and that using paramix most closely resembles the results of a fully disaggregated model. Evaluations which use thresholds to determine if an intervention should be implemented are affected by these changes in magnitude. In some cases, it is also possible that incorrect aggregation and disaggregation of parameters could change the ranking of interventions under consideration, particularly for non-linear parameters. Effective evaluation of public health interventions therefore requires considered and accurate methods of discretisation which take into account the population and parameter densities at hand. The paramix package will simplify these processes for modellers and researchers.

Demonstration limitations

For this demonstrative analysis, we used a relatively short time horizon. As is a limitation of any age-compartmentalised model, we would expect paramix estimates to diverge from the high-resolution model if there were stronger depletion effects in play: for example, deaths concentrated amongst the oldest in the population could shift the relative composition of the 65+ age compartment over time and reduce the effective death rate. However, we also generally expect the alternative methods to continue to match the high-resolution model more poorly than paramix.

We also note that there is a case where the high-resolution and paramix estimates suggest a different intervention ranking by a very narrow margin (flu-like epidemic in an LMIC-like population). This is likely due to the exact choice of assigning IFRs for the high-resolution age bands, where a decision still has to be made: for age group $x$ to $x+1$, we used $\mathrm{IFR}(x)$, when we might have instead used $\int_x^{x+1} \mathrm{IFR}(u)\,du$, $\mathrm{IFR}\left(x+\tfrac{1}{2}\right)$, or some other intermediate value. Our interpretation is that these options are within the practical margin of uncertainty for the decision and likely to result in indistinguishable outcomes for this metric. That suggests other policy values may be more appropriate decision factors, for example prioritizing directly protecting vulnerable members of society versus maintaining economic activity by protecting workers (which has its own indirect effects on those vulnerable individuals).

Availability and future directions

The paramix open-source software package is implemented in R and available for download via CRAN (https://doi.org/10.32614/CRAN.package.paramix). Installation instructions, tutorials, and detailed vignettes are available at https://cmmid.github.io/paramix/. Code used in the example detailed in this manuscript is available via GitHub (https://github.com/cmmid/paramix/tree/main/inst/analysis). This package currently supports only simple interpolations when offered data, but we provide contributor guidelines for anyone who wishes to suggest better defaults and alternatives.

The takeaways of this analysis apply to any age-specific epidemic parameters, or even more broadly, any stratifications of a population. Users can easily compare the effect of different computational approaches for different populations and parameters, as we have here, by using paramix’s built-in summary comparison functions parameter_summary() and distill_summary() for aggregation and disaggregation, respectively. We have demonstrated that aggregation and disaggregation choices can lead to large and potentially consequential changes in estimated impact of disease interventions, and shown how to use paramix to better approximate that impact. We hope that the ease and practicality of paramix will help modellers improve their estimates in future work.

Supporting information

S1 Text. Section 1. paramix functions. Section 2. Documentation for alembic(). (PDF; pcbi.1013420.s001.pdf, 288.7 KB)

Acknowledgments

We would like to thank Nicholas Davies, Jonathan Dushoff, Thomas Hladish, and Juliet Pulliam for helpful comments on the draft of this manuscript.

Data Availability

All relevant data are within the manuscript or the associated freely available R package.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Prem K, Zandvoort KV, Klepac P, Eggo RM, Davies NG, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group. Projecting contact matrices in 177 geographical regions: an update and comparison with empirical data for the COVID-19 era. PLoS Comput Biol. 2021;17:e1009098.
2. World Health Organization. Years of life lost from mortality (YLL). https://www.who.int/data/gho/indicator-metadata-registry/imr-details/159
3. Barrett T, Dowle M, Srinivasan A, Gorecki J, Chirico M, Hocking T, et al. data.table: Extension of “data.frame”. 2024. https://cran.r-project.org/web/packages/data.table/index.html
4. Müller K, Wickham H, Francois R, Bryan J. RStudio. Tibble: Simple Data Frames. 2023. https://cran.r-project.org/web/packages/tibble/index.html
5. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford: Oxford University Press; 1991.
6. Baguelin M, Flasche S, Camacho A, Demiris N, Miller E, Edmunds WJ. Assessing optimal target populations for influenza vaccination programmes: an evidence synthesis and modelling study. PLoS Med. 2013;10(10):e1001527. doi: 10.1371/journal.pmed.1001527
7. Davies NG, Klepac P, Liu Y, Prem K, Jit M, CMMID COVID-19 working group, et al. Age-dependent effects in the transmission and control of COVID-19 epidemics. Nat Med. 2020;26(8):1205–11. doi: 10.1038/s41591-020-0962-9
8. Levin AT, Hanage WP, Owusu-Boaitey N, Cochran KB, Walsh SP, Meyerowitz-Katz G. Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications. Eur J Epidemiol. 2020;35(12):1123–38. doi: 10.1007/s10654-020-00698-1
9. United Nations, Department of Economic and Social Affairs, Population Division. World Population Prospects 2022, Online Edition. 2022. https://population.un.org/wpp/Download/Standard/MostUsed/
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1013420.r001

Decision Letter 0

Nic Vega

11 May 2025

PCOMPBIOL-D-24-02097

paramix: An R package for parameter discretisation in compartmental models, with application to calculating years of life lost

PLOS Computational Biology

Dear Dr. Pearson,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days (Jul 11 2025 11:59PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Nic Vega, Ph.D.

Academic Editor

PLOS Computational Biology

Roger Kouyos

Section Editor

PLOS Computational Biology

Additional Editor Comments:

The reviews were positive with regard to the technical advance implemented in the package, but indicated a number of modifications to the manuscript itself that would improve clarity and usability.

Journal Requirements:

1) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

2) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: 

https://journals.plos.org/ploscompbiol/s/figures

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The paper is very well-written, easy to read, and is potentially relevant, with practical applications. I only have a few minor comments:

- Within integrals, the $d$ in dx should not be italicized.

- I would bring the definition of the mixing table (L62-64, Fig 2) before the introduction of the workflow (L53-60). As it stands, the workflow refers to the mixing table, but the reader cannot know what it means, because it is only introduced later.

- I would give a unit of measurement for $\rho$.

- Equation (4) is unclear to me: What does "in group [e, f)" mean...? It should be properly defined.

- Equation (5) is unclear to me. "outcome" is never defined, and it is used somewhat inconsistently: on the left-hand side we have $outcomes_x$, on the right-hand side we have $outcomes_x^y$. These should be properly defined.

- I'd add a citation to SEIR, and age-stratified SEIR models.

- "To ensure comparability of the vaccination programmes, we assumed that enough vaccines to vaccinate 75% of over 65s could instead be allocated to different age groups, which typically entails lower coverage but otherwise never approaches 100% coverage." (L102-104) I don't really understand this sentence, could you please elaborate?

- Fig 4b, which according to the figure caption shows "Age-specific remaining life expectancy", is missing from the figure itself. (Thus the remaining panels have the wrong letters.) It is also referred to in the text, despite not appearing in the figure.

- It is strange that Figure 4c (actually 4b, see above) is referred to before 4a in the text.

- "and approaches 3 and 4 produced similar estimates of deaths averted" (L170) I don't understand this, there are only 3 approaches for aggregation.

Reviewer #2: Goodfellow et al. present an R package to bin parameters that vary as a function of a continuous variable (e.g., mortality rate as a function of age) into a small number of discrete values (e.g., mortality in young, mortality in middle-aged, mortality in elderly). They make a clear argument against using a value at the midpoint of each discrete bin and instead point out that a more statistically justifiable approach would be to take each parameter's expectation over the continuous variable's probability across each bin. While this R package addresses a clear problem, insufficient details are included to fully understand the approach without looking at the source code.

I suggest either going into more formal detail about the maths underlying the methods (more like a methods paper rather than a software paper) or explicitly working through an example using the R functions in this package (more like an R vignette). For example, equations 1-5 reference bounds a,b,c,d,e,f without explicitly defining them; the term “mixing partition” is used without a definition; the parameters to the SEIR model are described only in the figure caption but not explicitly referenced in Table 1, which gives numerical values; etc.

Specific points

• fig 1: purple circles are labeled as “user inputs,” but the circle at the bottom (“post-analysis e.g. YLLs”) seems more like an output. Maybe label purple as “inputs/outputs”.

• line 71: talking about “interpolation” here made me question “what type?” until the later section that says your package uses both linear and spline interpolation. Would be better to give this information when you first mention it!

• line 87: the last word on this line (“data.tables”) is all in italics, which implies the name of this class has an “s” on the end. A better phrasing might be “\emph{paramix} returns \emph{data.table} objects”.

• fig 4: The caption needs more detail and should explicitly state that all subpanels on the left follow the LMIC-like distribution and all subpanels on the right follow the HIC-like distribution (also “LMIC” and “HIC” should be defined in the caption, not just the main text). Further, this figure seems to have been changed from an earlier version – some places refer to four panels (e.g., in the figure caption, and in text at bottom of page 5) and in other places three panels (e.g., the figure itself and in text on page 8).

• fig 5 & 6: interpretation would be easier if the four subpanels in these figures were transposed such that the layout matches fig 4 (i.e., LMIC-like is on left, HIC is on right, Flu-like is on top, COVID-like is on bottom).

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1013420.r003

Decision Letter 1

Nic Vega

11 Aug 2025

Dear Dr. Pearson,

We are pleased to inform you that your manuscript 'paramix: An R package for parameter discretisation in compartmental models, with application to calculating years of life lost' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Nic Vega, Ph.D.

Academic Editor

PLOS Computational Biology

Roger Kouyos

Section Editor

PLOS Computational Biology

***********************************************************

The reviews indicate that all major concerns have been addressed, with only minor edits suggested for clarity.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: My concerns have been addressed, I suggest the acceptance of the paper.

Reviewer #2: The authors have addressed my concerns with this revision. I noted just one minor error in their expanded method section:

On line 87, double check the inequalities and description. The text says you're checking if an upper/lower bound of a_j is within b_i. I interpret these as referencing the intervals [a_j, a_{j+1}) and [b_i, b_{i+1}). What is $n$ in $a_{j+n}$? Did you mean +1 instead of +n? Regardless, the text doesn't seem to match the inequalities – there is no check if the lower bound of a_j is within the interval [b_i,b_{i+1}). I think you meant to write that you're checking if the two intervals overlap.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1013420.r004

Acceptance letter

Nic Vega

PCOMPBIOL-D-24-02097R1

paramix: An R package for parameter discretisation in compartmental models, with application to calculating years of life lost

Dear Dr Pearson,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Section 1. paramix functions. Section 2. Documentation for alembic().

    (PDF)

    pcbi.1013420.s001.pdf (288.7KB, pdf)
    Attachment

    Submitted filename: Paramix response to reviewers.pdf

    pcbi.1013420.s002.pdf (140.8KB, pdf)

    Data Availability Statement

    All relevant data are within the manuscript or the associated freely available R package.


    Articles from PLOS Computational Biology are provided here courtesy of PLOS
