PLOS One. 2020 Mar 6;15(3):e0228073. doi: 10.1371/journal.pone.0228073

lillies: An R package for the estimation of excess Life Years Lost among patients with a given disease or condition

Oleguer Plana-Ripoll 1,*, Vladimir Canudas-Romo 2, Nanna Weye 1, Thomas M Laursen 1, John J McGrath 1,3,4, Per Kragh Andersen 5
Editor: Louise Emilsson 6
PMCID: PMC7059906  PMID: 32142521

Abstract

Life expectancy at a given age is a summary measure of mortality rates present in a population (estimated as the area under the survival curve), and represents the average number of years an individual at that age is expected to live if current age-specific mortality rates apply now and in the future. A complementary metric is the number of Life Years Lost, which is used to measure the reduction in life expectancy for a specific group of persons, for example those diagnosed with a specific disease or condition (e.g. smoking). However, calculation of life expectancy among those with a specific disease is not straightforward for diseases that are not present at birth, and previous studies have considered a fixed age at onset of the disease, e.g. at age 15 or 20 years. In this paper, we present the R package lillies (freely available through the Comprehensive R Archive Network; CRAN) to guide the reader on how to implement a recently introduced method to estimate excess Life Years Lost associated with a disease or condition that overcomes these limitations. In addition, we show how to decompose the total number of Life Years Lost into specific causes of death through a competing risks model, and how to calculate confidence intervals for the estimates using non-parametric bootstrap. We provide a description of how to use the method when the researcher has access to individual-level data (e.g. electronic healthcare and mortality records) and when only aggregated-level data are available.

Introduction

Life expectancy at birth for a given population is defined as the life table average age at death. However, demographers are usually interested in estimating life expectancy for a population that is not yet extinct, and for which age at death is not available for everyone. For example, one could be interested in estimating life expectancy for babies born in year 2017. In such situations, life expectancy at birth is a summary measure of life tables based on mortality for that given year (estimated as the area under the survival curve built with all age-specific mortality rates in that specific year). This measure can be interpreted as the average number of years a newborn is expected to live if all age-specific mortality rates in that year remain constant in the future. In addition, it is possible to estimate life expectancy for different subgroups in the population (e.g. boys and girls), by considering the age-specific mortality rates within these specific subgroups, and assuming that a boy (girl) will experience current mortality rates for males (females) throughout their life. Global life expectancy at birth in year 2017 was 75.6 years for females and 70.5 years for males [1]. Females were therefore expected to live five years longer than males or, put differently, males were expected to lose five years of life compared to females.

The number of years of life lost can also be used to estimate how many years patients with a given disease are expected to lose compared to the general population, by subtracting the life expectancy in those with the disorder or condition of interest from that of the general population. Such a measure is useful to quantify – and therefore compare – the societal burden of different diseases. Although this health metric is of interest for epidemiologists due to its potential impact, the estimation of life expectancy among those with a given disease is not straightforward (different methodologies have been described and discussed in detail elsewhere [2,3]). For congenital diseases (i.e. diseases that are present at birth), life expectancy among the diseased can be estimated using age-specific mortality rates among those with the disease, as we assume that they will experience these mortality rates throughout their entire life. However, for other disorders not present at birth, age of onset can vary widely between individuals. If the same approach is used for these types of disorders, the estimated life expectancy can only be interpreted as the expected lifetime for persons who have had the disease since birth, since there is again the assumption that the diseased experienced these mortality rates throughout their entire life – even before disease onset. In an attempt to overcome this limitation when calculating life expectancy for persons with a given disease, some studies have assumed that the persons with the disease experienced the mortality rates of the general population until a specific age threshold, and the specific mortality rates of the diseased afterwards (e.g. 15 years for mental disorders [4–6], 20 years for type 1 diabetes [7,8], and 55 years for colon cancer [9]). However, this simplifying assumption does not reflect the underlying age of onset distribution, and can result in biased estimates of life expectancy. Another approach has been to consider those diagnosed before a range of different possible ages [10,11], which makes results difficult to interpret and report because there is a different estimate for each age-of-onset anchor point.

Recently, new methods have been developed that overcome these past limitations. The Life Years Lost method [2,12] can estimate remaining life expectancy among those with a disorder of interest at the specific observed time of onset, and compare the average of these individuals to that of the general population of the same age. In addition, one of the main advantages of this method is that it is possible to decompose the total life lost associated with a given disease into specific causes of death by means of a competing risks model [13], which permits the inspection of how specific causes of death contribute to the premature mortality in the disorder or exposed group of interest (e.g. smoking or any other time-varying or constant risk factor). Differences in remaining life expectancy after disease onset are relatively easy to interpret, and complement other widely used mortality estimates (e.g. standardized mortality rates or mortality rate ratios). This method has been used to investigate excess mortality associated with mental disorders [12,14,15]; the Life Years Lost measure found smaller differences in life expectancy compared to previous estimates that assumed a fixed age at onset at age 15 years [4–6].

In this paper, we present the R package lillies (a word that reflects the initials of Life Years Lost; LYLs), whose version 0.2.5 is available through the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=lillies, and will work with R version 3.5.0 or higher. We provide a description of how to estimate excess LYLs associated with a given disease or condition, including the decomposition into specific causes of death and the calculation of confidence intervals using bootstrapping. Additionally, we provide a description of how to use the method when the researcher has access to individual-level data (e.g. electronic healthcare and mortality records), but also when only aggregated-level data are available.

Excess Life Years Lost among patients with a given disease: the method

The LYL method, which is based on a population of persons with a given disease, uses age at disease diagnosis for each person in the population as its starting point and estimates the expected residual lifetime at that age using age-specific mortality rates. The number of excess LYLs is estimated by comparing the expected residual lifetime for someone diagnosed with the disease with the life expectancy of the general population at that specific age. Conveniently, age-specific life expectancy in the general population is usually available through standard life tables from the Central Bureau of Statistics in each country. By using life tables from the general population, a metric can be calculated that compares those with a disorder of interest to a group of persons from the same source population matched on age and sex. In order to obtain an overall single estimate of excess LYL (instead of one estimate for each affected person), it is possible to take an average of all the person-specific LYL. This estimate can be interpreted as the average life lost (in years) that patients with a given disease experience from the time of diagnosis in excess of that experienced by a reference population of the same age.
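The averaging described above can be written in symbols. The following is a sketch in notation assumed here for illustration (it is not taken from the paper): $a_j$ is the age at diagnosis of person $j$ among the $n$ persons with the disease, and $e^{1}(a)$ and $e^{0}(a)$ are the expected residual lifetimes at age $a$ among the diseased and in the general population, respectively.

```latex
\text{excess LYL} \;=\; \frac{1}{n}\sum_{j=1}^{n}\Bigl[\,e^{0}(a_j) - e^{1}(a_j)\Bigr]
```

Because persons diagnosed at the same age share the same pair of expectations, this is equivalent to averaging age-specific differences weighted by the number of new cases at each age.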

Excess Life Years Lost using individual-level data

Step 1: Selection of the study population

A simulated dataset is used to show how to use the LYL method (the R code is available in S3 Appendix). The ‘simu_data’ dataset (available through the lillies package) contains information on a simulated population of 100,000 persons. In this population, all individuals are followed from birth (age_start=0 for everyone) until death (at age_death, which ranges from 0.25 to 95 years). Note that persons alive at age 95 years (n=4,384) are censored. While censoring at age 95 years is the only censoring mechanism applied in this simulated population, the LYL method also works with censoring at different ages, as in a standard time-to-event analysis (and therefore assuming independent censoring, i.e. those being censored at one specific time should be representative of those still at risk at that time). In addition, delayed entry (i.e. left truncation) is also possible in data sets where some individuals first enter the study some years after birth (in that case, variable age_start would contain the age at start of follow-up). Additionally, all deaths were classified into two mutually exclusive and collectively exhaustive causes: natural and unnatural causes. Among those who died before age 95 years, 89,989 (94.1%) died of natural causes and 5,627 (5.9%) of unnatural causes. Finally, 32,391 persons in this simulated population experienced a disease of interest, and the age of onset for these individuals (mean age 38.9 years) is recorded in the variable age_disease.

install.packages("lillies")

library(lillies)

data(simu_data)

summary(simu_data[, c("age_death", "death", "cause_death", "age_disease")])

The R output is shown in Fig 1.

Fig 1. R Output 1.


In order to estimate life expectancy among persons with a given disease, it is necessary to identify those who experience the disease. However, the entire population will be used in Step 3 to make the comparison between the two groups. In case the researcher has access to only a group of persons with a disease (instead of the entire population), we show (in Step 5) how to compare it to the general population using publicly available life tables.

diseased <- simu_data[!is.na(simu_data$age_disease), ]

nrow(diseased)

## [1] 32391

Step 2: Life Years Lost at one specific age for persons with the disease (age 45 years as example)

The first step is to calculate remaining life expectancy for each person at time of diagnosis. The age at disease onset ranged from 1.25 to 94.75 years; it is therefore necessary to calculate remaining life expectancy at each age from 1 to 94 years (if considering only integers as possible ages of onset). For example, for someone diagnosed at age 45 years, the conditional survival function is shown in Fig 2A. This curve is built on mortality rates obtained from all persons in the population with the disease who were still alive at age 45 years (this restriction is similar to a Landmark analysis that conditions on those alive at specific landmark points [16,17]). As an example, around 40% of individuals are still alive at age 75 years, while the remaining 60% have died before that age.

Fig 2. Conditional survival curves (a, b and c), stacked cumulative incidence for all-cause mortality (a) and stacked cause-specific cumulative incidences for natural and unnatural causes of deaths (b and c) for persons with a diagnosis of the disease and alive at age 45 years. Fig 2C is the same as Fig 2B but changing the colors and the x axis label. Details on how to interpret these figures are available in S1 Appendix.
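The conditioning behind these curves can be written compactly. The following is a sketch in notation assumed here for illustration (it is not taken from the paper or S1 Appendix): $\mu(u)$ denotes the all-cause mortality rate at age $u$ among persons with the disease, and $S(t \mid 45)$ the probability of being alive at age $t$ given survival to age 45 years.

```latex
S(t \mid 45) \;=\; \exp\!\left(-\int_{45}^{t} \mu(u)\,du\right), \qquad t \ge 45
```

In Fig 2A, for instance, $S(75 \mid 45) \approx 0.40$.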

The basic features of survival analysis that are required to implement this method are provided in S1 Appendix. In brief, remaining life expectancy at age 45 years is estimated as the area under the conditional survival curve from age 45 years to ∞; however, this measure is sometimes ill-determined if there are censored observations and the curve does not reach zero (i.e. some persons are still alive at the end of the curve), as is the case in this example (in fact, the survival curve cannot reach zero if the last person at risk is censored, even if this is the only censored observation in the dataset). One approach to overcome this limitation is the τ-restricted mean lifetime, which can be interpreted as the average number of years lived before time τ, and is defined as the area under the conditional survival curve $S(t \mid 45)$ from age 45 years until time τ, i.e. $\int_{45}^{\tau} S(t \mid 45)\,dt$. For this example, τ has been set to 95 years, the age at which persons were censored if they had not died before. The estimate of LYL must therefore be interpreted as life lost after the specific age (45 years in this example) and before age 95 years. Although the choice of 95 years is arbitrary, the life lost before τ can be interpreted as total life lost if τ is an age at which the survival probability is as low as possible (ideally zero). However, in other settings, the researchers might be interested in LYL before age 18 years for childhood disorders, or before retirement at age 68 years, for example. The remaining 95-restricted life expectancy at age 45 years (referred to as ${}_{50}e_{45}$ in S1 Appendix) is therefore 26.1 years. A person with disease onset at age 45 years lives on average an additional 26.1 years before age 95 years or, put differently, persons with disease onset at age 45 years lose 23.9 years of life before reaching 95 years. Estimates at one specific age can be obtained with the function lyl, which later can be examined with functions summary and plot.

LYL45 <- lyl(data = diseased, t0 = age_disease, t = age_death, status = death, age_specific = 45, tau = 95)

summary(LYL45)

The R output is shown in Fig 3.

Fig 3. R Output 2.
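To make the "area under the curve until τ" idea concrete, the following base R sketch computes a restricted mean lifetime from a step-shaped survival curve. It is an illustration only: the helper `restricted_mean` and the survival values are hypothetical, the curve is assumed to be already conditional on being alive at the starting age, and nothing here reflects how lyl is implemented internally.

```r
# Area under a step survival curve between ages `from` and `tau`.
# `times` are the ages at which the curve drops; `surv` is the survival
# probability just after each drop (the curve equals 1 before the first drop).
restricted_mean <- function(times, surv, from, tau) {
  keep    <- times > from & times < tau
  knots   <- c(from, times[keep], tau)   # end points of each flat segment
  heights <- c(1, surv[keep])            # height of the curve on each segment
  sum(diff(knots) * heights)
}

# Hypothetical curve for persons alive at age 45 (illustrative numbers only)
times <- c(55, 65, 75, 85)
surv  <- c(0.9, 0.7, 0.4, 0.1)

life_exp  <- restricted_mean(times, surv, from = 45, tau = 95)
life_lost <- (95 - 45) - life_exp

print(c(life_exp = life_exp, life_lost = life_lost))
```

With these made-up values the restricted life expectancy is 31 years and the life lost before age 95 is 19 years; the two always add up to τ minus the starting age.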


Note that the beginning of follow-up here is at age_disease to avoid immortal time bias (individuals survive until disease onset, therefore follow-up must start at disease onset, and not at birth, for the group of persons with the disease). In addition, it is possible to estimate LYL due to specific causes of death [12,13], e.g. natural and unnatural causes of death (by providing the variable cause_death, which is a categorical variable with different causes, instead of death). The total of 23.9 years of life that those diagnosed at age 45 years lose before age 95 years can be decomposed into 21.7 years due to natural causes and 2.2 years due to unnatural causes.

LYL45 <- lyl(data = diseased, t0 = age_disease, t = age_death, status = cause_death, age_specific = 45, tau = 95)

summary(LYL45)

plot(LYL45)

The R output is shown in Fig 4 and the plot generated is shown in Fig 2B.

Fig 4. R Output 3.


Fig 2B shows the survival function for persons with disease onset at age 45 years, in an analogous way as in the figure with all-cause mortality (Fig 2A); however, the area above the survival curve (corresponding to the LYL) is now decomposed into natural and unnatural causes of death. In order to decompose the area, the cumulative incidence function for one of the causes has to be ‘stacked’ over the survival curve [13] (details on how to interpret these figures are available in S1 Appendix). Of the roughly 60% of persons who had died before age 75 years, around 50 percentage points correspond to deaths from natural causes and around 10 percentage points to deaths from unnatural causes. Note that it is possible to change the colors of the figures by adding the colors option, or change other attributes using the standard ggplot2 notation.

plot(LYL45, colors=c("blue", "red")) + ggplot2::xlab("Age (y)")

The plot generated is shown in Fig 2C.
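The decomposition of the area above the survival curve can be checked numerically. The sketch below is a base R illustration under strong simplifying assumptions that are not part of the paper: cause-specific mortality rates are constant over age, their values are hypothetical, and the helper `trapz` is a plain trapezoidal rule written for this example. It shows that the areas under the two cumulative incidence curves add up to the total life lost.

```r
# Trapezoidal rule for numerical integration (helper defined for this sketch)
trapz <- function(x, y) sum(diff(x) * (y[-length(y)] + y[-1]) / 2)

horizon <- 95 - 45                      # years between age 45 and tau = 95
t <- seq(0, horizon, by = 0.01)         # years since age 45

# Hypothetical constant cause-specific mortality rates (per year)
h_nat   <- 0.030                        # natural causes
h_unnat <- 0.005                        # unnatural causes
h_all   <- h_nat + h_unnat

surv      <- exp(-h_all * t)                   # survival after age 45
cif_nat   <- h_nat   / h_all * (1 - surv)      # cumulative incidence, natural
cif_unnat <- h_unnat / h_all * (1 - surv)      # cumulative incidence, unnatural

lyl_total <- horizon - trapz(t, surv)   # area above the survival curve
lyl_nat   <- trapz(t, cif_nat)          # part of that area due to natural causes
lyl_unnat <- trapz(t, cif_unnat)        # part due to unnatural causes

print(round(c(total = lyl_total, natural = lyl_nat, unnatural = lyl_unnat), 2))
```

The identity holds because, at every age, the probability of having died equals the sum of the cause-specific cumulative incidences.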

At this point, it is possible to estimate confidence intervals (CI) via non-parametric bootstrap with the function lyl_ci (the confidence level, 0.95 by default, can be specified with the parameter level). The total LYL for those with disease onset at age 45 years is 23.9 years (95% CI: 23.7 – 24.1).

LYL45_ci <- lyl_ci(LYL45, niter = 1000)

summary(LYL45_ci)

The R output is shown in Fig 5.

Fig 5. R Output 4.
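For readers unfamiliar with the non-parametric bootstrap, the base R sketch below shows the general idea on hypothetical, fully observed data: persons are resampled with replacement and the estimate is recomputed each time. The percentile interval used here is one common choice and is an assumption of this sketch; it is not necessarily what lyl_ci does internally, and the helper `life_lost` is hypothetical.

```r
set.seed(1)

# Hypothetical ages at death for 500 persons alive at age 45, truncated at 95
age_death <- pmin(45 + rexp(500, rate = 0.035), 95)

# Life lost before tau for persons alive at `from`. With no censoring before
# tau, the restricted mean lifetime is the mean of min(age at death, tau) - from.
life_lost <- function(x, from = 45, tau = 95) {
  (tau - from) - mean(pmin(x, tau) - from)
}

estimate <- life_lost(age_death)

# Resample persons with replacement and re-estimate, 1000 times
boot <- replicate(1000, life_lost(sample(age_death, replace = TRUE)))

# Percentile 95% confidence interval
ci <- quantile(boot, c(0.025, 0.975))

print(round(c(estimate = estimate, ci), 2))
```

More iterations give more stable interval limits at the cost of computing time.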


Step 3: Comparison to the general population

On average, individuals with disease onset at age 45 years live an additional 26.1 years, which means they lose 23.9 years of life when considering a theoretical maximum of 95 years. It is now possible to estimate the life expectancy and life lost for the general population of the same age. The difference between the two estimates results in the excess LYL, which is the amount of life persons with a disease at age 45 years lose in excess of that seen in the general population.

LYL45_ref <- lyl(data = simu_data, t = age_death, status = cause_death, age_specific = 45, tau = 95)

summary(LYL45_ref)

The R output is shown in Fig 6.

Fig 6. R Output 5.


On average, individuals in the general population alive at age 45 years live an additional 32.3 years, which means that a person with disease onset at age 45 years loses on average 6.2 years compared to a person of the same age from the general population (32.3 – 26.1 = 6.2). Alternatively, a person with disease onset at age 45 years experiences an excess LYL of 6.2 years (23.9 vs. 17.7 years). Note that start of follow-up (parameter t0) is not specified because all persons are followed from birth (it should be specified in situations with delayed entry). The excess LYL of 6.2 years among those with a disease compared to the general population at age 45 years can be decomposed into 5.0 years due to natural causes and 1.2 years due to unnatural causes.

lyl_diff(LYL45_ci, LYL45_ref)

The R output is shown in Fig 7.

Fig 7. R Output 6.
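The arithmetic behind these differences can be reproduced directly from the age-45 values reported in the text and in Table 1. The short base R check below uses only those published numbers; the object names `dis45` and `ref45` are chosen for this sketch.

```r
# Values at age 45 reported in the text and Table 1 (years)
dis45 <- c(life_exp = 26.1, lost = 23.9, natural = 21.7, unnatural = 2.2)
ref45 <- c(life_exp = 32.3, lost = 17.7, natural = 16.7, unnatural = 1.0)

# Excess LYL, computed either from life expectancy or from life lost
excess_from_le   <- ref45[["life_exp"]] - dis45[["life_exp"]]
excess_from_lost <- dis45[["lost"]] - ref45[["lost"]]

# Cause-specific excess LYL
causes <- c("natural", "unnatural")
excess_by_cause <- dis45[causes] - ref45[causes]

print(round(c(total = excess_from_lost, excess_by_cause), 1))
```

Both routes give 6.2 years, and the cause-specific differences (5.0 and 1.2 years) sum to the same total.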


The function lyl_diff compares two objects, which can be provided with confidence intervals (as LYL45_ci) or without them (as LYL45_ref). When estimating confidence intervals for the difference, the object without confidence intervals is assumed to be estimated without uncertainty (this assumption might be reasonable if the entire population is available). Finally, it is possible to draw the two survival curves side by side with the function lyl_2plot.

lyl_2plot(LYL45, LYL45_ref)

The plot generated is shown in Fig 8. On average, we can see that, for those alive at age 45 years, mortality after this age is much lower for a person from the general population than for a person with the disease of interest.

Fig 8. Survival curves and stacked cause-specific cumulative incidences for natural and unnatural causes of deaths for persons with a diagnosis of the disease (left panel) and the general population (right panel) alive at age 45 years. Details on how to interpret these figures are available in S1 Appendix.

The main reason to compare those with a given disease to the general population – and not to persons without the disease – is that the number of LYL at a given age, e.g. 45 years, is estimated using mortality rates at ages 45 years and beyond. By choosing persons without the disease as a comparison group, we would assume that someone who has not experienced the disease at age 45, would remain free of the disease until death. Although it might seem problematic to include persons with a disease in both the diseased and reference groups, this is analogous to standardized mortality ratios, which compare mortality in a group of persons to the one in the general population [18]. In any case, differences in life expectancy would be even larger if the comparison group were persons without the disease.

Step 4: Life Years Lost over a range of different ages

We have presented how to estimate LYLs at age 45 years. In order to obtain a summary measure of LYLs associated with the disorder of interest, remaining life expectancy after diagnosis for each person in the group of individuals with the disease has to be calculated. This is the same as estimating them for each specific age from 1 to 94 years, and then weighting the average by the number of new cases at each age, which can be performed with functions lyl_range and summary with the appropriate weights. Age-specific estimates are available in Table 1. The weighted average for remaining life expectancy after the diagnosis is 34.0 years, which means that on average persons with the disease live an additional 34.0 years after the diagnosis. Alternatively, individuals with the disease lose 22.4 years of life after the diagnosis when considering a theoretical maximum of 95 years.

Table 1. For each age i from 0 to 94 years: number of persons diagnosed with the disease (ni); 95-restricted remaining life expectancy at each age for those with the disease (LEi1) and those from the general population (LEi0); total, natural and unnatural years of life lost at each age (denoted respectively as ${}_{95-i}ə_{i}$, ${}_{95-i}ə_{i1}$ and ${}_{95-i}ə_{i2}$ in S1 Appendix) for those with the disease (LLi1, LLni1 and LLui1, respectively) and those from the general population (LLi0, LLni0 and LLui0, respectively); and differences between these estimates (LEi0 − LEi1 or LLi1 − LLi0 for overall differences, and LLni1 − LLni0 and LLui1 − LLui0, respectively, for cause-specific differences). Weighted means are averages of each column weighted by the number of cases at each age, corresponding to means over the whole group of women with the disease.

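All columns are in years except ‘Cases’. Columns 2 to 6 refer to the DISEASED, columns 7 to 10 to the GENERAL POPULATION, and columns 11 to 13 to the DIFFERENCE between the two.

| Age i | Cases ni | Life exp LEi1 | Life lost LLi1 | Natur LLni1 | Unnat LLui1 | Life exp LEi0 | Life lost LLi0 | Natur LLni0 | Unnat LLui0 | Total | Natur | Unnat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 67.4 | 27.6 | 22.5 | 5.1 | 75.4 | 19.6 | 17.6 | 2.0 | 8.0 | 4.9 | 3.1 |
| 1 | 35 | 66.4 | 27.6 | 22.5 | 5.1 | 74.6 | 19.4 | 17.4 | 2.0 | 8.2 | 5.1 | 3.1 |
| 2 | 76 | 65.4 | 27.6 | 22.5 | 5.1 | 73.6 | 19.4 | 17.4 | 2.0 | 8.2 | 5.1 | 3.1 |
| … | | | | | | | | | | | | |
| 44 | 299 | 26.9 | 24.1 | 21.8 | 2.3 | 33.2 | 17.8 | 16.8 | 1.0 | 6.3 | 5.0 | 1.3 |
| 45 | 302 | 26.1 | 23.9 | 21.7 | 2.2 | 32.3 | 17.7 | 16.7 | 1.0 | 6.2 | 5.0 | 1.2 |
| 46 | 301 | 25.3 | 23.7 | 21.6 | 2.1 | 31.4 | 17.6 | 16.6 | 1.0 | 6.1 | 5.0 | 1.1 |
| … | | | | | | | | | | | | |
| 93 | 85 | 1.5 | 0.5 | 0.5 | 0.01 | 1.6 | 0.4 | 0.4 | 0.01 | 0.1 | 0.1 | 0.00 |
| 94 | 57 | 0.8 | 0.2 | 0.2 | 0.01 | 0.9 | 0.1 | 0.1 | 0.00 | 0.1 | 0.1 | 0.00 |
| Weighted means | | 34.0 | 22.4 | 19.3 | 3.1 | 40.2 | 16.2 | 15.0 | 1.2 | 6.2 | 4.3 | 1.9 |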

LYL <- lyl_range(data = diseased, t0 = age_disease, t = age_death, status = cause_death, age_begin = 0, age_end = 94, tau = 95)

LYL_ci <- lyl_ci(LYL, niter = 1000)

summary(LYL_ci, weights = diseased$age_disease)

The R output is shown in Fig 9.

Fig 9. R Output 7.


Analogously, it is possible to estimate Life Years Lost over a range of ages for the general population, and summarize using the same weights as the population with the disease. The weighted average of remaining life expectancy for the general population for ages corresponding to the age-at-onset distribution is 40.2 years. The excess LYL among persons with a given disease after disease onset compared to the general population of the same age is therefore 6.2 years (22.4 – 16.2), which can be decomposed into 4.3 years (19.3 – 15.0) due to natural causes and 1.9 years (3.1 – 1.3; with different results due to rounding error) due to unnatural causes.

LYL_ref <- lyl_range(data = simu_data, t = age_death, status = cause_death, age_begin = 0, age_end = 94, tau = 95)

summary(LYL_ref, weights = diseased$age_disease)

lyl_diff(LYL_ci, LYL_ref, weights = diseased$age_disease)

The R output is shown in Fig 10.

Fig 10. R Output 8.


When going from several age-specific LYL to one single estimate, we simply average over the observed distribution of onset ages. This information is usually available when collecting data from a group of patients with a disease. Alternatively, if information on the entire population (and not only the diseased) is available, one could use the distribution of onset ages conditional on disease occurrence estimated through transition intensities in the illness-death model [2]. It is also important to take into consideration the age-of-onset distribution in the population of interest, as ages with more cases will have larger weights in the overall estimate. Naturally, life lost will be larger at younger ages, simply because the potential of life lost at younger ages is larger than at older ages. Two diseases with exactly the same age-specific excess LYL could have different overall LYL if the age-of-onset distribution for the two diseases is different. The function summary without the appropriate weights will provide a table with the LYL at each specific age, which could be useful to investigate age-specific life lost.
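The effect of the age-of-onset distribution on the overall estimate can be illustrated with a small base R example. All numbers below are hypothetical and chosen only to show the mechanism: two diseases share the same age-specific excess LYL but differ in when new cases occur.

```r
# Hypothetical age-specific excess LYL shared by two diseases (years),
# named by age at onset
excess_lyl <- c("20" = 9, "40" = 7, "60" = 4, "80" = 1)

# Hypothetical numbers of new cases at each onset age
cases_early <- c(400, 300, 200, 100)   # disease with mostly early onset
cases_late  <- c(100, 200, 300, 400)   # disease with mostly late onset

# Overall excess LYL: age-specific values weighted by number of new cases
overall_early <- weighted.mean(excess_lyl, w = cases_early)
overall_late  <- weighted.mean(excess_lyl, w = cases_late)

print(c(early_onset = overall_early, late_onset = overall_late))
```

The early-onset disease has an overall excess LYL of 6.6 years and the late-onset disease 3.9 years, even though the age-specific values are identical.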

Step 5: Comparison to the general population when individual-level data are not available

In some situations, individual data from the general population might not be available. Fortunately, standard life tables are usually available from the Central Bureau of Statistics in each country or from the World Health Organization, and estimates from the population with a disease can be compared to these standard life tables. Standard life tables for Danish women in the period 2017-2018 (www.statistikbanken.dk) are provided in the ‘pop_ref’ dataset (available through the lillies package) to be used as an example.

data(pop_ref)

head(pop_ref); tail(pop_ref)

The R output is shown in Fig 11.

Fig 11. R Output 9.


For each age, the proportion of women alive and the mortality rate are provided. The estimation of excess LYL using life tables can be performed with the function lyl_diff_ref, and only one of the two age-specific measures provided (mortality rates or survival) is sufficient. This function identifies whether the results for individuals with a disease are at one specific age or over a range of ages, and whether they include confidence intervals, and returns the corresponding comparison.

lyl_diff_ref(LYL45, data_ref = pop_ref, age = age, surv = survival)

lyl_diff_ref(LYL_ci, data_ref = pop_ref, age = age, rates = mortality_rates, weights = diseased$age_disease)

The R output is shown in Fig 12.

Fig 12. R Output 10.


If the simulated population used in this example were Danish women in 2017-2018, then the excess LYL at age 45 years could be estimated as 12.3 years. The average excess LYL among persons with a given disease after disease onset compared to the general population of Danish women of the same age is 11.8 years (95% CI: 11.7 – 12.0).
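One way to see why either mortality rates or survival is sufficient is that each column can be derived from the other. The base R sketch below shows the conversion under the assumption that the mortality rate is constant within each one-year age interval; the rates are hypothetical, and this is not a statement about how the columns of pop_ref were constructed or how lyl_diff_ref handles them.

```r
# Hypothetical yearly mortality rates for ages 90 to 94 (illustrative values)
age   <- 90:94
rates <- c(0.15, 0.17, 0.19, 0.22, 0.25)

# Survival at exact ages 90, 91, ..., 95, conditional on being alive at 90,
# assuming a constant rate within each one-year age interval
survival <- exp(-cumsum(c(0, rates)))
names(survival) <- 90:95

# Going the other way: recover the rates from the survival column
rates_back <- -diff(log(survival))

print(round(survival, 3))
print(round(unname(rates_back), 2))
```

Under this assumption the two representations carry the same information, so a life table providing only one of them can still be used as the reference.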

Excess Life Years Lost using aggregated-level data

We saw in the previous section that publicly available life tables can be used to compare life expectancy for a group of individuals with a disease with that of the general population. In some instances, individual-level data are not available for the group of individuals with the disease of interest either. However, it is still possible to estimate LYLs if the researcher has access to (i) number of new cases at each specific age, and (ii) age-specific mortality rates among those with the disease (or age-specific survival probability). A simulated dataset ‘aggreg_data’ is available through the package lillies as an example for a disease with possible onset after age 40 years. Note also that data are available only until age 90 years (the maximum age τ has to be set at 90 years for this example).

data(aggreg_data)

head(aggreg_data); tail(aggreg_data)

The R output is shown in Fig 13.

Fig 13. R Output 11.


Excess LYL at one specific age using aggregated data can be estimated with function lyl_aggregated, providing aggregated data for those with the disease and also for the general population. A person with disease onset at age 70 years experiences an excess LYL of 7.7 years (12.6 years vs. 4.9 years in the reference population consisting of all Danish women, as in the previous example).

lyl_summary_data70 <-
    lyl_aggregated(data = aggreg_data, age = age, rates = rate,
            data0 = pop_ref, age0 = age, surv0 = survival,
            age_specific = 70, tau = 90)

summary(lyl_summary_data70)

plot(lyl_summary_data70)

The R output is shown in Fig 14 and the plot generated is shown in Fig 15. On average, we can see that, for those alive at age 70 years, mortality after this age is much higher for a person with the disease of interest than from the general population.

Fig 14. R Output 12.


Fig 15. Survival curve and stacked cumulative incidence for mortality for persons with a diagnosis of the disease (left panel) and the general population (right panel) alive at age 70 years. Details on how to interpret these figures are available in S1 Appendix.

Excess LYL averaged over the observed age-at-onset distribution can be estimated with function lyl_aggregated_range, providing the number of new cases at each age (weights).

lyl_summary_data <-
    lyl_aggregated_range(data = aggreg_data, age = age,
                rates = rate, weights = new_cases,
                data0 = pop_ref, age0 = age, surv0 = survival,
                age_begin = 40, age_end = 89, tau = 90)

summary(lyl_summary_data)

The R output is shown in Fig 16. Persons with a diagnosis experience a remaining life expectancy after disease onset 6.2 years shorter than the reference population.

Fig 16. R Output 13.
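The logic of working from aggregated inputs can be sketched in base R without the package. Everything below is an assumption made for illustration: the rates and numbers of new cases are hypothetical (they are not the values in aggreg_data or pop_ref), the helper `restricted_le` is written for this example, and rates are taken to be constant within each one-year age interval. It is not a description of how lyl_aggregated_range is implemented.

```r
age <- 40:89
tau <- 90

# Hypothetical aggregated inputs
rate_dis  <- 0.002 * exp(0.09 * (age - 40))          # mortality rates, diseased
rate_ref  <- 0.001 * exp(0.09 * (age - 40))          # mortality rates, reference
new_cases <- rep(c(10, 30, 50, 30, 10), each = 10)   # new cases at each age

# Life expectancy at age a restricted to the last age interval available,
# assuming a constant rate within each one-year age interval
restricted_le <- function(a, age, rate) {
  r <- rate[age >= a]
  s_start <- exp(-cumsum(c(0, r[-length(r)])))   # survival at interval starts
  sum(s_start * (1 - exp(-r)) / r)               # expected years lived per interval
}

le_dis <- sapply(age, restricted_le, age = age, rate = rate_dis)
le_ref <- sapply(age, restricted_le, age = age, rate = rate_ref)

# Age-specific excess LYL, then the average over the age-at-onset distribution
excess_by_age  <- le_ref - le_dis
excess_overall <- weighted.mean(excess_by_age, w = new_cases)

print(round(excess_overall, 2))
```

Only the age-specific rates and the number of new cases at each age enter the calculation, which is why individual-level records are not required.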


Conclusions

The Life Years Lost is an informative measure of mortality associated with a given trait or disorder that has greater construct validity because it uses the observed age-at-onset. One of its main advantages is that it allows the decomposition of excess mortality into specific causes of death, which is important in order to examine the magnitude of each cause, and implement focused public health programs. In this paper, we present the lillies R package, which can be used to estimate Life Years Lost and we show how to implement the method using a simulated population and life tables (available through the package). With this package, epidemiologists, applied biostatisticians, and other researchers can easily estimate cause-specific Life Years Lost using their own data. The LYL estimates are meant to be used merely as a descriptive tool rather than one on which causal conclusions should be drawn. This is first of all because the question raised concerning life-years lost with a given disease does not correspond to a well-defined intervention for which a randomized study could be conceptualized [19].

Further details about the method are available in S2 Appendix. For example, life expectancy is estimated non-parametrically and, in some instances, a small number of individuals at risk (especially at older ages) can lead to unreliable estimates. We provide a function to examine whether there are enough observations in the population to obtain valid results. As in standard survival analysis, the LYL method works well with left truncation and right censoring as long as the assumption of independent censoring is met (i.e. those being censored at one specific time should be representative of those still at risk at that time). Additionally, when there is administrative censoring at one specific age (i.e. there are no observations by design after a certain age), as was the case with the examples provided using individual-level data (95 years) and aggregated data (90 years), the LYLs can only be interpreted as life lost before that specific age, included in the models as τ. We also show in S2 Appendix how to examine whether the number of bootstrap iterations used to estimate confidence intervals is sufficient. Additionally, we show in S2 Appendix how to interpret negative excess LYLs; in a recent study [14], we observed that men with mental disorders had negative excess LYL related to cancer (i.e. the general population experienced a larger amount of life lost due to cancer than those with mental disorders), an underappreciated feature in previous studies. This finding does not imply that those with mental disorders have lower mortality due to cancer, but it relates to the ability of the LYLs to accommodate different causes of death. While men with mental disorders have higher rates of dying from cancer, they have even higher rates of dying of non-cancer causes of death, which precludes them from dying of cancer [14]. Finally, we provide the R code used to replicate the examples provided (S3 Appendix).
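The competing-risks mechanism behind negative excess LYLs can be reproduced with a toy calculation. The base R sketch below assumes constant cause-specific mortality rates with hypothetical values (they are not estimates from the study cited above), and the helper `lyl_cause` is written for this illustration. The group with the higher cancer rate still loses fewer years to cancer, because deaths from other causes occur first.

```r
# Years lost to one cause within `horizon` years of follow-up, under constant
# cause-specific rates: the area under that cause's cumulative incidence curve
lyl_cause <- function(h_cause, h_other, horizon) {
  h_all <- h_cause + h_other
  cif <- function(t) h_cause / h_all * (1 - exp(-h_all * t))
  integrate(cif, lower = 0, upper = horizon)$value
}

# Hypothetical rates per year: the diseased group has a higher cancer rate
# and a much higher rate of death from other causes
ref_cancer <- lyl_cause(h_cause = 0.010, h_other = 0.010, horizon = 50)
dis_cancer <- lyl_cause(h_cause = 0.012, h_other = 0.050, horizon = 50)

excess_cancer <- dis_cancer - ref_cancer

print(round(c(reference = ref_cancer, diseased = dis_cancer,
              excess = excess_cancer), 2))
```

With these values the excess LYL due to cancer is negative (about −2.5 years) despite the 20% higher cancer rate in the diseased group.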

Supporting information

S1 Appendix. Basic concepts of survival analysis.

(PDF)

S2 Appendix. Technical extensions of the Life Years Lost method.

(PDF)

S3 Appendix. R code to replicate the examples.

(R)

Data Availability

All relevant code to replicate the results in the paper is available in the supplement, while all relevant data are available through the 'lillies' R package, freely available through the Comprehensive R Archive Network (CRAN).

Funding Statement

This work was supported by the European Union’s Horizon 2020 research and innovation programme (Marie Sklodowska-Curie grant agreement No 837180 to Oleguer Plana-Ripoll), the Danish National Research Foundation (Niels Bohr Professorship to John J McGrath), and the National Health and Medical Research Council (John Cade Fellowship to John J McGrath). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Dicker D, Nguyen G, Abate D, Abate KH, Abay SM, Abbafati C, et al. Global, regional, and national age-sex-specific mortality and life expectancy, 1950–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392: 1684–1735. doi: 10.1016/S0140-6736(18)31891-9
  • 2. Andersen PK. Life years lost among patients with a given disease. Stat Med. 2017/06/07. 2017;36: 3573–3582. doi: 10.1002/sim.7357
  • 3. Andersen PK, Canudas-Romo V, Keiding N. Cause-specific measures of life years lost. Demogr Res. 2013;29: 1127–1152. doi: 10.4054/DemRes.2013.29.41
  • 4. Lawrence D, Hancock KJ, Kisely S. The gap in life expectancy from preventable physical illness in psychiatric patients in Western Australia: retrospective analysis of population based registers. BMJ. 2013;346: f2539. doi: 10.1136/bmj.f2539
  • 5. Laursen TM, Nordentoft M, Mortensen PB. Excess Early Mortality in Schizophrenia. Annu Rev Clin Psychol. 2014;10: 425–48. doi: 10.1146/annurev-clinpsy-032813-153657
  • 6. Nordentoft M, Wahlbeck K, Hällgren J, Westman J, Ösby U, Alinaghizadeh H, et al. Excess Mortality, Causes of Death and Life Expectancy in 270,770 Patients with Recent Onset of Mental Disorders in Denmark, Finland and Sweden. Burne T, editor. PLoS One. 2013;8: e55176. doi: 10.1371/journal.pone.0055176
  • 7. Petrie D, Lung TWC, Rawshani A, Palmer AJ, Svensson A-M, Eliasson B, et al. Recent trends in life expectancy for people with type 1 diabetes in Sweden. Diabetologia. 2016;59: 1167–1176. doi: 10.1007/s00125-016-3914-7
  • 8. Livingstone SJ, Levin D, Looker HC, Lindsay RS, Wild SH, Joss N, et al. Estimated Life Expectancy in a Scottish Cohort With Type 1 Diabetes, 2008-2010. JAMA. 2015;313: 37. doi: 10.1001/jama.2014.16425
  • 9. Andersson TM-L, Dickman PW, Eloranta S, Sjövall A, Lambe M, Lambert PC. The loss in expectation of life after colon cancer: a population-based study. BMC Cancer. 2015;15: 412. doi: 10.1186/s12885-015-1427-2
  • 10. Kessing L V, Vradi E, Andersen PK. Life expectancy in bipolar disorder. Bipolar Disord. 2015/04/08. 2015;17: 543–548. doi: 10.1111/bdi.12296
  • 11. Laursen TM, Musliner KL, Benros ME, Vestergaard M, Munk-Olsen T. Mortality and life expectancy in persons with severe unipolar depression. J Affect Disord. 2016;193: 203–207. doi: 10.1016/j.jad.2015.12.067
  • 12. Erlangsen A, Andersen PK, Toender A, Laursen TM, Nordentoft M, Canudas-Romo V. Cause-specific life-years lost in people with mental disorders: a nationwide, register-based cohort study. Lancet Psychiatry. 2017/11/11. 2017;4: 937–945. doi: 10.1016/S2215-0366(17)30429-7
  • 13. Andersen PK. Decomposition of number of life years lost according to causes of death. Stat Med. 2013/07/11. 2013;32: 5278–5285. doi: 10.1002/sim.5903
  • 14. Plana-Ripoll O, Pedersen CB, Agerbo E, Holtz Y, Erlangsen A, Canudas-Romo V, et al. A comprehensive analysis of mortality-related health metrics associated with mental disorders: a nationwide, register-based cohort study. Lancet. 2019;394: 1827–1835. doi: 10.1016/S0140-6736(19)32316-5
  • 15. Laursen TM, Plana-Ripoll O, Andersen PK, McGrath JJ, Toender A, Nordentoft M, et al. Cause-specific life years lost among persons diagnosed with schizophrenia: Is it getting better or worse? Schizophr Res. 2019;206: 284–290. doi: 10.1016/j.schres.2018.11.003
  • 16. Dafni U. Landmark analysis at the 25-year landmark point. Circ Cardiovasc Qual Outcomes. 2011. doi: 10.1161/CIRCOUTCOMES.110.957951
  • 17. Putter H, Houwelingen HC van. Understanding Landmarking and Its Relation with Time-Dependent Cox Regression. Stat Biosci. 2017;9: 489–503. doi: 10.1007/s12561-016-9157-9
  • 18. Clayton D, Hills M. Statistical models in epidemiology. Oxford: Oxford University Press; 1993.
  • 19. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC; 2020.

Decision Letter 0

Louise Emilsson

22 Nov 2019

PONE-D-19-29790

lillies: an R package for the estimation of excess Life Years Lost among patients with a given disease or condition

PLOS ONE

Dear Plana-Ripoll,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR:

As you can see from the reviewers' comments (below), we feel this is a good description of the R package lillies and how to use it. The paper would need to be improved further by describing the underlying assumptions and limitations. For example, you mention that it is important to use a tau as large as possible so that few individuals are (right) censored. Indeed, in most epidemiological research there is a lot of right censoring. Would that disqualify this method? Is there a target value (proportion of right censoring) where you consider the method (in)appropriate? Specifically for competing risks, this is a crucial problem and needs a thorough discussion. Further, the method needs to be put in a causal framework of thinking (counterfactual comparison(s) identified by your estimator) as suggested by the reviewers. Also, for your overall estimate, the package (or at least your description) lacks a comprehensive description of the underlying distribution of age at onset of the disease in the population/cohort. 12 life years lost from birth is very different from 12 years lost from age 75, both in terms of interpretation and percentage etc. Please also address all the specific comments raised by the reviewers. We are looking forward to your resubmission.

==============================

We would appreciate receiving your revised manuscript by Jan 06 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Louise Emilsson

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

1. Our internal editors have looked over your manuscript and determined that it is within the scope of our Digital Health Technology Call for Papers. This collection of papers is headed by a team of Guest Editors for PLOS ONE: Eun Kyoung Choe (University of Maryland, College Park), Chelsea Dobbins (University of Queensland), Sunghoon Ivan Lee (University of Massachusetts, Amherst), and Claudia Pagliari (University of Edinburgh). The Collection will encompass a diverse range of research articles on digital health technologies ranging from technology design to patient care and health systems management. Additional information can be found on our announcement page: https://collections.plos.org/s/digital-health-tech.

If you would like your manuscript to be considered for this collection, please let us know in your cover letter and we will ensure that your paper is treated as if you were responding to this call. If you would prefer to remove your manuscript from collection consideration, please specify this in the cover letter.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript "lillies: an R package for the estimation of excess Life Years Lost among patients with a given disease or condition" is a very well written work that introduces the reader to the method of Life Years Lost in a very friendly, clear and comprehensive tone. The paper first describes the method and guides the reader through a number of steps, where the theoretical content is well complemented by the provided code and the interpretation of the output.

The paper does a great job by presenting the different features of the R package on different scenarios (like competing risks, aggregated-level data and comparison to general population using available life tables). The functions are very intuitive and the names of the arguments for each function are easy to remember given the provided examples. In addition, it is very considerate that the plots can be modified using ggplot2 arguments, and that the package provides handy functions to assess how small numbers can influence the estimates and how many iterations are necessary for bootstrap confidence intervals. The consistency in the coding style (for example using snake_case for variable names) is also appreciated.

Major comments:

1. Given that the work and the R package are targeted to applied researchers, it would be useful to emphasize under which underlying assumptions the estimates are valid in the Methods Section.

2. In addition, it would be useful to add a discussion of the limitations of the method to the Conclusions section, particularly regarding causal interpretations based on the results.

Minor comments:

1. In the conclusions section, it would be useful to mention if there are other available R packages for the same method, and if so, compare the proposed package to the packages available.

2. At the end of Step 3, could you provide a reference after the sentence: “Although it might seem problematic to include persons with a disease in both the diseased and reference groups, this is analogous to standardized mortality ratios, which compare mortality in a group of persons to the one in the general population”. (Lines 305, 306, page 11)

Reviewer #2: This is a very well-written paper that addresses an interesting question – how to estimate average excess life years lost among individuals with a disease compared with the general population. The authors have prepared an R package to disseminate their work, facilitating the implementation of their ideas by a general audience. However, I have a couple of concerns:

1) The authors have not addressed or cited any of the important work that has been done in the field of causal inference in the last 30 years (e.g. any work by Jamie Robins, Tyler VanderWeele, Miguel Hernan). It appears that they are asking a fundamentally causal question: what are the average life years lost if everyone suffered from a particular disease compared with the “natural course” – the disease status they suffered in real life. This is a huge gap when thinking about this paper and should be addressed. The authors repeatedly describe causal quantities but do not consider any of the identifiability conditions for their claims, e.g. what counterfactual comparisons they are making by using their estimator. I believe this paper would be much stronger if it filled in this gap.

2) One issue which the authors do not consider is the fact that individuals may develop the disease at any time during the study. That brings into question the issue of having a clearly defined “time zero,” which can lead to a lot of immortal time bias (see Suissa 2007). The authors may want to familiarize themselves with the work by Danaei et al (2013) on the effect of statins on CVD, where they apply a “multiple trials” approach to estimating an average treatment effect (where your “treatment” would be disease versus no disease).

3) It seems that the authors choose to study a conditional survival quantity – probability of survival at time t if an individual survived to be 45 (figure 1, page 7). This should be more clearly specified and outlined. Also, there may be limitations to using a conditional survival quantity versus a marginal survival quantity.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Mar 6;15(3):e0228073. doi: 10.1371/journal.pone.0228073.r002

Author response to Decision Letter 0


6 Dec 2019

Dear Dr Emilsson,

Thanks for allowing us the opportunity to revise and resubmit this manuscript, and for providing us with many constructive suggestions. We have addressed each reviewer point below. Two versions of the revised manuscript have been uploaded (a ‘clean’ version, and one with track changes showing corrections). Additionally, we made changes to meet PLOS ONE’s style requirements (e.g. format of headings and subheadings), as suggested in the decision letter. Finally, thank you for alerting us to the interesting call for Digital Health Technology papers, but we do not think our manuscript is a good match, so we would like to have the manuscript removed from this collection consideration.

ACADEMIC EDITOR:

As you can see from the reviewers' comments (below), we feel this is a good description of the R package lillies and how to use it. The paper would need to be improved further by describing the underlying assumptions and limitations. For example, you mention that it is important to use a tau as large as possible so that few individuals are (right) censored. Indeed, in most epidemiological research there is a lot of right censoring. Would that disqualify this method? Is there a target value (proportion of right censoring) where you consider the method (in)appropriate? Specifically for competing risks, this is a crucial problem and needs a thorough discussion. Further, the method needs to be put in a causal framework of thinking (counterfactual comparison(s) identified by your estimator) as suggested by the reviewers. Also, for your overall estimate, the package (or at least your description) lacks a comprehensive description of the underlying distribution of age at onset of the disease in the population/cohort. 12 life years lost from birth is very different from 12 years lost from age 75, both in terms of interpretation and percentage etc. Please also address all the specific comments raised by the reviewers. We are looking forward to your resubmission.

RESPONSE: Thank you for constructive suggestions, which we address in this revision of the manuscript:

We have now improved the manuscript by describing the assumptions and limitations. We agree there is usually a lot of right censoring in epidemiological research. The life-years lost method uses standard survival analysis techniques and can deal with right censoring without problems, as long as the assumption of independent censoring holds. A large amount of right censoring would lead to more uncertainty in the estimates, as with standard estimators such as the Kaplan-Meier or the Aalen-Johansen, but the point estimate will be unbiased if censoring is independent. We have now reworded some parts of the manuscript to include the assumption of independent censoring. Specifically, we stated in the introduction that (new text underlined) “the LYL method also works with censoring at different ages, as in a standard time-to-event analysis (and therefore assuming independent censoring, i.e. those being censored at one specific time should be representative of those still at risk at that time)” (page 6, line 124); and in the conclusions that “As in standard survival analysis, the LYL method works well with left truncation and right censoring as long as the assumption of independent censoring is met (i.e. those being censored at one specific time should be representative of those still at risk at that time)” (page 19, line 549).

Regarding the use of a tau as large as possible, we believe our explanations in the previous version of the manuscript were not clear. The choice of a tau is related to the availability of data after a certain age. When there is administrative censoring at one specific age tau (i.e. there are no observations after age tau), it is not possible to estimate the survival curve beyond that age, and consequently neither is the life expectancy. For this reason, the estimate obtained can only be interpreted as life expectancy (or life-years lost) before age tau. By choosing an age tau in which the survival curve is as low as possible (ideally zero), the life-years lost before age tau can be interpreted simply as the overall life-years lost (for example, life-years lost before age 120 years can be interpreted as overall life-years lost, as survival at age 120 years is zero). We have now rephrased the manuscript to make this statement clearer: “In brief, remaining life expectancy at age 45 years is estimated as the area under the conditional survival curve from age 45 years to ∞; however, this measure is sometimes ill-determined if there are censored observations and the curve does not reach zero (i.e. some persons are still alive at the end of the curve), as it is the case in this example (in fact, the survival curve cannot reach zero if the last person at risk is censored, even if this is the only censored observation in the dataset). One approach to overcome this limitation is the τ-restricted mean lifetime, which can be interpreted as the average number of years lived before time τ, and is defined as the area under the curve until time τ: For this example, τ has been set to 95 years, an age in which persons were censored if they had not died before. The estimate of LYL has therefore to be interpreted as life lost after the specific age (45 years in this example) and before age 95 years.
Although the choice of 95 years is arbitrary, the life lost before τ can be interpreted as total life lost if τ is an age in which the survival probability is as low as possible (ideally zero). However, in other settings, the researchers might be interested in LYL before age 18 years for childhood disorders, or before retirement at age 68 years, for example” (page 7, line 171). We also added the following sentence to the conclusions: “Additionally, when there is administrative censoring at one specific age (i.e. there are no observations after a certain age by design), as it was the case with the examples provided using individual-level data (95 years) and aggregated data (90 years), the LYLs can only be interpreted as life-lost before that specific age, included in the models as τ” (page 19, line 551).
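For reference, the quantities discussed in this response can be written out as follows. The notation is an assumption made here for illustration and is not quoted from the manuscript: $S(u \mid a)$ denotes the probability of surviving to age $u$ given survival to age $a$, $e_a(\tau)$ the τ-restricted mean lifetime after age $a$, and $\mathrm{LYL}_a(\tau)$ the life years lost between ages $a$ and $\tau$.

```latex
\begin{align*}
e_a(\tau) &= \int_a^{\tau} S(u \mid a)\,\mathrm{d}u, \\
\mathrm{LYL}_a(\tau) &= (\tau - a) - e_a(\tau)
  = \int_a^{\tau} \bigl[\,1 - S(u \mid a)\,\bigr]\,\mathrm{d}u .
\end{align*}
```

With $a = 45$ and $\tau = 95$, the life years lost are 50 years minus the area under the conditional survival curve between those ages. If $S(\tau \mid a) = 0$, nothing is cut off at $\tau$ and $e_a(\tau)$ coincides with the unrestricted remaining life expectancy, which is why life lost before τ can then be read as total life lost.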

The estimates obtained through the “lillies” package are valid even when the user aims to describe the reduction in life expectancy in people experiencing a disease or condition without making any causal claim. However, we agree it is important to not mislead the user about making any causal interpretation, and have added a paragraph on causal interpretations (page 18, line 539). See the comments to reviewer 2 for further details.

Thank you for the comment about the age-at-onset distribution, which we believe is very important. We have now added the following section when describing the transformation from age-specific estimates to the overall one: “It is also important to take into consideration the age-of-onset distribution in the population of interest, as ages with more cases will have larger weights in the overall estimate. Naturally, life lost will be larger at younger ages, simply because the potential of life lost at younger ages is larger than at older ages. Two diseases with the exact age-specific excess LYL could have different overall LYL if the age-of-onset distribution for the two diseases is different. The function summary without the appropriate weights will provide a table with the LYL at each specific age, which could be useful to investigate age-specific life lost” (page 14, line 394).
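The point about the age-of-onset distribution can be made concrete with a short base R sketch. It does not call the lillies functions. It assumes that the overall estimate is an average of age-specific estimates weighted by the number of cases at each onset age, and every number below is hypothetical.

```r
# Hypothetical age-specific excess LYL (in years), identical for two diseases.
onset_age  <- c(20, 40, 60, 80)
excess_lyl <- c(15, 10, 5, 2)

# Hypothetical numbers of new cases at each onset age for the two diseases.
cases_early_onset <- c(500, 300, 150, 50)   # most cases diagnosed young
cases_late_onset  <- c(50, 150, 300, 500)   # most cases diagnosed old

# Overall excess LYL: age-specific values weighted by the number of cases.
overall_early <- weighted.mean(excess_lyl, w = cases_early_onset)
overall_late  <- weighted.mean(excess_lyl, w = cases_late_onset)

print(data.frame(onset_age, excess_lyl, cases_early_onset, cases_late_onset))
print(c(early_onset = overall_early, late_onset = overall_late))
```

With these made-up inputs the overall excess LYL is 11.35 years for the early-onset disease and 4.75 years for the late-onset disease, although the age-specific values are identical. This is why an overall estimate should be reported together with the age-of-onset distribution.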

REVIEWERS:

Reviewer #1:

The manuscript "lillies: an R package for the estimation of excess Life Years Lost among patients with a given disease or condition" is a very well written work that introduces the reader to the method of Life Years Lost in a very friendly, clear and comprehensive tone. The paper first describes the method and guides the reader through a number of steps, where the theoretical content is well complemented by the provided code and the interpretation of the output.

The paper does a great job by presenting the different features of the R package on different scenarios (like competing risks, aggregated-level data and comparison to general population using available life tables). The functions are very intuitive and the names of the arguments for each function are easy to remember given the provided examples. In addition, it is very considerate that the plots can be modified using ggplot2 arguments, and that the package provides handy functions to assess how small numbers can influence the estimates and how many iterations are necessary for bootstrap confidence intervals. The consistency in the coding style (for example using snake_case for variable names) is also appreciated.

RESPONSE: Thank you for a nice comment.

Major comments:

1. Given that the work and the R package are targeted to applied researchers, it would be useful to emphasize under which underlying assumptions the estimates are valid in the Methods Section.

RESPONSE: The life-years lost method uses standard survival analysis estimators, e.g. Kaplan-Meier for survival curves without competing risks, or Aalen-Johansen for cause-specific cumulative incidences. As such, the most important assumption is independent censoring, i.e. those being censored at one specific time should be representative of those still at risk at that time. We have now included this assumption in the introduction of the method (page 6, line 124) and also in the conclusions section (page 19, line 549). See responses 1 and 2 to the Academic Editor's suggestions for further details.

2. In addition, it would be useful to add a discussion of the limitations of the method to the Conclusions section, particularly regarding causal interpretations based on the results.

RESPONSE: We have now included a paragraph about causal interpretations in the conclusions section (page 18, line 539). See the comments to reviewer 2 and response 3 to the Academic Editor suggestions for further details.

Minor comments:

1. In the conclusions section, it would be useful to mention if there are other available R packages for the same method, and if so, compare the proposed package to the packages available.

RESPONSE: As far as we know, there are no other packages available to estimate life-years lost averaged over the age-of-onset distribution.

2. At the end of Step 3, could you provide a reference after the sentence: “Although it might seem problematic to include persons with a disease in both the diseased and reference groups, this is analogous to standardized mortality ratios, which compare mortality in a group of persons to the one in the general population”. (Lines 305, 306, page 11)

RESPONSE: Thank you, we have provided the following reference: Clayton D, Hills M. Statistical models in epidemiology. Oxford: Oxford University Press. 1993.

Reviewer #2:

This is a very well-written paper that addresses an interesting question – how to estimate average excess life years lost among individuals with a disease compared with the general population. The authors have prepared an R package to disseminate their work, facilitating the implementation of their ideas by a general audience. However, I have a couple of concerns:

RESPONSE: Thank you for a thorough review and constructive suggestions.

1) The authors have not addressed or cited any of the important work that has been done in the field of causal inference in the last 30 years (e.g. any work by Jamie Robins, Tyler VanderWeele, Miguel Hernan). It appears that they are asking a fundamentally causal question: what are the average life years lost if everyone suffered from a particular disease compared with the “natural course” – the disease status they suffered in real life. This is a huge gap when thinking about this paper and should be addressed. The authors repeatedly describe causal quantities but do not consider any of the identifiability conditions for their claims, e.g. what counterfactual comparisons they are making by using their estimator. I believe this paper would be much stronger if it filled in this gap.

RESPONSE: We agree that we have not mentioned any work in the field of causal inference, but we believe we are not making any causal claim in the entire manuscript. The “lillies” package is based on the life-years lost method, described previously by one of our co-authors (Andersen, 2017). This method allows estimation of the average reduction in life expectancy after disease onset experienced by those with a specific disease, but the method does not include any assumption that the reduction is actually caused by the disease. For example, we have recently used this method to show that men and women with mental disorders experience respectively 10 and 7 years shorter life expectancies after disease diagnosis compared to the general Danish population of same sex and age (Plana-Ripoll et al., 2019). However, we believe this is merely a descriptive estimate, as it is very difficult to fulfil the assumptions for causality (e.g. what would be the counterfactual of experiencing depression?). We think that whether a causal interpretation of these estimates is warranted depends on each specific case, and the user of the package should be the one to take this into consideration. In any case, we agree it is important to make this point clear and we included a paragraph on causal interpretations in the conclusions section: “The LYL estimates are meant to be used merely as a descriptive tool rather than one on which causal conclusions should be drawn. This is first of all because the question raised concerning life-years lost with a given disease does not correspond to a well-defined intervention for which a randomized study could be conceptualized (Hernán and Robins, 2020).” (page 18, line 539).

2) One issue which the authors do not consider is the fact that individuals may develop the disease at any time during the study. That brings into question the issue of having a clearly defined “time zero,” which can lead to a lot of immortal time bias (see Suissa 2007). The authors may want to familiarize themselves with the work by Danaei et al. (2013) on the effect of statins on CVD, where they apply a “multiple trials” approach to estimating an average treatment effect (where your “treatment” would be disease versus no disease).

RESPONSE: We believe we do take into consideration that individuals may develop the disease at any time during the study. In fact, we use a variable for age at onset, which we treat as time-varying, and we consider individuals to have the disease only from that age onward. In the example dataset, when estimating LYLs for those with a disease, we include the argument “t0 = age_disease” to specify that the diseased should enter follow-up only when they are diagnosed, and not at birth, precisely to avoid immortal time bias (page 8, line 203). In the first type of bias described in Suissa 2007 (misclassification of immortal time), individuals would be considered to have the disease since birth, which is exactly what we recommend that users avoid: “Note that the beginning of follow-up here is at age_disease to avoid immortal time bias (individuals survive until disease onset, therefore follow-up must start at disease onset, and not at birth, for the group of persons with the disease)” (page 8, line 203). The second type of bias described in Suissa 2007 (excluded immortal time) might seem to be what we are doing here, but in that setting a “time 0” is also needed among the unexposed because the time scale is time since exposure. In our analyses, however, age is always the underlying time scale, and individuals are followed from a pre-specified age, which is the same for exposed and unexposed. Consequently, we believe this does not create immortal time bias.
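The delayed-entry argument in this response can be illustrated with a minimal base R sketch. It does not call the lillies package; the helper `km_delayed_entry()` and the four-person toy data are hypothetical and serve only to show how the risk set changes when follow-up starts at disease onset rather than at birth, with age as the time scale.

```r
# Product-limit (Kaplan-Meier) survival estimate allowing delayed entry.
# entry: age at which a person comes under observation
# exit:  age at death or censoring
# died:  TRUE if the person died at `exit`, FALSE if censored
km_delayed_entry <- function(entry, exit, died) {
  death_ages <- sort(unique(exit[died]))
  # Risk set at age a: entered before a and still under observation at a
  at_risk <- vapply(death_ages, function(a) sum(entry < a & exit >= a), integer(1))
  deaths  <- vapply(death_ages, function(a) sum(died & exit == a), integer(1))
  data.frame(age = death_ages, at_risk = at_risk, deaths = deaths,
             surv = cumprod(1 - deaths / at_risk))
}

# Hypothetical toy data (not taken from the paper)
age_disease <- c(30, 40, 50, 60)            # age at disease onset
age_exit    <- c(55, 70, 65, 80)            # age at death or censoring
died        <- c(TRUE, TRUE, FALSE, TRUE)

# Follow-up starts at disease onset (delayed entry)
delayed <- km_delayed_entry(entry = age_disease, exit = age_exit, died = died)

# Naive analysis: everyone treated as diseased, and at risk, since birth
naive <- km_delayed_entry(entry = rep(0, 4), exit = age_exit, died = died)

print(delayed)
print(naive)
```

In this toy example the naive analysis counts four persons at risk at age 55, although the fourth person has onset at age 60 and could not have died with the disease before then. With delayed entry, only three persons are at risk at that age.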

3) It seems that the authors choose to study a conditional survival quantity – probability of survival at time t if an individual survived to be 45 (figure 1, page 7). This should be more clearly specified and outlined. Also, there may be limitations to using a conditional survival quantity versus a marginal survival quantity.

RESPONSE: The conditional survival curve is explained in Appendix S1. However, we have changed some text in the main manuscript to make it clearer that it is the conditional survival curve we are estimating (page 7, lines 159, 164, 172). A main advantage of the life-years lost method (compared to previous methods) is that it takes into consideration the observed age at onset of the disease, instead of assuming that all cases had onset at one particular age (e.g. at birth or at age 15 years). This, in turn, requires a survival curve from disease onset onward, i.e. a conditional survival curve, in order to avoid immortal time bias.
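As a rough illustration of how a conditional survival curve translates into life years lost, the sketch below uses base R only. The survival function `surv_from_birth()` is an arbitrary assumption chosen for illustration, not an estimate from any data, and the ages 45 and 95 simply mirror the kind of age range discussed in the review. Life years lost before an upper age limit are taken as the maximum possible years in that range minus the area under the conditional survival curve.

```r
# Hypothetical survival function from birth (an assumption for illustration)
surv_from_birth <- function(age) 1 - (age / 100)^2

age_start <- 45   # age that survival is conditioned on
tau       <- 95   # upper age limit for the calculation

# Conditional survival: probability of being alive at each age,
# given survival to age_start
ages      <- seq(age_start, tau, length.out = 501)
cond_surv <- surv_from_birth(ages) / surv_from_birth(age_start)

# Area under the conditional survival curve (trapezoidal rule) is the
# expected number of years lived between age_start and tau
area <- sum(diff(ages) * (head(cond_surv, -1) + tail(cond_surv, -1)) / 2)

# Life years lost before tau for someone alive at age_start
lyl_before_tau <- (tau - age_start) - area
print(lyl_before_tau)
```

Excess life years lost would then be the difference between this quantity for persons with the disease and the same quantity for the reference population.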

References

Andersen, P. K. (2017) ‘Life years lost among patients with a given disease’, Stat Med, 36(22), pp. 3573–3582. doi: 10.1002/sim.7357.

Hernán, M. A. and Robins, J. M. (2020) Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Plana-Ripoll, O. et al. (2019) ‘A comprehensive analysis of mortality-related health metrics associated with mental disorders: a nationwide, register-based cohort study’, The Lancet, 394(10211), pp. 1827–1835. doi: 10.1016/S0140-6736(19)32316-5.

Decision Letter 1

Louise Emilsson

8 Jan 2020

lillies: an R package for the estimation of excess Life Years Lost among patients with a given disease or condition

PONE-D-19-29790R1

Dear Dr. Plana-Ripoll,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Louise Emilsson

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Louise Emilsson

21 Jan 2020

PONE-D-19-29790R1

lillies: an R package for the estimation of excess Life Years Lost among patients with a given disease or condition

Dear Dr. Plana-Ripoll:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Louise Emilsson

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Basic concepts of survival analysis.

    (PDF)

    S2 Appendix. Technical extensions of the Life Years Lost method.

    (PDF)

    S3 Appendix. R code to replicate the examples.

    (R)

    Data Availability Statement

    All relevant code to replicate the results in the paper is available in the supplement, while all relevant data are available through the 'lillies' R package, freely available through the Comprehensive R Archive Network (CRAN).


    Articles from PLoS ONE are provided here courtesy of PLOS
