Abstract
In this study, we present a new theory of partitioning of disease prevalence and incidence-based mortality and demonstrate how this theory works in practice for analyses of Medicare data. In the theory, the prevalence of a disease and incidence-based mortality are modeled in terms of disease incidence and survival after diagnosis, supplemented by information on disease prevalence at the initial age and year available in a dataset. Partitioning of the trends of prevalence and mortality is calculated with minimal assumptions. The resulting expressions for the components of the trends are given by continuous functions of data. The estimator is consistent and stable. The developed methodology is applied to data on type 2 diabetes using individual records from a nationally representative 5% sample of Medicare beneficiaries age 65+. Numerical estimates show excellent concordance between empirical estimates and theoretical predictions. The evaluated partitioning model showed that both prevalence and mortality increase with time. The primary driving factors of the observed prevalence increase are improved survival and increased prevalence at age 65. The increase in diabetes-related mortality is driven by increased prevalence and unobserved trends in time periods and age groups outside of the range of the data used in the study. Finally, the properties of the new estimator, possible statistical and systematic uncertainties, and future practical applications of this methodology in epidemiology, demography, public health and health forecasting are discussed.
1. Introduction
Prevalence is an epidemiologic characteristic which is easily measured using survey data or medical records. Analyses of prevalence trends play an influential role in health policy planning and are widely used to assess the extent to which a given health problem affects the population. However, conclusions about the relative success or failure of a health policy change cannot be made directly from trends of disease prevalence because temporal changes in age-adjusted prevalence rates are the result of two simultaneously occurring competing processes: i) changes in incidence and ii) changes in survival. Health interventions and disease treatment guidelines are usually aimed at decreasing the incidence and increasing the survival rate for a disease. If successful, these measures will push the observed prevalence in opposite directions. A related quantity of interest is the mortality rate by cause or, more generally, the mortality for individuals after the onset of a specific disease. This is also known as the incidence-based mortality rate (Chu et al. 1994). The time trend of incidence-based mortality (Mozaffarian et al. 2016; Smith et al. 2013; Thun et al. 2013) is defined by the same factors that define the time trends in the disease prevalence rate, as well as trends in mortality in the general population. In contrast to disease prevalence, improvements in incidence and survival push the observed incidence-based mortality for a specific disease in the same direction, because reduced incidence lowers the total number of people with the disease and improved survival further reduces the number of deaths associated with the disease.
In this paper, we develop a new methodological approach for the decomposition of trends in disease prevalence and incidence-based mortality into their constituent components (such as trends in incidence, survival, and prevalence prior to observation) and for the evaluation of the strength and the direction of the contribution of each respective component. The methodology described in this study offers a number of distinct strengths: i) computation of disease prevalence and incidence-based mortality as well as their partitioning through a set of exact formulas without making simplifying assumptions, ii) evaluation of the individual contributions of each component to the total time trend by direct calculation using exact formulas applied to real data, iii) a set of natural generalizations including applications to medical costs, complications of a specific disease, the incorporation of disease risk factors and the use of the historical trends of each of the model components beyond the region directly measured in data.
The only previously existing methodological approach of this type was developed by Tunstall-Pedoe for the partitioning of mortality trends through the use of an approximate formula for the simple decomposition of the annual percent change (APC) for mortality as a sum of the APCs of cardiovascular disease incidence and case fatality (percentage of 28-day fatalities) (Tunstall-Pedoe et al. 1999). This approximation is valid only for events (disease onset and death) occurring within a short time of each other and requires that the APC be small and the disease of interest be the primary cause of death. Other methods of decomposition used in demography and epidemiology (see Canudas-Romo 2003; Horiuchi, Wilmoth and Pletcher 2008; Vaupel and Romo 2003 for a comprehensive review) are not related to the decomposition of prevalence into its constituent components.
Although the primary focus of this paper is to introduce the methodology and describe the mathematics involved in its execution, an example involving type 2 diabetes mellitus is also considered. The application of the methodology to disease prevalence and mortality is intended to address an aspect of a current public health problem: with some notable exceptions such as cardiovascular disease (Will, Yuan and Ford 2014), the prevalence rate of many chronic diseases, including diabetes, has been increasing with time (Akinbami et al. 2012; Bauer et al. 2014; Coresh et al. 2007; Egan, Zhao and Axon 2010). Understanding the contribution each individual component makes to the overall effect on disease prevalence and mortality, and how these contributions have changed over time in response to changes in health policy, population age structure and epidemiologic characteristics, could be of great use in identifying likely targets for proactive policy interventions.
2. Theory
2.1 Mathematical formalism
Data collected in an observational study represent information on eligible individuals over given periods of age and time. In this study, we use a nationally representative 5% sample of the U.S. Medicare population provided for research as restricted access public use files by the Centers for Medicare and Medicaid Services. This database provides individual health-related information on U.S. Medicare beneficiaries after age 65 from 1991 to 2013. The long time period and level of detail provided by such data allow us to calculate disease prevalence and mortality at any point after a certain look-back period (12 months is used in this study) necessary to collect individual information for evaluation of disease presence. Figure 1 presents the Lexis diagram in the plane over age (in years; denoted by x) and calendar time (in years; denoted by y). Each of the dashed lines in the Lexis diagram uniquely corresponds to a birth cohort with the birth time yb = y − x for any point (x, y) belonging to the cohort-specific dashed line. Therefore, epidemiologic characteristics at a given point of time are defined by the history of the cohort, represented by a leftward move along the respective line in the Lexis diagram down to the bounds of the available region. The bound is defined by an initial year (y00) or minimal age (x0) observed in the data. The two resulting subareas are separated by the bisecting line defined as y = y00 + x − x0. Above the bisecting line, the starting point is defined by the initial condition y = y00 with various ages, while below the line the initial point is defined by the boundary condition x = x0 with various years. The cohort-specific bounding point is defined as x̄0 = max(x0, y00 − yb) and ȳ0 = yb + x̄0. Definitions of ages and times as well as functions of survival analyses used in the paper are collected in Table 1.
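The cohort-specific bounding point can be illustrated with a few lines of Python. This is a minimal sketch, not code from the study: the function name `bounding_point` is hypothetical, and the defaults x0 = 65 and y00 = 1992 are the Medicare values used later in the text.

```python
def bounding_point(x, y, x0=65.0, y00=1992.0):
    """Return (x_bar0, y_bar0): the age and calendar time at which the
    birth cohort passing through (x, y) enters the observed region."""
    yb = y - x                    # birth time of the cohort
    x_bar0 = max(x0, y00 - yb)    # cohort-specific bounding (minimal) age
    y_bar0 = yb + x_bar0          # cohort-specific bounding (minimal) time
    return x_bar0, y_bar0


# A person aged 80 in 2000 was already 72 in 1992: the bound is y = y00.
print(bounding_point(80.0, 2000.0))   # (72.0, 1992.0)
# A person aged 70 in 2005 turned 65 in 2000: the bound is x = x0.
print(bounding_point(70.0, 2005.0))   # (65.0, 2000.0)
```

The first case lies above the bisecting line y = y00 + x − x0 and the second below it, which is the distinction used throughout the formulas that follow.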
Table 1.
Ages and calendar times used as arguments in the survival functions
x | Current age in years | y | Current (calendar) time in years
x0 | Minimal age observed in data (see Figure 1) | y00 | Initial time in data (see Figure 1)
x̄0 = max(x0, y00 − yb) | The cohort-specific bounding (minimal) age | ȳ0 = yb + x̄0 | The cohort-specific bounding (minimal) time
τ, x̄0 < τ ≤ x | Age at diagnosis | yd = y − x + τ | Time of diagnosis
x00 = y00 − y + x | The cohort-specific age at y00 | y0 = y − x + x0 | The cohort-specific time of reaching age x0
yb = y − x | Time of birth for a cohort | |

Survival analysis functions for age-specific prevalence and mortality
Pc(x, yb) | Probability of being prevalent at age x in cohort with birth time yb
Ic(τ, yb) | Incidence density function for birth cohort yb. The normalization rule for the density is
Mc(x, yb) | Mortality density function for birth cohort yb. The normalization rule for the density is
S̄(x − x̄0, x̄0, ȳ0) and f̄c(x − x̄0, x̄0, ȳ0) | Survival function and respective density function of a patient group formed at age x̄0 and time ȳ0, diagnosed before x̄0, and surviving to age x (i.e., living x − x̄0 years after cohort forming)
S(x − τ, τ, yd) and fc(x − τ, τ, yd) | Survival function and respective density function of a patient group diagnosed at age τ and time yd, and surviving to age x (i.e., living x − τ years after cohort forming)
St(x − τ, τ, yd) and μ(x, y) | Survival function and respective mortality hazard function in the general population for the cohort formed at age τ and time yd, i.e., St(0, τ, yd) = 1

Survival analysis functions for age-adjusted prevalence and mortality
P(x, y) | Prevalence at age x and time y
I(x, y) | Incidence hazard function at age x and time y
M(x, y) | Incidence-based mortality hazard function at age x and time y
P(y) | Age-adjusted prevalence at time y
M(y) | Age-adjusted incidence-based mortality hazard function at time y
Sr(x − τ, τ, yd) and fr(x − τ, τ, yd) | Relative survival and respective density function of a patient group diagnosed at age τ and year yd, and reaching age x (i.e., living x − τ years after diagnosis and cohort forming)
S̄r(x − x̄0, x̄0, ȳ0) and f̄r(x − x̄0, x̄0, ȳ0) | Relative survival and respective density function of a patient group formed at age x̄0 and time ȳ0, diagnosed before x̄0, and surviving to age x (i.e., living x − x̄0 years after cohort forming)
p(x) | The density of age distribution in a standard year
T(y) and T̂(y) | Specific contributions to the time trends of age-adjusted disease prevalence and incidence-based mortality (discussed in detail in Interpretation of partitioning components)
The formulas for prevalence rest on the observation that the probability of being prevalent, Pc(x, yb), at age x in cohort c with birth time yb requires either
i) being prevalent (represented by initial prevalence Pc(x̄0, yb)) at the initial age x̄0 (and year ȳ0) for the cohort and surviving to age x (represented by the survival probability S̄(x − x̄0, x̄0, ȳ0) of a patient diagnosed no later than x̄0), or
ii) being incident at an earlier age τ, x̄0 < τ ≤ x (represented by the incidence density function Ic(τ, yb)) and surviving longer than x − τ (represented by the survival probability S(x − τ, τ, yd) of a patient diagnosed at age τ and year yd).
Therefore
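$$P_c(x, y_b) = P_c(\bar{x}_0, y_b)\,\bar{S}(x-\bar{x}_0, \bar{x}_0, \bar{y}_0) + \int_{\bar{x}_0}^{x} I_c(\tau, y_b)\, S(x-\tau, \tau, y_d)\, d\tau, \tag{1}$$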
where we integrate over all possible ages at diagnosis. Similarly, for mortality (we consider incidence-based mortality, i.e., mortality after disease onset), the probability of dying in the age interval (x, x + dx) requires death in that interval and either being prevalent at the boundary point (x̄0, ȳ0) for this cohort or being incident at an earlier age τ. Death is represented by a respective density function Mc(x, yb) such that
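$$M_c(x, y_b) = P_c(\bar{x}_0, y_b)\,\bar{f}_c(x-\bar{x}_0, \bar{x}_0, \bar{y}_0) + \int_{\bar{x}_0}^{x} I_c(\tau, y_b)\, f_c(x-\tau, \tau, y_d)\, d\tau. \tag{2}$$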
The densities f̄c(x − x̄0, x̄0, ȳ0) and fc(x − τ, τ, yd) in (2) are related to the respective survival functions in (1): f̄c(t, x̄0, ȳ0) = −∂S̄(t, x̄0, ȳ0)/∂t and fc(t, τ, yd) = −∂S(t, τ, yd)/∂t, where the derivatives are taken with respect to the first argument. Details of the derivation of eqs. (1) and (2) and some properties of the contributing functions are given in Appendix A.
The exact definition of Pc(x, yb) is the fraction of individuals born in year yb and living with the disease at age x out of the total number of individuals born in year yb. Similarly, Ic(τ, yb) and Mc(x, yb) are the cohort incidence and mortality densities defined through the number of new incident and death cases per cohort size (i.e., the number of individuals born in year yb). However, the cohort size for the studied population is not usually known with sufficient accuracy. What is known (or can be estimated) is the current population at risk, i.e., the population currently living at the same age and in the same calendar year (y = yb + x) or calendar year of diagnosis (yd = yb + τ). Therefore, we avoid dealing with cohort prevalence and incidence/mortality densities and use their standard definitions involving the population at risk rather than birth cohort size. Within these definitions, the cohort prevalence and incidence/mortality densities are expressed through the accepted definitions of prevalence and of the hazard functions of incidence and mortality: Pc(x, yb) = P(x, y) St(x, 0, yb), Mc(x, yb) = M(x, y) St(x, 0, yb), and Ic(τ, yb) = I(τ, yd) St(τ, 0, yb), where St(x, 0, yb) is the survival function of the cohort born during year yb. Using these expressions in eq. (1) produces three survival functions on the right-hand side, which can be combined into the relative survival functions (i.e., the ratios of survival probabilities for individuals with the disease and for the general population):
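$$\bar{S}_r(x-\bar{x}_0, \bar{x}_0, \bar{y}_0) = \frac{\bar{S}(x-\bar{x}_0, \bar{x}_0, \bar{y}_0)\, S_t(\bar{x}_0, 0, y_b)}{S_t(x, 0, y_b)}, \qquad S_r(x-\tau, \tau, y_d) = \frac{S(x-\tau, \tau, y_d)\, S_t(\tau, 0, y_b)}{S_t(x, 0, y_b)}. \tag{3}$$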
The resulting expression for age-specific prevalence is:
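$$P(x, y) = P(\bar{x}_0, \bar{y}_0)\,\bar{S}_r(x-\bar{x}_0, \bar{x}_0, \bar{y}_0) + \int_{\bar{x}_0}^{x} I(\tau, y_d)\, S_r(x-\tau, \tau, y_d)\, d\tau, \tag{4}$$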
where ȳ0 = yb + x̄0 = y−x+x̄0 and yd = y−x+τ.
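The structure of the age-specific prevalence (initial prevalence carried forward by relative survival, plus incidence integrated over ages at diagnosis and weighted by relative survival) can be checked numerically. The sketch below is illustrative only: the function `prevalence`, the constant incidence hazard, and the exponential relative survival are all hypothetical choices made so that a closed-form answer exists; they are not the models estimated in this paper.

```python
import math


def prevalence(x, y, p_init, incidence, rel_surv, rel_surv_init,
               x0=65.0, y00=1992.0, n=2000):
    """Age-specific prevalence at (x, y) by the trapezoid rule.

    p_init(age, year)          -- prevalence at the cohort's bounding point
    incidence(tau, yd)         -- incidence hazard at age tau and year yd
    rel_surv(t, tau, yd)       -- relative survival t years after diagnosis
    rel_surv_init(t, age, yr)  -- relative survival of initially prevalent cases
    """
    yb = y - x
    x_bar0 = max(x0, y00 - yb)
    y_bar0 = yb + x_bar0
    boundary = p_init(x_bar0, y_bar0) * rel_surv_init(x - x_bar0, x_bar0, y_bar0)
    h = (x - x_bar0) / n
    acc = 0.0
    for k in range(n + 1):
        tau = x_bar0 + k * h                 # age at diagnosis
        weight = 0.5 if k in (0, n) else 1.0
        acc += weight * incidence(tau, yb + tau) * rel_surv(x - tau, tau, yb + tau)
    return boundary + h * acc


LAM = 0.03   # hypothetical excess mortality hazard
p = prevalence(
    75.0, 2005.0,
    p_init=lambda age, year: 0.15,
    incidence=lambda tau, yd: 0.02,
    rel_surv=lambda t, tau, yd: math.exp(-LAM * t),
    rel_surv_init=lambda t, age, year: math.exp(-LAM * t),
)
# With constant inputs the integral has a closed form (10 years since age 65).
closed_form = 0.15 * math.exp(-LAM * 10) + 0.02 * (1 - math.exp(-LAM * 10)) / LAM
print(abs(p - closed_form) < 1e-8)   # True
```

With real data, the three callables would be replaced by the fitted models for boundary prevalence, incidence, and relative survival described in Section 3.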
Combining survival probabilities for mortality results in the ratio of density fc (or f̄c) to the survival probability in the general population, that can be further transformed as
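$$\frac{f_c(x-\tau, \tau, y_d)\, S_t(\tau, 0, y_b)}{S_t(x, 0, y_b)} = f_r(x-\tau, \tau, y_d) + \mu(x, y)\, S_r(x-\tau, \tau, y_d), \qquad f_r(x-\tau, \tau, y_d) = -\frac{\partial S_r(x-\tau, \tau, y_d)}{\partial x}. \tag{5}$$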
The function μ(x, y) is the mortality hazard function in the general population. Note that because μ(x, y) = −∂ ln St(x, 0, yb)/∂x, we have ∂St(x, 0, yb)/∂x = −μ(x, y) St(x, 0, yb). All terms multiplying μ(x, y) sum to the prevalence, and we finally obtain the incidence-based mortality in terms of μ(x, y) and the functions previously derived:
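$$M(x, y) = \mu(x, y)\, P(x, y) + P(\bar{x}_0, \bar{y}_0)\,\bar{f}_r(x-\bar{x}_0, \bar{x}_0, \bar{y}_0) + \int_{\bar{x}_0}^{x} I(\tau, y_d)\, f_r(x-\tau, \tau, y_d)\, d\tau. \tag{6}$$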
Explicit representation (i.e., avoiding a function maximum used in the definition of x̄0) of prevalence and mortality as functions of x and y that allows for expressing prevalence and mortality in terms of x, y, and constants, requires considering two regions below and above the bisecting line, i.e., the regions defined by inequalities y ≥ y00 + x − x0 and y < y00 + x − x0. Thus
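$$\begin{aligned} P(x, y) &= P(x_{00}, y_{00})\,\bar{S}_r(y-y_{00}, x_{00}, y_{00}) + \int_{x_{00}}^{x} I(\tau, y_d)\, S_r(x-\tau, \tau, y_d)\, d\tau, \\ M(x, y) &= \mu(x, y)\, P(x, y) + P(x_{00}, y_{00})\,\bar{f}_r(y-y_{00}, x_{00}, y_{00}) + \int_{x_{00}}^{x} I(\tau, y_d)\, f_r(x-\tau, \tau, y_d)\, d\tau \end{aligned} \tag{7}$$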
for y< y00 + x − x0 and
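$$\begin{aligned} P(x, y) &= P(x_0, y_0)\,\bar{S}_r(x-x_0, x_0, y_0) + \int_{x_0}^{x} I(\tau, y_d)\, S_r(x-\tau, \tau, y_d)\, d\tau, \\ M(x, y) &= \mu(x, y)\, P(x, y) + P(x_0, y_0)\,\bar{f}_r(x-x_0, x_0, y_0) + \int_{x_0}^{x} I(\tau, y_d)\, f_r(x-\tau, \tau, y_d)\, d\tau \end{aligned} \tag{8}$$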
for y≥y00 + x− x0. The formulas (7) and (8) coincide for y=y00 + x − x0. In these formulas we denote the age at y00 as x00 = y00 − y+x and year at x0 as y0 = y−x+x0.
The quantity of interest is the time trend of age adjusted prevalence (over the age region (x0,xmax)) and mortality as well as their partitioning. Age-adjusted prevalence and incidence-based mortality based on (7) and (8) are:
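$$P(y) = \int_{x_0}^{x_{\max}} dx\, p(x) \Big[ \mathbb{I}(y \ge y_{00}+x-x_0)\, P(x_0, y_0)\,\bar{S}_r(x-x_0, x_0, y_0) + \mathbb{I}(y < y_{00}+x-x_0)\, P(x_{00}, y_{00})\,\bar{S}_r(y-y_{00}, x_{00}, y_{00}) + \int_{\max(x_0, x_{00})}^{x} I(\tau, y_d)\, S_r(x-\tau, \tau, y_d)\, d\tau \Big] \tag{9}$$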
and
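$$M(y) = \int_{x_0}^{x_{\max}} dx\, p(x) \Big[ \mu(x, y)\, P(x, y) + \mathbb{I}(y \ge y_{00}+x-x_0)\, P(x_0, y_0)\,\bar{f}_r(x-x_0, x_0, y_0) + \mathbb{I}(y < y_{00}+x-x_0)\, P(x_{00}, y_{00})\,\bar{f}_r(y-y_{00}, x_{00}, y_{00}) + \int_{\max(x_0, x_{00})}^{x} I(\tau, y_d)\, f_r(x-\tau, \tau, y_d)\, d\tau \Big] \tag{10}$$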
where I() is the indicator function and p(x) is the density of age distribution in a standard year. Recall, x00 = y00 − y+x, yd = y− x+τ, and y0 = y−x+x0 are functions of y and the integration variables x and τ. Age-adjusted prevalence and mortality are functions of three and four contributing factors, respectively:
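$$P(y) = P_0(y) + P_{00}(y) + P_{is}(y), \qquad M(y) = M_{P\mu}(y) + M_0(y) + M_{00}(y) + M_{is}(y). \tag{11}$$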
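The split of age-adjusted prevalence into contributions from the boundary x = x0, the boundary y = y00, and diagnoses inside the observed region can be sketched numerically as follows. Everything here is an assumption made for illustration: the uniform standard age density, the maximal age of 100, the linear toy prevalence at the boundaries, the constant incidence, and the exponential relative survival are hypothetical stand-ins for the fitted models. The age integral is split at the age where the bisecting line is crossed, so that each piece has a smooth integrand.

```python
import math

X0, Y00, XMAX = 65.0, 1992.0, 100.0   # XMAX is an assumed upper age


def trapezoid(f, a, b, n=200):
    """Composite trapezoid rule; returns 0 for an empty interval."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))


# --- toy inputs, all hypothetical ---
def p_std(x):                 # uniform standard age density on [X0, XMAX]
    return 1.0 / (XMAX - X0)

def prev_at_x0(y0):           # P(x0, y0): prevalence at age 65 in year y0
    return 0.15 + 0.002 * (y0 - Y00)

def prev_at_y00(x00):         # P(x00, y00): prevalence in 1992 at age x00
    return 0.15 + 0.001 * (x00 - X0)

def incidence(tau, yd):       # I(tau, yd)
    return 0.02

def rel_surv(t, age, year):   # stands in for all three relative survival functions
    return math.exp(-0.03 * t)


def components(y):
    """Return (P0, P00, Pis) at calendar time y."""
    x_star = min(XMAX, X0 + y - Y00)   # age at which the bisecting line is crossed

    def f0(x):     # cohorts that enter observation at x = x0
        y0 = y - x + X0
        return p_std(x) * prev_at_x0(y0) * rel_surv(x - X0, X0, y0)

    def f00(x):    # cohorts that enter observation at y = y00
        x00 = Y00 - y + x
        return p_std(x) * prev_at_y00(x00) * rel_surv(y - Y00, x00, Y00)

    def fis(x):    # cases diagnosed inside the observed region
        lo = max(X0, Y00 - y + x)
        inner = trapezoid(
            lambda tau: incidence(tau, y - x + tau)
            * rel_surv(x - tau, tau, y - x + tau),
            lo, x, 100)
        return p_std(x) * inner

    p0 = trapezoid(f0, X0, x_star)
    p00 = trapezoid(f00, x_star, XMAX)
    pis = trapezoid(fis, X0, x_star) + trapezoid(fis, x_star, XMAX)
    return p0, p00, pis


for year in (1992.0, 2002.0, 2012.0):
    p0, p00, pis = components(year)
    print(year, round(p0, 4), round(p00, 4), round(pis, 4), round(p0 + p00 + pis, 4))
```

In the initial year only the y00 boundary contributes; in later years its share shrinks while the other two grow, which is the qualitative pattern discussed for Figure 5.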
The derivative of P(y) with respect to y represents the time trend of age-adjusted prevalence and is determined by trends in the respective components, including initial prevalence (i.e., prevalence at x0 or y00), incidence rates, and relative survival both after disease onset and in patients with the disease at the initial point of observation. Explicit differentiation results in seven terms (note that max(x0, x00) depends on y because of x00). Thus,
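$$\frac{dP(y)}{dy} = T_{p0}(y) + T_{\bar{S}}(y) + T_{p00}(y) + T_{\bar{S}00}(y) + T_{X00}(y) + T_{inc}(y) + T_{S}(y), \tag{12}$$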
with explicit expressions for terms
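$$\begin{aligned} T_{p0}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y \ge y_{00}+x-x_0)\, \frac{\partial P(x_0, y_0)}{\partial y}\, \bar{S}_r(x-x_0, x_0, y_0), \\ T_{\bar{S}}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y \ge y_{00}+x-x_0)\, P(x_0, y_0)\, \frac{\partial \bar{S}_r(x-x_0, x_0, y_0)}{\partial y}, \\ T_{p00}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y < y_{00}+x-x_0)\, \frac{\partial P(x_{00}, y_{00})}{\partial y}\, \bar{S}_r(y-y_{00}, x_{00}, y_{00}), \\ T_{\bar{S}00}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y < y_{00}+x-x_0)\, P(x_{00}, y_{00})\, \frac{d \bar{S}_r(y-y_{00}, x_{00}, y_{00})}{dy}, \\ T_{X00}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y < y_{00}+x-x_0)\, I(x_{00}, y_{00})\, S_r(y-y_{00}, x_{00}, y_{00}), \\ T_{inc}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x) \int_{\max(x_0, x_{00})}^{x} \frac{\partial I(\tau, y_d)}{\partial y}\, S_r(x-\tau, \tau, y_d)\, d\tau, \\ T_{S}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x) \int_{\max(x_0, x_{00})}^{x} I(\tau, y_d)\, \frac{\partial S_r(x-\tau, \tau, y_d)}{\partial y}\, d\tau. \end{aligned} \tag{13}$$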
Derivative of mortality includes nine terms
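$$\frac{dM(y)}{dy} = \hat{T}_{p0}(y) + \hat{T}_{\bar{S}}(y) + \hat{T}_{p00}(y) + \hat{T}_{\bar{S}00}(y) + \hat{T}_{X00}(y) + \hat{T}_{inc}(y) + \hat{T}_{S}(y) + \hat{T}_{\mu}(y) + \hat{T}_{P}(y), \tag{14}$$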
with
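$$\begin{aligned} \hat{T}_{p0}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y \ge y_{00}+x-x_0)\, \frac{\partial P(x_0, y_0)}{\partial y}\, \bar{f}_r(x-x_0, x_0, y_0), \\ \hat{T}_{\bar{S}}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y \ge y_{00}+x-x_0)\, P(x_0, y_0)\, \frac{\partial \bar{f}_r(x-x_0, x_0, y_0)}{\partial y}, \\ \hat{T}_{p00}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y < y_{00}+x-x_0)\, \frac{\partial P(x_{00}, y_{00})}{\partial y}\, \bar{f}_r(y-y_{00}, x_{00}, y_{00}), \\ \hat{T}_{\bar{S}00}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y < y_{00}+x-x_0)\, P(x_{00}, y_{00})\, \frac{d \bar{f}_r(y-y_{00}, x_{00}, y_{00})}{dy}, \\ \hat{T}_{X00}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mathbb{I}(y < y_{00}+x-x_0)\, I(x_{00}, y_{00})\, f_r(y-y_{00}, x_{00}, y_{00}), \\ \hat{T}_{inc}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x) \int_{\max(x_0, x_{00})}^{x} \frac{\partial I(\tau, y_d)}{\partial y}\, f_r(x-\tau, \tau, y_d)\, d\tau, \\ \hat{T}_{S}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x) \int_{\max(x_0, x_{00})}^{x} I(\tau, y_d)\, \frac{\partial f_r(x-\tau, \tau, y_d)}{\partial y}\, d\tau, \\ \hat{T}_{\mu}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \frac{\partial \mu(x, y)}{\partial y}\, P(x, y), \\ \hat{T}_{P}(y) &= \int_{x_0}^{x_{\max}} dx\, p(x)\, \mu(x, y)\, \frac{\partial P(x, y)}{\partial y}. \end{aligned} \tag{15}$$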
Non-trivial technical aspects of derivation of the derivatives (13) and (15) are discussed in Appendix B.
2.2. Interpretation of partitioning components
The three terms contributing to disease prevalence in eq. (11) correspond to the contributions of individuals with disease onset i) before x0 = 65 (the age of eligibility for Medicare coverage for the majority of the general population) for the cohorts with y ≥ y00 + x − x0 (i.e., cohorts below the bisecting line in the Lexis diagram in Figure 1), ii) before y00 = 1992 for the cohorts with y < y00 + x − x0 (i.e., cohorts above the bisecting line in Figure 1), and iii) after x0 = 65 and after y00 = 1992 (i.e., in the shaded area in the Lexis diagram in Figure 1). Mortality in eq. (11) has four terms: MPμ(y) and three others, M0(y), M00(y), and Mis(y), which have the same meaning as the three equivalent terms in prevalence. These three terms represent the mortality rates of individuals with disease onset before x0, before y00, and after both x0 and y00, respectively. They are expressed in terms of relative survival and therefore represent the mortality of individuals with the disease relative to the mortality in the general population. The additional term in eq. (11), MPμ(y), represents the mortality that the prevalent population would have at the mortality rate of the general population. In sum, eq. (11) models two components of mortality: i) the effect of prevailing trends in the general population and ii) the effect of relative mortality in individuals with the disease.
The time trend of disease prevalence, represented by the first derivative of the age-adjusted prevalence, has seven terms. The main contributions are Tinc(y) and TS(y), which reflect the effects of trends in disease incidence and in survival after disease onset. The five other terms arise because individuals are not followed from birth. They can be combined into two terms reflecting the effects at the two bounds x = x0 and y = y00: T0(y) = Tp0(y) + TS̄(y) and T00(y) = Tp00(y) + TS̄00(y) + TX00(y), respectively. The terms Tp0(y) and TS̄(y) reflect the effects of time trends in initial prevalence (i.e., prevalence at x0 = 65) and trends in survival of these individuals. The contributions of these terms can be considered separately if the respective hypotheses are of interest. The three terms contributing to T00(y) (i.e., Tp00(y), TS̄00(y), and TX00(y)) are the only terms contributing to the trend of disease prevalence that survive in the limit y → y00. They are responsible for the reconstruction of the correct derivative in the region y ≈ y00. Specifically, the first and second terms characterize the time trends in initial prevalence and in survival at y = y00, respectively. The last term equals the age-adjusted incidence rate in the limit y → y00. Its occurrence reflects the lack of information about incidence before y00. With time, the fraction of unknown information about incidence goes down and the contribution of this term to the total time trend decreases.
Similarly, seven of the nine terms in the decomposition of mortality can be combined into four terms: T̂inc(y) and T̂S(y) represent the effects of incidence and survival for individuals diagnosed after x0 and y00, while T̂0(y) = T̂p0(y) + T̂S̄(y) and T̂00(y) = T̂p00(y) + T̂S̄00(y) + T̂X00(y) reflect the effects at the two bounds x = x0 and y = y00. Two additional terms occurring in the formula for mortality are T̂μ(y) and T̂P(y). They reflect the effects of trends in mortality in the general population and in the prevalence of the given disease.
3. Statistical Estimation of Model parameters from observational data
The quantities of interest (i.e., T(y) and T̂(y)) are expressed in terms of derivatives of survival analysis functions with respect to time. In our approach, we use explicit analytic parameterizations for all functions for which derivatives are needed. An alternative approach based on numerical differentiation would require us to deal with the instabilities typical of numerical evaluation of derivatives. Since integration is performed numerically, the integrand must be calculated with maximal accuracy; this condition is satisfied by our approach, in which the parametric models of these functions are differentiated analytically. Specifically, we need to develop and estimate three groups of disease-specific models that enter the expressions for prevalence and mortality as well as their derivatives: i) models for prevalence at x0 (i.e., prevalence at the starting age of observation) for all years y ≥ y00, and for prevalence at y00 (i.e., at the beginning year of observation) for all ages x ≥ x0; ii) the model for the incidence rate for all x ≥ x0 and y ≥ y00; and iii) models for relative survival of individuals prevalent at x0, prevalent at y00, and incident at x > x0 and y > y00. Furthermore, for the modeling of incidence-based mortality, models for mortality in the general population need to be developed.
We use individual medical records from a nationally representative 5% sample of Medicare beneficiaries age 65+ to estimate the parameters of the models enumerated above. Medicare data provide individual records for individuals above age 65 (i.e., x0 = 65) and starting in 1992 (i.e., y00 = 1992). Collection of all records with the disease-specific ICD-9 codes for an individual allows us to reconstruct individual disease-specific trajectories and then create the following datasets for further analyses using the methods of Akushevich et al. (2012):
- D1: Prevalence rates for the boundaries of the region (Figure 1), i.e., one-year-specific prevalence rates i) for x0 and all years y ≥ y00 and ii) for y00 and all ages x ≥ x0.
- D2: Incidence rates in one-year groups over age and year.
- D3: Individual survival times. The dataset contains individual records including age and year at the first record, interpreted as incident or prevalent cases, and time to death/censoring. For prevalent cases x = x0 or y = y00.
- D4: Prevalence and mortality in one-year groups over age and year. This dataset is used only for comparison with the results of modeling of these measures.
The estimation strategy for the model parameters involves B-splines to evaluate the y- and x00-dependences occurring in the expressions for prevalence, mortality, and their derivatives. An important feature of B-splines necessary for our study is that they allow for the calculation of derivatives explicitly and without additional simplifying assumptions. Other dependencies, such as the age dependencies of incidence, survival, and mortality in the general population as well as the survival time dependence, are modeled by appropriate (known or empirically based) models adopted for them, such as the linear model of disease incidence, the Gompertz model for age patterns of mortality in the general population, and the Weibull model for the survival time distribution.
3.1. Model for prevalence at boundaries
First, the y0-dependence of initial prevalence (dataset D1) is modeled using B-splines as P(x0, y0) = ΣiαiBi,n(y0), where n is the degree of the B-splines (n = 3 in our analysis) and i runs over all B-splines, the number of which is defined by the number of knots used. The functions Bi,n(y0) are piecewise polynomial functions that are completely known once the set of knots is fixed, and the parameters αi are subject to estimation. The first derivative of P(x0, y0) is then calculated explicitly because dBi,n(y0)/dy0 is represented in terms of B-splines of a lower degree, for which we also have an explicit representation. Note also that the approach gives the derivative of P(x0, y0) with respect to y0; however, since y0 = y − x + x0, it is equal to the derivative with respect to y: dP(x0, y0)/dy0 = dP(x0, y0)/dy. Similarly, B-splines provide the fit for the x00-dependence (where x00 = y00 − y + x) of P(x00, y00) together with the first derivative with respect to x00, thus providing the derivative with respect to y: dP(x00, y00)/dx00 = −dP(x00, y00)/dy. Empirical estimates and the B-spline models for both the y0-dependence of P(x0, y0) and the x00-dependence of P(x00, y00) are shown in Figure 2.
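The property used here, that the derivative of a B-spline of degree n is a combination of two B-splines of degree n − 1, can be sketched in plain Python with the Cox-de Boor recursion. This is an illustration rather than the estimation code of the study: the knot vector, the coefficients `alpha`, and the function names are hypothetical, and in practice the coefficients would be estimated from dataset D1.

```python
def bspline_basis(i, n, knots, t):
    """Value of the i-th B-spline of degree n at t (Cox-de Boor recursion).
    Intervals are half-open, so the right end of the knot range gives 0."""
    if n == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    d1 = knots[i + n] - knots[i]
    if d1 > 0:
        value += (t - knots[i]) / d1 * bspline_basis(i, n - 1, knots, t)
    d2 = knots[i + n + 1] - knots[i + 1]
    if d2 > 0:
        value += (knots[i + n + 1] - t) / d2 * bspline_basis(i + 1, n - 1, knots, t)
    return value


def bspline_basis_deriv(i, n, knots, t):
    """Exact derivative of the i-th B-spline of degree n (n >= 1),
    written through two B-splines of degree n - 1."""
    slope = 0.0
    d1 = knots[i + n] - knots[i]
    if d1 > 0:
        slope += n / d1 * bspline_basis(i, n - 1, knots, t)
    d2 = knots[i + n + 1] - knots[i + 1]
    if d2 > 0:
        slope -= n / d2 * bspline_basis(i + 1, n - 1, knots, t)
    return slope


def spline_value_and_slope(alpha, n, knots, t):
    """P = sum_i alpha_i B_{i,n}(t) and dP/dt, both in closed form."""
    value = sum(a * bspline_basis(i, n, knots, t) for i, a in enumerate(alpha))
    slope = sum(a * bspline_basis_deriv(i, n, knots, t) for i, a in enumerate(alpha))
    return value, slope


# Hypothetical clamped cubic knot vector over 1992-2013 and made-up coefficients.
knots = [1992.0] * 4 + [1999.0, 2006.0] + [2013.0] * 4
alpha = [0.12, 0.13, 0.15, 0.16, 0.18, 0.19]
value, slope = spline_value_and_slope(alpha, 3, knots, 2000.0)
print(value, slope)
```

For the y0-dependence, `slope` is directly the derivative with respect to y; for the x00-dependence, the same routine applies with the sign reversed, since dx00/dy = −1.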
3.2. Model for incidence
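Assume that for each yd the age-dependence of the incidence rates I(τ, yd) from dataset D2 is explicitly parameterized through a set of model parameters dependent on yd (e.g., linearly, I(τ, yd) = a(yd) + b(yd)τ), and that the yd-dependence of each parameter is fitted by B-splines, which provide the first derivatives da/dyd and db/dyd. Thus, in the linear case,
$$\frac{\partial I(\tau, y_d)}{\partial y} = \frac{da(y_d)}{dy_d} + \tau\, \frac{db(y_d)}{dy_d}.$$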
Figure 3 presents the age-dependence of age-specific rates for two selected years (left panel) and the yd-dependence of age-adjusted incidence rates (right panel), together with the B-spline models fitting the yd-dependences of age-specific rates with subsequent age-adjustment for the second case. The results presented in Figure 3 justify the choice of the linear model for the age-specific rates of diabetes (another model can be chosen for another disease). Note that the age-adjusted rates can be represented by a linear model only approximately; a spline approximation that provides partial smoothing of this effect could be an alternative.
3.3. Models for survival
The models describing age- and survival-time-dependences for the three specific relative survival functions S̄r(x − x0, x0, y0), S̄r(y − y00, x00, y00), and Sr(x − τ, τ, yd) have to be specified, parameterized, and estimated. We use the approach based on maximizing the likelihood for individual survival data (Dickman et al. 2004), which is outlined in general terms here and specified for the three relative survival functions below. An individual i in dataset D3 is characterized by i) the age of diabetes diagnosis or initial age of follow-up (x0i), ii) the final age of follow-up (xi), and iii) the death/censoring indicator di at age xi. Denoting the survival function for an individual i as S(xi, x0i) and using the standard likelihood for total survival, L = ∏i h(xi)^di S(xi, x0i), together with the definition of relative survival, S(xi, x0i) = St(xi, x0i) Sr(xi, x0i; β), and the respective hazard functions, h(x) = ht(x) + hr(x, β), we construct the log likelihood as
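$$\log L(\beta) = \sum_i \log S_t(x_i, x_{0i}) + \sum_i \Big[ d_i \log\big( h_t(x_i) + h_r(x_i, \beta) \big) + \log S_r(x_i, x_{0i}; \beta) \Big]. \tag{16}$$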
Here β is the set of parameters for the relative survival and respective hazard. The first term does not depend on β and therefore can be omitted. The only item that we need to know about the general population is the population hazards at the age of death for all individuals in the datasets. This information is obtained from the Human Mortality Database.
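A minimal sketch of the β-dependent part of this log likelihood is given below. It assumes the Weibull-type form of relative survival introduced in the following paragraphs, Sr(t) = exp(−exp((log t − μ)/σ)) with t the time since diagnosis or entry, so that β = (μ, σ). The record layout (t, d, h_pop) and the function names are hypothetical; h_pop stands for the general-population hazard at the final age of follow-up, which in the study is taken from the Human Mortality Database.

```python
import math


def excess_cum_hazard(t, mu, sigma):
    """Cumulative excess hazard H_r(t) = -log S_r(t) for the Weibull-type form."""
    return math.exp((math.log(t) - mu) / sigma)


def excess_hazard(t, mu, sigma):
    """Excess hazard h_r(t) = dH_r/dt = H_r(t) / (sigma * t)."""
    return excess_cum_hazard(t, mu, sigma) / (sigma * t)


def log_likelihood(records, mu, sigma):
    """Log likelihood without the term that does not depend on (mu, sigma).

    records -- iterable of (t, d, h_pop): follow-up time t > 0, death
               indicator d (1 = died, 0 = censored), population hazard h_pop.
    """
    total = 0.0
    for t, d, h_pop in records:
        if d:
            total += math.log(h_pop + excess_hazard(t, mu, sigma))
        total -= excess_cum_hazard(t, mu, sigma)   # log S_r(t)
    return total


# Two made-up records: one death after 2 years, one censoring after 3 years.
toy = [(2.0, 1, 0.5), (3.0, 0, 0.04)]
print(log_likelihood(toy, mu=0.0, sigma=1.0))   # equals log(1.5) - 5 for these inputs
```

Maximizing this function over (μ, σ) with any general-purpose optimizer, separately for each boundary year or age, would give the parameter values whose smooth dependence is then fitted by B-splines.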
Specific parameterization is required to describe the age- and survival time dependences of relative survival functions. We assume that the Weibull model is flexible enough (Carroll 2003; Zhu et al. 2011) and can be applied for the three relative survival functions involved in (7) and (8).
For S̄r(x − x0, x0, y0) we use S̄r(x − x0, x0, y0) = exp(−exp(σ−1(log(x − x0) − μ))), in which the parameters μ = μ(y0) and σ = σ(y0) are estimated for each y0 by maximizing the likelihood (16); the y0-dependences of μ and σ are then fitted by B-splines, which provide the derivatives dμ/dy0 = dμ/dy and dσ/dy0 = dσ/dy. Thus,
$$\frac{d\bar{S}_r(x-x_0, x_0, y_0)}{dy} = \frac{\partial \bar{S}_r}{\partial \mu}\, \frac{d\mu}{dy} + \frac{\partial \bar{S}_r}{\partial \sigma}\, \frac{d\sigma}{dy}.$$
The partial derivatives are calculated explicitly, and the derivatives of μ and σ with respect to y0 are provided by B-splines.
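The chain rule for the Weibull-type relative survival can be verified numerically. In the sketch below the partial derivatives with respect to μ and σ are written out in closed form and compared with a finite difference; the linear trends `mu_of` and `sigma_of` are hypothetical placeholders for the B-spline fits, chosen only so that the check is self-contained.

```python
import math


def rel_surv(t, mu, sigma):
    """Weibull-type relative survival exp(-exp((log t - mu) / sigma))."""
    return math.exp(-math.exp((math.log(t) - mu) / sigma))


def rel_surv_dy(t, mu, sigma, dmu_dy, dsigma_dy):
    """Derivative of rel_surv with respect to calendar time at fixed t,
    using dS/dmu = S e^z / sigma and dS/dsigma = S e^z z / sigma,
    where z = (log t - mu) / sigma."""
    z = (math.log(t) - mu) / sigma
    s = rel_surv(t, mu, sigma)
    ds_dmu = s * math.exp(z) / sigma
    ds_dsigma = s * math.exp(z) * z / sigma
    return ds_dmu * dmu_dy + ds_dsigma * dsigma_dy


# Hypothetical smooth trends standing in for the B-spline fits of mu and sigma.
def mu_of(y0):
    return 3.0 + 0.02 * (y0 - 1992.0)

def sigma_of(y0):
    return 0.9 - 0.005 * (y0 - 1992.0)


t, y0, h = 10.0, 2000.0, 1e-5
analytic = rel_surv_dy(t, mu_of(y0), sigma_of(y0), 0.02, -0.005)
numeric = (rel_surv(t, mu_of(y0 + h), sigma_of(y0 + h))
           - rel_surv(t, mu_of(y0 - h), sigma_of(y0 - h))) / (2 * h)
print(abs(analytic - numeric) < 1e-8)   # True
```

Here a rising μ and a falling σ both increase survival at t = 10, so the derivative is positive, which corresponds to improving survival over calendar time.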
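Similarly, we use S̄r(y − y00, x00, y00) = exp(−exp(σ−1(log(y − y00) − μ))), where the parameters μ = μ(x00) and σ = σ(x00) depend on x00 and are estimated using (16). The dependences are given by B-splines, with dμ/dx00 = −dμ/dy and dσ/dx00 = −dσ/dy. Now the relative survival function also depends on y explicitly through its first argument; therefore
$$\frac{d\bar{S}_r(y-y_{00}, x_{00}, y_{00})}{dy} = \frac{\partial \bar{S}_r}{\partial y} - \frac{\partial \bar{S}_r}{\partial \mu}\, \frac{d\mu}{dx_{00}} - \frac{\partial \bar{S}_r}{\partial \sigma}\, \frac{d\sigma}{dx_{00}}.$$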
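Finally, we use Sr(x − τ, τ, yd) = exp(−exp(σ−1(log(x − τ) − μ))), where the dependences of μ and σ on τ are explicitly represented through quadratic functions of τ, μ(τ) = μ0 + μ1τ + μ2τ² and σ(τ) = σ0 + σ1τ + σ2τ², with the sets of parameters {μk} and {σk}, k = 0, 1, 2, and each parameter is yd-specific (i.e., estimated for each year yd). The yd-dependence of each parameter is then fitted by B-splines, which provide the respective derivatives dμk/dyd = dμk/dy and dσk/dyd = dσk/dy. Therefore,
$$\frac{dS_r(x-\tau, \tau, y_d)}{dy} = \sum_{k=0}^{2} \left( \frac{\partial S_r}{\partial \mu_k}\, \frac{d\mu_k}{dy} + \frac{\partial S_r}{\partial \sigma_k}\, \frac{d\sigma_k}{dy} \right).$$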
Figure 4 presents the results for the relative survival functions. Projections of survival over time for selected ages and years are shown for all three survival functions involved in (7) and (8). Note that empirical estimates of relative survival are not necessary for our modeling, so they are not presented in Figure 4.
4. Partitioning for Diabetes Prevalence and Mortality and Their Time Trends
Application of the estimated models to the 5% Medicare data resulted in predicted age-adjusted prevalence, P(y), and incidence-based mortality, M(y), according to eqs. (9) and (10). Their patterns are presented by thick lines in the upper panels of Figure 5; the actual empirical patterns (dots) are provided for comparison. The three terms (P0(y), P00(y), and Pis(y)) contributing to prevalence and the four terms (MPμ(y), M0(y), M00(y), and Mis(y)) contributing to the mortality rate according to eq. (11) are also shown. The curves in these plots are marked by labels corresponding to the subscripts of the respective contributions from eq. (11). Excellent agreement between the theoretical predictions and the empirical estimates is observed for both prevalence and the incidence-based mortality of diabetes.
The term with the double integral in eq. (9), Pis(y), which contains the product of incidence and relative survival, gives the most important contribution to diabetes prevalence. The contribution of prevalence in 1992, P00(y), decreases with time for two reasons: i) the mortality of individuals prevalent in 1992 is larger than mortality in the general population and ii) the relative contribution of the region above the bisecting line to the integral with respect to x (see Figure 1) decreases with time. In contrast, the contribution of prevalence at age 65, P0(y), goes up because of the increase over time in prevalence at 65 (Figure 2) and the increased contribution of the region below the bisecting line to the integral with respect to x in (9). The major contribution to the incidence-based mortality is the term containing the product of prevalence and mortality in the general population (i.e., MPμ(y)). This term would be the only contribution if the mortality rates of individuals with diabetes and of the general population were the same. The gap between this term and mortality, given by the thick line (i.e., M(y)), is due to the three remaining contribution terms showing the effects of the individuals prevalent at the boundaries (M0(y) and M00(y), represented by the curves ‘0’ and ‘00’) and the individuals diagnosed during follow-up (Mis(y), the curve ‘is’). The patterns of these curves reproduce those observed for prevalence and are explained by the same reasoning.
Both prevalence and mortality increase with time, so their derivatives over calendar time are positive. Explicit calculations allowed us to evaluate the time patterns of the components responsible for these trends of prevalence (12), (13) and mortality (14), (15). The total derivative of prevalence (dP(y)/dy, thick curve) largely reproduces the shape of the curve marked by “inc”; that is, the shape of the total prevalence trend is defined primarily by the dynamics of diabetes incidence. The term containing the derivative of incidence, Tinc(y), is negative, i.e., incidence is decreasing over time, driving the prevalence downwards. The effect of survival (TS(y), the curve marked by “S”) pushes the prevalence upwards, reflecting the increased life span of patients with diabetes. The curve marked by “0” contains two contributions, T0(y) = Tp0(y) + TS̄(y). The first, representing prevalence at 65, is dominant and increasing, driving the total prevalence up. The other contribution, TS̄(y), reflecting survival of patients prevalent at 65, is positive (similarly to TS(y)) and small. The remaining contribution, marked by “00”, comprises all effects related to the boundary at y00, i.e., T00(y) = Tp00(y) + TS̄00(y) + TX00(y). This contribution is largely technical because it reflects the fraction of the effects coming from incidence and survival trends before 1992. As expected, this fraction decreases with time. In sum, the total prevalence increases over time as the three contributions pulling the prevalence up overpower the downward effect of incidence.
The partitioning of mortality trends (Figure 5d) largely reflects the picture obtained for prevalence in Figure 5c. The most important contribution to the mortality trend is the term containing the derivative of prevalence, T̂P(y) (marked by “P”, thick dashed curve). Its shape reproduces the shape of the total prevalence change (thick line in Figure 5c) and deviates only because of the factor containing the mortality rate in the general population, which is time dependent. The other curves in Figure 5d reflect the effects of relative survival and respective mortality. Although the size of these effects is not large (as follows from Figure 5b), they can make significant contributions to the mortality time trend (Figure 5d). Their signs, sizes, and time trends reflect what we observed for prevalence, with the exception of the terms reflecting survival, which change sign. For example, the trends in incidence and survival both act to decrease mortality. However, mortality still increases. This is a consequence of negative tendencies in the past (i.e., before 1992) that are represented by the term “00”, i.e., T̂00(y) = T̂p00(y) + T̂S̄00(y) + T̂X00(y). An additional term, T̂μ(y), is marked by “μ” (dashed curve); it reflects the effect of the general change in mortality in the general population, that is, the mortality of patients with diabetes not related to diabetes itself (i.e., death due to other causes) is going down just as in the general population.
5. Discussion and Conclusion
In this study, we developed an approach for the modeling of disease prevalence and incidence-based mortality (i.e., mortality for individuals who had a diagnosis earlier in life). The model provides analytical expressions for these epidemiologic characteristics which allows for the analysis of the relative contributions of incidence and survival as important components of total prevalence and incidence-based mortality.
The approach provides expressions for the partitioning of the time trends of these quantities. All of the components that are responsible for trends in disease prevalence and mortality are evaluated through explicit expressions. These components include disease incidence and survival as well as effects at the boundaries of the region available for analysis (in the case of 5%-Medicare we observe the effects after 1992 for individuals aged 65+). For mortality, additional information about the mortality rate in the general population is required.
The results of the partitioning analyses are presented in Figure 5 and described in detail in “Partitioning for Diabetes Prevalence and Mortality and Their Time Trends” earlier in this text. Based on this analysis we can conclude that i) the theory describes empirical estimates for prevalence and mortality with good accuracy, ii) among the possible contributions T(y) and T̂(y) in eqs. (12) and (14), the contributions of incidence and survival after age 65 have the greatest effect on diabetes prevalence and diabetes-related mortality, and iii) the dynamics of diabetes prevalence and mortality are generated by causes consistent with improvements in population health: decreased incidence and improved survival.
Use of our methodology offers new opportunities in public health. Researchers can clearly identify the sources of observed processes at the level of disease prevalence and mortality. The methodology presented in this paper provides a formal method for the decomposition of an observed trend in prevalence and incidence-based mortality into its constituent parts. Practically, this can be used as a public health planning tool to identify areas of concern which, due to the size of the effect, the direction of the trend, or the observed rate of change, require targeted attention from health agencies. Furthermore, over time, improvements in diagnostic technology and in the body of knowledge on the pathological characteristics of a disease lead to improved ascertainment (i.e., the ability to identify the presence of a disease) and more tightly defined guidelines for making a valid diagnosis. Improved ascertainment is likely to lead to an increase in the incidence of a disease as individuals who were previously left undiagnosed are identified. The effect of changes in diagnostic guidelines is more ambiguous: depending on whether elements were added to or removed from the definition of a valid diagnosis, incidence and, by extension, prevalence could be pushed in either direction. A standardized method for the partitioning of an existing prevalence trend into the time trends of its components and, more importantly, the relative strength of the contribution of each component over time will aid both in correctly assessing the relative success of a health intervention and in identifying time periods of special interest for more in-depth analysis (e.g., a sharp spike in the relative strength of the contribution associated with incidence could indicate either an area of public health interest or an improvement in ascertainment).
Our approach lends itself to multiple natural generalizations allowing for the estimation and partitioning of quantities such as i) the effects of disease-specific medical costs (the respective costs are added to the integrand) (Akushevich et al. 2011; Akushevich et al. 2016), ii) the effects of recovery and/or long-term remission (the respective survival functions have to be used as additional factors) (Akushevich et al. 2013b), and iii) the effects of complications for patients with a specific disease (specific patient selections and the respective changes in the mathematical formalism have to be included) (Akushevich et al. 2013a; Yashkin, Picone and Sloan 2015). The expressions for prevalence and mortality can also be used to improve the accuracy of future projections in health forecasting.
The components contributing to the time trends are obtained not by a fitting procedure and/or maximum likelihood, but by direct calculation using expressions that represent each component as a continuous function of the available data. Statistical estimates of the parameters characterizing the time patterns of prevalence at the bounds, incidence, and survival are obtained using B-splines. Since B-splines provide consistent estimates of model parameters (Strawderman and Tsiatis 1996) and the trend components are continuous functions of the B-spline parameters, the estimates of the trend components are consistent. The estimates are largely model independent; therefore, the risk of model misspecification is minimal. The only model we use is the Weibull model for survival time. This model is quite flexible, and its choice is not critical for the estimation procedure: two-dimensional splines for survival can be used instead.
The level of detail our approach provides is highly dependent on the length and scope of the data. When Medicare data or a dataset of a similar size is used, statistical uncertainties are not expected to be large. In the general case, the statistical uncertainty has to be estimated using a bootstrapping approach or through analytic estimates of error propagation. However, systematic uncertainties (biases) could be noticeable. Possible sources of systematic uncertainty include i) imprecise separation of incident and prevalent cases, ii) the effect of time trends in the fraction of individuals covered by Medicare Advantage (a private alternative to traditional Medicare which does not contribute data to Medicare datasets), and/or iii) changes in the structure of the population of Medicare beneficiaries due to specific events such as the initiation of Medicare Part D coverage in 2006. Separate research is required to evaluate the contributions of these factors to the total systematic error. In sum, the estimation of the time trend components is consistent and stable; however, further investigation of systematic uncertainties would improve the overall accuracy of the estimates.
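As an illustration of the bootstrapping approach mentioned above, the following Python sketch estimates the standard error of a simple statistic by resampling individuals with replacement. It is a minimal sketch under stated assumptions, not the procedure used in this study: the data are synthetic disease indicators with a hypothetical 20% prevalence, the statistic is a plain proportion rather than a trend component, and the helper names `bootstrap_se` and `proportion` are introduced here only for illustration.

```python
import random

def bootstrap_se(values, statistic, n_boot=400, seed=0):
    """Nonparametric bootstrap standard error of statistic(values)."""
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        # Resample n individuals with replacement and recompute the statistic.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    mean_rep = sum(replicates) / n_boot
    variance = sum((r - mean_rep) ** 2 for r in replicates) / (n_boot - 1)
    return variance ** 0.5

def proportion(flags):
    """Share of individuals with the disease indicator set to 1."""
    return sum(flags) / len(flags)

# Synthetic indicators: 1 = has the disease, 0 = does not (hypothetical 20% prevalence).
data_rng = random.Random(42)
sample = [1 if data_rng.random() < 0.2 else 0 for _ in range(1000)]

p_hat = proportion(sample)
se_boot = bootstrap_se(sample, proportion)
# Analytic (binomial) standard error, available here only because the statistic is simple.
se_binomial = (p_hat * (1.0 - p_hat) / len(sample)) ** 0.5
print(f"prevalence {p_hat:.3f}, bootstrap SE {se_boot:.4f}, binomial SE {se_binomial:.4f}")
```

For a statistic as simple as a proportion the bootstrap and analytic standard errors should be close. The same resampling loop could in principle wrap any function of the individual records, including a trend component, at the cost of recomputing that function for every replicate.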
In summary, notable strengths of our approach include: i) modeling of all components used in our models using explicit expressions; ii) lack of simplifying assumptions; iii) stability and consistency of the resulting estimates; and iv) wide availability of large administrative health data like that used in the study. The application of this approach to the case of diabetes found that both its prevalence and its incidence-based mortality increase with time. The primary driving factors of the observed prevalence increase are improved survival and increased prevalence at age 65. The increase in diabetes-related mortality is driven by increased prevalence and unobserved trends beyond the region observed in the data.
Acknowledgments
This study has been supported by the National Institute on Aging (grants R01-AG017473, R01-AG046860, P01-AG043352).
Appendix A. Derivation of expressions for prevalence (1) and mortality (2)
The formulae (1) and (2) can be understood in terms of the numbers of individuals. Let N0 be the size of a birth cohort and NI(τ)Δτ be the number of individuals with disease onset within the age period Δτ. The total number of sick (and alive) individuals at age x, Nd(x), is the sum over all age periods of the individuals who survived to age x after diagnosis at τ, i.e.,
$$N_d(x) = \sum_n N_I(\tau_n)\, S(x - \tau_n, \tau_n)\, (\Delta\tau)_n,$$
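where S(x − τn, τn) is the survival function of individuals diagnosed in the age period [τn, τn + (Δτ)n] who survived to age x. Considering infinitely small age periods (i.e., Δτ → 0) and defining Pc(x, yb) = Nd(x)/N0 and Ic(τ, yb) = NI(τ)/N0, we obtain the formula
$$P_c(x, y_b) = \int_0^{x} I_c(\tau, y_b)\, S(x - \tau, \tau)\, d\tau .$$
Then the formula (1) is obtained when we split the integration region in two parts (from 0 to x̄0 and from x̄0 to x) and use the respective notation for the first part, with the calendar-time argument yd of the survival function written explicitly:
$$\int_0^{\bar{x}_0} I_c(\tau, y_b)\, S(x - \tau, \tau, y_d)\, d\tau = P_c(\bar{x}_0, y_b)\, \bar{S}(x - \bar{x}_0, \bar{x}_0, \bar{y}_0).$$
The exact definition of S̄(x − x̄0, x̄0, ȳ0) is:
$$\bar{S}(x - \bar{x}_0, \bar{x}_0, \bar{y}_0) = \frac{\int_0^{\bar{x}_0} I_c(\tau, y_b)\, S(x - \tau, \tau, y_d)\, d\tau}{\int_0^{\bar{x}_0} I_c(\tau, y_b)\, S(\bar{x}_0 - \tau, \tau, y_d)\, d\tau}.$$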
The survival function in the numerator can be split as S(x − τ, τ, yd) = S(x − x̄0, x̄0, ȳ0|τ)S(x̄0 − τ, τ, yd). The function S(x − x̄0, x̄0, ȳ0|τ) is the survival function for a cohort of patients formed at age x̄0 and time ȳ0 and diagnosed at age τ, τ < x̄0. If the dependence on τ is weak for a disease, we can set S(x − x̄0, x̄0, ȳ0|τ) ≈ S(x − x̄0, x̄0, ȳ0), and therefore obtain S̄(x − x̄0, x̄0, ȳ0) ≈ S(x − x̄0, x̄0, ȳ0). Thus, the difference between empirical estimates or estimated models for the two survival functions reflects the strength of the dependence of S(x − x̄0, x̄0, ȳ0|τ) on τ.
Let NIM(τ, x) be the number of individuals diagnosed in the age interval [τ, τ + Δτ] who then died during the age interval [x, x + Δx]. The total number of individuals who died in the age interval [x, x + Δx] is defined through the density of the incidence-based mortality M(x, yb) and equals:
Considering infinitely small age periods (i.e., Δτ→0 and Δx→0) and defining we obtain the formula . Splitting the integration region and using the definition of f̄c(x − x̄0, x̄0, ȳ0), as in the case of disease prevalence considered above, we obtain eq. (2).
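The cohort-counting argument for prevalence given in this appendix can be checked numerically. The Python sketch below is illustrative only: the linear incidence density, the Weibull survival parameters, and the function names are hypothetical and are not estimates from the Medicare data. It sums the survivors over diagnosis-age intervals and shows that the sum settles to a limiting value, the integral over diagnosis age, as Δτ shrinks.

```python
import math

def weibull_survival(t, scale=12.0, shape=1.3):
    """Probability of surviving t years after diagnosis (hypothetical Weibull parameters)."""
    return math.exp(-((t / scale) ** shape))

def incidence(tau):
    """Hypothetical incidence density at age tau, per member of the birth cohort."""
    return 1e-4 * tau

def prevalence_sum(x, n_intervals):
    """Sum over diagnosis-age intervals of incidence * survival to age x * interval width."""
    dtau = x / n_intervals
    total = 0.0
    for n in range(n_intervals):
        tau_n = (n + 0.5) * dtau  # midpoint of the n-th diagnosis-age interval
        total += incidence(tau_n) * weibull_survival(x - tau_n) * dtau
    return total

age = 80.0
coarse = prevalence_sum(age, 8)      # 10-year diagnosis intervals
fine = prevalence_sum(age, 800)      # 0.1-year intervals
finer = prevalence_sum(age, 8000)    # 0.01-year intervals
print(f"coarse {coarse:.6f}, fine {fine:.6f}, finer {finer:.6f}")
```

The two fine grids agree closely while the coarse grid differs visibly, which is the discrete counterpart of passing to the limit Δτ → 0.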
Appendix B. Derivation of the derivatives of prevalence and mortality
This Appendix discusses technical aspects of the derivation of the derivatives (13) and (15). We rewrite eq. (9) with the integration limits written explicitly:
(17)
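The derivation of the right-hand side is based on Barrow’s Fundamental Theorem of Calculus, which can be adapted to our case (written here for a generic integrand f(y, τ) and a constant lower limit a) as
$$\frac{d}{dy}\int_{a}^{y} f(y, \tau)\, d\tau = f(y, y) + \int_{a}^{y} \frac{\partial f(y, \tau)}{\partial y}\, d\tau,$$
i.e., when we differentiate a function with respect to an argument that appears both in the integration limits and in the integrand, we obtain two terms, one without and one with integration. Differentiation of the first term in P(y) results in: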
The second term of P(y) gives
As we see, the first terms of these two contributions cancel in the sum, resulting in Tp0(y) + TS̄(y) + Tp00(y) + TS̄00(y).
Differentiation of the third term in eq. (17) results in:
(18)
The last term in the expression for P(y) contains y in the integration limits of both the first and the second integral; therefore, Barrow’s theorem has to be applied twice, resulting in:
(19)
The first terms of (18) and (19) cancel, the second term of (19) is TX00(y), and the sum of the remaining terms (the last terms of (18) and (19)) gives Tinc(y) + TS(y). The sum of the surviving terms finally gives the right-hand side of eq. (12).
The calculation of the derivative of mortality is similar.
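The differentiation rule used throughout this appendix (a boundary term without integration plus an integral of the partial derivative) can be verified numerically on a simple example. The Python sketch below is only an illustration: the integrand is a hypothetical smooth function chosen for convenience, not one of the terms of P(y), and the function names are assumptions of this sketch.

```python
import math

def integrand(y, tau):
    """Hypothetical smooth integrand depending on the outer variable y and on tau."""
    return math.exp(-0.1 * (y - tau)) * (1.0 + 0.05 * tau)

def d_integrand_dy(y, tau):
    """Analytic partial derivative of the integrand with respect to y."""
    return -0.1 * integrand(y, tau)

def integrate(func, lower, upper, n=2000):
    """Composite midpoint rule."""
    step = (upper - lower) / n
    return step * sum(func(lower + (i + 0.5) * step) for i in range(n))

def big_f(y, lower=0.0):
    """F(y): integral of integrand(y, tau) over tau from lower to y."""
    return integrate(lambda tau: integrand(y, tau), lower, y)

def derivative_by_rule(y, lower=0.0):
    """Boundary term (no integration) plus the integral of the partial derivative."""
    boundary_term = integrand(y, y)
    interior_term = integrate(lambda tau: d_integrand_dy(y, tau), lower, y)
    return boundary_term + interior_term

y0, eps = 10.0, 1e-4
# Independent check: central finite difference of F itself.
numeric = (big_f(y0 + eps) - big_f(y0 - eps)) / (2 * eps)
analytic = derivative_by_rule(y0)
print(f"finite difference {numeric:.8f}, two-term rule {analytic:.8f}")
```

The two values agree to within the accuracy of the numerical integration, which is the behavior the two-term rule predicts when the differentiation variable appears both in the upper limit and in the integrand.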
References
- Akinbami LJ, Moorman JE, Bailey C, Zahran HS, King M, Johnson CA, Liu X. Trends in asthma prevalence, health care use, and mortality in the United States, 2001–2010. NCHS data brief. 2012;94(94):1–8.
- Akushevich I, Kravchenko J, Akushevich L, Ukraintseva S, Arbeev K, Yashin AI. Medical cost trajectories and onsets of cancer and noncancer diseases in US elderly population. Computational and mathematical methods in medicine. 2011;2011:857892. doi: 10.1155/2011/857892.
- Akushevich I, Kravchenko J, Arbeev KG, Ukraintseva SV, Land KC, Yashin AI. Biodemography of Aging. Springer; 2016. Medical Cost Trajectories and Onset of Age-Associated Diseases; pp. 143–162.
- Akushevich I, Kravchenko J, Ukraintseva S, Arbeev K, Kulminski A, Yashin AI. Morbidity risks among older adults with pre-existing age-related diseases. Experimental gerontology. 2013a;48(12):1395–1401. doi: 10.1016/j.exger.2013.09.005.
- Akushevich I, Kravchenko J, Ukraintseva S, Arbeev K, Yashin AI. Age Patterns of Incidence of Geriatric Disease in the US Elderly Population: Medicare-Based Analysis. Journal of the American Geriatrics Society. 2012;60(2):323–327. doi: 10.1111/j.1532-5415.2011.03786.x.
- Akushevich I, Kravchenko J, Ukraintseva S, Arbeev K, Yashin AI. Recovery and survival from aging-associated diseases. Experimental gerontology. 2013b;48(8):824–830. doi: 10.1016/j.exger.2013.05.056.
- Bauer UE, Briss PA, Goodman RA, Bowman BA. Prevention of chronic disease in the 21st century: elimination of the leading preventable causes of premature death and disability in the USA. The Lancet. 2014;384(9937):45–52. doi: 10.1016/S0140-6736(14)60648-6.
- Canudas-Romo V. Decomposition methods in demography. Rozenberg Publishers; 2003.
- Carroll KJ. On the use and utility of the Weibull model in the analysis of survival data. Controlled clinical trials. 2003;24(6):682–701. doi: 10.1016/s0197-2456(03)00072-2.
- Chu KC, Miller BA, Feuer EJ, Hankey BF. A method for partitioning cancer mortality trends by factors associated with diagnosis: An application to female breast cancer. Journal of clinical epidemiology. 1994;47(12):1451–1461. doi: 10.1016/0895-4356(94)90089-2.
- Coresh J, Selvin E, Stevens LA, Manzi J, Kusek JW, Eggers P, Van Lente F, Levey AS. Prevalence of chronic kidney disease in the United States. Jama. 2007;298(17):2038–2047. doi: 10.1001/jama.298.17.2038.
- Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Statistics in medicine. 2004;23(1):51–64. doi: 10.1002/sim.1597.
- Egan BM, Zhao Y, Axon RN. US trends in prevalence, awareness, treatment, and control of hypertension, 1988–2008. Jama. 2010;303(20):2043–2050. doi: 10.1001/jama.2010.650.
- Horiuchi S, Wilmoth JR, Pletcher SD. A decomposition method based on a model of continuous change. Demography. 2008;45(4):785–801. doi: 10.1353/dem.0.0033.
- Mozaffarian D, Benjamin EJ, Go AS, Arnett DK, Blaha MJ, Cushman M, Das SR, de Ferranti S, Després JP, Fullerton HJ. Executive Summary: Heart Disease and Stroke Statistics-2016 Update: A Report From the American Heart Association. Circulation. 2016;133(4):447. doi: 10.1161/CIR.0000000000000366.
- Smith RA, Brooks D, Cokkinides V, Saslow D, Brawley OW. Cancer screening in the United States, 2013. CA: a cancer journal for clinicians. 2013;63(2):87–105. doi: 10.3322/caac.21174.
- Strawderman RL, Tsiatis AA. On consistency in parameter spaces of expanding dimension: an application of the inverse function theorem. Statistica Sinica. 1996:917–923.
- Thun MJ, Carter BD, Feskanich D, Freedman ND, Prentice R, Lopez AD, Hartge P, Gapstur SM. 50-year trends in smoking-related mortality in the United States. New England Journal of Medicine. 2013;368(4):351–364. doi: 10.1056/NEJMsa1211127.
- Tunstall-Pedoe H, Kuulasmaa K, Mähönen M, Tolonen H, Ruokokoski E, Amouyel P. Contribution of trends in survival and coronary-event rates to changes in coronary heart disease mortality: 10-year results from 37 WHO MONICA Project populations. The Lancet. 1999;353(9164):1547–1557. doi: 10.1016/s0140-6736(99)04021-0.
- Vaupel JW, Romo VC. Decomposing change in life expectancy: A bouquet of formulas in honor of Nathan Keyfitz’s 90th birthday. Demography. 2003;40(2):201–216. doi: 10.1353/dem.2003.0018.
- Will JC, Yuan K, Ford E. National trends in the prevalence and medical history of angina: 1988 to 2012. Circulation: Cardiovascular Quality and Outcomes. 2014;7(3):407–413. doi: 10.1161/CIRCOUTCOMES.113.000779.
- Yashkin AP, Picone G, Sloan F. Causes of the change in the rates of mortality and severe complications of diabetes mellitus: 1992–2012. Medical care. 2015;53(3):268. doi: 10.1097/MLR.0000000000000309.
- Zhu HP, Xia X, Chuan HY, Adnan A, Liu SF, Du YK. Application of Weibull model for survival of patients with gastric cancer. BMC gastroenterology. 2011;11(1):1. doi: 10.1186/1471-230X-11-1.