Stat Methods Med Res. 2023 Aug 24; 32(9): 1799–1810. doi: 10.1177/09622802231192950

Smoothing Lexis diagrams using kernel functions: A contemporary approach

Philip S Rosenberg 1, Adalberto Miranda Filho 1,*, Julia Elrod 1,*, Aryana Arsham 2, Ana F Best 3,*, Pavel Chernyavskiy 4,*
PMCID: PMC11385548  PMID: 37621099

Abstract

Lexis diagrams are rectangular arrays of event rates indexed by age and period. Analysis of Lexis diagrams is a cornerstone of cancer surveillance research. Typically, population-based descriptive studies analyze multiple Lexis diagrams defined by sex, tumor characteristics, race/ethnicity, geographic region, etc. Inevitably the amount of information per Lexis diminishes with increasing stratification. Several methods have been proposed to smooth observed Lexis diagrams up front to clarify salient patterns and improve summary estimates of averages, gradients, and trends. In this article, we develop a novel bivariate kernel-based smoother that incorporates two key innovations. First, for any given kernel, we calculate its singular values decomposition, and select an optimal truncation point—the number of leading singular vectors to retain—based on the bias-corrected Akaike information criterion. Second, we model-average over a panel of candidate kernels with diverse shapes and bandwidths. The truncated model averaging approach is fast, automatic, has excellent performance, and provides a variance-covariance matrix that takes model selection into account. We present an in-depth case study (invasive estrogen receptor-negative breast cancer incidence among non-Hispanic white women in the United States) and simulate operating characteristics for 20 representative cancers. The truncated model averaging approach consistently outperforms any fixed kernel. Our results support the routine use of the truncated model averaging approach in descriptive studies of cancer.

Keywords: Lexis diagram, kernel methods, nonparametric smoothing, Surveillance, Epidemiology, and End Results Program, cancer surveillance research

1. Introduction

The Lexis diagram is a fundamental construct in epidemiology, demography, and sociology. 1 A Lexis diagram is a rectangular grid of square cells with age along one axis and calendar period along the other. Individuals from a surveilled population contribute person-years and events (births, deaths, cancers, etc.) to each cell. Analysis of Lexis diagrams can elucidate temporal patterns and provide clues about the etiology of an event. 2 Notably, the analysis of Lexis diagrams is a cornerstone of cancer surveillance research.

Population-based cancer rates usually exhibit Poisson-type variability. 3 This is the default mode for analysis, although negative binomial models are available that accommodate more flexible mean-to-variance relationships.4–6 Within any given Lexis diagram, intrinsic variability (“noise”) can mask important signals and limit the power of comparative analyses to identify heterogeneity. In epidemiologic research, the classic approaches deal with intrinsic variability by aggregating granular data (e.g. data originally sampled at the scale of single years of age and/or calendar years) into broader age and period categories, 7 typically 5-year age groups. Unfortunately, there is nothing optimal about such traditional groupings.

Several methods have been proposed to de-noise or smooth an observed Lexis diagram before the usual descriptive7–9 and analytical 10 methods are applied. Keiding 1 pioneered the first such approach using bivariate kernel functions. This was a major advance; however, this classic method has two major limitations. First, the choice of kernel is arbitrary, hence so too is the implicit bias–variance trade-off. Second, the estimated variance–covariance matrix does not reflect uncertainty arising from kernel (model) selection. In practice, analysts tend to use small bandwidths to minimize bias rather than variance. However, there is no evidence that such choices are optimal.

Other investigators have developed more sophisticated methods. Currie et al. 11 produced smooth Lexis diagrams using bivariate P-splines and second-difference penalty functions. Camarda 12 implemented the Currie approach in R and developed methods to incorporate established demographic constraints using asymmetric penalties. 13 Dokumentov et al. 14 developed a hybrid approach that combines smoothing methods with additional parameters that account for abrupt changes in age incidence by period and cohort. Chien et al.15,16 developed a fully Bayesian smoothing approach using two-dimensional Bernstein polynomials, data-dependent prior distributions, and the Metropolis-Hastings reversible jump algorithm. Martinez-Hernandez and Genton 17 developed nonparametric methods for the situation where the data can be viewed as a functional trend in one temporal dimension (e.g. age) that is modulated over the other dimension (e.g. time).

In this article, we develop a novel and complementary kernel-based approach using truncated bivariate kernel functions18–20 and information theory. 21 We refer to these algorithms as “filters” and the outputs as “filtrations” because they do more than simply “smooth” the data: by design, they pass through all signals that materially reduce the within-Lexis mean squared error. Our approach has several attractive features. It is fast and automatic, provides the smoothest possible kernel-based estimates consistent with the data, yields a variance–covariance matrix that takes model selection into account, and has excellent performance.

In Section 2, we review notation and background on Lexis diagrams and kernel functions and assemble a representative panel of incident cancers. In Section 3, we describe the new filters and corresponding variance calculations. In Section 4, we illustrate our proposed methods using invasive estrogen receptor (ER)-negative breast cancer incidence among non-Hispanic white women in the United States and simulate our method's operating characteristics over the cancer incidence panel. In Section 5, we give a summary and discuss avenues for future research. We provide technical details in an online supplement. Our R code is freely available upon request.

2. Background

2.1. Event rates on a Lexis diagram

We begin with a brief overview of the Lexis diagram and introduce our notation. 22 A Lexis diagram is a rectangular field with attained age $a$ along the y-axis and calendar time $p$ along the x-axis. For any given population and outcome, individual event times and corresponding person-years at risk are summed beginning at age $a_0$ at calendar time $p_0$ within a grid of square cells defined by $A$ age intervals and $P$ period intervals of width $\Delta$. We can describe the structure of Lexis diagrams using the notation $L(a_0, p_0, \Delta, A, P)$. For each cell, the event rate is $y_{ap} = E_{ap}/O_{ap}$, $a = 1, \ldots, A$, $p = 1, \ldots, P$, where $E_{ap}$ is the accumulated number of events and $O_{ap}$ is the corresponding person-years. Each cell can also be referenced by its central value

$$\left(a^*(a),\, p^*(p)\right) = \left(\left(a_0 - \tfrac{\Delta}{2}\right) + \Delta a,\; \left(p_0 - \tfrac{\Delta}{2}\right) + \Delta p\right)$$

We will denote a rate matrix ascertained over a Lexis diagram as $Y_{A \times P}$, and the vectorized version (stacked column-on-column) as $y = \mathrm{vec}(Y)$. Our working model posits that the cells contain realizations of independent quasi-Poisson random variables with over-dispersion parameter $\phi^2$. Given an estimate $\tilde{\phi}^2$ of $\phi^2$ (Section 3.1), the estimated variance–covariance matrix of $y$ is

$$\widehat{\mathrm{Var}}(y) = V_y = \tilde{\phi}^2\, \mathrm{diag}\!\left(\mathrm{vec}\!\left(\left[E_{ap}/O_{ap}^2\right]_{A \times P}\right)\right) \equiv \tilde{\phi}^2\, \Sigma_y$$

2.2. Cancer incidence panel

We extracted authoritative cancer incidence data for the United States from the Surveillance, Epidemiology, and End Results Program's Thirteen Registries Database (SEER-13 23 ) for 50 single years of age (ages 35–84) and 27 calendar years (1992–2018). We selected a panel of representative scenarios defined by cancer site (14 cancers associated with obesity 24 ), sex, and standard SEER race/ethnicity categories (Table 1 and Supplemental Part A). We redistributed breast cancer cases with missing or unknown ER status to the corresponding age- and year-specific ER-positive and ER-negative cells using a validated approach.25,26 The panel includes a total of 1,049,633 incident cases.

2.3. Kernel functions

See Supplemental Part B for details. The standard form of a univariate kernel function 20 is a canonical “shape” function $K^{(s)}(u)$ that is non-negative and symmetric on the interval $[-1, +1]$, with $\int K^{(s)}(u)\,du = \int_{-1}^{+1} K^{(s)}(u)\,du = 1$. The centered bandwidth-specific kernel is

$$K_\lambda^{(s)}(u \mid u_0) = \begin{cases} \dfrac{1}{\int_{-\lambda}^{\lambda} K^{(s)}(v/\lambda)\,dv}\, K^{(s)}\!\left(\dfrac{u - u_0}{\lambda}\right) & u \in [u_0 - \lambda,\, u_0 + \lambda] \\[1ex] 0 & \text{otherwise} \end{cases}$$

where $u_0$ is the center point and $\lambda > 0$ is the bandwidth. We will use four standard kernels: the boxcar (box), triangle (trg), Epanechnikov (Epan), and triweight (triwt).

We can extend a univariate kernel for use in two isotropic spatial dimensions by expressing its formula as a function of the Euclidean distance of any given point $u_1 = (x_1, y_1) \in \mathbb{R}^2$ from a target point $u_0 = (x_0, y_0)$, that is, $\lVert u_1 - u_0 \rVert = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}$. Bivariate kernels have support over the region $\mathcal{B}_{u_0,\lambda} = \{u \in \mathbb{R}^2 : \lVert u - u_0 \rVert \le \lambda\}$, that is, a circle of radius $\lambda$ centered at $u_0$. Hence

$$K_\lambda^{(s)}(u \mid u_0) = \begin{cases} \dfrac{1}{\int_{\mathcal{B}_{u_0,\lambda}} K^{(s)}\!\left(\lVert (x, y) - u_0 \rVert / \lambda\right) dx\, dy}\, K^{(s)}\!\left(\dfrac{\lVert u - u_0 \rVert}{\lambda}\right) & u \in \mathcal{B}_{u_0,\lambda} \\[1ex] 0 & \text{otherwise} \end{cases}$$
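To make the shape functions and the normalization step concrete, the following R sketch evaluates normalized bivariate weights at a set of cell mid-points; the helper names (kernel_shape, bivariate_weights) are hypothetical, and the shapes are written up to normalization constants, which cancel when the weights are renormalized.

```r
# Canonical kernel shapes on [-1, +1]; zero outside (constants omitted because
# the bivariate weights are renormalized below).
kernel_shape <- function(s, shape = c("box", "trg", "Epan", "triwt")) {
  shape <- match.arg(shape)
  w <- switch(shape,
              box   = rep(1, length(s)),
              trg   = 1 - abs(s),
              Epan  = 1 - s^2,
              triwt = (1 - s^2)^3)
  ifelse(abs(s) <= 1, w, 0)
}

# Normalized weights for cells whose mid-points (a_mid, p_mid) fall within a
# circle of radius lambda centered at the target mid-point (a0, p0).
bivariate_weights <- function(a_mid, p_mid, a0, p0, lambda, shape = "Epan") {
  d <- sqrt((a_mid - a0)^2 + (p_mid - p0)^2)   # Euclidean distance to target
  w <- kernel_shape(d / lambda, shape)         # raw kernel weights
  w / sum(w)                                   # normalize to sum to 1
}
```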

3. Kernel function analysis

3.1. Kernel filtration

Our kernel function analysis is motivated by the classic domain filtering approach of image processing, 27 which is itself rooted in classic kernel methods in statistics.20,28,29 For images, bivariate kernels are used to construct mean or low-pass “blurring” filters for purposes of “de-noising” the observed pixel values and/or reducing the impact of high-frequency signals.30,31 We apply bivariate kernels to Lexis diagrams for the same purposes. A standard kernel function analysis 29 is described in Algorithm 1.

Algorithm 1: Kernel function analysis of rates on a Lexis diagram:

  1. Choose a kernel shape: box, trg, Epan, or triwt. Choose a bandwidth $\lambda_{ap}$ for each cell.

  2. Repeat steps 3–6 for each cell.

  3. Superimpose a masking circle of radius $\lambda_{ap}$ over the cell's central value $(a^*(a), p^*(p))$.

  4. Calculate kernel weights for all cells whose mid-points are covered by the masking circle.

  5. Divide each weight in step 4 by their sum (normalization).

  6. Use the normalized weights to calculate a weighted average of the corresponding $y_{ap}$ values within the masking circle.

We will refer to the vector of weighted averages as $y_F$ ($F$ for filter, see below), and the corresponding matrix as $(Y_{A \times P})_F \equiv Y_F$. These calculations can be described by a linear operator or filter that can be represented by a sparse matrix $K_F$ such that $y_F = K_F y$. Given an estimate of the over-dispersion parameter $\tilde{\phi}^2$, it follows that $\widehat{\mathrm{Var}}(y_F) = \widehat{V}_F^{\,y} = \tilde{\phi}^2 K_F \Sigma_y K_F'$.

The evaluation grid is discrete, while the possible bandwidths $\lambda$ are continuous. We will select values of $\lambda$ based on discrete odd-integer point values $k = 3, 5, 7, 9, \ldots$ with corresponding bandwidths $\lambda = (k + 1)/2 = 2, 3, 4, 5, \ldots$, etc. Our approach allows us to apply a different bandwidth depending on cell coordinates. We will use kernels defined by the size of the target Lexis, the kernel shape, and a chosen point value $k_2$ for the outer two annuli and $k_1$ for all other cells. Hence, we can extend the generic notation $K_F$ using the specific notation $K_{k_2 k_1\text{-shape}}$, or $K_{k_2 k_1\text{-shape}}^{A \times P}$ when we want to include information about the size of the target Lexis. In what follows we consider the simplest possible kernel, $K_{33\text{-box}}^{A \times P}$, as a benchmark.
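As a concrete illustration, the R sketch below builds a dense version of the filter matrix $K_F$ for a uniform point value (so $k_2 = k_1 = k$ and $\lambda = (k+1)/2$ everywhere), reusing the hypothetical kernel_shape helper from Section 2.3; the larger outer-annulus bandwidth and the authors' production code are not reproduced here.

```r
# Dense filter matrix K_F for an A x P Lexis grid (Algorithm 1), assuming one
# point value k for every cell; build_filter is a hypothetical helper name.
build_filter <- function(A, P, k = 3, shape = "box") {
  lambda <- (k + 1) / 2
  grid <- expand.grid(a = seq_len(A), p = seq_len(P))  # age varies fastest, matching vec(Y)
  n <- A * P
  KF <- matrix(0, n, n)
  for (i in seq_len(n)) {
    d <- sqrt((grid$a - grid$a[i])^2 + (grid$p - grid$p[i])^2)  # distance to target cell
    w <- kernel_shape(d / lambda, shape)                        # zero outside the masking circle
    KF[i, ] <- w / sum(w)                                       # normalized weights (row sums to 1)
  }
  KF
}

# Filtration and its conditional variance:
# KF <- build_filter(A, P, k = 3, shape = "box")   # the K33-box benchmark
# yF <- KF %*% y                                   # y_F = K_F y
# VF <- phi2 * KF %*% Sigma_y %*% t(KF)            # Var(y_F) = phi^2 K_F Sigma_y K_F'
```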

Surprisingly, the bivariate kernels used in Algorithm 1 are invertible or nearly so. It follows that little or no information is discarded by these kernels. Their application allows “smooth” signals to pass through almost unchanged, while at the same time, complex variable signals are downweighted. Although the kernel operators are sparse, their inverses are dense.

With these results in hand, we can present our “rule-of-thumb” over-dispersion estimator

$$\tilde{\phi}^2 \approx \frac{\left(\left[I_{AP} - 2K_{33\text{-box}}^{A \times P} + \left(K_{33\text{-box}}^{A \times P}\right)^2\right] y\right)' \Sigma_y^{-1} \left(\left[I_{AP} - 2K_{33\text{-box}}^{A \times P} + \left(K_{33\text{-box}}^{A \times P}\right)^2\right] y\right)}{AP - 1}$$

See Supplemental Part C for details.
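A direct transcription of this estimator into R might look like the following sketch, where Sigma_inv denotes $\Sigma_y^{-1} = \mathrm{diag}(O_{ap}^2/E_{ap})$ and KF is the benchmark $K_{33\text{-box}}^{A \times P}$ matrix; estimate_phi2 is a hypothetical helper name.

```r
# Rule-of-thumb over-dispersion estimator built on the benchmark kernel (sketch).
estimate_phi2 <- function(y, KF, Sigma_inv) {
  n <- length(y)                                   # n = A * P
  r <- (diag(n) - 2 * KF + KF %*% KF) %*% y        # [I - 2K + K^2] y
  as.numeric(t(r) %*% Sigma_inv %*% r) / (n - 1)   # quadratic form divided by AP - 1
}
```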

3.2. Adaptive kernel filtration

We hypothesized that we could regularize the output of a filter $K_F$ by applying additional smoothing. Intuitively, the kernel smoothers blunt local peaks and fill in local valleys in the rate matrix. The extent to which this happens is greater or lesser depending on whether the peaks or valleys cover a smaller or larger domain relative to the kernel's bandwidth. This behavior of $K_F$ is entirely characterized by its singular values decomposition (SVD). As illustrated in Supplemental Part D, the singular vectors of $K_F$ represent canonical patterns of peaks and valleys that increase in complexity as the corresponding singular values decrease in magnitude. This led us to consider a truncated singular values decomposition approach. 32 A key question is how many singular vectors to discard. We posited that information theory would provide a good answer. 21 By combining these ideas, we developed the adaptive filter described below.

Algorithm 2: Adaptive kernel filtration of rates on a Lexis diagram:

  1. Choose a kernel $K_F$.

  2. Construct the SVD of $K_F = U L V' = U\,\mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_j, \ldots, \sigma_{AP})\,V'$. Center the data using the inverse-variance weighted average, $y_0 = y - \bar{y}\,\mathbf{1}_{AP}$, $\bar{y} = \mathbf{1}_{AP}' \Sigma_y^{-1} y \,/\, \mathbf{1}_{AP}' \Sigma_y^{-1} \mathbf{1}_{AP}$. Centering the data stabilizes calculations in the subsequent steps. Calculate the rule-of-thumb over-dispersion estimator $\tilde{\phi}^2$.

  3. Consider the sequence of truncated kernels $K_{F:s} = U\,\mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_s, 0, 0, \ldots, 0)\,V'$.

  4. For each value of $s$: consider $s$ to be the effective degrees of freedom, or $edf$; calculate the residual vector $r_0(s) = y_0 - K_{F:s}\, y_0$; and calculate the bias-corrected Akaike information criterion statistic

$$\mathrm{AICc}_F(s) = \tilde{\phi}^{-2}\, r_0(s)'\, \Sigma_y^{-1}\, r_0(s) + 2s + \frac{2s^2 + 2s}{AP - s - 1}, \quad s = 1, 2, \ldots, AP - 2$$

  5. The best-fit model uses $K_{F:adp}$, where $adp = \arg\min_s \mathrm{AICc}_F(s)$.

  6. The fitted values are $y_{F:adp} = K_{F:adp}\, y$, and the estimated variance–covariance matrix conditional on the selected model is $\widehat{\mathrm{Var}}(y_{F:adp}) = \widehat{V}_{F:adp}^{\,y} = \tilde{\phi}^2\, K_{F:adp}\, \Sigma_y\, K_{F:adp}'$.

Use of the bias-corrected penalty term is essential to avoid over-fitting, 21 especially for “small” Lexis diagrams with a relatively small total number of cells. The bias-corrected AIC serves as a compromise between the classic uncorrected AIC, which over-fits when the number of unknown terms is large relative to the amount of data, and the classic BIC, which is known to be overly conservative, producing poorly fitting results. 21
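The following R sketch assembles these steps under the notation above (hypothetical helper names; the AICc term divides the weighted residual sum of squares by the over-dispersion estimate, in keeping with the quasi-likelihood convention). It is an illustration under those assumptions rather than the authors' released code.

```r
# Truncated kernel: keep the s leading singular values of K_F.
truncate_K <- function(sv, s) {
  d <- sv$d
  d[-seq_len(s)] <- 0              # zero out trailing singular values
  sv$u %*% (d * t(sv$v))           # U diag(d) V'
}

# Adaptive kernel filtration (sketch of Algorithm 2). Inputs: y = vec(Y),
# KF = filter matrix, Sigma_inv = Sigma_y^{-1}, phi2 = over-dispersion estimate.
adaptive_filter <- function(y, KF, Sigma_inv, phi2) {
  n   <- length(y)
  one <- rep(1, n)
  ybar <- as.numeric(t(one) %*% Sigma_inv %*% y) /
          as.numeric(t(one) %*% Sigma_inv %*% one)       # inverse-variance weighted mean
  y0  <- y - ybar                                        # centered data
  sv  <- svd(KF)
  aicc <- sapply(seq_len(n - 2), function(s) {
    r0 <- y0 - truncate_K(sv, s) %*% y0                  # residuals with edf = s
    as.numeric(t(r0) %*% Sigma_inv %*% r0) / phi2 +
      2 * s + (2 * s^2 + 2 * s) / (n - s - 1)            # bias-corrected AIC
  })
  adp  <- which.min(aicc)
  Kadp <- truncate_K(sv, adp)
  list(edf = adp, AICc = aicc, K = Kadp, fit = Kadp %*% y)
}
```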

3.3. Model averaging

We now have three versions of the data: the observed or “raw” data $y_0$; the filtered data $y_F$ based on Algorithm 1, which we will also refer to as $y_{F:\text{all-in}}$ because all of the kernel's singular vectors contribute to the filtration; and the adaptive filtration $y_{F:adp}$ based on Algorithm 2. We obtain our penultimate version by applying model averaging as reviewed by Burnham and Anderson. 21

Algorithm 3: Model averaging:

  1. Select a panel of $G \ge 1$ kernels $K_{F_1}, K_{F_2}, \ldots, K_{F_G}$.

  2. Apply Algorithm 2 to identify optimal truncation values $adp_1$ for $K_{F_1}$, $adp_2$ for $K_{F_2}$, ..., $adp_G$ for $K_{F_G}$.

  3. Identify the optimal kernel $g_0 = \arg\min_{g \in \{1, 2, \ldots, G\}} \mathrm{AICc}_{F_g}(adp_g)$. Discard all kernels with $\Delta_{F_g} \equiv \mathrm{AICc}_{F_g}(adp_g) - \mathrm{AICc}_{F_{g_0}}(adp_{g_0}) > tol$. By default, use $tol = 7$. Values of $tol$ between 7 and 10 have a theoretical justification. 21 Label the $H$ remaining kernels $K_{F_1}, K_{F_2}, \ldots, K_{F_H}$, where $1 \le H \le G$.

  4. For each kernel identified in step 3, identify all cut-off values $c$ whose $\mathrm{AICc}_{F_h}(c)$ values are within $tol$ of the global minimum attained by model $g_0$, that is, the sets $C_h = \{c \in 1, \ldots, AP : \Delta_{F_h}(c) = \mathrm{AICc}_{F_h}(c) - \mathrm{AICc}_{F_{g_0}}(adp_{g_0}) \le tol\}$, $h = 1, \ldots, H$. Let $n_h = |C_h|$ and $M = \sum_{h=1}^{H} n_h$. Denote the complete set of “acceptable” models, which includes $n_1$ members based on $K_{F_1}$, namely $K_{F_1:C_1\{1\}}, \ldots, K_{F_1:C_1\{n_1\}}$, $n_2$ members based on $K_{F_2}$, namely $K_{F_2:C_2\{1\}}, \ldots, K_{F_2:C_2\{n_2\}}$, etc., as $K_{F_m}$, $m = 1, \ldots, M$. Assemble the corresponding AICc differentials $\Delta_{F_m}$ and filtrations $y_{F_m} = K_{F_m} y$.

  5. Calculate Akaike weights

$$w_m = \frac{\exp(-\Delta_{F_m}/2)}{\sum_{k=1}^{M} \exp(-\Delta_{F_k}/2)}, \quad m = 1, \ldots, M$$

  6. Calculate the truncated model average filtration

$$y_{tma} = \sum_{m=1}^{M} w_m\, y_{F_m}$$

  7. Calculate the unconditional (model-averaged) variance–covariance matrix

$$\widehat{\mathrm{Var}}(y_{tma}) = \widehat{V}_{tma}^{\,y} = \tilde{\phi}^2 \sum_{m=1}^{M} w_m \left(K_{F_m} \Sigma_y K_{F_m}'\right) + \sum_{m=1}^{M} w_m \left(y_{F_m} - y_{tma}\right)\left(y_{F_m} - y_{tma}\right)'$$

There is no penalty other than computation time for using a comprehensive panel of kernels. We used 48 kernels in our case study and simulations (Supplemental Part B).
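The model-averaging step itself is compact; the R sketch below assumes `fits` is a list in which every element represents one acceptable (kernel, truncation) pair from Algorithm 2 and carries its AICc value, filtration yF, and conditional covariance V (hypothetical field names).

```r
# Truncated model averaging (sketch of Algorithm 3, steps 3-7).
model_average <- function(fits, tol = 7) {
  aicc  <- sapply(fits, function(f) f$AICc)
  delta <- aicc - min(aicc)                    # AICc differentials
  keep  <- which(delta <= tol)                 # retain "acceptable" models only
  w     <- exp(-delta[keep] / 2)
  w     <- w / sum(w)                          # Akaike weights
  ytma  <- Reduce(`+`, Map(function(wm, f) wm * f$yF, w, fits[keep]))
  # Unconditional variance: within-model plus between-model components.
  Vtma  <- Reduce(`+`, Map(function(wm, f) {
    d <- f$yF - ytma
    wm * (f$V + d %*% t(d))
  }, w, fits[keep]))
  list(y = ytma, V = Vtma, weights = w)
}
```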

3.4. Variance calculations

Each algorithm produces a variance–covariance matrix that allows us to set confidence limits for any given cell, or for any set of linear combinations of cells. Let $F$ denote a filtration; formally, use $F = 0$ to denote the observed data. Then $y_F$ is the filtered data and $\widehat{V}_F^{\,y}$ is the corresponding variance–covariance matrix (the all-in form of Section 3.1 for Algorithm 1, $\widehat{V}_{F:adp}^{\,y}$ for Algorithm 2, and $\widehat{V}_{tma}^{\,y}$ for Algorithm 3). Matrices $\widehat{V}_F^{\,y}$ and $\widehat{V}_{F:adp}^{\,y}$ are conditional variances, that is, conditional on the selected kernel, but $\widehat{V}_{tma}^{\,y}$ is unconditional, that is, it incorporates uncertainty arising from the model selection process.

Given $C$ linear combinations of $y_F$ encoded in a contrast matrix $O_{C \times AP}$, the vector of outputs is $z_F = O y_F$ and $\widehat{\mathrm{Var}}(z_F) = \widehat{V}_F^{\,z} = O \widehat{V}_F^{\,y} O'$.

To analyze log-transformed rates $\nu_F = \log(y_F)$, the delta-method variance–covariance matrix is $\widehat{\mathrm{Var}}(\nu_F) = \widehat{V}_F^{\,\nu} = \widehat{V}_F^{\,y} \odot (y_F y_F')^{\odot -1}$, where $\odot$ denotes the Hadamard (elementwise) product and $(\cdot)^{\odot p}$ denotes the elementwise $p$-th power. Application of a contrast matrix $O^*$ to the log rates $\nu_F$ yields $\zeta_F = O^* \nu_F$ and $\widehat{V}_F^{\,\zeta} = O^* \widehat{V}_F^{\,\nu} O^{*\prime}$.
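In R, these variance propagations are one-liners; the sketch below uses hypothetical helper names and assumes yF is a numeric vector of filtered rates and VFy its estimated covariance matrix.

```r
# Delta-method covariance of the log rates: V_nu = V_y (Hadamard) (yF yF')^(-1).
log_rate_variance <- function(yF, VFy) {
  yF <- as.numeric(yF)
  VFy / (yF %o% yF)                  # elementwise division by the outer product
}

# Apply a contrast matrix O_star to log rates: zeta = O* nu, Var = O* V_nu O*'.
apply_contrast <- function(O_star, nu, Vnu) {
  list(est = O_star %*% nu, V = O_star %*% Vnu %*% t(O_star))
}
```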

4. Results

4.1. Case study: Invasive ER− breast cancer incidence among non-Hispanic white women

Invasive female breast cancer is the most common malignancy among women, and its epidemiology varies by tumor subtype. 33 Two major subtypes are ER-positive (ER+) and ER-negative (ER−) tumors. 33 For reasons that remain unclear, the incidence of ER− breast cancer has been decreasing over time in many populations around the world, and the rate of decrease over time varies by age.5,3437 In this case study, we apply our new methods to SEER data on invasive ER− breast cancer incidence among non-Hispanic white women (Table 1, No. 1).

Table 1.

Cancer panel. a

No.  Sex b  Race/ethnicity c  Site a  Age groups  Min (No. 0)  Mean  Max  Sum  Δ  φ²
(Min (No. 0), Mean, Max, and Sum summarize the observed cell counts; Δ and φ² are the simulation parameters.)
1 F NHW ER− breast 35–84 12.9 62.2 143 84,013 1 1
2 F NHW ER + breast 35–84 33.7 292 649 394,190 1 1.5
3 F NHB ER− breast 35–84 0 (6) 15.7 48 21,234 1 1
4 F NHB ER + breast 35–84 3 32.9 85 44,476 1 1
5 F NHW Ovary 35–84 2 34.2 73 46,112 1 1.5
6 F HIS Ovary 35–84 0 (30) 5.5 17 7443 1 1.5
7 F NHW Corpus uteri 35–84 4 72.7 198 98,112 1 2
8 F HIS Rectum 35–84 0 (155) 3.1 16 4147 2 1
9 F NHW Thyroid 35–84 1 31.1 94 42,044 1 2
10 F All Gallbladder 45–84 0 (6) 12 37 14,268 1 1
11 M HIS Colon 40–84 0 (9) 11.2 35 13,604 1 1
12 M NHW Colon 35–84 0 (2) 75.6 212 102,017 1 1
13 M NHW Esophagus 45–84 0 (4) 15.1 44 16,267 2 2
14 M All Kidney 35–84 2 62.6 184 84,526 1 1
15 M NHB Kidney 35–84 0 (53) 6.9 30 9289 1 1
16 M API Liver 40–84 0 (13) 9 30 10,930 1 1
17 M NHB Myeloma 45–84 0 (36) 4.6 17 5020 2 2
18 M API Pancreas 45–84 0 (61) 4.6 20 4963 2 1
19 M NHW Rectum 35–84 0 (8) 25.8 66 34,790 1 2
20 M NHW Stomach 45–84 0 (3) 11.3 32 12,188 2 2
a See Supplemental Part A for details.

b Female (F), male (M).

c Non-Hispanic white (NHW), non-Hispanic black (NHB), Hispanic (HIS), Asian and Pacific Islander (API), all races combined (all).

Figure 1(A) presents a heat map of the observed rates per 100,000 woman-years. There is considerable variability from cell to cell, but the data are not over-dispersed ($\tilde{\phi}^2 = 1.062$). Figure 1(B) presents a canonical plot of log incidence over time within 5-year age groups, and Figure 1(C) plots the cross-sectional log incidence by age within 5-year calendar periods. Incidence appears to be decreasing over time within each age group. However, overlapping values and error bands make it hard to distinguish between many of the curves, and it is difficult to discern in a quantitative sense how fast the rates are changing within and between age groups and calendar periods.

Figure 1. Estrogen receptor-negative breast cancer incidence among non-Hispanic white women. Raw data (panels A–C), benchmark kernel (panels D–F), and truncated model average (panels G–I). Left panels: Lexis diagram heat maps. Center panels: rates over time within 5-year age groups. Right panels: rates by age within 5-year calendar periods. Shaded envelopes show 95% point-wise confidence limits.

Figure 1(D) to (F) repeats these analyses using outputs from the benchmark all-in kernel. Panel 1D presents a heat map based on $y_{33\text{-box}:\text{all-in}}$, and panels 1E to F display log-scale curves obtained from $\nu_{33\text{-box}:\text{all-in}}$ with confidence bands set using $\widehat{V}_{33\text{-box}:\text{all-in}}^{\,\nu}$. The patterns are clearer, the curves more easily distinguished, and one can appreciate that incidence is decreasing over time in every age group, especially among women 45–49 and 50–54 years old. Compared to panels 1B to C, the confidence limits are 55% narrower on average.

Figure 1(G) to (I) repeats these analyses using the filtered data $y_{tma}$ and $\nu_{tma}$, with log-scale confidence bands set using $\widehat{V}_{tma}^{\,\nu}$. Algorithm 3 identified $H = 6$ kernels and a total of $M = 31$ acceptable truncation values. The most heavily weighted kernels were $K_{33\text{-triwt}:51}$ and $K_{57\text{-triwt}:50}$ (Akaike weights of 0.20 and 0.19, respectively). The patterns are clearest, the curves readily distinguished, and the confidence bands are 71% narrower. The TMA kernel is illustrated in Supplemental Figure D4.

4.2. Case study: Averages, gradients, and trends

Graphs provide insight, but no descriptive study is complete until salient features that may be apparent in graphs are quantified using objective and reproducible statistics. There is latitude regarding the precise feature set8,10; widely used features obtain from averages, gradients, and trends. We consider five features that together quantify the marginal effects of period and age as well as interactions between period and age.

  1. The marginal period curve $\mathrm{mpc}_F^{\,\nu}$ and its gradient $\nabla \mathrm{mpc}_F^{\,\nu}$. The marginal period curve $\mathrm{mpc}_F^{\,\nu}$ is the log rate averaged over age, with one value for each calendar period. The gradient $\nabla \mathrm{mpc}_F^{\,\nu}$ is the first difference of $\mathrm{mpc}_F^{\,\nu}$ divided by the bin width $\Delta$.

  2. The marginal age curve $\mathrm{mac}_F^{\,\nu}$ and its gradient $\nabla \mathrm{mac}_F^{\,\nu}$. The marginal age curve $\mathrm{mac}_F^{\,\nu}$ is the log rate averaged over period, with one value for each age group. The gradient $\nabla \mathrm{mac}_F^{\,\nu}$ is the first difference of $\mathrm{mac}_F^{\,\nu}$ divided by the bin width $\Delta$.

  3. The slope of the age-specific log rates over time, ${}_{F}\pi_{|\alpha}$, with one slope for each age group.

Each feature is a linear function of the log rates; therefore, each can be extracted using a corresponding contrast matrix $O^*$ whose values depend only on the structure of the corresponding Lexis diagram $L(a_0, p_0, \Delta, A, P)$. See Supplemental Part E for details.
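To illustrate what such contrast matrices look like, here is a small R sketch for the marginal period curve and its gradient, assuming the log rates are stacked column-on-column (age varying fastest); the helper names are hypothetical, and the remaining features are built analogously (Supplemental Part E).

```r
# Marginal period curve: average the A log rates within each period (P x AP).
mpc_contrast <- function(A, P) {
  kronecker(diag(P), matrix(1 / A, nrow = 1, ncol = A))
}

# Gradient of the marginal period curve: first differences divided by Delta.
mpc_gradient_contrast <- function(A, P, Delta) {
  D <- diff(diag(P)) / Delta          # (P - 1) x P first-difference matrix
  D %*% mpc_contrast(A, P)            # (P - 1) x AP
}

# Usage sketch: mpc <- mpc_contrast(A, P) %*% nu_F
#               V   <- mpc_contrast(A, P) %*% V_nu %*% t(mpc_contrast(A, P))
```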

Figure 2(A) to (C) presents these features calculated from the observed data $\nu_0$, Figure 2(D) to (F) from $\nu_{33\text{-box}:\text{all-in}}$, and Figure 2(G) to (I) from $\nu_{tma}$. Looking down the columns, the estimates become increasingly regular and the confidence limits narrower. The marginal period curve based on model averaging ($\mathrm{mpc}_{tma}^{\,\nu}$, Figure 2(G), left axis) decreases over time, by around 2% per year through 2004 ($\nabla \mathrm{mpc}_{tma}^{\,\nu}$, Figure 2(G), right axis), and by around 3% per year circa 2005–2013. Subsequently, the gradient approaches 0 as the curve approaches a plateau.

Figure 2. Breast cancer averages, gradients, and trends. Features extracted from the data shown in Figure 1. Raw data (panels A–C), benchmark kernel (panels D–F), and truncated model average (panels G–I). Left panels: marginal period curve (left axis) and gradient (right axis). Center panels: marginal age curve (left axis) and gradient (right axis). Right panels: age-specific period trends. Gradient estimates in the left and center panels are trimmed to exclude the first and last time points. Shaded envelopes show pointwise 95% confidence limits.

The marginal age curve ($\mathrm{mac}_{tma}^{\,\nu}$, Figure 2(H), left axis) increases continuously until circa age 72 at a decreasing rate ($\nabla \mathrm{mac}_{tma}^{\,\nu}$, Figure 2(H), right axis). Subsequently, the curve attains a plateau and then slowly decreases.

Incidence decreases over time in every age group at a rate that varies by age (${}_{tma}\pi_{|\alpha}$, Figure 2(I)). The greatest decreases, of around 3.25% per year, occurred circa ages 45–54, consistent with the curves shown in Figure 1(H).

4.3. Simulation studies

We assessed the operating characteristics of our algorithms by simulating the 20 cancers in Table 1. See Supplemental Part F for details. In five scenarios, we blocked the data into 2×2 cells, and in nine, we incorporated over-dispersion ($\phi^2 > 1$). We considered three approaches: raw data; the benchmark all-in kernel (Algorithm 1 with $K_{33\text{-box}}^{A \times P}$); and the truncated model average (TMA; Algorithm 3 with 48 candidates). We simulated 500 Lexis diagrams for each scenario. We tracked the performance of benchmark and TMA for heat maps (Figure 1(D) and (G)) and features (Figure 2(D) to (I)). To summarize performance, we calculated the percentage reduction in average root mean squared error (RMSE) for benchmark versus raw and for TMA versus raw. We also examined bias and variance per cell (heat maps) and per abscissa value (features).
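As one common way to generate over-dispersed, quasi-Poisson-like counts for such simulations (the authors' exact scheme is documented in Supplemental Part F, so this is only an illustrative sketch), a gamma–Poisson (negative binomial) draw can be calibrated so that the variance equals $\phi^2$ times the mean:

```r
# Simulate one Lexis diagram of counts with Var = phi2 * mean (illustrative sketch).
# mu is the A x P matrix of expected counts; phi2 is the over-dispersion target.
simulate_lexis <- function(mu, phi2 = 1) {
  if (phi2 <= 1) {
    return(matrix(rpois(length(mu), mu), nrow(mu), ncol(mu)))      # plain Poisson
  }
  size <- mu / (phi2 - 1)                                          # NB: Var = mu + mu^2/size = phi2 * mu
  matrix(rnbinom(length(mu), size = size, mu = mu), nrow(mu), ncol(mu))
}
```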

In every scenario, TMA produced more accurate heat maps than the benchmark kernel (Figure 3(A)). Compared to analyzing the raw data, the benchmark kernel reduced the RMSE by 58% on average over the 20 scenarios, versus 74% for TMA.

Figure 3. Arrow plots of simulation results. Rows correspond to the 20 cancers summarized in Table 1. Panels correspond to features. Blue circles show the percent reduction for the benchmark kernel versus raw, and yellow triangles show the percent reduction for the truncated model average (TMA) versus raw.

Performance gains for $\mathrm{mpc}$ were comparatively modest (Figure 3(B)). In Scenario 15 (M Kidney NHB), the benchmark kernel performed slightly better. TMA was uniformly superior to the benchmark for estimating the gradient $\nabla \mathrm{mpc}$ (Figure 3(C); 85% mean reduction versus 68%), and TMA achieved similar gains for $\mathrm{mac}$ and $\nabla \mathrm{mac}$ (Figure 3(D) and (E), respectively). TMA was uniformly superior for estimating the slopes $\pi_{|\alpha}$ (Figure 3(F); 61% vs. 46%).

How did TMA achieve such gains in performance? The rule-of-thumb over-dispersion estimator, estimated in step 2 and used in step 4 of Algorithm 2, appears adequate (Figure 4(A)). On average over the 20 scenarios, the median value of $\tilde{\phi}^2$ (squares) exceeded the true value (triangles) by just 0.043, or 3.1%, and the 90% error bars for $\tilde{\phi}^2$ varied by just ±0.16 (±11.0%) around the corresponding median.

Figure 4. Operating characteristics of the truncated model average. (A) Rows correspond to the 20 cancers summarized in Table 1; median (squares) and 90% limits (bars) of the estimated over-dispersion parameter ($\tilde{\phi}^2$), versus true values (triangles). (B) Box plots of effective degrees of freedom ($edf$). (C) Best-fit kernels: frequency of selection across the 20 cancers.

In all scenarios, $edf$ values were one or two orders of magnitude smaller than the maximum possible $edf$ of $AP - 2$ (Figure 4(B)). Hence, truncated kernels were always superior to any fixed all-in kernel regardless of kernel shape. Small values of $k_2$ and $k_1$ were selected by TMA a large majority of the time (Figure 4(C)), and all four shapes contributed. Interestingly, amongst the four kernel shapes, the boxcar was selected the least (6.5%) and the triweight the most (73%).

5. Discussion

We developed a novel non-parametric approach to regularize (“smooth”) event rates ascertained over Lexis diagrams. Our methods borrow smoothing concepts from time series analysis, that is, k-point moving averages, and classic multivariate kernel methods in statistics 29 and image processing, that is, filtering and singular values decomposition. Our approach uses statistical information theory, specifically, the bias-corrected AIC, 21 to handle the model selection problem and provide a variance–covariance matrix that takes model selection into account.

Our truncated model averaging approach adds to the armamentarium and complements existing non-parametric12,13,17 and Bayesian 15 methods. The kernel-based methods developed by Duong and Hazelton 29 are closest in spirit to our approach. In our approach, we use the bias-corrected AIC rather than cross-validation, and we use truncated rather than full kernels, which, as we show, substantially increases accuracy.

Our approach to selecting effective degrees of freedom to quantify the complexity of the underlying Lexis diagram is similar to approaches used to select basis functions in generalized additive models. 38 Furthermore, Gaussian process-based smoothing can be viewed as a special case of our new bivariate methods, since the Matérn covariance used in previous work 39 encompasses both the exponential and Gaussian kernels, both of which are similar to the triweight kernel. Interestingly, in our simulations, the triweight was by far the most frequently selected kernel shape.

Examination of smoothed Lexis diagrams provides a good overview. Subsequently, scientific conclusions obtain from quantifications of averages, gradients, and trends. As illustrated by our case study, not only does the truncated model averaging approach provide appealingly smoothed heat maps (Figure 1), but corresponding linear combinations of the smoothed values are also substantially more precise (Figure 2).

As shown by our simulation studies, when the cells of the observed Lexis diagrams are statistically independent quasi-Poisson variates, Algorithm 3 (truncated model averaging) is superior to any fixed kernel. Indeed, compared to our benchmark kernel, the truncated model average reduced the intrinsic root mean squared error by 60%–86% depending on the feature.

We focused here on Lexis diagrams, but our methods and software can be applied when the data follow an approximate multivariate normal distribution with a full-rank covariance matrix known up to a scale parameter. Sparse cell counts are an issue in the Poisson case. We implemented our methods using the Normal approximation to the Poisson distribution, which worked well in all scenarios considered (Table 1). If single-year data are sparse (numerous zeros) one can increase the bin width. Future research might consider incorporating a standard or zero-inflated Poisson log-likelihood function. More advanced kernels could also be investigated, for example, steering kernels. 27 Indeed, our model averaging approach could be extended to include estimates obtained using complementary methods such as steering kernels or splines.

In cancer surveillance research, few studies examine a single Lexis diagram in isolation. Rather, hypotheses are generated by examining related Lexis diagrams defined by sex, race/ethnicity, geographic region, tumor characteristics, etc. Invariably, the amount of information per Lexis diminishes with increasing stratification. As demonstrated here, our new truncated model averaging approach can advance such descriptive studies.

In future work, truncated model averaging might be extended to incorporate as covariates the effects of sex, race/ethnicity, etc. For example, one could start with a joint parametric fit, for example, a proportional hazards age-period-cohort model, 40 then apply truncated model averaging to the residuals to characterize any lack-of-fit.

Supplemental Material

sj-docx-1-smm-10.1177_09622802231192950: Supplemental material for “Smoothing Lexis diagrams using kernel functions: A contemporary approach” by Philip S Rosenberg, Adalberto Miranda Filho, Julia Elrod, Aryana Arsham, Ana F Best and Pavel Chernyavskiy, Statistical Methods in Medical Research.

Acknowledgements

This research was funded by the Intramural Research Program of the National Cancer Institute, Division of Cancer Epidemiology and Genetics. AMF is also supported through an appointment to the National Cancer Institute (NCI) ORISE Research Participation Program under DOE contract number DE-SC0014664.

Footnotes

Data accessibility: Our freely available R code and example data are available upon request.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the ORISE Research Participation Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute (grant number DOE contract DE-SC0014664, Intramural Research Program).

ORCID iD: Philip S Rosenberg https://orcid.org/0000-0001-6349-9126

Supplemental material: Supplemental material for this article is available online.

References

  • 1. Keiding N. Statistical inference in the Lexis diagram. Philos Trans R Soc A 1990; 332: 487–509.
  • 2. Carstensen B. Age-period-cohort models for the Lexis diagram. Stat Med 2007; 26: 3018–3045.
  • 3. Breslow NE, Day NE. Statistical methods in cancer research, Volume 2, The design and analysis of cohort studies. Oxford: International Agency for Research on Cancer, 1987.
  • 4. Froelicher JH, Forjaz G, Rosenberg PS, et al. Geographic disparities of breast cancer incidence in Portugal at the district level: a spatial age-period-cohort analysis, 1998–2011. Cancer Epidemiol 2021; 74: 102009.
  • 5. Lynn BCD, Chernyavskiy P, Gierach GL, et al. Decreasing incidence of estrogen receptor-negative breast cancer in the United States: trends by race and region. J Natl Cancer Inst 2022; 114: 263–270. DOI: 10.1093/jnci/djab186
  • 6. Chernyavskiy P, Little MP, Rosenberg PS. Spatially varying age-period-cohort analysis with application to US mortality, 2002–2016. Biostatistics 2020; 21: 845–859.
  • 7. National Cancer Institute. SEER*Stat software, Version 8.4.0. Surveillance Research Program, 2022.
  • 8. Robertson C, Boyle P. Age-period-cohort models of chronic disease rates. II: graphical approaches. Stat Med 1998; 17: 1325–1340.
  • 9. Devesa SS, Donaldson J, Fears T. Graphical presentation of trends in rates. Am J Epidemiol 1995; 141: 300–304.
  • 10. Fay MP, Tiwari RC, Feuer EJ, et al. Estimating average annual percent change for disease rates without assuming constant change. Biometrics 2006; 62: 847–854.
  • 11. Currie ID, Durban M, Eilers PHC. Smoothing and forecasting mortality rates. Stat Model 2004; 4: 279–298.
  • 12. Camarda CG. MortalitySmooth: an R package for smoothing Poisson counts with P-splines. J Stat Softw 2012; 50: 1–24.
  • 13. Camarda CG. Smooth constrained mortality forecasting. Demogr Res 2019; 41: 1091–1130.
  • 14. Dokumentov A, Hyndman RJ, Tickle L. Bivariate smoothing of mortality surfaces with cohort and period ridges. Stat 2018; 7: e199.
  • 15. Chien LC, Wu YJ, Hsiung CA, et al. Smoothed Lexis diagrams with applications to lung and breast cancer trends in Taiwan. J Am Stat Assoc 2015; 110: 1000–1012.
  • 16. Chien LH, Tseng TJ, Chen CH, et al. Comparison of annual percentage change in breast cancer incidence rate between Taiwan and the United States: a smoothed Lexis diagram approach. Cancer Med 2017; 6: 1762–1775.
  • 17. Martinez-Hernandez IG, Genton MG. Nonparametric trend estimation in functional time series with application to annual mortality rates. Biometrics 2021; 77: 866–878.
  • 18. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Stanford, CA: Stanford University, 2002.
  • 19. Cleveland WS. Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 1979; 74: 829–836.
  • 20. Ramlau-Hansen H. Smoothing counting process intensities by means of kernel functions. Ann Stat 1983; 11: 453–466.
  • 21. Burnham KP, Anderson DR. Multimodel inference—understanding AIC and BIC in model selection. Sociol Methods Res 2004; 33: 261–304.
  • 22. Rosenberg PS. A new age-period-cohort model for cancer surveillance research. Stat Methods Med Res 2019; 28: 3363–3391.
  • 23. National Cancer Institute. Surveillance, Epidemiology, and End Results (SEER 13 Plus) Program Populations (1992–2018). www.seer.cancer.gov/popdata. National Cancer Institute, DCCPS, Surveillance Research Program, released February 2022.
  • 24. Sung H, Siegel RL, Rosenberg PS, et al. Emerging cancer trends among young adults in the USA: analysis of a population-based cancer registry. Lancet Public Health 2019; 4: E137–E147.
  • 25. Anderson WF, Katki HA, Rosenberg PS. Incidence of breast cancer in the United States: current and future trends. J Natl Cancer Inst 2011; 103: 1397–1402.
  • 26. Howlader N, Noone AM, Yu M, et al. Use of imputed population-based cancer registry data as a method of accounting for missing information: application to estrogen receptor status for breast cancer. Am J Epidemiol 2012; 176: 347–356.
  • 27. Milanfar P. A tour of modern image filtering. IEEE Signal Process Mag 2013; 30: 106–128.
  • 28. Wand MP, Jones MC. Comparison of smoothing parameterizations in bivariate kernel density estimation. J Am Stat Assoc 1993; 88: 520–528.
  • 29. Duong T, Hazelton ML. Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand J Stat 2005; 32: 485–506.
  • 30. Takeda H, Farsiu S, Milanfar P. Kernel regression for image processing and reconstruction. IEEE Trans Image Process 2007; 16: 349–366.
  • 31. Takeda H, Farsiu S, Milanfar P. Regularized kernel regression for image deblurring. Conf Rec Asilomar Conf Signals Syst Comput 2006: 1914–191+. DOI: 10.1109/Acssc.2006.355096
  • 32. Hansen PC. The truncated SVD as a method for regularization. BIT 1987; 27: 534–553.
  • 33. Anderson WF, Rosenberg PS, Prat A, et al. How many etiological subtypes of breast cancer: two, three, four, or more? J Natl Cancer Inst 2014; 106: dju165. DOI: 10.1093/jnci/dju165
  • 34. Tuan AW, Lynn BCD, Chernyavskiy P, et al. Breast cancer incidence trends by estrogen receptor status among Asian American ethnic groups, 1990–2014. JNCI Cancer Spectr 2020; 4: pkaa005. DOI: 10.1093/jncics/pkaa005
  • 35. Mullooly M, Murphy J, Gierach GL, et al. Divergent oestrogen receptor-specific breast cancer trends in Ireland (2004–2013): amassing data from independent Western populations provide etiologic clues. Eur J Cancer 2017; 86: 326–333.
  • 36. Rosenberg PS, Barker KA, Anderson WF. Estrogen receptor status and the future burden of invasive and in situ breast cancers in the United States. J Natl Cancer Inst 2015; 107: djv159. DOI: 10.1093/jnci/djv159
  • 37. Anderson WF, Rosenberg PS, Petito L, et al. Divergent estrogen receptor-positive and -negative breast cancer trends and etiologic heterogeneity in Denmark. Int J Cancer 2013; 133: 2201–2206.
  • 38. Wood SN. Generalized additive models: an introduction with R. Boca Raton, FL: CRC Press, 2017.
  • 39. Chernyavskiy P, Little MP, Rosenberg PS. Correlated Poisson models for age-period-cohort analysis. Stat Med 2018; 37: 405–424.
  • 40. Rosenberg PS, Anderson WF. Proportional hazards models and age-period-cohort analysis of cancer rates. Stat Med 2010; 29: 1228–1238.
