Abstract
In post-licensure observational studies of the association between vaccines and rare adverse events, it is challenging to define appropriate risk windows because pre-licensure randomized clinical trials provide little insight into the timing of specific adverse events. Past vaccine safety studies have often used pre-specified risk windows based on prior publications, biological understanding of the vaccine, and expert opinion. Recently, a data-driven approach was developed to identify appropriate risk windows for vaccine safety studies that use the self-controlled case series design. This approach employs both the maximum incidence rate ratio and the linear relation between the estimated incidence rate ratio and the inverse of average person-time at risk, given a specified risk window. In this paper, we present a scan statistic that can identify appropriate risk windows in vaccine safety studies using the self-controlled case series design while taking into account the dependence of time intervals within an individual and adjusting for time-varying covariates such as age and seasonality. This approach uses the maximum likelihood ratio test based on fixed effects models, which, in addition to conditional Poisson models, have been used to analyze data from the self-controlled case series design.
Keywords: self-controlled case series, adverse events after vaccination, fixed effects model, scan statistics, maximum likelihood ratio test
1. Background
Observational studies have been used to examine the safety of vaccines after licensure. These post-licensure studies are critical to evaluate the safety of licensed vaccines by examining their possible association with rare adverse events not detected by pre-licensure randomized clinical trials. These associations can be explored using cohort, self-controlled case series (SCCS), and risk interval designs [1]. In these methods, the incidence rates or risks of adverse events in pre-determined risk windows are compared to the risk of adverse events in non-risk windows (control periods). Typically after a vaccine is licensed, an association with an adverse event may be suggested by an active surveillance program [2,3] or a passive one (e.g., via the Vaccine Adverse Event Reporting System from the Centers for Disease Control and Prevention (CDC) and the Food and Drug Administration (FDA)). This signal needs to be tested in an independent data set through a rigorous epidemiological study. Before conducting the epidemiological study, it is important to answer two questions: 1) whether there is a defined time after vaccination with an elevated risk of a certain adverse event (risk window); 2) if the risk window does exist, when it starts (location in time) and how long it lasts (length of time). Little knowledge regarding the timing of rare adverse events is available from pre-licensure randomized clinical trials, which usually focus on the efficacy, immunogenicity, and safety of some pre-selected severe adverse events. The statistical power is often limited in these trials to study associations between vaccines and rare adverse events. In addition, when the biologic mechanism for the adverse event is largely unknown, it is difficult to identify biologically plausible risk windows. 
Due to the lack of methods for identifying optimal risk windows in post-licensure observational studies, researchers typically use pre-defined risk windows in vaccine safety studies, partially informed by prior studies, or by hypotheses based on biological understanding of how the vaccines work [4–7].
Recently, a data-driven approach was developed to identify optimal risk windows for vaccine safety studies using the self-controlled case series design [8]. This approach employs both the maximum incidence rate ratio (IRR) and the linear relation between the estimated IRR and the inverse of average person-time at risk that holds when the specified risk window is longer than the true risk window. Hereafter we refer to this approach as the maximum and graphic approach. Briefly, for a specified risk window length (z), the average time at risk, T(z), can be calculated. When the specified risk window is shorter than the true risk window, the calculated IRR for the specified risk window, R(z), decreases as 1/T(z) increases, but there is no explicit relationship. When the specified risk window is longer than the true risk window, R(z) increases linearly as 1/T(z) increases. Theoretically, the risk window with the maximum IRR is the optimal risk window. Because data are often sparse, it was recommended to use both the maximum IRR and the linear relationship for windows longer than the true risk window in order to identify the optimal risk window. However, this method may be subjective, especially when two window lengths have similar IRR estimates and both show a linear relationship with 1/T(z). The other limitation is that the maximum and graphic approach does not provide a statistical test against the null hypothesis that there is no period after vaccination with an elevated risk of adverse events. To address these limitations, we propose using a scan statistic to identify optimal risk windows.
Scan statistics have been used to identify diseases that cluster in time, in space, or in both [9,10]. The simple scan statistic was first introduced by Naus [11]. With a fixed scan window, Naus derived the probability that an interval shorter than the observation period contains a certain number of events or more. Under the null hypothesis that events are uniformly distributed during the follow-up period, the interval with the maximum number of events is the clustering window. Kulldorff and Nagarwalla [12] and Nagarwalla [13] extended the simple scan statistic to a variable scan window, yielding variable scan statistics. To test the null hypothesis that the events are distributed uniformly, they used a likelihood ratio test against a piecewise-constant alternative. Further, Molinari et al. [14] proposed a method that can detect multiple clusters by applying a piecewise-constant regression model to the spacings between event times. Scan statistics have also been developed and widely used for spatial detection of disease clusters [12,15–20]. In real-time vaccine safety surveillance, temporal scan statistics have been used to determine whether adverse events cluster during a post-vaccination observation window [21–23].
In this paper, we propose a scan statistic specifically to test if there exists a risk window after vaccination with an elevated risk of an adverse event and to identify the location (in time) and length of risk windows for vaccine safety studies using self-controlled case series designs, considering that 1) only individuals with adverse events are included in the analysis; 2) follow-up periods before vaccination are control periods and are not scanned for elevated risk of adverse events; 3) more than one adverse event may occur to an individual; and 4) age is a confounder since it may be associated with both the adverse events and the probability of being vaccinated. With this newly developed approach, the IRR is readily available when an optimal risk window is identified.
2. Statistical Theory
2.1 SCCS data and conditional Poisson regression model
The SCCS design includes only those individuals who experienced adverse events during a fixed pre-defined time period, either exposed or unexposed to vaccine. It is widely used in vaccine safety studies for the following reasons. First, it addresses selection bias due to the differences between the vaccinated and unvaccinated individuals in observational study settings. Secondly, traditional study designs such as the matched cohort and case-control designs may not even be feasible for studying vaccine safety because 1) the coverage of some vaccines is nearly 100%, and so there are not enough unvaccinated individuals to use for the control group; and 2) data are not collected in safety surveillance systems for those who did not experience/report an adverse event.
In an SCCS study, the follow-up period of an individual is partitioned into intervals by time-varying covariates such as vaccination, age, and seasonality. Let yij denote the number of adverse events in interval j for individual i, j=1,…, J. The adverse events to be monitored can be common (e.g., fever and soreness), relatively common (e.g., seizures), or rare (e.g., idiopathic thrombocytopenic purpura (ITP) and death). In vaccine safety studies that aim to establish causal relations between certain vaccines and adverse events of interest, the adverse events are generally pre-specified [3]. In hypothesis-generating vaccine safety studies [4,6,7], many adverse events that are not pre-specified, identified from diagnosis codes in electronic health care databases, are evaluated for their associations with vaccines. Conditional Poisson regression models have been used to analyze data from studies using the SCCS design [24–26]. For individual i, the conditional likelihood function is
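$$L_i(\alpha,\beta)=\prod_{j=1}^{J}\left[\frac{t_{ij}\exp(x_{ij}\alpha+k_{ij}\beta)}{\sum_{m=1}^{J}t_{im}\exp(x_{im}\alpha+k_{im}\beta)}\right]^{y_{ij}} \tag{1}$$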
Here, xij is the row vector of time-varying covariates, including indicator variables (x1, x2, x3, …) for age effects; α is the column vector of corresponding coefficients (α1, α2, α3, …); kij = 1 for a risk period and 0 for a control period; β is the coefficient for the vaccination effect; and tij is the actual time at risk, in days, that individual i contributes to interval j. Model (1) can accommodate multiple risk levels by further partitioning the risk window into sub risk intervals [24,25].
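As an illustration of how the conditional likelihood (1) is evaluated for one individual, the following minimal Python sketch computes its logarithm. The function name, the argument `age_lp` (the age linear predictor xij α for each interval), and the example individual are hypothetical and chosen only for illustration.

```python
import math

def sccs_conditional_loglik(y, t, age_lp, k, beta):
    """Log of the conditional likelihood (1) for one individual.

    y      : event counts per interval (y_ij)
    t      : days at risk per interval (t_ij), all positive
    age_lp : age linear predictor x_ij * alpha per interval
    k      : 1 for risk intervals, 0 for control intervals (k_ij)
    beta   : log incidence rate ratio for vaccination
    """
    weights = [t_j * math.exp(a_j + k_j * beta)
               for t_j, a_j, k_j in zip(t, age_lp, k)]
    total = sum(weights)
    # Each event contributes the log of its interval's share of the total weight.
    return sum(y_j * math.log(w_j / total)
               for y_j, w_j in zip(y, weights) if y_j > 0)

# Hypothetical individual: 365 days of follow-up, a 42-day risk interval,
# and one event that falls inside the risk interval.
t = [100, 42, 223]
k = [0, 1, 0]
y = [0, 1, 0]
age_lp = [0.0, 0.0, 0.0]
ll_null = sccs_conditional_loglik(y, t, age_lp, k, beta=0.0)          # log(42/365)
ll_irr4 = sccs_conditional_loglik(y, t, age_lp, k, beta=math.log(4))  # log(168/491)
```

Adding the same constant to every element of `age_lp` leaves the result unchanged, which reflects the fact that anything constant within an individual cancels from (1).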
2.2 Fixed effects models for SCCS data
In addition to conditional Poisson models, fixed effects models were used to analyze SCCS data in a tutorial paper by Whitaker et al. [25]. SCCS data have a longitudinal structure in which the multiple observations on an individual are the intervals defined by vaccination exposure status and time-varying covariates. Let ξi represent the individual-specific coefficient for individual i, who experienced at least one adverse event during the follow-up period. No distribution, such as a normal distribution, is assumed for ξi. To fit fixed effects models for SCCS data, assume
$$y_{ij}\sim\text{Poisson}(u_{ij}) \tag{2}$$
where uij = tij exp(β0 + xij α + kij β + ξi), β0 represents the intercept coefficient for the overall case sample, and xij, α, kij and β have the same meanings as defined previously in model (1). Thus, exp(β0) is the baseline incidence rate of adverse events per unit of unvaccinated time (e.g., per day) in the case sample. It can be shown that the likelihood function of a fixed effects model is proportional to that from a conditional Poisson model as in (1). As a result, the maximum likelihood estimates (MLEs) of α and β are the same for the two models [27].
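The proportionality can be sketched with the standard Poisson-to-multinomial argument. The symbol n_i, the total number of events for individual i, is introduced here only for illustration.

```latex
\begin{aligned}
n_i &= \sum_{j=1}^{J} y_{ij} \sim \text{Poisson}\Big(\sum_{m=1}^{J} u_{im}\Big), \\
P(y_{i1},\dots,y_{iJ}\mid n_i)
  &= \frac{\prod_{j} e^{-u_{ij}}\,u_{ij}^{y_{ij}}/y_{ij}!}
          {e^{-\sum_{m} u_{im}}\big(\sum_{m} u_{im}\big)^{n_i}/n_i!}
   = \frac{n_i!}{\prod_{j} y_{ij}!}\prod_{j=1}^{J}
     \left(\frac{u_{ij}}{\sum_{m} u_{im}}\right)^{y_{ij}}, \\
\frac{u_{ij}}{\sum_{m} u_{im}}
  &= \frac{t_{ij}\exp(x_{ij}\alpha+k_{ij}\beta)}
          {\sum_{m} t_{im}\exp(x_{im}\alpha+k_{im}\beta)} .
\end{aligned}
```

The factor exp(β0 + ξi) is common to every interval of individual i and cancels in the last line, so the conditional part depends only on α and β and has the same kernel as (1). The remaining Poisson term for n_i is fitted exactly by the unrestricted ξi (its maximum over ξi does not depend on α or β), which is why the two models give the same estimates of α and β.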
Fixed effects models are likelihood based and make it possible to employ likelihood-based statistical tests [27]. Whitaker et al [25] used the likelihood ratio test statistic to investigate the effect of gender on the association between oral polio vaccine and intussusception. In this paper, we propose to use a maximized likelihood ratio test (MLRT) for identifying optimal risk windows in vaccine safety studies using the SCCS design.
2.3 Maximized likelihood ratio test for identifying the optimal risk window
Let L(s,z) be the likelihood function from a fixed effects model for a risk window starting s days after vaccination with a length of z days. The start of the risk window, representing the location of a risk window, can be any time after vaccination. We propose the following log likelihood ratio test statistic as the scan statistic for identifying risk windows,
$$\tau(s,z)=\log\frac{L(s,z;\hat{\alpha},\hat{\beta})}{L_0(\hat{\alpha}_0)} \tag{3}$$
where $\hat{\alpha}$ and $\hat{\beta}$ are MLEs under the alternative hypothesis and $\hat{\alpha}_0$ are MLEs under the null hypothesis. L0 is the value of the likelihood function under the null hypothesis that the incidence rate of adverse events remains constant throughout the follow-up period while adjusting for other covariates (i.e., x1, x2, …). To identify the optimal risk window, given a large set of different locations, s, and a large set of different lengths, z, the corresponding values of the likelihood function can be calculated after $\hat{\alpha}$ and $\hat{\beta}$ are obtained through likelihood maximization based on fixed effects models. The largest log likelihood ratio test statistic, τ(s0, z0), can then be found, where τ(s0, z0) ≥ τ(s, z) for all s and z considered. As with other scan statistics [12,13,15], we consider the risk window starting s0 days after vaccination with a length of z0 days to be the optimal risk window. The statistic τ(s, z) is the scan statistic for identifying the optimal risk window in vaccine safety studies using the SCCS design.
To test the null hypothesis that there is no risk window after vaccination with an elevated risk of adverse events, Monte Carlo simulation is used [10,11,14] because the log likelihood ratio test statistic does not follow an explicit distribution such as a t distribution or a χ2 distribution. Details are described in the next section.
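The search over window locations and lengths described in this section can be sketched in Python as follows. This is a deliberately simplified illustration, not the fixed effects fit used in the paper: it assumes no age or other time-varying covariates, so the conditional log likelihood depends on β alone; it solves the score equation by bisection; and it treats windows with an estimated β of zero or less as carrying no evidence of elevated risk. The function names and the data layout (follow-up length, vaccination day, list of event days) are hypothetical.

```python
import math

def window_stat(people, s, z):
    """Return (tau, beta_hat) for a risk window of z days starting s days
    after vaccination. Each person is (follow_up_days, vaccination_day,
    event_days), with days numbered 1..follow_up_days."""
    rows = []
    for n_days, vax, events in people:
        lo = vax + s                        # first risk day
        hi = min(vax + s + z - 1, n_days)   # last risk day within follow-up
        r = max(0, hi - lo + 1)             # risk days
        c = n_days - r                      # control days
        e = sum(1 for d in events if lo <= d <= hi)
        rows.append((r, c, e, len(events)))
    exposed = sum(e for _, _, e, _ in rows)

    def score(beta):
        # Derivative of the conditional log likelihood with respect to beta.
        w = math.exp(beta)
        return exposed - sum(n * r * w / (r * w + c) for r, c, _, n in rows)

    if score(0.0) <= 0.0:       # assumption: only elevated risk is of interest
        return 0.0, 0.0
    a, b = 0.0, 20.0            # beta is capped at 20 if every event is exposed
    for _ in range(60):         # bisection; the score is decreasing in beta
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:
            a = mid
        else:
            b = mid
    beta = 0.5 * (a + b)
    w = math.exp(beta)
    tau = sum(e * beta - n * math.log((r * w + c) / (r + c))
              for r, c, e, n in rows)
    return tau, beta

def scan(people, starts, lengths):
    """Return (tau, s0, z0, beta_hat) for the window with the largest tau."""
    best = None
    for s in starts:
        for z in lengths:
            tau, beta = window_stat(people, s, z)
            if best is None or tau > best[0]:
                best = (tau, s, z, beta)
    return best

# Hypothetical toy data: 10 cases, each followed for 100 days and vaccinated
# on day 50; five have an event on day 55 and five on day 20.
people = [(100, 50, [55])] * 5 + [(100, 50, [20])] * 5
best_tau, s0, z0, beta_hat = scan(people, starts=[1], lengths=[5, 10, 20])
```

In a full analysis, `window_stat` would be replaced by fitting the fixed effects model (2) with age covariates under the null and alternative hypotheses for each candidate window.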
3. Example: Idiopathic Thrombocytopenic Purpura (ITP) after Measles-Mumps-Rubella (MMR) vaccination
France et al. [5] examined the association between MMR vaccination and ITP among a population of US children aged 1 to 18. ITP case data from the years 1991–2000 were extracted from the Vaccine Safety Datalink (VSD), a project established in 1991 by the CDC to evaluate the safety of vaccines [28]. There were 259 confirmed patients aged 18 years or younger with ITP. Among them, 66 ITP cases occurred between 366 and 690 days of age. Using a 42-day risk window after MMR vaccination, they found an incidence rate ratio of 7.06 for those vaccinated between 366 and 690 days of age. They excluded the 42-day “healthy vaccinee period” immediately preceding MMR vaccination. In a recent study, a 42-day optimal risk window was identified using the maximum and graphic approach, assuming that the risk window started immediately after MMR vaccination [8]. That study included all subjects with follow-up for 366–730 days regardless of vaccination status and did not exclude the healthy vaccinee period before MMR vaccination. There were 79 ITP cases that occurred between 366 and 730 days of age. Although the maximum and graphic approach is useful for identifying a risk window for the SCCS design, it is somewhat subjective. For example, for a 42-day risk window after MMR vaccination, the IRR is 7.41; for a 35-day risk window, the IRR is 7.47. The linear relation between R(z) and 1/T(z) does not help to decide between them either. From a vaccine safety point of view, a 42-day risk window was selected, somewhat subjectively, because six weeks had been used in prior studies of immune-mediated adverse events after vaccination [8].
To allow comparison with the results from the maximum and graphic approach, we assume that the risk window starts at day 1 after MMR vaccination, s=1. Using the MLRT to identify the optimal risk window for the US MMR-ITP data involves two steps. The first step was to find the risk window with the maximum log likelihood ratio test statistic. To obtain L0, we fit a fixed effects model to the US MMR-ITP data under the null hypothesis that there is no interval after vaccination with an elevated risk of ITP. Under the alternative hypotheses, we fit a series of fixed effects models to obtain MLEs of the parameters for vaccination and age effects and then calculated the likelihood values for a large set of risk windows starting at day 1 after MMR vaccination (s=1) with different lengths z. We fit two sets of fixed effects models, which treat age effects parametrically and semi-parametrically. In the parametric models, we defined six age groups as in Whitaker et al. [25] and Xu et al. [8]: 366–426 days of age, 427–487 days of age, 488–548 days of age, 549–609 days of age, 610–670 days of age, and 671–730 days of age (the reference group). In this way, the results from the scan statistic approach can be compared with the results from the maximum and graphic approach. In the semi-parametric models [29], indicators for unique ages at events were included in the fixed effects models as described in the section on Statistical Theory. With the parametric model, risk windows with length z=1 to 364 days were searched, and a 42-day risk window starting at day 1 after vaccination was identified with the maximum τ(s = 1, z = 42) = 15.14. The corresponding incidence rate ratio is 7.41. Similarly, with the semi-parametric model, a 42-day risk window starting at day 1 after vaccination was identified with the maximum τ(s = 1, z = 42) = 14.00. The corresponding incidence rate ratio is 7.16.
The second step was to calculate the p-value for testing the null hypothesis that there is no interval with an elevated risk of ITP after MMR vaccination. We employed a Monte Carlo simulation approach to obtain the p-values, as in other scan statistic methods [12,13,15]. We first obtained the vaccination probabilities across the six age groups in the case sample, then randomly simulated vaccination times for each individual using these multinomial probabilities while keeping the original event times. This randomization was repeated 20,000 times, which produced 20,000 null datasets. The number of cases in each null dataset is the same as in the original US MMR-ITP data (79 ITP cases). For each dataset, we fit a fixed effects model as in model (2) with a 42-day risk window after the simulated vaccination date, age groups included as time-varying covariates, and indicators for subjects included as fixed effects. The scan statistics under the null hypothesis, τ′(β = 0), were then calculated. A p-value of 0.016 was obtained by comparing the value of τ(s = 1, z = 42) under the alternative hypothesis, 15.14, with the distribution of the 20,000 τ′(β = 0) values.
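The Monte Carlo step can be sketched in Python as below. The sketch makes several assumptions that are not stated in the text: the callback `null_tau` is a hypothetical user-supplied function that re-randomizes vaccination times, refits the model, and returns one null statistic; vaccination days are placed uniformly within the sampled age group; and the p-value uses the common (exceedances + 1)/(replicates + 1) convention.

```python
import random

def draw_vaccination_day(rng, age_groups, probs):
    """Sample an age group with the observed vaccination probabilities, then a
    day within it. Uniform placement within the group is an assumption."""
    first, last = rng.choices(age_groups, weights=probs, k=1)[0]
    return rng.randint(first, last)   # inclusive on both ends

def monte_carlo_p_value(observed_tau, null_tau, n_rep=999, seed=0):
    """Proportion of null statistics at least as large as the observed one.

    null_tau(rng) must return the statistic for one randomized dataset
    (original event times kept, vaccination times redrawn)."""
    rng = random.Random(seed)
    exceed = sum(null_tau(rng) >= observed_tau for _ in range(n_rep))
    # The +1 terms count the observed dataset itself (an assumed convention).
    return (exceed + 1) / (n_rep + 1)

# The six age groups (in days of age) from the parametric model; the
# probabilities below are hypothetical placeholders, not estimates from the data.
age_groups = [(366, 426), (427, 487), (488, 548),
              (549, 609), (610, 670), (671, 730)]
probs = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
example_day = draw_vaccination_day(random.Random(1), age_groups, probs)
```

With this convention the smallest attainable p-value is 1/(n_rep + 1), so the number of replicates bounds the precision of the reported p-value.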
The scan statistics approach can identify a risk window that does not start immediately after vaccination. Under the alternative hypotheses that there is an elevated risk of ITP after MMR vaccination, we calculated the likelihood values for a large set of risk windows starting at any day after vaccination with different lengths z. With the parametric approach to age effects, we found that τ(s = 15, z = 27) = 17.55 is the maximum log likelihood ratio test statistic. Thus the identified risk window begins 15 days after vaccination and has a length of 27 days, with a corresponding IRR of 8.86. The same risk window was identified with the semi-parametric approach to age effects.
We also used the scan statistic to identify the optimal risk window in a United Kingdom (U.K.) MMR-ITP data set [25]. In this ITP case sample, thirty-five children aged 366–730 days were admitted to the hospital for ITP at least once, and six were admitted more than once during the follow-up period. The total number of ITP cases was 44. For the parametric approach to age effects, the same age groups were defined as in the U.S. MMR-ITP data. Letting the risk window start at day 1 after MMR vaccination, we searched risk windows with length z=1 to 364 days and found that τ(s = 1, z = 77) = 4.58 was the maximum log likelihood ratio test statistic. This is consistent with the previous finding from the maximum and graphic approach that the risk window ends 77 days after MMR vaccination in the U.K. MMR-ITP data [8]. When we used the scan statistic to search for the start of the risk window, we found that τ(s = 11, z = 60) = 7.27 was the maximum log likelihood ratio test statistic. Thus the identified risk window began 11 days after vaccination and had a length of 60 days, with a corresponding IRR of 4.56. The same risk window was identified with the semi-parametric approach to age effects.
4. Simulation studies
We performed simulations to evaluate the performance of the scan statistic, τ(s, z), for identifying the optimal risk window in vaccine safety studies using the SCCS design. In simulating adverse events for β > 0, we assumed that the risk of adverse events increased immediately after vaccination. Thus, when identifying the optimal risk window under the alternative hypothesis, we assumed that the risk window started at day 1 after vaccination (s=1).
4.1 Simulation
To simulate case data with a true risk window of 14 or 42 days, each of the 100 individuals in the simulations was assumed to have a follow-up period of 365 days, consisting of both risk and control periods. First, the vaccination day for each individual was generated from a normal distribution with a mean of 140 days and a standard deviation of 42 days. Second, the follow-up period was partitioned into intervals by risk period and three age groups. The true risk window was defined as the 14 days or 42 days after vaccination, and the control window comprised all time outside the risk window, both before vaccination and after the risk window. Third, cases were simulated for each interval of an individual according to model (2), with ξi independent and identically distributed N(0, 1) and with parameters for the baseline incidence rate (β0), vaccination effect (β), and age effects (α1 and α2). Lastly, individuals with cases were extracted and analyzed using the newly proposed method. The values of the overall intercept coefficient, β0, were set to −5, −6, and −7 to achieve baseline incidence rates ranging from 0.00091 to 0.00674 per day in the simulated data. The chosen coefficients for age effects were 0.3 (α1) for days 91–270 and 0.1 (α2) for days 271–365, with days 1–90 as the reference group. The coefficient for the vaccination effect, β, was set to 0.69 and 1.39, which represent incidence rate ratios of 2 and 4, respectively. For each combination of parameters, 1,000 datasets were simulated and analyzed.
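The data-generating steps above can be sketched in Python using only the standard library. Details that the text does not specify are assumptions of this sketch: the vaccination day is rounded to the nearest day and clipped to the follow-up period, the vaccination day itself counts as control time, and Poisson counts are drawn with Knuth's multiplication method. All function and variable names are hypothetical.

```python
import math
import random

# (first day, last day, age coefficient); days 1-90 are the reference group.
AGE_GROUPS = [(1, 90, 0.0), (91, 270, 0.3), (271, 365, 0.1)]

def poisson(rng, mean):
    """Knuth's multiplication method; adequate for the modest means arising here."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_individual(rng, beta0, beta, window, follow_up=365):
    """Return (vaccination_day, intervals) with counts drawn from model (2)."""
    vax = min(max(round(rng.gauss(140, 42)), 1), follow_up)  # assumed clipping
    xi = rng.gauss(0, 1)                     # individual-specific effect
    risk_lo, risk_hi = vax + 1, min(vax + window, follow_up)
    intervals = []
    for first, last, alpha in AGE_GROUPS:
        # Split each age group into before-risk, risk, and after-risk pieces.
        pieces = [(first, min(last, risk_lo - 1), 0),
                  (max(first, risk_lo), min(last, risk_hi), 1),
                  (max(first, risk_hi + 1), last, 0)]
        for lo, hi, k in pieces:
            t = hi - lo + 1
            if t > 0:
                u = t * math.exp(beta0 + alpha + k * beta + xi)
                intervals.append({"t": t, "k": k, "alpha": alpha,
                                  "y": poisson(rng, u)})
    return vax, intervals

def simulate_cases(seed, beta0, beta, window, n=100):
    """Simulate n individuals and keep only those with at least one event."""
    rng = random.Random(seed)
    sample = [simulate_individual(rng, beta0, beta, window) for _ in range(n)]
    return [(vax, iv) for vax, iv in sample if sum(p["y"] for p in iv) > 0]

cases = simulate_cases(seed=0, beta0=-5, beta=1.39, window=42)
```

Each retained individual carries the interval-level counts, times at risk, and risk indicators needed to fit model (2).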
4.2 Identification of optimal risk window with maximized likelihood ratio test
For each simulated dataset, we fit a series of fixed effects models as in (2) for a large set of risk windows starting one day after vaccination (s=1) with different lengths z. Under the alternative hypothesis that there is a risk period after vaccination with an elevated IRR, we fit fixed effects models for each dataset with age groups and an indicator for the risk window included as time-varying covariates, and indicators for subjects included as fixed effects. Under the null hypothesis that vaccination does not elevate the risk of adverse events, fixed effects models were fit for each dataset with age groups included as time-varying covariates and indicators for subjects included as fixed effects. The scan statistic τ(s = 1, z) was calculated for each selected risk window length, z. We chose z0 to be the optimal risk window for a dataset such that τ(s = 1, z0) ≥ τ(s = 1, z). The mean and standard deviation of z0 based on 1000 replicates are reported. Percent bias was calculated as 100 × (true value − mean estimate)/true value. We also report the mean, standard deviation, and percent bias of the corresponding estimate of β, the coefficient for the vaccination effect.
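As a worked instance of the percent bias formula, take the setting in Table I with a true risk window of 14 days, β = 0.69, and β0 = −7, for which the mean window length estimated by the scan statistic is 15.0 days:

```latex
\text{percent bias} = 100 \times \frac{14 - 15.0}{14} \approx -7.1
```

Under this definition, a negative value indicates that the quantity is overestimated on average.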
4.3 Simulation results
The average number of cases in our simulations ranges from 69 to 677 (Table I). The total number of cases is primarily influenced by the value of β0. Table I also shows the estimated risk window lengths and incidence rate ratios (and their standard deviations) for each simulation setting. For a 14-day risk window, the scan statistics approach produced a median percent bias of −7.1 to 7.1 per cent compared with the true risk window lengths, which represents a one day difference at most. Using the identified optimal risk window, the median percent bias of β ranged from −1.5 to 91.1 per cent compared with the true β. The median bias of 91.1% is largely due to total data separation issues in which there is no exposed case because of a low baseline incidence rate (β0=−7) and a low IRR (β =0.69) in simulations. Only 751 valid β estimates out of 1000 are averaged and reported. Thus the results from the valid estimates tend to be overestimated. The bias decreased when the baseline incidence rate and the IRR increased.
Table I.
True risk window (days) | True β | β0 | Number of cases (std) | Estimated risk window (std), method1* | Estimated risk window (std), method2+ | Estimated β (std), method1 | Estimated β (std), method2 | Estimated α1 (std), method1 | Estimated α1 (std), method2 | Estimated α2 (std), method1 | Estimated α2 (std), method2
---|---|---|---|---|---|---|---|---|---|---|---
14 | 0.69 | −5 | 510 (65.4) | 13.9 (5.4) | 12.8 (4.3) | 0.73 (0.19) | 0.82 (0.20) | 0.30 (0.12) | 0.32 (0.12) | 0.10 (0.14) | 0.10 (0.14)
14 | 0.69 | −6 | 188 (25.5) | 14.7 (8.1) | 13.2 (7.3) | 0.82 (0.36) | 0.88 (0.36) | 0.28 (0.20) | 0.30 (0.20) | 0.08 (0.24) | 0.08 (0.24)
14 | 0.69 | −7 | 69 (7.8) | 15.0 (10.4) | 13.5 (8.4) | 1.02 (0.64) | 1.06 (0.46) | 0.24 (0.36) | 0.24 (0.35) | 0.08 (0.38) | 0.08 (0.38)
14 | 1.39 | −5 | 551 (70.3) | 13.4 (1.0) | 12.5 (2.3) | 1.37 (0.13) | 1.39 (0.13) | 0.33 (0.12) | 0.31 (0.12) | 0.10 (0.14) | 0.10 (0.14)
14 | 1.39 | −6 | 203 (27.6) | 13.5 (3.1) | 12.7 (3.8) | 1.41 (0.22) | 1.48 (0.21) | 0.28 (0.19) | 0.33 (0.20) | 0.06 (0.22) | 0.06 (0.22)
14 | 1.39 | −7 | 74 (11.9) | 14.0 (6.0) | 13.1 (5.4) | 1.49 (0.38) | 1.51 (0.35) | 0.30 (0.29) | 0.31 (0.29) | 0.06 (0.35) | 0.06 (0.35)
42 | 0.69 | −5 | 550 (70.6) | 41.1 (5.6) | 38.4 (6.8) | 0.70 (0.11) | 0.72 (0.11) | 0.29 (0.12) | 0.31 (0.12) | 0.10 (0.14) | 0.11 (0.14)
42 | 0.69 | −6 | 203 (27.9) | 40.6 (9.8) | 39.8 (10.6) | 0.75 (0.18) | 0.77 (0.18) | 0.27 (0.20) | 0.30 (0.20) | 0.06 (0.22) | 0.06 (0.22)
42 | 0.69 | −7 | 74 (11.7) | 39.1 (13.8) | 38.5 (13.8) | 0.82 (0.42) | 0.85 (0.38) | 0.29 (0.35) | 0.31 (0.36) | 0.08 (0.42) | 0.09 (0.42)
42 | 1.39 | −5 | 677 (85.3) | 41.2 (1.0) | 40.7 (1.9) | 1.37 (0.09) | 1.37 (0.09) | 0.31 (0.12) | 0.31 (0.12) | 0.10 (0.13) | 0.10 (0.13)
42 | 1.39 | −6 | 250 (33.4) | 41.1 (2.8) | 39.3 (4.7) | 1.40 (0.15) | 1.40 (0.14) | 0.29 (0.20) | 0.30 (0.20) | 0.07 (0.22) | 0.07 (0.22)
42 | 1.39 | −7 | 92 (13.8) | 40.7 (5.8) | 39.7 (6.1) | 1.46 (0.25) | 1.45 (0.22) | 0.30 (0.34) | 0.31 (0.34) | 0.07 (0.39) | 0.07 (0.39)
* method1, the scan statistic approach
+ method2, the maximum and graphic approach
For a 42-day risk window length, the scan statistics approach produced a median percent bias of −6.9 to −2.4 percent, and −1.9 to 24.5 percent for the risk window lengths and the incidence rate ratios, respectively. For risk windows of 14 and 42 days, the standard deviations of the estimated risk window lengths and incidence rate ratios increased as baseline incidence rates decreased. The standard deviations of the risk window length decreased as the true incidence rate ratios increased.
The simulated datasets were also analyzed using the maximum and graphic approach, and the results are presented in Table I. On average, the maximum and graphic approach and the scan statistic approach yielded similar risk windows, corresponding vaccination effects, and age effects. Further examination showed that the risk windows from the two approaches differed by at most one day in 67%–78% of the simulated datasets.
To obtain type I error rates, we simulated 200 SCCS datasets under the null hypothesis (β=0). For each SCCS dataset, we identified a risk window with scan statistic τ using the scan statistic approach. As in the examples, we employed a Monte Carlo simulation approach to test whether the identified risk window was statistically significant. We kept the original event times and randomized the vaccination times. This randomization process was repeated 1000 times, yielding 1000 scan statistics, τ′. If τ exceeded the 95th percentile of the scan statistics τ′, the identified risk window was statistically significant at a significance level of 0.05. The type I error rate was calculated as the proportion of the 200 SCCS datasets with significant risk windows. The type I error rates were 5.1%, 6.5%, and 5.5% for β0=−5, −6, and −7, respectively.
5. Discussion
Misspecification of risk windows can lead to biased estimates of vaccination effects on adverse events [8]. Although health care professionals provide care for patients who experience adverse events, they do not typically investigate the causes of those adverse events and may not even recognize an event as potentially vaccine-related. Because few data are available from pre-licensure RCTs, and because the biologic mechanism is often unknown, it can be difficult to identify biologically plausible risk windows before conducting vaccine safety studies. Identifying the optimal risk window is therefore a very important step in determining whether an association between a particular vaccine and adverse event exists and, if so, its magnitude.
We proposed a novel scan statistic for identifying the optimal risk window in vaccine safety research. Because vaccine safety studies often examine the possible association of rare events with vaccines, data from several study sites have to be pooled to gain statistical power. It is therefore rare to have an independent dataset. When the case sample is small, we recommend reporting the identified optimal risk window along with the corresponding IRR and a p-value from Monte Carlo simulation against the null hypothesis that a risk window after vaccination does not exist. If an independent dataset exists, or the case sample is large enough to be randomly split into two independent datasets, one can use one dataset to find the optimal risk window and then use that risk window to conduct a conventional analysis in the other dataset. The corresponding p-value in the conventional analysis will be based on a Wald statistic or t-statistic for testing against the null hypothesis (IRR=1). However, from a policy maker's point of view, it may be preferable to use the entire case sample to find the risk window and to report the corresponding IRR with p-values from both the Monte Carlo simulation and conventional regression models.
When sample sizes are small, it is possible that there are no cases in either the risk window or the control window. In this case, fixed effects models will not fit the data and IRR estimates will not be available. In practice, one must collect more cases in order to use either the maximum and graphic approach or the scan statistic approach to identify an optimal risk window. It is also possible that fixed effects models will not fit the data if there are too many time-varying covariates such as age, an important confounder in vaccine safety studies. However, it has been shown that semi-parametric models handle this problem well. In these models, the form of the age effects is not specified, leaving many nuisance coefficients to be estimated for age effects [25,29]. In instances where there are too many nuisance coefficients to estimate, Whitaker et al. [25] suggested the use of absorbing factors to fit the models efficiently without estimating individual-specific effects explicitly. Although our simulation studies cannot suggest a minimal sample size or the maximum number of covariates that the proposed approach can handle, the approach worked well in the two MMR-ITP examples using the semi-parametric model for age effects. In the US MMR-ITP example with only 79 cases, the fixed effects model worked well with 56 unique age effects and the vaccination effect. In the U.K. MMR-ITP example with an even smaller number of ITP cases (44), the fixed effects model performed well with 43 unique age effects and the vaccination effect.
Although scan statistics have been used to detect temporal or spatial clusters of diseases, our study used scan statistics to identify optimal risk windows in the SCCS design for vaccine safety studies. Our simulation results show that the newly proposed scan statistic can identify optimal risk windows for use in vaccine safety studies. Specifically, 1) on average, the estimates and standard deviations of the identified risk windows and corresponding incidence rate ratios were comparable to those from the maximum and graphic approach [8]; and 2) in 67%–78% of the simulated datasets, the risk windows from the two approaches differed by at most one day.
Compared with the maximum and graphic approach, the scan statistics approach is less subjective. For example, in identifying the optimal risk window for the US MMR-ITP data described above, it is not obvious from the maximum and graphic approach whether 35 or 42 days after MMR vaccination is the optimal risk window length. With the scan statistic approach and a parametric model for age effects, the likelihood ratio, exp(τ), is 3,772,422 for s = 1, z = 42 (τ = 15.14), which is clearly greater than the neighboring values of 1,822,366 for z = 35 (τ ≈ 14.42) and 1,112,060 for z = 49 (τ ≈ 13.92). Thus it is easier to identify optimal risk windows with the scan statistic approach. Both approaches take into account the effects of time-varying covariates such as age. The maximum and graphic approach does not provide a statistical significance level (p-value) against the null hypothesis that there is no risk window after vaccination with an elevated incidence rate of adverse events. In contrast, the scan statistic approach provides p-values against the null hypothesis using Monte Carlo simulation, as described in the US MMR-ITP example above.
Neither the maximum and graphic approach nor the scan statistic approach addresses the possibility that there are multiple risk levels within the overall risk window that each approach identifies. However, other scan statistics have been developed for identifying multiple clusters of diseases, either temporal [30] or spatial [31]. We believe that the newly proposed scan statistic for identifying risk windows in vaccine safety studies can be modified to identify multiple sub risk windows within the overall risk window. This could be a two-step process in which the overall risk window is identified and then further partitioned into sub risk windows, or a one-step approach in which multiple risk windows are identified simultaneously.
The true start of the risk window is not necessarily immediately after vaccination. Using the maximum and graphic approach, one can first identify the end of the risk window and then search for its start, as discussed by Xu et al. [8]. The scan statistic approach can identify a risk window that does not begin immediately after vaccination, as shown in the MMR-ITP examples.
Both the maximum and graphic approach and the scan statistic approach have their pros and cons. The maximum and graphic approach shows the relation between the specified risk window and its corresponding IRR visually, but it can sometimes be subjective. The scan statistic yields one definite risk window but does not show the relation between the specified risk window and the IRR visually. However, the two approaches can complement each other in vaccine safety studies using SCCS designs, as shown in the US MMR-ITP example. Although the simulation study showed that the optimal risk windows from the two approaches are the same most of the time, they can differ, especially when adverse events are rare. For the US MMR-ITP data, the maximum and graphic approach suggested a risk window of either 1–42 days or 1–35 days after MMR vaccination. The scan statistic approach clearly indicated that the risk window was 1–42 days after MMR vaccination if we let the risk window start on day 1 after MMR vaccination. We therefore recommend using both approaches to identify optimal risk windows in SCCS designs for vaccine safety studies.
Acknowledgements
This study was supported by the Centers for Disease Control and Prevention via contract 200-2002-00732 (the Vaccine Safety Datalink Project) with America's Health Insurance Plans. Xu was also supported by NIH/NCRR Colorado CTSI Grant Number UL1 RR025780. We are grateful to Dr. Martin Kulldorff of Harvard Pilgrim Health Care for his comments and guidance.
References
- 1. Glanz G, McClure D, Xu S, et al. Comparison of four study designs for evaluating vaccine safety: Retrospective cohort, case-control, risk-interval and self-controlled case series. Journal of Clinical Epidemiology. 2006;59:808–818. doi: 10.1016/j.jclinepi.2005.11.012.
- 2. Yih WK, Kulldorff M, Fireman BH, Shui IM, Lewis EM, Klein NP, Baggs J, Weintraub ES, Belongia EA, Naleway A, Gee J, Platt R, Lieu TA. Active surveillance for adverse events: the experience of the Vaccine Safety Datalink project. Pediatrics. 2011;127(Suppl 1):S54–S64. doi: 10.1542/peds.2010-1722I.
- 3. Baggs J, Gee J, Lewis E, Fowler G, Benson P, Lieu T, Naleway A, Klein NP, Baxter R, Belongia E, Glanz J, Hambidge SJ, Jacobsen SJ, Jackson L, Nordin J, Weintraub E. The Vaccine Safety Datalink: a model for monitoring immunization safety. Pediatrics. 2011;127(Suppl 1):S45–S53. doi: 10.1542/peds.2010-1722H.
- 4. France EK, Glanz JM, Xu S, Davis RL, Black SB, Shinefield HR, Zangwill KM, Marcy SM, Mullooly JP, Jackson LA, Chen R. Safety of the trivalent inactivated influenza vaccine among children: a population-based study. Archives of Pediatrics and Adolescent Medicine. 2004;158:1031–1036. doi: 10.1001/archpedi.158.11.1031.
- 5. France EK, Glanz J, Xu S, Hambidge S, Yamasaki K, Black SB, Marcy M, Mullooly JP, Jackson LA, Nordin J, Belongia EA, Hohman K, Chen RT, Davis R; Vaccine Safety Datalink Team. Risk of immune thrombocytopenic purpura after measles-mumps-rubella immunization in children. Pediatrics. 2008;121(3):e687–e692. doi: 10.1542/peds.2007-1578.
- 6. Hambidge SJ, Glanz JM, France EK, McClure D, Xu S, Yamasaki K, Jackson L, Mullooly JP, Zangwill KM, Marcy SM, Black SB, Lewis EM, Shinefield HR, Belongia E, Nordin J, Chen RT, Shay DK, Davis RL, DeStefano F; Vaccine Safety Datalink Team. Safety of trivalent inactivated influenza vaccine in children 6 to 23 months old. Journal of the American Medical Association. 2006;296(16):1990–1997. doi: 10.1001/jama.296.16.1990.
- 7. Glanz JM, Newcomer SR, Hambidge SJ, Daley MF, Narwaney KJ, Xu S, et al. The Safety of Trivalent Inactivated Influenza Vaccine in Children Ages 24 to 59 Months. Arch Pediatr Adolesc Med. 2011;165(8):749–755. doi: 10.1001/archpediatrics.2011.112.
- 8. Xu S, Zhang L, Zeng C, Nelson J, Mullooly J, McClure D, Glanz J. Identifying optimal risk windows for self-controlled case series studies of vaccine safety. Statistics in Medicine. 2011;30:742–752. doi: 10.1002/sim.4125. PMID: 21120820.
- 9. Glaz J, Balakrishnan N. Scan Statistics and Applications. Boston: Birkhäuser; 1999.
- 10. Glaz J, Pozdnyakov V, Wallenstein S. Scan Statistics: Methods and Applications. Springer; 2009.
- 11. Naus J. The distribution of the size of the maximum cluster of points on the line. Journal of the American Statistical Association. 1965;60:532–538.
- 12. Kulldorff M, Nagarwalla N. Spatial disease clusters: detection and inference. Statistics in Medicine. 1995;14:799–810. doi: 10.1002/sim.4780140809.
- 13. Nagarwalla N. A scan statistics with a variable window. Statistics in Medicine. 1996;15:845–850. doi: 10.1002/(sici)1097-0258(19960415)15:7/9<845::aid-sim254>3.0.co;2-x.
- 14. Molinari N, Bonaldi C, Daurès JP. Multiple temporal cluster detection. Biometrics. 2001;57:577–583. doi: 10.1111/j.0006-341x.2001.00577.x.
- 15. Kulldorff M. A spatial scan statistic. Communications in Statistics, Theory and Methods. 1997;26:1481–1496.
- 16. Jung I, Kulldorff M, Klassen AC. A spatial scan statistic for ordinal data. Statistics in Medicine. 2007;26:1594–1607. doi: 10.1002/sim.2607.
- 17. Huang L, Kulldorff M, Gregorio D. A spatial scan statistic for survival data. Biometrics. 2007;63:109–118. doi: 10.1111/j.1541-0420.2006.00661.x.
- 18. Kulldorff M. Tests for spatial randomness adjusting for an inhomogeneity: A general framework. Journal of the American Statistical Association. 2006;101:1289–1305.
- 19. Kleinman K, Abrams A, Kulldorff M, Platt R. A model-adjusted space-time scan statistic with an application to syndromic surveillance. Epidemiology and Infection. 2005;133:409–419. doi: 10.1017/s0950268804003528.
- 20. Kulldorff M, Mostashari F, Duczmal L, Yih K, Kleinman K, Platt R. Multivariate spatial scan statistics for disease surveillance. Statistics in Medicine. 2007;26:1824–1833. doi: 10.1002/sim.2818.
- 21. Klein NP, Fireman B, Yih WK, Lewis E, Kulldorff M, Ray P, Baxter R, Hambidge S, Nordin J, Naleway A, Belongia EA, Lieu T, Baggs J, Weintraub E. Measles-mumps-rubella-varicella combination vaccine and the risk of febrile seizures. Pediatrics. 2010;126(1):e1–e8. doi: 10.1542/peds.2010-0665.
- 22. Yih WK, Kulldorff M, Fireman BH, Shui IM, Lewis EM, Klein NP, Baggs J, Weintraub ES, Belongia EA, Naleway A, Gee J, Platt R, Lieu TA. Active surveillance for adverse events: the experience of the Vaccine Safety Datalink project. Pediatrics. 2011;127(Suppl 1):S54–S64. doi: 10.1542/peds.2010-1722I.
- 23. Lieu TA, Kulldorff M, Davis RL, Lewis EM, Weintraub E, Yih K, Yin R, Brown JS, Platt R. Real-time vaccine safety surveillance for the early detection of adverse events. Med Care. 2007;45:S89–S95. doi: 10.1097/MLR.0b013e3180616c0a.
- 24. Farrington CP. Relative incidence estimation from case series for vaccine safety evaluation. Biometrics. 1995;51:228–235.
- 25. Whitaker HJ, Farrington CP, Spiessens B, Musonda P. Tutorial in Biostatistics: The self-controlled case series method. Statistics in Medicine. 2006;25(10):1768–1797. doi: 10.1002/sim.2302.
- 26. Whitaker HJ, Hocine MN, Farrington CP. The methodology of self-controlled case series studies. Statistical Methods in Medical Research. 2009;18:7–26. doi: 10.1177/0962280208092342.
- 27. Xu S, Zeng C, Newcomer S, Nelson J, Glanz J. Use of fixed effects models to analyze self-controlled case series data in vaccine safety studies. J Biomet Biostat. 2012;S7:006. doi: 10.4172/2155-6180.s7-006.
- 28. Chen RT, Glasser JW, Rhodes PH, et al. Vaccine Safety Datalink project: a new tool for improving vaccine safety monitoring in the United States. Pediatrics. 1997;99(6):765–773. doi: 10.1542/peds.99.6.765.
- 29. Farrington CP, Whitaker HJ. Semiparametric analysis of case series data (with discussion). Applied Statistics. 2006;155:553–594.
- 30. Cucala L. A hypothesis-free multiple scan statistic with variable window. Biometrical Journal. 2008;50:299–310. doi: 10.1002/bimj.200710412.
- 31. Zhang Z, Kulldorff M, Assunção R. Spatial scan statistics adjusted for multiple clusters. Journal of Probability and Statistics. 2010;2010:1–11.