Proceedings of the National Academy of Sciences of the United States of America. 2017 Feb 1;114(7):1524–1529. doi: 10.1073/pnas.1612833114

Estimating the population-level impact of vaccines using synthetic controls

Christian A W Bruhn a, Stephen Hetterich b, Cynthia Schuck-Paim b, Esra Kürüm a,c, Robert J Taylor b, Roger Lustig b, Eugene D Shapiro a,d, Joshua L Warren a,e, Lone Simonsen b,f,g, Daniel M Weinberger a,1
PMCID: PMC5321019  PMID: 28154145

Significance

Pneumococcus, a bacterial pathogen, is among the most important causes of pneumonia globally. Quantifying the impact of pneumococcal conjugate vaccines (PCVs) on pneumonia is challenging due to time trends unrelated to the vaccine. We use a method developed for website analytics and economics called “synthetic controls” to disentangle changes in pneumonia rates caused by the vaccine from changes caused by unrelated factors. We found that PCVs significantly reduce all-cause pneumonia hospitalizations in young children, and reduce hospitalizations for invasive pneumococcal disease and pneumococcal pneumonia in children and adults. In contrast to previous studies, we did not detect a decline in all-cause pneumonia hospitalizations in older adults in any of the five countries following the introduction of the vaccine in children.

Keywords: pneumococcal conjugate vaccines, synthetic controls, Streptococcus pneumoniae, observational study, program evaluation

Abstract

When a new vaccine is introduced, it is critical to monitor trends in disease rates to ensure that the vaccine is effective and to quantify its impact. However, estimates from observational studies can be confounded by unrelated changes in healthcare utilization, changes in the underlying health of the population, or changes in reporting. Other diseases are often used to detect and adjust for these changes, but choosing an appropriate control disease a priori is a major challenge. The “synthetic controls” (causal impact) method, which was originally developed for website analytics and social sciences, provides an appealing solution. With this approach, potential comparison time series are combined into a composite and are used to generate a counterfactual estimate, which can be compared with the time series of interest after the intervention. We sought to estimate changes in hospitalizations for all-cause pneumonia associated with the introduction of pneumococcal conjugate vaccines (PCVs) in five countries in the Americas. Using synthetic controls, we found a substantial decline in hospitalizations for all-cause pneumonia in infants in all five countries (average of 20%), whereas estimates for young and middle-aged adults varied by country and were potentially influenced by the 2009 influenza pandemic. In contrast to previous reports, we did not detect a decline in all-cause pneumonia in older adults in any country. Synthetic controls promise to increase the accuracy of studies of vaccine impact and to increase comparability of results between populations compared with alternative approaches.


When a new public health intervention such as a vaccine is introduced, it is critical to monitor disease rates to ensure that the intervention is effective and to quantify its benefit. This information is important for policymakers working to prioritize use of health resources, but obtaining accurate estimates of vaccine impact is a major challenge.

Vaccine impact is typically evaluated by comparing rates or trends of the targeted disease in the years after introduction with rates or trends in the years before introduction using a time-series analysis (1–8). This type of analysis is used to estimate the total benefit of a new vaccine, in both vaccinated and unvaccinated individuals. However, these observational studies cannot easily disentangle changes in disease rates caused by the vaccine from changes caused by unrelated factors. Coincident changes in reporting patterns, access to care, clinical practice, and diagnostic coding practices, as well as social investments that influence the underlying health of the population, can either mask or exaggerate the true effect of a vaccine.

Evaluations of the impact of pneumococcal conjugate vaccines (PCVs) provide a case in point. PCVs target a subset (7, 10, or 13) of the more than 90 known serotypes of pneumococcus. Use of PCVs in infants nearly eliminated the targeted serotypes from vaccinated children and, through indirect effects, unvaccinated adults, with rates of invasive pneumococcal disease (IPD) in children declining by an average of 56% (range: 24–83%) within 3 y in high-income countries (9). However, the incidence of IPD is relatively small compared with the incidence of pneumonia caused by pneumococcus (10), particularly in older adults (11). Therefore, to inform public health policy, it is critical to quantify the extent to which the vaccine reduces pneumonia hospitalizations in all age groups.

Detecting the population-level impact of vaccines on IPD is relatively straightforward because a large fraction of the cases is preventable by the vaccine, so the decline is dramatic. However, quantifying the impact of the vaccine on rates of pneumonia is more challenging because many pathogens can cause pneumonia, including other bacterial species, viruses, and fungi. Therefore, the relative decline in rates of pneumonia is expected to be far smaller than the decline in rates of IPD, as was observed in randomized controlled trials. For example, a trial in Latin America demonstrated that PCV10 had 67% efficacy against IPD, 22% efficacy against likely bacterial pneumonia, and 10% against radiologically confirmed community-acquired pneumonia (12). When evaluating trends in disease rates over time, these modest vaccine-related changes in rates of pneumonia are difficult to distinguish from background noise and secular trends. Thus, it is not surprising that published estimates of the change in rates of pneumonia after introduction of PCVs vary considerably (13, 14). Better approaches are needed to separate vaccine effects from unrelated changes.

To strengthen the argument that an observed decline in disease rates is, in fact, caused by a vaccine, many studies compare changes in the disease of interest to changes in other diseases during the same time period (7, 8). The goal of these comparative analyses is to detect unmeasured confounding and bias in the time series of interest. The general reasoning is that if the disease of interest declines relative to the comparison disease after vaccine use begins, then the decline can be attributed to the vaccine. However, there is no standard methodology for selecting these comparison diseases or using them to adjust estimates of vaccine-associated changes. Ideally, diseases used for comparison would share the same set of potential biases and causal factors as the outcome of interest but would not be influenced by the vaccine (15, 16). Identifying a comparison disease that reasonably fulfills these criteria a priori is a major challenge and requires many qualitative decisions by the analyst.

A method developed for website analytics and economics research, “synthetic controls,” provides a potential solution to this problem. With this method, a number of time series that are unaffected by the intervention are optimally weighted according to their fit to the outcome of interest in the period before the intervention, then combined into a composite time series. The synthetic composite gives more weight to the predictor variables that jointly explain the outcome variable best (17, 18). The values for this composite synthetic control are then estimated in the postintervention period, which produces a counterfactual estimate of what would have happened in the postintervention period had the intervention not occurred, effectively adjusting for unmeasured bias and confounding. This general approach has been applied to various problems, including evaluation of the effects of advertising on website traffic (using websites unaffected by the ad to construct the synthetic control) (18) and evaluation of tobacco control programs on smoking rates (using states that did not have a program to construct the synthetic control) (17). The increasing availability of large electronic healthcare databases now allows this approach to be applied to evaluations of public health interventions.
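The weighting-and-projection idea described above can be sketched in a few lines. This is only a simplified illustration: it uses ordinary least squares on log counts as a stand-in for the Bayesian variable-selection model of the causal impact approach, and the function name, the simulated series, and the 20% simulated effect are all assumptions made for the example.

```python
import numpy as np

def synthetic_control_counterfactual(outcome, controls, n_pre):
    """Fit control weights on the pre-intervention period and project forward.

    outcome  : 1-D array of monthly counts for the disease of interest
    controls : 2-D array (months x control series) assumed unaffected by the vaccine
    n_pre    : number of pre-intervention months used for fitting
    """
    y = np.log(np.asarray(outcome, dtype=float))
    X = np.log(np.asarray(controls, dtype=float))
    X = np.column_stack([np.ones(len(y)), X])  # intercept + log controls
    # Least squares on the pre-period only (NOT the Bayesian model of the paper).
    weights, *_ = np.linalg.lstsq(X[:n_pre], y[:n_pre], rcond=None)
    return np.exp(X @ weights), weights

# Simulated example: a shared secular trend plus a 20% drop after month 72.
rng = np.random.default_rng(0)
months, n_pre = 96, 72
trend = np.linspace(0.0, 0.4, months)
control_a = 500 * np.exp(trend + rng.normal(0, 0.02, months))
control_b = 300 * np.exp(0.5 * trend + rng.normal(0, 0.02, months))
outcome = 200 * np.exp(trend + rng.normal(0, 0.02, months))
outcome[n_pre:] *= 0.8

counterfactual, weights = synthetic_control_counterfactual(
    outcome, np.column_stack([control_a, control_b]), n_pre)
rate_ratio = outcome[n_pre:].sum() / counterfactual[n_pre:].sum()
print(round(rate_ratio, 2))
```

Because the controls share the secular trend but not the simulated intervention effect, the projected composite tracks what the outcome would have done without the drop, and the observed/counterfactual ratio recovers a value close to the simulated effect.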

In this study, we demonstrate how the synthetic control method [causal impact approach of Brodersen et al. (18)] can be adapted to detect and adjust for unmeasured bias and confounding in the evaluation of vaccine programs. Specifically, we applied the synthetic controls method to nationwide administrative databases in Brazil, Chile, Ecuador, Mexico, and the United States to evaluate changes in the burden of hospitalizations for all-cause pneumonia associated with the introduction of PCVs. We show that this approach can effectively adjust for unexplained trends in the data from these five countries and offers important improvements over other commonly used approaches. This approach has the potential to increase the accuracy and improve the comparability of studies of vaccine impact between locations with different local biases and trends.

Results

Unexplained Factors Bias Estimates of Vaccine Impact.

We used nationally aggregated, age-specific hospitalization time-series data from Brazil, Chile, Ecuador, Mexico, and the United States (Fig. 1 and Tables S1 and S2). To evaluate changes in pneumonia hospitalizations that occurred after vaccine introduction, we first performed a simple comparison of the number of cases of all-cause pneumonia before and after vaccine introduction, using nonrespiratory hospitalizations as an offset. Among <12-mo-old children, the number of hospitalizations for pneumonia was substantially lower than the counterfactual in the postvaccine period in Brazil [−22%; 95% credible interval (CI): −27%, −17%], Mexico (−17%; 95% CI: −28%, −4%), and the United States (−19%; 95% CI: −23%, −14%). In contrast, the number of hospitalizations for pneumonia did not significantly change compared with the counterfactual in Chile (−15%; 95% CI: −29%, +2%), and increased in Ecuador (+17%; 95% CI: +1%, +35%) (Figs. 1 and 2A and Table S3). In adults 80+ years of age, the results also varied between countries when using this model. Notably, in Brazil and Ecuador, hospitalizations for pneumonia increased by 25% (95% CI: 18%, 32%) and 19% (95% CI: 6%, 33%) compared with the counterfactual (Fig. 2A and Table S3).
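The percentages quoted in this paragraph are the rate ratios of Table S3 expressed as a percent change, that is, (rate ratio − 1) × 100. A minimal sketch of the conversion, using the Brazil <12 mo estimate from the model without synthetic controls (the function name is an illustrative assumption):

```python
def percent_change(rate_ratio):
    """Convert an observed/counterfactual rate ratio to a rounded percent change."""
    return round((rate_ratio - 1.0) * 100)

# Brazil, <12 mo, model without synthetic controls: 0.78 (0.73, 0.83) in Table S3
brazil_under_12mo = (0.78, 0.73, 0.83)
changes = [percent_change(rr) for rr in brazil_under_12mo]
print(changes)  # [-22, -27, -17]
```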

Fig. 1.

Time series of pneumonia hospitalizations (cases) from five countries (black line) among (A) children <12 mo and (B) adults 80+ y, and counterfactual estimates during the postvaccine period from a model that only adjusts for the number of nonrespiratory hospitalizations and seasonality (blue dotted line) or the synthetic control model (red dashed line). The vertical dashed line indicates the time of PCV introduction (the date of introduction varied by region in Mexico). The black horizontal line indicates the period during which the observed/counterfactual rate ratio was calculated.

Table S1.

Summary of datasets and sources

Country | Type of data* | Population covered, % | Data period | Year PCV introduced (valency) | No. of pneumonia hospitalizations in year before PCV introduction (all ages) | Source of data | Link
--- | --- | --- | --- | --- | --- | --- | ---
Brazil | Public system | 82 (2012) | 2004–2014 | 2010 (PCV10) | 798,078 (2009) | Hospital Information System, National Unified Health System, Ministry of Health | www2.datasus.gov.br/DATASUS/index.php?area=02
Chile | All | 100 | 2001–2013 | 2011 (PCV10) | 74,279 (2010) | Departamento de Estadísticas e Información de Salud, Ministerio de Salud | deis.minsal.cl/BDPublica/BD_Egresos.aspx
Mexico | Public system | Variable | 2001–2013 | 2006† (PCV7/10/13) | 86,564 (2009) | Sistema Nacional de Información en Salud, Ministerio de Salud | www.dgis.salud.gob.mx/contenidos/basesdedatos/std_egresoshospitalarios.html
Ecuador | Public system | Variable | 2001–2012 | 2010 (PCV10) | 21,153 (2005) | Instituto Nacional de Estadísticas y Censos | www.ecuadorencifras.gob.ec/banco-de-informacion/
United States | All from 10 states‡ | 100 of 10 states | 1994–2005 | 2000 (PCV7) | 274,512 (1999) | Agency for Healthcare Research and Quality, Healthcare Utilization Project | https://www.hcup-us.ahrq.gov/sidoverview.jsp

*Hospitalization data.

†Introduced in different years by different states and risk groups beginning in 2006.

‡State Inpatient Databases from Arizona, Colorado, Iowa, Massachusetts, New Jersey, New York, Oregon, Utah, Washington, and Wisconsin.

Table S2.

Time series used as components of the synthetic control

Grouping scheme ICD-10 Description Exclusions
ICD-10 chapters
C00-D48 Neoplasms A40.3, B95
D50-89 Diseases of blood and blood-forming organs and certain disorders involving the immune mechanism
E00-99 Endocrine, nutritional, metabolic disorders
G00-99_SY Diseases of the nervous system G00-G04
H00-99_SY Diseases of the ear and mastoid process H10, H65, H66
I00-99 Diseases of the circulatory system
K00-99 Diseases of the digestive system
L00-99 Diseases of the skin
M00-99 Diseases of the musculoskeletal system
N00-99 Diseases of the genitourinary system
P00-99 Perinatal diseases
Q00-99 Congenital malformations, deformations and chromosomal abnormalities
R00-99 Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
S00-T99 Injury, poisoning and consequences of external causes
U00-99 Codes for special purposes
V00-Y99 External causes
Z00-99 Factors influencing health status and contact w/ health workers
Other grouped outcomes
A10_B99_nopneumo Certain infectious and parasitic diseases, except intestinal A40.3, B95
B20-24 HIV
E10-E14 Diabetes
E40-E46 Malnutrition
I60-I64 Stroke
J20-J22 Bronchitis, bronchiolitis and unspecified acute lower respiratory infection
P05-P07 Premature delivery and low birth weight
ACH_NOJ All nonrespiratory hospitalizations J00–J99, F and O chapters
Specific outcomes
A17 Tuberculosis of nervous system
A18 Tuberculosis of other organs
A19 Miliary tuberculosis
A39 Meningococcal infection
A41 Other septicemia
B34 Viral infection of unspecified site
B96 Other specified bacterial agents as the cause of diseases classified to other chapters
B97 Viral agents as the cause of diseases classified to other chapters
B99 Other and unspecified infectious diseases
K35 Appendicitis
K80 Cholelithiasis
N39 Urinary tract infection (UTI)

Fig. 2.

Changes in hospitalizations for all-cause pneumonia (rate ratio) in five countries using (A) a model without synthetic controls and adjusted for nonrespiratory hospitalizations and seasonality only or (B) a model with synthetic controls. Brazil (blue circle), Chile (purple inverted triangle), United States (yellow diamond), Mexico (green triangle), and Ecuador (red square). Rate ratios were calculated as the sum of observed hospitalizations for all-cause pneumonia divided by the sum of predicted (counterfactual) hospitalizations during the evaluation period.

Table S3.

Changes in pneumonia hospitalizations (rate ratios) associated with PCV introduction in five countries

Columns: country, vaccine, year of introduction, evaluation period (inclusive), then rate ratio (95% credible interval) by age group: <12 mo*, 12–23 mo, 24–59 mo, 5–17 y, 18–39 y, 40–64 y, 65–79 y, 80+ y
With synthetic control
Brazil PCV10 2010 2012–2013 0.75 (0.68, 0.82) 0.76 (0.70, 0.83) 0.77 (0.69, 0.86) 0.80 (0.70, 0.92) 0.80 (0.69, 0.92) 0.94 (0.80, 1.09) 0.96 (0.83, 1.10) 0.98 (0.86, 1.11)
Chile PCV10 2011 2012 0.55 (0.42, 0.87) NA 0.80 (0.70, 0.90) 0.70 (0.63, 0.78) 0.79 (0.70, 0.88) 0.83 (0.70, 0.96) 0.95 (0.86, 1.04) 1.00 (0.91, 1.11)
Ecuador PCV10 2010 2012 0.84 (0.72, 0.97) 1.05 (0.90, 1.20) 1.01 (0.86, 1.18) 0.89 (0.79, 1.02) 0.81 (0.67, 0.95) 0.85 (0.73, 0.97) 0.93 (0.81, 1.06) 1.00 (0.87, 1.15)
Mexico PCV7/10/13 2006 2010–2011 0.84 (0.74, 0.97) 0.85 (0.74, 0.98) 1.06 (0.92, 1.21) 1.20 (1.01, 1.38) 1.36 (1.06, 1.64) 1.27 (1.04, 1.54) 1.18 (0.98, 1.44) 1.20 (1.00, 1.37)
United States PCV7 2000 2002–2004 0.86 (0.77, 0.96) 0.90 (0.78, 1.02) 0.95 (0.82, 1.08) 0.97 (0.81, 1.25) 1.14 (1.00, 1.31) 0.99 (0.82, 1.18) 0.97 (0.84, 1.13) 1.01 (0.84, 1.22)
Without synthetic controls (nonrespiratory hospitalizations as a denominator and control for seasonality)
Brazil PCV10 2010 2012–2013 0.78 (0.73, 0.83) 0.96 (0.89, 1.04) 0.86 (0.81, 0.90) 0.80 (0.75, 0.85) 0.86 (0.80, 0.92) 0.99 (0.94, 1.04) 1.10 (1.05, 1.16) 1.25 (1.18, 1.32)
Chile PCV10 2011 2012 0.85 (0.71, 1.02) NA 0.91 (0.80, 1.03) 0.70 (0.63, 0.78) 0.78 (0.70, 0.88) 0.86 (0.76, 0.95) 0.89 (0.81, 0.98) 0.96 (0.88, 1.05)
Ecuador PCV10 2010 2012 1.17 (1.01, 1.35) 1.51 (1.28, 1.76) 1.20 (1.06, 1.36) 1.03 (0.92, 1.15) 0.82 (0.71, 0.94) 0.92 (0.81, 1.04) 1.01 (0.90, 1.12) 1.19 (1.06, 1.33)
Mexico PCV7/10/13 2006* 2010–2011 0.83 (0.72, 0.96) 1.04 (0.92, 1.17) 0.93 (0.83, 1.03) 1.00 (0.94, 1.07) 1.05 (0.99, 1.11) 1.09 (1.01, 1.16) 1.00 (0.92, 1.08) 1.02 (0.95, 1.10)
United States PCV7 2000 2002–2004 0.81 (0.77, 0.86) 0.96 (0.91, 1.02) 1.08 (1.02, 1.14) 0.98 (0.93, 1.04) 0.95 (0.90, 0.99) 1.12 (1.07, 1.17) 1.11 (1.06, 1.15) 1.04 (1.00, 1.08)
Sensitivity analysis: Model with synthetic control, excluding bronchitis/bronchiolitis as a control variable
Brazil PCV10 2010 2012–2013 0.80 (0.69, 0.94) 0.79 (0.69, 0.88) 0.78 (0.70, 0.86) 0.81 (0.70, 0.92) 0.80 (0.69, 0.92) 0.95 (0.81, 1.10) 0.92 (0.80, 1.12) 0.97 (0.86, 1.11)
Chile PCV10 2011 2012 0.79 (0.63, 1.04) NA 0.86 (0.75, 0.98) 0.72 (0.61, 0.84) 0.79 (0.70, 0.88) 0.83 (0.69, 0.96) 0.95 (0.86, 1.04) 1.00 (0.91, 1.11)
Ecuador PCV10 2010 2012 1.10 (0.92, 1.31) 1.23 (1.02, 1.47) 1.16 (0.97, 1.36) 0.91 (0.79, 1.06) 0.73 (0.60, 0.88) 0.77 (0.65, 0.89) 0.85 (0.74, 0.97) 0.98 (0.85, 1.14)
Mexico PCV7/10/13 2006* 2010–2011 1.21 (0.91, 1.73) 1.26 (1.01, 1.60) 1.20 (1.01, 1.44) 1.22 (1.06, 1.39) 1.36 (1.05, 1.63) 1.26 (1.03, 1.53) 1.12 (0.95, 1.41) 1.12 (0.97, 1.28)
United States PCV7 2000 2002–2004 0.84 (0.71, 1.01) 0.92 (0.81, 1.04) 0.91 (0.80, 1.09) 0.87 (0.78, 1.19) 0.96 (0.68, 1.15) 0.92 (0.68, 1.19) 0.94 (0.71, 1.21) 0.94 (0.65, 1.20)
Secondary analysis: Model without synthetic control, with linear trend (nonrespiratory hospitalizations as a denominator and control for seasonality)
Brazil PCV10 2010 2012–2013 0.74 (0.65, 0.84) 0.77 (0.68, 0.88) 0.82 (0.73, 0.92) 0.77 (0.68, 0.87) 0.84 (0.73, 0.96) 0.94 (0.84, 1.05) 0.95 (0.85, 1.06) 0.98 (0.87, 1.10)
Chile PCV10 2011 2012 1.08 (0.85, 1.35) NA 0.87 (0.74, 1.01) 0.76 (0.67, 0.86) 0.76 (0.67, 0.86) 0.86 (0.75, 0.98) 0.92 (0.81, 1.03) 0.97 (0.86, 1.09)
Ecuador PCV10 2010 2012 0.91 (0.76, 1.10) 1.16 (0.97, 1.39) 1.00 (0.84, 1.19) 0.83 (0.68, 1.00) 0.70 (0.58, 0.86) 0.64 (0.53, 0.77) 0.76 (0.63, 0.91) 0.84 (0.66, 1.06)
Mexico PCV7/10/13 2006* 2010–2011 1.05 (0.77, 1.43) 1.13 (0.88, 1.45) 1.01 (0.80, 1.28) 1.13 (0.94, 1.36) 1.28 (1.08, 1.52) 1.25 (0.99, 1.60) 0.88 (0.67, 1.13) 0.91 (0.70, 1.18)
United States PCV7 2000 2002–2004 0.93 (0.77, 1.13) 0.83 (0.67, 1.01) 0.86 (0.71, 1.03) 0.89 (0.77, 1.02) 0.96 (0.86, 1.06) 0.84 (0.72, 0.98) 0.88 (0.78, 0.98) 0.82 (0.70, 0.96)
Secondary analysis: Model without synthetic control (nonrespiratory hospitalizations as a covariate rather than offset and control for seasonality)
Brazil PCV10 2010 2012–2013 0.77 (0.72, 0.82) 0.70 (0.66, 0.79) 0.74 (0.70, 0.81) 0.77 (0.72, 0.83) 0.89 (0.80, 0.96) 0.96 (0.88, 1.09) 0.95 (0.86, 1.05) 1.00 (0.91, 1.09)
Chile PCV10 2011 2012 0.78 (0.63, 0.96) NA 0.85 (0.74, 0.97) 0.68 (0.59, 0.77) 0.82 (0.74, 0.91) 0.94 (0.84, 1.05) 0.93 (0.84, 1.04) 0.98 (0.88, 1.08)
Ecuador PCV10 2010 2012 1.05 (0.89, 1.23) 1.51 (1.30, 1.74) 1.15 (1.01, 1.30) 0.91 (0.80, 1.02) 0.73 (0.62, 0.85) 0.75 (0.65, 0.86) 0.86 (0.77, 0.97) 0.94 (0.82, 1.06)
Mexico PCV7/10/13 2006* 2010–2011 1.13 (0.94, 1.30) 1.27 (1.12, 1.41) 1.35 (1.07, 1.55) 1.21 (1.08, 1.36) 1.19 (1.05, 1.35) 1.21 (1.04, 1.40) 1.05 (0.89, 1.24) 1.01 (0.87, 1.18)
United States PCV7 2000 2002–2004 0.83 (0.75, 0.92) 0.95 (0.85, 1.05) 1.04 (0.95, 1.14) 0.97 (0.90, 1.04) 0.93 (0.88, 0.98) 1.03 (0.96, 1.11) 1.08 (1.02, 1.14) 0.99 (0.93, 1.07)

Data for all children <24 mo of age in Chile were grouped. NA, not applicable.

*Age group header: <12 mo in Brazil, Ecuador, Mexico, and the United States; <24 mo in Chile.

*Year of introduction (Mexico): the date of vaccine introduction in Mexico was uncertain and varied by region.

The increase in rates of hospitalization for pneumonia among older adults in Brazil and Ecuador suggested that factors other than the vaccine had influenced the observed trends. To put these patterns into context, we repeated this simple analysis for a broad range of comparison disease categories using data from Brazil (Table S2). These analyses demonstrated that hospitalizations for many other disease categories also increased substantially in the same period among 80+-year-olds (Fig. S1B). Among <12-mo-old children, the changes in these other disease categories after PCV10 was introduced were more varied, with both significant increases and decreases (Fig. S1A).

Fig. S1.

Changes in hospitalizations for different disease categories in Brazil, where the rate ratios are estimated from a simple model that adjusts only for seasonality and the number of nonrespiratory hospitalizations as the denominator for (A) infants <12 mo and (B) adults 80+ y. Rate ratios are calculated as the sum of observed hospitalizations for each disease category divided by the sum of predicted (counterfactual) hospitalizations during the evaluation period, 2012–2013. Values <1 indicate a decrease in hospitalizations, and values >1 indicate an increase.

Use of Synthetic Controls to Adjust for Unexplained Trends.

To adjust for unexplained trends in the data, we used the synthetic control approach. These analyses yielded several notable results. For all five of the countries, the introduction of PCVs was associated with substantial and significant declines in hospitalizations for all-cause pneumonia among children <12 mo of age (Figs. 2 and 3). The decline showed a similar trajectory in Brazil, Ecuador, Mexico, and the United States, and was more pronounced in Chile (though, as discussed in Sensitivity Analyses and Alternative Models, the estimates for Chile were less robust to the model structure; Fig. 3). The average decline across all five countries was −20% (95% CI: −27%, −12%). However, among 65- to 79-year-olds and 80+-year-olds, we did not detect a decline in hospitalizations for pneumonia after vaccine introduction in children in any of the five countries; for example, the estimate for 80+-year-olds was +3%; 95% CI: −4%, +10% (Figs. 2 and 3 and Table S3). For the other age groups, the estimated change differed between countries (Table S3). Notably, the significant declines in older children and young/middle-aged adults in some of the countries might have been biased by the inclusion of the 2009 influenza pandemic during the fitting period (sensitivity analyses below; Figs. S2 and S3).

Fig. 3.

Changes in hospitalizations for all-cause pneumonia by months since vaccine introduction in Brazil (red), Chile (blue), Ecuador (green), Mexico (purple), and the United States (orange) in (A) children <12 mo and (B) adults 80+ y; estimates are from the model adjusted using a synthetic control. In each month, the rate ratio is the sum of observed all-cause pneumonia hospitalizations during the previous 12 mo divided by the sum of predicted (counterfactual) hospitalizations in the same period.
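The rolling rate ratio described in this caption can be computed as in the following sketch. It assumes that each 12-mo window ends at, and includes, the indicated month; the function name and the toy series are illustrative only.

```python
import numpy as np

def rolling_rate_ratio(observed, counterfactual, window=12):
    """Observed / counterfactual sums over each trailing `window`-month span."""
    obs = np.asarray(observed, dtype=float)
    cf = np.asarray(counterfactual, dtype=float)
    kernel = np.ones(window)
    # 'valid' keeps only positions where a full window of months is available.
    obs_sum = np.convolve(obs, kernel, mode="valid")
    cf_sum = np.convolve(cf, kernel, mode="valid")
    return obs_sum / cf_sum

# Toy series: observed counts sit 20% below the counterfactual in every month.
ratios = rolling_rate_ratio([80] * 24, [100] * 24)
print(len(ratios), ratios[0])
```

The first value refers to the window ending in month 12, so a series of n post-introduction months yields n − 11 rolling estimates.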

Fig. S2.

Validation of synthetic controls models using pre-PCV data from (A) Brazil; (B) Chile; (C) Ecuador; (D) Mexico; and (E) the United States. Prevaccine data were divided into 4-y fitting windows, followed immediately by 1-y evaluation windows, at the start of each time series. Windows were then successively shifted forward in time by 1-mo increments. Each rate ratio is for the 12-mo evaluation window ending in the indicated month. Because the analysis covers only prevaccine months, the 95% credible intervals of the rate ratios should include 1. Red triangles are from models fit without synthetic controls; black circles are from models with synthetic controls.

Fig. S3.

Sensitivity analysis showing estimates of the rate ratio from the synthetic controls model for Brazil produced by various training periods. For all estimates, the rate ratios are calculated by comparing the observed counts in 2012–2013 with the counterfactual estimates for 2012–2013; the fitting periods were 2004–2009 (black, full pre-PCV10 period); 2006–2009 (yellow); 2008–2009 (red); 2004–2007 (blue); 2004–2008 (green); 2004–2010 (gray); 2004–2011 (purple, includes 1 y post-PCV10).

In some strata, the estimates were similar between the model that simply adjusted for nonrespiratory hospitalizations and the model that adjusted for the synthetic control (e.g., declines of −25% vs. −22% in hospitalizations for all-cause pneumonia among children <12 mo of age in Brazil with and without a synthetic control, respectively; Fig. 1 and Table S3). This consistency would be expected if there are no unexplained biases in the data. However, in other strata, the synthetic control adjustment changed the estimates substantially. Among 80+-year-olds in Brazil, the model without the synthetic control estimated a 25% (95% CI: 18%, 32%) increase in pneumonia rates associated with introduction of PCV, whereas the model with a synthetic control estimated no change in pneumonia rates (−2%, 95% CI: −14%, +11%). The weight given to different variables in the synthetic control differed by age group and country (SI Results and Dataset S1).

Validation of Synthetic Controls Using Other Categories of Pneumococcal Disease.

Analyses of other categories of pneumococcal disease can be used to evaluate the epidemiological credibility of the estimates of the declines in all-cause pneumonia. The relative decline in disease rates should be correlated with the specificity of the case definition—more-specific definitions (e.g., IPD, pneumococcal pneumonia) should have greater relative declines than less-specific definitions (e.g., all-cause pneumonia) because a larger fraction of cases are caused by vaccine-type pneumococci. When using the synthetic control model, the estimates followed the expected pattern: for children <12 mo of age, the greatest estimated decline was for IPD (−59%; 95% CI: −66%, −52%), followed by pneumococcal/lobar pneumonia (−39%, 95% CI: −66%, −24%), followed by a more-specific definition of all-cause pneumonia (19) (−26%; 95% CI: −34%, −16%), and finally the less-specific definition of all-cause pneumonia that was used for the main analyses (−14%; 95% CI: −23%, −3%; Fig. 4A). The same pattern of greater reduction with higher specificity was seen for the point estimates for all of the other age groups (Fig. 4A). In contrast, estimates from models that only adjusted for nonrespiratory hospitalizations or linear trend did not follow this expected pattern (Fig. 4 B and C).

Fig. 4.

Changes in hospitalizations in the United States (rate ratios) due to IPD (turquoise), lobar/pneumococcal pneumonia (orange), all-cause pneumonia [Grijalva and coworkers’ (19) definition, purple], all-cause pneumonia (as defined in this study, pink) adjusted using (A) synthetic controls; (B) linear trend, nonrespiratory hospitalizations (as the denominator), and seasonality; and (C) only nonrespiratory hospitalizations (as the denominator) and seasonality.

Sensitivity Analyses and Alternative Models.

To validate the synthetic control approach, we divided the prevaccine data from each country into different 48-mo training periods and 12-mo evaluation periods. The counterfactual estimates matched the observed data (rate ratio of 1) in most instances for the <5-y-old age group and age groups over 65 years when using the synthetic controls (Fig. S2). In Brazil and Chile, the models did not perform well (credible intervals for the rate ratios did not include 1) for the age groups that were most affected by the 2009 influenza pandemic (older children and young and middle-aged adults) when the evaluation windows included the pandemic period (SI Results and Fig. S2). The estimates for Chile and Ecuador were also more subject to short-term variations than for the other countries (Fig. S2). As a further sensitivity analysis, we used different prevaccine training periods and estimated the rate ratios using the same evaluation period; the credible intervals for the rate ratios in older children and adults mostly overlapped 1 when excluding 2009 from the training period (Fig. S3).
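The sliding-window validation can be laid out as below. This sketch only generates the index ranges (48-mo fitting window, 12-mo evaluation window, shifted by 1 mo); the function name is an assumption, and the 72-mo example corresponds to a prevaccine period of 6 y, such as 2004–2009 in Brazil.

```python
def validation_windows(n_pre_months, fit_len=48, eval_len=12, step=1):
    """Yield (fit_slice, eval_slice) pairs that stay inside the prevaccine period."""
    start = 0
    while start + fit_len + eval_len <= n_pre_months:
        fit = slice(start, start + fit_len)
        evaluate = slice(start + fit_len, start + fit_len + eval_len)
        yield fit, evaluate
        start += step

windows = list(validation_windows(72))
print(len(windows), windows[0], windows[-1])
```

For each pair, a model would be fit on the months in the first slice and the rate ratio computed on the second; because no vaccine was in use in either window, the credible interval for that ratio is expected to include 1.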

The sensitivity of the results to the exclusion of specific variables as potential components of the control differed by country and age group, with highly consistent results for Brazil and the United States, and less-consistent results for the other countries (Fig. S4). The estimates for children <12 mo of age in Ecuador, Mexico, and Chile were sensitive to the specific variables included. In particular, J20–J22 (bronchitis/bronchiolitis) was an influential variable. The exclusion of J20–J22 generally increased heterogeneity of results between countries among children <12 mo and 12–23 mo of age compared with the models that included bronchitis/bronchiolitis (Table S3). In some instances, the models excluding J20–J22 did not properly adjust for pneumonia epidemics (leading to smaller estimates of vaccine-related declines). A notable exception is the estimates for <1-y-old children in Chile, where the effect of the vaccine was possibly overestimated when J20–J22 was included (Figs. 1–3).

Fig. S4.

Sensitivity analyses showing estimates of the rate ratio from models with the full set of variables contributing to the synthetic control (black square), and from models where the top one (red circle), two (blue triangle), or three (green diamond) covariates were excluded. Rate ratios are calculated as the sum of observed hospitalizations for all-cause pneumonia divided by the sum of predicted (counterfactual) hospitalizations during the evaluation period. Values <1 indicate a decrease in hospitalizations, and values >1 indicate an increase.

We also evaluated two alternative models (SI Results and Table S3) that had a simple adjustment for linear trend or used nonrespiratory hospitalizations as the sole covariate in the model (rather than as an offset). In some instances, these models gave results similar to the synthetic control model, but they were less likely to detect an effect of the vaccine in children, and, as noted above, the rate ratio estimates from these alternative models did not follow the expected patterns among different categories of pneumococcal disease, whereas the estimates from the synthetic control models did.

SI Results

Importance of the Components of the Counterfactual Predictions Differs by Age Group and Country.

The weights of the variables that comprised the synthetic control varied substantially by age group and country (Dataset S1). There were some consistent patterns across countries. Bronchitis/bronchiolitis (J20–J22) was heavily weighted in all countries in children and in many of the adult age groups. Malnutrition (ICD-10 E40–E46) and other endocrine, nutritional, and metabolic diseases (E00–E99) were weighted substantially in many strata. When J20–J22 was excluded from the model, a broader set of variables received more weight.

Sensitivity of Results to the Control Variables Included.

To evaluate the sensitivity of the results to the inclusion of specific control variables, we removed the control disease category that received the most weight in each age group and refit the models (Fig. S4). We then repeated this process, removing up to three of the top-weighted variables. In most age groups, the estimates did not change appreciably (percent decline changed by 0–5%) when removing these control variables. The point estimates were more sensitive for 12- to 23-mo-olds (change from 21% decline to 28% decline when removing three variables) and for 18- to <40-y-olds (change from 34% decline to 20% decline when removing three variables), but the qualitative conclusion of vaccine impact remained the same in both instances. In other countries, some results were more sensitive to the specific variables included. In particular, the inclusion of J20–J22 (bronchitis/bronchiolitis) was influential in several instances, as noted in the main text.
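A rough sketch of this leave-top-variables-out procedure is given below. It makes several assumptions that are not from the source: an ordinary least-squares fit on log counts stands in for the synthetic control model, the "weight" of a control is taken to be the absolute value of its fitted coefficient, the top variable is re-identified after each refit, and the data and control names are simulated.

```python
import numpy as np

def fit_rate_ratio(y, X, n_pre):
    """Return (rate ratio, control coefficients) from a pre-period OLS fit."""
    design = np.column_stack([np.ones(len(y)), np.log(X)])
    coef, *_ = np.linalg.lstsq(design[:n_pre], np.log(y[:n_pre]), rcond=None)
    counterfactual = np.exp(design @ coef)
    return y[n_pre:].sum() / counterfactual[n_pre:].sum(), coef[1:]

def drop_top_controls(y, X, names, n_pre, max_drop=3):
    """Refit after successively removing the most heavily weighted control."""
    names = list(names)
    rr, coef = fit_rate_ratio(y, X, n_pre)
    results = [("full model", rr)]
    for _ in range(max_drop):
        if X.shape[1] <= 1:
            break
        top = int(np.argmax(np.abs(coef)))  # assumed notion of "weight"
        X = np.delete(X, top, axis=1)       # returns a copy; caller's X is untouched
        dropped = names.pop(top)
        rr, coef = fit_rate_ratio(y, X, n_pre)
        results.append((dropped, rr))
    return results

# Simulated data: five controls sharing a trend, outcome with a 20% drop.
rng = np.random.default_rng(1)
n, n_pre = 96, 72
shared = np.linspace(0.0, 0.3, n)
X = np.column_stack([
    (100 + 50 * j) * np.exp((0.5 + 0.25 * j) * shared + rng.normal(0, 0.02, n))
    for j in range(5)
])
y = 200 * np.exp(shared + rng.normal(0, 0.02, n))
y[n_pre:] *= 0.8
results = drop_top_controls(y, X, [f"ctrl_{j}" for j in range(5)], n_pre)
for label, rr in results:
    print(label, round(rr, 2))
```

If the rate ratio stays close to the full-model value as controls are removed, the estimate does not hinge on any single comparison series, which is the property this sensitivity analysis is probing.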

Comparison of Synthetic Controls with Alternative Model Structures.

As secondary analyses, we evaluated two additional models: adjusting for simple linear trend or using all nonrespiratory hospitalizations (NRH) as a covariate rather than as a denominator (as in model 1; Eq. 1). When adjusting for linear trend, the counterfactual estimates assume that the trend from the prevaccine period continues into the postvaccine period (model 3; Eq. S1):

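$$\log(\text{pneumonia cases}_t) = \beta_0 + \sum_k \left[ c_k \, I\{\text{month}_k = m(t)\} \right] + \log(\text{NRH}_t) + \beta_1 \, \text{index}_t \qquad \text{[S1]}$$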

Model 4 (Eq. S2) is similar to model 1, except that the coefficient for all nonrespiratory hospitalizations is estimated rather than held to 1.

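$$\log(\text{pneumonia cases}_t) = \beta_0 + \sum_k \left[ c_k \, I\{\text{month}_k = m(t)\} \right] + \text{binary\_inclusion\_indicator}_j \, \beta_1 \log(\text{NRH}_t) \qquad \text{[S2]}$$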

Both of these models provided adequate adjustments in some, but not all, of the settings (Table S3). In contrast to the results from the synthetic controls model, the linear trend adjustment resulted in estimates of significant and substantial declines in pneumonia in older adults in the United States and Ecuador (Table S3).
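The trend adjustment of Eq. S1 can be illustrated with a minimal sketch. This is not the authors' code (which uses R): it omits the monthly indicator terms for brevity, fits the trend by ordinary least squares on the log scale, and uses made-up numbers. The function names `fit_log_linear_trend` and `extrapolate` are hypothetical.

```python
import math

def fit_log_linear_trend(cases, nrh):
    """Least-squares fit of log(cases_t) - log(NRH_t) = b0 + b1 * index_t.

    Simplification: the seasonal indicator terms of Eq. S1 are left out.
    """
    y = [math.log(c) - math.log(n) for c, n in zip(cases, nrh)]
    t = list(range(len(y)))
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    return intercept, slope

def extrapolate(intercept, slope, start_index, nrh_future):
    """Counterfactual cases, assuming the prevaccine trend simply continues."""
    return [n * math.exp(intercept + slope * (start_index + i))
            for i, n in enumerate(nrh_future)]

# Hypothetical prevaccine data: 24 months with a steady 1% monthly rise.
pre_nrh = [1000.0] * 24
pre_cases = [100.0 * math.exp(0.01 * t) for t in range(24)]
b0, b1 = fit_log_linear_trend(pre_cases, pre_nrh)
trend_counterfactual = extrapolate(b0, b1, 24, [1000.0, 1000.0])
```

Because the counterfactual keeps rising along the fitted line, any prevaccine slope that was inflated by a few severe seasons would carry forward and enlarge the apparent decline.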

Another validity check is to evaluate the magnitude of the rate ratios for different categories of pneumococcal disease that vary in specificity. As described in the main text, when performing the synthetic control adjustment (model 2; Eq. 2), the rate ratios followed the expected patterns, with the largest declines for invasive pneumococcal disease (IPD) and the smallest declines for all-cause pneumonia. In contrast, when performing a simple adjustment for linear trend (model 3; Eq. S1), the estimates followed the expected patterns for children but not for adults. For instance, among 65- to 80-y-olds, the relative declines for pneumococcal pneumonia and both definitions of all-cause pneumonia were similar, and only the estimate for IPD was greater. With the adjustment for linear trend, we did detect significant declines in all-cause pneumonia (more specific definition) in older adults (17%; 95% CI: 5%, 27% among 65- to 80-y-olds), similar to past studies (7) (Fig. 4B). We also evaluated a simple model (model 1; Eq. 1) that only used nonrespiratory hospitalizations as a denominator. The estimates obtained with this approach showed no pattern based on specificity of the disease outcome (Fig. 4C).

Sensitivity of Results to the Specific Years Included in the Training Period.

As a further evaluation of the robustness of the results, we tested the effect of using different years in the training period for models fit to data from Brazil. If just 2 y were included in the pre-PCV training period (2008–2009), the credible intervals became substantially wider, though the point estimates were similar (Fig. S3). The inclusion of 2009 during the training period led to greater estimates of decline in several age groups, potentially a bias introduced by the 2009 influenza pandemic. Including 2011 in the training period biased the results toward no effect, as expected, because the effect of the vaccine should already be apparent by this time (Fig. S3).

Discussion

We have used synthetic controls to assess the population-level impact on pneumonia of vaccinating children with PCVs. In so doing, we have shown that using synthetic controls provides a promising way to adjust for unmeasured bias and confounding in observational studies that evaluate the impact of public health interventions. Our results showing a reduction in all-cause pneumonia following introduction of PCVs in infants, and to some extent in young children, agree with previous studies. We also confirmed that invasive pneumococcal disease and pneumococcal pneumonia declined in children and adults. However, contrary to previous reports (7, 20), we did not detect a reduction in all-cause pneumonia among adults 65 y and older in any of the five countries. This is notable because the burden of all-cause pneumonia in this age group is large, and the fraction that is indirectly preventable by vaccinating children might be smaller than previously estimated (7, 20). Declines were also observed in older children and younger adult age groups in some countries, but validation and sensitivity analyses suggest that some of these declines were biased by inclusion of the 2009 influenza pandemic during the prevaccine period.

Our analyses show that although observational studies of vaccine impact come with multiple caveats, use of synthetic controls has the potential to improve the accuracy of the estimates. This approach to adjusting for unmeasured bias and confounding, irrespective of source, increases comparability of such assessments across settings.

Use of Synthetic Controls to Evaluate Impacts of Vaccination.

Aside from conducting a large and expensive cluster-randomized trial, time trend analysis is the only method that can quantify both the direct effects of being vaccinated and the indirect effects that result from disrupting transmission. However, because any number of factors can influence disease rates around the time of vaccine introduction, time-series analyses are susceptible both to incorrectly attributing unrelated changes to a vaccine and to missing a vaccine-related change hidden by an unrelated factor. Analyses that incorporate synthetic controls to adjust for unmeasured bias and confounding have several distinct advantages over approaches currently used. First, they are more likely to correctly control for these unrelated changes than other methods, potentially increasing accuracy as well as comparability of results between studies and study settings. Second, the analyst is not required to handpick individual comparison diseases a priori, a process that itself can introduce bias; instead, the method combines time series of hospitalizations for many disease outcomes into a single synthetic composite time series, with each component weighted according to its fit to the disease of interest in the preintervention period. The result is a time series that is more likely to behave like the disease of interest in the postintervention period than any single comparator. Third, the synthetic control eliminates the need to assume that any trend beginning in the prevaccine period continues linearly into the postvaccine period, as is done with the commonly used interrupted time-series analysis. Fourth, the analytic framework can readily be used to incorporate any time series relevant to the outcome of interest into the synthetic control. For example, time series of virological data [e.g., confirmed influenza or respiratory syncytial virus (RSV) infections] could be used to adjust for epidemics of pneumonia, assuming they are not themselves influenced by the vaccine.

Synthetic controls can also be implemented for different types of study designs and different levels of data availability. For example, synthetic controls could be used for prospective studies of pneumonia trends or to adjust for changes in ascertainment over time for studies of invasive bacterial diseases (using time series of other pathogens as components of the synthetic control) (21, 22). In settings where it is not feasible to extract a large number of time series to use as components of the synthetic control (e.g., in a setting with only paper health records), the approach might still be useful, albeit with fewer contributing time series. In such a situation, studies from countries with electronic data might be used to guide selection of time series that are most likely to be informative. In all of these scenarios, the synthetic control approach provides the important advantage of accounting for model uncertainty (and subsequent uncertainty in the estimates of the counterfactual) in a way that current approaches do not.

To use the synthetic control approach optimally, we propose the following guidelines. First, we suggest presenting results from both a model that adjusts only for all-cause hospitalizations or population size and one that adjusts using the synthetic control; presenting results from both models increases the likelihood of detecting potential biases, as was the case for the older adults in Brazil. Second, we suggest presenting results of sensitivity analyses in which the most heavily weighted components of the control are removed, and the models are rerun (Fig. S4). Third, we suggest validating the model by using one subset of prevaccine data to predict values in a second subset of prevaccine data (Fig. S2), with the expectation that there should be no effect if the relationships among the variables are stable. This approach can help to identify potential issues with the analyses, as with the effect of the 2009 influenza pandemic in older children and young/middle-aged adults in Brazil (Fig. S3). Fourth, whenever possible, analyses of positive controls—time series for which one is highly confident that an intervention will have an impact—should be included; for example, invasive pneumococcal disease in the case of PCVs.

Several caveats must be noted when interpreting the results. The synthetic controls method makes two major assumptions: that use of PCVs does not affect the incidence of the diseases that are components of the synthetic control, and that the only change in the relationship between pneumonia and the components of the synthetic control over time is caused by the vaccine. If the contributors to the synthetic control have themselves been subject to a separate intervention (e.g., rotavirus vaccine), the second assumption would be violated. If a disease time series is expected to be affected by the vaccine, it must be excluded from the synthetic control; likewise, if reporting or coding for pneumonia changes over time (e.g., due to awareness of pneumonia as a public health issue following PCV introduction), the assumption that the relationship among the time series is consistent over time would be violated. In addition, some factors affecting the outcome of interest might not be captured by the time series available for inclusion in the synthetic control. For instance, the 2009 influenza pandemic was not fully adjusted for by the model, and therefore the relationship between pneumonia and the components of the synthetic control changed during this time period. If the assumptions of the synthetic controls model are fulfilled, then the difference between the observed and counterfactual values can be interpreted as the causal effect of the vaccine (16). However, as with any observational study, results must be interpreted cautiously; synthetic controls can help to reduce the effects of unmeasured bias and confounding, but they are unlikely to eliminate them completely.

Impacts of PCVs on Pneumococcal Disease.

Assessments of vaccine impact are a particular challenge because there is no “ground-truth” estimate for how much disease rates should be expected to decline following vaccine introduction. We get some approximation from randomized controlled trials and case-control analyses, both of which provide an estimate of the direct protection obtained by receiving the vaccine, although these do not capture the indirect effect of the vaccine that results from disrupting transmission. The estimates of the PCV-related reduction in hospitalizations of infants for all-cause pneumonia produced by the synthetic controls approach are larger than both the 10% efficacy against clinical pneumonia found by a randomized controlled trial in Latin America (12) and the 11% reduction in pneumonia hospitalizations in children less than 2 y of age found by a case-control study in Chile (23). Given that trend analyses capture both direct and indirect effects of the vaccine, it is reasonable that our estimates would be larger than those found by clinical trials or case-control studies, which only capture direct effects. Estimates from time trend analyses have been more varied. For instance, a 13% reduction in clinical pneumonia in children <12 mo was found from a population-based surveillance study in Brazil (24), whereas estimates of ∼20% and 40% were found using administrative databases from Brazil and the United States, respectively (7, 25).

Aside from comparing estimates from time trend studies to randomized controlled trials, the best ground-truth estimate of vaccine impact that we can obtain is to quantify changes in more-specific categories of pneumococcal disease, where high-quality surveillance studies provide an estimate of the expected decline. When evaluating categories of pneumococcal disease with different specificities, the relative decline should be greater for disease categories that are more specific for pneumococcus (12, 26). Both the synthetic control models and models with simpler adjustments gave estimates of declines in IPD that were comparable to estimates from active surveillance data from the United States (27). The advantage of the synthetic control approach can be seen when comparing estimates of vaccine-related changes in four categories of pneumococcal disease. The point estimates from the synthetic controls model followed the expected patterns, with greater declines for more specific definitions across most age groups (Fig. 4). In contrast, using simpler adjustment methods, the expected pattern was only seen among children.

In contrast to some previous studies, we found no vaccine-related reductions in pneumonia among people age 65+. Although one trend study from the United States reported no decline in confirmed or presumptive hospitalized pneumonia cases in this age group (28), others that used large administrative databases found significant PCV-related reductions in pneumonia hospitalization in seniors (7, 8, 19, 20). Our secondary analyses, in which we use a simple linear trend to make the adjustment, might explain this difference. Trend-line adjustments in the other studies might have been skewed upward by severe pneumonia seasons during the prevaccine years, thereby leading to larger apparent vaccine impact when extrapolated into the postvaccine period. Synthetic control adjustments are less sensitive to such short-term epidemics than are linear trends, even if they are not immune.

There are at least two possible explanations for our results in older adults. One is that any indirect effect on all-cause pneumonia in older adults is either nonexistent or too small to estimate reliably from time-series data due to insufficient power. Although we did find substantial declines in hospitalizations for IPD and lobar/pneumococcal pneumonia among people 65+ y in the United States after introduction of PCV7, vaccine-type pneumococci might comprise a smaller fraction of all-cause pneumonia cases than previously thought. It is also possible that increases in pneumonia hospitalizations due to other pathogens either offset any indirect benefit or mask declines due to pneumococcus. A second possibility is that our synthetic control approach might not fully adjust for unmeasured bias and confounding, so true declines might be masked. Regardless, the burden of pneumonia in older adults is tremendous, so small relative declines (even if they cannot be reliably measured from time-series data) could translate into large numbers of cases prevented.

We found some unexplained variability in estimates of vaccine-associated changes between countries, particularly among young and middle-aged adults; these were the age groups with the highest hospitalization rates during the 2009 pandemic, and significant declines in pneumonia in these age groups were notably only found in countries where the baseline period included the pandemic. Including the pandemic during the fitting period has the potential to bias the counterfactual upward and thus exaggerate the estimated declines, which the sensitivity analyses in Figs. S2 and S3 suggest might have occurred. Alternative approaches to adjust for the pandemic (including the inclusion of time series of virological data as components of the synthetic control) could help to adjust for these biases. The variability in the estimated impacts of the vaccine between countries for some age groups highlights the perils of trying to interpret data from a single country on its own, without context.

Our results were sensitive to some modeling choices and not to others. Inclusion of bronchiolitis/bronchitis (J20–J22) time series as a component of the synthetic control had a pronounced effect on the estimates; for instance, the estimated magnitude of a vaccine-associated effect on all-cause pneumonia for each age group in Chile was substantially larger when bronchiolitis/bronchitis (J20–J22) was included in the synthetic control (Table S3). The validity of including bronchiolitis/bronchitis in the model is debatable. On one hand, bronchiolitis/bronchitis are acute respiratory illnesses strongly associated with RSV in young children; they likely share many causal mechanisms with all-cause pneumonia and can potentially explain short-term epidemic patterns (i.e., if epidemics of pneumonia are also caused by RSV or some other shared environmental factor). However, PCVs could plausibly influence the rate of bronchiolitis/bronchitis hospitalizations as indicated by results showing RSV-associated disease decreasing in the United States following PCV introduction, in violation of one of the model assumptions (29).

Although the use of synthetic controls is not a guarantee against bias and confounding, we believe that this approach will increase the probability of obtaining accurate and interpretable estimates of vaccine impact that can be shared with public health decision-makers. Despite the complex computing involved, using the synthetic control method is straightforward, thanks to an R package developed by Brodersen et al. (18). We include code for implementing this package with administrative hospitalization data (Datasets S2–S4). The adoption of this methodology has the potential to improve the quality and transparency of vaccine benefit estimates as well as the comparability between studies conducted in different settings.

Methods

Data Sources and Definitions.

We used routinely collected administrative data on hospitalizations from Brazil, Chile, Ecuador, Mexico, and 10 states in the United States (Table S1). The unifying characteristic of the five databases is that they all use International Classification of Diseases (ICD) codes to classify the cause of hospitalizations. The primary outcome for all analyses was all-cause pneumonia. For the components of the synthetic control, we used the ICD chapters, which group disease codes thematically (e.g., infectious and parasitic diseases, diseases of the respiratory system, etc.). Some codes that identified specific diseases were also used. Codes that could potentially be influenced by the introduction of PCVs were excluded from the covariates (SI Methods). The Human Investigation Committee at Yale University determined that this study is exempt from review.

Preprocessing Data.

All of the time series were log-transformed and standardized before being used for analysis. This transformation helped to minimize the effects of epidemics on the long-term trends and associations. After model fitting and before calculating the rate ratios and cases prevented, the observed and predicted values were transformed back to the original scale. For data from Brazil, additional preprocessing was required to adjust for an abrupt shift in coding that occurred at the start of 2008 due to a change in reimbursement policies (SI Methods).

Estimating Vaccine-Associated Declines.

When estimating the effect of a vaccine using time-series data, it is necessary to compare the observed number of cases with a counterfactual estimate of what the number of cases would have been if the vaccine had not been introduced. We compare the synthetic control approach with other approaches that are commonly used to generate counterfactuals. In a simple pre/post analysis (model 1; Eq. 1), the counterfactual for the postvaccine period is based on the seasonally adjusted mean number of cases for the prevaccine period. We adjusted for seasonal variations using indicator variables for month of the year [$c_k \cdot I\{\text{month}_k = m(t)\}$]. Changes in population size or changes in the volume of nonrespiratory hospitalizations (NRH) were adjusted using an offset:

$$\log(\text{pneumonia cases}_t) = \beta_0 + \sum_k \left[ c_k \, I\{\text{month}_k = m(t)\} \right] + \log(\text{NRH}_t) \qquad \text{[1]}$$
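As a rough illustration of Eq. 1 (a sketch under stated assumptions, not the code in the datasets): if the model is fit by least squares on the log scale with the offset coefficient fixed at 1, the fitted value of β0 + c_k for each calendar month reduces to the prevaccine mean of log(cases/NRH) in that month. The function names and the numbers below are hypothetical.

```python
import math
from collections import defaultdict

def fit_offset_model(cases, nrh, months):
    """Return the fitted (b0 + c_k) for each calendar month k.

    Assumes a Gaussian least-squares fit on the log scale; with the
    log(NRH) offset held at 1 this is the month-wise mean of
    log(cases_t) - log(NRH_t).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, n, m in zip(cases, nrh, months):
        sums[m] += math.log(y) - math.log(n)
        counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}

def counterfactual(month_effects, nrh, months):
    """Predicted cases if the prevaccine seasonal rate per NRH had persisted."""
    return [n * math.exp(month_effects[m]) for n, m in zip(nrh, months)]

# Hypothetical prevaccine data: 2 y of monthly counts with a winter peak.
pre_months = [m for _ in range(2) for m in range(1, 13)]
pre_nrh = [1000.0] * 24
pre_cases = [120.0 if m in (6, 7, 8) else 80.0 for m in pre_months]

effects = fit_offset_model(pre_cases, pre_nrh, pre_months)
# Postvaccine months in which total hospital volume has doubled.
post_expected = counterfactual(effects, [2000.0, 2000.0], [7, 1])
```

The offset means that a doubling of nonrespiratory hospitalizations doubles the expected pneumonia count, which is how this model absorbs changes in overall hospital volume.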

With the synthetic control approach [causal impact method of Brodersen et al. (18)] (model 2; Eq. 2), we used time series for specific other diseases to adjust for trends unrelated to the vaccine. The control time series were incorporated into a regression model fit to pneumonia data from the prevaccine period. Bayesian variable selection was used to weight the different control time series, effectively giving more weight to those predictor variables that jointly explain the outcome variable best. These weights were used to generate a composite control variable (the synthetic control), which adjusts the counterfactual for changes in these other time series. Notably, log(NRHt) can be included as one of the potential control variables.

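$$\log(\text{pneumonia cases}_t) = \beta_0 + \sum_k \left[ c_k \, I\{\text{month}_k = m(t)\} \right] + \sum_j \left[ \text{binary\_inclusion\_indicator}_j \cdot \text{coeff}_j \cdot \log(\text{control\_timeseries}_{jt}) \right] \qquad \text{[2]}$$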

In this model, $\sum_j [\text{binary\_inclusion\_indicator}_j \cdot \text{coeff}_j \cdot \log(\text{control\_timeseries}_{jt})]$ is the set of control time series, weighted by their inclusion indicators and regression coefficients. The binary_inclusion_indicatorj are independent Bernoulli(π)-distributed random variables that determine the presence or absence of a particular control series in the model (0: excluded; 1: included). For the variable selection step, spike and slab priors were used (18), with the same prior inclusion probability, π = 0.5, for every covariate. We also evaluated models that excluded J20–J22 as a potential component of the synthetic control (a heavily weighted covariate that might potentially be influenced by the vaccine), as well as models that performed simple adjustments for linear trend or that included log(NRHt) as the only covariate (rather than as an offset) (SI Results). Each of the models was fit to pneumonia data from the pre-PCV period and used to generate counterfactual estimates for a post-PCV evaluation period. We used the bsts (30) and CausalImpact (18) packages in R for model fitting. For details on the calculation of the counterfactuals and rate ratios and additional details on the model, see SI Methods. The R code used for these analyses and the time series are included in Datasets S2–S10 and can be found at https://github.com/weinbergerlab/synthetic-control/.
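To make the weighted sum in Eq. 2 concrete, the following sketch shows how a set of posterior draws of inclusion indicators and coefficients could be turned into a composite control value and into per-variable inclusion frequencies. It does not perform the spike-and-slab sampling itself (the analysis relies on the bsts and CausalImpact R packages for that); the draws, the function names, and the values are invented for illustration.

```python
def composite(log_controls_t, gamma, coeff):
    """Sum over j of gamma_j * coeff_j * log(control_jt), for one draw at one time t."""
    return sum(g * b * x for g, b, x in zip(gamma, coeff, log_controls_t))

def posterior_inclusion(draws):
    """Fraction of draws in which each control series is included (gamma_j = 1)."""
    n_draws = len(draws)
    n_controls = len(draws[0][0])
    return [sum(gamma[j] for gamma, _ in draws) / n_draws
            for j in range(n_controls)]

def mean_composite(log_controls_t, draws):
    """Average the composite over draws, which also averages over model choice."""
    return sum(composite(log_controls_t, g, b) for g, b in draws) / len(draws)

# Hypothetical draws: (inclusion indicators, coefficients) for three controls.
draws = [
    ([1, 0, 0], [0.8, 0.0, 0.0]),
    ([1, 1, 0], [0.6, 0.2, 0.0]),
    ([1, 0, 0], [0.8, 0.0, 0.0]),
    ([1, 0, 1], [0.7, 0.0, 0.1]),
]
inclusion = posterior_inclusion(draws)
one_draw_value = composite([2.0, 1.0, 3.0], [1, 0, 1], [0.5, 9.0, 0.1])
averaged_value = mean_composite([2.0, 1.0, 3.0], draws)
```

A control that is excluded in a draw contributes nothing in that draw regardless of its coefficient, so the averaged composite reflects both how often a series is selected and how large its coefficient is when selected.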

SI Methods

Definitions.

The Latin American datasets used a single ICD-10 code for each hospitalization, whereas the US data used multiple ICD-9 codes per hospitalization. The primary outcome for all analyses was all-cause pneumonia, defined as the presence of ICD-10 codes J12–J18 in the diagnostic field in the Latin American datasets or ICD-9 codes 480–486 in any of the diagnostic fields in the US dataset. For the components of the synthetic control, we used the ICD chapters, which group disease codes thematically (e.g., infectious and parasitic diseases, diseases of the eye, diseases of the respiratory system). From these broad categories, we excluded codes that could potentially be influenced by the introduction of PCVs; these include otitis media (H65–H66), meningitis (G00–G04), conjunctivitis (H10), pneumococcal/streptococcal sepsis (A40, A49), and pneumococcus as the cause of diseases classified elsewhere (B95.3). For this same reason, we did not include the codes J00–J99 (diseases of the respiratory system) among the constituent time series of the synthetic control, although we did compare models that included or excluded J20–J22 (bronchitis/bronchiolitis/unspecified lower respiratory tract infection).

In addition to the ICD chapters, we also included some additional specific sets of codes as controls, which we hypothesized might share similar biases/causes as all-cause pneumonia or which have been used as controls in previous studies; these included diabetes (E10–E14), premature delivery/low birth weight (P05–P07), malnutrition (E40–E46), disorders of the urinary tract (N39, including infections), acute appendicitis (K35), and stroke (I60–I64). The full list is given in Table S2.

Preprocessing Data.

All of the time series were log-transformed and standardized before being used in the models. For months where there were zero cases, a value of 0.5 was substituted before the log-transformation (this was a rare occurrence, with just 0.4% of entries in Brazil across all variables and age groups having values of zero). This transformation helped to minimize the effects of epidemics on the long-term trends and associations. After model fitting and before calculating the rate ratios and cases prevented, the observed and predicted values were transformed back to the original scale. For data from Brazil, additional preprocessing was required. There was an abrupt shift in coding that occurred at the start of 2008 due to a change in reimbursement policies. This coding shift tended to have the greatest effect on cause-specific codes and was less prominent when aggregating into ICD chapters. To adjust for these shifts, regressions were fit to each of the control time series. The outcome was the control time series, and covariates were a cubic spline for date, a dummy variable for pre/post-2008, and dummy variables for month of the year. Observations for the original log-transformed time series were then divided by (intercept + 2008 dummy × coefficient_2008). This effectively adjusts the time series for abrupt changes in 2008. If no change occurred, all values were effectively divided by the intercept. These adjusted time series for Brazil were then standardized before using them in the model.
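The basic transformation described above (substituting 0.5 for zero counts, taking logs, standardizing, and later transforming back) might look like the following sketch. It assumes standardization means subtracting the mean and dividing by the sample SD, which the text does not specify, and it leaves out the Brazil-specific coding adjustment. The function names are hypothetical.

```python
import math
import statistics

def log_standardize(counts, zero_substitute=0.5):
    """Log-transform monthly counts (zeros replaced by 0.5) and standardize.

    Assumption: standardization uses the mean and the sample SD (n - 1).
    Returns the standardized series plus the mean and SD needed to invert it.
    """
    logged = [math.log(c if c > 0 else zero_substitute) for c in counts]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    return [(v - mean) / sd for v in logged], mean, sd

def back_transform(z, mean, sd):
    """Undo the standardization and the log to return to the count scale."""
    return [math.exp(v * sd + mean) for v in z]

# Hypothetical monthly counts for one control series, including a zero month.
counts = [10, 0, 20, 40]
z, mu, sigma = log_standardize(counts)
restored = back_transform(z, mu, sigma)
```

Returning the mean and SD alongside the standardized series keeps the back-transformation exact, which matters because rate ratios and cases prevented are computed on the original scale.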

Model Fitting.

Bayesian structural time-series models provided a convenient framework for fitting these models (18, 31). We used the bsts (30) and CausalImpact (18) packages in R for model fitting and formatting of output. The models ran for 10,000 Markov chain Monte Carlo (MCMC) iterations (burn-in of 1,000 iterations). Contrary to previous publications (18), we held the intercept and seasonality parameters constant over time. To calculate average estimates between countries, we used metaanalyses (SI Methods). The R code used for these analyses, along with the formatted aggregate time series, is included as Datasets S2–S10.

Estimation of the Rate Ratios.

Each of the models was fit to pneumonia data from the pre-PCV period and used to generate counterfactual estimates for a post-PCV evaluation period. Because it takes time for a vaccine program to reach full coverage and for an effect to be evident, there was a gap of 2 y between the time of vaccine introduction and the beginning of the evaluation period, except for Chile, where there were fewer years of post-PCV data available, so the gap was 1 y. For the months included in the evaluation period (Table S3), the counterfactual estimate was Y1 = Σ(predicted pneumonia casest), and the observed number of cases during the same time period was Y2 = Σ(observed pneumonia casest). The vaccine effect (rate ratio) was then calculated as Y2/Y1. The 2.5 and 97.5 percentiles of the MCMC iterations provide the 95% central credible intervals. To visualize the trajectories of the decline in pneumonia rates, we also calculated a rolling rate ratio, where Y1 and Y2 were estimated for rolling 12-mo evaluation windows, the first of which ended 1 mo after the training period (including 1 mo of postvaccine data and 11 mo of prevaccine data).
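The rate ratio calculation can be sketched as follows, assuming each MCMC iteration supplies one vector of predicted monthly cases (on the original scale) for the evaluation period. Using the median of the per-iteration ratios as the point estimate is an assumption here, as is the linear-interpolation percentile; the data and function names are hypothetical.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def rate_ratio_summary(observed, predicted_draws):
    """Return (median, 2.5th, 97.5th percentile) of Y2 / Y1 across draws.

    Y2 is the sum of observed cases over the evaluation period; Y1 is the
    sum of counterfactual cases for the same months in a given draw.
    """
    y2 = sum(observed)
    ratios = sorted(y2 / sum(draw) for draw in predicted_draws)
    return (percentile(ratios, 50),
            percentile(ratios, 2.5),
            percentile(ratios, 97.5))

# Hypothetical 3-mo evaluation period with three counterfactual draws.
observed = [80.0, 80.0, 80.0]
predicted_draws = [
    [100.0, 100.0, 100.0],
    [80.0, 80.0, 80.0],
    [130.0, 130.0, 140.0],
]
rr, rr_low, rr_high = rate_ratio_summary(observed, predicted_draws)
```

A rolling version would apply the same function to each 12-mo window of observed and predicted values in turn. In practice thousands of draws would be used, so the interpolation rule has little influence on the interval.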

Comparing Estimates of Vaccine-Associated Change Between Countries.

The impacts of the conjugate vaccine against pneumonia might vary between countries due to regional differences in serotype distribution, pneumonia etiology, frequency of comorbidities, and vaccine used. However, it can still be useful to formally compare the similarities and differences between studies. To calculate average estimates of the rate ratios where appropriate, we used the random effects metaanalysis function in the metafor package in R (32). The pooled estimates were calculated using the empirical Bayes method (32).
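For readers unfamiliar with random-effects pooling, the sketch below shows the general idea using the DerSimonian-Laird estimator of between-country variance. This is a deliberate substitution: the analysis itself used the empirical Bayes estimator in the metafor R package, which is iterative and can give a somewhat different variance estimate. Inputs are log rate ratios with their variances; all values and names are hypothetical.

```python
import math

def dersimonian_laird(log_rr, variances):
    """Random-effects pooled log rate ratio (DerSimonian-Laird tau^2).

    Returns (pooled log rate ratio, its standard error, tau^2).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(log_rr) - 1)) / c)
    # Random-effects weights add tau^2 to each within-country variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical country-level rate ratios and variances of their logs.
log_rr = [math.log(0.75), math.log(0.85), math.log(0.80)]
variances = [0.01, 0.02, 0.04]
pooled, se, tau2 = dersimonian_laird(log_rr, variances)
pooled_rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

Pooling on the log scale and exponentiating afterward keeps the pooled rate ratio and its interval positive.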

SI Data and Code

The aggregated time series used for these analyses and corresponding R code are included as Datasets S2–S10.

Supplementary Material

pnas.1612833114.sd01.xlsx (35.1KB, xlsx)
pnas.1612833114.sd03.txt (25.5KB, txt)
pnas.1612833114.sd05.csv (191.3KB, csv)
pnas.1612833114.sd06.csv (208.4KB, csv)
pnas.1612833114.sd07.csv (147.3KB, csv)
pnas.1612833114.sd08.csv (194.7KB, csv)
pnas.1612833114.sd09.csv (247.2KB, csv)
pnas.1612833114.sd10.csv (270.6KB, csv)

Acknowledgments

Thanks to Gerardo Chowell for help identifying appropriate datasets for analysis; Rodrigo Fuentes for information on the dataset from Chile; and Claire Broome, Cynthia Whitney, Marie Griffin, Kate O'Brien, Cristiana Toscano, Lucia De Oliveira, Kayoko Shioda, Ted Cohen, and Marc Lipsitch for helpful feedback. This work was funded by the Bill & Melinda Gates Foundation Grant OPP1114733; NIH/National Institute on Aging Grants P30AG021342 (Scholar at the Claude D. Pepper Older Americans Independence Center at Yale University School of Medicine), R01AI123208, and R56AI110449; and NIH National Center for Advancing Translational Science Grants CTSA KL2TR001862 and UL1TR001863 (to E.D.S. and D.M.W.).

Footnotes

Conflict of interest statement: D.M.W. has previously received an investigator-initiated research grant from Pfizer and consulting fees from Pfizer, Merck, GSK, and Affinivax. L.S. and R.J.T. have an ownership interest in Sage Analytica, a research consultancy with government, nongovernment, and pharmaceutical industry clients, including an investigator-initiated research grant from Pfizer (completed in 2013).

This article is a PNAS Direct Submission.

Data deposition: The code and data reported in this paper have been deposited in the GitHub database, https://github.com/weinbergerlab/synthetic-control/, and are available in Datasets S1–S10.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1612833114/-/DCSupplemental.
