Dear Editor, In their recent cohort study using administrative healthcare data in France, Pina Vegas et al. [1] conclude that the risk of major adverse cardiovascular events was significantly greater among patients with psoriatic arthritis treated with IL-12/23 and IL-17 inhibitors than among patients treated with TNF inhibitors, while no significantly increased risk was observed with apremilast treatment. The authors’ conclusion rests on weighted hazard ratios whose 95% CIs excluded the null value, which were based on five observed major adverse cardiovascular events among IL-12/23 inhibitor initiators, eight among IL-17 inhibitor initiators, eight among apremilast initiators and 30 among TNF inhibitor initiators. The reported precision of the confidence intervals raises fundamental concerns that the authors did not conduct an appropriate analysis of the weighted data.
Propensity score weighting techniques, such as the inverse probability of treatment weighting used by the authors, create a pseudo-population in which the likelihood of treatment is similar between the two groups being examined (i.e. exchangeability) with respect to the measured covariates used to construct the propensity scores. Therefore, an outcome regression model for treatment in the weighted sample provides estimates unaffected by measured confounding. However, to account for the fact that observations are upweighted or downweighted in the pseudo-population and that the weights are estimated (rather than known with certainty), a robust, sandwich-type estimator is recommended for calculating the variance of the treatment effect estimates [2, 3]. Failure to account for this additional source of variance can lead to spuriously precise estimates when patients experiencing events receive large weights. This issue can be exacerbated in studies with few events, since even small differences in the number of events can have a large impact on the estimated variance and the resulting confidence interval for the effect estimate.
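For readers less familiar with the notation, the quantities discussed above can be written out as follows. This is a generic sketch for a binary treatment and an estimating equation of the form shown; the symbols (T_i, X_i, e, w_i, sw_i, U_i, A, B) are our own labels and are not taken from Pina Vegas et al. [1], whose multi-group analysis may have been specified differently.

```latex
% Binary treatment T_i, measured covariates X_i,
% propensity score e(X_i) = P(T_i = 1 \mid X_i).

% Unstabilized and stabilized inverse probability of treatment weights
w_i  = \frac{T_i}{e(X_i)} + \frac{1 - T_i}{1 - e(X_i)},
\qquad
sw_i = \frac{T_i \, P(T = 1)}{e(X_i)} + \frac{(1 - T_i) \, P(T = 0)}{1 - e(X_i)}

% Robust (sandwich) variance for \hat\theta solving \sum_i sw_i \, U_i(\theta) = 0
\widehat{\mathrm{Var}}(\hat\theta) = A^{-1} B A^{-\top},
\qquad
A = -\sum_i sw_i \, \frac{\partial U_i(\hat\theta)}{\partial \theta^{\top}},
\qquad
B = \sum_i sw_i^{2} \, U_i(\hat\theta) \, U_i(\hat\theta)^{\top}
```

A model-based variance that treats the weights as frequency counts corresponds to using A^{-1} alone, which is where the spurious precision can arise.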
Pina Vegas et al. [1] used weighting to adjust for confounding, and stated that, ‘Stabilized weights were calculated to preserve the same size of the original data and produce an appropriate estimation of the main effect variance’. Although stabilized weights yield a weighted pseudo-population of equal size to the original study population, they do not yield the same number of weighted events as in the original study population. For time-to-event outcomes, the variance of the effect estimate depends primarily on the number of outcome events, not on the sample size. Because the analysis did not appropriately account for the weighting, the estimated variance was artificially small: it was based on the number of weighted events, rather than on the actual observed number of events, resulting in incorrect 95% CIs.
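A small numerical sketch may make this point concrete. The counts below are entirely hypothetical (one treated group, one binary confounder X) and are not drawn from Pina Vegas et al. [1]; they were chosen so that patients with events happen to receive larger weights.

```python
# Hypothetical counts, for illustration only (not data from the original study).
n_total = 1000
# Treated patients by level of a binary confounder X: group size and observed events.
treated = {1: {"n": 20, "events": 5}, 0: {"n": 80, "events": 3}}
# Treated plus comparator patients at each level of X.
stratum_size = {1: 470, 0: 530}

n_treated = sum(s["n"] for s in treated.values())
p_treated = n_treated / n_total  # marginal probability of treatment

weighted_n = 0.0
weighted_events = 0.0
for x, s in treated.items():
    propensity = s["n"] / stratum_size[x]   # P(treated | X = x)
    sw = p_treated / propensity             # stabilized weight for treated patients
    weighted_n += sw * s["n"]
    weighted_events += sw * s["events"]

observed_events = sum(s["events"] for s in treated.values())
print(f"Treated patients: observed {n_treated}, weighted {weighted_n:.1f}")
print(f"Events: observed {observed_events}, weighted {weighted_events:.4f}")
```

In this hypothetical example the stabilized weights return the original number of treated patients, yet the weighted event count is larger than the eight events actually observed, so a variance computed from the weighted events would overstate the available information.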
The impact of using an incorrectly estimated variance can be indirectly observed by comparing the width of the 95% CIs from the unadjusted incidence rate ratios with the width of the 95% CIs that the authors reported for the weighted analyses. Because adjustment for confounding using weighting methods makes modelling assumptions about the underlying data, it tends to yield larger variances, when estimated correctly, than the variance of the unadjusted estimate. However, the results reported by Pina Vegas et al. [1] show the opposite of what would be expected from the methodology they employed. Although the authors did not present unadjusted hazard ratios, these can be approximated by incidence rate ratios calculated from the data presented in Table 2 [1]. Using IL-17 inhibitors as an example, the unadjusted incidence rate ratio (and 95% CI) comparing eight events among 1354.9 person-years with the 30 events observed among 10 519.3 person-years for TNF inhibitor initiators is 2.07 (0.95, 4.52). The confidence limit ratio, calculated by dividing the upper bound of the confidence interval by the lower bound, provides a measure of its width. For this unadjusted rate ratio, it is 4.76 (4.52/0.95). The weighted hazard ratio reported by the authors was similar (1.9), but the 95% CI (1.2, 3.0) was much narrower, with a confidence limit ratio of 2.5. This suggests that, had the confidence interval for this weighted comparison been correctly estimated, it would likely have included the null value.
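The arithmetic behind these figures can be checked with a few lines of code. The sketch below assumes a Wald interval on the log scale, with standard error sqrt(1/a + 1/b) for event counts a and b; the function name `rate_ratio_ci` is ours.

```python
import math

def rate_ratio_ci(events_a, person_years_a, events_b, person_years_b, z=1.96):
    """Incidence rate ratio (a vs b) with a Wald 95% CI on the log scale."""
    irr = (events_a / person_years_a) / (events_b / person_years_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper

# IL-17 inhibitor initiators vs TNF inhibitor initiators (counts quoted in the text).
irr, lower, upper = rate_ratio_ci(8, 1354.9, 30, 10519.3)
clr_unadjusted = upper / lower

# Confidence limit ratio of the weighted hazard ratio CI reported as (1.2, 3.0).
clr_weighted = 3.0 / 1.2

print(f"Unadjusted IRR {irr:.2f} (95% CI {lower:.2f}, {upper:.2f}), CLR {clr_unadjusted:.2f}")
print(f"Weighted HR CLR {clr_weighted:.1f}")
```

Under that assumption the calculation reproduces the unadjusted rate ratio, interval and confidence limit ratio quoted above.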
We urge the authors to correct their analysis and suggest using either a robust variance estimator or bootstrapping to appropriately account for the propensity score weighting. We believe that the findings are not valid as they currently stand.
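As an illustration of the bootstrap option, the sketch below resamples patients with replacement and re-estimates the propensity score and stabilized weights within every resample, so that uncertainty in the weights carries through to the interval. It is a simplified sketch under stated assumptions: the data are hypothetical, there is a single binary confounder, the effect measure is a risk ratio rather than a hazard ratio, and the function names are ours. It is not a reanalysis of the original study.

```python
import math
import random

def weighted_log_risk_ratio(rows):
    """rows: (treated, x, event) tuples. Returns the log risk ratio under
    stabilized weights, or None when it cannot be estimated."""
    p_treated = sum(t for t, _, _ in rows) / len(rows)
    # Propensity score re-estimated from these rows: P(treated | x) per level of x.
    counts = {}
    for t, x, _ in rows:
        n_x, n_tx = counts.get(x, (0, 0))
        counts[x] = (n_x + 1, n_tx + t)
    ps = {x: n_tx / n_x for x, (n_x, n_tx) in counts.items()}
    if any(p in (0.0, 1.0) for p in ps.values()):
        return None  # a stratum lacks one of the treatment arms
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [weighted n, weighted events]
    for t, x, y in rows:
        w = p_treated / ps[x] if t else (1 - p_treated) / (1 - ps[x])
        sums[t][0] += w
        sums[t][1] += w * y
    if sums[0][1] == 0 or sums[1][1] == 0:
        return None  # no events in one arm
    return math.log((sums[1][1] / sums[1][0]) / (sums[0][1] / sums[0][0]))

def bootstrap_ci(rows, n_boot=500, seed=2022):
    """Percentile bootstrap 95% CI for the weighted risk ratio."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(rows, k=len(rows))
        est = weighted_log_risk_ratio(resample)
        if est is not None:
            estimates.append(est)
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return math.exp(lower), math.exp(upper)

def make_rows(treated, x, n, events):
    """Expand a hypothetical cell count into one row per patient."""
    return [(treated, x, 1)] * events + [(treated, x, 0)] * (n - events)

# Hypothetical cohort, for illustration only.
rows = (make_rows(1, 1, 20, 5) + make_rows(1, 0, 80, 3)
        + make_rows(0, 1, 450, 20) + make_rows(0, 0, 450, 10))

point = math.exp(weighted_log_risk_ratio(rows))
lower, upper = bootstrap_ci(rows)
print(f"Weighted risk ratio {point:.2f}, bootstrap 95% CI ({lower:.2f}, {upper:.2f})")
```

For a time-to-event analysis, the same resampling loop would wrap the weighted survival model in place of the risk ratio used here.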
Funding: This work was supported by Janssen Research & Development, LLC.
Disclosure statement: R.Y.S. and K.J.D. are employees of Janssen Pharmaceutical Companies of Johnson and Johnson, and J.J.G. is an employee of Johnson and Johnson; all own stock in Johnson & Johnson. J.A.B. is a former employee of Johnson and Johnson and owns stock in Johnson & Johnson. R.J.D. received research grants from Bayer AG, Novartis AG and Vertex to the Brigham and Women’s Hospital for unrelated projects.
Contributor Information
Robert Y Suruki, Department of Global Epidemiology, Janssen Pharmaceutical Companies of Johnson and Johnson, Spring House, PA.
Rishi J Desai, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA.
Kourtney J Davis, Department of Global Epidemiology, Janssen Pharmaceutical Companies of Johnson and Johnson, Titusville, NJ.
Jesse A Berlin, Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Bala Cynwyd, PA.
Joshua J Gagne, Global Epidemiology, Johnson and Johnson, Cambridge; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Data availability statement
No new data were generated in support of this research; all data relevant to the study are included in the original article.
References
1. Pina Vegas L, Le Corvoisier P, Penso L et al. Risk of major adverse cardiovascular events in patients initiating biologics/apremilast for psoriatic arthritis: a nationwide cohort study. Rheumatology, doi: 10.1093/rheumatology/keab522.
2. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med 2015;34:3661–79.
3. Desai RJ, Franklin JM. Alternative approaches for confounding adjustment in observational studies using weighting based on the propensity score: a primer for practitioners. BMJ 2019;367:l5657.