eGEMs. 2016 May 18;4(1):1221. doi: 10.13063/2327-9214.1221

Observational Studies of Drug Safety in Multi-Database Studies: Methodological Challenges and Opportunities

Robert W Platt i, Colin R Dormuth ii, Dan Chateau iii, Kristian Filion i
PMCID: PMC4909373  PMID: 27376096

Abstract

Introduction/objective:

The Canadian Network for Observational Drug Effect Studies (CNODES), a network of researchers and databases, is a collaborating center of the Drug Safety and Effectiveness Network. CNODES’ main mandate is to conduct observational studies of drug safety based on queries developed and submitted by Health Canada and other federal, provincial, and territorial stakeholders. Through a case study we explore several methodological opportunities and challenges that arise in distributed pharmacoepidemiology networks.

Case study:

Our case study examines proton pump inhibitors and hospitalization for community-acquired pneumonia. Challenges arise in the design and conduct of studies at individual sites, and then with processes and methods for combining data. On the other hand, distributed networks provide opportunities, such as the ability to detect and understand heterogeneity, in sample sizes that would typically be impossible for a single study.

Conclusions:

Networks such as CNODES provide the opportunity to detect and quantify important safety signals from administrative data, and provide many challenges for methods research in pharmacoepidemiology using distributed data. As networks increase in size and scope of research questions, the need for methodological developments should continue to grow.

Keywords: Methods, Research networks, Data analysis method

Introduction

The Canadian Network for Observational Drug Effect Studies (CNODES) conducts studies of drug safety in a distributed network of databases from seven Canadian provinces, the United States, and the United Kingdom. CNODES is similar to other networks such as the Sentinel,1 PROTECT,2 and AsPEN3 networks. Huang et al.4 review the different approaches and methods of the various networks.

CNODES is a collaborating center of the Drug Safety and Effectiveness Network (DSEN), a joint initiative of the Canadian Institutes of Health Research (CIHR), Health Canada, and other federal and provincial stakeholders. DSEN funds several research teams with the objectives of providing high-quality evidence on drug safety and effectiveness to Canadian regulators, and of developing research capacity in this area. CNODES is a distributed network of researchers and databases across Canada. Seven Canadian provinces contribute administrative claims data, which are supplemented by two external databases. The first is the United Kingdom’s Clinical Practice Research Datalink (CPRD),5 formerly known as the General Practice Research Database, a clinical database containing the records of patients seen at over 680 general practitioner practices. The second is the Thomson Reuters MarketScan Commercial Claims and Encounters Data (Thomson Reuters (Healthcare) Inc., 2006 to 2014), an insurance claims data set. These two databases are included to add sample size and, in the case of the CPRD, detail: it contains clinical information not typically found in administrative databases, including finer-grained data on confounders.

The structure of CNODES has given rise to several methodological opportunities and challenges. Bazelier et al.6 reviewed the literature on distributed network data management and analyses, and noted several methodological strengths and weaknesses of these networks. They stressed the need for detailed protocols and documentation, and the need for careful understanding of heterogeneity. In this paper, we describe a series of methodological challenges that arise in distributed networks and describe some solutions for these challenges. We illustrate these challenges and solutions through a case study conducted by CNODES.

Case Study

The typical CNODES study follows a standardized process.7 Once a research question is proposed by a government stakeholder (usually by Health Canada directly, or by another federal or provincial stakeholder), a study team is developed that includes expertise in pharmacoepidemiology, biostatistics, and the relevant clinical and pharmacological areas. This team includes a researcher and analyst from each site participating in the study. A study protocol and detailed technical analytic protocol are prepared by the study team and distributed to the sites. Site-specific studies following these protocols are conducted at each of the study sites, and the results are then combined using meta-analytic methods.

Filion et al.8 studied the association between proton pump inhibitors (PPIs) and hospitalization for community-acquired pneumonia (HCAP) in the CNODES databases. Separate cohorts of nonsteroidal anti-inflammatory drug (NSAID) users were created at each participating CNODES site: six Canadian provinces (Alberta, Saskatchewan, Manitoba, Ontario, Quebec, and Nova Scotia), the United States (MarketScan), and the United Kingdom (General Practice Research Database (GPRD), now the CPRD). The cohorts were restricted to NSAID users for reasons outlined below. In each cohort separately, high-dimensional propensity scores (hdPS) were used to adjust for confounding, and logistic regression was used to estimate adjusted odds ratios (ORs) for the risk of HCAP within the first six months post-NSAID prescription.

The authors assumed “intention-to-treat”; that is, exposure to PPIs throughout the six-month time window was assumed to be constant based on the initial prescription. Fixed-effects meta-analysis with inverse-variance weighting was used to combine the results across sites and generate a summary estimate of the adjusted OR.
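The pooling step described above can be sketched as follows. This is a minimal illustration of standard inverse-variance fixed-effects pooling on the log odds ratio scale, not the code used by the study team; the function name `fixed_effect_pool` and the site-level numbers are hypothetical and are not the estimates reported by Filion et al.

```python
import math


def fixed_effect_pool(odds_ratios, std_errors, z=1.96):
    """Inverse-variance fixed-effects summary of site-level odds ratios.

    odds_ratios: adjusted ORs, one per site.
    std_errors:  standard errors of the corresponding log ORs.
    Returns (summary OR, lower 95% bound, upper 95% bound).
    """
    log_ors = [math.log(o) for o in odds_ratios]
    # Each site is weighted by the inverse of its variance on the log scale.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )


# Hypothetical site-level results, for illustration only.
site_ors = [0.90, 1.10, 1.30]
site_ses = [0.20, 0.10, 0.40]

summary_or, ci_low, ci_high = fixed_effect_pool(site_ors, site_ses)
print(f"Summary OR {summary_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because the weights are inverse variances, the most precise site (here the second) dominates the summary, and the pooled interval is narrower than that of any single site.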

Figure 1 (from Filion et al.8) gives the forest plot comparing PPI use to no PPI use for the six-month cumulative incidence of HCAP. With the exception of Nova Scotia, all individual study results are centered around the null, and the summary OR was 1.05 (95 percent confidence interval (CI): 0.89–1.25), a strong indication of no effect.

Figure 1.


Forest Plot of Association Between Use of Proton Pump Inhibitors and the Six-Month Cumulative Incidence of Hospitalization for Community-Acquired Pneumonia in a Restricted Cohort of New Users of Nonsteroidal Anti-Inflammatory Drugs (NSAIDs)

Notes: Analyses were adjusted for age, sex, previous nonhospitalized pneumonia, prescription of proton pump inhibitors, histamine-2 receptor antagonists and NSAIDs in the 7–12 months prior to cohort entry, and high-dimensional propensity score decile. General Practice Research Database (GPRD).

Source: Filion et al.8

Methodological Challenges

The network structure of CNODES has given rise to several methodological challenges; the CNODES Methods Team is responsible for addressing these problems by determining best practices and conducting new research as needed. The team includes methods researchers at several of the CNODES sites, as well as trainees dedicated to pharmacoepidemiologic and statistical methods.

The challenges can be grouped into three broad categories: problems related to the conduct of individual-site studies, problems related to combining data across sites (meta-analytic methods), and issues related to the logistics and value of adding data to an existing study across time. We outline several of these problems below, and discuss them in the context of the PPI and HCAP study.

Design

Prior to the work by Filion et al.,8 studies of PPIs and pneumonia had suggested an association, with users of PPIs being at higher risk of pneumonia.9 It has been hypothesized that these studies were affected by two biases: severe protopathic bias, because PPIs may have been prescribed for the symptoms of undiagnosed pneumonia rather than for gastroesophageal reflux disease (GERD); and confounding by indication, due to the GERD itself. To avoid such biases, Filion et al.8 created restricted cohorts of new users of NSAIDs, in which some patients were prescribed PPIs prophylactically, and the treatment-outcome association was thus relatively unconfounded. Their primary result is at odds with prior findings; however, an analysis of an unrestricted cohort of PPI users at one CNODES site, using an approach similar to that of prior studies, showed an increased risk associated with PPIs (adjusted OR 1.24, 95 percent CI 0.96–1.59). This result was consistent with the results of the previous meta-analysis by Eom et al.9 (adjusted OR 1.27, 95 percent CI 1.11–1.46). Chateau et al. found the highest risk close to initiation of treatment, also suggestive of protopathic bias.10 The consistency of these results with those of previous studies that used similar methods suggests that the confounding adjustment by restriction was effective.

This case study illustrates the need for careful design choices in distributed network analyses, and the need for substantial sample sizes that networks can provide. Cohort restriction to reduce confounding significantly affects sample size; the ability to combine data across sites in a network mitigates this concern so that good design choices can be implemented. On the other hand, increased sample size can exaggerate problems due to poor design choices. The re-analysis of the PPI study using designs similar to past studies supports this. Biases in single studies may become more apparent in network analyses because the increased precision from the use of data from several sites and their corresponding meta-analyses leads to reduced variance (it is reasonable to assume that with other sites included, the CI around the 1.24 would be significantly narrower).
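The remark about the CI around 1.24 can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that k sites of equal size each produced the same biased estimate, so that the standard error of the pooled log OR shrinks by a factor of the square root of k. The standard error is back-calculated from the reported 95 percent CI (0.96–1.59) and is therefore approximate; the function names are hypothetical.

```python
import math

Z = 1.96  # normal quantile for a 95 percent CI


def se_from_ci(lower, upper, z=Z):
    """Approximate SE of a log OR recovered from its reported CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_with_k_sites(or_est, lower, upper, k, z=Z):
    """CI for the same OR if k equally sized sites were pooled (assumption)."""
    se_k = se_from_ci(lower, upper, z) / math.sqrt(k)
    centre = math.log(or_est)
    return math.exp(centre - z * se_k), math.exp(centre + z * se_k)


for k in (1, 2, 4, 8):
    low, high = ci_with_k_sites(1.24, 0.96, 1.59, k)
    print(f"k={k}: 95% CI {low:.2f} to {high:.2f}")
```

Under these assumptions the interval excludes the null once a few sites are pooled, which illustrates how added precision can make a design-induced bias look like a confident finding.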

Estimation in a Single Site

In a CNODES study, each site conducts a study based on the standardized protocol using established pharmacoepidemiologic methods.

However, the standardized protocol must take into account a variety of differences between sites. Data structures differ across sites; the CPRD5 is an electronic health record (EHR) database, while the Canadian provincial data sets are insurance claims data. Claims data may contain less information than EHR data, because the reason for collecting the data is financial and administrative rather than being related to care. Statistical control for confounding could therefore be less complete with claims data than with EHR data. In addition, data coding differs across sites: for example, the CPRD uses Read codes for outpatient diagnoses and International Classification of Diseases, 10th Revision (ICD-10) codes for hospitalization data; drug coding varies widely, as do the periods of data availability. Finally, the provincial coverage plans can differ substantially, both in terms of who is covered (for example, in some provinces medications are covered for only those over 65, while in others all ages receive coverage), and in terms of what is covered (substantial differences can exist in coverage of drugs for specific conditions due to differences in provincial formularies; for example, coverage for PPIs differs between provinces). CNODES does not use a common data model. Each site keeps data in the original format and keeps all available data, rather than restricting to variables that can be compiled in a common unified data set. This may have an impact on efficiency, but it does allow maximum flexibility to incorporate all available information and reflect heterogeneity. The protocol must therefore be flexible enough to take into account these differences in data structure and availability.

Further, in a typical single-site pharmacoepidemiologic study, a single data analyst (or a very small number of analysts) would work in close proximity to the principal investigator and the lead methodologist. In a CNODES study, however, the analysts work with a common protocol but are dispersed throughout the network, and contact with the principal investigator and lead methodologist is limited. As such, the protocols must be designed with sufficient clarity to ensure reproducible results in multiple sites. This poses an important challenge for protocol writers: the protocol must be sufficiently detailed to capture site-specific nuances, but clear and straightforward enough to be used by several analysts at a distance.

In such a distributed setting, automated methods for confounding control are ideal. The hdPS approach proposed by Schneeweiss11 provides an efficient way to automate confounding control, and performs well in practice. The hdPS methods also allow confounding control to be optimized to data available in individual sites; while this is a strength, it could also expose between-site heterogeneity because covariate selection is tailored to covariates that have potential to create bias in analyses at the sites. This could ensure minimal bias at each site but demonstrate heterogeneity due to population variations. However, these are relatively novel methods, and their properties should continue to be explored.

In the PPI study, the protocol was standardized, and the hdPS allowed confounding control to be tailored to the data available at each site. The divergent results in Nova Scotia, likely due to prescribing heterogeneity, indicate the usefulness of the hdPS in accounting for site-specific differences.

Meta-Analysis: Choice of Method

CNODES and other related networks perform studies that can be thought of as prospective meta-analyses. Since the study sites are known in advance and are not a random sample from some larger set of potential sites, this situation does not satisfy the assumptions of a random-effects model. Furthermore, with an identical protocol defined a priori, it is reasonable to consider a network study as a fixed-effects problem.12 Nevertheless, heterogeneity may still be present due to, for example, differences in populations or in prescription patterns, and investigators have no easy solution when it arises. In the Filion et al.8 study, the I² heterogeneity statistic13 was 0 and the random-effects estimate and 95 percent CI were essentially identical to the fixed-effects estimates, indicating minimal heterogeneity and that the fixed-effects model was reasonable. However, in other CNODES studies there has been measurable heterogeneity.14,15 Even though heterogeneity should be minimal by design, it should be addressed when it is present. Methods are needed to distinguish between genuine heterogeneity and random noise that is due to the substantial variation in site sample sizes. Authors of such studies need to consider approaches to this problem. Metaregression16 may be a useful approach; however, Hansen et al.17 noted that, in this setting, metaregression should be treated as exploratory and is prone to false positives, a significant concern given the small number of sites.
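The heterogeneity quantities discussed above can be computed as in the following sketch. It uses the standard definitions of Cochran's Q and the I² statistic, together with the DerSimonian-Laird moment estimator of between-site variance as one common choice of random-effects model; the source does not state which random-effects estimator was used, so that choice, the function name `heterogeneity`, and the input values are assumptions made for illustration.

```python
import math


def heterogeneity(log_ors, std_errors):
    """Cochran's Q, I-squared (percent), DerSimonian-Laird tau-squared,
    and the fixed- and random-effects pooled log ORs."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / total_w

    # Q: weighted squared deviations of site estimates from the fixed summary.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1

    # I-squared: share of total variation attributed to heterogeneity.
    i2 = 0.0 if q <= df else (q - df) / q * 100.0

    # DerSimonian-Laird moment estimate of the between-site variance.
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    random = sum(w * y for w, y in zip(re_weights, log_ors)) / sum(re_weights)
    return {"q": q, "i2": i2, "tau2": tau2, "fixed": fixed, "random": random}


# Hypothetical site-level log ORs and standard errors, for illustration only.
result = heterogeneity([0.05, -0.10, 0.12, 0.02], [0.10, 0.15, 0.30, 0.08])
print({name: round(value, 4) for name, value in result.items()})
```

When Q does not exceed its degrees of freedom, both I² and the between-site variance are truncated at zero and the random-effects summary coincides with the fixed-effects one, which is the pattern reported for the PPI study.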

Meta-Analysis: Stopping Rules

The CNODES meta-analysis is designed in advance; that is, the meta-analysis is planned before any site-specific study is done, and site-specific analyses are blinded to other sites’ results. This gives the advantage that, to the extent that it is feasible, effect-measure heterogeneity and heterogeneity due to confounding control can both be minimized.

However, in the CNODES network, some jurisdictions have had difficulties accessing data in a timely fashion due to regulatory and privacy concerns. Site-level results are therefore produced at varying times. It is possible that several sites may complete analyses prior to other sites having access to their data. This raises the question as to whether the data already collected and analyzed from other jurisdictions are sufficient to form a definitive conclusion, or whether it is necessary to wait for the final results from all sites prior to forming a conclusion. Langan et al.18 proposed a method to assess the potential change in statistical significance when a new study is added to a meta-analysis (i.e., the likelihood that a new study could change the conclusions based on the p-value). In a CNODES-sponsored project, Chevance et al.19 extended this method to describe the potential changes in point estimate and heterogeneity when a new study is added. These authors developed a series of contour plots that describe the state of the evidence and the potential impact of a new study; specifically, they showed potential changes in the summary point estimate and the I² statistic, as well as the p-value.

Figure 2 shows the contour plots developed by Chevance for the PPI and HCAP study. It indicates that an additional study would be unlikely to change the result substantially, unless the study were both very large and had a substantial treatment effect. For example, a study of moderate size with a standard error of 0.4 would have to have an OR of at least 2.0 to change the summary OR to 1.08. A study would have to be very large (standard error < 0.2) to create a nonnull conclusion based on the p-value, indicating that a clinically significant change is unlikely.
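The example in the paragraph above can be checked approximately with a short calculation. The sketch below is not the contour-plot method itself: it simply updates a fixed-effects summary with one additional study, recovering the weight of the existing summary from the reported OR of 1.05 and 95 percent CI of 0.89–1.25. Because those figures are rounded, the result is approximate, and the function name `add_study` is hypothetical.

```python
import math

Z = 1.96  # normal quantile for a 95 percent CI


def add_study(summary_or, ci_lower, ci_upper, new_or, new_se, z=Z):
    """Update a fixed-effects summary OR with one additional study.

    The weight of the existing summary is back-calculated from its CI,
    so rounding in the reported bounds carries through to the result.
    """
    old_se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    w_old = 1.0 / old_se ** 2
    w_new = 1.0 / new_se ** 2
    theta = (w_old * math.log(summary_or) + w_new * math.log(new_or)) / (w_old + w_new)
    se = math.sqrt(1.0 / (w_old + w_new))
    return math.exp(theta), math.exp(theta - z * se), math.exp(theta + z * se)


# A new study of moderate size (SE 0.4) reporting an OR of 2.0.
updated_or, low, high = add_study(1.05, 0.89, 1.25, new_or=2.0, new_se=0.4)
print(f"Updated OR {updated_or:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With these inputs the updated summary is close to 1.08 and its interval still includes the null, in line with the statement that a moderately sized study would need an OR of about 2.0 to shift the summary that far.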

Figure 2.


Contour Plots for the Meta-Analysis of the Association Between Proton Pump Inhibitors and the Six-Month Cumulative Incidence of Hospitalization for Community-Acquired Pneumonia in a Restricted Cohort of New Users of Nonsteroidal Anti-Inflammatory Drugs (NSAIDs)

Notes: The top panel gives the meta-analytic summary estimate. The second, third, and fourth panels show contours for the point estimate, p-value, and I² statistic, indicating how each would change with the addition of a new study, indexed by the size of the odds ratio (x-axis) and the standard error (y-axis) for the new study.

Further work should extend these methods. The current approaches allow consideration of only a single additional study, and are based on a fixed effects model. Daigneault (unpublished manuscript) developed a Bayesian method with more flexibility; however, this has not been assessed in practice.

Conclusion

In this paper, we summarize several of the methodological challenges that are faced in analyses of distributed research networks, illustrated through a case study on PPIs and HCAP.

Many of the problems that arise in distributed networks are common to all research in pharmacoepidemiology. The study described here showcases the value of sophisticated design; of access to large data sets with good measurement of outcome, exposure, and confounders; and of strong methods for confounding control. In the “big data” era, it is important and well recognized that large sample sizes alone are not sufficient to ensure unbiased answers to research questions. Sensible epidemiologic design and analysis are perhaps even more important given the increased precision due to large sample sizes.20,21 These concerns would be equally applicable to a large, pooled data analysis; they are not specific to networks. While a pooled analysis does alleviate many of the problems involved in network analyses, the fundamental challenges of design and adjustment in order to provide unbiased estimators remain. However, much work is still needed in this area; newer methods such as double robust methods22,23 and targeted learning24 are promising, but remain relatively underutilized in large-sample pharmacoepidemiology.

What is unique to distributed networks is the need for combining inference across various databases. Methods for combining data from preplanned meta-analyses, for dealing with heterogeneity, and for managing studies when data arrive at variable rates are needed. Methods for understanding and adjusting for heterogeneity, in particular, should be developed. Methods for stopping rules and for cumulative meta-analysis25 may be relevant to distributed network analyses.

Considerable research is ongoing into methods for pooling data while respecting privacy concerns,26 but consideration should be given to the loss of detail that arises when pooling, due to the requirements for data harmonization. Approaches like that of Sentinel, which uses a common data model but allows substantial site-to-site flexibility in data structure and availability, may provide advantages relative to the CNODES system of separate data files or more restrictive common data model approaches.

Networks such as CNODES provide the opportunity to detect and quantify important safety signals from administrative data. Large sample sizes and rapid data access allow this research to be done quickly, but do not eliminate the need for strong methods and careful substantive input. As networks increase in size and scope of research questions, the need for methodological developments should continue to grow.

Acknowledgments

The Canadian Network for Observational Drug Effect Studies (CNODES), a collaborating centre of the Drug Safety and Effectiveness Network (DSEN), is funded by the Canadian Institutes of Health Research (CIHR; Grant Number DSE-111845). We would like to thank the CNODES investigators and collaborators for their contribution to developing the study protocol discussed in this paper. Dr. Platt holds a Chercheur-national (National Scholar) award from the Fonds de Recherche du Québec - Santé and the Albert Boehringer I Chair in Pharmacoepidemiology at McGill University. Dr. Filion holds a New Investigator Award from the CIHR.

References

1. Behrman RE, Benner JS, Brown JS, et al. Developing the Sentinel System — A National Resource for Evidence Development. New England Journal of Medicine. 2011;364(6):498–499. doi: 10.1056/NEJMp1014427.
2. Huerta C, Abbing-Karahagopian V, Requena G, et al. Exposure to benzodiazepines (anxiolytics, hypnotics and related drugs) in seven European electronic healthcare databases: a cross-national descriptive study from the PROTECT-EU Project. Pharmacoepidem Drug Safe. 2015. doi: 10.1002/pds.3825.
3. AsPEN collaborators, Andersen M, Bergman U, et al. The Asian Pharmacoepidemiology Network (AsPEN): promoting multinational collaboration for pharmacoepidemiologic research in Asia. Pharmacoepidem Drug Safe. 2013;22(7):700–704. doi: 10.1002/pds.3439.
4. Huang Y-L, Moon J, Segal JB. A Comparison of Active Adverse Event Surveillance Systems Worldwide. Drug Saf. 2014;37(8):581–596. doi: 10.1007/s40264-014-0194-3.
5. Herrett E, Gallagher AM, Bhaskaran K, et al. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44(3):827–836. doi: 10.1093/ije/dyv098.
6. Bazelier MT, Eriksson I, de Vries F, et al. Data management and data analysis techniques in pharmacoepidemiological studies using a pre-planned multi-database approach: a systematic literature review. Pharmacoepidem Drug Safe. 2015;24(9):897–905. doi: 10.1002/pds.3828.
7. Suissa S, Henry D, Caetano P, et al. CNODES: the Canadian Network for Observational Drug Effect Studies. Open Med. 2012;6(4):e134–40.
8. Filion KB, Chateau D, Targownik LE, et al. Proton pump inhibitors and the risk of hospitalisation for community-acquired pneumonia: replicated cohort studies with meta-analysis. Gut. 2014;63(4):552–558. doi: 10.1136/gutjnl-2013-304738.
9. Eom C-S, Jeon CY, Lim J-W, et al. Use of acid-suppressive drugs and risk of pneumonia: a systematic review and meta-analysis. CMAJ: Canadian Medical Association Journal. 2011;183(3):310–319. doi: 10.1503/cmaj.092129.
10. Chateau D. Addressing Confounding Through Creative Cohort Construction: CNODES analysis of PPIs and Pneumonia. International Health Data Linkage Conference. Vancouver, Canada: 2014.
11. Schneeweiss S, Rassen J, Glynn RJ, et al. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512. doi: 10.1097/EDE.0b013e3181a663cc.
12. Egger M, Davey Smith G, Altman D. Systematic Reviews in Health Care. London, UK: John Wiley & Sons; 2008.
13. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine. 2002;21(11):1539–1558. doi: 10.1002/sim.1186.
14. Dormuth CR, Hemmelgarn BR, Paterson JM, et al. Use of high potency statins and rates of admission for acute kidney injury: multicenter, retrospective observational analysis of administrative databases. BMJ. 2013;346:f880. doi: 10.1136/bmj.f880.
15. Dormuth CR, Filion KB, Paterson JM, et al. Higher potency statins and the risk of new diabetes: multicentre, observational study of administrative databases. BMJ. 2014;348:g3244. doi: 10.1136/bmj.g3244.
16. Thompson S, Higgins J. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine. 2002;21:1559–73. doi: 10.1002/sim.1187.
17. Hansen RA, Zeng P, Ryan P, et al. Exploration of heterogeneity in distributed research network drug safety analyses. Research Synthesis Methods. 2014;5(4):352–370. doi: 10.1002/jrsm.1121.
18. Langan D, Higgins JPT, Gregory W, et al. Graphical augmentations to the funnel plot assess the impact of additional evidence on a meta-analysis. J Clin Epidemiol. 2012;65(5):511–519. doi: 10.1016/j.jclinepi.2011.10.009.
19. Chevance A, Schuster T, Steele R, et al. Contour plot assessment of existing meta-analyses confirms robust association of statin use and acute kidney injury risk. J Clin Epidemiol. 2015. doi: 10.1016/j.jclinepi.2015.05.030.
20. Toh S, Platt R. Is Size the Next Big Thing in Epidemiology? Epidemiology. 2013;24(3):349–351. doi: 10.1097/EDE.0b013e31828ac65e.
21. Hernán MA, Savitz DA. From “Big Epidemiology” to “Colossal Epidemiology”. Epidemiology. 2013;24(3):344–45. doi: 10.1097/EDE.0b013e31828c7694.
22. Tchetgen Tchetgen EJ, Rotnitzky A. Double-robust estimation of an exposure-outcome odds ratio adjusting for confounding in cohort and case-control studies. Statistics in Medicine. 2010;30(4):335–47. doi: 10.1002/sim.4103.
23. Zhang M, Schaubel DE. Contrasting treatment-specific survival using double-robust estimators. Statistics in Medicine. 2012;31(30):4255–68. doi: 10.1002/sim.5511.
24. van der Laan MJ, Rose S. Targeted Learning. Springer; 2011.
25. Pogue JM, Yusuf S. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Controlled Clinical Trials. 1997;18(6):580–93; discussion 661–6. doi: 10.1016/s0197-2456(97)00051-2.
26. El Emam K, Samet S, Arbuckle L, et al. A secure distributed logistic regression protocol for the detection of rare adverse drug events. Journal of the American Medical Informatics Association. 2013;20(3):453–461. doi: 10.1136/amiajnl-2011-000735.

Articles from eGEMs are provided here courtesy of Ubiquity Press
