Author manuscript; available in PMC 2023 January 1.
Published in final edited form as: Clin Pharmacol Ther. 2021 Aug 4;111(1):209–217. doi: 10.1002/cpt.2364

Prevalence of Avoidable and Bias-Inflicting Methodological Pitfalls in Real-World Studies of Medication Safety and Effectiveness

Katsiaryna Bykov 1,*, Elisabetta Patorno 1, Elvira D’Andrea 1, Mengdong He 1, Hemin Lee 1, Jennifer S Graff 2, Jessica M Franklin 1
PMCID: PMC8678198  NIHMSID: NIHMS1733132  PMID: 34260087

Abstract

Many real-world evidence (RWE) studies that utilize existing healthcare data to evaluate treatment effects incur substantial but avoidable bias from methodologically flawed study design; however, the extent of preventable methodological pitfalls in current RWE is unknown. To characterize the prevalence of avoidable methodological pitfalls with potential for bias in published claims-based studies of medication safety or effectiveness, we conducted an English-language search of PubMed for articles published from January 1, 2010, to May 20, 2019, and randomly selected 75 studies (10 case-control and 65 cohort studies) that evaluated the safety or effectiveness of cardiovascular, diabetes, or osteoporosis medications using US health insurance claims. General and methodological study characteristics were extracted independently by two reviewers, and potential for bias was assessed across nine bias domains. Nearly all studies (95%) had at least one avoidable methodological issue known to incur bias, and 81% had at least one of four issues considered major because of their potential to undermine study validity: time-related bias (57%), potential for depletion of outcome-susceptible individuals (44%), inappropriate adjustment for postbaseline variables (41%), or potential for reverse causation (39%). The median number of major issues per study was 2 (interquartile range (IQR), 1–3) and was lower in cohort studies with a new-user, active-comparator design (median 1, IQR 0–1) than in cohort studies of prevalent users with a nonuser comparator (median 3, IQR 3–4). Recognizing and avoiding known methodological study design pitfalls could substantially improve the utility of RWE and confidence in its validity.


Evaluating the safety and effectiveness of health interventions using existing healthcare data has become a valuable tool for generating evidence.1 While randomized controlled trials (RCTs) have been the gold standard for establishing the causal relationship between interventions and health outcomes for decades, they are often expensive, take years to complete, and exclude relevant population groups, such as children, pregnant women, older adults, or individuals with multiple comorbidities.2–4 Data that are routinely generated in the course of clinical care, such as healthcare insurance claims or electronic healthcare records (EHRs), could contribute substantially to evidence on clinical effectiveness and safety of medical interventions at a much lower cost.5

Yet, since these data are not collected for research purposes, their use presents many challenges. Treatments are not randomized in clinical practice, data reflect the purpose of collection and may be missing relevant clinical parameters, and with all data already available at the study design stage, it is easy for researchers to deviate from principles that would otherwise guide the conduct of a trial, if one were conducted instead. Opportunities for bias abound, and findings from many real-world data (RWD) analyses have been subsequently refuted by either RCTs or more rigorously conducted real-world evidence (RWE) investigations.6–9 And yet, it has been shown that RWD analyses can provide valid estimates of treatment effects, and there have been several examples when RWE findings preceded and were later confirmed by RCTs.10–12

While the lack of randomization is often considered the culprit of flawed RWE, many flawed findings from RWD analyses arise from methodological pitfalls in study design and analysis.6,8,9,13 Unlike problems inherent to data, such as missing data or confounding due to unmeasured risk factors, bias due to flawed design is avoidable and can be eliminated by adhering to accepted principles of RWD causal analysis, such as emulation of a hypothetical target trial.10,14–16 Without greater awareness and adherence to these principles, the presence of major methodological problems in RWE hinders both our collective efforts to maximize the utility of existing healthcare data and the ability to interpret and identify high-quality and valid RWE studies.

Thus, understanding the current state of RWE is essential for moving the field forward. Such an assessment is especially timely given the rapidly growing availability of RWD and the exponentially increasing number of RWE studies published each year. The objective of this study was to systematically evaluate the prevalence of major avoidable, investigator-controlled methodological issues in published RWE studies of medication safety and effectiveness. We focused on RWE analyses conducted in US health insurance claims databases, the most common RWD source in the United States.

METHODS

Data sources and searches

To identify studies for our review, we conducted a systematic search in PubMed for cohort or case-control studies that assessed medication safety or effectiveness using US health insurance claims databases and were published between January 1, 2010 and May 20, 2019. We restricted our search to medications for cardiovascular disease, diabetes, asthma or chronic obstructive pulmonary disease, and osteoporosis, the therapeutic areas where many observational studies, including RCT replications, have been published over the past decade and where drug exposures and outcomes are known to be measurable in claims data. We excluded studies in a language other than English, studies conducted in pregnant women or in patients with secondary or gestational diabetes, and studies not conducted in humans. In addition, we hand searched relevant reviews, meta-analyses, and reference lists for articles not identified through the systematic search.

A full list of inclusion and exclusion criteria (Table S1) and the search strategy (Table S2) are reported in the Supplementary Material.

Study selection

We manually reviewed the titles and abstracts of all identified records. From the records deemed eligible for inclusion, we randomly selected 10 case-control studies and 65 cohort studies, for a total of 75 studies. The ratio of case-control studies to cohort studies was fixed to ensure a sufficient number of case-control studies in the sample. Due to the low number of eligible studies evaluating asthma/chronic obstructive pulmonary disease treatments that were published during the assessed period, the random sample was restricted to eligible studies from the other three therapeutic areas. If a study was found not to meet eligibility criteria during the data extraction and evaluation stage, it was replaced with another study randomly selected from the remaining records.

Data extraction and evaluation

Each paper was independently evaluated by two investigators. Data extraction was done using a prespecified data collection form. Discrepancies were resolved by consensus between the two investigators, and among all six reviewing investigators if necessary.

We extracted the following general study characteristics: publication year, affiliation of the corresponding author (academic, pharmaceutical company, or a research organization), source of funding (federal, pharmaceutical company, for-profit research organization, nonprofit research organization, or not mentioned), journal name, specific database utilized, whether the authors utilized other sources of data in addition to claims (e.g., EHRs, registry data, census data, laboratory test results, or National Death Index data), exposure therapeutic area, the types of outcomes assessed (safety, effectiveness, or both), and whether the study was registered.

To assess the potential for bias in each study, we developed a structured questionnaire that evaluated the potential for bias across nine domains (Table 1 and Questionnaire in the Supplement). To avoid the necessity for subjective judgement and assumptions on the part of the reviewers, the questionnaire was developed to include factual questions that could be answered based on information usually presented in published manuscripts. Once a consensus on study design and analysis for each reviewed study was reached, answers were rolled up to identify potential for bias (see Questionnaire in the Supplement).

The domains were chosen based on investigators’ expertise, published literature, and existing tools for evaluation of observational studies, including the GRACE (Good ReseArch for Comparative Effectiveness) checklist and ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions), further modified to specifically fit RWD analyses of medication safety and effectiveness in healthcare claims data.9,10,17–22 Four domains were considered major due to their potential to cause substantial bias and undermine study validity: time-related bias, which included immortal person-time in cohort studies and time-window bias in case-control studies,16 inappropriate adjustment for postbaseline variables (without the use of marginal structural models or g-estimation), depletion of outcome-susceptible individuals, and reverse causation.6,9,16,23–27 Other domains included insufficiently addressed confounding, differential outcome surveillance (detection bias), exposure misclassification (due to investigators’ decisions), outcome misclassification, and informative censoring. Studies were considered to have insufficiently addressed confounding when there was no adjustment for age, gender, treatment indication or its severity, comorbidities, use of other medications, or measures of healthcare utilization, or when the authors did not implement any techniques to control for or evaluate confounders not measurable in claims data (see Questionnaire Section 7.2 in the Supplement for specific examples). We also assessed adjustment for calendar time; however, since in some cases calendar time may act as an instrument or a study period may be very short, we did not count the lack of adjustment for calendar time as a source of bias in this review. Since adjustment for treatment indication may not be needed when the exposure under evaluation is compared with an active comparator from the same therapeutic class (though adjustment for variables indicating disease severity may still be advisable), in a sensitivity analysis we did not consider the lack of adjustment for the indicating disease to be indicative of insufficiently addressed confounding.
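The distinction drawn above between naive adjustment for postbaseline variables and appropriate methods such as marginal structural models can be made concrete with a small simulation. The sketch below is illustrative only, written in Python rather than the SAS used for the paper’s analyses; all variable names and coefficients are invented. It shows that conditioning on a mediator measured after treatment blocks part of the treatment effect, whereas inverse-probability-of-treatment weighting using baseline covariates only recovers the total effect.

```python
# Minimal illustration (not from the paper): why adjusting for a postbaseline
# mediator is biased, and how an IPTW marginal structural model avoids it.
# All coefficients and variable names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200_000
L0 = rng.normal(size=n)                          # baseline confounder
A = rng.binomial(1, 1 / (1 + np.exp(-L0)))       # treatment depends on L0
M = 0.8 * A + 0.5 * L0 + rng.normal(size=n)      # postbaseline mediator
Y = 0.7 * M + 0.4 * L0 + rng.normal(size=n)      # outcome; total effect of A = 0.56

# Biased approach: conditioning on the mediator M blocks the indirect effect of A.
biased = sm.OLS(Y, sm.add_constant(np.column_stack([A, M, L0]))).fit()

# MSM via IPTW: model treatment from baseline covariates only, then fit a
# weighted outcome model that includes no postbaseline variables.
ps = sm.Logit(A, sm.add_constant(L0)).fit(disp=0).predict(sm.add_constant(L0))
w = A / ps + (1 - A) / (1 - ps)                  # inverse-probability-of-treatment weights
msm = sm.WLS(Y, sm.add_constant(A), weights=w).fit()

print(f"effect of A adjusting for postbaseline M: {biased.params[1]:.2f}")  # ~0.00, biased
print(f"effect of A from IPTW-weighted MSM:       {msm.params[1]:.2f}")     # ~0.56
```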

Table 1.

Investigator-controlled bias domains

Bias domain (a): Description
Time-related bias (b): Selective inclusion or exclusion of immortal time (during which the outcome did not and could not occur) from one treatment group, but not the other, in cohort studies; differential duration of exposure assessment windows in case-control studies.
Inappropriate adjustment for postbaseline variables (b): Adjustment for variables that may be on the causal pathway between treatment and outcome without methods that can handle postbaseline variables without bias, such as marginal structural models or g-methods.
Depletion of outcome-susceptible individuals (b): The study population excludes, by design, individuals or eligible follow-up time and outcomes that may have occurred early in therapy; often a consequence of including individuals already on treatment (prevalent users).
Reverse causation (b): Initiation of a medication in response to an early manifestation related to the outcome of interest.
Insufficiently addressed confounding: While confounding can never be ruled out in nonrandomized data, investigators can implement strategies to control for it. We considered studies to have potential for insufficiently addressed confounding if there was no adjustment for age, sex, treatment indication or its severity, comorbidities, prior use of medications, and prior healthcare use, all measurable in claims data. In addition, studies in which treatment was not compared with a similar alternative (active comparator) were considered to have insufficiently addressed confounding if the authors did not conduct additional analyses aimed at evaluating confounding due to factors not available in claims data, such as bias analysis, analysis of control outcomes or exposures, or evaluation of variables not available in claims using other data sources.
Detection bias: Differential surveillance for the outcome across the comparison groups; often a result of differential medical surveillance or healthcare utilization.
Exposure misclassification: For the purpose of this investigation (to evaluate avoidable sources of bias), exposure misclassification was defined as failure to conduct analyses that account for treatment changes during follow-up.
Outcome misclassification: Outcome is based entirely on disease or procedure codes without strategies to improve or evaluate the performance of the outcome identification algorithm.
Informative censoring: Individuals discontinue treatment due to symptoms of the outcome under study or changes in health status related to the outcome.

(a) Details on evaluations are presented in the Questionnaire in the Supplement.
(b) Considered major due to their potential to cause substantial bias.
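Time-related bias, the first of these major domains, can be illustrated with a short simulation. The sketch below uses invented parameters and is not the authors’ analysis: a drug with no effect by construction appears protective once the waiting time between cohort entry and the first prescription fill is credited to the exposed group.

```python
# Illustrative simulation of immortal time bias (hypothetical parameters;
# not from the paper). The drug has NO effect by construction.
import numpy as np

rng = np.random.default_rng(7)
n = 500_000
death = rng.exponential(10.0, n)     # time to death; hazard 0.1/year for everyone
fill = rng.exponential(5.0, n)       # time from cohort entry to first prescription fill
follow = np.minimum(death, 5.0)      # administrative end of follow-up at 5 years
event = death <= 5.0                 # death observed during follow-up

# Biased analysis: anyone who ever fills is "exposed" from cohort entry, so the
# immortal waiting time before the fill is credited to the exposed group.
ever = fill < follow
rate_exp_biased = event[ever].sum() / follow[ever].sum()
rate_unexp_biased = event[~ever].sum() / follow[~ever].sum()

# Correct analysis: split each person's time at the fill date (time-varying
# exposure); pre-fill person-time and pre-fill deaths belong to the unexposed group.
pt_unexp = np.minimum(fill, follow).sum()
pt_exp = np.clip(follow - fill, 0.0, None).sum()
ev_exp = (event & (fill < death)).sum()
ev_unexp = event.sum() - ev_exp

print(f"biased rates:  exposed {rate_exp_biased:.3f} vs unexposed {rate_unexp_biased:.3f}")
print(f"correct rates: exposed {ev_exp / pt_exp:.3f} vs unexposed {ev_unexp / pt_unexp:.3f}")
# Both correct rates are ~0.100/year; the biased exposed rate is spuriously lower.
```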

In addition to the nine bias domains, we also assessed the type of comparator (nonuser comparator vs. an alternative treatment, also known as an active comparator) and whether a study included individuals already on treatment at the start of follow-up (prevalent users) or limited the study population to treatment initiators (new-user design). Both an active comparator and a new-user design are often recommended for observational studies of therapeutics to avoid design errors and reduce bias due to confounding and other avoidable methodological issues.8,10,15,28
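For readers less familiar with these designs, the sketch below shows one minimal way to assemble a new-user, active-comparator cohort from claims-like data. It is written in Python/pandas rather than the SAS used in the reviewed analyses; the table layout, column names, drug labels, and the 365-day washout window are assumptions chosen for illustration, not the authors’ implementation.

```python
# Sketch of new-user, active-comparator cohort assembly from claims-like data.
# Column names, drug labels, and the 365-day washout are hypothetical.
import pandas as pd

rx = pd.DataFrame({                  # dispensing claims (toy data)
    "patient_id": [1, 1, 2, 2, 3],
    "drug": ["drug_A", "drug_A", "drug_B", "drug_A", "drug_B"],
    "date": pd.to_datetime(["2015-03-01", "2015-06-01", "2014-02-01",
                            "2016-08-15", "2015-01-10"]),
})
enroll = pd.DataFrame({              # continuous enrollment spells
    "patient_id": [1, 2, 3],
    "enroll_start": pd.to_datetime(["2013-01-01", "2013-09-01", "2013-11-01"]),
})

# Index date = each patient's first fill of either the study drug or the
# active comparator; the index drug defines the exposure group.
study = rx[rx["drug"].isin(["drug_A", "drug_B"])].sort_values("date")
first = study.groupby("patient_id", as_index=False).first()
first = first.rename(columns={"date": "index_date", "drug": "exposure"})

# New-user requirement: at least 365 days of enrollment (hence observable
# claims) before the index date, with no fill of either drug in that window.
# Because the index fill is each patient's FIRST fill of either drug, the
# "no prior fill" condition already holds; enrollment coverage remains to check.
cohort = first.merge(enroll, on="patient_id")
washout_ok = cohort["index_date"] - cohort["enroll_start"] >= pd.Timedelta(days=365)
cohort = cohort[washout_ok]

print(cohort[["patient_id", "exposure", "index_date"]])
# Follow-up would then start at index_date for each patient, comparing
# initiators of drug_A with initiators of the active comparator drug_B.
```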

All information presented and all analyses reported in the reviewed studies, including sensitivity and secondary analyses, were considered in the assessment, except for the evaluation of outcome misclassification, which was restricted to the primary outcome. When the reported information was not sufficient to answer a question in the questionnaire affirmatively, we conservatively recorded a negative answer.

Data analysis

General and methodological study characteristics were assessed in the overall sample (N = 75) and in the subsets stratified by study design (cohort and case-control). We classified the journals where the reviewed studies were published into four types (clinical medicine, pharmacology or pharmacotherapy, epidemiology or healthcare services, and multidisciplinary) based on the labels or indexes provided by each journal on the Web of Science Master Journal List (available at https://mjl.clarivate.com/home). We also classified journals based on the most recent 1-year impact factor, retrieved from journals’ websites in June 2020.

Analyses included frequencies and percentages for characteristics, and medians with interquartile range (IQR) for the number of major methodological issues and any methodological issues per study.

We further explored the differences across study subgroups among all the included studies and then among the included cohort studies. Subgroups were formed based on the year of publication (studies published between 2010 and 2014 vs. those published between 2015 and 2019); availability of additional sources of data (EHR and/or registry vs. healthcare claims only); and impact factor of journals (above or below the median). Differences in methodological characteristics across the subgroups were assessed using the χ2 test and Fisher exact test when the count in at least one of the subgroups was < 5.
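As an illustration of this decision rule, the sketch below implements it with scipy on invented counts. The paper’s own analyses were run in SAS, and interpreting "count < 5" as any cell of the 2x2 table is an assumption made here for concreteness.

```python
# Sketch of the subgroup-comparison rule (chi-square by default, Fisher exact
# for small counts). Counts are illustrative, not taken from the paper's tables.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def compare_proportions(with_a: int, total_a: int, with_b: int, total_b: int):
    """Compare the prevalence of an issue between two study subgroups."""
    table = np.array([[with_a, total_a - with_a],
                      [with_b, total_b - with_b]])
    if (table < 5).any():
        _, p = fisher_exact(table)            # exact test for sparse tables
        return "Fisher exact", p
    chi2, p, dof, expected = chi2_contingency(table)
    return "chi-square", p

# Hypothetical example: an issue present in 11/27 earlier vs 32/48 later studies.
test, p = compare_proportions(11, 27, 32, 48)
print(f"{test} test, p = {p:.3f}")
```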

Finally, to assess possible correlation between the presence of major methodological issues and the implementation of an active-comparator, new-user design, we assessed the frequency of the four major issues, as well as the median number of major issues per study, across four subgroups of cohort studies: (i) new-user design with an active comparator; (ii) new-user design with a nonuser comparator; (iii) prevalent-user design with an active comparator; and (iv) prevalent-user design with a nonuser comparator. Studies that included both an active comparator and a separate nonuser comparator analysis contributed to both subgroups, with the presence of issues assessed based on that specific design choice.

All analyses were performed using the Statistical Analysis System (SAS) statistical software package, version 9.4 (SAS Institute Inc., Cary, NC). Data analysis code is available from the corresponding author upon request.

RESULTS

Study selection and general characteristics

The search identified 1,598 studies; 1,217 studies were excluded on title/abstract review and 19 were found not eligible during the data extraction process (Figure 1). The 75 studies included in the final sample are reported in Table S3.

Figure 1. Flow diagram of the study selection. *The included studies were randomly selected from the list of eligible studies.

General study characteristics for the overall sample and the study design subsets are presented in Table 2. Most included studies (74%) evaluated cardiovascular medications, 19% evaluated drugs for diabetes, and 7% evaluated medications for osteoporosis. Drug effectiveness studies were more common than safety studies (45% vs. 24%); a third of studies (31%) evaluated both drug safety and effectiveness. In more than half of the studies (61%), other data sources were available in addition to healthcare claims, with EHRs being the most common type. Less than a quarter of studies (21%) were funded by pharmaceutical companies; federal agencies and nonprofit research organizations were the most common funding sources (39% and 32%, respectively). Only one study mentioned that the study protocol was publicly registered.

Table 2.

General characteristics of the reviewed studies

Characteristic All (N = 75) Cohort studies (N = 65) Case-control studies (N = 10)
Publication years, n (%)
 2010–2014 27 (36) 21 (32) 6 (60)
 2015–2019 48 (64) 44 (68) 4 (40)
Corresponding author affiliation, n (%)
 Academic 68 (91) 58 (89) 10 (100)
 Pharmaceutical company 1 (1) 1 (2) 0 (0)
 Contract research organization 6 (8) 6 (9) 0 (0)
Funding source, n (%)
 Federal 29 (39) 25 (38) 4 (40)
 Pharmaceutical company 16 (21) 16 (25) 0 (0)
 For-profit research organization 2 (3) 2 (3) 0 (0)
 Nonprofit 24 (32) 18 (28) 6 (60)
 Not mentioned 4 (5) 4 (6) 0 (0)
Publication journal type, n (%)
 Clinical Medicine 57 (76) 49 (75) 8 (80)
 Pharmacology/Pharmacotherapy 8 (11) 8 (12) 0 (0)
 Epidemiology/health care services 9 (12) 8 (12) 1 (10)
 Multidisciplinary 1 (1) 0 (0) 1 (10)
Journal impact factor, n (%)
 <3 28 (37) 24 (37) 4 (40)
 3–5 26 (35) 21 (32) 5 (50)
 >5 21 (28) 20 (31) 1 (10)
Study registration, n (%) 1 (1) 1 (1) 0 (0)
Claims database vendor, n (%)
 Commercial 33 (44) 29 (45) 4 (40)
 Federal (Medicare, Medicaid, Veterans Affairs) 31 (41) 29 (45) 2 (20)
 Other (e.g., Kaiser) 10 (14) 6 (9) 4 (40)
 Mixed (commercial and federal) 1 (1) 1 (1) 0 (0)
Additional sources of data (a), n (%)
 No linked data (claims only) 29 (39) 26 (40) 3 (30)
 Electronic health records 25 (33) 21 (32) 4 (40)
 Registry 13 (17) 11 (17) 2 (20)
 Other types of data (b) 20 (27) 17 (26) 3 (30)
Exposure therapeutic area, n (%)
 Antidiabetic drugs 14 (19) 12 (19) 2 (20)
 Cardiovascular drugs 56 (74) 49 (75) 7 (70)
 Osteoporosis drugs 5 (7) 4 (6) 1 (10)
Outcomes, n (%)
 Safety only 18 (24) 14 (21) 4 (40)
 Effectiveness only 34 (45) 29 (45) 5 (50)
 Both safety and effectiveness 23 (31) 22 (34) 1 (10)
(a) Not mutually exclusive, as studies may have utilized multiple sources of additional data.
(b) Other types of data include laboratory results, medical chart reviews, in-person interviews or questionnaires, census data, national nursing home administrative data, and mortality data from the Social Security Death Index Master File, the National Death Index, or local (state-level or internal to the facility) death index databases.

Methodological characteristics

Methodological study characteristics for the overall sample and stratified by study design are presented in Table 3. Nearly all studies (95%) had at least one methodological issue (Figure S1) and 81% of studies had at least one of the four major methodological issues known to cause substantial bias (Figure S2).

Table 3.

Methodological characteristics of the reviewed studies

Methodological characteristics Total (N = 75) Cohort studies (N = 65) Case-control studies (N = 10)
Nonuser comparator (a), n (%) 41 (55) 31 (48) 10 (100)
Prevalent-user design, n (%) 38 (51) 28 (43) 10 (100)
Major methodological issues
 Time-related bias (i.e., immortal person-time in cohort studies, time-window bias in case-control studies), n (%) 43 (57) 41 (63) 2 (20)
 Adjustment for postbaseline variables without appropriate statistical models, n (%) 31 (41) 21 (32) 10 (100)
 Depletion of outcome-susceptible individuals, n (%) 33 (44) 23 (35) 10 (100)
 Reverse causation, n (%) 29 (39) 25 (38) 4 (40)
 Number of major methodological issues per study, median (IQR) 2 (1–3) 2 (1–3) 2.5 (2–3)
Other methodological issues
 Insufficiently addressed confounding, n (%) 56 (75) 49 (75) 7 (70)
  Lack of adjustment for any potential confounder available in claims (except calendar time (b)) 54 (72) 47 (72) 7 (70)
   Age 2 (3) 2 (3) 0 (0)
   Gender 5 (7) 4 (6) 1 (10)
   Calendar year 48 (64) 45 (69) 3 (30)
   Indicating disease or its severity 17 (23) 13 (20) 4 (40)
   Comorbidities 2 (3) 2 (3) 0 (0)
   Prior medication use 16 (21) 13 (20) 3 (30)
   Prior healthcare utilization 49 (65) 44 (68) 5 (50)
  Failure to evaluate or control for confounders unavailable in claims 16 (21) 12 (18) 4 (40)
 Detection bias, n (%) 16 (21) 11 (17) 5 (50)
 Exposure misclassification, n (%) 23 (31) 21 (32) 2 (20)
 Outcome misclassification, n (%) 8 (11) 7 (11) 1 (10)
 Informative censoring, n (%) 6 (8) 6 (9) 0 (0)

IQR, interquartile range (Q1–Q3).
(a) In two cohort studies, the treatment under study was compared with both nonuse and an active comparator.
(b) Since calendar time may not always be a confounder and may sometimes act as an instrumental variable, the lack of adjustment for calendar time was not considered a source of bias in this review.

The most common major methodological issue was the potential for time-related bias, which was found in 57% of studies: 63% of cohort studies had the potential for immortal time bias and 20% of case-control studies had the potential for time-window bias. Potential for bias due to the depletion of susceptible individuals was found in 44% of studies, while 41% of studies adjusted for postbaseline variables without appropriate statistical models. The potential for bias due to reverse causation was found in 39% of the reviewed studies. Overall, the median number of major methodological issues per study was 2 (IQR, 1–3).

For the remaining domains, potential for insufficiently addressed confounding was found in 75% of studies. Almost all studies adjusted for age, sex, and comorbidities; however, a quarter of the studies (23%) lacked adjustment for treatment indication or the severity of the indicating disease, 21% lacked adjustment for prior medication use, and 65% did not adjust for any markers of prior healthcare utilization. In the sensitivity analysis, in which studies with an active comparator were not counted as lacking adjustment for the indicating disease or its severity, 55 studies (73%) were still deemed to have potential for insufficiently addressed confounding. Potential for bias due to inappropriate selection of exposure risk windows (exposure misclassification) was found in a third of the reviewed studies (31%), while 11% of studies had potential for outcome misclassification. Potential for bias due to differential outcome detection was found in 21% of studies. Finally, potential for bias due to informative censoring was found in 9% of cohort studies. The median number of methodological issues of any kind per study was 3 (IQR, 2–5), and only four studies (5%) appeared to have none (Figure S1).

Subgroup analyses

We did not observe substantial differences in methodological characteristics of the reviewed studies published in later years (2015–2019) as compared with earlier years (2010–2014), despite the decrease in the use of a nonuser comparison (from 70% to 46%; Tables S4 and S5). There were fewer studies with inappropriate adjustment for postbaseline variables (decrease from 59% to 31%), but more studies with potential for time-related bias (increase from 41% to 67%) in later years.

Median journal impact factor was 3.27, and there were no substantial differences in the prevalence of methodological issues across the subgroups of studies published in journals with impact factor > 3.27, as compared with studies published in journals with impact factor of 3.27 or lower (Tables S6 and S7). As compared with the studies that used claims only (n = 29), studies with linked data (n = 33) had a higher frequency of nonuser comparison (70% vs 38%) and a prevalent-user design (64% vs 38%), as well as a higher number of major methodological issues per study with the median of 2 (IQR 1–3) as compared with the median of 1 (IQR 1–2) in claims-only studies (Tables S8 and S9).

More than half of the studies (55%) utilized a nonuser comparator, and a similar proportion (51%) evaluated prevalent users (Table 3). All case-control studies evaluated prevalent users and compared them with individuals who were not using the treatment of interest (nonusers). Cohort studies with a prevalent-user design and a nonuser comparator (n = 21) had a higher prevalence of all four major methodological issues (Table S10) than studies with a new-user design and an active comparator (n = 27). The median number of major issues per study was 3 (IQR 3–4) in prevalent-user, nonuser-comparison cohort studies, compared with 1 (IQR 0–1) in active-comparator, new-user cohort studies.

DISCUSSION

In this systematic review of a random sample of RWE studies that examined medication safety or effectiveness using US healthcare insurance claims, we found that the majority of studies had at least one major but avoidable methodological pitfall that carried substantial risk of bias. Half of the reviewed studies were found to have two or more major methodological pitfalls. Time-related issues, such as immortal person-time, were the most prevalent major methodological challenges. Potential for bias due to depletion of outcome-susceptible individuals, inappropriate adjustment for variables measured during follow-up, and reverse causation were each found in at least a third of the reviewed studies.

The increasing availability of electronic health data has led to an abundance of studies that utilize these data sources to generate evidence on the effectiveness and safety of treatments. In addition, as RWD reflect the experiences of a wide spectrum of patients using treatments in routine clinical settings, there is also a growing demand for RWE, which is increasingly seen as complementary to data from tightly controlled RCTs.1 Yet, while RWD analysis can generate reliable and valid evidence, the potential for biased findings increases substantially in the absence of a rigorous and appropriate methodological approach to study design and analysis.10,15

The complexity of the data and of the analytic techniques implemented to evaluate the effects of health interventions in RWD makes recognition of major methodological problems difficult, which may be the reason behind the high prevalence of design-related issues in published RWE on medication safety and effectiveness. Papers on methodological issues in observational studies of therapeutics, and on approaches to mitigate them, are published every year, yet their impact and reach seem to be lagging. In our analyses, we did not observe substantial differences in the prevalence of major methodological pitfalls over time, except for inappropriate adjustment for postbaseline variables, which decreased by half. There were no significant differences in the quality of RWE published in journals with higher vs. lower impact factors, suggesting that the current peer review process may not be sufficient to ensure the internal validity and high quality of RWE studies. Many of the existing checklists, such as STROBE (Strengthening The Reporting of OBservational Studies in Epidemiology), focus on reporting and are not designed to detect the risk of bias in RWE analyses. And many journals may be struggling to identify and assign expert reviewers under the increasing volume of RWE submissions.

We observed, however, a lower prevalence of methodological issues in studies that implemented both a new-user design and an active comparator as compared with studies that evaluated prevalent users and compared them with nonusers. This finding confirms prior research showing that many study design errors can be avoided through the use of a new-user design that anchors study timelines around the initiation of treatment, as in designs targeting the emulation of a hypothetical randomized trial.9,29 An active comparator, preferably one given for the same indication and at the same disease stage as the treatment under study, further ensures comparability of exposure groups, minimizes potential for immortal time bias, and reduces confounding by unmeasured factors.10,30 Although new-user designs or active comparators are not always possible, they provide numerous advantages and safeguards against bias.

A few points should be kept in mind when interpreting our findings. In this evaluation, we focused on study design and analysis issues that are under the control of investigators and would be common to medication safety and effectiveness evaluations done using administrative claims. While administrative claims represent the most common source of RWE, RWD also include data collected for research purposes, such as data from registries and prospective cohort studies. The prevalence of major methodological flaws may be different in studies done using these alternative data sources, as well as in other areas of RWE, such as healthcare utilization or medication adherence, particularly if research aims are more descriptive. In addition, since our review was limited to studies that utilized US data sources, our findings may not be generalizable to studies done using non-US data.

Our evaluation was based on our current understanding of sources of bias and challenges in nonrandomized studies of drug safety or effectiveness. As this understanding is constantly evolving, it is possible that there are other mechanisms, not included in our review, through which nonrandomized studies can incur bias. Our questionnaire was developed in one academic institution and was not extensively validated, although it was based on other established tools, such as ROBINS-I, and observational study design and analysis principles largely established in pharmacoepidemiology. It is worth noting that while we deemed time-related bias, depletion of outcome-susceptible individuals, reverse causation, and inappropriate adjustment for postbaseline variables as major sources of bias, the other issues, such as confounding, differential surveillance (detection bias) or informative censoring, may still lead to biased results in a particular study.31,32 Readers can evaluate the questions used to assess each potential bias and make their own judgements about which questions and bias domains are more important.

While every attempt was made to base our review solely on information presented in the reviewed studies, some subjective judgement on the part of the reviewers cannot be excluded. Moreover, ambiguity or insufficient detail in study reporting may have led to misinterpretation of analytic choices and to underestimation or overestimation of the prevalence of well-known study design and analysis pitfalls in our sample. Finally, while we evaluated the design and analytic choices in each study, we did not quantify the amount of bias that resulted from these choices, which could range from substantial to minimal. Potential for bias does not mean that study results are substantially biased, and studies with multiple sources of bias may end up with less biased results than studies with only one critical source of bias. Nor did we evaluate the appropriateness of either the data source or the study design for answering the question of interest. Ultimately, however, researchers should aim to avoid all possible study design flaws to minimize bias and maximize study validity, keeping in mind that some questions may not be answerable with available data.

CONCLUSION

In conclusion, the prevalence of well-known study design and analysis pitfalls that may lead to biased results is high in published RWE studies of medication safety and effectiveness. Producing high-quality RWE remains a challenge; however, this challenge is not insurmountable, as has been shown by many successful and valid analyses of real-world data.6,33 Our findings, while bringing into focus the quality of current medication safety and effectiveness RWE studies, should not be interpreted as questioning the value of observational research in general. Well-conducted RWE studies can generate much-needed evidence that is often complementary to evidence from RCTs.

The challenge lies in ensuring that real-world data are analyzed appropriately and, at least for the time being, in being able to recognize flawed study designs and analyses. As our review and the recent barrage of low-quality studies on therapeutics for the COVID-19 pandemic have shown, conducting and analyzing real-world studies requires substantial expertise not only in the relevant clinical areas, but also in the methodologies needed to analyze RWD and in the data sources utilized. Many nonrandomized studies fail to produce high-quality evidence not because available methods cannot deal with the challenges of nonrandomized studies, but because of a failure to adhere to known pharmacoepidemiologic principles. Researchers should either ensure they have a clear understanding of how to validly analyze RWD or have experts with relevant experience and expertise on their teams. Journals should seek out methods-trained reviewers with a publication track record in nonrandomized research based on RWD. Professional societies should be more proactive in educating both their own membership and professionals from other disciplines that rely on RWE, and should work to develop practical guidelines and tutorials.

Recognizing that analyzing RWD is challenging should not be a detriment to generating and using RWE in decision making, but it should represent a call for improvement of the quality of real-world studies that needs to be addressed by all involved stakeholders.

Supplementary Material

Supplement

Study Highlights.

WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?

  • Many analyses of real-world data incur substantial, but avoidable bias from methodological pitfalls in study design and analysis.

WHAT QUESTION DID THIS STUDY ADDRESS?

  • This study quantified the prevalence and frequency of avoidable, bias-inducing methodological issues in real-world studies that utilized US healthcare claims to evaluate treatment effects.

WHAT DOES THIS STUDY ADD TO OUR KNOWLEDGE?

  • The majority of RWE studies suffer from self-inflicted methodological pitfalls that may substantially undermine study validity; time-related bias was the most frequent methodological challenge.

HOW MIGHT THIS CHANGE CLINICAL PHARMACOLOGY OR TRANSLATIONAL SCIENCE?

  • Real-world evidence has the potential to generate robust evidence on what works in real practice; however, opportunities for bias abound and many come from avoidable methodological missteps in study design and analysis. Recognizing and avoiding known methodological pitfalls could substantially improve the quality of real-world studies, ultimately increasing confidence in evidence generated from them.

FUNDING

This study was funded by a contract from the National Pharmaceutical Council. K.B. was supported by a training grant from the National Institute of Child Health and Human Development (T32HD40128). E.P. was supported by a career development grant K08AG055670 from the National Institute on Aging.

CONFLICT OF INTEREST

K.B. received consulting fees from Alosa Health, unrelated to the submitted work. E.P. was an investigator of an investigator-initiated grant to the Brigham and Women’s Hospital from Boehringer Ingelheim, not related to the topic of the submitted work. J.S.G. was an employee of the National Pharmaceutical Council, a nonprofit health policy research organization funded by US biopharmaceutical companies. All other authors declared no competing interests for this work.

Footnotes

SUPPORTING INFORMATION

Supplementary information accompanies this paper on the Clinical Pharmacology & Therapeutics website (www.cpt-journal.com).

References

1. Corrigan-Curay J, Sacks L & Woodcock J Real-world evidence and real-world data for evaluating drug safety and effectiveness. JAMA 320, 867–868 (2018).
2. Black N Why we need observational studies to evaluate the effectiveness of health care. BMJ 312, 1215–1218 (1996).
3. Schneeweiss S et al. Real world data in adaptive biomedical innovation: a framework for generating evidence fit for decision-making. Clin. Pharmacol. Ther. 100, 633–646 (2016).
4. Booth CM & Tannock IF Randomised controlled trials and population-based observational research: partners in the evolution of medical evidence. Br. J. Cancer 110, 551–555 (2014).
5. Schneeweiss S Improving therapeutic effectiveness and safety through big healthcare data. Clin. Pharmacol. Ther. 99, 262–265 (2016).
6. Hernán MA et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 19, 766–779 (2008).
7. Lévesque LE, Hanley JA, Kezouh A & Suissa S Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes. BMJ 340, b5087 (2010).
8. Emilsson L, García-Albéniz X, Logan RW, Caniglia EC, Kalager M & Hernán MA Examining bias in studies of statin treatment and survival in patients with cancer. JAMA Oncol. 4, 63–70 (2018).
9. Dickerman BA, García-Albéniz X, Logan RW, Denaxas S & Hernán MA Avoidable flaws in observational analyses: an application to statins and cancer. Nat. Med. 25, 1601–1606 (2019).
10. Franklin JM & Schneeweiss S When and how can real world data analyses substitute for randomized controlled trials? Clin. Pharmacol. Ther. 102, 924–933 (2017).
11. Patorno E, Schneeweiss S, Gopalakrishnan C, Martin D & Franklin JM Using real-world data to predict findings of an ongoing phase IV cardiovascular outcome trial: cardiovascular safety of linagliptin versus glimepiride. Diabetes Care 42, 2204–2210 (2019).
12. Schneeweiss S, Seeger JD, Landon J & Walker AM Aprotinin during coronary-artery bypass grafting and risk of death. N. Engl. J. Med. 358, 771–783 (2008).
13. Suissa S Immortal time bias in observational studies of drug effects. Pharmacoepidemiol. Drug Saf. 16, 241–249 (2007).
14. Hernán MA & Robins JM Using big data to emulate a target trial when a randomized trial is not available. Am. J. Epidemiol. 183, 758–764 (2016).
15. Stürmer T, Wang T, Golightly YM, Keil A, Lund JL & Jonsson Funk M Methodological considerations when analysing and interpreting real-world data. Rheumatology (Oxford) 59, 14–25 (2020).
16. Suissa S & Dell’Aniello S Time-related biases in pharmacoepidemiology. Pharmacoepidemiol. Drug Saf. 29, 1101–1110 (2020).
17. Bykov K, He M, Franklin JM, Garry EM, Seeger JD & Patorno E Glucose-lowering medications and the risk of cancer: a methodological review of studies based on real-world data. Diabetes Obes. Metab. 21, 2029–2038 (2019).
18. Sterne JAC et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355, i4919 (2016).
19. Dreyer NA, Bryant A & Velentgas P The GRACE checklist: a validated assessment tool for high quality observational studies of comparative effectiveness. J. Manag. Care Spec. Pharm. 22, 1107–1113 (2016).
20. Berger ML et al. A questionnaire to assess the relevance and credibility of observational studies to inform health care decision making: an ISPOR-AMCP-NPC Good Practice Task Force report. Value Health 17, 143–156 (2014).
21. Suissa S & Azoulay L Metformin and the risk of cancer: time-related biases in observational studies. Diabetes Care 35, 2665–2673 (2012).
22. Hernán MA, Hernández-Díaz S & Robins JM A structural approach to selection bias. Epidemiology 15, 615–625 (2004).
23. Robins JM, Hernán MA & Brumback B Marginal structural models and causal inference in epidemiology. Epidemiology 11, 550–560 (2000).
24. Schisterman EF, Cole SR & Platt RW Overadjustment bias and unnecessary adjustment in epidemiologic studies. Epidemiology 20, 488–495 (2009).
25. Walker AM Confounding by indication. Epidemiology 7, 335–336 (1996).
26. Gruber S et al. Design and analysis choices for safety surveillance evaluations need to be tuned to the specifics of the hypothesized drug-outcome association. Pharmacoepidemiol. Drug Saf. 25, 973–981 (2016).
27. Moride Y & Abenhaim L Evidence of the depletion of susceptibles effect in non-experimental pharmacoepidemiologic research. J. Clin. Epidemiol. 47, 731–737 (1994).
28. Ray WA Evaluating medication effects outside of clinical trials: new-user designs. Am. J. Epidemiol. 158, 915–920 (2003).
29. Hernán MA, Sauer BC, Hernández-Díaz S, Platt R & Shrier I Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. J. Clin. Epidemiol. 79, 70–75 (2016).
30. Lund JL, Richardson DB & Stürmer T The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application. Curr. Epidemiol. Rep. 2, 221–228 (2015).
31. Gopalakrishnan C, Bykov K, Fischer MA, Connolly JG, Gagne JJ & Fralick M Association of fluoroquinolones with the risk of aortic aneurysm or aortic dissection. JAMA Intern. Med. 180, 1596–1605 (2020).
32. Dormuth CR et al. Statin adherence and risk of accidents: a cautionary tale. Circulation 119, 2051–2057 (2009).
33. Franklin JM et al. Emulating randomized clinical trials with nonrandomized real-world evidence studies: first results from the RCT DUPLICATE initiative. Circulation 143, 1002–1013 (2021).
