Abstract
Background and Objectives
Stacked proportional bar graphs (nicknamed “Grotta bars”) are commonly used to visualize functional outcome scales in stroke research and are also used in other domains of neurology research. While Grotta bars lend themselves to a straightforward causal interpretation in ideal randomized controlled trials, in observational studies they generally cannot be interpreted causally when they show unadjusted, confounded comparisons. In a sample of recent observational neurology studies with confounding-adjusted effect estimates, we aimed to determine the frequency with which Grotta bars were used to visualize functional outcomes and how often unadjusted Grotta bars were presented without an accompanying adjusted version. We also assessed the methods used to generate adjusted Grotta bars.
Methods
We identified the 15 top-ranked clinical neurology journals, according to journal impact factor, publishing full-length original research in English. Using PubMed, we retrieved all records published in these journals between 2020 and 2021 after applying a filter for observational studies. We included and systematically examined all observational studies that aimed to identify a cause-and-effect relationship, had an ordinal functional outcome, and reported a confounding-adjusted effect estimate. We determined whether at least 1 comparison using Grotta bars was present, whether the visualized comparisons were adjusted, and which adjustment strategies were applied to generate these graphs.
Results
A total of 250 studies met all inclusion criteria. Of these, 93 (37.2%) used Grotta bars to depict functional outcome scale distributions, with 76 (81.7%) presenting only Grotta bars without model-based adjustment. Grotta bars were most commonly presented in studies of stroke populations: 87 of the 192 stroke studies (45.3%) presented them. Among the 17 studies that presented Grotta bars adjusted using a model, the adjustment strategies included propensity score matching (n = 10; 58.8%), regression (n = 6; 35.3%), and inverse probability weighting (n = 1; 5.9%).
Discussion
Studies that presented adjusted associations for functional outcomes commonly showed only unadjusted Grotta bars, which alone have little value for causal questions. In observational research, Grotta bars are most informative if an adjusted version, aligning with adjusted effect estimates, is presented directly alongside the unadjusted version. Based on our findings, we offer recommendations to help authors generate more informative Grotta bars and to facilitate correct interpretation for readers.
Introduction
Stacked proportional bar graphs are commonly used to present functional outcomes that quantify global physical functional ability on an ordinal scale.1 In the stroke field, these visualizations acquired the nickname “Grotta bars” after James C. Grotta used them to compare functional outcome scales between intervention groups in the 1995 rt-PA trial.2,3 Thereafter, Grotta bars became standard in the reporting of clinical stroke studies.1 These graphs are especially attractive for presenting ordinal outcomes, such as the 7-point modified Rankin Scale (mRS).1,4 They allow for the visual representation of each level of the scale,1,3 which is preferable to discarding information by dichotomizing ordinal outcomes (e.g., “good outcome” vs “poor outcome”).4-8 By depicting all individual levels of an ordinal scale, Grotta bars granularly portray the difference (“shift”) of the full mRS distribution between the exposure groups.3,4 These characteristics make Grotta bars particularly useful to researchers, policymakers and decision makers, clinicians, and patients when interpreting study results.5,8,9
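To make the construction of such a graph concrete, the following minimal sketch in R (the language used for our own analyses) draws unadjusted Grotta bars from simulated data. The data frame, variable names, and values are hypothetical illustrations and do not come from any study discussed here.

```r
# Minimal sketch: unadjusted "Grotta bars" from simulated data.
# All data and variable names are hypothetical.
library(dplyr)
library(ggplot2)

set.seed(42)
toy <- data.frame(
  group = rep(c("Exposed", "Unexposed"), each = 200),
  mrs   = factor(sample(0:6, 400, replace = TRUE), levels = 6:0)  # mRS 0-6
)

# Proportion of patients at each mRS level within each exposure group
props <- toy |>
  count(group, mrs) |>
  group_by(group) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

# Horizontal stacked proportional bar graph showing every mRS level
ggplot(props, aes(x = group, y = prop, fill = mrs)) +
  geom_col(width = 0.6) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Proportion of patients", fill = "mRS")
```

Because every level of the scale is retained, the resulting bars display the full outcome distribution rather than a dichotomized summary.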
In ideal, large randomized controlled trials, any visible shift in the distribution of the outcome between exposure groups has a direct causal interpretation that corresponds to the effect of the intervention. The interpretation of Grotta bars in observational studies is more complex. Shifts in unadjusted Grotta bars showing only observed data (i.e., without confounding control) do not answer causal questions and have the potential to mislead readers if interpreted causally.4 However, it is possible to produce adjusted Grotta bars that are causally interpretable and retain the advantages of this type of visualization by applying suitable causal inference methods.10
A recent study illustrated this problem by using data from a large registry to examine the relationship between being “discharged home” vs “discharged elsewhere” and functional outcomes after a stroke or transient ischemic attack (TIA).4 The unadjusted results seemed to show a dramatic difference. However, evidence of any association between the exposure and outcome disappeared once inverse probability weighting was applied to control for the following confounding variables that were determined a priori: age, sex, presence of a disease with a life expectancy of less than 1 year, TIA diagnosis, mRS scores at hospital admission, NIH Stroke Scale score at hospital admission, number of days in hospital before discharge, and mRS scores at discharge. The authors then generated 2 Grotta bar graphs depicting the unadjusted and adjusted outcome comparisons (Figure 1). The large shift seen in the unadjusted Grotta bars was greatly attenuated after adjustment for confounding, which aligned with the computed unadjusted and adjusted common odds ratios.4
Figure 1. Depiction of Example Grotta Bars.
Grotta bars depicting the distributions of mRS scores for patients with ischemic stroke by discharge type (discharged home or discharged elsewhere) without confounding adjustment (A) and with confounding adjustment using inverse probability weighting (B). Figure adapted from Rohmann et al. (Figure 2).4 Not all displayed percentages add up to 100% due to rounding. mRS = modified Rankin Scale.
Researchers can use several different methods to generate confounding-adjusted effect estimates10 and Grotta bars.4 In observational neurology studies, confounding control is frequently performed using traditional outcome regression or other methods such as propensity score–based adjustment (including weighting and matching). A few articles in the stroke literature have presented confounding-adjusted Grotta bars to accompany their confounding-adjusted effect estimates,11,12 although this is uncommon.
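As an illustration of the regression-based route, the sketch below shows how model-adjusted outcome distributions, which could then be drawn as Grotta bars, can be obtained from an ordinal (proportional odds) model by standardizing predicted probabilities over the confounder distribution. The simulated data and variable names are hypothetical, and this is only one of several possible model-based approaches rather than the procedure of any particular study.

```r
# Minimal sketch: confounding-adjusted mRS distributions from an ordinal
# (proportional odds) regression followed by standardization.
# All data and variable names are simulated and hypothetical.
library(MASS)

set.seed(3)
n <- 500
age     <- rnorm(n, 70, 10)                              # toy confounder
exposed <- rbinom(n, 1, plogis(-0.05 * (age - 70)))      # exposure depends on age
lambda  <- pmax(0.2, 2 + 0.03 * (age - 70) - 0.3 * exposed)
mrs     <- factor(pmin(6, rpois(n, lambda)), levels = 0:6, ordered = TRUE)
dat     <- data.frame(age, exposed = factor(exposed), mrs)

fit <- polr(mrs ~ exposed + age, data = dat, Hess = TRUE)

# Predict every patient's mRS probabilities under each exposure level,
# then average over the observed confounder distribution (standardization)
dat_exp <- dat;   dat_exp$exposed   <- factor(1, levels = c(0, 1))
dat_unexp <- dat; dat_unexp$exposed <- factor(0, levels = c(0, 1))

p_exposed   <- colMeans(predict(fit, newdata = dat_exp,   type = "probs"))
p_unexposed <- colMeans(predict(fit, newdata = dat_unexp, type = "probs"))

# These two standardized distributions are what a model-adjusted
# Grotta bar graph would display for the two exposure groups
round(rbind(exposed = p_exposed, unexposed = p_unexposed), 3)
```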
While we know that this phenomenon is relevant in the clinical literature about stroke populations, it may also be relevant to research examining other neurologic diseases in which ordinal functional outcomes are used. In observational neurology studies that adjusted for confounding, we aimed to assess how often Grotta bars were used to visualize ordinal functional outcomes, how often adjusted effect estimates were accompanied by adjusted Grotta bars, and which statistical methods were used to generate confounding-adjusted Grotta bars.
Methods
We systematically reviewed observational studies in top-ranking clinical neurology journals. No informed consent or ethics approval was required for this cross-sectional study of published literature.
Standard Protocol Approvals, Registrations, and Patient Consents
All protocols underwent feasibility testing and were pre-registered on our Open Science Framework (RRID: SCR_003238) repository, which also contains full protocols, protocol modifications, and a full abstraction table.13
Journal Screening
We retrieved a list of all journals indexed in the 2021 Science Citation Index Expanded classification of the Journal Citation Reports (RRID: SCR_017656) Clinical Neurology category and then sorted this list according to the 2021 Journal Impact Factor. Starting with the highest impact factor, 2 independent reviewers (M.R.F. and E.S.L.) evaluated whether each journal (1) published articles in English and (2) published full-length original research. There were no discrepancies between reviewers. The top 15 journals meeting these inclusion criteria were incorporated into our search strategy (eTable 1).
Search Strategy
Our search strategy was developed in consultation with an information specialist (L.J.). All journals included in our search strategy were indexed in PubMed (RRID: SCR_004846) in 2020 and 2021. We retrieved from PubMed all bibliographic records published in the included journals, either in print or electronically, between 2020 and 2021, and applied a validated search filter to identify observational research (sensitivity: 92.4%, specificity: 79.7%).14 The full search strategy is detailed in eTable 2.
Inclusion and Exclusion Criteria
We included all articles in our sample that (1) were full-length, original research; (2) were written in English; (3) had human participants; (4) were observational studies that aimed to identify a cause-effect relationship; (5) had a functional outcome as a dependent variable of any analysis within the study; and (6) contained a confounding-adjusted effect estimate for the functional outcome.
We excluded articles that were (1) short-form research, such as a brief report, a research letter, an article without a Background, Methods, Results, or Discussion section, or not original research; (2) published in a language other than English; (3) animal studies; (4) prediction model development studies, systematic reviews, or nonobservational research; (5) lacking a functional outcome as a dependent variable of the analysis; or (6) lacking an effect estimate for the functional outcome or containing only effect estimates that were not adjusted for confounding.
Article Screening
Both title/abstract screening and full-text screening were conducted using Rayyan QCRI (RRID: SCR_017584). All bibliographic records retrieved from the search strategy first underwent title/abstract screening by 2 independent reviewers (M.R.F. and either E.S.L. or E.T.C.). If reviewers were unsure or disagreed about whether an article met all inclusion criteria, the article proceeded to full-text screening, during which 2 independent reviewers (M.R.F. and either E.F., E.S.L., or E.T.C.) evaluated its qualification for inclusion. All 4 reviewers (M.R.F., E.F., E.S.L., E.T.C.) resolved discrepancies arising from full-text screening through group consensus. If consensus could not be achieved, a fifth reviewer (J.L.R.) was consulted for arbitration.
Abstraction
Each article that met all inclusion criteria was abstracted by 2 independent abstractors (M.R.F. and either E.F., E.S.L., or E.T.C.). Abstractors first confirmed that the article met all inclusion criteria.
The following data were abstracted for analysis:
- Characteristics of each article (study design, population, functional outcome).
- Presence of Grotta bars visualizing the functional outcome (yes/no).
- Presence of figures that were not Grotta bars to visualize the functional outcome (yes/no).
- For each Grotta bar graph: was a model-based adjustment method (e.g., ordinal regression, propensity score matching, inverse probability weighting) applied to create the bars? If yes, which method was applied?
- For each Grotta bar graph: was it stratified?
Statistical Analysis
In this descriptive analysis, summary statistics were calculated using R (version 4.2.2) and RStudio (version 2023.03.0+386). Frequencies and percentages were used to describe all categorical variables.
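As a minimal sketch of these descriptive summaries in R (the data frame and variable name are hypothetical; the counts simply mirror those reported in the Results):

```r
# Counts and percentages of a categorical variable (hypothetical variable name)
dat <- data.frame(uses_grotta_bars = rep(c("yes", "no"), times = c(93, 157)))
tab <- table(dat$uses_grotta_bars)
cbind(n = tab, percent = round(100 * prop.table(tab), 1))
```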
Grotta bars that were generated using a model-based adjustment method were categorized differently from those that were stratified. Although stratification is a valid strategy to remove confounding, it only removes confounding related to the stratifying variable(s). Grotta bars comparing exposure groups stratified by only a few variables (for example, only by sex) are likely not sufficient to address confounding from the remaining variables. For this reason, we considered stratified Grotta bars as a standalone category.
Per our inclusion criteria, we sought to include studies with causal aims. Adjustment for confounding is a characteristic of causal research15; however, ascertaining whether studies have causal intentions is often not straightforward. The language authors use can be unreliable when determining whether a study has causal aims.16,17 Furthermore, the statistical methods applied in causal and predictive studies often overlap, with interpretations that are frequently conflated in health research.18,19
Our complete “likely causal” sample contained studies that the review team identified as having potential causal aims. Some of these articles also had characteristics of predictive research; however, when assessed for inclusion, they could not be definitively ruled out as lacking causal aims. We further conducted a sensitivity analysis with stricter criteria for studies to be considered “very likely causal.” This sensitivity analysis was not prespecified in the original study pre-registration. More details about the sensitivity analysis can be found in the supplement (eAppendices 1 and 2).
Data Availability
All data and code are openly available and can be accessed on our GitHub repository.20
Results
Study Sample
After screening all 4,404 retrieved records, our overall abstracted sample consisted of 250 articles (Figure 2). These articles included the following patient populations: stroke (n = 192; 76.8%), multiple sclerosis (n = 25; 10%), Parkinson disease (n = 12; 4.8%), traumatic brain injury (n = 4; 1.6%), Huntington disease (n = 3; 1.2%), encephalitis (n = 3; 1.2%), Guillain-Barré (n = 3; 1.2%), and other neuropathologies (n = 10; 4.0%) (eTable 3). Two studies contained dual-pathology populations (hemorrhagic stroke and traumatic brain injury; ischemic stroke and acute myocardial infarction) and were each counted twice, once for each patient population.
Figure 2. PRISMA Flow Diagram for Study Selection.
“Not full-length original research” indicates short-form research (e.g., brief report; research letter; an article without a Background, Methods, Results, or Discussion section; or not original research). “Excluded study design” indicates a prediction model, systematic review, or nonobservational research. “Does not meet functional outcome criteria” indicates that no outcome in the study meets our functional outcome criteria (given in eAppendix 3: Definitions). “No adjusted effect estimate reported” indicates no effect estimate for the functional outcome reported or that the only reported effect estimate was not adjusted. Records that seemed to meet all inclusion criteria during title/abstract screening underwent another validation that all inclusion criteria were met. PRISMA = Preferred Reporting Items for Systematic reviews and Meta-Analyses.
Prevalence of Grotta Bars
In our overall sample (n = 250), 93 studies (37.2%) used at least 1 Grotta bar graph to report exposure-outcome relationships (Figure 3). Some included studies presented more than 1 Grotta bar graph, but the same functional outcome was visualized across all Grotta bars in these studies. The mRS score was the most frequently visualized functional outcome (n = 86; 92.4%), followed by the NIH Stroke Scale (n = 2; 2.2%), Guillain-Barré syndrome disability score (n = 2; 2.2%), Pediatric Stroke Outcome Measure (n = 1; 1.1%), Glasgow Outcome Scale–Extended (n = 1; 1.1%), and the Rankin Scale (n = 1; 1.1%) (eTable 4). Most studies using Grotta bars had a stroke patient study population (n = 87; 93.5%), followed by encephalitis (n = 3; 3.2%), Guillain-Barré (n = 2; 2.2%), traumatic brain injury (n = 1; 1.1%), and acute myocardial infarction (n = 1; 1.1%). Therefore, 45.3% (87/192) of articles with stroke patient populations presented Grotta bars.
Figure 3. Functional Outcome Visualization in the Overall Sample.
This table outlines whether each study contained Grotta bars that were unadjusted, stratified, and/or adjusted using a model to visualize functional outcomes, in addition to whether other visualization strategies were used to depict functional outcomes. The bar graph to the right of the table depicts the number of studies using each combination of visualization strategies as an absolute count. An article was classified as having “other visualization types” if that article presented at least 1 visualization of the functional outcome that was not a Grotta bar graph.
Adjusted Grotta Bars
In accordance with our inclusion criteria, all studies in our sample reported effect estimates for 1 or more functional outcome(s) that were adjusted using a model-based approach. Among the 93 studies with Grotta bars, only 17 (18.3%) depicted Grotta bars that were adjusted using a model-based approach. The model-based adjustment strategies that were used in our sample included propensity score matching (n = 10; 58.8%), ordinal regression (n = 6; 35.3%), and inverse probability weighting (n = 1; 5.9%).
Furthermore, 25 of the 93 studies (26.9%) with Grotta bars visualized functional outcomes using Grotta bars stratified by at least 1 variable. Of these, 20 studies (80.0%) did not additionally present Grotta bars adjusted using a model. Figure 3 provides an overview of the Grotta bar adjustment strategies (unadjusted, stratified, and/or model-based adjustment) and indicates whether articles used other visualization strategies.
Sensitivity Analysis
The results of the sensitivity analysis, which only included studies meeting stronger criteria to be considered causally aimed, were very similar to the results of the main analysis (eAppendix 1). This suggests that our results are robust to different definitions of the study aims.
Inter-Rater Reliability
Agreement between reviewers was substantial during title and abstract screening (Cohen κ = 0.601) and moderate during full-text screening (Cohen κ = 0.516). Most discrepancy resolution discussions during full-text screening focused on determining whether studies aimed to assess cause-and-effect relationships.
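For reference, Cohen κ contrasts observed agreement with the agreement expected by chance. The following minimal R sketch uses a toy 2 × 2 agreement table with hypothetical counts, not our screening data:

```r
# Cohen's kappa from a toy 2x2 inter-rater agreement table (hypothetical counts)
agree <- matrix(c(40, 8, 10, 42), nrow = 2,
                dimnames = list(rater1 = c("include", "exclude"),
                                rater2 = c("include", "exclude")))
p_obs    <- sum(diag(agree)) / sum(agree)                        # observed agreement
p_chance <- sum(rowSums(agree) * colSums(agree)) / sum(agree)^2  # agreement expected by chance
kappa    <- (p_obs - p_chance) / (1 - p_chance)
kappa
```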
Discussion
We performed a cross-sectional meta-research study of observational neurology studies published in 15 top-ranking journals in 2020 and 2021 that presented confounding-adjusted associations between an exposure and a functional outcome. Our study determined that stroke research, in particular, often used Grotta bars to visualize relationships between exposures and functional outcomes. We also identified isolated uses in encephalitis, Guillain-Barré, and traumatic brain injury research (as well as in a combined stroke and acute myocardial infarction population). By design, every article included in our study reported an effect estimate adjusted for confounding using a model-based method. However, more than 80% of studies reporting Grotta bars did not adjust these bars using a model-based method. This indicates that generating adjusted Grotta bars reflective of their accompanying adjusted effect estimates is not common practice in observational neurology research.
Grotta bars are useful for depicting ordinal functional outcomes by exposure groups at a high level of granularity. When the levels of the outcome scale are not collapsed (e.g., dichotomized or trichotomized), readers can readily identify the differences in the full outcome distributions. This feature is important for patients and clinicians interested in observing the shift at a specific, clinically relevant level of granularity.6 We, therefore, recommend presenting all levels of the outcome scale on Grotta bars. This presentation promotes the readers' understanding of the exposure-outcome relationship under study, which is the main strength of this type of visualization. This practice further adheres to current guidelines for the statistical analysis of ordinal functional outcomes5 and, as a result, can help inform decision making for clinicians and other stakeholders.5,6
Grotta bars can be misleading if they do not reflect the adjustments applied in the analysis.4 Unadjusted Grotta bars may be useful in observational studies to descriptively present the observed exposure-outcome associations, but they cannot generally be interpreted causally. Even if observational study authors do not intend for unadjusted Grotta bars to be interpreted causally, readers who are familiar with these intuitive bars from randomized trials and eager to know the effect of the intervention across the entire outcome distribution may be tempted to improperly infer causality. For this reason, we advise authors of observational studies to present an adjusted version alongside unadjusted Grotta bars when aiming to answer causal questions. We believe this recommendation complements checklist item 16b of the Strengthening the Reporting of Observational Studies in Epidemiology guidelines, which instructs authors to “give unadjusted estimates and, if applicable, confounder-adjusted estimates.”21
Three reported methods of model-based confounding adjustment were used to create adjusted Grotta bars in our sample: ordinal outcome regression, propensity score matching, and inverse probability of treatment weighting. These adjustment strategies are presented in Figure 4. Stratification by a variable other than the exposure was commonly encountered in Grotta bars both with and without model-based adjustment. Stratification alone, however, is likely not sufficient to address the threat of confounding.
Figure 4. Summary of Model-Based Adjustment Methods Identified in Our Sample of Studies.
Among our sample of 17 studies that depict Grotta bars adjusted using models, ordinal regression (n = 6), propensity score matching (n = 10), and inverse probability weighting (n = 1) were the techniques used to adjust the visualizations.
Selecting an appropriate adjustment strategy is imperative to generate Grotta bars that are both interpretable and aligned with the target causal effect22 (Figures 4 and 5). In the same set of data with identical exposures, outcomes, and confounders, the application of different adjustment strategies can produce different results because the methods may target different causal effects.22 Researchers must start with a well-defined question and then select the appropriate adjustment strategy to estimate the causal effect targeted by their research question. This is essential to draw valid causal inferences from observational analyses.4
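As a simple illustration of how the choice of strategy changes the target of inference, the sketch below contrasts inverse probability weights aimed at the whole study population (often labeled ATE weights) with weights aimed at the exposed group only (ATT weights). The simulated data and variable names are hypothetical.

```r
# Minimal sketch: the same propensity score, two different causal targets
# (simulated, hypothetical data).
set.seed(2)
n <- 300
x       <- rnorm(n)                                  # toy confounder
exposed <- rbinom(n, 1, plogis(0.8 * x))             # exposure depends on x
ps      <- fitted(glm(exposed ~ x, family = binomial))

w_ate <- ifelse(exposed == 1, 1 / ps, 1 / (1 - ps))  # targets the whole study population
w_att <- ifelse(exposed == 1, 1, ps / (1 - ps))      # targets the exposed ("treated") group

# Grotta bars (or effect estimates) built with w_ate vs w_att can differ
# whenever the effect varies across the population.
summary(w_ate); summary(w_att)
```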
Figure 5. Recommendations for Authors.
We provide guidance for creating interpretable and transparent Grotta bars based on the barriers limiting the interpretability of Grotta bars identified in our study.
Research designed to affect public health policies often aims to determine the causal effect across the entire study population.10 In this situation, inverse probability weighting is an adjustment strategy that is particularly suitable to remove confounding influence and generate corresponding adjusted Grotta bars.4 This strategy allows us to estimate the average (i.e., marginal) effect among all individuals within a given study population, which is typically the effect reported in randomized controlled trials.10 Interested readers may refer to a detailed tutorial on how to build and interpret Grotta bars by applying inverse probability weighting, using stroke registry data.4 Our results suggest that this technique is underused in observational studies published in top clinical neurology journals.
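The following minimal R sketch illustrates the general idea with simulated data; the single confounder and variable names are hypothetical, and the code is a simplified illustration rather than the registry tutorial's procedure.

```r
# Minimal sketch: inverse-probability-weighted Grotta bars (hypothetical data).
library(dplyr)
library(ggplot2)

set.seed(7)
n <- 500
age     <- rnorm(n, 70, 10)                          # single toy confounder
exposed <- rbinom(n, 1, plogis(-0.05 * (age - 70)))  # exposure depends on age
mrs     <- factor(pmin(6, rpois(n, pmax(0.2, 2 + 0.03 * (age - 70)))), levels = 6:0)
dat     <- data.frame(age, exposed, mrs)

# Inverse probability of exposure weights from a logistic propensity model
ps    <- fitted(glm(exposed ~ age, family = binomial, data = dat))
dat$w <- ifelse(dat$exposed == 1, 1 / ps, 1 / (1 - ps))

# Weighted (confounding-adjusted) mRS proportions by exposure group
adj <- dat |>
  group_by(exposed, mrs) |>
  summarise(wn = sum(w), .groups = "drop") |>
  group_by(exposed) |>
  mutate(prop = wn / sum(wn)) |>
  ungroup()

ggplot(adj, aes(x = factor(exposed), y = prop, fill = mrs)) +
  geom_col(width = 0.6) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Exposure group", y = "Weighted proportion", fill = "mRS")
```

The weighted proportions play the role of standardized outcome distributions, so the resulting bars correspond to a marginal, confounding-adjusted comparison under the usual assumptions (no unmeasured confounding, positivity, and a correctly specified propensity model).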
While unmeasured confounding can be a major obstacle for causal inference in observational settings, observational data sets can be useful when the experimental studies needed to answer a research question are unethical or unfeasible.23 Effect estimates and visualizations depicting results from observational studies have the potential to be endowed with a causal interpretation if careful considerations are made in the design and analysis phases.10 Although it is an important first step, the presentation of adjusted Grotta bars from observational data does not alone guarantee a straightforward causal interpretation (Figure 6).10,24
Figure 6. Recommendations for Readers and Reviewers.
Guidance for interpreting Grotta bars when the association between the exposure and functional outcome is adjusted in the analysis. We recommend that readers take a nuanced approach when interpreting Grotta bars in observational neurology research depending both on whether adjustment is present and which adjustment method is used.
This study has several limitations. First, our sample only included articles from the top-ranked subset of journals from the Journal Citation Reports “Clinical Neurology” category. In addition, only articles written in English were included in our sample. Our results may not be generalizable to articles published in other journals. Second, only the main texts of articles were examined. It is possible that supplementary materials for publications containing Grotta bars went unaccounted for in our study, and that these Grotta bars were generated differently than those present in the main article. Third, we acknowledge that authors of studies presenting only unadjusted Grotta bars may not have intended to use these figures to depict causal distributional shifts. However, readers familiar with Grotta bars from randomized experiments and interested in the effect of the intervention across the entire outcome distribution may overinterpret unadjusted Grotta bars and erroneously draw causal conclusions from them. Fourth, we did not attempt to assess the rigor or effectiveness of the methods applied to individual studies included in our sample, including whether an appropriate adjustment strategy was used for the target causal contrast or whether there was a difference between the adjusted and unadjusted effect estimates. As such, we cannot conclude whether any of the adjusted Grotta bars in our sample can be interpreted causally. Finally, we aimed to only include publications with causal aims in our study. Our assessment of whether individual studies had causal aims may not necessarily reflect the authors' intentions. Causal aims are subjective and known to be inconsistently reported.16,17 The moderate inter-rater agreement in full-text screening was primarily due to disagreements over whether articles had causal aims, aligning with prior work showing that study aims (causal vs predictive vs descriptive) are often conflated in practice.15,17 It is, therefore, possible that we excluded studies with causal aims and, conversely, that studies without causal aims were included in our sample. To address this concern, we conducted a sensitivity analysis that used additional criteria to identify causal studies. Results of this sensitivity analysis were similar to those of the main analysis.
We emphasize that this article focuses specifically on confounding. However, unadjusted Grotta bars may also represent biased exposure-outcome associations if other types of bias (e.g., selection or information bias) are present. While confounding is a major concern in observational studies, we encourage researchers to carefully consider other sources of bias when they perform causal inference analyses.10,25,26 Even in well-conducted randomized controlled trials, in which confounding is not present, unadjusted associations may not represent the causal effect of interest if other sources of bias exist (e.g., nonadherence, attrition, measurement error, and missing values). In these scenarios, unadjusted Grotta bars can be misleading, and investigators should consider adjusting their Grotta bars for these sources of bias, if possible.
In conclusion, our study shows that model-based adjusted effect estimates are often not accompanied by model-based adjusted Grotta bars in observational neurology research focused on functional outcomes. Unadjusted Grotta bars created from observational data can be misleading because they may show potentially confounded exposure-outcome associations. If graphs depict observed functional outcome distributions by exposure groups, readers must recognize that any observable distributional difference may be explained by confounding and may not match the adjusted effect estimates. If a Grotta bar graph is adjusted, a causal interpretation may be possible depending on the research aims and whether rigorous causal inference methods were applied in the study.
Visualizations have significant potential to enhance the reader's understanding of study results.27 Authors should generate visualizations that reflect a study's results to clearly and informatively present research findings.28 Grotta bars are a very effective tool for visualizing the strength and direction of ordinal distributional shifts and are very popular in randomized stroke trials. We believe this graphical tool can also be useful in observational studies with causal aims examining effects on functional outcomes. We recommend that both an unadjusted Grotta bar graph and an adjusted version be presented together when adjusted effects for functional outcomes are reported in a study.
Acknowledgment
The authors thank Camila Victoria-Quilla Baselly Heinrich for her advice regarding the data management practices. The authors are grateful to Nadja Wülk for her proofreading and support of this project as M.R. Forrest's master's thesis. The authors also thank Fiona Morrison for additional grammatical proofreading support.
Glossary
- mRS: modified Rankin Scale
- TIA: transient ischemic attack
Footnotes
Editorial, page e213338
Author Contributions
M.R. Forrest: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data; study concept or design; analysis or interpretation of data. T.L. Weissgerber: drafting/revision of the manuscript for content, including medical writing for content; analysis or interpretation of data; study concept or design. E.S. Lieske: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data. E. Tamayo Cuartero: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data. E. Fischer: drafting/revision of the manuscript for content, including medical writing for content; major role in the acquisition of data. L. Jones: drafting/revision of the manuscript for content, including medical writing for content; study concept or design. M. Piccininni: drafting/revision of the manuscript for content, including medical writing for content; study concept or design; analysis or interpretation of data. J.L. Rohmann: drafting/revision of the manuscript for content, including medical writing for content; study concept or design; analysis or interpretation of data. A complete Contributor Roles Taxonomy (CRediT) author statement can be found on our GitHub repository.
Study Funding
Support for the open access publication fees was provided by the BIH QUEST Center for Responsible Research and private donations to the Center for Stroke Research Berlin, Charité-Universitätsmedizin Berlin.
Disclosure
J.L. Rohmann reports receiving a research grant from Novartis Pharma for a self-initiated research project about migraine outside of this work, which partially funded M. Piccininni's position. M. Piccininni further reports being awarded a research grant from the Center for Stroke Research Berlin (private donations). The other authors report no relevant disclosures. Go to Neurology.org/N for full disclosures.
References
1. Goyal M, Ganesh A, Brown S, Menon BK, Hill MD. Suggested modification of presentation of stroke trial results. Int J Stroke. 2018;13(7):669-672. doi:10.1177/1747493018778122
2. National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995;333(24):1581-1587. doi:10.1056/nejm199512143332401
3. Grotta JC. Fifty years of acute ischemic stroke treatment: a personal history. Cerebrovasc Dis. 2021;50(6):666-680. doi:10.1159/000519843
4. Rohmann JL, Huerta-Gutierrez R, Audebert HJ, Kurth T, Piccininni M. Adjusted horizontal stacked bar graphs (“Grotta bars”) for consistent presentation of observational stroke study results. Eur Stroke J. 2023;8(1):370-379. doi:10.1177/23969873221149464
5. Sandset EC. The Modified Rankin Scale and Ordinal Logistic Regression. European Stroke Organization. Published April 18, 2017. Accessed April 13, 2023. eso-stroke.org/outcome-measures-stroke-modified-rankin-scale-ordinal-logistic-regression/
6. Optimising Analysis of Stroke Trials (OAST) Collaboration, Bath PMW, Gray LJ, Collier T, Pocock S, Carpenter J. Can we improve the statistical analysis of stroke trials? Stroke. 2007;38(6):1911-1915. doi:10.1161/STROKEAHA.106.474080
7. Ganesh A, Luengo-Fernandez R, Rothwell PM. Author response: ordinal vs dichotomous analyses of modified Rankin Scale, 5-year outcome, and cost of stroke. Neurology. 2019;93(16):725. doi:10.1212/WNL.0000000000008328
8. Saver JL. Novel end point analytic techniques and interpreting shifts across the entire range of outcome scales in acute stroke trials. Stroke. 2007;38(11):3055-3062. doi:10.1161/STROKEAHA.107.488536
9. Sankey SS, Weissfeld LA. A study of the effect of dichotomizing ordinal data upon modeling. Commun Stat Simul Comput. 1998;27(4):871-887. doi:10.1080/03610919808813515
10. Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC; 2020.
11. Gerner ST, Kuramatsu JB, Sembill JA, et al. Characteristics in non-vitamin K antagonist oral anticoagulant-related intracerebral hemorrhage. Stroke. 2019;50(6):1392-1402. doi:10.1161/STROKEAHA.118.023492
12. Manno C, Disanto G, Bianco G, et al. Outcome of endovascular therapy in stroke with large vessel occlusion and mild symptoms. Neurology. 2019;93(17):e1618-e1626. doi:10.1212/WNL.0000000000008362
13. Forrest M, Weissgerber T, Jones L, Piccininni M, Rohmann JL. Use of stacked proportional bar graphs to visualize functional outcome distributions in published neurological research. Open Science Framework. Accessed September 6, 2024. osf.io/w78mh/
14. Waffenschmidt S, Navarro-Ruan T, Hobson N, Hausner E, Sauerland S, Haynes RB. Development and validation of study filters for identifying controlled non-randomized studies in PubMed and Ovid MEDLINE. Res Synth Methods. 2020;11(5):617-626. doi:10.1002/jrsm.1425
15. Hernán MA, Hsu J, Healy B. A second chance to get causal inference right: a classification of data science tasks. Chance. 2019;32(1):42-49. doi:10.1080/09332480.2019.1579578
16. Hernán MA. The C-word: scientific euphemisms do not improve causal inference from observational data. Am J Public Health. 2018;108(5):616-619. doi:10.2105/AJPH.2018.304337
17. Haber NA, Wieten SE, Rohrer JM, et al. Causal and associational language in observational health research: a systematic evaluation. Am J Epidemiol. 2022;191(12):2084-2097. doi:10.1093/aje/kwac137
18. Shmueli G. To explain or to predict? Stat Sci. 2010;25(3):289-310. doi:10.1214/10-STS330
19. Arnold KF, Davies V, de Kamps M, Tennant PWG, Mbotwa J, Gilthorpe MS. Reflection on modern methods: generalized linear models for prognosis and intervention—theory, practice and implications for machine learning. Int J Epidemiol. 2021;49(6):2074-2082. doi:10.1093/ije/dyaa049
20. Forrest MR. SPBG-meta-research-study. GitHub. Accessed September 6, 2024. github.com/meghanrforrest/SPBG-meta-research-study
21. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007;335(7624):806-808. doi:10.1136/bmj.39335.541782.AD
22. Kurth T, Walker AM, Glynn RJ, et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol. 2006;163(3):262-270. doi:10.1093/aje/kwj047
23. Glass TA, Goodman SN, Hernán MA, Samet JM. Causal inference in public health. Annu Rev Public Health. 2013;34:61-75. doi:10.1146/annurev-publhealth-031811-124606
24. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006;60(7):578-586. doi:10.1136/jech.2004.029496
25. Lu H, Cole SR, Howe CJ, Westreich D. Toward a clearer definition of selection bias when estimating causal effects. Epidemiology. 2022;33(5):699-706. doi:10.1097/EDE.0000000000001516
26. Pressat-Laffouilhère T, Jouffroy R, Leguillou A, Kerdelhue G, Benichou J, Gillibert A. Variable selection methods were poorly reported but rarely misused in major medical journals: literature review. J Clin Epidemiol. 2021;139:12-19. doi:10.1016/j.jclinepi.2021.07.006
27. Weissgerber TL, Milic NM, Winham SJ, Garovic VD. Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biol. 2015;13(4):e1002128. doi:10.1371/journal.pbio.1002128
28. Jambor H, Antonietti A, Alicea B, et al. Creating clear and informative image-based figures for scientific publications. PLoS Biol. 2021;19(3):e3001161. doi:10.1371/journal.pbio.3001161