Patient Safety in Surgery. 2025 Dec 16;19:36. doi: 10.1186/s13037-025-00456-w

Flawed design and selection bias in critical care randomized controlled trials (RCTs): the patient safety risk of the “RCT mimic”

Lawrence A Lynn
PMCID: PMC12709683  PMID: 41402862

Abstract

Over three decades, randomized controlled trials (RCTs) for critical care syndromes such as acute respiratory distress syndrome (ARDS), sepsis, and community-acquired pneumonia (CAP) have repeatedly produced non-reproducible results, at times leading to high-impact reversals of global protocols when later studies revealed harm. These trials enroll patients using expert-derived threshold sets intended to define the syndrome. This analytic review presents the first historical and formal methodological review and mathematical analysis of such RCTs using causal symbolic modeling (cSM), directed acyclic graphs (DAGs), and do-calculus. The review includes landmark publications, task-force threshold sets, and case examples, including the 2025 REMAP-CAP corticosteroid domain, to model the causal structure of standard RCTs applied to threshold-defined syndromes. PubMed searches and ChatGPT were used to assist in this process. The historical inquiry uncovered that the critical care syndromes of ARDS and sepsis are guessed synthetic constructs, devised in the twentieth century by Thomas Petty and Roger Bone as heuristic groupings of diverse but similar-appearing diseases. However, a much more striking discovery was that Petty and Bone introduced a streamlined variant of the Bradford Hill RCT method, here termed the “Petty-Bone RCT”, which conditions enrollment on a triage threshold set that functions as a cohort-level collider. This design yields results valid only for the unstable mixture of diseases enrolled. The “Petty-Bone RCT” preserves the outward form of a randomized trial but lacks the causal structure needed for transportability, making it an RCT mimic. The cSM analysis in this review shows that, while potentially internally valid, such trials cannot produce reliable treatment protocols and often cause harm.
These findings compel the abandonment of the Petty-Bone RCT framework, the integration of cSM into the Consolidated Standards of Reporting Trials (CONSORT), and prioritizing mechanistically grounded, investigator-led designs in critical care research. These provocative discoveries indicate that not one more patient, not one more investigator, not one more grant should be sacrificed to the next iteration of Petty and Bone’s synthetic syndrome RCT.

Graphical Abstract


Keywords: Randomized controlled trials, Sepsis, Acute respiratory distress syndrome, Critical care


Graphical summary – mechanism of the RCT mimic

1. Origins – Petty (1967: ARDS via IRDS analogy) → Bone (1992: SIRS thresholds) → task-force synthetic syndromes.

2. Enrollment – Threshold-based inclusion variable (X) defines the synthetic syndrome (S).

3. Symbolic Flag – S is a label: D → X → S, not in the causal path to outcome.

4. Collider Structure – X is a cohort collider: D → X → inclusion → T → Y; D ⊆ {d₁, d₂, … , dᵢ}, |D| ≥ 1.

5. Non-Transportability – Average treatment effect applies only to the unstable disease mix enrolled.

6. Example – REMAP-CAP mixes bacterial pneumonias (strong antibiotic safety net) with influenza pneumonias (weak antiviral safety net).

7. Illusion of Evidence – Apparent effects reflect cohort composition, not a unified disease entity.

8. Paradigm Control – Task force controls X, enabling synthetic paradigm shifts or sample size inflation without new discovery.

9. Pathological Consensus – Pathological science evolves into centrally controlled consensus.

10. Global Harm – Methodology exported to low- and middle-income countries (“intellectual colonization”).

11. Solution – Retire the “Petty-Bone RCT” mimic; make causal symbolic modeling (cSM) a requirement for all grant-funded trials, and return greater design autonomy to local investigators to reintroduce diversity of approach and reduce centralized control over research paradigms.

Cohort collider structure in “Petty-Bone RCT”

This directed acyclic graph (DAG) illustrates the cohort collider mechanism in “Petty-Bone RCTs”. Multiple diseases (d₁, d₂, d₃) influence the inclusion thresholds (X). X determines trial inclusion, leading to treatment assignment (T) and outcome (Y). Conditioning on X creates a cohort-level collider, making treatment effects specific to the unstable mix of diseases enrolled and non-transportable to other patient populations.

Introduction

Randomized controlled trials (RCTs) of critical care syndromes have repeatedly failed to produce reproducible results [1–3], often prompting reversal of clinical protocols, sometimes after harmful average treatment effects were revealed [4–9]. The reversal of an international RCT-based guideline due to harm is a profound event that sends shock waves through the critical care science community [10]. Most of these high-profile, failed RCTs employed a streamlined modification of the Bradford Hill RCT, introduced by Thomas Petty and Roger Bone, the “Petty-Bone RCT”, that uses arbitrary threshold-based triage rules to select patients for treatment trials of syndromes such as ARDS or sepsis [4–9].

Given these failures, this review examined the origins, evolution, and standardization of this “Petty-Bone RCT” over five decades, and analyzed its causal structure using causal symbolic modeling (cSM) and Pearl’s do-calculus. An important discovery was that these RCTs test treatments of syndromes that are synthetic constructs, created without empirical validation and built from broad, nonspecific inclusion thresholds that introduce cohort-level collider bias [4–9].

The following analysis shows that this framework produces an RCT mimic: a design that mimics the procedural form of a Bradford Hill trial but lacks the causal structure needed for transportable conclusions. It also reveals that this paradigm has shaped, and generally misdirected, critical care science for more than 40 years.

Historical background: emergence of synthetic syndromes in critical care RCTs

After Sewall Wright laid the groundwork for causal pathways [11, 12], Ronald Fisher provided the foundational work for both statistical inference and experimental design [13]. Based on Fisher’s work, the randomized controlled trial (RCT) emerged in the mid-twentieth century as the gold standard for causal inference in medicine, exemplified by Bradford Hill’s landmark study [14]. The RCT’s power lies not in form, randomization, blinding, or sample size alone, but in its causal grounding: a true RCT defines its population, interventions, and outcomes within a framework that respects causal structure. Inclusion criteria reflect a coherent causal condition, randomization severs all backdoor paths, and the outcome is interpretable as the effect of the intervention within that framework. When these conditions are met, the RCT estimates not just an association, but a counterfactual claim: what would likely have happened if the treatment had not been assigned. In some fields the perceived need for causal pathways declined as the RCT gained prominence.

In 1967, two decades after Hill introduced the RCT, David Ashbaugh, American pulmonologist Thomas Petty, et al. described adult respiratory distress syndrome (ARDS), a grouping of disparate diseases, such as trauma-associated lung injury, pneumonia, and pancreatitis, under a shared set of criteria [15]. Petty’s construct, described more fully in 1971 [16], drew heavily from infant respiratory distress syndrome (IRDS), also called “hyaline membrane disease”, an actual disease caused by surfactant deficiency in premature infants [17] and familiar to the public after the death of President and Jacqueline Kennedy’s newborn son [18]. Petty borrowed not only terminology but also pathophysiologic imagery (diffuse alveolar damage, hyaline membranes, and surfactant dysfunction) despite their inconsistent applicability to adults. The American-Vietnam War’s “Da Nang lung” phenomenon [19], a post-battle pulmonary distress of uncertain cause, reinforced the appeal of a unifying label.

Petty’s approach was a similarity heuristic, not the identification of a discrete biologic disease. His hypothesis—that surfactant depletion could unify multiple adult lung pathologies by reproducing the same physiologic derangement seen in IRDS—was never mechanistically confirmed. Even after the surfactant theory failed, the ARDS label persisted. Petty’s notable contribution was recognizing that positive end-expiratory pressure (PEEP), first shown by British pediatric anesthesiologist John Inkster to benefit IRDS [20], also improved oxygenation in adults meeting Petty’s definition of ARDS [21]. Although PEEP was later shown to be nonspecific and effective across many pulmonary conditions [22, 23], its success in ARDS appeared to validate the synthetic syndrome as a basis for RCT enrollment.

In the 1980s, Roger Bone further operationalized synthetic syndrome construction with the systemic inflammatory response syndrome (SIRS). Bone proposed a simple set of physiologic thresholds—fever, tachycardia, leukocytosis—as a gateway to diagnosing and enrolling patients in sepsis trials [24, 25]. This approach enabled rapid recruitment of critically ill patients with diverse advanced infections, collapsing distinct diseases into a single trial label. The simplicity of threshold-based triage overcame the major bottleneck of disease-specific diagnosis in ICU recruitment, opening funding gates and ushering in a “golden era” of synthetic syndrome trials.

Yet the inclusivity of SIRS was also its weakness. It grouped patients with sterile inflammation, viral infections, bacterial and fungal sepsis, and myriad combinations thereof into one mechanistically incoherent cohort. Like ARDS, Bone’s sepsis syndrome traded causal specificity for ease of use. The thresholds were consensus-driven, unanchored in mechanism, amendable at will, and optimized for ease rather than defined by causal rigor. As will be demonstrated in the following analysis, the aggregation of different diseases introduced structural confounding and collider bias at the cohort level, making observed treatment effects difficult to interpret, non-reproducible, and non-transportable. Even when a biologically active treatment was tested, its true effect could be obscured or reversed by the synthetic (procedure-induced) cohort-level heterogeneity arising from the mixed set of diseases enrolled.

Historical development and entrenchment of the “Petty-Bone RCT”

By the late 1980s, the combination of Petty’s diagnostic disease aggregations and Bone’s threshold set triage to select the cohort established a procedural basis for synthetic syndrome trial design. This new “Petty-Bone RCT” design was formally codified in 1992 by an official task force [25], greatly simplifying research in critical care. Embracing this streamlined approach, expert task forces met every decade to reaffirm the use of the SIRS criteria for sepsis research RCTs worldwide. Petty and Bone’s technique soon grew to dominate critical care research, with task forces, which were formed for each synthetic syndrome, progressively assuming worldwide control of synthetic syndrome science.

With decades of strong expert support, synthetic syndromic constructs such as sepsis and ARDS became deeply embedded in clinical culture, serving as anchors for communication, urgency, and awareness. In this context, they functioned effectively as educational tools or severity markers: shorthand that alerts clinicians to escalating physiologic instability and prompts timely action. For example, teaching the public that “sepsis kills” and generating mandated sepsis triggers for hospitals produced enhanced awareness and vigilance, and promoted consistent early antibiotic intervention in the absence of immediate diagnostic certainty, saving many lives worldwide [26, 27].

Although the success of this didactic utility did not confer causal validity, such constructs were imported wholesale into randomized controlled trial design using Petty and Bone’s method. The problem arose when thresholds of nonspecific physiologic measurements were used as inclusion criteria in studies intended to produce causal claims about treatment efficacy. In that context, these constructs functioned not as communication tools but as cohort-level colliders, introducing systematic bias by selecting patients from a plurality of distinct causal pathways under a single synthetic label.

Reviewing the history, no evidence was identified that the causal structure of Petty and Bone’s modification of the RCT was ever questioned with any specificity, much less investigated or validated, by the task forces or by anyone. It appears that the similarity to Bradford Hill’s designs was enough to fool clinicians and trialists. The statisticians, focusing on the math and design integrity, were apparently unaware that the synthetic syndromes were not disease equivalents. This review found that statisticians in critical care science often leave confirmation of the integrity of the clinical measurements (the entry criteria) to the trialists [28]. Perhaps for this reason, all assumed that these were Bradford Hill RCTs, which, when properly designed, represent the best medical evidence-generating tool available to critical care. They were wrong.

Sepsis science crisis triggers a synthetic paradigm shift

For decades, critical care used SIRS as the threshold set for sepsis RCTs [25, 29, 30]. However, by 2015, as shown in Table 1, the sepsis research community was in crisis. Multiple high-profile reversals of critical care guidelines for harm or lack of efficacy had occurred. Everyone knew that reversal of an international treatment guideline for harm, or even just for lack of efficacy, was a sentinel event requiring deep methodological failure mode analysis. However, the “Petty-Bone RCT” methodology was the accepted floor of critical care science, so the flaw was perceived as Bone’s mistake in the selection of thresholds, not as his idea to triage for RCT participants using a nonspecific threshold set in the first place.

Table 1.

Major sepsis interventions reversed after inclusion in guidelines

Intervention | Early Positive Trial(s) | Later Negative Trial(s) | Guideline Impact | Reversal Outcome
Tight Glycemic Control | van den Berghe, NEJM 2001 [31] | NICE-SUGAR, NEJM 2009 [32] | Surviving Sepsis Campaign (SSC) 2004/2008 recommended intensive insulin | Reversed 2009 due to harm
Low-Dose Corticosteroids | Annane D, JAMA 2002 [33] | CORTICUS, NEJM 2008 [34] | SSC recommended for refractory shock | Reversal 2008 → downgraded, no mortality benefit
Drotrecogin alfa (Activated Protein C) | Bernard GR, NEJM 2001 (PROWESS) [35] | Ranieri VM, N Engl J Med 2012 [36] | FDA approved 2001; SSC 2004/2008 recommended | Withdrawn 2011 due to harm
Early Goal-Directed Therapy (EGDT) | Rivers E, NEJM 2001 [30] | ProCESS 2014 [37]; ARISE 2014 [38]; ProMISe 2015 [39] | Adopted into SSC 2004/2008 as standard of care | Abandoned 2014–15 → no benefit

Only a year before, Lynn had formally asserted that SIRS, the cornerstone of the synthetic syndrome of sepsis, was simply a placeholder guessed by Roger Bone, explaining why all sepsis trials based on SIRS had yielded non-reproducible results. Lynn, noting that SIRS must be abandoned as the standard sepsis measurement, warned the leadership not to yield to the temptation to simply select a new set of thresholds [40]. Yet that is exactly what they did.

The leadership faced a crossroads: disengage from the “Petty-Bone RCT” framework and pivot toward real-time series data analysis, or ignore Lynn’s warning [40] and engineer a ‘synthetic paradigm shift’ by substituting one guessed threshold set for another. They chose the latter.

New threshold set selection entrenches pathological science

Since 1992, each sepsis task force had been titrating the thresholds around SIRS, but in 2016 they jumped to a new threshold set. With the new set of thresholds selected in the wake of repeated RCT failures, sepsis task force leaders believed they had engineered a decisive turning point. In reality, they orchestrated what amounted to a synthetic paradigm shift rooted not in new discovery or methodology but in the inherited methodology of Petty and Bone.

The task force believed they had a “new” construct, Sepsis-3, replacing the 2012 Sepsis-2 (the last guideline tied to SIRS) [41]. Central to this redefinition was the adoption of the Sepsis-related Organ Failure Assessment (SOFA), an old threshold set first introduced in 1996 [42]. Yet SOFA had already been identified as nonspecific for sepsis: by 1998 its very name was altered from “Sepsis-related” to “Sequential” Organ Failure Assessment [43], acknowledging that it captured organ dysfunction from any cause, not exclusively sepsis. Despite this history, the 2016 task force revived SOFA, pairing it with “suspicion of infection” to construct a new entry gate for sepsis RCTs they called Sepsis-3 [44].

This maneuver exemplified symbolic retrenchment rather than scientific advancement. The SOFA score had been bypassed by the 2001 and 2012 task forces [29, 41], which retained SIRS as the operational standard. In 2016, however, perhaps out of frustration with repeated trial failures, the task force elevated SOFA to global prominence, co-opting an organ dysfunction index never validated for sepsis and branding it as innovation [44].

The transition from SIRS to SOFA was thus not a paradigm shift but a paradigm recycling. For nearly a quarter-century, sepsis task forces had modified Bone’s SIRS criteria incrementally, while leaving untouched Petty and Bone’s methodological premise: that non-specific threshold sets could stand as trial entry gates. In 2016, instead of abandoning this fatally flawed foundation, the task force merely substituted one non-specific threshold set (SOFA) for another (SIRS), thereby mimicking the practices of their disciplinary icons, Petty and Bone.

Sepsis-3 was less a scientific breakthrough than an act of icon mimicry. Not surprisingly, since the failed Petty and Bone methodology was not corrected, there have been many sepsis RCTs using SOFA as the threshold set but no reproducibly positive sepsis trials since 2016. In 2016, what was heralded as scientific progress was in fact the rebranding of an entrenched methodological error, precisely the design pathology Lynn [40] warned against. Table 2 shows the progression of adult sepsis thresholds to the present day.

Table 2.

Timeline of task force-derived sepsis threshold sets (X) for RCTs

Year/Task Force | Key Development
1991–92 (Bone et al., ACCP/SCCM Task Force) [25] | Bone introduces SIRS, the first threshold set for sepsis RCTs.
1996 (Vincent et al.) [42] | SOFA (Sepsis-related Organ Failure Assessment) introduced, initially tied to sepsis.
1998 (Vincent et al.) [43] | SOFA abandoned as a sepsis-specific score because it was recognized as nonspecific (captured any organ dysfunction).
2001 (Levy et al., SCCM/ESICM/ACCP/ATS/SIS Task Force) [29] | Sepsis definitions reaffirmed: SIRS retained; sepsis defined as infection + ≥2 SIRS.
2012 (Surviving Sepsis Campaign Task Force) [41] | Continued reliance on SIRS thresholds for sepsis recognition. Added other thresholds.
2014 (Lynn’s Warning) [40] | Lynn cautioned against re-tying sepsis definitions to nonspecific threshold sets.
2015 (SIRS Failure Resignation) | Repeated sepsis RCTs failed (Table 1).
2016 (Sepsis-3: Synthetic Paradigm Shift) [44] | Ignoring Lynn’s warning, the Sepsis-3 Task Force replaced SIRS with SOFA ≥2 as the defining threshold for sepsis RCTs.

Absence of formal validation of the “Petty-Bone RCT” modification

This historical review uncovered a striking finding: despite decades of use, Petty and Bone’s modification of the RCT has never undergone formal validation. This recognition motivated a deeper investigation into the mathematical and causal consequences of the modification.

This analysis establishes that the sole structural deviation from Bradford Hill’s classical RCT framework is the introduction of the synthetic syndrome. Patients are enrolled not on the basis of a single disease, but through nonspecific threshold criteria (X), such as oxygenation ratios, radiographic patterns, or hemodynamic cutoffs, that collapse multiple distinct diseases into a symbolic placeholder (S), where S is the synthetic syndrome. This construct is then treated as if it were a unified disease category even though the underlying diseases are diverse. Apart from this redefinition of the disease state, the resulting trials retain the procedural appearance of a conventional Bradford Hill RCT, as illustrated in Fig. 1.

Fig. 1.

Variability in ‘synthetic syndrome’ ARDS composition across RCTs. Each large circle depicts the patient cohort in an ARDS-labeled RCT. Numbered sub-circles represent distinct diseases with differing average treatment effects (ATEs), sized in proportion to their prevalence within the cohort. Because disease mixtures vary across trials, the “ARDS” case set is unstable, undermining reproducibility and transportability, and revealing the cohort-level collider bias inherent in nonspecific threshold-based disease aggregation.

Across ARDS trials, enrollment is determined not by a single causal disease entity but by threshold-defined criteria that pool patients with diverse upstream conditions [45–48]. As illustrated in Fig. 1, each trial cohort (large circle) is composed of a variable mixture of distinct diseases (sub-circles), each with its own causal pathway and potential treatment effect. The relative proportions of these diseases differ across studies, so that what is labeled as “ARDS” in one RCT is not the same composite as in another. This structural instability produces treatment effects that are cohort-specific rather than transportable. The variation in disease composition explains why results are difficult to reproduce and why protocols derived from such trials often fail when applied in new settings. This instability is a direct consequence of the cohort-level collider bias introduced by Petty and Bone’s threshold-based aggregation.

For example, in ARDS trials, patients with bacterial pneumonia (d₁), viral pneumonia (d₂), aspiration (d₃), trauma-related respiratory failure (d₄), or pancreatitis-associated lung injury (d₅) are all grouped together if they meet threshold criteria. Although these diseases differ in etiology and treatment responsiveness, they are reified as equivalent. Consequently, the distribution of {d₁, d₂, … , dᵢ} necessarily varies across trials, even under identical enrollment rules. This instability undermines reproducibility and prevents transportability of results.
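
The point that the enrolled disease mix varies across trials even under identical enrollment rules can be illustrated with a short Bayes-rule calculation. The sketch below is illustrative only: the disease labels follow the d₁ to d₅ examples above, while the site prevalences and the gate-crossing probabilities are hypothetical numbers chosen for the demonstration, not data from any trial.

```python
# Hypothetical sketch: enrolled disease mix under one fixed threshold gate X.
# All numbers are invented for illustration; none come from a real trial.

def enrolled_mix(prevalence, p_cross):
    """Return P(disease | X = 1) given site prevalence and P(X = 1 | disease)."""
    joint = {d: prevalence[d] * p_cross[d] for d in prevalence}
    total = sum(joint.values())
    return {d: joint[d] / total for d in joint}

# Identical enrollment rule at both sites: chance that each disease crosses X.
P_CROSS = {
    "d1_bacterial_pneumonia": 0.6,
    "d2_viral_pneumonia": 0.5,
    "d3_aspiration": 0.3,
    "d4_trauma": 0.4,
    "d5_pancreatitis": 0.2,
}

# Two hypothetical sites that differ only in background disease prevalence.
SITE_A = {"d1_bacterial_pneumonia": 0.50, "d2_viral_pneumonia": 0.10,
          "d3_aspiration": 0.20, "d4_trauma": 0.15, "d5_pancreatitis": 0.05}
SITE_B = {"d1_bacterial_pneumonia": 0.20, "d2_viral_pneumonia": 0.40,
          "d3_aspiration": 0.10, "d4_trauma": 0.20, "d5_pancreatitis": 0.10}

mix_a = enrolled_mix(SITE_A, P_CROSS)
mix_b = enrolled_mix(SITE_B, P_CROSS)

for disease in P_CROSS:
    print(f"{disease}: site A {mix_a[disease]:.3f}, site B {mix_b[disease]:.3f}")
```

Under these assumed inputs, the same gate yields cohorts with quite different compositions, which is the instability the text describes.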

This distortion is not ordinary confounding, which can be managed by randomization or covariate adjustment. Instead, it arises from aggregation itself. The synthetic syndrome (S) is a collider, fed by multiple upstream diseases (d₁ … dᵢ) through threshold criteria (X). Each patient will likely have at least one target disease, but the proportions differ across trials, making the average treatment effect a moving target rather than a stable population parameter.

This structural instability is not merely a practical limitation but a mathematically predictable bias. In causal terms, the synthetic syndrome functions as a cohort collider, created when upstream diseases converge through nonspecific thresholds. The directed acyclic graph (DAG) framework [12, 49, 50] used below makes such mechanisms explicit.

In causal inference terms, the synthetic syndrome S is not a real disease but a symbolic placeholder indicating “some disease is present.” At least one of the diseases (d₁ … dᵢ) triggers threshold values (X), which gate entry into S. Treatment assignment (T) is randomized within S, often with a restrictive criterion (Z), such as “suspicion of infection”, which must be satisfied. The directed acyclic graph (DAG) in Fig. 2 demonstrates how this structure generates cohort-specific and non-generalizable results.

Fig. 2.

Generic “Petty-Bone RCT” cohort collider structure (ChatGPT assisted figure)

The key structure is:

D → X → T → Y; X ← D → Y; X → S; D ⊆ {d₁, d₂, d₃, … , dᵢ}, |D| ≥ 1

P(X = 1 | Z = 0) = 0 (X does not trigger T when Z is absent)

Where: D represents one or more diseases selected from a larger set of distinct diseases

dᵢ is a disease which generated X and was selected by X

X is a set of gate keeping thresholds (which requires the presence of Z)

Z is a clinical state selected by the task force (e.g. under suspicion of sepsis)

T is a treatment

Y is an outcome
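
As a minimal sketch, the structure defined above can be written as a small graph and queried for directed paths. The edges assumed here are D → X, D → Y, X → T, T → Y, and X → S; drawing Z as a parent of X is an additional assumption of this sketch, based on the statement that X requires the presence of Z. The helper name `has_directed_path` is hypothetical.

```python
# Sketch of the DAG as an adjacency mapping (parent -> list of children).
# Assumed edges: D -> X, D -> Y, X -> T, T -> Y, X -> S.
# Z -> X is an assumption of this sketch ("X requires the presence of Z").
DAG = {
    "D": ["X", "Y"],
    "Z": ["X"],
    "X": ["T", "S"],
    "T": ["Y"],
    "S": [],
    "Y": [],
}

def has_directed_path(graph, start, goal):
    """Depth-first search for a directed path from start to goal."""
    stack = [start]
    seen = set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

# S is a leaf: no directed path leads from the label S to the outcome Y.
print("S -> ... -> Y:", has_directed_path(DAG, "S", "Y"))
# The diseases D and the gate X both sit upstream of Y.
print("D -> ... -> Y:", has_directed_path(DAG, "D", "Y"))
print("X -> ... -> Y:", has_directed_path(DAG, "X", "Y"))
```

In this encoding, S has no outgoing edges, so it cannot lie on any causal path to T or Y.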

The first thing to notice is that the DAG reveals that there is no causal connection between the synthetic syndrome itself and outcome or treatment. The synthetic syndrome is just a symbol. It is like a flag which is raised upon the occurrence of the breach of threshold X in the presence of Z. The flag is not in the causal path, as it is the occurrence of X in the presence of Z which triggers inclusion in the RCT and randomization to treatment or control. That is why the task force could freely change Xᵢ from the threshold set of SIRS (X₁) to that of SOFA (X₂) while the syndrome S (the flag of sepsis) stayed identical, despite the fact that X₂ shares no thresholds or even signals with X₁. That would not be possible if S were a real biologic entity.

As the DAG shows, X is a collider, but not a collider for a single patient. For an individual, D is fixed; X is just a function of that single D. The collider phenomenon arises when the task force directs investigators to pool many patients with different D’s and condition on X to create the cohort. Thus the collider is a cohort‑level artifact of disease mixing at the enrollment gate directed by the task force: X is a cohort‑level collider. Each participant has at least one disease that determines their pathophysiology, which induces perturbations in laboratory values or vital signs. Enrollment is triggered when one or more of these perturbations cross the task-force-dictated threshold(s) X in the presence or occurrence of Z. Thus, many distinct upstream causes flow into the same gate X. The observed treatment effect within S = 1 (in the RCT) is therefore a weighted mixture over disease‑specific effects, which varies with the enrolled disease composition and is generally non‑transportable.
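
The weighted-mixture statement can be made concrete with a few lines of arithmetic. In the sketch below, the disease-specific effects and the two cohort compositions are hypothetical values chosen only to show the mechanism; effects are expressed as mortality risk differences, with negative values meaning benefit.

```python
# Hypothetical sketch: cohort ATE as a prevalence-weighted mixture of
# disease-specific effects. All numbers are invented for illustration.

def cohort_ate(mix, disease_ate):
    """Weighted average of disease-specific effects over the enrolled mix."""
    return sum(mix[d] * disease_ate[d] for d in mix)

# Assumed disease-specific risk differences (negative = treatment benefit).
DISEASE_ATE = {"d1": -0.04, "d2": 0.06, "d3": 0.00}

# Two trials with the same gate X but different enrolled compositions.
TRIAL_1_MIX = {"d1": 0.8, "d2": 0.1, "d3": 0.1}
TRIAL_2_MIX = {"d1": 0.3, "d2": 0.6, "d3": 0.1}

ate_1 = cohort_ate(TRIAL_1_MIX, DISEASE_ATE)
ate_2 = cohort_ate(TRIAL_2_MIX, DISEASE_ATE)

print(f"Trial 1 cohort ATE: {ate_1:+.3f}")
print(f"Trial 2 cohort ATE: {ate_2:+.3f}")
```

With these assumed inputs the treatment looks beneficial in one cohort and harmful in the other, although no disease-specific effect has changed.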

During this review of the history, it was clear that this structure is not the structure that Petty, Bone, and the task forces of the last 23 years believed was operative. Fig. 3 shows the actual “Petty-Bone RCT” DAG compared with the “task force belief DAG”.

Fig. 3.

Comparison of true DAG vs task force belief DAG. Left: correct DAG for a “Petty-Bone RCT”, where S is a flag (a synthetic syndrome) derived from X and has no causal effect on Y. Right: “task force belief DAG” representing the mistaken assumption that S is a unified causal disease equivalent directly influencing outcome (S → Y).

Using DAGs, it is clear that, for future RCTs, all the consensus task forces are doing is generating a new X and/or Z for their respective synthetic syndrome during periodic meetings. However, they are actually leaving S unchanged (though they may adjust its definition), because S functions only as a flag or placeholder. By controlling X, they control how easily a large sample size (n) can be enrolled, an advantage for securing stronger grant applications. Enrollment can be inflated by lowering thresholds or adding new ones, as occurred with Sepsis-2 in 2012 [41]. Another example of X expansion occurred with the most recent ARDS task force definition [48], which greatly broadened inclusion criteria, making high-n ARDS trials achievable in virtually any large ICU.

The record shows that, by maintaining global control over X, the task force can also engineer a new synthetic ‘paradigm shift’ any time they choose. They can simply replace X₁ with an entirely new X₂. For example, in 2016, the relationship between X and S for the synthetic syndrome of sepsis was altered when the task force replaced X₁ = two SIRS thresholds with X₂ = SOFA score of 2 in the presence of Z (suspicion of infection). In doing so, they also constructed a new—albeit ambiguous—definition for S [44]. However, because S is not part of the true causal path, their new definition is not relevant to the detection of the synthetic syndrome or the RCT.
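
A small sketch shows how replacing X₁ with X₂ changes who is enrolled while the label S stays the same. The gates follow the text (X₁ = two SIRS thresholds, X₂ = SOFA score of 2, both in the presence of Z); the `Patient` record, its field names, and the four example patients are hypothetical.

```python
# Hypothetical sketch: the same label S applied through two different gates.
from dataclasses import dataclass

@dataclass(frozen=True)
class Patient:
    pid: str
    sirs_count: int            # number of SIRS thresholds met (invented value)
    sofa_score: int            # SOFA score (invented value)
    suspected_infection: bool  # Z in the text

def gate_x1(p):
    """X1: at least two SIRS thresholds, in the presence of Z."""
    return p.suspected_infection and p.sirs_count >= 2

def gate_x2(p):
    """X2: SOFA score of at least 2, in the presence of Z."""
    return p.suspected_infection and p.sofa_score >= 2

PATIENTS = [
    Patient("p1", sirs_count=3, sofa_score=0, suspected_infection=True),
    Patient("p2", sirs_count=1, sofa_score=4, suspected_infection=True),
    Patient("p3", sirs_count=2, sofa_score=2, suspected_infection=True),
    Patient("p4", sirs_count=3, sofa_score=5, suspected_infection=False),
]

# Both cohorts carry the same flag S ("sepsis"), yet their members differ.
cohort_x1 = {p.pid for p in PATIENTS if gate_x1(p)}
cohort_x2 = {p.pid for p in PATIENTS if gate_x2(p)}

print("Flagged S under X1:", sorted(cohort_x1))
print("Flagged S under X2:", sorted(cohort_x2))
print("Flagged under both:", sorted(cohort_x1 & cohort_x2))
```

In this toy population only one patient is flagged under both gates, and the patient without Z is never enrolled, matching P(X = 1 | Z = 0) = 0.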

This dynamic illustrates that it is the task force, not underlying pathophysiology, that dictates the science, since they determine and standardize the perceived causal path within the RCTs. Ironically, in making the adjustments of X, the task force is accomplishing nothing, because they are effectively titrating or replacing a cohort-level collider and any new definition they promulgate is not on the causal path to the outcome. These task force adjustments, like the “Petty-Bone RCT” itself, represent science mimicry rather than genuine causal refinement. They waste public funds.

The unexpected effects of mixing different diseases

The next approach was to investigate the causal modeling of the 2025 Randomized, Embedded, Multifactorial, and Adaptive Platform Trial for Community-Acquired Pneumonia (REMAP-CAP) RCT as a case study. This trial, which evaluated the treatment of CAP (community-acquired pneumonia), a synthetic syndrome, was conducted on an ambitious, high-profile trial platform with over 300 recruiting sites across more than 20 countries, including Australia, New Zealand, the UK, the USA, Canada, the Netherlands, Germany, Ireland, and several others in Europe and Asia [51].

However, the corticosteroid testing domain of REMAP-CAP was terminated early due to harm [52, 53], despite a large previous RCT, CAPE-COD [54], suggesting that corticosteroids provided benefit in CAP. Although CAPE-COD excluded influenza, both were “Petty-Bone RCTs”. For this reason, the contradictory findings of REMAP-CAP represented yet another in a long line of “Petty-Bone RCT” reversals, making it an ideal recent case for examination of critical care RCT design and non-reproducibility.

Importantly, REMAP-CAP combined influenza pneumonia with a range of bacterial pneumonias, and the authors state that this did not affect the results. The paper states:

Of 658 patients enrolled, 8.2% had severe influenza pneumonia. Although slightly more influenza cases were randomized to hydrocortisone than control (8.4% vs 7.4%), prespecified subgroup analyses showed similar adjusted odds ratios for mortality in patients with and without influenza, indicating that influenza inclusion did not primarily drive the overall results.

However, the purpose of including this study in this cSM analysis is not to determine whether the small influenza subgroup in REMAP-CAP numerically influenced the results, but rather to illustrate a deeper methodological point: a priori causal diagrams should be developed for each major cause of community-acquired pneumonia. Without explicit causal representation of the distinct pathways (different species of bacterial, viral, mixed, and influenza in particular), RCTs risk collapsing a wide range of different diseases with different disease mechanisms and safety nets into a single synthetic construct. This aggregation produces estimates that may appear internally consistent, but that are neither mechanistically interpretable nor transportable across etiologies. The REMAP-CAP inclusion of influenza exemplifies this larger problem: the absence of disease-specific DAGs allowed fundamentally different processes to be pooled under one label, obscuring rather than clarifying causal effects.

The first step in this review to analyze REMAP –CAP was to use Wright and Pearl’s causal symbolic modeling (cSM) to map the fundamental structure of the RCT causal pathway. To do this, DAGs were generated to examine the paths of different diseases (d₁, d₂, d₃, … , dᵢ) selected by the inclusion criteria (Xi in the presence of Z). The DAG revealed that the incorrect focus on the synthetic syndrome S as a disease equivalent conceals important differences in how patients or treatments buffer the effect of the intervention when the primary causal effect of corticosteroids, immune suppression, is incorporated into the model.

Immune suppression can be beneficial in this setting by reducing inflammation-mediated pulmonary dysfunction, which can cause death. However, it might also have negative consequences, such as reducing microbial clearance (thereby increasing the peak or AUC of microbial load in the time domain) or increasing the risk of secondary infection

An important consideration is that each disease (dᵢ) may have—or may trigger—a distinct ‘safety net’ that mitigates potential harm from the intervention. In bacterial pneumonia, the initiation of antibiotics can counteract the immune-suppressive effects of corticosteroids by mitigating any increase in microbial load caused by immune suppression [55, 56]. In contrast, patients with viral infections, especially when no effective antiviral therapy exists, lack this buffering mechanism, leaving them vulnerable to unchecked viral replication or to secondary resistant bacterial infections [57, 58].

This disparity is easy to miss when the immune suppression effects are assumed to be equal across all diseases within S and the safety nets are not considered. Figure 4 shows a pneumococcal pneumonia DAG that makes this difference explicit when compared with the influenza DAG of Fig. 5.
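The buffering argument can be made concrete with a minimal sketch. It assumes a simple additive model in which the net change in mortality risk equals the harm from increased microbial load, scaled down by the efficacy of the disease-specific safety net, minus the benefit from reduced inflammation. The class name, the function name, and every number below are hypothetical and chosen only for illustration; none are estimates from REMAP-CAP or any other trial.

```python
from dataclasses import dataclass


@dataclass
class DiseaseContext:
    name: str
    inflammation_benefit: float  # hypothetical absolute risk reduction from dampened inflammation
    load_harm: float             # hypothetical absolute risk increase from higher microbial load
    safety_net: float            # hypothetical fraction of load harm blocked by targeted therapy (0 to 1)


def net_risk_change(ctx: DiseaseContext) -> float:
    """Net absolute change in mortality risk; negative means net benefit."""
    unbuffered_harm = ctx.load_harm * (1.0 - ctx.safety_net)
    return unbuffered_harm - ctx.inflammation_benefit


# Same steroid effect on inflammation and on load; only the safety net differs.
pneumococcal = DiseaseContext("pneumococcal pneumonia", 0.04, 0.05, safety_net=0.9)
influenza = DiseaseContext("influenza pneumonia", 0.04, 0.05, safety_net=0.1)

for ctx in (pneumococcal, influenza):
    print(f"{ctx.name}: net risk change {net_risk_change(ctx):+.3f}")
```

Under these assumed values the same intervention is net beneficial where the safety net is strong and net harmful where it is weak, which is the asymmetry the two DAGs are intended to expose.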

Fig. 4. Pneumococcal pneumonia DAG with antibiotics blocking immunosuppression-driven increased or sustained microbial load

Node explanations

  • D₁: Pneumococcal pneumonia.

  • Antibiotics: Pathogen-targeted therapy that reduces bacterial burden.

  • I: Immunosuppression from corticosteroids.

  • M: ↑ Microbial load from immunosuppression.

  • Y: Patient outcome (e.g., mortality).

  • S: CAP inclusion criteria.

  • X: Threshold measurement.

This pneumococcal pneumonia DAG illustrates the causal relationships in pneumococcal pneumonia, where antibiotics block immunosuppression-driven microbial load. Arrows indicate hypothesized causal directions. Notably, hypoxemia influences both the designation of CAP (which is just a flag) and the decision to initiate corticosteroids, which in turn drive immunosuppression and subsequent changes in microbial load. Effective antibiotics block the steroid-induced microbial growth, buffering harm [55, 56].

Fig. 5. Influenza pneumonia DAG with antivirals blocking immunosuppression-driven viral load or resistant secondary bacterial infection

Node explanations

  • D₂: Influenza pneumonia.

  • Antivirals: Limited or absent efficacy.

  • I: Immunosuppression from corticosteroids.

  • V: ↑ Viral replication and/or microbial load from secondary bacterial infection

  • Y: Patient outcome.

  • S: CAP inclusion criteria.

  • X: Threshold measurement.

This influenza pneumonia DAG illustrates the causal relationships in influenza pneumonia, where antivirals may incompletely block immunosuppression-driven viral load. Hypoxemia influences both the designation of CAP and the decision to initiate corticosteroids, which in turn drive immunosuppression and subsequent changes in viral load. The arrow from Antivirals to Viral load is shown as a less densely dotted line to represent a weaker safety net than is present with pneumococcal pneumonia, as shown in Fig. 4.

Mechanism: Limited antiviral efficacy leaves immunosuppression-driven viral and secondary bacterial proliferation unbuffered. Confalonieri et al. [59] reported that steroids did not significantly increase influenza viral load, whereas Lansbury et al. [58] and Tsai et al. [57] found higher mortality and secondary infection risks in steroid-treated influenza pneumonia. The actual mechanism is therefore uncertain, and the node label “viral load” should be read as broadly indicative of the adverse effects of corticosteroid-induced immune suppression.

Comparing these two DAGs with the DAG for a generic “Petty-Bone RCT” makes clear that, because the synthetic syndrome S is agnostic to the specific diseases and their respective safety nets, the estimated treatment effect within S = 1 (the RCT) averages across incompatible causal contexts. The DAG structure shows why: disjunctive safety nets lie on unmeasured paths from Dᵢ → Y that moderate the effect of T. Without stratifying or modeling these disease-specific buffering pathways, the trial produces structurally ambiguous results. This helps explain why identical treatments yield beneficial, null, or harmful outcomes depending on the disease mix within the synthetic syndrome cohort. Although REMAP-CAP did initially stratify by pathogen, the final analysis combined all strata into a single n for the calculation of the average treatment effect, so this was a “Petty-Bone RCT”.

Community acquired pneumonia (CAP): from sound, evidence based beginnings to a synthetic syndrome trap

A review of the history of the science of CAP reveals that the rationale for aggregating patients under a clinical grouping label was that the specific diagnosis of an infection may be unknown at the time when treatment is most likely to be effective. This is true of sepsis, but less commonly true of ARDS. In this regard, the record shows that community-acquired pneumonia (CAP) as a clinical grouping under a label was drawn from the apparent success of broad-spectrum antibiotic therapy applied before the diagnosis is known.

Empiric antibiotic regimens for community-acquired pneumonia are deliberately chosen based on the expected pathogen profile, since rapid identification of the true causative organism is rarely feasible at presentation. Guidelines by ATS/IDSA explicitly recommend agents active against the most common bacterial pathogens, with the expectation that coverage will be refined once further diagnostic information becomes available.

To scientists trained in the Petty and Bone research method, it was natural to perceive that CAP, like ARDS and sepsis, comprised a disease equivalent under the Petty and Bone framework and so was also a synthetic syndrome for which any treatment could be tested using the triage technique taught by Petty and Bone. However, the history of CAP reveals that this analogy was produced by backward reasoning. Antibiotics were not first developed or tested on CAP as a symbolic cohort. They were validated individually against specific pathogens, then combined based on evidence to ensure coverage of the expected pathogens in the select clinical setting of pneumonia presenting from the community rather than from a hospital or nursing home, where different and more resistant pathogens would likely pose a risk.

CAP as a specific antibiotic selection flag was derived from bottom-up evidence and then evaluated in real-world CAP cohorts to confirm that combination therapy provided survival benefit in the diverse cases where the evidence suggested it would. This is unlike the top-down Petty and Bone approach to the research of sepsis and ARDS. Instead, it is classical research, building on durable evidence-based threads with a solid, bottom-up, and reasonably definable causal structure.

The top-down approach without evidence was the mistake of Petty and Bone. In contrast, the layered, bottom-up, evidence-grounded process justified CAP as a flag for empirical antibiotic combinations, but it did not justify applying the same symbolic logic to non-antibiotic interventions that have not been tested against individual pathogens. Treatments like corticosteroids, where safety and efficacy are highly dependent on host-pathogen immune interaction, cannot be rationally tested in an aggregated, disease-agnostic CAP cohort. In such cases, aggregation conceals the causal distinctions that should guide treatment.

The goal should not be to treat all “severe community acquired pneumonia” identically, but to learn which specific infection types benefit from immunomodulatory therapy and which are harmed. That knowledge allows bedside clinicians to make probabilistic judgments even when definitive diagnosis is not yet available.

This bottom-up process is demanding, but the alternative is the perpetual cycle of contradictory results—apparent benefit versus harm—that has characterized sepsis research for decades and resurfaced in 2025 with CAPE-COD and REMAP-CAP.

This is a textbook example of how symbolism can corrupt scientific inference. Community-acquired pneumonia originated as a valid operational category, derived from mechanistic insight into common pathogens and intended to standardize empirical antibiotic practice. Over time, the pragmatic category (CAP) was ossified into a disease surrogate (a synthetic syndrome) suitable for randomized treatment trials, which it was never intended to be. Moreover, steroid trials folded distinct cases (e.g., influenza pneumonia, pneumococcal pneumonias) into CAP cohorts without differentiating by cause, thereby creating a cohort collider: a mixture induced by symbolic thresholding rather than mechanistic consistency. This transforms a valid pragmatic framework into a causal trap, undermining rigor and reproducibility.

How readily CAP symbolism corrupted the methods used for causal inference at scale in critical care is revealing. It provides evidence that a 21st century culture of scientists, trained under the 20th century paradigm of thought-leader-guessed synthetic syndromes with ambiguous definitions, is vulnerable to accepting streamlining and simplifying modifications of research methods without formal validation. In this culture, the next generation of leaders likely expects that in a decade or so it will be their opportunity to derive the next new X.

From science to harm: the “Petty-Bone RCT”’s displacement of genuine public health trials

One of the most astonishing discoveries made while reviewing the history of the “Petty-Bone RCT” was that it was preferred in critical care science over the Bradford Hill RCT. It is easy to speculate about the reason. The Petty and Bone method uses easy-to-deploy threshold-set triage that pools many diseases, thereby providing a high n and an RCT that appears robust on the surface for grant applications. Regardless of the cause, despite decades of corticosteroid trials in CAP, trials that symbolically absorb influenza cases into broader cohorts, no large RCT has directly evaluated steroid therapy specifically in influenza pneumonia. Analyses to date rely on subgroup data embedded within CAP trials [52] or on observational studies [54, 55, 57, 58], which have suggested harm but provide no definitive evidence on whether steroids help, harm, or have no effect in this context. This is a truly extraordinary finding and evidence of the deep trust the critical care field places in its “Petty-Bone RCT” method.

One reason this is so surprising is that global influenza mortality remains substantial, with the WHO estimating 290,000 to 650,000 annual deaths from respiratory cases worldwide [60], and corticosteroids, if effective, would be an inexpensive and widely available treatment.

This critical knowledge gap despite adequate funding suggests the research priorities of critical care science have not been properly vetted by the NIH. Steroid efficacy or harm is never tested by RCT at the disease level but rather at the disease agnostic CAP level and this cannot be due to fear of harm as these cases are embedded in the CAP trials. This means that critical bedside questions remain unanswered, not for scientific or operational reasons, but because of a faulty trial structure preference.

REMAP-CAP: adaptive innovation subsumed within the “Petty-Bone RCT” mimic

The design of the REMAP-CAP trial was investigated because it offers a modern example. This was a massively ambitious and expensive trial. Its syndromic inclusion under CAP bundled patients with influenza, different bacterial infections, and other respiratory pathogens. Although these were initially stratified, they were in the end combined to generate a single global n for determination of the average treatment effect (ATE). REMAP-CAP represented a major innovation in adaptive platform trial design and international collaboration. Its modular approach enables the study of multiple interventions in real time, using Bayesian inference and frequent adaptive updates. However, REMAP-CAP inherited core structural flaws of the “Petty-Bone RCT” method.

REMAP-CAP should be praised for its logistical scale, ethical intent, and statistical flexibility. But without mechanistic anchoring, even the best adaptive platform trial becomes a “Petty-Bone RCT” mimic, procedurally sophisticated, yet causally ambiguous and trapped by a mid-20th century concept of pathophysiology. Such trials do not support scientific accumulation if the different diseases are blindly mixed under unstable labels and it becomes impossible to build upon prior work.

The alibi of the heterogeneous syndrome

In this review it is clear that critical care scientists are aware that research on critical care syndromes has not progressed as the study of real diseases has in other fields. Yet this review failed to find any deep analysis of the methods applied. Much of the discussion in high-impact journals centered on the standard concerns of trial size and power and the limitations of the endpoints used, especially the endpoint of mortality. What stood out, however, was that there was no corrective action. Rationalization universally replaced deep introspection. No limitations section of any RCT report stated that “one of the limitations of the trial was that we used the RCT streamlining modification of Petty and Bone”, nor did the authors provide any analysis or any indication that they were aware of the effect of this modification on transportability.

The rationalization has been strikingly uniform: one of the most durable defenses of failed reproducibility in sepsis trials is the repeated invocation of “heterogeneous syndromes” as an explanatory alibi. A typical example comes from Wang et al., who note that:

Notably, sepsis is a heterogeneous syndrome characterized by a vast, multidimensional array of clinical and biologic features, which has hindered advances in the therapeutic field beyond the current standards. [61]

While framed by some forward thinkers like Wang as a call toward biologically informed stratification, this rhetorical refrain more often functions as a shield. It masks the fact that a large component of the observed heterogeneity is not biological at all but procedural, introduced by the very design of Petty and Bone-style trials, which employ broad, nonspecific inclusion thresholds to maximize sample size at the expense of causal coherence.

The persistent invocation of heterogeneity as both an alibi and a problem to be solved accompanied by recurrent calls for greater homogeneity in trial design is often cited as evidence that critical care scientists are aware that synthetic syndrome (Petty and Bone) science is the problem. Yet the record shows they are not. Such appeals are largely rhetorical and ultimately hollow so long as the Petty and Bone paradigm endures and there is no call for its retirement. To the contrary task forces are doubling down in 2025 [62]. As long as patient selection remains anchored in nonspecific, threshold-based syndromic constructs, no amount of exhortation toward homogeneity can meaningfully resolve the structural instability embedded in the design itself.

Of course all patient groups in critical care will include much heterogeneity. The problem isn’t just heterogeneity itself, it’s when that heterogeneity is comprised of a mix of different diseases selected by disease agnostic triage, and then this is mistaken for biological variation. That’s what happens when scientists use a synthetic syndrome like ARDS as their inclusion gate: they are not just accepting heterogeneity, they are inducing it at the disease mix level creating a cohort level collider, and biasing their treatment estimates in the process.

Based on this analysis, to move forward, the field should stop hiding behind this curtain. True complexity demands sharper tools and an initially narrower causal focus, so that evidence is built from the ground up and then allows pragmatic combinations, as was done with antibiotics in CAP. The task is not to simplify the study of disease mixes, but to respect the causal structure of a disease, even when it is more difficult to model, less convenient to enroll, or less institutionally familiar. A severely compromised pragmatic RCT is not better than no RCT. It is worse, because it has been shown to generate false evidence-based medicine that can persist for many years, causing harm and wasting public funds.

Petty and Bone’s methods are used to train AI

This investigative review also examined the introduction and implementation of machine learning (ML) and artificial intelligence (AI) in the critical care research environment. The review found that the structural flaws of Petty and Bone’s synthetic syndrome techniques were incorporated into ML and AI models, which are trained on amendable and guessed task-force-derived criteria as ground truth. For example, efforts to generate computational phenotyping begin by accepting the latest consensus criteria, such as SOFA, rather than resolving the original misclassification [63]. The result is a class of algorithms that replicate the misclassifications, biases, and heterogeneity embedded in the original constructs.
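A toy sketch may help show why a model trained on a threshold-derived label can only reproduce that label. It assumes a hypothetical integer severity score, a hypothetical consensus cut-off of 2, and three hypothetical disease categories; the "model" is a brute-force one-feature decision stump rather than any published phenotyping algorithm.

```python
import random


def make_patients(n: int, seed: int = 0):
    """Synthetic patients: (disease, score, label). All values are hypothetical."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        disease = rng.choice(["bacterial", "viral", "noninfectious"])
        score = rng.randint(0, 8)      # hypothetical severity score
        label = int(score >= 2)        # "ground truth" is just the consensus threshold
        patients.append((disease, score, label))
    return patients


def fit_stump(patients):
    """Pick the score cut-off that best reproduces the label."""
    best_cut, best_acc = None, -1.0
    for cut in range(0, 10):
        acc = sum(int(score >= cut) == label for _, score, label in patients) / len(patients)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


patients = make_patients(1000)
cut, acc = fit_stump(patients)
pooled_diseases = {disease for disease, _, label in patients if label == 1}
print(cut, acc, sorted(pooled_diseases))
```

In this construction the stump attains perfect accuracy by rediscovering the cut-off, while the positive class still pools every disease category. High accuracy against the consensus label therefore says nothing about whether the label corresponds to a single disease.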

It is entirely reasonable, indeed essential, to study phenotypes within infection-related illness to uncover causal pathways, predict treatment response, and tailor interventions. But such work must begin from a clean epistemological slate. Efforts to stratify infection phenotypes while retaining the symbolic boundary of “sepsis” as defined by the latest X (SOFA) are fundamentally compromised.

There are forward-looking ML/AI research efforts proceeding without the synthetic syndrome boundaries [64]. This is true progress. However, until the need to disconnect from the task-force-defined synthetic syndrome thresholds is widely promulgated by the task forces themselves, many data scientists will remain unaware of it, will still rely on task force guidance, and will continue to train ML and AI models using the synthetic syndromes, at the risk of wasting their young careers.

Comparison of “Petty-Bone RCT” mimic vs. Bradford Hill RCT

The following Table 3 contrasts the structural and epistemic features of a classical Bradford Hill RCT with those of a “Petty-Bone RCT” mimic. While the former derives from a single mechanistically defined disease process, the latter depends on mutable, consensus-based thresholds and unstable disease mixtures.

Table 3. Comparison of a Bradford Hill RCT with the “Petty-Bone RCT” mimic

| Feature | Bradford Hill RCT | “Petty-Bone RCT” mimic |
| --- | --- | --- |
| Inclusion Basis | Single, mechanistically defined disease | Threshold-defined synthetic syndrome |
| Causal Anchoring | High: based on known pathophysiology | Absent or weak: based on symptoms/thresholds |
| Exchangeability | Maintained through narrow, mechanistic selection | Violated by mixing heterogeneous diseases |
| Randomization Validity | Internally and externally valid (given proper design) | Internally valid only for selected mixture (S = 1) |
| Transportability | Often transportable across similar populations | Not transportable: cohort-specific effects |
| Treatment Consistency | One intervention ↔ one mechanism | One intervention ↔ multiple mechanisms |
| Phenotype Definition | Stable and biologically meaningful | Mutable and consensus-driven |
| Interpretability | Clear inference about treatment effect | Causal ambiguity due to latent mixture |
| Use of DAGs or cSM | Optional but compatible | Essential for identifying structural flaws |
| Reproducibility | High if assumptions are met | Low due to unstable patient mixtures |

Why did RCT mimicry persist unnoticed until now?

At first glance, these RCTs look valid, and they fooled the best statisticians because randomization creates balance within the enrolled group, giving the impression of scientific rigor. This internal validity sustained credibility for fifty years. But clinicians were seeing the other side: the results failed when applied to future patients outside the first trial.

The DAGs explain the causal paths, but to connect them to the trials themselves, the “Petty-Bone RCT” was examined with do-calculus. First, however, it is important to present what the Bradford Hill RCT, critical care’s “belief RCT”, estimates.

In a traditional RCT, the average treatment effect (ATE) is simple.

This is a comparison of the average outcome if everyone in the trial got the treatment (T = 1) versus if everyone did not (T = 0). That difference tells us the treatment effect, because the trial randomization balances all the patient factors.

Mathematically:

ATE = E[Y | do(T = 1)] – E[Y | do(T = 0)]

where: E denotes the expectation operator, meaning the average outcome across the population and Y is the outcome.
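As a minimal sketch of this definition, the simulation below assigns treatment by coin flip, which plays the role of do(T), and estimates the ATE as a difference in mean outcomes. The continuous outcome, the noise model, the function name `simulate_rct`, and the true effect of -0.5 are all hypothetical choices made for illustration.

```python
import random


def simulate_rct(n: int, true_effect: float, seed: int = 0) -> float:
    """Estimate the ATE as mean(Y | T = 1) - mean(Y | T = 0) under randomization."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(0.0, 1.0)   # unmeasured patient factor affecting the outcome
        t = rng.random() < 0.5           # coin-flip assignment, independent of baseline
        y = baseline + (true_effect if t else 0.0) + rng.gauss(0.0, 1.0)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


estimate = simulate_rct(n=20000, true_effect=-0.5)
print(f"estimated ATE: {estimate:+.3f}")
```

Because assignment is independent of the baseline factor, the difference in means approximates the assumed true effect without any adjustment, which is the property the Bradford Hill design relies on.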

In contrast, in the Petty and Bone framework, the observed trial ATE is cohort-specific and depends on the composition of the mix of different diseases within the synthetic syndrome (S = 1). It can be expressed as:

ATE_{S = 1} = Σᵢ P^{(a)}(Dᵢ | S = 1) × βᵢ

• ATE_{S = 1}: This represents the Average Treatment Effect within the cohort of patients selected by the synthetic syndrome definition (S = 1). It is the overall treatment effect reported by the trial.

• D₁, D₂, … , Dᵢ: These are the distinct underlying diseases (e.g., bacterial pneumonia, viral pneumonia, aspiration, pancreatitis, trauma-related ARDS) that are grouped together under the broad, nonspecific threshold criteria that define the synthetic syndrome S.

• βᵢ: This denotes the true average treatment effect for disease Dᵢ specifically. Each disease may have a different responsiveness to the intervention.

• Σᵢ (the summation sign): This indicates that the overall ATE is not calculated for a single disease, but as a weighted average across all diseases included in the cohort. The sum runs over all possible diseases {D₁, D₂, … , Dᵢ} that could be admitted by the trial thresholds.

• P^{(a)}(Dᵢ | S = 1): This is the proportion (probability) of patients with disease Dᵢ among those included in the trial (S = 1). The superscript ‘a’ emphasizes that this distribution is specific to the actual cohort enrolled in a given trial. It is not universal, and it will differ across trials even when the same thresholds are used.

This indicates that the treatment effect estimated within the trial (ATE_{S = 1}) is not the effect of the treatment on a single disease. Instead, it is a weighted average of the true effects (βᵢ) across all distinct diseases {D₁, D₂, … , Dᵢ} that are admitted under the synthetic syndrome threshold S = 1. The weights are the relative proportions of each disease within the trial cohort, P^{(a)}(Dᵢ | S = 1).

This means that the trial does not identify a stable causal effect of treatment on a single disease entity. Instead, it reflects a cohort-dependent mixture of effects, which can shift unpredictably depending on the disease proportions admitted by the thresholds. Thus, the observed effect is inherently unstable and non-transportable even to other patients outside the trial that meet the synthetic syndrome criteria.
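The weighted-average form can be sketched from the law of total expectation. The derivation below assumes that the diseases Dᵢ are mutually exclusive and exhaustive within the enrolled cohort, that randomized assignment does not alter which disease a patient has, and that βᵢ is read as the disease-specific effect among enrolled patients. These assumptions are stated here for illustration and are not spelled out in this form in the source.

```latex
\begin{align}
E[Y \mid do(T = t), S = 1]
  &= \sum_i P^{(a)}(D_i \mid S = 1)\, E[Y \mid do(T = t), D_i, S = 1] \\
\mathrm{ATE}_{S=1}
  &= E[Y \mid do(T = 1), S = 1] - E[Y \mid do(T = 0), S = 1] \\
  &= \sum_i P^{(a)}(D_i \mid S = 1)\,
     \bigl( E[Y \mid do(T = 1), D_i, S = 1] - E[Y \mid do(T = 0), D_i, S = 1] \bigr) \\
  &= \sum_i P^{(a)}(D_i \mid S = 1)\, \beta_i
\end{align}
```

As a hypothetical worked case, with two diseases having β₁ = −0.04 and β₂ = +0.06 and cohort proportions of 0.9 and 0.1, the trial would report 0.9 × (−0.04) + 0.1 × 0.06 = −0.030, a figure that describes neither disease.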

Implications

  1. S is not a disease: It is a symbolic placeholder that collapses multiple heterogeneous diseases into one category.

  2. Trial-to-trial instability: Because the mix of Dᵢ’s changes between trials, the reported ATE_{S = 1} will also change.

  3. Non-transportability: Results cannot be generalized outside the specific cohort composition, since a different mixture of Dᵢ’s produces a different weighted average.

Why can the “Petty-Bone RCT” appear valid yet fail transportability to the bedside?

To understand why the “Petty-Bone RCT” looks valid, its mathematical structure must be examined. In causal inference, do(T) means “intervening to assign treatment T, rather than merely observing it.” In a randomized controlled trial (RCT), randomization is equivalent to do(T), ensuring internal validity. Thus, at first glance, the “Petty-Bone RCT” appears indistinguishable from a Bradford Hill RCT:

P(Y | do(T), S = 1) = P(Y | T, S = 1)

This expression shows that, within the trial cohort (S = 1), randomization functions as intended: because treatment T is randomly assigned, the probability of an outcome Y under do(T) equals the probability of observing Y given T.
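This internal validity can be illustrated with a small simulation. It assumes a two-disease cohort with hypothetical proportions, baseline mortality risks, and disease-specific effects; the function `run_trial` is an illustrative construction. Treatment is randomized within the pooled cohort, and the observed risk difference is compared with the mixture-weighted effect for that same cohort.

```python
import random


def run_trial(mix, betas, base_risk, n, seed=0):
    """Randomize treatment within a pooled cohort; return the observed risk difference."""
    rng = random.Random(seed)
    diseases = list(mix)
    weights = [mix[d] for d in diseases]
    deaths = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        d = rng.choices(diseases, weights=weights)[0]   # latent disease of enrolled patient
        t = 1 if rng.random() < 0.5 else 0              # randomized assignment, i.e. do(T)
        risk = base_risk[d] + betas[d] * t              # disease-specific treatment effect
        counts[t] += 1
        deaths[t] += rng.random() < risk
    return deaths[1] / counts[1] - deaths[0] / counts[0]


# All values below are hypothetical.
mix = {"bacterial": 0.9, "influenza": 0.1}
betas = {"bacterial": -0.04, "influenza": 0.06}
base_risk = {"bacterial": 0.20, "influenza": 0.20}

observed = run_trial(mix, betas, base_risk, n=200_000)
expected = sum(mix[d] * betas[d] for d in mix)
print(f"observed {observed:+.4f} vs mixture-weighted {expected:+.4f}")
```

The trial recovers the mixture-weighted effect for its own cohort, which is why nothing looks wrong from inside the trial. The estimate is correct for S = 1 as enrolled and silent about each disease taken separately.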

The hidden distortion

More precisely, however, the directed acyclic graph (DAG) of Fig. 2 reveals that S is not a disease but a placeholder that is not in the causal path. Instead the threshold set X is in the causal path so the proper structural relation is:

P(Y | do(T), X ∈ Threshold Range) = P(Y | T, X ∈ Threshold Range)

When S is treated as if it were a disease and X is misinterpreted as diagnostic of that disease, the trial reduces back to the familiar expression:

P(Y | do(T), S = 1) = P(Y | T, S = 1)

On the surface, this looks to any statistician just like a standard RCT.

The case-mix problem

The flaw emerges only when one learns the details and mathematical effect of Petty and Bone’s historical “streamlining” modification of the RCT made in the last century. The innocent appearing error is that a “Petty-Bone RCT” has participant selection by non-specific triage thresholds. Patients with different diseases are pooled into one synthetic cohort.

Given this apical error, the ATE formula is very different. In any given setting (a):

ATE_{S = 1}(a) = Σᵢ P^{(a)}(Dᵢ | S = 1) × βᵢ

But in a future setting (b):

ATE_{S = 1}(b) = Σᵢ P^{(b)}(Dᵢ | S = 1) × βᵢ

If the proportions of diseases D₁, D₂, … , Dᵢ differ between settings, and if treatment effects βᵢ vary across diseases, then in general:

ATE_{S = 1}(a) ≠ ATE_{S = 1}(b)

Thus, even though randomization preserves internal validity, the trial’s effect estimate is a weighted blend of different diseases that changes with context.
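The dependence on case mix can be checked with a few lines of arithmetic. The sketch below holds the disease-specific effects βᵢ fixed and changes only the cohort proportions between two settings. The helper `cohort_ate` and all numbers are hypothetical.

```python
def cohort_ate(mix: dict, betas: dict) -> float:
    """Mixture-weighted ATE: sum over diseases of P(D_i | S = 1) * beta_i."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("disease proportions must sum to 1")
    return sum(mix[d] * betas[d] for d in mix)


# Hypothetical disease-specific effects on mortality risk (negative = benefit).
betas = {"bacterial": -0.04, "influenza": 0.06}

setting_a = {"bacterial": 0.9, "influenza": 0.1}   # hypothetical mix in the original trial
setting_b = {"bacterial": 0.3, "influenza": 0.7}   # hypothetical mix in a later setting

ate_a = cohort_ate(setting_a, betas)
ate_b = cohort_ate(setting_b, betas)
print(f"setting (a): {ate_a:+.3f}   setting (b): {ate_b:+.3f}")
```

With these assumed values the pooled effect changes sign between settings although no disease-specific effect has changed, which is one way an initially "positive" trial can be followed by one showing harm.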

Summary

“Petty-Bone RCT”s look valid to statisticians because randomization balances treatment and control within the enrolled cohort. But to clinicians—or to anyone attempting replication—the flaw becomes evident: the trial results do not transport.

The effect is always contingent on the local disease mix, which shifts from trial to trial and setting to setting. This explains why initial “positive” synthetic-syndrome RCTs are so often contradicted by subsequent ones, sometimes even showing harm.

In short:

- Internally valid: randomization works inside the cohort.

- Externally fragile: the synthetic-syndrome construct ensures non-transportability.

Therefore, synthetic-syndrome RCTs should be abandoned. Clinical protocols should not be adopted from such trials unless causal evidence supports that the treatment effect is transportable across disease mixtures.

Task force consensus as futile science mimicry

This review indicates that the remarkable durability of the “Petty-Bone RCT” tradition—its failure to self‑correct despite repeated negative lessons—derives from a structural dependence on ever‑revised task‑force consensus definitions for syndromes and trial inclusion criteria. This reliance confers the appearance of science’s self‑corrective mechanism while preserving the original error. Investigators routinely cite the latest task‑force thresholds and often situate their designs within the lineage of Petty, Bone, and colleagues, thereby anchoring contemporary protocols to historical authority and conferring an aura of currency and legitimacy.

Within the consensus-driven framework established since Bone, rooted in threshold-based triage, multiple and mutually inconsistent approaches are treated as acceptable. Recall that in Sepsis-3 [44], a set of thresholds (SOFA), an organ-failure score first guessed in 1996, was reframed in 2016 as the new standard sepsis RCT triage threshold set. In 2024, the pediatric sepsis task force derived the Phoenix Sepsis Criteria [62], using a hybrid of 21st-century machine learning and the 1960s Delphi expert-convergence method from management science [65]. Here a central authority, in 2025, once again dictates threshold-based enrollment for the study of severe infection in children, despite decades of disappointing experience with analogous methods in adults.

As demonstrated using do-calculus and directed acyclic graphs (DAGs), this is science mimicry. It scarcely matters how thresholds are derived: each new threshold set is what Irving Langmuir described as an unrecognized apical error, one introduced by respected experts that corrupts downstream research. Here, the apical error is the aggregation of different diseases into a single synthetic construct in the first place.

Once embellished by global consensus and treated as unassailable, the initial mistake propagates through study design, trial interpretation, funding mandates, textbooks, and clinical guidelines. This is the transition from pathological science to “pathological consensus” [66]. The consequence is a cascade of misleading conclusions emanating from a flawed premise and perpetuated worldwide across generations of trainees.

Because task forces define both the inclusion thresholds (X) and the clinical state (Z), they instantiate a self‑reinforcing structure. Once enshrined, these definitions constrain subsequent designs, choke off innovation, and impede mechanistic reasoning. The resulting RCTs reproduce the procedures of science while lacking the foundations and healthy social structure that confer genuine self‑correction.

Task force defined synthetic syndromes are progressively elevated to the highest order of evidence

This review of the literature made clear that the synthetic syndrome, whether sepsis, ARDS, or CAP, represents not merely a scientific misstep, but a deeper institutional failure: the transformation of pathological science into pathological consensus. Over time, high-impact journals progressively elevated these task-force-defined constructs to the highest order of evidence. They became fixed symbolic categories dominating trial design, clinical protocols, grant funding, and global critical care research standards. Particularly in the example of sepsis, this ascendancy was not the result of discovery, but of expert opinion, artfully embellished by authoritative prose.

Despite their lack of causal grounding, these labels became normalized through promulgation, not verification, and institutionalized by international task forces and guidelines. The result was a pathological consensus: an illusion of scientific advancement built on symbolic coherence rather than validated theory or experiment. Critical care science proved unable to self-correct precisely because it was controlled by well-intentioned but epistemologically naïve task forces, regularly reaffirmed in periodic position statements. Perhaps the most impactful force was that this central control was endorsed by funding bodies. The consequence was a globally funded artifact, insulated from dissent and immune to correction by virtue of complete institutional and societal backing.

Harmonizing causal modeling and clinical statistics in trial design

Reversing this process requires confronting symbolic authority with causal humility. This entails routinely analyzing planned trials with causal symbolic modeling (cSM) and, in some instances, docalculus, while posing a basic but longoverdue question to the National Institutes of Health (NIH), the European Research Council (ERC), and the Medical Research Council (MRC): Why has the 40yearold “Petty-Bone RCT” modification—funded with public resources and marked by decades of consistent nonreproducibility—never been formally interrogated with cSM or any rigorous causal methodology, let alone validated?

Closing this gap requires a new academic alliance that bridges methodological cSM with clinical research statistics. At present, even gold-standard review frameworks fail to address the core problem. For example, despite its otherwise comprehensive reporting standards, CONSORT 2025 [67] contains no requirement for cSM or for the validation of clinical measurement standards. Under its current criteria, a Petty and Bone-style RCT mimic, complete with its cohort collider and lacking causal integrity, could still be judged methodologically sound. This loophole has allowed fatally flawed trial designs to gain legitimacy through procedural compliance alone and to persist for decades.

To safeguard both patient safety and scientific integrity, CONSORT 2025 should be amended to require explicit validation of causal structure and prespecified cSM. Based on the discoveries presented here—likely the proverbial canary in the coal mine—uniting the cSM and clinical research statistics communities to address this omission is not optional but an urgent matter of public interest.

Terminological reform through causal precision

It must be acknowledged that in many cases, especially early in presentation, the specific cause of severe infection may not be clear. In such contexts, terms like sepsis or septic shock retain clinical utility as provisional descriptors of physiologic severity. However, these terms cannot serve as standalone causal synthetic syndromes in clinical trials or scientific lexicons. Terms like “syndrome” misled statisticians into assuming disease equivalence in the Bradford Hill sense.

The process of deconditioning clinicians and trialists from symbolic syndrome science begins with language [68–70]. The term “ARDS” is not simply a label; it is a symbol that encodes the illusion of a unified disease entity [68].

To dismantle the illusion that ARDS is a disease equivalent, critical care scientists should begin by replacing the synthetic syndrome with precise, mechanistic descriptors:

• “ARDS” becomes:

  • Severe COVID-19 pneumonia

  • Severe Influenza A pneumonia

  • Severe Post-aspiration respiratory distress

Each of these is not just a semantic alternative; it is a causally informative category, capable of guiding more valid trial design, mechanistic hypothesis generation, and therapeutic targeting. These can be further divided by phenotype where this proves useful for treatment or prognosis. Many times the cause will be uncertain, but by studying the treatment of these diseases from the bottom up, the clinician can make probabilistic judgments about treatment rather than routinely follow one-size-fits-all protocols, which have failed so catastrophically in the past.

The first structural change must be a lexical shift, yet the critical care community remains far from undertaking it. In 2023, Farkas taught, in the widely read EMCrit Project, that “ARDS is not a real thing” [71, 72]. In remarkable contrast, a year later, Matthay MA et al. (with almost more authors than one can count) published “A New Global Definition of Acute Respiratory Distress Syndrome” in the American Journal of Respiratory and Critical Care Medicine [48]. Both positions cannot be correct. The cSM analysis in this review clearly aligns with Farkas’ insightful critique.

The authors of the Global Definition Task Force [48] harkened back to the 1960s to secure a scientific foundation with the statement, “The authors dedicate this report and the work of this committee to Thomas L. Petty, M.D., and John F. Murray, M.D., who provided foundational contributions to recognizing and studying ARDS.” The authors ignore the fact that Murray [73] dissented in 1975, at first thoughtfully rejecting ARDS, while Petty at that same time confessed to being a “lumper” [74]. The task force provided no evidence that ARDS is anything more than Petty’s original guessed construct.

Yet the significant work performed by this task force is clear. As a clinical document, their work is useful. ARDS has a place as a severity construct and for awareness and education, but the task force should make it clear by amendment now that ARDS is a synthetic syndrome that is not suitable as a disease equivalent for RCTs.

Precision in language by experts restores a focus on mechanism, clears the fog of symbolic aggregation, and opens a pathway toward causally sound science. The blurring of the lines between heuristic based, useful clinical constructs and science has been a fundamental problem in critical care.

From a scientific perspective, replacing symbolic terms such as “ARDS” in RCTs with 21st-century, mechanistically grounded, pathophysiologically specific descriptors is not a semantic adjustment—it is a fundamental reorientation of clinical thought away from a 50-year-old hypothesis and toward true causation. Deconditioning begins not only by critiquing the old framework but also by deliberately replacing its language with terms that reflect causal reality. Providing this clarity is the responsibility of the task forces.

As mechanistic knowledge advances, and as time-series modeling becomes accepted and applied independent of consensus-based synthetic syndromes, the field will move toward a deeper understanding of the diverse trajectories of disease—encompassing source, host response, and pathophysiologic phases.

Global harm: synthetic syndromes and intellectual colonization

One thing that was evident from this historical review was that the global expansion of task force–defined constructs such as ARDS and sepsis represents not merely a diffusion of terminology, but a deeper phenomenon: intellectual colonization. Originating in Western academic centers, these synthetic syndromes were exported to low- and middle-income countries (LMICs) under the banner of evidence-based standardization. In many regions, this served a valuable purpose: raising awareness of severe infection and emphasizing early intervention. But these syndromes arrived with their epistemological flaws intact: threshold-based definitions, symbolic constructs mistaken for mechanistic categories, and trial designs that mimicked RCTs while embedding cohort-level collider bias and procedural artifacts.

As researchers in emerging centers sought to contribute to global knowledge, they often adopted these frameworks in good faith, pursuing what they believed were valid discoveries, unaware that the constructs themselves were simply guessed without science by clinicians in the US and then advanced as standards over the decades by US and European leadership. This was not merely a passive transfer of pathological science, but one amplified by the dogma and demands of international funding bodies, which embedded their synthetic syndromes into trial eligibility criteria, protocol templates, and funding language. The desire to engage in impactful science, coupled with the need to align with fundable paradigms, created a self-perpetuating system in which the form of scientific rigor was preserved, but its foundation was hollow.

In this way, synthetic-syndrome science has scaled globally, not through causal discovery, but through semantic enforcement: a worldwide rehearsal of inherited error, repeated in the language of standardization.

Severe COVID-19 pneumonia and the illusion of evidence via syndrome inclusion

This review demonstrated how, with the REMAP-CAP RCT, the focus on synthetic syndromes can displace funding for critical disease-focused research. However, this review also uncovered another downstream clinical consequence of undue deference to synthetic syndromes: false evidence-based medicine (FEBM). The record shows that one example of FEBM emerged during protocolized ventilator treatment of severe COVID-19 pneumonia based on PaO₂/FiO₂, and this offers a revealing case study in how synthetic syndromes can distort both inference and clinical practice.

Although SARS-CoV-2–related pneumonia did not exist when the Berlin ARDS criteria were formulated in 2012, it met those inclusion thresholds. To those holding the synthetic syndrome paradigm, it seemed rational to consider severe COVID-19 pneumonia as ARDS because they considered ARDS a “heterogeneous syndrome.” It therefore made sense to them (although to no one else) that even a disease that did not exist when the task force defined the criteria would be included in ARDS if it met those criteria. Bain et al. [75] stated:

Whereas ARDS is a heterogeneous clinical entity resulting from various direct or indirect pulmonary insults, the severe hypoxemic respiratory failure of COVID‑19 pneumonia aligns with this framework and thus can be considered a variant of ARDS.

This is circular logic built on the language of the synthetic syndrome: ARDS is heterogeneous, so a new disease fits in. The causal path was never considered, or even present. The mistake went unseen because Petty and Bone science is not rooted in causal modeling.

The problem is that the Berlin definition recommended initiation of ventilator treatment based on PaO₂/FiO₂ even beginning at the 200–300 mm Hg stratum, not strictly waiting until PaO₂/FiO₂ ≤ 200 mm Hg. Tobin argued that the ARDS diagnosis itself is dangerous because it triggers ARDS protocols:

A diagnosis of ARDS serves as a pretext for several perilous clinical practices. Present guidelines recommend 4 ml/kg, which foments severe air hunger, leading to prescription of hazardous (yet ineffective) sedatives, narcotics and paralytic agents. [68]

The surest way to increase COVID‑19 mortality is liberal use of intubation and mechanical ventilation. [76]

The experts trained in 1960s synthetic syndrome science believed they were applying the logic of science when they concluded that, since severe COVID-19 pneumonia generated X (the standard task force–defined threshold breach), it is S (ARDS) and is properly treated with T (ventilator treatment tested by RCT for ARDS). Ironically, while empirical pharmacologic treatments (e.g., hydroxychloroquine, antivirals, immunomodulators) were, quite reasonably, initially dismissed by experts as unproven for lack of RCT validation, much more dangerous mechanical ventilation protocols were affirmed as “evidence based” and the standard of care solely on the basis of syndrome label matching.

This reveals the core clinical hazard of relying on synthetic syndromes like ARDS, which goes beyond the hazard of their use as placeholders for trial inclusion: they allow causal legitimacy to be transferred symbolically. COVID-19 thus exposed the facade of evidence-based medicine when it rests on syndromic continuity rather than mechanistic inquiry. When those ventilator protocols failed to deliver expected outcomes, the response was not to reassess the structural assumptions of ARDS research and of ARDS as a causal entity, but instead to initially hold the line and argue the evidence basis of the treatment and later, in resignation due to patient loss, to create a new “disease,” COVID-ARDS (CARDS) [77].

This maneuver, retrofitting a new label with a modifier to preserve the old framework, is telling. If COVID-ARDS warrants its own label due to distinct behavior, why is there no pancreatitis-ARDS, trauma-ARDS, or aspiration-ARDS? And if each of these diseases behaves differently, then what causal coherence does “ARDS” retain? Critical care science should have listened to John Murray [73] when he, debating Petty [74], pointed out as much in the 1970s. Murray, a great leader of pulmonary science who unfortunately was finally convinced by Petty, ironically died of COVID-19 pneumonia in 2020, which his colleagues in Italy almost certainly misconstrued as Petty’s ARDS.

The inheritance from 20th-century critical care included pathological science

While the findings of this review clearly indicate a need for a revolution in critical care science, it is important to acknowledge that it is likely that no one alive today conceived the synthetic syndrome and “Petty-Bone RCT” framework. That architecture was developed by critical care’s mentors’ mentors, clinicians and scientists of the mid to late 20th century, in the wake of Bradford Hill’s introduction of the randomized controlled trial. These were dedicated scientists working with no formal tools for structural causal modeling. Under immense clinical pressure from deadly diseases like Da Nang Lung and Group A Streptococcal bacteremia, they crafted constructs like ARDS and sepsis as similarity heuristics, practical means to classify, communicate, and act in the face of physiological crisis.

The current generation of clinicians, trialists, and educators were taught these constructs and, in turn, taught their mentees that these constructs were valid foundations for causal research. Popper stated in his debate with Kuhn [78]:

The ‘normal’ scientist, as described by Kuhn, has been badly taught. He has been taught in a dogmatic spirit: he is a victim of indoctrination. He has learned a technique which can be applied without asking for the reason why.

Popper has perfectly described the young scientists trained in the present Petty and Bone era. They inherited the teachings not as hypotheses, but as dogma, consensus-based definitions assumed to be neutral, objective, and rigorous. This is also how many taught their mentees. This must be fixed now lest they waste their careers studying the latest task force derived X.

This review has shown that these synthetic syndromes are symbolic representations of severity rather than coherent mechanistic states. Nowhere has their aesthetic clarity been more seductive than in the ICU, where such constructs seem to unify the chaos of diverse disease processes into a single, triage-ready label.

But aesthetic clarity is not causal clarity. The time has come to recognize and formally promulgate that some of the inheritance of critical care science, like the inheritance of many fields of science in the past, was pathological science.

Ethical considerations and origins of science mimicry

The ethical implications of enrolling patients in structurally flawed RCT mimics demand critical scrutiny. Participants in these studies are exposed to interventions within a framework that lacks the causal structure needed for generalizable knowledge. They do so under the premise of scientific rigor, yet the trial design may ensure that their data cannot answer the clinical question it purports to address. This undermines the principle of beneficence, a cornerstone of ethical research.

Importantly, this mimicry is not the product of willful misconduct or bad intent. Rather, it arises from a long-standing, self-reinforcing paradigm, a form of collective indoctrination. Clinician-researchers, guideline developers, and reviewers are socialized into a belief system that treats synthetic syndromes as valid categories for causal inference. This belief structure resists challenge not through evidence, but through institutional inertia and professional consensus. The time has come for the deep introspection the public expects and deserves.

The present state: RCT mimicry remains the standard

The synthetic paradigm shift of 2016 left the Petty and Bone mimicry essentially unaltered. As noted, in 2024, the pediatric sepsis task force followed the same path, producing the Phoenix Sepsis Criteria [62], a framework strikingly similar to the 1996 expert-guessed thresholds known as SOFA, codified for adults in Sepsis-3 [44]. According to the task force’s own publication, the Phoenix thresholds were developed using a combination of Delphi consensus and machine learning (ML), lending an appearance of originality. Yet the result was so similar to SOFA, including, for example, the identical platelet threshold of 100, that the appendix was examined; it revealed that SOFA had been used for training. Here, the ML of the new task force was literally trained by the old task force.

Some have argued that the critical care research community is aware of these structural flaws. The evidence does not support that assertion. “Petty-Bone RCT”s continue to be published—most notably in high-profile platforms such as REMAP-CAP—and the pediatric sepsis criteria published in 2024 for RCT enrollment altered X (the set of threshold-based criteria) in essentially the same manner adult sepsis investigators had done eight years earlier [62]. There is no indication that the task forces recognize that X, as constructed from threshold criteria, functions as a cohort collider; changing X in this way only mimics self-correction. If they are aware of this flaw, they have not communicated it to the broader synthetic-syndrome research community.
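The collider claim above can be made concrete with a toy simulation. The sketch below is a generic illustration of selection on a threshold, not a reproduction of this review's cSM analysis. All names and numbers are hypothetical: `disease` is a binary underlying cause, `host` is an independent host-response score, and enrollment requires a combined severity score to exceed a threshold (standing in for X). The two causes are independent in the full population but become correlated once the cohort is restricted to those meeting the threshold.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate_threshold_enrollment(n=20000, threshold=1.0, seed=0):
    """Return (corr in everyone, corr among enrolled, number enrolled).

    Assumptions (hypothetical, for illustration only):
      - disease ~ Bernoulli(0.5), host ~ Normal(0, 1), generated independently
      - severity score = 1.5 * disease + host
      - a patient is enrolled when the score exceeds `threshold`
    """
    rng = random.Random(seed)
    disease, host = [], []
    for _ in range(n):
        disease.append(1 if rng.random() < 0.5 else 0)
        host.append(rng.gauss(0.0, 1.0))

    # Conditioning on the threshold: keep only patients whose score exceeds it.
    enrolled = [(d, h) for d, h in zip(disease, host) if 1.5 * d + h > threshold]

    corr_all = pearson(disease, host)
    corr_enrolled = pearson([d for d, _ in enrolled], [h for _, h in enrolled])
    return corr_all, corr_enrolled, len(enrolled)


if __name__ == "__main__":
    corr_all, corr_enrolled, n_enrolled = simulate_threshold_enrollment()
    print(f"correlation in full population: {corr_all:+.3f}")
    print(f"correlation among enrolled:     {corr_enrolled:+.3f} (n={n_enrolled})")
```

In this toy setup the correlation is near zero before selection and clearly negative after it, which is the signature of conditioning on a common effect. Moving the threshold changes both who is enrolled and the strength of the induced association, but does not remove it.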

Given the historical influence of task forces in defining threshold-based syndromic constructs for ARDS and sepsis, there is an obligation to act with full transparency. Meaningful reform must be intentional—led either by the very organizations that codified the original methodological error in 1992 or, failing that, by a new generation of investigators unwilling to perpetuate fatally flawed approaches. These groups are uniquely positioned to replace threshold-based enrollment criteria with designs that ensure causal identifiability and transportability. They should correct the error with the same force and visibility with which they once promoted Sepsis-3 from the lectern and in leading journals. These methodological oversights arose collectively, reflecting prevailing consensus at the time rather than isolated individual decisions.

Funding bodies share this responsibility. Many once required adherence to consensus-based definitions, embedding them into trial protocols, grant requirements, and eligibility criteria. These same institutions should now take the lead in dismantling their own methodological legacy, redirecting resources toward research paradigms that privilege mechanistic understanding over symbolic uniformity, even if that entails more complex enrollment strategies and larger multicenter collaborations.

Critically positive contributions to public health from the synthetic syndrome era

The didactic benefit of synthetic syndromes has been substantial. The number of lives saved through early intervention, driven by syndrome-focused education and awareness in sepsis, is profound. Moreover, genuine advances have emerged from the past fifty years of synthetic syndrome–derived research, particularly in ventilator management—such as the demonstrated survival benefit of avoiding high tidal volumes and the efficacy of prone positioning.

Yet these lessons could have been learned without the synthetic-syndrome error and without false “evidence-based” mandates such as uniform PEEP tables, and the many subsequently reversed sepsis protocols.

The field now has the opportunity to lead again—this time toward genuine epistemological reform. This will require the courage to say, publicly and unequivocally: “We fooled ourselves, but we will not be fooled again.” However, the synthetic syndrome paradigm has very deep roots, and as Thomas Kuhn teaches, convincing deeply entrenched scientists to change requires presenting a viable alternative path. The integration of causal symbolic modeling (cSM) into trial development offers such a path, as no scientist employing cSM would persist in conducting “Petty-Bone RCT”s.

cSM offers the necessary framework for this transition—a structured representation of variables, causal relationships, and selection processes, typically formalized through equations and directed acyclic graphs (DAGs). Within this architecture, do-calculus provides the formal rules for determining whether causal effects are identifiable or transportable. DAGs, by virtue of their visual clarity, allow clinicians to readily grasp causal structures and thereby facilitate deeper collaboration with statisticians in both the critique and design of studies.
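As a worked illustration of what “transportable” means formally, the standard transport formula from the do-calculus literature can be written for a disease-mixture setting. The symbols are assumptions introduced for this sketch and are not notation taken from this review: T is treatment, Y is outcome, D is the underlying disease (assumed to be fixed before treatment), X = 1 denotes meeting the threshold set, and P and P* denote the trial and target populations.

```latex
% Trial estimate: a mixture over the diseases D admitted by the threshold set X.
% Because D and X precede treatment, P(d | do(t), X=1) = P(d | X=1).
P(y \mid do(t), X{=}1) \;=\; \sum_{d} P(y \mid do(t), d, X{=}1)\, P(d \mid X{=}1)

% Transport to a target population P^{*}: valid only if D is measured and the
% disease-specific effects P(y | do(t), d, X=1) are invariant across populations.
P^{*}(y \mid do(t), X{=}1) \;=\; \sum_{d} P(y \mid do(t), d, X{=}1)\, P^{*}(d \mid X{=}1)
```

Under these assumptions, re-weighting requires the disease-specific terms on the right-hand side. A trial that reports only the pooled left-hand side of the first line, without measuring D, gives a reader no way to recompute the effect for a different case mix.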

As Judea Pearl notes in The Book of Why [79], one of the great benefits of cSM is that its implementation fosters closer intellectual partnership between clinicians and statisticians throughout the entire process of analysis and trial design.

Knowledge (and science) monopolies and the “Petty-Bone RCT”

Harold Innis [80] describes the use of language and control of media to monopolize knowledge sourcing. In the field of critical care, these media are the journals, textbooks, and taskforce promulgations. Critical care taskforces extend this control to define acceptable standards of scientific discourse by selecting the rules that must be followed to harmonize the world’s synthetic syndromes. This is perceived as necessary because the synthetic syndromes have always been generated by the taskforce without science in the first instance.

This review demonstrates that a science monopoly can form around a simple apical methodological error made by an expert and never recognized. The monopoly is built on Langmuir’s pathological science [81] and evolves into pathological consensus. The unknowingly erroneous paradigm comes to dominate the terms of legitimacy. In critical care, the “Petty-Bone RCT” has long held that position, controlling funding, journals, and guidelines. Its survival does not come from success, but from the way it absorbs contradiction into endless threshold debates and taskforce updates. Failures that should provoke deep critique are reframed as technical glitches, and the monopoly continues intact.

The “Petty-Bone RCT” framework functions as a similar monopoly within critical care science. By anchoring decades of clinical research to synthetic syndromes (ARDS, sepsis, CAP) defined by consensus thresholds, thought leaders institutionalized a paradigm where trial design was constrained within pre-set symbolic constructs. This created a self-reinforcing cycle: new task forces reiterated and re-legitimized old definitions, and young investigators were trained to equate threshold-based mimicry with scientific rigor. Competing epistemologies—bottom-up pathogen-specific inquiry, mechanistic modeling, or causal symbolic modeling—were marginalized or rendered invisible within the dominant paradigm.

Change accelerates when clinicians state plainly what many suspect: that Petty and Bone trials cannot yield transportable causal estimates. Most importantly, clinicians must not participate in or tolerate synthetic discourse from the lectern or journals. To the expert clinician, the synthetic debate as to whether severe COVID pneumonia is included in the latest Global Definition of ARDS is preempted by the disclosure that ‘ARDS is not a thing.’ When reframed not as an abstract academic dispute but as a matter of patient safety, the urgency of reform becomes undeniable.

Like Innis’s scribes and media elites, the Petty and Bone era’s gatekeepers established a monopoly of scientific method, effectively controlling what counted as valid evidence. The endurance of this monopoly explains why decades of RCT mimicry persisted unchallenged despite repeated failure to produce transportable causal knowledge. The culture of synthetic debate exemplifies the defensive retreat of a thought monopoly, where critique of first principles is deflected and energy is absorbed with little deep thought.

As one example, a 2023 study demonstrated that only 6% of single-center critical care RCTs published in major journals are reproducible, with some reversed for harm [3], yet this is largely deflected as due to sample size or dichotomization, with no deep failure mode analysis. A list of excuses is offered for why each RCT was not reproducible, and the field seems to cycle through them. The “Petty-Bone RCT” methodology is never considered.

Monopolies resist gradual erosion but are vulnerable to visible shocks. Overt flaws in standard trial designs, frequent reversals in guideline recommendations, or the disclosure of false evidence-based medicine promulgated as a quality measure, can all serve as catalysts. Once cracks appear, the new standard must be fixed quickly: no trial should be funded without explicit causal modeling. When this becomes the accepted filter for legitimacy, the monopoly collapses, and the field stabilizes on firmer ground with local scientists again in charge of their own research direction, and the skepticism of clinicians and young scientists embraced as part of the rewarded culture.

Epistemic discontinuity: closing the domain interface-gap

Change theory [82–84] underscores that the collapse of a paradigm requires the emergence of a clear alternative framework. For those committed to reform, the task is to move beyond endless synthetic debate by creating the conditions for a decisive transition. As this analysis demonstrates, the “Petty-Bone RCT” cannot be reconciled within any sufficient-component cause model. Clinicians should therefore draw on the causal frameworks articulated by Pearl [39], Rothman [85], Greenland [86], and Hernán [87] to establish a sound foundation for the next generation of clinical science.

Yet, when the RCT functions as a black box [88], it can be inadvertently corrupted by a single apical error introduced by authority. This risk is greatest when the error arises in measurement, within the vulnerable interface, or “domain interface gap,” created by the epistemic discontinuity between the knowledge domains of statisticians and clinical trialists. It is precisely this gap that must be bridged through causal modeling.

Unfortunately, there also exists an analogous domain interface gap between the language of causal symbolic modeling (cSM) and that of clinicians, including clinical trialists. The specialized syntax used within the cSM domain, while precise, limits accessibility for these audiences. For clinicians and trialists already inundated with a constant influx of new medical knowledge, such encapsulation of concepts can further hinder effective uptake and application.

The unresolved RCT crisis in critical care and the imperative for prompt reform

The RCT remains the preeminent tool for causal inference and, arguably, beyond. However, epistemic discontinuities as they relate to clinical trial design must be resolved without delay. In critical care, seamless knowledge flow across domains is essential to public health. Leaders are responsible for aligning interfaces to ensure coherence and to facilitate the assimilation of knowledge across domains. The need for such change is stark. Only months ago, in 2025, another “Petty-Bone RCT” was presented in the prestigious journal Critical Care Medicine, this time testing 200 mg/day of aspirin as a treatment of “sepsis and septic shock” [89]. In truth, the trial evaluated aspirin across a heterogeneous mix of diseases captured by Sepsis-3 (SOFA) triage thresholds. Conducted in Brazil, decades and thousands of miles from where Bone once promoted the streamlined triage model, the study could just as easily have been performed under Bone’s SIRS framework thirty years earlier. That is how little progress the field has made. Predictably, the trial was negative, with signals of harm: major bleeding occurred in 8.5% of aspirin-treated patients versus 1.2% in placebo (p = 0.02), along with increased transfusion requirements.

These investigators worked diligently; trials of this scale are monumental undertakings, and patients consented in the hope of advancing science. Yet, as with countless predecessors, these efforts were constrained by the Petty and Bone paradigm. Like generations of U.S. trainees before them, these scientists were unknowingly indoctrinated into a methodology incapable of producing transportable causal knowledge. The responsibility lies not with them but with the intellectual monopoly that perpetuates this framework. Allowing such a cycle to continue unchallenged is indefensible.

As shown by this review, the pathological science underlying much of critical care is rooted in siloing and basic epistemic discontinuities. Technically, these flaws should be straightforward to correct; the greater challenge lies in resistance to change, reinforced by societal and structural entrenchment and path dependency. Yet the public health risks identified herein should outweigh and mitigate such resistance.

Conclusion

Through historical and methodological analysis of the “Petty-Bone RCT” framework, using causal symbolic modeling (cSM), directed acyclic graphs (DAGs), and do-calculus, this review has proven that the “Petty-Bone RCT” design is structurally incapable of producing even reasonably transportable causal estimates. This demonstrates formally that treatment protocols derived from RCTs of synthetic syndromes do not reliably generalize even to other patients meeting the same criteria. Accordingly, treatment protocols should not be based on such trials, except in rare and explicitly justified circumstances. This principle must become the default stance for trial interpretation and protocol development.

The “Petty-Bone RCT” is not a true randomized controlled trial but an RCT mimic. It preserves the procedural form of randomization and internal validity while building its cohorts on a collider: the synthetic syndrome. This destroys transportability. Yet to the casual observer, the “Petty-Bone RCT” appears indistinguishable from gold-standard science. Randomization, blinding, protocolized care, pre-specified outcomes—these surface markers of rigor project legitimacy. They reassure peer reviewers, guideline committees, journal editors, and even machine intelligence.

But beneath the polished exterior lies the trap of symbolic rigor. By conditioning on syndromes such as ARDS, sepsis, or CAP (labels born of consensus thresholds, not mechanistic unity), Petty-Bone trials collapse causal structure into an artificial mixture. The effect observed belongs not to any disease, but to a symbolic, ephemeral artifact. This deception is powerful: even seasoned statisticians, and even artificial intelligence, can be tricked by the symbolic veneer into saluting the form while overlooking the fatal flaw.
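A small numerical sketch may help show why a pooled effect over an “artificial mixture” need not carry over from one cohort to another. It assumes, purely for illustration, three hypothetical diseases that all satisfy the same threshold set, each with its own fixed treatment effect, and two hypothetical sites that differ only in case mix. None of the names or numbers come from this review or from any trial.

```python
from typing import Dict


def cohort_ate(mix: Dict[str, float], effect: Dict[str, float]) -> float:
    """Pooled average treatment effect as a case-mix-weighted mean.

    `mix` maps each disease to its share of the enrolled cohort (must sum to 1);
    `effect` maps each disease to its disease-specific effect.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("disease shares must sum to 1")
    return sum(share * effect[name] for name, share in mix.items())


# Hypothetical absolute risk differences in mortality (treated minus control):
# negative means benefit, positive means harm.
EFFECT = {"disease_A": -0.10, "disease_B": 0.00, "disease_C": 0.08}

# Two hypothetical sites enrolling on the same threshold set, with different mixes.
SITE_1 = {"disease_A": 0.60, "disease_B": 0.30, "disease_C": 0.10}
SITE_2 = {"disease_A": 0.10, "disease_B": 0.30, "disease_C": 0.60}

ate_site_1 = cohort_ate(SITE_1, EFFECT)
ate_site_2 = cohort_ate(SITE_2, EFFECT)

if __name__ == "__main__":
    print(f"pooled effect, site 1: {ate_site_1:+.3f}")
    print(f"pooled effect, site 2: {ate_site_2:+.3f}")
```

With these made-up inputs the pooled effect shows benefit at the first site and harm at the second, even though no disease-specific effect has changed. The only thing that differs is the mixture admitted by the shared enrollment criteria.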

This is why decades of such trials have produced cycles of contradiction rather than durable causal knowledge. The streamlining modifications to Bradford Hill’s original method, subtle at first, proved fatal to transportability and generalizability. The Petty and Bone design is not simply outdated science; it is pathological science institutionalized, sustained by task force control of the “causal path.” Centralized guideline development and international adoption of pathological, consensus-based threshold sets have choked off the field’s natural capacity for methodological self-correction.

Harrell [90] reminds us that:

The experimental design is all important, and is what allows interpretations to be causal (emphasis added).

This principle is easily overlooked when the design happens to be an RCT, where confidence in randomization often overshadows deeper concerns. Yet this review has shown that RCT design can be unknowingly corrupted—and that such corruption is readily detectable through causal symbolic modeling (cSM). The analysis demonstrates that the current trajectory is pathological and unsustainable. Synthetic syndromes must be deliberately abandoned as trial inclusion criteria and replaced with designs grounded in mechanistic reasoning and causal identifiability.

The NIH should fund and facilitate the merging of the silos. The mitigation of epistemic discontinuity should be recognized as a public health priority. To accomplish this, cSM should be incorporated into the Consolidated Standards of Reporting Trials (CONSORT) framework and required in grant applications. Critical care task forces can serve a clinical guidance role, but critical care research must be returned to a more diverse, bottom-up, natural science approach, anchored in mechanism, not consensus thresholds.

In summary, this analytic review has, for the first time, provided clear and new mathematical proof that not one more patient, not one more young investigator, not one more grant should be sacrificed to the next iteration of a Petty and Bone synthetic syndrome RCT.

Acknowledgements

The author acknowledges the assistance of Rafael Leite, MD. The author also acknowledges the assistance of ChatGPT in the review and writing of the manuscript and in the interactive production of the figures and tables based on the author's design.

Abbreviations

ARDS

Acute Respiratory Distress Syndrome

ATE

Average Treatment Effect

CAP

Community-Acquired Pneumonia

CARDS

COVID-19–Associated ARDS

CONSORT

Consolidated Standards of Reporting Trials

cSM

Causal Symbolic Modeling

DAG

Directed Acyclic Graph

EGDT

Early Goal-Directed Therapy

HES

Hydroxyethyl Starch

LMICs

Low- and Middle-Income Countries

NMB

Neuromuscular Blockade

PaO₂/FiO₂

Arterial Oxygen Tension to Inspired Oxygen Fraction Ratio

PEEP

Positive End-Expiratory Pressure

RCT

Randomized Controlled Trial

REMAP-CAP

Randomized, Embedded, Multifactorial, Adaptive Platform Trial for Community-Acquired Pneumonia

SIRS

Systemic Inflammatory Response Syndrome

SOFA

Sequential Organ Failure Assessment

SSC

Surviving Sepsis Campaign

Glossary of terms

ARDS (Acute Respiratory Distress Syndrome)

A synthetic syndrome label first described by Ashbaugh and Petty in 1967. It aggregates diverse respiratory failures (pneumonia, trauma, aspiration, sepsis) into a single threshold-based category. Originally hypothesized to mirror infant RDS due to surfactant deficiency, but never mechanistically validated.

Apical Error (Langmuir)

An unrecognized critical error originating from one or more authoritative leaders at the “apex” of a scientific methodology. As described by Langmuir, such errors can propagate unchecked through consensus and prestige, distorting an entire field until exposed by rigorous causal validation.

Bradford Hill RCT

The classical randomized controlled trial framework established in the 1940s by Sir Austin Bradford Hill. Characterized by mechanistically grounded inclusion criteria, randomization to sever backdoor paths, and interpretable counterfactual causal estimates.

CAP (Community-Acquired Pneumonia)

Initially a pragmatic operational category used for empirical antibiotic choice, based on expected pathogens. Later ossified into a synthetic syndrome when used as a trial inclusion label for interventions beyond antibiotics, such as corticosteroids.

Causal Symbolic Modeling (cSM)

A formal framework introduced by Pearl, based on early causal path analysis by Wright, and extended in this manuscript to model symbolic constructs like ARDS or sepsis. Used to demonstrate how non-mechanistic placeholders distort causal inference.

Cohort Collider

A structural bias arising when many different diseases are pooled into a synthetic syndrome via threshold criteria. Inclusion (S = 1) is determined by crossing arbitrary thresholds (X), producing mixtures of diseases (D₁, D₂ … Dᵢ) with distinct causal pathways. This destroys transportability.
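
The mixture dependence described in this entry can be illustrated with a small arithmetic sketch. It assumes two hypothetical diseases, D1 and D2, with invented disease-specific treatment effects of opposite sign, and two hypothetical trials that enroll them in different proportions through the same threshold set. None of these numbers come from the review or from any trial; the function name `cohort_ate` is likewise an illustrative assumption.

```python
from typing import Dict


def cohort_ate(mix: Dict[str, float], effect: Dict[str, float]) -> float:
    """Average treatment effect in an enrolled cohort, computed as the
    mixture-weighted average of the disease-specific effects."""
    total = sum(mix.values())
    return sum(mix[d] / total * effect[d] for d in mix)


# Hypothetical risk differences for mortality (negative = benefit, positive = harm).
effect = {"D1": -0.10, "D2": +0.05}

# Same threshold set, different case mix at two hypothetical sites or eras.
trial_a = {"D1": 0.8, "D2": 0.2}
trial_b = {"D1": 0.2, "D2": 0.8}

ate_a = cohort_ate(trial_a, effect)  # dominated by D1, so net benefit
ate_b = cohort_ate(trial_b, effect)  # dominated by D2, so net harm

print(f"Trial A cohort ATE: {ate_a:+.3f}")
print(f"Trial B cohort ATE: {ate_b:+.3f}")
```

Under these assumptions each trial can be internally valid for its own enrolled mixture, yet the two cohort-level estimates differ in sign because only the case mix changed. Neither estimate applies to a patient with a known disease, nor to a new cohort with a different mixture.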

Collider

In causal inference, a variable influenced by two or more upstream variables. Conditioning on a collider induces spurious associations. In “Petty-Bone RCT”s, synthetic syndromes like ARDS and sepsis act as cohort-level colliders, destroying transportability.
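
The general collider effect named in this entry can be seen in a short simulation. This is a generic textbook-style sketch, not the cSM analysis of the review: it assumes two independent standard normal causes, `a` and `b`, and a hypothetical selection rule (sum above 1.0) standing in for a threshold set. The threshold, sample size, and seed are arbitrary choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Two causes that are independent in the source population.
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]

# Collider: selection depends on both causes (hypothetical threshold of 1.0).
selected = [(x, y) for x, y in zip(a, b) if x + y > 1.0]

r_all = pearson(a, b)
r_sel = pearson([x for x, _ in selected], [y for _, y in selected])

print(f"Correlation in full population:   {r_all:+.3f}")
print(f"Correlation among selected (S=1): {r_sel:+.3f}")
```

In the full population the two causes are close to uncorrelated, while among the selected units a clearly negative association appears. That association is created by the selection rule alone, which is the sense in which conditioning on a collider induces spurious associations.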

Consensus Task Force

Expert groups (e.g., ACCP/SCCM, Berlin Definition, Sepsis-3, Pediatric Phoenix Task Force) that promulgate synthetic syndrome definitions. Their threshold revisions mimic scientific progress but reinforce the underlying structural flaw.

Do-Calculus

A mathematical system of rules formalized by Judea Pearl to determine identifiability of causal effects in graphs. Applied here to prove “Petty-Bone RCT”s lack transportable causal validity.

Domain Interface Gap

The epistemic discontinuity that exists between knowledge domains, often enlarged by the encapsulation of domain knowledge within domain-specific syntax without sufficient interfacing translation.

FEBM (False Evidence-Based Medicine)

Evidence that appears robust because it is derived from a randomized trial but is structurally flawed due to reliance on synthetic syndrome constructs. Example: ventilator protocols during COVID-19 justified by ARDS inclusion criteria.

Heterogeneous Syndrome Alibi

A rhetorical defense of failed reproducibility, claiming heterogeneity of enrolled patients explains negative trials. Overlooks the fact that much of the heterogeneity is induced by the PettyBone design itself, not by nature.

Intellectual Colonization

In critical care, this comprises the global export of synthetic syndrome methodology to low- and middle-income countries, embedding flawed Western paradigms into local research, often through funding and guideline mandates.

Pathological Consensus

The conversion of pathological science, by standardization of the apical error, into a worldwide consensus. In the setting of critical care this entrenches symbolic constructs (e.g., ARDS, sepsis) as though they were mechanistic entities, rendering the pathological science as a permanent part of the discipline.

Pathological Science

A term coined by Irving Langmuir in 1953 to describe research that persists despite being based on an unrecognized apical error made by an expert, often sustained by authority bias rather than empirical validation. “Petty-Bone RCT”s exemplify institutionalized pathological science.

“Petty-Bone RCT” (RCT Mimic)

A streamlined, pathological modification of the Bradford Hill RCT introduced by Thomas Petty (ARDS) and Roger Bone (sepsis). Enrollment is determined by threshold sets (e.g., SIRS, SOFA), creating synthetic syndromes. Mimics trial form but lacks transportable causal validity.

PettyBone Science

The prevailing pathological research paradigm in critical care shaped by Petty and Bone, where synthetic syndromes replace mechanistic entities as trial entry points. Produces superficially rigorous RCTs that fail causal validation.

Placeholders/Synthetic Syndromes

Clinical constructs such as ARDS and sepsis, derived from similarity heuristics, especially when used in RCT design. These act as symbolic flags rather than mechanistic disease entities. Valid as communication tools in clinical practice but invalid as trial entry criteria and for rigid treatment protocols.

REMAP-CAP (Randomized, Embedded, Multifactorial, Adaptive Platform Trial for Community-Acquired Pneumonia)

A modern adaptive platform trial testing treatments in community-acquired pneumonia. Despite statistical sophistication, it inherited PettyBone flaws by pooling different species of bacterial pneumonia and different species of viral pneumonia, undermining transportability.

Science Monopoly

A state where a single research paradigm dominates discourse, critique, legitimacy, funding, and publication. In critical care, the “Petty-Bone RCT” and synthetic syndromes created a monopoly of method, marginalizing mechanistic and causal alternatives.

SIRS (Systemic Inflammatory Response Syndrome)

Threshold-based construct (temperature, heart rate, WBC count, etc.) introduced by Roger Bone in 1992 as an entry gate for sepsis RCTs. Later abandoned but replaced with equally nonspecific SOFA, perpetuating the same error.

SOFA (Sequential Organ Failure Assessment)

Originally ‘Sepsis-related Organ Failure Assessment’ (1996), later recognized in 1998 as nonspecific for sepsis, triggering a name change. Adopted in Sepsis-3 (2016), with the addition of “suspicion” of infection to enhance specificity. Promulgated as a replacement for SIRS during a synthetic paradigm shift engineered by a task force. Functions as another in a sequence of arbitrary triage threshold sets along the PettyBone pathological science continuum.

Synthetic Debate

Debate and discourse relating to threshold choices and results of pathological science based research within the PettyBone era. This creates the illusion of deep scientific discourse and progress while the structural flaw remains untouched.

Synthetic Paradigm Shift

The act of redefining synthetic syndromes by changing thresholds (e.g., SIRS → SOFA; Berlin ARDS → Global Definition 2024). Creates the illusion of scientific progress while leaving the structural flaw untouched.

Synthetic Syndrome Task Force

Organized expert committees that repeatedly update threshold-based definitions of synthetic syndromes and then promulgate them as standards for research and funding worldwide. Their revisions are adopted worldwide, reinforcing institutional legitimacy while failing to address causal invalidity.

Author contributions

LL wrote the manuscript and designed the figures and tables. ChatGPT was used to assist in this process.

Data availability

No datasets were generated or analysed during the current study.

Declaration

Ethics and consent to publish

Not applicable.

Competing interests

Dr. Lynn owns a software research and development company which develops software for the analysis of electronic medical records and medical monitor datasets.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Vincent JL. We should abandon randomized controlled trials in the intensive care unit. Crit Care Med. 2010;38(10 Suppl):S534–38. doi:10.1097/CCM.0b013e3181f208ac.
2. Marshall JC. Why have clinical trials in sepsis failed? Trends Mol Med. 2014;20(4):195–203. doi:10.1016/j.molmed.2014.01.007.
3. Kotani Y, Takeuchi M, Makino K. Positive single-center randomized trials and subsequent multicenter randomized trials in intensive care: a systematic review. Crit Care. 2023;27(1):19. doi:10.1186/s13054-023-04755-5.
4. Ranieri VM, Thompson BT, Barie PS, et al. Drotrecogin alfa (activated) in adults with septic shock. N Engl J Med. 2012;366(22):2055–64. doi:10.1056/NEJMoa1202290.
5. Yealy DM, Kellum JA, Huang DT, Barnato AE, Weissfeld LA, Pike F, et al. A randomized trial of protocol-based care for early septic shock. N Engl J Med. 2014;370(18):1683–93. doi:10.1056/NEJMoa1401602.
6. Brunkhorst FM, Engel C, Bloos F, Meier-Hellmann A, Ragaller M, Weiler N, et al. Intensive insulin therapy and pentastarch resuscitation in severe sepsis. N Engl J Med. 2008;358(2):125–39. doi:10.1056/NEJMoa070716.
7. Moss M, Huang DT, Brower RG, Ferguson ND, Ginde AA, Gong MN, et al. Early neuromuscular blockade in the acute respiratory distress syndrome. N Engl J Med. 2019;380(21):1997–2008. doi:10.1056/NEJMoa1901686.
8. Aoyama H, Pettenuzzo T, Aoyama K, Pinto R, Englesakis M, Fan E. Association of early neuromuscular blockade in the acute respiratory distress syndrome with mortality: a systematic review and meta-analysis. JAMA. 2020;323(13):1312–22. doi:10.1001/jama.2020.2189.
9. Legrand M, Vincent J-L, Payen D. Negative trials in critical care medicine and the hurdles. Lancet Respir Med. 2018;6(2):145–47. doi:10.1016/S2213-2600(18)30342-4.
10. Granholm A, Alhazzani W, Derde LPG, Angus DC, Zampieri FG, Hammond NE, et al. Randomised clinical trials in critical care: past, present and future. Intensive Care Med. 2022;48(8):1052–63. doi:10.1007/s00134-021-06587-9.
11. Wright S. Correlation and causation. J Agric Res. 1921;20:557–85.
12. Wright S. Path coefficients and path analysis. Ann Math Stat. 1934;5(3):161–215.
13. Fisher RA. The design of experiments. Edinburgh: Oliver & Boyd; 1935.
14. Medical Research Council. Streptomycin treatment of pulmonary tuberculosis: a Medical Research Council investigation. BMJ. 1948;2(4582):769–82. doi:10.1136/bmj.2.4582.769.
15. Ashbaugh DG, Bigelow DB, Petty TL, Levine BE. Acute respiratory distress in adults. Lancet. 1967;2(7511):319–23. doi:10.1016/S0140-6736(67)90168-7.
16. Petty TL, Ashbaugh DG. The adult respiratory distress syndrome: clinical features, factors influencing prognosis and principles of management. Chest. 1971;60(3):233–39. doi:10.1378/chest.60.3.233.
17. Avery ME, Mead J. Surface properties in relation to atelectasis and hyaline membrane disease. Am J Dis Child. 1959;97(5):517–23.
18. IrishCentral Staff. The tragic death of Patrick, JFK and Jackie’s newborn son, in 1963. IrishCentral. 2024 Mar.
19. [Unknown author]. Da Nang lung among casualties puzzles doctors. Hartford Courant. 1968 Apr 16.
20. Inkster JS. Residual positive pressure. Proc World Congr Anaesthesiol. 1968.
21. Petty TL. PEEP (positive end-expiratory pressure). Chest. 1972;61(4):309.
22. Alviar CL, Miller PE, McAreavey D, Katz JN, Lee B, Moriyama B, et al. Positive pressure ventilation in the cardiac intensive care unit. J Am Coll Cardiol. 2018;72:1532–53.
23. Ferrer M, Esquinas A, Leon M, Gonzalez G, Alarcon A, Torres A. Noninvasive ventilation in severe hypoxemic respiratory failure: a randomized clinical trial. Crit Care Med. 2003;31(12):2510–16.
24. Bone RC. Sepsis syndrome: a valid clinical entity. Crit Care Med. 1989;17(5):389–93. doi:10.1097/00003246-198905000-00001.
25. Bone RC, Balk RA, Cerra FB, Dellinger RP, Fein AM, Knaus WA, et al. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Chest. 1992;101(6):1644–55. doi:10.1378/chest.101.6.1644.
26. O’Connor A. A boy’s death pushes hospitals to take on sepsis. N Y Times. 2012 Aug 31.
27. End Sepsis. The legacy of Rory Staunton. EndSepsis.org. [Internet].
28. Senn S, Julious S. Measurement in clinical trials: a neglected issue for statisticians? Stat Med. 2009;28(26):3189–209.
29. Levy MM, Fink MP, Marshall JC, Abraham E, Angus D, Cook D, et al. 2001 SCCM/ESICM/ACCP/ATS/SIS international sepsis definitions conference. Crit Care Med. 2003;31(4):1250–56. doi:10.1097/01.CCM.0000050454.01978.3B.
30. Rivers E, Nguyen B, Havstad S, Ressler J, Muzzin A, Knoblich B, et al. Early goal-directed therapy in severe sepsis and septic shock. N Engl J Med. 2001;345(19):1368–77. doi:10.1056/NEJMoa010307.
31. van den Berghe G, Wouters P, Weekers F, Verwaest C, Bruyninckx F, Schetz M, et al. Intensive insulin therapy in critically ill patients. N Engl J Med. 2001;345(19):1359–67. doi:10.1056/NEJMoa011300.
32. NICE-SUGAR Study Investigators. Intensive versus conventional glucose control in critically ill patients. N Engl J Med. 2009;360(13):1283–97. doi:10.1056/NEJMoa0810625.
33. Annane D, Sébille V, Charpentier C, Bollaert PE, François B, Korach JM, et al. Effect of treatment with low doses of hydrocortisone and fludrocortisone on mortality in patients with septic shock. JAMA. 2002;288(7):862–71. doi:10.1001/jama.288.7.862.
34. Sprung CL, Annane D, Keh D, Moreno R, Singer M, Freivogel K, et al. CORTICUS Study Group. Hydrocortisone therapy for patients with septic shock. N Engl J Med. 2008;358(2):111–24. doi:10.1056/NEJMoa071366.
35. Bernard GR, Vincent JL, Laterre PF, LaRosa SP, Dhainaut JF, Lopez-Rodriguez A, et al. PROWESS Study Group. Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med. 2001;344(10):699–709. doi:10.1056/NEJM200103083441001.
36. Ranieri VM, Thompson BT, Barie PS, Dhainaut JF, Douglas IS, Finfer S, et al. PROWESS-SHOCK Study Group. Drotrecogin alfa (activated) in adults with septic shock. N Engl J Med. 2012;366(22):2055–64. doi:10.1056/NEJMoa1202290.
37. ProCESS Investigators. A randomized trial of protocol-based care for early septic shock. N Engl J Med. 2014;370(18):1683–93. doi:10.1056/NEJMoa1401602.
38. ARISE Investigators; ANZICS Clinical Trials Group. Goal-directed resuscitation for early septic shock. N Engl J Med. 2014;371(16):1496–506. doi:10.1056/NEJMoa1404380.
39. Mouncey PR, Osborn TM, Power GS, Harrison DA, Sadique MZ, Grieve RD, et al. ProMISe trial investigators. Trial of early, goal-directed resuscitation for septic shock. N Engl J Med. 2015;372(14):1301–11. doi:10.1056/NEJMoa1500896.
40. Lynn LA. The diagnosis of sepsis revisited: a challenge for young medical scientists in the 21st century. Patient Saf Surg. 2014;8:1. doi:10.1186/1754-9493-8-1.
41. Dellinger RP, Levy MM, Rhodes A, Annane D, Gerlach H, Opal SM, et al. Surviving sepsis campaign guidelines committee including the pediatric subgroup. Surviving sepsis campaign: international guidelines for management of severe sepsis and septic shock, 2012. Crit Care Med. 2013;41(2):580–637. doi:10.1097/CCM.0b013e31827e83af.
42. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med. 1996;22(7):707–10.
43. Vincent JL, de Mendonça A, Cantraine F, Moreno R, Takala J, Suter PM, et al. Use of the SOFA score to assess the incidence of organ dysfunction/failure in intensive care units: results of a multicenter, prospective study. Crit Care Med. 1998;26(11):1793–800.
44. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):801–10. doi:10.1001/jama.2016.0287.
45. Bernard GR, Artigas A, Brigham KL, Carlet J, Falke K, Hudson L, et al. The American-European consensus conference on ARDS. Am J Respir Crit Care Med. 1994;149(3 Pt 1):818–24. doi:10.1164/ajrccm.149.3.7509706.
46. ARDS Definition Task Force, Ranieri VM, Rubenfeld GD, Thompson BT, Ferguson ND, Caldwell E, et al. Acute respiratory distress syndrome: the Berlin definition. JAMA. 2012;307(23):2526–33. doi:10.1001/jama.2012.5669.
47. Ferguson ND, Fan E, Camporota L, Antonelli M, Anzueto A, Beale R, et al. The Berlin definition of ARDS: expanded rationale, justification, and supplementary material. Intensive Care Med. 2012;38(10):1573–82. doi:10.1007/s00134-012-2682-1.
48. Matthay MA, Arabi YM, Siegel ER, Ware LB, Bos LD, Guerin C, et al. A new global definition of acute respiratory distress syndrome. Am J Respir Crit Care Med. 2024;209(1):37–47. doi:10.1164/rccm.202309-1467WS.
49. Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82(4):669–710.
50. Pearl J. Causality: models, reasoning, and inference. 2nd ed. Cambridge: Cambridge University Press; 2009.
51. Angus DC, Berry S, Lewis RJ, Al-Beidh F, Arabi YM, van Bentum-Puijk W, et al. The REMAP-CAP (randomized embedded multifactorial adaptive platform for community-acquired pneumonia) study: rationale and design. Ann Am Thorac Soc. 2020;17(7):879–91. doi:10.1513/AnnalsATS.202003-192SD.
52. Angus DC. REMAP-CAP investigators. Effect of hydrocortisone on mortality in patients with severe community-acquired pneumonia: the REMAP-CAP corticosteroid domain randomized clinical trial. Intensive Care Med. 2025;51(7):1415–28. doi:10.1007/s00134-025-07861-w.
53. Pirracchio R, Sprung CL. REMAP-CAP corticosteroids: yet, another swing of the pendulum? Intensive Care Med. 2025;51:1135–38. doi:10.1007/s00134-025-07957-3.
54. Dequin PF, Meziani F, Quenot JP, Kamel T, Ricard JD, Badie J, et al. Hydrocortisone in severe community-acquired pneumonia (CAPECOD). N Engl J Med. 2023;388:1931–41. doi:10.1056/NEJMoa2215145.
55. Wirz SA, Blum CA, Schuetz P, Christ-Crain M, Huber A, Mueller B, et al. Pathogen- and antibiotic-specific effects of prednisone in community-acquired pneumonia. Eur Respir J. 2016;48(4):1150–59. doi:10.1183/13993003.00474-2016.
56. Taenaka H, Nagata K, Kuroda T, Sato H, Okada Y, Nakajima Y, et al. Biological effects of corticosteroids on pneumococcal pneumonia in mice: translational significance. Crit Care. 2024;28(1):185. doi:10.1186/s13054-024-04956-6.
57. Tsai MJ, Yang KY, Chan MC, Kao KC, Wang HC, Wu CL, et al. Impact of corticosteroid treatment on clinical outcomes of influenza-associated ARDS: a multicenter retrospective cohort study. Ann Intensive Care. 2020;10:26. doi:10.1186/s13613-020-0642-4.
58. Lansbury L, Rodrigo C, Leonardi-Bee J, Nguyen-Van-Tam J, Shen Lim W. Corticosteroids as adjunctive therapy in the treatment of influenza: a Cochrane systematic review. Cochrane Database Syst Rev. 2019;CD010406.pub3. doi:10.1002/14651858.CD010406.pub3.
59. Confalonieri M, Urbino R, Potena A, Piattella M, Parigi P, Puccio G, et al. Hydrocortisone infusion for severe community-acquired pneumonia: a preliminary randomized study. Am J Respir Crit Care Med. 2005;171(3):242–48.
60. Iuliano AD, Roguski KM, Chang HH, Muscatello DJ, Palekar R, Tempia S, et al. Estimates of global seasonal influenza–associated respiratory mortality: a modelling study. Lancet. 2018;391(10127):1285–300. doi:10.1016/S0140-6736(17)33293-2.
61. Wang C, Yang H, Wu S, Chen Y, He J, Yang Y, et al. Sepsis heterogeneity: translating biology into clinical practice. Intensive Care Med. 2023;49(3):283–98. doi:10.1007/s00134-023-07025-7.
62. Schlapbach LJ, Kissoon N, Argent AC, de Souza DC, Brierley J, Carcillo J, et al. International consensus criteria for pediatric sepsis and septic shock. JAMA. 2024;331(6):492–504. doi:10.1001/jama.2024.0311.
63. Seymour CW, Kennedy JN, Wang S, Chang CH, Elliott CF, Xu Z, et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA. 2019;321(20):2003–19. doi:10.1001/jama.2019.5791.
64. Li Y, Chen T, Wang S, Zhou H, Xu Z, Zhang Y, et al. Unsupervised clustering for sepsis identification in large-scale patient data: a model development and validation study. Intensive Care Med Exp. 2025;13(1):6. doi:10.1186/s40635-025-00744-w.
65. Dalkey N, Helmer O. An experimental application of the Delphi method to the use of experts. RAND Corporation Memorandum. 1963;RM-727.
66. Lynn LA, Leite RO, Lynn DA. The Physician’s war: the story of the hidden battle between physicians and a science based on pathological consensus. Amazon. Published September 17, 2024. ISBN-13: 979-8339212850.
67. Hopewell S, Chan AW, Collins GS, Hróbjartsson A, Moher D, Schulz KF, et al. CONSORT 2025 statement: updated guideline for reporting randomised trials. BMJ. 2025;389:e081123. doi:10.1136/bmj-2024-081123.
68. Tobin MJ. ARDS: hidden perils of an overburdened diagnosis. Intensive Care Med. 2022;48(2):194–96. doi:10.1007/s00134-021-06615-7.
69. Tobin MJ. Pondering the atypicality of ARDS in COVID-19 is a distraction for the bedside doctor. Intensive Care Med. 2021;47(3):361–62. doi:10.1007/s00134-021-06302-6.
70. Tobin MJ. Does making a diagnosis of ARDS in patients with coronavirus disease 2019 matter? Chest. 2020;158(6):2275–77. doi:10.1016/j.chest.2020.07.028.
71. Farkas JD. ARDS is not a disease! Internet Book of Critical Care (IBCC). EMCrit Project. 2021 Sep 7 [updated 2023 May 27]. Available from: https://emcrit.org/ibcc/ards/.
72. Farkas JD. “ARDS” is not a real thing. EMCrit Project. 2023 May 27. Available from: https://emcrit.org/pulmcrit/pulmcrit-ards-is-not-a-real-thing/.
73. Murray JF. The adult respiratory distress syndrome (may it rest in peace). Am Rev Respir Dis. 1975;111(6):716–18. doi:10.1164/arrd.1975.111.6.716.
74. Petty TL. The adult respiratory distress syndrome (confessions of a “lumper”). Am Rev Respir Dis. 1975;111(6):713–15. doi:10.1164/arrd.1975.111.6.713.
75. Bain W, Marchand-Adam S, Ricard JD. COVID-19 versus non–COVID-19 acute respiratory distress syndrome: the pathophysiologic differences suggest that neither should be treated as interchangeable. Am J Respir Crit Care Med. 2021;204(5):598–600.
76. Tobin MJ, Laghi F, Jubran A. Caution about early intubation and mechanical ventilation in COVID-19. Ann Intensive Care. 2020;10:78. doi:10.1186/s13613-020-00692-6.
77. Pu D, Wu W, Wang J, Li C, Zhang Y. COVID-19-related ARDS (CARDS): “typical” or “atypical” ARDS? Ann Transl Med. 2022;10(16):908. doi:10.21037/atm-22-3717.
78. Rowbottom DKV. Popper on criticism and dogmatism in science: a resolution at the group level. Stud Hist Philos Sci. 2010;41(4):386–94.
79. Pearl J, Mackenzie D. The book of why: the new science of cause and effect. 1st ed. New York: Basic Books; 2018.
80. Innis HA. The bias of communication. Toronto: Univ Toronto Press; 1951. p. 3–4.
81. Langmuir I. Pathological science. Lecture presented at: California Institute of Technology; 1953. Available from: https://www.cs.princeton.edu/%7Eken/Langmuir/langmuir.htm.
82. Kuhn TS. The structure of scientific revolutions. 2nd ed. Chicago: University of Chicago Press; 1970.
83. Lewin K. Field theory in social science. New York: Harper & Row; 1951.
84. Kotter JP. Leading change. Boston: Harvard Business School Press; 1996.
85. Rothman KJ. Causes. Am J Epidemiol. 1976;104(6):587–92. doi:10.1093/oxfordjournals.aje.a112335.
86. Greenland S, Brumback B. Causal inference. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics. Chichester: Wiley; 2005. p. 1–15.
87. Hernán MA, Robins JM. Causal inference: what if. Boca Raton: Chapman & Hall/CRC; 2020.
88. Bates MA, Glennerster R. The generalizability puzzle. Stanford Social Innovation Review. 2017. Available from: https://ssir.org/articles/entry/the_generalizability_puzzle. Accessed 2025 Aug 27.
89. Almeida TML, Freitas FGR, Figueiredo RC, Houly SG, Azevedo LCP, Cavalcanti AB, et al. Acetylsalicylic acid treatment in patients with sepsis and septic shock: a phase II, placebo-controlled, randomized clinical trial. Crit Care Med. 2025;53(2):e269–81. doi:10.1097/CCM.0000000000006564.
90. Harrell FE Jr. Improving research through safer learning from data [Internet]. 2018 Mar 8 [cited 2025 Aug 21]. Available from: https://www.fharrell.com/post/improve-research/.


Articles from Patient Safety in Surgery are provided here courtesy of BMC