Front Psychol. 2020 Sep 3;11:2099. doi: 10.3389/fpsyg.2020.02099

Table 1.

Reproduction of table from McPhetres (2019) with our response to each point raised.

Each entry below gives the degree of freedom code (Wicherts et al., 2016), the criticism raised by McPhetres (2019), and our response.
T2: Vague hypotheses
Criticism:
• “We anticipated that acetaminophen would enhance effortful, reflective learning, and decrease reliance on intuitive, reflexive learning strategies”
• “Enhanced” and “poorer” performance
Response: As reviewed in our paper (Pearson et al., 2019), this “vague” hypothesis was derived from and constrained by prior theoretical neuroscience models. It predicts that the probability of success will increase faster for the acetaminophen group on a reflective learning task and more slowly for the acetaminophen group on a reflexive learning task (one generic way this prediction can be formalized is sketched after the table).

D5: Measuring additional variables
Criticism:
• Depression scale
• Task performance at “chance level”
• Trials to criterion, learning rate
Response: There was only one measured variable that was analyzed: correct response, which was assessed for 150 trials on each of two learning tasks. McPhetres is listing here study eligibility criteria, data quality criteria, and analytic approaches, not measured variables.

D6: Lack of power analysis
Criticism:
• No justification for sample size given
Response: We relied on the justification given by Mischkowski et al. (2016, p. 1346), which suggests that 40–54 participants per cell provides sufficient power to detect a behavioral effect of acetaminophen. Ours was not a pure between-subjects design but rather a between-subjects factor acting on a within-subjects difference (an illustrative power calculation follows the table).

D7: No sampling plan specified
Criticism:
• No sampling plan specified
Response: The sampling plan, based on the above reference, was to recruit 50 participants per group.

A1/2: Vague exclusion criteria and “data cleaning”
Criticism:
• Low score on depressive symptoms
• Lacking “complete” task data
• Task performance “at or below chance”
Response: Depression scores were part of the study eligibility prescreen; no one who completed the study was excluded for this. Eliminating random responders is standard practice, not a researcher DF. Reanalysis including all participants does not weaken the findings.

A3: Treating statistical abnormalities ad hoc
Criticism:
• Because of possible suppression “… exploratory analysis was conducted to examine accuracy until the first rule change”
• Recoding trials to criterion as Yes/No “since the majority of participants failed to reach criterion …”
Response: Analysis of the first rule has a strong a priori theoretical rationale, which is explained in the following blog post: https://jashu.github.io/post/apap/. An analysis of variance requires variance to analyze. When the majority of participants are right-censored (their actual trials-to-criterion value was not observed and is unknown), dichotomizing to event observed/not observed is the best analytic option (a hypothetical sketch of this dichotomization follows the table).

A6: Multiple scorings of the DV
Criticism:
• Overall accuracy score
• Accuracy until the first rule change
• Number of trials to criterion
• Dichotomizing reaching criterion (Yes/No) on the reflexive-optimal task
Response: These are indeed multiple scorings of the DV, but they are important alternative perspectives to consider, and we obviously have reported all of them. Findings are largely consistent regardless of how the DV is operationalized.

R1: Reproducibility is not assured
Criticism:
• Data is not publicly available and not shared
Response: Data are available here: Pearson et al. (2019).

R5: Misreporting results
Criticism:
• Eta-squared values are calculated incorrectly (noticed by a reviewer)
Response: McPhetres and his reviewer calculated partial eta-squared, not eta-squared (the two formulas are contrasted after the table).

R6: Presenting exploratory results as confirmatory (HARKing)
Criticism:
• Trials to criterion scores
• Learning rate analysis
Response: These are not exploratory results; they are alternate constructs of the DV, but they all test the same a priori hypothesis. No new hypothesis was generated by any of these results.
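
To make the T2 response above more concrete: the directional prediction (accuracy rising faster with trials for the acetaminophen group on the reflective-optimal task, and more slowly on the reflexive-optimal task) amounts to a group-by-trial interaction on trial-level accuracy. The sketch below is purely illustrative; the simulated data, variable names, and the choice of a logistic GEE are our assumptions for exposition, not the analysis reported in Pearson et al. (2019).

```python
# Illustrative only: a group x trial interaction on trial-level accuracy,
# fit with a logistic GEE to respect repeated trials within subjects.
# Data, variable names, and model choice are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_trials = 60, 150
rows = []
for s in range(n_subj):
    group = s % 2                     # 0 = placebo, 1 = acetaminophen (hypothetical)
    for t in range(n_trials):
        # Hypothetical generating model: accuracy improves over trials,
        # slightly faster in group 1 (the "faster learning" prediction).
        logit = -1.0 + (0.015 + 0.005 * group) * t
        p = 1 / (1 + np.exp(-logit))
        rows.append((s, group, t, rng.binomial(1, p)))
df = pd.DataFrame(rows, columns=["subject", "group", "trial", "correct"])

# Under this illustration, the prediction corresponds to a positive
# trial:group interaction coefficient.
model = smf.gee("correct ~ trial * group", groups="subject", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```
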
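For the D6 response, conventional power arithmetic shows where a 40–54-per-cell range can come from for a simple two-group comparison. The effect sizes below are illustrative assumptions on our part, not values taken from Mischkowski et al. (2016), and, as noted above, the actual design involved a between-subjects factor acting on a within-subjects difference.

```python
# Illustrative power arithmetic for a two-group comparison.
# The effect sizes are assumptions chosen to show where a 40-54 per-cell
# range can come from; they are not values reported in the cited papers.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.55, 0.65):  # hypothetical standardized effect sizes
    n_per_cell = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                      alternative="two-sided")
    print(f"d = {d}: ~{n_per_cell:.0f} participants per cell for 80% power")
# Roughly 53 and 38 per cell, bracketing the 40-54 range quoted above.
```
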
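For the A3 response, the right-censoring point can be illustrated as follows: participants who never reach criterion within the available trials have unknown trials-to-criterion values, so the groups can instead be compared on whether the criterion event was observed at all. The data and test below are a hypothetical sketch, not the analysis from the paper.

```python
# Hypothetical sketch: trials-to-criterion is right-censored for participants
# who never reach criterion within the 150 available trials, so we dichotomize
# to reached / did not reach and compare groups on the resulting 2x2 table.
import numpy as np
from scipy.stats import fisher_exact

# Hypothetical trials-to-criterion; np.nan marks participants who never reached it.
placebo = np.array([90, 120, np.nan, np.nan, 75, np.nan, 140, np.nan])
acetaminophen = np.array([np.nan, np.nan, np.nan, 130, np.nan, np.nan, np.nan, 145])

def reached(x):
    """Count participants whose criterion event was actually observed."""
    return int(np.sum(~np.isnan(x)))

table = [[reached(placebo), len(placebo) - reached(placebo)],
         [reached(acetaminophen), len(acetaminophen) - reached(acetaminophen)]]
odds_ratio, p_value = fisher_exact(table)
print(table, odds_ratio, p_value)
```
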
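For the R5 response, the distinction at issue is between the two standard effect-size definitions below; in a one-way between-subjects ANOVA with a single effect they coincide, but they diverge once additional effects or error terms enter the model.

```latex
% eta-squared vs. partial eta-squared (standard definitions)
\eta^2   = \frac{SS_{\text{effect}}}{SS_{\text{total}}}
\qquad
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```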